Wednesday, 19 September 2012

Puzzle 65: A Strange Saga of a Suspicious Sort


This program sorts an array of randomly chosen Integer instances, using a custom comparator, and then prints a word describing the order of the array. Recall that the Comparator interface has a single method, compare, which returns a negative value if its first argument is less than its second, zero if its two arguments are equal, and a positive value if its first argument is greater than its second. This program is a showcase for release 5.0 features. It uses autoboxing and unboxing, generics, and enum types. What does it print?
import java.util.*;



public class SuspiciousSort {

    public static void main(String[] args) {

        Random rnd = new Random();

        Integer[] arr = new Integer[100];



        for (int i = 0; i < arr.length; i++)

            arr[i] = rnd.nextInt();



        Comparator<Integer> cmp = new Comparator<Integer>() {

            public int compare(Integer i1, Integer i2) {

                return i2 - i1;

            }

        };

        Arrays.sort(arr, cmp);

        System.out.println(order(arr));

    }



    enum Order { ASCENDING, DESCENDING, CONSTANT, UNORDERED };



    static Order order(Integer[] a) {

        boolean ascending = false;

        boolean descending = false;



        for (int i = 1; i < a.length; i++) {

            ascending  |= (a[i] > a[i-1]);

            descending |= (a[i] < a[i-1]);

        }

 

        if (ascending  && !descending)

            return Order.ASCENDING;

        if (descending && !ascending)

            return Order.DESCENDING;

        if (!ascending)

            return Order.CONSTANT;   // All elements equal

        return Order.UNORDERED;      // Array is not sorted

    }

}


Solution 65: A Strange Saga of a Suspicious Sort

The main method creates an array of Integer instances, initializes it with random values, and sorts the array using the comparator cmp. This comparator's compare method returns its second argument minus its first, which is positive if its second argument represents a larger value than its first, zero if they're equal, and negative if its second argument represents a smaller value than its first. This is the opposite of what is normally done by the compare method, so this comparator should impose a descending order.
After sorting the array, the main method passes it to the static method order and prints the result returned by this method. This method returns CONSTANT if all the elements in the array represent equal values, ASCENDING if the second element in every adjacent pair is greater than or equal to the first, DESCENDING if the second element in every adjacent pair is less than or equal to the first, and UNORDERED if none of these conditions holds. Although it is theoretically possible that all 100 random numbers in the array are equal to one another, the odds of this happening are infinitessimal: 1 in 232x99, which is approximately 1 in 5 x 10953. Therefore, it seems likely that the program should print DESCENDING. If you ran it, you almost certainly saw it print UNORDERED. Why would it do such a thing?
The order method is straightforward, and it does not lie. The Arrays.sort method has been around for years, and it works fine. This leaves only one place tolook for bugs: the comparator. At first glance, it may seem unlikely that the comparator is broken. After all, it uses a standard idiom: If you have two numbers and you want a value whose sign indicates their order, compute their difference. This idiom has been around at least since the early 1970s. It was commonly used in the early days of UNIX. Unfortunately, this idiom never worked properly. Perhaps this puzzle should have been called "The Case of the Idiotic Idiom?" The problem with this idiom is that a fixed-width integer is not big enough to hold the difference of two arbitrary integers of the same width. When you subtract two int or long values, the result can overflow, in which case it will have the wrong sign. For example, consider this program:
public class Overflow {

    public static void main(String[] args) {

        int x = -2000000000;

        int z = 2000000000;

        System.out.println(x - z);

    }

}


Clearly, x is less than z, yet the program prints 294967296, which is positive. Given that this comparison idiom is broken, why is it used so commonly? Because it works most of the time. It breaks only if the numbers to which it is applied differ by more than Integer.MAX_VALUE. This means that for many applications, failures won't be observed in practice. Worse, they may be observed infrequently enough that the bug will never get found and fixed.
So what does this mean for the behavior of our program? If you look at the Comparator documentation, you will see that the relation it implements must be transitive. In other words, (compare(x, y) > 0) && (compare(y, z) > 0) implies that compare(x, z) > 0. Consider the case in which x and z have the values in the Overflow example and y has the value 0. Our comparator violates transitivity for these values. In fact, it returns the wrong value for one quarter of all int pairs chosen at random. Performing a search or sort with such a comparator or using it to order a sorted collection can cause unspecified behavior, which is what we observed when we ran the program. For the mathematically inclined, the general contract of the Comparator.compare method requires that comparators impose a total order, but this one fails to do so on several counts.
We can fix our program by substituting a Comparator implementation that obeys the general contract. Because we want to reverse the natural order, we don't even have to write our own comparator. The Collections class provides one that's made to order. If you replace the original Arrays.sort invocation by Arrays.sort(arr, Collections.reverseOrder()), the program will print DESCENDING as expected.
Alternatively, you can write your own comparator. The following code is not "clever," but it works, causing the program to print DESCENDING as expected:
public int compare(Integer i1, Integer i2) {

    return (i2 < i1 ? -1 : (i2 == i1 ? 0 : 1));

}


This puzzle has several lessons. The most specific is: Do not use a subtraction-based comparator unless you are sure that the difference between values will never be greater than Integer.MAX_VALUE [EJ Item 11]. More generally, beware of int overflow, as discussed in Puzzles 3, 26, and 33. Another lesson is that you should avoid "clever" code. Strive to write clear, correct code, and do not optimize it unless it proves necessary [EJ Item 37].
For language designers, the lesson is the same as for Puzzles 3, 26, and 33: It is perhaps worth considering support for some from of integer arithmetic that does not overflow silently. Also, it might be worth providing a three-valued comparator operator in the language, as Perl does (the <=> operator).

No comments:

Post a Comment

Your comments are welcome!