This program sorts an array of randomly
chosen Integer instances, using a custom comparator, and then prints a
word describing the order of the array. Recall that the Comparator
interface has a single method, compare, which returns a negative value
if its first argument is less than its second, zero if its two arguments are
equal, and a positive value if its first argument is greater than its second.
This program is a showcase for release 5.0 features. It uses autoboxing and
unboxing, generics, and enum types. What does it print?
import java.util.*; public class SuspiciousSort { public static void main(String[] args) { Random rnd = new Random(); Integer[] arr = new Integer[100]; for (int i = 0; i < arr.length; i++) arr[i] = rnd.nextInt(); Comparator<Integer> cmp = new Comparator<Integer>() { public int compare(Integer i1, Integer i2) { return i2 - i1; } }; Arrays.sort(arr, cmp); System.out.println(order(arr)); } enum Order { ASCENDING, DESCENDING, CONSTANT, UNORDERED }; static Order order(Integer[] a) { boolean ascending = false; boolean descending = false; for (int i = 1; i < a.length; i++) { ascending |= (a[i] > a[i-1]); descending |= (a[i] < a[i-1]); } if (ascending && !descending) return Order.ASCENDING; if (descending && !ascending) return Order.DESCENDING; if (!ascending) return Order.CONSTANT; // All elements equal return Order.UNORDERED; // Array is not sorted } }
Solution 65: A Strange Saga of a Suspicious Sort
The main
method creates an array of Integer instances, initializes it with
random values, and sorts the array using the comparator cmp. This
comparator's compare method returns its second argument minus its
first, which is positive if its second argument represents a larger value than
its first, zero if they're equal, and negative if its second argument represents
a smaller value than its first. This is the opposite of what is normally done by
the compare method, so this comparator should impose a descending
order.
After sorting the array, the main method passes it to
the static method order and prints the result returned by this method.
This method returns CONSTANT if all the elements in the array represent
equal values, ASCENDING if the second element in every adjacent pair is
greater than or equal to the first, DESCENDING if the second element in
every adjacent pair is less than or equal to the first, and UNORDERED
if none of these conditions holds. Although it is theoretically possible that
all 100 random numbers in the array are equal to one another, the odds of this
happening are infinitessimal: 1 in 232x99, which is approximately 1
in 5 x 10953. Therefore, it seems likely that the program should
print DESCENDING. If you ran it, you almost certainly saw it print
UNORDERED. Why would it do such a thing?
The order method is straightforward, and it does not
lie. The Arrays.sort method has been around for years, and it works
fine. This leaves only one place tolook for bugs: the comparator. At first
glance, it may seem unlikely that the comparator is broken. After all, it uses a
standard idiom: If you have two numbers and you want a value whose sign
indicates their order, compute their difference. This idiom has been around at
least since the early 1970s. It was commonly used in the early days of UNIX.
Unfortunately, this idiom never worked properly. Perhaps this puzzle should have
been called "The Case of the Idiotic Idiom?" The problem with this idiom is that
a fixed-width integer is not big enough to hold the difference of two arbitrary
integers of the same width. When you subtract two int or long values, the result can
overflow, in which case it will have the wrong sign. For example, consider this
program:
public class Overflow { public static void main(String[] args) { int x = -2000000000; int z = 2000000000; System.out.println(x - z); } }
Clearly, x is less than z, yet the program
prints 294967296, which is positive. Given that this comparison idiom
is broken, why is it used so commonly? Because it works most of the time. It
breaks only if the numbers to which it is applied differ by more than
Integer.MAX_VALUE. This means that for many applications, failures
won't be observed in practice. Worse, they may be observed infrequently enough
that the bug will never get found and fixed.
So what does this mean for the behavior of our program? If you
look at the Comparator documentation, you will see that the relation it
implements must be transitive. In other words,
(compare(x, y) > 0) && (compare(y, z) > 0) implies that
compare(x, z) > 0. Consider the case in which x and
z have the values in the Overflow example and y has
the value 0. Our comparator violates transitivity for these values. In
fact, it returns the wrong value for one quarter of all int pairs
chosen at random. Performing a search or sort with such a comparator or using it
to order a sorted collection can cause unspecified behavior, which is what we
observed when we ran the program. For the mathematically inclined, the general
contract of the Comparator.compare method requires that comparators
impose a total order, but this one fails to do so
on several counts.
We can fix our program by substituting a Comparator
implementation that obeys the general contract. Because we want to reverse the
natural order, we don't even have to write our own comparator. The
Collections class provides one that's made to order. If you replace the
original Arrays.sort invocation by Arrays.sort(arr,
Collections.reverseOrder()), the program will print DESCENDING as
expected.
Alternatively, you can write your own comparator. The following
code is not "clever," but it works, causing the program to print
DESCENDING as expected:
public int compare(Integer i1, Integer i2) { return (i2 < i1 ? -1 : (i2 == i1 ? 0 : 1)); }
This puzzle has
several lessons. The most specific is: Do not use a
subtraction-based comparator unless you are sure that the difference between
values will never be greater than Integer.MAX_VALUE [EJ Item
11]. More generally, beware of int overflow, as discussed in Puzzles 3, 26, and 33. Another lesson is that you
should avoid "clever" code. Strive to write clear, correct code, and do not
optimize it unless it proves necessary [EJ Item 37].
For language designers, the lesson is the same as for Puzzles 3, 26, and 33: It is perhaps worth considering
support for some from of integer arithmetic that does not overflow silently.
Also, it might be worth providing a three-valued comparator operator in the
language, as Perl does (the <=> operator).
No comments:
Post a Comment
Your comments are welcome!