The following shuffle method purports to shuffle its
input array fairly. In other words, it purports to generate all permutations
with equal likelihood, assuming that the underlying pseudorandom number
generator is fair. Does it make good on its promise? If not, how do you fix
it?
import java.util.Random; public class Shuffle { private static Random rnd = new Random(); public static void shuffle(Object[] a) { for (int i = 0; i < a.length; i++) swap(a, i, rnd.nextInt(a.length)); } private static void swap(Object[] a, int i, int j) { Object tmp = a[i]; a[i] = a[j]; a[j] = tmp; } }
Solution 94: Lost in the Shuffle
Looking at the shuffle method,
there's nothing obviously wrong with it. It iterates through the array, swapping
a randomly chosen element from the array into each location. That ought to
shuffle the array fairly, right? Wrong. There's a big difference between saying
"There's nothing obviously wrong with this code" and "There's obviously nothing
wrong with this code." In this case, there's something very wrong, but it isn't
obvious unless you specialize in algorithms.
If you call the shuffle method on an array of length
n, the loop iterates n times. In each iteration, the method chooses one of
the n integers between 0 and n - 1. Therefore, there are nn possible executions of the method. We
assumed the random number generator is fair, so each execution occurs with equal
likelihood. Each execution generates one permutation of the array. There is,
however, one small problem: There are n! distinct
permutations of an array of length n. (The
exclamation point after n indicates the factorial operation: n
factorial is defined as n x (n - 1) x (n - 2) x ...x
1.) The problem is that nn is not
divisible by n! for any n greater than 2, because n! has every prime factor from 2 through n, but nn has
only the prime factors that make up n. This
proves beyond a shadow of a doubt that the shuffle method generates
some permutations more often than others.
To make this concrete, let's consider an array of length 3
containing the strings "a", "b", and "c". There are
33 = 27 possible executions of the shuffle method. All are
equally likely, and each generates some permutation. There are 3! = 6 distinct
permutations of the array: {"a", "b", "c"}, {"a", "c", "b"}, {"b",
"a", "c"}, {"b", "c", "a"}, {"c", "a", "b"}, and {"c", "b", "a"}. Because
27 is not divisible by 6, some of these permutations must be generated by more
executions than others, so the shuffle method is not fair.
One problem with this proof is that it offers no intuition into
the bias induced by the method; it merely proves that a bias exists. Often the
best way to gain some insight is to perform an experiment. We ran a program that
calculates the expected value of the element at each position when the method is
run on the "identity array," where a[i] = i. Loosely
speaking, the expected value is the average value that you'll see in the element
if you run the shuffle method repeatedly. If the shuffle
method were fair, the expected value would be the same for each element: ((n -1 ) / 2). Figure
10.1 shows the expected value for each element in an array of length 9. Note
the distinctive shape of the graph: It starts low, increases beyond the fair
value (4), and settles down to the fair value in the last element.
Figure 10.1. Expected values for the shuffle method on the identity array.
Why does the graph
have this shape? We don't know all the details but we can offer some intuition.
Let's restrict our attention to the array's first element. After the first
iteration of the loop, it has the correct expected value of (n - 1) / 2. In the second iteration, however, there is
1 chance in n that the random number generator
will return 0 and the value in the first element will be replaced by 1 or 0. In
other words, the second iteration systematically reduces the expected value of
the first element. In the third iteration, there's a further 1 chance in n that the first element is replaced by 2, 1, or 0, and
so on. For the first n / 2 iterations of the
loop, the expected value of the first element decreases. For the second n / 2 iterations, it increases but never catches up to
its fair value. Note that the last element in the array is guaranteed to have
the correct expected value, as the last step in the execution of the method is
to select it at random from all the elements in the array.
OK, so our shuffle method is broken. How do we fix it?
Use the shuffle method provided by the
library:
import java.util.*; public class Shuffle { public static void shuffle(Object[] a) { Collections.shuffle(Arrays.asList(a)); } }
Whenever the libraries provide a
method that does what you need, use it [EJ Item 30]. Generally speaking,
the libraries provides high-quality solutions requiring a minimum of effort on
your part.
On the other hand, after you suffered
through all that math, it seems unfair not to tell you how to fix the broken
shuffle method. The fix is actually quite straightforward. In the body
of the loop, swap the current element with an element selected at random from
the portion of the array starting at the current element and extending to the
end of the array. Do not touch an element once you've swapped a value into it.
This is essentially the algorithm that is used by the library method:
public static void shuffle(Object[] a) { for (int i = 0; i < a.length; i++) swap(a, i, i + rnd.nextInt(a.length - i)); }
It's easy to prove this method fair by induction. For the base
case, observe that it is trivially fair for an array of length 0. For the
induction step, if you apply the method to an array of length n > 0, it correctly selects a random element for the
zeroth position of the result array. Then, it iterates over the remainder of the
array: At each position, it selects an element chosen at random from the
"subarray" beginning at that position and extending to the end of the original
array. But that is exactly what the method would
do if it were applied directly to the subarray of length n - 1, starting at position 1 in the original array.
This completes the proof. It also suggests a recursive formulation of the
shuffle method, whose details are left as an exercise for the
reader.
You might think that that's all there is to this story, but
there's one final chapter. Do you suppose that the fixed shuffle method
generates all permutations of a 52-card deck, represented as a 52-element array,
with equal likelihood? After all, we just proved
it fair. It probably won't surprise you at this point that the answer is an
emphatic no. The problem is that we assumed, way back at the start of the
puzzle, that "the underlying pseudorandom number generator is fair." It
isn't.
The random number generator, java.util.Random, takes a
64-bit seed, and the sequence of numbers it generates is fully determined by
that seed. There are 52! permutations of a 52-card deck, but only 264
seeds. What fraction of the permutations does that cover? Would you believe 2.3
x 10-47 percent? That is a polite way of saying "practically none."
If you use java.security.SecureRandom in place of
java.util.Random, you get a 160-bit seed, but that buys you
surprisingly little: The shuffle method still fails to return some
permutations for arrays with more than 40 elements (because 40! >
2160). For a 52-element array, you still get only 1.8 x
10-18 percent of the possible permutations.
Does that mean that you shouldn't trust these pseudorandom
number generators for shuffling cards? It depends. They generate a negligible
fraction of the possible permutations, but they have no
systematic bias that we're aware of. It seems fair to say that these generators
are good enough for casual use. If you need a state-of-the-art random number
generator, you'll have to look elsewhere.
In summary, shuffling an array, like many algorithms, is
tricky. It's easy to get it wrong and hard to tell that you did. All other
things being equal, you should use trusted libraries in preference to
handwritten code. If you want to learn more about the issues discussed in this
puzzle, see [Knuth98 3.4.2].
No comments:
Post a Comment
Your comments are welcome!