Tuesday, 18 September 2012

Puzzle 52: Sum Fun


This program computes and caches a sum in one class and prints it from another. What does the program print? Here's a hint: As you may recall from algebra, the sum of the integers from 1 to n is n(n + 1) / 2.
class Cache {

    static {

        initializeIfNecessary();

    }

   

    private static int sum;

    public static int getSum() {

        initializeIfNecessary();

        return sum;

    }



    private static boolean initialized = false;

    private static synchronized void initializeIfNecessary() {

        if (!initialized) {

            for (int i = 0; i < 100; i++)

                sum += i;

            initialized = true;

        }

    }

}



public class Client {

    public static void main(String[] args) {

        System.out.println(Cache.getSum());

    }

}


Solution 52: Sum Fun

On cursory inspection, you might think that this program adds the numbers from 1 to 100, but it doesn't. Take a closer look at the loop. It is the typical half-open loop, so it goes from 0 to 99. With that in mind, you might think that the program prints the sum of the numbers from 0 to 99. Using the formula from the hint, this sum is 99 x 100 / 2, or 4,950. The program, however, thinks otherwise. It prints 9900, fully twice this value. What accounts for its enthusiasm?
The author of the program obviously went to a lot of trouble to make sure that sum was initialized before use. The program combines lazy and eager initialization and even uses synchronization to make sure that the cache works in the presence of multiple threads. It seems that this program has all the bases covered, yet it doesn't work. What's the matter with it?
Like the program in Puzzle 49, this program suffers from a class initialization ordering problem. To understand its behavior, let's trace its execution. Before it can invoke Client.main, the VM must initialize the class Client. This initialization is so simple that it isn't worth talking about. The Client.main method invokes Cache.getSum. Before the getSum method can be executed, the VM must initialize the class Cache.
Recall that class initialization executes static initializers in the order they appear in the source. The Cache class has two static initializers: the static block at the top of the class and the initialization of the static field initialized. The block appears first. It invokes the method initializeIfNecessary, which tests the field initialized. Because no value has been assigned to this field, it has the default boolean value of false. Similarly, sum has the default int value of 0. Therefore, the initializeIfNecessary method does what you'd expect, adding 4,950 to sum and setting initialized to true.
After the static block executes, the static initializer for the initialized field sets it back to false, completing the class initialization of Cache. Unfortunately, sum now contains the correct cached value, but initialized contains false: The Cache class's two pieces of critical state are out of sync.
The main method in the Client class then invokes Cache.getSum, which in turn invokes initializeIfNecessary. Because the initialized flag is false, the initializeIfNecessary method enters its loop, which adds another 4,950 to the value of sum, increasing its value to 9,900. The getSum method returns this value, and the program prints it.
Clearly, the author of this program didn't think about the order in which the initialization of the Cache class would take place. Unable to decide between eager and lazy initialization, the author tried to do both, resulting in a big mess. Use either eager initialization or lazy initialization, never both.
If the time and space cost to initialize a field is low or the field is required in every execution of the program, eager initialization is appropriate. If the cost is high and the field might not be required in some executions, lazy initialization may be preferable [EJ Item 48]. Also, lazy initialization may be necessary to break a cycle in class or instance initialization (Puzzle 51).
The Cache class could be repaired either by reordering the static initializations so the initialized field was not reset to false after sum was initialized or by removing the explicit static initialization of the initialized field. Although the resulting programs would work, they would still be confusing and ill-structured. The Cache class should be rewritten to use eager initialization. The resulting version is obviously correct and much simpler than the original. With this version of the Cache class, the program prints 4950 as expected:
class Cache {

    private static final int sum = computeSum();



    private static int computeSum() {

        int result = 0;

        for (int i = 0; i < 100; i++)

            result += i;

        return result;

    }



    public static int getSum() {

        return sum;

    }

}


Note that we use a helper method to initialize sum. A helper method is generally preferable to a static block, as it lets you name the computation. In the rare cases when you must use a static block to initialize a static field, put the block immediately after the field declaration. This enhances clarity and eliminates the possibility of static initialization competing with a static block, as in the original program.
In summary, think about class initialization order, especially when it is nontrivial. Do your best to keep the class initialization sequence simple. Use eager initialization unless you have some good reason to use lazy initialization, such as performance or the need to break a cycle in initialization.

No comments:

Post a Comment

Your comments are welcome!