This program consists of a simple immutable class that
represents a name, with a main method that puts a name into a set and
checks whether the set contains the name. What does the program print?
import java.util.*;
public class Name {
private final String first, last;
public Name(String first, String last) {
this.first = first;
this.last = last;
}
public boolean equals(Object o) {
if (!(o instanceof Name))
return false;
Name n = (Name)o;
return n.first.equals(first) && n.last.equals(last);
}
public static void main(String[] args) {
Set<Name> s = new HashSet<Name>();
s.add(new Name("Mickey", "Mouse"));
System.out.println(
s.contains(new Name("Mickey", "Mouse")));
}
}
Solution 57: What's in a Name?
A Name
instance consists of a first name and a last name. Two Name instances
are equal, as computed by the equals method, if their first names are
equal and their last names are equal. First names and last names are compared
using the equals method defined in String. Two strings are
equal if they consist of the same characters in the same order. Therefore, two
Name instances are equal if they represent the same name. For example,
the following method invocation returns true:
new Name("Mickey", "Mouse").equals(new Name("Mickey", "Mouse"))
The main method of the program creates two
Name instances, both representing Mickey Mouse. The program puts the
first instance into a hash set and then checks whether the set contains the
second. The two Name instances are equal, so it might seem that the
program should print true. If you ran it, it almost certainly printed
false. What is wrong with the program?
The bug is that Name violates the hashCode
contract. This might seem strange, as Name doesn't even have a
hashCode method, but that is precisely the problem. The Name
class overrides the equals method, and the hashCode contract
demands that equal objects have equal hash codes. To fulfill this contract,
you must override hashCode whenever you
override equals [EJ Item 8].
Because it fails to override hashCode, the
Name class inherits its hashCode implementation from
Object. This implementation returns an identity-based hash code. In other words, distinct
objects are likely to have unequal hash values, even if they are equal.
Name does not fulfill the hashCode contract, so the behavior
of a hash set containing Name elements is unspecified.
When the program puts the first Name instance into the
hash set, the set puts an entry for this instance into a hash bucket. The set
chooses the hash bucket based on the hash value of the instance, as computed by
its hashCode method. When it checks whether the second Name
instance is contained in the hash set, the program chooses which bucket to
search based on the hash value of the second instance. Because the second
instance is distinct from the first, it is likely to have a different hash
value. If the two hash values map to different buckets, the contains
method will return false: The beloved rodent is in the hash set, but
the set can't find him.
Suppose that the two Name instances map to the same
bucket. Then what? All HashSet implementations that we know of have an
optimization in which each entry stores the hash value of its element in
addition to the element itself. When searching for an element, the
implementation selects the appropriate hash bucket and traverses its entries,
comparing the hash value stored in each entry with the hash value of the desired
element. Only if the two hash values are equal does the implementation check the
elements for equality. This optimization makes sense because it is usually much
cheaper to compare hash codes than elements.
Because of this optimization, it is not enough for the hash set
to search in the right bucket; the two Name instances must have equal
hash values in order for the hash set to recognize them as equal. The odds that
the program prints TRue are therefore the odds that two consecutively
created objects have the same identity hash code. A quick experiment showed
these odds to be about one in 25,000,000. Results may vary depending on which
Java implementation is used, but you are highly unlikely to see the program
print TRue on any JRE we know of.
To fix the problem, simply add an appropriate hashCode
method to the Name class. Although any method whose return value is
determined solely by the first and last name will satisfy the contract, a
high-quality hash function should attempt to return different hash values for
different names. The following method will do nicely [EJ Item 8]. Once this
method is added, the program will print true as expected:
public int hashCode() {
return 37 * first.hashCode() + last.hashCode();
}
In summary, always override hashCode when you override
equals. More generally, obey the general contract when you override a
method that has one. This is an issue for most of the non-final methods declared
in Object [EJ
Chapter 3]. Failure to follow this advice can result in arbitrary, unspecified
behavior.
No comments:
Post a Comment
Your comments are welcome!