5.1

A `ChainedHashTable` data structure uses hashing with chaining to store
data as an array,
, of lists. An integer,
, keeps track of the
total number of items in all lists (see Figure 5.1):

List<T>[] t; int n;The hash value of a data item , denoted is a value in the range . All items with hash value are stored in the list at . To ensure that lists don't get too long, we maintain the invariant

To add an element, , to the hash table, we first check if the length of needs to be increased and, if so, we grow . With this out of the way we hash to get an integer, , in the range , and we append to the list :

boolean add(T x) { if (find(x) != null) return false; if (n+1 > t.length) resize(); t[hash(x)].add(x); n++; return true; }Growing the table, if necessary, involves doubling the length of and reinserting all elements into the new table. This strategy is exactly the same as the one used in the implementation of

Besides growing, the only other work done when adding a new value
to a
`ChainedHashTable` involves appending
to the list
. For
any of the list implementations described in Chapters 2
or 3, this takes only constant time.

To remove an element, , from the hash table, we iterate over the list until we find so that we can remove it:

T remove(T x) { Iterator<T> it = t[hash(x)].iterator(); while (it.hasNext()) { T y = it.next(); if (y.equals(x)) { it.remove(); n--; return y; } } return null; }This takes time, where denotes the length of the list stored at .

Searching for the element in a hash table is similar. We perform a linear search on the list :

T find(Object x) { for (T y : t[hash(x)]) if (y.equals(x)) return y; return null; }Again, this takes time proportional to the length of the list .

The performance of a hash table depends critically on the choice of the hash function. A good hash function will spread the elements evenly among the lists, so that the expected size of the list is . On the other hand, a bad hash function will hash all values (including ) to the same table location, in which case the size of the list will be . In the next section we describe a good hash function.

5.1.1 Multiplicative Hashing

Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic (discussed in Section 2.3) and integer division. It uses the operator, which calculates the integral part of a quotient, while discarding the remainder. Formally, for any integers and , .

In multiplicative hashing, we use a hash table of size for some integer (called the dimension). The formula for hashing an integer is

int hash(Object x) { return (z * x.hashCode()) >>> (w-d); }

The following lemma, whose proof is deferred until later in this section, shows that multiplicative hashing does a good job of avoiding collisions:

With Lemma 5.1, the performance of , and are easy to analyze:

as required.

Now, we want to prove Lemma 5.1, but first we need a result from number theory. In the following proof, we use the notation to denote , where each is a bit, either 0 or 1. In other words, is the integer whose binary representation is given by . We use to denote a bit of unknown value.

Suppose, for the sake of contradiction, that there are two such values and , with . Then

for some integer . Thinking in terms of binary numbers, we have

Furthermore , since and . Since is odd, it has no trailing 0's in its binary representation:

The utility of Lemma 5.3 comes from the following observation: If is chosen uniformly at random from , then is uniformly distributed over . In the following proof, it helps to think of the binary representation of , which consists of random bits followed by a 1.

when or

when . Therefore, we only have to bound the probability that looks like (5.2) or (5.3).

Let be the unique odd integer such that for some integer . By Lemma 5.3, the binary representation of has random bits, followed by a 1:

The following theorem summarizes the performance of a `ChainedHashTable`
data structure:

*Furthermore, beginning with an empty ChainedHashTable, any sequence of
and
operations results in a total of
time spent during all calls to
.*

- ... integer.
^{5.1} - This is true for most programming languages including C, C#, C++, and Java. Notable exceptions are Python and Ruby, in which the result of a fixed-length -bit integer operation that overflows is upgraded to a variable-length representation.