5.1 ChainedHashTable: Hashing with Chaining

A ChainedHashTable data structure uses hashing with chaining to store
data as an array, t, of lists. An integer, n, keeps track of the
total number of items in all lists (see Figure 5.1):
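The listing that followed here is not reproduced in this extraction. A minimal Python sketch of the structure it describes (the names t, n, and d follow the text; the `_hash` placeholder is ours, since the real hash function is only developed in Section 5.1.1):

```python
class ChainedHashTable:
    """Hashing with chaining: an array t of lists plus an item count n."""

    def __init__(self):
        self.d = 1                                 # dimension: t has 2**d lists
        self.t = [[] for _ in range(2 ** self.d)]  # the array of (bucket) lists
        self.n = 0                                 # total items over all lists

    def _hash(self, x):
        # Placeholder only; Section 5.1.1 replaces this with
        # multiplicative hashing.
        return hash(x) % len(self.t)
```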

To add an element, x, to the hash table, we first check if
the length of t needs to be increased and, if so, we grow t.
With this out of the way we hash x to get an integer, i, in the
range {0, ..., t.length − 1}, and we append x to the list t[i]:
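The corresponding listing is also missing here. A self-contained sketch of the add step, with the table represented directly as a list of buckets (Python's built-in `hash` stands in for the hash function):

```python
def add(t, x):
    """Append x to the bucket it hashes to; t is a list of bucket lists."""
    i = hash(x) % len(t)     # i is an integer in the range 0 .. len(t) - 1
    t[i].append(x)           # appending takes constant time

t = [[] for _ in range(4)]   # a table with four empty buckets
add(t, "apple")
```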

Growing the table,
if necessary, involves doubling the length of t and reinserting
all elements into the new table. This strategy is exactly the same
as the one used in the implementation of ArrayStack and the same
result applies: the cost of growing is only constant when amortized
over a sequence of insertions (see Lemma 2.1).
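The doubling step can be sketched as follows (our illustration, not the book's listing; every element must be rehashed because the bucket index depends on the table length):

```python
def grow(t):
    """Return a table with twice as many buckets, with all elements rehashed."""
    t2 = [[] for _ in range(2 * len(t))]
    for bucket in t:
        for x in bucket:
            t2[hash(x) % len(t2)].append(x)   # reinsert under the new length
    return t2

t = [[] for _ in range(2)]
for v in range(10):
    t[hash(v) % len(t)].append(v)
t = grow(t)
```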

Besides growing, the only other work done when adding a new value x to a ChainedHashTable involves appending x to the list t[hash(x)]. For any of the list implementations described in Chapters 2 or 3, this takes only constant time.

To remove an element, x, from the hash table, we iterate over the list
t[hash(x)] until we find x so that we can remove it:
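A sketch of the removal step under the same bucket-list representation (again our illustration, not the book's listing):

```python
def remove(t, x):
    """Scan the bucket that x hashes to and remove x if present."""
    bucket = t[hash(x) % len(t)]
    for i, y in enumerate(bucket):
        if y == x:
            return bucket.pop(i)   # found: time proportional to bucket length
    return None                    # x was not in the table

t = [[] for _ in range(4)]
t[hash(7) % len(t)].append(7)
```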

This takes O(n_{hash(x)}) time, where n_i denotes the length
of the list stored at t[i].

Searching for the element x in a hash table is similar. We perform
a linear search on the list t[hash(x)]:

Again, this takes time proportional to the length of the list
t[hash(x)].

The performance of a hash table depends critically on the choice of the hash function. A good hash function will spread the elements evenly among the t.length lists, so that the expected size of the list t[hash(x)] is O(n / t.length) = O(1). On the other hand, a bad hash function will hash all values (including x) to the same table location, in which case the size of the list t[hash(x)] will be n. In the next section we describe a good hash function.

5.1.1 Multiplicative Hashing

Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic (discussed in Section 2.3) and integer division. It uses the div operator, which calculates the integral part of a quotient, while discarding the remainder. Formally, for any integers a ≥ 0 and b ≥ 1, a div b = ⌊a/b⌋.
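As a quick concrete check of these two operators (in Python, `//` is exactly div for non-negative operands):

```python
# a div b discards the remainder: 17 div 5 = 3, and 17 mod 5 = 2.
a, b = 17, 5
q = a // b          # the div operator: integral part of the quotient
r = a % b           # the mod operator: the discarded remainder
assert a == b * q + r   # div and mod together recover a
```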

In multiplicative hashing, we use a hash table of size 2^d for some integer d (called the dimension). The formula for hashing an integer x ∈ {0, ..., 2^w − 1} is

hash(x) = ((z · x) mod 2^w) div 2^{w−d}.

Here, z is a randomly chosen odd integer in {1, ..., 2^w − 1}.
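This formula translates directly into code. A sketch (the choice w = 32, d = 8 is ours for illustration; the right shift by w − d implements the div by 2^{w−d}):

```python
import random

w, d = 32, 8                            # word size w and dimension d
z = random.randrange(1, 2 ** w, 2)      # random odd z in {1, ..., 2^w - 1}

def mult_hash(x):
    """hash(x) = ((z*x) mod 2^w) div 2^(w-d): the top d bits of z*x mod 2^w."""
    return ((z * x) % (2 ** w)) >> (w - d)
```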
The following lemma, whose proof is deferred until later in this section, shows that multiplicative hashing does a good job of avoiding collisions:

Lemma 5.1. Let x and y be any two values in {0, ..., 2^w − 1} with x ≠ y. Then Pr{hash(x) = hash(y)} ≤ 2/2^d.

With Lemma 5.1, the performance of remove(x) and find(x) is easy to analyze:

Lemma 5.2. For any data value x, the expected length of the list t[hash(x)] is at most n_x + 2, where n_x is the number of occurrences of x in the hash table.

Now, we want to prove Lemma 5.1, but first we need a result from number theory. In the following proof, we use the notation (b_{w−1}, ..., b_0)_2 to denote Σ_{i=0}^{w−1} b_i 2^i, where each b_i is a bit, either 0 or 1. In other words, (b_{w−1}, ..., b_0)_2 is the integer whose binary representation is given by b_{w−1} ... b_0. We use ⋆ to denote a bit of unknown value.

Lemma 5.3. Let S be the set of odd integers in {1, ..., 2^w − 1}; let q and i be any two elements of S. Then there is exactly one value z ∈ S such that zq mod 2^w = i.

Proof. Since there are only a finite number of choices of z and q, it suffices to prove that there is at most one value z that satisfies zq mod 2^w = i.

Suppose, for the sake of contradiction, that there are two such values z and z′, with z > z′. Then

zq mod 2^w = z′q mod 2^w = i,

so

(z − z′)q mod 2^w = 0.

But this means that

(z − z′)q = k 2^w    (5.1)

for some integer k. Thinking in terms of binary numbers, we have

(z − z′)q = k · (1, 0, ..., 0)_2,

so that the w trailing bits in the binary representation of (z − z′)q are all 0's.

Furthermore k ≠ 0, since q ≠ 0 and z − z′ ≠ 0. Since q is odd, it has no trailing 0's in its binary representation:

q = (⋆, ..., ⋆, 1)_2.

Since |z − z′| < 2^w, z − z′ has fewer than w trailing 0's in its binary representation:

z − z′ = (⋆, ..., ⋆, 1, 0, ..., 0)_2.

Therefore, the product (z − z′)q has fewer than w trailing 0's in its binary representation, so it cannot satisfy (5.1), yielding a contradiction and completing the proof.

The utility of Lemma 5.3 comes from the following observation: if z is chosen uniformly at random from S, then zq mod 2^w is uniformly distributed over S. In the following proof, it helps to think of the binary representation of z, which consists of w − 1 random bits followed by a 1.
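The observation can be verified by brute force for a tiny word size (our illustration; w = 8 keeps the check fast). For every odd q, the map z → zq mod 2^w permutes the odd residues S, so a uniform z yields a uniform zq mod 2^w:

```python
# Brute-force check of Lemma 5.3 for w = 8: for each odd q, multiplication
# by q (mod 2^w) is a bijection from the odd residues S onto S.
w = 8
S = set(range(1, 2 ** w, 2))      # the odd integers in {1, ..., 2^w - 1}
for q in S:
    assert {(z * q) % (2 ** w) for z in S} == S
```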

Proof of Lemma 5.1. Observe that hash(x) = hash(y) if and only if the highest-order d bits of zx mod 2^w and the highest-order d bits of zy mod 2^w are the same. A necessary condition for that to happen is that either

z(x − y) mod 2^w = (0, ..., 0, ⋆, ..., ⋆)_2    (5.2)

when zx mod 2^w ≥ zy mod 2^w, or

z(x − y) mod 2^w = (1, ..., 1, ⋆, ..., ⋆)_2    (5.3)

when zy mod 2^w > zx mod 2^w (in each case, the first d bits are shown). Therefore, we only have to bound the probability that z(x − y) mod 2^w looks like (5.2) or (5.3).

Let q be the unique odd integer such that (x − y) mod 2^w = q 2^r for some integer r ≥ 0. By Lemma 5.3, the binary representation of zq mod 2^w consists of w − 1 random bits, followed by a 1:

zq mod 2^w = (b_{w−1}, ..., b_1, 1)_2.

Therefore, the binary representation of z(x − y) mod 2^w = zq 2^r mod 2^w consists of w − r − 1 random bits, followed by a 1, followed by r 0's:

z(x − y) mod 2^w = (b_{w−r−1}, ..., b_1, 1, 0, ..., 0)_2.

We can now finish the proof. If r > w − d, then the d highest-order bits of z(x − y) mod 2^w contain both 0's and 1's, so the probability that it looks like (5.2) or (5.3) is 0. If r = w − d, then the probability of looking like (5.2) is 0, but the probability of looking like (5.3) is 1/2^{d−1} = 2/2^d (since we must have b_1, ..., b_{d−1} all equal to 1). If r < w − d, then we must have b_{w−r−1}, ..., b_{w−r−d} = 0, ..., 0 or b_{w−r−1}, ..., b_{w−r−d} = 1, ..., 1. The probability of each of these cases is 1/2^d, and they are mutually exclusive, so the probability of either case is 2/2^d. This completes the proof.

The following theorem summarizes the performance of a ChainedHashTable data structure:

*Theorem 5.1. A ChainedHashTable implements the USet interface. Ignoring the cost of calls to grow(), a ChainedHashTable supports the operations add(x), remove(x), and find(x) in O(1) expected time per operation.*

*Furthermore, beginning with an empty ChainedHashTable, any sequence of
m add(x) and remove(x) operations results in a total of O(m)
time spent during all calls to grow().*

^{5.1} - This is true for most programming languages including C, C#, C++, and Java. Notable exceptions are Python and Ruby, in which the result of a fixed-length w-bit integer operation that overflows is upgraded to a variable-length representation.