5.1 ChainedHashTable: Hashing with Chaining

A ChainedHashTable data structure uses hashing with chaining to store data as an array, $\ensuremath{\ensuremath{\mathit{t}}}$ , of lists. An integer, $\ensuremath{\ensuremath{\mathit{n}}}$ , keeps track of the total number of items in all lists (see Figure 5.1):
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{initialize}(... ...th{\ensuremath{\mathit{n}} \gets \ensuremath{0}}\\ \end{flushleft}\end{leftbar}$

**Figure 5.1:** An example of a ChainedHashTable with $\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}=14$ and $\ensuremath{\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})}}=16$ . In this example $\ensuremath{\ensuremath{\mathrm{hash}(\ensuremath{\mathit{x}})}}=6$
$\includegraphics[width=\textwidth ]{figs-python/chainedhashtable}$

To add an element, $\ensuremath{\ensuremath{\mathit{x}}}$ , to the hash table, we first check if the length of $\ensuremath{\ensuremath{\mathit{t}}}$ needs to be increased and, if so, we grow $\ensuremath{\ensuremath{\mathit{t}}}$ . With this out of the way we hash $\ensuremath{\ensuremath{\mathit{x}}}$ to get an integer, $\ensuremath{\ensuremath{\mathit{i}}}$ , in the range $\{0,\ldots,\ensuremath{\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})}}-1\}$ , and we append $\ensuremath{\ensuremath{\mathit{x}}}$ to the list $\ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$ :
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{add}(\ensure... ...return}} \ensuremath{\ensuremath{\mathit{true}}}\\ \end{flushleft}\end{leftbar}$
Growing the table, if necessary, involves doubling the length of $\ensuremath{\ensuremath{\mathit{t}}}$ and reinserting all elements into the new table. This strategy is exactly the same as the one used in the implementation of ArrayStack and the same result applies: The cost of growing is only constant when amortized over a sequence of insertions (see Lemma 2.1 on page

Besides growing, the only other work done when adding a new value $\ensuremath{\ensuremath{\mathit{x}}}$ to a ChainedHashTable involves appending $\ensuremath{\ensuremath{\mathit{x}}}$ to the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ . For any of the list implementations described in Chapters 2 or 3, this takes only constant time.

To remove an element, $\ensuremath{\ensuremath{\mathit{x}}}$ , from the hash table, we iterate over the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ until we find $\ensuremath{\ensuremath{\mathit{x}}}$ so that we can remove it:
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{remove}(\ens... ...\color{black} \textbf{return}} \ensuremath{nil} \\ \end{flushleft}\end{leftbar}$
This takes $O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\ensuremath{\mathrm{hash}(\ensuremath{\mathit{x}})}}})$ time, where $\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\ensuremath{\ensuremath{\mathit{i}}}}}$ denotes the length of the list stored at $\ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$ .

Searching for the element $\ensuremath{\ensuremath{\mathit{x}}}$ in a hash table is similar. We perform a linear search on the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ :
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{find}(\ensur... ...{return}} \ensuremath{\ensuremath{\mathit{nil}}}\\ \end{flushleft}\end{leftbar}$
Again, this takes time proportional to the length of the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ .

The performance of a hash table depends critically on the choice of the hash function. A good hash function will spread the elements evenly among the $\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})}$ lists, so that the expected size of the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ is $O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}/\ensuremath{\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})})} = O(1)$ . On the other hand, a bad hash function will hash all values (including $\ensuremath{\ensuremath{\mathit{x}}}$ ) to the same table location, in which case the size of the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ will be $\ensuremath{\ensuremath{\mathit{n}}}$ . In the next section we describe a good hash function.

5.1.1 Multiplicative Hashing

Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic (discussed in Section 2.3) and integer division. It uses the $\ddiv$ operator, which calculates the integral part of a quotient, while discarding the remainder. Formally, for any integers $a\ge 0$ and $b\ge 1$ , $a\ddiv b = \lfloor a/b\rfloor$ .

In multiplicative hashing, we use a hash table of size $2^{\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}$ for some integer $\ensuremath{\ensuremath{\mathit{d}}}$ (called the dimension). The formula for hashing an integer $\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}\in\{0,\ldots,2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}-1\}$ is

The following lemma, whose proof is deferred until later in this section, shows that multiplicative hashing does a good job of avoiding collisions:

**Figure 5.2:** The operation of the multiplicative hash function with $\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}=32$ and $\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}=8$ .
$\begin{figure}\begin{center} \resizebox{.98\textwidth}{!}{ \setlength{\arrayru... ... \end{tabular}} \setlength{\arrayrulewidth}{.4pt} \end{center} \end{figure}$

With Lemma 5.1, the performance of $\ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ , and $\ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ are easy to analyze:

Proof. Let

be the (multi-)set of elements stored in the hash table that are not equal to $\ensuremath{\ensuremath{\mathit{x}}}$ . For an element $\ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}\in S$ , define the indicator variable

$\displaystyle I_{\ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}} = \left\{\b... ...h}(\ensuremath{\mathit{y}})}}$} \\ 0 & \mbox{otherwise} \end{array}\right.$

and notice that, by Lemma 5.1, $\mathrm{E}[I_{\ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}}] \le 2/2^{\e... ...hit{d}}}}}=2/\ensuremath{\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})}}$ . The expected length of the list $\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})]}$ is given by

$\displaystyle \mathrm{E}\left[\ensuremath{\ensuremath{\ensuremath{\mathit{t}}[\mathrm{hash}(\ensuremath{\mathit{x}})].\mathrm{size}()}}\right]$	$\displaystyle =$	$\displaystyle \mathrm{E}\left[\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}... ...mathit{y}}}}\in S} I_{\ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}}\right]$
	$\displaystyle =$	$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\e... ...{y}}}}\in S} \mathrm{E}[I_{\ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}} ]$
	$\displaystyle \le$	$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\e... ...}}}}\in S} 2/\ensuremath{\ensuremath{\mathrm{length}(\ensuremath{\mathit{t}})}}$
	$\displaystyle \le$	$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\e... ...uremath{\mathit{y}}}}\in S} 2/\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$
	$\displaystyle \le$	$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\e... ...{\ensuremath{\mathit{x}}}}})2/\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$
	$\displaystyle \le$	$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}_{\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}} + 2 \enspace ,$

as required. $\qedsymbol$

Now, we want to prove Lemma 5.1, but first we need a result from number theory. In the following proof, we use the notation $(b_r,\ldots,b_0)_2$ to denote $\sum_{i=0}^r b_i2^i$ , where each

is a bit, either 0 or 1. In other words, $(b_r,\ldots,b_0)_2$ is the integer whose binary representation is given by $b_r,\ldots,b_0$ . We use $\star$ to denote a bit of unknown value.

Proof. Since the number of choices for $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}$ and

is the same, it is sufficient to prove that there is at most one value $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}\in S$ that satisfies $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}q\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}} = i$ .

Suppose, for the sake of contradiction, that there are two such values $\ensuremath{\ensuremath{\mathit{z}}}$ and $\ensuremath{z}'$ , with $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}>\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}'$ . Then

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}q\bmod 2^{\ensur... ...athit{z}}}}'q \bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}} = i$

$\displaystyle (\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\e... ...thit{z}}}}')q\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}} = 0$

But this means that

$\displaystyle (\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\e... ...math{\mathit{z}}}}')q = k 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$

(5.1)

for some integer

. Thinking in terms of binary numbers, we have

$\displaystyle (\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\e... ...0,\ldots,0}_{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}})_2 \enspace ,$

so that the $\ensuremath{\ensuremath{\mathit{w}}}$ trailing bits in the binary representation of $(\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}')q$ are all 0's.

Furthermore $k\neq 0$ , since $q\neq 0$ and $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}'\neq 0$ . Since is odd, it has no trailing 0's in its binary representation:

$\displaystyle q = (\star,\ldots,\star,1)_2 \enspace .$

Since $\vert\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath... ...ath{\mathit{z}}}}'\vert < 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ , $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}'$ has fewer than $\ensuremath{\ensuremath{\mathit{w}}}$ trailing 0's in its binary representation:

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\en... ...\ldots,0}_{<\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}})_2 \enspace .$

Therefore, the product $(\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}')q$ has fewer than $\ensuremath{\ensuremath{\mathit{w}}}$ trailing 0's in its binary representation:

$\displaystyle (\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\e... ...ldots,0}_{<\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}})_2 \enspace .$

Therefore $(\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}')q$ cannot satisfy (5.1), yielding a contradiction and completing the proof. $\qedsymbol$

The utility of Lemma 5.3 comes from the following observation: If $\ensuremath{\ensuremath{\mathit{z}}}$ is chosen uniformly at random from

, then $\ensuremath{\ensuremath{\mathit{zt}}}$ is uniformly distributed over

. In the following proof, it helps to think of the binary representation of $\ensuremath{\ensuremath{\mathit{z}}}$ , which consists of $\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-1$ random bits followed by a 1.

Proof. [Proof of Lemma 5.1] First we note that the condition $\ensuremath{\ensuremath{\mathrm{hash}(\ensuremath{\mathit{x}})}}=\ensuremath{\ensuremath{\mathrm{hash}(\ensuremath{\mathit{y}})}}$ is equivalent to the statement ``the highest-order $\ensuremath{\ensuremath{\mathit{d}}}$ bits of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}} \ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}\bmod2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ and the highest-order $\ensuremath{\ensuremath{\mathit{d}}}$ bits of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}} \ensuremath{\ensuremath{\ensuremath{\mathit{y}}}}\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ are the same.'' A necessary condition of that statement is that the highest-order $\ensuremath{\ensuremath{\mathit{d}}}$ bits in the binary representation of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\ensuremath{\ens... ...emath{\mathit{y}}}})\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ are either all 0's or all 1's. That is,

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\en... ...\ensuremath{\mathit{w}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}})_2$

(5.2)

when $\ensuremath{\ensuremath{\ensuremath{\mathit{zx}}}}\bmod 2^{\ensuremath{\ensure... ...emath{\mathit{zy}}}}\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ or

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\en... ...{\mathit{w}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}})_2 \enspace .$

(5.3)

when $\ensuremath{\ensuremath{\ensuremath{\mathit{zx}}}}\bmod 2^{\ensuremath{\ensure... ...emath{\mathit{zy}}}}\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ . Therefore, we only have to bound the probability that $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\ensuremath{\ens... ...emath{\mathit{y}}}})\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ looks like (5.2) or (5.3).

Let be the unique odd integer such that $(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}-\ensuremath{\ensuremath{\en... ...\mathit{y}}}})\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}=q2^r$ for some integer $r\ge 0$ . By Lemma 5.3, the binary representation of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}q\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ has $\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-1$ random bits, followed by a 1:

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}q\bmod 2^{\ensur... ...}-1},\ldots,b_{1}}_{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-1},1)_2$

Therefore, the binary representation of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\ensuremath{\ens... ...th{\mathit{z}}}}q2^r\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ has $\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-r-1$ random bits, followed by a 1, followed by

0's:

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\en... ...\ensuremath{\ensuremath{\mathit{w}}}}-r-1},1,\underbrace{0,0,\ldots,0}_{r})_2$

We can now finish the proof: If $r > \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}$ , then the $\ensuremath{\ensuremath{\mathit{d}}}$ higher order bits of $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\ensuremath{\ens... ...emath{\mathit{y}}}})\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ contain both 0's and 1's, so the probability that $\ensuremath{\ensuremath{\ensuremath{\mathit{z}}}}(\ensuremath{\ensuremath{\ens... ...emath{\mathit{y}}}})\bmod 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$ looks like (5.2) or (5.3) is 0. If $\ensuremath{\ensuremath{\ensuremath{\mathit{r}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}$ , then the probability of looking like (5.2) is 0, but the probability of looking like (5.3) is $1/2^{\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}-1}=2/2^{\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}$ (since we must have $b_1,\ldots,b_{d-1}=1,\ldots,1$ ). If $r < \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}$ , then we must have $b_{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-r-1},\ldots,b_{\ensuremat... ...h{\mathit{w}}}}-r-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}=0,\ldots,0$ or $b_{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}-r-1},\ldots,b_{\ensuremat... ...h{\mathit{w}}}}-r-\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}=1,\ldots,1$ . The probability of each of these cases is $1/2^{\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}$ and they are mutually exclusive, so the probability of either of these cases is $2/2^{\ensuremath{\ensuremath{\ensuremath{\mathit{d}}}}}$ . This completes the proof. $\qedsymbol$

5.1.2 Summary

The following theorem summarizes the performance of a ChainedHashTable data structure:

5.1 ChainedHashTable: Hashing with Chaining

5.1.1 Multiplicative Hashing

5.1.2 Summary

Footnotes