A
data structure uses hashing with chaining to store
data as an array,
, of lists. An integer,
, keeps track of the
total number of items in all lists:
array<List> t; int n;The hash value of a data item
To add an element,
, to the hash table, we first check if the length
of
needs to be increased and, if so, we grow
. With
this out of the way we hash
to get an integer,
, in the range
and we append
to the list
:
bool add(T x) { if (find(x) != null) return false; if (n+1 > t.length) resize(); t[hash(x)].add(x); n++; return true; }Growing the table, if necessary, involves doubling the length of
Besides growing, the only other work done when adding
to a
involves appending
to the list
. For any of
the list implementations described in Chapters 2 or
3, this takes only constant time.
To remove an element
from the hash table we iterate over the list
until we find
so that we can remove it:
T remove(T x) { int j = hash(x); for (int i = 0; i < t[j].size(); i++) { T y = t[j].get(i); if (x == y) { t[j].remove(i); n--; return y; } } return null; }This takes
Searching for the element
in a hash table is similar. We perform
a linear search on the list
:
T find(T x) { int j = hash(x); for (int i = 0; i < t[j].size(); i++) if (x == t[j].get(i)) return t[j].get(i); return null; }Again, this takes time proportional to the length of the list
The performance of a hash table depends critically on the choice of the
hash function. A good hash function will spread the elements evenly
among the
lists, so that the expected size of the list
is
. On the other hand, a bad
hash function will hash all values (including
) to the same table
location, in which case the size of the list
will be
.
In the next section we describe a good hash function.
Multiplicative hashing is an efficient method of generating hash
values based on modular arithmetic (discussed in )
and integer division. It uses the operator, which calculates
the integral part of a quotient, while discarding the remainder.
Formally, for any integers
and
,
.
In multiplicative hashing, we use a hash table of size
for some
integer
(called the dimension). The formula for hashing an
integer
is
int hash(T x) { return ((unsigned)(z * hashCode(x))) >> (w-d); }
The following lemma, whose proof is deferred until later in this section, shows that multiplicative hashing does a good job of avoiding collisions:
With Lemma 5.1, the performance of
, and
are easy to analyze:
![]() |
![]() |
![]() |
|
![]() |
![]() |
||
![]() |
![]() |
||
![]() |
![]() |
||
![]() |
![]() |
||
![]() |
![]() |
Now, we want to prove Lemma 5.1, but first we need a
result from number theory. In the following proof, we use the notation
to denote
, where each
is a bit, either 0 or 1. In other words,
is
the integer whose binary representation is given by
.
We use
to denote a bit of unknown value.
Suppose, for the sake of contradiction, that there are two such values
and
, with
. Then
Furthermore since
and
. Since
is odd, it has no trailing 0's in its binary representation:
The utility of Lemma 5.3 comes from the following
observation: If
is chosen uniformly at random from
, then
is uniformly distributed over
. In the following proof, it helps
to think of the binary representation of
, which consists of
random bits followed by a 1.
Let be the unique odd integer such that
for some integer
. By
Lemma 5.3, the binary representation of
has
random bits, followed by a 1:
The following theorem summarizes the performance of the
data structure:
Furthermore, beginning with an empty
, any sequence of
and
operations results in a total of
time spent during all calls to
.
opendatastructures.org