A ChainedHashTable data structure uses hashing with chaining to store
data as an array, t, of lists. An integer, n, keeps track of the
total number of items in all lists:

array<List> t;
int n;

The hash value of a data item x, denoted hash(x), is an integer in the
range {0,...,t.length-1}; all items whose hash value is i are stored in
the list t[i].
To add an element, x, to the hash table, we first check if the length
of t needs to be increased and, if so, we grow t. With this out of the
way, we hash x to get an integer, i, in the range {0,...,t.length-1},
and we append x to the list t[i]:
bool add(T x) {
    if (find(x) != null) return false;
    if (n+1 > t.length) resize();
    t[hash(x)].add(x);
    n++;
    return true;
}
Growing the table, if necessary, involves doubling the length of t and
reinserting all of the elements into the new table; amortized over a
sequence of insertions, this costs only constant time per insertion.
Besides growing, the only other work done when adding a new value x to
a ChainedHashTable involves appending x to the list t[hash(x)]. For any
of the list implementations described in Chapters 2 or 3, this takes
only constant time.
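The resize() routine itself does not appear in this excerpt. The
following is a rough sketch of the grow step just described, assuming
std::vector and std::list as stand-ins for the array and list types;
hashInto and resizeSketch are illustrative names, not the book's code,
and the modulo hash is a placeholder for the multiplicative hash
described later in this section.

#include <cstdio>
#include <list>
#include <vector>

// Illustrative sketch of the grow step: double the number of lists and
// reinsert every stored element into the larger table. hashInto is a
// stand-in that reduces the key modulo the current table size.
static std::size_t hashInto(unsigned x, std::size_t slots) { return x % slots; }

static void resizeSketch(std::vector<std::list<unsigned>> &t) {
    std::vector<std::list<unsigned>> nt(2 * t.size());   // double the length
    for (auto &lst : t)
        for (unsigned x : lst)                           // rehash everything
            nt[hashInto(x, nt.size())].push_back(x);
    t.swap(nt);
}

int main() {
    std::vector<std::list<unsigned>> t(2);
    for (unsigned x = 0; x < 10; x++) t[hashInto(x, t.size())].push_back(x);
    resizeSketch(t);
    std::printf("table now has %zu lists\n", t.size());
    return 0;
}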
To remove an element, x, from the hash table, we iterate over the list
t[hash(x)] until we find x so that we can remove it:
T remove(T x) {
    int j = hash(x);
    for (int i = 0; i < t[j].size(); i++) {
        T y = t[j].get(i);
        if (x == y) {
            t[j].remove(i);
            n--;
            return y;
        }
    }
    return null;
}
This takes time proportional to the length of the list t[hash(x)].
Searching for the element x in a hash table is similar. We perform
a linear search on the list t[hash(x)]:
T find(T x) {
    int j = hash(x);
    for (int i = 0; i < t[j].size(); i++)
        if (x == t[j].get(i))
            return t[j].get(i);
    return null;
}
Again, this takes time proportional to the length of the list t[hash(x)].
The performance of a hash table depends critically on the choice of the
hash function. A good hash function will spread the elements evenly
among the t.length lists, so that the expected size of the list
t[hash(x)] is O(n/t.length) = O(1), since add(x) grows the table
whenever n+1 would exceed t.length. On the other hand, a bad hash
function will hash all values (including x) to the same table location,
in which case the size of the list t[hash(x)] will be n.
In the next section we describe a good hash function.
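To see the effect of the hash function concretely, here is a small
experiment (illustrative only, not from the book): it inserts the same
random keys into two tables, one with a deliberately bad hash function
that sends every key to slot 0 and one with a simple modulo hash, and
reports the longest list in each.

#include <algorithm>
#include <cstdio>
#include <list>
#include <random>
#include <vector>

// Illustrative experiment: compare the longest chain produced by a bad
// hash function (everything maps to slot 0) with a modulo hash on
// random keys.
int main() {
    const std::size_t slots = 1024, m = 1024;
    std::vector<std::list<unsigned>> bad(slots), good(slots);
    std::mt19937 rng(1);
    for (std::size_t i = 0; i < m; i++) {
        unsigned x = rng();
        bad[0].push_back(x);            // bad hash: every key collides
        good[x % slots].push_back(x);   // reasonable spread for random keys
    }
    std::size_t worstBad = 0, worstGood = 0;
    for (std::size_t j = 0; j < slots; j++) {
        worstBad = std::max(worstBad, bad[j].size());
        worstGood = std::max(worstGood, good[j].size());
    }
    std::printf("longest list: bad hash = %zu, good hash = %zu\n",
                worstBad, worstGood);
    return 0;
}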
Multiplicative hashing is an efficient method of generating hash
values based on modular arithmetic and integer division. It uses the
div operator, which calculates the integral part of a quotient while
discarding the remainder. Formally, for any integers a >= 0 and b >= 1,
a div b = floor(a/b); for example, 7 div 3 = 2.

In multiplicative hashing, we use a hash table of size 2^d for some
integer d (called the dimension). The formula for hashing a w-bit
integer x is

    hash(x) = ((z·x) mod 2^w) div 2^(w-d),

where z is a randomly chosen odd w-bit integer. Since arithmetic on
w-bit unsigned integers is already performed modulo 2^w, and dividing
by 2^(w-d) is the same as shifting right by w-d bits, the hash value
can be computed with one multiplication and one shift (hashCode(x)
first converts a general data item x into a w-bit integer):
int hash(T x) {
    return ((unsigned)(z * hashCode(x))) >> (w-d);
}
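A quick way to convince oneself of that equivalence is to compare the
two forms directly. The following sketch (the choices w = 32, d = 8,
and the random inputs are arbitrary assumptions, not from the book)
checks that ((z·x) mod 2^w) div 2^(w-d) and the shift-based hash agree:

#include <cstdint>
#include <cstdio>
#include <random>

// Illustrative check: for w = 32, ((z*x) mod 2^w) div 2^(w-d) equals the
// shift-based hash (z*x) >> (w-d), because unsigned 32-bit multiplication
// already wraps modulo 2^w and dividing by a power of two drops low bits.
int main() {
    const int w = 32, d = 8;
    std::mt19937 rng(42);
    std::uint32_t z = rng() | 1;                 // random odd multiplier
    for (int i = 0; i < 5; i++) {
        std::uint32_t x = rng();
        std::uint32_t prod = z * x;              // == (z*x) mod 2^w
        std::uint32_t byDiv = prod / (std::uint32_t(1) << (w - d));
        std::uint32_t byShift = prod >> (w - d);
        std::printf("x=%u  hash=%u  div==shift: %s\n",
                    (unsigned)x, (unsigned)byShift,
                    byDiv == byShift ? "yes" : "no");
    }
    return 0;
}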
The following lemma, whose proof is deferred until later in this
section, shows that multiplicative hashing does a good job of avoiding
collisions:

Lemma 5.1: Let x and y be any two values in {0,...,2^w - 1} with
x != y. Then Pr{hash(x) = hash(y)} <= 2/2^d.
With Lemma 5.1, the performance of remove(x) and find(x) is easy to
analyze: for any data value x, the expected length of the list
t[hash(x)] is at most n_x + 2, where n_x is the number of occurrences
of x in the hash table (each of the at most n elements different from x
collides with x with probability at most 2/2^d by Lemma 5.1, and
n <= t.length = 2^d), so both operations run in constant expected time.
Now, we want to prove Lemma 5.1, but first we need a result from number
theory. In the following proof, we use the notation (b_r,...,b_0)_2 to
denote the integer b_r·2^r + ... + b_1·2 + b_0, where each b_i is a
bit, either 0 or 1. In other words, (b_r,...,b_0)_2 is the integer
whose binary representation is given by b_r,...,b_0. We use ⋆ to denote
a bit of unknown value.

Lemma 5.3: Let S be the set of odd integers in {1,...,2^w - 1}, and let
q and i be any two elements of S. Then there is exactly one z in S such
that z·q mod 2^w = i.
Proof: Since the number of choices for z is the same as the number of
choices for i, it is enough to show that there is at most one z in S
with z·q mod 2^w = i. Suppose, for the sake of contradiction, that
there are two such values z and z', with z > z'. Then

    z·q mod 2^w = z'·q mod 2^w = i ,

so (z-z')·q mod 2^w = 0; that is, (z-z')·q = k·2^w for some integer k,
and the binary representation of (z-z')·q ends in at least w 0's:

    (z-z')·q = (⋆,...,⋆,0,...,0)_2   (w trailing 0's).

Furthermore z-z' != 0, since z != z'. Since q is odd, it has no
trailing 0's in its binary representation:

    q = (⋆,...,⋆,1)_2 ,

and since 0 < z-z' < 2^w, z-z' has fewer than w trailing 0's:

    z-z' = (⋆,...,⋆,1,0,...,0)_2   (fewer than w trailing 0's).

Therefore the product (z-z')·q also has fewer than w trailing 0's in
its binary representation, so it cannot be a multiple of 2^w, yielding
a contradiction and completing the proof.
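Lemma 5.3 is easy to check exhaustively for a small word size. The
following sketch (w = 8 is an arbitrary choice, not from the book)
confirms that, for every pair of odd values q and i, exactly one odd z
satisfies z·q mod 2^w = i:

#include <cstdio>

// Exhaustive check of Lemma 5.3 for w = 8 (illustrative only):
// for all odd q and odd i in {1,...,2^w - 1}, exactly one odd z
// satisfies z*q mod 2^w == i.
int main() {
    const int w = 8, M = 1 << w;
    bool ok = true;
    for (int q = 1; q < M; q += 2)
        for (int i = 1; i < M; i += 2) {
            int count = 0;
            for (int z = 1; z < M; z += 2)
                if ((z * q) % M == i) count++;
            if (count != 1) ok = false;
        }
    std::printf("Lemma 5.3 holds for w=8: %s\n", ok ? "yes" : "no");
    return 0;
}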
The utility of Lemma 5.3 comes from the following observation: if z is
chosen uniformly at random from S, then for any fixed q in S, the
product z·q mod 2^w is uniformly distributed over S. In the following
proof, it helps to think of the binary representation of z, which
consists of w-1 random bits followed by a 1.
Proof of Lemma 5.1: The condition hash(x) = hash(y) says exactly that
the highest-order d bits of z·x mod 2^w and of z·y mod 2^w agree. A
necessary condition for this is that the highest-order d bits of
z·(x-y) mod 2^w are either all 0's (when z·x mod 2^w > z·y mod 2^w) or
all 1's (when z·x mod 2^w < z·y mod 2^w), so it suffices to bound the
probability of these two patterns.

Let q be the unique odd integer such that (x-y) mod 2^w = q·2^r for
some integer r >= 0. By Lemma 5.3, the binary representation of
z·q mod 2^w has w-1 random bits, followed by a 1; therefore the binary
representation of z·(x-y) mod 2^w = z·q·2^r mod 2^w has w-r-1 random
bits, followed by a 1, followed by r 0's:

    z·(x-y) mod 2^w = (b_{w-r-1},...,b_1,1,0,...,0)_2   (r trailing 0's).

If r > w-d, then the highest-order d bits contain both 0's and 1's, so
the probability of either pattern is 0. If r = w-d, then the
probability of the all-0's pattern is 0, but the probability of the
all-1's pattern (which requires b_1,...,b_{d-1} to all equal 1) is
1/2^(d-1) = 2/2^d. If r < w-d, then the highest-order d bits are all
random, so each pattern occurs with probability 1/2^d; the two patterns
are mutually exclusive, so the probability of one or the other is
2/2^d. In every case, Pr{hash(x) = hash(y)} <= 2/2^d.
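As an empirical sanity check of this bound (illustrative only; the
particular pair x, y and the parameters w = 32 and d = 8 are arbitrary
choices, not from the book), the following program estimates
Pr{hash(x) = hash(y)} by sampling random odd multipliers z and compares
the estimate with 2/2^d:

#include <cstdint>
#include <cstdio>
#include <random>

// Illustrative only: estimate the collision probability of multiplicative
// hashing for one fixed pair x != y over random odd multipliers z, and
// compare it with the 2/2^d bound of Lemma 5.1.
int main() {
    const int w = 32, d = 8;
    const std::uint32_t x = 123456789u, y = 987654321u;
    const int trials = 1000000;
    std::mt19937 rng(7);
    int collisions = 0;
    for (int i = 0; i < trials; i++) {
        std::uint32_t z = rng() | 1;             // random odd z
        if (((z * x) >> (w - d)) == ((z * y) >> (w - d))) collisions++;
    }
    std::printf("estimated Pr = %.6f, bound 2/2^d = %.6f\n",
                (double)collisions / trials, 2.0 / (1 << d));
    return 0;
}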
The following theorem summarizes the performance of the ChainedHashTable
data structure:

Theorem: A ChainedHashTable implements the USet interface. Ignoring the
cost of calls to resize(), a ChainedHashTable supports the operations
add(x), remove(x), and find(x) in O(1) expected time per operation.
Furthermore, beginning with an empty ChainedHashTable, any sequence of
m add(x) and remove(x) operations results in a total of O(m) time spent
during all calls to resize().
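Putting the pieces together, here is a minimal self-contained sketch of
hashing with chaining combined with multiplicative hashing for 32-bit
unsigned keys. The class name ChainedHashTableSketch and its use of
std::vector and std::list are assumptions for illustration; the book's
ChainedHashTable is generic and built on its own array and list types.

#include <cstdint>
#include <cstdio>
#include <list>
#include <random>
#include <vector>

// A minimal sketch of hashing with chaining plus multiplicative hashing
// for 32-bit unsigned keys. Illustrative only.
class ChainedHashTableSketch {
    std::vector<std::list<std::uint32_t>> t;  // table of chains
    int d;                                    // dimension: t.size() == 2^d
    int n = 0;                                // number of stored items
    std::uint32_t z;                          // random odd multiplier
    static constexpr int w = 32;

    int hash(std::uint32_t x) const { return (z * x) >> (w - d); }

    void resize() {                           // double the table, rehash all
        d++;
        std::vector<std::list<std::uint32_t>> nt(std::size_t(1) << d);
        for (auto &lst : t)
            for (std::uint32_t x : lst)
                nt[hash(x)].push_back(x);     // hash() already uses the new d
        t.swap(nt);
    }

public:
    ChainedHashTableSketch() : t(2), d(1) {
        std::mt19937 rng(std::random_device{}());
        z = rng() | 1;                        // choose a random odd z
    }

    bool find(std::uint32_t x) const {
        for (std::uint32_t y : t[hash(x)]) if (y == x) return true;
        return false;
    }

    bool add(std::uint32_t x) {
        if (find(x)) return false;            // keep elements distinct
        if (n + 1 > (int)t.size()) resize();  // maintain n <= t.size()
        t[hash(x)].push_back(x);
        n++;
        return true;
    }

    bool remove(std::uint32_t x) {
        auto &lst = t[hash(x)];
        for (auto it = lst.begin(); it != lst.end(); ++it)
            if (*it == x) { lst.erase(it); n--; return true; }
        return false;
    }
};

int main() {
    ChainedHashTableSketch s;
    for (std::uint32_t x = 1; x <= 100; x++) s.add(x * 2654435761u);
    std::printf("find: %d\n", (int)s.find(3u * 2654435761u));
    std::printf("remove: %d, find after remove: %d\n",
                (int)s.remove(3u * 2654435761u), (int)s.find(3u * 2654435761u));
    return 0;
}

The sketch preserves the invariant n <= t.size() by doubling the table
inside add before inserting, mirroring the resize() call in the add(x)
code above.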