In this chapter, we discuss a beautiful data structure: the skiplist,
which has a variety of applications. Using a skiplist we can implement a
List
that has \(O(\log n)\) time implementations of get(i)
, set(i,x)
,
add(i,x)
, and remove(i)
. We can also implement an SSet
in which
all operations run in \(O(\log \texttt{n})\) expected time.
The efficiency of skiplists relies on their use of randomization. When a new element is added to a skiplist, the skiplist uses random coin tosses to determine the height of the new element. The performance of skiplists is expressed in terms of expected running times and path lengths. This expectation is taken over the random coin tosses used by the skiplist. In the implementation, the random coin tosses used by a skiplist are simulated using a pseudo-random number (or bit) generator.
4.1 The Basic Structure
Conceptually, a skiplist is a sequence of singly-linked lists
\(L_0,\ldots,L_h\). Each list \(L_r\) contains a subset of the items
in \(L_{r-1}\). We start with the input list \(L_0\) that contains n
items and construct \(L_1\) from \(L_0\), \(L_2\) from \(L_1\), and so on.
The items in \(L_r\) are obtained by tossing a coin for each element, x
,
in \(L_{r-1}\) and including x
in \(L_r\) if the coin turns up as heads.
This process ends when we create a list \(L_r\) that is empty. An example
of a skiplist is shown in Figure 4.1.
For an element, x
, in a skiplist, we call the height
of x
the
largest value \(r\) such that x
appears in \(L_r\). Thus, for example,
elements that only appear in \(L_0\) have height \(0\). If we spend a few
moments thinking about it, we notice that the height of x
corresponds
to the following experiment: Toss a coin repeatedly until it comes
up as tails. How many times did it come up as heads? The answer, not
surprisingly, is that the expected height of a node is 1. (We expect to
toss the coin twice before getting tails, but we don't count the last
toss.) The height of a skiplist is the height of its tallest node.
At the head of every list is a special node, called the sentinel,
that acts as a dummy node for the list. The key property of skiplists
is that there is a short path, called the search path,
from the
sentinel in \(L_h\) to every node in \(L_0\). Remembering how to construct
a search path for a node, u
, is easy (see Figure 4.2)
: Start at the top left corner of your skiplist (the sentinel in \(L_h\))
and always go right unless that would overshoot u
, in which case you
should take a step down into the list below.
More precisely, to construct the search path for the node u
in \(L_0\),
we start at the sentinel, w
, in \(L_h\). Next, we examine w.next
.
If w.next
contains an item that appears before u
in \(L_0\), then
we set \(\texttt{w}=\texttt{w.next}\). Otherwise, we move down and continue the search
at the occurrence of w
in the list \(L_{h-1}\). We continue this way
until we reach the predecessor of u
in \(L_0\).
The following result, which we will prove in Section 4.4, shows that the search path is quite short:
u
, in \(L_0\) is at
most \(2\log \texttt{n} + O(1) = O(\log \texttt{n})\).
A space-efficient way to implement a skiplist is to define a Node
,
u
, as consisting of a data value, x
, and an array, next
, of
pointers, where u.next[i]
points to u
's successor in the list
\(L_{\texttt{i}}\). In this way, the data, x
, in a node is
referenced
only once, even though x
may appear in several lists.
class Node<T> {
T x;
Node<T>[] next;
Node(T ix, int h) {
x = ix;
next = (Node<T>[])Array.newInstance(Node.class, h+1);
}
int height() {
return next.length - 1;
}
}
The next two sections of this chapter discuss two different applications of skiplists. In each of these applications, \(L_0\) stores the main structure (a list of elements or a sorted set of elements). The primary difference between these structures is in how a search path is navigated; in particular, they differ in how they decide if a search path should go down into \(L_{r-1}\) or go right within \(L_r\).
4.2 SkiplistSSet
: An Efficient SSet
A SkiplistSSet
uses a skiplist structure to implement the SSet
interface. When used in this way, the list \(L_0\) stores the elements of
the SSet
in sorted order. The find(x)
method works by following
the search path for the smallest value y
such that \(\texttt{y}\ge\texttt{x}\):
T find(T x) {
Node<T> u = findPredNode(x);
return u.next[0] == null ? null : u.next[0].x;
}
Node<T> findPredNode(T x) {
Node<T> u = sentinel;
int r = h;
while (r >= 0) {
while (u.next[r] != null && c.compare(u.next[r].x,x) < 0)
u = u.next[r]; // go right in list r
r--; // go down into list r-1
}
return u;
}
Following the search path for y
is easy: when situated at
some node, u
, in \(L_{\texttt{r}}\), we look right to u.next[r].x
.
If \(\texttt{x}>\texttt{u.next[r].x}\), then we take a step to the right in
\(L_{\texttt{r}}\); otherwise, we move down into \(L_{\texttt{r}-1}\). Each step
(right or down) in this search takes only constant time; thus, by
Lemma 4.1, the expected running time of find(x)
is \(O(\log \texttt{n})\).
Before we can add an element to a SkipListSSet
, we need a method to
simulate tossing coins to determine the height, k
, of a new node.
We do so by picking a random integer, z
, and counting the number of
trailing \(1\)s in the binary representation of z
:7This method
does not exactly replicate the coin-tossing experiment since the value of
k
will always be less than the number of bits in an int
. However,
this will have negligible impact unless the number of elements in the
structure is much greater than \(2^{32}=4294967296\).
int pickHeight() {
int z = rand.nextInt();
int k = 0;
int m = 1;
while ((z & m) != 0) {
k++;
m <<= 1;
}
return k;
}
To implement the add(x)
method in a SkiplistSSet
we search for x
and then splice x
into a few lists \(L_0\),…,\(L_{\texttt{k}}\), where k
is selected using the pickHeight()
method. The easiest way to do this
is to use an array, stack
, that keeps track of the nodes at which
the search path goes down from some list \(L_{\texttt{r}}\) into \(L_{\texttt{r}-1}\).
More precisely, stack[r]
is the node in \(L_{\texttt{r}}\) where the search path
proceeded down into \(L_{\texttt{r}-1}\). The nodes that we modify to insert x
are precisely the nodes \(\texttt{stack[0]},\ldots,\texttt{stack[k]}\). The following
code implements this algorithm for add(x)
:
boolean add(T x) {
Node<T> u = sentinel;
int r = h;
int comp = 0;
while (r >= 0) {
while (u.next[r] != null
&& (comp = c.compare(u.next[r].x,x)) < 0)
u = u.next[r];
if (u.next[r] != null && comp == 0) return false;
stack[r--] = u; // going down, store u
}
Node<T> w = new Node<T>(x, pickHeight());
while (h < w.height())
stack[++h] = sentinel; // height increased
for (int i = 0; i < w.next.length; i++) {
w.next[i] = stack[i].next[i];
stack[i].next[i] = w;
}
n++;
return true;
}
Removing an element, x
, is done in a similar way, except that there
is no need for stack
to keep track of the search path. The removal
can be done as we are following the search path. We search for x
and each time the search moves downward from a node u
, we check if
\(\texttt{u.next.x}=\texttt{x}\) and if so, we splice u
out of the list:
boolean remove(T x) {
boolean removed = false;
Node<T> u = sentinel;
int r = h;
int comp = 0;
while (r >= 0) {
while (u.next[r] != null
&& (comp = c.compare(u.next[r].x, x)) < 0) {
u = u.next[r];
}
if (u.next[r] != null && comp == 0) {
removed = true;
u.next[r] = u.next[r].next[r];
if (u == sentinel && u.next[r] == null)
h--; // height has gone down
}
r--;
}
if (removed) n--;
return removed;
}
4.2.1 Summary
The following theorem summarizes the performance of skiplists when used to implement sorted sets:
SkiplistSSet
implements the SSet
interface. A SkiplistSSet
supports
the operations add(x)
, remove(x)
, and find(x)
in \(O(\log \texttt{n})\)
expected time per operation.
4.3 SkiplistList
: An Efficient Random-Access List
A SkiplistList
implements the List
interface using a skiplist
structure. In a SkiplistList
, \(L_0\) contains the elements of the
list in the order in which they appear in the list. As in a
SkiplistSSet
, elements can be added, removed, and accessed in \(O(\log
\texttt{n})\) time.
For this to be possible, we need a way to follow the search path for
the i
th element in \(L_0\). The easiest way to do this is to define
the notion of the length of an edge in some list, \(L_{\texttt{r}}\).
We define the length of every edge in \(L_{0}\) as 1. The length of an edge, e
,
in \(L_{\texttt{r}}\), \(\texttt{r}>0\), is defined as the sum of the lengths of the edges below e
in \(L_{\texttt{r}-1}\). Equivalently, the length of e
is
the number of edges in \(L_0\) below e
. See Figure 4.5 for
an example of a skiplist with the lengths of its edges shown. Since the
edges of skiplists are stored in arrays, the lengths can be stored the same
way:
class Node {
T x;
Node[] next;
int[] length;
Node(T ix, int h) {
x = ix;
next = (Node[])Array.newInstance(Node.class, h+1);
length = new int[h+1];
}
int height() {
return next.length - 1;
}
}
The useful property of this definition of length is that, if we are
currently at a node that is at position j
in \(L_0\) and we follow an
edge of length \(\ell\), then we move to a node whose position, in \(L_0\),
is \(\texttt{j}+\ell\). In this way, while following a search path, we can keep
track of the position, j
, of the current node in \(L_0\). When at a
node, u
, in \(L_{\texttt{r}}\), we go right if j
plus the length of the edge
u.next[r]
is less than i
. Otherwise, we go down into \(L_{\texttt{r}-1}\).
Node findPred(int i) {
Node u = sentinel;
int r = h;
int j = -1; // index of the current node in list 0
while (r >= 0) {
while (u.next[r] != null && j + u.length[r] < i) {
j += u.length[r];
u = u.next[r];
}
r--;
}
return u;
}
T get(int i) {
return findPred(i).next[0].x;
}
T set(int i, T x) {
Node u = findPred(i).next[0];
T y = u.x;
u.x = x;
return y;
}
Since the hardest part of the operations get(i)
and set(i,x)
is
finding the i
th node in \(L_0\), these operations run in
\(O(\log \texttt{n})\) time.
Adding an element to a SkiplistList
at a position, i
, is fairly
simple. Unlike in a SkiplistSSet
, we are sure that a new
node will actually be added, so we can do the addition at the same time
as we search for the new node's location. We first pick the height, k
,
of the newly inserted node, w
, and then follow the search path for i
.
Any time the search path moves down from \(L_{\texttt{r}}\) with \(\texttt{r}\le \texttt{k}\), we
splice w
into \(L_{\texttt{r}}\). The only extra care needed is to ensure that
the lengths of edges are updated properly. See Figure 4.6.
Note that, each time the search path goes down at a node, u
, in \(L_{\texttt{r}}\),
the length of the edge u.next[r]
increases by one, since we are adding
an element below that edge at position i
. Splicing the node w
between two nodes,
u
and z
, works as shown in Figure 4.7. While
following the search path we are already keeping track of the position,
j
, of u
in \(L_0\). Therefore, we know that the length of the edge from
u
to w
is \(\texttt{i}-\texttt{j}\). We can also deduce the length of the edge
from w
to z
from the length, \(\ell\), of the edge from u
to z
.
Therefore, we can splice in w
and update the lengths of the edges in
constant time.
This sounds more complicated than it is, for the code is actually quite simple:
void add(int i, T x) {
Node w = new Node(x, pickHeight());
if (w.height() > h)
h = w.height();
add(i, w);
}
Node add(int i, Node w) {
Node u = sentinel;
int k = w.height();
int r = h;
int j = -1; // index of u
while (r >= 0) {
while (u.next[r] != null && j+u.length[r] < i) {
j += u.length[r];
u = u.next[r];
}
u.length[r]++; // accounts for new node in list 0
if (r <= k) {
w.next[r] = u.next[r];
u.next[r] = w;
w.length[r] = u.length[r] - (i - j);
u.length[r] = i - j;
}
r--;
}
n++;
return u;
}
By now, the implementation of
the remove(i)
operation in a SkiplistList
should be obvious. We follow the search path for the node at position i
. Each time the search path takes a step down from a node, u
, at level r
we decrement the length of the edge leaving u
at that level. We also check if u.next[r]
is the element of rank i
and, if so, splice it out of the list at that level. An example is shown in Figure 4.8.
T remove(int i) {
T x = null;
Node u = sentinel;
int r = h;
int j = -1; // index of node u
while (r >= 0) {
while (u.next[r] != null && j+u.length[r] < i) {
j += u.length[r];
u = u.next[r];
}
u.length[r]--; // for the node we are removing
if (j + u.length[r] + 1 == i && u.next[r] != null) {
x = u.next[r].x;
u.length[r] += u.next[r].length[r];
u.next[r] = u.next[r].next[r];
if (u == sentinel && u.next[r] == null)
h--;
}
r--;
}
n--;
return x;
}
4.3.1 Summary
The following theorem summarizes the performance of the SkiplistList
data structure:
SkiplistList
implements the List
interface. A SkiplistList
supports the operations get(i)
, set(i,x)
, add(i,x)
, and
remove(i)
in \(O(\log \texttt{n})\) expected time per operation.
4.4 Analysis of Skiplists
In this section, we analyze the expected height, size, and length of the search path in a skiplist. This section requires a background in basic probability. Several proofs are based on the following basic observation about coin tosses.
1 & \mbox{if the coin is tossed \(i\) or more times} \end{array}\right. \end{equation*} Note that \(I_i=1\) if and only if the first \(i-1\) coin tosses are tails, so \(\E[I_i]=\Pr\{I_i=1\}=1/2^{i-1}\). Observe that \(T\), the total number of coin tosses, can be written as \(T=\sum_{i=1}^{\infty} I_i\). Therefore, \begin{align*} \E[T] & = \E\left[\sum_{i=1}^\infty I_i\right] \\
& = \sum_{i=1}^\infty \E\left[I_i\right] \\
& = \sum_{i=1}^\infty 1/2^{i-1} \\
& = 1 + 1/2 + 1/4 + 1/8 + \cdots \\
& = 2 \enspace . \end{align*}
The next two lemmata tell us that skiplists have linear size:
x
, is included in list
\(L_{\texttt{r}}\) is \(1/2^{\texttt{r}}\), so the expected number of nodes in \(L_{\texttt{r}}\)
is \(\texttt{n}/2^{\texttt{r}}\).8See Section 1.3.4 to see how this
is derived using indicator variables and linearity of expectation.
Therefore, the total expected number of nodes in all lists is
\begin{equation*} \sum_{\texttt{r}=0}^\infty \texttt{n}/2^{\texttt{r}} = \texttt{n}(1+1/2+1/4+1/8+\cdots) = 2\texttt{n} \enspace . \end{equation*}
n
elements is at most
\(\log \texttt{n} + 2\).
1 & \mbox{if \(L_{\texttt{r}}\) is non-empty} \end{array}\right. \end{equation*} The height,
h
, of the skiplist is then given by
\begin{equation*}
\texttt{h} = \sum_{r=1}^\infty I_{\texttt{r}} \enspace .
\end{equation*}
Note that \(I_{\texttt{r}}\) is never more than the length, \(|L_{\texttt{r}}|\), of \(L_{\texttt{r}}\), so
\begin{equation*}
\E[I_{\texttt{r}}] \le \E[|L_{\texttt{r}}|] = \texttt{n}/2^{\texttt{r}} \enspace .
\end{equation*}
Therefore, we have
\begin{align*}
\E[\texttt{h}] &= \E\left[\sum_{r=1}^\infty I_{\texttt{r}}\right] \\&= \sum_{\texttt{r}=1}^{\infty} E[I_{\texttt{r}}] \\
&= \sum_{\texttt{r}=1}^{\lfloor\log \texttt{n}\rfloor} E[I_{\texttt{r}}] + \sum_{r=\lfloor\log \texttt{n}\rfloor+1}^{\infty} E[I_{\texttt{r}}] \\
&\le \sum_{\texttt{r}=1}^{\lfloor\log \texttt{n}\rfloor} 1 + \sum_{r=\lfloor\log \texttt{n}\rfloor+1}^{\infty} \texttt{n}/2^{\texttt{r}} \\
&\le \log \texttt{n} + \sum_{\texttt{r}=0}^\infty 1/2^{\texttt{r}} \\
&= \log \texttt{n} + 2 \enspace . \end{align*}
x
. This path starts at the predecessor of x
in \(L_0\). At any point in time, if the path can go up a level, then
it does. If it cannot go up a level then it goes left. Thinking about
this for a few moments will convince us that the reverse search path for
x
is identical to the search path for x
, except that it is reversed.
The number of nodes that the reverse search path visits at a particular
level, r
, is related to the following experiment: Toss a coin.
If the coin comes up as heads, then move up and stop. Otherwise, move
left and repeat the experiment. The number of coin tosses before
the heads represents the number of steps to the left that a reverse
search path takes at a particular level.9Note that this
might overcount the number of steps to the left, since the experiment
should end either at the first heads or when the search path reaches
the sentinel, whichever comes first. This is not a problem since the
lemma is only stating an upper bound. Lemma 4.2 tells us
that the expected number of coin tosses before the first heads is 1.
Let \(S_{\texttt{r}}\) denote the number of steps the forward search path takes at level
\(\texttt{r}\) that go to the right. We have just argued that \(\E[S_{\texttt{r}}]\le
1\). Furthermore, \(S_{\texttt{r}}\le |L_{\texttt{r}}|\), since we can't take more steps
in \(L_{\texttt{r}}\) than the length of \(L_{\texttt{r}}\), so
\begin{equation*}
\E[S_{\texttt{r}}] \le \E[|L_{\texttt{r}}|] = \texttt{n}/2^{\texttt{r}} \enspace .
\end{equation*}
We can now finish as in the proof of Lemma 4.4.
Let \(S\) be the length of the search path for some node, u
, in a
skiplist, and let \(\texttt{h}\) be the height of the skiplist. Then
\begin{align*}
\E[S]
&= \E\left[ \texttt{h} + \sum_{\texttt{r}=0}^\infty S_{\texttt{r}} \right] \\
&= \E[\texttt{h}] + \sum_{\texttt{r}=0}^\infty \E[S_{\texttt{r}}] \\
&= \E[\texttt{h}] + \sum_{\texttt{r}=0}^{\lfloor\log \texttt{n}\rfloor} \E[S_{\texttt{r}}]
+ \sum_{\texttt{r}=\lfloor\log \texttt{n}\rfloor+1}^\infty \E[S_{\texttt{r}}] \\
&\le \E[\texttt{h}] + \sum_{\texttt{r}=0}^{\lfloor\log \texttt{n}\rfloor} 1
+ \sum_{r=\lfloor\log \texttt{n}\rfloor+1}^\infty \texttt{n}/2^{\texttt{r}} \\
&\le \E[\texttt{h}] + \sum_{\texttt{r}=0}^{\lfloor\log \texttt{n}\rfloor} 1
+ \sum_{\texttt{r}=0}^{\infty} 1/2^{\texttt{r}} \\
&\le \E[\texttt{h}] + \sum_{\texttt{r}=0}^{\lfloor\log \texttt{n}\rfloor} 1
+ \sum_{\texttt{r}=0}^{\infty} 1/2^{\texttt{r}} \\
&\le \E[\texttt{h}] + \log \texttt{n} + 3 \\
&\le 2\log \texttt{n} + 5 \enspace .
\end{align*}
The following theorem summarizes the results in this section:
4.5 Discussion and Exercises
Skiplists were introduced by Pugh [62] who also presented
a number of applications and extensions of skiplists [61]. Since then they
have been studied extensively. Several researchers have done very
precise analyses of the expected length and variance of the length of the
search path for the i
th element in a skiplist [45,44,58].
Deterministic versions [53], biased versions [8,26],
and self-adjusting versions [12] of skiplists have all been
developed. Skiplist implementations have been written for various
languages and frameworks and have been used in open-source database
systems [71,63]. A variant of skiplists is used in the HP-UX
operating system kernel's process management structures [42].
Skiplists are even part of the Java 1.6 API [55].
add(3,x)
on the SkiplistList
in Figure 4.5. Assume that pickHeight()
selects a height
of 4 for the newly created node.
add(x)
or a remove(x)
operation, the expected
number of pointers in a SkiplistSet
that get changed is constant.
- Show that, with this modification, the expected length of a search path is at most \((1/p)\log_{1/p} \texttt{n} + O(1)\).
- What is the value of \(p\) that minimizes the preceding expression?
- What is the expected height of the skiplist?
- What is the expected number of nodes in the skiplist?
find(x)
method in a SkiplistSet
sometimes performs
redundant comparisons; these occur when x
is compared to the
same value more than once. They can occur when, for some node, u
,
\(\texttt{u.next[r]} = \texttt{u.next[r-1]}\). Show how these redundant comparisons
happen and modify find(x)
so that they are avoided. Analyze the
expected number of comparisons done by your modified find(x)
method.
SSet
interface, but also allows fast access to elements by rank.
That is, it also supports the function get(i)
, which returns the
element whose rank is i
in \(O(\log \texttt{n})\) expected time. (The rank
of an element x
in an SSet
is the number of elements in the SSet
that are less than x
.)
stack
in the add(x)
code on
page X is a finger; the shaded nodes in
Figure 4.3 show the contents of the finger.) One can
think of a finger as pointing out the path to a node in the lowest
list, \(L_0\).
A finger search implements the find(x)
operation using a
finger, by walking up the list using the finger until reaching a node
u
such that \(\texttt{u.x} < \texttt{x}\) and \(\texttt{u.next}=\texttt{null}\) or \(\texttt{u.next.x} >
\texttt{x}\) and then performing a normal search for x
starting from u
.
It is possible to prove that the expected number of steps required
for a finger search is \(O(1+\log r)\), where \(r\) is the number values
in \(L_0\) between x
and the value pointed to by the finger.
Implement a subclass of Skiplist
called SkiplistWithFinger
that
implements find(x)
operations using an internal finger. This subclass
stores a finger, which is then used so that every find(x)
operation
is implemented as a finger search. During each find(x)
operation
the finger is updated so that each find(x)
operation uses, as a
starting point, a finger that points to the result of the previous
find(x)
operation.
truncate(i)
, that truncates a SkiplistList
at position i
. After the execution of this method, the size
of the list is i
and it contains only the elements at indices
\(0,\ldots,\texttt{i}-1\). The return value is another SkiplistList
that
contains the elements at indices \(\texttt{i},\ldots,\texttt{n}-1\). This method
should run in \(O(\log \texttt{n})\) time.
SkiplistList
method, absorb(l2)
, that takes as an
argument a SkiplistList
, l2
, empties it and appends its contents,
in order, to the receiver. For example, if l1
contains \(a,b,c\)
and l2
contains \(d,e,f\), then after calling l1.absorb(l2)
, l1
will contain \(a,b,c,d,e,f\) and l2
will be empty. This method should
run in \(O(\log \texttt{n})\) time.
SEList
, design
and implement a space-efficient SSet
, SESSet
. To do this, store the
data, in order, in an SEList
, and store the blocks of this SEList
in an SSet
. If the original SSet
implementation uses \(O(\texttt{n})\)
space to store n
elements, then the SESSet
will use enough space
for n
elements plus \(O(\texttt{n}/\texttt{b}+\texttt{b})\) wasted space.
SSet
as your underlying structure, design and implement an
application that reads a (large) text file and allows you to search,
interactively, for any substring contained in the text. As the user
types their query, a matching part of the text (if any) should appear
as a result.
Hint 1: Every substring is a prefix of some suffix, so it suffices to store all suffixes of the text file.
Hint 2: Any suffix can be represented compactly as a single integer indicating where the suffix begins in the text.
Test your application on some large texts, such as some of the books available at Project Gutenberg [1]. If done correctly, your applications will be very responsive; there should be no noticeable lag between typing keystrokes and seeing the results.
- Explain how removing some edges of a skiplist leads to a structure that looks like a binary tree and is similar to a binary search tree.
- Skiplists and binary search trees each use about the same number of pointers (2 per node). Skiplists make better use of those pointers, though. Explain why.