In this chapter, we study a binary search tree data structure, the ScapegoatTree. This structure is based on the common wisdom that, when something goes wrong, the first thing people tend to do is find someone to blame (the scapegoat). Once blame is firmly established, we can leave the scapegoat to fix the problem.
A ScapegoatTree keeps itself balanced by partial rebuilding operations. During a partial rebuilding operation, an entire subtree is deconstructed and rebuilt into a perfectly balanced subtree. There are many ways of rebuilding a subtree rooted at node u into a perfectly balanced tree. One of the simplest is to traverse u's subtree, gathering all its nodes into an array, a, and then to recursively build a balanced subtree using a. If we let \(\texttt{m}=\texttt{a.length}/2\), then the element a[m] becomes the root of the new subtree, \(\texttt{a}[0],\ldots,\texttt{a}[\texttt{m}-1]\) get stored recursively in the left subtree and \(\texttt{a}[\texttt{m}+1],\ldots,\texttt{a}[\texttt{a.length}-1]\) get stored recursively in the right subtree.
void rebuild(Node<T> u) {
    int ns = size(u);
    Node<T> p = u.parent;
    Node<T>[] a = (Node<T>[]) Array.newInstance(Node.class, ns);
    packIntoArray(u, a, 0);
    if (p == nil) {
        r = buildBalanced(a, 0, ns);
        r.parent = nil;
    } else if (p.right == u) {
        p.right = buildBalanced(a, 0, ns);
        p.right.parent = p;
    } else {
        p.left = buildBalanced(a, 0, ns);
        p.left.parent = p;
    }
}
int packIntoArray(Node<T> u, Node<T>[] a, int i) {
    if (u == nil) {
        return i;
    }
    i = packIntoArray(u.left, a, i);
    a[i++] = u;
    return packIntoArray(u.right, a, i);
}
Node<T> buildBalanced(Node<T>[] a, int i, int ns) {
    if (ns == 0)
        return nil;
    int m = ns / 2;
    a[i + m].left = buildBalanced(a, i, m);
    if (a[i + m].left != nil)
        a[i + m].left.parent = a[i + m];
    a[i + m].right = buildBalanced(a, i + m + 1, ns - m - 1);
    if (a[i + m].right != nil)
        a[i + m].right.parent = a[i + m];
    return a[i + m];
}
A call to rebuild(u) takes \(O(\texttt{size(u)})\) time. The resulting subtree has minimum height; there is no tree of smaller height that has size(u) nodes.
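To sanity-check the minimum-height claim, here is a self-contained sketch. The Node class is simplified and the helpers makeNodes and height are our own additions, but buildBalanced follows the chapter's recursion; the sketch verifies that the tree built from n nodes has height exactly \(\lfloor\log_2 \texttt{n}\rfloor\), the minimum possible for n nodes.

```java
// Self-contained check that buildBalanced yields a minimum-height tree.
// Node, makeNodes, and height are simplified stand-ins for the chapter's code.
public class BuildBalancedCheck {
    static class Node { Node left, right, parent; }

    // Same recursion as the chapter's buildBalanced
    static Node buildBalanced(Node[] a, int i, int ns) {
        if (ns == 0) return null;
        int m = ns / 2;
        a[i + m].left = buildBalanced(a, i, m);
        if (a[i + m].left != null) a[i + m].left.parent = a[i + m];
        a[i + m].right = buildBalanced(a, i + m + 1, ns - m - 1);
        if (a[i + m].right != null) a[i + m].right.parent = a[i + m];
        return a[i + m];
    }

    static Node[] makeNodes(int n) {
        Node[] a = new Node[n];
        for (int j = 0; j < n; j++) a[j] = new Node();
        return a;
    }

    // Height counted in edges: a single node has height 0, an empty tree -1
    static int height(Node u) {
        return u == null ? -1 : 1 + Math.max(height(u.left), height(u.right));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 100; n++) {
            int h = height(buildBalanced(makeNodes(n), 0, n));
            int floorLog2 = 31 - Integer.numberOfLeadingZeros(n);  // floor(log2 n)
            if (h != floorLog2) throw new AssertionError("n=" + n);
        }
        System.out.println("heights are minimal for n = 1..100");
    }
}
```

For n = 10, for example, the resulting height is \(3 = \lfloor\log_2 10\rfloor\): since the left subtree always receives \(\lfloor\texttt{ns}/2\rfloor\) nodes and the right subtree no more, the height satisfies \(h(n) = 1 + h(\lfloor n/2 \rfloor)\).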
8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding
A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, n, of nodes in the tree, also keeps a counter, q, that maintains an upper-bound on the number of nodes.

int q;

n and q obey the following inequalities:
\begin{equation*}
\texttt{q}/2 \le \texttt{n} \le \texttt{q} \enspace .
\end{equation*}
In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed
\begin{equation}
\log_{3/2} \texttt{q} \le \log_{3/2} 2\texttt{n} < \log_{3/2} \texttt{n} + 2\enspace .
\end{equation}
Even with this constraint, a ScapegoatTree can look surprisingly unbalanced. The tree in Figure 8.1 has \(\texttt{q}=\texttt{n}=10\) and height \(5<\log_{3/2}10 \approx 5.679\).
Implementing the find(x) operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time proportional to the height of the tree which, by Equation 8.1, is \(O(\log \texttt{n})\).
To implement the add(x) operation, we first increment n and q and then use the usual algorithm for adding x to a binary search tree; we search for x and then add a new leaf u with \(\texttt{u.x}=\texttt{x}\). At this point, we may get lucky and the depth of u might not exceed \(\log_{3/2}\texttt{q}\). If so, then we leave well enough alone and don't do anything else.
Unfortunately, it will sometimes happen that \(\texttt{depth(u)} > \log_{3/2}\texttt{q}\). In this case, we need to reduce the height. This isn't a big job; there is only one node, namely u, whose depth exceeds \(\log_{3/2}\texttt{q}\). To fix u, we walk from u back up to the root looking for a scapegoat, w. The scapegoat, w, is a very unbalanced node. It has the property that
\begin{equation}
\frac{\texttt{size(w.child)}}{\texttt{size(w)}} > \frac{2}{3} \enspace ,
\end{equation}
where w.child is the child of w on the path from the root to u.
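Since the two sizes are integers, this condition can be checked without floating-point division by cross-multiplying, which is the comparison the add(x) code below uses. A tiny hypothetical helper (not part of the chapter's code) makes the test explicit:

```java
// Hypothetical helper for Equation 8.2: w is a scapegoat with respect to
// its child on the search path exactly when size(child)/size(w) > 2/3,
// i.e. when 3*size(child) > 2*size(w) (no floating-point needed).
public class ScapegoatCheck {
    static boolean isScapegoat(int childSize, int wSize) {
        return 3 * childSize > 2 * wSize;
    }

    public static void main(String[] args) {
        System.out.println(isScapegoat(9, 12));  // 27 > 24, so true
        System.out.println(isScapegoat(8, 12));  // 24 > 24 fails, so false
    }
}
```

For example, a node of subtree size 12 whose on-path child has size 9 is a scapegoat (\(9/12 = 3/4 > 2/3\)), while one whose child has size 8 is not (\(8/12 = 2/3\), not strictly greater).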
We'll very shortly prove that a scapegoat exists. For now, we can take it for granted. Once we've found the scapegoat w, we completely destroy the subtree rooted at w and rebuild it into a perfectly balanced binary search tree. We know, from Equation 8.2, that, even before the addition of u, w's subtree was not a complete binary tree. Therefore, when we rebuild w, the height decreases by at least 1 so that the height of the ScapegoatTree is once again at most \(\log_{3/2}\texttt{q}\).
boolean add(T x) {
    // first do basic insertion keeping track of depth
    Node<T> u = newNode(x);
    int d = addWithDepth(u);
    if (d > log32(q)) {
        // depth exceeded, find scapegoat
        Node<T> w = u.parent;
        while (3*size(w) <= 2*size(w.parent))
            w = w.parent;
        rebuild(w.parent);
    }
    return d >= 0;
}
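The code above relies on a helper, log32(q), whose implementation is not shown here. One possible implementation (an assumption about this helper, not necessarily the book's exact code) returns \(\lfloor\log_{3/2} \texttt{q}\rfloor\); since the depth d is an integer, \(\texttt{d} > \lfloor\log_{3/2} \texttt{q}\rfloor\) holds exactly when \(\texttt{d} > \log_{3/2} \texttt{q}\), so the floor loses nothing:

```java
// Assumed implementation of the log32 helper used by add(x):
// returns floor(log_{3/2}(q)), computed by change of base with
// natural logarithms.  For integer depths d, the test d > log32(q)
// is then equivalent to d > log_{3/2}(q).
public class Log32 {
    static int log32(int q) {
        return (int) Math.floor(Math.log(q) / Math.log(1.5));
    }

    public static void main(String[] args) {
        System.out.println(log32(10));  // log_{3/2} 10 ~ 5.679, so this prints 5
    }
}
```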
If we ignore the cost of finding the scapegoat w and rebuilding the subtree rooted at w, then the running time of add(x) is dominated by the initial search, which takes \(O(\log \texttt{q}) = O(\log \texttt{n})\) time.
We will account for the cost of finding the scapegoat and rebuilding
using amortized analysis in the next section.
The implementation of remove(x) in a ScapegoatTree is very simple. We search for x and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement n, but leave q unchanged. Finally, we check if \(\texttt{q} > 2\texttt{n}\) and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set \(\texttt{q}=\texttt{n}\).
boolean remove(T x) {
    if (super.remove(x)) {
        if (2*n < q) {
            rebuild(r);
            q = n;
        }
        return true;
    }
    return false;
}
Ignoring the cost of the occasional rebuild, the running time of the remove(x) operation is proportional to the height of the tree, and is therefore \(O(\log \texttt{n})\).
8.1.1 Analysis of Correctness and Running-Time
In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove the correctness by showing that, when the add(x) operation results in a node that violates Equation 8.1, then we can always find a scapegoat:

Lemma 8.1. Let u be a node of depth \(h>\log_{3/2} \texttt{q}\) in a ScapegoatTree. Then there exists a node \(\texttt{w}\) on the path from u to the root such that
\begin{equation*}
\frac{\texttt{size(w)}}{\texttt{size(parent(w))}} > 2/3 \enspace .
\end{equation*}
Proof. Suppose, for the sake of contradiction, that this is not the case, and \(\texttt{size(w)}/\texttt{size(parent(w))} \le 2/3\) for every node w on the path from u to the root. Denote the path from the root to u as \(\texttt{r}=\texttt{u}_0,\ldots,\texttt{u}_h=\texttt{u}\). Then, we have \(\texttt{size(u}_0\texttt{)}=\texttt{n}\), \(\texttt{size(u}_1\texttt{)}\le\frac{2}{3}\texttt{n}\), \(\texttt{size(u}_2\texttt{)}\le\frac{4}{9}\texttt{n}\) and, more generally,
\begin{equation*}
\texttt{size(u}_i\texttt{)}\le\left(\frac{2}{3}\right)^i\texttt{n} \enspace .
\end{equation*}
But this gives a contradiction, since \(\texttt{size(u)}\ge 1\), hence
\begin{equation*}
1 \le \texttt{size(u)} \le \left(\frac{2}{3}\right)^h\texttt{n}
< \left(\frac{2}{3}\right)^{\log_{3/2} \texttt{q}}\texttt{n}
\le \left(\frac{2}{3}\right)^{\log_{3/2} \texttt{n}}\texttt{n}
= \left(\frac{1}{\texttt{n}}\right) \texttt{n}
= 1 \enspace .
\end{equation*}
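The final steps of this chain use \(\texttt{q}\ge\texttt{n}\) (so that raising \(2/3\) to the larger exponent \(\log_{3/2}\texttt{q}\) can only decrease the value) together with the identity
\begin{equation*}
\left(\frac{2}{3}\right)^{\log_{3/2} \texttt{n}}
  = \left(\left(\frac{3}{2}\right)^{\log_{3/2} \texttt{n}}\right)^{-1}
  = \texttt{n}^{-1} = \frac{1}{\texttt{n}} \enspace ,
\end{equation*}
which holds because \(2/3\) is the reciprocal of \(3/2\).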
Next, we analyze the parts of the running time that are not yet accounted for. There are two parts: the cost of calls to size(u) when searching for scapegoat nodes, and the cost of calls to rebuild(w) when we find a scapegoat w. The cost of calls to size(u) can be related to the cost of calls to rebuild(w), as follows:
Lemma 8.2. During a call to add(x) in a ScapegoatTree, the cost of finding the scapegoat w and rebuilding the subtree rooted at w is \(O(\texttt{size(w)})\).
Proof. The cost of rebuilding the scapegoat node w, once we find it, is \(O(\texttt{size(w)})\). When searching for the scapegoat node, we call size(u) on a sequence of nodes \(\texttt{u}_0,\ldots,\texttt{u}_k\) until we find the scapegoat \(\texttt{u}_k=\texttt{w}\). However, since \(\texttt{u}_k\) is the first node in this sequence that is a scapegoat, we know that
\begin{equation*}
\texttt{size(u}_{i}\texttt{)} < \frac{2}{3}\texttt{size(u}_{i+1}\texttt{)}
\end{equation*}
for all \(i\in\{0,\ldots,k-2\}\). Therefore, the cost of all calls to size(u) is
\begin{eqnarray*}
O\left( \sum_{i=0}^k \texttt{size(u}_{k-i}\texttt{)} \right)
&=& O\left(
\texttt{size(u}_k\texttt{)}
+ \sum_{i=0}^{k-1} \texttt{size(u}_{k-i-1}\texttt{)}
\right) \\&=& O\left( \texttt{size(u}_k\texttt{)} + \sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i\texttt{size(u}_{k}\texttt{)} \right) \\
&=& O\left( \texttt{size(u}_k\texttt{)}\left(1+ \sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i \right)\right) \\
&=& O(\texttt{size(u}_k\texttt{)}) = O(\texttt{size(w)}) \enspace , \end{eqnarray*} where the last line follows from the fact that the sum is a geometrically decreasing series.
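Concretely, since \(\sum_{i\ge 0}(2/3)^i = 3\), the sizes examined on the way up sum to less than three times size(w). A small numeric illustration (the starting size 1000 and path lengths are arbitrary example values):

```java
// Illustrates the geometric bound behind Lemma 8.2: walking up from u_0 to
// the scapegoat u_k = w, each size is less than 2/3 of the next one up, so
// the sizes examined sum to less than 3 * size(w).
public class GeometricBound {
    // Upper bound on size(u_0) + ... + size(u_k), taking the worst case
    // size(u_{k-i}) = (2/3)^i * sizeW at every level.
    static double sumOfSizes(double sizeW, int k) {
        double total = 0, s = sizeW;
        for (int i = 0; i <= k; i++) {
            total += s;
            s *= 2.0 / 3.0;   // each step down shrinks the size bound
        }
        return total;
    }

    public static void main(String[] args) {
        // stays below 3 * 1000 no matter how long the path is
        System.out.println(sumOfSizes(1000, 20));
    }
}
```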
All that remains is to prove an upper-bound on the cost of all calls to rebuild(u) during a sequence of \(m\) operations:

Lemma 8.3. Starting with an empty ScapegoatTree, any sequence of \(m\) add(x) and remove(x) operations causes at most \(O(m\log m)\) time to be used by rebuild(u) operations.
Proof. We use a credit scheme: each node stores credits, and each call to rebuild(u) is paid for with credits stored at u. During an insertion or deletion, we give one credit to each node on the path to the inserted node, or deleted node, u. In this way we hand out at most \(\log_{3/2}\texttt{q}\le \log_{3/2}m\) credits per operation. During a deletion we also store an additional credit “on the side.” Thus, in total we give out at most \(O(m\log m)\) credits. All that remains is to show that these credits are sufficient to pay for all calls to rebuild(u).
If we call rebuild(u) during an insertion, it is because u is a scapegoat. Suppose, without loss of generality, that
\begin{equation*}
\frac{\texttt{size(u.left)}}{\texttt{size(u)}} > \frac{2}{3} \enspace .
\end{equation*}
Using the fact that
\begin{equation*}
\texttt{size(u)} = 1 + \texttt{size(u.left)} + \texttt{size(u.right)}
\end{equation*}
we deduce that
\begin{equation*}
\frac{1}{2}\texttt{size(u.left)} > \texttt{size(u.right)} \enspace
\end{equation*}
and therefore
\begin{equation*}
\texttt{size(u.left)} - \texttt{size(u.right)} > \frac{1}{2}\texttt{size(u.left)} >
\frac{1}{3}\texttt{size(u)} \enspace .
\end{equation*}
Now, the last time a subtree containing u was rebuilt (or when u was inserted, if a subtree containing u was never rebuilt), we had
\begin{equation*}
\texttt{size(u.left)} - \texttt{size(u.right)} \le 1 \enspace .
\end{equation*}
Therefore, the number of add(x) or remove(x) operations that have affected u.left or u.right since then is at least
\begin{equation*}
\frac{1}{3}\texttt{size(u)} - 1 \enspace ,
\end{equation*}
and there are therefore at least this many credits stored at u that are available to pay for the \(O(\texttt{size(u)})\) time it takes to call rebuild(u).
If we call rebuild(u) during a deletion, it is because \(\texttt{q} > 2\texttt{n}\). In this case, we have \(\texttt{q}-\texttt{n}> \texttt{n}\) credits stored “on the side,” and we use these to pay for the \(O(\texttt{n})\) time it takes to rebuild the root. This completes the proof.
8.1.2 Summary
The following theorem summarizes the performance of the ScapegoatTree data structure:
Theorem 8.1. A ScapegoatTree implements the SSet interface. Ignoring the cost of rebuild(u) operations, a ScapegoatTree supports the operations add(x), remove(x), and find(x) in \(O(\log \texttt{n})\) time per operation. Furthermore, beginning with an empty ScapegoatTree, any sequence of \(m\) add(x) and remove(x) operations results in a total of \(O(m\log m)\) time spent during all calls to rebuild(u).
8.2 Discussion and Exercises
The term scapegoat tree is due to Galperin and Rivest [33], who define and analyze these trees. However, the same structure was discovered earlier by Andersson [5,7], who called them general balanced trees since they can have any shape as long as their height is small.
Experimenting with the ScapegoatTree implementation will reveal that it is often considerably slower than the other SSet implementations in this book. This may be somewhat surprising, since the height bound of
\begin{equation*}
\log_{3/2}\texttt{q} \approx 1.709\log \texttt{n} + O(1)
\end{equation*}
is better than the expected length of a search path in a Skiplist and not too far from that of a Treap. The implementation could be optimized by storing the sizes of subtrees explicitly at each node or by reusing already computed subtree sizes (Exercises 8.5 and 8.6). Even with these optimizations, there will always be sequences of add(x) and delete(x) operations for which a ScapegoatTree takes longer than other SSet implementations.
This gap in performance is due to the fact that, unlike the other SSet implementations discussed in this book, a ScapegoatTree can spend a lot of time restructuring itself. Exercise 8.3 asks you to prove that there are sequences of n operations in which a ScapegoatTree will spend on the order of \(\texttt{n}\log \texttt{n}\) time in calls to rebuild(u). This is in contrast to other SSet implementations discussed in this book, which only make \(O(\texttt{n})\) structural changes during a sequence of n operations. This is, unfortunately, a necessary consequence of the fact that a ScapegoatTree does all its restructuring by calls to rebuild(u) [20].
Despite their lack of performance, there are applications in which a ScapegoatTree could be the right choice. This would occur any time there is additional data associated with nodes that cannot be updated in constant time when a rotation is performed, but that can be updated during a rebuild(u) operation. In such cases, the ScapegoatTree and related structures based on partial rebuilding may work. An example of such an application is outlined in Exercise 8.11.
Exercise 8.1. Illustrate the addition of the values 1.5 and then 1.6 on the ScapegoatTree in Figure 8.1.
Exercise 8.2. Illustrate the addition of a sequence of values to an empty ScapegoatTree, and show where the credits described in the proof of Lemma 8.3 go, and how they are used during this sequence of additions.
Exercise 8.3. Show that, if we start with an empty ScapegoatTree and call add(x) for \(\texttt{x}=1,2,3,\ldots,\texttt{n}\), then the total time spent during calls to rebuild(u) is at least \(c\texttt{n}\log \texttt{n}\) for some constant \(c>0\).
Exercise 8.4. The ScapegoatTree, as described in this chapter, guarantees that the length of the search path does not exceed \(\log_{3/2}\texttt{q}\).
- Design, analyze, and implement a modified version of ScapegoatTree where the length of the search path does not exceed \(\log_{\texttt{b}} \texttt{q}\), where b is a parameter with \(1<\texttt{b}<2\).
- What does your analysis and/or your experiments say about the amortized cost of find(x), add(x), and remove(x) as a function of n and b?
Exercise 8.5. Modify the add(x) method of the ScapegoatTree so that it does not waste any time recomputing the sizes of subtrees that have already been computed. This is possible because, by the time the method wants to compute size(w), it has already computed one of size(w.left) or size(w.right). Compare the performance of your modified implementation with the implementation given here.
Exercise 8.6. Implement a second version of the ScapegoatTree data structure that explicitly stores and maintains the sizes of the subtree rooted at each node. Compare the performance of the resulting implementation with that of the original ScapegoatTree implementation as well as the implementation from Exercise 8.5.
Exercise 8.7. Reimplement the rebuild(u) method discussed at the beginning of this chapter so that it does not require the use of an array to store the nodes of the subtree being rebuilt. Instead, it should use recursion to first connect the nodes into a linked list and then convert this linked list into a perfectly balanced binary tree. (There are very elegant recursive implementations of both steps.)
Exercise 8.8. Analyze and implement a WeightBalancedTree. This is a tree in which each node u, except the root, maintains the balance invariant that \(\texttt{size(u)} \le (2/3)\texttt{size(u.parent)}\). The add(x) and remove(x) operations are identical to the standard BinarySearchTree operations, except that any time the balance invariant is violated at a node u, the subtree rooted at u.parent is rebuilt. Your analysis should show that operations on a WeightBalancedTree run in \(O(\log\texttt{n})\) amortized time.
Exercise 8.9. Analyze and implement a CountdownTree. In a CountdownTree each node u keeps a timer u.t. The add(x) and remove(x) operations are exactly the same as in a standard BinarySearchTree except that, whenever one of these operations affects u's subtree, u.t is decremented. When \(\texttt{u.t}=0\) the entire subtree rooted at u is rebuilt into a perfectly balanced binary search tree. When a node u is involved in a rebuilding operation (either because u is rebuilt or one of u's ancestors is rebuilt) u.t is reset to \(\texttt{size(u)}/3\). Your analysis should show that operations on a CountdownTree run in \(O(\log \texttt{n})\) amortized time. (Hint: First show that each node u satisfies some version of a balance invariant.)
Exercise 8.10. Analyze and implement a DynamiteTree. In a DynamiteTree each node u keeps track of the size of the subtree rooted at u in a variable u.size. The add(x) and remove(x) operations are exactly the same as in a standard BinarySearchTree except that, whenever one of these operations affects a node u's subtree, u explodes with probability \(1/\texttt{u.size}\). When u explodes, its entire subtree is rebuilt into a perfectly balanced binary search tree. Your analysis should show that operations on a DynamiteTree run in \(O(\log \texttt{n})\) expected time.
Exercise 8.11. Design and implement a Sequence data structure that maintains a sequence (list) of elements. It supports these operations:
- addAfter(e): Add a new element after the element e in the sequence. Return the newly added element. (If e is null, the new element is added at the beginning of the sequence.)
- remove(e): Remove e from the sequence.
- testBefore(e1,e2): return true if and only if e1 comes before e2 in the sequence.
The Sequence data structure can be implemented by storing the elements in something like a ScapegoatTree, in the same order that they occur in the sequence. To implement testBefore(e1,e2) in constant time, each element e is labelled with an integer that encodes the path from the root to e. In this way, testBefore(e1,e2) can be implemented by comparing the labels of e1 and e2.