8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding

A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, $\mathtt{n}$ , of nodes in the tree, also keeps a counter, $\mathtt{q}$ , that maintains an upper bound on the number of nodes.

**Figure 8.1:** A `ScapegoatTree` with 10 nodes and height 5.
$\includegraphics{figs/scapegoat-insert-1}$

The $\mathtt{find(x)}$ operation in a ScapegoatTree uses the standard algorithm for searching in a BinarySearchTree. This takes time proportional to the height of the tree, which is $O(\log \ensuremath{\mathtt{n}})$ because the height never exceeds $\log_{3/2}\ensuremath{\mathtt{q}}$ and $\ensuremath{\mathtt{q}}\le 2\ensuremath{\mathtt{n}}$ .
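The search itself is the ordinary binary search tree descent. A minimal Python sketch follows; the `Node` class and the choice to return the matching node (or `None`) are assumptions made for illustration, not taken from the text.

```python
class Node:
    """A binary search tree node storing a value x (illustrative layout)."""
    def __init__(self, x):
        self.x = x
        self.left = None
        self.right = None

def find(root, x):
    """Return the node storing x, or None if x is not in the tree."""
    u = root
    while u is not None:
        if x < u.x:
            u = u.left       # x can only be in the left subtree
        elif x > u.x:
            u = u.right      # x can only be in the right subtree
        else:
            return u         # found it
    return None              # fell off the tree: x is absent

# A small hand-built tree:   7
#                           / \
#                          3   9
#                           \
#                            5
root = Node(7)
root.left, root.right = Node(3), Node(9)
root.left.right = Node(5)
print(find(root, 5) is not None, find(root, 4) is not None)
```

The loop visits one node per level, so its running time is proportional to the depth reached, which is at most the height of the tree.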

To implement the $\mathtt{add(x)}$ operation, we first increment $\mathtt{n}$ and $\mathtt{q}$ and then use the usual algorithm for adding $\mathtt{x}$ to a binary search tree; we search for $\mathtt{x}$ and then add a new leaf $\mathtt{u}$ with $\ensuremath{\mathtt{u.x}}=\ensuremath{\mathtt{x}}$ . At this point, we may get lucky and the depth of $\mathtt{u}$ might not exceed $\log_{3/2}\ensuremath{\mathtt{q}}$ . If so, then we leave well enough alone and don't do anything else.

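Unfortunately, it will sometimes happen that $\ensuremath{\mathtt{depth(u)}} > \log_{3/2} \ensuremath{\mathtt{q}}$ . In this case, we need to reduce the height. This isn't a big job; there is only one node, namely $\mathtt{u}$ , whose depth exceeds $\log_{3/2}\ensuremath{\mathtt{q}}$ . To fix $\mathtt{u}$ , we walk from $\mathtt{u}$ back up to the root looking for a scapegoat, $\mathtt{w}$ . The scapegoat, $\mathtt{w}$ , is a very unbalanced node. It has the property that

$\displaystyle \frac{\ensuremath{\mathtt{size(w.child)}}}{\ensuremath{\mathtt{size(w)}}} > \frac{2}{3} \enspace ,$

where $\mathtt{w.child}$ is the child of $\mathtt{w}$ on the path from the root to $\mathtt{u}$ . Once we find the scapegoat $\mathtt{w}$ , we call $\mathtt{rebuild(w)}$ to rebuild the subtree rooted at $\mathtt{w}$ into a perfectly balanced binary search tree.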

The implementation of $\mathtt{remove(x)}$ in a ScapegoatTree is very simple. We search for $\mathtt{x}$ and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement $\mathtt{n}$ but leave $\mathtt{q}$ unchanged. Finally, we check if $\ensuremath{\mathtt{q}} > 2\ensuremath{\mathtt{n}}$ and, if so, we rebuild the entire tree into a perfectly balanced binary search tree and set $\ensuremath{\mathtt{q}}=\ensuremath{\mathtt{n}}$ .
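The two update operations can be sketched together in Python. This is a sketch under stated assumptions, not the reference implementation: the names `Node`, `size`, `height`, `inorder`, `_replace` and `_rebuild` are illustrative, duplicates are rejected, and $\mathtt{n}$ and $\mathtt{q}$ are incremented after the leaf is attached rather than before. The scapegoat test assumed here is that a child holds more than $2/3$ of its parent's nodes. The depth test $\ensuremath{\mathtt{depth(u)}} > \log_{3/2}\ensuremath{\mathtt{q}}$ is evaluated exactly as $3^d > \ensuremath{\mathtt{q}}\cdot 2^d$ to avoid floating-point logarithms.

```python
class Node:
    def __init__(self, x, parent=None):
        self.x, self.parent, self.left, self.right = x, parent, None, None

def size(u):
    """Number of nodes in the subtree rooted at u (linear time)."""
    return 0 if u is None else 1 + size(u.left) + size(u.right)

def height(u):
    """Height of the subtree rooted at u; an empty subtree has height -1."""
    return -1 if u is None else 1 + max(height(u.left), height(u.right))

def inorder(u, out=None):
    """Nodes of the subtree rooted at u, in sorted order."""
    out = [] if out is None else out
    if u is not None:
        inorder(u.left, out)
        out.append(u)
        inorder(u.right, out)
    return out

class ScapegoatTree:
    def __init__(self):
        self.root, self.n, self.q = None, 0, 0  # q is an upper bound on n

    def _replace(self, parent, old, new):
        """Put new where old was: below parent, or at the root."""
        if parent is None:
            self.root = new
        elif parent.left is old:
            parent.left = new
        else:
            parent.right = new
        if new is not None:
            new.parent = parent

    def _rebuild(self, w):
        """Rebuild the subtree rooted at w into a perfectly balanced one."""
        nodes, parent = inorder(w), w.parent
        def build(lo, hi, p):  # balanced tree on nodes[lo:hi] with parent p
            if lo >= hi:
                return None
            mid = (lo + hi) // 2
            u = nodes[mid]
            u.parent = p
            u.left, u.right = build(lo, mid, u), build(mid + 1, hi, u)
            return u
        self._replace(parent, w, build(0, len(nodes), parent))

    def add(self, x):
        u, parent, depth = self.root, None, 0
        while u is not None:  # usual BST search, tracking the depth
            if x == u.x:
                return False  # assumption: duplicates are rejected
            parent, depth = u, depth + 1
            u = u.left if x < u.x else u.right
        leaf = Node(x, parent)
        if parent is None:
            self.root = leaf
        elif x < parent.x:
            parent.left = leaf
        else:
            parent.right = leaf
        self.n, self.q = self.n + 1, self.q + 1
        # depth > log_{3/2} q  <=>  3^depth > q * 2^depth (exact integers)
        if 3 ** depth > self.q * 2 ** depth:
            child = leaf
            # climb until child holds more than 2/3 of its parent's nodes
            while 3 * size(child) <= 2 * size(child.parent):
                child = child.parent
            self._rebuild(child.parent)  # child.parent is the scapegoat
        return True

    def remove(self, x):
        u = self.root
        while u is not None and u.x != x:
            u = u.left if x < u.x else u.right
        if u is None:
            return False
        if u.left is not None and u.right is not None:
            s = u.right  # successor: leftmost node of the right subtree
            while s.left is not None:
                s = s.left
            u.x = s.x  # keep the successor's value here ...
            u = s      # ... and splice out the successor node instead
        self._replace(u.parent, u, u.left if u.left is not None else u.right)
        self.n -= 1
        if self.q > 2 * self.n:  # q has drifted too far above n
            if self.root is not None:
                self._rebuild(self.root)
            self.q = self.n
        return True

t = ScapegoatTree()
for x in range(1, 101):  # sorted insertions, the worst case for a plain BST
    t.add(x)
```

Inserting keys in sorted order would turn an unbalanced binary search tree into a path; here the partial rebuilds keep the height at most $\log_{3/2}\ensuremath{\mathtt{q}}$ . Note that `size` recomputes subtree sizes by traversal instead of storing them, which is what the analysis of the cost of $\mathtt{size(u)}$ calls accounts for.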

8.1.1 Analysis of Correctness and Running-Time

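In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove correctness by showing that, when the $\mathtt{add(x)}$ operation results in a node $\mathtt{u}$ with $\ensuremath{\mathtt{depth(u)}} > \log_{3/2}\ensuremath{\mathtt{q}}$ , we can always find a scapegoat, that is, a node $\mathtt{w}$ on the path from $\mathtt{u}$ to the root with $\ensuremath{\mathtt{size(w)}}/\ensuremath{\mathtt{size(parent(w))}} > 2/3$ .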

Proof. Suppose, for the sake of contradiction, that this is not the case, and

$\displaystyle \frac{\ensuremath{\mathtt{size(w)}}}{\ensuremath{\mathtt{size(parent(w))}}} \le 2/3$

for all nodes $\mathtt{w}$ on the path from $\mathtt{u}$ to the root. Denote the path from the root to $\mathtt{u}$ as $\ensuremath{\mathtt{r}}=\ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_h=\ensuremath{\mathtt{u}}$ . Then, we have $\ensuremath{\mathtt{size(u}}_0\ensuremath{\mathtt{)}}=\ensuremath{\mathtt{n}}$ , $\ensuremath{\mathtt{size(u}}_1\ensuremath{\mathtt{)}}\le\frac{2}{3}\ensuremath{\mathtt{n}}$ , $\ensuremath{\mathtt{size(u}}_2\ensuremath{\mathtt{)}}\le\frac{4}{9}\ensuremath{\mathtt{n}}$ and, more generally,

$\displaystyle \ensuremath{\mathtt{size(u}}_i\ensuremath{\mathtt{)}}\le\left(\frac{2}{3}\right)^i\ensuremath{\mathtt{n}} \enspace .$

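But this gives a contradiction, since $\ensuremath{\mathtt{size(u)}}\ge 1$ , $h > \log_{3/2}\ensuremath{\mathtt{q}}$ and $\ensuremath{\mathtt{q}}\ge\ensuremath{\mathtt{n}}$ ; hence

$\displaystyle 1 \le \ensuremath{\mathtt{size(u)}} \le \left(\frac{2}{3}\right)^h\ensuremath{\mathtt{n}} < \left(\frac{2}{3}\right)^{\log_{3/2}\ensuremath{\mathtt{q}}}\ensuremath{\mathtt{n}} \le \left(\frac{2}{3}\right)^{\log_{3/2}\ensuremath{\mathtt{n}}}\ensuremath{\mathtt{n}} = \left(\frac{1}{\ensuremath{\mathtt{n}}}\right) \ensuremath{\mathtt{n}} = 1 \enspace .$

$\qedsymbol$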
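As a concrete instance of this argument (the numbers are chosen only for illustration), take a tree with $\ensuremath{\mathtt{n}}=\ensuremath{\mathtt{q}}=10$ . Then $\log_{3/2} 10 \approx 5.68$ , so a new leaf $\mathtt{u}$ violates the depth bound as soon as its depth reaches $h=6$ . If no node on the path to $\mathtt{u}$ were a scapegoat, we would have

$\displaystyle \ensuremath{\mathtt{size(u)}} \le \left(\frac{2}{3}\right)^{6}\cdot 10 = \frac{640}{729} \approx 0.88 < 1 \enspace ,$

which is impossible because the subtree rooted at $\mathtt{u}$ contains at least $\mathtt{u}$ itself.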

Next, we analyze the parts of the running time that we have not yet accounted for. There are two parts: the cost of calls to $\mathtt{size(u)}$ when searching for scapegoat nodes, and the cost of calls to $\mathtt{rebuild(w)}$ when we find a scapegoat $\mathtt{w}$ . The cost of calls to $\mathtt{size(u)}$ can be related to the cost of calls to $\mathtt{rebuild(w)}$ : during a call to $\mathtt{add(x)}$ , the total cost of finding the scapegoat $\mathtt{w}$ and rebuilding the subtree rooted at $\mathtt{w}$ is $O(\ensuremath{\mathtt{size(w)}})$ .

Proof. The cost of rebuilding the scapegoat node $\mathtt{w}$ , once we find it, is $O(\ensuremath{\mathtt{size(w)}})$ . When searching for the scapegoat node, we call $\mathtt{size(u)}$ on a sequence of nodes $\ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_k$ until we find the scapegoat $\ensuremath{\mathtt{u}}_k=\ensuremath{\mathtt{w}}$ . However, since $\ensuremath{\mathtt{u}}_k$ is the first node in this sequence that is a scapegoat, we know that

$\displaystyle \ensuremath{\mathtt{size(u}}_{i}\ensuremath{\mathtt{)}} \le \frac{2}{3}\ensuremath{\mathtt{size(u}}_{i+1}\ensuremath{\mathtt{)}}$

for all $i\in\{0,\ldots,k-2\}$ . Therefore, the cost of all calls to $\mathtt{size(u)}$ is

$\displaystyle O\left( \sum_{i=0}^k \ensuremath{\mathtt{size(u}}_{k-i}\ensuremath{\mathtt{)}} \right)$	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\mathtt{size(u}}_k\ensuremath{\mathtt{)}} + \sum_{i=0}^{k-1} \ensuremath{\mathtt{size(u}}_{k-i-1}\ensuremath{\mathtt{)}} \right)$
	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\mathtt{size(u}}_k\ensuremath{\mathtt{)}} + \sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i\ensuremath{\mathtt{size(u}}_{k}\ensuremath{\mathtt{)}} \right)$
	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\mathtt{size(u}}_k\ensuremath{\mathtt{)}}\left(1+ \sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i \right)\right)$
	$\displaystyle =$	$\displaystyle O(\ensuremath{\mathtt{size(u}}_k\ensuremath{\mathtt{)}}) = O(\ensuremath{\mathtt{size(w)}}) \enspace ,$

where the last line follows from the fact that the sum is a geometrically decreasing series. $\qedsymbol$
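To make the hidden constant explicit (a worked bound added for illustration, using only the geometric series), note that

$\displaystyle 1+\sum_{i=0}^{k-1}\left(\frac{2}{3}\right)^i < 1+\sum_{i=0}^{\infty}\left(\frac{2}{3}\right)^i = 1+\frac{1}{1-2/3} = 4 \enspace ,$

so the sum of subtree sizes examined while searching for the scapegoat is less than $4\cdot\ensuremath{\mathtt{size(w)}}$ , regardless of how long the walk up from $\mathtt{u}$ is.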

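All that remains is to prove an upper bound on the cost of calls to $\mathtt{rebuild(u)}$ : starting with an empty ScapegoatTree, any sequence of $m$ $\mathtt{add(x)}$ and $\mathtt{remove(x)}$ operations spends at most $O(m\log m)$ time in $\mathtt{rebuild(u)}$ operations.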

Proof. To prove this, we will use a credit scheme. Each node stores a number of credits. Each credit can pay for some constant number of units of time spent rebuilding. The scheme gives out a total of $O(m\log m)$ credits and every call to $\mathtt{rebuild(u)}$ is paid for with credits stored at $\mathtt{u}$ .

During an insertion or deletion, we give one credit to each node on the path to the inserted or deleted node, $\mathtt{u}$ . In this way, we hand out at most $\log_{3/2}\ensuremath{\mathtt{q}}\le \log_{3/2}m$ credits per operation. During a deletion, we also store one additional credit "on the side." Thus, in total, we give out at most $O(m\log m)$ credits. All that remains is to show that these credits are sufficient to pay for all calls to $\mathtt{rebuild(u)}$ .

If we call $\mathtt{rebuild(u)}$ during an insertion, it is because $\mathtt{u}$ is a scapegoat. Suppose, without loss of generality, that

$\displaystyle \frac{\ensuremath{\mathtt{size(u.left)}}}{\ensuremath{\mathtt{size(u)}}} > \frac{2}{3} \enspace .$

Using the fact that

$\displaystyle \ensuremath{\mathtt{size(u)}} = 1 + \ensuremath{\mathtt{size(u.left)}} + \ensuremath{\mathtt{size(u.right)}}$

we deduce that

$\displaystyle \frac{1}{2}\ensuremath{\mathtt{size(u.left)}} > \ensuremath{\mathtt{size(u.right)}} \enspace$

and therefore

$\displaystyle \ensuremath{\mathtt{size(u.left)}} - \ensuremath{\mathtt{size(u.right)}} > \frac{1}{2}\ensuremath{\mathtt{size(u.left)}} > \frac{1}{3}\ensuremath{\mathtt{size(u)}} \enspace .$
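The intermediate algebra behind these two deductions can be spelled out as follows (a worked derivation using only the scapegoat inequality and the size identity above). Substituting the identity into $\ensuremath{\mathtt{size(u.left)}} > \frac{2}{3}\ensuremath{\mathtt{size(u)}}$ gives

$\displaystyle \frac{1}{3}\ensuremath{\mathtt{size(u.left)}} > \frac{2}{3} + \frac{2}{3}\ensuremath{\mathtt{size(u.right)}} \quad\Longrightarrow\quad \frac{1}{2}\ensuremath{\mathtt{size(u.left)}} > 1 + \ensuremath{\mathtt{size(u.right)}} > \ensuremath{\mathtt{size(u.right)}} \enspace ,$

and then

$\displaystyle \ensuremath{\mathtt{size(u.left)}} - \ensuremath{\mathtt{size(u.right)}} > \ensuremath{\mathtt{size(u.left)}} - \frac{1}{2}\ensuremath{\mathtt{size(u.left)}} = \frac{1}{2}\ensuremath{\mathtt{size(u.left)}} > \frac{1}{2}\cdot\frac{2}{3}\ensuremath{\mathtt{size(u)}} = \frac{1}{3}\ensuremath{\mathtt{size(u)}} \enspace .$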

Now, the last time a subtree containing $\mathtt{u}$ was rebuilt (or when $\mathtt{u}$ was inserted, if a subtree containing $\mathtt{u}$ was never rebuilt), we had

$\displaystyle \ensuremath{\mathtt{size(u.left)}} - \ensuremath{\mathtt{size(u.right)}} \le 1 \enspace .$

Therefore, the number of $\mathtt{add(x)}$ or $\mathtt{remove(x)}$ operations that have affected $\mathtt{u.left}$ or $\mathtt{u.right}$ since then is at least

$\displaystyle \frac{1}{3}\ensuremath{\mathtt{size(u)}} - 1 \enspace ,$

and there are therefore at least this many credits stored at $\mathtt{u}$ that are available to pay for the $O(\ensuremath{\mathtt{size(u)}})$ time it takes to call $\mathtt{rebuild(u)}$ .

If we call $\mathtt{rebuild(u)}$ during a deletion, it is because $\ensuremath{\mathtt{q}} > 2\ensuremath{\mathtt{n}}$ . In this case, we have $\ensuremath{\mathtt{q}}-\ensuremath{\mathtt{n}}> \ensuremath{\mathtt{n}}$ credits stored "on the side" and we use these to pay for the $O(\ensuremath{\mathtt{n}})$ time it takes to rebuild the root. This completes the proof. $\qedsymbol$
