8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding

A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, $\ensuremath{\ensuremath{\mathit{n}}}$ , of nodes in the tree also keeps a counter, $\ensuremath{\ensuremath{\mathit{q}}}$ , that maintains an upper-bound on the number of nodes.
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{initialize}(... ...th{\ensuremath{\mathit{q}} \gets \ensuremath{0}}\\ \end{flushleft}\end{leftbar}$
At all times, $\ensuremath{\ensuremath{\mathit{n}}}$ and $\ensuremath{\ensuremath{\mathit{q}}}$ obey the following inequalities:

**Figure 8.1:** A ScapegoatTree with 10 nodes and height 5.
$\includegraphics[scale=0.90909]{figs-python/scapegoat-insert-1}$

Implementing the $\ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time proportional to the height of the tree which, by (8.1) is $O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ .

To implement the $\ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation, we first increment $\ensuremath{\ensuremath{\mathit{n}}}$ and $\ensuremath{\ensuremath{\mathit{q}}}$ and then use the usual algorithm for adding $\ensuremath{\ensuremath{\mathit{x}}}$ to a binary search tree; we search for $\ensuremath{\ensuremath{\mathit{x}}}$ and then add a new leaf $\ensuremath{\ensuremath{\mathit{u}}}$ with $\ensuremath{\ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{x}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}$ . At this point, we may get lucky and the depth of $\ensuremath{\ensuremath{\mathit{u}}}$ might not exceed $\log_{3/2}\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$ . If so, then we leave well enough alone and don't do anything else.

Unfortunately, it will sometimes happen that $\ensuremath{\ensuremath{\mathrm{depth}(\ensuremath{\mathit{u}})}} > \log_{3/2} \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$ . In this case, we need to reduce the height. This isn't a big job; there is only one node, namely $\ensuremath{\ensuremath{\mathit{u}}}$ , whose depth exceeds $\log_{3/2} \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$ . To fix $\ensuremath{\ensuremath{\mathit{u}}}$ , we walk from $\ensuremath{\ensuremath{\mathit{u}}}$ back up to the root looking for a scapegoat, $\ensuremath{\ensuremath{\mathit{w}}}$ . The scapegoat, $\ensuremath{\ensuremath{\mathit{w}}}$ , is a very unbalanced node. It has the property that

$\begin{leftbar} % latex2html id marker 40162\begin{flushleft} \hspace*{1em} \e... ...urn}} \ensuremath{\ensuremath{\mathit{d}} \ge 0}\\ \end{flushleft}\end{leftbar}$

The implementation of $\ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ in a ScapegoatTree is very simple. We search for $\ensuremath{\ensuremath{\mathit{x}}}$ and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement $\ensuremath{\ensuremath{\mathit{n}}}$ , but leave $\ensuremath{\ensuremath{\mathit{q}}}$ unchanged. Finally, we check if $\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} > 2\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set $\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ .
$\begin{leftbar} \begin{flushleft} \hspace*{1em} \ensuremath{\mathrm{remove}(\ens... ...eturn}} \ensuremath{\ensuremath{\mathit{false}}}\\ \end{flushleft}\end{leftbar}$
Again, if we ignore the cost of rebuilding, the running time of the $\ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operation is proportional to the height of the tree, and is therefore $O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ .

8.1.1 Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove the correctness by showing that, when the $\ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation results in a node that violates Condition (8.1), then we can always find a scapegoat:

Proof. Suppose, for the sake of contradiction, that this is not the case, and

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w... ...\mathrm{size}(\mathrm{parent}(\ensuremath{\mathit{w}})})}} \le 2/3 \enspace .$

for all nodes $\ensuremath{\ensuremath{\mathit{w}}}$ on the path from $\ensuremath{\ensuremath{\mathit{u}}}$ to the root. Denote the path from the root to $\ensuremath{\ensuremath{\mathit{u}}}$ as $\ensuremath{\ensuremath{\ensuremath{\mathit{r}}}}=\ensuremath{\ensuremath{\ens... ...h{\ensuremath{\mathit{u}}}}_h=\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}$ . Then, we have $\ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_0\ensuremath{)}=\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ , $\ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_1\ensuremath{)}\le\frac{2}{3}\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ , $\ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_2\ensuremath{)}\le\frac{4}{9}\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ and, more generally,

$\displaystyle \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}... ...ac{2}{3}\right)^i\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}} \enspace .$

But this gives a contradiction, since $\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}\ge 1$ , hence

$\displaystyle 1 \le \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u... ...t) \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}} = 1 \enspace . \qedhere$

$\qedsymbol$

Next, we analyze the parts of the running time that are not yet accounted for. There are two parts: The cost of calls to $\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ when searching for scapegoat nodes, and the cost of calls to $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{w}})}$ when we find a scapegoat $\ensuremath{\ensuremath{\mathit{w}}}$ . The cost of calls to $\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ can be related to the cost of calls to $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{w}})}$ , as follows:

Proof. The cost of rebuilding the scapegoat node $\ensuremath{\ensuremath{\mathit{w}}}$ , once we find it, is $O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}})$ . When searching for the scapegoat node, we call $\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ on a sequence of nodes $\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_0,\ldots,\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k$ until we find the scapegoat $\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k=\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$ . However, since $\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k$ is the first node in this sequence that is a scapegoat, we know that

$\displaystyle \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}... ...th{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{i+1}\ensuremath{)}$

for all $i\in\{0,\ldots,k-2\}$ . Therefore, the cost of all calls to $\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ is

$\displaystyle O\left( \sum_{i=0}^k \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{k-i}\ensuremath{)} \right)$	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\ma... ...emath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{k-i-1}\ensuremath{)} \right)$
	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\ma... ...nsuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{k}\ensuremath{)} \right)$
	$\displaystyle =$	$\displaystyle O\left( \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\ma... ...nsuremath{)}\left(1+ \sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i \right)\right)$
	$\displaystyle =$	$\displaystyle O(\ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u... ... O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}}) \enspace ,$

where the last line follows from the fact that the sum is a geometrically decreasing series. $\qedsymbol$

All that remains is to prove an upper-bound on the cost of all calls to $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during a sequence of

operations:

Proof. To prove this, we will use a credit scheme. We imagine that each node stores a number of credits. Each credit can pay for some constant,

, units of time spent rebuilding. The scheme gives out a total of $O(m\log m)$ credits and every call to $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ is paid for with credits stored at $\ensuremath{\ensuremath{\mathit{u}}}$ .

During an insertion or deletion, we give one credit to each node on the path to the inserted node, or deleted node, $\ensuremath{\ensuremath{\mathit{u}}}$ . In this way we hand out at most $\log_{3/2}\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}\le \log_{3/2}m$ credits per operation. During a deletion we also store an additional credit ``on the side.'' Thus, in total we give out at most $O(m\log m)$ credits. All that remains is to show that these credits are sufficient to pay for all calls to $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ .

If we call $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during an insertion, it is because $\ensuremath{\ensuremath{\mathit{u}}}$ is a scapegoat. Suppose, without loss of generality, that

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u... ...ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}} > \frac{2}{3} \enspace .$

Using the fact that

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} ... ...uremath{\mathrm{size}(\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}})}}$

we deduce that

$\displaystyle \frac{1}{2}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mat... ...mathrm{size}(\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}})}} \enspace$

and therefore

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}}.\en... ...3}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} \enspace .$

Now, the last time a subtree containing $\ensuremath{\ensuremath{\mathit{u}}}$ was rebuilt (or when $\ensuremath{\ensuremath{\mathit{u}}}$ was inserted, if a subtree containing $\ensuremath{\ensuremath{\mathit{u}}}$ was never rebuilt), we had

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}}.\en... ...size}(\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}})}} \le 1 \enspace .$

Therefore, the number of $\ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ or $\ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operations that have affected $\ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}}$ or $\ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}}$ since then is at least

$\displaystyle \frac{1}{3}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} - 1 \enspace .$

and there are therefore at least this many credits stored at $\ensuremath{\ensuremath{\mathit{u}}}$ that are available to pay for the $O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}})$ time it takes to call $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ .

If we call $\ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during a deletion, it is because $\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} > 2\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ . In this case, we have $\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}> \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ credits stored ``on the side,'' and we use these to pay for the $O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time it takes to rebuild the root. This completes the proof. $\qedsymbol$

8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding

8.1.1 Analysis of Correctness and Running-Time

8.1.2 Summary