8.1 ScapegoatTree: A Binary Search Tree with Partial Rebuilding

A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, $ \ensuremath{\ensuremath{\mathit{n}}}$, of nodes in the tree also keeps a counter, $ \ensuremath{\ensuremath{\mathit{q}}}$, that maintains an upper-bound on the number of nodes.
\hspace*{1em} \ensuremath{\mathrm{initialize}(...{\ensuremath{\mathit{q}} \gets \ensuremath{0}}\\
At all times, $ \ensuremath{\ensuremath{\mathit{n}}}$ and $ \ensuremath{\ensuremath{\mathit{q}}}$ obey the following inequalities:

$\displaystyle \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}/2 \le \ensurema...
...\mathit{n}}}} \le \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} \enspace .

In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed

$\displaystyle \log_{3/2} \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} \le ...
...}} < \log_{3/2} \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}} + 2\enspace .$ (8.1)

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced. The tree in Figure 8.1 has $ \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}=10$ and height $ 5<\log_{3/2}10 \approx 5.679$.

Figure 8.1: A ScapegoatTree with 10 nodes and height 5.

Implementing the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2). This takes time proportional to the height of the tree which, by (8.1) is $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$.

To implement the $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation, we first increment $ \ensuremath{\ensuremath{\mathit{n}}}$ and $ \ensuremath{\ensuremath{\mathit{q}}}$ and then use the usual algorithm for adding $ \ensuremath{\ensuremath{\mathit{x}}}$ to a binary search tree; we search for $ \ensuremath{\ensuremath{\mathit{x}}}$ and then add a new leaf $ \ensuremath{\ensuremath{\mathit{u}}}$ with $ \ensuremath{\ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{x}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}$. At this point, we may get lucky and the depth of $ \ensuremath{\ensuremath{\mathit{u}}}$ might not exceed $ \log_{3/2}\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$. If so, then we leave well enough alone and don't do anything else.

Unfortunately, it will sometimes happen that $ \ensuremath{\ensuremath{\mathrm{depth}(\ensuremath{\mathit{u}})}} > \log_{3/2}
\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$. In this case, we need to reduce the height. This isn't a big job; there is only one node, namely $ \ensuremath{\ensuremath{\mathit{u}}}$, whose depth exceeds $ \log_{3/2}
\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$. To fix $ \ensuremath{\ensuremath{\mathit{u}}}$, we walk from $ \ensuremath{\ensuremath{\mathit{u}}}$ back up to the root looking for a scapegoat, $ \ensuremath{\ensuremath{\mathit{w}}}$. The scapegoat, $ \ensuremath{\ensuremath{\mathit{w}}}$, is a very unbalanced node. It has the property that

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w...
...{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}}} > \frac{2}{3} \enspace ,$ (8.2)

where $ \ensuremath{\ensuremath{\mathit{w}}.\ensuremath{\mathit{child}}}$ is the child of $ \ensuremath{\ensuremath{\mathit{w}}}$ on the path from the root to $ \ensuremath{\ensuremath{\mathit{u}}}$. We'll very shortly prove that a scapegoat exists. For now, we can take it for granted. Once we've found the scapegoat $ \ensuremath{\ensuremath{\mathit{w}}}$, we completely destroy the subtree rooted at $ \ensuremath{\ensuremath{\mathit{w}}}$ and rebuild it into a perfectly balanced binary search tree. We know, from (8.2), that, even before the addition of $ \ensuremath{\ensuremath{\mathit{u}}}$, $ \ensuremath{\ensuremath{\mathit{w}}}$'s subtree was not a complete binary tree. Therefore, when we rebuild $ \ensuremath{\ensuremath{\mathit{w}}}$, the height decreases by at least 1 so that the height of the ScapegoatTree is once again at most $ \log_{3/2}\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$.

% latex2html id marker 40162\begin{flushleft}
\hspace*{1em} \e...
...urn}} \ensuremath{\ensuremath{\mathit{d}} \ge 0}\\

Figure 8.2: Inserting 3.5 into a ScapegoatTree increases its height to 6, which violates (8.1) since $ 6 > \log_{3/2} 11 \approx 5.914$. A scapegoat is found at the node containing 5.
\includegraphics[scale=0.90909]{figs-python/scapegoat-insert-3} \includegraphics[scale=0.90909]{figs-python/scapegoat-insert-4}
If we ignore the cost of finding the scapegoat $ \ensuremath{\ensuremath{\mathit{w}}}$ and rebuilding the subtree rooted at $ \ensuremath{\ensuremath{\mathit{w}}}$, then the running time of $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ is dominated by the initial search, which takes $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}) = O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time. We will account for the cost of finding the scapegoat and rebuilding using amortized analysis in the next section.

The implementation of $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ in a ScapegoatTree is very simple. We search for $ \ensuremath{\ensuremath{\mathit{x}}}$ and remove it using the usual algorithm for removing a node from a BinarySearchTree. (Note that this can never increase the height of the tree.) Next, we decrement $ \ensuremath{\ensuremath{\mathit{n}}}$, but leave $ \ensuremath{\ensuremath{\mathit{q}}}$ unchanged. Finally, we check if $ \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} > 2\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set $ \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$.
\hspace*{1em} \ensuremath{\mathrm{remove}(\ens...
...eturn}} \ensuremath{\ensuremath{\mathit{false}}}\\
Again, if we ignore the cost of rebuilding, the running time of the $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operation is proportional to the height of the tree, and is therefore $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$.

8.1.1 Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree. We first prove the correctness by showing that, when the $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation results in a node that violates Condition (8.1), then we can always find a scapegoat:

Lemma 8..1   Let $ \ensuremath{\ensuremath{\mathit{u}}}$ be a node of depth $ h>\log_{3/2} \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}$ in a ScapegoatTree. Then there exists a node $ \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$ on the path from $ \ensuremath{\ensuremath{\mathit{u}}}$ to the root such that

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w...
...h{\mathrm{size}(\mathrm{parent}(\ensuremath{\mathit{w}})})}} > 2/3 \enspace .

Proof. Suppose, for the sake of contradiction, that this is not the case, and

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w...
...\mathrm{size}(\mathrm{parent}(\ensuremath{\mathit{w}})})}} \le 2/3 \enspace .

for all nodes $ \ensuremath{\ensuremath{\mathit{w}}}$ on the path from $ \ensuremath{\ensuremath{\mathit{u}}}$ to the root. Denote the path from the root to $ \ensuremath{\ensuremath{\mathit{u}}}$ as $ \ensuremath{\ensuremath{\ensuremath{\mathit{r}}}}=\ensuremath{\ensuremath{\ens...
...h{\ensuremath{\mathit{u}}}}_h=\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}$. Then, we have $ \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_0\ensuremath{)}=\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$, $ \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_1\ensuremath{)}\le\frac{2}{3}\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$, $ \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_2\ensuremath{)}\le\frac{4}{9}\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ and, more generally,

$\displaystyle \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}...{2}{3}\right)^i\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}} \enspace .

But this gives a contradiction, since $ \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}\ge 1$, hence

$\displaystyle 1 \le \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u...
...t) \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}
= 1 \enspace . \qedhere

$ \qedsymbol$

Next, we analyze the parts of the running time that are not yet accounted for. There are two parts: The cost of calls to $ \ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ when searching for scapegoat nodes, and the cost of calls to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{w}})}$ when we find a scapegoat $ \ensuremath{\ensuremath{\mathit{w}}}$. The cost of calls to $ \ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ can be related to the cost of calls to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{w}})}$, as follows:

Lemma 8..2   During a call to $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ in a ScapegoatTree, the cost of finding the scapegoat $ \ensuremath{\ensuremath{\mathit{w}}}$ and rebuilding the subtree rooted at $ \ensuremath{\ensuremath{\mathit{w}}}$ is $ O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}})$.

Proof. The cost of rebuilding the scapegoat node $ \ensuremath{\ensuremath{\mathit{w}}}$, once we find it, is $ O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}})$. When searching for the scapegoat node, we call $ \ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ on a sequence of nodes $ \ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_0,\ldots,\ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k$ until we find the scapegoat $ \ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k=\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$. However, since $ \ensuremath{\ensuremath{\ensuremath{\mathit{u}}}}_k$ is the first node in this sequence that is a scapegoat, we know that

$\displaystyle \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}...{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{i+1}\ensuremath{)}

for all $ i\in\{0,\ldots,k-2\}$. Therefore, the cost of all calls to $ \ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ is
$\displaystyle O\left( \sum_{i=0}^k \ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u}}}}_{k-i}\ensuremath{)} \right)$ $\displaystyle =$ $\displaystyle O\left(
  $\displaystyle =$ $\displaystyle O\left(
  $\displaystyle =$ $\displaystyle O\left(
\sum_{i=0}^{k-1} \left(\frac{2}{3}\right)^i
  $\displaystyle =$ $\displaystyle O(\ensuremath{\ensuremath{size}(\ensuremath{\ensuremath{\mathit{u...
... O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{w}})}}) \enspace ,$  

where the last line follows from the fact that the sum is a geometrically decreasing series. $ \qedsymbol$

All that remains is to prove an upper-bound on the cost of all calls to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during a sequence of $ m$ operations:

Lemma 8..3   Starting with an empty ScapegoatTree any sequence of $ m$ $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operations causes at most $ O(m\log m)$ time to be used by $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ operations.

Proof. To prove this, we will use a credit scheme. We imagine that each node stores a number of credits. Each credit can pay for some constant, $ c$, units of time spent rebuilding. The scheme gives out a total of $ O(m\log m)$ credits and every call to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ is paid for with credits stored at $ \ensuremath{\ensuremath{\mathit{u}}}$.

During an insertion or deletion, we give one credit to each node on the path to the inserted node, or deleted node, $ \ensuremath{\ensuremath{\mathit{u}}}$. In this way we hand out at most $ \log_{3/2}\ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}\le \log_{3/2}m$ credits per operation. During a deletion we also store an additional credit ``on the side.'' Thus, in total we give out at most $ O(m\log m)$ credits. All that remains is to show that these credits are sufficient to pay for all calls to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$.

If we call $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during an insertion, it is because $ \ensuremath{\ensuremath{\mathit{u}}}$ is a scapegoat. Suppose, without loss of generality, that

$\displaystyle \frac{\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u...
...ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}} > \frac{2}{3} \enspace .

Using the fact that

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} ...

we deduce that

$\displaystyle \frac{1}{2}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mat...
...mathrm{size}(\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}})}} \enspace

and therefore

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}}.\en...
...3}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} \enspace .

Now, the last time a subtree containing $ \ensuremath{\ensuremath{\mathit{u}}}$ was rebuilt (or when $ \ensuremath{\ensuremath{\mathit{u}}}$ was inserted, if a subtree containing $ \ensuremath{\ensuremath{\mathit{u}}}$ was never rebuilt), we had

$\displaystyle \ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}}.\en...
...size}(\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}})}} \le 1 \enspace .

Therefore, the number of $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ or $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operations that have affected $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}}$ or $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}}$ since then is at least

$\displaystyle \frac{1}{3}\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}} - 1 \enspace .

and there are therefore at least this many credits stored at $ \ensuremath{\ensuremath{\mathit{u}}}$ that are available to pay for the $ O(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}})$ time it takes to call $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$.

If we call $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ during a deletion, it is because $ \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}} > 2\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$. In this case, we have $ \ensuremath{\ensuremath{\ensuremath{\mathit{q}}}}-\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}> \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}$ credits stored ``on the side,'' and we use these to pay for the $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time it takes to rebuild the root. This completes the proof. $ \qedsymbol$

8.1.2 Summary

The following theorem summarizes the performance of the ScapegoatTree data structure:

Theorem 8..1   A ScapegoatTree implements the SSet interface. Ignoring the cost of $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$ operations, a ScapegoatTree supports the operations $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$, $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$, and $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time per operation.

Furthermore, beginning with an empty ScapegoatTree, any sequence of $ m$ $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operations results in a total of $ O(m\log m)$ time spent during all calls to $ \ensuremath{\mathrm{rebuild}(\ensuremath{\mathit{u}})}$.