ScapegoatTree: A Binary Search Tree with Partial Rebuilding
A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, n, of nodes in the tree, also keeps a counter, q, that maintains an upper-bound on the number of nodes.

  int q;

At all times, n and q obey the following inequalities:

  q/2 ≤ n ≤ q .
In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed

  log_{3/2} q ≤ log_{3/2} 2n < log_{3/2} n + 2 .          (8.1)

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced.  The tree in Figure 8.1 has q = n = 10 and height 5 < log_{3/2} 10 ≈ 5.679.
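The code later in this section compares the depth of a newly added node against this bound using a helper called log32(q).  The helper's implementation is not shown here; a minimal sketch, assuming it should return roughly ⌈log_{3/2} q⌉, might look like this:

  #include <cmath>

  // Sketch: log_{3/2}(q) = ln(q)/ln(3/2); round up so that the comparison
  // d > log32(q) only triggers when the depth truly exceeds the bound.
  static int log32(int q) {
    return (int)std::ceil(std::log((double)q) / std::log(1.5));
  }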
Implementing the find(x) operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2).  This takes time proportional to the height of the tree which, by (8.1), is O(log n).
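As a reminder, that standard search just walks down from the root, branching left or right according to comparisons with x.  The following is only a sketch, assuming the node fields x, left, and right and the sentinel nil used in the code excerpts below; the real find(x) interface may return a value rather than a bool:

  // Sketch of the standard binary search tree search.
  bool contains(T x) {
    Node *w = r;
    while (w != nil) {
      if (x < w->x)
        w = w->left;        // x is smaller, continue in the left subtree
      else if (w->x < x)
        w = w->right;       // x is larger, continue in the right subtree
      else
        return true;        // found a node storing x
    }
    return false;           // fell off the tree; x is not present
  }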
To implement the add(x) operation, we first increment n and q and then use the usual algorithm for adding x to a binary search tree; we search for x and then add a new leaf u with u.x = x.  At this point, we may get lucky and the depth of u might not exceed log_{3/2} q.  If so, then we leave well enough alone and don't do anything else.
Unfortunately, it will sometimes happen that depth(u) > log_{3/2} q.  In this case, we need to reduce the height.  This isn't a big job; there is only one node, namely u, whose depth exceeds log_{3/2} q.  To fix u, we walk from u back up to the root looking for a scapegoat, w.  The scapegoat, w, is a very unbalanced node.  It has the property that

  size(w.child)/size(w) > 2/3 ,          (8.2)

where w.child is the child of w on the path from the root to u.  We'll very shortly prove that a scapegoat exists.  For now, we can take it for granted.  Once we've found the scapegoat w, we completely destroy the subtree rooted at w and rebuild it into a perfectly balanced binary search tree.  We know, from (8.2), that, even before the addition of u, w's subtree was not a complete binary tree.  Therefore, when we rebuild w, the height decreases by at least 1 so that the height of the ScapegoatTree is once again at most log_{3/2} q.
  bool add(T x) {
    // first do basic insertion keeping track of depth
    Node *u = new Node;
    u->x = x;
    u->left = u->right = u->parent = nil;
    int d = addWithDepth(u);
    if (d > log32(q)) {
      // depth exceeded, find scapegoat
      Node *w = u->parent;
      int a = BinaryTree<Node>::size(w);
      int b = BinaryTree<Node>::size(w->parent);
      while (3*a <= 2*b) {
        w = w->parent;
        a = BinaryTree<Node>::size(w);
        b = BinaryTree<Node>::size(w->parent);
      }
      rebuild(w->parent);
    }
    return d >= 0;
  }
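The call to addWithDepth(u) above performs the usual leaf insertion while keeping track of how deep the new node ends up, and signals a duplicate key with a negative return value.  Its implementation is not shown in this section; the following is only a sketch of what such a helper might look like, assuming the members r, nil, n, and q used elsewhere in the excerpts:

  // Sketch: attach the pre-allocated leaf u by the usual search-and-attach
  // walk; return the depth of u, or -1 if an equal key is already present.
  int addWithDepth(Node *u) {
    Node *w = r;
    if (w == nil) {          // empty tree: u becomes the root, at depth 0
      r = u;
      n++; q++;
      return 0;
    }
    int d = 0;
    while (true) {
      d++;                   // descending one level
      if (u->x < w->x) {
        if (w->left == nil) { w->left = u; break; }
        w = w->left;
      } else if (w->x < u->x) {
        if (w->right == nil) { w->right = u; break; }
        w = w->right;
      } else {
        return -1;           // duplicate key: nothing is added
      }
    }
    u->parent = w;
    n++; q++;                // as in the prose: add(x) increments both n and q
    return d;
  }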
If we ignore the cost of finding the scapegoat w and rebuilding the subtree rooted at w, then the running time of add(x) is dominated by the initial search, which takes O(log q) = O(log n) time.  We will account for the cost of finding the scapegoat and rebuilding using amortized analysis in the next section.
The implementation of remove(x) in a ScapegoatTree is very simple.  We search for x and remove it using the usual algorithm for removing a node from a BinarySearchTree.  (Note that this can never increase the height of the tree.)  Next, we decrement n, but leave q unchanged.  Finally, we check if q > 2n and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set q = n.
  bool remove(T x) {
    if (BinarySearchTree<Node,T>::remove(x)) {
      if (2*n < q) {
        rebuild(r);
        q = n;
      }
      return true;
    }
    return false;
  }
Again, if we ignore the cost of rebuilding, the running time of the remove(x) operation is proportional to the height of the tree, and is therefore O(log n).
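Both add(x) and remove(x) rely on rebuild(u), which replaces the subtree rooted at u with a perfectly balanced binary search tree on the same nodes.  Its implementation is not shown in this section.  One plausible way to do it, sketched below using the members r and nil from the excerpts above (packIntoArray and buildBalanced are illustrative helper names), is to flatten the subtree into an array by an in-order traversal and then rebuild recursively, making the middle element the root at each step:

  // Sketch: in-order traversal writes the subtree's nodes into a[] in sorted order.
  int packIntoArray(Node *u, Node **a, int i) {
    if (u == nil) return i;
    i = packIntoArray(u->left, a, i);
    a[i++] = u;
    return packIntoArray(u->right, a, i);
  }

  // Sketch: rebuild a[i..i+ns-1] into a perfectly balanced subtree, return its root.
  Node *buildBalanced(Node **a, int i, int ns) {
    if (ns == 0) return nil;
    int m = ns / 2;                          // middle node becomes the subtree root
    a[i+m]->left = buildBalanced(a, i, m);
    if (a[i+m]->left != nil) a[i+m]->left->parent = a[i+m];
    a[i+m]->right = buildBalanced(a, i+m+1, ns-m-1);
    if (a[i+m]->right != nil) a[i+m]->right->parent = a[i+m];
    return a[i+m];
  }

  // Sketch: replace the subtree rooted at u by its balanced rebuild.
  void rebuild(Node *u) {
    int ns = BinaryTree<Node>::size(u);
    Node *p = u->parent;
    Node **a = new Node*[ns];
    packIntoArray(u, a, 0);
    Node *root = buildBalanced(a, 0, ns);
    if (root != nil) root->parent = p;
    if (p == nil)           r = root;        // u was the root of the whole tree
    else if (p->right == u) p->right = root;
    else                    p->left = root;
    delete[] a;
  }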
Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree.  We first prove the correctness by showing that, when the add(x) operation results in a node that violates Condition (8.1), then we can always find a scapegoat:
Lemma 8.1.  Let u be a node of depth h > log_{3/2} q in a ScapegoatTree.  Then there exists a node w on the path from u to the root such that

  size(w)/size(parent(w)) > 2/3 .

Proof.  Suppose, for the sake of contradiction, that this is not the case, and

  size(w)/size(parent(w)) ≤ 2/3

for all nodes w on the path from u to the root.  Denote the path from the root to u as r = u_0, ..., u_h = u.  Then, we have size(u_0) = n, size(u_1) ≤ (2/3)n, size(u_2) ≤ (2/3)^2 n and, more generally,

  size(u_i) ≤ (2/3)^i n .

But this gives a contradiction, since size(u) ≥ 1; we would have

  1 ≤ size(u) ≤ (2/3)^h n < (2/3)^{log_{3/2} q} n = (1/q) n ≤ 1 ,

where the last inequality uses n ≤ q.
Next, we analyze the parts of the running time that are not yet accounted for.  There are two parts:  The cost of calls to size(u) when searching for scapegoat nodes, and the cost of calls to rebuild(w) when we find a scapegoat w.  The cost of calls to size(u) can be related to the cost of calls to rebuild(w), as follows:
Lemma 8.2.  During a call to add(x) in a ScapegoatTree, the cost of finding the scapegoat w and rebuilding the subtree rooted at w is O(size(w)).
Proof.  The cost of rebuilding the scapegoat node w, once we find it, is O(size(w)).  When searching for the scapegoat node, we call size(u) on a sequence of nodes u_0, ..., u_k until we find the scapegoat u_k = w.  However, since u_k is the first node in this sequence that is a scapegoat, we know that

  size(u_i) ≤ (2/3) size(u_{i+1})

for all i in {0, ..., k-2}.  Therefore, the cost of all calls to size(u) is

  O( size(u_k) + size(u_{k-1}) + size(u_{k-2}) + ... + size(u_0) )
    ≤ O( size(u_k) + size(u_{k-1}) (1 + (2/3) + (2/3)^2 + ...) )
    = O( size(u_k) + 3 size(u_{k-1}) )
    = O( size(u_k) ) = O( size(w) ) ,

since the sum in the second line is a geometrically decreasing series and size(u_{k-1}) ≤ size(u_k).
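This accounting matters because nodes in a ScapegoatTree are not assumed to store their subtree sizes; a call such as BinaryTree<Node>::size(w) in the add code above typically counts the subtree by traversing it, which is exactly why one call costs time proportional to size(w).  A minimal sketch of such a routine, assuming the convention that the sentinel nil has size 0:

  // Sketch: count the nodes in the subtree rooted at u by visiting each one.
  int size(Node *u) {
    if (u == nil) return 0;
    return 1 + size(u->left) + size(u->right);
  }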
 
All that remains is to prove an upper-bound on the cost of all calls to rebuild(u) during a sequence of m operations:
Lemma 8.3.  Starting with an empty ScapegoatTree, any sequence of m add(x) and remove(x) operations causes at most O(m log m) time to be used by rebuild(u) operations.
Proof.  To prove this, we will use a credit scheme.  We imagine that each node stores a number of credits.  Each credit can pay for some constant, c, units of time spent rebuilding.  The scheme gives out a total of O(m log m) credits and every call to rebuild(u) is paid for with credits stored at u.
During an insertion or deletion, we give one credit to each node on the path to the inserted node, or deleted node, u.  In this way we hand out at most log_{3/2} q ≤ log_{3/2} m credits per operation.  During a deletion we also store an additional credit ``on the side.''  Thus, in total we give out at most O(m log m) credits.  All that remains is to show that these credits are sufficient to pay for all calls to rebuild(u).
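Spelled out, the total handed out over the whole sequence is at most

  m (log_{3/2} m + 1) = O(m log m)

credits: at most log_{3/2} q ≤ log_{3/2} m credits along the search path of each operation, plus the one extra credit stored on the side by each deletion.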
If we call rebuild(u) during an insertion, it is because u is a scapegoat.  Suppose, without loss of generality, that

  size(u.left)/size(u) > 2/3 .

Using the fact that

  size(u) = 1 + size(u.left) + size(u.right)

we deduce that

  (1/2) size(u.left) > size(u.right)

and therefore

  size(u.left) - size(u.right) > (1/2) size(u.left) > (1/3) size(u) .

Now, the last time a subtree containing u was rebuilt (or when u was inserted, if a subtree containing u was never rebuilt), we had

  size(u.left) - size(u.right) ≤ 1 .

Therefore, the number of add(x) or remove(x) operations that have affected u.left or u.right since then is at least

  (1/3) size(u) - 1 ,

and there are therefore at least this many credits stored at u that are available to pay for the O(size(u)) time it takes to call rebuild(u).
If we call rebuild(u) during a deletion, it is because q > 2n.  In this case, we have q - n > n credits stored ``on the side,'' and we use these to pay for the O(n) time it takes to rebuild the root.  This completes the proof.
Summary

The following theorem summarizes the performance of the ScapegoatTree data structure:
Theorem 8.1.  A ScapegoatTree implements the SSet interface.  Ignoring the cost of rebuild(u) operations, a ScapegoatTree supports the operations add(x), remove(x), and find(x) in O(log n) time per operation.
Furthermore, beginning with an empty ScapegoatTree, any sequence of m add(x) and remove(x) operations results in a total of O(m log m) time spent during all calls to rebuild(u).
opendatastructures.org