8.2 Discussion and Exercises

The term scapegoat tree is due to Galperin and Rivest [30], who define and analyze these trees. However, the same structure was discovered earlier by Andersson [4,6], who called them general balanced trees since they can have any shape as long as their height is small.

Experimenting with the $ \mathtt{ScapegoatTree}$ implementation will reveal that it is often considerably slower than the other $ \mathtt{SSet}$ implementations in this book. This may be somewhat surprising, since height bound of

$\displaystyle \log_{3/2}\ensuremath{\mathtt{q}} \approx 1.709\log \ensuremath{\mathtt{n}} + O(1)
$

is better than the expected length of a search path in a $ \mathtt{Skiplist}$ and not too far from that of a $ \mathtt{Treap}$. The implementation could be optimized by storing the sizes of subtrees explicitly at each node or by reusing already computed subtree sizes (Exercises 8.5 and 8.6). Even with these optimizations, there will always be sequences of $ \mathtt{add(x)}$ and $ \mathtt{delete(x)}$ operation for which a $ \mathtt{ScapegoatTree}$ takes longer than other $ \mathtt{SSet}$ implementations.

This gap in performance is due to the fact that, unlike the other $ \mathtt{SSet}$ implementations discussed in this book, a $ \mathtt{ScapegoatTree}$ can spend a lot of time restructuring itself. Exercise 8.3 asks you to prove that there are sequences of $ \mathtt{n}$ operations in which a $ \mathtt{ScapegoatTree}$ will spend on the order of $ \ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}$ time in calls to $ \mathtt{rebuild(u)}$. This is in contrast to other $ \mathtt{SSet}$ implementations discussed in this book that only make $ O(\ensuremath{\mathtt{n}})$ structural changes during a sequence of $ \mathtt{n}$ operations. This is, unfortunately, a necessary consequence of the fact that a $ \mathtt{ScapegoatTree}$ does all its restructuring by calls to $ \mathtt{rebuild(u)}$ [17].

Despite their lack of performance, there are applications in which a $ \mathtt{ScapegoatTree}$ could be the right choice. This would occur any time there is additional data associated with nodes that cannot be updated in constant time when a rotation is performed, but that can be updated during a $ \mathtt{rebuild(u)}$ operation. In such cases, the $ \mathtt{ScapegoatTree}$ and related structures based on partial rebuilding may work. An example of such an application is outlined in Exercise 8.12.

Exercise 8..1   Illustrate the addition of the values 1.5 and then 1.6 on the $ \mathtt{ScapegoatTree}$ in Figure 8.1.

Exercise 8..2   Illustrate what happens when we start with an empty $ \mathtt{ScapegoatTree}$ and add the sequence $ 1,5,2,4,3$ and show where the credits described in the proof of Lemma 8.3 go, and how they are used during this sequence of additions.

Exercise 8..3   Show that, if we start with an empty $ \mathtt{ScapegoatTree}$ and call $ \mathtt{add(x)}$ for $ \ensuremath{\mathtt{x}}=1,2,3,\ldots,\ensuremath{\mathtt{n}}$, then the total time spent during calls to $ \mathtt{rebuild(u)}$ is at least $ c\ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}$ for some constant $ c>0$.

Exercise 8..4   The $ \mathtt{ScapegoatTree}$, as described in this chapter, guarantees that the length of the search path does not exceed $ \log_{3/2}\ensuremath{\mathtt{q}}$.
  1. Design, analyze, and implement a modified version of $ \mathtt{ScapegoatTree}$ where the length of the search path does not exceed $ \log_{\ensuremath{\mathtt{b}}} \ensuremath{\mathtt{q}}$, where $ \mathtt{b}$ is a parameter with $ 1<\ensuremath{\mathtt{b}}<2$.
  2. What does your analysis and/or experiments say about the amortized cost of $ \mathtt{find(x)}$, $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ as a function of $ \mathtt{n}$ and $ \mathtt{b}$?

Exercise 8..5   Modify the $ \mathtt{add(x)}$ method of the $ \mathtt{ScapegoatTree}$ so that it does not waste any time recomputing the sizes of subtrees that it has already computed the size of. This is possible because, by the time the method wants to compute $ \mathtt{size(w)}$, it has already computed $ \mathtt{size(w.left)}$ or $ \mathtt{size(w.right)}$. Compare the performance of your modified implementation with the implementation given here.

Exercise 8..6   Implement a second version of the $ \mathtt{ScapegoatTree}$ data structure that explicitly stores and maintains the sizes of the subtree rooted at each node. Compare the performance of the resulting implementation with that of the original $ \mathtt{ScapegoatTree}$ implementation as well as the implementation from Exercise 8.5.

Exercise 8..7   Reimplement the $ \mathtt{rebuild(u)}$ method discussed at the beginning of this chapter so that it does not require the use of an array to store the nodes of the subtree being rebuilt. Instead, it should use recursion to first connect the nodes into a linked list and then convert this linked list into a perfectly balanced binary tree. (There are very elegant recursive implementations of both steps.)

Exercise 8..8   Analyze and implement a $ \mathtt{WeightBalancedTree}$. This is a tree in which each node $ \mathtt{u}$, except the root, maintains the balance invariant that $ \ensuremath{\mathtt{size(u)}} \le (2/3)\ensuremath{\mathtt{size(u.parent)}}$. The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ operations are identical to the standard $ \mathtt{BinarySearchTree}$ operations, except that any time the balance invariant is violated at a node $ \mathtt{u}$, the subtree rooted at $ \mathtt{u.parent}$ is rebuilt. Your analysis should show that operations on a $ \mathtt{WeightBalancedTree}$ run in $ O(\log \ensuremath{\mathtt{n}})$ amortized time.

Exercise 8..9   Analyze and implement a $ \mathtt{WeightBalancedTree}$. This is a tree in which each node $ \mathtt{u}$, except the root, maintains the balance invariant that $ \ensuremath{\mathtt{size(u)}} \le (2/3)\ensuremath{\mathtt{size(u.parent)}}$. The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ operations are identical to the standard $ \mathtt{BinarySearchTree}$ operations, except that any time the balance invariant is violated at a node $ \mathtt{u}$, rotations are performed in order to restore these invariants. Your analysis should show that operations on a $ \mathtt{WeightBalancedTree}$ run in $ O(\log \ensuremath{\mathtt{n}})$ worst-case time.

Exercise 8..10   Analyze and implement a $ \mathtt{CountdownTree}$. In a $ \mathtt{CountdownTree}$ each node $ \mathtt{u}$ keeps a timer $ \mathtt{u.t}$. The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ operations are exactly the same as in a standard $ \mathtt{BinarySearchTree}$ except that, whenever one of these operations affects $ \mathtt{u}$'s subtree, $ \mathtt{u.t}$ is decremented. When $ \ensuremath{\mathtt{u.t}}=0$ the entire subtree rooted at $ \mathtt{u}$ is rebuilt into a perfectly balanced binary search tree. When a node $ \mathtt{u}$ is involved in a rebuilding operation (either because $ \mathtt{u}$ is rebuilt or one of $ \mathtt{u}$'s ancestors is rebuilt) $ \mathtt{u.t}$ is reset to $ \ensuremath{\mathtt{size(u)}}/3$.

Your analysis should show that operations on a $ \mathtt{CountdownTree}$ run in $ O(\log \ensuremath{\mathtt{n}})$ amortized time. (Hint: First show that each node $ \mathtt{u}$ satisfies some version of a balance invariant.)

Exercise 8..11   Analyze and implement a $ \mathtt{DynamiteTree}$. In a $ \mathtt{DynamiteTree}$ each node $ \mathtt{u}$ keeps tracks of the size of the subtree rooted at $ \mathtt{u}$ in a variable $ \mathtt{u.size}$. The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ operations are exactly the same as in a standard $ \mathtt{BinarySearchTree}$ except that, whenever one of these operations affects a node $ \mathtt{u}$'s subtree, $ \mathtt{u}$ explodes with probability $ 1/\ensuremath{\mathtt{u.size}}$. When $ \mathtt{u}$ explodes, its entire subtree is rebuilt into a perfectly balanced binary search tree.

Your analysis should show that operations on a $ \mathtt{DynamiteTree}$ run in $ O(\log \ensuremath{\mathtt{n}})$ expected time.

Exercise 8..12   Design and implement a $ \mathtt{Sequence}$ data structure that maintains a sequence (list) of elements. It supports these operations: The first two operations should run in $ O(\log \ensuremath{\mathtt{n}})$ amortized time. The third operation should run in constant-time.

The $ \mathtt{Sequence}$ data structure can be implemented by storing the elements in something like a $ \mathtt{ScapegoatTree}$, in the same order that they occur in the sequence. To implement $ \mathtt{testBefore(e1,e2)}$ in constant time, each element $ \mathtt{e}$ is labelled with an integer that encodes the path from the root to $ \mathtt{e}$. In this way, $ \mathtt{testBefore(e1,e2)}$ can be implemented just by comparing the labels of $ \mathtt{e1}$ and $ \mathtt{e2}$.

opendatastructures.org