9.4 Discussion and Exercises

Red-black trees were first introduced by Guibas and Sedgewick [32]. Despite their high implementation complexity they are found in some of the most commonly used libraries and applications. Most algorithms and data structures discuss some variant of red-black trees.

Andersson [4] describes a left-leaning version of balanced trees that are similar to red-black trees but have the additional constraint that any node has at most one red child. This implies that these trees simulate 2-3 trees rather than 2-4 trees. They are significantly simpler, though, than the RedBlackTree structure presented in this chapter.

Sedgewick [57] describes at least two versions of left-leaning red-black trees. These use recursion along with a simulation of top-down splitting and merging in 2-4 trees. The combination of these two techniques makes for particularly short and elegant code.

A related, and older, data structure is the AVL tree [2]. AVL trees are height-balanced: At each node $ u$, the height of the subtree rooted at $ \mathtt{u.left}$ and the subtree rooted at $ \mathtt{u.right}$ differ by at most one. It follows immediately that, if $ F(h)$ is the minimum number of leaves in a tree of height $ h$, then $ F(h)$ obeys the Fibonacci recurrence

$\displaystyle F(h) = F(h-1) + F(h-2)
$

with base cases $ F(0)=1$ and $ F(1)=1$. This means $ F(h)$ is approximately $ \varphi^h/\sqrt{5}$, where $ \varphi=(1+\sqrt{5})/2\approx1.61803399$ is the golden ratio. (More precisely, $ \vert\varphi^h/\sqrt{5} - F(h)\vert\le 1/2$.) Arguing as in the proof of Lemma 9.1, this implies

$\displaystyle h \le \log_\varphi \ensuremath{\mathtt{n}} \approx 1.440420088\log \ensuremath{\mathtt{n}} \enspace ,
$

so AVL trees have smaller height than red-black trees. The height-balanced property can be maintained during $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ operations by walking back up the path to the root and performing a rebalancing operation at each node $ \mathtt{u}$ where the height of $ \mathtt{u}$'s left and right subtrees differ by 2. See Figure 9.10.

Figure 9.10: Rebalancing in an AVL tree. At most 2 rotations are required to convert a node whose subtrees have height $ h$ and $ h+2$ into a node whose subtrees each have height at most $ h+1$.
\includegraphics{figs/avl-rebalance}

Andersson's variant of red-black trees, Sedgewick's variant of red-black trees, and AVL trees are all simpler to implement than the RedBlackTree structure defined here. Unfortunately, none of them can guarantee that the amortized time spent rebalancing is $ O(1)$ per update. In particular, there is no analogue of Theorem 9.2 for those structures.

Exercise 9..1   Why does the method $ \mathtt{remove(x)}$ in the RedBlackTree implementation perform the assignment $ \mathtt{u.parent=w.parent}$? Shouldn't this already be done by the call to $ \mathtt{splice(w)}$?

Exercise 9..2   Suppose a 2-4 tree, $ T$, has $ \ensuremath{\mathtt{n}}_\ell$ leaves and $ \ensuremath{\mathtt{n}}_i$ internal nodes.
  1. What is the minimum value of $ \ensuremath{\mathtt{n}}_i$, as a function of $ \ensuremath{\mathtt{n}}_\ell$?
  2. What is the maximum value of $ \ensuremath{\mathtt{n}}_i$, as a function of $ \ensuremath{\mathtt{n}}_\ell$?
  3. If $ T'$ is a red-black tree that represents $ T$, then how many red nodes does $ T'$ have?

Exercise 9..3   Prove that, during an $ \mathtt{add(x)}$ operation, an AVL tree must perform at most one rebalancing operation (that involves at most 2 rotations; see Figure 9.10). Give an example of an AVL tree and a $ \mathtt{remove(x)}$ operation on that tree that requires on the order of $ \log \ensuremath{\mathtt{n}}$ rebalancing operations.

Exercise 9..4   Implement an AVLTree class that implements AVL trees as described above. Compare its performance to that of the RedBlackTree implementation. Which implementation has a faster $ \mathtt{find(x)}$ operation?

Exercise 9..5   Design and implement a series of experiments that compare the relative performance of $ \mathtt{find(x)}$, $ \mathtt{add(x)}$, and $ \mathtt{remove(x)}$ for SkiplistSSet, ScapegoatTree, Treap, and RedBlackTree. Be sure to include multiple test scenarios, including cases where the data is random, already sorted, is removed in random order, is removed in sorted order, and so on.

opendatastructures.org