7.3 Discussion and Exercises

Random binary search trees have been studied extensively. Devroye [14] gives a proof of Lemma 7.1 and related results. There are much stronger results in the literature as well. The most impressive of which is due to Reed [53], who shows that the expected height of a random binary search tree is

$\displaystyle \alpha\ln n - \beta\ln\ln n + O(1)
$

where $ \alpha\approx4.31107$ is the unique solution on $ [2,\infty)$ of the equation $ \alpha\ln((2e/\alpha))=1$ and $ \beta=\frac{3}{2\ln(\alpha/2)}$ . Furthermore, the variance of the height is constant.

The name $ \mathtt{Treap}$ was coined by Aragon and Seidel [56] who discussed $ \mathtt{Treap}$s and some of their variants. However, their basic structure was studied much earlier by Vuillemin [61] who called them Cartesian trees.

One space-optimization of the $ \mathtt{Treap}$ data structure that is sometimes performed is the elimination of the explicit storage of the priority $ \mathtt{p}$ in each node. Instead, the priority of a node, $ \mathtt{u}$, is computed by hashing $ \mathtt{u}$'s address in memory. Although a number of hash functions will probably work well for this in practice, for the important parts of the proof of Lemma 7.1 to remain valid, the hash function should be randomized and have the min-wise independent property: For any distinct values $ x_1,\ldots,x_k$, each of the hash values $ h(x_1),\ldots,h(x_k)$ should be distinct with high probability and, for each $ i\in\{1,\ldots,k\}$,

$\displaystyle \Pr\{h(x_i) = \min\{h(x_1),\ldots,h(x_k)\}\} \le c/k
$

for some constant $ c$. One such class of hash functions that is easy to implement and fairly fast is tabulation hashing [51].

Exercise 7..1   Prove the assertion that there are $ 21,964,800$ sequences that generate the tree on the right hand side of Figure 7.1. (Hint: Give a recursive formula for the number of sequences that generate a complete binary tree of height $ h$ and evaluate this formula for $ h=3$.)

Exercise 7..2   Design and implement the $ \mathtt{permute(a)}$ method that takes as input an array, $ \mathtt{a}$, containing $ \mathtt{n}$ distinct values and randomly permutes $ \mathtt{a}$. The method should run in $ O(\ensuremath{\mathtt{n}})$ time and you should prove that each of the $ \ensuremath{\mathtt{n}}!$ possible permutations of $ \mathtt{a}$ is equally probable.

Exercise 7..3   Use both parts of Lemma 7.2 to prove that the expected number of rotations performed by an $ \mathtt{add(x)}$ operation (and hence also a $ \mathtt{remove(x)}$ operation) is $ O(1)$.

Exercise 7..4   Design and implement a version of a $ \mathtt{Treap}$ that includes a $ \mathtt{get(i)}$ operation that returns the key with rank $ \mathtt{i}$ in the $ \mathtt{Treap}$. (Hint: Have each node, $ \mathtt{u}$, keep track of the size of the subtree rooted at $ \mathtt{u}$.)

Exercise 7..5   Design and implement a version of a $ \mathtt{Treap}$ that supports the $ \mathtt{split(x)}$ operation. This operation removes all values from the $ \mathtt{Treap}$ that are greater than $ \mathtt{x}$ and returns a second $ \mathtt{Treap}$ that contains all the removed values.

For example, the code $ \mathtt{t2 = t.split(x)}$ removes from $ \mathtt{t}$ all values greater than $ \mathtt{x}$ and returns a new $ \mathtt{Treap}$ $ \mathtt{t2}$ containing all these values. The $ \mathtt{split(x)}$ operation should run in $ O(\log \ensuremath{\mathtt{n}})$ expected time.

opendatastructures.org