7.3 Discussion and Exercises

Random binary search trees have been studied extensively. Devroye [16] gives a proof of Lemma 7.1 and related results. There are much stronger results in the literature as well, the most impressive of which is due to Reed [57], who shows that the expected height of a random binary search tree is

$\displaystyle \alpha\ln n - \beta\ln\ln n + O(1)$

where $ \alpha\approx4.31107$ is the unique solution on $ [2,\infty)$ of the equation $ \alpha\ln(2e/\alpha)=1$ and $ \beta=\frac{3}{2\ln(\alpha/2)}$. Furthermore, the variance of the height is constant.
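
To make these constants concrete, the following sketch solves the equation for $ \alpha$ numerically and then evaluates $ \beta$. The class and method names are illustrative only. Bisection is used as one convenient choice: the function $ \alpha\ln(2e/\alpha)-1$ is decreasing on $ [2,\infty)$, equals 1 at $ \alpha=2$ and equals $ -1$ at $ \alpha=2e$, so the root lies in $ [2,2e]$.

```java
class ReedConstants {
    // f(a) = a * ln(2e / a) - 1, which is decreasing for a >= 2.
    static double f(double a) {
        return a * Math.log(2 * Math.E / a) - 1;
    }

    // Finds the root of f on [2, 2e] by bisection.
    static double alpha() {
        double lo = 2, hi = 2 * Math.E; // f(lo) = 1 > 0 and f(hi) = -1 < 0
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (f(mid) > 0) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    // beta = 3 / (2 ln(alpha / 2)), as in the formula above.
    static double beta(double alpha) {
        return 3 / (2 * Math.log(alpha / 2));
    }

    public static void main(String[] args) {
        double a = alpha();
        System.out.printf("alpha = %.5f, beta = %.5f%n", a, beta(a));
    }
}
```

The computed value of $ \alpha$ should agree with the stated $ 4.31107$, and $ \beta$ should come out at roughly $ 1.95$.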

The name $ \mathtt{Treap}$ was coined by Aragon and Seidel [60], who discussed $ \mathtt{Treap}$s and some of their variants. However, their basic structure was studied much earlier by Vuillemin [67], who called them Cartesian trees.

One space-optimization of the $ \mathtt{Treap}$ data structure that is sometimes performed is the elimination of the explicit storage of the priority $ \mathtt{p}$ in each node. Instead, the priority of a node, $ \mathtt{u}$, is computed by hashing $ \mathtt{u}$'s address in memory. Although a number of hash functions will probably work well for this in practice, for the important parts of the proof of Lemma 7.1 to remain valid, the hash function should be randomized and have the min-wise independent property: For any distinct values $ x_1,\ldots,x_k$, each of the hash values $ h(x_1),\ldots,h(x_k)$ should be distinct with high probability and, for each $ i\in\{1,\ldots,k\}$,

$\displaystyle \Pr\{h(x_i) = \min\{h(x_1),\ldots,h(x_k)\}\} \le c/k$

for some constant $ c$. One such class of hash functions that is easy to implement and fairly fast is tabulation hashing (Section 5.2.3).
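
As a rough illustration of this idea, the sketch below derives a node's priority from a simple tabulation hash instead of a stored field. Several things here are assumptions rather than part of the text: Java does not expose memory addresses, so `System.identityHashCode` stands in for the address (it is not guaranteed to be unique); the layout of four tables with 256 random entries each is one common way to tabulate a 32-bit key; and the class and method names are hypothetical. The sketch does not establish the min-wise bound for any particular constant $ c$.

```java
import java.util.Random;

class TabulationPriority {
    // One table of random 32-bit values per byte of the key (assumed layout).
    private final int[][] tab = new int[4][256];

    TabulationPriority(Random rng) {
        for (int[] row : tab)
            for (int j = 0; j < row.length; j++)
                row[j] = rng.nextInt();
    }

    // XORs together one table entry for each byte of x.
    int hash(int x) {
        return tab[0][x & 0xff]
             ^ tab[1][(x >>> 8) & 0xff]
             ^ tab[2][(x >>> 16) & 0xff]
             ^ tab[3][(x >>> 24) & 0xff];
    }

    // Priority of a node, computed on demand rather than stored in the node.
    // identityHashCode is a stand-in for the node's address (an assumption).
    int priority(Object node) {
        return hash(System.identityHashCode(node));
    }

    public static void main(String[] args) {
        TabulationPriority p = new TabulationPriority(new Random());
        Object u = new Object(), w = new Object();
        System.out.println(p.priority(u) + " " + p.priority(w));
    }
}
```

A treap using this scheme would call `priority(u)` wherever it would otherwise read `u.p`. Because the tables are filled once at construction, a node's priority stays fixed for the life of the structure.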

Another $ \mathtt{Treap}$ variant that doesn't store priorities at each node is the randomized binary search tree of Martínez and Roura [46]. In this variant, every node, $ \mathtt{u}$, stores the size, $ \mathtt{u.size}$, of the subtree rooted at $ \mathtt{u}$. Both the $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ algorithms are randomized. The algorithm for adding $ \mathtt{x}$ to the subtree rooted at $ \mathtt{u}$ does the following:

  1. With probability $ 1/(\ensuremath{\mathtt{size(u)}}+1)$, $ \mathtt{x}$ is added the usual way, as a leaf, and rotations are then done to bring $ \mathtt{x}$ up to the root of this subtree.
  2. Otherwise, $ \mathtt{x}$ is recursively added into one of the two subtrees rooted at $ \mathtt{u.left}$ or $ \mathtt{u.right}$, as appropriate.
The first case corresponds to an $ \mathtt{add(x)}$ operation in a $ \mathtt{Treap}$ where $ \mathtt{x}$'s node receives a random priority that is smaller than any of the $ \mathtt{size(u)}$ priorities in $ \mathtt{u}$'s subtree, and this case occurs with exactly the same probability.
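
The two cases above might be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes integer keys, treats an empty subtree as having size 0 (so a new leaf is created with probability 1), and all class and method names are hypothetical.

```java
import java.util.Random;

class RandomizedBST {
    static class Node {
        int x, size = 1;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    Node root;
    private final Random rng;

    RandomizedBST(Random rng) { this.rng = rng; }

    static int size(Node u) { return u == null ? 0 : u.size; }
    private static void fix(Node u) { u.size = 1 + size(u.left) + size(u.right); }

    private static Node rotateRight(Node u) { // u.left becomes the subtree root
        Node w = u.left;
        u.left = w.right; w.right = u;
        fix(u); fix(w);
        return w;
    }

    private static Node rotateLeft(Node u) { // u.right becomes the subtree root
        Node w = u.right;
        u.right = w.left; w.left = u;
        fix(u); fix(w);
        return w;
    }

    // Case 1: add x as a leaf, then rotate it up to the root of u's subtree.
    private static Node addAtRoot(Node u, int x) {
        if (u == null) return new Node(x);
        if (x < u.x) { u.left = addAtRoot(u.left, x); return rotateRight(u); }
        u.right = addAtRoot(u.right, x);
        return rotateLeft(u);
    }

    private Node add(Node u, int x) {
        // nextInt(k) is uniform on 0..k-1, so this holds with probability 1/(size(u)+1).
        if (rng.nextInt(size(u) + 1) == 0) return addAtRoot(u, x);
        // Case 2: recurse into the appropriate child.
        if (x < u.x) u.left = add(u.left, x); else u.right = add(u.right, x);
        fix(u);
        return u;
    }

    boolean add(int x) {
        if (contains(x)) return false; // keys are assumed distinct
        root = add(root, x);
        return true;
    }

    boolean contains(int x) {
        Node u = root;
        while (u != null && u.x != x) u = x < u.x ? u.left : u.right;
        return u != null;
    }

    // Returns the subtree size if keys lie in (lo, hi) and all size fields are correct, else -1.
    static int check(Node u, long lo, long hi) {
        if (u == null) return 0;
        if (u.x <= lo || u.x >= hi) return -1;
        int l = check(u.left, lo, u.x), r = check(u.right, u.x, hi);
        return (l < 0 || r < 0 || u.size != l + r + 1) ? -1 : u.size;
    }

    public static void main(String[] args) {
        RandomizedBST t = new RandomizedBST(new Random());
        for (int i = 0; i < 1000; i++) t.add(i); // sorted input
        System.out.println("size = " + size(t.root));
    }
}
```

Note that the random choice is made afresh at every node on the search path, which is the source of the many random choices these trees make per operation.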

Removing a value $ \mathtt{x}$ from a randomized binary search tree is similar to the process of removing from a $ \mathtt{Treap}$. We find the node, $ \mathtt{u}$, that contains $ \mathtt{x}$ and then perform rotations that repeatedly increase the depth of $ \mathtt{u}$ until it becomes a leaf, at which point we can splice it from the tree. The choice of whether to perform a left or right rotation at each step is randomized.

  1. With probability $ \mathtt{u.left.size/(u.size-1)}$, we perform a right rotation at $ \mathtt{u}$, making $ \mathtt{u.left}$ the root of the subtree that was formerly rooted at $ \mathtt{u}$.
  2. With probability $ \mathtt{u.right.size/(u.size-1)}$, we perform a left rotation at $ \mathtt{u}$, making $ \mathtt{u.right}$ the root of the subtree that was formerly rooted at $ \mathtt{u}$.
Again, we can easily verify that these are exactly the probabilities with which the removal algorithm in a $ \mathtt{Treap}$ performs a right or left rotation at $ \mathtt{u}$.
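
A possible sketch of this removal procedure is given below, again with hypothetical names and integer keys. The `addLeaf` method is a plain, non-randomized insertion included only so that the example can build a tree to remove from; it is not part of the algorithm described here.

```java
import java.util.Random;

class RandomizedBSTRemoval {
    static class Node {
        int x, size = 1;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    Node root;
    private final Random rng;

    RandomizedBSTRemoval(Random rng) { this.rng = rng; }

    static int size(Node u) { return u == null ? 0 : u.size; }
    private static void fix(Node u) { u.size = 1 + size(u.left) + size(u.right); }

    private static Node rotateRight(Node u) { // u.left becomes the subtree root
        Node w = u.left;
        u.left = w.right; w.right = u;
        fix(u); fix(w);
        return w;
    }

    private static Node rotateLeft(Node u) { // u.right becomes the subtree root
        Node w = u.right;
        u.right = w.left; w.left = u;
        fix(u); fix(w);
        return w;
    }

    // Plain leaf insertion with size maintenance; used only to build a demo tree.
    void addLeaf(int x) { root = addLeaf(root, x); }
    private static Node addLeaf(Node u, int x) {
        if (u == null) return new Node(x);
        if (x < u.x) u.left = addLeaf(u.left, x);
        else if (x > u.x) u.right = addLeaf(u.right, x);
        fix(u);
        return u;
    }

    // Rotates u downward until it is a leaf, then splices it out.
    // Returns the root of the subtree that replaces u.
    private Node sink(Node u) {
        if (u.size == 1) return null; // u is a leaf: splice it
        Node w;
        // Right rotation with probability u.left.size/(u.size-1), otherwise left.
        if (rng.nextInt(u.size - 1) < size(u.left)) {
            w = rotateRight(u);
            w.right = sink(u);
        } else {
            w = rotateLeft(u);
            w.left = sink(u);
        }
        fix(w);
        return w;
    }

    private Node remove(Node u, int x) {
        if (u == null) return null;
        if (x == u.x) return sink(u);
        if (x < u.x) u.left = remove(u.left, x); else u.right = remove(u.right, x);
        fix(u);
        return u;
    }

    boolean remove(int x) {
        int before = size(root);
        root = remove(root, x);
        return size(root) < before;
    }

    // Returns the subtree size if keys lie in (lo, hi) and all size fields are correct, else -1.
    static int check(Node u, long lo, long hi) {
        if (u == null) return 0;
        if (u.x <= lo || u.x >= hi) return -1;
        int l = check(u.left, lo, u.x), r = check(u.right, u.x, hi);
        return (l < 0 || r < 0 || u.size != l + r + 1) ? -1 : u.size;
    }

    public static void main(String[] args) {
        RandomizedBSTRemoval t = new RandomizedBSTRemoval(new Random());
        for (int i = 0; i < 101; i++) t.addLeaf((i * 37) % 101);
        System.out.println(t.remove(50) + ", size = " + size(t.root));
    }
}
```

When one child is missing, the corresponding probability is 0 or 1, so the rotation is always performed toward the child that exists.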

Randomized binary search trees have the disadvantage, compared to treaps, that when adding and removing elements they make many random choices and they must maintain the sizes of subtrees. One advantage of randomized binary search trees over treaps is that subtree sizes can serve another useful purpose, namely to provide access by rank in $ O(\log \ensuremath{\mathtt{n}})$ expected time (see Exercise 7.10). In comparison, the random priorities stored in treap nodes have no use other than keeping the treap balanced.

Exercise 7.1   Illustrate the addition of 4.5 (with priority 7) and then 7.5 (with priority 20) on the $ \mathtt{Treap}$ in Figure 7.5.

Exercise 7.2   Illustrate the removal of 5 and then 7 on the $ \mathtt{Treap}$ in Figure 7.5.

Exercise 7.3   Prove the assertion that there are $ 21,964,800$ sequences that generate the tree on the right-hand side of Figure 7.1. (Hint: Give a recursive formula for the number of sequences that generate a complete binary tree of height $ h$ and evaluate this formula for $ h=3$.)

Exercise 7.4   Design and implement the $ \mathtt{permute(a)}$ method that takes as input an array, $ \mathtt{a}$, containing $ \mathtt{n}$ distinct values and randomly permutes $ \mathtt{a}$. The method should run in $ O(\ensuremath{\mathtt{n}})$ time and you should prove that each of the $ \ensuremath{\mathtt{n}}!$ possible permutations of $ \mathtt{a}$ is equally probable.

Exercise 7.5   Use both parts of Lemma 7.2 to prove that the expected number of rotations performed by an $ \mathtt{add(x)}$ operation (and hence also a $ \mathtt{remove(x)}$ operation) is $ O(1)$.

Exercise 7.6   Modify the $ \mathtt{Treap}$ implementation given here so that it does not explicitly store priorities. Instead, it should simulate them by hashing the $ \mathtt{hashCode()}$ of each node.

Exercise 7.7   Suppose that a binary search tree stores, at each node, $ \mathtt{u}$, the height, $ \mathtt{u.height}$, of the subtree rooted at $ \mathtt{u}$, and the size, $ \mathtt{u.size}$, of the subtree rooted at $ \mathtt{u}$.
  1. Show how, if we perform a left or right rotation at $ \mathtt{u}$, then these two quantities can be updated, in constant time, for all nodes affected by the rotation.
  2. Explain why the same result is not possible if we try to also store the depth, $ \mathtt{u.depth}$, of each node $ \mathtt{u}$.

Exercise 7.8   Design and implement an algorithm that constructs a $ \mathtt{Treap}$ from a sorted array, $ \mathtt{a}$, of $ \mathtt{n}$ elements. This method should run in $ O(\ensuremath{\mathtt{n}})$ worst-case time and should construct a $ \mathtt{Treap}$ that is indistinguishable from one in which the elements of $ \mathtt{a}$ were added one at a time using the $ \mathtt{add(x)}$ method.

Exercise 7.9   This exercise works out the details of how one can efficiently search a $ \mathtt{Treap}$ given a pointer that is close to the node we are searching for.
  1. Design and implement a $ \mathtt{Treap}$ implementation in which each node keeps track of the minimum and maximum values in its subtree.
  2. Using this extra information, add a $ \mathtt{fingerFind(x,u)}$ method that executes the $ \mathtt{find(x)}$ operation with the help of a pointer to the node $ \mathtt{u}$ (which is hopefully not far from the node that contains $ \mathtt{x}$). This operation should start at $ \mathtt{u}$ and walk upwards until it reaches a node $ \mathtt{w}$ such that $ \ensuremath{\mathtt{w.min}}\le \ensuremath{\mathtt{x}}\le \ensuremath{\mathtt{w.max}}$. From that point onwards, it should perform a standard search for $ \mathtt{x}$ starting from $ \mathtt{w}$. (One can show that $ \mathtt{fingerFind(x,u)}$ takes $ O(1+\log r)$ time, where $ r$ is the number of elements in the treap whose value is between $ \mathtt{x}$ and $ \mathtt{u.x}$.)
  3. Extend your implementation into a version of a treap that starts all its $ \mathtt{find(x)}$ operations from the node most recently found by $ \mathtt{find(x)}$.

Exercise 7.10   Design and implement a version of a $ \mathtt{Treap}$ that includes a $ \mathtt{get(i)}$ operation that returns the key with rank $ \mathtt{i}$ in the $ \mathtt{Treap}$. (Hint: Have each node, $ \mathtt{u}$, keep track of the size of the subtree rooted at $ \mathtt{u}$.)

Exercise 7.11   Implement a $ \mathtt{TreapList}$, an implementation of the $ \mathtt{List}$ interface as a treap. Each node in the treap should store a list item, and an in-order traversal of the treap finds the items in the same order that they occur in the list. All the $ \mathtt{List}$ operations $ \mathtt{get(i)}$, $ \mathtt{set(i,x)}$, $ \mathtt{add(i,x)}$ and $ \mathtt{remove(i)}$ should run in $ O(\log \ensuremath{\mathtt{n}})$ expected time.

Exercise 7.12   Design and implement a version of a $ \mathtt{Treap}$ that supports the $ \mathtt{split(x)}$ operation. This operation removes all values from the $ \mathtt{Treap}$ that are greater than $ \mathtt{x}$ and returns a second $ \mathtt{Treap}$ that contains all the removed values.

Example: the code $ \mathtt{t2 = t.split(x)}$ removes from $ \mathtt{t}$ all values greater than $ \mathtt{x}$ and returns a new $ \mathtt{Treap}$ $ \mathtt{t2}$ containing all these values. The $ \mathtt{split(x)}$ operation should run in $ O(\log \ensuremath{\mathtt{n}})$ expected time.

Warning: For this modification to work properly and still allow the $ \mathtt{size()}$ method to run in constant time, it is necessary to implement the modifications in Exercise 7.10.

Exercise 7.13   Design and implement a version of a $ \mathtt{Treap}$ that supports the $ \mathtt{absorb(t2)}$ operation, which can be thought of as the inverse of the $ \mathtt{split(x)}$ operation. This operation removes all values from the $ \mathtt{Treap}$ $ \mathtt{t2}$ and adds them to the receiver. This operation presupposes that the smallest value in $ \mathtt{t2}$ is greater than the largest value in the receiver. The $ \mathtt{absorb(t2)}$ operation should run in $ O(\log \ensuremath{\mathtt{n}})$ expected time.

Exercise 7.14   Implement Martínez and Roura's randomized binary search trees, as discussed in this section. Compare the performance of your implementation with that of the $ \mathtt{Treap}$ implementation.
