7.3 Discussion and Exercises

Random binary search trees have been studied extensively. Devroye [19] gives a proof of Lemma 7.1 and related results. There are much stronger results in the literature as well, the most impressive of which is due to Reed [62], who shows that the expected height of a random binary search tree is

$\displaystyle \alpha\ln n - \beta\ln\ln n + O(1)
$

where $ \alpha\approx4.31107$ is the unique solution on the interval $ [2,\infty)$ of the equation $ \alpha\ln((2e/\alpha))=1$ and $ \beta=\frac{3}{2\ln(\alpha/2)}$ . Furthermore, the variance of the height is constant.

The name Treap was coined by Seidel and Aragon [65] who discussed Treaps and some of their variants. However, their basic structure was studied much earlier by Vuillemin [74] who called them Cartesian trees.

One possible space-optimization of the Treap data structure is the elimination of the explicit storage of the priority $ \ensuremath{\ensuremath{\mathit{p}}}$ in each node. Instead, the priority of a node, $ \ensuremath{\ensuremath{\mathit{u}}}$, is computed by hashing $ \ensuremath{\ensuremath{\mathit{u}}}$'s address in memory. Although a number of hash functions will probably work well for this in practice, for the important parts of the proof of Lemma 7.1 to remain valid, the hash function should be randomized and have the min-wise independent property: For any distinct values $ x_1,\ldots,x_k$, each of the hash values $ h(x_1),\ldots,h(x_k)$ should be distinct with high probability and, for each $ i\in\{1,\ldots,k\}$,

$\displaystyle \Pr\{h(x_i) = \min\{h(x_1),\ldots,h(x_k)\}\} \le c/k
$

for some constant $ c$. One such class of hash functions that is easy to implement and fairly fast is tabulation hashing (Section 5.2.3).

Another Treap variant that doesn't store priorities at each node is the randomized binary search tree of Martínez and Roura [51]. In this variant, every node, $ \ensuremath{\ensuremath{\mathit{u}}}$, stores the size, $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{size}}}$, of the subtree rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$. Both the $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ algorithms are randomized. The algorithm for adding $ \ensuremath{\ensuremath{\mathit{x}}}$ to the subtree rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$ does the following:

  1. With probability $ 1/(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}+1)$, the value $ \ensuremath{\ensuremath{\mathit{x}}}$ is added the usual way, as a leaf, and rotations are then done to bring $ \ensuremath{\ensuremath{\mathit{x}}}$ up to the root of this subtree.
  2. Otherwise (with probability $ 1-1/(\ensuremath{\ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}}+1)$), the value $ \ensuremath{\ensuremath{\mathit{x}}}$ is recursively added into one of the two subtrees rooted at $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}}$ or $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}}$, as appropriate.
The first case corresponds to an $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation in a Treap where $ \ensuremath{\ensuremath{\mathit{x}}}$'s node receives a random priority that is smaller than any of the $ \ensuremath{\mathrm{size}(\ensuremath{\mathit{u}})}$ priorities in $ \ensuremath{\ensuremath{\mathit{u}}}$'s subtree, and this case occurs with exactly the same probability.

Removing a value $ \ensuremath{\ensuremath{\mathit{x}}}$ from a randomized binary search tree is similar to the process of removing from a Treap. We find the node, $ \ensuremath{\ensuremath{\mathit{u}}}$, that contains $ \ensuremath{\ensuremath{\mathit{x}}}$ and then perform rotations that repeatedly increase the depth of $ \ensuremath{\ensuremath{\mathit{u}}}$ until it becomes a leaf, at which point we can splice it from the tree. The choice of whether to perform a left or right rotation at each step is randomized.

  1. With probability $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}.\ensuremath{\mathit{size}}/(\ensuremath{\mathit{u}}.\ensuremath{\mathit{size}}-1)}$, we perform a right rotation at $ \ensuremath{\ensuremath{\mathit{u}}}$, making $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}}$ the root of the subtree that was formerly rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$.
  2. With probability $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}.\ensuremath{\mathit{size}}/(\ensuremath{\mathit{u}}.\ensuremath{\mathit{size}}-1)}$, we perform a left rotation at $ \ensuremath{\ensuremath{\mathit{u}}}$, making $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}}$ the root of the subtree that was formerly rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$.
Again, we can easily verify that these are exactly the same probabilities that the removal algorithm in a Treap will perform a left or right rotation of $ \ensuremath{\ensuremath{\mathit{u}}}$.

Randomized binary search trees have the disadvantage, compared to treaps, that when adding and removing elements they make many random choices, and they must maintain the sizes of subtrees. One advantage of randomized binary search trees over treaps is that subtree sizes can serve another useful purpose, namely to provide access by rank in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ expected time (see Exercise 7.10). In comparison, the random priorities stored in treap nodes have no use other than keeping the treap balanced.

Exercise 7..1   Illustrate the addition of 4.5 (with priority 7) and then 7.5 (with priority 20) on the Treap in Figure 7.5.

Exercise 7..2   Illustrate the removal of 5 and then 7 on the Treap in Figure 7.5.

Exercise 7..3   Prove the assertion that there are $ 21,964,800$ sequences that generate the tree on the right hand side of Figure 7.1. (Hint: Give a recursive formula for the number of sequences that generate a complete binary tree of height $ h$ and evaluate this formula for $ h=3$.)

Exercise 7..4   Design and implement the $ \ensuremath{\mathrm{permute}(\ensuremath{\mathit{a}})}$ method that takes as input an array, $ \ensuremath{\ensuremath{\mathit{a}}}$, that contains $ \ensuremath{\ensuremath{\mathit{n}}}$ distinct values and randomly permutes $ \ensuremath{\ensuremath{\mathit{a}}}$. The method should run in $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time and you should prove that each of the $ \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}!$ possible permutations of $ \ensuremath{\ensuremath{\mathit{a}}}$ is equally probable.

Exercise 7..5   Use both parts of Lemma 7.2 to prove that the expected number of rotations performed by an $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation (and hence also a $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operation) is $ O(1)$.

Exercise 7..6   Modify the Treap implementation given here so that it does not explicitly store priorities. Instead, it should simulate them by hashing the $ \ensuremath{\mathrm{hash\_code}()}$ of each node.

Exercise 7..7   Suppose that a binary search tree stores, at each node, $ \ensuremath{\ensuremath{\mathit{u}}}$, the height, $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{height}}}$, of the subtree rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$, and the size, $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{size}}}$ of the subtree rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$.
  1. Show how, if we perform a left or right rotation at $ \ensuremath{\ensuremath{\mathit{u}}}$, then these two quantities can be updated, in constant time, for all nodes affected by the rotation.
  2. Explain why the same result is not possible if we try to also store the depth, $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{depth}}}$, of each node $ \ensuremath{\ensuremath{\mathit{u}}}$.

Exercise 7..8   Design and implement an algorithm that constructs a Treap from a sorted array, $ \ensuremath{\ensuremath{\mathit{a}}}$, of $ \ensuremath{\ensuremath{\mathit{n}}}$ elements. This method should run in $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ worst-case time and should construct a Treap that is indistinguishable from one in which the elements of $ \ensuremath{\ensuremath{\mathit{a}}}$ were added one at a time using the $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ method.

Exercise 7..9   This exercise works out the details of how one can efficiently search a Treap given a pointer that is close to the node we are searching for.
  1. Design and implement a Treap implementation in which each node keeps track of the minimum and maximum values in its subtree.
  2. Using this extra information, add a $ \ensuremath{\mathrm{finger\_find}(\ensuremath{\mathit{x}},\ensuremath{\mathit{u}})}$ method that executes the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation with the help of a pointer to the node $ \ensuremath{\ensuremath{\mathit{u}}}$ (which is hopefully not far from the node that contains $ \ensuremath{\ensuremath{\mathit{x}}}$). This operation should start at $ \ensuremath{\ensuremath{\mathit{u}}}$ and walk upwards until it reaches a node $ \ensuremath{\ensuremath{\mathit{w}}}$ such that $ \ensuremath{\ensuremath{\ensuremath{\mathit{w}}.\ensuremath{\mathit{min}}}}\le...
...\le \ensuremath{\ensuremath{\ensuremath{\mathit{w}}.\ensuremath{\mathit{max}}}}$. From that point onwards, it should perform a standard search for $ \ensuremath{\ensuremath{\mathit{x}}}$ starting from $ \ensuremath{\ensuremath{\mathit{w}}}$. (One can show that $ \ensuremath{\mathrm{finger\_find}(\ensuremath{\mathit{x}},\ensuremath{\mathit{u}})}$ takes $ O(1+\log r)$ time, where $ r$ is the number of elements in the treap whose value is between $ \ensuremath{\ensuremath{\mathit{x}}}$ and $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{x}}}$.)
  3. Extend your implementation into a version of a treap that starts all its $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operations from the node most recently found by $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$.

Exercise 7..10   Design and implement a version of a Treap that includes a $ \ensuremath{\mathrm{get}(\ensuremath{\mathit{i}})}$ operation that returns the key with rank $ \ensuremath{\ensuremath{\mathit{i}}}$ in the Treap. (Hint: Have each node, $ \ensuremath{\ensuremath{\mathit{u}}}$, keep track of the size of the subtree rooted at $ \ensuremath{\ensuremath{\mathit{u}}}$.)

Exercise 7..11   Implement a TreapList, an implementation of the List interface as a treap. Each node in the treap should store a list item, and an in-order traversal of the treap finds the items in the same order that they occur in the list. All the List operations $ \ensuremath{\mathrm{get}(\ensuremath{\mathit{i}})}$, $ \ensuremath{\mathrm{set}(\ensuremath{\mathit{i}},\ensuremath{\mathit{x}})}$, $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{i}},\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{i}})}$ should run in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ expected time.

Exercise 7..12   Design and implement a version of a Treap that supports the $ \ensuremath{\mathrm{split}(\ensuremath{\mathit{x}})}$ operation. This operation removes all values from the Treap that are greater than $ \ensuremath{\ensuremath{\mathit{x}}}$ and returns a second Treap that contains all the removed values.

Example: the code $ \ensuremath{\ensuremath{\mathit{t_2}} \gets \ensuremath{\ensuremath{\mathit{t}}.\mathrm{split}(\ensuremath{\mathit{x}})}}$ removes from $ \ensuremath{\ensuremath{\mathit{t}}}$ all values greater than $ \ensuremath{\ensuremath{\mathit{x}}}$ and returns a new Treap $ \ensuremath{\ensuremath{\mathit{t_2}}}$ containing all these values. The $ \ensuremath{\mathrm{split}(\ensuremath{\mathit{x}})}$ operation should run in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ expected time.

Warning: For this modification to work properly and still allow the $ \ensuremath{\mathrm{size}()}$ method to run in constant time, it is necessary to implement the modifications in Exercise 7.10.

Exercise 7..13   Design and implement a version of a Treap that supports the $ \ensuremath{\mathrm{absorb}(\ensuremath{\mathit{t_2}})}$ operation, which can be thought of as the inverse of the $ \ensuremath{\mathrm{split}(\ensuremath{\mathit{x}})}$ operation. This operation removes all values from the Treap $ \ensuremath{\ensuremath{\mathit{t_2}}}$ and adds them to the receiver. This operation presupposes that the smallest value in $ \ensuremath{\ensuremath{\mathit{t_2}}}$ is greater than the largest value in the receiver. The $ \ensuremath{\mathrm{absorb}(\ensuremath{\mathit{t_2}})}$ operation should run in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ expected time.

Exercise 7..14   Implement Martinez's randomized binary search trees, as discussed in this section. Compare the performance of your implementation with that of the Treap implementation.

opendatastructures.org