Open Data Structures
\newcommand{\E}{\mathrm{E}} \DeclareMathOperator{\ddiv}{div}
7 Random Binary Search Trees

In this chapter, we present a binary search tree structure that uses randomization to achieve O(\log \texttt{n}) expected time for all operations.

7.1 Random Binary Search Trees

Consider the two binary search trees shown in Figure 7.1, each of which has \texttt{n}=15 nodes. The one on the left is a list and the other is a perfectly balanced binary search tree. The one on the left has a height of \texttt{n}-1=14 and the one on the right has a height of three.

Figure 7.1 Two binary search trees containing the integers 0,\ldots,14.

Imagine how these two trees could have been constructed. The one on the left occurs if we start with an empty BinarySearchTree and add the sequence \begin{equation*} \langle 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 \rangle \enspace . \end{equation*} No other sequence of additions will create this tree (as you can prove by induction on n). On the other hand, the tree on the right can be created by the sequence \begin{equation*} \langle 7,3,11,1,5,9,13,0,2,4,6,8,10,12,14 \rangle \enspace . \end{equation*} Other sequences work as well, including \begin{equation*} \langle 7,3,1,5,0,2,4,6,11,9,13,8,10,12,14 \rangle \enspace , \end{equation*} and \begin{equation*} \langle 7,3,1,11,5,0,2,4,6,9,13,8,10,12,14 \rangle \enspace . \end{equation*} In fact, there are 21,964,800 addition sequences that generate the tree on the right and only one that generates the tree on the left.

The above example gives some anecdotal evidence that, if we choose a random permutation of 0,\ldots,14, and add it into a binary search tree, then we are more likely to get a very balanced tree (the right side of Figure 7.1) than we are to get a very unbalanced tree (the left side of Figure 7.1).

We can formalize this notion by studying random binary search trees. A random binary search tree of size n is obtained in the following way: Take a random permutation, \texttt{x}_0,\ldots,\texttt{x}_{\texttt{n}-1}, of the integers 0,\ldots,\texttt{n}-1 and add its elements, one by one, into a BinarySearchTree. By random permutation we mean that each of the possible \texttt{n}! permutations (orderings) of 0,\ldots,\texttt{n}-1 is equally likely, so that the probability of obtaining any particular permutation is 1/\texttt{n}!.
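To see the effect concretely, we can simulate this process. The sketch below is not part of the book's code and all names in it are illustrative: it inserts a random permutation of 0,\ldots,\texttt{n}-1 into a bare-bones unbalanced binary search tree and measures the resulting height.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// A minimal (unbalanced) BST used only to measure the height produced
// by inserting a random permutation. Illustrative code, not the book's.
public class RandomBstDemo {
    static final class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    // Standard unbalanced BST insertion.
    static Node insert(Node r, int x) {
        if (r == null) return new Node(x);
        if (x < r.x) r.left = insert(r.left, x);
        else r.right = insert(r.right, x);
        return r;
    }

    // Height counted in edges; an empty tree has height -1.
    static int height(Node r) {
        if (r == null) return -1;
        return 1 + Math.max(height(r.left), height(r.right));
    }

    // Build a BST from a random permutation of 0..n-1 and return its height.
    static int randomBstHeight(int n, Random rng) {
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < n; i++) perm.add(i);
        Collections.shuffle(perm, rng);
        Node root = null;
        for (int x : perm) root = insert(root, x);
        return height(root);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 1000, trials = 20;
        double avg = 0;
        for (int t = 0; t < trials; t++) avg += randomBstHeight(n, rng);
        avg /= trials;
        // Typically a few dozen for n = 1000, far below the worst case n - 1 = 999.
        System.out.printf("average height over %d random trees of size %d: %.1f%n",
                          trials, n, avg);
    }
}
```

Running this a few times makes the contrast with the sorted insertion order (which always produces height \texttt{n}-1) quite vivid.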

Note that the values 0,\ldots,\texttt{n}-1 could be replaced by any ordered set of n elements without changing any of the properties of the random binary search tree. The element \texttt{x}\in\{0,\ldots,\texttt{n}-1\} is simply standing in for the element of rank x in an ordered set of size n.

Before we can present our main result about random binary search trees, we must take some time for a short digression to discuss a type of number that comes up frequently when studying randomized structures. For a non-negative integer, k, the k-th harmonic number, denoted H_k, is defined as \begin{equation*} H_k = 1 + 1/2 + 1/3 + \cdots + 1/k \enspace . \end{equation*} The harmonic number H_k has no simple closed form, but it is very closely related to the natural logarithm of k. In particular, \begin{equation*} \ln k < H_k \le \ln k + 1 \enspace . \end{equation*} Readers who have studied calculus might notice that this is because the integral \int_1^k\! (1/x)\,\mathrm{d}x = \ln k. Keeping in mind that an integral can be interpreted as the area between a curve and the x-axis, the value of H_k can be lower-bounded by the integral \int_1^k\! (1/x)\,\mathrm{d}x and upper-bounded by 1+ \int_1^k\! (1/x)\,\mathrm{d}x. (See Figure 7.2 for a graphical explanation.)

Figure 7.2 The kth harmonic number H_k=\sum_{i=1}^k 1/i is upper- and lower-bounded by two integrals. The value of these integrals is given by the area of the shaded region, while the value of H_k is given by the area of the rectangles.
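These bounds are easy to check numerically. The following small program (illustrative only, not the book's code) computes H_k directly and verifies that \ln k < H_k \le \ln k + 1:

```java
// Compute H_k by direct summation and check the bounds ln k < H_k <= ln k + 1.
// Illustrative helper, not part of the book's code.
public class HarmonicDemo {
    static double harmonic(int k) {
        double h = 0;
        for (int i = 1; i <= k; i++) h += 1.0 / i;
        return h;
    }

    public static void main(String[] args) {
        for (int k : new int[]{10, 100, 1000}) {
            double h = harmonic(k), lnk = Math.log(k);
            System.out.printf("H_%d = %.4f, ln %d = %.4f%n", k, h, k, lnk);
            assert lnk < h && h <= lnk + 1;
        }
    }
}
```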

Lemma 7.1 In a random binary search tree of size n, the following statements hold:
  1. For any \texttt{x}\in\{0,\ldots,\texttt{n}-1\}, the expected length of the search path for x is H_{\texttt{x}+1} + H_{\texttt{n}-\texttt{x}} - O(1). (The expressions \texttt{x}+1 and \texttt{n}-\texttt{x} can be interpreted, respectively, as the number of elements in the tree less than or equal to x and the number of elements in the tree greater than or equal to x.)
  2. For any \texttt{x}\in(-1,n)\setminus\{0,\ldots,\texttt{n}-1\}, the expected length of the search path for x is H_{\lceil\texttt{x}\rceil} + H_{\texttt{n}-\lceil\texttt{x}\rceil}.

We will prove Lemma 7.1 in the next section. For now, consider what the two parts of Lemma 7.1 tell us. The first part tells us that if we search for an element in a tree of size n, then the expected length of the search path is at most 2\ln n + O(1). The second part tells us the same thing about searching for a value not stored in the tree. When we compare the two parts of the lemma, we see that it is only slightly faster to search for something that is in the tree compared to something that is not.

7.1.1 Proof of Lemma 7.1

The key observation needed to prove Lemma 7.1 is the following: The search path for a value x in the open interval (-1,\texttt{n}) in a random binary search tree, T, contains the node with key i < \texttt{x} if, and only if, in the random permutation used to create T, i appears before any of \{i+1,i+2,\ldots,\lfloor\texttt{x}\rfloor\}.

To see this, refer to Figure 7.3 and notice that until some value in \{i,i+1,\ldots,\lfloor\texttt{x}\rfloor\} is added, the search paths for each value in the open interval (i-1,\lfloor\texttt{x}\rfloor+1) are identical. (Remember that for two values to have different search paths, there must be some element in the tree that compares differently with them.) Let j be the first element in \{i,i+1,\ldots,\lfloor\texttt{x}\rfloor\} to appear in the random permutation. Notice that j is now and will always be on the search path for x. If j\neq i then the node \texttt{u}_j containing j is created before the node \texttt{u}_i that contains i. Later, when i is added, it will be added to the subtree rooted at \texttt{u}_j\texttt{.left}, since \(i<j\). On the other hand, the search path for x will never visit this subtree because it will proceed to \texttt{u}_j\texttt{.right} after visiting \texttt{u}_j.

Figure 7.3 The value i<\texttt{x} is on the search path for x if and only if i is the first element among \{i,i+1,\ldots,\lfloor\texttt{x}\rfloor\} added to the tree.

Similarly, for i>\texttt{x}, i appears in the search path for x if and only if i appears before any of \{\lceil\texttt{x}\rceil, \lceil\texttt{x}\rceil+1,\ldots,i-1\} in the random permutation used to create T.

Notice that, if we start with a random permutation of \{0,\ldots,\texttt{n}-1\}, then the subsequences containing only \{i,i+1,\ldots,\lfloor\texttt{x}\rfloor\} and \{\lceil\texttt{x}\rceil, \lceil\texttt{x}\rceil+1,\ldots,i-1\} are also random permutations of their respective elements. Each element, then, in the subsets \{i,i+1,\ldots,\lfloor\texttt{x}\rfloor\} and \{\lceil\texttt{x}\rceil, \lceil\texttt{x}\rceil+1,\ldots,i-1\} is equally likely to appear before any other in its subset in the random permutation used to create T. So we have \begin{equation*} \Pr\{\mbox{\(i\) is on the search path for \texttt{x}}\} = \left\{ \begin{array}{ll} 1/(\lfloor\texttt{x}\rfloor-i+1) & \mbox{if \(i < \texttt{x}\)} \\ 1/(i-\lceil\texttt{x}\rceil+1) & \mbox{if \(i > \texttt{x}\)} \end{array}\right. \enspace . \end{equation*}

With this observation, the proof of Lemma 7.1 involves some simple calculations with harmonic numbers:

Proof of Lemma 7.1 Let I_i be the indicator random variable that is equal to one when i appears on the search path for x and zero otherwise. Then the length of the search path is given by \begin{equation*} \sum_{i\in\{0,\ldots,\texttt{n}-1\}\setminus\{\texttt{x}\}} I_i \end{equation*} so, if \texttt{x}\in\{0,\ldots,\texttt{n}-1\}, the expected length of the search path is given by (see Figure 7.4.a) \begin{align*} \E\left[\sum_{i=0}^{\texttt{x}-1} I_i + \sum_{i=\texttt{x}+1}^{\texttt{n}-1} I_i\right] & = \sum_{i=0}^{\texttt{x}-1} \E\left[I_i\right] + \sum_{i=\texttt{x}+1}^{\texttt{n}-1} \E\left[I_i\right] \\ & = \sum_{i=0}^{\texttt{x}-1} 1/(\lfloor\texttt{x}\rfloor-i+1) + \sum_{i=\texttt{x}+1}^{\texttt{n}-1} 1/(i-\lceil\texttt{x}\rceil+1) \\ & = \sum_{i=0}^{\texttt{x}-1} 1/(\texttt{x}-i+1) + \sum_{i=\texttt{x}+1}^{\texttt{n}-1} 1/(i-\texttt{x}+1) \\ & = \frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{\texttt{x}+1} \\ & \quad{} + \frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{\texttt{n}-\texttt{x}} \\ & = H_{\texttt{x}+1} + H_{\texttt{n}-\texttt{x}} - 2 \enspace . \end{align*} The corresponding calculations for a search value \texttt{x}\in(-1,n)\setminus\{0,\ldots,\texttt{n}-1\} are almost identical (see Figure 7.4.b).


Figure 7.4 The probabilities of an element being on the search path for x when (a) x is an integer and (b) when x is not an integer.
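The calculation above can also be checked by simulation. The sketch below (illustrative names, not the book's code) builds many random binary search trees and compares the average depth of the node containing x — which is exactly the number of other nodes on its search path — with H_{\texttt{x}+1}+H_{\texttt{n}-\texttt{x}}-2:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

// Empirically check Lemma 7.1: the average depth of the node with key x
// in a random BST approaches H_{x+1} + H_{n-x} - 2. Illustrative code.
public class SearchPathDemo {
    static final class Node { int x; Node left, right; Node(int x){this.x=x;} }

    static Node insert(Node r, int x) {
        if (r == null) return new Node(x);
        if (x < r.x) r.left = insert(r.left, x); else r.right = insert(r.right, x);
        return r;
    }

    // Depth of the node containing x (number of ancestors on its search path).
    static int depthOf(Node r, int x) {
        int d = 0;
        while (r.x != x) { r = (x < r.x) ? r.left : r.right; d++; }
        return d;
    }

    static double harmonic(int k) {
        double h = 0; for (int i = 1; i <= k; i++) h += 1.0 / i; return h;
    }

    static double averageDepth(int n, int x, int trials, Random rng) {
        ArrayList<Integer> perm = new ArrayList<>();
        for (int i = 0; i < n; i++) perm.add(i);
        double total = 0;
        for (int t = 0; t < trials; t++) {
            Collections.shuffle(perm, rng);
            Node root = null;
            for (int v : perm) root = insert(root, v);
            total += depthOf(root, x);
        }
        return total / trials;
    }

    public static void main(String[] args) {
        int n = 50, x = 20, trials = 10000;
        double measured = averageDepth(n, x, trials, new Random(1));
        double predicted = harmonic(x + 1) + harmonic(n - x) - 2;
        System.out.printf("measured %.3f vs predicted %.3f%n", measured, predicted);
    }
}
```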

7.1.2 Summary

The following theorem summarizes the performance of a random binary search tree:

Theorem 7.1 A random binary search tree can be constructed in O(\texttt{n}\log \texttt{n}) expected time. In a random binary search tree, the find(x) operation takes O(\log \texttt{n}) expected time.

We should emphasize again that the expectation in Theorem 7.1 is with respect to the random permutation used to create the random binary search tree. In particular, it does not depend on a random choice of x; it is true for every value of x.

7.2 Treap: A Randomized Binary Search Tree

The problem with random binary search trees is, of course, that they are not dynamic. They don't support the add(x) or remove(x) operations needed to implement the SSet interface. In this section we describe a data structure called a Treap that uses Lemma 7.1 to implement the SSet interface. (The name Treap comes from the fact that this data structure is simultaneously a binary search tree (Section 6.2) and a heap (Chapter 10).)

A node in a Treap is like a node in a BinarySearchTree in that it has a data value, x, but it also contains a unique numerical priority, p, that is assigned at random:

    class Node<T> extends BinarySearchTree.BSTNode<Node<T>,T> {
        int p;
    }
In addition to being a binary search tree, the nodes in a Treap also obey the heap property: every node \texttt{u} other than the root satisfies \texttt{u.parent.p} < \texttt{u.p}. In other words, each node has a priority smaller than that of its two children. An example is shown in Figure 7.5.

Figure 7.5 An example of a Treap containing the integers 0,\ldots,9. Each node, u, is illustrated as a box containing \texttt{u.x},\texttt{u.p}.

The heap and binary search tree conditions together ensure that, once the key (x) and priority (p) for each node are defined, the shape of the Treap is completely determined. The heap property tells us that the node with minimum priority has to be the root, r, of the Treap. The binary search tree property tells us that all nodes with keys smaller than r.x are stored in the subtree rooted at r.left and all nodes with keys larger than r.x are stored in the subtree rooted at r.right.

The important point about the priority values in a Treap is that they are unique and assigned at random. Because of this, there are two equivalent ways we can think about a Treap. As defined above, a Treap obeys the heap and binary search tree properties. Alternatively, we can think of a Treap as a BinarySearchTree whose nodes were added in increasing order of priority. For example, the Treap in Figure 7.5 can be obtained by adding the sequence of (\texttt{x},\texttt{p}) values \begin{equation*} \langle (3,1), (1,6), (0,9), (5,11), (4,14), (9,17), (7,22), (6,42), (8,49), (2,99) \rangle \end{equation*} into a BinarySearchTree.

Since the priorities are chosen randomly, this is equivalent to taking a random permutation of the keys—in this case the permutation is \begin{equation*} \langle 3, 1, 0, 5, 9, 4, 7, 6, 8, 2 \rangle \end{equation*} —and adding these to a BinarySearchTree. But this means that the shape of a treap is identical to that of a random binary search tree. In particular, if we replace each key x by its rank (the rank of an element x in a set S of elements is the number of elements in S that are less than x), then Lemma 7.1 applies. Restating Lemma 7.1 in terms of Treaps, we have:

Lemma 7.2 In a Treap that stores a set S of n keys, the following statements hold:
  1. For any \texttt{x}\in S, the expected length of the search path for x is H_{r(\texttt{x})+1} + H_{\texttt{n}-r(\texttt{x})} - O(1).
  2. For any \texttt{x}\not\in S, the expected length of the search path for x is H_{r(\texttt{x})} + H_{\texttt{n}-r(\texttt{x})}.
Here, r(\texttt{x}) denotes the rank of x in the set S\cup\{\texttt{x}\}.
Again, we emphasize that the expectation in Lemma 7.2 is taken over the random choices of the priorities for each node. It does not require any assumptions about the randomness in the keys.

Lemma 7.2 tells us that Treaps can implement the find(x) operation efficiently. However, the real benefit of a Treap is that it can support the add(x) and remove(x) operations. To do this, it needs to perform rotations in order to maintain the heap property. Refer to Figure 7.6. A rotation in a binary search tree is a local modification that takes a parent u of a node w and makes w the parent of u, while preserving the binary search tree property. Rotations come in two flavours: left or right depending on whether w is a right or left child of u, respectively.

Figure 7.6 Left and right rotations in a binary search tree.

The code that implements this has to handle these two possibilities and be careful of a boundary case (when u is the root), so the actual code is a little longer than Figure 7.6 would lead a reader to believe:

    void rotateLeft(Node u) {
        Node w = u.right;
        w.parent = u.parent;
        if (w.parent != nil) {
            if (w.parent.left == u) {
                w.parent.left = w;
            } else {
                w.parent.right = w;
            }
        }
        u.right = w.left;
        if (u.right != nil) {
            u.right.parent = u;
        }
        u.parent = w;
        w.left = u;
        if (u == r) { r = w; r.parent = nil; }
    }    
    void rotateRight(Node u) {
        Node w = u.left;
        w.parent = u.parent;
        if (w.parent != nil) {
            if (w.parent.left == u) {
                w.parent.left = w;
            } else {
                w.parent.right = w;
            }
        }
        u.left = w.right;
        if (u.left != nil) {
            u.left.parent = u;
        }
        u.parent = w;
        w.right = u;
        if (u == r) { r = w; r.parent = nil; }
    }
In terms of the Treap data structure, the most important property of a rotation is that the depth of w decreases by one while the depth of u increases by one.

Using rotations, we can implement the add(x) operation as follows: We create a new node, u, assign u.x=x, and pick a random value for u.p. Next we add u using the usual add(x) algorithm for a BinarySearchTree, so that u is now a leaf of the Treap. At this point, our Treap satisfies the binary search tree property, but not necessarily the heap property. In particular, it may be the case that u.parent.p > u.p. If this is the case, then we perform a rotation at node w=u.parent so that u becomes the parent of w. If u continues to violate the heap property, we will have to repeat this, decreasing u's depth by one every time, until u either becomes the root or \texttt{u.parent.p} < \texttt{u.p}.

    boolean add(T x) {
        Node<T> u = newNode();
        u.x = x;
        u.p = rand.nextInt();
        if (super.add(u)) {
            bubbleUp(u);
            return true;
        }
        return false;
    }
    void bubbleUp(Node<T> u) {
        while (u.parent != nil && u.parent.p > u.p) {
            if (u.parent.right == u) {
                rotateLeft(u.parent);
            } else {
                rotateRight(u.parent);
            }
        }
        if (u.parent == nil) {
            r = u;
        }
    }
An example of an add(x) operation is shown in Figure 7.7.




Figure 7.7 Adding the value 1.5 into the Treap from Figure 7.5.

The running time of the add(x) operation is given by the time it takes to follow the search path for x plus the number of rotations performed to move the newly-added node, u, up to its correct location in the Treap. By Lemma 7.2, the expected length of the search path is at most 2\ln \texttt{n}+O(1). Furthermore, each rotation decreases the depth of u. This stops if u becomes the root, so the expected number of rotations cannot exceed the expected length of the search path. Therefore, the expected running time of the add(x) operation in a Treap is O(\log \texttt{n}). (Exercise 7.5 asks you to show that the expected number of rotations performed during an addition is actually only O(1).)

The remove(x) operation in a Treap is the opposite of the add(x) operation. We search for the node, u, containing x, then perform rotations to move u downwards until it becomes a leaf, and then we splice u from the Treap. Notice that, to move u downwards, we can perform either a left or right rotation at u, which will replace u with u.right or u.left, respectively. The choice is made by the first of the following that apply:

  1. If u.left and u.right are both null, then u is a leaf and no rotation is performed.
  2. If u.left (or u.right) is null, then perform a left rotation (or right rotation, respectively) at u.
  3. If \texttt{u.left.p} < \texttt{u.right.p} (or \texttt{u.left.p} > \texttt{u.right.p}), then perform a right rotation (or left rotation, respectively) at u.
These three rules ensure that the Treap doesn't become disconnected and that the heap property is restored once u is removed.
    boolean remove(T x) {
        Node<T> u = findLast(x);
        if (u != nil && c.compare(u.x, x) == 0) {
            trickleDown(u);
            splice(u);
            return true;
        }
        return false;
    }
    void trickleDown(Node<T> u) {
        while (u.left != nil || u.right != nil) {
            if (u.left == nil) {
                rotateLeft(u);
            } else if (u.right == nil) {
                rotateRight(u);
            } else if (u.left.p < u.right.p) {
                rotateRight(u);
            } else {
                rotateLeft(u);
            }
            if (r == u) {
                r = u.parent;
            }
        }
    }
An example of the remove(x) operation is shown in Figure 7.8.



Figure 7.8 Removing the value 9 from the Treap in Figure 7.5.

The trick to analyze the running time of the remove(x) operation is to notice that this operation reverses the add(x) operation. In particular, if we were to reinsert x, using the same priority u.p, then the add(x) operation would do exactly the same number of rotations and would restore the Treap to exactly the same state it was in before the remove(x) operation took place. (Reading from bottom-to-top, Figure 7.8 illustrates the addition of the value 9 into a Treap.) This means that the expected running time of the remove(x) on a Treap of size n is proportional to the expected running time of the add(x) operation on a Treap of size \texttt{n}-1. We conclude that the expected running time of remove(x) is O(\log \texttt{n}).

7.2.1 Summary

The following theorem summarizes the performance of the Treap data structure:

Theorem 7.2 A Treap implements the SSet interface. A Treap supports the operations add(x), remove(x), and find(x) in O(\log \texttt{n}) expected time per operation.

It is worth comparing the Treap data structure to the SkiplistSSet data structure. Both implement the SSet operations in O(\log \texttt{n}) expected time per operation. In both data structures, add(x) and remove(x) involve a search and then a constant number of pointer changes (see Exercise 7.5 below). Thus, for both these structures, the expected length of the search path is the critical value in assessing their performance. In a SkiplistSSet, the expected length of a search path is \begin{equation*} 2\log \texttt{n} + O(1) \enspace , \end{equation*} while in a Treap, the expected length of a search path is \begin{equation*} 2\ln \texttt{n} +O(1) \approx 1.386\log \texttt{n} + O(1) \enspace . \end{equation*} Thus, the search paths in a Treap are considerably shorter, and this translates into noticeably faster operations on Treaps than on Skiplists. Exercise 4.7 in Chapter 4 shows how the expected length of the search path in a Skiplist can be reduced to \begin{equation*} e\ln \texttt{n} + O(1) \approx 1.884\log \texttt{n} + O(1) \end{equation*} by using biased coin tosses. Even with this optimization, the expected length of search paths in a SkiplistSSet is noticeably longer than in a Treap.
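The difference between these constants is easy to see numerically. The following snippet (illustrative only) evaluates the three expected path length expressions for n=10^6:

```java
// Compare the expected search path length constants quoted above:
// skiplist 2 log2 n, treap 2 ln n ≈ 1.386 log2 n, biased skiplist e ln n.
public class PathLengthComparison {
    static double log2(double n) { return Math.log(n) / Math.log(2); }

    public static void main(String[] args) {
        int n = 1_000_000;
        double skiplist = 2 * log2(n);           // ≈ 39.9
        double treap = 2 * Math.log(n);          // ≈ 27.6
        double biasedSkiplist = Math.E * Math.log(n); // ≈ 37.6
        System.out.printf("n = %d%n", n);
        System.out.printf("skiplist        ~ %.1f%n", skiplist);
        System.out.printf("treap           ~ %.1f%n", treap);
        System.out.printf("biased skiplist ~ %.1f%n", biasedSkiplist);
    }
}
```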

7.3 Discussion and Exercises

Random binary search trees have been studied extensively. Devroye [19] gives a proof of Lemma 7.1 and related results. There are much stronger results in the literature as well, the most impressive of which is due to Reed [64], who shows that the expected height of a random binary search tree is \begin{equation*} \alpha\ln n - \beta\ln\ln n + O(1) \end{equation*} where \alpha\approx4.31107 is the unique solution on the interval [2,\infty) of the equation \alpha\ln(2e/\alpha)=1 and \beta=\frac{3}{2\ln(\alpha/2)}. Furthermore, the variance of the height is constant.

The name Treap was coined by Seidel and Aragon [67] who discussed Treaps and some of their variants. However, their basic structure was studied much earlier by Vuillemin [76] who called them Cartesian trees.

One possible space-optimization of the Treap data structure is the elimination of the explicit storage of the priority p in each node. Instead, the priority of a node, u, is computed by hashing u's address in memory (in 32-bit Java, this is equivalent to hashing u.hashCode()). Although a number of hash functions will probably work well for this in practice, for the important parts of the proof of Lemma 7.1 to remain valid, the hash function should be randomized and have the min-wise independent property: For any distinct values x_1,\ldots,x_k, each of the hash values h(x_1),\ldots,h(x_k) should be distinct with high probability and, for each i\in\{1,\ldots,k\}, \begin{equation*} \Pr\{h(x_i) = \min\{h(x_1),\ldots,h(x_k)\}\} \le c/k \end{equation*} for some constant c. One such class of hash functions that is easy to implement and fairly fast is tabulation hashing (Section 5.2.3).
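As a concrete illustration, here is a sketch of how tabulation hashing might be used to derive a priority from a node's hashCode(). This is an assumption-laden example, not the book's implementation: the 32-bit code is split into four bytes, each byte indexes a table of random 32-bit values, and the table entries are XORed together.

```java
import java.util.Random;

// Sketch of tabulation hashing for deriving treap priorities from
// hashCode() values. Illustrative only; not the book's implementation.
public class TabulationHash {
    // Four tables of 256 random 32-bit values, one per byte position.
    static final int[][] TAB = new int[4][256];
    static {
        Random rng = new Random(1); // fixed seed so the demo is reproducible
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 256; j++)
                TAB[i][j] = rng.nextInt();
    }

    // XOR together one table entry per byte of x.
    static int hash(int x) {
        return TAB[0][x & 0xff]
             ^ TAB[1][(x >>> 8) & 0xff]
             ^ TAB[2][(x >>> 16) & 0xff]
             ^ TAB[3][(x >>> 24) & 0xff];
    }

    public static void main(String[] args) {
        System.out.printf("hash(0) = %08x, hash(1) = %08x%n", hash(0), hash(1));
    }
}
```

In a real deployment the tables would be filled from a fresh random seed at startup, so that an adversary cannot predict the priorities.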

Another Treap variant that doesn't store priorities at each node is the randomized binary search tree of Martínez and Roura [51]. In this variant, every node, u, stores the size, u.size, of the subtree rooted at u. Both the add(x) and remove(x) algorithms are randomized. The algorithm for adding x to the subtree rooted at u does the following:

  1. With probability 1/(\texttt{size(u)}+1), the value x is added in the usual way, as a leaf, and rotations are then done to bring x up to the root of this subtree.
  2. Otherwise (with probability 1-1/(\texttt{size(u)}+1)), the value x is recursively added into one of the two subtrees rooted at u.left or u.right, as appropriate.
The first case corresponds to an add(x) operation in a Treap where x's node receives a random priority that is smaller than any of the size(u) priorities in u's subtree, and this case occurs with exactly the same probability.
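A sketch of this add(x) algorithm, under the assumption that each node stores its subtree size, might look as follows. The class and method names are illustrative, not Martínez and Roura's code; the "bring x up to the root" step is realized here by a recursive root insertion that rotates on the way back up.

```java
import java.util.Random;

// Sketch of the Martínez-Roura randomized add(x). Each node stores the
// size of its subtree; with probability 1/(size+1) the new value becomes
// the root of the current subtree. Illustrative, not the original code.
public class RandomizedBst {
    static final class Node {
        int x, size = 1;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    static int size(Node u) { return u == null ? 0 : u.size; }

    static void fixSize(Node u) { u.size = 1 + size(u.left) + size(u.right); }

    // Insert x so that it ends up as the root of this subtree,
    // using rotations on the way back up the recursion.
    static Node rootInsert(Node r, int x) {
        if (r == null) return new Node(x);
        if (x < r.x) {
            r.left = rootInsert(r.left, x);
            Node w = r.left;          // rotate right: w comes above r
            r.left = w.right;
            w.right = r;
            fixSize(r); fixSize(w);
            return w;
        } else {
            r.right = rootInsert(r.right, x);
            Node w = r.right;         // rotate left: w comes above r
            r.right = w.left;
            w.left = r;
            fixSize(r); fixSize(w);
            return w;
        }
    }

    static Node add(Node r, int x, Random rng) {
        // With probability 1/(size(r)+1), x becomes the root of this subtree.
        if (rng.nextInt(size(r) + 1) == 0) return rootInsert(r, x);
        if (x < r.x) r.left = add(r.left, x, rng);
        else r.right = add(r.right, x, rng);
        r.size++;
        return r;
    }

    // In-order traversal, used to check the BST property.
    static void inorder(Node u, java.util.List<Integer> out) {
        if (u == null) return;
        inorder(u.left, out);
        out.add(u.x);
        inorder(u.right, out);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        Node r = null;
        for (int i = 0; i < 1000; i++) r = add(r, i, rng);
        System.out.println("size = " + size(r));
    }
}
```

Note that when the subtree is empty, size(r)+1 is one, so the new value is always installed as a (leaf) root, which is exactly the base case of the recursion.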

Removing a value x from a randomized binary search tree is similar to the process of removing from a Treap. We find the node, u, that contains x and then perform rotations that repeatedly increase the depth of u until it becomes a leaf, at which point we can splice it from the tree. The choice of whether to perform a left or right rotation at each step is randomized.

  1. With probability u.left.size/(u.size-1), we perform a right rotation at u, making u.left the root of the subtree that was formerly rooted at u.
  2. With probability u.right.size/(u.size-1), we perform a left rotation at u, making u.right the root of the subtree that was formerly rooted at u.
Again, we can easily verify that these are exactly the same probabilities that the removal algorithm in a Treap will perform a left or right rotation of u.

Randomized binary search trees have the disadvantage, compared to treaps, that when adding and removing elements they make many random choices, and they must maintain the sizes of subtrees. One advantage of randomized binary search trees over treaps is that subtree sizes can serve another useful purpose, namely to provide access by rank in O(\log \texttt{n}) expected time (see Exercise 7.10). In comparison, the random priorities stored in treap nodes have no use other than keeping the treap balanced.

Exercise 7.1 Illustrate the addition of 4.5 (with priority 7) and then 7.5 (with priority 20) on the Treap in Figure 7.5.

Exercise 7.2 Illustrate the removal of 5 and then 7 on the Treap in Figure 7.5.

Exercise 7.3 Prove the assertion that there are 21,964,800 sequences that generate the tree on the right hand side of Figure 7.1. (Hint: Give a recursive formula for the number of sequences that generate a complete binary tree of height h and evaluate this formula for h=3.)

Exercise 7.4 Design and implement the permute(a) method that takes as input an array, a, that contains n distinct values and randomly permutes a. The method should run in O(\texttt{n}) time and you should prove that each of the \texttt{n}! possible permutations of a is equally probable.

Exercise 7.5 Use both parts of Lemma 7.2 to prove that the expected number of rotations performed by an add(x) operation (and hence also a remove(x) operation) is O(1).

Exercise 7.6 Modify the Treap implementation given here so that it does not explicitly store priorities. Instead, it should simulate them by hashing the hashCode() of each node.

Exercise 7.7 Suppose that a binary search tree stores, at each node, u, the height, u.height, of the subtree rooted at u, and the size, u.size, of the subtree rooted at u.
  1. Show how, if we perform a left or right rotation at u, then these two quantities can be updated, in constant time, for all nodes affected by the rotation.
  2. Explain why the same result is not possible if we try to also store the depth, u.depth, of each node u.

Exercise 7.8 Design and implement an algorithm that constructs a Treap from a sorted array, a, of n elements. This method should run in O(\texttt{n}) worst-case time and should construct a Treap that is indistinguishable from one in which the elements of a were added one at a time using the add(x) method.

Exercise 7.9 This exercise works out the details of how one can efficiently search a Treap given a pointer that is close to the node we are searching for.
  1. Design and implement a Treap implementation in which each node keeps track of the minimum and maximum values in its subtree.
  2. Using this extra information, add a fingerFind(x,u) method that executes the find(x) operation with the help of a pointer to the node u (which is hopefully not far from the node that contains x). This operation should start at u and walk upwards until it reaches a node w such that \texttt{w.min}\le \texttt{x}\le \texttt{w.max}. From that point onwards, it should perform a standard search for x starting from w. (One can show that fingerFind(x,u) takes O(1+\log r) time, where r is the number of elements in the treap whose value is between x and u.x.)
  3. Extend your implementation into a version of a treap that starts all its find(x) operations from the node most recently found by find(x).

Exercise 7.10 Design and implement a version of a Treap that includes a get(i) operation that returns the key with rank i in the Treap. (Hint: Have each node, u, keep track of the size of the subtree rooted at u.)

Exercise 7.11 Implement a TreapList, an implementation of the List interface as a treap. Each node in the treap should store a list item, and an in-order traversal of the treap finds the items in the same order that they occur in the list. All the List operations get(i), set(i,x), add(i,x) and remove(i) should run in O(\log \texttt{n}) expected time.

Exercise 7.12 Design and implement a version of a Treap that supports the split(x) operation. This operation removes all values from the Treap that are greater than x and returns a second Treap that contains all the removed values.

Example: the code t2 = t.split(x) removes from t all values greater than x and returns a new Treap t2 containing all these values. The split(x) operation should run in O(\log \texttt{n}) expected time.

Warning: For this modification to work properly and still allow the size() method to run in constant time, it is necessary to implement the modifications in Exercise 7.10.

Exercise 7.13 Design and implement a version of a Treap that supports the absorb(t2) operation, which can be thought of as the inverse of the split(x) operation. This operation removes all values from the Treap t2 and adds them to the receiver. This operation presupposes that the smallest value in t2 is greater than the largest value in the receiver. The absorb(t2) operation should run in O(\log \texttt{n}) expected time.

Exercise 7.14 Implement Martínez's randomized binary search trees, as discussed in this section. Compare the performance of your implementation with that of the Treap implementation.