

6.2 $ \mathtt{BinarySearchTree}$: An Unbalanced Binary Search Tree

A $ \mathtt{BinarySearchTree}$ is a special kind of binary tree in which each node, $ \mathtt{u}$, also stores a data value, $ \mathtt{u.x}$, from some total order. The data values in a binary search tree obey the binary search tree property: For a node, $ \mathtt{u}$, every data value stored in the subtree rooted at $ \mathtt{u.left}$ is less than $ \mathtt{u.x}$ and every data value stored in the subtree rooted at $ \mathtt{u.right}$ is greater than $ \mathtt{u.x}$. An example of a $ \mathtt{BinarySearchTree}$ is shown in Figure 6.5.

Figure 6.5: A binary search tree.
\includegraphics{figs/bst-example}
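The code fragments in this section refer to a few members of the enclosing tree class: the root $ \mathtt{r}$, the node count $ \mathtt{n}$, the sentinel $ \mathtt{nil}$, a three-way $ \mathtt{compare}$ function, and a $ \mathtt{null}$ value returned when a search finds nothing. The following minimal sketch shows one way such a class could be laid out; the member names match the snippets below, but the concrete choices made here ($ \mathtt{nil}$ as a null pointer, $ \mathtt{null}$ as a default-constructed value, $ \mathtt{compare}$ built from the < operator) are assumptions of this sketch, not definitions taken from the book.

  #include <cstddef>

  template<class T>
  struct BinarySearchTree {
    struct Node {
      T x;                    // the data value stored at this node
      Node *left, *right;     // children (nil when absent)
      Node *parent;           // parent (nil for the root)
      Node() : left(NULL), right(NULL), parent(NULL) {}
    };
    Node *nil;   // sentinel meaning "no node"; assumed here to be a null pointer
    Node *r;     // root of the tree (nil when the tree is empty)
    int n;       // number of values currently stored
    T null;      // value returned by a failed search; assumed to be T()
    BinarySearchTree() : nil(NULL), r(NULL), n(0), null(T()) {}
    // three-way comparison used by the search routines
    static int compare(const T &a, const T &b) {
      return a < b ? -1 : (b < a ? 1 : 0);
    }
    // ...the findEQ, find, add, findLast, addChild, splice, and remove
    // routines from this section go here as member functions...
  };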

6.2.1 Searching

The binary search tree property is extremely useful because it allows us to quickly locate a value, $ \mathtt{x}$, in a binary search tree. To do this we start searching for $ \mathtt{x}$ at the root, $ \mathtt{r}$. When examining a node, $ \mathtt{u}$, there are three cases:

  1. If $ \ensuremath{\mathtt{x}}< \ensuremath{\mathtt{u.x}}$ then the search proceeds to $ \mathtt{u.left}$;
  2. If $ \ensuremath{\mathtt{x}}> \ensuremath{\mathtt{u.x}}$ then the search proceeds to $ \mathtt{u.right}$;
  3. If $ \ensuremath{\mathtt{x}}= \ensuremath{\mathtt{u.x}}$ then we have found the node $ \mathtt{u}$ containing $ \mathtt{x}$.
The search terminates when Case 3 occurs or when $ \mathtt{u=nil}$. In the former case, we found $ \mathtt{x}$. In the latter case, we conclude that $ \mathtt{x}$ is not in the binary search tree.
  T findEQ(T x) {
    Node *w = r;                    // start the search at the root
    while (w != nil) {
      int comp = compare(x, w->x);
      if (comp < 0) {               // Case 1: x < w.x, go left
        w = w->left;
      } else if (comp > 0) {        // Case 2: x > w.x, go right
        w = w->right;
      } else {                      // Case 3: found a node storing x
        return w->x;
      }
    }
    return null;                    // fell off the tree: x is not present
  }

Two examples of searches in a binary search tree are shown in Figure 6.6. As the second example shows, even if we don't find $ \mathtt{x}$ in the tree, we still gain some valuable information. If we look at the last node, $ \mathtt{u}$, at which Case 1 occurred, we see that $ \mathtt{u.x}$ is the smallest value in the tree that is greater than $ \mathtt{x}$. Similarly, the last node at which Case 2 occurred contains the largest value in the tree that is less than $ \mathtt{x}$. Therefore, by keeping track of the last node, $ \mathtt{z}$, at which Case 1 occurs, a $ \mathtt{BinarySearchTree}$ can implement the $ \mathtt{find(x)}$ operation that returns the smallest value stored in the tree that is greater than or equal to $ \mathtt{x}$:

  T find(T x) {
    Node *w = r, *z = nil;          // z: last node at which Case 1 occurred
    while (w != nil) {
      int comp = compare(x, w->x);
      if (comp < 0) {
        z = w;                      // w.x > x, so w is the best successor so far
        w = w->left;
      } else if (comp > 0) {
        w = w->right;
      } else {
        return w->x;                // exact match
      }
    }
    return z == nil ? null : z->x;  // smallest stored value >= x, if any
  }

Figure 6.6: An example of (a) a successful search (for $ 6$) and (b) an unsuccessful search (for $ 10$) in a binary search tree.
\includegraphics{figs/bst-example-2} \includegraphics{figs/bst-example-3}
(a) (b)

6.2.2 Addition

To add a new value, $ \mathtt{x}$, to a $ \mathtt{BinarySearchTree}$, we first search for $ \mathtt{x}$. If we find it, then there is no need to insert it. Otherwise, we store $ \mathtt{x}$ at a leaf child of the last node, $ \mathtt{p}$, encountered during the search for $ \mathtt{x}$. Whether the new node is the left or right child of $ \mathtt{p}$ depends on the result of comparing $ \mathtt{x}$ and $ \mathtt{p.x}$.

  bool add(T x) {
    Node *p = findLast(x);
    Node *u = new Node;
    u->x = x;
    return addChild(p, u);
  }
  Node* findLast(T x) {
    Node *w = r, *prev = nil;       // prev: last non-nil node visited
    while (w != nil) {
      prev = w;
      int comp = compare(x, w->x);
      if (comp < 0) {
        w = w->left;
      } else if (comp > 0) {
        w = w->right;
      } else {
        return w;                   // x is already stored at w
      }
    }
    return prev;                    // deepest node on the search path for x
  }
  bool addChild(Node *p, Node *u) {
    if (p == nil) {
      r = u;              // inserting into empty tree
    } else {
      int comp = compare(u->x, p->x);
      if (comp < 0) {
        p->left = u;
      } else if (comp > 0) {
        p->right = u;
      } else {
        return false;     // u.x is already in the tree
      }
      u->parent = p;
    }
    n++;
    return true;
  }
An example is shown in Figure 6.7. The most time-consuming part of this process is the initial search for $ \mathtt{x}$, which takes time proportional to the depth of the newly added node $ \mathtt{u}$. In the worst case, this depth is equal to the height of the $ \mathtt{BinarySearchTree}$.

Figure 6.7: Inserting the value $ 8.5$ into a binary search tree.
\includegraphics{figs/bst-example-4} \includegraphics{figs/bst-example-5}
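As a quick sanity check of how $ \mathtt{add(x)}$ and $ \mathtt{find(x)}$ interact, here is a small hypothetical driver. The key values are my own, not those of the figures, and it assumes the snippets above have been placed into the sketch class from the start of this section as member functions.

  #include <cassert>

  int main() {
    BinarySearchTree<int> t;            // the sketch class from above
    int keys[] = {7, 3, 11, 1, 5, 9, 13};
    for (int i = 0; i < 7; i++)
      assert(t.add(keys[i]));           // all seven insertions succeed
    assert(t.find(5) == 5);             // 5 is stored: Case 3 of the search
    assert(t.find(6) == 7);             // 6 is absent: find returns the
                                        // smallest stored value >= 6
    assert(t.add(5) == false);          // duplicate: addChild refuses it
    return 0;
  }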

6.2.3 Removal

Deleting a value stored in a node, $ \mathtt{u}$, of a $ \mathtt{BinarySearchTree}$ is a little more difficult. If $ \mathtt{u}$ is a leaf, then we can just detach $ \mathtt{u}$ from its parent. Even better: If $ \mathtt{u}$ has only one child, then we can splice $ \mathtt{u}$ from the tree by having $ \mathtt{u.parent}$ adopt $ \mathtt{u}$'s child (see Figure 6.8):

  void splice(Node *u) {
    Node *s, *p;
    if (u->left != nil) {            // s is u's only child (nil if u is a leaf)
      s = u->left;
    } else {
      s = u->right;
    }
    if (u == r) {                    // u is the root: s becomes the new root
      r = s;
      p = nil;
    } else {                         // otherwise u's parent adopts s in u's place
      p = u->parent;
      if (p->left == u) {
        p->left = s;
      } else {
        p->right = s;
      }
    }
    if (s != nil) {
      s->parent = p;
    }
    n--;
  }

Figure 6.8: Removing a leaf ($ 6$) or a node with only one child ($ 9$) is easy.
\includegraphics{figs/bst-splice}

Things get tricky, though, when $ \mathtt{u}$ has two children. In this case, the simplest thing to do is to find a node, $ \mathtt{w}$, that has fewer than two children and whose value can replace $ \mathtt{u.x}$. To maintain the binary search tree property, the value $ \mathtt{w.x}$ should be close to the value of $ \mathtt{u.x}$. For example, picking $ \mathtt{w}$ such that $ \mathtt{w.x}$ is the smallest value greater than $ \mathtt{u.x}$ will do. Finding the node $ \mathtt{w}$ is easy; it is the node storing the smallest value in the subtree rooted at $ \mathtt{u.right}$. This node can be easily removed because it has no left child. (See Figure 6.9.)

  void remove(Node *u) {
    if (u->left == nil || u->right == nil) {
      splice(u);                     // at most one child: splice u out directly
      delete u;
    } else {
      Node *w = u->right;            // find the smallest value in u's right subtree
      while (w->left != nil)
        w = w->left;
      u->x = w->x;                   // overwrite u's value with it ...
      splice(w);                     // ... and splice out the node it came from
      delete w;
    }
  }

Figure 6.9: Deleting a value ($ 11$) from a node, $ \mathtt{u}$, with two children is done by replacing $ \mathtt{u}$'s value with the smallest value in the right subtree of $ \mathtt{u}$.
\includegraphics{figs/bst-delete-1} \includegraphics{figs/bst-delete-2}  
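The summary in the next subsection credits the $ \mathtt{BinarySearchTree}$ with a value-based $ \mathtt{remove(x)}$ operation, while the code above removes a given node $ \mathtt{u}$. One plausible wrapper, sketched here as an assumption rather than quoted from the book, combines $ \mathtt{findLast(x)}$ with $ \mathtt{remove(u)}$:

  // Sketch: value-based removal built from findLast(x) and remove(u) above.
  bool remove(T x) {
    Node *u = findLast(x);                     // deepest node on the search path for x
    if (u != nil && compare(x, u->x) == 0) {   // x really is stored at u
      remove(u);                               // node-based removal from above
      return true;
    }
    return false;                              // x was not in the tree
  }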

6.2.4 Summary

The $ \mathtt{find(x)}$, $ \mathtt{add(x)}$, and $ \mathtt{remove(x)}$ operations in a $ \mathtt{BinarySearchTree}$ each involve following a path from the root of the tree to some node in the tree. Without knowing more about the shape of the tree it is difficult to say much about the length of this path, except that it is less than $ \mathtt{n}$, the number of nodes in the tree. The following (unimpressive) theorem summarizes the performance of the $ \mathtt{BinarySearchTree}$ data structure:

Theorem 6.1   A $ \mathtt{BinarySearchTree}$ implements the $ \mathtt{SSet}$ interface. A $ \mathtt{BinarySearchTree}$ supports the operations $ \mathtt{add(x)}$, $ \mathtt{remove(x)}$, and $ \mathtt{find(x)}$ in $ O(\ensuremath{\mathtt{n}})$ time per operation.

Theorem 6.1 compares poorly with Theorem 4.1, which shows that the $ \mathtt{SkiplistSSet}$ structure can implement the $ \mathtt{SSet}$ interface with $ O(\log \ensuremath{\mathtt{n}})$ expected time per operation. The problem with the $ \mathtt{BinarySearchTree}$ structure is that it can become unbalanced. Instead of looking like the tree in Figure 6.5 it can look like a long chain of $ \mathtt{n}$ nodes, all but the last having exactly one child.
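For example, adding keys in increasing order produces exactly this degenerate shape. A short sketch, again using the assumed class and the $ \mathtt{add(x)}$ snippet from above:

  // Adding keys in sorted order turns the tree into a right-going chain:
  // each new key is larger than everything already stored, so it becomes
  // the right child of the current deepest node.  After this loop, a call
  // such as t.find(999) must walk through all 1000 nodes.
  BinarySearchTree<int> t;
  for (int i = 0; i < 1000; i++)
    t.add(i);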

There are a number of ways of avoiding unbalanced binary search trees, all of which lead to data structures that have $ O(\log \ensuremath{\mathtt{n}})$ time operations. In Chapter 7 we show how $ O(\log \ensuremath{\mathtt{n}})$ expected time operations can be achieved with randomization. In Chapter 8 we show how $ O(\log \ensuremath{\mathtt{n}})$ amortized time operations can be achieved with partial rebuilding operations. In Chapter 9 we show how $ O(\log \ensuremath{\mathtt{n}})$ worst-case time operations can be achieved by simulating a tree that is not binary: a tree in which nodes can have up to four children.
