13.1 $ \mathtt{BinaryTrie}$: A digital search tree

A $ \mathtt{BinaryTrie}$ encode a set of $ \mathtt{w}$ bit integers in a binary tree. All leaves in the tree have depth $ \mathtt{w}$ and each integer is encoded as a root-to-leaf path. The path for the integer $ \mathtt{x}$ turns left at level $ \mathtt{i}$ if the $ \mathtt{i}$th most significant bit of $ \mathtt{x}$ is a 0 and turns right if it is a 1. Figure 13.1 shows an example for the case $ \ensuremath{\mathtt{w}}=4$, in which the trie stores the integers 3(0011), 9(1001), 12(1100), and 13(1101).

Figure 13.1: The integers stored in a binary trie are encoded as root-to-leaf paths.
\includegraphics{figs/binarytrie-ex-1}

Because the search path for a value $ \mathtt{x}$ depends on the bits of $ \mathtt{x}$ it will be helpful to name the children of a node, $ \mathtt{u}$, $ \mathtt{u.child[0]}$ ( $ \mathtt{left}$) and $ \mathtt{u.child[1]}$ ( $ \mathtt{right}$). These child pointers will actually serve double-duty. Since the leaves in a binary trie have no children, the pointers are used to string the leaves together into a doubly-linked list. For a leaf in the binary trie $ \mathtt{u.child[0]}$ ( $ \mathtt{prev}$) is the node that comes before $ \mathtt{u}$ in the list and $ \mathtt{u.child[1]}$ ( $ \mathtt{next}$) is the node that follows $ \mathtt{u}$ in the list. A special node, $ \mathtt{dummy}$, is used both before the first node and after the last node in the list (see Section 3.2). In the code samples, $ \mathtt{u.child[0]}$, $ \mathtt{u.left}$, and $ \mathtt{u.prev}$ refer to the same field in the node $ \mathtt{u}$, as do $ \mathtt{u.child[1]}$, $ \mathtt{u.right}$, and $ \mathtt{u.next}$.

Each node, $ \mathtt{u}$, also contains an additional pointer $ \mathtt{u.jump}$. If $ \mathtt{u}$'s left child is missing, then $ \mathtt{u.jump}$ points to the smallest leaf in $ \mathtt{u}$'s subtree. If $ \mathtt{u}$'s right child is missing, then $ \mathtt{u.jump}$ points to the largest leaf in $ \mathtt{u}$'s subtree. An example of a $ \mathtt{BinaryTrie}$, showing $ \mathtt{jump}$ pointers and the doubly-linked list at the leaves, is shown in Figure 13.2

Figure 13.2: A $ \mathtt{BinaryTrie}$ with $ \mathtt{jump}$ pointers shown as curved dashed edges.
\includegraphics{figs/binarytrie-ex-2}

The $ \mathtt{find(x)}$ operation in a $ \mathtt{BinaryTrie}$ is fairly straightforward. We try to follow the search path for $ \mathtt{x}$ in the trie. If we reach a leaf, then we have found $ \mathtt{x}$. If, we reach a node $ \mathtt{u}$ where we cannot proceed (because $ \mathtt{u}$ is missing a child) then we follow $ \mathtt{u.jump}$, which takes us either to smallest leaf larger than $ \mathtt{x}$ or the largest leaf smaller than $ \mathtt{x}$. Which of these two cases occurs depends on whether $ \mathtt{u}$ is missing its left or right child, respectively. In the former case ( $ \mathtt{u}$ is missing its left child), we have found the value we are looking for. In the latter case ( $ \mathtt{u}$ is missing its right child), we can use the linked list to reach the value we are looking for. Each of these cases is illustrated in Figure 13.3.

  T find(T x) {
    int i, c = 0;
    unsigned ix = intValue(x);
    Node *u = &r;
    for (i = 0; i < w; i++) {
      c = (ix >> (w-i-1)) & 1;
      if (u->child[c] == NULL) break;
      u = u->child[c];
    }
    if (i == w) return u->x;  // found it
    u = (c == 0) ? u->jump : u->jump->next;
    return u == &dummy ? NULL : u->x;
  }
Figure 13.3: The paths followed by $ \mathtt{find(5)}$ and $ \mathtt{find(8)}$.
\includegraphics{figs/binarytrie-ex-3}
The running-time of the $ \mathtt{find(x)}$ method is dominated by the time it takes to follow a root-to-leaf path, so it runs in $ O(\ensuremath{\mathtt{w}})$ time.

The $ \mathtt{add(x)}$ operation in a $ \mathtt{BinaryTrie}$ is fairly straightforward, but it has a lot of things to take care of:

  1. It follows the search path for $ \mathtt{x}$ until reaching a node $ \mathtt{u}$ where it can no longer proceed.
  2. It creates the remainder of the search path from $ \mathtt{u}$ to a leaf that contains $ \mathtt{x}$.
  3. It adds the node, $ \mathtt{u'}$, containing $ \mathtt{x}$ to the linked list of leaves (it has access to $ \mathtt{u'}$'s predecessor, $ \mathtt{pred}$, in the linked list from the $ \mathtt{jump}$ pointer of the last node, $ \mathtt{u}$, encountered during step 1.)
  4. It walks back up the search path for $ \mathtt{x}$ adjusting $ \mathtt{jump}$ pointers at the nodes whose $ \mathtt{jump}$ pointer should now point to $ \mathtt{x}$.
An addition is illustrated in Figure 13.4.
Figure 13.4: Adding the values 2 and 15 to the $ \mathtt{BinaryTrie}$ in Figure 13.2.
\includegraphics{figs/binarytrie-add}
  bool add(T x) {
    int i, c = 0;
    unsigned ix = intValue(x);
    Node *u = &r;
    // 1 - search for ix until falling out of the trie
    for (i = 0; i < w; i++) {
      c = (ix >> (w-i-1)) & 1;
      if (u->child[c] == NULL) break;
      u = u->child[c];
    }
    if (i == w) return false; // trie already contains x - abort
    Node *pred = (c == right) ? u->jump : u->jump->left; // save for step 3
    u->jump = NULL;  // u will have two children shortly
    // 2 - add path to ix
    for (; i < w; i++) {
      c = (ix >> (w-i-1)) & 1;
      u->child[c] = new Node();
      u->child[c]->parent = u;
      u = u->child[c];
    }
    u->x = x;
    // 3 - add u to linked list
    u->prev = pred;
    u->next = pred->next;;
    u->prev->next = u;
    u->next->prev = u;
    // 4 - walk back up, updating jump pointers
    Node *v = u->parent;
    while (v != NULL) {
      if ((v->left == NULL
          && (v->jump == NULL || intValue(v->jump->x) > ix))
      || (v->right == NULL
          && (v->jump == NULL || intValue(v->jump->x) < ix)))
        v->jump = u;
      v = v->parent;
    }
    n++;
    return true;
  }
This method performs one walk down the search path for $ \mathtt{x}$ and one walk back up. Each step of these walks takes constant time, so the $ \mathtt{add(x)}$ runs in $ O(\ensuremath{\mathtt{w}})$ time.

The $ \mathtt{remove(x)}$ operation undoes the work of $ \mathtt{add(x)}$. Like $ \mathtt{add(x)}$, it has a lot of things to take care of:

  1. It follows the search path for $ \mathtt{x}$ until reaching the leaf, $ \mathtt{u}$, containing $ \mathtt{x}$.
  2. It removes $ \mathtt{u}$ from the doubly-linked list
  3. It deletes $ \mathtt{u}$ and then walks back up the search path for $ \mathtt{x}$ deleting nodes until reaching a node $ \mathtt{v}$ that has a child that is not on the search path for $ \mathtt{x}$
  4. It walks upwards from $ \mathtt{v}$ to the root updating any $ \mathtt{jump}$ pointers that point to $ \mathtt{u}$.
A removal is illustrated in Figure 13.5.
Figure 13.5: Removing the value 9 from the $ \mathtt{BinaryTrie}$ in Figure 13.2.
\includegraphics{figs/binarytrie-remove}
  bool remove(T x) {
    // 1 - find leaf, u, containing x
    int i = 0, c;
    unsigned ix = intValue(x);
    Node *u = &r;
    for (i = 0; i < w; i++) {
      c = (ix >> (w-i-1)) & 1;
      if (u->child[c] == NULL) return false;
      u = u->child[c];
    }
    // 2 - remove u from linked list
    u->prev->next = u->next;
    u->next->prev = u->prev;
    Node *v = u;
    // 3 - delete nodes on path to u
    for (i = w-1; i >= 0; i--) {
      c = (ix >> (w-i-1)) & 1;
      v = v->parent;
      delete v->child[c];
      v->child[c] = NULL;
      if (v->child[1-c] != NULL) break;
    }
    // 4 - update jump pointers
    v->jump = u;
    for (; i >= 0; i--) {
      c = (ix >> (w-i-1)) & 1;
      if (v->jump == u)
        v->jump = u->child[1-c];
      v = v->parent;
    }
    n--;
    return true;
  }

Theorem 13..1   A $ \mathtt{BinaryTrie}$ implements the $ \mathtt{SSet}$ interface for $ \mathtt{w}$-bit integers. A $ \mathtt{BinaryTrie}$ supports the operations $ \mathtt{add(x)}$, $ \mathtt{remove(x)}$, and $ \mathtt{find(x)}$ in $ O(\ensuremath{\mathtt{w}})$ time per operation. The space used by a $ \mathtt{BinaryTrie}$ that stores $ \mathtt{n}$ values is $ O(\ensuremath{\mathtt{n}}\cdot\ensuremath{\mathtt{w}})$.

opendatastructures.org