13.2 $ \mathtt{XFastTrie}$: Searching in Doubly-Logarithmic Time

The performance of the $ \mathtt{BinaryTrie}$ structure is not that impressive. The number of elements, $ \mathtt{n}$, stored in the structure is at most $ 2^{\ensuremath{\mathtt{w}}}$, so $ \log \ensuremath{\mathtt{n}} \le \ensuremath{\mathtt{w}}$. In other words, any of the comparison-based $ \mathtt{SSet}$ structures described in other parts of this book are at least as efficient as a $ \mathtt{BinaryTrie}$, and don't have the restriction of only being able to store integers.

Next we describe the $ \mathtt{XFastTrie}$, which is just a $ \mathtt{BinaryTrie}$ with $ \mathtt{w+1}$ hash tables--one for each level of the trie. These hash tables are used to speed up the $ \mathtt{find(x)}$ operation to $ O(\log \ensuremath{\mathtt{w}})$ time. Recall that the $ \mathtt{find(x)}$ operation in a $ \mathtt{BinaryTrie}$ is almost complete once we reach a node, $ \mathtt{u}$, where the search path for $ \mathtt{x}$ would like to proceed to $ \mathtt{u.right}$ (or $ \mathtt{u.left}$) but $ \mathtt{u}$ has no right (respectively, left) child. At this point, the search uses $ \mathtt{u.jump}$ to jump to a leaf, $ \mathtt{v}$, of the $ \mathtt{BinaryTrie}$ and either return $ \mathtt{v}$ or its successor in the linked list of leaves. An $ \mathtt{XFastTrie}$ speeds up the search process by using binary search on the levels of the trie to locate the node $ \mathtt{u}$.

To use binary search, we need a way to determine if the node $ \mathtt{u}$ we are looking for is above a particular level, $ \mathtt{i}$, of if $ \mathtt{u}$ is at or below level $ \mathtt{i}$. This information is given by the highest-order $ \mathtt{i}$ bits in the binary representation of $ \mathtt{x}$; these bits determine the search path that $ \mathtt{x}$ takes from the root to level $ \mathtt{i}$. For an example, refer to Figure 13.6; in this figure the last node, $ \mathtt{u}$, on search path for 14 (whose binary representation is 1110) is the node labelled $ 11{\star \star }$ at level 2 because there is no node labelled $ 111{\star}$ at level 3. Thus, we can label each node at level $ \mathtt{i}$ with an $ \mathtt{i}$-bit integer. Then the node $ \mathtt{u}$ we are searching for is at or below level $ \mathtt{i}$ if and only if there is a node at level $ \mathtt{i}$ whose label matches the highest-order $ \mathtt{i}$ bits of $ \mathtt{x}$.

Figure 13.6: The search path for 14 (1110) ends at the node labelled $ 11{\star \star }$ since there is no node labelled $ 111\star $.
\includegraphics{figs/xfast-path}

In an $ \mathtt{XFastTrie}$, we store, for each $ \ensuremath{\mathtt{i}}\in\{0,\ldots,\ensuremath{\mathtt{w}}\}$, all the nodes at level $ \mathtt{i}$ in a $ \mathtt{USet}$, $ \mathtt{t[i]}$, that is implemented as a hash table (Chapter 5). Using this $ \mathtt{USet}$ allows us to check in constant expected time if there is a node at level $ \mathtt{i}$ whose label matches the highest-order $ \mathtt{i}$ bits of $ \mathtt{x}$. In fact, we can even find this node using $ \mathtt{t[i].find(x>>(w-i))}$.

The hash tables $ \ensuremath{\mathtt{t[0]}},\ldots,\ensuremath{\mathtt{t[w]}}$ allow us to use binary search to find $ \mathtt{u}$. Initially, we know that $ \mathtt{u}$ is at some level $ \mathtt{i}$ with $ 0\le \ensuremath{\mathtt{i}}< \ensuremath{\mathtt{w}}+1$. We therefore initialize $ \ensuremath{\mathtt{l}}=0$ and $ \ensuremath{\mathtt{h}}=\ensuremath{\mathtt{w}}+1$ and repeatedly look at the hash table $ \mathtt{t[i]}$, where $ \ensuremath{\mathtt{i}}=\lfloor
(\ensuremath{\mathtt{l+h}})/2\rfloor$. If $ \ensuremath{\mathtt{t[i]}}$ contains a node whose label matches $ \mathtt{x}$'s highest-order $ \mathtt{i}$ bits then we set $ \mathtt{l=i}$ ( $ \mathtt{u}$ is at or below level $ \mathtt{i}$), otherwise we set $ \mathtt{h=i}$ ( $ \mathtt{u}$ is above level $ \mathtt{i}$). This process terminates when $ \ensuremath{\mathtt{h-l}}\le 1$, in which case we determine that $ \mathtt{u}$ is at level $ \mathtt{l}$. We then complete the $ \mathtt{find(x)}$ operation using $ \mathtt{u.jump}$ and the doubly-linked list of leaves.

  T find(T x) {
    int l = 0, h = w+1;
    unsigned ix = intValue(x);
    Node *v, *u = &r;
    while (h-l > 1) {
      int i = (l+h)/2;
      XPair<Node> p(ix >> (w-i));
      if ((v = t[i].find(p).u) == NULL) {
        h = i;
      } else {
        u = v;
        l = i;
      }
    }
    if (l == w) return u->x;
    Node *pred = (((ix >> (w-l-1)) & 1) == 1) ? u->jump : u->jump->prev;
    return (pred->next == &dummy) ? NULL : pred->next->x;
  }
Each iteration of the $ \mathtt{while}$ loop in the above method decreases $ \mathtt{h-l}$ by roughly a factor of 2, so this loop finds $ \mathtt{u}$ after $ O(\log \ensuremath{\mathtt{w}})$ iterations. Each iteration performs a constant amount of work and one $ \mathtt{find(x)}$ operation in a $ \mathtt{USet}$, which takes constant expected time. The remaining work takes only constant time, so the $ \mathtt{find(x)}$ method in an $ \mathtt{XFastTrie}$ takes only $ O(\log \ensuremath{\mathtt{w}})$ expected time.

The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ methods for an $ \mathtt{XFastTrie}$ are almost identical to the same methods in a $ \mathtt{BinaryTrie}$. The only modifications are for managing the hash tables $ \mathtt{t[0]}$,..., $ \mathtt{t[w]}$. During the $ \mathtt{add(x)}$ operation, when a new node is created at level $ \mathtt{i}$, this node is added to $ \mathtt{t[i]}$. During a $ \mathtt{remove(x)}$ operation, when a node is removed form level $ \mathtt{i}$, this node is removed from $ \mathtt{t[i]}$. Since adding and removing from a hash table take constant expected time, this does not increase the running times of $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ by more than a constant factor. We omit a code listing or $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ since it is almost identical to the (long) code listing already provided for the same methods in a $ \mathtt{BinaryTrie}$.

The following theorem summarizes the performance of an $ \mathtt{XFastTrie}$:

Theorem 13..2   An $ \mathtt{XFastTrie}$ implements the $ \mathtt{SSet}$ interface for $ \mathtt{w}$-bit integers. An $ \mathtt{XFastTrie}$ supports the operations The space used by an $ \mathtt{XFastTrie}$ that stores $ \mathtt{n}$ values is $ O(\ensuremath{\mathtt{n}}\cdot\ensuremath{\mathtt{w}})$.

opendatastructures.org