13.2 XFastTrie: Searching in Doubly-Logarithmic Time

The performance of the BinaryTrie structure is not that impressive. The number of elements, $ \mathtt{n}$, stored in the structure is at most $ 2^{\ensuremath{\mathtt{w}}}$, so $ \log \ensuremath{\mathtt{n}} \le \ensuremath{\mathtt{w}}$. In other words, any of the comparison-based SSet structures described in other parts of this book are at least as efficient as a BinaryTrie, and don't have the restriction of only being able to store integers.

Next we describe the XFastTrie, which is just a BinaryTrie with $ \mathtt{w+1}$ hash tables--one for each level of the trie. These hash tables are used to speed up the $ \mathtt{find(x)}$ operation to $ O(\log \ensuremath{\mathtt{w}})$ time. Recall that the $ \mathtt{find(x)}$ operation in a BinaryTrie is almost complete once we reach a node, $ \mathtt{u}$, where the search path for $ \mathtt{x}$ would like to proceed to $ \mathtt{u.right}$ (or $ \mathtt{u.left}$) but $ \mathtt{u}$ has no right (respectively, left) child. At this point, the search uses $ \mathtt{u.jump}$ to jump to a leaf, $ \mathtt{v}$, of the BinaryTrie and either return $ \mathtt{v}$ or its successor in the linked list of leaves. An XFastTrie speeds up the search process by using binary search on the levels of the trie to locate the node $ \mathtt{u}$.

To use binary search, we need a way to determine if the node $ \mathtt{u}$ we are looking for is above a particular level, $ \mathtt{i}$, of if $ \mathtt{u}$ is at or below level $ \mathtt{i}$. This information is given by the highest-order $ \mathtt{i}$ bits in the binary representation of $ \mathtt{x}$; these bits determine the search path that $ \mathtt{x}$ takes from the root to level $ \mathtt{i}$. For an example, refer to Figure 13.6; in this figure the last node, $ \mathtt{u}$, on search path for 14 (whose binary representation is 1110) is the node labelled $ 11{\star \star }$ at level 2 because there is no node labelled $ 111{\star}$ at level 3. Thus, we can label each node at level $ \mathtt{i}$ with an $ \mathtt{i}$-bit integer. Then the node $ \mathtt{u}$ we are searching for is at or below level $ \mathtt{i}$ if and only if there is a node at level $ \mathtt{i}$ whose label matches the highest-order $ \mathtt{i}$ bits of $ \mathtt{x}$.

Figure 13.6: The search path for 14 (1110) ends at the node labelled $ 11{\star \star }$ since there is no node labelled $ 111\star $.
\includegraphics{figs/xfast-path}

In an XFastTrie, we store, for each $ \ensuremath{\mathtt{i}}\in\{0,\ldots,\ensuremath{\mathtt{w}}\}$, all the nodes at level $ \mathtt{i}$ in a USet, $ \mathtt{t[i]}$, that is implemented as a hash table (). Using this USet allows us to check in constant expected time if there is a node at level $ \mathtt{i}$ whose label matches the highest-order $ \mathtt{i}$ bits of $ \mathtt{x}$. In fact, we can even find this node using $ \mathtt{t[i].find(x\text{\ttfamily >>}(w-i))}$.

The hash tables $ \ensuremath{\mathtt{t[0]}},\ldots,\ensuremath{\mathtt{t[w]}}$ allow us to use binary search to find $ \mathtt{u}$. Initially, we know that $ \mathtt{u}$ is at some level $ \mathtt{i}$ with $ 0\le \ensuremath{\mathtt{i}}< \ensuremath{\mathtt{w}}+1$. We therefore initialize $ \ensuremath{\mathtt{l}}=0$ and $ \ensuremath{\mathtt{h}}=\ensuremath{\mathtt{w}}+1$ and repeatedly look at the hash table $ \mathtt{t[i]}$, where $ \ensuremath{\mathtt{i}}=\lfloor
(\ensuremath{\mathtt{l+h}})/2\rfloor$. If $ \ensuremath{\mathtt{t[i]}}$ contains a node whose label matches $ \mathtt{x}$'s highest-order $ \mathtt{i}$ bits then we set $ \mathtt{l=i}$ ( $ \mathtt{u}$ is at or below level $ \mathtt{i}$), otherwise we set $ \mathtt{h=i}$ ( $ \mathtt{u}$ is above level $ \mathtt{i}$). This process terminates when $ \ensuremath{\mathtt{h-l}}\le 1$, in which case we determine that $ \mathtt{u}$ is at level $ \mathtt{l}$. We then complete the $ \mathtt{find(x)}$ operation using $ \mathtt{u.jump}$ and the doubly-linked list of leaves.

    T find(T x) {
        int l = 0, h = w+1, ix = it.intValue(x);
        Node v, u = r, q = newNode();
        while (h-l > 1) {
            int i = (l+h)/2;
            q.prefix = ix >>> w-i;
            if ((v = t[i].find(q)) == null) {
                h = i;
            } else {
                u = v;
                l = i;
            }
        }
        if (l == w) return u.x;
        Node pred = (((ix >>> w-l-1) & 1) == 1) ? u.jump : u.jump.child[0];
        return (pred.child[next] == dummy) ? null : pred.child[next].x;
    }
Each iteration of the $ \mathtt{while}$ loop in the above method decreases $ \mathtt{h-l}$ by roughly a factor of 2, so this loop finds $ \mathtt{u}$ after $ O(\log \ensuremath{\mathtt{w}})$ iterations. Each iteration performs a constant amount of work and one $ \mathtt{find(x)}$ operation in a USet, which takes constant expected time. The remaining work takes only constant time, so the $ \mathtt{find(x)}$ method in an XFastTrie takes only $ O(\log \ensuremath{\mathtt{w}})$ expected time.

The $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ methods for an XFastTrie are almost identical to the same methods in a BinaryTrie. The only modifications are for managing the hash tables $ \mathtt{t[0]}$,..., $ \mathtt{t[w]}$. During the $ \mathtt{add(x)}$ operation, when a new node is created at level $ \mathtt{i}$, this node is added to $ \mathtt{t[i]}$. During a $ \mathtt{remove(x)}$ operation, when a node is removed form level $ \mathtt{i}$, this node is removed from $ \mathtt{t[i]}$. Since adding and removing from a hash table take constant expected time, this does not increase the running times of $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ by more than a constant factor. We omit a code listing or $ \mathtt{add(x)}$ and $ \mathtt{remove(x)}$ since it is almost identical to the (long) code listing already provided for the same methods in a BinaryTrie.

The following theorem summarizes the performance of an XFastTrie:

Theorem 13..2   An XFastTrie implements the SSet interface for $ \mathtt{w}$-bit integers. An XFastTrie supports the operations The space used by an XFastTrie that stores $ \mathtt{n}$ values is $ O(\ensuremath{\mathtt{n}}\cdot\ensuremath{\mathtt{w}})$.

opendatastructures.org