13.2 XFastTrie: Searching in Doubly-Logarithmic Time

The performance of the BinaryTrie structure is not very impressive. The number of elements, $ \ensuremath{\ensuremath{\mathit{n}}}$, stored in the structure is at most $ 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$, so $ \log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}\le \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$. In other words, any of the comparison-based SSet structures described in other parts of this book are at least as efficient as a BinaryTrie, and are not restricted to only storing integers.

Next we describe the XFastTrie, which is just a BinaryTrie with $ \ensuremath{\ensuremath{\mathit{w}}+1}$ hash tables--one for each level of the trie. These hash tables are used to speed up the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation to $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time. Recall that the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in a BinaryTrie is almost complete once we reach a node, $ \ensuremath{\ensuremath{\mathit{u}}}$, where the search path for $ \ensuremath{\ensuremath{\mathit{x}}}$ would like to proceed to $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{right}}}$ (or $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{left}}}$) but $ \ensuremath{\ensuremath{\mathit{u}}}$ has no right (respectively, left) child. At this point, the search uses $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{jump}}}$ to jump to a leaf, $ \ensuremath{\ensuremath{\mathit{v}}}$, of the BinaryTrie and either return $ \ensuremath{\ensuremath{\mathit{v}}}$ or its successor in the linked list of leaves. An XFastTrie speeds up the search process by using binary search on the levels of the trie to locate the node $ \ensuremath{\ensuremath{\mathit{u}}}$.

To use binary search, we need a way to determine if the node $ \ensuremath{\ensuremath{\mathit{u}}}$ we are looking for is above a particular level, $ \ensuremath{\ensuremath{\mathit{i}}}$, of if $ \ensuremath{\ensuremath{\mathit{u}}}$ is at or below level $ \ensuremath{\ensuremath{\mathit{i}}}$. This information is given by the highest-order $ \ensuremath{\ensuremath{\mathit{i}}}$ bits in the binary representation of $ \ensuremath{\ensuremath{\mathit{x}}}$; these bits determine the search path that $ \ensuremath{\ensuremath{\mathit{x}}}$ takes from the root to level $ \ensuremath{\ensuremath{\mathit{i}}}$. For an example, refer to Figure 13.6; in this figure the last node, $ \ensuremath{\ensuremath{\mathit{u}}}$, on search path for 14 (whose binary representation is 1110) is the node labelled $ 11{\star\star}$ at level 2 because there is no node labelled $ 111{\star}$ at level 3. Thus, we can label each node at level $ \ensuremath{\ensuremath{\mathit{i}}}$ with an $ \ensuremath{\ensuremath{\mathit{i}}}$-bit integer. Then, the node $ \ensuremath{\ensuremath{\mathit{u}}}$ we are searching for would be at or below level $ \ensuremath{\ensuremath{\mathit{i}}}$ if and only if there is a node at level $ \ensuremath{\ensuremath{\mathit{i}}}$ whose label matches the highest-order $ \ensuremath{\ensuremath{\mathit{i}}}$ bits of $ \ensuremath{\ensuremath{\mathit{x}}}$.

Figure 13.6: Since there is no node labelled $ 111\star$, the search path for 14 (1110) ends at the node labelled $ 11{\star\star}$ .
\includegraphics[scale=0.90909]{figs-python/xfast-path}

In an XFastTrie, we store, for each $ \ensuremath{\ensuremath{\ensuremath{\mathit{i}}}}\in\{0,\ldots,\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}\}$, all the nodes at level $ \ensuremath{\ensuremath{\mathit{i}}}$ in a USet, $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$, that is implemented as a hash table (Chapter 5). Using this USet allows us to check in constant expected time if there is a node at level $ \ensuremath{\ensuremath{\mathit{i}}}$ whose label matches the highest-order $ \ensuremath{\ensuremath{\mathit{i}}}$ bits of $ \ensuremath{\ensuremath{\mathit{x}}}$. In fact, we can even find this node using $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}].\mathrm{find}(\en...
...{\mathit{x}}\ensuremath{\gg}(\ensuremath{\mathit{w}}-\ensuremath{\mathit{i}})})$

The hash tables $ \ensuremath{\ensuremath{\ensuremath{\mathit{t}}[0]}},\ldots,\ensuremath{\ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{w}}]}}$ allow us to use binary search to find $ \ensuremath{\ensuremath{\mathit{u}}}$. Initially, we know that $ \ensuremath{\ensuremath{\mathit{u}}}$ is at some level $ \ensuremath{\ensuremath{\mathit{i}}}$ with $ 0\le \ensuremath{\ensuremath{\ensuremath{\mathit{i}}}}< \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}+1$. We therefore initialize $ \ensuremath{\ensuremath{\ensuremath{\mathit{\ell}}}}=0$ and $ \ensuremath{\ensuremath{\ensuremath{\mathit{h}}}}=\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}+1$ and repeatedly look at the hash table $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$, where $ \ensuremath{\ensuremath{\ensuremath{\mathit{i}}}}=\lfloor
(\ensuremath{\ensuremath{\ensuremath{\mathit{\ell}}+\ensuremath{\mathit{h}}}})/2\rfloor$. If $ \ensuremath{\ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}}$ contains a node whose label matches $ \ensuremath{\ensuremath{\mathit{x}}}$'s highest-order $ \ensuremath{\ensuremath{\mathit{i}}}$ bits then we set $ \ensuremath{\ensuremath{\mathit{\ell}}\gets \ensuremath{i}}$ ( $ \ensuremath{\ensuremath{\mathit{u}}}$ is at or below level $ \ensuremath{\ensuremath{\mathit{i}}}$); otherwise we set $ \ensuremath{\ensuremath{\mathit{h}}\gets \ensuremath{i}}$ ( $ \ensuremath{\ensuremath{\mathit{u}}}$ is above level $ \ensuremath{\ensuremath{\mathit{i}}}$). This process terminates when $ \ensuremath{\ensuremath{\ensuremath{\mathit{h}}-\ensuremath{\mathit{\ell}}}}\le 1$, in which case we determine that $ \ensuremath{\ensuremath{\mathit{u}}}$ is at level $ \ensuremath{\ensuremath{\mathit{\ell}}}$. We then complete the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation using $ \ensuremath{\ensuremath{\mathit{u}}.\ensuremath{\mathit{jump}}}$ and the doubly-linked list of leaves.
\begin{leftbar}
\begin{flushleft}
\hspace*{1em} \ensuremath{\mathrm{find}(\ensur...
...suremath{\mathit{next}}.\ensuremath{\mathit{x}}}\\
\end{flushleft}\end{leftbar}
Each iteration of the $ {\color{black} \textbf{while}}$ loop in the above method decreases $ \ensuremath{\ensuremath{\mathit{h}}-\ensuremath{\mathit{\ell}}}$ by roughly a factor of two, so this loop finds $ \ensuremath{\ensuremath{\mathit{u}}}$ after $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ iterations. Each iteration performs a constant amount of work and one $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in a USet, which takes a constant expected amount of time. The remaining work takes only constant time, so the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ method in an XFastTrie takes only $ O(\log\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ expected time.

The $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ methods for an XFastTrie are almost identical to the same methods in a BinaryTrie. The only modifications are for managing the hash tables $ \ensuremath{\ensuremath{\mathit{t}}[0]}$,..., $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{w}}]}$. During the $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ operation, when a new node is created at level $ \ensuremath{\ensuremath{\mathit{i}}}$, this node is added to $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$. During a $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operation, when a node is removed form level $ \ensuremath{\ensuremath{\mathit{i}}}$, this node is removed from $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$. Since adding and removing from a hash table take constant expected time, this does not increase the running times of $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ by more than a constant factor. We omit a code listing for $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ since the code is almost identical to the (long) code listing already provided for the same methods in a BinaryTrie.

The following theorem summarizes the performance of an XFastTrie:

Theorem 13..2   An XFastTrie implements the SSet interface for $ \ensuremath{\ensuremath{\mathit{w}}}$-bit integers. An XFastTrie supports the operations The space used by an XFastTrie that stores $ \ensuremath{\ensuremath{\mathit{n}}}$ values is $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}\cdot\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$.

opendatastructures.org