13.4 Discussion and Exercises

The first data structure to provide $ O(\log \ensuremath{\mathtt{w}})$ time $ \mathtt{add(x)}$, $ \mathtt{remove(x)}$, and $ \mathtt{find(x)}$ operations was proposed by van Emde Boas and has since become known as the van Emde Boas (or stratified) tree [72]. The original van Emde Boas structure had size $ 2^{\ensuremath{\mathtt{w}}}$, so was impractical for large integers.

The XFastTrie and YFastTrie data structures were discovered by Willard [75]. The XFastTrie structure is very closely related to van Emde Boas trees. One view of this is that the hash tables in an XFastTrie replace arrays in a van Emde Boas tree. That is, instead of storing the hash table $ \mathtt{t[i]}$, a van Emde Boas tree stores an array of length $ 2^{\ensuremath{\mathtt{i}}}$.

Another structure for storing integers is Fredman and Willard's fusion trees [30]. This structure can store $ \mathtt{n}$ $ \mathtt{w}$-bit integers in $ O(\ensuremath{\mathtt{n}})$ space so that the $ \mathtt{find(x)}$ operation runs in $ O((\log \ensuremath{\mathtt{n}})/(\log
\ensuremath{\mathtt{w}}))$ time. By using a fusion tree when $ \log \ensuremath{\mathtt{w}} > \sqrt{\log \ensuremath{\mathtt{n}}}$ and a YFastTrie when $ \log \ensuremath{\mathtt{w}} \le \sqrt{\log \ensuremath{\mathtt{n}}}$ one obtains an $ O(\ensuremath{\mathtt{n}})$ space data structure that can implement the $ \mathtt{find(x)}$ operation in $ O(\sqrt{\log \ensuremath{\mathtt{n}}})$ time. Recent lower-bound results of P{\v{a\/}}\kern.05emtra{\c{s\/}}cu and Thorup [59] show that these results are more-or-less optimal, at least for structures that use only $ O(\ensuremath{\mathtt{n}})$ space.

Exercise 13..1   Design and implement a simplified version of a BinaryTrie that doesn't have a linked list or jump pointers, but for which $ \mathtt{find(x)}$ still runs in $ O(\ensuremath{\mathtt{w}})$ time.

Exercise 13..2   Design and implement a simplified implementation of an XFastTrie that doesn't use a binary trie at all. Instead, your implementation should store everything in a doubly-linked list and $ \ensuremath{\mathtt{w}}+1$ hash tables.

Exercise 13..3   We can think of a BinaryTrie as a structure that stores bit strings of length $ \mathtt{w}$ in such a way that each bitstring is represented as a root to leaf path. Extend this idea into an SSet implementation that stores variable-length strings and implements $ \mathtt{add(s)}$, $ \mathtt{remove(s)}$, and $ \mathtt{find(s)}$ in time proporitional to the length of $ \mathtt{s}$.

Hint: Each node in your data structure should store a hash table that is indexed by character values.

Exercise 13..4   For an integer $ \ensuremath{\mathtt{x}}\in\{0,\ldots2^{\ensuremath{\mathtt{w}}}-1\}$, let $ d(\ensuremath{\mathtt{x}})$ denote the difference between $ \mathtt{x}$ and the value returned by $ \mathtt{find(x)}$ [if $ \mathtt{find(x)}$ returns $ \mathtt{null}$, then define $ d(\ensuremath{\mathtt{x}})$ as $ 2^\ensuremath{\mathtt{w}}$]. For example, if $ \mathtt{find(23)}$ returns 43, then $ d(23)=20$.
  1. Design and implement a modified version of the $ \mathtt{find(x)}$ operation in an XFastTrie that runs in $ O(1+\log d(\ensuremath{\mathtt{x}}))$ expected time. Hint: The hash table $ t[\ensuremath{\mathtt{w}}]$ contains all the values, $ \mathtt{x}$, such that $ d(\ensuremath{\mathtt{x}})=0$, so that would be a good place to start.
  2. Design and implement a modified version of the $ \mathtt{find(x)}$ operation in an XFastTrie that runs in $ O(1+\log\log d(\ensuremath{\mathtt{x}}))$ expected time.

opendatastructures.org