13.4 Discussion and Exercises

The first data structure to provide $ O(\log\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$, $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$, and $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operations was proposed by van Emde Boas and has since become known as the van Emde Boas (or stratified) tree [72]. The original van Emde Boas structure had size $ 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$, making it impractical for large integers.

The XFastTrie and YFastTrie data structures were discovered by Willard [75]. The XFastTrie structure is closely related to van Emde Boas trees; for instance, the hash tables in an XFastTrie replace arrays in a van Emde Boas tree. That is, instead of storing the hash table $ \ensuremath{\ensuremath{\mathit{t}}[\ensuremath{\mathit{i}}]}$, a van Emde Boas tree stores an array of length $ 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{i}}}}}$.

Another structure for storing integers is Fredman and Willard's fusion trees [32]. This structure can store $ \ensuremath{\ensuremath{\mathit{n}}}$ $ \ensuremath{\ensuremath{\mathit{w}}}$-bit integers in $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ space so that the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation runs in $ O((\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})/(\log
\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}))$ time. By using a fusion tree when $ \log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}} > \sqrt{\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}}$ and a YFastTrie when $ \log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}} \le \sqrt{\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}}$, one obtains an $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ space data structure that can implement the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in $ O(\sqrt{\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}})$ time. Recent lower-bound results of P{\v{a\/}}\kern.05emtra{\c{s\/}}cu and Thorup [57] show that these results are more or less optimal, at least for structures that use only $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ space.

Exercise 13..1   Design and implement a simplified version of a BinaryTrie that does not have a linked list or jump pointers, but for which $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$

still runs in $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time.

Exercise 13..2   Design and implement a simplified implementation of an XFastTrie that doesn't use a binary trie at all. Instead, your implementation should store everything in a doubly-linked list and $ \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}+1$ hash tables.

Exercise 13..3   We can think of a BinaryTrie as a structure that stores bit strings of length $ \ensuremath{\ensuremath{\mathit{w}}}$ in such a way that each bitstring is represented as a root to leaf path. Extend this idea into an SSet implementation that stores variable-length strings and implements $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{s}})}$, $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{s}})}$, and $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{s}})}$ in time proporitional to the length of $ \ensuremath{\ensuremath{\mathit{s}}}$.

Hint: Each node in your data structure should store a hash table that is indexed by character values.

Exercise 13..4   For an integer $ \ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}\in\{0,\ldots2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}-1\}$, let $ d(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}})$ denote the difference between $ \ensuremath{\ensuremath{\mathit{x}}}$ and the value returned by $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ [if $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ returns $ \ensuremath{\ensuremath{\mathit{nil}}}$, then define $ d(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}})$ as $ 2^\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$]. For example, if $ \ensuremath{\mathrm{find}(23)}$ returns 43, then $ d(23)=20$.
  1. Design and implement a modified version of the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in an XFastTrie that runs in $ O(1+\log d(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}))$ expected time. Hint: The hash table $ t[\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}]$ contains all the values, $ \ensuremath{\ensuremath{\mathit{x}}}$, such that $ d(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}})=0$, so that would be a good place to start.
  2. Design and implement a modified version of the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation in an XFastTrie that runs in $ O(1+\log\log d(\ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}))$ expected time.