13. Data Structures for Integers

In this chapter, we return to the problem of implementing an SSet. The difference now is that we assume the elements stored in the SSet are $ \ensuremath{\ensuremath{\mathit{w}}}$-bit integers. That is, we want to implement $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$, $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$, and $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ where $ \ensuremath{\ensuremath{\ensuremath{\mathit{x}}}}\in\{0,\ldots,2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}-1\}$. It is not too hard to think of plenty of applications where the data--or at least the key that we use for sorting the data--is an integer.

We will discuss three data structures, each building on the ideas of the previous. The first structure, the BinaryTrie performs all three SSet operations in $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time. This is not very impressive, since any subset of $ \{0,\ldots,2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}-1\}$ has size $ \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}\le 2^{\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}}$, so that $ \log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}} \le \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$. All the other SSet implementations discussed in this book perform all operations in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$ time so they are all at least as fast as a BinaryTrie.

The second structure, the XFastTrie, speeds up the search in a BinaryTrie by using hashing. With this speedup, the $ \ensuremath{\mathrm{find}(\ensuremath{\mathit{x}})}$ operation runs in $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time. However, $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ operations in an XFastTrie still take $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ time and the space used by an XFastTrie is $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}}\cdot\ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$.

The third data structure, the YFastTrie, uses an XFastTrie to store only a sample of roughly one out of every $ \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}}$ elements and stores the remaining elements in a standard SSet structure. This trick reduces the running time of $ \ensuremath{\mathrm{add}(\ensuremath{\mathit{x}})}$ and $ \ensuremath{\mathrm{remove}(\ensuremath{\mathit{x}})}$ to $ O(\log \ensuremath{\ensuremath{\ensuremath{\mathit{w}}}})$ and decreases the space to $ O(\ensuremath{\ensuremath{\ensuremath{\mathit{n}}}})$.

The implementations used as examples in this chapter can store any type of data, as long as an integer can be associated with it. In the code samples, the variable $ \ensuremath{\ensuremath{\mathit{ix}}}$ is always the integer value associated with $ \ensuremath{\ensuremath{\mathit{x}}}$, and the method $ \ensuremath{\mathrm{int\_value}(\ensuremath{\mathit{x}})}$ converts $ \ensuremath{\ensuremath{\mathit{x}}}$ to its associated integer. In the text, however, we will simply treat $ \ensuremath{\ensuremath{\mathit{x}}}$ as if it is an integer.