Subsections


3.3 $ \mathtt{SEList}$: A Space-Efficient Linked List

One of the drawbacks of linked lists (besides the time it takes to access elements that are deep within the list) is their space usage. Each node in a $ \mathtt{DLList}$ requires an additional two references to the next and previous nodes in the list. Two of the fields in a $ \mathtt{Node}$ are dedicated to maintaining the list and only one of the fields is for storing data!

An $ \mathtt{SEList}$ (space-efficient list) reduces this wasted space using a simple idea: Rather than store individual elements in a $ \mathtt{DLList}$, we store a block (array) containing several items. More precisely, an $ \mathtt{SEList}$ is parameterized by a block size $ \mathtt{b}$. Each individual node in an $ \mathtt{SEList}$ stores a block that can hold up to $ \mathtt{b+1}$ elements.

It will turn out, for reasons that become clear later, that it will be helpful if we can do $ \mathtt{Deque}$ operations on each block. The data structure we choose for this is a $ \mathtt{BDeque}$ (bounded deque), derived from the $ \mathtt{ArrayDeque}$ structure described in Section 2.4. The $ \mathtt{BDeque}$ differs from the $ \mathtt{ArrayDeque}$ in one small way: When a new $ \mathtt{BDeque}$ is created, the size of the backing array $ \mathtt{a}$ is fixed at $ \mathtt{b+1}$ and it never grows or shrinks. The important property of a $ \mathtt{BDeque}$ is that it allows for the addition or removal of elements at either the front or back in constant time. This will be useful as elements are shifted from one block to another.

  class BDeque : public ArrayDeque<T> {
  public:
    BDeque(int b) {
      n = 0;
      j = 0;
      array<int> z(b+1);
      a = z;
    }
    ~BDeque() { }
    // C++ Question: Why is this necessary?
    void add(int i, T x) {
      ArrayDeque<T>::add(i, x);
    }
    bool add(T x) {
      ArrayDeque<T>::add(size(), x);
      return true;
    }
    void resize() {}
  };

An $ \mathtt{SEList}$ is then a doubly-linked list of blocks:

  class Node {
  public:
    BDeque d;
    Node *prev, *next;
    Node(int b) : d(b) { }
  };
  int n;
  Node dummy;

3.3.1 Space Requirements

An $ \mathtt{SEList}$ places very tight restrictions on the number of elements in a block: Unless a block is the last block, then that block contains at least $ \ensuremath{\mathtt{b}}-1$ and at most $ \ensuremath{\mathtt{b}}+1$ elements. This means that, if an $ \mathtt{SEList}$ contains $ \mathtt{n}$ elements, then it has at most

$\displaystyle \ensuremath{\mathtt{n}}/(\ensuremath{\mathtt{b}}-1) + 1 = O(\ensuremath{\mathtt{n}}/\ensuremath{\mathtt{b}})
$

blocks. The $ \mathtt{BDeque}$ for each block contains an array of length $ \ensuremath{\mathtt{b}}+1$ but, for all blocks except the last, at most a constant amount of space is wasted in this array. The remaining memory used by a block is also constant. This means that the wasted space in an $ \mathtt{SEList}$ is only $ O(\ensuremath{\mathtt{b}}+\ensuremath{\mathtt{n}}/\ensuremath{\mathtt{b}})$. By choosing a value of $ \mathtt{b}$ within a constant factor of $ \sqrt{\ensuremath{\mathtt{n}}}$ we can make the space-overhead of an SEList approach the $ \sqrt{\ensuremath{\mathtt{n}}}$ lower bound given in Section 2.6.2.

3.3.2 Finding Elements

The first challenge we face with an $ \mathtt{SEList}$ is finding the list item with a given index $ \mathtt{i}$. Note that the location of an element consists of two parts: The node $ \mathtt{u}$ that contains the block that contains the element as well as the index $ \mathtt{j}$ of the element within its block.

  class Location {
  public:
    Node *u;
    int j;
    Location() { }
    Location(Node *u, int j) {
      this->u = u;
      this->j = j;
    }
  };

To find the block that contains a particular element, we proceed in the same way as in a $ \mathtt{DLList}$. We either start at the front of the list and traverse in the forward direction or at the back of the list and traverse backwards until we reach the node we want. The only difference is that, each time we move from one node to the next, we skip over a whole block of elements.

  void getLocation(int i, Location &ell) {
    if (i < n / 2) {
      Node *u = dummy.next;
      while (i >= u->d.size()) {
        i -= u->d.size();
        u = u->next;
      }
      ell.u = u;
      ell.j = i;
    } else {
      Node *u = &dummy;
      int idx = n;
      while (i < idx) {
        u = u->prev;
        idx -= u->d.size();
      }
      ell.u = u;
      ell.j = i - idx;
    }
  }

Remember that, with the exception of at most one block, each block contains at least $ \ensuremath{\mathtt{b}}-1$ elements, so each step in our search gets us $ \ensuremath{\mathtt{b}}-1$ elements closer to the element we are looking for. If we are searching forward, this means we reach the node we want after $ O(1+\ensuremath{\mathtt{i}}/\ensuremath{\mathtt{b}})$ steps. If we search backwards, we reach the node we want after $ O(1+(\ensuremath{\mathtt{n}}-\ensuremath{\mathtt{i}})/\ensuremath{\mathtt{b}})$ steps. The algorithm takes the smaller of these two quantities depending on the value of $ \mathtt{i}$, so the time to locate the item with index $ \mathtt{i}$ is $ O(1+\min\{\ensuremath{\mathtt{i}},\ensuremath{\mathtt{n}}-\ensuremath{\mathtt{i}}\}/\ensuremath{\mathtt{b}})$.

Once we know how to locate the item with index $ \mathtt{i}$, the $ \mathtt{get(i)}$ and $ \mathtt{set(i,x)}$ operations translate into getting or setting a particular index in the correct block:

  T get(int i) {
    Location l;
    getLocation(i, l);
    return l.u->d.get(l.j);
  }
  T set(int i, T x) {
    Location l;
    getLocation(i, l);
    T y = l.u->d.get(l.j);
    l.u->d.set(l.j, x);
    return y;
  }

The running times of these operations are dominated by the time it takes to locate the item, so they also run in $ O(1+\min\{\ensuremath{\mathtt{i}},\ensuremath{\mathtt{n}}-\ensuremath{\mathtt{i}}\}/\ensuremath{\mathtt{b}})$ time.

3.3.3 Adding an Element

Things start to get complicated when adding elements to an $ \mathtt{SEList}$. Before considering the general case, we consider the easier operation, $ \mathtt{add(x)}$, in which $ \mathtt{x}$ is added to the end of the list. If the last block is full (or does not exist because there are no blocks yet), then we first allocate a new block and append it to the list of blocks. Now that we are sure that the last block exists and is not full, we append $ \mathtt{x}$ to the last block.

  void add(T x) {
    Node *last = dummy.prev;
    if (last == &dummy || last->d.size() == b+1) {
      last = addBefore(&dummy);
    }
    last->d.add(x);
    n++;
  }

Things get more complicated when we add to the interior of the list using $ \mathtt{add(i,x)}$. We first locate $ \mathtt{i}$ to get the node $ \mathtt{u}$ whose block contains the $ \mathtt{i}$th list item. The problem is that we want to insert $ \mathtt{x}$ into $ \mathtt{u}$'s block, but we have to be prepared for the case where $ \mathtt{u}$'s block already contains $ \ensuremath{\mathtt{b}}+1$ elements, so that it is full and there is no room for $ \mathtt{x}$.

Let $ \ensuremath{\mathtt{u}}_0,\ensuremath{\mathtt{u}}_1,\ensuremath{\mathtt{u}}_2,\ldots$ denote $ \mathtt{u}$, $ \mathtt{u.next}$, $ \mathtt{u.next.next}$, and so on. We explore $ \ensuremath{\mathtt{u}}_0,\ensuremath{\mathtt{u}}_1,\ensuremath{\mathtt{u}}_2,\ldots$ looking for a node that can provide space for $ \mathtt{x}$. Three cases can occur during our space exploration (see Figure 3.4):

Figure 3.4: The three cases that occur during the addition of an item $ \mathtt{x}$ in the interior of an $ \mathtt{SEList}$. (This $ \mathtt{SEList}$ has block size $ \ensuremath{\mathtt{b}}=3$.)
\includegraphics{figs/selist-add-a}
\includegraphics{figs/selist-add-b}
\includegraphics{figs/selist-add-c}

  1. We quickly (in $ r+1\le \ensuremath{\mathtt{b}}$ steps) find a node $ \ensuremath{\mathtt{u}}_r$ whose block is not full. In this case, we perform $ r$ shifts of an element from one block into the next, so that the free space in $ \ensuremath{\mathtt{u}}_r$ becomes a free space in $ \ensuremath{\mathtt{u}}_0$. We can then insert $ \mathtt{x}$ into $ \ensuremath{\mathtt{u}}_0$'s block.

  2. We quickly (in $ r+1\le \ensuremath{\mathtt{b}}$ steps) run off the end of the list of blocks. In this case, we add a new empty block to the end of the list of blocks and proceed as in the first case.

  3. After $ \mathtt{b}$ steps we do not find any block that is not full. In this case, $ \ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}-1}$ is a sequence of $ \mathtt{b}$ blocks that each contain $ \ensuremath{\mathtt{b}}+1$ elements. We insert a new block $ \ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}}$ at the end of this sequence and spread the original $ \ensuremath{\mathtt{b}}(\ensuremath{\mathtt{b}}+1)$ elements so that each block of $ \ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}}$ contains exactly $ \mathtt{b}$ elements. Now $ \ensuremath{\mathtt{u}}_0$'s block contains only $ \mathtt{b}$ elements so it has room for us to insert $ \mathtt{x}$.

  void add(int i, T x) {
    if (i == n) {
      add(x);
      return;
    }
    Location l; getLocation(i, l);
    Node *u = l.u;
    int r = 0;
    while (r < b && u != &dummy && u->d.size() == b+1) {
      u = u->next;
      r++;
    }
    if (r == b) {      // found b blocks each with b+1 elements
      spread(l.u);
      u = l.u;
    }
    if (u == &dummy) {  // ran off the end of the list - add new node
      u = addBefore(u);
    }
    while (u != l.u) { // work backwards, shifting an element at each step
      u->d.add(0, u->prev->d.remove(u->prev->d.size()-1));
      u = u->prev;
    }
    u->d.add(l.j, x);
    n++;
  }

The running time of the $ \mathtt{add(i,x)}$ operation depends on which of the three cases above occurs. Cases 1 and 2 involve examining and shifting elements through at most $ \mathtt{b}$ blocks and take $ O(\ensuremath{\mathtt{b}})$ time. Case 3 involves calling the $ \mathtt{spread(u)}$ method, which moves $ \ensuremath{\mathtt{b}}(\ensuremath{\mathtt{b}}+1)$ elements and takes $ O(\ensuremath{\mathtt{b}}^2)$ time. If we ignore the cost of Case 3 (which we will account for later with amortization) this means that the total running time to locate $ \mathtt{i}$ and perform the insertion of $ \mathtt{x}$ is $ O(\ensuremath{\mathtt{b}}+\min\{\ensuremath{\mathtt{i}},\ensuremath{\mathtt{n}}-\ensuremath{\mathtt{i}}\}/\ensuremath{\mathtt{b}})$.

3.3.4 Removing an Element

Removing an element, using the $ \mathtt{remove(i)}$ method from an $ \mathtt{SEList}$ is similar to adding an element. We first locate the node $ \mathtt{u}$ that contains the element with index $ \mathtt{i}$. Now, we have to be prepared for the case where we cannot remove an element from $ \mathtt{u}$ without causing $ \mathtt{u}$'s block to have size less than $ \ensuremath{\mathtt{b}}-1$, which is not allowed.

Again, let $ \ensuremath{\mathtt{u}}_0,\ensuremath{\mathtt{u}}_1,\ensuremath{\mathtt{u}}_2,\ldots$ denote $ \mathtt{u}$, $ \mathtt{u.next}$, $ \mathtt{u.next.next}$, We examine $ \ensuremath{\mathtt{u}}_0,\ensuremath{\mathtt{u}}_1,\ensuremath{\mathtt{u}}_2,\ldots$ in order looking for a node from which we can borrow an element to make the size of $ \ensuremath{\mathtt{u}}_0$'s block larger than $ \ensuremath{\mathtt{b}}-1$. There are three cases to consider (see Figure 3.5):

Figure 3.5: The three cases that occur during the removal of an item $ \mathtt{x}$ in the interior of an $ \mathtt{SEList}$. (This $ \mathtt{SEList}$ has block size $ \ensuremath{\mathtt{b}}=3$.)
\includegraphics{figs/selist-remove-a}
\includegraphics{figs/selist-remove-b}
\includegraphics{figs/selist-remove-c}

  1. We quickly (in $ r+1\le \ensuremath{\mathtt{b}}$ steps) find a node whose block contains more than $ \ensuremath{\mathtt{b}}-1$ elements. In this case, we perform $ r$ shifts of an element from one block into the previous, so that the extra element in $ \ensuremath{\mathtt{u}}_r$ becomes an extra element in $ \ensuremath{\mathtt{u}}_0$. We can then remove the appropriate element from $ \ensuremath{\mathtt{u}}_0$'s block.

  2. We quickly (in $ r+1\le \ensuremath{\mathtt{b}}$ steps) run off the end of the list of blocks. In this case, $ \ensuremath{\mathtt{u}}_r$ is the last block, and there is no requirement that $ \ensuremath{\mathtt{u}}_r$'s block contain at least $ \ensuremath{\mathtt{b}}-1$ elements. Therefore, we proceed as above, borrowing an element from $ \ensuremath{\mathtt{u}}_r$ to make an extra element in $ \ensuremath{\mathtt{u}}_0$. If this causes $ \ensuremath{\mathtt{u}}_r$'s block to become empty, then we remove it.

  3. After $ \mathtt{b}$ steps we do not find any block containing more than $ \ensuremath{\mathtt{b}}-1$ elements. In this case, $ \ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}-1}$ is a sequence of $ \mathtt{b}$ blocks that each contain $ \ensuremath{\mathtt{b}}-1$ elements. We gather these $ \ensuremath{\mathtt{b}}(\ensuremath{\mathtt{b}}-1)$ elements into $ \ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}-2}$ so that each of these $ \ensuremath{\mathtt{b}}-1$ blocks contains exactly $ \mathtt{b}$ elements and we remove $ \ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}-1}$, which is now empty. Now $ \ensuremath{\mathtt{u}}_0$'s block contains $ \mathtt{b}$ elements so we can remove the appropriate element from it.

  T remove(int i) {
    Location l; getLocation(i, l);
    T y = l.u->d.get(l.j);
    Node *u = l.u;
    int r = 0;
    while (r < b && u != &dummy && u->d.size() == b - 1) {
      u = u->next;
      r++;
    }
    if (r == b) { // found b blocks each with b-1 elements
      gather(l.u);
    }
    u = l.u;
    u->d.remove(l.j);
    while (u->d.size() < b - 1 && u->next != &dummy) {
      u->d.add(u->next->d.remove(0));
      u = u->next;
    }
    if (u->d.size() == 0)
      remove(u);
    n--;
    return y;
  }

Like the $ \mathtt{add(i,x)}$ operation, the running time of the $ \mathtt{remove(i)}$ operation is $ O(\ensuremath{\mathtt{b}}+\min\{\ensuremath{\mathtt{i}},\ensuremath{\mathtt{n}}-\ensuremath{\mathtt{i}}\}/\ensuremath{\mathtt{b}})$ if we ignore the cost of the $ \mathtt{gather(u)}$ method that occurs in Case 3.

3.3.5 Amortized Analysis of Spreading and Gathering

Next, we consider the cost of the $ \mathtt{gather(u)}$ and $ \mathtt{spread(u)}$ methods that may be executed by the $ \mathtt{add(i,x)}$ and $ \mathtt{remove(i)}$ methods. For completeness, here they are:

  void spread(Node *u) {
    Node *w = u;
    for (int j = 0; j < b; j++) {
      w = w->next;
    }
    w = addBefore(w);
    while (w != u) {
      while (w->d.size() < b)
        w->d.add(0, w->prev->d.remove(w->prev->d.size()-1));
      w = w->prev;
    }
  }
  void gather(Node *u) {
    Node *w = u;
    for (int j = 0; j < b-1; j++) {
      while (w->d.size() < b)
        w->d.add(w->next->d.remove(0));
      w = w->next;
    }
    remove(w);
  }

The running time of each of these methods is dominated by the two nested loops. Both the inner loop and outer loop execute at most $ \ensuremath{\mathtt{b}}+1$ times, so the total running time of each of these methods is $ O((\ensuremath{\mathtt{b}}+1)^2)=O(\ensuremath{\mathtt{b}}^2)$. However, the following lemma shows that these methods execute on at most one out of every $ \mathtt{b}$ calls to $ \mathtt{add(i,x)}$ or $ \mathtt{remove(i)}$.

Lemma 3..1   If an empty $ \mathtt{SEList}$ is created and any sequence of $ m\ge 1$ calls to $ \mathtt{add(i,x)}$ and $ \mathtt{remove(i)}$ are performed, then the total time spent during all calls to $ \mathtt{spread()}$ and $ \mathtt{gather()}$ is $ O(\ensuremath{\mathtt{b}}m)$.

Proof. We will use the potential method of amortized analysis. We say that a node $ \mathtt{u}$ is fragile if $ \mathtt{u}$'s block does not contain $ \mathtt{b}$ elements (so that $ \mathtt{u}$ is either the last node, or contains $ \ensuremath{\mathtt{b}}-1$ or $ \ensuremath{\mathtt{b}}+1$ elements). Any node whose block contains $ \mathtt{b}$ elements is rugged. Define the potential of an $ \mathtt{SEList}$ as the number of fragile nodes it contains. We will consider only the $ \mathtt{add(i,x)}$ operation and its relation to the number of calls to $ \mathtt{spread(u)}$. The analysis of $ \mathtt{remove(i)}$ and $ \mathtt{gather(u)}$ is identical.

Notice that, if Case 1 occurs during the $ \mathtt{add(i,x)}$ method, then only one node, $ \ensuremath{\mathtt{u}}_r$ has the size of its block changed. Therefore, at most one node, namely $ \ensuremath{\mathtt{u}}_r$, goes from being rugged to being fragile. If Case 2 occurs, then a new node is created, and this node is fragile, but no other node changes sizes, so the number of fragile nodes increases by one. Thus, in either Case 1 or Case 2 the potential of the SEList increases by at most 1.

Finally, if Case 3 occurs, it is because $ \ensuremath{\mathtt{u}}_0,\ldots,\ensuremath{\mathtt{u}}_{\ensuremath{\mathtt{b}}-1}$ are all fragile nodes. Then $ \ensuremath{\mathtt{spread(}}u_0\ensuremath{\mathtt{)}}$ is called and these $ \mathtt{b}$ fragile nodes are replaced with $ \ensuremath{\mathtt{b}}+1$ rugged nodes. Finally, $ \mathtt{x}$ is added to $ \ensuremath{\mathtt{u}}_0$'s block, making $ \ensuremath{\mathtt{u}}_0$ fragile. In total the potential decreases by $ \ensuremath{\mathtt{b}}-1$.

In summary, the potential starts at 0 (there are no nodes in the list). Each time Case 1 or Case 2 occurs, the potential increases by at most 1. Each time Case 3 occurs, the potential decreases by $ \ensuremath{\mathtt{b}}-1$. The potential (which counts the number of fragile nodes) is never less than 0. We conclude that, for every occurrence of Case 3, there are at least $ \ensuremath{\mathtt{b}}-1$ occurrences of Case 1 or Case 2. Thus, for every call to $ \mathtt{spread(u)}$ there are at least $ \mathtt{b}$ calls to $ \mathtt{add(i,x)}$. This completes the proof. $ \qedsymbol$

3.3.6 Summary

The following theorem summarizes the performance of the $ \mathtt{SEList}$ data structure:

Theorem 3..3   An $ \mathtt{SEList}$ implements the $ \mathtt{List}$ interface. Ignoring the cost of calls to $ \mathtt{spread(u)}$ and $ \mathtt{gather(u)}$, an $ \mathtt{SEList}$ with block size $ \mathtt{b}$ supports the operations Furthermore, beginning with an empty $ \mathtt{SEList}$, any sequence of $ m$ $ \mathtt{add(i,x)}$ and $ \mathtt{remove(i)}$ operations results in a total of $ O(\ensuremath{\mathtt{b}}m)$ time spent during all calls to $ \mathtt{spread(u)}$ and $ \mathtt{gather(u)}$.

The space (measured in words)3.1 used by an $ \mathtt{SEList}$ that stores $ \mathtt{n}$ elements is $ \ensuremath{\mathtt{n}} +O(\ensuremath{\mathtt{b}} + \ensuremath{\mathtt{n}}/\ensuremath{\mathtt{b}})$.

The $ \mathtt{SEList}$ is a tradeoff between an $ \mathtt{ArrayList}$ and a $ \mathtt{DLList}$ where the relative mix of these two structures depends on the block size $ \mathtt{b}$. At the extreme $ \ensuremath{\mathtt{b}}=2$, each $ \mathtt{SEList}$ node stores at most 3 values, which is really not much different than a $ \mathtt{DLList}$. At the other extreme, $ \ensuremath{\mathtt{b}}>\ensuremath{\mathtt{n}}$, all the elements are stored in a single array, just like in an $ \mathtt{ArrayList}$. In between these two extremes lies a tradeoff between the time it takes to add or remove a list item and the time it takes to locate a particular list item.



Footnotes

... words)3.1
Recall Section 1.3 for a discussion of how memory is measured.
opendatastructures.org