13.1 : A digital search tree

A encodes a set of bit integers in a binary tree. All leaves in the tree have depth and each integer is encoded as a root-to-leaf path. The path for the integer turns left at level if the th most significant bit of is a 0 and turns right if it is a 1. Figure 13.1 shows an example for the case , in which the trie stores the integers 3(0011), 9(1001), 12(1100), and 13(1101).

Because the search path for a value depends on the bits of , it will be helpful to name the children of a node, , ( ) and ( ). These child pointers will actually serve double-duty. Since the leaves in a binary trie have no children, the pointers are used to string the leaves together into a doubly-linked list. For a leaf in the binary trie ( ) is the node that comes before in the list and ( ) is the node that follows in the list. A special node, , is used both before the first node and after the last node in the list (see Section 3.2). In the code samples, , , and refer to the same field in the node , as do , , and .

Each node, , also contains an additional pointer . If 's left child is missing, then points to the smallest leaf in 's subtree. If 's right child is missing, then points to the largest leaf in 's subtree. An example of a , showing pointers and the doubly-linked list at the leaves, is shown in Figure 13.2.

The operation in a is fairly straightforward. We try to follow the search path for in the trie. If we reach a leaf, then we have found . If we reach a node where we cannot proceed (because is missing a child), then we follow , which takes us either to the smallest leaf larger than or the largest leaf smaller than . Which of these two cases occurs depends on whether is missing its left or right child, respectively. In the former case ( is missing its left child), we have found the node we want. In the latter case ( is missing its right child), we can use the linked list to reach the node we want. Each of these cases is illustrated in Figure 13.3.

T find(T x) { int i, c = 0; unsigned ix = intValue(x); Node *u = &r; for (i = 0; i < w; i++) { c = (ix >> (w-i-1)) & 1; if (u->child[c] == NULL) break; u = u->child[c]; } if (i == w) return u->x; // found it u = (c == 0) ? u->jump : u->jump->next; return u == &dummy ? null : u->x; }The running-time of the method is dominated by the time it takes to follow a root-to-leaf path, so it runs in time.

The operation in a is also fairly straightforward, but has a lot of work to do:

- It follows the search path for until reaching a node where it can no longer proceed.
- It creates the remainder of the search path from to a leaf that contains .
- It adds the node, , containing to the linked list of leaves (it has access to the predecessor, , of in the linked list from the pointer of the last node, , encountered during step 1.)
- It walks back up the search path for adjusting pointers at the nodes whose pointer should now point to .

bool add(T x) { int i, c = 0; unsigned ix = intValue(x); Node *u = &r; // 1 - search for ix until falling out of the trie for (i = 0; i < w; i++) { c = (ix >> (w-i-1)) & 1; if (u->child[c] == NULL) break; u = u->child[c]; } if (i == w) return false; // already contains x - abort Node *pred = (c == right) ? u->jump : u->jump->left; u->jump = NULL; // u will have two children shortly // 2 - add path to ix for (; i < w; i++) { c = (ix >> (w-i-1)) & 1; u->child[c] = new Node(); u->child[c]->parent = u; u = u->child[c]; } u->x = x; // 3 - add u to linked list u->prev = pred; u->next = pred->next;; u->prev->next = u; u->next->prev = u; // 4 - walk back up, updating jump pointers Node *v = u->parent; while (v != NULL) { if ((v->left == NULL && (v->jump == NULL || intValue(v->jump->x) > ix)) || (v->right == NULL && (v->jump == NULL || intValue(v->jump->x) < ix))) v->jump = u; v = v->parent; } n++; return true; }This method performs one walk down the search path for and one walk back up. Each step of these walks takes constant time, so the method runs in time.

The operation undoes the work of . Like , it has a lot of work to do:

- It follows the search path for until reaching the leaf, , containing .
- It removes from the doubly-linked list.
- It deletes and then walks back up the search path for deleting nodes until reaching a node that has a child that is not on the search path for .
- It walks upwards from to the root updating any pointers that point to .

bool remove(T x) { // 1 - find leaf, u, containing x int i = 0, c; unsigned ix = intValue(x); Node *u = &r; for (i = 0; i < w; i++) { c = (ix >> (w-i-1)) & 1; if (u->child[c] == NULL) return false; u = u->child[c]; } // 2 - remove u from linked list u->prev->next = u->next; u->next->prev = u->prev; Node *v = u; // 3 - delete nodes on path to u for (i = w-1; i >= 0; i--) { c = (ix >> (w-i-1)) & 1; v = v->parent; delete v->child[c]; v->child[c] = NULL; if (v->child[1-c] != NULL) break; } // 4 - update jump pointers c = (ix >>> w-i-1) & 1; v.jump = u.child[1-c]; v = v.parent i-- for (; i >= 0; i--) { c = (ix >> (w-i-1)) & 1; if (v->jump == u) v->jump = u->child[1-c]; v = v->parent; } n--; return true; }

opendatastructures.org