ScapegoatTree: A Binary Search Tree with Partial Rebuilding
A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, n, of nodes in the tree, also keeps a counter, q, that maintains an upper-bound on the number of nodes.

  int q;

At all times, n and q obey the following inequalities:

  q/2 ≤ n ≤ q .
In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed

  log_{3/2} q ≤ log_{3/2} 2n < log_{3/2} n + 2 .          (8.1)

Even with this constraint, a ScapegoatTree can look surprisingly unbalanced.  The tree in Figure 8.1 has q = n = 10 and height 5 < log_{3/2} 10 ≈ 5.679.
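The code later in this section compares the depth of a newly added node against this bound using a helper called log32(q).  The helper's implementation is not shown here; a minimal sketch, assuming it should return roughly ⌈log_{3/2} q⌉, might look like this:

  #include <cmath>

  // Sketch: log_{3/2}(q) = ln(q)/ln(3/2); round up so that the comparison
  // d > log32(q) only triggers when the depth truly exceeds the bound.
  static int log32(int q) {
    return (int)std::ceil(std::log((double)q) / std::log(1.5));
  }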
Implementing the find(x) operation in a ScapegoatTree is done using the standard algorithm for searching in a BinarySearchTree (see Section 6.2).  This takes time proportional to the height of the tree which, by (8.1), is O(log n).
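As a reminder, that standard search just walks down from the root, branching left or right according to comparisons with x.  The following is only a sketch, assuming the node fields x, left, and right and the sentinel nil used in the code excerpts below; the real find(x) interface may return a value rather than a bool:

  // Sketch of the standard binary search tree search.
  bool contains(T x) {
    Node *w = r;
    while (w != nil) {
      if (x < w->x)
        w = w->left;        // x is smaller, continue in the left subtree
      else if (w->x < x)
        w = w->right;       // x is larger, continue in the right subtree
      else
        return true;        // found a node storing x
    }
    return false;           // fell off the tree; x is not present
  }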
To implement the add(x) operation, we first increment n and q and then use the usual algorithm for adding x to a binary search tree; we search for x and then add a new leaf u with u.x = x.  At this point, we may get lucky and the depth of u might not exceed log_{3/2} q.  If so, then we leave well enough alone and don't do anything else.
Unfortunately, it will sometimes happen that depth(u) > log_{3/2} q.  In this case, we need to reduce the height.  This isn't a big job; there is only one node, namely u, whose depth exceeds log_{3/2} q.  To fix u, we walk from u back up to the root looking for a scapegoat, w.  The scapegoat, w, is a very unbalanced node.  It has the property that

  size(w.child)/size(w) > 2/3 ,          (8.2)

where w.child is the child of w on the path from the root to u.  We'll very shortly prove that a scapegoat exists.  For now, we can take it for granted.  Once we've found the scapegoat w, we completely destroy the subtree rooted at w and rebuild it into a perfectly balanced binary search tree.  We know, from (8.2), that, even before the addition of u, w's subtree was not a complete binary tree.  Therefore, when we rebuild w, the height decreases by at least 1 so that the height of the ScapegoatTree is once again at most log_{3/2} q.
  bool add(T x) {
    // first do basic insertion keeping track of depth
    Node *u = new Node;
    u->x = x;
    u->left = u->right = u->parent = nil;
    int d = addWithDepth(u);
    if (d > log32(q)) {
      // depth exceeded, find scapegoat
      Node *w = u->parent;
      int a = BinaryTree<Node>::size(w);
      int b = BinaryTree<Node>::size(w->parent);
      while (3*a <= 2*b) {
        w = w->parent;
        a = BinaryTree<Node>::size(w);
        b = BinaryTree<Node>::size(w->parent);
      }
      rebuild(w->parent);
    }
    return d >= 0;
  }
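The call to addWithDepth(u) above performs the usual leaf insertion while keeping track of how deep the new node ends up, and signals a duplicate key with a negative return value.  Its implementation is not shown in this section; the following is only a sketch of what such a helper might look like, assuming the members r, nil, n, and q used elsewhere in the excerpts:

  // Sketch: attach the pre-allocated leaf u by the usual search-and-attach
  // walk; return the depth of u, or -1 if an equal key is already present.
  int addWithDepth(Node *u) {
    Node *w = r;
    if (w == nil) {          // empty tree: u becomes the root, at depth 0
      r = u;
      n++; q++;
      return 0;
    }
    int d = 0;
    while (true) {
      d++;                   // descending one level
      if (u->x < w->x) {
        if (w->left == nil) { w->left = u; break; }
        w = w->left;
      } else if (w->x < u->x) {
        if (w->right == nil) { w->right = u; break; }
        w = w->right;
      } else {
        return -1;           // duplicate key: nothing is added
      }
    }
    u->parent = w;
    n++; q++;                // as in the prose: add(x) increments both n and q
    return d;
  }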
If we ignore the cost of finding the scapegoat w and rebuilding the subtree rooted at w, then the running time of add(x) is dominated by the initial search, which takes O(log q) = O(log n) time.  We will account for the cost of finding the scapegoat and rebuilding using amortized analysis in the next section.
The implementation of remove(x) in a ScapegoatTree is very simple.  We search for x and remove it using the usual algorithm for removing a node from a BinarySearchTree.  (Note that this can never increase the height of the tree.)  Next, we decrement n, but leave q unchanged.  Finally, we check if q > 2n and, if so, then we rebuild the entire tree into a perfectly balanced binary search tree and set q = n.
  bool remove(T x) {
    if (BinarySearchTree<Node,T>::remove(x)) {
      if (2*n < q) {
        rebuild(r);
        q = n;
      }
      return true;
    }
    return false;
  }
Again, if we ignore the cost of rebuilding, the running time of the remove(x) operation is proportional to the height of the tree, and is therefore O(log n).
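Both add(x) and remove(x) rely on rebuild(u), which replaces the subtree rooted at u with a perfectly balanced binary search tree on the same nodes.  Its implementation is not shown in this section.  One plausible way to do it, sketched below using the members r and nil from the excerpts above (packIntoArray and buildBalanced are illustrative helper names), is to flatten the subtree into an array by an in-order traversal and then rebuild recursively, making the middle element the root at each step:

  // Sketch: in-order traversal writes the subtree's nodes into a[] in sorted order.
  int packIntoArray(Node *u, Node **a, int i) {
    if (u == nil) return i;
    i = packIntoArray(u->left, a, i);
    a[i++] = u;
    return packIntoArray(u->right, a, i);
  }

  // Sketch: rebuild a[i..i+ns-1] into a perfectly balanced subtree, return its root.
  Node *buildBalanced(Node **a, int i, int ns) {
    if (ns == 0) return nil;
    int m = ns / 2;                          // middle node becomes the subtree root
    a[i+m]->left = buildBalanced(a, i, m);
    if (a[i+m]->left != nil) a[i+m]->left->parent = a[i+m];
    a[i+m]->right = buildBalanced(a, i+m+1, ns-m-1);
    if (a[i+m]->right != nil) a[i+m]->right->parent = a[i+m];
    return a[i+m];
  }

  // Sketch: replace the subtree rooted at u by its balanced rebuild.
  void rebuild(Node *u) {
    int ns = BinaryTree<Node>::size(u);
    Node *p = u->parent;
    Node **a = new Node*[ns];
    packIntoArray(u, a, 0);
    Node *root = buildBalanced(a, 0, ns);
    if (root != nil) root->parent = p;
    if (p == nil)           r = root;        // u was the root of the whole tree
    else if (p->right == u) p->right = root;
    else                    p->left = root;
    delete[] a;
  }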
Analysis of Correctness and Running-Time

In this section, we analyze the correctness and amortized running time of operations on a ScapegoatTree.  We first prove the correctness by showing that, when the add(x) operation results in a node that violates Condition (8.1), then we can always find a scapegoat:
Lemma 8.1.  Let u be a node of depth h > log_{3/2} q in a ScapegoatTree.  Then there exists a node w on the path from u to the root such that

  size(w)/size(parent(w)) > 2/3 .

Proof.  Suppose, for the sake of contradiction, that this is not the case, and

  size(w)/size(parent(w)) ≤ 2/3

for all nodes w on the path from u to the root.  Denote the path from the root to u as r = u_0, ..., u_h = u.  Then, we have size(u_0) = n, size(u_1) ≤ (2/3)n, size(u_2) ≤ (2/3)^2 n and, more generally,

  size(u_i) ≤ (2/3)^i n .

But this gives a contradiction, since size(u) ≥ 1; we would have

  1 ≤ size(u) ≤ (2/3)^h n < (2/3)^{log_{3/2} q} n = (1/q) n ≤ 1 ,

where the last inequality uses n ≤ q.
Next, we analyze the parts of the running time that are not yet accounted for.  There are two parts:  The cost of calls to size(u) when searching for scapegoat nodes, and the cost of calls to rebuild(w) when we find a scapegoat w.  The cost of calls to size(u) can be related to the cost of calls to rebuild(w), as follows:
Lemma 8.2.  During a call to add(x) in a ScapegoatTree, the cost of finding the scapegoat w and rebuilding the subtree rooted at w is O(size(w)).
Proof.  The cost of rebuilding the scapegoat node w, once we find it, is O(size(w)).  When searching for the scapegoat node, we call size(u) on a sequence of nodes u_0, ..., u_k until we find the scapegoat u_k = w.  However, since u_k is the first node in this sequence that is a scapegoat, we know that

  size(u_i) ≤ (2/3) size(u_{i+1})

for all i in {0, ..., k-2}.  Therefore, the cost of all calls to size(u) is

  O( size(u_k) + size(u_{k-1}) + size(u_{k-2}) + ... + size(u_0) )
    ≤ O( size(u_k) + size(u_{k-1}) (1 + (2/3) + (2/3)^2 + ...) )
    = O( size(u_k) + 3 size(u_{k-1}) )
    = O( size(u_k) ) = O( size(w) ) ,

since the sum in the second line is a geometrically decreasing series and size(u_{k-1}) ≤ size(u_k).
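This accounting matters because nodes in a ScapegoatTree are not assumed to store their subtree sizes; a call such as BinaryTree<Node>::size(w) in the add code above typically counts the subtree by traversing it, which is exactly why one call costs time proportional to size(w).  A minimal sketch of such a routine, assuming the convention that the sentinel nil has size 0:

  // Sketch: count the nodes in the subtree rooted at u by visiting each one.
  int size(Node *u) {
    if (u == nil) return 0;
    return 1 + size(u->left) + size(u->right);
  }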
 
All that remains is to prove an upper-bound on the cost of all calls to rebuild(u) during a sequence of m operations:
Lemma 8.3.  Starting with an empty ScapegoatTree, any sequence of m add(x) and remove(x) operations causes at most O(m log m) time to be used by rebuild(u) operations.
Proof.  To prove this, we will use a credit scheme.  We imagine that each node stores a number of credits.  Each credit can pay for some constant, c, units of time spent rebuilding.  The scheme gives out a total of O(m log m) credits and every call to rebuild(u) is paid for with credits stored at u.
During an insertion or deletion, we give one credit to each node on the path to the inserted node, or deleted node, u.  In this way we hand out at most log_{3/2} q ≤ log_{3/2} m credits per operation.  During a deletion we also store an additional credit ``on the side.''  Thus, in total we give out at most O(m log m) credits.  All that remains is to show that these credits are sufficient to pay for all calls to rebuild(u).
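Spelled out, the total handed out over the whole sequence is at most

  m (log_{3/2} m + 1) = O(m log m)

credits: at most log_{3/2} q ≤ log_{3/2} m credits along the search path of each operation, plus the one extra credit stored on the side by each deletion.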
If we call rebuild(u) during an insertion, it is because u is a scapegoat.  Suppose, without loss of generality, that

  size(u.left)/size(u) > 2/3 .

Using the fact that

  size(u) = 1 + size(u.left) + size(u.right)

we deduce that

  (1/2) size(u.left) > size(u.right)

and therefore

  size(u.left) - size(u.right) > (1/2) size(u.left) > (1/3) size(u) .

Now, the last time a subtree containing u was rebuilt (or when u was inserted, if a subtree containing u was never rebuilt), we had

  size(u.left) - size(u.right) ≤ 1 .

Therefore, the number of add(x) or remove(x) operations that have affected u.left or u.right since then is at least

  (1/3) size(u) - 1 ,

and there are therefore at least this many credits stored at u that are available to pay for the O(size(u)) time it takes to call rebuild(u).
If we call rebuild(u) during a deletion, it is because q > 2n.  In this case, we have q - n > n credits stored ``on the side,'' and we use these to pay for the O(n) time it takes to rebuild the root.  This completes the proof.
Summary

The following theorem summarizes the performance of the ScapegoatTree data structure:
Theorem 8.1.  A ScapegoatTree implements the SSet interface.  Ignoring the cost of rebuild(u) operations, a ScapegoatTree supports the operations add(x), remove(x), and find(x) in O(log n) time per operation.
Furthermore, beginning with an empty ScapegoatTree, any sequence of m add(x) and remove(x) operations results in a total of O(m log m) time spent during all calls to rebuild(u).
opendatastructures.org