This chapter introduces one of the most fundamental structures in computer science: binary trees. The use of the word tree here comes from the fact that, when we draw them, the resultant drawing often resembles the trees found in a forest. There are many ways of ways of defining binary trees. Mathematically, a binary tree is a connected, undirected, finite graph with no cycles, and no vertex of degree greater than three.
For most computer science applications, binary trees are rooted:
A special node, r
, of degree at most two is called the root
of the tree. For every node, \(\texttt{u}\neq \texttt{r}\), the second node on the
path from u
to r
is called the parent of u
.
Each of the
other nodes adjacent to u
is called a child
of u
. Most of the
binary trees we are interested in are ordered,
so we distinguish
between the left child and right child of u
.
In illustrations, binary trees are usually drawn from the root downward, with the root at the top of the drawing and the left and right children respectively given by left and right positions in the drawing (Figure 6.1). For example, Figure 6.2.a shows a binary tree with nine nodes.
(a) | (b) |
Because binary trees are so important, a certain terminology has developed
for them: The depth
of a node, u
, in a binary tree is the
length of the path from u
to the root of the tree. If a node, w
,
is on the path from u
to r
, then w
is called an ancestor
of u
and u
a descendant
of w
. The subtree of a
node, u
, is the binary tree that is rooted at u
and contains all
of u
's descendants. The height
of a node, u
, is the length
of the longest path from u
to one of its descendants. The height of
a tree is the height of its root.
A node, u
, is a leaf
if it has no children.
We sometimes think of the tree as being augmented with external nodes. Any node that does not have a left child has an external node as its left child, and, correspondingly, any node that does not have a right child has an external node as its right child (see Figure 6.2.b). It is easy to verify, by induction, that a binary tree with \(\texttt{n}\ge 1\) real nodes has \(\texttt{n}+1\) external nodes.
6.1 BinaryTree
: A Basic Binary Tree
The simplest way to represent a node, u
, in a binary tree is to
explicitly store the (at most three) neighbours of u
:
class BTNode<Node extends BTNode<Node>> {
Node left;
Node right;
Node parent;
}
nil
.
In this way, both external nodes of the tree and the parent of the root
correspond to the value nil
.
The binary tree itself can then be represented by a
reference to its root node, r
:
Node r;
We can compute the depth of a node, u
, in a binary tree by counting
the number of steps on the path from u
to the root:
int depth(Node u) {
int d = 0;
while (u != r) {
u = u.parent;
d++;
}
return d;
}
6.1.1 Recursive Algorithms
Using recursive algorithms makes it very easy to compute facts about
binary trees. For example, to compute the size of (number of nodes in)
a binary tree rooted at node u
, we recursively compute the sizes of the
two subtrees rooted at the children of u
, sum up these sizes, and add one:
int size(Node u) {
if (u == nil) return 0;
return 1 + size(u.left) + size(u.right);
}
To compute the height of a node u
, we can compute the height of u
's
two subtrees, take the maximum, and add one:
int height(Node u) {
if (u == nil) return -1;
return 1 + Math.max(height(u.left), height(u.right));
}
6.1.2 Traversing Binary Trees
The two algorithms from the previous section both use recursion to visit all the nodes in a binary tree. Each of them visits the nodes of the binary tree in the same order as the following code:
void traverse(Node u) {
if (u == nil) return;
traverse(u.left);
traverse(u.right);
}
Using recursion this way produces very short, simple code, but it can also be problematic. The maximum depth of the recursion is given by the maximum depth of a node in the binary tree, i.e., the tree's height. If the height of the tree is very large, then this recursion could very well use more stack space than is available, causing a crash.
To traverse a binary tree without recursion, you can use an algorithm that
relies on where it came from to determine where it will go next. See
Figure 6.3. If we arrive at a node u
from u.parent
,
then the next thing to do is to visit u.left
. If we arrive at u
from u.left
, then the next thing to do is to visit u.right
. If we
arrive at u
from u.right
, then we are done visiting u
's subtree,
and so we return to u.parent
. The following code implements this
idea, with code included for handling the cases where any of u.left
,
u.right
, or u.parent
is nil
:
void traverse2() {
Node u = r, prev = nil, next;
while (u != nil) {
if (prev == u.parent) {
if (u.left != nil) next = u.left;
else if (u.right != nil) next = u.right;
else next = u.parent;
} else if (prev == u.left) {
if (u.right != nil) next = u.right;
else next = u.parent;
} else {
next = u.parent;
}
prev = u;
u = next;
}
}
The same facts that can be computed with recursive algorithms can also be
computed in this way, without recursion. For example, to compute the size
of the tree we keep a counter, n
, and increment n
whenever visiting
a node for the first time:
int size2() {
Node u = r, prev = nil, next;
int n = 0;
while (u != nil) {
if (prev == u.parent) {
n++;
if (u.left != nil) next = u.left;
else if (u.right != nil) next = u.right;
else next = u.parent;
} else if (prev == u.left) {
if (u.right != nil) next = u.right;
else next = u.parent;
} else {
next = u.parent;
}
prev = u;
u = next;
}
return n;
}
In some implementations of binary trees, the parent
field is not used.
When this is the case, a non-recursive implementation is still possible,
but the implementation has to use a List
(or Stack
) to keep track
of the path from the current node to the root.
A special kind of traversal that does not fit the pattern of the above
functions is the breadth-first traversal.
In a breadth-first
traversal, the nodes are visited level-by-level starting at the root and
moving down, visiting the nodes at each level from left to right (see
Figure 6.4). This is similar to the way that we would read a
page of English text. Breadth-first traversal is implemented using a
queue, q
, that initially contains only the root, r
. At each step,
we extract the next node, u
, from q
, process u
and add u.left
and u.right
(if they are non-nil
) to q
:
void bfTraverse() {
Queue<Node> q = new LinkedList<Node>();
if (r != nil) q.add(r);
while (!q.isEmpty()) {
Node u = q.remove();
if (u.left != nil) q.add(u.left);
if (u.right != nil) q.add(u.right);
}
}
6.2 BinarySearchTree
: An Unbalanced Binary Search Tree
A BinarySearchTree
is a special kind of binary tree in which each node, u
,
also stores a data value, u.x
, from some total order. The data values in
a binary search tree obey the binary search tree property:
For
a node, u
, every data value stored in the subtree rooted at u.left
is less than u.x
and every data value stored in the subtree rooted at
u.right
is greater than u.x
. An example of a BinarySearchTree
is shown in Figure 6.5.
6.2.1 Searching
The binary search tree property is extremely useful because it allows
us to quickly locate a value, x
, in a binary search tree. To do this we start
searching for x
at the root, r
. When examining a node, u
, there
are three cases:
- If \(\texttt{x}< \texttt{u.x}\), then the search proceeds to
u.left
; - If \(\texttt{x}> \texttt{u.x}\), then the search proceeds to
u.right
; - If \(\texttt{x}= \texttt{u.x}\), then we have found the node
u
containingx
.
u=nil
. In the
former case, we found x
. In the latter case, we conclude that x
is not in the binary search tree.
T findEQ(T x) {
Node u = r;
while (u != nil) {
int comp = c.compare(x, u.x);
if (comp < 0)
u = u.left;
else if (comp > 0)
u = u.right;
else
return u.x;
}
return null;
}
Two examples of searches in a binary search tree are shown in
Figure 6.6. As the second example shows, even if we don't
find x
in the tree, we still gain some valuable information. If we
look at the last node, u
, at which Case 1 occurred, we see that u.x
is the smallest value in the tree that is greater than x
. Similarly,
the last node at which Case 2 occurred contains the largest value in the
tree that is less than x
. Therefore, by keeping track of the last
node, z
, at which Case 1 occurs, a BinarySearchTree
can implement
the find(x)
operation that returns the smallest value stored in the
tree that is greater than or equal to x
:
T find(T x) {
Node w = r, z = nil;
while (w != nil) {
int comp = c.compare(x, w.x);
if (comp < 0) {
z = w;
w = w.left;
} else if (comp > 0) {
w = w.right;
} else {
return w.x;
}
}
return z == nil ? null : z.x;
}
(a) | (b) |
6.2.2 Addition
To add a new value, x
, to a BinarySearchTree
, we first search for
x
. If we find it, then there is no need to insert it. Otherwise,
we store x
at a leaf child of the last node, p
, encountered during the
search for x
. Whether the new node is the left or right child of p
depends on the result of comparing x
and p.x
.
boolean add(T x) {
Node p = findLast(x);
return addChild(p, newNode(x));
}
Node findLast(T x) {
Node w = r, prev = nil;
while (w != nil) {
prev = w;
int comp = c.compare(x, w.x);
if (comp < 0) {
w = w.left;
} else if (comp > 0) {
w = w.right;
} else {
return w;
}
}
return prev;
}
boolean addChild(Node p, Node u) {
if (p == nil) {
r = u; // inserting into empty tree
} else {
int comp = c.compare(u.x, p.x);
if (comp < 0) {
p.left = u;
} else if (comp > 0) {
p.right = u;
} else {
return false; // u.x is already in the tree
}
u.parent = p;
}
n++;
return true;
}
x
, which takes an
amount of time proportional to the height of the newly added node u
.
In the worst case, this is equal to the height of the BinarySearchTree
.
6.2.3 Removal
Deleting a value stored in a node, u
, of a BinarySearchTree
is a
little more difficult. If u
is a leaf, then we can just detach u
from its parent. Even better: If u
has only one child, then we can
splice u
from the tree by having u.parent
adopt u
's child (see
Figure 6.8):
void splice(Node u) {
Node s, p;
if (u.left != nil) {
s = u.left;
} else {
s = u.right;
}
if (u == r) {
r = s;
p = nil;
} else {
p = u.parent;
if (p.left == u) {
p.left = s;
} else {
p.right = s;
}
}
if (s != nil) {
s.parent = p;
}
n--;
}
Things get tricky, though, when u
has two children. In this case,
the simplest thing to do is to find a node, w
, that has less than
two children and such that w.x
can replace u.x
. To maintain
the binary search tree property, the value w.x
should be close to the
value of u.x
. For example, choosing w
such that w.x
is the smallest
value greater than u.x
will work. Finding the node w
is easy; it is
the smallest value in the subtree rooted at u.right
. This node can
be easily removed because it has no left child (see Figure 6.9).
void remove(Node u) {
if (u.left == nil || u.right == nil) {
splice(u);
} else {
Node w = u.right;
while (w.left != nil)
w = w.left;
u.x = w.x;
splice(w);
}
}
6.2.4 Summary
The find(x)
, add(x)
, and remove(x)
operations in a
BinarySearchTree
each involve following a path from the root of the
tree to some node in the tree. Without knowing more about the shape of
the tree it is difficult to say much about the length of this path,
except that it is less than n
, the number of nodes in the tree.
The following (unimpressive) theorem summarizes the performance of the
BinarySearchTree
data structure:
BinarySearchTree
implements the SSet
interface and
supports the operations add(x)
, remove(x)
,
and find(x)
in \(O(\texttt{n})\) time per operation.
Theorem 6.1 compares poorly with Theorem 4.1, which shows
that the SkiplistSSet
structure can implement the SSet
interface
with \(O(\log \texttt{n})\) expected time per operation. The problem with the
BinarySearchTree
structure is that it can become unbalanced.
Instead of looking like the tree in Figure 6.5 it can look like a long
chain of n
nodes, all but the last having exactly one child.
There are a number of ways of avoiding unbalanced binary search trees, all of which lead to data structures that have \(O(\log \texttt{n})\) time operations. In Chapter 7 we show how \(O(\log \texttt{n})\) expected time operations can be achieved with randomization. In Chapter 8 we show how \(O(\log \texttt{n})\) amortized time operations can be achieved with partial rebuilding operations. In Chapter 9 we show how \(O(\log \texttt{n})\) worst-case time operations can be achieved by simulating a tree that is not binary: one in which nodes can have up to four children.
6.3 Discussion and Exercises
Binary trees have been used to model relationships for thousands of years. One reason for this is that binary trees naturally model (pedigree) family trees. These are the family trees in which the root is a person, the left and right children are the person's parents, and so on, recursively. In more recent centuries binary trees have also been used to model species trees in biology, where the leaves of the tree represent extant species and the internal nodes of the tree represent speciation events in which two populations of a single species evolve into two separate species.
Binary search trees appear to have been discovered independently by several groups in the 1950s [48]. Further references to specific kinds of binary search trees are provided in subsequent chapters.
When implementing a binary tree from scratch, there are several design decisions to be made. One of these is the question of whether or not each node stores a pointer to its parent. If most of the operations simply follow a root-to-leaf path, then parent pointers are unnecessary, waste space, and are a potential source of coding errors. On the other hand, the lack of parent pointers means that tree traversals must be done recursively or with the use of an explicit stack. Some other methods (like inserting or deleting into some kinds of balanced binary search trees) are also complicated by the lack of parent pointers.
Another design decision is concerned with how to store the parent, left
child, and right child pointers at a node. In the implementation given
here, these pointers are stored as separate variables. Another option
is to store them in an array, p
, of length 3, so that u.p[0]
is the
left child of u
, u.p[1]
is the right child of u
, and u.p[2]
is
the parent of u
. Using an array this way means that some sequences
of if
statements can be simplified into algebraic expressions.
An example of such a simplification occurs during tree traversal. If
a traversal arrives at a node u
from u.p[i]
, then the next node in
the traversal is \(\texttt{u.p}[(\texttt{i}+1)\bmod 3]\). Similar examples occur when
there is left-right symmetry. For example, the sibling of u.p[i]
is
\(\texttt{u.p}[(\texttt{i}+1)\bmod 2]\). This trick works whether u.p[i]
is a left
child (\(\texttt{i}=0\)) or a right child (\(\texttt{i}=1\)) of u
. In some cases this
means that some complicated code that would otherwise need to have both a
left version and right version can be written only once. See the methods
rotateLeft(u)
and rotateRight(u)
on page X
for an example.
size2(u)
, that computes the size
of the subtree rooted at node u
.
height2(u)
, that computes the height
of node u
in a BinaryTree
.
u
, the size of
the subtrees rooted at u.left
and u.right
differ by at most one.
Write a recursive method, isBalanced()
, that tests if a binary tree
is balanced. Your method should run in \(O(\texttt{n})\) time. (Be sure to
test your code on some large trees with different shapes; it is easy
to write a method that takes much longer than \(O(\texttt{n})\) time.)
A pre-order traversal of a binary tree is a traversal that visits
each node, u
, before any of its children. An in-order traversal
visits u
after visiting all the nodes in u
's left subtree but before
visiting any of the nodes in u
's right subtree. A post-order
traversal visits u
only after visiting all other nodes in u
's subtree.
The pre/in/post-order numbering of a tree labels the nodes of a tree with
the integers \(0,\ldots,\texttt{n}-1\) in the order that they are encountered
by a pre/in/post-order traversal. See Figure 6.10
for an example.
BinaryTree
whose nodes have fields for storing
pre-order, post-order, and in-order numbers. Write recursive methods
preOrderNumber()
, inOrderNumber()
, and postOrderNumbers()
that
assign these numbers correctly. These methods should each run in
\(O(\texttt{n})\) time.
nextPreOrder(u)
, nextInOrder(u)
, and
nextPostOrder(u)
that return the node that follows u
in a pre-order,
in-order, or post-order traversal, respectively. These functions
should take amortized constant time; if we start at any node
u
and repeatedly call one of these functions and assign the return
value to u
until \(\texttt{u}=\texttt{null}\), then the cost of all these calls should
be \(O(\texttt{n})\).
- Given a node
u
, determine the size of the subtree rooted atu
. - Given a node
u
, determine the depth ofu
. - Given two nodes
u
andw
, determine ifu
is an ancestor ofw
n
nodes can be represented
using at most \(2(\texttt{n}-1)\) bits. (Hint: think about recording what
happens during a traversal and then playing back that recording to
reconstruct the tree.)
BinarySearchTree
method, getLE(x)
,
that returns a list of all items in the tree that are less than or
equal to x
. The running time of your method should be \(O(\texttt{n}'+\texttt{h})\)
where \(\texttt{n}'\) is the number of items less than or equal to x
and h
is the height of the tree.
BinarySearchTree
in such a way that the resulting tree has
height \(\texttt{n}-1\). How many ways are there to do this?
BinarySearchTree
and perform the operations add(x)
followed by remove(x)
(with the same value of x
) do we necessarily
return to the original tree?
remove(x)
operation increase the height of any node in a
BinarySearchTree
? If so, by how much?
add(x)
operation increase the height of any node in a
BinarySearchTree
? Can it increase the height of the tree? If so,
by how much?
BinarySearchTree
in which each node,
u
, maintains values u.size
(the size of the subtree rooted at u
),
u.depth
(the depth of u
), and u.height
(the height of the subtree
rooted at u
).
These values should be maintained, even during calls to the add(x)
and remove(x)
operations, but this should not increase the cost of
these operations by more than a constant factor.