1.2 Mathematical Background

In this section, we review some mathematical notations and tools used throughout this book, including logarithms, big-Oh notation, and probability theory.

1.2.1 Logarithms

In this book, the expression $ \log_b k$ denotes the base-$ b$ logarithm of $ k$. That is, the unique value $ x$ that satisfies

$\displaystyle b^{x} = k \enspace .

Most of the logarithms in this book are base 2 (binary logarithms), in which case we drop the base, so that $ \log k$ is shorthand for $ \log_2 k$.

Another logarithm that comes up several times in this book is the natural logarithm. Here we use the notation $ \ln k$ to denote $ \log_e k$, where $ e$ -- Euler's constant -- is given by

$\displaystyle e = \lim_{n\rightarrow\infty} \left(1+\frac{1}{n}\right)^n
\approx 2.71828 \enspace .

The natural logarithm comes up frequently because it is the value of a particularly common integral:

$\displaystyle \int_{1}^{k} 1/x\,\mathrm{d}x = \ln k \enspace .

Two of the most common manipulations we do with logarithms are removing them from an exponent:

$\displaystyle b^{\log_b k} = k

and changing the base of a logarithm:

$\displaystyle \log_b k = \frac{\log_a k}{\log_a b} \enspace .

For example, we can use these two manipulations to compare the natural and binary logarithms

$\displaystyle \ln k = \frac{\log k}{\log e} = \frac{\log k}{(\ln e)/(\ln 2)} =
(\ln 2)(\log k) \approx 0.693147\log k \enspace .

1.2.2 Factorials

In one or two places in this book, the factorial function is used. For a non-negative integer $ n$, the notation $ n!$ (pronounced ``$ n$ factorial'') denotes

$\displaystyle n! = 1\cdot2\cdot3\cdot\cdots\cdot n \enspace .

Factorials appear because $ n!$ counts the number of distinct permutations, i.e., orderings, of n distinct elements. For the special case $ n=0$, $ 0!$ is defined as 1.

The quantity $ n!$ can be approximated using Stirling's Approximation:

$\displaystyle n!
= \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{\alpha(n)} \enspace ,


$\displaystyle \frac{1}{12n+1} < \alpha(n) < \frac{1}{12n} \enspace .

Stirling's Approximation also approximates $ \ln(n!)$:

$\displaystyle \ln(n!) = n\ln n - n + \frac{1}{2}\ln(2\pi n) + \alpha(n)

(In fact, Stirling's Approximation is most easily proven by approximating $ \ln(n!)=\ln 1 + \ln 2 + \cdots + \ln n$ by the integral $ \int_1^n \ln n\,\mathrm{d}n = n\ln n - n +1$.)

Related to the factorial function are the binomial coefficients. For a non-negative integer $ n$ and an integer $ k\in\{0,\ldots,n\}$, the notation $ \binom{n}{k}$ denotes:

$\displaystyle \binom{n}{k} = \frac{n!}{k!(n-k)!} \enspace .

The binomial coefficient $ \binom{n}{k}$ (pronounced ``$ n$ choose $ k$'') counts the number of subsets of an $ n$ element set that have size $ k$, i.e., the number of ways of choosing $ k$ distinct integers from the set $ \{1,\ldots,n\}$.

1.2.3 Asymptotic Notation

When analyzing data structures in this book, we will want to talk about the running times of various operations. The exact running times will, of course, vary from computer to computer and even from run to run on an individual computer. Therefore, instead of analyzing running times exactly, we will use the so-called big-Oh notation: For a function $ f(n)$, $ O(f(n))$ denotes a set of functions,

$\displaystyle O(f(n)) = \{g(n):$$\displaystyle \mbox{there exists $c>0$, and $n_0$\ such that
$g(n) \le c\cdot f(n)$\ for all $n\ge n_0$}$$\displaystyle \} \enspace .

Thinking graphically, this set consists of the functions $ g(n)$ where $ c\cdot f(n)$ starts to dominate $ g(n)$ when $ n$ is sufficiently large.

We generally use asymptotic notation to simplify functions. For example, in place of $ 5n\log n + 8n - 200$ we can write, simply, $ O(n\log n)$. This is proven as follows:

$\displaystyle 5n\log n + 8n - 200$ $\displaystyle \le 5n\log n + 8n$    
  $\displaystyle \le 5n\log n + 8n\log n$ $\displaystyle \mbox{ for $n\ge 2$\ (so that $\log n \ge 1$)}$    
  $\displaystyle \le 13n\log n$    

which demonstrates that the fundtion $ f(n)=5n\log n + 8n - 200$ is in the set $ O(\log n)$ using the constants $ c=13$ and $ n_0 = 2$.

There are a number of useful shortcuts when using asymptotic notation. First:

$\displaystyle O(n^{c_1}) \subset O(n^{c_2}) \enspace ,$

for any $ c_1 < c_2$. Second: For any constants $ a,b,c > 0$,

$\displaystyle O(a) \subset O(\log n) \subset O(n^{b}) \subset O({c}^n) \enspace . $

These inclusion relations can be multiplied by any positive value, and they still hold. For example, multiplying by $ n$ yields:

$\displaystyle O(n) \subset O(n\log n) \subset O(n^{1+b}) \subset O(n{c}^n) \enspace . $

Continuing in a long and distinguished tradition, we will abuse this notation by writing things like $ f_1(n) = O(f(n))$ when what we really mean is $ f_1(n) \in O(f(n))$. We will also make statements like ``the running time of this operation is $ O(f(n))$'' when this statement should be ``the running time of this operation is a member of $ O(f(n))$.'' These shortcuts are mainly to avoid awkward language and to make it easier to use asymptotic notation within strings of equations.

A particularly strange example of this comes when we write statements like

$\displaystyle T(n) = 2\log n + O(1) \enspace .

Again, this would be more correctly written as

$\displaystyle T(n) \le 2\log n + [$$\displaystyle \mbox{some member of $O(1)$]}$$\displaystyle \enspace .

The expression $ O(1)$ also brings up another issue. Since there is no variable in this expression, it may not be clear what variable is getting arbitrarily large. Without context, there is no way to tell. In the example above, since the only variable in the rest of the equation is $ n$, we can assume that this should be read as $ T(n) = 2\log n + O(f(n))$, where $ f(n) = 1$.

In a few cases, we will use asymptotic notation on functions with more than one variable. There seems to be no standard for this, but for our purposes, the following definition is sufficient:

$\displaystyle O(f(n_1,\ldots,n_k)) =
...ots,n_k$\ such that $g(n_1,\ldots,n_k)\ge z$}
\end{array}\right\} \enspace .

This definition captures the situation we really care about: when the arguments $ n_1,\ldots,n_k$ make $ g$ take on large values. This agrees with the univariate definition of $ O(f(n))$ when $ f(n)$ is an increasing function of $ n$. The reader should be warned that, although this works for our purposes, other texts may treat multivariate functions and asymptotic notation differently.

1.2.4 Randomization and Probability

Some of the data structures presented in this book are randomized; they make random choices that are independent of the data being stored in them or the operations being performed on them. For this reason, performing the same set of operations more than once using these structures could result in different running times. When analyzing these data structures we are interested in their average or expected running times.

Formally, the running time of an operation on a randomized data structure is a random variable and we want to study its expected value. For a discrete random variable $ X$ taking on values in some countable universe $ U$, the expected value of $ X$, denoted by $ \mathrm{E}[X]$ is given the by the formula

$\displaystyle \mathrm{E}[X] = \sum_{x\in U} x\cdot\Pr\{X=x\} \enspace .

Here $ \Pr\{\mathcal{E}\}$ denotes the probability that the event $ \mathcal{E}$ occurs. In all the examples in this book, these probabilities are only with respect to whatever random choices are made by the randomized data structure; there is no assumption that the data stored in the structure is random or that the sequence of operations performed on the data structure is random.

One of the most important properties of expected values is linearity of expectation: For any two random variables $ X$ and $ Y$,

$\displaystyle \mathrm{E}[X+Y] = \mathrm{E}[X] + \mathrm{E}[Y] \enspace .

More generally, for any random variables $ X_1,\ldots,X_k$,

$\displaystyle \mathrm{E}\left[\sum_{i=1}^k X_k\right] = \sum_{i=1}^k \mathrm{E}[X_i] \enspace .

Linearity of expectation allows us to break down complicated random variables (like the left hand sides of the above equations) into sums of simpler random variables (the right hand sides).

A useful trick, that we will use repeatedly, is that of defining indicator random variables. These binary variables are useful when we want to count something and are best illustrated by an example. Suppose we toss a fair coin $ k$ times and we want to know the expected number of times the coin comes up heads. Intuitively, we know the answer is $ k/2$, but if we try to prove it using the definition of expected value, we get

$\displaystyle \mathrm{E}[X]$ $\displaystyle = \sum_{i=0}^k i\cdot\Pr\{X=i\}$    
  $\displaystyle = \sum_{i=0}^k i\cdot\binom{k}{i}/2^k$    
  $\displaystyle = k\cdot \sum_{i=0}^{k-1}\binom{k-1}{i}/2^k$    
  $\displaystyle = k/2 \enspace .$    

This requires that we know enough to calculate that $ \Pr\{X=i\}
= \binom{k}{i}/2^k$, that we know the binomial identity $ i\binom{k}{i}=k\binom{k-1}{i}$, and that we know the binomial identity $ \sum_{i=0}^{k} \binom{k}{i} = 2^{k}$.

Using indicator variables and linearity of expectation makes things much easier: For each $ i\in\{1,\ldots,k\}$, define the indicator random variable

$\displaystyle I_i = \begin{cases}
1 & \text{if the $i$th coin toss is heads} \\
0 & \text{otherwise.}


$\displaystyle \mathrm{E}[I_i] = (1/2)1 + (1/2)0 = 1/2 \enspace . $

Now, $ X=\sum_{i=1}^k I_i$, so

$\displaystyle \mathrm{E}[X]$ $\displaystyle = \mathrm{E}\left[\sum_{i=1}^k I_i\right]$    
  $\displaystyle = \sum_{i=1}^k \mathrm{E}[I_i]$    
  $\displaystyle = \sum_{i=1}^k 1/2$    
  $\displaystyle = k/2 \enspace .$    

This is a bit more long-winded, but doesn't require that we know any magical identities or compute any non-trivial probabilities. Even better: It agrees with the intuition that we expect half the coins to come up heads precisely because each individual coin has probability $ 1/2$ of coming up heads.