In this section, we review some mathematical notations and tools used throughout this book, including logarithms, big-Oh notation, and probability theory. This review will be brief and is not intended as an introduction. Readers who feel they are missing this background are encouraged to read, and do exercises from, the appropriate sections of the very good (and free) textbook on mathematics for computer science [50].
The expression $b^x$ denotes the number $b$ raised to the power of $x$.  If $x$ is a positive integer, then this is just the value of $b$ multiplied by itself $x-1$ times:
\[ b^x = \underbrace{b\times b\times \cdots \times b}_{x\ \text{factors}} \enspace . \]
If $x$ is a negative integer, $b^x = 1/b^{-x}$.  When $x = 0$, $b^x = 1$.  When $x$ is not an integer, we can still define exponentiation in terms of the exponential function $e^x$ (see below), which is itself defined in terms of the exponential series, but this is best left to a calculus text.
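For integer exponents, this definition translates directly into code.  The following minimal sketch (the helper name `pow` and its use of `double` are our choices for illustration, not code from this book) computes $b^x$ by repeated multiplication, using $b^x = 1/b^{-x}$ when $x$ is negative:

  // Computes b^x for an integer exponent x by repeated multiplication.
  // For negative x it uses the identity b^x = 1/b^(-x).
  double pow(double b, int x) {
    if (x < 0) return 1.0 / pow(b, -x);
    double result = 1.0;        // b^0 = 1
    for (int i = 0; i < x; i++)
      result *= b;              // multiply in one more factor of b
    return result;
  }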
In this book, the expression $\log_b k$ denotes the base-$b$ logarithm of $k$.  That is, the unique value $x$ that satisfies
\[ b^x = k \enspace . \]
Most of the time in this book, the base $b$ is 2 (binary logarithms).  In this case, we omit the base, so that $\log k$ is shorthand for $\log_2 k$.
An informal, but useful, way to think about logarithms is to think of $\log_b k$ as the number of times we have to divide $k$ by $b$ before the result is less than or equal to 1.  For example, when one does binary search, each comparison reduces the number of possible answers by a factor of 2.  This is repeated until there is at most one possible answer.  Therefore, the number of comparisons done by binary search when there are initially at most $n+1$ possible answers is at most $\lceil\log_2(n+1)\rceil$.
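To make this informal reading concrete, here is a small sketch (the helper name `halvings` is ours, for illustration only) that counts how many times $k$ must be divided by 2 before the result is at most 1; for an integer $k\ge 1$ this count is $\lceil\log_2 k\rceil$, which is also the comparison bound described above when there are $k$ possible answers:

  // Counts how many times k must be divided by 2 before the result
  // is less than or equal to 1.  For an integer k >= 1 this equals
  // ceil(log2(k)); each division models one binary-search comparison.
  int halvings(double k) {
    int count = 0;
    while (k > 1) {
      k /= 2;
      count++;
    }
    return count;
  }

For example, `halvings(1024)` returns 10, matching $\log_2 1024 = 10$.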
Another logarithm that comes up several times in this book is the natural logarithm.  Here we use the notation $\ln k$ to denote $\log_e k$, where $e$ -- Euler's constant -- is given by
\[ e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n \approx 2.71828 \enspace . \]
In one or two places in this book, the factorial function is used.  For a non-negative integer $n$, the notation $n!$ (pronounced ``$n$ factorial'') is defined to mean
\[ n! = 1\cdot 2\cdot 3\cdots n \enspace . \]
Factorials appear because $n!$ counts the number of distinct permutations, i.e., orderings, of $n$ distinct elements.  For the special case $n=0$, $0!$ is defined as 1.
The quantity $n!$ can be approximated using Stirling's Approximation:
\[ n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\alpha(n)} \enspace , \]
where
\[ \frac{1}{12n+1} < \alpha(n) < \frac{1}{12n} \enspace . \]
Stirling's Approximation also approximates $\ln(n!)$:
\[ \ln(n!) = n\ln n - n + \frac{1}{2}\ln(2\pi n) + \alpha(n) \enspace . \]
(In fact, Stirling's Approximation is most easily proven by approximating $\ln(n!) = \ln 1 + \ln 2 + \cdots + \ln n$ by the integral $\int_1^n \ln x\,\mathrm{d}x = n\ln n - n + 1$.)
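As a quick numerical sanity check (a worked example of ours, not part of the text), take $n = 10$:
\[ 10! = 3{,}628{,}800 \qquad\text{while}\qquad \sqrt{2\pi\cdot 10}\left(\frac{10}{e}\right)^{10} \approx 3{,}598{,}696 \enspace , \]
so $e^{\alpha(10)} \approx 1.0084$, giving $\alpha(10) \approx 0.008330$, which indeed lies inside the interval $(1/121, 1/120) \approx (0.008264, 0.008333)$ required by the bounds above.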
Related to the factorial function are the binomial coefficients.  For a non-negative integer $n$ and an integer $k\in\{0,\ldots,n\}$, the notation $\binom{n}{k}$ denotes:
\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \enspace . \]
The binomial coefficient $\binom{n}{k}$ (pronounced ``$n$ choose $k$'') counts the number of subsets of an $n$ element set that have size $k$, i.e., the number of ways of choosing $k$ distinct integers from the set $\{1,\ldots,n\}$.
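As a small concrete example (ours, not from the text), take $n=4$ and $k=2$:
\[ \binom{4}{2} = \frac{4!}{2!\,2!} = \frac{24}{4} = 6 \enspace , \]
and indeed the six $2$-element subsets of $\{1,2,3,4\}$ are $\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}$.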
When analyzing data structures in this book, we want to talk about
the running times of various operations.  The exact running times will,
of course, vary from computer to computer and even from run to run on an
individual computer.  When we talk about the running time of an operation
we are referring to the number of computer instructions performed during
the operation.  Even for simple code, this quantity can be difficult to
compute exactly.  Therefore, instead of analyzing running times exactly,
we will use the so-called big-Oh notation: For a function $f(n)$, $O(f(n))$ denotes a set of functions,
\[ O(f(n)) = \left\{ g(n) : \text{there exist } c>0 \text{ and } n_0 \text{ such that } g(n) \le c\cdot f(n) \text{ for all } n \ge n_0 \right\} \enspace . \]
Thinking graphically, this set consists of the functions $g(n)$ where $c\cdot f(n)$ starts to dominate $g(n)$ when $n$ is sufficiently large.
We generally use asymptotic notation to simplify functions.  For example, in place of $5n\log n + 8n - 200$ we can write $O(n\log n)$.  This is proven as follows:
\[
\begin{aligned}
  5n\log n + 8n - 200
    &\le 5n\log n + 8n \\
    &\le 5n\log n + 8n\log n && \text{(for $n\ge 2$, so that $\log n \ge 1$)} \\
    &\le 13n\log n \enspace .
\end{aligned}
\]
This demonstrates that the function $f(n) = 5n\log n + 8n - 200$ is in the set $O(n\log n)$ using the constants $c = 13$ and $n_0 = 2$.
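The inequality above can also be spot-checked numerically.  The following stand-alone sketch (ours, for illustration; it is no substitute for the proof, which covers every $n \ge 2$) verifies $5n\log n + 8n - 200 \le 13n\log n$ over a range of values:

  // Spot-checks the bound 5n log n + 8n - 200 <= 13 n log n used above.
  public class BoundCheck {
    public static void main(String[] args) {
      for (int n = 2; n <= 1_000_000; n++) {
        double logn = Math.log(n) / Math.log(2);   // base-2 logarithm
        double f = 5 * n * logn + 8 * n - 200;
        double bound = 13 * n * logn;
        if (f > bound)
          System.out.println("bound fails at n = " + n);
      }
      System.out.println("checked n = 2, ..., 1000000");
    }
  }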
A number of useful shortcuts can be applied when using asymptotic notation.  First:
\[ O(n^{c_1}) \subset O(n^{c_2}) \enspace , \]
for any $c_1 < c_2$.  Second: For any constants $a,b,c > 0$,
\[ O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n) \enspace . \]
These inclusion relations can be multiplied by any positive value, and they still hold.  For example, multiplying by $n$ yields:
\[ O(n) \subset O(n\log n) \subset O(n^{1+b}) \subset O(nc^n) \enspace . \]
Continuing in a long and distinguished tradition, we will abuse this notation by writing things like $f_1(n) = O(f(n))$ when what we really mean is $f_1(n) \in O(f(n))$.  We will also make statements like ``the running time of this operation is $O(f(n))$'' when this statement should be ``the running time of this operation is a member of $O(f(n))$.''  These shortcuts are mainly to avoid awkward language and to make it easier to use asymptotic notation within strings of equations.
A particularly strange example of this occurs when we write statements like
\[ T(n) = 2\log n + O(1) \enspace . \]
Again, this would be more correctly written as
\[ T(n) \le 2\log n + [\text{some member of $O(1)$}] \enspace . \]
The expression $O(1)$ also brings up another issue.  Since there is no variable in this expression, it may not be clear which variable is getting arbitrarily large.  Without context, there is no way to tell.  In the example above, since the only variable in the rest of the equation is $n$, we can assume that this should be read as $T(n) = 2\log n + O(f(n))$, where $f(n) = 1$.
Big-Oh notation is not new or unique to computer science. It was used by the number theorist Paul Bachmann as early as 1894, and is immensely useful for describing the running times of computer algorithms. Consider the following piece of code:
  void snippet() {
    for (int i = 0; i < n; i++)
      a[i] = i;
  }
One execution of this method involves
1 assignment ($\mathtt{int\ i = 0}$),
$n+1$ comparisons ($\mathtt{i < n}$),
$n$ increments ($\mathtt{i{+}{+}}$),
$n$ array offset calculations ($\mathtt{a[i]}$), and
$n$ indirect assignments ($\mathtt{a[i] = i}$).
So we could write this running time as
\[ T(n) = a + b(n+1) + cn + dn + en \enspace , \]
where $a$, $b$, $c$, $d$, and $e$ are constants that depend on the machine running the code and represent the time to perform assignments, comparisons, increment operations, array offset calculations, and indirect assignments, respectively.  However, if this expression represents the running time of two lines of code, then clearly this kind of analysis will not be tractable to complicated code or algorithms.  Using big-Oh notation, the running time can be simplified to
\[ T(n) = O(n) \enspace . \]
Not only is this notation more compact, but it also gives nearly as much information.  The fact that the running time depends on the constants $a$, $b$, $c$, $d$, and $e$ in the above example means that, in general, it will not be possible to compare two running times to know which is faster without knowing the values of these constants.  Even if we make the effort to determine these constants (say, through timing tests), then our conclusion will only be valid for the machine we run our tests on.
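To spell out why this simplification is valid (a worked step of ours, assuming the machine-dependent constants $a, b, c, d, e$ are all positive), note that for every $n \ge 1$,
\[ a + b(n+1) + cn + dn + en \le (a + 2b + c + d + e)\,n \enspace , \]
so the running time is in $O(n)$, with $a + 2b + c + d + e$ playing the role of the multiplicative constant in the definition of big-Oh and with $n_0 = 1$.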
Big-Oh notation allows us to reason at a much higher level, making
it possible to analyze more complicated functions.  If two algorithms
have the same big-Oh running time, then we won't know which is faster,
and there may not be a clear winner.  One may be faster on one machine,
and the other may be faster on a different machine.  However, if the
two algorithms have demonstrably different big-Oh running times, then
we can be certain that the one with the smaller running time will be
faster for large enough values of $n$.
An example of how big-Oh notation allows us to compare two different functions is shown in Figure 1.5, which compares the rate of growth of $f_1(n) = 15n$ versus $f_2(n) = 2n\log n$.  It might be that $f_1(n)$ is the running time of a complicated linear time algorithm while $f_2(n)$ is the running time of a considerably simpler algorithm based on the divide-and-conquer paradigm.  This illustrates that, although $f_1(n)$ is greater than $f_2(n)$ for small values of $n$, the opposite is true for large values of $n$.  Eventually $f_1(n)$ wins out, by an increasingly wide margin.  Analysis using big-Oh notation told us that this would happen, since $O(n) \subset O(n\log n)$.
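The crossover is easy to observe numerically.  The following stand-alone sketch (ours; the class name `GrowthDemo` is arbitrary) prints $15n$ and $2n\log n$ for increasing $n$, showing $2n\log n$ overtaking $15n$ once $n$ is large enough:

  // Prints f1(n) = 15n and f2(n) = 2n log n for growing n.
  // For small n, f1 > f2; once n is large enough, f2 > f1 and the
  // gap keeps widening, as the discussion above predicts.
  public class GrowthDemo {
    public static void main(String[] args) {
      for (int n = 2; n <= (1 << 20); n *= 4) {
        double f1 = 15.0 * n;
        double f2 = 2.0 * n * (Math.log(n) / Math.log(2));  // base-2 log
        System.out.println("n = " + n + "   15n = " + f1
                           + "   2n log n = " + f2);
      }
    }
  }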
In a few cases, we will use asymptotic notation on functions with more than one variable. There seems to be no standard for this, but for our purposes, the following definition is sufficient:
\[ O(f(n_1,\ldots,n_k)) = \left\{ g(n_1,\ldots,n_k) : \text{there exist } c>0 \text{ and } z \text{ such that } g(n_1,\ldots,n_k) \le c\cdot f(n_1,\ldots,n_k) \text{ for all } n_1,\ldots,n_k \text{ with } g(n_1,\ldots,n_k) \ge z \right\} \enspace . \]
This definition captures the situation we really care about: when the arguments $n_1,\ldots,n_k$ make $g$ take on large values.  This definition also agrees with the univariate definition of $O(f(n))$ when $f(n)$ is an increasing function of $n$.  The reader should be warned that, although this works for our purposes, other texts may treat multivariate functions and asymptotic notation differently.
Some of the data structures presented in this book are randomized; they make random choices that are independent of the data being stored in them or the operations being performed on them. For this reason, performing the same set of operations more than once using these structures could result in different running times. When analyzing these data structures we are interested in their average or expected running times.
Formally, the running time of an operation on a randomized data structure
is a random variable, and we want to study its expected value.
For a discrete random variable $X$ taking on values in some countable universe $U$, the expected value of $X$, denoted by $\mathrm{E}[X]$, is given by the formula
\[ \mathrm{E}[X] = \sum_{x\in U} x\cdot\Pr\{X=x\} \enspace . \]
Here, $\Pr\{\mathcal{E}\}$ denotes the probability that the event $\mathcal{E}$ occurs.  In all of the examples in this book, these probabilities are only with respect to the random choices made by the randomized data structure;  there is no assumption that the data stored in the structure, nor the sequence of operations performed on the data structure, is random.
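As a small worked example of this formula (ours, not from the text), let $X$ be the value of one roll of a fair six-sided die, so $U = \{1,\ldots,6\}$ and each value has probability $1/6$:
\[ \mathrm{E}[X] = \sum_{x=1}^{6} x\cdot\frac{1}{6} = \frac{21}{6} = 3.5 \enspace . \]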
One of the most important properties of expected values is linearity of expectation.
For any two random variables $X$ and $Y$,
\[ \mathrm{E}[X+Y] = \mathrm{E}[X] + \mathrm{E}[Y] \enspace . \]
More generally, for any random variables $X_1,\ldots,X_k$,
\[ \mathrm{E}\left[\sum_{i=1}^k X_i\right] = \sum_{i=1}^k \mathrm{E}[X_i] \enspace . \]
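For instance (continuing the worked die example above, which is ours), if $X$ and $Y$ are the values of two rolls of a fair die, then $\mathrm{E}[X+Y] = \mathrm{E}[X] + \mathrm{E}[Y] = 3.5 + 3.5 = 7$.  Notably, linearity does not require independence: the same calculation holds even if the second roll simply copies the first.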
A useful trick, that we will use repeatedly, is defining indicator random variables.  These binary variables are useful when we want to count something and are best illustrated by an example.  Suppose we toss a fair coin $k$ times and we want to know the expected number of times the coin turns up as heads.  Intuitively, we know the answer is $k/2$, but if we try to prove it using the definition of expected value, we get
\[
\begin{aligned}
  \mathrm{E}[X] &= \sum_{i=0}^{k} i\Pr\{X=i\} \\
    &= \sum_{i=0}^{k} i\binom{k}{i}/2^k \\
    &= k\sum_{i=1}^{k} \binom{k-1}{i-1}/2^k \\
    &= k/2 \enspace .
\end{aligned}
\]
This requires that we know enough to calculate that $\Pr\{X=i\} = \binom{k}{i}/2^k$, and that we know the binomial identities $i\binom{k}{i} = k\binom{k-1}{i-1}$ and $\sum_{i=0}^{k}\binom{k}{i} = 2^k$.
Using indicator variables and linearity of expectation makes things much easier.  For each $i\in\{1,\ldots,k\}$, define the indicator random variable
\[
I_i = \begin{cases}
  1 & \text{if the $i$th coin toss is heads} \\
  0 & \text{otherwise} \enspace .
\end{cases}
\]
Then $\mathrm{E}[I_i] = (1/2)1 + (1/2)0 = 1/2$.  Now, $X = \sum_{i=1}^k I_i$, so
\[
\begin{aligned}
  \mathrm{E}[X] &= \mathrm{E}\left[\sum_{i=1}^k I_i\right] \\
    &= \sum_{i=1}^k \mathrm{E}[I_i] \\
    &= \sum_{i=1}^k 1/2 \\
    &= k/2 \enspace .
\end{aligned}
\]
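As an optional empirical check of this calculation (the sketch below is ours; the class name `CoinDemo` and the parameter choices are arbitrary), we can simulate tossing a fair coin $k$ times over many trials and compare the average number of heads with $k/2$:

  import java.util.Random;

  // Estimates the expected number of heads in k tosses of a fair coin
  // by averaging over many independent trials; the average should be
  // close to k/2.
  public class CoinDemo {
    public static void main(String[] args) {
      int k = 100, trials = 100000;
      Random rng = new Random();
      long totalHeads = 0;
      for (int t = 0; t < trials; t++)
        for (int i = 0; i < k; i++)
          if (rng.nextBoolean())     // each I_i is 1 with probability 1/2
            totalHeads++;
      System.out.println("average heads = " + (double) totalHeads / trials
                         + ", k/2 = " + k / 2.0);
    }
  }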