11.3 Discussion and Exercises

Sorting is probably the fundamental algorithmic problem in computer science, and has a long history. Knuth [41] attributes the merge-sort algorithm to von Neumann (1945). Quicksort is due to Hoare [33]. The original heap-sort algorithm is due to Williams [63], but the version presented here (in which the heap is constructed bottom-up in $ O(\ensuremath{\mathtt{n}})$ time) is due to Floyd [23]. Lower-bounds for comparison-based sorting appear to be folklore. The following table summarizes the performance of these comparison-based algorithms:

  comparisons  in-place  
Merge-sort $ \ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}$     worst-case No
Quicksort $ 1.38\ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}$   $ {}+ O(\ensuremath{\mathtt{n}})$  expected Yes
Heap-sort $ 2\ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}$   $ {}+ O(\ensuremath{\mathtt{n}})$  worst-case Yes

Each of these comparison-based algorithms has advantages and disadvantages. Merge-sort does the fewest comparisons and does not rely on randomization. Unfortunately, it uses an auxilliary array during its merge phase. Allocating this array can be expensive and is a potential point of failure if memory is limited. Quicksort is an in-place algorithm and is a close second in terms of the number of comparisons, but is randomized so this running time is not always guaranteed. Heap-sort does the most comparisons, but it is in-place and deterministic.

There is one setting in which merge-sort is a clear-winner; this occurs when sorting a linked-list. In this case, the auxiliary array is not needed; two sorted linked lists are very easily merged into a single sorted linked-list by pointer manipulations.

The counting-sort and radix-sort algorithms described here are due to Seward [57, Section 2.4.6]. However, variants of radix-sort have been used since the 1920's to sort punch cards using punched card sorting machines. These machines can sort a stack of cards into two piles based on the existence (or not) of a hole in a specific location on the card. Repeating this process for different hole locations gives an implementation of radix-sort.

Exercise 11..1   Implement a version of the merge-sort algorithm that sorts a $ \mathtt{DLList}$ without using an auxiliary array.

Exercise 11..2   Analyze the expected number of comparisons done by Quicksort a little more carefully than the proof of Theorem 11.3. In particular, show that the expected number of comparisons is $ 2\ensuremath{\mathtt{n}}H_\ensuremath{\mathtt{n}} -\ensuremath{\mathtt{n}} + H_\ensuremath{\mathtt{n}}$.

Exercise 11..3   Find another pair of permutations of $ 1,2,3$ that are not correctly sorted by the comparison-tree in Figure 11.6.

Exercise 11..4   Prove that $ \log \ensuremath{\mathtt{n}}! = \ensuremath{\mathtt{n}}\log \ensuremath{\mathtt{n}}-O(\ensuremath{\mathtt{n}})$.

Exercise 11..5   Prove that a binary tree with $ k$ leaves has height at least $ \log k$.

Exercise 11..6   Prove that, if we pick a random leaf from a binary tree with $ k$ leaves, then the expected height of this leaf is at least $ \log k$. (Hint: Use induction along with the inequality $ (k_1/k)\log k_1 +
(k_2/k)\log k_2) \ge \log k-1$, when $ k_1+k_2=k$.)

Exercise 11..7   Simple floating point numbers are numbers of the form $ x\cdot10^{y}$, where $ 0\le x\le 1$ and $ y$ is an integer and each of $ x$ and $ y$ can be represented by at most $ k$ bits. Describe a version of radix-sort that can be used to sort simple floating point numbers.

Extend your version of radix sort so that it can handle signed values of both $ x$ and $ y$ and implement a version of radix-sort that sorts arrays of type $ \mathtt{float}$.

opendatastructures.org