1.7 Discussion and Exercises

The List, USet, and SSet interfaces described in Section 1.1 are influenced by the Java Collections Framework [54]. These are essentially simplified versions of the List, Set/Map, and SortedSet/SortedMap interfaces found in the Java Collections Framework. Indeed, the accompanying source code includes wrapper classes for making USet and SSet implementations into Set, Map, SortedSet, and SortedMap implementations.

For a superb (and free) treatment of the mathematics discussed in this chapter, including asymptotic notation, logarithms, factorials, Stirling's approximation, basic probability, and lots more, see the textbook by Leyman, Leighton, and Meyer [50]. For a gentle calculus text that includes formal definitions of exponentials and logarithms, see the (freely available) classic text by Thompson [71].

For more information on basic probability, especially as it relates to computer science, see the textbook by Ross [65]. Another good reference, that covers both asymptotic notation and probability, is the textbook by Graham, Knuth, and Patashnik [37].

Readers wanting to brush up on their Java programming can find many Java tutorials online [56].

Exercise 1..1 This exercise is designed to help get the reader familiar with choosing the right data structure for the right problem. If implemented, the parts of this exercise should be done by making use of an implementation of the relevant interface (Stack, Queue, Deque, USet, or SSet) provided by the Java Collections Framework.

Solve the following problems by reading a text file one line at a time and performing operations on each line in the appropriate data structure(s). Your implementations should be fast enough that even files containing a million lines can be processed in a few seconds.

Read the input one line at a time and then write the lines out in reverse order, so that the last input line is printed first, then the second last input line, and so on.
Read the first 50 lines of input and then write them out in reverse order. Read the next 50 lines and then write them out in reverse order. Do this until there are no more lines left to read, at which point any remaining lines should be output in reverse order.
In other words, your output will start with the 50th line, then the 49th, then the 48th, and so on down to the first line. This will be followed by the 100th line, followed by the 99th, and so on down to the 51st line. And so on.
Your code should never have to store more than 50 lines at any given time.
Read the input one line at a time. At any point after reading the first 42 lines, if some line is blank (i.e., a string of length 0) then output the line that occured 42 lines prior to that one. For example, if Line 242 is blank, then your program should output line 200. This program should be implemented so that it never stores more than 43 lines of the input at any given time.
Read the input one line at a time and write each line to the output if it is not a duplicate of some previous input line. Take special care so that a file with a lot of duplicate lines does not use more memory than what is required for the number of unique lines.
Read the input one line at a time and write each line to the output only if you have already read this line before. (The end result is that you remove the first occurrence of each line.) Take special care so that a file with a lot of duplicate lines does not use more memory than what is required for the number of unique lines.
Read the entire input one line at a time. Then output all lines sorted by length, with the shortest lines first. In the case where two lines have the same length, resolve their order using the usual ``sorted order.'' Duplicate lines should be printed only once.
Do the same as the previous question except that duplicate lines should be printed the same number of times that they appear in the input.
Read the entire input one line at a time and then output the even numbered lines (starting with the first line, line 0) followed by the odd-numbered lines.
Read the entire input one line at a time and randomly permute the lines before outputting them. To be clear: You should not modify the contents of any line. Instead, the same collection of lines should be printed, but in a random order.