Assignment 4: Sort Performance

We have been looking at sorting algorithms and discussing their runtimes. In this assignment, you will empirically verify the relative efficiencies discussed in lecture. You will do this by timing various sorting implementations on arrays of different sizes and tabulating the results. Using the results, you will compare the algorithms' performance and come to conclusions about which versions are best.

Details

Algorithms

You will test the following algorithms:

Simple Quick Sort – Simple Quicksort with A[last] as the pivot (in file Quick.java)
Median of 3 (5) – Median of 3 Quicksort as given in TextMergeQuick.java (base case array size < 5)
Median of 3 (10) – Median of 3 Quicksort as given in TextMergeQuick.java, but with base case array size < 10
Median of 3 (50) – Median of 3 Quicksort as given in TextMergeQuick.java, but with base case array size < 50
Random Pivot – Random Pivot Quicksort with base case array size < 5
Insertion Sort – Insertion Sort (code is available in the Lecture Notes)
Merge Sort – Merge Sort (as given in TextMergeQuick.java)

The code for all algorithms above, except Random Pivot (the fifth in the list), is already completely written. You only have to change the base case value in the Median of 3 sort from a constant to a variable so that you can give it different values during program execution. You must write the Random Pivot Quick Sort yourself and make sure that it works correctly. It is very similar to the Simple Quick Sort, except that you choose the pivot index as a random integer between first and last (inclusive) rather than always using last.
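The random pivot choice described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from Quick.java or TextMergeQuick.java: the class name RandomPivotSketch, the Lomuto-style partition, and the insertion sort used for the base case are all hypothetical choices made to keep the example short.

```java
import java.util.Random;

// Hypothetical sketch; names and the partition scheme are assumptions, not the course code.
class RandomPivotSketch {
    private static final Random RNG = new Random();
    private static final int MIN_SIZE = 5; // ranges with fewer than 5 elements use insertion sort

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int first, int last) {
        if (last - first + 1 < MIN_SIZE) {
            insertionSort(a, first, last);
            return;
        }
        // Pick a pivot index uniformly in [first, last], then move that element to the end
        int pivotIndex = first + RNG.nextInt(last - first + 1);
        swap(a, pivotIndex, last);
        int p = partition(a, first, last);
        sort(a, first, p - 1);
        sort(a, p + 1, last);
    }

    // Lomuto partition using a[last] as the pivot; returns the pivot's final index
    private static int partition(int[] a, int first, int last) {
        int pivot = a[last];
        int i = first;
        for (int j = first; j < last; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, last);
        return i;
    }

    // Sorts the range a[first..last] in place
    private static void insertionSort(int[] a, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= first && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Swapping the randomly chosen element into position last lets the rest of the method reuse a partition step that expects the pivot at A[last], which is why the change from the Simple Quick Sort can be so small.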

User Input

Your program should allow the following to be input from the user:

The size of the arrays to be tested.
The number of trials for each test. The overall time for the test will be the average of the times for each of the trials. For random data, each trial should have different numbers, but the data for a given trial should be the same random data for each algorithm. In other words, consider, for example, an array called A1, algorithms QS1 and QS2, and trials T1 and T2. If A1 is filled with random numbers for QS1 in trial T1, then those same numbers (in the same initial positions) should be used for QS2 in trial T1. However, different random numbers should be generated for trial T2, again using the same numbers for both QS1 and QS2.
The name of the file your results will be output to.
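One possible way to meet the "same data for every algorithm within a trial" requirement is to generate a master array once per trial and give each algorithm its own copy. The sketch below assumes hypothetical names (TrialDataSketch, randomData) and uses Arrays.sort as a stand-in for the algorithms under test.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: one master array per trial, copied for each algorithm.
class TrialDataSketch {
    // Returns a fresh array of random ints; each call draws new numbers from rng
    static int[] randomData(int size, Random rng) {
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = rng.nextInt();
        }
        return data;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int size = 8;
        int trials = 2;
        for (int t = 1; t <= trials; t++) {
            int[] master = randomData(size, rng);        // new numbers for each trial
            int[] forQS1 = Arrays.copyOf(master, size);  // QS1 sorts this copy
            int[] forQS2 = Arrays.copyOf(master, size);  // QS2 starts from identical data
            System.out.println("Trial " + t + ": " + Arrays.toString(master));
            // Stand-ins for the real sorts; both see the same initial ordering
            Arrays.sort(forQS1);
            Arrays.sort(forQS2);
        }
    }
}
```

Copying matters because the sorts work in place: without a copy, the second algorithm would receive data that the first one has already sorted.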

Part of your program may be graded by a program. The prompts of your program must be identical to those shown below: (user input appears in bold and underlined)

Enter array size: 25000
Enter number of trials: 10
Enter file name: test25k.txt
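The three prompts above could be produced with a Scanner, as in the following sketch. The class PromptSketch and its Settings holder are hypothetical names; only the prompt strings are taken from this handout.

```java
import java.util.Scanner;

// Hypothetical sketch of the three prompts; only the prompt text comes from the handout.
class PromptSketch {
    // Simple holder for the values entered by the user
    static class Settings {
        final int size;
        final int trials;
        final String fileName;

        Settings(int size, int trials, String fileName) {
            this.size = size;
            this.trials = trials;
            this.fileName = fileName;
        }
    }

    static Settings read(Scanner in) {
        // print (not println) keeps the user's answer on the same line as the prompt
        System.out.print("Enter array size: ");
        int size = Integer.parseInt(in.nextLine().trim());
        System.out.print("Enter number of trials: ");
        int trials = Integer.parseInt(in.nextLine().trim());
        System.out.print("Enter file name: ");
        String fileName = in.nextLine().trim();
        return new Settings(size, trials, fileName);
    }

    public static void main(String[] args) {
        Settings s = read(new Scanner(System.in));
        System.out.println(s.size + " " + s.trials + " " + s.fileName);
    }
}
```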

Data

For each algorithm, your program should iterate through three initial setups of the data:

Random – fill the arrays with random integers. To make your assessments more accurate, each of your algorithms should utilize the same random data, as mentioned above. This can be accomplished in several ways but you will lose credit if the data is not the same.
Sorted – fill the arrays with successive integers starting at 1.
Reverse Sorted – fill the arrays with decreasing integers starting at the array length.
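The three setups can be sketched as small fill methods. The names below (FillSketch, fillRandom, fillSorted, fillReverse) are hypothetical. The random version takes a Random object so that the caller controls how the data is reproduced, for example by constructing two Random objects with the same seed, which yields the same sequence of values.

```java
import java.util.Random;

// Hypothetical helpers for the three initial data setups.
class FillSketch {
    // Random: arbitrary ints drawn from the supplied generator
    static void fillRandom(int[] a, Random rng) {
        for (int i = 0; i < a.length; i++) {
            a[i] = rng.nextInt();
        }
    }

    // Sorted: 1, 2, 3, ..., a.length
    static void fillSorted(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1;
        }
    }

    // Reverse Sorted: a.length, a.length - 1, ..., 1
    static void fillReverse(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a.length - i;
        }
    }
}
```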

Timing

You will need to time your algorithms. Use the method System.nanoTime(). This method returns the current value of the JVM's high-resolution timer in nanoseconds, measured from an arbitrary origin, so only the difference between two readings is meaningful (a nanosecond is 10^-9 seconds, i.e. one second is one billion nanoseconds). Timing a trial follows this pattern:

long start = System.nanoTime();
// Execute the sorting method here (array should ALREADY be filled before timing starts)
long finish = System.nanoTime();
long delta_ns = finish - start;

Since you are performing multiple trials, for a given algorithm you will add the times for the trials together, then divide by the number of trials to get the average time per trial. Divide by 1 billion to get your final results in seconds rather than nanoseconds.
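The averaging step can be sketched as below. TimingSketch and averageSeconds are hypothetical names, and Arrays.sort is only a stand-in for whichever algorithm is being timed.

```java
import java.util.Arrays;

// Hypothetical sketch of accumulating trial times and converting to seconds.
class TimingSketch {
    // Average time per trial in seconds, given the summed nanoseconds of all trials
    static double averageSeconds(long totalNanos, int trials) {
        return (totalNanos / (double) trials) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        int trials = 10;
        long total = 0;
        for (int t = 0; t < trials; t++) {
            // Fill the array BEFORE the clock starts (reverse sorted data here)
            int[] data = new int[25000];
            for (int i = 0; i < data.length; i++) {
                data[i] = data.length - i;
            }
            long start = System.nanoTime();
            Arrays.sort(data); // stand-in for the sort under test
            long finish = System.nanoTime();
            total += finish - start;
        }
        System.out.println("Average Time per trial: " + averageSeconds(total, trials) + " sec.");
    }
}
```

Keeping the running total as a long and converting to double only at the end avoids losing precision to integer division when the average is computed.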

Output

For each of the variations in the run, your program must output its results to the file named by the user. Note that since you have 3 data setups and 7 algorithms, each overall execution of your program should produce 21 different results. Each result should look like the following example:

Algorithm: Simple Quick Sort
Array Size: 25000
Order: Random
Number of trials: 10
Average Time per trial: 0.0063856 sec.

When displaying the algorithm name and order, use the bolded names exactly as shown above. So, "Simple Quick Sort" is valid, but these are not valid: "simple quick sort", "SIMPLE QUICK SORT", "Simple QuickSort", "Simple QS", "Simpl Quik Sort".
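A result block in the layout shown above could be built as in this sketch. The class and method names are hypothetical, and printing the time with seven decimal places is an assumption based on the single example above, since this handout does not state a required precision.

```java
import java.util.Locale;

// Hypothetical sketch; the seven-decimal precision is an assumption, not a stated requirement.
class ResultFormatSketch {
    static String format(String algorithm, int size, String order, int trials, double avgSeconds) {
        return "Algorithm: " + algorithm + "\n"
             + "Array Size: " + size + "\n"
             + "Order: " + order + "\n"
             + "Number of trials: " + trials + "\n"
             // Locale.ROOT keeps the decimal point a '.' regardless of system settings
             + String.format(Locale.ROOT, "Average Time per trial: %.7f sec.", avgSeconds) + "\n";
    }

    public static void main(String[] args) {
        System.out.print(format("Simple Quick Sort", 25000, "Random", 10, 0.0063856));
    }
}
```

The returned string can then be written to the user's file, for example with a java.io.PrintWriter opened on the file name that was entered.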

Trace Output Mode

In order for your TA to be able to test the correctness of your sorting algorithms and main program logic, you are required to have a Trace Output Mode for your program. This mode should be automatically set when the Array Size is <= 20. In Trace Output Mode, your program should output all of the following to standard output (i.e. the display) for each trial of each algorithm:

Algorithm being used
Array Size
Data configuration (sorted, reverse sorted, random)
Initial data in array prior to sorting
Data in array after sorting
Time (in nanoseconds) required for the sort

The evaluation of the correctness of your algorithms and data processing will be heavily based on the Trace Output Mode for your program. If you do not implement this or it does not work correctly, you will likely lose a lot of credit. Also, be sure that Trace Output Mode is off for arrays larger than 20.
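The switch into Trace Output Mode and the per-trial report can be sketched as follows. This handout lists what must be shown but not the exact layout, so the labels and the names TraceSketch, traceEnabled, and trace are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; the label text and layout of the trace are assumptions.
class TraceSketch {
    static final int TRACE_LIMIT = 20;

    // Trace mode is on for arrays of size 20 or less, and off otherwise
    static boolean traceEnabled(int size) {
        return size <= TRACE_LIMIT;
    }

    // Builds the report for one trial; 'before' must be a copy taken prior to sorting
    static String trace(String algorithm, String order, int[] before, int[] after, long nanos) {
        return "Algorithm: " + algorithm + "\n"
             + "Array Size: " + before.length + "\n"
             + "Order: " + order + "\n"
             + "Initial data: " + Arrays.toString(before) + "\n"
             + "Sorted data: " + Arrays.toString(after) + "\n"
             + "Time: " + nanos + " ns\n";
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 2};
        int[] before = Arrays.copyOf(data, data.length); // keep the initial ordering
        long start = System.nanoTime();
        Arrays.sort(data); // stand-in for the sort under test
        long elapsed = System.nanoTime() - start;
        if (traceEnabled(data.length)) {
            System.out.print(trace("Insertion Sort", "Random", before, data, elapsed));
        }
    }
}
```

Note that the printing happens after the second call to System.nanoTime(), so the trace output itself is never included in the measured time.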

Experiment

The goal is to see how the runtimes of the algorithms change as the size of the arrays increases. However, actual runtimes will vary based on various factors, including processor speed, memory capacity/speed, and how busy the computer is. Follow the guidelines below for the array sizes. Use 10 trials for all of your runs.

Size = 25000, Filename = test25k.txt
Size = 50000, Filename = test50k.txt
Size = 100000, Filename = test100k.txt
Note: Run the Simple Quick Sort and Insertion Sort only for the first 3 sizes above. Even at these sizes the algorithms may take a while to complete, and you will have to increase the stack size of the JRE to accommodate the execution (see below). For the sizes below, run only the other 5 algorithms, so each of those files will contain only 15 results.
Size = 200000, Filename = test200k.txt
Size = 400000, Filename = test400k.txt
Size = 800000, Filename = test800k.txt
Size = 1600000, Filename = test1600k.txt

To increase the stack size when running a Java program, include the switch/argument/flag: "-Xss10m", e.g.:

java -Xss10m Assign4

After running that command, the user should be prompted as shown below: (user input appears in bold and underlined)

Enter array size: 25000
Enter number of trials: 10
Enter file name: test25k.txt

Given that input, your program should produce output similar to test25k.txt. Note: the times in that sample file were randomly generated and do not indicate the actual runtimes of the algorithms on this input.

Results

After you have finished all of your runs, tabulate your results in a spreadsheet. Use a different worksheet for each initial ordering (Random, Sorted, Reverse Sorted). In each worksheet, make a table of your results with the array sizes as the columns and the algorithms as the rows. Also make a graph for each of your tables so that you can visualize the growth of the runtime for each algorithm. You must also write a brief summary / discussion of your results. Based on your tables, indicate the best algorithm for each of the initial data orderings. Based on your overall results (for all data orderings), speculate on what you think the best of the 7 algorithms is for general purpose use. Your write-up should be well written and justified by your results. Your write-up can be embedded in your spreadsheet or submitted as a separate document (ex: a Word document).

Final Notes & Hints

You will only be sorting primitive arrays of integers. Do not create ArrayLists to sort (although you may use ArrayLists in other ways if you want).
For help with generating random integers, use the nextInt method provided by the Random class. Note that there are two nextInt methods; consider which is more appropriate here.
To make your results more accurate, do not run anything else on your machine while you are doing your runs. Don't worry about system processes that are running, just make sure you don't run any other applications.
To make your results consistent, do all of your runs on the same machine under the same (if possible) circumstances.
Note that for smaller arrays and in some cases even for larger arrays, the time for a given trial may be very small, perhaps even negligible.
Be sure to time only the actual sorting procedure. Do not time loading the data into the array or any I/O (especially not I/O since this is very slow and will skew the timing greatly).
As we discussed in class, in some cases a recursive algorithm makes so many calls that it uses up all of the memory in the runtime stack, causing the JRE to crash. To prevent this problem, we can invoke the Java interpreter with a flag that sets the size of the runtime stack. This can be done in the following way: java -Xss<size> MainClassName, where <size> is the desired stack size. You may have to experiment with the value for <size> to avoid getting a StackOverflowError, but using 10m (i.e. -Xss10m) should work for all of the recursive algorithms above. If you think about how these algorithms execute, you will see that we only have to worry about the stack size for the Simple Quick Sort algorithm in the cases of the sorted and reverse sorted data.
Part of your program may be auto-graded. Therefore, it is very important that your input and output follow exactly as shown above.

Submission and Grading:

Complete the Assignment Information Sheet.

Make sure you submit all source code files, your spreadsheet of results, and your summary/discussion in a single .zip file to get full credit. Name your main program Assign4.java. Your spreadsheet must be saved in Excel (.xls, .xlsx) or OpenDocument (.ods) format. If you submit your summary/discussion in a separate file, it must be in plain text (.txt), Word (.doc, .docx), or OpenDocument (.odt) format.

Submit your final zip file to CourseWeb in the Assignment 4 folder.

The grading rubric can be found here: Rubric (doc).

The assignment is due Monday, November 16 by 11:59 pm. As with all programming assignments, you have unlimited uploads before the deadline, so you may submit early and later upload a revised version without penalty. The last submission uploaded will be the one graded.

If you would like ungraded feedback on a programming assignment, you may send an email to your TA or the instructor and ask for feedback; please send your code as well. If your question is basically "Are there any problems with my program?" or "Can you check my code?" tell us what you've already done to test your program; provide the output from the test runs of your program.

For more advice on submitting your assignment, see the Programming Assignments section of the Tips for Success page.