CS 1501

CS 1501 Summer 2003

Practice Questions for Midterm Exam

Here are some example questions that may help you study for the midterm exam. Try to answer the questions fully before looking at the answers. Although these questions indicate some of the material that may be on the exam, they are by no means comprehensive. Remember to also study EVERYTHING up to the last class before the exam on the Syllabus and Class Schedule, as well as everything pertaining to the first 3 assignments.

Fill in the Blanks Complete the statements below with the MOST APPROPRIATE words/phrases.

a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ____________________ bit comparisons, while the AVERAGE CASE search time requires ________________ bit comparisons.

b) Two variations of QuickSort designed to minimize the probability of the worst case are __________________ and ___________________.

c) The GradeSchool integer multiplication algorithm requires Theta______________ time for N-bit integers.

d) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is _________________.

e) Given a substitution cipher using an alphabet with S values in it, there are ______________________ possible keys that can be used.

f) An RSA code can be broken in a straightforward way by __________________________________________________.

g) To avoid being ambiguous, variable-length codeword compression algorithms must satisfy the ____________________________________.

True/False Indicate whether each of the following statements is True or False. For False answers, explain why it is false.

a) The brute-force string matching algorithm requires Theta(N²) character comparisons in the worst case.

b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes with a single child.

Short Answers and Calculations

1) You have two programs, Program A, which runs in time k₁N and Program B, which runs in time k₂log₂(N) for some constants k₁ and k₂. Assume that for a problem of size N_o, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.

2) Define what it means to have a collision in a hash table, and why we cannot usually prevent them from occurring.

3) Consider InsertionSort of an array. Explain what causes its worst case performance and derive the Theta run-time (in detail) of the algorithm in this case.

4) Consider the recursive pseudo-code function below:

void foo(int A[], int low, int high)
{
    if (low < high)
    {
        for (int i = low; i <= high; i++)
            constant_time_process(A[i]);

        for (int i = high; i >= low; i--)
            constant_time_process(A[i]);

        int mid = (low + high)/2;
        foo(A, low, mid);
        foo(A, mid, high);

        int index = randvalue(low, high);
        if (index < mid)
            foo(A, low, mid);
        else
            foo(A, mid, high);
    }
}

Give and justify the recurrence for this function, assuming that the original array size is N (from low to high).

5) I would like to send a message to a friend such that only the friend can read it and such that the friend knows that the message must be from me. Describe how I can do this in a fairly simple way using RSA, and how my friend will be able to read the message and know that it was from me.

6) Assume you want to save time and (you hope) space with your Huffman code by using a standardized tree, rather than one tailored to each file. In this way you can incorporate the frequencies directly into your algorithm as constants, rather than having to put them in the front of your encoded file. Furthermore, you do not have to count the frequencies in each file, so you can encode in one pass rather than two. It seems like this is a great idea, but maybe it isn't. Explain what, if anything, may be bad about your variation.

Coding

1) Assume that you are using linear probing in a hash table of strings. Function h(x) is defined as we discussed it in class (and you do NOT have to write it). Complete the code for the Find function below, which will return 1 if the item is present and 0 otherwise. Be sure to handle ALL possibilities.

int Find(string item, string table[])   // Assume table size is M.  Also
{                                       // assume that all table locations
    int index = h(item);                // were set to the empty string ("")
                                        // prior to any Inserts and that no
                                        // Deletes are allowed.
    // fill in code
}

2) Consider the code for QuickSort below:

void quicksort(itemType a[], int L, int R)
{
  int i, j; itemType v;
  if (R > L)
  {
     v = a[R]; i = L-1; j = R;
     for (;;)
     {
         while (a[++i] < v) ;
         while (a[--j] > v) ;
         if (i >= j) break;
         swap(a, i, j);
     }
     swap(a, i, R);
     quicksort(a, L, i-1);
     quicksort(a, i+1, R);
  }
}

Assume the array passed into the function has data in locations L to R. Explain in detail why the code as written will likely cause a run-time error, and clearly explain 2 ways that it can be fixed so that it will always run properly. Be specific and give an example that will cause it to crash as shown.

SOLUTIONS

Fill in the Blanks Complete the statements below with the MOST APPROPRIATE words/phrases.

a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ______b______________ bit comparisons, while the AVERAGE CASE search time requires ___Theta(lgN)___ bit comparisons.

b) Two variations of QuickSort designed to minimize the probability of the worst case are __median of three_ and __random pivot_____.

c) The GradeSchool integer multiplication algorithm requires Theta____N²______ time for N-bit integers.

d) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is ___(C+1)/M_______.

e) Given a substitution cipher using an alphabet with S values in it, there are _______S!_____________ possible keys that can be used.

f) An RSA code can be broken in a straightforward way by _______factoring N___________________.

g) To avoid being ambiguous, variable-length codeword compression algorithms must satisfy the ____prefix property______.

True/False Indicate whether each of the following statements is True or False. For False answers, explain why it is false.

a) The brute-force string matching algorithm requires Theta(N²) character comparisons in the worst case. FALSE – this algorithm requires Theta(NM) character comparisons in the worst case, where M is the pattern length.

b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes with a single child. TRUE

Short Answers and Calculations

1) For Program A, since the time is linear, we know that if we double the problem size the run-time should also double. Thus we can say 2X seconds for Program A. For Program B it is more complicated, since Program B runs in logarithmic time. However, we can still solve this with some math:

We know: k₂log₂(N_o) = X

And we want to solve k₂log₂(2N_o) = ?

Remembering properties of logarithms, we can rewrite the problem as follows:

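k₂log₂(2N_o) = k₂[log₂(2) + log₂(N_o)] = k₂ + k₂log₂(N_o) = k₂ + X

Thus Program B takes approximately X + k₂ seconds, that is, only a constant amount more than before.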
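As a quick numeric check of the reasoning above, here is a minimal C++ sketch. The function names (timeA, timeB, extraA, extraB) are made up for illustration, and the constants passed in are arbitrary; nothing here comes from the course code.

```cpp
#include <cassert>
#include <cmath>

// Running-time models from the question; k1 and k2 are arbitrary constants.
double timeA(double k1, double n) { return k1 * n; }
double timeB(double k2, double n) { return k2 * std::log2(n); }

// Extra time Program A needs when the problem size doubles from n0.
double extraA(double k1, double n0) {
    assert(n0 > 0);
    return timeA(k1, 2 * n0) - timeA(k1, n0);   // equals k1*n0, i.e. X again
}

// Extra time Program B needs when the problem size doubles from n0.
double extraB(double k2, double n0) {
    assert(n0 > 1);
    return timeB(k2, 2 * n0) - timeB(k2, n0);   // equals k2*log2(2) = k2
}
```

For instance, with k2 = 3 and N_o = 1024, Program B goes from 30 time units to 33: the increase is k₂, no matter how large N_o is.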

2) Define what it means to have a collision in a hash table, and why we cannot usually prevent them from occurring.

A collision occurs in a hash table if, for two keys, x₁ and x₂, h(x₁) = h(x₂), with x₁ != x₂. Collisions cannot usually be prevented, since, in most instances, the key space being used (all possible keys) is greater in size than the table size, and, by the Pigeonhole Principle, at least two distinct keys must map to the same table location.
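The Pigeonhole argument can be illustrated with a small C++ sketch. The hash function h below (key mod table size) is a stand-in chosen only for this example, not the h(x) from class, and hasCollision is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical hash function for illustration only: key mod table size.
int h(int key, int M) { return ((key % M) + M) % M; }

// Returns true if at least two of the given keys hash to the same slot.
bool hasCollision(const std::vector<int>& keys, int M) {
    assert(M > 0);
    std::vector<bool> used(M, false);
    for (int key : keys) {
        int slot = h(key, M);
        if (used[slot]) return true;   // a second key landed in an occupied slot
        used[slot] = true;
    }
    return false;
}
```

With M = 7, any 8 distinct keys must produce a collision, and even the two keys 3 and 10 collide because both map to slot 3.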

3) The worst case for InsertionSort of an array occurs when the data is initially reverse-sorted. This produces the worst case because each item, when shifted from the "unsorted" (right) part of the array into the "sorted" (left) part of the array, must be moved all the way to the first location, resulting in the maximum number of comparisons and shifts. The Theta run-time in this case is derived as follows (assuming items start in location 1 of the array) for comparisons (the same analysis holds for shifts):

For Item 2 we need 1 comparison to move it to location 1
For Item 3 we need 2 comparisons to move it to location 1
For Item 4 we need 3 comparisons to move it to location 1
…

For Item N we need (N-1) comparisons to move it to location 1

Summing the comparisons we get 1 + 2 + … + (N-1) which we know to equal (N-1)(N)/2, which is Theta(N²).
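The count above can be checked empirically. The following is a minimal sketch of InsertionSort instrumented to count key comparisons; the function names are illustrative, and the array here starts at index 0 rather than location 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts a in ascending order and returns the number of key comparisons made.
long long insertionSortComparisons(std::vector<int>& a) {
    long long comparisons = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int item = a[i];
        std::size_t j = i;
        while (j > 0) {
            ++comparisons;                 // compare item with a[j-1]
            if (a[j - 1] <= item) break;   // item has found its place
            a[j] = a[j - 1];               // shift the larger value right
            --j;
        }
        a[j] = item;
    }
    return comparisons;
}

// Builds the reverse-sorted array n, n-1, ..., 1 (the worst-case input).
std::vector<int> reverseSorted(int n) {
    assert(n >= 0);
    std::vector<int> a(n);
    for (int i = 0; i < n; ++i) a[i] = n - i;
    return a;
}
```

For N = 10 reverse-sorted items this counts 45 comparisons, which matches (N-1)(N)/2 = 9 * 10 / 2. An already sorted array of 4 items needs only 3.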

4) Given the initial array size of N, we see the following: Each for loop will execute N times, requiring constant time for each iteration. Thus, the for loops will require Theta(N) time. There are 3 recursive calls that execute as a result of the first call, and each of those is on an array of half the original size. Thus we get the overall recurrence T(N) = 3T(N/2) + Theta(N).
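To get a feel for how this recurrence grows, the sketch below evaluates it directly. It rests on assumptions that are not in the question: the Theta(N) term is taken to be exactly 2N (one unit per iteration of each loop), T(1) = 0, and N is a power of 2. The name recurrenceCost is hypothetical.

```cpp
#include <cassert>

// Evaluates T(N) = 3T(N/2) + 2N with T(1) = 0 (assumed base case and constants).
long long recurrenceCost(long long n) {
    assert(n >= 1);
    if (n == 1) return 0;                       // low == high: no work is done
    return 3 * recurrenceCost(n / 2) + 2 * n;   // three half-size calls plus two loops
}
```

Under these assumptions T(2) = 4, T(4) = 20 and T(8) = 76, which fits the closed form 4(3^k - 2^k) for N = 2^k. The 3^k term dominates, so the cost grows like N^(log₂ 3), roughly N^1.585, which is faster than N lg N but slower than N².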

5) To do this in a simple way, I do two steps before sending the message:

Step 1) DECRYPT the message using MY PRIVATE RSA key. I could alternatively use a digital signature on the message and encrypt the signature for a similar result.

Step 2) ENCRYPT the result from 1) above using my FRIEND'S PUBLIC RSA key.

I then send the result from 2) to my friend.

Upon receiving the message, my friend reverses the process:

Step 1) DECRYPT the received message using my FRIEND'S PRIVATE RSA key.

Step 2) ENCRYPT the result from 1) using MY PUBLIC RSA key (or encrypt the signature and compare as discussed in class).

Now the only way a non-garbage message will result is if the original message was DECRYPTED using my private key, so my friend knows that the message must have been from me (he must trust that the key used was indeed my key).
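The four steps can be traced with toy numbers. The sketch below is for illustration only: the key pairs are tiny, made-up examples (my key from primes 61 and 53, my friend's from 89 and 97), real RSA uses far larger keys, and the function names are hypothetical. It also assumes my modulus is no larger than my friend's, so that the result of Step 1 fits into Step 2.

```cpp
#include <cassert>
#include <cstdint>

// Modular exponentiation: (base^exp) mod m, by repeated squaring.
std::uint64_t modPow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// A toy RSA key pair (tiny primes, illustration only).
struct RsaKey {
    std::uint64_t n;  // modulus
    std::uint64_t e;  // public exponent
    std::uint64_t d;  // private exponent
};

// Assumed example keys: 61*53 = 3233 and 89*97 = 8633.
const RsaKey MY_KEY     = {3233, 17, 2753};
const RsaKey FRIEND_KEY = {8633, 5, 5069};

// Sender: DECRYPT with my private key, then ENCRYPT with my friend's public key.
std::uint64_t sendMessage(std::uint64_t message) {
    assert(message < MY_KEY.n && MY_KEY.n <= FRIEND_KEY.n);
    std::uint64_t signedMsg = modPow(message, MY_KEY.d, MY_KEY.n);
    return modPow(signedMsg, FRIEND_KEY.e, FRIEND_KEY.n);
}

// Receiver: DECRYPT with my friend's private key, then ENCRYPT with my public key.
std::uint64_t receiveMessage(std::uint64_t cipher) {
    std::uint64_t signedMsg = modPow(cipher, FRIEND_KEY.d, FRIEND_KEY.n);
    return modPow(signedMsg, MY_KEY.e, MY_KEY.n);
}
```

For any message value m smaller than 3233, receiveMessage(sendMessage(m)) gives back m. Someone without my private exponent cannot produce a value that survives the receiver's second step, and someone without my friend's private exponent cannot undo the outer layer.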

6) A standardized tree will in fact save you both time and space AS LONG AS THE FILES YOU COMPRESS HAVE THE SAME FREQUENCIES AS THOSE USED IN YOUR TREE. On the other hand, if, for a given file, the frequencies do NOT match those of your tree, you could get poor compression and in fact even expansion of your file. This is clearly not desirable. Since different types of files tend to have very different character frequencies (e.g., a C++ file vs. a novel vs. some raw numeric data), this variation in general would not be a good one to use.
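A tiny example makes the expansion concrete. The sketch below assumes a made-up four-symbol alphabet {a, b, c, d} and a standardized prefix code a = 0, b = 10, c = 110, d = 111, tuned for files in which 'a' dominates. Neither the alphabet nor the code comes from the course material, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Code lengths of the assumed standardized prefix code:
// a = 0, b = 10, c = 110, d = 111.
int codeLength(char symbol) {
    switch (symbol) {
        case 'a': return 1;
        case 'b': return 2;
        case 'c': return 3;
        case 'd': return 3;
    }
    assert(false && "symbol outside the assumed alphabet");
    return 0;
}

// Total bits needed to encode text with the standardized code.
int standardizedBits(const std::string& text) {
    int bits = 0;
    for (char symbol : text) bits += codeLength(symbol);
    return bits;
}

// Bits for a plain fixed-length encoding: 2 bits per symbol for 4 symbols.
int fixedLengthBits(const std::string& text) {
    return 2 * static_cast<int>(text.size());
}
```

The file "aaaabbcd" matches the assumed frequencies and shrinks from 16 bits to 14. The file "ccccdddd" does not match and grows from 16 bits to 24.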

Coding

int Find(string item, string table[])   // Assume table size is M.  Also
{                                       // assume that all table locations
    int index = h(item);                // were set to the empty string ("")
                                        // prior to any Inserts and that no
                                        // Deletes are allowed.
    for (int i = 0; i < M; i++)
    {
        int curr = (index + i) % M;
        if (table[curr] == item)        // found
            return 1;
        else if (table[curr] == "")     // empty spot – not found
            return 0;
    }
    return 0;   // cycled through all locations – not found
}

2) The loop while (a[--j] > v) ; may go past the left end of the array, since it does no bounds checking. For example, if the data in the array is initially reverse sorted, the pivot will be the smallest value in the array and the loop will not terminate within the bounds of the array. To fix this, we can place a sentinel into location a[0] that is guaranteed to stop the loop (assuming the data starts at location 1, so that a[0] is free), for example a[0] = 0; if all of the regular values are positive, or a[0] = a[R]; if the values can be anything and we choose the pivot a[R] as the sentinel. Alternatively, we could add another test to the loop:

while (j > L && a[--j] > v) ; // L is the left bound passed to quicksort
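For reference, here is a minimal runnable sketch of the second fix (the extra bounds test). It swaps in std::vector<int> and std::swap for the itemType array and swap(a, i, j) helper of the question, which is a choice made for this example rather than the course's code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Quicksort with the bounds test added to the right-to-left scan,
// so j can never move past the left end of the subarray.
void quicksort(std::vector<int>& a, int L, int R) {
    if (R > L) {
        assert(L >= 0 && R < static_cast<int>(a.size()));
        int v = a[R];          // pivot: rightmost item
        int i = L - 1;
        int j = R;
        for (;;) {
            while (a[++i] < v) ;              // stops at a[R] == v at the latest
            while (j > L && a[--j] > v) ;     // bounds test added here
            if (i >= j) break;
            std::swap(a[i], a[j]);
        }
        std::swap(a[i], a[R]);                // put the pivot in its final place
        quicksort(a, L, i - 1);
        quicksort(a, i + 1, R);
    }
}
```

Calling quicksort(a, 0, 4) on the reverse-sorted array {5, 4, 3, 2, 1}, the input that breaks the original code, now leaves a sorted as {1, 2, 3, 4, 5}. The left-to-right scan needs no extra test because the pivot in a[R] already acts as its sentinel.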