CS 1501

CS 1501 Summer 2006

Practice Questions for Midterm Exam

Here are some example questions that may help you study for the midterm exam. Try to answer the questions fully before looking at the answers. Though these questions indicate some of the material that may be on the exam, they are by no means comprehensive. Remember to also study EVERYTHING up to the last class before the exam on the Syllabus and Class Schedule, as well as everything pertaining to the first 3 assignments.

Fill in the Blanks

Complete the statements below with the MOST APPROPRIATE words/phrases.

a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ____________________ bit comparisons, while the AVERAGE CASE search time requires ________________ bit comparisons.

b) Delete is a problem with open-addressing hashing because _______________________________________________________________________.

c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is _________________.

d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ____________.

e) If an encoding scheme satisfies the prefix property, it is certain that __________________________________________________________________.

True/False

Indicate whether each of the following statements is True or False. For False answers, explain why it is false.

a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2ⁿ).

b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes with a single child.

c) A good hash function should utilize the entire key.

Short Answers and Calculations

1) You have two programs, Program A, which runs in time k₁N and Program B, which runs in time k₂log₂(N) for some constants k₁ and k₂. Assume that for a problem of size N_o, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.

2) Define what it means to have a collision in a hash table, and why we cannot usually prevent them from occurring.

3) Consider a file containing the following text data:

AAABBBAAB

Trace the LZW encoding process for the file (in the same way done in handout lzw.txt, so each "step" produces a single codeword). Assume that the extended ASCII set will use codewords 0-255. For each step in the encoding, be sure to show all of the information indicated below. Note: The ASCII value for 'A' is 65.

          LONGEST
STEP #    PREFIX MATCHED    CODEWORD OUTPUT    (STRING, CODE) ADDED TO DICTIONARY
------    --------------    ---------------    ----------------------------------

4) Consider Huffman compression and (8 bit) MTF self-organizing list compression. Which would perform better on each of the following files and why? Be specific by giving approximate compression ratios for each algorithm in each case.

a) A file containing 1000 of each character in the alphabet

b) A file containing 1000000 As

5) Consider the mismatched character heuristic of the Boyer-Moore string matching algorithm. For the pattern and text strings shown below, state and justify how many total character comparisons must be done in order to match each pattern within the text string. Justify your answer using the skip array for the pattern.

Text: ABCDXABCDYABCDZABCDE

Pattern: ABCDE

6) Justify in detail how many character comparisons are required to find a string in a DLB in the worst case. Assume that your DLB has N strings, each with a maximum of K characters, and that your alphabet has S possible characters in it.

Coding

1) Assume that you are using linear probing in a hash table of Strings. Function h(x) is defined as we discussed it in class (and you do NOT have to write it). Complete the code for the Find function below, which will return true if the item is present and false otherwise. Be sure to handle ALL possibilities.

String [] table; // instance variable

// other methods not shown

// Note that all table locations are initialized to null prior
// to any Inserts. Assume that no Deletes are allowed.
public boolean Find(String item)
{
    int index = h(item);

    // fill in code
}

SOLUTIONS

Fill in the Blanks

Complete the statements below with the MOST APPROPRIATE words/phrases.

a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ______b______________ bit comparisons, while the AVERAGE CASE search time requires ___Theta(lgN)___ bit comparisons.

b) Delete is a problem with open-addressing hashing because ___if a value within a cluster is deleted, values after it in the cluster may not be found___.

c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is ___(C+1)/M_______.

d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ___N/M_____.

e) If an encoding scheme satisfies the prefix property, it is certain that ____no codeword is a prefix of any other codeword_________________.
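The problem described in answer (b) can be reproduced with a small linear-probing table. The following is only a sketch: the class name, the `naiveDelete` method, and the hash function `h` (built on `String.hashCode()`) are assumptions made for illustration and are not the versions discussed in class. The keys "Aa" and "BB" are used because their `hashCode()` values are equal, so they are guaranteed to form a cluster.

```java
// Illustrative sketch only: h() is a stand-in based on String.hashCode(),
// not the hash function discussed in class.
class LinearProbingDeleteSketch {
    final String[] table;

    LinearProbingDeleteSketch(int size) {
        table = new String[size];
    }

    // Hypothetical hash function: non-negative hashCode reduced to a table index.
    int h(String item) {
        return (item.hashCode() & 0x7fffffff) % table.length;
    }

    // Linear-probing insert: take the first free slot at or after h(item).
    void insert(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) {
                table[curr] = item;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Search that stops at the first null slot it reaches.
    boolean find(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) return false;
            if (table[curr].equals(item)) return true;
        }
        return false;
    }

    // Naive delete: empties the slot, leaving a hole inside the cluster.
    void naiveDelete(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) return;
            if (table[curr].equals(item)) {
                table[curr] = null;
                return;
            }
        }
    }

    public static void main(String[] args) {
        LinearProbingDeleteSketch t = new LinearProbingDeleteSketch(11);
        t.insert("Aa");                    // lands in its home slot
        t.insert("BB");                    // same hash value, so it probes to the next slot
        System.out.println(t.find("BB"));  // true
        t.naiveDelete("Aa");               // leaves a null in front of "BB"
        System.out.println(t.find("BB"));  // false, although "BB" is still stored
    }
}
```

After the naive delete, the search for "BB" stops at the emptied slot and reports "not found", even though "BB" is still in the table.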

True/False

Indicate whether each of the following statements is True or False. For False answers, explain why it is false.

a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2ⁿ). FALSE – the upper bound is n!

b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes with a single child. TRUE

c) A good hash function should utilize the entire key. TRUE

Short Answers and Calculations

1) For Program A, since the time is linear, we know that if we double the problem size the run-time should also double. Thus we can say 2X seconds for Program A. For Program B it is more complicated, since Program B runs in logarithmic time. However, we can still solve this with some math:

We know: k₂log₂(N_o) = X

And we want to solve k₂log₂(2N_o) = ?

Remembering properties of logarithms, we can rewrite the problem as follows:

k₂log₂(2N_o) = k₂[log₂(2) + log₂(N_o)] = k₂ + k₂log₂(N_o) = k₂ + X

Thus Program B takes approximately X + k₂ seconds, only a constant amount more than before.
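As a numeric check of this reasoning, here is a minimal sketch with made-up values. The problem size 1024, the time of 10 seconds, and all names in the code are assumptions chosen only to make the arithmetic easy to follow.

```java
// Illustrative sketch: the numbers are hypothetical, chosen so that log2(N0) = 10.
class GrowthSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Program A: linear time k1 * N.
    static double timeA(double k1, double n) {
        return k1 * n;
    }

    // Program B: logarithmic time k2 * log2(N).
    static double timeB(double k2, double n) {
        return k2 * log2(n);
    }

    public static void main(String[] args) {
        double n0 = 1024.0;        // hypothetical original problem size
        double x = 10.0;           // hypothetical run-time (seconds) at size n0
        double k1 = x / n0;        // chosen so that timeA(k1, n0) = x
        double k2 = x / log2(n0);  // chosen so that timeB(k2, n0) = x
        System.out.println("A at 2*N0: " + timeA(k1, 2 * n0)); // about 2X = 20
        System.out.println("B at 2*N0: " + timeB(k2, 2 * n0)); // about X + k2 = 11
    }
}
```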

2) Define what it means to have a collision in a hash table, and why we cannot usually prevent them from occurring.

A collision occurs in a hash table if, for two keys, x₁ and x₂, h(x₁) = h(x₂), with x₁ != x₂. Collisions cannot usually be prevented, since, in most instances, the key space being used (all possible keys) is greater in size than the table size, and, by the Pigeonhole Principle, at least two distinct keys must map to the same table location.
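A concrete collision can be shown in a few lines. This is only a sketch: the hash function `h` below is a stand-in built on Java's `String.hashCode()` and is not the function discussed in class, and the class and method names are hypothetical. The strings "Aa" and "BB" are distinct but have equal `hashCode()` values, so they collide for any table size.

```java
// Illustrative sketch: h() is a hypothetical stand-in, not the class's hash function.
class CollisionSketch {
    // Non-negative hashCode reduced to a table index.
    static int h(String key, int tableSize) {
        return (key.hashCode() & 0x7fffffff) % tableSize;
    }

    // Returns true if at least two of the given keys map to the same slot.
    static boolean hasCollision(String[] keys, int tableSize) {
        boolean[] used = new boolean[tableSize];
        for (String key : keys) {
            int slot = h(key, tableSize);
            if (used[slot]) return true;
            used[slot] = true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Two distinct keys, one table location.
        System.out.println(h("Aa", 101) == h("BB", 101)); // true

        // Pigeonhole Principle: 6 distinct keys cannot fit in 5 slots without a collision.
        String[] keys = {"k0", "k1", "k2", "k3", "k4", "k5"};
        System.out.println(hasCollision(keys, 5)); // true
    }
}
```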

3) Consider a file containing the following text data:

AAABBBAAB

          LONGEST
STEP #    PREFIX MATCHED    CODEWORD OUTPUT    (STRING, CODE) ADDED TO DICTIONARY
------    --------------    ---------------    ----------------------------------
  1       A                 65                 (AA, 256)
  2       AA                256                (AAB, 257)
  3       B                 66                 (BB, 258)
  4       BB                258                (BBA, 259)
  5       AAB               257                --
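The trace above can be checked with a short encoder. This is a minimal sketch rather than the code from the handout: it uses a `HashMap` as the dictionary (an assumption, any dictionary structure would do), assumes the input contains only extended ASCII characters, and never limits the dictionary size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative LZW encoder sketch; class and method names are hypothetical.
class LzwSketch {
    static List<Integer> encode(String text) {
        // Codewords 0-255 are reserved for the extended ASCII characters.
        Map<String, Integer> dict = new HashMap<>();
        for (int c = 0; c < 256; c++) {
            dict.put(String.valueOf((char) c), c);
        }
        int nextCode = 256;

        List<Integer> output = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Extend the match while the longer string is still in the dictionary.
            int end = pos + 1;
            while (end < text.length() && dict.containsKey(text.substring(pos, end + 1))) {
                end++;
            }
            // Each step outputs one codeword for the longest matched prefix.
            output.add(dict.get(text.substring(pos, end)));
            // Add (matched prefix + next character) unless the input is exhausted.
            if (end < text.length()) {
                dict.put(text.substring(pos, end + 1), nextCode++);
            }
            pos = end;
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAABBBAAB")); // [65, 256, 66, 258, 257]
    }
}
```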

4) Consider Huffman compression and (8 bit) MTF self-organizing list compression.

a) A file containing 1000 of each character in the alphabet

b) A file containing 1000000 As

a) Since all characters have the same frequencies, Huffman will obtain no compression, since it depends on frequency disparities to be effective. On the other hand, the MTF heuristic will do well (this assumes the 1000 occurrences of each character appear consecutively): the first occurrence of each character could require up to 11 bits, but the remaining 999 will require the minimum 4 bits, since the character will then be at the front of the list. Those 4 bits are the 3 bits that indicate how many bits follow, plus the one bit that indicates that the character is at the front of the list. Thus the ratio for MTF will approach ½.

b) By the same logic as above, the MTF heuristic will still achieve a compression ratio of about ½. However, now since all characters are the same, Huffman will approach its optimal ratio of 1/8 – since the Huffman tree will have only a single edge, thereby requiring only 1 bit to encode A, as opposed to the 8 bits required in ASCII. Note that the tree information will take up some space, but the compression should still be close to the optimal amount.
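The MTF figures in these answers can be checked by counting bits. The sketch below follows the bit layout the answers describe (a 3-bit field giving the number of bits that follow, then the list position in that many bits), and additionally assumes that the list starts out holding the values 0 through 255 in order. That starting order, and all names in the code, are assumptions for illustration.

```java
// Illustrative sketch: counts encoded bits only, it does not produce a bit stream.
class MtfCostSketch {
    static long encodedBits(byte[] data) {
        // Assumed initial list: byte values 0..255 in increasing order.
        int[] list = new int[256];
        for (int i = 0; i < 256; i++) {
            list[i] = i;
        }
        long bits = 0;
        for (byte b : data) {
            int value = b & 0xFF;
            int pos = 0;
            while (list[pos] != value) {
                pos++;
            }
            // Position 0 still needs one bit; otherwise use the minimal binary width.
            int width = (pos == 0) ? 1 : 32 - Integer.numberOfLeadingZeros(pos);
            bits += 3 + width; // 3-bit length field plus the position itself
            // Move the value to the front, shifting earlier entries back by one.
            System.arraycopy(list, 0, list, 1, pos);
            list[0] = value;
        }
        return bits;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1_000_000];
        java.util.Arrays.fill(data, (byte) 'A');
        long bits = encodedBits(data);
        // Ratio of encoded size to the original 8 bits per character: close to 1/2.
        System.out.println((double) bits / (8L * data.length));
    }
}
```

Under these assumptions the first 'A' sits at position 65 and costs 3 + 7 = 10 bits, and each later 'A' costs 4 bits, so the ratio comes out just above ½.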

5) Skip(A)=4, Skip(B)=3, Skip(C)=2, Skip(D)=1 and Skip(E)=0. All other Skip array entries are 5. The total number of character comparisons is 8. For each of the first 3 mismatches the maximum skip value of 5 is used. The final 5 comparisons are used to match the pattern right to left.
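The count of 8 can be confirmed by instrumenting a search that uses only the mismatched character heuristic. This is a sketch under assumptions: the skip array is indexed by extended ASCII value (so the text and pattern must contain only characters 0-255), and the class and method names are hypothetical.

```java
// Illustrative sketch of Boyer-Moore using only the mismatched character heuristic.
class MismatchedCharSketch {
    // Returns {index of first match or -1, number of character comparisons}.
    static int[] search(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();

        // skip[c] = distance from the last occurrence of c to the end of the pattern,
        // or m if c does not occur in the pattern at all.
        int[] skip = new int[256];
        java.util.Arrays.fill(skip, m);
        for (int j = 0; j < m; j++) {
            skip[pattern.charAt(j)] = m - 1 - j;
        }

        int comparisons = 0;
        int i = m - 1; // current position in the text
        int j = m - 1; // current position in the pattern (matching right to left)
        while (j >= 0) {
            if (i >= n) {
                return new int[] {-1, comparisons};
            }
            comparisons++;
            if (text.charAt(i) == pattern.charAt(j)) {
                i--;
                j--;
            } else {
                // Always move forward by at least one alignment.
                i += Math.max(m - j, skip[text.charAt(i)]);
                j = m - 1;
            }
        }
        return new int[] {i + 1, comparisons};
    }

    public static void main(String[] args) {
        int[] result = search("ABCDXABCDYABCDZABCDE", "ABCDE");
        System.out.println("index = " + result[0] + ", comparisons = " + result[1]);
        // prints "index = 15, comparisons = 8"
    }
}
```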

6) Recall that a DLB "node" consists of a number of "nodelets": one "nodelet" per possible character for a given prefix in the dictionary. In the worst case, all S characters in a given "node" are used, thereby requiring S "nodelets". If a character being searched for in a given "node" happens to be in the last "nodelet", S character comparisons will be required in the examination of a single position within the string. If this worst case occurs for each of the K positions within the string, a total of SK character comparisons will be required. Note that this worst case is extremely unlikely, since after the first few levels the "nodes" typically have very few "nodelets" in them (which is why we use the DLB in the first place).
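The SK bound can be observed on a tiny DLB. The sketch below is not the DLB from class: it marks the end of a string with a boolean flag instead of a terminator character, and the class, field, and method names are all hypothetical. With S = 3 characters and K = 2, searching for the string whose characters sit in the last nodelet at every level takes 3 × 2 = 6 comparisons.

```java
// Illustrative DLB sketch: each nodelet holds one character, a sibling link
// (next alternative at the same position) and a child link (next position).
class DlbSketch {
    static final class Nodelet {
        final char ch;
        Nodelet sibling;
        Nodelet child;
        boolean endOfWord; // assumption: a flag instead of a terminator character
        Nodelet(char ch) { this.ch = ch; }
    }

    private Nodelet root;  // first nodelet for position 0
    int comparisons;       // character comparisons made by the last contains() call

    void insert(String s) {
        if (s.isEmpty()) return;
        Nodelet prev = null;  // nodelet matched at the previous position
        Nodelet list = root;  // first nodelet of the current "node"
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Nodelet cur = list;
            Nodelet last = null;
            while (cur != null && cur.ch != c) {
                last = cur;
                cur = cur.sibling;
            }
            if (cur == null) {            // character not present: append a nodelet
                cur = new Nodelet(c);
                if (last != null) last.sibling = cur;
                else if (prev == null) root = cur;
                else prev.child = cur;
            }
            prev = cur;
            list = cur.child;
        }
        prev.endOfWord = true;
    }

    boolean contains(String s) {
        comparisons = 0;
        Nodelet list = root;
        Nodelet matched = null;
        for (int i = 0; i < s.length(); i++) {
            Nodelet cur = list;
            while (cur != null) {
                comparisons++;            // one character comparison per nodelet visited
                if (cur.ch == s.charAt(i)) break;
                cur = cur.sibling;
            }
            if (cur == null) return false;
            matched = cur;
            list = cur.child;
        }
        return matched != null && matched.endOfWord;
    }

    public static void main(String[] args) {
        DlbSketch dlb = new DlbSketch();
        for (String s : new String[] {"aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc"}) {
            dlb.insert(s);
        }
        System.out.println(dlb.contains("cc") + " " + dlb.comparisons); // true 6
        System.out.println(dlb.contains("aa") + " " + dlb.comparisons); // true 2
    }
}
```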

Coding

// Note that all table locations are initialized to null prior
// to any Inserts. Assume that no Deletes are allowed.
public boolean Find(String item)
{
    int index = h(item);

    for (int i = 0; i < table.length; i++)
    {
        int curr = (index + i) % table.length;
        if (table[curr] == null)             // null location: not found
            return false;
        else if (table[curr].equals(item))   // found
            return true;
    }
    return false;   // cycled through all locations: not found
}