CS 1501, Summer 2006
Practice Questions for Midterm Exam
Here are some example questions that may help you to study for the midterm exam. Try to answer the questions fully before looking at the answers. Though these questions indicate some of the material that may be on the exam, they are by no means comprehensive. Remember to also study EVERYTHING up to the last class before the exam on the Syllabus and Class Schedule, as well as everything pertaining to the first 3 assignments.
Fill in the Blanks
Complete the statements below with the MOST APPROPRIATE words/phrases.
a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ____________________ bit comparisons, while the AVERAGE CASE search time requires ________________ bit comparisons.
b) Delete is a problem with open-addressing hashing because _______________________________________________________________________.
c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is _________________.
d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ____________.
e) If an encoding scheme satisfies the prefix property, it is certain that __________________________________________________________________.
True/False
Indicate whether each of the following statements is True or False. For False answers, explain why it is false.
a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2^n).
b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes that have only one child (one-way branching).
c) A good hash function should utilize the entire key.
Short Answers and Calculations
1) You have two programs, Program A, which runs in time k1*N, and Program B, which runs in time k2*log2(N), for some constants k1 and k2. Assume that for a problem of size N0, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.
2) Define what it means to have a collision in a hash table, and why we
cannot usually prevent them from occurring.
3) Consider a file containing the following text data:
AAABBBAAB
Trace the LZW encoding process for the file (in the same way done in handout lzw.txt, so each "step" produces a single codeword). Assume that the extended ASCII set will use codewords 0-255. For each step in the encoding, be sure to show all of the information indicated below. Note: The ASCII value for 'A' is 65.

STEP #   LONGEST PREFIX MATCHED   CODEWORD OUTPUT   (STRING, CODE) ADDED TO DICTIONARY
------   ----------------------   ---------------   -----------------------------------
4) Consider Huffman compression and (8 bit) MTF self-organizing list compression. Which would perform better on each of the following files and why? Be specific by giving approximate compression ratios for each algorithm in each case.
a) A file containing 1000 of each character in the alphabet
b) A file containing 1000000 As
5) Consider the mismatched character heuristic of the Boyer-Moore string matching algorithm. For the pattern and text strings shown below, state and justify how many total character comparisons must be done in order to match the pattern within the text string. Justify your answer using the skip array for the pattern.
Text: ABCDXABCDYABCDZABCDE
Pattern: ABCDE
6) Justify in detail how many character comparisons are required to find a string in a DLB in the worst case. Assume that your DLB has N strings, each with a maximum of K characters, and that your alphabet has S possible characters in it.
Coding
1) Assume that you are using linear probing in a hash table of Strings. Function h(x) is defined as we discussed it in class (and you do NOT have to write it). Complete the code for the Find function below, which will return true if the item is present and false otherwise. Be sure to handle ALL possibilities.
String[] table;   // instance variable
// other methods not shown

// Note that all table locations are initialized to null
// prior to any Inserts.  Assume that no Deletes are allowed.
public boolean Find(String item)
{
    int index = h(item);

    // fill in code
}
SOLUTIONS
Fill in the Blanks
Complete the statements below with the MOST APPROPRIATE words/phrases.
a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ______b______________ bit comparisons, while the AVERAGE CASE search time requires ___Theta(lgN)___ bit comparisons.
b) Delete is a problem with open-addressing hashing because ___if a value within a cluster is deleted, values after it in the cluster may not be found___.
c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is ___(C+1)/M_______.
d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ___N/M_____.
e) If an encoding scheme satisfies the prefix property, it is certain that ____no codeword is a prefix of any other codeword_________________.
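To see why the prefix property matters, here is a small hypothetical example (the code {A=0, B=10, C=11} and the class name are illustrative, not from the course): because no codeword is a prefix of any other, a single left-to-right scan can decode the bit string greedily and unambiguously.

```java
import java.util.Map;

public class PrefixDecode {
    // Hypothetical prefix-free code: A=0, B=10, C=11.
    // No codeword is a prefix of any other, so greedy decoding is unambiguous.
    static final Map<String, Character> CODE = Map.of("0", 'A', "10", 'B', "11", 'C');

    public static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder buffer = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            buffer.append(bit);
            Character sym = CODE.get(buffer.toString());
            if (sym != null) {              // matched a complete codeword
                out.append(sym);
                buffer.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("010011"));   // A(0) B(10) A(0) C(11) -> ABAC
    }
}
```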
True/False
Indicate whether each of the following statements is True or False. For False answers, explain why it is false.
a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2^n). FALSE – the upper bound is n!, the number of possible orderings of the vertices.
b) A Patricia tree reduces the number of nodes from a Multiway Trie by eliminating nodes that have only one child (one-way branching).
c) A good hash function should utilize the entire key. TRUE
Short Answers and Calculations
1) You have two programs, Program A, which runs in time k1*N, and Program B, which runs in time k2*log2(N), for some constants k1 and k2. Assume that for a problem of size N0, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.
For Program A, since the time is linear, we know that if we double the problem size the run-time should also double. Thus we can say 2X seconds for Program A. For Program B it is more complicated, since Program B runs in logarithmic time. However, we can still solve this with some math.

We know: k2*log2(N0) = X

And we want to solve: k2*log2(2*N0) = ?

Remembering properties of logarithms, we can rewrite the problem as follows:

k2*log2(2*N0) = k2*[log2(2) + log2(N0)] = k2 + k2*log2(N0) = k2 + X
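The algebra above can be sanity-checked numerically; the constants below (k2 = 3, N0 = 1024) are arbitrary choices for illustration:

```java
public class DoublingCheck {
    // log base 2 via the change-of-base formula
    public static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double k2 = 3.0, N0 = 1024.0;            // arbitrary illustrative constants
        double X = k2 * log2(N0);                // time at problem size N0
        double doubledTime = k2 * log2(2 * N0);  // time at problem size 2*N0
        // doubledTime equals k2 + X, as derived above
        System.out.println(doubledTime - (k2 + X));   // ~0.0
    }
}
```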
2) Define what it means to have a collision in a hash table, and why we cannot usually prevent them from occurring.
A collision occurs in a hash table if, for two keys, x1 and x2, h(x1) = h(x2), with x1 != x2. Collisions cannot usually be prevented, since, in most instances, the key space being used (all possible keys) is greater in size than the table size, and, by the Pigeonhole Principle, at least two distinct keys must map to the same table location.
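As a concrete illustration, here is a toy hash function (sum of character codes mod table size – an assumption for this sketch, not necessarily the course's h) under which two distinct keys collide:

```java
public class CollisionDemo {
    // Toy hash: sum of character codes mod M.  Any two anagrams
    // collide under this function, regardless of the table size M.
    public static int h(String x, int M) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++)
            sum += x.charAt(i);
        return sum % M;
    }

    public static void main(String[] args) {
        int M = 101;
        // "cat" and "act" are distinct keys but hash to the same location
        System.out.println(h("cat", M) == h("act", M));   // true
    }
}
```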
3) Consider a file containing the following text data:
AAABBBAAB
Trace the LZW encoding process for the file (in the same way done in handout lzw.txt, so each "step" produces a single codeword). Assume that the extended ASCII set will use codewords 0-255. For each step in the encoding, be sure to show all of the information indicated below. Note: The ASCII value for 'A' is 65.

STEP #   LONGEST PREFIX MATCHED   CODEWORD OUTPUT   (STRING, CODE) ADDED TO DICTIONARY
------   ----------------------   ---------------   -----------------------------------
  1      A                        65                (AA, 256)
  2      AA                       256               (AAB, 257)
  3      B                        66                (BB, 258)
  4      BB                       258               (BBA, 259)
  5      AAB                      257               --
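The trace above can be reproduced mechanically. The sketch below (class and method names are my own) implements the basic LZW encoding loop: repeatedly match the longest dictionary prefix, output its codeword, and add that prefix plus the next character to the dictionary.

```java
import java.util.*;

public class LzwTrace {
    // Encode input using LZW with codewords 0-255 preassigned to
    // extended ASCII; returns the list of codewords output.
    public static List<Integer> encode(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++)
            dict.put("" + (char) i, i);
        int nextCode = 256;

        List<Integer> output = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Find the longest prefix of the remaining input in the dictionary
            String prefix = "" + input.charAt(pos);
            int end = pos + 1;
            while (end < input.length() && dict.containsKey(prefix + input.charAt(end))) {
                prefix = prefix + input.charAt(end);
                end++;
            }
            output.add(dict.get(prefix));
            // Add prefix + next character to the dictionary
            if (end < input.length())
                dict.put(prefix + input.charAt(end), nextCode++);
            pos = end;
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAABBBAAB"));   // [65, 256, 66, 258, 257]
    }
}
```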
4) Consider Huffman compression and (8 bit) MTF self-organizing list compression. Which would perform better on each of the following files and why? Be specific by giving approximate compression ratios for each algorithm in each case.
a) A file containing 1000 of each character in the alphabet
b) A file containing 1000000 As
a) Since all characters have the same frequencies, Huffman will obtain no compression, since it depends on frequency disparities to be effective. On the other hand, the MTF heuristic will do well – the first occurrence of each character could require up to 11 bits, but the remaining 999 occurrences will require far fewer bits, since each character stays at the front of the list, giving a compression ratio of about ½.
b) By the same logic as above, the MTF heuristic will still achieve a compression ratio of about ½. However, now since all characters are the same, Huffman will approach its optimal ratio of 1/8 – the Huffman tree will have only a single edge, thereby requiring only 1 bit to encode A, as opposed to the 8 bits required in ASCII. Note that the tree information will take up some space, but the compression should still be close to the optimal amount.
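The part (b) arithmetic, written out (ignoring the space for the tree itself, as the solution notes):

```java
public class HuffmanRatio {
    public static void main(String[] args) {
        long chars = 1_000_000L;
        long asciiBits   = chars * 8;   // 8 bits per character in ASCII
        long huffmanBits = chars * 1;   // single-edge Huffman tree: 1 bit per 'A'
        double ratio = (double) huffmanBits / asciiBits;
        System.out.println(ratio);      // 0.125, i.e. 1/8
    }
}
```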
5) Skip(A)=4, Skip(B)=3, Skip(C)=2, Skip(D)=1, and Skip(E)=0. All other Skip array entries are 5. The total number of character comparisons is 8: for each of the first 3 mismatches (1 comparison each), the maximum skip value of 5 is used, and the final 5 comparisons are used to match the pattern right to left.
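The count of 8 can be checked with a sketch of the mismatched character heuristic (my own implementation, using the same skip-array convention as above: skip[c] = M - 1 - (rightmost index of c in the pattern), and M for characters not in the pattern):

```java
public class MismatchCount {
    // Returns the number of character comparisons made while searching
    // for pattern in text using only the mismatched character heuristic.
    public static int count(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        int[] skip = new int[256];
        for (int c = 0; c < 256; c++) skip[c] = m;                // not in pattern
        for (int j = 0; j < m; j++) skip[pattern.charAt(j)] = m - 1 - j;

        int comparisons = 0;
        int i = 0;                        // current alignment of pattern in text
        while (i <= n - m) {
            int j = m - 1;
            while (j >= 0) {              // compare right to left
                comparisons++;
                if (text.charAt(i + j) != pattern.charAt(j)) break;
                j--;
            }
            if (j < 0) break;             // full match at alignment i
            // shift so the mismatched text character lines up with its
            // rightmost occurrence in the pattern (always shift at least 1)
            i += Math.max(1, skip[text.charAt(i + j)] - (m - 1 - j));
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(count("ABCDXABCDYABCDZABCDE", "ABCDE"));   // 8
    }
}
```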
6) Justify in detail how many character comparisons are required to find a string in a DLB in the worst case. Assume that your DLB has N strings, each with a maximum of K characters, and that your alphabet has S possible characters in it.

Recall that a DLB "node" consists of a number of "nodelets" – one "nodelet" per possible character for a given prefix in the dictionary. In the worst case, all S characters in a given "node" are used, thereby requiring S "nodelets". If a character being searched for in a given "node" happens to be in the last "nodelet", S character comparisons will be required in the examination of that "node". Since a string has at most K characters, a search may have to examine K "nodes" in this way, giving a worst case of K*S character comparisons.
Coding
1)

public boolean Find(String item)
{
    // Note that all table locations are initialized to null
    // prior to any Inserts.  Assume that no Deletes are allowed.
    int index = h(item);

    for (int i = 0; i < table.length; i++)
    {
        int curr = (index + i) % table.length;
        if (table[curr] == null)               // null location – not found
            return false;
        else if (table[curr].equals(item))     // found
            return true;
    }
    return false;   // cycled through all locations – not found
}
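Here is a small harness showing the solution's logic in action. The hash function (sum of character codes mod table size) and the insert routine are assumptions for this sketch, since h(x) was defined in class:

```java
public class ProbeDemo {
    static String[] table = new String[11];   // small table, all slots start null

    // Stand-in for the course's h(x): sum of character codes mod table size.
    public static int h(String x) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++) sum += x.charAt(i);
        return sum % table.length;
    }

    // Linear-probing insert (no Deletes, matching the problem's assumptions).
    public static void insert(String item) {
        int index = h(item);
        while (table[index] != null) index = (index + 1) % table.length;
        table[index] = item;
    }

    // The Find logic from the solution above.
    public static boolean find(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) return false;        // empty slot: not present
            else if (table[curr].equals(item)) return true;
        }
        return false;   // cycled through all locations
    }

    public static void main(String[] args) {
        insert("cat");
        insert("act");   // anagram: collides with "cat" and probes forward
        System.out.println(find("act") + " " + find("dog"));   // true false
    }
}
```

Note how find("act") succeeds even though "act" does not sit in its home slot: the probe sequence walks past "cat" until it either finds the item or hits a null slot, which is exactly why unrestricted Deletes would break this Find.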