Alphabet
3 #number of alphabets
1	2	3 #alphabets, split by '\t'
States
2 #number of states
HOT	COLD #states, split by '\t'
StartProbability #prior probabilities of the states, the order is the same as before
0.8	0.2
TransitionProbability #transition probability matrix(T), N*N, N is the number of states, T[i][j] = p(s_j|s_i)
0.7	0.3
0.4	0.6
EmissionProbability #emission probability matrix(E), N*M, N is the number of states, M is the number of alphabets. E[i][j] = p(a_j|s_i)
0.2	0.4	0.4
0.5	0.4	0.1
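As a rough illustration of how this file could be read, here is a minimal sketch. The function name `parse_hmm` and the dictionary keys are hypothetical. It assumes each section header sits on its own line, that `#` starts a comment, and that symbols contain no whitespace.

```python
def parse_hmm(text):
    """Parse an HMM description in the layout shown above into a dict."""
    # Strip '#' comments, split on whitespace (tabs included), skip empty lines.
    rows = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            rows.append(fields)
    it = iter(rows)

    def section(name):
        # Each section starts with a header line holding only its name.
        header = next(it)
        if header != [name]:
            raise ValueError(f"expected section {name!r}, got {header!r}")

    hmm = {}
    section("Alphabet")
    m = int(next(it)[0])
    hmm["alphabet"] = next(it)
    section("States")
    n = int(next(it)[0])
    hmm["states"] = next(it)
    section("StartProbability")
    hmm["start"] = [float(x) for x in next(it)]
    section("TransitionProbability")
    # N rows, row i holds p(s_j | s_i) for every j.
    hmm["transition"] = [[float(x) for x in next(it)] for _ in range(n)]
    section("EmissionProbability")
    # N rows, row i holds p(a_j | s_i) for every j.
    hmm["emission"] = [[float(x) for x in next(it)] for _ in range(n)]
    if len(hmm["alphabet"]) != m or len(hmm["states"]) != n:
        raise ValueError("declared counts do not match the listed symbols")
    return hmm
```

With the sample file, `hmm["transition"][1][0]` would be the probability of moving from COLD to HOT.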
Implement a probabilistic CKY parser.
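For orientation, the core table-filling step of probabilistic CKY might look like the sketch below. It is not the required solution: the function name `cky` and the data layout are assumptions, and it only covers a grammar that already consists of binary rules plus a lexicon (no unit productions).

```python
def cky(words, lexicon, binary):
    """Fill a Viterbi CKY chart.

    lexicon: {word: [(tag, prob), ...]}
    binary:  [(lhs, (B, C), prob), ...]
    Returns table where table[i][j][A] = (best_prob, backpointer) for words[i:j].
    """
    n = len(words)
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1 come straight from the lexicon.
    for i, word in enumerate(words):
        for tag, prob in lexicon.get(word, []):
            table[i][i + 1][tag] = (prob, word)
    # Longer spans combine two adjacent sub-spans split at position k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = table[i][j]
            for k in range(i + 1, j):
                left, right = table[i][k], table[k][j]
                for lhs, (b, c), prob in binary:
                    if b in left and c in right:
                        score = prob * left[b][0] * right[c][0]
                        # Keep only the most probable analysis per non-terminal.
                        if score > cell.get(lhs, (0.0, None))[0]:
                            cell[lhs] = (score, (k, b, c))
    return table
```

The best parse of the whole sentence, if one exists, is then found under the start symbol in `table[0][len(words)]`, and the backpointers can be followed to rebuild the tree.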
The provided sample grammar file, pcfg.txt, is exactly the same as the probabilistic grammar from the textbook.
0.80 S -> NP VP
0.15 S -> Aux NP VP
0.05 S -> VP
0.35 NP -> Pronoun
0.30 NP -> Proper-Noun
0.20 NP -> Det Nominal
0.15 NP -> Nominal
0.75 Nominal -> Noun
0.20 Nominal -> Nominal Noun
0.05 Nominal -> Nominal PP
0.35 VP -> Verb
0.20 VP -> Verb NP
0.10 VP -> Verb NP PP
0.15 VP -> Verb PP
0.05 VP -> Verb NP NP
0.15 VP -> VP PP
1.0 PP -> Preposition NP
Det -> that [0.10] | a [0.30] | the [0.60]
Noun -> book [0.10] | flight [0.30] | meal [0.15] | money [0.05] | flights [0.40] | dinner [0.10]
Verb -> book [0.30] | include [0.30] | prefer [0.40]
Pronoun -> i [0.40] | she [0.05] | me [0.15] | you [0.40]
Proper-Noun -> houston [0.60] | twa [0.40]
Aux -> does [0.60] | can [0.40]
Preposition -> from [0.30] | to [0.30] | on [0.20] | near [0.15] | through [0.05]
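The grammar mixes two rule styles: syntactic rules with the probability in front, and lexical rules with bracketed probabilities separated by `|`. A minimal loading sketch follows. The name `load_pcfg` and the returned structures are hypothetical, and it assumes the file holds one rule per line.

```python
import re


def load_pcfg(lines):
    """Split grammar lines into syntactic rules and a lexicon.

    Returns (rules, lexicon) where
      rules   = [(lhs, rhs_tuple, prob), ...]
      lexicon = {word: [(tag, prob), ...]}
    """
    rules, lexicon = [], {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        left, right = line.split("->", 1)
        left_parts = left.split()
        if len(left_parts) == 2:
            # Syntactic rule, e.g. "0.80 S -> NP VP"
            prob, lhs = float(left_parts[0]), left_parts[1]
            rules.append((lhs, tuple(right.split()), prob))
        else:
            # Lexical rule, e.g. "Det -> that [0.10] | a [0.30]"
            tag = left_parts[0]
            for word, prob in re.findall(r"(\S+)\s*\[([0-9.]+)\]", right):
                lexicon.setdefault(word, []).append((tag, float(prob)))
    return rules, lexicon
```

Keeping the lexicon keyed by word makes the first CKY step (looking up the possible tags of each token) a single dictionary access.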
The grammar provided has rules such as
VP -> Verb NP PP,
which has more than two non-terminals on the right-hand side. However,
the CKY algorithm can only handle grammars in a binarized format such
as CNF. Therefore, you need to binarize the grammar before CKY decoding
can be executed. You should use the CNF conversion introduced in class.
Be careful to binarize in a way that keeps the probability of
equivalent rules the same before and after the conversion.
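One common way to keep probabilities unchanged is to give every newly introduced rule probability 1.0 and leave the original probability on the rewritten rule. The sketch below shows that idea for long rules only. It is not necessarily the conversion introduced in class, it does not handle unit productions such as S -> VP, and the `X_...` naming scheme and the function name `binarize` are made up for illustration.

```python
def binarize(rules):
    """Rewrite rules so that no right-hand side has more than two symbols.

    rules: [(lhs, rhs_tuple, prob), ...]
    """
    out = []
    for lhs, rhs, prob in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            # Fold the two leftmost symbols into a helper non-terminal.
            new_sym = "X_" + "_".join(rhs[:2])  # hypothetical naming scheme
            helper = (new_sym, rhs[:2], 1.0)    # probability 1.0: no mass is lost
            if helper not in out:
                out.append(helper)
            rhs = (new_sym,) + rhs[2:]
        # The original probability stays on the (now binary) rule.
        out.append((lhs, rhs, prob))
    return out
```

For example, VP -> Verb NP PP with probability 0.10 becomes VP -> X_Verb_NP PP with probability 0.10 plus X_Verb_NP -> Verb NP with probability 1.0, so any derivation using the pair still contributes 0.10 × 1.0 = 0.10.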
Failing to conform to the input/output requirements will result in a 5-point deduction.
Your script (for Python users) or executable jar (for Java users) must take two parameters: the grammar file and the sentence to parse.
If you use Python, your code will be tested as:
python prob-cky.py pcfg.txt "A test sentence ."
If you use Java, your code will be tested as:
java -cp yourname.jar cs2731.hw2.ProbCKY pcfg.txt "A test sentence ."
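For Python, the two command-line parameters could be read as in the sketch below. The helper name `parse_args` is hypothetical, and splitting the sentence on whitespace is an assumption about tokenization. Note that the sample lexicon is lowercase, so tokens may need to be lowercased before lookup.

```python
import sys


def parse_args(argv):
    """Return (grammar_path, tokens) from an argv-style list."""
    if len(argv) != 3:
        raise SystemExit("usage: python prob-cky.py <grammar-file> <sentence>")
    grammar_path, sentence = argv[1], argv[2]
    # The quoted sentence arrives as one argument; split it into tokens.
    return grammar_path, sentence.split()


if __name__ == "__main__" and len(sys.argv) == 3:
    path, tokens = parse_args(sys.argv)
    print(path, tokens)
```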
The output should be printed to the standard output stream. Print the following information
[S [NP [Pronoun I]] [VP [Verb book] [NP [Det a] [Nominal [Noun flight]]] [PP [Preposition to] [NP [Proper-Noun houston]]]]]
(Copy and paste this string into mshang.ca/syntree to visualize it. You will find this tool very useful throughout this homework.)
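Producing the bracketed string from a parse tree is a short recursion. The sketch below assumes a hypothetical tree representation: nested tuples of the form `(label, child, ...)`, where a leaf child is a plain word string.

```python
def to_brackets(tree):
    """Render a nested-tuple tree as a bracketed string."""
    label, *children = tree
    # Words are emitted as-is; subtrees are rendered recursively.
    parts = [c if isinstance(c, str) else to_brackets(c) for c in children]
    return "[" + " ".join([label] + parts) + "]"


example = (
    "S",
    ("NP", ("Pronoun", "I")),
    ("VP",
     ("Verb", "book"),
     ("NP", ("Det", "a"), ("Nominal", ("Noun", "flight"))),
     ("PP", ("Preposition", "to"), ("NP", ("Proper-Noun", "houston")))),
)
print(to_brackets(example))
```

The sample output shows VP with three children, which suggests that any helper symbols introduced during binarization should be spliced out before printing.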
For Python users, include:
For Java users, include:
yourname.zip archive which includes the Java source.
yourname.jar which is compiled from your source code. The jar should have a main class named