A partial Viterbi calculation is pictured here. This calculation takes us up through t=2 where v2(1) and v2(2) are computed. In the picture, the index 1 is used for the state labeled C and the index 2 is used for the state labeled H. Compute v3(1) and v3(2). You will need the transition and observation probabilities given here.
Think of this as filling in a table whose columns are moments in time and whose rows are the states of the HMM. If we fill in the table with the values computed in the diagram above, add a column for time t = 0, and show every probability cell, it looks like this:
t = | 0 | 1 | 2 | 3 |
---|---|---|---|---|
end | 0 | 0 | 0 | |
H | 0 | .32 | .0448 | |
C | 0 | .02 | .048 | |
start | 1.0 | 0 | 0 | |
Each cell in the Viterbi table is filled with one of the Viterbi values computed in the diagram. Like the diagram, the table is complete through t = 2. The value written v2(2) represents the probability of the most probable path that accounts for the first two observations and ends in state 2 at time 2.
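Each cell follows the Viterbi recurrence v_t(j) = max_i v_{t-1}(i) · a_ij · b_j(o_t), where a_ij is the probability of moving from state i to state j and b_j(o_t) is the probability of emitting observation o_t in state j. The minimal sketch below applies one step of it. The figure's probabilities are not reproduced in this text, so the transition and emission values here are assumptions, picked only because they regenerate the t = 2 column of the table from the t = 1 column; the observation at t = 2 is likewise assumed to be 1. Substitute the figure's values before using it for anything else.

```python
# One step of the Viterbi recurrence:
#   v_t(j) = max_i v_{t-1}(i) * a[i][j] * b_j(o_t)
# ASSUMPTION: the probabilities below are hypothetical stand-ins for the
# figure's values, picked so that they reproduce the table's t = 2 column.

STATES = ["C", "H"]  # index 1 = C, index 2 = H, as in the figure

A = {  # transition probabilities A[previous][next] (hypothetical)
    "C": {"C": 0.6, "H": 0.4},
    "H": {"C": 0.3, "H": 0.7},
}
B = {  # emission probabilities B[state][observation]; only what t = 2 needs
    "C": {1: 0.5},
    "H": {1: 0.2},
}


def viterbi_step(prev, obs):
    """Compute v_t and backpointers from v_{t-1} (a dict state -> probability)."""
    v, back = {}, {}
    for j in STATES:
        # The emission term does not depend on i, so pick the best predecessor first
        best_i = max(STATES, key=lambda i: prev[i] * A[i][j])
        v[j] = prev[best_i] * A[best_i][j] * B[j][obs]
        back[j] = best_i
    return v, back


v1 = {"C": 0.02, "H": 0.32}          # t = 1 column of the table
v2, back2 = viterbi_step(v1, obs=1)  # ASSUMPTION: the observation at t = 2 is 1
print(v2, back2)
```

The backpointers record which state each best path came from, which is what lets the most probable state sequence be read off once the last column is filled.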
Implement a non-probabilistic CKY parser.
CLARIFICATIONS: Your program will need to convert the grammar to Chomsky Normal Form (CNF) when it is not already in CNF (as is the case with the grammar below). You can't do this conversion by hand, because your program will be tested blind on grammars you haven't seen.
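CNF allows only rules of the form A -> B C and A -> w, so unit productions such as NP -> Pronoun or S -> VP have to be removed. One common way, sketched below under assumptions (rules are (lhs, rhs) tuples and the helper names are hypothetical), is to follow every chain of unit rules A -> ... -> B and copy B's non-unit rules up to A. Each new rule remembers its chain, so the collapsed unit nodes, such as the NP in [NP [Pronoun I]], can be restored when trees are printed.

```python
def is_unit(rhs):
    # A unit production has a single non-terminal (uppercase) on the right
    return len(rhs) == 1 and rhs[0][:1].isupper()


def unit_chains(rules):
    """Map each non-terminal A to every acyclic chain (A, ..., B) of unit rules."""
    units = {}
    for lhs, rhs in rules:
        if is_unit(rhs):
            units.setdefault(lhs, []).append(rhs[0])
    chains = {}

    def walk(path):
        chains.setdefault(path[0], []).append(tuple(path))
        for nxt in units.get(path[-1], []):
            if nxt not in path:  # skip cycles such as A -> B -> A
                walk(path + [nxt])

    for a in sorted({lhs for lhs, _ in rules}):
        walk([a])
    return chains


def remove_unit_productions(rules):
    """Return (lhs, rhs, chain) triples with no unit productions.

    chain lists the non-terminals collapsed into lhs, e.g. ('S', 'VP', 'Verb')
    for S -> VP -> Verb -> book, so the unit nodes can be rebuilt later.
    """
    result = []
    for a, chains in unit_chains(rules).items():
        for chain in chains:
            for lhs, rhs in rules:
                if lhs == chain[-1] and not is_unit(rhs):
                    result.append((a, rhs, chain))
    return result


# A few rules from the assignment's grammar (probabilities dropped)
example_rules = [
    ("S", ("NP", "VP")), ("S", ("VP",)), ("NP", ("Pronoun",)),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("Pronoun", ("i",)), ("Verb", ("book",)),
]
for rule in remove_unit_productions(example_rules):
    print(rule)
```

Distinct unit chains that reach the same rule are kept as separate rules on purpose, because each one corresponds to a different parse tree and the assignment asks for all of them.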
Here is an example grammar file (for the grammar below) that your program should be able to process. You can assume that each word has its own rule (so you don't need to process disjunctions).
You can also assume that terminals begin with lowercase and non-terminals with uppercase, as in the example grammar.
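Because terminals and non-terminals can be told apart by their first letter, reading a grammar file mostly comes down to splitting each line on whitespace. The sketch below assumes one rule per line in the form "0.80 S -> NP VP" used in the grammar below, and also accepts a trailing probability ("S -> NP VP 1.0"); either number is discarded because the parser is non-probabilistic. The function names are hypothetical, and the actual example file may differ in format.

```python
def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_rule(line):
    """Parse one rule line into (lhs, rhs_tuple); return None for blank lines.

    ASSUMPTION: lines look like '0.80 S -> NP VP' or 'S -> NP VP 1.0';
    a leading or trailing number is a probability and is ignored.
    """
    tokens = line.split()
    if not tokens:
        return None
    if _is_number(tokens[0]):
        tokens = tokens[1:]
    if tokens and _is_number(tokens[-1]):
        tokens = tokens[:-1]
    if len(tokens) < 3 or tokens[1] != "->":
        raise ValueError(f"cannot parse rule: {line!r}")
    return tokens[0], tuple(tokens[2:])


def is_terminal(symbol):
    # Per the assignment, terminals begin with a lowercase letter
    return symbol[:1].islower()


def load_grammar(path):
    """Read every non-blank line of the grammar file as a rule."""
    with open(path) as f:
        return [r for r in (parse_rule(line) for line in f) if r is not None]
```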
CLARIFICATION: We will run your code on both new grammars and new test sentences.
0.80 S -> NP VP
0.15 S -> Aux NP VP
0.05 S -> VP
0.35 NP -> Pronoun
0.30 NP -> Proper-Noun
0.20 NP -> Det Nominal
0.15 NP -> Nominal
0.75 Nominal -> Noun
0.20 Nominal -> Nominal Noun
0.05 Nominal -> Nominal PP
0.35 VP -> Verb
0.20 VP -> Verb NP
0.10 VP -> Verb NP PP
0.15 VP -> Verb PP
0.05 VP -> Verb NP NP
0.15 VP -> VP PP
1.0 PP -> Preposition NP
Det -> that [0.10] | a [0.30] | the [0.60]
Noun -> book [0.10] | flight [0.30] | meal [0.15] | money [0.05] | flights [0.40] | dinner [0.10]
Verb -> book [0.30] | includes [0.30] | prefer [0.40]
Pronoun -> i [0.40] | she [0.05] | me [0.15] | you [0.40]
Proper-Noun -> houston [0.60] | twa [0.40]
Aux -> does [0.60] | can [0.40]
Preposition -> from [0.30] | to [0.30] | on [0.20] | near [0.15] | through [0.05]
Your script (for Python users) or executable jar (for Java users) must take two parameters: the grammar file and the sentence to parse (in quotes).
If you use Python, your code will be tested as:
python cky.py cfg.txt "A test sentence"
If you use Java, your code will be tested as:
java -cp yourname.jar cs1671.hw2.CKY cfg.txt "A test sentence"
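A minimal sketch of the argument handling for the Python version, assuming the sentence is split on whitespace; the function name is hypothetical. The example output in this assignment prints "I" while the grammar lists the terminal "i", so one reasonable choice (an assumption, not something the assignment states) is to keep each word as typed for printing and lowercase it only when matching it against terminals.

```python
import sys


def read_args(argv):
    """Split argv such as ['cky.py', 'cfg.txt', 'I prefer a flight'] into
    (grammar_path, words)."""
    if len(argv) != 3:
        sys.exit('usage: python cky.py cfg.txt "A test sentence"')
    grammar_path, sentence = argv[1], argv[2]
    # ASSUMPTION: whitespace tokenization. Words keep their original case for
    # printing; lowercase them (word.lower()) only when matching terminals.
    return grammar_path, sentence.split()
```

In cky.py this would be called as read_args(sys.argv).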
The output should be printed to the standard output stream. Print all of the parse trees for the sentence in the following bracket-based format:
[S [NP [Pronoun I]] [VP [Verb book] [NP [Det a] [Nominal [Noun flight]]] [PP [Preposition to] [NP [Proper-Noun houston]]]]]
(Copy and paste this string into mshang.ca/syntree to visualize it. You will find this tool very useful throughout this homework.)
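The bracket format is a direct recursive rendering of a tree. The sketch below assumes a hypothetical representation in which a tree is a (label, children) tuple, where children is a word for a preterminal and a list of subtrees otherwise; rendering the example tree reproduces the string above exactly.

```python
def to_brackets(tree):
    """Render a (label, children) tree as '[label child ...]'."""
    label, children = tree
    if isinstance(children, str):  # preterminal: [Noun flight]
        return f"[{label} {children}]"
    return f"[{label} " + " ".join(to_brackets(c) for c in children) + "]"


example = ("S", [
    ("NP", [("Pronoun", "I")]),
    ("VP", [
        ("Verb", "book"),
        ("NP", [("Det", "a"), ("Nominal", [("Noun", "flight")])]),
        ("PP", [("Preposition", "to"), ("NP", [("Proper-Noun", "houston")])]),
    ]),
])
print(to_brackets(example))
```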
For Python users, include:
- cky.py
- readme.txt, which includes:

For Java users, include:
- yourname.zip archive, which includes the Java source
- yourname.jar, which is compiled from your source code. The jar should have a main class named cs1671.hw2.CKY
- readme.txt, which includes:
The probabilistic grammar provided has rules such as VP -> Verb NP PP, which has more than two non-terminals on the right-hand side and therefore is not in CNF.
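Such rules can be binarized by introducing new non-terminals. Because the required output shows trees in terms of the original grammar (note the three-child VP in the example output), the new nodes have to be removed again before printing. The sketch below names each new symbol after its parent and the symbols it covers, for example VP|NP+PP; it assumes no original non-terminal contains "|" or "+", uses the same hypothetical (label, children) tree tuples as elsewhere, and the function names are hypothetical.

```python
MARK = "|"  # ASSUMPTION: no original non-terminal contains '|' or '+'


def binarize(rules):
    """Replace each rule with 3+ right-hand symbols by a chain of binary rules.

    A -> B C D  becomes  A -> B A|C+D  and  A|C+D -> C D.
    """
    out = set()
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new = lhs.split(MARK)[0] + MARK + "+".join(rhs[1:])
            out.add((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        out.add((lhs, rhs))
    return sorted(out)


def debinarize(tree):
    """Splice introduced X|... nodes back into their parents.

    A tree is (label, children): children is a word (str) or a list of trees.
    """
    label, children = tree
    if isinstance(children, str):
        return tree
    flat = []
    for child in map(debinarize, children):
        if MARK in child[0]:
            flat.extend(child[1])  # introduced node: keep only its children
        else:
            flat.append(child)
    return (label, flat)


for rule in binarize([("VP", ("Verb", "NP", "PP")), ("S", ("NP", "VP"))]):
    print(rule)
```

Naming new symbols by their content means that two rules with the same parent and the same remaining right-hand side share one new symbol, which keeps the binarized grammar small without changing the set of parses.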
S -> NP VP 1.0
PP -> P NP 1.0
VP -> V NP 0.7
VP -> VP PP 0.3
P -> with 1.0
V -> saw 1.0
NP -> NP PP 0.4
NP -> scientists 0.1
NP -> chins 0.18
NP -> saw 0.04
NP -> moons 0.18
NP -> telescopes 0.1
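The grammar above is already in CNF, which makes it a convenient test case for the core CKY loop. The sketch below drops the probabilities (the parser is non-probabilistic), keeps every tree for each non-terminal in each chart cell, and returns all parses rooted in S. The test sentence "scientists saw moons with telescopes" is an illustrative choice built from this lexicon; it has two parses, one attaching the PP to the NP and one to the VP.

```python
from collections import defaultdict

# The CNF grammar above, probabilities dropped
BINARY = [  # (lhs, left child, right child)
    ("S", "NP", "VP"), ("PP", "P", "NP"), ("VP", "V", "NP"),
    ("VP", "VP", "PP"), ("NP", "NP", "PP"),
]
LEXICAL = [  # (lhs, terminal)
    ("P", "with"), ("V", "saw"), ("NP", "scientists"), ("NP", "chins"),
    ("NP", "saw"), ("NP", "moons"), ("NP", "telescopes"),
]


def cky_parse(words, start="S"):
    """Return every parse tree of words as (label, children) tuples."""
    n = len(words)
    # chart[i][j] maps a non-terminal to the list of trees spanning words[i:j]
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for lhs, terminal in LEXICAL:
            if terminal == word:
                chart[i][i + 1][lhs].append((lhs, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for lhs, left, right in BINARY:
                    for lt in chart[i][k].get(left, []):
                        for rt in chart[k][j].get(right, []):
                            chart[i][j][lhs].append((lhs, [lt, rt]))
    return chart[0][n].get(start, [])


def to_brackets(tree):
    """Render a (label, children) tree in the assignment's bracket format."""
    label, children = tree
    if isinstance(children, str):
        return f"[{label} {children}]"
    return f"[{label} " + " ".join(to_brackets(c) for c in children) + "]"


for tree in cky_parse("scientists saw moons with telescopes".split()):
    print(to_brackets(tree))
```

Storing whole trees in each cell is simple but can grow exponentially on long, highly ambiguous sentences; storing backpointers (rule and split point) per cell and expanding them into trees only at the end is a common alternative.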