Homework 2 (CS 2731 / ISSP 2230)

Assigned: February 21, 2017

Due: March 2, 2017 (midnight)

2.1 HMM Decoding (Viterbi) (50 points)

Implement the Viterbi algorithm (Fig. 5.17 on page 147 in Jurafsky and Martin). An example model file for the HMM of Figure 6.3 is shown below (the comments are NOT part of the file); download it from here.
Alphabet
3				#number of alphabet symbols
1	2	3		#alphabet symbols, separated by '\t'

States
2				#number of states
HOT	COLD			#states, separated by '\t'

StartProbability		#prior probabilities of the states, the order is the same as before
0.8	0.2

TransitionProbability		#transition probability matrix T, N*N, where N is the number of states; T[i][j] = p(s_j|s_i)
0.7	0.3
0.4	0.6

EmissionProbability		#emission probability matrix E, N*M, where N is the number of states and M is the number of alphabet symbols; E[i][j] = p(a_j|s_i)
0.2	0.4	0.4
0.5	0.4	0.1
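As a point of reference, here is a minimal sketch of Viterbi decoding with the example model above hard-coded (HOT/COLD weather, observations 1-3). Your submission must read these values from the model file instead; the function and variable names here are my own, not part of the assignment.

```python
# Example model from the handout, hard-coded for illustration only.
states = ["HOT", "COLD"]
start = {"HOT": 0.8, "COLD": 0.2}
trans = {"HOT": {"HOT": 0.7, "COLD": 0.3},
         "COLD": {"HOT": 0.4, "COLD": 0.6}}
emit = {"HOT": {"1": 0.2, "2": 0.4, "3": 0.4},
        "COLD": {"1": 0.5, "2": 0.4, "3": 0.1}}

def viterbi(obs):
    # v[t][s] = probability of the best path ending in state s at time t
    v = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t
            best_prev, best_p = max(
                ((p, v[t - 1][p] * trans[p][s]) for p in states),
                key=lambda x: x[1])
            v[t][s] = best_p * emit[s][obs[t]]
            back[t][s] = best_prev
    # Backtrace from the most probable final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

path, prob = viterbi(["3", "1", "3"])
print(path, prob)
```

For the observation sequence 3 1 3, this model gives the state sequence HOT HOT HOT with path probability 0.012544.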

2.2 Probabilistic CKY Parsing (50 points)

Implement a probabilistic CKY parser.

Grammar File

The sample grammar file pcfg.txt (click here to download) is identical to the probabilistic L1 grammar from the textbook.

0.80 S -> NP VP
0.15 S -> Aux NP VP
0.05 S -> VP
0.35 NP -> Pronoun
0.30 NP -> Proper-Noun
0.20 NP -> Det Nominal
0.15 NP -> Nominal
0.75 Nominal -> Noun
0.20 Nominal -> Nominal Noun
0.05 Nominal -> Nominal PP
0.35 VP -> Verb
0.20 VP -> Verb NP
0.10 VP -> Verb NP PP
0.15 VP -> Verb PP
0.05 VP -> Verb NP NP
0.15 VP -> VP PP
1.0 PP -> Preposition NP
Det -> that [0.10] | a [0.30] | the [0.60]
Noun -> book [0.10] | flight [0.30] | meal [0.15] | money [0.05] | flights [0.40] | dinner [0.10]
Verb -> book [0.30] | include [0.30] | prefer [0.40]
Pronoun -> i [0.40] | she [0.05] | me [0.15] | you [0.40]
Proper-Noun -> houston [0.60] | twa [0.40]
Aux -> does [0.60] | can [0.40]
Preposition -> from [0.30] | to [0.30] | on [0.20] | near [0.15] | through [0.05]
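Note that the file mixes two line formats: phrasal rules with a leading probability ("0.80 S -> NP VP") and lexical rules with bracketed probabilities separated by "|". A parsing sketch, assuming the formats shown above (the function name and rule representation are my own):

```python
import re

def parse_grammar(lines):
    """Parse PCFG lines in either format from pcfg.txt:
    '0.80 S -> NP VP'  or  'Det -> that [0.10] | a [0.30] | the [0.60]'.
    Returns a list of (lhs, rhs_tuple, prob) triples."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "[" in line:
            # Lexical format: LHS -> word [p] | word [p] | ...
            lhs, rhs = line.split("->")
            lhs = lhs.strip()
            for alt in rhs.split("|"):
                m = re.match(r"\s*(\S+)\s*\[([\d.]+)\]", alt)
                rules.append((lhs, (m.group(1),), float(m.group(2))))
        else:
            # Phrasal format: p LHS -> RHS ...
            parts = line.split()
            prob, lhs = float(parts[0]), parts[1]
            assert parts[2] == "->"
            rules.append((lhs, tuple(parts[3:]), prob))
    return rules

print(parse_grammar(["0.80 S -> NP VP",
                     "Det -> that [0.10] | a [0.30] | the [0.60]"]))
```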

Binarization

The grammar provided has rules such as VP -> Verb NP PP with more than two non-terminals on the right-hand side. However, the CKY algorithm can only handle grammars in a binarized form such as CNF, so you must binarize the grammar before CKY decoding can run. Use the CNF conversion introduced in class, and be careful that binarization preserves the probability: equivalent derivations must have the same probability before and after binarization.
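One standard way to preserve probabilities is to keep the original probability on the final rule and give every newly introduced intermediate rule probability 1.0, so the product along any derivation is unchanged. A sketch under that convention (the intermediate symbol names like X0 are my own; this does not remove unit productions, which full CNF conversion also requires):

```python
def binarize(rules):
    """Binarize rules with more than two RHS symbols.
    Each introduced rule gets probability 1.0; the original
    probability stays on the remaining top rule, so the product
    of probabilities along a derivation is unchanged."""
    out = []
    fresh = 0
    for lhs, rhs, p in rules:
        while len(rhs) > 2:
            new_sym = f"X{fresh}"  # fresh intermediate nonterminal
            fresh += 1
            # Collapse the first two RHS symbols into the new symbol
            out.append((new_sym, rhs[:2], 1.0))
            rhs = (new_sym,) + rhs[2:]
        out.append((lhs, rhs, p))
    return out

print(binarize([("VP", ("Verb", "NP", "PP"), 0.10)]))
```

For VP -> Verb NP PP with probability 0.10, this yields X0 -> Verb NP [1.0] and VP -> X0 PP [0.10]; multiplying the two rule probabilities recovers the original 0.10.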

Input/Output Requirements

⚠️ Warning
Failing to conform to the input/output requirements will result in a 5-point deduction.

Your script (for Python users) or executable jar (for Java users) must take two parameters:

If you use Python, your code will be tested as:

python prob-cky.py pcfg.txt "A test sentence ."

If you use Java, your code will be tested as:

java -cp yourname.jar cs2731.hw2.ProbCKY pcfg.txt "A test sentence ."
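In either language, the two positional arguments are the grammar file path and the sentence as a single quoted string. A minimal Python entry-point sketch (the helper call is a placeholder, not a required interface):

```python
import sys

def main(argv):
    # argv[1] = grammar file path, argv[2] = sentence as one string
    grammar_path, sentence = argv[1], argv[2]
    tokens = sentence.split()  # e.g. ['A', 'test', 'sentence', '.']
    # ... load grammar_path, binarize, run probabilistic CKY on tokens ...
    return tokens

if __name__ == "__main__":
    main(sys.argv)
```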

The output should be printed to the standard output stream. Print the following information:

What to Include in Your Submission?

For Python users, include:

For Java users, include: