# Homework 2 (CS 2731 / ISSP 2230)

## 2.1 HMM Decoding (Viterbi) (50 points)

Implement the Viterbi algorithm (Fig. 5.17 on page 147 in Jurafsky and Martin).
• (35 points) Demonstrate the correctness of your implementation by running it with the HMM in Figure 6.3 on page 178 to compute the most likely weather sequence for the following observation sequence: 331122313
• (15 points) We will also run blind tests on your implementation, including the same HMM with a new observation sequence, and a new HMM.
• Your program should accept two command-line parameters: the first is the filename of the HMM model, and the second is the observation sequence to decode. It should print the decoded result (the sequence of hidden states) to standard output. For example:

./hmm_decode model.txt 331122313
An example model file for the HMM in Figure 6.3 is shown below. The comments after # are for explanation only and are NOT part of the file. The file itself is available for download.
Alphabet
3				#number of symbols in the alphabet
1	2	3		#symbols, separated by '\t'

States
2				#number of states
HOT	COLD			#states, split by '\t'

StartProbability		#prior probabilities of the states, the order is the same as before
0.8	0.2

TransitionProbability		#transition probability matrix(T), N*N, N is the number of states, T[i][j] = p(s_j|s_i)
0.7	0.3
0.4	0.6

EmissionProbability		#emission probability matrix (E), N*M, where N is the number of states and M is the number of symbols in the alphabet; E[i][j] = p(a_j|s_i)
0.2	0.4	0.4
0.5	0.4	0.1


## 2.2 Probabilistic CKY Parsing (50 points)

Implement a probabilistic CKY parser.

• (35 points) Demonstrate the correctness of your implementation by running it with the grammar below and the input "The flight includes a meal."
• (15 points) We will also test your implementation on other blind tests.

#### Grammar File

The provided sample grammar file, pcfg.txt (available for download), is exactly the same as the probabilistic $\mathcal{L}_1$ grammar from the textbook.

0.80 S -> NP VP
0.15 S -> Aux NP VP
0.05 S -> VP
0.35 NP -> Pronoun
0.30 NP -> Proper-Noun
0.20 NP -> Det Nominal
0.15 NP -> Nominal
0.75 Nominal -> Noun
0.20 Nominal -> Nominal Noun
0.05 Nominal -> Nominal PP
0.35 VP -> Verb
0.20 VP -> Verb NP
0.10 VP -> Verb NP PP
0.15 VP -> Verb PP
0.05 VP -> Verb NP NP
0.15 VP -> VP PP
1.0 PP -> Preposition NP
Det -> that [0.10] | a [0.30] | the [0.60]
Noun -> book [0.10] | flight [0.30] | meal [0.15] | money [0.05] | flights [0.40] | dinner [0.10]
Verb -> book [0.30] | include [0.30] | prefer [0.40]
Pronoun -> i [0.40] | she [0.05] | me [0.15] | you [0.40]
Proper-Noun -> houston [0.60] | twa [0.40]
Aux -> does [0.60] | can [0.40]
Preposition -> from [0.30] | to [0.30] | on [0.20] | near [0.15] | through [0.05]
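
Note that the file mixes two line formats: phrase rules put the probability first (`0.80 S -> NP VP`), while lexical rules list alternatives separated by `|` with the probability in square brackets. One possible way to read both into a uniform list of `(lhs, rhs, prob)` triples is sketched below. The function name `parse_pcfg` and the triple representation are assumptions made for illustration, not a required design.

```python
SAMPLE = """0.80 S -> NP VP
0.10 VP -> Verb NP PP
1.0 PP -> Preposition NP
Det -> that [0.10] | a [0.30] | the [0.60]
Proper-Noun -> houston [0.60] | twa [0.40]
"""


def parse_pcfg(text):
    """Return a list of (lhs, rhs_tuple, probability) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        left, right = line.split("->")
        left_tokens = left.split()
        if len(left_tokens) == 2:
            # Phrase rule, e.g. "0.80 S -> NP VP"
            prob, lhs = float(left_tokens[0]), left_tokens[1]
            rules.append((lhs, tuple(right.split()), prob))
        else:
            # Lexical rule, e.g. "Det -> that [0.10] | a [0.30]"
            lhs = left_tokens[0]
            for alternative in right.split("|"):
                word, prob = alternative.split()
                rules.append((lhs, (word,), float(prob.strip("[]"))))
    return rules


for rule in parse_pcfg(SAMPLE):
    print(rule)
```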

#### Binarization

The grammar contains rules such as VP -> Verb NP PP, which have more than two non-terminals on the right-hand side. The CKY algorithm, however, only handles grammars in a binarized form such as Chomsky normal form (CNF), so you must binarize the grammar before running CKY. Use the CNF conversion introduced in class. Take care that binarization preserves probabilities: each original rule and its binarized equivalent must have the same probability.
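
As an illustration of the probability-preserving requirement, the sketch below splits long right-hand sides only. It is one common scheme and may differ from the conversion taught in class. It assumes rules are `(lhs, rhs, prob)` triples, and the helper symbols `X1`, `X2`, ... are hypothetical names. Each helper rule gets probability 1.0 and the original probability stays on the top rule, so the product over the new rules equals the original rule's probability. Unit productions such as NP -> Pronoun are not handled here.

```python
def binarize(rules):
    """Split rules with more than two right-hand-side symbols.

    A rule A -> B C D [p] becomes X1 -> B C [1.0] and A -> X1 D [p],
    so the product of the new probabilities is still p.
    """
    out = []
    counter = 0
    for lhs, rhs, prob in rules:
        while len(rhs) > 2:
            counter += 1
            helper = f"X{counter}"                  # hypothetical fresh symbol
            out.append((helper, rhs[:2], 1.0))      # helper rule carries no mass
            rhs = (helper,) + rhs[2:]
        out.append((lhs, rhs, prob))                # original probability kept here
    return out


long_rules = [("VP", ("Verb", "NP", "PP"), 0.10),
              ("VP", ("Verb", "NP"), 0.20)]
for rule in binarize(long_rules):
    print(rule)
```

If helper symbols are introduced this way, they would normally be removed again when printing parse trees, so that the output uses only the symbols of the original grammar.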

#### Input/Output Requirements

⚠️ Warning
Failing to conform to the input/output requirements will result in a 5-point deduction.

Your script (for Python users) or executable jar (for Java users) must take two parameters:

• The grammar file
• The sentence to parse (which will be surrounded by double quotes)

If you use Python, your code will be tested as:

python prob-cky.py pcfg.txt "A test sentence ."

If you use Java, your code will be tested as:

java -cp yourname.jar cs2731.hw2.ProbCKY pcfg.txt "A test sentence ."

The output should be printed to the standard output stream. Print the following information:

• What is the probability of the sentence? Just print the number.
• What are the parse trees for the sentence? Print each tree in the s-expression format. Also print the probability of each tree immediately below the tree. The s-expression is a bracket-based format for parse trees. An example:

Example
[S [NP [Pronoun I]] [VP [Verb book] [NP [Det a] [Nominal [Noun flight]]] [PP [Preposition to] [NP [Proper-Noun houston]]]]]

(Copy and paste this string into mshang.ca/syntree to visualize it. You will find this tool very useful throughout this homework.)
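
The bracket format can be produced with a short recursive function. The sketch below assumes a tree is stored as a nested tuple `(label, child, ...)` with words as plain strings. That representation and the name `to_sexpr` are illustrative choices, not requirements.

```python
def to_sexpr(tree):
    """Render a nested-tuple tree in the bracket format shown above."""
    if isinstance(tree, str):  # a leaf is just the word itself
        return tree
    label, *children = tree
    return "[" + " ".join([label] + [to_sexpr(c) for c in children]) + "]"


example = ("S",
           ("NP", ("Pronoun", "I")),
           ("VP",
            ("Verb", "book"),
            ("NP", ("Det", "a"), ("Nominal", ("Noun", "flight"))),
            ("PP", ("Preposition", "to"), ("NP", ("Proper-Noun", "houston")))))

print(to_sexpr(example))
```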

### What to Include in Your Submission

For Python users, include:

• The python source file: prob-cky.py.
• A readme.txt which includes:
  • Python version (2 or 3)
  • How did you do binarization?
  • Any known issues that prevent your script from running.

For Java users, include:

• A yourname.zip archive which includes the Java source.
• A yourname.jar which is compiled from your source code. The jar should have a main class named cs2731.hw2.ProbCKY.
• A readme.txt which includes:
  • How did you do binarization?
  • Any known issues that prevent your code from running.