CS2750: Homework 4
Due: 4/16/2017, 11:59pm
Note: If you are asked to implement something yourself, it is not OK to use or even look at existing MATLAB or Python code, except for utility code. If you have questions about what you may use, ask the instructor or the TA.
Part I: Bayesian Belief Networks / Written Answers (50 points)
Part II: Hidden Markov Models for Part-of-Speech Tagging (50 points)
- [10 pts] Bishop Exercise 8.10 -- Note: Derive these conclusions mathematically, rather than using d-separation. For the second part, simplify the expression as much as possible before concluding that in general the statement does not hold.
- [15 pts] Bishop Exercise 8.11
- [10 pts] Bishop Exercise 13.3
- [15 pts] In this exercise, we'll do some cross-domain recommendation, where we assume that there is a correlation between a user's taste in music and film. We'll only consider one music genre, namely jazz (which we'll denote by J), and four films, "Waking Life" (denoted by W), "Borat" (denoted by B), "Cinema Paradiso" (denoted by C) and "Requiem for a Dream" (denoted by R). We'll assume that conditioned on whether the user likes jazz, the movie likes/dislikes are independent. The prior probability of liking jazz is 30%. We've defined the following (combined) conditional probability table, where "=1" means "likes". We are conditioning on the first column.
| J | W=1 | B=1 | C=1 | R=1 |
|---|-----|-----|-----|-----|
| T | 80% | 20% | 70% | 50% |
| F | 30% | 50% | 30% | 40% |
What is the probability the user likes jazz, given that she likes the first and fourth movies but dislikes the second and third?
How about the probability that the user likes jazz, given that she likes all the movies?
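You can check your written answers with a short program. Below is a minimal sketch of the posterior computation under the stated naive-Bayes assumption (the function name and the conversion of the table's percentages to probabilities are my own choices, not part of the assignment):

```python
def jazz_posterior(evidence, prior_j=0.30):
    """Posterior P(J=1 | evidence) under the naive Bayes assumption.

    evidence maps a movie initial ("W", "B", "C", "R") to 1 (likes) or 0 (dislikes).
    """
    # P(movie = 1 | J), rows J=T and J=F of the table above, as probabilities.
    p_like_given_j = {"W": 0.80, "B": 0.20, "C": 0.70, "R": 0.50}
    p_like_given_not_j = {"W": 0.30, "B": 0.50, "C": 0.30, "R": 0.40}

    # Accumulate P(J, evidence) and P(not J, evidence), then normalize.
    joint_j, joint_not_j = prior_j, 1.0 - prior_j
    for movie, liked in evidence.items():
        p1 = p_like_given_j[movie]
        p0 = p_like_given_not_j[movie]
        joint_j *= p1 if liked else (1.0 - p1)
        joint_not_j *= p0 if liked else (1.0 - p0)
    return joint_j / (joint_j + joint_not_j)

# First question: likes W and R, dislikes B and C.
print(jazz_posterior({"W": 1, "B": 0, "C": 0, "R": 1}))
# Second question: likes all four movies.
print(jazz_posterior({"W": 1, "B": 1, "C": 1, "R": 1}))
```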
We'll use the HMM from our in-class part-of-speech tagging example, whose states are PropNoun, Noun, Verb, Det. The transition probabilities are the same as in the example shown in class. The observation probabilities are defined as follows:
| State \ Observation | john | mary | cat | saw | ate | a | the |
|---|---|---|---|---|---|---|---|
| PropNoun | 0.40 | 0.40 | 0.10 | 0.01 | 0.05 | 0.03 | 0.01 |
| Noun | 0.25 | 0.05 | 0.30 | 0.25 | 0.05 | 0.05 | 0.05 |
| Verb | 0.04 | 0.05 | 0.04 | 0.45 | 0.40 | 0.01 | 0.01 |
| Det | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.45 | 0.50 |
- [20 pts] Write code to compute the probability of observing each of the following sentences, using the naive solution. Then pick some of the sentences and discuss which of them seem more likely than others, and whether what you observe makes sense. -- Some tips: You can map each word to a number that is its index into our vocabulary (the column headers above, excluding the first one); then a sentence is just a vector of numbers. It's fine to use existing code that enumerates all possible state sequences (the Cartesian product of the state set with itself, once per word).
- "john saw the cat." (or using our mapping to numbers, sent = [1 4 7 3];)
- "john ate."
- "john saw mary."
- "mary saw john."
- "cat saw the john."
- "john saw the saw."
- "john ate the cat."
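A minimal sketch of the naive (brute-force) solution is below. Note two assumptions: the start and transition probabilities (`PI`, `A`) are uniform placeholders because this handout does not reproduce the in-class values, so you must substitute the ones from class; and Python indexing is 0-based, whereas the `sent = [1 4 7 3]` mapping above is 1-based MATLAB style. The end-state transition is also omitted here.

```python
import itertools

# Vocabulary order matches the column headers of the observation table.
VOCAB = ["john", "mary", "cat", "saw", "ate", "a", "the"]
STATES = ["PropNoun", "Noun", "Verb", "Det"]

# Observation probabilities B[state][word], copied from the table above.
B = [
    [0.40, 0.40, 0.10, 0.01, 0.05, 0.03, 0.01],  # PropNoun
    [0.25, 0.05, 0.30, 0.25, 0.05, 0.05, 0.05],  # Noun
    [0.04, 0.05, 0.04, 0.45, 0.40, 0.01, 0.01],  # Verb
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.45, 0.50],  # Det
]

# PLACEHOLDERS: replace with the start/transition probabilities from class.
PI = [0.25, 0.25, 0.25, 0.25]        # P(first state)
A = [[0.25] * 4 for _ in range(4)]   # A[i][j] = P(next state j | state i)

def naive_sentence_prob(words):
    """P(observations): sum over all |STATES|^T state sequences."""
    obs = [VOCAB.index(w) for w in words]
    total = 0.0
    # itertools.product enumerates every ordered state sequence.
    for seq in itertools.product(range(len(STATES)), repeat=len(obs)):
        p = PI[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        total += p
    return total

print(naive_sentence_prob("john saw the cat".split()))
```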
- [20 pts] Now write code to do the same task, but using the efficient (forward-algorithm) solution discussed in class. If you include the probability of transferring to the end state in the naive solution, make sure you also include it in the efficient solution. Verify that your efficient solution and your naive solution produce the same answers.
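The efficient solution replaces the exponential enumeration with the forward recursion, alpha_t(j) = [sum_i alpha_{t-1}(i) * A[i][j]] * B[j][o_t], which is linear in sentence length. A self-contained sketch follows; as with the naive version, `PI` and `A` are uniform placeholders for the in-class values, and the end-state transition is omitted:

```python
VOCAB = ["john", "mary", "cat", "saw", "ate", "a", "the"]

# Observation probabilities B[state][word] from the table above
# (rows: PropNoun, Noun, Verb, Det).
B = [
    [0.40, 0.40, 0.10, 0.01, 0.05, 0.03, 0.01],
    [0.25, 0.05, 0.30, 0.25, 0.05, 0.05, 0.05],
    [0.04, 0.05, 0.04, 0.45, 0.40, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.45, 0.50],
]

# PLACEHOLDERS: substitute the start/transition probabilities from class.
PI = [0.25, 0.25, 0.25, 0.25]
A = [[0.25] * 4 for _ in range(4)]

def forward_sentence_prob(words):
    """P(observations) via the forward algorithm."""
    obs = [VOCAB.index(w) for w in words]
    n = len(PI)
    # alpha[j] = P(o_1 .. o_t, state_t = j); initialize at t = 1.
    alpha = [PI[j] * B[j][obs[0]] for j in range(n)]
    for t in range(1, len(obs)):
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ]
    return sum(alpha)

print(forward_sentence_prob("john saw the cat".split()))
```

A quick sanity check is to run both implementations on every test sentence and confirm the probabilities agree to within floating-point tolerance.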
- [10 pts] Carry out the computations for the efficient solution by hand, just for the sentence "john ate.", and show your work. Then check your answer against the answer from your program.