(a) Jurafsky and Martin, 19.6-19.7 (p. 635), but just for the verb "buy". Your corpus should be big enough to verify that your selectional restrictions work on several examples, and should also contain examples illustrating where they fall apart. Collect your corpus from any source you like (e.g. online newspapers, the web, online corpora).
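A selectional-restriction check of the kind this exercise asks for can be sketched as follows. The mini hypernym hierarchy and the restriction ("buy" takes a concrete artifact as its direct object) are illustrative assumptions, not part of any particular corpus; in a real solution you would consult the WordNet hierarchy instead of a hand-built dictionary.

```python
# Toy selectional-restriction checker for the verb "buy".
# HYPERNYMS is a hand-built stand-in for WordNet's hypernym chains.
HYPERNYMS = {
    "car": "artifact",
    "book": "artifact",
    "house": "artifact",
    "artifact": "entity",
    "time": "abstraction",
    "idea": "abstraction",
    "abstraction": "entity",
}

def is_a(word, category):
    """Walk the toy hypernym chain from word up to the root."""
    while word is not None:
        if word == category:
            return True
        word = HYPERNYMS.get(word)
    return False

def satisfies_buy_restriction(obj):
    """Assume (illustratively) that the object of "buy" must be an artifact."""
    return is_a(obj, "artifact")

print(satisfies_buy_restriction("car"))   # True
print(satisfies_buy_restriction("idea"))  # False: metaphorical "buy an idea"
```

Sentences like "I don't buy that idea" are exactly the kind of example where a restriction like this falls apart, which the exercise asks you to illustrate.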
(b) Jurafsky and Martin, 20.1-20.3 (p. 679), but using the WordNet sense inventory and only the following sentence as your corpus: At 05:20:59 GMT this morning, the Echostar XI satellite was successfully launched.
Use the one-sense-per-collocation heuristic to automatically generate the initial seeds (by choosing a single collocation for each sense). Pick 5 examples of each sense to create the seeds, as in Figure 20.5 on page 651. Discuss how accurate this heuristic was. Once you have your seed set, manually build a decision list and try it on the rest of the corpus. Discuss how well it does (a qualitative discussion is fine). You can stop after this one cycle.
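A one-cycle decision list can be sketched as below. The collocation features and the two senses of "launch" are illustrative assumptions; in your solution, the rules would come from the seeds you chose, ordered by strength of evidence as in Yarowsky-style training.

```python
# Minimal decision-list WSD sketch: ordered (collocation, sense) rules,
# strongest evidence first. The rules here are made-up placeholders.
DECISION_LIST = [
    ("rocket", "launch_spacecraft"),
    ("satellite", "launch_spacecraft"),
    ("product", "launch_introduce"),
    ("campaign", "launch_introduce"),
]

def classify(context_words, default="launch_spacecraft"):
    """Return the sense of the first rule whose collocation appears in context."""
    for collocation, sense in DECISION_LIST:
        if collocation in context_words:
            return sense
    return default

print(classify({"the", "satellite", "was", "successfully", "launched"}))
```

Because only the first matching rule fires, adding a rule near the top can change many decisions at once; that sensitivity is worth noting in your qualitative discussion.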
(a) Assume we have a corpus of 1000 words and the following WordNet Hierarchy:
Now assume we collect the following count data for each of the words:
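Since the hierarchy and count table are given in the problem handout rather than reproduced here, the computation they feed can be illustrated with toy numbers. The sketch below computes Resnik-style information content, IC(c) = -log2 P(c), where P(c) is the total count of words subsumed by concept c divided by the corpus size N = 1000 from the problem statement; the hierarchy and counts are made-up placeholders.

```python
import math

N = 1000  # total corpus size, from the problem statement
# Placeholder leaf counts and hierarchy, standing in for the handout's figures.
COUNTS = {"salmon": 10, "trout": 5, "robin": 15}
CHILDREN = {"animal": ["fish", "bird"], "fish": ["salmon", "trout"], "bird": ["robin"]}

def subsumed_count(concept):
    """Sum the leaf counts of every word subsumed by a concept."""
    if concept in COUNTS:
        return COUNTS[concept]
    return sum(subsumed_count(c) for c in CHILDREN.get(concept, []))

def information_content(concept):
    """IC(c) = -log2(P(c)), with P(c) = subsumed count / N."""
    return -math.log2(subsumed_count(concept) / N)

print(round(information_content("fish"), 2))    # P(fish) = 15/1000 -> 6.06
print(round(information_content("animal"), 2))  # P(animal) = 30/1000 -> 5.06
```

Note that IC decreases as you move up the hierarchy, since a higher concept subsumes more counts and so has higher probability.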
(b) Assume we have the following co-occurrence vectors for the words "fish" and "bird":
Each number above is the count for the context relation shown to its left. Assuming that counts not listed are 0, calculate the cosine similarity (using counts, not PMI) for "fish" and "bird". Round your answer to two decimal places (e.g., 0.35).
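The computation the exercise asks for can be sketched as below. The vectors here are placeholder numbers, not the assignment's actual co-occurrence counts; substitute the values from the table, and note that dictionary lookups with a default of 0 handle the "counts not listed are 0" assumption.

```python
import math

# Placeholder sparse count vectors; replace with the table's values.
fish = {"eat-obj": 3, "swim-subj": 5}
bird = {"eat-obj": 2, "fly-subj": 4}

def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| |v|), treating missing counts as 0."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(round(cosine(fish, bird), 2))  # 6 / (sqrt(34) * sqrt(20)) -> 0.23
```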
(a) (9 points) Jurafsky and Martin, 23.1 (p. 810), but use only semantic role labeling as the basis of your approach to a more intelligent system.
(b) (8 points) Jurafsky and Martin, 22.3 (p. 763)
(c) (8 points) Jurafsky and Martin, 22.4 (p. 763)