HOMEWORK 3 (CS 2731 / ISSP 2230)

Assigned: March 21, 2017

Due: April 4, 2017

  1. (25 points)

    (a) Jurafsky and Martin, 19.6-19.7 (p. 635), but just for the verb "buy". Your corpus should be big enough to verify that your selectional restrictions work on several examples, and should also contain examples illustrating where they fall apart. Collect your corpus from any source you like (e.g. online newspapers, the web, online corpora).

    (b) Jurafsky and Martin, 20.1-20.3 (p. 679), but using the WordNet sense inventory and only the following as your corpus: At 05:20:59 GMT this morning, the Echostar XI satellite was successfully launched.

  2. (25 points) Use the "one sense per collocation" bootstrapping approach to seed a training set with 5 sentences (from a corpus of your choosing), for TWO of the following WordNet senses of the noun "racket":

    1. a loud and disturbing noise
    2. an illegal enterprise (such as extortion or fraud or drug peddling or prostitution) carried on for profit
    3. a sports implement (usually consisting of a handle and an oval frame with a tightly interlaced network of strings) used to strike a ball (or shuttlecock) in various games

    Use the one sense per collocation heuristic to generate the initial seeds automatically, choosing a single collocation for each sense. Pick 5 examples of each sense to create the seeds, as in Figure 20.5 on page 651, and discuss how accurate the heuristic was. Once you have your seed set, manually build a decision list and try it on the rest of the corpus. Discuss how well it does (a qualitative discussion is fine). You can stop after this one cycle.
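
    The seeding step can be sketched roughly as follows. This is only an illustration: the collocates "tennis" and "extortion", the sense labels, the 5-token window, and the function names are all assumptions made for the example, not required choices.

```python
import re

# Hypothetical seed collocations, one per sense (chosen only for illustration).
SEED_COLLOCATIONS = {
    "sports_implement": "tennis",
    "illegal_enterprise": "extortion",
}


def seed_label(sentence, target="racket", window=5):
    """Return the sense whose seed collocate occurs within `window` tokens
    of the target word, or None if no sense (or more than one) matches."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, t in enumerate(tokens) if t in (target, target + "s")]
    matches = set()
    for i in positions:
        context = tokens[max(0, i - window): i + window + 1]
        for sense, collocate in SEED_COLLOCATIONS.items():
            if collocate in context:
                matches.add(sense)
    # Ambiguous or unmatched sentences stay unlabeled.
    return matches.pop() if len(matches) == 1 else None


def build_seeds(sentences, per_sense=5):
    """Collect up to `per_sense` automatically labeled seed sentences per sense."""
    seeds = {sense: [] for sense in SEED_COLLOCATIONS}
    for sentence in sentences:
        sense = seed_label(sentence)
        if sense is not None and len(seeds[sense]) < per_sense:
            seeds[sense].append(sentence)
    return seeds
```

    Sentences that the heuristic leaves unlabeled form the residual set on which the hand-built decision list is then tried.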

  3. (25 points)

    (a) Assume we have a corpus of 1000 words and the following WordNet Hierarchy:

    Now assume we collect the following count data for each of the words:

    Assuming that all other words do not appear in the corpus, what is sim_resnik(J,P)? You can leave log in your answer rather than reduce it to a number.
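
    As a reminder, the Resnik measure as it is usually defined (for example in Jurafsky and Martin) scores two concepts by the information content of their lowest common subsumer (LCS). Here N is taken to be the corpus size and words(c) the set of words subsumed by concept c; the symbols are the textbook's, not defined in this handout.

```latex
\[
\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) = -\log P\bigl(\mathrm{LCS}(c_1, c_2)\bigr),
\qquad
P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}
\]
```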

    (b) Assume we have the following co-occurrence vectors for the words "fish" and "bird":

    subj-of-A 3
    mod-of-B 2
    obj-of-B 4
    mod-of-C 2

    subj-of-A 3
    subj-of-D 1
    mod-of-C 4

    Each number above is the count for the context relation to its left. Assuming that counts not listed are 0, calculate the cosine similarity (using counts, not PMI) for "fish" and "bird". Round your answer to two decimal places (e.g. 0.35).
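
    For reference, cosine similarity over sparse count vectors can be sketched as below. The words "cat" and "dog", their features, and their counts are made up for illustration and are not the vectors from this problem.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors stored as dicts.
    Features missing from a dict are treated as having count 0."""
    dot = sum(count * v.get(feature, 0) for feature, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical vectors, not the ones given in the problem.
cat = {"subj-of-X": 1, "obj-of-Y": 2}
dog = {"subj-of-X": 2, "mod-of-Z": 1}
print(round(cosine_similarity(cat, dog), 2))  # 0.4
```

    Only features shared by both vectors contribute to the dot product, while every feature contributes to its own vector's length.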

  4. (25 points)

    (a) (9 points) Jurafsky and Martin, 23.1 (p. 810), but use only semantic role labeling as the basis of your approach to a more intelligent system.

    (b) (8 points) Jurafsky and Martin, 22.3 (p. 763)

    (c) (8 points) Jurafsky and Martin, 22.4 (p. 763)