Chapter 20 Part 2 Study Guide Solutions
=======================================

=====For this question, we'll use these senses from WordNet:

S: (v) attack, assail (launch an attack or assault on; begin hostilities or start warfare with) "Hitler attacked Poland on September 1, 1939 and started World War II"; "Serbian forces assailed Bosnian towns all week"
S: (v) attack, round, assail, lash out, snipe, assault (attack in speech or writing) "The editors of the left-leaning paper attacked the new House Speaker"
S: (v) attack, aggress (take the initiative and go on the offensive) "The Serbs attacked the village at night"; "The visiting team started to attack"
S: (v) assail, assault, set on, attack (attack someone physically or emotionally) "The mugger assaulted the woman"; "Nightmares assailed him regularly"
S: (v) attack (set to work upon; turn one's energies vigorously to a task) "I attacked the problem as soon as I got out of bed"
S: (v) attack (begin to injure) "The cancer cells are attacking his liver"; "Rust is attacking the metal"

S: (v) accuse, impeach, incriminate, criminate (bring an accusation against; level a charge against) "The neighbors accused the man of spousal abuse"
S: (v) charge, accuse (blame for, make a claim of wrongdoing or misbehavior against) "he charged the director with indifference"

S: (v) engage, wage (carry on (wars, battles, or campaigns)) "Napoleon and Hitler waged war against all of Europe"

S: (n) class, category, family (a collection of things sharing a common attribute) "there are two classes of detergents"
S: (n) class, form, grade, course (a body of students who are taught together) "early morning classes are always sleepy"
S: (n) class, stratum, social class, socio-economic class (people having the same social, economic, or educational status) "the working class"; "an emerging professional class"
S: (n) course, course of study, course of instruction, class (education imparted in a series of lessons or meetings) "he took a course in basket weaving"; "flirting is not unknown in college classes"
S: (n) class, division (a league ranked by quality) "he played baseball in class D for two years"; "Princeton is in the NCAA Division 1-AA"
S: (n) class, year (a body of students who graduate together) "the class of '97"; "she was in my year at Hoehandle High"
S: (n) class ((biology) a taxonomic group containing one or more orders)
S: (n) class (elegance in dress or behavior) "she has a lot of class"

S: (n) war, warfare (the waging of armed conflict against an enemy) "thousands of people were killed in the war"
S: (n) war, warfare (an active struggle between competing entities) "a price war"; "a war of wits"; "diplomatic warfare"

Q1. Suppose we are using Lesk algorithms to disambiguate the meanings of "attack", "accuse", "wage" (listed above under "engage, wage"), "class", and "warfare" in the sentence:

"The Republicans attacked Obama accusing him of waging class warfare"

Q1.a: To apply the basic Lesk algorithm, you need to measure overlap between all combinations of all senses of all 5 words. How many measurements are needed in this case?

6 senses of 'attack' * 2 senses of 'accuse' * 1 sense of 'wage' * 8 senses of 'class' * 2 senses of 'war' = 192 comparisons.

Note that even though "wage" has only one sense, you want to include it, because its definition could help disambiguate the other words.

Extra information: How you would implement it: create a hash table for each combination of senses. Its keys are all the words that appear at least once in any of the definitions in that combination. The value for a word is the total number of times that word appears across the definitions in the combination. At the end, score each hash table by its repeated words (the values greater than 1), and assign the senses in the combination with the highest score.

For example, consider the combination of the first sense of each of the words.
The gloss is the part in the ().

S: (v) attack, assail (launch an attack or assault on; begin hostilities or start warfare with) "Hitler attacked Poland on September 1, 1939 and started World War II"; "Serbian forces assailed Bosnian towns all week"
S: (v) accuse, impeach, incriminate, criminate (bring an accusation against; level a charge against) "The neighbors accused the man of spousal abuse"
S: (v) engage, wage (carry on (wars, battles, or campaigns)) "Napoleon and Hitler waged war against all of Europe"
S: (n) class, category, family (a collection of things sharing a common attribute) "there are two classes of detergents"
S: (n) war, warfare (the waging of armed conflict against an enemy) "thousands of people were killed in the war"

Ignoring stopwords (and using stems/lemmas), and counting only the words in the glosses:

ct[launch] = 1
ct[attack] = 1
ct[assault] = 1
ct[begin] = 1
ct[hostility] = 1
ct[start] = 1
ct[warfare] = 1
ct[bring] = 1
ct[accusation] = 1
ct[against] = 3
ct[level] = 1
ct[charge] = 1
ct[carry] = 1
ct[war] = 1
ct[battle] = 1
ct[campaign] = 1
ct[collection] = 1
ct[thing] = 1
ct[share] = 1
ct[common] = 1
ct[attribute] = 1
ct[wage] = 1
ct[armed] = 1
ct[conflict] = 1
ct[enemy] = 1

For this combination, we have only one word that repeats: "against". So, the value is not high.

Now, go on and consider the other 191 combinations. When you are done, you choose the highest-scoring combination and assign those senses to the words.

Q1.b: To apply the simplified Lesk algorithm, you need to measure overlap between each sense of each word and the words in the sentence. How many measurements are needed in this case?

There are 6 words in the sentence that are in WordNet and that are not stopwords. So, we have 6 * (6 + 2 + 8 + 2) = 108.

Note that I left out "wage" because it has only one sense, and simplified Lesk considers each sense separately.

Extra information: For example, to disambiguate "attack": for each sense, count the overlap between its definition and the sentence, and assign the sense with the highest score.
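The per-sense overlap count just described can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the stopword list, the lemma table, and the names `content_words` and `simplified_lesk` are made up for illustration, and each sense's "definition" is taken to be its gloss plus its example sentences. A real system would use a proper lemmatizer and stopword list.

```python
import re

# Tiny illustrative stopword list and lemma table (both are assumptions).
STOPWORDS = {"the", "a", "an", "of", "or", "on", "in", "to", "and",
             "with", "him", "his", "i", "is", "are", "as", "at"}
LEMMAS = {"republicans": "republican", "attacked": "attack",
          "attacking": "attack", "accusing": "accuse", "waging": "wage"}

def content_words(text):
    """Lowercase, tokenize, drop stopwords, and map each word to its lemma."""
    words = re.findall(r"[a-z']+", text.lower())
    return {LEMMAS.get(w, w) for w in words if w not in STOPWORDS}

def simplified_lesk(target, sentence, signatures):
    """Return (index of the best sense, overlap score of every sense)."""
    # The target word itself is not counted as context.
    context = content_words(sentence) - {target}
    scores = [len(context & content_words(sig)) for sig in signatures]
    # max() returns the first maximal index, so the earliest sense wins ties.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

SENTENCE = "The Republicans attacked Obama accusing him of waging class warfare"

# Gloss plus examples for the six senses of "attack" listed in this guide.
ATTACK_SIGNATURES = [
    "launch an attack or assault on; begin hostilities or start warfare with "
    "Hitler attacked Poland on September 1, 1939 and started World War II "
    "Serbian forces assailed Bosnian towns all week",
    "attack in speech or writing "
    "The editors of the left-leaning paper attacked the new House Speaker",
    "take the initiative and go on the offensive "
    "The Serbs attacked the village at night The visiting team started to attack",
    "attack someone physically or emotionally "
    "The mugger assaulted the woman Nightmares assailed him regularly",
    "set to work upon; turn one's energies vigorously to a task "
    "I attacked the problem as soon as I got out of bed",
    "begin to injure "
    "The cancer cells are attacking his liver Rust is attacking the metal",
]

best, scores = simplified_lesk("attack", SENTENCE, ATTACK_SIGNATURES)
print("best sense:", best + 1, "scores:", scores)
```

On this data the only overlap is "warfare" in the first sense, so the sketch picks sense 1, which is the same result as working the senses one by one by hand.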
Here's the sentence again, for reference:

"The Republicans attacked Obama accusing him of waging class warfare"

FIRST SENSE:
S: (v) attack, assail (launch an attack or assault on; begin hostilities or start warfare with) "Hitler attacked Poland on September 1, 1939 and started World War II"; "Serbian forces assailed Bosnian towns all week"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 1 !! "warfare" is in both the sentence and the definition.

SECOND SENSE:
S: (v) attack, round, assail, lash out, snipe, assault (attack in speech or writing) "The editors of the left-leaning paper attacked the new House Speaker"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 0

THIRD SENSE:
S: (v) attack, aggress (take the initiative and go on the offensive) "The Serbs attacked the village at night"; "The visiting team started to attack"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 0

FOURTH SENSE:
S: (v) assail, assault, set on, attack (attack someone physically or emotionally) "The mugger assaulted the woman"; "Nightmares assailed him regularly"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 0

FIFTH SENSE:
S: (v) attack (set to work upon; turn one's energies vigorously to a task) "I attacked the problem as soon as I got out of bed"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 0

SIXTH SENSE:
S: (v) attack (begin to injure) "The cancer cells are attacking his liver"; "Rust is attacking the metal"
ct[republican] = 0
ct[attack] No need to count this one; this is the target word
ct[accuse] = 0
ct[wage] = 0
ct[class] = 0
ct[warfare] = 0

Assign the sense with the highest count, which is SENSE 1. (Which isn't correct; the second sense is!)

Now, you are done disambiguating "attack". Go do the same for each of the other words.

Note: some of the words in the sentence are similar to words in the definition. For example, in SENSE ONE, "launch" and "wage" are fairly similar. So, another type of Lesk algorithm would consider similarity, not requiring exact matches.

Q1.c: What are the correct senses for the words in this sentence?

"The Republicans attacked Obama accusing him of waging class warfare"

S: (v) attack, round, assail, lash out, snipe, assault (attack in speech or writing) "The editors of the left-leaning paper attacked the new House Speaker"
S: (v) accuse, impeach, incriminate, criminate (bring an accusation against; level a charge against) "The neighbors accused the man of spousal abuse"
S: (v) engage, wage (carry on (wars, battles, or campaigns)) "Napoleon and Hitler waged war against all of Europe"
S: (n) class, stratum, social class, socio-economic class (people having the same social, economic, or educational status) "the working class"; "an emerging professional class"
S: (n) war, warfare (an active struggle between competing entities) "a price war"; "a war of wits"; "diplomatic warfare"

=====Simulated Annealing to carry out the Lesk Algorithm:

You don't need to know the simulated annealing algorithm itself. But, you should know, for the application of simulated annealing to word-sense disambiguation given in lecture:
- what is a state?
- what are the neighbors of a state?
- how is the objective function defined?

=====Acquiring Selectional Preferences

Q1: Describe in English the bottom two lines and give an example (of a C2, W1, and R).

We calculate count(W1,C2,R) by counting how often any word that has C2 as an ancestor in WordNet appears as the R of W1 (that is, in relation R to W1).
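The counting rule can be sketched as follows. Everything concrete here is a hypothetical stand-in: the toy corpus of (W1, R, w2) triples, the relation labels "dobj" and "subj", and the hand-written `ANCESTORS` table that plays the role of WordNet's hypernym hierarchy. The function names `count` and `cond_prob` are likewise chosen for illustration.

```python
# Hypothetical ancestor table: for each noun, the concepts that are an
# ancestor of at least one of its senses (a stand-in for WordNet).
ANCESTORS = {
    "truck": {"vehicle", "artifact"},
    "car": {"vehicle", "artifact"},
    "sled": {"vehicle", "artifact"},
    "nail": {"artifact"},
    "bargain": {"agreement"},
}

# Hypothetical parsed corpus: (W1, R, w2) triples.
CORPUS = [
    ("drive", "dobj", "truck"),
    ("drive", "dobj", "car"),
    ("drive", "dobj", "car"),
    ("drive", "dobj", "sled"),
    ("drive", "dobj", "nail"),
    ("drive", "dobj", "bargain"),
    ("drive", "subj", "truck"),
    ("wash", "dobj", "car"),
]

def count(w1, c2, r, corpus=CORPUS):
    """How often a word with ancestor c2 appears in relation r to w1."""
    return sum(
        1
        for head, rel, w2 in corpus
        if head == w1 and rel == r and c2 in ANCESTORS.get(w2, set())
    )

def cond_prob(c2, w1, r, corpus=CORPUS):
    """count(w1, c2, r) divided by how often w1 occurs with any r at all."""
    total = sum(1 for head, rel, _ in corpus if head == w1 and rel == r)
    return count(w1, c2, r, corpus) / total if total else 0.0

print(count("drive", "vehicle", "dobj"))      # truck + car + car + sled
print(cond_prob("vehicle", "drive", "dobj"))  # 4 of the 6 direct objects
```

Note that a word contributes to every concept above it, so "car" adds to both the "vehicle" count and the "artifact" count in this sketch.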
Extra information: For example, count("drive","vehicle",direct object) is the sum, over every word w2 in the corpus that has at least one sense with ancestor "vehicle", of the number of times w2 is the direct object of "drive" in the corpus. For example, "bumper car", "truck", "car", "sled", etc., all have senses with "vehicle" as an ancestor. Add up #"drive bumper car" + #"drive truck" + #"drive sled" + ... to get the count. To get the conditional probability, divide by the number of times "drive" appears with a direct object in the corpus.

Q2: Now, consider the A score. Overall, what does it measure?

A(W1,C2,R) measures how much W1 "prefers" its R to be a word that has a sense with ancestor C2. E.g., how much does "drive" prefer its direct object to be a vehicle?

Q3: Give the basic idea of the calculation. You will not be asked to reproduce it, or to apply it to a specific example (which would be complicated to do by hand).

First, consider the numerator. It is one term of the "KL divergence", which measures how different two probability distributions are. The fraction in the numerator compares P(C2|W1,R) to P(C2). The more different they are, the greater the value. In our example, we are comparing the probability of seeing a vehicle GIVEN that the word is the direct object of "drive" with the simple probability of seeing a vehicle (that any given word is "bumper car", or "truck", or "sled", ...). The greater the difference, the more "drive"/direct object selects for or prefers vehicles. This intuition is what you need to know.

As for specifics (for those who want to go further), the numerator has two factors: a weight and a difference. The more probable something is, the more the difference is "counted". The denominator is the same as the numerator, except that it is a sum over all the concepts we are considering (which makes it the full KL divergence). So, the denominator is a normalizing factor.
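For those who want to see the weight, the difference, and the normalizing sum side by side, here is a minimal sketch. It assumes the A score has the usual selectional-association form, with numerator P(C2|W1,R) * log(P(C2|W1,R) / P(C2)) and a denominator that sums that same expression over all concepts; the formula itself is not reproduced in this guide, so treat that form as an assumption. The probabilities below are invented for illustration, and `association_scores` is a hypothetical name.

```python
import math

def association_scores(p_given, p_prior):
    """Compute A(W1, C2, R) for every concept C2.

    p_given[c] is P(c | W1, R); p_prior[c] is P(c).
    Returns (scores per concept, the normalizing sum).
    """
    # One term per concept: weight * difference.
    terms = {
        c: p * math.log(p / p_prior[c])
        for c, p in p_given.items()
        if p > 0  # a zero-probability concept contributes nothing
    }
    # The denominator: the same expression summed over all concepts.
    strength = sum(terms.values())
    if strength == 0:
        # P(c | W1, R) equals P(c) everywhere: W1 has no preference at all.
        return {c: 0.0 for c in p_given}, 0.0
    return {c: terms.get(c, 0.0) / strength for c in p_given}, strength

# Invented numbers: direct objects of "drive" versus words in general.
P_GIVEN_DRIVE_DOBJ = {"vehicle": 0.7, "animal": 0.2, "food": 0.1}
P_PRIOR = {"vehicle": 0.1, "animal": 0.3, "food": 0.6}

scores, strength = association_scores(P_GIVEN_DRIVE_DOBJ, P_PRIOR)
for concept, a in scores.items():
    print(f"A(drive, {concept}, dobj) = {a:.3f}")
```

With these made-up numbers, "vehicle" gets by far the largest score, because it is both probable as a direct object of "drive" (the weight) and much more probable there than in general (the difference). Concepts that are less likely after "drive" than in general get negative scores.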