Chapter 14, Part 2 Study Guide
===============================

This is a mixture of things to study and questions to answer.
**F** marks possibilities for the final.
Questions are prefaced by Q:.
The answers are in the separate file of solutions.

**F** Why is lexicalization of PCFGs important?

**F** A question that confirms that you know what a head is, and the fact that heads propagate up the tree. That is, a node has a head child, and its head is the head of that child (see slide 7).

**F** Be able to give the specific counts that serve as estimates of a rule's probability. E.g., for a rule LHS --> RHS, the estimate is (# of times LHS is rewritten as RHS) / (# of times LHS occurs). Given rules such as those on slide 10 and some sample data, be able to give the counts (for both types of rules).

Know that, in Collins parsing, the head is generated first, then the siblings are generated, assuming they are conditionally independent given the head. Given the probabilities on slide 14, argue that they reflect this conditional independence assumption.

**F** Just know that to make lexicalized parsers work, we need to make independence assumptions, because without them we couldn't possibly find enough data. There won't be further questions on the final about the Collins parser, which is an example of one such parser.

**F** Explain the basic idea and motivation for splitting non-terminals (slides 17-19).

**F** Given a context-free grammar, show the same grammar but with non-terminals split using parent annotation (slide 18). You don't need to know how to define probabilities for such grammars. (A small code sketch of this transformation appears near the end of this section.)

Consider the following small treebank:

(S (NP John)
   (VP (V1 said)
       (SBAR (COMP that)
             (S (NP Sally)
                (VP (VP (V2 snored))
                    (ADVP loudly))))))

(S (NP Sally)
   (VP (V1 declared)
       (SBAR (COMP that)
             (S (NP Bill)
                (VP (VP (V2 ran))
                    (ADVP quickly))))))

(S (NP Fred)
   (VP (V1 pronounced)
       (SBAR (COMP that)
             (S (NP Jeff)
                (VP (VP (V2 swam))
                    (ADVP elegantly))))))

Q: Suppose you decide to build a PCFG from this treebank. Show the (non-lexicalized) PCFG that would be derived from this treebank, including the probabilities assigned to the rules. **F** This question is too long for the final, but there may be a question asking you to show that you understand the solution. (A code sketch of the count-based estimation appears near the end of this section.)

Q: **F** Show two parse trees for "Jeff pronounced that Fred snored loudly", and calculate their probabilities under the PCFG.

Q: **F** What difference in meaning comes from the syntactic differences?

Q: **F** While we are at it, let's go back to the midterm and consider the four parses that were the answers to one of the questions. Express each of them in a way that brings out the differences in their meanings.

(S (NP (N GUARD)) (VP (VP (V RUNS)) (PP (PREP LIKE) (NP (N GOLD)))))
(S (NP (ADJ GUARD) (NP (N RUNS))) (VP (V LIKE) (NP (N GOLD))))
(S (VP (VP (V GUARD) (NP (N RUNS))) (PP (PREP LIKE) (NP (N GOLD)))))
(S (VP (V GUARD) (NP (NP (N RUNS)) (PP (PREP LIKE) (NP (N GOLD))))))

For continuing practice, make sure you understand the parse trees on slides 21 and 22.

Given a parse tree produced by a system and a gold-standard parse tree for a particular sentence, be able to calculate precision, recall, and F1. (This will not be on the final.)

**F** Know the idea of discriminative reranking and the motivation for it. Why might the maximum possible performance be less than 100%?

**F** Be able to convert a lexicalized phrase-structure parse tree into a dependency parse tree (as on slide 32); this was covered more fully in lecture.
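To make the count-based estimates above concrete, here is a minimal Python sketch (my own illustration, not from the slides) that derives the non-lexicalized PCFG from the small treebank above using P(LHS --> RHS) = count(LHS --> RHS) / count(LHS). The nested-tuple tree encoding and the helper name count_rules are assumptions made for this example.

    from collections import Counter

    # A tree is a nested tuple: (label, child, child, ...); a leaf is a word string.
    treebank = [
        ("S", ("NP", "John"),
              ("VP", ("V1", "said"),
                     ("SBAR", ("COMP", "that"),
                              ("S", ("NP", "Sally"),
                                    ("VP", ("VP", ("V2", "snored")),
                                           ("ADVP", "loudly")))))),
        ("S", ("NP", "Sally"),
              ("VP", ("V1", "declared"),
                     ("SBAR", ("COMP", "that"),
                              ("S", ("NP", "Bill"),
                                    ("VP", ("VP", ("V2", "ran")),
                                           ("ADVP", "quickly")))))),
        ("S", ("NP", "Fred"),
              ("VP", ("V1", "pronounced"),
                     ("SBAR", ("COMP", "that"),
                              ("S", ("NP", "Jeff"),
                                    ("VP", ("VP", ("V2", "swam")),
                                           ("ADVP", "elegantly")))))),
    ]

    rule_counts = Counter()   # how often LHS is rewritten as RHS
    lhs_counts = Counter()    # how often LHS appears as a left-hand side

    def count_rules(node):
        """Record the rule at this node, then recurse into non-leaf children."""
        label, *children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            if not isinstance(child, str):
                count_rules(child)

    for tree in treebank:
        count_rules(tree)

    # P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)
    for (lhs, rhs), n in sorted(rule_counts.items()):
        print(f"{lhs} -> {' '.join(rhs)}   {n / lhs_counts[lhs]:.3f}")

Running this prints, for instance, that every S is rewritten as NP VP, while the three VP rules each get probability 1/3; those are exactly the relative-frequency estimates the question asks for.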
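For the parent-annotation item above (slide 18), here is a minimal sketch of splitting non-terminals, reusing the same nested-tuple tree encoding. The NT^Parent labels follow the usual convention; whether preterminals (POS tags) are also annotated can vary, so treat the details as illustrative rather than as the slide's exact definition.

    def parent_annotate(node, parent=None):
        """Relabel every non-root non-terminal as label^parent-label."""
        if isinstance(node, str):          # a word: leave it unchanged
            return node
        label, *children = node
        new_label = label if parent is None else f"{label}^{parent}"
        return (new_label, *(parent_annotate(child, label) for child in children))

    tree = ("S", ("NP", "Sally"),
                 ("VP", ("VP", ("V2", "snored")),
                        ("ADVP", "loudly")))
    print(parent_annotate(tree))
    # ('S', ('NP^S', 'Sally'),
    #       ('VP^S', ('VP^VP', ('V2^VP', 'snored')), ('ADVP^VP', 'loudly')))

The point of the split is visible in the output: the VP that dominates another VP (VP^S) and the VP that is dominated by a VP (VP^VP) become different non-terminals, so they can get different rule probabilities.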
**** From THIS POINT ON, the material WILL NOT BE ON THE FINAL ****

Q: Consider slides 13 and 14 in chapter14part2.ppt. Slide 13 has an example rule, and slide 14 shows, in general, the probabilities to be calculated. Give the probabilities for the example on slide 13 (you don't have the numbers; just write down the expressions).

Q: The rule on slide 13 is for the structure in which the PP attaches to the verb, i.e., money was used in the buying process. The rules for the alternative, in which the PP attaches to the NP, are:

VP(bought,V) --> V(bought,V) NP(book,NN)
NP(book,NN) --> NN(book,NN) PP(with,Prep)

Give the probabilities for these two rules. What are the key probabilities for determining where the PP attaches?

Q: In these examples, the non-terminal of the head child and the head tag are the same, but often they are not. For example, here is a rule for "61 years old", as in "Pierre Vinken, 61 years old":

ADJP(years,NNS) --> NP(years,NNS) JJ(old,JJ)

The head child is the NP: its NT is NP, but its tag is NNS. Give the probabilities for this rule.

For the Collins parser, know what was covered in class and the questions above. We are not covering the specific method for actually finding the most likely parse. You don't need to know the specific formula on slide 15, but you do need to know the idea: each term conditions on less information, and the lambdas are weights determining how much each term is considered. (A small sketch of this interpolation idea follows.)
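Since only the idea behind the slide 15 formula matters, here is a minimal sketch of that idea: several estimates of the same probability, each conditioning on less information, are combined with lambda weights. The conditioning events, the probability values, and the lambdas below are invented for illustration; this is not the Collins parser's actual smoothing scheme.

    def interpolate(estimates, lambdas):
        """Weighted combination of estimates that condition on less and less context."""
        assert abs(sum(lambdas) - 1.0) < 1e-9    # the weights should sum to 1
        return sum(lam * p for lam, p in zip(lambdas, estimates))

    # e.g. an estimate of P(PP(with,Prep) | parent VP, head tag V, head word "bought"),
    # backed off to estimates that drop the head word and then the head tag:
    p_word = 0.00    # conditioned on the head word too (sparse; often unseen)
    p_tag  = 0.05    # conditioned only on the parent and the head tag
    p_nt   = 0.02    # conditioned only on the parent non-terminal

    print(interpolate([p_word, p_tag, p_nt], lambdas=[0.6, 0.3, 0.1]))

The fully conditioned estimate is the most informative but often zero because the data are sparse; the coarser estimates are less informative but better supported, and the lambdas decide how much each one counts.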