HOMEWORK 1 (CS 2731 / ISSP 2230)

Assigned: September 5, 2002

Due: September 19, 2002

Exercises

  1. Knowledge of Language (20 points)

    Inference is an essential part of natural language understanding. Informally, we define an inference as an assumption that is not explicitly stated but that most people would make during the understanding process. This might involve disambiguation or factual assumptions. Note that inferences can be wrong!

    For this question, you should list the inferences that most people would make while reading the following story: "John got up one morning and discovered his power was out. Unable to shave, he called his next door neighbor and asked if he could come over to borrow the bathroom. But everyone on the street was out. So John drove to work hoping no one would see him before he found a bathroom with hot water. Unfortunately, he ran into his boss on the elevator. He half-expected to find a pink slip in his mailbox the next day."

    1. Identify as many specific inferences as you can.


    2. For each inference, state the category of knowledge that a computer would need to make the inference: phonetics and phonology, morphology, syntax, semantics, pragmatics, and/or discourse. Be sure your inferences illustrate all of the categories.

    Note that there is no "right" answer to this question. You and your classmates will likely generate different sets and different numbers of inferences. The assignment will be graded on how well your examples illustrate that inference is ubiquitous (so if you find only a few inferences, look harder) and on how well you characterize the types of knowledge required to make them.

  2. Regular Expressions and Automata (80 points)

    1. Jurafsky & Martin 2.4 (10 points)

    2. Jurafsky & Martin 2.5 (10 points)

    3. Jurafsky & Martin 2.6. Use American time expressions (i.e., "1 PM" or "1:00 in the afternoon", not "13:00") (10 points)

    4. A time/date tagger (50 points)

      Using the FSAs you've just designed, write a program in a language of your choice that puts XML-like tags around time and date specifications. For example:

      • INPUT: a text in English.


      • OUTPUT: the same text with all date and time expressions marked by <TIME> and </TIME> (for both dates and times).


      • SAMPLE INPUT: Christmas is celebrated on the 25th of December. Christmas Eve is celebrated the night before.


      • SAMPLE OUTPUT: <TIME> Christmas </TIME> is celebrated on <TIME> the 25th of December </TIME>. <TIME> Christmas Eve </TIME> is celebrated <TIME> the night before </TIME>.


      • SUBMIT: (documented) source code; output of your program on some training files; and a README file listing all time and date expressions that your program can handle, and instructions about how to run your program.
      • GRADING: Your program will be run on an unseen test file to evaluate its generality and correctness.
      • CLARIFICATION (added September 9): For parts 1-3 of Exercise 2, please use the graphical FSA representation. For part 4, if you need to convert your FSAs to regular expressions or FSA tables, please be sure to include these versions explicitly in your code or README.
      • CLARIFICATION (added September 12): I/O elaboration.
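
To make the expected input/output behavior of the tagger concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a reference solution: it uses regular expressions in place of explicit FSAs (the conversion that the September 9 clarification permits), and the pattern set and the names MONTH, HOLIDAY, ORDINAL, CLOCK, PATTERNS, and tag_times are hypothetical choices made for this example. A real submission would derive its patterns from the FSAs designed in the earlier parts and would cover many more expressions.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption of this sketch).
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
HOLIDAY = r"(?:Christmas(?: Eve| Day)?|New Year's (?:Eve|Day)|Thanksgiving)"
ORDINAL = r"\d{1,2}(?:st|nd|rd|th)"
# American clock times such as "1 PM" or "11:30 am"; "13:00" is not matched.
CLOCK = r"(?:1[0-2]|0?[1-9])(?::[0-5]\d)?\s?[AaPp][Mm]"

# Alternatives are tried left to right at each position, so longer,
# more specific patterns are listed first.
PATTERNS = [
    rf"the {ORDINAL} of {MONTH}",            # "the 25th of December"
    rf"{MONTH} \d{{1,2}}(?:, \d{{4}})?",     # "September 19, 2002"
    HOLIDAY,                                 # "Christmas Eve"
    r"the (?:night|day|morning|evening) before",
    CLOCK,
]

# Word boundaries keep the patterns from matching inside longer tokens.
TIME_RE = re.compile(r"\b(?:" + "|".join(PATTERNS) + r")\b")


def tag_times(text: str) -> str:
    """Wrap every matched date or time expression in <TIME> ... </TIME>."""
    return TIME_RE.sub(lambda m: f"<TIME> {m.group(0)} </TIME>", text)


print(tag_times("The meeting is at 1 PM on September 19, 2002."))
```

On the sample input above, this sketch tags the same four expressions as the sample output: "Christmas", "the 25th of December", "Christmas Eve", and "the night before". Expressions outside the small pattern set are left untagged, which is why the README should list exactly what a program can handle.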