Introduction to Natural Language Processing (CS 2731 / ISSP 2230), Fall 2002

Time: Tu Th 1:00-2:20
Place: 6516 Sennott Square

Professor: Diane Litman
Office Hours: M 10-12 (741 LRDC), Tu 2:30-4:30 (5105 Sennott Square)
Email: litman@cs.pitt.edu
Phone: 412-624-8838 (Sennott Square); 412-624-1261 (LRDC)

TA: Ali Alanjawi
Office Hours: Tu 10-1, W 3:30-5, Th 11:30-1 (5501 Sennott Square)
Email: alanjawi@cs.pitt.edu
Phone: 412-624-8439

Description:

This course provides an introduction to the theory and practice of natural language processing (NLP) - the creation of computer programs that can understand, generate, and learn natural language. We will use natural language understanding as a vehicle to introduce the three major subfields of NLP: syntax (which concerns itself with determining the structure of a sentence), semantics (which concerns itself with determining the explicit meaning of a single sentence), and pragmatics (which concerns itself with deriving the implicit meaning of a sentence when it is used in a specific discourse context). The course will introduce both knowledge-based and statistical approaches to NLP, illustrate the use of NLP techniques and tools in a variety of application areas, and provide insight into many open research problems.

Prerequisites: CS 1501

Text:

Speech and Language Processing by Jurafsky and Martin (errata).

For a selection of topics, we will also read current research papers. Each student will be assigned a paper and will lead the portion of class allotted to its discussion; the remaining students will email questions, which will form the basis of the discussion.

Requirements:

Concepts taught in class will be reinforced with assignments (both problem sets and programming), a project, and exams. Each student will also lead one paper discussion, email questions for the other discussions, and participate in them.

Grade Basis: homework (35%), project (25%), exams (35%), leading discussion & class participation (5%).
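
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component has already been reduced to a single score on a 0-100 scale; the function name `course_grade` and the sample scores are hypothetical, and the sketch is not the official grading procedure.

```python
# Weights (in percent) taken from the Grade Basis line above.
WEIGHTS = {
    "homework": 35,
    "project": 25,
    "exams": 35,
    "participation": 5,  # leading discussion & class participation
}

def course_grade(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale.

    Hypothetical helper for illustration only.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Made-up scores for one student.
example = {"homework": 90, "project": 80, "exams": 85, "participation": 100}
print(course_grade(example))
```

Because homework and exams carry equal weight, a given number of points gained or lost in either component moves the final grade by the same amount.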

Late Penalty: For assignments that may be accepted late, the penalty is 10% per day for up to 5 days, including Saturdays, Sundays, and holidays. Assignments are due at the start of class.
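
The late penalty can be sketched in a few lines of Python. The policy above does not say whether the 10% is taken from the earned score or from the maximum score, nor what happens after day 5, so this sketch makes two labeled assumptions: the penalty is 10% of the earned score per calendar day, and work more than 5 days late receives no credit. The function name `late_score` is hypothetical.

```python
def late_score(raw_score, days_late):
    """Apply a 10%-per-day late penalty to an earned score.

    Assumptions (not stated in the policy itself):
      - days_late counts whole calendar days, weekends and holidays included;
      - the penalty is 10% of the earned score per day;
      - work more than 5 days late receives no credit.
    """
    if days_late <= 0:
        return raw_score
    if days_late > 5:
        return 0
    return raw_score * (100 - 10 * days_late) / 100

print(late_score(80, 2))  # two days late: 20% off the earned score
```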

Announcements:

Grades

The 9th International Conference on User Modeling will be in Pittsburgh (more or less) next summer (CFP)

HLT-NAACL 2003: Human Language Technology 2003 / 3rd Conference of the North American Chapter of the Association for Computational Linguistics

Syllabus (evolving and subject to change!):

Course Overview and Administration

Knowledge of Language
    Reading: Ch 1

Linguistic Background
    Reading: Handouts (optional)

Regular Expressions and Automata
    Reading: Ch 2
    Assignments: Homework 1

Morphology and Finite State Transducers
    Reading: Ch 3
    Assignments: Final Reading List available

N-Grams
    Reading: Ch 6 (through 6.4); van den Bosch & Daelemans
    Assignments: Litman discussion (9/24)

Part of Speech Tagging
    Reading: Ch 8; Chen & Goodman (NOTE: Focus on Sections 1, 4.1, 5, and 6. Just SKIM 2-3 and 4.2.)
    Assignments: Lu discussion (10/1); Homework 2

Context-Free Grammars
    Reading: Ch 9

Parsing with CFGs
    Reading: Ch 10

Question Answering
    Reading: Hirschman, Light, Breck & Burger; Riloff & Thelen
    Assignments: Project Description available; Ma and Ringenberg discussions (10/10)

Features and Unification; The LCFlex Robust Parser
    Reading:
        Ch 11
        C. P. Rose and A. Lavie, Balancing Robustness and Efficiency in Unification-Augmented Context-Free Parsers for Large Practical Applications, Robustness in Language and Speech Technology, J. C. Junqua and G. Van Noord (eds.), 2001 (Dr. Carolyn Rose, guest speaker, 10/17)
        C. P. Rose, A. Roque, and D. Bhembe, An Efficient Incremental Architecture for Robust Interpretation (also 10/17)
    Assignments: Project Training Data available (10/10)

Representing Meaning (10/22)
    Reading: Ch 14

Midterm Exam (10/24)
    Reading: Covers through Ch 11

Semantic Analysis (10/29-31)
    Reading: Ch 15 (skip 15.2); Srihari & Li; Riloff, Schafer, & Yarowsky
    Assignments: McKibbon and Penkrot discussions (10/31)

Lexical Semantics (11/5-7)
    Reading: Ch 16; Roark & Charniak; Baker, Fillmore & Lowe
    Assignments: Project Preliminary Evaluation due (11/5); Rotaru and Gingrich discussions (11/7)

Word Sense Disambiguation (11/12-14)
    Reading: Relevant parts of Ch 17 (through 17.2); Kilgarriff & Rosenzweig; Schiffman, Mani, & Concepcion
    Assignments: Kane and Kong discussions (11/12); Homework 3

Discourse (11/14-21)
    Reading: Ch 18; Poesio, Cheng, Henschel, Hitzeman, Kibble & Stevenson; Jordan & VanLehn
    Assignments: Tseytlin and Gaddam discussions (11/19)

Dialogue and Conversational Agents (11/21-26, 12/3)
    Reading: Ch 19; Litman & Pan; Litman, Kearns, Singh, & Walker
    Assignments: Bhembe and Pelikan discussions (11/26); Homework 3 due (11/26)

Summing Up (12/3)
    Assignments: Project Final Evaluation (12/3)

Project Presentations (12/5 and 12/10)
    Assignments: Project Reports due (12/5)

Final Exam (12/12)
    Reading: Covers Ch 1 and from Ch 14 on

NO MAKEUPS

 

Academic Integrity:

Assignments must be your own individual work, unless explicitly stated otherwise. You must do the work without undue help from other people, and you must not present material from resources such as the Web, books, papers, code listings, and other people as your own. You may talk to each other about concepts and techniques, but you must not discuss specific solutions or approaches to solutions. Copying or paraphrasing someone's work, or permitting your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the assignment.

Interesting Links (besides resources available from J&M):

Chapters 1 and 2:

Classic NLP programs

Interview with "Ask Jeeves" (or, can a Q-A system participate in a conversation? Thanks to Michael Ringenberg for this pointer!)

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Chapter 8:

The LT POS HMM part of speech tagger

Chapter 11:

The LCFlex Parser

Michael Collins' Parser (requires a tagger to work).

Chapter 15:

Appelt and Israel's information extraction tutorial (IJCAI-99).

Chapter 16:

FrameNet.

Chapter 19:

Allen's Dialogue Modeling for Spoken Language Systems tutorial (ACL Workshop 1997).

Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial.

Books on Reserve:

  • Natural Language Understanding, by James Allen, 1995.
  • Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, 1999.
  • A Comprehensive Grammar of the English Language, by Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik, 1985.

Thanks:

Some of the materials used in this course borrow from the NLP courses of James Martin, Dragomir Radev, Philip Resnik, Ellen Riloff, Johanna Moore, Julia Hirschberg, and Steven Bird.

Previous versions of this course:

  • Fall 2001