Instructor Diane Litman (TA: Yuhuan Jiang)
When & Where Tuesdays and Thursdays 2:30-3:45, SENSQ 5313
Office Hours After class or by appointment
Description This course provides an introduction to the field of Natural Language Processing (NLP) - the creation of computer programs that can understand, generate, and learn natural language. Natural language understanding will be used as a vehicle to introduce three major subfields of NLP: syntax, semantics, and pragmatics. The course will introduce both knowledge-based and statistical methods for NLP, and will illustrate the use of such methods in a variety of application areas.

Prerequisites: CS 1501 or consent of the instructor

Text: Speech and Language Processing by Jurafsky and Martin, 2nd Edition (errata). We will also sometimes use chapters in progress from the 3rd Edition draft.

Required Work Homeworks (30%): written and programming
Exams (30%): midterm and final
Group Course Project (30%): presentation and written report
Supplemental Research Papers (10%): leading discussion and class participation

Late Penalty: For assignments that may be accepted late, the penalty is 10% per day for up to 5 days, including Saturdays, Sundays, and holidays. Assignments are due by 11:59pm.

J&M Readings

Assignments and Other Readings

January 5

Lecturer: Yuhuan Jiang

Ch 1
Homework 0: New shared document to edit for Homework 0. NOTE: Proposed papers must be peer-reviewed (arXiv alone does not count) and must be SHORT papers.
January 10
Regular Expressions, Text Normalization, Edit Distance

Lecturer: Yuhuan Jiang

Ch 2
(3rd Edition)
Ken Church's tutorial Unix for Poets, pages 1-19

January 12
Finite State Transducers (ppt)
Ch 3  
January 17, 19
Language Modeling with N-Grams (pdf)
Ch 4
(3rd Edition)
(4.1-4.4, 4.6)
The final list of paper readings (from Homework 0) and the procedures for class discussion.

Presentation review form.

Assignment (due January 19, 2pm): Read and comment on this survey paper using NB.

January 24, 26
Part-of-Speech Tagging (pdf)
Ch 5

Optional: Ch 6

1/24: Homework 1 Assigned (due 2/9)

1/24: O'Brien

1/26: Zhu

Schoolhouse Rock for Conjunctions

January 31, February 2
Formal Grammars (pdf)
Ch 12
1/31: Jain
February 2
Syntactic Parsing (pdf)
Ch 13
February 7, 9
Statistical Parsing (pdf)

Lecturer (2/9): Yuhuan Jiang

Ch 14
2/7: Blake
February 14
Vector Semantics (pdf)

Lecturer: Huy Nguyen

Ch 15
(3rd Edition)
February 16
Semantics with Dense Vectors (pdf)
Ch 16
(3rd Edition)
2/16: Zhou
February 21, 23
Computing with Word Senses: WSD and WordNet (pdf1)
Ch 17
(3rd Edition)
2/21: Homework 2 Assigned (due 3/2)

2/21: Thaker

2/23: Lugini

2/23: Project Posted: Ideally, the class will be divided into teams of two or three. If you really want to work by yourself, or in a team of more than three people, please talk to me first. Please send me your team composition by 3/2. There will also be a preliminary evaluation deadline, on the development set, towards the end of March.

February 28
Computing with Word Senses: WSD (pdf2)
Ch 17
(3rd Edition)
2/28: Nebbia
March 2
Lexicons for Sentiment and Affect Extraction (pdf)
Ch 18
(3rd Edition)
3/2: Ge

3/2: Homework 2 due

3/2: Project team members due

March 14
Semantic Role Labeling
Ch 22
(3rd Edition)
Notes on midterm

3/14: Zhang

3/15: monitored withdrawal deadline

March 16
Midterm Exam (closed book)
Through Chapter 16


March 21
Semantic Role Labeling (continued) (pdf)
Ch 22
(3rd Edition)
3/21: Homework 3 (written) Assigned (due 4/4)
March 21, 23, 28
Information Extraction (pdf)
Ch 21
(3rd Edition)
3/28: Xu
March 28, 30, April 4
Computational Discourse (pdf)
Ch 21
3/30: Preliminary Project Evaluation due

3/30: Singla

4/4: Afrin

April 6, 11
Question Answering (pdf)

Summarization Introduction (pdf)

Ch 28
(3rd Edition)
4/6: Sun

4/11: Magooda

Watson documentary

April 13, 18, 20
Dialogue and Conversational Agents (pdf1, pdf2, pdf3)
Ch 24
Project Paper Instructions: Your paper should both describe your system (the architecture, components, etc.) and contain a discussion evaluating how well the version turned in for the final evaluation performed on the training and test sets (using the provided programs to compute performance). The conference papers that we have been reading are good models for your project paper. Papers should be NO LONGER THAN 4 pages (excluding references) using these LaTeX or Word templates.

4/18: Miller

Why Amazon thought that the Mets' David Wright was 234 years old (Washington Post, 4/18)

April 20, 25
Fake News Project
4/18: Project code due

4/25: Project presentations (10 minutes per team)

4/25: Project papers due

April 27

Final Exam (not cumulative)


Acknowledgements: Some of the materials used in this course borrow from the NLP courses of Steven Bird, Julia Hirschberg, Rebecca Hwa, Dan Jurafsky, Chris Manning, James Martin, Johanna Moore, Dragomir Radev, Philip Resnik, and Ellen Riloff.