Course Project

There are four options for the project:

  1. A community task. This project could use any publicly available materials developed for some of the papers discussed in class, where you try to beat the current results. You can also try to beat prior work using your own data or my data. For example, some ITSPOKE data is publicly available via the PSLC, and you can also get other data from me. Chris Schunn is willing to make his multi-party student team dialogues available (10/23 optional paper).
  2. Implement and evaluate an algorithm that performs some type of spoken or natural language processing, motivated by a social/interactional application. You may wish to use some of the data that is available for the community task project type, but in a new way.
  3. Use linguistic knowledge to enhance a social/interactional application system. Processing may be fully automatic, or your system may take manual annotations as input.
  4. A corpus annotation project. This type of project must be done in pairs. It will involve developing annotation instructions, gathering or using a corpus, performing a training round of annotation, discussing the results with each other, revising the annotation instructions, and then annotating a fresh test set. Inter-coder reliability should be reported (percentage agreement and Kappa). The amount of data annotated need not be large.
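For the annotation project, the two reliability numbers can be computed directly from the two coders' label sequences. The sketch below is illustrative, not a required tool. It assumes two annotators, which fits the pairs requirement, and it uses Cohen's kappa, a common choice for two coders, although the handout says only "Kappa". The function names and the example dialogue-act labels are hypothetical. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's own label distribution.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)  # also validates the inputs
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both coders independently pick
    # the same label, using each coder's own label frequencies.
    all_labels = counts_a.keys() | counts_b.keys()
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in all_labels)
    if p_e == 1.0:
        # Both coders used one identical label for every item.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: Q = question, A = answer, one per dialogue turn.
coder1 = ["Q", "Q", "A", "A", "Q", "A"]
coder2 = ["Q", "A", "A", "A", "Q", "A"]
print(round(percent_agreement(coder1, coder2), 3))  # 0.833
print(round(cohens_kappa(coder1, coder2), 3))       # 0.667
```

In this example the coders agree on 5 of 6 turns. Their label distributions give a chance agreement of 0.5, so kappa is (5/6 - 0.5) / 0.5 = 2/3. Reporting both numbers matters because raw agreement can look high even when much of it is expected by chance.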

All of the following deadlines must be met to receive credit on the project, with all written materials submitted electronically using Blackboard:

Projects can be done in small groups (pairs are ideal), although I am willing to consider individual projects or teams larger than two if the project can support it.

Note that if you are ambitious and choose carefully, you might be able to submit your final paper to a relevant venue (e.g., the NAACL deadline is December 10).