Course Project

There are a variety of options for the project:

  1. A community task. This project could use any publicly available materials developed for the papers discussed in class, with the goal of beating the current results. You can also try to beat prior work using your own data or data I provide. For example, some educational data is publicly available on the web, and you can get other data from me.
  2. Implement and evaluate an algorithm that performs some type of natural (or spoken) language processing, motivated by a user-generated content application. You may wish to use some of the data available for the community task project type, but in a new way.
  3. Use discourse knowledge to enhance a user-generated content system. Processing may be fully automatic, or your system may take manual annotations as input.
  4. A corpus annotation project. This type of project must be done in pairs. It will involve developing annotation instructions, gathering or using a corpus, performing a training round of annotation, discussing the results with each other, revising the annotation instructions, and then annotating a fresh test set. Inter-coder reliability should be reported (percentage agreement and Kappa). The amount of data annotated need not be large.
  5. Make me a proposal.
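For the corpus annotation project (option 4), the two reliability numbers can be computed in a few lines. This is a minimal sketch, assuming two coders who each assign exactly one nominal label to every item, and using Cohen's kappa, a common kappa statistic for two coders (check which kappa variant is expected before reporting). The function names and the example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of items on which both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa = (P_o - P_e) / (1 - P_e) for two coders."""
    p_o = percent_agreement(coder_a, coder_b)  # observed agreement
    n = len(coder_a)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Expected chance agreement, from each coder's own label distribution.
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    if p_e == 1:
        # Both coders used one identical label everywhere: kappa is undefined.
        # This sketch reports 1.0 because observed agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for 10 items from a training round of annotation.
coder_a = ["claim", "claim", "evidence", "other", "claim",
           "evidence", "other", "claim", "evidence", "claim"]
coder_b = ["claim", "evidence", "evidence", "other", "claim",
           "evidence", "claim", "claim", "evidence", "claim"]

print(f"agreement = {percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"kappa     = {cohens_kappa(coder_a, coder_b):.3f}")       # 0.672
```

Here the coders agree on 8 of 10 items (80%), but chance agreement from their label distributions is 0.39, so kappa is (0.80 - 0.39) / (1 - 0.39), about 0.67. Reporting both numbers shows how much of the raw agreement could be due to chance.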

All of the following deadlines must be met to receive credit for the project, and all written materials must be submitted electronically via IVLE:

All submissions are due on IVLE by 11:59:59 pm (Singapore time) on the due date. No exceptions will be made without documentation (for example, of a severe illness). Otherwise, late submissions are penalized 10% per day for up to 5 days, with Saturdays, Sundays, and holidays counted as days.

Projects can be done in small groups (pairs are ideal), although I am willing to consider individual projects or teams larger than two if the project can support it.

Note that if you are ambitious and choose carefully, you might be able to submit your final paper to a relevant venue (e.g., the NAACL deadline is January 6).