Project (CS 2731 / ISSP 2230), Fall 2013

The project for this class will be to design, build, and evaluate a Student Response Analysis (SRA) system. This will give you exposure to a cutting-edge research area and experience in building a real NLP system. The task offers the opportunity to evaluate the usefulness of approaches for semantic analysis in a practical, application-oriented setting. Current e-learning systems have limited capability for giving students feedback and providing automatic assessment, since there is no established technology for assessing natural language responses to questions.

Ideally, the class will be divided into teams of two or three people for the project. You may form your own team if you know people with whom you'd like to work. Otherwise, I can randomly assign you to a team. If you really want to work by yourself or in a larger team, that is also possible.

Project Description

Full details

Project Resources

Training Set

\\afs\cs.pitt.edu\usr0\wencan\public\cs2731\SemEval\train

Tools

\\afs\cs.pitt.edu\usr0\wencan\public\cs2731\SemEval\Scripts

Alternatively, the above files can be directly obtained from SemEval_train.zip.

Your course project systems are also free to use any external resources such as WordNet, Wikipedia, Science Dictionary, etc.

Sample Student Response Analysis Papers

Previous approaches to student response analysis include methods based on text classification, latent semantic analysis and other semantic similarity measurements, textual entailment, and, in small domains, parsing and rule-based methods.

In particular, SRA can be cast as a form of recognizing textual entailment (Nielsen et al., 2008). However, educational domains present additional challenges. The text produced by students is often ill-formed, with many typos and grammar mistakes, and it often contains domain terminology that is rare in common training data sets. Collecting and annotating data sets for every new subject domain is expensive. Thus, the task is likely to require adapting existing approaches trained on large amounts of out-of-domain data to a new domain based on a small amount of in-domain training data, and implementing methods to deal with irregularities found in (typed) student input.
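One simple way to deal with typos in typed student input is to map misspelled tokens onto a known in-domain vocabulary before any further analysis. The sketch below uses approximate string matching from Python's standard difflib module. The vocabulary, the function name normalize_answer, and the 0.8 cutoff are all assumptions made for illustration; they are not part of the task data or the provided tools.

```python
import difflib
import re

# Hypothetical in-domain vocabulary; in practice it might be collected from
# the questions and reference answers in the training set.
DOMAIN_VOCAB = ["battery", "bulb", "circuit", "terminal", "voltage", "closed", "open"]


def normalize_answer(text, vocab=DOMAIN_VOCAB, cutoff=0.8):
    """Lowercase and tokenize text, mapping near-miss tokens to vocabulary terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    normalized = []
    for tok in tokens:
        if tok in vocab:
            normalized.append(tok)
            continue
        # Take the closest vocabulary entry if it is similar enough;
        # otherwise keep the token unchanged.
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        normalized.append(match[0] if match else tok)
    return normalized
```

For example, normalize_answer("The batery is in a closd circut") maps the three misspellings to "battery", "closed", and "circuit" and leaves the other tokens alone. The cutoff trades missed corrections against false ones and would need tuning on real student answers.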

Alternatively, approaches used in essay scoring, such as LSA, could be transferred and adapted for use with shorter student answers provided in the task. The likely challenges include the shorter length of student answers, and the need to output categorical labels (chosen intentionally to help guide feedback decisions) rather than numeric grades commonly used in essay scoring.
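As a rough illustration of turning a similarity score into categorical labels, the sketch below compares a student answer with a reference answer using bag-of-words cosine similarity. This is a much simpler stand-in for LSA, not LSA itself. The label names and the two thresholds are hypothetical placeholders; the actual label set is defined in the task description, and useful thresholds would have to be tuned on the training set.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return lowercase token counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_answer(student, reference, hi=0.7, lo=0.3):
    """Map a similarity score to one of three hypothetical labels."""
    score = cosine(bag_of_words(student), bag_of_words(reference))
    if score >= hi:
        return "correct", score
    if score >= lo:
        return "partially_correct", score
    return "incorrect", score
```

A call such as label_answer("the bulb is on", "the bulb is in a closed path") returns a label together with the underlying score, so the thresholds can be inspected and adjusted. Short answers share few tokens with the reference, which is one reason a plain word-overlap baseline is likely to need richer semantic features.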

Below is a sampling of the literature:

Graesser, A.C., Penumatsa, P., Ventura, M., Cai, Z., & Hu, X. (2007). Using LSA in AutoTutor: Learning through mixed initiative dialogue in natural language. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 243-262). Mahwah, NJ: Erlbaum.
Pamela Jordan, Maxim Makatchev and Kurt VanLehn. (2004). Combining Competing Language Understanding Approaches in an Intelligent Tutoring System. In Proceedings of Intelligent Tutoring Systems Conference, Maceió, Brazil, 2004, Springer LNCS, vol 3220, pp 346-357.
Philip M. McCarthy, Vasile Rus, Scott A. Crossley, Arthur C. Graesser, Danielle S. McNamara: Assessing Forward-, Reverse-, and Average-Entailer Indices on Natural Language Input from the Intelligent Tutoring System, iSTART. FLAIRS Conference 2008: 165-170.
Rodney D. Nielsen, Wayne Ward, James H. Martin and Martha Palmer. (2008). Annotating students' understanding of science concepts. In Proceedings of the Sixth International Language Resources and Evaluation Conference, (LREC'08), Marrakech, Morocco, May 28-30, 2008. Published by the European Language Resources Association, (ELRA), Paris, France.
Rodney D. Nielsen, Wayne Ward and James H. Martin. (2008). Learning to assess low-level conceptual understanding. In David Wilson and H. Chad Lane (Eds.): Proceedings of the Twenty-First International Florida Artificial Intelligence Research Society Conference, (FLAIRS-08), pp 427-432, Coconut Grove, Florida, May 15-17, 2008. Published by the Association for the Advancement of Artificial Intelligence, (AAAI Press), Menlo Park, California.
Myroslava O. Dzikovska, Rodney D. Nielsen, Chris Brew. Towards Effective Tutorial Feedback for Explanation Questions: A Dataset and Baselines. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012). Jun 4-6, 2012. Montreal, Canada.
Myroslava Dzikovska, Gwendolyn Campbell, Charles Callaway, Natalie Steinhauser, Elaine Farrow, Johanna Moore, Leslie Butler and Colin Matheson. Diagnosing natural language answers to support adaptive tutoring. In Proceedings of the 21st FLAIRS Conference, Miami, Florida, May 2008.

Program Submission Instructions

Follow the same instructions as for the homework assignments. Please also read these additional guidelines:

If your program needs to be compiled in order to run, please compile it on one of the elements machines (you will have to do this for C, C++, and Java, for example). If you are not in CS, use the university Linux machines. Please specify in the README which file we should run and on which machines. For scripting languages such as Python there is no need to provide an executable file, but do specify the version. If you do your project in Windows, make sure you send the EXE file and instructions on how to run it.
If you use third-party applications in your project (such as POS taggers or named-entity recognizers), install them in your account under a public directory (so Wencan can access them when he runs your project) and use absolute paths when calling those applications (e.g., /afs/cs.pitt.edu/usr0/wencan/public/cs2731/taggerV1.14/). Note that this applies only if you develop your application in Linux on an elements machine. Also, when compiling the third-party applications, again use the elements machines. If you do your project in Windows, you will have to provide links to what to download and instructions on how to install it.
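For a Python project, calling a third-party tool through an absolute path might look like the sketch below. The directory is the example path from the guideline above; the executable name "tagger" and its single input-file argument are hypothetical and will differ for whatever tool you actually install.

```python
import os
import subprocess

# Directory taken from the example in the guidelines; the executable name
# "tagger" and its command-line arguments are hypothetical.
TAGGER_DIR = "/afs/cs.pitt.edu/usr0/wencan/public/cs2731/taggerV1.14/"
TAGGER_BIN = os.path.join(TAGGER_DIR, "tagger")


def build_command(input_path):
    """Return an argument list in which every path is absolute."""
    if not os.path.isabs(TAGGER_BIN):
        raise ValueError("tool path must be absolute")
    return [TAGGER_BIN, os.path.abspath(input_path)]


def run_tool(input_path):
    """Run the external tool and return its standard output as text."""
    result = subprocess.run(
        build_command(input_path),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Keeping the command construction in its own function makes it easy to print the exact command in the README, so whoever runs the project can confirm that no path depends on the current working directory.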

Project Report Instructions

Your writeup should both describe your system (the architecture, components, etc.) and discuss how well the version turned in for the final evaluation performed on the Training Set (and, if the results are available, on the Test Set). Use the provided programs to compute performance. The conference papers that we have been reading are good models for your project paper. Papers should be NO LONGER THAN 4 pages (excluding references) and should use the ACL 2013 Style Files (available at http://acl2013.org/site/call.html).

Project Presentation Instructions

Each team should prepare a short PowerPoint presentation to share the contents of its project report with the rest of the class. Presentations should be 10-12 minutes long (subject to change once I know how many teams there are). I will cut presentations off at 12 minutes to allow 3 minutes for questions.