The project for this class will be to design, build, and evaluate a question answering system. This will give you exposure to a cutting-edge research area and experience in building a real NLP system.
For this question answering (QA) project, we will use the "CBC Reading Comprehension Corpus". This corpus consists of 125 news stories, each accompanied by a set of approximately 6-10 "Reading Comprehension" questions of the Who, What, When, Where, How, and Why variety. The news stories themselves were obtained from the "CBC 4 Kids" website, hosted by the Canadian Broadcasting Corporation. The questions and an answer key were added by the MITRE Corporation, and are in the style of actual reading comprehension tests that are given to grade school children in the United States. A sample CBC story and reading comprehension test are shown below.
Hockey Star's Arts Donation Sours
January 22, 1999
Ever heard the expression "don't look a gift horse in the mouth?"
It means not to be too critical if you get something for free.
Well, that is exactly what the federal government is doing concerning a million dollar gift to the National Arts Centre in Ottawa, Ontario.
And it appears the government does not like what it has found.
To continue the metaphor, it look like this horse has some major dental problems.
The government has been looking at a charitable donation by Ottawa Senators hockey star Alexi Yashin.
Earlier this year he promised to give one million dollars to the National Arts Centre, a concert hall where people go to see plays and dance and to hear live music.
Many Canadians were delighted to see that Alexi Yashin was donating so much money.
Mr. Yashin makes a salary of more than three million dollars a season.
It is very expensive to put on live performances and the National Arts Centre has recently had to raise ticket prices and cut back on performances because of a lack of funds.
It was the most famous donation ever to the National Arts Centre.
Mr. Yashin's popularity soared.
But then this week Mr. Yashin's donation turned sour.
After giving the Centre $200,000 Alexi Yashin made an about face.
He decided not to give the other $800,000 he had promised.
He says his decision was for personal reasons.
But it now looks like Alexi Yashin was not really being honest.
It seems he actually changed his mind because federal Auditor General Denis Desautels told him he was trying to break the law.
Mr. Yashin originally announced he was a lover of the arts.
He said that as a well paid hockey player he wanted to help the National Arts Centre put on new performances.
But behind the scenes his plan was not to give the Arts Centre one million dollars.
Instead, in a secret agreement, the National Arts Centre would have hired Mr. Yashin's parents at $85,000 a year.
They would have been paid out of Alexi Yashin's yearly $200,000 donation.
What's more, his parents would not have to actually work.
This was a way for Alexi Yashin to give money to his parents while illegally saving thousands of dollars in taxes.
As well, a lawyer working for Mr. Yashin was to get $15,000 out of the remaining $115,000.
So in fact the million dollar donation would really only be half of that.
It would have been a great public relations victory.
Mr. Yashin would look like he was being very generous.
But behind the scenes he was really much less so.
Yesterday evening the Ottawa Senators were playing in Boston.
After the game reporters went to the locker room to ask Alexi Yashin what was going on.
But he refused to talk to reporters about his "personal reasons" for cancelling his donation.
He said simply that "I know I didn't do anything illegal.
I know I didn't do anything wrong.
I can't control what they say.
It's a free country."
He also said he wanted to focus on hockey and nothing else.
The other players and coach of the Ottawa Senators would not comment either.
They said it was Alexi's personal affair and not important to them.
Many Ottawa-area hockey fans are deeply disappointed with what is going on.
"People here are puzzled.
They feel let down," says Rick Soweita, who owns a sports bar.
"He benefited from the publicity and now he's got to own up."
<QUESTIONS>
<Q1> What reason did Alexi Yashin give for backing out of his promised donation?
<Q2> How much does Alexi Yashin earn as a hockey player?
<Q3> Who does Alexi Yashin play for?
<Q4> How much money did Alexi Yashin actually donate to the National Arts Centre?
<Q5> What do Alexi Yashin's teammates think about this donation gone bad?
<Q6> How would Alexi Yashin himself benefit from his donation scheme?
<Q7> Why do people go to the National Arts Centre?
<Q8> How would Alexi's parents have benefited from his donation to the National Arts Centre?
<Q9> Where did reporters question Alexi Yashin?
Figure 1. Story example
All stories have been split into sentences (one sentence per line) for you using the MXTERMINATOR sentence splitter developed by Adwait Ratnaparkhi. Paragraphs from the original story are separated by an empty line. The first line in the file is the title of the story and the second is the date of the story.
IMPORTANT: These on-line materials cannot be distributed to anyone else or used for any purpose other than this class. If you wish to use this data for other purposes, please contact me and I will tell you what you need to do. The news stories are copyrighted by CBC/SRC, and were obtained by the MITRE Corporation for research purposes only.
The corpus includes an answer key created by MITRE that gives the correct answer for each question. Creating a Q/A system that can identify exact answers is difficult, so for this project we will focus on answer sentence identification. The answer key that we will use for our project marks the sentence(s) in each story that contain the exact answers that MITRE thought best answered each question. The sentence answer key sometimes lists more than one correct sentence for a question, in which case any one of them is correct. Your Q/A system should identify the sentence in the story that best answers each question. This is a much easier task and, from a practical perspective, nearly as useful for most real-world applications!
For each set of stories (the training set and test sets 1 and 2) there is a file answerkey.txt in the corresponding training or test set directory. This file contains the answers to all the questions from all stories in that directory. Here is the part of the answer key file that refers to the story shown above:
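To make the task concrete, here is a minimal sketch of one very simple way to pick an answer sentence: choose the line that shares the most words with the question. This is only an illustrative baseline, not a required or recommended design. The function names (tokens, best_line) and the dictionary that maps line numbers to sentences are assumptions made for this example.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_line(question, sentences):
    """Return the line number whose sentence shares the most words with the question.

    sentences is a hypothetical {line_number: sentence_text} mapping.
    Ties go to the lowest line number, because max() keeps the first maximum
    it sees and the candidates are visited in ascending order.
    """
    question_words = tokens(question)
    return max(sorted(sentences),
               key=lambda number: len(question_words & tokens(sentences[number])))


# Three sentences from the sample story, keyed by their answer-key line numbers.
sentences = {
    23: "After giving the Centre $200,000 Alexi Yashin made an about face.",
    26: "He says his decision was for personal reasons.",
    49: "After the game reporters went to the locker room to ask Alexi Yashin what was going on.",
}
print(best_line("Where did reporters question Alexi Yashin?", sentences))  # 49
```

A baseline like this ignores the question type (Who, When, Where, ...), so treat it as a starting point to compare against rather than a finished system.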
<FILE>1999-W04-5.qa
<Q_NUMBER>1
<A_LINE>26
<Q_TXT>What reason did Alexi Yashin give for backing out of his promised donation?
<A_TXT>He says his decision was for personal reasons.
<Q_NUMBER>2
<A_LINE>15
<Q_TXT>How much does Alexi Yashin earn as a hockey player?
<A_TXT>Mr. Yashin makes a salary of more than three million dollars a season.
<Q_NUMBER>3
<A_LINE>11
<Q_TXT>Who does Alexi Yashin play for?
<A_TXT>The government has been looking at a charitable donation by Ottawa Senators hockey star Alexi Yashin.
<Q_NUMBER>4
<A_LINE>23
<Q_TXT>How much money did Alexi Yashin actually donate to the National Arts Centre?
<A_TXT>After giving the Centre $200,000 Alexi Yashin made an about face.
<Q_NUMBER>5
<A_LINE>59
<Q_TXT>What do Alexi Yashin's teammates think about this donation gone bad?
<A_TXT>They said it was Alexi's personal affair and not important to them.
<Q_NUMBER>6
<A_LINE>38,45,19
<Q_TXT>How would Alexi Yashin himself benefit from his donation scheme?
<A_TXT>This was a way for Alexi Yashin to give money to his parents while illegally saving thousands of dollars in taxes. -OR- Mr. Yashin would look like he was being very generous. -OR- Mr. Yashin's popularity soared.
<Q_NUMBER>7
<A_LINE>12
<Q_TXT>Why do people go to the National Arts Centre?
<A_TXT>Earlier this year he promised to give one million dollars to the National Arts Centre, a concert hall where people go to see plays and dance and to hear live music.
<Q_NUMBER>8
<A_LINE>34
<Q_TXT>How would Alexi's parents have benefited from his donation to the National Arts Centre?
<A_TXT>Instead, in a secret agreement, the National Arts Centre would have hired Mr. Yashin's parents at $85,000 a year.
<Q_NUMBER>9
<A_LINE>49,48
<Q_TXT>Where did reporters question Alexi Yashin?
<A_TXT>After the game reporters went to the locker room to ask Alexi Yashin what was going on. -OR- Yesterday evening the Ottawa Senators were playing in Boston.
</FILE>...
Figure 2. Part of the answer key.
As you can see, each question is described by 4 lines. The first line, which starts with the <Q_NUMBER> tag, gives the question number. The second line, which starts with <A_LINE>, gives the line number(s) of the sentence(s) that answer the question (remember that there is at most one sentence per line). Lines in the story files are numbered starting from 1, and empty lines are counted. If a question has more than one answer, the line numbers are separated by commas. The last two lines are for ease of reading only: one contains the question text and the other the sentence(s) that answer the question (separated by " -OR- ").
Note that in some cases the answer to a question may come from the title or the date of the story, so do not strip off those lines (for example, answers to WHEN questions are often found in the story date line).
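The answer key layout described above can be read with a few lines of code. The following is a minimal sketch, assuming the file looks exactly like Figure 2; the function name parse_answer_key and the nested dictionary it returns are choices made for this example, not part of the provided scripts.

```python
def parse_answer_key(text):
    """Parse answer key text into {story_file: {question_number: [line_numbers]}}.

    Only the <FILE>, <Q_NUMBER>, <A_LINE> and </FILE> lines are used;
    the <Q_TXT> and <A_TXT> lines and any empty lines are ignored.
    """
    key = {}
    current_file = None
    current_question = None
    for raw in text.split("\n"):
        line = raw.strip()
        if line.startswith("<FILE>"):
            current_file = line[len("<FILE>"):].strip()
            key[current_file] = {}
        elif line.startswith("</FILE>"):
            current_file = None
            current_question = None
        elif line.startswith("<Q_NUMBER>") and current_file is not None:
            current_question = int(line[len("<Q_NUMBER>"):])
        elif line.startswith("<A_LINE>") and current_question is not None:
            # Several acceptable lines are separated by commas, e.g. "38,45,19".
            parts = line[len("<A_LINE>"):].split(",")
            key[current_file][current_question] = [int(p) for p in parts if p.strip()]
    return key


sample = """<FILE>1999-W04-5.qa
<Q_NUMBER>1
<A_LINE>26
<Q_TXT>What reason did Alexi Yashin give for backing out of his promised donation?
<Q_NUMBER>6
<A_LINE>38,45,19
</FILE>"""
answer_key = parse_answer_key(sample)
print(answer_key["1999-W04-5.qa"][6])  # [38, 45, 19]
```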
Judging answers is subjective in nature, so you may sometimes disagree with MITRE's decisions in the answer key. But people will never completely agree on these things, and it is necessary to choose some set of answers for evaluation purposes, so we will use MITRE's judgements as "The Truth".
You will be using three sets of data at different points in the project:
The project will involve three phases:
You will be given the Training Set to use in developing your Q/A systems. You may use these stories and the answer keys in any way that you wish. The training data can be found in:
At this point, each team will hand in the final code for their Q/A system. We will run the Q/A systems on both the stories in Test Set #1 and Test Set #2. Your final project grade will be based on the performance of your Q/A system on both of the test sets.
The purpose of evaluating your systems on both test sets is to balance specificity with generality. You will have several weeks to try to get your Q/A systems to perform well on Test Set #1. Hopefully, everyone will be able to do fairly well on that test set. Test Set #2 will be a blind test set that no one will see until the final evaluation. A system that uses general techniques should work just as well on Test Set #2 as Test Set #1. But a system that has lots of hacks and tweaks based on Test Set #1 probably will perform very poorly on Test Set #2.
WARNING: You will be given the answer keys for Test Set #1, but your system is not allowed to use them when answering questions! The answer keys are being distributed only to show you what the correct answers should be, and to allow you to evaluate your Q/A systems automatically if you wish. Your system should use general techniques that can apply to a wide variety of texts.
Your Q/A system should accept two command line parameters. Running your program should look like:
myQAproject input_filename outputfile_name
The first parameter is the name of the input file. The first line of the input file will be a directory path and all subsequent lines will be story filenames. Your Q/A system should then process each story file in the list from the specified directory. A sample input file is below, which indicates that 6 story files should be processed and they can all be found in the directory /afs/cs.pitt.edu/usr0/litman/public/cs2731/TrainingSet/.
Each story file will be formatted like Figure 1. The first line is the story title. The second is the date of the story followed by two empty lines. After that, the main story begins with one sentence per line. Original paragraphs are separated by an empty line.
At the end of each story is a set of 5 to 10 questions. You can identify the question section of the file by looking for a line that contains only the <QUESTIONS> tag. After that, each line will contain one question: it starts with a <Qn> tag (where n is the number of the question), followed by the question text.
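The input conventions described above could be handled along the following lines. This is a sketch under stated assumptions: the function names read_input_list and parse_story are invented for this example, and the file encoding is a guess (undecodable bytes are replaced rather than raising an error).

```python
import os
import re


def read_input_list(path):
    """Return full story paths: the first line is the directory, the rest are filenames."""
    # Encoding is an assumption; the handout does not state it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = [line.strip() for line in handle]
    directory = lines[0]
    return [os.path.join(directory, name) for name in lines[1:] if name]


def parse_story(text):
    """Split story text into ({line_number: sentence}, {question_number: question}).

    Line numbers start at 1 and count empty lines, as in the answer key.
    The title (line 1) and the date (line 2) are kept as candidate answers.
    """
    sentences = {}
    questions = {}
    in_questions = False
    for number, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if line == "<QUESTIONS>":
            in_questions = True
        elif in_questions:
            match = re.match(r"<Q(\d+)>\s*(.*)", line)
            if match:
                questions[int(match.group(1))] = match.group(2)
        elif line:
            sentences[number] = line
    return sentences, questions


story = ("Hockey Star's Arts Donation Sours\nJanuary 22, 1999\n\n\n"
         "First sentence.\nSecond sentence.\n\nThird sentence.\n"
         "<QUESTIONS>\n<Q1> Who does Alexi Yashin play for?\n")
sentences, questions = parse_story(story)
print(sorted(sentences))  # [1, 2, 5, 6, 8]
print(questions[1])       # Who does Alexi Yashin play for?
```

Keeping the original line numbers as dictionary keys means that the number you report for an answer is already in the form the answer key uses.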
where:
The <Q_TXT> and <A_TXT> lines are optional and you don't need to include them in the output (though you might want to have them so that you can check your system answers faster while developing the system). You will be graded by matching your answer line against the one in the answer key. Please make sure that the <Q_NUMBER> line is followed by the <A_LINE> line. Also, do not forget to end each <FILE> section with a corresponding </FILE>.
You can have as many empty lines as you like in your output (see for example Figure 2). Just make sure that you have the lines <FILE>, <Q_NUMBER>, <A_LINE> and </FILE> in your output.
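As a rough illustration, the required output lines could be produced as follows. The sketch assumes that the output mirrors the layout of Figure 2 with the optional <Q_TXT> and <A_TXT> lines left out; the function name format_answers and the shape of its argument are assumptions made for this example.

```python
def format_answers(answers):
    """Render {story_name: {question_number: answer_line}} in the answer file layout.

    Each <Q_NUMBER> line is immediately followed by its <A_LINE> line,
    and every <FILE> section is closed with </FILE>.
    """
    out = []
    for story_name, per_question in answers.items():
        out.append(f"<FILE>{story_name}")
        for question_number in sorted(per_question):
            out.append(f"<Q_NUMBER>{question_number}")
            out.append(f"<A_LINE>{per_question[question_number]}")
        out.append("</FILE>")
    return "\n".join(out) + "\n"


print(format_answers({"1999-W04-5.qa": {1: 26, 2: 15}}), end="")
```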
The scripts can be found in:
GRADER
grader.pl input_filename answerkey_filename your_answer_filename
This is the script that will be used for grading. Input_filename refers to the same file you use as input to your Q/A system. The second command line parameter must be the answer key file name (the one provided to you). The third is your answer file name (if you swap the parameters you will get bogus results, so be careful!!!). The script output is self-explanatory.
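If you want to score your answers inside your own code during development, something like the sketch below may help. It is not the logic of grader.pl, which remains the official scorer. It assumes that an answer counts as correct when it equals any one of the lines listed for that question in the answer key, that an unanswered question counts as wrong, and that both dictionaries use identical story names as keys.

```python
def accuracy(predicted, answer_key):
    """Fraction of answer key questions whose predicted line is an accepted line.

    predicted:  {story_name: {question_number: line_number}}
    answer_key: {story_name: {question_number: [accepted_line_numbers]}}
    """
    total = 0
    correct = 0
    for story_name, questions in answer_key.items():
        for question_number, accepted in questions.items():
            total += 1
            if predicted.get(story_name, {}).get(question_number) in accepted:
                correct += 1
    return correct / total if total else 0.0


key = {"1999-W04-5.qa": {1: [26], 6: [38, 45, 19]}}
print(accuracy({"1999-W04-5.qa": {1: 26, 6: 45}}, key))  # 1.0
print(accuracy({"1999-W04-5.qa": {1: 5, 6: 19}}, key))   # 0.5
```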
MARKER
marker.pl input_filename answer_filename tag_name new_extension
This script will generate new story files by marking the answers from answer_filename with the <tag_name question_number> </tag_name question_number> tags in the original story. Input_filename refers to the same file you use as input to your Q/A system. Answer_filename refers to an answer file (it can be your answers or the answer key file). The script processes every story by looking up in the answer file the lines that contain answers and marking them with the appropriate tags. The annotated story is saved in a file with the same name but with the extension new_extension (if you use the txt extension it will overwrite the original story).
You may find it useful to run this tool twice: once with the answer key and the tag name CORRECT_ANS, and then again on the annotated files with your answers and the tag name MY_ANS. In this way you will get stories that have both the correct answers and your answers marked (hopefully overlapping as much as possible :-)).
Remark: When the answer file is used by marker to annotate the story, the match between the story file name and the file name from the answer file (whatever follows <FILE>) is done without taking the EXTENSION into account. This will help you when you want to apply the marker twice to annotate a story with both sets of answers.
If you want to run the scripts from machines other than elements, you have to have Perl installed. To run the scripts use something like:
perl_path/perl script script_parameters
Ideally, the class will be divided into 3-person or 2-person teams for the project. You may form your own team if you know people with whom you'd like to work. Otherwise, I can randomly assign you to a team. If you really want to work by yourself, or have a larger team, that is also possible.
The schedule for the projects is shown below:
By November 14, we expect each team to have a working Q/A system! It might not work well and may still be missing some components that you plan to incorporate, but it should be able to process a story and produce an answer for each question.
Participation in the preliminary evaluation is mandatory. Failure to participate will result in a 10% deduction off your final project grade. This policy is to ensure that everyone is making adequate progress.
Each project will be graded according to the following criteria:
The grade for the report and presentation will be based on clarity, as well as the creativity and ambitiousness shown in the design of your system. Thus, if you incorporate novel ideas and/or complex algorithms, then I will take that into account. Like the Olympics, difficulty can in effect boost your raw performance scores.
Note that the final grading is on a relative, not an absolute, scale. However, this does not mean that the team with the highest average ranking automatically gets an A (e.g., if the best score was no better than chance performance), or that the lowest scoring team fails. If every team produces a good and interesting system, I will be happy to give every team an A.
NLP is not a solved problem, and effective QA'ing is HARD! Randomly choosing a sentence will yield extremely low accuracy, so anything higher means that you are doing something good!!
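To get a feel for what chance performance looks like, the expected accuracy of a random guesser can be computed directly. The sketch below assumes the guesser picks one candidate line uniformly at random for every question; the function name chance_accuracy and the numbers in the example (50 candidate lines, three questions) are hypothetical and are not measured on the corpus.

```python
def chance_accuracy(num_candidate_lines, accepted_counts):
    """Expected accuracy of picking one candidate line uniformly at random.

    num_candidate_lines: number of non-empty lines in the story (title and date included)
    accepted_counts:     for each question, how many lines the answer key accepts
    """
    if num_candidate_lines <= 0 or not accepted_counts:
        return 0.0
    per_question = [min(count, num_candidate_lines) / num_candidate_lines
                    for count in accepted_counts]
    return sum(per_question) / len(per_question)


# Hypothetical story: 50 candidate lines, two questions with one accepted
# line each and one question with three accepted lines.
print(round(chance_accuracy(50, [1, 1, 3]), 4))  # 0.0333
```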
This project and these instructions are based on Professor Ellen Riloff's course project at the University of Utah. Thanks to MITRE for the use of the CBC data.