Project (CS 1671)
Assigned: March 31, 2020
Due: April 3, 2020 (Two Questions/Hypotheses Before Midnight)
Due: April 16, 2020 (Before Midnight)
This project extends the work done in Homework 3 by asking you to
(1) pose two research questions (first due date) and (2) set up experiments
to address these questions (second due date). You will use the same
text classification task as in Homework 3 and the same
corpus of comments annotated for constructiveness. You may reuse
any of the work that you have already done for Homework 3 if desired.
Task
Pose two questions about choices that might impact performance on this classification
task; conduct an experiment to answer each question; discuss the outcomes of the experiments and draw conclusions.
- This project is intended to be more open-ended and exploratory
than Homework 3. You should not pose the exact same
question as someone else, or ask the same type of question
twice.
- E.g.,
since "what is the impact of lowercasing?" and "what is the impact of
stemming?" both investigate text normalization choices, they would be
considered the same type of question. Another example is given below.
- You may frame your questions to explore any issue
discussed in class where multiple plausible options exist
for solving a problem (e.g., text normalization, cross-validation,
training data augmentation,
vector representations, discriminative/generative ML,
parts-of-speech, syntax, language models, named entity recognition...).
- You may also consult the research literature (e.g., papers such
as the one related to the project that was discussed in class) to help pose your questions.
- When you conduct your experiments, make sure you compare against baselines so that you know whether each experiment succeeded.
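As an illustration of the baseline comparison, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for the example, not part of the assignment: the function names, the label strings "constructive" and "non-constructive", the toy comments, and the length-based rule standing in for an experimental system. It scores a majority-class baseline and the experimental system on the same cross-validation folds, so any difference comes from the system and not from the data split.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_texts, train_labels):
    """Baseline: always predict the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda text: majority


def length_rule(train_texts, train_labels, min_tokens=5):
    """Hypothetical experimental system: longer comments are 'constructive'."""
    return lambda text: (
        "constructive" if len(text.split()) >= min_tokens else "non-constructive"
    )


def cross_validate(texts, labels, make_predictor, k=5, seed=0):
    """Return per-fold accuracy; the same seed gives the same folds."""
    scores = []
    for held_out in k_fold_indices(len(texts), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        predict = make_predictor(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(texts[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Toy data, invented purely for illustration.
texts = [
    "I think the article overlooks the cost of transit upgrades",
    "nonsense",
    "The author could strengthen this by citing the census figures",
    "lol no",
    "A fair point, though rural readers face different tradeoffs here",
    "who cares",
    "This ignores how the policy worked in other provinces last year",
    "typical",
    "worst take ever",
    "so dumb",
]
labels = ["constructive", "non-constructive"] * 4 + ["non-constructive"] * 2

baseline_scores = cross_validate(texts, labels, majority_baseline, k=5)
system_scores = cross_validate(texts, labels, length_rule, k=5)
print("baseline mean accuracy:", sum(baseline_scores) / len(baseline_scores))
print("system mean accuracy:  ", sum(system_scores) / len(system_scores))
```

In a real experiment the baseline would more likely be your Homework 3 classifier, and the experimental system would be that classifier with the single change your question asks about.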
What to Submit
Due Date 1: April 3rd
- In Courseweb, submit a document describing the two questions that you want to answer.
- Make sure that your two questions are neither identical nor too similar.
- In your submission also include any links for resources you plan to use, if possible.
- The main goal for this submission is to make sure that you have a
plan and that your questions are both sufficiently complex/interesting
and different enough from each other.
- Another example of questions that are too similar is
"Does using word2vec vectors improve classification
performance?" and "Does using GloVe vectors improve
classification performance?" as both investigate dense vectors.
- Asking one of those questions is fine, but asking both
would not be acceptable. If
you submit questions that are too similar, we will ask you
to keep changing one of the questions until we confirm that they are
different enough.
Due Date 2: April 16
- Your code and data files
- Please document your program well enough to help Rav grade your work.
- A README file that addresses the following:
- Describe the computing environment you used, especially if you used some off-the-shelf modules. (Do not use unusual packages. If you're not sure, please ask.)
- List any additional resources, references, or web pages you've consulted.
- List any person with whom you've discussed the assignment and describe the nature of your discussions.
- Discuss any unresolved issues or problems.
- A REPORT document that discusses the following:
- The questions posed, how each question was tested, and the results.
- Submit all of the above materials to Courseweb as a zip file.
Grading Guideline
- Document of two questions (5 Points)
- 5 points for submitting the two proposed questions
- Code (60 Points)
- 35 points: Code corresponding to question 1
- 25 points: Code corresponding to question 2
- Report (35 Points)
- 35 points for the program description, analysis, and data supporting that analysis.