CS 6281

Public Datasets that might be of interest for projects (in addition to the links already on the syllabus)

NOTE THAT THIS LIST IS A RANDOM SAMPLE OF THINGS I HAPPEN TO KNOW ABOUT; THERE ARE PROBABLY OTHERS!

Graded Essays

ASAP: Automated Student Assessment Prize data

Native Language Identification Shared Task: Each essay in the TOEFL11 corpus is labeled with an English language proficiency level (high, medium, or low)

International Corpus of Learner English (several different types of grades)

CityU corpus of essay drafts by English learners: Sentences in these drafts are annotated with comments and error codes by language tutors and are aligned to sentences in subsequent drafts; final grades are available

My Data

You can also take a look at my papers; if one of them uses data you are interested in, I might be able to give it to you in some cases. Papers by Zhang, Rahimi, and Nguyen might be particularly relevant.

Public Datasets that might be of interest for projects (in addition to the links already on the syllabus)

Graded Essays

Argument Mining

Wikipedia

Reviews

Email and Blogs

My Data

Updates since original post: Your data? NB annotated papers from the course?