Homework 3
Assigned 9/06; Due 9/11 11:59PM
Submit via CourseWeb (click on Homeworks on the left menu)
1. Praat
- Download Praat onto a computer.
- Read the very brief tutorial here.
- Record yourself in Praat saying the sentence "Yeah, right!" in two ways: sarcastic and not sarcastic.
Try for genuine sarcasm; ask another person to confirm that you have a good example
and that the two utterances are distinct.
- For each of the two sentences, get the following pieces of information:
- The average pitch of the whole sentence in Hz
- The maximum pitch for the whole utterance in Hz
- The minimum pitch for the whole utterance in Hz
- The maximum intensity for the whole utterance in dB
- The minimum intensity for the whole utterance in dB
- Your rate of speech in words per second for the utterance
- At least one additional prosodic feature from the Tepperman paper
(it is assigned for the 9/18 class)
- Any other features that you wish to compute.
- Compare the two sentences on the above
features. Discuss any patterns you see.
- Write up one paragraph with your numbers and
the discussion. You might find it useful to read
the Tepperman et al. paper now rather than later.
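
Once the per-frame pitch values and the utterance start and end times have been read off in Praat, the summary numbers and the comparison can be computed by hand or with a few lines of code. The following is a minimal Python sketch, not part of the assignment: the function names are hypothetical, the input lists stand in for values you would copy out of Praat yourself, and the numbers shown are made up. It assumes unvoiced frames appear as None or 0 and should be skipped.

```python
from statistics import mean


def summarize_pitch(pitch_hz):
    """Mean, max, and min pitch over voiced frames only.

    Frames given as None or 0 are treated as unvoiced and skipped
    (an assumption about how the values were exported).
    """
    voiced = [f for f in pitch_hz if f]
    if not voiced:
        raise ValueError("no voiced frames to summarize")
    return {"mean_hz": mean(voiced), "max_hz": max(voiced), "min_hz": min(voiced)}


def speech_rate(n_words, start_s, end_s):
    """Words per second between the utterance's start and end times."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end time must be after start time")
    return n_words / duration


def compare(plain, sarcastic):
    """Per-feature difference (sarcastic minus plain)."""
    return {name: sarcastic[name] - plain[name] for name in plain}


# Made-up numbers, for illustration only.
plain = summarize_pitch([180.0, 210.0, None, 195.0])
sarcastic = summarize_pitch([150.0, 0, 165.0, 140.0])
plain["words_per_s"] = speech_rate(2, 0.10, 0.90)  # "Yeah, right!" is 2 words
sarcastic["words_per_s"] = speech_rate(2, 0.10, 1.40)
for name, delta in compare(plain, sarcastic).items():
    print(f"{name}: {delta:+.2f}")
```

A negative difference means the sarcastic reading had the lower value for that feature. Differences like these are raw material for the discussion paragraph, not a substitute for it.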
2. Twitter
- Start by defining the type of tweet you wish to target for a small corpus,
and select at least three indicators for that type of tweet.
For example, if your target is an "amusement"
dataset of tweets, one potential indicator could be "LOL".
- Collect a set of at least 500 tweets that contain appropriate
indicators, using repeated calls to the Search API, which is documented
here.
- The easiest way to get an initial feeling for how requests work is through a browser using the following
link.
- Hints:
- One possible high-level approach (among others) to this task is to use the "rpp" parameter to obtain
100 tweets per request, then use the "max_id" parameter to retrieve the next 100 tweets.
- Make use of the "lang" parameter.
- The precise way you end up sending requests to the Twitter APIs for this assignment will
depend on the programming language you use (or the library you may
choose to employ).
- Note that the types of indicators you can use are limited by the types of requests you can make (for
example, one cannot easily search for syntactic patterns).
- Also note that Twitter has rate limitations.
- Write the 500 tweets to a plaintext file, one tweet per line,
which you will turn in. Also create a short file describing: the
tweet type you chose to collect, the respective indicators, your
code, and how well your requests worked.
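
The paging approach from the hints (request a page, then use "max_id" to ask for older tweets) can be sketched as below. This is an illustration in Python under stated assumptions, not a required design. The request itself is hidden behind a hypothetical `fetch_page(query, count, max_id)` callable that you would write with your HTTP library of choice; it is assumed to return a list of dicts with "id" and "text" keys, newest first. The sketch also assumes "max_id" is inclusive, so it subtracts 1 to avoid fetching the boundary tweet twice. `pause_s` is a crude way to space out requests for the rate limits.

```python
import time


def collect_tweets(fetch_page, query, target=500, page_size=100, pause_s=0.0):
    """Page backwards through search results until `target` tweets are collected."""
    tweets, seen_ids, max_id = [], set(), None
    while len(tweets) < target:
        page = fetch_page(query, page_size, max_id)  # hypothetical request helper
        if not page:
            break  # no older results left
        for tweet in page:
            if tweet["id"] not in seen_ids:  # skip duplicates across pages
                seen_ids.add(tweet["id"])
                tweets.append(tweet)
        # Next request: only tweets strictly older than the oldest one seen so far.
        max_id = min(t["id"] for t in page) - 1
        time.sleep(pause_s)
    return tweets[:target]


def to_line(text):
    """Collapse newlines and runs of whitespace so a tweet fits on one line."""
    return " ".join(text.split())


def write_one_per_line(tweets, path):
    """Write each tweet's text on its own line of a UTF-8 plaintext file."""
    with open(path, "w", encoding="utf-8") as out:
        for tweet in tweets:
            out.write(to_line(tweet["text"]) + "\n")
```

Collapsing whitespace matters because tweets can contain line breaks, which would otherwise break the one-tweet-per-line format of the file you turn in.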
3. LIWC
Lexicons like LIWC typically ignore grammatical information like
part of speech as well as contextual information about where and how
the words in the lexicon are used. To demonstrate problems with such
lexical approaches, choose two categories from LIWC, using this resource:
From each category, select one word and find 3 real examples in which
that word does not exhibit the sense labeled by the category. Try to
find interesting examples that are likely to really create problems,
rather than relying on rare senses of words.
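
To see why context-free lookup causes trouble, it can help to imitate it. The Python sketch below counts category hits the way a purely lexical approach would, by matching word forms and nothing else. The categories and word lists in `TOY_LEXICON` are invented for illustration and are not taken from LIWC; the trailing `*` is treated as a stem wildcard.

```python
import re

# Invented entries for illustration only; NOT the real LIWC word lists.
TOY_LEXICON = {
    "posemo": ["kind", "pretty", "well"],
    "anger": ["mad", "kill*"],
}


def compile_category(entries):
    """Build one case-insensitive regex matching any entry as a whole word."""
    parts = []
    for entry in entries:
        if entry.endswith("*"):
            parts.append(re.escape(entry[:-1]) + r"\w*")  # stem wildcard
        else:
            parts.append(re.escape(entry))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERNS = {name: compile_category(words) for name, words in TOY_LEXICON.items()}


def count_categories(text):
    """Count matches per category, ignoring part of speech and context."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


for sentence in [
    "That was a kind thing to say.",
    "It was kind of a pretty bad day.",
    "This killer workout made my day.",
]:
    print(sentence, count_categories(sentence))
```

The second and third sentences are the kind of mismatch the exercise asks for: "kind of" and "pretty bad" are hedges and intensifiers rather than positive emotion, and "killer workout" is praise rather than anger, yet a lookup like this counts them all.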