Assignment 2

Flubb Glubb

Flubb glubb is a children's game in which children sit in a circle and count. The game starts with a designated player saying "1". The next player says "2", and so on. However, for any number divisible by 3, the player must say "flubb" instead of the number. For any number divisible by 7, the player must say "glubb" instead of the number. A number divisible by both 3 and 7 is replaced with "flubb glubb". Any player who hesitates or says the wrong thing is eliminated. The game continues until only one player remains. The beginning of a game can look like:

1, 2, flubb, 4, 5, flubb, glubb, 8, flubb, 10, 11, flubb, 13, glubb, flubb, 16, 17, flubb, 19, 20, flubb glubb, 22, ...
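One way to express these rules in Python is a small helper that returns what a player should say for a single number. This is a minimal sketch, not a required design, and the name `flubb_glubb_word` is hypothetical. The combined case is checked first so that a multiple of 21 is not caught by the "divisible by 3" branch alone.

```python
def flubb_glubb_word(n: int) -> str:
    """Return what a player should say for the number n (hypothetical helper)."""
    # Check "both" first; otherwise 21, 42, ... would only produce "flubb".
    if n % 3 == 0 and n % 7 == 0:
        return "flubb glubb"
    if n % 3 == 0:
        return "flubb"
    if n % 7 == 0:
        return "glubb"
    return str(n)


# Reproduces the opening of the game shown above (1 through 22).
print(", ".join(flubb_glubb_word(n) for n in range(1, 23)))
```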

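Write a program that asks the user how many numbers he/she wishes the game to go for. Write a for loop that starts at 1 and goes up to (and including) the user-specified number. For each number, print "flubb glubb" if it is divisible by both 3 and 7, "flubb" if it is divisible only by 3, "glubb" if it is divisible only by 7, and the number itself otherwise.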

Separate each "number" with a comma. Print only 10 "numbers" to a line (consider each of "flubb", "glubb", and "flubb glubb" as one "number"). If the user wants the game to last for more than 10 numbers, continue on a new line.
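The ten-per-line rule can be handled by counting items as they are produced. The sketch below assumes the game's words have already been computed as strings; the function name `format_game` is hypothetical, and it returns a string instead of printing so the result is easy to check.

```python
def format_game(words, per_line=10):
    """Join words with commas, starting a new line after every per_line items."""
    text = ""
    for i, word in enumerate(words, start=1):
        text += word
        if i == len(words):
            break  # no comma after the final item
        # A full line ends with a comma and a newline; otherwise ", " follows.
        text += ",\n" if i % per_line == 0 else ", "
    return text


print(format_game([str(n) for n in range(1, 13)]))
```

With 31 words this produces four lines, the last holding only the 31st item, which matches the layout of the sample run.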

An example of the game is:

How many numbers do you want to use with the Flubb Glubb game? 31
1, 2, flubb, 4, 5, flubb, glubb, 8, flubb, 10,
11, flubb, 13, glubb, flubb, 16, 17, flubb, 19, 20,
flubb glubb, 22, 23, flubb, 25, 26, flubb, glubb, 29, flubb,
31

Be sure to detect bad input from the user. For any bad input you know how to detect in Python, use a while loop to ask for good input. For any other bad input, describe it in the Assignment Information Sheet.
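A validation loop for the prompt might look like the following. It is a sketch under one assumption: that a valid answer is a whole number of at least 1 (the assignment does not say whether 0 is allowed). The helper names `is_valid_count` and `read_count` are hypothetical. Checking with `str.isdecimal` first means `int()` is only called on text it can convert, so inputs such as `-5`, `3.5`, or `abc` are rejected instead of crashing the program.

```python
def is_valid_count(text):
    """Return True if text is a whole number of at least 1 (assumed rule)."""
    text = text.strip()
    # isdecimal() is False for "", "-5", "3.5" and "abc", so int() is
    # only reached for strings made entirely of decimal digits.
    return text.isdecimal() and int(text) >= 1


def read_count(prompt):
    """Ask repeatedly until the user enters a valid count."""
    answer = input(prompt)
    while not is_valid_count(answer):
        print("Please enter a whole number of at least 1.")
        answer = input(prompt)
    return int(answer)


# Example (interactive):
# count = read_count("How many numbers do you want to use with the Flubb Glubb game? ")
```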

Note: A number is divisible by another number if the remainder of a division is zero, i.e. a is divisible by b if a / b has a remainder of 0. For example, 14 is divisible by 7 because 14 / 7 has a remainder of 0. 14 is not divisible by 5 because 14 / 5 has a remainder of 4.
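In Python, the remainder is computed with the `%` (modulo) operator, so the divisibility test is a single comparison. A minimal illustration using the numbers from the note; `is_divisible` is a hypothetical helper name.

```python
print(14 % 7)  # 0, so 14 is divisible by 7
print(14 % 5)  # 4, so 14 is not divisible by 5


def is_divisible(a, b):
    """Return True if a divided by b leaves no remainder."""
    return a % b == 0
```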

Natural Language Processing

Natural language processing is, broadly, getting a computer program to "understand" and/or generate (i.e., write or speak) text in a natural language such as English. In this program, you will write code to estimate the readability of text.

Readability

Readability is how hard a text is to understand. One simple measure is the Automated Readability Index (ARI). ARI estimates the grade level needed to understand the text using the formula: 4.71*CPW + 0.5*WPS - 21.43

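where CPW is the average number of characters per word (total characters divided by total words) and WPS is the average number of words per sentence (total words divided by total sentences).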
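The formula translates directly into a function of three counts. This sketch assumes you have already counted characters, words, and sentences; the function name `ari` is hypothetical, and the counts in the usage line are invented round numbers, not values from the test text.

```python
def ari(characters, words, sentences):
    """Automated Readability Index from raw counts (assumes words, sentences > 0)."""
    cpw = characters / words   # average characters per word
    wps = words / sentences    # average words per sentence
    return 4.71 * cpw + 0.5 * wps - 21.43


# Invented counts: 470 characters, 100 words, 5 sentences.
print(f"{ari(470, 100, 5):.3f}")  # 10.707
```

The `:.3f` format specifier is one way to print the value to three decimal places.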

Cleaning Text

When determining the similarity of texts, each text is often "cleaned" first: removing capitalization, removing non-alphanumeric characters (i.e., all characters that aren't letters or numbers), lemmatizing words (e.g., converting "running" to "run"), and removing stopwords. Stopwords are very common words that often convey little-to-no specific information, such as "a" or "the". The Natural Language Toolkit (NLTK) provides a list of stopwords. For this program, use these stopwords:

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 'can', 'will', 'just', 'dont', 'should', 'now']

Program Description

Write a program that determines the readability of a text and cleans the text (you will not be determining the similarity of texts).

For readability, use ARI as described above. Do not clean the text before estimating the readability since this will affect the readability estimation. Print out the ARI value to three decimal places.
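A rough end-to-end estimate on raw text might look like this. It is a sketch under stated assumptions, not the expected solution: words are whitespace-separated tokens, characters are letters and digits only, and a sentence ends at each run of `.`, `!`, or `?`. That sentence rule is naive (for example, it treats every quoted exclamation as a sentence end), so it may not reproduce the 36-sentence count given for the test chapter. The name `estimate_ari` is hypothetical.

```python
import re


def estimate_ari(text):
    """Rough ARI for raw, uncleaned text (assumes at least one word and sentence)."""
    words = text.split()                           # whitespace-separated tokens
    characters = sum(ch.isalnum() for ch in text)  # True counts as 1: letters and digits only
    # Naive rule: each run of '.', '!' or '?' ends one sentence.
    sentences = len(re.findall(r"[.!?]+", text))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


sample = "The cat sat. The dog ran!"
print(f"{estimate_ari(sample):.3f}")  # -5.800
```

Here the sample has 18 letters, 6 words, and 2 sentences, so the result is 4.71 * 3 + 0.5 * 3 - 21.43 = -5.8. Very short sentences can push ARI below zero.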

Cleaning the text involves doing the following (probably in this order):

  1. Creating a list of the words in the text
  2. Converting each word to lowercase
  3. Removing all non-alphanumeric characters from each word (you may find the constants in Python's string module, or the str.isalnum method, useful)
  4. Removing all stopwords from the list of words
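The four steps map naturally onto list comprehensions. In this sketch, `clean` is a hypothetical name, the stopword list is shortened to an excerpt for brevity (use the full list in a real program), and words that become empty after step 3 (such as a lone dash) are dropped. The assignment does not specify how to handle empty words, so dropping them is an assumption.

```python
# Excerpt of the stopword list given above, for illustration only.
stopwords = ['i', 'me', 'my', 'a', 'the', 'and', 'of', 'in', 'it', 'was', 'that', 'can', 'with']


def clean(text, stopwords):
    """Apply steps 1-4 to text and return the cleaned word list."""
    words = text.split()                                              # 1. list of words
    words = [w.lower() for w in words]                                # 2. lowercase
    words = ["".join(ch for ch in w if ch.isalnum()) for w in words]  # 3. keep letters/digits
    return [w for w in words if w and w not in stopwords]             # 4. drop stopwords and empties


print(clean("The first place that I can well remember was a large pleasant meadow", stopwords))
# ['first', 'place', 'well', 'remember', 'large', 'pleasant', 'meadow']
```

Doing step 3 before step 4 matters: "Don't" becomes "dont", which is on the stopword list, whereas "don't" is not.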

Note that you do not need to lemmatize the words.

Print out the final list of cleaned words.

Use a list comprehension at least once in this program.

To test your program, use the first chapter of Black Beauty:

text = '''The first place that I can well remember was a large pleasant meadow with a pond of clear water in it. Some shady trees leaned over it, and rushes and water-lilies grew at the deep end. Over the hedge on one side we looked into a plowed field, and on the other we looked over a gate at our master's house, which stood by the roadside; at the top of the meadow was a grove of fir trees, and at the bottom a running brook overhung by a steep bank.

While I was young I lived upon my mother's milk, as I could not eat grass. In the daytime I ran by her side, and at night I lay down close by her. When it was hot we used to stand by the pond in the shade of the trees, and when it was cold we had a nice warm shed near the grove.

As soon as I was old enough to eat grass my mother used to go out to work in the daytime, and come back in the evening.

There were six young colts in the meadow besides me; they were older than I was; some were nearly as large as grown-up horses. I used to run with them, and had great fun; we used to gallop all together round and round the field as hard as we could go. Sometimes we had rather rough play, for they would frequently bite and kick as well as gallop.

One day, when there was a good deal of kicking, my mother whinnied to me to come to her, and then she said:

"I wish you to pay attention to what I am going to say to you. The colts who live here are very good colts, but they are cart-horse colts, and of course they have not learned manners. You have been well-bred and well-born; your father has a great name in these parts, and your grandfather won the cup two years at the Newmarket races; your grandmother had the sweetest temper of any horse I ever knew, and I think you have never seen me kick or bite. I hope you will grow up gentle and good, and never learn bad ways; do your work with a good will, lift your feet up well when you trot, and never bite or kick even in play."

I have never forgotten my mother's advice; I knew she was a wise old horse, and our master thought a great deal of her. Her name was Duchess, but he often called her Pet.

Our master was a good, kind man. He gave us good food, good lodging, and kind words; he spoke as kindly to us as he did to his little children. We were all fond of him, and my mother loved him very much. When she saw him at the gate she would neigh with joy, and trot up to him. He would pat and stroke her and say, "Well, old Pet, and how is your little Darkie?" I was a dull black, so he called me Darkie; then he would give me a piece of bread, which was very good, and sometimes he brought a carrot for my mother. All the horses would come to him, but I think we were his favorites. My mother always took him to the town on a market day in a light gig.

There was a plowboy, Dick, who sometimes came into our field to pluck blackberries from the hedge. When he had eaten all he wanted he would have what he called fun with the colts, throwing stones and sticks at them to make them gallop. We did not much mind him, for we could gallop off; but sometimes a stone would hit and hurt us.

One day he was at this game, and did not know that the master was in the next field; but he was there, watching what was going on; over the hedge he jumped in a snap, and catching Dick by the arm, he gave him such a box on the ear as made him roar with the pain and surprise. As soon as we saw the master we trotted up nearer to see what went on.

"Bad boy!" he said, "bad boy! to chase the colts. This is not the first time, nor the second, but it shall be the last. There take your money and go home; I shall not want you on my farm again." So we never saw Dick any more. Old Daniel, the man who looked after the horses, was just as gentle as our master, so we were well off.'''

The readability of the chapter is 7.456 or 7.447 (depending on how you count the number of words).

In the chapter above, there are exactly 36 sentences. If you get a different count, you may be dealing with one or more of the problems below:

Final Notes

Submission and Grading:

Complete the Assignment Information Sheet.

Submit your final programs and assignment information sheet (zipped into one file) to CourseWeb in the Assignment 2 assignment.

The grading rubric can be found here: Rubric (doc).

The assignment is due Monday, June 22 by 11:59 pm. As with all programming assignments, you have unlimited uploads before the deadline, so you may upload the assignment early and later upload a revised version without penalty. The last submission uploaded before the deadline will be the one graded. If you would like ungraded feedback on a programming assignment, you may email your TA or the instructor to ask for feedback; please include your code.

For more advice on submitting your assignment, see the Programming Assignments section of the Tips for Success page.