CS2710, ISSP 2160 Fundamentals of Artificial Intelligence, Fall 2014
Assignment 5: Uncertainty

By submitting a solution to this assignment, you attest that (1) you did all of the work, (2) you did it alone, and (3) you did it without using resources outside of those provided by the class (CS2710/ISSP 2160 Foundations of Artificial Intelligence, Fall 2014), unless you explicitly state otherwise (in which case exactly what you used must be clearly indicated).

------------------------------------------------------

1. Suppose that A is independent of B. It is not true that, for all C, A is conditionally independent of B given the value of C. Give an example that illustrates this.

2. Assume that 2% of the population in a country carry a particular virus. A test kit is able to detect the presence of the virus from a patient's blood sample. The test kit has the following accuracy:

   P(the kit shows positive | the patient is a carrier) = 0.998
   P(the kit shows negative | the patient is not a carrier) = 0.996

What is the probability of a false positive?

From http://www.tc3.edu/instruct/sbrown/stat/falsepos.htm: "Out of 1,098 tests that report positive results, 99 (9%) are correct and 999 (91%) are false positives. Therefore the probability that you actually have disease D, when you're given a positive test result, is only 9%. Symbolically you can write this as P(have D | test positive) = 9%."

3. Assume the following conditional probabilities are available.

   P(WetGrass | Sprinkler, Rain)   = 0.95
   P(WetGrass | Sprinkler, ~Rain)  = 0.9
   P(WetGrass | ~Sprinkler, Rain)  = 0.8
   P(WetGrass | ~Sprinkler, ~Rain) = 0.1
   P(Sprinkler | RainySeason)      = 0.01
   P(Sprinkler | ~RainySeason)     = 0.9
   P(Rain | RainySeason)           = 0.9
   P(Rain | ~RainySeason)          = 0.2
   P(RainySeason)                  = 0.7

Construct a Bayesian network (including the conditional probability tables and the graph structure), and determine the probability P(WetGrass, RainySeason, ~Rain, ~Sprinkler).

4. Below is a data set from the UC Irvine Machine Learning Repository. It concerns whether or not (T, F) a balloon is inflated. The columns are color, size, act, age, and inflated.

   YELLOW,SMALL,STRETCH,ADULT,T
   YELLOW,SMALL,STRETCH,ADULT,T
   YELLOW,SMALL,STRETCH,CHILD,F
   YELLOW,SMALL,DIP,ADULT,F
   YELLOW,SMALL,DIP,CHILD,F
   YELLOW,LARGE,STRETCH,ADULT,T
   YELLOW,LARGE,STRETCH,ADULT,T
   YELLOW,LARGE,STRETCH,CHILD,F
   YELLOW,LARGE,DIP,ADULT,F
   YELLOW,LARGE,DIP,CHILD,F
   PURPLE,SMALL,STRETCH,ADULT,T
   PURPLE,SMALL,STRETCH,ADULT,T
   PURPLE,SMALL,STRETCH,CHILD,F
   PURPLE,SMALL,DIP,ADULT,F
   PURPLE,SMALL,DIP,CHILD,F
   PURPLE,LARGE,STRETCH,ADULT,T
   PURPLE,LARGE,STRETCH,ADULT,T
   PURPLE,LARGE,STRETCH,CHILD,F
   PURPLE,LARGE,DIP,ADULT,F
   PURPLE,LARGE,DIP,CHILD,F

Give the following probabilities:

   P(F | Yellow, Small) =
   P(F, Yellow, Small) =
   P(T, Adult | Purple) =
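If you want to sanity-check your hand counts for question 4, a few lines of Python suffice. The sketch below is mine, not part of the assignment; it rebuilds the 20 rows (each color/size block repeats the same five act/age/label rows), computes joint probabilities as counts over the total, and obtains conditionals as ratios of joints.

    # A minimal sketch for checking the question 4 counts.
    COLS = ('color', 'size', 'act', 'age', 'inflated')

    # Each (color, size) block of the table above repeats these five rows.
    BLOCK = [('STRETCH', 'ADULT', 'T'), ('STRETCH', 'ADULT', 'T'),
             ('STRETCH', 'CHILD', 'F'), ('DIP', 'ADULT', 'F'),
             ('DIP', 'CHILD', 'F')]

    data = [dict(zip(COLS, (color, size, act, age, label)))
            for color in ('YELLOW', 'PURPLE')
            for size in ('SMALL', 'LARGE')
            for act, age, label in BLOCK]

    def p(**cond):
        """Empirical probability that every column=value condition holds."""
        hits = [row for row in data if all(row[k] == v for k, v in cond.items())]
        return len(hits) / len(data)

    # A joint probability is a direct count; a conditional is a ratio of joints.
    print(p(inflated='F', color='YELLOW', size='SMALL'))        # P(F, Yellow, Small)
    print(p(inflated='F', color='YELLOW', size='SMALL') /
          p(color='YELLOW', size='SMALL'))                      # P(F | Yellow, Small)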
5. We want to classify athletes as either not rich or rich. Each athlete plays either basketball or tennis, and is either male or female. Thus, we have:

   Economic status (E): not-rich, rich
   Sport (S): basketball, tennis
   Gender (G): male, female

Total number of athletes: 640

   320 athletes are rich (E = rich)
       160 are basketball players (E = rich, S = basketball)
           40 are female  (E = rich, S = basketball, G = female)
           120 are male   (E = rich, S = basketball, G = male)
       160 are tennis players (E = rich, S = tennis)
           120 are female (E = rich, S = tennis, G = female)
           40 are male    (E = rich, S = tennis, G = male)
   320 athletes are not rich (E = not-rich)
       160 are basketball players (E = not-rich, S = basketball)
           120 are female (E = not-rich, S = basketball, G = female)
           40 are male    (E = not-rich, S = basketball, G = male)
       160 are tennis players (E = not-rich, S = tennis)
           40 are female  (E = not-rich, S = tennis, G = female)
           120 are male   (E = not-rich, S = tennis, G = male)

Are G and S conditionally independent given E? Please support your answer.

6. See the appendix below for information about the Naive Bayes probabilistic classifier.

                          positive   negative
   P(Class)                 0.5        0.5
   *Size*
   P(small | Class)         0.4        0.4
   P(medium | Class)        0.1        0.2
   P(large | Class)         0.5        0.4
   *Color*
   P(red | Class)           0.9        0.3
   P(blue | Class)          0.05       0.3
   P(green | Class)         0.05       0.4
   *Shape*
   P(square | Class)        0.05       0.4
   P(triangle | Class)      0.05       0.3
   P(circle | Class)        0.9        0.3

(6.A) Apply EQ1 to the test instance. Please show your work.

(6.B) Calculate EQ2 for the same test instance, for both the positive and negative classes. (How can you derive the denominator, given what you have?) Please show your work.
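The arithmetic in (6.A) and (6.B) is mechanical, and the normalization is the usual place to slip. Here is a minimal sketch of EQ1 and EQ2 over the table above; the cpt layout and function names are my own, and the instance shown is only a placeholder for the test instance in (6.A). Note how the denominator of EQ2 falls out of what you already have: summing the EQ1 scores over all classes gives exactly P(a1, ..., an).

    # cpt[value][class] = P(value | class), copied from the question 6 table.
    prior = {'positive': 0.5, 'negative': 0.5}
    cpt = {
        'small':    {'positive': 0.4,  'negative': 0.4},
        'medium':   {'positive': 0.1,  'negative': 0.2},
        'large':    {'positive': 0.5,  'negative': 0.4},
        'red':      {'positive': 0.9,  'negative': 0.3},
        'blue':     {'positive': 0.05, 'negative': 0.3},
        'green':    {'positive': 0.05, 'negative': 0.4},
        'square':   {'positive': 0.05, 'negative': 0.4},
        'triangle': {'positive': 0.05, 'negative': 0.3},
        'circle':   {'positive': 0.9,  'negative': 0.3},
    }

    def eq1_score(instance, cls):
        """EQ1: the unnormalized score P(a1|class) ... P(an|class) P(class)."""
        score = prior[cls]
        for value in instance:
            score *= cpt[value][cls]
        return score

    def eq2_posterior(instance, cls):
        """EQ2: normalize EQ1; the denominator P(a1,...,an) is the sum of
        the EQ1 scores over all classes (law of total probability)."""
        z = sum(eq1_score(instance, c) for c in prior)
        return eq1_score(instance, cls) / z

    instance = ('medium', 'red', 'circle')   # placeholder; use the instance from (6.A)
    for cls in prior:
        print(cls, eq1_score(instance, cls), eq2_posterior(instance, cls))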
(6.C) One approach to resolving ambiguous words in English is to use Bayesian reasoning based on surrounding words. Consider the following three meanings of the word "class":

   1. "prototype for an object in Object-Oriented Programming (OOP)";
   2. "education imparted in a series of lessons or class meetings";
   3. "people having the same social or economic status".

Assume we treat the presence (or absence) of the following words anywhere in a sentence as evidence:

   "people"    ("People often forget to define a destructor for their class";
                "People are often late to class";
                "The struggle of lower class people is the driving force of progress")
   "program"   ("This program does not use the window class";
                "This class is a required part of the natural science program";
                "The government's tax program does not address the needs of the lower class")
   "student"   ("This window class was written by a clever student";
                "The student was late to class";
                "The student was concerned with the problems of the working class")
   "education" ("Learning how to write an abstract class is a vital part of your education";
                "Not attending the class will hamper your education";
                "Lowering the cost of education is an important issue for the middle class")

Assume that the following prior and conditional probabilities are measured, where m is a possible meaning for the ambiguous word. E.g.:

   P('student' appears in a sentence | 'class' has the "lessons" meaning in that sentence) = 0.2
   P('student' does not appear in a sentence | 'class' has the "lessons" meaning in that sentence) = 0.8

   m                    OOP      lessons   economic status
   -------------------------------------------------------
   P(m)                 0.1      0.6       0.3
   P('people' | m)      0.001    0.1       0.1
   P('program' | m)     0.1      0.01      0.001
   P('student' | m)     0.01     0.2       0.01
   P('education' | m)   0.005    0.05      0.05

Apply the Naive Bayes classifier to determine which is the most probable meaning of "class" in the sentence "Did the student complete the homework program for the class?" Please show your work.

7. Consider the following Bayesian network (edges: A72 -> A2, A2 -> A6, A7 -> A6, A7 -> A4, A2 -> A5, A6 -> A1, A5 -> A1):

   A72 --> A2 --> A6 <-- A7 --> A4
            \       \
             v       v
             A5 ---> A1

Are A72 and A5 conditionally independent given A2?
Are A72 and A5 d-separated given A2?
Are A1 and A7 d-separated given A6?
Are A1 and A7 d-separated given A6, A2?
Are A2 and A4 d-separated given A1, A7?

8. R&N 14.1.

9. Prove that a variable is independent of all other variables in the network, given its Markov blanket. In your answer, refer to Figure 14.4. You can assume the TA will have Figure 14.4 in front of him when reading your answer.

=============================================

10. Partially trace the decision tree induction algorithm given in lecture on the following data. Specifically, show which attribute is chosen as the root (feel free to use entropy.py), and then show the first recursive calls (i.e., all the calls to DTL the first time the for-loop is executed).

   Day   out     temp   hum    wind     playtennis
   d1    sunny   hot    high   weak     no
   d2    sunny   hot    high   strong   no
   d3    over    hot    high   weak     yes
   d4    rain    mild   high   weak     yes
   d5    rain    cool   norm   weak     yes
   d6    rain    cool   norm   strong   no
   d7    over    cool   norm   strong   yes
   d8    sunny   mild   high   weak     no
   d9    sunny   cool   norm   weak     yes
   d10   rain    mild   norm   weak     yes
   d11   sunny   mild   norm   strong   yes
   d12   over    mild   high   strong   yes
   d13   over    hot    norm   weak     yes
   d14   rain    mild   high   strong   no
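For question 10, entropy.py from the lecture is the intended tool; if you want an independent check, the stand-in sketch below (my own code, not the course script) computes the entropy of the playtennis label and the information gain of each attribute, which is what the root choice is based on.

    from collections import Counter
    from math import log2

    # Rows of the question 10 table: (out, temp, hum, wind, playtennis).
    DATA = [
        ('sunny', 'hot',  'high', 'weak',   'no'),
        ('sunny', 'hot',  'high', 'strong', 'no'),
        ('over',  'hot',  'high', 'weak',   'yes'),
        ('rain',  'mild', 'high', 'weak',   'yes'),
        ('rain',  'cool', 'norm', 'weak',   'yes'),
        ('rain',  'cool', 'norm', 'strong', 'no'),
        ('over',  'cool', 'norm', 'strong', 'yes'),
        ('sunny', 'mild', 'high', 'weak',   'no'),
        ('sunny', 'cool', 'norm', 'weak',   'yes'),
        ('rain',  'mild', 'norm', 'weak',   'yes'),
        ('sunny', 'mild', 'norm', 'strong', 'yes'),
        ('over',  'mild', 'high', 'strong', 'yes'),
        ('over',  'hot',  'norm', 'weak',   'yes'),
        ('rain',  'mild', 'high', 'strong', 'no'),
    ]
    ATTRS = ('out', 'temp', 'hum', 'wind')

    def entropy(rows):
        """Entropy of the playtennis label (the last column) over these rows."""
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return -sum(c / total * log2(c / total) for c in counts.values())

    def gain(rows, i):
        """Information gain of splitting these rows on attribute column i."""
        remainder = 0.0
        for value in set(row[i] for row in rows):
            subset = [row for row in rows if row[i] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(rows) - remainder

    for i, name in enumerate(ATTRS):
        print(name, round(gain(DATA, i), 4))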
=============================================

Appendix on Naive Bayes Probabilistic Classification

Assign the class that is most probable, given a combination of attribute values. The attribute values are all given - they are evidence variables.

   answer = argmax P(class | a1, a2, ..., an)
             class

You might not be familiar with argmax. Here is a use that shows how it works:

   maxAbsoluteValue(S) = argmax |s|
                         s in S

"the s in S such that the expression |s| is maximal". E.g.: maxAbsoluteValue([4, -8, -1, 3]) ==> -8

Apply Bayes' rule, plugging into the above:

   answer = argmax  P(a1, a2, ..., an | class) P(class)
             class   -------------------------------------
                              P(a1, a2, ..., an)

We can ignore the denominator, since it is a constant value that is the same for all classes. It won't determine which class we choose. So, our classifier is:

   answer = argmax P(a1, a2, ..., an | class) P(class)
             class

In the Naive Bayes model, the attributes are all conditionally independent of each other, given the value of the class variable.

Let's derive a Naive Bayes model. Suppose we have three attributes, a1, a2, a3.

   answer = argmax P(a1, a2, a3 | class) P(class)        [from above]
             class

*apply the definition of conditional probability*

   answer = argmax  P(a1, a2, a3, class) P(class)
             class   -----------------------------
                             P(class)

*P(class) cancels*

   answer = argmax P(a1, a2, a3, class)
             class

*apply the chain rule*

   answer = argmax P(a1 | a2, a3, class) P(a2 | a3, class) P(a3 | class) P(class)
             class

*apply the conditional independence assumptions*

   (EQ1)   answer = argmax P(a1 | class) P(a2 | class) P(a3 | class) P(class)
                     class

Recall: we dropped the denominator above. We need it if we do want the actual probability:

   (EQ2)   P(class | a1, a2, a3) = P(a1 | class) P(a2 | class) P(a3 | class) P(class)
                                   ---------------------------------------------------
                                                   P(a1, a2, a3)

*For the final, be sure you can derive EQ1 and EQ2, and are able to explain the derivation*

*Example:* Decide whether to play tennis (Yes, No).

Training data:

   outlook   temp   humidity   wind   play
   Sun       H      High       W      No
   Sun       H      High       S      No
   Over      H      High       W      Yes
   Rain      Mild   High       W      Yes
   Rain      Cool   Normal     W      Yes
   Rain      Cool   Normal     S      Yes
   Over      Cool   Normal     S      No
   Sun       Mild   High       W      Yes
   Sun       Cool   Normal     W      No
   Rain      Mild   Normal     W      Yes
   Sun       Mild   Normal     S      Yes
   Over      Mild   High       S      Yes
   Over      H      Normal     W      Yes
   Rain      Mild   High       S      No

To classify the test instance (outlook = sunny, temp = cool, humidity = high, wind = strong), compare:

   P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes)
   P(no)  P(sunny | no)  P(cool | no)  P(high | no)  P(strong | no)

   P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes)
      = 9/14 * 2/9 * 2/9 * 4/9 * 3/9 = .0047

   P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no)
      = 5/14 * 3/5 * 2/5 * 3/5 * 3/5 = .0309

Answer: No

The numbers are called "parameter estimates", specifically "maximum likelihood estimates". We estimated the parameters based on counts in the training data.
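To make the example concrete end to end, here is a short sketch (the names and layout are mine) that derives the maximum likelihood estimates by counting and evaluates EQ1 for the test instance above. Running it prints roughly 0.0047 for Yes and 0.0309 for No, matching the hand computation.

    # (outlook, temp, humidity, wind) -> play, from the appendix training table.
    TRAIN = [
        (('Sun',  'H',    'High',   'W'), 'No'),
        (('Sun',  'H',    'High',   'S'), 'No'),
        (('Over', 'H',    'High',   'W'), 'Yes'),
        (('Rain', 'Mild', 'High',   'W'), 'Yes'),
        (('Rain', 'Cool', 'Normal', 'W'), 'Yes'),
        (('Rain', 'Cool', 'Normal', 'S'), 'Yes'),
        (('Over', 'Cool', 'Normal', 'S'), 'No'),
        (('Sun',  'Mild', 'High',   'W'), 'Yes'),
        (('Sun',  'Cool', 'Normal', 'W'), 'No'),
        (('Rain', 'Mild', 'Normal', 'W'), 'Yes'),
        (('Sun',  'Mild', 'Normal', 'S'), 'Yes'),
        (('Over', 'Mild', 'High',   'S'), 'Yes'),
        (('Over', 'H',    'Normal', 'W'), 'Yes'),
        (('Rain', 'Mild', 'High',   'S'), 'No'),
    ]

    def eq1_score(instance, cls):
        """EQ1 with maximum likelihood estimates taken from TRAIN:
        P(class) * product over attributes of P(a_i | class)."""
        in_class = [attrs for attrs, label in TRAIN if label == cls]
        score = len(in_class) / len(TRAIN)            # P(class) = 9/14 or 5/14
        for i, value in enumerate(instance):
            matches = sum(1 for attrs in in_class if attrs[i] == value)
            score *= matches / len(in_class)          # P(a_i | class)
        return score

    test = ('Sun', 'Cool', 'High', 'S')
    for cls in ('Yes', 'No'):
        print(cls, round(eq1_score(test, cls), 4))    # Yes -> 0.0047, No -> 0.0309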