CS1699: Homework 5

Due: 12/11/2015, 11:59pm

Instructions: Please provide your code and your written answers. Your written answers should be in the form of a single PDF or Word document (.doc or .docx). Include your name in your write-up. Your code should be written in Matlab. Zip or tar your written answers and .m files and upload the .zip or .tar file on CourseWeb -> CS1699 -> Assignments -> Homework 5. Name the file YourFirstName_YourLastName.zip or YourFirstName_YourLastName.tar.

For this homework, you have a choice of which part you want to complete. Part I [10 points] is mandatory for everyone, but does not involve any programming. Parts II and III are 40 points each, and you should pick one of them to complete. Thus, this homework will be out of 50 points max, and will count as 10% of your final grade. Note that while lengthy, the descriptions of Parts II and III should make them fairly straightforward to complete. Read the descriptions for both, as the description for Part II contains information you need to complete Part III.

Extra credit: There are two ways to get extra credit on this assignment, but note that the total extra credit for this assignment is capped at 20 points (40%).

Part I: Attributes for image retrieval (10 points)

In this problem, you will use the WhittleSearch system to perform some searches. Use the following three interfaces: interface1, interface2, and interface3. Perform five searches total (distributed in any way over the three interfaces), trying to find the image in the top-left corner of the page. Do not scroll past the first 4 rows of image results. Report how long it took you to find the query image or something very similar to it for each search (and feel free to quit if it takes more than 5 iterations per query image), and any other comments you have. To update the search results, click on the yellow/green button. To set results to update automatically for the next round, use the button above the yellow arrow. To get a new query image, click on the button marked by a red arrow below:




Part II: Attributes for zero-shot recognition (40 points)

In this problem, you will implement a zero-shot recognition system which resembles the system proposed in Christoph Lampert et al.'s paper. (You don't need to read the paper to do the homework.) All you need to know about this system is that it models the probability that a certain object (e.g. a polar bear) is present in the image using the probabilities that each of the attributes the object is known to have is present. For example, if we detect the attributes "white", "furry", "bulbous", "not lean", "not brown" etc. in the image, i.e. attributes that a polar bear is known to have, we can be fairly confident that there is a polar bear in the image. Hence, we can recognize a polar bear without ever having seen a polar bear, if (1) we know what attributes a polar bear has, and (2) we have classifiers trained for these attributes, using images from other object classes (i.e. other animals). Follow the steps below to implement zero-shot recognition.

  1. First, copy the Animals with Attributes dataset (originally appearing here) from the following Pitt AFS directory. Look into the "Animals with Attributes" folder. The dataset includes 50 animal categories, 85 attributes, and 30,475 images. The dataset provides a 50x85 predicate-matrix-binary.txt which you should read into Matlab using M = load('predicate-matrix-binary.txt'); An entry (i, j)=1 in the matrix says that the i-th class has the j-th attribute (e.g. a bear is white), and an entry of (i, j)=0 says that the i-th class doesn't have the j-th attribute (e.g. a bear is not white).
  2. Four feature types are provided: cq-hist, which are color histograms, phog-hist, which are a variation of HOG features, sift-hist, which is a SIFT bag-of-words histogram, and decaf, which are features extracted from a deep neural network. Pick any one feature type to use. For any feature type, there is one text file for every image, and files are organized by animal categories.
  3. The paper splits the object classes (not images) into a training and test set, for purposes of zero-shot recognition. In this scenario, the training classes are animals that your system will see, i.e. ones whose images the system has access to. In contrast, the test set contains classes (animals) for which your system will never see example images. The 40 training classes are given in trainclasses.txt and the 10 test classes are given in testclasses.txt. (Use [c1, c2] = textread('classes.txt', '%u %s'); to read in the class names. You can use the same function but with a different second argument to read in testclasses.txt.) At each time, we will assume that a query image can only be classified as belonging to one of the 10 unseen classes, so chance performance (randomly guessing the label) will be 10%.
  4. You will use all or a random sample of all images from the training classes (or rather, their feature descriptors) to train a classifier for each of the 85 attributes. The predicate matrix mentioned above tells you which animals have which attributes. So if a bear is brown, you should assign the "brown=1" tag to all of its images. Similarly, if a dalmatian is not brown, you should assign the tag "brown=0" to all of its images. You will use the images tagged with "brown=1" as the positive data in your classifier, and the images tagged with "brown=0" as the negative data, for the "brown" classifier. Use the Matlab fitcsvm function to train the classifiers. Save the model output by each attribute classifier as the j-th entry in a models cell array (initialized as models = cell(85, 1);) Note that if you sample data in such a way that you have either no positive or no negative data for some attribute classifier, you'll get a classifier that only knows about one class, which is a problem. However, for every attribute, there are some classes that do and some that don't have the attribute. So you just have to make sure you sample data from all classes, when training your attribute classifiers.
  5. You now have one classifier for each attribute. Next, apply each attribute classifier j to each image l belonging to any of the test classes, and save the probability that the j-th attribute is present in the l-th image. To do so, you need one extra operation on your classifier. For each of the attribute classifiers, run the function fitSVMPosterior on it, i.e. call model = models{j}; model = fitSVMPosterior(model); models{j} = model; (or if you want, run this function on each classifier before saving it into the cell array). Then to get the probability that the l-th image contains the attribute j, call [label, scores] = predict(model, x); where x is the feature descriptor for your image. Then scores will be a 1x2 vector whose entries follow the order of the class labels: assuming you trained with labels 0 and 1, the first entry is the probability that the j-th attribute is not present in the image, and the second entry is the probability that it is present. Ensure that the probabilities sum to 1 (up to floating-point rounding) by calling assert(abs(sum(scores) - 1) < 1e-6), or if x contains the descriptors for multiple images, assert(all(abs(sum(scores, 2) - 1) < 1e-6)). Save these probabilities so you can easily access them in the next step.
  6. You will now actually predict which animals are present in each test image. (This corresponds to Equation 2 in the paper.) To perform classification of a query test image, you will assign it to the test class (out of 10) whose attribute "signature" it matches the best. How can we compute the probability that an image belongs to some animal category? Let's use a toy example where we only have 2 animal classes and 5 attributes. We know (from a predicate matrix like the one discussed above) that the first class has the first, second, and fifth attributes, but does not have the third and fourth. Then the probability that the query image (with descriptor x) belongs to this class is P(class = 1|x) = P(attribute_1 = 1|x) * P(attribute_2 = 1|x) * P(attribute_3 = 0|x) * P(attribute_4 = 0|x) * P(attribute_5 = 1|x). The "|x" notation means "given x", i.e. we compute some probability using the image descriptor x. Let's say the second class is known to have attributes 3 and 5, and no others. Then the probability that the query image belongs to this class is P(class = 2|x) = P(attribute_1 = 0|x) * P(attribute_2 = 0|x) * P(attribute_3 = 1|x) * P(attribute_4 = 0|x) * P(attribute_5 = 1|x).
  7. You will assign the image with descriptor x to the class i that gives the maximal P(class = i|x). For example, if P(class = 1|x) = 0.80 and P(class = 2|x) = 0.20, then you will assign x to class 1. You can call [~, ind] = max(probs); on a vector of probabilities such that probs(i) is P(class = i|x); then ind will give you the "winning" class to which x should be assigned.
  8. How do you compute P(attribute_i = 1|x)? This is a probability value you've computed already. It is just the second entry of the scores output from running predict on the descriptor x (assuming you trained with labels of 1 and 0). If you need P(attribute_i = 0|x), that's just the first entry of scores (or more simply, 1 - the second entry). Note that if you used the first entry for the probability of a label=1 (as this homework description was originally written), you will still get full credit.
  9. You will classify each test image from the 10 unseen (test) classes (or a sample of all images, if you prefer), and compute the average accuracy.
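Steps 4 and 5 can be sketched on toy data as follows. This is only a sketch under stated assumptions: the random descriptors, the 4x3 predicate matrix, and every variable name other than models and probs_attr are made up for illustration, and the real descriptors would be read from the dataset files instead. It also assumes the Statistics and Machine Learning Toolbox (fitcsvm, fitSVMPosterior, predict) is available.

```matlab
% Toy sketch: synthetic data stands in for the real feature descriptors.
rng(0);                                        % reproducible random numbers
num_classes = 4; num_attr = 3; dim = 5;        % hypothetical sizes (real ones: 40 classes, 85 attributes)
M = [1 0 1; 0 1 1; 1 1 0; 0 0 1];              % toy predicate matrix, classes x attributes
class_of_image = repmat((1:num_classes)', 20, 1);            % 80 training images, 20 per class
X = randn(numel(class_of_image), dim) + repmat(class_of_image, 1, dim);  % one descriptor per row

% Step 4: one binary SVM per attribute; each image inherits its class's attribute value.
models = cell(num_attr, 1);
for j = 1:num_attr
    y = M(class_of_image, j);                  % 1 = positive example, 0 = negative example
    assert(any(y == 1) && any(y == 0));        % both labels must be present in the sample
    models{j} = fitSVMPosterior(fitcsvm(X, y));    % SVM with a score-to-posterior mapping
end

% Step 5: posterior probability of each attribute for each test descriptor.
X_test = randn(6, dim) + 2;                    % 6 hypothetical test descriptors
probs_attr = zeros(num_attr, size(X_test, 1));
for j = 1:num_attr
    [~, scores] = predict(models{j}, X_test);  % one row per image, one column per label
    assert(all(abs(sum(scores, 2) - 1) < 1e-6));
    col = find(models{j}.ClassNames == 1);     % column holding P(attribute_j = 1 | x)
    probs_attr(j, :) = scores(:, col)';
end
```

Looking up the column through ClassNames rather than hard-coding an index avoids mixing up the "present" and "not present" probabilities.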
What to include in your submission:
  1. [10 points] A function [models] = train_attribute_models(...); that outputs an 85x1 cell array of attribute classifier models. You are free to pass in whatever arguments you need, and are welcome to add any additional outputs after models.
  2. [10 points] A function [probs_attr] = compute_attribute_probs(...); that outputs an 85xn matrix of probabilities, where n is the number of test images you choose to use; and probs_attr(j, l) is the probability that the j-th attribute is present in the l-th image. Again, use any inputs you like, and any additional outputs after the first one.
  3. [15 points] A function [probs_class] = compute_class_probs(...); that outputs a 10xn matrix of probabilities, where probs_class(i, l) is the probability that the i-th class is present in the l-th image.
  4. [5 points] A function [acc] = compute_accuracy(probs_class, ground_truth_class); where ground_truth_class is a 1xn vector such that ground_truth_class(l) is the true (i.e. given in the dataset) class for the l-th image. acc is a single real number denoting the overall accuracy of your system, averaged over the n test images. Also include the overall accuracy score in your write-up.
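
As an illustration of the class-probability and accuracy computations, the sketch below reuses the toy example from step 6 of Part II (2 classes, 5 attributes). The attribute probabilities, the ground-truth labels, and the name M_test are hypothetical values chosen for the example, not data from the assignment.

```matlab
% Toy predicate matrix from step 6: class 1 has attributes 1, 2, 5; class 2 has attributes 3, 5.
M_test = [1 1 0 0 1;
          0 0 1 0 1];
% Hypothetical P(attribute_j = 1 | x) for 5 attributes (rows) and 2 images (columns).
probs_attr = [0.9 0.2;
              0.8 0.1;
              0.1 0.7;
              0.2 0.3;
              0.7 0.9];

num_classes = size(M_test, 1);
n = size(probs_attr, 2);
probs_class = zeros(num_classes, n);
for i = 1:num_classes
    has = logical(M_test(i, :))';          % attributes that class i has
    p = probs_attr;
    p(~has, :) = 1 - p(~has, :);           % use P(attribute_j = 0 | x) where the class lacks attribute j
    probs_class(i, :) = prod(p, 1);        % product over attributes, one value per image
end

[~, predicted] = max(probs_class, [], 1);  % winning class for each image
ground_truth_class = [1 2];                % hypothetical true labels
acc = mean(predicted == ground_truth_class);
```

With all 85 attributes the products become very small numbers; one possible variation is to compare sums of log-probabilities instead, which picks the same winning class whenever all probabilities are nonzero.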

Part III: Active learning (40 points)

In this problem, you will implement a very simple active learning pipeline. Make sure you read the full description for Part II above. You will again use the Animals with Attributes dataset. However, this time you will not perform zero-shot recognition, i.e. you will be allowed to see sample images for the test classes. You also won't use attributes for this part. You will train a multi-class classifier that can distinguish between the 10 test classes discussed in Part II, using Matlab's fitcecoc function, but in two ways: "passive" and "active". The steps are described below.

  1. First, you will take the images that belong to each of the test classes, and for each class, you will draw a random sample of 5 initial training images, 15 "unlabeled" images (you will see why we use the "unlabeled" adjective shortly), and 20 test images. Use Matlab's randperm function: if images contains the images for some test animal class, call r = randperm(length(images)); initial_train_images = images(r(1:5)); unlabeled_images = images(r(6:20)); test_images = images(r(21:40)); .
  2. Concatenate the 5x10 initial training images (5 images from 10 classes) into a set we'll call the "training set." Separately, concatenate all "unlabeled images" (all 15x10 of them); we call this the "unlabeled set". Separately, concatenate all test images (all 20x10 of them); we call this the "test set."
  3. Start with the "training set", and train an initial multi-class classifier. Compute its accuracy on the test set. Note that to get probability outputs which will be needed later, you should use model = fitcecoc(X, Y, 'FitPosterior', true);
  4. You will temporarily "hide" the labels of the unlabeled set, and iteratively "request" the labels, in two ways. This mimics a scenario in which your system doesn't actually have the labels for the unlabeled images, and is asking annotators (e.g. on Mechanical Turk) to label some of them.
  5. To train the "Passive" classifier, start with the initial classifier, and then for 30 iterations, do the following: Randomly pick 5 images from the unlabeled set, "reveal" their labels, add them to the training set and remove them from the unlabeled set, retrain your classifier using the images in the training set, and compute its accuracy on the test set. At the end of the 30th iteration, you will have a total of 200 images in your training set.
  6. Now consider another classifier which actively requests labels ("Active"). It should use the same initial training set, same initial classifier, and same pool of unlabeled images, but it should not request labels randomly. At each of 30 iterations, it should do the following:
  7. Plot the accuracy curves for both the passive classifier and the active classifier.
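The data split from steps 1 and 2 and the passive loop from step 5 can be sketched as follows. This is a sketch on synthetic data, not a reference solution: the random descriptors, the dimension, and the names X_pool, Y_pool, and pick are assumptions made for illustration, and it covers only the passive (random) selection. It assumes the Statistics and Machine Learning Toolbox (fitcecoc, predict) is available.

```matlab
rng(1);                                        % reproducible random numbers
num_classes = 10; dim = 8;                     % dim is a hypothetical descriptor length
X_train = []; Y_train = []; X_pool = []; Y_pool = []; X_test = []; Y_test = [];
for c = 1:num_classes
    images = randn(40, dim) + 0.5 * c;         % synthetic descriptors for class c, one per row
    r = randperm(size(images, 1));
    X_train = [X_train; images(r(1:5), :)];    Y_train = [Y_train; c * ones(5, 1)];
    X_pool  = [X_pool;  images(r(6:20), :)];   Y_pool  = [Y_pool;  c * ones(15, 1)];
    X_test  = [X_test;  images(r(21:40), :)];  Y_test  = [Y_test;  c * ones(20, 1)];
end

accuracy_passive = zeros(1, 31);
model = fitcecoc(X_train, Y_train);            % initial classifier on the 50 training images
accuracy_passive(1) = mean(predict(model, X_test) == Y_test);

for it = 1:30
    pick = randperm(size(X_pool, 1), 5);       % 5 random images from the unlabeled set
    X_train = [X_train; X_pool(pick, :)];      % "reveal" their labels and add them to the training set
    Y_train = [Y_train; Y_pool(pick)];
    X_pool(pick, :) = [];                      % remove them from the unlabeled set
    Y_pool(pick) = [];
    model = fitcecoc(X_train, Y_train);        % retrain on the enlarged training set
    accuracy_passive(it + 1) = mean(predict(model, X_test) == Y_test);
end
```

Random selection does not use probability outputs, so this sketch omits the 'FitPosterior' option to keep retraining fast.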
What to include in your submission:
  1. [15 points] A function [accuracy_passive] = passive_classifier(...); where accuracy_passive is a 1x31 vector whose first entry is the initial classifier accuracy, and whose (i+1)-th entry is the accuracy after the i-th iteration. Add inputs and outputs as needed.
  2. [20 points] A function [accuracy_active] = active_classifier(...); where accuracy_active is a 1x31 vector whose first entry is the initial classifier accuracy, and whose (i+1)-th entry is the accuracy after the i-th iteration. Add inputs and outputs as needed. Note that your passive and your active classifier must use the same initial training set, same test set, and same unlabeled set. They will pick different data to add to the training set in each iteration, but after the 30th iteration, they will end up with the same training set.
  3. [5 points] Plots for the passive and active classifier accuracies, with the iteration number on the x axis and the accuracy on the y axis, and an explanation of what you observe. Include both plots and the explanation in your write-up. Discuss how the "Active" and "Passive" classifier accuracies change as more data is added. Also discuss how their accuracies compare. Provide possible explanations for what you observe.
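
One possible way to draw the two curves on shared axes is sketched below. The accuracy vectors here are arbitrary placeholders so that the snippet runs on its own; they are not results and say nothing about how the two classifiers actually compare.

```matlab
iterations = 0:30;                             % 0 = initial classifier, 1..30 = after each iteration
accuracy_passive = linspace(0.30, 0.60, 31);   % placeholder values, replace with your own results
accuracy_active  = linspace(0.30, 0.60, 31);   % placeholder values, replace with your own results

figure;
plot(iterations, accuracy_passive, '-o', iterations, accuracy_active, '-s');
xlabel('Iteration');
ylabel('Accuracy on test set');
legend('Passive', 'Active', 'Location', 'southeast');
title('Passive vs. active label requests');
```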