This assignment is worth 50 points.

In this problem, you will develop two variants of a scene categorization system.

- Get the scene categorization dataset provided by Svetlana Lazebnik from here.
- You will need to extract your own features, using the VLFeat package. Use the function vl_sift. To set up VLFeat, download this binary, and follow these instructions. Make sure to run both steps of the demo to see a SIFT descriptor show up.
- Divide the dataset into a training set and a test set. Use roughly half of the images *from each category/class* for training, and the rest for testing. *If this is causing your program to run too slowly, feel free to use a smaller sample of both training and test images, but ensure you have at least 20 images from each class for both training and testing.*
- Compute a spatial pyramid over the features. The spatial pyramid representation was proposed in 2006 by Svetlana Lazebnik, Cordelia Schmid and Jean Ponce. The procedure for computing the pyramid is summarized in the following image from the paper (you don't have to recreate this figure, and you don't have to make your image square), and is described below.

*Note that draft Spatial Pyramid Match code (computeSPMHistogram) is provided for you on CourseWeb, but you might need to adjust it in some way.*
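To make the per-class split concrete, here is an illustrative Python sketch of that logic (your actual submission must be in Matlab, and the function name `split_per_class` is my own, not part of the provided code):

```python
import random

def split_per_class(images_by_class, min_per_split=20, seed=0):
    """Split each class's image list roughly in half, train vs. test.

    images_by_class: dict mapping class label -> list of image paths.
    Returns (train, test) dicts keyed by the same class labels.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for label, paths in images_by_class.items():
        paths = list(paths)
        rng.shuffle(paths)                         # randomize before splitting
        n_train = max(min_per_split, len(paths) // 2)
        train[label] = paths[:n_train]
        test[label] = paths[n_train:]
    return train, test
```

The point is that the split happens *within* each class folder, so both halves contain every category.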

- You will need to create a ''bag of words'' representation of the features in the image. To do this, you will run k-means on the SIFT feature descriptors of all *training* images (or a subset of all training images, if k-means is running too slowly). Make sure to include images from *all classes* in the set of SIFT descriptors on which you run k-means. Use kmeansML.m from HW5P. Save a variable means like you did in HW5P. A function to compute a BOW representation (getHistogram) for each training and test image is provided (similar to the one you implemented yourself in HW5P). This will give you the representation shown in the left-hand side of the figure, where the circles, diamonds and crosses denote different ''words'', in this toy example with *k* = 3. In your implementation, use *k* = 100. This forms your representation of the image at level *L* = 0 of the pyramid. This is part of the computeSPMHistogram function which is provided for you.
- Then, the draft SPM code divides the image into four quadrants as shown below. You need to know the locations of the feature descriptors so that you know in which quadrant they fall; VLFeat provides these (see the documentation for vl_sift). Now the code will compute histograms as above, but will compute one histogram vector for each quadrant.
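The bag-of-words step above (what getHistogram does) amounts to quantizing each descriptor against its nearest cluster center and counting how often each of the k words appears. An illustrative Python/NumPy sketch of that logic (the name `bow_histogram` and the normalization at the end are my choices, not necessarily what the provided Matlab code does):

```python
import numpy as np

def bow_histogram(descriptors, means):
    """Quantize descriptors against k cluster centers and return a
    normalized k-bin bag-of-words histogram.

    descriptors: (n, 128) array of SIFT descriptors.
    means: (k, 128) array of cluster centers from k-means.
    """
    # Squared Euclidean distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)              # nearest center = visual word index
    hist = np.bincount(words, minlength=len(means)).astype(float)
    return hist / max(hist.sum(), 1)       # normalize so image sizes are comparable
```

With *k* = 100 this yields a 1x100 vector, i.e., the level *L* = 0 representation.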

- In the original paper, there is one more subdivision into sixteen regions as shown below, and computation of one histogram for each cell in the grid.

- Finally, the draft code will concatenate the histograms computed in the above steps. Make sure you concatenate all histograms in the same order for all images. This will give you a 1x*d*-dimensional descriptor.
- Now that you have a representation for each image, it is time to learn a classifier which can predict, for a test image, to which of 15 scene categories/classes it belongs. Each folder in the scene category dataset is a different category. All images from the same scene category will have the same label. The label values should be integers between 1 and 15.
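As a sanity check on the concatenation: with *k* = 100 words and the 1 + 4 + 16 = 21 cells above, *d* = 100 × 21 = 2100. A hedged Python/NumPy sketch of how the concatenation could be structured (the provided Matlab draft of computeSPMHistogram is the authoritative version; `spm_histogram` and its arguments are hypothetical):

```python
import numpy as np

def spm_histogram(words, xs, ys, width, height, k, levels=(1, 2, 4)):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    words: (n,) visual-word index of each descriptor (0..k-1).
    xs, ys: (n,) pixel locations of the descriptors (from vl_sift's frames).
    levels: grid sizes per pyramid level; (1, 2, 4) gives 1 + 4 + 16 cells.
    Returns a k*(1+4+16)-dimensional row vector; the cell scan order is
    fixed, so the descriptor layout is the same for every image.
    """
    parts = []
    for g in levels:
        # Which grid cell each descriptor falls into at this level.
        cx = np.minimum((xs * g / width).astype(int), g - 1)
        cy = np.minimum((ys * g / height).astype(int), g - 1)
        for row in range(g):                  # fixed row-major cell order
            for col in range(g):
                in_cell = (cy == row) & (cx == col)
                parts.append(np.bincount(words[in_cell], minlength=k))
    return np.concatenate(parts).astype(float)
```

Note that the first k entries are exactly the level-0 bag-of-words histogram; the remaining entries repeat the counting within smaller and smaller cells.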
- You will use the KNN (*k* nearest neighbors) classifier. **You have to write your own code, and you are NOT allowed to use the built-in Matlab function for KNN!** Note that this *k* (and its value) is not the same as the *k* in k-means. For each test image, compute the Euclidean distance between its descriptor and each training image's descriptor (the descriptors are now the Spatial Pyramids). You can use the dist2 code from HW5P. Then find its *k* closest neighbors among only training images. Since these are training images, you know their labels. Find the mode (most common value; see Matlab's function mode) among the labels, and assign the test image to this label. In other words, the neighbors are "voting" on the label of the test image. The value of *k* you use for KNN will be discussed below.
- After performing classification, you need to evaluate the accuracy of your classifiers. You need to compute what fraction of the test images was assigned the correct label, i.e., the "ground truth" label that came with the dataset. This function is provided for you (computeAccuracy(trueLabels, predictedLabels), where trueLabels is the *N*x1 vector of ground truth labels that came with the dataset in the form of membership to different folders, and predictedLabels is the corresponding *N*x1 vector of labels predicted by the classifier).
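The KNN voting described above reduces to: compute all test-to-train distances, sort, and take the mode of the *k* nearest training labels. An illustrative Python/NumPy sketch of that logic (remember that your submission must be your own Matlab implementation, not this):

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k):
    """Predict a label for each test row by majority vote of its
    k nearest training rows under Euclidean distance.

    train_X: (M, d) training descriptors; train_y: (M,) integer labels.
    test_X: (N, d) test descriptors. Returns (N,) predicted labels.
    """
    preds = np.empty(len(test_X), dtype=train_y.dtype)
    for i, x in enumerate(test_X):
        d2 = ((train_X - x) ** 2).sum(axis=1)   # squared Euclidean distances
        nearest = np.argsort(d2)[:k]            # indices of k closest training images
        votes = np.bincount(train_y[nearest])   # count each label among the neighbors
        preds[i] = votes.argmax()               # mode of the neighbor labels
    return preds

def accuracy(true_labels, predicted_labels):
    """Fraction of test images assigned the correct label."""
    return np.mean(np.asarray(true_labels) == np.asarray(predicted_labels))
```

Squared distances suffice for finding nearest neighbors, since squaring preserves the ordering of non-negative distances.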

- [15 points] function [pyramid] = computeSPMHistogram(im, means); which computes the Spatial Pyramid Match histogram as discussed above. im should be a grayscale image whose SIFT features you should extract, means should be the cluster centers from the bag-of-visual-words clustering operation, and pyramid should be a 1x*d* feature descriptor for the image. You're allowed to pass in optional extra parameters *after* the first two. *Note that a draft of this code is provided for you.*
- [15 points] function [labels] = findLabelsKNN(pyramids_train, pyramids_test, labels_train, k); which predicts the labels of the test images using the KNN classifier. pyramids_train, pyramids_test should be an *M*x1 cell array and an *N*x1 cell array, respectively, where *M* is the size of the training image set and *N* is the size of your test image set, and each pyramids{i} is the 1x*d* Spatial Pyramid Match representation of the corresponding training or test image. labels_train should be an *M*x1 vector of training labels, and labels should be an *N*x1 vector of *predicted* labels for the test images. k is the *k* in K-Nearest Neighbors, and you select the value (you will try different values).
- [20 points] A script which gets all images and their labels (feel free to reuse code from HW5P that shows how to get the contents of a directory, or see the provided getData for inspiration), extracts the features of training images, runs kmeansML to find the codebook centers, then computes SPM representations, and runs the KNN classifier, including computing accuracy. In this script, run the KNN classification with the following values for the KNN *k* (different from the k-means *k* = 100): 1, 5, 25, 125. In other words, you have to run KNN 4 times and show 4 accuracy values.
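The script's evaluation loop is one KNN run (and one accuracy number) per value of *k*. Sketched in Python under the assumption that prediction is wrapped in a callable (in your actual Matlab script this is a loop calling findLabelsKNN and computeAccuracy; `evaluate_over_k` and `predict` are hypothetical names):

```python
def evaluate_over_k(predict, true_labels, ks=(1, 5, 25, 125)):
    """Run the classifier once per k and return {k: accuracy}.

    predict: callable taking k and returning the predicted test labels.
    true_labels: ground-truth labels of the test images.
    """
    results = {}
    for k in ks:
        predicted = predict(k)
        correct = sum(t == p for t, p in zip(true_labels, predicted))
        results[k] = correct / len(true_labels)
    return results
```

Note that the codebook, the SPM descriptors, and the train/test split are computed once; only the KNN vote is repeated for each of the four *k* values.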