CS2770: Homework 3

Due: 3/23/2017, 11:59pm

Part I: Color quantization with K-means (20 points)

For this problem you will write code to quantize a color space by applying K-means clustering to the pixels in a given input image. You are allowed to use built-in K-means code in Matlab or Python; you do not have to write your own K-means. Include each of the following components in your submission:

[10 pts] Given an RGB image, perform clustering in the 3-dimensional RGB space, and map each pixel in the input image to its nearest center. That is, replace the RGB value at each pixel with its nearest cluster's average RGB value. For example, if you set K=2, the output image will contain only two distinct colors.

Since these average RGB values may not be integers, you should round them to the nearest integer (0 through 255). Your function should be called quantizeRGB, should take in inputs origImg and k, and return outputs outputImg, meanColors, clusterIds. The variables origImg and outputImg are RGB images, k specifies the number of colors to quantize to, and meanColors is a Kx3 array of the K centers (one value for each cluster and each color channel). clusterIds is a numpixels x 1 matrix (with numpixels = numrows * numcolumns) that says which cluster each pixel belongs to.
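The mapping described above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it uses a small hand-written Lloyd's K-means in NumPy so that it has no other dependencies (a built-in K-means is allowed and would normally be simpler), and the n_iters and seed parameters are assumptions added for the sketch.

```python
import numpy as np

def quantizeRGB(origImg, k, n_iters=20, seed=0):
    """Quantize an HxWx3 uint8 image to k colors (n_iters, seed are sketch-only assumptions)."""
    h, w, _ = origImg.shape
    # Flatten to a numpixels x 3 array; use floats so means are not truncated
    pixels = origImg.reshape(-1, 3).astype(np.float64)

    # Initialize the centers from k distinct colors present in the image
    rng = np.random.default_rng(seed)
    unique_colors = np.unique(pixels, axis=0)
    if len(unique_colors) < k:
        raise ValueError("image has fewer than k distinct colors")
    centers = unique_colors[rng.choice(len(unique_colors), size=k, replace=False)]

    for _ in range(n_iters):
        # Squared distance from every pixel to every center: numpixels x k
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        ids = d2.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters)
        for j in range(k):
            members = pixels[ids == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

    # Final assignment, then round the centers to valid 8-bit values
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    ids = d2.argmin(axis=1)
    meanColors = np.clip(np.rint(centers), 0, 255).astype(np.uint8)
    outputImg = meanColors[ids].reshape(h, w, 3)
    clusterIds = ids.reshape(-1, 1)
    return outputImg, meanColors, clusterIds
```

The distance array has numpixels x k x 3 entries, so for large images it may be necessary to subsample pixels for clustering or to process them in chunks.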

[2 pts] Write a function to compute the Euclidean distance between the original RGB pixel values and the quantized values. Your function should be called computeQuantizationError, should take in inputs origImg, quantizedImg, and should return an output error, where origImg and quantizedImg are both RGB images, and error is a real number.
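One possible reading of this error measure is the Euclidean norm of the difference between the two images, taken over all pixels and channels. The sketch below assumes that interpretation; a per-pixel average would be an equally plausible variant.

```python
import numpy as np

def computeQuantizationError(origImg, quantizedImg):
    """Euclidean distance between two same-sized RGB images (assumed interpretation)."""
    # Cast to float first: subtracting uint8 arrays would wrap around
    diff = origImg.astype(np.float64) - quantizedImg.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))
```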

[8 pts] Write a function colorQuantizeMain that calls all the above functions appropriately using the image fish.jpg, and displays the results. Illustrate the quantization with at least three different values of K. Label all plots clearly with titles. In a text file explanation.txt, briefly answer the following: How and why does the error differ based on the value of K?

Part II: Edge detection and circle detection (30 points)

In this problem, you will implement (1) a simple edge detector, and (2) a Hough Transform circle detector that takes an input image and a fixed radius, and returns the centers of any detected circles of about that size. You are not allowed to use any built-in functions for finding edges or circles. Include the following in your submission:

[10 pts] A function called detectEdges which takes in as input im, threshold and returns output edges. This function computes edges in an image. im is the input color image, and threshold is a user-set threshold for detecting edges. edges is an Nx4 matrix containing 4 numbers for each of N detected edge points: the x location of the point, the y location of the point, the gradient magnitude at the point, and the gradient orientation (non-quantized) at the point.
- In this function, first convert the image to grayscale. Then simply compute the gradient magnitude and orientation at each pixel, and only return those (x, y) locations with magnitude that is higher than the threshold. You can reuse code from HW2.
- At the end, display, save, and include in your submission the thresholded edge image for an image of your choice.
- Remember that the x direction corresponds to columns and the y direction corresponds to rows.
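The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the grayscale weights (0.299, 0.587, 0.114) and the use of np.gradient (central differences) are choices made for the sketch, and the gradient code from HW2 could be substituted. Displaying and saving the thresholded edge image is left out.

```python
import numpy as np

def detectEdges(im, threshold):
    """Return an Nx4 array of [x, y, gradient magnitude, gradient orientation]."""
    im = np.asarray(im, dtype=np.float64)
    # Convert HxWx3 color to HxW grayscale (weights are an assumption of this sketch)
    gray = im @ np.array([0.299, 0.587, 0.114])

    # np.gradient returns derivatives along axis 0 (rows = y), then axis 1 (columns = x)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # radians, not quantized

    # Keep only locations whose magnitude exceeds the threshold
    rows, cols = np.nonzero(magnitude > threshold)
    return np.column_stack([cols, rows, magnitude[rows, cols], orientation[rows, cols]])
```

Note the column order: x comes from the column index and y from the row index, matching the reminder above.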

[15 pts] A function called detectCircles which takes in as input im, edges, radius, top_k and returns as output centers. This function finds and visualizes circles from an edge map. im and edges are defined as above, radius specifies the size of circle we are looking for, and top_k says how many of the top-scoring circle center possibilities to show. The output centers is a Kx2 matrix (with K = top_k) in which each row lists the x, y position of a detected circle's center.
- Follow the pseudocode for finding circles shown in class, and use the gradient orientation you computed in the function above.
- Consider using ceil(a / quantization_value) and ceil(b / quantization_value) (where, for example, quantization_value can be set to 5) to easily figure out quantization/bins in Hough space. Don't forget to multiply by quantization_value once you've figured out the Hough parameters with most votes, to find out the actual x, y location corresponding to the selected bin.
- Ignore circles whose centers are outside the image.
- Your code should visualize the top K found circles. Consider using these functions (you're only allowed to use them to visualize the circles you found with your implementation of the Hough transform): viscircles (Matlab) and circle_perimeter (Python).
- You might find the function impixelinfo (Matlab, Python) useful for getting an estimate of the radius of circles in images manually.
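The voting and binning hints above can be sketched in Python as follows. This is not the pseudocode from class: in particular, voting in both directions along the gradient is an assumption made here because the gradient may point toward or away from the center, and the quantization_value default of 5 is taken from the example above. Visualization of the detected circles is omitted.

```python
import numpy as np

def detectCircles(im, edges, radius, top_k, quantization_value=5):
    """Return a top_k x 2 array of (x, y) circle centers from Hough voting."""
    height, width = im.shape[:2]
    q = quantization_value
    # One extra bin because ceil() maps positions in (0, width) to bins 1..ceil(width/q)
    votes = np.zeros((int(np.ceil(height / q)) + 1, int(np.ceil(width / q)) + 1), dtype=int)

    for x, y, _, theta in edges:
        # Assumption: vote on both sides of the edge point along the gradient
        for sign in (1, -1):
            a = x + sign * radius * np.cos(theta)
            b = y + sign * radius * np.sin(theta)
            # Ignore candidate centers that fall outside the image
            if 0 <= a < width and 0 <= b < height:
                votes[int(np.ceil(b / q)), int(np.ceil(a / q))] += 1

    # Pick the top_k bins with the most votes
    flat = np.argsort(votes, axis=None)[::-1][:top_k]
    bin_b, bin_a = np.unravel_index(flat, votes.shape)
    # Multiply by the bin size to get back to image coordinates
    return np.column_stack([bin_a, bin_b]) * q
```

Because of the binning, each returned center is only accurate to within quantization_value pixels of the true location.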

[5 pts] Demonstrate the function applied to the images jupiter.jpg and egg.jpg. Display the images with detected circle(s), labeling the figure with the radius, save your image outputs, and include them in your submission.

Part III: Spatial Pyramid Match (50 points)

In this problem, you will develop a scene categorization system, using the spatial pyramid representation proposed in 2006 by Svetlana Lazebnik, Cordelia Schmid and Jean Ponce.

Get the scene categorization dataset provided by Svetlana Lazebnik from here.
You will need to extract your own features, using the VLFeat package. You will need to set up VLFeat. Use the function vl_sift (run the demo to see a SIFT descriptor show up).
Divide the dataset into a training and test set. Use roughly half of the images from each category/class for training, and the rest for testing. If this is causing your program to run too slowly, feel free to use a smaller sample of training/test images.
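A per-class split like this could be sketched as follows. The function name splitDataset, the paths_by_class dictionary (mapping each integer label to a list of image paths), and the train_fraction and seed parameters are all hypothetical and introduced only for illustration.

```python
import random

def splitDataset(paths_by_class, train_fraction=0.5, seed=0):
    """Split each class's images into train/test lists of (path, label) pairs."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, paths in sorted(paths_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        # Take roughly train_fraction of this class for training
        n_train = int(round(len(shuffled) * train_fraction))
        train += [(p, label) for p in shuffled[:n_train]]
        test += [(p, label) for p in shuffled[n_train:]]
    return train, test
```

Splitting within each class, rather than over the pooled images, keeps every category represented in both sets.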
Compute a spatial pyramid over the features. The procedure of computing the pyramid is summarized in the following image from the paper (you don't have to recreate this figure, and don't have to make your image square). It was also described in class, and is briefly reviewed below. You will create a pyramid with only two levels (level 0 and level 1).
You will need to create a "bag of words" representation of the features in the image. To do this, you will run K-means on the SIFT feature descriptors of all training images (or a subset of all training images, if K-means is running too slowly). Make sure to include images from all classes in the set of SIFT descriptors on which you run K-means. You will only run K-means and generate a bag-of-words vocabulary once. Assigning each descriptor to its nearest cluster center will give you the representation shown in the left-hand side of the figure, where the circles, diamonds and crosses denote different "words", in this toy example with K=3. In your implementation, use K=100. The histogram counting how many of the image's descriptors are assigned to each word forms your representation of the image at level L = 0 of the pyramid.
Then, divide the image into four quadrants as shown below. You need to know the locations of the feature descriptors so that you know in which quadrant they fall; VLFeat provides these (see documentation for vl_sift). Now compute histograms as above, but one histogram vector for each quadrant, counting only the descriptors that fall inside that quadrant.
Finally, concatenate the histograms computed in the above steps. Make sure you concatenate all histograms in the same order for all images. This will give you a 1xD descriptor, where D = 5K = 500: one K-bin histogram for the whole image plus one for each of the four quadrants.
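The two-level pyramid described above can be sketched in Python as follows. This sketch assumes the SIFT descriptors and their (x, y) locations have already been extracted, so it does not call VLFeat; the helper name spmFromFeatures and its arguments are hypothetical. It also applies no per-level weighting or normalization, since the steps above only ask for concatenation.

```python
import numpy as np

def spmFromFeatures(locations, descriptors, means, im_shape):
    """Build a 1 x 5K pyramid from N feature locations (Nx2, x then y),
    N descriptors (NxF), K cluster centers (KxF), and the image shape (rows, cols)."""
    k = means.shape[0]
    rows, cols = im_shape[:2]

    # Assign each descriptor to its nearest cluster center ("word")
    desc = np.asarray(descriptors, dtype=np.float64)
    d2 = ((desc[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)

    # Level 0: one histogram of word counts over the whole image
    level0 = np.bincount(words, minlength=k)

    # Level 1: one histogram per quadrant, in a fixed order
    right = locations[:, 0] >= cols / 2.0   # x is the column direction
    bottom = locations[:, 1] >= rows / 2.0  # y is the row direction
    quadrant = 2 * bottom.astype(int) + right.astype(int)  # 0=TL, 1=TR, 2=BL, 3=BR
    level1 = [np.bincount(words[quadrant == q], minlength=k) for q in range(4)]

    return np.concatenate([level0] + level1).reshape(1, -1)
```

Using the same quadrant order (top-left, top-right, bottom-left, bottom-right) for every image keeps the descriptor dimensions comparable across images.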
Now that you have a representation for each image, it is time to learn an SVM classifier which can predict, for a test image, to which of 15 scene categories it belongs. Each folder in the scene category dataset is a different category. All images from the same scene category have the same label. The label values should be integers between 1 and 15.
After performing classification, you need to evaluate the accuracy of your classifier, i.e. what fraction of the test images was assigned the correct label (the one that came with the dataset).
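The accuracy measure described above can be sketched as follows; the function name computeAccuracy is hypothetical, and the labels are assumed to be integer arrays of equal length.

```python
import numpy as np

def computeAccuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted label matches the dataset label."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))
```

For example, computeAccuracy([1, 2, 3, 4], [1, 2, 15, 4]) gives 0.75, since three of the four labels match.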

What you need to include in your submission:

[30 pts] A function computeSPMHistogram which takes in inputs im, means and returns output pyramid. This function computes the Spatial Pyramid Match histogram as discussed above. im should be a grayscale image whose SIFT features you should extract inside the function, means should be the cluster centers from the bag-of-visual-words clustering operation, and pyramid should be a 1xD feature descriptor for the image.
[20 pts] A function SPMMain which gets the training/test images and their labels, extracts the SIFT features of training images that will be used for clustering, runs K-means to find the cluster means, computes SPM representations for all images, runs the SVM, and computes accuracy.

Acknowledgement: Parts I and II of this assignment are adapted from Kristen Grauman.