CS1699: Homework 2
Due: 10/08/2015, 11:59pm
Instructions: Please provide your written answers (for parts I, II and III) and your code (for parts II and III). Include your image results in your written answers file. Your written answers should be in the form of a PDF or Word document (.doc or .docx). Your code should be written in Matlab. Zip or tar your written answers, image results and .m files and upload the .zip or .tar file on CourseWeb -> CS1699 -> Assignments -> Homework 2. Name the file YourFirstName_YourLastName.zip or YourFirstName_YourLastName.tar.
Part I: Short Answers (15 points)
- Suppose we form a texture description using textons built from a filter bank of multiple anisotropic derivative of Gaussian filters at two scales and six orientations (as displayed below). Is the resulting representation sensitive to orientation, or is it invariant to orientation? Explain why.
- Consider the figure below. Each small square denotes an edge point extracted from an image. Say we are going to use k-means to cluster these points' positions into k=2 groups. That is, we will run k-means where the feature inputs are the (x, y) coordinates of all the small square points. What is a likely clustering assignment that would result? Briefly explain your answer.
- When using the Hough Transform, we often discretize the parameter space to collect votes in an accumulator array. Alternatively, suppose we maintain a continuous vote space. Which grouping algorithm (among k-means, mean-shift, or graph-cuts) would be appropriate to recover the model parameter hypotheses from the continuous vote space?
Part II: Color Quantization with K-means (30 points)
For this problem you will write code to quantize a color space by applying k-means clustering to the pixels in a given input image, and experiment with two different color spaces: RGB and HSV. You are welcome to use the built-in Matlab function kmeans.
Useful Matlab functions: kmeans, rgb2hsv, hsv2rgb, imshow, double, uint8, reshape, repmat.
Include each of the following components in your submission:
[5 pts] Given an RGB image, quantize the 3-dimensional RGB space, and map each pixel in the input image to its nearest k-means center. That is, replace the RGB value at each pixel with its nearest cluster's average RGB value. For example, if you set k=2, every pixel in the output image will take one of two colors. Since these average RGB values may not be integers, you should round them to the nearest integer (0 through 255). Use the following form:
function [outputImg, meanColors, clusterIds] = quantizeRGB(origImg, k)
where origImg and outputImg are RGB images of type uint8, k specifies the number of colors to
quantize to, and meanColors is a kx3 array of the k centers (one value for each cluster and each color channel). clusterIds is a numpixelsx1 matrix (with numpixels = numrows * numcolumns) that says which cluster each pixel belongs to.
Matlab tip: if the
variable im is a 3d matrix containing a color image with numpixels pixels, X =
reshape(im, numpixels, 3); will yield a matrix with the RGB features as its rows.
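As a rough sketch of quantizeRGB, the version below follows the reshape tip above. Note that simple_kmeans is a hypothetical helper (a minimal Lloyd's-iteration stand-in for the built-in kmeans) so the sketch runs even without the Statistics Toolbox; in your submission you are welcome to just call kmeans instead.

```matlab
function [outputImg, meanColors, clusterIds] = quantizeRGB(origImg, k)
% Quantize the RGB color space of origImg into k colors.
[nr, nc, ~] = size(origImg);
X = double(reshape(origImg, nr * nc, 3));    % one RGB triple per row
[clusterIds, meanColors] = simple_kmeans(X, k);
quantized = meanColors(clusterIds, :);       % nearest center per pixel
outputImg = uint8(reshape(round(quantized), nr, nc, 3));
end

function [ids, centers] = simple_kmeans(X, k)
% Minimal Lloyd's iteration; a stand-in for the built-in kmeans.
centers = X(randperm(size(X, 1), k), :);     % random initial centers
for iter = 1:50
  d = zeros(size(X, 1), k);                  % squared distance to each center
  for j = 1:k
    d(:, j) = sum(bsxfun(@minus, X, centers(j, :)) .^ 2, 2);
  end
  [~, ids] = min(d, [], 2);                  % assign to nearest center
  for j = 1:k                                % recompute non-empty centers
    if any(ids == j), centers(j, :) = mean(X(ids == j, :), 1); end
  end
end
end
```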
[5 pts] Given an RGB image, convert to HSV, and quantize the 1-dimensional Hue space.
Map each pixel in the input image to its nearest quantized Hue value, while keeping its
Saturation and Value channels the same as the input. Convert the quantized output back
to RGB color space. Use the following form:
function [outputImg, meanHues, clusterIds] = quantizeHSV(origImg, k)
where origImg and outputImg are RGB images of type uint8, k specifies the number of clusters,
meanHues is a kx1 vector of the hue centers, and clusterIds is defined as above.
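A sketch of quantizeHSV along the same lines: only the hue channel is clustered and replaced, while saturation and value pass through unchanged. Here hue_kmeans is again a hypothetical self-contained stand-in for the built-in kmeans.

```matlab
function [outputImg, meanHues, clusterIds] = quantizeHSV(origImg, k)
% Quantize only the 1-D hue space; keep saturation and value as-is.
hsvImg = rgb2hsv(double(origImg) / 255);   % all channels in [0, 1]
hue = hsvImg(:, :, 1);
[clusterIds, meanHues] = hue_kmeans(hue(:), k);
hsvImg(:, :, 1) = reshape(meanHues(clusterIds), size(hue));
outputImg = uint8(round(hsv2rgb(hsvImg) * 255));
end

function [ids, centers] = hue_kmeans(h, k)
% Minimal 1-D Lloyd's iteration; a stand-in for the built-in kmeans.
centers = h(randperm(numel(h), k));        % k starting hues
for iter = 1:50
  [~, ids] = min(abs(bsxfun(@minus, h, centers')), [], 2);
  for j = 1:k
    if any(ids == j), centers(j) = mean(h(ids == j)); end
  end
end
end
```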
[5 pts] Write a function to compute the sum-of-squared-differences (SSD) error between the original RGB pixel values
and the quantized values, with the following form:
function [error] = computeQuantizationError(origImg, quantizedImg)
where origImg and quantizedImg are both RGB images of type uint8, and error is a scalar
giving the total SSD error across the image.
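This one is a few lines; a minimal sketch is below. The conversion to double matters, since uint8 subtraction saturates at 0 and would silently discard negative differences.

```matlab
function [error] = computeQuantizationError(origImg, quantizedImg)
% Total sum-of-squared-differences between the two images, summed over
% all pixels and all three channels.
d = double(origImg) - double(quantizedImg);  % avoid uint8 saturation
error = sum(d(:) .^ 2);
end
```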
[5 pts] Given an image, compute and display (using the Matlab function histogram) two histograms of its hue values. Let the
first histogram use equally-spaced bins (uniformly dividing up the hue values), and let the
second histogram use bins defined by the k cluster center memberships (i.e., all pixels
belonging to hue cluster i go to the i-th bin, for i=1, ..., k). Reuse (call) functions you've written above whenever possible. Use the following form:
function [histEqual, histClustered] = getHueHists(im, k)
where im is the input color image of type uint8, and histEqual and histClustered are the two output histograms.
[5 pts] Write a script colorQuantizeMain.m that calls all the above functions
appropriately using the provided image fish.jpg, and displays the results. Include the image results, histograms, and error scores for both the RGB and HSV quantizations. Illustrate the
quantization with at least three different values of k. Be sure to convert an HSV image back
to RGB before displaying with imshow. Label all plots clearly with titles. Save your image results and include the results in your written answer sheet.
[5 pts] Briefly answer the following. How and why do the results differ based on the value of k? How do the two forms of histogram differ?
How do results vary depending on the color space?
Part III: Feature Extraction and Description (55 points)
In this problem, you will implement a feature extraction/detection and description pipeline, followed by a simple image retrieval task. While you will not implement it exactly, the SIFT paper by David Lowe is a useful resource, in addition to Section 4.1 of the Szeliski textbook. Once you have completed all the steps below, you will have implemented a full basic image retrieval pipeline! What you should include in your submission:
- [15 points] function [x, y, scores, Gx, Gy] = extract_keypoints(image) -- Code to perform keypoint detection (feature extraction) using the Harris corner detector, as described in class.
- You can use a window function of your choice; opt for the simplest one. image is a color image of type uint8 which you should convert to grayscale and double in your function.
- Each of x, y is an nx1 vector that denotes the x and y locations, respectively, of each of the n detected keypoints. Keep in mind that x denotes the horizontal direction, hence columns of the image, and y denotes the vertical direction, hence rows; both are counted from the top-left of the image.
- scores is an nx1 vector that contains the value to which you applied a threshold, for each detected keypoint.
- Gx, Gy are matrices with the same number of rows and columns as your input image, and store the gradients in the x and y directions at each pixel.
- You should also perform non-maximum suppression by only keeping those keypoints whose R score is larger than all of their 8 neighbors; if a keypoint does not have 8 neighbors, do not keep it. Don't remove indices while looping over pixels; instead keep a vector of indices you want to remove (start it empty and concatenate indices to it as needed), run the unique operation on it, and then set the keypoints at those indices to [] (the empty matrix) to delete them.
- Also output in your function the fraction of keypoints out of the total number of pixels in the image.
- You can set the threshold for the "cornerness" score R however you like; for example, you can set it to 5 times the average R score. Or, you can simply output the top n keypoints (e.g. top 1%).
- The scores/x/y that you output should correspond to the final set of keypoints, after non-max suppression.
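The steps above can be sketched as follows. All the constants here are illustrative choices, not requirements: a 5x5 box window (the "simplest" one), the Harris constant 0.05 in R = det(M) - 0.05*trace(M)^2, and the 5x-average-R threshold mentioned above.

```matlab
function [x, y, scores, Gx, Gy] = extract_keypoints(image)
% Harris corner detection sketch with 8-neighbor non-max suppression.
I = double(image);
if size(I, 3) == 3                          % convert color to grayscale
  I = 0.299 * I(:, :, 1) + 0.587 * I(:, :, 2) + 0.114 * I(:, :, 3);
end
Gx = conv2(I, [1 0 -1], 'same');            % horizontal gradient
Gy = conv2(I, [1 0 -1]', 'same');           % vertical gradient
w = ones(5) / 25;                           % simple box window
Sxx = conv2(Gx .^ 2, w, 'same');
Syy = conv2(Gy .^ 2, w, 'same');
Sxy = conv2(Gx .* Gy, w, 'same');
R = (Sxx .* Syy - Sxy .^ 2) - 0.05 * (Sxx + Syy) .^ 2;  % cornerness
thresh = 5 * mean(R(:));                    % one of the options above
x = []; y = []; scores = [];
for r = 2:size(R, 1) - 1                    % border pixels lack 8 neighbors
  for c = 2:size(R, 2) - 1
    patch = R(r-1:r+1, c-1:c+1);
    % keep only strict local maxima above the threshold
    if R(r, c) > thresh && R(r, c) == max(patch(:)) && sum(patch(:) == R(r, c)) == 1
      x(end+1, 1) = c;                      % x is the column index
      y(end+1, 1) = r;                      % y is the row index
      scores(end+1, 1) = R(r, c);
    end
  end
end
fprintf('Kept %d keypoints out of %d pixels\n', numel(x), numel(I));
end
```

Note the sketch avoids the remove-then-unique bookkeeping entirely by testing each pixel against its 3x3 neighborhood directly; either approach is fine.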
- [15 points] function [features, x, y, scores] = compute_features(image, x, y, scores, Gx, Gy) -- Code to perform feature description, similarly to Lowe's paper.
- image, x, y, scores, Gx, Gy are defined as above, but you do not have to convert the image in any way.
- features is an nxd matrix whose i-th row contains the d-dimensional descriptor for the i-th keypoint.
- d should be equal to 4x4x8, and contain the 8-dimensional histogram of gradients in each cell of the 4x4 grid centered around each detected keypoint. Each of the 4x4 grid cells has an area of 4x4 pixels, so it contains a summary of the gradients in a 4x4-pixel region. Quantize the gradient orientations in 8 bins (so put values between 0 and 22.5 degrees in one bin, the 22.5 to 45 degree angles in another bin, etc.).
To populate the histogram, sum the gradient magnitudes that you have along each of the 8 orientations. Finally, you should clip all values to 0.2 as discussed in class, and normalize each descriptor to be of unit length. You do not have to implement any more sophisticated detail from the Lowe paper.
- If any of your detected keypoints are less than 7 pixels from the top/left or 8 pixels from the bottom/right of the image, erase this keypoint from the x, y, scores vectors and do not compute a descriptor for it.
- Since a 16x16 patch of pixels does not exactly center on a pixel, we'll "center" by putting the keypoint pixel at location (8, 8) in the 16x16 patch, i.e. it will technically be upper-left of the absolute center.
- To compute the gradient magnitude m(x, y) and gradient angle θ(x, y) at point (x, y), take L to be the image and use the standard formulas from Lowe's paper:
m(x, y) = sqrt( (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 )
θ(x, y) = atan( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) )
Note that Matlab's atan returns values in radians.
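The magnitude/angle computation and the 22.5-degree binning can be vectorized over entire gradient maps. A small sketch, where Gx and Gy are toy stand-ins for your real gradient arrays (atan's (-90, 90) degree range gives exactly 8 bins of 22.5 degrees):

```matlab
Gx = [1 0; -1 2];                              % toy gradient values
Gy = [0 1;  1 -2];
mag  = sqrt(Gx .^ 2 + Gy .^ 2);                % gradient magnitude per pixel
ang  = atan(Gy ./ (Gx + eps)) * 180 / pi;      % degrees, in (-90, 90); eps avoids 0/0
bins = min(floor((ang + 90) / 22.5) + 1, 8);   % orientation bin index, 1..8
```

To populate a cell's histogram, you would then add mag(r, c) into slot bins(r, c) for every pixel (r, c) of that 4x4 cell.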
- [10 points] Now pick a test set of 10 images and run your feature extraction and description on them.
Visualize the keypoints you have detected, for example by drawing circles over them. Use the scores variable and make keypoints with higher scores correspond to larger circles. Note that Matlab's plot counts from the top-left when plotting over an image. Save your code in a script called part3_c.m. Save the figures that show your features and include them in your answer sheet.
- [15 points] For one of the images in your test set (which we shall call the query image), rank the images in the remainder of the test set based on how similar they are to the query.
- For each image in your test set, pick a random subset (see randperm) with size that is equal to the smaller of (1) the total number of keypoints detected for that image, and (2) 500 (use min(a, b)).
- To compute similarity between two images, we will use these subsets of keypoints for the two images. Consider all pairs of keypoints such that the first keypoint comes from the first image, and the second keypoint comes from the second image. Then compute the Euclidean distance between the descriptors of each pair of keypoints. Finally, to get the similarity between the two images considered, average all the Euclidean distances for their keypoint descriptors. Remember that low distance means high similarity.
- Show the query image and the remaining images, ranked in descending order of similarity to the query image, in your answer sheet.
- Save your code in a script called part3_d.m.
- Hints: you can use two cell arrays to store (1) the image filenames, and (2) the features of the i-th image, and loop over the length of those (equally-sized) arrays. To compute Euclidean distance between two feature sets, use
sqrt(sum(power(feats1(m, :) - feats2(n, :), 2))).
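The similarity computation described above can be sketched as a small helper (the name image_distance and the loop structure are illustrative, not required):

```matlab
function d = image_distance(feats1, feats2)
% Average Euclidean distance over all descriptor pairs, one descriptor
% from each image. Lower distance means higher similarity.
n1 = size(feats1, 1);
n2 = size(feats2, 1);
total = 0;
for m = 1:n1
  for n = 1:n2
    total = total + sqrt(sum(power(feats1(m, :) - feats2(n, :), 2)));
  end
end
d = total / (n1 * n2);
end
```

You would call this on the random subsets described above, e.g. sub = feats(randperm(size(feats, 1), min(size(feats, 1), 500)), :); for each image, then sort the resulting distances in ascending order to rank the images.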
Acknowledgement: Parts I and II are adapted from Kristen Grauman. Part III was inspired in part by an assignment by James Hays.