## CS1699: Homework 3

Due: 11/03/2015, 11:59pm

### Part I: Circle detection with the Hough Transform (35 points)

Implement a Hough Transform circle detector that takes an input image and a fixed radius, and returns the centers of any detected circles of about that size. You are not allowed to use any built-in Matlab functions for finding edges or circles! Include the following:
1. [5 pts] function [edges] = detectEdges(im, threshold) -- A function to compute edges in an image. im is the input image, in color and of uint8 type, and threshold is a user-set threshold for detecting edges. edges is an Nx4 matrix containing 4 numbers for each of N detected edge points: edges(i, 1) is the x location of the point, edges(i, 2) is the y location of the point, edges(i, 3) is the gradient magnitude at the point, and edges(i, 4) is the (non-quantized) gradient orientation at the point.
• In this function, simply compute the gradient magnitude and orientation at each pixel, and only return those (x, y) locations with magnitude that is higher than the threshold. Use the magnitude and orientation equations (and reuse as much code as possible) from HW2.
• Allow the user to skip passing a threshold: check for this with if (nargin == 1), and set a default value for the threshold, e.g. some multiple (try 1 through 5) of the average gradient magnitude in the image.
• Remember that the x direction corresponds to columns and the y direction corresponds to rows.
• At the end, display, save, and include in your writeup the thresholded edge image for an image of your choice, as in slide 24 here.
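As a rough guide, the pieces above could fit together like the sketch below. This is not the required implementation: the Sobel-style derivative filter and the factor-of-2 default threshold are placeholder choices, and you should substitute the magnitude and orientation code you wrote for HW2.

```matlab
function [edges] = detectEdges(im, threshold)
    gray = double(rgb2gray(im));               % im is a uint8 color image
    df = [-1 0 1; -2 0 2; -1 0 1];             % placeholder derivative filter
    gx = imfilter(gray, df,  'replicate');     % gradient in x (columns)
    gy = imfilter(gray, df', 'replicate');     % gradient in y (rows)
    mag = sqrt(gx.^2 + gy.^2);
    orient = atan2(gy, gx);                    % non-quantized orientation
    if nargin == 1
        threshold = 2 * mean(mag(:));          % default: a multiple of the mean
    end
    [r, c] = find(mag > threshold);            % keep only strong edge points
    idx = sub2ind(size(mag), r, c);
    edges = [c, r, mag(idx), orient(idx)];     % x = column, y = row
end
```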

2. [15 pts] function [centers] = detectCircles(im, edges, radius, top_k) -- A function to find and visualize circles from an edge map. im, edges are defined as above, radius specifies the size of circle we are looking for, and top_k says how many of the top-scoring circle center possibilities to return and show. The output centers is a top_k x 2 matrix in which each row lists the x, y position of a detected circle's center.
• Consider using ceil(a / quantization_value) and ceil(b / quantization_value) (where, for example, quantization_value can be set to 5) to easily figure out quantization/bins in Hough space. Don't forget to multiply by quantization_value once you've figured out the Hough parameters with most votes.
• Ignore circle centers outside the image.
• Set 1 as the default value for top_k (if the user didn't specify this value) using nargin as above.
• Use this line at the end of your function to visualize circles: figure; imshow(im); viscircles(centers, radius * ones(size(centers, 1), 1));.
• In your writeup, describe how your code would be different if you didn't know the gradient orientation.
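Putting the hints together, the voting loop might be sketched as follows. One assumption to note: the s = [-1 1] loop casts a vote on both sides of each edge point, hedging against not knowing whether the gradient points toward or away from the circle center; if you can resolve the direction, a single vote suffices.

```matlab
function [centers] = detectCircles(im, edges, radius, top_k)
    if nargin == 3, top_k = 1; end             % default number of circles
    q = 5;                                     % Hough-space bin size (pixels)
    H = zeros(ceil(size(im,1)/q), ceil(size(im,2)/q));   % rows: b, cols: a
    for i = 1:size(edges, 1)
        x = edges(i,1); y = edges(i,2); theta = edges(i,4);
        for s = [-1 1]                         % gradient may point either way
            a = x + s * radius * cos(theta);
            b = y + s * radius * sin(theta);
            if a >= 1 && a <= size(im,2) && b >= 1 && b <= size(im,1)
                H(ceil(b/q), ceil(a/q)) = H(ceil(b/q), ceil(a/q)) + 1;
            end
        end
    end
    [~, order] = sort(H(:), 'descend');        % bins with the most votes
    [bi, ai] = ind2sub(size(H), order(1:top_k));
    centers = [ai, bi] * q;                    % back to pixel coordinates
    figure; imshow(im);
    viscircles(centers, radius * ones(size(centers, 1), 1));
end
```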

3. [10 pts] Demonstrate the function applied to the provided images jupiter.jpg and egg.jpg. Display the images with detected circle(s), labeling the figure with the radius, save your image outputs, and include them in your writeup. You can use impixelinfo to estimate the radius of interest manually. For each image, include results for at least 5 different radius values.

4. [5 pts] For one of the images, demonstrate the impact of the vote space quantization (bin size).
Useful Matlab functions: ind2sub, ceil, atan, sin, cos, viscircles, impixelinfo.

### Part II: Video search with Bags of Visual Words (65 points)

For this problem, you will implement a video search method to retrieve relevant frames from a video based on the features in a query region selected from some frame. This assignment is loosely based around the following paper: "Video Google: A Text Retrieval Approach to Object Matching in Videos", by J. Sivic and A. Zisserman, published in ICCV 2003, which can be found here.

Data and provided code

At this link, courtesy of Kristen Grauman, you can find precomputed SIFT features (sift.zip) for frames in a "Friends" episode, as well as the associated frames as images (frames.zip), and provided starter code (code.zip). The data takes up about 5.5G, so if possible, do not copy it but point to it directly in your code (//afs/cs.pitt.edu/courses/cs1699/sift/, //afs/cs.pitt.edu/courses/cs1699/frames/).

Each .mat file in sift.zip corresponds to a single image, and contains the following variables, where n is the number of detected SIFT features in that image:

| Variable | Size + type | Description |
| --- | --- | --- |
| descriptors | nx128 double | the SIFT vectors as rows |
| imname | 1x57 char | name of the image file that goes with this data |
| numfeats | 1x1 double | number of detected features |
| orients | nx1 double | the orientations of the patches |
| positions | nx2 double | the positions of the patch centers |
| scales | nx1 double | the scales of the patches |

The provided code includes the following. Feel free to copy relevant parts of the provided code in the functions you have to implement.
• loadDataExample.m: Read and run this first and make sure you understand the data format. Note that you'll have to modify the paths for the frames and SIFT files. It is a script that loops over the data files and shows how to access each descriptor. It also shows how to use some of the other functions below.
• displaySIFTPatches.m: Given SIFT descriptor info, it draws the patches on top of an image.
• getPatchFromSIFTParameters.m: Given SIFT descriptor info, it extracts the image patch itself and returns as a single image.
• selectRegion.m: Given an image and list of feature positions, it allows a user to draw a polygon showing a region of interest, and then returns the indices within the list of positions that fell within the polygon.
• dist2.m: A fast implementation of computing pairwise distances between two matrices for which each row is a data point.
• kmeansML.m: A fast k-means implementation that takes the data points as columns.
You are not required to use any of these functions, but you will probably find them helpful.

What to implement and discuss in your write-up

Write one script for each of the following (along with any helper functions you find useful), and in your writeup report on the results, explain, and show images where appropriate. Implement the functionality in the order given below.
1. [5 pts] function [inds] = matchRawDescriptors(d1, d2) that computes nearest raw SIFT descriptors. d1, d2 are Mx128 and Nx128 sets of descriptors for two images, respectively, where M, N are the numbers of keypoints detected in the first and second image. inds should contain the indices, within the full set of N descriptors in the second image, of the descriptors that match some descriptor in the first image.
• Use the Euclidean distance between SIFT descriptors to determine which are nearest among two images' descriptors. That is, "match" each feature from the first image to its nearest neighbor in the second image, and store as many indices into the second image's descriptor set as you have features in the selected region of the first image.
• Do not quantize to visual words at this step.
• To find the minimum values and their indices in a matrix X along the i-th dimension, use [vals, inds] = min(X, [], i).
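Using the provided dist2 helper, this function can be very short. The sketch below assumes dist2 returns (squared) Euclidean distances between the rows of its two arguments; the nearest neighbor is the same whether or not the distances are squared.

```matlab
function [inds] = matchRawDescriptors(d1, d2)
    D = dist2(d1, d2);            % M x N matrix of pairwise distances
    [~, inds] = min(D, [], 2);    % nearest row of d2 for each row of d1
end
```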

2. [10 pts] A script rawDescriptorMatches.m that allows a user to select a region of interest in one frame (see provided selectRegion function), and then match descriptors in that region to descriptors in the second image based on Euclidean distance in SIFT space (use the matchRawDescriptors you wrote above).
• Run your code on the two images and associated features in the provided file twoFrameData.mat (in the provided code zip file; run load('twoFrameData');) to demonstrate.
• Select a region of interest (a polygon) in the first image, and display the matched features in the second image, something like the below example. Include in your write-up a screenshot of the selected region, as well as the retrieved matched features, for three different selected image regions.
• See provided helper function displaySIFTPatches.

3. [15 pts] A script visualizeVocabulary.m to visualize a vocabulary.
• First, load some SIFT descriptors and compute a k-means clustering using those. You want to cluster a large, representative random sample of SIFT descriptors from some of the frames. (If you use a sample from just a single part of the episode, you will not get good results because your sample is not representative.) Note that you may run out of memory if you use all descriptors. Select a random sample of frames, and a random sample of features within each frame (see randperm). Sample 300 frames and at most 100 (1:min(100, length(...))) features per frame.
• Then run kmeansML to get your cluster memberships and means. Let the k centers be the visual words. The value of k is a free parameter; for this data something like k=1500 should work, but feel free to play with this parameter. See the provided kmeansML.m code.
• Choose two words that are distinct enough. (Hint: use the distances between them to pick those words automatically, but first restrict the choice of clusters to show to those that have at least 25 patches.)
• Display example image patches associated with the two visual words you've chosen. The goal is to show what the different words are capturing, and display enough patch examples so the word content is evident (e.g., say 25 patches per word displayed), similar to slide 59 here.
• See the provided helper function getPatchFromSIFTParameters.
• Remember to keep track of the parent image for each feature.
• Save the cluster memberships and means into a file, using save('centers.mat', 'membership', 'means');
• In your writeup, include your two visualized clusters, and explain what you see.
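The sampling and clustering step might be sketched as below. Here fnames is an assumed cell array of paths to the SIFT .mat files, and the kmeansML call follows the description above (data points as columns); check the provided kmeansML.m for its exact signature and output order.

```matlab
k = 1500;                                      % vocabulary size (free parameter)
sample = [];
frame_order = randperm(numel(fnames));         % random sample of frames
for f = frame_order(1:300)
    load(fnames{f}, 'descriptors', 'numfeats');
    perm = randperm(numfeats);                 % random sample of features
    keep = perm(1:min(100, numfeats));
    sample = [sample; descriptors(keep, :)];   % grow the training sample
end
[membership, means] = kmeansML(k, sample');    % note the transpose: columns
save('centers.mat', 'membership', 'means');
```

Remember that to later visualize patches you will also need to record, for each sampled feature, which frame it came from and its positions/scales/orients row.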

4. [5 pts] function [bow] = computeBOWRepr(descriptors, means) to compute a Bag-of-Words (BOW) representation of an image or image region (polygon). bow is a normalized bag-of-words histogram. descriptors is the Mx128 set of descriptors for the image or image region, and means is the kx128 set of cluster means.
• Map a raw SIFT descriptor to its visual word. The raw descriptor is assigned to the nearest visual word. See the provided dist2.m code for fast distance computations.
• Map an image or region's features into its bag-of-words histogram. The histogram for image Ij is a k-dimensional vector: F(Ij) = [ freq1, j    freq2, j    ...    freqk, j ], where each entry freqi, j counts the number of occurrences of the i-th visual word in that image, and k is the number of total words in the vocabulary. In other words, a single image's or image region's list of M SIFT descriptors yields a k-dimensional bag of words histogram.
• Matlab's histc is a useful function.
• You don't have to implement TF-IDF weighting.
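A minimal sketch of this function, again assuming dist2 returns pairwise distances between rows, and normalizing the histogram to sum to 1:

```matlab
function [bow] = computeBOWRepr(descriptors, means)
    k = size(means, 1);
    D = dist2(descriptors, means);   % M x k distances to the cluster means
    [~, words] = min(D, [], 2);      % nearest visual word for each descriptor
    bow = histc(words, 1:k);         % count occurrences of each word
    bow = bow(:)' / sum(bow);        % normalized k-dimensional histogram
end
```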

5. [5 pts] function [sim] = compareSimilarity(bow1, bow2) to compare the similarity score for two bag-of-words histograms using the normalized scalar product, as in slide 69 here.
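The normalized scalar product is the dot product of the two histograms divided by the product of their norms, so a sketch is one line:

```matlab
function [sim] = compareSimilarity(bow1, bow2)
    sim = dot(bow1, bow2) / (norm(bow1) * norm(bow2));
end
```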

6. [10 pts] A script fullFrameQueries.m to process full frame queries.
• This part is similar to the last sub-part of HW2, but using a better similarity metric and a lot more images.
• Choose 3 different frames from the entire video dataset to serve as queries.
• Display the M=5 most similar frames to each of these queries (in rank order) based on the normalized scalar product between their bag of words histograms. Sort the similarity scores between a query histogram and the histograms associated with the rest of the images in the video. Pull up the images associated with the M most similar examples. See Matlab's sort function.
• For debugging, just load a small number of frames (and pick your query among those), compute their BOW, and leave the other frames' BOWs initialized to all zeros.
• In your write-up, include each query along with its most similar 3 images, and explain the results.
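The ranking step might be sketched as below, assuming bows is an F x k matrix holding one bag-of-words histogram per frame, q is the query frame's row index, and excluding the query frame itself from the results:

```matlab
sims = zeros(size(bows, 1), 1);
for f = 1:size(bows, 1)
    sims(f) = compareSimilarity(bows(q, :), bows(f, :));
end
sims(q) = -Inf;                      % exclude the query frame itself
[~, ranked] = sort(sims, 'descend'); % rank all frames by similarity
top5 = ranked(1:5);                  % indices of the M=5 most similar frames
```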

7. [15 pts] A script regionQueries.m to process region queries.
• Select your favorite query regions from each of 4 frames (which may be different from those used above) to demonstrate the top 3 retrieved frames when only a portion of the SIFT descriptors is used to form a bag of words.
• Form a query from a region within a frame. Select a polygonal region interactively with the mouse, and compute a bag of words histogram from only the SIFT descriptors that fall within that region.
• Try to include example(s) where the same object appears amidst different objects or backgrounds, and also include a failure case.
• Don't retrieve results from the same image as the query region.
• See the provided selectRegion.m code, and re-use some of your code from the part above.
• Run a close all, or open a new figure, before running the region selection.
• In your write-up, include each of the 4 queries (take a screenshot of the image with selected region) along with its most similar 3 images, and explain the results, including possible reasons for the failure cases.

Acknowledgement: Both parts of this homework are adapted from assignments by Kristen Grauman.