CS1699: Homework 3

Due: 11/03/2015, 11:59pm

Instructions: Please provide your written answers and your code. Include your image results in your written answers file (you don't have to submit them as separate files). Your written answers should be in the form of a single PDF or Word document (.doc or .docx). Your code should be written in Matlab. Zip or tar your written answers and .m files and upload the .zip or .tar file on CourseWeb -> CS1699 -> Assignments -> Homework 3. Name the file YourFirstName_YourLastName.zip or YourFirstName_YourLastName.tar.

Part I: Circle detection with the Hough Transform (35 points)

Implement a Hough Transform circle detector that takes an input image and a fixed radius, and returns the centers of any detected circles of about that size. You are not allowed to use any built-in Matlab functions for finding edges or circles! Include the following:
  1. [5 pts] function [edges] = detectEdges(im, threshold) -- A function to compute edges in an image. im is the input color image of type uint8, and threshold is a user-set threshold for detecting edges. edges is an Nx4 matrix containing 4 numbers for each of the N detected edge points: edges(i, 1) is the x location of the point, edges(i, 2) is the y location, edges(i, 3) is the gradient magnitude at the point, and edges(i, 4) is the gradient orientation (non-quantized) at the point.

  2. [15 pts] function [centers] = detectCircles(im, edges, radius, top_k) -- A function to find and visualize circles from an edge map. im and edges are defined as above, radius specifies the size of circle we are looking for, and top_k says how many of the top-scoring circle center possibilities to show. The output centers is a top_k x 2 matrix in which each row lists the x, y position of a detected circle's center.

  3. [10 pts] Demonstrate the function applied to the provided images jupiter.jpg and egg.jpg. Display the images with detected circle(s), labeling the figure with the radius, save your image outputs, and include them in your writeup. You can use impixelinfo to estimate the radius of interest manually. For each image, include results for at least 5 different radius values.

  4. [5 pts] For one of the images, demonstrate the impact of the vote space quantization (bin size).
Useful Matlab functions: ind2sub, ceil, atan, sin, cos, viscircles, impixelinfo.
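
The voting scheme can be sketched roughly as follows: each strong edge point votes for the two candidate centers that lie one radius away along its gradient direction. This is an illustrative Python/NumPy condensation of the two required functions, not a substitute for them (your submission must be in Matlab, with detectEdges and detectCircles kept separate); the default threshold value and all names here are our own, not part of the spec.

```python
import numpy as np

def detect_circles(gray, radius, threshold=30.0, top_k=5):
    """Hough-vote for circle centers of a fixed radius in a 2-D
    grayscale float array. Illustrative sketch only."""
    # Image gradients (a Matlab version could compute these with
    # hand-written difference filters, since built-in edge detectors
    # are not allowed).
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)          # non-quantized orientation

    h, w = gray.shape
    acc = np.zeros((h, w))              # one vote bin per pixel
    ys, xs = np.nonzero(mag > threshold)
    for x, y in zip(xs, ys):
        t = theta[y, x]
        # The center lies along the gradient direction, one radius
        # away, on either side of the edge point.
        for sign in (+1, -1):
            cx = int(round(x + sign * radius * np.cos(t)))
            cy = int(round(y + sign * radius * np.sin(t)))
            if 0 <= cx < w and 0 <= cy < h:
                acc[cy, cx] += 1

    # Top-scoring bins (the role ind2sub plays in Matlab).
    flat = np.argsort(acc, axis=None)[::-1][:top_k]
    cy, cx = np.unravel_index(flat, acc.shape)
    return np.column_stack([cx, cy])    # rows of (x, y)
```

Using a coarser accumulator (e.g., one bin per 5x5 block of pixels) is the quantization whose effect item 4 asks you to demonstrate.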

Part II: Video search with Bags of Visual Words (65 points)

For this problem, you will implement a video search method to retrieve relevant frames from a video based on the features in a query region selected from some frame. This assignment is loosely based around the following paper: "Video Google: A Text Retrieval Approach to Object Matching in Videos", by J. Sivic and A. Zisserman, published in ICCV 2003, which can be found here.

Data and provided code

At this link, courtesy of Kristen Grauman, you can find precomputed SIFT features (sift.zip) for frames in a "Friends" episode, as well as the associated frames as images (frames.zip), and provided starter code (code.zip). The data takes up about 5.5G, so if possible, do not copy it but point to it directly in your code (//afs/cs.pitt.edu/courses/cs1699/sift/, //afs/cs.pitt.edu/courses/cs1699/frames/).

Each .mat file in sift.zip corresponds to a single image, and contains the following variables, where n is the number of detected SIFT features in that image:

  Variable      Size + type     Description
  descriptors   nx128 double    the SIFT vectors as rows
  imname        1x57 char       name of the image file that goes with this data
  numfeats      1x1 double      number of detected features
  orients       nx1 double      the orientations of the patches
  positions     nx2 double      the positions of the patch centers
  scales        nx1 double      the scales of the patches

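Each file can be loaded with Matlab's load (or scipy.io.loadmat in Python) to obtain the variables above. As a quick sanity check of the shapes in the table, here is an illustrative Python sketch; the function name is ours, not part of the assignment:

```python
import numpy as np

def check_sift_mat(data):
    """Sanity-check one frame's SIFT data, given as a dict of arrays
    (the form scipy.io.loadmat returns). Field names and shapes follow
    the table above."""
    n = int(np.ravel(data['numfeats'])[0])         # number of features
    assert data['descriptors'].shape == (n, 128)   # SIFT vectors as rows
    assert data['positions'].shape == (n, 2)       # patch centers
    assert data['orients'].shape == (n, 1)         # patch orientations
    assert data['scales'].shape == (n, 1)          # patch scales
    return n
```
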
The provided code (code.zip) includes several helper functions, such as the selectRegion function referenced below. Feel free to copy relevant parts of the provided code into the functions you have to implement. You are not required to use any of these functions, but you will probably find them helpful.

What to implement and discuss in your write-up

Write one script for each of the following (along with any helper functions you find useful), and in your writeup report on the results, explain, and show images where appropriate. Implement the functionality in the order given below.
  1. [5 pts] function [inds] = matchRawDescriptors(d1, d2) that computes nearest-neighbor matches between raw SIFT descriptors. d1 and d2 are Mx128 and Nx128 matrices of descriptors for two images, respectively, where M and N are the numbers of keypoints detected in the first and second image. inds should contain the indices, over the full set of N descriptors in the second image, of those descriptors that match some descriptor in the first image.

  2. [10 pts] A script rawDescriptorMatches.m that allows a user to select a region of interest in one frame (see provided selectRegion function), and then match descriptors in that region to descriptors in the second image based on Euclidean distance in SIFT space (use the matchRawDescriptors you wrote above).

  3. [15 pts] A script visualizeVocabulary.m to build a visual vocabulary by clustering a sample of the SIFT descriptors (the cluster means become the visual words) and to visualize it, e.g., by displaying image patches assigned to the same word.

  4. [5 pts] function [bow] = computeBOWRepr(descriptors, means) to compute a Bag-of-Words (BOW) representation of an image or image region (polygon). bow is a normalized bag-of-words histogram. descriptors is the Mx128 set of descriptors for the image or image region, and means is the kx128 set of cluster means.

  5. [5 pts] function [sim] = compareSimilarity(bow1, bow2) to compare the similarity score for two bag-of-words histograms using the normalized scalar product, as in slide 69 here.

  6. [10 pts] A script fullFrameQueries.m to process full-frame queries: treat a few frames as queries, compare each query's bag-of-words histogram against those of all other frames, and display the top-ranked matching frames.

  7. [15 pts] A script regionQueries.m to process region queries: let the user select a region of interest in a frame, compute a bag-of-words histogram from only the descriptors inside that region, and rank the other frames by similarity to it.
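
A condensed sketch of the core computations in items 1, 4, and 5 above, written in Python/NumPy for illustration (the assignment itself must be written in Matlab); the max_dist threshold is an assumed example value, not part of the spec:

```python
import numpy as np

def match_raw_descriptors(d1, d2, max_dist=0.6):
    """For each row of d1 (Mx128), find its nearest neighbor in d2
    (Nx128) by Euclidean distance; return the matched indices into d2."""
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 - 2ab + ||b||^2.
    dists = ((d1 ** 2).sum(axis=1)[:, None]
             - 2 * d1 @ d2.T
             + (d2 ** 2).sum(axis=1)[None, :])
    nn = dists.argmin(axis=1)
    keep = np.sqrt(np.maximum(dists[np.arange(len(d1)), nn], 0)) < max_dist
    return np.unique(nn[keep])

def compute_bow_repr(descriptors, means):
    """Normalized bag-of-words histogram: assign each descriptor (Mx128)
    to its nearest cluster mean (kx128) and count occurrences."""
    dists = ((descriptors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    bow = np.bincount(words, minlength=len(means)).astype(float)
    return bow / bow.sum() if bow.sum() > 0 else bow

def compare_similarity(bow1, bow2):
    """Normalized scalar product (cosine similarity) of two histograms."""
    denom = np.linalg.norm(bow1) * np.linalg.norm(bow2)
    return float(bow1 @ bow2) / denom if denom > 0 else 0.0
```

The full-frame and region query scripts then reduce to computing one such histogram per query (whole frame or selected polygon), scoring it against every frame's histogram, and sorting.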

Acknowledgement: Both parts of this homework are adapted from assignments by Kristen Grauman.