CS1674: Homework 5 - Programming

Due: 10/10/2016, 11:59pm -- note this is Monday, i.e. you have 12 days to complete this assignment instead of the usual 7 days. However, it will also take more time, around 8-10 hours.

This assignment is worth 50 points.

Part I: Video Search with Bags of Visual Words (35 points)

For this problem, you will implement a video search method to retrieve relevant frames from a video based on the features in a query region selected from some frame. This assignment is loosely based on the following paper: "Video Google: A Text Retrieval Approach to Object Matching in Videos", by J. Sivic and A. Zisserman, published in ICCV 2003.

Most of the required functionality is implemented for you; you only have to (1) compute a bag-of-words representation, and (2) use it to do search.


Data and provided code

Courtesy of Kristen Grauman, you can download precomputed SIFT features (sift.zip) for frames in a "Friends" episode, as well as the associated frames as images (frames.zip). The full data takes up 4.76GB. You can also just work with a smaller subset (297MB, 301 video frames). To do so, download sift_subset.zip and frames_subset.zip.

Each .mat file in sift.zip corresponds to a single image, and contains the following variables, where n is the number of detected SIFT features in that image:

Variable      Size + type     Description
descriptors   nx128 double    the SIFT vectors as rows
imname        1x57 char       name of the image file that goes with this data
numfeats      1x1 double      number of detected features
orients       nx1 double      the orientations of the patches
positions     nx2 double      the positions of the patch centers
scales        nx1 double      the scales of the patches

On CourseWeb, attached to the assignment, you can find the provided starter code. Feel free to copy relevant parts of the provided code into the functions you have to implement if needed.

What to implement and submit

  1. [5 pts]
    Run visualizeVocabulary.m and explain what you see, in a text file titled vocabulary.txt. This function will also store a centers.mat file which you will need later.

  2. [10 pts]
    Write a function [bow] = computeBOWRepr(descriptors, means) that computes a Bag-of-Words (BOW) representation of an image or image region (polygon). bow is a normalized bag-of-words histogram. descriptors is the Mx128 set of descriptors for the image or image region, and means is the 128xk set of cluster means. You want to compute a representation like the two vectors shown in slide 23 here.

  3. [20 pts: 15 points for correctness of code and 5 points for image results]
    Write a script regionQueries.m that demonstrates which frames are retrieved when we search using region queries, for 5 query regions (i.e. polygon regions that you select with the mouse from frames). For each query, show the top 3 retrieved frames when the SIFT descriptors in the selected region are used to form a bag of words, and that bag of words is matched against the bag of words for each frame.
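
As an illustration of items 2 and 3, here is a minimal sketch in Python (standard library only) of the two ideas: building a normalized bag-of-words histogram and ranking frames against a query. It is not the required Matlab submission. Several choices are assumptions that the assignment does not specify: the names compute_bow, cosine_similarity and rank_frames are hypothetical, the histogram is normalized to sum to 1, frames are compared with cosine similarity, and cluster centers are stored one per row (the means argument above is 128xk, so it would need to be transposed).

```python
import math


def compute_bow(descriptors, centers):
    """Return a length-k histogram of nearest-center assignments.

    descriptors: list of M feature vectors.
    centers: list of k cluster centers, one per row (assumed layout).
    The histogram is normalized to sum to 1 (an assumed normalization).
    """
    k = len(centers)
    counts = [0.0] * k
    for d in descriptors:
        # Assign the descriptor to the center with the smallest
        # squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centers[j])),
        )
        counts[nearest] += 1.0
    total = sum(counts)
    if total == 0:
        return counts  # no descriptors: return the all-zero histogram
    return [c / total for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two histograms (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_frames(query_bow, frame_bows, top=3):
    """Return the indices of the `top` frames most similar to the query."""
    scored = [(cosine_similarity(query_bow, b), i) for i, b in enumerate(frame_bows)]
    # Highest similarity first; ties are broken by the lower frame index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top]]
```

In a region query, only the descriptors whose positions fall inside the selected polygon would be passed to compute_bow, and the resulting histogram would be compared against the histogram of every frame.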


Part II: Finding Matches between Points with a Homography (15 points)

In this exercise, you will compute an image homography from matching points between two images. Using this homography, you can tell where points from the first image appear in the second image. You can also compute a warp between the two images, but we will not implement the warping for this exercise.

The homography function is provided for you; you only have to:
  1. write a script titled homography_script.m where you load images, select matching points, compute a homography, apply it to a new point from the first image, and show in different colors the point in the first image and its match (computed using a homography) in the second image, and
  2. write a function [p2] = apply_homography(p1, H) to apply the homography and convert back from homogeneous coordinates.
More detailed instructions:
  1. Use the following two images: img1, img2. In your homography_script.m, load them into Matlab and show them in separate figures, followed by the command impixelinfo after each figure. This will allow you to see pixel coordinates at the bottom of the figures, when you hover over the images.
  2. Examine the images, and determine at least four pairs of points (in each pair, one point should be from the first image, and one from the second image) that are distinctive. Write them down in matrix form in the script, with rows being the points and columns being the x and y locations. This will give you the A, B to use below.
  3. Provided is a function H = compute_homography(A, B) that computes a homography between the points from the first image (in matrix A) and second image (in matrix B) that you found in the previous step.
  4. Now pick one new point from the first image, and use the computed homography to compute where it "lands" in the second image. Write a function [p2] = apply_homography(p1, H) to do this. Don't forget to convert back from homogeneous coordinates as discussed in class.
  5. Finally, in your script, write down the "test" point, and find where it lands in the second image. Also create two appropriately named figures, one of which shows the first image, with the p1 point selected from it shown in yellow, and the other shows the second image, with the p2 point computed using the homography shown in red.
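
The conversion to and from homogeneous coordinates in step 4 can be sketched as follows, in Python for illustration rather than as the required Matlab function. The sketch assumes that H is a 3x3 matrix that acts on column vectors, so that the point in the second image is proportional to H * [x; y; 1]; the convention used by the provided compute_homography should be checked before relying on this.

```python
def apply_homography(p1, H):
    """Map the point p1 = (x, y) through the 3x3 homography H.

    Assumes H acts on column vectors: [u, v, w]^T = H [x, y, 1]^T.
    Returns (u / w, v / w), i.e. the point converted back from
    homogeneous coordinates.
    """
    x, y = p1
    # Multiply H by the homogeneous point [x, y, 1].
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (w == 0)")
    # Divide by the third coordinate to return to pixel coordinates.
    return (u / w, v / w)
```

The division by w is the step that is easy to forget: without it, the result is only correct when the last row of H happens to be [0, 0, 1].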


Acknowledgement: Part I is adapted from Kristen Grauman. The images in Part II were provided by Derek Hoiem.