CS 2770: Homework 3

Due: 4/17/2018, 11:59pm

This assignment is worth 100 points, but will take less time than the previous assignments.

In this assignment, you will evaluate object detectors and recurrent neural networks trained with different parameters.

Part I: Object Detection with Versions of YOLO (70 points)

In this problem, you will use and evaluate two pre-trained object detectors, YOLO ("You Only Look Once: Unified, Real-Time Object Detection," by Redmon et al., CVPR 2016) and Tiny YOLO. You will compute the mAP score of each detector on the test set from the PASCAL VOC challenge, and report the time each detector takes.

Some code and scripts are provided for you on nietzsche.cs.pitt.edu in /own_files/cs2770_hw3/darknet_gpu/. (Warning: One filename contains explicit language; not our fault!) YOLO is officially distributed through Darknet, which provides a Python interface but no official Matlab interface, so this part of the assignment has to be done in Python. Please talk to the TA if you need help coding in Python.

We reuse code from Darknet YOLO and provide a sample executable, which you can run with the following commands:
cd /own_files/cs2770_hw3/darknet_gpu/
bash run.sh

run.sh contains:
export LD_LIBRARY_PATH=/opt/cuda-7.5/lib64/ # Setup cuda7.5
python python/cs2770_darknet.py # main process (python process)

The script python/cs2770_darknet.py, which run.sh invokes, contains:
# import library
import sys
sys.path.append('/own_files/cs2770_hw3/darknet_gpu/python/')
import darknet as dn

# Select GPU
dn.set_gpu(0)

# Load Tiny YOLO (the configuration for the standard YOLO network is also provided in cfg/)
net = dn.load_net("cfg/tiny-yolo-voc.cfg", "tiny-yolo-voc.weights", 0)
meta = dn.load_meta("cfg/voc_fp.data")

# detection
im_name = "data_test/VOCdevkit/VOC2007/JPEGImages/000293.jpg"
r = dn.detect(net, meta, im_name, thresh=.5)
print(r)

The detected bounding boxes r require a post-processing step, which is provided for you as the function post_process. Darknet outputs each bounding box as (CENTER_X, CENTER_Y, WIDTH, HEIGHT), but evaluation requires (X, Y, WIDTH, HEIGHT), where (X, Y) is the top-left corner of the box; post_process performs this conversion. cs2770_darknet.py outputs:
[['person', [151.35217475891113, 47.60009002685547, 39.76028060913086, 67.713623046875]], ['person', [21.47481918334961, 21.470802307128906, 59.922386169433594, 162.10108947753906]]]
Each entry has the structure [CLASS, BOUNDING_BOX], where BOUNDING_BOX holds the (X, Y, WIDTH, HEIGHT) information.
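To make the conversion concrete, here is a minimal sketch of a center-to-corner conversion. It is not the provided post_process: the function name center_to_corner is hypothetical, and it assumes each raw detection is a (name, probability, (center_x, center_y, width, height)) tuple.

```python
def center_to_corner(detection):
    """Hypothetical stand-in for post_process.

    Assumes `detection` is (name, probability, (cx, cy, w, h)) and returns
    [name, [x, y, w, h]] with (x, y) the top-left corner of the box.
    """
    name, _prob, (cx, cy, w, h) = detection
    # Move from the center back by half the box size to reach the top-left corner
    return [name, [cx - w / 2.0, cy - h / 2.0, w, h]]


# A 20x10 box centered at (50, 40) has its top-left corner at (40, 35)
print(center_to_corner(("dog", 0.8, (50.0, 40.0, 20.0, 10.0))))  # ['dog', [40.0, 35.0, 20.0, 10.0]]
```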

Your task is to compare the performance of the YOLO network with that of the Tiny YOLO network. Tiny YOLO is much faster but less accurate than the standard YOLO model; it uses mostly convolutional layers, without large fully connected layers at the end. We compare the two networks using mean average precision (mAP) and prediction time.

For the former, we check how well each predicted bounding box matches the ground-truth bounding boxes, and then compute the mean average precision over the C = 20 VOC classes:

$$\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP(c)}{TP(c) + FP(c)}$$

where

True Positive - TP(c): a predicted bounding box (pred_bb) was made for class c, there is a ground truth bounding box (gt_bb) of class c, and IoU(pred_bb, gt_bb) >= 0.5.
False Positive - FP(c): a pred_bb was made for class c, and either there is no gt_bb of class c, or there is a gt_bb of class c but IoU(pred_bb, gt_bb) < 0.5.
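Assuming mAP here means the per-class precision TP(c) / (TP(c) + FP(c)) averaged over classes, the final step from the TP and FP counts can be sketched as follows. The function name is hypothetical, and skipping classes with no predictions (which would otherwise divide by zero) is our assumption; the assignment does not say how to treat them.

```python
def mean_average_precision(tp, fp):
    """Mean over classes of TP(c) / (TP(c) + FP(c)).

    tp, fp: dicts mapping class name -> count.
    Assumption: classes with no predictions (TP + FP == 0) are skipped.
    """
    precisions = []
    for c in tp:
        n_pred = tp[c] + fp.get(c, 0)
        if n_pred > 0:
            precisions.append(float(tp[c]) / n_pred)
    return sum(precisions) / len(precisions) if precisions else 0.0


tp = {"cat": 3, "dog": 1, "bird": 0}
fp = {"cat": 1, "dog": 1, "bird": 0}
print(mean_average_precision(tp, fp))  # (0.75 + 0.5) / 2 = 0.625; "bird" is skipped
```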

The Intersection over Union (IoU) of two boxes is the area of their intersection divided by the area of their union. For a given class c, score each predicted bounding box by its best overlap, i.e., the highest IoU between that box and any ground-truth box of class c. If there is no ground-truth box of class c but you predict one, the score for that predicted box is 0.
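A minimal IoU sketch for boxes in (X, Y, WIDTH, HEIGHT) form, assuming (X, Y) is the top-left corner. The helper names iou and best_iou are hypothetical, and gt_boxes is expected to hold only the ground-truth boxes of the predicted box's class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Size of the overlapping rectangle, clamped to 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return float(inter) / union if union > 0 else 0.0


def best_iou(pred_box, gt_boxes):
    """Score of a predicted box: its highest IoU with any ground-truth box of the
    same class, or 0 if the image has no ground-truth box of that class."""
    scores = [iou(pred_box, gt) for gt in gt_boxes]
    return max(scores) if scores else 0.0


print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / (100 + 100 - 25), about 0.143
print(best_iou((0, 0, 10, 10), []))         # 0.0
```

Comparing best_iou against 0.5 then decides whether the prediction counts toward TP(c) or FP(c).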

Test images were taken from the PASCAL VOC dataset; the images and their corresponding bounding-box annotations are located in:

IMS_FOLDER=/own_files/cs2770_hw3/darknet_gpu/data_test/VOCdevkit/VOC2007/JPEGImages/
ANNOT_FOLDER=/own_files/cs2770_hw3/darknet_gpu/data_test/VOCdevkit/VOC2007/labels/

The annotation files follow the structure CATEGORY X Y WIDTH HEIGHT; these are your ground-truth data. In total, there are 20 categories: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.
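A sketch of reading the class list and the ground truth, under two assumptions: each annotation file holds one object per line, and CATEGORY may be either a class name or an integer index into voc.names (the handout does not say which), so both are accepted. All function names are hypothetical.

```python
def read_classes(names_file):
    """Read class names, one per line (e.g. data/voc.names)."""
    with open(names_file) as f:
        return [line.strip() for line in f if line.strip()]


def parse_annotations(lines, classes=None):
    """Group ground-truth boxes by class.

    Assumes one object per line as CATEGORY X Y WIDTH HEIGHT. If CATEGORY is an
    integer index instead of a name, it is mapped through `classes`.
    Returns a dict: class name -> list of (x, y, w, h) tuples.
    """
    gt = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        category = parts[0]
        if classes is not None and category.isdigit():
            category = classes[int(category)]
        gt.setdefault(category, []).append(tuple(float(v) for v in parts[1:]))
    return gt


def read_annotations(annot_file, classes=None):
    """Parse one annotation file from ANNOT_FOLDER."""
    with open(annot_file) as f:
        return parse_annotations(f, classes)
```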

Some suggested implementation steps are as follows.

Run cs2770_darknet.py to import the Darknet library, select a GPU, load the network, and run detection, as described above.
Retrieve all VOC classes. The names of all available classes are listed in the file /own_files/cs2770_hw3/darknet_gpu/data/voc.names.
Create a function evaluate_detector that takes IMS_FOLDER, ANNOT_FOLDER, net, meta, and classes, and returns the mAP. The locations of IMS_FOLDER and ANNOT_FOLDER are given above. Initialize TP and FP to 0 for every VOC class.
Implement a helper im_detector that takes IM_FILE (image file), IM_ANNOT (the image's bounding-box annotations), net, meta, TP, and FP, and returns TP and FP updated with the current image's detections.
In im_detector, compute the bounding-box predictions by running the detector and applying the provided post-processing step:
r = dn.detect(net, meta, IM_FILE, thresh=.5)
r = [post_process(box) for box in r]  # convert each box to (X, Y, WIDTH, HEIGHT)
Iterate over all images with glob.glob(IMS_FOLDER + '*.jpg') and update TP and FP with im_detector. Then compute mAP from TP and FP as described above.
To measure the total time spent predicting bounding boxes, record time.time() before and after the detection calls.
Repeat the same steps for the standard YOLO network, which you can initialize as follows:
shared_folder = "/own_files/cs2770_hw3/darknet_gpu/"
net = dn.load_net(shared_folder + "cfg/yolo-voc.cfg", shared_folder + "yolo-voc.weights", 0)
meta = dn.load_meta(shared_folder + "cfg/voc_fp.data")
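The steps above can be sketched as follows. To keep the sketch independent of Darknet, evaluate_detector takes a `detect` callable in place of net and meta, plus a `read_gt` callable for the annotations; both parameters, the helper _iou, and the mapping from an image name to its annotation file (labels/000293.txt for JPEGImages/000293.jpg) are assumptions. Matching follows the handout's TP/FP definitions, so a prediction is a TP whenever its best IoU with a same-class ground-truth box is at least 0.5.

```python
import glob
import os
import time


def _iou(a, b):
    """IoU of two (x, y, w, h) boxes with (x, y) the top-left corner."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return float(inter) / union if union > 0 else 0.0


def im_detector(preds, gt, tp, fp, iou_thresh=0.5):
    """Update TP/FP with one image's detections.

    preds: post-processed detections, [[class, [x, y, w, h]], ...]
    gt:    ground truth for the image, {class: [(x, y, w, h), ...]}
    """
    for cls, box in preds:
        scores = [_iou(box, g) for g in gt.get(cls, [])]
        if scores and max(scores) >= iou_thresh:
            tp[cls] = tp.get(cls, 0) + 1
        else:
            fp[cls] = fp.get(cls, 0) + 1
    return tp, fp


def evaluate_detector(ims_folder, annot_folder, detect, read_gt, classes):
    """Return (mAP, total detection time in seconds) over all *.jpg images.

    detect(im_file)     -> post-processed detections for one image
    read_gt(annot_file) -> {class: [(x, y, w, h), ...]}
    """
    tp = dict.fromkeys(classes, 0)
    fp = dict.fromkeys(classes, 0)
    det_time = 0.0
    for im_file in sorted(glob.glob(ims_folder + '*.jpg')):
        # Assumption: the annotation file shares the image's base name
        stem = os.path.splitext(os.path.basename(im_file))[0]
        gt = read_gt(os.path.join(annot_folder, stem + '.txt'))
        start = time.time()
        preds = detect(im_file)  # only the detector call is timed
        det_time += time.time() - start
        im_detector(preds, gt, tp, fp)
    # Mean per-class precision over classes with at least one prediction
    precisions = []
    for c in set(tp) | set(fp):
        n_pred = tp.get(c, 0) + fp.get(c, 0)
        if n_pred > 0:
            precisions.append(float(tp.get(c, 0)) / n_pred)
    m_ap = sum(precisions) / len(precisions) if precisions else 0.0
    return m_ap, det_time
```

With the Darknet bindings from the handout, `detect` could be `lambda f: [post_process(b) for b in dn.detect(net, meta, f, thresh=.5)]`; run it once for Tiny YOLO and once for YOLO and report both (mAP, time) pairs.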

Part II: Recurrent Neural Networks (30 points)

In this problem, you will compare different parameter configurations for a recurrent network, using the perplexity measure.

We reuse code from the TensorFlow recurrent neural network tutorial, which shows how to train a recurrent neural network on a language modeling task. The goal is to fit a probabilistic model that assigns probabilities to sentences. It does so by predicting the next word in a text given the history of previous words.

We provide a sample executable, which you can run with the following commands:
cd /own_files/cs2770_hw3/rnn/code/
bash run.sh

run.sh contains:
export CUDA_VISIBLE_DEVICES=3 # Decide which GPU to use
export LD_LIBRARY_PATH=/opt/cuda-8.0-cuDNN5.1/lib64 # Setup cuda8.0
source /own_files/cs2770_hw3/rnn/python_rnn/bin/activate # activate python environment with all required packages

python ptb_word_lm.py --data_path=/own_files/cs2770_hw3/rnn/simple-examples/data/ --model=small # Train the RNN with the small configuration; --data_path is the directory containing the required data
deactivate # deactivate python environment

First, verify that this program runs properly; it takes around 20 minutes. The output shows perplexity measures (see slides 32-36) for the training and validation data.
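Perplexity is the exponential of the average negative log-likelihood the model assigns to each actual next word; lower is better. A minimal sketch (the function name is hypothetical):

```python
import math


def perplexity(word_probs):
    """Perplexity given the probability the model assigned to each actual
    next word: exp of the mean negative log-likelihood."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)


# Spreading probability uniformly over 4 words gives perplexity 4: the model is
# as uncertain as a fair 4-way choice at every step.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.5, 0.5, 0.5, 0.5]))  # more confident, lower perplexity (2)
```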

You will modify the SmallConfig parameters in ptb_word_lm.py (do not modify this file; work on a copy of it) and analyze how each parameter affects performance. The small configuration is defined by the SmallConfig class in ptb_word_lm.py.

You are required to test three parameters: hidden_size, keep_prob, and num_steps. hidden_size is the number of units in each LSTM layer; keep_prob is the probability of keeping a unit's activation in the dropout layers (keep_prob = 1.0 disables dropout); and num_steps is the number of time steps the LSTM is unrolled for, i.e., the number of consecutive words in each training sequence. For example, if num_steps=5 and the input is "The cat sat on the", the target output is "cat sat on the mat": each word is used to predict the word that follows it.
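The role of num_steps can be illustrated by cutting a word sequence into input/target windows. This is a simplified sketch with a hypothetical function name; the tutorial's own data reader additionally splits the data into batch_size parallel streams, which is omitted here.

```python
def make_windows(tokens, num_steps):
    """Cut a token sequence into consecutive (input, target) windows.

    Each window holds num_steps words; the target is the input shifted one
    word to the right, so every input word is trained to predict the next one.
    """
    windows = []
    # Require start + num_steps + 1 <= len(tokens) so the last target word exists
    for start in range(0, len(tokens) - num_steps, num_steps):
        inputs = tokens[start:start + num_steps]
        targets = tokens[start + 1:start + num_steps + 1]
        windows.append((inputs, targets))
    return windows


for inputs, targets in make_windows("The cat sat on the mat".split(), num_steps=5):
    print("%s -> %s" % (" ".join(inputs), " ".join(targets)))  # The cat sat on the -> cat sat on the mat
```

A larger num_steps lets the gradient flow back through longer word histories, at the cost of more computation per training sequence.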

Your task is to generate training and validation perplexity curves for 8 configurations: all combinations of hidden_size in (200, 400), keep_prob in (1.0, 0.5), and num_steps in (20, 30). Then, in a new text file, answer the following: What can you infer from these plots? Which parameter values are preferable for this problem?
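The 8 configurations are the Cartesian product of the three value pairs. The sketch below generates them as subclasses of a stand-in SmallConfig, so every hyperparameter not listed keeps its small value. The stand-in class, CONFIGS, make_configs, and the particular c1-c8 numbering are hypothetical; in your copy of ptb_word_lm.py you would subclass the real SmallConfig (or write the 8 classes out explicitly at TODO1) and extend get_config at TODO2.

```python
import itertools


class SmallConfig(object):
    """Stand-in for the tutorial's SmallConfig; only the three varied fields
    are shown, all other hyperparameters are omitted."""
    hidden_size = 200
    keep_prob = 1.0
    num_steps = 20


def make_configs():
    """Build c1..c8, one class per (hidden_size, keep_prob, num_steps) combination."""
    configs = {}
    grid = itertools.product((200, 400), (1.0, 0.5), (20, 30))
    for i, (hidden_size, keep_prob, num_steps) in enumerate(grid, start=1):
        name = "c%d" % i
        # Subclass SmallConfig so every other hyperparameter keeps its small value
        configs[name] = type(name, (SmallConfig,), {
            "hidden_size": hidden_size,
            "keep_prob": keep_prob,
            "num_steps": num_steps,
        })
    return configs


CONFIGS = make_configs()


def get_config(model_name):
    """Return an instance of the configuration named on the command line."""
    if model_name not in CONFIGS:
        raise ValueError("Invalid model: %s" % model_name)
    return CONFIGS[model_name]()


for name in sorted(CONFIGS):
    c = get_config(name)
    print("%s: hidden_size=%d keep_prob=%.1f num_steps=%d"
          % (name, c.hidden_size, c.keep_prob, c.num_steps))
```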

Some suggested implementation steps are as follows:

Read the TensorFlow tutorial for background, and run the sample run.sh script. All tutorial files are already in /own_files/cs2770_hw3/rnn/code/.
Copy all files from /own_files/cs2770_hw3/rnn/code/ to your own directory.
In your copy of ptb_word_lm.py, starting from the SmallConfig class, create the 8 configurations described above, named c1, c2, ..., c8. Look for "TODO1" to add the new configurations.
Update get_config in ptb_word_lm.py so that it accepts each configuration name (c1-c8) from the command line. Look for "TODO2" to add the new configuration code.
Update run.sh to run all configurations and save each output to a separate file. For configuration c1, the command is:
python ptb_word_lm.py --data_path=/own_files/cs2770_hw3/rnn/simple-examples/data/ --model=c1 > c1.txt
This saves all output to c1.txt. Look for "TODO3".
Run the script: bash run.sh
Run python create_tables.py to create c1.csv, ..., c8.csv with the perplexity measures.
Using the generated files, plot the training and validation perplexity of the 8 configurations with your favorite plotting tool.
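One way to do the last step in Python is sketched below. The CSV column names (train_perplexity, valid_perplexity) and the assumption of one row per epoch are hypothetical: match them to the header that create_tables.py actually writes. matplotlib is imported only inside plot_curves so that the parsing part runs without it.

```python
import csv


def parse_perplexity_rows(lines, train_col="train_perplexity", valid_col="valid_perplexity"):
    """Return (train, valid) perplexity lists from CSV text lines.

    The column names are hypothetical; adjust them to the real CSV header.
    """
    train, valid = [], []
    for row in csv.DictReader(lines):
        train.append(float(row[train_col]))
        valid.append(float(row[valid_col]))
    return train, valid


def plot_curves(results, out_file="perplexity.png"):
    """results: {config name: (train list, valid list)}; one panel per split."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display is needed on the server
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for name in sorted(results):
        for ax, curve in zip(axes, results[name]):
            ax.plot(range(1, len(curve) + 1), curve, label=name)
    for ax, title in zip(axes, ("Train perplexity", "Validation perplexity")):
        ax.set_title(title)
        ax.set_xlabel("Epoch")
        ax.set_ylabel("Perplexity")
        ax.legend()
    fig.tight_layout()
    fig.savefig(out_file)


# Usage, assuming c1.csv ... c8.csv are in the current directory:
# results = {}
# for i in range(1, 9):
#     with open("c%d.csv" % i) as f:
#         results["c%d" % i] = parse_perplexity_rows(f)
# plot_curves(results)
```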

Grading rubric:

[70 points] Part I - evaluation code, and text file with mAP and time measures for both networks
- Code for experiments (64 points)
  - Read ground-truth bounding boxes (10 points)
  - Predict bounding boxes (10 points)
  - Implement IoU function (10 points)
  - Calculate TP and FP for one image (10 points)
  - Calculate TP and FP for all images (10 points)
  - Calculate mAP measure (7 points)
  - Calculate prediction time (7 points)
- Text file with mAP and time measures (6 points)
[30 points] Part II - evaluation scripts, plots, and explanation text file
- Create 8 configurations (5 points)
- Run scripts and output CSV files (10 points)
- Plots for train and validation perplexity (5 points)
- Text file with conclusions (10 points)

Acknowledgement: This assignment was designed and prepared by Nils Murrugarra-Llerena.