CS 2770: Homework 3

Due: 4/9/2019, 11:59pm

The main goal of this assignment is to familiarize you with state-of-the-art object detection methods. In this assignment, you will work with two datasets to explore the factors that affect object detection performance. You will use a Faster R-CNN network pre-trained on the COCO dataset. Since the datasets in this homework differ from COCO, you will need to further train the model to fine-tune it on our datasets. One of the datasets contains 5 categories of objects, and the other is a pedestrian dataset. In this assignment, you will train a separate network for each of these datasets, compare their performance, and finally discuss possible reasons for the differences in performance. Considering the training time, this homework will take around 14 hours to complete.
The homework description has 5 parts. In part I, we list the steps you need to follow to install the required packages and download the datasets. In part II, we explain the object detection metric that you will use in this homework. In part III, you can find the list of packages that you need to import in your code, as well as instructions for creating the data loader for each dataset. In part IV, you can find the steps to train and evaluate your model. Finally, in part V, we give a complete description of the experiments you need to run for this assignment.



Training the CNN in this assignment may take a long time, and several of you will be using the limited computing resources at the same time, so be sure to start this assignment early.


Part I: Installing Required Packages and Downloading Data

    Installation:
  1. In this homework, you need to install new packages and libraries in addition to the ones that you used in homework 2. To install the required packages, we have prepared a script. After logging in to the cluster, first download the script with the following command:
    wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/PyTorch_Installation_hw3.sh
  2. Then run the script with the following command (Note: you only need to run this script once):
    source PyTorch_Installation_hw3.sh
  3. After the script finishes successfully, run the following command to exit the virtual environment it created:
    deactivate

    Use PyTorch:
    All of the steps for using PyTorch in this homework are the same as in homework 2, except for the name of the virtual environment.
  4. In this step, you need to move from the login node to one of the GPU nodes. On the GPU cluster, there are four GPU partitions: gtx1080, titanx, k40, and titan (Note: you can find a complete explanation of all these partitions at this link). One way to use these GPU partitions is to run an interactive job on the cluster. Here is an example of a command that requests access to one GPU in the titanx partition for 2 hours:
    crc-interactive.py -g --time=2 -n 1 -c 1 -p titanx -u 1
    After sending the request, you may or may not get access to a GPU, because all of the GPUs in the partition might be in use. If you cannot get access to the titanx partition, you can switch to another partition (gtx1080, k40, or titan) by replacing titanx with that partition's name in the command. (Note: you need to wait 10-20 seconds to see whether you get access to the requested GPU.)
  5. Now (and any time in the future that you want to use PyTorch) you must enter the virtual environment created during installation, using the following two commands:
    module load python/3.7.0 venv/wrap
    workon pytorch_hw3
  6. At this point, you can run your Python code with the following command:
    python hw3.py

    Download the Data and Required Packages:
  7. For this homework, we have prepared two datasets that you need to download to the cluster and unzip as follows:
    wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/PASCAL.zip
    unzip PASCAL.zip
    wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/PennFudanPed_hw3.zip
    unzip PennFudanPed_hw3.zip
    At this point you have a directory named PASCAL and a directory named PennFudanPed_hw3, each containing train, test, and validation sets. For the PASCAL dataset, inside each of the train, test, and val directories there are three directories: 1) Images, 2) BBox, and 3) Labels. The Images folder contains the images from the PASCAL VOC dataset, the BBox folder contains the ground truth bounding boxes of the objects in every image, and the Labels folder contains the object category for each bounding box in every image. For the PennFudanPed_hw3 dataset, there are two directories: 1) Images and 2) Masks. The Images folder contains images from the Penn-Fudan dataset, and the Masks folder contains the segmentation masks of the objects in every image.
  8. In this homework, you also need to put 6 Python files in the same directory where you run your code. You can download the zip file which includes all of these Python files with the following command:
    wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/Requied_Files.zip
    unzip Requied_Files.zip
    After unzipping the zip file, copy all of the Python files to the directory where you run your code.

Part II: Object Detection Metric:

  1. In homework 2, which was a classification task, we simply used accuracy as the metric to choose the best model weights during training. Computing the performance of object detection is more complicated than computing classification accuracy. The outputs of object detection are bounding boxes, and to compute the performance we use mAP (mean Average Precision). The definition of mAP is as follows:

    mAP = (1/|C|) * Σ_{c ∈ C} AP_c

  2. where C is the set of object classes and AP_c is the average precision for class c. For a given class c, compute the Intersection over Union (IoU) metric between each individual predicted bounding box P and the ground truth bounding boxes G:

    IoU(P, G) = area(P ∩ G) / area(P ∪ G)

    Take the best overlap (i.e. the highest IoU between the predicted box and any ground truth box) as the final score for that predicted bounding box. If there is no ground truth bounding box but you predict a positive window, your score for that box is 0.
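    To make the per-box IoU computation concrete, here is a minimal sketch (not part of the provided starter files; the box format [x1, y1, x2, y2] is an assumption for illustration):

    def box_iou(box_a, box_b):
        """Compute IoU between two boxes given as [x1, y1, x2, y2]."""
        # Coordinates of the intersection rectangle
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        # The intersection area is zero if the boxes do not overlap
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0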



Part III: Import Required Libraries and Modules and Data Preparation

  1. As the first step of your homework, you need to import the required modules and libraries:
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from pascal_dataset import PASCALDataset
    import utils
    from coco_utils import get_coco_api_from_dataset
    from coco_eval import CocoEvaluator
    import copy
    import torch.optim as optim
    from torch.optim import lr_scheduler
    from PennFudanDataset import PennFudanDataset
  2. To represent our datasets, we have prepared the PASCALDataset class in pascal_dataset.py and the PennFudanDataset class in PennFudanDataset.py. You can find all the details of how the images and annotations are converted to tensors in the pascal_dataset.py and PennFudanDataset.py files. You can use the PASCALDataset and PennFudanDataset classes in your code as follows:
    dataset = PASCALDataset('path_to_data')
    dataset = PennFudanDataset('path_to_data')
    Note: The path to the data is the path to the train, test, or val set, NOT the directory which includes the whole dataset. As a result, you need to create a separate dataset object for each of the train, test, and validation sets.
  3. In the next step, we need to create the data loader for the dataset. Here is an example of the code to prepare the dataloader:
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4, collate_fn=utils.collate_fn)
    You should be familiar with all of the arguments in this command from the previous homework except collate_fn=utils.collate_fn, which is used to return tuples of images and image annotations in every iteration.
    Note: For the train set data loader use a batch size of 4, and for the validation set data loader use a batch size of 1. If you use a batch size of 4 for both the train and validation sets, you will get a CUDA out of memory error.
    Note: You also need to create separate data loaders for the train, test, and validation sets, as in the sketch below.
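    A minimal sketch of creating the three PASCAL loaders under the batch-size guidance above (the dataset paths are placeholders, not required names):

    train_set = PASCALDataset('PASCAL/train')
    val_set = PASCALDataset('PASCAL/val')
    test_set = PASCALDataset('PASCAL/test')

    # Batch size 4 for training; batch size 1 for validation and testing
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=4, shuffle=True, num_workers=4, collate_fn=utils.collate_fn)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=1, shuffle=False, num_workers=4, collate_fn=utils.collate_fn)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=1, shuffle=False, num_workers=4, collate_fn=utils.collate_fn)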
  4. The PASCAL dataset contains 5 object categories: person, bicycle, car, motorcycle, and airplane, with corresponding labels 1, 2, 3, 4, and 5, respectively. In addition to these labels, label 0 belongs to the background category, so the total number of classes you need to use for the training process is 6.
    PennFudanPed_hw3 contains only the pedestrian (person) category. As a result, the total number of classes for its object detection task is 2.

Part IV: Train Object Detection Network on Our Dataset

    In this homework, we will use Faster R-CNN.
  1. First, we need to load an object detection model which has been pre-trained. Here is the command to load the pre-trained model:
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    The number of classes in the pre-trained model is different from the number of classes in our datasets. So, similar to what you did in homework 2, you need to replace the model's box_predictor with a new FastRCNNPredictor layer that predicts 6 classes when you are training on the PASCAL dataset and 2 classes when you are training on the PennFudanPed_hw3 dataset.
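    For instance, a minimal sketch of this replacement for the PASCAL case (for PennFudanPed_hw3, num_classes would be 2):

    num_classes = 6  # 5 PASCAL categories + background
    # Number of input features of the existing box predictor head
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the head so it predicts our classes instead of the COCO classes
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)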
  2. Before starting the training process, you need to set the optimizer, scheduler, and number of epochs, similar to what you did in homework 2 (see the sketch below).
    Note: Since every epoch of training for this task on the PASCAL dataset takes a long time, do not use a large number of epochs.
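    One reasonable starting point is sketched below (the specific hyperparameter values are illustrative assumptions, not required settings):

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
    # Decay the learning rate by a factor of 10 every 3 epochs
    scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    num_epochs = 5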
  3. Training Phase:
  4. Now you can start training the network. Every epoch has two phases: train and validation. In every epoch, if the mAP on the validation set is the largest mAP so far, you need to save the model weights.
  5. In every epoch, first iterate over the train set to perform the training process, and then iterate over the validation set to evaluate the performance of the trained model. Here is the set of commands to iterate over the data, prepare the images and targets, and use them as input to the model for the object detection task:
    for images, targets in data_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
    First line in the for loop: Since images and targets are of type tuple, we need to convert them to lists. In the first line of the loop, all images in the batch are transferred to the GPU device, and then a list is created from all of the images. The input to the model is the list of all images in the batch.
    Second line in the for loop: A target is a dictionary which contains the bounding boxes of the objects, the labels of the objects, the image id, the areas of the bounding boxes, and whether or not the instances are crowded. In the second line of the loop, all values in the target dictionary (annotations) of every image are transferred to the GPU device, and finally a list is created from all targets in the batch.
    Third line in the for loop: We use the images and targets as input to the model and get the loss values. Note that the inputs to the model in train mode are both the images and the targets (annotations). Since the object detection task has more than one loss value, the output for every batch is a dictionary of all loss values. The dictionary contains the following losses: 1) loss_classifier: measures the performance of object classification for the detected bounding boxes, 2) loss_box_reg: measures the performance of the network at recovering the coordinates of the ground truth bounding boxes, 3) loss_objectness: measures the performance of the network at finding bounding boxes which contain an object, and 4) loss_rpn_box_reg: measures the performance of the network at recovering the region proposals.
  6. In the next step, you need to sum all of the losses and backpropagate. Also, like in homework 2, in every iteration of the training phase you need to zero the gradients and apply the optimizer's step function.
  7. After iterating over the whole train set, you need to update the scheduler. A minimal sketch of one full training epoch, combining steps 5-7, is shown below.
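    (The names train_loader, optimizer, and scheduler are carried over from the earlier sketches; this is one way to structure the epoch, not the required one.)

    model.train()
    for images, targets in train_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        # Sum the four detection losses into a single scalar
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
    # Update the learning rate once per epoch
    scheduler.step()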
  8. Validation Phase:
  9. In the validation phase, you need to create a COCO evaluator to evaluate the performance of the network. To create an evaluator, first you need to create a COCO API object from our dataset:
    coco = get_coco_api_from_dataset(data_loader.dataset)
    Then you need to specify the IoU type:
    iou_types = ["bbox"]
    At the end, you can create a coco evaluator from coco API and IoU types:
    coco_evaluator = CocoEvaluator(coco, iou_types)
  10. At this point, you can start iterating over the validation set and computing the mAP. In every iteration, first transfer the images to the GPU and then use them as input to the model. The input to the model in evaluation mode is just the images; as opposed to the train phase, you do not need to transfer the targets (annotations) to the GPU. Here is the command to get the object detections for the images:
    outputs = model(images)
  11. For evaluation in coco_evaluator, the outputs need to be on the CPU, so you need to transfer them from the GPU to the CPU. Then you need to create pairs of targets and outputs as follows:
    res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
  12. Now res is used to update the coco evaluator in every iteration:
    coco_evaluator.update(res)
  13. After iterating over all images, you need to run the following commands to get the final evaluation results for the epoch:
    coco_evaluator.synchronize_between_processes()
    coco_evaluator.accumulate()
    coco_evaluator.summarize()
  14. At this point, you can get the mAP over the whole validation set with the following command:
    coco_evaluator.coco_eval['bbox'].stats[0]
    You need to save the model weights which achieve the highest mAP on the validation set. A minimal sketch of the whole validation phase is shown below.
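    (Putting steps 9-14 together; the names val_loader, best_map, and best_weights are illustrative assumptions, with best_map initialized to 0.0 before the first epoch.)

    model.eval()
    coco = get_coco_api_from_dataset(val_loader.dataset)
    coco_evaluator = CocoEvaluator(coco, ["bbox"])
    with torch.no_grad():
        for images, targets in val_loader:
            images = list(image.to(device) for image in images)
            outputs = model(images)
            # The evaluator expects predictions on the CPU
            outputs = [{k: v.to('cpu') for k, v in o.items()} for o in outputs]
            res = {t["image_id"].item(): o for t, o in zip(targets, outputs)}
            coco_evaluator.update(res)
    coco_evaluator.synchronize_between_processes()
    coco_evaluator.accumulate()
    coco_evaluator.summarize()
    epoch_map = coco_evaluator.coco_eval['bbox'].stats[0]
    # Keep a copy of the best weights seen so far
    if epoch_map > best_map:
        best_map = epoch_map
        best_weights = copy.deepcopy(model.state_dict())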

Part V: Experimental Results for Object Detection on Our Datasets:

    For this part, you need to follow these steps for both PASCAL and PennFudanPed_hw3 datasets:
  1. Train the network with different hyperparameters (we recommend that you first focus on trying different learning rates; if you have enough time, you can also vary the number of epochs). You need to run at least 3 experiments with different hyperparameters.
  2. Report the best validation and test mAP for all of your experiments.
  3. Use the model which has the best test mAP among all of your experiments to visualize the object detection results. For your visualization, you need to write code that draws the bounding boxes detected by the network on the image. In addition to the bounding boxes, the name of the category and its score should be shown near each bounding box (see the sketch after this list). Your code needs to find the 20 images from the test set with the highest mAP, draw the bounding boxes, and save the results in a directory. Then you need to visualize the images in a Jupyter notebook. In your submission, you need to submit both the original notebook and its HTML version. There are three main ways to use Jupyter notebooks on the cluster, depending on your operating system.
  4. Finally, you need to compare the results on the two datasets and discuss the possible reasons for the differences. In your discussion, you need to mention at least two reasons for the differences and write a comprehensive explanation of them. To explore the differences, you can compute statistics on the number of samples from every class in both datasets. Visualizing the images is also helpful.
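    For the drawing step in item 3, here is a minimal sketch using PIL (the CLASS_NAMES list follows Part III; the function name, box format, and score threshold are illustrative assumptions):

    from PIL import Image, ImageDraw

    CLASS_NAMES = ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane']

    def draw_detections(image_path, output, save_path, score_threshold=0.5):
        """Draw predicted boxes, category names, and scores on an image."""
        img = Image.open(image_path).convert('RGB')
        draw = ImageDraw.Draw(img)
        for box, label, score in zip(output['boxes'], output['labels'], output['scores']):
            if score < score_threshold:
                continue
            x1, y1, x2, y2 = box.tolist()
            draw.rectangle([x1, y1, x2, y2], outline='red', width=3)
            # Put the category name and score just above the box
            draw.text((x1, max(0, y1 - 12)), f"{CLASS_NAMES[int(label)]}: {float(score):.2f}", fill='red')
        img.save(save_path)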

Deliverables:

  1. A Python file to train the network.
  2. A Python file to test the network.
  3. A Python file to visualize the object detection results.
  4. Two Jupyter notebooks, plus their HTML versions, to show the visualization results for both the PASCAL and PennFudanPed_hw3 datasets.
  5. A report analyzing the experimental results.
    Note: If your code cannot be run, you can only receive up to 50% of the grade, even if you have all the required information in your report.

Grading rubric:

  1. [40 points] Code for training and validation of the network.
  2. [10 points] Code for testing the saved model weight on the test set.
  3. [20 points] Visualize the object detection results.
  4. [30 points] Write a report on the required experiments.
Acknowledgements: This assignment was prepared for you by Narges Honarvar Nazari and partly adapted from the PyTorch tutorial for object detection. Part II has been adapted from a homework designed by Nils Murrugarra-Llerena. The photos used for this assignment come from the PASCAL and Penn-Fudan datasets.