CS1678: Homework 2

Due: 3/18/2021, 11:59pm

This assignment is worth 50 points.

You will do this assignment using Google's Colab service (see its introduction). Your code will be written in a Jupyter notebook and run in the cloud, where you have free access to GPUs with your Google account. To make a new notebook, go to File -> New notebook in Colab. Then add both code and text snippets: use text snippets to explain what your code does and to state which part each snippet implements. In Part B, also use text snippets to explain your observations about the results you obtain.

Turn on GPU mode: go to Edit -> Notebook settings and set Hardware accelerator to GPU. You can run print("GPU Model: %s" % torch.cuda.get_device_name(0)) to see which GPU you are assigned; e.g. it may be a Tesla T4.
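The one-line check above raises an error when no GPU is attached. A slightly more defensive sketch (the CPU fallback and the helper name describe_device are our own additions, not assignment requirements) picks a device that later training code can reuse:

```python
import torch


def describe_device():
  """Returns a torch.device plus a human-readable description of it."""
  if torch.cuda.is_available():
    # get_device_name(0) reports the model of the first visible GPU, e.g. "Tesla T4".
    return torch.device("cuda"), "GPU Model: %s" % torch.cuda.get_device_name(0)
  # No GPU attached (e.g. the hardware accelerator was not switched on): fall back to CPU.
  return torch.device("cpu"), "No GPU available, using CPU"


device, message = describe_device()
print(message)
```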

Part A: Building a Custom Data Loader (15 points)

In our PyTorch tutorial, we show how to use the torchvision library to load common datasets such as MNIST. However, in real-world applications you will more often deal with custom data. In this exercise, you are provided with a set of images and you need to write a data loader that feeds these images to your model in PyTorch.

Note: You are not allowed to use torchvision in this exercise. You may find the following tutorial helpful: Writing Custom Datasets, Dataloaders and Transforms

Instructions:

Please download the data from here. The zip file contains two directories, one for the training set and one for the test set. Each directory has 10 subdirectories, each containing the images of one category. The structure looks like:
cifar10_train/airplane/airplane_00001.png
cifar10_train/airplane/airplane_00002.png
... ...
cifar10_train/truck/truck_04999.png

cifar10_test/airplane/airplane_00001.png
cifar10_test/airplane/airplane_00002.png
... ...
cifar10_test/truck/truck_00999.png
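Before writing the dataset class, it can help to confirm that the zip file extracted into the layout shown above. This sketch (the helper name count_images_per_class is hypothetical) counts the PNG files in each category subdirectory using only the standard library:

```python
from pathlib import Path


def count_images_per_class(root_dir):
  """Maps each category subdirectory of root_dir to its number of .png files."""
  root = Path(root_dir)
  # Sorting the subdirectories gives a stable, alphabetical category order.
  return {d.name: sum(1 for _ in d.glob("*.png"))
          for d in sorted(root.iterdir()) if d.is_dir()}
```

For example, count_images_per_class("cifar10_train") should list 10 categories; stray files at the top level (such as a README) are ignored because only directories are counted.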
Implement a class CifarDataset that represents the given dataset. Your custom dataset should inherit from torch.utils.data.Dataset. The same class should work for both the training set and the test set; which one it loads depends on the root_dir argument passed to the constructor.
Once you set up your custom dataset, you can pass it to a torch.utils.data.DataLoader, which batches (and optionally shuffles) the data as you iterate through it.

Here is a sample code snippet that you could use as a starter:

import torch

class CifarDataset(torch.utils.data.Dataset):
  def __init__(self, root_dir):
    """Initializes a dataset containing images and labels."""
    super().__init__()
    raise NotImplementedError

  def __len__(self):
    """Returns the size of the dataset."""
    raise NotImplementedError

  def __getitem__(self, index):
    """Returns the index-th data item of the dataset."""
    raise NotImplementedError

train_dataset = CifarDataset(TRAIN_DIRECTORY_PATH)
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=BATCH_SIZE,
                                               shuffle=True)
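One possible way to fill in the starter (a sketch, not the reference solution) keeps a list of (path, label) pairs in __init__ and decodes each image lazily in __getitem__. It assumes that using Pillow (PIL) to decode the PNG files is permitted under the no-torchvision rule, and it returns each image in height x width x channel order scaled to [0, 1], which matches the NHWC layout mentioned in the Part C hints.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image  # assumption: Pillow is installed (it is on Colab)


class CifarDataset(torch.utils.data.Dataset):
  def __init__(self, root_dir):
    """Initializes a dataset containing images and labels."""
    super().__init__()
    root = Path(root_dir)
    # One subdirectory per category; sorting keeps labels identical for train and test.
    self.classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    self.samples = [
        (path, label)
        for label, name in enumerate(self.classes)
        for path in sorted((root / name).glob("*.png"))
    ]

  def __len__(self):
    """Returns the size of the dataset."""
    return len(self.samples)

  def __getitem__(self, index):
    """Returns the index-th (image, label) pair.

    The image is a float32 tensor of shape (H, W, C) with values in [0, 1].
    """
    path, label = self.samples[index]
    with Image.open(path) as img:
      image = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(image), label
```

Loading images lazily keeps memory use low; with only 32x32 images, reading all of them into one array in __init__ would also be a reasonable design that trades memory for faster epochs.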

Part B: Training a neural network in PyTorch (25 points)

In this part you need to implement a 3-layer MLP model in PyTorch (one input layer, one hidden layer with tanh activation, and one output layer), named MultiLayerPerceptronModel, which will be used to classify the images from the dataset in Part A. You can use PyTorch's built-in modules to build your model, such as Linear, Dropout, Tanh, etc. You also need to write the training function (training), and you should explore the following hyperparameter settings:

Batch size: Number of examples per training iteration.
Hidden size: Try different numbers of hidden nodes in your model and compare their performance.
Dropout: Dropout is an effective strategy to defend against overfitting. Add a dropout layer after the hidden layer, and try different dropout rates to compare their performance.
Optimizer: Try different optimizers such as SGD, Adam, and RMSProp.
Regularization (weight decay): L2 regularization can be specified by setting the weight_decay parameter of the optimizer. Try different regularization factors and check the performance.
Learning rate, Learning rate scheduler: The learning rate is a key hyperparameter in model training, and gradually decreasing it during training can further improve your model. Try different learning rates and different learning rate schedulers and compare the performance.
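A minimal sketch of the model and training function, under several assumptions: images arrive as (N, 32, 32, 3) float tensors (as a Part A loader might return them), the defaults hidden_size=512 and dropout=0.5 are placeholders, and the helper make_optimizer is hypothetical (its SGD momentum of 0.9 is also an assumption). Only the names MultiLayerPerceptronModel and training come from the assignment.

```python
import torch
import torch.nn as nn


class MultiLayerPerceptronModel(nn.Module):
  """Input -> hidden layer (tanh) -> dropout -> output layer."""

  def __init__(self, input_size=32 * 32 * 3, hidden_size=512, num_classes=10, dropout=0.5):
    super().__init__()
    self.net = nn.Sequential(
        nn.Flatten(),                         # (N, 32, 32, 3) -> (N, 3072)
        nn.Linear(input_size, hidden_size),   # input -> hidden
        nn.Tanh(),
        nn.Dropout(dropout),                  # dropout after the hidden layer
        nn.Linear(hidden_size, num_classes),  # hidden -> output logits
    )

  def forward(self, x):
    return self.net(x)


def make_optimizer(model, name="sgd", lr=0.01, weight_decay=0.0):
  """Hypothetical helper; weight_decay adds L2 regularization in all three optimizers."""
  if name == "sgd":
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay)
  if name == "adam":
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
  if name == "rmsprop":
    return torch.optim.RMSprop(model.parameters(), lr=lr, weight_decay=weight_decay)
  raise ValueError("unknown optimizer: %s" % name)


def training(model, dataloader, optimizer, scheduler=None, num_epochs=10, device="cpu"):
  """Trains model in place and returns the average training loss of each epoch."""
  model.to(device)
  criterion = nn.CrossEntropyLoss()  # applies log-softmax to the logits internally
  epoch_losses = []
  for _ in range(num_epochs):
    model.train()  # enables dropout
    total, count = 0.0, 0
    for images, labels in dataloader:
      images, labels = images.to(device), labels.to(device)
      optimizer.zero_grad()
      loss = criterion(model(images), labels)
      loss.backward()
      optimizer.step()
      total += loss.item() * labels.size(0)
      count += labels.size(0)
    if scheduler is not None:
      scheduler.step()  # one scheduler step per epoch
    epoch_losses.append(total / count)
  return epoch_losses
```

Any class from torch.optim.lr_scheduler can be passed as scheduler, for example StepLR(optimizer, step_size=5, gamma=0.5). Moving the model to the GPU before building the optimizer is the safest order.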

To get full credit, you should explore at least 5 different types of hyperparameters (from the 6 listed above), and choose at least 3 different values for each hyperparameter. For simplicity, you can analyze one hyperparameter at a time (i.e. fix all others to reasonable values) rather than performing a grid search.
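The one-at-a-time strategy can be organized as a list of configurations that each differ from a baseline in exactly one setting. This sketch only generates the configurations; the baseline and candidate values (and the name one_at_a_time) are hypothetical placeholders, not recommendations.

```python
# Hypothetical baseline: every run uses these values except for the one being varied.
BASELINE = {
    "batch_size": 64,
    "hidden_size": 512,
    "dropout": 0.5,
    "optimizer": "sgd",
    "weight_decay": 0.0,
    "lr": 0.01,
}

# At least 3 values for each of at least 5 hyperparameter types.
SWEEPS = {
    "batch_size": [32, 64, 256],
    "hidden_size": [128, 512, 1024],
    "dropout": [0.0, 0.3, 0.5],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "weight_decay": [0.0, 1e-4, 1e-3],
}


def one_at_a_time(baseline, sweeps):
  """Yields (name, value, config) where config differs from baseline only in name."""
  for name, values in sweeps.items():
    for value in values:
      config = dict(baseline)  # copy so the baseline itself is never modified
      config[name] = value
      yield name, value, config
```

Each yielded config would then be trained and evaluated, and the resulting train/test accuracies recorded per (name, value) for the report.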

To evaluate the performance of trained models, you also need to write a function (evaluation) that loads a trained model and evaluates its performance on the train and test sets.
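A sketch of evaluation, assuming the trained model was saved as a state_dict with torch.save (the checkpoint_path parameter is a hypothetical convenience) and that classification accuracy is the metric to report:

```python
import torch


def evaluation(model, dataloader, checkpoint_path=None, device="cpu"):
  """Returns the classification accuracy of model on the data in dataloader."""
  if checkpoint_path is not None:
    # Assumes the checkpoint was written with torch.save(model.state_dict(), checkpoint_path).
    model.load_state_dict(torch.load(checkpoint_path, map_location=device))
  model.to(device)
  model.eval()  # disables dropout (and uses running BatchNorm statistics, if any)
  correct, total = 0, 0
  with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in dataloader:
      images, labels = images.to(device), labels.to(device)
      predictions = model(images).argmax(dim=1)
      correct += (predictions == labels).sum().item()
      total += labels.size(0)
  return correct / total
```

Calling it once with a train-set loader and once with a test-set loader gives the two accuracies the report asks for.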

Please include a brief report in your Jupyter notebook that clearly states which hyperparameters you explored and what accuracy the model achieved on the train and test sets for each setting of these hyperparameters.

Part C: Transfer Learning (10 points)

In this exercise you will take an ImageNet-pretrained MobileNetV2 model and finetune it on the dataset from Part A. Specifically, you will load a MobileNetV2 model with pretrained weights using the torchvision library, replace the final classification layer with a new, randomly initialized fully connected layer, and finetune the model on the CIFAR-10 dataset. You should try both flavors of transfer learning: (1) freezing all the MobileNetV2 layers (feature extraction) and training only the final classification layer; (2) finetuning all MobileNetV2 layers together with the final classification layer. By "finetuning" we mean that the model is trained further with a small learning rate, so the weights do not change significantly, but this will hopefully improve performance. Report (in a text snippet in your notebook) the performance of the two models on the train and test sets.

Hints:

You may find the following tutorial helpful: Finetuning torchvision Models
MobileNetV2 by default takes 224x224 input images, so you need to resize your images from 32x32 to 224x224 (e.g. using skimage.transform.resize).
PyTorch expects input images in NCHW format, meaning the four dimensions of the input tensor represent Batch, Channel, Height, and Width. However, images are conventionally stored as NHWC, so you need to permute the dimensions of your input, e.g. using x.permute(0, 3, 1, 2). See the Stack Overflow answer or the PyTorch documentation for details.
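Both hints can be combined into one preprocessing step. This sketch swaps in torch.nn.functional.interpolate for skimage.transform.resize, so the resize runs on whole batches (and on the GPU if the tensor is there). It also optionally normalizes with the ImageNet channel mean and standard deviation that torchvision documents for its pretrained models. The function name to_mobilenet_input is hypothetical.

```python
import torch
import torch.nn.functional as F

# Per-channel ImageNet statistics documented for torchvision's pretrained models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def to_mobilenet_input(images_nhwc, size=224, normalize=True):
  """Converts an (N, H, W, 3) batch in [0, 1] into an (N, 3, size, size) batch."""
  x = images_nhwc.permute(0, 3, 1, 2)  # NHWC -> NCHW, the layout PyTorch layers expect
  x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
  if normalize:
    x = (x - IMAGENET_MEAN.to(x.device)) / IMAGENET_STD.to(x.device)
  return x.contiguous()
```

Applying this per batch inside the training loop keeps the Part A dataset unchanged, at the cost of repeating the resize every epoch.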

Submission: Please submit your Jupyter notebook, with parts clearly labeled, making sure to include the following classes/functions:

class CifarDataset
class MultiLayerPerceptronModel
function training
function evaluation

Acknowledgement: This assignment was designed by Mingda Zhang in Spring 2020.