CS 1699: Homework 3

Due: Feb. 27, 2020, 11:59pm

This assignment is worth 50 points.


Part I: Building a Custom Data Loader (20 points)

In our PyTorch tutorial, we showed how to use the torchvision library to load common datasets such as MNIST. However, in real-world applications it is more common to deal with custom data. In this exercise, you are provided with a set of images, and you need to write a data loader that feeds these images to your model in PyTorch.

Note: You are not allowed to use torchvision in this exercise. You may find the following tutorial helpful: Writing Custom Datasets, DataLoaders and Transforms
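A minimal sketch of such a custom dataset is shown below. It assumes the images are stored as PNG files in class-named subfolders (root/<class>/<image>.png) and uses PIL for reading; the actual layout and format of the provided data may differ, so adapt the path handling accordingly. Note that it only uses torch.utils.data, not torchvision.

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image  # any image-reading library is fine; torchvision is off-limits

class CustomImageDataset(Dataset):
    """Loads images laid out as root/<class_name>/<image>.png.
    (The folder layout is an assumption -- adapt it to the provided data.)"""

    def __init__(self, root):
        self.classes = sorted(os.listdir(root))       # class name -> label index
        self.samples = []                             # list of (path, label) pairs
        for label, cls in enumerate(self.classes):
            cls_dir = os.path.join(root, cls)
            for fname in sorted(os.listdir(cls_dir)):
                self.samples.append((os.path.join(cls_dir, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # load as float32 HWC array scaled to [0, 1]
        img = np.asarray(Image.open(path), dtype=np.float32) / 255.0
        return torch.from_numpy(img), label

# usage (path is illustrative):
# loader = DataLoader(CustomImageDataset("data/train"), batch_size=32, shuffle=True)
```

Wrapping the dataset in a standard DataLoader then gives you shuffling and batching for free.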

Instructions:
Part II: Training a neural network in PyTorch (20 points)

In this exercise you need to implement a 3-layer MLP model (one input layer, one hidden layer with tanh activation, and one output layer) in PyTorch, named MultiLayerPerceptronModel, which will be used to classify the images from the dataset in Part I.
You can use built-in PyTorch modules such as Linear, Dropout, Tanh, etc. to build your model.
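A sketch of such a model is below. The input size (32x32x3 images) and hidden width of 128 are placeholder assumptions; choose dimensions that fit the actual dataset and your hyperparameter study.

```python
import torch
import torch.nn as nn

class MultiLayerPerceptronModel(nn.Module):
    """3-layer MLP: input layer -> hidden layer with tanh -> output logits."""

    def __init__(self, input_dim=32 * 32 * 3, hidden_dim=128, num_classes=10):
        # input_dim and hidden_dim here are illustrative defaults, not requirements
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # flatten each image in the batch into a single feature vector
        return self.net(x.flatten(start_dim=1))
```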
You also need to write the training function (training), and should explore the following hyperparameter settings:
To get full credit, you should explore at least 4 different types of hyperparameters (from those listed above) and choose at least 3 different values for each hyperparameter. For simplicity, you may analyze one hyperparameter at a time (i.e., fixing all others to some reasonable value) rather than performing a grid search.
If you use TensorBoard to monitor your training, you can attach screenshots of the training curves (accuracy) directly in your report.
To evaluate the performance of the trained model, you also need to write a function (evaluation) that loads the trained model and evaluates its performance on the train/test sets. In your report, please clearly state which hyperparameters you explored and what accuracy the model achieved on the train/test sets.
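The two required functions can be sketched as follows. The signatures here (epochs, lr, the choice of SGD and cross-entropy loss) are illustrative defaults, and the learning rate and number of epochs are exactly the kind of hyperparameters you should vary.

```python
import torch
import torch.nn as nn

def training(model, train_loader, epochs=10, lr=1e-3):
    """Minimal training loop; epochs and lr are hyperparameters to explore."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

def evaluation(model, loader):
    """Returns classification accuracy of `model` over `loader`."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```

In practice you would call evaluation once on the train loader and once on the test loader, and report both accuracies.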


Part III: Transfer Learning (10 points)

In this exercise you will take an ImageNet-pretrained MobileNetV2 model and finetune it on the dataset from Part I.
Specifically, you can load a MobileNetV2 model with pretrained weights using the torchvision library, replace the final classification layer with a new randomly initialized fully-connected layer, and finetune on the CIFAR-10 dataset. You should try the two flavors of transfer learning: (1) freezing all MobileNetV2 layers (feature extraction) and training only the final classification layer; (2) finetuning all MobileNetV2 layers together with the final classification layer. Report the performance of the two models on the train/test sets.
Note: By "finetuning" we mean the model is further trained with a small learning rate, so the weights do not change significantly but will hopefully yield improved performance.
Note: You may find the following tutorial helpful: Finetuning torchvision Models

Hints:
  1. MobileNetV2 by default takes 224x224 image input, so you may need to resize your images from 32x32 to 224x224 for this model (e.g. using skimage.transform.resize).
  2. PyTorch takes input images in NCHW form, which means the four dimensions of the input tensor represent Batch, Channel, Height, and Width. However, by convention images are saved as NHWC. You may need to swap the dimensions of your input, e.g. using x.permute(0, 3, 1, 2). You can check the Stack Overflow answer or the PyTorch documentation.
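The dimension swap in hint 2 looks like this:

```python
import torch

x = torch.zeros(8, 224, 224, 3)          # a batch stored as NHWC
x = x.permute(0, 3, 1, 2).contiguous()   # reorder axes to NCHW for PyTorch
print(x.shape)                           # torch.Size([8, 3, 224, 224])
```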


Submission: Please include the following files: