CS 2770: Homework 2

Due: 3/17/2020, 11:59pm

In this homework assignment, you will use a deep network to perform image categorization. First, you will use a pre-trained network (trained on a different problem) to extract features, and then use these features to train an SVM classifier that discriminates between 20 object categories. Second, you will take a network initialized with the weights of the same pre-trained network and train it on this task. Finally, you will compare the performance of the pre-trained network with that of the network you trained.

You will use the PyTorch package. PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is primarily developed by Facebook's AI Research lab (FAIR). We have prepared instructions for you to easily install PyTorch on the h2p.crc.pitt.edu clusters.

Training the CNN in this assignment may take a long time, and several of you will be using the limited computing resources at the same time, so be sure to start this assignment early.

Part I: SSH Basics - Getting Connected to the Server and Transferring Files

You will be connecting to the CRC clusters via SSH. If you are using a Windows machine and haven't used SSH before, you will first need to download an SSH client such as PuTTY. You can download PuTTY from here. If you are using a Mac or Linux, SSH is already installed.
In order to use the CRC cluster, you must first install the Pulse VPN client (see here for instructions). Connect to the VPN before trying to SSH to the cluster.
If you are on a Mac or Linux, open a terminal, type ssh nah114@h2p.crc.pitt.edu and press Enter to connect to the cluster. Note: you need to replace nah114 with your Pitt username. If you are on Windows, open PuTTY, enter h2p.crc.pitt.edu as the host name and click Open to connect to the cluster. You will need to enter your Pitt username and password when prompted by the cluster.
Once you are logged in, you will be on the login node of the CRC cluster. You can create a separate directory for your homework assignment with the mkdir hw2 command. To use the GPUs in interactive mode, you need to move from the login node to one of the GPU nodes, as explained in more detail in the following sections.
You can either write your Python file on your own computer and transfer it to the cluster with scp (on Mac or Linux) or WinSCP (which you will need to download on Windows), or write it directly on the cluster with a text editor such as vim. On Mac or Linux, an scp command to copy a file you have written to the cluster might look like this (where the username is nah114):
scp file.py nah114@h2p.crc.pitt.edu:/ihome/akovashka/nah114
This command copies the Python file from your computer to your space on the CRC cluster. If you are on Windows and install WinSCP, you will be presented with a graphical interface where you can drag and drop files from your computer to your CRC cluster space. You can also use text editors such as Vim, Vi, Nano or Emacs directly on the cluster. If you are comfortable working with Linux text editors, we recommend using them rather than writing the code on your computer and transferring it to the server.

Part II: Installing PyTorch, Using PyTorch on CRC Clusters and Downloading Data

Installation:

To install PyTorch on the CRC cluster, we have prepared a script. After logging in to the cluster, first download the script with the following command:

wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/PyTorch_Installation.sh

Then run the script with the following command (Note: you only need to run this script once):

source PyTorch_Installation.sh

After the script has executed successfully, run the following command to exit the virtual environment it created:

deactivate

Use PyTorch:

In this step you need to move from the login node to one of the GPU nodes. The GPU cluster has four GPU partitions: gtx1080, titanx, k40, and titan (Note: you can find a complete description of all these partitions at this link). One way to use these GPU partitions is to run an interactive job on the cluster. Here is an example of a command that requests access to one GPU of the titanx partition for 2 hours:
crc-interactive.py -g --time=2 -n 1 -c 1 -p titanx -u 1
After sending the request, you may or may not get access to a GPU, because all of the GPUs in the partition might be in use. If you are not able to get access to the titanx partition, you can switch to one of the other partitions (gtx1080, k40, or titan) by replacing titanx with the name of that partition in the command. (Note: you need to wait 10-20 seconds to see whether you get access to the requested GPU.)
Now (and any time in the future that you want to use PyTorch) you must enter the virtual environment that was created during the installation, using the following two commands:
module load python/3.7.0 venv/wrap
workon pytorch
At this point, you can run your Python code with the following command:

python hw2.py

Download the Data:

For this homework, we have prepared a dataset that you need to download to the cluster and unzip as follows:
wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/hw2_data.zip
unzip hw2_data.zip
At this point you have a directory named hw2_data that contains the train, validation (val) and test image sets. Inside each of the train, val and test directories, there is one subdirectory for each image category.

Part III: Import Required Libraries and Modules and Data Preprocessing

As the first step of your homework, import the required modules and libraries:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import time
import os
import copy
from sklearn import svm
from sklearn.metrics import accuracy_score
You have already downloaded the data to your home directory, so you can now use it for data preprocessing and training. First, prepare the data transformations. Here is an example of a simple transformation pipeline for each split:
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

transforms.Resize((224, 224)) resizes every image to the same size, 224×224 pixels (with a (height, width) pair, the aspect ratio is not preserved). (Note: since we are using PyTorch version 1.2 on the CRC cluster, the format of Resize is different from the one in the latest version of PyTorch. If you are using the latest version of PyTorch, you need to state that in your submission.)
transforms.ToTensor() converts the input image to a tensor with values in the range [0, 1]. transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) normalizes each channel by subtracting its mean and dividing by its standard deviation, i.e. output = (input - mean) / std. [0.485, 0.456, 0.406] contains the mean values for the Red, Green and Blue channels, respectively, and [0.229, 0.224, 0.225] contains the standard deviations for the Red, Green and Blue channels, respectively. These are the statistics of the ImageNet images the pre-trained network was trained on.
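To make the arithmetic concrete, here is a small NumPy sketch of what ToTensor followed by Normalize computes on an 8-bit RGB image: divide by 255, move the channels first, then apply (x - mean) / std per channel. It mirrors the math only and is not torchvision's implementation; the helper name to_tensor_and_normalize and the white test pixel are illustrative assumptions.

```python
import numpy as np

# ImageNet channel statistics used in data_transforms (R, G, B order)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def to_tensor_and_normalize(image_hwc_uint8):
    """Mimic ToTensor() then Normalize(MEAN, STD) on an H x W x 3 uint8 array."""
    # ToTensor: scale 0..255 to 0..1 and reorder to channels-first (C x H x W)
    x = image_hwc_uint8.astype(np.float64) / 255.0
    x = np.transpose(x, (2, 0, 1))
    # Normalize: subtract the per-channel mean, divide by the per-channel std
    return (x - MEAN[:, None, None]) / STD[:, None, None]


# A 1 x 1 "image" whose single pixel is pure white (255, 255, 255)
white = np.full((1, 1, 3), 255, dtype=np.uint8)
out = to_tensor_and_normalize(white)
print(out.shape)  # (3, 1, 1)
print(out[:, 0, 0])
```

A white pixel ends up at roughly 2.25, 2.43 and 2.64 in the R, G and B channels, so after normalization the inputs are centered near zero rather than lying in [0, 1].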
Next, we use the downloaded images and the data transformations to create a data loader for each split:
data_dir = 'hw2_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=8, shuffle=True, num_workers=4)
               for x in ['train', 'val', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'test']}
class_names = image_datasets['train'].classes

data_dir is the directory that was created when you unzipped hw2_data.zip in Part II; it contains the data for the train, validation and test sets.
image_datasets holds, for each of the train, val and test directories, the paths and labels of all images together with the transformation to apply to them.
dataloaders wraps each entry of image_datasets in a DataLoader configured by batch_size, shuffle and num_workers; each one returns mini-batches of images and labels for the train, validation or test set.
batch_size specifies the number of images in each mini-batch passed through the model.
num_workers specifies how many subprocesses to use for data loading.
shuffle specifies whether the order of the images is shuffled (the DataLoader reshuffles at every pass over the data).
dataset_sizes and class_names, computed by the last two lines, give the number of images in each split and the category names. We will use them in the following sections.
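ImageFolder derives the labels from the directory layout: the class names are the sorted names of the subdirectories, and each class gets the index of its position in that sorted list (class_names above is that list). The stdlib sketch below reproduces that mapping for illustration only; the helper name list_classes and the category folder names in the usage lines are hypothetical.

```python
import os
import tempfile


def list_classes(split_dir):
    """Return (classes, class_to_idx) the way ImageFolder assigns labels."""
    # Every subdirectory is one category; sorting makes the label order deterministic
    classes = sorted(entry.name for entry in os.scandir(split_dir) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Usage: a throwaway train/ directory with three hypothetical category folders
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, 'train')
    for name in ['dog', 'aeroplane', 'cat']:
        os.makedirs(os.path.join(train_dir, name))
    classes, class_to_idx = list_classes(train_dir)
    print(classes)       # ['aeroplane', 'cat', 'dog']
    print(class_to_idx)  # {'aeroplane': 0, 'cat': 1, 'dog': 2}
```

Because the labels depend only on the folder names, the train, val and test directories must contain the same set of category folders; otherwise the same index could mean different categories in different splits.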
To use the GPU, set the CUDA device with the following command:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
This command selects GPU number 0 if a GPU is available; otherwise, the CPU is used instead.

Part IV: Loading and Using a Pretrained Network as a Feature Extractor

Now we need to load a pre-trained CNN model. The VGG16 model we load has been trained on ImageNet (the ILSVRC 2012 subset, about 1.2M training images) to classify images into 1000 classes, which are not the same as the categories we aim to classify. To use the pre-trained model as a feature extractor, create the following class:
class VGG16_Feature_Extraction(torch.nn.Module):
    def __init__(self):
        super(VGG16_Feature_Extraction, self).__init__()
        VGG16_Pretrained = models.vgg16(pretrained=True)
        self.features = VGG16_Pretrained.features
        self.avgpool = VGG16_Pretrained.avgpool
        self.feature_extractor = nn.Sequential(*[VGG16_Pretrained.classifier[i] for i in range(6)])

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.feature_extractor(x)
        return x

This class is a subclass of torch.nn.Module. In its constructor, we first load the pre-trained VGG16 model and then copy its features and avgpool modules. features is a sequential container that holds the convolutional and pooling layers. (You can find the PyTorch implementation of VGG16 at this link.) For our feature_extractor, we copy all the layers of the VGG16 classifier except the last fully connected layer. (Note: the last fully connected layer of VGG16 performs the classification into the 1000 ImageNet classes, so we do not need it as part of our feature extractor.) In the forward method, we first apply the features module and then avgpool. Before passing the result to feature_extractor, we flatten it into one vector per image. Finally, feature_extractor produces the features.
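The sizes involved can be checked by hand. The sketch below traces one 224×224 image through the modules described above, assuming the standard torchvision VGG16 layout (five 2×2 max-pooling stages, 512 channels in the last convolutional block, an AdaptiveAvgPool2d with a 7×7 output, and classifier layers of width 4096). It is plain arithmetic, not a call into PyTorch, and the helper name vgg16_feature_sizes is hypothetical.

```python
def vgg16_feature_sizes(image_size=224, num_pools=5, last_channels=512,
                        pooled=7, fc_width=4096):
    """Trace tensor sizes through VGG16_Feature_Extraction for one square image."""
    # The 3x3 convolutions keep the spatial size; each 2x2 max-pool halves it
    spatial = image_size
    for _ in range(num_pools):
        spatial //= 2
    # avgpool always outputs pooled x pooled; flatten turns C x 7 x 7 into one vector
    flattened = last_channels * pooled * pooled
    return {
        'after_features': (last_channels, spatial, spatial),
        'after_flatten': flattened,           # in_features of classifier[0]
        'after_feature_extractor': fc_width,  # classifier[0..5] end with Linear(4096, 4096), ReLU, Dropout
    }


sizes = vgg16_feature_sizes()
print(sizes['after_features'])           # (512, 7, 7)
print(sizes['after_flatten'])            # 25088
print(sizes['after_feature_extractor'])  # 4096
```

This is why each image is represented by a 4096-dimensional feature vector in the steps that follow.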
Next, use the VGG16_Feature_Extraction class to extract the features of the train and test images. First, create an instance of VGG16_Feature_Extraction, move it to the CUDA device you prepared before, and put it in evaluation mode (a newly created module is in training mode, in which the Dropout layers copied from the classifier would randomly zero out features):
model = VGG16_Feature_Extraction()
model = model.to(device)
model.eval()
Now use the model to extract the features of the images. There are different ways to extract and save the features; here is the code for one of them:
image_features = {}
image_labels = {}
with torch.no_grad():  # no gradients are needed for feature extraction
    for phase in ['train', 'test']:
        for inputs, labels in dataloaders[phase]:
            inputs = inputs.to(device)
            model_prediction = model(inputs)
            model_prediction_numpy = model_prediction.cpu().detach().numpy()
            if phase not in image_features:
                image_features[phase] = model_prediction_numpy
                image_labels[phase] = labels.numpy()
            else:
                image_features[phase] = np.concatenate((image_features[phase], model_prediction_numpy), axis=0)
                image_labels[phase] = np.concatenate((image_labels[phase], labels.numpy()), axis=0)

In this code, we first create dictionaries for the image features and image labels of both the train and test sets.
Then we go through both the train and the test set using the dataloaders we prepared before, and extract the features of every mini-batch with model_prediction = model(inputs). Since we want to use these features to train an SVM classifier, the features and labels must be NumPy arrays. The model outputs are tensors on the CUDA device, so model_prediction_numpy = model_prediction.cpu().detach().numpy() moves them to the CPU and converts them to NumPy arrays.
Finally, we need to store the extracted features. There are two main ways to do this: 1) concatenate the new features and labels to the stored arrays at every step, as in the code above, or 2) preallocate arrays for the features and labels of the train and test sets and fill them in batch by batch. The feature array has size n × 4096, where n is the number of images and 4096 is the size of the extracted feature vector. (Note: the second approach is more efficient because it avoids repeated concatenation, which copies the whole accumulated array at every step.)
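A minimal sketch of the second (preallocation) approach, assuming the total number of images n is known in advance (dataset_sizes[phase] above) and the feature size is 4096. fill_features is a hypothetical helper, and random NumPy batches stand in for the model outputs so the logic can be checked without a GPU.

```python
import numpy as np


def fill_features(batches, n, feature_dim=4096):
    """Copy (features, labels) mini-batches into preallocated arrays with n rows."""
    features = np.empty((n, feature_dim), dtype=np.float32)
    labels = np.empty(n, dtype=np.int64)
    start = 0
    for batch_features, batch_labels in batches:
        end = start + len(batch_labels)  # the last batch may be smaller than batch_size
        features[start:end] = batch_features
        labels[start:end] = batch_labels
        start = end
    if start != n:
        raise ValueError(f'wrote {start} rows but expected {n}')
    return features, labels


# Usage with fake batches: 20 images with batch size 8 (batches of 8, 8 and 4)
rng = np.random.default_rng(0)
fake = [(rng.standard_normal((b, 4096)), rng.integers(0, 20, size=b)) for b in (8, 8, 4)]
X, y = fill_features(fake, n=20)
print(X.shape, y.shape)  # (20, 4096) (20,)
```

With the real model, the (features, labels) pairs would come from iterating dataloaders[phase] inside torch.no_grad() and converting each output with .cpu().numpy(), with n = dataset_sizes[phase].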
After extracting features with the pre-trained VGG16 network, train a linear SVM with scikit-learn's LinearSVC on the train set; do not train on the withheld validation set or the test set. You need to standardize the train and test features before training and testing your SVM; you can use sklearn.preprocessing.StandardScaler to do this.
Test your SVM on the test set (remember to standardize the test features using the train mean and standard deviation first) and report the accuracy of the SVM at predicting the folder that each image was in. Also compute a confusion matrix of the predictions with the sklearn.metrics.confusion_matrix function and include it in your submission. What do you observe about the types of errors the network makes?
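The standardize, train and evaluate steps can be sketched with scikit-learn as below. Random, easily separable features stand in for the 4096-dimensional VGG16 features so the snippet runs on its own; with real data you would pass image_features['train'], image_labels['train'], image_features['test'] and image_labels['test']. The helper name svm_pipeline and the synthetic data are assumptions, and LinearSVC is used with its default settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def svm_pipeline(X_train, y_train, X_test, y_test):
    """Standardize with train statistics, fit a linear SVM, evaluate on the test set."""
    scaler = StandardScaler().fit(X_train)  # mean/std computed on the train set only
    X_train_std = scaler.transform(X_train)
    X_test_std = scaler.transform(X_test)   # the test set reuses the train mean/std
    clf = LinearSVC()
    clf.fit(X_train_std, y_train)
    y_pred = clf.predict(X_test_std)
    return accuracy_score(y_test, y_pred), confusion_matrix(y_test, y_pred)


# Synthetic stand-in: 3 classes whose features are shifted by 3 * class index
rng = np.random.default_rng(0)


def make_split(n_per_class, dim=50):
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.standard_normal((len(y), dim)) + 3.0 * y[:, None]
    return X, y


X_tr, y_tr = make_split(40)
X_te, y_te = make_split(10)
acc, cm = svm_pipeline(X_tr, y_tr, X_te, y_te)
print(acc)
print(cm)  # rows: true class, columns: predicted class
```

In scikit-learn's convention, entry (i, j) of the confusion matrix counts images of true class i predicted as class j, so off-diagonal entries show which categories get confused with each other.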

Part V: Train and Test the CNN on our Dataset

Preparing the Network:

In this step, instead of using VGG16 as a fixed feature extractor, you will train (fine-tune) it on your dataset. To do so, first load VGG16 with weights pre-trained on ImageNet:
model = models.vgg16(pretrained=True)
Then read the number of input features of the last fully connected layer of the model:
num_ftrs = model.classifier[6].in_features
Finally, replace the last fully connected layer with a new layer. The new layer has the same number of input features as the original one, but its number of outputs equals the number of classes in our dataset:
model.classifier[6] = nn.Linear(num_ftrs, len(class_names))

Steps Before Training:

Set the number of epochs to 25.
num_epochs = 25
Send the model to CUDA device:
model = model.to(device)
Specify the loss function (criterion) used to train and evaluate the model:
criterion = nn.CrossEntropyLoss()
Set the optimizer, learning rate and momentum:
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
Finally, create a scheduler that controls how the learning rate changes during training:
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
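With step_size=7 and gamma=0.1, StepLR multiplies the learning rate by 0.1 after every 7 calls to scheduler.step(). Since the training loop calls scheduler.step() once per epoch (after the train phase), the learning rate used in epoch e, counting from 0, is lr_0 · gamma^floor(e / step_size). The plain-Python sketch below evaluates that closed form for the 25 epochs; it reproduces the formula, not the scheduler object, and the helper name step_lr is hypothetical.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under StepLR."""
    return base_lr * gamma ** (epoch // step_size)


num_epochs = 25
schedule = [step_lr(0.001, e) for e in range(num_epochs)]

# Print each run of consecutive epochs that share a learning rate
start = 0
for e in range(1, num_epochs + 1):
    if e == num_epochs or schedule[e] != schedule[start]:
        print(f'epochs {start}-{e - 1}: lr = {schedule[start]:.0e}')
        start = e
```

It prints four groups: epochs 0-6 at 1e-03, 7-13 at 1e-04, 14-20 at 1e-05 and 21-24 at 1e-06, so most of the 25 epochs run with a much smaller learning rate than the initial 0.001.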

Training:

Before iterating over the epochs, we save the initial model weights as the best model weights and set the best accuracy to zero:
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
Now we can iterate over the epochs. Note that all of the code from here until the end of the Training section is inside the following for loop:
for epoch in range(num_epochs):
Next, we iterate over the train and validation sets. (Note: in every epoch, you first go through the train set once to update the model parameters, and then go through the validation set to evaluate the updated model.)
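for phase in ['train', 'val']:
    if phase == 'train':
        model.train()
    else:
        model.eval()
    # reset the running loss and correct-prediction count at the start of each phase
    all_batchs_loss = 0.0
    all_batchs_corrects = 0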
In the train phase we set the model to training mode, and in the validation phase we set it to evaluation mode; this matters because layers such as Dropout behave differently in the two modes.
Next, we go through the data using the dataloader we prepared earlier. In every iteration, we get a mini-batch of images and their corresponding labels. (Note: all of the steps from here until the scheduler update are inside the for loop over the dataloader.)

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

Before using the model on a mini-batch, we reset the gradients with optimizer.zero_grad(), since PyTorch accumulates gradients across backward passes:
optimizer.zero_grad()
Now we use the current model weights to make predictions and, in the train phase, backpropagate the prediction loss:
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
    if phase == 'train':
        loss.backward()
        optimizer.step()
all_batchs_loss += loss.item() * inputs.size(0)
all_batchs_corrects += torch.sum(preds == labels.data)
In the first line, with torch.set_grad_enabled(phase == 'train') turns on gradient computation only in the train phase, so no gradients are tracked during validation.
Then we use the model to predict the classes of every mini-batch and compute the loss.
If we are in the train phase, we backpropagate the loss through the network and call optimizer.step() to update the weights. Finally, we accumulate the loss (weighted by the mini-batch size) and the number of correct predictions.
After iterating over all mini-batches, if we are in the train phase, we call scheduler.step() to update the learning rate schedule:
if phase == 'train':
    scheduler.step()
In the next step we compute the loss and accuracy of the epoch.
epoch_loss = all_batchs_loss / dataset_sizes[phase]
epoch_acc = all_batchs_corrects.double() / dataset_sizes[phase]
Finally, in the validation phase, we check whether the classification accuracy is better than the best accuracy so far and, if so, save the model parameters as the new best:
if phase == 'val' and epoch_acc > best_acc:
    best_acc = epoch_acc
    best_model_wts = copy.deepcopy(model.state_dict())
    torch.save(best_model_wts, 'best_model_weight.pth')
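For reference, the training fragments can be assembled into one function. This is a sketch under assumptions: the name train_model, its argument list and the returned (model, best_acc) pair are hypothetical (the structure follows the PyTorch transfer-learning tutorial this assignment is adapted from). It resets the running loss and correct count at the start of every phase and reloads the best validation weights at the end. The usage lines train a tiny linear model on random tensors only to show that the function runs; they say nothing about VGG16 accuracy.

```python
import copy
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader, TensorDataset


def train_model(model, dataloaders, dataset_sizes, criterion, optimizer, scheduler,
                device, num_epochs=25, save_path='best_model_weight.pth'):
    """Train on 'train', evaluate on 'val' every epoch, keep the best val weights."""
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            all_batchs_loss = 0.0  # running sums are reset for every phase
            all_batchs_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                # track gradients (and update weights) only in the train phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                all_batchs_loss += loss.item() * inputs.size(0)
                all_batchs_corrects += torch.sum(preds == labels).item()
            if phase == 'train':
                scheduler.step()  # once per epoch, after the optimizer steps
            epoch_loss = all_batchs_loss / dataset_sizes[phase]
            epoch_acc = all_batchs_corrects / dataset_sizes[phase]
            print(f'epoch {epoch} {phase}: loss {epoch_loss:.4f} acc {epoch_acc:.4f}')
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                torch.save(best_model_wts, save_path)
    model.load_state_dict(best_model_wts)  # hand back the best model, not the last one
    return model, best_acc


# Toy usage (illustration only): 2 classes, 4 input features, random data
torch.manual_seed(0)
X = torch.randn(32, 4)
y = (X[:, 0] > 0).long()
loaders = {p: DataLoader(TensorDataset(X, y), batch_size=8, shuffle=(p == 'train'))
           for p in ['train', 'val']}
sizes = {p: len(X) for p in ['train', 'val']}
toy = nn.Linear(4, 2)
opt = optim.SGD(toy.parameters(), lr=0.1, momentum=0.9)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)
ckpt = os.path.join(tempfile.mkdtemp(), 'best_model_weight.pth')
toy, best = train_model(toy, loaders, sizes, nn.CrossEntropyLoss(), opt, sched,
                        torch.device('cpu'), num_epochs=3, save_path=ckpt)
print(best)
```

For the assignment, you would call it with the modified VGG16, the dataloaders, dataset_sizes, criterion, optimizer and scheduler defined above, and device.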

Testing:

The testing process is very similar to the training process, except that we do not need to backpropagate the loss. To test the model, first build it in the same way as for training and load the best model weights saved during training:
model = models.vgg16()
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, len(class_names))
model = model.to(device)
model.load_state_dict(torch.load('best_model_weight.pth'))
After loading the model weights, set the model to evaluation mode and set phase to 'test':

model.eval()

phase = 'test'

Next, we go through the test set, predict the category of each image, and count the correctly classified images:
all_batchs_corrects = 0
with torch.no_grad():  # no gradients are needed at test time
    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_batchs_corrects += torch.sum(preds == labels.data)
At the end, we compute the accuracy over all data.
epoch_acc = all_batchs_corrects.double() / dataset_sizes[phase]
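The loop above only counts correct predictions, but the assignment also asks for a confusion matrix. A minimal sketch, assuming a hypothetical helper collect_predictions, gathers every true and predicted label as NumPy arrays under torch.no_grad() and hands them to sklearn.metrics.confusion_matrix. The untrained toy model and random tensors in the usage lines only demonstrate that it runs.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score, confusion_matrix
from torch.utils.data import DataLoader, TensorDataset


def collect_predictions(model, loader, device):
    """Return (true_labels, predicted_labels) for a whole DataLoader as NumPy arrays."""
    model.eval()
    all_true, all_pred = [], []
    with torch.no_grad():  # no gradients are needed at test time
        for inputs, labels in loader:
            outputs = model(inputs.to(device))
            all_pred.append(torch.max(outputs, 1)[1].cpu().numpy())
            all_true.append(labels.numpy())
    return np.concatenate(all_true), np.concatenate(all_pred)


# Toy usage (illustration only): an untrained 3-class linear model on random inputs
torch.manual_seed(0)
loader = DataLoader(TensorDataset(torch.randn(20, 5), torch.randint(0, 3, (20,))), batch_size=8)
y_true, y_pred = collect_predictions(nn.Linear(5, 3), loader, torch.device('cpu'))
print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))  # rows: true, columns: predicted
```

With the fine-tuned network, you would call collect_predictions(model, dataloaders['test'], device) and pass labels=list(range(len(class_names))) so that every category gets a row and a column, even one that is never predicted.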

Part VI: Repeating Experiments with Different Hyper Parameters

Retrain your model while varying at least two hyperparameters, and include the confusion matrix and test accuracy of each resulting model. For example, you can vary the learning rate and the batch size.
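One way to organize these runs is a small loop over configurations that records the result of each. The sketch below is one possible structure, not a required design: run_experiment is a hypothetical callable you would write to rebuild the DataLoaders (the batch size lives in the DataLoader), create a fresh pre-trained model and a new optimizer (the learning rate lives in the optimizer), train, and return the test accuracy and confusion matrix. The values listed are examples only, and the stand-in in the usage lines merely echoes its settings.

```python
import itertools


def sweep(run_experiment, learning_rates, batch_sizes):
    """Call run_experiment(lr=..., batch_size=...) for every combination; collect the results."""
    results = {}
    for lr, batch_size in itertools.product(learning_rates, batch_sizes):
        # run_experiment is expected to start from freshly loaded pre-trained weights,
        # rebuild the DataLoaders for batch_size and create a new optimizer for lr
        results[(lr, batch_size)] = run_experiment(lr=lr, batch_size=batch_size)
    return results


# Usage with a stand-in that only echoes its settings (no training happens here)
results = sweep(lambda lr, batch_size: f'lr={lr}, batch_size={batch_size}',
                learning_rates=[0.001, 0.01], batch_sizes=[8, 32])
for config, outcome in results.items():
    print(config, outcome)
```

Starting every run from the pre-trained weights, rather than from the previous run's fine-tuned model, keeps the comparison between settings fair.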

Grading rubric:

[20 points] Prepare train and test features by using the pre-trained model.
[15 points] Train an SVM on the extracted features from the pre-trained model, report the confusion matrix and accuracies.
[5 points] Prepare data for network training.
[10 points] Transfer the layers from VGG16 for your new network.
[15 points] Set all the required hyper parameters for your new network.
[15 points] Train the network and report the confusion matrix and accuracies.
[20 points] Train with at least two other parameters and discuss the effect of these parameters on the results.

Note:

Acknowledgements: This assignment was prepared for you by Narges Honarvar Nazari, partly adapted from the PyTorch tutorial on transfer learning, and based on an assignment developed by Chris Thomas. The photos used for this assignment come from the PASCAL VOC dataset.