CS 2770: Homework 2
Due: 3/17/2020, 11:59pm
In this homework assignment, you will use a deep network to perform image categorization.
First, you will use a pre-trained network (trained on a different problem) to extract features, and then use these features to
train an SVM classifier that discriminates between 20 object categories. Second, you
will fine-tune a network (with weights initialized from the same pre-trained
network) on this task. Finally, you will compare the performance of
the pre-trained network to the network you trained.
You will use the PyTorch package.
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is primarily developed by Facebook's AI Research lab (FAIR). We have prepared instructions for you to easily install PyTorch on the
h2p.crc.pitt.edu clusters.
Training the CNN in this assignment may take
a long time, and several of you will be using the limited computing resources at the same time, so be sure to start this assignment early.
Part I: SSH Basics - Getting Connected to the Server and Transferring Files
- You will be connecting to the CRC clusters via SSH. If you are using a Windows
machine and haven't used SSH before, you will first need to download an SSH
client such as PuTTY. You can download PuTTY from
here.
If you are using a Mac or Linux, you already have SSH installed.
- In order to use the CRC cluster, you must first install the Pulse VPN client (see
here for instructions). Connect to the VPN before trying to ssh to the cluster.
- If you are on a Mac or Linux, open a terminal and type:
ssh nah114@h2p.crc.pitt.edu and press enter to connect to the cluster.
Note: you need to replace nah114 with your Pitt username.
If you are on Windows, open PuTTY and for the host name, enter
h2p.crc.pitt.edu and click Open to connect to the cluster. You
will need to enter your Pitt
username and password when prompted by the cluster.
- Once you are logged in, you will be on a login node of the CRC cluster. You can create a
separate directory for your homework assignment with the mkdir hw2 command. To use the GPUs in interactive mode, you need to
move from the login node to one of the GPU nodes, which is explained in more detail in the following sections.
- You can either write your Python assignment file on your own computer
and transfer it to the cluster using scp (on Mac or Linux) or WinSCP (you'll
need to download this on Windows), or write
the Python assignment file directly on the cluster using a text editor such as
vim. On Mac or Linux, an
scp command to copy a file you've written to the cluster
might look like this (where my username is nah114):
scp file.py nah114@h2p.crc.pitt.edu:/ihome/akovashka/nah114
This command will copy the Python file from your computer to your space on the CRC
cluster. If you are on Windows and install WinSCP, you will be
presented with a GUI where you can drag and drop files from your
computer to your space on the CRC cluster.
On the cluster you can use text editors such as Vim, Vi, Nano, or Emacs. If you are comfortable working with Linux text editors,
we recommend using them rather than writing the code on your computer and transferring it to the server.
Part II: Installing PyTorch, Using PyTorch on CRC Clusters and Downloading Data
Installation:
- To install PyTorch on the CRC clusters, we have prepared a script. After logging in to the cluster, first
download the script with the following command:
wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/PyTorch_Installation.sh
- Then run the script with the following command (Note: you only need to run this script once):
source PyTorch_Installation.sh
- After the script executes successfully, run the following command to exit the virtual environment
that it created:
deactivate
Use PyTorch:
- In this step you need to move from the login node to one of the GPU nodes. On the GPU cluster,
there are four GPU partitions: gtx1080, titanx, k40, and titan (Note: you can find a complete
explanation of all these partitions in this link).
One way to use these GPU partitions is to run an interactive job on the cluster.
Here is an example of a command that requests access to one GPU of the
titanx partition for 2 hours:
crc-interactive.py -g --time=2 -n 1 -c 1 -p titanx -u 1
After sending the request, you may or may not get access to a GPU, because all of the GPUs of the
partition might be in use. If you are not able to get access to the titanx partition, you can switch to another partition
(gtx1080, k40, or titan) by replacing titanx with the name of that partition in the command. (Note: You need to wait
10-20 seconds to see whether you get access to the requested GPU.)
- Now (and any time in the future that you want to use PyTorch) you must enter the virtual environment that was created during the installation
with the following two commands:
module load python/3.7.0 venv/wrap
workon pytorch
- At this point, you can run your Python code with the following command:
python hw2.py
Download the Data:
- For your homework, we have prepared a dataset that you need to download to the cluster and unzip as follows:
wget http://people.cs.pitt.edu/~nhonarvar/TA_Spring_2020/hw2_data.zip
unzip hw2_data.zip
At this point you have a directory named hw2_data which contains the test, train and validation image sets.
Inside each of the train, test, and val directories, there is a directory for each image category.
Part III: Import Required Libraries and Modules and Data Preprocessing
- As the first step in your homework, import the required modules and libraries:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import time
import os
import copy
from sklearn import svm
from sklearn.metrics import accuracy_score
- At this step you have downloaded the data to your home directory and you can use it for data preprocessing and training.
In the first step you must prepare the data transformer. Here is an example for a simple
data transformation:
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
transforms.Resize((224,224)) resizes all images to a
uniform size of 224x224 pixels. (Note: Since we are using PyTorch version 1.2 on the CRC clusters, the format of Resize is different from the latest version of PyTorch. If you are using
the latest version of PyTorch, you need to specify that in your submission.)
transforms.ToTensor() converts the input image to a tensor
whose values are in the range [0,1].
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
normalizes the tensor values using the per-channel mean and standard deviation. [0.485, 0.456, 0.406]
contains the mean values for the Red, Green and Blue channels, respectively. [0.229, 0.224, 0.225] contains the standard deviations for
the Red, Green and Blue channels, respectively.
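As a small worked example of the normalization step, the sketch below applies (value - mean) / std to a single RGB pixel in plain Python. The helper normalize_pixel and the mid-gray input pixel are assumptions made for illustration; transforms.Normalize performs the same per-channel arithmetic on whole image tensors.

```python
# Per-channel mean and standard deviation from the transform above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel whose values are in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


# A mid-gray pixel (hypothetical input chosen for illustration).
gray = [0.5, 0.5, 0.5]
normalized_gray = normalize_pixel(gray)
print([round(v, 3) for v in normalized_gray])

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))
```

After normalization the values are no longer restricted to [0,1]; they are centered near zero, which is the input range the pretrained network was trained on.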
- In the next step we need to use the downloaded images and data transformer to create a data loader as
follows:
data_dir = 'hw2_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=8, shuffle=True, num_workers=4)
               for x in ['train', 'val', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'test']}
class_names = image_datasets['train'].classes
data_dir is the directory that was created when you downloaded and unzipped the
hw2_data.zip file in Part II; it contains the data for the test, train and validation sets.
image_datasets keeps the paths to all images (and their labels)
in the train, val and test directories.
dataloaders receives the image_datasets,
batch_size, shuffle and
num_workers as input and returns the data loaders for the train, validation and test sets.
batch_size specifies the size of the mini-batch in every forward pass of the model.
num_workers specifies how many subprocesses to use for data loading.
shuffle specifies whether to shuffle the original order of the images
or not.
You can get the dataset_sizes and class_names
with the commands above. We will use these variables in the following sections.
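To make the roles of batch_size and shuffle concrete, here is a rough pure-Python sketch of how a set of image indices could be split into mini-batches. The function batch_indices, the seed argument, and the count of 20 images are assumptions for illustration only; the real DataLoader does this internally and also loads and transforms the images.

```python
import math
import random


def batch_indices(num_images, batch_size, shuffle, seed=0):
    """Return a list of index lists, one per mini-batch."""
    order = list(range(num_images))
    if shuffle:
        # Shuffle a copy of the index order; the seed is only here for repeatability.
        random.Random(seed).shuffle(order)
    # Slice the order into consecutive chunks; the last chunk may be smaller.
    return [order[i:i + batch_size] for i in range(0, num_images, batch_size)]


batches = batch_indices(num_images=20, batch_size=8, shuffle=True)
print(len(batches), [len(b) for b in batches])  # 3 batches of sizes 8, 8 and 4
print(math.ceil(20 / 8))  # number of batches per epoch
```

Note that the last mini-batch can be smaller than batch_size, which matters later when averaging the loss over an epoch.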
- To be able to use the GPU, set the CUDA device with the following command:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
This command specifies to use GPU number 0 if available. If GPU is not available,
CPU will be used instead.
Part IV: Loading and Using a Pretrained Network as a Feature Extractor
- Now we need to load a pretrained CNN model. The model that we are loading was
trained on the ImageNet dataset (roughly 1.3 million training images) to classify them into 1000 classes (which
aren't the same as the categories we aim to classify). To use the pretrained model as a feature extractor, you need to create
the following class:
class VGG16_Feature_Extraction(torch.nn.Module):
    def __init__(self):
        super(VGG16_Feature_Extraction, self).__init__()
        VGG16_Pretrained = models.vgg16(pretrained=True)
        self.features = VGG16_Pretrained.features
        self.avgpool = VGG16_Pretrained.avgpool
        self.feature_extractor = nn.Sequential(*[VGG16_Pretrained.classifier[i] for i in range(6)])

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.feature_extractor(x)
        return x
This class is a subclass of torch.nn.Module. In the initialization, we first load the pretrained VGG16 model and then
copy its features and avgpool modules.
features is a sequential container that holds the convolutional and pooling layers. (You can find the
PyTorch implementation of VGG16 in this link.)
For our feature_extractor, we copy all the layers except the last fully connected layer
from the classifier of VGG16. (Note: the last fully connected layer of VGG16 performs classification into the 1000 ImageNet classes, and
we do not need it as part of our feature extractor.) In the forward method of the model, we first apply the features module and then avgpool. Before sending the result to the
feature_extractor, we need to flatten the data.
Finally, we use the feature_extractor to extract the features.
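To see why the data must be flattened and where the 4096-dimensional feature comes from, the following sketch traces the tensor shapes by hand. It assumes the standard torchvision VGG16 layout (five 2x2 max-pooling layers, 512 output channels in the last convolutional block, and fully connected layers that map 25088 to 4096 and 4096 to 4096); the helper vgg16_feature_shape is hypothetical and does not run the network.

```python
def vgg16_feature_shape(height, width):
    """Shape of the VGG16 `features` output for one image of the given size.

    The 3x3 convolutions use padding 1 and keep the spatial size, so only the
    five 2x2 max-pooling layers change it, each halving height and width.
    """
    for _ in range(5):
        height //= 2
        width //= 2
    return 512, height, width


channels, h, w = vgg16_feature_shape(224, 224)
flattened = channels * h * w  # length of the vector after torch.flatten(x, 1)
print(channels, h, w, flattened)  # 512 7 7 25088

# The copied classifier layers then map 25088 -> 4096 -> 4096,
# so each image ends up as a 4096-dimensional feature vector.
feature_dim = 4096
```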
- In the next step, you must use the VGG16_Feature_Extraction class to extract the features for the train and test images.
First, create an instance of VGG16_Feature_Extraction, move it to the CUDA device
that you prepared earlier, and put it in evaluation mode, as follows:
model = VGG16_Feature_Extraction()
model = model.to(device)
model.eval()  # evaluation mode disables dropout, so the extracted features are deterministic
- Now use the model to extract the features of the images. You can extract and save the features in different ways; here is the code for one way to do it:
image_features = {}
image_labels = {}
for phase in ['train', 'test']:
    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        model_prediction = model(inputs)
        model_prediction_numpy = model_prediction.cpu().detach().numpy()
        if phase not in image_features:
            image_features[phase] = model_prediction_numpy
            image_labels[phase] = labels.numpy()
        else:
            image_features[phase] = np.concatenate((image_features[phase], model_prediction_numpy), axis=0)
            image_labels[phase] = np.concatenate((image_labels[phase], labels.numpy()), axis=0)
In this code, we first create dictionaries for the image features and image labels of both the test and train
sets.
Then we go through both the train and test sets, using the dataloaders that we prepared earlier,
and extract features for every mini-batch with the command model_prediction = model(inputs).
Since we want to use these features to train an SVM classifier, our features and labels must be NumPy arrays. The model predictions
are tensors on the CUDA device, so we need to convert them to NumPy arrays. The code model_prediction_numpy = model_prediction.cpu().detach().numpy()
moves the tensors to the CPU and converts them to NumPy arrays.
In the last step, we need to save the predictions. There are two main methods for saving the features: 1) concatenate the
features and labels at every step, or 2) create a 2D array in advance for the features and labels of both the test and train sets. The size of the array for the
feature representation is n*4096, where n is the number of images and 4096 is the size of the extracted feature. (Note: The second approach is more
efficient because it does not need concatenation at every step.)
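As a rough illustration of the second approach, the sketch below preallocates the arrays and fills them batch by batch. The fake_batches generator is a hypothetical stand-in for running the model over a dataloader, and the image count of 20 and batch size of 8 are made up for the example; in your own code n would come from dataset_sizes and the batches from the model.

```python
import numpy as np

FEATURE_DIM = 4096  # size of the extracted feature vector


def fake_batches(num_images, batch_size):
    """Hypothetical stand-in for dataloader + model: yields (features, labels)."""
    rng = np.random.default_rng(0)
    for start in range(0, num_images, batch_size):
        n = min(batch_size, num_images - start)  # the last batch may be smaller
        batch_features = rng.standard_normal((n, FEATURE_DIM)).astype(np.float32)
        batch_labels = rng.integers(0, 20, size=n)
        yield batch_features, batch_labels


def collect(num_images, batch_size):
    """Fill preallocated arrays instead of concatenating at every step."""
    features = np.empty((num_images, FEATURE_DIM), dtype=np.float32)
    labels = np.empty(num_images, dtype=np.int64)
    offset = 0
    for batch_features, batch_labels in fake_batches(num_images, batch_size):
        n = batch_features.shape[0]
        features[offset:offset + n] = batch_features
        labels[offset:offset + n] = batch_labels
        offset += n
    return features, labels


features, labels = collect(num_images=20, batch_size=8)
print(features.shape, labels.shape)
```

Tracking the write offset explicitly handles a smaller final batch without any special case.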
- After retrieving features from the pre-trained VGG16 network, train a linear SVM using scikit-learn's
LinearSVC class on the train set, but do not train on the withheld validation
set or test set. You need to standardize the train set and test set
before training and testing your SVM. You can use
sklearn.preprocessing.StandardScaler to do this.
- Test your SVM on the test set (remember to standardize the test features
using the train mean and standard deviation first) and report the accuracy of the SVM at
predicting the folder that the image was in. Also
compute a confusion matrix of the predictions using the
sklearn.metrics.confusion_matrix function and include it in
your submission. What do you observe about the types of errors the network
makes?
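The reminder about using the train mean and standard deviation can be made concrete with a small NumPy sketch. It is meant to show roughly the arithmetic performed when a scaler is fit on the train features and then applied to both sets; the helper names fit_standardizer and standardize and the toy 2-dimensional features are assumptions for illustration, not part of the assignment code.

```python
import numpy as np


def fit_standardizer(train_features, eps=1e-8):
    """Compute per-feature mean and standard deviation from the train set only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Avoid dividing by zero for features that are constant in the train set.
    std = np.where(std < eps, 1.0, std)
    return mean, std


def standardize(features, mean, std):
    """Apply the train statistics to any set of features."""
    return (features - mean) / std


# Toy features (hypothetical values): 3 train samples and 1 test sample.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
test = np.array([[3.0, 10.0]])

mean, std = fit_standardizer(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)  # uses train statistics, not test statistics
print(train_std.mean(axis=0), train_std.std(axis=0))
print(test_std)
```

Computing the statistics on the test set instead would leak information about the test data into the preprocessing, which is why only the train set is used for fitting.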
Part V: Train and Test the CNN on our Dataset
Preparing the Network:
- In this step, instead of using VGG16 as a feature extractor, you will train it on your dataset. To do so, first
load VGG16 with weights pretrained on ImageNet:
model = models.vgg16(pretrained=True)
- Then you need to extract the number of input features for the last fully connected layer of model:
num_ftrs = model.classifier[6].in_features
- Finally, replace the last fully connected layer with a new layer. This new layer has
the same number of input features as the original layer, but its number of outputs equals
the number of classes in our dataset:
model.classifier[6] = nn.Linear(num_ftrs, len(class_names))
Steps Before Training:
Here are the steps that you need to complete before starting training:
- Set the number of epochs to 25.
num_epochs = 25
- Send the model to CUDA device:
model = model.to(device)
- Specify the loss criterion used to train and evaluate the model:
criterion = nn.CrossEntropyLoss()
- Set the optimizer, learning rate and momentum:
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
- Finally, create a scheduler that controls how the learning rate changes during the training process:
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
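To see what this scheduler does to the learning rate, the following pure-Python sketch reproduces the step decay rule, under the assumption that scheduler.step() is called once per epoch as in the training procedure described in this handout. The helper step_lr is hypothetical and only mirrors the arithmetic; it is not the PyTorch implementation.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in a given epoch when it is multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 25 epochs with lr=0.001, step_size=7, gamma=0.1.
schedule = [step_lr(0.001, epoch) for epoch in range(25)]
for epoch in (0, 6, 7, 14, 21, 24):
    print(epoch, schedule[epoch])
```

With these settings the learning rate stays at 0.001 for epochs 0 to 6 and is then divided by 10 at epochs 7, 14 and 21, so the last few epochs make only very small updates.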
Training:
In this part we go through the training process step by step:
- Before starting to iterate over the epochs, we need to save the initial model weights as the best
model weights and set the best accuracy to zero.
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
- Now we can start to iterate over the epochs. Note that all of the remaining training steps, through saving the best model weights, are inside the following
for loop.
for epoch in range(num_epochs):
- In the next step, we need to iterate over the train and validation sets. (Note: In every epoch you first
go through the train set to update the model parameters, and then go through the validation set to
evaluate the trained model.)
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()
        else:
            model.eval()
        # Reset the running totals at the start of every phase.
        all_batchs_loss = 0.0
        all_batchs_corrects = 0
When we go through the train phase we need to set the mode of the model to train,
and when we go through the validation phase we need to set the mode of the model to eval. We also reset the running loss and the running count of correct predictions, which are accumulated over the mini-batches below.
- In the next step we need to go through the data using the dataloader that we prepared
in the previous steps. In every iteration, we get a mini-batch of images and their corresponding
labels. (Note: the next two steps, resetting the gradients and the forward/backward pass, are inside this for loop over the dataloader.)
        for inputs, labels in dataloaders[phase]:
            inputs = inputs.to(device)
            labels = labels.to(device)
- Before using the model to predict a mini-batch, we need to reset the gradients stored by the optimizer by calling zero_grad().
            optimizer.zero_grad()
- Now we need to use the current model weights for prediction and backpropagate the prediction loss.
            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                if phase == 'train':
                    loss.backward()
                    optimizer.step()
            all_batchs_loss += loss.item() * inputs.size(0)
            all_batchs_corrects += torch.sum(preds == labels.data)
In the first line we use with torch.set_grad_enabled(phase == 'train') to
enable gradient calculation only in the train phase.
Then we use the model to predict the classes of every mini-batch and compute the loss.
If we are in training, we backpropagate the loss through the network and let the optimizer update the model weights.
At the end we add the loss and the number of correctly predicted images to the running totals over all batches.
- After iterating over all mini-batches, if we are in the training phase, we need to run scheduler.step()
to update the scheduler status as follows:
        if phase == 'train':
            scheduler.step()
- In the next step we compute the loss and accuracy of the epoch for the current phase.
        epoch_loss = all_batchs_loss / dataset_sizes[phase]
        epoch_acc = all_batchs_corrects.double() / dataset_sizes[phase]
- Finally, if we are in the validation phase, we check whether the classification accuracy is better than
the best accuracy so far, and if so we save the best model parameters.
        if phase == 'val' and epoch_acc > best_acc:
            best_acc = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())
            torch.save(best_model_wts, 'best_model_weight.pth')
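The reason the batch loss is multiplied by the batch size before being summed can be checked with a small pure-Python example. It assumes, as is the default for nn.CrossEntropyLoss, that each batch loss is the mean over the images in that batch; the function epoch_metrics and the three batch results are hypothetical values chosen for illustration.

```python
def epoch_metrics(batches):
    """batches: list of (mean_batch_loss, batch_size, num_correct) tuples."""
    total_loss = 0.0
    total_correct = 0
    total_images = 0
    for mean_loss, batch_size, num_correct in batches:
        total_loss += mean_loss * batch_size  # undo the per-batch averaging
        total_correct += num_correct
        total_images += batch_size
    return total_loss / total_images, total_correct / total_images


# Hypothetical results for 20 images in batches of 8, 8 and 4.
batches = [(0.9, 8, 5), (0.7, 8, 6), (0.2, 4, 4)]
epoch_loss, epoch_acc = epoch_metrics(batches)
naive_loss = sum(b[0] for b in batches) / len(batches)
print(epoch_loss, epoch_acc, naive_loss)
```

Here the weighted epoch loss is about 0.68, while simply averaging the three batch losses gives about 0.60, because the smaller last batch would be over-weighted.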
Testing:
- The testing process is very similar to the training process, except that we do not need to backpropagate the loss.
To test the model, first prepare the model in the same way that we prepared it for the training process and
load the best model weights that we saved during training.
model = models.vgg16()
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, len(class_names))
model = model.to(device)
model.load_state_dict(torch.load('best_model_weight.pth'))
- After loading the model weights, we need to set the model to eval mode and the value of phase to
'test'.
model.eval()
phase = 'test'
- In the next step, we need to go through the test set, predict the category of the images, and count the
number of correctly classified images. The counter is reset first, and gradients are disabled because no backpropagation is needed.
all_batchs_corrects = 0
with torch.no_grad():
    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_batchs_corrects += torch.sum(preds == labels.data)
- Finally, we compute the accuracy over all of the test data.
epoch_acc = all_batchs_corrects.double() / dataset_sizes[phase]
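The test loop above only counts correct predictions; to build a confusion matrix you also need the true and predicted label of every image, for example by collecting them into lists during the loop. The pure-Python sketch below shows what such a matrix contains, with rows for the true class and columns for the predicted class (the same layout that sklearn.metrics.confusion_matrix uses). The helper functions and the six example labels are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts the images of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of images on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for six images and three classes.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
for row in cm:
    print(row)
print(accuracy(cm))
```

Large off-diagonal entries point to pairs of categories that the classifier confuses with each other.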
Part VI: Repeating Experiments with Different Hyper Parameters
- Retrain your model with different hyper parameter settings (at least two) and include the confusion matrix and accuracy of each model
on your test data. You can experiment with the learning rate and batch size.
Grading rubric:
- [20 points] Prepare train and test features by using the pre-trained model.
- [15 points] Train an SVM on the extracted features from the pre-trained model, report the confusion matrix and accuracies.
- [5 points] Prepare data for network training.
- [10 points] Transfer the layers from VGG16 for your new network.
- [15 points] Set all the required hyper parameters for your new network.
- [15 points] Train the network and report the confusion matrix and accuracies.
- [20 points] Train with at least two other hyper parameter settings and discuss the effect of these parameters on
the results.
Note: If your code cannot be run, you can only receive up to 50% of the grade, even if you have all the
required information in your report.
Acknowledgements: This assignment was prepared for you by Narges Honarvar Nazari, partly adapted from the PyTorch tutorial on transfer learning, and based on an assignment developed by Chris Thomas.
The photos used for this assignment come from the PASCAL VOC dataset.