CS 2770: Homework 1 (Python Version)
Due: 2/9/2017, 11:59pm
In this homework assignment, you will use a deep network to perform image
categorization. You will first use a pretrained network (trained on a different
problem) to extract features. You will then use these features to train an SVM
classifier which discriminates between 20 object categories. Next, you will
take a network with weights initialized from the same pretrained network and
train it on this task. Finally, you will compare the performance of the
pretrained network to the network you trained on this problem.
You will use the Caffe package, a very popular deep learning framework for
computer vision. Caffe is a C++ framework, but has both Python and Matlab
interfaces. This page is for the Python interface. We have installed Caffe for you on the nietzsche.cs.pitt.edu server.
Training the CNN in this assignment may take a long time, and several of you
will be using the limited computing resources at the same time, so be sure to
start this assignment early.
Part I: SSH Basics - Getting Connected to the Server and Transferring Files
- You will be connecting to the server via SSH. If you are using a Windows
machine and haven't used SSH before, you will first need to download an SSH
client such as PuTTY. You can download PuTTY from here.
If you are using a Mac or Linux, you already have SSH installed.
- This server only allows incoming connections from computers in the CS
department or via the VPN client. If you are connecting from off
campus, you must first install the Pulse VPN client (see
here for instructions) in order to connect to the server. Connect to the
VPN before trying to ssh to the server.
- If you are on a Mac or Linux, open a terminal, type
ssh nietzsche.cs.pitt.edu
and press enter to connect to the server. If you are on Windows, open PuTTY,
enter nietzsche.cs.pitt.edu as the host name, and click Open to connect to the
server. You will need to enter your departmental username and password when
prompted by the server.
- Once you are logged in, you will be taken to your AFS home directory and
will probably see a "public" and "private" directory (if you have not
changed these yourself). Make sure to put any assignment files you are
working on in the private directory (or another directory which no one
except you can access).
- You can either write your Python assignment file on your own computer and
transfer it to the server, using scp (on Mac or Linux) or WinSCP (which you
will need to download on Windows), or write the file directly on the server
using a text editor such as vim. On Mac or Linux, an scp command to copy a
file you've written to the server might look like this (where my username is
chris):
scp file.py chris@nietzsche.cs.pitt.edu:/afs/cs.pitt.edu/usr0/chris/private/
This command will copy the Python file from your computer to your AFS
storage space. If you are on Windows and install WinSCP, you will be
presented with a graphical interface where you can drag and drop files from
your computer to your AFS space.
Part II: Setting Up Your Environment and Python for Caffe
- Caffe needs several libraries to be visible to Python in order to work, so
we must tell the shell and Python where these libraries are located. Before
starting Python, copy and paste the following directly into the shell on the
server:
bash (press enter after each line)
export LD_LIBRARY_PATH=/tmp/caffe/ffmpeg:/tmp/caffe/distribute/lib:/opt/cuda-8.0-cuDNN5.1/lib64:/tmp/caffe/opencv/install/lib:/tmp/caffe/anaconda2/lib:/opt/OpenBLAS/lib:/usr/local/lib
export PYTHONHOME=/tmp/caffe/anaconda2
export PYTHONPATH=/tmp/caffe/python
export PATH=/opt/cuda-8.0-cuDNN5.1/bin:/tmp/caffe/anaconda2/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
- To launch Python on the server, type python. You can then begin typing commands as you normally would in
Python or run a script that you write by typing
python script_name.py
- Python needs to know where to find its interface to the Caffe library
and will need some additional libraries to load Caffe. Add the following lines to the top of your script:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn import svm
import caffe
- You will be using a GPU to accelerate your CNNs. There are 4 GPUs on
this machine. Type nvidia-smi (before starting Python) to view them. Look in
the center column and you will see four lines like 0MiB / 11439MiB, which show
the memory utilization of each GPU. The first number is the current
utilization. Note which GPU has the least memory in use (this will change
depending on who is using which GPUs). Once a model is loaded on a GPU, that
memory cannot be used by anybody else, so exit Python when you are done
working so that you do not hold GPU memory unnecessarily.
- Add the following lines to your script, after the imports:
caffe.set_device(0)  # replace 0 with the GPU number you noted above (0-3)
caffe.set_mode_gpu()
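Choosing the GPU can also be scripted. The following is a minimal sketch, assuming that nvidia-smi on the server supports the query flags `--query-gpu=index,memory.used --format=csv,noheader,nounits`. The helper names `least_used_gpu` and `query_gpus` are hypothetical, and the sample text is made up to stand in for real output.

```python
import subprocess


def least_used_gpu(csv_text):
    """Return the index of the GPU with the least memory in use.

    csv_text holds one "index, memory_used_MiB" line per GPU.
    """
    usage = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage.append((int(used), int(index)))
    # Tuples compare by memory first, so min() picks the least-loaded GPU
    # (ties go to the lower index).
    return min(usage)[1]


def query_gpus():
    """Ask nvidia-smi for per-GPU memory use (only works on a GPU machine)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]
    ).decode()


# Made-up sample output for four GPUs, for illustration only.
sample = "0, 5120\n1, 0\n2, 11000\n3, 300\n"
gpu_id = least_used_gpu(sample)
```

On the server, the result of `least_used_gpu(query_gpus())` could be passed to `caffe.set_device`. Checking nvidia-smi by hand works just as well.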
Part III: Preparing the Dataset for the Experiment
- The data for this assignment is located at /tmp/caffe/data/. You will find
20 folders with images in them. Each folder's name is the category of the
images it contains. For each image, you will need to extract image features
from the CNN and store them in a variable along with the name of the folder
that the image came from. Later, you will train a linear SVM using these
features to predict which folder an image came from. In the second part of the
assignment, you will train the network on these images.
- You will need to randomly withhold 10% of the images as a validation set
for training the CNN. Withhold an additional 10% as a test set for
evaluation. Make sure to retain your data split for the entire assignment because you will use the same data split for training,
validating, and testing the neural network.
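One way to make such a split reproducible is to shuffle once with a fixed seed and slice the result. This is a minimal sketch, not the required method. The function name `split_dataset`, the seed, and the synthetic file names are all hypothetical. On the server you would build `samples` from the folders under /tmp/caffe/data/ instead.

```python
import random


def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle samples once and cut them into train/val/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Hypothetical (image path, integer label) pairs: 20 categories, 50 images each.
samples = [("cat%02d/img%03d.jpg" % (c, i), c)
           for c in range(20) for i in range(50)]
train, val, test = split_dataset(samples)
```

Saving the three lists to disk (for example with pickle) is a simple way to keep the same split across sessions.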
Part IV: Using a Pretrained Network as a Feature Extractor
- We will now load a pretrained CNN model. The model we are loading has
been trained on 1.4M images to classify images into 1000 classes (which
aren't necessarily the categories we will be classifying). Add the following
line to your script:
net = caffe.Net('/tmp/caffe/models/deploy.prototxt',
'/tmp/caffe/models/weights.caffemodel', caffe.TEST)
The caffe.Net function loads a network model for use in Python. The first
argument specifies a file containing the network structure, which tells Caffe
how the various network layers connect. The second argument specifies the
learned model to load; it contains the weights learned during training, which
are copied into the network structure created by the first argument. The final
argument tells Caffe to load the network in test mode, rather than train mode.
You will see a lot of output appear once you execute this command, which you
can ignore.
- We need to preprocess each image before the CNN classifies it. Set up
the Python data transformer using this code:
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_mean('data', np.load('/tmp/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
- In order to extract the features for an image, first load the image in
Python using:
img = caffe.io.load_image('/path/to/image/to/load.jpg')
Caffe expects images in BGR channel order (instead of RGB), with the channel
dimension first, and in single precision. The transformer will do all of these
things for you automatically. Use the command:
img = transformer.preprocess('data', img)
to transform the image. You can then run the image through the
neural network by using these commands:
net.blobs['data'].data[...] = img
net.forward()
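For intuition, the transformer's work can be approximated in plain NumPy. This is a sketch only. It assumes that the transformer applies its settings in the order transpose, channel swap, raw scale, mean subtraction, and it omits the resizing to the network's input size. The function name is hypothetical, and the random image and the mean values are placeholders rather than the real ImageNet mean.

```python
import numpy as np


def preprocess_like_transformer(img, mean_bgr):
    """Mimic the transformer settings on an H x W x 3 RGB image in [0, 1]."""
    out = img.astype(np.float32)          # single precision
    out = out.transpose(2, 0, 1)          # H x W x C -> C x H x W
    out = out[[2, 1, 0], :, :]            # RGB -> BGR channel order
    out = out * 255.0                     # [0, 1] -> [0, 255]
    out = out - mean_bgr[:, None, None]   # subtract the per-channel mean
    return out


rng = np.random.RandomState(0)
img = rng.rand(227, 227, 3)               # stand-in for a loaded image
mean_bgr = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # placeholder
blob = preprocess_like_transformer(img, mean_bgr)
```

In the assignment itself, keep using transformer.preprocess. The sketch is only meant to show what shape and value range the network receives.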
- Once an image has been run through the neural network, we are ready to
extract features from the network for that image. You will be extracting
features from the fc8 layer of the network. To get the features for an image,
use the command net.blobs['fc8'].data. Store a copy of the features you
extract (for example with .copy(), since the blob's contents are overwritten
by the next forward pass), along with the folder that the image came from, for
training the SVM.
- Train a linear SVM using scikit-learn's LinearSVC class on the train set,
but do not train on the withheld validation set or test set. You need to
standardize the train set and test set before training and testing your SVM,
using the mean and standard deviation computed on the train set only. You can
use sklearn.preprocessing.StandardScaler to do this. Note that you will not be
using the validation set for this part of the assignment.
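The standardization amounts to the following computation, shown here as a NumPy sketch on synthetic data. The function names and the random features are hypothetical stand-ins for the fc8 features. With scikit-learn, the same effect comes from fitting a StandardScaler on the train features and calling its transform method on both sets.

```python
import numpy as np


def fit_standardizer(train_features):
    """Compute per-feature mean and standard deviation on the train set only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled (avoid divide by zero)
    return mean, std


def standardize(features, mean, std):
    """Shift and scale features using previously computed statistics."""
    return (features - mean) / std


rng = np.random.RandomState(0)
train_x = rng.randn(80, 5) * 3.0 + 10.0   # synthetic stand-ins for features
test_x = rng.randn(20, 5) * 3.0 + 10.0
mean, std = fit_standardizer(train_x)
train_z = standardize(train_x, mean, std)
test_z = standardize(test_x, mean, std)    # reuses the train statistics
```

The test features will not have exactly zero mean and unit variance after this step, which is expected: the statistics come from the train set so that no information from the test set leaks into training.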
- Test your SVM on the test set (remember to standardize the test features
using the train mean and standard deviation first) and report the accuracy of
the SVM at predicting the folder that the image was in. Also compute a
confusion matrix of the predictions using the sklearn.metrics.confusion_matrix
function and include it in your submission. What do you observe about the
types of errors the network makes?
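To help read the confusion matrix, here is a small hand-rolled version on toy labels. It is a sketch that follows the same layout as sklearn.metrics.confusion_matrix (rows are true classes, columns are predicted classes). The function name and the labels are made up for illustration.

```python
import numpy as np


def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Entry [i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Toy labels for 3 classes (the assignment has 20).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix_counts(y_true, y_pred, 3)
# Correct predictions lie on the diagonal.
accuracy = np.trace(cm) / float(cm.sum())
```

Large off-diagonal entries point to pairs of categories that are often confused with each other.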
Part V: Preparing Your Own Network
- Before we train the network, we must first set up the network solver,
which contains parameters necessary for training the network. Copy all of
the prototxt files from the /tmp/caffe/models directory to your own directory.
- We will begin by editing the solver.prototxt file. You will see the
syntax of the file when you open it. Each variable is on its own line and is
followed by a colon and then the parameter.
- Start by setting the learning rate. Set the base_lr parameter to 0.0001. We
are using a slightly lower learning rate than usual because our batch size
(the number of images we use to compute the gradient at each step) will be
small. If the learning rate is too high for a small batch size, the error will
not decrease because the changes to the weights in the network are too large.
- During training, Caffe will decrease the learning rate after a set number
of iterations so that later training has less impact on the weights (the idea
is that after a few passes through the training data, the weights don't need
to change as much). Set the gamma parameter to 0.1. This means that the
learning rate will decrease by 10X every stepsize iterations.
- We want to save a copy of the network every epoch (one pass through the
train data). To find out how many iterations are in an epoch, first figure out
the number of images in your train set (from the data split in Part III), then
divide that number by the train batch size (which is 8) and round down to the
nearest whole number. Set the snapshot parameter to the number of iterations
in an epoch.
- Set the snapshot_prefix to the directory (plus an optional file name
prefix) where you wish to save your trained models.
- Set the net parameter to the full path of where
your train_val.prototxt file is.
- Finally, set the stepsize to the number of
iterations in 10 epochs (so 10 times the number from snapshot). This means
the learning rate will decrease by 10X every 10 epochs.
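The arithmetic for snapshot and stepsize can be sketched as below. The train-set size of 4003 is hypothetical, so substitute the size of your own train split. The `lr_at` helper assumes the solver uses a step schedule, in which the rate is base_lr times gamma raised to the number of completed stepsize intervals.

```python
def solver_schedule(n_train, batch_size=8, epochs_per_step=10):
    """Return (snapshot, stepsize) in iterations for solver.prototxt."""
    iters_per_epoch = n_train // batch_size   # round down to whole iterations
    return iters_per_epoch, epochs_per_step * iters_per_epoch


# Hypothetical train-set size, for illustration only.
snapshot, stepsize = solver_schedule(n_train=4003)
base_lr, gamma = 0.0001, 0.1


def lr_at(iteration):
    """Learning rate in effect at a given iteration under a step schedule."""
    return base_lr * gamma ** (iteration // stepsize)
```

With these numbers, snapshot is 500 and stepsize is 5000, so the rate stays at 0.0001 for the first 10 epochs and drops to 0.00001 at iteration 5000.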
- Now, we will need to change the train_val.prototxt file to handle our
problem. Currently, the network is set up to handle 1000 object classes. We
need to change the classifier output so that there are only 20 outputs (for
our 20 categories). Find the line num_output: 1000 and change it to
num_output: 20 to accommodate the 20 object classes in our dataset. You will
also need to rename the layer you changed, since you changed the dimensions of
the layer. Search the file for fc8 and rename it to something of your choice
(it appears in multiple places, so be sure to change them all). While you are
in this file, you can view the overall network structure and see the different
layers in the network.
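The two edits can also be made with a short script instead of by hand. This is only a sketch. The new layer name fc8_voc is a hypothetical choice, and the fragment is an abbreviated, made-up example in prototxt syntax rather than the actual file. The rename matters because Caffe matches layers by name when it copies weights from a .caffemodel, so a renamed layer starts from its initializer instead of receiving the old 1000-class weights.

```python
def adapt_prototxt(text, old_name="fc8", new_name="fc8_voc", n_classes=20):
    """Rename the final layer everywhere and shrink its output to n_classes."""
    text = text.replace('"%s"' % old_name, '"%s"' % new_name)
    # Assumes only the final layer has 1000 outputs.
    return text.replace("num_output: 1000", "num_output: %d" % n_classes)


# Abbreviated, made-up fragment in prototxt syntax, for illustration only.
fragment = '''layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param { num_output: 1000 }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
}'''
adapted = adapt_prototxt(fragment)
```

Whichever way you edit the file, check the result by searching it for any remaining fc8.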
Part VI: Training and Evaluating Your Own Network
- We are now ready to begin training in Python. Begin by creating a Caffe solver:
solver = caffe.SGDSolver('Path to your solver.prototxt')
This instantiates the solver in Python. However, we don't have enough data to
train the network entirely from scratch, so we will initialize the network
with the same weights we used before. To do this, type:
solver.net.copy_from('/tmp/caffe/models/weights.caffemodel')
- Write a loop that passes through your train set 25 times (25 epochs). You
will process 8 images each iteration. For each iteration, randomly choose 8
images and their labels from your train set (but do not use the same images
again in that epoch). Note: Caffe accepts labels as 0 indexed, so your labels
should be integers from 0 to 19, not strings. Load the 8 images and use the
transformer to transform them as you did in Part IV. You will now create an
input "blob" for the Caffe network from the 8 preprocessed images. To do this,
stack the 8 images along a new first dimension to form a Numpy array of shape
[8, 3, 227, 227]. Also create an 8x1 labels Numpy array which contains an
integer from 0 to 19 for each of the images in the input image blob.
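The batching described above can be sketched as follows. This is one possible arrangement, not the required one. The generator name `epoch_batches` and the train-set size of 20 are hypothetical, and arrays of zeros stand in for the preprocessed images. The incomplete final batch is dropped, which matches rounding down the number of iterations per epoch.

```python
import numpy as np


def epoch_batches(n_samples, batch_size, rng):
    """Yield index arrays covering one shuffled epoch with no repeats."""
    order = rng.permutation(n_samples)
    n_full = n_samples // batch_size          # drop the incomplete final batch
    for b in range(n_full):
        yield order[b * batch_size:(b + 1) * batch_size]


rng = np.random.RandomState(0)
n_train, batch_size = 20, 8                   # hypothetical train-set size
# Zero arrays stand in for preprocessed images, each 3 x 227 x 227.
images = [np.zeros((3, 227, 227), dtype=np.float32) for _ in range(n_train)]
labels = rng.randint(0, 20, size=n_train)

batches = list(epoch_batches(n_train, batch_size, rng))
first = batches[0]
data_blob = np.stack([images[i] for i in first])   # shape (8, 3, 227, 227)
label_blob = labels[first].astype(np.float32).reshape(batch_size, 1)
```

Calling `epoch_batches` again at the start of each epoch gives a fresh shuffle, so the batches differ from epoch to epoch.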
- Provide Caffe with the data and labels using these commands:
solver.net.blobs['data'].data[...] = INPUT MINIBATCH
solver.net.blobs['label'].data[...] = INPUT LABELS
- Train the network on the minibatch using solver.step(1). This tells Caffe to perform one update of the weights
using your minibatch.
- After each step of the solver, get the value of the "loss" layer and
save it in an array. See Part IV for how to get the value of a layer (here,
read it from solver.net.blobs rather than net.blobs).
- After each epoch of training, evaluate the model on the validation set.
To do this, load and preprocess the images as usual, and run the images
through the network by providing the images and their labels as you did when
training (you will need to run the images through the network in batches of
8). However, instead of calling net.forward(), call solver.net.forward() to
run the images through the network. Do not use solver.step, because we are not
training on the validation set. Finally, get the accuracy on each minibatch
from the validation set by reading the result of the accuracy layer. The
average of the accuracies of all the minibatches in the validation set is the
accuracy of the network at that epoch.
- After training, you can use the solver.net.save('FILENAME.caffemodel')
command to save your final trained network.
- Provide a plot of the train losses in your report. Also, provide a
second plot of your validation set accuracies (you should have 25 numbers in
this plot).
- Perform Part IV using your trained network instead of the pretrained
model. Use the saved network which had the best accuracy on the validation
set. You can reuse all of your code from Part IV, but you will need to change
the line that loads the network so that it points to your network instead of
the pretrained model:
net = caffe.Net(MODIFIED DEPLOY FILE, YOUR CAFFEMODEL FILE, caffe.TEST)
Note: you will also need to modify the deploy.prototxt file to have
num_output: 20 and to use the name of the layer that you changed in the
train_val.prototxt file (i.e. find every fc8 in the deploy.prototxt file and
rename it to whatever name you chose).
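Picking the best snapshot can be sketched as below. It assumes that Caffe names snapshot files as the snapshot_prefix followed by _iter_N.caffemodel, and that one snapshot was written per epoch. The accuracies, the prefix, and the epoch length are made up for illustration.

```python
def best_snapshot(val_accuracies, iters_per_epoch, prefix):
    """Return (epoch, path) of the snapshot with the best validation accuracy."""
    best_index = max(range(len(val_accuracies)),
                     key=lambda e: val_accuracies[e])
    best_epoch = best_index + 1               # epochs are counted from 1
    iteration = best_epoch * iters_per_epoch
    return best_epoch, "%s_iter_%d.caffemodel" % (prefix, iteration)


# Made-up accuracies for five epochs, with a hypothetical prefix.
val_acc = [0.52, 0.61, 0.66, 0.64, 0.65]
epoch, path = best_snapshot(val_acc, iters_per_epoch=500,
                            prefix="snapshots/alexnet_voc")
```

Check the files in your snapshot directory to confirm the actual names before loading one with caffe.Net.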
- Report the accuracy of your network on the train set and test set without
the SVM. To do this, extract the network's classification scores for each
image by accessing the output of the fc8 layer (remember to access your
renamed version), and use the class with the max score as the network's
prediction to compute the accuracy.
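Turning classification scores into an accuracy can be sketched like this. The scores and labels are toy values, and the function name is hypothetical. In the assignment, each row of `scores` would be a copy of the renamed fc8 output for one image.

```python
import numpy as np


def accuracy_from_scores(scores, labels):
    """scores: (n_images, n_classes) array; labels: n_images integer labels."""
    predictions = np.argmax(scores, axis=1)   # class with the highest score
    return float(np.mean(predictions == np.asarray(labels)))


# Toy scores for 4 images and 3 classes (the assignment has 20 classes).
scores = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 0.9],
                   [1.5, 1.7, 0.0],
                   [0.0, -0.5, -0.2]])
labels = [0, 2, 0, 0]
acc = accuracy_from_scores(scores, labels)
```

Here three of the four predictions match their labels, so the accuracy is 0.75.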
If you need additional help with Python Caffe syntax, you may want to consult
the Caffe examples here, which illustrate the basic Caffe commands (view the
IPYNB files in your browser). The classification file is the most
straightforward and shows all the main steps of how to classify an image.
Grading rubric:
- [10 points] Setting up and splitting the data correctly.
- [30 points] Accuracy of pretrained model using SVM, and confusion matrix.
- [40 points] Accuracy of trained model without SVM.
- [10 points] Accuracy of trained model using SVM.
- [10 points] Plot of train losses and validation accuracies.
Acknowledgement: The photos used for this assignment come from the PASCAL VOC dataset.
The network model used in this assignment is AlexNet.