CS 2770: Homework 1 (Python Version)

Due: 2/9/2017, 11:59pm

In this homework assignment, you will use a deep network to perform image categorization. You will first use a pretrained network (trained on a different problem) to extract features, and then use these features to train an SVM classifier that discriminates between 20 object categories. Next, you will train a network (with weights initialized from the same pretrained network) on this task. Finally, you will compare the performance of the pretrained network to that of the network you trained on this problem.

You will use the Caffe package, a very popular deep learning framework for computer vision. Caffe is a C++ framework, but has both Python and Matlab interfaces. This page is for the Python interface. We have installed Caffe for you on the nietzsche.cs.pitt.edu server.

Training the CNN in this assignment may take a long time, and several of you will be using the limited computing resources at the same time, so be sure to start this assignment early.

Part I: SSH Basics - Getting Connected to the Server and Transferring Files

  1. You will be connecting to the server via SSH. If you are using a Windows machine and haven't used SSH before, you will first need to download an SSH client such as PuTTY, which is available from the official PuTTY website. If you are using a Mac or Linux, you already have SSH installed.
  2. This server only allows incoming connections from computers in the CS department or through the VPN client. If you are connecting from off campus, you must first install the Pulse VPN client (see the university's VPN installation instructions) in order to connect to the server. Connect to the VPN before trying to ssh to the server.
  3. If you are on a Mac or Linux, open a terminal and type: ssh nietzsche.cs.pitt.edu and press enter to connect to the server. If you are on Windows, open PuTTY and for the host name, enter nietzsche.cs.pitt.edu and click Open to connect to the server. You will need to enter your departmental username and password when prompted by the server.
  4. Once you are logged in, you will be taken to your AFS home directory and will probably see a "public" and "private" directory (if you have not changed these yourself). Make sure to put any assignment files you are working on in the private directory (or another directory which no one except you can access).
  5. You can either write your Python assignment file on your own computer and transfer it to the server to run it there, or write it directly on the server using a text editor such as vim. To transfer files, use scp on Mac or Linux, or WinSCP on Windows (you will need to download it). On Mac or Linux, an scp command that copies a file you have written to the server might look like this (assuming your username is chris):
    scp file.py chris@nietzsche.cs.pitt.edu:/afs/cs.pitt.edu/usr0/chris/private/ 
    This command copies the Python file from your computer to the private directory of your AFS storage space. If you are on Windows and install WinSCP, you will be presented with a graphical interface where you can drag and drop files from your computer to your AFS space.

Part II: Setting Up Your Environment and Python for Caffe

  1. Caffe depends on shared libraries and a Python installation that live in nonstandard locations, so we need to tell the system and Python where to find them. Before starting Python, copy and paste the following commands directly into the bash shell on the server (press Enter after each line):
    export LD_LIBRARY_PATH=/tmp/caffe/ffmpeg:/tmp/caffe/distribute/lib:/opt/cuda-8.0-cuDNN5.1/lib64:/tmp/caffe/opencv/install/lib:/tmp/caffe/anaconda2/lib:/opt/OpenBLAS/lib:/usr/local/lib
    export PYTHONHOME=/tmp/caffe/anaconda2
    export PYTHONPATH=/tmp/caffe/python
    export PATH=/opt/cuda-8.0-cuDNN5.1/bin:/tmp/caffe/anaconda2/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
  2. To launch Python on the server, type python. You can then type commands interactively as you normally would, or run a script you have written by typing python script_name.py.
  3. Add the following imports to the top of your script. The first four are libraries you will use in this assignment; import caffe loads the Caffe Python interface, which Python finds through the PYTHONPATH you set above:
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image
    from sklearn import svm

    import caffe
  4. You will be using a GPU to accelerate your CNNs. There are 4 GPUs on this machine. Before starting Python, type nvidia-smi to view them. In the center column you will see four lines like 0MiB / 11439MiB, which show the memory usage of each GPU: the first number is the memory currently in use and the second is the total. Note which GPU has the least memory in use (this will change depending on who is using which GPUs). Once a model is loaded onto a GPU, that memory cannot be used by anybody else, so exit Python when you are done working so that you do not hold GPU memory unnecessarily.
  5. Add the following lines to your script, after import caffe:
    caffe.set_device(0)  # replace 0 with the GPU number (0-3) you noted above
    caffe.set_mode_gpu()
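If you prefer to choose the GPU from your script rather than reading nvidia-smi by eye, the sketch below asks nvidia-smi for each GPU's memory in use and returns the index of the least-used one. It assumes nvidia-smi supports the --query-gpu and --format=csv,noheader,nounits options; parse_gpu_memory and least_used_gpu are hypothetical helper names, not part of Caffe.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse output of
    `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`
    into a list of (gpu_index, used_mib) tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage.append((int(index), int(used)))
    return usage


def least_used_gpu():
    """Return the index of the GPU that currently has the least memory in use."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        universal_newlines=True)  # get text rather than bytes
    usage = parse_gpu_memory(output)
    return min(usage, key=lambda pair: pair[1])[0]
```

You would then call caffe.set_device(least_used_gpu()) before caffe.set_mode_gpu(). The reading is only a snapshot, so another user may still claim the same GPU a moment later.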

Part III: Preparing the Dataset for the Experiment

  1. The data for this assignment is located in /tmp/caffe/data/. You will find 20 folders of images; each folder name is the category of the images inside it. For each image, you will extract features from the CNN and store them along with the name of the folder the image came from. Later, you will train a linear SVM on these features to predict which folder an image came from. In the second part of the assignment, you will train the network itself on these images.
  2. Randomly withhold 10% of the images as a validation set for training the CNN, and withhold an additional 10% as a test set for evaluation. Keep the same split for the entire assignment, because you will use it for training, validating, and testing both the SVM and the neural network.
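One way to make a reproducible 80/10/10 split is sketched below. It assumes every file inside each category folder is an image, shuffles all images together with a fixed seed (so rerunning it gives the same split), and labels each image with the 0-based index of its folder in sorted order, which also gives you the integer labels Caffe needs later. split_dataset and save_split are hypothetical helper names; saving the split to a JSON file is just one way to keep it for the rest of the assignment.

```python
import json
import os
import random


def split_dataset(data_root, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Randomly split every image under data_root into train/val/test lists.

    Each list holds (image_path, label) pairs, where label is the 0-based
    index of the image's folder in the sorted list of category folders.
    """
    categories = sorted(d for d in os.listdir(data_root)
                        if os.path.isdir(os.path.join(data_root, d)))
    samples = []
    for label, category in enumerate(categories):
        folder = os.path.join(data_root, category)
        for name in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, name), label))

    rng = random.Random(seed)  # fixed seed so the same split can be recreated
    rng.shuffle(samples)
    n_val = int(round(len(samples) * val_fraction))
    n_test = int(round(len(samples) * test_fraction))
    val = samples[:n_val]
    test = samples[n_val:n_val + n_test]
    train = samples[n_val + n_test:]
    return categories, train, val, test


def save_split(path, categories, train, val, test):
    """Write the split to a JSON file so later sessions can reload it."""
    with open(path, "w") as f:
        json.dump({"categories": categories, "train": train,
                   "val": val, "test": test}, f)
```

For example, categories, train, val, test = split_dataset('/tmp/caffe/data/') followed by save_split('split.json', categories, train, val, test). Note that JSON stores each (path, label) pair as a two-element list.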

Part IV: Using a Pretrained Network as a Feature Extractor

  1. We will now load a pretrained CNN model. This model was trained on 1.4M images to classify images into 1000 classes (which do not correspond one-to-one to the 20 categories we will be classifying). Add the following line to your script:
    net = caffe.Net('/tmp/caffe/models/deploy.prototxt', '/tmp/caffe/models/weights.caffemodel', caffe.TEST)

    The caffe.Net function loads a network model for use in Python. The first argument specifies a file containing the network structure which tells Caffe how the various network layers connect. The second argument specifies the learned model to load containing the weights learned during training and copies those weights into the network structure created by the first argument.  The final argument tells Caffe to load the network in test mode, rather than train mode. You will see a lot of output appear once you execute this command which you can ignore.
  2. We need to preprocess each image before the CNN classifies it. Set up the Python data transformer using this code:
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_mean('data', np.load('/tmp/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
    transformer.set_transpose('data', (2,0,1))
    transformer.set_channel_swap('data', (2,1,0))
    transformer.set_raw_scale('data', 255.0)
  3. To extract the features for an image, first load it with caffe.io.load_image('/path/to/image/to/load.jpg'), which returns an RGB image of shape height x width x 3 with values in [0, 1]. Caffe instead expects images in BGR channel order, with the channel dimension first (3 x height x width), scaled to [0, 255], mean-subtracted, resized to the network's input size, and in single precision. The transformer does all of this for you. Transform the image with img = transformer.preprocess('data', img), then run it through the network with these commands:
    net.blobs['data'].data[...] = img
    net.forward()
  4. Once an image has been run through the network, you can extract its features. You will extract features from the fc8 layer: net.blobs['fc8'].data holds this layer's output, with one row per image in the batch. Because the blob's memory is overwritten by the next forward pass, store a copy, e.g. net.blobs['fc8'].data[0].copy(), together with the folder the image came from, for training the SVM.
  5. Train a linear SVM using scikit-learn's LinearSVC class on the train set only; do not train on the withheld validation or test sets (the validation set is not used in this part of the assignment). Standardize the features before training and testing the SVM; you can use sklearn.preprocessing.StandardScaler for this, fitting it on the train features only.
  6. Test your SVM on the test set (remember to first standardize the test features using the train mean and standard deviation) and report the accuracy of the SVM at predicting the folder each image came from. Also compute a confusion matrix of the predictions using sklearn.metrics.confusion_matrix and include it in your submission. What do you observe about the types of errors the network makes?
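The feature-extraction and SVM steps above can be sketched as two functions. This is a minimal sketch, assuming samples is a list of (image_path, label) pairs, that net and transformer are the objects created in steps 1 and 2, and that you pass caffe.io.load_image as load_image (it is a parameter only so the sketch does not import Caffe itself). extract_features and evaluate_svm are hypothetical names; the scikit-learn calls are StandardScaler, LinearSVC, accuracy_score, and confusion_matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def extract_features(net, transformer, samples, load_image, layer="fc8"):
    """Run each (path, label) sample through net and collect one feature row per image."""
    features, labels = [], []
    for path, label in samples:
        img = transformer.preprocess("data", load_image(path))
        net.blobs["data"].data[...] = img  # broadcasts into every batch slot
        net.forward()
        # Copy row 0: the blob's memory is overwritten by the next forward pass.
        features.append(net.blobs[layer].data[0].copy())
        labels.append(label)
    return np.array(features), np.array(labels)


def evaluate_svm(train_X, train_y, test_X, test_y):
    """Standardize with train statistics, fit a linear SVM, and score it on the test set."""
    scaler = StandardScaler().fit(train_X)  # mean/std come from the train set only
    clf = LinearSVC().fit(scaler.transform(train_X), train_y)
    predictions = clf.predict(scaler.transform(test_X))
    return accuracy_score(test_y, predictions), confusion_matrix(test_y, predictions)
```

For example, train_X, train_y = extract_features(net, transformer, train, caffe.io.load_image), then the same for the test set, then accuracy, cm = evaluate_svm(train_X, train_y, test_X, test_y). Row i of cm counts the test images of class i by predicted class, so large off-diagonal entries show which categories get confused with each other.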

Part V: Preparing Your Own Network

  1. Before we train the network, we must first set up the network solver, which contains parameters necessary for training the network. Copy all of the prototxt files from the /tmp/caffe/models directory to your own directory.
  2. We will begin by editing the solver.prototxt file. When you open it you will see its syntax: each parameter is on its own line, written as the parameter name followed by a colon and its value.
  3. Now we need to change the train_val.prototxt file to handle our problem. Currently, the network is set up to handle 1000 object classes. We need to change the classifier output so that there are only 20 outputs (one for each of our 20 categories). Find the line num_output: 1000 and change it to num_output: 20. You will also need to rename the layer you changed: when Caffe copies pretrained weights into a network, it matches layers by name, so keeping the name fc8 would make it try to load the 1000-output weights into your 20-output layer, which fails with a shape mismatch. Search the file for fc8 and rename it to a name of your choice (it appears in multiple places, so be sure to change them all). While you are in this file, you can also look over the overall network structure and the different layers in the network.
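If you prefer to make the prototxt edits from step 3 with a script (the same edit is needed again for deploy.prototxt in Part VI), a minimal sketch is below. It assumes the classifier layer is named exactly fc8 and that num_output: 1000 appears exactly once in the file (it raises an error otherwise). It uses fc8_voc as a hypothetical new layer name, and retarget_classifier is likewise a hypothetical helper. Check the edited file by eye afterwards.

```python
import re


def retarget_classifier(prototxt_text, old_name="fc8", new_name="fc8_voc",
                        old_outputs=1000, new_outputs=20):
    """Rename every whole-word occurrence of the classifier layer name and
    change its output count, returning the edited prototxt text."""
    # \b ensures only the whole name "fc8" matches, not e.g. "fc80".
    text = re.sub(r"\b%s\b" % re.escape(old_name), new_name, prototxt_text)
    text, count = re.subn(r"num_output:\s*%d\b" % old_outputs,
                          "num_output: %d" % new_outputs, text)
    if count != 1:
        raise ValueError("expected exactly one 'num_output: %d', found %d"
                         % (old_outputs, count))
    return text
```

For example, read train_val.prototxt into a string, pass it to retarget_classifier, and write the result back to the file.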

Part VI: Training and Evaluating Your Own Network

  1. We are now ready to begin training in Python. Begin by creating a Caffe solver:
    solver = caffe.SGDSolver('/path/to/your/solver.prototxt')

    This instantiates the solver in Python. However, we don't have enough data to train the network entirely from scratch, so we will initialize the network to the same weights we used before. To do this, type:
    solver.net.copy_from('/tmp/caffe/models/weights.caffemodel')
  2. Write a loop that passes through your train set 25 times (25 epochs), processing 8 images per iteration. For each iteration, randomly choose 8 images and their labels from your train set, without reusing any image within the same epoch. Note: Caffe expects 0-indexed integer labels, so your labels should be the numbers 0 to 19, not strings. Load the 8 images and transform them with the transformer as you did in Part IV, step 3. Then create an input "blob" for the Caffe network by stacking the 8 preprocessed images (each of shape [3, 227, 227]) along a new first dimension to form a NumPy array of shape [8, 3, 227, 227]. Also create an 8x1 NumPy array of labels containing the integer (0 to 19) for each image in the input blob.
  3. Provide Caffe with the data and labels using these commands:
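    solver.net.blobs['data'].data[...] = input_minibatch  # your [8, 3, 227, 227] image array
    solver.net.blobs['label'].data[...] = input_labels  # your 8 labels (reshape to match the label blob's shape if needed)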
  4. Train the network on the minibatch using solver.step(1). This tells Caffe to perform one update of the weights using your minibatch.
  5. After each solver step, get the value of the "loss" layer (solver.net.blobs['loss'].data) and append it to a list. See Part IV, step 4 for how to read a layer's output.
  6. After each epoch of training, evaluate the model on the validation set. Load and preprocess the validation images as usual, and provide them and their labels to the network as in step 3 above (in batches of 8). Then run the images through the network with solver.net.forward() rather than net.forward(). Do not use solver.step, because we are not training on the validation set. For each validation minibatch, read the output of the accuracy layer; the average of these minibatch accuracies is the network's validation accuracy for that epoch.
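  7. After training, you can use the solver.net.save('FILENAME.caffemodel') command to save your trained network. Because step 9 uses the network with the best validation accuracy, you may want to save a copy after each epoch, or whenever the validation accuracy improves, rather than only at the end.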
  8. Provide a plot of the train losses in your report. Also, provide a second plot of your validation set accuracies (you should have 25 numbers in this plot).
  9. Repeat Part IV using your trained network instead of the pretrained model, choosing the version of your network with the best validation accuracy. You can reuse all of your code from Part IV; only the line that loads the network needs to change so that it points to your network:
    net = caffe.Net('/path/to/your/deploy.prototxt', '/path/to/your/model.caffemodel', caffe.TEST)
  10. Note: you will also need to modify the deploy.prototxt file to use num_output: 20 and the layer name you chose in train_val.prototxt (i.e., rename every fc8 in deploy.prototxt to that name).
  11. Report the accuracy of your network on the train set and test set without the SVM. To do this, extract the network's classification scores for each image from the output of the fc8 layer (remember to use your renamed version) and take the class with the highest score as the network's prediction when computing the accuracy.
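Steps 2 through 8 and the accuracy computation in step 11 can be combined along the lines of the sketch below. It is a minimal sketch, not a reference solution, and it assumes that train_set and val_set are lists of (image_path, label) pairs with integer labels 0 to 19; that your train_val.prototxt takes its inputs from blobs named data and label that you set from Python, and exposes its loss and accuracy outputs as blobs named loss and accuracy; and that load_image is caffe.io.load_image. All function names and the best.caffemodel file name are hypothetical. Because the input blob has a fixed batch size, leftover images that do not fill a final batch of 8 are skipped (a different random subset each epoch for the train set, and the last few images for the validation set).

```python
import numpy as np


def minibatch_indices(n_samples, batch_size, rng):
    """Yield index arrays for one epoch: a fresh random permutation cut into
    full batches, so no image is used twice within the epoch."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples - batch_size + 1, batch_size):
        yield order[start:start + batch_size]


def load_batch(samples, indices, load_image, transformer):
    """Preprocess the chosen images and stack them into an (N, 3, H, W) array."""
    images = [transformer.preprocess("data", load_image(samples[i][0])) for i in indices]
    labels = np.array([samples[i][1] for i in indices], dtype=np.float32)
    return np.stack(images), labels


def feed(net, images, labels):
    """Copy a minibatch into the network's input blobs."""
    net.blobs["data"].data[...] = images
    # Reshape so the labels fit however the label blob is shaped, e.g. (8,) or (8, 1, 1, 1).
    net.blobs["label"].data[...] = labels.reshape(net.blobs["label"].data.shape)


def train(solver, train_set, val_set, load_image, transformer,
          epochs=25, batch_size=8, seed=0, model_path="best.caffemodel"):
    """Train for the given number of epochs; return per-step losses and per-epoch val accuracies."""
    rng = np.random.RandomState(seed)
    losses, val_accuracies = [], []
    best = -1.0
    for epoch in range(epochs):
        for idx in minibatch_indices(len(train_set), batch_size, rng):
            feed(solver.net, *load_batch(train_set, idx, load_image, transformer))
            solver.step(1)  # one weight update on this minibatch
            losses.append(float(solver.net.blobs["loss"].data))
        # Validation: forward passes only, no weight updates.
        batch_accs = []
        for start in range(0, len(val_set) - batch_size + 1, batch_size):
            idx = np.arange(start, start + batch_size)
            feed(solver.net, *load_batch(val_set, idx, load_image, transformer))
            solver.net.forward()
            batch_accs.append(float(solver.net.blobs["accuracy"].data))
        val_accuracies.append(float(np.mean(batch_accs)))
        if val_accuracies[-1] > best:  # keep the weights with the best validation accuracy
            best = val_accuracies[-1]
            solver.net.save(model_path)
    return losses, val_accuracies


def accuracy_from_scores(scores, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    return float(np.mean(np.argmax(scores, axis=1) == np.asarray(labels)))
```

With losses, val_accuracies = train(solver, train, val, caffe.io.load_image, transformer), you can plot both lists for step 8, and best.caffemodel holds the weights from the epoch with the highest validation accuracy. For step 11, accuracy_from_scores takes the stacked outputs of your renamed fc8 layer (one row per image) and the matching labels.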

If you need additional help with Python Caffe syntax, you may want to consult the Caffe example notebooks (the IPYNB files in the examples directory of the Caffe repository, which you can view in your browser); they illustrate the basic Caffe commands. The classification notebook, 00-classification.ipynb, is the most straightforward and shows all the main steps of classifying an image.

Grading rubric:

  1. [10 points] Setting up and splitting the data correctly.
  2. [30 points] Accuracy of pretrained model using SVM, and confusion matrix.
  3. [40 points] Accuracy of trained model without SVM.
  4. [10 points] Accuracy of trained model using SVM.
  5. [10 points] Plot of train losses and validation accuracies.
Acknowledgement: The photos used for this assignment come from the PASCAL VOC dataset. The network model used in this assignment is AlexNet.