CS 2770: Homework 2

Due: 3/11/2019, 11:59pm

In this homework assignment, you will use a deep network to perform image categorization. First, you will use a pre-trained network (trained on a different problem) to extract features, and then use these features to train an SVM classifier that discriminates between 20 object categories. Second, you will build a network with weights initialized from the same pre-trained network and train it on this task. Finally, you will compare the performance of the pre-trained network to that of the network you trained.

You will use the Keras package. Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano or PlaidML. We have prepared instructions for you to easily install TensorFlow on the nietzsche.cs.pitt.edu server and use the Keras package (on top of TensorFlow) for your assignment.

If you need additional help with Keras syntax, you may want to consult the Keras examples here, which illustrate common usage patterns.


Training the CNN in this assignment may take a long time, and several of you will be using the limited computing resources at the same time, so be sure to start this assignment early.


Part I: SSH Basics - Getting Connected to the Server and Transferring Files

  1. You will be connecting to the server via SSH. If you are using a Windows machine and haven't used SSH before, you will first need to download an SSH client such as PuTTY. You can download PuTTY from here. If you are using a Mac or Linux, you already have SSH installed.
  2. This server only allows incoming connections from computers in the CS department or via the VPN client. If you are connecting from off campus, you must first install the Pulse VPN client (see here for instructions) in order to connect to the server. Connect to the VPN before trying to ssh to the server.
  3. If you are on a Mac or Linux, open a terminal and type: ssh nah114@nietzsche.cs.pitt.edu and press enter to connect to the server. Note: you need to replace nah114 with your Pitt username. If you are on Windows, open PuTTY and for the host name, enter nietzsche.cs.pitt.edu and click Open to connect to the server. You will need to enter your Pitt username and password when prompted by the server.
  4. Once you are logged in, you will be taken to your AFS home directory and will probably see a "public" and "private" directory (if you have not changed these yourself). Make sure to put any assignment files you are working on in the private directory (or another directory which no one except you can access).
  5. You can either write your Python assignment file on your own computer and transfer it to the server using scp (on Mac or Linux) or WinSCP (you'll need to download this on Windows) to run it on the server or directly write the Python assignment file on the server using a text editor such as vim. On Mac or Linux a scp command to copy a file you've written to the server might look like this (where my username is nah114):
    scp file.py nah114@nietzsche.cs.pitt.edu:/afs/cs.pitt.edu/usr0/nah114/private/ 
    This command will copy the Python file from your computer to your AFS storage space. If you are on Windows and install WinSCP, you will be presented with a GUI where you can drag and drop files from your computer to your AFS space. You can also use text editors such as Vim, Vi, Nano, Emacs, etc. on the server. If you are comfortable working with Linux text editors, we recommend using them rather than writing the code on your computer and transferring it to the server.

Part II: Setting Up Your Environment and Python for TensorFlow

    We have tried to make the instructions for installing TensorFlow as simple as possible. To this end, we have prepared a script which performs all of the commands required to create a virtual environment and install TensorFlow in it.
  1. We have prepared a run.sh script for you; you can get it here. You need to download the code and transfer it to your AFS folder using the scp command.
  2. In the next step you need to run the following command:
    sh run.sh
    After running this command, a virtual environment named venv will be created. You will find a directory named venv in the folder where you ran the sh run.sh command; we recommend running it inside your private directory. Every time you need to run TensorFlow, go to the directory which contains the venv folder and run the following command:
    source ./venv/bin/activate
  3. After running the source command, you will be in a virtual environment with name venv and you can run your python program. If the name of your program is hw2.py, for running the code, you can simply run the python hw2.py command.
  4. To use TensorFlow and Keras for your project you need to import the required libraries. The following lines list the libraries that you need for your project:
    import PIL.Image
    import tensorflow as tf
    import numpy as np
    import os
    from tensorflow.python.keras.models import Model, Sequential
    from tensorflow.python.keras.layers import Dense
    from tensorflow.python.keras.applications import VGG16
    from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.python.keras.optimizers import Adam
    from tensorflow.python.keras.callbacks import ModelCheckpoint, TensorBoard
    PIL is a free library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats.
    NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
    The os module in Python provides functions for interacting with the operating system.
  5. There are 4 GPUs available on the nietzsche server, and all course members need to use these GPUs for their assignment. To share the GPUs fairly among all students, we ask that you use only one GPU when you train your network and test your program. To do so, you simply need to use a variant of the following line of code after importing the necessary libraries:
    os.environ["CUDA_VISIBLE_DEVICES"]="3"
    In this example, I have used GPU number 3, but you need to check which GPUs are available and then set the number of the GPU that you wish to use. To check for free GPUs you can run the nvidia-smi command, which shows you which GPUs are in use and which are free. You should use one which is currently free. If you cannot find any free GPUs, check their availability with the nvidia-smi command once in a while until you find a free one. Then change the number for os.environ["CUDA_VISIBLE_DEVICES"] in your code to the available GPU and run your code.
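The GPU check described above can also be scripted. The following is a minimal sketch, not part of the provided setup: the helper names pick_free_gpu, query_gpu_memory and use_free_gpu are hypothetical, and it assumes that "least memory in use" is a reasonable stand-in for "free". You should still look at the nvidia-smi output yourself before starting a long training run.

```python
import os
import subprocess


def pick_free_gpu(csv_text):
    """Return the index (as a string) of the GPU with the least memory in use.

    csv_text is expected to hold one "index, memory_used_MiB" pair per line.
    """
    usage = []
    for line in csv_text.strip().splitlines():
        index, memory_used = (field.strip() for field in line.split(","))
        usage.append((int(memory_used), index))
    return min(usage)[1]


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage (needs a machine with NVIDIA GPUs)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )


def use_free_gpu():
    """Restrict this process to the least-used GPU; call before TensorFlow touches the GPUs."""
    os.environ["CUDA_VISIBLE_DEVICES"] = pick_free_gpu(query_gpu_memory())


# Example with made-up numbers: GPU 1 has almost no memory in use.
sample = "0, 10954\n1, 11\n2, 7621\n3, 10231\n"
print(pick_free_gpu(sample))  # prints 1
```

On the server you would call use_free_gpu() near the top of your script, in place of the hard-coded GPU number.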

Part III: Preparing the Data Directories for the Experiment

  1. We have prepared the data folders that you need for your project. All the data folders are located in the /opt/cs2770/data/ims/cs2770_images/ directory.
  2. Inside the directory you can find three sub-directories as follows:
    train_data
    validation_data
    test_data
  3. To use each of the data folders in the next steps, you need to use the full path to the directory. In your code you can include three variables as follows:
    train_dir = '/opt/cs2770/data/ims/cs2770_images/train_data'
    validation_dir = '/opt/cs2770/data/ims/cs2770_images/validation_data'
    test_dir = '/opt/cs2770/data/ims/cs2770_images/test_data'
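For Part IV you will need to walk these directories yourself. Below is a small sketch of one way to collect image paths and integer labels. It assumes that each data folder contains one sub-folder per category (the layout that flow_from_directory expects); the helper name list_images and the list of file extensions are illustrative choices, not something the assignment prescribes. Class names are sorted so that the indices should line up with the class_indices attribute of the Keras generators, which is worth verifying on your own data.

```python
import os
import tempfile

# Assumed set of image extensions; adjust if the data uses others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(root_dir):
    """Return (paths, labels, class_names) for images stored as root_dir/<class>/<file>."""
    class_names = sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )
    paths, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(class_dir, file_name))
                labels.append(label)
    return paths, labels, class_names


# Small self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    for class_name, file_name in [("dog", "b.jpg"), ("cat", "a.jpg"), ("cat", "notes.txt")]:
        os.makedirs(os.path.join(demo_root, class_name), exist_ok=True)
        open(os.path.join(demo_root, class_name, file_name), "w").close()
    demo_paths, demo_labels, demo_classes = list_images(demo_root)

print(demo_classes)  # prints ['cat', 'dog']
print(demo_labels)   # prints [0, 1]
```

With the real data you would call, for example, list_images(train_dir) and list_images(test_dir).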

Part IV: Loading and Using a Pretrained Network as a Feature Extractor

  1. We will now load a pretrained CNN model. The model we are loading has been trained on 1.4M images to classify images into 1000 classes (which aren't the same as the categories we aim to classify). To load the model you just need to add the following line to your code:
    model = VGG16(include_top=True, weights='imagenet')
    We use this VGG16 function to retrieve the VGG16 pre-trained model. The first argument asks whether to include the fully-connected layers at the top of the network. Since we want to extract the values of the last fully-connected layer before the prediction layer (fc2) as features, we need to set include_top to True. The second argument selects the pre-trained weights to load; we would like to use the weights trained on ImageNet.
  2. We need to pre-process each image before the CNN classifies it. If image_path is a string which contains the path to an image, you need to do the following steps to extract features from the fc2 layer of the VGG16 network. First we need to obtain the image size which was used for training the VGG16 model, as follows:
    input_shape = model.layers[0].output_shape[1:3]
    We can read the image by using the PIL library (it is converted to a NumPy array in a later step):
    img = PIL.Image.open(image_path)
    We should resize the image to input_shape. As mentioned above, input_shape is the image size that our network accepts as input.
    img_resized = img.resize(input_shape, PIL.Image.LANCZOS)
    After resizing the image which is a 3D matrix, we expand the array to a 4D matrix.
    img_array = np.expand_dims(np.array(img_resized), axis=0)
    Note: For this part of the assignment, you will need to read all images in the train and test directories and create a Numpy array for each image, in order to get image features from the pre-trained VGG16 network. You will not be using the validation set for this part of the assignment.

  3. UPDATED: After preparing the NumPy array for the image(s), you need to extract features from the fc2 layer. To do so, set a variable named layer_name and create a new model named fc2_layer_model as follows:
    layer_name = 'fc2'
    fc2_layer_model = Model(inputs=model.input,outputs=model.get_layer(layer_name).output)
    Then you can extract the image features using the pre-trained model with the following command:
    feature = fc2_layer_model.predict(img_array)
    If your img_array contains just one image, then feature will be a matrix of shape (1, 4096). Otherwise, if you have concatenated m images in your img_array, the shape of feature will be (m, 4096).

  4. After retrieving features from the pre-trained VGG16 network, train a linear SVM using SKLearn's LinearSVC function on the train set but do not train on the withheld validation set or test set. You need to standardize the train set and test set before training and testing your SVM. You can use the sklearn.preprocessing.StandardScaler to do this.
  5. Test your SVM on the test set (remember to standardize test features using the train mean and standard deviation first) and report the accuracy of the SVM at predicting the folder that the image was in. Also include a confusion matrix of the predictions using the sklearn.metrics.confusion_matrix function and include it in your submission. What do you observe about the types of errors the network makes?
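The standardize, train and evaluate steps can be sketched as follows. This is only an illustration on synthetic data: the random feature arrays stand in for the real (m, 4096) fc2 features, the sizes (3 classes, 16 dimensions) are made up to keep the example fast, and the variable names are placeholders. The key point it shows is that the scaler is fitted on the train features only and then reused, unchanged, on the test features.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the extracted features (assumption: 3 classes, 16 dims).
rng = np.random.RandomState(0)
num_classes, dim = 3, 16
centers = rng.normal(scale=5.0, size=(num_classes, dim))
train_labels = np.repeat(np.arange(num_classes), 30)
test_labels = np.repeat(np.arange(num_classes), 10)
train_features = centers[train_labels] + rng.normal(size=(len(train_labels), dim))
test_features = centers[test_labels] + rng.normal(size=(len(test_labels), dim))

# Fit the scaler on the train set only, then apply the same mean/std to the test set.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_features)
test_scaled = scaler.transform(test_features)

# Train a linear SVM on the standardized train features.
svm = LinearSVC()
svm.fit(train_scaled, train_labels)
predicted = svm.predict(test_scaled)

# Rows of the confusion matrix are true classes, columns are predicted classes.
accuracy = accuracy_score(test_labels, predicted)
cm = confusion_matrix(test_labels, predicted)
print("accuracy:", accuracy)
print(cm)
```

Off-diagonal entries of the confusion matrix show which categories are mistaken for which, which is a natural starting point for the error discussion requested above.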

Part V: Preparing Your Dataset and Your Own Network

  1. Before training your own network, you need to prepare the data. Data generation in Keras has two main steps. In the first step we need to use the ImageDataGenerator function to specify the way that images need to be pre-processed. Here you can find an example of the ImageDataGenerator command:
    datagen_train = ImageDataGenerator(rescale=1./255)
    datagen_validation = ImageDataGenerator(rescale=1./255)
    ImageDataGenerator generates batches of tensor image data with real-time data augmentation. The data will be looped over (in batches).
    Note: To use the VGG16 network we need to rescale the data. Also keep in mind that we need an image data generator for both the train and the validation data.
  2. In the next step, we should use the flow_from_directory function to create the final version of the data that we need for training. The flow_from_directory function takes the path to a directory and generates batches of augmented data.
    batch_size=8
    generator_train = datagen_train.flow_from_directory(directory=train_dir, target_size=input_shape, batch_size = batch_size)
    generator_validation = datagen_validation.flow_from_directory(directory=validation_dir, target_size=input_shape, batch_size = batch_size)
    Note: We have set the value of input_shape in the previous steps.
    target_size is the image size that we want to use for training the network, and we use input_shape as target_size.
    batch_size is the number of images we use to compute the gradient at each step.
  3. In the next step we need to transfer all layers of VGG16 until the prediction layer, and add a dense layer with softmax on top of the network. The latter has dimensionality equal to the number of classes.
    First we need to transfer all layers until fc2 from VGG16:
    transfer_layer = model.get_layer('fc2')
    Then we need to create a model from the transferred part of the network:
    tranfered_model = Model(inputs=model.input, outputs=transfer_layer.output)
    To create our final model, we need to concatenate the layers from VGG16 with the last softmax layer which we will add in the following steps. Since the concatenation should be done sequentially, we use the Sequential() function for creating our new model as follows:
    new_model = Sequential()
    Then we need to add the transferred model to our final model:
    new_model.add(tranfered_model)
    To complete the architecture of our network we should add the final dense layer for the actual classification:
    num_classes = 20
    new_model.add(Dense(num_classes, activation='softmax'))
    Since we just want to train on the last layer of the network, we need to set the transferred layer to not be trainable:
    for layer in tranfered_model.layers:
      layer.trainable = False


    NEW: As part of the homework requirements, you need to experiment with different parameters for training your network. One possible option is to set other layers as trainable. For example, if you do not use the lines above (i.e. you do not set the trainable value of every layer to False), then all layers of the network will be trained. If you just want to train a specific layer (e.g. fc2), you can use the following lines of code:
    for layer in tranfered_model.layers:
      if 'fc2' in layer.name:
        layer.trainable = True
      else:
        layer.trainable = False

    Hint: To inspect the layers of tranfered_model, you can print its summary as follows (summary() prints the table itself, so no print call is needed):
    tranfered_model.summary()

  4. We also need to set the optimizer, learning rate, loss and evaluation metric for our network:
    optimizer = Adam(lr=1e-5)
    loss = 'categorical_crossentropy'
    metrics = ['categorical_accuracy']
  5. Finally we need to compile the model so that these settings (and any changes to the trainable flags) take effect:
    new_model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
  6. By default, Keras does not save a snapshot of the model weights during training. In this assignment, you need to use the weights from the point in training with the highest accuracy on the validation set. To save the model weights, you can use the ModelCheckpoint function. To use this function, we first need to specify the name of the file that we want to use for the weight snapshot. The format for model weights is h5; as an example, you can declare the file path as follows:
    weight_path = 'best_weight.h5'
    Note: We recommend that you save the model in a separate folder. For example, you can create a folder named model_weights and save the models in this folder.
    After setting the path for the weight file, you can use the following command to create the callback that saves the file:
    checkpoint = ModelCheckpoint(weight_path, monitor='val_categorical_accuracy', verbose=1, save_best_only=True, mode='max' , period=5)
    The monitor argument sets the quantity that we would like to monitor. The verbose argument sets the verbosity mode; we set it to 1 because we would like to see a message during training whenever the callback acts. Since we just need the weights of the best model, we set save_best_only to True. The mode argument tells the callback whether the monitored quantity should be maximized or minimized when deciding which model is best. Since we are monitoring val_categorical_accuracy and want it to be as high as possible, we set mode to max, so the weight file is only overwritten when val_categorical_accuracy improves. Note that ModelCheckpoint does not stop training. Saving the model weights after every epoch makes training longer because the weight files are big. To avoid wasting time on writing a big file during training, we set period to 5 so that the callback considers saving only every five epochs.
    Note: With the configuration above, the best_weight.h5 file will be overwritten every time you run the training program. If you want to keep the file with the best weights, rename it before you start your training code again to prevent it from being overwritten.
  7. One interesting feature of TensorFlow is TensorBoard, which makes it possible to visualize the training process. TensorBoard gives you the option to visualize accuracy and loss graphs, images which have been used during training, etc. It is available as a Keras callback named TensorBoard, which needs to be imported from the same callbacks module as ModelCheckpoint. The following line is an example of Python code for using TensorBoard:
    tensorboard = TensorBoard(log_dir='./logs', batch_size=batch_size)
    The first argument of the function is the directory in which log files will be saved and the second argument is batch_size, which we have specified in previous lines. Note that you do not need to create the log directory.
  8. Finally, you need to put checkpoint and tensorboard into a list of callbacks; callbacks_list will be used when training the network.
    callbacks_list = [checkpoint, tensorboard]
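To build intuition for how save_best_only, mode='max' and period interact, here is a simplified pure-Python model of the checkpoint logic. It is not Keras code and only approximates the behaviour described above, under the assumption that the callback looks at the monitored value once every period epochs and saves only if that value beats the best one seen at earlier checks. The function name simulate_checkpoints and the accuracy values are made up.

```python
def simulate_checkpoints(val_accuracies, period=5):
    """Return the (1-based) epochs at which a 'save best only' checkpoint would be written.

    Simplified model: the monitored value is inspected every `period` epochs and
    the weights are saved only when it exceeds the best value seen at earlier checks.
    """
    best = float("-inf")
    epochs_since_last_check = 0
    saved_epochs = []
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        epochs_since_last_check += 1
        if epochs_since_last_check >= period:
            epochs_since_last_check = 0
            if accuracy > best:  # mode='max': higher is better
                best = accuracy
                saved_epochs.append(epoch)
    return saved_epochs


# Made-up validation accuracies for 10 epochs; the peak (0.9) occurs at epoch 3.
val_acc = [0.2, 0.3, 0.9, 0.4, 0.5, 0.6, 0.55, 0.58, 0.57, 0.45]
print(simulate_checkpoints(val_acc, period=5))  # prints [5]
print(simulate_checkpoints(val_acc, period=1))  # prints [1, 2, 3]
```

In this toy run, period=5 never sees the peak at epoch 3, which illustrates the trade-off: a larger period saves time on disk writes but may miss the best epoch.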

Part VI: Training and Evaluating Your Own Network

  1. We are now ready to begin training in Python. For training we need to set the number of epochs, steps for every epoch and number of validation batches. The following are good values you can use.
    epochs = 25
    steps_per_epoch = 100
    number_of_validation_batches = generator_validation.n / batch_size
  2. In the next step we need to run fit_generator as follows based on the variables that we have set before:
    history = new_model.fit_generator(generator=generator_train, epochs = epochs, steps_per_epoch = steps_per_epoch, validation_data = generator_validation, validation_steps = number_of_validation_batches, callbacks=callbacks_list)
    By running this command your network starts to train and you can see in your console window the progress of training, accuracy of training, and accuracy of validation for every epoch.
    Note: The training for 25 epochs should roughly take 1.5 hours if you use one GPU.
  3. The output of the fit_generator function gives you the information about the loss and accuracy on the train and validation sets. Use the output of new_model.fit_generator and provide a plot of the train losses in your report. Also, provide a second plot of your validation set accuracies (you should have 25 numbers in this plot which is the number of epochs). As an example you can get the training accuracy for all epochs with the following command:
    acc = history.history['categorical_accuracy']
  4. To view TensorBoard in your browser you need to go through the following steps:
    First, open a new terminal on your own computer and connect as you did in Part I, but with port forwarding enabled. In this case your ssh command should look something like the following command, which forwards port 16006 on your computer to port 6006 on the server:
    ssh -N -f -L localhost:16006:localhost:6006 nah114@nietzsche.cs.pitt.edu
    If your model is still training, you need to open another terminal window in the exact same way as in Part I. Then go into the directory in which your code is located and run the following command:
    tensorboard --logdir logs --port 6006
    To see TensorBoard in your browser, you just need to open the following link:
    http://localhost:16006
    Note: Keep in mind that with the commands above, all the log files located in the logs directory will be shown in TensorBoard. Since every run of the program creates a log file, you need to remove the log files from previous training runs to see a clear graph of your current training.
  5. UPDATED: Now use the output from the last layer as predictions. Since those predictions are probabilities, you can pick the class with highest probability as the predicted label.
    Note: We have prepared img_array in Part IV. Because the network was trained on images rescaled by 1./255, the same rescaling should be applied to img_array before predicting.
    predictions = new_model.predict(img_array)
  6. UPDATED: In Part V, step 3, we give a suggestion to experiment with setting different layers to be trainable. If you trained more layers of your network than just the final softmax layer (e.g. up to fc2, fc1), please extract features from fc2 like in Part IV but now on the network that you trained, train an SVM on these features, and report the accuracy.
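Turning the predicted probabilities into labels can be sketched as follows. The probability values and the three-class mapping are made up for illustration; in your code the mapping would come from the class_indices attribute of your training generator, and predictions would be the output of new_model.predict.

```python
import numpy as np

# Hypothetical mapping in the style of generator_train.class_indices.
class_indices = {"aeroplane": 0, "bicycle": 1, "bird": 2}
index_to_class = {index: name for name, index in class_indices.items()}

# Made-up softmax outputs for two images (each row sums to 1).
predictions = np.array([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])

# The predicted label is the column with the highest probability in each row.
predicted_indices = np.argmax(predictions, axis=1)
predicted_names = [index_to_class[int(i)] for i in predicted_indices]
print(predicted_names)  # prints ['bicycle', 'aeroplane']
```

The integer indices in predicted_indices can be passed, together with the true labels, to sklearn.metrics.confusion_matrix in the same way as in Part IV.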

Part VII: Repeating Experiments with Different Parameters

  1. Retrain your model with different parameters (at least two) and include the confusion matrix and accuracy of model on your test data. You can play with learning rate and type of optimizer.
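One way to keep repeated experiments organized is to generate one configuration per parameter combination, each with its own weight file and log directory, so that runs do not overwrite each other's checkpoints or mix their TensorBoard logs. The sketch below is only a suggestion: the parameter values, the helper name make_runs, and the naming scheme are illustrative, and building the actual optimizer object from each configuration is left to your training script.

```python
import itertools
import os

# Illustrative values only; choose the parameters you actually want to compare.
learning_rates = [1e-5, 1e-4]
optimizer_names = ["adam", "sgd"]


def make_runs(learning_rates, optimizer_names, weight_dir="model_weights"):
    """Build one configuration dict per (learning rate, optimizer) combination."""
    runs = []
    for lr, opt in itertools.product(learning_rates, optimizer_names):
        run_name = "{}_lr{:g}".format(opt, lr)
        runs.append({
            "optimizer": opt,
            "learning_rate": lr,
            # Separate files per run, so earlier results are not overwritten.
            "weight_path": os.path.join(weight_dir, run_name + ".h5"),
            "log_dir": os.path.join("logs", run_name),
        })
    return runs


runs = make_runs(learning_rates, optimizer_names)
for run in runs:
    print(run["weight_path"], run["log_dir"])
```

Each dictionary can then supply the weight_path for ModelCheckpoint and the log_dir for TensorBoard in the corresponding training run.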

Deliverables:

  1. Two Python files named hw2_pretrained.py and hw2_train.py. In the script hw2_pretrained.py you should implement Part IV and print the accuracy of classification on the test data based on features extracted from the pre-trained network. In the script hw2_train.py you should implement Parts V and VI and print the accuracy of classification on the test data based on your trained network.
  2. A complete report from all of your experiments including confusion matrix, accuracies and discussions.
  3. Note: Do NOT submit any log file or h5 file.

Grading rubric:

  1. [20 points] Prepare train and test features by using the pre-trained model.
  2. [15 points] Train an SVM on the extracted features from the pre-trained model, report the confusion matrix and accuracies.
  3. [5 points] Prepare data for network training.
  4. [10 points] Transfer the layers from VGG16 for your new network.
  5. [15 points] Set all the required parameters for your new network.
  6. [15 points] Train the network and report the confusion matrix and accuracies.
  7. [20 points] Train with at least two other parameters and discuss the effect of these parameters on the results.
  8. Note: If your code cannot be run, you can only receive up to 50% of the grade, even if you have all the required information in your report.

Acknowledgements: This assignment was prepared for you by Narges Honarvar Nazari, partly adapted from this tutorial, and based on an assignment developed by Chris Thomas. The photos used for this assignment come from the PASCAL VOC dataset. The network model used in this assignment is the VGG16 network.