CS1674: Homework 9

Due: 12/4/2023, 11:59pm

This assignment is worth 50 points.


Please read all instructions before you begin.

In this assignment, you will train deep networks to perform categorization, and test a pretrained network that performs detection. The assignment has five parts. In the first part, you will train a neural network from scratch. In the second part, you will transfer layers from a pretrained network, then append and train one fully-connected (FC) layer. In the third part, you will experiment with data augmentation and with the amount of training data used. In the fourth part, you will experiment with freezing layers (setting their learning rate to 0). In the fifth part, you will test the performance of a pretrained object detector, using code you write to compute intersection-over-union (IOU). In a file answers.txt/pdf/docx, list all accuracies for each setting described below.

You will use the same dataset as for HW7, but resized (for compatibility with the pretrained network we will use, which expects 227x227 inputs). You can either (1) create a separate folder with the same eight folders (categories) as in HW7, but inside each folder copy only the images with "resized" in the filename, resulting in 8x150 images total, or (2) download a new copy of the data with just the resized images from Canvas.

You will also need to install the Matlab Deep Learning (DL) Toolbox add-on. Go to Home --> Add-ons to do that. You can also use Matlab online (https://matlab.mathworks.com/) after you upload the scenes_lazebnik folder to your Matlab Drive; make sure the zip file is unzipped.

You will need to rely on the Matlab documentation to learn how to use the built-in neural network functions. Any existing functions are fair game; of course, do not look for scripts that accomplish the entirety of what an assignment part asks. However, since the goal of this assignment is to learn how to use the documentation, please do look up functions and examples. Some useful links are below. Please skim through all of them to get a sense of how the DL toolbox works.
  1. deep learning with images
  2. train network
  3. training options
  4. transfer learning
  5. data augmentation
  6. classification
  7. convolution layer
  8. max pooling layer
  9. fully connected layer
  10. object detection with Faster R-CNN
Some of the functions you will need to call are imageDatastore, splitEachLabel (to load and split the dataset; we will not use load_split_dataset from before), imageDataAugmenter, augmentedImageDatastore, trainingOptions, trainNetwork, classify. Doing the assignment will be very easy if you skim through the references above.

Unless otherwise specified, use a learning rate of 0.0001, a maximum of 5 epochs, 100 images per class for training, and the rest (50) for testing. Training should not take more than a few minutes for each setting. Always report performance on the test set. Because test accuracy will vary depending on the run, and the difference in accuracy between different networks will not be very large in some cases, you should run each experiment 5 times, then report the average accuracy (and for completeness, the standard deviation over the 5 accuracies).
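For reference, here is a minimal sketch of the repeated-runs protocol (it assumes imdsTrain, imdsTest, layers, and options have been set up as described in the parts below):

    % Run the experiment 5 times and report the mean and standard deviation.
    accuracies = zeros(1, 5);
    for run = 1:5
        net = trainNetwork(imdsTrain, layers, options);
        predictions = classify(net, imdsTest);
        accuracies(run) = mean(predictions == imdsTest.Labels);
    end
    fprintf('mean accuracy: %.4f, std: %.4f\n', mean(accuracies), std(accuracies));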

For each problem, write your code in a separate script titled part_X.m where X is i, ii, iii, iv, v. Also submit a separate file where you describe and compare the performance of your different networks. Briefly hypothesize why you observe these trends, based on what we have discussed in class.


Part I: Training a Basic Network from Scratch (10 pts)

In this part, you will train a neural network for the task of classifying the eight scenes from HW7, from scratch.
  1. You need to specify a folder for the train set and the test set. Refer to the "train network" link for details on how to set up the data.
  2. Create a network with three types of layers (denoted A, B, C in the following); a sketch putting these pieces together appears after this list. First, use an image input layer; this layer takes in the images at size 227x227x3. Next, use a group of layers for type A: a convolutional layer with 50 filters of size 11x11 (the same size as the filters in the first layer of AlexNet), followed by RELU and a max pooling layer of size 3x3 and stride 1. Then use a group B of 60 5x5 filters, RELU, and max pooling of size 3x3 and stride 2. Then create group C: a fully-connected layer of size 8 (for 8 classes), followed by a softmax layer (which computes probabilities) and a classification layer. Check the links above for the corresponding functions and their input formats.
  3. You need to specify options for training the network. Use the "training options" link above. Specify the max number of epochs, the learning rate, and set the 'Plots' variable such that it shows training progress. Start out by showing the plots for the first few parts of the assignment to get a sense of what is happening, but then you can disable them.
  4. For simplicity, we will not use a validation set. Train the network and output performance on the test set after the last iteration. To compute test accuracy, you can use the classify function (to get predictions), and the imdsTest.Labels variable to get the ground-truth labels on the test set.
  5. In your answers file, hypothesize why you see such high/low performance (accuracy). Keep in mind what performance was in HW7.
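As a starting point, here is a minimal sketch of the Part I pipeline (the folder name scenes_lazebnik and the 'sgdm' solver are assumptions; adapt them to your setup):

    % Load the resized images; labels come from the folder names.
    imds = imageDatastore('scenes_lazebnik', 'IncludeSubfolders', true, ...
        'LabelSource', 'foldernames');
    [imdsTrain, imdsTest] = splitEachLabel(imds, 100, 'randomized');

    layers = [
        imageInputLayer([227 227 3])
        convolution2dLayer(11, 50)           % group A: 50 filters of size 11x11
        reluLayer
        maxPooling2dLayer(3, 'Stride', 1)
        convolution2dLayer(5, 60)            % group B: 60 filters of size 5x5
        reluLayer
        maxPooling2dLayer(3, 'Stride', 2)
        fullyConnectedLayer(8)               % group C: one output per class
        softmaxLayer
        classificationLayer];

    options = trainingOptions('sgdm', ...
        'InitialLearnRate', 0.0001, ...
        'MaxEpochs', 5, ...
        'Plots', 'training-progress');

    net = trainNetwork(imdsTrain, layers, options);
    predictions = classify(net, imdsTest);
    accuracy = mean(predictions == imdsTest.Labels);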

Part II: Transferring Layers from Pretrained Network (10 pts)

(A) In this part, you will transfer layers from an AlexNet network trained on the ImageNet dataset. Refer to the "transfer learning" link. Transfer all layers up to, but excluding, the FC6 layer. It will be helpful to see which layers you are transferring; type net.Layers in the Matlab shell to get a list of the layers in AlexNet. Append a single fully-connected layer (of size 8), followed by softmax and classification layers. Then train and evaluate performance, and describe your observations.
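A possible sketch of the transfer for (A) (the layer index 16 is an assumption; verify it against the output of net.Layers):

    net = alexnet;                        % requires the AlexNet support package
    layersTransfer = net.Layers(1:16);    % everything up to, but excluding, fc6
    layers = [
        layersTransfer
        fullyConnectedLayer(8)
        softmaxLayer
        classificationLayer];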

(B) Next, additionally transfer FC6 and FC7, all the way up to (but excluding) FC8. Also transfer the layers that come after FC6 and FC7 (RELU, dropout). Now append a single fully-connected layer, as before. Train the network, evaluate performance on the test set, and describe your observations in the answers file.
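The transfer for (B) only changes which layers are copied (again, the index 22 is an assumption; check net.Layers):

    layersTransfer = net.Layers(1:22);    % up to, but excluding, fc8
    layers = [
        layersTransfer
        fullyConnectedLayer(8)
        softmaxLayer
        classificationLayer];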


Part III: Using Data Augmentation (10 pts)

In this part, you will experiment with different forms of data augmentation, computed on top of the network described in Part II (A). Refer to the "data augmentation" reference link above for an example of how to use augmentation, and the reference for the data augmenter for the list of augmentations you can use (X or Y reflection, rotation, etc., under Properties). You can use the following line to augment the training set, once you have created the original data store and train/test split (in Part I) and the augmenter: imdsTrain = augmentedImageDatastore(imageSize,imdsTrain,'DataAugmentation',imageAugmenter);
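For example, here is one possible augmenter (the particular augmentations and ranges are just one combination to try):

    % Random horizontal flips plus small rotations.
    imageAugmenter = imageDataAugmenter( ...
        'RandXReflection', true, ...
        'RandRotation', [-15 15]);
    imageSize = [227 227 3];
    imdsTrain = augmentedImageDatastore(imageSize, imdsTrain, ...
        'DataAugmentation', imageAugmenter);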

Some augmentations will give you better results than using no augmentations, and some will give you worse results. Experiment with different combinations of augmentations, report results with at least two combinations in your answers file, and keep the best setting in your submitted code. You can use this line to visualize the augmentations: minibatch = preview(imdsTrain); imshow(imtile(minibatch.input));

You will likely need to use fewer than 100 training images per class (e.g. 3 or 10) to see the effect of data augmentation. Thus, include results with at least two different settings of the number of training images used, with and without augmentation for each. Compare the networks that use data augmentation to one that does not, and compare networks that use different forms of data augmentation.
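One way to vary the amount of training data while keeping the 50-image test set fixed (this particular approach is a suggestion, not a requirement):

    % Split off the usual 100/50 first, then subsample the training half.
    [imdsTrainFull, imdsTest] = splitEachLabel(imds, 100, 'randomized');
    imdsTrainSmall = splitEachLabel(imdsTrainFull, 10, 'randomized');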


Part IV: Experimenting with Freezing Transferred Layers (5 pts)

In this part, you will use the network from Part II (A), but freeze all layers that you transferred from AlexNet. To do this, you can set layers(i).WeightLearnRateFactor = 0; on all layers that have learnable parameters (weights). This sets the learning rate to 0, i.e. no gradient updates will take place in those layers. Use 10 training images per class. In your answers file, compare networks that do and do not freeze the transferred layers.
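A minimal sketch of the freezing loop (assuming layersTransfer holds the layers copied from AlexNet in Part II (A)):

    layers = [
        layersTransfer
        fullyConnectedLayer(8)
        softmaxLayer
        classificationLayer];
    % Zero out the learning rate of every transferred layer that has weights.
    for i = 1:numel(layersTransfer)
        if isprop(layers(i), 'WeightLearnRateFactor')
            layers(i).WeightLearnRateFactor = 0;
        end
    end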


Part V: Object Detection (15 pts)

In this part, you will test the performance of a pretrained object detector on three images: cars1.jpg, cars2.jpg and cars3.jpg. You will use some parts of the tutorial shown here. First, download the pretrained object detector for vehicles from here (e.g. you can use the websave function). Load the downloaded file and access pretrained.detector, read in an image (with imread), resize the image to 224x224, and extract the bboxes, scores outputs from applying the detect function (using the detector and image as inputs). Then visualize the results using the insertObjectAnnotation function and imshow.
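A possible sketch of the detection pipeline (the local filename vehicleDetector.mat is hypothetical; use whatever name you saved with websave):

    pretrained = load('vehicleDetector.mat');     % hypothetical filename
    detector = pretrained.detector;
    I = imresize(imread('cars1.jpg'), [224 224]);
    [bboxes, scores] = detect(detector, I);
    I = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
    imshow(I);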

Finally (and this is the only part where you really implement anything), compute the intersection-over-union of the predicted box and a ground-truth box, for each of the three examples. You can estimate (and hard-code in your script) the ground-truth box for each image manually, using impixelinfo. If there is more than one car per image, draw your bounding box over the detected car. Note that detect gives you one set of x, y coordinates, plus a height and width. You can convert this to two sets of x, y coordinates (for two opposite corners of the box). Then think about how you can subtract coordinates of box corners to get the size of the overlapping area of the two boxes. Divide this area by the area of the union of the two boxes (sum the ground-truth and predicted box areas, but don't double-count the area of the overlap). Report your results in the answers file.
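A minimal IOU sketch following the reasoning above (boxes are in the [x y width height] format returned by detect; the function name is just a suggestion):

    function iou = computeIOU(boxA, boxB)
        % Convert [x y w h] to the far (x2, y2) corner of each box.
        ax2 = boxA(1) + boxA(3);  ay2 = boxA(2) + boxA(4);
        bx2 = boxB(1) + boxB(3);  by2 = boxB(2) + boxB(4);
        % Width and height of the overlap (zero if the boxes do not intersect).
        overlapW = max(0, min(ax2, bx2) - max(boxA(1), boxB(1)));
        overlapH = max(0, min(ay2, by2) - max(boxA(2), boxB(2)));
        intersection = overlapW * overlapH;
        % Union: sum of the two areas, minus the double-counted overlap.
        unionArea = boxA(3)*boxA(4) + boxB(3)*boxB(4) - intersection;
        iou = intersection / unionArea;
    end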


Submission: