CS 1699: Homework 4

Due: March 31, 2020 23:59 EST

This assignment is worth 50 points. Please contact Mingda Zhang (mzhang@cs.pitt.edu) if you have any issues/questions regarding this assignment.
We strongly recommend spending some time (no more than 1-2 hours) reading the recommended implementation and our starter code before you start on your own.
Excluding the time for training the models (please leave a few days for training), we expect this assignment to take no more than 4-6 hours.


Part I: Implement AlexNet - Winner of ILSVRC 2012 (15 points)

AlexNet is a milestone in the resurgence of deep learning, and it astonished the computer vision community by winning the ILSVRC 2012 by a large margin.
In this assignment, you need to implement the original AlexNet using PyTorch. The model architecture is shown in the following figure, which is from their original paper.
More specifically, your AlexNet should have the following architecture (e.g. for the domain prediction task):
================================================================================================================
    Layer (type)      Kernel      Padding      Stride      Dilation          Output Shape           Param #     
----------------------------------------------------------------------------------------------------------------
        Conv2d-1     11 x 11                        4                    [-1, 96, 55, 55]            34,944
          ReLU-2                                                         [-1, 96, 55, 55]                 0
     MaxPool2d-3           3                        2                    [-1, 96, 27, 27]                 0
        Conv2d-4       5 x 5            2                               [-1, 256, 27, 27]           614,656
          ReLU-5                                                        [-1, 256, 27, 27]                 0
     MaxPool2d-6           3                        2                   [-1, 256, 13, 13]                 0
        Conv2d-7       3 x 3            1                               [-1, 384, 13, 13]           885,120
          ReLU-8                                                        [-1, 384, 13, 13]                 0
        Conv2d-9       3 x 3            1                               [-1, 384, 13, 13]         1,327,488
         ReLU-10                                                        [-1, 384, 13, 13]                 0
       Conv2d-11       3 x 3            1                               [-1, 256, 13, 13]           884,992
         ReLU-12                                                        [-1, 256, 13, 13]                 0
    MaxPool2d-13           3                        2                     [-1, 256, 6, 6]                 0
      Flatten-14                                                               [-1, 9216]                 0
      Dropout-15                                                               [-1, 9216]                 0
       Linear-16                                                               [-1, 4096]        37,752,832
         ReLU-17                                                               [-1, 4096]                 0
      Dropout-18                                                               [-1, 4096]                 0
       Linear-19                                                               [-1, 4096]        16,781,312
         ReLU-20                                                               [-1, 4096]                 0
       Linear-21                                                                  [-1, 4]            16,388
================================================================================================================
Note: The `-1` in Output shape represents `batch_size`, which is flexible during program execution.
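
The table above can be sketched as a PyTorch module. This is a minimal sketch, not the starter code itself: the class layout, module names (`features`, `classifier`), and argument defaults here are illustrative assumptions, and the 227x227 input size is inferred from the first row of the table.

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Sketch of the Part I architecture; num_classes=4 for domain prediction."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # -> [96, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> [96, 27, 27]
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # -> [256, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> [256, 13, 13]
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # -> [384, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # -> [384, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # -> [256, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> [256, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                   # -> [9216]
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = AlexNet(num_classes=4)
out = model(torch.randn(2, 3, 227, 227))
print(out.shape)  # torch.Size([2, 4])
```

Summing the Param # column of the table gives 58,297,732 parameters, which you can use to check your implementation against `sum(p.numel() for p in model.parameters())`.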
    
However, before you get started, we have several hints that will hopefully make your life easier.

Instructions:
  1. Download the dataset and the starter code.
  2. Complete the implementation of the class AlexNet, and train the model for the domain prediction task. A sample usage of the provided training/evaluation script is
    python trainer.py --task_type=training --label_type=domain --learning_rate=0.001 --batch_size=128 --experiment_name=demo
    
  3. You may need to tune the hyperparameters to achieve better performance.
  4. Report the model architecture (i.e. call print(model) and copy/paste the output in your report), as well as the accuracy on the validation set in your writeup.

Part II: Enhancing AlexNet (20 points)

In this part, you need to modify the AlexNet from the previous part, and train different models with the following changes.
Just a friendly reminder: if you implemented AlexNet following our recommendation, it should be very easy (e.g. just changing a few lines) to perform the following modifications.

Instructions:
  1. Larger kernel size
    The initial AlexNet has 5 convolutional layers, as defined in the table in Part I.
    We observe that the 1st, 2nd and 5th convolutional layers are each followed by a MaxPool2d layer to downsample the feature maps.
    An alternative strategy is to use a larger convolutional kernel (and thus a larger receptive field) with a larger stride, which directly produces a smaller output.
    Please copy your AlexNet to a new class named AlexNetLargeKernel, and implement the model following the architecture given below.
    class AlexNetLargeKernel
    ================================================================================================================
        Layer (type)      Kernel      Padding      Stride      Dilation          Output Shape           Param #     
    ----------------------------------------------------------------------------------------------------------------
            Conv2d-1     21 x 21            1           8                    [-1, 96, 27, 27]           127,104
              ReLU-2                                                         [-1, 96, 27, 27]                 0
            Conv2d-3       7 x 7            2           2                   [-1, 256, 13, 13]         1,204,480
              ReLU-4                                                        [-1, 256, 13, 13]                 0
            Conv2d-5       3 x 3            1                               [-1, 384, 13, 13]           885,120
              ReLU-6                                                        [-1, 384, 13, 13]                 0
            Conv2d-7       3 x 3            1                               [-1, 384, 13, 13]         1,327,488
              ReLU-8                                                        [-1, 384, 13, 13]                 0
            Conv2d-9       3 x 3                        2                     [-1, 256, 6, 6]           884,992
             ReLU-10                                                          [-1, 256, 6, 6]                 0
          Flatten-11                                                               [-1, 9216]                 0
          Dropout-12                                                               [-1, 9216]                 0
           Linear-13                                                               [-1, 4096]        37,752,832
             ReLU-14                                                               [-1, 4096]                 0
          Dropout-15                                                               [-1, 4096]                 0
           Linear-16                                                               [-1, 4096]        16,781,312
             ReLU-17                                                               [-1, 4096]                 0
           Linear-18                                                                  [-1, 4]            16,388
    ================================================================================================================
        
    Please use the same optimal hyperparameters as in Part I to train the new model, and report its architecture and accuracy in your report.
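
    As a sketch of the substitution, the three conv + MaxPool2d pairs can each be collapsed into a single strided convolution. The variable names below are illustrative, and a 227x227 input is assumed as in Part I:

```python
import torch
import torch.nn as nn

# Strided convolutions standing in for the three conv + MaxPool2d pairs.
conv1 = nn.Conv2d(3, 96, kernel_size=21, stride=8, padding=1)   # replaces conv1 + pool
conv2 = nn.Conv2d(96, 256, kernel_size=7, stride=2, padding=2)  # replaces conv2 + pool
conv5 = nn.Conv2d(384, 256, kernel_size=3, stride=2)            # replaces conv5 + pool

x = torch.randn(1, 3, 227, 227)
y1 = conv1(x)
print(y1.shape)         # torch.Size([1, 96, 27, 27])
print(conv2(y1).shape)  # torch.Size([1, 256, 13, 13])
```

    Note that the shapes and parameter counts (e.g. 21 * 21 * 3 * 96 + 96 = 127,104 for the first layer) match the table above.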

  2. Smaller number of filters
    AlexNet was originally designed for ImageNet classification, which aims to classify a given input image into 1000 classes. For the PACS dataset in this assignment, the label is one of 4 domains or 7 classes, so intuitively the task is much simpler than ImageNet classification.
    Therefore, a reasonable change is to reduce the number of filters in the model, i.e. AlexNetTiny.
    Please copy your AlexNet to a new class named AlexNetTiny, and implement the model following the architecture given below.
    class AlexNetTiny
    ================================================================================================================
        Layer (type)      Kernel      Padding      Stride      Dilation          Output Shape           Param #
    ----------------------------------------------------------------------------------------------------------------
            Conv2d-1     11 x 11                        4                    [-1, 48, 55, 55]            17,472
              ReLU-2                                                         [-1, 48, 55, 55]                 0
         MaxPool2d-3           3                        2                    [-1, 48, 27, 27]                 0
            Conv2d-4       5 x 5            2                               [-1, 128, 27, 27]           153,728
              ReLU-5                                                        [-1, 128, 27, 27]                 0
         MaxPool2d-6           3                        2                   [-1, 128, 13, 13]                 0
            Conv2d-7       3 x 3            1                               [-1, 192, 13, 13]           221,376
              ReLU-8                                                        [-1, 192, 13, 13]                 0
            Conv2d-9       3 x 3            1                               [-1, 192, 13, 13]           331,968
             ReLU-10                                                        [-1, 192, 13, 13]                 0
           Conv2d-11       3 x 3            1                               [-1, 128, 13, 13]           221,312
             ReLU-12                                                        [-1, 128, 13, 13]                 0
        MaxPool2d-13           3                        2                     [-1, 128, 6, 6]                 0
          Flatten-14                                                               [-1, 4608]                 0
          Dropout-15                                                               [-1, 4608]                 0
           Linear-16                                                               [-1, 2048]         9,439,232
             ReLU-17                                                               [-1, 2048]                 0
          Dropout-18                                                               [-1, 2048]                 0
           Linear-19                                                               [-1, 1024]         2,098,176
             ReLU-20                                                               [-1, 1024]                 0
           Linear-21                                                                  [-1, 4]             4,100
    ================================================================================================================
        
    Please use the same optimal hyperparameters as in Part I to train the new model, and report its architecture and accuracy in your report.

  3. Pooling strategies
    Another tweak to AlexNet concerns the pooling layers. Instead of MaxPool2d, another common pooling strategy is AvgPool2d, i.e. averaging all the neurons in the receptive field.
    Please copy your AlexNet to a new class named AlexNetAvgPooling, and implement the model following the architecture given below.
    class AlexNetAvgPooling
    ================================================================================================================
        Layer (type)      Kernel      Padding      Stride      Dilation          Output Shape           Param #
    ----------------------------------------------------------------------------------------------------------------
            Conv2d-1     11 x 11                        4                    [-1, 96, 55, 55]            34,944
              ReLU-2                                                         [-1, 96, 55, 55]                 0
         AvgPool2d-3           3                        2                    [-1, 96, 27, 27]                 0
            Conv2d-4       5 x 5            2                               [-1, 256, 27, 27]           614,656
              ReLU-5                                                        [-1, 256, 27, 27]                 0
         AvgPool2d-6           3                        2                   [-1, 256, 13, 13]                 0
            Conv2d-7       3 x 3            1                               [-1, 384, 13, 13]           885,120
              ReLU-8                                                        [-1, 384, 13, 13]                 0
            Conv2d-9       3 x 3            1                               [-1, 384, 13, 13]         1,327,488
             ReLU-10                                                        [-1, 384, 13, 13]                 0
           Conv2d-11       3 x 3            1                               [-1, 256, 13, 13]           884,992
             ReLU-12                                                        [-1, 256, 13, 13]                 0
        AvgPool2d-13           3                        2                     [-1, 256, 6, 6]                 0
          Flatten-14                                                               [-1, 9216]                 0
          Dropout-15                                                               [-1, 9216]                 0
           Linear-16                                                               [-1, 4096]        37,752,832
             ReLU-17                                                               [-1, 4096]                 0
          Dropout-18                                                               [-1, 4096]                 0
           Linear-19                                                               [-1, 4096]        16,781,312
             ReLU-20                                                               [-1, 4096]                 0
           Linear-21                                                                  [-1, 4]            16,388
    ================================================================================================================
        
    Please use the same optimal hyperparameters as in Part I to train the new model, and report its architecture and accuracy in your report.
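
    A toy comparison of the two pooling strategies on a single 2x2 receptive field (the values here are made up for illustration):

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])              # shape [1, 1, 2, 2]
print(nn.MaxPool2d(kernel_size=2)(x).item())  # 4.0 (maximum of the field)
print(nn.AvgPool2d(kernel_size=2)(x).item())  # 2.5 (mean of the field)
```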

  4. Dilated convolutions (a.k.a. atrous convolutions)
    Dilated convolution introduces a new parameter, the dilation rate, to traditional convolutions. Briefly speaking, by injecting holes into the convolution kernels, the receptive field can be enlarged.
    You may find this blog post and the interactive visualization tool helpful for understanding the concept.
    Also, recall the equation for calculating the output shape when changing the hyperparameters of a convolution kernel (the division rounds down):
    output = floor[(input + 2 * padding - dilation * (kernel - 1) - 1) / stride] + 1
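
    The equation can be sanity-checked with a small helper function (pure Python; the 227x227 input size is inferred from the Part I table):

```python
def conv_out(size, kernel, padding=0, stride=1, dilation=1):
    """Output spatial size along one dimension for Conv2d/MaxPool2d."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Spot-check rows of the AlexNetDilation table below:
print(conv_out(227, 11, padding=5, stride=4, dilation=2))  # 55
print(conv_out(27, 5, padding=4, dilation=2))              # 27
print(conv_out(13, 3, padding=2, dilation=2))              # 13
```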
      
    Please copy your AlexNet to a new class named AlexNetDilation, and implement the model following the architecture given below.
    class AlexNetDilation
    ================================================================================================================
        Layer (type)      Kernel      Padding      Stride      Dilation          Output Shape           Param #
    ----------------------------------------------------------------------------------------------------------------
            Conv2d-1     11 x 11            5           4             2      [-1, 96, 55, 55]            34,944
              ReLU-2                                                         [-1, 96, 55, 55]                 0
         MaxPool2d-3           3                        2                    [-1, 96, 27, 27]                 0
            Conv2d-4       5 x 5            4                         2     [-1, 256, 27, 27]           614,656
              ReLU-5                                                        [-1, 256, 27, 27]                 0
         MaxPool2d-6           3                        2                   [-1, 256, 13, 13]                 0
            Conv2d-7       3 x 3            2                         2     [-1, 384, 13, 13]           885,120
              ReLU-8                                                        [-1, 384, 13, 13]                 0
            Conv2d-9       3 x 3            2                         2     [-1, 384, 13, 13]         1,327,488
             ReLU-10                                                        [-1, 384, 13, 13]                 0
           Conv2d-11       3 x 3            2                         2     [-1, 256, 13, 13]           884,992
             ReLU-12                                                        [-1, 256, 13, 13]                 0
        MaxPool2d-13           3                        2                     [-1, 256, 6, 6]                 0
          Flatten-14                                                               [-1, 9216]                 0
          Dropout-15                                                               [-1, 9216]                 0
           Linear-16                                                               [-1, 4096]        37,752,832
             ReLU-17                                                               [-1, 4096]                 0
          Dropout-18                                                               [-1, 4096]                 0
           Linear-19                                                               [-1, 4096]        16,781,312
             ReLU-20                                                               [-1, 4096]                 0
           Linear-21                                                                  [-1, 4]            16,388
    ================================================================================================================
        
    Please use the same optimal hyperparameters as in Part I to train the new model, and report its architecture and accuracy in your report.

Part III: Visualizing Learned Filters (15 points)

Different from hand-crafted features, the features a convolutional neural network extracts from input images are learned automatically and are thus difficult for humans to interpret. A useful strategy is to visualize the kernels learned from data. In this part, you are asked to examine the kernels learned by AlexNet for two different tasks, i.e. classifying domains or classes.
You need to compare the kernels in different layers of the two models (trained for the two different tasks), and see whether the kernels exhibit different patterns.
For your convenience, we provided a visualization function in the starter code (visualize_kernels).
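If you want to prepare the weights yourself before plotting, the first conv layer's filters can be extracted and normalized per kernel. This is only a sketch: `conv1` below is a hypothetical untrained stand-in for your trained model's first Conv2d layer, which you would fetch from your own model instead.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained model's first layer; on a real model,
# fetch it with e.g. next(m for m in model.modules() if isinstance(m, nn.Conv2d)).
conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)

w = conv1.weight.detach()  # shape [96, 3, 11, 11]: 96 RGB kernels
# Normalize each kernel to [0, 1] so it can be rendered as an image tile.
lo = w.amin(dim=(1, 2, 3), keepdim=True)
hi = w.amax(dim=(1, 2, 3), keepdim=True)
w = (w - lo) / (hi - lo + 1e-8)
print(w.shape)  # torch.Size([96, 3, 11, 11])
```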

Submission: Please include the following files: