- Larger kernel size
The initial AlexNet has five convolutional layers, as defined in the table in Part I.
We observe that the 1st, 2nd, and 5th convolutional layers are each followed by a MaxPool2d layer that downsamples the feature maps.
An alternative strategy is to use larger convolutional kernels (and thus larger receptive fields) together with larger strides, which produce smaller outputs directly.
Please copy your AlexNet into a new class named AlexNetLargeKernel, and implement the model following the architecture given below.
class AlexNetLargeKernel
================================================================================================================
Layer (type)      Kernel    Padding   Stride   Dilation   Output Shape         Param #
----------------------------------------------------------------------------------------------------------------
Conv2d-1          21 x 21   1         8                   [-1, 96, 27, 27]     127,104
ReLU-2                                                    [-1, 96, 27, 27]     0
Conv2d-3          7 x 7     2         2                   [-1, 256, 13, 13]    1,204,480
ReLU-4                                                    [-1, 256, 13, 13]    0
Conv2d-5          3 x 3     1                             [-1, 384, 13, 13]    885,120
ReLU-6                                                    [-1, 384, 13, 13]    0
Conv2d-7          3 x 3     1                             [-1, 384, 13, 13]    1,327,488
ReLU-8                                                    [-1, 384, 13, 13]    0
Conv2d-9          3 x 3               2                   [-1, 256, 6, 6]      884,992
ReLU-10                                                   [-1, 256, 6, 6]      0
Flatten-11                                                [-1, 9216]           0
Dropout-12                                                [-1, 9216]           0
Linear-13                                                 [-1, 4096]           37,752,832
ReLU-14                                                   [-1, 4096]           0
Dropout-15                                                [-1, 4096]           0
Linear-16                                                 [-1, 4096]           16,781,312
ReLU-17                                                   [-1, 4096]           0
Linear-18                                                 [-1, 4]              16,388
================================================================================================================
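For reference, a minimal PyTorch sketch of this architecture is given below. It assumes the standard 3 x 227 x 227 AlexNet input (so that, for example, the first layer gives (227 + 2*1 - 21) / 8 + 1 = 27) and a features/classifier layout; your class from Part I may be organized differently.

import torch
import torch.nn as nn

class AlexNetLargeKernel(nn.Module):
    # Each downsampling MaxPool2d of the original AlexNet is absorbed into a
    # larger-kernel, larger-stride convolution.
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            # (227 + 2*1 - 21) / 8 + 1 = 27: replaces conv1 + pool1
            nn.Conv2d(3, 96, kernel_size=21, stride=8, padding=1), nn.ReLU(inplace=True),
            # (27 + 2*2 - 7) / 2 + 1 = 13: replaces conv2 + pool2
            nn.Conv2d(96, 256, kernel_size=7, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # (13 - 3) / 2 + 1 = 6: replaces conv5 + pool5
            nn.Conv2d(384, 256, kernel_size=3, stride=2), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sanity check: AlexNetLargeKernel()(torch.randn(1, 3, 227, 227)).shape == (1, 4)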
Please use the same optimal hyperparameters as in Part I to train the new model, and report the architecture and accuracy in your report.
- Smaller number of filters
AlexNet was originally designed for ImageNet classification, which classifies a given input image into one of 1000 classes. For the PACS dataset in this assignment, the label space is either 4 domains or 7 object classes, so the task is intuitively much simpler than ImageNet classification.
Therefore, a reasonable change is to reduce the number of filters in the model, yielding AlexNetTiny.
Please copy your AlexNet into a new class named AlexNetTiny, and implement the model following the architecture given below.
class AlexNetTiny
================================================================================================================
Layer (type)      Kernel    Padding   Stride   Dilation   Output Shape         Param #
----------------------------------------------------------------------------------------------------------------
Conv2d-1          11 x 11             4                   [-1, 48, 55, 55]     17,472
ReLU-2                                                    [-1, 48, 55, 55]     0
MaxPool2d-3       3                   2                   [-1, 48, 27, 27]     0
Conv2d-4          5 x 5     2                             [-1, 128, 27, 27]    153,728
ReLU-5                                                    [-1, 128, 27, 27]    0
MaxPool2d-6       3                   2                   [-1, 128, 13, 13]    0
Conv2d-7          3 x 3     1                             [-1, 192, 13, 13]    221,376
ReLU-8                                                    [-1, 192, 13, 13]    0
Conv2d-9          3 x 3     1                             [-1, 192, 13, 13]    331,968
ReLU-10                                                   [-1, 192, 13, 13]    0
Conv2d-11         3 x 3     1                             [-1, 128, 13, 13]    221,312
ReLU-12                                                   [-1, 128, 13, 13]    0
MaxPool2d-13      3                   2                   [-1, 128, 6, 6]      0
Flatten-14                                                [-1, 4608]           0
Dropout-15                                                [-1, 4608]           0
Linear-16                                                 [-1, 2048]           9,439,232
ReLU-17                                                   [-1, 2048]           0
Dropout-18                                                [-1, 2048]           0
Linear-19                                                 [-1, 1024]           2,098,176
ReLU-20                                                   [-1, 1024]           0
Linear-21                                                 [-1, 4]              4,100
================================================================================================================
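Compared with the Part I AlexNet, only the widths change: the convolutional filter counts are roughly halved (48/128/192/192/128 instead of 96/256/384/384/256) and the fully connected layers shrink to 2048 and 1024 units. A sketch of the two stacks, under the same 3 x 227 x 227 input assumption as above:

import torch.nn as nn

tiny_features = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11, stride=4), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(48, 128, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(128, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(192, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(192, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
tiny_classifier = nn.Sequential(
    nn.Flatten(), nn.Dropout(),
    nn.Linear(128 * 6 * 6, 2048), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(2048, 1024), nn.ReLU(inplace=True),
    nn.Linear(1024, 4),
)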
Please use the same optimal hyperparameters as in Part I to train the new model, and report the architecture and accuracy in your report.
- Pooling strategies
Another tweak to AlexNet concerns the pooling layers. Instead of MaxPool2d, another common pooling strategy is AvgPool2d, which averages all activations within the receptive field.
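In code this is a drop-in swap; both layers use the same kernel size and stride, so the output shapes are unchanged:

import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)                      # e.g. the conv1 output
max_out = nn.MaxPool2d(kernel_size=3, stride=2)(x)  # keeps the max of each 3x3 window
avg_out = nn.AvgPool2d(kernel_size=3, stride=2)(x)  # averages each 3x3 window
print(max_out.shape, avg_out.shape)                 # both: torch.Size([1, 96, 27, 27])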
Please copy your AlexNet into a new class named AlexNetAvgPooling, and implement the model following the architecture given below.
class AlexNetAvgPooling
================================================================================================================
Layer (type)      Kernel    Padding   Stride   Dilation   Output Shape         Param #
----------------------------------------------------------------------------------------------------------------
Conv2d-1          11 x 11             4                   [-1, 96, 55, 55]     34,944
ReLU-2                                                    [-1, 96, 55, 55]     0
AvgPool2d-3       3                   2                   [-1, 96, 27, 27]     0
Conv2d-4          5 x 5     2                             [-1, 256, 27, 27]    614,656
ReLU-5                                                    [-1, 256, 27, 27]    0
AvgPool2d-6       3                   2                   [-1, 256, 13, 13]    0
Conv2d-7          3 x 3     1                             [-1, 384, 13, 13]    885,120
ReLU-8                                                    [-1, 384, 13, 13]    0
Conv2d-9          3 x 3     1                             [-1, 384, 13, 13]    1,327,488
ReLU-10                                                   [-1, 384, 13, 13]    0
Conv2d-11         3 x 3     1                             [-1, 256, 13, 13]    884,992
ReLU-12                                                   [-1, 256, 13, 13]    0
AvgPool2d-13      3                   2                   [-1, 256, 6, 6]      0
Flatten-14                                                [-1, 9216]           0
Dropout-15                                                [-1, 9216]           0
Linear-16                                                 [-1, 4096]           37,752,832
ReLU-17                                                   [-1, 4096]           0
Dropout-18                                                [-1, 4096]           0
Linear-19                                                 [-1, 4096]           16,781,312
ReLU-20                                                   [-1, 4096]           0
Linear-21                                                 [-1, 4]              16,388
================================================================================================================
Please use the same optimal hyperparameters as in Part I to train the new model, and report the architecture and accuracy in your report.
- Dilated convolutions (a.k.a. atrous convolutions)
Dilated convolution introduces a new parameter, the dilation rate, to the traditional convolution. Briefly speaking, by injecting holes into the convolution kernel, the receptive field can be enlarged without increasing the number of parameters.
You may find this blog post and the interactive visualization tool helpful for understanding the concept.
Also, recall the equation for calculating the output spatial size when changing the hyperparameters of a convolution layer (the division is a floor division):
output = floor[(input + 2 * padding - kernel - (kernel - 1) * (dilation - 1)) / stride] + 1
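As a quick sanity check (the helper below is our own, not part of the assignment code), plugging in the first convolution of AlexNetDilation from the table below reproduces its output size:

def conv_output_size(input_size, kernel, padding=0, stride=1, dilation=1):
    # Integer (floor) division matches PyTorch's shape computation.
    return (input_size + 2 * padding - kernel - (kernel - 1) * (dilation - 1)) // stride + 1

# 11 x 11 kernel, padding 5, stride 4, dilation 2 on a 227 x 227 input:
# effective kernel = 11 + (11 - 1) * (2 - 1) = 21, so (227 + 10 - 21) // 4 + 1 = 55
print(conv_output_size(227, kernel=11, padding=5, stride=4, dilation=2))  # 55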
Please copy your AlexNet into a new class named AlexNetDilation, and implement the model following the architecture given below.
class AlexNetDilation
================================================================================================================
Layer (type)      Kernel    Padding   Stride   Dilation   Output Shape         Param #
----------------------------------------------------------------------------------------------------------------
Conv2d-1          11 x 11   5         4        2          [-1, 96, 55, 55]     34,944
ReLU-2                                                    [-1, 96, 55, 55]     0
MaxPool2d-3       3                   2                   [-1, 96, 27, 27]     0
Conv2d-4          5 x 5     4                  2          [-1, 256, 27, 27]    614,656
ReLU-5                                                    [-1, 256, 27, 27]    0
MaxPool2d-6       3                   2                   [-1, 256, 13, 13]    0
Conv2d-7          3 x 3     2                  2          [-1, 384, 13, 13]    885,120
ReLU-8                                                    [-1, 384, 13, 13]    0
Conv2d-9          3 x 3     2                  2          [-1, 384, 13, 13]    1,327,488
ReLU-10                                                   [-1, 384, 13, 13]    0
Conv2d-11         3 x 3     2                  2          [-1, 256, 13, 13]    884,992
ReLU-12                                                   [-1, 256, 13, 13]    0
MaxPool2d-13      3                   2                   [-1, 256, 6, 6]      0
Flatten-14                                                [-1, 9216]           0
Dropout-15                                                [-1, 9216]           0
Linear-16                                                 [-1, 4096]           37,752,832
ReLU-17                                                   [-1, 4096]           0
Dropout-18                                                [-1, 4096]           0
Linear-19                                                 [-1, 4096]           16,781,312
ReLU-20                                                   [-1, 4096]           0
Linear-21                                                 [-1, 4]              16,388
================================================================================================================
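A sketch of the corresponding convolutional stack (same 3 x 227 x 227 input assumption; the classifier is identical to Part I): every convolution uses dilation=2, with the padding enlarged so that, by the equation above, all output shapes match the original AlexNet.

import torch.nn as nn

dilated_features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=5, dilation=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=4, dilation=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=2, dilation=2), nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=2, dilation=2), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=2, dilation=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)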
Please use the same optimal hyperparameters as in Part I to train the new model, and report the architecture and accuracy in your report.