Due: 4/10/2024, 11:59pm
This assignment is worth 50 points.
In this assignment, you will implement essential elements of a
transformer decoder for image captioning. Additionally, you will be
introduced to self-supervised learning through SimCLR, a framework that
enhances visual understanding and representation learning using
contrastive learning for image classification.
Starter code is provided here.
You need to save a copy in your own Google Drive so that you can edit it.
There are two self-explanatory notebooks related to transformers and
self-supervised learning. You are asked to edit python files (.py)
inside the starter code to implement related components. The code
sections where you should start and end your implementation are
explicitly identified by comment lines. These python files are imported
in the notebooks, so you should also use the notebooks for background
information, to follow the instructions, and to test your
implementations. A small portion of the COCO dataset is utilized for
captioning while implementing transformers. Meanwhile, the CIFAR10
dataset is employed for image classification in self-supervised
learning, with both datasets being automatically downloaded by the
respective notebooks. While grading, the outputs from the notebooks will
primarily be used, but submitting the modified python files is also
required.
Part A: Implementing Transformers (25 Points)
Please follow the detailed instructions in the "Transformer_Captioning.ipynb" notebook. The components that you need to implement are listed in the bullets below. After completing the implementations in the python files, you should run the related cells in the notebook to test your implementations.
- MultiHeadAttention class in the file cs1678_2078/transformer_layers.py
- PositionalEncoding class in the file cs1678_2078/transformer_layers.py
- CaptioningTransformer class in the file cs1678_2078/classifiers/transformer.py
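As background for the first two bullets, here is a minimal NumPy sketch of sinusoidal positional encoding and single-head scaled dot-product attention with a causal mask. It is an illustration under assumptions, not the starter code's interface: the function names, argument order, and mask convention (True means "may attend") are hypothetical, and the notebook may specify different shapes, dropout, and learned projections. A multi-head layer would apply the same attention to several slices of the embedding in parallel and concatenate the results.

```python
import numpy as np


def sinusoidal_positional_encoding(max_len, embed_dim):
    """Return a (max_len, embed_dim) table of sine/cosine position codes.

    Hypothetical helper for illustration; assumes an even embed_dim.
    """
    assert embed_dim % 2 == 0, "this sketch assumes an even embedding size"
    positions = np.arange(max_len)[:, None]  # shape (max_len, 1)
    # Frequencies 1 / 10000^(2i / embed_dim) for i = 0 .. embed_dim/2 - 1.
    freqs = np.exp(-np.log(10000.0) * np.arange(0, embed_dim, 2) / embed_dim)
    table = np.zeros((max_len, embed_dim))
    table[:, 0::2] = np.sin(positions * freqs)  # even columns use sine
    table[:, 1::2] = np.cos(positions * freqs)  # odd columns use cosine
    return table


def scaled_dot_product_attention(query, key, value, mask=None):
    """Single-head attention.

    query: (T, d); key and value: (S, d).
    mask: optional (T, S) boolean array, True where attention is allowed
    (an assumed convention). Every row needs at least one True entry.
    Returns the attended values (T, d) and the attention weights (T, S).
    """
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked positions get weight 0
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ value, weights
```

With a lower-triangular mask such as np.tril(np.ones((T, T), dtype=bool)), each position attends only to itself and earlier positions, which is the behavior a caption decoder relies on during training.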
Part B: Self-Supervised Learning (25 Points)
Please follow the detailed instructions in the "Self_Supervised_Learning.ipynb" notebook. The components that you need to implement are listed in the bullets below. After completing the implementations in the python files, you should run the related cells in the notebook to test your implementations.
- compute_train_transform() and CIFAR10Pair.__getitem__() functions for the data augmentation transform in cs1678_2078/simclr/data_utils.py
- sim and simclr_loss_naive in cs1678_2078/simclr/contrastive_loss.py
- sim_positive_pairs, compute_sim_matrix, simclr_loss_vectorized in cs1678_2078/simclr/contrastive_loss.py
- train function in cs1678_2078/simclr/utils.py
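As background for the loss functions listed above, the following is a minimal NumPy sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss that SimCLR uses. It is a sketch under assumptions rather than the expected solution: the function name nt_xent_loss, the default temperature of 0.5, and the argument layout are hypothetical, and the starter code's functions may use different signatures and a different framework.

```python
import numpy as np


def nt_xent_loss(z_i, z_j, tau=0.5):
    """Contrastive loss over N positive pairs.

    z_i, z_j: (N, D) embeddings of two augmented views of the same N images,
    so row k of z_i and row k of z_j form a positive pair.
    tau: temperature (0.5 is an assumed default for this sketch).
    """
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)            # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = z @ z.T / tau                               # cosine similarity / tau
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative

    # Log of the denominator: log-sum-exp over every other sample in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    # The positive for row k is row k + N, and the positive for row k + N is row k.
    pos_index = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    pos = sim[np.arange(2 * n), pos_index]

    # Average of -log(exp(pos) / sum of exp over all others) across the 2N rows.
    return float(np.mean(log_denom - pos))
```

Computing the full 2N x 2N similarity matrix once, as here, corresponds to the vectorized style of the loss; a naive version would loop over the pairs and evaluate the same per-row terms one at a time.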
Please submit the CS1678_2078_HW4 folder after completing the implementations and running the related cells in the notebooks. Before compressing (zipping) the CS1678_2078_HW4 folder from Colab, please exclude the cs1678_2078/datasets/coco_captioning and pretrained_model directories to reduce the size of the submission (otherwise, the file would exceed the size limit for submissions on Canvas).