CS 1699: Homework 5

Due: April 16, 11:59 PM (EST)

This assignment is worth 50 points. Please contact Mingda Zhang (mzhang@cs.pitt.edu) if you have any issues/questions regarding this assignment.
Before you start, we have provided the starter code for this assignment here. We strongly recommend that you spend some time reading the recommended implementation in the starter code.
Excluding the time for training the models (please leave a few days for training), we expect this assignment to take no more than 12 hours.
Updates: You are not allowed to use the native torch.nn.LSTMCell or other built-in RNN modules in this assignment.

Part I: Sentiment analysis on IMDB reviews (20 points)

The Large Movie Review Dataset (IMDB) is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 50,000 highly polar movie reviews. We have split the dataset into training (45,000) and test (5,000). The positive:negative ratio is 1:1 in both splits.
In this task, you need to develop an RNN model to "read" each review and then predict whether it is positive or negative.
In the provided starter code, we implemented an RNN pipeline (sentiment_analysis.py) and a GRUCell (rnn_modules.py) as an example.
You need to build a few other variants of RNN modules, specifically as explained in this blog.

Instructions:
  1. (2.5 points) Read the code in datasets.py and sentiment_analysis.py, then run the training with the provided GRU cell. You should be able to achieve at least 85% accuracy in 50 epochs with the default hyperparameters.
    Attach the figures (either TensorBoard screenshots or your own plots) of (1) training loss, (2) training accuracy per epoch and (3) validation accuracy per epoch in your report.
  2. (4 points each) Implement the following three variants in rnn_modules.py; details are in the blog, in the section named Variants on Long Short Term Memory. Please note that the classes are already provided for you, and you only need to complete the __init__ and forward functions. Please do NOT change the signatures.
    Finish these three classes and include them in your submission. (An illustrative sketch of one possible cell implementation appears after this list.)
  3. (2.5 points) Print the number of model parameters for the different module types (GRUCell, LSTMCell, PeepholedLSTMCell and CoupledLSTMCell) using count_parameters, and include the comparison in your report.
    Use the following hyperparameters for comparison: input_size=128, hidden_size=100, bias=True.
  4. (3 points) Run experiments with your custom implementations on sentiment analysis with the IMDB dataset, and compare the results with the GRU, including both speed and performance.
    Attach the training loss and training/validation accuracy plot in your report.
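To give a concrete idea of what item 2 asks for, below is a minimal, illustrative sketch of a peephole LSTM cell following the equations in the blog's Variants on Long Short Term Memory section. This is NOT the starter-code implementation: the class name, layer layout, and return convention are assumptions, and your code must keep the signatures already defined in rnn_modules.py.

    import torch
    import torch.nn as nn

    class PeepholedLSTMCellSketch(nn.Module):
        # Illustrative only; not the starter-code class.
        # Forget/input gates peek at c_{t-1}; the output gate peeks at the new c_t.
        def __init__(self, input_size, hidden_size, bias=True):
            super().__init__()
            self.hidden_size = hidden_size
            # Forget and input gates see [c_{t-1}, h_{t-1}, x_t].
            self.forget_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size, bias=bias)
            self.input_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size, bias=bias)
            # Output gate sees [c_t, h_{t-1}, x_t] (the updated cell state).
            self.output_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size, bias=bias)
            # Cell candidate sees [h_{t-1}, x_t], as in the vanilla LSTM.
            self.cell_candidate = nn.Linear(input_size + hidden_size, hidden_size, bias=bias)

        def forward(self, x, state):
            h_prev, c_prev = state                       # each of shape (batch, hidden_size)
            hx = torch.cat([h_prev, x], dim=1)
            f = torch.sigmoid(self.forget_gate(torch.cat([c_prev, hx], dim=1)))
            i = torch.sigmoid(self.input_gate(torch.cat([c_prev, hx], dim=1)))
            g = torch.tanh(self.cell_candidate(hx))
            c = f * c_prev + i * g                        # coupled variant instead uses: c = f * c_prev + (1 - f) * g
            o = torch.sigmoid(self.output_gate(torch.cat([c, hx], dim=1)))
            h = o * torch.tanh(c)
            return h, c

For item 3, one possible way to produce the comparison is sketched below, assuming all four cell classes live in rnn_modules.py and that count_parameters simply sums the trainable parameter counts (its exact starter-code behavior may differ).

    from rnn_modules import GRUCell, LSTMCell, PeepholedLSTMCell, CoupledLSTMCell

    def count_parameters(module):
        # Assumed behavior of the starter-code helper.
        return sum(p.numel() for p in module.parameters() if p.requires_grad)

    for cell_cls in (GRUCell, LSTMCell, PeepholedLSTMCell, CoupledLSTMCell):
        cell = cell_cls(input_size=128, hidden_size=100, bias=True)
        print(f"{cell_cls.__name__}: {count_parameters(cell)} parameters")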

Part II: Building a Shakespeare writer (15 points)

RNNs have demonstrated great potential in modeling language, and one interesting property is that they can "learn" to generate new sentences.
In this task, you need to develop a character-based RNN model (meaning that instead of words, the RNN processes one character at a time) to learn how to write like Shakespeare.

Instructions:
  1. (4 pts) Read the code in datasets.py and sentence_generation.py, and complete the SentenceGeneration class, which is a character-level RNN.
    You can reuse (by copy/paste) most of the code from SentimentClassification in Part I; just note that instead of predicting positive or negative, your task is now to predict the next character given a sequence of characters (the history).
  2. (4 pts) Train the model with the GRU module on the Shakespeare corpus. You should be able to achieve a loss value of 1.2 in 10 epochs with the default hyperparameters. (Update: you probably need to use an embedding_dim of 256 and a hidden_size of 512 to reach the above loss value.)
    If you are interested, you can also try your own LSTM variants, but the experiment with the GRU is required.
  3. (7 pts) Complete the function in sentence_generation.py to load your trained model and generate new sentences from it.
    Basically, once a language model is trained, it is able to predict the next character after a sequence, and this process can be continued (the predicted character becomes part of the history for predicting the next one).
    More specifically, your model should be able to predict the probability distribution over the vocabulary for the next character, and we have implemented a sampler, sample_next_char_id, which samples according to that distribution. By repeating this process, your model can write arbitrarily long paragraphs. (An illustrative sketch of this generation loop appears after the example passage below.)
    For example, the following passage was written by a GRU trained on Shakespeare:
    ROMEO:Will't Marcius Coriolanus and envy of smelling!
    
    DUKE VINCENTIO:
    He seems muster in the shepherd's bloody winds;
    Which any hand and my folder sea fast,
    Last vantage doth be willing forth to have.
    Sirraher comest that opposite too?
    
    JULIET:
    Are there incensed to my feet relation!
    Down with mad appelate bargage! troubled
    My brains loved and swifter than edwards:
    Or, hency, thy fair bridging courseconce,
    Or else had slept a traitors in mine own.
    Look, Which canst thou have no thought appear.
    
    ROMEO:
    Give me them: for that I put us empty.
    
    RIVERS:
    The shadow doth not live: and he would not
    From thee for his that office past confusion
    Is their great expecteth on the wheek;
    But not the noble fathom were an poison
    Here come to make a dukedom: therefore--
    But O, God grant! for Signior HERY
    
    VI:
    Soft love, that Lord Angelo: then blaze me all;
    And slept not without a Calivan Us.
    
    Note that the model learns how to spell each word and write sentence-like paragraphs all by itself, even including punctuation and line breaks.
    Please use ROMEO and JULIET as history to begin the generation for 1000 characters each, and attach the generated text in your report.
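As a reference for item 3, here is a minimal, illustrative sketch of the generation loop. The model's output shape, the char2idx / idx2char mappings, and the exact signature of the provided sample_next_char_id sampler are assumptions; adapt them to the actual interfaces in sentence_generation.py.

    import torch

    def generate_text(model, history, char2idx, idx2char, sample_next_char_id,
                      length=1000, device="cpu"):
        # history: seed string, e.g. "ROMEO"; length: number of characters to generate.
        model.eval()
        generated = list(history)
        with torch.no_grad():
            for _ in range(length):
                # Encode everything generated so far as a (1, seq_len) tensor of character ids.
                # (Re-encoding the whole history each step keeps the sketch simple; a stateful
                # loop that carries the hidden state forward would be faster.)
                input_ids = torch.tensor([[char2idx[ch] for ch in generated]], device=device)
                logits = model(input_ids)                     # assumed shape: (1, seq_len, vocab_size)
                probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next character
                next_id = sample_next_char_id(probs)          # provided sampler; exact signature assumed
                generated.append(idx2char[int(next_id)])
        return "".join(generated)

    # e.g. generate_text(model, "ROMEO", char2idx, idx2char, sample_next_char_id, length=1000)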

Part III: Visualization of the LSTM gates (15 points)

We provide an RNN model with a GRU module trained on War and Peace, and you should visualize the internal activations of the learned model to reveal the mechanism of the GRU.
  1. (3 pts) Read the code in visualization.py and complete the function to visualize the per-step, per-hidden-cell activations in your RNN using a heatmap. The provided model checkpoint is in the data directory ("war_and_peace_model_checkpoint.pt"). You may reuse some visualization code from your assignment 4; an illustrative plotting sketch also appears after this list. (Updates: we have implemented the model for you in the class VisualizeInternalGates and the dataset in VisualizeWarAndPeaceDataset. You can just build your model and dataset from these two classes.)
  2. (2 pts) Visualize the responses on the selected sentences in data/war_and_peace_visualize.txt, including the update gate, reset gate, and internal cell candidates.
  3. (7 pts) Modify the code to visualize the different gates for the LSTM models you trained in Part I. You can reuse (but may need minor modifications to) the code in VisualizeInternalGates and VisualizeGRUCell.
  4. (3 pts) Describe what you observed from the visualizations. (Updates: you do not need to show all the images; just pick the ones that support your observations. In other words, the visualization is just a way to get a better understanding of how the model works, and you need to collect evidence that supports your claims.)
    More specifically, you should look for general patterns in the figure. For example, in the figure below for the update gate, each row represents one hidden cell and each column represents a character in the sequence (after that character has been processed). You should look at the image zoomed out and look for thick columns that are noticeably different from their surroundings.
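For the heatmaps in items 1 through 3, a minimal matplotlib sketch is shown below. It assumes you have already collected gate_values as a (hidden_size, seq_len) array of per-step activations (e.g. from VisualizeInternalGates) and chars as the corresponding input characters; the function name and arguments are illustrative, not part of the starter code.

    import matplotlib.pyplot as plt
    import numpy as np

    def plot_gate_heatmap(gate_values, chars, title="update gate"):
        # gate_values: (hidden_size, seq_len) activations; chars: the seq_len input characters.
        values = np.asarray(gate_values)
        fig, ax = plt.subplots(figsize=(max(6, 0.25 * len(chars)), 6))
        im = ax.imshow(values, aspect="auto", cmap="viridis")
        ax.set_xticks(range(len(chars)))
        ax.set_xticklabels(list(chars), fontsize=6)
        ax.set_xlabel("characters in the sequence")
        ax.set_ylabel("hidden cell")
        ax.set_title(title)
        fig.colorbar(im, ax=ax)   # update/reset gates lie in (0, 1); cell candidates lie in (-1, 1)
        fig.tight_layout()
        return fig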
Submission: Please include the following files: