CS1678: Homework 2

Due: 10/6/2021, 11:59pm

This assignment is worth 40 points. The first part asks you to implement a simple neural network from scratch, and the second asks you to test it. The third part asks you to perform "training" of a neural network by hand (following an exercise we will do in class).

Part A: Training a neural network (14 points)

In this part, you will write code to train and apply a very simple neural network. Follow the example in Bishop Ch. 5 (linked under Readings) that uses a single hidden layer, a tanh function at the hidden layer and an identity function at the output layer, and a squared error loss. The network will have 30 hidden neurons (i.e. M=30) and 1 output neuron (i.e. K=1). To implement it, follow the equations in the slides and Bishop Ch. 5. You can include the bias term or omit it. Make sure you use a small enough learning rate (e.g. 10e-3 to 10e-5) and you initialize the weights to small random numbers, as shown in the slides.
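
For reference, the per-sample equations for this setup can be summarized as below. This is a sketch in the notation of this assignment, not a replacement for the slides: \hat{y} stands for y_pred, y for the ground-truth label (Bishop writes y and t instead), w^{(1)}_{ji} and w^{(2)}_{j} for the entries of W1 and W2, and the bias is assumed to be omitted or folded in as a constant input feature. Check each line against the slides and Bishop Ch. 5 before relying on it.

```latex
\begin{align*}
a_j &= \sum_{i=1}^{D} w^{(1)}_{ji}\, x_i, &
z_j &= \tanh(a_j), &
\hat{y} &= \sum_{j=1}^{M} w^{(2)}_{j}\, z_j \\
E &= \tfrac{1}{2}\,(\hat{y} - y)^2, &
\delta &= \hat{y} - y, &
\delta_j &= (1 - z_j^2)\, w^{(2)}_{j}\, \delta \\
w^{(2)}_{j} &\leftarrow w^{(2)}_{j} - \eta\, \delta\, z_j, &
w^{(1)}_{ji} &\leftarrow w^{(1)}_{ji} - \eta\, \delta_j\, x_i
\end{align*}
```

The factor (1 - z_j^2) is the derivative of tanh evaluated at a_j, and both updates use the value of w^{(2)}_{j} from before the update.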

First, write a function forward that takes inputs X, W1, W2 and outputs y_pred, Z. This function computes activations from the front towards the back of the network, using fixed input features and weights. You will also use the forward function to apply the network (run inference) and to compute the loss during and after training.

Inputs:

an NxD matrix X of features, where N is the number of samples and D is the number of feature dimensions,
an MxD matrix W1 of weights between the first and second layer of the network, where M is the number of hidden neurons, and
a 1xM matrix W2 of weights between the second and third layer of the network, where there is a single neuron at the output layer.

Outputs:

[2 pts] an Nx1 vector y_pred containing the outputs at the last layer for all N samples, and
[2 pts] an NxM matrix Z containing the activations for all M hidden neurons of all N samples.
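
The shapes above can be checked with a minimal sketch such as the following. It assumes NumPy (the assignment does not prescribe a library) and no separate bias term, so any bias must already be a constant column of X. Treat it as one possible layout, not the reference solution.

```python
import numpy as np


def forward(X, W1, W2):
    """Forward pass for one tanh hidden layer and an identity output.

    X is N x D, W1 is M x D, W2 is 1 x M (shapes as listed above).
    """
    A = X @ W1.T       # N x M pre-activations of the hidden layer
    Z = np.tanh(A)     # N x M hidden activations
    y_pred = Z @ W2.T  # N x 1 outputs (identity output function)
    return y_pred, Z
```

Passing a single sample as a 1xD row (for example X[n:n+1]) keeps every intermediate result two-dimensional, which avoids accidental broadcasting later on.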

Second, write a function backprop that takes inputs X, y, M, iters, eta and outputs W1, W2, error_over_time. This function performs training using backpropagation (and calls the forward function as it iterates). Construct the network in this function, i.e. create the weight matrices and initialize the weights to small random numbers, then iterate: pick a training sample, compute the error at the output, then backpropagate to the hidden layer, and update the weights with the resulting error.

Inputs:

an NxD matrix X of features, where N is the number of samples and D is the number of feature dimensions,
an Nx1 vector y containing the ground-truth labels for the N samples,
a scalar M containing the number of hidden neurons to use,
a scalar iters defining how many iterations to run (one sample used in each), and
a scalar eta defining the learning rate to use.

Outputs:

[9 pts] W1 and W2, defined as above for forward, and
[1 pt] an iters x 1 vector error_over_time that contains the error on the sample used in each iteration.
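
The training loop described above might be organized along these lines. This is a sketch under several assumptions that are not part of the assignment: NumPy is used, there is no separate bias term, weights are drawn from a normal distribution with standard deviation 0.01, the per-sample error is half the squared difference, and the extra keyword argument seed is a hypothetical addition for reproducibility. The forward function is repeated so the block runs on its own.

```python
import numpy as np


def forward(X, W1, W2):
    Z = np.tanh(X @ W1.T)  # N x M hidden activations
    return Z @ W2.T, Z     # N x 1 outputs, N x M activations


def backprop(X, y, M, iters, eta, seed=0):
    # `seed` is a hypothetical extra argument, not required by the assignment.
    rng = np.random.default_rng(seed)
    N, D = X.shape
    y = np.asarray(y, dtype=float).reshape(N, 1)

    # Small random initial weights (the scale 0.01 is an assumption).
    W1 = rng.normal(scale=0.01, size=(M, D))
    W2 = rng.normal(scale=0.01, size=(1, M))
    error_over_time = np.zeros((iters, 1))

    for i in range(iters):
        n = rng.integers(N)             # pick one training sample
        x = X[n:n + 1]                  # 1 x D
        y_pred, Z = forward(x, W1, W2)  # 1 x 1 and 1 x M

        delta_out = y_pred - y[n:n + 1]  # 1 x 1 error at the output
        error_over_time[i, 0] = 0.5 * delta_out[0, 0] ** 2

        # Backpropagate through tanh using the weights from before the update.
        delta_hidden = (1.0 - Z ** 2) * (delta_out @ W2)  # 1 x M

        W2 -= eta * (delta_out.T @ Z)     # 1 x M gradient step
        W1 -= eta * (delta_hidden.T @ x)  # M x D gradient step

    return W1, W2, error_over_time
```

Computing delta_hidden before W2 is modified matters: updating W2 first would backpropagate through weights that were not used in the forward pass.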

Part B: Testing your neural network on wine quality (14 points)

You will use the Wine Quality dataset. Use only the red wine data. The goal is to find the quality score of some wine based on its attributes. Write your code in a script neural_net.py.

[6 pts] First, download the winequality-red.csv file, load it, and divide the data into a training and test set using approximately 50% for training. Standardize the data by computing the mean and standard deviation for each feature dimension using the train set only, then subtracting the mean and dividing by the stdev for each feature and each sample. Append a 1 to each feature vector, which will correspond to the bias that our model learns. Set the number of hidden units, the number of iterations to run, and the learning rate.
[3 pts] Call the backprop function to construct and train the network. Use 1000 iterations and 30 hidden neurons.
[3 pts] Then call the forward function to make predictions and compute the root mean squared error between predicted and ground-truth labels, sqrt(mean(square(y_test_pred - y_test))). Report this number in a file report.pdf/docx.
[2 pts] Experiment with three different values of the learning rate. For each, plot the error over time (output by backprop above). Include these plots in your report.
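
The data preparation and evaluation steps above could be sketched as follows. The helper names split_half, standardize_with_bias, and rmse are hypothetical, NumPy is assumed, and the guard against a zero standard deviation is an addition not mentioned in the assignment.

```python
import numpy as np


def split_half(data, seed=0):
    """Shuffle the rows and split them roughly 50/50 into train and test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(data))
    half = len(data) // 2
    return data[perm[:half]], data[perm[half:]]


def standardize_with_bias(X_train, X_test):
    """Standardize with train-set statistics only, then append a column of 1s."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant features unscaled

    def transform(X):
        Xs = (X - mean) / std
        return np.hstack([Xs, np.ones((X.shape[0], 1))])

    return transform(X_train), transform(X_test)


def rmse(y_pred, y_true):
    """Root mean squared error; both inputs are flattened first so that an
    N x 1 prediction and a length-N label vector do not broadcast to N x N."""
    diff = np.ravel(y_pred) - np.ravel(y_true)
    return np.sqrt(np.mean(np.square(diff)))
```

In neural_net.py, the file would be loaded first, for example with np.loadtxt("winequality-red.csv", delimiter=";", skiprows=1) if your copy is semicolon-separated with one header row (check the file you downloaded). The last column is then taken as the quality label and the remaining columns as features.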

Part C: Computing weight updates by hand (12 points)

In class, we saw how to compute activations in a neural network, and how to perform stochastic gradient descent to train it. We computed activations for two example networks, but only showed how to train one of them. Show how to train the second network using just a single example, x = [1 1 1], y = [0 0] (note that in this case, the label is a vector). Initialize all weights to 0.05. Use a learning rate of 0.3. You only need to perform a single iteration of training. Include your answers in text form in the file report.pdf/docx.

Submission: Please include the following files in your submission zip file:

forward.py
backprop.py
neural_net.py
report.pdf/docx