CS 1678/2078: Homework 1
Due: 2/5/2024, 11:59pm
This assignment is worth 50 points.
The first part is a NumPy exercise.
The second part asks you to perform "training" of a neural network by hand (following an exercise we do in class).
The third part asks you to implement a simple neural network from scratch, and the fourth asks you to test it.
Please go through this tutorial before you begin.
Please use Python 3.8 or later for all assignments.
You will also use numpy/scipy, scikit-image and matplotlib libraries for this class.
It is fair game to look up the Python documentation on the web, but do
not look at entire code blocks for content you are asked to implement,
and do not copy-paste anything.
Part A: NumPy exercise (14 points)
We have starter code hw1.py on Canvas.
Your task is to complete the functions in the starter file. The specific function(s) you need to complete are listed in parentheses at the end of each bullet below.
- [2 pts] Generate a 1000000x1 (one million by one) vector of
random numbers from a Gaussian (normal) distribution with mean of 0 and
standard deviation of 5. (generate_random_numbers)
- [2 pts] Add 1 to every value in the previous vector, using a loop. To determine how many times to loop, use the size or shape of the array. Time this operation and print the elapsed time in your code (see the timing sketch after this list). (add_one_by_loop, measure_time_consumptions)
- [2 pts] Now add 1 to every value in the original random vector, without using a loop. Time this operation and print the elapsed time. (add_one_without_loop, measure_time_consumptions)
- [2 pts] Plot the exponential function 2**x, for non-negative even values of x smaller than 30, without using loops. (plot_without_loop)
- [2 pts] Generate two random matrices A and B with compatible shapes, and compute their product by hand, using loops. Your code should produce the same result as Python's A@B operator or numpy's np.matmul(). (matrix_multiplication_by_loop)
- [2 pts] Generate a matrix of shape [10, 10] containing numbers from 0 to 99 by manipulation of a given vector. Specifically, given a vector containing numbers ranging from 0 to 9, you need to perform some matrix manipulations on the vector (addition, transpose, broadcast, etc.), and generate a matrix containing 0 to 99. You should not hard-code the desired matrix manually. (matrix_manipulation)
- [2 pts] Write a function normalize_rows which uses a single command (one line and no loops) to make the sum in each row of the matrix 1.
More specifically, row-wise normalization requires the following property to hold:
- Sum of the entries in each row should be 1.
- If the elements in a row were not identical before normalization, they should remain distinct afterwards, and their relative order should be preserved.
Assume the input matrix to your function is (1) non-negative and (2) all rows contain at least 1 non-zero element. (normalize_rows)
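For the two timing bullets above, here is a minimal sketch of the loop-vs-vectorized timing pattern; the variable names are placeholders, not the required starter-code functions:

```python
import time
import numpy as np

# the random vector described above: mean 0, standard deviation 5
v = np.random.normal(loc=0.0, scale=5.0, size=(1_000_000, 1))

start = time.time()
slow = v.copy()
for i in range(slow.shape[0]):   # loop version: one element at a time
    slow[i, 0] += 1
print("loop:      ", time.time() - start, "seconds")

start = time.time()
fast = v + 1                     # vectorized version: no explicit loop
print("vectorized:", time.time() - start, "seconds")
```

The vectorized version should be orders of magnitude faster, which is the point of the exercise.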
Part B: Computing weight updates by hand (12 points)
In class, we saw how to compute activations in a neural network, and how
to perform stochastic gradient descent to train it. We computed
activations for two example networks, but only showed how to train one
of them. Show how to train the second network using just a single
example, x = [1 1 1], y = [0 0] (note
that in this case, the label is a vector). Initialize all weights to
0.05. Use a learning rate of 0.3. You only need to perform a single
iteration of training. Include your answers in text form in a file titled report.pdf or report.docx.
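As a reminder, the general recipe from lecture (the generic update rule, not the worked answer) is one gradient step per weight on the squared error:

```latex
E = \tfrac{1}{2}\lVert \hat{y} - y \rVert^2, \qquad
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
```

with the learning rate eta = 0.3 here.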
Part C: Training a neural network (12 points)
In this part, you will write code to train and apply a very simple neural network. Follow the example in Bishop Ch. 5 (linked under Readings), which uses a single hidden layer with a tanh activation, an identity function at the output layer, and a squared error loss. The network will have 30 hidden neurons (i.e., M=30) and 1 output neuron (i.e., K=1). To implement it, follow the equations in the slides and in Bishop Ch. 5. You do not need to explicitly include the bias term (+b); we will use a trick in the next part to incorporate it. Make sure you use a small enough learning rate (e.g., between 10^-3 and 10^-5) and initialize the weights to small random numbers, as shown in the slides.
First, write a function forward that takes inputs X, W1, W2 and outputs y_pred, Z. This function computes activations from the input layer forward through the network, using fixed input features and weights. You will also use the forward pass to run inference and to compute the loss for your network during and after training; a minimal sketch appears after the output list below.
Inputs:
- an NxD matrix X of features, where N is the number of samples and D is the number of feature dimensions,
- an MxD matrix W1 of weights between the first and second layer of the network, where M is the number of hidden neurons, and
- a 1xM matrix W2 of weights between the second and third layer of the network, where there is a single neuron at the output layer
Outputs:
- [2 pts] an Nx1 vector y_pred containing the outputs at the last layer for all N samples, and
- [2 pts] an NxM matrix Z containing the activations for all M hidden neurons of all N samples.
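Here is a minimal sketch of the shapes involved, assuming the tanh hidden layer and identity output described above; treat it as a sketch, not the required implementation:

```python
import numpy as np

def forward(X, W1, W2):
    # X: (N, D), W1: (M, D), W2: (1, M)
    Z = np.tanh(X @ W1.T)   # (N, D) @ (D, M) -> (N, M) hidden activations
    y_pred = Z @ W2.T       # (N, M) @ (M, 1) -> (N, 1) identity outputs
    return y_pred, Z
```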
Second, write a function backprop that takes inputs X, y, M, iters, eta and outputs W1, W2, error_over_time. This function performs training using backpropagation (and calls the forward function as it iterates). Construct the network in this function, i.e., create the weight matrices and initialize the weights to small random numbers. Then iterate: pick a training sample, compute the error at the output, backpropagate it to the hidden layer, and update the weights using the resulting errors. A skeleton sketch follows the output list below.
Inputs:
- an NxD matrix X of features, where N is the number of samples and D is the number of feature dimensions,
- an Nx1 vector y containing the ground-truth labels for the N samples,
- a scalar M containing the number
of hidden neurons to use,
- a scalar iters defining how many iterations to run (one sample is used in each iteration), and
- a scalar eta defining the learning rate to use.
Outputs:
- [7 pts] W1 and W2, defined as above for forward, and
- [1 pt] an itersx1 vector error_over_time that contains the error on the sample used in each iteration.
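Below is a hedged skeleton of the training loop under the assumptions above (tanh hidden units, identity output, squared error). For brevity it inlines the single-sample forward pass rather than calling forward; the seed, init scale, and error definition are placeholders:

```python
import numpy as np

def backprop(X, y, M, iters, eta):
    N, D = X.shape
    rng = np.random.default_rng(0)
    W1 = 0.01 * rng.standard_normal((M, D))    # small random init
    W2 = 0.01 * rng.standard_normal((1, M))
    error_over_time = np.zeros(iters)
    for t in range(iters):
        i = rng.integers(N)                    # pick one training sample
        x = X[i]                               # shape (D,)
        z = np.tanh(W1 @ x)                    # hidden activations, shape (M,)
        err = float(W2 @ z - y[i])             # output error (identity output)
        error_over_time[t] = 0.5 * err ** 2    # squared error on this sample
        delta = (1 - z ** 2) * (W2.ravel() * err)  # backpropagated hidden error
        W2 -= eta * err * z                    # gradient step for output weights
        W1 -= eta * np.outer(delta, x)         # gradient step for hidden weights
    return W1, W2, error_over_time
```

Note that the hidden error delta is computed with the old W2, before the output weights are updated.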
Part D: Testing your neural network on wine quality (12 points)
You will use the Wine Quality dataset. Use only the red wine data. The goal is to predict the quality score of a wine from its attributes. Write your code in a script neural_net.py.
- [6 pts] First, download the winequality-red.csv file, load it, and divide the data into a training and a test set, using approximately 50% of the samples for training. Standardize the data by computing the mean and standard deviation of each feature dimension using the training set only, then subtracting the mean and dividing by the standard deviation for each feature of each sample. Append a 1 to each feature vector, which will correspond to the bias that our model learns. Set the number of hidden units, the number of iterations to run, and the learning rate (see the sketch after this list).
- [2 pts] Call the backprop function to construct and train the network. Use 1000 iterations and 30 hidden neurons.
- [2 pts] Then call the forward function to make predictions, and compute the root mean squared error between the predicted and ground-truth labels, sqrt(mean(square(y_test_pred - y_test))). Report this number in report.pdf/docx.
- [2 pts] Experiment with three different values of the learning rate. For each, plot the error over time (output by backprop above). Include these plots in your report.
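A minimal sketch of how neural_net.py might fit together, assuming forward and backprop live in the modules listed under Submission; the split, seed, and learning rate shown here are placeholders:

```python
import numpy as np
from forward import forward      # module names follow the submission list below
from backprop import backprop

# the UCI red wine file is semicolon-delimited with a header row
data = np.genfromtxt("winequality-red.csv", delimiter=";", skip_header=1)
X, y = data[:, :-1], data[:, -1:]

# roughly 50/50 train/test split
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
train, test = perm[: len(X) // 2], perm[len(X) // 2 :]

# standardize using training-set statistics only
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
X = (X - mu) / sigma

# append a constant 1 to each sample to absorb the bias term
X = np.hstack([X, np.ones((len(X), 1))])

W1, W2, error_over_time = backprop(X[train], y[train], M=30, iters=1000, eta=1e-3)
y_test_pred, _ = forward(X[test], W1, W2)
rmse = np.sqrt(np.mean((y_test_pred - y[test]) ** 2))
print("test RMSE:", rmse)
```

For the learning-rate experiments, error_over_time can be passed directly to matplotlib's plt.plot to produce the required plots.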
Submission: Please include the following files in your submission zip file:
- A completed Python file hw1.py
- report.pdf/docx
- forward.py
- backprop.py
- neural_net.py