In this exercise, you will train and evaluate a very simple neural network.

- You will train a network with a single hidden layer. Your network will have a single output dimension (i.e. K=1).
- Your network should have a tanh activation function at the hidden layer, and an identity activation function (i.e. yk = ak) at the output layer. This is exactly the function we used to illustrate the backprop algorithm, so if you directly follow the slides from class (slides 26-30 from slide deck 10), this will be very quick to implement.
- The network will be trained for a regression task, using the Wine Quality dataset from HW1.
- Include one function which computes activations (the forward pass), and another which performs training via backpropagation, calling the forward-pass function as it iterates. After training, use the forward-pass function to evaluate your network. Call both of these functions from a main function which sets up your train/test splits, trains the neural network, computes predictions on the train/test data, and prints your mean squared error on the training and test sets.
- Initialize your weights to small random numbers (e.g. on a scale of 1e-5).
- Experiment with different values of (1) the number of hidden neurons M; (2) the number of iterations before you terminate training; (3) the learning rate. Show plots in your submission that demonstrate what happens as you vary each of these three factors, while keeping the other two factors the same. You can start with the following values: M = 30; num_iter = 10000; lr = 0.001.
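The structure described above can be sketched as follows. This is a minimal NumPy outline, not a prescribed implementation: the function names, the use of full-batch gradient descent, and the synthetic data in the usage note are all illustrative (the actual exercise uses the Wine Quality features and targets).

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, identity output (K = 1)."""
    A1 = X @ W1 + b1        # hidden pre-activations, shape (n, M)
    Z1 = np.tanh(A1)        # hidden activations
    Y = Z1 @ W2 + b2        # identity output, shape (n, 1)
    return Y, Z1

def train(X, t, M=30, num_iter=10000, lr=0.001, seed=0):
    """Full-batch gradient descent with backprop on mean squared error."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    # Small random initial weights, on a scale of 1e-5 as suggested.
    W1 = rng.standard_normal((D, M)) * 1e-5
    b1 = np.zeros(M)
    W2 = rng.standard_normal((M, 1)) * 1e-5
    b2 = np.zeros(1)
    for _ in range(num_iter):
        Y, Z1 = forward(X, W1, b1, W2, b2)
        delta_out = (Y - t) / n                        # output-layer error
        delta_hid = (delta_out @ W2.T) * (1 - Z1**2)   # backprop through tanh
        W2 -= lr * Z1.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * X.T @ delta_hid
        b1 -= lr * delta_hid.sum(axis=0)
    return W1, b1, W2, b2
```

A main function would load the data, split it, call `train`, then call `forward` on both splits and print `np.mean((Y - t) ** 2)` for each. The defaults above match the suggested starting values (M = 30, num_iter = 10000, lr = 0.001).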

In this part, you will compute the output from applying a single set of convolution, non-linearity, and pooling operations, on a toy example. Below are your image (with size N = 9) and your filter (with size F = 3).

- First, show the output of applying convolution. Use no padding, and a stride of 2 (in both the horizontal and vertical directions).
- Second, show the output of applying a Rectified Linear Unit (ReLU) activation.
- Third, show the output of applying max pooling over 2x2 regions.
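You can check your hand-computed answers with a short NumPy sketch of the three operations. Note one assumption: the "convolution" here is implemented as cross-correlation (no filter flip), as is conventional in CNNs; the image and filter values below are placeholders for the ones given in the problem.

```python
import numpy as np

def conv2d(img, filt, stride=2):
    """Valid (no-padding) cross-correlation with the given stride."""
    N, F = img.shape[0], filt.shape[0]
    out = (N - F) // stride + 1
    res = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = img[i*stride:i*stride+F, j*stride:j*stride+F]
            res[i, j] = np.sum(patch * filt)
    return res

def relu(x):
    """Element-wise Rectified Linear Unit."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size regions."""
    n = x.shape[0] // size
    return x.reshape(n, size, n, size).max(axis=(1, 3))
```

For a 9x9 image and a 3x3 filter with stride 2 and no padding, the convolution output is 4x4 (since (9 - 3) / 2 + 1 = 4); ReLU preserves that shape, and 2x2 max pooling reduces it to 2x2.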

In this exercise, you will implement the AdaBoost method defined on pages 658-659 in Bishop (Section 14.3).

- Use the Pima dataset from HW2.
- Use decision stumps as your weak classifiers. Each decision stump operates on some feature dimension and uses some threshold over that feature dimension to make positive/negative predictions. For each decision stump, use at least 10 thresholds.
- Include one function which determines the best decision stump (the one with the lowest weighted error) on the training data. Include another function which implements the AdaBoost loop (using decision stumps), and outputs the final set of classifiers and weights (alpha) associated with each. Also include a third function which sets up your train/test splits, calls the AdaBoost loop to train, computes predictions on the train/test data, and prints accuracy on the training and test sets.
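The three functions above can be outlined as follows, using Bishop's formulation of AdaBoost (equations 14.15-14.19): alpha_m = ln((1 - eps_m)/eps_m), with example weights multiplied by exp(alpha_m) on misclassified points. This is a sketch with illustrative names; the synthetic data in the test stands in for the Pima features, and labels are assumed to be encoded as {-1, +1}.

```python
import numpy as np

def best_stump(X, y, w, n_thresh=10):
    """Return the (feature, threshold, sign) stump with lowest weighted error."""
    best, best_err = None, np.inf
    for d in range(X.shape[1]):
        # At least 10 candidate thresholds per feature dimension.
        for thr in np.linspace(X[:, d].min(), X[:, d].max(), n_thresh):
            for sign in (1, -1):  # which side predicts +1
                pred = sign * np.where(X[:, d] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best, best_err = (d, thr, sign), err
    return best, best_err

def stump_predict(X, stump):
    d, thr, sign = stump
    return sign * np.where(X[:, d] > thr, 1, -1)

def adaboost(X, y, n_rounds=20):
    """AdaBoost loop: pick a stump, compute alpha, reweight the examples."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against division by zero
        alpha = np.log((1 - err) / err)          # Bishop eq. 14.17
        miss = stump_predict(X, stump) != y
        w = w * np.exp(alpha * miss)             # upweight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def ensemble_predict(X, stumps, alphas):
    """Sign of the alpha-weighted vote of all stumps (Bishop eq. 14.19)."""
    scores = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```

The third function described in the exercise would wrap this: split the data, call `adaboost` on the training split, then compare `ensemble_predict` against the true labels on both splits to report accuracy.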

- Bishop Exercise 1.3
- Bishop Exercise 1.6
- Bishop Exercise 2.8 (first part only) -- Hint: Transform the right-hand side into the left-hand side.