CS2750: Homework 2

Due: 3/2/2017, 11:59pm

Note: If you are asked to implement something by yourself, it is not ok to use or even look at existing Matlab or Python code, unless it's utility code. If you have questions about what you can use, ask the instructor or the TA.

Part I: K-Nearest Neighbors (25 points)

In this part, you will implement and explore K-NN classification. You will use the Pima Indians Diabetes dataset. See the pima-indians-diabetes.data file (the last value in each row is that row's target label) and the pima-indians-diabetes.names file, both found at the Data Folder yellow link at the top.
  1. Before you begin, split the data into 10 approximately equally-sized "folds". Your reported results should be averaged over 10 runs: first train on folds 1 through 9 and test on fold 10, then train on folds 1 through 8 plus fold 10 and test on fold 9, and so on. For simplicity, you can also just use folds of size 76 and drop the remaining 8 instances.
  2. Make sure to normalize the data X by subtracting the mean and dividing by the standard deviation over each dimension. Note that you should compute the mean and standard deviation using the training data only and then apply them to the test data. This is because in a real application we do not see the test data until after we "ship off" our program/code.
  3. Implement K-NN. Your function should take as inputs a scalar K and matrices/vectors X_train of size Ntrain x D (where Ntrain is the number of training instances and D is the feature dimension), y_train of size Ntrain x 1 containing the labels of the training instances, and X_test of size Ntest x D. It should output a vector y_test of size Ntest x 1. For each test instance, compute its distance to all training instances, pick the closest K training instances, find the most common among their labels, and return that as the label for the test instance. It's ok to use built-in functions that compute distances, sort, find the most common member of a list, etc. A minimal sketch appears after this list.
  4. In your submission, report (in a text file) the test accuracy when K=5. Remember to average the test accuracy over the 10 folds.
  5. So far, we have been weighting neighbors equally. Now we want to experiment with weighting them according to their distance to the test sample of interest. Implement a Gaussian-weighted K-NN classifier using the equation given in class. Experiment with 3 different values of the bandwidth parameter (σ from the equations on the board) and report the results. Remember that the results you plot should be averaged over the 10 test folds.
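Below is a minimal sketch of steps 2 and 3 in Python, assuming NumPy is available; the function and variable names are illustrative, not required.

    import numpy as np

    def normalize(X_train, X_test):
        # Mean and standard deviation come from the training fold only (step 2).
        mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
        return (X_train - mu) / sigma, (X_test - mu) / sigma

    def knn_predict(K, X_train, y_train, X_test):
        # Plain (unweighted) K-NN: majority vote among the K nearest
        # training points, as in step 3.
        y_test = np.empty(X_test.shape[0], dtype=y_train.dtype)
        for i, x in enumerate(X_test):
            # Euclidean distance from this test point to every training point.
            dists = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(dists)[:K]
            # Most common label among the K nearest neighbors.
            labels, counts = np.unique(y_train[nearest], return_counts=True)
            y_test[i] = labels[np.argmax(counts)]
        return y_test

The Gaussian-weighted variant in step 5 would replace the majority vote with a vote weighted by something like exp(-dists[nearest]**2 / (2 * sigma**2)); check this against the equation given in class.
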
Part II: Fisher's Linear Discriminant (15 points)

You have the following two-dimensional data: {(4,1), (2,4), (2,3), (3,6), (4,4), (9,10), (6,8), (9,5), (8,7), (10,8)}. The first five data points belong to one class, and the last five to a second class.
  1. Write a function that computes the direction of the vector w corresponding to Fisher's linear discriminant of the data points. A minimal sketch appears after this list.
  2. Apply the function to the given data. Plot both the points and the direction along which they are projected. Save the figure and include it in your submission. Hint: The slope of the line along which the points are projected is w(2)/w(1).
  3. Have a look at this file for starter plotting code. Do not hesitate to ask the TA for help with plotting.
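Below is a minimal sketch, assuming NumPy and the standard closed form w proportional to S_W^{-1} (m1 - m2), where S_W is the within-class scatter matrix; names are illustrative:

    import numpy as np

    def fisher_direction(X1, X2):
        # Direction w proportional to S_W^{-1} (m1 - m2), where S_W is the
        # within-class scatter matrix and m1, m2 are the class means.
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        w = np.linalg.solve(S_W, m1 - m2)
        return w / np.linalg.norm(w)  # only the direction matters

    # The data from this part: first five rows are class 1, last five class 2.
    X = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4],
                  [9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)
    w = fisher_direction(X[:5], X[5:])

Since only the direction of w matters, the returned vector is normalized to unit length.
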
Part III: Perceptron (25 points)

In this part, you will trace through a run of the perceptron algorithm. Use the data samples and labels from Part II, but subtract 5 from their coordinates to center them around the origin (alternatively, add a bias term). This data is linearly separable and can be plotted in 2D using its two feature dimensions. Your goal is to create figures similar to Figure 4.7 in Bishop. Implement the perceptron algorithm and use it, along with some plotting code you write, to trace through several iterations of the method.
  1. Use the equations from the perceptron slides from 2/9. Use all the data for training. You need to keep track of which instances are misclassified. In each step, the weight vector w is adjusted using one misclassified example; a minimal sketch of the update appears after this list.
  2. Instead of a basis function, just use x itself, i.e. φ(x) = x. Set the learning rate η to 0.1.
  3. In your submission zip file, for three consecutive iterations of the method, show which points are misclassified and the current w. In each iteration, output the iteration ID and the two feature dimensions of the misclassified example being used to correct w; use the feature dimensions to identify which point is being used.
  4. Plot positive points as hollow circles and negatives as filled circles. Plot correctly classified points in green, and misclassified ones in red.
  5. If a run takes too many (or too few) iterations, terminate it and start another run.
  6. Do not hesitate to ask the TA for help with plotting. Refer to the plotting code from Part II.
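Below is a minimal sketch of a single perceptron step, assuming NumPy, labels t in {+1, -1}, and φ(x) = x; check the update rule against the 2/9 slides:

    import numpy as np

    def perceptron_step(w, X, t, eta=0.1):
        # X: N x 2 centered data, t: N labels in {+1, -1}, w: weight vector.
        # A point n is misclassified when t_n * w^T x_n <= 0.
        mis = np.where((X @ w) * t <= 0)[0]
        if mis.size == 0:
            return w, mis  # converged: nothing is misclassified
        n = mis[0]  # adjust w using one misclassified example
        return w + eta * t[n] * X[n], mis

Each call returns the updated w together with the indices of the points misclassified under the previous w, which is what items 3 and 4 ask you to report and plot.
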
Part IV: Short Answers / Support Vector Machines (35 points)
  1. [5 pts] Bishop Exercise 4.16.
  2. [5 pts] Bishop Exercise 6.3. Hints: Expand the norm, remembering that the squared L2 norm of a vector x is x^T x (the inner product of x with itself), and that the transpose satisfies (A + B)^T = A^T + B^T and (AB)^T = B^T A^T. Then express the expanded form with kernel notation (what type of kernel do you see?)
  3. [5 pts] Bishop Exercise 7.2.
  4. [10 pts] Bishop Exercise 7.3. Hints: This is a two-class problem. You have two constraints total, one for the positive instance and one for the negative instance, and these constraints are equalities because your positive and negative points have to lie on the margin and be support vectors. Use Lagrange multipliers to make the constraints part of the optimization problem and find what w and b equal.
  5. [10 pts] Examine the Matlab function quadprog. It can be used to train an SVM (find the optimal w). Consider the input variables H, f, A, b, Aeq, beq, lb, ub. Write pseudocode that shows how you should set each of them so that the quadratic program being solved is the SVM training problem; see the note below. Also include pseudocode that shows how to compute the weight vector w from the output of quadprog. Make sure to explain in your pseudocode what notation you are using for the train/test feature/label matrices.
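For reference, from the Matlab documentation: x = quadprog(H, f, A, b, Aeq, beq, lb, ub) returns a minimizer of (1/2)*x'*H*x + f'*x subject to A*x <= b, Aeq*x = beq, and lb <= x <= ub. Your pseudocode should map the SVM objective and constraints onto these inputs.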