Part 1: Perceptron Training on Fisher's Iris dataset

Part 1A: Exceed 95% classification accuracy

Train a single-layer perceptron model on Fisher's iris dataset using stochastic gradient descent. The perceptron should use the softmax function to compute its outputs and the cross-entropy loss function (this is the special case described in the book and covered during the lectures). In your solution, include the trained weights obtained once the classification accuracy on the whole dataset (i.e., all 150 instances) exceeds 95%.
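
The training procedure can be cross-checked against a spreadsheet solution with a small plain-Python sketch. It rests on several assumptions: weights start at zero, each class row has a bias appended as its last entry (the weight table below has no column for it), instances are visited in a shuffled order each epoch, and the function names (`train_softmax_perceptron`, `sgd_epoch`, and so on) are hypothetical. For softmax outputs with cross-entropy loss, the per-instance gradient for class k reduces to (p_k - t_k) x, where t_k is 1 for the true class and 0 otherwise. This is the update `sgd_epoch` applies.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, x):
    """Linear score for each class; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * xi for w, xi in zip(row, xb)) for row in W]


def predict(W, x):
    """Index of the class with the largest score (same argmax as softmax)."""
    s = scores(W, x)
    return s.index(max(s))


def accuracy(W, X, y):
    """Fraction of instances classified correctly."""
    return sum(predict(W, x) == t for x, t in zip(X, y)) / len(X)


def sgd_epoch(W, X, y, lr, rng):
    """One SGD pass over the data in random order, updating W in place.

    With softmax outputs p and cross-entropy loss -log p[t], the gradient
    for row k of W is (p[k] - 1[k == t]) * [x, 1].
    """
    order = list(range(len(X)))
    rng.shuffle(order)
    for i in order:
        xb = list(X[i]) + [1.0]
        p = softmax(scores(W, X[i]))
        for k, row in enumerate(W):
            err = p[k] - (1.0 if k == y[i] else 0.0)
            for j in range(len(row)):
                row[j] -= lr * err * xb[j]


def train_softmax_perceptron(X, y, n_classes, lr=0.01, target=0.95,
                             max_epochs=1000, seed=0):
    """Train from all-zero weights until accuracy on (X, y) exceeds target.

    Returns the weight rows (one per class, bias last) and the number of
    epochs run. Stops after max_epochs even if the target is not reached.
    """
    rng = random.Random(seed)
    W = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    epochs = 0
    while epochs < max_epochs:
        sgd_epoch(W, X, y, lr, rng)
        epochs += 1
        if accuracy(W, X, y) > target:
            break
    return W, epochs
```

For the iris data, X would hold the 150 rows of sepal length, sepal width, petal length, and petal width, and y would hold the class indices 0 to 2. If scikit-learn is installed, `sklearn.datasets.load_iris()` provides both as `.data` and `.target`. The learning rate, the initialization, and the visiting order all affect the final weights, so they need not match a spreadsheet run.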

Hint: you can use the Excel spreadsheet developed during the lecture to solve this problem.


Insert the weights for your trained perceptron model into the following table:

Sepal length | Sepal width | Petal length | Petal width

Part 1B: Plot accuracy versus the number of training epochs

Training a perceptron model requires multiple passes through the dataset. In this problem, your goal is to plot the accuracy of the perceptron as a function of the number of these passes (called epochs in the neural network literature), starting from an untrained model and continuing until the accuracy reaches 95% or more. An example plot is provided below.
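
Collecting the points for this plot means measuring accuracy once before any training (epoch 0) and then again after every epoch. The sketch below assumes your training code is wrapped in two callables that you supply: `train_one_epoch()`, which performs one pass over the dataset, and `evaluate()`, which returns the current accuracy as a fraction. Both names and the CSV layout are hypothetical. Writing the curve to a CSV file lets you chart it in Excel next to the lecture spreadsheet.

```python
import csv


def accuracy_curve(train_one_epoch, evaluate, target=0.95, max_epochs=1000):
    """Return [(epoch, accuracy), ...] starting from the untrained model.

    Epoch 0 is measured before any training. The loop stops as soon as
    accuracy reaches target, or after max_epochs passes as a safety cap.
    """
    history = [(0, evaluate())]
    epoch = 0
    while history[-1][1] < target and epoch < max_epochs:
        epoch += 1
        train_one_epoch()
        history.append((epoch, evaluate()))
    return history


def write_curve_csv(history, path):
    """Write an 'epoch,accuracy' header followed by one row per epoch."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "accuracy"])
        writer.writerows(history)
```

For example, `write_curve_csv(accuracy_curve(train_one_epoch, evaluate), "accuracy.csv")` produces a file whose two columns can be plotted as a line chart. The `max_epochs` cap keeps the loop from running forever if the target is never reached.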

[Figure: example perceptron accuracy plot]
[Placeholder image]
Insert your plot to replace the placeholder above.

Part 2: Markov Chains, HMMs, and MDPs

Explain the difference between Markov chains, hidden Markov models (HMMs), and Markov decision processes (MDPs). Can HMMs and MDPs be viewed as extensions of Markov chains? What is the main difference between HMMs and MDPs? List several applications of HMMs. List several applications of MDPs. Who is Markov? Why is he sometimes hiding? What are the three problems solved by the HMM algorithms? Can you describe the problems solved by the algorithms for MDPs?
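
For the HMM evaluation problem (computing the probability of an observation sequence under a given model), the forward algorithm sums over all hidden-state paths in O(T N^2) time rather than enumerating all N^T paths. The following is a minimal sketch for a discrete HMM. It assumes the model is given as nested lists `start`, `trans`, and `emit` (hypothetical names) and that observations are integer symbol indices. The example model's numbers are made up for illustration.

```python
def forward(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM.

    start[i]    : P(first hidden state is i)
    trans[i][j] : P(next hidden state is j | current hidden state is i)
    emit[i][o]  : P(observing symbol o | hidden state is i)
    obs         : non-empty list of observed symbol indices
    """
    n = len(start)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical two-state model with three observable symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward([0, 1, 2], start, trans, emit))  # approximately 0.03628
```

Because the hidden states are summed out, the result equals the total probability that a brute-force enumeration over every hidden-state sequence would give, only computed far more cheaply.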

Insert your answers here.

Part 3: Markov Decision Processes and Optimal Policies

Part 3A: Reward Specification Conversion

Convert an example MDP from Wikipedia that assigns rewards to successful state transitions into an equivalent MDP that assigns rewards to performing an action in a state without specifying the next state (i.e., the textbook formulation). What should be done to the discount (forgetting) rate gamma to obtain exactly the same optimal policy from the reformulated version?

[Figure: example MDP with rewards for actions and transitions]
Insert the table that specifies the MDP with rewards for actions here. Describe what must be done to the parameter gamma to obtain an equivalent policy from the modified MDP.
How many states would the modified MDP have?
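
One standard way to express a transition reward R(s, a, s') as a reward for taking action a in state s is to average it over the possible next states: r(s, a) = sum over s' of P(s' | s, a) R(s, a, s'). This preserves the expected immediate reward of every state-action pair. The minimal sketch below assumes the MDP is stored as nested lists indexed `P[s][a][s2]` and `R[s][a][s2]`, which is a hypothetical layout. The two-state example is made up and is not the Wikipedia MDP.

```python
def expected_action_rewards(P, R):
    """Collapse transition rewards into rewards for (state, action) pairs.

    P[s][a][s2] : probability of reaching s2 after taking action a in s
    R[s][a][s2] : reward received on that transition
    Returns r with r[s][a] = sum over s2 of P[s][a][s2] * R[s][a][s2].
    """
    return [[sum(p * rew for p, rew in zip(P[s][a], R[s][a]))
             for a in range(len(P[s]))]
            for s in range(len(P))]


# Hypothetical two-state, two-action MDP (not the Wikipedia example).
P = [[[0.5, 0.5], [1.0, 0.0]],
     [[0.2, 0.8], [0.0, 1.0]]]
R = [[[0.0, 10.0], [0.0, 0.0]],
     [[5.0, 0.0], [0.0, -1.0]]]
print(expected_action_rewards(P, R))  # [[5.0, 0.0], [1.0, -1.0]]
```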

Part 3B: Find Optimal Policy

Find the optimal policy for one of the MDPs from Part 3A (either the original MDP from Wikipedia or the converted version).
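
Value iteration is one standard method for finding an optimal policy. It repeatedly applies the Bellman optimality update V(s) <- max over a of [r(s, a) + gamma * sum over s' of P(s' | s, a) V(s')] until the values stop changing. It then picks the maximizing action in each state. The sketch below assumes the action-reward form `r[s][a]` with transitions `P[s][a][s2]`. The function name, the tolerance, and the two-state example are hypothetical, so substitute the tables for whichever MDP you chose.

```python
def value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """Optimal state values and a greedy policy for a finite MDP.

    P[s][a][s2] : transition probability from s to s2 under action a
    r[s][a]     : reward for taking action a in state s
    gamma       : discount factor, 0 <= gamma < 1
    """
    n = len(P)
    V = [0.0] * n

    def q(s, a):
        # One-step lookahead using the current value estimates V.
        return r[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    for _ in range(max_iter):
        new_V = [max(q(s, a) for a in range(len(P[s]))) for s in range(n)]
        delta = max(abs(new - old) for new, old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n)]
    return V, policy


# Hypothetical MDP: in state 1, action 0 stays put and earns 1 per step;
# every other move earns nothing.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, r, gamma=0.9)
print(policy)  # [1, 0]: move to state 1, then stay there
```

Here the optimal values are V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * 10 = 9, which the iteration approaches to within the tolerance.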

Part 4: Complete the First Project Report Peer Review

The peer review is submitted separately in Canvas.

Part 5: Complete the Second Project Report Peer Review

The peer review is submitted separately in Canvas.

Part 6: Complete the Third Project Report Peer Review

The peer review is submitted separately in Canvas.