## Neural Networks: Learning

#### Cost Function

``````L = total number of layers in the network
sl = number of units (not counting the bias unit) in layer l
K = number of output units/classes``````

``````the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
the triple sum simply adds up the squares of all the individual Θs in the entire network
the i in the triple sum does not refer to training example i``````
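
For reference, the regularized cost function that these sums refer to is:

``````J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta_{j,i}^{(l)}\big)^2``````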

#### Backpropagation Algorithm

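In outline, backpropagation computes the gradient of the cost function by propagating "error" terms δ(l) backwards through the network:

• Set Δ(l) := 0 for every layer l
• For each training example (x(i), y(i)): set a(1) = x(i) and forward propagate to compute the activations a(l) for l = 2, ..., L
• Compute the output error δ(L) = a(L) − y(i), then work backwards via δ(l) = ((Θ(l))ᵀ δ(l+1)) .* g′(z(l)), where for sigmoid activations g′(z(l)) = a(l) .* (1 − a(l))
• Accumulate Δ(l) := Δ(l) + δ(l+1)(a(l))ᵀ
• Finally, D(l) = (1/m)Δ(l), with the regularization term (λ/m)Θ(l) added to every non-bias entry; D(l) contains the partial derivatives of J(Θ) with respect to Θ(l)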

#### Backpropagation in Practice

To use optimizers such as fminunc, the weight matrices are "unrolled" into a single vector and reshaped back into matrices when needed (here for layers of size 10×11, 10×11 and 1×11):

``````% unroll all parameter (and gradient) matrices into single column vectors
thetaVector = [ Theta1(:); Theta2(:); Theta3(:) ];
deltaVector = [ D1(:); D2(:); D3(:) ];

% recover the original matrices from the unrolled vector
Theta1 = reshape(thetaVector(1:110), 10, 11);
Theta2 = reshape(thetaVector(111:220), 10, 11);
Theta3 = reshape(thetaVector(221:231), 1, 11);``````

Once you have verified that your backpropagation algorithm is correct, you don't need to compute gradApprox again; the code that computes it can be very slow.
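
The gradApprox referred to here comes from gradient checking: each partial derivative is approximated numerically and compared against the backpropagation output. A minimal sketch, assuming theta is the unrolled parameter vector of length n and J(theta) evaluates the cost:

``````EPSILON = 1e-4;                % perturbation size for the two-sided difference
gradApprox = zeros(n, 1);
for i = 1:n,
  thetaPlus = theta;   thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end;``````

If backpropagation is correct, gradApprox should agree with deltaVector to a few decimal places.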

#### Random Initialization

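Initializing every weight to the same value (e.g. zero) fails: all units in a layer would then compute identical functions and receive identical updates. Symmetry is broken by drawing each weight uniformly from [−ε, ε]. A minimal sketch, reusing the 10×11, 10×11 and 1×11 layer shapes from above (the value of INIT_EPSILON is a typical but arbitrary choice):

``````INIT_EPSILON = 0.12;                             % small constant near zero
% rand(r, c) is uniform on [0, 1]; rescale to [-INIT_EPSILON, INIT_EPSILON]
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1, 11)  * (2 * INIT_EPSILON) - INIT_EPSILON;``````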

#### Whole Workflow

First, pick a network architecture: the overall layout of your neural network, including how many layers it has and how many hidden units each layer contains.

• Number of input units = dimension of features x(i)
• Number of output units = number of classes
• Number of hidden units per layer = usually, the more the better (balanced against computational cost, which grows with the number of hidden units)
• Default: 1 hidden layer. If you use more than one hidden layer, it is recommended to have the same number of units in every hidden layer.
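
For example (a hypothetical sizing in the style of the course's digit-classification exercise): 20×20-pixel images give 400 input units, 10 digit classes give 10 output units, and a single hidden layer of 25 units yields weight matrices Theta1 of size 25×401 and Theta2 of size 10×26, where the extra columns account for the bias units.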

Training a Neural Network

• Randomly initialize the weights
• Implement forward propagation to get hΘ(x(i)) for any x(i)
• Implement the cost function
• Implement backpropagation to compute partial derivatives
In practice, forward and backward propagation are performed in a loop over the training examples:

``````for i = 1:m,
   % perform forward propagation and backpropagation using example (x(i), y(i))
   % to get the activations a(l) and delta terms d(l) for l = 2, ..., L
end;``````
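
The forward-propagation step inside that loop can be sketched for a single example x (a column vector of features), assuming sigmoid activations and the three Theta matrices used earlier:

``````sigmoid = @(z) 1 ./ (1 + exp(-z));

a1 = [1; x];                      % input activations plus bias unit
a2 = [1; sigmoid(Theta1 * a1)];   % hidden layer 1 plus bias unit
a3 = [1; sigmoid(Theta2 * a2)];   % hidden layer 2 plus bias unit
h  = sigmoid(Theta3 * a3);        % output layer: h = hTheta(x)``````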