## Logistic Regression

#### Classification

The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the **binary classification problem** in which y can take on only two values, 0 and 1.

#### Logistic Regression Model

#### Decision Boundary

The **decision boundary** is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.

#### Logistic Regression Cost Function

We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.

If our correct answer 'y' is 0, then the cost function will be 0 if our hypothesis function also outputs 0. If our hypothesis approaches 1, then the cost function will approach infinity.

#### simplified cost function and Gradient Descent

#### Advanced Optimization,高级优化算法

"Conjugate gradient", "BFGS", and "L-BFGS" are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they're already tested and highly optimized.

例如matlab里的fminunc()，一般来说都是把自己写好的代价函数，初始化的theta，等等参数传入

#### multicalss calssification

one vs all = N * one vs rest

one vs rest = logistic regression classifier

## Regularization

#### problem of overfitting

- Reduce the number of features
- Manually select which features to keep
- use a model selection algorithm

- Regularization
- keep all the features, but reduce the magnitude of parameters theta-j
- Regularization works well when we have a lot of slightly useful features.

#### with lambda

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

cost function with lambda

gradient descent with lambda