Coursera Andrew Ng Machine Learning, Week 3 Notes: Logistic Regression and Generalization

Logistic Regression


The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1.

Logistic Regression Model
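
For reference, the hypothesis used in the course passes θᵀx through the sigmoid (logistic) function, so its output always lies between 0 and 1 and can be read as the estimated probability that y = 1:

```latex
h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}, \qquad h_\theta(x) = P(y = 1 \mid x; \theta)
```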

Decision Boundary

The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
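
A minimal sketch of how the hypothesis produces a decision boundary. It assumes the usual rule of predicting y = 1 when h(x) >= 0.5, which is the same as θᵀx >= 0. The parameter values below are made up for illustration, and `hypothesis` and `predict` are names chosen here, not taken from the course.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x must start with the bias term x[0] = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def predict(theta, x, threshold=0.5):
    """Predict 1 when the estimated probability reaches the threshold."""
    return 1 if hypothesis(theta, x) >= threshold else 0

# Illustrative parameters (an assumption): the boundary is the line x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]

below = predict(theta, [1.0, 1.0, 1.0])           # x1 + x2 = 2, on the y = 0 side
above = predict(theta, [1.0, 2.0, 2.0])           # x1 + x2 = 4, on the y = 1 side
on_boundary = hypothesis(theta, [1.0, 1.0, 2.0])  # theta^T x = 0, so h = 0.5
```

Points with x1 + x2 < 3 fall on the y = 0 side, points with x1 + x2 > 3 on the y = 1 side, and points exactly on the line get h = 0.5.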

Logistic Regression Cost Function

We cannot use the same cost function that we use for linear regression: plugging the logistic function into the squared-error cost makes J(θ) wavy, with many local optima. In other words, it will not be a convex function, so gradient descent is not guaranteed to reach the global minimum.


If our correct answer 'y' is 0, then the cost function will be 0 if our hypothesis function also outputs 0. If our hypothesis approaches 1, then the cost function will approach infinity.
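
The behaviour described above can be checked numerically. This sketch assumes the standard per-example logistic cost, -log(h) when y = 1 and -log(1 - h) when y = 0. The function name `example_cost` is chosen here for illustration.

```python
import math

def example_cost(h, y):
    """Cost of predicting probability h (0 < h < 1) when the true label is y."""
    if y == 1:
        return -math.log(h)        # small when h is close to 1
    return -math.log(1.0 - h)      # small when h is close to 0

# With y = 0 the cost grows without bound as h approaches 1.
costs_for_y0 = [example_cost(h, 0) for h in (0.01, 0.5, 0.99, 0.9999)]
```

The values in `costs_for_y0` increase monotonically, matching the statement that a confident wrong prediction is penalised very heavily.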

Simplified Cost Function and Gradient Descent
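
A minimal sketch of this step, assuming the usual single-expression form of the cost, J(θ) = -(1/m) Σ [y log h + (1 - y) log(1 - h)], and the update θ_j := θ_j - (α/m) Σ (h - y) x_j applied to all j simultaneously. The toy data, learning rate and iteration count are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """Average logistic cost J(theta) over all m training examples."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1.0 - h)
    return total / m

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent with a simultaneous update of every theta_j."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data (made up): the first column is the bias term, the classes overlap.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
y = [0, 0, 1, 0, 1, 1]

theta = gradient_descent(X, y)
initial_cost = cost([0.0, 0.0], X, y)
final_cost = cost(theta, X, y)
```

The update rule looks identical to the one for linear regression; only the definition of h changes.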



Advanced Optimization

"Conjugate gradient", "BFGS", and "L-BFGS" are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they're already tested and highly optimized.

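As an illustration of using a library optimizer, here is a sketch with SciPy's `scipy.optimize.minimize` and `method="BFGS"`. The course itself uses Octave, so the choice of SciPy, the toy data, and the helper name `cost_and_grad` are assumptions made here. Passing `jac=True` tells `minimize` that the function returns both the cost and its gradient.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return the logistic cost J(theta) and its gradient as a tuple."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    h = np.clip(h, 1e-12, 1.0 - 1e-12)   # guard against log(0)
    m = y.size
    cost = -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
    grad = X.T @ (h - y) / m
    return cost, grad

# Toy data (made up): the first column is the bias term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

result = minimize(cost_and_grad, x0=np.zeros(2), args=(X, y),
                  jac=True, method="BFGS")
theta = result.x
```

No learning rate has to be chosen by hand: the optimizer picks its own step sizes.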

Multiclass Classification

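One-vs-all with N classes means training N one-vs-rest classifiers, one per class.

Each one-vs-rest classifier is an ordinary binary logistic regression classifier that treats its own class as y = 1 and every other class as y = 0. To classify a new x, pick the class whose classifier outputs the highest probability.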
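
A small sketch of the scheme, assuming plain batch gradient descent for each binary classifier. The data, class names and function names (`train_binary`, `train_one_vs_all`, `predict`) are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(X, y, alpha=0.1, iterations=2000):
    """Fit one binary logistic regression classifier by gradient descent."""
    m = len(y)
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

def train_one_vs_all(X, labels):
    """Train one classifier per class: that class is 1, all others are 0."""
    classes = sorted(set(labels))
    return {c: train_binary(X, [1 if l == c else 0 for l in labels])
            for c in classes}

def predict(models, x):
    """Return the class whose classifier gives the highest probability."""
    return max(models,
               key=lambda c: sigmoid(sum(t * v for t, v in zip(models[c], x))))

# Toy data (made up): bias term plus two features, three classes.
X = [[1.0, 0.0, 0.0], [1.0, 0.5, 0.2],
     [1.0, 4.0, 0.1], [1.0, 4.5, 0.4],
     [1.0, 0.2, 4.0], [1.0, 0.3, 4.6]]
labels = ["a", "a", "b", "b", "c", "c"]

models = train_one_vs_all(X, labels)
```

`models` holds one parameter vector per class, so N classes cost N separate training runs.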


The Problem of Overfitting

There are two main ways to address overfitting:

  • Reduce the number of features
    • Manually select which features to keep
    • Use a model selection algorithm
  • Regularization
    • Keep all the features, but reduce the magnitude of the parameters θ_j
    • Regularization works well when we have a lot of slightly useful features.

Regularization with Lambda

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

Cost Function with Lambda
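
A sketch of the regularized logistic cost, assuming the usual form: the unregularized cost plus (λ / 2m) Σ θ_j² summed over j >= 1, so the bias parameter θ_0 is not penalized. The name `regularized_cost` and the numbers below are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Logistic cost plus an L2 penalty on every parameter except theta[0]."""
    m = len(y)
    data_term = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        data_term += -yi * math.log(h) - (1 - yi) * math.log(1.0 - h)
    penalty = lam / (2.0 * m) * sum(t * t for t in theta[1:])
    return data_term / m + penalty

# Made-up example with m = 2 and theta_1 = 2.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
theta = [0.5, 2.0]

plain = regularized_cost(theta, X, y, lam=0.0)
penalized = regularized_cost(theta, X, y, lam=4.0)  # adds (4 / (2 * 2)) * 2^2
```

Setting λ = 0 recovers the unregularized cost. A larger λ makes large parameter values more expensive.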

Gradient Descent with Lambda
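
A sketch of the regularized update, assuming the standard rule: θ_0 is updated as before, while for j >= 1 the update is θ_j := θ_j (1 - αλ/m) - (α/m) Σ (h - y) x_j. The toy data and hyperparameters are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_gradient_descent(X, y, lam, alpha=0.1, iterations=1000):
    """Batch gradient descent for L2-regularized logistic regression."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        new_theta = []
        for j in range(n):
            grad_j = sum(e * xi[j] for e, xi in zip(errors, X)) / m
            if j == 0:
                # The bias term is not regularized.
                new_theta.append(theta[j] - alpha * grad_j)
            else:
                # Shrink theta_j slightly, then take the usual gradient step.
                shrink = 1.0 - alpha * lam / m
                new_theta.append(theta[j] * shrink - alpha * grad_j)
        theta = new_theta
    return theta

# Toy data (made up): the first column is the bias term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
y = [0, 0, 1, 0, 1, 1]

theta_plain = regularized_gradient_descent(X, y, lam=0.0)
theta_reg = regularized_gradient_descent(X, y, lam=10.0)
```

With λ = 10 the learned θ_1 ends up noticeably closer to zero than with λ = 0, which is the shrinking effect regularization is meant to have.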