Coursera Machine Learning Week 2 Notes

Linear Regression with Multiple Variables

Multiple Features

Gradient Descent for Multiple Variables

Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range.

  • feature scaling: involves dividing the input values by the range of the input variable resulting in a new range of just 1.
  • mean normalization: involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero.

Combined, the update is xᵢ := (xᵢ − μᵢ) / sᵢ, where μᵢ is the average value of feature i and sᵢ is either the range (max − min) or the standard deviation.
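A minimal NumPy sketch of both steps combined (the function name and interface here are illustrative, not from the course):

```python
import numpy as np

def mean_normalize(X):
    """Apply x_i := (x_i - mu_i) / s_i to each feature column.

    mu_i is the column mean (mean normalization); s_i is the column
    range max - min (feature scaling). The standard deviation could
    be used for s_i instead.
    """
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s
```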

Learning Rate

  • Debugging gradient descent: Make a plot with number of iterations on the x-axis. Now plot the cost function, J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
  • Automatic convergence test: Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10⁻³. However, in practice it's difficult to choose this threshold value.

If α is too small: slow convergence.

If α is too large: may not decrease on every iteration and thus may not converge.
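The sketch below is one way to monitor J(θ) while running gradient descent on the squared-error cost for linear regression; the 10⁻³ threshold follows the automatic convergence test above, and the function name and defaults are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epsilon=1e-3, max_iters=1000):
    """Gradient descent on J(theta) = (1/2m) * ||X theta - y||^2.

    J_history can be plotted against the iteration number: if J(theta)
    ever increases, alpha is probably too large; convergence is declared
    when J decreases by less than epsilon in one iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(max_iters):
        error = X @ theta - y            # h_theta(x) - y for all examples
        J = (error @ error) / (2 * m)    # current cost
        if J_history and J_history[-1] - J < epsilon:
            J_history.append(J)
            break                        # automatic convergence test
        J_history.append(J)
        theta -= alpha * (X.T @ error) / m   # simultaneous update of all theta_j
    return theta, J_history
```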

Features and Polynomial Regression

We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).

One important thing to keep in mind is that if you choose your features this way, feature scaling becomes very important.
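For instance, starting from a single feature x, one could add x² and x³ as extra columns and then scale them; a hypothetical sketch:

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Build [x, x^2, ..., x^degree] from one feature, then scale.

    Scaling matters here: if x ranges over 1..1000, then x^3 ranges
    over 1..10^9, so unscaled columns would slow gradient descent.
    """
    X = np.column_stack([x ** d for d in range(1, degree + 1)])
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s
```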

Normal Equation

If XᵀX is noninvertible, the common causes might be:

  • Redundant features, where two features are very closely related (i.e. they are linearly dependent)
  • Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).
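For reference, the normal equation computes θ = (XᵀX)⁻¹Xᵀy in closed form; a minimal sketch using the pseudoinverse, which still returns a solution even when XᵀX is noninvertible:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form theta = pinv(X^T X) X^T y.

    np.linalg.pinv is used instead of inv so the call also works
    when X^T X is noninvertible (redundant features or m <= n).
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```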