## Linear Regression with Multiple Variables

#### Multiple Features

#### Gradient Descent for Multiple Variables

#### Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range.

**Feature scaling**: divide the input values by the range (max minus min) of the input variable, resulting in a new range of just 1.

**Mean normalization**: subtract the average value of an input variable from the values for that input variable, resulting in a new average value of just zero.

Combining both techniques: $x_i := \dfrac{x_i - \mu_i}{s_i}$, where $\mu_i$ is the average value of feature $i$, and $s_i$ is either the range (max minus min) or the standard deviation of that feature.
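A minimal sketch of both techniques with numpy, using a small made-up feature matrix (the data values are assumptions for illustration):

```python
import numpy as np

# Hypothetical feature matrix: each column is one feature
# (e.g. house size in sq ft, number of bedrooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

mu = X.mean(axis=0)                 # per-feature average (mean normalization)
s = X.max(axis=0) - X.min(axis=0)   # per-feature range (feature scaling)

# x_i := (x_i - mu_i) / s_i
X_scaled = (X - mu) / s

# Each scaled column now has mean 0 and a range of exactly 1.
```

After this transformation every feature lives in roughly the same interval, so gradient descent takes similarly sized steps along each dimension.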

#### Learning Rate

**Debugging gradient descent**: Make a plot with the number of iterations on the x-axis and the cost function J(θ) on the y-axis. If J(θ) ever increases, then you probably need to decrease α.

**Automatic convergence test**: Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as $10^{-3}$. However, in practice it's difficult to choose this threshold value.
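The two checks above can be sketched as follows: batch gradient descent that records J(θ) each iteration (so it can be plotted against iteration count) and stops once the per-iteration decrease falls below a threshold `epsilon`. The toy data and parameter values are assumptions for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, epsilon=1e-3, max_iters=1000):
    """Batch gradient descent recording J(theta) per iteration.

    Stops when J decreases by less than epsilon in one step
    (the 'automatic convergence test' from the notes)."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []                      # J(theta) per iteration, for plotting
    for _ in range(max_iters):
        error = X @ theta - y
        J = (error @ error) / (2 * m)
        if history and history[-1] - J < epsilon:
            history.append(J)
            break                     # converged: decrease smaller than epsilon
        history.append(J)
        theta -= (alpha / m) * (X.T @ error)
    return theta, history

# Toy data: y = 1 + 2*x (assumed for illustration).
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta, history = gradient_descent(X, y)
# Plotting history against range(len(history)) gives the debugging plot;
# if the curve ever rises, alpha is too large.
```

With a suitable α the recorded costs are non-increasing; an increasing stretch in the plot is the signal to shrink α.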

#### Features and Polynomial Regression

We can **change the behavior or curve** of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).

One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important, since e.g. $x^3$ takes on vastly larger values than $x$.
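A brief sketch of this idea: take a single feature, derive quadratic and cubic features from it, then scale them. The sample values are assumptions for illustration:

```python
import numpy as np

# Hypothetical single feature (e.g. house size in sq ft).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])

# Derived polynomial features: the hypothesis becomes
# h(x) = t0 + t1*x + t2*x^2 + t3*x^3 once these columns are added.
X_poly = np.column_stack([x, x**2, x**3])

# Without scaling, the x^3 column is ~10 orders of magnitude larger
# than the x column, which cripples gradient descent.
mu = X_poly.mean(axis=0)
s = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / s
```

After scaling, all three derived columns span comparable ranges, so a single learning rate works for every parameter.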

#### Normal Equation

If $X^TX$ is **noninvertible**, the common causes are:

- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).
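A small sketch of the redundant-feature case: the design matrix below contains a column that is exactly twice another, so $X^TX$ is singular, yet the normal equation still yields a valid least-squares fit if the inverse is replaced by the pseudoinverse (the data is an assumption for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Column 3 is exactly 2x column 2: linearly dependent (redundant) features,
# so X^T X is singular and np.linalg.inv(X.T @ X) would not work.
X = np.column_stack([np.ones_like(x), x, 2 * x])
y = 3 + 5 * x

# theta = pinv(X^T X) X^T y still gives a least-squares solution
# even when X^T X is noninvertible.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
```

The pseudoinverse picks one of the infinitely many parameter vectors that fit equally well; removing the redundant column (or regularizing) restores a unique solution.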