Linear Regression with Multiple Variables
Gradient Descent for Multiple Variables
We can speed up gradient descent by ensuring each of our input values is in roughly the same range. This is because θ descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently down to the optimum when the variables are very uneven. Two techniques help:
- feature scaling: divide the input values by the range (max minus min) of the input variable, resulting in a new range of just 1.
- mean normalization: subtract the average value of an input variable from its values, resulting in a new average of zero for that variable.
- Debugging gradient descent: Plot the cost function J(θ) against the number of iterations of gradient descent. If J(θ) ever increases, you probably need to decrease α.
- Automatic convergence test: Declare convergence if J(θ) decreases by less than some small threshold E (e.g. 10⁻³) in one iteration. However, in practice it is difficult to choose this threshold value.
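The ideas above can be sketched in NumPy; the training data, α, and iteration count below are made up for illustration:

```python
import numpy as np

# Hypothetical training set: two features on very different scales
# (house size in sq ft, number of bedrooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])

# Feature scaling + mean normalization in one step: x := (x - mean) / range
mu = X.mean(axis=0)
rng = X.max(axis=0) - X.min(axis=0)
X_scaled = (X - mu) / rng

# Prepend the intercept term x0 = 1.
Xb = np.c_[np.ones(len(X_scaled)), X_scaled]

def cost(theta):
    err = Xb @ theta - y
    return err @ err / (2 * len(y))

# Gradient descent; J_history is the "plot J(θ) vs. iterations" debug check.
alpha, iters = 0.1, 500
theta = np.zeros(Xb.shape[1])
J_history = []
for _ in range(iters):
    theta -= (alpha / len(y)) * Xb.T @ (Xb @ theta - y)
    J_history.append(cost(theta))
# With scaled features and this alpha, J(θ) decreases on every iteration;
# if it ever increased, we would decrease alpha.
```

Plotting `J_history` (e.g. with matplotlib) gives exactly the debugging curve described above.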
Features and Polynomial Regression
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic, or square-root function (or any other form).
One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important. For example, if x₁ has range 1–1000, then x₁² has range 1–1,000,000 and x₁³ has range 1–10⁹.
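A small sketch of why scaling matters once polynomial features are added (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical single feature (house size); add quadratic and cubic terms.
size = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) * 1000.0
X_poly = np.c_[size, size**2, size**3]

# Without scaling, the column ranges differ by many orders of magnitude,
# which makes gradient descent converge very slowly.
ranges = X_poly.max(axis=0) - X_poly.min(axis=0)

# Mean-normalize and scale each column so every feature has range 1
# and mean 0, as described in the previous section.
X_scaled = (X_poly - X_poly.mean(axis=0)) / ranges
```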
If XᵀX is noninvertible, the common causes are:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- Too many features (e.g. m ≤ n, i.e. no more training examples than features). In this case, delete some features or use "regularization" (to be explained in a later lesson).
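The redundant-feature case can be sketched as follows (the numbers are made up; using the pseudoinverse via `np.linalg.pinv` is one standard way to still get a least-squares solution when XᵀX has no plain inverse):

```python
import numpy as np

# Hypothetical redundant features: the third column is an exact linear
# multiple of the second (e.g. size in sq ft and size in m², where
# 1 sq ft = 0.0929 m²), so XᵀX is singular.
X = np.array([[1.0, 1000.0,  92.90],
              [1.0, 1500.0, 139.35],
              [1.0, 2000.0, 185.80]])
y = np.array([200.0, 300.0, 400.0])

XtX = X.T @ X
# Linear dependence makes XtX rank-deficient (rank 2 with 3 columns).
rank = np.linalg.matrix_rank(XtX)

# The pseudoinverse still yields a valid least-squares solution.
theta = np.linalg.pinv(XtX) @ X.T @ y
```

Dropping one of the two dependent columns (or regularizing) removes the problem at the source.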