unsupervised learning
k-means algorithm
clustering, optimization objective
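
The objective k-means tries to minimize is usually written as the average squared distance between each example and the centroid of the cluster it is assigned to. A minimal sketch in Python/NumPy, assuming that definition; the names `distortion`, `centroids` and `assignments` are made up for illustration and are not from these notes:

```python
import numpy as np

def distortion(X, centroids, assignments):
    """Average squared distance from each example to its assigned centroid."""
    # centroids[assignments] picks, for every row of X, the centroid it belongs to
    diffs = X - centroids[assignments]            # shape (m, n)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy data (assumed): four points, two clusters
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
assignments = np.array([0, 0, 1, 1])

J = distortion(X, centroids, assignments)
```

Every point in the toy data sits exactly one unit away from its centroid, so the cost here is easy to check by hand.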
clustering, random initialization
Should have K < m (fewer clusters than training examples).
Randomly pick K training examples and set the initial centroids equal to them.
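
A possible sketch of this initialization in Python/NumPy. The function name `init_centroids` and the toy data are assumptions for illustration; sampling without replacement is one way to make sure K distinct examples are picked:

```python
import numpy as np

def init_centroids(X, K, rng):
    """Pick K distinct training examples and use them as the initial centroids."""
    m = X.shape[0]
    if not K < m:
        raise ValueError("need K < m")
    idx = rng.choice(m, size=K, replace=False)   # K distinct row indices
    return X[idx].copy()

# Toy data (assumed): 20 random 2-D examples
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
centroids = init_centroids(X, 3, rng)
```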
clustering, choosing the number of clusters, K
Elbow method: plot the cost J against the number of clusters K and pick the K at the "elbow", where the cost stops dropping quickly.
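
To see what the elbow looks like, the cost can be computed for several values of K and compared. The sketch below is only an illustration in Python/NumPy: the compact `kmeans` helper, the number of iterations, the use of several random restarts per K, and the three-blob toy data are all assumptions, not something these notes prescribe.

```python
import numpy as np

def kmeans(X, K, rng, n_iters=50):
    """Run k-means once from a random initialization; return the final cost J."""
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: squared distance of every example to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (m, K)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned examples
        for k in range(K):
            if np.any(labels == k):          # an empty cluster keeps its old centroid
                centroids[k] = X[labels == k].mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())

def elbow_costs(X, k_values, n_restarts=10, seed=0):
    """For each K, keep the lowest cost found over several random restarts."""
    rng = np.random.default_rng(seed)
    return {K: min(kmeans(X, K, rng) for _ in range(n_restarts)) for K in k_values}

# Toy data (assumed): three well-separated blobs of 30 points each
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

costs = elbow_costs(X, range(1, 7))
for K, J in costs.items():
    print(K, round(J, 3))
```

On data like this the cost should fall sharply up to K = 3 and only slightly afterwards, which is the "elbow". On real data the curve is often smoother and the choice of K less clear-cut.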
Dimensionality reduction
Principal Component Analysis (PCA)
reduce from n dimensions to k dimensions:
find k vectors u1, u2, ..., uk onto which to project the data, so as to minimize the projection error.
- preprocessing: feature scaling + mean normalization
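- compute the "covariance matrix": Sigma = (1/m) * X' * X, where the rows of X are the preprocessed examples
- compute the "eigenvectors" of the matrix Sigma:
[U,S,V] = svd(Sigma);
- keep the first k columns of U as u1, ..., uk and project each example: z = Ureduce' * x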
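
The steps above can be sketched in Python/NumPy as follows. This is an illustration under assumptions: the function names `pca_fit` and `pca_project` and the toy data are made up here, and `np.linalg.svd` stands in for the Octave `svd` call (note that NumPy returns the third factor already transposed).

```python
import numpy as np

def pca_fit(X, k):
    """Learn the preprocessing statistics and the n-to-k projection matrix."""
    mu = X.mean(axis=0)                      # mean normalization
    std = X.std(axis=0)                      # feature scaling
    std[std == 0] = 1.0                      # leave constant features unscaled
    X_norm = (X - mu) / std
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m          # covariance matrix, shape (n, n)
    U, S, Vt = np.linalg.svd(Sigma)          # columns of U are the principal directions
    U_reduce = U[:, :k]                      # keep the first k directions
    return mu, std, U_reduce

def pca_project(X, mu, std, U_reduce):
    """Map examples from n dimensions to k dimensions."""
    return ((X - mu) / std) @ U_reduce

# Toy data (assumed): 2-D points lying close to a straight line
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])

mu, std, U_reduce = pca_fit(X, k=1)
Z = pca_project(X, mu, std, U_reduce)        # shape (200, 1)
```

Because both features are scaled to unit variance before the covariance matrix is formed, the first direction for this toy data weights the two features equally (up to sign).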
advice
The mapping matrix (from n-d to k-d) should be defined by running PCA only on the training set.
The same mapping can then be applied to the examples X_cv and X_test in the cross-validation and test sets.
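
One way to follow this advice, sketched in Python/NumPy. The data, the 60/20/20 split, the choice k = 2 and the helper name `to_k_dims` are assumptions for illustration. The point is that the mean, the scale and the projection matrix are all computed from the training set and then reused unchanged.

```python
import numpy as np

# Toy data (assumed): 100 examples with 5 correlated features
rng = np.random.default_rng(0)
X_all = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
X_train, X_cv, X_test = X_all[:60], X_all[60:80], X_all[80:]

k = 2

# Fit everything on the training set only
mu = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_norm = (X_train - mu) / std
Sigma = (X_train_norm.T @ X_train_norm) / len(X_train)
U, S, Vt = np.linalg.svd(Sigma)
U_reduce = U[:, :k]

def to_k_dims(X):
    """Apply the training-set preprocessing and projection to any examples."""
    return ((X - mu) / std) @ U_reduce

Z_train = to_k_dims(X_train)
Z_cv = to_k_dims(X_cv)        # same mu, std and U_reduce as for training
Z_test = to_k_dims(X_test)
```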
- compression
- reduce memory/disk needed to store data
- speed up learning algorithm
- visualization