Application example: photo OCR
pipeline
- text detection
- character segmentation
- character classification
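A minimal sketch of how the three stages chain together, each stage consuming the previous stage's output. The function names and the toy "image" (a list of strings) are assumptions for illustration only; in a real system each stage would be a learned model.

```python
from typing import Callable, List, Sequence


def run_pipeline(
    image: Sequence[str],
    detect_text: Callable[[Sequence[str]], List[str]],
    segment_chars: Callable[[str], List[str]],
    classify_char: Callable[[str], str],
) -> List[str]:
    """Run the three stages in order; return one string per text region."""
    results = []
    for region in detect_text(image):      # stage 1: text detection
        chars = segment_chars(region)      # stage 2: character segmentation
        # stage 3: character classification, one character at a time
        results.append("".join(classify_char(c) for c in chars))
    return results


# Hypothetical toy stand-ins for the three stages.
def toy_detect_text(image):
    # Treat every non-blank row as a text region.
    return [row for row in image if row.strip()]


def toy_segment_chars(region):
    # Split a region into characters, dropping whitespace.
    return [ch for ch in region if not ch.isspace()]


def toy_classify_char(ch):
    # Pretend classification: map each character to its uppercase form.
    return ch.upper()


words = run_pipeline(
    ["", "open", "   ", "9 to 5"],
    toy_detect_text,
    toy_segment_chars,
    toy_classify_char,
)
```

Keeping the stages as separate callables makes it possible to swap one out (or replace it with ground-truth labels) without touching the others.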
sliding windows
- too easy to learn
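A rough sketch of the sliding-window idea: move a fixed-size window across the image with a given step, and run a classifier on each patch. The image is a plain 2D list here, and `is_text` is a hypothetical placeholder for a trained patch classifier.

```python
def sliding_windows(height, width, win_h, win_w, step):
    """Yield the (top, left) corner of every window that fits in the image."""
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            yield top, left


def crop(image, top, left, win_h, win_w):
    """Cut a win_h x win_w patch out of a 2D list."""
    return [row[left:left + win_w] for row in image[top:top + win_h]]


def detect(image, is_text, win_h, win_w, step):
    """Return the corners of all windows the classifier accepts."""
    height, width = len(image), len(image[0])
    return [
        (top, left)
        for top, left in sliding_windows(height, width, win_h, win_w, step)
        if is_text(crop(image, top, left, win_h, win_w))
    ]


# Toy example: a 2x2 block of "ink" in a 4x4 image.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def is_text(patch):
    # Hypothetical classifier: accept only patches that are fully inked.
    return sum(sum(row) for row in patch) == 4


hits = detect(image, is_text, win_h=2, win_w=2, step=1)
```

A smaller step checks more windows and costs more compute; a larger step is cheaper but can miss objects that fall between window positions.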
Getting more data: artificial data synthesis
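One possible form of artificial data synthesis, assumed here for illustration, is to distort existing labeled examples so that the label stays valid. The sketch below shifts a tiny binary glyph sideways and flips a few pixels at random; the glyph, the distortion choices, and the parameter values are all made up.

```python
import random


def shift(glyph, dx):
    """Shift a 2D glyph horizontally by dx pixels (|dx| <= width), padding with 0."""
    width = len(glyph[0])
    shifted = []
    for row in glyph:
        if dx >= 0:
            shifted.append([0] * dx + row[:width - dx])
        else:
            shifted.append(row[-dx:] + [0] * (-dx))
    return shifted


def add_noise(glyph, flip_prob, rng):
    """Flip each binary pixel independently with probability flip_prob."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in glyph
    ]


def synthesize(glyph, label, n, rng, max_shift=1, flip_prob=0.05):
    """Create n distorted copies of one labeled example; the label is unchanged."""
    samples = []
    for _ in range(n):
        dx = rng.randint(-max_shift, max_shift)
        samples.append((add_noise(shift(glyph, dx), flip_prob, rng), label))
    return samples


# Toy 3x3 glyph standing in for a character image.
glyph_T = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]
samples = synthesize(glyph_T, "T", 5, random.Random(0))
```

The distortions should resemble variation expected in real inputs; adding purely meaningless noise tends to help much less.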
Discussion on getting more data
- make sure you have a low-bias classifier before expending the effort (plot learning curves). Keep increasing the number of features / number of hidden units in the neural network until you have a low-bias classifier
- how much work would it be to get 10x as much data as we currently have?
- artificial data synthesis
- collect/label it yourself
- crowdsource, e.g. Amazon Mechanical Turk
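The "how much work for 10x data?" question can be answered with a back-of-the-envelope calculation. A small sketch, assuming "10x" means ten times the current total (so nine times the current amount still has to be collected); the example counts and per-example time are made-up inputs.

```python
def labeling_hours(current_examples, factor, seconds_per_example):
    """Estimate the hours needed to grow a dataset to `factor` times its size.

    Assumes every new example takes the same time to collect and label.
    """
    extra_examples = (factor - 1) * current_examples
    return extra_examples * seconds_per_example / 3600


# Hypothetical numbers: 1,000 examples today, 10 seconds per new example.
hours = labeling_hours(current_examples=1000, factor=10, seconds_per_example=10)
```

With these made-up numbers the estimate is 25 hours, i.e. a few days of work for one person, which is often much cheaper than it first sounds.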
ceiling analysis: deciding what part of the pipeline to work on next
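A sketch of the bookkeeping behind ceiling analysis: replace each stage's output with ground truth, one stage at a time and cumulatively, and record the overall accuracy after each replacement. The gain at each step is an upper bound on what perfecting that stage could buy. The accuracy figures below are illustrative placeholders, not measured results.

```python
def ceiling_analysis(baseline, with_perfect_stage):
    """Return the accuracy gain attributable to each stage.

    baseline: overall accuracy (in percent) of the unmodified pipeline.
    with_perfect_stage: list of (stage_name, accuracy) pairs, in pipeline
        order, where each accuracy is measured with that stage and all
        earlier stages replaced by ground truth.
    """
    gains = {}
    previous = baseline
    for stage, accuracy in with_perfect_stage:
        gains[stage] = accuracy - previous
        previous = accuracy
    return gains


# Illustrative (made-up) accuracies in percent.
gains = ceiling_analysis(
    baseline=72,
    with_perfect_stage=[
        ("text detection", 89),
        ("character segmentation", 90),
        ("character classification", 100),
    ],
)

# The stage with the largest gain is the most promising one to work on.
best_stage = max(gains, key=gains.get)
```

With these placeholder numbers, text detection offers the most headroom (17 points), while perfect character segmentation would add only 1 point, so effort spent there would pay off least.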