Diffbot is a robot that sees the web the way people do and helps developers extract the important parts of any web page.
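
As a rough illustration of what "extracting the important parts" looks like from a developer's side, the sketch below only builds the request URL for a single page and does not call the network. It assumes Diffbot's v3 Article endpoint with its `token` and `url` query parameters; the helper name `build_article_request` and the placeholder token are hypothetical, and the current API documentation should be checked before relying on any of this.

```python
from urllib.parse import urlencode

# Assumption: Diffbot's v3 Article API endpoint; verify against current docs.
DIFFBOT_ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return the GET URL that asks the extraction API to analyze page_url.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    # urlencode percent-escapes the target URL so it survives as one parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_ARTICLE_ENDPOINT}?{query}"


# "YOUR_TOKEN" is a placeholder, not a real credential.
request_url = build_article_request("YOUR_TOKEN", "https://example.com/post?id=1")
print(request_url)
```

Sending a GET request to the resulting URL would, under these assumptions, return a JSON description of the page (title, text, and so on) rather than raw HTML.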

Optical Character Recognition (OCR) is the electronic conversion of images of written or printed text into machine-encoded text.

content-extraction tier-2 upcoming
generic tier-2 pre-release