What kind of data should the tool work with?

A tool to convert normal or scanned PDFs and images to Word, Excel, PPT, Keynote, Pages, text, and other formats on Mac.

  • Convert PDF to Word (.doc), Excel (.xlsx), and More Common Office Format Files
  • Convert PDF to Pages and Keynote
  • Convert PDF to Graphics Files
  • Convert Scanned PDF with Accurate OCR
  • Convert Multilingual PDF Files
  • Support Password-Restricted PDF Files
Code license: Closed source
Last updated: 29 May 2016

An OCR (optical character recognition) engine for creating editable and searchable electronic files from scanned paper documents, PDFs and digital photographs.

  • Recognition of Digital Camera and Mobile Phone Camera Images
  • Comprehensive Language Support
  • Complete Integration with Popular Office Applications
  • PDF conversion, archiving and security
Code license: Closed source
Last updated: 17 May 2016

Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.

It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Overview is built to do several types of tasks:

Code license: Open source
Last updated: 9 Mar 2016

capella-scan can "OCR" music scores from PDF or common image formats and output the results in MusicXML for use with common music editing software.

Last updated: 23 Feb 2016

Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.

This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed-pitch vs. proportional text. Nevertheless, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.
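
Because the engine has no UI, it is usually driven from the command line or from a script. The following is a minimal sketch in Python, assuming a `tesseract` binary is installed and on the PATH and that the installed version accepts `stdout` as the output base and `-l` for the language code. The helper names `build_tesseract_command` and `ocr_image`, and the file name `page.png`, are illustrative and not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract CLI call that prints text to stdout."""
    # Assumption: "stdout" as the output base makes Tesseract print the
    # recognized text instead of writing <outputbase>.txt.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Return the recognized text, or None if no tesseract binary is found."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "page.png" is a hypothetical input file.
    print(build_tesseract_command("page.png", lang="deu"))
```

Keeping the command construction separate from the call makes it easy to check the arguments without having the engine installed.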

The core developer on the project is Ray Smith (theraysmith).

Code license: Open source, Apache License
Last updated: 27 Jan 2016

SharpEye is music scanning/"OCR" software that can convert an image of a score into an editable format such as MusicXML.

Last updated: 28 May 2015

A free (under the GNU General Public License) toolkit for the development of document image recognition systems.


  • Custom dictionaries may be created to assist with analysis of specific record types
  • Extensible functionality
  • Optical character recognition (OCR) toolkit plugin
Code license: Open source, GNU GPL
Last updated: 22 May 2015

Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.

Code license: Closed source
Last updated: 2 May 2015

PhotoScore takes an image of a music score (including handwritten scores) and outputs it in an editable format, including MusicXML.

Last updated: 29 Dec 2014

SmartScore takes an image of a music score and converts it into an editable format, including MusicXML.

Last updated: 29 Dec 2014

A software tool capable of performing Optical Character Recognition (OCR) upon a set of images. It achieves the task by analysing pixel sets in an image and cross-matching them to a dictionary of words. OmniPage automates large sections of the digitisation process, enabling physical objects to be scanned, processed using the OCR software and exported to a document file format. Later versions of the software incorporate image enhancement features to improve scan quality (and recognition results) and better support for complex page layouts and forms.
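
The dictionary cross-matching step can be illustrated with a small sketch. This is not OmniPage's actual algorithm, which is closed source. It only shows the general idea of snapping a noisy recognized word to the closest entry in a word list, using Python's standard `difflib` similarity matching as a stand-in. The function names, the `cutoff` value and the sample word list are assumptions made for illustration.

```python
import difflib


def correct_word(raw, dictionary, cutoff=0.75):
    """Return the closest dictionary word, or the raw word if nothing is close enough."""
    matches = difflib.get_close_matches(raw.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


def correct_line(raw_line, dictionary):
    """Apply dictionary matching to each whitespace-separated word in a line."""
    return " ".join(correct_word(word, dictionary) for word in raw_line.split())


# Hypothetical word list; a real system would use a full-language dictionary.
WORDS = ["scanned", "document", "archive", "the"]

if __name__ == "__main__":
    # "rn" misread in place of "m" is a typical character-level OCR confusion.
    print(correct_line("the scanned docurnent", WORDS))
```

Words with no sufficiently similar dictionary entry pass through unchanged, so names and numbers are not forced into wrong matches.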

Last updated: 29 Dec 2014

The DocScanner app uses a device's built-in camera to scan documents. Features include image optimization, OCR, document type recognition (document, business card, receipt, etc.), autosorting, and the ability to upload documents to Evernote, Dropbox, and Google Drive.

Code license: Closed source
Last updated: 29 Dec 2014