The Open ICR Project

The goal of the Open ICR project is to build an open source solution for recognizing handwritten characters. While the project was born out of the need to recognize individual Latin characters (intelligent character recognition, or ICR), its long-term "stretch goal" is to also assist in the broader field of handwriting recognition (HWR).

Project Roadmap

Where we are at, and where we are going

Building a character recognition engine is a complex task, so we have designed the roadmap around achievable, shorter-term goals. The goals are:

  1. Writing image processing routines to "sanitize and standardize" the input character image as much as possible before handing it over to a recognition engine. (Alpha version available)
  2. Training an off-the-shelf recognition engine to use our "sanitized and standardized" character set. We are currently using the Tesseract OCR engine along with a set of custom language files to recognize the images produced by the pre-processor. (Alpha version available)
  3. Developing a more specialized and sophisticated solution targeted specifically at individual character recognition, either using a neural-network-based approach or building a specialized recognition engine. (TODO)
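As a rough illustration of goal 2, the sketch below builds a Tesseract command line for a single pre-processed character image. The language name "openicr" is a hypothetical placeholder for the project's custom language files, and the helper names are made up for this example. Page segmentation mode 10 tells Tesseract to treat the image as a single character; recent releases spell the flag `--psm`, while older 3.x releases use `-psm`.

```python
import subprocess


def build_tesseract_command(image_path, lang="openicr"):
    """Build the command for recognizing one character image.

    `lang` is a hypothetical name for custom trained data; it must match
    a .traineddata file that Tesseract can find.
    """
    # "stdout" as the output base makes Tesseract print the result.
    # "--psm 10" treats the image as a single character.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", "10"]


def recognize_character(image_path, lang="openicr"):
    """Run Tesseract (which must be installed) and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact command before running it.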

    The Open ICR Image Pre-processor

    The purpose of the image pre-processor is to "sanitize and standardize" the input image as much as possible to prepare it for the recognition engine. The image pre-processor has the following dependencies:

    The following is a short summary of the modifications the image pre-processor makes to the image, in order:
    1. Borders around the character are removed (for example, borders left by imperfect character extraction)
    2. Median filtering is applied to remove salt-and-pepper noise
    3. The character image is cropped down to the bounds of the written character
    4. The character image is scaled to a standard set of dimensions
    5. The character image is thinned using the Zhang-Suen thinning algorithm
    6. White space padding is added around the image to prepare for the next stage
    7. Erosion is applied to the character image to join small gaps
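The steps above can be sketched in plain Python. This is a minimal illustration and not the project's actual code. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background, it skips border removal (step 1), and all function names and default sizes are made up for the example. With ink stored as 1, the "erosion" of step 7 (which shrinks the white background of a dark-on-light image) becomes a 3x3 maximum that grows the ink.

```python
from statistics import median

# Images are lists of rows; 1 = ink, 0 = background (an assumed convention).

def median_filter(img):
    """3x3 median filter to drop isolated specks; border pixels are kept as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def crop_to_ink(img):
    """Crop to the bounding box of the ink; an empty image is returned whole."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:
        return [row[:] for row in img]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def scale(img, height, width):
    """Nearest-neighbour resize to height x width."""
    h, w = len(img), len(img[0])
    return [[img[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]

def pad(img, margin):
    """Surround the image with `margin` pixels of background."""
    width = len(img[0]) + 2 * margin
    blank = [[0] * width for _ in range(margin)]
    body = [[0] * margin + row + [0] * margin for row in img]
    return blank + body + [row[:] for row in blank]

def zhang_suen_thin(img):
    """Zhang-Suen thinning: peel boundary pixels until nothing changes."""
    img = pad(img, 1)  # temporary border so every ink pixel has 8 neighbours
    h, w = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                         img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]
                    ink = sum(p)
                    # Number of 0 -> 1 transitions around the ring.
                    turns = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= ink <= 6 and turns == 1 and ok:
                        to_clear.append((y, x))
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]  # strip the temporary border

def thicken(img):
    """Grow ink by one pixel (3x3 maximum) so that small gaps close."""
    h, w = len(img), len(img[0])
    return [[int(any(img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def preprocess(img, size=(32, 32), margin=2):
    """Run steps 2-7 in the order listed above."""
    img = median_filter(img)
    img = crop_to_ink(img)
    img = scale(img, *size)
    img = zhang_suen_thin(img)
    img = pad(img, margin)
    return thicken(img)
```

Pure Python keeps the sketch free of dependencies; a real implementation would more likely use an image processing library for speed.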


    python -o original.png -d ~path_for_output\filename.png


This project is hosted on GitHub, and can be downloaded there:

Download the Open ICR Image Pre-Processor at GitHub »

March 18, 2013: Original Release