I want to be able to "read" handwriting. I have managed to extract the handwritten characters from paper documents. As far as the computer is concerned, I now have just pixels on a page (the output is in .pdf format).

I want to have the computer be able to convert the pixels into letters and numbers, then into words that can be read/analyzed by the computer.

My first thought is to see whether there is open source software for handwritten character recognition (HCR) that I can use as a starting point.

I also read that I would need a large data set of characters. Is there an open source one? Along with this, I read that I could transform a smaller data set into a larger one by modifying the initial set. (I read about elastic deformations.)
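To make the elastic deformation idea concrete, here is a minimal sketch in plain Python. It assumes each character is a small grayscale image stored as a list of rows with values in [0, 1]. The function names, the `alpha` and `radius` parameters, and the use of a box blur in place of Gaussian smoothing are all simplifications chosen for illustration, not taken from any particular library.

```python
import random

def box_blur(field, radius):
    """Smooth a 2D list of floats with a box filter (a stand-in for Gaussian smoothing)."""
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += field[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def bilinear(img, y, x):
    """Sample img at a fractional (y, x) position, clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def elastic_deform(img, alpha=3.0, radius=2, seed=None):
    """Return a warped copy of img.

    A random displacement is drawn per pixel, smoothed so that neighbouring
    pixels move together, scaled by alpha, and used to resample the image.
    """
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    dx = box_blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], radius)
    dy = box_blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], radius)
    return [
        [bilinear(img, y + alpha * dy[y][x], x + alpha * dx[y][x]) for x in range(w)]
        for y in range(h)
    ]

# Hypothetical 8x8 glyph: a two-pixel-wide vertical stroke.
glyph = [[1.0 if x in (3, 4) else 0.0 for x in range(8)] for y in range(8)]

# Five slightly different variants of the same character.
variants = [elastic_deform(glyph, seed=s) for s in range(5)]
```

Each seed gives a different warp of the same glyph, so a small labelled set can be multiplied into a larger one. Larger `alpha` values distort more, and past some point the character stops being recognizable, so the setting would need tuning against real samples.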

I am starting to read about convolutional networks and need to see where this takes me.

In my reading I came across the idea of using a dictionary as a database. The HCR would read the first character and then start to narrow down the possibilities. I would like thoughts on this, as I see problems. If the word is "rose", but the HCR mistakenly reads the first letter as an "H", then I'd have "hose" returned.
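One way to think about the "rose" versus "hose" problem is to avoid committing to a single letter at each position. The sketch below assumes the recognizer can return several candidate letters per position, each with a probability, and scores whole dictionary words against them. The probabilities, the word priors, and the function names are invented for illustration; no real recognizer's output format is implied.

```python
import math

def rank_words(char_probs, dictionary, priors=None, floor=1e-3):
    """Rank dictionary words by log-probability given per-position letter candidates.

    char_probs: one dict per character position, mapping letter -> probability.
    priors: optional dict mapping word -> prior probability (e.g. from word
            frequency in the subject area). Uniform if omitted.
    floor: probability assigned to letters (or words) the inputs do not list.
    """
    ranked = []
    for word in dictionary:
        if len(word) != len(char_probs):
            continue  # this simple sketch only compares words of matching length
        score = sum(math.log(probs.get(ch, floor)) for ch, probs in zip(word, char_probs))
        if priors is not None:
            score += math.log(priors.get(word, floor))
        ranked.append((word, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical recognizer output for a handwritten "rose" whose first letter looks like an "h".
char_probs = [
    {"h": 0.55, "r": 0.40, "n": 0.05},
    {"o": 0.90, "a": 0.10},
    {"s": 0.60, "z": 0.40},
    {"e": 0.95, "c": 0.05},
]
dictionary = ["rose", "hose", "nose", "rise", "host"]

# Without context, "hose" still ranks first, but "rose" is kept as a close second
# instead of being discarded after the first letter.
no_context = rank_words(char_probs, dictionary)

# With a (made-up) prior saying "rose" is more common in this text, the ranking flips.
with_context = rank_words(char_probs, dictionary, priors={"rose": 0.6, "hose": 0.2})
```

The dictionary alone cannot tell two valid words apart, which is exactly the problem described above. What it can do is keep "rose" alive as a candidate so that some later source of context has a chance to pick it.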

At this point, all I am looking for is to be able to bounce ideas off people and learn about what has happened in this field. I'd also like links to open source examples and databases, if any.

Is there a way I can attach my examples to this post?

Thanks.

I'm not aware of any closed or open source software that can manage to recognize handwriting with any reasonable degree of accuracy. Even clearly printed text can present a challenge for today's OCR engines, with none of them achieving 100% accuracy. Postal services have been using OCR for a number of years to recognize zip codes and automate sorting, but here we're talking about just a few characters of text and in known formats. Touch screen displays can also recognize text, but this is often one character at a time, and it's possible they may get some hints from the order in which characters are drawn. The way some letters can become overwritten would make it very difficult for a machine to read.

You might be able to improve accuracy by running multiple OCR engines, since different engines employ different techniques for recognizing characters, and then post-processing the results with the help of dictionaries. If the subject of the text is narrow, you might achieve better results with a customized dictionary.
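As a rough sketch of what that post-processing could look like, the snippet below takes one word as read by several engines, does a per-character majority vote, and then snaps the result to the closest entry in a custom dictionary using Python's standard `difflib` module. The engine outputs and the gardening word list are made up, and `combine_readings` and `snap_to_dictionary` are hypothetical helper names, not part of any OCR package.

```python
from collections import Counter
import difflib

def combine_readings(readings):
    """Merge several engines' readings of the same word.

    If all readings have the same length, vote character by character;
    otherwise fall back to the most common whole reading.
    """
    if len({len(r) for r in readings}) == 1:
        return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*readings))
    return Counter(readings).most_common(1)[0][0]

def snap_to_dictionary(word, dictionary, cutoff=0.6):
    """Return the closest dictionary word, or the input unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Hypothetical outputs from three engines, each wrong in a different place.
readings = ["hydrangca", "hydranqea", "bydrangea"]

# Hypothetical custom dictionary for a narrow subject (gardening).
dictionary = ["hydrangea", "geranium", "rose", "hose"]

voted = combine_readings(readings)
final = snap_to_dictionary(voted, dictionary)
```

Voting only helps when the engines make different mistakes, and the `cutoff` value is a guess that would need tuning. Set too low, it will happily "correct" an unknown but valid word into the wrong dictionary entry.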

I believe OCR is something that will get better with the use of neural networks, but it will be a long time before we see anything that can outperform the human eye and mind, which have the advantage of understanding the context in which something was written. Then again, it might never happen - I wouldn't be surprised if handwriting becomes obsolete over the next decade.