Are there any limitations to text recognition from images, such as the quality of the image or the complexity of the text?

All 4 Replies

Tell more. Long ago I used Tesseract to extract text from images but it did require good image quality. Today the online image to text systems appear to do better. So the answer is yes.

I would suspect that the image needs a high enough resolution for the text to be legible and for its characters to be reasonably matched to a font that the OCR software can detect.
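To put a rough number on "high enough resolution": a point is 1/72 of an inch, so the nominal height of the type in pixels follows directly from the scan DPI. The sketch below is only that arithmetic. The function names and the 30 px target are my own placeholders, not a threshold documented by any OCR engine, so treat the target as something to tune.

```python
def glyph_height_px(point_size: float, dpi: float) -> float:
    """Nominal font height in pixels (1 point = 1/72 inch)."""
    return point_size / 72.0 * dpi


def scale_factor_for(point_size: float, dpi: float, target_px: float = 30.0) -> float:
    """Factor by which to upscale the image so the type reaches target_px.

    target_px is an assumed, tunable value, not a documented limit.
    Returns 1.0 when the type is already at least that tall.
    """
    return max(1.0, target_px / glyph_height_px(point_size, dpi))


if __name__ == "__main__":
    # 10 pt type scanned at 300 DPI versus captured as a 96 DPI screenshot.
    for dpi in (300, 96):
        print(dpi, glyph_height_px(10, dpi), scale_factor_for(10, dpi))
```

The same 10 pt text that is comfortably large in a 300 DPI scan is only about a third as tall in a 96 DPI screenshot, which is one reason screenshots and phone photos tend to OCR worse than flatbed scans.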

Adobe Acrobat is pretty good at being able to turn text from a scanned image (e.g. a contract) into something searchable and editable.

The most obvious one would be the quality of the image. Then there's the complexity of the text, and it can also be challenging if the text is in a language other than English. The software's ability to recognize handwriting or certain fonts can pose problems too, so try to write legibly or use common, readable fonts. Here’s an in-depth article about text recognition.
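On the language point: with Tesseract (mentioned earlier in the thread) the language is something you pass in explicitly, using the `-l` option of its command-line tool, and several languages can be joined with `+`. Here is a minimal sketch that builds and runs such a command. The helper names `tesseract_command` and `ocr` are made up for illustration, and it assumes the `tesseract` binary and the relevant language data are installed.

```python
import subprocess


def tesseract_command(image_path, languages=("eng",), psm=None):
    """Build a Tesseract CLI call that prints the recognized text to stdout.

    languages: Tesseract language codes such as "eng", "deu", "fra".
    psm: optional page segmentation mode passed through as --psm.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", "+".join(languages)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def ocr(image_path, languages=("eng",), psm=None):
    """Run Tesseract and return its text output (raises if the call fails)."""
    result = subprocess.run(
        tesseract_command(image_path, languages, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only shows the command; call ocr(...) on a real file to run it.
    print(" ".join(tesseract_command("scan.png", ("eng", "deu"), psm=6)))
```

Running the same image once with the right language code and once with the wrong one is a quick way to see how much the setting matters for your material.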

Image quality is one, but image alignment is another - if the text is not aligned horizontally, you will get much worse results than otherwise. The presence of additional graphical elements can also cause issues. Text complexity is irrelevant, but you may find that some OCR modules will have an explicit setting for the language to be identified and will produce poor results if applied to a different language.
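For the alignment issue, one common way to estimate how far a scan is rotated is a projection profile: try a range of candidate angles, project the dark pixels onto rows for each one, and keep the angle where the pixels pile up into the fewest, sharpest rows. The sketch below does this in plain Python on a list of dark-pixel coordinates. It is an illustration of the idea, not what any particular OCR package does internally, and the function names, the ±5 degree search range, and the 0.5 degree step are all my own assumptions.

```python
import math
from collections import Counter


def estimate_skew_degrees(dark_pixels, max_angle=5.0, step=0.5):
    """Estimate text skew from (x, y) coordinates of dark pixels.

    For each candidate angle, pixels are projected onto rows as if the
    image had been rotated back by that angle. The score is the sum of
    squared row counts, which is largest when text lines are horizontal.
    """
    steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step
        a = math.radians(angle)
        cos_a, sin_a = math.cos(a), math.sin(a)
        rows = Counter(round(y * cos_a - x * sin_a) for x, y in dark_pixels)
        score = sum(n * n for n in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_lines(skew_degrees, width=200, baselines=(20, 40, 60)):
    """Fake 'text lines': one-pixel-thick strokes tilted by skew_degrees."""
    slope = math.tan(math.radians(skew_degrees))
    return [(x, round(y0 + x * slope)) for y0 in baselines for x in range(width)]


if __name__ == "__main__":
    print(estimate_skew_degrees(synthetic_lines(3.0)))
```

Once you have the estimate, you would rotate the image by the opposite angle with whatever imaging library you use before handing it to the OCR engine. Real pages with graphics or multiple columns will need a more robust approach than this.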
