I need to detect the text block within a scanned book page (I don't need OCR, just the area where the text is; I'm working directly with pixels) and then cut everything else out. Because these are scanned books, the pages have specks and smudges. The easiest way to clean a page seems to be to detect the text block and discard everything outside it. Page numbers may cause some problems, but they should be detected and left as they are.

I have TIFF files, but there are several ways to convert them to BMP, so any suggestions? Code snippets? A library to use, or anything else that might help?

thanks

Discussion span: 11 years. Last post by Salem.

I'd probably go for .bmp over .tiff, since extracting pixel data from a .bmp file is (maybe?) better documented than for any other file format.

Try this
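
To give an idea of how little is needed to get at the pixels of an uncompressed .bmp, here is a minimal Python sketch that reads the header fields. It is only an illustration, not something from this thread: `parse_bmp_header` is a made-up name, and it assumes the common layout of a 14-byte file header followed by a 40-byte (or larger) BITMAPINFOHEADER.

```python
import struct


def parse_bmp_header(data: bytes) -> dict:
    """Read the basic fields needed to locate pixel rows in a BMP.

    Assumes a 14-byte file header followed by a BITMAPINFOHEADER
    (40 bytes or larger); older OS/2-style headers are rejected.
    """
    if len(data) < 54 or data[:2] != b"BM":
        raise ValueError("not a BMP file with a BITMAPINFOHEADER")

    # File header: signature, file size, two reserved words, pixel data offset.
    _sig, _file_size, _res1, _res2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)

    # Info header: header size, width, height, planes, bits per pixel, compression.
    hdr_size, width, height, _planes, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    if hdr_size < 40:
        raise ValueError("unsupported header size: %d" % hdr_size)

    # Every pixel row is padded to a multiple of 4 bytes.
    row_stride = ((width * bpp + 31) // 32) * 4

    return {
        "width": width,
        "height": abs(height),
        "top_down": height < 0,  # a positive height means rows are stored bottom-up
        "bits_per_pixel": bpp,
        "compression": compression,  # 0 means uncompressed
        "pixel_offset": pixel_offset,
        "row_stride": row_stride,
    }
```

With those fields, row `y` of an uncompressed bottom-up image starts at `pixel_offset + (height - 1 - y) * row_stride`.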


> I have TIFF files, but there are several ways to convert them to BMP, so any suggestions? Code snippets?
Yet this seems like such a minor issue compared to the other problem you have of recognising blocks of text, and telling them apart from other marks on the image.

I guess that will be the next question then...
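
For what it's worth, one simple starting point for the harder problem is a projection profile: count the dark pixels in every row and every column, and keep only the range where the counts are high enough to be text rather than an isolated speck. The Python sketch below is only an illustration under stated assumptions, not a complete solution. The names `text_block_bbox` and `crop` and the parameters `ink_threshold` and `min_ink` are hypothetical, and the page is assumed to be already loaded as a list of rows of grayscale values (0 = black, 255 = white).

```python
def text_block_bbox(page, ink_threshold=128, min_ink=3):
    """Return (top, left, bottom, right), inclusive, of the inked area.

    page: list of equal-length rows of grayscale values, 0 (black) to 255 (white).
    A row or column counts as part of the block only if it holds at least
    min_ink dark pixels, so small isolated specks are ignored.
    Returns None if nothing qualifies.
    """
    # Projection profiles: number of dark pixels per row and per column.
    row_ink = [sum(1 for v in row if v < ink_threshold) for row in page]
    col_ink = [sum(1 for v in col if v < ink_threshold) for col in zip(*page)]

    rows = [i for i, n in enumerate(row_ink) if n >= min_ink]
    cols = [j for j, n in enumerate(col_ink) if n >= min_ink]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(page, bbox):
    """Cut the page down to the bounding box returned by text_block_bbox."""
    top, left, bottom, right = bbox
    return [row[left:right + 1] for row in page[top:bottom + 1]]
```

This only filters out marks that add fewer than `min_ink` dark pixels to a row or column. Larger smudges, and page numbers that sit apart from the main block, would need extra handling, for example looking at gaps in the profiles or filtering connected components by size.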
