I need to detect the text block within a book page (I don't need OCR, just the area where the text is; I am working at the pixel level) and then cut everything else out. I am dealing with scanned books, so the pages have specks and smudges. The easiest way to clean a page seems to be to detect its text block and cut everything else. Some problems may occur with page numbers, but those should be detected and left as they are.
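
One common way to find the text block (not something this thread spells out) is a projection profile. Count the ink pixels in every row and column, treat rows and columns below a small threshold as noise, and keep the span in between. The sketch below assumes the scan has already been thresholded into a 2D list of 0 (paper) and 1 (ink). `text_block_bbox`, `crop`, and the `min_row_ink`/`min_col_ink` defaults are hypothetical and would need tuning for real page sizes and resolutions.

```python
from typing import List, Optional, Tuple

# Assumption: a page is a 2D grid of 0 (paper) and 1 (ink), e.g. a thresholded scan.
Page = List[List[int]]
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def _dense_span(counts: List[int], min_ink: int) -> Optional[Tuple[int, int]]:
    """Return the first and last index whose ink count reaches min_ink."""
    hits = [i for i, c in enumerate(counts) if c >= min_ink]
    if not hits:
        return None
    return hits[0], hits[-1]


def text_block_bbox(page: Page, min_row_ink: int = 3,
                    min_col_ink: int = 3) -> Optional[BBox]:
    """Estimate the text block from row and column ink counts (hypothetical helper).

    Rows/columns with fewer than min_*_ink ink pixels count as noise, so an
    isolated speck in the margin does not widen the box.
    """
    row_counts = [sum(row) for row in page]
    rows = _dense_span(row_counts, min_row_ink)
    if rows is None:
        return None  # blank page, or nothing above the threshold
    top, bottom = rows
    # Count column ink only inside the vertical span found above.
    col_counts = [sum(page[r][c] for r in range(top, bottom + 1))
                  for c in range(len(page[0]))]
    cols = _dense_span(col_counts, min_col_ink)
    if cols is None:
        return None
    left, right = cols
    return top, left, bottom, right


def crop(page: Page, bbox: BBox) -> Page:
    """Cut everything outside the bounding box away."""
    top, left, bottom, right = bbox
    return [row[left:right + 1] for row in page[top:bottom + 1]]
```

A page number below the body will usually have enough ink per row to stretch the box down to include it, which fits the "leave it as is" requirement; a very faint one may be dropped, and a large smudge can also pass the threshold, so check a few pages by eye.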

I have TIFF files, but there are several ways to convert them to BMP, so any suggestions? Code snippets, a library to use, or anything else that might help?

thanks

iamthwee

I'd probably go for .bmp over .tiff, since extracting pixel data from a *.bmp seems to be better documented than for most other file formats (maybe?).
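
Part of why BMP is easy to handle by hand: an uncompressed 24-bit BMP is a 14-byte file header, a 40-byte BITMAPINFOHEADER, and rows of blue-green-red bytes, each row padded to a multiple of 4 bytes and normally stored bottom-up. The sketch below reads and writes only that variant with Python's standard `struct` module. `read_bmp24` and `write_bmp24` are hypothetical helpers; palette-based, 1-bit (common for bilevel scans), and RLE-compressed BMPs are not handled.

```python
import struct
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b)


def write_bmp24(width: int, height: int, pixels: List[List[Pixel]]) -> bytes:
    """Encode pixels[y][x] (y = 0 is the top row) as an uncompressed 24-bit BMP."""
    stride = (width * 3 + 3) & ~3            # each row padded to 4 bytes
    pad = b"\x00" * (stride - width * 3)
    rows = []
    for y in range(height - 1, -1, -1):      # BMP stores rows bottom-up
        row = b"".join(bytes((b, g, r)) for r, g, b in pixels[y])
        rows.append(row + pad)
    data = b"".join(rows)
    offset = 14 + 40                          # file header + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(data), 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              len(data), 2835, 2835, 0, 0)
    return file_header + info_header + data


def read_bmp24(data: bytes) -> Tuple[int, int, List[List[Pixel]]]:
    """Decode an uncompressed 24-bit BMP into (width, height, pixels[y][x])."""
    sig, _size, _r1, _r2, offset = struct.unpack_from("<2sIHHI", data, 0)
    if sig != b"BM":
        raise ValueError("not a BMP file")
    _hdr, width, height, _planes, bpp, compression = struct.unpack_from(
        "<IiiHHI", data, 14)
    if bpp != 24 or compression != 0:
        raise ValueError("only uncompressed 24-bit BMP is handled here")
    bottom_up = height > 0                    # negative height means top-down
    height = abs(height)
    stride = (width * 3 + 3) & ~3
    pixels = []
    for row_index in range(height):
        start = offset + row_index * stride
        row = []
        for x in range(width):
            b, g, r = data[start + 3 * x:start + 3 * x + 3]
            row.append((r, g, b))
        pixels.append(row)
    if bottom_up:
        pixels.reverse()                      # return rows top-first
    return width, height, pixels
```

In practice an imaging library such as Pillow can open TIFF files directly with `Image.open`, which may make the conversion step unnecessary.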

Try this

> i have tiff, but there is several ways to convert on bmp so any suggestions? code snippets?
Yet this seems like a minor issue compared with the bigger problem you have: recognising blocks of text and telling them apart from other marks on the image.

I guess that will be the next question then...
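
For the harder part, telling text apart from specks, one standard technique (swapped in here, not taken from the replies) is connected-component filtering: group touching ink pixels into components and erase the ones too small to be characters. The sketch below assumes a 0/1 page grid, uses 8-connectivity, and treats `min_pixels` as a guess; `components`, `remove_specks`, and `ink_bbox` are hypothetical names.

```python
from collections import deque
from typing import List, Optional, Tuple

Page = List[List[int]]  # assumption: 0 = paper, 1 = ink


def components(page: Page) -> List[List[Tuple[int, int]]]:
    """Group ink pixels into 8-connected components using breadth-first search."""
    h, w = len(page), len(page[0])
    seen = [[False] * w for _ in range(h)]
    result = []
    for y in range(h):
        for x in range(w):
            if page[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            comp = []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and page[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            result.append(comp)
    return result


def remove_specks(page: Page, min_pixels: int = 4) -> Page:
    """Return a copy of page with components smaller than min_pixels erased."""
    cleaned = [row[:] for row in page]
    for comp in components(page):
        if len(comp) < min_pixels:
            for y, x in comp:
                cleaned[y][x] = 0
    return cleaned


def ink_bbox(page: Page) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of the remaining ink, or None."""
    coords = [(y, x) for y, row in enumerate(page)
              for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)
```

Because page numbers are made of normal-sized glyphs, they survive the filter and stay inside `ink_bbox`. The catch is that at low scan resolution, periods and i-dots can be smaller than large specks, so `min_pixels` should be set from a sample of your own pages; smudges larger than a glyph would need a separate rule, such as a maximum component size.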

Be a part of the DaniWeb community