book page text block detection? (not OCR problem)


biotech2 0 Newbie Poster

18 Years Ago

I need to detect the text block within a scanned book page and then cut everything else out. I don't need OCR, just the area where the text is, but I am working at the pixel level. Because the books are scanned, the pages have specks and smudges. The easiest way to clean a page seems to be to detect the text block and crop away everything outside it. Page numbers may cause some problems, but they should be detected and left as they are.

I have TIFF files, but there are several ways to convert them to BMP. Any suggestions? Code snippets? A library to use, or anything else that might help?

thanks

c


iamthwee

18 Years Ago

I'd probably go for .bmp over .tiff, since extracting pixel data from a .bmp is probably better documented than for most other file formats (maybe?).

Try this


Salem 5,138 Posting Sage · Answer 1 · 2006-05-20T15:35:59+00:00

> i have tiff, but there is several ways to convert on bmp so any suggestions? code snippets?
Yet this seems like such a minor issue compared to the other problem you have of recognising blocks of text, and telling them apart from other marks on the image.

I guess that will be the next question then...
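
For that harder problem, one common starting point is a projection profile: count the dark pixels in each row and each column, and treat only rows and columns with enough ink as part of the text block. The sketch below is only an illustration under stated assumptions, not a complete solution. It assumes the page has already been decoded and thresholded into a buffer with one byte per pixel (non-zero means ink), and the names `TextBox`, `find_text_box`, and `min_ink` are made up for this example.

```c
#include <stddef.h>

/* Bounding box of the detected text block, inclusive pixel coordinates. */
typedef struct {
    int left, top, right, bottom;
    int found; /* 0 if no row or column reached the threshold */
} TextBox;

/*
 * img:     width * height bytes, row-major, non-zero = ink, 0 = background
 *          (assumed to be decoded and thresholded already).
 * min_ink: hypothetical noise threshold; a row or column must contain at
 *          least this many ink pixels to count as text.
 */
TextBox find_text_box(const unsigned char *img, int width, int height,
                      int min_ink)
{
    TextBox box = {0, 0, 0, 0, 0};
    int top = -1, bottom = -1, left = -1, right = -1;

    /* Horizontal projection: count ink pixels in every row. */
    for (int y = 0; y < height; y++) {
        int count = 0;
        for (int x = 0; x < width; x++)
            count += img[(size_t)y * width + x] != 0;
        if (count >= min_ink) {
            if (top < 0)
                top = y;   /* first row with enough ink */
            bottom = y;    /* last such row seen so far */
        }
    }
    if (top < 0)
        return box;        /* nothing looked like text */

    /* Vertical projection, restricted to the rows kept above. */
    for (int x = 0; x < width; x++) {
        int count = 0;
        for (int y = top; y <= bottom; y++)
            count += img[(size_t)y * width + x] != 0;
        if (count >= min_ink) {
            if (left < 0)
                left = x;
            right = x;
        }
    }
    if (left < 0)
        return box;

    box.left = left;
    box.top = top;
    box.right = right;
    box.bottom = bottom;
    box.found = 1;
    return box;
}
```

Isolated specks fall below `min_ink` and are ignored, but a large smudge or a dark scan border would still stretch the box, and a page number is kept or dropped depending on whether its row passes the threshold. A real page would likely need extra steps, such as smoothing the profiles or requiring a run of consecutive text rows.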