Member Avatar for shAq

I have a bunch of PDF files made from documents scanned into jpeg images. Now, how can i search these pdf documents for text that is in the image.

:!: This does not seem possible to me as well but i'll appreciate any sort of methods. Any modifications that may need to be done to the images before converting them to pdf?? Anything else?

Thanks a lot

Recommended Answers

All 6 Replies

It isn't. There is no text in an image. It's just part of the image.

You can try an OCR (optical character reader) program that can read an image (non-compressed, usually). The best one I found is from http://www.transym.com/index.html

Member Avatar for shAq

It isn't. There is no text in an image. It's just part of the image.

You can try an OCR (optical character reader) program that can read an image (non-compressed, usually). The best one I found is from http://www.transym.com/index.html

Thanks...

Actually I am trying to write a program that'll search the pdfs for keywords. I guess i'll have to associate files with the keywords in the image then. I can think of only this method... Is this the only wayt to do that?

Thanks..

he's already told you to create your PDF files differently...

Thanks...

Actually I am trying to write a program that'll search the pdfs for keywords. I guess i'll have to associate files with the keywords in the image then. I can think of only this method... Is this the only wayt to do that?

Thanks..

I really have no idea how you plan to "associate files with the keywords in the image" unless you mean connecting text file to an image somehow. Maybe you need to explain in detail what you are trying to accomplish, what type of images (bitmaps, jpg, etc)

Member Avatar for shAq

I have a bunch of images that are a result of scanned newspapers. I need to link the headlines in the newspaper's image to an index. SO, what i was thinking was making a data base to link the image file with keywords of the headlines. I think this is the only way to do that.

That makes sense. The only way to do that as far as I know is by hand. As I said, there is no text in an image so you will have to eyeball the headlines and type them in. Even an OCR program won't help that much.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.