I have a bunch of PDF files made from documents that were scanned into JPEG images. How can I search these PDF documents for text that appears in the images?
This does not seem possible to me either, but I'd appreciate any suggestions. Are there any modifications that need to be made to the images before converting them to PDF? Anything else?
It isn't. There is no text in an image. It's just part of the image.
You can try an OCR (optical character recognition) program that can read text from an image (usually an uncompressed one). The best one I found is from http://www.transym.com/index.html
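For what it's worth, here is a rough sketch of the OCR-then-search idea in Python. It does not use the Transym product linked above. It assumes the open-source Tesseract engine plus the `pytesseract` and `Pillow` packages are installed, and the function names `ocr_image` and `find_keywords` are made up for illustration. Recognition quality on scanned pages will vary, so treat it as a starting point only.

```python
import re


def ocr_image(path):
    """Return the text Tesseract recognizes in one scanned image.

    Assumption: Tesseract, pytesseract and Pillow are installed.
    The imports are local so the keyword search below works without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [kw for kw in keywords if kw.lower() in words]
```

You would call it as something like `find_keywords(ocr_image("page1.jpg"), ["election", "budget"])`, where `page1.jpg` is a placeholder for one of the scanned JPEGs (run it on the images before they go into the PDF).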
Thanks...
Actually, I am trying to write a program that will search the PDFs for keywords. I guess I'll have to associate the files with the keywords in the images, then. This is the only method I can think of... Is it the only way to do that?
Thanks..
I really have no idea how you plan to "associate files with the keywords in the image" unless you mean connecting a text file to an image somehow. Maybe you need to explain in detail what you are trying to accomplish and what type of images you have (bitmaps, JPEGs, etc.).
I have a bunch of images that are scans of newspapers. I need to link the headlines in each newspaper image to an index. So what I was thinking of was building a database that links each image file to the keywords in its headlines. I think this is the only way to do that.
That makes sense. The only way to do that as far as I know is by hand. As I said, there is no text in an image so you will have to eyeball the headlines and type them in. Even an OCR program won't help that much.
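The database idea could look something like the sketch below. It uses Python's built-in `sqlite3` module. The table layout and the function names (`open_index`, `add_headline`, `search`) are just assumptions for illustration, and the headlines are still typed in by hand as described above.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the headline index; the default is a throwaway in-memory DB."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headlines ("
        " image_file TEXT NOT NULL,"
        " headline   TEXT NOT NULL)"
    )
    return conn


def add_headline(conn, image_file, headline):
    """Record one hand-typed headline for a scanned image file."""
    conn.execute(
        "INSERT INTO headlines (image_file, headline) VALUES (?, ?)",
        (image_file, headline),
    )
    conn.commit()


def search(conn, keyword):
    """Return the image files whose headlines contain keyword.

    SQLite's LIKE ignores case for ASCII letters by default.
    Note that % and _ inside keyword act as wildcards.
    """
    rows = conn.execute(
        "SELECT DISTINCT image_file FROM headlines"
        " WHERE headline LIKE ? ORDER BY image_file",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

For example, after `add_headline(conn, "1998-03-02_p1.jpg", "Council Approves New Budget")` (a made-up file name and headline), `search(conn, "budget")` returns the list of matching image files, which the program can then open or display.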