
Searching the scanned PDFs?

I have a bunch of PDF files made from documents that were scanned into JPEG images. How can I search these PDF documents for text that is in the images?

This doesn't seem possible to me either, but I'd appreciate any suggestions. Are there any modifications I should make to the images before converting them to PDF? Anything else?

Thanks a lot

shAq
Newbie Poster
10 posts since Jul 2005
Reputation Points: 10
Solved Threads: 0

It isn't possible. There is no text in an image; the words are just part of the picture.

You can try an OCR (optical character recognition) program, which can read the text in an image (usually an uncompressed one). The best one I've found is from http://www.transym.com/index.html
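If an OCR program can turn each scanned page into plain text, the search itself becomes ordinary string matching. The Python sketch below assumes you wrap whatever OCR engine you use in a function `ocr(path) -> str`; `ocr_search` is a hypothetical helper, not part of any OCR product. It matches case-insensitively and lets a phrase span a line break, since OCR output often wraps headlines.

```python
import re
from typing import Callable, Iterable


def ocr_search(image_paths: Iterable[str], keyword: str,
               ocr: Callable[[str], str]) -> list[str]:
    """Return the paths whose OCR text contains `keyword` (hypothetical helper).

    `ocr` is any function that takes an image path and returns the text
    recognized in that image; it stands in for a real OCR engine.
    """
    # Join the keyword's words with \s+ so a phrase still matches when the
    # OCR output breaks it across lines; ignore case.
    words = keyword.split()
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    return [path for path in image_paths if pattern.search(ocr(path))]


if __name__ == "__main__":
    # Fake OCR results so the sketch runs without an OCR engine installed.
    fake_text = {"p1.jpg": "CITY COUNCIL\nVOTES ON BUDGET", "p2.jpg": "Weather: rain"}
    print(ocr_search(fake_text, "council votes", fake_text.__getitem__))  # ['p1.jpg']
```

As one possible `ocr` function, assuming the Tesseract engine plus the `pytesseract` and Pillow packages are installed (a different product from the one recommended above), `lambda p: pytesseract.image_to_string(Image.open(p))` would work. Recognition errors translate directly into missed matches, so results are only as good as the OCR output.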

WaltP
Posting Sage w/ dash of thyme
Moderator
10,492 posts since May 2006
Reputation Points: 3,348
Solved Threads: 943

Thanks...

Actually, I'm trying to write a program that will search the PDFs for keywords. I guess I'll have to associate each file with the keywords in its image, then. That's the only method I can think of. Is this the only way to do it?

Thanks.
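Associating files with keywords amounts to building an inverted index: a map from each keyword to the files tagged with it. Below is a minimal in-memory sketch, assuming the keywords for each file are already available as text (typed in by hand or taken from OCR output); `KeywordIndex` and `tokenize` are illustrative names, not an existing library.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Inverted index mapping each keyword to the set of files tagged with it."""

    def __init__(self) -> None:
        self._index: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, filename: str, text: str) -> None:
        # `text` is whatever was associated with the file: typed-in
        # keywords, a headline, or OCR output.
        for word in tokenize(text):
            self._index[word].add(filename)

    def search(self, query: str) -> set[str]:
        # A file matches only if it is tagged with every word in the query.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._index.get(words[0], set()))
        for word in words[1:]:
            result &= self._index.get(word, set())
        return result


if __name__ == "__main__":
    idx = KeywordIndex()
    idx.add("page1.pdf", "Flood hits city")
    idx.add("page2.pdf", "City elects new mayor")
    print(sorted(idx.search("city")))        # ['page1.pdf', 'page2.pdf']
    print(sorted(idx.search("city flood")))  # ['page1.pdf']
```

Requiring every query word (set intersection) keeps results narrow; a union instead would return files matching any of the words.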

shAq

He's already told you to create your PDF files differently...

jwenting
duckman
Team Colleague
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337

I really have no idea how you plan to "associate files with the keywords in the image", unless you mean somehow connecting a text file to an image. Maybe you need to explain in detail what you are trying to accomplish and what type of images you have (bitmaps, JPEGs, etc.).

WaltP

I have a bunch of images that came from scanned newspapers. I need to link the headlines in each newspaper image to an index. So what I was thinking of was making a database that links each image file with the keywords of its headlines. I think this is the only way to do it.
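A small relational database fits this plan. The sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema: one row per scanned image and one row per headline typed in for it. `open_db`, `add_headline`, and `find_images` are illustrative names, and the file names in the example are made up.

```python
import sqlite3

# Hypothetical schema: one row per scanned image, one row per headline on it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS image (
    id   INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS headline (
    id       INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES image(id),
    text     TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_headline(conn: sqlite3.Connection, image_path: str, headline: str) -> None:
    # Create the image row on first use, then attach the headline to it.
    conn.execute("INSERT OR IGNORE INTO image (path) VALUES (?)", (image_path,))
    (image_id,) = conn.execute(
        "SELECT id FROM image WHERE path = ?", (image_path,)
    ).fetchone()
    conn.execute(
        "INSERT INTO headline (image_id, text) VALUES (?, ?)", (image_id, headline)
    )
    conn.commit()


def find_images(conn: sqlite3.Connection, keyword: str) -> list[str]:
    # Escape LIKE wildcards so a keyword such as "100%" is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT DISTINCT image.path FROM image "
        "JOIN headline ON headline.image_id = image.id "
        "WHERE headline.text LIKE ? ESCAPE '\\' "
        "ORDER BY image.path",
        (f"%{escaped}%",),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    db = open_db()
    add_headline(db, "scan_001.jpg", "Flood hits city")
    add_headline(db, "scan_002.jpg", "City rebuilds bridge")
    print(find_images(db, "city"))  # ['scan_001.jpg', 'scan_002.jpg']
```

`LIKE` does substring matching (case-insensitive only for ASCII letters in SQLite), so "city" would also match "ethnicity". For word-level matching you could store one keyword per row instead, or use SQLite's full-text search extension if your build includes it.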

shAq

That makes sense. As far as I know, the only way to do that is by hand. As I said, there is no text in an image, so you will have to eyeball the headlines and type them in. Even an OCR program won't help much.

WaltP