Jan 29th, 2007
0

Searching the scanned pdfs?

I have a bunch of PDF files made from documents scanned into JPEG images. How can I search these PDF documents for text that appears in the images?

This does not seem possible to me either, but I'd appreciate any suggestions. Do the images need to be modified before they are converted to PDF? Anything else?

Thanks a lot
Reputation Points: 10
Solved Threads: 0
Newbie Poster
shAq
10 posts
since Jul 2005
Jan 29th, 2007
0

Re: Searching the scanned pdfs?

It isn't. There is no text in an image. It's just part of the image.

You can try an OCR (optical character recognition) program that can read an image (usually an uncompressed one). The best one I found is from http://www.transym.com/index.html
Moderator
Reputation Points: 3278
Solved Threads: 894
Posting Sage
WaltP
7,747 posts
since May 2006
Jan 29th, 2007
0

Re: Searching the scanned pdfs?

Quote originally posted by WaltP ...
It isn't. There is no text in an image. It's just part of the image.

You can try an OCR (optical character recognition) program that can read an image (usually an uncompressed one). The best one I found is from http://www.transym.com/index.html
Thanks...

Actually, I am trying to write a program that will search the PDFs for keywords. I guess I'll have to associate the files with the keywords in the images, then. This is the only method I can think of... Is it the only way to do that?

Thanks..
Reputation Points: 10
Solved Threads: 0
Newbie Poster
shAq
10 posts
since Jul 2005
Jan 29th, 2007
0

Re: Searching the scanned pdfs?

He's already told you to create your PDF files differently...
Team Colleague
Reputation Points: 1658
Solved Threads: 331
duckman
jwenting
7,719 posts
since Nov 2004
Jan 29th, 2007
0

Re: Searching the scanned pdfs?

Quote originally posted by shAq ...
Thanks...

Actually, I am trying to write a program that will search the PDFs for keywords. I guess I'll have to associate the files with the keywords in the images, then. This is the only method I can think of... Is it the only way to do that?

Thanks..
I really have no idea how you plan to "associate files with the keywords in the image" unless you mean connecting a text file to an image somehow. Maybe you need to explain in detail what you are trying to accomplish and what type of images you have (bitmaps, JPEG, etc.).
Moderator
Reputation Points: 3278
Solved Threads: 894
Posting Sage
WaltP
7,747 posts
since May 2006
Jan 30th, 2007
0

Re: Searching the scanned pdfs?

I have a bunch of images that are the result of scanning newspapers. I need to link the headlines in each newspaper image to an index. So what I was thinking was building a database that links each image file with the keywords from its headlines. I think this is the only way to do that.
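
For illustration, the database described above could be sketched roughly as follows. This is only a minimal sketch, not something from the thread: it assumes the headline text has already been obtained somehow (typed in by hand or taken from OCR output), and the table name `headline_keywords`, the function names, and the file names such as `scan_0001.jpg` are all hypothetical. It uses Python's standard `sqlite3` module.

```python
import sqlite3


def create_index(conn):
    """Create the (hypothetical) keyword-to-image table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headline_keywords ("
        " keyword TEXT NOT NULL,"
        " image_file TEXT NOT NULL,"
        " PRIMARY KEY (keyword, image_file))"
    )


def add_headline(conn, image_file, headline):
    """Split a headline into lowercase keywords and link each to the image."""
    keywords = {word.strip(".,:;!?\"'()").lower() for word in headline.split()}
    keywords.discard("")  # drop tokens that were punctuation only
    conn.executemany(
        "INSERT OR IGNORE INTO headline_keywords (keyword, image_file)"
        " VALUES (?, ?)",
        [(keyword, image_file) for keyword in keywords],
    )
    conn.commit()


def find_images(conn, keyword):
    """Return the image files whose headlines contain the given keyword."""
    rows = conn.execute(
        "SELECT image_file FROM headline_keywords"
        " WHERE keyword = ? ORDER BY image_file",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


# Example with made-up headlines and file names (assumptions, not real data).
conn = sqlite3.connect(":memory:")
create_index(conn)
add_headline(conn, "scan_0001.jpg", "City Council Approves New Budget")
add_headline(conn, "scan_0002.jpg", "Budget Cuts Hit Local Schools")
print(find_images(conn, "budget"))
```

Storing one row per (keyword, image) pair keeps lookups to a single indexed query. Matching here is on exact whole words only, so a real index would probably also want to drop common words and handle plurals.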
Reputation Points: 10
Solved Threads: 0
Newbie Poster
shAq
10 posts
since Jul 2005
Jan 30th, 2007
0

Re: Searching the scanned pdfs?

That makes sense. The only way to do that, as far as I know, is by hand. As I said, there is no text in an image, so you will have to eyeball the headlines and type them in. Even an OCR program won't help that much.
Moderator
Reputation Points: 3278
Solved Threads: 894
Posting Sage
WaltP
7,747 posts
since May 2006


© 2011 DaniWeb® LLC