Searching the scanned pdfs?


shAq
Newbie Poster

  #1
Jan 29th, 2007
I have a bunch of PDF files made from documents scanned into JPEG images. How can I search these PDF documents for text that appears in the images?

This does not seem possible to me either, but I'd appreciate any suggestions. Do the images need to be modified in some way before they are converted to PDF? Anything else?

Thanks a lot
WaltP
Posting Sensei

  #2
Jan 29th, 2007
It isn't possible as-is. There is no text in an image; what looks like text is just part of the picture.

You can try an OCR (optical character recognition) program, which reads an image (usually an uncompressed one) and converts the characters it finds into text. The best one I found is from http://www.transym.com/index.html
The 3 Laws of the Procrastination Society:
1) Never do today that which can be put off until tomorrow
2) Tomorrow never comes
shAq
Newbie Poster

  #3
Jan 29th, 2007
Thanks...

Actually, I am trying to write a program that will search the PDFs for keywords. I guess I'll have to associate each file with the keywords that appear in its image. This is the only method I can think of... Is it the only way to do that?

Thanks..
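
The idea of associating files with keywords can be sketched as a small inverted index. This is only a minimal sketch, assuming the text of each scanned file has already been extracted somehow (by an OCR program or by hand). The file names, the sample text, and the helper names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(text_by_file):
    """Map each word to the set of files whose extracted text contains it."""
    index = defaultdict(set)
    for filename, text in text_by_file.items():
        for word in tokenize(text):
            index[word].add(filename)
    return index


def search(index, query):
    """Return the files containing every word of the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    # index.get avoids creating empty entries for unknown words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical extracted text; real input would come from OCR or manual typing.
sample_text = {
    "scan_001.pdf": "City council approves new budget",
    "scan_002.pdf": "Local team wins championship",
    "scan_003.pdf": "Council debates budget cuts",
}
index = build_index(sample_text)
print(search(index, "budget council"))
```

The search only matches whole words exactly, so a misread character in the extracted text would make that word unfindable.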
jwenting
duckman

  #4
Jan 29th, 2007
He's already told you to create your PDF files differently...
As people are clearly allowed to attack me but I'm not allowed to defend myself, I no longer post to this site.
WaltP
Posting Sensei

  #5
Jan 29th, 2007
I really have no idea how you plan to "associate files with the keywords in the image" unless you mean connecting a text file to an image somehow. Maybe you need to explain in detail what you are trying to accomplish and what type of images you have (bitmaps, JPEGs, etc.).
shAq
Newbie Poster

  #6
Jan 30th, 2007
I have a bunch of images that are scans of newspapers. I need to link the headlines in each newspaper image to an index. So what I was thinking of was making a database that links each image file to the keywords of its headlines. I think this is the only way to do that.
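
A database like the one described could look roughly like the following. This is a minimal sketch using SQLite, assuming the headlines have already been typed in or extracted for each image. The table name `headline_keywords`, the helper functions, and the sample file names and headlines are hypothetical.

```python
import sqlite3


def create_index(conn):
    """Create a table that links one keyword to one image file per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headline_keywords ("
        " keyword TEXT NOT NULL,"
        " image_file TEXT NOT NULL,"
        " PRIMARY KEY (keyword, image_file))"
    )


def add_headline(conn, image_file, headline):
    """Split a headline into lowercase keywords and link each to the image."""
    keywords = {w.strip(".,;:!?\"'").lower() for w in headline.split()}
    keywords.discard("")  # drop tokens that were punctuation only
    conn.executemany(
        "INSERT OR IGNORE INTO headline_keywords (keyword, image_file)"
        " VALUES (?, ?)",
        [(k, image_file) for k in keywords],
    )
    conn.commit()


def find_images(conn, keyword):
    """Return the image files whose headlines contain the keyword."""
    rows = conn.execute(
        "SELECT image_file FROM headline_keywords"
        " WHERE keyword = ? ORDER BY image_file",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


# In-memory database for illustration; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
create_index(conn)
add_headline(conn, "page_0001.jpg", "Mayor opens new bridge")
add_headline(conn, "page_0002.jpg", "Bridge repairs delayed again")
print(find_images(conn, "bridge"))
```

Storing one row per keyword and image keeps lookups simple, and the composite primary key prevents the same link from being stored twice.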
WaltP
Posting Sensei

  #7
Jan 30th, 2007
That makes sense. The only way to do that, as far as I know, is by hand. As I said, there is no text in an image, so you will have to eyeball the headlines and type them in. Even an OCR program won't help that much.

©2003 - 2009 DaniWeb® LLC