Searching the scanned PDFs?
Join Date: Jul 2005
Posts: 10
Reputation:
Solved Threads: 0
I have a bunch of PDF files made from documents that were scanned into JPEG images. How can I search these PDF documents for the text that appears in the images?
This doesn't seem possible to me either, but I'd appreciate any suggestions. Are there any modifications I should make to the images before converting them to PDF? Anything else?
Thanks a lot
It isn't. There is no text in an image. It's just part of the image.
You can try an OCR (optical character recognition) program that can read an image (usually an uncompressed one). The best one I found is from http://www.transym.com/index.html
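As one possible route, using a different engine from the Transym product linked above (whose programming interface isn't covered here), the free Tesseract OCR engine can write a searchable PDF directly: its command line takes an input image, an output base name, and an optional `pdf` config that places an invisible text layer under the scanned image. A minimal Python sketch, assuming `tesseract` is installed and on the PATH and that the page images have already been pulled out of the PDFs; the file names and the helpers `build_ocr_command`/`ocr_page` are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image_path, output_base, make_pdf=True):
    """Build a Tesseract command line for one scanned page.

    Tesseract appends the extension itself: with the 'pdf' config it writes
    <output_base>.pdf (image plus hidden text layer); without it, <output_base>.txt.
    """
    cmd = ["tesseract", str(image_path), str(output_base)]
    if make_pdf:
        cmd.append("pdf")
    return cmd


def ocr_page(image_path, output_base, make_pdf=True):
    """Run Tesseract on one page image and return the path of the file it wrote."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_ocr_command(image_path, output_base, make_pdf), check=True)
    suffix = ".pdf" if make_pdf else ".txt"
    return Path(str(output_base) + suffix)


if __name__ == "__main__":
    # Hypothetical file names: a lossless TIFF scan of page 1.
    print(build_ocr_command("scan_page1.tif", "scan_page1"))
    # prints ['tesseract', 'scan_page1.tif', 'scan_page1', 'pdf']
```

Only `ocr_page` actually needs Tesseract; `build_ocr_command` just shows the command it would run. Saving scans in a lossless format such as TIFF or PNG at around 300 dpi usually gives better OCR results than heavily compressed JPEGs, which fits the "uncompressed" advice. OCR output still contains recognition errors, so a keyword search over it can miss words.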
The 3 Laws of the Procrastination Society:
1) Never do today that which can be put off until tomorrow
2) Tomorrow never comes
Actually, I am trying to write a program that will search the PDFs for keywords. I guess I'll have to associate each file with the keywords that appear in its images, then. That's the only method I can think of... Is this the only way to do it?
Thanks.
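The keyword approach described in that post amounts to a small inverted index: map each keyword to the set of PDF files it was associated with, then answer a query by intersecting those sets. A minimal Python sketch, assuming the keywords for each file have already been collected somehow (typed in by hand or taken from OCR output); the file names, keywords, and the helpers `build_index`/`search` are made up for illustration:

```python
from collections import defaultdict


def build_index(file_keywords):
    """Invert {filename: [keywords]} into {keyword: {filenames}}.

    Keywords are lower-cased so searches are case-insensitive.
    """
    index = defaultdict(set)
    for filename, keywords in file_keywords.items():
        for kw in keywords:
            index[kw.strip().lower()].add(filename)
    return index


def search(index, query):
    """Return the sorted list of files tagged with every word in the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical files and hand-entered keywords.
    tags = {
        "invoice_0042.pdf": ["Invoice", "Acme", "2005"],
        "contract_07.pdf": ["Contract", "Acme"],
        "memo_11.pdf": ["Memo", "2005"],
    }
    idx = build_index(tags)
    print(search(idx, "acme"))       # ['contract_07.pdf', 'invoice_0042.pdf']
    print(search(idx, "acme 2005"))  # ['invoice_0042.pdf']
```

Requiring every query word (set intersection) is one design choice; returning files that match any word would use a union instead.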
That makes sense. As far as I know, the only way to do that is by hand. As I said, there is no text in an image, so you will have to read the headlines yourself and type them in. Even an OCR program won't help that much.
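If the keywords are typed in by hand, one option is to keep them in a plain text file next to the PDFs so the search program can load them. The format below, one line per file as `filename: keyword, keyword, ...`, is purely a hypothetical convention, and `parse_keyword_lines`/`load_keyword_file` are illustrative helpers, not part of any library:

```python
from pathlib import Path


def parse_keyword_lines(lines):
    """Parse lines like 'report.pdf: budget, 2005, smith' into {filename: [keywords]}.

    The first ':' separates the filename from its keywords, so filenames must not
    contain a colon. Blank lines and lines starting with '#' are ignored, and
    repeated filenames accumulate keywords.
    """
    result = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        filename, sep, rest = line.partition(":")
        if not sep or not filename.strip():
            raise ValueError(f"line {lineno}: expected 'filename: keywords'")
        keywords = [kw.strip() for kw in rest.split(",") if kw.strip()]
        result.setdefault(filename.strip(), []).extend(keywords)
    return result


def load_keyword_file(path):
    """Read a hand-maintained keyword file (hypothetical format) from disk."""
    return parse_keyword_lines(Path(path).read_text(encoding="utf-8").splitlines())


if __name__ == "__main__":
    sample = [
        "# typed in by hand after reading each scan",
        "invoice_0042.pdf: Invoice, Acme, 2005",
        "",
        "memo_11.pdf: Memo, 2005",
    ]
    print(parse_keyword_lines(sample))
    # {'invoice_0042.pdf': ['Invoice', 'Acme', '2005'], 'memo_11.pdf': ['Memo', '2005']}
```

The resulting dictionary has the `{filename: [keywords]}` shape that a simple keyword index can be built from.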