Reading PDF files

Hi everyone,

Is there a way to read the text in a PDF file, like reading a Notepad file? After that, I need to be able to use the text, for example in a text area, or show it to my users without relying on another program such as Adobe Reader. That's really important for my project.
Thanks in advance.

uurcnyldrm

I thought that Notepad was an MS program. How do you read it? What do you get when you read the notepad.exe file? Do you mean text files?

The Apache project has some packages (called POI?) for reading PDF files.
Google for it.

NormR1

The answer is yes and no. No, if the PDF is actually a batch of previously scanned images that were simply converted from whatever image format to PDF (they would need to undergo OCR, optical character recognition, which is not 100% accurate).
Yes, if the PDF contains real text: you should be able to do it with iText's PdfReader and get the page's building components, or you can use Apache PDFBox, which should be more flexible when it comes to extracting data from a PDF.
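
To illustrate why a PDF cannot simply be read like a .txt file: a PDF is a structured, mostly binary format that begins with the ASCII marker "%PDF-", so reading it line by line gives you the file's internals rather than the visible text. Below is a minimal sketch, using only the Java standard library, that checks for that marker before you hand the file to a PDF library. The class and method names (PdfHeaderCheck, looksLikePdf) are made up for this example, and the check is only a heuristic, since some tools tolerate junk bytes before the header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of iText or PDFBox.
public class PdfHeaderCheck {

    // A PDF file starts with the ASCII bytes "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file begins with the "%PDF-" marker. */
    public static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // end of file reached before the marker was complete
                }
                read += n;
            }
            return read == head.length && Arrays.equals(head, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        if (looksLikePdf(file)) {
            System.out.println("Looks like a PDF: use a PDF library to extract the text.");
        } else {
            System.out.println("Not a PDF header: it may be a plain text file.");
        }
    }
}
```

If the check passes, the actual extraction is the library's job. In PDFBox the class for this is PDFTextStripper, whose getText method returns a String that you can put straight into a text area. A scanned, image-only PDF will also pass this check but yield little or no text, which is the OCR case described above.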

@NormR1 POI is for Microsoft Office document formats (Word, Excel, etc.).

peter_budo

@NormR1 You understood what I meant: .txt files are what I was talking about. Thanks for the answer.

uurcnyldrm

Thanks. That will really help.

uurcnyldrm
