Hi everyone,

Is there a way to read the text in a PDF file like reading Notepad? After that, I need to be able to use the text, for example in a text area, or show it to my users without relying on another program such as Adobe Reader. That's really important for my project.
Thanks in advance.

I thought that Notepad was an MS program. How do you read it? What do you get when you read the notepad.exe file? Do you mean text files?

The Apache project has some packages (called POI?) for reading PDF files.
Google for it.

The answer is yes and no. No, if the PDF is really just a batch of previously scanned images that were converted from some image format to PDF (those would need to undergo OCR, optical character recognition, which is not 100% accurate).
Yes, you should be able to do it with iText's PdfReader and work with the page components, or you can use Apache PDFBox, which should be more flexible for extracting data from a PDF.
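
Since the thread contrasts plain .txt files with PDFs, here is a minimal Java sketch for telling the two apart before handing a file to a library such as iText or PDFBox. It uses only the standard library and relies on the fact that a PDF file begins with the marker `%PDF-`. The class name `PdfSniffer` and its methods are hypothetical names made up for this illustration, and the check assumes the marker sits at the very first byte of the file. It does not extract any text and cannot tell a text-based PDF from a scanned one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of iText or PDFBox.
final class PdfSniffer {
    // A PDF file starts with "%PDF-" followed by a version number such as 1.7.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes begin with the PDF header marker. */
    static boolean looksLikePdf(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first few bytes of the file and checks them for the marker. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        // Files shorter than the marker cannot be PDFs.
        return filled == head.length && looksLikePdf(head);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(file + ": " + (looksLikePdf(file) ? "PDF" : "not a PDF"));
        }
    }
}
```

A file that passes this check still needs a PDF library to get at its text; a file that fails it can usually be read directly as a text file.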

@NormR1 POI is for Microsoft Office document formats (Word, Excel, etc.)

Edited 4 Years Ago by peter_budo: n/a

You understood what I meant: .txt files are what I was talking about. Thanks for the answer.

Thanks. That will really help.
