Reading PDF files

Question

uurcnyldrm -3 Newbie Poster

13 Years Ago

Hi everyone,

Is there a way to read the text in a PDF file, like reading notepad? After that, I should be able to use it, for example in a text area, or show it to my users without using another program such as Adobe Reader. That's really important for my project.
Thanks in advance.

java pdf


All 4 Replies

peter_budo 2,532 Code tags enforcer

13 Years Ago

The answer is yes and no. No, if the PDF is actually a batch of images that were previously scanned and just converted from whatever image format to PDF (they would need to undergo OCR, optical character recognition, which is not 100% accurate).
Yes, if the PDF contains real text: you should be able to do it with iText's PdfReader and get the page components, or you can use Apache PDFBox, which should be more flexible for extracting data from a PDF.

@NormR1 POI is for Microsoft Office document formats (Word, Excel, etc.).
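
To make the "yes and no" concrete, here is a minimal Java sketch of the decision described above. It is only an illustration under stated assumptions: the class PdfTextSketch, its methods, and the message text are hypothetical names, and treating empty extracted text as a sign of a scanned PDF is a heuristic, not a rule from any library. The actual extraction is passed in as a function; the comment shows how Apache PDFBox 2.x could fill that role.

```java
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper; none of these names come from iText or PDFBox.
final class PdfTextSketch {

    static final String NO_TEXT_MESSAGE =
            "No extractable text found; the PDF may be scanned images that need OCR.";

    // Heuristic (an assumption): if extraction returns nothing but whitespace,
    // the pages are probably images rather than real text.
    static boolean looksScanned(String extractedText) {
        return extractedText == null || extractedText.trim().isEmpty();
    }

    // The extractor is where a real library call would go. With Apache PDFBox 2.x
    // on the classpath it could wrap something like:
    //
    //   try (PDDocument doc = PDDocument.load(pdf.toFile())) {
    //       return new PDFTextStripper().getText(doc);
    //   }
    //
    // (both calls throw IOException, so a lambda would need to catch and rethrow it).
    static String textForDisplay(Path pdf, Function<Path, String> extractor) {
        String text = extractor.apply(pdf);
        return looksScanned(text) ? NO_TEXT_MESSAGE : text;
    }
}
```

The returned string can then be shown to users directly, for example with JTextArea.setText, so no external viewer is involved.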



NormR1 563 Posting Sage Team Colleague · Answer 1 · 2012-02-10T04:30:17+00:00

I thought that notepad was an MS program. How do you read it? What do you get when you read the notepad.exe file? Do you mean text files?

The Apache project has some packages (called POI?) for reading PDF files.
Google for it.

uurcnyldrm -3 Newbie Poster · Answer 2 · 2012-02-10T04:39:52+00:00

You understood what I meant: .txt files are what I was talking about. Thanks for the answer.

uurcnyldrm -3 Newbie Poster · Answer 3 · 2012-02-10T04:40:41+00:00

Thanks. That will really help.
