pdf to text

Question

dinhunzvi 0 Newbie Poster

12 Years Ago

Is it possible to extract plain text from a PDF file? If so, could I please have the code?

c++ pdf

3 Contributors
2 Replies
213 Views
1 Hour Discussion Span
Latest Post 12 Years Ago by mike_2000_17

All 2 Replies

mike_2000_17 2,669 21st Century Viking

12 Years Ago

PDF is a complex format with no easy way to extract text from it, so your best bet is to rely on a library made to load PDFs, such as Poppler. It comes with a utility called pdftotext which does exactly what you want. You can use it in a bash terminal (Unix / Linux), for example to save the text to a file:

$ pdftotext my_file.pdf my_file.txt

Or to count the number of occurrences of the word "the" in a PDF file (the "-" tells pdftotext to write the text to standard output instead of a file):

$ pdftotext my_file.pdf - | grep -w -o the | wc -l

Within a C++ program, you can simply issue a system() call with commands like the above, or you can use the libpoppler C++ API directly.
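As a rough sketch of the first route: system() only returns an exit status, so to get the text back into your program you would typically use popen() instead. The code below assumes a POSIX system (popen/pclose are not standard C++) and that pdftotext is on the PATH. The helper names shell_quote, make_pdftotext_command, run_and_capture, and pdf_to_text are made up for this example and are not part of Poppler.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap a string in single quotes for a POSIX shell, escaping any
// embedded single quote as '\'' so file names with spaces are safe.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build the command line. Passing "-" as the output file makes
// pdftotext write the extracted text to standard output.
std::string make_pdftotext_command(const std::string& pdf_path) {
    return "pdftotext " + shell_quote(pdf_path) + " -";
}

// Run a shell command and return everything it wrote to stdout.
// Throws if the pipe cannot be opened or the exit status is nonzero.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");  // POSIX, not ISO C++
    if (!pipe) throw std::runtime_error("popen failed");

    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }

    if (pclose(pipe) != 0) {
        throw std::runtime_error("command failed: " + command);
    }
    return output;
}

// Convenience wrapper: the plain text of a PDF as one string.
std::string pdf_to_text(const std::string& pdf_path) {
    return run_and_capture(make_pdftotext_command(pdf_path));
}
```

From main() you would then call something like pdf_to_text("my_file.pdf") and search or print the returned string. Quoting the path matters because the command goes through the shell. If you would rather not spawn a process at all, the libpoppler C++ API is the other option.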

DavidB 44 Junior Poster · Answer 1 · 2012-10-04T20:45:58+00:00

I usually just copy from a PDF document and paste to a plaintext document.

Do you really need a program to do that?
