Is it possible to extract plain text from a PDF file? If so, can I please have the code?

All 2 Replies

I usually just copy from a PDF document and paste to a plaintext document.

Do you really need a program to do that?

Your best bet is to use Poppler. PDF is a complex format with no easy way to extract text from it, so you need to rely on a library made to load PDFs, like Poppler. It comes with a utility called pdftotext, which does exactly what you want. You can use it in a bash terminal (Unix / Linux), for example, to save the text to a file:

$ pdftotext my_file.pdf my_file.txt

Or to count the number of occurrences of the word "the" in a PDF file (the "-" tells pdftotext to write to standard output instead of to my_file.txt):

$ pdftotext my_file.pdf - | grep -w -o the | wc -l

Within a C++ program, you can simply issue a system() call with commands like the above (or popen() if you need to read the output back into your program), or you can use the Poppler C++ API (poppler-cpp) directly.
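If you want the text back inside your program rather than in a file, here is a minimal sketch of the command-line route. It uses popen() instead of system() so the output can be captured, which assumes a POSIX system with pdftotext on the PATH. The function names (shell_quote, pdftotext_command, extract_text, count_word) are made up for this example and are not part of Poppler.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap an argument in single quotes so the shell treats it as one literal word.
std::string shell_quote(const std::string& arg) {
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'') quoted += "'\\''";  // close quote, escaped quote, reopen
        else quoted += c;
    }
    quoted += "'";
    return quoted;
}

// Build the command line; the trailing "-" sends the text to standard output.
std::string pdftotext_command(const std::string& pdf_path) {
    return "pdftotext " + shell_quote(pdf_path) + " -";
}

// Run pdftotext through popen() (POSIX) and return everything it prints.
std::string extract_text(const std::string& pdf_path) {
    FILE* pipe = popen(pdftotext_command(pdf_path).c_str(), "r");
    if (!pipe) throw std::runtime_error("could not start pdftotext");
    std::string text;
    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        text.append(buffer, n);
    }
    if (pclose(pipe) != 0) throw std::runtime_error("pdftotext failed");
    return text;
}

// Letters, digits and underscore count as word characters.
static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Count whole-word matches, in the spirit of "grep -w -o word | wc -l".
std::size_t count_word(const std::string& text, const std::string& word) {
    if (word.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        std::size_t end = pos + word.size();
        bool left_ok = pos == 0 || !is_word_char(text[pos - 1]);
        bool right_ok = end == text.size() || !is_word_char(text[end]);
        if (left_ok && right_ok) {
            ++count;
            pos = text.find(word, end);
        } else {
            pos = text.find(word, pos + 1);
        }
    }
    return count;
}
```

With that in place, count_word(extract_text("my_file.pdf"), "the") should give roughly the same number as the grep pipeline. Quoting the file name matters because the command goes through a shell, so a path with spaces or quotes would otherwise break it.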
