Hi guys, I'm looking into writing some JavaScript (or using an API, of course) to search a collection of PDFs and return the results as JSON for processing, mainly so I can display them back to the user. I've played around a little with the Adobe Acrobat JavaScript console, which is essentially a console running inside Adobe Acrobat Reader that lets you run some JavaScript.
The following simple snippet runs in that console and returns a list of results from all the PDFs in the selected folder:

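search.matchCase = false;                // case-insensitive search
search.wordMatching = "MatchAnyWord";    // match any word in the query
search.bookmarks = true;                 // also search bookmarks
search.query("will","Folder","/C/Users/xxx/Desktop/PDFs"); // search every PDF in that folder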

I created a folder called PDFs, stored 2 PDFs in it, and ran this search, which shows all the results in a separate window. That's great, but I need to be able to "export" this functionality and package it as a standalone script that I can run outside Acrobat. Has anybody got any ideas? Or even better, has anybody done this before?
cheers

What did you use to retrieve the text from the PDF files, and what format are the results in? Did you read the PDF files using JavaScript or something else?

Nothing. The snippet above is internal to Adobe Acrobat, so it is literally those lines of code pasted into that Adobe console. It returns the results in a nice window. I presume that under the hood it's a JSON object of some kind, but from there you don't have access to the code. I've looked a bit into pdf.js (https://github.com/mozilla/pdf.js), as that seems to be the way forward, although installing it and getting it up and running is proving rather challenging because not everything works the way it should, especially running gulp. I failed in the office, so I'll try on my laptop at home this evening with a clean install of Node.js and everything else, and I'm hoping to get some help from their IRC channel.
Has anybody used pdf.js?
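
For what it's worth, once the text of each page has been extracted somehow (pdf.js, pdftotext, or anything else), the search-and-return-JSON part can be plain JavaScript. Here is a minimal sketch under that assumption. The function name `searchDocuments`, the input shape `{ file, pages }`, and the `contextChars` option are all made up for illustration and are not part of pdf.js or Acrobat. It also does simple substring matching, so it is not equivalent to Acrobat's "MatchAnyWord".

```javascript
// Hypothetical input: one entry per PDF, with the already-extracted
// text of each page as a string. Nothing here reads a PDF file.
function searchDocuments(documents, term, options = {}) {
  const { matchCase = false, contextChars = 30 } = options;
  const needle = matchCase ? term : term.toLowerCase();
  const results = [];
  if (needle.length === 0) return results; // nothing to search for

  for (const doc of documents) {
    doc.pages.forEach((pageText, pageIndex) => {
      // Offsets assume lowercasing does not change the string length,
      // which holds for ASCII text.
      const haystack = matchCase ? pageText : pageText.toLowerCase();
      let pos = haystack.indexOf(needle);
      while (pos !== -1) {
        results.push({
          file: doc.file,
          page: pageIndex + 1, // 1-based page number
          offset: pos,         // character offset within the page text
          context: pageText.slice(
            Math.max(0, pos - contextChars),
            pos + needle.length + contextChars
          ),
        });
        pos = haystack.indexOf(needle, pos + needle.length);
      }
    });
  }
  return results;
}

// Made-up sample data standing in for two extracted PDFs.
const sample = [
  { file: "a.pdf", pages: ["Where there is a will there is a way.", "Nothing here."] },
  { file: "b.pdf", pages: ["Will it work?"] },
];

// Serialize the hits as JSON, ready to send back to a page for display.
console.log(JSON.stringify(searchDocuments(sample, "will"), null, 2));
```

Each hit carries the file name, page number, and a bit of surrounding text, which is roughly what the Acrobat results window shows.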

If you want to extract text from PDF files using JavaScript, you could try the pdftotext package on GitHub. It should be simpler than pdf.js; however, the extracted text may not come back in JSON format.
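
Getting from plain text to JSON is not much work, though. As a rough sketch, assuming the package hands back the raw output of the pdftotext command-line tool as a single string: that tool separates pages with a form feed character (`\f`) by default, so you can split on it. The function name `pagesFromPdftotext` is made up for illustration.

```javascript
// Assumes rawText is the unmodified output of the pdftotext CLI,
// where each page is followed by a form feed character.
function pagesFromPdftotext(rawText) {
  const pages = rawText.split("\f");
  // The final form feed leaves an empty trailing piece; drop it.
  if (pages.length > 1 && pages[pages.length - 1].trim() === "") {
    pages.pop();
  }
  return pages.map((text, index) => ({ page: index + 1, text: text.trim() }));
}

// Made-up two-page output for demonstration.
const raw = "First page text\n\fSecond page text\n\f";
console.log(JSON.stringify(pagesFromPdftotext(raw), null, 2));
```

If the wrapper strips or replaces the form feeds, this will return everything as a single page, so check the actual output first.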