
Hi guys, I'm looking into writing some JavaScript (or using an API, of course) to search a collection of PDFs and return the results in JSON format for processing (mainly displaying them back to the user). I've played around a little with the Adobe Acrobat console, which is essentially a console running inside Adobe Acrobat Reader that lets you run some JavaScript.
The following simple snippet runs in that console and returns a list of results from all the selected PDFs:

search.matchCase = false;
search.wordMatching = "MatchAnyWord";
search.bookmarks = true;
search.query("will","Folder","/C/Users/xxx/Desktop/PDFs");

I created a folder called PDFs, stored 2 PDFs in it, and ran this search, which returns all the results in a separate window. That's great, but I need to be able to "export" this functionality and package it up in a script. Has anybody got any idea? Or even better, has anybody done this before?
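To make the goal concrete, here is a rough sketch of the kind of output I'm after. It assumes the text of each page has already been extracted somehow; the `searchDocs` name, the `docs` shape and the result fields are all made up for illustration and have nothing to do with the Acrobat `search` object.

```javascript
// Hypothetical helper: search already-extracted page text and build
// JSON-friendly results. Assumed input shape:
//   [{ file: "a.pdf", pages: ["text of page 1", "text of page 2"] }]
function searchDocs(docs, query, { matchCase = false } = {}) {
  const needle = matchCase ? query : query.toLowerCase();
  const results = [];
  if (needle === "") return results; // nothing to search for

  for (const doc of docs) {
    doc.pages.forEach((pageText, i) => {
      const haystack = matchCase ? pageText : pageText.toLowerCase();
      let pos = haystack.indexOf(needle);
      while (pos !== -1) {
        results.push({
          file: doc.file,
          page: i + 1, // 1-based page number
          offset: pos, // character offset within the page text
          snippet: pageText
            .slice(Math.max(0, pos - 30), pos + needle.length + 30)
            .trim(),
        });
        pos = haystack.indexOf(needle, pos + needle.length);
      }
    });
  }
  return results;
}

// Made-up sample data standing in for two extracted PDFs.
const sample = [
  { file: "a.pdf", pages: ["Where there is a will there is a way.", "Nothing here."] },
  { file: "b.pdf", pages: ["I WILL be back."] },
];
console.log(JSON.stringify(searchDocs(sample, "will"), null, 2));
```

Note this is plain substring matching, so "will" would also hit "willow"; whole-word matching like Acrobat's `wordMatching` option would need extra work.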
cheers

Edited by Violet_82


What did you use to retrieve the text from the PDF files? And what is the format of the results? Did you read the PDF files using JavaScript or something else?


Nothing. The above is internal to Adobe Acrobat, so it literally is those lines of code pasted into this peculiar Adobe console. It returns the results in a nice window. I presume that under the hood it's a JSON object of some kind, but from there you don't have access to the code. I've looked a bit into pdf.js (https://github.com/mozilla/pdf.js), as that seems to be the way forward, although installing it and getting it up and running is proving rather challenging because not everything works the way it should, especially running gulp. I failed in the office, so I'll try on my laptop at home this evening, hopefully with a clean install of Node.js and everything else, and I'm hoping to get some help from their IRC channel.
Has anybody used pdf.js?
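
For what it's worth, this is roughly how I understand text extraction with pdf.js to work under Node, as a sketch only. It assumes the `pdfjs-dist` npm package is installed instead of building from source with gulp, and the import path of the "legacy" build is an assumption that differs between versions. `itemsToText` and `extractPages` are names I made up; `getDocument`, `numPages`, `getPage` and `getTextContent` come from pdf.js.

```javascript
const fs = require("fs/promises");

// Join the text items pdf.js returns for one page into a single string.
function itemsToText(items) {
  return items
    .map((item) => item.str || "") // each text item carries its text in `str`
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Assumption: `pdfjs-dist` is installed, and this import path matches the
// installed version (older releases ship "pdfjs-dist/legacy/build/pdf.js").
async function extractPages(pdfPath) {
  const pdfjsLib = await import("pdfjs-dist/legacy/build/pdf.mjs");
  const data = new Uint8Array(await fs.readFile(pdfPath));
  const doc = await pdfjsLib.getDocument({ data }).promise;

  const pages = [];
  for (let n = 1; n <= doc.numPages; n++) { // pdf.js pages are 1-based
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(itemsToText(content.items));
  }
  return pages; // one string per page
}

// Usage (not run here because it needs a real file):
//   extractPages("/path/to/some.pdf").then((pages) => console.log(JSON.stringify(pages)));
```

The array of page strings this returns can be passed straight to `JSON.stringify`, or searched page by page.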


If you want to extract text from PDF files using JavaScript, you could try the pdftotext project on GitHub. It should be simpler than pdf.js; however, the extracted results may not be in JSON format.
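
As a rough sketch of that idea, the following calls the `pdftotext` command-line tool (from Poppler or Xpdf) from Node and splits its output into pages. It uses the plain CLI, not any particular GitHub wrapper, and it assumes the binary is installed and on the PATH. `splitPages` and `pdfToPages` are hypothetical names.

```javascript
const { execFile } = require("child_process");

// pdftotext separates pages with a form feed character (\f).
function splitPages(output) {
  const pages = output.split("\f");
  // The output normally ends with a form feed, leaving an empty last entry.
  if (pages.length > 1 && pages[pages.length - 1].trim() === "") pages.pop();
  return pages.map((p) => p.trim());
}

// Assumption: the `pdftotext` binary is available on the PATH.
function pdfToPages(pdfPath) {
  return new Promise((resolve, reject) => {
    // "-" as the output file name sends the extracted text to stdout.
    execFile(
      "pdftotext",
      ["-layout", pdfPath, "-"],
      { maxBuffer: 64 * 1024 * 1024 }, // allow large documents
      (err, stdout) => (err ? reject(err) : resolve(splitPages(stdout)))
    );
  });
}

// Usage (not run here because it needs pdftotext and a real file):
//   pdfToPages("/path/to/some.pdf").then((pages) => console.log(JSON.stringify({ pages })));
```

Since the tool only emits plain text, wrapping the page array in `JSON.stringify` as in the usage comment is one way to get JSON out of it.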

Edited by Taywin
