Open PDFs in a Folder

Question

Stuugie 50 Marketing Strategist

11 Years Ago

Hi All,

I'm trying to loop through and open PDFs in a folder using Java. I have the following code:

import java.awt.Desktop;
import java.io.File;
import java.io.IOException;


public class OpenPDFs {

    public static void main(String[] args) throws IOException {
        // TODO code application logic here
        String fp;
        fp ="S:\\Economic Forecasts\\Fcst13\\SourceForecasts\\";
        File file = new File(fp);
            if(file.toString().endsWith(".pdf"))
                Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + file);
            else{
                Desktop dt = Desktop.getDesktop();
                dt.open(file);
            }
    }
}

This only opens the folder window that contains the files. In the end, what I've been tasked to do is go through a lot of PDFs and remove the metadata within each one. I know I can do this using Acrobat 9, but it can only be done one at a time. The people asking me about this say they have about 1000 PDFs to process. Has anyone ever done this, or can you suggest a good way to do it?
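A note on why only the folder opens: `fp` names the folder itself, so `file.toString().endsWith(".pdf")` is false and `Desktop.open` is handed the directory. A minimal sketch of looping over the folder's contents instead is shown here. The class name `PdfLister` and its helper methods are made up for illustration, and opening every file this way is only sensible for a handful of PDFs, not 1000.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class, not part of the original post.
final class PdfLister {

    // True when the file name ends in ".pdf", ignoring case.
    static boolean isPdf(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Returns the PDF files directly inside dir, sorted by path.
    // Returns an empty array when dir does not exist or is not a directory.
    static File[] listPdfs(File dir) {
        File[] found = dir.listFiles((d, name) -> isPdf(name) && new File(d, name).isFile());
        if (found == null) {
            return new File[0];
        }
        Arrays.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Pass the folder as the first argument; defaults to the current folder.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : listPdfs(dir)) {
            System.out.println("Opening " + pdf);
            Desktop.getDesktop().open(pdf); // opens each PDF in the default viewer
        }
    }
}
```

The same loop is the natural place to call a metadata-removal step instead of `Desktop.open`.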

godzab 0 Junior Poster in Training

11 Years Ago

Use threads to do this; they can work on several files at a time.

Example:

class OpenPDFs implements Runnable {
    @Override
    public void run() {
        // do the task you did in main in here
    }

    // main must be static for the JVM to use it as the entry point
    public static void main(String[] args) {
        // this is just an example: two tasks, one thread each
        OpenPDFs open = new OpenPDFs();
        OpenPDFs open1 = new OpenPDFs();
        Thread t = new Thread(open);
        Thread t1 = new Thread(open1);
        t.start(); // start() returns immediately, so both tasks run concurrently
        t1.start();
    }
}

Here is an article on this; I'm not sure if I explained it well: http://java.sampleexamples.com/how-to-use-runnable-interface-for-creating-thread-in-java/
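For a large batch, creating one thread per file by hand gets unwieldy. A hedged sketch of the same idea with a fixed-size thread pool from `java.util.concurrent` follows. The class `ParallelPdfTasks` and its `process` method are placeholders invented for illustration; `process` only reports the file name and would need to be replaced by the real per-file work.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not from the thread.
final class ParallelPdfTasks {

    // Placeholder for the real per-file work: it only builds a status string.
    static String process(File pdf) {
        return "processed " + pdf.getName();
    }

    // Runs process() for every file on a pool of the given size and
    // returns the results in the same order as the input list.
    static List<String> processAll(List<File> pdfs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (File pdf : pdfs) {
                Callable<String> task = () -> process(pdf);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                try {
                    results.add(f.get()); // waits for this task to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit
        }
    }
}
```

Threads only shorten the run time; they do not change what is done to each file, so the metadata-removal step still has to come from a library or an external tool.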

iamthwee

11 Years Ago

Not quite Java-related, but I would opt for:
http://www.rockpdf.com/

Please note the trial version can only handle PDFs of up to 50 pages.

Grab the trial, then use a batch file to recurse through the directory, removing all the PDF metadata.

Shouldn't be too difficult... If you need further help I can be more specific.

This assumes you are on Windows; if not, you could always set up VirtualBox.

*Make sure to back up your master directory in case things go wrong.

iamthwee

11 Years Ago

Sorry, I just stumbled across this:

http://www.traction-software.co.uk/servertools/pdfinfo/trialrestrictions.htm

which has no trial restrictions for removing metadata, already has a batch file processor included, AND is usable on different operating systems.

I haven't tried it, but it looks like the better option IMO.

peter_budo 2,532 Code tags enforcer

11 Years Ago

Have you tried iText or Apache PDFBox?

iamthwee commented: great links +14

Stuugie 50 Marketing Strategist · Answer 1 · 2013-08-14T22:46:54+00:00

Thanks to both of you. I am going to test iamthwee's suggestion first because I was initially tasked with finding third-party software to do the job and wasn't having any luck. I'll let you both know how it goes tomorrow!

Stuugie 50 Marketing Strategist · Answer 2 · 2013-08-15T02:49:20+00:00

I'll check out both. Money isn't an issue as it's not mine being spent! :) Thanks again iamthwee!

Stuugie 50 Marketing Strategist · Answer 3 · 2013-08-15T12:44:06+00:00

So I'm truthfully rather noobish when it comes to adding something like PDFBox. Is there a specific folder path I need to follow in order to use its imports? I've been trying to use any and all of these suggestions, but they either aren't installing, aren't importing, or I'm plain old dumb (which could be the case right now). Either way, this is really frustrating me atm.

Stuugie 50 Marketing Strategist · Answer 4 · 2013-08-15T12:57:24+00:00

Oh yeah and trying to run the first two suggested software from the command prompt keeps erroring with the files not being found.

Not a good way for me to start my day that's for sure.

Stuugie 50 Marketing Strategist · Answer 5 · 2013-08-15T13:26:03+00:00

OK, so I delved into Adobe Acrobat 9 Pro and the following steps did exactly what I needed:

1. Open Adobe Acrobat Pro
2. Click Advanced > Document Processing > Batch Processing
3. Click New Sequence
4. Name the Sequence (I named it “MetaRemove”)
5. Click “Select Commands…”
6. In the Document folder, click “Examine Document”
7. Click “Add>>”
8. Expand the Examine Document field by clicking the +
9. Double click Remove metadata: Yes
10. Deselect all except Metadata and hidden text (unless there are other fields you want included in this process)
11. Click OK
12. Click to highlight MetaRemove
13. Click “Run Sequence”

Thanks for all the suggestions and if someone would be gracious enough to let me know how to use the suggestions given I'd be grateful for that too.

peter_budo 2,532 Code tags enforcer Team Colleague Featured Poster · Answer 6 · 2013-08-15T13:31:06+00:00

Stuugie wrote: "Oh yeah and trying to run the first two suggested software from the command prompt keeps erroring with the files not being found."

What do you mean by running from the command prompt and getting errors? That is hardly a developer's description of an issue...

Stuugie 50 Marketing Strategist · Answer 7 · 2013-08-15T13:44:14+00:00

For pdfleo (for instance) I get the message: "missing option FILE"

peter_budo 2,532 Code tags enforcer Team Colleague Featured Poster · Answer 8 · 2013-08-15T14:41:32+00:00

Well, it would help if you actually posted the code or whatever you are executing. I'm not sitting next to you to see your screen ;)

Stuugie 50 Marketing Strategist · Answer 9 · 2013-08-15T15:16:58+00:00

What code? At command prompt I entered: pdfleo as per directions from the pdfleo pdf, which was downloaded from here.

peter_budo 2,532 Code tags enforcer Team Colleague Featured Poster · Answer 10 · 2013-08-15T15:31:06+00:00

I've never used pdfleo. I thought you were talking about iText or PDFBox, which is why I asked what your code looks like...

Stuugie 50 Marketing Strategist · Answer 11 · 2013-08-15T15:44:01+00:00

For PDFBox, I tried the following:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdfwriter.COSWriter;

public class OpenPDFs {

    public static void main(String[] args) throws IOException {
        // TODO code application logic here
        String fp;
        fp ="S:\\Economic Forecasts\\Fcst13\\SourceForecasts\\";
        PDDocumentInformation info = document.getDocumentInformation();

but the import org... lines tell me:

package org.apache.pdfbox.cos does not exist

peter_budo 2,532 Code tags enforcer Team Colleague Featured Poster · Answer 12 · 2013-08-15T18:51:59+00:00

You did not add pdfbox-1.8.2.jar to your project in your IDE (IntelliJ, Eclipse, NetBeans) correctly, so it complains that the imports do not exist.
Tell us which IDE you are using and we can guide you through associating the library JAR with your project, or just google "import library jar IDE_NAME".
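One quick way to confirm whether a library JAR is actually visible to a program is to ask the class loader for one of its classes by name. The sketch below assumes nothing about the IDE. The class `ClasspathCheck` is invented for illustration, and the PDFBox class name is taken from the imports posted above. Because the name is passed as a string, this compiles even when the JAR is missing. Note that "package ... does not exist" is a compile-time error, so the JAR must also be on the compile classpath (in an IDE, the project's libraries).

```java
// Hypothetical diagnostic class, not part of PDFBox or of the thread.
final class ClasspathCheck {

    // Returns true when the named class can be found by this class's loader.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + (isOnClasspath(name)
                ? " was found"
                : " was NOT found; the PDFBox JAR is not on the classpath"));
    }
}
```

Running it with and without the JAR on the classpath (for example via the `-cp` option of `java`) shows whether the library is being picked up.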

iamthwee · Answer 13 · 2013-08-15T19:15:46+00:00

Hi Stuugie, I tried both but prefer the second link as it has no page restrictions. The second link worked. I will post more specific instructions.

Not to take away from Peter's great advice, but I don't think you need to get down and dirty with Java, importing libraries and understanding how it all works.

The second link should be fine...

iamthwee · Answer 14 · 2013-08-15T20:07:22+00:00

Hi Stuugie.

Here are the instructions.

1. Extract your downloaded file.
2. Inside the folder 'linuxpdfinfo', paste in the PDF you want to clean up.
3. Now open a terminal window and navigate to this folder.
4. Once in this folder, type the following in the terminal window exactly as is (this assumes your PDF is called 'test.pdf'):

./pdfinfo -itest.pdf -otest2.pdf -removeinfo -removexmp

Now you have created a new PDF called test2.pdf, and the XMP info has been removed!

Enjoy.

iamthwee · Answer 15 · 2013-08-15T20:13:35+00:00

If you wish to process more than one file, note that wildcards are not permitted.

So dump a list of all the PDF files first, then reference that list in the terminal window.

E.g.
ls -1 *.pdf > list.txt

then
./pdfinfo -ilist.txt -fstuugie -removeinfo -removexmp
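An alternative to the list file is to call the tool once per PDF from a small Java program. The sketch below is only an illustration under stated assumptions: it uses just the `-i`, `-o`, `-removeinfo` and `-removexmp` options quoted above, the class name `PdfInfoCommands` and the `clean-` output prefix are made up, and it has not been checked against the tool's documentation (for example, how it treats file names containing spaces).

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper, not part of the pdfinfo tool.
final class PdfInfoCommands {

    // Builds the argument list for one file from the options quoted in the thread.
    static List<String> commandFor(String tool, String input, String output) {
        return Arrays.asList(tool, "-i" + input, "-o" + output, "-removeinfo", "-removexmp");
    }

    // Run this from the folder that contains both pdfinfo and the PDFs.
    public static void main(String[] args) throws Exception {
        File dir = new File(".");
        String[] names = dir.list((d, n) -> n.toLowerCase(Locale.ROOT).endsWith(".pdf"));
        if (names == null) {
            names = new String[0];
        }
        for (String name : names) {
            if (name.startsWith("clean-")) {
                continue; // skip output files from an earlier run
            }
            // "clean-" is an assumed naming scheme for the cleaned copies.
            List<String> cmd = commandFor("./pdfinfo", name, "clean-" + name);
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.println(name + " -> exit code " + exit);
        }
    }
}
```

Each argument is passed to the process as its own string, so no shell quoting is involved on the Java side.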

Stuugie 50 Marketing Strategist · Answer 16 · 2013-08-16T12:52:16+00:00

@peter, I'm using NetBeans at both work and home. I just wanted to state that I don't work very much (at all, really) with Java, but I have decided to delve into it again as if I were a student taking Java courses. I was programming last night for about 2 hours, making up simple classes from my old textbook. I really want to strengthen my skills with Java and OO programming; that's my goal!

@iamthwee, I have meetings this morning but I am going to give your suggestions a go when I have time today. I'll get back to you and let you know how I do.

Thanks for your patience with me guys, I really appreciate it!