0

I'm writing a program that will download PDF files and collate them. However, the URLs I get are a bit weird. They look like this:

http://links.ealert.nature.com/ctt?kn=105&m=34136651&r=MjA1NzczMzM4NgS2&b=0&j=NTkwOTY5NjQS1&mt=1&rt=0

I'm using URLConnection and BufferedInputStream, but no data is read from the stream (read() returns -1 on the very first call). If the URLs are embedded in a webpage like this:

<a href="http://links.ealert.nature.com/ctt?kn=105&m=34136651&r=MjA1NzczMzM4NgS2&b=0&j=NTkwOTY5NjQS1&mt=1&rt=0">pdf</a>

then the browser resolves this URL to a direct URL:

http://www.nature.com/nature/journal/v461/n7265/pdf/461697a.pdf

How can I do it in a Java program? I suspect there's some sort of link server involved there, but I don't know how to communicate with it.

0

The browser is not "resolving" anything. The site is "redirecting".

So, are you using URLConnection or HttpURLConnection? HttpURLConnection will "follow" redirects by default; URLConnection won't.
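To check whether a redirect is actually happening, one option is to switch automatic following off and look at each hop yourself. The following is only a minimal sketch: the class name RedirectResolver and its helper methods are made up for illustration, and it assumes the server answers with a 3xx status plus a Location header. Note also that HttpURLConnection does not automatically follow a redirect that switches protocol (for example http to https), so a manual loop like this can be useful in that case too.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of any library.
class RedirectResolver {

    /** True for the status codes normally used for redirects. */
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static URL resolveLocation(URL base, String location) throws MalformedURLException {
        return new URL(base, location);
    }

    /** Follows redirects by hand, printing each hop, and returns the final URL. */
    static URL resolve(URL start, int maxHops) throws IOException {
        URL current = start;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                System.out.println(status + " " + current);
                if (!isRedirect(status) || location == null) {
                    return current; // not a redirect: this is where the chain ends
                }
                current = resolveLocation(current, location);
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

Calling RedirectResolver.resolve(new URL(link), 5) prints the status of every hop. If the very first line already shows 200, the server is not redirecting at all and the cause lies elsewhere.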


0

If you are trying to read it line by line, it will not work. Try reading it byte by byte, as follows:

URL url = new URL("..........");
HttpURLConnection conn = (HttpURLConnection)url.openConnection();

InputStream in = conn.getInputStream();

ByteArrayOutputStream bos = new ByteArrayOutputStream();

int i;
while((i = in.read())!= -1)
{
    bos.write(i);
}

byte [] b = bos.toByteArray();
FileOutputStream fos = new FileOutputStream("c:\\temp\\test.pdf");
fos.write(b);
fos.close();
conn.disconnect();
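Reading one byte at a time through an unbuffered stream works but can be slow for large PDFs. As a rough sketch of the same idea with a buffer (StreamCopy is a made-up class name, and the 8 KB buffer size is an arbitrary choice):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of any library.
class StreamCopy {

    /** Copies everything from in to out in 8 KB chunks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes read, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

With this in place the loop above becomes StreamCopy.copy(in, fos). The return value is handy for spotting an empty response: a result of 0 means the server sent no body at all.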


0

Thank you for your answers. It doesn't seem to solve the problem, however. The code I'm using is this:

public static void download(String urlS, File destination) throws IOException {
        BufferedInputStream bis = null;
        BufferedOutputStream bos = null;
        try {
            URL url = new URL(urlS);
            HttpURLConnection urlc = (HttpURLConnection)url.openConnection();
            System.out.println("Response message: " + urlc.getResponseMessage());
            System.out.println("Follow redirects: " + HttpURLConnection.getFollowRedirects());
            System.out.println("Content length: " + urlc.getContentLength());
            System.out.println("Content type: " + urlc.getContentType());

            bis = new BufferedInputStream(urlc.getInputStream());
            // Write to the destination itself; getName() would drop its directory part
            bos = new BufferedOutputStream(new FileOutputStream(destination));

            int i;
            while ((i = bis.read()) != -1) {
                bos.write(i);
            }
        } finally {
            if (bis != null) {
                try {
                    bis.close();
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }
            }
            if (bos != null) {
                try {
                    bos.close();
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }
            }
        }
    }

I don't think reading line by line is the problem because it works perfectly well when I supply the direct URL. The output I get from the first URL is:

Response message: OK
Follow redirects: true
Content length: 0
Content type: text/plain; charset=UTF-8

and I get an empty file.

If I supply the second, direct URL I get this:

Response message: OK
Follow redirects: true
Content length: -1
Content type: application/pdf

and a nice pdf file.

HttpURLConnection also doesn't seem to follow the redirection, or whatever that is.

I should add that if you paste the first URL into the browser's address bar, it also returns a completely empty page. It only works if this URL is in a link!


1

That information seems to encapsulate a session, so Google around a bit and find out how to maintain a session using HttpURLConnection.
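For what it's worth, one way to keep a cookie-based session with plain HttpURLConnection is to install a CookieManager before making any requests. This is only a sketch: SessionFetch and its methods are invented names, and it assumes the site tracks the session through cookies, which nothing in this thread confirms. The server may also look at other request headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
class SessionFetch {

    /** Installs a JVM-wide, in-memory cookie store that HttpURLConnection will use. */
    static CookieManager enableCookies() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** Requests a page and discards the body, so any cookies it sets end up in the store. */
    static int visit(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // drain the response; only the headers matter here
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

The idea would be to call SessionFetch.enableCookies() once at startup, visit the page that contains the links, and only then request the link itself, so that the second request carries whatever cookies the first one received.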

0

Have you tried this:

con = (HttpURLConnection)url.openConnection();
con.setRequestMethod("GET");
con.connect();

if (con.getResponseCode()==HttpURLConnection.HTTP_OK)
{
.......
.......
......

}

0

Hi,
Your code above works perfectly.
I have tested it with NetBeans and it's OK.
I think this thread is solved.

0

I got it to work!

@moutanna: The code doesn't work. HttpURLConnection uses GET by default.

Thanks to masijade for the session suggestion! I looked for ways of maintaining the session and I found a web testing suite - HttpUnit (http://httpunit.sourceforge.net/index.html). It has convenient classes for traversing web pages as if your program were a user in front of a browser, and it maintains the session automatically. Easy! Here's the code that fetches the page containing those weird links and then downloads all the PDFs:

        WebConversation wc = new WebConversation();
        WebResponse indexResp = wc.getResource(new GetMethodWebRequest(url));

        // Start with an empty array so the loop below is simply skipped if parsing fails
        WebLink[] links = new WebLink[0];
        try {
            links = indexResp.getLinks();
        } catch (SAXException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }

        System.out.println("Downloading the PDFs...");

        for(WebLink link : links) {
            if(!link.getText().contentEquals("PDF"))
                continue;
            try {
                link.click();
            } catch (SAXException ex) {
                Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
            }

            WebResponse resp = wc.getCurrentPage();
            String fileName = resp.getURL().getFile();
            fileName = fileName.substring(fileName.lastIndexOf("/") + 1);
            System.out.println(fileName);
            
            File file = new File(fileName);

            BufferedInputStream bis = new BufferedInputStream(resp.getInputStream());
            BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file.getName()));

            int i;
            while ((i = bis.read()) != -1) {
                bos.write(i);
            }
            bis.close();
            bos.close();
        }
        System.out.println("Done downloading.");
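As a possible tidy-up of the saving step, here is a small sketch using only the standard library. PdfSaver is an invented name. It takes the file name from the URL's path rather than from getFile(), because getFile() also includes any query string, and it lets Files.copy do the byte shuffling while try-with-resources closes the stream even if the copy fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of HttpUnit or any other library.
class PdfSaver {

    /** Returns the last path segment of the URL, ignoring any query string. */
    static String fileNameOf(URL url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Writes the whole stream to target, overwriting an existing file; returns bytes written. */
    static long save(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

In the loop above this would replace the two buffered streams with something like PdfSaver.save(resp.getInputStream(), Paths.get(PdfSaver.fileNameOf(resp.getURL()))).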