Sep 13th, 2005

accessing redirected pages through java crawler

Hi forum,
I am developing a simple web crawler in Java. Given a URL, the crawler downloads the corresponding web page and then continues this process. However, I am having a problem accessing web pages that are redirected to a different URL. One such example is www.telegraphindia.com, where a new part gets added to the original URL. Can anybody help? Thanks in advance.
Posted by Dark Master
Sep 13th, 2005

Re: accessing redirected pages through java crawler

Let's look at the response when requesting this page:
    HTTP/1.1 302 Object moved
    Date: Tue, 13 Sep 2005 16:06:26 GMT
    Server: Microsoft-IIS/6.0
    X-Powered-By: ASP.NET
    Location: section/frontpage/index.asp
    Content-Length: 148
    Content-Type: text/html
    Set-Cookie: ASPSESSIONIDACTBSRRB=FALIOKOCIJNCLJAPOONLFLCF; path=/
    Cache-control: private

    <head><title>Object moved</title></head>
    <body><h1>Object Moved</h1>This object may be found <a HREF="section/frontpage/index.asp">here</a>.</body>
The status line shows a response code of 302. A 302 response includes a "Location" header that says where the actual content can be found (if properly formatted, that is). As you can see from the response, the Location is given as "section/frontpage/index.asp", which is a relative path rather than a full URL. All you need to do is request that path from the same domain to get the information you want.
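To make "request that page from the same domain" concrete, here is a minimal sketch of turning a Location value into a full URL with java.net.URI. The class and method names (ResolveLocation, resolveLocation) are made up for illustration, and it assumes the crawler still knows the URL it originally requested.

```java
import java.net.URI;

class ResolveLocation {

    // Hypothetical helper: turns the Location value of a redirect into a full URL.
    static URI resolveLocation(String requestedUrl, String location) {
        URI base = URI.create(requestedUrl);
        // A bare host such as "http://example.com" has an empty path; give it
        // the root path first so the relative reference resolves against "/".
        if (base.getPath() == null || base.getPath().isEmpty()) {
            base = base.resolve("/");
        }
        // An absolute Location comes back unchanged; a relative one is
        // resolved against the URL that was requested.
        return base.resolve(location);
    }

    public static void main(String[] args) {
        // prints http://www.telegraphindia.com/section/frontpage/index.asp
        System.out.println(resolveLocation("http://www.telegraphindia.com",
                "section/frontpage/index.asp"));
    }
}
```

The same call also copes with servers that send a full URL in Location, so the crawler does not need to tell the two cases apart.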
Posted by chrisbliss18
Sep 20th, 2005

Re: accessing redirected pages through java crawler

Thanks, Chris, for your reply, but I don't know how to implement your suggestion. I have created an InputStream object and used the url.openStream() method to read the contents of the page. Can you suggest how I can capture the redirected portion of the URL? Also, how can I find out the HTTP response codes that you showed? Please help.
Posted by Dark Master
Sep 20th, 2005

Re: accessing redirected pages through java crawler

You should use the HttpURLConnection class for requesting pages over HTTP. It has a static setFollowRedirects method (and a per-connection setInstanceFollowRedirects method) that tells the class whether to follow redirects automatically. The class has many other methods that you will find helpful: getResponseCode and getResponseMessage give you the status of the response, and getHeaderField lets you read header information such as Location.
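As a rough sketch of how that might look, the program below requests one URL and prints the status code and, for a redirect, the Location header. HttpURLConnection follows redirects by default, so the sketch switches that off for this one connection in order to see the 302 itself. The class name RedirectProbe and the isRedirect helper are made up for illustration, and the default address is simply the site mentioned earlier in the thread; what it returns today may differ from the response shown above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectProbe {

    // Hypothetical helper: true for status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307;                                // temporary redirect
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0 ? args[0] : "http://www.telegraphindia.com/";
        // The cast assumes an http:// or https:// address.
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Do not follow the redirect automatically, so the 3xx response is visible.
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            System.out.println("Status:   " + status + " " + conn.getResponseMessage());
            if (isRedirect(status)) {
                // May be relative, as in the response shown earlier in this thread.
                System.out.println("Location: " + conn.getHeaderField("Location"));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

If the crawler only needs the final page, leaving redirect following switched on and reading from conn.getInputStream() is the simpler route; turning it off is useful when you want to record each hop yourself.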
Posted by chrisbliss18
