Accessing redirected pages through a Java crawler

Dark Master

Sep 13th, 2005
Hi forum,
I'm developing a simple web crawler in Java. Given a URL, the crawler downloads the corresponding web page and then continues the process from there. However, I'm having a problem accessing web pages that are redirected to a different URL. One such example is www.telegraphindia.com, where a new part gets added to the original URL. Can anybody help? Thanks in advance.
chrisbliss18

Sep 13th, 2005
Let's look at the response when requesting this page:
    HTTP/1.1 302 Object moved
    Date: Tue, 13 Sep 2005 16:06:26 GMT
    Server: Microsoft-IIS/6.0
    X-Powered-By: ASP.NET
    Location: section/frontpage/index.asp
    Content-Length: 148
    Content-Type: text/html
    Set-Cookie: ASPSESSIONIDACTBSRRB=FALIOKOCIJNCLJAPOONLFLCF; path=/
    Cache-control: private

    <head><title>Object moved</title></head>
    <body><h1>Object Moved</h1>This object may be found <a HREF="section/frontpage/index.asp">here</a>.</body>
The status line shows a response code of 302. A 302 response includes a "Location" header that says where the requested resource can actually be found (if properly formatted, that is). As you can see from the response, the Location is "section/frontpage/index.asp", which is a relative path, so it has to be resolved against the URL you originally requested. All you need to do is request that page from the same domain in order to get the information you want.
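To make the "request that page from the same domain" step concrete, here is a minimal sketch of turning the Location value into a full URL with java.net.URI. The class and method names (LocationResolver, resolveLocation) are made up for illustration, and it assumes the page was requested as http://www.telegraphindia.com/ with a trailing slash.

```java
import java.net.URI;

class LocationResolver {

    // Resolve a Location header value against the URL that was requested.
    // A relative value is combined with the base; an absolute value is
    // returned unchanged.
    static URI resolveLocation(String requestedUrl, String location) {
        return URI.create(requestedUrl).resolve(location);
    }

    public static void main(String[] args) {
        // Assumed base URL; the Location value is the one from the response above.
        URI target = resolveLocation("http://www.telegraphindia.com/",
                                     "section/frontpage/index.asp");
        // prints http://www.telegraphindia.com/section/frontpage/index.asp
        System.out.println(target);
    }
}
```

The resolved URL is what the crawler would request next. Note that URI.create throws IllegalArgumentException if either string is not a valid URI (for example, if it contains spaces), so a real crawler would want to handle that case.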
Dark Master

Sep 20th, 2005
Thanks Chris for your reply, but I don't know how to implement your suggestion. I have created an InputStream object and used the URL.openStream() method to access the contents of the page. Can you suggest how I can capture the redirected portion of the URL? Also, how can I find out the HTTP response codes that you showed? Please help.
chrisbliss18

Sep 20th, 2005
You should use the HttpURLConnection class for requesting pages over HTTP. It has a static setFollowRedirects method that tells the class whether to follow redirects automatically for all connections, and a setInstanceFollowRedirects method that does the same for a single connection. You will also find its other methods very helpful, since they let you read the response: getResponseCode and getResponseMessage give you the status code and message, and getHeaderField gives you headers such as Location.
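Here is a rough sketch of how that could look if you want to see each redirect yourself instead of having it followed silently. The class name RedirectProbe, the helper methods, and the hop limit of 5 are my own choices for illustration, not anything the API prescribes. It turns automatic following off for the connection, prints the response code and message, and resolves the Location header against the current URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectProbe {

    // Status codes whose response normally carries a Location header.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    // Follow redirects by hand, printing each hop, and return the final URL.
    static URI follow(URI start, int maxHops) throws IOException {
        URI current = start;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn =
                (HttpURLConnection) current.toURL().openConnection();
            // Only this connection is affected; the static
            // setFollowRedirects would change the default for all of them.
            conn.setInstanceFollowRedirects(false);
            try {
                int status = conn.getResponseCode();
                System.out.println(status + " " + conn.getResponseMessage()
                                   + " <- " + current);
                String location = conn.getHeaderField("Location");
                if (!isRedirect(status) || location == null) {
                    return current; // not a redirect: this is the page to read
                }
                // Handles both relative and absolute Location values.
                current = current.resolve(location);
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Gave up after " + maxHops + " redirects");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectProbe <url>");
            return;
        }
        System.out.println("Final URL: " + follow(URI.create(args[0]), 5));
    }
}
```

Run it with a URL such as http://www.telegraphindia.com/ as the argument. Once follow returns, the crawler can open that final URL and read the page body with getInputStream on a fresh connection. If you only want the content and don't care about the intermediate hops, leaving redirects on and calling getInputStream directly is simpler.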

©2003 - 2009 DaniWeb® LLC