Accessing redirected pages through a Java crawler
Join Date: Aug 2005
Posts: 8
Reputation:
Solved Threads: 0
Hi forum,
I'm developing a simple web crawler in Java. Given a URL, the crawler downloads the corresponding web page and then repeats the process. But I'm having problems accessing web pages that are redirected to a different URL. One example is www.telegraphindia.com, where a new part gets added to the original URL. Can anybody help? Thanks in advance.
Let's look at the response when requesting this page. The status line shows a response code of 302. A 302 response includes a "Location" header that indicates where the actual resource can be found (if it is properly formatted, that is). As you can see from the response below, the Location is given as "section/frontpage/index.asp", a path relative to the URL you requested. All you need to do is request that page from the same domain (http://www.telegraphindia.com/section/frontpage/index.asp) to get the information you want.
HTTP/1.1 302 Object moved
Date: Tue, 13 Sep 2005 16:06:26 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Location: section/frontpage/index.asp
Content-Length: 148
Content-Type: text/html
Set-Cookie: ASPSESSIONIDACTBSRRB=FALIOKOCIJNCLJAPOONLFLCF; path=/
Cache-control: private

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="section/frontpage/index.asp">here</a>.</body>
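Requesting "that page from the same domain" amounts to resolving the Location value against the URL that was originally requested. Below is a minimal sketch using java.net.URI, assuming the Location value is either an absolute URL or a path relative to the request URL; the class and method names are hypothetical, not part of any library.

```java
import java.net.URI;

public class RedirectResolver {

    // Hypothetical helper: turns the Location header of a 3xx response into
    // an absolute URL by resolving it against the URL that was requested.
    public static String resolveLocation(String requestUrl, String location) {
        URI base = URI.create(requestUrl);
        // URI.resolve() does not insert a "/" when the base has an empty path
        // (e.g. "http://www.telegraphindia.com"), so normalise that case first.
        if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
            base = base.resolve("/");
        }
        // Absolute Location values come back unchanged; relative ones are
        // merged with the directory part of the base URL's path.
        return base.resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        // The Location value from the 302 response shown above.
        System.out.println(resolveLocation(
                "http://www.telegraphindia.com/", "section/frontpage/index.asp"));
    }
}
```

Resolving against the full request URL rather than just the host name also handles relative values such as "../other/page.asp" when the original page lives in a subdirectory.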
Did we help you? Did we miss the point entirely? Update your thread and let us know.
Don't like the answers you are getting?
Did you try searching?
Clean up and optimize Windows 2000/XP
Thanks, Criss, for your reply, but I don't know how to implement your suggestion. I have created an InputStream object and used the url.openStream() method to access the contents of the page. Can you suggest how I can capture the redirected portion of the URL? Also, how can I find out the HTTP response codes you showed? Please help.
You should use the HttpURLConnection class for requesting pages over HTTP. It has a setFollowRedirects method (static, affecting all connections) and a setInstanceFollowRedirects method (per connection) that tell it whether to follow redirects automatically. It also has many methods you will find helpful, since they let you read the response code and message (getResponseCode, getResponseMessage) and the header fields (getHeaderField) of the response.
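To answer both follow-up questions at once, here is a minimal sketch that turns automatic redirects off for each connection, prints every status code, reads the Location header itself and resolves it against the URL just requested. It assumes a JDK 9 or later (for InputStream.readAllBytes); the class name, method names and the limit of redirect hops are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectFollower {

    // Hypothetical helper: requests 'start' and follows up to 'maxHops'
    // redirects by hand, printing each status line so the 302s are visible.
    // Returns the URL whose content was finally downloaded.
    public static String fetchFinalUrl(String start, int maxHops) throws IOException {
        URI current = URI.create(start);
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            // Disable automatic redirects for this connection only, so the
            // 3xx response and its Location header reach our code.
            conn.setInstanceFollowRedirects(false);
            int code = conn.getResponseCode();
            System.out.println(code + " " + conn.getResponseMessage() + " <- " + current);

            if (code >= 300 && code < 400) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + current);
                }
                current = resolve(current, location.trim());
                continue;
            }

            // Not a redirect: download the body. A crawler would extract links
            // here and resolve them against 'current', not against 'start'.
            // (getInputStream() throws an IOException for 4xx/5xx responses.)
            try (InputStream in = conn.getInputStream()) {
                byte[] page = in.readAllBytes();
                System.out.println("Read " + page.length + " bytes");
            }
            return current.toString();
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    // Resolves a possibly relative Location value against the request URI.
    // URI.resolve() omits the "/" when the base path is empty, hence the fix-up.
    private static URI resolve(URI base, String location) {
        if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
            base = base.resolve("/");
        }
        return base.resolve(location);
    }

    public static void main(String[] args) throws IOException {
        String start = args.length > 0 ? args[0] : "http://www.telegraphindia.com/";
        System.out.println("Final URL: " + fetchFinalUrl(start, 5));
    }
}
```

The sketch uses setInstanceFollowRedirects(false) rather than the static setFollowRedirects(false), because the static setter changes the default for every HttpURLConnection created afterwards. If you only need to know where you ended up, you can instead leave automatic redirects on (the default) and, in the JDK's implementation, call conn.getURL() after reading the response to get the final URL. Automatic following does not cross protocols, though, so an http:// URL that redirects to https:// still has to be handled by hand as above.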
Similar Threads
- learning php (PHP)
- user authentication and authorization (JSP)
- Java. (Java)
- Trouble accessing some secure pages (Web Browsers)
- Dynamic web pages? Which will exist? (IT Professionals' Lounge)