I am working on a web scraping project and I have to work through a series of log-in pages before I can get to the data I want. My app is basically duplicating the HTTPS requests that are sent by a browser when I use the site conventionally. To help with this I am tracing the requests & responses generated by a browser.

At one point an HTTPS GET produces a 302 response which has me confused. As I understand it, a 302 response should specify an alternative URL in the Location header, and the GET should be re-submitted to that URL. However, the Location header in the 302 response is an exact match to the URL used in the original GET. What is happening here?

When I trace the data sent & received by a browser going through this sequence, it does indeed re-submit the GET. However, because the Location header in the 302 response matches the URL used in the original GET, the two GETs look the same. The responses are different though: the first produces a 302 and the second produces a 200. Now I could just blindly do the same thing in my software, but that seems risky. If I don't understand what is going on I am unlikely to produce a robust application, and I am concerned it could break at this point. Can someone help me understand?

All 2 Replies

What is happening here?

Could be anything. Faulty redirect, some session code playing tricks, hard to say. If the website tells you to redirect, follow it (whether or not the URL is the same).
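
For what it's worth, following the redirect by hand might look roughly like the sketch below. It is only an illustration: `follow_redirects` and the `fetch` callable are made-up names, and `fetch` is assumed to perform one GET without auto-following and return the status code plus a dict of response headers. The hop limit is there so that a redirect which really does point back at itself cannot loop forever.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=10):
    """Follow redirects by hand and return (final_url, status, hops).

    `fetch` is a hypothetical callable supplied by the caller:
    fetch(url) -> (status_code, headers_dict). It must NOT follow
    redirects on its own.
    """
    hops = []
    for _ in range(max_hops):
        status, headers = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES:
            return url, status, hops
        location = headers.get("Location")
        if location is None:
            raise ValueError(f"{status} response from {url!r} has no Location header")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"gave up after {max_hops} redirects; last URL was {url!r}")
```

Keeping the list of hops around makes it easy to log each URL with `repr()`, which helps when two URLs look identical at a glance.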

Aha! Spotted it. Partly dumb code by whoever wrote the web page and partly my own blindness. There IS a difference between the new location and the original URL. The original was:
https://www.somewhere.com/something
and the new location is:
https://www.somewhere.com/something/
So the original author had omitted the trailing slash. Anyway I am happy now that I understand what is happening, even though it is something stupid.
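
In case it helps anyone else who hits this, a difference like that is easier to spot if the two URLs are compared component by component rather than by eye. A small sketch using the standard library's `urlsplit` (the helper name `url_differences` is just something I made up for this post):

```python
from urllib.parse import urlsplit


def url_differences(original, location):
    """Return {component: (original_value, location_value)} for parts that differ."""
    a, b = urlsplit(original), urlsplit(location)
    fields = ("scheme", "netloc", "path", "query", "fragment")
    return {
        name: (getattr(a, name), getattr(b, name))
        for name in fields
        if getattr(a, name) != getattr(b, name)
    }


diff = url_differences(
    "https://www.somewhere.com/something",
    "https://www.somewhere.com/something/",
)
print(diff)  # {'path': ('/something', '/something/')}
```

An empty dict means the two URLs really are the same, in which case the different responses would have to come from something else, such as a cookie set by the first response.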
