I am working on a web scraping project and I have to work through a series of log-in pages before I can get to the data I want. My app is basically duplicating the HTTPS requests that are sent by a browser when I use the site conventionally. To help with this I am tracing the requests & responses generated by a browser.

At one point an HTTPS GET produces a 302 response which has me confused. As I understand it, the 302 response should specify an alternative URL in the Location field, and the GET should be re-submitted to the new URL. However, the Location field in the 302 response is an exact match for the URL used in the original GET. What is happening here?

When I trace the data sent & received by a browser going through this sequence, it does indeed re-submit the GET. However, because the Location field in the 302 response matches the URL used in the original GET, the two GETs look the same. The response is different though. The first produces a 302 response and the second produces a 200 code. Now I could just blindly do the same thing in my software but that seems risky. If I don't understand what is going on I am unlikely to produce a robust application and I am concerned it could break at this point. Can someone help me understand?

Last Post by SalmiSoft

What is happening here?

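Could be anything: a faulty redirect, some session code playing tricks; it's hard to say from the URLs alone. One common cause is a cookie check, where the 302 sets a cookie and the second GET, which now sends that cookie back, gets the 200. Compare the full headers of the two GETs (not just the URLs) to see whether they really are identical. Either way, if the website tells you to redirect, follow it (whether or not the URL is the same).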
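To illustrate one way a redirect to the same URL can make sense, here is a minimal Python sketch. It assumes the site is doing a cookie check, which is only a guess about your site. The local test server, the `session=abc` cookie, the `/data` path and the names `CookieCheckHandler`, `fetch_with_cookies` and `demo` are all made up for the example.

```python
import http.server
import threading
import urllib.request
from http.cookiejar import CookieJar


class CookieCheckHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: answers 302 to the same URL until the cookie comes back."""

    statuses = []  # status codes sent, in order, so the sequence can be inspected

    def do_GET(self):
        if "session=abc" in self.headers.get("Cookie", ""):
            # The cookie came back, so serve the real content.
            body = b"data"
            self.statuses.append(200)
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # No cookie yet: set one and redirect to the very same URL.
            self.statuses.append(302)
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc; Path=/")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_cookies(url):
    """GET url, following redirects and replaying any cookies set on the way."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.read(), [c.name for c in jar]


def demo():
    """Start the test server on a free local port and fetch one page from it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieCheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_with_cookies(f"http://127.0.0.1:{server.server_port}/data")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
    print(CookieCheckHandler.statuses)
```

The two GETs have the same URL, but the second one carries a `Cookie` header that the first did not, so the server should answer a 302 and then a 200. The design point is that the client keeps one cookie jar across the redirect instead of replaying a fixed request. If the site's logic changes (more cookies, more hops), a scraper built this way has a better chance of surviving than one that hard-codes "send the same GET twice".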
