
I've been using file_get_contents on the same URL for years, and now it suddenly fails. After reading about similar issues I've tried various stream context options, with no luck.

The really odd thing is that I can load the URL from any browser with no issues but file_get_contents fails. Does anyone have any ideas?

ERROR:
Warning: file_get_contents(http://www.aaii.com/sentimentsurvey/sent_results): failed to open stream: Redirection limit reached, aborting in ... on line 286

Thanks in advance!

Last Post by cereal

Hi,

This happens because their server detects that the request comes from a bot. If you set a browser-like User-Agent, the server replies with a Location header that tells the client where to go next: a time-limited login link that redirects back to the requested page. Example with HTTPie:

http -vv GET http://www.aaii.com/sentimentsurvey/sent_results User-Agent:'Mozilla/5.0 (X11; Linux i686) AppleWebKit [...]'

You send:

GET /sentimentsurvey/sent_results HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: www.aaii.com
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit [...]

And get back in response:

HTTP/1.1 302 Moved Temporarily
Cache-Control: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Date: Sat, GMT
Location: https://user.aaii.com/sso/login.aspx?vi=8&vt=[ALPHANUM_STRING]&DPLF=Y
Pragma: no-cache
Set-Cookie: CFID=; HttpOnly;expires=Mon, ;path=/
Set-Cookie: CFTOKEN=UUID-STRING; HttpOnly;expires=Mon, ;path=/
Set-Cookie: JSESSIONID=ALPHANUM-STRING;path=/; HttpOnly
Transfer-Encoding: chunked

At this point the browser follows the Location header, so you get the page. With HTTPie you get the same result by repeating the request with --follow added, and you finally get:

HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Date: Sat, GMT
Set-Cookie: VISIT=1; HttpOnly;expires=;path=/
Set-Cookie: JOINAD=; HttpOnly;path=/
Set-Cookie: COUNTER=0; HttpOnly;path=/
Set-Cookie: POPCOOKIE5=0;expires=;path=/
Set-Cookie: POPCOOKIE90=0;expires=Sun, ;path=/
Set-Cookie: POPCOOKIE5=1; HttpOnly;path=/
Set-Cookie: POPCOOKIE90=1; HttpOnly;path=/
Set-Cookie: NOTBOT=0;expires= GMT;path=/
Set-Cookie: EXPIREHOLD=
Transfer-Encoding: chunked

The NOTBOT cookie seems to be the key, but on the first request it will not work unless you also send the other cookies.
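To send the cookies back, the Set-Cookie lines from the response have to be collapsed into a single Cookie request header. The following is only a minimal sketch of that step: cookie_header_from_response is a hypothetical helper name, and it assumes the header lines are available as an array of strings (for example from $http_response_header after a file_get_contents call). It ignores attributes such as path and expires, so it is not a full cookie jar.

```php
<?php
// Hypothetical helper (not part of PHP): build a Cookie request header value
// from the raw header lines of a previous response.
function cookie_header_from_response(array $headerLines): string
{
    $cookies = [];
    foreach ($headerLines as $line) {
        // Only Set-Cookie headers are relevant; header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep the leading "name=value" pair, drop attributes (path, expires, HttpOnly).
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $pair, 2);
        // A later Set-Cookie with the same name overrides the earlier value.
        $cookies[trim($name)] = trim($value);
    }

    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}
```

When the same cookie name appears twice, as POPCOOKIE5 does in the response above, the sketch keeps the last value the server sent.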

You can try cURL with CURLOPT_FOLLOWLOCATION, or a stream context with file_get_contents, but I suspect that in both cases you would violate their terms of service. Check whether they offer an API or an XML feed that you can query regularly.
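For the stream context route, a rough sketch could look like the following. build_http_context_options is a hypothetical helper name, and the default redirect limit and the timeout are assumptions rather than values taken from this thread. The option keys themselves (user_agent, follow_location, max_redirects, header, timeout) are standard HTTP context options. As far as I know the http wrapper does not keep cookies between redirects on its own, so any cookies have to be passed in explicitly.

```php
<?php
// Hypothetical helper: build HTTP stream-context options for file_get_contents.
// $cookieHeader is an optional "name=value; name2=value2" string.
function build_http_context_options(
    string $userAgent,
    string $cookieHeader = '',
    int $maxRedirects = 5
): array {
    $http = [
        'method'          => 'GET',
        'user_agent'      => $userAgent,      // sent as the User-Agent header
        'follow_location' => 1,               // follow Location redirects
        'max_redirects'   => $maxRedirects,   // assumed limit, adjust as needed
        'timeout'         => 15,              // seconds, assumed value
    ];
    if ($cookieHeader !== '') {
        $http['header'] = 'Cookie: ' . $cookieHeader . "\r\n";
    }
    return ['http' => $http];
}

// Usage (performs a real network request, so it is left commented out;
// 'ExampleAgent/1.0' is a placeholder, not a value from this thread):
// $context = stream_context_create(build_http_context_options('ExampleAgent/1.0'));
// $html = file_get_contents('http://www.aaii.com/sentimentsurvey/sent_results', false, $context);
```

Since the redirect target is an https URL, this also assumes the openssl extension is enabled. Whether the request succeeds still depends on what their server accepts.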

Edited by cereal
