
Hello!

I am trying to create a crawler of sorts. I was using file_get_contents() to fetch pages until I stumbled on one site where that didn't work:

$page = 'http://www.site.com/page.php';
$content = file_get_contents($page);
echo htmlspecialchars($content);

This returned a completely blank page.

After looking it up, it appears that allow_url_fopen may be set to off. I read that this can be worked around with cURL, so I tried:

$page = 'http://www.site.com/page.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$contents = curl_exec ($ch);
curl_close ($ch);

echo $contents;

...but this returns exactly the same thing: a blank page.

I have cURL installed, and the code works for some other sites I tried. I could send you the URL of this site by PM if you want to try it (I'd rather not post it publicly).

Any ideas?

Thanks in advance.

0

My guess is that either 1) the site requires a user agent, which you aren't setting, or 2) the site serves its HTML across several files rather than one, and you're not fetching the right one.

1
$page = 'http://www.site.com/page.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);

I tried this and it works. It's straight from the PHP manual.

0

Hi, and thanks for your replies. Ardav, that didn't work; I had tried it earlier too. My initial guess was what kireol said, something to do with the user agent, but I don't know how to set that. Any help?

I think the site returns only one page, but in any case I'm not getting anything at all, not even a single byte.
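
When a fetch comes back completely empty, it can help to ask cURL what happened before guessing at the cause. The following is only a minimal sketch, not code from anyone in this thread: describe_fetch() and fetch_with_diagnostics() are hypothetical helper names, and the URL is the placeholder from the question. It reports the cURL error (if any), the HTTP status code, and the body length.

```php
<?php

// Hypothetical helper: turn the raw outcome of a cURL transfer into a readable summary.
function describe_fetch($body, int $errno, string $error, int $httpCode): string
{
    if ($body === false) {
        // curl_exec() returns false when the transfer itself failed
        return "cURL failed ($errno): $error";
    }
    if ($body === '') {
        // The server answered, but sent no content
        return "HTTP $httpCode with an empty body";
    }
    return "HTTP $httpCode, " . strlen($body) . " bytes received";
}

// Hypothetical helper: fetch a URL and describe what came back.
function fetch_with_diagnostics(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $summary = describe_fetch(
        $body,
        curl_errno($ch),
        curl_error($ch),
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);
    return $summary;
}

// Usage (placeholder URL from the question):
// echo fetch_with_diagnostics('http://www.site.com/page.php');
```

An empty body with a normal status code would point toward the server deliberately sending nothing, which fits the user agent theory; a cURL error would point toward a connection problem instead.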

Thanks

0

I put a random user agent:

$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

and it worked! Thanks :)
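
For anyone landing here later, the two lines above slot into the original cURL code roughly as in the sketch below. build_fetch_options() and fetch_page() are hypothetical helper names, and the timeout value is an arbitrary assumption; the URL is the placeholder from the question.

```php
<?php

// Hypothetical helper: collect the cURL options for a simple GET with a user agent.
function build_fetch_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // some sites send nothing back without one
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_TIMEOUT        => 30,         // assumed value; adjust as needed
    ];
}

// Hypothetical helper: returns the page body, or false on failure.
function fetch_page(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url, $userAgent));
    $contents = curl_exec($ch);
    curl_close($ch);
    return $contents;
}

// Usage:
// $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
// echo htmlspecialchars(fetch_page('http://www.site.com/page.php', $useragent));
```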

0

I thought so :)


Also, here's my code that works on the site you gave me.

<?php

// Path to the cookie file; the "cookies" directory must exist and be writable.
$cookie_file_path = "cookies/cookiejar.txt";

// Create (or truncate) the cookie file up front so cURL can write to it.
$fp = fopen($cookie_file_path, "w") or die("<br><b>Unable to open cookie file $cookie_file_path for writing!</b><br>");
fclose($fp);

$LOGINURL = "http://www.enterurlhere.com";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);              // send a browser-like user agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);              // return the page instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);              // follow redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);  // read cookies from this file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);   // save cookies to this file on close
$result = curl_exec($ch);
curl_close($ch);

echo $result;
?>