Hello!

I am trying to create a crawler of sorts. I was using file_get_contents() to fetch pages until I stumbled on one site where that didn't work:

$page = 'http://www.site.com/page.php';
$content = file_get_contents($page);
echo htmlspecialchars($content);

This returned a completely blank page.

After looking it up, it seems allow_url_fopen might be set to off. I read that this can be bypassed with cURL, so I tried:

$page = 'http://www.site.com/page.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$contents = curl_exec ($ch);
curl_close ($ch);

echo $contents;

...but this returns exactly the same thing: a blank page.

I have cURL installed, and the code works for other sites I tried. I could send you the URL of this site by PM if you want to try it (I'd rather not post it publicly).

Any ideas?

Thanks in advance.


My guess is that either 1) the site needs a user agent, since you aren't setting one, or 2) the site returns its HTML not in one file but in several, and you're not getting the right one.

$page = 'http://www.site.com/page.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);

Tried this and it works - straight from the PHP manual.

Hi, and thanks for your replies. Ardav, that didn't work; I had tried it earlier too. My initial guess was what kireol said, something to do with the user agent, but I don't know how to set that. Any help?

I think the site returns one page only, but in any case I'm not getting anything at all, not even a single byte.

Thanks
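
When a fetch comes back with "not even a single byte", it can help to find out whether the request failed outright or the server answered with an empty body. The sketch below is one way to do that; describe_fetch() and diagnose_fetch() are hypothetical helper names, not part of anyone's code in this thread, and the URL in the usage comment is the placeholder from the question.

```php
<?php
// Hypothetical helper: turn the raw cURL outcome into a one-line summary.
function describe_fetch($body, int $httpCode, string $error): string
{
    if ($body === false) {
        return "cURL failed: $error";
    }
    if ($body === '') {
        return "HTTP $httpCode with an empty body";
    }
    return "HTTP $httpCode, " . strlen($body) . " bytes";
}

// Hypothetical helper: fetch $url as in the question, then report what happened.
function diagnose_fetch(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body  = curl_exec($ch);                               // false on failure
    $code  = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response arrived
    $error = curl_error($ch);                              // '' when no error occurred
    return describe_fetch($body, $code, $error);
}

// Usage (performs a real request, so it is left commented out):
// echo diagnose_fetch('http://www.site.com/page.php'), "\n";
// var_dump(ini_get('allow_url_fopen'));  // shows whether URL wrappers are enabled
```

An HTTP status with an empty body points at the server deciding not to answer this particular client, whereas a cURL error points at the connection itself.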

I put a random user agent:

$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

and it worked! Thanks :)
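
For anyone landing here later, this is roughly how the user agent line fits into the cURL code from the first post. It is only a sketch: build_curl_options() and fetch_page() are hypothetical names introduced for illustration, and the user agent string is whatever you choose to send.

```php
<?php
// Hypothetical helper: the options from the question plus the user agent.
function build_curl_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
        CURLOPT_USERAGENT      => $userAgent,  // sent as the User-Agent request header
    ];
}

// Hypothetical helper: returns the page as a string, or false on failure.
function fetch_page(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    // The handle is released when $ch goes out of scope at the end of the function.
    return curl_exec($ch);
}

// Usage (performs a real request, so it is left commented out):
// $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
// echo fetch_page('http://www.site.com/page.php', $useragent);
```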

I thought so :)


Also, here's my code that works on the site you gave me.

<?php

$cookie_file_path = "cookies/cookiejar.txt"; // Please set your cookie file path

// Create (or empty) the cookie file and make sure it is writable.
$fp = fopen($cookie_file_path, "w") or die("<br><b>Unable to open cookie file $cookie_file_path for writing!</b><br>");
fclose($fp);

$LOGINURL = "http://www.enterurlhere.com";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);              // the site returns nothing without a user agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);              // return the page instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);              // follow redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);  // read cookies from this file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);   // write cookies back when the handle is closed
$result = curl_exec($ch);
curl_close($ch);

echo $result;
?>
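
The file_get_contents() version from the first post can probably be made to work the same way, since it accepts a stream context that carries a user agent. This is a sketch under that assumption: build_http_context_options() and fetch_with_stream_context() are hypothetical names, the 10-second timeout is an arbitrary choice, and allow_url_fopen still has to be enabled for it to run.

```php
<?php
// Hypothetical helper: HTTP context options carrying the user agent.
function build_http_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,  // sent as the User-Agent request header
            'timeout'    => 10,          // seconds; an arbitrary choice for this sketch
        ],
    ];
}

// Hypothetical helper: returns the page as a string, or false on failure.
function fetch_with_stream_context(string $url, string $userAgent)
{
    $context = stream_context_create(build_http_context_options($userAgent));
    return file_get_contents($url, false, $context);
}

// Usage (performs a real request, so it is left commented out):
// $content = fetch_with_stream_context('http://www.site.com/page.php', 'Mozilla/5.0');
// echo htmlspecialchars($content);
```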