The stupidity in this thread made me register an account. Originally I was googling for the same thing the original poster is looking for.
When you crawl the web, you will spend most of your time waiting for network packets and saving the data somewhere. A spider is a perfect example of a piece of code where execution time does not matter at all. You could write it in Commodore BASIC 2.0 and wouldn't notice a difference.
Creating a spider in an unsuitable language like C++ will double your development effort for an actual performance gain in the low single-digit percent range.
The execution time limit in PHP is actually configurable (d'oh): it is the max_execution_time setting, and it's disabled by default for command line execution. PHP only runs once? What happens then? The script self-destructs?
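If you want to check this yourself, here is a minimal sketch. It only uses the standard ini_get() and set_time_limit() functions; the value printed first depends on how PHP is invoked and on your php.ini, so treat the output as something to verify on your own box rather than a promise.

```php
<?php
// Show which SAPI is running and the limit currently in effect.
// On the command line the limit defaults to 0, meaning "no limit".
echo "SAPI: " . PHP_SAPI . "\n";
echo "max_execution_time: " . ini_get('max_execution_time') . "\n";

// Lift the limit for this run regardless of what php.ini says.
set_time_limit(0);
echo "after set_time_limit(0): " . ini_get('max_execution_time') . "\n";
```

A long-running crawler script would call set_time_limit(0) once at the top and then loop for as long as it likes.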
Recommending Java over PHP for performance reasons only makes sense if you are religious and worship The Java.
Naturally, running a web crawler is not a task for days or even weeks. It's closer to years. If the OP was looking to crawl a single site or two, he'd probably use one of the perfectly fine Windows client applications and not look for a script.
There are opcode caches for PHP, which keep the compiled bytecode in memory so the script is not parsed and compiled again on every run.
Assuming that there are no memory leaks is quite generous. Can we also assume that the world is round? When you "clock" a programming language, it would be kinda helpful to know what that loop was running, which operating system you were using, the bus width, and the compiler flags for the executable. The amount of memory, on the other hand, hardly matters.
Oh, and ...
<?php
$count = 0;
$now = microtime(true);
while ( ($now+1) > microtime(true)) $count++;
print "Loops per second: ".number_format($count)."\n";
?>
workhorse:~# php loop.php
Loops per second: 2,222,026
... you are full of it.
But let's assume you actually benchmarked the script in question. Let's also assume an average text weight of 50 KB for a web page. Then your 3g processor (mobile phone?) could spider 30 gigabytes per minute. That's ~500 megabytes per second. Phat subsystem there. MySQL cluster with memory tables on 10GbE?
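For anyone who wants to redo the arithmetic, here is a quick sketch. The 30 gigabytes per minute and the 50 KB per page are the figures from above; reading them as decimal units (1 KB = 1000 bytes) is my assumption, and the variable names are made up for the example.

```php
<?php
// Figures from the post, taken as decimal units (assumption).
$bytesPerMinute = 30 * 1000 ** 3;   // 30 GB per minute
$bytesPerPage   = 50 * 1000;        // 50 KB per page

$bytesPerSecond = $bytesPerMinute / 60;
$pagesPerSecond = $bytesPerSecond / $bytesPerPage;

printf("%d MB/s, %d pages/s\n", $bytesPerSecond / 1000 ** 2, $pagesPerSecond);
// prints "500 MB/s, 10000 pages/s"
```

Ten thousand pages fetched, parsed and stored every second, from one script on one machine.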
There are simple ways to split the websites to crawl between several instances of the script. You do not need threads. You can multi-task.
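One such way, as a rough sketch: hash the host name and let each instance take only the hosts that map to its own number. The function name worker_for_url, the command line arguments and the example URLs are all made up for illustration; a real crawler would read its queue from a database or a file instead of a hard-coded array.

```php
<?php
// Hypothetical helper: decide which instance of the script owns a URL.
// Hashing the host keeps every page of a site on the same instance,
// so per-site delays can be handled locally.
function worker_for_url(string $url, int $workers): int
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        throw new InvalidArgumentException("No host in URL: $url");
    }
    // 7 hex digits = 28 bits, so the value is always a positive int.
    return hexdec(substr(md5(strtolower($host)), 0, 7)) % $workers;
}

// Usage (assumed): php crawl.php <instance number> <instance count>
$index   = (int) ($argv[1] ?? 0);
$workers = max(1, (int) ($argv[2] ?? 1));

// Placeholder queue for the example.
$queue = [
    'http://example.com/',
    'http://example.com/about',
    'http://example.org/',
    'http://example.net/index.html',
];

foreach ($queue as $url) {
    if (worker_for_url($url, $workers) === $index) {
        echo "instance $index takes $url\n";
    }
}
```

Start it four times, as "php crawl.php 0 4" through "php crawl.php 3 4", and the operating system does the multi-tasking for you. No threads involved.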
You remind me of that dude who threatened to "hack my website" and backed that claim with a traceroute. Please stop giving technical advice. Thank you.