If you want to extract the URLs from the page, I have an existing script that extracts not only links to other pages but also links to pictures and other media. My script is as follows:
function getlinks($html) {
// $html is the page source (e.g. from file_get_contents()), not the URL itself.
// Split the source wherever an href= or src= attribute begins.
$media=preg_split('/(href\=\"|href\=\'|href\=|src\=\"|src\=\'|src\=)/i',$html);
// The first piece is the text before the first attribute, so it is not a link.
array_shift($media);
// Cut each piece off at its closing quote or at the end of the tag.
$media=preg_replace("/([^\'])\'(.*)/is",'$1',$media);
$media=preg_replace("/([^\"])\"(.*)/is",'$1',$media);
$media=preg_replace("/([^\>])\>(.*)/is",'$1',$media);
// For unquoted values, cut at a space followed by a digit, quote, '>' or '/'.
$media=preg_replace("/([^ ])\ [0-9\'\"\>\/](.*)/is",'$1',$media);
return $media;
}
//above function returns an array
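As an alternative to regular expressions, PHP's bundled DOM extension can read the href and src attributes straight from the parsed document, which copes better with odd spacing and quoting. This is a minimal sketch, assuming the DOM extension is enabled; the function name get_links_dom() is hypothetical and not part of the script above.

```php
<?php
// Minimal sketch: collect href and src attribute values with PHP's DOM
// extension instead of regular expressions. get_links_dom() is a hypothetical name.
function get_links_dom(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so handle it up front.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parser warnings internal
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // Every element carrying an href or src attribute (a, link, img, script, ...),
    // visited in document order.
    foreach ($xpath->query('//*[@href or @src]') as $node) {
        foreach (['href', 'src'] as $attr) {
            if ($node->hasAttribute($attr)) {
                $links[] = $node->getAttribute($attr);
            }
        }
    }
    return $links;
}

$html = '<a href="page.html">Page</a> <img src="pic.jpg" alt=""> <a href=other.html>Other</a>';
print_r(get_links_dom($html));
```

Because the HTML parser understands unquoted attribute values, other.html is picked up as well; the result lists page.html, pic.jpg and other.html in document order.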
It may be badly written, but it does the job. Next I'll see if I can write a version based on preg_match.
=======================
Edit:
I have now written a function that extracts the links more efficiently:
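<?php
function getlinks($url) {
// Download the page source; file_get_contents() returns false on failure.
$data=file_get_contents($url);
if ($data === false) {
return [];
}
// Group 1 captures the attribute value between the opening quote and the
// next quote or '>', so no second preg_replace() pass is needed.
preg_match_all('/(?:href|src)=["\']([^"\'>]+)/i',$data,$media);
return $media[1];
}
//now to use the function
echo "<pre>";
// Escape the output so URLs containing & or < display correctly in the browser.
echo htmlspecialchars(print_r(getlinks('http://www.google.com.au'), true));
echo "</pre>";
?>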
As you can see, the function returns an array of the links.
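To try the preg_match_all() approach without fetching a live page, the same kind of pattern can be run on an HTML string. This is a minimal sketch; extract_links_from_html() is a hypothetical name, and unlike the pattern above it also tolerates whitespace around the = sign. Like the pattern above, it skips unquoted attribute values.

```php
<?php
// Minimal sketch: the preg_match_all() idea applied to an HTML string rather
// than a URL, so it can be tested offline. extract_links_from_html() is hypothetical.
function extract_links_from_html(string $html): array
{
    // Capture group 1 holds the attribute value between the opening quote
    // and the next quote or '>' character; /i also matches HREF and SRC.
    preg_match_all('/(?:href|src)\s*=\s*["\']([^"\'>]+)/i', $html, $matches);
    return $matches[1];
}

$html = '<a href="/about">About</a> <img SRC=\'logo.png\'> <script src="app.js"></script>';
print_r(extract_links_from_html($html));
```

Running it prints an array holding /about, logo.png and app.js, in the order they appear in the string.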
cwarn23