I have a script that crawls all the links on a homepage, both internal and external.

But that is only part of what I need. I want to crawl all the internal links on every page of the site. Please tell me how to do it. Here is what I have so far:

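<?php
if (isset($_POST['url'])) {
    $url = $_POST['url'];
    $f = @fopen($url, "r");
    if ($f === false) {
        die("Could not open " . htmlspecialchars($url));
    }
    // Read one line per iteration; a second fgets() in the loop body would skip every other line.
    while (($buf = fgets($f, 4096)) !== false) {
        // Capture group 1 holds the href value of each <a> tag found on this line.
        preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $buf, $words);
        foreach ($words[1] as $cur_word) {
            // URL paths can be case-sensitive, so print the link unchanged (escaped for HTML).
            print htmlspecialchars($cur_word) . "<br>";
        }
    }
    fclose($f);
}
?>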

If you only want internal links from the original URL you posted, I would use the parse_url function together with regular expressions.

Using the host index from the parse_url result array in a regex will tell you whether a link is internal or external. Then do as you wish with the links you find.
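
Something along these lines might work as a starting point. It is only a rough sketch: the function names is_internal_link and filter_internal_links and the example.com addresses are made up for illustration, and it compares the host strings directly instead of building a regex from the host, which is a simplification of the idea above. It assumes links without a host (relative paths) belong to the same site.

```php
<?php
// Hypothetical helper: true when $link points at the same host as $baseUrl.
function is_internal_link(string $link, string $baseUrl): bool
{
    $baseHost = parse_url($baseUrl, PHP_URL_HOST);
    $linkHost = parse_url($link, PHP_URL_HOST);

    if ($linkHost === null || $linkHost === false) {
        // No host: either a relative link such as "/about" (internal)
        // or a scheme-only link such as mailto: or javascript: (not crawlable).
        $scheme = parse_url($link, PHP_URL_SCHEME);
        return $scheme === null || $scheme === false;
    }

    // Host names are case-insensitive, so compare them in lower case.
    return strtolower($linkHost) === strtolower((string) $baseHost);
}

// Hypothetical helper: keep only the internal links from a list of hrefs.
function filter_internal_links(array $links, string $baseUrl): array
{
    $internal = array_filter($links, function ($link) use ($baseUrl) {
        return is_internal_link($link, $baseUrl);
    });
    return array_values($internal); // reindex from 0
}

// Example data (made up): two internal links, one external, one mailto.
$links = [
    '/about',
    'http://www.example.com/contact',
    'http://other.example.org/',
    'mailto:me@example.com',
];
$internal = filter_internal_links($links, 'http://www.example.com/');
// $internal is ['/about', 'http://www.example.com/contact']
```

To cover every page rather than just the homepage, the same check could be run on each page you fetch, keeping a list of the URLs already visited so no page is crawled twice.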

R.
