I have a script that crawls all the links on a homepage, both internal and external.

But that is only part of what I need. I want to crawl all the internal links on every page of the site. Please tell me how to do it. Here is what I have so far:

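<?php
if (isset($_POST['url'])) {
    $url = $_POST['url'];

    // Read the whole page in one go. Reading line by line can split an
    // <a> tag across two reads, and calling fgets() twice per loop
    // iteration skipped every other line.
    $html = @file_get_contents($url);
    if ($html === false) {
        die('Could not open ' . htmlspecialchars($url));
    }

    // Capture group 1 holds the href value of each <a> tag.
    preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $html, $matches);

    // Print the URLs unchanged, since paths can be case-sensitive.
    foreach ($matches[1] as $link) {
        print htmlspecialchars($link) . "<br>";
    }
}
?>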
Last Post by blocblue

If you only want the links that are internal to the original URL you posted, then I would use the parse_url function together with regular expressions.

Using the host index from the parse_url result array in a regex (or a plain string comparison against each link's host) will tell you whether the link is internal or external. Then do as you wish with the links you find.
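
A minimal sketch of that host check, in case it helps. The function names is_internal_link and filter_internal_links are made up for this example, and it assumes that links without a host (relative paths) count as internal and that "www.example.com" and "example.com" are different hosts. Treat it as a starting point rather than a finished crawler:

```php
<?php
// Hypothetical helper: true when $link points at the same host as $baseUrl.
function is_internal_link(string $link, string $baseUrl): bool
{
    $baseHost = parse_url($baseUrl, PHP_URL_HOST);
    $linkHost = parse_url($link, PHP_URL_HOST);

    // No host in the link: either a relative path ("/about", "page.html")
    // or a non-web scheme such as mailto: or javascript:.
    if (!is_string($linkHost)) {
        $scheme = parse_url($link, PHP_URL_SCHEME);
        if (!is_string($scheme)) {
            return true; // relative link, assumed internal
        }
        return in_array(strtolower($scheme), ['http', 'https'], true);
    }

    // Host names are case-insensitive, so compare them that way.
    return is_string($baseHost) && strcasecmp($linkHost, $baseHost) === 0;
}

// Hypothetical helper: keep only the internal links from a list.
function filter_internal_links(array $links, string $baseUrl): array
{
    $internal = [];
    foreach ($links as $link) {
        if (is_internal_link($link, $baseUrl)) {
            $internal[] = $link;
        }
    }
    return $internal;
}

// Example with made-up URLs.
$found = [
    'http://example.com/contact',
    '/about',
    'http://other.org/page',
    'mailto:someone@example.com',
];
print_r(filter_internal_links($found, 'http://example.com/'));
```

To cover every page, you could keep the internal links in a queue, fetch each one in turn, and remember which URLs you have already visited so that no page is crawled twice.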

R.
