function get_url_contents($url) {
    $crl = curl_init();
    $timeout = 5;
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}

include 'simple_html_dom.php'; // Simple HTML DOM parser, provides str_get_html()

$url = 'http://books.rediff.com/categories/fiction-genres/2180204';
$outhtml = get_url_contents($url);
$html = str_get_html($outhtml);

foreach ($html->find('a') as $link) {
    echo "<a href=\"" . $link->href . "\">" . $link->href . "</a><br>";
}

This gives all the links present on the given URL.
I would like to remove the duplicate entries, as well as the JavaScript pseudo-links that I get after crawling, such as "javascript:doSearch('MT');" and "javascript:window.history.go(-1);" ...
Please help!
Thanks ...

Last Post by apanimesh061
Featured Replies
Store your hrefs in an array and use [this function](http://php.net/manual/en/function.array-unique.php).
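To make that concrete, here is a minimal sketch of the suggestion above plus the JavaScript-link filtering asked about earlier. The `$links` array is a hypothetical stand-in for the `href` values the `foreach` loop collects:

```php
<?php
// Hypothetical hrefs collected by the crawler above.
$links = [
    'http://books.rediff.com/book/1',
    "javascript:doSearch('MT');",
    'http://books.rediff.com/book/1',
    'http://books.rediff.com/book/2',
    'javascript:window.history.go(-1);',
];

// Drop javascript: pseudo-links (case-insensitive prefix check) ...
$links = array_filter($links, function ($href) {
    return stripos($href, 'javascript:') !== 0;
});

// ... then remove duplicates and reindex the array.
$links = array_values(array_unique($links));

print_r($links);
```

`array_unique()` keeps the first occurrence of each value; `array_values()` afterwards just compacts the keys that `array_filter()` and `array_unique()` leave behind.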


After I extract all the URLs from the web page ... how do I traverse the URLs using BFS or DFS? Do I have to store them in a database and then traverse through them?


Whether you store them in an array or in a database makes no difference to BFS versus DFS: those are traversals of trees and graphs (here, the link graph of pages), not of a particular storage medium.


Well, if I do not store the URLs in a database, then how will I traverse them with BFS/DFS?


I have been able to remove the duplicates from the crawled URLs.
But I cannot understand how I should implement a BFS/DFS traversal in this crawler ...
So far I have stored all the crawled URLs in an array ...
Do I have to store all the URLs in a tree instead of an array?


Now that I have at least got all the URLs of a page, I just want to traverse the URLs using BFS and DFS ...
Please tell me how we do that in PHP? I have a feeling I am asking this question the wrong way, or maybe something is missing!
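Putting the pieces of this thread together, a BFS crawl is just the queue traversal above with "fetch the page and extract its links" as the neighbour step. A sketch, with the fetch-and-parse step injected as a callable so the loop can be shown (and tested) without hitting the network; in the real crawler, `$fetch` would wrap `get_url_contents()` and `str_get_html()` from the first post and return the `$link->href` values:

```php
<?php
// BFS crawl loop. $fetch($url) must return an array of hrefs found on $url.
function crawl_bfs(string $seed, callable $fetch, int $limit = 50): array {
    $queue = [$seed];         // pages waiting to be fetched (FIFO => BFS)
    $seen  = [$seed => true]; // every URL ever enqueued, to skip duplicates
    $order = [];              // order in which pages were actually crawled

    while ($queue && count($order) < $limit) {
        $url = array_shift($queue);
        $order[] = $url;
        foreach ($fetch($url) as $href) {
            if (stripos($href, 'javascript:') === 0) {
                continue; // skip javascript: pseudo-links
            }
            if (!isset($seen[$href])) {
                $seen[$href] = true;
                $queue[] = $href;
            }
        }
    }
    return $order;
}

// Example with a fake in-memory "web" instead of real HTTP requests.
$fakeWeb = [
    'a' => ['b', 'c', "javascript:doSearch('MT');"],
    'b' => ['c'],
    'c' => [],
];
$order = crawl_bfs('a', function ($url) use ($fakeWeb) {
    return $fakeWeb[$url] ?? [];
});
print_r($order);
```

The `$seen` map does the duplicate removal as you go (no separate `array_unique()` pass needed), and swapping `array_shift()` for `array_pop()` would turn the same loop into a DFS crawl. The `$limit` parameter is there because a real crawl needs a stopping condition.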
