Member Avatar for philip.s

Hi Guys

I want to write a program that extracts links from a URL and then adds them to a MySQL database.

I have found some nice examples, e.g.:

<?php
function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }

//now to use the function
echo "<xmp>";
var_dump(getlinks('http://www.google.com.au'));
echo "</xmp>";
?>

Are there other methods that I can use and how do I get the output into a database?

Thank You

All 8 Replies

I guess that getlinks() returns an array with all the links in a page. So if you go through that array with a loop, you can save each link in the database with a regular SQL INSERT statement.
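On the question of other extraction methods: one common alternative to the regex is to let PHP's DOM extension parse the page and then read the href and src attributes. The following is only a minimal sketch, assuming the dom extension is enabled; extractLinks() is a hypothetical helper name, and it takes the HTML as a string so that fetching the page stays a separate step.

```php
<?php
// Sketch: collect href/src attribute values with DOMDocument instead of a regex.
// extractLinks() is a hypothetical helper name, not part of the original post.
function extractLinks(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // Select every href and src attribute in the document.
    foreach ($xpath->query('//@href | //@src') as $attr) {
        $links[] = $attr->value;
    }
    return $links;
}

$html = '<a href="http://example.com/a">A</a><img src="/logo.png">';
print_r(extractLinks($html));
```

Like getlinks(), this returns a plain array of strings, so the same loop can be used to store the results.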

Member Avatar for philip.s

Thanks a lot for your response, but I am a bit of a newbie at PHP. Would you be able to put the above statement into code and elaborate a bit on it, please?

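foreach ($data as $link) {
    // Escape the link so a quote inside a URL cannot break the query
    $safe = mysql_real_escape_string($link);
    mysql_query("INSERT INTO table_name (link, something, something_else) VALUES ('" . $safe . "', 'blah1', 'blah2')");
}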

Basically, once you have your $data, it is an array.

That means $data = array("http://link1.com", "http://link2.com", "and so on...");

foreach goes over each item in the $data array and runs whatever is inside the {} once per item.

In this case, we are inserting each link into the database.

Hope this sheds a little light on the situation.
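For readers on a current PHP version: the mysql_* functions were removed in PHP 7, so the same loop is usually written with PDO or mysqli today. Here is a rough sketch using a PDO prepared statement. The helper name saveLinks(), the table name links, and the connection details in the comments are all assumptions for illustration, not something from this thread.

```php
<?php
// Sketch: insert each link through one prepared statement.
// saveLinks() and the table name `links` are hypothetical.
function saveLinks(PDO $pdo, array $links): int
{
    $stmt = $pdo->prepare('INSERT INTO links (link) VALUES (:link)');
    $saved = 0;
    foreach ($links as $link) {
        // The value is bound as a parameter, so quotes in a URL cannot break the query.
        $stmt->execute([':link' => $link]);
        $saved++;
    }
    return $saved;
}

// Assumed connection details; replace with your own host, database and credentials.
// $pdo = new PDO('mysql:host=localhost;dbname=temp;charset=utf8mb4', 'root', '');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// echo saveLinks($pdo, getlinks('http://www.google.com.au')), " links saved\n";
```

Preparing the statement once and executing it per link keeps the SQL text fixed and leaves the escaping to the database driver.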

Member Avatar for philip.s

Like this? How would you change it?

<?php

$url = "http://www.google.co.za",

function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }

$con = mysql_connect("localhost","root","");
if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }

mysql_select_db("temp", $con);

mysql_query($sql,$con);

foreach($data as $link){
mysql_query("INSERT INTO first (link)
 VALUES('".$link."')");
}

mysql_close($con);


?>

Thanks for your input :-)

On line 21 you are executing a MySQL query, but I don't see where you define $sql or why you are running it. I think you can remove line 21.

Member Avatar for philip.s

Hi guys

Would anyone happen to know why this program is not working:

<?php

function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }


$con = mysql_connect("localhost","root","");
if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }
mysql_select_db("temp", $con);
$sql = "CREATE TABLE first
(
link NOT NULL
)";

getlinks('http://www.google.com');

mysql_select_db("temp", $con);

foreach($data as $link){
$tbl = ("INSERT INTO first (link)
 VALUES('"$link"')");
}

mysql_query($con, $sql, $tbl);

mysql_close($con);


?>

Thanks

You have some SQL in there that says to create a table, but you are not actually executing that query, so I assume the table "first" doesn't exist.
Also, the parameters for your mysql_query call are wrong. It should be mysql_query($query, $connection). You should create the table in phpMyAdmin and then just manipulate the data in that table from your scripts, instead of creating the table from your scripts.
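A few further things in the posted script would also stop it from working: the array returned by getlinks() is never stored, so $data is undefined in the foreach; the link column has no data type; the string concatenation around $link is missing its dots; and the INSERT is only built inside the loop, never run. A possible corrected flow is sketched below with PDO rather than the old mysql_* functions. storeLinks() is a hypothetical helper, $pdo is assumed to be an already open connection, and VARCHAR(2048) is an assumed column size.

```php
<?php
// Sketch of the corrected flow. storeLinks() is a hypothetical helper.
function storeLinks(PDO $pdo, array $links): void
{
    // A column needs a type; VARCHAR(2048) is an assumed size for URLs.
    $pdo->exec('CREATE TABLE IF NOT EXISTS `first` (link VARCHAR(2048) NOT NULL)');

    $stmt = $pdo->prepare('INSERT INTO `first` (link) VALUES (?)');
    foreach ($links as $link) {
        // Run the INSERT inside the loop, once per link.
        $stmt->execute([$link]);
    }
}

// Keep the array that getlinks() returns, then pass it on:
// $links = getlinks('http://www.google.com');
// storeLinks($pdo, $links);
```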

Member Avatar for diafol

$data=file_get_contents($url);

This is deemed unsafe by many hosts when it is used to fetch pages from other domains, so they disable it (the allow_url_fopen setting). Check your phpinfo() to see whether it is disabled or not. Just because it works for you on localhost doesn't mean that it will work when you run it on your remote server.

BTW - this could open you up to a bunch of nasties - be careful!

If it is disabled, a workaround would be to use the cURL functions. See the PHP manual for this; there are some good examples there.
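As a rough idea of what the cURL workaround could look like, here is a minimal sketch, assuming the curl extension is installed. fetchPage() is a hypothetical helper name, and the timeout and redirect limits are arbitrary example values.

```php
<?php
// Sketch: fetch a page with cURL instead of file_get_contents().
// fetchPage() is a hypothetical helper name.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_MAXREDIRS      => 5,               // but not endlessly
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up if the host does not answer
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap the whole request
    ]);

    $body = curl_exec($ch);

    // curl_exec() returns false on failure; report that as null.
    return $body === false ? null : $body;
}

// Usage: swap this in for file_get_contents($url) inside getlinks().
// $data = fetchPage('http://www.google.com');
// if ($data === null) { die('Could not fetch the page'); }
```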
