Hi Guys

I want to write a program that extracts links from a URL and then adds them to a MySQL database.

I have found some nice examples, e.g.:

<?php
function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }

//now to use the function
echo "<pre>";
echo htmlspecialchars(print_r(getlinks('http://www.google.com.au'), true));
echo "</pre>";
?>

Are there other methods that I can use and how do I get the output into a database?

Thank You

I guess that getlinks() returns an array with all the links in a page. So if you go through that array with a loop, you can simply save each link in the database with a regular SQL INSERT statement.
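On the "other methods" part of the question: one alternative to the regex is PHP's DOM extension, which parses the HTML and lets you read the attributes directly. This is only a minimal sketch. The function name extract_links() and the sample HTML are made up for illustration, and it assumes the DOM extension is enabled on your install.

```php
<?php
// extract_links() is a hypothetical helper, not code from this thread.
function extract_links($html) {
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag => attribute pairs that usually carry a URL.
    $wanted = array('a' => 'href', 'img' => 'src', 'script' => 'src', 'link' => 'href');

    $links = array();
    foreach ($wanted as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            if ($value !== '') {
                $links[] = $value;
            }
        }
    }
    return $links;
}

// Sample input so the sketch runs without a network connection.
$html = '<html><body><a href="http://example.com/a">A</a><img src="/logo.png"></body></html>';
$links = extract_links($html);
print_r($links);
```

Like getlinks(), this returns a plain array, so the rest of the thread (looping and inserting) applies unchanged. Note that the links come back grouped by tag, not in page order.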

Thanks a lot for your response, but I am a bit of a newbie at PHP. Would you be able to put the above statement into code and elaborate on it a bit, please?

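foreach ($data as $link) {
    // Escape the link so a quote inside it cannot break (or hijack) the query
    $link = mysql_real_escape_string($link);
    mysql_query("INSERT INTO table_name (link, something, something_else) VALUES ('" . $link . "', 'blah1', 'blah2')");
}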

Basically, once you have your $data (the value returned by getlinks()), it is an array, for example:

$data = array("http://link1.com", "http://link2.com", "and so on...");

foreach goes over each item in the $data array and runs whatever is inside the {} once per item. In this case, we are inserting each link into the db.

Hope this sheds a little light on the situation.
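One caveat for anyone reading this later: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0, so on a current install the same loop is usually written with PDO and a prepared statement. A rough sketch follows. save_links() and the links table are names I made up, and the demo uses an in-memory SQLite database (this assumes the pdo_sqlite driver is installed) purely so it runs without a server.

```php
<?php
// save_links() is a hypothetical helper, not code from this thread.
function save_links(PDO $pdo, array $links) {
    // Prepare once, execute once per link; the driver handles the quoting.
    $stmt = $pdo->prepare('INSERT INTO links (link) VALUES (:link)');
    $saved = 0;
    foreach ($links as $link) {
        $stmt->execute(array(':link' => $link));
        $saved++;
    }
    return $saved;
}

// Stand-in database so the sketch runs anywhere.
// For MySQL the DSN would look like 'mysql:host=localhost;dbname=temp;charset=utf8mb4'
// (plus a user name and password as the second and third constructor arguments).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (link VARCHAR(2048) NOT NULL)');

// The second link contains a quote, which would break a hand-built query string.
$count = save_links($pdo, array('http://link1.com', "http://link2.com/?q=it's"));
echo "Saved $count links\n";
```

Because the value is bound to a placeholder and never pasted into the SQL string, a link containing quotes is stored as-is.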

Like this? How would you change it?

<?php

$url = "http://www.google.co.za",

function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }

$con = mysql_connect("localhost","root","");
if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }

mysql_select_db("temp", $con);

mysql_query($sql,$con);

foreach($data as $link){
mysql_query("INSERT INTO first (link)
 VALUES('".$link."')");
}

mysql_close($con);


?>

Thanks for your input :-)

On line 21 (mysql_query($sql,$con);) you are executing a MySQL query, but I don't see where you define $sql or why you are running it. I think you can remove line 21.
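Two more things in that script look off to me. The $url line ends with a comma instead of a semicolon, which is a syntax error. And getlinks() is never called, so the $data that the foreach loop reads does not exist: the $data inside the function is local to it, and the result only reaches the rest of the script if you assign the return value. A small sketch of that second point, using a made-up stand-in (getlinks_stub) so it runs offline:

```php
<?php
// getlinks_stub() is a hypothetical stand-in for getlinks(); it makes no network request.
function getlinks_stub($url) {
    $data = array($url . '/a', $url . '/b'); // this $data is local to the function
    return $data;
}

getlinks_stub('http://example.com');          // return value is thrown away
$seen_before = isset($data);                  // false: no $data exists out here yet

$data = getlinks_stub('http://example.com');  // capture the returned array
$seen_after = isset($data);                   // true: now foreach has something to loop over

var_dump($seen_before, $seen_after, count($data));
```

In the real script that means something like $data = getlinks($url); before the foreach loop.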

Hi guys

Would anyone happen to know why this program is not working:

<?php

function getlinks($url) {
    $data=file_get_contents($url);
    preg_match_all('/(href|src)\=(\"|\')[^\"\'\>]+/i',$data,$media);
    unset($data);
    $data=preg_replace('/(href|src)(\"|\'|\=\"|\=\')(.*)/i',"$3",$media[0]);
    return $data;
    }


$con = mysql_connect("localhost","root","");
if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }
mysql_select_db("temp", $con);
$sql = "CREATE TABLE first
(
link NOT NULL
)";

getlinks('http://www.google.com');

mysql_select_db("temp", $con);

foreach($data as $link){
$tbl = ("INSERT INTO first (link)
 VALUES('"$link"')");
}

mysql_query($con, $sql, $tbl);

mysql_close($con);


?>

Thanks

You have some SQL in there that says to create a table, but you never actually execute that query, so I assume the table "first" doesn't exist.
Also, the params for your mysql_query are wrong. It should be mysql_query($query, $connection). You should create the table in phpMyAdmin and then just manipulate the data in that table from your scripts, instead of creating the table from your scripts.
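A few other things would stop that script even with the table in place: the column in CREATE TABLE has no type, the VALUES('"$link"') string is missing its concatenation dots, and the query is run once after the loop, so at best only the last link would be inserted. Here is a hedged sketch of how the database part might look with mysqli (the old mysql_* functions are gone as of PHP 7.0). store_links() is a made-up name, VARCHAR(2048) is my guess at a sensible column type, and I have not run this against your database.

```php
<?php
// store_links() is a hypothetical helper, not code from this thread.
// Note the argument order: mysqli_query() takes the connection FIRST,
// the opposite of the old mysql_query($query, $connection).
function store_links(mysqli $con, array $links) {
    // Every column needs a type; the 2048 limit is an assumption.
    $create = "CREATE TABLE IF NOT EXISTS first (link VARCHAR(2048) NOT NULL)";
    if (!mysqli_query($con, $create)) {
        return false;
    }

    $stmt = mysqli_prepare($con, "INSERT INTO first (link) VALUES (?)");
    if (!$stmt) {
        return false;
    }

    $current = '';
    mysqli_stmt_bind_param($stmt, 's', $current); // bound by reference

    foreach ($links as $link) {
        $current = $link;
        // Execute inside the loop so every link is inserted, not just the last one.
        if (!mysqli_stmt_execute($stmt)) {
            return false;
        }
    }
    mysqli_stmt_close($stmt);
    return true;
}

// Usage, with the connection details from the script above:
// $con = mysqli_connect("localhost", "root", "", "temp");
// store_links($con, getlinks('http://www.google.com'));
```

Creating the table in phpMyAdmin beforehand works just as well; in that case the CREATE TABLE IF NOT EXISTS step simply does nothing.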

$data=file_get_contents($url);

Many hosts consider this unsafe when it is used to fetch URLs from other domains, so they disable it through the allow_url_fopen setting. Check your phpinfo() to see whether allow_url_fopen is enabled. Just because it works for you on localhost doesn't mean that it will work when you run it on your remote server.

BTW, fetching and storing content from arbitrary remote pages could open you up to a bunch of nasties, so be careful!

If it is disabled, a workaround would be to use the cURL functions. See the PHP manual for this; there are some good examples there.
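To make that concrete, here is a rough sketch of checking the setting from code and falling back to cURL when it is off. Both function names are made up for illustration, the 10 second timeout is an arbitrary choice, and the cURL branch assumes the curl extension is installed.

```php
<?php
// Both function names below are hypothetical helpers, not from this thread.

// True when PHP is allowed to open remote URLs with file_get_contents().
function remote_fopen_allowed() {
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Fetch a page with file_get_contents() when allowed, otherwise with cURL.
// Returns the body as a string, or null on failure.
function fetch_page($url) {
    if (remote_fopen_allowed()) {
        $html = @file_get_contents($url);
        return $html === false ? null : $html;
    }
    if (!function_exists('curl_init')) {
        return null; // neither method is available
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

// Report the current setting; this line makes no network request.
var_dump(remote_fopen_allowed());
```

Inside getlinks() you would then replace file_get_contents($url) with fetch_page($url) and bail out early if it returns null.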