Can anyone see anything wrong with this? I'm stuck :(

$sql1 = mysql_query("DELETE FROM spider WHERE url='$addtolist'");

if (!mysql_query($sql1,$con))
  {
  die('Error Deleting: ' . mysql_error());
  }
echo "Record Deleted <br />";

$sql2="INSERT INTO list (title, url, description)
VALUES
('$title','$url','$description')";

if (!mysql_query($sql2,$con))
  {
  die('Error Adding: ' . mysql_error());
  }
echo "1 record added";

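The first line you posted already runs the query and stores its result (TRUE or FALSE) in $sql1. The if statement then passes that result to mysql_query() as if it were an SQL string. That is not a valid query, so the check fails and the script dies with "Error Deleting". Change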

$sql1 = mysql_query("DELETE FROM spider WHERE url='$addtolist'");

to

$sql1 = "DELETE FROM spider WHERE url='$addtolist'";
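For reference, the mysql_* functions used in this thread were deprecated in PHP 5.5 and removed in PHP 7.0. Below is a minimal sketch of the same delete-then-insert step with PDO prepared statements. It assumes a cut-down version of the spider and list tables and uses an in-memory SQLite database only so the example runs on its own. Against MySQL you would use a mysql: DSN instead. Each statement is executed exactly once, and the bound parameters also stop a quote in $title or $addtolist from breaking the SQL.

```php
<?php
// Sketch only: in-memory SQLite stands in for the MySQL connection,
// with an assumed minimal schema.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE spider (url TEXT)');
$db->exec('CREATE TABLE list (title TEXT, url TEXT, description TEXT)');
$db->exec("INSERT INTO spider (url) VALUES ('http://example.com/')");

// Hypothetical values standing in for the question's variables.
$addtolist   = 'http://example.com/';
$title       = "Example's page"; // this quote would break the string-built SQL
$url         = $addtolist;
$description = 'An example site';

// Prepare once, execute once. With ERRMODE_EXCEPTION a failure throws
// a PDOException instead of quietly returning false.
$delete = $db->prepare('DELETE FROM spider WHERE url = ?');
$delete->execute([$addtolist]);
echo "Deleted " . $delete->rowCount() . " record(s)<br />\n";

$insert = $db->prepare('INSERT INTO list (title, url, description) VALUES (?, ?, ?)');
$insert->execute([$title, $url, $description]);
echo "1 record added\n";
```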

Thanks for that.

Also, when people submit data, a lot of duplicates get submitted. How can I filter them out?

I can get the domain name using this

preg_match('@^(?:http://)?([^/]+)@i',
    $addtolist, $hostname);
$site = $hostname[1];
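A caveat on that regex: it only strips a leading http://. For https://example.com/page, $hostname[1] comes out as "https:". Below is a minimal sketch of an alternative that uses PHP's built-in parse_url() instead. The helper name get_hostname() is hypothetical. The sketch assumes that a URL with no scheme, such as example.com/page, should be treated as http. It also lowercases the host so that Example.com and example.com count as the same site.

```php
<?php
// Hypothetical helper: returns the lowercased host of $url, or null if none.
function get_hostname(string $url): ?string
{
    // parse_url() reads "example.com/page" as a path, so assume http
    // when no scheme is present.
    if (!preg_match('@^[a-z][a-z0-9+.-]*://@i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

echo get_hostname('https://Example.com:8080/page'), "\n"; // example.com
echo get_hostname('example.com/about'), "\n";             // example.com
```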

But then where would I go from here?

Sorry, I've not done any work with MySQL before.

Add another column to the spider table for the hostname (call it whatever you like) and store $hostname[1] in it.

Then, when a new URL is submitted, get its hostname and check whether it already exists in the database, for example:

preg_match('@^(?:http://)?([^/]+)@i', $addtolist, $hostname);
// Escape the hostname before putting it into the SQL string
$site = mysql_real_escape_string($hostname[1]);
$result = mysql_query("SELECT `hostname` FROM `spider` WHERE `hostname` = '$site'");
if (mysql_num_rows($result) != 0) {
  // Hostname already in db
} else {
  // Not in the db
}
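Below is the same lookup written as a PDO prepared statement. This minimal sketch assumes the spider table has the hostname column suggested above. The helper name host_exists() is hypothetical, and an in-memory SQLite database stands in for MySQL so the example is self-contained. Binding $site as a parameter means no manual escaping is needed.

```php
<?php
// Hypothetical helper: true if $site is already stored in spider.hostname.
function host_exists(PDO $db, string $site): bool
{
    $stmt = $db->prepare('SELECT 1 FROM spider WHERE hostname = ? LIMIT 1');
    $stmt->execute([$site]);
    return $stmt->fetchColumn() !== false; // false means no matching row
}

// In-memory SQLite stands in for the MySQL database (assumed schema).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE spider (url TEXT, hostname TEXT)');

$addtolist = 'http://www.example.com/page';
preg_match('@^(?:http://)?([^/]+)@i', $addtolist, $hostname);
$site = $hostname[1];

if (host_exists($db, $site)) {
    echo "Hostname already in db\n";
} else {
    // Not in the db yet, so store it along with the URL.
    $db->prepare('INSERT INTO spider (url, hostname) VALUES (?, ?)')
       ->execute([$addtolist, $site]);
    echo "Added $site\n";
}
```

A UNIQUE index on the hostname column is another option. With it, the database itself rejects a duplicate insert, even if two submissions arrive between the check and the insert.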

Right, OK. I understand it, but I'm reeeaaaally confused :|

This is what I've come up with. What do you make of it?

$result = mysql_query("SELECT url FROM blacklist WHERE url = '$hostname'");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');

    if (mysql_num_rows($result) != 0) {
        echo 'URL already in DB';
    } else {
        echo $url.'<br />';
        $query = "INSERT INTO spider (url, site) VALUES ('$url', '$hostname')";
        mysql_query($query) or die('Error, insert query failed');
        $query1 = "INSERT INTO blacklist (url) VALUES ('$url')";
        mysql_query($query1) or die('Error, Blacklisting failed');
    }
}

I am genuinely trying. I'm not just taking advantage here :p

Any chance you can post some more of the script, for example what $hrefs contains?

I don't think that will work. The SELECT runs only once, before the loop, so every link gets the same result. If you post some more of the script, it will be easier to give a decent suggestion.

Yeah, sure. Essentially it's a spider: it finds all the links on a page :)

<?php
include('includes/config.php');

//Collect URLS

$target_url = $_GET['url'];

//Get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    $target_url, $hostname);
$hostname = $hostname[1];


$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
	echo "<br />cURL error number:" .curl_errno($ch);
	echo "<br />cURL error:" . curl_error($ch);
	exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

$result = mysql_query("SELECT url FROM blacklist WHERE url = '$hostname'");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');

    if (mysql_num_rows($result) != 0) {
        echo 'URL already in DB';
    } else {
        echo $url.'<br />';
        $query = "INSERT INTO spider (url, site) VALUES ('$url', '$hostname')";
        mysql_query($query) or die('Error, insert query failed');
        $query1 = "INSERT INTO blacklist (url) VALUES ('$url')";
        mysql_query($query1) or die('Error, Blacklisting failed');
    }
}

?>
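You can try the link-gathering step of the script without cURL or a database by giving DOMDocument an HTML string. This minimal sketch uses a hypothetical extract_links() helper. It runs the same /html/body//a XPath query and collects each href. It skips <a> elements that have no href, because getAttribute() returns an empty string for those. It collects libxml warnings with libxml_use_internal_errors() instead of the @ operator.

```php
<?php
// Hypothetical helper: returns the href of every <a> inside <body>.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // Collect warnings about malformed real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('/html/body//a') as $a) {
        $href = $a->getAttribute('href'); // '' when the attribute is missing
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<html><body><a href="/about">About</a> <a name="top">Top</a>'
      . '<p><a href="http://example.com/">Home</a></p></body></html>';
print_r(extract_links($html));
```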

The last part should be

$result = mysql_query("SELECT url FROM blacklist WHERE url = '$hostname'");
if (mysql_num_rows($result) != 0) {
  echo 'URL already in DB';
} else {
  for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo $url.'<br />';
    $query = "INSERT INTO spider (url, site) VALUES ('$url', '$hostname')";
    mysql_query($query) or die('Error, insert query failed');
    $query1 = "INSERT INTO blacklist (url) VALUES ('$url')";
    mysql_query($query1) or die('Error, Blacklisting failed');
  }
}

Ah, we were getting mixed messages there. I needed to check whether $url was already in the database, whereas you were checking $hostname :p

I ended up with this, if anyone else is curious:

// Escape values before putting them into SQL strings
$safe_host = mysql_real_escape_string($hostname);

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    // A quote in a scraped URL would otherwise break the query
    $safe_url = mysql_real_escape_string($url);
    $result = mysql_query("SELECT * FROM blacklist WHERE url = '$safe_url'");
    if (mysql_num_rows($result) != 0) {
        echo 'URL already in database<br />';
    } else {
        echo $url.'<br />';
        $query = "INSERT INTO spider (url, site) VALUES ('$safe_url', '$safe_host')";
        mysql_query($query) or die('Error, insert query failed');
        $query1 = "INSERT INTO blacklist (url) VALUES ('$safe_url')";
        mysql_query($query1) or die('Error, Blacklisting failed');
    }
}
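A page often links to the same URL many times, and the loop above runs one SELECT per link. This minimal sketch filters a page's links in PHP first, using a hypothetical new_urls() helper. Repeats within the page are dropped, and so are URLs already known (for example, loaded from the blacklist table, assuming it is small enough to hold in memory). Only new URLs then reach the INSERT.

```php
<?php
// Hypothetical helper: keeps the first occurrence of each link that is not
// already in $known, preserving page order.
function new_urls(array $links, array $known): array
{
    $seen = array_fill_keys($known, true); // URL => true, for fast lookups
    $result = [];
    foreach ($links as $url) {
        if ($url === '' || isset($seen[$url])) {
            continue; // empty href, already stored, or repeated on this page
        }
        $seen[$url] = true;
        $result[] = $url;
    }
    return $result;
}

$links = ['/a', 'http://example.com/', '/a', '/b'];
$known = ['/b']; // e.g. URLs already in the blacklist table
print_r(new_urls($links, $known)); // prints the two new URLs: /a and http://example.com/
```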