
For example, here is some code:

preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);

What I don't get is what all of those special characters mean, for example (?:[^>]*).
How can I do the same thing for the h1, h2, and h3 tags?


Last Post by blocblue

Hi,

To understand regular expressions, read this article. It explains what the different modifiers do.

With regard to extracting the content from between h1, h2 and h3 tags, you could use something like the following (shown for h1; swap in h2 or h3 as needed):

preg_match_all('#<h1[^>]*>([^<]*)</h1>#i', $stripped_file, $matches);

The '<h1' matches the start of the opening header tag.
The '[^>]*' matches any characters except a closing angle bracket >. This allows for attributes such as ids and classes.
The '([^<]*)' is similar: it matches any characters except an opening angle bracket <. The parentheses have a special meaning: whatever is matched by the expression inside them is captured in the $matches array (here in $matches[1]).
And the '</h1>' matches the closing header tag.
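
To cover h1, h2 and h3 in one pass, the same idea can be sketched with a single pattern. This is only a rough example: the $html sample string is made up for illustration (in your script it would be $stripped_file), and because of '[^<]*' it will skip headers that contain nested tags. It also shows the difference between (?:...), which groups without capturing, and (...), which captures into $matches.

```php
<?php
// Hypothetical sample input; in the thread this would be $stripped_file.
$html = '<h1 class="title">Main heading</h1><p>text</p>'
      . '<h2>Sub heading</h2><H3 id="x">Third</H3>';

// ([1-3])     captures the header level (group 1).
// (?:[^>]*)   skips any attributes without capturing them.
// ([^<]*)     captures the header text (group 2).
// \1          is a backreference: the closing tag must use the same level.
preg_match_all('/<h([1-3])(?:[^>]*)>([^<]*)<\/h\1>/i', $html, $matches);

$levels   = $matches[1]; // header levels as strings
$headings = $matches[2]; // text between the tags

foreach ($headings as $i => $text) {
    echo 'h' . $levels[$i] . ': ' . $text . "\n";
}
```

With the sample string above, $matches[1] holds the levels and $matches[2] holds the three header texts, in document order.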

Hope this helps.
R.


Thanks. Now, how can I include that in this foreach loop, so that I can insert them into the MySQL database together? Note that $matches[1] is a link. Thanks in advance!

foreach($matches[1] as $key => $value)
{

    if(strpos($value,"http://") != 'FALSE' && strpos($value,"https://") != 'FALSE')
    {
        $New_URL = "http://" . $domain . $value;
    }
    else
    {
        $New_URL = $value;
    }
    $New_URL = addslashes($New_URL);
    $Check = mysql_query("SELECT * FROM pages WHERE url='$New_URL'");
    $Num = mysql_num_rows($Check);

    if($Num == 0)
    {
        mysql_query("INSERT INTO pages (url, title, keywords)
                        VALUES ('$New_URL','$Title', '$Keywords')");

        $_SESSION['i']++;

        echo $_SESSION['i'] . "</br>";
    }
    echo mysql_error();
}

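I'm not sure that line 4 is going to work. A link isn't going to have both http:// and https:// in it; it would be one or the other. If the test is meant to read "the link contains a scheme", you would want to change the operator from && to ||. Note also that strpos() returns the boolean false when nothing is found, so the result should be compared with === false (or !== false) rather than with the string 'FALSE'. If the aim is to prepend the domain only when the link has neither scheme, check that both calls return false: strpos($value, "http://") === false && strpos($value, "https://") === false.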
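
As a rough sketch of how that check could look, assuming the intent is to prepend the domain only to links that do not already start with a scheme: the function name absolute_url is made up for this example, and $domain stands in for the variable already used in your loop.

```php
<?php
// Hypothetical helper; the name is not from the original script.
function absolute_url($value, $domain)
{
    // strpos() returns 0 when the needle is at the very start of the
    // string and false when it is missing, so compare strictly.
    $hasScheme = strpos($value, 'http://') === 0
              || strpos($value, 'https://') === 0;

    // Leave absolute links alone; prefix relative ones with the domain.
    return $hasScheme ? $value : 'http://' . $domain . $value;
}

echo absolute_url('/about.html', 'example.com'), "\n";
echo absolute_url('https://example.org/x', 'example.com'), "\n";
```

Inside the foreach you would then write $New_URL = absolute_url($value, $domain); in place of the if/else.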

R.
