For example, here is some code:

preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);

What I don't get is all of those special characters, for example (?:[^>]*).
How can I do the same with the h1, h2, and h3 tags?


Hi,

To understand regular expressions, read this article. It explains what the different modifiers do.

With regard to extracting the content from between h1, h2 and h3 tags, you could use something like:

preg_match_all('#<h1[^>]*>([^<]*)</h1>#i', $stripped_file, $matches);

The '#' at each end is the pattern delimiter, and the trailing 'i' makes the match case-insensitive.
The '<h1' defines the opening tag of the header.
The '[^>]*' matches any characters except a closing angle bracket >. This is in case ids and classes are declared on the tag.
The '([^<]*)' is similar in that it matches any characters except an opening angle bracket <. The parentheses have a special meaning: whatever is matched by the expression inside them is captured in the $matches array.
And the '</h1>' closes the header.
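To cover h1, h2 and h3 in one pass, and to show what the (?:...) from the question does, here is a minimal sketch. It assumes the headings contain plain text only (no nested tags), and the sample $html string is made up for illustration. A (?:...) group bundles part of a pattern without capturing it, so it gets no entry of its own in $matches.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<h1 class="title">Main</h1><p>text</p><h2>Sub</h2><H3 id="x">Detail</H3>';

// <h([1-3])  opening tag; the digit is captured as group 1
// (?:[^>]*)  non-capturing group: skips any attributes, adds nothing to $matches
// ([^<]*)    group 2: the heading text (anything up to the next <)
// </h\1>     closing tag; \1 must repeat the digit captured in group 1
// i          modifier: case-insensitive, so <H3> matches too
preg_match_all('#<h([1-3])(?:[^>]*)>([^<]*)</h\1>#i', $html, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each element of $matches is one complete match.
foreach ($matches as $m) {
    echo 'h' . $m[1] . ': ' . $m[2] . "\n";
}
```

With PREG_SET_ORDER, $m[1] is the heading level and $m[2] is the heading text, which is convenient if both are going into the same database row.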

Hope this helps.
R.

Thanks. Now how can I include that in this foreach loop, so that I can insert them into the MySQL database together? Also, $matches[1] is a link. Thanks in advance!

foreach($matches[1] as $key => $value)
{

    if(strpos($value,"http://") != 'FALSE' && strpos($value,"https://") != 'FALSE')
    {
        $New_URL = "http://" . $domain . $value;
    }
    else
    {
        $New_URL = $value;
    }
    $New_URL = addslashes($New_URL);
    $Check = mysql_query("SELECT * FROM pages WHERE url='$New_URL'");
    $Num = mysql_num_rows($Check);

    if($Num == 0)
    {
        mysql_query("INSERT INTO pages (url, title, keywords)
                        VALUES ('$New_URL','$Title', '$Keywords')");

        $_SESSION['i']++;

        echo $_SESSION['i'] . "</br>";
    }
    echo mysql_error();
}

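I'm not sure that line 4 is going to work. strpos() returns the boolean false when the text isn't found, so comparing against the string 'FALSE' won't behave reliably; compare with === false instead. Also think about the operator. A link isn't going to have both http:// and https:// in it. It would be one or the other. With "not found" tests, && is the right choice (the link has neither prefix, so the domain gets prepended), whereas || would be true for every link. If you flip the test around and check whether the link does start with one of them, the operator becomes ||.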
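A minimal sketch of that check with strict comparisons might look like the following. strpos() returns 0 when the text is found at the very start of the string and the boolean false when it is missing, so === is needed to tell the two apart. The function name to_absolute_url and the sample domain are made up for illustration, and it assumes relative links start with a slash, as the concatenation in the loop implies.

```php
<?php
// Hypothetical helper: turn a relative link into an absolute one.
function to_absolute_url($domain, $value)
{
    // strpos() gives 0 for a match at the start of the string and the
    // boolean false for no match, hence the strict === 0 comparison.
    $isHttp  = strpos($value, 'http://') === 0;
    $isHttps = strpos($value, 'https://') === 0;

    if ($isHttp || $isHttps) {
        return $value;                   // already absolute, leave untouched
    }
    return 'http://' . $domain . $value; // relative, so prepend the domain
}

echo to_absolute_url('example.com', '/about.html') . "\n";
echo to_absolute_url('example.com', 'https://other.example/page') . "\n";
```

Inside the foreach, the whole if/else could then be replaced by a single call such as $New_URL = to_absolute_url($domain, $value);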

R.
