Hey guys

I am grabbing URLs from the database. For each URL, I need to open that file and display only the contents of its <h1> tags, discarding the rest.

-- IMPORTANT --

I do not want to actually remove everything else from each file, I simply want to echo out the contents of whatever is contained inside the <h1> tags.

I have tried str_replace, as well as preg_replace, and I cannot seem to get it right! Any pointers on how to accomplish this? Thank you!!

All 4 Replies

Did you try Simple HTML DOM Parser?
It is very simple.

include 'simple_html_dom.php';
$html = file_get_html('http://www.daniweb.com/');

// find and echo the text of every h1 tag
foreach ($html->find('h1') as $element) {
    echo $element->plaintext . '<br>';
}

See the link for documentation. It is easy to understand too.

diafol

Rather than load an external lib, maybe...

$urlArray = array(
    'example' => 'http://www.example.com/',
    // ...
);

function get_first_tag($urls, $tag = 'h1')
{
    $output = [];
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    foreach ($urls as $site => $url)
    {
        $dom = new DOMDocument(); // fresh document for each URL
        if (!$dom->loadHTMLFile($url))
        {
            continue; // skip URLs that could not be loaded
        }
        $item = $dom->getElementsByTagName($tag)->item(0);
        if ($item)
        {
            $output[$site] = $item->nodeValue;
        }
    }
    libxml_clear_errors(); // discard the collected parse warnings
    return $output;
}


print_r(get_first_tag($urlArray)); //get first h1 tag contents
print_r(get_first_tag($urlArray, 'h3')); //get first h3 tag contents
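
The function above returns only the first matching tag per URL. If a page can contain several <h1> tags and you want to echo all of them, something along these lines might work. This is only a rough sketch: get_all_tags() and the $sample markup are names I made up for illustration, and it parses an HTML string (for example one you already fetched with file_get_contents()) rather than loading the URL itself. Nothing is written back to the file; the text is only read and echoed.

```php
<?php
// Hypothetical helper (not from the posts above): return the text of every
// element with the given tag name found in an HTML string.
function get_all_tags(string $html, string $tag = 'h1'): array
{
    // Collect parse warnings from sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $texts = [];
    // loadHTML() rejects an empty string, so guard against it first.
    if ($html !== '' && $dom->loadHTML($html)) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $texts[] = trim($node->textContent);
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting
    return $texts;
}

// Sample markup standing in for a fetched page (an assumption for the demo).
$sample = '<html><body><h1>First</h1><p>Ignored</p>'
        . '<h1>Second <em>title</em></h1></body></html>';

// Echo only the heading text; everything else in the page is left alone.
foreach (get_all_tags($sample) as $heading) {
    echo htmlspecialchars($heading), "<br>\n";
}
```

With the sample markup this should echo "First" and "Second title", each followed by <br>. Passing 'h3' as the second argument would collect the h3 tags instead, the same way get_first_tag() does.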

Thank you for your responses! As it turned out, I figured out the issue in my script just minutes after posting this. I wanted to say that I really liked both of your answers; I learned something new from each!

diafol

Great! How about posting your script to show us how you did it?
