Hey guys

I am grabbing URLs from the database. Then with each URL, I need to open that file, and only display the contents between <h1> tags and discard the rest.


I do not want to actually remove everything else from each file, I simply want to echo out the contents of whatever is contained inside the <h1> tags.

I have tried str_replace, as well as preg_replace, and I cannot seem to get it right! Any pointers on how to accomplish this? Thank you!!

Have you tried Simple HTML DOM Parser?
It is very simple to use.

include 'simple_html_dom.php';
$html = file_get_html('http://www.daniweb.com/');

foreach($html->find('h1') as $element) {
    echo $element->plaintext.'<br>'; //find & echo all the h1 tags
}

See the link for documentation. It is easy to understand too.

Rather than load an external lib, maybe...

$urlArray = array('example'=>'http://www.example.com/' /* , ... */);

function get_first_tag($urls, $tag='h1')
{
    libxml_use_internal_errors(true); //keep warnings about malformed HTML quiet
    $output = [];
    foreach($urls as $site=>$url) {
        $dom = new DOMDocument();
        if(!@$dom->loadHTMLFile($url)) continue; //skip pages that cannot be loaded
        $node = $dom->getElementsByTagName($tag);
        if($item = $node->item(0)) {
            $output[$site] = $item->nodeValue;
        }
    }
    return $output;
}

print_r(get_first_tag($urlArray)); //get first h1 tag contents
print_r(get_first_tag($urlArray, 'h3')); //get first h3 tag contents
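
If you need every matching tag on a page rather than just the first, something along these lines might work. This is only a rough sketch: get_all_tags is a made-up name and $sample is made-up markup, and it assumes you already have the page's HTML in a string.

```php
<?php
// Hypothetical helper (not from the posts above): returns the text of every
// matching tag found in an HTML string, rather than only the first one.
function get_all_tags(string $html, string $tag = 'h1'): array
{
    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $output = [];
    // loadHTML() rejects an empty string, so guard against it first
    if ($html !== '' && $dom->loadHTML($html)) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $output[] = trim($node->textContent);
        }
    }

    // Restore the previous libxml error handling
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $output;
}

// Made-up sample markup for illustration
$sample = '<html><body><h1>First</h1><p>ignored</p><h1>Second <em>title</em></h1></body></html>';
foreach (get_all_tags($sample) as $text) {
    echo $text, "\n";
}
```

For a remote page you could fetch the markup with file_get_contents($url) and pass the result in, provided allow_url_fopen is enabled on your server.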

Thank you for your responses! As it turned out, I figured out the issue in my script just minutes after posting this. I wanted to say that I really liked both of your answers; I learned something new from each!

Great! How about posting your script to show us how you did it?