Hey guys

I am grabbing URLs from the database. Then with each URL, I need to open that file, and only display the contents between <h1> tags and discard the rest.

-- IMPORTANT --

I do not want to actually remove everything else from each file, I simply want to echo out the contents of whatever is contained inside the <h1> tags.

I have tried str_replace, as well as preg_replace, and I cannot seem to get it right! Any pointers on how to accomplish this? Thank you!!
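
For reference, a regex approach along the lines attempted above would probably use preg_match_all rather than preg_replace, since the goal is to collect text, not to alter the file. This is only a minimal sketch: the helper name extract_h1_with_regex and the $sample markup are made up for illustration, and in practice $sample would come from something like file_get_contents($url). Regexes are fragile on real-world HTML, so treat it as a quick fix at best.

```php
<?php
// Hypothetical helper: returns the inner text of every <h1> element in an HTML string.
function extract_h1_with_regex(string $html): array
{
    // preg_match_all collects matches and leaves $html untouched.
    // 's' lets '.' span newlines, 'i' ignores tag case, '.*?' stops at the first closing tag.
    preg_match_all('/<h1\b[^>]*>(.*?)<\/h1>/si', $html, $matches);

    // $matches[1] holds the captured group; strip nested tags and surrounding whitespace.
    return array_map(function ($inner) {
        return trim(strip_tags($inner));
    }, $matches[1]);
}

// Sample markup standing in for the contents of one fetched URL (an assumption).
$sample = "<html><body><h1 class=\"title\">First <em>heading</em></h1><p>Body text</p><H1>\nSecond\n</H1></body></html>";
$headings = extract_h1_with_regex($sample);
echo implode("\n", $headings), "\n";
```

Nothing is written back to the file here; only the captured heading text is echoed.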

3 contributors, 4 replies, 15 views, discussion span of 3 years, last post by diafol

Have you tried Simple HTML DOM Parser?
It is very simple to use.

include 'simple_html_dom.php'; // third-party library file, downloaded separately
$html = file_get_html('http://www.daniweb.com/');

// find and echo the text of every h1 element on the page
foreach ($html->find('h1') as $element) {
    echo $element->plaintext . '<br>';
}

See the link for documentation. It is easy to understand too.

Rather than load an external lib, maybe...

$urlArray = array('example'=>'http://www.example.com/', ...);

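function get_first_tag($urls, $tag='h1')
{
    $output = [];
    libxml_use_internal_errors(true); // keep warnings about malformed HTML out of the output
    foreach($urls as $site=>$url)
    {
        $dom = new DOMDocument(); // fresh document for each URL, so no state carries over between pages
        if(!$dom->loadHTMLFile($url))
        {
            continue; // skip URLs that could not be loaded
        }
        $node = $dom->getElementsByTagName($tag);
        if($item = $node->item(0)) // item(0) is null when the tag does not occur
        {
            $output[$site] = $item->nodeValue;
        }
    }
    libxml_clear_errors(); // discard the buffered parse errors
    return $output;
}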


print_r(get_first_tag($urlArray)); //get first h1 tag contents
print_r(get_first_tag($urlArray, 'h3')); //get first h3 tag contents
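
If a page has more than one matching tag, the same DOMDocument idea can collect all of them instead of only item(0). A rough sketch, assuming the HTML is already available as a string; get_all_tags and $sample are hypothetical names used only for this illustration, not part of the function above.

```php
<?php
// Hypothetical helper: returns the text of every $tag element found in an HTML string.
function get_all_tags(string $html, string $tag = 'h1'): array
{
    // Buffer libxml warnings about sloppy markup, remembering the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $output = [];

    // loadHTML parses a string; an empty string is skipped because it cannot be parsed.
    if ($html !== '' && $dom->loadHTML($html)) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $output[] = trim($node->nodeValue);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $output;
}

// Sample markup standing in for a fetched page (an assumption).
$sample = '<html><body><h1>Main title</h1><p>Intro</p><h3>Details</h3><h1>Second title</h1></body></html>';
print_r(get_all_tags($sample));       // every h1 in document order
print_r(get_all_tags($sample, 'h3')); // every h3 in document order
```

Like get_first_tag, this only reads the markup and returns text, so the source files stay as they are.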

Thank you for your responses! As it turned out, I figured out the issue in my script just minutes after posting this. I wanted to say that I really liked both of your answers; I learned something new from each!

Great! How about posting your script to show us how you did it?
