I'm trying to understand XPath, but I've come across an issue I cannot seem to find an answer for. In this case it seems that XPath is not returning what it should.

I've got a sample HTML file, test.html:

<html>

<div>
<p>1</p>
<p>2</p>
<p>3</p>
<p>4</p>
<p>5</p>
</div>

</html>

And my PHP file, test.php:

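<?php
echo "<pre>";
$url = "test.html";

// Collect libxml parse warnings internally instead of printing them.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();

// Parse the file with the HTML parser and prepare an XPath context.
$html = new DOMDocument();
$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);

// Print the text content of every node the query matches.
$titles = $xpath->query("//p");
foreach ($titles as $title) {
    echo $title->nodeValue . "<br />";
}

// Restore the previous libxml error handling.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);

echo "</pre>";
?>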

I can set the XPath query to //p and get the content of all the p tags on screen. That's good.
Set to /html//p, I get the same. That's good.
Set to //p[1], I get the first p tag. That's good.
Set to //p[5], I get the 5th p tag. That's good.

That's all groovy.

But if I do /html/div/p I get nothing. I've messed with a ton of similar queries with no luck.

I'm trying to read the URL of an image from a website, and using Firefox's Firebug plugin I can copy the XPath and I get something like:

/html/body/div[2]/div/div[2]/div/div/div/div[2]/div/div/div[2]/p/img

But in PHP I get no result unless I remove all the "[2]", take out some of the divs and place a // before img.

So what's going on here? Every example I've read says this is correct, but in the very simple example above, a plain /html/div/p or /html/div//p does not work.

Thanks for your help!

It possibly has something to do with loadHtmlFile. If I use load() instead (since your HTML is well-formed), then $titles = $xpath->query("//html/div/p"); works as expected.
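
To expand on that a little: as far as I can tell, the HTML parser behind loadHTML()/loadHTMLFile() repairs the document while parsing it and adds the missing body element, so the p tags end up under /html/body/div rather than /html/div. Here is a minimal sketch of how to check this, assuming the same markup as test.html inlined in a string (the variable names $markup, $withoutBody and $withBody are just mine for illustration):

```php
<?php
// Same markup as test.html, inlined so the sketch is self-contained.
$markup = "<html>\n<div>\n<p>1</p>\n<p>2</p>\n<p>3</p>\n<p>4</p>\n<p>5</p>\n</div>\n</html>";

$oldSetting = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup); // HTML parser: repairs the tree while loading
$xpath = new DOMXPath($doc);

// The path from the question, and the same path with body included.
$withoutBody = $xpath->query("/html/div/p");
$withBody = $xpath->query("/html/body/div/p");

echo "/html/div/p matches: ", $withoutBody->length, "\n";
echo "/html/body/div/p matches: ", $withBody->length, "\n";

// getNodePath() reports where each node actually sits in the parsed tree.
foreach ($xpath->query("//p") as $p) {
    echo $p->getNodePath(), " => ", $p->nodeValue, "\n";
}

libxml_clear_errors();
libxml_use_internal_errors($oldSetting);
```

If a query copied from the browser returns nothing, printing getNodePath() for a node you can reach with a // query is a quick way to see the path PHP actually built. It can differ from what Firebug shows, because the browser and libxml do not necessarily repair the markup the same way.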
