Hi everybody!

My question is as follows: I need to find a div with a specific ID using PHP under WordPress. So far I have found images and links, but I need to find a div with a specific ID and copy it. For now I have tested with this:

$html->find('$html->find('#divid');');
$html->find('div[id=divid]');

and several more, but with no results. Am I doing something wrong?

Inside this id there is this kind of code:

....%3Fe%3D1309211119%26ri%3D1024%26rs%3D85%26h%3.....

Any idea?

Thanks in advance!

All 32 Replies

Member Avatar for diafol

What class are you using for the $html->find? Not sure how this method is supposed to work. What does find return? What are the allowed parameters? In what format are the parameters?

Wow, tons of questions ;)

I'm almost a newbie, but only almost.

Well, I'm using this:

Website: http://sourceforge.net/projects/simplehtmldom/

The return value is a string with "invalid" characters like %, but I need all of them, the whole string. Is that possible?

What did you test with? This?

$html->find('$html->find('#divid');');
$html->find('div[id=divid]');

Is that the correct usage? In my opinion, the first line won't work because of the way it is quoted. Furthermore, why is the method call passed as a parameter to the same method?

Perhaps you did not read the instructions carefully. I don't know what that snippet is supposed to do.

Hi

I tested with both and nothing was returned,

while others like:

$html->find('a[class=miniature]');
$html->find('span[class=red]');
$html->find('img[class=borderx]');

are working fine. It must be something "stupid", but I don't know what :(

PS: all of them are preceded by a variable, like: $picture = $html->find('img[class=borderx]');

OK, I've found this inside the code mentioned before:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'link'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

which, I suppose, works with images. But I want to work with FLV videos, which are the strings I am searching for. What do you think?

Perhaps try:

$html->find('#divid');

No way :(
But thinking about it another way: if I know how a string starts (file=) and how it ends (.flv), is there a way to copy/retrieve the whole string?
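
One way to sketch that "known start, known end" idea without any regex is plain strpos/substr. The helper name extract_between and the sample string are made up for illustration; they are not from the thread or from simplehtmldom, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the substring that starts at $start and
// ends with $end (both markers included), or null if either is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    // Look for the end marker only after the start marker.
    $to = strpos($haystack, $end, $from + strlen($start));
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to + strlen($end) - $from);
}

// Made-up sample input, not taken from any real page.
$sample = '<embed flashvars="file=http://example.com/videos/clip.flv&autostart=1">';
echo extract_between($sample, 'file=', '.flv'), "\n";
```

This only finds the first occurrence; if the page can contain several such strings, a loop or preg_match_all would probably be a better fit.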

I've checked it, but I think that I'm confusing myself :)


Maybe it is... echo $html->getElementById("div1")->???

Checking right now.

Yes, but looking at the source, that's just a wrapper of:

$html->find('#div1', 0);

Where the 0 makes it return only the first match (I think). That seems useless to me, but they also included a getElementsById function, which returns all elements with that ID. Go figure.

Give XPath a try, here is a sample I have in my personal library:

txt-file.txt

<?xml version="1.0"?>

<body>
This line of information will be pulled because it is in between the body tags!
</body>

And the PHP Code:

<?php
$filename = "txt-file.txt"; // xml formatted text file...

// open the file and load contents into $string
$fh = fopen($filename, "r") or die("Can't open file");
$string = fread($fh, filesize($filename)); 
fclose($fh);

// Get it ready for XPath
$xml = new SimpleXMLElement($string);

// Specify your XPath query / expression
$result = $xml->xpath('/body');

// Loop through each result XPath has returned
// (each() was removed in PHP 8, so use foreach)
foreach ($result as $node) {
    echo '/body: ', $node, "\n";
}

?>

So for your xpath, do something like:

<?php

$url = ""; // Set this to url

$string = file_get_contents($url);

// Get it ready for XPath
$xml = new SimpleXMLElement($string);

// Specify your XPath query / expression
$result = $xml->xpath('/body/div[@id='div-id']');

// Loop through each result XPath has returned
// (each() was removed in PHP 8, so use foreach)
foreach ($result as $node) {
    echo '/body: ', $node, "\n";
}

?>

~John

Hi!

Sorry, it says this:


Parse error: syntax error, unexpected T_STRING in /home/*****/get.php on line 11

this line is:


$result = $xml->xpath('div[@id='player']');

Change one of the quote pairs to double quotes.

Do as @twiss said, or escape it.

$result = $xml->xpath('div[@id=\'player\']');

That's because of the nested apostrophes: the inner ones end the PHP string early.
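
To make the quoting options concrete, here is a small sketch of the three variants side by side. The XML string is made up just to have something to query; all three calls should return the same single node.

```php
<?php
// Made-up XML, only to have something to query.
$xml = new SimpleXMLElement(
    '<body><div id="player">video</div><div id="other">x</div></body>'
);

// 1. Single-quoted PHP string, double quotes inside the XPath expression.
$a = $xml->xpath('/body/div[@id="player"]');
// 2. Double-quoted PHP string, single quotes inside the XPath expression.
$b = $xml->xpath("/body/div[@id='player']");
// 3. Single quotes everywhere, inner ones escaped with a backslash.
$c = $xml->xpath('/body/div[@id=\'player\']');

foreach ([$a, $b, $c] as $result) {
    echo count($result), ': ', (string) $result[0], "\n";
}
```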

I'm not the best with XPath myself, but I've used it for some pretty slick things. I got the following code from (http://us.php.net/manual/en/class.domxpath.php) and I edited it to pull the content of any div with an id of player.

You may need to find another way to load the file you are working with though (save file_get_contents($file); as a file, then open it with the LoadHTMLFile function).

<?php
  
  $file = "file.txt"; // xml formatted text file...
  $doc = new DOMDocument();
  $doc->loadHTMLFile($file);
  
  $xpath = new DOMXpath($doc);
  
  // example 1: for everything with an id
  //$elements = $xpath->query("//*[@id]");
  
  // example 2: for node data in a selected id
  //$elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");
  
  // example 3: same as above, but matching the div at any depth
  $elements = $xpath->query("//div[@id='player']");
  
  if ($elements !== false) {
    foreach ($elements as $element) {
  //    echo "<br/>[". $element->nodeName. "]";
  
      $nodes = $element->childNodes;
      foreach ($nodes as $node) {
        echo $node->nodeValue. "\n";
      }
    }
  }

?>
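
If saving the page to a file first is a nuisance, a variation on the code above is to hand the markup to DOMDocument as a string. This is only a sketch: the $html value is made-up markup standing in for whatever file_get_contents($url) would return, and the libxml_use_internal_errors call is my own addition to keep warnings about sloppy HTML quiet.

```php
<?php
// Made-up markup standing in for the downloaded page; in real use this
// could come from file_get_contents($url).
$html = '<html><body><div id="menu">Menu</div>'
      . '<div id="player"><p>flv_url=http://example.com/a.flv&amp;x=1</p></div>'
      . '</body></html>';

// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "//" matches the div at any depth in the document.
$elements = $xpath->query("//div[@id='player']");

$texts = [];
foreach ($elements as $element) {
    // textContent is the concatenated text of the element and its children.
    $texts[] = $element->textContent;
}
print_r($texts);
```

One thing to keep in mind: the parser decodes entities, so in textContent the "&amp;" from the markup shows up as a plain "&".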

Guys I love you all! It begins to work :)

Anyway, sorry if I'm messing things up all the time. Maybe I should explain what the goal is, so here goes:

I need to search inside a <div id="player" ...> ... </div> in a given remote HTML file for a string that begins with flv_url= and ends with &amp;

I don't really know if this is the right method or if it's even possible. I only know those two constants in the string, but I want to get the whole string. What do you think? Is it possible? :S

Thanks again!

If I understand properly, you want to find a <div> with an id of player, then you want to take the information out of it IF that information begins with "flv_url=" and ends with "&amp;"?

You will have to add some code at lines 22 - 25, something like:

$nodes = $element->childNodes;
foreach ($nodes as $node) {
$line_content $node->nodeValue;

preg_match('/(flv_url=).*?(&amp;)/is',$line_content,$return);
if(!empty($return[0])){$results[] = $line_content; unset($return);}
}
?>

You will probably have to make changes to that preg_match call. The pattern is wrapped in "/" delimiters, and the trailing "is" are modifiers (case-insensitive, and "." also matches newlines). The ".*?" is like a wildcard matching everything in between, as little as possible. I'm just not very good with regex stuff. You may also anchor (I think that's what it's called) the pattern to the beginning of the string with a "^" char:

preg_match("/^(flv_url=).*?(&amp;)/is",$content,$return);

You might need to escape the =, & and ; chars.

Sorry, I am out of time. I'll check back end of today.

Good luck!
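
For what it's worth, here is a small sketch of how the wildcard and the "^" anchor behave. The subject string is made up and contains two "&amp;" on purpose, so the difference between the greedy ".*" and the lazy ".*?" is visible.

```php
<?php
// Made-up subject string with two "&amp;" occurrences.
$subject = 'so.addVariable("x", "flv_url=http://example.com/a.flv&amp;next=1&amp;");';

// Greedy: .* runs on to the LAST "&amp;".
preg_match('/flv_url=.*&amp;/is', $subject, $greedy);
// Lazy: .*? stops at the FIRST "&amp;".
preg_match('/flv_url=.*?&amp;/is', $subject, $lazy);
// Anchored with ^: only matches if the subject STARTS with "flv_url=".
$anchored = preg_match('/^flv_url=.*?&amp;/is', $subject);

echo $greedy[0], "\n";
echo $lazy[0], "\n";
echo $anchored, "\n";
```

Here the anchored call returns 0, because the subject does not begin with "flv_url=". So when the string sits somewhere inside a larger page, leaving the "^" out is probably what you want.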

What is the value for $line_content $node->nodeValue; supposed to be?

It shows a "Parse error: syntax error, unexpected T_VARIABLE in"

Thanks thanks thanks ;)

It needs an = in between.

Well, after all this I think I'm "almost" there.

I think I must use the preg_match function, but I don't know how to use it on a concrete URL or file, nor how to find the "·%·& strings that begin with file= and end with &amp

I'm being a bore, I know :(

I'm gonna assume you don't know PHP at all, do you?

What is the URL you are trying to scrape?

You'll need something like this: /^flv_url=.+?&amp;$/

*Edit- I understand now after thoroughly reading page 2.

I'm going to do some hunting- I wrote a PHP application that does almost this exact same thing. When I find it, I'll put a link to the source files if you'd like.

@cjohnweb I'm such a n00b, sorry ;) I'm trying to do my best, but my knowledge of PHP is really limited.

@TySkby Thanks! This is what I needed, the construction of the preg_match! But in the end the match was not at the beginning and the end of the string, it was inside it, so I'm doing this:

<?php
$html = htmlspecialchars(file_get_contents('yourURL')); 


//$html->find preg_match(/^flv_url=.+?&amp;$/)

if(preg_match('/flv_url=.+?amp;/',$html))
    echo 'FOUND';
else
    echo 'NOT FOUND';   

?>

I'm getting "FOUND", so the last question: how can I print the result? I mean the string, or the part I'm interested in?
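
A possible way to print the match is the third argument of preg_match, which receives the matched text. The sketch below assumes a made-up page fragment in $html and works on the raw markup (without htmlspecialchars); the parentheses in the pattern are my addition to capture only the part between "flv_url=" and "&amp;".

```php
<?php
// Made-up page fragment; in the thread this would come from file_get_contents().
$html = '<div id="player"><embed flashvars="'
      . 'flv_url=http%3A%2F%2Fexample.com%2Fa.flv&amp;autoplay=1"></div>';

// The parentheses capture everything between "flv_url=" and the first "&amp;".
if (preg_match('/flv_url=(.+?)&amp;/', $html, $matches)) {
    // $matches[0] is the whole match, $matches[1] the captured part.
    echo 'Whole match: ', $matches[0], "\n";
    // The value looks URL-encoded (%3F, %3D, ...), so decode it for display.
    echo 'Decoded URL: ', urldecode($matches[1]), "\n";
} else {
    echo "NOT FOUND\n";
}
```

If the page has already been run through htmlspecialchars, the "&amp;" in the source becomes "&amp;amp;", so the pattern would need adjusting accordingly.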

XPath is specifically used for parsing the DOM. It really depends on the application you are trying to build. I can write scripts with PHP, cURL and XPath that do anything from logging in to sites to scraping information; heck, I could make a Google bot clone that follows links, saves email addresses, you name it. I'd say that in this case it depends on which language you are more familiar with, because I'm sure, like you are saying, JavaScript could do this just as well as PHP, or even ASP.

@TySkby I'm grabbing videos from different sources for myself, with my own player and all this stuff, under WordPress, using a custom-made engine (plugin) that does it, but I want more sites to be grabbed (yes, adult ones too)

:)


Thanks to everybody and really sorry if I'm boring you ;)

@cjohnweb OK, but following the current approach, "we" are almost done, aren't we?

I'm not sure I understand, but yeah, it looks like you are almost done.

Give me a sample URL that I can test code with.

Well, a sample site to grab the code from would be this: http://freshmeat4u.com/beta/test.php

It is a "brute" copy-paste, so the videos are not working and the layout is a mess, but the code is there. WARNING: adult content, and it's not for spam :)
