I need to find a way to parse an HTML file with PHP.
The idea here is to read the file, e.g.:

http://www.somesite.com/page.html

Then look for a string, e.g.:

"This text is present "

Then perform an action based on whether or not that string is present.

I am OK with the if/else logic, but being a PHP newbie I don't know how to do the parsing and check for the presence of the string.

Try this:

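<?php
$yourfile = "page.html"; // Set your file here
$yourstring = "string"; // This is the string to look for

// Read the whole file into $data, one chunk at a time
$data = "";
$handle = fopen($yourfile, 'r');
while (!feof($handle)) {
    $data .= fgets($handle, 256);
}
fclose($handle);

// strpos() takes the text to search first, then the string to find.
// It returns false when there is no match, and 0 for a match at the very start,
// so compare strictly against false.
if (strpos($data, $yourstring) !== false) {
    echo "found";
} else {
    echo "not found";
}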

I take it that this is an external site.

If so, file_get_contents() may work, but not if your host has disabled allow_url_fopen. You can check this setting with the phpinfo() function. Failing that, you can use the cURL functions. Search Google and the PHP manual for usage of the cURL functions.

Either way, you should be able to read the whole file into a string via file_get_contents() or cURL. Then it's a matter of searching the string for a match, as mentioned previously by grr.
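As a rough illustration of the file_get_contents() route, here is a minimal sketch, assuming PHP 7.1 or later. The helper names containsText(), readSource() and reportPresence() are made up for this example and are not part of PHP; only strpos(), file_get_contents(), ini_get(), filter_var() and preg_match() are built-in functions.

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $haystack.
function containsText(string $haystack, string $needle): bool
{
    // strpos() returns the match offset (which may be 0) or false,
    // so the comparison has to be strict.
    return strpos($haystack, $needle) !== false;
}

// Hypothetical helper: read a local path or an http(s) URL into a string.
// Returns null when the source cannot be read.
function readSource(string $source): ?string
{
    $isUrl = preg_match('#^https?://#i', $source) === 1;

    // Remote reads through file_get_contents() need allow_url_fopen enabled.
    if ($isUrl && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }

    // @ suppresses the warning on failure; the false return is handled below.
    $data = @file_get_contents($source);
    return $data === false ? null : $data;
}

// Hypothetical helper: turn the result into the message to display.
function reportPresence(?string $page, string $needle): string
{
    if ($page === null) {
        return "could not read page";
    }
    return containsText($page, $needle) ? "found" : "not found";
}
```

With those in place, the check from the original question would be something like: echo reportPresence(readSource("http://www.somesite.com/page.html"), "This text is present ");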

If you need a cURL function and can't find a suitable example, get back to me and I'll fire one off to you.
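For reference, a cURL-based fetch could look roughly like the sketch below. It assumes PHP 7.1 or later with the cURL extension loaded. The function name fetchWithCurl() and the 10-second default timeout are choices made for this example, not anything from the thread.

```php
<?php
// Hypothetical helper: fetch a URL with the cURL extension.
// Returns the response body, or null if the request fails.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // cURL extension is not loaded
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // give up after this many seconds

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses (4xx/5xx) as failures.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

The returned string can then be searched exactly as before, for example: $page = fetchWithCurl("http://www.somesite.com/page.html"); followed by a check such as $page !== null && strpos($page, "This text is present ") !== false.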

OK, so I tried both ideas and kind of got what I was looking for, but I could not quite get file_get_contents to work with the script below.
:(

$yourfile = "page.html"; //Set your file here
$yourstring = "string"; //This is the string
$data = "";
$handle = fopen($yourfile, 'r');
while (!feof($handle)) {
    $data .= fgets($handle, 256);
}
fclose($handle);
if (strpos($string, $data)) {
    echo "found";
} else {
    echo "not found";
}

How do I pass the contents returned by file_get_contents into a variable and then search that for a string? I am a real newbie to PHP and would have done this in ColdFusion, so really specific pointers would help.

Sounds like this has not been solved yet.

Here is a straightforward script.

# Script LookForString.txt

# Read web page into a string variable.
var str page ; cat "http://www.somesite.com/page.html" > $page

# Is string "This text is present" present in $page ?
if ( { sen -c ("^This text is present^") $page } > 0 )
    echo "found"
else
    echo "not found"
endif

Note that the output of the command (cat "whatever") can be redirected into a string variable. "whatever" can be a local file or a web page; it makes no difference. I think that was your main problem: how to get the contents of a web page into a string variable.

The script is in biterscripting ( http://www.biterscripting.com ). You can translate it into any language. If you make it better, please post the better version.

To try the script as is, save it as the file C:/Scripts/LookForString.txt, then enter the following command into biterscripting.

script "C:/Scripts/LookForString.txt"