
Hi all,

I'm working on a PHP script that generates XML output. I use simple_html_dom to parse the contents of my script called get-listing.php.

Here is my PHP script:

<?php
ini_set('max_execution_time', 300);
$errmsg_arr = array();
$errflag = false;
include('simple_html_dom.php');

// Fetch get-listing.php and parse it into a simple_html_dom object
$base1 = "http://www.mysite.com/get-listing.php";
$html = file_get_html($base1);

// Every <p> element on the listing page
$countp = $html->find('p');

header("Content-type: text/xml");
// Start $xml with "=" rather than ".=", because it has no value yet
$xml = "<?xml version='1.0' encoding='UTF-8' ?>";
//echo $xml;
$xml .= '<tv generator-info-name="www.testbox.elementfx.com/xmltv">';
?>
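For the XML side of the script, one option is to build the document with PHP's built-in DOMDocument instead of concatenating strings, so characters such as & and < in channel names or show titles are escaped automatically. This is a minimal sketch under stated assumptions: the function name buildTvXml(), the shape of the $listings array (channel name => list of [time, title] pairs) and the `time` attribute are all hypothetical. The element names are only loosely modeled on XMLTV, and the times are kept as the scraped strings rather than converted to XMLTV's timestamp format.

```php
<?php
// Hypothetical helper: turn scraped listings into an XMLTV-style document.
// $listings shape (assumption): ['CHANNEL' => [['9:00 PM', 'Title'], ...], ...]
function buildTvXml(array $listings): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $tv = $doc->createElement('tv');
    $tv->setAttribute('generator-info-name', 'www.testbox.elementfx.com/xmltv');
    $doc->appendChild($tv);

    foreach ($listings as $channel => $shows) {
        $ch = $doc->createElement('channel');
        $ch->setAttribute('id', $channel);          // attribute values are escaped on save
        $name = $doc->createElement('display-name');
        $name->appendChild($doc->createTextNode($channel)); // text nodes are escaped on save
        $ch->appendChild($name);
        $tv->appendChild($ch);

        foreach ($shows as [$time, $title]) {
            $prog = $doc->createElement('programme');
            $prog->setAttribute('channel', $channel);
            $prog->setAttribute('time', $time);     // hypothetical; real XMLTV uses a "start" timestamp
            $t = $doc->createElement('title');
            $t->appendChild($doc->createTextNode($title));
            $prog->appendChild($t);
            $tv->appendChild($prog);
        }
    }
    return $doc->saveXML();
}

// Sample data copied from the ABC FAMILY and CBS output in this post
$listings = [
    'ABC FAMILY' => [['9:00 PM', '17 Again'], ['11:00 PM', 'The 700 Club']],
    'CBS'        => [['9:00 PM', 'Unforgettable: Til Death'], ['10:00 PM', "Hawaii Five-0: Ho'i Hou"]],
];
// header('Content-type: text/xml'); // when serving the result from a web page
echo buildTvXml($listings);
```

Because the escaping happens inside DOMDocument, a title like "A&E Special" cannot break the output the way it would with `$xml .= ...`.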

Output for get-listing.php:

<p id='channels'>ABC FAMILY</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=ABC FAMILY&id=101'></p><a id="aTest" href="">Stream 1</a><br><br><p id='channels'>102 CBS</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=CBS&id=102'></p><a id="aTest" href="">Stream 1</a><br><br>

Here is the output for ABC-FAMILY:

<span id="time1">9:00 PM </span> - <span id="title1">17 Again</span><br><br><span id="time2">11:00 PM </span> - <span id="title2">The 700 Club</span><br><br>

Here is the output for CBS:

<span id="time1">9:00 PM </span> - <span id="title1">Unforgettable: Til Death</span><br><br><span id="time2">10:00 PM </span> - <span id="title2">Hawaii Five-0: Ho'i Hou</span><br><br>

I'm creating the variable so I can connect to the get-listing.php script. I want to write a loop that gets the list of URLs from get-listing.php, one from each <p id="links"> tag. I also want to open each of those URLs and get the contents of its <span id="title1"> and <span id="title2"> tags.

Can you please tell me how to write the loop that takes the URL from each <p id='links'> tag and opens it with simple_html_dom, and how to get the contents I want from the ABC FAMILY and CBS output?
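A minimal sketch of the two loops described above. It swaps in PHP's built-in DOMDocument and DOMXPath in place of simple_html_dom so it runs without that library; the function names xpathTexts() and collectTitles() and the $fetch callback are hypothetical. The callback does the actual downloading, so you could pass 'file_get_contents' (if allow_url_fopen is ON) or a cURL-based function. Encoding the raw spaces in the hrefs (channels=ABC FAMILY) as %20 is an added assumption.

```php
<?php
// Parse an HTML string and return the trimmed text of every node matched by $query.
function xpathTexts(string $html, string $query): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // the pages are not well-formed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $texts[] = trim($node->textContent); // for an attribute node this is its value
    }
    return $texts;
}

// Loop 1: every link inside <p id='links'>.
// Loop 2: open each link with $fetch and collect <span id="title1"> / <span id="title2">.
function collectTitles(string $listingHtml, callable $fetch): array
{
    $titles = [];
    foreach (xpathTexts($listingHtml, "//p[@id='links']//a/@href") as $href) {
        $url = str_replace(' ', '%20', $href); // assumption: raw spaces need encoding
        $page = $fetch($url);
        if ($page === false || $page === null) {
            continue; // skip pages that could not be fetched
        }
        $titles[$url] = xpathTexts($page, "//span[@id='title1' or @id='title2']");
    }
    return $titles;
}
```

With the sample output in this post, the result maps each channel URL to its titles, for example the CBS URL to ['Unforgettable: Til Death', "Hawaii Five-0: Ho'i Hou"].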

3 contributors, 4 replies, 28 views, 3-year discussion span. Last post by veedeoo.

I'm not sure how simple_html_dom.php works, or whether it can select elements by class or id so that you could do something like $html->find('p id=links') or $html->find('p #links'). Can you post the find() function from that class so we can try to figure it out? If it's something you grabbed from someone on the web, link us to their site, as they may have some documentation on it.

Edited by Fernando_4


WARNING! Parsing remote content without written permission from its owner can lead to a messy legal battle in court. Be prepared to spend millions of dollars if you are up against a big corporation. Just saying. Technology is pretty cool, but going beyond what we call responsible and ethical programming is an extremely dangerous practice.

One efficient way of doing this is to load the remote HTML file through cURL. file_get_html() reads remote URLs through file_get_contents(), which requires the allow_url_fopen directive to be ON. With cURL you can set that directive to OFF, which at least reduces your server's vulnerability.

 function useCurl($url,$source=null){
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
        curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,10);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $this->output = curl_exec($ch);
        curl_close($ch);
        return $this->output;
        unset($this->output);
      }

We can use simple_html_dom as simply as this:

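// useCurl() returns the page as a string, so parse it with str_get_html();
// file_get_html() expects a URL or file name instead
$this_html = str_get_html(useCurl('my_url'));

foreach($this_html->find('p') as $item_p){
    ## do something with $item_p
}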

To parse elements by class or id:

// Attribute selector: every <p> whose id is "channels"
foreach($this_html->find('p[id=channels]') as $channel){
    echo $channel->plaintext; // gives "ABC FAMILY", then "102 CBS"
}

Parse the remaining <p> elements the same way, depending on what you need from them.
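Parsing the remaining <p> elements usually means pairing each <p id='channels'> with the <p id='links'> that follows it, so each channel name gets its URL. A minimal sketch, again using PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so it runs without the library; the function name channelLinks() is hypothetical, and it assumes every channel paragraph is followed by its own links paragraph, as in the posted output.

```php
<?php
// Hypothetical helper: map each channel name to the URL in the next <p id='links'>.
function channelLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // the listing HTML is not well formed
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    $pairs = [];
    foreach ($xpath->query("//p[@id='channels']") as $channelP) {
        // First following sibling <p id='links'>, then the href of the <a> inside it
        $href = $xpath->query("following-sibling::p[@id='links'][1]//a/@href", $channelP)->item(0);
        $pairs[trim($channelP->textContent)] = $href ? $href->value : null;
    }
    return $pairs;
}
```

The XPath `following-sibling::...[1]` step keeps each channel tied to its own link even when the page contains several channels.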


Change the cURL code above to this. I copied it from a parser class I wrote some years ago, which is why it used $this->output; in a standalone function that has to be a plain local variable. It should read like this:

function useCurl($url, $source = null){
    $ch = curl_init();
    // Send a browser-like user agent string
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10 seconds
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);    // set Referer when following redirects
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
    $output = curl_exec($ch);                       // page HTML, or false on failure
    curl_close($ch);
    return $output;
}
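If you need to know why a page failed to load, a variant of useCurl() can report the cURL error and treat HTTP error responses as failures. This is a sketch under stated assumptions: the name fetchWithCurl(), the 30-second CURLOPT_TIMEOUT and the "status >= 400 means failure" rule are additions, not part of the function above.

```php
<?php
// Hypothetical variant of useCurl(): same options, plus error reporting and a status check.
// Returns the response body as a string, or false on any failure.
function fetchWithCurl(string $url, int $connectTimeout = 10)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
        CURLOPT_CONNECTTIMEOUT => $connectTimeout,
        CURLOPT_TIMEOUT        => 30,   // assumption: cap the whole transfer at 30 seconds
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP schemes such as file://
    if ($body === false) {
        error_log('cURL error for ' . $url . ': ' . curl_error($ch));
    }
    curl_close($ch);

    if ($body === false || $status >= 400) {
        return false; // connection failure or HTTP 4xx/5xx
    }
    return $body;
}
```

Because it returns false on failure, it can be passed directly as a fetch callback to code that skips pages it could not load.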