Parsing the tags to get the contents

Question

mark103 0 Newbie Poster

11 Years Ago

Hi all,

I'm working on a PHP script that generates XML output. I use simple_html_dom to parse the contents of my script called get-listing.php.

Here is my PHP script:

<?php
ini_set('max_execution_time', 300);
$errmsg_arr = array();
$errflag = false;
$link;
include ('simple_html_dom.php');

$base1 = "http://www.mysite.com/get-listing.php";
$html = file_get_html($base1);   

$countp = $html->find('p');     
header("Content-type: text/xml");
$xml = "<?xml version='1.0' encoding='UTF-8' ?>";
//echo $xml;
$xml .= '<tv generator-info-name="www.testbox.elementfx.com/xmltv">';
?>

Output for get-listing.php:

<p id='channels'>ABC FAMILY</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=ABC FAMILY&id=101'></p><a id="aTest" href="">Stream 1</a><br><br><p id='channels'>102 CBS</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=CBS&id=102'></p><a id="aTest" href="">Stream 1</a><br><br>

Here is the output for ABC-FAMILY:

<span id="time1">9:00 PM </span> - <span id="title1">17 Again</span><br><br><span id="time2">11:00 PM </span> - <span id="title2">The 700 Club</span><br><br>

Here is the output for CBS:

<span id="time1">9:00 PM </span> - <span id="title1">Unforgettable: Til Death</span><br><br><span id="time2">10:00 PM </span> - <span id="title2">Hawaii Five-0: Ho'i Hou</span><br><br>

I'm creating a variable so I can connect to the get-listing.php script. I want to write a loop that collects the list of URLs from the link tags in the get-listing.php output. I also want to get the contents of each page when I open each URL, namely the values inside the tags shown in the ABC FAMILY and CBS outputs above.

Can you please tell me how I can write the loop that gets the URL from each link tag and opens it with simple_html_dom, and how I can then get the contents I want from the ABC FAMILY and CBS outputs?

php


All 4 Replies

veedeoo 474 Junior Poster

11 Years Ago

WARNING! Parsing remote content without written permission from the owner can lead to a messy legal battle in court. Be prepared to spend millions of dollars if you are up against a big corporation. Just saying. Technology is pretty cool, but going beyond what we call responsible and ethical programming is an extremely dangerous practice.

One efficient way of doing this is to load the remote HTML file through cURL. Fetching a remote URL with file_get_html() requires the allow_url_fopen directive to be ON, which makes your server more vulnerable. With cURL, you can set this directive to OFF.

 function useCurl($url,$source=null){
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
        curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,10);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $this->output = curl_exec($ch);
        curl_close($ch);
        return $this->output;
        unset($this->output);
      }

We can then use simple html dom like this. Because useCurl() returns the page as a string, use str_get_html() rather than file_get_html():

$this_html = str_get_html(useCurl('my_url'));

foreach ($this_html->find('p') as $item_p) {
    // do something with $item_p
}

To parse anything by class or id, put the attribute in the selector:

foreach ($this_html->find('p[id=channels]') as $channel) {
    echo $channel->plaintext; // this gives us ABC FAMILY, then 102 CBS
}

Parse whatever else you need in the same way.
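For the loop the question asks about, here is a minimal sketch that works on the sample markup from the question. It swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without any extra files. The helper names loadXPath, extractLinks and extractSchedule are made up for illustration, and in real use the two strings would come from the cURL function rather than being pasted in.

```php
<?php
// Sample markup copied from the question; in practice it would be fetched remotely.
$listing = <<<'HTML'
<p id='channels'>ABC FAMILY</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=ABC FAMILY&id=101'></p><a id="aTest" href="">Stream 1</a><br><br><p id='channels'>102 CBS</p><p id='links'><a href='http://www.mysite.com/get-listing.php?channels=CBS&id=102'></p><a id="aTest" href="">Stream 1</a><br><br>
HTML;

$channelPage = <<<'HTML'
<span id="time1">9:00 PM </span> - <span id="title1">17 Again</span><br><br><span id="time2">11:00 PM </span> - <span id="title2">The 700 Club</span><br><br>
HTML;

// Hypothetical helper: load an HTML fragment and return an XPath object for it.
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // the sample markup is not valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($doc);
}

// Hypothetical helper: collect the href of every link inside <p id='links'>.
function extractLinks(string $html): array
{
    $links = [];
    foreach (loadXPath($html)->query("//p[@id='links']/a/@href") as $href) {
        $links[] = $href->nodeValue;
    }
    return $links;
}

// Hypothetical helper: pair each time span with the title span that shares its number.
function extractSchedule(string $html): array
{
    $xpath = loadXPath($html);
    $schedule = [];
    foreach ($xpath->query("//span[starts-with(@id, 'time')]") as $timeNode) {
        $n = substr($timeNode->getAttribute('id'), 4); // "time1" -> "1"
        $titleNode = $xpath->query("//span[@id='title{$n}']")->item(0);
        if ($titleNode !== null) {
            $schedule[] = [
                'time'  => trim($timeNode->textContent),
                'title' => trim($titleNode->textContent),
            ];
        }
    }
    return $schedule;
}

$links = extractLinks($listing);
$schedule = extractSchedule($channelPage);

print_r($links);
print_r($schedule);
```

To process every channel, loop over $links, fetch each URL, and pass the returned string to extractSchedule(). Note that the ABC FAMILY link in the sample contains a space, so it would need to be URL-encoded before being handed to cURL.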


mark103 0 Newbie Poster · Answer 1 · 2014-04-05T21:31:36+00:00


Does anyone know how?

Fernando_4 0 Junior Poster in Training · Answer 2 · 2014-04-06T14:41:03+00:00

I'm not sure how simple_html_dom.php works or whether it can read classes/ids, so that you could do something like $html->find('p id=links') or $html->find('p #links'). Can you post the find() function from that class so we can try to figure it out? If it's something you grabbed from the web, link us to the site, as it may have some documentation on that.

veedeoo 474 Junior Poster Featured Poster · Answer 3 · 2014-04-06T20:14:37+00:00

Change the cURL code above to this. I just copied it from the parser class I wrote some years ago. The $this->output references only work inside a class, so as a standalone function it should read like this:

function useCurl($url, $source = null) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch); // page body as a string, or false on failure
    curl_close($ch);
    return $output;
}
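Once the times and titles have been extracted, the XML that the original script starts building by string concatenation could be written with PHP's built-in XMLWriter, which takes care of escaping. This is only a sketch: buildTvXml is a hypothetical helper, and the programme, channel, start and title names are assumptions about the target format. The sample times carry no date, so a real feed would probably need a full timestamp in place of the raw time string.

```php
<?php
// Hypothetical helper: turn [['time' => ..., 'title' => ...], ...] into an XML string.
function buildTvXml(string $channel, array $schedule): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->startDocument('1.0', 'UTF-8');

    $writer->startElement('tv');
    $writer->writeAttribute('generator-info-name', 'www.testbox.elementfx.com/xmltv');

    foreach ($schedule as $entry) {
        // Element and attribute names below are assumptions, not taken from the thread.
        $writer->startElement('programme');
        $writer->writeAttribute('channel', $channel);
        $writer->writeAttribute('start', $entry['time']);
        $writer->writeElement('title', $entry['title']); // content is escaped automatically
        $writer->endElement(); // </programme>
    }

    $writer->endElement(); // </tv>
    $writer->endDocument();

    return $writer->outputMemory();
}

$xml = buildTvXml('CBS', [
    ['time' => '9:00 PM',  'title' => 'Unforgettable: Til Death'],
    ['time' => '10:00 PM', 'title' => "Hawaii Five-0: Ho'i Hou"],
]);

// When serving this from a script, send header('Content-Type: text/xml') first.
echo $xml;
```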
