How to fetch a website, then process it?

Question

OsaMasw 13 Loving Helper

13 Years Ago

Hello guys,
I have a problem with my PHP code and I don't know what is wrong.

I want to get the content of a website, search it for specific tags, and take the values of those tags. For example, say I want to search for these tags:

<div class="text platinum">15Platinum</div>
<div class="text gold">64 Gold</div>
<div class="text silver">178 Silver</div>
<div class="text bronze">637 Bronze</div>

and get the number from each one.

I wrote a simple PHP script,
and every time I run it I get this error:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Input is not proper UTF-8, indicate encoding ! in Entity, line: 22 in C:\AppServ\www\datap\test.php on line 21

Fatal error: Call to a member function getElementsByTagName() on a non-object in C:\AppServ\www\datap\test.php on line 30

What should I do?

<?php


$url = 'http://us.playstation.com/playstation/psn/profiles/hawkiq';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch); 
curl_close ($ch);

$pregArray   = array('/Bronze/', '/Silver/', '/Gold/', '/Platinum/', '/text/','/\%/', '/\s/');
/* The name of the div classes/ids that we need to get data from */
$valuesToLoad = array('text', 'leveltext', 'progresstext', 'text bronze', 'text silver', 'text gold', 'text platinum');
														
 /* Set up our DOM and load the HTML */
$dom = new DOMDocument();
$dom->recover = true;
$dom->strictErrorChecking = false;
$dom->loadHTML($result);
$channel=$dom->getElementsByTagName('text platinum')->item(0);
// just to test if there is output :(
echo $channel ;


__parse();

function __parse() {
    foreach ($dom->getElementsByTagName('div') as $element) {
        foreach ($element->attributes as $key => $node) {
            foreach ($valuesToLoad as $value) {
                if ($element->getAttribute($key) == $value) {
                    $varName = $value == "text" ? "total_trophies" : preg_replace($pregArray, '', $value);
                    $varName = preg_replace($pregArray, '', $element->nodeValue);
                }
            }
        }
    }
}
?>

html-css php

Edited 13 Years Ago by OsaMasw because: n/a

All 14 Replies

pritaeas 2,211 ¯\_(ツ)_/¯

13 Years Ago

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Input is not proper UTF-8, indicate encoding ! in Entity, line: 22 in C:\AppServ\www\datap\test.php on line 21

As the error states, you need to set the correct encoding.

pritaeas 2,211 ¯\_(ツ)_/¯

13 Years Ago

That's what you get from ignoring the error.
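
Judging only from the code posted in the question, the fatal error may also have a cause that is independent of the encoding warning. In PHP a function does not see variables defined outside it, so `$dom` inside `__parse()` is undefined there, and `getElementsByTagName()` expects a tag name such as `div`, not a class value such as `text platinum`. A minimal sketch of one way around both, assuming the markup shown in the question; `extractByClass()` and `$sampleHtml` are hypothetical names, not part of the original code:

```php
<?php
// Hypothetical helper: the document is passed in as a parameter
// instead of being read from the global scope.
function extractByClass(DOMDocument $dom, array $classes): array
{
    $xpath = new DOMXPath($dom);
    $found = [];
    foreach ($classes as $class) {
        // Match divs whose class attribute is exactly $class.
        // (Assumes $class contains no double quotes.)
        $nodes = $xpath->query('//div[@class="' . $class . '"]');
        if ($nodes !== false && $nodes->length > 0) {
            $found[$class] = trim($nodes->item(0)->textContent);
        }
    }
    return $found;
}

// Sample markup modelled on the question, standing in for the fetched page.
$sampleHtml = '<html><body>'
    . '<div class="text platinum">15 Platinum</div>'
    . '<div class="text gold">64 Gold</div>'
    . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
libxml_clear_errors();

$trophies = extractByClass($dom, ['text platinum', 'text gold']);
print_r($trophies);
```

Passing `$dom` as an argument keeps the function usable with any document; declaring the variables `global` inside `__parse()` would be another option.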

pritaeas 2,211 ¯\_(ツ)_/¯

13 Years Ago

You already used a regex to remove data from the page, why not use a regex to extract the information you want:

<div class="text gold">(.*?)</div>

Edited 13 Years Ago by pritaeas because: n/a
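
A minimal sketch of the regex approach suggested above, run against the sample markup from this thread rather than the live page. Narrowing the capture to the leading digits and listing the four trophy classes in one pattern are assumptions added for illustration:

```php
<?php
// Sample markup from the thread, standing in for the fetched page.
$html = '<div class="text platinum">15 Platinum</div>' . "\n"
      . '<div class="text gold">66 Gold</div>' . "\n"
      . '<div class="text silver">179 Silver</div>' . "\n"
      . '<div class="text bronze">639 Bronze</div>';

// Capture the trophy type from the class and the leading number from the text.
$pattern = '~<div class="text (platinum|gold|silver|bronze)">\s*(\d+).*?</div>~s';

$counts = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $counts[$m[1]] = (int) $m[2];   // e.g. 'gold' => 66
    }
}
print_r($counts);
```

A pattern like this depends on the exact attribute order and quoting in the page, so it can stop matching if the site changes its markup.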

pritaeas 2,211 ¯\_(ツ)_/¯

13 Years Ago

Use strip_tags on your output.

OsaMasw commented: you awesome +1
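
As a small illustration of that suggestion, here is `strip_tags()` applied to the `td.tlevel` output quoted in this thread. The variable names are hypothetical, and the extra `trim()` is an assumption to drop the space left behind by the removed tag:

```php
<?php
// The innertext value quoted in the thread for a td.tlevel cell.
$inner = '<img src="hawkiq1_files/platinum_l_002.png" width="20" height="20" /> 17';

// strip_tags() removes the <img> element; trim() drops the leftover space.
$value = trim(strip_tags($inner));

echo $value;
```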

OsaMasw 13 Loving Helper · Answer 1 · 2012-01-13T16:26:33+00:00

Let's say I ignore the warning above; this error still appears:

Fatal error: Call to a member function getElementsByTagName() on a non-object in C:\AppServ\www\datap\test.php on line 30

OsaMasw 13 Loving Helper · Answer 2 · 2012-01-14T14:12:23+00:00

Could anyone help me? :(

Just tell me how to do it.
I need to fetch this web page:
http://us.playstation.com/playstation/psn/profiles/hawkiq

then search for these tags:

<div id="leveltext"> 12</div>
<div id="text">899 </div>
<div class="progresstext"> 94% </div>
<div class="text platinum">15 Platinum</div>
<div class="text gold">66 Gold</div>
<div class="text silver">179 Silver</div>
<div class="text bronze">639 Bronze</div>
and put the text between the tags into a new array.

Every time I tried, I got these errors:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: error parsing attribute name in Entity, line: 19 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : sup in Entity, line: 299 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 325 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 325 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : script in Entity, line: 357 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : div in Entity, line: 391 in C:\AppServ\www\datap\test.php on line 56

pritaeas 2,211 ¯\_(ツ)_/¯ Moderator Featured Poster · Answer 3 · 2012-01-14T16:44:33+00:00

Find out what encoding that file has, and set it before loading.
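
One way this could look, as a sketch under stated assumptions: the bytes are converted to UTF-8 with `iconv()`, and a `<meta>` charset tag is prepended so the HTML parser reads them as UTF-8. The helper name `loadHtmlWithCharset()` is hypothetical, and ISO-8859-1 is only an assumed source charset for the demo; the real page's charset has to be checked first:

```php
<?php
// Hypothetical helper: convert raw bytes from a known charset to UTF-8,
// then load them into a DOMDocument.
function loadHtmlWithCharset(string $raw, string $charset): DOMDocument
{
    $utf8 = iconv($charset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert input from $charset");
    }

    libxml_use_internal_errors(true);   // keep markup warnings out of the output
    $dom = new DOMDocument();
    // The prepended <meta> tag declares the input as UTF-8 for the parser.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $utf8
    );
    libxml_clear_errors();

    return $dom;
}

// "\xE9" is "é" in ISO-8859-1; this charset is an assumption for the demo.
$raw = "<html><body><div id=\"text\">caf\xE9</div></body></html>";
$dom = loadHtmlWithCharset($raw, 'ISO-8859-1');
$text = $dom->getElementsByTagName('div')->item(0)->textContent;
echo $text;
```

`libxml_use_internal_errors(true)` also silences the "Unexpected end tag" style warnings that real-world pages tend to trigger; they are still available through `libxml_get_errors()` if needed.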

OsaMasw 13 Loving Helper · Answer 4 · 2012-01-14T19:55:10+00:00

I solved the encoding part by using this:

$html1= curl_getinfo($ch);  
curl_close ($ch);
echo $html;
echo "<br><br><br><br>";
//try to get page encoding as it was sent from server
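// only read the charset when the header actually contains one
if (!empty($html1['content_type']) && strpos($html1['content_type'], 'charset=') !== false) {
    $arr = explode('charset=', $html1['content_type']);
    $csethdr = strtolower(trim($arr[1]));
} else {
    $csethdr = false;
}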

$cset= false;
$arr= array();

//This has to replace page meta tags for charset with utf-8, but it doesn't actually help(see the bug info).
if (preg_match_all(
    '/(<meta\s*http-equiv="Content-Type"\s*content="[^;]*;\s*charset=([^"]*?)(?:"|\;)[^>]*>)/',
    $html, $arr, PREG_PATTERN_ORDER)) {
    $cset= strtolower(trim($arr[2][0]));
    if ($cset!='utf-8'||$cset!=$csethdr){
        $new= str_replace($arr[2][0],'utf-8',$arr[1][0]);
        $html= str_replace($arr[1][0],$new,$html);
        $cset= $csethdr;
    } else {
        $cset= false;
    }

    if ($cset=='utf-8'){
        $cset= false;
    }
}
unset($arr);
if ($cset){
    $html= iconv($cset,'utf-8',$html);
}
unset($cset);

//solve dom bug
$html=preg_replace('/<head[^>]*>/','<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
',$html);

Now I just need to fetch the required information.
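
The header-parsing step in the snippet above can be sketched as a small standalone function. `charsetFromContentType()` is a hypothetical name; in a real script the header value could come from `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)`, called before `curl_close()`:

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value,
// returning null when the header does not name one.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

$a = charsetFromContentType('text/html; charset=ISO-8859-1');
$b = charsetFromContentType('text/html');
var_dump($a, $b);
```

Returning `null` for a missing charset makes the "no information" case explicit, so the caller can fall back to a `<meta>` tag or a default.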

OsaMasw 13 Loving Helper · Answer 5 · 2012-01-18T00:23:36+00:00

Can anyone help?

pritaeas 2,211 ¯\_(ツ)_/¯ Moderator Featured Poster · Answer 6 · 2012-01-18T14:46:16+00:00

Since you switched to regex for your problem, why not write a regex to extract your information?

pzuurveen 90 Posting Whiz in Training · Answer 7 · 2012-01-18T19:32:33+00:00

What are you trying to do?
Hack user profiles from playstation.com?

OsaMasw 13 Loving Helper · Answer 8 · 2012-01-21T14:34:07+00:00

No, it's my own user profile on PlayStation.com; I need to extract the information from it to put on my website.
Sorry pritaeas, I didn't understand what you are trying to explain.

OsaMasw 13 Loving Helper · Answer 9 · 2012-01-22T01:52:21+00:00

And that's the point I didn't understand :P
Sorry, I am a beginner in PHP programming.
I solved all my problems by using the simple_html_dom class.
It's really simple and suitable for beginners like me:

include_once('simple_html_dom.php');
$id='hawkiq';
$url = 'http://us.playstation.com/playstation/psn/profiles/'.$id;
$html = file_get_html($url);
$platinum = $html->find('div[class="text platinum"]',0);
echo $platinum;

The output is something like "17", and that's what I need. But for some <td> tags
I used this code:

foreach($html->find('td.tlevel') as $e)
    $s= $e->innertext;

The output looks like this:
"<img src="hawkiq1_files/platinum_l_002.png" width="20" height="20" /> 17"

I don't need the img in my output; I just need the "17".
Other tags have spaces or commas, so I can use explode to separate them, but for this one I don't know how to do it.
Just solve this problem and this question will be closed :)

OsaMasw 13 Loving Helper · Answer 10 · 2012-01-24T21:13:33+00:00

Thanks for the help, sir.
