0

Hello guys,
I have a problem with my PHP code and I don't know what is wrong.

I want to get the content of a website, search it for specific tags, and take the values of those tags. For example, say I want to search for these tags:

<div class="text platinum">15 Platinum</div>
<div class="text gold">64 Gold</div>
<div class="text silver">178 Silver</div>
<div class="text bronze">637 Bronze</div>

and get the number in each one (15, 64, 178 and 637 here).

I wrote a simple PHP script, and every time I run it I get this error:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Input is not proper UTF-8, indicate encoding ! in Entity, line: 22 in C:\AppServ\www\datap\test.php on line 21

Fatal error: Call to a member function getElementsByTagName() on a non-object in C:\AppServ\www\datap\test.php on line 30

What should I do?

<?php

$url = 'http://us.playstation.com/playstation/psn/profiles/hawkiq';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15"));
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);

$pregArray = array('/Bronze/', '/Silver/', '/Gold/', '/Platinum/', '/text/', '/\%/', '/\s/');
/* The names of the div classes/ids that we need to get data from */
$valuesToLoad = array('text', 'leveltext', 'progresstext', 'text bronze', 'text silver', 'text gold', 'text platinum');

/* Set up our DOM and load the HTML */
$dom = new DOMDocument();
$dom->recover = true;
$dom->strictErrorChecking = false;
$dom->loadHTML($result);
$channel = $dom->getElementsByTagName('text platinum')->item(0);
// just to test if there is output :(
echo $channel;

__parse();

function __parse() {
    foreach ($dom->getElementsByTagName('div') as $element) {
        foreach ($element->attributes as $key => $node) {
            foreach ($valuesToLoad as $value) {
                if ($element->getAttribute($key) == $value) {
                    $varName = $value == "text" ? "total_trophies" : preg_replace($pregArray, '', $value);
                    $varName = preg_replace($pregArray, '', $element->nodeValue);
                }
            }
        }
    }
}
?>

Featured Replies

  • That's what you get from ignoring the error.

  • You already used a regex to remove data from the page, why not use a regex to extract the information you want: <div class="text gold">(.*?)</div>

  • Use strip_tags (http://php.net/manual/en/function.strip-tags.php) on your output.

0

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Input is not proper UTF-8, indicate encoding ! in Entity, line: 22 in C:\AppServ\www\datap\test.php on line 21

As the error states, you need to set the correct encoding.
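
A minimal sketch of one way to set the encoding, in case it helps. It assumes the page is actually served as ISO-8859-1 (a guess; the thread does not say which charset the site uses) and uses a small sample string instead of the real page. The bytes are converted to UTF-8 with iconv(), and an XML declaration is prepended so that loadHTML() treats the input as UTF-8.

```php
<?php
// Sample markup standing in for the fetched page.
// "\xE9" is the ISO-8859-1 byte for "é", which is not valid UTF-8 on its own.
$raw = "<html><body><div class=\"text gold\">64 Gold caf\xE9</div></body></html>";

// Assumption: the source charset is ISO-8859-1. Use the real charset if it differs.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $raw);

$dom = new DOMDocument();
// Prefixing an XML declaration is a common way to tell loadHTML() the input is UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $utf8);

$div = $dom->getElementsByTagName('div')->item(0);
echo $div->nodeValue, "\n";
```

The charset can usually be read from the Content-Type response header or from the page's meta tag rather than hard-coded.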

0

Let's say I ignore the warning above. This error still appears:

Fatal error: Call to a member function getElementsByTagName() on a non-object in C:\AppServ\www\datap\test.php on line 30
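
Judging only from the code posted above, one likely cause of this fatal error is variable scope: a PHP function cannot see variables defined outside it, so $dom inside __parse() is undefined and getElementsByTagName() is called on a non-object. A sketch of passing the document in as an argument instead (collectDivText and the sample markup are made up for this example):

```php
<?php
// Hypothetical helper: the document and the wanted class names are passed in,
// because variables from the calling script are not visible inside a function.
function collectDivText(DOMDocument $dom, array $wanted)
{
    $found = array();
    foreach ($dom->getElementsByTagName('div') as $div) {
        $class = $div->getAttribute('class');
        if (in_array($class, $wanted, true)) {
            $found[$class] = trim($div->nodeValue);
        }
    }
    return $found;
}

$dom = new DOMDocument();
// Sample markup standing in for the fetched page.
$dom->loadHTML('<div class="text gold">64 Gold</div><div class="text silver">178 Silver</div>');

$trophies = collectDivText($dom, array('text gold', 'text silver'));
print_r($trophies);
```

Note also that getElementsByTagName() expects a tag name such as 'div'; a class name like 'text platinum' will not match anything.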

0

Could anyone help me? :(

Just tell me how to do it.
I need to fetch this web page:
http://us.playstation.com/playstation/psn/profiles/hawkiq

then search for these tags:

<div id="leveltext"> 12</div>
<div id="text">899 </div>
<div class="progresstext"> 94% </div>
<div class="text platinum">15 Platinum</div>
<div class="text gold">66 Gold</div>
<div class="text silver">179 Silver</div>
<div class="text bronze">639 Bronze</div>
and put the text between the tags into a new array.

Every time I try, I get these errors:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: error parsing attribute name in Entity, line: 19 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : sup in Entity, line: 299 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 325 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 325 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : script in Entity, line: 357 in C:\AppServ\www\datap\test.php on line 56

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : div in Entity, line: 391 in C:\AppServ\www\datap\test.php on line 56
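
A rough sketch of the fetch-and-collect step described above, using DOMXPath. The sample string is copied from the tags listed in this post, and the array keys (level, total and so on) are names chosen only for this example. libxml_use_internal_errors(true) keeps warnings about malformed markup, like the ones above, from being printed.

```php
<?php
// Sample markup copied from the tags listed above; the real page is assumed to contain the same divs.
$html = '<div id="leveltext"> 12</div>
<div id="text">899 </div>
<div class="progresstext"> 94% </div>
<div class="text platinum">15 Platinum</div>
<div class="text gold">66 Gold</div>
<div class="text silver">179 Silver</div>
<div class="text bronze">639 Bronze</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hypothetical key names mapped to the XPath query for each div.
$queries = array(
    'level'    => '//div[@id="leveltext"]',
    'total'    => '//div[@id="text"]',
    'progress' => '//div[@class="progresstext"]',
    'platinum' => '//div[@class="text platinum"]',
    'gold'     => '//div[@class="text gold"]',
    'silver'   => '//div[@class="text silver"]',
    'bronze'   => '//div[@class="text bronze"]',
);

$stats = array();
foreach ($queries as $name => $query) {
    $node = $xpath->query($query)->item(0);
    if ($node !== null) {
        // Keep only the digits of the text, dropping labels, "%" and spaces.
        $stats[$name] = (int) preg_replace('/\D/', '', $node->nodeValue);
    }
}
print_r($stats);
```

With the real page, $html would be the string returned by curl_exec().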
0

I solved the encoding part by using this:

$html1= curl_getinfo($ch);  
curl_close ($ch);
echo $html;
echo "<br><br><br><br>";
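//try to get the page encoding as it was sent from the server
//(only when the header actually contains a charset)
if (!empty($html1['content_type']) && strpos($html1['content_type'], 'charset=') !== false){
    $arr= explode('charset=',$html1['content_type']);
    $csethdr= strtolower(trim($arr[1]));
} else {
    $csethdr= false;
}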

$cset= false;
$arr= array();

//This has to replace page meta tags for charset with utf-8, but it doesn't actually help(see the bug info).
if (preg_match_all(
'/(<meta\s*http-equiv="Content-Type"\s*content="[^;]*;\s*charset=([^"]*?)(?:"|\;)[^>]*>)/'
,$html,$arr,PREG_PATTERN_ORDER)){
    $cset= strtolower(trim($arr[2][0]));
    if ($cset!='utf-8'||$cset!=$csethdr){
        $new= str_replace($arr[2][0],'utf-8',$arr[1][0]);
        $html= str_replace($arr[1][0],$new,$html);
        $cset= $csethdr;
    } else {
        $cset= false;
    }

    if ($cset=='utf-8'){
        $cset= false;
    }
}
unset($arr);
if ($cset){
    $html= iconv($cset,'utf-8',$html);
}
unset($cset);

//solve dom bug
$html=preg_replace('/<head[^>]*>/','<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
',$html);

Now I just need to fetch the required information.

0

No, it's my user profile on PlayStation.com. I need to extract the information from it to put on my website.
Sorry pritaeas, I didn't understand what you are trying to explain.

1

You already used a regex to remove data from the page, why not use a regex to extract the information you want:

<div class="text gold">(.*?)</div>
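
To illustrate the regex idea, here is a small sketch with preg_match_all(). The sample string reuses the trophy divs quoted earlier in the thread, and the pattern is a slightly narrower variant of the one above that captures only the leading number; it assumes the markup looks exactly like those lines.

```php
<?php
// Sample markup taken from the divs quoted earlier in the thread.
$html = '<div class="text platinum">15 Platinum</div>
<div class="text gold">66 Gold</div>
<div class="text silver">179 Silver</div>
<div class="text bronze">639 Bronze</div>';

$counts = array();
// Capture the trophy type from the class and the number at the start of the div text.
$pattern = '#<div class="text (platinum|gold|silver|bronze)">\s*(\d+)#';
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $counts[$match[1]] = (int) $match[2];
    }
}
print_r($counts);
```

A regex like this breaks easily if the site changes its markup, so a DOM parser is usually the sturdier choice.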


0

And that's the point I didn't understand :P
Sorry, I'm a beginner in PHP programming.
I solved all my problems by using the simple_html_dom class.
It's really simple and suitable for beginners like me.

include_once('simple_html_dom.php');
$id='hawkiq';
$url = 'http://us.playstation.com/playstation/psn/profiles/'.$id;
$html = file_get_html($url);
$platinum = $html->find('div[class="text platinum"]',0);
echo $platinum;

The output is something like "17", and that's what I need. But for some <td> tags
I used this code:

foreach($html->find('td.tlevel') as $e)
    $s= $e->innertext;

The output will be like this:
"<img src="hawkiq1_files/platinum_l_002.png" width="20" height="20" /> 17"

I don't need the img in my output, I just need the "17".
Other tags have spaces or commas, so I can use explode to separate them, but for this one I don't know how to do it.
Just solve this problem and this question will be closed :)
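
For the img problem, a minimal sketch using strip_tags(), as suggested in the featured replies. The input string is the innertext value quoted above, used here as sample data.

```php
<?php
// The innertext value quoted above, used as sample input.
$inner = '<img src="hawkiq1_files/platinum_l_002.png" width="20" height="20" /> 17';

// strip_tags() removes the <img> tag; trim() drops the leftover space.
$value = trim(strip_tags($inner));
echo $value, "\n";
```

In the loop above this would be something like trim(strip_tags($e->innertext)). simple_html_dom also documents a plaintext property on elements, which may do the same job.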
