get_meta_tags is not working properly

Question

rrlogu 0 Newbie Poster

12 Years Ago

I am trying to get the meta tag information from the URL below, but it's not working: all I get is an empty screen. Any help is highly appreciated.

$tags = get_meta_tags('http://watch32.com/movies-online/return-to-the-blue-lagoon-2766');

echo $tags['author'];    
echo $tags['keywords'];    
echo $tags['description']; 
echo $tags['geo_position'];

perl php


veedeoo 474 Junior Poster

12 Years Ago

Hi,

For get_meta_tags() to read a remote URL, allow_url_fopen has to be enabled in php.ini. (allow_url_include is a separate setting that controls include and require; turning that one on is a big security hole, so leave it off.)
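As a quick diagnostic, the sketch below checks the setting that governs remote reads and shows what get_meta_tags() hands back. It runs against a temporary local file rather than the page from the question, so it needs no network access; the sample markup is hypothetical.

```php
<?php
// Hypothetical sample markup, written to a temp file so no network is needed.
$sample = '<html><head>'
    . '<meta name="Keywords" content="movies, drama">'
    . '<meta name="description" content="A sample page">'
    . '<meta name="geo.position" content="49.33;-86.59">'
    . '</head><body></body></html>';

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $sample);

// Remote URLs are only readable when allow_url_fopen is enabled.
$remoteAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
echo 'allow_url_fopen: ' . ($remoteAllowed ? 'on' : 'off') . PHP_EOL;

// get_meta_tags() returns false on failure, so check before indexing.
$tags = get_meta_tags($path);
unlink($path);

if ($tags === false) {
    echo 'Could not read the file' . PHP_EOL;
} else {
    // Keys are lowercased, and characters such as "." become "_".
    foreach (['author', 'keywords', 'description', 'geo_position'] as $name) {
        echo $name . ': ' . ($tags[$name] ?? '(not present)') . PHP_EOL;
    }
}
```

Using `??` for each key avoids "undefined index" notices when a page simply does not define a tag, which is one way a script can end up printing nothing useful.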

A better option is probably the Simple HTML DOM parser combined with cURL.

Looking at your code above and at the source of your target URL, these tags do not exist anywhere in the page:

echo $tags['author'];
echo $tags['geo_position'];

So your code should be modified to this:

// Include the Simple HTML DOM file
// (assuming it is located in the dom directory)
include_once('dom/simple_html_dom.php');

// A simple cURL helper: returns the page body, or false on failure
function useCurl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

// Define your URL variable
$url = 'http://watch32.com/movies-online/return-to-the-blue-lagoon-2766';

// Call the function above and assign the output to $html_file
$html_file = useCurl($url);

// Parse the markup with str_get_html() from Simple HTML DOM
$html = $html_file ? str_get_html($html_file) : false;

// Stop early if the fetch or the parse failed
if (!$html) {
    die('Could not fetch or parse ' . $url);
}

// Iterate through the parsed data: every meta tag with a content attribute
foreach($html->find('meta') as $item){
    echo $item->content.'<br/>';
}

If done properly, the script above should output:

text/html; charset=utf-8
index, follow
Return To The Blue Lagoon, return to the blue lagoon, Return To The Blue Lagoon (1991), movies, watch movies, watch movies online free, download movies, watch free movies online, watch movies online, stream movies, watch free movies, stream movies online, streaming movies online, free online movies, new movies, drama movies
Watch Return To The Blue Lagoon Online | return to the blue lagoon | Return To The Blue Lagoon (1991) | Director: William A. Graham | Cast: Milla Jovovich, Brian Krause, Lisa Pelikan, Courtney Barilla, Garette Ratliff Henson, Emma James, Jackson Barton, Nana Coburn, Brian Blain, Peter Hehir, Alexander Petersons
100002973380420
185768114834507

I purposely let the script parse all of the meta tags. Your job now is to make the script ignore these items, which is pretty simple to do:

text/html; charset=utf-8
index, follow

and these:

100002973380420
185768114834507
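One way to do that filtering is sketched here with PHP's built-in DOM extension rather than Simple HTML DOM (a swapped-in technique, so the sketch runs without the external library). The helper name `pickMetaTags`, the whitelist, and the sample markup are all assumptions made for illustration.

```php
<?php
/**
 * Return name => content for the <meta> tags whose name is in $wanted.
 * Hypothetical helper; uses the built-in DOM extension.
 */
function pickMetaTags(string $html, array $wanted): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        // Tags without a wanted name (http-equiv, robots, property, ...) are skipped.
        if ($name !== '' && in_array($name, $wanted, true)) {
            $found[$name] = $meta->getAttribute('content');
        }
    }
    return $found;
}

// Hypothetical markup standing in for the fetched page.
$html = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '<meta name="robots" content="index, follow">'
    . '<meta name="keywords" content="movies, drama movies">'
    . '<meta name="description" content="Watch a sample movie">'
    . '<meta property="example:id" content="12345">'
    . '</head><body></body></html>';

$tags = pickMetaTags($html, ['keywords', 'description']);
foreach ($tags as $name => $content) {
    echo $name . ': ' . $content . '<br/>';
}
```

A whitelist of names tends to be more robust than trying to blacklist the unwanted values, since other pages may carry different extra tags.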

veedeoo 474 Junior Poster

12 Years Ago

Cool :).

To parse this

Return To The Blue Lagoon, return to the blue lagoon, Return To The Blue Lagoon (1991), movies, watch movies, watch movies online free, download movies, watch free movies online, watch movies online, stream movies, watch free movies, stream movies online, streaming movies online, free online movies, new movies, drama movies

replace the foreach loop above with this:

foreach($html->find('meta[name=keywords]') as $element){
    echo $element->content.'<br/>';
}

And to parse this

Watch Return To The Blue Lagoon Online | return to the blue lagoon | Return To The Blue Lagoon (1991) | Director: William A. Graham | Cast: Milla Jovovich, Brian Krause, Lisa Pelikan, Courtney Barilla, Garette Ratliff Henson, Emma James, Jackson Barton, Nana Coburn, Brian Blain, Peter Hehir, Alexander Petersons

you need to add this:

foreach($html->find('meta[name=description]') as $desc){
    echo $desc->content.'<br/>';
}

veedeoo 474 Junior Poster

12 Years Ago

Here is another hint. If you want the movie details, change the code to this:

foreach($html->find('meta[name=description]') as $desc){
    // The description is a "|"-separated string; split it into its fields
    $movie_detail = explode("|", $desc->content);
    echo 'Title : '.trim($movie_detail[2]).'<br/>';
    echo trim($movie_detail[3]).'<br/>';

    // The fifth field holds the comma-separated cast list
    $casts = explode(",", $movie_detail[4]);
    echo 'Casts<br/>';
    foreach($casts as $actor){
        echo trim($actor).'<br/>';
    }
}

The code above should produce output similar to this:

Title : Return To The Blue Lagoon (1991)
Director: William A. Graham
Casts
Cast: Milla Jovovich
Brian Krause
Lisa Pelikan
Courtney Barilla
Garette Ratliff Henson
Emma James
Jackson Barton
Nana Coburn
Brian Blain
Peter Hehir
Alexander Petersons
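Note that the first cast entry still carries its "Cast:" label. A minimal sketch of a tidier split follows, assuming the description always uses the same "|"-separated field order as the sample above; the helper name `parseMovieDescription` is hypothetical, and it works on a plain string so it does not depend on Simple HTML DOM.

```php
<?php
/**
 * Split a "A | B | Title | Director: X | Cast: Y, Z" description into parts.
 * Hypothetical helper; the field positions are taken from the sample output above.
 */
function parseMovieDescription(string $description): array
{
    $parts = array_map('trim', explode('|', $description));

    // Strip a leading label such as "Director:" or "Cast:" if it is present.
    $stripLabel = function (string $text, string $label): string {
        return trim(preg_replace('/^' . preg_quote($label, '/') . '\s*:\s*/i', '', $text));
    };

    $cast = $stripLabel($parts[4] ?? '', 'Cast');

    return [
        'title'    => $parts[2] ?? '',
        'director' => $stripLabel($parts[3] ?? '', 'Director'),
        'cast'     => $cast === '' ? [] : array_map('trim', explode(',', $cast)),
    ];
}

// Shortened version of the description string shown earlier in the thread.
$description = 'Watch Return To The Blue Lagoon Online | return to the blue lagoon | '
    . 'Return To The Blue Lagoon (1991) | Director: William A. Graham | '
    . 'Cast: Milla Jovovich, Brian Krause, Lisa Pelikan';

$movie = parseMovieDescription($description);
echo 'Title: ' . $movie['title'] . '<br/>';
echo 'Director: ' . $movie['director'] . '<br/>';
echo 'Cast: ' . implode(', ', $movie['cast']) . '<br/>';
```

The `??` fallbacks mean a description with fewer fields yields empty values instead of "undefined offset" notices.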

I hope this helps. I am assuming in good faith that you were given permission by the site owner; otherwise it is not right to parse another site's content, as a matter of basic ethics.


rrlogu 0 Newbie Poster · Answer 1 · 2013-06-11T02:04:47+00:00

Thank you so much, Veedeoo, I'm getting exactly the output I need!

rrlogu 0 Newbie Poster · Answer 2 · 2013-06-25T23:25:09+00:00

Thanks, Veedeoo! I'm trying to get the image with find('img'), but it's giving me 157 images. Any tips for getting the exact image?

veedeoo 474 Junior Poster Featured Poster · Answer 3 · 2013-06-26T06:28:57+00:00

For the first image of the movie, you can do it like this:

foreach($html->find('div[class=box_img]') as $images){
    foreach($images->find('img') as $image){
        echo '<img src="'.$image->src.'"/><br/>';
    }
}

If you want all of the images below the main movie image, you can parse them the same way:

foreach($html->find('div[class=box_des]') as $thumbs){
    foreach($thumbs->find('img') as $thumb){
        echo '<img src="'.$thumb->src.'"/><br/>';
    }
}
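The same "images inside a given container" idea can be sketched with PHP's built-in DOM and XPath classes instead of Simple HTML DOM (a swapped-in technique). The helper name `imagesInClass` and the sample markup are hypothetical; only the class names `box_img` and `box_des` come from this thread.

```php
<?php
/**
 * Return the src of every <img> inside elements carrying the given class.
 * Hypothetical helper; $class is assumed to be a plain class name without quotes.
 */
function imagesInClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word, so "box_img" does not match "box_img_small".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]//img',
        $class
    );

    $sources = [];
    foreach ((new DOMXPath($doc))->query($query) as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Hypothetical markup; the class names follow the ones used in this thread.
$html = '<html><body>'
    . '<div class="box_img"><img src="/img/poster.jpg"></div>'
    . '<div class="box_des"><img src="/img/t1.jpg"><img src="/img/t2.jpg"></div>'
    . '<img src="/img/banner.jpg">'
    . '</body></html>';

foreach (imagesInClass($html, 'box_img') as $src) {
    echo '<img src="' . htmlspecialchars($src) . '"/><br/>';
}
```

Scoping the query to a container is what cuts the result down from every image on the page to just the ones of interest.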

One thing I cannot share is how to programmatically take out their player. That is against the rules. However, if they do allow embedding, then you can parse the embed code, but never the player code.

rrlogu 0 Newbie Poster · Answer 4 · 2013-06-27T00:25:13+00:00

It's working, but not for all URLs, e.g. http://en.wikipedia.org/wiki/Iron_Man_3

veedeoo 474 Junior Poster Featured Poster · Answer 5 · 2013-06-27T01:27:13+00:00

That would need a different approach: the Wikipedia API. I wrote a script for it before, but I am currently busy in the middle of developing a Python application, so I don't have time to look for it in my PHP files.

It can be done, but Wikipedia is extremely strict in its content reuse policy. Sorry, but I cannot do that.

To conform with the TOS, you will have to download the image and store it on your server.

rrlogu 0 Newbie Poster · Answer 6 · 2013-06-27T10:51:17+00:00

Actually, I'm trying to make my code work for any URL. Thanks, Veedeoo, for all your support.
