I need some help. I've got this script which scrapes the IMDb Top 250 movies list. What I'm trying to do is add a search link next to the year.

I've got it partially working with str_replace, but it only adds the link to the first movie. See here. (Pay attention to the actual URL of the search links.)

So how would I make it add links to all the movies correctly? I was thinking of preg_replace because I could use regex, but I have no idea how to use regex :confused:

Please help. Thanks :D Here's my script...

function get_inner_string($a,$b,$c) {
  $y = explode($b,$a); 
  $x = explode($c,$y[1]); 
  return $x[0]; 
}
//Get Page 
$file = 'http://www.imdb.com/chart/top'; 

//Open Page 
$open_file = file_get_contents($file); 

//Find the list 
$find_ad = get_inner_string($open_file, '<i>For this top 250, only votes from regular voters are considered.</i>', 'The formula for calculating the Top Rated 250 Titles gives a <b>true Bayesian estimate</b>:'); 

//Add http://www.imdb.com/ to the URL's 
$new_page = str_replace('a href="/title/', 'a href="http://www.imdb.com/title/', $find_ad); 

//Find movie name 
$find_movie = get_inner_string($new_page, '/">', '</a>');

//Search URL
$search_url = '<a href="http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query=' . $find_movie . '"> Search </a>';

$replace_search = str_replace(')</font>', ') - ' . $search_url . '</font>', $new_page); 

echo $replace_search; 


First, after $find_ad has been set, I would split the data with $list = explode("</tr>", $find_ad), then do a replacement on each item like so:

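// '#' is the delimiter because the pattern itself contains '/'; [^<>]+ captures the whole title
$list = preg_replace('#([^<>]+)(</a> \(\d{4}\))#', '\1\2 - <a href="http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query=\1"> Search </a>', $list);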

//then to wrap it up:
$new_page = '<table border="1" cellspacing="0" cellpadding="4" style="margin-right:30px;">';
foreach ($list as $item) {
  $new_page .= $item . '</tr>';
}
$new_page .= "</table></p>";

The regex line should replace Movie Name</a> (year) with Movie Name</a> (year) - [YOUR LINK], though it's not tested. You could quite easily put the search link before the name instead, but you would then have to incorporate the <a ...> tag into the regex.
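
If you also want the movie title URL-encoded inside the query string, one option is preg_replace_callback, which runs a PHP function on every match. This is only a sketch: the two sample rows are made up to resemble the IMDb markup (the real page may differ), and the pattern assumes every title is followed by </a> and a four-digit year in brackets.

```php
<?php
// Hypothetical sample rows standing in for the scraped list.
$html = '<tr><td><a href="/title/tt0111161/">The Shawshank Redemption</a> (1994)</td></tr>'
      . '<tr><td><a href="/title/tt0068646/">The Godfather</a> (1972)</td></tr>';

// '#' is the delimiter, so the '/' in '</a>' needs no escaping.
// Group 1 is the title, group 2 is the year.
$pattern = '#>([^<>]+)</a> \((\d{4})\)#';

$result = preg_replace_callback($pattern, function ($m) {
    // $m[0] is the whole match; $m[1] is the title text.
    $search = 'http://www.theflickzone.com/search.php?do=process&sortby=lastpost&titleonly=true&query='
            . urlencode($m[1]);
    // Keep the original text and append the search link after the year.
    return $m[0] . ' - <a href="' . $search . '"> Search </a>';
}, $html);

echo $result, "\n";
```

Because the callback runs once per match, each movie gets a link built from its own title rather than from the first one found.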

Hope you can get your head around all that.

Regular expressions are quite simple.

Instead of matching exact characters like in your str_replace() you match patterns.

The pattern is held between two delimiters, which mark the start and end of the pattern. The delimiter has to be a non-alphanumeric character (and cannot be a backslash or whitespace).
After the end delimiter come the modifiers (flags).

So a regular expression could be:

|a|

That matches just the letter a. The delimiter is |.

Another example:

|ape|i

This matches the sequence of characters "ape". The "i" after the second delimiter is a modifier that makes the match case-insensitive. So "ape" can be in any case, e.g. "aPE" would be a match.
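
To try this out yourself, here is a small sketch using preg_match, which returns 1 when the pattern matches and 0 when it does not. The test strings are just made-up examples.

```php
<?php
// The delimiter is a free choice: both patterns match the literal "ape".
$pipe  = preg_match('|ape|', 'grape');   // 1
$slash = preg_match('/ape/', 'grape');   // 1

// Without the "i" modifier the match is case-sensitive.
$sensitive   = preg_match('|ape|', 'aPE');    // 0
// With "i" after the closing delimiter, case is ignored.
$insensitive = preg_match('|ape|i', 'aPE');   // 1

var_dump($pipe, $slash, $sensitive, $insensitive);
```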

There are special characters used to match patterns. The most used is the full stop, ".".

The full stop matches any single character (except a newline, by default).

|a.e|i

This would match "aPE" as well as "abe", etc.
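
A quick way to see what the full stop does is to test a pattern such as |a.e|i against a few strings. The word list below is made up for illustration. Note that "ae" fails, because the full stop needs exactly one character to consume.

```php
<?php
// "a", then any single character, then "e", ignoring case.
$pattern = '|a.e|i';

$results = [];
foreach (['aPE', 'abe', 'a e', 'ae'] as $word) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$word] = preg_match($pattern, $word) === 1;
}

var_dump($results);
```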

The other two most used characters are * and +.

* means 0 or more of whatever is immediately to the left of it.
+ means 1 or more of whatever is immediately to the left of it.

|a.*e|i

This would match "appe" or even "ae", since we can match 0 or more of ".". Since . is any character, it can be 0 or more of any characters between a and e.
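
The difference between * and + is easiest to see side by side. In this sketch the patterns |ap*e| and |ap+e| are my own examples, chosen so that only the letter "p" is repeated.

```php
<?php
// "*" allows zero repetitions, "+" requires at least one.
foreach (['ae', 'ape', 'appe'] as $s) {
    $star = preg_match('|ap*e|', $s);
    $plus = preg_match('|ap+e|', $s);
    echo "$s: ap*e=$star, ap+e=$plus\n";
}

// ".*" means "any run of characters, possibly empty",
// so this matches as long as an "a" is followed somewhere by an "e".
$any = preg_match('|a.*e|', 'a long sentence');
echo "a.*e on 'a long sentence': $any\n";
```

Only "ae" is rejected by the + version, because there is no "p" at all between the a and the e.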

These are just the fundamentals; regex is very powerful. Take a look at:


it has a lot of information on regular expressions.
