I'm just starting out with PHP, so I thought I'd try something simple.
I modified an existing bit of code, but it doesn't seem to work:

<?php

$url = $_GET['DLURL'];

// Fetch page
$string = FetchPage($url);

// Regex that extracts the URLs from links
$links_regex = '/<a[^/>]*' .
    'href=["|\']([^javascript:].*)["|\']/Ui';

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out); echo "</pre>";

function FetchPage($path)
{
    $file = fopen($path, "r");

    if (!$file)
    {
        exit("There was a connection error!");
    }

    $data = '';

    while (!feof($file))
    {
        // Read the file / URL a line (up to 1 KB) at a time
        $data .= fgets($file, 1024);
    }

    return $data;
}
?>

I'd expect it to print all the links on a page, but it doesn't (it hits the !file case). It's hosted on t35.com, and I'm passing an additional '?DLURL="google.com"' in the URL.
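
For reference, the full request ends up looking something like this (hostname made up):

http://mysite.t35.com/test3.php?DLURL=http://www.google.com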

If this is solved, I'll get to what I'm actually trying to do :) Thanks.


All fixed. The unescaped / inside [^/>] collided with the / pattern delimiter, and printing $out[1] instead of $out gives you just the captured URLs rather than the whole match array. Try this:

<?php

$url = $_GET['DLURL'];

// Fetch page
$string = FetchPage($url);

// Regex that extracts the URLs from links
$links_regex = '/\<a[^\/\>]*' .
    'href=["|\']([^javascript:].*)["|\']/Ui';

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out[1]); echo "</pre>";

function FetchPage($path)
{
    $file = fopen($path, "r");

    if (!$file)
    {
        exit("There was a connection error!");
    }

    $data = '';

    while (!feof($file))
    {
        // Read the file / URL a line (up to 1 KB) at a time
        $data .= fgets($file, 1024);
    }

    return $data;
}
?>
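
One caveat: [^javascript:] is a single-character class (it matches any one character other than j, a, v, s, c, r, i, p, t or :), not a check for the literal javascript: prefix, so it can also reject ordinary links that happen to start with one of those letters. If the goal is to skip javascript: links, a negative lookahead is probably closer to the intent:

$links_regex = '/<a[^\/>]*href=["\']((?!javascript:).*)["\']/Ui';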

Thanks. I just tried it and got this:

Warning: fopen() [function.fopen]: URL file-access is disabled in the server configuration in [...]/test3.php on line 20

Warning: fopen(http://www.google.com) [function.fopen]: failed to open stream: no suitable wrapper could be found in [...]/test3.php on line 20
There was a connection error!

It's a no-go with 'URL file-access is disabled', I suppose. Maybe I should switch servers?
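
Before switching, it might be worth confirming what the host actually allows. A quick check (just my guess at what to look for):

<?php
// allow_url_fopen gates fopen() on http:// URLs;
// the curl extension is the usual fallback when it's off.
var_dump(ini_get('allow_url_fopen'));
var_dump(extension_loaded('curl'));
?>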

I've just added a URL validator for you, but it requires the cURL extension to be enabled, which we can all hope is the case on your server. Below is the new script with the URL validator and the error line fixed:

<?php

$url = $_GET['DLURL'];

// Fetch page
$string = FetchPage($url);

// Regex that extracts the URLs from links
$links_regex = '/\<a[^\/\>]*' .
    'href=["|\']([^javascript:].*)["|\']/Ui';

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out[1]); echo "</pre>";

function FetchPage($path)
{
    // Validate the URL with a headers-only cURL request before reading it
    $handle = curl_init($path);

    if ($handle !== false)
    {
        curl_setopt($handle, CURLOPT_HEADER, false);
        curl_setopt($handle, CURLOPT_FAILONERROR, true);   // treat HTTP error codes as failure
        curl_setopt($handle, CURLOPT_HTTPHEADER, array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15")); // request as if Firefox
        curl_setopt($handle, CURLOPT_NOBODY, true);        // headers only, no body
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, false);
        $connectable = curl_exec($handle);
        curl_close($handle);
    }
    else
    {
        $connectable = false;
    }

    $file = false;

    if ($connectable)
    {
        $file = fopen($path, "r");
    }

    if (!$connectable || !$file)
    {
        exit("There was a connection error!");
    }

    $data = '';

    while (!feof($file))
    {
        // Read the file / URL a line (up to 1 KB) at a time
        $data .= fgets($file, 1024);
    }

    return $data;
}
?>
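
One thing to keep in mind: the script above still reads the page with fopen(), so on a host where allow_url_fopen is off it will keep failing even when the cURL check passes. If that happens, FetchPage can be done entirely with cURL. Here's a minimal sketch (the function name FetchPageCurl is mine; CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string):

<?php
// Fetch a page body over HTTP using cURL only; no fopen() involved,
// so it also works when allow_url_fopen is disabled.
function FetchPageCurl($path)
{
    $handle = curl_init($path);

    if ($handle === false)
    {
        exit("There was a connection error!");
    }

    curl_setopt($handle, CURLOPT_HEADER, false);          // body only, no response headers
    curl_setopt($handle, CURLOPT_FAILONERROR, true);      // treat HTTP error codes as failure
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);   // return the body from curl_exec()
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);   // follow redirects

    $data = curl_exec($handle);
    curl_close($handle);

    if ($data === false)
    {
        exit("There was a connection error!");
    }

    return $data;
}
?>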

I just tried it with a little server I set up. Works like a charm, thanks :)
