Alright, since you responded with a great example of how to do it with regular expressions, I guess I can provide an XPath example using the DOM, as I mentioned previously.
<?php
$sUrl = 'http://www.google.com';
$oDom = new DOMDocument();
// The @ suppresses the warnings libxml raises on malformed real-world HTML.
@$oDom->loadHTMLFile($sUrl);
$oXpath = new DOMXPath($oDom);
// "//@href | //@src" would also work; naming the elements explicitly gives finer control over the result set.
$oRes = $oXpath->query("//a/@href | //img/@src | //script/@src");
$i = 0;
foreach ($oRes as $oAttr) {
    // Each result is an attribute node; nodeValue holds the URL.
    echo $oAttr->nodeValue . '<br>';
    $i++;
}
echo $i . ' urls found in page.<br /><br />';
Output:
http://images.google.com/imghp?hl=en&tab=wi
http://maps.google.com/maps?hl=en&tab=wl
http://news.google.com/nwshp?hl=en&tab=wn
http://video.google.com/?hl=en&tab=wv
http://mail.google.com/mail/?hl=en&tab=wm
http://www.google.com/intl/en/options/
http://www.google.com/prdhp?hl=en&tab=wf
http://groups.google.com/grphp?hl=en&tab=wg
http://books.google.com/bkshp?hl=en&tab=wp
http://scholar.google.com/schhp?hl=en&tab=ws
http://www.google.com/finance?hl=en&tab=we
http://blogsearch.google.com/?hl=en&tab=wb
http://www.youtube.com/?hl=en&tab=w1
http://www.google.com/calendar/render?hl=en&tab=wc
http://picasaweb.google.com/home?hl=en&tab=wq
http://docs.google.com/?hl=en&tab=wo
http://www.google.com/reader/view/?hl=en&tab=wy
http://sites.google.com/?hl=en&tab=w3
http://www.google.com/intl/en/options/
/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg
https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en
/intl/en_ALL/images/logo.gif
/advanced_search?hl=en
/preferences?hl=en
/language_tools?hl=en
/intl/en/ads/
/services/
/intl/en/about.html
/intl/en/privacy.html
29 urls found in page.
The only thing to be aware of here is URLs that are relative rather than full paths. You would need some logic to add the domain back to them if it's not there already.
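For the relative URLs, something along these lines might do as a starting point. This is only a rough sketch: makeAbsolute() is a name I made up for illustration (it is not a PHP built-in), it assumes the base URL is itself absolute with a scheme and host, and it only handles the common cases. It does not collapse "../" or "./" segments.

```php
<?php
// Hypothetical helper (made up for this example): resolve $sHref against $sBase.
// Assumes $sBase is an absolute URL such as 'http://www.google.com'.
function makeAbsolute($sHref, $sBase) {
    // Anything that already has a scheme (http:, https:, mailto:, ...) is left alone.
    // Strings parse_url() cannot parse at all are also returned unchanged.
    if (parse_url($sHref, PHP_URL_SCHEME) !== null) {
        return $sHref;
    }
    $aBase = parse_url($sBase);
    $sRoot = $aBase['scheme'] . '://' . $aBase['host'];
    if (isset($aBase['port'])) {
        $sRoot .= ':' . $aBase['port'];
    }
    // Protocol-relative ("//host/path"): borrow the scheme from the base.
    if (substr($sHref, 0, 2) === '//') {
        return $aBase['scheme'] . ':' . $sHref;
    }
    // Root-relative ("/path"): prepend scheme and host.
    if (substr($sHref, 0, 1) === '/') {
        return $sRoot . $sHref;
    }
    // Path-relative ("file.html"): append to the directory of the base path.
    $sPath = isset($aBase['path']) ? $aBase['path'] : '/';
    $sDir = substr($sPath, 0, strrpos($sPath, '/') + 1);
    return $sRoot . $sDir . $sHref;
}
```

In the loop above you would then echo makeAbsolute($oAttr->nodeValue, $sUrl) instead of the raw nodeValue, so /intl/en/about.html would come out as http://www.google.com/intl/en/about.html.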