Apr 13th, 2009

Extract links from a webpage

Hi all,
I am new to PHP and have just been given a project that requires me to extract links from a page. I already have the part of the page that contains the links in a string, and I was wondering how I can extract all the links from that string.

The string is:
<div class="usrsbt">
<div style="border-top: 2px solid rgb(255, 153, 0); border-left: 1px solid rgb(233, 233, 233); border-right: 1px solid rgb(233, 233, 233); width: 100px; font-size: 13px; padding-left: 5px; cursor: pointer;">Submitted Links</div><hr size="1" width="550" color="#e9e9e9"><div style="overflow: auto; max-height: 250px;"><div style="margin-top: 5px;"></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(255, 255, 255); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (supernovatube) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.supernovatube.com/human.php?viewkey=870858bae14ac0b6d9f7">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" 
href="http://www.supernovatube.com/human.php?viewkey=25b5152132ed77abbcc7">1</a><a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=qz3563j8">2</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=e0an8vlj">3</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=ppp3oaws">4</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><a style="color: rgb(255, 0, 102); font-weight: bold; margin-left: 30px; float: right; margin-top: 3px; margin-right: 14px;" target="resultframe" href="javascript:void(0);" onclick="javascript:alert('You do not have permission to submit links. Please register and login to enable link submission.');">+ Add Link</a></div></div><br clear="all">

The names shown in red are categories, and the links shown in green belong to the category above them. I want to get each category name together with the links under it, so that I can put them in a database table.

Please help
Thanks in advance
Posted by jyotiu
Apr 13th, 2009

Re: Extract links from a webpage

Hi,

I'm returning to this forum after a long time away. This isn't difficult at all; it just takes a little work with regular expressions, and preg_match_all() is the function to use. For example, to store all the links in an array:
    // \x22 is the hex escape for a double quote; the group captures the href value.
    $pattern = "/href=\x22([^\x22]*)\x22/";
    // The captured URLs end up in $links[1].
    preg_match_all($pattern, $string, $links);
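To make the shape of the result concrete, here is a minimal sketch run against a shortened version of the posted string. The `$string` sample is trimmed for illustration and is not the full markup. `preg_match_all()` returns the number of full matches and fills `$links[0]` with the full matches and `$links[1]` with the captured URLs.

```php
<?php
// Shortened sample of the posted markup (trimmed for illustration only).
$string = '<div>Link <a rel="nofollow" target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a></div>'
        . '<div><a href="javascript:DeadLink(\'http://www.movies-on-demand.tv/report.php?id=2345\')">report</a></div>';

// Single-quoted form with literal double quotes; equivalent to the \x22 version.
$pattern = '/href="([^"]*)"/';

// $count is the number of full pattern matches.
$count = preg_match_all($pattern, $string, $links);

// $links[0] holds the full matches (href="..."), $links[1] the captured URLs.
print_r($links[1]);
```

Note that the `javascript:` link is captured as well, since the pattern matches every double-quoted `href` attribute.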

Then it would be good to drop the javascript entries from the links by populating a new array, e.g.:
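    $filtered_links = array();  // initialize so the array exists even if nothing matches
    foreach ($links[1] as $link) {
        // keep only links that do not contain "javascript" (case-insensitive)
        if (stripos($link, "javascript") === false) {
            $filtered_links[] = $link;
        }
    }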
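A stricter variant of this filter, as a sketch only: instead of searching for the substring "javascript" anywhere in the link (which would also drop an ordinary URL that happens to contain that word), check the URL scheme with `parse_url()`. The sample `$links` array, the `example.com` URL, and the list of allowed schemes are assumptions made for illustration.

```php
<?php
// Sample capture list in the shape preg_match_all() produces (values are illustrative).
$links = [1 => [
    'http://www.megavideo.com/?v=q8mlbpov',
    "javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')",
    'javascript:void(0);',
    'http://example.com/javascript-tutorial', // hypothetical URL containing the word
]];

$allowed_schemes = ['http', 'https']; // assumption: only web links are wanted

$filtered_links = [];
foreach ($links[1] as $link) {
    // With PHP_URL_SCHEME, parse_url() returns the scheme as a string,
    // or null/false when no scheme can be determined.
    $scheme = parse_url($link, PHP_URL_SCHEME);
    if (is_string($scheme) && in_array(strtolower($scheme), $allowed_schemes, true)) {
        $filtered_links[] = $link;
    }
}

print_r($filtered_links);
```

Whether this is preferable depends on the input: the substring check is simpler, while the scheme check avoids discarding legitimate URLs.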

A little tuning of the search pattern would be needed to extract the category names into another array; mysql_query() can then be used to insert the arrays into a database.
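One way that tuning might look, as a rough sketch rather than a definitive pattern: match each category label (the `small-caps` div in the posted markup) together with the div that follows it, then run the `href` pattern inside that div only. The shortened `$string`, the `$rowPattern` name, and the `$by_category` array are assumptions for illustration, and the pattern relies on the exact markup shown in the question.

```php
<?php
// Shortened sample of the posted markup (trimmed for illustration only).
$string = '<div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div>'
        . '<div style="float: left; margin-left: 10px;">Link <a rel="nofollow" target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a> </div>'
        . '<div style="float: right;"><a href="javascript:DeadLink(\'http://www.movies-on-demand.tv/report.php?id=2345\')">x</a></div>'
        . '<div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div>'
        . '<div style="float: left; margin-left: 10px;">Link <a rel="nofollow" target="_blank" href="http://www.supernovatube.com/human.php?viewkey=25b5152132ed77abbcc7">1</a>'
        . '<a rel="nofollow" target="_blank" href="http://megavideo.com/?v=qz3563j8">2</a> </div>';

// Group 1: category label text. Group 2: contents of the div right after it.
// The "s" flag lets "." match newlines, in case the markup is wrapped.
$rowPattern = '~font-variant: small-caps;">([^<]*)</div>\s*<div[^>]*>(.*?)</div>~s';
preg_match_all($rowPattern, $string, $rows, PREG_SET_ORDER);

$by_category = [];
foreach ($rows as $row) {
    // "movie link (megavideo) :" becomes "movie link (megavideo)"
    $category = trim($row[1], " :\t\r\n");

    // Collect the links inside this category's div only.
    preg_match_all('/href="([^"]*)"/', $row[2], $m);

    // Merge in case the same category label appears more than once.
    $by_category[$category] = array_merge($by_category[$category] ?? [], $m[1]);
}

print_r($by_category);
```

Because the "Report Dead Link" anchors sit in a separate div, they are not picked up here. Each category and link pair could then be written to a table; since mysql_query() was removed in PHP 7.0, a prepared statement through PDO or mysqli would be the usual route on current PHP versions.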
Last edited by martin5211; Apr 13th, 2009 at 9:51 pm.

This thread is solved

© 2011 DaniWeb® LLC