Extract links from a webpage
Thread Solved
Join Date: Sep 2008
Posts: 33
Reputation:
Solved Threads: 0
Hi All
I am new to PHP and have just been given a project that requires me to extract links from a web page. I already have the part of the page that contains the links in a string, and I was wondering whether I can extract all the links from that string. The string is:
<div class="usrsbt"> <div style="border-top: 2px solid rgb(255, 153, 0); border-left: 1px solid rgb(233, 233, 233); border-right: 1px solid rgb(233, 233, 233); width: 100px; font-size: 13px; padding-left: 5px; cursor: pointer;">Submitted Links</div><hr size="1" width="550" color="#e9e9e9"><div style="overflow: auto; max-height: 250px;"><div style="margin-top: 5px;"></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(255, 255, 255); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (supernovatube) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.supernovatube.com/human.php?viewkey=870858bae14ac0b6d9f7">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" 
href="http://www.supernovatube.com/human.php?viewkey=25b5152132ed77abbcc7">1</a><a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=qz3563j8">2</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=e0an8vlj">3</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=ppp3oaws">4</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><a style="color: rgb(255, 0, 102); font-weight: bold; margin-left: 30px; float: right; margin-top: 3px; margin-right: 14px;" target="resultframe" href="javascript:void(0);" onclick="javascript:alert('You do not have permission to submit links. Please register and login to enable link submission.');">+ Add Link</a></div></div><br clear="all">
The names shown in red are the categories, and the ones in green are the links under each of them. I want to get the category names and the links under each one, so that I can put them in a database table.
Please help.
Thanks in advance.
•
•
Join Date: Aug 2007
Posts: 189
Reputation:
Solved Threads: 14
Hi,
I'm returning to this forum after a long time away. Looking at this issue, it isn't difficult at all; it just takes a bit of work with regular expressions. preg_match_all() is the function to use here, e.g. to store all the links in an array:
// \x22 is the double-quote character; group 1 captures the href value
$pattern = "/href=\x22([^\x22]*)\x22/";
preg_match_all($pattern, $string, $links);
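To see what ends up in $links, here is a minimal sketch run on a shortened sample. The $sample string is made up for illustration (trimmed from the markup in the question), and $count is just a name picked here. preg_match_all() returns the number of full matches and fills index 0 with the full matches and index 1 with the first capture group.

```php
<?php
// Hypothetical, shortened sample based on the markup in the question.
$sample = '<a target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a>'
        . '<a href="javascript:void(0);">+ Add Link</a>';

$pattern = "/href=\x22([^\x22]*)\x22/";
$count = preg_match_all($pattern, $sample, $links);

// $links[0] holds the full matches (including the href="..." wrapper).
// $links[1] holds only the captured URLs from the parenthesised group.
print_r($links[1]);
```

So the URLs to work with are in $links[1], which is why the filtering step loops over that index.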
Then it would be a good idea to drop the javascript entries from the links, populating a new array, e.g.:
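$filtered_links = array();
foreach ($links[1] as $link) {
    // Skip javascript pseudo-links such as the "Report Dead Link" handlers
    if (strpos($link, "javascript") === false) {
        $filtered_links[] = $link;
    }
}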
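One caveat with a plain substring test is that it also drops ordinary URLs that merely contain the word "javascript". A possible variant, sketched here under that assumption, checks only whether the value starts with the javascript: pseudo-scheme. The helper name is_javascript_link() and the $urls sample (including the example.com address) are hypothetical.

```php
<?php
// Hypothetical helper name; not part of the original reply.
function is_javascript_link($link)
{
    // True only when the value starts with "javascript:" (case-insensitive),
    // ignoring leading whitespace.
    return stripos(ltrim($link), 'javascript:') === 0;
}

// Made-up sample list standing in for $links[1].
$urls = array(
    'http://megavideo.com/?v=qz3563j8',
    "javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')",
    'http://example.com/javascript-tutorial',
);

$filtered_links = array();
foreach ($urls as $link) {
    if (!is_javascript_link($link)) {
        $filtered_links[] = $link;
    }
}
print_r($filtered_links);
```

With this check the example.com URL is kept, while the DeadLink handler is still removed.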
A little tuning of the search pattern would be needed to extract the names into another array, and mysql_query() can then be used to insert the arrays into a database.
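One way that tuning might look is sketched below. It assumes, based only on the markup posted in the question, that every category label is the text of a div whose style ends with "font-variant: small-caps;" and that the links for a category follow its label. The function name links_by_category() and the shortened $html sample are hypothetical.

```php
<?php
// Hypothetical helper, not from the original reply.
// Returns an array mapping each category name to its list of URLs.
function links_by_category($html)
{
    // Group 1: the label text. Group 2: everything up to the next label
    // (or the end of the string), found with a lookahead.
    $blocks = '/font-variant: small-caps;\x22>([^<]*)<\/div>(.*?)'
            . '(?=font-variant: small-caps;|$)/s';
    preg_match_all($blocks, $html, $matches, PREG_SET_ORDER);

    $result = array();
    foreach ($matches as $m) {
        // "movie link (megavideo) :" becomes "movie link (megavideo)"
        $category = trim($m[1], " :\t\n\r");
        preg_match_all('/href=\x22([^\x22]*)\x22/', $m[2], $hrefs);
        foreach ($hrefs[1] as $url) {
            // Keep everything except javascript: pseudo-links.
            if (stripos($url, 'javascript:') !== 0) {
                $result[$category][] = $url;
            }
        }
    }
    return $result;
}

// Shortened, made-up sample following the structure of the posted string.
$html = '<div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div>'
      . '<div>Link <a href="http://www.megavideo.com/?v=q8mlbpov">1</a></div>'
      . '<div><a href="javascript:DeadLink(\'x\')">report</a></div>'
      . '<div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div>'
      . '<div>Link <a href="http://megavideo.com/?v=qz3563j8">2</a> '
      . '<a href="http://megavideo.com/?v=e0an8vlj">3</a></div>';

$grouped = links_by_category($html);
print_r($grouped);
```

Each category and URL pair can then be inserted as one row. Note that mysql_query() was removed in PHP 7, so on current PHP versions mysqli or PDO would be used for the insert instead.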
Last edited by martin5211; Apr 13th, 2009 at 9:51 pm.