Extract links from a webpage
Hi All
I am new to PHP and have just been given a project that requires extracting links from a page. I already have the part of the page that contains the links in a string, and I was wondering how I can extract all the links from that string. The string is:
<div class="usrsbt"> <div style="border-top: 2px solid rgb(255, 153, 0); border-left: 1px solid rgb(233, 233, 233); border-right: 1px solid rgb(233, 233, 233); width: 100px; font-size: 13px; padding-left: 5px; cursor: pointer;">Submitted Links</div><hr size="1" width="550" color="#e9e9e9"><div style="overflow: auto; max-height: 250px;"><div style="margin-top: 5px;"></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.megavideo.com/?v=q8mlbpov">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(255, 255, 255); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (supernovatube) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://www.supernovatube.com/human.php?viewkey=870858bae14ac0b6d9f7">1</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><div style="margin-top: 2px; width: 550px; height: 20px; background-color: rgb(233, 233, 233); font-size: 13px;"><div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div><div style="float: left; margin-left: 10px;">Link <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" 
href="http://www.supernovatube.com/human.php?viewkey=25b5152132ed77abbcc7">1</a><a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=qz3563j8">2</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=e0an8vlj">3</a> <a rel="nofollow" style="color: rgb(0, 0, 221); text-decoration: underline;" target="_blank" href="http://megavideo.com/?v=ppp3oaws">4</a> </div><div style="float: right;"><a href="javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')"><img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a></div></div><a style="color: rgb(255, 0, 102); font-weight: bold; margin-left: 30px; float: right; margin-top: 3px; margin-right: 14px;" target="resultframe" href="javascript:void(0);" onclick="javascript:alert('You do not have permission to submit links. Please register and login to enable link submission.');">+ Add Link</a></div></div><br clear="all">
The category names are the labels such as "movie link (megavideo) :", and the links listed after each label belong to that category. I want to extract each category name together with its links so that I can put them in a database table.
Please help
Thanks in advance
Hi,
I'm returning to this forum after a long time away. This isn't difficult at all; it just takes a bit of work with regular expressions, and preg_match_all() is the function to use. For example, to store all the links in an array:
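// \x22 is a double quote, so the pattern is /href="([^"]*)"/
$pattern = "/href=\x22([^\x22]*)\x22/";
// $links[0] holds the full matches, $links[1] the captured URLs
preg_match_all($pattern, $string, $links);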
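To see what preg_match_all() actually puts into $links, here is a self-contained run of the same pattern. The $string below is a shortened excerpt of the markup from the question, standing in (as an assumption) for the full page fragment.

```php
<?php
// Shortened excerpt of the question's markup (assumption: the real fragment
// is much longer but has the same shape).
$string = '<div style="float: left;">Link <a rel="nofollow" target="_blank" '
        . 'href="http://www.megavideo.com/?v=q8mlbpov">1</a></div>'
        . '<a href="javascript:DeadLink(\'http://www.movies-on-demand.tv/report.php?id=2345\')">'
        . '<img title="Report Dead Link" src="http://img.movies-on-demand.tv/templates/mod/images/broken.gif"></a>';

// \x22 is a double quote: capture everything between the quotes of each href.
$pattern = "/href=\x22([^\x22]*)\x22/";
preg_match_all($pattern, $string, $links);

// $links[1] now holds two entries: the megavideo URL and the
// javascript:DeadLink(...) href. The img src is not matched (no "href=").
print_r($links[1]);
```

The javascript: entry is why a filtering step is needed next.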
Then it's a good idea to remove the javascript: links, populating a new array, e.g.:
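$filtered_links = array(); // initialise so the array exists even if nothing matches
foreach ($links[1] as $link) {
    // keep every href that does not start with the javascript: scheme
    if (stripos($link, 'javascript:') !== 0) {
        $filtered_links[] = $link;
    }
}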
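A more compact variant of the same filter uses array_filter() with a callback that keeps only hrefs not starting with the javascript: scheme. The input array below is a hypothetical $links[1] built from URLs in the question's markup.

```php
<?php
// Hypothetical input: URLs as captured by preg_match_all() into $links[1].
$links = array(1 => array(
    'http://www.megavideo.com/?v=q8mlbpov',
    "javascript:DeadLink('http://www.movies-on-demand.tv/report.php?id=2345')",
    'http://www.supernovatube.com/human.php?viewkey=870858bae14ac0b6d9f7',
    'javascript:void(0);',
));

// array_filter() keeps the elements for which the callback returns true;
// array_values() renumbers the survivors from 0.
$filtered_links = array_values(array_filter($links[1], function ($link) {
    return stripos($link, 'javascript:') !== 0;
}));

print_r($filtered_links);
```

Checking for a leading "javascript:" (rather than the word anywhere in the string) avoids dropping a real URL that merely contains "javascript" in its query string.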
With a little tuning of the search pattern you can also extract the category names into another array, and then use mysql_query() to insert the arrays into a database.
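One way to do that tuning is a minimal sketch, assuming each category label sits in a div styled with "font-variant: small-caps" (as in the question's markup) and its links follow it until the next label. It splits the string on those label divs with preg_split() and PREG_SPLIT_DELIM_CAPTURE, then runs the same href pattern on each chunk. The function names groupLinksByCategory() and saveLinks() and the table links(category, url) are hypothetical. Note that mysql_query() was deprecated in PHP 5.5 and removed in PHP 7.0, so the insert step here uses a PDO prepared statement instead.

```php
<?php
// Hypothetical helper: returns array(category => array(url, ...)).
function groupLinksByCategory($html)
{
    // Result layout: [prefix, label1, chunk1, label2, chunk2, ...]
    $parts = preg_split(
        '/font-variant:\s*small-caps;?">([^<]*)<\/div>/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $grouped = array();
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        // "movie link (megavideo) :" -> "movie link (megavideo)"
        $category = rtrim(trim($parts[$i]), ' :');

        // Same href pattern as above, applied only to this category's chunk.
        preg_match_all('/href="([^"]*)"/', $parts[$i + 1], $m);

        $urls = array();
        foreach ($m[1] as $url) {
            // Drop the "report dead link" and "add link" javascript: hrefs.
            if (stripos($url, 'javascript:') !== 0) {
                $urls[] = $url;
            }
        }
        // Merge in case the same label appears more than once.
        $existing = isset($grouped[$category]) ? $grouped[$category] : array();
        $grouped[$category] = array_merge($existing, $urls);
    }
    return $grouped;
}

// Hypothetical table: links(category VARCHAR(...), url VARCHAR(...)).
function saveLinks(PDO $pdo, array $grouped)
{
    $stmt = $pdo->prepare('INSERT INTO links (category, url) VALUES (?, ?)');
    foreach ($grouped as $category => $urls) {
        foreach ($urls as $url) {
            $stmt->execute(array($category, $url));
        }
    }
}

// Usage on a shortened excerpt of the question's markup.
$sample = '<div style="float: left; font-variant: small-caps;">movie link (megavideo) :</div>'
        . '<div style="float: left; margin-left: 10px;">Link <a href="http://www.megavideo.com/?v=q8mlbpov">1</a></div>'
        . '<div style="float: right;"><a href="javascript:DeadLink(\'http://www.movies-on-demand.tv/report.php?id=2345\')"></a></div>'
        . '<div style="float: left; font-variant: small-caps;">movie link (Zshare) :</div>'
        . '<div style="float: left; margin-left: 10px;">Link <a href="http://megavideo.com/?v=qz3563j8">2</a> '
        . '<a href="http://megavideo.com/?v=e0an8vlj">3</a></div>';
print_r(groupLinksByCategory($sample));
```

Regular expressions tied to inline styles break as soon as the site changes its markup; parsing with DOMDocument is sturdier, but the regex approach matches what the reply describes.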
Last edited by martin5211; Apr 13th, 2009 at 9:51 pm.