
Hello,

I am using this very basic form of the DOM scraper to scrape image URLs from a Tumblr page. It mostly does what I am looking for, but I am facing two issues:
1. It scrapes all links; I want it to scrape only links that contain "media.tumblr.com".
2. I want to scrape the following pages too, e.g.:

blogname.tumblr.com/mobile/page/2
blogname.tumblr.com/mobile/page/3

Can someone please help me with this? I am really having a hard time figuring this out :( I really suck at PHP :P

<?php
    include_once('../../simple_html_dom.php');
    $html = file_get_html('http://beneathmyveins.tumblr.com/mobile');
    foreach ($html->find('a') as $element) {
        echo $element->href . '<br>';
    }
?>

Thanks in advance. Any help would be greatly appreciated.

Regards,
Milton

Edited by pritaeas: Moved to web dev.


Limiting the results to just links containing "media.tumblr.com" is easy:
if (strpos($element->href, 'media.tumblr.com') !== false) {
    echo $element->href . '<br>';
}
This skips any link that doesn't include that string.

For the next pages, put a for loop around your scraping code and append the loop counter to the URL on each iteration (e.g. /mobile/page/2, /mobile/page/3).
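
Putting both suggestions together, a rough sketch might look like the following. The function names (pageUrl, isMediaLink, scrapeMediaLinks) are made up for illustration, and it assumes page 1 is the plain /mobile URL and that you know (or guess) how many pages to fetch. It has not been tested against the live blog.

```php
<?php
// Hypothetical helper: builds the URL for a given page of the mobile view.
// Assumption: page 1 is the bare /mobile URL, later pages are /mobile/page/N.
function pageUrl(string $baseUrl, int $page): string
{
    return $page === 1 ? $baseUrl : $baseUrl . '/page/' . $page;
}

// Hypothetical helper: true when the link contains "media.tumblr.com".
function isMediaLink(string $href): bool
{
    return strpos($href, 'media.tumblr.com') !== false;
}

// Collects matching links from pages 1..$lastPage.
// simple_html_dom.php must be included before this function is called.
function scrapeMediaLinks(string $baseUrl, int $lastPage): array
{
    $links = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $html = file_get_html(pageUrl($baseUrl, $page));
        if (!$html) {
            break; // page could not be loaded; assume we ran out of pages
        }
        foreach ($html->find('a') as $element) {
            // href is false when the attribute is missing, so cast to string
            if (isMediaLink((string) $element->href)) {
                $links[] = $element->href;
            }
        }
        $html->clear(); // free the memory held by the parsed DOM
    }
    return $links;
}

// Usage (needs the library and network access):
// include_once('../../simple_html_dom.php');
// foreach (scrapeMediaLinks('http://beneathmyveins.tumblr.com/mobile', 3) as $link) {
//     echo $link . '<br>';
// }
```

Keeping the URL building and the filter in small functions makes it easy to check them on their own before pointing the loop at the real site.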
