Hello,

I am using this very basic form of the DOM scraper to scrape image URLs from a Tumblr page. It mostly does what I am looking for, but I am facing two issues:
1. It is scraping all links; I want it to scrape only links that contain "media.tumblr.com".
2. I want to scrape the following pages too, e.g.:

blogname.tumblr.com/mobile/page/2
blogname.tumblr.com/mobile/page/3

Can someone please help me with this? I am really having a hard time figuring this out :( I really suck at PHP :P

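<?php
    // Load the Simple HTML DOM parser library.
    include_once('../../simple_html_dom.php');
    $html = file_get_html('http://beneathmyveins.tumblr.com/mobile');
    // Print the href of every <a> element on the page.
    foreach ($html->find('a') as $element) {
        echo $element->href . '<br>';
    }
?>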

Thanks in advance. Any help would be greatly appreciated.

Regards,
Milton

All 2 Replies

Limiting the results to just links containing "media.tumblr.com" is easy:
if (strpos($element->href, 'media.tumblr.com') !== false) {
    echo $element->href . '<br>';
}
This skips any link whose href does not include that string.

For the next pages, put a for loop around your scraping code and append the loop counter to the URL on each iteration (e.g. /mobile/page/2, then /mobile/page/3).
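
A minimal sketch of that loop, combined with the "media.tumblr.com" filter, might look like the following. The function names (build_page_url, filter_media_links, scrape_media_links) and the $lastPage parameter are hypothetical, made up for illustration. It also assumes that page 1 lives at the plain /mobile URL, that later pages follow the /mobile/page/N pattern from the question, and that simple_html_dom.php has already been included before scrape_media_links() is called.

```php
<?php
// Hypothetical helper: builds the URL for a given page number.
// Assumes page 1 is the plain /mobile URL and later pages use /mobile/page/N.
function build_page_url(string $blogBase, int $page): string
{
    $url = rtrim($blogBase, '/') . '/mobile';
    return $page > 1 ? $url . '/page/' . $page : $url;
}

// Hypothetical helper: keeps only the hrefs that contain $needle.
// Non-string values (e.g. an <a> without an href) are dropped.
function filter_media_links(array $hrefs, string $needle = 'media.tumblr.com'): array
{
    return array_values(array_filter($hrefs, function ($href) use ($needle) {
        return is_string($href) && strpos($href, $needle) !== false;
    }));
}

// Scrapes pages 1..$lastPage and returns the matching links.
// Assumes simple_html_dom.php has been included by the caller.
function scrape_media_links(string $blogBase, int $lastPage): array
{
    $links = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $html = file_get_html(build_page_url($blogBase, $page));
        if (!$html) {
            break; // the page could not be loaded, so stop here
        }
        $hrefs = [];
        foreach ($html->find('a') as $element) {
            $hrefs[] = $element->href;
        }
        $links = array_merge($links, filter_media_links($hrefs));
        $html->clear(); // release the parsed DOM before loading the next page
    }
    return $links;
}
```

With the include from the original script in place, calling scrape_media_links('http://beneathmyveins.tumblr.com', 3) and echoing each returned link followed by '<br>' should print the filtered links for the first three pages. Keeping the URL building and the filtering in separate small functions is just one way to arrange it; the same logic works inline inside the for loop.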

@Hericles : That worked like a charm :)
Thanks so much :)
