Instagram Web Crawler

Question

Stefce 146 Posting Pro

7 Years Ago

Hello, so I tried to make my own Instagram crawler but I'm having some difficulties. Here is the code so far:

    import requests
    from bs4 import BeautifulSoup

    def insta_spider(max_pages):
        page = 1
        while page <= max_pages:
            url = 'https://instagram.com/xenia/'
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, "html.parser")
            for link in soup.findAll('a', {'class': '_2g7d5 notranslate _7qk7w'}):
                href = "https://instagram.com/" + link.get('href')
                title = link.string
                print(href)
                print(title)
                #get_single_item_data(title)
            page += 1

    insta_spider(1)

So basically I want to get the href values of the <a> tags with the class name _2g7d5 notranslate _7qk7w on the Instagram profile https://instagram.com/xenia/, but nothing is gathered: the script prints nothing and ends with "Process finished with exit code 0". Does anyone know what the problem with this code is, or does Instagram not allow crawling?

EDIT: Somehow Instagram doesn't display the classes I want to gather in the View Source page. How do I solve this?

python seo

Edited 7 Years Ago by Stefce



rproffitt 2,701 https://5calls.org Moderator · Answer 1 · 2017-08-13T19:25:01+00:00

After your last post about this I decided to check out the Instagram API. But it was a disaster there, as I read this:

You cannot use the API Platform to crawl or store users' media without their express consent.

So that could be why it's borked. Instagram likely breaks what folk have created, which means you have to fall back to web scraping methods.
https://www.quora.com/How-can-I-crawl-Instagram-without-using-API kicks around three methods, but one of them was dead before you started.
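
On the EDIT in the question: if the classes are missing from View Source, then requests.get() most likely receives the same markup, and the links are presumably added later by JavaScript in the browser. A quick way to check this is to parse the downloaded text and see whether any matching <a> tag is there at all. Below is a minimal sketch using only the Python standard library; the names AnchorClassFinder and find_links and the two sample strings are made up for illustration and are not real Instagram output.

```python
from html.parser import HTMLParser


class AnchorClassFinder(HTMLParser):
    """Collect href values of <a> tags that carry every wanted CSS class."""

    def __init__(self, wanted_classes):
        super().__init__()
        # "a b c" -> {"a", "b", "c"} so the order of classes does not matter
        self.wanted = set(wanted_classes.split())
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, hence the "or ''"
        classes = set((attr_map.get("class") or "").split())
        if self.wanted <= classes:
            self.hrefs.append(attr_map.get("href"))


def find_links(html_text, wanted_classes):
    """Return the hrefs of all matching <a> tags found in html_text."""
    finder = AnchorClassFinder(wanted_classes)
    finder.feed(html_text)
    finder.close()
    return finder.hrefs


# Hypothetical markup: what the browser might show after scripts have run ...
RENDERED = '<div><a class="_2g7d5 notranslate _7qk7w" href="/someuser/">someuser</a></div>'
# ... versus what a plain HTTP download might contain.
STATIC = '<div id="react-root"></div><script>/* page is built here */</script>'

print(find_links(RENDERED, "_2g7d5 notranslate _7qk7w"))  # ['/someuser/']
print(find_links(STATIC, "_2g7d5 notranslate _7qk7w"))    # []
```

To try it on the real page, pass requests.get(url).text as html_text. If the result is an empty list, no choice of class selector will help, because the tags are simply not in the downloaded text; the data would have to come from something that runs the page's JavaScript, or from another of the methods discussed in the Quora link.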