Hello, I tried to make my own Instagram crawler but I'm having some difficulties. Here is the code so far:

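    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def insta_spider(max_pages):
        page = 1
        while page <= max_pages:
            # Note: the same profile URL is fetched on every pass; `page` is only a counter.
            url = 'https://instagram.com/xenia/'
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, "html.parser")
            # Matches <a> tags whose class attribute is exactly this string.
            for link in soup.find_all('a', {'class': '_2g7d5 notranslate _7qk7w'}):
                # urljoin avoids a doubled slash when href starts with "/".
                href = urljoin("https://instagram.com/", link.get('href'))
                title = link.string
                print(href)
                print(title)
                #get_single_item_data(title)
            page += 1

    insta_spider(1)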

Basically, I want to get the href attributes of the tags with the class name _2g7d5 notranslate _7qk7w on the current Instagram profile, https://instagram.com/xenia/, but nothing is gathered; the run just ends with "Process finished with exit code 0". Does anyone know what the problem with this code is, or does Instagram not allow crawling?

EDIT: Somehow Instagram doesn't display the classes I want to gather in the View Source page. How do I solve this?

Edited by Stefan_1

Last Post by rproffitt

After your last post about this I decided to check out the Instagram API. But disaster there, as I read this:

You cannot use the API Platform to crawl or store users' media without their express consent.

So that could be why it's borked. Instagram likely breaks what folks have created, which means you have to fall back to web scraping methods.
https://www.quora.com/How-can-I-crawl-Instagram-without-using-API kicks around 3 methods, but one of them was dead before you started.
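
On the EDIT in the question (the classes not showing up in View Source): a likely explanation, though nothing in this thread confirms it, is that the links are added by JavaScript after the page loads, so they never appear in the HTML that requests receives. Here is a rough sketch of a check for that, using only the standard library. The helper names and both sample HTML strings are made up for illustration; they are not real Instagram markup.

```python
from html.parser import HTMLParser


class AnchorClassFinder(HTMLParser):
    """Collect hrefs of <a> tags that carry all of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes.split())
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, hence the `or ""`.
        classes = set((attr_map.get("class") or "").split())
        if self.wanted <= classes:
            self.hrefs.append(attr_map.get("href"))


def find_links(html, wanted_classes):
    """Return hrefs of matching <a> tags found in the given HTML text."""
    parser = AnchorClassFinder(wanted_classes)
    parser.feed(html)
    return parser.hrefs


# Hypothetical samples: a script-only page shell vs. markup after rendering.
static_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
rendered_html = '<a class="_2g7d5 notranslate _7qk7w" href="/someuser/">someuser</a>'

print(find_links(static_html, "_2g7d5 notranslate _7qk7w"))    # []
print(find_links(rendered_html, "_2g7d5 notranslate _7qk7w"))  # ['/someuser/']
```

To try it on the real page, pass `requests.get(url).text` as the first argument. If the list comes back empty there too, the tags are simply not in the static HTML, and no BeautifulSoup selector will find them without something that runs the page's JavaScript first.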