Hello all, is there some Python code or a program to crawl Instagram? I would like to get a user's username, bio, and 5-6 random pics. - Thank you
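For what it's worth, a minimal sketch of the scraping approach, assuming the profile page is public and still exposes Open Graph meta tags (which have historically carried the title and bio). In practice Instagram often requires login and changes its markup, so the official API is the dependable route:

[CODE]
# Sketch only: assumes a public profile page that still embeds og:title /
# og:description meta tags. Instagram's markup changes often; treat this
# as an illustration, not a reliable scraper.
import requests
from bs4 import BeautifulSoup

def fetch_profile(username):
    url = "https://www.instagram.com/" + username + "/"
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("meta", property="og:title")
    desc = soup.find("meta", property="og:description")
    return (title["content"] if title else None,
            desc["content"] if desc else None)

print(fetch_profile("xenia"))
[/CODE]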

+0 forum 7

Hello, so I tried to make my own Insta crawler but I'm having some difficulties. Here is the code for now:

[CODE]
import requests
from bs4 import BeautifulSoup

def insta_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://instagram.com/xenia/'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': '_2g7d5 notranslate _7qk7w'}):
            href = "https://instagram.com/" + link.get('href')
            title = link.string
            print(href)
            print(title)
            #get_single_item_data(title)
        page += 1

insta_spider(1)
[/CODE]

So basically I want to get the `href` attributes of the `<a>` tags with the class name `_2g7d5 notranslate _7qk7w` on the current Instagram profile, which is `https://instagram.com/xenia/` …
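A likely snag: Instagram builds its pages with JavaScript, so the HTML that `requests` receives generally doesn't contain those class names at all. Separately, when a tag carries several classes, a CSS selector is the reliable way to require all of them in BeautifulSoup. A minimal sketch of the selector approach, assuming (for illustration only) that the classes did appear in static HTML:

[CODE]
# Sketch, assuming static HTML. Instagram generally renders with JavaScript,
# so requests/BeautifulSoup may never see these classes at all.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://instagram.com/xenia/",
                    headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

# select() with a chained CSS selector requires all three classes on the tag;
# passing a list to class_ would match tags with ANY one of them instead.
for link in soup.select("a._2g7d5.notranslate._7qk7w"):
    print(link.get("href"), link.string)
[/CODE]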

+0 forum 1

Hi. I'm creating a web crawler and I want to send the output to a txt file. How can I do that? I also want to give the script a path to set the output directory. How can I do that?
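A minimal sketch of both pieces, assuming the crawler already produces lines of text; the `--outdir` flag and the `crawl()` function here are hypothetical placeholders:

[CODE]
# Write crawler output to a txt file in a directory given on the command line.
import argparse
import os

def crawl():
    # stand-in for the real crawler
    yield "http://example.com/page1"
    yield "http://example.com/page2"

parser = argparse.ArgumentParser()
parser.add_argument("--outdir", default=".", help="directory for the output file")
args = parser.parse_args()

os.makedirs(args.outdir, exist_ok=True)
with open(os.path.join(args.outdir, "output.txt"), "w") as f:
    for line in crawl():
        f.write(line + "\n")
[/CODE]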

+0 forum 1

Hello. I was looking for a tutorial or an example of creating a web crawler when I found this code somewhere, and I copied and pasted it to test it. First, it is a web crawler, right? Because when I gave it the URL of a website, the output was some links printed on the terminal. Second, if you test it yourself, you will see that the links are divided into sections with the title `Scanning depth 1 web` and so on (the number changes). What is that for? What does it mean? What does the depth number mean? Third, I …
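For what it's worth, the "depth" in crawler output usually means how many link-hops a page is from the starting URL: depth 1 is the links on the start page, depth 2 the links found on those pages, and so on. A minimal sketch of a depth-limited crawl (my own illustration, not the code from the question):

[CODE]
# Breadth-first crawl, one "depth" level at a time.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(start_url, max_depth=2):
    seen = {start_url}
    frontier = [start_url]
    for depth in range(1, max_depth + 1):
        print(f"Scanning depth {depth} web")
        next_frontier = []
        for url in frontier:
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                if link not in seen:
                    seen.add(link)
                    print(link)
                    next_frontier.append(link)
        frontier = next_frontier

crawl("http://example.com/")
[/CODE]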

+0 forum 5

Hello. I have a homework assignment: I've been asked to create a web crawler that can enter a music website and then, as the first step, collect the names of singers whose names start with the letter "A". Now I need a little help with this step. How should my crawler understand which words on that page are the singers' names?! The crawler should find their names in a special tag, correct?! But what kind of tag?! Their names could be in any tag, like <h4></h4> for example, or in a single <p></p> tag, or in a …
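There is no universal "name tag": the usual approach is to open the target site in a browser, inspect the markup around the artist names with the developer tools, and write a selector for whatever pattern that particular site uses. A sketch against a hypothetical site where each artist is a link inside an element with class `artist-list` (both the URL and the class are made up):

[CODE]
# Sketch: the selector must be adapted to the real site's markup.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://example-music-site.test/artists", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

names = [a.get_text(strip=True) for a in soup.select(".artist-list a")]
a_names = [n for n in names if n.upper().startswith("A")]
print(a_names)
[/CODE]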

+0 forum 12

Hello. I'm trying to create a web crawler. I've read about a web crawler's duties, how it works, and what it does, but I just need more information. Could you please tell me what a web crawler can do? What kinds of duties can I define for my web crawler? What can I ask it to do?

+0 forum 4

Hi again. I want to create a robot or spider or crawler with Python's urllib. I still couldn't find any good tutorial. Any suggestions?!
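For reference, a minimal spider using only the standard library's `urllib` and `html.parser` (Python 3), which fetches one page and lists the absolute URLs of its links:

[CODE]
# Standard-library-only link spider: no third-party packages needed.
from urllib.request import urlopen, Request
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))

url = "http://example.com/"
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req).read().decode("utf-8", "replace")
parser = LinkParser(url)
parser.feed(html)
print(parser.links)
[/CODE]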

+0 forum 3

Hello. Please help me: I need pseudocode, an algorithm, or a flowchart of the Weighted PageRank algorithm that a search engine can use for ranking the websites in its search results. Please help. Thanks in advance.
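As a starting point, here is a compact sketch of one common formulation of Weighted PageRank (Xing and Ghorbani, 2004): like standard PageRank, except that a page distributes its score to its out-neighbours in proportion to their in-link and out-link counts rather than equally. Treat it as executable pseudocode to adapt, not a tuned implementation:

[CODE]
# Weighted PageRank sketch. graph: {page: [pages it links to]};
# every page is assumed to appear as a key.
def weighted_pagerank(graph, d=0.85, iterations=50):
    pages = list(graph)
    in_links = {p: 0 for p in pages}
    out_links = {p: len(graph[p]) for p in pages}
    for p in pages:
        for q in graph[p]:
            in_links[q] += 1

    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for u in pages:
            rank = 0.0
            for v in pages:
                if u in graph[v]:
                    # weight edge (v, u) by u's in/out degree relative to
                    # all of v's out-neighbours
                    w_in = in_links[u] / max(1, sum(in_links[q] for q in graph[v]))
                    w_out = out_links[u] / max(1, sum(out_links[q] for q in graph[v]))
                    rank += pr[v] * w_in * w_out
            new_pr[u] = (1 - d) + d * rank
        pr = new_pr
    return pr

demo = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(weighted_pagerank(demo))
[/CODE]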

+0 forum 0

Hello. Please help me to understand the following: What are compilation-time parameters? Are execution time and run time the same? Please answer the above in relation to web development. Also, is this line correct? "Our thesis Main objective is to use effective web-crawling by the means of cluster analysis and page ranking algorithms to make it enhanced in terms of Execution Time, Compilation Time Parameters."

+0 forum 0

Hi folks. I want to build a universal website crawler using PHP, so my crawler will work on any given site. Using my web application, the user will input any site, specify what he needs to get from that site, and click the Start button. Then my web application will begin to get data from the source website. I am using an iframe for this purpose: I load the page in an iframe, and using jQuery I get the class and tag names of the specific area from the user. But when I load an external website like eBay or Amazon etc. it does not work, as …
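The usual culprit with the iframe approach is the browser's same-origin policy: JavaScript is not allowed to read into an iframe served from another domain, which is exactly what happens with eBay or Amazon. The common workaround is to fetch and parse the page on the server instead. A sketch of that idea, shown in Python for illustration (the same approach ports to PHP with cURL plus a DOM parser):

[CODE]
# Server-side alternative to the cross-origin iframe: fetch the page
# yourself and apply whatever selector the user picked.
import requests
from bs4 import BeautifulSoup

def extract(url, css_selector):
    # css_selector is whatever the user chose, e.g. "div.product-title"
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"},
                        timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css_selector)]

print(extract("http://example.com/", "h1"))
[/CODE]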

+0 forum 1

Hello everyone, maybe someone out there can help me. I have this code that I am using, and for the life of me it keeps throwing an error:

[CODE]
If Len(ComboBox1.Text) < 1 Then
    MsgBox("You need to specify a target for this to operate properly.", vbExclamation, "Error")
    Exit Sub
End If

GetCleanURL()
WebBrowser1.Navigate(Server_Addy_RAW)

Dim links As HtmlElementCollection = WebBrowser1.Document.Links
For Each link As HtmlElement In links
    Linkbox.Items.Add(link.GetAttribute("href"))
Next
[/CODE]

The point of the code is to navigate to a web site, pull the links, and add them to a list box. When that is done it keeps crawling the web …

+0 forum 3

Apologies first if this is covered elsewhere; I searched but could not find it. I am looking for a way to search the web for the presence of a JavaScript code snippet within the HTML <body> of a web page. I would specify the code snippet and send the bot on its way, and it would come back with either a number of results or a list of pages. I realise there are billions of web pages, so I don't know whether this is feasible or not. The purpose is to determine the number of participating sites in a particular network. (Currently …
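Crawling the whole web from one machine isn't realistic, but the check itself is simple if you start from a candidate list of sites and test each page's `<body>` for the snippet. A sketch, with the snippet and the site list as placeholders:

[CODE]
# Check a candidate list of sites for a code snippet inside <body>.
# SNIPPET and the site list are placeholders for the real values.
import requests
from bs4 import BeautifulSoup

SNIPPET = "example-network.js"
candidates = ["http://example.com/", "http://example.org/"]

hits = []
for url in candidates:
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue
    body = BeautifulSoup(html, "html.parser").body
    if body and SNIPPET in str(body):
        hits.append(url)

print(len(hits), "participating sites:", hits)
[/CODE]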

+0 forum 12

How can I create a PHP script that will search websites for articles that contain a given string?

+0 forum 1

[CODE]
import java.net.*;
import java.io.*;
import java.util.Date;
import java.util.*;

public class crawling {
    // Properties
    private String startUrl;
    private int maxUrls = -1;
    private boolean limitHost = false;
    private boolean crawling = false;
    private Date currentDate;
    private int counter = 0;

    // Set up crawl lists.
    HashSet crawledList = new HashSet();
    LinkedHashSet toCrawlList = new LinkedHashSet();

    // Cache of robot disallow lists.
    private HashMap disallowListCache = new HashMap();

    // User-defined crawler object variables
    WebPage pageContents;
    dbOracle webDB;
    UrlManager currentUrl;

    // Holds the configuration file name
    String configFile = null;

    // Constructor
    public crawling() throws Exception {
        configFile = "crawler.conf";
        crawling = false;
        counter = 0;
        readConfigFile(configFile);
…
[/CODE]

+0 forum 1

Hello, I'm doing a small project that aims to develop a search engine. What I need is a web crawler that can scrape backlinks for a website. I need to scrape: Title, Page Text, Page Size, Backlinks. Does anyone know of a good web crawler that can give me that? [I]Need help![/I]
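Of the four fields, title, page text, and page size can be read straight off each fetched page; backlinks cannot, since they live on other sites, so they need either a wider crawl or a link-index service. A sketch of the per-page part:

[CODE]
# Per-page fields: title, visible text, and response size in bytes.
# Backlinks require crawling other sites and are not shown here.
import requests
from bs4 import BeautifulSoup

def page_info(url):
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    return {
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "text": soup.get_text(" ", strip=True),
        "size_bytes": len(resp.content),
    }

print(page_info("http://example.com/"))
[/CODE]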

+0 forum 0

I want to report dead links on my site. My current script does work, but it allows search-engine bots to click the dead-link report, which makes it hard for me to determine which reports are from visitors and which are from bots. What am I doing wrong here? The bots are still able to click the link and send the report to me. Any help is much appreciated. Thanks.

[CODE]
$x = $_GET['Deadlink'];
$agent = $_GET['agent'];
$bots = array(
    'bing'   => 'http://onlinehelp.microsoft.com/en-us/bing/gg132928.aspx',
    'yahoo'  => 'http://help.yahoo.com/help/us/ysearch/slurp',
    'google' => 'http://www.google.com/support/webmasters/bin/answer.py?answer=182072',
    'ask'    => 'http://www.ask.com/questions-about/Webmaster-Tool',
);
$agent = strtolower($_SERVER['HTTP_USER_AGENT']);
foreach ($bots as $name => $bot) {
    if (stripos($agent, $bot) !== false) {
…
[/CODE]
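One thing that stands out in the loop above: it tests the user agent against each bot's documentation URL (`$bot`) rather than its name (`$name`), and real bot user agents don't contain those help-page URLs, so the check never matches. A sketch of matching on the names instead, shown in Python for illustration (in PHP, the fix would be to test `$name` in the `stripos()` call):

[CODE]
# Match the user agent against bot NAMES, not their documentation URLs.
# The sample user agent below is Googlebot's published string.
agent = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
         "+http://www.google.com/bot.html)").lower()

bot_names = ["googlebot", "bingbot", "slurp", "ask jeeves"]

is_bot = any(name in agent for name in bot_names)
print("bot" if is_bot else "visitor")
[/CODE]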

+0 forum 3

I have many redirect scripts on my site (they compute some PHP, then redirect to the relevant page); however, on Google Webmaster Tools they all kept coming up as "Soft 404" errors, which I read are bad for PR. A while ago I restricted Googlebot's access to my /site/ folder, which contains all these redirect scripts, to prevent this, and that has worked fine. However, I'm concerned this might be preventing the crawler from actually navigating the site to get to other pages. Is it safe to keep these redirect scripts restricted? Will Googlebot still be able to look around and sort …

+0 forum 8

My code is supposed to crawl web pages, index the links, then crawl those web pages, and on and on again! But it won't work. I get no errors; what is wrong? I think it gets into the foreach but doesn't make it to the $DCheck if statement!

[CODE]
<?php
if (empty($_SESSION['page'])) {
    $original_file = file_get_contents("http://www.yahoo.com/");
} else {
    $original_file = file_get_contents($_SESSION['page']);
}

$stripped_file = strip_tags($original_file, "<a>");
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);

//DEBUGGING
//$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
//$matches[1] now contains only the HREFs in the A tags; ex: link

foreach ($matches[1] as $key => $value) {
    echo "1";
…
[/CODE]

+0 forum 5

Okay, let's say I open the URL [url]http://www.nothing.com/[/url] using PHP's fopen function. Then I do this:

[CODE]
<?php
$S = "sss";
$url = "http://www.nothing.com/" . $S . "&ql=1";
$open = fopen($url, "r");
while (!feof($open)) {
    echo fgets($open) . "<br />";
}
?>
[/CODE]

That returns all of the page I'm looking at. My goal is to have some sort of PHP function to collect specific information from the page. For example: I want to crawl the page and then find where it says Name:, and next I want to see what is next to it. Example: Name: Bob York. Then …
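For the extraction step, a regular expression anchored on the label is the usual tool; in PHP that would be `preg_match()`. A sketch of the idea in Python, with made-up page content around the `Name:` label from the question:

[CODE]
# Pull a labelled value ("Name: ...") out of fetched page text.
# page_text is a stand-in for the real fetched page.
import re

page_text = "... profile ... Name: Bob York ... Age: 42 ..."

match = re.search(r"Name:\s*([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)", page_text)
if match:
    print(match.group(1))   # -> Bob York
[/CODE]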

+0 forum 3

So I'm building a web crawler for a pet project I've been working on. I'm using tutorial code for the crawler, then building on it. I've done extensive troubleshooting and haven't had any luck.

The problem:
[LIST]
[*]Roughly half the websites return content, but all of them return headers.
[*]Some websites return content for some pages but not for others.
[/LIST]

What I've tried:
[LIST]
[*]Setting the user agent to my browser's user agent to ensure it's not the robots.txt file.
[*]Comparing headers from all the sites to see if there is any pattern in the headers. That is, all …
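A small diagnostic pass can narrow this down: for each site, log the status code, the final URL after redirects, and the body length. Sites that build their content with JavaScript will also look "empty" to a plain HTTP client even though the headers arrive fine. A sketch, with placeholder site URLs:

[CODE]
# Diagnostic for the "headers but no body" symptom.
import requests

sites = ["http://example.com/", "http://example.org/"]   # placeholders

for url in sites:
    try:
        resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"},
                            timeout=10, allow_redirects=True)
        # status, where redirects ended up, and how much body came back
        print(url, resp.status_code, resp.url, len(resp.text))
    except requests.RequestException as exc:
        print(url, "failed:", exc)
[/CODE]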

+1 forum 4

I found a crawl-detection script on the internet that works great, but at the moment it only detects Googlebot. I'm trying to add Yahoo's Slurp, Ask Jeeves, and any other popular web crawler. PHP is definitely my weakness. Here's the script:

[CODE]
<?php
$useragent = $_SERVER["HTTP_USER_AGENT"];
if (stripos($useragent, "Googlebot")) {
    //Found! Create file
    $file = fopen("crawled.txt", "a");
    fwrite($file, "You've been crawled\n");
}
?>
[/CODE]

It writes to a text file called crawled.txt. I'm trying to add extra crawlers to the script, along with the date.
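A sketch of the same idea extended to several crawlers plus a date stamp, shown in Python for illustration. One PHP-specific caution when porting it back: `stripos()` returns the match position, and a match at position 0 is falsy, so the test should be `stripos(...) !== false`:

[CODE]
# Log a timestamped line when a known crawler's name appears in the UA.
# "teoma" is the crawler name Ask Jeeves has used in its user agent.
from datetime import datetime

BOTS = ["googlebot", "slurp", "bingbot", "ask jeeves", "teoma"]

def log_if_bot(user_agent, logfile="crawled.txt"):
    agent = user_agent.lower()
    for bot in BOTS:
        if bot in agent:
            with open(logfile, "a") as f:
                f.write(f"{datetime.now():%Y-%m-%d %H:%M:%S} crawled by {bot}\n")
            break

log_if_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
[/CODE]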

+0 forum 1

The End.