30 Topics

Member Avatar for borobhaisab

@dani I got ChatGPT to fix my crawler that was showing an error. This is the code that was showing the error: My Buggy Code ```` <?php //START OF SCRIPT FLOW. //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //SiteMaps Details Scraped …

Member Avatar for borobhaisab
1
58
Member Avatar for borobhaisab

Experienced fellow programmers, I am asking those of you who have experience with web crawlers. I do not want my web crawler getting trapped on some domain while crawling it, stuck going in a loop for some reason. So, what should I look out for to prevent loops? 1. I know …

Member Avatar for borobhaisab
0
72
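
One common way to approach the loop question above is to canonicalize each URL, remember what has already been fetched, and cap both link depth and pages per host. The following is a minimal sketch under those assumptions, not a definitive implementation; `LoopGuard`, `canonicalize`, and the default limits are hypothetical names and values chosen for illustration.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def canonicalize(url):
    """Drop the #fragment and lowercase scheme and host so that
    trivially different spellings of one URL compare equal."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class LoopGuard:
    """Decides whether a URL should be fetched (hypothetical helper)."""

    def __init__(self, max_pages_per_host=500, max_depth=5):
        self.max_pages_per_host = max_pages_per_host  # assumed cap
        self.max_depth = max_depth                    # assumed cap
        self.seen = set()
        self.pages_per_host = {}

    def should_fetch(self, url, depth):
        key = canonicalize(url)
        host = urlsplit(key).netloc
        if depth > self.max_depth:
            return False  # too many hops from the start page
        if key in self.seen:
            return False  # already fetched or queued
        if self.pages_per_host.get(host, 0) >= self.max_pages_per_host:
            return False  # this host has used up its budget
        self.seen.add(key)
        self.pages_per_host[host] = self.pages_per_host.get(host, 0) + 1
        return True
```

The per-host cap is what limits the damage from sites that generate endless distinct URLs (calendars, session IDs in query strings), since a visited set alone cannot catch those.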
Member Avatar for borobhaisab

@dani I got ChatGPT to fix my crawler that was showing an error. This is the code that was showing the error: My Buggy Code ```` <?php //START OF SCRIPT FLOW. //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //SiteMaps Details Scraped …

Member Avatar for AndreRet
1
90
Member Avatar for borobhaisab

Hiya, I do not understand why this crawler fails to echo found links on a page. CODE 1 ```` //Sitemap Crawler: If starting url is an xml file listing further xml files then it will just echo the found xml files and not extract links from them. //Sitemap Protocol: https://www.sitemaps.org/protocol.html …

Member Avatar for borobhaisab
0
132
Member Avatar for dado.d

I was crawling http://www.imenik.hr (more specifically, http://www.imenik.hr/imenik/trazi/1/ptt:51551.html) and it was working. I used the PHP Symfony crawler, and I even used proxies. I was testing it locally, so I tried without proxies. I added 5 seconds of sleep time between requests, I tried a random sleep between requests, and I tried 10 seconds. …

0
25
Member Avatar for Stefce

Hello all, is there some Python code or program to crawl Instagram? I would like to get the user's username, bio and 5-6 random pics. Thank you

Member Avatar for peter_62
0
662
Member Avatar for Stefce

Hello, so I tried to make my own Insta crawler but I'm having some difficulties. Here is the code for now: import requests from bs4 import BeautifulSoup def insta_spider(max_pages): page = 1 while page <= max_pages: url = 'https://instagram.com/xenia/' source_code = requests.get(url) plain_text = source_code.text soup = BeautifulSoup(plain_text, "html.parser") …

Member Avatar for rproffitt
0
2K
Member Avatar for Niloofar24

Hi. I'm creating a web crawler and I want to send the output into a txt file. How can I do it? I also want to give a path to the script to set the directory. How can I do that?

Member Avatar for Gribouillis
0
367
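
For the question above about sending crawler output to a text file in a chosen directory, one possible approach uses `pathlib` for the file handling and `argparse` for the path argument. This is only a sketch: `save_links`, `parse_args`, the `--out-dir` flag, and the `links.txt` default are hypothetical names, not taken from the thread.

```python
import argparse
from pathlib import Path


def save_links(links, out_dir, filename="links.txt"):
    """Write one link per line to out_dir/filename and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target = out_path / filename
    target.write_text("\n".join(links) + "\n", encoding="utf-8")
    return target


def parse_args(argv=None):
    """Read the output directory from the command line."""
    parser = argparse.ArgumentParser(description="Save crawled links to a text file.")
    parser.add_argument("--out-dir", default=".", help="directory for the output file")
    return parser.parse_args(argv)
```

A script built on this could call `args = parse_args()` and then `save_links(found_links, args.out_dir)`, where `found_links` stands for whatever list of URLs the crawler collected.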
Member Avatar for Niloofar24

Hello. I was looking for a tutorial or an example of creating a web crawler when I found this code somewhere and copied and pasted it to test it: First, it is a web crawler, right? Because when I gave it the URL of a website, the output was that some links were …

Member Avatar for vegaseat
0
436
Member Avatar for Niloofar24

Hello. I have homework. I have been asked to create a web crawler that can enter a music website and then, as the first step, collect the names of singers whose names start with the letter "A". Now I need a little help with this step. …

Member Avatar for iJunkie22
0
374
Member Avatar for Niloofar24

Hello. I'm trying to create a web crawler. I've read about a web crawler's duty, how it works, and what it does, but I just need more information. Could you please tell me what a web crawler can do? What kind of duty can I define for my web …

Member Avatar for Niloofar24
0
325
Member Avatar for Niloofar24

Hi again. I want to create a robot or spider or crawler with Python urllib. I still couldn't find any good tutorial. Any suggestions?

Member Avatar for vegaseat
0
416
Member Avatar for crescendo

What are the important aspects to keep in mind while creating a website so that it is crawlable? My previous websites are not crawled by Google. I created those websites using WordPress.

Member Avatar for Kelly Burby
0
175
Member Avatar for hbk_star2006

Hello. Please help me. I need pseudocode, an algorithm, or a flowchart of the Weighted PageRank algorithm that a search engine can use for ranking websites in its search results. Please help. Thanks in advance.

0
145
Member Avatar for hbk_star2006

Hello. Please help me to understand the following: What are compilation-time parameters? Are execution time and run time the same? Please answer the above in relation to web development. Also, is this line correct? Our thesis's main objective is to use effective web crawling by means of cluster analysis …

0
152
Member Avatar for Nadeem_2

Hi folks, I want to build a universal website crawler using PHP, so my crawler will work on any given site. Using my web application, the user will input any site, specify what he needs to get from the given site, and click the Start button. Then my web application …

Member Avatar for cereal
0
400
Member Avatar for amvx86

Hello everyone, Maybe someone out there can help me. I have this code that I am using and for the life of me it keeps throwing an error: If Len(ComboBox1.Text) < 1 Then MsgBox("You need to specify a target for this to operate properly.", vbExclamation, "Error") Exit Sub End If …

Member Avatar for amvx86
0
350
Member Avatar for apanimesh061

Are the crawled URLs stored in the database and then traversed in a BFS/DFS manner, or is it something else? Please help!

Member Avatar for LastMitch
0
93
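
To make the BFS/DFS question above concrete, here is a minimal sketch of a crawl frontier kept in memory rather than in a database. It assumes a caller-supplied `get_links(url)` function that returns the outgoing links of a page; both `crawl_order` and `get_links` are hypothetical names used for illustration only.

```python
from collections import deque


def crawl_order(start, get_links, max_pages=100):
    """Return URLs in the order a breadth-first crawl would visit them."""
    frontier = deque([start])  # URLs waiting to be fetched
    visited = {start}          # URLs already queued, so none is queued twice
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()  # FIFO gives BFS; frontier.pop() would give DFS
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order
```

Swapping the in-memory `deque` and `set` for database tables would not change the traversal logic, only where the frontier and the visited set are stored.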
Member Avatar for apanimesh061

<?php include 'simple_html_dom.php'; function get_url_contents($url){ $crl = curl_init(); $timeout = 5; curl_setopt ($crl, CURLOPT_URL,$url); curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1); curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout); $ret = curl_exec($crl); curl_close($crl); return $ret; } $url = 'http://rural.nic.in'; $outhtml = get_url_contents($url); $html= str_get_html($outhtml); foreach($html->find('li')as $item) { echo $item."<br>"; } //print_r($outhtml); ?> I wish to print only …

Member Avatar for adam.adamski.96155
0
91
Member Avatar for monsterpot

Apologies first if this is covered elsewhere - I searched but could not find. I am looking for a way to search the web for the presence of a JavaScript code snippet within the HTML <body> of a web page. I would specify the code snippet and send the bot …

Member Avatar for aqeel.mmd
0
936
Member Avatar for dan_code_guru

How can I create a PHP script that will search websites for articles that contain a string?

Member Avatar for JameB
0
136
Member Avatar for Priya Dharsini

import java.net.*; import java.io.*; import java.util.Date; import java.util.*; public class crawling { //Properties private String startUrl; private int maxUrls = -1; private boolean limitHost = false; private boolean crawling = false; private Date currentDate; private int counter=0; // Set up crawl lists. HashSet crawledList = new HashSet(); LinkedHashSet toCrawlList = …

Member Avatar for stultuske
0
318
Member Avatar for eoop.org

Hello, I'm doing a small project that aims to develop a search engine. What I need is a web crawler that can scrape backlinks for a website. I need to scrape: Title, PageText, Page Size, Back Links. Does anyone know of a good web crawler that can give me that? …

0
135
Member Avatar for Blitz-labs.com

I want to report dead links on my site. My current script does work, but it allows search engine bots to click the dead link report, which makes it hard for me to determine which reports are from visitors and which are from bots. What am I doing wrong here? As …

Member Avatar for Blitz-labs.com
0
198
Member Avatar for Pinchanzee

I have many redirect scripts on my site (they compute some PHP then redirect to the relevant page) - however on google webmaster they all kept coming up as "Soft-404" errors, which I read are bad for PR. A while ago I restricted googlebot's access to my /site/ folder, which …

Member Avatar for Pinchanzee
0
254
Member Avatar for Joe34

My code is supposed to crawl web pages, index the links, then crawl those web pages, and on and on again! But it won't work. I get no errors, so what is wrong? I think it gets into the foreach but doesn't make it to the $DCheck if statement! [CODE]<?php if(empty($_SESSION['page'])) …

Member Avatar for lordspace
0
242
Member Avatar for Joe34

Okay, let's say I open the URL [url]http://www.nothing.com/[/url] using PHP's fopen function. Then I do this:
[CODE]
<?php
// Build the target URL from the base address and the search term.
$S = "sss";
$url = "http://www.nothing.com/" . $S . "&ql=1";
$open = fopen($url, "r");
// Print the page one line at a time.
while (!feof($open)) {
    echo fgets($open) . "<br />";
}
fclose($open);
?>
[/CODE]
That above returns all of the page I'm looking …

Member Avatar for chrishea
0
183
Member Avatar for blur0224

So I'm building a web crawler for a pet project I've been working on. I'm using tutorial code for the crawler then building on it. I've done extensive troubleshooting and haven't had any luck. The problem: [LIST] [*]Roughly half the websites return content, but all of them return headers. [*]Some …

Member Avatar for blur0224
1
291
Member Avatar for JayGeePee

I found a crawl detection script on the internet that works great. But at the moment I only have Googlebot. I'm trying to add Yahoo! Slurp, Ask Jeeves, and any other popular web crawler. PHP is definitely my weakness. Here's the script: [CODE] <?php $useragent = $_SERVER["HTTP_USER_AGENT"]; if (stripos($useragent,"Googlebot")) { …

Member Avatar for mbhanley
0
175
Member Avatar for theemerchant

3 days after I resubmitted my sitemap.xml on GWT, I saw a bunch of crawl errors on my GWT. My problem is that GWT crawls a page on my site that doesn't exist. I mean the URL does not exist; the page is a 404 Not Found and it appears on crawl …

Member Avatar for Tiggerito
0
95

The End.