@dani I got ChatGPT to fix my crawler, which was showing an error. This is the code that was showing the error: My Buggy Code <?php //START OF SCRIPT FLOW. //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //SiteMaps Details Scraped …
|
Experienced fellow programmers, I am asking this of those who have experience with web crawlers. I do not want my web crawler to get trapped on some domain while crawling it, stuck and going in a loop for some reason. So, what should I look out for to prevent loops? 1. I know …
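The usual defences against crawler traps are a "seen" set keyed on a normalised URL, plus hard limits on depth, URL length, repeated path segments and pages per host. A minimal Python sketch of those heuristics, assuming illustrative thresholds (the names `normalize_url` and `CrawlGuard` and every default limit are hypothetical, not taken from any library):

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalise a URL so trivially different spellings share one key."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urllib lowercases .hostname; user:pass is dropped
    port = parts.port
    default_port = {"http": 80, "https": 443}.get(scheme)
    if port is not None and port != default_port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the #fragment: the server never sees it, so it cannot change the page.
    return urlunsplit((scheme, host, path, query, ""))


class CrawlGuard:
    """Hypothetical helper: decides whether a discovered URL is worth fetching."""

    def __init__(self, max_depth=5, max_pages_per_host=500,
                 max_url_length=2000, max_segment_repeats=3):
        self.max_depth = max_depth
        self.max_pages_per_host = max_pages_per_host
        self.max_url_length = max_url_length
        self.max_segment_repeats = max_segment_repeats
        self.seen = set()
        self.pages_per_host = Counter()

    def allow(self, url: str, depth: int) -> bool:
        # Depth and length limits stop endless "next page" or session-id chains.
        if depth > self.max_depth or len(url) > self.max_url_length:
            return False
        key = normalize_url(url)
        if key in self.seen:  # already queued or fetched: the classic loop
            return False
        parts = urlsplit(key)
        # Paths like /a/b/a/b/a/b usually mean relative links are recursing.
        segments = [s for s in parts.path.split("/") if s]
        if segments and max(Counter(segments).values()) > self.max_segment_repeats:
            return False
        host = parts.hostname
        if self.pages_per_host[host] >= self.max_pages_per_host:
            return False
        self.seen.add(key)
        self.pages_per_host[host] += 1
        return True
```

A crawler would call `guard.allow(link, depth + 1)` before queueing each extracted link; the per-host cap is what finally stops traps such as infinite calendars that generate genuinely distinct URLs.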
|
Hiya, I do not understand why this crawler fails to echo the links it finds on a page. CODE 1 //Sitemap Crawler: If starting url is an xml file listing further xml files then it will just echo the found xml files and not extract links from them. //Sitemap Protocol: https://www.sitemaps.org/protocol.html …
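Under the sitemap protocol referenced above, a sitemap file has one of two root elements: `<sitemapindex>`, whose `<sitemap><loc>` entries point at further sitemap files, or `<urlset>`, whose `<url><loc>` entries point at ordinary pages. A crawler therefore has to check which kind it received before deciding whether to recurse or to collect page links. A minimal Python sketch using the standard library's ElementTree (the function name `parse_sitemap` is hypothetical):

```python
import xml.etree.ElementTree as ET


def parse_sitemap(xml_text):
    """Return ("sitemapindex" | "urlset", [loc, ...]) for a sitemap document."""
    root = ET.fromstring(xml_text)
    # Tags arrive as "{namespace}name"; reuse whatever namespace the file declares.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    kind = root.tag[len(ns):]
    if kind == "sitemapindex":
        child = "sitemap"  # each entry points at another sitemap file
    elif kind == "urlset":
        child = "url"      # each entry points at an ordinary page
    else:
        raise ValueError(f"not a sitemap: root element is <{kind}>")
    locs = [
        loc.text.strip()
        for loc in root.findall(f"{ns}{child}/{ns}loc")
        if loc.text and loc.text.strip()
    ]
    return kind, locs
```

If the result is `("sitemapindex", [...])`, fetch each listed file and call `parse_sitemap` on it again; only a `("urlset", [...])` result contains page links to echo or crawl.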
|
I was crawling http://www.imenik.hr (more specifically: http://www.imenik.hr/imenik/trazi/1/ptt:51551.html) and it was working. I used the PHP Symfony crawler, and I even used proxies. I was testing it locally, so I also tried without proxies. I added 5 seconds of sleep time between requests, I tried a random sleep between requests, I tried 10 seconds. …
|
Hello all, is there some Python code or program to crawl Instagram? I would like to get the user's username, bio and 5-6 random pics. - Thank you
|
Hello, so I tried to make my own Insta crawler but I'm having some difficulties. Here is the code for now: import requests from bs4 import BeautifulSoup def insta_spider(max_pages): page = 1 while page <= max_pages: url = 'https://instagram.com/xenia/' source_code = requests.get(url) plain_text = source_code.text soup = BeautifulSoup(plain_text, "html.parser") …
|
Hi. I'm creating a web crawler and I want to send the output to a txt file. How can I do it? I also want to pass a path to the script to set the output directory. How can I do that?
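Both parts map onto the standard library: `argparse` to accept the directory as a command-line argument and `pathlib` to create the directory and write the file. A minimal sketch, assuming the crawler has already collected its links into a list (the function names `write_links` and `main`, the `--filename` option and the example links are hypothetical stand-ins):

```python
import argparse
from pathlib import Path


def write_links(links, out_dir, filename="links.txt"):
    """Write one link per line to out_dir/filename and return the file's path."""
    target_dir = Path(out_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    target = target_dir / filename
    with target.open("w", encoding="utf-8") as fh:
        for link in links:
            fh.write(f"{link}\n")
    return target


def main(argv=None):
    parser = argparse.ArgumentParser(description="Save crawler output to a text file.")
    parser.add_argument("out_dir", help="directory in which to write the output file")
    parser.add_argument("--filename", default="links.txt", help="output file name")
    args = parser.parse_args(argv)
    # Stand-in for whatever the real crawler collects (hypothetical data).
    found = ["https://example.com/", "https://example.com/about"]
    path = write_links(found, args.out_dir, args.filename)
    print(f"wrote {len(found)} links to {path}")
    return path
```

Adding `if __name__ == "__main__": main()` at the bottom lets the script be run as, for example, `python crawler.py ./output`. If the crawler only prints its results, shell redirection (`python crawler.py > links.txt`) is an even simpler alternative.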
|
Hello. I was looking for a tutorial or any example of creating a web crawler when I found this code somewhere, and I copied and pasted it to test it: First, it is a web crawler, right? Because when I gave it the URL of a website, the output was some links that were …
|
Hello. I have a homework assignment. I have been asked to create a web crawler that can enter a music website and, as a first step, collect the names of singers whose names start with the letter "A". Now I need a little help with this step. …
|
Hello. I'm trying to create a web crawler. I've read about a web crawler's duties, how it works and what it does, but I need more information. Could you please tell me what a web crawler can do? What kind of duties can I define for my web …
|
Hi again. I want to create a robot, spider or crawler with Python urllib, but I still couldn't find any good tutorial. Any suggestions?
|
What are the important aspects to keep in mind when creating a website so that it is crawlable? My previous websites were not crawled by Google. I created those websites using WordPress.
|
Hello, please help me. I need pseudocode, an algorithm or a flowchart of the Weighted PageRank algorithm that a search engine can use to rank websites in its search results. Thanks in advance.
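A sketch of Weighted PageRank as it is commonly stated in the literature (the formulation usually attributed to Xing and Ghorbani, 2004). Here B(u) is the set of pages linking to u, R(v) the set of pages v links to, I_x and O_x the in-link and out-link counts of page x, and d the damping factor:

WPR(u) = (1 - d) + d * Σ_{v in B(u)} WPR(v) * W_in(v,u) * W_out(v,u), where W_in(v,u) = I_u / Σ_{p in R(v)} I_p and W_out(v,u) = O_u / Σ_{p in R(v)} O_p.

The Python below iterates that equation to a fixed point. It is an illustrative implementation under those definitions; ignoring self-links and treating an all-zero out-link sum as weight 0 are assumptions of this sketch, not part of the cited formula.

```python
def weighted_pagerank(graph, d=0.85, max_iter=100, tol=1e-9):
    """graph maps each page to the pages it links to; returns {page: score}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    # R(v): pages v links to (self-links and duplicates ignored).
    out_links = {n: {t for t in graph.get(n, ()) if t != n} for n in nodes}
    # B(u): pages that link to u.
    in_links = {n: set() for n in nodes}
    for v, targets in out_links.items():
        for u in targets:
            in_links[u].add(v)
    n_in = {n: len(in_links[n]) for n in nodes}    # I_n
    n_out = {n: len(out_links[n]) for n in nodes}  # O_n

    # Edge weight W_in(v,u) * W_out(v,u), computed once up front.
    weight = {}
    for v, refs in out_links.items():
        in_sum = sum(n_in[p] for p in refs)    # >= len(refs): v links to each p
        out_sum = sum(n_out[p] for p in refs)  # 0 if every p is a dead end
        for u in refs:
            w_in = n_in[u] / in_sum
            w_out = n_out[u] / out_sum if out_sum else 0.0
            weight[(v, u)] = w_in * w_out

    # Start every page at 1.0 and iterate until scores stop changing.
    score = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new = {
            u: (1 - d) + d * sum(score[v] * weight[(v, u)] for v in in_links[u])
            for u in nodes
        }
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score
```

For each page v the products W_in * W_out over its out-links sum to at most 1, so with d < 1 the iteration converges. Sorting pages by the returned score gives the ranking.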
|
Hello, please help me understand the following: What are compilation-time parameters? Are execution time and run time the same? Please answer the above in relation to web development. Also, is this line correct? Our thesis Main objective is to use effective web-crawling by the means of cluster analysis …
|
Hi folks, I want to build a universal website crawler using PHP, so that my crawler will work on any given site. Using my web application, the user will enter any site, specify what he needs to get from that site and click the Start button. Then my web application …
|
Hello everyone, maybe someone out there can help me. I have this code that I am using, and for the life of me it keeps throwing an error: If Len(ComboBox1.Text) < 1 Then MsgBox("You need to specify a target for this to operate properly.", vbExclamation, "Error") Exit Sub End If …
|
Are the crawled URLs stored in the database and then traversed in a BFS/DFS manner, or is it something else? Please help!
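Where the URLs are stored (memory or a database) and in what order they are visited are separate choices. The visiting order comes from how the "frontier" of discovered-but-not-yet-fetched URLs is consumed: first-in-first-out gives breadth-first order, last-in-first-out gives depth-first order, and many real crawlers use a priority queue instead. A minimal breadth-first sketch, assuming a caller-supplied `get_links(url)` function that fetches a page and returns its links (both `bfs_crawl` and `get_links` are hypothetical names):

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(seed: str, get_links: Callable[[str], Iterable[str]],
              max_pages: int = 100) -> List[str]:
    """Visit pages breadth-first from seed; return them in visit order."""
    frontier = deque([seed])  # FIFO queue: oldest discovery is fetched first
    seen = {seed}             # everything ever queued, so nothing is queued twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()  # use frontier.pop() for depth-first order
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

In a database-backed crawler the `deque` becomes a table of pending URLs and `seen` becomes a unique index on the URL column, but the traversal logic stays the same.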
|
<?php include 'simple_html_dom.php'; function get_url_contents($url){ $crl = curl_init(); $timeout = 5; curl_setopt ($crl, CURLOPT_URL,$url); curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1); curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout); $ret = curl_exec($crl); curl_close($crl); return $ret; } $url = 'http://rural.nic.in'; $outhtml = get_url_contents($url); $html= str_get_html($outhtml); foreach($html->find('li')as $item) { echo $item."<br>"; } //print_r($outhtml); ?> I wish to print only … |
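The PHP above fetches a page with cURL and then selects every `<li>` element with simple_html_dom. The same fetch-and-select idea can be sketched in Python with only the standard library; this version collects the whitespace-collapsed text inside each `<li>` rather than echoing the raw element HTML. The names `ListItemText`, `list_items` and `fetch` are hypothetical, and the parser deliberately flattens nested lists:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ListItemText(HTMLParser):
    """Collects the visible text of every <li> element, flattened."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buf = None  # text pieces of the <li> currently open, or None

    def _finish(self):
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.items.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # HTML lets authors omit </li>; a new <li> closes the previous one.
            # Nested lists are flattened into separate items as a side effect.
            self._finish()
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._finish()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def list_items(html):
    parser = ListItemText()
    parser.feed(html)
    parser.close()
    parser._finish()  # flush an <li> left open at the end of the document
    return parser.items


def fetch(url, timeout=5):
    """Download a page; 5 s roughly mirrors the cURL connect timeout above."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `for item in list_items(fetch("http://rural.nic.in")): print(item)`. Unlike the PHP version, a failed download raises an exception here instead of passing `false` on to the parser.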
|
Apologies first if this is covered elsewhere - I searched but could not find it. I am looking for a way to search the web for the presence of a JavaScript code snippet within the HTML <body> of a web page. I would specify the code snippet and send the bot …
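Once a bot has fetched a page, the check itself reduces to isolating the `<body>` and doing a substring test that tolerates differences in indentation and line breaks. A minimal Python sketch, assuming a regex is good enough to locate the body (it is not a full HTML parser) and falling back to the whole document when no `<body>` tag exists; the function name `body_contains` is hypothetical:

```python
import re

# Greedy match up to the last </body>; DOTALL lets "." span newlines.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)


def _squash(text):
    """Collapse every run of whitespace to a single space."""
    return " ".join(text.split())


def body_contains(html, snippet):
    """True if snippet appears inside the page's <body>, ignoring whitespace layout."""
    match = _BODY_RE.search(html)
    # Assumption: with no <body> tag at all, search the whole document.
    body = match.group(1) if match else html
    return _squash(snippet) in _squash(body)
```

Restricting the search to the body means a snippet that appears only in `<head>` is not reported, which matches the requirement described above.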
|
How can I create a PHP script that will search websites for articles that contain a given string?
|
import java.net.*; import java.io.*; import java.util.Date; import java.util.*; public class crawling { //Properties private String startUrl; private int maxUrls = -1; private boolean limitHost = false; private boolean crawling = false; private Date currentDate; private int counter=0; // Set up crawl lists. HashSet crawledList = new HashSet(); LinkedHashSet toCrawlList = … |
|
Hello, I'm doing a small project that aims to develop a search engine. What I need is a web crawler that can scrape backlinks for a website. I need to scrape: Title, PageText, Page Size, Back Links. Does anyone know of a good web crawler that can give me that? …
|
I want to report dead links on my site. My current script does work, but it allows search engine bots to click the dead-link report, which makes it hard for me to determine which reports are from visitors and which are from bots. What am I doing wrong here? As …
|
I have many redirect scripts on my site (they compute some PHP then redirect to the relevant page) - however on google webmaster they all kept coming up as "Soft-404" errors, which I read are bad for PR. A while ago I restricted googlebot's access to my /site/ folder, which … |
|
My code is supposed to crawl web pages, index the links, then crawl those web pages, and so on. But it won't work, and I get no errors. What is wrong? I think it gets into the foreach but doesn't make it to the $DCheck if statement! [CODE]<?php if(empty($_SESSION['page'])) …
|
Okay, let's say I open the URL http://www.nothing.com/ using PHP's fopen function. Then I do this: [CODE] <?php $S = "sss"; $url = "http://www.nothing.com/" . $S . "&ql=1"; $open = fopen($url,"r"); while(! feof($open)) { echo fgets($open). "<br />"; } ?> [/CODE] The code above returns all of the page I'm looking …
|
So I'm building a web crawler for a pet project I've been working on. I'm using tutorial code for the crawler then building on it. I've done extensive troubleshooting and haven't had any luck. The problem: [LIST] [*]Roughly half the websites return content, but all of them return headers. [*]Some … |
|
I found a crawler-detection script on the internet that works great, but at the moment I only have Googlebot in it. I'm trying to add Yahoo! Slurp, Ask Jeeves and any other popular web crawler. PHP is definitely my weakness. Here's the script: [CODE] <?php $useragent = $_SERVER["HTTP_USER_AGENT"]; if (stripos($useragent, "Googlebot") !== false) { …
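Supporting more crawlers mostly means checking the user-agent string against a list of known tokens instead of a single name. One PHP pitfall is worth knowing: `stripos` returns 0 when the match is at the very start of the string, and a bare `if` treats 0 as false, hence the `!== false` comparison. A Python sketch of the list-based check, assuming an illustrative (not exhaustive) token list and hypothetical function names; the same idea could separate bot clicks from visitor clicks on a dead-link report:

```python
from typing import Optional

# Illustrative, not exhaustive: lowercase substrings of well-known crawler UAs.
KNOWN_CRAWLER_TOKENS = (
    "googlebot",
    "bingbot",
    "slurp",  # Yahoo! Slurp
    "duckduckbot",
    "baiduspider",
    "yandexbot",
)


def crawler_name(user_agent: Optional[str]) -> Optional[str]:
    """Return the first matching crawler token, or None for ordinary browsers."""
    ua = (user_agent or "").lower()
    for token in KNOWN_CRAWLER_TOKENS:
        # `in` is a plain boolean test, so a match at index 0 still counts.
        if token in ua:
            return token
    return None


def is_crawler(user_agent: Optional[str]) -> bool:
    return crawler_name(user_agent) is not None
```

The user agent is self-reported and easy to fake, so this only identifies well-behaved bots. Google documents a reverse-DNS check for confirming that a request claiming to be Googlebot is genuine.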

Three days after I resubmitted my sitemap.xml in GWT, I saw a bunch of crawl errors in GWT. My problem is that GWT crawls a page on my site that doesn't exist; I mean the URL does not exist. The page is a 404 Not Found and it appears in the crawl …