My website isn't being crawled at all, as far as I can tell, or only very infrequently. Could my robots.txt file have something to do with that? My current file looks like this:

User-agent: *
Disallow: /

Is this good or bad for crawls?

All 4 Replies

Dude,

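What that file says is not to crawl any pages in or below the directory where robots.txt is located. Since robots.txt sits in the root of the site, "Disallow: /" tells every robot to stay away from the whole site, which would explain why you aren't getting crawled.

If you only want to keep spiders out of the images directory, you can give the instruction like this: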

User-agent: *
Disallow: /images/
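
If you want to check what a set of rules does before you upload it, here is a rough sketch using Python's standard urllib.robotparser module. The example.com URLs and the helper name allowed() are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# The file from the question: blocks everything for every robot.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# The narrower rule: blocks only the images directory.
BLOCK_IMAGES = "User-agent: *\nDisallow: /images/\n"

# example.com is a placeholder domain, not the poster's site.
print(allowed(BLOCK_ALL, "Googlebot", "https://example.com/index.html"))          # False
print(allowed(BLOCK_IMAGES, "Googlebot", "https://example.com/index.html"))       # True
print(allowed(BLOCK_IMAGES, "Googlebot", "https://example.com/images/logo.png"))  # False
```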

This line

User-agent: *

applies to all robots or spiders. I would disallow crawling by a spider I consider evil, such as Slurp, and allow the rest, so my file would look something like this:

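User-agent: *
Disallow:

User-agent: Slurp
Disallow: /

The empty Disallow under User-agent: * lets every robot crawl everything, and the second group then stops the spider Slurp from crawling my site. Without that empty Disallow line, some parsers may merge the two User-agent lines into one group and block everybody.

You must itemize every bot you want to block, each in its own User-agent group.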
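
As a sanity check on the per-bot idea, here is a small sketch with Python's standard urllib.robotparser that tests a file with an allow-everything default group plus a separate group blocking Slurp. The helper allowed() and the example.com URL are hypothetical, and the agent names are passed as bare tokens rather than full User-Agent headers.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Default group allows everything (empty Disallow); Slurp gets its own group.
RULES = """\
User-agent: *
Disallow:

User-agent: Slurp
Disallow: /
"""

# example.com is a placeholder domain.
print(allowed(RULES, "Slurp", "https://example.com/"))      # False
print(allowed(RULES, "Googlebot", "https://example.com/"))  # True
```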

Just to add to what has already been said: crawlers don't have to follow a robots.txt file. The majority do, and Google's and Yahoo's crawlers will obey the rules, but there is nothing to stop me from writing a crawler that crawls your site and completely ignores the rules you have set.

If you don't want the site crawled for privacy reasons etc., then you will need a more secure method of stopping crawlers, such as access rules in .htaccess.
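
To make that concrete, here is a minimal sketch of the decision a crawler makes before it requests a page. The check lives entirely in the crawler's own code, so skipping it is a one-line change. The function name should_fetch(), the obey_rules flag, and the bot names are invented for this illustration; nothing here is taken from a real crawler.

```python
from urllib.robotparser import RobotFileParser

def should_fetch(robots_txt, agent, url, obey_rules=True):
    """Decide whether a crawler goes ahead and requests `url`.

    A well-behaved crawler consults robots.txt first. A rude one sets
    obey_rules=False and never looks at the file at all.
    """
    if not obey_rules:
        return True  # the server has no say in this branch
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

RULES = "User-agent: *\nDisallow: /\n"

# example.com is a placeholder domain.
print(should_fetch(RULES, "PoliteBot", "https://example.com/private.html"))                  # False
print(should_fetch(RULES, "RudeBot", "https://example.com/private.html", obey_rules=False))  # True
```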
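
The point of a server-side method is that the server refuses the request itself instead of asking the crawler to behave. This is not .htaccess syntax; it is only a sketch of the same idea as a Python WSGI wrapper, assuming your site runs as a WSGI app. The names BLOCKED_AGENTS, block_bots() and site() are hypothetical. Also note that a User-Agent header is easy to fake, so for genuinely private content you would want password protection rather than a name check like this.

```python
# Hypothetical block list: lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENTS = ("slurp",)

def block_bots(app):
    """Wrap a WSGI app so that requests from listed agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def site(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

application = block_bots(site)
```

Unlike robots.txt, the refusal here happens on the server, so a crawler that ignores robots.txt still gets the 403 as long as it sends a matching User-Agent.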

no robots.txt = crawl as much as you want.
