What is robots.txt?

Please define robots.txt. What is it, and how does it work?

All 41 Replies

Hi friends,

Meaning for Robots.txt:
Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means binding on search engines: it is a request that well-behaved crawlers generally honor, not an enforcement mechanism.

If you are interested in learning more about it, visit http://www.robotstxt.org/ or go straight to the Standard for Robot Exclusion.

Shewag

The robots.txt file gives search engine crawlers instructions, such as where the sitemap.xml file is and which parts of the site not to follow.
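To make that concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The file contents, the /admin/ path, and the example.com sitemap URL are all made up for illustration, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path and sitemap URL are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

def read_rules(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = read_rules(ROBOTS_TXT)
sitemaps = rules.site_maps()  # list of Sitemap URLs, or None if there are none
blocked = not rules.can_fetch("*", "https://www.example.com/admin/login")
allowed = rules.can_fetch("*", "https://www.example.com/blog/")
print(sitemaps, blocked, allowed)
```

A real crawler would download the file first (for example with set_url() and read()); parsing a string keeps the sketch self-contained.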

The robots.txt file allows or prevents search engine crawlers from entering your site. It is like an instruction manual or a map that tells the crawlers where to crawl.

What they're telling you is that the Internet Archive did not archive the site due to a preference set in that site’s robots.txt file.

The robots.txt file is checked by spiders, like the Wayback Machine's, to determine whether or not the site’s owner wants that particular spider to index their site.
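As a rough sketch of how a per-spider preference looks, the Python below parses a made-up robots.txt that shuts out one named crawler and leaves the site open to everyone else. The token ia_archiver is used here on the assumption that it is the name the archive's crawler matched against; SomeOtherBot and example.com are purely hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: block one named crawler, allow all others.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

def allowed_for(agent, url, text=ROBOTS_TXT):
    """Return True if the given user-agent token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.can_fetch(agent, url)

archive_ok = allowed_for("ia_archiver", "https://www.example.com/")
other_ok = allowed_for("SomeOtherBot", "https://www.example.com/")
print(archive_ok, other_ok)
```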

Robots.txt is a file that gives search engine spiders instructions on what to follow and what not to follow.

Karol said it all. There are so many members here who will surely help.

robots.txt basically tells search engine spiders not to crawl certain areas of your site. You can use it to ask well-behaved search engines to stay out of private areas, but it does not protect them: the file itself is public, and anyone with the URL can still open those pages.

Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit.

Robots.txt can help search engines crawl and index the pages of your site.

Robots.txt is used for giving web robots instructions about a site. It resides in the site's root directory. In my own experience, it can also leak information about a site, since it is only a plain text file and can easily be found by hackers.

All the best,

robots.txt is a text file which can be used to restrict web robots to accessing your web site only in ways of which you approve.
The robots.txt file is a simple text file (no HTML), that must be placed in your root directory, for example:

http://www.yourwebsite.com/robots.txt
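Since the file always sits at the root, its address can be worked out from any page on the same site. A small Python sketch of that idea (robots_url is a hypothetical helper name, and the page URL is made up):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

location = robots_url("http://www.yourwebsite.com/blog/post.html?id=7")
print(location)
```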

To make a robots.txt file you don't need any special knowledge. Just an ordinary Notepad application. Try to google this term and you'll find lots of examples with an explanation of what each line means. And you'll easily compose one for yourself.

robots.txt is a text file (not HTML) that tells search engine bots what to follow and what not to!

If you do not know what robots.txt is, chances are you are better off not adding one to your site, as you could get into trouble by having your site pages not crawled... and I mean EVERY page

SE's don't need to crawl every page. For example, do they need to crawl your website's terms of service or privacy policy? Or how about a co-branded promo page that you know is only going to be 'live' for a limited period of time?

Not every page needs to be indexed and crawled by the Search Engines. In fact, it would be great for the WWW if webmasters were more selective of the pages they allowed the SE's to access. There would be a lot less glut out there. Think of all the rack space Google and Bing could save! Everyone... Go Green.. Use Robots.txt! ;)

robots.txt is basically used to tell the crawler where to crawl and which sections you don't want crawled. While optimizing your site you can keep this text file in your root directory.

The information was quite good.

The robots.txt file is like a set of instructions to search engine spiders about crawling site data.

I was wondering about the same question... thanks to all who replied

Robots.txt is a standard which was developed in 1994, when large-scale web indexing became popular; indexers such as Lycos and AltaVista used it.

robots.txt is mainly used to leave out those pages which you do not want Google to read...

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. The location of robots.txt is very important. It must be in the main directory because otherwise search engines will not be able to find it.

A robots.txt file should be included so that search engine bots don't get 404 errors when they look for it.
Just put the following two lines in it and drop it in the root directory:
User-agent: *
Disallow:

A text file placed in the root directory of a Web site that prohibits search engine spiders from indexing all or specific pages of the site.

The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map for what they can, and cannot index.

Definition of the above robots.txt file:

User-agent: *
The asterisk (*) or wildcard represents a special value and means any robot.
Disallow:
The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.
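The difference a single slash makes can be checked with Python's standard urllib.robotparser. This is only a sketch of how one parser reads the two files; AnyBot and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = ["User-agent: *", "Disallow:"]    # empty value: nothing is off limits
BLOCK_ALL = ["User-agent: *", "Disallow: /"]  # a lone slash: the whole site is off limits

def can_crawl(lines, url, agent="AnyBot"):
    """Parse the given robots.txt lines and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)

page = "https://www.example.com/products/index.html"
open_site = can_crawl(ALLOW_ALL, page)
closed_site = can_crawl(BLOCK_ALL, page)
print(open_site, closed_site)
```

This is also why a stray slash is risky: the second file asks every compliant robot to stay away from every page.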

Example Code for robots.txt file:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
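To see which URLs that example would close off, here is a short Python sketch using the standard urllib.robotparser. The host example.com, the bot name AnyBot, and the individual file names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = ["User-agent: *", "Disallow: /private/", "Disallow: /images-saved/"]

parser = RobotFileParser()
parser.parse(RULES)

def check(path):
    """Return True if a generic bot may fetch the given path."""
    return parser.can_fetch("AnyBot", "https://www.example.com" + path)

paths = ("/", "/private/notes.html", "/images-saved/a.png", "/images/a.png")
results = {p: check(p) for p in paths}
for path, ok in results.items():
    print(path, "allowed" if ok else "disallowed")
```

The rules are matched as path prefixes, so /images/ stays open even though it looks similar to /images-saved/.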

Hello All
Robots.txt basically tells search engine spiders not to index certain areas of your site.

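Code (note that this is the per-page robots meta tag, which goes in a page's HTML head and is separate from the robots.txt file):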
<meta name="robots" content="index,follow" />
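For anyone curious how a robot might read that tag, here is a minimal Python sketch built on the standard html.parser module. The class and function names are hypothetical, and real crawlers may treat the tag differently.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from a <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "index,follow" into separate lowercase directives.
            self.directives = [d.strip().lower() for d in content.split(",") if d.strip()]

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

found = robots_directives('<head><meta name="robots" content="index,follow" /></head>')
print(found)
```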

It's a file... once you put it on your site, Google will crawl your site.