Hello all!!
I've been making websites for many years, and recently I decided to get more into SEO. So here's my question:
I would like to know what happens if I write the following in robots.txt:

User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml

And in sitemap.xml there are thousands of links pointing to different pages in a directory, for example:

1 => http://www.example.com/site/index.php
2 => http://www.example.com/site/index.php?lang=en
3 => http://www.example.com/site/shopping.php
4 => http://www.example.com/site/picture.php
etc...

In this case, as I understand what robots.txt and sitemap.xml do:
First, robots.txt tells search engines (let's talk about Google) not to index any file or folder on the domain example.com. However, Google will look at sitemap.xml and find that it is supposed to index the listed links.
What happens in this situation?

Moreover, I would like to know what happens when a page has its meta robots tag set to noindex but also appears in sitemap.xml.

Finally, if I submit the path to my sitemap in Google Webmaster Tools, is that enough for Google to go and check it, or should it also appear in robots.txt?

Thank you

All 4 Replies

LastMitch

And in sitemap.xml there are thousands of links pointing to different pages in a directory, for example:

@cmps

I am a bit confused by your sitemap.xml. I haven't seen an array like that before.

It should look like this:

http://www.sitemaps.org/protocol.html

@LastMitch you're totally correct. I just wrote that example to show links in the sitemap pointing to pages not allowed by robots.txt. And I was wondering what happens in such a conflict. Which one dominates?

Here's a correction for the sitemap.xml code:

<url>
  <loc>http://www.example.com/index.php?test=test</loc>
  <lastmod>2013-10-12</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.00</priority>
</url>

 etc...
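
For anyone following along: per the sitemaps.org protocol linked above, entries like this sit inside a `<urlset>` root element with the sitemap namespace. Below is a minimal Python sketch that wraps the entry from this post in such a root and reads it back with the standard library. The function name `read_sitemap` is just an illustrative choice, not part of any tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The <url> entry from this post, wrapped in the <urlset> root element
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/index.php?test=test</loc>
    <lastmod>2013-10-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.00</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry, with the child element texts as strings."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", default="", namespaces=NS).strip(),
            "changefreq": url.findtext("sm:changefreq", default="", namespaces=NS).strip(),
            "priority": url.findtext("sm:priority", default="", namespaces=NS).strip(),
        })
    return entries


entries = read_sitemap(SITEMAP_XML)
for entry in entries:
    print(entry["loc"], entry["lastmod"], entry["changefreq"], entry["priority"])
```

If the file parses and every entry has a non-empty `loc`, the basic structure is in place. This does not validate the file against the full schema.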

Thank you. I should have posted a better sitemap.xml example in the first post.

LastMitch

And I was wondering what happens in such a conflict. Which one dominates?

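There's no conflict between the two lines in the file. The Sitemap line in robots.txt is independent of the User-agent rules, so a crawler that reads robots.txt will find sitemap.xml if you put it in there. Keep in mind that Disallow: / still asks crawlers not to fetch any page, including the ones listed in the sitemap.

The sitemap.xml lists the URLs on the website that you want search engines to know about.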
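
One way to see how the two directives interact is to feed the robots.txt from the first post to Python's standard `urllib.robotparser`. This is only a minimal sketch: it shows how that particular parser interprets the file, not how Google behaves, and the user-agent string "Googlebot" is just an illustrative choice. `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content copied from the first post
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
"""

# Example URLs listed in the first post's sitemap
SITEMAP_URLS = [
    "http://www.example.com/site/index.php",
    "http://www.example.com/site/index.php?lang=en",
    "http://www.example.com/site/shopping.php",
    "http://www.example.com/site/picture.php",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are collected separately from the User-agent groups
sitemaps = parser.site_maps()

# "User-agent: *" applies to any crawler name, including this one
blocked = [url for url in SITEMAP_URLS if not parser.can_fetch("Googlebot", url)]

print("Sitemaps advertised:", sitemaps)
print("Blocked URLs:", len(blocked), "of", len(SITEMAP_URLS))
```

With this file, the parser reports the sitemap URL and also treats all four example pages as blocked, so the Sitemap line is picked up while the Disallow rule still applies to the pages themselves.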

Great, thank you for replying :)
