
Hello all!!
I've been making websites for many years, and recently I decided to get more into SEO. So here's my question:
I would like to know what happens if I write the following in robots.txt:

User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml 

And in sitemap.xml, there are thousands of links pointing to different pages in a directory, for example:

1 => http://www.example.com/site/index.php
2 => http://www.example.com/site/index.php?lang=en
3 => http://www.example.com/site/shopping.php
4 => http://www.example.com/site/picture.php
etc...

Here is how I understand robots.txt and sitemap.xml in this case:
First, robots.txt disallows search engines (let's talk about Google) from indexing any file or folder on the domain example.com. However, Google will look at sitemap.xml and find that it is supposed to index the links listed there.
What happens in this situation?

Moreover, I would like to know what happens when a page has its meta robots tag set to noindex but also appears in sitemap.xml. What happens in that situation?

Finally, if I update the path to my sitemap in Google Webmaster Tools, is that enough for Google to go and check it, or should the sitemap also be listed in robots.txt?

Thank you

Edited by cmps


And in sitemap.xml, there are thousands of links pointing to different pages in a directory, for example:

@cmps

I am a bit confused by your sitemap.xml; I haven't seen an array like that before.

It should look like this:

http://www.sitemaps.org/protocol.html


@LastMitch you're totally correct. I only wrote that example to show links in the sitemap pointing to pages that robots.txt disallows, and I was wondering what happens in such a conflict. Which one dominates?

Here's a correction for the sitemap.xml code:

<url>
  <loc>http://www.example.com/index.php?test=test</loc>
  <lastmod>2013-10-12</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.00</priority>
</url>

 etc...
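
For completeness, an entry like the one above normally sits inside a `<urlset>` root element, as described in the sitemaps.org protocol. Below is a minimal Python sketch of how such a file could be generated. The function name `build_sitemap` and the dictionary layout are my own assumptions for illustration, not part of any standard tool.

```python
import xml.etree.ElementTree as ET

# Namespace given by the sitemaps.org protocol for <urlset>.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string (hypothetical helper).

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys, all string values.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child tags in the same order as the example entry.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "loc": "http://www.example.com/index.php?test=test",
        "lastmod": "2013-10-12",
        "changefreq": "weekly",
        "priority": "1.00",
    },
])
print(sitemap_xml)
```

ElementTree escapes characters such as `&` in URLs automatically, which is easy to get wrong when a sitemap is written by hand.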

Thank you. I should have posted a better sitemap.xml example in the first post.


And I was wondering what happens in such a conflict. Which one dominates?

There's no conflict. robots.txt points crawlers to sitemap.xml if you put it in there.

The sitemap.xml has all the links on the website.
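
If you want to see how a parser treats the robots.txt from the first post, here is a rough sketch using Python's standard `urllib.robotparser` module. It only shows how this one parser reads the rules, and it says nothing about what Google itself decides to index. The helper name `check_urls` is made up for this example, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the original question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
"""

# A few of the URLs that the sitemap in the question would list.
SITEMAP_URLS = [
    "http://www.example.com/site/index.php",
    "http://www.example.com/site/index.php?lang=en",
    "http://www.example.com/site/shopping.php",
]


def check_urls(robots_txt, urls, user_agent="*"):
    """Return (sitemaps, {url: allowed}) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {url: parser.can_fetch(user_agent, url) for url in urls}
    return parser.site_maps(), allowed


sitemaps, allowed = check_urls(ROBOTS_TXT, SITEMAP_URLS)
print("Sitemaps declared:", sitemaps)
for url, ok in allowed.items():
    print(url, "->", "crawl allowed" if ok else "crawl disallowed")
```

The parser reports the Sitemap line and still marks every listed URL as disallowed for crawling, so listing a sitemap in robots.txt does not lift the Disallow rule.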
