robots.txt and 302 redirects

I have page1.html, which is being 302 redirected (a temporary redirect) to page2.html.
page2.html is disallowed in my robots.txt file.

Under normal circumstances, when Googlebot encounters a 301 redirect from page1.html to page2.html, it will index page2.html. When it encounters a 302 redirect from page1.html to page2.html, it will index page1.html.

Since, theoretically, the URL of page1.html is what would be indexed, would it still be indexed given that page2.html is blocked?
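To make the rule of thumb in this post concrete, here is a minimal Python sketch. It only encodes the behaviour as the poster describes it (a 301 indexes the target, a 302 indexes the source). The function name `url_expected_in_index` is hypothetical, and this is not a description of how any search engine is actually implemented.

```python
def url_expected_in_index(source_url, target_url, status_code):
    """Return the URL expected to be indexed, per the rule of thumb above.

    Assumption: only the two redirect types discussed in the thread are modelled.
    """
    if status_code == 301:
        # Permanent redirect: the target URL is expected to be indexed.
        return target_url
    if status_code == 302:
        # Temporary redirect: the original URL is expected to stay indexed.
        return source_url
    raise ValueError(f"not a redirect status covered by this sketch: {status_code}")


print(url_expected_in_index("/page1.html", "/page2.html", 301))
print(url_expected_in_index("/page1.html", "/page2.html", 302))
```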

cscgal
The Queen of DaniWeb
Administrator
19,421 posts since Feb 2002
Reputation Points: 1,474
Solved Threads: 229
 

I would think that because you redirect page1 to page2, the search engine will include it in a crawl despite a robots.txt instruction to do otherwise.

canadafred
SEO Consultant
Moderator
1,021 posts since Feb 2006
Reputation Points: 192
Solved Threads: 28
 

Would it just crawl (because when it first finds page1.html, it is a valid URL for it), or would it actually index the contents of page2.html, despite a robots.txt file that disallows crawling or indexing of page2.html?

cscgal

It's been a couple of days, and Google Webmaster Tools is now showing me that page1.html is not being crawled due to being blocked in my robots.txt file, even though it is only page2.html that is actually listed in robots.txt.

This is the desired effect, in my case.
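One way to picture the result reported above is a toy model in which a crawler follows the redirect and stops as soon as it reaches a disallowed URL, so the starting page ends up with no crawlable content. This is only a sketch under that assumption. The function `crawl_through_redirects`, the `redirects` mapping, and the `blocked_paths` set are all hypothetical, and it is not a claim about Googlebot's real logic.

```python
def crawl_through_redirects(start_path, redirects, blocked_paths, max_hops=5):
    """Follow redirects from start_path; return (crawlable, last_path_reached).

    redirects: dict mapping a path to the path it redirects to.
    blocked_paths: set of paths disallowed in robots.txt.
    """
    path = start_path
    for _ in range(max_hops + 1):
        if path in blocked_paths:
            # The crawler may not fetch this path, so the chain ends here.
            return False, path
        if path not in redirects:
            # No further redirect: this path serves the final content.
            return True, path
        path = redirects[path]
    # Too many hops; treat the chain as not crawlable in this toy model.
    return False, path


redirects = {"/page1.html": "/page2.html"}
blocked_paths = {"/page2.html"}

crawlable, stopped_at = crawl_through_redirects("/page1.html", redirects, blocked_paths)
print(crawlable, stopped_at)
```

In this model page1.html itself is not listed as blocked, yet it is reported as not crawlable because its only content sits behind the disallowed page2.html.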

cscgal

If you have blocked page2.html in robots.txt, the search engine bots won't crawl that page, even though you have 302 redirected ('Found' or 'Moved Temporarily') page1.html to page2.html.

User-agent: *
Disallow: /page2.html
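
If you want to check what these two lines allow before relying on them, Python's standard `urllib.robotparser` module can parse the rules locally. The sketch below feeds it the same rules. The host `example.com` is a placeholder, and the parser's interpretation may differ in edge cases from what a particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt snippet above.
rules = [
    "User-agent: *",
    "Disallow: /page2.html",
]

robots = RobotFileParser()
robots.parse(rules)

# example.com is a placeholder host used only for illustration.
for page in ("http://example.com/page1.html", "http://example.com/page2.html"):
    allowed = robots.can_fetch("Googlebot", page)
    print(page, "allowed" if allowed else "disallowed")
```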

Confirm that you have verified your domain name in Webmaster Central.

http://www.google.com/webmasters/

Resubmit your sitemap.xml containing page1.html in Google Webmaster Tools and Bing Webmaster Center. The search engine bots will crawl the URLs given in sitemap.xml and update their index accordingly.

sugeshg
Newbie Poster
1 post since Jul 2011
Reputation Points: 10
Solved Threads: 0
 

Blocking the URL in robots.txt doesn't do much good these days. Google will still index the URL, give it whatever title they want, and rank it for what they want. A noindex meta robots tag is far more useful.

You say it worked, but I would keep an eye on it. In late June, Google posted about using robots.txt vs. noindex and stated that robots.txt was no longer their endorsed method. http://www.google.com/support/webmasters/bin/answer.py?answer=156449

They have since clarified that they WILL index the URL but not the page or its content. That means you can easily run into duplicate/thin content issues by blocking URLs that might get shared out on the web.
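
For anyone who wants to confirm that a page really carries the noindex meta robots tag mentioned above, here is a small Python sketch using the standard `html.parser` module. The class `RobotsMetaParser`, the helper `has_noindex`, and the sample markup are hypothetical and only for illustration. Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so the page must not also be disallowed in robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the markup contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for page2.html.
sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(sample))
```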

joeyoungblood
Newbie Poster
7 posts since Jul 2011
Reputation Points: 10
Solved Threads: 0
 

This question has already been solved
