I have page1.html that is being 302-redirected (a temporary redirect) to page2.html.
page2.html is disallowed in my robots.txt file.
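
To illustrate the setup, a 302 redirect like this could be configured in an Apache .htaccess file (a minimal sketch assuming Apache with mod_alias; other servers have their own equivalents):

# Temporary (302) redirect from page1.html to page2.html
Redirect 302 /page1.html /page2.html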

Under normal circumstances, when Googlebot encounters a 301 redirect from page1.html to page2.html, it will index page2.html; when it encounters a 302 redirect from page1.html to page2.html, it will index page1.html, since the 302 signals that the move is only temporary.

Since, in theory, the URL of page1.html is what would be indexed, would it still be indexed given that page2.html is blocked?


I would think that because you redirect page1 to page2, the search engine will still include it in a crawl despite the robots.txt instruction to do otherwise.

Would it just crawl it (because when it first finds page1.html, that is a valid URL to it), or would it actually index the contents of page2.html, despite the robots.txt rule disallowing crawling and indexing of page2.html?

It's been a couple of days, and Google Webmaster Tools is now showing me that page1.html is not being crawled because it is blocked by my robots.txt file, even though only page2.html is actually listed in robots.txt.

This is the desired effect, in my case.

If you have blocked page2.html in robots.txt, the search engine bots won't crawl that page, even though you have redirected page1.html to it with a 302 ('Found', a.k.a. 'Moved Temporarily'):

User-agent: *
Disallow: /page2.html

Confirm that you have verified your domain in Google Webmaster Central:

http://www.google.com/webmasters/

Resubmit a sitemap.xml containing page1.html to Google Webmaster Tools and Bing Webmaster Center. The search engine bots will crawl the URLs listed in sitemap.xml and update their index accordingly.
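
For illustration, a minimal sitemap.xml might look like this (a sketch; www.example.com stands in for your actual domain):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Listing page1.html prompts the bots to re-crawl it and refresh the index -->
  <url>
    <loc>http://www.example.com/page1.html</loc>
  </url>
</urlset>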

Blocking the URL in robots.txt doesn't do much good these days. Google will still index the URL, give it whatever title they want, and rank it for whatever they want. A noindex meta robots tag is far more useful.
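
For reference, the noindex meta robots tag goes in the page's <head>. Note that it only takes effect if the bot is allowed to crawl the page, so it should not be combined with a robots.txt block on the same URL:

<head>
  <!-- Ask compliant crawlers to drop this page from their index -->
  <meta name="robots" content="noindex">
</head>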

You say it worked, but I would keep an eye on it. In late June, Google posted about using robots.txt vs. noindex and stated that robots.txt was no longer their endorsed method: http://www.google.com/support/webmasters/bin/answer.py?answer=156449

They have since clarified that they WILL index the URL but not the page or its content. That means you can easily run into duplicate/thin-content issues by blocking URLs that might get shared around the web.
