I am trying to remove an entire folder of thin content from Google to help me recover from a Panda/EAT-related penalty. I want to keep the content on the site for the benefit of users, but not waste crawl budget or have Google think that we have so many pages of thin content.

I added the folder to robots.txt quite a few months ago. While some pages are showing up as "Blocked by robots.txt", the majority of pages now show up in my coverage report as "Indexed, though blocked by robots.txt". About 2 months ago, I submitted a removal request for all URLs that begin with the prefix, but there's been no change. Google Search Console's report updates every few days, but the number of URLs that say, "Indexed, though blocked by robots.txt" is increasing, even months after the removal request for those same pages.

All 8 Replies

Oh, also ... the pages are noindexed, although I do know that Google needs to crawl a page in order to see its noindex directive. I don't want to remove the folder from robots.txt because there's a lot of faceted navigation involved, so crawling it would eat up my entire crawl budget. That's why I thought that requesting a temporary removal would do the trick, but it hasn't.
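To illustrate the conflict described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The folder name `/thin-folder/` and the `example.com` URLs are placeholders, not the real site, and the parser only approximates how Googlebot reads robots.txt. It shows that a disallowed URL is never fetched, so a noindex directive on that page has no chance of being read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/thin-folder/" stands in for the real folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thin-folder/
"""


def can_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A page inside the blocked folder is not fetched, so its noindex is never seen.
print(can_crawl("https://example.com/thin-folder/page-1"))
# A page outside the folder can still be crawled.
print(can_crawl("https://example.com/about"))
```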

1. On your computer, go to your Google Account.
2. On the top left navigation panel, click Data & personalization.
3. Under "Activity and timeline," click My Activity.
4. Find the item you want to delete. ...
5. On the item you want to delete, click More.
6. Click Delete.

Umm .... that has nothing to do with the question I asked. This is the SEO forum.

Google Search displays information collected from websites on the web. The best way to remove information about yourself from Google search results is to contact the website owner who published it. If they remove it, Google will no longer be able to find that information and show it in the search results.

If the website owner does not remove it, Google will delete certain types of sensitive personal information.

This request works only for pages and images that have already been updated or removed from the web. Use the correct URL in your request.

Ultimately, you can noindex, you can add robots.txt exclusions, or you can go into Webmaster Tools and request the removal, but it's just like indexing, in that how and when Google responds to this is really up in the air.

Is the content so important that you can't delete or redirect it entirely? As opposed to messing with indexing and robots.txt, I typically prefer to shoot it to more relevant/higher quality content with a 301.
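As a rough sketch of the 301 approach suggested above, assuming the thin pages can be mapped one-to-one to better destinations: the paths in `REDIRECTS` and the `resolve` helper are made up for illustration, and a real site would normally configure this in the web server or CMS rather than in application code.

```python
# Hypothetical mapping from thin pages to more relevant, higher-quality pages.
REDIRECTS = {
    "/thin-folder/widgets-red": "/guides/widgets",
    "/thin-folder/widgets-blue": "/guides/widgets",
}


def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return the status code and headers a server could send for path."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells crawlers the move is permanent.
        return 301, {"Location": target}
    return 200, {}


print(resolve("/thin-folder/widgets-red"))
print(resolve("/guides/widgets"))
```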

"... in that how and when Google responds to this is really up in the air."

True, it may take a really long time and it's not actually guaranteed (Google may ignore your directives, especially if it thinks they're a typo or a server misconfiguration on your end). But I think you can be confident that if you noindex a page, and Google has recrawled it since that meta directive was added, they won't index it. Googlebot also attempts to be a good citizen and won't crawl pages that are disallowed in robots.txt, although Google may still index these pages (especially if they have a lot of backlinks) even though it doesn't have access to the content on the page.
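The interaction between the two directives can be sketched like this. It is only a simplified model of the behavior described above, not Google's actual logic: `RobotsMetaParser` and `noindex_is_visible` are hypothetical helpers built on Python's standard `html.parser`, and the sample page is invented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_visible(html: str, crawl_allowed: bool) -> bool:
    """A crawler can only honor noindex if it is allowed to fetch the page."""
    if not crawl_allowed:
        return False  # blocked by robots.txt: the HTML is never read
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Invented sample page carrying a noindex directive.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body>thin</body></html>'

print(noindex_is_visible(PAGE, crawl_allowed=True))
print(noindex_is_visible(PAGE, crawl_allowed=False))
```

Under this model, the noindex only takes effect once the folder is crawlable again, which is the trade-off against crawl budget raised earlier in the thread.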

If you really want to get a definitive answer to the issue (provided that you're still experiencing it!), just ask @JohnMu on Twitter. He's Google's Senior Search Advocate and is usually pretty good at answering tricky questions. At the very least you'll get to hear it straight from the horse's mouth!

Hope this helps.
