The number of pages reported as blocked by robots.txt in Google Search Console's Coverage report recently started going down. Over the course of this month, it has dropped from about 400K pages to about 200K.

There have been no changes to the robots.txt file in 6+ months, nor any big structural changes, 404'd pages, etc. in that time either.
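For context, a quick way to confirm the live rules still block the same sample paths is something like this rough Python sketch (placeholder domain and paths; urllib.robotparser doesn't replicate Googlebot's matching exactly, so treat it only as a sanity check):

# Spot-check that the live robots.txt still blocks the same sample paths.
# The domain and paths below are placeholders; substitute your own.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://example.com/robots.txt"
SAMPLE_BLOCKED = [
    "https://example.com/private/report.html",
    "https://example.com/search?q=test",
]

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the live file

for url in SAMPLE_BLOCKED:
    allowed = rp.can_fetch("Googlebot", url)
    print(url, "->", "ALLOWED (rule changed?)" if allowed else "still blocked, as expected")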

What would cause this number to go down on its own?


What's the timeframe here? The reason I ask is that the Verizon and other outages may be a factor, and it could bounce back.

What's the timeframe here? The reason I ask is that the Verizon and other outages may be a factor, and it could bounce back.

No no, you're misunderstanding. This is the blocked by robots.txt coverage report. We don't want it to bounce back. Also, it has nothing to do with traffic.

I still wonder about the timeframe here. And yes, I see why you don't want it to bounce back.

That aside, I was just reading about a Google problem in April 2019 involving cache synchronization issues across their server farms. That wasn't the interesting part, though; another item revealed how they age the information in their systems. Could it be Google aging these entries in their system?

I still wonder about the timeframe here.

Almost all of the drop happened on January 5th, with small dips weekly since then. The Google Search Console coverage report only updates once a week or so.

Could it be Google aging these entries in their system?

I'm not sure what you're referring to. Are you referring to Google aging entries in GSC specifically, or within their index? Can you link to the article?

The changes above are now reflected in the index coverage report so you may see new types of issues or changes in counts of issues.

So Google did note a change at https://developers.google.com/search/blog/2021/01/index-coverage-data-improvements dated Jan 11, 2021, and your date of Jan 5 is close enough to when they did roll out a change.

I don't see the blog entry about aging, but the change described at that link does look connected, from what little I know.

It does look like changes were made to the way they report the results, but that doesn't explain how or why the number would go down. If the pages are excluded via robots.txt, then Google knows about them. If Google knew about them yesterday, why did it forget about them today? Other sections of the GSC report go back as far as 2017 when reporting pages.

commented: As to why, usually because they are keeping a lot more under the hat.

Even though the Blocked by robots.txt count in GSC is going down, this is a major issue that almost everyone has faced. I have gone through this issue as well. Fortunately, there is a quick fix for this error: simply change the robots.txt file at example.com/robots.txt to allow Googlebot and other search engines to crawl your pages.

I think you are completely misunderstanding me.

Our robots.txt file exists for a reason. It is designed to block pages we don't want crawled by Googlebot. My question was entirely about unexpected behavior in Google's Search Console report.
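If it helps anyone sanity-check whether GSC's figure is even in the right ballpark, a rough Python sketch like the one below (placeholder domain, and a hypothetical urls.txt list of known URLs) counts how many of your own URLs the current rules disallow for Googlebot:

# Count how many known URLs the current robots.txt disallows for Googlebot,
# to compare against GSC's "Blocked by robots.txt" figure.
# "urls.txt" (one URL per line) and the domain are placeholders, and
# urllib.robotparser only approximates Googlebot's own matching.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

blocked = total = 0
with open("urls.txt") as fh:
    for line in fh:
        url = line.strip()
        if not url:
            continue
        total += 1
        if not rp.can_fetch("Googlebot", url):
            blocked += 1

print(f"{blocked} of {total} known URLs are disallowed by the current rules")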
