I do realize that backlinks to URLs disallowed via robots.txt still pass whatever Google's modern version of PageRank is. Depending on how many incoming backlinks there are, even if Google can't crawl a URL, it might use those external signals to rank the URL in the search results anyway (albeit with no page title or description).

My question, however, is whether the domain overall is helped. I suspect that URLs Google is unable to crawl can't spread PageRank to other pages. Does the domain root perhaps get a little lift? Or does that PageRank completely evaporate at the URL level?
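The question can be framed with textbook PageRank. What follows is a minimal power-iteration sketch under stated assumptions: the graph and page names (`ext1`, `blocked`, `home`, `about`) are hypothetical, and a URL whose outgoing links the crawler cannot see is treated as "dangling", with its score spread evenly over every page (one common textbook convention). It illustrates the classic model only and says nothing about how Google's current ranking system actually treats blocked URLs.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank via power iteration.

    `links` maps each page to the pages it links to. A page with an empty
    list is dangling; its score is spread evenly over all pages here
    (an assumed textbook convention, not Google's method).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total score sitting on pages whose outlinks are unknown.
        dangling_mass = sum(rank[p] for p in pages if not links[p])
        # Every page gets the teleport share plus an even slice of dangling mass.
        new_rank = {p: (1.0 - damping) / n + damping * dangling_mass / n
                    for p in pages}
        # Pages with known outlinks split their score among their targets.
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three external pages link to one URL on our site.
BLOCKED = {
    "ext1": ["blocked"],
    "ext2": ["blocked"],
    "ext3": ["blocked"],
    "blocked": [],  # disallowed in robots.txt: the crawler never sees its links
    "home": ["about"],
    "about": ["home"],
}
# Same graph, but the URL is crawlable and links to the homepage.
CRAWLABLE = {**BLOCKED, "blocked": ["home"]}

if __name__ == "__main__":
    for name, graph in (("blocked", BLOCKED), ("crawlable", CRAWLABLE)):
        scores = pagerank(graph)
        print(f"{name:>9}: home = {scores['home']:.3f}")
```

Under this convention the blocked URL's score does not vanish, but `home` only receives the same even slice that every other page in the graph receives; when the URL is crawlable and links to `home`, the homepage scores noticeably higher.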


Dani, I have never seen it that way. Just two clarifying questions, if you don't mind. As I understand it, you are talking about URLs that robots.txt disallows Google from crawling, but what meta name="robots" do they have in their HTML head? Also, how do you know that it is using those signals to rank those URLs in the search results without a title and description? Do you see them in certain queries? And if so, how do they appear, only as a URL?

but what meta name="robots" do they have in their HTML head?

It doesn't matter. If a URL is disallowed via robots.txt, then Googlebot never crawls the page, so it never discovers what is or is not in any meta tags in the page's HTML. By contrast, the <meta name="robots" content="noindex"> tag tells Google not to index the URL, but Google can still see the URL, crawl it, find other links pointing out from it, etc. Therefore, it's not relevant to my question :)
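The ordering argument (robots.txt is checked before the page is fetched, so the meta tag is never read) can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, and the `example.com` URLs are hypothetical, and Python's parser uses simple prefix matching that may differ from Googlebot's own parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def can_read_meta_robots(parser: RobotFileParser, url: str,
                         user_agent: str = "Googlebot") -> bool:
    """A compliant crawler only sees a page's <meta name="robots"> tag if it may fetch the page."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    for url in ("https://www.example.com/private/page.html",
                "https://www.example.com/public/page.html"):
        print(url, "-> meta robots readable:", can_read_meta_robots(parser, url))
```

For the disallowed URL the check fails before any HTML is requested, which is why a `noindex` tag on that page has no effect on a compliant crawler.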

Also, how do you know that it is using those signals to rank those URLs in the search results without a title and description? Do you see them in certain queries? And if so, how do they appear, only as a URL?

Yes, they appear just as URLs, as explained here.

I do come across these in the search results from time to time, but it is pretty rare, because it can only happen when there are enough external signals pointing to a URL that Google can't access for their weight to overshadow the fact that Google has no clue what is on the page.

Google can generate a title and description even when a page doesn't provide them, and it does the same when the page is blocked from crawling. If there is a link on the site to the page that isn't marked noindex/nofollow, that link is how Google gets to the page and ranks it.
Google does not follow robots.txt instructions.
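The reply above treats internal links without nofollow as the route by which Google reaches a page. As a minimal sketch using only Python's standard `html.parser` (the page markup and paths are hypothetical), the parser below sorts a page's links into those carrying `rel="nofollow"` and those without it. It only audits markup; it does not model how Google weighs these links or whether it crawls their targets.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Sort <a href> links into followed vs rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollowed: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return  # anchors without an href are not links
        # rel may hold several space-separated tokens, e.g. rel="ugc nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        target = self.nofollowed if "nofollow" in rel_tokens else self.followed
        target.append(href)


def audit_links(html: str) -> tuple[list[str], list[str]]:
    """Return (followed, nofollowed) hrefs found in the given HTML."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.followed, auditor.nofollowed


# Hypothetical page markup linking to two URLs under /private/.
SAMPLE = ('<p><a href="/private/report.html">Report</a> '
          '<a href="/private/draft.html" rel="nofollow">Draft</a></p>')

if __name__ == "__main__":
    followed, nofollowed = audit_links(SAMPLE)
    print("followed:", followed)
    print("nofollow:", nofollowed)
```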
