I have many redirect scripts on my site (they run some PHP, then redirect to the relevant page). However, in Google Webmaster Tools they all kept coming up as "Soft 404" errors, which I read are bad for PageRank. A while ago I blocked Googlebot's access to my /site/ folder, which contains all these redirect scripts, to prevent this. That has worked fine, but I'm concerned it might be stopping the crawler from navigating the site to reach other pages.
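
For reference, the scripts in /site/ are roughly this shape (the query parameter and destination here are placeholders, not my real ones):

    <?php
    // Simplified sketch of one of the redirect scripts in /site/.
    // It works out where the visitor should go, then redirects.
    $targetId = isset($_GET['id']) ? (int) $_GET['id'] : 0;
    $target   = '/pages/view.php?id=' . $targetId;

    header('Location: ' . $target); // PHP sends this as a 302 by default
    exit;                           // nothing else is output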

Is it safe to keep these redirect scripts blocked? Will Googlebot still be able to look around and sort the rest of the site? Or should I stop restricting access and just accept the Soft 404s?

I also get Soft 404s on pages that only sometimes redirect (the script runs before the doctype and headers), say if a user has an invalid URL variable, but the pages themselves come out fine.
That's not so much of a problem, as I could just make another redirect script, but I'm curious why Google thinks they're all Soft 404s when they all return 200 OK.
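
Those pages look roughly like this (simplified; the variable names are made up):

    <?php
    // Runs before the doctype: bounce to another page if the URL variable is bad.
    $id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
    if ($id <= 0) {
        header('Location: /index.php'); // placeholder fallback page
        exit;
    }
    // ...otherwise fall through and render the normal page below.
    ?>
    <!DOCTYPE html>
    <html>
    <head><title>Normal page</title></head>
    <body>Page content for id <?php echo $id; ?></body>
    </html>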

Any help is much appreciated.

If there are no pages within the /site/ folder that Googlebot has any reason to visit at all, then it's fine to leave it disallowed via robots.txt (in my opinion). When you ask whether Googlebot will still be able to "look around and sort the rest", are you worried that there are pages that are NOT in the /site/ folder but are only linked to from pages within the /site/ folder?
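
(For clarity, I'm assuming the restriction is a plain robots.txt rule along these lines:)

    User-agent: Googlebot
    Disallow: /site/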

Yes, some of the pages are only linked internally via the redirect scripts in the /site/ folder. My worry is that Google wouldn't be able to see those pages at all.

Yes, that's true ... Google won't be able to follow internal links to pages that are only reachable through pages within the /site/ folder if the /site/ folder is in the robots.txt disallow list. What's more, Google probably already knows about those pages from external backlinks (or from before the /site/ folder was disallowed), and because it can no longer find any links to them, it considers them "orphan pages", which reads as a bad navigation structure.

In that case, disallowing the entire /site/ folder is not a good strategy.

Cheers, I've unblocked it now! Soft 404s are certainly better than Google missing pages completely.

You can use the noindex meta tag on the soft-404 pages so that they don't count against you as spamming the Google index with lots of low-quality pages.
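
That is, something like this in the <head> of each of those pages:

    <meta name="robots" content="noindex">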

The pages don't send headers though (maybe that's the problem?)

All they contain is the PHP code, because all they do is redirect. They don't output any HTML at all, and if they did, the redirect would fail with PHP's "headers already sent" error. Am I doing something wrong? Is this bad practice?

If all they do is redirect, they shouldn't be showing up as soft-404 pages.
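
One thing worth double-checking: PHP's header('Location: ...') sends a 302 by default, and if any output (even a stray space) gets out first, the redirect fails and the page comes back as a plain 200 with next to no content, which is exactly the sort of thing Google reports as a soft 404. A rough sketch with an explicit status (the destination is a placeholder):

    <?php
    // Send an explicit permanent redirect so crawlers see a real 301,
    // not a 200 response that could be read as a soft 404.
    $target = '/pages/view.php?id=42'; // placeholder destination

    header('Location: ' . $target, true, 301); // third argument sets the HTTP status code
    // If you also want to be sure these scripts never get indexed, an HTTP header
    // works even with no HTML output:
    // header('X-Robots-Tag: noindex');
    exit;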

My thoughts exactly. Thanks again for your help.
