The new robots.txt tester that is now built into Google Search Console is finding errors with the last two rows of my robots.txt file:

Sitemap: https://www.daniweb.com/latest-sitemap.xml
Sitemap: https://www.daniweb.com/sitemap.xml

The error is: Invalid sitemap URL detected; syntax not understood

Any idea what could be wrong here?

robertoben41 commented: The issue may be due to having multiple Sitemap entries. Consider consolidating both into one line, like this: Sitemap: https://www.daniweb.com/late +0

All 16 Replies

Knowing how obtuse some error messages are, could it be referring to one of the URLs inside either or both of the XML files?

Nope, the sitemap files are parsed by Google error-free.

Here's the first one it has a problem with:

[Screenshot: Screenshot_2023-11-30_at_3.13.33_PM.png]

The only thing I can think of is that this new robots.txt parser is still buggy and expects a sitemap file as opposed to a sitemap index file, even though Google acknowledges here that it can be a sitemap index file just fine.

sitemap: [absoluteURL]
The [absoluteURL] line points to the location of a sitemap or sitemap index file. It must be a fully qualified URL, including the protocol and host, and doesn't have to be URL-encoded. The URL doesn't have to be on the same host as the robots.txt file. You can specify multiple sitemap fields. The sitemap field isn't tied to any specific user agent and may be followed by all crawlers, provided it isn't disallowed for crawling.
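
Just to sanity-check the "fully qualified URL" requirement from that doc, here's a minimal sketch (my own check in Python, not Google's parser) confirming that both Sitemap lines carry a scheme and a host:

    from urllib.parse import urlparse

    # Minimal sketch (not Google's parser): verify each Sitemap line holds a
    # fully qualified URL, i.e. it has both a scheme and a host.
    robots_txt = """\
    Sitemap: https://www.daniweb.com/latest-sitemap.xml
    Sitemap: https://www.daniweb.com/sitemap.xml
    """

    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("sitemap:"):
            url = line.split(":", 1)[1].strip()
            parts = urlparse(url)
            ok = bool(parts.scheme and parts.netloc)
            print(f"{url}: {'fully qualified' if ok else 'missing scheme or host'}")

Both URLs pass, so the complaint doesn't seem to come from that requirement.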

Google Search Console's new robots.txt tester enhances website management by identifying issues in the last two lines of your robots.txt file. This tool aids webmasters in ensuring proper crawling instructions for search engines, streamlining SEO efforts and improving overall website visibility. Regularly using this feature can help maintain optimal search engine indexing.

commented: ??? -2

The only thing I can think of is that this new robots.txt parser is still buggy and expects a sitemap file as opposed to a sitemap index file

No, it's not that; I tested it with a sitemap index file and no issues were reported. If I were you, I would also test having only one sitemap index file in robots.txt, and removing the comments (maybe something weird happens in their URL parser, so I would check once with the file as simple as possible). Of course, making small edits and rechecking takes time because you have to request a recrawl.
Just a quick question: why do you have your sitemaps in your robots.txt file? Don't you declare them explicitly in Google Search Console? If there is no gain to it, wouldn't that just make life easier for bad bots?

Why do you have your sitemaps in your robots.txt file? Don't you declare them explicitly in Google Search Console ?

Sitemaps are a valid part of the robots.txt protocol and are used not just by Google, but also by Bing, DuckDuckGo, and other smaller engines.
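
For what it's worth, even Python's standard-library robots.txt parser picks them up. A quick sketch (needs network access, and Python 3.8+ for site_maps()):

    from urllib.robotparser import RobotFileParser

    # Quick sketch: Python's stdlib parser (one of many non-Google consumers)
    # reads Sitemap fields straight out of robots.txt. Requires Python 3.8+.
    rp = RobotFileParser()
    rp.set_url("https://www.daniweb.com/robots.txt")
    rp.read()
    print(rp.site_maps())
    # e.g. ['https://www.daniweb.com/latest-sitemap.xml', 'https://www.daniweb.com/sitemap.xml']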

If I were you, I would also test having only one sitemap index file in robots.txt, and removing the comments (maybe something weird happens in their URL parser, so I would check once with the file as simple as possible). Of course, making small edits and rechecking takes time because you have to request a recrawl.

Fortunately, requesting a recrawl through Search Console takes only moments. Unfortunately, removing the comments didn't work, having just one sitemap file didn't work, and changing that one sitemap file to point to a sitemap instead of a sitemap index didn't work either.

This is probably a known issue with the Google Search Console robots.txt tester; see the report on Google support. One of the weird things is that it doesn't always happen (for example, I tested a robots.txt containing a sitemap index file without any issue being reported).

Ahh, yeah. It seems like it's an open bug.

An "Invalid Robots.txt" error means there is a problem with the robots.txt file on your site. The robots.txt file is used to instruct search engines which pages or sections of your website to crawl or not to index.

Here are some reasons why your robots.txt file may be considered invalid:

Grammatical errors:
Check the robots.txt file for grammar errors. Even minor mistakes or formatting errors can make the file invalid. Instructions must be properly structured (eg, user agent, block).
Legal configuration error:
Be sure to correctly define the rules specified in the robots.txt file. Incorrect configuration can prevent search engines from accidentally accessing important parts of your site.
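
Here is the kind of rough lint I mean; it's only a sketch (not Google's validator) that flags lines whose directive a typical robots.txt parser wouldn't recognize:

    # Rough lint sketch (not Google's validator): flag robots.txt lines whose
    # directive isn't one a typical parser understands.
    KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

    with open("robots.txt", encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.split("#", 1)[0].strip()   # drop any trailing comment
            if not line:                          # skip blank/comment-only lines
                continue
            directive = line.split(":", 1)[0].strip().lower()
            if directive not in KNOWN:
                print(f"line {lineno}: unrecognized directive {directive!r}")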

commented: Why post AI... +15

Did you try the Allow directive, so the website can be crawled and indexed by Googlebot? Be sure to re-submit the corrected sitemap URL.

    Allow: /Sitemap: https://www.daniweb.com/latest-sitemap.xml
    Allow: /Sitemap: https://www.daniweb.com/sitemap.xml

That is not valid robots.txt syntax. There’s a bug in the validator that is allowing it, but I suspect once the bug is fixed it will error on those lines instead.
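
To illustrate with Python's stdlib parser (just a sketch of how a standard parser reads it, not Google's implementation): the combined line is treated as an Allow rule for a literal path, and no Sitemap field is recognized at all.

    from urllib.robotparser import RobotFileParser

    # Sketch: a standard parser treats the combined line as an Allow rule for
    # the path "/Sitemap: ...", so no Sitemap field is ever picked up.
    lines = [
        "User-agent: *",
        "Allow: /Sitemap: https://www.daniweb.com/latest-sitemap.xml",
        "Allow: /Sitemap: https://www.daniweb.com/sitemap.xml",
    ]
    rp = RobotFileParser()
    rp.parse(lines)
    print(rp.site_maps())   # None -- no sitemaps were recognized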

Google now considers my robots.txt as valid. Same one I had from the beginning.

Did you try removing the blank lines and commented lines? I've seen editors insert "invisible" ghost characters that mess up parsing (a quick way to spot them is sketched at the end of this post). I checked the examples and they passed the validator and testing tool(s), with all user agents (*).

<...>
Disallow: /forums/stats.php
Disallow: /forums/tagcloud.php

# Disallow: /tags/
# Disallow: /community-center/say-hello/
# Disallow: /community-center/geeks-lounge/
# Disallow: /community-center/meta-daniweb/
# Disallow: /forums/tag-*.html

Sitemap: https://www.daniweb.com/latest-sitemap.xml
Sitemap: https://www.daniweb.com/sitemap.xml

to:

<...>
Disallow: /forums/stats.php
Disallow: /forums/tagcloud.php
Sitemap: https://www.daniweb.com/latest-sitemap.xml
Sitemap: https://www.daniweb.com/sitemap.xml

The only other difference I see is the urlset declaration.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

versus:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
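
As promised above, a quick sketch for spotting those invisible ghosts (just my own check, scanning for a few of the usual suspects such as a BOM or zero-width spaces):

    # Sketch: scan robots.txt for a few common "invisible" characters that
    # can confuse parsers (BOM, zero-width space, non-breaking space, etc.).
    SUSPECTS = {
        "\ufeff": "BOM / zero-width no-break space",
        "\u200b": "zero-width space",
        "\u00a0": "non-breaking space",
        "\u200e": "left-to-right mark",
    }

    with open("robots.txt", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            for ch, name in SUSPECTS.items():
                if ch in line:
                    print(f"line {lineno}: found {name}")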

Per my previous post, Google fixed the bug in their validator. All is good now.
