I have created a website.
The sitemap was generated with sitemap-xml.
It contains 38 web pages.

But the indexed pages, checked with site:websiteA.com, include extra pages.
These extra pages are not in the sitemap.

Do I need to include the indexed pages in the sitemap?
OR
Do I need to remove the indexed pages via Google Webmaster Tools?

Thanks

All 7 Replies

Search engines will crawl any public content on your website and possibly index it. A URL doesn't necessarily have to be in your sitemap. As long as the content is linked from another known source, sooner or later it will get discovered.

If you wish to restrict indexing of content, consider using the robots.txt file or a meta robots tag, which all well-behaved search engines will observe. Bad bots will simply ignore your directions and crawl regardless.
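
To illustrate how a well-behaved crawler applies robots.txt rules, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules and the /private/ path are made up for the example, they are not taken from your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is only an example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://example.com/index.html"))           # True
print(is_crawlable("http://example.com/private/report.html"))  # False
```

The per-page alternative is a tag such as <meta name="robots" content="noindex"> in the page's head section.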

If you need to prevent content from being indexed, make it private, i.e. require some form of authentication.

Sometimes URLs can get indexed multiple times. This can happen, for example, when you have dynamically generated content. You might have one URL, but perhaps the content is displayed differently depending on the querystring, e.g.

http://example.com/some-list.php?sort=ascending
http://example.com/some-list.php?sort=descending

Only one page, but two URLs!

Search engines tend to regard this as duplicate content. In this situation, best practice is to use 'link rel canonical' to avoid a penalty.
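
As a rough sketch of the idea, the Python below maps both example URLs to a single canonical URL and builds the matching link tag. It assumes that "sort" only changes presentation, not content; the function names are invented for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only affect how the page is displayed.
PRESENTATION_PARAMS = {"sort"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters (and any fragment) from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the tag to place in the page's head section."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_link_tag("http://example.com/some-list.php?sort=ascending"))
print(canonical_link_tag("http://example.com/some-list.php?sort=descending"))
# Both lines print: <link rel="canonical" href="http://example.com/some-list.php">
```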

Thanks Lax.

So is it good to include the extra web page links in the sitemap?
OR
not to include them?

I think it is good to include them, right?
If so, is adding them manually the only way to include those links in the sitemap?
Because the sitemap-xml tool I used to generate the sitemap does not include those extra links.

Thanks.

You can also check your indexed pages in Google Webmaster Tools.

Why does it show different values?
No. of indexed pages from Google Webmaster Tools: 34
No. of pages found by checking in Google: 55+

See attached

Adding the extra URLs won't hurt. And the sitemap will allow you to specify things like priority and update frequency of a page, which search engines will take as a hint.

If your sitemap tool has missed entries, you can always add them manually in a text editor. More information on sitemaps can be found here: http://www.sitemaps.org/
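
If editing by hand gets tedious, something like the following Python sketch can write the entries for you, including the priority and update frequency hints. The URLs under websiteA.com and the chosen values are placeholders, and build_sitemap is a name made up for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Placeholder entries: (URL, update frequency, priority between 0.0 and 1.0).
ENTRIES = [
    ("http://websiteA.com/", "weekly", 1.0),
    ("http://websiteA.com/extra-page.html", "monthly", 0.5),
]


def build_sitemap(entries) -> str:
    """Return sitemap XML (without the XML declaration) for the given entries."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." with no prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(ENTRIES))
```

Save the output as sitemap.xml with an XML declaration on the first line, then resubmit it.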

Google search result counts are typically just a sample of the data. They tend to understate the number of pages actually indexed for some reason. Google Webmaster Tools, on the other hand, should give you a better idea of how many pages are indexed. BTW, your screen grab shows pages indexed from your sitemap. Have a look around GWT and you should see a count of all pages indexed.

Thanks Bro,

Now I have two types of data from Google Webmaster Tools:
1: total pages indexed from the submitted sitemap = 34
2: total pages indexed = 65

So one question:
How can I get the details of all the pages? I mean not manually. Can any tool generate a list of those 65+ pages? Does Google provide such a tool?

Because my sitemap lists only 36 pages, but my website has 65+ :(

There are plenty of standalone sitemap generators around. Unfortunately I cannot recommend any in particular, but it will be better to choose a server-side tool. Online sitemap generators won't be able to discover content that isn't already linked.

Try this search: http://duckduckgo.com/?q=server-side+sitemap+generator
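
To show why a server-side tool can find unlinked pages, here is a bare-bones Python sketch that walks the document root on disk instead of following links. The function name and the extension list are assumptions for the example, and it only suits sites whose pages are plain files, not ones generated from a database.

```python
import os


def discover_urls(doc_root, base_url, extensions=(".html", ".htm", ".php")):
    """List a URL for every page file found under doc_root.

    Assumes each file maps directly to a URL path below base_url.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.lower().endswith(extensions):
                rel_path = os.path.relpath(os.path.join(dirpath, name), doc_root)
                urls.append(base_url.rstrip("/") + "/" + rel_path.replace(os.sep, "/"))
    return sorted(urls)
```

Calling discover_urls with your own document root and site address returns every matching page, linked or not, which you can then feed into the sitemap.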

Also, a sitemap generator is the type of feature I'd expect to find built into a content management system, or possibly available as an add-on. Do you have a CMS? What web server software are you using?
