Okay, it's been a while since I've asked one of those dumb simple questions you all know the answers to.
When a site is using https instead of http, can spiders still crawl that site?
My understanding is that those pages load slightly slower and are just more secure, but I'm starting to suspect they cause all sorts of problems with spiders.
Thoughts?

All 6 Replies

Compare the Google result counts:
14,550,000,000 for allinurl: http
196,000,000 for allinurl: https

Don't forget that https is often reserved for checkouts and other secure parts of websites that aren't necessarily there to be crawled.

In general, https pages are not crawled or indexed by search engines. If you want a page to be crawled and indexed, it is wise not to encrypt its contents.

I have seen this before and it really concerns me if I am understanding it correctly. I am new to "webmastering" and have read enough to be really confused at this point. I have a GoDaddy Quick Shopping Cart that I have added about 80 products to, with the goal of being searchable and selling online. GoDaddy HIGHLY recommended an SSL certificate, so I added it. Now the entire domain is https (vtsupply) and all the product pages/descriptions are on https pages. I have submitted to all the major search engines for crawling/indexing and am working on my keywords, inbound links, etc.

Am I just spinning my wheels, though, if I am on https? Should I revoke the certificate? GoDaddy says it's no problem for search engines to crawl/index my site, but I'm not sure I believe it, and I don't want to "wait 6 weeks" to view reports and only find out there is a problem then. Another recommendation was to host a second, identical site (another cost and setup) with links to the SSL GoDaddy cart (GoDaddy won't/can't apply the SSL to just the cart since, currently, "my whole site is the cart"). It took a ton of time to set up my existing site and I'm not looking forward to doing it all over again for a duplicate (non-SSL) site linked to the cart site. At this point I am really bummed and confused on this matter. THANKS for any assistance. Aaron

The only pages that need to be secure are your order pages and any other page that captures and transmits sensitive data. Otherwise you are just spinning your wheels and making your server work hard encrypting pages that don't need to be encrypted.

If you serve content via both http and https, you'll need a separate robots.txt file for each of those protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
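
If your http and https sites share the same document root (as with Aaron's cart), the trick is getting each protocol to return its own robots.txt. One way, just a sketch and assuming your host lets you map /robots.txt to a script, is a tiny Python WSGI handler that picks the response based on the request scheme; the routing of /robots.txt to this handler is up to your server config.

# Hypothetical sketch: serve a protocol-specific robots.txt from one codebase.
def robots_app(environ, start_response):
    # Servers set wsgi.url_scheme to "http" or "https" for each request.
    if environ.get("wsgi.url_scheme") == "https":
        body = b"User-agent: *\nDisallow: /\n"   # keep crawlers out of the https copy
    else:
        body = b"User-agent: *\nAllow: /\n"      # let crawlers index the http pages
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]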

There is no way to do it via your robots.txt file; I would recommend that you use the noindex meta tag instead!

commented: It can be done via robots.txt, URL rewriting, and a scripting language.
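
For what it's worth, if editing every page template to add a noindex meta tag isn't practical, the major engines also honor an equivalent X-Robots-Tag response header. Below is a minimal sketch, again assuming a Python/WSGI setup (not something GoDaddy's hosted cart necessarily allows), that marks only https responses as noindex and leaves http responses alone.

# Hypothetical middleware: add "X-Robots-Tag: noindex" to https responses only.
class NoIndexOnHttps:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def tagging_start_response(status, headers, exc_info=None):
            if environ.get("wsgi.url_scheme") == "https":
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return self.app(environ, tagging_start_response)

Wrapping an existing WSGI app would then look like app = NoIndexOnHttps(app).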