I want Google to index forum threads
We all know that Google doesn't like to index dynamic pages. However, I just checked the latest daniweb.com pages indexed by Google, and guess what I found?
Google has indexed just about every member's profile page, but not the forum threads! The member profile pages contain less original content, and their URLs contain the same number of variables (e.g. member.php?u=1 as opposed to showthread.php?p=100). Why is this, and how could I get Google to index the important pages instead of the meaningless ones?
cscgal
The Queen of DaniWeb
19,421 posts since Feb 2002
Reputation Points: 1,474
Solved Threads: 229
That is very odd. I'm sure you know your site better than anyone, so picture a site map in your mind. Think about how Google could have gotten there without seeing or indexing these pages. That's the only advice I can give.
evilmonkey29
Junior Poster in Training
71 posts since Aug 2002
Reputation Points: 11
Solved Threads: 3
Nope, no robots.txt. I've been putting more effort into this since I first posted a few weeks ago, and it's getting a bit better. A number of threads are indexed now.
Jibtronic, have you tried using the phpBB search-engine friendly hack? It's awfully good. You can find it via phphacks.com; it's the one that uses mod_rewrite and an .htaccess file.
It lets you convert all viewtopic.php?t=100 threads into URLs like t100.html, because we all know Google likes static URLs much better than dynamic ones.
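To make the mapping concrete, here is a minimal Python sketch of the URL translation described above. It is not the mod's actual code (the real hack does this with mod_rewrite rules in .htaccess); the function names and patterns are made up for illustration, and only the viewtopic.php?t=100 and t100.html shapes come from this thread.

```python
import re

# Hypothetical patterns for illustration; the real mod's rules may differ.
DYNAMIC_THREAD = re.compile(r"^viewtopic\.php\?t=(\d+)$")
STATIC_THREAD = re.compile(r"^t(\d+)\.html$")


def to_static(url: str) -> str:
    """Turn viewtopic.php?t=100 into t100.html; leave other URLs alone."""
    match = DYNAMIC_THREAD.match(url)
    return f"t{match.group(1)}.html" if match else url


def to_dynamic(url: str) -> str:
    """Map t100.html back to viewtopic.php?t=100, as a rewrite rule would."""
    match = STATIC_THREAD.match(url)
    return f"viewtopic.php?t={match.group(1)}" if match else url


if __name__ == "__main__":
    print(to_static("viewtopic.php?t=100"))
    print(to_dynamic("t100.html"))
```

The forum links out with the static form, and the server quietly translates it back to the dynamic script when the request comes in, so the crawler only ever sees the static-looking URL.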
I actually wrote my own version of this mod for my own use a while back (when this forum ran off of phpBB), but I never released it. Guess someone else finally did :)
cscgal
Google will now log variables, but it has some blocked, like ?id=(n), so I guess that ?p= is one of the blocked ones.
Ragnarok
Junior Poster in Training
94 posts since Mar 2004
Reputation Points: 10
Solved Threads: 0
Google doesn't necessarily have certain variables such as ?id= blocked. Rather, Google doesn't want to index URLs containing a session ID (that long value like 8482349809934 ...). If you don't know what I'm talking about, check out any page on Amazon.com ;)
In any case, Google doesn't block variables such as ?id= or ?sid= but rather any variable that has a value that resembles a session ID.
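As a rough illustration of that distinction, here is a toy Python check that flags a URL by the shape of its values rather than by the variable name. This is only a guess at the idea, not Google's actual rule: the length threshold and function names are assumptions made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed threshold: values at least this long are treated as session-like.
MIN_SESSION_LENGTH = 12


def looks_like_session_id(value: str) -> bool:
    """Guess whether a query value is a session token rather than a record ID."""
    return len(value) >= MIN_SESSION_LENGTH and value.isalnum()


def has_session_like_value(url: str) -> bool:
    """True if any query parameter in the URL carries a session-like value."""
    query = urlsplit(url).query
    return any(looks_like_session_id(value) for _, value in parse_qsl(query))
```

Under these assumptions, showthread.php?p=100 and member.php?u=1 both pass, while a URL carrying a value like 8482349809934 is flagged no matter what the variable is called.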
Regardless, I found out my problem. I was doing a Google search for site:www.daniweb.com daniweb to return all URLs in the daniweb.com domain that contain the word "daniweb", basically to estimate how many of my pages are indexed by Google. Apparently my member profile pages just featured the word "daniweb" more prominently than my other pages, which wouldn't display until I clicked the "show all pages" link at the end of the Google search results.
Regardless, guests have now been prohibited from viewing member pages, so Google won't see them anymore anyways ;)
cscgal
Do this: search with allinurl: instead of site:
May I ask what the difference is?
cscgal