Starting from here, I searched for "accident" and got this (saved) list.

What is fairly obvious is that this thread is not in that list.
Why isn't the search finding this recently added thread containing the search term?

What is also obvious is that the results are unsorted.
Is there a way to sort them by date for example?

The search function is not literal, in that it won't return all results containing the query string in age order. It takes numerous things into account (such as word density, etc.) to return the list in order of relevance based on a score it calculates. To the best of my ability, I have made the search function a happy medium between being able to return relevant content and being able to search 500,000 posts rather quickly without consuming too many server resources.
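To illustrate why a relevance-ranked search can miss a recent thread that contains the query word, here is a minimal sketch. It is not DaniWeb's actual algorithm: the `Post` type, the `density_score` function, and the result cutoff are all assumptions made for illustration, and word density is the only signal used.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    body: str
    posted: date


def density_score(post: Post, term: str) -> float:
    """Hypothetical score: fraction of the post's words equal to the term."""
    words = re.findall(r"[a-z0-9']+", post.body.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def search_by_relevance(posts: list[Post], term: str, limit: int = 10) -> list[Post]:
    """Return the highest-scoring posts, not every post containing the term."""
    scored = [(density_score(p, term), p) for p in posts]
    matches = [(s, p) for s, p in scored if s > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches[:limit]]


posts = [
    Post("Old thread", "accident accident report", date(2003, 5, 1)),
    Post("New thread", "I had a small accident with my search query today", date(2007, 9, 1)),
    Post("Unrelated", "sorting results by date", date(2007, 8, 1)),
]

# With a cutoff of one result, the newer but less "dense" thread is dropped.
results = search_by_relevance(posts, "accident", limit=1)
print([p.title for p in results])  # ['Old thread']
```

Under these assumptions the results come back ordered by score rather than by date, and anything below the cutoff never appears, even though it contains the search term.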

How much does it consume, Dani, when the normal search file is used? (I remember you did tell me it was heavy on the server.)

When the normal search is used, it basically takes upwards of five minutes to process any single request. In fact, the load was so heavy that even with this much more optimized searching algorithm, the entire search functionality had to be offloaded to a separate server with its own independent searchable post index because the load of actually reading the live post tables was too high.

Is there any update for this?

So if I want to find a thread where I vaguely recall what was written, I basically have to resort to pot luck as to whether I'll find it or not?

How about saying on the search page that you use a heuristic for performance reasons, and that actual results are in the YMMV category?

Have you ever considered purchasing a Google search engine appliance, which seems to do a pretty good job in a reasonable amount of time?

At least give us some other options like limiting the date range (find all the posts in the last x days/months).

Any kind of sorting of the output would be preferable to what we have now.
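As a rough sketch of what the date-range and sort-by-date options requested above could look like, assuming search hits are available as (title, date) pairs. The `hits` data and the `recent_hits` helper are hypothetical and not part of the site's search code.

```python
from datetime import date, timedelta

# Hypothetical search hits as (thread title, date posted) pairs.
hits = [
    ("Car accident simulation", date(2007, 9, 20)),
    ("Accident-prone pointer code", date(2004, 2, 11)),
    ("Deleted my files by accident", date(2007, 8, 30)),
]


def recent_hits(hits, today, max_age_days):
    """Keep hits posted within the last max_age_days days, newest first."""
    cutoff = today - timedelta(days=max_age_days)
    kept = [hit for hit in hits if hit[1] >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


last_90_days = recent_hits(hits, today=date(2007, 10, 1), max_age_days=90)
print([title for title, _ in last_90_days])
```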

> So if I want to find a thread where I vaguely recall what was written, I basically have to resort to pot luck as to whether I'll find it or not?

If you use the "search entire site" option in the sidebar you'll be querying Google for the results. You'll probably find these results more to your liking.

> Have you ever considered purchasing a Google search engine appliance, which seems to do a pretty good job in a reasonable amount of time?

The Google Search Appliance starts at $30,000 and searches up to 500,000 documents. Firstly, that is way out of my price range, especially since we already have search functionality which works reasonably well for the majority of users. Secondly, the 500,000 document limit wouldn't last us more than another few months, tops. There is a mini appliance that, for only $2,000, can search up to 50,000 documents, but that would be less effective than what we have now because it would only be able to index half of our current forum threads, no future threads, none of the blog entries, and none of the code snippets.

> How about saying on the search page that you use a heuristic for performance reasons

I think that it can pretty much be assumed whenever querying a huge dataset that some level of heuristics is involved. There would be no positive benefit to having such a disclaimer.

> Any kind of sorting of the output would be preferable to what we have now.

I'll see what I can do.

> > How about saying on the search page that you use a heuristic for performance reasons
>
> I think that it can pretty much be assumed whenever querying a huge dataset that some level of heuristics is involved. There would be no positive benefit to having such a disclaimer.

No it can't. Most users that search have absolutely no idea what backend they are searching, nor the extent of the information being searched. A webmaster may understand, but non-web-gurus don't. I do not consider myself a neophyte, nor is Salem, and it seems neither of us made this obvious assumption.

I'm with Salem here.

I understand not everyone is familiar with web development. In that case, I'll explain to you ... Google's algorithm uses some level of heuristics. The default vBulletin algorithm uses heuristics, although we've never used this algorithm. The MySQL fulltext search, which we used up until recently, when it was no longer able to handle our load, uses heuristics. The algorithm we use now uses heuristics. None of these algorithms has ever returned every post or thread that contains the query word.

To be quite honest, the algorithm we use now is MUCH more accurate than vBulletin's default search (which is why I opted to never use it).

That's kool, but how does that help the next person that comes by and expects answers from a search? Now only those that read this thread understand heuristics, but the other 223,045 members (and uncountable lurkers) still have no clue (not that they'll read it from the search page anyway, but there's a chance). ;)

Well, the point I'm trying to make is that, at least for those in the know, all forum searches always involve some type of heuristics. Being the only forum out there that actually announces this fact couldn't possibly do anything other than portray a bad image, even though everyone else does the same thing. If it hasn't bothered you for the past 5 years DaniWeb's been alive, and it doesn't bother you on every other forum out there, why does it suddenly need to be announced front and center?

The Google "entire site" results are just as wonderfully haphazard as the internal search.

When searching a forum (or forum group), how about searching the database in reverse chronological order? For the first few months, say, the search would be accurate, becoming increasingly heuristic with the age of the thread being searched.

That way, it would be able to find precisely anything that was posted in a recent time frame, and still be good enough from your perspective of not overly hammering the servers with expensive search requests.
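The proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: the `POSTS` data, the 90-day window, the `density` scoring, and the `old_limit` cutoff are all hypothetical, and a real forum would query an index rather than a Python list.

```python
from datetime import date, timedelta

# Hypothetical posts as (title, body, date posted) tuples.
POSTS = [
    ("Recent A", "had an accident with malloc", date(2007, 9, 25)),
    ("Recent B", "one more accident report among many other words here", date(2007, 9, 1)),
    ("Old dense", "accident accident accident", date(2003, 1, 5)),
    ("Old sparse", "a long post that mentions an accident only once in passing", date(2002, 6, 9)),
]


def density(body, term):
    """Hypothetical heuristic: share of words in the body equal to the term."""
    words = body.lower().split()
    return words.count(term) / len(words) if words else 0.0


def tiered_search(posts, term, today, exact_days=90, old_limit=1):
    """Exact matching for recent posts, heuristic top-N for older ones."""
    term = term.lower()
    cutoff = today - timedelta(days=exact_days)

    # Recent tier: every post containing the term, newest first.
    recent = [p for p in posts if p[2] >= cutoff and term in p[1].lower()]
    recent.sort(key=lambda p: p[2], reverse=True)

    # Older tier: only the best-scoring matches, to keep the search cheap.
    older = [p for p in posts if p[2] < cutoff and density(p[1], term) > 0]
    older.sort(key=lambda p: density(p[1], term), reverse=True)

    return recent + older[:old_limit]


found = tiered_search(POSTS, "accident", today=date(2007, 10, 1))
print([p[0] for p in found])  # ['Recent A', 'Recent B', 'Old dense']
```

Every recent match is returned in date order, while older matches are still subject to a heuristic cutoff, so "Old sparse" is dropped here.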

Perhaps "enable" this feature with a "didn't find what you were looking for?" link at the end of the search results, which repeats the search with an improved algorithm?

Another thing: sorting search results by date might cut down on the proclivity of noobs to do a search and then bump some thread which is several years old (but a good heuristic match) with their pointless "me too" posts.

> The Google "entire site" results are just as wonderfully haphazard as the internal search.

If the results Google (a company that made billions off of search) generates for you don't live up to your expectations, then I can't imagine how you could expect that the results my measly little server returns possibly could.
