I'm not new to SQL or PHP, or basic queries of SQL databases, but I'm creating a site that's going to be dependent on having a really accurate search feature. Basically, I'm just looking for some help on how I can get started, because there's a ton of information out there (various downloads that claim to index databases and such) but it's all very confusing as I don't know exactly what I'm looking for.

I'm going to have a table with information about people (let's say that the columns include "profession," "name," and "location"), and I want a user to be able to search for either of these three values separately (let's say there's a separate search bar to search each of those columns). I just need a really good way for the user to be able to search for these values, where it would be able to suggest searches and fix spelling errors (like Google), but also match "Jim" for "Jimmy" and "NYC" for "N.Y.C." or "New York City."

I really appreciate any tips you might have. I'm just very confused by all the information that's available and have always really been thankful for all the advice given on this site! Thanks so much and have a great day!

5 Years
Discussion Span
Last Post by jeffc418

I would think that the google suggestion for search is based off a custom dictionary, and if an item is not in the dictionary it offers the closest match as a suggestion. What the closest match is will depend on your algorithm. Also, google has been mining search queries, and has methods for collecting all their search queries, for such a long time, their data sets must be massive. One way to do things is to do a daily/weekly or whatever automated query of columns that will check columns for their values and make dictionaries based on what is availabe in your database. So if people use NY or N.Y. or New York in the actual database, your script will automatically add those to a dictionary, and offer those as suggestions.

Of course, as your data becomes large enough, you may actually be able to make a whole new table based on either alphabet (starts with a, then all items with a!), and if it gets really really large (like google), I assume they have multiple tables designed around multiple paths and queries tables based on spelling.. (starts with n, then ne, then new... etc... a space will break a word, and a new word is searched.. each group of letters will have a table dedicated to them).

This may be overkill, or entirely wrong... but it's how I would solve it.

Im guessing that google actually uses the meta data from their spiders to fill their data tables, and has a ranking system for how often certain words come up in meta-data as to how likely the user was looking for that term. So if 3000 sites have the word "folic" and 200 have "folec" then when someone misspells "folic" it offers an alternative based on collective intelligence.

You can use this idea for your ability to suggest... if 1000 people use N.Y. and only 25 use N.T. (due to mistyping), then odds are the user is looking for N.Y.

Similarly, if a user types ne York, you will find that no people use ne followed by another word (they may be looking for new england or whatever), so you will find that New is most likely their search query.

Hope that helps...




Thanks a lot for your advice, I definitely can appreciate what your saying. I'm facinated by what major search engines do to organize their data and it really says a lot about the power behind our databases and their programming languages.

While I can completely conceptualize that, I'm wondering if there's any way to kind of "get my feet off the ground" in terms of creating this smart search engine. Are there any programs or codes that already exist that do something like this? Is this "database indexing," and is that something I should pursue? It just seems like the kind of thing that already exists and users are familiar with, so I'm wondering if I have to reinvent the wheel.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.