Hi,

I'm wondering how I could implement an efficient fuzzy-match algorithm in either C# or SQL. I was using Levenshtein distance on my test data (~10 records), but once I moved to a larger data set (~800 records) I realized this may not be the best way to go: it took approx. 4 minutes to run through the ~800 records to find one match. This is for a web form our clients will use, so we are looking for something that takes ~10 seconds or less.
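
For anyone reading along, here is a rough sketch of one way to make the Levenshtein comparison cheaper: keep only two rows of the table and give up as soon as the distance is certain to exceed a budget. The names FuzzyMatch, BoundedLevenshtein and maxDistance are made up for illustration, and this is not the code that produced the timings above.

```csharp
using System;

public static class FuzzyMatch
{
    // Levenshtein distance with an early exit. Returns maxDistance + 1
    // as soon as the real distance is known to be larger than maxDistance.
    public static int BoundedLevenshtein(string a, string b, int maxDistance)
    {
        // A length gap bigger than the budget already rules the pair out.
        if (Math.Abs(a.Length - b.Length) > maxDistance) return maxDistance + 1;

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.Min(rowMin, curr[j]);
            }
            // Row minima never decrease, so the final distance cannot be smaller.
            if (rowMin > maxDistance) return maxDistance + 1;

            // Swap the two rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.Min(prev[b.Length], maxDistance + 1);
    }
}
```

With a small budget (say 2 edits for a name), most non-matching records are rejected after the length check or the first few rows, which is where the savings would come from.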

I have to search through customer data and match on firstname, lastname, social, birthdate, and the first line of the address. I need to account for typos and missing values, so I can't just match on social. Currently, the database I am using has about 800 customers, but eventually this program will be used for larger databases (in the thousands).

I am looking for some free tools that I may be able to use. I am looking for records that match 80% or better. I have also tried Soundex, but I do not think it would be accurate enough for this. For example, I have a customer Jim Saunders, and if I search for firstname=Jim, lastname=Saun, Jim Saunders does not come up. I need to account for incomplete names as well.
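
To make the "80% across several fields, with blanks and partial names" idea concrete, here is a minimal sketch. It assumes each record is a dictionary of field name to value and that some string similarity function returning 0.0 to 1.0 is passed in; RecordMatcher, Score and the similarity parameter are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class RecordMatcher
{
    // Averages per-field similarity over the fields that are filled in on both sides.
    // 'similarity' is whatever string comparison you choose (a placeholder here).
    public static double Score(
        IDictionary<string, string> query,
        IDictionary<string, string> candidate,
        Func<string, string, double> similarity)
    {
        double total = 0;
        int compared = 0;
        foreach (KeyValuePair<string, string> field in query)
        {
            string wanted = (field.Value ?? "").Trim();
            if (wanted.Length == 0) continue; // blank search field: not penalized

            string stored;
            if (!candidate.TryGetValue(field.Key, out stored) ||
                string.IsNullOrWhiteSpace(stored))
                continue; // missing stored value: skipped (an assumption, adjust as needed)

            stored = stored.Trim();
            // Partial input such as "Saun": compare against the same-length prefix.
            if (stored.Length > wanted.Length) stored = stored.Substring(0, wanted.Length);

            total += similarity(wanted.ToUpperInvariant(), stored.ToUpperInvariant());
            compared++;
        }
        return compared == 0 ? 0.0 : total / compared;
    }
}
```

A candidate would then count as a match when Score(...) >= 0.8. Skipping missing values and truncating to a prefix are design choices for this sketch; whether they suit fields like social or birthdate is a judgment call.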

Thanks.

Edit: I just tried a different approach that may work...
I use Soundex again, but instead of matching the whole name, I take the first three characters of the first and last name, e.g. JIM SAU, and that matches. But is there a way to match numbers? I cannot use Soundex to match on birthdate or social.
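
On the numbers question, one possible approach (a sketch, not a tested solution) is to strip everything but the digits and count how many positions agree. NumericMatch, DigitsOnly and DigitSimilarity are made-up names for illustration.

```csharp
using System;
using System.Linq;

public static class NumericMatch
{
    // Keeps only the characters 0-9, so "123-45-6789" and "123 45 6789" compare equal.
    public static string DigitsOnly(string value)
    {
        return new string((value ?? "").Where(c => c >= '0' && c <= '9').ToArray());
    }

    // Fraction of digit positions that agree, relative to the longer input.
    public static double DigitSimilarity(string a, string b)
    {
        string x = DigitsOnly(a);
        string y = DigitsOnly(b);
        int longest = Math.Max(x.Length, y.Length);
        if (longest == 0) return 0.0; // nothing to compare

        int shortest = Math.Min(x.Length, y.Length);
        int same = 0;
        for (int i = 0; i < shortest; i++)
        {
            if (x[i] == y[i]) same++;
        }
        return (double)same / longest;
    }
}
```

For birthdates this only makes sense if both values are first written in the same order (for example yyyyMMdd); otherwise the positions do not line up. A dropped or extra digit shifts everything after it, so an edit-distance comparison on the digit strings may work better for that kind of typo.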

Edited 5 Years Ago by faintfascinatio: n/a

Not sure if this will help you, but you can use LIKE when trying to match strings in SQL. To take your example:

SELECT * FROM Customer WHERE FirstName LIKE 'Jim%' AND LastName LIKE '%Sau%'

This will select all customers whose first names start with Jim and whose last names contain Sau. There are other wildcard characters that you can use in LIKE patterns too, such as _, which matches exactly one character.

No, I'm looking for something less exact. I would like to be able to account for typos: for example, if they misspelled Saunders as Suanders, I'd like Jim Saunders to come up.
Thanks though.
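
For swapped-letter typos like Suanders, one option worth trying is the optimal string alignment distance (the restricted form of Damerau-Levenshtein), which counts two adjacent swapped letters as a single edit. Below is a minimal sketch; TypoTolerantMatch, OsaDistance and Similarity are hypothetical names, and dividing by the longer length is just one way to turn the distance into a percentage.

```csharp
using System;

public static class TypoTolerantMatch
{
    // Optimal string alignment distance: Levenshtein edits plus
    // swapping two adjacent characters, each counted as one edit.
    public static int OsaDistance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
                // Adjacent transposition, e.g. "au" typed as "ua".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
            }
        }
        return d[a.Length, b.Length];
    }

    // Case-insensitive similarity in the range 0.0 to 1.0.
    public static double Similarity(string a, string b)
    {
        a = (a ?? "").Trim().ToUpperInvariant();
        b = (b ?? "").Trim().ToUpperInvariant();
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 0.0; // nothing to compare
        return 1.0 - (double)OsaDistance(a, b) / longest;
    }
}
```

With this sketch, Similarity("Saunders", "SUanders") is 0.875 (one edit over eight letters), which clears an 80% cutoff, whereas plain Levenshtein counts the swap as two edits and gives 0.75.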
