Algorithm for fuzzy match of login id / username


Hi,

I just joined and have started working on a project that may already have been done elsewhere. I have a database that stores information about users: login ID, first name, last name, employee ID, email, etc. I've been asked to devise an algorithm that does some kind of fuzzy matching, so that whenever we import a new user we can compare the login ID against the other data elements to see whether it belongs to the same person. For example: jdoe has an 80% probability of matching an entry with first name John and last name Doe.

So we would have a set of rules and pattern matching based on 5 or 6 data elements.
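A minimal Python sketch of what such rules could look like. The login conventions (first initial + last name, first.last, and so on), the scores attached to them, and the treatment of trailing digits are all hypothetical assumptions for illustration; real rules would have to be derived from the naming conventions your organisation actually uses.

```python
# Hypothetical login-ID conventions, each paired with a hand-picked score.
# Both the patterns and the scores are illustrative assumptions, not a standard.
RULES = [
    (lambda first, last: first + "." + last, 0.9),   # john.doe
    (lambda first, last: first + last, 0.85),        # johndoe
    (lambda first, last: first[:1] + last, 0.8),     # jdoe
    (lambda first, last: first + last[:1], 0.6),     # johnd
    (lambda first, last: last + first[:1], 0.6),     # doej
]


def login_match_score(login: str, first: str, last: str) -> float:
    """Return the score of the best rule that rebuilds `login` from the names, else 0.0."""
    first, last = first.strip().lower(), last.strip().lower()
    # Assumption: trailing digits only disambiguate duplicates (jdoe2 -> jdoe).
    base = login.strip().lower().rstrip("0123456789")
    if not base:
        return 0.0
    return max((score for build, score in RULES if build(first, last) == base), default=0.0)


print(login_match_score("jdoe", "John", "Doe"))      # 0.8
print(login_match_score("john.doe", "John", "Doe"))  # 0.9
print(login_match_score("jdoe2", "John", "Doe"))     # 0.8
print(login_match_score("jsmith", "John", "Doe"))    # 0.0
```

The scores here are hand-picked ranking weights, not calibrated probabilities; arriving at a figure like "80%" would require checking the rules against a set of already-confirmed matches.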

Does anyone know whether this has been done before, or of any references or open-source code that could help?

 
Thanks!

Jim

 

 : atomz4peace

Replies (1)

Your task is very similar to duplicate detection in addresses. A number of good fuzzy string matching algorithms exist, and luckily most of them are very easy to implement.

A few months ago somebody posted the "Fuzzy String Matching Engine", which actually implements most of them in Visual Basic. See http://www.kdkeys.net/forums/thread/6450.aspx 

From my personal experience I recommend the Dice coefficient (with n = 2, i.e. bigrams, because names are very short). It is very easy to implement and works reliably. Hint: if you have a large number of users, try storing the n-grams in a database table; this is a lot faster than running a string comparison against thousands of names.
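Hans's suggestion can be sketched in Python. This is a minimal illustration, assuming the common set-based form of the Dice coefficient on character bigrams (no padding characters). The SQLite table and column names (`person_name`, `name_ngram`, `n_bigrams`) are hypothetical, chosen only to show how an indexed n-gram table lets the database pick out candidates that share at least one bigram with the query, instead of comparing it against every stored name.

```python
import sqlite3


def bigrams(s: str) -> set[str]:
    """Set of overlapping 2-character substrings (n = 2), case-insensitive."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over the bigram sets X and Y."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:  # both strings shorter than 2 characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def build_index(conn: sqlite3.Connection, names: list[str]) -> None:
    """Store each name once plus one row per distinct bigram (hypothetical schema)."""
    conn.execute("CREATE TABLE person_name (id INTEGER PRIMARY KEY, name TEXT, n_bigrams INTEGER)")
    conn.execute("CREATE TABLE name_ngram (name_id INTEGER, ngram TEXT)")
    conn.execute("CREATE INDEX idx_ngram ON name_ngram (ngram)")
    for name in names:
        grams = bigrams(name)
        cur = conn.execute(
            "INSERT INTO person_name (name, n_bigrams) VALUES (?, ?)", (name, len(grams)))
        conn.executemany(
            "INSERT INTO name_ngram (name_id, ngram) VALUES (?, ?)",
            [(cur.lastrowid, g) for g in grams])
    conn.commit()


def best_matches(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, float]]:
    """Fetch only names sharing a bigram with `query`; COUNT(*) is |X & Y|, ranked by Dice."""
    q = bigrams(query)
    if not q:
        return []
    placeholders = ",".join("?" * len(q))
    sql = (
        "SELECT p.name, 2.0 * COUNT(*) / (? + p.n_bigrams) AS score "
        "FROM name_ngram g JOIN person_name p ON p.id = g.name_id "
        f"WHERE g.ngram IN ({placeholders}) "
        "GROUP BY p.id ORDER BY score DESC, p.name LIMIT ?"
    )
    # Positional parameters bind left to right: |X|, the query bigrams, then the limit.
    return conn.execute(sql, [len(q), *q, limit]).fetchall()


print(dice("Meier", "Meyer"))  # 0.5

conn = sqlite3.connect(":memory:")
build_index(conn, ["john", "jon", "joan", "doe", "jane"])
print(best_matches(conn, "jon"))  # [('jon', 1.0), ('joan', 0.4), ('john', 0.4)]
```

Because bigrams are so short, a single transposition can wipe out all shared pairs: "jhon" and "john" have no bigram in common in this sketch. Padding each name with a space on both sides before extracting bigrams is one way to keep some overlap in such cases.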

Regards,
Hans
 

: Hans_Meier

saria goudarzi

Hi friends,

I need code for implementing hierarchical fuzzy clustering, or software that implements it.

Please help me, guys, it's urgent.

Wishing you happiness


