I think you should reinvent the wheel. Depending on the structure of the different fields, the code can be very short, for example:
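import re

# Matches a single run of word characters (letters, digits, underscore).
name_re = re.compile(r'^\w+$')

def IsItAName(s):
    # True when the whole string is one such word.
    return name_re.match(s) is not None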
Do you have a description of all possible inputs?
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
Thanks Gribouillis,
Alas, I do not have a description of all possible inputs. If I did, you're right, this could be very simple indeed!
Names could be the first and/or last names of any person, with or without titles, initials, misspellings, etc. Dates are entered as text, and could really be in any format. These are mixed together with fields of short sentences, phrases, single words, and other cruft.
I think I could whip up an IsItADate() test pretty easily that would be right most of the time.
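A minimal sketch of what such an IsItADate() test might look like, assuming the dates mostly follow a handful of common English-language layouts. The DATE_FORMATS list is a hypothetical starting point, not a description of the real data, and it would need extending as new formats turn up.

```python
from datetime import datetime

# Hypothetical list of layouts; extend it for whatever shows up in the data.
DATE_FORMATS = (
    "%Y-%m-%d",    # 2008-07-14
    "%d/%m/%Y",    # 14/07/2008
    "%m/%d/%Y",    # 07/14/2008
    "%d %B %Y",    # 14 July 2008
    "%B %d, %Y",   # July 14, 2008
    "%d %b %Y",    # 14 Jul 2008
    "%b %d, %Y",   # Jul 14, 2008
)

def IsItADate(s):
    """Return True if s parses under at least one known format."""
    text = s.strip()
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            # Not this format; try the next one.
            continue
    return False
```

This only answers "does it look like a date", so an ambiguous string such as 03/04/2008 passes without saying which of the two readings is meant. Month names are matched using the current locale, which is assumed here to be English.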
I suspect a proper IsItAName() function couldn't be implemented without two things: 1) a really long list of first and last names, and 2) some soft-ish rules, such as whether it is less than 5 words, whether it uses only alphabetic characters, etc.
If these were well-coded, such functions would return a number indicating the likelihood that a string is a name, or a date, rather than just a TRUE or FALSE.
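As a rough sketch of that idea, the two ingredients could be combined into a score between 0 and 1. Everything here is an assumption made for illustration: the function name NameLikelihood, the tiny KNOWN_NAMES and TITLES sets (a real list would be far longer), and the weights, which are arbitrary rather than tuned.

```python
import re

# Tiny illustrative samples; a real list would hold many thousands of names.
KNOWN_NAMES = {"john", "mary", "anna", "smith", "garcia", "lee"}
TITLES = {"mr", "mrs", "ms", "dr", "prof"}

# Letters only (Unicode-aware): no digits, no underscore.
ALPHA_RE = re.compile(r"^[^\W\d_]+$")

def NameLikelihood(s):
    """Return a rough score in [0, 1] that s is a person's name."""
    # Drop trailing dots and commas so "Dr." and "J." count as words.
    words = [w.strip(".,") for w in s.split()]
    words = [w for w in words if w]

    # Soft rule 1: fewer than 5 words.
    if not words or len(words) >= 5:
        return 0.0
    # Soft rule 2: alphabetic only, allowing hyphens and apostrophes.
    for w in words:
        if not ALPHA_RE.match(w.replace("-", "").replace("'", "")):
            return 0.0

    score = 0.2  # passed the soft rules
    core = [w for w in words if w.lower() not in TITLES]
    if core:
        known = sum(1 for w in core if w.lower() in KNOWN_NAMES)
        score += 0.6 * known / len(core)      # share of words on the list
        if all(w[0].isupper() for w in core):
            score += 0.2                      # capitalised like a name
    return min(score, 1.0)
```

With these sample lists, "Dr. John Smith" scores higher than "Blue Widget", which in turn scores higher than "12/05/2008". Misspelled names would miss the list lookup entirely, so some fuzzy matching would still be needed on top of this.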
My guess is that such functions would be very useful to anyone parsing random user input in the wild, from the web for example, or as part of a natural language processing library. That's why I have a hunch that I don't have to write these from scratch. Unfortunately, I have not been able to find such code.
Perhaps the Natural Language Toolkit (NLTK) could help you: http://www.nltk.org/
Gribouillis