The challenge I need to resolve is extracting a sub-string from strings such as those below. Each string is an element of an array, so I am iterating over the array and working on one string at a time in a JavaScript loop. All of the strings have the names at the start of the line:

Joe Smith will run...
Jane Jones will follow...
Bridget Burns and Jack Jones will be away...
Jack Jones, Gracie Burns and George Burns have three days in...

The sub-string I need is the name(s) at the start of each string. My JavaScript handles strings like the first or second line adequately with:

substr = line.replace(/^(\w+ \w+ )(?:.*)/g, "$1").trim(); // line: the current array element

This outputs Joe Smith or Jane Jones respectively.

Is there a method to extract just the names, preferably as one variable?

I have some experience with regular expressions but not enough to reliably extract what I need from examples such as these.

While it's easy for a human to pick out the names by context, it's not so easy for code. How can you determine where the names end and the rest of the sentence begins? What constitutes a name? Is a name always a first name and a last name where both are single words? Without knowing the parameters you won't be able to parse the names.

@Reverend Jim: I am pretty sure all of your questions are answered in the examples and what I wrote.

All of the strings have the names at the start of the line:

As for how the names will appear, that is in the examples as well. Two to three names, all with first and last only, no hyphenated surnames or first names. In cases of more than one name, separated by a comma; comma and conjunction 'and' if there are three names.

This is NLP, or natural language processing. If I had to do this I'd head to http://compromise.cool/ and use the .people() function.

All of the strings have the names at the start of the line:

That may be the case, but how do you know where the name(s) end? For example:

  1. Joe Smith will be attending.
  2. Joe Smith and family will be attending.

How do you determine whether 'and' denotes the end of a name or separates two names?

@Reverend Jim: No, you are bending my example to make it more difficult with the second possibility, Joe Smith and family.... That is not an example I provided.

From the start of the string 'and' will only be sandwiched between two names like:

Bridget Burns and Jack Jones will be away...
Jack Jones, Gracie Burns and George Burns have three days in...

I need some kind of conditional or lookahead on a comma or the word 'and' to capture the names after the first one. These are regex concepts I know exist but am not sure how to use. It is not impossible.
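For the rules described in this thread (names only at the start, each exactly two capitalized words, joined by a comma or by 'and'), an anchored pattern with a repeating group may be enough, without a lookahead. The following is only a sketch under those assumptions; the helper names leadingNames and splitNames are made up for illustration, and the pattern will miss names such as "McDonald" or "O'Brien".

```javascript
// One "First Last" name: two capitalized single words separated by a space.
const NAME = "[A-Z][a-z]+ [A-Z][a-z]+";

// A name at the start of the string, followed by zero or more further names,
// each introduced by ", " or " and ".
const LEADING_NAMES = new RegExp(`^${NAME}(?:(?:, | and )${NAME})*`);

// Hypothetical helper: returns the leading name list as one string, or "" if none.
function leadingNames(line) {
  const match = line.match(LEADING_NAMES);
  return match ? match[0] : "";
}

// Hypothetical helper: splits that string into the individual names.
function splitNames(nameList) {
  return nameList === "" ? [] : nameList.split(/, | and /);
}

const lines = [
  "Joe Smith will run...",
  "Bridget Burns and Jack Jones will be away...",
  "Jack Jones, Gracie Burns and George Burns have three days in...",
];

for (const line of lines) {
  const names = leadingNames(line);
  console.log(names, splitNames(names));
}
```

Because 'and' only extends the match when another capitalized two-word name follows it, a line such as "Joe Smith and family will be attending." would yield just Joe Smith.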

Kindly don't go bending my example to make up how this is an impossible situation to program. It is possible with some reliability. rproffitt's contribution of http://compromise.cool/ shows good promise from the testing I have done so far. The existence of this library shows it is possible.

I wasn't trying to overly complicate things. I assumed this is for an assignment and I've lost points because the marker said "you didn't consider...". I suppose the easiest thing would be to use a regular expression that matches only words that start with an upper case letter. Then all you have to do is process them two at a time. The regex for that is \b[A-Z][a-z]+\b. That would reduce

Jack Jones, Gracie Burns and George Burns have three days in...

to a collection consisting of

Jack
Jones
Gracie
Burns
George
Burns

or you could use the regexp \b[A-Z][a-z]+ [A-Z][a-z]+\b which would give you full names collected as

Jack Jones
Gracie Burns
George Burns
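As a rough illustration, both patterns can be applied in JavaScript with String.prototype.match and the g flag, which returns every match as an array. The function names capitalizedWords and fullNames are made up for this sketch.

```javascript
// Hypothetical helper: every capitalized word, one per match.
function capitalizedWords(text) {
  return text.match(/\b[A-Z][a-z]+\b/g) || [];
}

// Hypothetical helper: every pair of adjacent capitalized words,
// i.e. candidate "First Last" names.
function fullNames(text) {
  return text.match(/\b[A-Z][a-z]+ [A-Z][a-z]+\b/g) || [];
}

const sentence = "Jack Jones, Gracie Burns and George Burns have three days in...";

console.log(capitalizedWords(sentence));
// [ 'Jack', 'Jones', 'Gracie', 'Burns', 'George', 'Burns' ]

console.log(fullNames(sentence).join(", "));
// Jack Jones, Gracie Burns, George Burns
```

Note that these patterns are not anchored to the start of the line, so any other pair of capitalized words later in the sentence (a place name, for instance) would be collected as well.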

Thanks @rproffitt. The library you pointed me to for NLP is working well.
