Hi All,
I have to split a text into words using both spaces and punctuation as delimiters.
Punctuation includes characters like .,!?:;'"-
I am using the split function as follows:

wordsArray = strLine.split("[.,!?:;'\"-]+\\s*");

However, this only splits my text by spaces and ignores other characters I've set as delimiters.
Obviously, I am new at regexes and I could use your help.
Thank you.


The regular expression ([.,!?:;'\"-]|\\s)+ (written here as a Java string literal) should do the trick. It reads as "one or more occurrences of either a whitespace character or a punctuation character". The important part is to keep the '+' outside the alternation, since a single delimiter can be any mix of whitespace and punctuation.
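A minimal sketch of the suggested pattern in use, assuming Java (the language of the original split call). The class name SplitWords, the helper splitWords and the sample sentence are illustrative, not from the thread.

```java
import java.util.Arrays;

public class SplitWords {
    // Splits on runs of one or more characters, each of which is either
    // punctuation from the set .,!?:;'"- or whitespace, in any mix.
    static String[] splitWords(String strLine) {
        return strLine.split("([.,!?:;'\"-]|\\s)+");
    }

    public static void main(String[] args) {
        String[] wordsArray = splitWords("Hello, world! It's a test-case: split me.");
        // Prints [Hello, world, It, s, a, test, case, split, me]
        System.out.println(Arrays.toString(wordsArray));
    }
}
```

Note that the apostrophe in "It's" is a delimiter here, so it yields "It" and "s". Also, String.split drops trailing empty strings but keeps a leading empty string when the line starts with a delimiter (for example an opening quote), so you may want to skip empty entries.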

The problem with your regular expression was that it didn't match punctuation *or* whitespace; it matched punctuation *followed by* whitespace. If you have a 'this or that' situation, use alternation. Writing patterns one after another just increases the matching requirement: your pattern reads "match one or more punctuation characters *followed* by zero or more whitespace characters", which isn't what you wanted. Hence it split at punctuation but never at whitespace that had no punctuation in front of it.
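To see the difference between a sequence and an alternation, here is a small comparison sketch that runs both patterns on the same line. The class name, constant names and input string are hypothetical.

```java
import java.util.Arrays;

public class PatternComparison {
    // Original pattern: a punctuation run THEN optional whitespace (a sequence).
    static final String SEQUENCE = "[.,!?:;'\"-]+\\s*";
    // Fixed pattern: a run where each char is punctuation OR whitespace (alternation).
    static final String ALTERNATION = "([.,!?:;'\"-]|\\s)+";

    public static void main(String[] args) {
        String line = "one two, three";
        // Prints [one two, three]: the bare space between "one" and "two"
        // has no punctuation before it, so SEQUENCE never matches there.
        System.out.println(Arrays.toString(line.split(SEQUENCE)));
        // Prints [one, two, three]
        System.out.println(Arrays.toString(line.split(ALTERNATION)));
    }
}
```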

Edited by ~s.o.s~: n/a


Excellent answer and explanation for regex - I now understand what I was doing wrong.
Your solution works perfectly.
Thank you.


It's possible to include exceptions by using the negative lookbehind feature of regular expressions. Just throw in another alternation that uses lookaround and you should be golden. But this approach leaves a nasty regular expression in your code, so if possible, handle exceptions after performing the basic split rather than modifying the regex.
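The question that prompted this reply isn't shown in the thread, so the following is a sketch under an assumed exception: keep apostrophes inside contractions such as "don't" while still treating quote marks as delimiters. It shows both the lookaround version and the split-then-clean version recommended above. The class and method names are hypothetical, and [A-Za-z] only covers ASCII letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContractionSplit {
    // Lookaround version: an apostrophe is a delimiter unless it has a letter
    // on BOTH sides, so "don't" stays whole but quote marks still split.
    static final String WITH_LOOKAROUND =
            "(?:[.,!?:;\"-]|\\s|(?<![A-Za-z])'|'(?![A-Za-z]))+";

    static String[] splitWithLookaround(String line) {
        return line.split(WITH_LOOKAROUND);
    }

    // Post-processing version: leave the apostrophe out of the delimiter set,
    // then trim leading/trailing apostrophes from each token and drop empties.
    static List<String> splitThenClean(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[.,!?:;\"\\s-]+")) {
            String word = token.replaceAll("^'+|'+$", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "Don't panic, it's 'fine'.";
        // Prints [Don't, panic, it's, fine]
        System.out.println(Arrays.toString(splitWithLookaround(line)));
        // Prints [Don't, panic, it's, fine]
        System.out.println(splitThenClean(line));
    }
}
```

Both approaches give the same words for the sample line; the second keeps the regex simple and puts the exception logic in plain code, which is easier to read and extend.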

Also, please create a new thread for your question and refer to this thread if you feel it is related, rather than bumping an existing solved thread.
