Hi, I have to finish my assignment by tonight and am stuck trying to write a regular expression that splits a big slab of text into sentences.
I am using the .split() function and want it to split on '.', '?', '!', and '\n',
but also, if there is more than one \n in a row or a ./?/! followed by \n, to have the sentences split as they should, i.e.:

"It went on for five minutes without stopping. And by the time the sheep
had quieted down, the chance to utter any protest had passed, for the pigs
had marched back into the farmhouse.

Benjamin felt a nose nuzzling at his shoulder."

should return:

s[0] = "It went on for five minutes without stopping"
s[1] = "And by the time the sheep had quieted down, the chance to utter any protest had passed, for the pigs had marched back into the farmhouse"
s[2] = "Benjamin felt a nose nuzzling at his shoulder"

If someone could please give me the expression and an explanation of how it is constructed, it would be greatly appreciated.
Thanks, Mick.

All 6 Replies

OK, fair enough. I have this so far:
"[.+?!\n*][\\s+]"

Did you read the link?

Yes, I read the link, and I have read many others. I know I am close, but I just need a bit of help. Why have a forum for programming issues if you can't ask a simple question without getting a useless answer pointing you to the most generic link?

Hmm, maybe I didn't understand your question. You want the solution and an explanation, correct?

Why the '+' character in your character set after the '.'? Inside a character class, '+' is a literal plus sign, not a quantifier. Also, the problem here is that you are lumping new lines together with the punctuation characters on which the string has to be split. The correct logic would be to split on the punctuation characters *followed* by zero or more whitespace characters, new lines included. Assuming that the sentences are grammatically correct, a regex like /[.?!]\s*/ should do the job. The task of stripping excess newlines would be better handled by a simple replaceAll method call.
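To make that concrete, here is a minimal sketch in Java of the split-then-clean approach described above. The class and method names (SentenceSplitter, splitSentences) are made up for illustration, and it assumes every sentence ends with '.', '?' or '!', so a blank line with no punctuation before it will not start a new sentence.

```java
class SentenceSplitter {
    // Hypothetical helper: split on '.', '?' or '!' plus any whitespace
    // that follows, then collapse leftover whitespace runs (such as the
    // newlines from line wrapping) into a single space.
    static String[] splitSentences(String text) {
        // In a Java string literal the regex [.?!]\s* is written "[.?!]\\s*".
        String[] parts = text.trim().split("[.?!]\\s*");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceAll("\\s+", " ").trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "It went on for five minutes without stopping. And by the time the sheep\n"
                + "had quieted down, the chance to utter any protest had passed, for the pigs\n"
                + "had marched back into the farmhouse.\n"
                + "\n"
                + "Benjamin felt a nose nuzzling at his shoulder.";
        String[] s = splitSentences(text);
        for (int i = 0; i < s.length; i++) {
            System.out.println("s[" + i + "] = \"" + s[i] + "\"");
        }
    }
}
```

On the sample text from the question this prints the three sentences s[0] to s[2] as requested. Note that split() drops trailing empty strings, so the final '.' does not produce an extra empty element.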
