I have a method that counts the number of words in a JTextArea. It works pretty well, except that it counts runs of non-letter characters as words (for example, "!@#$" would be counted as a word).

Here is the code I have so far (no errors; it compiles and runs fine, it just needs to be more specific about what it searches for):

public void processWordCount()
{
	String data = textArea2.getText();
	Scanner s = new Scanner(data);
	Pattern p = Pattern.compile(" ");
	String words = null;
	int count = 0;
	while (s.hasNext())
	{
		words = s.next();
		count += 1;
	}
	JOptionPane.showMessageDialog(null, "Word Count:  " + count);
}

Well, it works according to the standard definition of a word, which is anything delimited by whitespace.
Of course, it's not complete, as you fail to detect line breaks and tabs as word boundaries.

What do you mean by detecting line breaks and tabs? Is this necessary?

If you have a text that
has words on more than one line with
no space between them looking only
for spaces as word boundaries
will mean you see
a lot
less
words
than
you
should.

I see what you mean, but actually the code I posted covers that. I tried this:

One
Two
Three
Four

On separate lines without any spaces, and it showed up as four words. I thought it would have the effect you were suggesting.

So do you personally think this would be ok, or would you make it more specific in what it defines as a word?

Yes, in your case it works for line breaks because regular expressions only work on a single line.
It does not, however, work for tabs.
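
For anyone checking this later: as far as I can tell, the posted loop never hands the compiled Pattern to the Scanner (there is no useDelimiter call), so the Scanner falls back to its default delimiter, which matches any whitespace. That would mean tabs and line breaks are both treated as word boundaries already. A minimal sketch to test this, where the class name ScannerWhitespaceDemo and the helper countTokens are made up for illustration:

```java
import java.util.Scanner;

class ScannerWhitespaceDemo {
    // Hypothetical helper: counts tokens the same way the posted loop does,
    // one per call to s.next(), using the Scanner's default delimiter.
    static int countTokens(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                s.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("One\nTwo\nThree\nFour")); // 4
        System.out.println(countTokens("One\tTwo\tThree"));       // 3
        System.out.println(countTokens("Hello !@#$ world"));      // 3
    }
}
```

The last line shows the original complaint: "!@#$" is still counted, because the Scanner only looks at whitespace and not at what each token contains.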

You could use:
java.util.StringTokenizer
java.util.regex.Pattern
But when you use Pattern, make sure that each string token contains at least something to the effect of [a-zA-z0-9]; if it does, then
count += 1;

Thanks man, that helped a bunch.

Whoops, I noticed an error in my regular expression; it should have been a capital Z:
[a-zA-Z0-9]
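
The capital Z matters because the range A-z also covers the punctuation characters that sit between Z and a in ASCII, such as [ and ^. Here is a rough sketch of the suggestion, assuming a token should count as a word only if it contains at least one letter or digit. The class name WordCounter and the method countWords are made up for illustration, and the text is passed in as a String rather than read from textArea2:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class WordCounter {
    // A token qualifies as a word if it contains at least one ASCII letter or digit.
    private static final Pattern HAS_ALNUM = Pattern.compile("[a-zA-Z0-9]");

    // Hypothetical helper: splits on whitespace, then filters tokens with the pattern.
    static int countWords(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                // find() succeeds if the pattern matches anywhere in the token
                if (HAS_ALNUM.matcher(s.next()).find()) {
                    count += 1;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello !@#$ world")); // 2
        System.out.println(countWords("One\nTwo\tThree 4")); // 4
    }
}
```

In the original method this would be called as countWords(textArea2.getText()). Note that a token like "don't" or "end." still counts as one word, since it contains letters; only tokens made up entirely of symbols are skipped.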

Thanks for correcting that, I'm getting ready to test it.
