I have a method that counts the number of words in a JTextArea. It works pretty well, except that it counts runs of non-letter characters as words (for example, "!@#$" would count as a word).

Here is the code I have so far (no errors; it compiles and runs fine, it just needs to be more specific about what it counts as a word):

public void processWordCount()
{
	String data = textArea2.getText();
	Scanner s = new Scanner(data);
	Pattern p = Pattern.compile(" ");
	String words = null;
	int count = 0;
	while (s.hasNext())
	{
		words = s.next();
		count += 1;
	}
	JOptionPane.showMessageDialog(null, "Word Count:  " + count);
}

Well, it works according to the standard definition of a word, which is anything delimited by whitespace.
Of course, it's not complete, as you fail to detect line breaks and tabs as word boundaries.

If you have a text that
has words on more than one line with
no space between them, looking only
for spaces as word boundaries
will mean you see
a lot
fewer
words
than
you
should.

I see what you mean, but actually the code I posted covers that. I tried this:

One
Two
Three
Four

On separate lines without any spaces, and it showed up as four words. I thought it would have the effect you were suggesting.

So do you personally think this would be ok, or would you make it more specific in what it defines as a word?

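Yes, in your case it works for line breaks, because the Pattern you compile is never passed to the Scanner, so the Scanner falls back to its default delimiter, which matches any whitespace.
For the same reason, tabs are also treated as word boundaries.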
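
A quick way to settle how a delimiter treats line breaks and tabs is to test it directly. The sketch below is only an illustration: the class and method names (DelimiterCheck, countWithDefaultDelimiter, countWithSpaceDelimiter) are made up for this example and are not part of the original code. It counts tokens once with Scanner's default delimiter, which the Scanner documentation describes as matching whitespace, and once with a single space set explicitly through useDelimiter.

```java
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the original post.
class DelimiterCheck {

    // Counts tokens using Scanner's default delimiter (runs of whitespace).
    static int countWithDefaultDelimiter(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                s.next();
                count++;
            }
        }
        return count;
    }

    // Counts tokens when only a single space character acts as the delimiter.
    static int countWithSpaceDelimiter(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text).useDelimiter(" ")) {
            while (s.hasNext()) {
                s.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One line break, one tab and one space between four words.
        String sample = "One\nTwo\tThree Four";
        System.out.println("default delimiter: " + countWithDefaultDelimiter(sample));
        System.out.println("space delimiter:   " + countWithSpaceDelimiter(sample));
    }
}
```

With the default delimiter, the line break and the tab both act as word boundaries. With the explicit single-space delimiter they do not, so the first three words are reported as one token.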

You could use:
java.util.StringTokenizer
java.util.regex.Pattern
But when you use Pattern, make sure that each string token contains at least one character matching something like [a-zA-Z0-9]; if it does, then
count += 1;
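
The Pattern suggestion above could look roughly like this. It is a minimal sketch, not a definitive implementation: the class name WordCounter, the method countWords and the constant HAS_WORD_CHAR are hypothetical names chosen for the example. It assumes a "word" is any whitespace-delimited token that contains at least one ASCII letter or digit, so a token such as "!@#$" is skipped.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; names are not from the original post.
class WordCounter {

    // Matches any single ASCII letter or digit.
    private static final Pattern HAS_WORD_CHAR = Pattern.compile("[a-zA-Z0-9]");

    // Counts whitespace-delimited tokens that contain at least one letter or digit.
    static int countWords(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                String token = s.next();
                // find() succeeds if the pattern matches anywhere in the token.
                if (HAS_WORD_CHAR.matcher(token).find()) {
                    count += 1;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Word Count:  " + countWords("Hello, world! !@#$ 42"));
    }
}
```

In processWordCount() the call would be something like countWords(textArea2.getText()). Tokens with attached punctuation, such as "Hello," still count once. The character class only covers ASCII; if accented or non-Latin letters should count too, a class such as [\p{L}\p{N}] may be a better fit.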
