
word count

I have a method that counts the number of words in a JTextArea. It works pretty well, except that it counts groups of non-letter characters as words (for example, "!@#$" would be counted as a word).

Here is the code I have so far (no errors; it compiles and runs fine, it just needs to be more specific about what it counts as a word):

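// Requires: import java.util.Scanner; import java.util.regex.Pattern; import javax.swing.JOptionPane;
// textArea2 is the JTextArea field in the enclosing class.
public void processWordCount()
 {
	String data = textArea2.getText();
	Scanner s = new Scanner(data);
	// Note: this Pattern is never passed to the Scanner, so it has no effect;
	// the Scanner splits on its default delimiter (any run of whitespace).
	Pattern p = Pattern.compile(" ");
	String words = null;
	int count = 0;
	// Every whitespace-delimited token is counted, including ones like "!@#$"
	while (s.hasNext())
	{
		words = s.next();
		count += 1;
	}
	s.close();
	JOptionPane.showMessageDialog(null, "Word Count:  " + count);
 }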
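To try the counting logic without the GUI, here is a minimal sketch that pulls the loop into a static helper. The class name WordCountDemo and the method countTokens are hypothetical, not from the original post. It keeps the same Scanner-based approach, so it reproduces the behavior described above: "!@#$" counts as a word.

```java
import java.util.Scanner;

// Hypothetical standalone version of the loop in processWordCount(),
// with the Swing parts (JTextArea, JOptionPane) left out.
class WordCountDemo {

    // Counts every token the Scanner returns. Scanner's default delimiter
    // is a run of whitespace, so "!@#$" is counted just like "hello".
    static int countTokens(String data) {
        int count = 0;
        try (Scanner s = new Scanner(data)) {
            while (s.hasNext()) {
                s.next();
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("Hello world !@#$")); // prints 3
    }
}
```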
server_crash
Postaholic
2,111 posts since Jun 2004
Reputation Points: 113
Solved Threads: 20
 

Well, it works according to the standard definition of a word, which is anything delimited by whitespace.
Of course, it's not complete, as you fail to detect line breaks and tabs as word boundaries.

jwenting
duckman
Team Colleague
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
 

What do you mean by detecting line breaks and tabs? Is this necessary?

server_crash
 

If you have a text that
has words on more than one line with
no space between them, looking only
for spaces as word boundaries
will mean you see
a lot
fewer
words
than
you
should.
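The point above applies whenever words are split on a literal space character. As a minimal sketch, assuming the text is split with String.split (which the original method does not do; its Scanner already uses a whitespace delimiter), this contrasts a single-space delimiter with the regex \s+, which matches any run of whitespace, including tabs and line breaks. The class and method names are hypothetical.

```java
// Contrasts splitting on a literal space with splitting on any whitespace.
// The class and method names are hypothetical, for illustration only.
class SpaceVsWhitespace {

    // Splits only on ' ', so "that\nhas" stays one piece.
    static int countBySpace(String text) {
        if (text.isEmpty()) return 0;
        return text.split(" ").length;
    }

    // Splits on runs of whitespace (spaces, tabs, line breaks).
    // trim() avoids an empty first element when the text starts with whitespace.
    static int countByWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return 0;
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String text = "If you have\na text that\nhas words";
        System.out.println(countBySpace(text));      // prints 6
        System.out.println(countByWhitespace(text)); // prints 8
    }
}
```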

jwenting
 

I see what you mean, but actually the code I posted covers that. I tried this:

One
Two
Three
Four

I put them on separate lines without any spaces, and it showed up as four words. I thought it would have the effect you were describing.

So do you personally think this is OK, or would you make it more specific about what counts as a word?

server_crash
Postaholic
2,111 posts since Jun 2004
Reputation Points: 113
Solved Threads: 20
 

Yes, in your case it works for line breaks because regular expressions only work on a single line.
It does not, however, work for tabs.
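One way to settle the tab question is to test it directly. The Scanner documentation says its default delimiter matches whitespace as recognized by Character.isWhitespace, which includes tab and line-break characters; the Pattern in the original method is never passed to the Scanner, so it does not change this. A minimal check (the class name TabCheck is hypothetical):

```java
import java.util.Scanner;

// Checks how a Scanner with its default delimiter splits tabs and newlines.
class TabCheck {

    static int count(String data) {
        int count = 0;
        try (Scanner s = new Scanner(data)) {
            while (s.hasNext()) {
                s.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Tab, newline, and space separators are all treated as delimiters.
        System.out.println(count("One\tTwo\nThree Four")); // prints 4
        // The tab character counts as whitespace.
        System.out.println(Character.isWhitespace('\t')); // prints true
    }
}
```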

jwenting
 

You could use:
java.util.StringTokenizer
java.util.regex.Pattern
But when you use a Pattern, make sure that each string token contains at least one character matching something like [a-zA-z0-9]; if it does, then
count += 1;
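A minimal sketch of that suggestion, assuming StringTokenizer's default delimiters (space, tab, newline, carriage return, form feed) for splitting and a Pattern that is searched for anywhere in each token. It uses the character class [a-zA-Z0-9] (with a capital Z, so the range covers only letters and digits). The class and method names are hypothetical.

```java
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Counts only tokens that contain at least one letter or digit.
class AlnumWordCount {

    // Compiled once and reused for every token.
    private static final Pattern WORD_CHAR = Pattern.compile("[a-zA-Z0-9]");

    static int countWords(String data) {
        // Default delimiters: " \t\n\r\f"
        StringTokenizer st = new StringTokenizer(data);
        int count = 0;
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            // find() succeeds if the class matches anywhere in the token
            if (WORD_CHAR.matcher(token).find()) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello, world !@#$ 42")); // prints 3
    }
}
```

Punctuation attached to a word (as in "Hello,") is not stripped; the check only rejects tokens with no letters or digits at all. The StringTokenizer Javadoc describes it as a legacy class and recommends String.split or java.util.regex for new code, so the same filter could also sit inside the original Scanner loop.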

paradox814
Posting Whiz
351 posts since Oct 2004
Reputation Points: 13
Solved Threads: 4
 

Thanks man, that helped a bunch.

server_crash
 

Whoops, I noticed an error in my regular expression: it should have been a capital Z
[a-zA-Z0-9]
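Why the capital Z matters: inside a character class, A-z is a range over character codes, and the codes between 'Z' (90) and 'a' (97) are [ \ ] ^ _ and the backtick. A minimal check (the class name RangeTypoCheck is hypothetical) shows that with the typo, a token made only of those symbols would still count as a word.

```java
import java.util.regex.Pattern;

// Compares the mistyped class [a-zA-z0-9] with the intended [a-zA-Z0-9].
class RangeTypoCheck {

    // True if the regex matches anywhere in the token.
    static boolean containsWordChar(String regex, String token) {
        return Pattern.compile(regex).matcher(token).find();
    }

    public static void main(String[] args) {
        String token = "^_^";
        // A-z also covers [ \ ] ^ _ and the backtick, so this prints true
        System.out.println(containsWordChar("[a-zA-z0-9]", token));
        // The intended class covers only letters and digits: prints false
        System.out.println(containsWordChar("[a-zA-Z0-9]", token));
    }
}
```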

paradox814
 

Thanks for correcting that; I'm getting ready to test it.

server_crash
 

