word count

Question

server_crash 64 Postaholic

20 Years Ago

I have a method that counts the number of words in a JTextArea. It works pretty well, except that it counts runs of non-letter characters as words (for example, "!@#$" would count as a word)...

Here is the code I have so far (no errors; it compiles and runs fine, it just needs to be more specific about what it searches for):

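import java.util.Scanner;
import java.util.regex.Pattern;
import javax.swing.JOptionPane;

public void processWordCount()
{
	String data = textArea2.getText();
	// A Scanner splits on whitespace by default
	Scanner s = new Scanner(data);
	// Note: this pattern is compiled but never passed to the Scanner
	Pattern p = Pattern.compile(" ");
	String words = null;
	int count = 0;
	// Every whitespace-delimited token counts as one word
	while (s.hasNext())
	{
		words = s.next();
		count += 1;
	}
	JOptionPane.showMessageDialog(null, "Word Count:  " + count);
}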

java

All 9 Replies

jwenting 1,905 duckman

20 Years Ago

Well, it works according to the standard definition of a word, which is anything delimited by whitespace.
Of course, it's not complete, as you fail to detect line breaks and tabs as word boundaries.

paradox814 1 Posting Whiz

20 Years Ago

you could use:
java.util.StringTokenizer
java.util.regex.Pattern
but when you use Pattern, make sure that each string token contains at least something to the effect of [a-zA-z0-9]; if it does, then
count += 1;

paradox814 1 Posting Whiz

20 Years Ago

Whoops, I noticed an error in my regular expression; it should have been a capital Z:
[a-zA-Z0-9]
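
For anyone following along, here is a minimal sketch of that suggestion, assuming a token should count as a word when it contains at least one ASCII letter or digit. The class name WordCounter and the method countWords are made up for illustration and are not from the original post. The capital Z matters: the range A-z also covers the characters [ \ ] ^ _ and `, so with the earlier version a token like "___" would still be counted.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class WordCounter {
    // Matches any single ASCII letter or digit (hypothetical helper constant).
    private static final Pattern HAS_ALNUM = Pattern.compile("[a-zA-Z0-9]");

    // Counts whitespace-delimited tokens that contain at least one letter or digit.
    public static int countWords(String text) {
        int count = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                String token = s.next();
                // find() succeeds if the character class matches anywhere in the token
                if (HAS_ALNUM.matcher(token).find()) {
                    count += 1;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "!@#$" is skipped; "Hello,", "world!" and "42" are counted
        System.out.println(countWords("Hello, world! !@#$ 42")); // prints 3
    }
}
```

In the original method, the same check would go inside the while loop, around count += 1. Note that this still counts "Hello," as one word with the comma attached; it only filters out tokens made up entirely of symbols.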

server_crash 64 Postaholic · Answer 1 · 2004-12-27T20:09:19+00:00

What do you mean by detecting line breaks and tabs? Is this necessary?

jwenting 1,905 duckman Team Colleague · Answer 2 · 2004-12-28T00:34:08+00:00

If you have a text that
has words on more than one line with
no space between them looking only
for spaces as word boundaries
will mean you see
a lot
less
words
than
you
should.

server_crash 64 Postaholic · Answer 3 · 2004-12-28T02:40:06+00:00

I see what you mean, but actually the code I posted covers that. I tried this:

One
Two
Three
Four

On separate lines without any spaces, and it showed up as four words. I thought it would have the effect you were suggesting.

So do you personally think this would be ok, or would you make it more specific in what it defines as a word?

jwenting 1,905 duckman Team Colleague · Answer 4 · 2004-12-28T14:27:11+00:00

Yes, in your case it works for line breaks because regular expressions only work on a single line.
It does not, however, work for tabs.
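
A quick way to check the tab and line-break behaviour is to count tokens both ways. In the method as posted, the Pattern is compiled but never handed to the Scanner, so the Scanner keeps its default delimiter, which is whitespace and includes tabs and line breaks. The sketch below compares that default with a space-only delimiter set through useDelimiter; the class and method names are hypothetical and only there for the comparison.

```java
import java.util.Scanner;

public class DelimiterDemo {
    // Counts however many tokens the given Scanner produces.
    private static int countTokens(Scanner s) {
        int count = 0;
        while (s.hasNext()) {
            s.next();
            count += 1;
        }
        s.close();
        return count;
    }

    // Default delimiter: any whitespace (spaces, tabs, line breaks).
    public static int countDefault(String text) {
        return countTokens(new Scanner(text));
    }

    // Space-only delimiter: tabs and line breaks stay inside tokens.
    public static int countSpaceOnly(String text) {
        return countTokens(new Scanner(text).useDelimiter(" "));
    }

    public static void main(String[] args) {
        String text = "One\nTwo\tThree Four";
        System.out.println(countDefault(text));   // prints 4
        System.out.println(countSpaceOnly(text)); // prints 2
    }
}
```

With the space-only delimiter, "One\nTwo\tThree" comes back as a single token, which is the undercounting described earlier in the thread. With the default delimiter, all four words are found.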

server_crash 64 Postaholic · Answer 5 · 2004-12-29T08:48:28+00:00

Thanks man, that helped a bunch.

server_crash 64 Postaholic · Answer 6 · 2004-12-29T19:21:38+00:00

Thanks for correcting that, I'm getting ready to test it.
