Hi all,

I'm learning Java through a class, and my instructor has asked me to write a word-count program using the Map interface. It should run from the command line and report the total number of words, the number of distinct words, and the distinct words that were used.

This is what I have right now:

import java.util.*;

public class WordCount {
	
	public static void main(String[] args) {
		// Defining total word count variable
		int totalCount = 0;
		
		// Setting up linked hash map so output will display words in order of appearance
		Map<String, Integer> textInput = new LinkedHashMap<String, Integer>();
		
		// Determining the number of distinct words and their frequencies of occurrence
		for (String a : args) {
			Integer freq = textInput.get(a);
			textInput.put(a, (freq == null) ? 1 : freq + 1);
		}
		
		// Determining the word count by summing the occurrence frequencies
		// (a word seen once contributes 1, so no special case is needed)
		for (int count : textInput.values()) {
			totalCount += count;
		}
		
		// Determining correct grammar and printing word count results
		if (totalCount == 1) {
			// If there's only one word
			System.out.println("The total word count is " + totalCount + " word.");
			System.out.println("The word is " + textInput.keySet());
		} else if (totalCount == 0) {
			// If there are no words
			System.out.println("There are no words.");
		} else {
			// If there's more than one word
			System.out.println("The total word count is " + totalCount + " words.");
			System.out.println("There are " + textInput.size() + " different words.");
			System.out.println("The words are: " + textInput.keySet());
		}
		
    }
	
}

So far it works very well except for one small hitch: it's case-sensitive, so it treats words that differ only in letter case, such as "There" and "there", as separate words.

Output using "This is a test sentence that this word count program needs to process" ("This" and "this" are the words to watch here):

The total word count is 13 words.
There are 13 different words.
The words are: [This, is, a, test, sentence, that, this, word, count, program, needs, to, process.]

Output using "This is a test sentence that This word count program needs to process" (both occurrences of "This" are capitalized here):

The total word count is 13 words.
There are 12 different words.
The words are: [This, is, a, test, sentence, that, word, count, program, needs, to, process.]

How can I make the program case-insensitive?

Thanks in advance!


I'm fairly rusty on Map and on how "put" and "get" work behind the scenes. If you were comparing two strings manually while iterating through some Collection, I'd suggest calling compareToIgnoreCase on each member of the Collection:

http://java.sun.com/javase/6/docs/api/java/lang/String.html#compareToIgnoreCase(java.lang.String)
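
For what it's worth, here is a rough sketch of what that manual comparison could look like. The class name IgnoreCaseLookup and the helper indexOfIgnoreCase are made up for illustration and are not part of the original program; only compareToIgnoreCase itself comes from the String API.

```java
import java.util.ArrayList;
import java.util.List;

class IgnoreCaseLookup {

	// Hypothetical helper: returns the index of the first element that equals
	// 'word' when case is ignored, or -1 if there is no such element.
	static int indexOfIgnoreCase(List<String> words, String word) {
		for (int i = 0; i < words.size(); i++) {
			// compareToIgnoreCase returns 0 when the strings match ignoring case
			if (words.get(i).compareToIgnoreCase(word) == 0) {
				return i;
			}
		}
		return -1;
	}

	public static void main(String[] args) {
		// Collect each distinct word once, keeping the first spelling seen
		List<String> distinct = new ArrayList<String>();
		for (String a : args) {
			if (indexOfIgnoreCase(distinct, a) < 0) {
				distinct.add(a);
			}
		}
		System.out.println(distinct);
	}
}
```

Run with the arguments This is this, the sketch prints [This, is]. It scans the whole list for every word, so it gets slow on long inputs, which is one reason a Map is the better fit here.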

However, there's also an easy solution that doesn't require any knowledge of what's going on behind the scenes with Map. Convert all strings to lower case using the toLowerCase function from String:

http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()

Do this before adding each String to the Map.
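
A minimal sketch of that idea, assuming the same LinkedHashMap counting approach as the original post. The class name LowerCaseCount and the method countWords are placeholders of mine. Passing Locale.ROOT is also my own choice: the no-argument toLowerCase() uses the JVM's default locale, which can give surprising results for some letters in some locales.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class LowerCaseCount {

	// Hypothetical helper: counts occurrences of each word, ignoring case,
	// and keeps the words in order of first appearance.
	static Map<String, Integer> countWords(String[] words) {
		Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
		for (String word : words) {
			// Normalize the key before touching the map so that
			// "This" and "this" end up in the same entry
			String key = word.toLowerCase(Locale.ROOT);
			Integer freq = counts.get(key);
			counts.put(key, (freq == null) ? 1 : freq + 1);
		}
		return counts;
	}

	public static void main(String[] args) {
		System.out.println(countWords(args));
	}
}
```

Run with the arguments This is this, it prints {this=2, is=1}. One side effect to be aware of: the words come back in lower case, not in their original spelling.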

Got it. Thanks a lot! :)

import java.util.*;

public class WordCount {
	
	public static void main(String[] args) {
		// Defining total word count variable
		int totalCount = 0;
		
		// Setting up linked hash map so output will display words in order of appearance
		Map<String, Integer> textInput = new LinkedHashMap<String, Integer>();
		
		// Determining the number of distinct words and their frequencies of occurrence
		for (String a : args) {
			// Lower-casing each word so "This" and "this" share one map key
			a = a.toLowerCase();
			Integer freq = textInput.get(a);
			textInput.put(a, (freq == null) ? 1 : freq + 1);
		}
		
		// Determining the word count by summing the occurrence frequencies
		// (a word seen once contributes 1, so no special case is needed)
		for (int count : textInput.values()) {
			totalCount += count;
		}
		
		// Determining correct grammar and printing word count results
		if (totalCount == 1) {
			// If there's only one word
			System.out.println("The total word count is " + totalCount + " word.");
			System.out.println("The word is " + textInput.keySet());
		} else if (totalCount == 0) {
			// If there are no words
			System.out.println("There are no words.");
		} else {
			// If there's more than one word
			System.out.println("The total word count is " + totalCount + " words.");
			System.out.println("There are " + textInput.size() + " different words.");
			System.out.println("The words are: " + textInput.keySet());
		}
		
    }
	
}