To count the number of occurrences of words in a text

maduxi

I am working on a project to write a program that finds the 10 most used words in a text, but I got stuck and don't know what to do next. Can someone help me, please?

This is as far as I have come:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;


public class Lab4 {

    public static void main(String[] args) throws FileNotFoundException {

        Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");



        List<String> words = new ArrayList<String>();

        while (file.hasNext()){
            String tx = file.next();
           // String x = file.next().toLowerCase();
            words.add(tx);

        }

        Collections.sort(words);
       // System.out.println(words);

    }

}
JamesCherrill

You could build a HashMap<String, Integer> with the word as the key and a count as the value. For every word in the file: if it's already in the Map, add one to its count; if it's not, add it with a count of one. When you've finished, you can loop through the Map to find the highest counts.
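
A minimal sketch of that idea, not a definitive solution. The class and method names (WordCountSketch, countWords, topN) are made up for illustration, lower-casing the words is an assumption taken from the commented-out line in the original program, and a string stands in for text.txt so the example runs on its own:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class WordCountSketch {

    // Counts how often each word occurs. Words are lower-cased (an assumption)
    // so that "The" and "the" are counted as the same word.
    static Map<String, Integer> countWords(Scanner in) {
        Map<String, Integer> counts = new HashMap<>();
        while (in.hasNext()) {
            String word = in.next().toLowerCase();
            Integer old = counts.get(word);
            // Not seen before: start at 1. Seen before: add one.
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Returns up to n entries with the highest counts, highest first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Stand-in for new Scanner(new File("text.txt")).
        Scanner in = new Scanner("the cat and the dog and the bird")
                .useDelimiter("[^a-zA-Z]+");
        for (Map.Entry<String, Integer> e : topN(countWords(in), 10)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting all entries is the simplest way to get the top 10; words with equal counts come out in no particular order, because a HashMap does not keep insertion order.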

mvmalderen

The HashMap is a good suggestion, although I want to add something to it.
Personally I wouldn't recommend putting an Integer in it: since Integer is immutable, you'll have to issue a put() every time you want to update the count.
I suggest you create a Counter class that wraps an int field and provides an increment() method. Something like this:

public class Counter
{
    private int count;

    public Counter()
    {
        this(0);
    }

    public Counter(int seed)
    {
        this.count = seed;
    }

    public void reset()
    {
        this.count = 0;
    }

    public void increment()
    {
        this.count++;
    }

    public int getCount()
    {
        return this.count;
    }
}

Every time you fetch a word, do the following (countsByWords is the Map<String, Counter>):

Counter c = countsByWords.get(word);
if (c == null)
{
    // First time we encounter this word, create a counter for it
    // and put it in the Map.
    countsByWords.put(word, new Counter(1));
}
else
    c.increment(); // increment the count for this word

EDIT #0: Can anyone please link me to the post formatting guide? I've been away for quite some time, and a lot of things seem to have changed. I'd like decent code tag formatting for the language my code example is written in, but the editor seems to assume I want something else.

EDIT #1: Just refreshed the page, and code highlighting is in place... Is the code highlighting handled automatically now?

JamesCherrill

I'm not sure why you see put as a problem, but if you want a mutable integer value there's no need to write a class. You could use an AtomicInteger, or just use a HashMap<String, int[]> with a 1-element int array as the value and keep incrementing its zeroth element.
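
Both alternatives could look roughly like this. It is only a sketch: the class and method names (MutableCountSketch, countWithAtomic, countWithArray) are hypothetical, and the words are passed in as an array to keep the example short:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class MutableCountSketch {

    // Variant 1: AtomicInteger used as a ready-made mutable int holder.
    static Map<String, AtomicInteger> countWithAtomic(String[] words) {
        Map<String, AtomicInteger> counts = new HashMap<>();
        for (String word : words) {
            AtomicInteger c = counts.get(word);
            if (c == null) {
                counts.put(word, new AtomicInteger(1)); // first occurrence
            } else {
                c.incrementAndGet(); // later occurrences: no put() needed
            }
        }
        return counts;
    }

    // Variant 2: a one-element int array whose zeroth element is the count.
    static Map<String, int[]> countWithArray(String[] words) {
        Map<String, int[]> counts = new HashMap<>();
        for (String word : words) {
            int[] c = counts.get(word);
            if (c == null) {
                counts.put(word, new int[] {1}); // first occurrence
            } else {
                c[0]++; // update in place
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "a"};
        System.out.println(countWithAtomic(words).get("a").get()); // prints 3
        System.out.println(countWithArray(words).get("a")[0]);     // prints 3
    }
}
```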

mvmalderen

I pointed that out because it lets you avoid the overhead of put() each time a count is updated (except for the first time a word is counted).
I agree with the rest of your post, though.

Taywin

Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");

I just want to clarify this line of your program. Do you intend to count only words that consist solely of letters? If a word has a hyphen in the middle, do you count it as one word or two? (e.g. "Is your job a part-time or full-time?") Counting each part as a separate word may not be correct. Also, what about a word containing a number? (e.g. "My constructor1 and constructor2 implementations are totally different.") Just my 2 cents...
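
To see what that delimiter does with such input, here is a small sketch. The helper name tokens and the second pattern "[^a-zA-Z0-9-]+" are assumptions for illustration only; the second pattern is one possible way to keep hyphens and digits inside a word, not something the original program uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterSketch {

    // Splits text into tokens, treating anything that matches
    // delimiterRegex as a separator between tokens.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text).useDelimiter(delimiterRegex)) {
            while (in.hasNext()) {
                result.add(in.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Is your job a part-time or full-time?";

        // Original delimiter: every non-letter separates words.
        System.out.println(tokens(text, "[^a-zA-Z]+"));
        // prints [Is, your, job, a, part, time, or, full, time]

        // Hypothetical alternative: letters, digits and hyphens stay in a word.
        System.out.println(tokens(text, "[^a-zA-Z0-9-]+"));
        // prints [Is, your, job, a, part-time, or, full-time]
    }
}
```

With the original delimiter, "constructor1" and "constructor2" both come out as "constructor". The alternative pattern has its own rough edge: a hyphen standing alone between spaces would be returned as a token, so it would need filtering if that matters.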

maduxi

I used .useDelimiter("[^a-zA-Z]+") so that it only reads words made of the letters a to z and nothing else.
