
To count the number of occurrences of words in a text

I am working on a project to write a program that finds the 10 most used words in a text, but I got stuck and don't know what to do next. Can someone help me, please?

This is as far as I've got:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;


public class Lab4 {

    public static void main(String[] args) throws FileNotFoundException {

        // Any run of non-letters is a delimiter, so every token is purely alphabetic
        Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");

        List<String> words = new ArrayList<String>();

        while (file.hasNext()) {
            // To ignore case, use file.next().toLowerCase() here instead;
            // calling next() a second time in the loop would skip a word
            String tx = file.next();
            words.add(tx);
        }
        file.close();

        // Sorting puts identical words next to each other
        Collections.sort(words);
        // System.out.println(words);
    }

}
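One possible way to continue from the sorted list, which is not something suggested later in the thread: once the words are sorted, identical words sit next to each other, so each word's count is simply the length of its run. The class and method names (SortedRuns, runCounts) are hypothetical; this is a sketch, not the assignment's required approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedRuns {

    // Counts words by walking runs of equal words in a sorted copy of the list.
    // A LinkedHashMap keeps the words in the (alphabetical) order they were found.
    public static Map<String, Integer> runCounts(List<String> words) {
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted);
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        int i = 0;
        while (i < sorted.size()) {
            String word = sorted.get(i);
            int j = i;
            // Advance j past every copy of the current word
            while (j < sorted.size() && sorted.get(j).equals(word)) {
                j++;
            }
            counts.put(word, j - i);
            i = j;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(runCounts(Arrays.asList("b", "a", "b", "c", "b")));
        // prints {a=1, b=3, c=1}
    }
}
```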
maduxi

You could build a HashMap<String, Integer> with the word as the key and a count as the value. For every word in the file, if it's already in the Map, add one to its count; if it's not in the Map yet, add it with a count of one. When you've finished, you can loop through the Map to find the highest count.
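A minimal sketch of the HashMap<String, Integer> approach described above. The names WordCount, countWords and topWords are hypothetical. Instead of a single loop for the highest count, it sorts the entries by count so it can return the top n words, since the goal is the 10 most used words; ties are broken alphabetically, which is an arbitrary choice. It uses a lambda and List.sort, so it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {

    // If the word is already in the map, add one to its count;
    // otherwise insert it with a count of one.
    public static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {
            Integer current = counts.get(word);
            if (current == null) {
                counts.put(word, 1);
            } else {
                counts.put(word, current + 1);
            }
        }
        return counts;
    }

    // Returns the n words with the highest counts, most frequent first
    public static List<String> topWords(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        // Descending by count; equal counts fall back to alphabetical order
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "sat", "on", "the", "mat", "the", "cat");
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("the"));   // prints 3
        System.out.println(topWords(counts, 2)); // prints [the, cat]
    }
}
```

With the words read by Lab4, topWords(countWords(words), 10) would give the 10 most used words.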

JamesCherrill

The HashMap is a good suggestion, although I want to add something to it.
Personally, I wouldn't recommend putting an Integer in it: Integer is immutable, so you'll have to issue a put() every time you want to update the count.
I suggest you create a Counter class that wraps around an int field and provides an increment() method. Something like this:

// A mutable int wrapper: unlike Integer, its value can be changed in place
public class Counter
{
    private int count;

    public Counter()
    {
        this(0);
    }

    public Counter(int seed)
    {
        this.count = seed;
    }

    public void reset()
    {
        this.count = 0;
    }

    public void increment()
    {
        this.count++;
    }

    public int getCount()
    {
        return this.count;
    }
}

Every time you read a word, do the following:

// countsByWords is a Map<String, Counter>, e.g. new HashMap<String, Counter>()
Counter c = countsByWords.get(word);
if (c == null)
{
    // First time we encounter this word, create a counter for it
    // and put it in the Map.
    countsByWords.put(word, new Counter(1));
}
else
    c.increment(); // increment the count for this word
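Putting the Counter class and the per-word update together, a self-contained sketch might look like this. The outer class name CounterWordCount and the countWords method are hypothetical, and Counter is a trimmed, nested copy of the class above (only the members used here). The point it illustrates is that put() happens once per distinct word; later occurrences change the stored object in place.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CounterWordCount {

    // Trimmed copy of the Counter class above: a mutable int wrapper
    public static class Counter {
        private int count;

        public Counter(int seed) { this.count = seed; }

        public void increment() { this.count++; }

        public int getCount() { return this.count; }
    }

    public static Map<String, Counter> countWords(List<String> words) {
        Map<String, Counter> countsByWords = new HashMap<String, Counter>();
        for (String word : words) {
            Counter c = countsByWords.get(word);
            if (c == null) {
                // First occurrence: the only time put() is needed for this word
                countsByWords.put(word, new Counter(1));
            } else {
                // Later occurrences update the stored Counter in place
                c.increment();
            }
        }
        return countsByWords;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = countWords(Arrays.asList("to", "be", "or", "not", "to", "be"));
        for (Map.Entry<String, Counter> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().getCount());
        }
    }
}
```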

EDIT #0: Can anyone please link me to the post formatting help? I've been away for quite some time, and a lot of things seem to have changed. I'd like proper code formatting for the language the example is written in, but the editor seems to apply plain formatting instead.

EDIT #1: I just refreshed the page, and the code highlighting is in place. Is code highlighting handled automatically now?

mvmalderen

I'm not sure why you see put() as a problem, but if you want a mutable integer value there's no need to write a class. You could use an AtomicInteger, or just use a HashMap<String, int[]> with a one-element int array as the value and keep incrementing its zeroth element.
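A sketch of both alternatives mentioned above, side by side. The class and method names (MutableCounts, countWithAtomic, countWithArray) are hypothetical. AtomicInteger is designed for thread-safe updates; here it is only used as a ready-made mutable int, so in single-threaded code the int[] version does the same job with less overhead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class MutableCounts {

    // Variant 1: AtomicInteger as a mutable counter
    public static Map<String, AtomicInteger> countWithAtomic(List<String> words) {
        Map<String, AtomicInteger> counts = new HashMap<String, AtomicInteger>();
        for (String word : words) {
            AtomicInteger c = counts.get(word);
            if (c == null) {
                counts.put(word, new AtomicInteger(1));
            } else {
                c.incrementAndGet(); // updates the stored object, no put()
            }
        }
        return counts;
    }

    // Variant 2: a one-element int[] whose element 0 holds the count
    public static Map<String, int[]> countWithArray(List<String> words) {
        Map<String, int[]> counts = new HashMap<String, int[]>();
        for (String word : words) {
            int[] c = counts.get(word);
            if (c == null) {
                counts.put(word, new int[] {1});
            } else {
                c[0]++; // updates the stored array, no put()
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "rose", "is", "a", "rose");
        System.out.println(countWithAtomic(words).get("rose").get()); // prints 2
        System.out.println(countWithArray(words).get("a")[0]);        // prints 2
    }
}
```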

JamesCherrill


I pointed out the put() issue because it is possible to avoid the overhead of calling put() every time a count is updated (except the first time a word is counted).
Otherwise, I agree with the rest of your post.

mvmalderen

Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");

I just want to clarify this line of your program. Do you intend to count only words that contain nothing but letters? If a word has a hyphen in the middle, do you count it as two words or one? (e.g. "Is your job part-time or full-time?") If you count each of them as two words, the result may not be correct. Also, what about a word containing a number? (e.g. "My constructor1 and constructor2 implementations are totally different.") Just my 2 cents...
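If hyphenated words and words with digits should stay whole, one option is to match words directly with a regex instead of describing the delimiters. The word rule below (letters/digits, optionally joined by single hyphens) is an assumption about what should count as a word, and the names Tokenize and words are hypothetical; this swaps Scanner for java.util.regex.Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenize {

    // Assumed word rule: a run of letters/digits, optionally followed by
    // "-" plus another run, so "part-time" and "constructor1" stay intact
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*");

    public static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Is your job part-time or full-time?"));
        // prints [Is, your, job, part-time, or, full-time]
        System.out.println(words("My constructor1 and constructor2 implementations"));
        // prints [My, constructor1, and, constructor2, implementations]
    }
}
```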

Taywin

I used .useDelimiter("[^a-zA-Z]+") so that it only reads words made up of the letters a to z, nothing else.
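To see exactly what that delimiter produces, it can be run on a String instead of a file. The class and method names (DelimiterDemo, tokens) are hypothetical. Note that apostrophes are also delimiters, so a contraction like "Don't" is split into two tokens, and case is preserved unless the tokens are lower-cased afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {

    // Collects the tokens Scanner returns with the same delimiter as Lab4
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<String>();
        Scanner sc = new Scanner(text).useDelimiter("[^a-zA-Z]+");
        while (sc.hasNext()) {
            result.add(sc.next());
        }
        sc.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Don't stop: 2 more days!"));
        // prints [Don, t, stop, more, days]
    }
}
```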

maduxi

© 2013 DaniWeb® LLC