Java Code to Make a Word-Frequency Counter

Question

Dinesh_9 0 Light Poster

11 Years Ago

Hi everyone, I have been assigned the task of writing Java code to create a word-frequency counter that must satisfy the following constraints:
1) It must prompt the user to enter a path, from which the code will read the contents of all the text files (.txt) in that directory.

2) The code must read a property file named stop.txt containing a list of words to be excluded from the frequency count. Example: if stop.txt contains the words "is, that, or", then "is", "that" and "or" must be left out of the count.

3) All related words must be reduced to their root word. Example: if the files contain the words tall, taller and tallest, then the count for the word tall must be 3, not 1.

4) The user must be prompted to enter a number, and that many words must be displayed. Example: if the user enters 2, then the top 2 words (based on frequency) must be displayed.
Here is the code I have written so far:

/**
 *This code is to create an Word-Frequency Counter
 *@author Dinesh
 *@version 0.1
 */
import java.io.*;
import java.util.*;

class FrequencyCounter {

    public static void main(String[] args) {
        System.out.println("Enter the file path to analyse:");
        Scanner scan = new Scanner(System.in);
        String path = scan.nextLine();//Files present in this path will be analysed to count frequency
        File directory = new File(path);
        File[] listOfFiles = directory.listFiles();//To get the list of file-names found at the "directoy"
        BufferedReader br = null;
        String words[] = null;
        String line;
        String files;
        Map<String, Integer> wordCount = new HashMap<String, Integer>();     //Creates an Hash Map for storing the words and its count
        for (File file : listOfFiles) { 
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) {  //Checks whether an file is an text file 
                        br = new BufferedReader(new FileReader(files));      //creates an Buffered Reader to read the contents of the file
                        while ((line = br.readLine()) != null) {
                            line = line.toLowerCase();
                            words = line.split("\\s+");                      //Splits the words with "space" as an delimeter 
                        }
                        br.close();
                    }
                    for (String read : words) {

                        Integer freq = wordCount.get(read);
                        wordCount.put(read, (freq == null) ? 1 : freq + 1); //For Each word the count will be incremented in the Hashmap
                    }

                } catch (NullPointerException | IOException e) {
                    System.out.println("I could'nt read your files:" + e);
                }

            }
            System.out.println(wordCount.size() + " distinct words:");     //Prints the Number of Distinct words found in the files read
            System.out.println(wordCount);                                 //Prints the Word and its occurrence
        }

    }
}

When I ran this code, I got the following output:

Enter the file path to analyse:
/home/dinesh/Desktop
0 distinct words:
{}
I could'nt read your files:java.lang.NullPointerException
0 distinct words:
{}
5 distinct words:
{hello=1, are=1, how=1, you=1, hi=1}
5 distinct words:
{hello=2, are=2, how=2, you=2, hi=2}
5 distinct words:
{hello=3, are=3, how=3, you=3, hi=3}
5 distinct words:
{hello=4, are=4, how=4, you=4, hi=4}
5 distinct words:
{hello=5, are=5, how=5, you=5, hi=5}
5 distinct words:
{hello=6, are=6, how=6, you=6, hi=6}
5 distinct words:
{hello=7, are=7, how=7, you=7, hi=7}
5 distinct words:
{hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=1, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=1, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=2, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=3, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=4, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=4, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=5, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=6, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=7, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=8, hello=8, are=8, how=8, you=8, hi=8}
6 distinct words:
{=9, hello=8, are=8, how=8, you=8, hi=8}
8 distinct words:
{=9, hello=8, dine=2, dinesh=1, are=8, how=8, you=8, hi=8}
8 distinct words:
{=9, hello=8, dine=4, dinesh=2, are=8, how=8, you=8, hi=8}
8 distinct words:
{=9, hello=8, dine=6, dinesh=3, are=8, how=8, you=8, hi=8}

But this is not the expected output, and I am confused about how to implement constraints 2, 3 and 4. Kindly help/guide me to complete this assignment. Thanks in advance!


JamesCherrill 4,733 Most Valuable Poster

11 Years Ago

A bit of a digression, but I've been practicing with the new features in Java 8 (due March), and I have to share this with anyone who's interested...

The following code creates a map of counts for all the unique words (case insensitive) in all the .txt files in a specified folder...

Path dirPath = Paths.get("c://testdata");

Map<String, Long> counts = Files.list(dirPath). // parallel().
    filter(path -> path.toString().toLowerCase().endsWith(".txt")).
    flatMap(path -> bufferedReaderFromPath(path).lines()).
    flatMap(line -> Stream.of(line.split("\\s+"))).
    filter(word -> word.length() > 0).
    map(word -> word.toLowerCase()).
    collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

More amazingly: just uncomment the parallel() call and it will automatically split the work and run it in a suitable number of parallel threads!

Java 8 is really the biggest thing since 1.5 - maybe even bigger.

ps: bufferedReaderFromPath(path) is just a cover for
new BufferedReader(new FileReader(path.toFile()))
but because the lambdas used here can't throw checked exceptions, it needed a wrapper method to deal with any FileNotFoundExceptions
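
For readers who want to try the snippet, here is one way such a wrapper might look. This is only a sketch under assumptions: the class name ReaderWrapperSketch, the countWords helper, and the choice to rethrow as UncheckedIOException are illustrative and are not taken from the post.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReaderWrapperSketch {

    // Hypothetical wrapper: turns the checked FileNotFoundException into an
    // unchecked exception so the method can be called from inside a lambda.
    static BufferedReader bufferedReaderFromPath(Path path) {
        try {
            return new BufferedReader(new FileReader(path.toFile()));
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The word-counting half of the pipeline, applied to any stream of lines.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
            .flatMap(line -> Stream.of(line.split("\\s+")))
            .filter(word -> word.length() > 0)
            .map(String::toLowerCase)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

With this in place, countWords(bufferedReaderFromPath(somePath).lines()) would count the words of a single file, where somePath is whatever Path the caller supplies.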


JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 1 · 2014-01-05T09:01:59+00:00

Did you really write that code? It displays a wide knowledge of Java syntax, including some of the latest language enhancements. Someone with that kind of programming skill would surely know how to handle parts 2-4.

Dinesh_9 0 Light Poster · Answer 2 · 2014-01-05T09:54:12+00:00

Thanks for the compliment :) but I am completely new to Java programming. I learned those things in order to use them in my assignment, but I still couldn't work out the rest by googling :(

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 3 · 2014-01-05T10:23:18+00:00

OK, so start with number 2 "Read a list of words from a file". You already have written code to read and store all the words from all the text files in a directory, so this is just a subset of that.

Dinesh_9 0 Light Poster · Answer 4 · 2014-01-05T10:38:59+00:00

But it's not reading all the files found in the path entered by the user, and I couldn't find the reason. It's generating a NullPointerException too.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 5 · 2014-01-05T10:47:40+00:00

In your catch block, call e.printStackTrace(). That will tell you the exact line where the NPE happened, from which it's usually easy to see exactly what went wrong.
Anyway... You already have written code that will read and store all the words from all the text files in a directory (when it's debugged), so this is just a subset of that.
(There's no point trying to add new functionality to a program that's not working yet.)

Dinesh_9 0 Light Poster · Answer 6 · 2014-01-05T11:14:46+00:00

After adding e.printStackTrace(); inside the catch block, I found that line number 34 is causing the NPE, and yes, it's the enhanced for loop used for iterating over the contents of the words array.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 7 · 2014-01-05T11:35:37+00:00

"words" must still be null - look at the earlier code to see how - use print statements to see exactly what's happening.
ps - I'm going out now, maybe someone else will help
J

Dinesh_9 0 Light Poster · Answer 8 · 2014-01-05T12:04:05+00:00

After modifying my code like this

import java.io.*;
import java.util.*;

class FrequencyCounter {

    public static void main(String[] args) {
        System.out.println("Enter the file path to analyse:");
        Scanner scan = new Scanner(System.in);
        String path = scan.nextLine();//Files present in this path will be analysed to count frequency
        File directory = new File(path);
        File[] listOfFiles = directory.listFiles();//To get the list of file-names found at the "directoy"
        BufferedReader br;
        String words[]=null;
        String line;
        String files;
        Map<String, Integer> wordCount = new HashMap<String, Integer>();     //Creates an Hash Map for storing the words and its count
        for (File file : listOfFiles) {
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) {  //Checks whether an file is an text file 
                        br = new BufferedReader(new FileReader(files));      //creates an Buffered Reader to read the contents of the file
                        while ((line = br.readLine()) != null) {
                            line = line.toLowerCase();
                            words= line.split("\\s+");                      //Splits the words with "space" as an delimeter 
                        }
                        br.close();
                    }

                } catch (NullPointerException | IOException e) {
                    e.printStackTrace();
                    System.out.println("I could'nt read your files:" + e);
                }

            }

        }
        for (String read : words) {

            Integer freq = wordCount.get(read);
            wordCount.put(read, (freq == null) ? 1 : freq + 1); //For Each word the count will be incremented in the Hashmap
        }
        System.out.println(wordCount.size() + " distinct words:");     //Prints the Number of Distict words found in the files read
        System.out.println(wordCount);                                 //Prints the Word and its occurrence

    }
}

there is no NPE, but I still get the count of the words present in only one file :(
This is the output I got:

Enter the file path to analyse:
/home/dinesh/Desktop
2 distinct words:
{dine=2, dinesh=1}

but there are more files with contents

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 9 · 2014-01-05T16:32:04+00:00

Your first loop (17 - 37) processes many files, but stores its results in a "words" array that gets replaced on each pass of the loop. So after the loop you only have the words array for the last file processed.

Dinesh_9 0 Light Poster · Answer 10 · 2014-01-05T17:11:41+00:00

I understand the mistake but couldn't correct it; whenever I change the loop's scope I get errors. I am sorry, but can you please tell me what correction I need to make?

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 11 · 2014-01-05T17:50:09+00:00

Lines 38-42 increment the counts for an array of words, so if you execute that code inside the loop, after creating each words array (line 25), that should be OK.
ps It's useless to say "I'm getting errors" if you don't say exactly what those errors are!
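
The effect of the two placements can be seen in a small in-memory sketch. The class and method names are made up for illustration, and a list of strings stands in for the lines read from the files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LoopPlacementSketch {

    // Counting after the loop: "words" is overwritten on every pass,
    // so only the words of the last line read are ever counted.
    static Map<String, Integer> countAfterLoop(List<String> lines) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        String[] words = new String[0];
        for (String line : lines) {
            words = line.toLowerCase().split("\\s+");
        }
        for (String word : words) {
            Integer freq = wordCount.get(word);
            wordCount.put(word, (freq == null) ? 1 : freq + 1);
        }
        return wordCount;
    }

    // Counting inside the loop: each line's words are counted before
    // "words" is replaced by the next line.
    static Map<String, Integer> countInsideLoop(List<String> lines) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        for (String line : lines) {
            String[] words = line.toLowerCase().split("\\s+");
            for (String word : words) {
                Integer freq = wordCount.get(word);
                wordCount.put(word, (freq == null) ? 1 : freq + 1);
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hi hello", "dine dine dinesh");
        System.out.println(countAfterLoop(lines));   // only dine and dinesh appear
        System.out.println(countInsideLoop(lines));  // hi, hello, dine and dinesh all appear
    }
}
```

The first method reproduces the symptom from the earlier output, where only the words of the final line showed up in the map.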

Dinesh_9 0 Light Poster · Answer 12 · 2014-01-06T13:09:58+00:00

Now it's working well after making the modification you suggested. Here is the output I got:

Enter the file path to analyse:
/home/dinesh/Desktop
9 distinct words:
{=1, hello=3, dine=2, dinesh=1, are=1, norm=1, how=1, you=1, hi=4}

But I am not sure what this {=1 is or why it's getting printed. Here is the code after modification:

import java.io.*;
import java.util.*;

class FrequencyCounter {

    public static void main(String[] args) {
        System.out.println("Enter the file path to analyse:");
        Scanner scan = new Scanner(System.in);
        String path = scan.nextLine();//Files present in this path will be analysed to count frequency
        File directory = new File(path);
        File[] listOfFiles = directory.listFiles();//To get the list of file-names found at the "directoy"
        BufferedReader br;
        String words[] = null;
        String line;
        String files;
        Map<String, Integer> wordCount = new HashMap<String, Integer>();     //Creates an Hash Map for storing the words and its count
        for (File file : listOfFiles) {
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) {  //Checks whether an file is an text file 
                        br = new BufferedReader(new FileReader(files));      //creates an Buffered Reader to read the contents of the file
                        while ((line = br.readLine()) != null) {
                            line = line.toLowerCase();
                            words = line.split("\\s+");                      //Splits the words with "space" as an delimeter 
                            for (String read : words) {

                                Integer freq = wordCount.get(read);
                                wordCount.put(read, (freq == null) ? 1 : freq + 1); //For Each word the count will be incremented in the Hashmap
                            }

                        }
                        br.close();
                    }

                } catch (NullPointerException | IOException e) {
                    e.printStackTrace();
                    System.out.println("I could'nt read your files:" + e);
                }

            }

        }

         System.out.println(wordCount.size() + " distinct words:");     //Prints the Number of Distict words found in the files read
         System.out.println(wordCount);                                 //Prints the Word and its occurrence

    }
}

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 13 · 2014-01-06T13:19:38+00:00

But I am not sure what this {=1 is or why it's getting printed

Looks like your splitting may be returning exactly one empty string, i.e. "" (or maybe an unprintable character). In the loop starting at line 27 you could print the read variable (inside a pair of delimiters so you can see its exact length) to see what's going on.
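
As a small illustration of the delimiter trick, the sketch below (hypothetical class name, not from the thread) prints each token between square brackets. It also shows two inputs for which String.split("\\s+") does return an empty string: a blank line, and a line that starts with whitespace.

```java
class SplitSketch {

    // Formats each token between square brackets so empty strings are visible.
    static String show(String line) {
        StringBuilder sb = new StringBuilder();
        for (String token : line.split("\\s+")) {
            sb.append('[').append(token).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(show("hi hello")); // prints [hi][hello]
        System.out.println(show(""));         // prints []       (a blank line gives one empty token)
        System.out.println(show("  norm"));   // prints [][norm] (leading whitespace gives an empty first token)
        System.out.println(show("norm  "));   // prints [norm]   (trailing empty tokens are dropped)
    }
}
```

If that turns out to be the cause, skipping tokens for which isEmpty() returns true before counting is one way to keep the empty key out of the map.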

Dinesh_9 0 Light Poster · Answer 14 · 2014-01-06T14:11:55+00:00

Yes, you are right: a blank string has been stored in read, though I'm not sure where it is coming from. Here is the output:

Enter the file path to analyse:
/home/dinesh/Desktop
hi
hello
how
are
you
hi
hello
hi
hello
hi
norm

dine
dine
dinesh
9 distinct words:
{=1, hello=3, dine=2, dinesh=1, are=1, norm=1, how=1, you=1, hi=4}

Anyway, thanks for the tip. Now, how can I make my code ignore certain words that are listed in a file, say stop.txt?

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 15 · 2014-01-06T14:21:41+00:00

You'll have to look at the source files to see how/why you are parsing that blank space - your split method parameter should deal with it automatically.

how can I make my code ignore certain words that are listed in a file, say stop.txt?

You can read stop.txt into an array (just like reading the words in your existing code). You can then write a tiny method that tells you if a given word is in that array or not, and you can then use that method to decide which words to neglect.

There are classes in the Java API that will make this easier, if you want to learn/use them. Eg read the words in stop.txt into an ArrayList, then you can use its contains method to see if it contains a given word.

Dinesh_9 0 Light Poster · Answer 16 · 2014-01-12T06:50:47+00:00

James, it's awesome. I think I need to learn more in Java 7 itself first :) and then I will move on to Java 8 :) I still have not completed my assignment.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 17 · 2014-01-12T09:08:21+00:00

Java 8 isn't out yet, so you can safely ignore it for now. I just posted that for anyone who was preparing to support Java 8 when it is released.

If you need more help with your assignment, just post your questions here. Parts 2 and 4 should be easy enough for you. I don't understand what the teacher is asking for in point 3.

Dinesh_9 0 Light Poster · Answer 18 · 2014-01-12T10:49:18+00:00

I have used the following code to store the contents of the file (stop.txt), after line number 16.

 List<String[]> stoplist = new ArrayList<String[]>();
        try {
            br = new BufferedReader(new FileReader("stopwords.txt"));
                while ((stopwords=br.readLine())!= null) {
                stopwords = stopwords.toLowerCase();
                String[] stopArray=stopwords.split("\\s+");
                stopArray.toString();
                stoplist.add(stopArray);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

Now there is no error, but I couldn't get the code to use the contains() method with the ArrayList to exclude the words in the stop.txt file from the count.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 19 · 2014-01-12T11:12:43+00:00

You have made this too hard by building a List of arrays.
It will be much simpler if you just have a List<String> containing one word per entry. So at line 8, instead of adding the whole array to the list, use a small loop to add all the words in the array to the list one at a time.
Now you have a list of words, you can simply use stopList.contains(someWord) to see if someWord is in the list.

ps Line 7 does nothing - toString returns a string representation of the array (its type etc, not its contents), but you don't do anything with that returned value.
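
A minimal sketch of that flattening step is shown below. The class and method names are illustrative only, and a list of strings stands in for the lines that would be read from stop.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StopListSketch {

    // Flattens the lines of a stop-word file into one word per list entry.
    static List<String> toStopList(List<String> lines) {
        List<String> stopList = new ArrayList<String>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    stopList.add(word);
                }
            }
        }
        return stopList;
    }

    public static void main(String[] args) {
        List<String> stopList = toStopList(Arrays.asList("is that", "OR"));
        System.out.println(stopList.contains("or"));      // prints true
        System.out.println(stopList.contains("is that")); // prints false
    }
}
```

The second call shows why contains has to be given a single word: the list holds individual words, so a whole line never matches.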

Dinesh_9 0 Light Poster · Answer 20 · 2014-01-12T12:02:04+00:00

I have changed the code as you suggested, and to use the contains method I did this:

for (File file : listOfFiles) {
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) {  //Checks whether an file is an text file 
                        br1 = new BufferedReader(new FileReader(files));      //creates an Buffered Reader to read the contents of the file
                        while ((line = br1.readLine()) != null) {
                            line = line.toLowerCase();
                            if(!stoplist.contains(line)){


                            words = line.split("\\s+");                      //Splits the words with "space" as an delimeter 
                            }
                            for (String read : words) {
                                System.out.println(read);
                                Integer freq = wordCount.get(read);
                                wordCount.put(read, (freq == null) ? 1 : freq + 1); //For Each word the count will be incremented in the Hashmap
                            }



                        }
                        br1.close();

but now what happens is that if stop.txt contains hi, the counter still does not exclude that word from the count

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 21 · 2014-01-12T12:34:13+00:00

That's because you are checking if stopList contains the whole line from the txt file. You should be checking individual words.

Dinesh_9 0 Light Poster · Answer 22 · 2014-01-12T12:46:48+00:00

Yes, now I have changed it to

for (String read : words) {
                                    //System.out.println(read);
                                    if (!(stoplist.contains(read))) {
                                    Integer freq = wordCount.get(read);
                                    wordCount.put(read, (freq == null) ? 1 : freq + 1); //For Each word the count will be incremented in the Hashmap
                                    }

Now it's working, I believe :) Regarding constraint 3, the requirement is that if we read the words tall, taller and tallest, they must all be counted as tall, so its count has to be 3.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 23 · 2014-01-12T13:10:51+00:00

Yes, but how does that work in general? What general rule do you apply?

Do you ignore "er", "est", etc at the end of words? What about "deer" or "west"?

Do you ignore a word if it's the same as another word but with extra letters at the end? What about "man", "many", "manifest"?

Do you combine both those rules?

Do you have a complete English dictionary with the "root words" identified?

... you get the idea ....
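
To make the first of those questions concrete, here is a deliberately naive suffix-stripping rule, written only to show where it breaks. The class name and the rule itself are illustrative assumptions, not a recommended solution to constraint 3.

```java
class NaiveStemSketch {

    // Deliberately naive rule: strip a trailing "est" or "er" and nothing else.
    static String stem(String word) {
        if (word.endsWith("est")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.endsWith("er")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stem("taller"));  // prints tall
        System.out.println(stem("tallest")); // prints tall
        System.out.println(stem("deer"));    // prints de
        System.out.println(stem("west"));    // prints w
    }
}
```

The rule handles the teacher's example but mangles "deer" and "west", which is why established stemming algorithms such as the Porter stemmer rely on much longer sets of rules and conditions.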

Dinesh_9 0 Light Poster · Answer 24 · 2014-01-12T13:22:47+00:00

Yes, I understand that, but my teacher has made the assignment tricky, so I am skipping that constraint for now and going on to constraint number 4. How can I sort the contents of a HashMap based on the count value? I mean, I have used the HashMap Map<String, Integer> wordCount = new HashMap<String, Integer>(); so I need to arrange wordCount in decreasing order of count, so that I can print the top k most frequent words along with their counts.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 25 · 2014-01-12T13:46:52+00:00

This is a really interesting problem, but because it's your homework I can't just give you the answer. But here are some things to remember:
HashMaps don't have any ordering, so it makes no sense to sort them. A TreeMap is held in the sort order of its keys, but you want it sorted by its values.
You can't just use the count as key and word as value because there will be many duplicate counts.
You really need a way to group together a word and its count so you can sort them together... ... ...

Dinesh_9 0 Light Poster · Answer 26 · 2014-01-12T16:42:45+00:00

I have found some code that sorts a HashMap by value, but I couldn't understand it or how to use it in my code. Here is the code; kindly help me to use it, James.

public static <K extends Comparable,V extends Comparable> Map<K,V> sortByValues(Map<K,V> map){
        List<Map.Entry<K,V>> entries = new LinkedList<Map.Entry<K,V>>(map.entrySet());

        Collections.sort(entries, new Comparator<Map.Entry<K,V>>() {

            @Override
            public int compare(Entry<K, V> o1, Entry<K, V> o2) {
                return o1.getValue().compareTo(o2.getValue());
            }
        });

        //LinkedHashMap will keep the keys in the order they are inserted
        //which is currently sorted on natural ordering
        Map<K,V> sortedMap = new LinkedHashMap<K,V>();

        for(Map.Entry<K,V> entry: entries){
            sortedMap.put(entry.getKey(), entry.getValue());
        }

        return sortedMap;
    }

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 27 · 2014-01-12T17:05:21+00:00

That code creates a new LinkedHashMap. LinkedHashMap is interesting because it remembers the order that its elements were added in. The method starts by getting a List of all the entries (key/value pairs) from your Map, sorts them by value, according to the new Comparator, then adds them to the LinkedHashMap, so the LinkedHashMap is now in the same order as the List.

At this point you should have noticed that you have no need of the LinkedHashMap at all for this application. Everything you need is in the sorted List.

Printing the first "n" entries from the list is trivial, but since you want the highest counts, not the lowest, you need to change the comparator.

I recognised that code immediately, so there's a serious chance that your teacher will as well, which may not help your final grade!

If I were you, I would now write my own highly simplified version of that, which just does what's needed for this exercise. Get rid of all the generics and hard-code the types from your own HashMap. Fix the comparator to sort descending rather than ascending. Get rid of the LinkedHashMap and just use the List. That will also prove that you understood what you were doing and didn't just copy something blindly.
Good Luck!
J
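
For later readers, one possible shape of such a simplified version is sketched below. It is an illustration under assumptions rather than the answer the poster eventually wrote: the class name TopWordsSketch and the method topN are hypothetical, and the types are hard-coded to String and Integer as suggested above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWordsSketch {

    // Returns the n entries with the highest counts, highest first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> wordCount, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Comparing b to a (rather than a to b) gives descending order.
                return b.getValue().compareTo(a.getValue());
            }
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<String, Integer>();
        wordCount.put("hi", 4);
        wordCount.put("hello", 3);
        wordCount.put("dine", 2);
        for (Map.Entry<String, Integer> e : topN(wordCount, 2)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Words with equal counts come out in no particular order here; a secondary comparison on the key could be added if a stable listing matters.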

Dinesh_9 0 Light Poster · Answer 28 · 2014-01-13T07:28:10+00:00

James, I tried to sort the HashMap using the following code snippet

List<Map.Entry<String,Integer>> entries = new ArrayList<Map.Entry<String,Integer>>(wordCount.entrySet());
        Comparator<Map.Entry<String,Integer>> reverser = Collections.reverseOrder();
        Collections.sort(entries, reverser);

        System.out.println(entries);

but it's generating the following run-time error:

Enter the file path to analyse:
/home/dinesh/Desktop
Exception in thread "main" java.lang.ClassCastException: java.util.HashMap$Entry cannot be cast to java.lang.Comparable
    at java.util.Collections$ReverseComparator.compare(Collections.java:3569)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
    at java.util.TimSort.sort(TimSort.java:189)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at FrequencyCounter.main(FrequencyCounter.java:71)

Kindly help me to correct my mistakes.
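
The stack trace points at the no-argument Collections.reverseOrder(), which reverses the natural ordering of the elements and therefore casts each one to Comparable; Map.Entry does not implement Comparable, hence the ClassCastException. One possible fix, sketched here with hypothetical class and field names, is to write a comparator on the entry's value and pass it to the one-argument Collections.reverseOrder(Comparator).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ReverseOrderSketch {

    // Ascending comparator on the count stored in each entry.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT =
            new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    return a.getValue().compareTo(b.getValue());
                }
            };

    static List<Map.Entry<String, Integer>> sortedDescending(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());
        // reverseOrder(Comparator) flips an explicit comparator, so the entries
        // are never cast to Comparable.
        Collections.sort(entries, Collections.reverseOrder(BY_COUNT));
        return entries;
    }
}
```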
