Hey all!
New here, but I have a couple of questions:
How do I find the average word length, and how do I print out the number of occurrences of each word length?
It needs to look like this:

length frequency
------ ---------
     1         3
     2        13
     3        24
     4        13
     5        10
     6         2
     7         5
     8         3
     9         1
    10         3
    11         2
  > 23         0
------ ---------
Average length = 4.2

And also, how do I format this:

abcdefghijklm
27111103397242502185
nopqrstuvwxyz
2526502130357110120

to match this:

a   b   c   d   e   f   g   h   i   j   k   l   m  
  27   1  11  10  33   9   7  24  25   0   2  18   5 
   n   o   p   q   r   s   t   u  v   w   x   y   z
  25  26   5   0  21  30  35   7  1  10   1   2   0

I have a main class, but it's irrelevant to post. Here's my instantiable class where all the calculations occur:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class TextStatistics {

	private Scanner in;
	private StringTokenizer ST;
	private int[] chr = new int[26];
	private int lines = 0;
	private int words = 0;
	private int characters = 0;
	private String temp;
	private String temp1 = "";
	private String temp2 = "";
	private String temp3 = "";
	private String temp4 = "";
	private char ch;
	private double avg = 0.0;
	private int count;

	public TextStatistics(String args) {
		try {
			in = new Scanner(new File(args));
			while (in.hasNext()) {
				temp = in.nextLine().toLowerCase();
				ST = new StringTokenizer(temp,
						" , .;:'\"&!?-_\n\t12345678910[]{}()@#$%^*/+-");
				lines++;
				words += ST.countTokens();
				count = temp.length();
				characters = characters + count;

				for (int c = 0; c < temp.length(); c++) {
					ch = temp.charAt(c);
					if (ch >= 'a' && ch <= 'z') {
						chr[ch - 'a']++;
					}
				}
			}
		} catch (FileNotFoundException e) {
			System.err.print("TextStatistics: File " + args
					+ " cannot be found");
		}

		for (int c = 0; c < 13; c++) {
			temp1 += (char) (c + 'a');
			temp2 += chr[c];
		}
		for (int c = 13; c < 26; c++) {
			temp3 += (char) (c + 'a');
			temp4 += chr[c];
		}
	}

	public int getLines() {
		return lines;
	}

	public int getWords() {
		return words;
	}

	public int getCharacters() {
		return characters;
	}

	@Override
	public String toString() {

		return "=============================================================\n"
				+ lines
				+ " lines\n"
				+ words
				+ " words\n"
				+ characters
				+ " characters\n"
				+ "---------------------------------------\n"
				+ temp1
				+ "\n"
				+ temp2
				+ '\n'
				+ temp3
				+ "\n"
				+ temp4
				+ "\n"
				+ "---------------------------------------\n"
				+ "length frequency\n"
				+ "------ ---------\n"
				+ "------ ---------\n"
				+ "Average length = "
				+ avg
				+ "\n"
				+ "=============================================================\n";
	}
}

Here's my current output:
Statistics for testfile.txt
=============================================================
11 lines
79 words
458 characters
---------------------------------------
abcdefghijklm
27111103397242502185
nopqrstuvwxyz
2526502130357110120
---------------------------------------
length frequency
------ ---------
------ ---------
Average length = 0.0
=============================================================


For

length frequency
------ ---------
     1         3
     2        13
     3        24
     4        13
     5        10
     6         2
     7         5
     8         3
     9         1
    10         3
    11         2
  > 23         0
------ ---------
Average length = 4.2

I would like to see length and frequency as key-value pairs in a map. Use the length as the key; for each word, look up the current count for its length, increment it, and put it back. Once you are done, iterate through the key set and print each pair.
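
Here is a rough sketch of the map idea, in case it helps to see it written out. The class name WordLengthMap, the helper countLengths, the shortened delimiter list, and the sample sentence are all made up for illustration; they are not from the program above. It uses a TreeMap so the lengths come out in ascending order.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordLengthMap {

    // Hypothetical helper: maps each word length to the number of words
    // of that length found in the given text.
    static Map<Integer, Integer> countLengths(String text) {
        Map<Integer, Integer> counts = new TreeMap<>(); // keys stay sorted
        // Assumed, shortened delimiter list; swap in your own.
        StringTokenizer st = new StringTokenizer(text, " ,.;:!?\n\t");
        while (st.hasMoreTokens()) {
            int len = st.nextToken().length();
            Integer old = counts.get(len);              // null if not seen yet
            counts.put(len, old == null ? 1 : old + 1); // increment, put back
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        Map<Integer, Integer> counts =
                countLengths("the quick brown fox jumps over the lazy dog");
        System.out.println("length frequency");
        System.out.println("------ ---------");
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            System.out.printf("%6d %9d%n", e.getKey(), e.getValue());
        }
        System.out.println("------ ---------");
    }
}
```

Only lengths that actually occur end up in the map, so there are no zero rows to filter out when printing.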

For

a   b   c   d   e   f   g   h   i   j   k   l   m  
  27   1  11  10  33   9   7  24  25   0   2  18   5 
   n   o   p   q   r   s   t   u  v   w   x   y   z
  25  26   5   0  21  30  35   7  1  10   1   2   0

Use

(Integer.toString(chr[c])).length();

as the loop count, adding that many spaces to your temp1 and temp3 variables for each character.

Is there a way you can make an example of how you use

(Integer.toString(chr[c])).length()

So I can see how to take those values it gets and turn them into a " "?

int someCount = 5;
		while(someCount > 0){
			temp1 += ' ';
		}
		temp1 += 'a';
	}

The above snippet of code adds 5 spaces to the beginning of 'a'. You need to use the

(Integer.toString(chr[c])).length()

to initialize someCount, as I have done. This adds as many spaces as there are digits in the count below that character.

I'm sorry, but it just gets stuck in an infinite loop :/

Ah, sorry, I did not run it to check. You have to decrement someCount inside the while loop. Or you could use a for loop.

int someCount = (Integer.toString(chr[c])).length()
while(someCount > 0){
temp1 -= ' ';
}
temp1 += 'a';
}

like that?

No.

while(someCount > 0){
temp1 += ' ';
someCount--;
}

You see, the value of someCount has to decrease on each iteration, or else you end up with an infinite loop.

Did you actually write the first program? If so, this should be straightforward.

Yeah, so far I've written everything.
I'm pretty new to Java coding;
it's hard learning this and C++ at the same time.

Thanks btw!
got the spacing correct now!

for (int c = 0; c < 13; c++) {
			int Count = (Integer.toString(chr[c])).length();
			while (Count > 0) {
				freq1 += ' ';
				Count--;
			}
			freq1 += (char) (c + 'a') + " ";
			freq3 += chr[c] + "  ";
		}
		for (int c = 13; c < 26; c++) {
			int Count = (Integer.toString(chr[c])).length();
			while (Count > 0) {
				freq2 += ' ';
				Count--;
			}
			freq2 += (char)(c + 'a') + " ";
			freq4 += chr[c]+ "  ";
		}
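
For comparison, here is a different way to get the same kind of alignment: instead of padding by digit count, give every cell a fixed width with String.format. This is only a sketch under assumptions. The class name LetterTable, the helper rows, the width of 4, and the sample counts are invented for the example and are not part of the program in this thread.

```java
class LetterTable {

    // Hypothetical helper: builds a row of letters and a row of counts for
    // the letters first..last (inclusive), each right-aligned in 4 columns.
    static String rows(int[] chr, char first, char last) {
        StringBuilder letters = new StringBuilder();
        StringBuilder counts = new StringBuilder();
        for (char ch = first; ch <= last; ch++) {
            letters.append(String.format("%4c", ch));           // letter cell
            counts.append(String.format("%4d", chr[ch - 'a'])); // count cell
        }
        return letters + "\n" + counts + "\n";
    }

    public static void main(String[] args) {
        int[] chr = new int[26];
        chr[0] = 27; // made-up sample counts for a, b, c
        chr[1] = 1;
        chr[2] = 11;
        System.out.print(rows(chr, 'a', 'm'));
        System.out.print(rows(chr, 'n', 'z'));
    }
}
```

A fixed width keeps the columns lined up as long as no count needs more than three digits; a larger file would need a wider format such as %6d.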

Alright, now on to word length frequencies.
After that, I think I can get the average length on my own.

I don't want to use a map because we haven't learned those in my class yet, and
I don't want it to look like someone wrote this for me.

So far I have:

while (ST.hasMoreTokens()) {

}

Just use the array indexes for word lengths. This is a bad way to do it, but I can't think of anything else right now.

E.g., use a[1] for words of length 1, a[2] for words of length 2, and so on. Their frequencies will be the values stored in the array.

Here's what my teacher sent me over email:

"You need an array for the word length frequencies and you need to increment
one element of the array for each word in the file. Printing it out should be
fairly obvious; just make sure the count is not zero before printing."

Yes. You can follow the pointers that I gave you for doing this. Your teacher meant the same thing. Before printing each element of the array, just check that its value is not zero.
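
A minimal sketch of the array approach, to show the shape of it. Everything named here is an assumption for the example: the class WordLengthArray, the helpers countLengths and average, the cap MAX = 23, the shortened delimiter list, and the sample sentence.

```java
import java.util.StringTokenizer;

class WordLengthArray {

    static final int MAX = 23; // assumed cap on the lengths tracked exactly

    // Hypothetical helper: freq[n] holds the number of words of length n
    // for n = 1..MAX; freq[MAX + 1] collects every word longer than MAX.
    static int[] countLengths(String text) {
        int[] freq = new int[MAX + 2];
        // Assumed, shortened delimiter list; swap in your own.
        StringTokenizer st = new StringTokenizer(text, " ,.;:!?\n\t");
        while (st.hasMoreTokens()) {
            int len = st.nextToken().length();
            freq[Math.min(len, MAX + 1)]++; // exactly one increment per word
        }
        return freq;
    }

    // Average = total letters in all words / number of words.
    // Words in the overflow slot are treated as MAX + 1 letters long,
    // so the result is approximate if any such words exist.
    static double average(int[] freq) {
        int letters = 0;
        int words = 0;
        for (int len = 1; len < freq.length; len++) {
            letters += len * freq[len];
            words += freq[len];
        }
        return words == 0 ? 0.0 : (double) letters / words;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        int[] freq = countLengths("the quick brown fox jumps over the lazy dog");
        System.out.println("length frequency");
        System.out.println("------ ---------");
        for (int len = 1; len <= MAX; len++) {
            if (freq[len] != 0) { // skip lengths that never occurred
                System.out.printf("%6d %9d%n", len, freq[len]);
            }
        }
        System.out.printf("  > %d %9d%n", MAX, freq[MAX + 1]);
        System.out.println("------ ---------");
        System.out.printf("Average length = %.1f%n", average(freq));
    }
}
```

The key point is that each word adds 1 to a single element, so the array holds plain counts and the average falls out of a second pass over it.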

alright!(:
I'll try my best and post any errors I come across.

GOT IT!(:
THANKS for all the help.


MAIN:

import java.io.IOException;

/**
 * @author Cody Romero
 * 
 */
public class ProcessText {

	/**
	 * @param args
	 * @throws IOException
	 */
	public static void main(String[] args) throws IOException {

		if (args.length == 0) {
			System.err
					.println("ProcessText: Usage: java ProcessText file1 [file2 ...]");
			System.exit(1);
		} // determines if the files are sent in as command line arguments

		System.out.println("Testing current files:");

		for (int x = 0; x < args.length; x++) {
			String txt = args[x];
			System.out.print("( " + txt + " )" + " ");
		} // prints out which files are being read

		System.out.println('\n');

		for (int z = 0; z < args.length; z++) {
			TextStatistics File = new TextStatistics(args[z]);
			System.out.println("\nStatistics for " + args[z]);
			System.out.println(File.toString());
		} // prints out the TextStatistics.java program
	}
}

Class that reads the file:

import java.io.File;
import java.io.FileNotFoundException;
import java.text.DecimalFormat;
import java.util.Scanner;
import java.util.StringTokenizer;

/**
 * @author Cody Romero
 * 
 */
public class TextStatistics {

	private Scanner in;
	private StringTokenizer ST;
	private DecimalFormat fmt = new DecimalFormat("0.#");
	private int[] chr = new int[26];
	private int[] frq = new int[23];
	private String freq1 = "", freq2 = "", freq3 = " ", freq4 = " ",
			freq5 = "", temp, avg;
	private char ch;
	private int count1, count2, lines = 0, words = 0, characters = 0,
			average = 0;

	/**
	 * A constructor that takes the name of a file as a parameter and then opens
	 * the file and reads the entire file line-by-line. It collects the number
	 * of characters, number of words, number of lines, average word length and
	 * for the arrays that contain the number of words of each length and the
	 * number of times each letter occurs in the file.
	 * 
	 * @param args
	 */
	public TextStatistics(String args) {
		try {
			in = new Scanner(new File(args)); // reads the file line-by-line
			while (in.hasNext()) {
				lines++; // number of lines in the file
				temp = in.nextLine().toLowerCase();
				// Tokenizer for reading each token individually
				ST = new StringTokenizer(temp,
						" , .;:'\"&!?-_\n\t12345678910[]{}()@#$%^*/+-");
				words += ST.countTokens(); // number of words in the file
				count1 = temp.length();
				// number of characters in the file
				characters = characters + count1;
				for (int c = 0; c < temp.length(); c++) {
					ch = temp.charAt(c);
					if (ch >= 'a' && ch <= 'z') {
						chr[ch - 'a']++;
					}
				} // reads each character and determines the frequency of that
					// character
				while (ST.hasMoreTokens()) {
					count2 = ST.nextToken().length();
					// words too long for the array are skipped instead of
					// throwing ArrayIndexOutOfBoundsException
					if (count2 >= 1 && count2 < frq.length) {
						frq[count2]++; // one increment per word
					}
				} // reads each token (word) and counts it under its length
			}
		} catch (FileNotFoundException e) {
			System.err.print("TextStatistics: File " + args
					+ " cannot be found");
		} // catches any files that are not in the directory
		for (int c = 0; c < 13; c++) {
			int Count = (Integer.toString(chr[c])).length();
			while (Count > 0) {
				freq1 += ' ';
				Count--;
			}
			freq1 += (char) (c + 'a') + " ";
			freq3 += chr[c] + "  ";
		} // sets up the frequency of each character to print out
		for (int c = 13; c < 26; c++) {
			int Count = (Integer.toString(chr[c])).length();
			while (Count > 0) {
				freq2 += ' ';
				Count--;
			}
			freq2 += (char) (c + 'a') + " ";
			freq4 += chr[c] + "  ";
		}
		for (int c = 1; c < 12; c++) {
			freq5 += "     " + c + "\t      " + frq[c] + '\n';
			average += c * frq[c]; // total letters in words of length c
		} // sets up the frequency of each word length, and also totals the
			// letters needed for the average length
		avg = fmt.format((double) average / words);
	}

	/*
	 * (non-Javadoc)
	 * 
	 * @see java.lang.Object#toString()
	 */
	@Override
	public String toString() {

		return "=============================================================\n"
				+ lines
				+ " lines\n"
				+ words
				+ " words\n"
				+ characters
				+ " characters\n"
				+ "---------------------------------------\n"
				+ freq1
				+ '\n'
				+ freq3
				+ '\n'
				+ freq2
				+ '\n'
				+ freq4
				+ '\n'
				+ "---------------------------------------\n"
				+ "length frequency\n"
				+ "------ ---------\n"
				+ freq5
				+ "------ ---------\n"
				+ "Average length = "
				+ avg
				+ "\n"
				+ "=============================================================\n";
	}
}