CSCI-15 Assignment #2, String processing. (60 points) Due 9/23/13

You MAY NOT use C++ string objects for anything in this program.

Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words ("tokens") using strtok(), and keeps statistics on the data in the file. Your input and output file names will be supplied to your program on the command line, which you will access using argc and argv[].
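A minimal sketch of two mechanics this paragraph names, assuming the input file name arrives as argv[1] and the output file name as argv[2] (the assignment does not fix the order). The helper names openFiles and countWords are hypothetical. countWords uses " ." as its delimiter set because ifstream::getline() already removes the '\n'.

```cpp
#include <cstring>   // std::strtok
#include <fstream>   // std::ifstream, std::ofstream
#include <iostream>  // std::cerr

// Opens the files named on the command line (assumed: argv[1] = input,
// argv[2] = output). Prints a message to std::cerr and returns false if a
// name is missing or a file cannot be opened.
bool openFiles(int argc, char* argv[], std::ifstream& in, std::ofstream& out)
{
    if (argc < 3) {
        std::cerr << "usage: " << (argc > 0 ? argv[0] : "prog")
                  << " input-file output-file\n";
        return false;
    }
    in.open(argv[1]);
    if (!in) {
        std::cerr << "cannot open input file " << argv[1] << '\n';
        return false;
    }
    out.open(argv[2]);
    if (!out) {
        std::cerr << "cannot open output file " << argv[2] << '\n';
        return false;
    }
    return true;
}

// Splits one line in place with strtok() and returns the number of words.
// strtok() overwrites each separator with '\0', so the caller's buffer is
// modified; echo-print the line before calling this.
int countWords(char line[])
{
    int count = 0;
    for (char* tok = std::strtok(line, " ."); tok != nullptr;
         tok = std::strtok(nullptr, " ."))
        ++count;
    return count;
}
```

In main(), a call such as `if (!openFiles(argc, argv, inputFile, outputFile)) return 1;` would take the place of prompting for a file name.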

You need to count the total number of words, the number of unique words, the count of each individual word, and the number of lines. Also, remember and print the longest and shortest words in the file. If there is a tie for longest or shortest word, you may resolve the tie in any consistent manner (e.g., use either the first one or the last one found, but use the same method for both longest and shortest). You may assume the lines comprise words (contiguous lower-case letters [a-z]) separated by spaces, terminated with a period. You may ignore the possibility of other punctuation marks, including possessives or contractions, like in "Jim's house". Lines before the last one in the file will have a newline ('\n') after the period. In your data files, omit the '\n' on the last line. You may assume that the lines will be no longer than 100 characters, the individual words will be no longer than 15 letters and there will be no more than 100 unique words in the file.
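One way to apply the tie rule consistently, sketched under the assumption that the first word found wins ties for both the longest and the shortest word: compare with strict > and <, and let the very first word seed both. Extremes and updateExtremes are hypothetical names; the 15-letter limit is the one stated above. The explicit "no word yet" flag also gives an empty file a well-defined state instead of an uninitialized buffer.

```cpp
#include <cstring>  // std::strlen, std::strcpy, std::size_t

const int MAX_WORD_LEN = 15;  // from the assignment: at most 15 letters

// Hypothetical holder for the longest and shortest words seen so far,
// each kept in its own array (room for 15 letters plus '\0').
struct Extremes {
    char longest[MAX_WORD_LEN + 1] = "";
    char shortest[MAX_WORD_LEN + 1] = "";
    bool any = false;  // stays false for an empty file
};

// Ties keep the FIRST word found for both longest and shortest, because
// only a strictly longer (or strictly shorter) word replaces the current one.
void updateExtremes(Extremes& e, const char* word)
{
    std::size_t len = std::strlen(word);
    if (!e.any) {  // the first word is both the longest and the shortest
        std::strcpy(e.longest, word);
        std::strcpy(e.shortest, word);
        e.any = true;
        return;
    }
    if (len > std::strlen(e.longest))
        std::strcpy(e.longest, word);
    if (len < std::strlen(e.shortest))
        std::strcpy(e.shortest, word);
}
```

Fed the words of the 4-line test file in order, this keeps "christmas" as the longest word and "i" as the shortest; "a" ties with "i" but is found later.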

Read the lines from the input file, and echo-print them to the output file. After reaching end-of-file on the input file (or reading a line of length zero, which you should treat as the end of the input data), print the words with their occurrence counts, one word/count pair per line, and the collected statistics to the output file. You will also need to create other test files of your own. Also, your program must work correctly with an EMPTY input file, which has NO statistics.
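A sketch of the read/echo loop, assuming the 100-character line limit from the assignment. Putting getline() in the loop condition means the body runs only for lines that were actually read, and the extra line[0] != '\0' test implements the "zero-length line ends the data" rule; an empty file gives 0 lines. echoLines is a hypothetical name, and it takes std::istream& so the same function accepts an ifstream or, for testing, an istringstream.

```cpp
#include <istream>  // std::istream
#include <ostream>  // std::ostream
#include <sstream>  // std::istringstream / std::ostringstream, for trying it out

const int MAX_LINE = 100;  // from the assignment: at most 100 characters per line

// Echo-prints each line of `in` to `out` and returns how many lines were read.
// Stops at end-of-file or at the first zero-length line, which the assignment
// says to treat as the end of the data; an empty input yields 0.
int echoLines(std::istream& in, std::ostream& out)
{
    char line[MAX_LINE + 1];  // +1 for the terminating '\0'
    int lines = 0;
    while (in.getline(line, sizeof line) && line[0] != '\0') {
        out << line << '\n';
        ++lines;
    }
    return lines;
}
```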

The test file looks like this (exactly 4 lines, with NO NEWLINE on the last line):

the quick brown fox jumps over the lazy dog.
now is the time for all good men to come to the aid of their party.
all i want for christmas is my two front teeth.
the quick brown fox jumps over a lazy dog.

Copy and paste this into a small file for one of your tests.

Hints:

Use a 2-dimensional array of char, 100 rows by 16 columns (why not 15?), to hold the unique words, and a 1-dimensional array of ints with 100 elements to hold the associated counts. For each word, scan through the occupied lines in the array for a match (use strcmp()), and if you find a match, increment the associated count, otherwise (you got past the last word), add the word to the table and set its count to 1.
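The table described in this hint might look like the sketch below. The 16th column holds the terminating '\0' that every C-string needs, which answers "why not 15?". Scanning only the occupied rows (0 through used - 1) rather than all 100 avoids comparing against uninitialized memory. addWord, WORD_SIZE, and MAX_UNIQUE are hypothetical names; strcpy() copies the word together with its '\0'.

```cpp
#include <cassert>  // assert
#include <cstring>  // std::strcmp, std::strcpy, std::strlen

const int MAX_UNIQUE = 100;  // at most 100 unique words
const int WORD_SIZE = 16;    // 15 letters + 1 for the terminating '\0'

// Adds one word to the table: if it is already present, bump its count;
// otherwise append it with a count of 1. `used` is the number of occupied
// rows. Returns false only if the table is already full.
bool addWord(char table[][WORD_SIZE], int counts[], int& used, const char* word)
{
    assert(std::strlen(word) < WORD_SIZE);  // assignment: at most 15 letters
    for (int i = 0; i < used; ++i) {        // scan only the occupied rows
        if (std::strcmp(table[i], word) == 0) {
            ++counts[i];
            return true;
        }
    }
    if (used == MAX_UNIQUE)
        return false;
    std::strcpy(table[used], word);  // copies the '\0' too
    counts[used] = 1;
    ++used;
    return true;
}
```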

The separate longest word and the shortest word need to be saved off in their own C-strings. (Why can't you just keep a pointer to them in the tokenized data?)
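The reason shows up in a small experiment, sketched here with a hypothetical function rememberFirstWord: strtok() returns pointers into the line buffer, and the next getline() call overwrites that buffer, so a saved pointer ends up showing the new line's text while a strcpy() copy keeps the word. The caller supplies the buffers; savedCopy is assumed to hold at least 16 chars.

```cpp
#include <cstring>  // std::strtok, std::strcpy
#include <istream>  // std::istream
#include <sstream>  // std::istringstream, for trying it out

// Reads two lines into the SAME buffer, as a getline() loop does, and
// remembers the first word of line 1 in two ways: as the pointer strtok()
// returned, and as a strcpy() copy in a separate array.
// Returns false if line 1 has no word to remember.
bool rememberFirstWord(std::istream& in, char line[], std::streamsize size,
                       char*& savedPointer, char savedCopy[])
{
    in.getline(line, size);                  // line 1 goes into the buffer
    savedPointer = std::strtok(line, " .");  // points INTO the buffer
    if (savedPointer == nullptr)
        return false;
    std::strcpy(savedCopy, savedPointer);    // independent copy of the word
    in.getline(line, size);                  // line 2 overwrites the buffer
    return true;
}
```

After the call, savedCopy still reads "christmas" for the input "christmas is here.\na dog.", while savedPointer now reads "a dog.".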

Remember: put NO NEWLINE at the end of the last line, or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line before seeing end-of-file.)
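The extra zero-length read comes from how eof() works: it reports end-of-file only after a read has already run into it. The sketch compares a `while (!in.eof())` loop with a loop controlled by the getline() call itself; both function names are hypothetical.

```cpp
#include <istream>  // std::istream
#include <sstream>  // std::istringstream, for trying it out

// eof() becomes true only AFTER a read has run into end-of-file, so with a
// trailing '\n' (or an empty file) this loop runs once more and "reads" a
// zero-length line. It would also never end if a line were too long for the
// buffer, because getline() then sets failbit without eofbit.
int countLinesEofLoop(std::istream& in)
{
    char line[101];
    int n = 0;
    while (!in.eof()) {
        in.getline(line, sizeof line);
        ++n;
    }
    return n;
}

// Here the read itself controls the loop: the body runs only when
// getline() actually delivered a line.
int countLinesGetlineLoop(std::istream& in)
{
    char line[101];
    int n = 0;
    while (in.getline(line, sizeof line))
        ++n;
    return n;
}
```

With the 4-line test file (no trailing newline) both loops count 4 lines; they differ only when the file ends with '\n' or is empty.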

This is not a long program: no more than about 2 pages of code.

Here is my source code:

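#include<iostream>
#include<iomanip>
#include<fstream>
#include<cstring> // strlen(), strcpy(), strcmp(), strtok()
using std::cout;
using std::ifstream;
using std::ofstream;
using std::endl;
using std::cin;
using std::getline;
using std::right;
using std::setw;
using std::strlen;
using std::strcpy;
using std::strcmp;
using std::strtok;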

// This function will indicate every different word in the file.
void wordCount(char position[], char separateWord[100][16], int row, int length)
{
    for(int i = 0; i < length; i++)
    {
        separateWord[row][i] = position[i];
    }
}

// This function will determine how many times each word appears in the file.
void numCount(char position[], char separateWord[100][16], int row, int length)
{
    for(int i = 0; i < length; i++)
    {
        position[i] = separateWord[row][i];
    }
}    

// Find and print the longest and shortest word in the file.
void longestandshortestWord(ifstream &inputFile, ofstream &outputFile)
{
    int length;
    char words[100]; // Holds every word.
    char longestword[100]; // Holds the longest word.
    char shortestword[16]; // Holds the shortest word.
    int longestlength = 0;
    int shortestlength = 0;

    while(!inputFile.eof())
    {
        inputFile >> words; // Read every word.
        longestlength = strlen(longestword); // Determine which word is longer.
        shortestlength = strlen(shortestword); // Determine which word is shorter.
        length = strlen(words); // Determine length of each word.
        // If one word is longer than the others, get the longest word.
        if(length > longestlength)
        {
            longestlength = length;
            strcpy(longestword, words);
        }
        // If one word is shorter than the others, get the shortest word.
        else if(length < shortestlength)
        {
            shortestlength = length;
            strcpy(shortestword, words);
        }
    }
    // Print the longest word.
    outputFile << "Longest Word: " << longestword << endl;
    // Print the shortest word.
    outputFile << "Shortest Word: " << shortestword << endl;
}

// Print every unique word and its associated count
void uniquewords(ifstream &inputFile, ofstream &outputFile)
{
    char uniqueWords[100][16]; // Holds the unique words.
    int i;
    int counter[100]; // Holds the associated counts.
    char temporaryword[100];
    char words[100]; // Holds every word.
    int maxChar = 16; // Maximum number of letters in a word.
    int index = 0;
    bool found = false; // Determines if word is found in the file.

    // For each unique word, set its associated count to 0.
    for(i = 0; i < 100; i++)
    {
        counter[i] = 0;
    }

    while(!inputFile.eof())
    {
        inputFile >> words; // Read every word.
        found = false;
        // For each word, determine if the word matches another.
        for(i = 0; !found && i < 100; i++)
        {
            numCount(temporaryword, uniqueWords, i, maxChar);
            // If there is a match, increment the associated count.
            if(strcmp(words, temporaryword) == 0)
            {
                counter[i]++;
                found = true;
            }
        }
        // If there is no match, add the word to the table and set its count to 1.
        if(!found)
        {
            wordCount(words, uniqueWords, index, 100);
            counter[index]+=1;
            index++;
        }
    }
    // Display the table
    outputFile << setw(13) << "Words/Count" << endl;
    outputFile << "-------------" << endl;
    // Display each unique word and its associated count.
    for(i = 0; i < index; i++)
    {
        numCount(words, uniqueWords, i, maxChar);
        outputFile << right << setw(10) << words << ": " << counter[i] << endl;
    }
}

// Call every function.
int main(int argc, char *argv[])
{
    ifstream inputFile;
    ofstream outputFile;
    char *token;
    char words[100];
    char lines[100];
    char inFile[12] = "string1.txt";
    char outFile[16] = "word result.txt";
    int lineCount = 0;
    int uniquewordCount = 0;
    int totalwordCount = 0;

    // Get the name of the file from the user.
    cout << "Enter the name of the file: ";
    cin >> inFile;

    // Open the input file.
    inputFile.open(inFile);

    // Open the output file.
    outputFile.open(outFile);

    // If successfully opened, process the data.
    if(inputFile)
    {   
        lineCount++;
        while(!inputFile.eof())
        {
            uniquewords(inputFile, outputFile);
            longestandshortestWord(inputFile, outputFile);
            uniquewordCount++; // Increment the total number of unique words.
            // Tokenize each word and remove spaces, periods, and newlines.
            token = strtok(words, " .\n");
            while(token != NULL)
            {
                inputFile >> words; // Read each word.
                totalwordCount++; // Increment the total number of words.
                token = strtok(NULL, " .\n");
            }
        }
        // Display the total number of lines, unique words, and words in the file.
        outputFile << "Total number of lines in file: " << lineCount << endl;
        outputFile << "Total number of unique words in file: " << uniquewordCount << endl;
        outputFile << "Total number of words in file: " << totalwordCount << endl;
        // Close the input file.
        inputFile.close();
        // Close the output file.
        outputFile.close();
    }
    else
    {
        // Display the error message.
        cout << "There was an error opening the input file.\n";
    }
    return 0;
}

Most of my output seems to be working, but this is all I get when I call all of the functions:

Words/Count
-------------
       the: 5
     quick: 2
     brown: 2
       fox: 2
     jumps: 2
      over: 2
      lazy: 2
      dog.: 2
       now: 1
        is: 2
      time: 1
       for: 2
       all: 2
      good: 1
       men: 1
        to: 2
      come: 1
       aid: 1
        of: 1
     their: 1
    party.: 1
         i: 1
      want: 1
 christmas: 1
        my: 1
       two: 1
     front: 1
    teeth.: 1
         a: 1
Longest Word: 
Shortest Word: ©F
Total number of lines in file: 1
Total number of unique words in file: 1
Total number of words in file: 1

When I call only longestandshortestWord() in main(), I get this output:

Longest Word: christmas
Shortest Word: i
Total number of lines in file: 1
Total number of unique words in file: 1
Total number of words in file: 1
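The difference between the two runs above is consistent with the stream being used up: each function loops until inputFile reaches end-of-file, so after uniquewords() returns, longestandshortestWord() finds nothing left to read and prints its never-filled arrays (hence the empty longest word and the stray characters after "Shortest Word:"). The sketch below, with hypothetical names countWordsToEnd and rewindStream, shows the effect and one remedy: clear the error flags and seek back to the start before a second pass.

```cpp
#include <iomanip>  // std::setw
#include <istream>  // std::istream
#include <sstream>  // std::istringstream, for trying it out

// Reads whitespace-separated words until the stream is exhausted, the way
// each function in the posted code reads inputFile. Afterwards eofbit and
// failbit are set, so any later read from the same stream gets nothing.
int countWordsToEnd(std::istream& in)
{
    char word[101];
    int n = 0;
    // setw() limits each read to sizeof word - 1 characters plus '\0'.
    while (in >> std::setw(static_cast<int>(sizeof word)) >> word)
        ++n;
    return n;
}

// Makes a second pass possible: clear() resets eofbit/failbit, which would
// otherwise make seekg() and every later read fail; seekg(0) then moves
// back to the first character.
void rewindStream(std::istream& in)
{
    in.clear();
    in.seekg(0);
}
```

In the posted main(), the equivalent would be calling inputFile.clear(); and inputFile.seekg(0); before each function after the first.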

When I don't call uniquewords() and longestandshortestWord() in main(), I get this output:

Total number of lines in file: 1
Total number of unique words in file: 28
Total number of words in file: 44

How can I print all of the output at once? Also, how can I print the correct total number of lines in the file, which is 4, and the total number of unique words, which is 29?
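A single pass is another way to get all of the output at once: read each line once with getline(), tokenize it with strtok(), and update every statistic from the same tokens. The sketch assumes the limits stated in the assignment (100-character lines, 15-letter words, 100 unique words); Stats, collectStats, and printStats are hypothetical names. Counting lines inside the getline() loop, and tokenizing on " ." so that the period stays out of words such as "dog." and "party.", gives 4 lines, 29 unique words, and 44 words for the test file.

```cpp
#include <cstring>   // std::strtok, std::strcmp, std::strcpy, std::strlen
#include <iomanip>   // std::setw
#include <istream>   // std::istream
#include <ostream>   // std::ostream
#include <sstream>   // std::istringstream, for trying it out

const int MAX_LINE = 100;    // longest line, from the assignment
const int MAX_WORD = 15;     // longest word, from the assignment
const int MAX_UNIQUE = 100;  // most unique words, from the assignment

struct Stats {
    char words[MAX_UNIQUE][MAX_WORD + 1];  // unique words, '\0'-terminated
    int counts[MAX_UNIQUE];                // counts[i] belongs to words[i]
    int unique = 0, total = 0, lines = 0;
    char longest[MAX_WORD + 1] = "";
    char shortest[MAX_WORD + 1] = "";
};

// Reads each line once, echoes it, tokenizes it, and updates every statistic
// from the same tokens. Stops at end-of-file or at a zero-length line.
void collectStats(std::istream& in, std::ostream& echo, Stats& s)
{
    char line[MAX_LINE + 1];
    while (in.getline(line, sizeof line) && line[0] != '\0') {
        echo << line << '\n';  // echo BEFORE strtok() modifies the buffer
        ++s.lines;
        for (char* tok = std::strtok(line, " ."); tok != nullptr;
             tok = std::strtok(nullptr, " .")) {
            std::size_t len = std::strlen(tok);
            if (s.total == 0) {  // the first word seeds both extremes
                std::strcpy(s.longest, tok);
                std::strcpy(s.shortest, tok);
            } else {             // strict tests: the first word found wins ties
                if (len > std::strlen(s.longest)) std::strcpy(s.longest, tok);
                if (len < std::strlen(s.shortest)) std::strcpy(s.shortest, tok);
            }
            ++s.total;

            int i = 0;           // linear search of the occupied rows only
            while (i < s.unique && std::strcmp(s.words[i], tok) != 0)
                ++i;
            if (i < s.unique) {
                ++s.counts[i];
            } else if (s.unique < MAX_UNIQUE) {
                std::strcpy(s.words[s.unique], tok);
                s.counts[s.unique] = 1;
                ++s.unique;
            }
        }
    }
}

void printStats(const Stats& s, std::ostream& out)
{
    if (s.total == 0)  // empty input: there are no statistics to print
        return;
    for (int i = 0; i < s.unique; ++i)
        out << std::setw(10) << s.words[i] << ": " << s.counts[i] << '\n';
    out << "Longest Word: " << s.longest << '\n'
        << "Shortest Word: " << s.shortest << '\n'
        << "Total number of lines in file: " << s.lines << '\n'
        << "Total number of unique words in file: " << s.unique << '\n'
        << "Total number of words in file: " << s.total << '\n';
}
```

In main(), this would be used as `Stats s; collectStats(inputFile, outputFile, s); printStats(s, outputFile);`.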

andrew.mendonca.967
Deleted Member

I tried putting that line inside the while loop, but it gave this output (without calling uniquewords() and longestandshortestWord()):

Total number of lines in file: 28
Total number of unique words in file: 28
Total number of words in file: 44

Is there another way to fix this?

@yingdan Because the project description says not to use string objects, and the getline function would require a string object.
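For what it's worth, the assignment's "ifstream getline() method" refers to the member function istream::getline(char*, std::streamsize), which reads into a char array; only the free function std::getline() needs a std::string. A small sketch with a hypothetical wrapper, readLine, which returns the line length, or -1 if nothing was read or the line did not fit in size - 1 characters (getline() sets failbit in both cases).

```cpp
#include <cstring>  // std::strlen
#include <istream>  // std::istream
#include <sstream>  // std::istringstream, for trying it out

// Hypothetical wrapper around the member function
// istream::getline(char* buf, std::streamsize size). It stores at most
// size - 1 characters plus '\0' and discards the '\n'. A last line with no
// '\n' still counts as read. Returns the line length, or -1 on failure.
int readLine(std::istream& in, char* buf, std::streamsize size)
{
    if (!in.getline(buf, size))
        return -1;
    return static_cast<int>(std::strlen(buf));
}
```

With char line[101], a line of exactly 100 characters fits, because the '\n' is discarded rather than stored.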
