Hey I have to write a program that reads a text file that contains a list of words, 1 word per line. I have to store the unique words and count the occurrences of each unique word. When the file is completely read, I have to print the words and the number of occurrences to a text file. The output should be the words in alphabetical order along with the number of times they occur. Then print to the file some statistics:

I have to use character arrays instead of strings.
I must use the linear search (something that looks like this):

int search ( int array[], int number, int key )
{
	int pos = 0;
	
	while(pos < number && array[pos] != key)
		pos++;
	
	if( pos == number)
		pos = 0;
	else
		pos = 1;
	
	return pos;
}

to determine if a word is in the array. The array is an array of structures and the key is a char array, so a string comparison must be used.
The search must be a separate function that returns an integer value. I can't use a for loop, and the function must have only one return statement.
I tried to start off by reading the file and storing the words:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <iomanip>
using namespace std;

void displayFile( char []);

void main ()
{
	int const wordLength = 21;
	int const Num = 101;
	int const fileSize = 255;
	
	
	char filename[fileSize];
	ifstream inFile;
	
	
	struct
	{
		char word[wordLength];
		int count;
	} wordCount;
	
	wordCount array[Num];
	
    cout << "Please enter the name of thr file you wish to open: "<< endl;
	cin >> filename;
	
	displayFile (filename);
	
}

void displayFile (char fileName [] )
{
    ifstream infile;
	char array [101];
    char line [101];
	int ch;
    int i;
	infile.open (fileName);

    while ((ch = infile.peek()) != EOF) 
    {
        infile.getline (line, 101);
        line = array[i];
        i++;
    }
	
   for (int j = 0; j<101; j++)
	cout << array[j] << endl;
    infile.close ();

Any tips would be appreciated, and thanks in advance.

I would first create a structure

struct words
{
    std::string word;
    int count;
};

Next, a vector (array) of those structures:

vector<words> wordArray;

Now, when the program reads a line, check wordArray to see whether the word is already in the array. If it is, increment its count. If not, add a new element to the vector. When done, you will have the information you need to display on the screen.
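
A minimal sketch of that idea, assuming std::string and std::vector are allowed (the assignment itself may rule them out). countWord is a hypothetical helper name, not something from the original post:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct words
{
    std::string word;
    int count;
};

// Records one occurrence of w: bumps the count if the word is already
// stored, otherwise appends a new entry with a count of 1.
// (countWord is a hypothetical name used only for this sketch.)
void countWord(std::vector<words>& wordArray, const std::string& w)
{
    for (std::size_t i = 0; i < wordArray.size(); ++i) {
        if (wordArray[i].word == w) {
            ++wordArray[i].count;
            return;
        }
    }
    words entry;
    entry.word = w;
    entry.count = 1;
    wordArray.push_back(entry);
}
```

Calling countWord once per line read leaves one entry per unique word, in order of first appearance.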

Another method is to use a <map>, but implementation is left for someone else.

I think you should look at the code below carefully and make sure you understand the complexity and the map data structure.

If anything is confusing, post again and I will explain in more detail.

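ifstream inFile("C:\\Laiq.txt");

	string line;
	std::map<string, int> mapWordCount;
	string word;
	std::string::size_type endWord;
	std::string::size_type startWord;
	while (std::getline (inFile,line)) {
		startWord = 0;
		while ((endWord = line.find_first_of(" ", startWord))!= line.npos)
                {
				word = line.substr (startWord, endWord - startWord);
				if (!word.empty())	// skip empty tokens from consecutive spaces
					mapWordCount[word]++;
				startWord = endWord+1;
		}
		// count the last word on the line (the only word when there is one word per line)
		word = line.substr (startWord);
		if (!word.empty())
			mapWordCount[word]++;
	}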
	inFile.close();


	for(map<string,int>::const_iterator iter = mapWordCount.begin(); 
		iter != mapWordCount.end(); ++iter) {
			cout<< "The Frequency of word: "<< iter->first << " is : "<< iter->second <<endl;
	}

You have two different problems:
1. Tokenize the input stream (extract words from the stream).
2. Build word dictionary.
The second one has a very simple solution: use the std::map data structure, as Laiq Ahmed mentioned above. However, you need some more code to process every word with the map-based word dictionary:

if the next word is found in the map then
    do nothing or increment this word counter
else
    insert new word in the map
endif

There are lots of methods to solve the 1st problem. For example:

open file stream
create an empty map
loop // until eof
   skip non-letters
   clear word buffer // use std::string::clear()
   append letters to the word buffer
   process the word with the map // see #2
endloop
process a possible last word
traverse map (file word dictionary)

Summary:
- use std::ifstream
- use std::string for word buffer
- use std::map<std::string,int> for dictionary with counters
You have a good chance of writing simple and clear code after a proper functional decomposition of the pseudocode snippets ;)...

>>I have to use character arrays instead of strings.
I must use the linear search

That pretty much precludes use of a map. It also means you can't use STL strings within a struct, and may preclude use of a vector to hold the structs, too.

You could use a C style string with either static or dynamic memory within the struct. The static version limits the maximum length of any string entered, but since these strings will be words, it is unlikely that any given word has more than 20 or 30 letters. Using dynamic memory to store the string within the struct minimizes the memory wasted in the static version by strings shorter than the maximum length, but requires you to manage your own memory, which is a bit of a hassle, though not an overwhelming task by any means.

The non-STL structures you could use to store the structs could be either an array or a list. If you don't know the maximum number of possible unique words you could encounter then using a list might be advantageous. You could make a guess as to the max number of words, but that isn't quite as predictable as the maximum size of each word. Linear searching is possible with either lists or arrays.

Sorting the structures by string in alphabetical order before printing/sending values to file could be done in any of several different ways. For beginners, bubble sorts with arrays and insertion sorts with lists seem pretty popular.
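
As a rough illustration of the array option, here is a bubble sort over an array of structs that compares the words with strcmp. The struct layout and the name sortWords are assumptions for the sketch:

```cpp
#include <cstring>

const int wordLength = 21;   // assumed: 20 letters + terminating '\0'

struct words
{
    char word[wordLength];
    int count;
};

// Bubble sort: repeatedly swaps adjacent entries that are out of order.
// Whole structs are swapped, so each count stays with its word.
void sortWords(words array[], int size)
{
    for (int pass = 0; pass < size - 1; ++pass) {
        for (int i = 0; i < size - 1 - pass; ++i) {
            if (std::strcmp(array[i].word, array[i + 1].word) > 0) {
                words temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
            }
        }
    }
}
```

Note that strcmp compares character codes, so in ASCII every capitalized word sorts ahead of every lowercase one.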

Sorry, I did not notice those absurd restrictions.
The only result of this idiotic methodology: a pupil sheds tears on DaniWeb, then copies and pastes extremely inefficient code in early-1960s style (or worse)...

The maximum length of a word is 20, so the character array needs to be 21. The maximum number of words is 100; if there are more than 100 words I need to output a message saying something like "The stats did not use the words after the 100th word".

Anyway, I'm having a hard time even starting the program. I'm starting off by trying to read and store the words of the text file. My textbook has code for reading a text file line by line and then displaying it, so I figured I should start with that. Here's the code:

#include <iostream>
#include <fstream>
#include <cstdlib>   // needed for exit()
#include <string>
using namespace std;

int main()
{
  string filename = "text.dat";  // put the filename up front
  string line;
  ifstream inFile;
  
  inFile.open(filename.c_str());

  if (inFile.fail())  // check for successful open
  {
    cout << "\nThe file was not successfully opened"
	 << "\n Please check that the file currently exists."
	 << endl;
    exit(1);
  }

  // read and display the file's contents
  while (getline(inFile,line))
    cout << line << endl;

  inFile.close(); 

  cin.ignore();  // this line is optional

  return 0;
}

Here's my code split into two functions; however, it does not compile. I get an error at the while statement:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <iomanip>
using namespace std;

void displayFile( char []);

void main ()
{
	int const wordLength = 21;
	int const Num = 101;
	int const fileSize = 255;
	char filename[fileSize];
	
	cout << "Please enter the name of the file you wish to open: "<< endl;
	cin.getline(filename,fileSize);
	
	displayFile (filename);
	cin.ignore();
}

void displayFile (char fileName[] )
{
    ifstream inFile;
	
    char line [101];
	
	inFile.open(fileName);

 while (getline(inFile, line))
    cout << line << endl;

	inFile.close(); 

	return 0;
}

My code is nearly identical except that I use character arrays instead of strings, so I'm not sure why it's not compiling.
Also, is this a good way to start off the program, or should I try something else?

OK, I figured out I need to use "inFile.getline(line, length)" instead of "getline(inFile, line)". I created a struct as suggested earlier. However, I'm having trouble storing the words from the text file.

I use this function:

void displayFile (char fileName[], words array[] )
{
	int i = 0;
    ifstream inFile;
	
    char line [101];
	
	inFile.open(fileName);

 while (inFile.getline(line,101))
 {   
	cout << line << endl;
    array[i].word = line;
    i++;
 }   
	inFile.close(); 

}

Where array is an array of structs, each made up of a character array "word" and an integer "count". Any help on storing the words and the number of words in the array would be helpful.

The read-a-line-into-a-char-array solution has an obvious defect: you must define the maximum line length a priori.
Look at this function, which gets words from an input stream (not only from an fstream) directly:

// #include <cctype> dependency
typedef unsigned char Uchar; // needed for isalpha()
/// Get or skip the next word from a stream. 
/// wbufsize is the max word length + 1 (for null char)
bool getWord(char* word, int wbufsize, std::istream& is)
{
    int i = 0, n = wbufsize - 1; // max letters
    bool fill = (word && n > 0); // word wanted
    char ch; // current char
    
    while (is.get(ch) && !isalpha(Uchar(ch)))
        ;   // skip delimiters
    if (is) // have a letter
    do {
        if (fill) { // have room
            word[i] = ch; // append letter
            if (i + 1 >= n)   // buffer is full after this letter
                fill = false;
        }
        ++i; // letters counter (keeps counting past the buffer size)
    } while (is.get(ch) && isalpha(Uchar(ch)));
    if (word && wbufsize > 0) // have a buffer
        word[i < n ? i : n] = '\0'; // end of word (truncated to n letters)
    return i > 0; // have a word
}

Of course, it's possible to adapt this algorithm to extract words from a char array.

I tried a different Approach to achieve the same

char* arr = " Hallo WOrld";
	char* ptrFirst = arr;
	char* ptrIter = arr;
	int nWordCount = 0;

	// Base Condition.
	while (isspace (*ptrFirst))
		++ptrFirst;

	ptrIter = ptrFirst;
	
	while (ptrFirst) {
		
		while (isalpha(*ptrIter)) {
			++ptrIter;
		}

		++nWordCount;
		
		while (isspace (*ptrIter))
			++ptrIter;

		if (ptrFirst == ptrIter)
			break;

		ptrFirst = ptrIter;
	}

	cout << nWordCount-1 <<endl;

Hope this approach will help you understand?

I tried a different Approach to achieve the same

It's not only a "different" approach: it's a wrong approach ;).
The C Standard (7.4.1.10):

The standard white-space characters are the following:
space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'),
horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale,
isspace returns true only for the standard white-space characters.

Therefore the code above can't pick out the words in a string such as "123four5six...". It also has obviously incorrect lines, for example:

while (ptrFirst) { // probably, must be *ptrFirst

A more subtle defect of the code above is that the isXXX family of functions does not work for negative arguments (other than EOF). If a character in a text file has the bit pattern '1xxxxxxx' and the implementation's char type is signed, then the expression *ptrIter yields a negative integer and the behavior of isalpha(*ptrIter) is undefined. That's why the Uchar typedef was defined in my code.
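
For illustration, a char* scan that avoids both problems might look like the sketch below. countWordsIn is a hypothetical name, and a word is again assumed to be a run of letters:

```cpp
#include <cctype>

// Counts the runs of letters in a C string.
// Each char is converted to unsigned char before it is passed to isalpha,
// so bytes with the high bit set never reach it as negative values.
int countWordsIn(const char* text)
{
    int n = 0;
    bool inWord = false;
    while (*text != '\0') {     // test the character, not the pointer
        bool letter = std::isalpha(static_cast<unsigned char>(*text)) != 0;
        if (letter && !inWord)
            ++n;                // a new word starts here
        inWord = letter;
        ++text;
    }
    return n;
}
```

With this version both " Hallo WOrld" and "123four5six..." contain two words.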

In actual fact, it's an inaccurate implementation of the same (scanner-like) approach ;)

Apropos, if we have a text file with whitespace separators only, there is no need for scanner-like methods at all. The simplest code works fine:

string word;
while (file >> word) {
    // process word
}

My Bad with the Line

while (ptrFirst) {

Agreed on the C99 standard, but the requirement doesn't say anything about words separated by digits; that's why I implemented the code this way. Again, your fstream approach is the simplest, but I was trying to use char* instead of the functions the streams provide.

Thanks.

by the way I didn't compile it.

Strictly speaking, there is 1 word per line in those strange requirements (linear search only, don't use string, can't use a for loop, etc.; it's enough to make you weep). If so, there's no need for word extraction code at all.
Well, if you don't like the "stream-based approach", don't use C++ fstream to get lines from a file. Use fgets or something else from the C library. Furthermore, it's so easy to adapt the code to scan a C-string: change is.get(ch) to code that extracts the next char and tests for the null byte. Oh, sorry, I forgot: don't use istringstream! Don't use C++ at all ;)...

ArkM: I am not denying your opinions but the thing is that we should start learning from basics and Programming language has nothing to do with the logic, if you understand the logic then I suggest to use the built-in functions otherwise creating a raw logic is always a good starting point.

Is it programming learning basics: don't use for loops, don't use this, don't use that... and so on? It's a profanation.
Better download and read well-known B.Stroustrup's article "Learning Standard C++ as a New Language":
http://www.research.att.com/~bs/new_learning.pdf

Thanks ArkM I've gone through this article of Bjarne, no contradiction with this document at all, but as an experienced programmer what do you think of requirements,
practically speaking
"One day your boss come to your desk and ask I've bought a library written in C and I want you to use that for blah blah?"
or what if your boss ask you to develop a C library itself ?
I am not telling you that C is superior than C++, but the thing that matter is requirements, if someone asks for C code teach them C but also provide them with the C++ implementation and the differences between the two. I think this is better learning approach.

Thanks for all the input. I was able to store the lines with strcpy(), but now I'm trying to use strncmp to find the number of unique words (or lines) in the text file. I tried a couple of things, but none seemed to work. I'm given these guidelines:


You must use the linear search algorithm to determine if a word is in the array. Remember that the array is an array of structures and that the key is a string (char array), so a string comparison must be used. The search task should be a separate function.
The search must be a separate function that returns an integer value. Do not use a for loop, and the function must have only one return statement.

Here's the instructor's linear search:

int search (int list [], int size, int key)
{
    int pos = 0;
    while (pos < size && list[pos] != key)
        pos++;
    if (pos == size)
        pos = -1;
    return pos;
}

I'm really having trouble on this, any help would be appreciated.

bool search(char ** words, int numWords, char * currentWord)
{
    bool found = false;
    int i = 0;
    //compare current word to each word already in array
    while(!found && i < numWords)
    {
       //if current word is found
       if(strcmp(currentWord, words[i]) == 0)
         //change flag to end loop
         found = true;
       else
         //otherwise move on to the next word
         ++i;
    }
    return found;
}
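
For the assignment as stated (an array of structures, an integer return value, no for loop, a single return statement), the instructor's search can be adapted along these lines. This is only a sketch, and the struct layout is assumed from the code posted earlier in the thread:

```cpp
#include <cstring>

const int wordLength = 21;   // assumed: 20 letters + terminating '\0'

struct words
{
    char word[wordLength];
    int count;
};

// Linear search with a while loop and exactly one return statement.
// Returns the index of key in list, or -1 if key is not present.
int search(const words list[], int size, const char key[])
{
    int pos = 0;
    while (pos < size && std::strcmp(list[pos].word, key) != 0)
        pos++;
    if (pos == size)
        pos = -1;
    return pos;
}
```

Returning the index rather than a bool lets the caller increment list[pos].count directly when the word is found.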

"One day your boss come to...for blah blah?" or what if your boss ask you to develop a C library itself ?

You're changing the subject, though it's a very interesting theme. C and C++ are different languages. I use both at the same time. No problems. It's me who comes to my boss and tells him what our next library is ;). But I never teach the young members of my team with a "don't use for loops in C" methodology. There is no C-specific style in those absurd requirements. Do you really think that bad_programming == C and good_programming == C++?
Now let's remember: this is the C++ language thread and we are talking about C++ here.

OK, here's my revised code:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <iomanip>
using namespace std;

    int const wordLength = 21;
	int const Num = 100;
	int const fileSize = 255;

	struct words
	{
		char word[wordLength];
		int count;	
	};


	
int storeFile( char [], words []);
void wordSearchSetup(char[], int, words[]);
int wordSearch(char[], int, words[]);

void main ()
{
    int count;
	char fileName[fileSize];
	
	cout << "Please enter the name of the file you wish to open: "<< endl;
	cin.getline(fileName,fileSize);
	
	words array[Num];
	
	count = storeFile (fileName, array);
		
	
	cin.ignore();
}

int storeFile (char fileName[], words array[] )
{	
	int count = 0;
	int i = 0;
    ifstream inFile;
    char line [Num];
	
	inFile.open(fileName);

	while (inFile.getline(line,Num))
	{   
     strncpy(array[i].word, line, wordLength);
	 count++;
	 i++;  
	}   
	
	inFile.close(); 

	wordSearchSetup( fileName, count, array);
   return count; 
}

void wordSearchSetup(char fileName[], int count, words array[])
{
	char line[Num];
	int i = 0;
	int size;
	ifstream inFile;
	inFile.open(fileName);
	
	while (inFile.getline(line,Num))
	 size = wordSearch(line, count, array);
   
	cout << size << endl;

	
}


int wordSearch( char line[], int count, words array[])
{
	int i = count;
	
	while (i)
	{
		if (strcmp(line, array[i-1].word) == 0) 
		{ 
			array[i-1].count++; 
			return  count;   
        }
           i-- ; 
	}
    
	strcpy(array[count].word, line) ; 
	array[count].count = 1 ; 
	return count+1;   
}

My search functions still aren't giving me the results I want; it just returns the number of words, not unique words.

change this:

while (inFile.getline(line,Num))
{   
     strncpy(array[i].word, line, wordLength);
     count++;
     i++;  
}

To this:

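while (inFile.getline(line,Num))
{   
     if(wordSearch(line, count, array))   // arguments in the order wordSearch declares them
        cout << "duplicate found" << endl;
     else
     {
        strcpy(array[count].word, line);
        array[count].count = 1;   // first occurrence of this word
        count++;
      }
}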

Change wordSearch to this:

bool wordSearch( char line[], int count, words array[])
{
    bool found = false;
    int i = count;
    while (i)
    {
        if (strcmp(line, array[i-1].word) == 0) 
       { 
            array[i-1].count++; 
            found = true;
            break;  
        }
           i-- ; 
     }
      return found;
}

Eliminate wordSearchSetup() completely.

count should be the number of unique words found. If it reaches 100 before completely reading the file you will have to output the full error message. If you want to keep track of the number of total words found in the file in addition to the number of unique words in the file, you can do that too. Once you have completed the file reading you can display array with each unique word and the number of times it was found.

OK, thanks. Now I'm trying to determine the average occurrence of each word. I'm starting out by finding out how many of each word there are. Here's my function:

void averageOccurrence(words array[], int array_length)
{
	int n;
	char cmp_array[wordLength];
	
	for( int i= 0; i< array_length; i++)
	{	
		strcpy(cmp_array, array[i].word);
		
		for (int j=1; j<array_length; j++)
		{
			n = (strcmp(array[j].word, cmp_array));
			if(n == 0)
			array[i].count++;
		}
	}
	
	
}

It just gives me a large count like 150077, or 150079.

What's the initial value of the count member (it must be zero)?

>>trying to determine the average occurence of each words

You don't care what each word is to do this. You only need to know how many unique words there are (that would be count in my last post) and how many words there were in the file. The number of words in the file could be calculated as a running total as you read through the file, as indicated in my last post, or it can be calculated by looping through the array of unique words and adding up the count of each in a running total. For example, if there are three unique words with frequencies of 3, 6 and 9 respectively, then the average number of occurrences of unique words would be 6. You can decide which approach you wish to take. However, the code you posted in post #22 above doesn't have a chance of coming up with the correct answer.
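
The second approach (adding up the stored counts) could be sketched like this, assuming each struct already holds the correct count for its word. The function signature is made up for the example:

```cpp
const int wordLength = 21;

struct words
{
    char word[wordLength];
    int count;
};

// Average occurrences per unique word: total words / unique words.
// Returns 0 for an empty array to avoid dividing by zero.
double averageOccurrence(const words array[], int unique)
{
    int total = 0;
    for (int i = 0; i < unique; i++)
        total += array[i].count;
    return unique > 0 ? static_cast<double>(total) / unique : 0.0;
}
```

With counts of 3, 6 and 9 the total is 18, and 18 / 3 gives the 6 from the example above.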

Thanks a lot, that helps. I'm on my last stat: I have to find the most commonly occurring word(s). Here's my function:

void commonWord(double count[], int array_length, words array[])
{
	int commonCount[Num];
	int max;
	max = count[0];
	int j = 0;
	int i = 0;
	int k = 0;
	
	for(i = 1; i<array_length; i++)
	{
		if(count[i] > max)
			max = count[i];
		else if( count[i] == max)
			{
				commonCount[j] = i;
				j++;
			}	
	}
	
	cout<< "The most commonly occuring words are: "<< endl;
	
	for( k = 0; k<array_length; k++)
		cout<< array[commonCount[k]].word<< endl;
}

I don't get a compile-time error. But when I run the program, I get a message telling me the .exe file stopped working.

I'm now trying to write my stats to a text file, but it's only writing two of the stats, and those stats come from the same function.

Here's the code:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <iomanip>
using namespace std;

int const wordLength = 21;
int const Num = 100;
int const fileSize = 255;

struct words
{
	char word[wordLength];
	int count;	
};


	
void storeFile(char[], char [], words []);
void displayFile(char[], char[], words[]);
int wordSearch(char word[], int array_size, words []);
void sortSetup(char[], words [], int);
int sort (words [], int, int);
void averageLength(char[], words[], int);
void Occurrence(char[], words[], int);
void commonWord(char[], double[], int, words []);
void averageOccurrence(char[], words[], int);

int main ()
{
	
	words array[Num];
	char fileName[fileSize];
	char out_file_name[fileSize]; 
	
	cout << "Please enter the name of the file you wish to open: "<< endl;
	cin.getline(fileName,fileSize);
	
	if (!cin.good() ) { 
		cout << "Error reading cin..." << endl ; 
		return -1 ; 
	} 
	
	cout<<"Please enter the name of the  file you wish to send the data too" << endl;
	cin.getline(out_file_name,fileSize);
	
	displayFile(out_file_name, fileName, array);
	storeFile(out_file_name, fileName, array);
	
	cin.ignore();
}

void storeFile (char out_file_name[], char fileName[], words array[] )
{	
	ofstream outFile;
	outFile.open(out_file_name);
	
	int i = 0;
	ifstream inFile;
	char line [Num];
	int array_size = 0 ;  
	inFile.open(fileName);

	while (inFile.getline(line,Num))
	{   
		array_size = wordSearch(line, array_size, array);

		i++;    
	}   
	
	inFile.close(); 

	outFile<< "The number of unique words are: "<< array_size << endl;
	outFile<< "total number of words are: " << i << endl;
	outFile<< endl;

}

 int wordSearch( char line[], int array_size, words array[])
{
	int i = array_size ;  
	
	
	while (i && array_size > 0)   
	{
		if (strcmp(line, array[i-1].word) == 0)  
		{ 
			array[i-1].count++;   
			return  array_size ;   
		}
		i-- ;  
	}
    
	strcpy(array[array_size].word, line) ; 
	array[array_size].count = 1 ; 
	return array_size+1;   
}

void displayFile (char out_file_name[], char fileName[], words array[] )
{
	int i = 0;
	char line [Num];
	
	ifstream inFile;
	inFile.open(fileName);
	
	
	while (inFile.getline(line,Num))
	{   
	
		strncpy(array[i].word, line, wordLength);
		i++;
	}   
	 
    sortSetup(out_file_name, array, i);
    averageLength(out_file_name, array, i);
	Occurrence(out_file_name, array, i);
	averageOccurrence(out_file_name, array, i);
	
	inFile.close();
}

void sortSetup (char out_file_name[], words array [], int array_length)
{
	ofstream outFile;
	outFile.open(out_file_name);
		
	char temp[wordLength];
    int position;
    int i = array_length;
	
    for (int loop = 0; loop < array_length - 1; loop++)
    {
        position = sort (array, loop, array_length - 1);
        if (position != loop)
        {
            strcpy(temp, array[position].word);
            strcpy(array[position].word, array[loop].word);
            strcpy(array[loop].word, temp);
        }
    }
	
	outFile << "The words in alphabetical order are:"<< endl;

	for (int j = 0; j< array_length; j++)
	{
		if(strcmp(array[j].word,array[j-1].word)!=0)
			outFile << array[j].word << endl;
	}	
	
	outFile << endl;
}


int sort (words array[], int start, int stop)
{
    int n;
	int loc = start;
    for (int pos = start + 1; pos <= stop; pos++)
    {    
		n = (strcmp(array[pos].word, array[loc].word));
		
		if (n < 0)
            loc = pos;
    }
	return loc;
}

void averageLength(char out_file_name[], words array[], int i)
{
	ofstream outFile;
	outFile.open(out_file_name);
	
	double average = 0;
	for(int j = 0; j<i; j++)
		average = average + strlen(array[j].word);
	
	average = average/i;
	
	outFile << "The average length of the words are: " << average <<endl;
	outFile << endl;
}	

void Occurrence(char out_file_name[], words array[], int array_length)
{
	ofstream outFile;
	outFile.open(out_file_name);
	
	int n;
	char cmp_array[wordLength];
	double count[Num];
	
	for( int i= 0; i< array_length; i++)
	{	
		strcpy(cmp_array, array[i].word);
		count[i] = 0;
		for (int j=0; j<array_length; j++)
		{
			n = (strcmp(array[j].word, cmp_array));
			if(n == 0)
			count[i]++;
		}
	}
	
	outFile<<"The unique words and the number of times they appear in the text file appears asthe following:"<< endl; 
	outFile<<"word/times it appears:" << endl;
	outFile<< endl;
	for (int k = 0; k< array_length; k++)
	{
		if(strcmp(array[k].word,array[k-1].word)!=0)
			outFile <<array[k].word << " / " << count[k] <<  endl;
	}
	
	outFile<<endl;
	commonWord(out_file_name, count, array_length, array);
}


void commonWord(char out_file_name[], double count[], int array_length, words array[])
{
	ofstream outFile;
	outFile.open(out_file_name);
	
	int count_max;
	count_max = count[0];
	int j = 0;
	int i = 0;
	
	
	for(i = 1; i<array_length; i++)
	{
		if(count[i] > count_max)
			count_max = count[i];
	}

	outFile<< "The word(s) that occur the most are: "<< endl;
	
	for( j = 0;  j<array_length; j++)
	{
		if(strcmp(array[j].word,array[j-1].word)!=0)
		{
		  if(count[j] == count_max)
		
			outFile << array[j].word<< endl;
		}	
	}		

	outFile<< endl;
}		

void averageOccurrence(char out_file_name[], words array[], int array_length)
{
	ofstream outFile;
	outFile.open(out_file_name);
	
	int n;
	char cmp_array[wordLength];
	double count[Num];
	
	for( int i= 0; i< array_length; i++)
	{	
		strcpy(cmp_array, array[i].word);
		count[i] = 0;
		for (int j=0; j<array_length; j++)
		{
			n = (strcmp(array[j].word, cmp_array));
			if(n == 0)
			count[i]++;
		}
	}
	
	outFile<<"The average occurence of a word appears as the following:" << endl; outFile <<"word/average appearence:" << endl;
	outFile<< endl;
	for (int k = 0; k< array_length; k++)
	{
		if(strcmp(array[k].word,array[k-1].word)!=0)
			outFile <<array[k].word << " / " << count[k]/array_length <<  endl;
	}
	
	outFile<<endl;
}

The "storeFile" function is the only one whose output ends up in the file.
BTW I know this code is unorganized and not the best way to do it, but this project is due tomorrow (12-10)
