Determining the number of unique words in a .txt file
Join Date: Jul 2005
Posts: 1,671
Reputation:
Solved Threads: 261
Change this:

    while (inFile.getline(line, Num)) {
        strncpy(array[i].word, line, wordLength);
        count++;
        i++;
    }

To this:

    while (inFile.getline(line, Num)) {
        // Argument order matches the wordSearch definition below.
        if (wordSearch(line, count, array))
            cout << "duplicate found" << endl;
        else {
            strcpy(array[count].word, line);  // copy into the word member, not the struct
            array[count].count = 1;           // first occurrence of this word
            count++;
        }
    }

Change wordSearch to this:

    bool wordSearch(char line[], int count, words array[]) {
        bool found = false;
        int i = count;
        // Walk backwards through the unique words stored so far.
        while (i) {
            if (strcmp(line, array[i-1].word) == 0) {
                array[i-1].count++;
                found = true;
                break;
            }
            i--;
        }
        return found;
    }

Eliminate wordSearchSetup() completely.

count should be the number of unique words found. If it reaches 100 before the file has been read completely, you will have to output the full error message. If you want to keep track of the total number of words in the file in addition to the number of unique words, you can do that too. Once you have finished reading the file, you can display array with each unique word and the number of times it was found.
Klatu Barada Nikto
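The advice above can be pulled together into one helper. This is only a sketch under assumptions: `addWord` is a made-up name, the `words` struct and the limits of 21 and 100 are taken from the code posted later in this thread, and returning -1 is just one possible way to signal that the array is full.

```cpp
#include <cassert>
#include <cstring>

const int wordLength = 21;  // longest stored word plus terminator, as in the thread
const int Num = 100;        // maximum number of unique words, as in the thread

struct words {
    char word[wordLength];
    int count;
};

// Hypothetical helper: records one word in the array of unique words.
// Returns the new number of unique words, or -1 if the word is new
// but the array already holds Num entries.
int addWord(const char line[], int count, words array[]) {
    assert(count >= 0 && count <= Num);

    // Truncate first, so that an over-long word compares equal to the
    // truncated copy stored on an earlier call.
    char key[wordLength];
    std::strncpy(key, line, wordLength - 1);
    key[wordLength - 1] = '\0';

    for (int i = 0; i < count; ++i) {
        if (std::strcmp(key, array[i].word) == 0) {
            ++array[i].count;   // duplicate: bump its frequency
            return count;       // number of unique words is unchanged
        }
    }

    if (count == Num)
        return -1;              // new word, but no room left

    std::strcpy(array[count].word, key);
    array[count].count = 1;     // first occurrence
    return count + 1;
}
```

The reading loop would then call `addWord(line, count, array)` once per line, keep a separate running total of all words if that statistic is wanted, and print the error message when -1 comes back.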
Join Date: Nov 2008
Posts: 13
Reputation:
Solved Threads: 0
OK, thanks. Now I'm trying to determine the average occurrence of each word. I'm starting out by finding out how many of each word there is. Here's my function:

    void averageOccurrence(words array[], int array_length) {
        int n;
        char cmp_array[wordLength];
        for (int i = 0; i < array_length; i++) {
            strcpy(cmp_array, array[i].word);
            for (int j = 1; j < array_length; j++) {
                n = (strcmp(array[j].word, cmp_array));
                if (n == 0)
                    array[i].count++;
            }
        }
    }

It just gives me a large count like 150077 or 150079.
Join Date: Jul 2005
Posts: 1,671
Reputation:
Solved Threads: 261
>>trying to determine the average occurence of each words
You don't need to care what each word is to do this. You only need to know how many unique words there are (that would be count in my last post) and how many words there were in the file. The total number of words can be kept as a running total as you read through the file, as indicated in my last post, or it can be calculated afterwards by looping through the array of unique words and summing their counts. For example, if there are three unique words with frequencies of 3, 6 and 9 respectively, then the file holds 18 words in total and the average number of occurrences per unique word is 18 / 3 = 6. You can decide which approach you wish to take. However, the code you posted in post #22 above doesn't have a chance of coming up with the correct answer.
Klatu Barada Nikto
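The second approach described in this reply (summing the counts held in the array of unique words) could look roughly like the sketch below. It assumes every `count` member was set when the word was first stored; the very large numbers reported earlier would be consistent with `count` being incremented without ever being initialized. `averagePerUniqueWord` is a made-up name.

```cpp
#include <cassert>

struct words {
    char word[21];
    int count;
};

// Hypothetical helper: average number of occurrences per unique word,
// i.e. (total words in the file) / (number of unique words).
double averagePerUniqueWord(const words array[], int uniqueCount) {
    assert(uniqueCount >= 0);
    if (uniqueCount == 0)
        return 0.0;                 // empty file: avoid dividing by zero

    long total = 0;
    for (int i = 0; i < uniqueCount; ++i)
        total += array[i].count;    // running total of all words

    return static_cast<double>(total) / uniqueCount;
}
```

With the three frequencies from the example (3, 6 and 9), the total is 18 and the function returns 6.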
Join Date: Nov 2008
Posts: 13
Reputation:
Solved Threads: 0
Thanks a lot, that helps a lot. I'm on my last stat: I have to find the most commonly occurring word(s). Here's my function:

    void commonWord(double count[], int array_length, words array[]) {
        int commonCount[Num];
        int max;
        max = count[0];
        int j = 0;
        int i = 0;
        int k = 0;
        for (i = 1; i < array_length; i++) {
            if (count[i] > max)
                max = count[i];
            else if (count[i] == max) {
                commonCount[j] = i;
                j++;
            }
        }
        cout << "The most commonly occuring words are: " << endl;
        for (k = 0; k < array_length; k++)
            cout << array[commonCount[k]].word << endl;
    }

I don't get a compile-time error, but when I run the program, I get a message telling me the .exe file stopped working.
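One likely cause of the crash is the final loop: it runs `k` up to `array_length`, but only the first `j` entries of `commonCount` were ever filled, so the remaining entries are uninitialized and get used as array indices. The list of ties is also not reset when a larger maximum turns up, and index 0 is never added to it. A two-pass version sidesteps all three problems. The sketch below assumes the frequencies live in `array[i].count` as suggested earlier in the thread; `mostCommon` and the use of `std::vector` are choices made for this illustration, not part of the posted program.

```cpp
#include <cassert>
#include <vector>

struct words {
    char word[21];
    int count;
};

// Hypothetical helper: returns the indices of every word whose count
// equals the largest count in the array.
std::vector<int> mostCommon(const words array[], int uniqueCount) {
    assert(uniqueCount >= 0);
    std::vector<int> result;
    if (uniqueCount == 0)
        return result;

    // Pass 1: find the largest count.
    int maxCount = array[0].count;
    for (int i = 1; i < uniqueCount; ++i)
        if (array[i].count > maxCount)
            maxCount = array[i].count;

    // Pass 2: collect every index that reaches it (ties included).
    for (int i = 0; i < uniqueCount; ++i)
        if (array[i].count == maxCount)
            result.push_back(i);

    return result;
}
```

The caller can then loop over the returned indices and print `array[index].word` for each one, so the print loop never runs past the entries that were actually collected.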
Join Date: Nov 2008
Posts: 13
Reputation:
Solved Threads: 0
I'm now trying to write my stats to a text file, but it's only writing two of the stats, and both of those come from the same function.
Here's the code:
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    using namespace std;

    int const wordLength = 21;
    int const Num = 100;
    int const fileSize = 255;

    struct words {
        char word[wordLength];
        int count;
    };

    void storeFile(char[], char[], words[]);
    void displayFile(char[], char[], words[]);
    int wordSearch(char word[], int array_size, words[]);
    void sortSetup(char[], words[], int);
    int sort(words[], int, int);
    void averageLength(char[], words[], int);
    void Occurrence(char[], words[], int);
    void commonWord(char[], double[], int, words[]);
    void averageOccurrence(char[], words[], int);

    int main() {
        words array[Num];
        char fileName[fileSize];
        char out_file_name[fileSize];
        cout << "Please enter the name of the file you wish to open: " << endl;
        cin.getline(fileName, fileSize);
        if (!cin.good()) {
            cout << "Error reading cin..." << endl;
            return -1;
        }
        cout << "Please enter the name of the file you wish to send the data too" << endl;
        cin.getline(out_file_name, fileSize);
        displayFile(out_file_name, fileName, array);
        storeFile(out_file_name, fileName, array);
        cin.ignore();
    }

    void storeFile(char out_file_name[], char fileName[], words array[]) {
        ofstream outFile;
        outFile.open(out_file_name);
        int i = 0;
        ifstream inFile;
        char line[Num];
        int array_size = 0;
        inFile.open(fileName);
        while (inFile.getline(line, Num)) {
            array_size = wordSearch(line, array_size, array);
            i++;
        }
        inFile.close();
        outFile << "The number of unique words are: " << array_size << endl;
        outFile << "total number of words are: " << i << endl;
        outFile << endl;
    }

    int wordSearch(char line[], int array_size, words array[]) {
        int i = array_size;
        while (i && array_size > 0) {
            if (strcmp(line, array[i-1].word) == 0) {
                array[i-1].count++;
                return array_size;
            }
            i--;
        }
        strcpy(array[array_size].word, line);
        array[array_size].count = 1;
        return array_size+1;
    }

    void displayFile(char out_file_name[], char fileName[], words array[]) {
        int i = 0;
        char line[Num];
        ifstream inFile;
        inFile.open(fileName);
        while (inFile.getline(line, Num)) {
            strncpy(array[i].word, line, wordLength);
            i++;
        }
        sortSetup(out_file_name, array, i);
        averageLength(out_file_name, array, i);
        Occurrence(out_file_name, array, i);
        averageOccurrence(out_file_name, array, i);
        inFile.close();
    }

    void sortSetup(char out_file_name[], words array[], int array_length) {
        ofstream outFile;
        outFile.open(out_file_name);
        char temp[wordLength];
        int position;
        int i = array_length;
        for (int loop = 0; loop < array_length - 1; loop++) {
            position = sort(array, loop, array_length - 1);
            if (position != loop) {
                strcpy(temp, array[position].word);
                strcpy(array[position].word, array[loop].word);
                strcpy(array[loop].word, temp);
            }
        }
        outFile << "The words in alphabetical order are:" << endl;
        for (int j = 0; j < array_length; j++) {
            if (strcmp(array[j].word, array[j-1].word) != 0)
                outFile << array[j].word << endl;
        }
        outFile << endl;
    }

    int sort(words array[], int start, int stop) {
        int n;
        int loc = start;
        for (int pos = start + 1; pos <= stop; pos++) {
            n = (strcmp(array[pos].word, array[loc].word));
            if (n < 0)
                loc = pos;
        }
        return loc;
    }

    void averageLength(char out_file_name[], words array[], int i) {
        ofstream outFile;
        outFile.open(out_file_name);
        double average = 0;
        for (int j = 0; j < i; j++)
            average = average + strlen(array[j].word);
        average = average/i;
        outFile << "The average length of the words are: " << average << endl;
        outFile << endl;
    }

    void Occurrence(char out_file_name[], words array[], int array_length) {
        ofstream outFile;
        outFile.open(out_file_name);
        int n;
        char cmp_array[wordLength];
        double count[Num];
        for (int i = 0; i < array_length; i++) {
            strcpy(cmp_array, array[i].word);
            count[i] = 0;
            for (int j = 0; j < array_length; j++) {
                n = (strcmp(array[j].word, cmp_array));
                if (n == 0)
                    count[i]++;
            }
        }
        outFile << "The unique words and the number of times they appear in the text file appears asthe following:" << endl;
        outFile << "word/times it appears:" << endl;
        outFile << endl;
        for (int k = 0; k < array_length; k++) {
            if (strcmp(array[k].word, array[k-1].word) != 0)
                outFile << array[k].word << " / " << count[k] << endl;
        }
        outFile << endl;
        commonWord(out_file_name, count, array_length, array);
    }

    void commonWord(char out_file_name[], double count[], int array_length, words array[]) {
        ofstream outFile;
        outFile.open(out_file_name);
        int count_max;
        count_max = count[0];
        int j = 0;
        int i = 0;
        for (i = 1; i < array_length; i++) {
            if (count[i] > count_max)
                count_max = count[i];
        }
        outFile << "The word(s) that occur the most are: " << endl;
        for (j = 0; j < array_length; j++) {
            if (strcmp(array[j].word, array[j-1].word) != 0) {
                if (count[j] == count_max)
                    outFile << array[j].word << endl;
            }
        }
        outFile << endl;
    }

    void averageOccurrence(char out_file_name[], words array[], int array_length) {
        ofstream outFile;
        outFile.open(out_file_name);
        int n;
        char cmp_array[wordLength];
        double count[Num];
        for (int i = 0; i < array_length; i++) {
            strcpy(cmp_array, array[i].word);
            count[i] = 0;
            for (int j = 0; j < array_length; j++) {
                n = (strcmp(array[j].word, cmp_array));
                if (n == 0)
                    count[i]++;
            }
        }
        outFile << "The average occurence of a word appears as the following:" << endl;
        outFile << "word/average appearence:" << endl;
        outFile << endl;
        for (int k = 0; k < array_length; k++) {
            if (strcmp(array[k].word, array[k-1].word) != 0)
                outFile << array[k].word << " / " << count[k]/array_length << endl;
        }
        outFile << endl;
    }
The "storeFile" function is the only one whose output ends up in the file.
BTW, I know this code is unorganized and not the best way to do it, but this project is due tomorrow (12-10).
Last edited by matt_570; Dec 9th, 2008 at 8:15 pm.
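A likely explanation for the missing statistics: every function in the posted program creates its own `ofstream` on the same file name, and opening an `std::ofstream` in its default mode truncates the file. Each function would then wipe out what the previous one wrote, and `storeFile`, which runs last, would be the only one whose two lines survive. One way around this is to open the file once and pass the stream by reference to every function that writes. The sketch below shows the idea with two of the statistics; the function names are made up for illustration and the real program would pass the same stream to all of its reporting functions.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Each reporting function writes to whatever stream it is given
// instead of opening the output file itself.
void writeUniqueCount(std::ostream& out, int uniqueCount) {
    out << "The number of unique words is: " << uniqueCount << '\n';
}

void writeTotalCount(std::ostream& out, int totalCount) {
    out << "The total number of words is: " << totalCount << '\n';
}

// Hypothetical driver: calls every reporting function on the same stream.
void writeAllStats(std::ostream& out, int uniqueCount, int totalCount) {
    assert(uniqueCount >= 0 && totalCount >= uniqueCount);
    writeUniqueCount(out, uniqueCount);
    writeTotalCount(out, totalCount);
}

// The file is opened (and therefore truncated) exactly once.
bool writeReportFile(const char* outFileName, int uniqueCount, int totalCount) {
    std::ofstream outFile(outFileName);
    if (!outFile)
        return false;               // could not open the output file
    writeAllStats(outFile, uniqueCount, totalCount);
    return true;
}

// The same functions also work on an in-memory stream, which is handy
// for checking the report text without touching the disk.
std::string renderReport(int uniqueCount, int totalCount) {
    std::ostringstream buffer;
    writeAllStats(buffer, uniqueCount, totalCount);
    return buffer.str();
}
```

Reopening the file in each function with `ios::app` would probably also stop the overwriting, but the program would then need to clear the file at the start of each run, so passing one stream around is usually the simpler change.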






