Dec 5th, 2008

Re: Determining the number of unique words in a .txt file

Change this:
    while (inFile.getline(line,Num))
    {
        strncpy(array[i].word, line, wordLength);
        count++;
        i++;
    }
To this:
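    while (inFile.getline(line,Num))
    {
        // arguments follow the wordSearch(line, count, array) signature below
        if (wordSearch(line, count, array))
            cout << "duplicate found" << endl;
        else
        {
            // copy into the struct's word member and start its tally at 1
            strncpy(array[count].word, line, wordLength - 1);
            array[count].word[wordLength - 1] = '\0';
            array[count].count = 1;
            count++;
        }
    }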
Change wordSearch to this:
    bool wordSearch( char line[], int count, words array[])
    {
        bool found = false;
        int i = count;
        while (i)
        {
            if (strcmp(line, array[i-1].word) == 0)
            {
                array[i-1].count++;
                found = true;
                break;
            }
            i-- ;
        }
        return found;
    }
Eliminate wordSearchSetup() completely.

count should be the number of unique words found. If it reaches 100 before completely reading the file you will have to output the full error message. If you want to keep track of the number of total words found in the file in addition to the number of unique words in the file, you can do that too. Once you have completed the file reading you can display array with each unique word and the number of times it was found.
Lerner

Dec 7th, 2008

Re: Determining the number of unique words in a .txt file

OK, thanks. Now I'm trying to determine the average occurrence of each word. I'm starting out by finding how many of each word there are. Here's my function:


    void averageOccurrence(words array[], int array_length)
    {
        int n;
        char cmp_array[wordLength];

        for( int i= 0; i< array_length; i++)
        {
            strcpy(cmp_array, array[i].word);

            for (int j=1; j<array_length; j++)
            {
                n = (strcmp(array[j].word, cmp_array));
                if(n == 0)
                    array[i].count++;
            }
        }
    }
It just gives me a large count like 150077, or 150079.
matt_570

Dec 7th, 2008

Re: Determining the number of unique words in a .txt file

What's the initial value of the count member? (It must be zero.)
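
If count is never set, it holds whatever happened to be in memory, which would be consistent with totals like 150077. A minimal sketch of zeroing the entries before counting, assuming the words struct used in this thread; clearWords is a hypothetical helper name:

```cpp
#include <cstring>

const int wordLength = 21;   // same limits as used in this thread
const int Num = 100;

struct words {
    char word[wordLength];
    int count;
};

// Hypothetical helper: gives every entry an empty word and a count of zero.
void clearWords(words array[], int length) {
    for (int i = 0; i < length; ++i) {
        array[i].word[0] = '\0';
        array[i].count = 0;
    }
}
```

Declaring the array as words array[Num] = {}; should have the same effect at the point of definition, since an empty initializer zeroes every member.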
ArkM

Dec 7th, 2008

Re: Determining the number of unique words in a .txt file

>>trying to determine the average occurrence of each word

You don't need to care what each word is to do this. You only need to know how many unique words there are (that would be count in my last post) and how many words there were in the file. The number of words in the file could be calculated as a running total as you read through the file, as indicated in my last post, or it can be calculated by looping through the array of unique words and adding up the count of each in a running total. For example, if there are three unique words with frequencies of 3, 6 and 9 respectively, then the average number of occurrences of unique words would be 6. You can decide which approach you wish to take. However, the code you have posted in post #22 above doesn't have a chance of coming up with the correct answer.
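
The second approach (summing the counts stored in the array of unique words) might look like the sketch below. It assumes the words struct from this thread with count already filled in for each unique word; meanOccurrence is a hypothetical name:

```cpp
const int wordLength = 21;   // same limit as used in this thread

struct words {
    char word[wordLength];
    int count;
};

// Hypothetical helper: total number of words divided by number of unique words.
// unique is the number of filled entries in array.
double meanOccurrence(const words array[], int unique) {
    if (unique == 0)
        return 0.0;                       // avoid dividing by zero on an empty file
    int total = 0;
    for (int i = 0; i < unique; ++i)
        total += array[i].count;          // running total of all words in the file
    return static_cast<double>(total) / unique;
}
```

With the three frequencies from the example (3, 6 and 9), the total is 18 and the result is 6.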
Lerner

Dec 7th, 2008

Re: Determining the number of unique words in a .txt file

Thanks, that helps a lot. I'm on my last stat: I have to find the most commonly occurring word(s). Here's my function:

    void commonWord(double count[], int array_length, words array[])
    {
        int commonCount[Num];
        int max;
        max = count[0];
        int j = 0;
        int i = 0;
        int k = 0;

        for(i = 1; i<array_length; i++)
        {
            if(count[i] > max)
                max = count[i];
            else if( count[i] == max)
            {
                commonCount[j] = i;
                j++;
            }
        }

        cout<< "The most commonly occuring words are: "<< endl;

        for( k = 0; k<array_length; k++)
            cout<< array[commonCount[k]].word<< endl;
    }
I don't get a compile-time error, but when I run the program I get a message telling me the .exe file stopped working.
matt_570

Dec 9th, 2008

Re: Determining the number of unique words in a .txt file

I'm now trying to write my stats to a text file, but it's only writing two of the stats, and those stats come from the same function.

Here's the code:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    using namespace std;

    int const wordLength = 21;
    int const Num = 100;
    int const fileSize = 255;

    struct words
    {
        char word[wordLength];
        int count;
    };

    void storeFile(char[], char [], words []);
    void displayFile(char[], char[], words[]);
    int wordSearch(char word[], int array_size, words []);
    void sortSetup(char[], words [], int);
    int sort (words [], int, int);
    void averageLength(char[], words[], int);
    void Occurrence(char[], words[], int);
    void commonWord(char[], double[], int, words []);
    void averageOccurrence(char[], words[], int);

    int main ()
    {
        words array[Num];
        char fileName[fileSize];
        char out_file_name[fileSize];

        cout << "Please enter the name of the file you wish to open: "<< endl;
        cin.getline(fileName,fileSize);

        if (!cin.good() ) {
            cout << "Error reading cin..." << endl ;
            return -1 ;
        }

        cout<<"Please enter the name of the file you wish to send the data too" << endl;
        cin.getline(out_file_name,fileSize);

        displayFile(out_file_name, fileName, array);
        storeFile(out_file_name, fileName, array);

        cin.ignore();
    }

    void storeFile (char out_file_name[], char fileName[], words array[] )
    {
        ofstream outFile;
        outFile.open(out_file_name);

        int i = 0;
        ifstream inFile;
        char line [Num];
        int array_size = 0 ;
        inFile.open(fileName);

        while (inFile.getline(line,Num))
        {
            array_size = wordSearch(line, array_size, array);

            i++;
        }

        inFile.close();

        outFile<< "The number of unique words are: "<< array_size << endl;
        outFile<< "total number of words are: " << i << endl;
        outFile<< endl;
    }

    int wordSearch( char line[], int array_size, words array[])
    {
        int i = array_size ;

        while (i && array_size > 0)
        {
            if (strcmp(line, array[i-1].word) == 0)
            {
                array[i-1].count++;
                return array_size ;
            }
            i-- ;
        }

        strcpy(array[array_size].word, line) ;
        array[array_size].count = 1 ;
        return array_size+1;
    }

    void displayFile (char out_file_name[], char fileName[], words array[] )
    {
        int i = 0;
        char line [Num];

        ifstream inFile;
        inFile.open(fileName);

        while (inFile.getline(line,Num))
        {
            strncpy(array[i].word, line, wordLength);
            i++;
        }

        sortSetup(out_file_name, array, i);
        averageLength(out_file_name, array, i);
        Occurrence(out_file_name, array, i);
        averageOccurrence(out_file_name, array, i);

        inFile.close();
    }

    void sortSetup (char out_file_name[], words array [], int array_length)
    {
        ofstream outFile;
        outFile.open(out_file_name);

        char temp[wordLength];
        int position;
        int i = array_length;

        for (int loop = 0; loop < array_length - 1; loop++)
        {
            position = sort (array, loop, array_length - 1);
            if (position != loop)
            {
                strcpy(temp, array[position].word);
                strcpy(array[position].word, array[loop].word);
                strcpy(array[loop].word, temp);
            }
        }

        outFile << "The words in alphabetical order are:"<< endl;

        for (int j = 0; j< array_length; j++)
        {
            if(strcmp(array[j].word,array[j-1].word)!=0)
                outFile << array[j].word << endl;
        }

        outFile << endl;
    }

    int sort (words array[], int start, int stop)
    {
        int n;
        int loc = start;
        for (int pos = start + 1; pos <= stop; pos++)
        {
            n = (strcmp(array[pos].word, array[loc].word));

            if (n < 0)
                loc = pos;
        }
        return loc;
    }

    void averageLength(char out_file_name[], words array[], int i)
    {
        ofstream outFile;
        outFile.open(out_file_name);

        double average = 0;
        for(int j = 0; j<i; j++)
            average = average + strlen(array[j].word);

        average = average/i;

        outFile << "The average length of the words are: " << average <<endl;
        outFile << endl;
    }

    void Occurrence(char out_file_name[], words array[], int array_length)
    {
        ofstream outFile;
        outFile.open(out_file_name);

        int n;
        char cmp_array[wordLength];
        double count[Num];

        for( int i= 0; i< array_length; i++)
        {
            strcpy(cmp_array, array[i].word);
            count[i] = 0;
            for (int j=0; j<array_length; j++)
            {
                n = (strcmp(array[j].word, cmp_array));
                if(n == 0)
                    count[i]++;
            }
        }

        outFile<<"The unique words and the number of times they appear in the text file appears asthe following:"<< endl;
        outFile<<"word/times it appears:" << endl;
        outFile<< endl;
        for (int k = 0; k< array_length; k++)
        {
            if(strcmp(array[k].word,array[k-1].word)!=0)
                outFile <<array[k].word << " / " << count[k] << endl;
        }

        outFile<<endl;
        commonWord(out_file_name, count, array_length, array);
    }

    void commonWord(char out_file_name[], double count[], int array_length, words array[])
    {
        ofstream outFile;
        outFile.open(out_file_name);

        int count_max;
        count_max = count[0];
        int j = 0;
        int i = 0;

        for(i = 1; i<array_length; i++)
        {
            if(count[i] > count_max)
                count_max = count[i];
        }

        outFile<< "The word(s) that occur the most are: "<< endl;

        for( j = 0; j<array_length; j++)
        {
            if(strcmp(array[j].word,array[j-1].word)!=0)
            {
                if(count[j] == count_max)
                    outFile << array[j].word<< endl;
            }
        }

        outFile<< endl;
    }

    void averageOccurrence(char out_file_name[], words array[], int array_length)
    {
        ofstream outFile;
        outFile.open(out_file_name);

        int n;
        char cmp_array[wordLength];
        double count[Num];

        for( int i= 0; i< array_length; i++)
        {
            strcpy(cmp_array, array[i].word);
            count[i] = 0;
            for (int j=0; j<array_length; j++)
            {
                n = (strcmp(array[j].word, cmp_array));
                if(n == 0)
                    count[i]++;
            }
        }

        outFile<<"The average occurence of a word appears as the following:" << endl;
        outFile <<"word/average appearence:" << endl;
        outFile<< endl;
        for (int k = 0; k< array_length; k++)
        {
            if(strcmp(array[k].word,array[k-1].word)!=0)
                outFile <<array[k].word << " / " << count[k]/array_length << endl;
        }

        outFile<<endl;
    }

The "storeFile" function is the only one that prints to the file.
BTW, I know this code is unorganized and not the best way to do it, but this project is due tomorrow (12-10).
Last edited by matt_570; Dec 9th, 2008 at 8:15 pm.
matt_570
