Dec 2nd, 2008

Determining the number of unique words in a .txt file

Hey, I have to write a program that reads a text file containing a list of words, one word per line. I have to store the unique words and count the occurrences of each one. Once the file has been read completely, I have to print the words and their occurrence counts to a text file. The output should list the words in alphabetical order along with the number of times each occurs, followed by some statistics.

I have to use character arrays instead of strings.
I must use a linear search (something that looks like this):

    // Returns 1 if key occurs in array[0..number-1], otherwise 0.
    int search(int array[], int number, int key)
    {
        int pos = 0;

        while (pos < number && array[pos] != key)
            pos++;

        if (pos == number)
            pos = 0;  // reached the end: key not found
        else
            pos = 1;  // stopped early: key found

        return pos;
    }

to determine whether a word is in the array. The array is an array of structures and the key is a char array, so a string comparison must be used.
The search must be a separate function that returns an integer value. I can't use a for loop, and the function must have only one return statement.
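
The constraints above (an array of structures, a char-array key, a string comparison, a while loop, and a single return) could be combined roughly as follows. This is only a sketch: the names WordEntry and findWord are made up for illustration, and returning the index (or -1 when the word is absent) is an assumption rather than something the assignment states.

```cpp
#include <cstring>  // std::strcmp

// Hypothetical record type: one unique word plus its occurrence count.
struct WordEntry
{
    char word[21];  // up to 20 letters plus the terminating '\0'
    int count;
};

// Linear search with a while loop and a single return statement.
// Returns the index of key in table[0..number-1], or -1 if it is absent
// (the -1 convention is an assumption; the version in the post returns 0 or 1).
int findWord(const WordEntry table[], int number, const char key[])
{
    int pos = 0;

    // strcmp() returns 0 when the two C strings are equal
    while (pos < number && std::strcmp(table[pos].word, key) != 0)
        pos++;

    if (pos == number)
        pos = -1;  // ran off the end: not found

    return pos;
}
```

Returning the index instead of a plain 0 or 1 lets the caller increment table[pos].count directly when the word is found.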
I tried to start off by reading the file and storing the words:

    #include <iostream>
    #include <fstream>
    #include <iomanip>  // setw
    using namespace std;

    const int wordLength = 21;  // 20 letters plus the terminating '\0'
    const int Num = 101;
    const int fileSize = 255;

    // A named struct type for the word counts. The original declared an
    // unnamed struct variable, which cannot be used as an array element type.
    struct wordCount
    {
        char word[wordLength];
        int count;
    };

    void displayFile(const char fileName[]);

    int main()  // standard C++ requires main to return int
    {
        char filename[fileSize];

        cout << "Please enter the name of the file you wish to open: " << endl;
        cin >> setw(fileSize) >> filename;  // setw keeps the read inside the buffer

        displayFile(filename);
        return 0;
    }

    void displayFile(const char fileName[])
    {
        ifstream infile;
        char array[Num][wordLength];  // one row per line read
        int i = 0;                    // was uninitialized

        infile.open(fileName);

        // getline() fails at end of file, so peek() is not needed; reading
        // straight into array[i] avoids assigning one array to another.
        while (i < Num && infile.getline(array[i], wordLength))
            i++;

        for (int j = 0; j < i; j++)  // print only the rows that were filled
            cout << array[j] << endl;
        infile.close();
    }

Any tips would be appreciated, and thanks in advance.
Posted by matt_570

Dec 2nd, 2008

I would first create a structure:

    struct words
    {
        std::string word;
        int count;
    };

Next, a vector (array) of those structures:

    std::vector<words> wordArray;

Now, when the program reads a line, check wordArray to see whether the word is already in the array. If it is, increment its count. If not, add a new element to the vector. When done, you will have the information you need to display on the screen.

Another method is to use a std::map, but the implementation is left for someone else.
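
The vector approach described above might look something like this. It is a sketch under a few assumptions: the input holds one word per line, blank lines are ignored, and the helper names countWords and countWordsInText are invented here for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct words
{
    std::string word;
    int count;
};

// Reads one word per line from in and returns the unique words with counts.
// countWords is a hypothetical helper name, not something from the post.
std::vector<words> countWords(std::istream& in)
{
    std::vector<words> wordArray;
    std::string line;

    while (std::getline(in, line))
    {
        if (line.empty())
            continue;  // skip blank lines

        // linear search for the word among those already stored
        std::size_t i = 0;
        while (i < wordArray.size() && wordArray[i].word != line)
            ++i;

        if (i < wordArray.size())
            ++wordArray[i].count;  // already present: bump its count
        else
        {
            words entry;
            entry.word = line;
            entry.count = 1;
            wordArray.push_back(entry);  // first occurrence
        }
    }
    return wordArray;
}

// Convenience wrapper for trying the function on an in-memory string.
std::vector<words> countWordsInText(const std::string& text)
{
    std::istringstream in(text);
    return countWords(in);
}
```

Passing an open std::ifstream to countWords works the same way, since an ifstream is an istream. The words come back in order of first appearance, so they would still need sorting before being printed alphabetically.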
Posted by Ancient Dragon

Dec 3rd, 2008

I think you should study the code below carefully and make sure you understand the map data structure and its complexity.

If anything is unclear, post a reply and I will explain in more detail.

    #include <fstream>
    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        ifstream inFile("C:\\Laiq.txt");

        string line;
        map<string, int> mapWordCount;  // word -> number of occurrences
        string word;
        string::size_type endWord;
        string::size_type startWord;
        while (getline(inFile, line)) {
            startWord = 0;
            // The sentinel space makes find_first_of() also stop after the last
            // word of the line, which the original loop never counted.
            line += ' ';
            while ((endWord = line.find_first_of(" ", startWord)) != line.npos)
            {
                word = line.substr(startWord, endWord - startWord);
                if (!word.empty())  // ignore runs of spaces
                    mapWordCount[word]++;
                startWord = endWord + 1;
            }
        }
        inFile.close();

        // A map iterates in key order, so the words print alphabetically.
        for (map<string, int>::const_iterator iter = mapWordCount.begin();
             iter != mapWordCount.end(); ++iter) {
            cout << "The Frequency of word: " << iter->first << " is : " << iter->second << endl;
        }
        return 0;
    }
Posted by Laiq Ahmed (last edited Dec 3rd, 2008 at 5:06 am; reason: changing code indentation)

Dec 3rd, 2008

You have two different problems:
1. Tokenize the input stream (extract words from the stream).
2. Build a word dictionary.
The second one has a very simple solution: use the std::map data structure, as Laiq Ahmed mentioned above. However, you need some more code to process every word with the map-based word dictionary:

    if the next word is found in the map then
        do nothing or increment this word's counter
    else
        insert the new word in the map
    endif

There are lots of ways to solve the first problem. For example:

    open file stream
    create an empty map
    loop // until eof
        skip non-letters
        clear word buffer // use std::string::clear()
        append letters to the word buffer
        process the word with the map // see #2
    endloop
    process a possible last word
    traverse map (file word dictionary)

Summary:
- use std::ifstream
- use std::string for the word buffer
- use std::map<std::string,int> for the dictionary with counters
You have a good chance of writing simple, clear code after a proper functional decomposition of the pseudocode snippets ...
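
One possible reading of the two pseudocode snippets above, merged into a single function, is sketched below. The names buildDictionary and buildDictionaryFromText are invented for illustration, and a "word" is assumed to be any unbroken run of letters.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Builds the word dictionary from a stream: non-letters act as delimiters,
// letters are collected in a word buffer, and each completed word is counted.
// buildDictionary is a hypothetical name chosen for this sketch.
std::map<std::string, int> buildDictionary(std::istream& in)
{
    std::map<std::string, int> dictionary;
    std::string word;  // word buffer
    char ch;

    while (in.get(ch))
    {
        if (std::isalpha(static_cast<unsigned char>(ch)))
            word += ch;          // append letter to the word buffer
        else if (!word.empty())
        {
            ++dictionary[word];  // operator[] inserts a 0 count if the word is new
            word.clear();
        }
    }
    if (!word.empty())           // process a possible last word
        ++dictionary[word];

    return dictionary;
}

// Convenience wrapper for trying the function on an in-memory string.
std::map<std::string, int> buildDictionaryFromText(const std::string& text)
{
    std::istringstream in(text);
    return buildDictionary(in);
}
```

Because operator[] creates a missing entry with a zero count, the "found / not found" branches of the first snippet collapse into a single increment. Traversing the returned map visits the words in sorted key order.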
Posted by ArkM

Dec 3rd, 2008

>>I have to use character arrays instead of strings.
I must use the linear search

That pretty much precludes use of a map. It also means you can't use STL strings within a struct, and may preclude use of a vector to hold the structs, too.

You could use a C-style string with either static or dynamic memory within the struct. The static version limits the maximum length of any string entered, but since these strings are words, it is unlikely that any of them would have more than 20 or 30 letters. Using dynamic memory to store the string within the struct minimizes the memory wasted by strings shorter than the maximum length of the static version, but requires you to manage your own memory, which is a bit of a hassle, though not an overwhelming task by any means.
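
To make the static/dynamic trade-off concrete, here is a rough sketch of both layouts. The names StaticEntry, DynamicEntry, setWord, clearWord and the MAX_WORD limit of 20 are assumptions made for illustration only.

```cpp
#include <cstring>  // std::strlen, std::strcpy, std::strncpy

const int MAX_WORD = 20;  // assumed upper limit on word length

// Static version: a fixed buffer lives inside the struct.
struct StaticEntry
{
    char word[MAX_WORD + 1];  // +1 for the terminating '\0'
    int count;
};

// Dynamic version: the struct points at a heap buffer sized to the word.
struct DynamicEntry
{
    char* word;
    int count;
};

// Copies text into a static entry, truncating anything past MAX_WORD letters.
void setWord(StaticEntry& entry, const char text[])
{
    std::strncpy(entry.word, text, MAX_WORD);
    entry.word[MAX_WORD] = '\0';  // strncpy does not terminate on truncation
    entry.count = 1;
}

// Allocates exactly enough room for text; pair every call with clearWord().
void setWord(DynamicEntry& entry, const char text[])
{
    entry.word = new char[std::strlen(text) + 1];
    std::strcpy(entry.word, text);
    entry.count = 1;
}

// Releases the buffer owned by a dynamic entry.
void clearWord(DynamicEntry& entry)
{
    delete[] entry.word;
    entry.word = 0;
}
```

The static entry needs no cleanup but silently truncates long words; the dynamic entry keeps the whole word but leaks memory if clearWord() is forgotten.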

The non-STL structures you could use to store the structs could be either an array or a list. If you don't know the maximum number of possible unique words you could encounter then using a list might be advantageous. You could make a guess as to the max number of words, but that isn't quite as predictable as the maximum size of each word. Linear searching is possible with either lists or arrays.

Sorting the structures alphabetically by string before printing them or sending them to a file could be done in any of several ways. For beginners, bubble sorts with arrays and insertion sorts with lists seem pretty popular.
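
As an illustration of the array option, a bubble sort over an array of structs could be sketched like this. WordEntry and sortWords are hypothetical names, and the ordering is whatever strcmp() gives, which places uppercase letters before lowercase ones in ASCII.

```cpp
#include <cstring>  // std::strcmp

// Hypothetical record type: one unique word plus its occurrence count.
struct WordEntry
{
    char word[21];
    int count;
};

// Sorts table[0..number-1] into ascending strcmp() order with a bubble sort.
void sortWords(WordEntry table[], int number)
{
    bool swapped = true;
    int last = number - 1;  // index of the last unsorted element

    while (swapped)
    {
        swapped = false;
        for (int i = 0; i < last; i++)
        {
            if (std::strcmp(table[i].word, table[i + 1].word) > 0)
            {
                // whole structs are swapped, so each count stays with its word
                WordEntry temp = table[i];
                table[i] = table[i + 1];
                table[i + 1] = temp;
                swapped = true;
            }
        }
        last--;  // the largest remaining word has bubbled to the end
    }
}
```

Structs can be copied with plain assignment even though the char array inside them cannot, which is why the swap needs no strcpy().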
Posted by Lerner

Dec 3rd, 2008

Sorry, I had not noticed those absurd restrictions.
The only result of this idiotic methodology: a pupil sheds tears on DaniWeb, then copies and pastes extremely inefficient code in early-1960s style (or worse)...
Posted by ArkM

Dec 3rd, 2008

The maximum length of a word is 20, so the character array needs to hold 21 characters. The maximum number of words is 100; if there are more than 100 words, I need to print a message saying something like "The stats did not use the words after the 100th word".

Anyway, I'm having a hard time even starting the program. I'm starting off by trying to read and store the words of the text file. My textbook has code for reading a text file line by line and displaying it, so I figured I should start with that. Here's the code:

    #include <iostream>
    #include <fstream>
    #include <cstdlib> // needed for exit()
    #include <string>
    using namespace std;

    int main()
    {
        string filename = "text.dat"; // put the filename up front
        string line;
        ifstream inFile;

        inFile.open(filename.c_str());

        if (inFile.fail()) // check for successful open
        {
            cout << "\nThe file was not successfully opened"
                 << "\n Please check that the file currently exists."
                 << endl;
            exit(1);
        }

        // read and display the file's contents
        while (getline(inFile, line))
            cout << line << endl;

        inFile.close();

        cin.ignore(); // this line is optional

        return 0;
    }

Here's my code split into two functions; however, it does not compile. I get an error at the while statement:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    using namespace std;

    void displayFile(char []);

    int main()
    {
        int const wordLength = 21;
        int const Num = 101;
        int const fileSize = 255;
        char filename[fileSize];

        cout << "Please enter the name of the file you wish to open: " << endl;
        cin.getline(filename, fileSize);

        displayFile(filename);
        cin.ignore();
        return 0;
    }

    void displayFile(char fileName[])
    {
        ifstream inFile;

        char line[101];

        inFile.open(fileName);

        while (getline(inFile, line))  // the compiler reports the error here
            cout << line << endl;

        inFile.close();
    }

My code is nearly identical to the textbook's, except that I use character arrays instead of strings, so I'm not sure why it's not compiling.
Also, is this a good way to start off the program, or should I try something else?
Posted by matt_570

Dec 3rd, 2008

OK, I figured out that I need to use "inFile.getline(line, length)" instead of "getline(inFile, line)". I created a struct as suggested earlier. However, I'm having trouble storing the words from the text file.

I use this function:

    void displayFile(char fileName[], words array[])
    {
        int i = 0;
        ifstream inFile;

        char line[101];

        inFile.open(fileName);

        while (inFile.getline(line, 101))
        {
            cout << line << endl;
            array[i].word = line;  // rejected by the compiler: word is a char array
            i++;
        }
        inFile.close();
    }

Here array is an array of structs, each made up of a character array "word" and an integer "count". Any help with storing the words and the number of words in the array would be appreciated.
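
One way the storing step could work is sketched below: copy each line into the struct with strcpy() (arrays cannot be assigned with =), and use a linear search so that repeated words bump a count instead of taking a new slot. Everything here is an assumption made for illustration: the names storeWords and storeWordsFromText, the overflow flag used to trigger the "100th word" message, and the premise that every line holds one word of at most 20 characters.

```cpp
#include <cstring>  // std::strcmp, std::strcpy
#include <istream>
#include <sstream>

const int WORD_LENGTH = 21;  // 20 letters plus the terminating '\0'
const int MAX_WORDS = 100;   // capacity of the array of structs

struct words
{
    char word[WORD_LENGTH];
    int count;
};

// Reads one word per line and returns the number of unique words stored.
// overflow is set to true if a new word had to be dropped because the
// array was already full. Reading stops early at a line over 20 characters.
int storeWords(std::istream& in, words array[], bool& overflow)
{
    int used = 0;
    char line[WORD_LENGTH];
    overflow = false;

    while (in.getline(line, WORD_LENGTH))
    {
        if (line[0] != '\0')  // ignore blank lines
        {
            // linear search among the words stored so far
            int pos = 0;
            while (pos < used && std::strcmp(array[pos].word, line) != 0)
                pos++;

            if (pos < used)
                array[pos].count++;  // seen before
            else if (used < MAX_WORDS)
            {
                std::strcpy(array[used].word, line);  // copy the characters
                array[used].count = 1;
                used++;
            }
            else
                overflow = true;  // no room left for another unique word
        }
    }
    return used;
}

// Convenience wrapper for trying the function on an in-memory string.
int storeWordsFromText(const char text[], words array[], bool& overflow)
{
    std::istringstream in(text);
    return storeWords(in, array, overflow);
}
```

With an open ifstream, a call such as storeWords(inFile, array, overflow) would fill the array, and the caller could print the "did not use the words after the 100th word" message whenever overflow comes back true.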
Posted by matt_570

Dec 4th, 2008

The read-a-line-into-a-char-array solution has an obvious defect: you must define the maximum line length a priori.
Look at this function, which gets words directly from any input stream (not only from an fstream):

    #include <cctype>   // isalpha()
    #include <istream>

    typedef unsigned char Uchar;  // needed for isalpha()

    /// Get or skip the next word from a stream.
    /// wbufsize is the max word length + 1 (for the null char).
    /// Pass a null word pointer to skip the word without storing it.
    bool getWord(char* word, int wbufsize, std::istream& is)
    {
        int i = 0, n = wbufsize - 1;  // letters seen, max letters stored
        char ch;                      // current char

        while (is.get(ch) && !isalpha(Uchar(ch)))
            ;  // skip delimiters
        if (is)  // have a letter
            do {
                if (word && i < n)  // have room
                    word[i] = ch;   // append letter
                ++i;                // letters counter
            } while (is.get(ch) && isalpha(Uchar(ch)));
        if (word && wbufsize > 0)        // have a buffer
            word[i < n ? i : n] = '\0';  // end of word, never past the buffer
        return i > 0;  // have a word
    }

Of course, it's possible to adapt this algorithm to extract words from a char array.
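
A char-array variant of the same idea might look like the sketch below. The name getWordFromArray and the cursor parameter are assumptions made for illustration; unlike getWord, this version expects a non-null word buffer.

```cpp
#include <cctype>  // std::isalpha

// Extracts the next word from the C string at cursor into word (capacity
// wbufsize, including the '\0') and advances cursor past that word.
// Letters that do not fit are skipped. Returns false when no word is left.
bool getWordFromArray(const char*& cursor, char* word, int wbufsize)
{
    int i = 0;             // letters seen in this word
    int n = wbufsize - 1;  // max letters that fit in the buffer

    // skip delimiters, stopping at the end of the string
    while (*cursor != '\0' && !std::isalpha(static_cast<unsigned char>(*cursor)))
        ++cursor;

    // collect letters; isalpha('\0') is false, so the loop ends at the terminator
    while (std::isalpha(static_cast<unsigned char>(*cursor)))
    {
        if (i < n)
            word[i] = *cursor;
        ++i;
        ++cursor;
    }

    if (wbufsize > 0)
        word[i < n ? i : n] = '\0';
    return i > 0;
}
```

Calling it in a loop, for example while (getWordFromArray(p, buffer, 21)) with p initially pointing at the start of a line, walks through every word in that line.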
Posted by ArkM

Dec 4th, 2008

I tried a different approach to achieve the same thing:

    #include <cctype>
    #include <iostream>
    using namespace std;

    int main()
    {
        const char* arr = " Hallo WOrld";  // string literals are const
        const char* ptrIter = arr;
        int nWordCount = 0;

        // Base condition: skip leading whitespace.
        while (isspace((unsigned char)*ptrIter))
            ++ptrIter;

        // Stop at the terminating '\0'. The original tested the pointer itself,
        // which is never null, and compensated by printing nWordCount - 1.
        while (*ptrIter != '\0') {

            // Consume one word (a run of non-space characters).
            while (*ptrIter != '\0' && !isspace((unsigned char)*ptrIter))
                ++ptrIter;

            ++nWordCount;

            // Skip the whitespace that follows the word.
            while (isspace((unsigned char)*ptrIter))
                ++ptrIter;
        }

        cout << nWordCount << endl;  // prints 2 for " Hallo WOrld"
        return 0;
    }

Hope this approach helps you understand.
Posted by Laiq Ahmed

© 2011 DaniWeb® LLC