ArkM

Re: Determining the number of unique words in a .txt file

 
Dec 4th, 2008
The read-a-line-into-a-char-array solution has an obvious defect: you must fix the maximum line length in advance.
Here is a function that gets words directly from any input stream (not only an fstream):
  #include <cctype>   // std::isalpha
  #include <istream>  // std::istream

  typedef unsigned char Uchar; // isalpha() needs a non-negative value

  /// Get or skip the next word from a stream.
  /// wbufsize is the max word length + 1 (for the null char).
  /// Letters that do not fit in the buffer are read but not stored.
  bool getWord(char* word, int wbufsize, std::istream& is)
  {
      int i = 0, n = wbufsize - 1;    // letters read, max letters stored
      bool store = (word && n >= 0);  // have a usable buffer
      char ch;                        // current char

      while (is.get(ch) && !std::isalpha(Uchar(ch)))
          ; // skip delimiters
      if (is) // have a letter
          do {
              if (store && i < n) // have room
                  word[i] = ch;   // append letter
              ++i;                // letters counter
          } while (is.get(ch) && std::isalpha(Uchar(ch)));
      if (store)                      // have a buffer
          word[i < n ? i : n] = '\0'; // end of word, always inside the buffer
      return i > 0; // have a word
  }
Of course, this algorithm can also be adapted to extract words from a char array.