Determining the number of unique words in a .txt file

ArkM, #11, Dec 4th, 2008
>I tried a different Approach to achieve the same
It's not just a "different" approach: it's a wrong approach.
The C Standard (7.4.1.10):
  The standard white-space characters are the following: space (' '),
  form feed ('\f'), new-line ('\n'), carriage return ('\r'),
  horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale,
  isspace returns true only for the standard white-space characters.
Therefore the code above can't pick out the words in a string such as "123four5six...". It also has obviously incorrect lines, for example:
  while (ptrFirst) { // probably should be *ptrFirst
A more subtle defect of the code above is that the isXXX family of functions does not work for negative arguments. If a character in a text file has the bit pattern 1xxxxxxx and the implementation's char type is signed, then the expression *ptrIter yields a negative integer and the behavior of isalpha(*ptrIter) is undefined. That's why the Uchar typedef was defined in my code.
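
A minimal sketch of the usual workaround, assuming nothing about the earlier code beyond what is said above: convert the char to unsigned char before passing it to an isXXX function. The helper name is_alpha_char is hypothetical and is not taken from the thread.

```cpp
#include <cctype>

// Hypothetical helper (not from the thread's code): convert to unsigned char
// first, so bytes with the high bit set never reach isalpha as negative ints.
inline bool is_alpha_char(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

With this wrapper, a call such as is_alpha_char(*ptrIter) stays well defined whether char is signed or unsigned on the implementation.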

In actual fact, it's an inaccurate implementation of the same (scanner-like) approach.

By the way, if we have a text file with whitespace separators only, there's no need for scanner-like methods at all. The simplest code works fine:
  string word;
  while (file >> word) {
      // process word
  }
Last edited by ArkM; Dec 4th, 2008 at 9:14 am.

Laiq Ahmed, #12, Dec 4th, 2008
My bad with the line
  while (ptrFirst) {
I agree about the C99 standard, but the requirements don't say anything about words separated by digits. That's why I implemented the code this way. Again, your fstream approach is the simplest, but the point is that I tried to use char* instead of the functions provided by streams.

Thanks.

By the way, I didn't compile it.

ArkM, #13, Dec 4th, 2008
Strictly speaking, there is one word per line in those strange requirements (linear search only, don't use string, can't use a for loop, etc.; it's enough to make you weep). If so, there's no need for word extraction code at all.
Well, if you don't like the "stream-based approach", don't use C++ fstream to get lines from a file. Use fgets or something else from the C library. Furthermore, it's easy to adapt the code to scan a C-string: replace f.get(c) with code that extracts the next char and tests for the null byte. Oh, sorry, I forgot: don't use istringstream! Don't use C++ at all ...
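
The scanner code that used f.get(c) is not reproduced in this part of the thread, so the following is only a stand-in sketch of the C-string variant described above. It walks a null-terminated buffer (for example, a line obtained with fgets) and counts runs of alphabetic characters. The function name count_words_in_line is hypothetical.

```cpp
#include <cctype>

// Hypothetical sketch: counts runs of alphabetic characters in a
// null-terminated string. Anything that is not a letter separates words.
int count_words_in_line(const char* s)
{
    int words = 0;
    bool in_word = false;
    while (*s != '\0') {                      // the null byte test ends the scan
        const unsigned char c = static_cast<unsigned char>(*s);
        if (std::isalpha(c)) {
            if (!in_word) {                   // first letter of a new word
                ++words;
                in_word = true;
            }
        } else {
            in_word = false;
        }
        ++s;
    }
    return words;
}
```

Under this definition of a word, the string "123four5six..." contains two words, "four" and "six".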

Laiq Ahmed, #14, Dec 4th, 2008
ArkM: I'm not disputing your opinions, but the point is that we should start learning from the basics, and the programming language has nothing to do with the logic. If you understand the logic, then I suggest using the built-in functions; otherwise, writing the raw logic yourself is always a good starting point.

ArkM, #15, Dec 4th, 2008
Is this what learning the basics of programming means: don't use for loops, don't use this, don't use that, and so on? It's a profanation.
Better to download and read B. Stroustrup's well-known article "Learning Standard C++ as a New Language":
http://www.research.att.com/~bs/new_learning.pdf

Laiq Ahmed, #16, Dec 4th, 2008
Thanks, ArkM. I've gone through this article of Bjarne's, and I have no disagreement with it at all. But as an experienced programmer, what do you think about requirements? Practically speaking:
"One day your boss comes to your desk and says: I've bought a library written in C and I want you to use it for blah blah?"
Or what if your boss asks you to develop a C library itself?
I'm not telling you that C is superior to C++, but what matters is the requirements. If someone asks for C code, teach them C, but also provide them with the C++ implementation and the differences between the two. I think this is a better approach to learning.

matt_570, #17, Dec 4th, 2008
Thanks for all the input. I was able to store the lines with strcpy(), but now I'm trying to use strncmp to find out the number of unique words (or lines) in the text file. I tried a couple of things, but none seemed to work. I'm given these guidelines:

You must use the linear search algorithm to determine if a word is in the array. Remember that the array is an array of structures and that the key is a string (char array) so the string comparison must be used. The search task should be a separate function.
The search must be a separate function that returns an integer value. Do not use a for loop, and the function must have only one return statement.

Here's the instructor's linear search:
  int search (int list [], int size, int key)
  {
      int pos = 0;
      while (pos < size && list[pos] != key)
          pos++;
      if (pos == size)
          pos = -1;
      return pos;
  }
I'm really having trouble with this; any help would be appreciated.
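
One way the instructor's search might be adapted to the guidelines above is sketched here, assuming a structure with a char array key and a counter. The struct name words and the constant wordLength are borrowed from the code posted later in this thread; everything else follows the instructor's version, with strcmp replacing the integer comparison.

```cpp
#include <cstring>

const int wordLength = 21;   // assumed limit, taken from the later post

struct words
{
    char word[wordLength];
    int count;
};

// Linear search over an array of structures, keyed by a C-string.
// Returns the index of key in list[0..size-1], or -1 if it is absent.
// No for loop, and only one return statement, as the guidelines require.
int search(const words list[], int size, const char key[])
{
    int pos = 0;
    while (pos < size && std::strcmp(list[pos].word, key) != 0)
        pos++;
    if (pos == size)
        pos = -1;
    return pos;
}
```

The order of the loop condition matters: pos < size is tested first, so list[pos] is never read past the end of the filled part of the array.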

Lerner, #18, Dec 4th, 2008
  bool search(char ** words, int numWords, char * currentWord)
  {
      bool found = false;
      int i = 0;
      // compare current word to each word already in array
      while (!found && i < numWords)
      {
          // if current word is found
          if (strcmp(currentWord, words[i]) == 0)
              // change flag to end loop
              found = true;
          else
              // otherwise move on to the next word
              ++i;
      }
      return found;
  }
Last edited by Lerner; Dec 4th, 2008 at 7:23 pm.
Klatu Barada Nikto

ArkM, #19, Dec 4th, 2008
Originally posted by Laiq Ahmed:
>"One day your boss comes to...for blah blah?" or what if your boss asks you to develop a C library itself?
You're changing the subject, but it's a very interesting theme. C and C++ are different languages. I use both languages at the same time. No problems. It's me who comes to my boss and tells him what our next library is. But I never teach the young members of my team with a "don't use for loops in C" methodology. There is no C-specific style in those absurd requirements. Do you really think that bad_programming == C and good_programming == C++?
Now let's remember: this is a C++ language thread and we are talking about C++ here.

matt_570, #20, Dec 4th, 2008
OK, here's my revised code:

  #include <iostream>
  #include <fstream>
  #include <cstdlib>
  #include <string>
  #include <iomanip>
  using namespace std;

  int const wordLength = 21;
  int const Num = 100;
  int const fileSize = 255;

  struct words
  {
      char word[wordLength];
      int count;
  };

  int storeFile( char [], words []);
  void wordSearchSetup(char[], int, words[]);
  int wordSearch(char[], int, words[]);

  void main ()
  {
      int count;
      char fileName[fileSize];

      cout << "Please enter the name of the file you wish to open: "<< endl;
      cin.getline(fileName,fileSize);

      words array[Num];

      count = storeFile (fileName, array);

      cin.ignore();
  }

  int storeFile (char fileName[], words array[] )
  {
      int count = 0;
      int i = 0;
      ifstream inFile;
      char line [Num];

      inFile.open(fileName);

      while (inFile.getline(line,Num))
      {
          strncpy(array[i].word, line, wordLength);
          count++;
          i++;
      }

      inFile.close();

      wordSearchSetup( fileName, count, array);
      return count;
  }

  void wordSearchSetup(char fileName[], int count, words array[])
  {
      char line[Num];
      int i = 0;
      int size;
      ifstream inFile;
      inFile.open(fileName);

      while (inFile.getline(line,Num))
          size = wordSearch(line, count, array);

      cout << size << endl;
  }

  int wordSearch( char line[], int count, words array[])
  {
      int i = count;

      while (i)
      {
          if (strcmp(line, array[i-1].word) == 0)
          {
              array[i-1].count++;
              return count;
          }
          i-- ;
      }

      strcpy(array[count].word, line) ;
      array[count].count = 1 ;
      return count+1;
  }
My search functions still aren't giving me the results I want; the program just returns the number of words, not the number of unique words.