| ||
| Determining the number of unique words in a .txt file Hey, I have to write a program that reads a text file containing a list of words, one word per line. I have to store the unique words and count the occurrences of each unique word. When the file has been completely read, I have to print the words and their number of occurrences to a text file. The output should list the words in alphabetical order along with the number of times they occur. Then I have to print some statistics to the file. I have to use character arrays instead of strings. I must use a linear search (something that looks like this: int search ( int array[], int number, int key )) to determine if a word is in the array. The array is an array of structures and the key is a char array, so string comparison must be used. The search must be a separate function that returns an integer value. I can't use a for loop, and the function must have only one return statement. I tried to start off by reading the file and storing the words: #include <iostream> Any tips would be appreciated, and thanks in advance. |
| ||
| Re: Determining the number of unique words in a .txt file I would first create a structure: struct words Next, a vector (array) of those structures: vector<words> wordArray; Now, when the program reads a line, check wordArray to see whether the word is already in the array. If it is, increment the count. If not, add a new element to the vector. When done, you will have the information you need to display on the screen. Another method is to use a <map>, but that implementation is left for someone else. |
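The listing that accompanied this reply is not preserved, so what follows is only a minimal sketch of the struct-plus-vector idea. The member names `word` and `count` and the helper `addWord` are assumptions made for illustration, not the poster's original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry per distinct word (member names are assumed).
struct words {
    std::string word;  // the text of the word
    int count;         // how many times it has been seen so far
};

// Hypothetical helper: increment the count if `w` is already stored,
// otherwise append a new entry with a count of 1.
void addWord(std::vector<words>& wordArray, const std::string& w) {
    for (std::size_t i = 0; i < wordArray.size(); ++i) {
        if (wordArray[i].word == w) {
            ++wordArray[i].count;
            return;
        }
    }
    words entry;
    entry.word = w;
    entry.count = 1;
    wordArray.push_back(entry);
}
```

Calling `addWord(wordArray, line)` once per line read leaves `wordArray` holding each distinct word with its frequency, in order of first appearance.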
| ||
| Re: Determining the number of unique words in a .txt file I think you should look at the code below carefully and make sure you understand its complexity and the map data structure. If anything is unclear, post again and I will explain in more detail. fstream inFile("C:\\Laiq.txt"); |
| ||
| Re: Determining the number of unique words in a .txt file You have two different problems: 1. Tokenize the input stream (extract words from the stream). 2. Build a word dictionary. The second one has a very simple solution: use the std::map data structure, as Laiq Ahmed mentioned above. However, you need additional code to process every word against the map-based word dictionary: if the next word is found in the map then There are lots of methods to solve the first problem. For example: open file stream Summary: - use std::ifstream - use std::string for the word buffer - use std::map<std::string,int> for the dictionary with counters You have a good chance of writing simple, clear code after a proper functional decomposition of the pseudocode snippets ;)... |
| ||
| Re: Determining the number of unique words in a .txt file >>I have to use character arrays instead of strings. I must use the linear search That pretty much precludes use of a map. It also means you can't use STL strings within a struct, and it may preclude using a vector to hold the structs, too. You could use a C-style string with either static or dynamic memory within the struct. The static version limits the maximum length of any string entered, but since these strings are words, it is unlikely that any of them would have more than 20 or 30 letters. Using dynamic memory to store the string within the struct minimizes the memory wasted on strings shorter than the maximum length of the static version, but requires you to manage your own memory, which is a bit of a hassle, though not an overwhelming task by any means. The non-STL structures you could use to store the structs are an array or a list. If you don't know the maximum number of unique words you could encounter, a list might be advantageous. You could guess the maximum number of words, but that isn't quite as predictable as the maximum size of each word. Linear searching is possible with either lists or arrays. Sorting the structures alphabetically by string before printing or writing to the file can be done in any of several ways. For beginners, bubble sorts with arrays and insertion sorts with lists seem pretty popular. |
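As a sketch of the static-array option described above: a struct holding a fixed-size C string, plus a bubble sort that orders an array of such structs alphabetically with `strcmp`. The bound `MAX_LEN` and the function name `sortWords` are assumptions chosen for the example.

```cpp
#include <cstring>

const int MAX_LEN = 30;  // assumed upper bound on the length of a word

struct words {
    char word[MAX_LEN + 1];  // +1 leaves room for the terminating '\0'
    int count;
};

// Bubble sort: put the first n entries into alphabetical order by word.
void sortWords(words array[], int n) {
    for (int pass = 0; pass < n - 1; ++pass) {
        // After each pass the largest remaining word has bubbled to the end.
        for (int i = 0; i < n - 1 - pass; ++i) {
            if (std::strcmp(array[i].word, array[i + 1].word) > 0) {
                words tmp = array[i];  // structs copy member-wise, arrays included
                array[i] = array[i + 1];
                array[i + 1] = tmp;
            }
        }
    }
}
```

Swapping whole structs keeps each count attached to its word. Note that `strcmp` compares character codes, so uppercase letters sort before lowercase ones unless the words are normalized first.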
| ||
| Re: Determining the number of unique words in a .txt file Sorry, I had not noticed those absurd restrictions. The only result of this idiotic methodology: a pupil sheds tears on DaniWeb, then copies and pastes extremely inefficient code in early-60s style (or worse)... |
| ||
| Re: Determining the number of unique words in a .txt file The maximum length of a word is 20, so the character array needs to hold 21 characters. The maximum number of words is 100; if there are more than 100 words, I need to output a message saying something like "The stats did not use the words after the 100th word". Anyway, I'm having a hard time even starting the program. I'm starting off by trying to read and store the words from the text file. My textbook has code for reading a text file line by line and then displaying it, so I figured I should start with that. Here's the code: #include <iostream> Here's my code split into two functions; however, it does not compile. I get an error at the while statement: #include <iostream> My code is nearly identical except that I use character arrays instead of strings, so I'm not sure why it's not compiling. Also, is this a good way to start off the program, or should I try something else? |
| ||
| Re: Determining the number of unique words in a .txt file OK, I figured out that I need to use "inFile.getline(line,length)" instead of "getline(inFile, line)". I created a struct as suggested earlier. However, I'm having trouble storing the words from the text file. I use this function: void displayFile (char fileName[], words array[] ) where array is an array of structs, each made up of a character array "word" and an integer "count". Any help with storing the words and the number of words in the array would be appreciated. |
| ||
| Re: Determining the number of unique words in a .txt file The read-a-line-into-a-char-array solution has an obvious defect: you must define the maximum line length a priori. Look at this function, which gets words directly from an input stream (not only from an fstream): // #include <cctype> dependency Of course, it's possible to adapt this algorithm to extract words from a char array. |
| ||
| Re: Determining the number of unique words in a .txt file I tried a different approach to achieve the same thing: char* arr = " Hallo WOrld"; I hope this approach helps you understand. |
| ||
| Re: Determining the number of unique words in a .txt file >I tried a different Approach to achieve the same It's not only a "different" approach: it's a wrong approach ;). The C Standard (7.4.1.10): The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). Therefore the code above can't select a word in a string like "123four5six...". It also has obviously incorrect lines, for example: while (ptrFirst) { // probably, must be *ptrFirst A more subtle defect of the code above is that the isXXX family of functions does not work for negative arguments. If a character in a text file has the bit pattern '1xxxxxxx' and the implementation's char type is signed, then the expression *ptrIter yields a negative integer and the result of isalpha(*ptrIter) is undefined. That's why the Uchar typedef was defined in my code. In actual fact, it's an inaccurate implementation of the same (scanner-like) approach ;) Apropos, if we have a text file with whitespace separators only, there is no need for scanner-like methods at all. The simplest code works fine: string word; |
| ||
| Re: Determining the number of unique words in a .txt file My bad with the line while (ptrFirst) { I agree about the C99 standard, but the requirement doesn't say anything about words separated by digits; that's why I implemented the code this way. Again, your fstream approach is the simplest, but I was trying to use char* instead of the functions provided by streams. Thanks. By the way, I didn't compile it. |
| ||
| Re: Determining the number of unique words in a .txt file Strictly speaking, those strange requirements specify one word per line (linear search only, don't use string, can't use a for loop, etc.; it's enough to make you weep). If so, there is no need for word extraction code at all. Well, if you don't like the "stream-based approach", don't use a C++ fstream to get lines from a file. Use fgets or something else from the C library. Furthermore, it's easy to adapt the code to scan a C string: replace f.get(c) with code that extracts the next char and tests for the null byte. Oh, sorry, I forgot: don't use istringstream! Don't use C++ at all ;)... |
| ||
| Re: Determining the number of unique words in a .txt file ArkM: I am not rejecting your opinions, but we should start learning from the basics, and the programming language has nothing to do with the logic. If you already understand the logic, then I suggest using the built-in functions; otherwise, writing the raw logic yourself is always a good starting point. |
| ||
| Re: Determining the number of unique words in a .txt file Is this what learning the basics of programming means: don't use for loops, don't use this, don't use that, and so on? It's a profanation. Better to download and read B. Stroustrup's well-known article "Learning Standard C++ as a New Language": http://www.research.att.com/~bs/new_learning.pdf |
| ||
| Re: Determining the number of unique words in a .txt file Thanks, ArkM. I've gone through Bjarne's article and I don't disagree with it at all. But as an experienced programmer, what do you think about requirements? Practically speaking: one day your boss comes to your desk and says, "I've bought a library written in C and I want you to use it for blah blah." Or what if your boss asks you to develop a C library yourself? I am not saying that C is superior to C++; what matters is the requirements. If someone asks for C code, teach them C, but also provide the C++ implementation and explain the differences between the two. I think this is a better learning approach. |
| ||
| Re: Determining the number of unique words in a .txt file Thanks for all the input. I was able to store the lines with strcpy(), but now I'm trying to use strncmp to find the number of unique words (or lines) in the text file. I tried a couple of things, but none seemed to work. I'm given these guidelines: You must use the linear search algorithm to determine if a word is in the array. Remember that the array is an array of structures and that the key is a string (char array), so string comparison must be used. The search must be a separate function that returns an integer value. Do not use a for loop, and the function must have only one return statement. Here's the instructor's linear search: int search (int list [], int size, int key) I'm really having trouble with this; any help would be appreciated. |
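The instructor's function body is not shown, so the following is one possible adaptation of the `int search (int list [], int size, int key)` signature to an array of structs with a char-array key. It assumes a struct with a 21-character `word` member as described earlier, uses a while loop instead of a for loop, and has a single return statement.

```cpp
#include <cstring>

struct words {
    char word[21];  // up to 20 characters plus '\0'
    int count;
};

// Linear search: returns the index of `key` among the first `size`
// entries of `list`, or -1 if it is not present.
int search(const words list[], int size, const char key[]) {
    int index = 0;
    int found = -1;
    // Stop at the end of the used part of the array or at the first match.
    while (index < size && found == -1) {
        if (std::strcmp(list[index].word, key) == 0) {
            found = index;
        }
        ++index;
    }
    return found;
}
```

`strcmp` compares whole strings, which is usually what is wanted here; `strncmp` with too small a length would treat words that share a prefix as equal. A result of -1 means the word is new and should be appended, while any other value is the index whose `count` should be incremented.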
| ||
| Re: Determining the number of unique words in a .txt file bool search(char ** words, int numWords, char * currentWord) |
| ||
| Re: Determining the number of unique words in a .txt file Quote:
Now let's remember: this is the C++ language thread and we are talking about C++ here. |
| ||
| Re: Determining the number of unique words in a .txt file OK, here's my revised code: #include <iostream> My search functions still aren't giving me the results I want; the program just reports the total number of words, not the number of unique words. |
| ||
| Re: Determining the number of unique words in a .txt file Change this: while (inFile.getline(line,Num)) To this: while (inFile.getline(line,Num)) Change wordSearch to this: bool wordSearch( char line[], int count, words array[]) Eliminate wordSearchSetup() completely. count should be the number of unique words found so far. If it reaches 100 before the file has been completely read, you will have to output the full error message. If you want to keep track of the total number of words in the file in addition to the number of unique words, you can do that too. Once you have finished reading the file, you can display array with each unique word and the number of times it was found. |
| ||
| Re: Determining the number of unique words in a .txt file OK, thanks. Now I'm trying to determine the average occurrence of each word. I'm starting by finding out how many of each word there are. Here's my function: void averageOccurrence(words array[], int array_length) It just gives me a large count like 150077 or 150079. |
| ||
| Re: Determining the number of unique words in a .txt file What's the initial value of the count member? (It must be zero.) |
| ||
| Re: Determining the number of unique words in a .txt file >>trying to determine the average occurrence of each word You don't need to care what each word is to do this; you only need to know how many unique words there are (that would be count in my last post) and how many words there were in the file. The number of words in the file can be calculated as a running total as you read through the file, as indicated in my last post, or by looping through the array of unique words and adding up the count of each. For example, if there are three unique words with frequencies of 3, 6 and 9 respectively, then the average number of occurrences per unique word is 6. You can decide which approach you wish to take. However, the code you posted in post #22 above doesn't have a chance of coming up with the correct answer. |
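A small sketch of the second approach described above, summing the per-word counts and dividing by the number of unique words. Unlike the `void averageOccurrence(words array[], int array_length)` signature quoted earlier in the thread, this illustrative version returns the average as a `double`; the struct layout is assumed.

```cpp
struct words {
    char word[21];
    int count;
};

// Average number of occurrences per unique word.
// `uniqueCount` is the number of array entries actually in use.
double averageOccurrence(const words array[], int uniqueCount) {
    double average = 0.0;
    if (uniqueCount > 0) {
        int total = 0;  // must start at zero before accumulating
        for (int i = 0; i < uniqueCount; ++i) {
            total += array[i].count;
        }
        // Convert before dividing so the result is not truncated.
        average = static_cast<double>(total) / uniqueCount;
    }
    return average;
}
```

With the frequencies from the example (3, 6 and 9) the total is 18 and the function returns 6. Looping only over the entries in use matters: unused slots of a 100-element array hold indeterminate values unless they were explicitly initialized.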
| ||
| Re: Determining the number of unique words in a .txt file Thanks a lot, that helps a lot. I'm on my last stat: I have to find the most commonly occurring word(s). Here's my function: void commonWord(double count[], int array_length, words array[]) I don't get a compile-time error, but when I run the program, I get a message telling me the .exe file has stopped working. |
| ||
| Re: Determining the number of unique words in a .txt file I'm now trying to write my stats to a text file, but it's only writing two of the stats, and those come from the same function. Here's the code: #include <iostream> The "storeFile" function is the only one that prints to the file. BTW, I know this code is unorganized and not the best way to do it, but this project is due tomorrow (12-10). |
©2003 - 2009 DaniWeb® LLC