I have been given a problem in which I have to write a program to find the letter frequency and the average word length of a piece of text. I've written the code to prompt for and read in the text file. However, I'm finding it difficult to make the jump to analysing the text, i.e. finding the total number of words to start with. I think once I do this I'll be able to finish it, but I'm having difficulties with this bit, so any tips or suggestions would be gratefully received.

How about reading the words in from the text file one word at a time and counting the words as you go?
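As a minimal sketch of that idea, assuming the file is already open as a FILE * (the helper name count_words_in_file and the 100-character buffer are hypothetical, not from the post), fscanf() with a %s conversion hands you one whitespace-separated word per call:

```c
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words by reading them
   from fp one at a time. Assumes fp is already open for reading.
   A "word" here is any run of non-whitespace characters, so punctuation
   stays attached ("end." is one word). A run longer than 99 characters is
   read in pieces and therefore counted more than once. */
long count_words_in_file(FILE *fp)
{
    char word[100];   /* assumed buffer size: 99 chars + '\0' */
    long count = 0;

    /* "%99s" skips leading whitespace, then stores up to 99
       non-whitespace characters; fscanf returns 1 for each word read. */
    while (fscanf(fp, "%99s", word) == 1)
        count++;

    return count;
}
```

Inside the loop you also have each word in `word`, so strlen(word) is available if you want to total up word lengths as you go.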

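strtok() is a good function for this.

You tokenize the text, splitting it into words wherever there is whitespace (the "delimiter").

So

single_word = strtok(MyBuffer, " ");

is the first call. It returns a pointer to the first word found in MyBuffer: strtok() skips any leading spaces, then ends the word by overwriting the next space with '\0', so single_word points to a string containing everything up to (but not including) that space. Note that the delimiter string " " matches only the space character; to split on tabs and newlines as well, pass " \t\n" instead.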
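A small sketch of just that first call, wrapped in a hypothetical helper named first_word (not from the post), shows that leading spaces are skipped and that the buffer has to be writable:

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the first space-delimited
   word in buf, or NULL if buf contains nothing but spaces.
   strtok() skips leading delimiters and writes '\0' over the delimiter
   that ends the word, so buf must be a writable array, not a literal. */
char *first_word(char *buf)
{
    return strtok(buf, " ");
}
```

For example, with char buf[] = "  hello world", first_word(buf) returns buf + 2, which now reads as "hello".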

Subsequent calls to strtok() pass NULL as the first argument, which tells it to carry on from where it left off in the string originally passed in (MyBuffer).

single_word = strtok(NULL," ");

Each time this is called (typically in a loop), it returns a pointer to the next "word" in the buffer, which you store in single_word.

Once there are no more words, strtok() returns NULL, so you need to test for this after each call.

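single_word = strtok(MyBuffer, " ");
while (single_word != NULL)
{
    /* stuff to do with single_word, e.g. count++; */
    single_word = strtok(NULL, " ");
}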
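Put together as a function, a sketch of the word-counting loop might look like this. The name count_words and the delimiter set " \t\r\n" are assumptions; the wider set means a line read with fgets() does not leave '\n' attached to its last word:

```c
#include <string.h>

/* Hypothetical helper: counts words in buf using the strtok() loop above.
   Spaces, tabs and line endings are all treated as delimiters.
   buf is modified in place (delimiters are overwritten with '\0'). */
long count_words(char *buf)
{
    const char *delims = " \t\r\n";
    long count = 0;

    char *single_word = strtok(buf, delims);     /* first call: the buffer */
    while (single_word != NULL) {
        count++;                                 /* "stuff to do" goes here */
        single_word = strtok(NULL, delims);      /* later calls: NULL */
    }
    return count;
}
```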

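Caveats: MyBuffer is often filled from the file one line at a time (e.g. with fgets()). A word at the end of a buffer is not necessarily a whole word: if the buffer fills up partway through a word, the rest of it arrives in the next read, and thus in another buffer. Special handling will be needed for these situations. Also, fgets() keeps the '\n' at the end of each line, so with " " as the only delimiter the last word on each line will have the newline attached.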
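One way to handle words that are split across buffers is to drop strtok() for counting and instead carry an "inside a word" flag from one buffer to the next. This is a swapped-in technique, sketched under the assumption that each buffer is a null-terminated string; count_words_chunk is a hypothetical name:

```c
#include <ctype.h>

/* Hypothetical helper: counts words that start in this chunk of text
   (e.g. one fgets() buffer). *in_word remembers whether the previous chunk
   ended in the middle of a word, so a word split across two chunks is
   counted only once. Set *in_word to 0 before the first chunk. */
long count_words_chunk(const char *chunk, int *in_word)
{
    long new_words = 0;

    for (const char *p = chunk; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            *in_word = 0;          /* whitespace ends any current word */
        } else if (!*in_word) {
            *in_word = 1;          /* first character of a new word */
            new_words++;
        }
    }
    return new_words;
}
```

Feeding it "the quick bro" and then "wn fox\n" with the same flag gives 3 + 1 = 4 words, because "bro" and "wn" are treated as one word.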

Another thing to consider is that strtok() modifies the original string (what I called MyBuffer above). Specifically, each delimiter that ends a token (in this example, a space) is overwritten with a null character ('\0'). This is very important if you planned on keeping the text for use afterwards. It also means you cannot call strtok() on a string literal or any other read-only string.
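If you need the text afterwards, a simple workaround is to tokenize a copy. This sketch (count_words_preserving is a hypothetical name) copies the text with malloc() and memcpy(), runs the strtok() loop on the copy, and leaves the caller's string untouched:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts space/tab/newline-separated words without
   damaging the caller's text, by running strtok() on a heap copy.
   Returns -1 if the copy cannot be allocated. */
long count_words_preserving(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len + 1);   /* includes the terminating '\0' */

    const char *delims = " \t\r\n";
    long count = 0;
    for (char *w = strtok(copy, delims); w != NULL; w = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

Because it only reads `text`, this version also accepts a string literal.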

Be sure to look up the full specification of strtok(). It's very powerful, but not especially intuitive for beginners. For instance, it keeps hidden state between calls, so you cannot interleave tokenizing two different strings.

The other option is to walk through the text yourself with a pointer, checking each character and noting where each word starts and ends.
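A sketch of that pointer-walking approach, which also gathers the letter frequencies and letter count needed for the original problem. The name analyse_text and the rules are assumptions, not from the thread: a word is a run of non-whitespace characters, only A-Z/a-z count as letters (case-insensitive), and the character set is assumed to be ASCII:

```c
#include <ctype.h>

/* Hypothetical helper: walks text with a pointer, counting words,
   letters, and how often each letter a-z occurs (case-insensitive).
   freq must have room for 26 counts. Assumes ASCII, where 'a'..'z'
   are contiguous. */
void analyse_text(const char *text, long freq[26], long *words, long *letters)
{
    int in_word = 0;

    for (int i = 0; i < 26; i++)
        freq[i] = 0;
    *words = 0;
    *letters = 0;

    for (const char *p = text; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c)) {
            in_word = 0;              /* whitespace ends the current word */
            continue;
        }
        if (!in_word) {
            in_word = 1;              /* start of a new word */
            (*words)++;
        }
        int lc = tolower(c);
        if (lc >= 'a' && lc <= 'z') { /* punctuation and digits are skipped */
            freq[lc - 'a']++;
            (*letters)++;
        }
    }
}
```

The average word length is then (double)letters / words, provided words is not zero. For "The cat, the HAT." this gives 4 words and 12 letters, an average of 3.0.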
