I am trying to write a program that, given a text file as input, can:
1) Count the number of lines in the file.
2) Count the number of words in the file.
3) Count the number of characters in the file, including whitespace.
4) Count the number of characters in the file, excluding whitespace.

I have written the following code:

#include <iostream>
#include <string>
#include <fstream>
using namespace std;

int main()
{
	ifstream file("wordcount.txt");
	string s1;
	int words=0, lines=0, ch_spaces=0, ch_nspaces=0;
	while (!file.eof())
	{
		getline(file, s1);
		ch_spaces += s1.size();
		lines++;
	}
	file.clear();
	file.seekg(0, ios::beg);
	while (!file.eof())
	{
		file >> s1;
		words++;
	}
	
	cout << "The file contains " << ch_spaces << " characters (including spaces)\n";
	cout << "The file contains " << ch_nspaces << " characters (excluding spaces)\n";
	cout << "The file contains " << words << " words.\n";
	cout << "The file contains " << lines << " lines.\n";

	system("pause");
	return 0;
}

It fulfills the first three requirements, but I can't think of a way to count the number of characters excluding whitespace (spaces, tabs, etc.).

Can someone please help me with this? I am considering two approaches:
1) Count the number of whitespace characters and subtract it from ch_spaces.
2) Trim the text line by line and count the remaining characters, the same way I counted ch_spaces.

If someone can help me implement either of these two approaches, that would be great too.

Thanks in advance!

All 7 Replies

Since you have no qualms about reading the file multiple times, just go through it once more character-by-character to grab the whitespace and non-whitespace counts. Then you'd have three loops:

  1. Read line-by-line and increment the line count
  2. Read word-by-word and increment the word count
  3. Read character-by-character and increment the space/non-space count

The typical approach, though, is to read the file character by character and use a state machine to handle the trickier parts (i.e. words). That way you only read through the file once.

>while (!file.eof())
This is a bug waiting to happen. eof() only returns true after you've tried and failed to read from the stream, so the loop body can run one extra time after the last record and process it twice. I strongly recommend using your input method as the condition:

while (getline(file, s1)) {
while (file >> s1) {

Both getline and operator>> return a reference to the stream object, which has a conversion path to bool for checking if the state is good or not.

>ch_spaces += s1.size();
This doesn't do what you think. The length of s1 includes both whitespace and non-whitespace. You need to further break it down.

I also want to read the file character by character and count whitespace and non-whitespace characters, but I can't figure out how to detect whether a character is whitespace. The only such function I know is isalpha(), but it returns true only for letters and false for all special characters and digits.

I also don't yet understand some of the terms you used, such as "state machine" and "state is good or not", probably because this is my first programming course. Maybe I will understand all of this in a week or two.

If you can tell me how to detect whether a character is a space, that will surely solve my problem.

Thank you very much for the help so far!

There are several ways. Since the space is an ordinary character, written ' ', you can compare against it directly. Otherwise you can use isspace(), a function related to isalpha() and declared in <cctype>. Note, however, that isspace() matches all whitespace characters: the space character as well as the tab character, the newline character, etc.

Really? Then this function is exactly what I was looking for. I am going to try it out and will post the modified version of the program soon.

Thank you very much!

Once you have a working version of the code, I'll also post an alternative using my suggestion.

Here is the final program, which has all of the functionality I wanted.
Thank you very much, Lerner, for helping me out.

#include <cctype>   // isspace
#include <cstdlib>  // system
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

int main()
{
	ifstream file("wordcount.txt");
	string s1;
	int words=0, lines=0, ch_spaces=0, ch_no_spaces=0, spaces=0;
	while (getline(file, s1))
	{		
		ch_spaces += s1.size(); // getline drops the '\n', so newlines are not counted
		lines++;
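		for (string::size_type i=0; i<s1.size(); i++)
		{
			// Cast to unsigned char: isspace is undefined for negative char values
			if (isspace(static_cast<unsigned char>(s1[i])))
				spaces++;
		}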
	}
	file.clear();
	file.seekg(0, ios::beg);

	while (file >> s1)
		words++;
	
	ch_no_spaces = ch_spaces - spaces;

	cout << "The file contains " << ch_spaces << " characters (including spaces)\n";
	cout << "The file contains " << ch_no_spaces << " characters (excluding spaces)\n";
	cout << "The file contains " << words << " words.\n";
	cout << "The file contains " << lines << " lines.\n";

	system("pause");
	return 0;
}

And Norue,

I have used your suggested way of controlling the while loop in this version of my program, but I don't understand the difference between the two methods. It would be great if you could explain it a little more. As for
ch_spaces += s1.size();
I think you misunderstood my use of this statement. I am using it to count the number of characters including whitespace, not excluding it. Afterwards I subtract the total number of whitespace characters found in the file from this number to get the number of non-whitespace characters.

>I have used your suggested method of controlling the while loop in this version of my program but actually I have not understand what is the difference between the two methods.

Your original method has a bug, and my method corrects it. On the very last iteration of the loop, eof() returns false even though there's nothing left to read. The read then fails, which can leave the input variable with the same contents as the previous iteration, so you process the same input twice.

>I think you misunderstood my use of this statement

Yes, I misunderstood your use of ch_spaces. I read it as the number of whitespace characters rather than the total number of characters including whitespace.

As promised, here is my version:

#include <cctype>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream in("test.txt");

    if (in) {
        int total = 0, spaces = 0, words = 0, lines = 0;
        bool inword = false;
        char ch;

        while (in.get(ch)) {
            ++total; // All characters including whitespace

            if (ch == '\n')
                ++lines;

            // Cast avoids undefined behavior for negative char values
            if (std::isspace(static_cast<unsigned char>(ch))) {
                ++spaces;
                inword = false;
            }
            else if (!inword) {
                ++words;
                inword = true;
            }
        }

        // Add the last line if it doesn't end with '\n'
        // (total is checked first so ch is never read when the file was empty)
        if (total > 0 && ch != '\n')
            ++lines;

        std::cout<<"Total characters ("<< total <<")\n"
                 <<"Non-spaces ("<< total - spaces <<")\n"
                 <<"Words ("<< words <<")\n"
                 <<"Lines ("<< lines <<")\n";
    }
}