Hi guys.

I'm doing an assignment. I am getting on OK with it, but I am sadly really stuck at the moment.

The part I am stuck on involves counting HTML tags in a text file. I have thought of a method of doing this, but unfortunately I have no idea how to implement it in code. My idea is to look for the start symbol of the tag (<), then the contents, then the end symbol (>). Does anyone know how I could do this in C++?

Many thanks


You could just go through the file, and when your program comes across a '<' it should ignore everything after it until it comes across a '>'. At that point you have one tag to count ...

Probably something like this:

std::string str = "<html>";
if (str.find("<") != std::string::npos && str.find(">") != std::string::npos)
{
    // both '<' and '>' were found: most likely an html tag
}
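For reference, the "skip from '<' to '>'" idea can be sketched as a small character loop. This is only a minimal sketch: count_tags and count_tags_in_text are hypothetical names that do not appear in the thread, and it assumes every '<' ... '>' pair counts as one tag (so a comment such as <!-- ... --> is counted too).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts '<' ... '>' pairs by reading the stream
// one character at a time.
std::size_t count_tags(std::istream& in)
{
    std::size_t tags = 0;
    bool in_tag = false;   // true between a '<' and the next '>'
    char c;
    while (in.get(c))      // the loop ends at end-of-file or on error
    {
        if (c == '<')
        {
            in_tag = true;
        }
        else if (c == '>' && in_tag)
        {
            ++tags;        // a complete '<' ... '>' pair was seen
            in_tag = false;
        }
    }
    return tags;
}

// Hypothetical convenience wrapper for text that is already in memory.
std::size_t count_tags_in_text(const std::string& text)
{
    std::istringstream in(text);
    return count_tags(in);
}
```

An std::ifstream can be passed straight to count_tags, since it is an std::istream. A stray '>' in normal text is ignored because no '<' came before it.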

Thanks a lot guys :) I'll try these ideas out.

Dragon, I had a stab at using your technique, but it returns a random number of tags in the output. :(

// assignment program
// read file and copy to another
// count amount of characters, lines, comments and tags
// change XHTML tags from upper case to lower case
// place in new file

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <iomanip>
#include <cctype>
using namespace std;

int main()
{
    string file1, file2;
    string str = "<>";
    ifstream ipfile;
    ofstream opfile;
    char c;

    int amountline = 0;
    int amountcha = 0;
    int amounttag = 0;
    int amountcomment = 0;

    cout << "Please enter the name of the file you wish to check" << endl;
    cin >> file1;

    // open the input file
    ipfile.open(file1.c_str());
    if (!ipfile.is_open())
    {
        cout << "Oops! Couldn't open " << file1 << "!\n" << endl;
        return 1;
    }

    cout << " Please enter the file you wish the edited contents to be copied to" << endl;
    cout << " This will be created if it does not already exist" << endl;
    cin >> file2;
    opfile.open(file2.c_str());

    // copy one character at a time and count as we go
    while (!ipfile.eof())
    {
        ipfile.get(c);
        opfile << c;

        if (c == '\n')
            amountline++;

        if (c != '\n' && !ipfile.eof() && c != ' ')
            amountcha++;

        if (str.find("<") != string::npos && str.find(">") != string::npos)
            amounttag++;

        if (c == '!')
            amountcomment++;
    }

    cout << " This file contains : " << amountline << " lines" << endl;
    cout << " This file contains : " << amountcha << " characters" << endl;
    cout << " This file contains : " << amountcomment << " comments" << endl;
    cout << " This file contains : " << amounttag << " tags" << endl;

    cout << " Copy complete, edited code located in " << file2 << endl;

    return 0;
}

That's all my code so far.

Why not just use getline() to read an entire line at one time?

std::string line;
while (getline(ipfile, line))
{
    // process the line here
}

I don't do HTML coding, but I think any given tag must be on one line, so that "<html>", for example, cannot be split between lines. If so, it doesn't make sense to read the HTML file one character at a time.

So I can use getline to search for HTML tags in the line?

You use getline() to read an entire line that is terminated with '\n'. Then use string::find() to look for the < and > characters, as shown in the previous example code.

Also note that ipfile.eof() is not needed in my loop because the loop stops on error or end-of-file.
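Put together, the getline() plus find() suggestion might look roughly like the sketch below. The name count_tags_by_line is hypothetical, and the sketch assumes, as the reply above does, that every tag sits entirely on one line; a tag split across two lines is not counted.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for std::istringstream when trying it out
#include <string>

// Hypothetical helper: reads the stream line by line and counts every
// '<' that has a matching '>' later on the same line.
std::size_t count_tags_by_line(std::istream& in)
{
    std::size_t tags = 0;
    std::string line;
    while (std::getline(in, line))   // stops on error or end-of-file
    {
        std::string::size_type pos = 0;
        while ((pos = line.find('<', pos)) != std::string::npos)
        {
            std::string::size_type close = line.find('>', pos + 1);
            if (close == std::string::npos)
                break;               // no closing '>' on this line
            ++tags;
            pos = close + 1;         // continue after this tag
        }
    }
    return tags;
}
```

Passing a start position to find() is what allows more than one tag per line to be counted. Calling find() without a position only ever reports the first '<' in the line.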

You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

That sounds like the hard way. strtok() embeds NULLs in the string, which might corrupt the std::string class.
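If tokenizing is still wanted, one way to do it without strtok() is std::getline() with a custom delimiter, which leaves the original string untouched. A minimal sketch, where split_on is a hypothetical name and not something from the thread:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of delim and
// returns the pieces. The input string itself is not modified.
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> pieces;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, delim))
        pieces.push_back(piece);
    return pieces;
}
```

Note that the vector's size() is not the tag count itself: text that starts with '<' yields an empty first piece, so "<html><body>" splits into three pieces for two tags.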

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

Best not to mix C with C++. Naughty naughty :-0

In any case, all these suggestions are too naive. A few cases will slip through the loop (HTML attributes, for example).

So regular expressions would be the best way to deal with this. However, if this is homework, your professor is unlikely to care.
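As a rough illustration of the regular-expression route, here is a sketch using std::regex. It assumes a C++11 compiler, the name count_tags_regex is hypothetical, and the pattern is only one plausible choice: it accepts '<', an optional '/', a letter, and then any mix of quoted attribute values and other characters up to the closing '>'. Comments and doctype declarations are deliberately not matched.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts opening and closing tags with a regex.
// Quoted attribute values are skipped as a unit, so a '>' inside
// quotes does not end the tag early.
std::size_t count_tags_regex(const std::string& text)
{
    static const std::regex tag(
        R"re(</?[A-Za-z](?:"[^"]*"|'[^']*'|[^'">])*>)re");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), tag),
        std::sregex_iterator()));
}
```

With this pattern, a tag such as <a href="x>y"> counts once, and something like "1 < 2 > 0" in normal text is not counted because no letter follows the '<'.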


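I got it working guys :) Code I used below:

// inside the character loop; istag and iscomment are bools that start out false
if (c == '<')
{
    istag = true;
}

if (c == '>' && istag == true)
{
    amounttag++;        // a complete tag has been read
    istag = false;
}

if (c == '!')
{
    iscomment = true;
}

if (c == '-' && iscomment == true)
{
    amountcomment++;
    amounttag--;        // a comment should not be counted as a tag
    iscomment = false;
}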

Works a treat :) Thanks for all your help.

I think I am doing the same assignment as you. Don't forget there may be an exclamation mark in normal text, such as "Hello!". You would be better off saying: if a '<' comes immediately before the '!', then add 1 to the comment count.
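One way to act on that advice is to search for the whole comment opener "<!--" rather than a lone '!'. A minimal sketch, assuming the text has already been read into a string; count_comments is a hypothetical name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts HTML comments by looking for "<!--" and
// then skipping ahead to the matching "-->".
std::size_t count_comments(const std::string& text)
{
    std::size_t comments = 0;
    std::string::size_type pos = 0;
    while ((pos = text.find("<!--", pos)) != std::string::npos)
    {
        ++comments;
        std::string::size_type end = text.find("-->", pos + 4);
        if (end == std::string::npos)
            break;          // unterminated comment: stop scanning
        pos = end + 3;      // continue after the end of this comment
    }
    return comments;
}
```

An exclamation mark in ordinary text, as in "Hello!", is never counted here because it is not preceded by '<' and followed by "--".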