Hi guys.

I'm doing an assignment. I am getting on OK with it, but I am really stuck at the moment.

The part I am stuck on involves counting HTML tags in a text file. I have thought of a method for doing this, but unfortunately I have no idea how to implement it in code. My idea is to look for the start symbol of the tag (<), then the contents, then the end symbol (>). Does anyone know how I could do this in C++?

Many thanks

Mat

You could just go through the file, and when your program comes across a '<' it should ignore everything after it until it comes across a '>'. At that point you count one tag.

Probably something like this:

std::string str = "<html>";
std::string::size_type open = str.find('<');
if (open != std::string::npos && str.find('>', open) != std::string::npos)
{
    // most likely an html tag: a '<' followed later by a '>'
}

Dragon, I had a stab at using your technique, but it reports a seemingly random number of tags in the output. :(

// Assignment program:
// read a file and copy it to another,
// count the number of characters, lines, comments and tags,
// change XHTML tags from upper case to lower case,
// place the result in a new file.

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <iomanip>
#include <cctype>
using namespace std;

int main()
{
    string file1, file2;
    string str = "<>";
    ifstream ipfile;
    ofstream opfile;
    char c;

    int amountline = 0;
    int amountcha = 0;
    int amounttag = 0;
    int amountcomment = 0;

    cout << "Please enter the name of the file you wish to check" << endl;
    cin >> file1;

    ipfile.open(file1.c_str());

    if (!ipfile.is_open())
    {
        cout << "Oops! Couldn't open " << file1 << "!\n" << endl;
        return 1;
    }

    cout << " Please enter the file you wish the edited contents to be copied to" << endl;
    cout << " This will be created if it does not already exist" << endl;
    cin >> file2;

    opfile.open(file2.c_str());

    while (!ipfile.eof())
    {
        ipfile.get(c);
        opfile << c;

        if (c != '\n' && !ipfile.eof() && c != ' ')
        {
            amountcha++;
        }
        if (c == '\n')
        {
            amountline++;
        }

        if (str.find("<") != string::npos && str.find(">") != string::npos)
        {
            amounttag++;
        }

        if (c == '!')
        {
            amountcomment++;
        }
    }

    cout << " This file contains: " << amountline << " lines" << endl;
    cout << " This file contains: " << amountcha << " characters" << endl;
    cout << " This file has: " << amountcomment << " comments" << endl;
    cout << " This file contains: " << amounttag << " tags" << endl;

    cout << " Copy complete, edited code located in " << file2 << endl;

    return 0;
}

That's all my code so far.

Why not just use getline() to read an entire line at one time?

std::string line;
while( getline(ipfile, line) )
{
   // blabla
}

I don't do HTML coding, but I think any given tag must be on one line (for example, "<html>" cannot be split between lines), so it doesn't make sense to read the HTML file one character at a time.

Use getline() to read an entire line that is terminated with '\n'. Then use string::find() to look for the < and > characters, as shown in the previous example code.

Also note that ipfile.eof() is not needed in my loop because the loop stops on error or end-of-file.

You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

That sounds like the hard way. strtok() overwrites the delimiters in the buffer with null characters ('\0'), which might corrupt a std::string.
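
For what it's worth, the same splitting idea can be sketched without strtok() by using std::getline with '<' as the delimiter, which leaves the original string untouched. The function names below are made up, and counting only the pieces that contain a '>' is my own assumption about what should count as a tag.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every '<' without strtok().
std::vector<std::string> splitAtOpenBracket(const std::string& text)
{
    std::vector<std::string> pieces;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '<'))      // '<' is consumed, not stored
    {
        pieces.push_back(piece);
    }
    return pieces;
}

// Hypothetical helper: a piece that follows a '<' and contains a '>'
// is treated as one tag.
int countTagsBySplitting(const std::string& text)
{
    const std::vector<std::string> pieces = splitAtOpenBracket(text);
    int tags = 0;
    // pieces[0] is whatever came before the first '<', so skip it.
    for (std::size_t i = 1; i < pieces.size(); ++i)
    {
        if (pieces[i].find('>') != std::string::npos)
        {
            ++tags;
        }
    }
    return tags;
}
```

Taking size() of the vector directly would be off by one whenever there is text before the first '<', which is why the sketch checks each piece instead.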

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

Best not to mix C with C++. Naughty naughty :-0

In any case, all these suggestions are too naive. There are a few cases that will slip through, such as a '>' inside a quoted HTML attribute value.

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

So using regular expressions would be the best way to deal with this. However, if this is homework, your professor is unlikely to care.
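
A small sketch of the regular-expression route, assuming a compiler with C++11 std::regex. The pattern is my own simplified illustration and not the expression from the linked article. It treats a quoted attribute value as a unit so that a '>' inside quotes does not end the tag, but it makes no attempt to handle comments or scripts.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts tags with std::regex (C++11 or later).
// Quoted attribute values are matched as a whole, so a '>' inside
// quotes does not terminate the tag.
int countTagsWithRegex(const std::string& text)
{
    static const std::regex tag(R"re(<(?:"[^"]*"|'[^']*'|[^'">])+>)re");
    auto first = std::sregex_iterator(text.begin(), text.end(), tag);
    auto last  = std::sregex_iterator();
    return static_cast<int>(std::distance(first, last));
}
```

With this pattern, an input such as <a title="x>y">link</a> is counted as two tags, whereas the simple '<' then '>' scan would end the first tag at the '>' inside the quotes.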


I got it working, guys :) The code I used is below:

if (c == '<')
{
    istag = true;
}

if (c == '>' && istag == true)
{
    amounttag++;
    istag = false;
}

if (c == '!')
{
    iscomment = true;
}

if (c == '-' && iscomment == true)
{
    amountcomment++;
    amounttag--;
    iscomment = false;
}

Works a treat :) Thanks for all your help.

I think I am doing the same assignment as you. Don't forget there may be an exclamation mark in normal text, such as "Hello!". You would be better off checking whether a '<' comes immediately before the '!' and only then adding 1 to the comment count.
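
One possible way to sketch that check is to look for the full comment opener "<!--" rather than a lone '!'. The names MarkupCounts and countMarkup are invented for this example, and it assumes the whole file has already been read into a std::string. Anything else that starts with '<' and later has a '>' is counted as a tag, including things like <!DOCTYPE ...>.

```cpp
#include <string>

// Hypothetical result type for this sketch.
struct MarkupCounts
{
    int tags = 0;
    int comments = 0;
};

// Hypothetical helper: counts "<!-- ... -->" comments separately from tags.
MarkupCounts countMarkup(const std::string& text)
{
    MarkupCounts counts;
    std::string::size_type pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos)
    {
        if (text.compare(pos, 4, "<!--") == 0)
        {
            // A comment is only counted once its closing "-->" is found.
            std::string::size_type end = text.find("-->", pos + 4);
            if (end == std::string::npos)
            {
                break;
            }
            ++counts.comments;
            pos = end + 3;
        }
        else
        {
            std::string::size_type end = text.find('>', pos + 1);
            if (end == std::string::npos)
            {
                break;
            }
            ++counts.tags;
            pos = end + 1;
        }
    }
    return counts;
}
```

An exclamation mark or a dash in ordinary text never touches the comment count here, and there is no need to subtract from the tag count afterwards.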
