c++ count html tags

Question

mat1989 0 Newbie Poster

16 Years Ago

Hi guys.

I'm doing an assignment. I'm getting on OK with it, but I'm really stuck at the moment.

The part I am stuck on involves counting HTML tags in a text file. I have thought of a method for doing this, but I have no idea how to implement it in code. My idea is to look for the start symbol of the tag (<), then the contents, then the end symbol (>). Does anyone know how I could do this in C++?

Many thanks

Mat

c c# c++ html-css

7 Contributors
14 Replies
1K Views
3 Days Discussion Span
Latest Post 16 Years Ago by Xlphos

All 14 Replies

Ancient Dragon 5,243 Achieved Level 70

16 Years Ago

Probably something like this:

std::string str = "<html>";
if( str.find("<") != std::string::npos && str.find(">") != std::string::npos)
{
    // most likely an html tag
}

Ancient Dragon 5,243 Achieved Level 70

16 Years Ago

Why not just use getline() to read an entire line at a time?

std::string line;
while( getline(ipfile, line) )
{
   // blabla
}

I don't do HTML coding, but I think any given tag must be on one line (for example, "<html>" cannot be split between lines), so it doesn't make sense to read the HTML file one character at a time.

Ancient Dragon 5,243 Achieved Level 70

16 Years Ago

You use getline() to read an entire line that is terminated with '\n'. Then use string::find() to look for the < and > characters, as shown in the previous example code.

Also note that ipfile.eof() is not needed in my loop because the loop stops on error or end-of-file.
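
The getline() plus find() idea above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function names count_tags_in_line, count_tags and count_tags_in_text are made up for illustration, and it assumes (as the post does) that a tag never spans two lines. It also treats any '<' that is later followed by a '>' as a tag, so text such as "a < b > c" would be miscounted.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count "<...>" pairs within a single line using std::string::find.
// Assumption: a tag never spans two lines.
std::size_t count_tags_in_line(const std::string& line)
{
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = line.find('<', pos)) != std::string::npos) {
        std::size_t close = line.find('>', pos + 1);
        if (close == std::string::npos) {
            break;          // unmatched '<': nothing more to count on this line
        }
        ++count;
        pos = close + 1;    // resume searching after the closing '>'
    }
    return count;
}

// Read the stream line by line and total the tags found on each line.
std::size_t count_tags(std::istream& in)
{
    std::size_t total = 0;
    std::string line;
    while (std::getline(in, line)) {    // loop ends on end-of-file or error
        total += count_tags_in_line(line);
    }
    return total;
}

// Convenience wrapper (hypothetical) for trying the idea on an in-memory string.
std::size_t count_tags_in_text(const std::string& text)
{
    std::istringstream in(text);
    return count_tags(in);
}
```

With an open std::ifstream such as ipfile, calling count_tags(ipfile) would return the total for the whole file.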

Comatose 290 Taboo Programmer

16 Years Ago

You could tokenize (strtok) by "<" and stick each piece into a vector or similar, then get its size().
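
A rough sketch of the tokenizing idea, for illustration only. It swaps strtok for std::getline with '<' as the delimiter, so nothing is written into the string being scanned. The names split_after_open_bracket and count_tags_by_splitting are hypothetical, and the extra check for '>' in each piece is an assumption added here so that a stray '<' in ordinary text is not counted.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text at every '<' and keep the pieces that follow one.
// std::getline with a custom delimiter stands in for strtok here.
std::vector<std::string> split_after_open_bracket(const std::string& text)
{
    std::vector<std::string> pieces;
    std::istringstream in(text);
    std::string piece;

    std::getline(in, piece, '<');       // discard whatever precedes the first '<'
    while (std::getline(in, piece, '<')) {
        pieces.push_back(piece);        // each piece started right after a '<'
    }
    return pieces;
}

// Count the pieces that also contain a closing '>'.
std::size_t count_tags_by_splitting(const std::string& text)
{
    std::size_t count = 0;
    for (const std::string& piece : split_after_open_bracket(text)) {
        if (piece.find('>') != std::string::npos) {
            ++count;    // a '>' closes the tag opened by the preceding '<'
        }
    }
    return count;
}
```

This needs the whole text in memory as one string, which is one reason the character-by-character approaches in this thread may be simpler for a file.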

tux4life 2,072 Postaholic · Answer 1 · 2009-04-06T03:11:11+00:00

You could just go through the file, and when your program comes across a '<' it should ignore everything after it until it comes across a '>'; at that point you count one tag ...
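
The scan described above could look roughly like this. It is a minimal sketch, not a full HTML parser: the names count_tags_streaming and count_tags_streaming_text are invented for illustration, and a '>' inside a quoted attribute value would still end the tag early.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read one character at a time; remember whether a '<' has been seen
// and count a tag when the matching '>' arrives.
std::size_t count_tags_streaming(std::istream& in)
{
    std::size_t count = 0;
    bool in_tag = false;
    char c;
    while (in.get(c)) {             // stops cleanly at end-of-file
        if (c == '<') {
            in_tag = true;          // ignore everything until the next '>'
        } else if (c == '>' && in_tag) {
            ++count;
            in_tag = false;
        }
    }
    return count;
}

// Convenience wrapper (hypothetical) for trying the scan on a string.
std::size_t count_tags_streaming_text(const std::string& text)
{
    std::istringstream in(text);
    return count_tags_streaming(in);
}
```

Because the state is carried across characters rather than lines, a tag that is split over two lines is still counted once.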

mat1989 0 Newbie Poster · Answer 2 · 2009-04-06T03:18:32+00:00

Thanks a lot guys :) I'll try these ideas out.

mat1989 0 Newbie Poster · Answer 3 · 2009-04-06T03:46:09+00:00

Dragon, I had a stab at using your technique, but it returns a random number of tags in the output. :(

MosaicFuneral 812 Nearly a Posting Virtuoso · Answer 4 · 2009-04-06T03:56:58+00:00

Show us the code, with CODE TAGS.

mat1989 0 Newbie Poster · Answer 5 · 2009-04-06T04:00:19+00:00

// assignment program
// read file and copy to another
// count amount of characters, lines, comments and tags
// change XHTML tags from upper case to lower case
// place in new file

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <iomanip>
#include <cctype>
using namespace std;

int main()
{
    string file1, file2;
    string str = "<>";
    ifstream ipfile;
    ofstream opfile;
    char c;

    int amountline = 0;
    int amountcha = 0;
    int amounttag = 0;
    int amountcomment = 0;

    cout << "Please enter the name of the file you wish to check" << endl;
    cin >> file1;

    ipfile.open(file1.c_str());

    if (!ipfile.is_open())
    {
        cout << "Oops! Couldn't open " << file1 << "!\n" << endl;
        return 1;
    }

    cout << " Please enter the file you wish the edited contents to be copied to" << endl;
    cout << " This will be created if it does not already exist" << endl;
    cin >> file2;

    opfile.open(file2.c_str());

    while (!ipfile.eof())
    {
        ipfile.get(c);
        opfile << c;

        if (c != '\n' && !ipfile.eof() && c != ' ')
        {
            amountcha++;
        }
        if (c == '\n')
        {
            amountline++;
        }

        if (str.find("<") != string::npos && str.find(">") != string::npos)
        {
            amounttag++;
        }

        if (c == '!')
        {
            amountcomment++;
        }
    }

    cout << " This file contains : " << amountline << " lines" << endl;
    cout << " This file contains : " << amountcha << " characters" << endl;
    cout << " This file has : " << amountcomment << " comments" << endl;
    cout << " This file contains : " << amounttag << " tags" << endl;

    cout << " Copy complete, edited code located in " << file2 << endl;

    return 0;
}

That's all my code so far.

mat1989 0 Newbie Poster · Answer 6 · 2009-04-06T04:17:12+00:00

Can I use getline to search for HTML tags in the line?

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 7 · 2009-04-06T04:31:23+00:00

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

That sounds like the hard way. strtok() writes null characters ('\0') into the buffer it tokenizes, which might corrupt a std::string.

iamthwee · Answer 8 · 2009-04-06T04:38:26+00:00

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

Best not to mix C with C++. Naughty naughty :-0

In any case, all these suggestions are too naive. There are a few cases that will slip through the net (HTML attributes, for example).

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

So using regular expressions would be the best way to deal with this; however, if this is homework, your professor is unlikely to care.
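
For illustration, a regular-expression version might look something like the sketch below. It uses std::regex, which needs C++11 or later. The pattern is an assumption written for this example and is not the one from the linked article: it skips over quoted attribute values so that a '>' or '<' inside quotes does not end or start a tag. The name count_tags_regex is hypothetical. Note that comments such as <!-- ... --> also match this pattern and are counted as tags.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count matches of a tag-like pattern: '<', then any mix of quoted
// strings and characters that are not quotes or '>', then '>'.
std::size_t count_tags_regex(const std::string& text)
{
    static const std::regex tag_pattern(
        R"rx(<(?:"[^"]*"|'[^']*'|[^'">])+>)rx");

    auto first = std::sregex_iterator(text.begin(), text.end(), tag_pattern);
    auto last = std::sregex_iterator();     // default-constructed end marker
    return static_cast<std::size_t>(std::distance(first, last));
}
```

A simple '<' to '>' scan would count an extra tag in input like <a title="> <">, while this pattern treats the quoted value as part of one tag.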

mat1989 0 Newbie Poster · Answer 9 · 2009-04-06T21:50:27+00:00

I got it working, guys :) The code I used is below:

if (c == '<')
{
    istag = true;
}

if (c == '>' && istag == true)
{
    amounttag++;
    istag = false;
}

if (c == '!')
{
    iscomment = true;
}

if (c == '-' && iscomment == true)
{
    amountcomment++;
    amounttag--;
    iscomment = false;
}

Works a treat :) Thanks for all your help.

Xlphos 16 Veteran Poster · Answer 10 · 2009-04-09T20:04:00+00:00

I think I am doing the same assignment as you. Don't forget there may be an exclamation mark in normal text, e.g. "Hello!". You would be better off saying: if '<' precedes '!', then add 1 to the comment count.
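
One way to sketch that check is shown below. The MarkupCounts struct and the count_tags_and_comments function are hypothetical names made up for this example. As a further assumption, it looks for the full "<!--" opener rather than just "<!", so a declaration such as <!DOCTYPE html> is counted as a tag and not as a comment, and everything up to the closing "-->" is skipped.

```cpp
#include <cstddef>
#include <string>

// Tally of what was found in the text (hypothetical helper type).
struct MarkupCounts {
    std::size_t tags = 0;
    std::size_t comments = 0;
};

// Only treat '!' as part of a comment when it directly follows '<'
// and is itself followed by "--".
MarkupCounts count_tags_and_comments(const std::string& text)
{
    MarkupCounts counts;
    std::size_t pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        if (text.compare(pos, 4, "<!--") == 0) {
            std::size_t end = text.find("-->", pos + 4);
            if (end == std::string::npos) {
                break;              // unterminated comment: stop counting
            }
            ++counts.comments;
            pos = end + 3;          // continue after "-->"
        } else {
            std::size_t end = text.find('>', pos + 1);
            if (end == std::string::npos) {
                break;              // unmatched '<': stop counting
            }
            ++counts.tags;
            pos = end + 1;          // continue after '>'
        }
    }
    return counts;
}
```

With this approach the '!' in ordinary text such as "Hello!" is never looked at, because only characters that follow a '<' are examined.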
