c++ count html tags

Question

mat1989 0 Newbie Poster

15 Years Ago

Hi guys.

I'm doing an assignment and getting on OK with it, but I'm really stuck at the moment.

The part I'm stuck on involves counting HTML tags in a text file. I have thought of a method for doing this, but I have no idea how to implement it in code. My idea is to look for the start symbol of the tag (<), then the contents, then the end symbol (>). Does anyone know how I could do this in C++?

Many thanks

Mat

c++

7 Contributors
14 Replies
995 Views
3 Days Discussion Span
Latest Post 15 Years Ago by Xlphos

All 14 Replies

mvmalderen 2,072 Postaholic

15 Years Ago

You could just go through the file character by character: when your program comes across a '<', it ignores everything after it until it comes across a '>', and at that point it counts one tag ...
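
A minimal sketch of that scan, assuming the input is available as a std::istream. The name count_tags is hypothetical and not from the thread, and the sketch makes no attempt to handle a '>' inside a quoted attribute value:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts '<' ... '>' pairs seen in a stream.
std::size_t count_tags(std::istream& in)
{
    std::size_t tags = 0;
    bool in_tag = false;          // true after a '<' until the matching '>'
    char c;
    while (in.get(c)) {           // the loop ends cleanly at end-of-file
        if (c == '<') {
            in_tag = true;
        } else if (c == '>' && in_tag) {
            ++tags;
            in_tag = false;
        }
    }
    return tags;
}

// Convenience overload for trying the idea on an in-memory string.
std::size_t count_tags(const std::string& text)
{
    std::istringstream in(text);
    return count_tags(in);
}
```

An opened std::ifstream can be passed to the first overload directly. Because the scan works on characters rather than lines, a tag that is split across two lines is still counted once.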

Ancient Dragon 5,243 Achieved Level 70

15 Years Ago

Probably something like this:

std::string str = "<html>";
if( str.find("<") != std::string::npos && str.find(">") != std::string::npos)
{
    // most likely an html tag
}

MosaicFuneral 812 Nearly a Posting Virtuoso

15 Years Ago

Show us the code, with CODE TAGS.

Ancient Dragon 5,243 Achieved Level 70

15 Years Ago

Why not just use getline() to read an entire line at one time?

std::string line;
while( getline(ipfile, line) )
{
   // blabla
}

I don't do HTML coding, but I think any given tag must be on one line (for example, "<html>" cannot be split between lines), so it doesn't make sense to read the HTML file one character at a time.


mat1989 0 Newbie Poster · Answer 1 · 2009-04-06T03:18:32+00:00

Thanks a lot, guys :) I'll try these ideas out.

mat1989 0 Newbie Poster · Answer 2 · 2009-04-06T03:46:09+00:00

Dragon, I had a stab at using your technique, but it returns a random number of tags in the output. :(

mat1989 0 Newbie Poster · Answer 3 · 2009-04-06T04:00:19+00:00

// assignment program
// read a file and copy it to another
// count the number of characters, lines, comments and tags
// change XHTML tags from upper case to lower case
// place the result in a new file

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <iomanip>
#include <cctype>
using namespace std;

int main()
{
    string file1, file2;
    string str = "<>";
    ifstream ipfile;
    ofstream opfile;
    char c;

    int amountline = 0;
    int amountcha = 0;
    int amounttag = 0;
    int amountcomment = 0;

    cout << "Please enter the name of the file you wish to check" << endl;
    cin >> file1;

    ipfile.open(file1.c_str());

    if (!ipfile.is_open())
    {
        cout << "Oops! Couldn't open " << file1 << "!\n" << endl;
        return 1;
    }

    cout << " Please enter the file you wish the edited contents to be copied to" << endl;
    cout << " This will be created if it does not already exist" << endl;
    cin >> file2;

    opfile.open(file2.c_str());

    while (!ipfile.eof())
    {
        ipfile.get(c);
        opfile << c;

        if (c != '\n' && !ipfile.eof() && c != ' ')
        {
            amountcha++;
        }
        if (c == '\n')
        {
            amountline++;
        }

        // str is always "<>", so this test is true for every character read
        if (str.find("<") != string::npos && str.find(">") != string::npos)
        {
            amounttag++;
        }

        if (c == '!')
        {
            amountcomment++;
        }
    }

    cout << " This file contains : " << amountline << " lines" << endl;
    cout << " This file contains : " << amountcha << " characters" << endl;
    cout << " This file has : " << amountcomment << " comments" << endl;
    cout << " This file contains : " << amounttag << " tags" << endl;

    cout << " Copy complete, edited code located in " << file2 << endl;

    return 0;
}

That's all my code so far.

mat1989 0 Newbie Poster · Answer 4 · 2009-04-06T04:17:12+00:00

So I can use getline() to search for HTML tags in a line?

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 5 · 2009-04-06T04:18:35+00:00

You use getline() to read an entire line that is terminated with '\n'. Then use string::find() to look for the < and > characters, as shown in the previous example code.

Also note that ipfile.eof() is not needed in my loop, because the loop stops on error or end-of-file.
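
As a rough sketch of the getline() plus find() approach: the helper below (count_tags_by_line is a made-up name) searches each line repeatedly, so several tags on one line are all counted. It assumes every tag opens and closes on the same line. HTML does allow line breaks between the attributes inside a tag, so a tag that is wrapped across lines would be missed by this version.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts "<...>" pairs that open and close on one line.
std::size_t count_tags_by_line(std::istream& in)
{
    std::size_t tags = 0;
    std::string line;
    while (std::getline(in, line)) {          // stops on error or end-of-file
        std::string::size_type pos = 0;
        while (true) {
            std::string::size_type open = line.find('<', pos);
            if (open == std::string::npos) break;
            std::string::size_type close = line.find('>', open + 1);
            if (close == std::string::npos) break;
            ++tags;
            pos = close + 1;                  // continue after this tag
        }
    }
    return tags;
}
```

Searching for '>' only after the position of '<' avoids counting a line such as "a > b < c" as a tag.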

Comatose 290 Taboo Programmer Team Colleague · Answer 6 · 2009-04-06T04:28:19+00:00

You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 7 · 2009-04-06T04:31:23+00:00

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

That sounds like the hard way. strtok() writes null characters ('\0') into the buffer it scans, which might corrupt a std::string.
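
For what it's worth, the same splitting idea can be sketched without strtok() by using std::getline() with '<' as the delimiter. The function names here are hypothetical. Note that the vector's size() alone is not the tag count: the first piece is whatever text came before the first '<', and a piece only looks like a tag if it also contains a '>'.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every '<' (the '<' itself is dropped).
std::vector<std::string> split_on_open_bracket(const std::string& text)
{
    std::vector<std::string> pieces;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '<')) {
        pieces.push_back(piece);
    }
    return pieces;
}

// Counts the pieces that followed a '<' and also contain a '>'.
std::size_t count_tag_pieces(const std::string& text)
{
    std::vector<std::string> pieces = split_on_open_bracket(text);
    std::size_t tags = 0;
    // pieces[0] is the text before the first '<', so it is skipped.
    for (std::size_t i = 1; i < pieces.size(); ++i) {
        if (pieces[i].find('>') != std::string::npos) {
            ++tags;
        }
    }
    return tags;
}
```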

iamthwee · Answer 8 · 2009-04-06T04:38:26+00:00

>You could tokenize (strtok) by "<" and stick each piece into a vector or so... then get its size().

Best not to mix C with C++. Naughty naughty :-0

In any case, all these suggestions are too naive. There are a few cases that will slip through the net (HTML attributes, for example).

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

So using regular expressions would be the best way to deal with this. However, if this is homework, your professor is unlikely to care.
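
A small sketch of the regular-expression route, assuming a compiler with C++11's <regex> (which postdates this thread). The pattern below is a deliberately simple one chosen for illustration, not the one from the linked article, and it is still fooled by a '>' inside a quoted attribute value. The function name is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts matches of a simple "<...>" pattern.
std::size_t count_tags_regex(const std::string& text)
{
    // '<', then one or more characters that are not '>', then '>'.
    static const std::regex tag_pattern("<[^>]+>");
    std::sregex_iterator first(text.begin(), text.end(), tag_pattern);
    std::sregex_iterator last;    // a default-constructed iterator marks the end
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Since the character class [^>] also matches newlines, a tag wrapped across lines is counted when the whole file is passed in as one string.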

mat1989 0 Newbie Poster · Answer 9 · 2009-04-06T21:50:27+00:00

I got it working, guys :) The code I used is below:

if (c == '<')
{
    istag = true;
}

if (c == '>' && istag == true)
{
    amounttag++;
    istag = false;
}

if (c == '!')
{
    iscomment = true;
}

if (c == '-' && iscomment == true)
{
    amountcomment++;
    amounttag--;
    iscomment = false;
}

Works a treat :) Thanks for all your help.

Xlphos 16 Veteran Poster · Answer 10 · 2009-04-09T20:04:00+00:00

I think I am doing the same assignment as you. Don't forget there may be an exclamation mark in normal text, such as "Hello!". You would be better off saying: if '<' comes immediately before '!', then add 1 to the comment count.
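
One way to sketch that check, assuming the whole file has already been read into a std::string: treat the four-character sequence "<!--" as the start of a comment and skip to the closing "-->", so that a '!' or '-' in ordinary text is never mistaken for one. MarkupCounts and count_markup are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: tags and comments are counted separately.
struct MarkupCounts {
    std::size_t tags;
    std::size_t comments;
};

MarkupCounts count_markup(const std::string& text)
{
    MarkupCounts counts = {0, 0};
    std::string::size_type pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        if (text.compare(pos, 4, "<!--") == 0) {
            // Comment: skip to the closing "-->" without counting a tag.
            std::string::size_type end = text.find("-->", pos + 4);
            if (end == std::string::npos) break;   // unterminated comment
            ++counts.comments;
            pos = end + 3;
        } else {
            // Anything else that starts with '<' and reaches a '>' is a tag.
            std::string::size_type end = text.find('>', pos + 1);
            if (end == std::string::npos) break;   // unterminated tag
            ++counts.tags;
            pos = end + 1;
        }
    }
    return counts;
}
```

With this version, "Hello!" in body text and a lone '-' inside a comment do not affect either count.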
