Hi there,

I am relatively new to C++ and was hoping someone could give me some guidance. I have a .txt file that contains a heap of HTML, and I want to extract small pieces of dynamic text from different places in it. For example:

.txt file before filter
HTTP/1.1 200 OK
Content-Length: 48547
Content-Type: text/html; charset=UTF-8
Date: Sat, 31 Oct 2009 20:00:33 GMT
Content-Language: en-UK
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

.txt file after filter
HTTP
Date
PUBLIC

So far I've retrieved the stream of data and stored it in a vector. I'm just not sure how to filter down the .txt file from here.

ofstream txt("test.txt", ios::app);

while (true) {
    string l = s.ReceiveLine();
    if (l.empty()) break;   // stop once nothing more is received
    cout << l;

    txt << l;               // feed output into .txt

    vStream.push_back(l);   // feed stream into vector
}

Any help is much appreciated!

All 3 Replies

And what is the actual problem?
You can do this in two ways.
1. Use regular expressions.
2. Write a simple parser yourself.
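
For option 2, here is a minimal sketch of what a hand-written "parser" could look like, assuming the text you want always sits between two known markers on a single line. The helper name extract_between is made up for illustration; it only uses std::string::find and substr.

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns the text found
// between the markers `open` and `close` in `line`, or an empty string if
// either marker is missing.
std::string extract_between(const std::string& line,
                            const std::string& open,
                            const std::string& close) {
    // Locate the opening marker.
    std::string::size_type start = line.find(open);
    if (start == std::string::npos) return "";
    start += open.size();  // skip past the marker itself

    // Locate the closing marker, searching only after the opening one.
    std::string::size_type end = line.find(close, start);
    if (end == std::string::npos) return "";

    return line.substr(start, end - start);
}
```

You could call this on each string in vStream, for example extract_between(line, "class=\"name\">", "</td>"), and keep the non-empty results. It will not cope with markers split across lines, so treat it as a starting point only.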

Bah, I'm struggling with regex!

I want to grab
<td scope="row" class="name">this text</td>

So far I've got: \w[A-Z\<]

But this gives me the < as well!
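
One way to avoid picking up the < is a capture group: match the surrounding tags, but only keep the part in parentheses. The following is a rough sketch using std::regex (C++11), assuming the cell contains plain text with no nested tags. The function name extract_name_cells is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collects the text inside
// every <td ... class="name">...</td> cell found in `html`.
std::vector<std::string> extract_name_cells(const std::string& html) {
    // [^>]*   skips any other attributes inside the opening tag
    // ([^<]*) captures everything up to, but not including, the next '<'
    static const std::regex cell("<td[^>]*class=\"name\"[^>]*>([^<]*)</td>");

    std::vector<std::string> out;
    for (std::sregex_iterator it(html.begin(), html.end(), cell), end;
         it != end; ++it) {
        out.push_back((*it)[1].str());  // group 1 is the cell text only
    }
    return out;
}
```

Because the tags sit outside the parentheses, they take part in the match but are not part of what gets returned. Regex on HTML is fragile in general, so this is only reasonable when the markup is as regular as in the example above.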

I managed to get close enough using

"[A-Z][A-Z]{3}<\\/t|[A-Z]{3}<\\/t|[A-Z\\.]{3}<\\/t"

Thanks
