Hi there,

I am relatively new to C++ and was hoping someone could provide me with some guidance. I have a .txt file that contains a heap of HTML, and I wish to extract small portions of dynamic text from different places in it. For example:

.txt file before filter
HTTP/1.1 200 OK
Content-Length: 48547
Content-Type: text/html; charset=UTF-8
Date: Sat, 31 Oct 2009 20:00:33 GMT
Content-Language: en-UK
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

.txt file after filter
HTTP
Date
PUBLIC

So far I've retrieved the stream of data and am storing it in a vector. I'm just not sure how to filter down the .txt file from here.

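// s (the socket) and vStream (the vector) are declared elsewhere
ofstream txt("test.txt", ios::app);

while (true) {
    string l = s.ReceiveLine();
    if (l.empty()) break; // stop once no more data arrives
    cout << l;

    txt << l; // feed output into .txt

    // feed stream into vector
    vStream.push_back(l);
}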

Any help is much appreciated!

0

And what is the actual problem?
You can do this in two ways.
1. Use regular expressions.
2. Write a simple parser yourself.
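
For the second option, a minimal sketch of a hand-written "parser" might look like the following. It only searches each line for a pair of markers with std::string::find and copies what sits between them. The helper name extractBetween and the markers are assumptions for illustration, not something from the original post.

```cpp
#include <string>

// Hypothetical helper (not from the thread): copies the text between the
// first `open` marker and the next `close` marker into `out`.
// Returns false if either marker is missing from the line.
bool extractBetween(const std::string& line, const std::string& open,
                    const std::string& close, std::string& out) {
    // Locate the opening marker.
    const std::string::size_type start = line.find(open);
    if (start == std::string::npos) return false;

    // Search for the closing marker only after the opening one.
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = line.find(close, from);
    if (end == std::string::npos) return false;

    out = line.substr(from, end - from);
    return true;
}
```

Calling it on every element of the vector with, say, `PUBLIC "` as the opening marker and `"` as the closing marker would pick out the quoted DTD identifier from the DOCTYPE line and skip lines that have no such marker. This works for fixed markers, but real HTML varies too much for it to be robust.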

0

Bah, I'm struggling with regex!

I want to grab
<td scope="row" class="name">this text</td>

So far I've got: \w[A-Z\<]

But this gives me the < as well!
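
One way to avoid picking up the < is to match the surrounding tags but wrap only the wanted text in a capture group, then read the group instead of the whole match. Below is a rough sketch using std::regex from C++11. The function name extractNames is made up for illustration, and it assumes the tag attributes always appear exactly as in the line quoted above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collects the text of every
// <td scope="row" class="name">...</td> cell found in html.
std::vector<std::string> extractNames(const std::string& html) {
    // ([^<]*) captures everything up to, but not including, the next '<'.
    static const std::regex cell(
        R"rx(<td scope="row" class="name">([^<]*)</td>)rx");

    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), cell), end;
         it != end; ++it) {
        names.push_back((*it)[1].str()); // group 1 = text between the tags
    }
    return names;
}
```

Group 0 is the whole match including the tags, while group 1 holds only the cell text, so the < never ends up in the result. If the attributes can change order or spacing, the pattern would need loosening, and a proper HTML parser is probably the safer choice at that point.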

0

I managed to get close enough using

"[A-Z][A-Z]{3}<\\/t|[A-Z]{3}<\\/t|[A-Z\\.]{3}<\\/t"

Thanks
