I started with curl and now I'm able to post form data, fetch HTML pages, and all that basic stuff. But I'm unable to analyse individual data in the obtained webpage, like filtering out all the images, videos, and things like that. I can do that in JavaScript, but that can only be run from a browser, so I want a way to analyse a webpage without one.
I searched around the web and found that this is called parsing. Can someone explain how it works, and whether individual elements in a webpage can be obtained using a parser, or have I misunderstood it? Is there any other way to do this using C++ alone?

Thanks in advance.

At last I found a standard library that has good forum support and suits my job: libxml2 seems to be the best option for this. With libcurl coupled with libxml2, I was able to write code for downloading wallpapers from a site. It has good functionality and documentation support.

I searched around the web and I found it was called parsing. Can someone explain how it works, and if individual elements in a webpage can be obtained using a parser or

Yes, this can easily be done, and there are a number of techniques for it. At the moment I am working on a search engine in C/C++ that includes a spider which fetches the HTML code from a server and parses it for all the links.

The technique I follow is what I learned in compiler construction: design a transition diagram and then code it, reading the HTML character by character. At the same time you have states; each character takes you to a specific state, so you always know what character and word you have just read.
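A toy version of that transition-diagram idea, as one possible reading of the description above (the state names and the `extract_hrefs` function are my own, not the poster's actual spider code): three states track whether we are in plain text, inside a tag, or inside a quoted attribute value, and every character causes one transition.

```cpp
#include <string>
#include <vector>

// Character-by-character scan of HTML driven by a small state machine,
// collecting every href="..." value it encounters.
std::vector<std::string> extract_hrefs(const std::string& html) {
    enum State { TEXT, TAG, ATTR_VALUE };
    State state = TEXT;
    std::string tag_buf;   // characters seen so far inside the current tag
    std::string value;     // attribute value currently being read
    std::vector<std::string> hrefs;

    for (char c : html) {
        switch (state) {
        case TEXT:
            if (c == '<') { state = TAG; tag_buf.clear(); }
            break;
        case TAG:
            if (c == '>') {
                state = TEXT;              // tag finished
            } else if (c == '"') {
                state = ATTR_VALUE;        // quoted attribute value begins
                value.clear();
            } else {
                tag_buf += c;
            }
            break;
        case ATTR_VALUE:
            if (c == '"') {
                // Value finished: keep it only if the text just before the
                // opening quote was href=
                const std::string key = "href=";
                if (tag_buf.size() >= key.size() &&
                    tag_buf.compare(tag_buf.size() - key.size(),
                                    key.size(), key) == 0)
                    hrefs.push_back(value);
                state = TAG;
            } else {
                value += c;
            }
            break;
        }
    }
    return hrefs;
}
```

Because every character is examined exactly once, this preserves the order in which links appear on the page, which matters for a spider's crawl queue.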

However, where I don't need such precision I use the searching technique, and where I do need precision and need to know the order of the elements I use the character-by-character scan.
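The "searching technique" mentioned above presumably means jumping straight to substrings of interest rather than scanning every character. A sketch of that interpretation using `std::string::find` (the `find_srcs` name is mine): repeatedly locate the next `src="` marker and slice out the value up to the closing quote.

```cpp
#include <string>
#include <vector>

// Substring-search approach: skip directly from one src="..." occurrence
// to the next instead of examining every character with a state machine.
std::vector<std::string> find_srcs(const std::string& html) {
    std::vector<std::string> srcs;
    const std::string key = "src=\"";
    std::string::size_type pos = 0;
    while ((pos = html.find(key, pos)) != std::string::npos) {
        pos += key.size();                        // jump past src="
        std::string::size_type end = html.find('"', pos);
        if (end == std::string::npos) break;      // unterminated attribute
        srcs.push_back(html.substr(pos, end - pos));
        pos = end + 1;                            // resume after closing quote
    }
    return srcs;
}
```

It is less precise than the state machine (it cannot tell which tag an attribute belongs to, or whether it is inside a comment), but for a quick grab of image or video URLs it is much less code.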
