I started with curl and can now post form data, fetch HTML pages and do all the basic stuff. But I can't analyse individual pieces of data in the fetched page, such as filtering out all the images, videos and things like that. I can do that in JavaScript, but that can only be run from a browser, so I want a way to analyse a web page without a browser.
I searched around the web and found that this is called parsing. Can someone explain how it works, and whether individual elements in a web page can be obtained using a parser, or have I misunderstood it? Is there any other way to do this using C++ alone?

Thanks in advance.

All 3 Replies

Here is boost-html that might help you

commented: Thanks!! I checked that one out too, but it doesn't have very good forum support, so I went with libxml2.

At last I found a standard library that has good forum support and suits my job. libxml2 seems to be the best option for this. With libcurl coupled with libxml2, I was able to write a program that downloads wallpapers from a site. It has good functionality and documentation support.

I searched around the web and I found it was called parsing. Can someone explain how it works, and if individual elements in a webpage can be obtained using a parser or

Yes, this can be done easily, and there are a number of techniques for it. At the moment I am developing a search engine in C/C++ that includes a spider, which gets the HTML code from a server and parses it for all the links.

The technique I follow is one I learned in compiler construction: design a transition diagram (a finite state machine), then code it so that it reads the HTML code character by character. As you read, you keep track of a current state, and each character either keeps you in that state or moves you to another one, so you always know which character and which word you have just read.
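To make the transition-diagram idea concrete, here is a minimal sketch of such a scanner in standard C++. It is not the spider's actual code: the state names and the function `extract_links` are made up for illustration, and it only collects `href` values from `<a>` tags. It assumes fairly tidy HTML, so it does not handle comments, script blocks, or whitespace around the `=` sign.

```cpp
#include <cctype>
#include <string>
#include <vector>

// States of a small hand-written scanner (hypothetical names, not from any library).
enum class State { Text, TagName, InTag, AttrName, BeforeValue, QuotedValue, BareValue };

// Reads the HTML one character at a time and returns the href of every <a> tag.
std::vector<std::string> extract_links(const std::string& html) {
    std::vector<std::string> links;
    State state = State::Text;
    std::string tag, attr, value;
    char quote = '\0';

    // Called when an attribute value is complete: keep it only for <a href=...>.
    auto finish_attr = [&]() {
        if (tag == "a" && attr == "href") links.push_back(value);
        attr.clear();
        value.clear();
    };

    for (char raw : html) {
        const unsigned char c = static_cast<unsigned char>(raw);
        const char lower = static_cast<char>(std::tolower(c));
        switch (state) {
        case State::Text:          // outside any tag
            if (raw == '<') { tag.clear(); state = State::TagName; }
            break;
        case State::TagName:       // collecting the tag name, lower-cased
            if (raw == '>') state = State::Text;
            else if (std::isspace(c)) state = State::InTag;
            else tag += lower;
            break;
        case State::InTag:         // between attributes
            if (raw == '>') state = State::Text;
            else if (!std::isspace(c) && raw != '/') { attr.assign(1, lower); state = State::AttrName; }
            break;
        case State::AttrName:      // collecting an attribute name
            if (raw == '=') state = State::BeforeValue;
            else if (raw == '>') { attr.clear(); state = State::Text; }
            else if (std::isspace(c)) { attr.clear(); state = State::InTag; }
            else attr += lower;
            break;
        case State::BeforeValue:   // just saw '=', deciding how the value is delimited
            if (raw == '"' || raw == '\'') { quote = raw; state = State::QuotedValue; }
            else if (raw == '>') { attr.clear(); state = State::Text; }
            else if (!std::isspace(c)) { value.assign(1, raw); state = State::BareValue; }
            break;
        case State::QuotedValue:   // inside "..." or '...'
            if (raw == quote) { finish_attr(); state = State::InTag; }
            else value += raw;
            break;
        case State::BareValue:     // unquoted value, ends at whitespace or '>'
            if (raw == '>') { finish_attr(); state = State::Text; }
            else if (std::isspace(c)) { finish_attr(); state = State::InTag; }
            else value += raw;
            break;
        }
    }
    return links;
}
```

Calling `extract_links` on the page text downloaded with curl returns the links in the order they appear, which is the main thing the character-by-character approach buys you over a plain search.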

However, where I don't need such precision I use a plain text search instead, and where I need precision and need to know the order of the elements I use the character-by-character scan.
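For comparison, the simpler search approach might look something like the sketch below, which pulls image URLs out of a page with `std::string::find`. The function name `find_image_sources` is hypothetical, and the sketch assumes lower-case `<img` tags with a double-quoted `src` attribute, so it is only suitable where that kind of shortcut is acceptable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the src value of each <img ...> tag found by plain substring search.
// Assumes lower-case tags and double-quoted attribute values.
std::vector<std::string> find_image_sources(const std::string& html) {
    std::vector<std::string> sources;
    const std::string tag_start = "<img";
    const std::string attr_start = "src=\"";
    std::size_t pos = 0;

    while ((pos = html.find(tag_start, pos)) != std::string::npos) {
        // The tag is taken to end at the next '>'.
        const std::size_t tag_end = html.find('>', pos);
        if (tag_end == std::string::npos) break;

        // Only accept a src attribute that lies inside this tag.
        const std::size_t attr = html.find(attr_start, pos);
        if (attr != std::string::npos && attr < tag_end) {
            const std::size_t begin = attr + attr_start.size();
            const std::size_t end = html.find('"', begin);
            if (end != std::string::npos && end < tag_end) {
                sources.push_back(html.substr(begin, end - begin));
            }
        }
        pos = tag_end + 1;  // continue searching after this tag
    }
    return sources;
}
```

This is much shorter than a state machine, but it knows nothing about the structure of the page, so odd quoting or attributes such as `data-src` can fool it.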
