Hi Guys,

I have been trying to port a C++ console application that crawls forums from the
URLs provided and stores the results in a database for further analysis.

Now, with very limited time, I have decided to replicate this in VB.NET, as I have come
across a few functions and classes that are much easier to use.

Since I will be replicating the C++ application, I will follow the same design, outlined below:

First: Initiate forum connect
Second: Get list of forums and information from database
Third: Read individual forum URL and associated information
Fourth: Instantiate and run crawler for forum
Fifth: Forums remaining? If yes, go back to step 3; otherwise continue
Sixth: Close connection
Seventh: Exit program
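
The steps above can be sketched in VB.NET roughly as follows. This is only an outline under assumptions: ForumInfo, LoadForums and CrawlForum are hypothetical names, the forum list is hard-coded where a real program would query the database, and the URL is a placeholder.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical record for one row of the forum table; the field names are assumptions.
Public Class ForumInfo
    Public Property Name As String
    Public Property Url As String
End Class

Module CrawlerProgram

    ' Stand-in for step 2. A real version would query the database instead.
    Function LoadForums() As List(Of ForumInfo)
        Return New List(Of ForumInfo) From {
            New ForumInfo With {.Name = "Example forum", .Url = "http://example.com/forum/1"}
        }
    End Function

    ' Stand-in for step 4. A real version would download and store the threads.
    Sub CrawlForum(forum As ForumInfo)
        Console.WriteLine("Crawling {0} at {1}", forum.Name, forum.Url)
    End Sub

    Sub Main()
        ' Step 1 would open the database connection here, ideally in a Using block
        ' so that step 6 (close connection) still happens if a crawl throws.
        Dim forums As List(Of ForumInfo) = LoadForums()   ' step 2

        For Each forum As ForumInfo In forums             ' steps 3 and 5
            Try
                CrawlForum(forum)                         ' step 4
            Catch ex As Exception
                ' One failing forum should not stop the remaining ones.
                Console.Error.WriteLine("Failed on {0}: {1}", forum.Url, ex.Message)
            End Try
        Next
        ' Steps 6 and 7: close the connection, then Main returns and the program exits.
    End Sub

End Module
```

A For Each loop covers the "forums remaining?" check implicitly, so no explicit jump back to step 3 is needed.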

Anyhow, I am looking for advice on how to download the contents of, say, a thread inside a forum and store them in a database, where they can be parsed for specific information.

Please advise.

Thanks

All 2 Replies

The .NET Framework has classes for making HTTP requests (the System.Net namespace), and an open source library, "Html Agility Pack", is available for parsing HTML.
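
A minimal sketch of that approach, assuming the Html Agility Pack NuGet package is referenced. The URL and the XPath expression are placeholders, since the real values depend on the forum's markup. WebClient is used here for brevity; newer .NET versions mark it obsolete in favour of HttpClient.

```vbnet
Imports System
Imports System.Net

Module ThreadDownloadSketch

    Sub Main()
        ' Placeholder URL; replace with a real thread address.
        Dim url As String = "http://example.com/forum/thread/1"

        ' Download the raw page source with System.Net.
        Dim html As String
        Using client As New WebClient()
            html = client.DownloadString(url)
        End Using

        ' Parse the markup with Html Agility Pack, which tolerates malformed HTML.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' The XPath below is an assumption about how posts are marked up.
        Dim posts As HtmlAgilityPack.HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='post']")

        ' SelectNodes returns Nothing when no node matches.
        If posts Is Nothing Then
            Console.WriteLine("No posts found")
            Return
        End If

        For Each post As HtmlAgilityPack.HtmlNode In posts
            ' InnerText drops the tags; this is the text you would store in the database.
            Console.WriteLine(post.InnerText.Trim())
        Next
    End Sub

End Module
```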

Thanks for your reply :)

I have no problem grabbing the source from a given website, but my main
concern is that some of the pages have XML in them, so I am not sure how
to design the parser to work with both HTML and XML. Any suggestions?
