HTML Parser...

JaedenRuiner 0 Newbie Poster

17 Years Ago

Well,

I'm not sure how other people do it, but eh, for some reason VB makes my task excessively difficult, because it is so inter-dependent and integrated.

Now, HtmlDocument is a document interface for...go figure...an HTML document. However, I can't use it directly. I can only get to it via a WebBrowser component, which is not only excessive, but doesn't work right. I create 5 WebBrowser components and browse to 5 different web pages asynchronously, but only 2 or 3 of them actually raise the attached "DocumentCompleted()" event. So even though all 5 pages have loaded, my code is never informed that they've loaded.

So, I try to create just an HtmlDocument, and using the HtmlDocument.Write() method try to create the document that way. But HtmlDocument doesn't have any constructors...so HOW DOES IT GET CREATED FOR WebBrowser???? As well, even when I use the HtmlDocument.Write() method, the page is loaded and in the browser, but there are no Links() or All() Elements listed in the document. Which defeats the purpose of the exercise.

That is all superfluous anyway, because basically I can't get VB to do what I want it to do, so I need to ask how to achieve my goal, and then rewrite the code the way VB wants it.
What I want, and I hope someone actually responds, because I really want to know how to achieve this aim:
1. Browse to Web Site: https://somepage.somedomain.com/serversidefile.cfm?param=X
2. Load the document into an HtmlDocument class
3. Parse the Table elements and Link Elements to gather information from the file.
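
As an illustration only, steps 2 and 3 amount to something like the sketch below: take the page text and pull out the links and table cells without any browser control involved. It is written in Python rather than VB purely to keep it short. The sample HTML and the names TableAndLinkParser and parse_page are made up for the example, and it assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableAndLinkParser(HTMLParser):
    """Collects every link href and every table cell's text from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []    # href of each <a> tag, in document order
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def parse_page(html_text):
    """Parse HTML text and return (links, table_rows)."""
    parser = TableAndLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links, parser.rows


# Hypothetical page content, standing in for the generated pages.
SAMPLE = """<html><body>
<table><tr><th>Name</th><th>Page</th></tr>
<tr><td>First</td><td><a href="item.cfm?param=1">open</a></td></tr></table>
</body></html>"""

links, rows = parse_page(SAMPLE)
```

The point of the sketch is that the parser works on plain text, so it does not matter whether that text came from an HTTP response or from a browser control.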

Now, Step 1 I can do via HttpWebRequest/HttpWebResponse, or via a WebBrowser.Navigate(), however the .Navigate() has been unreliable in letting me know it has loaded the document.

Step 3 is fully written and works perfectly as it is.

Step 2 - THIS IS THE BOTTLENECK.
As I said, I'm gathering a bunch of information automatically off of web pages. Let's run through my process.

Object1 - Inherits CollectionBase
Browses to a List.html file on a server, and loads it up.
Once loaded, it cycles through all of the Links() and creates the child objects from those links.

Object2 -
Parses the link passed to it to find some information, and then loads the page to find the remaining information for the object.
After the DocumentCompleted() event is triggered from its internal WebBrowser object, it calls out to the parent object via an event.

Object1 - Receives the events from its contained Object2 instances and fires an event for each one out to the main form, which owns the instance of Object1.

My problem is that each Object2 creates its own WebBrowser object and attaches a handler for DocumentCompleted, but even though the page has loaded and is in fact complete, the event doesn't always fire for all of them. On average I receive the event less than 60% of the time, and what makes it stranger is that ALL of these pages are identically formatted. They're generated HTML, so if the WebBrowser can load and parse one, it should be able to do so for all of them, but for some reason the event sequence is not 100% reliable, and I NEED it to be.

So I want either the WebBrowser component to load the page Synchronously (ie: I call WebBrowser.Navigate() and it doesn't return until the page is fully loaded and the WebBrowser.Document property is accessible) or to take a Direct HTTP stream and populate/load an HtmlDocument class off of the stream/text, which so far works with even less success than the WebBrowser methodology.
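
To show what I mean by a load that "doesn't return until the page is fully loaded", here is a rough sketch of the behaviour, again in Python only for brevity and not a claim about how WebBrowser works. fetch_text and load_all are hypothetical names, and the sketch assumes a plain HTTP download is acceptable in place of a browser control.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_text(url, timeout=30):
    """Blocking download: returns only once the whole body has been read."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def load_all(urls, fetch=fetch_text, workers=5):
    """Fetch every URL in parallel and return {url: text}.

    The call blocks until every page has either loaded or failed, so no
    completion can be silently lost; a failure is re-raised by result().
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        return {url: future.result() for url, future in futures.items()}
```

With something like this, 5 pages requested means either 5 results come back or an error is raised, which is the guarantee I am not getting from DocumentCompleted.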

Thanks
Jaeden "Sifo Dyas" al'Raec Ruiner
