
HTML tags to ListBox in Visual Basic .NET

pjns19

Hi all,

I'm looking to extract all HTML tags from a dump of HTML data and put them in a ListBox.

I currently have the following code.

It displays things like HTML, HEAD, TITLE and BODY.

But I want things like the IMG tags and their ALT text.

    ' Hypothetical Sub name and parameter; htmlText is assumed to hold the dump of HTML data.
    Private Sub ExtractTags(ByVal htmlText As String)
        ' Obtain the document interface
        Dim htmlDocument As mshtml.IHTMLDocument2 = DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        ' Construct the document from the raw HTML text, then close the write stream
        htmlDocument.write(htmlText)
        htmlDocument.close()
        ListBox1.Items.Clear()
        ListBox2.Items.Clear()
        ' Extract all elements
        Dim allElements As mshtml.IHTMLElementCollection = htmlDocument.all
        ' Iterate all the elements and display tag names
        For Each element As mshtml.IHTMLElement In allElements
            ListBox1.Items.Add(element.tagName)
        Next
        ' Extract all image elements
        Dim imgElements As mshtml.IHTMLElementCollection = htmlDocument.images
        ' Iterate through each image element and list its source path
        For Each img As mshtml.IHTMLImgElement In imgElements
            ListBox2.Items.Add(img.src)
        Next
    End Sub
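ALT is an attribute of an IMG element, not a tag of its own, so it will never appear among the tag names. A minimal sketch of reading it, assuming a project reference to the Microsoft.mshtml interop assembly and that the method lives inside the form class; `ListImageAltText` is a hypothetical name:

```vbnet
' Hypothetical helper: lists "src | alt" for every IMG element in a parsed document.
' Assumes a reference to the Microsoft.mshtml interop assembly.
Private Sub ListImageAltText(ByVal doc As mshtml.IHTMLDocument2, ByVal target As System.Windows.Forms.ListBox)
    target.Items.Clear()
    For Each img As mshtml.IHTMLImgElement In doc.images
        ' alt is an attribute of IMG, exposed as a property on IHTMLImgElement;
        ' fall back to "" in case it comes back as Nothing when the attribute is missing.
        target.Items.Add(img.src & " | " & If(img.alt, ""))
    Next
End Sub
```

For example, `ListImageAltText(htmlDocument, ListBox2)` once the document has been written and closed.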
tinstaafl

If you don't absolutely have to use the mshtml interface, you could try this:

Imports System.IO
Public Class Form1

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'Set the webbrowser control visible property to false if you don't need it for anything else.
        WebBrowser1.Url = New Uri("C:\Test1.htm")
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
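        Dim htmlDocument As HtmlDocument = WebBrowser1.Document
        ' Document is Nothing if no page has been loaded yet
        If htmlDocument Is Nothing Then Return
        ListBox1.Items.Clear()
        ListBox2.Items.Clear()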
        ' Iterate all the elements and display tag names
        For Each element As HtmlElement In htmlDocument.All
            ListBox1.Items.Add(element.TagName)                
            If element.TagName.ToUpper = "IMG" Then
                ListBox2.Items.Add(element.DomElement.src)
            End If
        Next

    End Sub
End Class
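Setting `Url` starts navigation asynchronously, so clicking Button1 too early can find the page not yet loaded. A minimal sketch, assuming the same WebBrowser1, ListBox1 and ListBox2 controls, that fills the lists from the `DocumentCompleted` event instead and reads `src` and `alt` through `HtmlElement.GetAttribute` rather than the late-bound `DomElement.src`:

```vbnet
' Sketch: populate the lists once the page has finished loading.
' Could replace Button1_Click inside Form1; assumes WebBrowser1, ListBox1 and ListBox2 exist.
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    ListBox1.Items.Clear()
    ListBox2.Items.Clear()
    For Each element As HtmlElement In WebBrowser1.Document.All
        ListBox1.Items.Add(element.TagName)
        If element.TagName.ToUpper() = "IMG" Then
            ' GetAttribute avoids late binding, so this also compiles with Option Strict On
            ListBox2.Items.Add(element.GetAttribute("src") & " | " & element.GetAttribute("alt"))
        End If
    Next
End Sub
```

Note that `DocumentCompleted` can fire more than once for pages that contain frames, which is why the lists are cleared at the start of the handler.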
pjns19

Hi,

Thanks so much for your code, it's working very well. I have one more problem to ask, sorry.

But I can open a new thread if you would like me to.

Within the code, IMG is the tag name. What if I wanted the tag to be, for example,

TITLE or ALT?

    Dim htmlDocument As HtmlDocument = WebBrowser1.Document
    ListBox1.Items.Clear()
    ' Iterate all the elements and display tag names
    For Each element As HtmlElement In htmlDocument.All
        ListBox1.Items.Add(element.TagName)
        If element.TagName.ToUpper = "TITLE" Then
            ListBox2.Items.Add(element.DomElement.src)
        End If
    Next
End Sub

I get the following error...

Public member 'src' on type 'HTMLTitleElementClass' not found.

Thanks so much!!

pjns19

Sorry, after thinking a little more, I should be clearer about what I want to do.

For example, if the TITLE element has nothing in it (""), then I want that printed in ListBox2.

If it has something in it, I want that printed into ListBox2 instead. For example, if the TITLE element contains " Welcome to Amazon ", I want that text in ListBox2.

Thanks.

TnTinMN

Try recasting the document to an IHTMLDocument3 and calling getElementsByTagName on the newly cast object.

pjns19

I'll be honest, I don't know how to do that. Do you have any sample code?

Thanks.

TnTinMN

I tested this by casting WebBrowser.Document.DomDocument, so hopefully it will work for you.

You used: Dim htmlDocument As mshtml.IHTMLDocument2 = DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)

Recast it as mshtml.IHTMLDocument3:

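    ' htmlDocument is the IHTMLDocument2 from your code; cast it to IHTMLDocument3
    Dim doc3 As mshtml.IHTMLDocument3 = DirectCast(htmlDocument, mshtml.IHTMLDocument3)
    ' getElementsByTagName returns only the elements with the given tag name
    For Each img As mshtml.IHTMLImgElement In doc3.getElementsByTagName("img")
        Debug.WriteLine(img.src)
    Next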
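For the TITLE case the same IHTMLDocument3 cast works, but the text comes from `innerText` rather than `src`, since a TITLE element has no `src` member. A minimal sketch, assuming the Microsoft.mshtml interop reference and that the method lives inside the form class; `ListTitles` is a hypothetical name:

```vbnet
' Hypothetical helper: reads TITLE text through IHTMLDocument3.getElementsByTagName.
' Assumes a reference to the Microsoft.mshtml interop assembly.
Private Sub ListTitles(ByVal doc As mshtml.IHTMLDocument2, ByVal target As System.Windows.Forms.ListBox)
    Dim doc3 As mshtml.IHTMLDocument3 = DirectCast(doc, mshtml.IHTMLDocument3)
    For Each titleElement As mshtml.IHTMLElement In doc3.getElementsByTagName("title")
        ' The text of TITLE is exposed through innerText; fall back to "" if it is Nothing
        target.Items.Add(If(titleElement.innerText, ""))
    Next
End Sub
```

With a WebBrowser control, it could be called as `ListTitles(DirectCast(WebBrowser1.Document.DomDocument, mshtml.IHTMLDocument2), ListBox2)`.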
tinstaafl

src is the file path of an image, which is why my original code works for IMG elements; a TITLE element has no src member, hence the error. To get the inner text of the TITLE element, use:

        If element.TagName.ToUpper = "TITLE" Then
            ListBox2.Items.Add(element.InnerText)
        End If
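The asker also wanted an empty TITLE to show up as "". A minimal sketch building on the InnerText answer, assuming the same WebBrowser1 and ListBox2 controls; `ShowTitle` is a hypothetical name, and the Nothing fallback is a defensive assumption in case an empty element's InnerText comes back as Nothing:

```vbnet
' Hypothetical helper: lists the TITLE text in quotes, showing "" when the title is empty.
Private Sub ShowTitle()
    Dim doc As HtmlDocument = WebBrowser1.Document
    ' Document is Nothing if no page has been loaded yet
    If doc Is Nothing Then Return
    For Each titleElement As HtmlElement In doc.GetElementsByTagName("title")
        ' Assumption: InnerText may be Nothing for an empty element, so fall back to ""
        Dim text As String = If(titleElement.InnerText, "")
        ' """" is a string holding a single double-quote character
        ListBox2.Items.Add("""" & text & """")
    Next
End Sub
```

If only the text itself is needed, `WebBrowser1.Document.Title` should return the same value without the loop.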
Question Answered as of 1 Year Ago by tinstaafl and TnTinMN