Guys, I need help extracting data from this web page: http://hidemyass.com/proxy-list/
It's mainly the IP address and port, but I have no idea where to start. I know to start out with this:
Dim elements As HtmlElementCollection = Me.botBrowser.Document.All
but I don't know how I would traverse the source code to find the IP address and port.
Also, if I just wanted the first one on the page each time the page refreshed, how would I do that?

Hi,

I'd use one of the many free screen scraper tools out there and save myself the bother. You can get them to export to CSV or XML and then read the file in.
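
If you go the export route, reading the file back in could look roughly like this. It is only a sketch: ReadProxiesFromCsv is a made-up helper name, and it assumes the tool writes one "ip,port" pair per line with no header row, which may not match what your scraper actually exports.

```vb
Imports System.Collections.Generic
Imports System.IO

Public Module CsvProxies
    '// Hypothetical helper: turns "ip,port" rows into "ip:port" strings.
    '// Assumes no header row and no quoted fields.
    Public Function ReadProxiesFromCsv(ByVal csvPath As String) As List(Of String)
        Dim results As New List(Of String)
        For Each row As String In File.ReadAllLines(csvPath)
            Dim fields() As String = row.Split(","c)
            '// Skip blank or incomplete rows.
            If fields.Length < 2 Then Continue For
            Dim ip As String = fields(0).Trim()
            Dim port As String = fields(1).Trim()
            If ip.Length = 0 OrElse port.Length = 0 Then Continue For
            results.Add(ip & ":" & port)
        Next
        Return results
    End Function
End Module
```

The result can go straight into a ListBox with ListBox1.Items.AddRange(ReadProxiesFromCsv(path).ToArray()), where path is wherever the export was saved.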

See if this helps.
1 Button, 1 ListBox

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        getHTML("http://hidemyass.com/proxy-list/")
    End Sub

    Private myWebResponse As Net.HttpWebResponse
    Private myStream As IO.Stream
    Private myReader As IO.StreamReader

    Private Sub getHTML(ByVal siteURL As String)
        Me.Cursor = Cursors.WaitCursor
        Try
            myWebResponse = CType(Net.HttpWebRequest.Create(siteURL).GetResponse, Net.HttpWebResponse)
            myStream = myWebResponse.GetResponseStream()
            myReader = New IO.StreamReader(myStream)
            extractHTML(myReader.ReadToEnd, ListBox1)
            myReader.Close()
            myStream.Close()
            myWebResponse.Close()
        Catch ex As Exception
            MsgBox("There was a connection problem.", MsgBoxStyle.Critical)
        End Try
        Me.Cursor = Cursors.Default
    End Sub

    Private iSi, iEi As Integer, arTemp(), sTemp, sItemToAddToListBox As String

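    Private Sub extractHTML(ByVal htmlContent As String, ByVal selListbox As ListBox)
        selListbox.Items.Clear()
        With htmlContent
            '// Cut out the table body: from the "IP address" header cell to the closing table tag.
            iSi = .IndexOf("<td>IP address</td>")
            If iSi = -1 Then Exit Sub '// Header not found; the page layout has probably changed.
            iEi = .IndexOf("</table>", iSi)
            If iEi = -1 Then Exit Sub
            arTemp = .Substring(iSi, iEi - iSi).Split("/"c)
        End With
        sTemp = "<td><span>"
        For i As Integer = 0 To arTemp.Length - 3 '// Stop 2 short; the port is read from arTemp(i + 2).
            With arTemp(i)
                If .ToLower.Contains(sTemp) Then
                    '// IP: everything after "<td><span>" in this piece.
                    sItemToAddToListBox = .Substring(.IndexOf(sTemp) + sTemp.Length).Replace("<", "")
                    '// Port: taken from the piece two splits later.
                    sItemToAddToListBox &= ":" & arTemp(i + 2).Substring(.IndexOf("<td>") + 5).Replace("<", "")
                    selListbox.Items.Add(sItemToAddToListBox)
                End If
            End With
        Next
        MsgBox("done")
    End Sub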
End Class

Ty so much. I'm having it refresh with a timer every 60 seconds. Is there a way to completely erase everything in the list box and refresh it with the new stuff it picked up?

Never mind solved that ^

codeorder, can you help me get the IP address and port into variables? I'm having trouble traversing the item list.

Since "arTemp" is already declared in my previous code, use this.

Private Sub ListBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ListBox1.SelectedIndexChanged
        With ListBox1
            If Not .SelectedIndex = -1 Then
                arTemp = .Items(.SelectedIndex).ToString.Split(":"c) '// Split the item into a 2-element array.
                MsgBox(arTemp(0)) '// IP.
                MsgBox(arTemp(1)) '// Port.
            End If
        End With
    End Sub
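
If you want the two values in typed variables rather than message boxes, something along these lines might do. Treat it as a sketch: TryParseProxy is a hypothetical helper (not part of the code above), and it assumes each item is in the "ip:port" form that extractHTML adds to the ListBox.

```vb
Imports System.Net

Public Module ProxyParsing
    '// Hypothetical helper: splits "ip:port" and checks both halves.
    '// Returns False (ip = Nothing, port = 0) when the item is not usable.
    Public Function TryParseProxy(ByVal item As String, ByRef ip As String, ByRef port As Integer) As Boolean
        ip = Nothing
        port = 0
        If String.IsNullOrEmpty(item) Then Return False
        Dim parts() As String = item.Split(":"c)
        If parts.Length <> 2 Then Return False
        '// Let the framework decide whether the first half parses as an address.
        Dim parsedAddress As IPAddress = Nothing
        If Not IPAddress.TryParse(parts(0).Trim(), parsedAddress) Then Return False
        '// The port must be a number in the valid TCP range.
        If Not Integer.TryParse(parts(1).Trim(), port) Then Return False
        If port < 1 OrElse port > 65535 Then
            port = 0
            Return False
        End If
        ip = parts(0).Trim()
        Return True
    End Function
End Module
```

To grab only the first proxy after each refresh, check ListBox1.Items.Count > 0 and then call TryParseProxy(ListBox1.Items(0).ToString(), ip, port) with ip declared As String and port As Integer.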

It seems this will no longer work, because HMA recently changed its page source.
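
One way to depend less on the exact tags is to strip the markup and look for an IP-shaped token followed by a port number. This is only a sketch under a big assumption: that the IP and port still appear as plain text in neighbouring cells. ExtractProxies is a hypothetical name, and if the site hides or splits the digits with extra markup, this will not recover them.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module ProxyScraper
    '// Hypothetical helper: returns "ip:port" strings found in htmlContent.
    '// Assumes IP and port are plain text, separated only by tags and whitespace.
    Public Function ExtractProxies(ByVal htmlContent As String) As List(Of String)
        Dim results As New List(Of String)
        If String.IsNullOrEmpty(htmlContent) Then Return results
        '// Replace every tag with a space so cell contents stay separated.
        Dim textOnly As String = Regex.Replace(htmlContent, "<[^>]+>", " ")
        '// IPv4-looking token, whitespace, then 1 to 5 digits not followed by "." or a digit.
        Dim pattern As String = "\b(\d{1,3}(?:\.\d{1,3}){3})\s+(\d{1,5})(?![.\d])"
        For Each m As Match In Regex.Matches(textOnly, pattern)
            results.Add(m.Groups(1).Value & ":" & m.Groups(2).Value)
        Next
        Return results
    End Function
End Module
```

It could replace the call to extractHTML in getHTML, for example ListBox1.Items.Clear() followed by ListBox1.Items.AddRange(ExtractProxies(myReader.ReadToEnd).ToArray()). The pattern does not check that each octet is 0 to 255, so validate the results before using them.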
