1,105,402 Community Members

Extract data from webpage every minute?

Member Avatar
Alleyn
Newbie Poster
4 posts since Aug 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Guys I need help on how to extract data from this web page http://hidemyass.com/proxy-list/
Its mainly the Ip address and port but i have no idea in where to start. I know to start out with this
Dim elements As HtmlElementCollection = Me.botBrowser.Document.All
but i dont know how i would transverse the source code to find the ip address and port.
Also like if i just wanted to first one on the page each time the page refreshed how would i do this also

Member Avatar
G_Waddell
Practically a Posting Shark
821 posts since Nov 2009
Reputation Points: 131 [?]
Q&As Helped to Solve: 137 [?]
Skill Endorsements: 13 [?]
 
0
 

Hi,

I'd use one of the many free scren scraper tools there are out there and save myself the bother.... you can get them to export to csv or xml and take the file into read....

Member Avatar
codeorder
Postaholic
2,027 posts since Aug 2010
Reputation Points: 197 [?]
Q&As Helped to Solve: 390 [?]
Skill Endorsements: 10 [?]
 
1
 

See if this helps.
1 Button, 1 ListBox

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        getHTML("http://hidemyass.com/proxy-list/")
    End Sub

    Private myWebResponse As Net.HttpWebResponse
    Private myStream As IO.Stream
    Private myReader As IO.StreamReader

    Private Sub getHTML(ByVal siteURL As String)
        Me.Cursor = Cursors.WaitCursor
        Try
            myWebResponse = CType(Net.HttpWebRequest.Create(siteURL).GetResponse, Net.HttpWebResponse)
            myStream = myWebResponse.GetResponseStream()
            myReader = New IO.StreamReader(myStream)
            extractHTML(myReader.ReadToEnd, ListBox1)
            myReader.Close()
            myStream.Close()
            myWebResponse.Close()
        Catch ex As Exception
            MsgBox("There was a connection problem.", MsgBoxStyle.Critical)
        End Try
        Me.Cursor = Cursors.Default
    End Sub

    Private iSi, iEi As Integer, arTemp(), sTemp, sItemToAddToListBox As String

    Private Sub extractHTML(ByVal htmlContent As String, ByVal selListbox As ListBox)
        selListbox.Items.Clear()
        With htmlContent
            iSi = .IndexOf("<td>IP address</td>")
            iEi = .IndexOf("</table>", iSi)
            arTemp = .Substring(iSi, iEi - iSi).Split("/"c)
        End With
        sTemp = "<td><span>"
        For i As Integer = 0 To arTemp.Length - 1
            With arTemp(i)
                If .ToLower.Contains(sTemp) Then
                    sItemToAddToListBox = .Substring(.IndexOf(sTemp) + sTemp.Length).Replace("<", "")
                    sItemToAddToListBox &= ":" & arTemp(i + 2).Substring(.IndexOf("<td>") + 5).Replace("<", "")
                    selListbox.Items.Add(sItemToAddToListBox)
                End If
            End With
        Next
        MsgBox("done")
    End Sub
End Class
Member Avatar
Alleyn
Newbie Poster
4 posts since Aug 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Ty so much, im havng it refresh with a timer every 60 seconds is their a way to completely erase everything in the list box and refresh it with the new stuff it picked up?

Member Avatar
Alleyn
Newbie Poster
4 posts since Aug 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Never mind solved that ^

Member Avatar
Alleyn
Newbie Poster
4 posts since Aug 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

codeorder can you help me get the ip address and port in a variable im having trouble transverse the itemlist

Member Avatar
codeorder
Postaholic
2,027 posts since Aug 2010
Reputation Points: 197 [?]
Q&As Helped to Solve: 390 [?]
Skill Endorsements: 10 [?]
 
0
 

Since "arTemp" is already declared in my previous code, use this.

Private Sub ListBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ListBox1.SelectedIndexChanged
        With ListBox1
            If Not .SelectedIndex = -1 Then
                arTemp = .Items(.SelectedIndex).ToString.Split(":"c) '// .Split item in 2 Arrays.
                MsgBox(arTemp(0)) '// IP.
                MsgBox(arTemp(1)) '// Port.
            End If
        End With
    End Sub
Question Answered as of 2 Years Ago by codeorder and G_Waddell
Member Avatar
fiaworkz
Newbie Poster
4 posts since Nov 2011
Reputation Points: -3 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
-1
 

Seems this will not work now because HMA recently had some changes in its source

See if this helps.
1 Button, 1 ListBox

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        getHTML("http://hidemyass.com/proxy-list/")
    End Sub

    Private myWebResponse As Net.HttpWebResponse
    Private myStream As IO.Stream
    Private myReader As IO.StreamReader

    Private Sub getHTML(ByVal siteURL As String)
        Me.Cursor = Cursors.WaitCursor
        Try
            myWebResponse = CType(Net.HttpWebRequest.Create(siteURL).GetResponse, Net.HttpWebResponse)
            myStream = myWebResponse.GetResponseStream()
            myReader = New IO.StreamReader(myStream)
            extractHTML(myReader.ReadToEnd, ListBox1)
            myReader.Close()
            myStream.Close()
            myWebResponse.Close()
        Catch ex As Exception
            MsgBox("There was a connection problem.", MsgBoxStyle.Critical)
        End Try
        Me.Cursor = Cursors.Default
    End Sub

    Private iSi, iEi As Integer, arTemp(), sTemp, sItemToAddToListBox As String

    Private Sub extractHTML(ByVal htmlContent As String, ByVal selListbox As ListBox)
        selListbox.Items.Clear()
        With htmlContent
            iSi = .IndexOf("<td>IP address</td>")
            iEi = .IndexOf("</table>", iSi)
            arTemp = .Substring(iSi, iEi - iSi).Split("/"c)
        End With
        sTemp = "<td><span>"
        For i As Integer = 0 To arTemp.Length - 1
            With arTemp(i)
                If .ToLower.Contains(sTemp) Then
                    sItemToAddToListBox = .Substring(.IndexOf(sTemp) + sTemp.Length).Replace("<", "")
                    sItemToAddToListBox &= ":" & arTemp(i + 2).Substring(.IndexOf("<td>") + 5).Replace("<", "")
                    selListbox.Items.Add(sItemToAddToListBox)
                End If
            End With
        Next
        MsgBox("done")
    End Sub
End Class
You
This question has already been solved: Start a new discussion instead
Post:
Start New Discussion
View similar articles that have also been tagged: