Extract data from webpage every minute?

Guys, I need help extracting data from this web page: http://hidemyass.com/proxy-list/
It's mainly the IP address and port, but I have no idea where to start. I know to start out with this:
Dim elements As HtmlElementCollection = Me.botBrowser.Document.All
but I don't know how I would traverse the source code to find the IP address and port.
Also, if I just wanted the first one on the page each time the page refreshed, how would I do that?

Alleyn

Hi,

I'd use one of the many free screen scraper tools out there and save myself the bother. You can get them to export to CSV or XML and then read the file in.
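
If you go the export route, reading the file back in is short. Here is a minimal sketch, assuming the tool writes one `ip,port` pair per line with no header row and no quoted fields. The file layout and the `ReadProxyCsv` name are assumptions for illustration, not something any particular scraper guarantees.

```vb
Imports System.Collections.Generic
Imports System.IO

Module ProxyCsvSketch
    '// Hypothetical helper: reads "ip,port" lines and returns them as "ip:port" strings.
    '// Assumes no header row and no quoted fields; lines that do not fit are skipped.
    Public Function ReadProxyCsv(ByVal csvPath As String) As List(Of String)
        Dim result As New List(Of String)
        For Each line As String In File.ReadLines(csvPath)
            Dim parts() As String = line.Split(","c)
            If parts.Length >= 2 AndAlso parts(0).Trim() <> "" AndAlso parts(1).Trim() <> "" Then
                result.Add(parts(0).Trim() & ":" & parts(1).Trim())
            End If
        Next
        Return result
    End Function
End Module
```

The returned strings use the same `ip:port` shape as the list box items later in this thread, so they could be added to a ListBox directly.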

G_Waddell

See if this helps.
1 Button, 1 ListBox

Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        getHTML("http://hidemyass.com/proxy-list/")
    End Sub

    Private Sub getHTML(ByVal siteURL As String)
        Me.Cursor = Cursors.WaitCursor
        Try
            '// Using closes the response and reader even if an exception is thrown.
            Using myWebResponse As Net.WebResponse = Net.WebRequest.Create(siteURL).GetResponse()
                Using myReader As New IO.StreamReader(myWebResponse.GetResponseStream())
                    extractHTML(myReader.ReadToEnd(), ListBox1)
                End Using
            End Using
        Catch ex As Exception When TypeOf ex Is Net.WebException OrElse TypeOf ex Is IO.IOException
            MsgBox("There was a connection problem.", MsgBoxStyle.Critical)
        Finally
            Me.Cursor = Cursors.Default '// restore the cursor on every path.
        End Try
    End Sub

    Private iSi, iEi As Integer, arTemp(), sTemp, sItemToAddToListBox As String

    Private Sub extractHTML(ByVal htmlContent As String, ByVal selListbox As ListBox)
        selListbox.Items.Clear()
        With htmlContent
            iSi = .IndexOf("<td>IP address</td>") '// header cell that starts the proxy table.
            If iSi <> -1 Then iEi = .IndexOf("</table>", iSi)
        End With
        If iSi = -1 OrElse iEi = -1 Then
            MsgBox("Proxy table not found. The page layout may have changed.", MsgBoxStyle.Exclamation)
            Exit Sub
        End If
        arTemp = htmlContent.Substring(iSi, iEi - iSi).Split("/"c)
        sTemp = "<td><span>"
        For i As Integer = 0 To arTemp.Length - 3 '// the port is read from arTemp(i + 2).
            Dim iIp As Integer = arTemp(i).IndexOf(sTemp, StringComparison.OrdinalIgnoreCase)
            '// Search the port segment itself, not the IP segment, for its "<td>".
            Dim iPort As Integer = arTemp(i + 2).IndexOf("<td>", StringComparison.OrdinalIgnoreCase)
            '// "+ 5" skips "<td>" plus the one character that follows it.
            If iIp <> -1 AndAlso iPort <> -1 AndAlso iPort + 5 <= arTemp(i + 2).Length Then
                sItemToAddToListBox = arTemp(i).Substring(iIp + sTemp.Length).Replace("<", "")
                sItemToAddToListBox &= ":" & arTemp(i + 2).Substring(iPort + 5).Replace("<", "")
                selListbox.Items.Add(sItemToAddToListBox)
            End If
        Next
        MsgBox("done")
    End Sub
End Class
codeorder

Thank you so much. I'm having it refresh with a timer every 60 seconds. Is there a way to completely erase everything in the list box and refresh it with the new stuff it picked up?

Alleyn
Skill Endorsements: 0

Never mind, solved that ^
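
For anyone landing here with the same question: in the code posted above, extractHTML already calls selListbox.Items.Clear(), so a timer Tick handler that calls getHTML would be enough (though the MsgBox("done") call would then pop up every minute). A stand-alone sketch of the clear-and-refill pattern follows. The form, control and helper names here are made up for illustration, and `FetchItems` is only a placeholder for the download step.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Class RefreshSketchForm
    Inherits Form

    Private ReadOnly lstProxies As New ListBox With {.Dock = DockStyle.Fill}
    '// 60000 ms = 60 seconds. Fully qualified to avoid clashing with other Timer types.
    Private ReadOnly tmrRefresh As New System.Windows.Forms.Timer With {.Interval = 60000}

    Public Sub New()
        Controls.Add(lstProxies)
        AddHandler tmrRefresh.Tick, AddressOf tmrRefresh_Tick
        tmrRefresh.Start()
    End Sub

    Private Sub tmrRefresh_Tick(ByVal sender As Object, ByVal e As EventArgs)
        ReplaceItems(lstProxies, FetchItems())
    End Sub

    '// Empties the list box and fills it with the new items in one batch.
    Public Shared Sub ReplaceItems(ByVal target As ListBox, ByVal newItems As IEnumerable(Of String))
        target.BeginUpdate() '// suppress repainting while the items change.
        Try
            target.Items.Clear()
            For Each item As String In newItems
                target.Items.Add(item)
            Next
        Finally
            target.EndUpdate()
        End Try
    End Sub

    '// Placeholder for the download-and-parse step; returns an empty list here.
    Private Function FetchItems() As IEnumerable(Of String)
        Return New String() {}
    End Function

    Protected Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then tmrRefresh.Dispose()
        MyBase.Dispose(disposing)
    End Sub
End Class
```

The Windows Forms timer raises Tick on the UI thread, so touching the ListBox from the handler is safe, but a slow download inside the handler will freeze the form while it runs.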

Alleyn

codeorder, can you help me get the IP address and port into variables? I'm having trouble traversing the item list.

Alleyn

Since "arTemp" is already declared in my previous code, use this.

    Private Sub ListBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ListBox1.SelectedIndexChanged
        With ListBox1
            If .SelectedIndex <> -1 Then
                arTemp = .Items(.SelectedIndex).ToString.Split(":"c) '// .Split item into a 2-element array.
                If arTemp.Length = 2 Then '// skip items that are not "ip:port".
                    MsgBox(arTemp(0)) '// IP.
                    MsgBox(arTemp(1)) '// Port.
                End If
            End If
        End With
    End Sub
codeorder
Question Answered as of 1 Year Ago by codeorder and G_Waddell
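
If the two halves are needed as typed values rather than strings, the split can be paired with validation. Here is a rough sketch, assuming IPv4 entries in `ip:port` form. `TryParseProxy` is a hypothetical helper name, not part of the code posted in this thread.

```vb
Imports System
Imports System.Net

Module ProxyEntrySketch
    '// Hypothetical helper: splits "ip:port" and validates both halves.
    '// Returns False and leaves the outputs at their defaults if the text is malformed.
    '// IPv6 addresses contain colons themselves and are rejected by the 2-part check.
    Public Function TryParseProxy(ByVal entry As String, ByRef ip As IPAddress, ByRef port As Integer) As Boolean
        ip = Nothing
        port = 0
        If entry Is Nothing Then Return False

        Dim parts() As String = entry.Split(":"c)
        If parts.Length <> 2 Then Return False

        Dim parsedIp As IPAddress = Nothing
        Dim parsedPort As Integer
        If Not IPAddress.TryParse(parts(0).Trim(), parsedIp) Then Return False
        If Not Integer.TryParse(parts(1).Trim(), parsedPort) Then Return False
        If parsedPort < 1 OrElse parsedPort > 65535 Then Return False

        ip = parsedIp
        port = parsedPort
        Return True
    End Function
End Module
```

From a SelectedIndexChanged handler this would be called with the selected item's text, for example `TryParseProxy(ListBox1.SelectedItem.ToString(), ip, port)`, and the two variables used only when it returns True.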

It seems this will no longer work, because HMA recently changed its page source:
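
Matching exact tag strings such as "<td><span>" breaks as soon as the markup shifts. One less brittle approach is to strip the tags from each table row and look for an IPv4 address followed by a port number. This is only a sketch: the row layout it expects (IP cell followed by port cell) is an assumption, the real page's current markup has not been checked, and it will not cope with markup that deliberately hides or scrambles the address. `ExtractProxies` is a made-up name.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RowScanSketch
    '// One match per <tr>...</tr> block, across line breaks.
    Private ReadOnly RowPattern As New Regex("<tr\b.*?</tr>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    '// Any HTML tag.
    Private ReadOnly TagPattern As New Regex("<[^>]+>")
    '// Dotted IPv4-looking text, whitespace, then 1 to 5 digits for the port.
    Private ReadOnly EntryPattern As New Regex("\b(\d{1,3}(?:\.\d{1,3}){3})\s+(\d{1,5})\b")

    '// Returns "ip:port" for every row in which an address is followed by a port.
    Public Function ExtractProxies(ByVal html As String) As List(Of String)
        Dim found As New List(Of String)
        For Each row As Match In RowPattern.Matches(html)
            '// Replace tags with spaces so adjacent cells stay separated.
            Dim rowText As String = TagPattern.Replace(row.Value, " ")
            Dim m As Match = EntryPattern.Match(rowText)
            If m.Success Then found.Add(m.Groups(1).Value & ":" & m.Groups(2).Value)
        Next
        Return found
    End Function
End Module
```

The pattern does not check that each octet is at most 255 or that the port is in range, so the results are worth validating before use.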

fiaworkz
