Hello, I'm sorry if this is the wrong place to post this; I couldn't find a support section.
Anyway, I'm new to VB.NET and honestly, I'm lost. I need to extract text from an HTML page. Here's the line I want to extract from:
<p><a href="/video7419001/0/CoD4">CoD4</a></p>
What I want to extract from it is this:
<p><a href="/video(This here)/0/(This here)">CoD4</a></p>
All I want to do is put the two parts into two listboxes. Could I do this without using Regex? I've had a look at Regex and it's mind-blowing with these "wild cards"; I honestly don't know where to start lol.
Thanks for the advice :)


This should also be possible with some string manipulation, as long as your HTML always has the same format. Have a look at the String class on MSDN; the Split and Substring methods are the first that come to mind. Good luck!
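To make the Split/Substring suggestion concrete, here is a minimal sketch. It assumes every link has exactly the shape shown in the question (`/video<digits>/<something>/<name>`); the module and function names are made up for illustration and are not from the thread.

```vbnet
Imports System

Module HrefSplitDemo
    ' Returns {number, name} from a path like "/video7419001/0/CoD4",
    ' or Nothing when the path does not have that shape.
    ' (SplitVideoPath is a hypothetical helper name.)
    Function SplitVideoPath(ByVal path As String) As String()
        Const prefix As String = "/video"
        If path Is Nothing OrElse Not path.StartsWith(prefix, StringComparison.Ordinal) Then
            Return Nothing
        End If

        ' "video7419001/0/CoD4" -> {"video7419001", "0", "CoD4"}
        Dim parts As String() = path.TrimStart("/"c).Split("/"c)
        If parts.Length <> 3 Then Return Nothing

        ' Drop the leading "video" so only the digits remain.
        Dim number As String = parts(0).Substring("video".Length)
        Return New String() {number, parts(2)}
    End Function

    Sub Main()
        Dim line As String = "<p><a href=""/video7419001/0/CoD4"">CoD4</a></p>"

        ' Cut out whatever sits between href=" and the next double quote.
        Dim marker As String = "href="""
        Dim markerIndex As Integer = line.IndexOf(marker, StringComparison.Ordinal)
        If markerIndex < 0 Then Return

        Dim startIndex As Integer = markerIndex + marker.Length
        Dim endIndex As Integer = line.IndexOf(""""c, startIndex)
        If endIndex < 0 Then Return

        Dim path As String = line.Substring(startIndex, endIndex - startIndex)

        Dim result As String() = SplitVideoPath(path)
        If result IsNot Nothing Then
            Console.WriteLine(result(0)) ' number part, e.g. 7419001
            Console.WriteLine(result(1)) ' name part, e.g. CoD4
        End If
    End Sub
End Module
```

This only works while the page keeps that exact format; if the markup varies, a regex or an HTML parser is likely the safer choice.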

This might not be what you're looking for, but I would suggest regexes.

1) They're fast.
2) They're very powerful. They're the powerhouse of text manipulation, and make even things like bioinformatics easier.

Just take your time learning regexes. They're not too hard; just don't get scared by them.

That said, yes, there are ways around it. You can implement your own string-searching algorithm (this is quite a bit of work, mind you), perhaps something like KMP (Knuth-Morris-Pratt).

Also, a tutorial like this one, or many others, can be found on the net.

Thanks for the replies, they helped a lot. I finally got all the stuff I want into a list box. This is what I've got: /video7419001/0/CoD4. Now to the next bit: is it possible to extract this part, /video(these digits here)/0/CoD4, into another listbox? The numbers all range from 1000 to 99999999.
Thanks :)

You can use another regex to read the first number. That's "[0-9]+" in Perl, so probably something similar in VB.NET.
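For what it's worth, the same pattern works unchanged with .NET's Regex class. A minimal sketch, assuming the string you already have looks like the one in the post above (the module name is made up for illustration):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstNumberDemo
    Sub Main()
        Dim path As String = "/video7419001/0/CoD4"

        ' [0-9]+ matches the first run of one or more digits in the string.
        Dim m As Match = Regex.Match(path, "[0-9]+")

        If m.Success Then
            Console.WriteLine(m.Value) ' prints 7419001
        End If
    End Sub
End Module
```

Regex.Match returns only the first match, so the "0" between the slashes is ignored here. If digits could appear earlier in the string, anchoring the pattern on the prefix, for example "/video([0-9]+)/" and reading Groups(1), would be a more defensive variant.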

You can do the following:
Add Imports System.Text.RegularExpressions at the top of the file that will contain the function further down.

Create a class named VideoInfo in a file called "VideoInfo.vb".

VideoInfo.vb:

Public Class VideoInfo
    Public Property videoNumber As String
    Public Property name As String

    Public Sub New()

    End Sub

    Public Sub New(ByVal videoNumber As String, ByVal name As String)
        Me.videoNumber = videoNumber
        Me.name = name
    End Sub

End Class

We will use the above class inside the following function.

    Private Function extractData(ByVal myData As String) As List(Of VideoInfo)

        'create new list for video info
        Dim videoInfoList As New List(Of VideoInfo)

        'pattern that we want to find
        'use named groups
        Dim pattern As String = "/video(?<vidNumber>[0-9]+)/[0-9]?/(?<vidName>[A-Za-z0-9]+)"

        'look for a match
        Dim myMatch As Match = Regex.Match(myData, pattern)

        'keep looking for matches until no more are found
        While myMatch.Success

            'get groups
            Dim myGroups As GroupCollection = myMatch.Groups

            'store extracted info in an instance of VideoInfo
            Dim myVideoInfo As New VideoInfo
            myVideoInfo.videoNumber = myGroups("vidNumber").Value
            myVideoInfo.name = myGroups("vidName").Value

            'add video info to the list
            videoInfoList.Add(myVideoInfo)

            'get next match
            myMatch = myMatch.NextMatch()
        End While

        Return videoInfoList
    End Function

To use it:

        Dim myData As String = String.Empty
        Dim myVideoInfo As List(Of VideoInfo)
        Dim output As String = String.Empty

        myData += "<p><a href=""/video7419004/0/CoD4"">CoD4</a></p>"
        myData += "<p><a href=""/video7419005/0/CoD5"">CoD5</a></p>"
        myData += "<p><a href=""/video7419006/0/CoD6"">CoD6</a></p>"

        myVideoInfo = extractData(myData)


        For Each video In myVideoInfo
            output += "video: " & video.videoNumber & " " & video.name
            output += System.Environment.NewLine
        Next

        'display for testing purposes
        MessageBox.Show(output)
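
Since the original question was about filling two listboxes rather than showing a message box, here is a rough sketch of that last step. It relies on the VideoInfo class and extractData function from this post; FillListBoxes, numberBox and nameBox are hypothetical names, so substitute the controls on your own form.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module ListBoxHelpers
    ' Puts the video numbers in one ListBox and the names in another.
    ' numberBox and nameBox stand in for whatever ListBox controls you have.
    Sub FillListBoxes(ByVal videos As List(Of VideoInfo),
                      ByVal numberBox As ListBox,
                      ByVal nameBox As ListBox)
        ' Start from empty lists so repeated calls do not duplicate items.
        numberBox.Items.Clear()
        nameBox.Items.Clear()

        For Each video As VideoInfo In videos
            numberBox.Items.Add(video.videoNumber)
            nameBox.Items.Add(video.name)
        Next
    End Sub
End Module
```

From the form you would then call something like FillListBoxes(extractData(myData), ListBox1, ListBox2), assuming the designer's default control names.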

Resources:

Regex: Named Capturing Groups in .NET

Regular Expressions in C# – Part 2 – Matches and NextMatch

Why was my post down-voted? It is a tested/working solution. If you disagree with my solution, please provide explanation.

It wasn't me who downvoted but as a guess, I'd say it was an overly complex solution to a simple problem. In my opinion that didn't justify a downvote so I upvoted to cancel it.

My guess is that it's because you're giving code away for free. It's kind of like the saying, "Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime."
