Very hard regex extraction for me :( I'm trying to extract JOHNDIGEDY, which is a variable.

HTML String:

<a onclick="showPersona('persistentNation'); return false;" href=""><span>Welcome,</span>

        JOHNDIGEDY
          <span style="color: red;">(Unverified)</span>

Can anyone help?

I've tried:

Dim Regex As New Regex("</span>\n\n"".*""\n<span style=", RegexOptions.Multiline)
                    For Each M As Match In Regex.Matches(data)
                        Dim Description As String = M.Value.Split("""").GetValue(0)
                        TextBox4.Text = Description
                    Next

And

Dim Regex As New Regex("</span>\r\n\r\n"".*""\r\n<span style=", RegexOptions.Multiline)
                    For Each M As Match In Regex.Matches(data)
                        Dim Description As String = M.Value.Split("""").GetValue(1)
                        TextBox4.Text = Description
                    Next

Can anyone shine any light on this, please?

All 7 Replies

When you're not sure how the lines will be broken, it will be tricky.

You could do it like this or just remove ALL whitespace and look between the closing span and opening span.

I made a mock-up of the HTML and put it in a string array to simulate retrieving the data as an array. I run it through a filter to remove the carriage returns and linefeeds, then turn it into a single string and run it through the regex.
I called the element (or group member) to be extracted "name":

Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module Module1
   Sub Main()
      Dim arr() As String =
      {
         "<a onclick=""showPersona('persistentNation'); return false;"" href=""""><span>Welcome,</span>" _
          + Chr(13) + Chr(10) +
         "        JOHNDIGEDY" + Chr(13) + Chr(10) + " ",
         "          <span style=""color: red;"">(Unverified)</span>" + Chr(13) + Chr(10)
      }

      Dim strTemp = String.Join("",
         arr.Select(Function(s) s.Replace(Chr(13).ToString(), "").Replace(Chr(10).ToString(), "")).ToArray())

      Dim rxGetName As New Regex("</span> * (?<name>.*) *<span")

      If (rxGetName.IsMatch(strTemp)) Then
         Console.WriteLine("({0})", rxGetName.Match(strTemp).Groups("name").Value.ToString().Trim())
      End If
   End Sub
End Module

You see, that data comes from a StreamReader's ReadToEnd(). I can't build the string with Chr(13) etc. myself, as I would have to manually edit the page source each time. How else can I do it? =\

If you use ReadToEnd(), it's OK. Just add the replace method on the end of it.
ReadToEnd().Replace(...).Replace(...)
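
A minimal sketch of that idea, assuming the page text arrives through a reader. The helper name ReadFlattened is made up for illustration, and a StringReader stands in for the StreamReader around the response stream so the sample runs without a web request:

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ReadToEndSketch
   ' Hypothetical helper: read everything, then strip carriage returns and
   ' linefeeds, i.e. ReadToEnd().Replace(...).Replace(...).
   Function ReadFlattened(reader As TextReader) As String
      Return reader.ReadToEnd().Replace(vbCr, "").Replace(vbLf, "")
   End Function

   Sub Main()
      ' Assumed sample data, shaped like the HTML in the question.
      Dim page As String = "<span>Welcome,</span>" & vbCrLf & vbCrLf &
         "        JOHNDIGEDY" & vbCrLf &
         "          <span style=""color: red;"">(Unverified)</span>" & vbCrLf

      ' StringReader stands in for New StreamReader(response.GetResponseStream()).
      Using reader As New StringReader(page)
         Dim data As String = ReadFlattened(reader)
         Dim rxGetName As New Regex("</span> * (?<name>.*) *<span")
         Dim m As Match = rxGetName.Match(data)
         If m.Success Then
            Console.WriteLine("({0})", m.Groups("name").Value.Trim())
         End If
      End Using
   End Sub
End Module
```

Nothing has to be edited by hand: the Chr(13)/Chr(10) pieces in the mock-up only simulate what the server sends, and the Replace calls remove them from whatever ReadToEnd() returns.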

No luck, nothing happens. The label changes to "Acc Banned", but there are no popups or anything. How would I get the username to show in a label?

Dim writer As StreamWriter = New StreamWriter(request.GetRequestStream())
            writer.Write("https://profile.ea.com/login.do?authenticationSource=EA-JForums&surl=http://forum.ea.com/uk/categories/list.page&remoteurl=http://forum.ea.com/uk/gusUser/login.page&selectprofile=true&locale=en_GB%20surl=http%3A%2F%2Fforum.ea.com%2Fuk%2Fcategories%2Flist.page&selectprofile=true&remoteurl=http%3A%2F%2Fforum.ea.com%2Fuk%2FgusUser%2Flogin.page&action=Login&registrationSource=&authenticationSource=EA-JForums&HIDE_GUS=&username=" & TextBox1.Text & "&password=" & TextBox2.Text & "")
            writer.Close()
            response = request.GetResponse()
            'Get the data from the page
            Dim stream As StreamReader = New StreamReader(response.GetResponseStream())
            Dim data As String = stream.ReadToEnd().Replace(Chr(13).ToString(), "").Replace(Chr(10).ToString(), "")
            response.Close()

            If data.Contains("<title>EA Forums") = True Then
                If data.Contains("You have been banned") = True Then

                    Label1.Text = "Acc Banned"
                    Banned = True
                    Label1.ForeColor = Color.Red
                    Label1.Font = New Font(Label1.Font, FontStyle.Bold)
                    Dim Regex As New Regex("</span> * (?<name>.*) *<span")
                    If (Regex.IsMatch(data)) Then
                        Console.WriteLine("({0})", Regex.Match(data).Groups("name").Value.ToString().Trim())
                    End If

If you ran the sample and it worked, then something must be different in the conditions of the data when it comes back from the web server.

How many times does that pattern repeat?
Does the match code get triggered at all?
Did you set a breakpoint on the line doing Console.WriteLine to see if it is reached?

And, most importantly, did you verify the target string segment is actually in the data you retrieved on the last run?
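
One way to answer those questions without a debugger is to build a short report from the retrieved text. This is only a sketch: the helper name Describe and the marker text "Welcome,</span>" are assumptions taken from the posted HTML, not something confirmed against the live page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchDiagnostics
   ' Hypothetical helper: reports whether the marker text is present,
   ' how many times the pattern matches, and what each match captured.
   Function Describe(data As String, rx As Regex) As String
      Dim matches As MatchCollection = rx.Matches(data)
      Dim report As String =
         "Contains 'Welcome,</span>': " & data.Contains("Welcome,</span>").ToString() &
         Environment.NewLine & "Match count: " & matches.Count.ToString()
      For Each m As Match In matches
         report &= Environment.NewLine & "  name = [" & m.Groups("name").Value.Trim() & "]"
      Next
      Return report
   End Function

   Sub Main()
      ' Assumed sample data; in the real program this would be the downloaded page.
      Dim data As String =
         "<span>Welcome,</span>   JOHNDIGEDY   <span style=""color: red;"">(Unverified)</span>"
      Dim rx As New Regex("</span> * (?<name>.*) *<span")
      Console.WriteLine(Describe(data, rx))
   End Sub
End Module
```

In a Windows Forms program, Console.WriteLine does not produce a popup; its output normally only shows up in the IDE's Output window while debugging. To see the result on the form, the returned string could be assigned to a control instead, for example Label1.Text = Describe(data, rx), or shown with MessageBox.Show.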

I wouldn't have a clue, I'm a noob =/

What I'm really asking is if there is a possibility the pattern you're searching for appears more than once in the file.
For instance: this mock-up has two elements that should be found.
BTW, I modified the filter a little to take out spaces.
...could have done that with another Regex, but I don't want to complicate matters.
MAYBE that extra space is filled with TABS Chr(9), which I didn't check for before...

Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module Module1
   Sub Main()
      Dim arr() As String =
      {
         "<a onclick=""showPersona('persistentNation'); return false;"" href=""""><span>Welcome,</span>" _
          + Chr(13) + Chr(10) +
         "        " + Chr(13) + Chr(10) + " ",
         "          <span style=""color: red;"">(Unverified)</span>" + Chr(13) + Chr(10) +
         "<a onclick=""showPersona('persistentNation'); return false;"" href=""""><span>Welcome,</span>" _
          + Chr(13) + Chr(10) +
         "        JOHNDIGEDY" + Chr(13) + Chr(10) + " ",
         "          <span style=""color: red;"">(Unverified)</span>" + Chr(13) + Chr(10) +
         "<a onclick=""showPersona('persistentNation'); return false;"" href=""""><span>Welcome,</span>" _
          + Chr(13) + Chr(10) +
         "         " + Chr(13) + Chr(10) + " ",
         "    AA      <span style=""color: red;"">(Unverified)</span>" + Chr(13) + Chr(10)
      }

      Dim strTemp = String.Join("",
         arr.Select(Function(s) s.Replace(Chr(13).ToString(), "") _
            .Replace(Chr(10).ToString(), "").Replace(" ", "") _
            .Replace(Chr(9).ToString(), "")).ToArray())

      Dim rxGetName As New Regex("</span>(?<name>[A-Za-z]{1,})<span")

      For Each m As Match In rxGetName.Matches(strTemp)
         Console.WriteLine("({0})", m.Groups(1).Value.ToString())
      Next
   End Sub
End Module
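
For comparison, here is a sketch of the "another Regex" route mentioned above: instead of filtering the text first, the pattern itself absorbs spaces, tabs, and line breaks with \s*. It assumes the name always follows "Welcome,</span>" and never contains whitespace or "<"; the module and function names are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module WhitespaceTolerantSketch
   ' Hypothetical helper: returns every name found between "Welcome,</span>"
   ' and the next "<span", whatever whitespace surrounds it.
   Function ExtractNames(html As String) As List(Of String)
      Dim rx As New Regex("Welcome,</span>\s*(?<name>[^<\s]+)\s*<span")
      Dim names As New List(Of String)
      For Each m As Match In rx.Matches(html)
         names.Add(m.Groups("name").Value)
      Next
      Return names
   End Function

   Sub Main()
      ' Assumed sample data: raw text with CR/LF and a tab left in place.
      Dim html As String =
         "<a href=""""><span>Welcome,</span>" & vbCrLf & vbCrLf &
         vbTab & "JOHNDIGEDY" & vbCrLf &
         "          <span style=""color: red;"">(Unverified)</span>" & vbCrLf
      For Each userName As String In ExtractNames(html)
         Console.WriteLine("({0})", userName)
      Next
   End Sub
End Module
```

Because [^<\s]+ needs at least one character, a block with nothing between the two spans is skipped rather than returned as an empty name.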