Get data between any two tags

Question

dhimanbiswas4u 0 Newbie Poster

12 Years Ago

Sir, I want to get the data between two tags.
Example (as shown in my page's view source):

<img class='stat_icon' src='/images/green.png'> 
**data to be extracted**</a>

Or like this:

<input type=hidden name=timestamp value='**data to be extracted**'>

For the first example I wrote:

Imports System.Text.RegularExpressions
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim requestt As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("my desired link")
    Dim responsee As System.Net.HttpWebResponse = requestt.GetResponse()
    Dim sr As System.IO.StreamReader = New System.IO.StreamReader(responsee.GetResponseStream())
    Dim sourcecodee As String = sr.ReadToEnd()
    Dim pattern1 As String = "<img class='stat_icon' src='/images/red.png'> " & _
                             "(.*)</a>"
    Dim m As Match = Regex.Match(sourcecodee, pattern1)
    If m.Success Then
        MsgBox(m.Groups(1).Value)
    End If
End Sub

But no message box appears.

Please help me solve it, and explain with as many examples as you can.

regex vb.net

All 4 Replies

G_Waddell 131 Posting Whiz in Training

12 Years Ago

Hi,
I'll assume you realise that </a> is the end tag for <a>, not for <img>.
Also, are you sure the attributes inside the tag use single quotes rather than double quotes?
<img class="stat_icon" src="/images/red.png"> is different from <img class='stat_icon' src='/images/red.png'>.
Why not use InStr to get the start and end positions of the data, then Mid to get the actual data:

 Dim sourcecodee As String = sr.ReadToEnd()
 Dim iStart, iEnd As Integer
 Dim datamatch As String = ""
 'rather than worry about the quotes I'll grab the class name
 iStart = InStr(LCase(sourcecodee), "stat_icon")
 If iStart <> 0 AndAlso InStr(iStart, sourcecodee, ">") <> 0 Then
     'go to the end of the img tag, then step past the ">" itself
     iStart = InStr(iStart, sourcecodee, ">") + 1
     'get the first "</a>" after the start
     iEnd = InStr(iStart, LCase(sourcecodee), "</a>")
     If iEnd <> 0 Then
         'Trim drops the newline that sits between the tag and the data
         datamatch = Mid(sourcecodee, iStart, iEnd - iStart).Trim()
     End If
 End If

gusano79 247 Posting Shark · Answer 1 · 2013-05-20T16:34:23+00:00

In addition to Mr. Waddell's comments...

An immediately obvious problem: the source has green.png, but you're looking for red.png.

A less obvious problem: you have a space at the end of the first part of the regular expression, but the input looks like it has a newline there. As it's inside the tags, you probably don't want to match it explicitly, so get rid of the space. It still won't match, though: by default . matches every character except a newline, so you need to set the RegexOptions.Singleline option for your .* to match the newline as well.

A non-problem that in other cases would be one: where you're matching red.png, the . still has its special "match any character" meaning, which isn't what you want. It does match the literal text red.png, but only because a literal . is one of the characters that "any character" covers. The regex red.png would also match other literal text, like redipng or red$png. If you want to match a special regex character as a normal character, escape it with \, like this: red\.png
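
To make the escaping point concrete, here is a minimal sketch, assuming a plain console project; the module name EscapeDemo is only for illustration. It compares the unescaped and escaped patterns, and also shows Regex.Escape, which adds the backslashes for you when the text to match comes from somewhere else.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        'Unescaped: the . matches any single character
        Console.WriteLine(Regex.IsMatch("redipng", "red.png"))   'True
        Console.WriteLine(Regex.IsMatch("red$png", "red.png"))   'True

        'Escaped: only a literal dot matches
        Console.WriteLine(Regex.IsMatch("redipng", "red\.png"))  'False
        Console.WriteLine(Regex.IsMatch("red.png", "red\.png"))  'True

        'Regex.Escape inserts the backslash before the dot
        Console.WriteLine(Regex.Escape("/images/red.png"))       '/images/red\.png
    End Sub
End Module
```

VB string literals do not treat \ as an escape character, so the pattern can be typed exactly as it reads.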

So a corrected version of your regular expression would look like this:

<img class='stat_icon' src='/images/green\.png'>(.*)</a>
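
Here is a rough sketch of that pattern in use, assuming a console project and a small sample string shaped like the markup in the question (the surrounding <a href='#'> is made up for the test). One deliberate change from the pattern above: the sketch uses the lazy (.*?) instead of (.*), because with Singleline switched on a greedy .* would run on to the last </a> in the whole page rather than the first one.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        'Sample input: the data sits on the line after the img tag
        Dim html As String = "<a href='#'><img class='stat_icon' src='/images/green.png'>" & vbLf & _
                             "data to be extracted</a>"

        'No trailing space after the tag, and the dot in green.png is escaped
        Dim pattern As String = "<img class='stat_icon' src='/images/green\.png'>(.*?)</a>"

        'Without Singleline, . stops at the newline, so there is no match
        Console.WriteLine(Regex.Match(html, pattern).Success)   'False

        'With Singleline, . also matches the newline
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.Singleline)
        If m.Success Then
            'Trim drops the newline captured in front of the data
            Console.WriteLine(m.Groups(1).Value.Trim())         'data to be extracted
        End If
    End Sub
End Module
```

In the original button handler, the same two changes apply: pass RegexOptions.Singleline as the third argument to Regex.Match, and match green.png rather than red.png.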

I recommend that whenever possible, don't break a regex across multiple literal strings (like you have above with &); it's easy to miss subtle errors that way.

"Why not use Instr to get the start and end positions of the data then mid to get the actual data"

Because it takes ten times as much code to match text, which is why we have regular expressions. They look arcane at first, but once you wrap your head around them, text matching ceases to be a challenge.

I can see readability arguments either way, but when reading someone else's code, I'd rather scratch my head over a single-line regular expression written using a standard syntax than try to decipher a free-form text matching loop.

A final thought: you're matching a very specific chunk of text. There's plenty of opportunity here to write a more general regex that finds content inside any pair of tags. If you're interested in exploring that path, I'll just leave this here as a possible next step:

<(?<tag>\S+).*>(?<data>[^<]*)</\k<tag>>
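
A small sketch of how that pattern could be tried out, assuming a console project and a one-line sample string invented for the test. (?<tag>...) and (?<data>...) are named groups, and \k<tag> is a backreference that requires the closing tag to repeat whatever the tag group captured.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    Sub Main()
        'Pattern copied from the post above, unchanged
        Dim pattern As String = "<(?<tag>\S+).*>(?<data>[^<]*)</\k<tag>>"

        'Hypothetical sample input
        Dim html As String = "<a href='page.html'>some text</a>"

        Dim m As Match = Regex.Match(html, pattern)
        If m.Success Then
            'Named groups are read by name instead of by index
            Console.WriteLine(m.Groups("tag").Value)    'a
            Console.WriteLine(m.Groups("data").Value)   'some text
        End If
    End Sub
End Module
```

Because the data group is [^<]*, it stops at the first <, so this version only finds content that has no nested tags inside it.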

gusano79 247 Posting Shark · Answer 2 · 2013-05-20T16:35:36+00:00

Oh, I almost forgot... while developing regular expressions, I find tools like The Regulator to be extremely helpful.

tinstaafl 1,176 Posting Maven · Answer 3 · 2013-05-20T21:07:03+00:00

If you create a WebBrowser control and load a URL in it, its Document property gives you an HtmlDocument. From there you have access to all the elements in the web page and can read any values in those elements, or even get the outer HTML for any specific element.

Public Class Form1
    Public WebBrowser1 As New WebBrowser()

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        'Document is Nothing until the page has finished loading
        Dim NewDoc As HtmlDocument = WebBrowser1.Document
        If NewDoc Is Nothing Then Return
        For Each el As HtmlElement In NewDoc.All
            If el.TagName.ToLower() = "img" Then
                Dim HTMLString As String = el.OuterHtml
                'only show the img tags that carry the stat_icon class
                If HTMLString.Contains("stat_icon") Then
                    MsgBox(HTMLString)
                End If
            End If
        Next
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'start loading the page as soon as the form opens
        WebBrowser1.Url = New Uri("my desired link")
    End Sub
End Class
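
The same approach could also cover the second example in the question, the hidden timestamp input. What follows is only a sketch under some assumptions: TimestampForm, Browser and the pageUrl parameter are names made up here, the control is created in code rather than in the designer, and the value is read in the DocumentCompleted event so that the document is guaranteed to exist.

```vbnet
Imports System.Windows.Forms

Public Class TimestampForm
    Inherits Form

    'Created in code so the sketch does not depend on designer files
    Private WithEvents Browser As New WebBrowser()

    'pageUrl is a hypothetical parameter: pass in the address of your page
    Public Sub New(ByVal pageUrl As String)
        Browser.Dock = DockStyle.Fill
        Controls.Add(Browser)
        Browser.Navigate(pageUrl)
    End Sub

    'Raised once the page (or a frame inside it) has finished loading
    Private Sub Browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) Handles Browser.DocumentCompleted
        For Each el As HtmlElement In Browser.Document.GetElementsByTagName("input")
            'Pick the input whose name attribute is "timestamp"
            If el.GetAttribute("name") = "timestamp" Then
                MsgBox(el.GetAttribute("value"))
            End If
        Next
    End Sub
End Class
```

Showing it would look something like Application.Run(New TimestampForm("...")) with your own address in place of the dots. On pages with frames the event can be raised more than once, so the message box may appear more than once as well.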
