Howdy,

I have a basic HTML file containing data I need to extract. This is the markup for just one of the tables on the page:

<TABLE title="Left Magazine"class="dataTable" align="center" cellspacing="0" cellpadding="0">      <caption>Media Details</caption><THEAD><TR class="captionRow"><TH>Slot #</TH><TH>Attn</TH><TH>Status</TH><TH>In Drive</TH><TH>Label</TH><TH>Media Loads</TH><TH>Comment</TH></TR></THEAD>      <TBODY>
<TR class="altRowColor" >            <TD>1 </TD>          <TD>&nbsp;</TD><TD>Full, Gen. 3 </TD>         <TD>&nbsp;</TD>         <TD>TMSWK2D</TD><TD>    9</TD>            <TD>&nbsp; Poor write quality</TD>        </TR>
<TR class="altRowColor" >            <TD>2 </TD>          <TD>&nbsp;</TD><TD>Full, Gen. 1 </TD>         <TD>&nbsp;</TD>         <TD>TMSWK2C</TD><TD>&nbsp;</TD>           <TD>Read Only, Clean Tape</TD>        </TR>
<TR class="altRowColor" >            <TD>3 </TD>          <TD>&nbsp;</TD><TD>Full, Gen. 3 </TD>         <TD>&nbsp;</TD>         <TD>TMSWK2B</TD><TD>    9</TD>            <TD>&nbsp; Poor write quality</TD>        </TR>
<TR class="altRowColor" >            <TD>4 </TD>          <TD>&nbsp;</TD><TD>Full, Gen. 3 </TD>         <TD>&nbsp;</TD>         <TD>TMSWK2A</TD><TD>   10</TD>            <TD>&nbsp; Poor write quality</TD>        </TR>      </TBODY>      </TABLE>


I need to extract the data from both of the "media information" tables, excluding the "In Drive" field. I have found some example code using the DOMDocument class. It sort of works, but it selects every table on the page rather than just the two tables and fields I need:

<?php
    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** discard white space (set before loading so it can take effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html into the object ***/
    $dom->loadHTMLFile('inventory_status.html');

    /*** every table on the page, by tag name ***/
    $tables = $dom->getElementsByTagName('table');

    /*** get all rows from the first table ***/
    $rows = $tables->item(0)->getElementsByTagName('tr');

    /*** loop over the table rows ***/
    foreach ($rows as $row)
    {
        /*** get each column by tag name ***/
        $cols = $row->getElementsByTagName('td');

        /*** the header row holds <th> cells only, so skip rows without enough <td> cells ***/
        if ($cols->length < 3) {
            continue;
        }

        /*** echo the values ***/
        echo $cols->item(0)->nodeValue.'';
        echo $cols->item(1)->nodeValue.'';
        echo $cols->item(2)->nodeValue;
        echo '<hr />';
    }


My first question is whether the DOM is the best/easiest way to achieve the parsing. And secondly, how could I modify the code to select only the relevant data?

Cheers for your time

All 3 Replies

Vb code for parsing HTML Table

'You must have a reference to the "Microsoft VBScript Regular Expressions" library
'Two methods are given
'Learn from these examples
Private Sub Command1_Click()
Dim str As String
Dim pattern As String

pattern = "<td[^>]*?>[\s\S]*?<\/td>"

str = Inet1.OpenURL("http://bkcom.net/bh/bolDates.htm") 'There is table on this location

'Method One
MsgBox TestRegExp(pattern, str)

'Method Two
rege str, pattern
End Sub
Function rege(subjectString, pattern)
'Prepare a regular expression object
    Dim myRegExp As RegExp
    Dim myMatches As MatchCollection
    Dim myMatch As Match

    Set myRegExp = New RegExp

    myRegExp.IgnoreCase = True
    myRegExp.Global = True
    myRegExp.pattern = pattern
    Set myMatches = myRegExp.Execute(subjectString)
    For Each myMatch In myMatches
        MsgBox (stripHtml(myMatch.Value))
    Next
End Function
Function TestRegExp(myPattern As String, myString As String)
   'Create objects.
   Dim objRegExp As RegExp
   Dim objMatch As Match
   Dim colMatches   As MatchCollection
   Dim RetStr As String

   Set objRegExp = New RegExp
   objRegExp.pattern = myPattern
   objRegExp.IgnoreCase = True
   objRegExp.Global = True

   If (objRegExp.Test(myString) = True) Then

    Set colMatches = objRegExp.Execute(myString)

    For Each objMatch In colMatches
      RetStr = RetStr & "Match found at position "
      RetStr = RetStr & objMatch.FirstIndex & ". Match Value is '"
      RetStr = RetStr & objMatch.Value & "'." & vbCrLf
    Next
   Else
    RetStr = "String Matching Failed"
   End If
   TestRegExp = stripHtml(RetStr)
End Function
Function stripHtml(strContent)
    On Error Resume Next
    Dim mStartPos As Long, mEndPos As Long
    Dim mString As String
    ' Remove each <...> tag in turn
    mStartPos = InStr(strContent, "<")
    mEndPos = InStr(strContent, ">")
    Do While mStartPos <> 0 And mEndPos <> 0 And mEndPos > mStartPos
          mString = Mid(strContent, mStartPos, mEndPos - mStartPos + 1)
          strContent = Replace(strContent, mString, "")
          mStartPos = InStr(strContent, "<")
          mEndPos = InStr(strContent, ">")
    Loop
    ' Decode common escape sequences (&amp; goes last so "&amp;lt;" is not decoded twice)
    strContent = Replace(strContent, "&nbsp;", " ")
    strContent = Replace(strContent, "&quot;", """")
    strContent = Replace(strContent, "&#", "#")
    strContent = Replace(strContent, "&lt;", "<")
    strContent = Replace(strContent, "&gt;", ">")
    strContent = Replace(strContent, "%20", " ")
    strContent = Replace(strContent, "&amp;", "&")
    strContent = Trim(strContent)
    ' Drop any leading CR/LF characters
    Do While Left(strContent, 1) = Chr$(13) Or Left(strContent, 1) = Chr$(10)
          strContent = Mid(strContent, 2)
    Loop
    stripHtml = strContent
End Function

> Vb code for parsing HTML Table
This is the PHP forum. Posting your VB code isn't very relevant in this thread.
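
On the PHP side of the original question: DOMDocument is a reasonable fit for this, and pairing it with DOMXPath is one way to narrow the selection to just the tables you want. Below is a minimal sketch, not a definitive implementation. It assumes PHP 7 or later, that both target tables carry the caption "Media Details" as in the posted sample, and that "In Drive" is always the fourth cell (zero-based index 3). The function name extractMediaRows() and its parameters are made up for illustration, and the caption is interpolated into the XPath query as-is, so a caption containing a double quote would need extra handling.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): collect the <td> values from every
 * table whose <caption> matches $caption, leaving out one column.
 */
function extractMediaRows(string $html, string $caption = 'Media Details', int $skipColumn = 3): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];

    // Assumption: the wanted tables are identified by their caption text.
    $query = sprintf('//table[normalize-space(caption) = "%s"]', $caption);

    foreach ($xpath->query($query) as $table) {
        // "tr[td]" matches only rows with <td> cells, so the <th> header row is skipped.
        foreach ($xpath->query('.//tr[td]', $table) as $row) {
            $tds = $xpath->query('td', $row);
            $cells = [];
            for ($i = 0; $i < $tds->length; $i++) {
                if ($i === $skipColumn) {
                    continue; // index 3 is "In Drive" in the sample table
                }
                // &nbsp; is decoded to U+00A0, which trim() does not strip by default.
                $text = str_replace("\u{00A0}", ' ', $tds->item($i)->textContent);
                $cells[] = trim($text);
            }
            $result[] = [
                'magazine' => $table->getAttribute('title'),
                'cells'    => $cells,
            ];
        }
    }

    return $result;
}
```

To try it against the file from the question, something like print_r(extractMediaRows(file_get_contents('inventory_status.html'))); should do. If the two tables are better told apart by their title attribute than by their caption, the predicate in $query could test @title instead.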
