Hello,

I need some help parsing data from an HTML document using XPath. Please bear with me while I explain my problem and then my question:

The HTML I am parsing is below. Specifically, I would like to extract all href attributes, starting from the second <tr>, from HTML formatted as follows:

<tbody><tr class="clubHeaderRow">
    <td>Club Name <font size="-1">(Exp. Date)</font></td>
    <td align="center">Location and Days/Hours</td>
    <td>Contacts</td>
</tr>
<!--PARSING STARTS FROM HERE--><tr><td class="clubRow"><a href="http://www.bumpernets.com">BumperNets, Inc.</a><br><font size="-1">(7/31/2014)</font></td>
    <td align="center" valign="middle" width="500"><table width="98%" border="1" cellpadding="2">
        <tbody><tr>
            <td width="60%" class="clubRowEven">Riverchase Galleria Mall<br>2000 Riverchase Galleria Ste 179 &amp; 181A<br>Birmingham, AL  35244<br>205-987-2222<br><br><u>Directions</u>:<br>Toll Free Number 1-800-366-7664</td>
            <td width="40%" class="clubRowEven">Open 7 days a week  Mon-Thurs. 10AM to 9PM, Fri-Sat 10AM to 10PM and Sunday - 11AM to 6PM  
Tournaments - Every Friday 7:30 PM to 10:00 PM</td>
        </tr>
    </tbody></table>
    </td>
    <td class="clubRow"><a href="mailto:homer.bumpernets@gmail.com">Homer Brown</a><br>205-987-2222</td>
</tr>
<tr>
<td bgcolor="whitesmoke" class="clubRow"><a href="http://www.nattc.com">North Alabama Table Tennis Club</a><br><font size="-1">(1/31/2015)</font></td>
    <td align="center" valign="middle" width="500" bgcolor="whitesmoke"><table width="98%" border="1" cellpadding="2">
        <tbody><tr>
            <td width="60%" class="clubRowEven">Aquadome Recreation Center<br>1202 5th Ave SW<br>Decatur, AL  35601</td>
            <td width="40%" class="clubRowEven">Tuesday - 6:00 - 9:00PM</td>
        </tr>
        <tr>
<td width="60%" class="clubRowOdd">Brahan Springs Rec. Center<br>3770 Ivy St.<br>Huntsville, AL  35805<br>256-883-3710</td>
            <td width="40%" class="clubRowOdd">Winter - Wed 6 - 9 PM
Spring/Summer - Thur 6 - 9 PM</td>
        </tr>
    </tbody></table>
    </td>
    <td bgcolor="whitesmoke" class="clubRow"><a href="mailto:crpatton@hiwaay.net">Chip Patton</a><br>256-772-7359</td>
</tr>
    <tr>
    <td class="clubRow"><a href="http://neatt.weebly.com/">North East Alabama Table Tennis</a><br><font size="-1">(7/31/2014)</font></td>
    <td align="center" valign="middle" width="500"><table width="98%" border="1" cellpadding="2">
        <tbody><tr>
            <td width="60%" class="clubRowEven">Anniston Army Depot Gym, Bldg 206<br>7 Frankford Ave.<br>Anniston, AL  36201<br>256-235-6385<br><br><u>Directions</u>:<br>Call 256-235-6385</td>
            <td width="40%" class="clubRowEven">Tues 5:00 to 9:00PM</td>
        </tr>
    </tbody></table>
    </td>
    <td class="clubRow"><a href="mailto:238mike@bellsouth.net">Mike Harris</a><br>256-689-8603</td>
</tr>
 </tbody>

The code for the XPath query is as follows:

$urlArr = array();
$clssname = "clubRow";

// Select every anchor inside a cell whose class is "clubRow"
$anchors = $xpath->query("//table/tr/td[@class='$clssname']//a");
foreach ($anchors as $a) {
    // $urlArr[] = $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
    $urlArr[] = $a->getAttribute("href")."<br/>";
}

In its current form, the output is:

Array
(
[0] => http://www.bumpernets.com

[1] => mailto:homer.bumpernets@gmail.com

[2] => http://www.nattc.com

[3] => mailto:crpatton@hiwaay.net

[4] => http://neatt.weebly.com/

[5] => mailto:238mike@bellsouth.net

)

My question is how to structure the array to look like the following:

Array
(
[0] => http://www.bumpernets.com

[1] => mailto:homer.bumpernets@gmail.com
)
Array
(
[0] => http://www.nattc.com

[1] => mailto:crpatton@hiwaay.net
)
Array
(
[0] => http://neatt.weebly.com/

[1] => mailto:238mike@bellsouth.net

)

Basically, the links found in the data cells (<td>) of each <tr>, beginning with the second row, should be grouped into their own array.
I would appreciate any thoughts on this.

Best
Mossa
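
For reference, here is a rough sketch of one way the grouping described above might be done. It is not a definitive answer: it assumes the page is loaded with PHP's DOMDocument and DOMXPath, the $html sample is a hypothetical, trimmed-down stand-in for the real table, and $groups is a name made up for illustration. The idea is to query the rows first, then run a second, relative query with each row as the context node.

```php
<?php
// Hypothetical, shortened copy of the table from the post (assumption: the
// real page uses the same row layout and class names).
$html = <<<HTML
<table>
<tr class="clubHeaderRow"><td>Club Name</td><td>Location and Days/Hours</td><td>Contacts</td></tr>
<tr>
  <td class="clubRow"><a href="http://www.bumpernets.com">BumperNets, Inc.</a></td>
  <td><table><tr><td class="clubRowEven">Riverchase Galleria Mall</td></tr></table></td>
  <td class="clubRow"><a href="mailto:homer.bumpernets@gmail.com">Homer Brown</a></td>
</tr>
<tr>
  <td class="clubRow"><a href="http://www.nattc.com">North Alabama Table Tennis Club</a></td>
  <td><table><tr><td class="clubRowEven">Aquadome Recreation Center</td></tr></table></td>
  <td class="clubRow"><a href="mailto:crpatton@hiwaay.net">Chip Patton</a></td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$clssname = "clubRow";
$groups = array();                  // hypothetical name: one inner array per club row

// Pick only rows that directly contain a clubRow cell. The header row and the
// rows of the nested tables have no such cell, so they are skipped.
$rows = $xpath->query("//tr[td[@class='$clssname']]");
foreach ($rows as $row) {
    $links = array();
    // Relative query (no leading slash), evaluated with $row as context node,
    // so it only sees anchors inside this row's clubRow cells.
    foreach ($xpath->query("td[@class='$clssname']//a", $row) as $a) {
        $links[] = $a->getAttribute("href");
    }
    $groups[] = $links;
}

print_r($groups);
```

Selecting rows by the cells they contain avoids depending on whether the parser sees a <tbody> between <table> and <tr>, which the "//table/tr" path in the original query is sensitive to.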
