Hi all,

I have an HTML file and need to extract the name, address, and phone number from it.

The name is in <span class="name"></span>
The address is in <span class="list_address"></span>
The phone number is in <span style="display:none" ID="phoneVal11"></span>

I have used the following code to get the name, address, and phone number:

<?php


// Each function returns array(list of captured strings, number of matches).

// Find the contents of every <span class="name">.
function get_name($file){
    echo "In Set_Name<br>";
    preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])name\1\s*>\s*(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}

// Find the contents of every <span class="list_address">.
function get_add($file){
    echo "In Set_Add<br>";
    preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])list_address\1\s*>\s*(?:.*?\s+)*?\s*(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}

// Find the contents of every <span ID="phoneValNN">.
function get_phone_no($file){
    echo "In Set_Phone<br>";
    preg_match_all('/<span\s+(?:.*?\s+)?ID=([\"])phoneVal[1-9].*\1\s*>(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}
// Load the HTML page to scan.
$file = file_get_contents("testfile2.html");

$name = get_name($file);
$add = get_add($file);
$phone = get_phone_no($file);
	 
// get names
if ($name[1] != 0) {
    echo "<br/>Names Found: $name[1]<ul>";
    foreach ($name[0] as $key => $val) {
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
} else {
    echo "<br/><div class=\"error\">No Names Found</div><br/>";
}

// get addresses
if ($add[1] != 0) {
    echo "<br/>Addresses Found: $add[1]<ul>";
    foreach ($add[0] as $key => $val) {
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
} else {
    echo "<br/><div class=\"error\">No Addresses Found</div><br/>";
}

// get phone numbers
if ($phone[1] != 0) {
    echo "<br/>Phone No.s Found: $phone[1]<ul>";
    foreach ($phone[0] as $key => $val) {
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
} else {
    echo "<br/><div class=\"error\">No Phone No.s Found</div><br/>";
}
?>

Now the problem:
For names, I only get the correct result from span tags that have no other tag nested inside them. It fails for a span like this:

<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm" onClick='setLSBCookie14(); this.href = "http://clicks.superpages.com/ct/clickThrough?SRC=promo17&amp;target=SP&amp;PN=1&amp;FP=listings&amp;S=NC&amp;C=Insurance&amp;CID=495050&amp;PGID=yp452.8081.1220963148577.12010365560&amp;channelId=sp16202148s&amp;ACTION=log,red&amp;LID=0136893005&amp;relativePosition=14&amp;FL=list&amp;TL=profile&amp;LOC=" + "http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm?SRC=promo17&C=Insurance&L=NC&lbp=1"'">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>

Here the output I get is </a>

So I need a solution that ignores the <a> tag and returns only the text inside it.
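
A likely cause is that "." in a PCRE pattern does not match a newline unless the "s" modifier is set, so the capture group can only hold the last line before </span>. Below is a minimal sketch of one way around it, not a tested fix for the real page. It assumes name spans are never nested inside each other and that no attribute value contains a literal ">". The names extract_names and $sample are hypothetical, and the sample markup is a shortened stand-in for the one above.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns the visible text of every <span class="name">, dropping nested tags.
function extract_names($html) {
    // "s" lets "." cross newlines; "i" ignores the letter case of tags and attributes.
    $pattern = '/<span\b[^>]*\bclass=(["\'])name\1[^>]*>(.*?)<\/span>/si';
    preg_match_all($pattern, $html, $matches);

    $names = array();
    foreach ($matches[2] as $inner) {
        // Remove tags such as <a ...> and </a>, then trim the surrounding whitespace.
        // Assumption: no attribute value contains a literal ">".
        $names[] = trim(preg_replace('/<[^>]*>/', '', $inner));
    }
    return $names;
}

// Shortened stand-in for the markup in the question (the long onClick is omitted).
$sample = '<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>';

print_r(extract_names($sample));
```

Regular expressions are fragile on real-world HTML, so an HTML parser such as PHP's DOM extension may be the sturdier choice if the markup varies.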

The problem with the address spans is that the content contains commas and newlines, e.g.:

<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>

The output I am getting is: NC 27101
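
One possible approach, sketched under assumptions: capture the whole span body with the "s" modifier (so "." also matches newlines), replace tags with spaces, and collapse the remaining whitespace so the address comes out on one line. The names extract_addresses and $sample are hypothetical, and the sketch assumes no attribute value contains a literal ">".

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns each <span class="list_address"> as a single line of text.
function extract_addresses($html) {
    $pattern = '/<span\b[^>]*\bclass=(["\'])list_address\1[^>]*>(.*?)<\/span>/si';
    preg_match_all($pattern, $html, $matches);

    $addresses = array();
    foreach ($matches[2] as $inner) {
        $text = preg_replace('/<[^>]*>/', ' ', $inner); // tags such as <br> become spaces
        $text = preg_replace('/\s+/', ' ', $text);      // newlines and space runs collapse to one space
        $addresses[] = trim($text);
    }
    return $addresses;
}

$sample = '<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>';

print_r(extract_addresses($sample));
```

The commas themselves need no special handling. Only the line breaks get in the way of the original pattern.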

The problem with the phone number is that the span holds the phone number as well as fax or cell numbers, and in the output I get only the last one, e.g.:

<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>

I am getting the output: <br>(336) 896-1060 (fax)
I need both numbers.
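
A rough sketch of one way to get every number, assuming each number sits on its own line introduced by <br> and the span id always has the form phoneVal followed by digits. The names extract_phone_numbers and $sample are hypothetical.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns, for each phoneVal span, an array of every number listed inside it.
function extract_phone_numbers($html) {
    // "s" lets "." cross newlines; "i" lets "id" match the upper-case ID attribute.
    $pattern = '/<span\b[^>]*\bid=(["\'])phoneVal\d+\1[^>]*>(.*?)<\/span>/si';
    preg_match_all($pattern, $html, $matches);

    $result = array();
    foreach ($matches[2] as $inner) {
        // Split the span body on <br> or <br />, then drop the empty pieces.
        $numbers = array();
        foreach (preg_split('/<br\s*\/?>/i', $inner) as $part) {
            $part = trim($part);
            if ($part !== '') {
                $numbers[] = $part;
            }
        }
        $result[] = $numbers;
    }
    return $result;
}

$sample = '<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>';

print_r(extract_phone_numbers($sample));
```

Each span yields its own list, so a listing with a phone and a fax number comes back as two entries and the "(fax)" label stays attached to its number.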

Please help. I have been stuck on this for 2 days, and I am a beginner.
Thanks in advance.
