DaniWeb IT Discussion Community
Perl forum, Software Development category
jyotiu

Extract everything inside a <span> tag, ignoring any other tag, newline, or comma in it

  #1  
Sep 19th, 2008
Hi all,

I have an HTML file, and I need to extract the name, address, and phone number from it.

The name is in <span class="name"></span>
The address is in <span class="list_address"></span>
The phone number is in <span style="display:none" ID="phoneVal11"></span>

I have used the following code to get the name, address, and phone number:

<?php

// Each helper returns array(list of matched strings, number of matches).

function get_name($file){
    echo "In Set_Name<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])name\1\s*>\s*(?:.*?\s+)*?(.*?)\s*<\/span>/', $file, $patterns);
    $res = array();
    array_push($res, $patterns[2]);
    array_push($res, count($patterns[2]));
    return $res;
}

function get_add($file){
    echo "In Set_Add<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])list_address\1\s*>\s*(?:.*?\s+)*?\s*(.*?)\s*<\/span>/', $file, $patterns);
    $res = array();
    array_push($res, $patterns[2]);
    array_push($res, count($patterns[2]));
    return $res;
}

function get_phone_no($file){
    echo "In Set_Phone<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?ID=([\"])phoneVal[1-9].*\1\s*>(?:.*?\s+)*?(.*?)\s*<\/span>/', $file, $patterns);
    $res = array();
    array_push($res, $patterns[2]);
    array_push($res, count($patterns[2]));
    return $res;
}

$str = htmlentities(strip_tags("testfile2.html"));

$file = file_get_contents($str);

$name = get_name($file);
$add = get_add($file);
$phone = get_phone_no($file);

// get names
if($name[1] != 0){
    echo "<br/>Names Found: $name[1]<ul>";
    foreach($name[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Names Found</div><br/>";
}

// get addresses
if($add[1] != 0){
    echo "<br/>Addresses Found: $add[1]<ul>";
    foreach($add[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Addresses Found</div><br/>";
}

// get phone no.s
if($phone[1] != 0){
    echo "<br/>Phone No.s Found: $phone[1]<ul>";
    foreach($phone[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Phone No.s Found</div><br/>";
}
?>

Now the problem:
For names, I only get correct results from span tags that have no other tag nested inside them. For example, given:

<span class="name">
<a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm" onClick='setLSBCookie14(); this.href = "http://clicks.superpages.com/ct/clickThrough?SRC=promo17&amp;target=SP&amp;PN=1&amp;FP=listings&amp;S=NC&amp;C=Insurance&amp;CID=495050&amp;PGID=yp452.8081.1220963148577.12010365560&amp;channelId=sp16202148s&amp;ACTION=log,red&amp;LID=0136893005&amp;relativePosition=14&amp;FL=list&amp;TL=profile&amp;LOC=" + "http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm?SRC=promo17&C=Insurance&L=NC&lbp=1"'">
Murphy, Matthew - State Farm Insurance Agent
</a>
</span>

the output I get is </a>

So I need a solution in which this <a> tag is ignored.
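
One possible approach, sketched here under assumptions: capture everything inside the span with the "s" modifier (so "." also matches newlines) and then remove nested tags with strip_tags(). The function name get_name_text and the simplified sample with a placeholder URL are hypothetical and not from the post above. Markup with broken attribute quoting may confuse strip_tags(), in which case a real HTML parser such as DOMDocument is likely the safer route.

```php
<?php
// Hypothetical helper (not from the original post): returns the visible text
// of every <span class="name"> element, with nested tags such as <a> removed.
function get_name_text(string $html): array
{
    // "s" lets "." match newlines; "i" ignores the case of tags and attributes.
    preg_match_all(
        '/<span\b[^>]*\bclass=(["\'])name\1[^>]*>(.*?)<\/span>/si',
        $html,
        $matches
    );

    $names = array();
    foreach ($matches[2] as $inner) {
        // strip_tags() drops the nested <a ...> and </a>, leaving only text.
        $names[] = trim(strip_tags($inner));
    }
    return $names;
}

// Simplified sample: the URL is a placeholder, not the one from the post.
$sample = "<span class=\"name\">\n<a href=\"http://example.com/\">\n"
        . "Murphy, Matthew - State Farm Insurance Agent\n</a>\n</span>";
print_r(get_name_text($sample));
```

The lazy (.*?) stops at the first </span>, so this sketch assumes that name spans do not contain further nested <span> elements.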

The problem with the address tags is that they contain commas and newlines, e.g.:

<span class="list_address">
<br>1425D West 1st Street,
Winston Salem,
NC 27101
</span>

The output I am getting is: NC 27101
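
A likely cause is that "." does not match newlines unless the "s" modifier is set, so only the last line ends up in the capture group. The following is a minimal sketch, assuming it is acceptable to join the address onto one line; the function name get_address_text is hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): returns each
// <span class="list_address"> as a single line of text.
function get_address_text(string $html): array
{
    // "s" lets "." match newlines, so multi-line addresses are captured whole.
    preg_match_all(
        '/<span\b[^>]*\bclass=(["\'])list_address\1[^>]*>(.*?)<\/span>/si',
        $html,
        $matches
    );

    $addresses = array();
    foreach ($matches[2] as $inner) {
        $text = strip_tags($inner);                 // drop <br> and other tags
        $text = preg_replace('/\s+/', ' ', $text);  // newlines and space runs become one space
        $addresses[] = trim($text);
    }
    return $addresses;
}

$sample = "<span class=\"list_address\">\n<br>1425D West 1st Street,\n"
        . "Winston Salem,\nNC 27101\n</span>";
print_r(get_address_text($sample));
```

The commas are kept as they are; only tags and line breaks are removed.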

The problem with the phone number is that the span contains the phone number as well as fax or cell numbers, so in the output I get only the last number, e.g.:
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax)
</span>

I am getting the output: <br>(336) 896-1060 (fax)
I need both numbers.
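
To keep every number, one option is to capture the whole span body and split it on the <br> tags. This is only a sketch: the function name get_phone_numbers is hypothetical, and it assumes the numbers inside a span are always separated by <br>.

```php
<?php
// Hypothetical helper (not from the original post): for each span whose ID
// starts with "phoneVal", returns the list of numbers found inside it.
function get_phone_numbers(string $html): array
{
    // "i" makes id= match ID= as well; "s" lets "." match newlines.
    preg_match_all(
        '/<span\b[^>]*\bid=(["\'])phoneVal\d+\1[^>]*>(.*?)<\/span>/si',
        $html,
        $matches
    );

    $result = array();
    foreach ($matches[2] as $inner) {
        // Split on <br>, <br/> or <br /> so each number becomes its own entry.
        $parts = preg_split('/<br\s*\/?>/i', $inner);
        $numbers = array();
        foreach ($parts as $part) {
            $part = trim(strip_tags($part));
            if ($part !== '') {
                $numbers[] = $part;  // skip the empty piece before the first <br>
            }
        }
        $result[] = $numbers;
    }
    return $result;
}

$sample = "<span style=\"display:none\" ID=\"phoneVal14\"><br>(336) 722-1718\n"
        . "<br>(336) 896-1060 (fax)\n</span>";
print_r(get_phone_numbers($sample));
```

Each span yields its own sub-array, so a listing with a phone and a fax number stays grouped together.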

Please help. I have been stuck on this for 2 days now, and I am a beginner.
Thanks in advance.
KevinADC

Re: Extract everything inside a <span> tag, ignoring any other tag, newline, or comma in it

  #2  
Sep 19th, 2008
This is the Perl forum, not the PHP forum.