Extract everything between <span> tags, ignoring any other tag, newline, or comma inside
#1
Sep 19th, 2008
Hi all,
I have an HTML file and need to extract the name, address, and phone number from it.
The name is in <span class="name"></span>
The address is in <span class="list_address"></span>
The phone number is in <span style="display:none" ID="phoneVal11"></span>
I used the following code to get the name, address, and phone number:
<?php
function get_name($file){
    echo"In Set_Name<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])name\1\s*>\s*(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}
function get_add($file){
    echo"In Set_Add<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?class=([\"\'])list_address\1\s*>\s*(?:.*?\s+)*?\s*(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}
function get_phone_no($file){
    echo"In Set_Phone<br>";
    $h1count = preg_match_all('/<span\s+(?:.*?\s+)?ID=([\"])phoneVal[1-9].*\1\s*>(?:.*?\s+)*?(.*?)\s*<\/span>/',$file,$patterns);
    $res = array();
    array_push($res,$patterns[2]);
    array_push($res,count($patterns[2]));
    return $res;
}
$str = htmlentities(strip_tags("testfile2.html"));
$file = file_get_contents($str);
$name = get_name($file);
$add = get_add($file);
$phone = get_phone_no($file);
// get names
if($name[1] != 0){
    echo "<br/>Names Found: $name[1]<ul>";
    foreach($name[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Names Found</div><br/>";
}
// get addresses
if($add[1] != 0){
    echo "<br/>Addresses Found: $add[1]<ul>";
    foreach($add[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Addresses Found</div><br/>";
}
// get phone no.s
if($phone[1] != 0){
    echo "<br/>Phone No.s Found: $phone[1]<ul>";
    foreach($phone[0] as $key => $val){
        echo "<li>" . htmlentities($val) . "</li>";
    }
    echo "</ul>";
}else{
    echo "<br/><div class=\"error\">No Phone No.s Found</div><br/>";
}
?>
Now the problem is:
For names, I only get the right result from span tags that have no other tag nested inside them. For example, with:
<span class="name"> <a href="http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm" onClick='setLSBCookie14(); this.href = "http://clicks.superpages.com/ct/clickThrough?SRC=promo17&target=SP&PN=1&FP=listings&S=NC&C=Insurance&CID=495050&PGID=yp452.8081.1220963148577.12010365560&channelId=sp16202148s&ACTION=log,red&LID=0136893005&relativePosition=14&FL=list&TL=profile&LOC=" + "http://www.superpages.com/bp/Winston-Salem-NC/Murphy-Matthew-State-Farm-Insurance-Agent-L0136893005.htm?SRC=promo17&C=Insurance&L=NC&lbp=1"'"> Murphy, Matthew - State Farm Insurance Agent </a> </span>
the output I get is </a>
so I need a solution that ignores the <a> tag.
The problem with the address spans is that they contain commas and newlines, e.g.:
<span class="list_address"> <br>1425D West 1st Street, Winston Salem, NC 27101 </span>
The output I get is: NC 27101
The problem with the phone spans is that they hold a phone number as well as fax or cell numbers, so the output contains only the last number, e.g.:
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718 <br>(336) 896-1060 (fax) </span>
The output I get is: <br>(336) 896-1060 (fax)
I need both numbers.
Please help. I have been stuck on this for 2 days and I am a beginner.
Thanks in advance.
Re: Extract everything between <span> tags, ignoring any other tag, newline, or comma inside
#2
Sep 19th, 2008
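A likely cause, assuming the real file has line breaks inside the spans (the samples in the question look flattened onto one line): without the s modifier, "." does not match a newline, so the optional (?:.*?\s+)*? group has to consume everything up to the last line break and the capture group keeps only the final line. That would explain getting </a>, NC 27101, and only the fax line. Below is a minimal sketch of one way around it: capture the whole span body with the s modifier, then remove tags and collapse whitespace afterwards. The helper names (span_bodies, clean_text, phone_numbers), the shortened example.com URL, and the line breaks in the sample are assumptions for illustration, not taken from the original file. It also assumes spans are not nested and that no attribute value contains ">".

```php
<?php
// Hypothetical helpers for illustration; none of these names come from the question.

// Return the raw inner HTML of every <span> whose opening tag matches $attr.
// $attr is a regex fragment; it must not contain capturing groups or "/".
function span_bodies(string $html, string $attr): array
{
    // s: "." also matches newlines, so multi-line spans are captured whole.
    // i: tag and attribute names match regardless of case (ID vs id).
    preg_match_all('/<span\b[^>]*\b' . $attr . '[^>]*>(.*?)<\/span>/si', $html, $m);
    return $m[1];
}

// Replace tags with spaces, decode entities, and collapse whitespace runs.
function clean_text(string $fragment): string
{
    $text = preg_replace('/<[^>]*>/', ' ', $fragment);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Split a phone span on <br> so each number becomes its own entry.
function phone_numbers(string $fragment): array
{
    $parts = preg_split('/<br\s*\/?>/i', $fragment);
    $parts = array_map('clean_text', $parts);
    return array_values(array_filter($parts, 'strlen'));
}

// Sample input modelled on the question; URL and line breaks are assumed.
$html = <<<'HTML'
<span class="name">
  <a href="http://www.example.com/profile.htm"> Murphy, Matthew - State Farm Insurance Agent </a>
</span>
<span class="list_address"> <br>1425D West 1st Street,
  Winston Salem,
  NC 27101 </span>
<span style="display:none" ID="phoneVal14"><br>(336) 722-1718
<br>(336) 896-1060 (fax) </span>
HTML;

$names     = array_map('clean_text', span_bodies($html, 'class=["\']name["\']'));
$addresses = array_map('clean_text', span_bodies($html, 'class=["\']list_address["\']'));
$phones    = array_map('phone_numbers', span_bodies($html, 'id=["\']phoneVal\d+["\']'));

print_r($names);
print_r($addresses);
print_r($phones);
```

Tags are removed with a plain regex rather than strip_tags() because strip_tags() tracks quotes inside tags, and the stray " just before the > of the sample <a> tag may make it swallow the name text. For anything beyond a one-off scrape, an HTML parser such as PHP's DOMDocument is usually more robust than regular expressions.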