Please help me extract the inner tables and move them out to the top level, so that no table is nested within another table (nested tables are not good data). Thank you.

Raw Data

<item>
	<pre>
	<table width="100%" border="0" cellpadding="2" cellspacing="2" bgcolor="#eeeeee">
	 <tr>
	  <td><p>"Gratitude for the abundance you have received is
	the best insurance that the abundance will continue."<br>
	                              -Muhammad</p></td>
	  </tr>
	  <tr>
	   <td></td>
	  </tr>
	  <tr>
	    <td><p><p>"Gratitude for the abundance you have received is
	the best insurance that the abundance will continue."<br>
	                              -Muhammad</p></td>
	    </tr>
	    <tr>
	    <td></td>
	    </tr>
	    <tr>
	    <td><pre><table cellspacing="5" cellpadding="0" width="100" align="left">
	        <tr>
	         <td>
	         <pre><table cellspacing="0" cellpadding="5" width="100" bgcolor="#eeeeee">
	          <tr>
	           <td class="art"></td>
	          </tr>
	          <tr>
	            <td class="art">Photo courtesy of Bellentani</td>
	          </tr>
	        </table></pre>           
	        </td>
	        </tr>
	     </table></pre></td> </tr>
	</table></pre>
	</item>

The code I use:

while(<INFILE>){
	 s/<\/?pre>//gm;
	 s/(<table)\s+[^\000]*?>/$1>/mgi;
	 
	 s/<(\/?table)/<$1_tmp/g;
	  while(/<table_tmp[^>]*?>[^\000]*?<\/table_tmp>/){
	    $text=$&;
	    $text=~s/<(\/?table)_tmp[^>]*?>/<\$1>/g;
	    $para="";
	    while($text=~s/(<table>[^\000]*?<\/table>)//){
	      $para="$para\n$1";
	    }
	    $para=~s/<table>([^\000]*?)<\/table>/<table>$1<\/table>/mgi;
	    s/(<table_tmp[^>]*?>[^\000]*?<\/table_tmp>)/$text\n$para/;
	  }
	  s/<(\/?table)[^>]*?>/<$1>/g;
}

print OUTFILE $_;

The output should be:

<item>
	 
	<table>
	 <tr>
	  <td><p>"Gratitude for the abundance you have received is
	the best insurance that the abundance will continue."<br>
	                              -Muhammad</p></td>
	  </tr>
	  <tr>
	   <td></td>
	  </tr>
	  <tr>
	    <td><p><p>"Gratitude for the abundance you have received is
	the best insurance that the abundance will continue."<br>
	                              -Muhammad</p></td>
	    </tr>
	    <tr>
	    <td></td>
	    </tr>
	    <tr>
	    <td>
	     </td>
	     </tr>
	</table>
	 
	<table cellspacing="5" cellpadding="0" width="100" align="left">
	        <tr>
	         <td></td> </tr>
	</table>
	          
	<table>
	    <tr>
	     <td class="art"></td>
	    </tr>
	    <tr>
	     <td class="art">Photo courtesy of Bellentani</td>
	   </tr>
	</table>          
	  
	</item>

Thank you

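use strict;
use warnings;

# Slurp the whole file so that tables spanning several lines can be matched.
open(my $fin, '<', 'test.xml') or die "Error: cannot open test.xml: $!";
my $file = do { local $/; <$fin> };
close($fin);

my @table;
# Repeatedly cut out the innermost table (one that contains no other
# <table> tag) until none is left. Once an inner table is removed,
# its parent becomes an innermost table on a later pass.
while ($file =~ s{(<table\b(?:(?!</?table\b).)*</table>)}{}si) {
	unshift(@table, $1);    # the outermost table ends up first
}

# Show the extracted contents
for my $i (0 .. $#table) {
	print "\n\n---- Table", $i + 1, " -------\n$table[$i]";
}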

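Try this code. It pulls out each table on its own, innermost first, so no extracted table contains another table, and it prints the outermost table first. You will need to do some further processing (for example, removing the <pre> tags) to get exactly your expected output.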
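As a follow-up, here is a minimal sketch of that further processing, applied to a small inline sample instead of test.xml. The helper name flatten_tables and the sample string are made up for illustration, and stripping the attributes from every <table> tag is an assumption (the expected output in the question keeps them on one table). It also assumes no attribute value contains a ">" character. For real-world HTML a proper parser would be more robust than a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the tables found in
# $html as a flat list, outermost first, with no table nested in another.
sub flatten_tables {
    my ($html) = @_;
    $html =~ s{</?pre>}{}gi;    # drop the <pre> wrappers
    my @tables;
    # Cut out the innermost table (one with no <table> tag inside it)
    # until none is left; each parent becomes innermost on a later pass.
    while ($html =~ s{(<table\b(?:(?!</?table\b).)*</table>)}{}si) {
        my $table = $1;
        $table =~ s{<table\b[^>]*>}{<table>}i;    # assumption: drop the attributes
        unshift @tables, $table;
    }
    return @tables;
}

# Made-up sample: one table nested inside another.
my $sample = '<table width="100%"><tr><td>outer</td></tr>'
           . '<tr><td><pre><table border="0"><tr><td>inner</td></tr></table></pre></td></tr></table>';

print "$_\n\n" for flatten_tables($sample);
```

The outer table is printed first with an empty cell where the inner table used to be, followed by the inner table on its own.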

Edited 5 Years Ago by k_manimuthu: n/a

Comments
Nice work.