
Remove specific whitespace from text, compare 4 files and output the result

Hi,
I have 4 different files that contain the text below.
I need to remove specific whitespace from the text, find any number that appears in more than one file, and print the result to a new file whose lines include the following:
1- The number
2- The letters that appear next to the number
3- The name of the file that the output was taken from.

Example:
The text looks like this in the files:
file 1 named PT1.txt:
IBN         LG   25 1 03 08       077 437 1234      CPB         GWLPOT
IBN         LG   25 1 03 08       077 437 1111      IDL           GWLPOT
file 2 named PT2.txt:
IBN         LG   25 1 03 08       077 437 1113      PLO         GWLPOT
IBN         LG   25 1 03 08       077 437 2738      SB           GWLPOT
IBN         LG   25 1 03 08       077 411 2238      MB           GWLPOT
file 3 named YK1.txt:
IBN         LG   25 1 03 08       077 437 1113      SB           GWLPOT
IBN         LG   25 1 03 08       077 411 2738      SB           GWLPOT
IBN         LG   25 1 03 08       077 411 2338      MB           GWLPOT

file 4 named YK2.txt:
IBN         LG   25 1 03 08       077 437 1113      PLO         GWLPOT
IBN         LG   25 1 03 08       077 437 2738      SB           GWLPOT
IBN         LG   25 1 03 08       077 437 2738      MB           GWLPOT

The output that I need to get is:
0774371113      SB | YK1.txt
0774371113      PLO | YK2.txt
I wrote a script for one file that deletes some info from the lines, and the output looks like this:
077 437 2738      CPB
077 437 2738      CPB

The script looks like this:
#!/usr/bin/perl   
$file = "PT1.txt";   
open (IN, $file) || die "Cannot open file ".$file." for read";        
@lines=<IN>;     
open (OUT, ">", $file) || die "Cannot open file ".$file." for write"; 
foreach $line (@lines)   
{

$line =~ s/\s\d{2}\s | \s\d{2}\s{6}//ig;
$line =~ s/\s\d{2}\s | \s\d{2}\s{6}//ig;
$line =~ s/\s\d{1}\s | \s\d{2}\s{2}//ig;
$line =~ s/\s\d{2}\s//ig;
$line =~ s/IBN|LG|GWLPOT//ig;
$line =~ s/\s{11}//ig;


   print OUT $line;     
}     
close OUT; 

Please advise me on this issue.
Thanks in advance.
erezz

To answer the first part of your question about removing spaces from the number and reading from four input files:

#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(PT1.txt PT2.txt YK1.txt YK2.txt);

while (my $line = <>){
    my @fields = split /\s+/, $line;
    print @fields[6..8], "\t", $fields[9], ' | ', $ARGV, "\n";
}
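
To make the field positions easier to see, here is a minimal sketch run on one sample line copied from PT1.txt in the question. The helper name `parse_line` is made up for illustration and is not part of the script above; it assumes every data line has the same layout as the samples, with the number in fields 6 to 8 and the letters in field 9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the number with its
# internal spaces removed, plus the letters that follow it.
sub parse_line {
    my ($line) = @_;

    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @fields = split ' ', $line;

    # Assumption: a valid line has at least 10 fields, as in the samples.
    return unless @fields >= 10;

    # Fields 6..8 are the three parts of the number; field 9 is the letters.
    return ( join( '', @fields[ 6 .. 8 ] ), $fields[9] );
}

my ( $number, $letters ) =
  parse_line("IBN         LG   25 1 03 08       077 437 1234      CPB         GWLPOT\n");
print "$number\t$letters\n";
```

Joining the three fields is what removes the spaces inside the number; the rest of the line is simply not printed.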
d5e5

This does everything you want, I believe (please also take it as a guide in the right direction).
Also note that the final output goes to a new file called new_file.txt, or any other name you choose to give it.

#!/usr/bin/perl
use warnings;
use strict;

my @files = qw(PT1.txt PT2.txt YK1.txt YK2.txt);
my %has_data;
my %sorter;

foreach my $file (@files) {
    open my $fh, '<', $file or die "can't open $file:$!";
    while (<$fh>) {
        chomp;
        my @rec = split;
        next if @rec < 10;    # skip blank or incomplete lines
        my $num = join "", @rec[ 6 .. 8 ];    # e.g. 077 437 1113 becomes 0774371113
        ++$sorter{$num};
        push @{ $has_data{$num} }, "$rec[9] | $file";
    }
    close $fh or die "can't close file:$!";
}

foreach ( keys %sorter ) {
    delete $has_data{$_} if $sorter{$_} == 1;
}

open my $fh, '>', 'new_file.txt' or die "can't open file:$!"; # Output 
foreach my $number ( keys %has_data ) {
    foreach ( @{ $has_data{$number} } ) {
        print $fh $number, "\t", $_, $/;
    }
}
close $fh or die "can't close file:$!";

Just run this script from your CLI. Note that all your files, namely PT1.txt, PT2.txt, YK1.txt and YK2.txt, must be in the same directory as the script (though this is not cast in stone; you can change the paths as you wish).
Hope this helps!

MY OUTPUT (new_file.txt)

0774371113  PLO | PT2.txt
0774371113  SB | YK1.txt
0774371113  PLO | YK2.txt
0774372738  SB | PT2.txt
0774372738  SB | YK2.txt
0774372738  MB | YK2.txt
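
One thing to be aware of: the script above counts every occurrence of a number, so a number that appears twice in a single file and nowhere else would also be reported. If "appears in more than one file" should be taken strictly, a possible variant is to count distinct files per number instead. This is only a sketch under that assumption; the sub name `numbers_in_multiple_files` is made up for illustration, and it prints to standard output rather than to new_file.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of the form
#   number => { file => [ letters, ... ] }
# keeping only numbers that were seen in at least two different files.
sub numbers_in_multiple_files {
    my @files = @_;
    my %seen;

    foreach my $file (@files) {
        open my $in, '<', $file or die "can't open $file: $!";
        while ( my $line = <$in> ) {
            my @rec = split ' ', $line;
            next if @rec < 10;    # skip blank or incomplete lines
            my $num = join '', @rec[ 6 .. 8 ];
            push @{ $seen{$num}{$file} }, $rec[9];
        }
        close $in or die "can't close $file: $!";
    }

    # Drop numbers found in one file only, however often they occur there.
    foreach my $num ( keys %seen ) {
        delete $seen{$num} if scalar( keys %{ $seen{$num} } ) < 2;
    }
    return \%seen;
}

# File names are taken from the command line.
my $found = numbers_in_multiple_files(@ARGV);

foreach my $num ( sort keys %$found ) {
    foreach my $file ( sort keys %{ $found->{$num} } ) {
        print "$num\t$_ | $file\n" for @{ $found->{$num}{$file} };
    }
}
```

It could be run as `perl script.pl PT1.txt PT2.txt YK1.txt YK2.txt > new_file.txt`, where script.pl is whatever name you save it under.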
2teez
Question Answered as of 1 Year Ago by d5e5 and 2teez

Thank you both very much for your help, your time, and your quick and efficient answers.

erezz
