Hi,

I have 2 large log files in .gz format

file1.gz contains

abcde
12345
23456
.
.
.
.
.
.
.
.
09123


file2.gz contains
abcde,1,2,3,4,5,6,7
09123,3,4,5,6,7,7,8
23456,9,6,5,4,3,2,1
....
...
...
...

I am basically looking for a script that opens file1, reads it line by line, and checks whether each string occurs in file2. Matching lines from file2 should be redirected to a third file, file3. The match should work irrespective of the position at which the string occurs in file2, and the output file should also be compressed in .gz format.

Since I have limited disk space, I need a script that works on the files in compressed form only, without unpacking them to disk first.


Not having your input files, you are going to have to check to see if this works for the types of patterns you have in your files.
See if this helps

#!/usr/bin/perl 

use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# All three arguments are required; report usage on STDERR and exit non-zero.
if ( @ARGV < 3 ) {
   print STDERR "Usage: $0 <patterns.gz> <search.gz> <output.gz>\n";
   exit 1;
}

my $href = {};
my $outfile = $ARGV[2];
loadHash($href,$ARGV[0]);
$outfile .= ".gz" unless ( $ARGV[2] =~ /\.gz$/);
checkFile($href,$ARGV[1],$outfile);

##### Subs
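sub loadHash {
   my ($ref,$file) = @_;
   my $fh;
   if ( $file =~ /\.gz$/ ) {
       # List-form pipe open: the file name never passes through a shell.
       open($fh, '-|', 'gunzip', '-c', $file) or
               die "Unable to open pipe to $file: $!\n";
   }
   else {
       open($fh, '<', $file) or die "Unable to open $file: $!\n";
   }

   # Store the first whitespace-delimited token of each line as a hash key.
   while(<$fh>) {
      $ref->{$1} = 1 if ( /^(\S+)/);
   }
   close($fh);
}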

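sub checkFile {
   my ($ref,$file,$output) = @_;
   my $fh;
   if ( $file =~ /\.gz$/ ) {
       # List-form pipe open: the file name never passes through a shell.
       open($fh, '-|', 'gunzip', '-c', $file) or
             die "Unable to open pipe to $file: $!\n";
   }
   else {
       open($fh, '<', $file) or die "Unable to open $file: $!\n";
   }
   my $z = IO::Compress::Gzip->new($output)
                 or die "gzip failed: $GzipError\n";
   while(<$fh>){
      # Capture only the first comma-separated field. [^,\s]+ cannot run
      # past a comma the way a greedy \S+ would on lines like abcde,1,2,3.
      if ( /^([^,\s]+),/) {
         $z->print($_) if (exists($ref->{$1}));
      }
   }
   $z->close();
   close($fh);
}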

Pattern file:

$ gunzip -c pat.gz
abc
123
bbb
333
890
ddd

Search file:

$ gunzip -c search.gz
This,
is,
a,
test,
abc,Got the right line
to, 
see,
if,
I, 
bbb, Got another line
can,
make,
this,
work,

Output:

$ gunzip -c output.gz
abc,Got the right line
bbb, Got another line
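
The script above only looks at the first comma-separated field of each line in the search file. If "irrespective of position" in the question was also meant to cover a key that shows up in a later field, the test could be widened along these lines. This is only a sketch: lineMatchesAnyField is a hypothetical helper name, not part of the script above, and the sample keys are taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns 1 if ANY
# comma-separated field of $line is a key in the pattern hash %$ref.
sub lineMatchesAnyField {
   my ($ref, $line) = @_;
   chomp $line;
   for my $field ( split /,/, $line ) {
      $field =~ s/^\s+|\s+$//g;        # ignore blanks around a field
      return 1 if exists $ref->{$field};
   }
   return 0;
}

# Small demonstration using the sample keys from the question.
my %patterns = map { $_ => 1 } qw(abcde 12345 23456 09123);
for my $line ( "abcde,1,2,3,4,5,6,7\n", "x,y,09123,z\n", "no,match,here\n" ) {
   print lineMatchesAnyField(\%patterns, $line) ? "match:    " : "no match: ",
         $line;
}
```

Inside checkFile, the body of the while loop would then become a single line, $z->print($_) if lineMatchesAnyField($ref, $_); Note that fields are compared whole, so a key of abc does not match a field of abcd.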

How do I execute the script? Can you provide the syntax of the command?

Just save the code to foo.pl, make it executable with chmod 755 foo.pl (on Linux), and run:

$ ./foo.pl
Usage: ./foo.pl <patterns.gz> <search.gz> <output.gz>
# In your example it would be
$ ./foo.pl file1.gz file2.gz outfile.gz

On Windows, run it however you normally run Perl scripts there.

It is giving an IO::Compress::Gzip error.

What is the error? Is it on the use line? I had to download that module myself.

This is the exact error I got:

Can't locate IO/Compress/Gzip.pm in @INC (@INC contains: /usr/perl5/5.8.4/lib/i86pc-solaris-64int /usr/perl5/5.8.4/lib /usr/perl5/site_perl/5.8.4/i86pc-solaris-64int /usr/perl5/site_perl/5.8.4 /usr/perl5/site_perl /usr/perl5/vendor_perl/5.8.4/i86pc-solaris-64int /usr/perl5/vendor_perl/5.8.4 /usr/perl5/vendor_perl .) at ./foo.pl line 5.
BEGIN failed--compilation aborted at ./foo.pl line 5.

Does the module need to be installed in a specific path on Solaris, or will just un-tarring the file install it automatically?

I'm using a Mac. I would assume that it should be the same for you (not sure though). I un-tar the file and cd into the directory. In that directory is a Makefile.PL. Then I type the commands

$ perl Makefile.PL
$ make 
$ make install

You might need su permissions to do all of that on your system.
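
If installing the module is not an option, another route is to write the output through a pipe to the external gzip program, the same way the script already reads its input through gunzip -c. The sketch below assumes gzip and gunzip are on your PATH. openGzipWriter and filterGz are hypothetical names, and $ref is the pattern hash that loadHash fills in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a write handle whose data is compressed by
# an external gzip process and stored in $output.
sub openGzipWriter {
   my ($output) = @_;
   my $pid = open(my $out, '|-');     # fork; the child reads our writes on STDIN
   die "Cannot fork: $!\n" unless defined $pid;
   if ( $pid == 0 ) {                 # child: point STDOUT at $output, become gzip
      open(STDOUT, '>', $output) or die "Cannot write $output: $!\n";
      exec('gzip', '-c') or die "Cannot run gzip: $!\n";
   }
   return $out;
}

# Hypothetical replacement for checkFile that does not need
# IO::Compress::Gzip: copies lines whose first field is a key in %$ref.
sub filterGz {
   my ($ref, $file, $output) = @_;
   open(my $in, '-|', 'gunzip', '-c', $file)
      or die "Unable to open pipe to $file: $!\n";
   my $out = openGzipWriter($output);
   while (<$in>) {
      print {$out} $_ if /^([^,\s]+),/ && exists $ref->{$1};
   }
   close($in);
   # close() waits for gzip to finish, so a failure shows up here.
   close($out) or die "gzip failed for $output (exit status $?)\n";
}
```

With these two subs in foo.pl, calling filterGz($href, $ARGV[1], $outfile) instead of checkFile and deleting the use IO::Compress::Gzip line should remove the dependency. Nothing is unpacked to disk either way, since both directions go through pipes.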
