Perl: search and print records from large files

Question

hema_24 0 Newbie Poster

11 Years Ago

Hi Experts,

I have a file called condition_file, and the data in it looks like this:

condition_file

K 01
J 02
H 03
I 04

I am using the code below to read condition_file and place the values into a hash:

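my %records;
open my $cond_fh, '<', $condition_file or die "Cannot open $condition_file: $!";
while (<$cond_fh>) {
    chomp;
    next unless /\S/;                   # skip blank lines
    my ( $field1, $field2 ) = split;    # "K 01" -> ("K", "01")
    $records{$field1} = $field2;        # first column => second column
}
close $cond_fh;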

I am reading the list of files into an array:

@array

file-1
file-2
file-3
file-4

The content of each file:

file-1

K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234

file-2

H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
K,02,Europe2,Sweden3,Mace2,rank14,check2,21234

file-3

H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

file-4

H,16,German2,Ger,hgtk,rank4,hekc,1245
I,17,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

I need to check whether the first field in condition_file exists in any of the files listed in @array:

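foreach my $file_name (@array) {
    open my $in, '<', $file_name or die "Cannot open $file_name: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $field1, $field2 ) = split /,/, $line;
        next unless defined $field2;    # skip blank or short lines
        # Does field1 appear in condition_file with the same second value?
        if ( exists $records{$field1} && $records{$field1} eq $field2 ) {
            print OUTPUT join( ",", $field1, $field2 ), "\n";    # OUTPUT is opened elsewhere
        }
    }
    close $in;
}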

The code works like this: it reads each file (for example file-1), takes the first two fields (field1 = K, field2 = 01), and checks whether field1 exists in the %records hash. If it does, it looks up the second value stored for that key in condition_file and compares it with field2; if they are equal, it prints the values to OUTPUT. This writes the following to the output text file:

Output I got:

output file

K,01,
J,02,
K,01,
J,02,
H,03,
I,04,
K,01,
H,03,
I,04,
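The output above contains only the first two fields because the print statement re-joins just $field1 and $field2; printing the original line instead keeps the whole record. Below is a minimal sketch of that per-line check. `matching_record` is a hypothetical helper name, and %records is assumed to map the first column of condition_file to its second column (K => '01', and so on).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the full record if its first two
# comma-separated fields match an entry in %$records, else undef.
# Assumes $records maps first field => second field, e.g. { K => '01' }.
sub matching_record {
    my ( $records, $line ) = @_;
    chomp $line;
    # Split into at most 3 parts: field1, field2, and the untouched rest.
    my ( $field1, $field2 ) = split /,/, $line, 3;
    return unless defined $field1 && defined $field2;
    return unless exists $records->{$field1};
    return $records->{$field1} eq $field2 ? $line : undef;
}

my %records = ( K => '01', J => '02', H => '03', I => '04' );
for my $line ( 'K,01,Europe,Sweden,Mace,rank1,check,01234',
               'K,02,Europe2,Sweden3,Mace2,rank14,check2,21234' ) {
    my $rec = matching_record( \%records, $line );
    print "$rec\n" if defined $rec;
}
```

Running it prints only the first sample line, because K,02 does not match the K 01 entry in condition_file.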

Output I am expecting:

output file

K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234
H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562

The above logic takes almost 4 hours to fetch the data from these large files.
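One way to keep the per-line work small is to store each condition pair as a single composite key such as "K,01", so each data line costs one regex capture and one hash lookup, without splitting the rest of the record. The sketch below assumes that layout; `load_keys` and `filter_fh` are hypothetical names, the in-memory strings stand in for real files, and how much time it saves on your data is something to measure rather than assume.

```perl
use strict;
use warnings;

# Hypothetical helper: read "K 01" style lines into a set of "K,01" keys.
sub load_keys {
    my ($fh) = @_;
    my %keys;
    while ( my $line = <$fh> ) {
        my ( $f1, $f2 ) = split ' ', $line;    # whitespace split, ignores leading spaces
        next unless defined $f2;               # skip blank or malformed lines
        $keys{"$f1,$f2"} = 1;
    }
    return \%keys;
}

# Hypothetical helper: print every line whose first two fields form a known key.
# Returns the number of lines printed.
sub filter_fh {
    my ( $keys, $in, $out ) = @_;
    my $count = 0;
    while ( my $line = <$in> ) {
        # Capture "field1,field2" without splitting the rest of the record.
        next unless $line =~ /^([^,]*,[^,\r\n]*)/;
        next unless exists $keys->{$1};
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Demo with in-memory filehandles standing in for real files.
my $cond = "K 01\nJ 02\n";
open my $cond_fh, '<', \$cond or die $!;
my $keys = load_keys($cond_fh);

my $data = "K,01,Europe,Sweden\nK,02,Europe2,Sweden3\nJ,02,Australia,Sydney\n";
open my $in, '<', \$data or die $!;
filter_fh( $keys, $in, \*STDOUT );    # prints the K,01 and J,02 lines
```

Because the regex stops before the second comma, long records are never broken into all of their fields just to compare the first two.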


2teez 43 Posting Whiz · Answer 1 · 2013-11-01T05:20:32+00:00

Hi hema_24,

One way of solving this kind of problem is this:
Build a hash from the data in your condition_file, then read the other files one at a time. Since only the first two comma-separated columns are needed for the comparison, you can extract just those, compare them with the keys of the hash, and print the lines that match.

The code below demonstrates the description above:
NOTE: For the code below to work, you need to have the module Inline::Files installed on your system. Please check CPAN.

use warnings;
use strict;
use Inline::Files;

my %rec;

while (<DATA>) {
    next if /^$/;
    my $data = join '' => split /\s+/, $_;
    $rec{$data} = 1;
}

for my $fh (qw(FILE_1 FILE_2 FILE_3 FILE_4)) {

    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^$/;
        my $str_for_compare = join '' => ( split /,/, $line )[ 0, 1 ];
        # exists does one hash lookup per line instead of comparing against every key
        print $line, $/ if exists $rec{$str_for_compare};
    }
}

__DATA__
K 01
J 02
H 03
I 04


__FILE_1__
K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234

__FILE_2__
H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
K,02,Europe2,Sweden3,Mace2,rank14,check2,21234

__FILE_3__
H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

__FILE_4__
H,16,German2,Ger,hgtk,rank4,hekc,1245
I,17,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

Instead of Inline::Files, you can use the three-argument form of open, e.g. open my $filehandle, '<', $file or die "can't open file: $!"; to read your files one at a time.

The output produced is:

K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234
H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562

Also note that your files are CSV files, so you may also want to look into modules like Text::CSV_XS or Text::CSV.
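A minimal sketch of the three-argument open suggestion mentioned above, reading real files with lexical filehandles instead of Inline::Files. It assumes, hypothetically, that the condition file is the first command-line argument and the data files follow; `read_conditions` and `scan_file` are illustrative names. Keys are joined with a comma rather than an empty string so that, say, K,01 and K0,1 cannot collide, and each data line needs a single exists lookup. It splits on plain commas, so it assumes no field contains a quoted comma, which is the case Text::CSV_XS or Text::CSV would handle.

```perl
use strict;
use warnings;

# Hypothetical helper: read the condition file ("K 01" per line) into a hash.
sub read_conditions {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my %rec;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*$/;               # skip blank lines
        my $key = join ',' => split ' ', $line; # "K 01\n" -> "K,01"
        $rec{$key} = 1;
    }
    close $fh;
    return \%rec;
}

# Hypothetical helper: print matching lines from one data file to $out.
sub scan_file {
    my ( $path, $rec, $out ) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $f1, $f2 ) = split /,/, $line, 3;
        next unless defined $f2;                # skip blank or one-field lines
        print {$out} $line, "\n" if exists $rec->{"$f1,$f2"};
    }
    close $fh;
    return;
}

# Hypothetical usage: perl match_records.pl condition_file file-1 file-2 ...
if ( @ARGV >= 2 ) {
    my ( $condition_file, @files ) = @ARGV;
    my $rec = read_conditions($condition_file);
    scan_file( $_, $rec, \*STDOUT ) for @files;
}
```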