1,105,242 Community Members

Compare both two columns of two files

Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hello Perl Guru's

I am playing with two text files using perl but i have been end up after a one day effort and got nothing there fore i have decided to post some thing here for help well here are some details that what actually i want to do

(I am genarating one file from shell script and on the end of execution this file will be renamed to file_old and after that a shell script will be executed on next day and it will generate a file with name file_new and then i want to compare both the files that is if a value of col2 has been changed then show me the value of col1 either from FileA or FileB)

I have two text files name FileA and FileB and both files have two columns like

FileA

Col1 Col2
ABC 123
ABC 987
DEF 456
DEF 898
DEF 658
GHI 789

and FileB also have two columns and it looks like

Col1 Col2
ABC 123
ABC 987
DEF 456
DEF 898
DEF 658
GHI 789
GHI 435
GHI 654
KLM 543
KLM 123
KLM 324

now i want to compare the col2 of both files against col1 if any data in col2 has moved in first or second file then show me the value in col1 from both files i.e., if data has moved from fileA to fileB then show me the col1 value of fileB and if data has moved from fileB to fileA then show me the col1 value of fileA

Thanks in advance

Omer

Member Avatar
replic
Junior Poster in Training
52 posts since Nov 2008
Reputation Points: 17 [?]
Q&As Helped to Solve: 6 [?]
Skill Endorsements: 0 [?]
 
0
 

It would be nice if you post some code so we can help you with any problems you ran into.
Did you try to just read in the files split them and then compare them?

Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi replic,

Thanks for your reply..

I am not a perl guru i am looking for some help if any one can give me a template code so that i can work on that

Member Avatar
d5e5
Practically a Posting Shark
827 posts since Sep 2009
Reputation Points: 109 [?]
Q&As Helped to Solve: 163 [?]
Skill Endorsements: 2 [?]
 
0
 

Hello Perl Guru's

I am playing with two text files using perl but i have been end up after a one day effort and got nothing there fore i have decided to post some thing here for help well here are some details that what actually i want to do

(I am genarating one file from shell script and on the end of execution this file will be renamed to file_old and after that a shell script will be executed on next day and it will generate a file with name file_new and then i want to compare both the files that is if a value of col2 has been changed then show me the value of col1 either from FileA or FileB)

I have two text files name FileA and FileB and both files have two columns like

FileA

Col1 Col2
ABC 123
ABC 987
DEF 456
DEF 898
DEF 658
GHI 789

and FileB also have two columns and it looks like

Col1 Col2
ABC 123
ABC 987
DEF 456
DEF 898
DEF 658
GHI 789
GHI 435
GHI 654
KLM 543
KLM 123
KLM 324

now i want to compare the col2 of both files against col1 if any data in col2 has moved in first or second file then show me the value in col1 from both files i.e., if data has moved from fileA to fileB then show me the col1 value of fileB and if data has moved from fileB to fileA then show me the col1 value of fileA

Thanks in advance

Omer

The first 6 lines of data in FileA and FileB look identical to me. FileB has the following additional lines that are not in FileA:

GHI 435
GHI 654
KLM 543
KLM 123
KLM 324

I don't understand what you mean by "data has moved" from one file to another. I see data in FileB that I don't see in FileA. What would you want your Perl program to output, given the input data you have shown us?

Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi d5e5,

Thanks for your reply,

You are right that the two files are almost same but on day to day basis the second of columns of both files moves from fileA to fileB and similarly from fileB to fileA here is what exactly i want to do.

1) i have two files fileA & fileB
2) Initially both files have same values in my code i want if both files have same value simply show me a message like 'Nothing was moved'.
3) If the value of column2 of fileA moves to fileB then i want to display the value of column1 & Column2 (i.e., the moved values only)from which the value was moved and the value of column1 & column2 from fileB underwhich the values was moved.
4) Similarly if any value of fileB moves to file A then display the values of column1 & column2 (i.e., the moved values only) of fileB and also show me the values of column1 & column2 from fileA as well (i.e., the values which was moved from fileB to fileA.

FileA

Column1 column2

ABC 123
ABC 456
ABC 789
DEF 345
DEF 127
DEF 345
DEF 890
GHI 346
GHI 134
GHI 342
Unknown 214
Unknown 456
Unknown 146
Unknown 098
Unknown 312
Unknown 587
Unknown 431

FileB

Column1 column2

ABC 123
ABC 456
ABC 789
DEF 345
DEF 127
DEF 345
DEF 890
GHI 346
GHI 134
GHI 342
Unknown 214
Unknown 456
Unknown 146
Unknown 098
Unknown 312
Unknown 587
Unknown 431

You can see both files have same data.
Now few nightly jobs runs which normally uses the values of column2 from both files and after the end of jobs the files looks like this

FileA

Column1 column2

ABC 123
ABC 456
ABC 789
ABC 431
DEF 345
DEF 127
DEF 345
DEF 890
GHI 346
##GHI 134
GHI 342
Unknown 214
Unknown 456
Unknown 146
Unknown 098
Unknown 312
Unknown 587
Unknown 431

FileB

Column1 column2

ABC 123
ABC 456
ABC 789
DEF 345
DEF 127
DEF 345
DEF 890
GHI 346
GHI 134
GHI 342
Unknown 214
Unknown 456
Unknown 146
Unknown 098
Unknown 312
Unknown 587
##Unknown 431
Unknown 134

Now you can see that the values has been changed. One value from unknown group of fileB has been moved to the group ABC of fileA i.e., (Column2 value 431 agaist ABC ). similarly the value of group GHI from fileA has been moved in unknown group of fileB i.e., (column2 value 134 has been moved in unkonwn group underfile B)

now i want to write such a code that if any values changes their position simply give show me that values that has been moved

the output of the code shoule be like this.

Value 431 has been moved in group ABC from Unknown.
value 134 has been moved in group Unknown from GHI.

Please let me know how i can get this done.

Thanks in advance for your help in coordination.

Member Avatar
replic
Junior Poster in Training
52 posts since Nov 2008
Reputation Points: 17 [?]
Q&As Helped to Solve: 6 [?]
Skill Endorsements: 0 [?]
 
0
 

I'm still not exactly sure about what you want to do but maybe something like this?

use strict;
use warnings "all";
use 5.010;

my @fileAkey;
my @fileAval;
my @fileBkey;
my @fileBval;
my @tmp;

open IN, "<a.txt" or die "can't open a.txt";
while(<IN>)
{ 
	@tmp = split(/\s/, $_);
	push(@fileAkey, $tmp[0]);
	push(@fileAval, $tmp[1]);
}
close IN;

open IN, "<b.txt" or die "can't open b.txt";
while(<IN>)
{ 
	@tmp = split(/\s/, $_);
	push(@fileBkey, $tmp[0]);
	push(@fileBval, $tmp[1]);
}
close IN;

# from a to b
for(my $i = 0; $i < @fileAkey; $i++)
{
	for(my $j = 0; $j < @fileBkey; $j++)
	{
		if($fileAval[$i] == $fileBval[$j])
		{
			say "value $fileAval[$i] (FileA) moved from group $fileAkey[$i] (FileA) to group $fileBkey[$j] (FileB)";
		}
	}
}

# from b to a
for(my $i = 0; $i < @fileBkey; $i++)
{
	for(my $j = 0; $j < @fileAkey; $j++)
	{
		if($fileBval[$i] == $fileAval[$j])
		{
			say "value $fileBval[$i] (FileB) moved from group $fileBkey[$i] (FileB) to group $fileAkey[$j] (FileA)";
		}
	}
}
Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi replic,

Thanks for your help.
I have tried to get desired output from the code you have given to me but it is showing me all the values which are matching from both files but i want only moved output not all

like

123 has been moved in ABC group in file A from file B GHI group
456 has been moved in LKM group in file B from file A DEF group

Thanks

Member Avatar
d5e5
Practically a Posting Shark
827 posts since Sep 2009
Reputation Points: 109 [?]
Q&As Helped to Solve: 163 [?]
Skill Endorsements: 2 [?]
 
0
 

The following does not give exactly the output you want but hopefully it is a first step in that direction.

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $filename_a = 'a.txt';
my $filename_b = 'b.txt';
my %data;

read_file($filename_a);
read_file($filename_b);

#use Data::Dumper;
#print Dumper(\%data);

foreach my $v(keys %{$data{$filename_a}}){
    foreach my $g(keys %{$data{$filename_a}->{$v}}){
        $data{$filename_b}->{$v}->{$g}->{count} = 0
            unless defined $data{$filename_b}->{$v}->{$g}->{count};
        if ($data{$filename_a}->{$v}->{$g}->{count}
            != $data{$filename_b}->{$v}->{$g}->{count}) {
            say "$filename_a $v $g count is $data{$filename_a}->{$v}->{$g}->{count}";
            say "$filename_b $v $g count is $data{$filename_b}->{$v}->{$g}->{count}";
        }
    }
}

sub read_file {
    my $filename = shift;
    open my $fh, '<', $filename or die "Failed to open $filename: $!";    
    while (<$fh>){
        chomp;
        next if m/^##/; #Skip commented-out data lines
        next unless m/\d{3}/;
        my ($group, $value) = split;
        $data{$filename}->{$value}->{$group}->{'count'}++;
    }
}

This gives the following output:

a.txt 431 ABC count is 1
b.txt 431 ABC count is 0
a.txt 431 Unknown count is 1
b.txt 431 Unknown count is 0
Member Avatar
replic
Junior Poster in Training
52 posts since Nov 2008
Reputation Points: 17 [?]
Q&As Helped to Solve: 6 [?]
Skill Endorsements: 0 [?]
 
0
 

I assumed that if a certain value was found this implies that it was moved. However i am not sure if OP wanted to check for values that moved from any group (which, if i'm not mistaken, is what my code does).
Since we have 2 code samples now i'm sure OP should be able to figure it out by himself now :)

Member Avatar
d5e5
Practically a Posting Shark
827 posts since Sep 2009
Reputation Points: 109 [?]
Q&As Helped to Solve: 163 [?]
Skill Endorsements: 2 [?]
 
0
 

I assumed that if a certain value was found this implies that it was moved. However i am not sure if OP wanted to check for values that moved from any group (which, if i'm not mistaken, is what my code does).
Since we have 2 code samples now i'm sure OP should be able to figure it out by himself now :)

I'm still not completely sure that any program can say what lines 'moved' from FileA to FileB and vice versa by reading only those two files without reading a previous version of them -- especially since @realoneomer had to show us the contents of the previous versions in order to explain what it meant for lines to 'move'.

If I needed to do something like this, I would first try to get by with using some procedure to compare FileA with FileB, such as the diff command in Linux or one of several examples available in Perl to find the difference between two text files, and see if that served the purpose. Trying to determine what lines moved from what file to what file might require reading the previous files and comparing with them as well, and that could require a lot of work.

Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hello guys, thanks for all your efforts but i am unable to get my desired output from the codes above although i have tried to get my desired output but so far i am unable to do that ok give me one more chance to guide you guys that actually what i want to do.

Suppose i have two version of file one is an old version and the other is a new one

old file looks like

Group Value

ABC 987
ABC 876
ABC 654
DEF 879
DEF 869
GHI 909
Unknown 877
Unknown 465
Unknown 986


now in new file we have some change enteries forexample the value 876 against ABC has been moved in group unknown and a value 465 from Unkown group has been moved in Group DEF now the new file will look like and note that in new file 876 has been deleted from group ABC and showing under unkown group similarly a value 465 has been deleted from group unknown and can be seen against DEF in new file.

Group Value

ABC 987
ABC 654
DEF 879
DEF 869
DEF 465
GHI 909
Unknown 877
Unknown 986
Unknown 876


Now i want to track this information that which values were moved from which group and has been moved under which group as i have explained above

now the output of the above moved values would be

Value 876 from group ABC has been moved in unknown group
Value 465 from unkown group has been moved in DEF group

please write me back if you guys need more information
Thanks

Member Avatar
d5e5
Practically a Posting Shark
827 posts since Sep 2009
Reputation Points: 109 [?]
Q&As Helped to Solve: 163 [?]
Skill Endorsements: 2 [?]
 
0
 
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $filename_old = 'old.txt';
my $filename_new = 'new.txt';
my %data;
my %moved;

read_file($filename_old);
read_file($filename_new);

#Find lines moved from old
foreach my $v(keys %{$data{$filename_old}}){
    foreach my $g(keys %{$data{$filename_old}->{$v}}){
        $data{$filename_new}->{$v}->{$g}->{count} = 0
            unless defined $data{$filename_new}->{$v}->{$g}->{count};
        if ($data{$filename_old}->{$v}->{$g}->{count}
            > $data{$filename_new}->{$v}->{$g}->{count}) {
            $moved{'from'}->{$v}->{'value'} = $v;
            $moved{'from'}->{$v}->{'group'} = []
                unless defined $moved{'from'}->{$v}->{'group'};
            push @{$moved{'from'}->{$v}->{'group'}}, $g;
            #say "$v $g count is $data{$filename_old}->{$v}->{$g}->{count}";
            #say "$filename_new $v $g count is $data{$filename_new}->{$v}->{$g}->{count}";
        }
    }
}

#Find lines moved to new
foreach my $v(keys %{$data{$filename_new}}){
    foreach my $g(keys %{$data{$filename_new}->{$v}}){
        $data{$filename_old}->{$v}->{$g}->{count} = 0
            unless defined $data{$filename_old}->{$v}->{$g}->{count};
        if ($data{$filename_new}->{$v}->{$g}->{count}
            > $data{$filename_old}->{$v}->{$g}->{count}) {
            $moved{'to'}->{$v}->{'value'} = $v;
            $moved{'to'}->{$v}->{'group'} = []
                unless defined $moved{'to'}->{$v}->{'group'};
            push @{$moved{'to'}->{$v}->{'group'}}, $g;
            #say "$v $g count is $data{$filename_old}->{$v}->{$g}->{count}";
            #say "$filename_new $v $g count is $data{$filename_new}->{$v}->{$g}->{count}";
        }
    }
}

foreach my $k(sort keys %{$moved{'from'}}){
    my $v = $moved{'from'}->{$k}->{'value'};
    my @gf = @{$moved{'from'}->{$k}->{'group'}};
    my @gt = @{$moved{'to'}->{$k}->{'group'}};
    say "Value $v from group @gf has been moved to @gt group";
}

sub read_file {
    my $filename = shift;
    open my $fh, '<', $filename or die "Failed to open $filename: $!";    
    while (<$fh>){
        chomp;
        next if m/^##/; #Skip commented-out data lines
        next unless m/\d{3}/;
        my ($group, $value) = split;
        $data{$filename}->{$value}->{$group}->{'count'}++;
    }
}

This gives the following output:

Value 465 from group Unknown has been moved to DEF group
Value 876 from group ABC has been moved to Unknown group
Member Avatar
realoneomer
Light Poster
32 posts since Jan 2011
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi d5e5,

Great Thank you.. I was looking for same one i got my desired output.

Once again thanks for all your effort and suppport.

Member Avatar
d5e5
Practically a Posting Shark
827 posts since Sep 2009
Reputation Points: 109 [?]
Q&As Helped to Solve: 163 [?]
Skill Endorsements: 2 [?]
 
0
 

Hi d5e5,

Great Thank you.. I was looking for same one i got my desired output.

Once again thanks for all your effort and suppport.

You're welcome. I'm glad it finally works.

Please don't forget to mark this thread solved.

Question Answered as of 3 Years Ago by d5e5 and replic
Member Avatar
naieev
Newbie Poster
2 posts since Mar 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi,

I have a similar requirement.
We have a huge csv file (2GB odd) that has to be loaded into our database.
We're concerned with only two fields here say col1 and col2 (of the 32 fields)
before loading into the database we need to check if there is a change in col2 for the same value of col1 , for the file as extracted from database.
Example : say below is the format of the file:
Col1,Col2
AAAAAAA,X
BBBBBBB,Y
CCCCCCC,X
TTTTTTT,X
GGGGGGG,Y
And say below is the extract of col1 and col2 from the table into which the file loads (before the file load):
Col1,Col2
AAAAAAA,Y
BBBBBBB,Y
CCCCCCC,Y
TTTTTTT,X
GGGGGGG,Y
As we can see, the identity AAAAAAA and CCCCCCC have moved from type Y (as exists in the table) to type X (as seen from the extract)
SO these records have to identified for updating the table with the new types.

Please let me know the simplest way to do this...
Also do suggest if querying from the database directly would be better than extracting form the databse and comparing the file with the extract.

You
This question has already been solved: Start a new discussion instead
Post:
Start New Discussion
Tags Related to this Article