
Compare two files and replace value

ajay_p5

Dear All

I have two files having values like this:

File1:
10.1103/PhysRevA.10.2325 1
10.1103/PhysRevLett.32.584 2
10.1103/PhysRevB.13.4845 3

File 2:
having comma separated values like this:
10.1103/PhysRevA.10.2325,10.1103/PhysRevLett.32.584
10.1103/PhysRevB.13.4845,10.1103/PhysRevLett.32.584

I want the result to be like this:
1 2
3 2

I am using this code but have not been able to get the result described above:

#!/usr/bin/perl
use warnings;
use strict;

 my $input = 'file1.txt';
 my $input2 = 'file2.csv';
 my $output = 'output.txt';
 
 our(@aj1,@aj2,@aj3,@aj4);

 open(INFILE, "<$input") or die "Couldn't open $input for reading: $!";
  open(INFILE2, "<$input2") or die "Couldn't open $input2 for reading: $!";
   open(OUTFILE, ">$output") or die "Couldn't open $output for writing: $!";
   
   while(<INFILE>)
       {
        my $line= $_;
        $line=~tr/\n//d;
        my($hits1,$hits2)=(split("\t",$line))[0,1];
        push(@aj1,"$hits1");
         push(@aj2,"$hits2");
       }
      
      my $size= $#aj1;
        
      while(<INFILE2>)
       {
       	my $c = 0;
        my $lines= $_;
        chomp($lines);
         my($hits3,$hits4)=(split("\,",$lines))[0,1];
         print "$hits3\t";
          for(my $i=0;$i<$size;$i++)
      	 	{
          		my $a= $aj1[$i];
          		my $b= $aj2[$i];
         		if ($hits3 =~  m/($hits3)/)
         		{
         			print OUTFILE"$b";
         			print "$a\n";
        	 	}
        	 	elsif ($a =~ m/($hits4)/)
        	 	{
         	 		$c=$b;
         		}
       	 	}  
       	 print OUTFILE"\t$c\n";
         
       }
    
  
 close(INFILE);
 close(OUTFILE);

Can somebody suggest what I should do to get the desired output? Thanks!

Best
Aj
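
Two details in the script above are worth checking, independent of the answers that follow. `$#aj1` is the last index of the array, not its length, so a loop with `$i<$size` never reaches the final entry, and `$hits3 =~ m/($hits3)/` tests an ID against itself, which succeeds on every pass for IDs like these. The sketch below reproduces both effects with made-up IDs (`idA`, `idB`, `idC` are placeholders, not data from the post):

```perl
use strict;
use warnings;

my @aj1  = ('idA', 'idB', 'idC');   # hypothetical stand-ins for the IDs of file 1
my $size = $#aj1;                   # last index (2), not the element count (3)

# Loop bounds as in the script above: the last element is never visited.
my @visited;
for (my $i = 0; $i < $size; $i++) {
    push @visited, $aj1[$i];
}
print "visited: @visited\n";

# Matching a string against a pattern built from itself succeeds every time,
# so this test cannot tell the entries of @aj1 apart.
my $hits3 = 'idB';
my $self_matches = grep { $hits3 =~ m/($hits3)/ } @aj1;
print "self-matches: $self_matches of ", scalar(@aj1), "\n";

# Comparing each stored ID with the ID read from file 2 selects one entry.
my @equal = grep { $_ eq $hits3 } @aj1;
print "equal: @equal\n";
```

Using `scalar(@aj1)` as the loop bound and `eq` for the comparison would address both points; the hash-based answers below avoid the inner loop altogether.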

mitchems

I want the result to be like this:
1 2
3 2

What does the first column represent? Is it the last number in file 1 after the space, or the line number, or are they one and the same? And the 2nd column? Is that the line number in file #2?

mitchems

It's funny how similar this was to the last one I solved with a hash.

use strict;
use warnings;
my %chash;
open(FILE2,"<tfile2.txt");
my $x=0;
while(<FILE2>){
	chomp;
	$x++;
	my ($first,$rest)=split(/,/);
	$chash{$first}=$x;
}
close FILE2;
$x=0;
open(FILE1,"<tfile1.txt");
open(OUT,">toutput.txt");
while(<FILE1>){
	chomp;
	$x++;
	my($first,$rest)=split(/\s/);
	$rest="" if(!$rest);
	$chash{$first}="" if(!$chash{$first});
	if($chash{$first}){
		print OUT "$chash{$first} $x\n"; # if you want the line num
		#print OUT "$chash{$first} $rest\n"; # if you want the number at the end
	}
}
close FILE1;
close OUT;

ajay_p5

Hi
I think you didn't get my question:

I have two files:

The 1st one is a tab-separated file containing an ID and an integral value, like this:

ID Integral value
10.1103/PhysRevA.10.2325 1
10.1103/PhysRevLett.32.584 2
10.1103/PhysRevB.13.4845 3

and

File 2 contains two IDs corresponding to each other, in comma-separated format, like this:

10.1103/PhysRevA.10.2325,10.1103/PhysRevLett.32.584
10.1103/PhysRevB.13.4845,10.1103/PhysRevLett.32.584

What I want is to replace the IDs in file 2 with the corresponding integral values from file 1.

So the output should look like this, in tab-separated format:
1 2
3 2

as 1 represents 10.1103/PhysRevA.10.2325
2 represents 10.1103/PhysRevLett.32.584
and similarly, 3 represents 10.1103/PhysRevB.13.4845

Line numbers are not involved at all. I hope it is clear this time; if not, let me know.
The solution you provided above does not produce any output, probably because the question was misunderstood. Anyway, thanks.

Aj

mitchems

OK, I get it now. I can look into fixing that.

mitchems

Here you go... I just had to swap the files around to get what you wish.

use strict;
use warnings;
my %chash;
open(FILE1,"<tfile1.txt") or die "Couldn't open tfile1.txt: $!";#text file #1
while(<FILE1>){
	chomp;
	my ($first,$rest)=split(/\t/);
	$chash{$first}=$rest;
}
close FILE1;
open(FILE2,"<tfile2.txt") or die "Couldn't open tfile2.txt: $!";
open(OUT,">toutput.txt") or die "Couldn't open toutput.txt: $!";
while(<FILE2>){
	chomp;
	my($first,$second)=split(/,/);
	$chash{$first}="" if(!$chash{$first});
	$chash{$second}="" if(!$chash{$second});
	if($chash{$first}){
		print OUT "$chash{$first}\t$chash{$second}\n";	
	}
}
close FILE2;
close OUT;

k_manimuthu

### Open the files and get the contents
open (FILE1, "file1.txt");
read FILE1, $file1, -s FILE1;
close (FILE1);

open (FILE2, "file2.txt");
read FILE2, $file2, -s FILE2;
close (FILE2);

### process file1 : Store the data in hash format
$id{$1}=$2 while ($file1=~ m{([^\s]+)\s+(\d+)}g);

### process file2 : if the 'id' find in the second file
### replace their corresponding value
$file2=~ s{$_(,|\s+)}{$id{$_}$1}g for (keys %id);

### Create the output file and print the file2 contents
open (FOUT, ">output.txt");
print FOUT $file2;
close (FOUT);

ajay_p5

@ Mitchems

Your solution worked for me. I think the problem was with my first file: not all of its values were tab separated, so I had to replace \t with \s+, and then it worked.
I also used:
if(exists($chash{$first}))
as it looked to me like a better approach than checking twice like this:
$chash{$first}="" if(!$chash{$first});
$chash{$second}="" if(!$chash{$second});

Anyway, thanks a lot for the help.
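
The two changes described above (splitting file 1 on `\s+` and testing with `exists`) can be sketched as follows. This is only an illustration under stated assumptions: `build_map` and `translate_line` are hypothetical helper names that do not appear in the thread, and the input lines are passed as in-memory strings rather than read from `tfile1.txt` and `tfile2.txt`.

```perl
use strict;
use warnings;

# Hypothetical helper: build an ID => value map from the lines of file 1.
# Splitting on /\s+/ accepts tabs or spaces between the two columns.
sub build_map {
    my @lines = @_;
    my %map;
    for my $line (@lines) {
        chomp $line;
        my ($id, $value) = split /\s+/, $line;
        next unless defined $value;    # skip blank or malformed lines
        $map{$id} = $value;
    }
    return %map;
}

# Hypothetical helper: translate one comma-separated line of file 2.
# Returns undef when either ID is missing from the map.
sub translate_line {
    my ($map, $line) = @_;
    chomp $line;
    my ($first, $second) = split /,/, $line;
    return unless defined $second;
    return unless exists $map->{$first} && exists $map->{$second};
    return "$map->{$first}\t$map->{$second}";
}

# Sample data from the question, with mixed tab/space separators.
my %map = build_map(
    "10.1103/PhysRevA.10.2325\t1\n",
    "10.1103/PhysRevLett.32.584 2\n",
    "10.1103/PhysRevB.13.4845   3\n",
);

for my $line ("10.1103/PhysRevA.10.2325,10.1103/PhysRevLett.32.584\n",
              "10.1103/PhysRevB.13.4845,10.1103/PhysRevLett.32.584\n") {
    my $out = translate_line(\%map, $line);
    print "$out\n" if defined $out;
}
```

With the sample data this prints the two tab-separated lines `1 2` and `3 2`. Lines of file 2 that mention an unknown ID are skipped here; whether to skip or report them is a design choice the thread does not settle.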

Best
Aj

@K_manimuthu: Sorry, but your solution does not work for me. Thanks anyway.

Question Answered as of 3 Years Ago by mitchems and k_manimuthu

mitchems

Glad it worked. I actually put the tabs in the first file myself. The exists thing is cool. I put those other lines in so that warnings would not complain, but I guess exists does the trick. Take care!
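
For readers comparing the two checks: a plain truth test and `exists` differ when a stored value is false, such as `0`. Also note that a line like `$chash{$first}="" if(!$chash{$first});` stores an empty string under the key, so the key exists afterwards; `exists` is only meaningful if it replaces those lines rather than following them. A small sketch, where `describe` is a hypothetical helper and the keys are made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: report how a key looks under each kind of check.
sub describe {
    my ($hash, $key) = @_;
    my $truthy = $hash->{$key}          ? 'yes' : 'no';   # false for 0, "" and undef
    my $exists = exists($hash->{$key})  ? 'yes' : 'no';   # true whenever the key is stored
    return "truthy=$truthy exists=$exists";
}

my %chash = ('id-zero' => 0, 'id-one' => 1);   # made-up sample data

for my $key ('id-zero', 'id-one', 'id-missing') {
    print "$key: ", describe(\%chash, $key), "\n";
}
```

The entry with value `0` is reported as present by `exists` but would be skipped by `if($chash{$first})`. The sample file in this thread only uses values starting at 1, so the difference does not show up there.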

k_manimuthu

Hi ajay_p5,

Your expected output is in the format below:
1 2
3 2

But the code above gives:
1,2
3,2

The cause is the 15th line of the code above (the substitution on $file2).
Replace it with the line below; that gives the expected output.

$file2=~ s{$_,?}{$id{$_} }g for (keys %id);

Is this OK now?
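
A possible refinement of the substitution approach, offered as a sketch rather than as the poster's method: the IDs contain `.`, which is a regex metacharacter, and one ID could in principle be a prefix of another. Quoting the keys with `quotemeta` and requiring a comma, whitespace, or the string boundary on both sides avoids both issues. `replace_ids` is a hypothetical helper name, and the tab separator is an assumption taken from the output format requested earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every known ID in $text with its value,
# turning the comma after a replaced ID into a tab.
sub replace_ids {
    my ($text, %id) = @_;
    return $text unless %id;
    # quotemeta escapes '.' and '/' so the IDs match literally.
    my $alternation = join '|', map { quotemeta($_) } keys %id;
    # The lookarounds require a comma, whitespace, or string boundary
    # on both sides, so an ID never matches inside a longer one.
    $text =~ s{(?<![^,\s])($alternation)(?![^,\s])(,?)}{ $id{$1} . ($2 ? "\t" : '') }ge;
    return $text;
}

my %id = (
    '10.1103/PhysRevA.10.2325'   => 1,
    '10.1103/PhysRevLett.32.584' => 2,
    '10.1103/PhysRevB.13.4845'   => 3,
);
my $file2 = "10.1103/PhysRevA.10.2325,10.1103/PhysRevLett.32.584\n"
          . "10.1103/PhysRevB.13.4845,10.1103/PhysRevLett.32.584\n";

print replace_ids($file2, %id);
```

Because all IDs go into one alternation, the text is scanned once instead of once per key, which may matter for a large file 1.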
