Compare two files and change the character at a given position
Hello,
Could you please help me with the following scenario in Perl?
I want to compare two text files and change the character at a given position. The result of this comparison should go to a third file, with a flag at the end of each line: CHANGE if the sequence changed, SAME if it did not.
INPUT 1:
Posi
3 ATG
2 ACT
1 ATC
........
INPUT 2:
Ref Mutant
G C
C A
A A
........
OUTPUT:
Posi Ref Mut
3 ATG ATC CHANGE
2 ACT AAT CHANGE
1 ATC ATC SAME
.................
biojet
Junior Poster in Training
52 posts since Aug 2011
Reputation Points: 10
Solved Threads: 0
It looks to me like you are trying to compare DNA sequences. In that case, is it correct to assume that the character strings will all be of length 3? Also, I suspect these files could run to millions of rows?
You are right! I am trying to compare DNA sequences. I know the positions where the sequence changed and which nucleotide was substituted. I used Excel to change the characters in the sequences, but I hope I can do it with Perl. Could you show me how to do it?
biojet
I'm trying, but I can't figure out how your output follows from your input. Can you explain the problem more clearly?
Trentacle
Junior Poster in Training
72 posts since Dec 2010
Reputation Points: 110
Solved Threads: 20
I can do it in Excel with the Replace command, and I hope I can do it with Perl. For example, the triplet ATG was changed at position 3, where the character G was replaced by C. Output: position 3 of ATG was changed, giving ATC, with the label "CHANGE". I hope this helps you understand my problem.
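The rule described here can be sketched for a single record as follows. This is only a minimal illustration: the name mutate_at is hypothetical, and it assumes positions are counted from 1 and that the new character is always written at that position.

```perl
use strict;
use warnings;

# Hypothetical helper: write $mut at 1-based position $pos of $triplet
# and report whether the resulting string differs from the original.
sub mutate_at {
    my ($pos, $triplet, $mut) = @_;
    my $mutated = $triplet;
    substr($mutated, $pos - 1, 1) = $mut;    # substr counts from 0
    my $flag = $mutated eq $triplet ? 'SAME' : 'CHANGE';
    return ($mutated, $flag);
}

my ($new, $flag) = mutate_at(3, 'ATG', 'C');
print "$new $flag\n";    # prints "ATC CHANGE"
```

Comparing the whole string before and after the substitution means a record such as position 1 of ATC with replacement A is reported as SAME.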
biojet
Ok, wait.
If your first file says what position you want to change, why does your second file say what character you want to change? If the "reference" character in file 2 is different from the character found at the position given in file 1, do you still change it to the "mutant" (not "multant") form?
I might also ask, how do you do it in Excel?
Trentacle
input1.csv
3 ATG
2 ACT
1 ATC
input2.csv
G C
C A
A A
#!/usr/bin/perl
use strict;
use warnings;

my ($filename1, $filename2) = ('input1.csv', 'input2.csv');
open my $fh1, '<', $filename1 or die "Failed to open $filename1: $!";
open my $fh2, '<', $filename2 or die "Failed to open $filename2: $!";

# Read the two files in step, one line from each per iteration.
while (my $rec1 = <$fh1>) {
    defined(my $rec2 = <$fh2>) or last;
    print compare($rec1, $rec2), "\n";
}
close $fh1;
close $fh2;

sub compare {
    my ($str1, $str2) = @_;
    my ($pos, $triplet) = split ' ', $str1;
    my ($ref, $mut)     = split ' ', $str2;
    my $idx = $pos - 1;    # index starts at 0
    my $origtriplet = $triplet;
    my $origchar = substr($triplet, $idx, 1);

    # Substitute only when the reference character matches the one in file 1.
    if ($origchar eq $ref) {
        substr($triplet, $idx, 1) = $mut;
    }

    # Derive the flag from the result, so an unmatched reference reports SAME.
    my $stat = $triplet eq $origtriplet ? 'SAME' : 'CHANGE';
    return "$origtriplet\t$triplet\t$stat";
}
Outputs:
ATG	ATC	CHANGE
ACT	AAT	CHANGE
ATC	ATC	SAME
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Thank you very much for your tutorial.
It is very helpful for me.
biojet