Output with 50 characters on one line:
GTGAGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCC
TGAGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCG
GAGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGA
AGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGAT
GCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGATC
CCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGATCT
CAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGATCTC
Below is my script, but it takes a long time to run. Could you show me a faster way to solve this problem?
#!/usr/bin/perl
#print_files_in_subdirs.pl
use strict;
use warnings;
my $input_filename = 'file.txt';
my $data = slurp_file($input_filename);
$data =~ s/\s//g;#Remove all space, newline, etc.
$data =~ s/(\w{50})/$1\n/g;
print $data;
sub slurp_file{
my $filename = shift;
local $/=undef;
open my $fh, '<', $filename or die "Couldn't open $filename: $!";
my $string = <$fh>;
return $string;
}
Thank you very much. It works well.
Sorry, my question was not clear. I mean that I want to take 50 characters on one line, and then do the same on the reversed data. When I reverse the data I have to change A to T, T to A, G to C, and C to G.
When I tried to download your attached ref1.txt I got an error message from Daniweb saying "/tmp/Xfx9+ApI.part could not be saved, because the source file could not be read" so I can't see the data.
You already know how to reverse a text string. To replace A with T, T with A, etc., you could use the transliteration operator: $rec =~ tr/ATGC/TACG/;
Since I don't get the same output you want, I may have misunderstood the question.
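The two steps mentioned above (reverse the string, then swap each base for its partner) can be combined into one small helper. This is only a sketch: the name reverse_complement is made up for illustration, and it assumes the sequence contains only upper-case A, T, G and C.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reverse a DNA string,
# then swap A<->T and G<->C with the tr/// operator.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;      # scalar assignment, so the string itself is reversed
    $rc =~ tr/ATGC/TACG/;       # characters outside ATGC are left unchanged
    return $rc;
}

print reverse_complement('GTGAGC'), "\n";   # prints GCTCAC
```

Note that the search and replacement lists of tr/// are plain character lists, so no separators are needed between the letters.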
Hi d5e3,
Thank you very much for showing me $rec =~ tr/ATGC/TACG/; I think it will help my script run faster.
My task: 1. For each position in the sequence, print the 50 bases that start there on one line.
2. The same as 1, but on the reversed data.
Input: I have data with 60 characters (1...60):
GTGAGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGATCTCACA
Output:
Question 1: begin at G and take 50 characters (from left to right).
GCTCCTTGGGAAATATAGATCAAATATAGTTCATCGTTTAACTAAACCCG
Begin at C and take 50 characters (from left to right).
CTCCTTGGGAAATATAGATCAAATATAGTTCATCGTTTAACTAAACCCGG
..................................................
Question 2: reversed (input data).
Begin at T and take 50 characters.
TGTGAGAGTCGGCAACAACTTGGCGCCAGGTTTCGAGCAAAGAAGATGAG
Begin at G and take 50 characters.
GTGAGAGTCGGCAACAACTTGGCGCCAGGTTTCGAGCAAAGAAGATGAGT
..................................................
My code, repaired with $rec =~ tr/ATGC/TACG/;, is below. Could you please give me more advice to make the script run faster? My data is about 3.2 MB.
use bigint;
use strict;
use warnings;
print "Insert the file:";
my $file = <STDIN>;
chomp $file; # strip the trailing newline from the typed file name
if (!open (IN, '<', $file)){
die "Cannot open $file: $!\n";
}
open (OUTB, ">mode.txt");
open (OUTC, ">mode1.txt");
my $data = "";
while (<IN>){
if ($_ =~ />/){
next;
}
$_ =~ s/\r//;
$_ =~ s/\n//;
$data = $data.$_;
}
my $num = length($data);
for (my $i = 0; $i < $num; $i++){
my $tag = substr($data, 0 + $i, 50);
print (OUTB "$i\t$tag\n");
}
my $data1 = reverse($data);
$data1 =~ tr/ATGC/TACG/;
for (my $i = 0; $i < $num; $i++){
my $tag = substr($data1, 0 + $i, 50);
print (OUTC "$i\t$tag\n");
}
print "\ndata finished.\ncheck mode.txt and mode1.txt.\n";
close (IN);
close (OUTB);
close (OUTC);
Sorry, I don't know how to make your script run faster other than what I already said about slurping the file into your scalar variable instead of reading it one line at a time.
Taking a 50-character substring starting at each character in a large file is probably taking most of the runtime, and I don't know a way of getting the substrings faster.
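For what it is worth, a couple of things in the posted script can be tried before giving up on substr. The `use bigint` pragma turns integer literals into Math::BigInt objects, so the arithmetic on the loop counter goes through overloaded methods and is likely to be much slower than native integers; the script does not appear to need it. The loop can also stop at the last full window instead of running to the end of the string. The sketch below shows both ideas under those assumptions. The name write_windows is made up, and any speed-up should be checked by timing on the real 3.2 MB file.

```perl
#!/usr/bin/perl
use strict;
use warnings;          # note: no "use bigint" here

# Hypothetical helper: print every full $width-character window of $seq
# to the filehandle $out, one per line, prefixed with its start offset.
sub write_windows {
    my ($out, $seq, $width) = @_;
    my $last = length($seq) - $width;    # start offset of the last full window
    for my $i (0 .. $last) {             # empty range if $seq is shorter than $width
        print {$out} $i, "\t", substr($seq, $i, $width), "\n";
    }
    return;
}

# Small demonstration with a window width of 5 instead of 50.
write_windows(\*STDOUT, 'GTGAGCC', 5);
```

For the second output file, the same helper could be called on the reversed and transliterated copy of the data.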
The regex engine may reduce the processing time if you use it instead of calling substr inside the for loop in this case.
#!/usr/bin/perl
use strict;
use warnings;
my $name='GTGAGCCAGAACTCATCTTCTTTGCTCGAAACCTGGCGCCAAGTTGTTGCCGATCTC';
my $num='50';
while($name=~ m{.{$num}}g)
{
print "\n\nFirst $num characters\t: $&";
my $reverse = reverse ($&);
print "\nReverse $num characters\t: $reverse";
# you may print the $reverse to some file handle
$reverse =~ tr/ATGC/TACG/;
print "\nOutput of the sequence\t: $reverse";
# Remove the first character of $name.
# So $name will be reset and ready to find the next $num characters
$name=~ s{^.}{};
}
#!/usr/bin/perl
use strict;
use warnings;
### Inputs
my $input='input.txt';
open (FIN, "$input") || die "Cannot open the $input file : $!";
read FIN, my $file, -s FIN;
close (FIN);
### number of characters to match
my $num='50';
### Output
open (FOUT, ">output.txt") || die "Cannot create the output file : $!";
while($file=~ m{.{$num}}g)
{
my $reverse = reverse ($&);
$reverse =~ tr/ATGC/TACG/;
print FOUT "\n$reverse";
# Remove the first character of $file.
# So $file will be reset and ready to find the next $num characters
$file=~ s{^.}{};
}
close (FOUT);
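Another way to get overlapping matches from the regex engine, without removing the first character of the string on every pass, is a zero-width lookahead with a capture group. It also reads the window from $1 rather than $&, which carried a performance penalty in older Perl versions. The following is a minimal sketch; the sample string and the width of 5 are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq   = 'GTGAGCC';   # sample data for illustration only
my $width = 5;           # the thread uses 50

# (?=(.{$width})) matches at a position without consuming any characters, so
# with /g the engine tries every start position and $1 holds the window.
my @windows;
while ($seq =~ /(?=(.{$width}))/g) {
    my $rc = reverse $1;       # reverse the window ...
    $rc =~ tr/ATGC/TACG/;      # ... and complement it
    push @windows, $rc;
}
print "$_\n" for @windows;
```

Because the string is never modified, the original data is still available afterwards. For a large input, each window could be printed to a filehandle inside the loop instead of being collected in an array.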
Thank you very much. The script works well, but with a long file it has some problems. It gives a good result at first, then it gives the same result repeatedly (about 6 times). Could you show me how to solve that problem?
I don't know exactly what problems you have, but I guess you may want to process each line separately and output the possible sequences for each one.
#!/usr/bin/perl
use strict;
use warnings;
### Inputs
my $input='ref1.txt';
open (FIN, "$input") || die "Cannot open the $input file : $!";
### number of characters to match
my $num='50'; my $count=1;
### Output
open (FOUT, ">output.txt") || die "Cannot create the output file : $!";
while (<FIN>)
{
my $line = $_; chomp($line);
print FOUT "\n\nLine $count\t\t: $line"; my $seq=1;
while($line=~ m{.{$num}}g)
{
my $reverse = reverse ($&);
$reverse =~ tr/ATGC/TACG/;
print FOUT "\nSequence $seq\t: $reverse";
# Remove the first character of $line.
# So $line will be reset and ready to find the next $num characters
$line=~ s{^.}{};
$seq++;
}
$count++;
}
close (FIN);
close (FOUT);
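One possible cause of odd results on a long, multi-line file (this is a guess, since the data is not visible in the thread) is that `.` does not match a newline by default, so neither `.{50}` nor `s{^.}{}` can cross a line break when the whole file is held in one string. If the sequence is meant to be continuous across lines, it could be joined into a single string before any windows are taken. The helper below is a sketch under that assumption. The name clean_sequence is made up, and skipping lines that start with '>' assumes FASTA-style headers, as the earlier script in this thread does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop FASTA-style header lines and all whitespace,
# returning the sequence as one continuous string.
sub clean_sequence {
    my ($text) = @_;
    $text =~ s/^>.*$//mg;    # blank out header lines that begin with '>'
    $text =~ s/\s+//g;       # remove newlines, carriage returns and spaces
    return $text;
}

print clean_sequence(">ref1\nGTGAG\r\nCCAGA\n"), "\n";   # prints GTGAGCCAGA
```

If each line really is an independent sequence, the per-line loop above is the better fit and this step is not needed.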