Hi all,

I am trying to split my DNA data file because it contains many named sequences.
There are about 43 small sequences in the one file.
I wrote a small script to extract the data for one name, but it did not succeed: the output was incomplete. My script is below:

#!/usr/bin/perl
use strict; 
use warnings; 
my $filename3 = 'file_out.txt';
my $filename = 'dna.txt';

open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!";#Open file for output
while (my $rec = <$fh>){
    $rec=~ s/\s//g;
    chomp($rec);
    #my $rec1 = join ('\n',$rec);
   
   my $line = $rec;
if ($line =~'AGQQ01000002.1'){
  
    print $fho "$rec  \n";
}
}
close $fh;

Result:

>gi|354512096|gb|AGQQ01000002.1|CorynebacteriumglutamicumATCC14067Contig02,wholegenomeshotgunsequenceTTAGCCAGGAAACGCTTCGCTGCCGCGACGTTGCGCTTCGGAGAGAGGTAAAAGTCCAGG

I would like to extract all the data for "AGQQ01000002.1" except the header line ">gi|354512096|gb|AGQQ01000002.1|CorynebacteriumglutamicumATCC14067Contig02,wholegenomeshotgunsequence".
I mean:

GTATGCCCACCAGCGGTAATAGCCCGATAGAGGTAGCACCACCTGCCGCCGACCCGGATG  
TAGGTCTCATCCACCCGCCAGGACCGGGCCTGCCAGTCAGGTACCTGCCGGTACCACCGA  
GTGTGCTTGTCCAGCTCAGGGGCGTATTTCTGGACCCAGCGGTGAGAATCGTGGTGTGAT  
CAACTGGCACGCCGCGGCTGAAGTCATCATTTCCTCCAGATCTTAGGTCAGCTCACCCCG  
TAGCGGCAGTGACCTGCGCACTGCCCACAAAATGATGTCACGGGGGAAATGACGACCGGA  
GAAGATACCCATGGCTGTGATTATTTCACGTCGATCTTCCTACTGCCCCAACTTTGCAAC

Could you show me how to solve this problem? Thank you very much!

#!/usr/bin/perl
use strict;
use warnings;
my $filename3 = 'file_out.txt';
my $filename = 'dna.txt';

open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!"; # Open file for output

my $name = ''; # Name of the group currently being read (none yet)
while (my $rec = <$fh>){
    $rec =~ s/\s//g; # Remove all whitespace, including the CR/LF line ending
    # If reading first line of a group (starts with '>'), save the name
    if ($rec =~ m/^>/){
        my @flds = split(/\|/, $rec); # Split first line of group to get name
        $name = defined $flds[3] ? $flds[3] : '';
        next; # Read next line (you don't want to print first line of group)
    }

    if ($name eq 'AGQQ01000002.1'){
        print $fho "$rec\n";
    }
}
close $fh;
close $fho or die "Failed to close $filename3: $!";

See if this does what you want.

#!/usr/bin/perl
use strict;
use warnings;
if ( !defined($ARGV[0])){
  print "\nUsage: $0 <name>\n";
  print "  Example:  $0 AGQQ01000003.1\n\n";
  exit;
}
my ($filename3,$filename,$printIt) = ('file_out.txt','dna.txt',0);
my @columns;
my $pat = quotemeta($ARGV[0]);
open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!";#Open file for output
while (my $rec = <$fh>){
 chomp($rec);
 @columns = split(/\|/,$rec);
 if ( $#columns >= 3 ) {
    if ( $columns[3] =~ /$pat/ ){
       #print $fho "$columns[3]: ";
       $printIt = 1;
    }
    else {
       $printIt = 0;
    }
 }
 print $fho "$rec\n" if ( $printIt && $#columns == 0 ); 
} 
close $fh;
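
The same header-flag idea can be tried on a few in-memory lines before running it on the real file. This is only a sketch: the sample records and the helper name extract_sequence are invented for illustration, and it uses an exact match on the name rather than the regex match in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample records in the same header layout as dna.txt.
my @lines = (
    '>gi|1|gb|AGQQ01000001.1| Example contig 01',
    'AAAA',
    'CCCC',
    '>gi|2|gb|AGQQ01000002.1| Example contig 02',
    'GGGG',
    'TTTT',
);

# Hypothetical helper: return the sequence lines that belong to $wanted.
sub extract_sequence {
    my ($wanted, @records) = @_;
    my $keep = 0;    # true while we are inside the wanted group
    my @out;
    for my $rec (@records) {
        if ($rec =~ /^>/) {
            # Header line: the fourth '|'-separated field is the name.
            my @fields = split /\|/, $rec;
            $keep = (defined $fields[3] && $fields[3] eq $wanted) ? 1 : 0;
            next;    # never keep the header line itself
        }
        push @out, $rec if $keep;
    }
    return @out;
}

print "$_\n" for extract_sequence('AGQQ01000002.1', @lines);
```

With the sample data this prints only the two lines that follow the AGQQ01000002.1 header.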

That way looks good too, histrungalot. One thing: I notice you left out the $rec=~ s/\s//g; statement the original script had. That may or may not be wanted, depending on whether you want to keep the Windows-style CRLF newlines or replace them with those of the current platform (in my case, Linux).

I was 2 minutes too slow. True, I would not have removed the space at the end.

It's not an ordinary space character at the end of each line. It's a carriage return character, which causes my text editor to warn me "This line does not end with the expected EOL: 'LF'..." (see attached screenshot). The input file has Windows-format newlines (CRLF), but Linux expects LF-only newlines, so Perl's chomp function removes only the LF. When the script prints, it appends LF to every line. If all the lines have carriage return characters except the last, then the output file has CRLF newlines on all lines but the last, which has only LF.

That's not a biggie normally, unless the output file will be processed as input by another script that gets confused by mixed line endings.
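
If mixed line endings do matter, one option is to strip an optional carriage return together with the newline instead of relying on chomp alone. A minimal sketch, assuming the script runs on Linux and the input may contain CRLF, LF, or no line ending; the helper name strip_eol is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts above): remove a trailing
# LF or CRLF and leave the rest of the line untouched.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Sample lines with a Windows ending (CRLF), a Unix ending (LF), and none.
my @raw   = ("GTATGC\r\n", "TAGGTC\n", "GTGTGC");
my @clean = map { strip_eol($_) } @raw;

# Every cleaned line is written back with a single LF.
print "$_\n" for @clean;
```

Unlike s/\s//g, this leaves any spaces inside the line alone, so it is also safe to apply to header lines.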

Thanks a lot, d5e5 and histrungalot! It works beautifully!
