
How to separate the data

Hi all,

I am trying to separate my DNA data because one file contains many named sequences.
There are about 43 small records in the one data file.
I wrote a small script to extract the data for one name, but it did not succeed: the result was incomplete. My script is below:

#!/usr/bin/perl;
use strict; 
use warnings; 
my $filename3 = 'file_out.txt';
my $filename = 'dna.txt';

open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!";#Open file for output
while (my $rec = <$fh>){
    $rec=~ s/\s//g;
    chomp($rec);
    #my $rec1 = join ('\n',$rec);
   
   my $line = $rec;
if ($line =~'AGQQ01000002.1'){
  
    print $fho "$rec  \n";
}
}
close $fh;


Result :

>gi|354512096|gb|AGQQ01000002.1|CorynebacteriumglutamicumATCC14067Contig02,wholegenomeshotgunsequenceTTAGCCAGGAAACGCTTCGCTGCCGCGACGTTGCGCTTCGGAGAGAGGTAAAAGTCCAGG


I would like to extract all the data for "AGQQ01000002.1" except the header line ">gi|354512096|gb|AGQQ01000002.1|CorynebacteriumglutamicumATCC14067Contig02,wholegenomeshotgunsequence".
I mean:

GTATGCCCACCAGCGGTAATAGCCCGATAGAGGTAGCACCACCTGCCGCCGACCCGGATG  
TAGGTCTCATCCACCCGCCAGGACCGGGCCTGCCAGTCAGGTACCTGCCGGTACCACCGA  
GTGTGCTTGTCCAGCTCAGGGGCGTATTTCTGGACCCAGCGGTGAGAATCGTGGTGTGAT  
CAACTGGCACGCCGCGGCTGAAGTCATCATTTCCTCCAGATCTTAGGTCAGCTCACCCCG  
TAGCGGCAGTGACCTGCGCACTGCCCACAAAATGATGTCACGGGGGAAATGACGACCGGA  
GAAGATACCCATGGCTGTGATTATTTCACGTCGATCTTCCTACTGCCCCAACTTTGCAAC


Could you show me how to solve this problem? Thank you very much!

Attachments dna.txt (8.8KB)
biojet
Junior Poster in Training
52 posts since Aug 2011
Reputation Points: 10
Solved Threads: 0
 
#!/usr/bin/perl
use strict;
use warnings;
my $filename3 = 'file_out.txt';
my $filename = 'dna.txt';

open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!";#Open file for output

my $name = ''; #Name of the current group (empty until the first '>' line is read)
while (my $rec = <$fh>){
    $rec=~ s/\s//g; #Remove all whitespace, including the CRLF or LF line ending
    #If reading first line of data (starts with '>') for a name, save the name
    if ($rec =~ m/^>/){
        my @flds = split(/\|/, $rec);#split first line of group to get name
        $name = defined $flds[3] ? $flds[3] : ''; #4th field holds the name
        next; #Read next line (you don't want to print first line of group)
    }
    
    #Print only the sequence lines that belong to the wanted name
    if ($name eq 'AGQQ01000002.1'){
        print $fho "$rec\n";
    }
}
close $fh;
close $fho or die "Failed to close $filename3: $!";
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
 

See if this does what you want.

#!/usr/bin/perl
use strict;
use warnings;
if ( !defined($ARGV[0])){
  print "\nUsage: $0 <name>\n";
  print "  Example:  $0 AGQQ01000003.1\n\n";
  exit;
}
my ($filename3,$filename,$printIt) = ('file_out.txt','dna.txt',0);
my @columns;
my $pat = quotemeta($ARGV[0]);
open my $fh, '<', $filename or die "Failed to open $filename: $!";
open my $fho, '>', $filename3 or die "Failed to open $filename3: $!";#Open file for output
while (my $rec = <$fh>){
 chomp($rec);
 @columns = split(/\|/,$rec);
 if ( $#columns >= 3 ) {
    if ( $columns[3] =~ /$pat/ ){
       #print $fho "$columns[3]: ";
       $printIt = 1;
    }
    else {
       $printIt = 0;
    }
 }
 print $fho "$rec\n" if ( $printIt && $#columns == 0 ); 
} 
close $fh;
close $fho or die "Failed to close $filename3: $!";
histrungalot
Posting Whiz in Training
266 posts since May 2008
Reputation Points: 76
Solved Threads: 34
 

That way looks good too, histrungalot. One thing: I notice you left out the $rec =~ s/\s//g; statement the original script had. Whether that matters depends on whether you want to keep the Windows-style CRLF newlines or replace them with those of the current platform (in my case, Linux).
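
To illustrate the trade-off, here is a minimal sketch (not taken from either script above) comparing s/\s//g with a substitution that strips only the line ending. The sample line is hypothetical and only modelled on the header format in dna.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical header line with a Windows-style CRLF ending.
my $line = ">gi|1|gb|EXAMPLE01.1| Example organism Contig01\r\n";

# s/\s//g deletes every whitespace character, so the spaces inside the
# description disappear together with the CR and LF.
(my $all_ws = $line) =~ s/\s//g;

# s/\r?\n\z// deletes only the line ending (CRLF or LF) and keeps inner spaces.
(my $eol_only = $line) =~ s/\r?\n\z//;

print 'all whitespace removed: ', $all_ws, "\n";
print 'line ending removed:    ', $eol_only, "\n";
```

For pure sequence lines the two give the same result, since those lines have no inner whitespace. The difference only shows up on header lines, which is why the name in the original poster's result ran together.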

d5e5
 

I was 2 minutes too slow. True, I would not have removed the space at the end.

histrungalot
 

It's not an ordinary space character at the end of each line. It's a carriage return character, which causes my text editor to warn me "This line does not end with the expected EOL: 'LF'..." (see attached screenshot). The input file has Windows-format newlines (CRLF), but Linux expects LF-only newlines, so Perl's chomp function removes only the LF. When the script prints, it re-adds LF to every line. If every line except the last has a carriage return, the output file has CRLF newlines on all lines but the last, which has only LF.

That's not a biggie normally, unless the output file will be processed as input by another script that gets confused by mixed line endings.
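
If mixed line endings would be a problem, one option is to strip the CR along with the LF before printing. The sketch below uses made-up sample records (not read from dna.txt) and assumes the default input record separator, "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records: two with CRLF endings and a final one with LF only.
my @records = ("GTATGC\r\n", "TAGGTC\r\n", "GTGTGC\n");

for my $rec (@records) {
    my $chomped = $rec;
    chomp($chomped);               # removes only the trailing "\n"

    my $normalized = $rec;
    $normalized =~ s/\r?\n\z//;    # removes CRLF or LF

    # A CRLF record keeps its "\r" after chomp, so it is one character longer.
    printf "after chomp: %d chars, after s/\\r?\\n\\z//: %d chars\n",
        length($chomped), length($normalized);
}
```

Printing the normalized text followed by "\n" gives every output line the same LF ending, regardless of how the input line ended.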

Attachments Screenshot.png 129.43KB
d5e5
 

Thanks a lot d5e5 and histrungalot ! It works beautifully!

biojet
 

This question has already been solved
