Hi,

I have two sets of files, each file containing a DNA sequence.

Examples of set 1 are as follows:

Seq1.txt contains:
ATACAGGATCAGATG

Seq2.txt contains:
ATACCGGATCAGATG

Seq3.txt contains:
ATACAGGGTCAGATG

Examples of set 2 are as follows:

Seq1.txt contains:
seq1
ATACAGGATCAGATG

Seq2.txt contains:
seq2
ATACCGGATCAGATG

Seq3.txt contains:
seq3
ATACAGGGTCAGATG

I want one or two Perl programs that read the multiple files and concatenate the two sets as follows:

Concatenated_set1.txt:

ATACAGGATCAGATGATACCGGATCAGATGATACAGGGTCAGATG

Concatenated_set2.txt:

seq1
ATACAGGATCAGATG
seq2
ATACCGGATCAGATG
seq3
ATACAGGGTCAGATG

My code is as follows:

use strict;
use warnings;
use diagnostics;

open (OUT,'>>', 'C:/Test1/Concatenated_set1.txt');

while (my @files = <*.txt>){
chomp;

foreach my$files (@files){
    print OUT "$_"  
}

}
close OUT

use strict;
use warnings;
use diagnostics;

open (OUT,'>>', 'C:/Test1/Concatenated_set2.txt');

while (my @files = <*.txt>){
chomp;

foreach my$files (@files){
    print "<";
    print "$file\n";
    print OUT "$_"; 
}

}
close OUT

I would be happy to receive some help.

Thanks

If you only want to put the content of all those files into one file, I'd do it as follows:

use strict;
use warnings;

my @files = qw(Seq1.txt Seq2.txt Seq3.txt);

# Open output.txt in append mode; $! reports why an open failed.
open(my $out, '>>', 'output.txt') or die "Cannot open output.txt: $!\n";
foreach my $file (@files)
{
    open(my $in, '<', $file) or die "Cannot open $file: $!\n";
    # Copy the file line by line, unchanged.
    while (<$in>)
    {
        print $out $_;
    }
    close $in;
}
close $out;
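
One thing to watch with the script above: it copies every line unchanged, so any newline at the end of an input file ends up in the output, whereas Concatenated_set1.txt in the question is one unbroken line. Here is a minimal sketch of one way to get that, assuming each file holds plain sequence text. The helper name concat_sequences and the throwaway demo files are made up for illustration and are not part of the original scripts.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write the sequences found in @$files to $out_path
# as one single line.
sub concat_sequences {
    my ($files, $out_path) = @_;
    # '>' truncates, so running the script twice does not duplicate data.
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!\n";
    for my $file (@$files) {
        open(my $in, '<', $file) or die "Cannot open $file: $!\n";
        while (my $line = <$in>) {
            $line =~ s/\s+\z//;    # strip trailing newline, CR and spaces
            print {$out} $line;
        }
        close($in);
    }
    print {$out} "\n";             # end the combined sequence with one newline
    close($out) or die "Cannot close $out_path: $!\n";
    return;
}

# Demo with throwaway files so the sketch runs anywhere.
my $demo_dir = tempdir(CLEANUP => 1);
my %demo = (
    'Seq1.txt' => "ATACAGGATCAGATG\n",
    'Seq2.txt' => "ATACCGGATCAGATG\n",
    'Seq3.txt' => "ATACAGGGTCAGATG\n",
);
for my $name (sort keys %demo) {
    open(my $fh, '>', "$demo_dir/$name") or die "Cannot write $name: $!\n";
    print {$fh} $demo{$name};
    close($fh);
}
my @inputs = map { "$demo_dir/$_" } sort keys %demo;
concat_sequences(\@inputs, "$demo_dir/Concatenated_set1.txt");
```

With real data you would pass your own file names, for example concat_sequences([qw(Seq1.txt Seq2.txt Seq3.txt)], 'Concatenated_set1.txt').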

Hi replic,

Thanks a lot for your reply and help. The code worked.

However, I forgot to mention that I also need code that concatenates the contents of many files into one, where each entry in the single output file starts with ">" followed by the file name.

Examples:

I have 3 different files as follows:

Seq1.txt contains ATACAGGATCAGATG

seq2.txt contains ATACCGGATCAGATG

seq3.txt contains ATACAGGGTCAGATG

Output required: the contents of the 3 files concatenated into a single file as follows:

>Seq1
ATACAGGATCAGATG
>seq2
ATACCGGATCAGATG
>seq3
ATACAGGGTCAGATG

I would be grateful for more help.
Thanks

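    use strict;
    use warnings;

    my @files = qw(Seq1.txt Seq2.txt Seq3.txt);
    my $tmp;

    # Open output.txt in append mode; $! reports why an open failed.
    open(my $out, '>>', 'output.txt') or die "Cannot open output.txt: $!\n";
    foreach my $file (@files)
    {
        open(my $in, '<', $file) or die "Cannot open $file: $!\n";
        undef $tmp;                 # reset the buffer for this file
        while (<$in>)
        {
            $tmp .= $_;             # collect the file's content
        }
        print $out ">$file\n";      # header line: ">" plus the file name
        print $out "$tmp\n";
        close $in;
    }
    close $out;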

This script generates the following output:

>Seq1.txt
ATACAGGATCAGATG
>Seq2.txt
ATACAGGATCAGATG
>Seq3.txt
ATACAGGATCAGATG

If you have more than those three files, you can read the contents of the current directory and store the names of the files you need in an array.
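
Reading the directory into an array could look roughly like the sketch below. It assumes the sequence files end in .txt and that the output file should be left out of the list; the helper name list_sequence_files and the demo file names are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the sorted names of the .txt files in $dir,
# leaving out $skip (the output file) so it is never read back in.
sub list_sequence_files {
    my ($dir, $skip) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    # readdir also returns "." and ".." and any subdirectories,
    # so keep only plain files whose name ends in .txt.
    my @names = grep { /\.txt\z/ && $_ ne $skip && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    @names = sort @names;   # readdir order is not guaranteed
    return @names;
}

# Demo in a throwaway directory so the sketch runs anywhere.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(Seq2.txt Seq1.txt notes.dat output.txt)) {
    open(my $fh, '>', "$demo_dir/$name") or die "Cannot write $name: $!\n";
    close($fh);
}
my @files = list_sequence_files($demo_dir, 'output.txt');
print "@files\n";   # prints: Seq1.txt Seq2.txt
```

The names returned are relative to $dir, so they need the directory put in front of them before being passed to open unless $dir is the current directory.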

Again, that worked nicely. Thanks. I'm having trouble reading the contents of an entire folder. I get errors about files or folders not being found when I use the following code:

use strict;
use warnings "all";

my $dir= shift || '.';
opendir DIR, $dir or die "Can't open directory $dir: $!\n";
my @files=readdir(DIR);
#print "@files";

#my $dir="C:/files/";
#opendir DIR, $dir or die "cannot open dir $dir: $!";
#my @files= readdir DIR;
#closedir DIR;
#my @files = qw(seq1.txt seq2.txt seq3.txt);

my $tmp;
open my $out, ">>output.txt" or die "Cannot open output.txt\n";
foreach my $file (@files){

    open my $in, "<$file" or die "Cannot open $file!\n";
    undef $tmp;
    while(<$in>){
        $tmp .= $_;
    }
    print $out ">$file\n";
    print $out "$tmp\n";
    close $in;
}
close $out;

Seems to work fine for me.

use strict;
use warnings;

my $tmp;
my $dir = ".";

opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
my @files = readdir($dh);
closedir($dh);

open(my $out, '>>', 'output.txt') or die "Cannot open output.txt: $!\n";
foreach my $file (@files)
{
    # Keep only .txt files; this also skips "." and "..".
    # output.txt is excluded so an earlier result is not read back in.
    if ($file =~ /\.txt$/ && $file ne 'output.txt')
    {
        open(my $in, '<', $file) or die "Cannot open $file: $!\n";
        undef $tmp;
        while (<$in>)
        {
            $tmp .= $_;
        }
        print $out ">$file\n";
        print $out "$tmp\n";
        close $in;
    }
}
close $out;
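
A likely reason the earlier readdir attempt failed is that readdir returns every entry in the directory, including "." and "..", and returns bare names without the directory in front. The sketch below is one way to handle both points, assuming the inputs end in .txt and the header should be the file name without its extension, as in the requested output. The helper name concat_with_headers and the demo files are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write every .txt file in $dir to $out_path,
# each one preceded by a ">name" header line.
sub concat_with_headers {
    my ($dir, $out_path) = @_;
    my $out_name = basename($out_path);

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!\n";
    # Skip anything named like the output file so it is never an input.
    my @names = grep { /\.txt\z/ && $_ ne $out_name } readdir($dh);
    closedir($dh);

    # '>' truncates, so a second run replaces the file instead of growing it.
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!\n";
    for my $name (sort @names) {
        my $path = File::Spec->catfile($dir, $name);   # directory + name
        next unless -f $path;                          # plain files only
        open(my $in, '<', $path) or die "Cannot open $path: $!\n";
        (my $label = $name) =~ s/\.txt\z//;            # "Seq1.txt" -> "Seq1"
        print {$out} ">$label\n";
        while (my $line = <$in>) {
            $line =~ s/\s+\z//;                        # drop trailing newline/CR
            print {$out} "$line\n" if length $line;    # skip blank lines
        }
        close($in);
    }
    close($out) or die "Cannot close $out_path: $!\n";
    return;
}

# Demo in a throwaway directory so the sketch runs anywhere.
my $demo_dir = tempdir(CLEANUP => 1);
my %demo = ('Seq1.txt' => "ATACAGGATCAGATG\n", 'Seq2.txt' => "ATACCGGATCAGATG\n");
for my $name (sort keys %demo) {
    open(my $fh, '>', "$demo_dir/$name") or die "Cannot write $name: $!\n";
    print {$fh} $demo{$name};
    close($fh);
}
concat_with_headers($demo_dir, "$demo_dir/output.txt");
```

Opening the output with '>' instead of '>>' is a design choice here: it makes repeated runs give the same result, at the cost of overwriting any earlier output.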

Fantastic!!! The latest code works beautifully as well. Many thanks. Just a couple of questions: why did you use the lines undef $tmp and $tmp .=$_ in the code?

$tmp is just a temporary variable that stores the content of the current file. $tmp .= $_ appends each line of the current file to $tmp until the whole file has been read. undef $tmp just resets $tmp before the next file is read. $tmp = "" would have the same effect, and I am actually not sure which is more efficient here.
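
One small difference between the two resets, as far as I know: after undef $tmp, an empty input file leaves $tmp undefined, so print $out "$tmp\n" emits an "uninitialized value" warning under use warnings, while $tmp = "" does not. The whole reset can also be avoided by reading the file into a fresh variable in one go. A minimal sketch, where the helper name slurp and the demo file are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole content of $path as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!\n";
    local $/;                  # no record separator: read everything at once
    my $content = <$in>;
    close($in);
    return defined $content ? $content : '';   # always a defined string
}

# Demo with a throwaway file so the sketch runs anywhere.
my $demo_dir = tempdir(CLEANUP => 1);
my $path = "$demo_dir/Seq1.txt";
open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
print {$fh} "ATACAGGATCAGATG\n";
close($fh);

my $tmp = slurp($path);        # a new value each call, nothing to reset
print ">Seq1.txt\n", $tmp;
```

Because each call returns a new string, there is no shared buffer that has to be cleared between files.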

Thank you very much!
