d5e5 109 Master Poster

...I can put out the data with

num  name              product               star_posi        end_posi
1    [gene=KIQ_00005] [protein=hypothetical [location=complement(<1..423)]

but I cannot separate the numbers in [location] and I cannot delete the [] from the data...

To remove the square brackets from your text string you can do a regex substitution.

#!/usr/bin/perl
use strict; 
use warnings; 

my $name = '[gene=KIQ_00005]';
print "$name\n";

my $character_to_remove = '\['; #add required escape character before [
$name =~ s/$character_to_remove//;# $name now contains gene=KIQ_00005]
print "$name\n";

$character_to_remove = '\]'; #add required escape character before ]
$name =~ s/$character_to_remove//;# $name now contains gene=KIQ_00005
print "$name\n";
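
As a possible shortcut, both brackets can be deleted in one pass with tr, and the two numbers inside [location] can be captured with a regex. This is only a sketch: the sample strings are copied from the question, and the assumption that a location always looks like start..end (optionally marked with < or >) is mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample fields copied from the question (assumed representative)
my @fields = ('[gene=KIQ_00005]', '[location=complement(<1..423)]');

# tr with the d modifier deletes every [ and ] character in one pass
tr/[]//d for @fields;

# Capture the digits on either side of '..'; an optional < or > is skipped
my ($start, $end) = $fields[1] =~ /[<>]?(\d+)\.\.[<>]?(\d+)/;

print "$fields[0]\t$start\t$end\n";
```

If some locations have a different shape (a join of several ranges, for example) the regex would need to be extended.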
d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 

my $filename1 = 'file1.txt';
my $filename2 = 'file2.txt';
my $filename3 = 'file3.txt';

#Build a hash of arrays from file2
my %uniprot_families;
open my $fh, '<', $filename2 or die "Failed to open $filename2: $!";

while (my $rec = <$fh>){
    chomp($rec);
    my ($id, $family) = split(/\s+/, $rec);
    $uniprot_families{$id} = [] if not exists $uniprot_families{$id};
    push @{ $uniprot_families{$id} }, $family;#Dereference explicitly; push on a reference fails in Perl 5.24+
}
close $fh;

#open your other input file
open $fh, '<', $filename1 or die "Failed to open $filename1: $!";

#open your output file
open my $fh_out, '>', $filename3 or die "Failed to open $filename3: $!";

while (my $id = <$fh>){
    chomp($id);
    my $families;
    if (exists $uniprot_families{$id}){
        $families = join(',', @{$uniprot_families{$id}});
    }
    else{
        $families = 'None';
    }
    print $fh_out "$id\t$families\n";
}
close $fh;
close $fh_out or die "Failed to close $filename3: $!";
d5e5 109 Master Poster

Thank you very much, d5e5.
I can learn a lot from your script.

Could you show how to use the script if n = length of data?

Sorry, I only know how to write the script for a pre-determined value of n. It looks like $n represents the number of data lines to read before sorting and I don't know how to determine the length of data before reading them.
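
One possible way around this, assuming the whole file fits in memory, is to read every data line into an array first; the number of lines is then known before sorting. The following is only a sketch: the in-memory sample data stands in for the real file and is not taken from the original thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory sample standing in for the real data file (an assumption)
my $data = "#comment\n5 apple\n9 pear\n2 plum\n7 fig\n";
open my $fh, '<', \$data or die "Failed to open in-memory file: $!";

# In list context <$fh> returns every remaining line; comments are dropped
my @lines = grep { !/^#/ } <$fh>;
close $fh;
chomp @lines;

my $n = scalar @lines;#n is now the number of data lines read

# Sort descending by the leading number on each line
my @sorted = sort { (split ' ', $b)[0] <=> (split ' ', $a)[0] } @lines;

print "n = $n\n";
print "$_\n" for @sorted;
```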

d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 

my $n = 3;#Decided position
my $line_count = 0;
my $filename = 'sample file.txt';
open my $fh, '<', $filename or die "Failed to open $filename: $!";

LINE: while (1){
    my %hash = ();
    foreach (1 .. $n){
        my $rec = <$fh>;
        last LINE unless defined $rec;
        $rec = <$fh> while $rec =~ m/^#/;#Skip comment lines
        
        $rec =~ s/\s*$//;#Remove spaces or newline characters from end
        $hash{++$line_count} = $rec;
    }

    my @array = sort   {my ($sa) = split /\s+/, $hash{$a};
                    my ($sb) = split /\s+/, $hash{$b};
                    $sb <=> $sa;} keys(%hash);
    
    my $label = "decided position $line_count\n";
    foreach(@array){
        print $label;
        printf "%15d%10s\n", ($_, $hash{$_});
        $label = '';
    }
}
d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 
use Data::Dumper;
my $n = 3;#Decided position
my $filename = 'sample file.txt';
open my $fh, '<', $filename or die "Failed to open $filename: $!";

while (<$fh>){
    next if m/^#/;#Skip comment lines
    my @array = ();
    foreach (1 .. $n){
        my $rec = <$fh>;
        last unless defined $rec;
        
        $rec =~ s/\s*$//;#Remove spaces or newline characters from end
        push @array, $rec;
    }

    @array = sort   {my ($sa) = split /\s+/, $a;
                    my ($sb) = split /\s+/, $b;
                    $sb <=> $sa;} @array;
    
    my $label = "decided position $n";
    foreach(@array){
        printf "%-23s%s\n", ($label, $_);
        $label = '';
    }
}
d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 

#Five %s placeholders, so pass exactly five arguments
printf("Dollar %s\nPound %s\nPercent %s\nCaret %s\nAmpersand %s",
       ('$', '#', '%', '^', '&'));

#Outputs
#Dollar $
#Pound #
#Percent %
#Caret ^
#Ampersand &

I'm not sure I understand the question. You ask about symbols occurring within the text but your example shows a percent sign occurring within the printf format argument. I think the answer is 'No, other symbols will not cause a problem for printf.'

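However, keep in mind that a double-quoted string interpolates scalar ($) and array (@) variables and processes backslash escapes, so use single quotes for your text argument when it contains literal $, @ or \ characters. A % is not interpolated inside double quotes, but it is special inside a printf format string, where a literal percent sign is written as %%.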
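
To make the quoting difference concrete, here is a small sketch. The variable names and sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $price = 5;
my @items = ('a', 'b');

my $single = 'Cost: $price for @items at 10%';#Nothing is interpolated
my $double = "Cost: $price for @items at 10%";#$price and @items are interpolated

print "$single\n";
print "$double\n";

#Inside a printf/sprintf format a literal percent sign is written %%
my $formatted = sprintf('%d%% of %s', 50, '$total');
print "$formatted\n";
```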

d5e5 109 Master Poster

You need to save your base information into a variable as you read it, and save this into one of your hashes, along with position.

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my ( %data1, %data2 );
open my $in, '<', 'baselist.txt';
while (<$in>) {
    next unless /^\s*\d/;
    my ( $num, $posi, $base ) = split;#Read base into a variable
    $data1{$num}{'base_posi'} = $posi;#Save position into your hash
    $data1{$num}{'base'}      = $base;#Save base into your hash
}
open $in, '<', 'lao1.txt';
while (<$in>) {
    next unless /^\s*\d/;
    my ( $num, $SNP_posi, $ref, $mut ) = split;
    $data2{$num}{'SNP_posi'} = $SNP_posi;
    $data2{$num}{'ref'}      = $ref;
    $data2{$num}{'mut'}      = $mut;
}
close $in;
open( SNP, ">SNP.txt" );
for my $num ( keys %data1 ) {
    my $val  = $data1{$num}{'base_posi'};###
    my $base = $data1{$num}{'base'};###
    for my $num2 ( keys %data2 ) {
        my $min    = $data2{$num2}{'SNP_posi'};
        my $max    = $data2{$num2}{'ref'};
        my $tengen = $data2{$num2}{'mut'};
        if ( $val eq $min ) {
            print SNP $val . "\t";
            print SNP $base . "\t";
            print SNP $tengen . "\t";
            print SNP $max . "\n";
            last;
        }
    }
}
close(SNP);
d5e5 109 Master Poster

You need a chomp; statement before the statement that does the split. Otherwise the last codon on each line will include a newline character and so will not match any codon in your hash.
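
A short sketch of the effect, assuming the line is split on a single tab (with split ' ' or split /\s+/ the trailing newline would be discarded anyway). The two-entry hash is a cut-down stand-in for the real codon table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %codon2protein = (AGA => 'R', AAA => 'K');#Cut-down table for the demo
my $rec = "AGA\tAAA\n";#A tab-separated line as read from a file

my @raw = split /\t/, $rec;#Last field is "AAA\n"
my $found_raw = exists $codon2protein{ $raw[-1] } ? 1 : 0;

chomp(my $clean = $rec);
my @fields = split /\t/, $clean;#Last field is "AAA"
my $found_clean = exists $codon2protein{ $fields[-1] } ? 1 : 0;

print "without chomp: $found_raw, with chomp: $found_clean\n";
```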

d5e5 109 Master Poster

You could create another variable to hold the position value and include that variable in the string that you print.

Instead of my ($ref, $mul) = split /\s+/, $rec; you could have my ($pos, $ref, $mul) = split /\s+/, $rec;

d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;

my %codon2proteins = build_hash();

my $dna_filename = 'dna.txt';

open my $dna_fh, '<', $dna_filename or die "Failed to open $dna_filename: $!";

while (my $rec = <$dna_fh>){
    chomp($rec);
    my ($ref, $mul) = split /\s+/, $rec;
    my $ref_pro = convert_codon2prot($ref);
    my $mul_pro = convert_codon2prot($mul);
    my $info = compare_pros($ref_pro, $mul_pro);
    print "$ref\t$mul\t$ref_pro\t$mul_pro\t$info\n";
}

sub compare_pros{
    my ($r, $m) = @_;
    if ($r eq $m){
        return 'same';
    }
    else {
        return 'change';
    }
}

sub convert_codon2prot{
    my ($codon) = @_;
    
    if(exists $codon2proteins{$codon}){
        return $codon2proteins{$codon};
    }
    else{
        die "Bad codon $codon!!\n";
    }
}

sub build_hash{
    my(%g)=(
            'TCA'=>'S', #Serine
            'TCC'=>'S', #Serine
            'TCG'=>'S',  #Serine
            'TCT'=>'S', #Serine 
            'TTC'=>'F', #Phenylalanine 
            'TTT'=>'F', #Phenylalanine 
            'TTA'=>'L', #Leucine 
            'TTG'=>'L', #Leucine 
            'TAC'=>'Y', #Tyrosine 
            'TAT'=>'Y', #Tyrosine 
            'TAA'=>'_', #Stop 
            'TAG'=>'_', #Stop 
            'TGC'=>'C', #Cysteine 
            'TGT'=>'C', #Cysteine 
            'TGA'=>'_', #Stop 
            'TGG'=>'W', #Tryptophan 
            'CTA'=>'L', #Leucine 
            'CTC'=>'L', #Leucine 
            'CTG'=>'L', #Leucine 
            'CTT'=>'L', #Leucine 
            'CCA'=>'P', #Proline 
            'CAT'=>'H', #Histidine 
            'CAA'=>'Q', #Glutamine 
            'CAG'=>'Q', #Glutamine 
            'CGA'=>'R', #Arginine 
            'CGC'=>'R', #Arginine 
            'CGG'=>'R', #Arginine 
            'CGT'=>'R', #Arginine 
            'ATA'=>'I', #Isoleucine 
            'ATC'=>'I', #Isoleucine 
            'ATT'=>'I', #Isoleucine 
            'ATG'=>'M', #Methionine 
            'ACA'=>'T', #Threonine 
            'ACC'=>'T', #Threonine 
            'ACG'=>'T', #Threonine 
            'ACT'=>'T', #Threonine 
            'AAC'=>'N', #Asparagine 
            'AAT'=>'N', #Asparagine 
            'AAA'=>'K', #Lysine 
            'AAG'=>'K', #Lysine 
            'AGC'=>'S', #Serine 
            'AGT'=>'S', #Serine 
            'AGA'=>'R', #Arginine 
            'AGG'=>'R', #Arginine 
            'CCC'=>'P', #Proline 
            'CCG'=>'P', #Proline 
            'CCT'=>'P', #Proline 
            'CAC'=>'H', #Histidine 
            'GTA'=>'V', #Valine 
            'GTC'=>'V', #Valine 
            'GTG'=>'V', #Valine 
            'GTT'=>'V', #Valine 
            'GCA'=>'A', #Alanine 
            'GCC'=>'A', #Alanine 
            'GCG'=>'A', #Alanine 
            'GCT'=>'A', #Alanine 
            'GAC'=>'D', #Aspartic Acid 
            'GAT'=>'D', #Aspartic Acid 
            'GAA'=>'E', #Glutamic Acid 
            'GAG'=>'E', #Glutamic Acid 
            'GGA'=>'G', #Glycine 
            'GGC'=>'G', #Glycine 
            'GGG'=>'G', #Glycine 
            'GGT'=>'G', #Glycine 
    );
    return %g;
}

Outputs

AGA	AAA	R	K	change
CCA	CCT	P	P	same
GCA	ACA	A	T	change
GCA …
d5e5 109 Master Poster

I don't see how your script decides whether to print 'change' or 'same'? What is the rule that determines this?

d5e5 109 Master Poster

Every Perl expression is in one of two 'contexts', either 'list context' or 'scalar context', depending on whether it is expected to produce a list or a scalar. Many expressions have quite different behaviors in list context than they do in scalar context.

http://perl.plover.com/context.html
The print function expects a list as its argument, so a <$fh> read following print is evaluated in list context. In list context, reading from a filehandle returns all the remaining lines in the file.

The length function evaluates its argument in scalar context. In scalar context, reading from a filehandle returns only the next line from the file.
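
The difference can be seen in a small sketch. The in-memory filehandle and its three lines are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";
open my $fh, '<', \$text or die "Failed to open in-memory file: $!";

my $first = <$fh>;#Scalar context: reads just one line
my @rest  = <$fh>;#List context: reads all the remaining lines
close $fh;

my $len = length $first;#Includes the newline character
printf "first line has %d characters; %d lines remained\n", $len, scalar @rest;
```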

d5e5 109 Master Poster

input1.csv

3    ATG   
2    ACT
1    ATC

input2.csv

G    C 
C    A
A    A
#!/usr/bin/perl
use strict;
use warnings;

my ($filename1, $filename2) = ('input1.csv', 'input2.csv');

open my $fh1, '<', $filename1 or die "Failed to open $filename1: $!";
open my $fh2, '<', $filename2 or die "Failed to open $filename2: $!";

while (my $rec1 = <$fh1>){
    defined (my $rec2 = <$fh2>) or last;
    print compare($rec1, $rec2), "\n";
}

sub compare{
    my ($str1, $str2) = @_;
    my ($pos, $triplet) = split(/\s+/, $str1);
    my ($ref, $mut) = split(/\s+/, $str2);
    my $idx = $pos - 1;#index starts at 0
    my $origtriplet = $triplet;
    my $origchar = substr($triplet, $idx, 1);
    my $stat;
    
    if ($origchar eq $ref){
        substr($triplet, $idx, 1) = $mut;
    }
    
    if ($origchar eq $mut){
        $stat = 'SAME';
    }
    else {
        $stat = 'CHANGE';
    }
    
    return "$origtriplet\t$triplet\t$stat";
}

Outputs

ATG	ATC	CHANGE
ACT	AAT	CHANGE
ATC	ATC	SAME
d5e5 109 Master Poster

Could you give us a snippet of your script to illustrate what you are trying to do? Maybe it's me but I really don't understand what you mean by evaluating "variabl%e". Does the following bear any resemblance to what you want?

#!/usr/bin/perl
use strict;
use warnings;

my $variable = 1;

my $format = '%s%%';

printf $format, $variable; #1%
d5e5 109 Master Poster

Did you run it with at least two command line arguments? When I run your script from the command line with no arguments I get the same error. But if I run it from the command line as follows: perl temp01.pl localhost 42; it runs and prints lines until I kill it, but with no error.

d5e5 109 Master Poster

What version of perl are you running? I am getting the error on 5.8.8...other versions run it fine. I am almost certain this is a bug...I am just trying to figure out a way around it.

I'm running version 5.14.2 for linux.

d5e5 109 Master Poster

Sorry, I thought the script was just hanging but really it was waiting for 20 * 3 seconds for a response from the localhost server when attempting to create a Net::SNMP->session. If I wait long enough the script ends with ERROR: No response from remote host "localhost" during discovery. The script must have compiled successfully to run and it doesn't give me a division error.

d5e5 109 Master Poster

Why do you say it doesn't compile? When I run it with options -H localhost -t load, for example, it just hangs until I kill it. I don't get any syntax errors indicating it won't compile and no runtime errors such as division by zero, so I can't reproduce the error you are getting. Could you tell us what command-line options you use when you get the division error? Does it work for some hosts and options?

d5e5 109 Master Poster

Run Perl with the -d option and it starts in debug mode which you can use as a command line interpreter to print out the values of expressions. The 'p' command tells the Perl debugger to print the value of the expression.

david@david-laptop:~$ perl -de 0

Loading DB routines from perl5db.pl version 1.33
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):	0
  DB<1> p 4+3 
7

You can search CPAN for modules that provide a more elaborate command line interpreter, such as Devel::ptkdb and Perl::Shell.

d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'input.csv';
open my $fh, '<', $filename or die "Unable to open $filename: $!";

my $firstline = <$fh>;
chomp($firstline);
print "The first line is $firstline\n";

my @fields = split(/,/, $firstline);
my $len = length($fields[1]);
print "The second field contains '$fields[1]' which has length of $len\n";
d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;

my $pathway;#To save string from previous iteration, create variable outside loop
while (my $line = <DATA>) {
    chomp($line);
    
    if ($line =~ /^$/) {
        next;
    }
    elsif ($line =~ /ko/) {
        $pathway = $line;
    }
    elsif ($line =~ /K/) {
        #trim($line); #What does your trim() subroutine do?
        print "$pathway\t$line\n";
    }
    else {
        print "problem with $line\n";
    }
}

__DATA__
ko10101
K01392
K09134

ko34231
K05789

ko13452
K04665
K07881

Outputs

ko10101	K01392
ko10101	K09134
ko34231	K05789
ko13452	K04665
ko13452	K07881
d5e5 109 Master Poster

Building on Trentacle's first correction of your data structure you could try the following:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $chainStorage = {
    ACB => {
        E => { '06' => [100, 200, 95] },
        B => { '23' => [20, 1000, 5, 30] },
    },
    AFG => {
        C => { '24' => [18, 23, 2300, 3456] },
    },
    HJK => {
        A => { '12' => [24, 25, 3200, 5668] },
        D => { '15' => [168] },
    },
};

my %results;

foreach my $flk(keys %$chainStorage){
    $results{$flk} = 0;
    
    #Dereference explicitly: keys on a hash reference no longer works in Perl 5.24+
    foreach my $slk(keys %{$$chainStorage{$flk}}){
        foreach my $tlk(keys %{$$chainStorage{$flk}{$slk}}){
            my $aref = $$chainStorage{$flk}{$slk}{$tlk};
            foreach my $elem(@$aref){
                $results{$flk} += $elem;
            }
        }
    }
}

print Dumper(\%results);

Outputs

$VAR1 = {
          'AFG' => 5797,
          'HJK' => 9085,
          'ACB' => 1450
        };
d5e5 109 Master Poster

Yes, the third column could be a number, characters, or a combination of both. But it will have a fixed width, with leading zeroes for numbers and leading spaces for characters.

Also, while executing the above Perl script, I'm getting the following error message. Is this a Perl/module installation issue? Please suggest how to resolve it.

Error message -
Can't locate Sort/External.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi

Yes, you need to install the Sort::External module. Install it in the usual way that you install Perl modules on your system. There are many ways to install modules. I use App::cpanminus to install Perl modules but of course I had to install App::cpanminus before using it the first time to install other Perl modules.

Since you say your third column will already be padded with leading zeroes when numeric then you should be able to use the last script I posted above as long as you know what the maximum width of that column will be. If you know that the desired column starts at position 17 and has a maximum width of 28 then using substring to take your sorting data should work just fine. For data shorter than 28 it doesn't matter if you take extra characters on the right side because the characters on the left side are the most significant for the string comparison done during the sorting. Your only remaining issue is installing the Sort::External module.

d5e5 109 Master Poster

That should work OK. I think that for variable-width columns I would just do a split, put a delimiter between the sort prefix and the record for the encode step, and then remove everything from start of record up to the first occurrence of your delimiter for the decode step. The biggest gain in efficiency comes from eliminating the block of code that the sort would have to call, so how you extract the column to place at the start of the record doesn't affect the efficiency much and there are several ways of doing it.
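
A rough sketch of that encode/sort/decode idea follows. It uses Perl's built-in sort instead of Sort::External so that it runs without installing anything, and the sample records and the choice of "\0" as the delimiter are assumptions for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#Made-up records with a variable-width third column
my @records = (
    'a|20110709|48|x',
    'b|20110708|7|y',
    'c|20110707|301|z',
);
my $delim = "\0";#Assumed never to occur in the data

#Encode: put the third column plus the delimiter in front of each record
my @encoded = map { my $key = (split /\|/, $_)[2]; "$key$delim$_" } @records;

#Plain string sort, so no comparison block is called
my @sorted = sort @encoded;

#Decode: remove everything up to and including the first delimiter
my @decoded = map { my $pos = index($_, $delim); substr($_, $pos + 1) } @sorted;

print "$_\n" for @decoded;
```

Note that the keys are compared as strings here, so '301' sorts before '48'.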

One question: since the third column can vary in width, does it always contain only numeric digits? If so, do you want the data compared as a number or as a string of characters? In other words: should '000009' be considered bigger than a shorter value that begins with '001' (because it's a bigger number) or smaller (because its first three characters are '000', compared to the first three characters of the other string, which are '001')? If you want to compare the variable-width data numerically then you will need to make it fixed-width by adding leading zeroes.
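
Here is a small sketch of the padding idea. The three sample values and the width of 6 are arbitrary choices for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @raw = (9, 11, 100);

#Compared as strings, '100' comes before '11', which comes before '9'
my @string_sorted = sort @raw;

#Pad to a fixed width of 6 digits so string order matches numeric order
my @padded = map { sprintf('%06d', $_) } @raw;
my @padded_sorted = sort @padded;

print "unpadded: @string_sorted\n";
print "padded:   @padded_sorted\n";
```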

d5e5 109 Master Poster

If you have fixed-width data records so you know your third column always starts at a specified position you can extract the column you want to sort by using the substring function instead of splitting, and assigning the result to an array. By appending the extracted data to the beginning of each record you can sort as strings instead of having to provide a block of code for the sorting logic. The following should run faster for very large input files than the previous script that uses a sortsub.

#!/usr/bin/perl
use strict;
use warnings;
use Sort::External;

my $temp_directory = '/home/david/temp';

my $sortex = Sort::External->new(working_dir => $temp_directory, );

while (<DATA>) {
    chomp;
    #Encode by extracting the third column (assuming it starts at offset 17, i.e. the 18th character)
    #and concatenate it to the start of the record.
    $sortex->feed(substr($_,17,28) . $_);
}

$sortex->finish;

while ( defined( $_ = $sortex->fetch ) ) {
    #Decode
    print substr($_, 28), "\n"; #Remove the extra copy of data from start of record
}

__DATA__
1780438|20110709|0000007704000000000000004888|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110708|0000007704000000000000004882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780436|20110707|0000007704000000000000004889|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780435|20110703|0000007704000000000000004881|7704|48888|PE|08/12/2008 11:38:54|0|1000.00
d5e5 109 Master Poster

Welcome to Daniweb. Please start a new thread to ask your question. When you ask your question in a new thread, it will help if you list the script that you tried. Then maybe someone can advise you why your script didn't give you the result you wanted.

d5e5 109 Master Poster

I haven't used Sort::External before. I tried altering my previous example of sorting to use Sort::External as follows and it runs OK for me. Of course my example sorts a very small amount of data. It should also work for your large files but I have no idea how long it will take to run.

#!/usr/bin/perl
use strict;
use warnings;
use Sort::External;

my $sortscheme = sub {
                    my @flds_a = split(/\|/, $Sort::External::a);
                    my @flds_b = split(/\|/, $Sort::External::b);
                    $flds_a[2] cmp $flds_b[2]; #compare key fields to sort
                    };

my $temp_directory = '/home/david/temp';

my $sortex = Sort::External->new(   sortsub         => $sortscheme,
                                    working_dir     => $temp_directory, );

while (<DATA>) {
    chomp;
    $sortex->feed($_);
}

$sortex->finish;

while ( defined( $_ = $sortex->fetch ) ) {
    print "$_\n";
}

__DATA__
1780438|20110709|0000007704000000000000004888|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110708|0000007704000000000000004882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780436|20110707|0000007704000000000000004889|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780435|20110703|0000007704000000000000004881|7704|48888|PE|08/12/2008 11:38:54|0|1000.00

If you find that the above takes too long to sort you may want to have a look at the Sort::External::Cookbook which suggests transforming your data before sorting in order to avoid using the $sortscheme block of code. The GRT way described in the Cookbook looks harder to understand than using the sortsub parameter (as in the above script) but the GRT way is supposed to run faster.

d5e5 109 Master Poster

Hello,

Can someone tell me the equivalent Perl script/command for the following Unix command:

sort -t"|" -k1,1 -T '/temp' input.txt > output.txt

Here, I want to specify a different physical directory for temporary sort file storage, like -T in the Unix shell command. In other words, how do I specify a different workspace directory for the Perl sort command?

Thanks!

As far as I know the sort command in Perl does all the sorting in memory and you cannot specify a temporary directory. If your input file is too large to sort in memory you may consider using a module such as Sort::External. You'll find the docs mention a working_dir parameter.

d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $rec = '(9430, 3656) (9147, 14355) (133, 14393) (7917, 9513) (3719, 12775)';

$rec =~ s/['(),]//g; #Remove everything except digits and spaces

my %hash = split / /, $rec; #Split on space character and assign to hash

print Dumper(\%hash);
d5e5 109 Master Poster

We can take the greatest five from a hash without sorting. I don't know if that makes it more efficient. For example, the following prints the five colors having the longest wavelengths.

#!/usr/bin/perl
use strict;
use warnings;

my %d = (violet => 400,
         red    => 650,
         indigo => 445,
         orange => 590,
         blue   => 475,
         yellow => 570,
         green  => 510);

my @p;

foreach my $r (0 .. 4){
    foreach my $k (keys %d){
        if (!defined$p[$r]
            or $d{$k} > $d{$p[$r]}){
            $p[$r] = $k;
        }
    }
    delete $d{$p[$r]};#After saving one of top 5, delete it from hash
}

print join "\n", @p;
d5e5 109 Master Poster

Here's an example of sorting an array by a specified column.

#!/usr/bin/perl
use strict;
use warnings;

my @array = sort howtosort <DATA>;

foreach (@array){
    chomp;
    print "$_\n";
}

sub howtosort{
    my @flds_a = split(/\|/, $a);
    my @flds_b = split(/\|/, $b);
    $flds_a[2] cmp $flds_b[2]; #compare key fields to sort
}
__DATA__
1780438|20110709|0000007704000000000000004888|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110708|0000007704000000000000004882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780436|20110707|0000007704000000000000004889|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780435|20110703|0000007704000000000000004881|7704|48888|PE|08/12/2008 11:38:54|0|1000.00
d5e5 109 Master Poster

I want to be able to treat the contents of a file as an array and traverse it that way without having to store them in an array. Can I do that? If so, how?

Tie::File treats the contents of a file as an array, but not a 2D array. You might try using Tie::File with the recsep option and calculate that, for example, if you want the first (really the zero'th) element in the third (really the 2nd) row and there are 5 columns in every row then you can refer to element[10] of your one-dimensional array. It will take some thinking to determine how to calculate your row and column indices, but once you have it figured out the program should run fairly fast because Tie::File doesn't have to load the entire file into memory.
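
The index calculation might look something like the sketch below. To keep it simple it assumes the file stores one element per line (so the default record separator is enough) and builds a temporary 3 x 5 demo file; the element_at helper and the 'elemN' values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

#Build a demo file: 3 rows x 5 columns stored one element per line
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "elem$_\n" for 0 .. 14;
close $tmp_fh;

tie my @flat, 'Tie::File', $tmp_name or die "Cannot tie $tmp_name: $!";
my $ncols = 5;

#Hypothetical helper: map zero-based (row, column) to the flat index
sub element_at {
    my ($row, $col) = @_;
    return $flat[ $row * $ncols + $col ];
}

my $value = element_at(2, 0);#Third row, first column is element[10]
print "$value\n";

untie @flat;
```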

d5e5 109 Master Poster

Thank you very much.
It runs well.
I changed the loop to

for (my $i=1; $i<=$#arr; $i++){$arr{$i}+=$arr[$i];}{

but the output begins with "colum 0".

total of colum 0 is 9 and average is 3 
      total of colum 1 is 11 and average is 3.6666666 
     .................................................

How can I make colum 0 become colum 1? That is, "total of colum 1 is 9 and average is 3".
By the way, could you please show me where I can learn the meaning of the symbols in Perl and how to use them? (e.g. what does m/^\d/ mean?)

I think you would need to convert your one-line for loop statement into a multi-statement for loop so you can create a variable for column number that will be $_ plus one.

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my (@arr, %arr, $row);
while(<DATA>){
    #skip non-numeric data
    next unless m/^\d/;
    
    @arr=split(/\s+/, $_);
    for (my $i=0; $i<=$#arr; $i++){
        $arr{$i}+=$arr[$i];
    }
    $row++;
}

#When $_ = 0 you want to call it column 1, etc.
for (sort keys (%arr)){
    my $col_nbr = $_ + 1;
    say ("total of column $col_nbr is $arr{$_} and average is \t", ($arr{$_}/$row));
}

__DATA__
#########
1 2 3 4 5

6 6 6 4 4
2 3 4 5 6

To find the meaning of m/^\d/ you can Google Perl regular expressions. The ^ means the beginning of the record and \d represents any numeric digit so m/^\d/ will match any data record that begins with a number. …
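
As a quick illustration, the sketch below applies m/^\d/ to a few made-up records and keeps only those that begin with a digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#Made-up records: a comment line, data lines, an empty line, a line with a leading space
my @records = ('#########', '1 2 3 4 5', '', '6 6 6 4 4', ' 7 leading space');

#^ anchors the match at the start; \d matches one numeric digit
my @numeric = grep { m/^\d/ } @records;

print "$_\n" for @numeric;
```

The last record is rejected because its first character is a space, not a digit.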

d5e5 109 Master Poster

You need to name your DATA section __DATA__ in capital letters, not __data__ .

You can skip records that don't begin with a number, as follows:

#skip non-numeric data
next unless m/^\d/;

Notice that the script ignores column 0 and starts with column 1, because in Perl array indexes begin at 0. Is that what you want?

d5e5 109 Master Poster

After the say $rec; statement you should reset the @fields array to an empty list so that your output line doesn't keep doubling in size (in case there will be more than one output record.) @fields = ();#Empty the array after record has been output

d5e5 109 Master Poster

Beginner tutorials can show you how to read and write files. As for splitting the data, pushing some of it into an array and outputting the array you could do something like the following:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my @fields;
while (<DATA>){
    process_end_of_rec() if m/^;/ and @fields;
    if (m/^\@C/){
        my @data = split;
        push(@fields, $data[2]);
    }
}

process_end_of_rec();

sub process_end_of_rec{
    foreach my $fld(@fields){
        $fld = '' unless defined($fld);
    }
    my $rec = join(',', @fields);
    say $rec;
    @fields = ();#Empty the array so the next record starts fresh
}
__DATA__
; Record 1
@FULLTEXT PAGE
@T R000358
@C ENDDOC# R000358
@C BEGATTACH R000358
@C ENDATTACH R000358
@C MAILSTORE No
@C AUTHOR 
@C BCC 
@C CC 
@C COMMENTS 
@C ATTACH 
@C DATECREATED 11/23/2010
@C DATELASTMOD 07/18/2010
@C DATELASTPRNT 
@C DATERCVD 
@C DATESENT 
@C FILENAME wrangling.wpd
@C LASTAUTHOR 
@C ORGANIZATION 
@C REVISION 
@C SUBJECT 
@C TIMEACCESSED 00:00:00
@C TIMECREATED 15:21:34
@C TIMELASTMOD 09:04:12
@C TIMELASTPRNT 
@C TIMERCVD 
@C TIMESENT 
@C TITLE 
@C TO 
@C FROM

Output: R000358,R000358,R000358,No,,,,,,11/23/2010,07/18/2010,,,,wrangling.wpd,,,,,00:00:00,15:21:34,09:04:12,,,,,,

d5e5 109 Master Poster

What part of the script gives you trouble? Reading a file? Splitting the data and saving them in an array? Writing a new file?

d5e5 109 Master Poster

I don't have Windows, but I think that instead of using the ODBC driver you may have more success using the DBD::mysql driver. For instructions see How to install and configure DBD::mysql

d5e5 109 Master Poster

Post your SQL statement here, wrapped in [code] your SQL statement [/code] tags and maybe someone here will be able see what's wrong with the syntax.

d5e5 109 Master Poster

Only very old versions of Perl require you to name file handles as barewords such as FILE1 etc. Use lexical filehandles instead.

FILEHANDLEs without sigils are a sign of ancient Perl. This page aims to describe why this style was once common, what's wrong with it, and what you can do to fix up your legacy source code.
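
For comparison, here is a minimal sketch of the lexical style using the three-argument form of open. The temporary file is only there so the example can run on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

#Temporary file for the demo (an assumption; use your own filename)
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

#Old style would be: open(FILE1, ">$filename")
open my $out, '>', $filename or die "Failed to open $filename for writing: $!";
print {$out} "first line\n";
close $out or die "Failed to close $filename: $!";

open my $in, '<', $filename or die "Failed to open $filename for reading: $!";
my $line = <$in>;
close $in;

chomp $line;
print "Read back: $line\n";
```

A lexical filehandle is an ordinary variable, so it can be passed to subroutines and is closed automatically when it goes out of scope.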

d5e5 109 Master Poster

I haven't tested it under Windows, only Linux. However I don't see any reason why it shouldn't work just as well in a Windows environment.

d5e5 109 Master Poster

I found my mistake. my @suffixes = qw( .axf .fls .txt .cfg .log .o); is not quite right because each pattern is treated as a regex (regular expression) and the dot is a special regex character which needs to be 'escaped' by preceding it with a backslash in order to represent a dot character in the extension. Instead we should escape the dot in all the patterns, like this: my @suffixes = qw( \.axf \.fls \.txt \.cfg \.log \.o); Does the following script do what you want?

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use File::Path qw(make_path);
use File::Basename;
use File::Copy;

# Target Directory
#my @directories_to_search = ('HW/TARGET_PLATFORM/MODEM_DEBUG');
my @directories_to_search = ('/home/david/Programming/data');

#my @suffixes = qw( \.axf \.fls \.txt \.cfg \.log );
my @suffixes = qw(\.o);#Backslash to escape regex special character dot ('.')

find(\&wanted, @directories_to_search);
sub wanted{
    my($filename, $directories, $suffix) = fileparse($File::Find::name, @suffixes);

    if ($suffix){
	#my $target_dir = "/home/mmhuqx/test/$directories";
        my $target_dir = "/home/david/Programming/test/$directories";
	make_path($target_dir);
        print "Try to copy $File::Find::name ... Extension is $suffix\n";
	copy($File::Find::name,"$target_dir/$filename$suffix") or die "Copy failed: $!";
        #print only if copied.
        #print "Copied $File::Find::name ... Extension is $suffix\n";
    }   
}
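
The effect of the unescaped dot can be seen in isolation with a sketch like the following. The paths are made up and assume Unix-style separators.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $path = '/tmp/hello';#No extension at all

#Unescaped: '.' matches any character, so 'lo' is taken as the suffix
my $loose = (fileparse($path, '.o'))[2];

#Escaped: only a literal dot followed by 'o' counts as the suffix
my $strict = (fileparse($path, '\.o'))[2];
my $real   = (fileparse('/tmp/main.o', '\.o'))[2];

print "unescaped: '$loose', escaped: '$strict', real object file: '$real'\n";
```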
d5e5 109 Master Poster

I don't see what causes that error. Taking a second look at the script I see that the print statement should be inside the if block because we only want to print info about the files with the desired extension, since they are the only files to be copied. Try putting the print statement before the copy statement so we can see what it is going to try to copy before the error occurs. The following may work, or at least print better information for debugging:

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use File::Path qw(make_path);
use File::Basename;
use File::Copy;

# Target Directory
#my @directories_to_search = ('HW/TARGET_PLATFORM/MODEM_DEBUG');
my @directories_to_search = ('/home/david/Programming/data');
#my @suffixes = qw( .axf .fls .txt .cfg .log );
my @suffixes = qw(.o);


find(\&wanted, @directories_to_search);
sub wanted{
    my($filename, $directories, $suffix) = fileparse($File::Find::name, @suffixes);

    if ($suffix){
	#my $target_dir = "/home/mmhuqx/test/$directories";
        my $target_dir = "/home/david/Programming/test/$directories";
	make_path($target_dir);
        print "Try to copy $File::Find::name ... Extension is $suffix\n";
	copy($File::Find::name,"$target_dir/$filename$suffix") or die "Copy failed: $!";
        #print only if copied.
        #print "Copied $File::Find::name ... Extension is $suffix\n";
    }   
}
d5e5 109 Master Poster

For example:

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use File::Path qw(make_path);
use File::Basename;
use File::Copy;

my @directories_to_search = ('/home/david/Programming/data');
my @suffixes = qw(.axf .fls .txt .err .dummy .log .cfg);

find(\&wanted, @directories_to_search);

sub wanted{
    my($filename, $directories, $suffix) = fileparse($File::Find::name, @suffixes);
    if ($suffix){
        my $target_dir = "/home/david/test/$directories";
        make_path($target_dir);
        copy($File::Find::name,"$target_dir/$filename$suffix") or die "Copy failed: $!";
    }

    print "Copied$filename$suffix\n";
}
d5e5 109 Master Poster

**Deleted** (Didn't notice posts on page 2. Looks like this has already been solved).

d5e5 109 Master Poster

$sql = "INSERT INTO leaderboard (First Name) VALUES ('values')";

When a column name contains a space you must quote it using backticks, like this:

$sql = "INSERT INTO leaderboard (`First Name`) VALUES ('values')";

d5e5 109 Master Poster

Great! I'm glad that's what you wanted. Please remember to mark this thread solved.

d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;

while(my $rec = <DATA>){
    chomp($rec);
    #the // default delimiters for a match can be changed to arbitrary delimiters
    #The following regex is delimited by {} because pattern contains forward slashes
    $rec =~ s{/English-Folder/(.+)_e\.shtml}{/French-Folder/$1_f.shtml};
    print $rec, "\n";
}
__DATA__
/English-Folder/temp.php.u1conflict
/English-Folder/temp-post01.html
/English-Folder/temp-post.html
/English-Folder/temp.txt
/English-Folder/target01_e.shtml
/English-Folder/testcomponent.html
/English-Folder/target02_e.shtml
/English-Folder/zengarden-sample.html

Outputs:

/English-Folder/temp.php.u1conflict
/English-Folder/temp-post01.html
/English-Folder/temp-post.html
/English-Folder/temp.txt
/French-Folder/target01_f.shtml
/English-Folder/testcomponent.html
/French-Folder/target02_f.shtml
/English-Folder/zengarden-sample.html
d5e5 109 Master Poster

Sorry, I don't know how to make your script run faster other than what I already said about slurping the file into your scalar variable instead of reading it one line at a time.

Taking 50 substrings starting at each character in a large file is probably taking most of the runtime, and I don't know a way of getting the substrings faster.
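
For reference, slurping might look like the sketch below. The short in-memory sequence and the substring length of 4 are stand-ins for the real file and the 50 substrings mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $content = "ACGTACGTAC";#Stand-in for the contents of a large file
open my $fh, '<', \$content or die "Failed to open in-memory file: $!";

#With $/ undefined, a single read returns the whole file
my $sequence = do { local $/; <$fh> };
close $fh;

my $len = 4;
my @subs;
for my $start (0 .. length($sequence) - $len){
    push @subs, substr($sequence, $start, $len);#One substring per start position
}

print scalar(@subs), " substrings, first $subs[0], last $subs[-1]\n";
```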

d5e5 109 Master Poster

You're welcome. Please don't forget to mark this thread 'solved'.