d5e5 109 Master Poster

What is the best way to make a platform-independent Perl app? I tried Perl Packer but it seems it won't work on Windows, and I don't want to use ActiveState. What are my alternatives?

If all your users have access to the web then you could develop it as a web app. Otherwise, I don't know but this thread on StackOverflow might interest you.

d5e5 109 Master Poster

I can't resolve a question: write a structure in HTML that can call a Perl script. Please help.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
    <title>Run temp.cgi</title>
</head>
<body>
    <a href='../cgi-bin/temp.cgi'>
        Click here to run a perl script called temp.cgi
    </a>
</body>
</html>
d5e5 109 Master Poster

Hi, since I need to print out the $ID, $SQ, $KW, and $OC within the file, should I declare them as variables and then print them out? Thanks

Why declare four scalar variables just to store the four literal values you want to look for at the beginning of the lines? Also, I don't know why you want to follow the same route as illustrated in the script you posted. That script reads two multi-line records into two variables, $annotation and $dna, which it then prints. Why do that if what you really want is to print the lines from the file that begin with ID, SQ, KW, or OC?

Why not take the following approach?

  1. Read the file one line at a time
  2. Test each line to see if it begins with ID, SQ, KW, or OC
  3. Decide whether or not to print the line based on the result of the test

Maybe I don't understand what you mean by 'parsing' the file but it seems to me that a simple script like the following does what you say you want to do:

#!/usr/bin/perl
#embl01.pl
use strict;
use warnings;

my $filename = '/home/david/Programming/data/EMBL_records.txt';

open my $fh, '<', $filename or die "Could not open $filename: $!";

while (<$fh>){
    chomp;
    if (m/^(ID|SQ|KW|OC)/){#Does line start with ID, SQ, KW, or OC?
        print $_, "\n";
    }
}

This gives the following output:

ID   M91373; SV 1; linear; mRNA; STD; PLN; 1131 BP.
KW   peroxidase.
OC   Eukaryota; Viridiplantae; Streptophyta; …
d5e5 109 Master Poster

I really don't know the bioinformatics subject matter involved here. I tried changing the regex and adding a chomp statement because including the newline \n in my regex caused it to fail on my computer for some reason. Here is what I changed:

# Now separate the annotation from the sequence data
#($annotation, $dna) = ($record =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s);#GenBank layout
($annotation, $dna) = ($record =~ /^(.*SQ\s*)(.*)\/\//s);#Trying to match EMBL layout
chomp($annotation, $dna);
d5e5 109 Master Poster

I should have waited until you answered mitchems' questions before jumping in. To clarify the above solution: if the string you test contains only one vm-name then the regex $data =~ /\Q[1.3.6.1.4.1.6876.2.1.1.2.1 =>\E\s*(.+?)\]/; should work for you. If your data file contains more than one vm-name, but no more than one per line, you can still retrieve them all by reading and testing one line at a time.

However, if your data contains more than one vm-name and you read all the data into one scalar variable then my regex will find only the first vm-name. In that case, you can add the g flag to the regex and use it in list context, as follows:

#!/usr/bin/perl
use strict;
use warnings;

my $data = <<END;
Unknown Trap from 192.168.200.35 : Enterprise [1.3.6.1.4.1.6876.4.1] : Specific [3] : Generic [6] : Varbinds [1.3.$
[1.3.6.1.4.1.6876.50.102 =>
/vmfs/volumes/4c8157c9-4cbe22b0-9192-0019b9e2f54d/vm-jetmore/vm-jetmore.vmx]
[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-jetmore]
more data
[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-newwave]
Another one
[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-failsafe]
END

my @vms = $data =~ /\Q[1.3.6.1.4.1.6876.2.1.1.2.1 =>\E\s*(.+?)\]/g;

print "Found the following vm-names:\n", join("\n", @vms);

Outputs the following:

Found the following vm-names:
vm-jetmore
vm-newwave
vm-failsafe
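
For the other case mentioned above (no more than one vm-name per line, read one line at a time), a minimal sketch might look like the following. The sub name vm_names_by_line and the in-memory sample data are made up for illustration; in a real script the filehandle would come from opening your data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every vm-name found, testing one line at a time.
# Assumes there is no more than one vm-name per line.
sub vm_names_by_line {
    my ($fh) = @_;
    my @names;
    while ( my $line = <$fh> ) {
        # No g flag needed here because each line is tested separately.
        if ( $line =~ /\Q[1.3.6.1.4.1.6876.2.1.1.2.1 =>\E\s*(.+?)\]/ ) {
            push @names, $1;
        }
    }
    return @names;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = "[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-jetmore]\n"
           . "more data\n"
           . "[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-newwave]\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for vm_names_by_line($fh);
close $fh;
```

Reading line by line keeps memory use low for a large file, but it would miss a vm-name whose bracketed entry is split across two lines.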
d5e5 109 Master Poster

Good, except for one detail. Look at the last word at the end of your output when you run your program. Does it say 'None'? Is 'None' the last word in alice_in_wonderland.txt?

d5e5 109 Master Poster

Assuming the example, which I hardcoded and assigned to $data, spans multiple lines and the value preceding the vm name will always be the same, you could try this.

#!/usr/bin/perl
use strict;
use warnings;

my $data = <<END;
Unknown Trap from 192.168.200.35 : Enterprise [1.3.6.1.4.1.6876.4.1] : Specific [3] : Generic [6] : Varbinds [1.3.$
[1.3.6.1.4.1.6876.50.102 =>
/vmfs/volumes/4c8157c9-4cbe22b0-9192-0019b9e2f54d/vm-jetmore/vm-jetmore.vmx]
[1.3.6.1.4.1.6876.2.1.1.2.1 => vm-jetmore]
END

$data =~ /\Q[1.3.6.1.4.1.6876.2.1.1.2.1 =>\E\s*(.+?)\]/;
my $vm = $1;
print "The vm name is $vm\n";
d5e5 109 Master Poster

I see what you mean about the split on spaces. Do you think if I remove that then the program will work? Thanks

No, that sounds too optimistic. Usually when I debug a program I get rid of one error and then another one pops up.:(

I think the reason the $site variable has no value when you try to concatenate it with something else is that the parseREBASE subroutine expects to find both the name and the site on each line that it reads, but in the file you attached there is only one field on each line. After reading a line to get the name, the program should read the next line to get the site. I made a change to the parseREBASE sub to read the next line and assign it to $site so $site will not be uninitialized when it is used. Try replacing the parseREBASE sub with the following:

sub parseREBASE {

    my($rebasefile) = @_;

    use strict;
    use warnings;

    # Declare variables
    my @rebasefile = (  );
    my %rebase_hash = (  );
    my $name;
    my $site;
    my $regexp;

    # Read in the REBASE file
    my $rebase_filehandle = open_file($rebasefile);

    while(<$rebase_filehandle>) {

    # Discard header lines
    ( 1 .. /Rich Roberts/ ) and next;

    # Discard blank lines
    /^\s*$/ and next;
    #--------------------------Start of changes 2010-12-02 d5e5
    ##### The following commented-out code assumes there are two or three fields
    ##### in each line of the file you attached, but there is only one
    ##### field per …
d5e5 109 Master Poster

Now I get the error.
Regarding this statement in the parseREBASE sub: my @fields = split( " ", $_); I don't understand why you split each line from the file on spaces because each non-blank line appears to contain only one sequence followed by end-of-line character but no spaces. For example, lines 10 through 15 of the file you attached look like this:

AanI
TTA!TAA
AarI
CACCTGCNNNN!
AasI
GACNNNN!NNGTC

... so why split on spaces?

My computer time is just about over for today but I'll try to have another look at this tomorrow.
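
For the one-field-per-line layout shown above, here is a rough sketch of reading the lines in pairs instead of splitting on spaces. It assumes the header lines have already been skipped and that names and sites strictly alternate; the sub name parse_alternating is made up and is not part of your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a name => site hash from a filehandle whose non-blank lines
# alternate between an enzyme name and its recognition site.
sub parse_alternating {
    my ($fh) = @_;
    my %site_for;
    my $name;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        if ( defined $name ) {
            $site_for{$name} = $line;    # second line of the pair is the site
            undef $name;
        }
        else {
            $name = $line;               # first line of the pair is the name
        }
    }
    return %site_for;
}

# The six sample lines quoted above, read through an in-memory filehandle.
my $sample = "AanI\nTTA!TAA\nAarI\nCACCTGCNNNN!\nAasI\nGACNNNN!NNGTC\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %site_for = parse_alternating($fh);
close $fh;
print "$_ => $site_for{$_}\n" for sort keys %site_for;
```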

d5e5 109 Master Poster

I copied and ran your script but couldn't reproduce the error you got. The program kept prompting me with the message "Search for what restriction site for (or quit)?: " as long as I typed and entered some input. When I pressed enter with no input the program exited with no error. I created a dummy file called 'rebase.txt' but didn't know what to put in it.

That message you got, "Use of uninitialized value $site in concatenation (.) or string", probably means that the $site variable has no value assigned to it when some statement attempts to combine it with another string. But I don't know what data to enter to get your program to reproduce the error you are getting.
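
To show what I mean, here is a tiny sketch, unrelated to your data, that triggers the same kind of warning on purpose. The sub name warnings_from is made up; it just collects the warnings so they can be printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a code block and return the warnings it emitted.
sub warnings_from {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    $code->();
    return @caught;
}

my @messages = warnings_from( sub {
    my $site;                        # declared but never assigned a value
    my $label = 'Site: ' . $site;    # concatenating undef triggers the warning
} );
print @messages;
```

Assigning any value to $site before the concatenation makes the warning go away, which is why the place to look is wherever $site is supposed to get its value.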

d5e5 109 Master Poster

The question confuses me too. I still see only 10 bases and I don't see any 'ATG' in the sequence. Whoever gave you this question may have made a mistake.

d5e5 109 Master Poster

print read_book() calls the read_book function once. The read_book function does something to each of the lines in the file, so at the end of the first for-loop the variable l (bad name for a variable... looks like a number 1) contains the last line of the file. Then the second for-loop reads the first word into variable w and returns w. Returning from a function means exiting the function, which is called only once. Result: the one word returned is the first word in the last line of the file.

d5e5 109 Master Poster

Hello, I would like to know if I need to use a regular expression to match the desired substring in order to print out 10 characters of the start codon ATG.

My dna sequence is "CATAGAGATA"

Thanks for any advice.

I don't think I understand the question. Your dna sequence consists of 10 characters and you want to print out 10 characters starting with the substring 'ATG'? I don't see any occurrence of the substring 'ATG' in your sequence. Can we shuffle the dna sequence until it contains (or starts with?) 'ATG'? Please tell us how you would determine the output without using a program and then maybe we can advise how to write a program that does it.

For example, does the following do what you want?

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); #This module includes a method to shuffle arrays.

my $str = "CATAGAGATA";
my @arr;

while (1){
    @arr = $str =~ m/[AGCT]/g; #Convert string into array of single letters
    @arr = shuffle(@arr); #Shuffle the letters of the array randomly
    last if join('', @arr[0,1,2]) eq 'ATG'; #Exit loop if first 3 elements = start codon
}

print "Shuffled sequence is:\n";
print join('', @arr), "\n";

The result is random, so it varies from run to run. One possible output:

Shuffled sequence is:
ATGAACATAG
d5e5 109 Master Poster

LOL?
are you trying to unmask my lack of Perl knowledge?
because I already stated I started learning perl few days ago.

BTW thanks for the answer but what I was wondering was if it is possible to print messages while the loop is running not when it finishes.

Because I coded a small script that runs for about 5 mins and It would be a nice detail to print the number of seconds until the process finishes

I think that is not possible thanks anyways.

I didn't mean to disparage your knowledge of perl and maybe I misunderstood your original question. What I thought you asked originally was "while the loop is running i want to print the numbers of times the loop has been repeated" and IMO mitchems answered that question. Now you ask how to print the number of seconds left before the process finishes. In my opinion that is a different question. Of course if you don't know in advance how much time a process is going to take then you probably cannot say how much longer the process will take while the process is running. But you would need to provide us with more information about the process before we can say whether what you want to do is possible. Why not start a new thread to ask it and mark this one solved?
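
In case it helps with the new question: if the number of iterations is known in advance, one rough approach is to estimate the remaining time from the average time per iteration so far. This is only a sketch under that assumption. The sub name seconds_remaining, the three iterations and the sleep call standing in for the real work are all made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Estimate seconds remaining from the time used so far, assuming every
# iteration takes about as long as the average of those already done.
sub seconds_remaining {
    my ( $elapsed, $done, $total ) = @_;
    return undef if $done <= 0;    # nothing measured yet
    return $elapsed * ( $total - $done ) / $done;
}

$| = 1;    # unbuffer STDOUT so each message appears while the loop runs

my $total = 3;    # hypothetical number of iterations, known in advance
my $start = time;
for my $done ( 1 .. $total ) {
    sleep 1;      # stand-in for the real work
    my $left = seconds_remaining( time - $start, $done, $total );
    printf "%d of %d done, about %.0f seconds left\n", $done, $total, $left;
}
```

If the iterations vary a lot in duration the estimate will jump around, and if the total is not known in advance this approach does not apply.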

d5e5 109 Master Poster

If you replace the cleanup => 'VACUUM', option in the above script with the prune => 1, option a neat thing happens. The folder, database and table are created if they don't already exist, then before the script terminates the folder and database are deleted. That might come in handy if you wanted to load a large amount of data in order to take advantage of various database query operations but did not want to save the data as a database. (The 'VACUUM' option condenses the database by reclaiming unused space, which you don't need to do if you prune it.)

In order for the prune option to work properly (i.e. delete folder as well as file), you need to have the File::Remove module installed. (Search for libfile-remove-perl in Synaptic Package Manager.)

d5e5 109 Master Poster

RTFM...

use strict;
use warnings;
my $x=0;
while($x<100){
	$x++;
	print "$x\n";
}
print "the loop ran $x times\n";

I wonder if terabyte would mind telling us how to code a few tools, including calculators without printing during a loop.:-/

If I were the OP, I'd mark this one solved.

d5e5 109 Master Poster

Someone mentioned in another thread they had found and downloaded ORLite, presumably to consider using it to access data in an SQLite database. Since you need to have the DBI and DBD::SQLite modules installed on your computer to use ORLite anyway, I think you could just use DBI to accomplish whatever you want to do with SQLite without bothering with ORLite. Nevertheless, some people may prefer the ORLite way of doing it. Unfortunately, when you search for examples of how to use ORLite you may not find nearly as many as when looking for examples of using DBI, so if you like to have lots of examples available you may prefer sticking with DBI.

The following example creates an SQLite database in a folder, creates a table, populates it with data, then does a simple select of all the data. If the folder, database and table already exist, more data is inserted into the existing table.

#!/usr/bin/perl
use strict;
use warnings;
use ORLite {
     package      => 'Foo::Bar',
     file         => 'data/OrLiteDemo.db',
     user_version => 0,
     create       => sub {
         my $dbh = shift;
         $dbh->do('CREATE TABLE inventory ( item TEXT NOT NULL, quantity INT );')},
     cleanup      => 'VACUUM',
};

while (<DATA>){
    my @fields = split ',';
    my ($itm, $qty) = ($fields[0], $fields[1]);
    Foo::Bar->do(
      'insert into inventory (item, quantity) values (?, ?)',
      {},
      $itm, $qty,
  );
}

my @stock = Foo::Bar::Inventory->select;

foreach (@stock){
    printf("%s\t%d\n", ($_->item, $_->quantity));
}
__DATA__
hammers,1
nails,153
screwdrivers,7
saws,45
bolts,23
nuts,23

This gives the following output …

d5e5 109 Master Poster

After further googling I found a post titled "Perl Standard Deviation function is wrong", which explains that there are at least two ways of calculating standard deviation and that they give noticeably different results for small data lists such as the ones my examples use.

Conclusion: the values for standard deviation calculated by Statistics::Basic and Statistics::Descriptive differ for small data sets, but this doesn't mean either value is wrong. Which statistics module you use for calculating standard deviation depends on which calculation method you, your colleagues and your boss agree on.
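
Here is a small sketch of the two calculations, assuming the two methods are the population formula (divide by n) and the sample formula (divide by n - 1). The sub name std_dev and its first argument are made up. The two data values are the Min and Max reported for scaffold657_3__ elsewhere in this thread; whether that scaffold really contains only those two values is an assumption, but they do reproduce both of the figures reported for it (561 and 793.37...).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard deviation of a list. Pass 0 as the first argument to divide by
# n (population formula) or 1 to divide by n - 1 (sample formula).
sub std_dev {
    my ( $ddof, @numbers ) = @_;
    my $n = @numbers;
    return undef if $n - $ddof <= 0;    # avoid division by zero

    my $mean = 0;
    $mean += $_ / $n for @numbers;

    my $sum_sq = 0;
    $sum_sq += ( $_ - $mean )**2 for @numbers;

    return sqrt( $sum_sq / ( $n - $ddof ) );
}

my @d = ( 226, 1348 );
printf "divide by n:     %.2f\n", std_dev( 0, @d );
printf "divide by n - 1: %.2f\n", std_dev( 1, @d );
```

The gap between the two results shrinks as the list gets longer, which is why the difference is so noticeable with only a handful of values.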

d5e5 109 Master Poster

I know next to nothing about statistics, so I tried Statistics::Basic because it seemed easy to use. Then I tried Statistics::Descriptive because it has functions that Statistics::Basic lacks. What I did not expect was to get a different result when calculating standard deviation using Statistics::Descriptive than when using the other module or a subroutine copied from Yahoo Answers. Am I doing something wrong, or does Statistics::Descriptive have a deviant way of calculating deviations?

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Basic qw(:all);
use Statistics::Descriptive;

my @d = (5,10,5,100,150);
print 'StdDev according to Basic is ', stddev(@d), "\n"; #Basic

my $stat = Statistics::Descriptive::Full->new();
$stat->add_data(@d);
print 'StdDev according to Descriptive is ', $stat->standard_deviation(), "\n"; #Descriptive

print 'StdDev according to subroutine is ', standard_deviation(@d) . "\n";

sub standard_deviation {
    my (@numbers) = @_;

    #Prevent division by 0 error in case you get junk data
    return undef unless ( scalar(@numbers) );

    # Step 1, find the mean of the numbers
    my $total1 = 0;
    foreach my $num (@numbers) {
        $total1 += $num;
    }
    my $mean1 = $total1 / ( scalar @numbers );

    # Step 2, find the mean of the squares of the differences
    # between each number and the mean
    my $total2 = 0;
    foreach my $num (@numbers) {
        $total2 += ( $mean1 - $num )**2;
    }
    my $mean2 = $total2 / ( scalar @numbers );

    # Step 3, standard deviation is the square root of the
    # above mean
    my $std_dev = sqrt($mean2);
    return $std_dev;
}

Gives the following output:


       
d5e5 109 Master Poster

To find out if the effect of push differs from the effect of unshift, try them both and see.

#!/usr/bin/perl
use strict;
use warnings;

my @pusharray = ('original', 'contents');
push @pusharray, $_ foreach(1..10);
print "array contains @pusharray\n";

my @unshiftarray = ('original', 'contents');
unshift @unshiftarray, $_ foreach(1..10);
print "array contains @unshiftarray\n";

Output:

array contains original contents 1 2 3 4 5 6 7 8 9 10
array contains 10 9 8 7 6 5 4 3 2 1 original contents
d5e5 109 Master Poster

Statistics::Descriptive (search for libstatistics-descriptive-perl on Synaptic PM) calculates all the functions you want but gives different results than Statistics::Basic for Standard Deviation. I don't know why.

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Descriptive;

#Try to open first command-line argument and assign to a filehandle
#If the open fails, terminate and print $! which contains open error status.
open my $fh, '<', $ARGV[0] or die("Cannot open $ARGV[0]: $!\n");

my @in;

my %scaffolds;
while(<$fh>){  
# first load data into hash of arrays
    chomp;
    my @columns = split(/\s|\t/);
    my ($scaf, $fsite) = ($columns[1], $columns[2]);
    $scaffolds{$scaf} = [] unless exists $scaffolds{$scaf};
    push @{$scaffolds{$scaf}}, $fsite;
}
close $fh;

print join "\t", qw(Scaffold Min Max Mean Mode StdDev), "\n";
foreach my $scaf (sort keys %scaffolds){
    my $stat = Statistics::Descriptive::Full->new();
    my @start_sites = sort {$a <=> $b} @{$scaffolds{$scaf}};

    $stat->add_data(@start_sites);
    my $count = $stat->count();
    my $min = $stat->min();
    my $max = $stat->max();
    my $mean = $stat->mean();
    my $mode = $stat->mode();
    $mode = 'None' if !defined $mode; #No element occurs more than any other
    my $stddev = $stat->standard_deviation();
    if ($count > 1){
        print join "\t", ($scaf, $min, $max, $mean, $mode, $stddev), "\n";
    }
}

Outputs:

Scaffold	Min	Max	Mean	Mode	StdDev	
scaffold657_3__	226	1348	787	None	793.373808491306	
scaffold657_5__	8776	14581	11678.5	None	4104.75486478791	
scaffold657_6__	11361	20463	15059	None	4784.81222202084	
scaffold657_9__	4998	6855	6195.33333333333	None	1038.71378797691
d5e5 109 Master Poster

Statistics::Basic doesn't seem to have min and max functions. You can get min and max values by sorting the array of start sites before printing. I modified the script to include min and max.

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Basic qw(:all);

#Try to open first command-line argument and assign to a filehandle
#If the open fails, terminate and print $! which contains open error status.
open my $fh, '<', $ARGV[0] or die("Cannot open $ARGV[0]: $!\n");

my @in;

my %scaffolds;
while(<$fh>){  
# first load data into hash of arrays
    chomp;
    my @columns = split(/\s|\t/);
    my ($scaf, $fsite) = ($columns[1], $columns[2]);
    $scaffolds{$scaf} = [] unless exists $scaffolds{$scaf};
    push @{$scaffolds{$scaf}}, $fsite;
}
close $fh;

print join "\t", qw(Scaffold Min Max Mean StdDev), "\n";
foreach my $scaf (sort keys %scaffolds){
    my @start_sites = sort {$a <=> $b} @{$scaffolds{$scaf}};
    my $count = @start_sites;
    my $min = $start_sites[0]; #First element is smallest because of sort
    my $max = $start_sites[$#start_sites];
    my $mean = mean(@start_sites);
    my $stddev = stddev(@start_sites);
    if ($count > 1){
        print join "\t", ($scaf, $min, $max, $mean, $stddev), "\n";
    }
}

This gives the following output:

Scaffold	Min	Max	Mean	StdDev	
scaffold657_3__	226	1348	787	561	
scaffold657_5__	8776	14581	11,678.5	2,902.5	
scaffold657_6__	11361	20463	15,059	3,906.78	
scaffold657_9__	4998	6855	6,195.33	848.11
d5e5 109 Master Poster

I found Statistics::Basic but couldn't find OrLite.

I had to make some changes to your script to get it to run. Does this run for you? It doesn't give all the statistics you want but maybe it can serve as a first step.

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Basic qw(:all);

#Try to open first command-line argument and assign to a filehandle
#If the open fails, terminate and print $! which contains open error status.
open my $fh, '<', $ARGV[0] or die("Cannot open $ARGV[0]: $!\n");

my @in;

my %scaffolds;
while(<$fh>){  
# first load data into hash of arrays
    chomp;
    my @columns = split(/\s|\t/);
    if ( !defined $scaffolds{$columns[1]}){
        $scaffolds{$columns[1]} = [$columns[2]];
    }
    else{
        my @arr = @{$scaffolds{$columns[1]}};
        push @arr, $columns[2];
        $scaffolds{$columns[1]} = [@arr];
    }
}
close $fh;

printf("%-20s%20s%20s\n", 'Scaffold', 'Mean', 'StdDev');
foreach my $scaf (sort keys %scaffolds){
    my @start_sites = @{$scaffolds{$scaf}};
    my $count = @start_sites;
    my $mean = mean(@start_sites);
    my $stddev = stddev(@start_sites);
    if ($count > 1){
        printf("%-20s%20s%20s\n", $scaf,$mean,$stddev);
    }
}

This gives the following output:

Scaffold                            Mean              StdDev
scaffold657_3__                      787                 561
scaffold657_5__                 11,678.5             2,902.5
scaffold657_6__                   15,059            3,906.78
scaffold657_9__                 6,195.33              848.11
d5e5 109 Master Poster

The first problem I see in your script: my $data = @ARGV; will count the number of arguments provided on the command line where you run perl scriptname.pl sample.txt and assign that number to $data. When you assign an array to a scalar variable the result is the size of the array. Instead you want to open the contents of the first command-line argument (assuming it is a valid file name or path).
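
A short sketch of the difference, using a made-up sub called count_and_first and a made-up argument in place of the real @ARGV:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Show what an array yields in scalar context and in list context.
sub count_and_first {
    my @args = @_;
    my $count   = @args;    # scalar context: the number of elements
    my ($first) = @args;    # list context: the first element
    return ( $count, $first );
}

# Hypothetical stand-in for @ARGV after: perl scriptname.pl sample.txt
my ( $count, $first ) = count_and_first('sample.txt');
print "count = $count, first = $first\n";
```

The parentheses around $first are what make the assignment a list assignment; without them you get the count again.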

If you want to specify the name of the input file at runtime from the command line you can do it like this:

#Try to open first command-line argument and assign to a filehandle
#If the open fails, terminate and print $! which contains open error status.
open my $fh, '<', $ARGV[0] or die("Cannot open $ARGV[0]: $!\n");

All the tabs appear to have been removed from your sample data in the process of posting it here. Could you attach your sample data as an attachment to your post, please? For example attach sample.txt ("go advanced" while editing your post here, and look for the "manage attachments" button). Thanks.

d5e5 109 Master Poster

I should have mentioned I am running on UBUNTU which is also Linux based not Windows. Did I install the wrong package?

Good. Do you have the Synaptic Package Manager? It's really easy to use but it only finds some of the modules available on CPAN, so it won't help you install Statistics::Lite. I haven't worked much with statistics, but if Perl won't let you use Statistics::Lite I would say you haven't installed it. Downloading a file from CPAN and copying it to your bin folder doesn't install it, apparently.

If you have the Synaptic Package Manager and don't mind installing something other than Statistics::Lite you could try the following: Start up Synaptic Package Manager and type libstatistics-basic-perl into the Quicksearch box. Select the package(s) you want from the resulting list, mark them for installation and then Apply. That should install a statistics module you can use in your Perl scripts.

d5e5 109 Master Poster

My platform is linux so I'm not the one to say how to install from CPAN to Windows, but have a look at http://www.daniweb.com/forums/post1369660.html#post1369660 which is by mitchems about installing Statistics::Descriptive on Windows. His advice would probably apply to installing Statistics::Lite as well. Just copying the module into your bin folder wouldn't work if perl doesn't search that folder for modules, or if that module does not consist of pure perl and uses other components which need to be compiled.
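
To see which folders perl does search, and whether a given module can be loaded from them, something like the following sketch may help. The sub name module_available is made up; @INC and require are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the named module can be loaded from the directories in @INC.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print "perl searches these directories for modules:\n";
print "  $_\n" for @INC;

for my $module (qw(List::Util Statistics::Lite)) {
    printf "%s is %s\n", $module,
        module_available($module) ? 'available' : 'not available';
}
```

If a pure-perl module sits in a folder that is not in that list, a use lib statement naming the folder adds it to the search path.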

d5e5 109 Master Poster

I don't find this regex character class [:<:] in the docs - what does it mean?

It means 'start-of-word boundary'. I wanted to search on upper-case 'F' but RLIKE doesn't have a case-sensitive mode, so I looked for a way to specify 'F' only when it starts a word. It's kind of buried in the docs at http://dev.mysql.com/doc/refman/5.1/en/regexp.html. Scroll down to where it says:

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

d5e5 109 Master Poster

Why would you not want to use %word% ? With your way, if you have something like "Harry Potter and the Chamber of Secrets" and the person searches for "Chamber of Secrets" then no result would be returned.

If you have code to ignore "The" every time, then if someone searches for "The Simpsons" nothing would be returned as you are ignoring "The" from the database.

Also, if I'm not mistaken, smantscheff's code will only replace "The " and not "the " and since small words such as "the", "a", "and" etc. are supposed to be in lowercase when in titles, it shouldn't replace those.

CREATE TABLE IF NOT EXISTS `test` (
  `name` varchar(255) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `test` (`name`) VALUES
('Futurama'),
('The Office'),
('The Friends'),
('The Simpsons'),
('Harry Potter and the Chamber of Secrets');
SELECT name
FROM `test`
WHERE name
RLIKE '[[:<:]]F'
OR name
RLIKE 'Chamber of Secrets';

This gives the following output:

+-----------------------------------------+
| name                                    |
+-----------------------------------------+
| Futurama                                | 
| The Friends                             | 
| Harry Potter and the Chamber of Secrets | 
+-----------------------------------------+
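
For comparison only, the same two tests can be sketched in Perl, where \b before a letter plays roughly the role of [[:<:]]. This is not MySQL code and the sub name matching_titles is made up; the i flag is there to mimic RLIKE ignoring case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Titles that have a word starting with F, or that contain the phrase.
sub matching_titles {
    return grep { /\bF/i || /Chamber of Secrets/i } @_;
}

my @titles = (
    'Futurama', 'The Office', 'The Friends', 'The Simpsons',
    'Harry Potter and the Chamber of Secrets',
);
print "$_\n" for matching_titles(@titles);
```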
d5e5 109 Master Poster

Thank you for your help and it makes sense

You're welcome. Please mark this thread solved.

d5e5 109 Master Poster

Could you have more than one prod_id per proposal-evaluator group? If so, which prod_id do you want... first, last, greatest, least? Or maybe you should group by prod_id in addition to PA.proposal_id and PA.evaluator_ID.

d5e5 109 Master Poster

You're welcome. Instead of saying 'ascii character 23' I should have said 'the character represented by hexadecimal number 23,' which is the pound sign #.

print "\x23"; # Prints # (Pound sign)
d5e5 109 Master Poster
#!/usr/bin/perl
#score_many_mutant.pl
use strict;
use warnings;

my $sequence='A G G G C A C C T C T C A G T T C T C A T T C T A A C A C C A C
A T A A T T T T T A T T T G T A T T A T T C A G A T T T T T C A T G A A C T T T
T C C A C A T A G A A T G A A G T T G A C A T T G T T A T T T C T C A G G G T C
T C G G T T C A C C A G T A T T T G A C A A A C T T G A A G C T G A A C T A G C
T A A A G C T G C T A T G T C A T T G C C T G C A A C C A A G G G C T T T C A G
T T T G G T A G T G G G T T T G C A G G C A C C T T T T T G A C T G G G A G T G
A A C A C A A T G A …
d5e5 109 Master Poster

Sorry about the square brackets this was my first time using the wrap code feature on this site. I thank you for all of the help and will try to see if the previous code you made can help me with the calculation of the scores.

Assuming the script you posted mutates a string of bases by changing one letter at one position, couldn't you calculate the scores as follows?

#!/usr/bin/perl
#score_mutant.pl
use strict;
use warnings;

my $sequence = 'AGCT'; #Short string (Should work for long strings too.)
my   $mutant = 'AGAT'; #Copy of above string except C has mutated to A.

print "Sequence,Mutant,Score\n";
foreach(0 .. length($sequence) - 1){
    my $s = substr($sequence, $_, 1);
    my $m = substr($mutant, $_, 1);
    my $score = determine_score($s, $m);
    print "$s,$m,$score\n";
}

sub determine_score{
    my ($alpha, $beta) = sort @_; #Sort two base args in alphabetical order
    
    #If the base pair did not change assign 0.
    return 0 if $alpha eq $beta;
    
    #If a purine was mutated to a purine,
    #or a pyrimidine to a pyrimidine assign a value of +1 to that base pair.
    #If a purine was mutated to a pyrimidine or vice versa
    #assign a value of -1 to that base pair.
    my %rules;
    $rules{'A'}{'G'} = +1;
    $rules{'A'}{'T'} = -1;
    $rules{'A'}{'C'} = -1;
    $rules{'G'}{'T'} = -1;
    $rules{'C'}{'G'} = -1;
    $rules{'C'}{'T'} = +1;
    
    return $rules{$alpha}{$beta};
}

This gives the following output:

Sequence,Mutant,Score
A,A,0
G,G,0
C,A,-1
T,T,0
d5e5 109 Master Poster

Why do you have square brackets around some of your statements? Once I removed the square brackets it seemed to run OK. It takes a long string of bases and modifies one of the bases at a random position. I don't understand how to use the array of scores either. If you were to generate an array of scores for the sequence compared to the mutant, you would end up with an array of all zero scores except possibly one non-zero element, because all the bases are the same except one. But I don't understand what the scores mean or how they are used.

d5e5 109 Master Poster

I would read it as "If value of $line starts with Referers followed by optional whitespace characters (zero or more) followed by = followed by optional whitespace characters (zero or more) followed by one or more other characters (note the ? lazy quantifier) followed by optional whitespace characters (zero or more) followed by either ascii character 23 (whatever that is) or the end of the $line string..."
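
The regex itself is not quoted in this post, so the pattern below is only my guess at one that fits the description above. The sub name referers_value and the two sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value after "Referers =", stopping before trailing whitespace
# and before a # comment if there is one; undef if the line does not match.
sub referers_value {
    my ($line) = @_;
    return $line =~ /^Referers\s*=\s*(.+?)\s*(?:\x23|$)/ ? $1 : undef;
}

for my $line ( 'Referers = example.com # a comment', 'Referers=example.org' ) {
    print referers_value($line), "\n";
}
```

The lazy quantifier matters here: with a greedy (.+) the capture would swallow the trailing whitespace and the comment as well.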

d5e5 109 Master Poster

I have tried numerous times to incorporate all aspects of my script but I am only getting it to shuffle but not the random shuffle and not the mutation. Can you tell me what order I need to use with my subroutines in order to mutate the DNA sequence and then perform the random shuffle and then calculate the z score? I first put the srand expression and then my sequence and then a subroutine to shuffle the sequence. I don't understand how I am supposed to compare the original sequence and the mutated sequence. Thank you for all your help

The script you quoted reads a file into an array called @original, copies @original to @shuffled and then shuffles @shuffled the required number of times (between 10 and 20). You can add logic to the script you quoted to define the rules for assigning z-scores to base pairs constructed from @original and @shuffled. This logic is demonstrated in the script in the post at http://www.daniweb.com/forums/post1380986.html#post1380986. At the end you will have an array of z-scores called @scores. The first score in @scores is determined by the first base-pair in @base-pairs, and so forth. I don't know what you want to do with the @scores array. Maybe just print it? I don't know what you mean when you say the shuffle is not a random shuffle.
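
As a rough sketch of that step, assuming the scoring rules from the post linked above (+1 within purines or within pyrimidines, -1 across the two classes, 0 for no change): the sub names score_pair and score_arrays are made up, and the four-base arrays are just placeholders for your real @original and @shuffled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# +1 for purine to purine or pyrimidine to pyrimidine, -1 for a change
# between the two classes, 0 if the base is unchanged.
sub score_pair {
    my ( $s, $m ) = @_;
    return 0 if $s eq $m;
    my %is_purine = ( A => 1, G => 1 );
    my $s_class = exists $is_purine{$s} ? 1 : 0;
    my $m_class = exists $is_purine{$m} ? 1 : 0;
    return $s_class == $m_class ? 1 : -1;
}

# Score each position of two equal-length arrays of bases.
sub score_arrays {
    my ( $original, $changed ) = @_;
    return map { score_pair( $original->[$_], $changed->[$_] ) }
        0 .. $#{$original};
}

my @original = qw(A G C T);
my @shuffled = qw(G A T A);    # hypothetical result of a shuffle
my @scores   = score_arrays( \@original, \@shuffled );
print "@scores\n";
```

Passing the arrays by reference keeps them separate inside the sub; passing them directly would flatten both into one list.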

d5e5 109 Master Poster

Hi,
Actually, I want the mail to look the same on all systems. Because I used \t as a delimiter, the columns did not line up. I want to use printf but I don't know how.

For example, taking the first line, what I want is:

phaneesh under user
/proj/sw_apps/phaneesh under path
5.7GB under used space
1.20% under used%

I couldn't get that with \t because its width differs from system to system. In short, I need help printing four scalars neatly under four headings in the mail. Thanks.

I haven't tried the sendmail program but I guess if you can get it to line up OK on the display that's a start. Let's try lining up the column headers with the first line of data.

#!/usr/bin/perl
use strict;
use warnings;

printf ("%7s","user");
printf ("%25s","path");
printf ("%22s","Usedspace");
printf ("%23s", "Used%\n");
print '_' x 79, "\n";

#From your desired output, I guess your variables have the following values
my $user1 = 'phaneesh';
my $path1 = '/proj/sw_apps/phaneesh';
my $space1 = '5.7';
my $unit = 'GB'; #Named $unit rather than $b, which sort uses as a special variable
my $usedp1 = '1.20';
my $p = '%';

printf ('%-17s%-20s%13s%2s%21s%s', $user1, $path1, $space1, $unit, $usedp1, $p);

This gives the following output:

user                     path             Usedspace                 Used%
_______________________________________________________________________________
phaneesh         /proj/sw_apps/phaneesh          5.7GB                 1.20%

I find http://www.devdaily.com/blog/post/perl/reference-page-perl-printf-formatting-format-cheat-sheet a handy guide to printf.
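One way to keep the headings and the data in step is to build both from a single format string, so changing a width moves the header and every row together. The following is only a sketch: the column widths and the helper name format_row are my own assumptions, not part of the original script. It uses sprintf so the lines can be printed or placed in a mail body, and the columns will only line up if the mail is displayed in a fixed-width font.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column widths are illustrative assumptions; widen them if your data is longer.
# A leading '-' left-justifies the column, otherwise it is right-justified.
my $format = "%-12s %-28s %10s %8s\n";

# Hypothetical helper: format one row using the shared format string.
sub format_row {
    my @columns = @_;
    return sprintf($format, @columns);
}

my $header = format_row('User', 'Path', 'Used space', 'Used%');
my $row    = format_row('phaneesh', '/proj/sw_apps/phaneesh', '5.7GB', '1.20%');

print $header;
print '-' x (length($header) - 1), "\n";   # underline as wide as the header
print $row;
```

Because every row goes through the same format string, a value that is longer than its column width is the only thing that can push the later columns out of line.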

d5e5 109 Master Poster

I don't know what caused the error message you were getting but if you read the entire file into a string variable, make sure all letters are upper-case, remove all non-letter characters (including spaces and carriage-returns), and split the string into an array it should work OK.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); #This module includes a method to shuffle arrays.

#Create integer between 10 and 20
my $times_to_shuffle = 10 + int(rand(11));
my $filename = 'Test_sequence.txt';
my $string = slurp_file($filename);
$string = uc($string); #Make sure all letters are upper case
$string =~ s/[^AGCT]//g; #Remove all characters that are not A, G, C, or T (the g flag removes every match, not just the first)

my @original = $string =~ m/[AGCT]/g; #Assign all bases to an array
my $base_count = @original; #Count elements in array
print "Number of bases in sequence is $base_count \n";

print "Shuffle sequence $times_to_shuffle times.\n";
my @shuffled = @original; #Copy original array to array to be shuffled.
foreach (1..$times_to_shuffle){
    @shuffled = shuffle(@shuffled);
}

print "@shuffled\n";

sub slurp_file {
    my $file = $_[0];
    local $/; #Undefine the input record separator so the whole file is read at once
    open( my $fh, '<', $file ) or die "Could not open $file ... $!";
    my $text = <$fh>;
    close $fh;
    return $text;
}

Gives the following output:

Number of bases in sequence is 829 
Shuffle sequence 11 times.
G T A A G A T A A T A A A A G T G T T G T A G C A A G G A T T A G T A T A T A C G C C …
d5e5 109 Master Poster

My sequence is quite large so I was creating a shuffled sequence to the original sequence but I keep on getting a message that there is an "uninitialized value." My sequence has over 1,000 bases.

Can you attach your sequence as a text file to your post? (See the "Manage Attachments" button.)

d5e5 109 Master Poster

Try this modified version of one of dch26's solutions:

#!/usr/bin/perl
use strict;
use warnings;

my $df_stats = qx{df /home/}; #I don't have dir called /compare/ so I used /home/

my @fields=split(/\s+/,$df_stats); #Split on one or more whitespace characters

#Can we assume first 6 fields are column headers?
#If so, instead of the following
#print "available space=$fields[3], used=$fields[4]\n";

#Try this:
my $available = $fields[6+3];
my $used = $fields[6+4];

print "available space=$available, used=$used\n";

Running this on my computer (Linux platform) gives the following output:

available space=6655136, used=66830396
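
Counting header fields is fragile, because splitting the whole output on whitespace turns the "Mounted on" heading into two fields. A more defensive approach might be to split the output into lines first and take the columns from the data line only. The sketch below assumes the POSIX layout produced by df -P; the helper name parse_df and the sample numbers are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take the text printed by "df -P <dir>" and return
# the Used and Available figures from the last line (the data line).
sub parse_df {
    my ($df_text) = @_;
    my @lines = grep { /\S/ } split /\n/, $df_text;
    die "Unexpected df output\n" if @lines < 2;

    # Data columns: filesystem, blocks, used, available, capacity, mount point
    my @fields = split ' ', $lines[-1];
    return (used => $fields[2], available => $fields[3]);
}

# Sample text with made-up numbers, in the "df -P" layout
my $sample = <<'END';
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1        100000000 60000000  40000000      60% /home
END

my %stats = parse_df($sample);
print "available space=$stats{available}, used=$stats{used}\n";

# On a real system you might call: my %real = parse_df(scalar qx{df -P /home/});
```

This still assumes the filesystem name contains no spaces, so treat it as a starting point rather than a finished parser.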
d5e5 109 Master Poster

How about something like this. Make an array of strings representing the base pair combinations of original and shuffled sequences. Then assign a score to each base pair according to the rules which can be represented by a hash.

#!/usr/bin/perl
use strict;
use warnings;

#If a purine was mutated to a purine,
#or a pyrimidine to a pyrimidine assign a value of +1 to that base pair.
#If a purine was mutated to a pyrimidine or vice versa
#assign a value of -1 to that base pair.
#If the base pair did not change assign 0.

my %rules = (
            AA => 0,
            AG => +1,
            AT => -1,
            AC => -1,
            GA => +1,
            GG => 0,
            GT => -1,
            GC => -1,
            TA => -1,
            TG => -1,
            TT => 0,
            TC => +1,
            CA => -1,
            CG => -1,
            CT => +1,
            CC => 0
            );

#Dummy sequence for testing
my @original = qw(C G T T T G T A A A T T G C A T C A A G);
my @shuffled = qw(G T T C A A A G T A G C T T A G A C T T);

my @scores;
my @base_pairs = make_base_pairs(\@original, \@shuffled);
foreach my $bp (@base_pairs){
    push @scores, $rules{$bp}
}
print join("\t", @base_pairs), "\n";
print join("\t", @scores);
sub make_base_pairs{
    my @orig = @{$_[0]}; #de-reference input array
    my @shuf = @{$_[1]}; #de-reference input array
    my $idx = 0;
    my @bps;
    foreach my $base (@orig){
        push @bps, …
d5e5 109 Master Poster

Thanks for the suggestion. I looked through it but I did not see anything. In order to get the same base distribution I was thinking that I had to use the srand expression. I would like to know if I am going in the right direction. Thank you

If you want rand to return the same sequence each time you run your program then calling srand with a constant makes sense to me. If you don't call srand explicitly, Perl calls it implicitly the first time your program uses rand, with a seed that differs from run to run, so shuffling gives you different results each time. (For most purposes, you do want shuffling to give unpredictable results.) I haven't tried it but that's what I read in http://perldoc.perl.org/functions/srand.html.
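
A quick way to check this for yourself is to seed twice with the same constant and compare what rand returns. This is only a sketch: the seed 42 and the helper name draws_for_seed are arbitrary choices of mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: draw $count integers from 0..9 after seeding with $seed.
sub draws_for_seed {
    my ($seed, $count) = @_;
    srand($seed);                        # fixed seed => repeatable sequence
    return map { int(rand(10)) } 1 .. $count;
}

my @first  = draws_for_seed(42, 5);
my @second = draws_for_seed(42, 5);

print "first:  @first\n";
print "second: @second\n";
print "Same sequence both times\n" if "@first" eq "@second";
```

Remove the srand call (or pass a different seed) and the two lists should no longer match in general.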

d5e5 109 Master Poster

You're welcome. I really don't know anything about z-scores as I haven't studied biology for about 40 years.:) You could look for a list of modules on CPAN but I don't know which module would best suit your purpose. Maybe this one?

d5e5 109 Master Poster

You are welcome choosenalpha. Please don't forget to mark this thread solved.

d5e5 109 Master Poster

What I posted above is not quite right. You don't want to start with the original sequence each time you shuffle. You want to replace the original sequence with the result of shuffling, then shuffle the new sequence, and so on for a random number of repetitions.

Plus, you don't have to write your own shufflearray subroutine. Perl 5.7 and later comes with the List::Util module that you can import and use for its shuffle() function. The revised outline script would look like this.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); #This module includes a method to shuffle arrays.

#Create integer between 10 and 20
my $times_to_shuffle = 10 + int(rand(11)); #int(rand(11)) gives 0..10, so the result is 10..20

#Dummy sequence for testing
my @sequence = qw(C G T T T G T A A A T T G C A T C A A G);

print "Shuffle sequence $times_to_shuffle times.\n";
foreach (1..$times_to_shuffle){
    @sequence = shuffle(@sequence);
    print "@sequence\n";
}

This gives the following output:

Shuffle sequence 17 times.
T C A G A T A A T G A G T T T G C C T A
C T G A A G C T T A A T T G T C T A G A
G G G T C A C T T G A A A A T T C T T A
G A A G A C T G T C T C G T A T T T A A
G T G A …
d5e5 109 Master Poster

I would create the main outline of the logic first. If it needs to call a subroutine, you can create a dummy subroutine (sometimes called a 'stub') and modify it later after figuring out the details of how it will accomplish its task. A subroutine stub consists of the subroutine name plus logic to assign the arguments to its own variables, plus a comment to indicate it is just a stub and the real logic needs to be filled in later.

Your first task, I think, is to calculate the number of times you need to call the shufflearray subroutine and assign this number to a variable. If you haven't already figured out how to extract the FASTA sequence (whatever that is), start with a hardcoded sequence to test your subroutine for the first time. An example of a draft of your main logic might be something like this:

#!/usr/bin/perl
use strict;
use warnings;

#Create integer between 10 and 20
my $times_to_shuffle = 10 + int(rand(11)); #int(rand(11)) gives 0..10, so the result is 10..20

#Dummy sequence for testing
my @sequence = qw(C G T T T G T A A A T T G C A T C A A G);
my @shuffled_sequence;

print "Shuffle sequence $times_to_shuffle times.\n";
foreach (1..$times_to_shuffle){
    @shuffled_sequence = shufflearray(\@sequence); #Pass reference to array as argument
    print "@shuffled_sequence\n";
}

sub shufflearray{
    #This sub is just a stub
    #Logic needed to randomly resequence @in and return as @out
    my @in = @{$_[0]}; #Dereference passed array ref
    
    my @out = @in;
    return @out;
}
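
When you are ready to fill in the stub, one common choice is the Fisher-Yates shuffle. The version below is a sketch of how the body might look, not necessarily what your assignment expects; it keeps the same calling convention as the stub (an array reference in, a list out).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A possible body for the stub: Fisher-Yates shuffle on a copy of the input.
sub shufflearray {
    my @out = @{ $_[0] };               # copy so the caller's array is untouched
    for (my $i = $#out; $i > 0; $i--) {
        my $j = int(rand($i + 1));      # random index from 0 to $i inclusive
        @out[$i, $j] = @out[$j, $i];    # swap the two elements
    }
    return @out;
}

my @sequence = qw(C G T T T G T A A A T T G C A T C A A G);
my @shuffled = shufflearray(\@sequence);
print "@shuffled\n";
```

The shuffled list always contains exactly the same bases as the input, only in a different order.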
d5e5 109 Master Poster

Yes.

#!/usr/bin/env python3

filename = '/home/david/Programming/Python/data.txt'

# Print every line that starts with 'apple'
with open(filename) as f:
    for line in f:
        if line.startswith('apple'):
            print(line, end='')  # line already ends with a newline
d5e5 109 Master Poster

All your scripts should include the following:

use strict;
use warnings;

These pragmas produce error messages or warnings when your script does something likely to give unexpected results. Fix your script according to the messages that strict and warnings give.

For example, let's run the first two statements from what you posted:

#!/usr/bin/perl
use strict;
use warnings;

random_int(10, 20);
print "$x\n";

This gives the following output:

Global symbol "$x" requires explicit package name at /home/david/Programming/Perl/temp.pl line 6.
Execution of /home/david/Programming/Perl/temp.pl aborted due to compilation errors.

Also, you print $x but you don't show us any statement assigning a value to $x... so what is $x supposed to contain?
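
For comparison, here is a sketch of a version that compiles under strict. I don't know what your random_int looks like, so the subroutine below is a guess at what it might do; the point is that $x is declared with my and receives the value the subroutine returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical version of random_int: returns an integer from $min to $max inclusive.
sub random_int {
    my ($min, $max) = @_;
    return $min + int(rand($max - $min + 1));
}

my $x = random_int(10, 20);   # declare $x with "my" and keep the returned value
print "$x\n";
```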

d5e5 109 Master Poster

Please wrap your script (or the perl code you wish to show us) in [CODE]Your program goes here[/CODE] tags.

You say part of the script is giving you "the issue". Can you show us the issue, preferably by giving an example of an input sequence, an example of the output you expect, and an example of the unsatisfactory output that you are getting?

d5e5 109 Master Poster

That runs OK for me. I get a full-size blank window with "Windows" title bar. There's no scroll bar so I assume it fits as it should. My platform is Linux so maybe your problem is specific to Windows 7.