I know next to nothing about statistics, so I tried Statistics::Basic because it seemed easy to use. Then I tried Statistics::Descriptive because it has functions that Statistics::Basic lacks. What I did not expect was to get a different result when calculating the standard deviation with Statistics::Descriptive than with the other module or with a subroutine copied from Yahoo Answers. Am I doing something wrong, or does Statistics::Descriptive have a deviant way of calculating deviations?

#!/usr/bin/perl
use strict;
use warnings;

use Statistics::Basic qw(:all);
use Statistics::Descriptive;

my @d = (5,10,5,100,150);
print 'StdDev according to Basic is ', stddev(@d), "\n"; #Basic

my $stat = Statistics::Descriptive::Full->new();
$stat->add_data(@d);
print 'StdDev according to Descriptive is ', $stat->standard_deviation(), "\n"; #Descriptive

print 'StdDev according to subroutine is ', standard_deviation(@d) . "\n";

sub standard_deviation {
    my (@numbers) = @_;

    #Prevent division by 0 error in case you get junk data
    return undef unless ( scalar(@numbers) );

    # Step 1, find the mean of the numbers
    my $total1 = 0;
    foreach my $num (@numbers) {
        $total1 += $num;
    }
    my $mean1 = $total1 / ( scalar @numbers );

    # Step 2, find the mean of the squares of the differences
    # between each number and the mean
    my $total2 = 0;
    foreach my $num (@numbers) {
        $total2 += ( $mean1 - $num )**2;
    }
    my $mean2 = $total2 / ( scalar @numbers );    # divides by N (population variance)

    # Step 3, standard deviation is the square root of the
    # above mean
    my $std_dev = sqrt($mean2);
    return $std_dev;
}

This gives the following output:

StdDev according to Basic is 60.12
StdDev according to Descriptive is 67.2123500556259
StdDev according to subroutine is 60.1165534607565
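Working the numbers by hand shows where the two figures come from. Both start from the same mean and the same sum of squared deviations. The population formula divides that sum by N and the sample formula divides it by N - 1:

```latex
\bar{x} = \frac{5 + 10 + 5 + 100 + 150}{5} = \frac{270}{5} = 54

\sum_{i=1}^{5} (x_i - \bar{x})^2 = 49^2 + 44^2 + 49^2 + 46^2 + 96^2
  = 2401 + 1936 + 2401 + 2116 + 9216 = 18070

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2} = \sqrt{\frac{18070}{5}} = \sqrt{3614} \approx 60.1166

s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2} = \sqrt{\frac{18070}{4}} = \sqrt{4517.5} \approx 67.2124
```

The first value matches the subroutine (and Statistics::Basic's rounded 60.12); the second matches Statistics::Descriptive.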

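After further googling I found a post titled "Perl Standard Deviation function is wrong", which explains that there are at least two ways of calculating standard deviation and that they give noticeably different results for small data lists such as mine. The difference is the divisor: the population standard deviation divides the sum of squared deviations from the mean by N, while the sample standard deviation divides it by N - 1. Statistics::Basic and the subroutine above divide by N; Statistics::Descriptive divides by N - 1.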
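A minimal sketch in plain Perl (only the core List::Util module, no CPAN modules) that computes both conventions side by side. The helper names sum_sq_dev, pop_stddev and sample_stddev are made up for illustration and are not part of either Statistics module. With the data from the script above, the first should reproduce the subroutine's figure and the second Statistics::Descriptive's figure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: sum of squared deviations from the mean
sub sum_sq_dev {
    my @x    = @_;
    my $mean = sum(@x) / @x;    # @x in scalar context is the count
    return sum( map { ( $_ - $mean )**2 } @x );
}

# Population standard deviation: divide by N
sub pop_stddev {
    my @x = @_;
    return unless @x;           # undefined for an empty list
    return sqrt( sum_sq_dev(@x) / @x );
}

# Sample standard deviation: divide by N - 1 (Bessel's correction)
sub sample_stddev {
    my @x = @_;
    return unless @x > 1;       # needs at least two values
    return sqrt( sum_sq_dev(@x) / ( @x - 1 ) );
}

my @d = (5, 10, 5, 100, 150);
printf "population (divide by N):   %.4f\n", pop_stddev(@d);
printf "sample     (divide by N-1): %.4f\n", sample_stddev(@d);
```

The N - 1 divisor is the usual choice when the numbers are a sample used to estimate the spread of a larger population; dividing by N describes the spread of exactly the numbers you have.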

Conclusion: the standard deviations calculated by Statistics::Basic and Statistics::Descriptive differ for small data sets (the gap shrinks as the data set grows), but this doesn't mean either value is wrong. Which statistics module you use for calculating standard deviation depends on which calculation method you, your colleagues and your boss agree on.
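If the team settles on one convention but the module you prefer uses the other, the result can be rescaled rather than recomputed, because the two differ only by the factor sqrt((N-1)/N). A sketch with made-up helper names (sample_to_population, population_to_sample), using the figures printed by the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: rescale a standard deviation between the two
# conventions. $n is the number of data points (must be at least 2).
sub sample_to_population {
    my ( $sample_sd, $n ) = @_;
    return $sample_sd * sqrt( ( $n - 1 ) / $n );
}

sub population_to_sample {
    my ( $pop_sd, $n ) = @_;
    return $pop_sd * sqrt( $n / ( $n - 1 ) );
}

# Figures printed by the original script for @d = (5,10,5,100,150), n = 5
my $descriptive = 67.2123500556259;    # Statistics::Descriptive, divides by n-1
my $subroutine  = 60.1165534607565;    # the subroutine, divides by n

printf "Descriptive rescaled to population: %.4f\n",
    sample_to_population( $descriptive, 5 );
printf "Subroutine rescaled to sample:      %.4f\n",
    population_to_sample( $subroutine, 5 );
```

With a Statistics::Descriptive::Full object you would pass $stat->standard_deviation() and $stat->count() as the two arguments.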
