Have been studying Reservoir sampling for a couple of days , What I'v tried here is to draw a uniformly random sample of size 3 from a bigger data (the English alphabet) via reservoir sampling.
Below is what i'v come up with. if anyone could review my code and offer some suggestions , would be great.

here's the code :

import java.util.TreeMap;

public class test1 {
    public test1(){
        String[] data = "A B C D E F G H I J K L M O P".split(" ");
        double[] freq = new double[data.length];
        int k = 3;
        String[] sample = new String[k];

        // the tree map stores the count of appearance of each letter during
        // the total sampling procedure
        TreeMap<String, Double> dmap = new TreeMap<String, Double>();
        for (int index = 0; index < data.length; index++) {
            dmap.put(data[index], 0.0);
        }

        for (int loop = 0; loop < 1000000; loop++) {
            int i = 0;
            // initiate the sample
            while (i < k) {
                sample[i] = data[i++];
            }
            // start sampling from the reservoir
            for (; i < data.length; i++) {
                int r = (int) (Math.random() * (i + 1));
                if (r < k) {
                    sample[r] = data[i];
                }
            }
            // update the count of each entry in the map
            for (String s : sample) {
                dmap.put(s, dmap.get(s) + 1);
            }
        }
        int index = 0;
        for (Double s : dmap.values().toArray(new Double[dmap.size()])) {
            freq[index++] = s;
            // System.out.println(s);
        }
        // calculate statistics
        double mean, stddev , cov;
        mean = stats.mean(freq);
        stddev = stats.stddev(freq);
        cov = (stddev / mean) * 100;
        System.out.println(" mean                   : " + mean);
        System.out.println(" coeff of variation(%)  : " + cov);     }

    public static void main(String[] args) {
        new test1();
    }
}

and this is my stats class:

final class stats {

    private stats() {}

    public static double sum(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static double mean(double[] a) {
        if (a.length == 0)
            return Double.NaN;
        double sum = sum(a);
        return (double) sum / a.length;
    }

    public static double stddev(double[] a) {
        return Math.sqrt(var(a));
    }

    public static double var(double[] a) {
        if (a.length == 0)
            return Double.NaN;
        double avg = mean(a);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - avg) * (a[i] - avg);
        }
        return sum / (a.length);
    }
}

my output :

 mean                   : 200000.0
 coeff of variation(%)  : 0.18734211130086761

I have no major comments, but

There a some loops that could use the simpler enhanced for loop syntax, eg

for (int index = 0; index < data.length; index++) {
    dmap.put(data[index], 0.0);
}

vs

for (String s : data) {
    dmap.put(s, 0.0);
}

Instead of
for (Double s : dmap.values().toArray(new Double[dmap.size()])) {
you just need
for (Double s : dmap.values()) {
because the enhanced for can be used wuth any Collection.

The normal convention is that class names should start with a Capital letter.

I like your commenting style.

Edited 3 Years Ago by JamesCherrill

For some reason , when i tried for (Double s : dmap.values()) { eclipse was giving me a red dot indicating an error.Hence the long version. Perhaps it needed a refersh or something. but i tried again now, works just fine.

I like your commenting style.

thank you :)

The normal convention is that class names should start with a Capital letter.

i guess i got lazy there.

a question : how can i sort a hashmap ? i searched a bit , everyone suggested using a treemap , which is the reason i used them here. but TreeMap sorts the map in every entry , which i feel is a waste for this particular application. i'd rather sort it all at the end. Is that a good option ? if so , how to do it ?
ps : there were a few complicated looking solutions to this , but i really didnt feel at home with for some reason. Any simple solution ?

Edited 3 Years Ago by somjit{}

This article has been dead for over six months. Start a new discussion instead.