Have been studying Reservoir sampling for a couple of days , What I'v tried here is to draw a uniformly random sample of size 3 from a bigger data (the English alphabet) via reservoir sampling.
Below is what i'v come up with. if anyone could review my code and offer some suggestions , would be great.

here's the code :

``````import java.util.TreeMap;

public class test1 {
public test1(){
String[] data = "A B C D E F G H I J K L M O P".split(" ");
double[] freq = new double[data.length];
int k = 3;
String[] sample = new String[k];

// the tree map stores the count of appearance of each letter during
// the total sampling procedure
TreeMap<String, Double> dmap = new TreeMap<String, Double>();
for (int index = 0; index < data.length; index++) {
dmap.put(data[index], 0.0);
}

for (int loop = 0; loop < 1000000; loop++) {
int i = 0;
// initiate the sample
while (i < k) {
sample[i] = data[i++];
}
// start sampling from the reservoir
for (; i < data.length; i++) {
int r = (int) (Math.random() * (i + 1));
if (r < k) {
sample[r] = data[i];
}
}
// update the count of each entry in the map
for (String s : sample) {
dmap.put(s, dmap.get(s) + 1);
}
}
int index = 0;
for (Double s : dmap.values().toArray(new Double[dmap.size()])) {
freq[index++] = s;
// System.out.println(s);
}
// calculate statistics
double mean, stddev , cov;
mean = stats.mean(freq);
stddev = stats.stddev(freq);
cov = (stddev / mean) * 100;
System.out.println(" mean                   : " + mean);
System.out.println(" coeff of variation(%)  : " + cov);     }

public static void main(String[] args) {
new test1();
}
}
``````

and this is my stats class:

``````final class stats {

private stats() {}

public static double sum(double[] a) {
double sum = 0.0;
for (int i = 0; i < a.length; i++) {
sum += a[i];
}
return sum;
}

public static double mean(double[] a) {
if (a.length == 0)
return Double.NaN;
double sum = sum(a);
return (double) sum / a.length;
}

public static double stddev(double[] a) {
return Math.sqrt(var(a));
}

public static double var(double[] a) {
if (a.length == 0)
return Double.NaN;
double avg = mean(a);
double sum = 0.0;
for (int i = 0; i < a.length; i++) {
sum += (a[i] - avg) * (a[i] - avg);
}
return sum / (a.length);
}
}
``````

my output :

`````` mean                   : 200000.0
coeff of variation(%)  : 0.18734211130086761
``````

I have no major comments, but

There a some loops that could use the simpler enhanced for loop syntax, eg

``````for (int index = 0; index < data.length; index++) {
dmap.put(data[index], 0.0);
}
``````

vs

``````for (String s : data) {
dmap.put(s, 0.0);
}
``````

`for (Double s : dmap.values().toArray(new Double[dmap.size()])) {`
you just need
`for (Double s : dmap.values()) {`
because the enhanced for can be used wuth any Collection.

The normal convention is that class names should start with a Capital letter.

I like your commenting style.

For some reason , when i tried `for (Double s : dmap.values()) {` eclipse was giving me a red dot indicating an error.Hence the long version. Perhaps it needed a refersh or something. but i tried again now, works just fine.

I like your commenting style.

thank you :)

The normal convention is that class names should start with a Capital letter.

i guess i got lazy there.

a question : how can i sort a hashmap ? i searched a bit , everyone suggested using a treemap , which is the reason i used them here. but TreeMap sorts the map in every entry , which i feel is a waste for this particular application. i'd rather sort it all at the end. Is that a good option ? if so , how to do it ?
ps : there were a few complicated looking solutions to this , but i really didnt feel at home with for some reason. Any simple solution ?