I am trying to use a hash set to do a Monte Carlo analysis of the Birthday Paradox.

Here is my pseudo code:
Set number of collisions to zero
Loop (10 to100 by 10)
Loop:
Generate a birthday. (use 1-365)
See if it is already in the set. If it is count as a collision and end the loop.
Print out the number of collisions as a probability P.

The code is compiling and running fine, but I am getting completely wrong results:

After 100000 tests there were 1 occurrences of shared birthdays in a set of 10 people.
Probability : 1.0E-5
After 100000 tests there were 2 occurrences of shared birthdays in a set of 20 people.
Probability : 2.0E-5
After 100000 tests there were 3 occurrences of shared birthdays in a set of 30 people.
Probability : 3.0E-5
After 100000 tests there were 4 occurrences of shared birthdays in a set of 40 people.
Probability : 4.0E-5
After 100000 tests there were 5 occurrences of shared birthdays in a set of 50 people.
Probability : 5.0E-5
After 100000 tests there were 6 occurrences of shared birthdays in a set of 60 people.
Probability : 6.0E-5
After 100000 tests there were 7 occurrences of shared birthdays in a set of 70 people.
Probability : 7.0E-5
After 100000 tests there were 8 occurrences of shared birthdays in a set of 80 people.
Probability : 8.0E-5
After 100000 tests there were 9 occurrences of shared birthdays in a set of 90 people.
Probability : 9.0E-5
After 100000 tests there were 10 occurrences of shared birthdays in a set of 100 people.
Probability : 1.0E-4

When I should be getting:

After 100000 tests there were 11446 occurrences of shared birthdays in a set of 10 people.
Probability: 0.11446
After 100000 tests there were 41031 occurrences of shared birthdays in a set of 20 people.
Probability: 0.41031
After 100000 tests there were 70455 occurrences of shared birthdays in a set of 30 people.
Probability: 0.70455
After 100000 tests there were 89061 occurrences of shared birthdays in a set of 40 people.
Probability: 0.89061
After 100000 tests there were 97043 occurrences of shared birthdays in a set of 50 people.
Probability: 0.97043
After 100000 tests there were 99454 occurrences of shared birthdays in a set of 60 people.
Probability: 0.99454
After 100000 tests there were 99926 occurrences of shared birthdays in a set of 70 people.
Probability: 0.99926
After 100000 tests there were 99986 occurrences of shared birthdays in a set of 80 people.
Probability: 0.99986
After 100000 tests there were 99999 occurrences of shared birthdays in a set of 90 people.
Probability: 0.99999
After 100000 tests there were 100000 occurrences of shared birthdays in a set of 100 people.
Probability: 1.0

I am getting very frustrated and can't seem to figure out what I am doing wrong. I believe it has something to do with my loops. Any help would be greatly appreciated!!

``````import java.util.HashSet;
import java.util.Random;
import java.util.Set;

{
public static void main(String [] args)
{
Random random = new Random();

int collisionCount = 0;

for(int people = 10; people <= 100; people += 10)
{
Set<Integer> birthdays = new HashSet<>(365);
for(int runs = 0; runs < 100000; runs++)
{
int randomBirthday = random.nextInt(365);

if(birthdays.contains(randomBirthday))
{
collisionCount++;
break;
}

}
float probability = (float)collisionCount / 100000;

System.out.println("After 100000 tests there were " +
collisionCount + " occurrences of shared " +
" birthdays in a set of " + people + " people.");
System.out.println("Probability : " + probability);
}
}
}
``````

## All 6 Replies

On line 30 I see integer division so try 100000.0

Also, print out your variables so you can see what's up.

Changing to 100000.0 gives me
BirthdayParadox.java:30: error: incompatible types: possible lossy conversion from double to float

The whole probability section of my code (the bottom) seems to be working fine. It is something in the loops that I can't seem to get right.

``````for(int people = 10; people <= 100; people += 10)
{
Set<Integer> birthdays = new HashSet<>(365);
for(int runs = 0; runs < 100000; runs++)
{
int randomBirthday = random.nextInt(365);

if(birthdays.contains(randomBirthday))
{
collisionCount++;
break;
}

}
``````

 ; see invisal's post below for a better explanation of loop problem

100000.0 is a double constant, so the aithmetic is done in double precision, then you assign it to a float and potetially loose precision. Nowdays there's no real reason to use float ever - the precision is rubbish and its no faster than double ona 64 bit processos

Not entirely suprirsed that you got this result. You skip the loop as soon as found a collision

``````        if(birthdays.contains(randomBirthday))
{
collisionCount++;
break; // break from the test as soon as you found the collision
}
``````

and to be honest, your logic is a mess. Even after you fix this bug, you will still get the wrong answer. I think what you truly want to do is something like this

``````  public static void main(String[] args)
{
// Do 100,000 tests on 10 persons.
System.out.println(birthDayExperiment(10, 100000));
}

public static float birthDayExperiment(int person, int test)
{
int count = 0;
int j = test;

while(--j >= 0) {
Random random = new Random();
Set<Integer> birthdays = new HashSet<>(365);

for(int i = 0; i < person; i++) {
int tmp = random.nextInt(365);
if (birthdays.contains(tmp)) { count++; break; }
}
}

return (float)count / (float)test;
}
``````

With exactly 365 possibilities, why not keep it really simple and use an array

``````int[] noOfPeopleWithThisBirthday = new int[365];

loop  number of tries
noOfPeopleWithThisBirthday[random.nextInt(365)]++

now check array for number of >1 entries etc
``````
commented: Leap year? +6

rproffitt: Is this better?

``````int[] noOfPeopleWithThisBirthday = new int[366];
loop i=0 to number of tries
noOfPeopleWithThisBirthday[random.nextInt(
(i%4==0)?365:366)]++
now check array for number of >1 entries etc
``````

(not valid for people over 115 years old; will cease to be valid in 2100)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.