I am trying to use a hash set to do a Monte Carlo analysis of the Birthday Paradox.

Here is my pseudo code:
Set number of collisions to zero
Loop (10 to100 by 10)
Loop:
Generate a birthday. (use 1-365)
See if it is already in the set. If it is count as a collision and end the loop.
Add to the set.
Print out the number of collisions as a probability P.

The code is compiling and running fine, but I am getting completely wrong results:

After 100000 tests there were 1 occurrences of shared birthdays in a set of 10 people.
Probability : 1.0E-5
After 100000 tests there were 2 occurrences of shared birthdays in a set of 20 people.
Probability : 2.0E-5
After 100000 tests there were 3 occurrences of shared birthdays in a set of 30 people.
Probability : 3.0E-5
After 100000 tests there were 4 occurrences of shared birthdays in a set of 40 people.
Probability : 4.0E-5
After 100000 tests there were 5 occurrences of shared birthdays in a set of 50 people.
Probability : 5.0E-5
After 100000 tests there were 6 occurrences of shared birthdays in a set of 60 people.
Probability : 6.0E-5
After 100000 tests there were 7 occurrences of shared birthdays in a set of 70 people.
Probability : 7.0E-5
After 100000 tests there were 8 occurrences of shared birthdays in a set of 80 people.
Probability : 8.0E-5
After 100000 tests there were 9 occurrences of shared birthdays in a set of 90 people.
Probability : 9.0E-5
After 100000 tests there were 10 occurrences of shared birthdays in a set of 100 people.
Probability : 1.0E-4

When I should be getting:

After 100000 tests there were 11446 occurrences of shared birthdays in a set of 10 people.
Probability: 0.11446
After 100000 tests there were 41031 occurrences of shared birthdays in a set of 20 people.
Probability: 0.41031
After 100000 tests there were 70455 occurrences of shared birthdays in a set of 30 people.
Probability: 0.70455
After 100000 tests there were 89061 occurrences of shared birthdays in a set of 40 people.
Probability: 0.89061
After 100000 tests there were 97043 occurrences of shared birthdays in a set of 50 people.
Probability: 0.97043
After 100000 tests there were 99454 occurrences of shared birthdays in a set of 60 people.
Probability: 0.99454
After 100000 tests there were 99926 occurrences of shared birthdays in a set of 70 people.
Probability: 0.99926
After 100000 tests there were 99986 occurrences of shared birthdays in a set of 80 people.
Probability: 0.99986
After 100000 tests there were 99999 occurrences of shared birthdays in a set of 90 people.
Probability: 0.99999
After 100000 tests there were 100000 occurrences of shared birthdays in a set of 100 people.
Probability: 1.0

I am getting very frustrated and can't seem to figure out what I am doing wrong. I believe it has something to do with my loops. Any help would be greatly appreciated!!

import java.util.HashSet;
import java.util.Random;
import java.util.Set;


public class BirthdayParadox
{
public static void main(String [] args)
{
    Random random = new Random();

    int collisionCount = 0;

    for(int people = 10; people <= 100; people += 10)
    {
        Set<Integer> birthdays = new HashSet<>(365);
        birthdays.add(random.nextInt(365));
        for(int runs = 0; runs < 100000; runs++)
        {
            int randomBirthday = random.nextInt(365);

            if(birthdays.contains(randomBirthday))
            {
                collisionCount++;
                break;
            }
            birthdays.add(randomBirthday);

        }
        float probability = (float)collisionCount / 100000;

            System.out.println("After 100000 tests there were " +
                               collisionCount + " occurrences of shared " +
                               " birthdays in a set of " + people + " people.");
            System.out.println("Probability : " + probability);
    }
  }  
 }

Recommended Answers

Quick answer.

On line 30 I see integer division so try 100000.0

Also, print out your variables so you can see what's up.

Jump to Post

Not entirely suprirsed that you got this result. You skip the loop as soon as found a collision

        if(birthdays.contains(randomBirthday))
        {
            collisionCount++;
            break; // break from the test as soon as you found the collision
        }

and to be honest, your logic is a mess. Even …

Jump to Post

All 6 Replies

Quick answer.

On line 30 I see integer division so try 100000.0

Also, print out your variables so you can see what's up.

Changing to 100000.0 gives me
BirthdayParadox.java:30: error: incompatible types: possible lossy conversion from double to float

The whole probability section of my code (the bottom) seems to be working fine. It is something in the loops that I can't seem to get right.

for(int people = 10; people <= 100; people += 10)
    {
        Set<Integer> birthdays = new HashSet<>(365);
        birthdays.add(random.nextInt(365));
        for(int runs = 0; runs < 100000; runs++)
        {
            int randomBirthday = random.nextInt(365);

            if(birthdays.contains(randomBirthday))
            {
                collisionCount++;
                break;
            }
            birthdays.add(randomBirthday);

        }

[edit] ; see invisal's post below for a better explanation of loop problem

100000.0 is a double constant, so the aithmetic is done in double precision, then you assign it to a float and potetially loose precision. Nowdays there's no real reason to use float ever - the precision is rubbish and its no faster than double ona 64 bit processos

Not entirely suprirsed that you got this result. You skip the loop as soon as found a collision

        if(birthdays.contains(randomBirthday))
        {
            collisionCount++;
            break; // break from the test as soon as you found the collision
        }

and to be honest, your logic is a mess. Even after you fix this bug, you will still get the wrong answer. I think what you truly want to do is something like this

  public static void main(String[] args)
  {
       // Do 100,000 tests on 10 persons.
       System.out.println(birthDayExperiment(10, 100000));
  }

  public static float birthDayExperiment(int person, int test)
  {
      int count = 0;
      int j = test;

      while(--j >= 0) {
          Random random = new Random();
          Set<Integer> birthdays = new HashSet<>(365);

          for(int i = 0; i < person; i++) {
             int tmp = random.nextInt(365);
             if (birthdays.contains(tmp)) { count++; break; }
             birthdays.add(tmp);
          } 
      }

      return (float)count / (float)test;
   }

With exactly 365 possibilities, why not keep it really simple and use an array

int[] noOfPeopleWithThisBirthday = new int[365];

loop  number of tries
   noOfPeopleWithThisBirthday[random.nextInt(365)]++

now check array for number of >1 entries etc
commented: Leap year? +6

rproffitt: Is this better?

int[] noOfPeopleWithThisBirthday = new int[366];
loop i=0 to number of tries
   noOfPeopleWithThisBirthday[random.nextInt(
    (i%4==0)?365:366)]++
now check array for number of >1 entries etc

(not valid for people over 115 years old; will cease to be valid in 2100)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of 1.21 million developers, IT pros, digital marketers, and technology enthusiasts learning and sharing knowledge.