I am trying to use a hash set to do a Monte Carlo analysis of the Birthday Paradox.

Here is my pseudo code:
Set number of collisions to zero
Loop (10 to100 by 10)
Loop:
Generate a birthday. (use 1-365)
See if it is already in the set. If it is count as a collision and end the loop.
Add to the set.
Print out the number of collisions as a probability P.

The code is compiling and running fine, but I am getting completely wrong results:

After 100000 tests there were 1 occurrences of shared birthdays in a set of 10 people.
Probability : 1.0E-5
After 100000 tests there were 2 occurrences of shared birthdays in a set of 20 people.
Probability : 2.0E-5
After 100000 tests there were 3 occurrences of shared birthdays in a set of 30 people.
Probability : 3.0E-5
After 100000 tests there were 4 occurrences of shared birthdays in a set of 40 people.
Probability : 4.0E-5
After 100000 tests there were 5 occurrences of shared birthdays in a set of 50 people.
Probability : 5.0E-5
After 100000 tests there were 6 occurrences of shared birthdays in a set of 60 people.
Probability : 6.0E-5
After 100000 tests there were 7 occurrences of shared birthdays in a set of 70 people.
Probability : 7.0E-5
After 100000 tests there were 8 occurrences of shared birthdays in a set of 80 people.
Probability : 8.0E-5
After 100000 tests there were 9 occurrences of shared birthdays in a set of 90 people.
Probability : 9.0E-5
After 100000 tests there were 10 occurrences of shared birthdays in a set of 100 people.
Probability : 1.0E-4

When I should be getting:

After 100000 tests there were 11446 occurrences of shared birthdays in a set of 10 people.
Probability: 0.11446
After 100000 tests there were 41031 occurrences of shared birthdays in a set of 20 people.
Probability: 0.41031
After 100000 tests there were 70455 occurrences of shared birthdays in a set of 30 people.
Probability: 0.70455
After 100000 tests there were 89061 occurrences of shared birthdays in a set of 40 people.
Probability: 0.89061
After 100000 tests there were 97043 occurrences of shared birthdays in a set of 50 people.
Probability: 0.97043
After 100000 tests there were 99454 occurrences of shared birthdays in a set of 60 people.
Probability: 0.99454
After 100000 tests there were 99926 occurrences of shared birthdays in a set of 70 people.
Probability: 0.99926
After 100000 tests there were 99986 occurrences of shared birthdays in a set of 80 people.
Probability: 0.99986
After 100000 tests there were 99999 occurrences of shared birthdays in a set of 90 people.
Probability: 0.99999
After 100000 tests there were 100000 occurrences of shared birthdays in a set of 100 people.
Probability: 1.0

I am getting very frustrated and can't seem to figure out what I am doing wrong. I believe it has something to do with my loops. Any help would be greatly appreciated!!

import java.util.HashSet;
import java.util.Random;
import java.util.Set;


public class BirthdayParadox
{
public static void main(String [] args)
{
    Random random = new Random();

    int collisionCount = 0;

    for(int people = 10; people <= 100; people += 10)
    {
        Set<Integer> birthdays = new HashSet<>(365);
        birthdays.add(random.nextInt(365));
        for(int runs = 0; runs < 100000; runs++)
        {
            int randomBirthday = random.nextInt(365);

            if(birthdays.contains(randomBirthday))
            {
                collisionCount++;
                break;
            }
            birthdays.add(randomBirthday);

        }
        float probability = (float)collisionCount / 100000;

            System.out.println("After 100000 tests there were " +
                               collisionCount + " occurrences of shared " +
                               " birthdays in a set of " + people + " people.");
            System.out.println("Probability : " + probability);
    }
  }  
 }

Quick answer.

On line 30 I see integer division so try 100000.0

Also, print out your variables so you can see what's up.

Changing to 100000.0 gives me
BirthdayParadox.java:30: error: incompatible types: possible lossy conversion from double to float

The whole probability section of my code (the bottom) seems to be working fine. It is something in the loops that I can't seem to get right.

for(int people = 10; people <= 100; people += 10)
    {
        Set<Integer> birthdays = new HashSet<>(365);
        birthdays.add(random.nextInt(365));
        for(int runs = 0; runs < 100000; runs++)
        {
            int randomBirthday = random.nextInt(365);

            if(birthdays.contains(randomBirthday))
            {
                collisionCount++;
                break;
            }
            birthdays.add(randomBirthday);

        }

[edit] ; see invisal's post below for a better explanation of loop problem

100000.0 is a double constant, so the aithmetic is done in double precision, then you assign it to a float and potetially loose precision. Nowdays there's no real reason to use float ever - the precision is rubbish and its no faster than double ona 64 bit processos

Edited 1 Year Ago by JamesCherrill

Not entirely suprirsed that you got this result. You skip the loop as soon as found a collision

        if(birthdays.contains(randomBirthday))
        {
            collisionCount++;
            break; // break from the test as soon as you found the collision
        }

and to be honest, your logic is a mess. Even after you fix this bug, you will still get the wrong answer. I think what you truly want to do is something like this

  public static void main(String[] args)
  {
       // Do 100,000 tests on 10 persons.
       System.out.println(birthDayExperiment(10, 100000));
  }

  public static float birthDayExperiment(int person, int test)
  {
      int count = 0;
      int j = test;

      while(--j >= 0) {
          Random random = new Random();
          Set<Integer> birthdays = new HashSet<>(365);

          for(int i = 0; i < person; i++) {
             int tmp = random.nextInt(365);
             if (birthdays.contains(tmp)) { count++; break; }
             birthdays.add(tmp);
          } 
      }

      return (float)count / (float)test;
   }

Edited 1 Year Ago by invisal

With exactly 365 possibilities, why not keep it really simple and use an array

int[] noOfPeopleWithThisBirthday = new int[365];

loop  number of tries
   noOfPeopleWithThisBirthday[random.nextInt(365)]++

now check array for number of >1 entries etc
Comments
Leap year?

rproffitt: Is this better?

int[] noOfPeopleWithThisBirthday = new int[366];
loop i=0 to number of tries
   noOfPeopleWithThisBirthday[random.nextInt(
    (i%4==0)?365:366)]++
now check array for number of >1 entries etc

(not valid for people over 115 years old; will cease to be valid in 2100)

This question has already been answered. Start a new discussion instead.