Birthday paradox problem discusses that in a group of randomly chosen people, what is the probability that two or more people will share their birthdays? Wikipedia has got an extensive mathematical treatment of the problem and it surely by itself is worth a read. But somewhat quite un-intuitive result is if there are about 25 people in a room, the probability that at-least two of them will share their birthday is about half. The result might sound very surprising, but is indeed true.
Let's say we want to verify that this result is indeed true? How can we go about it?
What if we run an experiment where we randomly select a group of N people many times and actually find out how many of those times the above particular assertion holds true. Is that solution going to be close to actual mathematical probability?
How do you find birthdays for individuals? It's actually pretty simple if we ignore the complexity associated with leap years, and choose random numbers between 1 and 365, we can call that number as a birthday of a person. So more concretely, let's find out random birthdays of 10 persons using Python.
import random birthdays = [random.randint(1,365) for i in range(10)] print (birthdays)
Let's say we want to run this experiment for 1000 times, so essentially 1000 times, we are randomly finding birthdays of 10 persons. This can be done as follows
import random for i in range(1000): birthdays = [random.randint(1,365) for i in range(10)]
Calculating Empirical Probability
We need to now find out, if we run the experiment large number of times how many of those number of times, we actually have 2 or more birthdays that are same. This problem can be stated as follows, if for N people, if number of unique birthdays is N, that means no two people share the birthday. These become results of a single run. To find out empirical probability, we will have a large number of such runs and find out how many of those throw up duplicate birthdays. This can be done in python simply as follows
import random def are_birthdays_repeated(N): birthdays = [random.randint(1, 365) for i in range(10) unique_birthdays = set(birthdays) if len(unique_birthdays) == len(birthdays): return False return True
Now We can run this experiment for some really large number of runs and calculated unique birthdays in those runs and thus empirical probability.
def calc_empirical_prob(num_of_pepole, runs=1000): repeated_birthdays_runs = 0 for i in range(runs): if are_birthdays_repeated(num_of_people): repeated_birthdays_runs += 1 empirical_probability = repeated_birthdays_runs / runs return empirical_prbability
We can conclude this by finding such probabilities for number of peoples in a room from 2 to 100 and we'll observe the results and see that these probabilities are indeed very close to what are the actual probabilities computed using mathematical formula.
The final code could look like -
import random def are_birthdays_repeated(N): birthdays = [random.randint(1, 365) for i in range(N)] unique_birthdays = set(birthdays) if len(unique_birthdays) == len(birthdays): return False return True def calc_empirical_prob(num_of_people, runs=1000): repeated_birthdays_runs = 0 for i in range(runs): if are_birthdays_repeated(num_of_people): repeated_birthdays_runs += 1 empirical_probability = repeated_birthdays_runs / runs return empirical_probability for i in range(2, 100): print(i, calc_empirical_prob(i,runs=1000)
We have looked at a very simple problem and calculated empirical probabilities. Shown below are results from one such run.