There are only 365 days in a year (excluding leap years). Given that there are about 7.4 billion people on Earth, and assuming birthdays are spread evenly across the year, roughly 20 million people share any given birthday: just divide 7,400,000,000 by 365 and you get about 20 million. Happy Birthday to all 20 million people celebrating their birthday today!
Suppose you’re in a crowd: on a bus, in a restaurant, or in a stadium. There is a surprisingly good chance that two of the people around you share a birthday.
In fact, you need only 23 people for about a 50/50 chance that at least two of them share a birthday! This may sound unbelievable, since there are 365 days in a year. How come?
This is called the Birthday Paradox, and it is very important in digital security, especially in password security, where it helps estimate how likely two different inputs are to produce the same hash.
Probability is all about counting the possibilities. Let’s make it simple by using a dice as an example. We all know what a dice looks like.
When a balanced dice is thrown, it can land showing any one of its six faces. We call the result of a throw an outcome, so a dice has 6 possible outcomes. If the dice is balanced, every face is equally likely to show up. We define the probability of a face showing up as the number of times that face occurs among the possible outcomes divided by the total number of possible outcomes. For example, the number “1” occurs only once among the 6 possible outcomes, so the probability of getting a 1 is:

$$P(1) = \frac{1}{6} \approx 0.167$$
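To make the definition concrete, here is a minimal Python sketch that counts outcomes directly. It assumes a balanced dice, so each face is listed exactly once; `faces` and `probability_of_face` are illustrative names, not part of the original article. Using `Fraction` keeps the result exact instead of rounding it.

```python
from fractions import Fraction

# The six faces of a balanced dice; each face is one possible outcome.
faces = [1, 2, 3, 4, 5, 6]

def probability_of_face(face: int) -> Fraction:
    """Number of outcomes showing `face` divided by the total number of outcomes."""
    favorable = sum(1 for outcome in faces if outcome == face)
    return Fraction(favorable, len(faces))

p_one = probability_of_face(1)
print(p_one)  # 1/6
```

The same counting idea (favorable outcomes over all outcomes) carries over unchanged once more dice are added.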
Adding another Dice
Let’s add a second dice. To tell our two dice apart, let’s call one of them Dice A and the other Dice B, and throw them together. When they land, Dice A and Dice B will each show a number. In this scenario, an outcome is the pair of numbers that Dice A and Dice B show when they land. For example, one possible outcome is that Dice A shows a 1 and Dice B shows a 2. We call this outcome 1,2, remembering that the first number is the result of Dice A and the second number is the result of Dice B. We also refer to each outcome as a combination.
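Here are the possible outcomes that the two dice can show. Each row corresponds to one number on Dice A, and each column to one number on Dice B:

1,1   1,2   1,3   1,4   1,5   1,6
2,1   2,2   2,3   2,4   2,5   2,6
3,1   3,2   3,3   3,4   3,5   3,6
4,1   4,2   4,3   4,4   4,5   4,6
5,1   5,2   5,3   5,4   5,5   5,6
6,1   6,2   6,3   6,4   6,5   6,6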
If you count the combinations above, you’ll get 36. It’s 36 because Dice A has 6 different outcomes and Dice B has 6 different outcomes, and multiplying them together gives 6 × 6 = 36.
If you add a third dice, say Dice C, the total number of combinations becomes:

$$6 \times 6 \times 6 = 6^3 = 216$$
In general, for N dice, the total number of combinations is:

$$6^N$$
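The counting rule can be checked by brute force. This Python sketch uses the standard library's `itertools.product` to list every combination for N dice, so the counts can be compared with 6^N; the helper name `all_combinations` is hypothetical, chosen for illustration.

```python
from itertools import product

def all_combinations(num_dice: int, faces: int = 6) -> list:
    """List every combination; position i holds the number shown by the i-th dice."""
    return list(product(range(1, faces + 1), repeat=num_dice))

two = all_combinations(2)
print(len(two))                  # 36
print(two[:3])                   # [(1, 1), (1, 2), (1, 3)]
print(len(all_combinations(3)))  # 216
```

Listing everything is only practical for a few dice; the point is to confirm that the multiplication rule and direct enumeration agree.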
How many combinations have at least 2 equal numbers?
Since each combination has only 2 numbers, this question is the same as “How many combinations show the same number on both dice?” These are the combinations on the diagonal, where Dice A and Dice B show the same number: 1,1, 2,2, 3,3, 4,4, 5,5 and 6,6.
If you count them, you’ll get 6. Therefore, the probability of getting at least two equal numbers in our 2-Dice system is:

$$P(\text{at least 2 equal}) = \frac{6}{36} = \frac{1}{6} \approx 0.167$$
How many combinations show different numbers?
If you count all combinations outside the diagonal, you’ll get 30. Therefore, the probability of getting two different numbers is:

$$P(\text{different}) = \frac{30}{36} = \frac{5}{6} \approx 0.833$$
Notice that the probability of getting at least 2 equal numbers PLUS the probability of getting different numbers equals 1:
6/36 + 30/36 = 36/36 = 1
Knowing one gives you the other
If we know the probability of getting different numbers (30/36), we can compute the probability of getting at least 2 equal numbers simply by subtracting it from 1:

$$P(\text{at least 2 equal}) = 1 - \frac{30}{36} = \frac{6}{36} \approx 0.167$$
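A short Python sketch can split the 36 combinations into the diagonal and everything off it, then confirm that the two probabilities add up to 1, so either one can be obtained from the other by subtracting from 1. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 combinations (Dice A, Dice B) of the 2-Dice system.
combos = list(product(range(1, 7), repeat=2))

same = [c for c in combos if c[0] == c[1]]       # the diagonal
different = [c for c in combos if c[0] != c[1]]  # everything off the diagonal

p_same = Fraction(len(same), len(combos))
p_different = Fraction(len(different), len(combos))

print(len(same), len(different))  # 6 30
print(p_same + p_different)       # 1
print(1 - p_different == p_same)  # True
```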
Avoid counting manually
When we counted the combinations that show different numbers, we counted them by hand. There is another way to count, which is to reason it out. Since we are counting the ways that the 2-Dice system can show different numbers, we start with Dice A and ask how many ways it can land. Since we have not yet thrown Dice B, Dice A is allowed to show any number, so there are 6 possible ways for Dice A to land.
Number of ways Dice A can land = 6
Whatever number Dice A shows, Dice B is not allowed to show it. This means Dice B can only land on one of the 5 other numbers.
Number of ways Dice B can land = 5
Multiplying them gives the number of combinations in which Dice A and Dice B show different numbers:
6*5 = 30
This agrees with our manual counting.
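The two ways of counting can be placed side by side in Python: the multiplication rule (6 choices for Dice A, then 5 for Dice B) and a brute-force count over all pairs. This is a sketch for checking the arithmetic, not a method from the original article.

```python
from itertools import product

# Multiplication rule: Dice A has 6 choices, then Dice B must avoid A's number.
ways_a = 6
ways_b = 5
by_reasoning = ways_a * ways_b

# Brute force: count the (Dice A, Dice B) pairs where the two numbers differ.
by_counting = sum(1 for a, b in product(range(1, 7), repeat=2) if a != b)

print(by_reasoning, by_counting)  # 30 30
```

The multiplication rule is what scales: it needs one multiplication per dice, while brute force has to visit all 6^N combinations.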
At this point, note that the probability of getting at least 2 equal numbers in a 2-Dice system is 0.167. If we add more dice, this probability increases. The question then is:
How many dice do we need to throw so that the probability of at least 2 dice showing the same number is at least 50%?
Our 2-Dice example above shows that the probability of at least 2 dice showing the same number is 0.167, which is less than 50%. Let’s add a third dice and compute the probability.
How to compute the probability?
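Let’s follow the pattern of the 2-Dice system. Dice A can land in 6 ways, Dice B must avoid Dice A’s number (5 ways), and Dice C must avoid both (4 ways), so the number of ways to get all numbers different is:

$$6 \times 5 \times 4 = 120$$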
The total number of combinations of a 3-Dice system is:

$$6^3 = 216$$
Therefore, the probability of getting at least 2 dice with the same number is:

$$P = 1 - \frac{120}{216} = \frac{96}{216} = \frac{4}{9} \approx 0.444$$
This is still less than 50%.
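As a check on the 3-Dice numbers, this Python sketch enumerates all combinations and counts those with no repeated number; it should find 120 all-different combinations out of 216. Treating "no repeats" as `len(set(c)) == num_dice` is an implementation choice for this sketch.

```python
from fractions import Fraction
from itertools import product

num_dice = 3
combos = list(product(range(1, 7), repeat=num_dice))

# A combination has all numbers different exactly when it contains no repeats.
all_different = sum(1 for c in combos if len(set(c)) == num_dice)

p_at_least_two_same = 1 - Fraction(all_different, len(combos))
print(all_different, len(combos))              # 120 216
print(p_at_least_two_same)                     # 4/9
print(round(float(p_at_least_two_same), 3))    # 0.444
```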
Adding a 4th Dice
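Let’s now add a 4th dice and compute the probability using the same pattern:

$$P = 1 - \frac{6 \times 5 \times 4 \times 3}{6^4} = 1 - \frac{360}{1296} = \frac{936}{1296} = \frac{13}{18} \approx 0.722$$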
This is greater than 50%! So we need to throw 4 dice for the probability of at least 2 dice showing the same number to be at least 50%.
The general formula for the probability for a k-Dice system is:

$$P(k) = 1 - \frac{6 \times 5 \times \cdots \times (6 - k + 1)}{6^k}$$
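The general formula translates directly into a small Python function. This is a sketch, and the name `prob_shared_number` is hypothetical; it multiplies 6 × 5 × ... × (6 − k + 1) exactly with `Fraction`, then searches for the smallest number of dice whose probability reaches 50%.

```python
from fractions import Fraction

def prob_shared_number(k: int, faces: int = 6) -> Fraction:
    """P(at least 2 of k dice show the same number), via 1 - P(all different)."""
    if k > faces:
        return Fraction(1)  # more dice than faces: a repeat is guaranteed
    all_different = 1
    for i in range(k):
        all_different *= faces - i  # 6 * 5 * 4 * ... * (6 - k + 1)
    return 1 - Fraction(all_different, faces ** k)

for k in range(1, 7):
    print(k, prob_shared_number(k))

# Smallest number of dice with a probability of at least 50%.
print(next(k for k in range(1, 8) if prob_shared_number(k) >= Fraction(1, 2)))  # 4
```

The early return for k > 6 reflects the fact that 7 or more dice cannot all show different numbers, so the probability is exactly 1.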
How does this relate to the Birthday Problem?
Now that we have the foundations, it’s easy to translate dice to people and numbers to birthdays. In our dice example, there are 6 different numbers (faces) per dice. Translating this to birthdays, each person can have 365 possible birthdays, since there are 365 days in a year (ignoring leap years).
This is the analogy:
Dice -> 6 possible faces
Person -> 365 possible birthdays
We want to compute how many randomly chosen people we need so that the probability of at least two of them having the same birthday is at least 50%. Let k be the number of people. Following the same pattern as the Dice example, and assuming each of the 365 birthdays is equally likely, the formula for the probability, given k people, is:

$$P(k) = 1 - \frac{365 \times 364 \times \cdots \times (365 - k + 1)}{365^k}$$
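The birthday formula is the dice formula with 365 "faces" instead of 6. In this Python sketch, the function name `prob_shared_birthday` and its `days` parameter are illustrative assumptions; setting `days=6` reproduces the dice results. Like the formula, it assumes all birthdays are equally likely and ignores leap years.

```python
from fractions import Fraction

def prob_shared_birthday(k: int, days: int = 365) -> Fraction:
    """P(at least 2 of k people share a birthday), with `days` equally likely birthdays."""
    all_different = Fraction(1)
    for i in range(k):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        all_different *= Fraction(days - i, days)
    return 1 - all_different

print(prob_shared_birthday(2))                    # 1/365
print(round(float(prob_shared_birthday(23)), 4))  # 0.5073
print(prob_shared_birthday(4, days=6))            # 13/18
```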
If we compute starting from k=1 to k=30, we can construct the following table:
k (people)    probability (at least 2 share a birthday)
1     0.000000000
2     0.002739726
3     0.008204166
4     0.016355912
5     0.027135574
6     0.040462484
7     0.056235703
8     0.074335292
9     0.094623834
10    0.116948178
11    0.141141378
12    0.167024789
13    0.194410275
14    0.223102512
15    0.252901320
16    0.283604005
17    0.315007665
18    0.346911418
19    0.379118526
20    0.411438384
21    0.443688335
22    0.475695308
23    0.507297234
24    0.538344258
25    0.568699704
26    0.598240820
27    0.626859282
28    0.654461472
29    0.680968537
30    0.706316243
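The table can be regenerated with a short Python loop. This sketch (the name `birthday_table` is hypothetical) uses ordinary floats, updating the "all different" probability one person at a time, and prints nine decimal places; it then finds the first k whose probability reaches 50%.

```python
def birthday_table(max_k: int = 30, days: int = 365) -> list:
    """Rows of (k, probability that at least 2 of k people share a birthday)."""
    rows = []
    p_all_different = 1.0
    for k in range(1, max_k + 1):
        # The k-th person must avoid the k - 1 birthdays already taken.
        p_all_different *= (days - (k - 1)) / days
        rows.append((k, 1 - p_all_different))
    return rows

for k, p in birthday_table():
    print(f"{k:2d}  {p:.9f}")

first = next(k for k, p in birthday_table() if p >= 0.5)
print(first)  # 23
```

Updating the running product avoids computing 365^k directly, which keeps every intermediate value between 0 and 1.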
Below is a graph of the same data, marking the number of people at which the probability first reaches 50%. With just 23 people, there is already a better than 50% chance (about 50.7%) that at least 2 of them share a birthday!
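As an independent check that is not part of the original derivation, a Monte Carlo simulation can estimate the same probability by drawing random birthdays. This sketch assumes all 365 days are equally likely; `simulate_shared_birthday`, its parameters and the fixed seed are illustrative choices. The estimates vary slightly with the number of trials, but for 23 people they should land close to 0.507.

```python
import random

def simulate_shared_birthday(k: int, trials: int = 50_000, days: int = 365, seed: int = 1) -> float:
    """Estimate P(at least 2 of k people share a birthday) by random sampling."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        if len(set(birthdays)) < k:  # a repeated day means a shared birthday
            hits += 1
    return hits / trials

print(simulate_shared_birthday(22))
print(simulate_shared_birthday(23))
```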