Birthday Paradox and Cracking Passwords

Now that we know the basics of the Birthday Problem, we can use this knowledge to understand the security of password hashing.

In the early days, passwords were stored in the server “as-is”. This means that if your username was juan and your password is Password123! then that information is stored in the server like this:

juan,Password123!

This is information is usually stored in a password file. If this password file is stolen, then it’s easy for another person to use this information and log in using your username and password and impersonate you.

Since the theft of a password file is harder to prevent, the passwords are not anymore stored “as-is” (also known as clear-text). The server will apply an algorithm to the original password which outputs a text called a hash. The algorithm is called a hash function. The hash is what’s put in the password file. A thief in possession of the password file will not be able to know the original password just by looking at it.

For example, the information above will now look like this:

juan,2c103f2c4ed1e59c0b4e2e01821770fa

where “2c103f2c4ed1e59c0b4e2e01821770fa” is the has value of the password “Password123!“.

So when you log in to the server using your password “Password123!”, the server will then run an algorithm that will hash your password and compare the result of the hashing to the one stored in the server, say “2c103f2c4ed1e59c0b4e2e01821770fa”. If they match, then it means that your password was correct and you are given access to the server.

The hash function I’m using is called the MD5 hash function. Given a password, it will produce a hash value. The set of all hash values is not infinite. In fact, the number of possible hash values is 2^{128} for md5. Due to this restriction, the birthday paradox will apply.

How the Birthday Paradox Applies

The birthday paradox tells us that given a hash function f(x), the probability that at least two passwords hash to the same value is given by:

\displaystyle 1-\frac{N\times N-1\times N-2\times \ldots \times N-k+1}{N^k}

Since md5 hash function has N=2^{128} possible values, the probability that two passwords hash to the same value is

\displaystyle 1-\frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k}

We want to compute for k so that this probability is at least 50%.

\displaystyle 1-\frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k} \ge 0.5

which is equivalent to

\displaystyle \frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k} < 0.5

Computing for k when N is large is hard so we need to approximate this. To that end, we need some tools to help us.

We can write the probability in the following way:

\displaystyle 1-\frac{N}{N}\times\frac{N-1}{N}\times\frac{N-2}{N}\times\frac{N-3}{N}\times\ldots\times\frac{N-k+1}{N}
= \displaystyle 1-\frac{N}{N}\times (1-\frac{1}{N})\times (1-\frac{2}{N})\times (1-\frac{3}{N}) \times\ldots\times (1-\frac{k-1}{N})

Since N is large, the quantities

\displaystyle \frac{1}{N}, \frac{2}{N}, \frac{3}{N}, \frac{k-1}{N}

are very small. Because of this, we can use the approximation

e^{-x} \approx 1-x

The above approximation comes from the Taylor expansion of e^{-x}:

\displaystyle e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \frac{x^4}{4!} \ldots

If x is small, the higher order terms like x^2, x^3, x^4, \ldots vanish. Using this approximation, we can write the probability as:

\displaystyle \frac{N}{N}\times (1-\frac{1}{N})\times (1-\frac{2}{N})\times (1-\frac{3}{N}) \times\ldots\times (1-\frac{k-1}{N})

\displaystyle = e^{-\frac{1}{N}}\cdot e^{-\frac{2}{N}}\cdot e^{-\frac{3}{N}}\cdot \ldots\cdot e^{-\frac{k-1}{N}}

\displaystyle = e^{-\frac{1+2+3+4+\ldots + k-1}{N}}

Since

\displaystyle \sum_1^n j = 1+2+3+4+ \ldots + n = \frac{n(n+1)}{2}

we have

e^{-\frac{1+2+3+4+\ldots + k-1}{N}} = e^{-k\cdot (k-1)/2N }

Computing for k

Let’s compute k so that

\displaystyle e^{-k\cdot (k-1)/2N} < 0.5

Taking the natural logarithms of both sides

\displaystyle \ln e^{-k\cdot (k-1)/2N} < \ln 0.5

\displaystyle \frac{-k\cdot (k-1)}{2N} < \ln 0.5

\displaystyle k^2 - k + 2N\ln 0.5 > 0

Using the quadratic equation, we can solve for k:

\displaystyle k > \frac{-(-1) \pm \sqrt{(-1)^2 -4(1)(2N\ln 0.5}}{2}
\displaystyle k > \frac{1 \pm \sqrt{1-8N\ln 0.5}}{2}

When N=2^{128}, we have

\displaystyle k > \frac{1 \pm 4.343876e+19}{2} \approx 10^{19}

This is about 10 quintillion. What this means is that when k > 10^{19}, there is already a 50% chance that 2 passwords hash to the same value. In fact, the md5 was already cracked in 2004.

Advertisements

Published by

Bobby Corpus

Loves anything related to Mathematics, Physics, Computing and Economics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s