Quantum Searching

Imagine a shuffled deck of 52 cards, and you are asked to find the ace of spades. The most natural thing to do is to take the top most card, see if it’s the ace of spades. If not, put it aside, take the top most and repeat the process until you find the ace of spades. If you are lucky, the ace of spades is the top most then we’re done. If you’re not so lucky, the ace of spades is at the bottom and it will take you 52 peeks before you find the card.

If we scatter the cards face down on the floor and randomly pick a card, then on the average, we will need 52/2 peeks before we can find the ace of spades.

With quantum computing we can do even better!

For the sake of demonstration, let’s say we have 8 cards as shown in the figure below. We want to find the ace of spades.

Here are the steps:

1. Label each card from 0 to 7 in random order. The positions of the cards will represent the states of cards. The state \left|6\right\rangle will therefore represent the ace of spades.

2. Let \left|\phi\right\rangle be the superposition of states:

\displaystyle \left|\phi\right\rangle =  \frac{1}{2^{n/2}} \sum_{k=0}^{2^n-1} \left|x\right\rangle = \frac{1}{2^{n/2}} \Big( \left|0\right\rangle + \ldots + \left|7\right\rangle \Big) = \frac{1}{\sqrt{8}} \left[ \begin{array}{c}  1\\  1\\  1\\  1\\  1\\  1\\  1\\  1  \end{array} \right]

3. Let f be a function such that

f(x) = \begin{cases}  1 & x \text{ corresponds to ace of spades}\\  0 & \text{ otherwise}  \end{cases}

Define the operator \mathbf{V}

\mathbf{V} = \mathbf{I} - 2\left|a\right\rangle \left\langle a\right|

where a is the unique value where

f(a) = 1

In our example, the value of a is 6. Therefore,

\mathbf{V} = \begin{bmatrix}    1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\     0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\     0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\     0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 \\     0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 \\     0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 \\     0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & -1.00 & 0.00 \\     0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 \\     \end{bmatrix}

Observe that the matrix element \mathbf{V}_{6,6} = -1

4. Define the operator W:

\mathbf{W} = 2\left|\phi\right\rangle \left\langle\phi\right| - \mathbf{I} = \begin{bmatrix}    -0.75 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 \\     0.25 & -0.75 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 \\     0.25 & 0.25 & -0.75 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & -0.75 & 0.25 & 0.25 & 0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & -0.75 & 0.25 & 0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.75 & 0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.75 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.75 \\     \end{bmatrix}

5. Compute the operator \mathbf{WV}:

\mathbf{WV} = \begin{bmatrix}    -0.75 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.25 & 0.25 \\     0.25 & -0.75 & 0.25 & 0.25 & 0.25 & 0.25 & -0.25 & 0.25 \\     0.25 & 0.25 & -0.75 & 0.25 & 0.25 & 0.25 & -0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & -0.75 & 0.25 & 0.25 & -0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & -0.75 & 0.25 & -0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.75 & -0.25 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.75 & 0.25 \\     0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & -0.25 & -0.75 \\     \end{bmatrix}

6. Multiply \mathbf{WV} to \left|\phi\right\rangle a number of times, about \pi/4 \cdot 2^{n/2} = \pi/4 \cdot \sqrt{8} = 2.2 \approx 2:

\mathbf{WV}\left|\phi\right\rangle = \begin{bmatrix}    0.18 \\     0.18 \\     0.18 \\     0.18 \\     0.18 \\     0.18 \\     0.88 \\     0.18 \\     \end{bmatrix}, \mathbf{(WV)^2}\left|\phi\right\rangle = \begin{bmatrix}    -0.09 \\     -0.09 \\     -0.09 \\     -0.09 \\     -0.09 \\     -0.09 \\     0.97 \\     -0.09 \\     \end{bmatrix}

As you can see, the probability of getting the state \left|6\right\rangle becomes very close to 1.

Making a measurement of the input register at this point will give us the state \left|6\right\rangle with a probability very close to 1.

We have just demonstrated the quantum search algorithm!

Why it works

Let f be a function such that

f(x) = \begin{cases}  1 & x \text{ the item we are looking for}\\  0 & \text{ otherwise}  \end{cases}

Define the operator \mathbf{U}_f whose action on an n qubit register and 1 qubit output register is

\mathbf{U}_f(\left|x\right\rangle \otimes \left|y\right\rangle) = \left|x\right\rangle\otimes\left|y\oplus f(x)\right\rangle

Prepare the n qubit input register and 1 qubit output register in the following state:

\underbrace{\left|0\right\rangle\ldots \left|0\right\rangle}_{\text{n times}} \otimes \left|1\right\rangle

Applying the Hadamard operator on the input and output qubits gives us a superposition of N=2^n states:

\begin{array}{rl}  \displaystyle \mathbf{H}^{\otimes n}\otimes\mathbf{H}(\left|0\ldots 0\right\rangle\otimes \left|1\right\rangle )&= \mathbf{H}^{\otimes n}\left|0\ldots 0\right\rangle\otimes \mathbf{H}\left|1\right\rangle\\  &= \displaystyle \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes \frac{1}{\sqrt{2}} \left(\left|0\right\rangle - \left|1\right\rangle\right)    \end{array}

Next apply the operator \mathbf{U}_f to get

\begin{array}{rl}  \displaystyle \mathbf{U}_f \left[\frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes \frac{1}{\sqrt{2}} \left(\left|0\right\rangle - \left|1\right\rangle\right)\right] &= \displaystyle \frac{1}{\sqrt{2}} \frac{1}{2^{n/2}}\left[\sum_{x=0}^{2^n-1} \mathbf{U}_f \left|x\right\rangle \otimes  \left|0\right\rangle - \sum_{x=0}^{2^n-1} \mathbf{U}_f \left|x\right\rangle \otimes \left|1\right\rangle\right]  \\  &= \displaystyle \frac{1}{\sqrt{2}} \frac{1}{2^{n/2}}\left[\underbrace{\sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes  \left|0\oplus f(x) \right\rangle }_{\text{first underbrace}}  - \underbrace{\sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes \left|1\oplus f(x) \right\rangle}_{\text{second underbrace}}\right]  \end{array}

When f(x) = 1 for x=a, the first underbrace will expand to

\displaystyle \sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes  \left|0\oplus f(x) \right\rangle = \left|0\right\rangle\otimes\left|0\right\rangle + \ldots + \underbrace{\left|a\right\rangle\otimes\left|1\right\rangle} + \ldots + \left|2^n-1\right\rangle \otimes \left|0\right\rangle

The second underbrace will expand to

\displaystyle \sum_{x=0}^{2^n-1} \left|x\right\rangle \otimes  \left|1\oplus f(x) \right\rangle = \left|0\right\rangle\otimes\left|1\right\rangle + \ldots + \underbrace{\left|a\right\rangle\otimes\left|0\right\rangle} + \ldots + \left|2^n-1\right\rangle \otimes \left|1\right\rangle\\

The underbraces in the above summations can be swapped so that we get

\displaystyle \frac{1}{\sqrt{2}} \frac{1}{2^{n/2}} \left[ \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle \otimes  \left|0\right\rangle - \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle \otimes \left|1\right\rangle\right]

\begin{array}{rl}  &= \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle \otimes \frac{1}{\sqrt{2}} \left( \left|0\right\rangle - \left|1\right\rangle\right)\\  &= \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle \otimes \mathbf{H}\left|1\right\rangle  \end{array}

This means that the action of \mathbf{U}_f leaves the output qubit unentangled with the input qubit. We can just ignore this moving forward and focus our attention to the input qubits.

We view the current state of the input qubit as the result of some operator \mathbf{V} defined by

\mathbf{V} \left|\phi\right\rangle = \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle

where

\left|\phi\right\rangle = \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} \left|x\right\rangle

We can derive the expression of \mathbf{V} by expanding \mathbf{V} \left|\phi\right\rangle, noting that at x=a, f(x) = 1:

\begin{array}{rl}  \mathbf{V} \left|\phi\right\rangle &= \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} (-1)^{f(x)} \left|x\right\rangle\\  &= \displaystyle \frac{1}{2^{n/2}} \left[\left|0\right\rangle + \ldots - \left|a\right\rangle +\ldots + \left|2^n-1\right\rangle\right]\\  &= \displaystyle \frac{1}{2^{n/2}} \left[\left|0\right\rangle + \ldots + \left|a\right\rangle +\ldots + \left|2^n-1\right\rangle - 2\left|a\right\rangle\right]\\  &= \displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} \left|x\right\rangle - \frac{2}{2^{n/2}} \left|a\right\rangle\\  &= \left|\phi\right\rangle - 2\left\langle a|\phi\right\rangle \left|a\right\rangle\\  &= \Big[\mathbf{I} - 2\left|a\right\rangle \left\langle a\right|\Big] \left|\phi\right\rangle\\  \end{array}

which means

\mathbf{V} = \mathbf{I} - 2\left|a\right\rangle \left\langle a\right|

The two vectors \left|\phi\right\rangle and \left|a\right\rangle determine a plane P. Let \left|a_{\perp}\right\rangle be the vector in the plane perpendicular to \left|a\right\rangle. We can write \left|\phi\right\rangle as

\left|\phi\right\rangle = \phi_1 \left|a_\perp\right\rangle + \phi_2\left|a\right\rangle

Applying \mathbf{V} to \left|\phi\right\rangle gives us:

\begin{array}{rl}  \mathbf{V}\left|\phi\right\rangle &= \left(\mathbf{I} - 2\left|a\right\rangle \left\langle a\right| \right) \left(  \phi_1 \left|a_\perp\right\rangle + \phi_2\left|a\right\rangle \right)\\  &= \phi_1 \left|a_\perp\right\rangle + \phi_2\left|a\right\rangle - 2\phi_2\left|a\right\rangle\\  &= \phi_1 \left|a_\perp\right\rangle - \phi_2\left|a\right\rangle  \end{array}

The effect of the operator \mathbf{V} is therefore to reflect the vector \left|\phi\right\rangle with respect to the vector \left|a_\perp\right\rangle.

Now, we want to reflect this vector \mathbf{V}\left|\phi\right\rangle with respect to \left|\phi\right\rangle. To accomplish this, we define the operator \mathbf{W} by

\mathbf{W} = 2\left|\phi\right\rangle \left\langle\phi\right| - \mathbf{I}

Apply this operator to \mathbf{V}\left|\phi\right\rangle:

\begin{array}{rl}  \mathbf{WV}\left|\phi\right\rangle &= \Big[ 2\left|\phi\right\rangle \left\langle\phi\right| - \mathbf{I} \Big]\mathbf{V}\left|\phi\right\rangle\\  &= 2\left|\phi\right\rangle \left\langle\phi\right|\mathbf{V}\left|\phi\right\rangle -\mathbf{V}\left|\phi\right\rangle  \end{array}

If we express \mathbf{V}\left|\phi\right\rangle as a linear combination of \left|\phi\right\rangle and a vector \left|\phi_\perp\right\rangle in the plane P,

\mathbf{V}\left|\phi\right\rangle = \alpha \left|\phi\right\rangle + \beta \left|\phi_\perp\right\rangle

We have,

\begin{array}{rl}  \mathbf{WV}\left|\phi\right\rangle &= \Big[ 2\left|\phi\right\rangle \left\langle\phi\right| - \mathbf{I} \Big]\mathbf{V}\left|\phi\right\rangle\\  &= 2\left|\phi\right\rangle \left\langle\phi\right|\mathbf{V}\left|\phi\right\rangle -\mathbf{V}\left|\phi\right\rangle\\  &= 2\alpha\left|\phi\right\rangle  - \alpha \left|\phi\right\rangle - \beta \left|\phi_\perp\right\rangle\\  &= \alpha\left|\phi\right\rangle  - \beta \left|\phi_\perp\right\rangle  \end{array}

which demonstrates that \mathbf{W} reflects \mathbf{V}\left|\phi\right\rangle with respect to \left|\phi\right\rangle.

Therefore, the effect of \mathbf{WV} on \left|\phi\right\rangle is to rotate it by an angle \gamma counter-clockwise.

We can compute this \gamma by getting the inner product of \mathbf{WV}\left|\phi\right\rangle and \left|\phi\right\rangle. First, let’s find the expression of \mathbf{WV}\left|\phi\right\rangle:

\begin{array}{rl}  \mathbf{WV}\left|\phi\right\rangle &= \mathbf{W} \left( \mathbf{I} - 2\left|a\right\rangle \langle a |\right) \left|\phi\right\rangle\\  &= \mathbf{W} \left(\left|\phi\right\rangle - 2 \underbrace{\langle a| \phi \rangle} \left|a\right\rangle \right)  \end{array}

The quantity \langle a| \phi \rangle is

\langle a| \phi \rangle = \displaystyle \langle a| \left( \frac{1}{2^{n/2}} \sum_{k=0}^{2^n-1} \left|x\right\rangle \right) = \frac{1}{2^{n/2}} = \cos \theta

where \theta is the angle between \left|a\right\rangle and \left|\phi\right\rangle.

The complementary angle, \rho = 90-\theta is the angle between \left|\phi\right\rangle and \left|a_\perp\right\rangle. Using a well-known trigonometric identity, we can compute for the angle of \rho:

\cos \theta = \sin \rho = \displaystyle \frac{1}{2^{n/2}}

Since \displaystyle \frac{1}{2^{n/2}} is very small if n is large,

\sin \rho = \displaystyle \frac{1}{2^{n/2}} \approx \rho

Continuing,

\begin{array}{rl}  \mathbf{WV}\left|\phi\right\rangle &= \mathbf{W} \left( \mathbf{I} - 2\left|a\right\rangle \langle a |\right) \left|\phi\right\rangle\\  &= \mathbf{W} \left(\left|\phi\right\rangle - 2 \cos\rho \left|a\right\rangle \right)\\  &= (2\left|\phi\right\rangle \langle \phi| - \mathbf{I}) \left(\left|\phi\right\rangle - 2 \cos\rho \left|a\right\rangle \right)\\  &= 2\left|\phi\right\rangle - \left|\phi\right\rangle - 2\cos\rho \left|\phi\right\rangle \langle \phi |a\rangle + 2 \cos\rho \left|a\right\rangle\\  &= \left|\phi\right\rangle - 4\cos^2\rho \left|\phi\right\rangle + 2 \cos\rho \left|a\right\rangle\\  &= (1- 4\cos^2\rho) \left|\phi\right\rangle + 2 \cos\rho \left|a\right\rangle  \end{array}

The inner product of \mathbf{WV}\left|\phi\right\rangle and \left|\phi\right\rangle is given by

\begin{array}{rl}  \langle\phi|\mathbf{WV}\left|\phi\right\rangle &= \langle\phi|\left[  (1- 4\cos^2\rho) \left|\phi\right\rangle + 2 \cos\rho \left|a\right\rangle\right]\\  &= (1- 4\cos^2\rho) + 2\cos^2\rho\\  &= 1- 2\cos^2\rho\\  &= \cos 2\rho  \end{array}

Therefore, the angle between these two vectors is 2\rho = \displaystyle \frac{1}{2^{n/2}}. How many times do we have to apply the operator \mathbf{WV} to get to \pi/2?

\begin{array}{rl}  m\cdot 2\rho &= \displaystyle \frac{\pi}{2}\\  \displaystyle m \frac{2}{2^{n/2}}&= \displaystyle \frac{\pi}{2}\\  m &=\displaystyle \frac{\pi\cdot 2^{n/2}}{4}  \end{array}

Therefore, we need to apply the operator \mathbf{WV} O(\sqrt{N}) times to find our value (where N=2^n).

Advertisements

Period Finding and the RSA

In the previous post, we learned how to decrypt RSA by getting the factors of the big number N and computing for the inverse of e (the encoding number) modulo N. There is also another way to decrypt an RSA encrypted message. This is when you are able to get the period of the ciphertext. If c is the ciphertext, the period r is the smallest integer that satisfies:

c^r \equiv 1 \mod N

Once we get the period, we compute for d^\prime , the inverse of e modulo r:

ed^\prime \equiv 1 \mod r

The inverse can then be used to decrypt the ciphertext:

m=c^{d^\prime}

In our previous example, we encrypted the message

THIS IS A SECRET MESSAGE

using public key p=53, q=59, N=pq=3127 and e=7 and private key d=431. The “plain text” is

1907 0818 2608 1826 0026 1804 0217 0419 2612 0418 1800 0604

and the ciphertext is:

0794 1832 1403 2474 1231 1453 0268 2223 0678 0540 0773 1095

Let’s compute the period of the first block of our ciphertext:

0794^r \equiv 1 \mod 3127

Using the python script below, we can compute the period

for r in range(1,100):
   p=pow(794,r,N)
   if p == 1:
     print "%d %d" % (r,p)

The result of running the above program gives r=58. We can then compute d^\prime using the following equation:

ed^\prime = m\times r + 1

The above equation is satisfied when m=3 and d^\prime = 25 . Using this value of d^\prime , we can compute for

\begin{array}{rl}  m&=0794^{25} \mod 3127 \\  &= 1907  \end{array}

which gives us the original message!

However, unlike using the private key, you need to compute the period r and d^\prime for every block of the ciphertext (unless the ciphertext is composed of only one block). However, that should not stop a cracker from deciphering all the blocks.

How RSA Encryption Works

Alice and Bob live in different parts of the world. They want to communicate with each other but they don’t want anyone to know the messages they exchange. In order to protect the message, they need to encrypt their message. Bob comes up with a secret key that will allow both of them to encrypt and decrypt their messages so that when they send it via email they will be confident that no one can read the message if ever someone (like Eve) intercepted the message. However, they have a dilemma. How will Bob send the encryption key to Alice ? If he sends the key and Eve intercepts it, then Eve will be able to decrypt the messages both Alice and Bob exchange and know what they are up to.

Bob should be able to send the key to Alice encrypted so that Eve will not be able to read it. In order to do this, Bob will have to create a second key to encrypt the key he wants to send to Alice. The problem now is how will Alice decrypt the message (which is the encrypted key) if she does not have the second key?

This is where RSA encryption is used. Suppose we want to encrypt a message using RSA, what we’ll do is find 2 large prime numbers p and q and get their product N = pq. We will need another number e, which we will use to encode the message into a ciphertext. The set of numbers N and e is called the public key which Alice can send to Bob via email. Bob will use these numbers to encrypt the secret key before sending to Alice.

We will represent a textual message like “THIS IS A SECRET MESSAGE” into numbers. To accomplish this, we need to map letters into numbers like the following:

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}  \hline  A & B & C & D & E & F & G & H & I & J & K & L & M & N & O & P & Q & R & S & T & U & V & W & X & Y & Z & \\\hline  0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10& 11& 12& 13& 14& 15& 16& 17& 18& 19& 20& 21& 22& 23& 24& 25& 26\\  \hline  \end{tabular}

Using the above mapping, we can write ‘THIS IS A SECRET MESSAGE’ as:

19 7 8 18 26 8 18 26 0 26 18 4 2 17 4 19 26 12 4 18 18 0 6 4

If a number is less than 10, we pad it with a zero to the left. The message then becomes:

19 07 08 18 26 08 18 26 00 26 18 04 02 17 04 19 26 12 04 18 18 00 06 04

To conserve some space, we can group the numbers into groups of 4:

1907 0818 2608 1826 0026 1804 0217 0419 2612 0418 1800 0604

Now for each M number above, we will encode it using the formula:

\displaystyle C = M^e \mod N

What is this mod operation? The above says that we raise M to the exponent e, divide the result by N and get the remainder. For example, if M=10, e=7 and N=17 we have

10^7=10000000

Now divide the result by 17 and get the remainder:

10000000 / 7 = 588235 \text { Remainder } 5

Therefore

10^7 \mod 17 = 5

In python we can get the answer using the pow function:

>>> pow(10,7,17)
5

Let’s say we choose p = 53, q=59 and e = 7. This gives us N=pq = 53\times 59 = 3127. To encode 1907, we do

\displaystyle C=1907^7 \mod 3127 = 0794

The number 0794 is now the ciphertext. It is the number we give to the recipient of the message. We can use python to generate the ciphertext above.

for n in ("1907","0818","2608","1826","0026","1804","0217","0419","2612","0418","1800","0604"):
   print pow(int(n),7,3127)                                                                

Doing this for all numbers we get:

0794 1832 1403 2474 1231 1453 0268 2223 0678 0540 0773 1095

When the recipient gets this message, she can decipher it using a key which she keeps private to herself. The key d is the inverse of e modulo (p-1)(q-1). The key d is called the private key. We can retrieve the original message using the formula:

\displaystyle M = C^d

The number d can be calculated using the following formula:

ed \mod (p-1)(q-1) \equiv 1

Using python we can compute for d using the following program:

>>> p=53
>>> q=59
>>> e=7
>>> NN = (p-1)*(q-1)
>>> for d in range(1,NN):
...   x = e*d % NN
...   if x == 1:
...     print 'd = ', d
... 
d =  431

Using d = 431 and applying the decipher formula to the first block, we get

\displaystyle M = 0794^{431} \mod 3127 = 1907

which is our original message!

We now apply this to the entire ciphertext

0794 1832 1403 2474 1231 1453 0268 2223 0678 0540 0773 1095

using the python program below:

for n in ("0794","1832","1403","2474","1231","1453","0268","2223","0678","0540","0773","1095"):
   print pow(int(n),431,N)

we get

1907 0818 2608 1826 0026 1804 0217 0419 2612 0418 1800 0604

which is our original full message! It’s just a matter of mapping these numbers back to letters to get the message text.

Using this mechanism, Alice will send 2 numbers N and e to Bob which he will use to encrypt the secret key and send to Alice. When Alice receives the encrypted secret key, she will use her private key d to decrypt it and get the secret key. After that, they can now start using the secret key to encrypt messages between them.

An Interview Question: Using Integer Programming

We can solve the Interview Question using a mathematical technique called Integer Programming. Let d_1, d_2, \ldots, d_N be the variables representing diskette 1, diskette 2, diskette 3, etc. The values of the d_k variables can only be 0 or 1. A 0 means the diskette is not used while a 1 means that it is used.

Each file is saved to a certain diskette. We want to know to what diskette d_i a given file f_j is assigned. To represent this, we assign the variable a_{ij} a value of 1 if file f_j is assigned to diskette d_i.

We will normalize the file sizes so that if s_i is the size of f_i, the s_i \le 1. We do this by simply dividing all file sizes by the size of the diskette. For a given diskette d_i, the following constraint should be satisfied:

d_i - s_1a_{i1} - s_2a_{i2} - \ldots - s_N a_{iN} \ge 0

for diskette i = 1, 2, \ldots, N and s_i are the normalized file sizes of file f_i for i=1,2,\ldots,N.

Since each file f_j can only be assigned to one diskette, we have the following constraint:

a_{1j} + a_{2j} + \ldots + a_{Nj} = 1

where a_{1j} is the variable representing the “file f_j is in diskette d_1“, etc.

Finally, we have to constrain the value of d_i to be either 0 or 1, that is,

d_i \le 1

for all i=1,2,\ldots,N.

Integer Programming Formulation

Given the above information, we can formulate the Integer Programming problem as

Minimize:

d_1 + d_2 + d_3 + \ldots + d_N

subject to

\begin{array}{rl}  d_1 - s_1a_{11} - s_2a_{12} - s_3a_{13} - \ldots - s_Na_{1N} &\ge 0\\  d_2 - s_1a_{21} - s_2a_{22} - s_3a_{23} - \ldots - s_Na_{2N} &\ge 0\\  :\\  d_N - s_1a_{N1} - s_2a_{N2} - s_3a_{N3} - \ldots - s_Na_{NN} &\ge 0\\  a_{11} + a_{21} + a_{31} + \ldots + a_{N1} &= 1\\  a_{12} + a_{22} + a_{32} + \ldots + a_{N2} &= 1\\  :\\  a_{1N} + a_{2N} + a_{3N} + \ldots + a_{NN} &= 1\\  d_1 &\le 1\\  d_2 &\le 1\\  :\\  d_n &\le 1  \end{array}

Solving the Problem

We will use R to solve this Integer Programming Formulation. Please see code below:

library("lpSolve")
NUMFILES=4

# Generate random file sizes between 1 and 10
FileSizes=ceiling(10*runif(NUMFILES))
x = -1*FileSizes/10
l=length(x)

# Each files can be in any of the diskettes. Suppose there are N files,
# to determine if a file j is in diskette i, the value of variable x_ij will
# 1 if file j is in diskette i, and 0 otherwise.
# Here we construct the coefficients of variables x_ij which are the
# sizes of the files (normalized to 1)
zz=c()
for(i in 1:(l-1)){
  zz=c(zz,x,rep(0,l*l))
}
zz=c(zz,x)

# Construct the coefficients of the indicator variables representing the 
# diskettes d_i
zzmatrix=matrix(zz,ncol=l*l,byrow=T)
CoefficientsOfDiskettes=c();
for(i in 1:l){
 ttt=rep(0,l)
 ttt[i] = 1
 CoefficientsOfDiskettes= c(CoefficientsOfDiskettes,ttt,zzmatrix[i,])
}

# Construct the coefficients of x_ij for constant j. These variables 
# satisfy the equation \sum_{i=1}^N x_{ij}
SumOfFileAcrossDiskettes=c()
for(i in 1:l){
  ttt=rep(0,l)
  ttt[i]=1
  SumOfFileAcrossDiskettes=c(SumOfFileAcrossDiskettes,rep(ttt,l))
}

# Prepend Coefficients of variables d_i. The value of these coefficients is 0. 
SumOfFileAcrossDiskettesMatrix=matrix(SumOfFileAcrossDiskettes,ncol=l*l,byrow=T)
PrependCoefficientsOfDiskettes=c()
for(i in 1:l){
 PrependCoefficientsOfDiskettes=c(PrependCoefficientsOfDiskettes,c(rep(0,l),SumOfFileAcrossDiskettesMatrix[i,]))
}

# Construct coefficients of d_i to construct constraint d_i <= 1
DisketteConstraints=c()
for(i in 1:l){
 ttt=rep(0,l)
 ttt[i]=1
 DisketteConstraints=c(DisketteConstraints,ttt,rep(0,l*l))
}

# Construct matrix input of lpSolve
const.mat=matrix(c(CoefficientsOfDiskettes,PrependCoefficientsOfDiskettes,DisketteConstraints),ncol=l*(l+1),byrow=T)

print("Matrix Coefficients:")
print(const.mat)

# Construct inequalities/equalities
const.dir=c(rep(">=",l),rep("=",l),rep("<=",l))

# Construct Right-Hand side
const.rhs=c(rep(0,l),rep(1,l),rep(1,l))

# Construct Objective Function
objective.in=c(rep(1,l),rep(0,l*l))

# Invoke lpSolve
mylp=lp(direction="min",objective.in=objective.in,const.mat=const.mat,const.dir=const.dir,const.rhs=const.rhs,all.int=T)

# Print Results
print(paste("Number of Diskettes: ", sum(mylp$solution[1:l])))
tz=matrix(mylp$solution,ncol=l,byrow=T)
print("File Sizes: ")
print(FileSizes)
for(i in 2:(l+1)){
   files = which(tz[i,] == 1)
   if(length(files) > 0){
      print(paste("Files in diskette ", i-1))
      print(files)
   }
}

Most of the code above is setting up the matrix of coefficients. The line 70 then calls on lpSolve to compute the optimal values of the variables

Program Output

Running this code we get the output

[1] "Matrix Coefficients:"
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
 [1,]    1    0    0    0   -1 -0.2 -0.1 -0.1    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
 [2,]    0    1    0    0    0  0.0  0.0  0.0   -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0     0   0.0   0.0   0.0
 [3,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0
 [4,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1
 [5,]    0    0    0    0    1  0.0  0.0  0.0    1   0.0   0.0   0.0     1   0.0   0.0   0.0     1   0.0   0.0   0.0
 [6,]    0    0    0    0    0  1.0  0.0  0.0    0   1.0   0.0   0.0     0   1.0   0.0   0.0     0   1.0   0.0   0.0
 [7,]    0    0    0    0    0  0.0  1.0  0.0    0   0.0   1.0   0.0     0   0.0   1.0   0.0     0   0.0   1.0   0.0
 [8,]    0    0    0    0    0  0.0  0.0  1.0    0   0.0   0.0   1.0     0   0.0   0.0   1.0     0   0.0   0.0   1.0
 [9,]    1    0    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[10,]    0    1    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[11,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[12,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[1] "Number of Diskettes:  2"
[1] "File Sizes: "
[1] 10  2  1  1
[1] "Files in diskette  1"
[1] 2 3 4
[1] "Files in diskette  2"
[1] 1

Interpreting the Result

Lines 2-14 of the output gives you the matrix of coefficients. Line 15 prints the number of diskettes needed to store the files. Line 17 prints the randomly generated file sizes from 1 to 10. Finally lines 18-21 prints which diskettes contain which files.

The space complexity of this solution is quite substantial. Given N files, we need to specify N^2 + N variables by 3\times N equations for a total of (N^2 + N)\times 3N memory space for coefficients.

An Interview Question

I was given this interview question and I’d like to share it to you. The setting is back in the days when the largest size of a hard disk was 1 GB and there were no CD writers yet and the only way to back up your data is through those 1.44 floppy disks. You want to back up your files but you want to minimize the number of floppy disks you need to use. Assume your most important files are in a single directory. How will you distribute the files across your disks in such a way that the number of disks you use is minimized ?

Click here to see the Integer Programming Solution.

To make this simple, let’s assume the following:

– we will not take into account that every file you copy to the disk has a record of the metadata of the file and stored on the disk as well. This will eat up space as you put more files. For our purposes, we ignore this complexity.
– The size of each file is less than or equal to 1.44

First we need to have a list of those files including the size and sort the list according to size in descending order. If A is the list of files, we can apply this algorithm:

B := list of files to copy to current floppy disk
remaining_size := 1.44 MB
For file in A:
    If remaining_size - file.size > 0:
        B.add(file)
        A.remove(file)
Copy all files listed in B to disk
Empty B
Repeat process for remaining files in A

Although there are other better algorithms than the one above, this is the one I managed to to come up during the interview.

We now need to determine how fast our algorithm can run.

Worst Case Complexity

How slow can this algorithm get ? If for any two files F_i and F_j in A we have F_i + F_j > 1.44, then all files will have their own diskette. If this is the case, for each file, our algorithm will execute step 4. For the first disk, it will execute the step N times. For the second disk, it will execute the step N-1 times, for the third disk it will execute N-2 times, etc. the total number of times it executes step 4 is the total number of comparisons and is equal to the summation:

\displaystyle \sum_{1=1}^{N} i

which is equal to

\displaystyle \frac{N(N+1)}{2}

Therefore, in the worst case, the complexity is O(N^2).

Best Case Complexity

The best case is when all files fit in just one diskette. For this, the total number of comparisons is N

Average Case Complexity

On the average, files have different sizes. We now compute the complexity on the assumption that the probability distribution is uniform.

If k is the number of diskettes, the number of comparisons is a sequence of monotonic decreasing numbers \{ a_1, a_2, a_3, \ldots, a_k \} taken at random from the set \{ 1, 2, \ldots, N\}. Each of the numbers a_j, j\in \{1, 2, \ldots, k\} has a probability 1/N of being chosen. Let X be a random variable such that

\displaystyle Pr(X=a_j) = \frac{1}{N} \text{ for } j=1,2,\ldots,k

then the number of comparisons C is equal to

\displaystyle C = \sum_{i=1}^k X = kX

The expected value of C is given by the

E[C] = E[kX] = kE[X]

However, the expected value of X is given by

\displaystyle E[X] = \sum_{j=1}^N j\cdot Pr(X=j) = \frac{1}{N} \sum_{j=1}^N j = \frac{1}{N}\frac{N(N+1)}{2} = \frac{N+1}{2}

Therefore,

\displaystyle E[C] = k\frac{N+1}{2}

What remains is to determine the average value of k, which is the number of diskettes. If M=1.44 is the maximum file size, the average file size is M/2. The average total file size is then NM/2. The average number of diskettes is equal to the average total size divided by size of diskette, that is

k = \displaystyle \frac{NM}{2}\frac{1}{M} = \frac{N}{2}

This means that

\displaystyle E[C] = \frac{N}{2} \frac{N+1}{2} = O(N^2)

which is the same as the worst case complexity.

There is another way to solve this problem using Integer Programming.

Birthday Paradox and Cracking Passwords

Now that we know the basics of the Birthday Problem, we can use this knowledge to understand the security of password hashing.

In the early days, passwords were stored in the server “as-is”. This means that if your username was juan and your password is Password123! then that information is stored in the server like this:

juan,Password123!

This is information is usually stored in a password file. If this password file is stolen, then it’s easy for another person to use this information and log in using your username and password and impersonate you.

Since the theft of a password file is harder to prevent, the passwords are not anymore stored “as-is” (also known as clear-text). The server will apply an algorithm to the original password which outputs a text called a hash. The algorithm is called a hash function. The hash is what’s put in the password file. A thief in possession of the password file will not be able to know the original password just by looking at it.

For example, the information above will now look like this:

juan,2c103f2c4ed1e59c0b4e2e01821770fa

where “2c103f2c4ed1e59c0b4e2e01821770fa” is the has value of the password “Password123!“.

So when you log in to the server using your password “Password123!”, the server will then run an algorithm that will hash your password and compare the result of the hashing to the one stored in the server, say “2c103f2c4ed1e59c0b4e2e01821770fa”. If they match, then it means that your password was correct and you are given access to the server.

The hash function I’m using is called the MD5 hash function. Given a password, it will produce a hash value. The set of all hash values is not infinite. In fact, the number of possible hash values is 2^{128} for md5. Due to this restriction, the birthday paradox will apply.

How the Birthday Paradox Applies

The birthday paradox tells us that given a hash function f(x), the probability that at least two passwords hash to the same value is given by:

\displaystyle 1-\frac{N\times N-1\times N-2\times \ldots \times N-k+1}{N^k}

Since md5 hash function has N=2^{128} possible values, the probability that two passwords hash to the same value is

\displaystyle 1-\frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k}

We want to compute for k so that this probability is at least 50%.

\displaystyle 1-\frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k} \ge 0.5

which is equivalent to

\displaystyle \frac{2^{128}\times 2^{128}-1\times 2^{128}-2\times \ldots \times 2^{128}-k+1}{(2^{128})^k} < 0.5

Computing for k when N is large is hard so we need to approximate this. To that end, we need some tools to help us.

We can write the probability in the following way:

\displaystyle 1-\frac{N}{N}\times\frac{N-1}{N}\times\frac{N-2}{N}\times\frac{N-3}{N}\times\ldots\times\frac{N-k+1}{N}
= \displaystyle 1-\frac{N}{N}\times (1-\frac{1}{N})\times (1-\frac{2}{N})\times (1-\frac{3}{N}) \times\ldots\times (1-\frac{k-1}{N})

Since N is large, the quantities

\displaystyle \frac{1}{N}, \frac{2}{N}, \frac{3}{N}, \frac{k-1}{N}

are very small. Because of this, we can use the approximation

e^{-x} \approx 1-x

The above approximation comes from the Taylor expansion of e^{-x}:

\displaystyle e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \frac{x^4}{4!} \ldots

If x is small, the higher order terms like x^2, x^3, x^4, \ldots vanish. Using this approximation, we can write the probability as:

\displaystyle \frac{N}{N}\times (1-\frac{1}{N})\times (1-\frac{2}{N})\times (1-\frac{3}{N}) \times\ldots\times (1-\frac{k-1}{N})

\displaystyle = e^{-\frac{1}{N}}\cdot e^{-\frac{2}{N}}\cdot e^{-\frac{3}{N}}\cdot \ldots\cdot e^{-\frac{k-1}{N}}

\displaystyle = e^{-\frac{1+2+3+4+\ldots + k-1}{N}}

Since

\displaystyle \sum_1^n j = 1+2+3+4+ \ldots + n = \frac{n(n+1)}{2}

we have

e^{-\frac{1+2+3+4+\ldots + k-1}{N}} = e^{-k\cdot (k-1)/2N }

Computing for k

Let’s compute k so that

\displaystyle e^{-k\cdot (k-1)/2N} < 0.5

Taking the natural logarithms of both sides

\displaystyle \ln e^{-k\cdot (k-1)/2N} < \ln 0.5

\displaystyle \frac{-k\cdot (k-1)}{2N} < \ln 0.5

\displaystyle k^2 - k + 2N\ln 0.5 > 0

Using the quadratic equation, we can solve for k:

\displaystyle k > \frac{-(-1) \pm \sqrt{(-1)^2 -4(1)(2N\ln 0.5}}{2}
\displaystyle k > \frac{1 \pm \sqrt{1-8N\ln 0.5}}{2}

When N=2^{128}, we have

\displaystyle k > \frac{1 \pm 4.343876e+19}{2} \approx 10^{19}

This is about 10 quintillion. What this means is that when k > 10^{19}, there is already a 50% chance that 2 passwords hash to the same value. In fact, the md5 was already cracked in 2004.

The Birthday Paradox

There are only 365 days in a year (excluding leap year). Given that there are about 7.4 billion people on earth, this means that there are approximately 20 million people with the same birthday on any given day. You just divide 7,400,000,000 by 365 and you get 20 million. Happy Birthday to all 20 million people celebrating their birthday today!

Suppose you’re in a crowd, on a bus, in a restaurant, or stadium. There is a big chance you might be standing next to a person with the same birthday as you.

1200px-michigan_stadium_2011
Draw any circle over the crowd. There’s a big chance you could be standing next to a person with the same birthday as you!

In fact, you only need about 23 people to have a 50/50 chance of two people having the same birthday! This may sound unbelievable since there are 365 days in a year but you only need 23 people to have a 50% chance of 2 people with the same birthday. How come?

This is called the Birthday Paradox and is very important in digital security, especially the password security.

Basic Counting

Probability is all about counting the possibilities. Let’s make it simple by using a dice as an example. We all know what a dice looks like.

1-dice
When a balanced dice is thrown, it can land showing any one of its six sides. We refer to the result of throwing a dice as an outcome and we say that a dice has 6 possible outcomes. If a dice is balanced, every side is equally likely to show up. We define the probability of a face showing up as the number of times that face occurs in the possible outcomes divided by the total number of possible outcomes. For example, out of the 6 possible outcomes, the number “1” occurs only once. Since there are 6 possible outcomes, the probability of getting a 1 is, therefore:

\displaystyle \text{Probability of getting a "1"} = 1/6

Adding another Dice

2-dice

Let’s add a second dice. To identify our two dice, let’s call one of them Dice A and the other Dice B. Let’s throw the dice together. When they land, dice A and dice B will show numbers. For this scenario, an outcome is now defined as the numbers that Dice A and Dice B show when they land. A possible outcome is Dice A shows a 1 and Dice B shows a 2. We can give this outcome a name and call it 1,2. We should remind ourselves that the first number is the result of Dice A and the second number is the result of Dice B. We can also refer to each outcome as a combination.

Here are the possible outcomes that the two dice will show:

matrix
The outcomes are arranged into a tabular format, which is sometimes called a “matrix”. The top column represent the outcomes of Dice B and the left most column represent the outcomes of Dice A. Combining both outcomes, we get the outcome of a single throw of Dice A and Dice B together.

If you count the number of combinations above, you’ll get 36. The reason it’s 36 is because dice A has 6 different outcomes and dice B has 6 different outcomes. Multiplying them together gives 6 \times 6=6^2 = 36 .

If you add a third dice, say dice C, the total number of combinations becomes:

\displaystyle 6^3 = 216.

In general, for N dice, the total number of combinations is

\displaystyle 6^N

How many combinations have at least 2 same numbers?

Since there are only 2 numbers for each combination, this question is also the same as “How many combinations show the same numbers?”. If you look at the diagonal, these are the combinations that have the same number for Dice A and Dice B.

diagonal

If you count them, you’ll get 6. Therefore, the probability of getting at least two equal numbers (in our 2-Dice system) is

6/36

How many combinations show different numbers?

off-diagonal

If you count all combinations outside the diagonal, you’ll get 30. Therefore, the probability of getting two different numbers is

30/36

Notice that the probability of getting at least 2 same numbers PLUS the probability of getting different numbers is equal to 1:

6/36 + 30/36 = 36/36 = 1

Knowing One gives you the other

If we know the probability of getting different numbers (30/36), then we can compute the probability of getting at least 2 same numbers simply by subtracting it from 1:

\displaystyle \text{probability of getting at least 2 numbers same} = 1-30/36 = 1/6 = 0.167

Avoid counting manually

When we counted the number of combinations which show different numbers, we counted it with our fingers. There is another way to count which is by doing it mentally. Since we are counting the number of ways that the 2-Dice system will show different numbers, we start by getting Dice A and asking how many different ways Dice A can land so that the number it shows is not equal to the number shown by Dice B. Since we have not yet thrown Dice B, then Dice A is allowed to show any number when it lands. This means there are 6 possible ways for Dice A to do this.

Number of ways Dice A can land = 6

Whatever number results in throwing Dice A, we cannot allow Dice B to have that number. This means that Dice B can only choose from 5 other numbers different from the one chosen by Dice A.

Number of ways Dice B can land = 5

If we multiply them, we get the number of combinations that Dice A and Dice B can land with different numbers:

6*5 = 30

This agrees with our manual counting.

At this point, pause and take note that the probability of getting at least 2 numbers the same for a 2-Dice system is 0.167. If we add more dice, this probability will increase. The question then is

How many dice do we need to throw so that the probability of getting 2 dice showing the same number is at least 50%?

Our 2-Dice example above shows that the probability of at least 2 dices showing the same number is 0.167, which is less than 50%. Let’s add a third dice and compute the probability.

How to compute the probability?

Let’s follow the pattern for the 2-Dice system. Since there are now 3 dice, the number of ways to get all numbers different is:

6*5*4

The total number of combinations of a 3-Dice system is

\displaystyle 6^3

Therefore, the probability of getting at least 2 dice with the same number is

\displaystyle 1- \frac{6\times 5\times 4}{6^3} = 0.444

This is still less than 50%.

Adding a 4th Dice

Let’s now add a 4th dice and compute the probability using the same pattern:

\displaystyle 1- \frac{6\times 5\times 4\times 3}{6^4} = 0.722

This is greater than 50%! So the answer is we need 4 dice thrown so that the probability of getting at least 2 dice with the same number is at least 50%.

The general formula for the probability for a k-Dice system is:

\displaystyle 1- \frac{ 6\times 5\times \ldots \times (6-k+1)}{6^k}

How does this relate to the Birthday Problem?

Now that we have the foundations, it’s easy to translate Dice to people and numbers to birthdays. In our dice example, there are 6 different numbers (faces) per dice. Translating this to birthdays, each person can have 365 possible birthdays since there are 365 days in a year (not including leap year).

This is the analogy:

Dice -> 6 possible faces
Person -> 365 possible birthdays

We want to compute how many random persons we need so that the probability of at least two persons having the same birthday is at least 50%. Let k be the number of random persons. Following the same pattern as the Dice example, the formula to compute the probability, given k persons, is:

\displaystyle \text{Probability of at least 2 persons with the same birthday} = 1-\frac{365 \times 364 \times 363 \times \ldots (365-k+1)}{365^k}

If we compute starting from k=1 to k=30, we can construct the following table:

   probability
1  0.000000000
2  0.002739726
3  0.008204166
4  0.016355912
5  0.027135574
6  0.040462484
7  0.056235703
8  0.074335292
9  0.094623834
10 0.116948178
11 0.141141378
12 0.167024789
13 0.194410275
14 0.223102512
15 0.252901320
16 0.283604005
17 0.315007665
18 0.346911418
19 0.379118526
20 0.411438384
21 0.443688335
22 0.475695308
23 0.507297234
24 0.538344258
25 0.568699704
26 0.598240820
27 0.626859282
28 0.654461472
29 0.680968537
30 0.706316243

Below is the graph of the same data where we indicate at what number of persons the graph is greater than or equal to 50%. When the number of persons becomes 23, there is already a 50% chance that at least 2 of them have the same birthday!

graph_of_probability