Quantum Computing: Programming the Quantum Dice

The power of quantum computing comes from the superposition of states which allows us to do computation in parallel. From an input of 1 qubit, we can do parallel computation of 2 integers (0 and 1). From an input of 2 qubits, we can do parallel computation of 4 numbers (0, 1, 2, 3) and so on. In general, for an n number, we can do parallel computation of 2^n;

A quantum computer allows us to simulate easily the roll of a dice. Imagine, for the sake of simplicity, say we have a 4-faced honest die with faces 0, 1, 2 ,3. Since there are 4 faces, the probability of getting any face is 1/4.


Using the quantum circuit below, we can show the probabilities using a quantum computer.

The circuit is constructed very easily and in my opinion is the hello work of quantum computing. We start with 2 qubits. Each qubit is initialized to 0. Then using a Hadamard gate, we can create a superposition of all states from 0 to 3:

The hadamard gates action on the qubits gives you this state:

\displaystyle \mathbf{H}^{\otimes n} |0\rangle = \displaystyle \frac{1}{\sqrt{2}^n} \sum_{x=0}^{2^n-1} |x\rangle

Then doing a measurement on the circuits 1000 times we get a probability distribution. The graph below shows that the probability of getting any state is more of less the same.

2-Dice Game

What if we toss 2 dice and get the sum of the faces that show up? The following numbers are the possible outcomes: 0, 2, 3, 3, 4, 5, 6. What number are you going to bet on to maximize your winnings?
If we plot a table of possible outcomes, we would end up with the following table:

\begin{tabular}{|c|cccc|}  \hline  & 0 & 1 & 2 & 3\\\hline  0 & 0 & 1 & 2 & 3\\  1 & 1 & 2 & 3 & 4\\  2 & 2 & 3 & 4 & 5\\  3 & 3 & 4 & 5 & 6\\\hline  \end{tabular}

We can see that the sum = 3 has a much greater probability of coming up compared to any other number. In fact, the sum 3 comes up 4 times out of 16, which means that the probability of getting a sum of 3 is 4/16 or 0.25.

We can also show the same probability distribution using quantum computing.

2-Qubit Addition Circuit

First we need to create a circuit for adding two 2-qubit numbers. Do do this, we can use of the full-adder circuit for adding 1 bit numbers:

The a_0 and b_0 refer to the input qubits, the C_{\mathrm{i}0} refer to the input carry bit from the addition of the lesser significant bit. The s_0 refer to the sum of the 2 bits. The C_{\mathrm{o}0} refer to the carry bit of the current addition operation and will be input to the addition of the next significant bit. For more information about binary adders refer to this wikipedia article.

Since there are two bits we need to add, we need two of these circuits stringed together as shown below:

Figure 3: Two Full Adders to add 2-Qubit Numbers

Translating to a Quantum Circuit

Using the circuit diagram above, we can translate it to a quantum circuit by using the following rules:

1. For each input bit in the XOR gate, create a CNOT gate such that the input of the CNOT gate is the input bit and the target is the output of the XOR gate:

2. For each AND gate, create a Control-Control-NOT gate (also known as Toffoli gate) such that the input bits of the Control-Control-NOT gate is the input bits of the AND gate and the target bit is the output bit of the AND gate.

Using these 2 rules, we can translate Figure 3 to the following quantum circuit:

Simulating the 2-dice Betting Game

Now that we have created the adder circuit, we can now use quantum computing to simulate the betting game. Thanks to the Hadamard gate, it’s very easy to create a superposition of states from the input qubits. We only have to apply hadamards to each input qubit. The full quantum circuit is now shown below:

Here is the QISKit code to do this:

# Import the Qiskit SDK
from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, Aer
from qiskit.tools.visualization import circuit_drawer, plot_histogram

# Create an input Quantum Registers
qa = QuantumRegister(2, name="a")
qb = QuantumRegister(2, name="b")

# Create the intermediate registers
qci = QuantumRegister(1, name="ci")
qco = QuantumRegister(1, name="co")
qd0 = QuantumRegister(3, name="d0")
qd1 = QuantumRegister(3, name="d1")

# Create the output registers

qs = QuantumRegister(3, name="s")

# Create a Classical Register with 3 bits.
c = ClassicalRegister(3, name="cl")
# Create a Quantum Circuit
qc = QuantumCircuit(qa, qb, qci, qco, qd0, qd1, qs, c)



# Flip the output qubit and apply Hadamard gate.
qc.h(qa[0])
qc.h(qa[1])
qc.h(qb[0])
qc.h(qb[1])

qc.barrier()
# The addition circuit

qc.cx(qa[0],qd0[0])
qc.cx(qb[0],qd0[0])
qc.cx(qd0[0],qs[0])
qc.cx(qci[0],qs[0])


qc.ccx(qci[0],qd0[0],qd0[1])
qc.ccx(qa[0],qb[0],qd0[2])
qc.cx(qd0[1],qco[0])
qc.cx(qd0[2],qco[0])

#### 2nd bit
qc.cx(qa[1],qd1[0])
qc.cx(qb[1],qd1[0])
qc.cx(qd1[0],qs[1])
qc.cx(qco[0],qs[1])

qc.ccx(qco[0],qd1[0],qd1[1])
qc.ccx(qa[1],qb[1],qd1[2])
qc.cx(qd1[1],qs[2])
qc.cx(qd1[2],qs[2])

qc.barrier()
qc.measure(qs, c)

# Compile and run the Quantum circuit on a simulator backend
backend_sim = Aer.get_backend('qasm_simulator')
shots = 100

job_sim = execute(qc, backend=backend_sim, shots=shots)
result_sim = job_sim.result()
answer = result_sim.get_counts()
plot_histogram(answer)

# Test
# 0+0 = 0 check
# 0+1 = 1 check
# 0+2 = 2 check
# 0+3 = 3 check
# 1+0 = 1 check
# 1+1 = 2 check
# 1+2 = 3 check
# 1+3 = 4 check
# 2+0 = 2 check
# 2+1 = 3 check
# 2+2 = 4 check
# 2+3 = 5 check
# 3+0 = 3 check
# 3+1 = 4 check
# 3+2 = 5 check
# 3+3 = 6 check

As usual, the code for drawing the circuit is as follows:

from qiskit.tools.visualization import matplotlib_circuit_drawer as drawer, qx_color_scheme
my_scheme=qx_color_scheme()
my_scheme['plotbarrier'] = False
drawer(qc, style=my_scheme)

And here’s the resulting probability distribution generated for 50 simulation runs.

Why it Works

Suppose we have 2 n-qubit numbers. We apply superposition of states to both numbers and an Identity to the output qubit:

\mathbf{H}^{\otimes n}\otimes \mathbf{H}^{\otimes n}\otimes \mathbf{I} |0\rangle |0\rangle |0\rangle

to get

\displaystyle \Big(\displaystyle \frac{1}{\sqrt{2}^n} \sum_{x=0}^{2^{n}-1} |x\rangle\Big)\otimes  \Big( \displaystyle \frac{1}{\sqrt{2}^n} \sum_{y=0}^{2^{n}-1} |y\rangle\Big)\otimes |0\rangle

Next, we apply the addition operator defined as

\mathrm{U}_{f} |xy\rangle |0\rangle = |xy\rangle |x+y\rangle

where

|xy\rangle = |x\rangle \otimes |y\rangle

to give

\displaystyle \frac{1}{2^n}\sum_{x=0}^{2^{n}-1} \sum_{y=0}^{2^{n}-1} \mathrm{U}_f |xy\rangle \otimes |0\rangle = \frac{1}{2^n}\sum_{x=0}^{2^{n}-1} \sum_{y=0}^{2^{n}-1} |xy\rangle \otimes |x+y\rangle

The right hand side can be written as:

\displaystyle \frac{1}{2^n}\sum_{i=0}^{2\cdot(2^n-1)}\Big(\sum_{j=0}^i|j\rangle |i-j\rangle\Big) |i\rangle

If we measure the output qubit, it will collapse to the state

\Psi = \displaystyle \left(\frac{1}{2}\right)^n \sum_{j=0}^i|j\rangle |i-j\rangle\Big) |i\rangle

with probability

\Psi^*\Psi = \displaystyle \left(\frac{1}{2}\right)^{2n} \sum_{j=0}^i 1 = (i+1) \left(\frac{1}{2}\right)^{2n}

In our example, we have n=2, the probability of getting a sum of i=3 is:

\displaystyle (i+1)\cdot \left(\frac{1}{2}\right)^{2n} = (3+1)\left(\frac{1}{2}\right)^{4} = 4\cdot \left(\frac{1}{16}\right) = \frac{1}{4} = 0.25

Conclusion

We have shown the power of quantum computing by allowing us to represent inputs as the superposition of all possible inputs to be computed in parallel. We have shown that this can be done using a Hadamard gate. Using an adder circuit we were able to modify the probabilities so that they are not evenly distributed but follow some distribution which favors some outcomes rather than others. You can just imagine what other possibilities we can use quantum computing for!

Advertisements

Think of a number: How to Program a Quantum Computer

I have always loved figuring out the unknowns. That’s why when I was a child, I love “think of a number” games.

Think of a number, multiply it by 2, add 4, divide the answer by 2. What’s your answer? If your answer is 5, then your number must be 3! The objective is to determine the unknown number.

There is a quantum version of this game. First, let’s define an operation called “dot product” of two numbers:
f(x) = a\cdot x = a_0x_0\oplus a_1x_1\oplus\ldots\oplus a_nx_n

where a_i is the ith bit of the binary representation of a , and x_i is the i’th bit of x and the symbol \oplus is the XOR operator defined as:

\begin{array}{rl}  1\oplus 1 &= 0\\  1\oplus 0 &= 1\\  0\oplus 1 &= 1\\  0\oplus 0 &= 0  \end{array}

For example, suppose a=5 and x=6 . The binary representation of 5=101_2 and the binary representation of 6=110_2 . Then

\begin{array}{rl}  f(6) &= 1\cdot 1 \oplus 0\cdot 1 \oplus 1\cdot 0\\  &= 1\oplus 0 \oplus 0\\  &= 1  \end{array}

If you were to guess my number a , how would you go about it?

Well the easiest way to guess my number is to get the dot product of my number with the powers of 2. For example,

\begin{array}{rl}  f(2^0) &= f(1) = 101_2 \cdot 001_2 = 1\\  f(2^1) &= f(2) = 101_2 \cdot 010_2 = 0\\  f(2^2) &= f(4) = 101_2 \cdot 100_2 = 1  \end{array}

So for an n-bit number, it takes you n invocations of the dot product to determine my hidden number.
The bigger the number, the larger the number of invocations.

Using a quantum computer, it only takes a 1-time invocation of the function in order to guess the number a !

How is it done? You can find how it works from here. In this blog post, we will show how it is implemented using the IBM Quantum Information Science Toolkit.

As we dive into coding, let’s also talk about Quantum Circuits. This is the foundation of programming the quantum computer.

A Quantum Circuit
A Quantum Circuit

Introducing QISKit

IBM has an opensource framework to program a real Quantum Computer which you can also use to simulate the computation. It’s called QISKit, short for Quantum Information Science Kit. They have an excellent documentation which you can refer for installation instructions.

Assuming we now have installed QISKit, we can start coding our quantum programs. We start with creating some qubits

# Import the Qiskit SDK
from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, Aer
from qiskit.tools.visualization import circuit_drawer, plot_histogram

# Create an input Quantum Register with 3 qubits.
qinput = QuantumRegister(3)
# Create an output Quantum Register with 1 qubits.
qoutput= QuantumRegister(1)
# Create a Classical Register with 3 bits.
c = ClassicalRegister(3)
# Create a Quantum Circuit
qc = QuantumCircuit(qinput, qoutput, c)

The most important class in the above code is the QuantumCircuit. This class allows us to perform quantum gate operations on the qubits that we pass to it in the initialization as well as to measure the qubits and write results to the classical qubits.

Quantum Circuits

A circuit is composed of qubits and gates. Gates are operators that act on qubits to change it or to change an output qubit. An example of how we visualize circuits is shown below.

You can see 3 input qubits, 1 output qubit and 3 classical bits. The “meter” icon you see in the drawing indicates that we are measuring the state of the qubit at that point in time and saving the result to a classical qubit.

The code for the above circuit is shown below:

from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, Aer
from qiskit.tools.visualization import circuit_drawer, plot_histogram

# Create an input Quantum Register with 3 qubits.
qinput = QuantumRegister(3,name="in")

# Create an output Quantum Register with 1 qubit.
qoutput= QuantumRegister(1,name="out")

# Create a Classical Register with 3 bits.
c = ClassicalRegister(3, name="c")

# Create a Quantum Circuit
qc = QuantumCircuit(qinput, qoutput, c)

The circuit drawing we saw above was easily generated using the following code:

from qiskit.tools.visualization import circuit_drawer

circuit_drawer(qc)

We repeat the output here for convenience:

Measuring the State of the Qubits

Below is the code to measure the input qubits and writing the result to the classical bits. We will execute the circuit 1000 times

shot=1000

and plot the histogram.

qc.measure(qinput, c)

# We will use a simulator
backend_sim = Aer.get_backend('qasm_simulator')
shots = 1000

job_sim = execute(qc, backend=backend_sim, shots=shots)
result_sim = job_sim.result()
answer = result_sim.get_counts()
plot_histogram(answer)

The resulting histogram shows that our qubits are in the initial state of |000\rangle with probability equal to 1.

NOT Gate

By default, our qubits are in the |0\rangle state. To convert it to a |1\rangle qubit, we use the NOT gate:

How do we know that the qubit is indeed in the |1\rangle state? When we measure it, the probability of getting the state |001\rangle is 1. This means if we do the experiment many times, the state will be in |001\rangle all the time!

Below is the result of the measurement:

The Hadamard Gate

Another interesting gate, and in my opinion, is the Hello World of Quantum Computing is the Hadamard gate. The Hadamard gate transforms the basis qubits according to the rule:

\begin{array}{rl}  \mathbf{H}|0\rangle &= \frac{1}{\sqrt{2}} \Big(|0\rangle + |1\rangle\Big)\\  \mathbf{H}|1\rangle &= \frac{1}{\sqrt{2}} \Big(|0\rangle - |1\rangle\Big)\\  \end{array}

What makes the Hadamard gate very interesting is that when you apply it to the initial state of

|\underbrace{000\ldots 0}_{\text{n bit string}}\rangle ,

it transforms the initial state into a superposition of all states |x\rangle

\displaystyle \mathbf{H}^{\otimes n} |0\rangle = \displaystyle \frac{1}{\sqrt{2}^n} \sum_{x=0}^{2^{n-1}} |x\rangle

where

|x\rangle = |x_{n-1}\rangle\otimes\ldots\otimes|x_2\rangle \otimes|x_1\rangle\otimes|x_0\rangle and x_i \in \{0,1\} for i=0\ldots 2

Therefore, for n=3, we have a superposition

|x\rangle = \displaystyle \frac{1}{\sqrt{2}^3} \Big(|0\rangle + |1\rangle + |2\rangle + |3\rangle + |4\rangle + |5\rangle + |6\rangle + |7\rangle\Big)

Below is a circuit that implements the above superposition:

If we measure the value of |x\rangle , we can see that every state is equally likely to be the value of |x\rangle .

CNOT Gate

The last basic gate we will introduce is the C_{\mathrm{NOT}} gate (C_{\mathrm{NOT}} stands for Controlled-NOT). The C_{\mathrm{NOT}} gate flips the value of the output qubit depending on the value of the input qubit. It is defined as:

C_{\mathrm{NOT}}|x\rangle|y\rangle = |x\rangle|y\oplus x\rangle

Below is an example of a C_{\mathrm{NOT}} circuit.

Here, we flip the value of the first bit using a NOT gate then we apply a C_{\mathrm{NOT}} gate between the first bit and the third bit. Since the first bit is now equal to |1\rangle , the C_{\mathrm{NOT}} will flip the value of the third bit from |0\rangle to |1\rangle . As expected, when we measure the value of the qubits, we find that the state is in |101\rangle with probability equal to 1.

Implementing the Dot Product Circuit

We now have the ingredients that we can use to implement the Dot Product circuit. In this circuit, we have 3 input qubits, 1 output qubit and 3 classical bits. If our hidden number is a=5 , this is represented in binary as 101 . To compute the dot product of any 3 bit number x with a=5=101_2, we draw 3 lines corresponding to the binary representation of x . We also draw the line for the output qubit.

Since a=101_2 , the 0th bit of a is 1. Let x_0 be the 0th bit of x.

x_0 \cdot 1 = \begin{cases}  1, \text{ if } x_0 = 1\\  0, \text{ if } x_0 = 0  \end{cases}

When x_0\cdot 1 = 1, then it will flip the output qubit. Otherwise, x_0\cdot 0 = 0 and it will leave the output qubit as is.

This suggests that we need a C_\mathrm{NOT} gate that connects |\mathrm{in}_0\rangle and |\mathrm{out}_0\rangle.

Therefore,

qc.cx(qinput[0], qoutput[0])

The same is true for a_2:

qc.cx(qinput[2],qoutput[0])

With this in mind, we can program the Dot Product circuit:

from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, Aer
from qiskit.tools.visualization import circuit_drawer, plot_histogram

# Create an input Quantum Register with 3 qubits.
qinput = QuantumRegister(3, "in")
# Create an output Quantum Register with 1 qubit.
qoutput= QuantumRegister(1,"out")

# Create a Quantum Circuit
qc = QuantumCircuit(qinput, qoutput)
qc.barrier()
# Add a CX (CNOT) gate on control qubit qinput[0] and target qubit qoutput[0]
qc.cx(qinput[0], qoutput[0])
# Add a CX (CNOT) gate on control qubit qinput[2] and target qubit qoutput[0]
qc.cx(qinput[2],qoutput[0])
qc.barrier()

Finally, Computing for the Unknown Number

The trick to solving the unknown is by first flipping the output qubit using the NOT gate and applying Hadamards to all qubits:

\mathbf{H}^{\otimes n+1}\mathbf{U}_f\mathbf{H}^{\otimes n+1}\Big[|0\rangle|1\rangle\Big] = |a\rangle|1\rangle

See this post for a more detailed explanation.

The circuit corresponding to this equation is the diagram below where we also show barriers that separate the input, the black-box function and the measurement.

The code to do the above is shown below:

from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, Aer
from qiskit.tools.visualization import circuit_drawer, plot_histogram

# Create an input Quantum Register with 3 qubits.
qinput = QuantumRegister(3)
# Create an output Quantum Register with 1 qubits.
qoutput= QuantumRegister(1)
# Create a Classical Register with 3 bits.
c = ClassicalRegister(3)
# Create a Quantum Circuit
qc = QuantumCircuit(qinput, qoutput, c)

# Apply Hadamard gate to the input qubits
qc.h(qinput[0])
qc.h(qinput[1])
qc.h(qinput[2])

# Flip the output qubit and apply Hadamard gate.
qc.x(qoutput[0])
qc.h(qoutput[0])

qc.barrier()
# Add a CX (CNOT) gate on control qubit 0 and the output qubit.
qc.cx(qinput[0], qoutput[0])

# Add a CX (CNOT) gate on control qubit 0 and the output qubit.
qc.cx(qinput[2],qoutput[0])

qc.barrier()

# Apply Hadamard gates to all input and output qubits.
qc.h(qinput[0])
qc.h(qinput[1])
qc.h(qinput[2])
qc.h(qoutput[0])


# Add a Measure gate to see the state.
qc.barrier()
qc.measure(qinput, c)

# Compile and use a simulator
backend_sim = Aer.get_backend('qasm_simulator')
shots = 1000

job_sim = execute(qc, backend=backend_sim, shots=shots)
result_sim = job_sim.result()
answer = result_sim.get_counts()
plot_histogram(answer)

# Draw the circuit
from qiskit.tools.visualization import circuit_drawer
my_style = {'plotbarrier': True}
circuit_drawer(qc,style=my_style)

Measuring the state of the input qubits after the computation, we get with probability 1 our hidden number a=5=101_2 .

Coloring the Circuits

You might have noticed that our circuit has colors while the code only gives a black-and-white color. To color our circuits, we use the following code:

from qiskit.tools.visualization import matplotlib_circuit_drawer as drawer, qx_color_scheme
my_scheme=qx_color_scheme()
my_scheme['plotbarrier'] = True
drawer(qc, style=my_scheme)

Conclusion

It’s now fairly easy to write quantum programs using IBM QISKit. In the future, it will even be much easier as we will just be using the libraries and composing them in high-level rather than really looking at the algorithms from a low-level. Just as AI is now accessible to a lot of people, Quantum Computing will have it’s day as well.

RSA Encryption and Quantum Computing

We are going to crack the RSA by finding the period of the ciphertext. Let m be the message, c the ciphertext, e the encoding number and N=pq such that

c=m^e \mod N

From Period Finding and the RSA post, the period of a ciphertext is defined as the smallest number r such that

c^r \equiv \mod N

First we prepare n=2n_0 QBits at the state \left|0\right\rangle where n_0 is the number of bits of N=pq . The number of bits can be computed using the formula:

\lceil \log_2(N) \rceil

We also prepare n_0 QBits for the output register.

So initially we have the following setup:

\underbrace{\left|00\ldots 0\right\rangle}_{n\text{ times}} \otimes \underbrace{\left|00\ldots0\right\rangle}_{n_0\text{ times}}

The output register will contain the output of the operator:

\mathbf{U}_f \left|x\right\rangle\left|0\right\rangle = \left|x\right\rangle\left|f(x)\right\rangle

where

f(x) = c^x \mod N

We now apply the Hadarmard gate to the input register and apply the operator \mathbf{U}_f:

\displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} \left|x\right\rangle\left|f(x)\right\rangle

The Hadamard gates transform the state into a superposition of all possible values of the period while the operator \mathbf{U}_f transforms the output QBits as superposition of all possible values of c^x\mod N.

Next we make a measurement of the output QBits. This will give us a number f_0 which corresponds to all states whose value of \left|x\right\rangle is of the form x_0+kr since

\begin{array}{rl}  c^{x_0+kr} & \equiv c^{x_0}c^{kr} \mod N \\  & \equiv c^{x_0}(c^{r})^k \mod N \\  & \equiv c^{x_0} \mod N  \end{array}

since (c^{r})^k \equiv 1 \mod N by definition of a period.

Therefore, after the measurement, the input state is now

\displaystyle \left|\Psi\right\rangle = \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \left|x_0 + kr\right\rangle

where m is the smallest integer such that x_0 + mr \ge 2^{n}.

We will then apply the quantum Fourier transform to state \left|\Psi\right\rangle . The Quantum Fourier Transform is defined by its action on an arbitrary \left|x\right\rangle

\displaystyle \mathbf{U}_{\mathbf{FT}}\left|x\right\rangle = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi ixy/2^n}\left|y\right\rangle

Applying the Fourier Transform to the state \left|\Psi\right\rangle ,

\begin{array}{rl}  \displaystyle \mathbf{U}_{\mathbf{FT}} \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1}\left|x_0 + kr\right\rangle &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1}\mathbf{U}_{\mathbf{FT}} \left|x_0 + kr\right\rangle\\  &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi i(x_0+kr)y/2^n}\left|y\right\rangle\\  &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi ix_0y/2^n}e^{2\pi ikry/2^n}\left|y\right\rangle\\  &= \displaystyle \sum_{y=0}^{2^n-1} \underbrace{e^{2\pi ix_0y/2^n} \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{2\pi ikry/2^n}}_{\text{probability amplitude}}\left|y\right\rangle  \end{array}

Now if we make a measurement of this new state, the probability of getting a value of y is the probability amplitude multiplied by it’s complex conjugate:

\displaystyle \Big[\underbrace{e^{2\pi ix_0y/2^n} } \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{2\pi ikry/2^n}\Big]\Big[\underbrace{e^{-2\pi ix_0y/2^n} } \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{-2\pi ikry/2^n}\Big]

The factors in underbrace evaluates to 1 when multiplied and we’re left with

\displaystyle \frac{1}{2^n m}\Big| \underbrace{\sum_{k=0}^{m-1} e^{2\pi ikry/2^n}}\Big|^2

Using the formula

e^{i\theta} = \cos(\theta) + i\sin(\theta)

we can write the summation in underbrace as

\displaystyle \Big(\sum_{k=0}^{m-1} \cos(2\pi kry/2^n)\Big)^2 + \Big(\sum_{k=0}^{m-1} \sin(2\pi kry/2^n)\Big)^2

We’d like to know at what value of y the expression above is maximum. We can use the following formula to simplify above summations:

\displaystyle \sum_{k=0}^{m-1} \cos(2\pi fk) = \frac{\cos(\pi f(m-1)) \sin(\pi fm)}{\sin(\pi f)}\\  \displaystyle \sum_{k=0}^{m-1} \sin(2\pi fk) = \frac{\sin(\pi f(m-1)) \sin(\pi fm)}{\sin(\pi f)}

If we let

f = ry/2^n

this reduces the summations to:

\displaystyle \Big(\sum_{k=0}^{m-1} \cos(2\pi kry/2^n)\Big)^2 + \Big(\sum_{k=0}^{m-1} \sin(2\pi kry/2^n)\Big)^2

\begin{array}{rl}  &=\displaystyle \frac{\cos^2(\pi f(m-1)) \sin^2(\pi fm)}{\sin^2(\pi f)} + \frac{\sin^2(\pi f(m-1))\sin^2(\pi fm))}{\sin^2(\pi f)}\\  &= \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)}\Big[\underbrace{\cos^2(\pi f(m-1)) + \sin^2(\pi f(m-1))}_{\text{= 1}}\Big]\\  &= \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)}  \end{array}

Therefore, the probability of getting a specific y is

p(y) = \displaystyle \frac{1}{2^nm} \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)}

Here is a plot of the function \sin^2(\pi fm)/\sin^2(\pi f) for m=4:

Notice that the graph attains its maximums when the values of f are integer values, that is when

\displaystyle f = j= \frac{ry}{2^n} \text{ where } j= 1, 2, 3, 4, \ldots, r

Since y is an integer value, this means that the value of y must be close to j2^n/r. We can use a theorem in Number theory which states that if p/q is a rational number that approximates a real number x and

\displaystyle \Big|\frac{p}{q} - x\Big| \le \frac{1}{2q}

we can approximate x using continued fraction expansion of p/q. In fact, if we find a y that is within 1/2 of one of the j2^n/r values, we have

\displaystyle \left| y-j\frac{2^n}{r}\right| \le \displaystyle \frac{1}{2}\\

and dividing both sides by 2^n, we get

\displaystyle \left| \frac{y}{2^n}-\frac{j}{r}\right| \le \displaystyle \frac{1}{2^{n+1}}

which satisfies the theorem. We can therefore expand y/2^n via continued fraction expansion and continue until we find a denominator r that is less that 2^{n_0}. We then test whether r could be our period if it satisfies

\displaystyle c^r \equiv 1

If it satisfies the above, then r is the period. It’s just a matter of computing d^\prime, the inverse of e modulo r satisfying

\displaystyle ed^\prime \equiv 1 \mod r

Having computed d^\prime, we can then decrypt the ciphertext using

m = c^{d^\prime} \mod N

In our example in period finding, let’s suppose that the quantum computer gave us the following number

y=1446311

We can expand the fraction y/2^n = 1446311/16777216 as a continued fractions as follows:

\displaystyle \frac{y}{2^n} = \frac{1446311}{16777216} = \frac{1}{11 + \displaystyle \frac{1}{1+\displaystyle\frac{1}{1+\displaystyle\frac{1}{1}}}} = \frac{5}{58}

or

\displaystyle \frac{y}{2^n} = \frac{1446311}{16777216} = \frac{1}{11 + \displaystyle \frac{1}{1+\displaystyle\frac{1}{1+\displaystyle\frac{1}{\displaystyle\frac{1}{6886}}}}} = \frac{34433}{399423}

We can stop until q=58 since q=58 < 2^{n_0} = 2^{12} = 4096 . So we use the first expansion where q=58 and testing it

\begin{array}{rl}  \displaystyle c^{58} \mod 3127 &= \displaystyle 794^{58} \mod 3127\\  &= \displaystyle 1 \mod 3127    \end{array}

which confirms that q=58 is a period. Using r=58 in the equation

\displaystyle ed^\prime = m\times r + 1

and solving for d^\prime (which is the inverse of e modulo 58) we get

\displaystyle d^\prime = 25

We can now get the original message using

\displaystyle 0794^{25} \mod 3127 = 1907

which agrees with our original message.

How lucky do we need to get in order for the state to collapse to one of these y values close to j2^n/r? Let’s evaluate the probability for

y_j=\displaystyle j\frac{2^n}{r} \pm \delta_j

where \delta_j is the distance of the closest y to j2^n/r. Since f = ry/2^n, we have

\begin{array}{rl}  p(y_j) = \displaystyle \frac{1}{2^nm}\frac{\sin^2(\pi fm)}{\sin^2(\pi f)} &= \displaystyle \frac{1}{2^nm} \frac{\sin^2(\pi mr\displaystyle\frac{y}{2^n})}{\sin^2(\pi r\displaystyle\frac{y}{2^n}) }\\  &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle\frac{\pi mr}{2^n}(j\frac{2^n}{r} \pm \delta_j)\Big]}{\sin^2\Big[\displaystyle\frac{\pi r}{2^n}(j\frac{2^n}{r} \pm \delta_j)\Big] }\\  &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle j\pi m \pm \frac{\pi m r\delta_j}{2^n})\Big]}{\sin^2\Big[\displaystyle j\pi \pm \frac{\pi r\delta_j}{2^n}\Big] }\\  &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle \frac{\pi m r\delta_j}{2^n})\Big]}{\sin^2\Big[\displaystyle \frac{\pi r\delta_j}{2^n}\Big] }  \end{array}

We can approximate the denominator using the small angle approximation of sine:

\sin(x) \approx x

to get

\displaystyle p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle \frac{\pi m r\delta_j}{2^n})\Big]}{\left[\displaystyle \frac{\pi r\delta_j}{2^n}\right]^2}

Since m is the smallest integer such that x_0 + mr \ge 2^{n}, then

m = \displaystyle \left\lceil\frac{2^n}{r}\right\rceil

We can use this to approximate

\displaystyle m\frac{r}{2^n} \approx 1

to get

p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2(\displaystyle \pi\delta_j)}{\left(\displaystyle \frac{\pi\delta_j}{m}\right)^2}

Since 0 \le \delta_j \le 1/2, we can see from the graph below that the line joining (0,0) and (\pi/2,1), whose equation is 2x/\pi is less that the value of the sine function at that interval, that is,

\displaystyle \frac{2x}{\pi} \le \sin(x)

Using this inequality, we can get a lower bound of the probability of getting a specific y value to be

\begin{array}{rl}  p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2(\displaystyle \pi\delta_j)}{\left(\displaystyle \frac{\pi \delta_j}{m}\right)^2} &\ge \displaystyle \frac{1}{2^nm} \left(\frac{\displaystyle \frac{2\pi\delta_j}{\pi}}{\displaystyle \frac{\pi \delta_j}{m}}\right)^2\\  &= \displaystyle \frac{1}{2^nm} \left( \frac{2m}{\pi}\right)\\  &= \displaystyle \frac{m}{2^n}\frac{4}{\pi^2}\\  &= \displaystyle \frac{1}{r}\frac{4}{\pi^2}  \end{array}

Since there are r of these y_j‘s, the probability of getting any one of these y_j‘s is therefore:

p(y) = \displaystyle \frac{4}{\pi^2} \approx 0.405

which means that there is 40% probability of the state to collapse to a value of y that will give us the correct period.

We have just seen how a quantum computer can be used to crack the RSA. We have exploited the fact that the measurement will give us a value of y that is near j2^n/r with high probability. We then computed for the period using the continued fraction expansion of y/2^n. With the period computed, it’s becomes straightforward to decrypt the ciphertext.

Parallel Computing in the Small: An Introduction to Quantum Computing

As I watched my one-month old baby sleeping, a thought ran through my mind. I wonder what technology would look like when she’s my age. There’s a lot of things going on simultaneously in technology today. I remember doing Artificial Intelligence/Machine Learning about 15 years ago. They used to be done in a university setting. They are now the new normal. Some technologies are quite recent like Big Data and Blockchain but they have become mainstream very quickly. I wonder what it will look like when my daughter grows up.

There is a technology I’m currently watching. It’s still in it’s infancy but this technology is really fascinating. It is a technology based on a remarkably counter-intuitive theory of the universe that governs the atoms and sub-atomic particles. It is called Quantum Computing. Given the high speed at which science fiction become realities, it’s possible that my baby girl will someday be programming on a quantum computer. But for now, we can only hope.

I have not seen a Quantum Computer and based on what I read so far, it’s still an engineering challenge. It might probably be a few more years before they are mass-produced but it is something to look forward to. But the good thing is, we can already talk about quantum algorithms! So let’s sharpen our pencils and get a few sheets of paper for we are going to talk about this new way of computing.

In classical computing, a bit has a value that is either one or zero. In quantum computing, a bit can now have a value that is a superposition of 1 and 0. We call this a QBit. The values 1 and 0 are now symbolized as \left| 1\right\rangle and \left| 0\right\rangle respectively. These are unit vectors in what is known as a Hilbert space. They are also known as kets, derived from the word bracket which we’ll talk more of later. An example of a vector space that you are familiar with is the color wheel. Each color is actually a combination of red, blue and green. For example, the color yellow is a combination of red and green. It’s also the same as with the qubit. A general QBit vector is of the form

\displaystyle \left| \Psi \right\rangle = a\left| 0\right\rangle + b\left| 1\right\rangle

such that \mid a\mid^2 + \mid b\mid^2 = 1. The numbers a and b are complex numbers and the symbol \mid a\mid^2 is defined as

\displaystyle \mid a\mid^2 = a^*a

where a^* is the complex conjugate of a. A QBit vector represents the state of the QBit. We shall use the term vector and state interchangeably moving forward.

However, you won’t be able to see them in superposition state. If you want to know the state of a qubit, you will have to measure it. But when you measure it, it will collapse to one of either \left| 0\right\rangle or \left| 1\right\rangle but you will have a clue as to what that state will be. The coefficients a and b will give you the probability of the result of the measurement: it will be \mid a\mid^2 in state \left| a\right\rangle and \mid b\mid^2 in state \left| b\right\rangle.

If the state of a QBit is a superposition of basis states \left| 0\right\rangle and \left| 1\right\rangle, does this mean our computation is also random? Does it mean we get different answers every time ?

It turns out that we can get a definite answer! Let’s illustrate this by a simple quantum computation.

Suppose a I have a number between 0 and 1 million. What is my number? You can guess my number and I will only tell you if you got it correctly or not. I won’t tell you if my number is higher or lower than your guess. How many tries do you think you need in order to guess my number? I’m sure our tries will be many. In the worst case, you’ll need about 1 million tries.

For a quantum computer, you only need one try to guess it and that’s what we’re going to see. However, bear with me since we have to build up our arsenal of tools first before we can even talk about it. Let’s start with how a QBit looks like as a matrix.

The basis QBits \left| 0\right\rangle and \left| 1\right\rangle can be written as 1×2 matrices:

\displaystyle \left| 0 \right\rangle =  \begin{bmatrix}  1\\  0  \end{bmatrix}

\displaystyle \left| 1 \right\rangle =  \begin{bmatrix}  0\\  1  \end{bmatrix}

Using this representation, we can write a general QBit as:

\displaystyle \mid\Psi\rangle = a\mid 0\rangle + b\mid 1\rangle = a  \begin{bmatrix}  1\\  0  \end{bmatrix}  + b  \begin{bmatrix}  0\\  1  \end{bmatrix}  = \begin{bmatrix}  a\\  b  \end{bmatrix}

So now we know that every QBit is a linear combination of ket vectors \left| 0\right\rangle and \left| 1\right\rangle. These ket vectors have a corresponding “bra” vectors \langle 0 \mid and \langle 1\mid. In matrix representation, the bra basis vectors are defined as:

\displaystyle \langle 0\mid  = \begin{bmatrix}  1 & 0  \end{bmatrix}

and

\displaystyle \langle 1\mid =  \begin{bmatrix}  0 & 1  \end{bmatrix}

For a general QBit \mid\Psi\rangle = a\mid 0\rangle + b\mid 1\rangle, the corresponding bra vector is

\displaystyle  \langle \Psi \mid = a^*\langle 0\mid + b^*\langle 1\mid =  \begin{bmatrix}  a^* & b^*  \end{bmatrix}

There is an operation between bra and ket vectors of the basis states called inner product which is defined as:

\begin{array}{rl}  \displaystyle \langle 0\mid 0 \rangle &= 1\\  \displaystyle \langle 1\mid 1 \rangle &= 1\\  \displaystyle \langle 0\mid 1 \rangle &= \langle 1\mid 0\rangle = 0  \end{array}

Using the above rules, we can define the inner product of any two QBits \mid \Psi\rangle = a\mid 0\rangle + b\mid 1\rangle and \mid\Phi\rangle = c\mid 0\rangle + d\mid 1\rangle as

\begin{array}{rl}  \displaystyle \langle \Psi\mid\Phi\rangle &= \big(a^*\langle 0\mid + b^*\langle 1\mid\big)\big(c\mid 0\rangle + d\mid 1\rangle\big)\\  &= \displaystyle a^*c \langle 0\mid 0\rangle + a^*\langle 0\mid 1\rangle + b^*c\langle 1\mid 0\rangle + b^*d\langle 1\mid 1\rangle\\  &= a^*c + b^*d  \end{array}

Using the above definition of an inner product, the norm of a vector is defined as

\displaystyle \mid\left|\Psi\right\rangle\mid = \displaystyle \sqrt{\langle \Psi \mid \Psi \rangle} =\displaystyle \sqrt{a^*a + b^*b}

Geometrically, the norm is the length of the ket \mid \Psi\rangle.

Quantum computations are done using quantum operators. They are also called quantum gates. Operators are linear mappings that take a state to another state. That is, an operator \mathbf{A} acts on a\left| 0 \right\rangle + b\left| 1\right\rangle to give another vector c\left| 0 \right\rangle + d\left| 1\right\rangle. We write this as

\displaystyle \mathbf{A}(a\left| 0\right\rangle + b\left| 1\right\rangle) = c\left| 0\right\rangle + d\left| 1\right\rangle

Since the basis vectors \left| 0\right\rangle and \left| 1\right\rangle are also vectors, operators can also operate on them. An example of a very important operator and it’s action on the basis vectors is the Hadamard operator:

\begin{array}{rl}  \displaystyle \mathbf{H} \left| 0 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle + \left| 1\right\rangle)\\  \displaystyle \mathbf{H} \left| 1 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle - \left| 1\right\rangle)  \end{array}

By knowing the action of an operator on the basis vectors, we can determine the matrix representation of an operator. For the Hadamard operator, the matrix representations are:

\displaystyle \mathbf{H} \left| 0 \right\rangle = \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle + \left| 1\right\rangle) = \frac{1}{\sqrt{2}}  \begin{bmatrix}  1\\  1  \end{bmatrix}

and

\displaystyle \mathbf{H} \left| 1 \right\rangle = \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle - \left| 1\right\rangle) = \frac{1}{\sqrt{2}}  \begin{bmatrix}  1\\  -1  \end{bmatrix}

Therefore the matrix representation of the Hadamard operator is

\displaystyle \mathbf{H}=  \frac{1}{\sqrt{2}}  \begin{bmatrix}  1 & 1 \\  1 & -1  \end{bmatrix}

The identity operator is defined as the operator that maps a vector into itself and is symbolized as \mathbf{I}:

\displaystyle \mathbf{I}\left|\Psi\right\rangle = \left|\Psi\right\rangle

The NOT operator is defined as

\begin{array}{rl}  \mathbf{X}\left|0\right\rangle &= \left|1\right\rangle\\  \mathbf{X}\left|1\right\rangle &= \left|0\right\rangle  \end{array}

What it does is transform the \left|0\right\rangle to \left|1\right\rangle and vice versa, just like the classical bits.

Operators are linear. We can use linearity to derive the resulting vector when an operator acts on a general QBit:

\displaystyle \mathbf{A}(a\left| 0\right\rangle + b\left| 1\right\rangle) = a\mathbf{A} \left| 0\right\rangle + b\mathbf{A} \left| 1\right\rangle)

If \mathbf{A} is the Hadamard operator, that is, \mathbf{A} = \mathbf{H}, then the above operation becomes:

\displaystyle \mathbf{H}(a\left| 0\right\rangle + b\left| 1\right\rangle) = a\mathbf{H} \left| 0\right\rangle + b\mathbf{H} \left| 1\right\rangle)

Since we know that:

\begin{array}{rl}  \displaystyle \mathbf{H} \left| 0 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle + \left| 1\right\rangle)\\  \displaystyle \mathbf{H} \left| 1 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle - \left| 1\right\rangle)  \end{array}

We can then write this as:

\begin{array}{rl}  \displaystyle \mathbf{H}(a\left| 0\right\rangle + b\left| 1\right\rangle) &= \displaystyle a\mathbf{H} \left| 0\right\rangle + b\mathbf{H} \left| 1\right\rangle)\\  &= \displaystyle a\frac{1}{\sqrt{2}}\big(\left| 0\right\rangle + \left| 1\right\rangle\big) + b\frac{1}{\sqrt{2}}\big(\left| 0\right\rangle - \left| 1\right\rangle\big)\\  &= \displaystyle \frac{a+b}{\sqrt{2}} \left| 0\right\rangle + \frac{a-b}{\sqrt{2}} \left| 1\right\rangle  \end{array}

Let see what the operators look like when they operate on bra vectors. Let’s start with the Hadamard operator acting on a general QBit above.

\begin{array}{rl}  \displaystyle \mathbf{H} \mid 0 \rangle &= \displaystyle \frac{1}{\sqrt{2}}(\mid 0 \rangle + \mid 1\rangle) \rightarrow \langle 0\mid \mathbf{H^\dagger} = \displaystyle \frac{1}{\sqrt{2}}(\langle 0 \mid + \langle 1\mid)\\  \displaystyle \mathbf{H} \mid 1 \rangle &= \displaystyle \frac{1}{\sqrt{2}}(\mid 0 \rangle - \mid 1\rangle) \rightarrow \langle 1\mid \mathbf{H^\dagger} = \displaystyle \frac{1}{\sqrt{2}}(\langle 0 \mid - \langle 1\mid)  \end{array}

where we symbolize \mathbf{H^\dagger} as the operator corresponding to the Hadamard operator acting on bras.

The matrix representation of the Hadamard operator acting on bras is therefore:

\mathbf{H^\dagger} = \frac{1}{\sqrt{2}}  \begin{bmatrix}  1 & 1\\  1 & -1  \end{bmatrix} = \mathbf{H}

In general, the matrix representation of \mathbf{H^\dagger} is the conjugate transpose of the matrix of \mathbf{H}. In simple terms, this means take the transpose of \mathbf{H} and apply the complex conjugates to each element. It’s amazing that the Hadamard operator is equal to its conjugate transpose. The types of operators that behave that way are called Hermitian operators.

Let \mathbf{A} be an operator and \left| \Psi\right\rangle a vector and let \Phi = \mathbf{A}\left|\Psi\right\rangle, then the norm of \Phi is

\displaystyle \langle\Phi\mid\Phi\rangle = \langle \Psi\mid \mathbf{A^\dagger}\mathbf{A}\mid \Psi\rangle = \langle\Psi\mid\Psi\rangle

This means that

\displaystyle \mathbf{A^\dagger}\mathbf{A} = \mathbf{I}

The kind of operator that satisfies the above relationship is called Unitary.

Now that we have gained knowledge of some of the tools, we can now describe how a quantum computation is set up. We need QBits for the input and QBits for the output. The simplest setup is a one QBit input and one QBit output. Let \left| x\right\rangle and \left| y\right\rangle be the input and output QBits,respectively. The general state of this quantum system is a superposition of the following states:

\left| 0 \right\rangle \otimes \left| 0 \right\rangle, \left| 0 \right\rangle \otimes \left| 1 \right\rangle, \left| 1 \right\rangle \otimes \left| 0 \right\rangle, \left| 1 \right\rangle \otimes \left| 1 \right\rangle

which could also be written as

\left|00\right\rangle, \left|01\right\rangle, \left|10\right\rangle, \left|11\right\rangle,

or, if we take them as binary representation of numbers, we can write them concisely as

\left|0\right\rangle, \left|1\right\rangle,\left|2\right\rangle,\left|3\right\rangle,

Using the above basis vectors, we can write a general 2-QBit system as:

\displaystyle \left|\Psi\right\rangle = c_1\left|0\right\rangle + c_1\left|1\right\rangle + c_1\left|2\right\rangle + c_1\left|3\right\rangle = \sum_{i=0}^{2^n-1} c_i\left|i\right\rangle

where the numbers c_i are complex and n=2. The summation is up to 2^n-1 since the maximum number n QBits can represent is 2^n-1. The summation on the right also gives us the general form of a QBit for an n-QBit system.

As we saw earlier, the quantum operators (or gates) are the ones used to compute from a given QBit inputs. Let \mathbf{U}_f represent a unitary operator implementing the function f. The function f is a function that takes the numbers from the domain \{0,1\} to \{0,1\}, that is,

\displaystyle f: \{0,1\}\rightarrow \{0,1\}

For a two-QBit system, the form of quantum computation is

\displaystyle \mathbf{U}_f\left|x\right\rangle \left|y\right\rangle = \left|x\right\rangle \left|x\oplus f(x)\right\rangle

where the symbol \oplus represent bitwise OR operation. The table below will give you the results of the bitwise OR operation for every combination of values in \{0,1\}:

\begin{array}{cc}  0 \oplus 0 &= 0\\  0 \oplus 1 &= 1\\  1 \oplus 0 &= 1\\  1 \oplus 1 &= 0  \end{array}

A simple but important example of a unitary operator acting on two QBits is the \mathbf{C_{\text{NOT}}} gate, defined as:

\displaystyle \mathbf{C_{\text{NOT}}}\left|x\right\rangle\left|y\right\rangle = \left|x\right\rangle\left|y \oplus x\right\rangle

Here is the complete action of the \mathbf{C_{\text{NOT}}}:

\begin{tabular}{c|c|c}  \text{Input} & \text{Output} & \text{Result}\\  0 & 0 & 0\\  0 & 1 & 1\\  1 & 0 & 1\\  1 & 1 & 0  \end{tabular}

If you study the table above, you’ll notice that if the input QBit is in the state \left|0\right\rangle, the output QBit stays the same. However, if the input QBit is in \left|1\right\rangle, the output QBit is flipped. The input QBit in this case controls the flipping of the second QBit. The input QBit is said to be the control bit. That’s why it’s called a \mathbf{C_{\text{NOT}}} which stands for Controlled-NOT gate.

Let’s now go back to our original problem. There is a positive number a that is less than 2^n . We want to find this number using quantum computation. We are given a black box operator \mathbf{U}_f whose action on a bra is

\displaystyle \mathbf{U}_f \left|x\right\rangle\left|y\right\rangle = \left|x\right\rangle\left| y \oplus f(x)\right\rangle

where

\displaystyle f(x) = x_0 a_0 \oplus x_1 a_1 \oplus \cdots x_{n-1}a_{n-1} = x \cdot a

where a is the unknown number we are going to guess and x_ia_i are the i-th bits of x and a.

So how many times do we have to apply \mathbf{U}_f in order to guess the number a?

To solve this, we need n input QBits and 1 output QBit. We will initialize our QBits to \left|0\right\rangle. So the initial configuration is

\displaystyle \underbrace{\left|000...0\right\rangle}_{\text{n number of zeroes}}\otimes\left|1\right\rangle

Notice how we separate the input and output qubits.

Next we apply the Hadamard operator to each QBit:

\displaystyle \mathbf{H}^{\otimes n}\otimes \mathbf{H} \left|000...0\right\rangle\left|1\right\rangle = \mathbf{H}^{\otimes n} \left|000...0\right\rangle\otimes \mathbf{H}\left|1\right\rangle

We have seen the Hadamard operator earlier. For the sake of convenience we’ll repeat it here:

\begin{array}{rl}  \displaystyle \mathbf{H} \left| 0 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle + \left| 1 \right\rangle)\\  \displaystyle \mathbf{H} \left| 1 \right\rangle &= \displaystyle \frac{1}{\sqrt{2}}(\left| 0 \right\rangle - \left| 1\right\rangle)  \end{array}

We can generalize this by letting x be either 0 or 1:

\begin{array}{rl}  \displaystyle \mathbf{H}\left| x\right\rangle &= \frac{1}{\sqrt{2}}\big(\left|0\right\rangle + (-1)^x \left|1\right\rangle  \end{array}

We can use this to compute the action on two QBits and generalize:

\begin{array}{rl}  \displaystyle \mathbf{H}^{\otimes 2}\left| x\right\rangle &= \mathbf{H}^{\otimes 2}\left| x_0x_1\right\rangle = \mathbf{H}\left| x_0\right\rangle \otimes \mathbf{H}\left| x_1\right\rangle\\  &= \displaystyle \frac{1}{\sqrt{2}}\big(\left|0\right\rangle + (-1)^{x_0} \left|1\right\rangle \otimes \frac{1}{\sqrt{2}}\big(\left|0\right\rangle + (-1)^{x_1} \left|1\right\rangle\\  &= \displaystyle \big(\frac{1}{\sqrt{2}}\big)^2 \big( \left|00\right\rangle + (-1)^{x_0} \left|01\right\rangle + (-1)^{x_1}\left|10\right\rangle + (-1)^{x_1 + x_0}\left|11\right\rangle\\  &= \displaystyle \big(\frac{1}{\sqrt{2}}\big)^2 \big( (-1)^{0\cdot x_1 + 0\cdot x_0} \left|00\right\rangle + (-1)^{0\cdot x_1 + 1\cdot x_0} \left|01\right\rangle + (-1)^{1\cdot x_1 + 0\cdot x_0}\left|10\right\rangle + (-1)^{1\cdot x_1 + 1\cdot x_0}\left|11\right\rangle\\  &= \displaystyle \big(\frac{1}{\sqrt{2}}\big)^2 \big( (-1)^{0\cdot x_1 + 0\cdot x_0} \left|0\right\rangle + (-1)^{0\cdot x_1 + 1\cdot x_0} \left|1\right\rangle + (-1)^{1\cdot x_1 + 0\cdot x_0}\left|2\right\rangle + (-1)^{1\cdot x_1 + 1\cdot x_0}\left|3\right\rangle\\  &= \displaystyle \big(\frac{1}{\sqrt{2}}\big)^2 \sum_{y=0}^{2^2-1} (-1)^{x\cdot y}\left|y\right\rangle  \end{array}

In general,

\displaystyle \mathbf{H}^{\otimes n}\left| x\right\rangle = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{x\cdot y}\left|y\right\rangle

Using the above formula, the action of \mathbf{H}^{\otimes n} on \left|0\right\rangle is

\displaystyle \mathbf{H}^{\otimes n} \left|0\right\rangle = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \left|y\right\rangle

The next step is to apply our operator \mathbf{U}_{f} which was defined above:

\begin{array}{rl}  \displaystyle \mathbf{U}_f\mathbf{H}^{\otimes n}\left| 0\right\rangle \otimes \frac{1}{\sqrt{2}} \big( \left|0\right\rangle - \left|1\right\rangle\big) &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \mathbf{U}_f\left|y\right\rangle \otimes \frac{1}{\sqrt{2}} \big( \left|0\right\rangle - \left|1\right\rangle\big)\\  &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \frac{1}{\sqrt{2}} \big( \mathbf{U}_f\left|y\right\rangle \otimes  \left|0\right\rangle - \mathbf{U}_f\left|y\right\rangle \otimes \left|1\right\rangle\big)\\  &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \frac{1}{\sqrt{2}} \big( \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle - \left|y\right\rangle \otimes \left|1\oplus f(y)\right\rangle\big)\\  \end{array}

Let’s break this down a bit. Since the value of f(y) is either 0 or 1, we have 2 cases:

When f(y) = 0,

\displaystyle \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle = \left|y\right\rangle \otimes  \left|0\right\rangle
\displaystyle \left|y\right\rangle \otimes  \left|1\oplus f(y)\right\rangle = \left|y\right\rangle \otimes  \left|1\right\rangle

which means

\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \frac{1}{\sqrt{2}} \big( \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle - \left|y\right\rangle \otimes \left|1\oplus f(y)\right\rangle\big) = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \left|y\right\rangle \otimes \frac{1}{\sqrt{2}} \big(  \left|0\right\rangle - \left|1\right\rangle\big)

When f(y) = 1,

\displaystyle \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle = \left|y\right\rangle \otimes  \left|1\right\rangle
\displaystyle \left|y\right\rangle \otimes  \left|1\oplus f(y)\right\rangle = \left|y\right\rangle \otimes  \left|0\right\rangle

which means

\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \frac{1}{\sqrt{2}} \big( \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle - \left|y\right\rangle \otimes \left|1\oplus f(y)\right\rangle\big) = \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1) \left|y\right\rangle \otimes \frac{1}{\sqrt{2}} \big(  \left|0\right\rangle - \left|1\right\rangle\big)

Combining these 2 we have

\begin{array}{rl}  \displaystyle \mathbf{U}_f\mathbf{H}^{\otimes n}\left| 0\right\rangle \otimes \frac{1}{\sqrt{2}} \big( \left|0\right\rangle - \left|1\right\rangle\big)  &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} \frac{1}{\sqrt{2}} \big( \left|y\right\rangle \otimes  \left|0\oplus f(y)\right\rangle - \left|y\right\rangle \otimes \left|1\oplus f(y)\right\rangle\big)\\  &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \otimes \frac{1}{\sqrt{2}} \big(  \left|0\right\rangle - \left|1\right\rangle\big)  \end{array}

Finally, getting the Hadamard action on this gives:

\begin{array}{rl}  \displaystyle \mathbf{H}^{\otimes n}\otimes\mathbf{H}\Big[\mathbf{U}_f\mathbf{H}^{\otimes n}\left| 0\right\rangle \otimes \frac{1}{\sqrt{2}} \big( \left|0\right\rangle - \left|1\right\rangle\big)\Big] &=\mathbf{H}^{\otimes n} \mathbf{U}_f\mathbf{H}^{\otimes n}\left| 0\right\rangle \otimes \mathbf{H}\Big[\frac{1}{\sqrt{2}} \big( \left|0\right\rangle - \left|1\right\rangle\big)\Big]\\  &= \underbrace{\mathbf{H}^{\otimes n} \Big[\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \Big]}_{A} \otimes \underbrace{\mathbf{H}\Big[\frac{1}{\sqrt{2}} \big(  \left|0\right\rangle - \left|1\right\rangle\big)\Big]}_{B}  \end{array}

Expanding underbrace B gives us:

\begin{array}{rl}  \displaystyle \mathbf{H}\Big[\frac{1}{\sqrt{2}} \big(  \left|0\right\rangle - \left|1\right\rangle\big)\Big] &= \displaystyle \frac{1}{\sqrt{2}}\big(\mathbf{H}\left|0\right\rangle - \mathbf{H}\left|1\right\rangle\big)\\  &= \displaystyle \frac{1}{\sqrt{2}}\Big[\frac{1}{\sqrt{2}} \big( \left|0\right\rangle + \left|1\right\rangle \big) - \frac{1}{\sqrt{2}}\big( \left|0\right\rangle - \left|1\right\rangle\big) \Big]\\  &= \displaystyle \frac{1}{2}\Big[ 2\left|1\right\rangle \Big]\\  &= \left|1\right\rangle  \end{array}

Expanding underbrace A above:

\begin{array}{rl}  \mathbf{H}^{\otimes n} \Big[\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \Big] &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)}\mathbf{H}^{\otimes n} \left|y\right\rangle  \end{array}

We have found earlier that

\displaystyle \mathbf{H}^{\otimes n}\left| y\right\rangle = \frac{1}{2^{n/2}}\sum_{x=0}^{2^n-1} (-1)^{x\cdot y}\left|x\right\rangle

Substituting this to the above and noting that

\displaystyle f(y) = y\cdot a = \sum_{i=0}^{n-1} y_ia_i \mod 2,

we get:

\begin{array}{rl}  \mathbf{H}^{\otimes n} \Big[\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \Big] &= \displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)}\mathbf{H}^{\otimes n} \left|y\right\rangle \\  &= \displaystyle \frac{1}{2^{n}}\sum_{y=0}^{2^n-1} \sum_{x=0}^{2^n-1} (-1)^{f(y)}(-1)^{y\cdot x}\left|x\right\rangle\\  &= \displaystyle \frac{1}{2^{n}}\sum_{y=0}^{2^n-1} \sum_{x=0}^{2^n-1} (-1)^{y\cdot a}(-1)^{y\cdot x}\left|x\right\rangle\\  &= \displaystyle \frac{1}{2^{n}}\sum_{x=0}^{2^n-1} \underbrace{\sum_{y=0}^{2^n-1} (-1)^{(a+x)\cdot y}}\left|x\right\rangle\\  \end{array}

where we have interchanged the order of the summation. To simplify the term with the underbrace, notice that

\begin{array}{rl}  \displaystyle \Big[1+ (-1)^{z_0}\Big]\Big[1+(-1)^{z_1}\Big] &= 1 + (-1)^{z_0} + (-1)^{z_1} + (-1)^{z_0+z_1}\\  &= \displaystyle (-1)^{0\cdot z_0 + 0\cdot z_1} + (-1)^{1\cdot z_0 + 0\cdot z_1} + (-1)^{0\cdot z_0 + 1\cdot z_1} + (-1)^{1\cdot z_0+ 1\cdot z_1}\\  &= \displaystyle(-1)^{0\cdot z} + (-1)^{1\cdot z} + (-1)^{2\cdot z} + (-1)^{3\cdot z}\\  &= \displaystyle \sum_{y=0}^{2^2-1} (-1)^{y\cdot z}  \end{array}

We can therefore write the factor in underbrace as

\begin{array}{rl}  \mathbf{H}^{\otimes n} \Big[\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \Big] &= \displaystyle \frac{1}{2^{n}}\sum_{x=0}^{2^n-1} \underbrace{\sum_{y=0}^{2^n-1} (-1)^{(a+x)\cdot y}}\left|x\right\rangle\\  &= \displaystyle \frac{1}{2^{n}}\sum_{x=0}^{2^n-1} \prod_{i=0}^{n-1} \Big[ \underbrace{1 +(-1)^{(a_i+x_i)}}\Big] \left|x\right\rangle\\  \end{array}

Notice that

\displaystyle 1 +(-1)^{(a_i+x_i)} = \begin{cases}  0 & x\ne a\\  2 & x=a  \end{cases}

that is, the product \prod_{i=0}^{n-1} \Big[1 + (-1)^{(a_i+x_i)}\Big] = 0 unless x=a. Therefore

\begin{array}{rl}  \mathbf{H}^{\otimes n} \Big[\displaystyle \frac{1}{2^{n/2}}\sum_{y=0}^{2^n-1} (-1)^{f(y)} \left|y\right\rangle \Big]  &= \displaystyle \frac{1}{2^{n}}\sum_{x=0}^{2^n-1} \prod_{i=0}^{n-1} \Big[ \underbrace{1 +(-1)^{(a_i+x_i)}}\Big] \left|x\right\rangle\\  &= \displaystyle \frac{1}{2^{n}} 2^n \left|a\right\rangle\\  &= \left|a\right\rangle  \end{array}

Therefore, the quantum computation gives the answer in one application of \mathbf{U}_f:

\displaystyle \mathbf{H}^{\otimes n+1}\mathbf{U}_f \mathbf{H}^{\otimes n+1} \left|0\right\rangle = \left| a\right\rangle \left| 1\right\rangle

So what just happened? We began with a superposition of states and finished with the correct answer in one application of the black box operator \mathbf{U}_f. That’s the power of Quantum Computing!

Have You Seen A Half-Balloon?

Balloons come in all shapes and sizes but the shape the comes to mind when I hear the word balloon is that of a spherical or round shape. Given a spherical balloon, create an imaginary partition that divides the balloon into left and right hemispheres. This effectively places the air molecules into left and right. The number of molecules in the left is more or less the same as the number of molecules on the right. This is what we see in real life.

As we pump air into the balloon, a given air molecule can randomly go left or right of the partition. If all air molecules for some reason go left (or right), then we have a situation called a “half-balloon”.

So why aren’t we seeing half-balloons? It’s because that situation is highly improbable. For illustration purposes, imagine there are 6 air molecules. Each molecule has two equally likely choices, be on the left or on the right. Now imagine each air molecule has chosen it’s “side”. A possible combination or configuration (a term we will use moving forward) will be 3 molecules choose Left and 3 choose Right:

L  L  L  R  R  R

Where L stands for Left and R stands for Right.

In fact, the number of such configurations is equal to

\displaystyle \underbrace{2\times 2\times \ldots \times 2}_{\text{6 times}} = 2^6 = 64

We enumerate below the list of possible air molecule configurations:


# This is the code in R
# x=c("L","R")
# expand.grid(m1=x,m2=x,m3=x,m4=x,m5=x,m6=x)
# d1=expand.grid(m1=x,m2=x,m3=x,m4=x,m5=x,m6=x)
   m1 m2 m3 m4 m5 m6
1   L  L  L  L  L  L
2   R  L  L  L  L  L
3   L  R  L  L  L  L
4   R  R  L  L  L  L
5   L  L  R  L  L  L
6   R  L  R  L  L  L
7   L  R  R  L  L  L
8   R  R  R  L  L  L
9   L  L  L  R  L  L
10  R  L  L  R  L  L
11  L  R  L  R  L  L
12  R  R  L  R  L  L
13  L  L  R  R  L  L
14  R  L  R  R  L  L
15  L  R  R  R  L  L
16  R  R  R  R  L  L
17  L  L  L  L  R  L
18  R  L  L  L  R  L
19  L  R  L  L  R  L
20  R  R  L  L  R  L
21  L  L  R  L  R  L
22  R  L  R  L  R  L
23  L  R  R  L  R  L
24  R  R  R  L  R  L
25  L  L  L  R  R  L
26  R  L  L  R  R  L
27  L  R  L  R  R  L
28  R  R  L  R  R  L
29  L  L  R  R  R  L
30  R  L  R  R  R  L
31  L  R  R  R  R  L
32  R  R  R  R  R  L
33  L  L  L  L  L  R
34  R  L  L  L  L  R
35  L  R  L  L  L  R
36  R  R  L  L  L  R
37  L  L  R  L  L  R
38  R  L  R  L  L  R
39  L  R  R  L  L  R
40  R  R  R  L  L  R
41  L  L  L  R  L  R
42  R  L  L  R  L  R
43  L  R  L  R  L  R
44  R  R  L  R  L  R
45  L  L  R  R  L  R
46  R  L  R  R  L  R
47  L  R  R  R  L  R
48  R  R  R  R  L  R
49  L  L  L  L  R  R
50  R  L  L  L  R  R
51  L  R  L  L  R  R
52  R  R  L  L  R  R
53  L  L  R  L  R  R
54  R  L  R  L  R  R
55  L  R  R  L  R  R
56  R  R  R  L  R  R
57  L  L  L  R  R  R
58  R  L  L  R  R  R
59  L  R  L  R  R  R
60  R  R  L  R  R  R
61  L  L  R  R  R  R
62  R  L  R  R  R  R
63  L  R  R  R  R  R
64  R  R  R  R  R  R

Looking at the data above, we can see that there are 6 configurations that contain only 1 “L”:

# This is the code in R
# cc=c()
# for(i in 1:64){
#  tmp=d1[i,]
#  cc=c(cc,sum(tmp == "L"))
# }
# d1$count = cc
# d1[d1$count == 1, ]
   m1 m2 m3 m4 m5 m6 count
32  R  R  R  R  R  L     1
48  R  R  R  R  L  R     1
56  R  R  R  L  R  R     1
60  R  R  L  R  R  R     1
62  R  L  R  R  R  R     1
63  L  R  R  R  R  R     1

This means that the probability of getting a configuration having one molecule on the left and 5 molecules on the right is:

\displaystyle \text{Probability of 1 molecule Left and 5 molecules Right} = \frac{6}{2^6} = 0.093750

If we continue summarizing our data so that we get the number of combinations containing 2, 3, 4, 5, and 6 “L”s, we can generate what is known as a probability distribution of air molecules on the left hemisphere:

# cc=c()
# for(i in 0:6){
#  cc=c(cc,length(attributes(d1[d1$count == i, ])$row.names))
# }
# data.frame(X=0:6,count=cc,probability=cc/2^6)
  X count probability
1 0     1    0.015625
2 1     6    0.093750
3 2    15    0.234375
4 3    20    0.312500
5 4    15    0.234375
6 5     6    0.093750
7 6     1    0.015625

The graph of this probability distribution is shown below:

# Here is the code that generated the plot above
barplot(cc/2^6,names.arg=0:6,main="Probability Distribution \nNumber of Air Molecules on the Left Hemisphere")

The graph above tells us that the most probable configuration of balloon molecules is that half of them are on the right and the other half on the left as shown by the high probability of the value X = 3. It also tells us the probability of all molecules choosing the Left side equal to 0.015625. When the number of air molecules is very large, this probability will turn out to be extremely small, as we will show below.

The Mathematics

At this point, manually counting the number of rows will not be practical anymore for a large number of molecules. We need to use some mathematical formula to compute the combinations. We don’t really care which molecules chose left or right. We just care about the number of molecules on the left. Given N molecules, there are 2^N possible configurations of air molecules. Of these configurations, there are

\displaystyle {N \choose m } = \frac{N!}{m! (N-m)!}

combinations that have m molecules on the left. Therefore, the probability of a configuration having m molecules on the left is

P(m) = \displaystyle \frac{\displaystyle {N\choose m}}{\displaystyle 2^N}

This is a probability density function since

\displaystyle \sum_{m=0}^N P(m) = \displaystyle \sum_{m=0}^N \frac{\displaystyle {N\choose m}}{\displaystyle 2^N} = 1

To show this, we will use the Binomial Theorem

\displaystyle \sum_{m=0}^N {N \choose m} x^m = (1+x)^N

If we let x=1, the Binomial Theorem gives us

\displaystyle \sum_{m=0}^N {N \choose m} = (1+1)^N = 2^N

Therefore

\begin{array}{rl}  \displaystyle \sum_{m=0}^N P(m) &= \displaystyle \sum_{m=0}^N \frac{\displaystyle {N\choose m}}{\displaystyle 2^N}\\  &= \displaystyle \frac{1}{2^N} \sum_{m=0}^N {N\choose m}\\  &= \displaystyle \frac{1}{2^N} 2^N\\  &= 1  \end{array}

A Mole of Air

One mole of air contains 6.022 \times 10^{23} . This means the probability of that all molecules are on the left side of the balloon is

\displaystyle \frac{1}{\displaystyle 2^{6.022 \times 10^{23}}} < 1/1024^{10^{22}} < 1/(10^3)^{10^{22}}=10^{-30,000,000,000,000,000,000,000}

This is a very small number such that it contains 30 million trillion zeros to the right of the decimal point. When you write this zeros on a piece of paper with a thickness of 0.05 millimeters, you would need to stack it up to a height 74 times the distance of earth to pluto in kilometers!

An Interview Question

I was given this interview question and I’d like to share it to you. The setting is back in the days when the largest size of a hard disk was 1 GB and there were no CD writers yet and the only way to back up your data is through those 1.44 floppy disks. You want to back up your files but you want to minimize the number of floppy disks you need to use. Assume your most important files are in a single directory. How will you distribute the files across your disks in such a way that the number of disks you use is minimized ?

Click here to see the Integer Programming Solution.

To make this simple, let’s assume the following:

– we will not take into account that every file you copy to the disk has a record of the metadata of the file and stored on the disk as well. This will eat up space as you put more files. For our purposes, we ignore this complexity.
– The size of each file is less than or equal to 1.44

First we need to have a list of those files including the size and sort the list according to size in descending order. If A is the list of files, we can apply this algorithm:

B := list of files to copy to current floppy disk
remaining_size := 1.44 MB
For file in A:
    If remaining_size - file.size > 0:
        B.add(file)
        A.remove(file)
Copy all files listed in B to disk
Empty B
Repeat process for remaining files in A

Although there are other better algorithms than the one above, this is the one I managed to to come up during the interview.

We now need to determine how fast our algorithm can run.

Worst Case Complexity

How slow can this algorithm get ? If for any two files F_i and F_j in A we have F_i + F_j > 1.44, then all files will have their own diskette. If this is the case, for each file, our algorithm will execute step 4. For the first disk, it will execute the step N times. For the second disk, it will execute the step N-1 times, for the third disk it will execute N-2 times, etc. the total number of times it executes step 4 is the total number of comparisons and is equal to the summation:

\displaystyle \sum_{1=1}^{N} i

which is equal to

\displaystyle \frac{N(N+1)}{2}

Therefore, in the worst case, the complexity is O(N^2).

Best Case Complexity

The best case is when all files fit in just one diskette. For this, the total number of comparisons is N

Average Case Complexity

On the average, files have different sizes. We now compute the complexity on the assumption that the probability distribution is uniform.

If k is the number of diskettes, the number of comparisons is a sequence of monotonic decreasing numbers \{ a_1, a_2, a_3, \ldots, a_k \} taken at random from the set \{ 1, 2, \ldots, N\}. Each of the numbers a_j, j\in \{1, 2, \ldots, k\} has a probability 1/N of being chosen. Let X be a random variable such that

\displaystyle Pr(X=a_j) = \frac{1}{N} \text{ for } j=1,2,\ldots,k

then the number of comparisons C is equal to

\displaystyle C = \sum_{i=1}^k X = kX

The expected value of C is given by the

E[C] = E[kX] = kE[X]

However, the expected value of X is given by

\displaystyle E[X] = \sum_{j=1}^N j\cdot Pr(X=j) = \frac{1}{N} \sum_{j=1}^N j = \frac{1}{N}\frac{N(N+1)}{2} = \frac{N+1}{2}

Therefore,

\displaystyle E[C] = k\frac{N+1}{2}

What remains is to determine the average value of k, which is the number of diskettes. If M=1.44 is the maximum file size, the average file size is M/2. The average total file size is then NM/2. The average number of diskettes is equal to the average total size divided by size of diskette, that is

k = \displaystyle \frac{NM}{2}\frac{1}{M} = \frac{N}{2}

This means that

\displaystyle E[C] = \frac{N}{2} \frac{N+1}{2} = O(N^2)

which is the same as the worst case complexity.

There is another way to solve this problem using Integer Programming.

Asynchronous Versus Synchronous: A Performance Perspective

I just had my coffee from my favorite coffee shop down the corner. I’m a regular at that coffee shop that the baristas know me and I’m able to sit down after I order and do something else while they make my coffee and serve to me when ready.  If I order from other branches of this coffee shop, I would have to wait for my coffee in a corner and not being able to do anything productive.  So while I was seated comfortable waiting for my coffee, a thought came to me. This is a great example of synchronous versus asynchronous service invocation.

A synchronous service invocation is just like buying your coffee and having to wait in the corner for the baristas to complete your coffee before you can even sit down and do something useful. Asynchronous service invocation is having to get seated and do something else while waiting for your coffee to be served.  My instinct tells me that asynchronous invocation is better than synchronous in terms of performance especially when it takes a long for the service to complete and the waiting time is much better used doing something else. But how can we show that this is the case?

We can model the synchronous/asynchronous invocation as a Markov Chain and compute a few metrics from our model. If you’re not familiar with this approach, you can refer to this article MODELING QUEUING SYSTEMS USING MARKOV CHAINS.

First, we model the asynchronous invocation. We can identify each state of our system by  2 slots each of which contains a number. The first slot is the number of responses waiting for the client to process and the second slot is the number of requests waiting for the server to process.  To simplify the modeling, we assume the following:

  1. The maximum number of requests that a server can handle at any given time is 2. Which means that a client cannot anymore send a request if the second number is equal to 2.
  2. When the server responds, the first number increments by 1. Once a client receives a response, it will stop whatever it is doing and process that response. When the client finishes processing the response, the first number goes down to zero. As a consequence, this assumption says that the maximum value of slot number 1 is 1 at any given time.

With these assumptions, we can draw the Markov Diagram of the asynchronous system to come up with the below diagram:

Explanation of the Diagram

The system initially starts at 00 where there are no requests and no responses. It will then transition to state 01 when the client will send a  request which increments the server’s pending requests to 1. From this state, the system can transition to either state 02 or state 10. State 02 occurs when the client sends another request without waiting for a response. State 10 is when the server is able to process the request and return a response, thereby decrementing it’s pending items and incrementing the client’s number of responses to process.

At state 02, the client cannot anymore send requests since the maximum requests that the server can handle is 2. At this state, it can only transition to state 11, where the server has processed one of the requests (thus decrementing the second slot from 2 to 1 and incrementing the first slot from 0 to 1).

State 11 can only transition to 01 since the client will stop whatever it is doing to process the single response it has received (thus decrementing the first slot from 1 to 0).

The numbers a,b,c are the rates at which the transitions will occur. For example, the rate at which the system will transition from state 00 to state 01 is b.

In the remainder of this article, we assume that the client can request at the rate of 60 requests per unit time and can process the response at 60 per unit time. We also assume that the server can process requests at 30 per unit time. We therefore have the following:

a=60,  b=60,  c=30

Computing the Probabilities

At steady state, the flow going into each state is equal to the flow out of that state. For example, for state 00, the following is true:

\displaystyle bP_{00} = aP_{10}

Doing this for all states, we get the following balance equations:

\begin{array}{rlr}  \displaystyle  bP_{00} &= aP_{10} & (1)\\  bP_{00} + a P_{11} &= bP_{01} + cP_{01} & (2)\\  bP_{01} &= cP_{02} & (3)\\  aP_{10} &= cP_{01} & (4)\\  aP_{11} &= cP_{02} & (5)  \end{array}

Since the sum of the probabilities should be equal to 1, we have our last equation:

P_{00} + P_{01} + P_{02} + P_{10} + P_{11} = 1

We have 6 equations in 5 unknowns, one of the equations above is actually redundant. In fact, you can show that equation 2 above is redundant.

We can form the matrix equation of the system of equations above and solve for the probabilities:

\begin{bmatrix}  -b & 0 & 0 & a & 0 \\  0 & b & -c & 0 & 0 \\  0 & -c & 0 & a & 0 \\  0 & 0 & -c & 0 & a \\  1 & 1 & 1 & 1 & 1  \end{bmatrix}  \begin{bmatrix}  P_{00}\\  P_{01}\\  P_{02}\\  P_{10}\\  P_{11}  \end{bmatrix}  =  \begin{bmatrix}  0\\  0\\  0\\  0\\  1  \end{bmatrix}

Solving this matrix equation when a=60, b=60 and c=30, we can find the probabilities:

\begin{bmatrix}  P_{00}\\  P_{01}\\  P_{02}\\  P_{10}\\  P_{11}  \end{bmatrix}  =  \displaystyle  \frac{1}{ab^2+abc+ac^2+b^2c+bc^2}  \begin{bmatrix}  ac^2\\  abc\\  ab^2\\  bc^2\\  b^2c  \end{bmatrix}  =  \begin{bmatrix}  1/10 \\ 1/5 \\2/5\\ 1/10\\ 1/5  \end{bmatrix}

Utilization and Throughput: Asynchronous Case

The client utilization is defined to be the probability that the client is busy. In the same way, the server utilization is the probability that the server is busy. Looking at the diagram, the client is busy sending requests at state 00 and state 01. It is busy processing responses at state 10 and 11.  On the other hand, the server is busy processing requests at state 01 and 02.

Therefore, the client utilization is  equal to

\begin{array}{rl}  U_{\text{client}} &= P_{00} + P_{01} + P_{10} + P_{11}\\  &= 1/10 + 1/5 + 1/10 + 1/5\\  &= 3/5\\  &= 60\%  \end{array}

The server utilization is equal to

\begin{array}{rl}  U_{\text{server}} &= P_{01} + P_{02}\\  &= 1/5 + 2/5 \\  &= 3/5 \\  &= 60\%  \end{array}

The system througput is the number of requests the client is able to submit and is equal to

\begin{array}{rl}  X &= 60P_{00} + 60 P_{01} \\  &= 60*1/10 + 60 * 1/5 \\  &= 18  \end{array}

Comparison with Synchronous Invocation

For the synchronous invocation, the client will submit a request at state 00. The server will then receive this request and process it immediately at state 01. The client will get the response and process it at state 10 and do the loop again at state 00. Please see the Markov diagram below describing this process.

We can solve the probabitlies of this Markov Chain by solving the balance equation

\begin{bmatrix}  b & 0 & -a \\  0 & c & -a \\  1 & 1 & 1  \end{bmatrix}  \begin{bmatrix}  P_{00}\\  P_{01}\\  P_{10}  \end{bmatrix}  =  \begin{bmatrix}  0\\0\\1  \end{bmatrix}

Solving for the probabilities, we get

\begin{bmatrix}  P_{00}\\  P_{01}\\  P_{10}  \end{bmatrix}  =  \displaystyle  \frac{1}{ab + ac + bc}  \begin{bmatrix}  ac\\  ab\\  bc  \end{bmatrix}  =  \begin{bmatrix}  1/4\\ 1/2\\ 1/4  \end{bmatrix}

At state 00, the client is busy sending requests at the rate of 60 requests per unit time, therefore the system throughput is

\begin{array}{rl}  X &= 60 P_{00} \\  &= 60/4 \\  &= 15  \end{array}

which is less than the Asynchronous case. In addition, the client utilization is

\begin{array}{rl}  U_{\text{client}} &= P_{00} + P_{10}\\  &= 1/4 + 1/4\\  &= 1/2  \end{array}

This is lower than the Asynchronous case where the client utilization is 60%.

Therefore, the asynchronous service invocation is indeed better compared to the synchronous case. It has a much higher throughput and a much higher utilization compared to the synchronous case.

P.S. I used Mathics to solve the matrix equations. Mathics is an open source alternative to Mathematica. It features Mathematica-compatible syntax and functions so if you’re a user of Mathematica you can run basic commands in Mathics. If you’re a Mathics user, you can easily switch to Mathematica as well. Please visit http://mathics.github.io/ to know more about Mathics.