## RSA Encryption and Quantum Computing

We are going to crack the RSA by finding the period of the ciphertext. Let m be the message, c the ciphertext, e the encoding number and N=pq such that

$c=m^e \mod N$

From Period Finding and the RSA post, the period of a ciphertext is defined as the smallest number r such that

$c^r \equiv \mod N$

First we prepare $n=2n_0$ QBits at the state $\left|0\right\rangle$ where $n_0$ is the number of bits of $N=pq$. The number of bits can be computed using the formula:

$\lceil \log_2(N) \rceil$

We also prepare $n_0$ QBits for the output register.

So initially we have the following setup:

$\underbrace{\left|00\ldots 0\right\rangle}_{n\text{ times}} \otimes \underbrace{\left|00\ldots0\right\rangle}_{n_0\text{ times}}$

The output register will contain the output of the operator:

$\mathbf{U}_f \left|x\right\rangle\left|0\right\rangle = \left|x\right\rangle\left|f(x)\right\rangle$

where

$f(x) = c^x \mod N$

We now apply the Hadarmard gate to the input register and apply the operator $\mathbf{U}_f$:

$\displaystyle \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} \left|x\right\rangle\left|f(x)\right\rangle$

The Hadamard gates transform the state into a superposition of all possible values of the period while the operator $\mathbf{U}_f$ transforms the output QBits as superposition of all possible values of $c^x\mod N$.

Next we make a measurement of the output QBits. This will give us a number $f_0$ which corresponds to all states whose value of $\left|x\right\rangle$ is of the form $x_0+kr$ since

$\begin{array}{rl} c^{x_0+kr} & \equiv c^{x_0}c^{kr} \mod N \\ & \equiv c^{x_0}(c^{r})^k \mod N \\ & \equiv c^{x_0} \mod N \end{array}$

since $(c^{r})^k \equiv 1 \mod N$ by definition of a period.

Therefore, after the measurement, the input state is now

$\displaystyle \left|\Psi\right\rangle = \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \left|x_0 + kr\right\rangle$

where m is the smallest integer such that $x_0 + mr \ge 2^{n}$.

We will then apply the quantum Fourier transform to state $\left|\Psi\right\rangle$. The Quantum Fourier Transform is defined by its action on an arbitrary $\left|x\right\rangle$

$\displaystyle \mathbf{U}_{\mathbf{FT}}\left|x\right\rangle = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi ixy/2^n}\left|y\right\rangle$

Applying the Fourier Transform to the state $\left|\Psi\right\rangle$,

$\begin{array}{rl} \displaystyle \mathbf{U}_{\mathbf{FT}} \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1}\left|x_0 + kr\right\rangle &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1}\mathbf{U}_{\mathbf{FT}} \left|x_0 + kr\right\rangle\\ &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi i(x_0+kr)y/2^n}\left|y\right\rangle\\ &= \displaystyle \frac{1}{\sqrt{m}} \sum_{k=0}^{m-1} \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} e^{2\pi ix_0y/2^n}e^{2\pi ikry/2^n}\left|y\right\rangle\\ &= \displaystyle \sum_{y=0}^{2^n-1} \underbrace{e^{2\pi ix_0y/2^n} \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{2\pi ikry/2^n}}_{\text{probability amplitude}}\left|y\right\rangle \end{array}$

Now if we make a measurement of this new state, the probability of getting a value of y is the probability amplitude multiplied by it’s complex conjugate:

$\displaystyle \Big[\underbrace{e^{2\pi ix_0y/2^n} } \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{2\pi ikry/2^n}\Big]\Big[\underbrace{e^{-2\pi ix_0y/2^n} } \frac{1}{\sqrt{2^n m}} \sum_{k=0}^{m-1} e^{-2\pi ikry/2^n}\Big]$

The factors in underbrace evaluates to 1 when multiplied and we’re left with

$\displaystyle \frac{1}{2^n m}\Big| \underbrace{\sum_{k=0}^{m-1} e^{2\pi ikry/2^n}}\Big|^2$

Using the formula

$e^{i\theta} = \cos(\theta) + i\sin(\theta)$

we can write the summation in underbrace as

$\displaystyle \Big(\sum_{k=0}^{m-1} \cos(2\pi kry/2^n)\Big)^2 + \Big(\sum_{k=0}^{m-1} \sin(2\pi kry/2^n)\Big)^2$

We’d like to know at what value of y the expression above is maximum. We can use the following formula to simplify above summations:

$\displaystyle \sum_{k=0}^{m-1} \cos(2\pi fk) = \frac{\cos(\pi f(m-1)) \sin(\pi fm)}{\sin(\pi f)}\\ \displaystyle \sum_{k=0}^{m-1} \sin(2\pi fk) = \frac{\sin(\pi f(m-1)) \sin(\pi fm)}{\sin(\pi f)}$

If we let

$f = ry/2^n$

this reduces the summations to:

$\displaystyle \Big(\sum_{k=0}^{m-1} \cos(2\pi kry/2^n)\Big)^2 + \Big(\sum_{k=0}^{m-1} \sin(2\pi kry/2^n)\Big)^2$

$\begin{array}{rl} &=\displaystyle \frac{\cos^2(\pi f(m-1)) \sin^2(\pi fm)}{\sin^2(\pi f)} + \frac{\sin^2(\pi f(m-1))\sin^2(\pi fm))}{\sin^2(\pi f)}\\ &= \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)}\Big[\underbrace{\cos^2(\pi f(m-1)) + \sin^2(\pi f(m-1))}_{\text{= 1}}\Big]\\ &= \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)} \end{array}$

Therefore, the probability of getting a specific y is

$p(y) = \displaystyle \frac{1}{2^nm} \displaystyle \frac{\sin^2(\pi fm)}{\sin^2(\pi f)}$

Here is a plot of the function $\sin^2(\pi fm)/\sin^2(\pi f)$ for m=4:

Notice that the graph attains its maximums when the values of f are integer values, that is when

$\displaystyle f = j= \frac{ry}{2^n} \text{ where } j= 1, 2, 3, 4, \ldots, r$

Since y is an integer value, this means that the value of y must be close to $j2^n/r$. We can use a theorem in Number theory which states that if $p/q$ is a rational number that approximates a real number x and

$\displaystyle \Big|\frac{p}{q} - x\Big| \le \frac{1}{2q}$

we can approximate x using continued fraction expansion of p/q. In fact, if we find a y that is within 1/2 of one of the $j2^n/r$ values, we have

$\displaystyle \left| y-j\frac{2^n}{r}\right| \le \displaystyle \frac{1}{2}\\$

and dividing both sides by $2^n$, we get

$\displaystyle \left| \frac{y}{2^n}-\frac{j}{r}\right| \le \displaystyle \frac{1}{2^{n+1}}$

which satisfies the theorem. We can therefore expand $y/2^n$ via continued fraction expansion and continue until we find a denominator $r$ that is less that $2^{n_0}$. We then test whether r could be our period if it satisfies

$\displaystyle c^r \equiv 1$

If it satisfies the above, then r is the period. It’s just a matter of computing $d^\prime$, the inverse of $e$ modulo r satisfying

$\displaystyle ed^\prime \equiv 1 \mod r$

Having computed $d^\prime$, we can then decrypt the ciphertext using

$m = c^{d^\prime} \mod N$

In our example in period finding, let’s suppose that the quantum computer gave us the following number

$y=1446311$

We can expand the fraction $y/2^n = 1446311/16777216$ as a continued fractions as follows:

$\displaystyle \frac{y}{2^n} = \frac{1446311}{16777216} = \frac{1}{11 + \displaystyle \frac{1}{1+\displaystyle\frac{1}{1+\displaystyle\frac{1}{1}}}} = \frac{5}{58}$

or

$\displaystyle \frac{y}{2^n} = \frac{1446311}{16777216} = \frac{1}{11 + \displaystyle \frac{1}{1+\displaystyle\frac{1}{1+\displaystyle\frac{1}{\displaystyle\frac{1}{6886}}}}} = \frac{34433}{399423}$

We can stop until $q=58$ since $q=58 < 2^{n_0} = 2^{12} = 4096$. So we use the first expansion where $q=58$ and testing it

$\begin{array}{rl} \displaystyle c^{58} \mod 3127 &= \displaystyle 794^{58} \mod 3127\\ &= \displaystyle 1 \mod 3127 \end{array}$

which confirms that $q=58$ is a period. Using $r=58$ in the equation

$\displaystyle ed^\prime = m\times r + 1$

and solving for $d^\prime$ (which is the inverse of e modulo 58) we get

$\displaystyle d^\prime = 25$

We can now get the original message using

$\displaystyle 0794^{25} \mod 3127 = 1907$

which agrees with our original message.

How lucky do we need to get in order for the state to collapse to one of these y values close to $j2^n/r$? Let’s evaluate the probability for

$y_j=\displaystyle j\frac{2^n}{r} \pm \delta_j$

where $\delta_j$ is the distance of the closest y to $j2^n/r$. Since $f = ry/2^n$, we have

$\begin{array}{rl} p(y_j) = \displaystyle \frac{1}{2^nm}\frac{\sin^2(\pi fm)}{\sin^2(\pi f)} &= \displaystyle \frac{1}{2^nm} \frac{\sin^2(\pi mr\displaystyle\frac{y}{2^n})}{\sin^2(\pi r\displaystyle\frac{y}{2^n}) }\\ &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle\frac{\pi mr}{2^n}(j\frac{2^n}{r} \pm \delta_j)\Big]}{\sin^2\Big[\displaystyle\frac{\pi r}{2^n}(j\frac{2^n}{r} \pm \delta_j)\Big] }\\ &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle j\pi m \pm \frac{\pi m r\delta_j}{2^n})\Big]}{\sin^2\Big[\displaystyle j\pi \pm \frac{\pi r\delta_j}{2^n}\Big] }\\ &= \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle \frac{\pi m r\delta_j}{2^n})\Big]}{\sin^2\Big[\displaystyle \frac{\pi r\delta_j}{2^n}\Big] } \end{array}$

We can approximate the denominator using the small angle approximation of sine:

$\sin(x) \approx x$

to get

$\displaystyle p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2\Big[\displaystyle \frac{\pi m r\delta_j}{2^n})\Big]}{\left[\displaystyle \frac{\pi r\delta_j}{2^n}\right]^2}$

Since m is the smallest integer such that $x_0 + mr \ge 2^{n}$, then

$m = \displaystyle \left\lceil\frac{2^n}{r}\right\rceil$

We can use this to approximate

$\displaystyle m\frac{r}{2^n} \approx 1$

to get

$p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2(\displaystyle \pi\delta_j)}{\left(\displaystyle \frac{\pi\delta_j}{m}\right)^2}$

Since $0 \le \delta_j \le 1/2$, we can see from the graph below that the line joining (0,0) and $(\pi/2,1)$, whose equation is $2x/\pi$ is less that the value of the sine function at that interval, that is,

$\displaystyle \frac{2x}{\pi} \le \sin(x)$

Using this inequality, we can get a lower bound of the probability of getting a specific y value to be

$\begin{array}{rl} p(y_j) = \displaystyle \frac{1}{2^nm} \frac{\sin^2(\displaystyle \pi\delta_j)}{\left(\displaystyle \frac{\pi \delta_j}{m}\right)^2} &\ge \displaystyle \frac{1}{2^nm} \left(\frac{\displaystyle \frac{2\pi\delta_j}{\pi}}{\displaystyle \frac{\pi \delta_j}{m}}\right)^2\\ &= \displaystyle \frac{1}{2^nm} \left( \frac{2m}{\pi}\right)\\ &= \displaystyle \frac{m}{2^n}\frac{4}{\pi^2}\\ &= \displaystyle \frac{1}{r}\frac{4}{\pi^2} \end{array}$

Since there are r of these $y_j$‘s, the probability of getting any one of these $y_j$‘s is therefore:

$p(y) = \displaystyle \frac{4}{\pi^2} \approx 0.405$

which means that there is 40% probability of the state to collapse to a value of y that will give us the correct period.

We have just seen how a quantum computer can be used to crack the RSA. We have exploited the fact that the measurement will give us a value of y that is near $j2^n/r$ with high probability. We then computed for the period using the continued fraction expansion of $y/2^n$. With the period computed, it’s becomes straightforward to decrypt the ciphertext.

## Have You Seen A Half-Balloon?

Balloons come in all shapes and sizes but the shape the comes to mind when I hear the word balloon is that of a spherical or round shape. Given a spherical balloon, create an imaginary partition that divides the balloon into left and right hemispheres. This effectively places the air molecules into left and right. The number of molecules in the left is more or less the same as the number of molecules on the right. This is what we see in real life.

As we pump air into the balloon, a given air molecule can randomly go left or right of the partition. If all air molecules for some reason go left (or right), then we have a situation called a “half-balloon”.

So why aren’t we seeing half-balloons? It’s because that situation is highly improbable. For illustration purposes, imagine there are 6 air molecules. Each molecule has two equally likely choices, be on the left or on the right. Now imagine each air molecule has chosen it’s “side”. A possible combination or configuration (a term we will use moving forward) will be 3 molecules choose Left and 3 choose Right:

L  L  L  R  R  R


Where L stands for Left and R stands for Right.

In fact, the number of such configurations is equal to

$\displaystyle \underbrace{2\times 2\times \ldots \times 2}_{\text{6 times}} = 2^6 = 64$

We enumerate below the list of possible air molecule configurations:


# This is the code in R
# x=c("L","R")
# expand.grid(m1=x,m2=x,m3=x,m4=x,m5=x,m6=x)
# d1=expand.grid(m1=x,m2=x,m3=x,m4=x,m5=x,m6=x)
m1 m2 m3 m4 m5 m6
1   L  L  L  L  L  L
2   R  L  L  L  L  L
3   L  R  L  L  L  L
4   R  R  L  L  L  L
5   L  L  R  L  L  L
6   R  L  R  L  L  L
7   L  R  R  L  L  L
8   R  R  R  L  L  L
9   L  L  L  R  L  L
10  R  L  L  R  L  L
11  L  R  L  R  L  L
12  R  R  L  R  L  L
13  L  L  R  R  L  L
14  R  L  R  R  L  L
15  L  R  R  R  L  L
16  R  R  R  R  L  L
17  L  L  L  L  R  L
18  R  L  L  L  R  L
19  L  R  L  L  R  L
20  R  R  L  L  R  L
21  L  L  R  L  R  L
22  R  L  R  L  R  L
23  L  R  R  L  R  L
24  R  R  R  L  R  L
25  L  L  L  R  R  L
26  R  L  L  R  R  L
27  L  R  L  R  R  L
28  R  R  L  R  R  L
29  L  L  R  R  R  L
30  R  L  R  R  R  L
31  L  R  R  R  R  L
32  R  R  R  R  R  L
33  L  L  L  L  L  R
34  R  L  L  L  L  R
35  L  R  L  L  L  R
36  R  R  L  L  L  R
37  L  L  R  L  L  R
38  R  L  R  L  L  R
39  L  R  R  L  L  R
40  R  R  R  L  L  R
41  L  L  L  R  L  R
42  R  L  L  R  L  R
43  L  R  L  R  L  R
44  R  R  L  R  L  R
45  L  L  R  R  L  R
46  R  L  R  R  L  R
47  L  R  R  R  L  R
48  R  R  R  R  L  R
49  L  L  L  L  R  R
50  R  L  L  L  R  R
51  L  R  L  L  R  R
52  R  R  L  L  R  R
53  L  L  R  L  R  R
54  R  L  R  L  R  R
55  L  R  R  L  R  R
56  R  R  R  L  R  R
57  L  L  L  R  R  R
58  R  L  L  R  R  R
59  L  R  L  R  R  R
60  R  R  L  R  R  R
61  L  L  R  R  R  R
62  R  L  R  R  R  R
63  L  R  R  R  R  R
64  R  R  R  R  R  R


Looking at the data above, we can see that there are 6 configurations that contain only 1 “L”:

# This is the code in R
# cc=c()
# for(i in 1:64){
#  tmp=d1[i,]
#  cc=c(cc,sum(tmp == "L"))
# }
# d1$count = cc # d1[d1$count == 1, ]
m1 m2 m3 m4 m5 m6 count
32  R  R  R  R  R  L     1
48  R  R  R  R  L  R     1
56  R  R  R  L  R  R     1
60  R  R  L  R  R  R     1
62  R  L  R  R  R  R     1
63  L  R  R  R  R  R     1


This means that the probability of getting a configuration having one molecule on the left and 5 molecules on the right is:

$\displaystyle \text{Probability of 1 molecule Left and 5 molecules Right} = \frac{6}{2^6} = 0.093750$

If we continue summarizing our data so that we get the number of combinations containing 2, 3, 4, 5, and 6 “L”s, we can generate what is known as a probability distribution of air molecules on the left hemisphere:

# cc=c()
# for(i in 0:6){
#  cc=c(cc,length(attributes(d1[d1$count == i, ])$row.names))
# }
# data.frame(X=0:6,count=cc,probability=cc/2^6)
X count probability
1 0     1    0.015625
2 1     6    0.093750
3 2    15    0.234375
4 3    20    0.312500
5 4    15    0.234375
6 5     6    0.093750
7 6     1    0.015625


The graph of this probability distribution is shown below:

# Here is the code that generated the plot above
barplot(cc/2^6,names.arg=0:6,main="Probability Distribution \nNumber of Air Molecules on the Left Hemisphere")


The graph above tells us that the most probable configuration of balloon molecules is that half of them are on the right and the other half on the left as shown by the high probability of the value X = 3. It also tells us the probability of all molecules choosing the Left side equal to 0.015625. When the number of air molecules is very large, this probability will turn out to be extremely small, as we will show below.

## The Mathematics

At this point, manually counting the number of rows will not be practical anymore for a large number of molecules. We need to use some mathematical formula to compute the combinations. We don’t really care which molecules chose left or right. We just care about the number of molecules on the left. Given N molecules, there are $2^N$ possible configurations of air molecules. Of these configurations, there are

$\displaystyle {N \choose m } = \frac{N!}{m! (N-m)!}$

combinations that have $m$ molecules on the left. Therefore, the probability of a configuration having $m$ molecules on the left is

$P(m) = \displaystyle \frac{\displaystyle {N\choose m}}{\displaystyle 2^N}$

This is a probability density function since

$\displaystyle \sum_{m=0}^N P(m) = \displaystyle \sum_{m=0}^N \frac{\displaystyle {N\choose m}}{\displaystyle 2^N} = 1$

To show this, we will use the Binomial Theorem

$\displaystyle \sum_{m=0}^N {N \choose m} x^m = (1+x)^N$

If we let $x=1$, the Binomial Theorem gives us

$\displaystyle \sum_{m=0}^N {N \choose m} = (1+1)^N = 2^N$

Therefore

$\begin{array}{rl} \displaystyle \sum_{m=0}^N P(m) &= \displaystyle \sum_{m=0}^N \frac{\displaystyle {N\choose m}}{\displaystyle 2^N}\\ &= \displaystyle \frac{1}{2^N} \sum_{m=0}^N {N\choose m}\\ &= \displaystyle \frac{1}{2^N} 2^N\\ &= 1 \end{array}$

## A Mole of Air

One mole of air contains $6.022 \times 10^{23}$. This means the probability of that all molecules are on the left side of the balloon is

$\displaystyle \frac{1}{\displaystyle 2^{6.022 \times 10^{23}}} < 1/1024^{10^{22}} < 1/(10^3)^{10^{22}}=10^{-30,000,000,000,000,000,000,000}$

This is a very small number such that it contains 30 million trillion zeros to the right of the decimal point. When you write this zeros on a piece of paper with a thickness of 0.05 millimeters, you would need to stack it up to a height 74 times the distance of earth to pluto in kilometers!

## Counting the Fruits of the Tree, Part 2

Algorithm Complexity

In the last post, we computed the probability distribution of the set of events that would occur when the first letter of the search term is pressed, given a set of 100K english words. Let’s review the algorithm we are trying to analyze:

input = []; //array of words
matches = [];
while(incremental_search){
for(i =0;i -1){
matches.push(input[i]);
}
}
input = matches;
matches = [];
}


When the first keystroke is pressed, we search through the entire array (comparing each element with the keystroke) and collecting all matching words. The matching words are then collected by the matches array. After the loop is finished, we set the input array to the contents of the matches array and we empty the matches array. We again search the new input array when the user presses the next keystroke, this time matching all words in the new input with the word fragment formed by the first and 2nd letter. We do this until the user is presented with a small set of words that matches his/her search term. We can see that the performance of this algorithm is dependent on the length of the input array for each iteration.

Suppose we take as our input a random subset of our list of words with size 1000. This subset of words should not have any word repeated. We want to know how our algorithm performs on the average for any such subset. To figure this out, we need to know how many words would match the input array when we press the first letter. Since this input array could be any subset of 1000 words from the original list of 109582 words, we want to know the average number of words that matches the first keystroke. Take for example, the letter a. What is the average number of words that will we would likely see when we press the letter a?

Random Variables

In a deterministic world, we can compute the behaviour of things using a function and we would get the same answer for the same input. But we know that the world is all but deterministic and that functions do not always assume the same value for the same input. We call such functions from a set $U$ to the set of real numbers whose value depend on a probability distribution a Random Variable. For example, when we toss a coin, it can either land heads or tails. If we assign a value 1 to heads and 0 to tails, then the coin toss can be thought of as a random variable whose domain is the set ${text{head},text{tail}}$ and whose range is the set ${1,0}$. It’s value is 1 with probability 1/2 and 0 with probability 1/2. If we toss the coin many times and count the number of heads that came up, we would find that 50% of the time, it came up heads. We also say that “on the average”, heads come up half of the time. The “on the average” is also known as the Expected Value of the random variable and is defined by the formula:

$\displaystyle E[X] = \sum_j j\cdot Pr(X = j)$

where $j$ is a value that the random variable assumes and $Pr(X=j)$ is the probability of the random variable $X$ in assuming the value $j$. The summation is over all values that X assumes. In our coin toss example, there are 2 possible values of the $X$ and these are 1 (with probability 1/2) and 0 (with probability 1/2). Therefore, the expected value of the coin toss random variable is

$\displaystyle E[X] = 1\cdot Pr(X=1) + 0\cdot Pr(X=0)$
$\displaystyle = 1\cdot \frac{1}{2} = \frac{1}{2}$

In the same way, given any word from our list, we can define a function that takes on a value of 1 if it contains the letter ‘a’ and 0 otherwise. From the previous post, we have computed the probability of getting an ‘a’ to be 0.0752. Let’s call this function $X_alpha$, then it is defined as;

$\displaystyle X_\alpha = \Big\{\begin{array}{ll} 1 & \text{if the word contains an a}\\ 0 & \text{otherwise} \end{array}$

In general, if $p$ is the probability of getting a letter ‘a’, then $1-p$ is the probability of getting something else. The expected value of this random variable can be computed using the formula:

$\displaystyle \begin{array}{ll} E[X_\alpha] &= 1 \cdot Pr(X_\alpha = 1) + 0 \cdot Pr(X_\alpha = 0)\\ &= 1 \cdot p + 0 \cdot (1-p)\\ &= p \end{array}$

Imagine we have a set of 1000 words, if for each word we apply this function, then we have an array of 1’s and 0’s. The total number of 1’s indicate the number of words that contain the letter ‘a’. In other words, the function

$\displaystyle X = \sum_\alpha^{1000} X_\alpha$

is a random variable that gives us the number of words that contain a letter ‘a’. We can compute the expected value of $X$ using the formula:

$\displaystyle E[X] = E[\sum_\alpha^{1000} X_\alpha]$
$\displaystyle = \sum_\alpha^{1000} E[X_\alpha]$
Since $E[X_\alpha] = p$, the above computation leads to

$\displaystyle E[X] = \sum_\alpha^{1000} p = 1000 p$

The result tells us that the expected number of words containing the letter ‘a’ is $1000p = 1000\cdot 0.0752$ or approximately 75 words. This is a good thing because in the second loop, we will only need to iterate on an input with an average length of 75 words. If we do the same computation for all letters, we find the expected number of words per letter for every thousand to be

   key Expected #Words Matching
1    '               0.02624589
2    a              75.22991402
3    b              22.05573578
4    c              43.26766611
5    d              39.77565011
6    e              99.04150001
7    f              14.90897924
8    g              31.76540371
9    h              25.38502724
10   i              80.46859417
11   j               2.27158200
12   k              10.22408743
13   l              54.74761950
14   m              30.32581651
15   n              67.64353879
16   o              59.99679800
17   p              30.62108280
18   q               2.24402381
19   r              74.48190608
20   s              80.93445876
21   t              65.87062875
22   u              37.54737384
23   v              12.34738014
24   w              10.11910386
25   x               3.54188320
26   y              20.14765939
27   z               5.01034088


If each letter is equally likely to be the first letter to be typed, then on the average we have 37 words matching the first letter, consequently reducing our 1000 length input to 37.

Since the figure we computed was the average, it means that the length of the next input array can be more than this average. In the next part of the series, we will derive some bounds on the length of the next input array. We will also compute the length of the next input array when we type the next letter of the search term.

## A Sound Correction

As I was taking my coffee over breakfast today, I realized that in the previous post, I did not incorporate the time it took for the sound waves to reach my ear when I pressed the stop button of the stopwatch. Sound waves travel at 340.29 meters per second. At a height of 25 meters, it would take 0.073 seconds for the sound of the explosion to reach the ground. This should be negligible for the purposes of our computation. However, what is really the effect if we incorporate this minor correction?

As seen in the figure below, the recorded time is composed of the time it took for the skyrocket to reach the maximum height plus the time it took for the sound waves to reach my ear, that is,

$t_r = t_a + t_s$

where $t_r$ is the recorded time, $t_a$ is the actual time the skyrocket reached its maximum height and $t_s$ is the time it took for the sound waves to travel from the maximum height to the ground.

We can rewrite the height as

$\displaystyle h_{\text{max}} = - \frac{gt_a^2}{2} + v_i t_a$

Since $v_i = gt_a$,

$\displaystyle h_{\text{max}} = - \frac{gt_a^2}{2} + gt_a^2$
$\displaystyle = \frac{gt_a^2}{2}$

At the maximum height, $h_{\text{max}}$, it would take the sound waves $t_s$ seconds to travel, that is,

$\displaystyle h_{\text{max}} = v_s t_s$

where $v_s$ is the speed of sound. Therefore,

$\displaystyle t_sv_s = \frac{gt_a^2}{2}$
$\displaystyle (t_r- t_a)v_s = \frac{gt_a^2}{2}$
$\displaystyle v_st_r - v_st_a = \frac{gt_a^2}{2}$
$\displaystyle \frac{gt_a^2}{2} + v_st_a - v_st_r = 0$

The last equation above is a quadratic equation which we can solve for $t_a$ using the quadratic formula:

$\displaystyle t_a = \frac{-v_s \pm \sqrt{v_s^2 - 4(1/2)gv_st_r}}{2(1/2)g}$

Substituting the values $g = 9.8$, $v_s=340.29$, and $t_r = 2.3$, we get

$\displaystyle t_a = 2.22$

Using this result, the maximum height is computed to be

$\displaystyle h_{\text{max}} = \frac{gt_a^2}{2} = 24.33$

Comparing this with our previous computation of $h_{\text{max}} = 25.829$, we find that we have overestimated the maximum height by about 1.49 meters. It’s not really that bad so we can just drop the effect of sound.

## New Year, New Heights and the Projectile

I spent my new year celebration at our friend’s place with 2 other colleagues. It was started with a dinner at about 9:30 pm and we bought some bottles of wine in addition to the spirits we already brought. At about 10:30, a street party was setup in the vicinity and about 5 other families joined in a potluck. Marijo’s brother started the fireworks at about that time with David and Gerard (my colleagues) taking turns setting them off. I did not attempt to set them off because I did a lot those things way back as a child and still lucky enough to have my fingers intact.

This is a video of our 2012 New Year’s Celebration:

As Gerard was flying off skyrockets (or kwitis as they are called in my language) David asked me “How high do you think these things go up?”. That was a really good question which got me to thinking. I said “Let’s compute!” So I took my iphone and launched its built-in stopwatch. Gerard flew about 3 skyrockets and I took the time it takes for them to fly until they blow up in the air. The average was found to be 2.3 seconds.

Independent of Mass

Legend has it that Galileo Galilei performed an experiment at the Leaning Tower of Pisa. He dropped objects of different masses to see which of them falls first. Starting from the heaviest to the lightest object, what he found was that, in a vacuum, they reach the ground at the same length of time when dropped from the same height. Using this result, we can compute for the distance travelled by any object under the influence of gravity. At heights near the earth’s surface, the acceleration of objects in the presence of gravity alone is given by

$\displaystyle \frac{d^2h}{dt^2}= \frac{dv}{dt} = -g$

where $h$ is the vertical distance (height), $v$ is the velocity and $g$ is the acceleration due to gravity given by $g = 9.8 \text{m/s}^2$.

The equation above is a simple differential equation which we can solve for the velocity at any time $t$ by integrating both sides:

$\displaystyle \int^{v}_{v_i} dv = \int^{t}_{t_i} -g dt$

which by the Fundamental Theorem of Calculus gives us

$\displaystyle v - v_i = -g(t - t_i)$
$\displaystyle v = -gt + v_i + t_i$

If we take initial time of the rocket launch to be zero, that is, $t_i=0$, we have

$\displaystyle v = \frac{dh}{dt}= -gt + v_i$

We don’t know the initial velocity $v_i$ of the rocket. However, at the maximum height attained by the rocket, the velocity is zero. Using this fact, we can solve for the initial velocity:

$\displaystyle 0 = -gt_{\text{max}} + v_i$
$\displaystyle v_i = gt_{\text{max}}$

where $t_{\text{max}}$ is the time the rocket reached maximum height.

At $t_{\text{max}}=2.3$ seconds, the initial velocity is therefore $v_i = 9.8 * 2.3 = 22.5$ m/s.

Maximum height

We can solve for the height as a function of time by integrating both sides of the velocity equation:

$\displaystyle \int^{h}_{h_i} dh = \int^{t}_{t_i} -gt dt + v_i dt$
$\displaystyle h - h_i = -\frac{gt^2}{2} + v_i t \big|^t_{t_i}$

Since the initial height is zero , that is , $h_i = 0$ and the initial time is zero $t_i = 0$, we have

$\displaystyle h = -\frac{gt^2}{2} + v_i t$

Since at t=2.3 seconds, the rocket reaches the maximum height, we have

$\displaystyle h_{\text{max}} = -\frac{9.8 \times (2.3)^2}{2} + 22.5 * 2.3 = 25.829 \text{ meters}$

Does this make sense? The average height of a building storey is about 3 meters. This is about 8 or 9 storeys high which I think makes sense.

It’s good that my new year not only started with a warm welcome from friends but it also appealed to my physics curiosity.

Thank you to Marijo Condes and family for letting us spend a memorable New Year’s Eve.

## My Topological Trip To Baguio

Last weekend, we went up to Baguio, the summer capital of the Philippines, to escape the summer heat. As we were going up the long and winding road to Baguio, I find myself frequently looking at my watch, impatient to get over the long trip. As I looked at the time, I also took note of the place where we were during that time. Suddenly, a question popped in my head: “Will it be possible that we’d be in this same place at this time when we return to Manila?”. We can visualize this problem using the diagram below:

In the figure above, the blue dot is the location along the way from Manila to Baguio which we occupied where the clock time was the same when going up as when going down. It turns out that such a place exists. However, where it is located is arbitrary. To see this, suppose that we leave Manila at 6 am and arrive in Baguio at 6 pm. (On the average, it takes about 5-6 hours to reach Baguio from Manila. In this case, we will be taking our time enjoying the scenery). On the day that we go back, we will leave Baguio at 6 am and arrive in Manila at 6 pm. To find the place and time in question, use 2 cars, one starting in Manila going to Baguio and the other car starting from Baguio and going to Manila. When they meet each other along the way, that’s precisely the place and time we are looking!.

Topology

The problem above is an illustration of the power of Topology, a branch of mathematics that studies the properties of mathematical objects that are invariant under stretching without tearing or gluing. For a topologist, there is no difference between a doughnut and a cup because we can always deform the doughnut to look like a cup.

The most obvious topological property of the doughnut is the single hole it has. Any transformation that stretches or deforms it will always retain that hole. Which means it is always possible to find a transformation that will transform your doughnut to a cup. Of course, don’t eat the cup!

Topology is sometimes called “Rubber Sheet Geometry” because it studies objects and its properties that are invariant under stretching. In the diagram below, we start with a flat sheet of rubber with a straight line in it. Stretch this rubber in any which way. You will notice that the line, even though it will not look straight anymore, will still be one piece of line and not two. In other words, the line retains its property of connectedness. It will still be in one piece and not suddenly become two pieces.

Brouwer’s Fixed Point Theorem

The Baguio trip example I gave is an instance of what is known as the Brouwer’s Fixed Point Theorem. In its simplest form, it states that any continuous function $f$ from a closed interval to itself has a fixed point. Let $[a,b]$ be the closed interval in 1 dimension. This means that $f(a)$ and $f(b)$ lie also in $[a,b]$. Furthermore, whatever number $f(a)$ is mapped to, it is always greater than or equal to $a$. Also, whatever number $f(b)$ is mapped to, it is always less than or equal to $b$.

Define the function $g(x) = f(x) - x$. We can easily see that $g(a) \ge 0$ and $g(b) \le 0$. Since $g$ is a continuous function, then there is a point $\xi$ such that $g(\xi) = 0$ (by the Intermediate Value Theorem), that is, $f(\xi) = \xi$. The number $\xi$ is called a fixed point of $f$.

For example, consider the cosine function defined in $[-1,1]$,

$\displaystyle f(x) = \cos(x), -1 \le x \le 1$

then by Brouwer’s Fixed Point Theorem, there is a point $x_0 \in [-1,1]$ such that

$\displaystyle \cos(x_0) = x_0$

As can be seen from the diagram above, this point is the intersection of the graph of the cosine function and the line $f(x) = x$ and is equal to 0.73909.

The Baguio trip has filled me with fond memories of my university days especially on Topology. I’ll be sharing them in my subsequent posts.

## Mt. Mayon’s Mathematical Beauty and Elegance

Mayon volcano refused to show its glory last weekend. It was like a beautiful but shy girl that hides her face behind a veil of clouds. You know the beauty that it is hiding but it keeps you in suspense and wants you go to back there and try your luck again to see it in all its splendor.

Fortunately, Mayon is a perfect cone volcano and it does not take a lot of effort to imagine how it looks like. I was inspired by Mayon to generate its shape using some computer algebra tools at my disposal. The first tool I used is the tool I always bring with me, the iPhone. The iPhone has a neat computer algebra system called Spacetime. It can do a lot of things you would expect from a Computer Algebra System among which is plotting surfaces. To plot the surface of Mayon, I used a coordinate system that is suited to its symmetrical shape. This coordinate system is the cylindrical coordinate system.

Cylindrical Coordinate System

Any point in three dimensions can be specified by three numbers. In the rectangular coordinate system, which we are very familiar with, these three numbers correspond to the x, y and z coordinates of the point. Shown below is how we specify a point using the rectangular coordinate system.

The rectangular coordinate system is not always the most convenient system to use. For example, to specify a circle in rectangular coordinates, you need to specify it as

$\displaystyle x^2 + y^2 = r^2$

where r is the radius of the circle.

In fact, we can specify a circle even simpler by using polar coordinates. We only need to specify the radius and the angle. For a circle of radius 1, the equation is

$\displaystyle r=1$

How simple can it get?

In generating the cone, we use a coordinate system that naturally fits the geometry of the cone. This coordinate system is called the cylindrical coordinate system. Imagine a point p in space and a cylinder that contains this point. We can specify the position of this point in the cylinder by three numbers. The first number specifies the height of this point from the xy plane. The second number specifies the distance of this point from the z axis. The last number specifies the angle $\phi$ that this point makes from the x axis. Below is a diagram of the cylindrical coordinate system.

Specifying the Cone Using Cylindrical Coordinates

Since the cone has circular symmetry around the z axis, we only need to specify the value of z with respect to r. Any plane that passes along the z axis intersects the cone through a line that lies on the cone. The diagram below shows the line of intersection of a cone of height $h$ and whose base has radius $b$.

The equation of this line has the form

$\displaystyle z=mr + k$

where $m$ is the slope and $k$ is the z intercept. When r=0, we have $z=k$. When $z=0$ we have $m= - h/b$. Therefore, we can specify z as:

$\displaystyle z = - \frac{h}{b} r + h$.

Mayon has a base approximately 4 times its height. Using $h=4$ and $b=8$, we can then specify the cone as

$\displaystyle z = - 0.5 r + 4$.
$\displaystyle 0 \le r \le b$
$\displaystyle 0 \le \phi \le 2\pi$

Using the function CylindricalPlot3D of spacetime, we get the plot of the cone as shown below.

Surface Plot Using Rectangular Coordinates

The plot of the cone using cylindrical coordinates is elegant. It used the circular symmetry of the cone around the z axis. We can still plot the cone using rectangular coordinates but it is quite messy. To do this, I used another tool called R. It’s a statistical computing tool but it has a great 3D plotting engine. Here is the R code I used to generate the plot:

# interval is the distance between points in the x or y axis
interval=2

# noisemean is the mean of the gaussian noise we want to introduce into the plot to
# simulate rugged mountain
noisemean=0

# noisedev is the standard deviation
noisedev=1

# addnoise is a boolean. It specifies if we want to add the noise

# generate the x and y vectors below using the interval between points
x=seq(-10,10,interval)
y=seq(-10,10,interval)

# generate the matrix of squares for the y axis
l=length(x)
yy=y^2 %*% t(rep(1,l))

# generate the matrix of squares for the x axis, which turns out to be just the transpose of yy
xx=t(yy)

# compute the zz coordinate: zz = -0.5* r +4
# but first let's compute r^2, let's call it rr

rr=xx + yy
zz = -0.5 * sqrt(rr) + 4

# plot only values greater than or equal to  0
myfunc=function(n){
if(n<0){
0
} else {
n
}
}

mayon= mapply(myfunc,zz)

noise=0.1*matrix(rnorm(l^2,noisemean,noisedev),l,l)
mayon=mayon+ noise
}

# plot using rgl
library(rgl)
persp3d(x,y,mayon,col="yellowgreen",aspect=c(4,4,1))



Below is the result of running the above code.

This how I see mayon through the lens of mathematics. But I definitely want to go back there and see its imposing grandeur.