Mathematical Beauty

It is said that beauty is in the eye of the beholder. Just as a person is more than the sum of their parts, physical beauty is only one aspect of total beauty. Physical beauty, however, seems to follow some sort of mathematical standard. Why is it that we find movie stars beautiful? What is the standard that defines beauty? That standard is hidden in the sequence of numbers 0, 1, 1, 2, 3, 5, 8…

The careful eye will notice that, beginning with the third term, each term in this sequence is equal to the sum of the previous two. For example, the third term is just 0+1 = 1, the fourth term is just 1+1 = 2, the fifth term is just 1+2 = 3, and so on. This sequence is called the Fibonacci sequence after Leonardo of Pisa, also known as Fibonacci. We can model this sequence as a recurrence relation. If we let F_n be the nth Fibonacci number, then by definition it is equal to the sum of the previous two Fibonacci numbers, F_{n-1} and F_{n-2}. Mathematically, we write this as

\displaystyle F_n = F_{n-1} + F_{n-2}

Looking at the recurrence relation above, we realize that it is an instance of a Homogeneous Linear Recurrence Relation With Constant Coefficients. The good news is we know how to solve this kind of recurrence relation. The characteristic equation is

\displaystyle r^2 - r -1 = 0

Using the quadratic formula, the roots are

\displaystyle r = \frac{ -(-1) \pm \sqrt{ (-1)^2 - 4\cdot 1 \cdot (-1)}}{2\cdot 1}
\displaystyle = \frac{1 \pm \sqrt{5}}{2}

If we define

\displaystyle \phi = \frac{1 + \sqrt{5}}{2}

then we can write the negative root as

\displaystyle \frac{1 - \sqrt{5}}{2} = 1-\phi

From the previous post, we know that we can express the solution of F_n as a linear combination of \phi^n and (1-\phi)^n. Therefore, the solution of the Fibonacci recurrence relation is just

\displaystyle F_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants. Since F_0 = 0 and F_1 = 1, we can solve for \alpha_1 and \alpha_2:

\displaystyle F_0 = 0 = \alpha_1 \phi^0   + \alpha_2 (1-\phi)^0 = \alpha_1 + \alpha_2
\displaystyle F_1 = 1 = \alpha_1 \phi^1   + \alpha_2 (1-\phi)^1 = \alpha_1 \phi+ \alpha_2 (1-\phi)

From the first equation, we solve for \alpha_1:

\displaystyle \alpha_1 = - \alpha_2

Substituting this into the second equation and solving for \alpha_2 we get:

\displaystyle -\alpha_2 \phi + \alpha_2 - \alpha_2 \phi = 1
\displaystyle -2\alpha_2\phi + \alpha_2 = 1
\displaystyle \alpha_2 (1 - 2\phi) = 1

Observe that

\displaystyle  1-2\phi = 1 - 2\frac{1+\sqrt{5}}{2} = 1 - (1 + \sqrt{5}) = -\sqrt{5}

This means that

\displaystyle -\sqrt{5} \cdot \alpha_2 = 1
\displaystyle \alpha_2 = - \frac{1}{\sqrt{5}}


\displaystyle \alpha_1 = -\alpha_2 = \frac{1}{\sqrt{5}}

Therefore, the solution to the Fibonacci recurrence relation is

\displaystyle F_n = \frac{\phi^n - (1-\phi)^n}{\sqrt{5}}
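This closed form (known as Binet's formula) can be checked numerically against the recurrence itself. Below is a minimal Java sketch; the class and method names are mine, chosen only for illustration:

```java
public class Binet {
    // Closed-form (Binet) Fibonacci: F_n = (phi^n - (1-phi)^n) / sqrt(5)
    static long binet(int n) {
        double sqrt5 = Math.sqrt(5.0);
        double phi = (1.0 + sqrt5) / 2.0;
        // Round to the nearest integer to absorb floating-point error
        return Math.round((Math.pow(phi, n) - Math.pow(1.0 - phi, n)) / sqrt5);
    }

    // Iterative Fibonacci straight from the recurrence F_n = F_{n-1} + F_{n-2}
    static long iterative(int n) {
        long a = 0, b = 1;              // F_0, F_1
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;                       // a holds F_n after n steps
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 30; n++) {
            if (binet(n) != iterative(n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("Binet matches the recurrence for n = 0..30");
    }
}
```

For moderate n the double-precision rounding is exact; for very large n the floating-point powers would eventually lose integer precision.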

Beauty and Fibonacci numbers

After all those tedious computations, so what? What does Fibonacci have to do with beauty? If you take the ratio of successive Fibonacci numbers, you will find that it converges to \phi:

\displaystyle \lim_{n \rightarrow \infty} \frac{F_{n+1}}{F_n} = \phi

Here is a list of the first few Fibonacci numbers starting at 1 and the corresponding ratios:

1        1       1        1.000000
2        2       1        2.000000
3        3       2        1.500000
4        5       3        1.666667
5        8       5        1.600000
6       13       8        1.625000
7       21      13        1.615385
8       34      21        1.619048
9       55      34        1.617647
10      89      55        1.618182
11     144      89        1.617978
12     233     144        1.618056
13     377     233        1.618026
14     610     377        1.618037
15     987     610        1.618033
16    1597     987        1.618034
17    2584    1597        1.618034
18    4181    2584        1.618034
19    6765    4181        1.618034
20   10946    6765        1.618034

The first column in the table above is the index n. The second column is F_{n+1}, the third column is F_n, and the last column is the ratio F_{n+1}/F_n. You can see that the ratio approaches the value \phi \approx 1.618034.
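A table like the one above can be generated with a few lines of Java. This is a sketch; the class name and output format are mine:

```java
public class GoldenRatio {
    // Ratio F_{n+1}/F_n of consecutive Fibonacci numbers, with F_1 = F_2 = 1
    static double ratio(int n) {
        long prev = 1, curr = 1;        // F_1, F_2
        for (int i = 1; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return (double) curr / prev;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 20; n++) {
            System.out.printf("%2d %10.6f%n", n, ratio(n));
        }
        double phi = (1.0 + Math.sqrt(5.0)) / 2.0;
        System.out.printf("phi = %.6f%n", phi);
    }
}
```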

The constant \phi is known as the Golden Ratio, a name often attributed to the ancient Greeks. Structures whose proportions follow the golden ratio are often perceived as beautiful to the eye. Below is an image of a rectangle with labeled sides.

The rectangle is called a Golden Rectangle if

\displaystyle \frac{a+b}{a} = \frac{a}{b} = \phi.

The Golden Ratio can be found in many aesthetic works. Leonardo da Vinci used it in his Vitruvian Man. This is probably why the Fibonacci sequence was featured at the beginning of the book (and movie) The Da Vinci Code.

A Method To The Madness

In the previous post on recurrence relations, we derived a general method to solve divide and conquer relations. In another article, we derived the formula of the sum of the first n non-negative integers:

\displaystyle \sum_{i=0}^n i = \frac{n(n+1)}{2}

If you think carefully about it, you can express the sum of the first n integers as a recurrence relation. If you let s_n denote the sum of the first n integers, then it is just equal to n plus the sum of the first n-1 integers. Symbolically, we write this as

\displaystyle s_n = s_{n-1} + n
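As a quick sanity check, we can unfold this recurrence in code and compare it against the closed form n(n+1)/2. A minimal Java sketch (class and method names are mine):

```java
public class SumRecurrence {
    // Unfold the recurrence s_n = s_{n-1} + n, starting from s_0 = 0
    static long unfold(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) {
            s += i;                     // s_i = s_{i-1} + i
        }
        return s;
    }

    // Closed form: n(n+1)/2
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 100; n++) {
            if (unfold(n) != closedForm(n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("recurrence matches n(n+1)/2 for n = 0..100");
    }
}
```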

A good question now comes to mind: Is there also a method to solve this kind of recurrence relation? Surprise! Surprise! The answer is YES! But first, let us handle the easier cases.

A good example of a recurrence relation we frequently encounter is in our financial life, when we compute how much money we will have at the end of each year after putting it in a bank. Suppose you deposit 1 million bucks in a bank at 10% interest compounded yearly. How much money will you get after 10 years? If we let P_n be the amount of money we have after n years, then we can write the compounding of interest as a recurrence relation:

\displaystyle P_n = P_{n-1} + 0.1\cdot P_{n-1} = 1.1 \cdot P_{n-1}

The recurrence relation above is just an example of a more general form called the Linear Homogeneous Recurrence Relation With Constant Coefficients. That’s a pretty long name for a seemingly simple concept. What this means is that your recurrence relation has the following form:

\displaystyle a_{n} = c_1 a_{n-1} + c_2 a_{n-2} + \ldots + c_k a_{n-k}

To solve this recurrence relation, we invoke the method of “Lucky Guess”, that is, we will guess that the solution of the above recurrence relation is of the form a_n = r^n. If we substitute this to the equation above, we will get

\displaystyle r^n = c_1 r^{n-1} + c_2 r^{n-2} + \ldots + c_k r^{n-k}

Dividing both sides by r^{n - k} we get,

\displaystyle r^k = c_1 r^{k-1} + c_2 r^{k-2} + \ldots + c_k.

This means that a_n  = r^n is a solution to this recurrence relation if r satisfies the following algebraic equation:

\displaystyle r^k - c_1 r^{k-1} - c_2 r^{k-2} - \ldots - c_k = 0

This is called the Characteristic Equation and the roots of this equation are called characteristic roots. To make things look simpler, let’s suppose we have a recurrence relation of degree 2:

\displaystyle a_n = c_1 a_{n-1} + c_2 a_{n-2}

The characteristic equation of this recurrence relation is

\displaystyle r^2 - c_1 r - c_2 = 0

Case of distinct roots

Suppose the equation above has two distinct roots, r_1 and r_2. Then

\displaystyle a_n = \alpha_1 r_1^n + \alpha_2 r_2^n

is a solution of the recurrence relation, where \alpha_1 and \alpha_2 are constants. To prove this, substitute the solution to the recurrence relation

\displaystyle  c_1 a_{n-1} + c_2 a_{n-2} =  c_1 (\alpha_1 r_1^{n-1} + \alpha_2 r_2^{n-1}) + c_2 (\alpha_1 r_1^{n-2} + \alpha_2 r_2^{n-2})
\displaystyle = c_1\alpha_1 r_1^{n-1} + c_1 \alpha_2 r_2^{n-1} + c_2  \alpha_1 r_1^{n-2} + c_2\alpha_2 r_2^{n-2}

Collecting the \alpha coefficients:

\displaystyle = \alpha_1 ( c_1 r_1^{n-1} + c_2 r_1^{n-2}) + \alpha_2 (c_1 r_2^{n-1} + c_2 r_2^{n-2})
\displaystyle = \alpha_1 r_1^{n-2} (c_1 r_1 + c_2) + \alpha_2 r_2^{n-2} (c_1 r_2 + c_2)

Using the characteristic equation r^2 = c_1 r + c_2 to replace each factor c_1 r_i + c_2 with r_i^2, we have

\displaystyle = \alpha_1 r_1^{n-2} r_1^2 + \alpha_2 r_2^{n-2} r_2^2
\displaystyle = \alpha_1 r_1^n + \alpha_2 r_2^n
\displaystyle = a_n

as was to be shown.

Case of Repeated Roots

If the roots of the characteristic equation are repeated, that is, the quadratic is a perfect square with double root r_1, then r_1 = c_1/2 and c_2 = -r_1^2, and the solution to the recurrence relation can be expressed as

\displaystyle a_n = \alpha_1 r_1^n + n\alpha_2 r_1^n.

To see this, substitute this formula into the recurrence relation, taking care that the factor multiplying \alpha_2 carries the index of each term (n-1 and n-2, not n):

\displaystyle c_1 a_{n-1} + c_2 a_{n-2} = c_1 \big(\alpha_1 r_1^{n-1} + (n-1)\alpha_2 r_1^{n-1}\big) + c_2 \big(\alpha_1 r_1^{n-2} + (n-2)\alpha_2 r_1^{n-2}\big)
\displaystyle = \alpha_1 r_1^{n-2} ( c_1 r_1 + c_2) + \alpha_2 r_1^{n-2} \big( (n-1) c_1 r_1 + (n-2) c_2 \big)

Since r_1^2 = c_1 r_1 + c_2, the first term is \alpha_1 r_1^n. For the second term, write

\displaystyle (n-1) c_1 r_1 + (n-2) c_2 = n (c_1 r_1 + c_2) - (c_1 r_1 + 2 c_2)

Because r_1 is a double root, c_1 = 2r_1 and c_2 = -r_1^2, so c_1 r_1 + 2c_2 = 2r_1^2 - 2r_1^2 = 0. The second term therefore reduces to n\alpha_2 r_1^{n-2} \cdot r_1^2 = n\alpha_2 r_1^n. Altogether,

\displaystyle c_1 a_{n-1} + c_2 a_{n-2} = \alpha_1 r_1^n + n \alpha_2 r_1^n = a_n
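As a numerical sanity check, consider the concrete recurrence a_n = 4a_{n-1} - 4a_{n-2}, whose characteristic equation r^2 - 4r + 4 = (r-2)^2 has the double root r_1 = 2. With initial conditions a_0 = 1, a_1 = 4 (both the recurrence and the class name below are my own choices for illustration), the constants work out to \alpha_1 = \alpha_2 = 1, giving a_n = (1+n)2^n:

```java
public class RepeatedRoot {
    // a_n = 4 a_{n-1} - 4 a_{n-2}, characteristic equation (r - 2)^2 = 0
    static long byRecurrence(int n) {
        long a0 = 1, a1 = 4;            // chosen initial conditions a_0, a_1
        if (n == 0) return a0;
        for (int i = 2; i <= n; i++) {
            long next = 4 * a1 - 4 * a0;
            a0 = a1;
            a1 = next;
        }
        return a1;
    }

    // Closed form a_n = alpha1 * 2^n + n * alpha2 * 2^n with alpha1 = alpha2 = 1
    static long closedForm(int n) {
        return (1L + n) << n;           // (1 + n) * 2^n
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n++) {
            if (byRecurrence(n) != closedForm(n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("closed form matches the recurrence for n = 0..20");
    }
}
```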

Compound Interest

Applying this to our compound interest recurrence relation, we see that the characteristic equation of

\displaystyle P_n = P_{n-1} + 0.1\cdot P_{n-1} = 1.1 \cdot P_{n-1}

is

\displaystyle r - 1.1 = 0

whose characteristic root is

\displaystyle r = 1.1

Therefore, the solution to the compound interest recurrence relation has the form

\displaystyle P_n = \alpha \cdot 1.1^n

If at n=0, the amount of money you have in the bank is P_0, we can solve for \alpha:

\displaystyle P_0 = \alpha \cdot 1.1^0 = \alpha \cdot 1 = \alpha

This means that at the end of period n, the amount of money you have in the bank is

\displaystyle P_n = 1.1^n \cdot P_0

Or, substituting 1 million for P_0, we get P_{10} \approx 2593742.
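The closed form and the year-by-year recurrence can be compared directly in Java. A minimal sketch (class and method names are mine):

```java
public class CompoundInterest {
    // Closed form P_n = 1.1^n * P_0 for 10% interest compounded yearly
    static double closedForm(double p0, int n) {
        return p0 * Math.pow(1.1, n);
    }

    // Unfold the recurrence P_n = 1.1 * P_{n-1}, one multiplication per year
    static double byRecurrence(double p0, int n) {
        double p = p0;
        for (int i = 1; i <= n; i++) {
            p *= 1.1;
        }
        return p;
    }

    public static void main(String[] args) {
        double p10 = closedForm(1_000_000.0, 10);
        System.out.printf("P_10 = %.2f%n", p10);   // about 2,593,742.46
    }
}
```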

Trees Are Made By Fools Like Me

We have seen in the previous post that searching for a name in a sorted array of 1 million names can be done very efficiently using binary search. But that was only the second half of the story. What was not told was how we ended up with a sorted array. Was it handed to us already sorted, or were we the ones who sorted it?

Now suppose that someone gave us an unsorted list of 10 different names to append to the original 1 million. How do we search now for a name? We cannot just append these 10 names to the array of sorted names because that would destroy the “sorted” property of the original array. What we can do is increase the length of the original array by 10 more slots, reinsert the 1 million and 10 names into the new array, and sort it. Reinserting would take linear time, and re-sorting a 99% sorted array can take quadratic time with a naively pivoted quicksort. So is there a better way to do it? Yes there is, and we are going to build a tree for it.

Suppose we are given a list of animals:

leopard, cobra, shark, horse, alligator, bat, tiger, fox, dog, pig

Our task is to create a structure that will allow adding more entries while still searching as fast as binary search. To do this, let’s construct a tree with the following properties:

– each node has at most 2 children.
– each node is identified by a key. For this example, we will use the name of the animal as the key.
– the left child of each node must have a key that is less than the key of its parent.
– the right child of each node must have a key that is greater than the key of its parent.

We call the result of this construction a Binary Search Tree or BST for short.

Given these rules, we can start constructing our BST starting from the word leopard as the root of the tree. The next word is cobra. Since cobra is less than leopard (lexicographically speaking), we put cobra as the left child of leopard. The next word is shark. Since shark is greater than leopard, we put shark as the right child of leopard. Since leopard already has 2 children, we expect the other words to be children of either cobra or shark.

The next word is horse. Since horse is less than leopard, we compare it with cobra. Since horse is greater than cobra, we make horse the right child of cobra. Below is a step by step visual construction of the BST.

The words in bold orange are the newly inserted words into the BST structure. The resulting BST is shown below.

Searching for a word in a BST is almost the same as inserting an entry. You first compare your search term with the key of the root. If they are equal, you have found the entry. If the search term is less than the root key, search the left subtree. If the search term is greater than the root key, search the right subtree. If you run out of nodes to compare against, the word is not in the list. This algorithm is recursive and is shown below:

boolean search_BST(Node node, String searchTerm):
    if node is null:
        return false
    if searchTerm is equal to node key:
        return true
    if searchTerm is less than node key:
        return search_BST(node.leftChild, searchTerm)
    else:
        return search_BST(node.rightChild, searchTerm)

Let’s trace how to search for the word bat. We start at the root leopard. Since bat < leopard, we search the left subtree of leopard, which is rooted at cobra. We now compare bat and cobra. Since bat < cobra, we search the left subtree of cobra, which is rooted at alligator. Since bat > alligator, we search the right subtree of alligator, which is rooted at bat. Since the search term is equal to bat, we have found our word and return true. It only took us 4 comparisons to find the word bat.
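Here is a runnable Java version of the BST from this section, with both insertion and the recursive search. The class and method names are mine; it builds the same animal tree by inserting the words in the order given above:

```java
public class BST {
    static class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Insert a key following the BST rules: smaller keys go left, larger go right
    static Node insert(Node node, String key) {
        if (node == null) return new Node(key);
        int cmp = key.compareTo(node.key);
        if (cmp < 0)      node.left  = insert(node.left, key);
        else if (cmp > 0) node.right = insert(node.right, key);
        return node;                    // duplicate keys are ignored
    }

    // Recursive search, mirroring the pseudocode in the text
    static boolean search(Node node, String searchTerm) {
        if (node == null) return false;
        int cmp = searchTerm.compareTo(node.key);
        if (cmp == 0) return true;
        return cmp < 0 ? search(node.left, searchTerm)
                       : search(node.right, searchTerm);
    }

    static Node buildExample() {
        String[] animals = { "leopard", "cobra", "shark", "horse", "alligator",
                             "bat", "tiger", "fox", "dog", "pig" };
        Node root = null;
        for (String a : animals) root = insert(root, a);
        return root;
    }

    public static void main(String[] args) {
        Node root = buildExample();
        System.out.println(search(root, "bat"));     // true
        System.out.println(search(root, "unicorn")); // false
    }
}
```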

If we look at our data structure closely, we observe that the running time of the search algorithm is proportional to the height of the tree. It is therefore in our best interest to make our tree as short as possible. But how short? Given n nodes, how small can we make the height of our binary tree? At the root (level 0) we have 1 node. At level 1, we can only attach 2 nodes. At level 2, we can attach 4 nodes. At level h, which we denote as the height of the tree, we can have a maximum of

\displaystyle n \le  2^0 + 2^1 + 2^2 + \ldots + 2^h = \sum_{i=0}^h 2^i = 2^{h+1} -1 < 2^{h+1}

This means that the smallest possible height of the binary tree we can create from n nodes is h = \lfloor \log_2 n \rfloor.

Looking at our example, we can see that the left subtree is 2 levels deeper than the right subtree. This means that our tree is not optimal. Ideally, our tree should have a height of \lfloor \log_2 10 \rfloor = 3. In fact, if we built our BST by inserting from a sorted list, say

alligator, bat, cobra, dog, fox, horse, leopard, pig, shark, tiger

we will get this structure:

With the structure above, the worst-case running time is O(n). So how do we optimize our BST? That will be the topic of the next post.

A Match Made In Heaven

There are some things which go naturally with each other. For some, it is a person they consider a soulmate. In the case of coffee, it is the creamer. For an algorithm, it is the data structure. For example, a recursive algorithm suits a tree data structure very well, and vice versa.

I assume that you are a programmer, or know how to program, and that you know what a data structure is. An example of a data structure we encounter in daily programming chores is the array. An array is a collection of objects, usually of the same kind. Arrays go well with iterative algorithms like the for loop or while loop. A more interesting kind of data structure is the tree. You can imagine a tree as a hierarchical structure, much like what you see in Windows Explorer, where the filesystem is displayed as a tree of directories. In my career, I have encountered programmers who don’t know how to navigate a directory structure in code. They usually come up with spaghetti-like code that cannot go deep into the directory structure, whereas a good programmer can navigate a directory tree with a few lines of code.

Let’s have an example. Below is a screenshot of a directory structure. It is composed of files and directories (surprise!). We know that directories can contain other files and directories, whereas files don’t have that property. In tree language, we call files the “leaves” while directories are the internal nodes. Well, not exactly, because an empty directory can also appear as a leaf. To avoid confusion, let us treat both files and directories in a generic manner and call them nodes. A directory is a special kind of node in that it can contain other nodes, which in this case are files and directories. To traverse a directory structure, we follow the algorithm below:

algorithm traverseTree(Node node):
    print node name
    if node is a directory:
        list the children of the directory
        for each child:
            traverseTree(child)

Notice that the algorithm above calls itself and accepts any node as input, whether it’s a file or a directory. This is an example of a recursive algorithm applied to a tree data structure. Simple and elegant!
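The algorithm can be made concrete in Java. Rather than touching the real filesystem, the sketch below models the directory structure with a small Node class of my own, so the traversal itself is exactly the recursive pattern described above:

```java
import java.util.ArrayList;
import java.util.List;

public class TraverseTree {
    static class Node {
        String name;
        boolean isDirectory;
        List<Node> children = new ArrayList<>();   // stays empty for files
        Node(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }
    }

    // Recursive traversal: visit the node, then recurse into each child
    static void traverseTree(Node node, List<String> visited) {
        visited.add(node.name);                    // stands in for "print node name"
        if (node.isDirectory) {
            for (Node child : node.children) {     // "for each child"
                traverseTree(child, visited);      // the recursive call
            }
        }
    }

    // A tiny example tree: root/ -> docs/ -> readme.txt, plus root/notes.txt
    static Node buildExample() {
        Node root = new Node("root", true);
        Node docs = new Node("docs", true);
        docs.children.add(new Node("readme.txt", false));
        root.children.add(docs);
        root.children.add(new Node("notes.txt", false));
        return root;
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        traverseTree(buildExample(), visited);
        System.out.println(visited);   // [root, docs, readme.txt, notes.txt]
    }
}
```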

You might ask “What is the complexity of the traverseTree algorithm?”. Before we can answer that, let’s agree first on some terminology. We define a graph to be a set of nodes connected to each other by edges. An abstract model of the directory structure above is shown below. The circles represent the nodes while the lines connecting them represent the edges. Therefore a graph G is composed of a set of nodes V and a set of edges E and we denote a graph as G(V,E). Using this definition, you can see that a tree is just a special case of a graph. Let v\in V be a node. The degree of a node \deg(v) is the number of edges it has. For example, if a node is connected to 3 other nodes, then the degree of that node is 3.

An interesting property of the degree of a node is that the total sum of the degrees of each node is equal to 2 times the number of edges, that is,

\displaystyle \sum_{v\in V} \deg(v) = 2\cdot |E|.

To see why this is, note that every edge has exactly two endpoints, so each edge contributes exactly 1 to the degree of each of its endpoints. Summing the degrees over all nodes therefore counts every edge exactly twice. For example, if a graph consists of two nodes u and v connected by a single edge, then \deg(u) = \deg(v) = 1, so \deg(u) + \deg(v) = 1 + 1 = 2, which is twice the number of edges of graph G(V,E).
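This degree-sum property is easy to check on a small graph in code. A minimal Java sketch (the class name and the example graph are mine):

```java
public class Handshake {
    // Sum of degrees over an edge list: each edge {u, v} adds 1 to deg(u) and 1 to deg(v)
    static int degreeSum(int numNodes, int[][] edges) {
        int[] deg = new int[numNodes];
        for (int[] e : edges) {
            deg[e[0]]++;
            deg[e[1]]++;
        }
        int sum = 0;
        for (int d : deg) sum += d;
        return sum;
    }

    public static void main(String[] args) {
        // A small graph: 4 nodes, 5 edges
        int[][] edges = { {0,1}, {0,2}, {1,2}, {2,3}, {1,3} };
        System.out.println(degreeSum(4, edges));   // 10
        System.out.println(2 * edges.length);      // 10, i.e. 2 * |E|
    }
}
```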

Let’s now go back to our main task of computing the complexity of the algorithm above. For each node v, the number of times “print node name” is executed is equal to 1 (surprise!). The number of times the statement “if node is a directory” is executed is also 1. Finally, the number of times the for loop is executed is equal to the degree of node v, that is, \deg(v). Since we do this for each node of graph G(V,E), the total number of executions is equal to

\displaystyle \sum_{v\in V} \Big( 1 + 1 + \deg(v) \Big) = \sum_{v\in V} \Big( 2 + \deg(v) \Big)
\displaystyle = \sum_{v\in V} 2 + \sum_{v\in V} \deg(v)
\displaystyle =  2\cdot \sum_{v\in V} 1 + \sum_{v\in V} \deg(v)

Now, the first summation is just equal to the number of nodes of V, while the second summation is just twice the number of edges. This gives us,

\displaystyle 2\cdot |V| + 2\cdot |E| = 2\cdot \big(|V| + |E| \big) = O(|V| + |E|).

Therefore, the complexity of the algorithm above is Big Oh of the sum of the number of nodes and the number of edges of graph G(V,E).

Gilligan’s Island

Suppose you and a couple of people were shipwrecked on an island (which, for dramatic effect, we call Gilligan’s Island). Assume that none of you were friends with each other in the beginning, but over time you all managed to become friends with at least one other person. What is the probability that two of you have the same number of friends? Now suppose that the initial number of people shipwrecked, including you, is 60. What is the least number of people whose surnames start with the same letter?

This kind of problem is an instance of what is known as the Pigeonhole Principle.

If n pigeons fly into m pigeonholes, where n > m, then at least one pigeonhole contains 2 or more pigeons.

This principle is so intuitive that even a child will agree with you on it. Now the question is: why are the 2 problems in the previous paragraph instances of the Pigeonhole Principle?

In the first problem, let n be the number of people shipwrecked on Gilligan’s Island. The maximum number of friends anyone can have is therefore n-1, and the minimum is 1 (each person has at least one friend). Imagine the pigeonholes to be the possible numbers of friends anyone can have. The number of such pigeonholes is n-1. By assigning each person to a pigeonhole, you are in effect assigning the number of friends that person has. Since n > n-1, by the Pigeonhole Principle, at least one pigeonhole has 2 or more pigeons (that is, at least 2 people have the same number of friends). Therefore, the probability that at least 2 persons have the same number of friends is equal to 1.

In general, the Pigeonhole Principle can be stated as

If n pigeons fly into m pigeonholes such that n > k\cdot m, for some integer k, then there is a pigeonhole with at least k+1 pigeons in it.

For example, suppose n = 11 pigeons fly into m = 5 pigeonholes. By letting k = 2, we have 11 > 2\cdot 5, and by the generalized Pigeonhole Principle, there is a pigeonhole with at least k+1 = 2+1 = 3 pigeons in it.

Going back to the second problem, we know that there are only 26 letters in the English alphabet. If we imagine the m = 26 letters to be pigeonholes and the n = 60 people to be pigeons, then by the generalized Pigeonhole Principle, we have

\displaystyle 60 > k\cdot 26

Solving for the largest integer k that satisfies this, we get k = 2. This means that at least k + 1 = 2 + 1 = 3 persons have surnames beginning with the same letter.
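The largest k with n > k\cdot m works out to \lceil n/m \rceil - 1, so the guaranteed occupancy k+1 of the fullest pigeonhole is just \lceil n/m \rceil. A minimal Java sketch of this computation (the class and method names are mine):

```java
public class Pigeonhole {
    // The largest k with n > k*m is ceil(n/m) - 1, so some pigeonhole
    // is guaranteed to hold at least k + 1 = ceil(n/m) pigeons.
    static int guaranteed(int pigeons, int holes) {
        return (pigeons + holes - 1) / holes;   // ceiling division on ints
    }

    public static void main(String[] args) {
        System.out.println(guaranteed(11, 5));  // 3 pigeons in some hole
        System.out.println(guaranteed(60, 26)); // 3 people share a first letter
    }
}
```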

Now try this problem for size. Suppose you have 20 black socks and 20 white socks in a dark room. Since you cannot see clearly in the dark, you decided to pick any sock at random. What is the minimum number of socks you need to pick in order to get a pair with the same color?

101 Ways To Skin A Cat

Figuratively speaking, of course. I’m against animal cruelty. However, in this article we will explore other ways to compute algorithmic complexity. In a previous post, we computed the running time of the code below:

for(int i=0;i<10;i++){
   for(int j=0;j<i;j++){
     for(int k=0;k<j;k++){
         System.out.println("Hello World");
     }
   }
}
Now that we have learned about the Big Oh, we don’t need to know exactly how many times “Hello World” will execute. We just need to determine the asymptotic behavior of the algorithm. To do this, we will now compute the running time of the above code snippet using elementary calculus.

In the previous post, we came up with an expression of the running time of the code snippet as

\displaystyle C = \frac{1}{2} \Big( \sum_{i=1}^n i^2 + \sum_{i=1}^n i \Big)

We know that the second term above is just

\displaystyle \frac{n(n+1)}{2}

Our derivation of the sum of squares in the first term was quite involved, so we will try to avoid repeating it here. If you take a look at the figure below, you can see the graph of f(x) = x^2. (The figure was created using OmniGraphSketcher for iPad.)

You can also see rectangles shaded in blue. These rectangles have a width of 1 unit, and the height of each rectangle is the number above it. Therefore, the area of each rectangle is numerically equal to its height. As you can see from the figure, the total area of the rectangles from 1 to 4 is equal to 1 + 4 + 9 + 16 = 1^2 + 2^2 + 3^2 + 4^2 = \sum_{i=1}^4 i^2. You can also observe that this area is less than the area under the curve, the area indicated in gray. The area under the curve is equal to the definite integral of x^2 evaluated from x=1 to x=5, written symbolically as

\displaystyle \int_1^5 x^2 dx

You might want to ask yourself why the upper limit of integration is x=5 and not x=4.

As can be seen clearly in the figure, the total area of the rectangles is less than the area under the curve, or

\displaystyle \sum_{i=1}^n i^2 \le \int_1^{n+1} x^2 dx

for any positive integer n.

Performing the integration and substituting the limits, we get

\displaystyle \sum_{i=1}^n i^2 \le \int_1^{n+1} x^2 dx = \frac{1}{3} x^3\Big|_1^{n+1}
\displaystyle = \frac{1}{3} \Big[ (n+1) ^3 -1 \Big]

Combining this result with C above, we get

\displaystyle C = \frac{1}{2} \Big( \sum_{i=1}^n i^2 + \sum_{i=1}^n i \Big) \le \frac{1}{3} \Big[ (n+1) ^3 -1 \Big] + \frac{n(n+1)}{2}
\displaystyle = \frac{2n^3 + 9 n^2 + 9 n}{6}
\displaystyle < \frac{9n^3 + 9 n^2 + 9 n}{6} \le \frac{9}{6}\Big( n^3 + n^3 + n^3\Big) = \frac{9}{2} n^3

By taking M = 9/2, we have

\displaystyle  |C| < M\cdot n^3 = O(n^3)

Therefore, the asymptotic complexity of the code snippet is O(n^3).
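The bound can also be double-checked empirically by counting the executions directly. In the sketch below (the class name and the generalization of the loop bound from 10 to n are mine), the exact count for these strict inner bounds is the binomial coefficient n(n-1)(n-2)/6, which sits comfortably under (9/2)n^3:

```java
public class TripleLoop {
    // Count how many times the innermost statement runs for loop bound n
    static long count(int n) {
        long c = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < i; j++)
                for (int k = 0; k < j; k++)
                    c++;                        // stands in for println
        return c;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 50; n++) {
            long c = count(n);
            if (c >= 4.5 * n * n * n) {         // the (9/2) n^3 bound
                throw new AssertionError("bound violated at n=" + n);
            }
        }
        System.out.println("count(n) < (9/2) n^3 for n = 1..50");
    }
}
```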