Divided We Compute, United We Reduce

Once upon a time, in a faraway village lay a dying old man. He called his sons to his deathbed and spoke to them one last time. He said, “Sons, see that bundle of sticks? Each one of you try to break it. The one who can break it will inherit all my riches.” Each son, being greedy, wanted all the riches for himself. So each one of them tried to break the bundle of sticks, but none of them succeeded. The old man asked his servant to untie the bundle and said to his sons, “Each one of you now take one stick and break it.” Without any effort, each son was able to break his stick. The old man said, “You see, when you unite, no task will be difficult. The riches I told you about were a lie. We are broke. When I’m dead, make sure you unite so that you can survive.”

Fast forward to modern times. You can think of the bundle of sticks as a complex problem that is itself composed of smaller problems. The sons are the processors of your computer. When a single processor is given the whole complex problem, it fails to solve it in a reasonable amount of time. When the complex problem is decomposed into smaller problems that are given to separate processors, each processor can solve its smaller problem quickly, thereby solving the big problem quickly as well.

The process of decomposing a problem into smaller problems and solving them on separate processors is called Parallel Computing. In this article, we will compute how fast a certain algorithm runs when parallelized. The problem we want to investigate is sorting an array of a million (2^{20}) integers.

Efficient Sorting

Suppose you have an array \{ a_1, a_2, a_3, \ldots, a_n \} that you want to sort based on pairwise comparisons. The sorted array is just one of the many permutations of \{a_1, a_2, a_3, \ldots, a_n\}. In fact, if you have n distinct objects to sort, there are exactly n! ways to arrange them, and exactly one of these arrangements is the sorted state. You can imagine the sorting process as a decision tree. Say, for example, we have the array A = \{a, b, c\}. To sort it, we first compare a with b, and there are two ways this can go: either a \le b or a > b. If a \le b, we then compare b and c, which gives either b \le c or b > c. As you can see from the diagram below, this is nothing but a decision tree.



Since this binary tree has n! leaves, its height is at least \lg(n!), which is therefore the minimum number of comparisons any comparison-based sort needs in the worst case. Moreover, we have

\displaystyle \lg(n!) = \lg\Big[ n \cdot (n - 1) \cdot (n-2) \cdots 1\Big] = \lg n + \lg (n-1) + \cdots + \lg 1 \le \underbrace{\lg n + \cdots + \lg n}_{n\text{ times}}
\displaystyle \lg(n!) \le n \cdot \lg n

There are efficient algorithms that sort with this O(n \lg n) complexity; merge sort is one example. Therefore, if you have an array of 2^{20} elements, the number of comparisons is

2^{20} \cdot \lg(2^{20}) = 2^{20} \cdot (20) = 20971520

That is, it takes about 20 million comparisons to sort an array of a million elements. Could we do any better? We can either upgrade the CPU of the machine doing the sorting or use two or more processors and divide the work among them. In this article, we are going to investigate the impact of dividing the work into smaller chunks and farming them out to other processors.

Divide and Conquer

Assume we have an array of n = 2^{20} elements that we need to sort, and suppose we have two identical processors we can use. Divide the array into two equal-sized subarrays. Give the first half to the first processor and the other half to the second processor. Apply an efficient sorting algorithm to the subarrays so that each processor produces a sorted subarray. We then combine the results of processor 1 and processor 2 into one big array by merging the two sorted subarrays. The diagram below illustrates the process of computation:



This is also known as the MapReduce pattern. Mapping is the process of assigning subsets of the input data to processors, where each processor computes a partial result. Reducing is the process of aggregating the results of the processors into the final solution of the problem.

The process of merging is straightforward. Given two sorted arrays, begin by comparing the first elements of each array. The smaller of the two occupies the next slot in the big array, and the second element of the array from which we took the smaller element becomes that array's first element. Repeat the process until all elements of both arrays have been placed in the big array. The diagram below illustrates the merging algorithm.




If you count the total number of comparisons needed to merge two sorted arrays into a single array of n elements, you will find that in the worst case it takes n - 1 comparisons. Therefore, the complexity of the merging process is O(n).
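To make this concrete, here is a small R sketch of the merge (the function merge_sorted and its comparison counter are our own illustration, not from any library):

merge_sorted <- function(a, b) {
	merged <- c()
	comparisons <- 0
	while (length(a) > 0 && length(b) > 0) {
		comparisons <- comparisons + 1   # one comparison per element placed
		if (a[1] <= b[1]) {
			merged <- c(merged, a[1]); a <- a[-1]
		} else {
			merged <- c(merged, b[1]); b <- b[-1]
		}
	}
	# one input is exhausted; append the rest without further comparisons
	list(merged = c(merged, a, b), comparisons = comparisons)
}

out <- merge_sorted(c(1, 4, 6), c(2, 3, 5))
out$merged        # 1 2 3 4 5 6
out$comparisons   # 5, which is n - 1 for n = 6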

Since each processor sorts a subarray of n/p elements (with p = 2 processors here), the sorting complexity is \frac{n}{p} \lg(\frac{n}{p}). Furthermore, since the merging process takes O(n) comparisons, the total complexity of the parallel sorting process is therefore

\displaystyle n/p \lg(n/p) + n

In our example, C=2^{20}/2 \lg(2^{20}/2) + 2^{20}=  11010048 comparisons compared to 2^{20} \lg(2^{20}) = 20971520 when run sequentially. For large values of n, n/p \lg(n/p) dominates n, therefore the complexity of the parallel algorithm is O(n/p \lg(n/p)).
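Putting the pieces together, here is a minimal sketch of the whole map-reduce sort in R, using the parallel package that ships with R (mclapply relies on process forking, so on Windows set mc.cores = 1 or use parLapply instead; merge_sorted is the helper sketched above):

library(parallel)

parallel_sort <- function(x, cores = 2) {
	# map: split the input into equal-sized chunks, one per processor
	chunks <- split(x, cut(seq_along(x), cores, labels = FALSE))
	sorted <- mclapply(chunks, sort, mc.cores = cores)
	# reduce: merge the sorted chunks into one sorted array
	Reduce(function(a, b) merge_sorted(a, b)$merged, sorted)
}

x <- sample(1000)
all(parallel_sort(x) == sort(x))   # TRUE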

Can we do any better?

For a given value of n, what do you think is the value of p that reduces the running time to O(n)? If we take n=2^{20} and plot complexity against p = \{ 2, 4, 8, 16, 32\} we get the diagram below.



In this diagram, we also plotted the horizontal line y = 2^{20}. The intersection of this line with the plot of \displaystyle f(p) = \frac{n}{p} \lg(\frac{n}{p}) gives us the value of p at which the total number of comparisons is already linear, that is,

\displaystyle f( p ) = n
\displaystyle \frac{n}{p} \lg(\frac{n}{p})  = n
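Numerically, the crossover is easy to see by tabulating f(p) in R (a quick sketch; the helper f is our own):

n <- 2^20
f <- function(p) (n / p) * log2(n / p)
p <- c(2, 4, 8, 16, 32)
cbind(p, comparisons = f(p), linear = n)
# f(p) equals n = 1048576 exactly at p = 16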

To get the value of p numerically, we have to solve the root of the equation

\displaystyle g( p ) = \frac{n}{p} \lg(\frac{n}{p}) - n = 0

Simplifying,

\displaystyle \frac{1}{p} \lg(\frac{n}{p}) - 1 = 0
\displaystyle \lg(\frac{n}{p}) = p
\displaystyle \frac{n}{p} = 2^p
\displaystyle p2^p - n = 0

Since this is a non-linear equation, we can solve it using Newton's method, which computes a root by successive approximation given an initial guess. Starting from a guess p_1, the root is approximated using the recursive formula (we use the subscript k for the iterates, since n already denotes the array size)

\displaystyle p_{k+1} = p_k - \frac{g(p_k)}{g'(p_k)}

where g'(p) is the first derivative of g(p). Applying the rules of derivatives, we get

\displaystyle g'(p) = p \cdot 2^p \ln 2 + 2^p

Substituting this into the formula for Newton's method, we get

\displaystyle p_{k+1} = p_k - \frac{p_k 2^{p_k} - n}{p_k 2^{p_k} \ln 2 + 2^{p_k}}

Below is R code that uses Newton's method to compute the root of the equation g(p) = 0.

g <- function(n, p) {
	p * 2^p - n
}

# the derivative g'(p) = p * 2^p * ln(2) + 2^p
gprime <- function(n, p) {
	p * 2^p * log(2) + 2^p
}

# Newton's method from initial guess p, stopping once successive
# iterates differ by less than 0.0001
newton <- function(p, n, iter) {
	tmp <- p
	for (i in 1:iter) {
		p <- p - g(n, p) / gprime(n, p)

		if (abs(p - tmp) < 0.0001) {
			break
		}

		print(p)
		tmp <- p
	}
	print(p)
}

Running this code with n = 2^{20} and an initial guess of 15, the iterates converge to p = 16:

> n <- 2^20
> newton(15, n, 100)
[1] 16.49159
[1] 16.08039
[1] 16.00238
[1] 16
[1] 16

Ignoring network latency, by distributing the input evenly across 16 processors we get an O(n) running time for an array of n = 2^{20} items. Instead of doing 20 million comparisons, the processors run simultaneously, and each one needs only about 1 million comparisons for its share of the sorting (plus the final merge).

In this age of multicore processors, parallel computing is fast becoming the norm rather than the exception. Learning to harness the power of multiple cores is an extremely handy skill to have.

Divine Comedy*

Last night I had a dream. In my dream, an angel showed me heaven and hell. The first place we went to was a large hall filled with round tables. Around each table were 5 souls, all of them wearing white. The place was quiet, and the only sound you could hear was the clicking of chopsticks as each soul ate spaghetti from his or her plate. The spaghetti was unlimited, served in a big bowl at the center of each table. The chopsticks were placed so that there was one stick between each pair of plates. No soul could eat unless he or she had taken both the left and the right chopstick. A soul that was not eating spent the time thinking.

I asked the angel, “What is this place?” The angel said, “This is hell.” I asked, “But why is this hell? This seems to be a very peaceful place, with people doing nothing but eat and think.” Then the angel pointed me to one table and said, “What do you see?” I looked at the table and saw souls starving but never dying. As I looked closer, I noticed that all of them had been able to pick up their left chopsticks but not their right chopsticks. Suddenly I realized that at many tables, souls were starving forever.

I was about to ask the angel why they starved when the angel suddenly showed me another place. People there also wore white, and they also had a bowl of spaghetti at the center of each table. Two things were noticeably different from hell. Each table was inside its own room, and each room had a waiting area that could accommodate 4 souls. There were 5 seats per table, but the maximum number of souls at a table was only 4. I observed the entire hall and did not see a single soul starving. Everyone had a happy look on their face, either eating or thinking. I asked the angel, “What is this place?” The angel said, “This is heaven. You can see that there are at most 4 souls per table. A soul that is not seated at the table is in the waiting area.”

Suddenly, I woke up from my dream without any idea of what it meant. What could the dream mean? I spent the whole day thinking about it.

Interpretation of the Dream

The chopsticks used by the souls are binary semaphores, since no two souls can hold the same stick at any time. To model the behavior of the souls in hell, we need an array of 5 semaphores (the chopsticks), each initialized to 1, and the following algorithm:

# loop forever
think()
wait(left chopstick)
wait(right chopstick)
eat()
signal(left chopstick)
signal(right chopstick)

The figure below shows a round table with 5 plates. The chopsticks are colored green and placed in between the plates. For a soul to eat, he/she must first hold both chopsticks.


Since the chopsticks are semaphores, no two souls can be holding the same chopstick at the same time. To see this, let’s make use of the following properties from the previous post:

1. \displaystyle \#CS = \#wait - \#signal . For chopstick i, the number of souls holding this chopstick at any time is equal to \#CS.
2. \displaystyle S.V = 1 - \#CS. Using this property to compute \#CS, we get

\displaystyle \#CS = 1 - S.V

Since S.V \ge 0, then \#CS \le 1, that is, at most one soul is holding the chopstick i at any given time.

The reason the souls starve is that it is possible for all of them to grab their left chopsticks at the same time, leaving not a single right chopstick to use.
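A tiny R sketch makes the deadlock concrete (the encoding is our own illustration, not a real concurrent program): once every soul holds a left chopstick, no right chopstick is free, and no one can ever eat.

holders <- integer(5)                 # holders[i] = soul holding chopstick i (0 = free)
left  <- function(i) i                # chopstick on soul i's left (our numbering)
right <- function(i) i %% 5 + 1       # chopstick on soul i's right
for (i in 1:5) holders[left(i)] <- i  # every soul grabs the left chopstick first
# a soul can proceed only if his/her right chopstick is free
can_eat <- sapply(1:5, function(i) holders[right(i)] == 0)
any(can_eat)   # FALSE: a circular wait, that is, deadlock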

The setup in heaven is subtly different. Instead of having 5 souls at the table at the same time, at least one of them waits for his or her turn in the waiting area (which is guarded by a room semaphore). To model this, we need the array of 5 chopstick semaphores as before, plus a room semaphore initialized to 4. The following is the algorithm used by the souls in heaven:

# loop forever
think()
wait(room)
wait(left chopstick)
wait(right chopstick)
eat()
signal(left chopstick)
signal(right chopstick)
signal(room)

The diagram below shows a round table inside a room. The plates and chopsticks are arranged as before, but at least one seat is empty, as indicated by the red plate. Any soul that is not inside the room is in the waiting area, shown by the stick figure.



Initially, all the souls are in the waiting area, and the door to the room is guarded by the room semaphore. Since the initial value of the room semaphore is 4, at most 4 souls can enter the room at a time. Inside the room, the souls take their seats; since at most 4 souls can be in the room, at least one seat is not taken. No soul is ever starved in this system. To see why, note that a soul[i] can be blocked in only one of three ways:

1. Blocked on his/her left chopstick. This means there is a soul[i-1] to his/her left that is using chopstick[i] as his/her right chopstick. Since a soul picks up the right chopstick only after holding the left, soul[i-1] already holds both chopsticks and is eating; by the assumption of progress, soul[i-1] will eventually execute signal(right chopstick). Hence, soul[i] will be able to pick up his/her left chopstick.
2. Blocked on his/her right chopstick. Soul[i] is not able to pick up his/her right chopstick because soul[i+1] is using this chopstick as his/her left chopstick and will never release it until he/she has eaten. But the only way for soul[i+1] to eat is if he/she is able to pick up the right chopstick which, by induction, is also being used by another soul to the right of soul[i+1]. However, since there is at least one seat that is vacant, there is a right chopstick that is not being used by any soul and hence by progress, at least one soul can eat and eventually soul[i] can eat.
3. Blocked from entering the room. If we use a first-come, first-served policy on the room semaphore, every soul eventually enters the room.

By using an extra locking mechanism, the souls in heaven are infinitely happier than the ones in hell.

* Disclaimer: This is a work of fiction and should not be taken literally, aside from its algorithmic content. In concurrency literature, this is called the Dining Philosophers problem. For more information, please consult the book “Principles of Concurrent and Distributed Programming” by M. Ben-Ari.

Semaphorically Speaking

When I was a child, our family lived about 60 kilometers from the city. Once or twice a month we went to the city, a trip that took about an hour and a half each way. On the way, we had to pass a stretch of very narrow road along the edge of a very deep cliff. Many people had already lost their lives there when the buses they rode plunged over the edge. To prevent this, a mechanism was set up that allowed only one bus at a time to enter that section of road, whether going one way or the other. To determine whether a bus was traversing the road at any moment, the crew at each end used a telephone to signal the other end, so that waiting buses knew when to take their turn. It was always a frightening experience to pass that road. In computer science, we can view the dangerous narrow section of road as a critical section and the buses traversing it as processes. The telephone system that signals each end of the road, guaranteeing mutual exclusion, plays the role of a semaphore.

We see semaphores every day. A traffic light is a semaphore that protects vehicles from bumping into each other in that critical section of the road called an intersection. Semaphores are also used in the navy; below is an example of flag semaphore used in the US Navy.


In the last post, we talked about concurrency and an algorithm to ensure the correctness of concurrent programs. In this post, we present another mechanism for ensuring correctness. That mechanism is called a semaphore and was first proposed by Edsger Dijkstra. A semaphore* is a data structure S composed of an integer V and a set L, modified by two atomic operations. The first operation is called wait and is defined as follows:

wait(S)
     if S.V > 0
        S.V <- S.V -1
     else
        S.L <- S.L union p
        p.state <- blocked

The wait function is called by a process p on a semaphore before it enters the critical section. If the value of S.V is greater than zero, S.V is decremented by 1 and the process p continues its execution. Otherwise, the process p is blocked and is put in the set S.L.

The signal function is defined as

signal(S)
    if S.L = empty set
       S.V <- S.V + 1
    else 
       select a process q in S.L
       S.L <- S.L - {q}
       q.state <- ready

When a process p is done with its critical section, it calls the signal operation on the semaphore S. If the set S.L has no blocked processes, signal increments the value of S.V by 1; otherwise, a process q is selected from the set S.L and its state is set to “ready”.

To use the semaphore construct, a typical concurrent program will be written like the one below:

# loop forever
noncriticalSection()
wait(S)
criticalSection()
signal(S)

Simple Properties of Semaphores

In this section, we list some simple properties of semaphores that we can use in proving correctness. By properties, we mean mathematical statements that remain invariant in every state of the computation.

1. From the definition of the wait and signal functions, we can see that the value of S.V never goes negative, that is

\displaystyle S.V \ge 0

2. If initially S.V = k, then we can compute the subsequent values of S.V by counting the number of calls to wait and signal (with some qualifications). If k processes call the wait function, the value of S.V goes down to 0. If all of these processes then call the signal function, the value of S.V goes back to k. If, however, a process calls wait when the value of S.V is already 0, then S.V simply remains zero and the process is blocked; we say that this call to wait has failed, and we do not count it. Similarly, if a process calls signal and, seeing that S.L is not empty, unblocks some process, the value of S.V remains the same; this call to signal is also not counted. Having laid down the rules for counting the calls to wait and signal, we can see that

\displaystyle S.V = k - \#wait + \#signal

where \#wait and \#signal are the number of calls to wait and signal, respectively.

3. At any time, the number of processes executing the critical section is found by counting the number of processes that entered the critical section by a successful call to wait, minus the number of processes that left the critical section by a counted call to signal:

\displaystyle \#CS = \#wait - \#signal
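To get a feel for these properties, here is a small R sketch (our own single-threaded model, not a real concurrent program) that replays a random sequence of wait and signal calls and checks properties 1 and 2 after every step, using the counting rules above:

new_semaphore <- function(k) list(V = k, L = integer(0))

sem_wait <- function(S, pid) {
	if (S$V > 0) {
		S$V <- S$V - 1          # successful wait: the process proceeds
		S$counted <- TRUE
	} else {
		S$L <- c(S$L, pid)      # failed wait: the process is blocked
		S$counted <- FALSE
	}
	S
}

sem_signal <- function(S) {
	if (length(S$L) == 0) {
		S$V <- S$V + 1          # counted signal: no one was waiting
		S$counted <- TRUE
	} else {
		S$L <- S$L[-1]          # unblock a process; S.V stays the same
		S$counted <- FALSE      # an unblocking signal is not counted
	}
	S
}

set.seed(1)
k <- 3; S <- new_semaphore(k)
n_wait <- 0; n_signal <- 0
for (step in 1:1000) {
	if (runif(1) < 0.5) {
		S <- sem_wait(S, step)
		if (S$counted) n_wait <- n_wait + 1
	} else if (n_wait - n_signal > 0) {      # signal only if #CS > 0
		S <- sem_signal(S)
		if (S$counted) n_signal <- n_signal + 1
	}
	stopifnot(S$V >= 0)                      # property 1
	stopifnot(S$V == k - n_wait + n_signal)  # property 2
}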

Correctness of the Semaphore Solution for Two Processes

As we discussed in the last post, to prove correctness we need to show that the algorithm guarantees mutual exclusion and is free from deadlock and starvation.

1. Mutual exclusion – Let S.V = 1 initially. At any time during the execution, the number of processes in the critical section is given by property 3 above, that is, \#CS = \#wait - \#signal. From property 2,

\displaystyle S.V = 1 - \#CS
\displaystyle S.V + \#CS = 1
\displaystyle \#CS = 1 - S.V

Since S.V \ge 0 by property 1, it follows that \#CS \le 1. That is, there can be at most one process in the critical section at any time.

2. Freedom from deadlock – Two processes can deadlock only if the value of S.V = 0 and neither is in the critical section (\#CS = 0). By properties 2 and 3,

\displaystyle S.V = 1 - \#CS

Since S.V = 0 and \#CS = 0, substituting this to the equation above gives us

\displaystyle 1 = 0,

a contradiction.

3. Freedom from starvation – Suppose a process q, upon calling wait, sees that S.V = 0 and is blocked forever in S.L. By properties 2 and 3,

\displaystyle \#CS = 1 - S.V = 1 - 0 = 1

which means that the other process p is in the critical section. By the assumption of progress, process p will eventually call signal, which will pick one process from S.L to enter the critical section. Since the only blocked process is q, process q can then enter its critical section, contradicting the assumption that it is starved.

* For more information, consult the book “Principles of Concurrent and Distributed Programming” by M. Ben-Ari.

A Mutually Beneficial Arrangement

It’s been said that the programmers of today are the pizza delivery boys of tomorrow. This happened the last time the world witnessed a severe financial crisis. Usually, those who cannot cope with the current trends in IT are the first ones to go. One trend that is here to stay is the increasing number of cores in our computers. To maximize the power of a multicore architecture, the programmers of today should be able to do concurrent programming correctly. Concurrent programming is the art of using the extra CPU capacity of your computer by exploiting parallelism to speed up your applications. This is a non-trivial task and requires a different mindset. In this series of articles, we will explore what concurrent programming is and its mathematical basis.

Imagine a concurrent program composed of two threads trying to read or write values in an array. Suppose we have an array of 10 numbers and 2 threads. Thread 1 writes a number to the next empty array location; if the array is full, thread 1 writes nothing and waits until at least one location is empty. Thread 2 reads and deletes the last number in the array; if the array is empty, thread 2 reads nothing. How do we ensure that thread 1 will not write past the array length? How do we ensure that thread 2 will not read an empty element?

The section of code that updates or reads the array is called the critical section. This is the portion of the code where we need to ensure that only one thread is executing at a time. To do this, we employ a mechanism for the two threads to know whose turn it is to enter the critical section. Below is code that tries to solve this problem. First we define a global variable that is shared by both threads. This variable, called turn, specifies which thread can enter the critical section. The value of turn is either 1 or 2: a value of 1 means that thread 1 can enter the critical section, while a value of 2 means that thread 2 can enter it.


global turn = 1

This code is to be executed by thread 1:

#loop forever
1 execute non-critical section
2 await turn = 1
3 execute critical section
4 turn = 2

The code to be executed by thread 2 is similar:

#loop forever
1 execute non-critical section
2 await turn = 2
3 execute critical section
4 turn = 1

The critical section has to be coded in such a way that it takes as short a time as possible. We shall assume that any thread that enters the critical section eventually exits from it. This is called the progress assumption.

We can imagine the execution of this program by the two threads as being described by a state specified by three numbers:

1. the pointer to the line that thread 1 is to execute.
2. the pointer to the line that thread 2 is to execute.
3. the value of the turn variable.

For example, the initial state is described by the triple (1,1,1). This means that thread 1's pointer is at line 1, thread 2's pointer is at line 1, and the value of the turn variable is 1. Let us first make it clear that when the pointer is at a line, the thread has not yet executed that line; it only means that if the CPU scheduler allows the thread to run, it will execute that line next. For example, if the pointer of thread 1 is at line 4, it does not mean that the value of turn = 2. It only means that when thread 1 moves from line 4 back to line 1, the value of turn will have been set to 2.

We can draw the entire execution of the two threads using a state diagram as shown below:


Let’s try to understand this diagram. The execution begins with the state (1,1,1), which means that thread 1's pointer is at line 1, thread 2's pointer is at line 1, and the value of turn = 1. This state can go either to (2,1,1) or to (1,2,1), depending on which thread the CPU scheduler runs first. The state transitions are indicated by arrows. From the diagram, you will see that when the value of turn = 1, thread 2 cannot go past line 2. In the same way, when the value of turn = 2, thread 1 cannot go past line 2.

Correctness Properties of Concurrent Programs

Writing concurrent programs is not easy. You have to demonstrate that your program is correct. What does correct mean? A concurrent program is correct if:

1. Mutual exclusion holds. At most one thread is in the critical section at any time.
2. No deadlock occurs. This means that no thread holds a resource that is needed by another thread, and vice versa. If this occurs, then each thread will be waiting for the other thread to release the resource and the application will appear to hang.
3. No starvation occurs. If a thread wants to enter the critical section, then it should eventually be able to do so.

Is our algorithm correct? First, the algorithm satisfies mutual exclusion. There is no state in which both thread pointers are pointing to line 3, which is the critical section; that is, you cannot reach the states (3,3,1) or (3,3,2). Second, there is no deadlock: the two threads do not get stuck in their preprotocols. Suppose that the value of turn = 1 and both threads are at line 2. Since turn = 1, thread 1 will enter its critical section. By the assumption of progress, it will exit its critical section and set turn = 2, which then allows thread 2 to enter its critical section. Therefore, no thread gets stuck trying to enter its critical section. Both claims can be checked mechanically, as the sketch below shows.
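Below is a small R sketch (the state encoding is our own) that enumerates every state (i, j, turn) reachable from (1,1,1) and confirms that mutual exclusion holds and that every reachable state has at least one enabled move:

step_thread <- function(s, who) {
	line <- s[who]; turn <- s[3]
	if (line == 1) {                   # finish the non-critical section
		s[who] <- 2
	} else if (line == 2) {            # await turn = who
		if (turn != who) return(NULL)  # blocked: no transition possible
		s[who] <- 3
	} else if (line == 3) {            # finish the critical section
		s[who] <- 4
	} else {                           # line 4: hand over the turn
		s[who] <- 1
		s[3] <- if (who == 1) 2 else 1
	}
	s
}

frontier <- list(c(1, 1, 1))
seen <- character(0)
reachable <- list()
while (length(frontier) > 0) {
	s <- frontier[[1]]; frontier <- frontier[-1]
	key <- paste(s, collapse = ",")
	if (key %in% seen) next
	seen <- c(seen, key)
	reachable[[length(reachable) + 1]] <- s
	for (who in 1:2) {
		nxt <- step_thread(s, who)
		if (!is.null(nxt)) frontier[[length(frontier) + 1]] <- nxt
	}
}
# mutual exclusion: no reachable state has both threads at line 3
any(sapply(reachable, function(s) s[1] == 3 && s[2] == 3))   # FALSE
# no deadlock: every reachable state has at least one enabled transition
all(sapply(reachable, function(s)
	!is.null(step_thread(s, 1)) || !is.null(step_thread(s, 2))))  # TRUE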

There is, however, starvation. There is no assumption of progress in the non-critical section. This means that if turn = 1 and thread 1 gets stuck in its non-critical section, thread 2 is prevented from ever entering its critical section. Therefore, the algorithm is not correct, as it does not satisfy all three conditions.

In the next post, we will examine a mechanism, called a semaphore, that is able to solve this problem.

Way Up High A Binary Tree

Way up high in the binary tree
Two binary bits smiled at me
I shook that tree as hard as I could
Down came the binary bits,
Mmmm–were they good! -based on a children’s song

Now it turns out that this binary tree is an AVL tree. I wonder how high this tree is! Why does it concern us? The reason is that the height gives us the worst-case running time of a search in the tree. An AVL tree maintains its height by making sure the difference in height between the left and right subtrees of every node is at most 1. If N_h is the minimum number of nodes contained in an AVL tree of height h, then N_h is equal to the number of nodes in the left subtree plus the number of nodes in the right subtree plus 1 (the root node). In this worst case, the two subtrees differ in height by exactly 1, therefore

\displaystyle N_h = N_{h-1} + N_{h-2} + 1

with initial conditions N_0 = 0 and N_1 = 1.

By adding 1 to both sides of this recurrence relation we get

\displaystyle N_h + 1 = N_{h-1} + 1 + N_{h-2} + 1

If we let b_h = N_h + 1, we can rewrite the above recurrence relation as

\displaystyle b_h = b_{h-1} + b_{h-2}

The initial conditions are now

b_0 = N_0 + 1 = 0 + 1 = 1 and b_1 = N_1 + 1 = 1 + 1 = 2

Now this is interesting! This recurrence relation is very similar to the Fibonacci recurrence relation except that the initial conditions are different. From the previous post, we found that the general solution of this recurrence relation is

\displaystyle b_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants yet to be determined. Let us solve this recurrence relation in the general case where b_0 = s and b_1=t. Computing the first 2 terms of the recurrence relation, we get two simultaneous equations in \alpha_1 and \alpha_2.

\displaystyle b_0 = s = \alpha_1 + \alpha_2
\displaystyle b_1 = t = \alpha_1 \phi + \alpha_2 (1-\phi)

Solving for \alpha_1 from the first equation, we get

\displaystyle \alpha_1 = s- \alpha_2

Plugging this into the second equation and solving for \alpha_2

\displaystyle t = (s-\alpha_2) \phi + \alpha_2 - \alpha_2\phi
\displaystyle t = s\phi + \alpha_2(1- 2\phi)
\displaystyle \alpha_2 = \frac{t-s\phi}{1-2\phi}

From which we solve for \alpha_1

\displaystyle \alpha_1 = s - \frac{t-s\phi}{1-2\phi}
\displaystyle = \frac{s(1-\phi) - t }{ 1-2\phi}

Substituting this to the recurrence relation of b_n, we have

\displaystyle b_n = \frac{s(1-\phi) -t }{1-2\phi} \phi^n + \frac{t - s\phi}{1-2\phi} (1-\phi)^n

Expanding this expression you’ll end up with a rather complex expression involving cross terms of \phi^n and (1-\phi)^n

\displaystyle b_n = \frac{s\phi^n ( 1- \phi) - t\phi^n + t(1-\phi)^n - s\phi(1-\phi)^n}{1-2\phi}

But that is where the complexity ends. Observe that

\displaystyle \phi(1-\phi) = \frac{1+\sqrt{5}}{2}\cdot \frac{1-\sqrt{5}}{2}
\displaystyle = \frac{1-5}{4} = -1

Using this result, we can write

\phi^{n}(1-\phi) = \phi^{n-1}\phi (1-\phi) = -\phi^{n-1}
\phi (1-\phi)^n = \phi (1-\phi) (1-\phi)^{n-1} = - (1-\phi)^{n-1}

Furthermore, we also observe that

\displaystyle 1-2\phi = 1 - \frac{1+ \sqrt{5}}{2} = -\sqrt{5}

Using all these results we have

\displaystyle b_n = \frac{-s\phi^{n-1} - t\phi^n + t(1-\phi)^n + s(1-\phi)^{n-1}}{-\sqrt{5}}
\displaystyle = s \Big[ \frac{\phi^{n-1} - (1-\phi)^{n-1}}{\sqrt{5}}\Big] + t \Big[ \frac{\phi^n - (1-\phi)^n}{\sqrt{5}}\Big]

Notice that the expressions in square brackets are just the Fibonacci numbers f_{n-1} and f_n, respectively. Using them in the expression for b_n, we can write it in terms of Fibonacci numbers:

\displaystyle  b_n = sf_{n-1} + tf_n

Returning to our computation of the height of the AVL tree, since b_0 = s = 1 and b_1 = t = 2, we finally have

\displaystyle b_n = f_{n-1} + 2f_n

We can further simplify this by

\displaystyle b_n = f_{n-1} + f_n + f_n
\displaystyle = \Big(f_{n-1} + f_n\Big) + f_n
\displaystyle = f_{n+1} + f_n
\displaystyle = f_{n+2}

Since b_h = N_h + 1,

\displaystyle N_h + 1 = f_{h+2}
\displaystyle N_h = f_{h+2} - 1

The formula above expresses the number of nodes of the AVL tree in terms of the height h. In terms of the golden ratio, we can write the last expression as

\displaystyle N_h = \frac{\phi^{h+2} - (1-\phi)^{h+2}}{\sqrt{5}} - 1
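As a quick sanity check, here is a small R sketch (fib and nodes are our own helpers) comparing the recurrence for N_h against the closed form:

phi <- (1 + sqrt(5)) / 2
fib <- function(n) round((phi^n - (1 - phi)^n) / sqrt(5))
nodes <- function(h) if (h == 0) 0 else if (h == 1) 1 else nodes(h - 1) + nodes(h - 2) + 1
sapply(0:10, nodes)                       # 0 1 2 4 7 12 20 33 54 88 143
sapply(0:10, function(h) fib(h + 2) - 1)  # the same sequence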

Approximating the nth Fibonacci Number

We can further simplify the above expression by observing that the nth Fibonacci number is approximately equal to \phi^n/\sqrt{5}. To see this, we compute the distance of f_n from \phi^n/\sqrt{5}:

\displaystyle |f_n - \frac{\phi^n}{\sqrt{5}}| = |\frac{\phi^n - (1-\phi)^n}{\sqrt{5}} - \frac{\phi^n}{\sqrt{5}} | = |\frac{(1-\phi)^n}{\sqrt{5}}| < \frac{1}{\sqrt{5}} < \frac{1}{2}

Using this approximation, we can write

\displaystyle N_h \approx \frac{\phi^{h+2}}{\sqrt{5}} -1

So if n is the number of nodes in an AVL tree, we can estimate the height using the above formula, that is

\displaystyle n \approx \frac{\phi^{h+2}}{\sqrt{5}} -1
\displaystyle n + 1 \approx \frac{\phi^{h+2}}{\sqrt{5}}
\displaystyle \sqrt{5}(n+1) \approx \phi^{h+2}
\displaystyle h \approx \log_{\phi} \Big( \sqrt{5} (n+1)\Big) -2

Using a change of base

\displaystyle h \approx \frac{\log_2 \Big( \sqrt{5}(n+1)\Big)}{\log_2 \phi} -2
\displaystyle \approx \frac{\log_2\Big( \sqrt{5}(n+1)\Big) - 2\log_2\phi}{\log_2 \phi}
\displaystyle \approx \frac{1}{\log_2 \phi} \cdot \log_2\Big(\frac{\sqrt{5}(n+1)}{\phi^2}\Big)

Again using the Fibonacci number approximation, we have \phi^2/\sqrt{5} \approx f_2 = 1, so the factor \sqrt{5}/\phi^2 inside the logarithm is approximately 1. This simplifies our estimate to

\displaystyle h \approx 1.44\log_2(n+1)

where 1/\log_2 \phi \approx 1.44.

If we have an AVL tree containing a million names, then the worst-case number of comparisons in a search is

\displaystyle 1.44\log_2(1000000 + 1) \approx 28.70.
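As a quick check in R (note that 1/\log_2 \phi is about 1.4404, which we round to 1.44):

phi <- (1 + sqrt(5)) / 2
1 / log2(phi)           # 1.440420
1.44 * log2(10^6 + 1)   # 28.70146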

That's incredibly efficient!

Eating The Fruit Of The AVL Tree

We have seen that searching for a name in a sorted array of a million names is very efficient using binary search: the running time is logarithmic, which means we need only about 20 comparisons, in the worst case, to figure out whether our search is successful. We have also seen that a binary search tree lets us keep this efficiency on a dynamic array of names, that is, when the list of names grows. However, we also showed that if the BST is fed a sorted array of names, the search becomes very inefficient. In fact, in the worst case, we need to compare a million times before we realize that a name is not in the tree.

In the last post, we built a BST from a sorted list of animals. Below is the result of that construction, repeated here for convenience.

This is a sad state of affairs. We have found a very good data structure for searching, but it fails when fed sorted data. Is there anything we can do to minimize the height of this tree? Fortunately, there is! The data structure we need is called a self-balancing binary search tree, and there are many of them. We will take a look at just one, called the AVL tree*.

A self-balancing binary search tree maintains its height by keeping track of a variable in each node called the balance factor. The balance factor of a node x is defined as

the height of the left subtree of x minus the height of the right subtree of x.

Below is an example of a binary tree with balance factors indicated inside the circles. The triangles in the diagram are subtrees; they may or may not be present in a node, but we show them for a reason that will become clear later.

An AVL tree employs mechanisms to keep the difference in height between subtrees at -1, 0, or 1. These mechanisms are called rotations, and only four of them are needed to balance an AVL tree. The figure below shows how rotations work.

The first thing to look for after inserting a node is a balance factor of -2 or +2. In the first tree, the balance factor is +2; this means that the left subtree is imbalanced. Next, examine the balance factor of the left child. If that balance factor is -1, then the child's right subtree is higher than its left. To balance this, the child in yellow replaces its parent (the green node), which now becomes the yellow node's left child. What used to be the left child of the yellow node is now the right child of the green node.

However, this rotation does not change the balance factor of the root node. Another rotation establishes the balance. It involves replacing the root node (in red) with the yellow node and making the root node the right child of the yellow node. What used to be the right child of the yellow node now becomes the left child of the red node.

When the balance factor of the root node is -2, it means that the right subtree is imbalanced. If you observe carefully, this is just the mirror image of the case above, and the rotations can easily be figured out from the figure.
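To make the rotations concrete, here is a small R sketch on trees represented as nested lists (the representation and the function names are our own; a full AVL insert routine would invoke these rotations whenever a balance factor reaches -2 or +2):

node <- function(key, left = NULL, right = NULL) list(key = key, left = left, right = right)
height <- function(t) if (is.null(t)) 0 else 1 + max(height(t$left), height(t$right))
balance <- function(t) height(t$left) - height(t$right)

# the right child rises: used when a node's balance factor is -2
rotate_left <- function(t) {
	r <- t$right
	t$right <- r$left   # the right child's left subtree moves over
	r$left <- t
	r
}

# mirror image: used when a node's balance factor is +2
rotate_right <- function(t) {
	l <- t$left
	t$left <- l$right
	l$right <- t
	l
}

# the right-leaning chain alligator -> bat -> cobra from the text below
t <- node("alligator", right = node("bat", right = node("cobra")))
balance(t)                # -2: the right subtree is too high
balance(rotate_left(t))   #  0: bat is now the root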

Balancing the Animal Tree

Let us show how we can use these transformations to fix our tree. From the figure below, you can see that the first imbalance occurs after we insert cobra: at this point, the balance factor of the “alligator” node is -2. Rotating this tree by making bat the root node fixes the tree.

We then insert dog and fox before we encounter another imbalance. Replacing cobra with dog and making cobra the left child of dog fixes the imbalance.


Inserting horse into this new configuration creates an imbalance at the root of the tree. Replacing bat with dog and making cobra the right child of bat fixes the tree.

You can follow the rest of the process until we have all ten words inserted into the BST. The result is a tree with a maximum height of 4.


What does this mean? It means that it only takes a maximum of 4 comparisons to find any animal in this structure. It also means it only takes a maximum of 4 comparisons to determine if an animal is not in the structure.

In the next post, we are going to compute the height of an AVL tree to get an estimate of the complexity of the search.

* The AVL tree is named after its inventors, G.M. Adelson-Velskii and E.M. Landis.

Mathematical Beauty

It is said that beauty is in the eye of the beholder. As a person is more than the sum of his or her parts, physical beauty is just one aspect of total beauty. Physical beauty, however, has some sort of mathematical standard. Why is it that we find movie stars beautiful? What is the standard that defines beauty? That standard is hidden in the sequence of numbers 0, 1, 1, 2, 3, 5, 8, …

The careful eye will notice that, beginning with the third term, each term in this sequence is equal to the sum of the previous two. For example, the third term is 0 + 1 = 1, the fourth term is 1 + 1 = 2, and so on. This sequence is called the Fibonacci sequence, after Leonardo of Pisa, also known as Fibonacci. We can model this sequence as a recurrence relation. If we let F_n be the nth Fibonacci number, then by definition it is equal to the sum of the previous two Fibonacci numbers, F_{n-1} and F_{n-2}. Mathematically, we write this as

\displaystyle F_n = F_{n-1} + F_{n-2}

Looking at the recurrence relation above, we realize that it is an instance of a Homogeneous Linear Recurrence Relation With Constant Coefficients. The good news is we know how to solve this kind of recurrence relation. The characteristic equation is

\displaystyle r^2 - r -1 = 0

Using the quadratic formula, the roots are

\displaystyle r = \frac{ -(-1) \pm \sqrt{ (-1)^2 - 4\cdot 1 \cdot (-1)}}{2\cdot 1}
\displaystyle = \frac{1 \pm \sqrt{5}}{2}

If we define

\displaystyle \phi = \frac{1 + \sqrt{5}}{2}

then we can write the negative root as

\displaystyle \frac{1 - \sqrt{5}}{2} = 1-\phi

From the previous post, we know that we can express the solution of F_n as a linear combination of \phi^n and (1-\phi)^n. Therefore, the solution of the Fibonacci recurrence relation is just

\displaystyle F_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants. Since F_0 = 0 and F_1 = 1, we can solve for \alpha_1 and \alpha_2:

\displaystyle F_0 = 0 = \alpha_1 \phi^0   + \alpha_2 (1-\phi)^0 = \alpha_1 + \alpha_2
\displaystyle F_1 = 1 = \alpha_1 \phi^1   + \alpha_2 (1-\phi)^1 = \alpha_1 \phi+ \alpha_2 (1-\phi)

From the first equation, we solve for \alpha_1:

\displaystyle \alpha_1 = - \alpha_2

Substituting this into the second equation and solving for \alpha_2 we get:

\displaystyle -\alpha_2 \phi + \alpha_2 - \alpha_2 \phi = 1
\displaystyle -2\alpha_2\phi + \alpha_2 = 1
\displaystyle \alpha_2 (1 - 2\phi) = 1

Observe that

\displaystyle  1-2\phi = 1 - 2\frac{1+\sqrt{5}}{2} = 1 - (1 + \sqrt{5}) = -\sqrt{5}

This means that

\displaystyle -\sqrt{5} \cdot \alpha_2 = 1
\displaystyle \alpha_2 = - \frac{1}{\sqrt{5}}

and

\displaystyle \alpha_1 = \frac{1}{\sqrt{5}}

Therefore, the solution to the Fibonacci recurrence relation is

\displaystyle F_n = \frac{\phi^n - (1-\phi)^n}{\sqrt{5}}
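A quick check in R confirms that the closed form reproduces the sequence (binet is our own name for this helper):

phi <- (1 + sqrt(5)) / 2
binet <- function(n) (phi^n - (1 - phi)^n) / sqrt(5)
round(binet(0:10))   # 0 1 1 2 3 5 8 13 21 34 55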

Beauty and Fibonacci numbers

After all these tedious computations, so what? What does Fibonacci have to do with beauty? If you take the ratios of successive Fibonacci numbers, they converge to a limit:

\displaystyle \lim_{n \rightarrow \infty} \frac{F_{n+1}}{F_n} = \phi

Here is a list of the first few Fibonacci numbers starting at 1, together with the corresponding ratios:

1        1       1        1.000000
2        2       1        2.000000
3        3       2        1.500000
4        5       3        1.666667
5        8       5        1.600000
6       13       8        1.625000
7       21      13        1.615385
8       34      21        1.619048
9       55      34        1.617647
10      89      55        1.618182
11     144      89        1.617978
12     233     144        1.618056
13     377     233        1.618026
14     610     377        1.618037
15     987     610        1.618033
16    1597     987        1.618034
17    2584    1597        1.618034
18    4181    2584        1.618034
19    6765    4181        1.618034
20   10946    6765        1.618034

The first column in the table above is just a line number. The second column is F_{n+1}, the third column is F_n, and the last column is the ratio F_{n+1}/F_n. You can see that the ratio approaches the value \phi = 1.618034\ldots
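The table is easy to reproduce in R:

f <- c(1, 1)                  # F_1 and F_2
for (i in 3:21) f[i] <- f[i - 1] + f[i - 2]
ratios <- f[2:21] / f[1:20]   # F_{n+1} / F_n
cbind(n = 1:20, ratio = round(ratios, 6))
# the ratios settle down to 1.618034, the golden ratio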

The constant \phi is called the Golden Ratio, a proportion already studied by the ancient Greeks. Structures whose proportions follow the golden ratio are often perceived as beautiful. Below is an image of a rectangle with labeled sides.



The rectangle is called a Golden Rectangle if

\displaystyle \frac{a+b}{a} = \frac{a}{b} = \phi.

The Golden Ratio can be found in many aesthetic works; Leonardo Da Vinci, for example, is said to have used it in his Vitruvian Man. This is probably why the Fibonacci sequence was featured at the beginning of the movie (and book) The Da Vinci Code.