Mt. Mayon’s Mathematical Beauty and Elegance

Mayon volcano refused to show its glory last weekend. It was like a beautiful but shy girl that hides her face behind a veil of clouds. You know the beauty that it is hiding but it keeps you in suspense and wants you go to back there and try your luck again to see it in all its splendor.

Fortunately, Mayon is a perfect cone volcano and it does not take a lot of effort to imagine how it looks like. I was inspired by Mayon to generate its shape using some computer algebra tools at my disposal. The first tool I used is the tool I always bring with me, the iPhone. The iPhone has a neat computer algebra system called Spacetime. It can do a lot of things you would expect from a Computer Algebra System among which is plotting surfaces. To plot the surface of Mayon, I used a coordinate system that is suited to its symmetrical shape. This coordinate system is the cylindrical coordinate system.

Cylindrical Coordinate System

Any point in three dimensions can be specified by three numbers. In the rectangular coordinate system, which we are very familiar with, these three numbers correspond to the x, y and z coordinates of the point. Shown below is how we specify a point using the rectangular coordinate system.



The rectangular coordinate system is not always the most convenient system to use. For example, to specify a circle in rectangular coordinates, you need to specify it as

\displaystyle x^2 + y^2 = r^2

where r is the radius of the circle.

In fact, we can specify a circle even simpler by using polar coordinates. We only need to specify the radius and the angle. For a circle of radius 1, the equation is

\displaystyle r=1

How simple can it get?

In generating the cone, we use a coordinate system that naturally fits the geometry of the cone. This coordinate system is called the cylindrical coordinate system. Imagine a point p in space and a cylinder that contains this point. We can specify the position of this point in the cylinder by three numbers. The first number specifies the height of this point from the xy plane. The second number specifies the distance of this point from the z axis. The last number specifies the angle \phi that this point makes from the x axis. Below is a diagram of the cylindrical coordinate system.


Specifying the Cone Using Cylindrical Coordinates

Since the cone has circular symmetry around the z axis, we only need to specify the value of z with respect to r. Any plane that passes along the z axis intersects the cone through a line that lies on the cone. The diagram below shows the line of intersection of a cone of height h and whose base has radius b.


The equation of this line has the form

\displaystyle z=mr + k

where m is the slope and k is the z intercept. When r=0, we have z=k. When z=0 we have m= - h/b. Therefore, we can specify z as:

\displaystyle z = - \frac{h}{b} r + h.

Mayon has a base approximately 4 times its height. Using h=4 and b=8, we can then specify the cone as

\displaystyle z = - 0.5 r + 4.
\displaystyle 0 \le r \le b
\displaystyle 0 \le \phi \le 2\pi

Using the function CylindricalPlot3D of spacetime, we get the plot of the cone as shown below.



Surface Plot Using Rectangular Coordinates

The plot of the cone using cylindrical coordinates is elegant. It used the circular symmetry of the cone around the z axis. We can still plot the cone using rectangular coordinates but it is quite messy. To do this, I used another tool called R. It’s a statistical computing tool but it has a great 3D plotting engine. Here is the R code I used to generate the plot:

# interval is the distance between points in the x or y axis
interval=2

# noisemean is the mean of the gaussian noise we want to introduce into the plot to
# simulate rugged mountain
noisemean=0

# noisedev is the standard deviation
noisedev=1

# addnoise is a boolean. It specifies if we want to add the noise
addnoise=T

# generate the x and y vectors below using the interval between points
x=seq(-10,10,interval)
y=seq(-10,10,interval)

# generate the matrix of squares for the y axis
l=length(x)
yy=y^2 %*% t(rep(1,l))

# generate the matrix of squares for the x axis, which turns out to be just the transpose of yy
xx=t(yy)

# compute the zz coordinate: zz = -0.5* r +4
# but first let's compute r^2, let's call it rr

rr=xx + yy
zz = -0.5 * sqrt(rr) + 4

# plot only values greater than or equal to  0
myfunc=function(n){
        if(n<0){
                0
        } else {
                n
        }
}

mayon= mapply(myfunc,zz)

# add some noise
if(addnoise){
        noise=0.1*matrix(rnorm(l^2,noisemean,noisedev),l,l)
        mayon=mayon+ noise
}

# plot using rgl
library(rgl)
persp3d(x,y,mayon,col="yellowgreen",aspect=c(4,4,1))

Below is the result of running the above code.


This how I see mayon through the lens of mathematics. But I definitely want to go back there and see its imposing grandeur.

Advertisements

Chances Are

I just came from a great weekend escapade, with my wife and two colleagues, in one of the beautiful regions of the Philippines. Bicol is home to Mayon Volcano, one of the seven perfect coned volcanoes on Earth. The scenery is so green and the air so refreshing. It was one of the best trips I ever made in the Philippines.



The trip was also full of coincidences. We met a colleague in one of the restaurants in Naga City where we had our dinner. In the same restaurant we also met one of our contacts in Ateneo de Naga. On the way home, we met another colleague at the airport. But the first coincidence happened on our way to Legazpi City where one of my companions (i’ll hide the name to protect the innocent) met a good friend in the airplane seated next to him. It was a pleasant surprise for my companion and he asked “What are the odds of that happening”? That’s a good question which got me thinking later that night.

To simplify the problem, let us not compute the odds of my companion and his friend to be in the same plane going to Bicol. Rather, let us compute the odds of him being seated next to his friend. The plane was an airbus A320 and about 180 seats. The seats are laid out in such a way that there is a middle isle and each row is composed of three seats. Let us draw a row of seats and assume we have persons A and B. How many ways can they be seated in that row in such a way that they sit next to each other? Below is a diagram on the ways two persons can be seated next to each other.



As you can see, there are four ways for them to be seated next to each other for any given row. In the first arrangement, we have A, B seated in that order. The third seat can be occupied by anyone. Having chosen the seats for A and B, how many choices do we have for the third person? Since there are 180 seats minus two, the third person has 178 other seats to choose from. Once person number 3 chooses one seat, the fourth person can then choose from the remaining 177 seats. Doing this for all other passengers, you will come up with 178! ways of assigning seating arrangements for the 180 passengers such that persons A and B are seated next to each other. Now, that is just for the case where A and B are in the first two seats. For the other three cases, you will see that it also takes 178! ways for each case. Since there are 4 cases per row and 60 rows, we have

\displaystyle 4\cdot 60\cdot 178!

ways to have two persons sit next to each other on a 180 seater plane. Since the total number of ways to assign passengers to 180 seats is 180!, then the probability of having two persons seat next to each other is

\displaystyle \frac{4\cdot 60\cdot 178!}{180!}
\displaystyle = \frac{4\cdot 60\cdot 178!}{180\cdot 179\cdot 178!}
\displaystyle = \frac{4\cdot 60}{180\cdot 179}
\displaystyle = 0.00744879

This is quite a low probability and it assumes that the seating arrangement is totally random. By totally random, I mean that my wife and I could be seated far apart from each other. In reality, this is not the case since many passengers come in groups and they prefer to be seated next to each other. If we take this into account, then the probability of my companion and his friend to be seated next to each other increases.

A Mutually Beneficial Arrangement

It’s been said that the programmers of today are the pizza delivery boys of tomorrow. This has happened the last time the world witnessed a severe financial crisis. Usually, those who are not able to cope up with the current trends in IT are the first ones to go. One trend that is here to stay is the increasing number of cores of our computers. In order to maximize the power of multicore architecture, the programmers of today should be able to do concurrent programming correctly. Concurrent programming is the art of using the extra CPU capacity of your computer by exploiting parallelism to speed up your applications. This is a non-trivial task and requires a different mindset. In these series of articles, we will explore what concurrent programming is and its mathematical basis.

Imagine a concurrent program composed of two threads trying to read or write a value into an array. Let’s suppose that we have an array of 10 numbers and 2 threads. Thread 1 writes a number to the next empty array location. If the array is full, thread 1 will not write anything but will wait until at least one location is empty. Thread 2 will read and delete the last number in the array. If the array is empty, thread 2 will not read anything. How do we ensure that thread 1 will not write past the array length? How do we ensure that thread 2 will not read an empty element?

The section of code that does the updating or reading of the array is called the critical section. This is the portion of the code where we need to ensure that only one thread is executing at a time. To do this, we employ a mechanism for the two threads to know whose turn it is to enter the critical section. Below is a code that tries to solve this problem. First we define a global variable that is shared by both threads. This variable called turn will specify which thread can enter the critical section. The value of turn is either 1 or 2. A value of 1 means that thread 1 can enter the critical section while a value of 2 means that thread 2 can enter the critical section.


global turn = 1

This code is to be executed by thread 1:

#loop forever
1 execute non-critical section
2 await turn = 1
3 execute critical section
4 turn = 2

The code to be executed by thread 2 is similar:

#loop forever
1 execute non-critical section
2 await turn = 2
3 execute critical section
4 turn = 1

The critical section has be coded in such a way that it takes a short time as possible. We shall assume that any thread that enters the critical section should exit from it eventually. This is called the progress assumption.

We can imagine the execution of this program by the two threads to be described by a state specified by three numbers:

1. the pointer to the line that thread 1 is to execute.
2. the pointer to the line that thread 2 is to execute.
3. the value of the turn variable.

For example, the initial state is described by the triple (1,1,1). This means that thread 1 pointer is at line 1, thread 2 pointer is at line 1 and the value of the turn variable is 1. Let us first make it clear that when the pointer is at line 1, the thread has not yet executed that line. It only means that if the CPU scheduler will allow thread 1 to execute, it will execute line 1. For example, if the pointer of thread 1 is at line 4, it does not mean that the value of turn = 2. It only means that when thread 1 goes from state 4 to state 1, then the value of turn should have been set to 2.

We can draw the entire execution of the two threads using a state diagram as shown below:


Let’s try to understand this diagram. The execution begins with the state (1,1,1) which means that thread 1 pointer is at 1, thread 2 pointer is at 1, and the value of turn = 1. This state can either go to (2,1,1) or (1,2,1) depending on which thread the CPU scheduler wants to run first. The state transitions are indicated by arrows. From the diagram, you will see that when the value of turn = 1, thread 2 cannot go past pointer 2. In the same way, when the value of turn = 2, thread 1 cannot go past pointer 2.

Correctness Properties of Concurrent Programs

Writing concurrent programs is not easy. You have to demonstrate that your program is correct. What does correct mean? A concurrent program is correct if:

1. Mutual exclusion holds. There is at most one thread that enters the critical section.
2. No deadlock occurs. This means that no thread holds a resource that is needed by another thread, and vice versa. If this occurs, then each thread will be waiting for the other thread to release the resource and the application will appear to hang.
3. No starvation occurs. If a thread wants to enter the critical section, then it should eventually be able to do so.

Is our algorithm correct? First, the algorithm satisfies mutual exclusion. There is no state in which both thread pointers are pointing to line 3, which is the critical section. This means you cannot see states (3,3,1) or (3,32). There is no deadlock. The two threads do not get stuck on it’s preprotocol. Suppose that the value of turn = 1 and both threads execute line 2. Since the turn = 1, then thread 1 will enter its critical section. By the assumption of progress, it will exit its critical section and set turn = 2. This will then allow thread 2 to enter its critical section. Therefore, no thread gets stuck trying to enter its critical section.

There is however starvation. There is no assumption of progress on the non-critical section. This means that if turn = 1 and thread 1 will get stuck in its non-critical section, this will prevent thread 2 from entering its critical section. Therefore, the algorithm is not correct as it does not satisfy all conditions.

In the next post, we will examine a mechanism, called a semaphore, that is able to solve this problem.

Way Up High A Binary Tree

Way up high in the binary tree
Two binary bits smiled at me
I shook that tree as hard as I could
Down came the binary bits,
Mmmm–were they good! -based on a children’s song

Now it turns out that this binary tree is an AVL tree. I wonder how high this tree is! Why does it concern us ? The reason is that the height will give us the worst case running time of the binary search tree algorithm. An AVL tree maintains its height by making sure the difference in height between the left and right subtree is at most 1. If N_h is the number of nodes contained in the tree of height h, then N_h is equal to the number of nodes in the left subtree plus the number of nodes in the right subtree PLUS 1 (the root node). In the worst case, the two subtrees differ in height by at most 1, therefore

\displaystyle N_h = N_{h-1} + N_{h-2} + 1

with initial conditions N_0 = 0 and N_1 = 1.

By adding 1 to both sides of this recurrence relation we get

\displaystyle N_h + 1 = N_{h-1} + 1 + N_{h-2} + 1

If we let b_h = N_h + 1, we can rewrite the above recurrence relation as

\displaystyle b_h = b_{h-1} + b_{h-2}

The initial conditions are now

b_0 = N_0 + 1 = 0 + 1 = 1 and b_1 = N_1 + 1 = 1 + 1 = 2

Now this is interesting! This recurrence relation is very similar to the Fibonacci recurrence relation except that the initial conditions are different. From the previous post, we found that the general solution of this recurrence relation is

\displaystyle b_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants yet to be determined. Let us solve this recurrence relation in the general case where b_0 = s and b_1=t. Computing the first 2 terms of the recurrence relation, we get two simultaneous equations in \alpha_1 and \alpha_2.

\displaystyle b_0 = s = \alpha_1 + \alpha_2
\displaystyle b_1 = t = \alpha_1 \phi + \alpha_2 (1-\phi)

Solving for \alpha_1 from the first equation, we get

\displaystyle \alpha_1 = s- \alpha_2

Plugging this into the second equation and solving for \alpha_2

\displaystyle t = (s-\alpha_2) \phi + \alpha_2 - \alpha_2\phi
\displaystyle t = s\phi + \alpha_2(1- 2\phi)
\displaystyle \alpha_2 = \frac{t-s\phi}{1-2\phi}

From which we solve for \alpha_1

\displaystyle \alpha_1 = s - \frac{t-s\phi}{1-2\phi}
\displaystyle = \frac{s(1-\phi) - t }{ 1-2\phi}

Substituting this to the recurrence relation of b_n, we have

\displaystyle b_n = \frac{s(1-\phi) -t }{1-2\phi} \phi^n + \frac{t - s\phi}{1-2\phi} (1-\phi)^n

Expanding this expression you’ll end up with a rather complex expression involving cross terms of \phi^n and (1-\phi)^n

\displaystyle b_n = \frac{s\phi^n ( 1- \phi) - t\phi^n + t(1-\phi)^n - s\phi(1-\phi)^n}{1-2\phi}

But that is where the complexity ends. Observe that

\displaystyle \phi(1-\phi) = \frac{1+\sqrt{5}}{2}\cdot \frac{1-\sqrt{5}}{2}
\displaystyle = \frac{1-5}{4} = -1

Using this result, we can write

\phi^{n}(1-\phi) = \phi^{n-1}\phi (1-\phi) = -\phi^{n-1}
\phi (1-\phi)^n = \phi (1-\phi) (1-\phi)^{n-1} = - (1-\phi)^{n-1}

Furthermore, we also observe that

\displaystyle 1-2\phi = 1 - \frac{1+ \sqrt{5}}{2} = -\sqrt{5}

Using all these results we have

\displaystyle b_n = - s\phi^{n-1} - t\phi^n + t(1-\phi)^n + s(1-\phi)^{n-1}
\displaystyle = s \Big[ \frac{(\phi^{n-1} - (1-\phi)^{n-1}}{\sqrt{5}} +\Big]  t \Big[ \frac{(\phi^n - (1-\phi)^n}{\sqrt{5}}\Big]

Notice that the expressions in square brackets are just fibonacci expressions for n-1 and n, respectively. If we use them in the expression for b_n, we can express it in terms of the fibonacci numbers

\displaystyle  b_n = sf_{n-1} + tf_n

Returning to our computation for the height of the AVL tree, since b_0 = s = 1 and b_1 = t = 2, we finally have

\displaystyle b_n = f_{n-1} + 2f_n

We can further simplify this by

\displaystyle b_n = f_{n-1} + f_n + f_n
\displaystyle = \Big(f_{n-1} + f_n\Big) + f_n
\displaystyle = f_{n+1} + f_n
\displaystyle = f_{n+2}

Since b_n = N_h + 1,

\displaystyle N_h + 1 = f_{n+2}
\displaystyle N_h = f_{n+2} -1

The formula above expresses the number of nodes of the AVL tree in terms of the height h. In terms of the golden ratio, we can write the last expression as

\displaystyle N_h = \frac{\phi^{n+2} - (1-\phi)^{n+2}}{\sqrt{5}} - 1

Approximating the nth Fibonacci Number

We can further simplify the above expression by observing that the nth fibonacci number is approximately equal to \phi^n/\sqrt{5}. To see this, we compute the distance of f_n to \phi^n/\sqrt{5}:

\displaystyle |f_n - \frac{\phi^n}{\sqrt{5}}| = |\frac{\phi^n - (1-\phi)^n}{\sqrt{5}} - \frac{\phi^n}{\sqrt{5}} | = |\frac{(1-\phi)^n}{\sqrt{5}}| < \frac{1}{\sqrt{5}} < \frac{1}{2}

Using this approximation, we can write

\displaystyle N_h = \frac{\phi^{h+2}}{\sqrt{5}} -1

So if n is the number of nodes in an AVL tree, we can estimate the height using the above formula, that is

\displaystyle n \approx \frac{\phi^{h+2}}{\sqrt{5}} -1
\displaystyle n + 1 \approx \frac{\phi^{h+2}}{\sqrt{5}}
\displaystyle \sqrt{5}(n+1) \approx \phi^{h+2}
\displaystyle h \approx \log_{\phi} \Big( \sqrt{5} (n+1)\Big) -2

Using a change of base

\displaystyle n \approx \frac{\log_2 \Big( \sqrt{5}(n+1)\Big)}{\log_2 \phi} -2
\displaystyle \approx \frac{\log_2\Big( \sqrt{5}(n+1)\Big) - 2\log_2\phi}{\log_2 \phi}
\displaystyle \approx \frac{1}{\log_2 \phi} \cdot \log_2\Big(\frac{\sqrt{5}(n+1)}{\phi^2}\Big)

Again using the fibonacci number approximation, we have \sqrt{5}/\phi^2 \approx f_2 = 1. This simplifies our estimate to

\displaystyle h \approx 1.44\log_2(n+1)

where 1/\log_2 \phi = 1.44.

If we have an array of a million names, then the worst case running time of the binary search is

\displaystyle 1.44\log_2(1000000 + 1) = 28.70.

That's incredibly efficient!

Reflecting on Convexity

Last Friday, I was told of a puzzle by my colleague that was given to him on an interview. The puzzle goes like this: You and an opponent sit opposite a round table. You are to place coins (of the same size) on the table, one at a time. No coin should sit on top of another coin, even partially. No part of any coin should go beyond the table’s boundaries. You take turns putting coins on the table until you can no longer find any space to fit a coin. The last person to put a coin wins. If you had the first turn, what would be your strategy? This is a puzzle i’m not be able to answer on an interview and over 2 bottles of beer. So I told my colleague i’ll work on it on the weekend.

My first impulse was to treat this as a packing of circles inside a bigger circle. To start, let us have the circle small enough to fit only a 1 layer of circles along the circumference of the middle circle, as shown below.


In this configuration, the circles in dark blue are mine while the lighter ones are of the opponents. In this game, I win because I was the last one to put the coin. However, since my opponent is also rational, he will not allow me to win. He will put his lighter colored coin in a position shown below. This game started with me putting my coin in the middle. He then put his coin on top of the middle coin. Next I put my coin next to his coin, touching each other. But my opponent now puts his second coin in such a way that it has a space between my 2nd coin and his 2nd coin for which no coin can fit. Then I put my 3rd coin next to his 2nd coin. When he put his third coin, there was no more space for me to insert another coin. So I lost.


There must be a way for me to influence his next move. Looking at the geometry of circles, there must be something that can be made use of the circular symmetry. For one thing, all points on a circle have the same distance from the center. Another thing is if you pass a line through the center it will intersect the circle in 2 points. These points are antipodes of each other. This gives a very good way to control the outcome of the game. We let our opponent select a spot to put the coin, then we get the antipod of that location and put our coin there. We do this until no more coins can be put on the table.


Let us walkthrough the sequence of steps as shown above. After I put my coin in the middle, my opponent puts his coin above it. Then what I do is to put my 2nd coin on the opposite side (antipode). My opponent then employs his usual strategy which is to put his next coin so that it will have a small space in between. However, i put my 3rd coin opposite his 2nd coin. When he puts his 3rd coin, he will realize that there is no more space for him to squeeze it in, as shown by the red coin which already goes beyond the bounder of the table. So I win!

The good news is that this is not a coincidence. We will always be the last person to put a coin on the table. To see this, assume the contrary: our opponent was able to put a coin C_o on the table but for which the antipode has no more space to accommodate the coin. This means that if we put our coin, it will intersect at least one coin C\prime. Let p\prime be a point inside this intersection. By the way this was constructed, there is another coin C which contains the antipode p of point p\prime. This means that p \in C_o \cap C, that is, C_o intersects C, which is a contradiction.

Convex Regions and Point Reflections

Will this work only with circular tables? Will it work also with rectangular tables? Will it work only with coins? Will any other shape also work? Consider a table of some arbitrary shape but with the property that for any two points p and q on this table, we can draw a line connecting them in such a way that the line is contained with in the table. We call this region a convex region. The property of a convex region is that the centroid of this region is contained within it. Shown below is an example of a convex region and a non-convex region.



A table shaped like a horseshoe is not convex and its centroid is outside of the table.

Suppose we have found the centroid of our convex region. Instead of using a coin, let’s use a star to put into the table. If our opponent puts his star anywhere on the table at a position r_1 from the centroid, then the antipode of this position must also be inside the region, by virtue of its being convex. Let us put our star in such a way that it is the reflection through the centroid of our opponent’s star. If we set up a coordinate system on the table centered at the centroid (surprise!), then point reflection through the origin transforms in the following manner:

\displaystyle (x_1,y_1) \mapsto (-x_1, -y_1)

Since this mapping is isometric, i.e, preserves distances, any two points p_1 and p_2 will retain the same distance after reflection. This means that if two stars don’t intersect, they will not suddenly intersect under this mapping.

The figure below shows a star S_1 defined by the following coordinates.

           x         y
1  0.8754257 0.4100271
2  0.8292804 0.4380167
3  0.8248856 0.4700049
4  0.7941220 0.4360175
5  0.7501740 0.4400160
6  0.7809376 0.3980315
7  0.7567662 0.3620448
8  0.8029116 0.3680426
9  0.8226882 0.3380536
10 0.8358726 0.3780389
11 0.8732283 0.4080278

This star is the one shown in blue in the first quadrant. Another star S_2 was defined by translating S_1 by 0.1 in both directions. This is the star shown in red in the first quadrant.


Applying the point reflection through the origin, we get the two stars shown in the third quadrant. The blue colored star is the reflection of the blue colored star in the first quadrant. The same thing applies to the red star. What we have just shown is that you can modify the problem to tables like squares or polygons by virtue of being convex regions. Also, by the isometry of the point reflection through the origin, we can use other shapes and not just circles.

Eating The Fruit Of The AVL Tree

We have seen that searching a name in a sorted array of a million names is very efficient when using binary search. In fact, the running time is logarithmic, which means we only need 19 comparisons, in the worst case, to figure out if our search is successful or not. We have also seen that by using a binary search tree, we can save more efficiencies when we are searching on a dynamic array of names, that is, when the list of names grows. However, we also showed that if the BST is fed with a sorted array of names, then the search becomes very inefficient. In fact, in the worst case, we need to compare 1 million times before we realize that we don’t have a name in the array.

In the last post, we built a BST from an array of a sorted list of animals. Below is the result of that construction, repeated here for convenience.

This is a sad state of affairs. We have found a very good data structure for searching but it fails when fed with sorted data. Is there anything we can do to minimize the height of this tree? Fortunately, there is! The data structure is called a self-balancing binary search tree and there are many of them. We will only take a look at one and it is called AVL tree*.

A self-balancing binary search tree is able to maintain it’s height by keeping track of a variable in each node called the balance factor. The balance factor for a node x is defined as

Height of left subtree of node minus the height of the right subtree.

Below is an example of a binary tree with balance factors indicated inside the circles. The triangles that you see in the diagram are subtrees. They may or may not be present in the node but we show them in the diagram for some reason which will become clear later.

An AVL tree employs mechanisms to limit the difference in height between subtrees to 1, 0 or -1. These mechanisms are called rotations. There are only 4 rotations that are needed to balance an AVL tree. The figure below shows you how rotations work.

The first thing to look for after inserting a node is find a balance factor that is -2 or +2. in the first tree, the balance factor is +2. this means that the left subtree is imbalanced. Next, examine the balance factor of the left child. If the balance factor is -1, then the right subtree is higher than the left. To balance this, the child in yellow replaces its parent (green node) which now becomes its left child. What used to be the right child of the node in yellow is now the right child of the green node.

However, this rotation does not change the balance factor of the root node. Another rotation will establish the balance. It involves replacing the root node ( in red) with the yellow node and making the root node the right child of the yellow node. What used to be the right child of the yellow node now becomes the left child of the red node.

When the balance factor of the root node is -2 it means that the right subtree is imbalanced. If you observe carefully, this is just symmetric to the case above and the rotation can easily be figured out from the figure.

Balancing the Animal Tree

Let us show how we can use these transformations to fix our tree. From the figure below, you can see that the imbalance occurs after we insert the cobra. At this point, the balance factor of the “alligator” node is 2. Rotating this tree by making bat the root node fixes the tree.

We then insert dog and fox before we encounter another imbalance. Replacing cobra with dog and making cobra the left child of dog fixes the imbalance.


Inserting horse to this new configuration creates an imbalance at the root of the tree. Replacing bat with dog and making cobra the right child of bat fixes the tree.

You can follow the rest of the process until we have all ten words inserted into the BST. The result is a tree with a maximum height of 4.


What does this mean? It means that it only takes a maximum of 4 comparisons to find any animal in this structure. It also means it only takes a maximum of 4 comparisons to determine if an animal is not in the structure.

In the next post, we are going to compute the the height of an AVL tree to get an estimate of the complexity of the search.

* AVL is named after Adelson-Velskii and EM Landis.