We have seen that searching a name in a sorted array of a million names is very efficient when using binary search. In fact, the running time is logarithmic, which means we only need 19 comparisons, in the worst case, to figure out if our search is successful or not. We have also seen that by using a binary search tree, we can save more efficiencies when we are searching on a dynamic array of names, that is, when the list of names grows. However, we also showed that if the BST is fed with a sorted array of names, then the search becomes very inefficient. In fact, in the worst case, we need to compare 1 million times before we realize that we don’t have a name in the array.

In the last post, we built a BST from an array of a sorted list of animals. Below is the result of that construction, repeated here for convenience.

This is a sad state of affairs. We have found a very good data structure for searching but it fails when fed with sorted data. Is there anything we can do to minimize the height of this tree? Fortunately, there is! The data structure is called a **self-balancing binary search tree** and there are many of them. We will only take a look at one and it is called **AVL tree***.

A self-balancing binary search tree is able to maintain it’s height by keeping track of a variable in each node called the *balance factor*. The balance factor for a node is defined as

Height of left subtree of node minus the height of the right subtree.

Below is an example of a binary tree with balance factors indicated inside the circles. The triangles that you see in the diagram are subtrees. They may or may not be present in the node but we show them in the diagram for some reason which will become clear later.

An AVL tree employs mechanisms to limit the difference in height between subtrees to 1, 0 or -1. These mechanisms are called **rotations**. There are only 4 rotations that are needed to balance an AVL tree. The figure below shows you how rotations work.

The first thing to look for after inserting a node is find a balance factor that is -2 or +2. in the first tree, the balance factor is +2. this means that the left subtree is imbalanced. Next, examine the balance factor of the left child. If the balance factor is -1, then the right subtree is higher than the left. To balance this, the child in yellow replaces its parent (green node) which now becomes its left child. What used to be the right child of the node in yellow is now the right child of the green node.

However, this rotation does not change the balance factor of the root node. Another rotation will establish the balance. It involves replacing the root node ( in red) with the yellow node and making the root node the right child of the yellow node. What used to be the right child of the yellow node now becomes the left child of the red node.

When the balance factor of the root node is -2 it means that the right subtree is imbalanced. If you observe carefully, this is just symmetric to the case above and the rotation can easily be figured out from the figure.

**Balancing the Animal Tree**

Let us show how we can use these transformations to fix our tree. From the figure below, you can see that the imbalance occurs after we insert the `cobra`

. At this point, the balance factor of the “alligator” node is 2. Rotating this tree by making `bat`

the root node fixes the tree.

We then insert `dog`

and `fox`

before we encounter another imbalance. Replacing `cobra`

with `dog`

and making `cobra`

the left child of `dog`

fixes the imbalance.

Inserting `horse`

to this new configuration creates an imbalance at the root of the tree. Replacing `bat`

with `dog`

and making `cobra`

the right child of `bat`

fixes the tree.

You can follow the rest of the process until we have all ten words inserted into the BST. The result is a tree with a maximum height of 4.

What does this mean? It means that it only takes a maximum of 4 comparisons to find any animal in this structure. It also means it only takes a maximum of 4 comparisons to determine if an animal is not in the structure.

In the next post, we are going to compute the the height of an AVL tree to get an estimate of the complexity of the search.

* AVL is named after Adelson-Velskii and EM Landis.