We have seen in the previous post that searching a name in a sorted array of 1 million names can be done very efficiently using the binary search. But that was only the second half of the story. What was not told was how we ended up with a sorted array. Was it handed to us already sorted or were we the ones who sorted it?

Now suppose that someone gave us an unsorted list of 10 different names to append to the original 1 million. How do we search now for a name? We cannot just append these 10 names to the array of sorted names because it will destroy the “sorted” property of the original array. What we can do is increase the length of the original array by 10 more slots and reinsert the 1 million and 10 names to this new array and sorting the new array. Reinserting would take linear time and re-sorting on a 99% sorted array takes quadratic time using quicksort. So is there a better way to do it? Yes there is and we are going to build a tree for it.

Suppose we are given a list of animals:

`leopard, cobra, shark, horse, alligator, bat, tiger, fox, dog, pig`

Our task is to create a structure that will allow adding more entries and still be able to search as fast as a binary search. To do this, let’s construct a tree with the following properties

– each node will contain at most 2 children

– each node is identified by a key. For this example, we will use the name of the animal as key.

– the left child of each node should have a key that is less than the value of the node.

– the right child of each node should have a key that is greater than the value of the node.

We call the result of this construction a **Binary Search Tree** or **BST** for short.

Given these rules, we can start constructing our BST starting from the word `leopard`

as the root of the tree. The next word is `cobra`

. Since `cobra`

is less than `leopard`

(lexicographically speaking), we put `cobra`

as the left child of `leopard`

. The next word is `shark`

. Since `shark`

is greater than `leopard`

, we put shark as the right child of `leopard`

. Since `leopard`

already has 2 children, we expect the other words to be children of either `cobra`

or `shark`

.

The next word is `horse`

. Since `horse`

is less than `leopard`

, we compare it with `cobra`

. Since `horse`

is greater than `cobra`

, we make `horse`

the right child of `cobra`

. Below is a step by step visual construction of the BST.

The words in bold orange are the newly inserted words into the BST structure. The resulting BST is shown below.

Searching a word on a BST is almost the same as inserting an entry. You first compare your search term with the key of the root. If they are equal, you have found the entry. If the search term is less than the root key, then search the left subtree. If the search term is greater than the root key, then search the right subtree. If the search term is not equal to the node key and the node has no more children, then the word is not in the list. This algorithm is recursive and is shown below:

boolean search_BST(Node node, String searchTerm): if node is null return false if node key is equal to searchTerm: return true if node key is less than searchTerm: search_BST(leftChild, searchTerm) else search_BST(rightChild, searchTerm)

Let’s trace how to search for the word `bat`

. We start at the root `leopard`

. Since `bat`

< `leopard`

, we search the left subtree of `leopard`

which is rooted at `cobra`

. We now compare `bat`

and `cobra`

. Since `bat`

< `cobra`

, we search the left subtree of `cobra`

, which is rooted at `alligator`

. Since `bat`

> `alligator`

, we search the right subtree of `alligator`

which is rooted at `bat`

. However, since the search term is equal to `bat`

, we have found our word and return true. It only took us 4 comparison to find the word `bat`

.

If we look at our data structure closely, we observe that the running time of the search algorithm is proportional to the height of the tree. It is therefore in our best interest to make our tree as small as possible. But how small? Given a tree of n nodes, how small can be we build our binary tree? At the root (level 0) we have 1 node. At level 1, we can only attach 2 nodes. At level 2, we can attach 4 nodes. At level , which we denote as the height of the tree, we can have a maximum of

This means that the smallest possible height of the binary tree we can create from n nodes is .

Looking at our example, we can see that the left subtree is 2 levels deeper than the right subtree. This means that our tree is not optimal. Ideally, our tree should have a height of . In fact, if we built our BST from a sorted list, say

`alligator, bat, cobra, dog, fox, horse, leopard, pig,shark, tiger`

we will get this structure:

With the structure above, the worst-case running time is . So how do we optimize our BST? That will be the topic of the next post.

## One thought on “Trees Are Made By Fools Like Me”