# Trees Are Made By Fools Like Me

We have seen in the previous post that searching a name in a sorted array of 1 million names can be done very efficiently using the binary search. But that was only the second half of the story. What was not told was how we ended up with a sorted array. Was it handed to us already sorted or were we the ones who sorted it?

Now suppose that someone gave us an unsorted list of 10 different names to append to the original 1 million. How do we search now for a name? We cannot just append these 10 names to the array of sorted names because it will destroy the “sorted” property of the original array. What we can do is increase the length of the original array by 10 more slots and reinsert the 1 million and 10 names to this new array and sorting the new array. Reinserting would take linear time and re-sorting on a 99% sorted array takes quadratic time using quicksort. So is there a better way to do it? Yes there is and we are going to build a tree for it.

Suppose we are given a list of animals:

leopard, cobra, shark, horse, alligator, bat, tiger, fox, dog, pig

Our task is to create a structure that will allow adding more entries and still be able to search as fast as a binary search. To do this, let’s construct a tree with the following properties

– each node will contain at most 2 children
– each node is identified by a key. For this example, we will use the name of the animal as key.
– the left child of each node should have a key that is less than the value of the node.
– the right child of each node should have a key that is greater than the value of the node.

We call the result of this construction a Binary Search Tree or BST for short.

Given these rules, we can start constructing our BST starting from the word leopard as the root of the tree. The next word is cobra. Since cobra is less than leopard (lexicographically speaking), we put cobra as the left child of leopard. The next word is shark. Since shark is greater than leopard, we put shark as the right child of leopard. Since leopard already has 2 children, we expect the other words to be children of either cobra or shark.

The next word is horse. Since horse is less than leopard, we compare it with cobra. Since horse is greater than cobra, we make horse the right child of cobra. Below is a step by step visual construction of the BST.

The words in bold orange are the newly inserted words into the BST structure. The resulting BST is shown below.

Searching a word on a BST is almost the same as inserting an entry. You first compare your search term with the key of the root. If they are equal, you have found the entry. If the search term is less than the root key, then search the left subtree. If the search term is greater than the root key, then search the right subtree. If the search term is not equal to the node key and the node has no more children, then the word is not in the list. This algorithm is recursive and is shown below:

boolean search_BST(Node node, String searchTerm):
if node is null
return false
if node key is equal to searchTerm:
return true
if node key is less than searchTerm:
search_BST(leftChild, searchTerm)
else
search_BST(rightChild, searchTerm)



Let’s trace how to search for the word bat. We start at the root leopard. Since bat < leopard, we search the left subtree of leopard which is rooted at cobra. We now compare bat and cobra. Since bat < cobra, we search the left subtree of cobra, which is rooted at alligator. Since bat > alligator, we search the right subtree of alligator which is rooted at bat. However, since the search term is equal to bat, we have found our word and return true. It only took us 4 comparison to find the word bat.

If we look at our data structure closely, we observe that the running time of the search algorithm is proportional to the height of the tree. It is therefore in our best interest to make our tree as small as possible. But how small? Given a tree of n nodes, how small can be we build our binary tree? At the root (level 0) we have 1 node. At level 1, we can only attach 2 nodes. At level 2, we can attach 4 nodes. At level $h$, which we denote as the height of the tree, we can have a maximum of

$\displaystyle n \le 2^0 + 2^1 + 2^2 + \ldots + 2^h = \sum_{i=0}^h 2^i = 2^{h+1} -1 < 2^{h+1}$

This means that the smallest possible height of the binary tree we can create from n nodes is $h = \lfloor \log_2 n \rfloor$.

Looking at our example, we can see that the left subtree is 2 levels deeper than the right subtree. This means that our tree is not optimal. Ideally, our tree should have a height of $\lfloor \log_2 10 \rfloor = 3$. In fact, if we built our BST from a sorted list, say

alligator, bat, cobra, dog, fox, horse, leopard, pig,shark, tiger

we will get this structure:

With the structure above, the worst-case running time is $O(n)$. So how do we optimize our BST? That will be the topic of the next post.