We have seen in the previous post that searching a name in a sorted array of 1 million names can be done very efficiently using the binary search. But that was only the second half of the story. What was not told was how we ended up with a sorted array. Was it handed to us already sorted or were we the ones who sorted it?
Now suppose that someone gave us an unsorted list of 10 different names to append to the original 1 million. How do we search now for a name? We cannot just append these 10 names to the array of sorted names because it will destroy the “sorted” property of the original array. What we can do is increase the length of the original array by 10 more slots and reinsert the 1 million and 10 names to this new array and sorting the new array. Reinserting would take linear time and re-sorting on a 99% sorted array takes quadratic time using quicksort. So is there a better way to do it? Yes there is and we are going to build a tree for it.
Suppose we are given a list of animals:
leopard, cobra, shark, horse, alligator, bat, tiger, fox, dog, pig
Our task is to create a structure that will allow adding more entries and still be able to search as fast as a binary search. To do this, let’s construct a tree with the following properties
– each node will contain at most 2 children
– each node is identified by a key. For this example, we will use the name of the animal as key.
– the left child of each node should have a key that is less than the value of the node.
– the right child of each node should have a key that is greater than the value of the node.
We call the result of this construction a Binary Search Tree or BST for short.
Given these rules, we can start constructing our BST starting from the word
leopard as the root of the tree. The next word is
cobra is less than
leopard (lexicographically speaking), we put
cobra as the left child of
leopard. The next word is
shark is greater than
leopard, we put shark as the right child of
leopard already has 2 children, we expect the other words to be children of either
The next word is
horse is less than
leopard, we compare it with
horse is greater than
cobra, we make
horse the right child of
cobra. Below is a step by step visual construction of the BST.
The words in bold orange are the newly inserted words into the BST structure. The resulting BST is shown below.
Searching a word on a BST is almost the same as inserting an entry. You first compare your search term with the key of the root. If they are equal, you have found the entry. If the search term is less than the root key, then search the left subtree. If the search term is greater than the root key, then search the right subtree. If the search term is not equal to the node key and the node has no more children, then the word is not in the list. This algorithm is recursive and is shown below:
boolean search_BST(Node node, String searchTerm): if node is null return false if node key is equal to searchTerm: return true if node key is less than searchTerm: search_BST(leftChild, searchTerm) else search_BST(rightChild, searchTerm)
Let’s trace how to search for the word
bat. We start at the root
leopard, we search the left subtree of
leopard which is rooted at
cobra. We now compare
cobra, we search the left subtree of
cobra, which is rooted at
alligator, we search the right subtree of
alligator which is rooted at
bat. However, since the search term is equal to
bat, we have found our word and return true. It only took us 4 comparison to find the word
If we look at our data structure closely, we observe that the running time of the search algorithm is proportional to the height of the tree. It is therefore in our best interest to make our tree as small as possible. But how small? Given a tree of n nodes, how small can be we build our binary tree? At the root (level 0) we have 1 node. At level 1, we can only attach 2 nodes. At level 2, we can attach 4 nodes. At level , which we denote as the height of the tree, we can have a maximum of
This means that the smallest possible height of the binary tree we can create from n nodes is .
Looking at our example, we can see that the left subtree is 2 levels deeper than the right subtree. This means that our tree is not optimal. Ideally, our tree should have a height of . In fact, if we built our BST from a sorted list, say
alligator, bat, cobra, dog, fox, horse, leopard, pig,shark, tiger