# Way Up High A Binary Tree

Way up high in the binary tree
Two binary bits smiled at me
I shook that tree as hard as I could
Down came the binary bits,
Mmmm–were they good! -based on a children’s song

Now it turns out that this binary tree is an AVL tree. I wonder how high this tree is! Why does it concern us ? The reason is that the height will give us the worst case running time of the binary search tree algorithm. An AVL tree maintains its height by making sure the difference in height between the left and right subtree is at most 1. If $N_h$ is the number of nodes contained in the tree of height h, then $N_h$ is equal to the number of nodes in the left subtree plus the number of nodes in the right subtree PLUS 1 (the root node). In the worst case, the two subtrees differ in height by at most 1, therefore

$\displaystyle N_h = N_{h-1} + N_{h-2} + 1$

with initial conditions $N_0 = 0$ and $N_1 = 1$.

By adding 1 to both sides of this recurrence relation we get

$\displaystyle N_h + 1 = N_{h-1} + 1 + N_{h-2} + 1$

If we let $b_h = N_h + 1$, we can rewrite the above recurrence relation as

$\displaystyle b_h = b_{h-1} + b_{h-2}$

The initial conditions are now

$b_0 = N_0 + 1 = 0 + 1 = 1$ and $b_1 = N_1 + 1 = 1 + 1 = 2$

Now this is interesting! This recurrence relation is very similar to the Fibonacci recurrence relation except that the initial conditions are different. From the previous post, we found that the general solution of this recurrence relation is

$\displaystyle b_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n$

where $\alpha_1$ and $\alpha_2$ are constants yet to be determined. Let us solve this recurrence relation in the general case where $b_0 = s$ and $b_1=t$. Computing the first 2 terms of the recurrence relation, we get two simultaneous equations in $\alpha_1$ and $\alpha_2$.

$\displaystyle b_0 = s = \alpha_1 + \alpha_2$
$\displaystyle b_1 = t = \alpha_1 \phi + \alpha_2 (1-\phi)$

Solving for $\alpha_1$ from the first equation, we get

$\displaystyle \alpha_1 = s- \alpha_2$

Plugging this into the second equation and solving for $\alpha_2$

$\displaystyle t = (s-\alpha_2) \phi + \alpha_2 - \alpha_2\phi$
$\displaystyle t = s\phi + \alpha_2(1- 2\phi)$
$\displaystyle \alpha_2 = \frac{t-s\phi}{1-2\phi}$

From which we solve for $\alpha_1$

$\displaystyle \alpha_1 = s - \frac{t-s\phi}{1-2\phi}$
$\displaystyle = \frac{s(1-\phi) - t }{ 1-2\phi}$

Substituting this to the recurrence relation of $b_n$, we have

$\displaystyle b_n = \frac{s(1-\phi) -t }{1-2\phi} \phi^n + \frac{t - s\phi}{1-2\phi} (1-\phi)^n$

Expanding this expression you’ll end up with a rather complex expression involving cross terms of $\phi^n$ and $(1-\phi)^n$

$\displaystyle b_n = \frac{s\phi^n ( 1- \phi) - t\phi^n + t(1-\phi)^n - s\phi(1-\phi)^n}{1-2\phi}$

But that is where the complexity ends. Observe that

$\displaystyle \phi(1-\phi) = \frac{1+\sqrt{5}}{2}\cdot \frac{1-\sqrt{5}}{2}$
$\displaystyle = \frac{1-5}{4} = -1$

Using this result, we can write

$\phi^{n}(1-\phi) = \phi^{n-1}\phi (1-\phi) = -\phi^{n-1}$
$\phi (1-\phi)^n = \phi (1-\phi) (1-\phi)^{n-1} = - (1-\phi)^{n-1}$

Furthermore, we also observe that

$\displaystyle 1-2\phi = 1 - \frac{1+ \sqrt{5}}{2} = -\sqrt{5}$

Using all these results we have

$\displaystyle b_n = - s\phi^{n-1} - t\phi^n + t(1-\phi)^n + s(1-\phi)^{n-1}$
$\displaystyle = s \Big[ \frac{(\phi^{n-1} - (1-\phi)^{n-1}}{\sqrt{5}} +\Big] t \Big[ \frac{(\phi^n - (1-\phi)^n}{\sqrt{5}}\Big]$

Notice that the expressions in square brackets are just fibonacci expressions for n-1 and n, respectively. If we use them in the expression for $b_n$, we can express it in terms of the fibonacci numbers

$\displaystyle b_n = sf_{n-1} + tf_n$

Returning to our computation for the height of the AVL tree, since $b_0 = s = 1$ and $b_1 = t = 2$, we finally have

$\displaystyle b_n = f_{n-1} + 2f_n$

We can further simplify this by

$\displaystyle b_n = f_{n-1} + f_n + f_n$
$\displaystyle = \Big(f_{n-1} + f_n\Big) + f_n$
$\displaystyle = f_{n+1} + f_n$
$\displaystyle = f_{n+2}$

Since $b_n = N_h + 1$,

$\displaystyle N_h + 1 = f_{n+2}$
$\displaystyle N_h = f_{n+2} -1$

The formula above expresses the number of nodes of the AVL tree in terms of the height h. In terms of the golden ratio, we can write the last expression as

$\displaystyle N_h = \frac{\phi^{n+2} - (1-\phi)^{n+2}}{\sqrt{5}} - 1$

Approximating the nth Fibonacci Number

We can further simplify the above expression by observing that the nth fibonacci number is approximately equal to $\phi^n/\sqrt{5}$. To see this, we compute the distance of $f_n$ to $\phi^n/\sqrt{5}$:

$\displaystyle |f_n - \frac{\phi^n}{\sqrt{5}}| = |\frac{\phi^n - (1-\phi)^n}{\sqrt{5}} - \frac{\phi^n}{\sqrt{5}} | = |\frac{(1-\phi)^n}{\sqrt{5}}| < \frac{1}{\sqrt{5}} < \frac{1}{2}$

Using this approximation, we can write

$\displaystyle N_h = \frac{\phi^{h+2}}{\sqrt{5}} -1$

So if n is the number of nodes in an AVL tree, we can estimate the height using the above formula, that is

$\displaystyle n \approx \frac{\phi^{h+2}}{\sqrt{5}} -1$
$\displaystyle n + 1 \approx \frac{\phi^{h+2}}{\sqrt{5}}$
$\displaystyle \sqrt{5}(n+1) \approx \phi^{h+2}$
$\displaystyle h \approx \log_{\phi} \Big( \sqrt{5} (n+1)\Big) -2$

Using a change of base

$\displaystyle n \approx \frac{\log_2 \Big( \sqrt{5}(n+1)\Big)}{\log_2 \phi} -2$
$\displaystyle \approx \frac{\log_2\Big( \sqrt{5}(n+1)\Big) - 2\log_2\phi}{\log_2 \phi}$
$\displaystyle \approx \frac{1}{\log_2 \phi} \cdot \log_2\Big(\frac{\sqrt{5}(n+1)}{\phi^2}\Big)$

Again using the fibonacci number approximation, we have $\sqrt{5}/\phi^2 \approx f_2 = 1$. This simplifies our estimate to

$\displaystyle h \approx 1.44\log_2(n+1)$

where $1/\log_2 \phi = 1.44$.

If we have an array of a million names, then the worst case running time of the binary search is

$\displaystyle 1.44\log_2(1000000 + 1) = 28.70$.

That's incredibly efficient!