Way Up High A Binary Tree

Way up high in the binary tree
Two binary bits smiled at me
I shook that tree as hard as I could
Down came the binary bits,
Mmmm–were they good! -based on a children’s song

Now it turns out that this binary tree is an AVL tree. I wonder how high this tree is! Why does it concern us ? The reason is that the height will give us the worst case running time of the binary search tree algorithm. An AVL tree maintains its height by making sure the difference in height between the left and right subtree is at most 1. If N_h is the number of nodes contained in the tree of height h, then N_h is equal to the number of nodes in the left subtree plus the number of nodes in the right subtree PLUS 1 (the root node). In the worst case, the two subtrees differ in height by at most 1, therefore

\displaystyle N_h = N_{h-1} + N_{h-2} + 1

with initial conditions N_0 = 0 and N_1 = 1.

By adding 1 to both sides of this recurrence relation we get

\displaystyle N_h + 1 = N_{h-1} + 1 + N_{h-2} + 1

If we let b_h = N_h + 1, we can rewrite the above recurrence relation as

\displaystyle b_h = b_{h-1} + b_{h-2}

The initial conditions are now

b_0 = N_0 + 1 = 0 + 1 = 1 and b_1 = N_1 + 1 = 1 + 1 = 2

Now this is interesting! This recurrence relation is very similar to the Fibonacci recurrence relation except that the initial conditions are different. From the previous post, we found that the general solution of this recurrence relation is

\displaystyle b_n = \alpha_1 \phi^n + \alpha_2 (1-\phi)^n

where \alpha_1 and \alpha_2 are constants yet to be determined. Let us solve this recurrence relation in the general case where b_0 = s and b_1=t. Computing the first 2 terms of the recurrence relation, we get two simultaneous equations in \alpha_1 and \alpha_2.

\displaystyle b_0 = s = \alpha_1 + \alpha_2
\displaystyle b_1 = t = \alpha_1 \phi + \alpha_2 (1-\phi)

Solving for \alpha_1 from the first equation, we get

\displaystyle \alpha_1 = s- \alpha_2

Plugging this into the second equation and solving for \alpha_2

\displaystyle t = (s-\alpha_2) \phi + \alpha_2 - \alpha_2\phi
\displaystyle t = s\phi + \alpha_2(1- 2\phi)
\displaystyle \alpha_2 = \frac{t-s\phi}{1-2\phi}

From which we solve for \alpha_1

\displaystyle \alpha_1 = s - \frac{t-s\phi}{1-2\phi}
\displaystyle = \frac{s(1-\phi) - t }{ 1-2\phi}

Substituting this to the recurrence relation of b_n, we have

\displaystyle b_n = \frac{s(1-\phi) -t }{1-2\phi} \phi^n + \frac{t - s\phi}{1-2\phi} (1-\phi)^n

Expanding this expression you’ll end up with a rather complex expression involving cross terms of \phi^n and (1-\phi)^n

\displaystyle b_n = \frac{s\phi^n ( 1- \phi) - t\phi^n + t(1-\phi)^n - s\phi(1-\phi)^n}{1-2\phi}

But that is where the complexity ends. Observe that

\displaystyle \phi(1-\phi) = \frac{1+\sqrt{5}}{2}\cdot \frac{1-\sqrt{5}}{2}
\displaystyle = \frac{1-5}{4} = -1

Using this result, we can write

\phi^{n}(1-\phi) = \phi^{n-1}\phi (1-\phi) = -\phi^{n-1}
\phi (1-\phi)^n = \phi (1-\phi) (1-\phi)^{n-1} = - (1-\phi)^{n-1}

Furthermore, we also observe that

\displaystyle 1-2\phi = 1 - \frac{1+ \sqrt{5}}{2} = -\sqrt{5}

Using all these results we have

\displaystyle b_n = - s\phi^{n-1} - t\phi^n + t(1-\phi)^n + s(1-\phi)^{n-1}
\displaystyle = s \Big[ \frac{(\phi^{n-1} - (1-\phi)^{n-1}}{\sqrt{5}} +\Big]  t \Big[ \frac{(\phi^n - (1-\phi)^n}{\sqrt{5}}\Big]

Notice that the expressions in square brackets are just fibonacci expressions for n-1 and n, respectively. If we use them in the expression for b_n, we can express it in terms of the fibonacci numbers

\displaystyle  b_n = sf_{n-1} + tf_n

Returning to our computation for the height of the AVL tree, since b_0 = s = 1 and b_1 = t = 2, we finally have

\displaystyle b_n = f_{n-1} + 2f_n

We can further simplify this by

\displaystyle b_n = f_{n-1} + f_n + f_n
\displaystyle = \Big(f_{n-1} + f_n\Big) + f_n
\displaystyle = f_{n+1} + f_n
\displaystyle = f_{n+2}

Since b_n = N_h + 1,

\displaystyle N_h + 1 = f_{n+2}
\displaystyle N_h = f_{n+2} -1

The formula above expresses the number of nodes of the AVL tree in terms of the height h. In terms of the golden ratio, we can write the last expression as

\displaystyle N_h = \frac{\phi^{n+2} - (1-\phi)^{n+2}}{\sqrt{5}} - 1

Approximating the nth Fibonacci Number

We can further simplify the above expression by observing that the nth fibonacci number is approximately equal to \phi^n/\sqrt{5}. To see this, we compute the distance of f_n to \phi^n/\sqrt{5}:

\displaystyle |f_n - \frac{\phi^n}{\sqrt{5}}| = |\frac{\phi^n - (1-\phi)^n}{\sqrt{5}} - \frac{\phi^n}{\sqrt{5}} | = |\frac{(1-\phi)^n}{\sqrt{5}}| < \frac{1}{\sqrt{5}} < \frac{1}{2}

Using this approximation, we can write

\displaystyle N_h = \frac{\phi^{h+2}}{\sqrt{5}} -1

So if n is the number of nodes in an AVL tree, we can estimate the height using the above formula, that is

\displaystyle n \approx \frac{\phi^{h+2}}{\sqrt{5}} -1
\displaystyle n + 1 \approx \frac{\phi^{h+2}}{\sqrt{5}}
\displaystyle \sqrt{5}(n+1) \approx \phi^{h+2}
\displaystyle h \approx \log_{\phi} \Big( \sqrt{5} (n+1)\Big) -2

Using a change of base

\displaystyle n \approx \frac{\log_2 \Big( \sqrt{5}(n+1)\Big)}{\log_2 \phi} -2
\displaystyle \approx \frac{\log_2\Big( \sqrt{5}(n+1)\Big) - 2\log_2\phi}{\log_2 \phi}
\displaystyle \approx \frac{1}{\log_2 \phi} \cdot \log_2\Big(\frac{\sqrt{5}(n+1)}{\phi^2}\Big)

Again using the fibonacci number approximation, we have \sqrt{5}/\phi^2 \approx f_2 = 1. This simplifies our estimate to

\displaystyle h \approx 1.44\log_2(n+1)

where 1/\log_2 \phi = 1.44.

If we have an array of a million names, then the worst case running time of the binary search is

\displaystyle 1.44\log_2(1000000 + 1) = 28.70.

That's incredibly efficient!


Published by

Bobby Corpus

Is an IT Architect.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s