How Do I Love Thee? Let Me Count the Ways…

Counting is perhaps one of the very first skills we learn as a child. I remember, when I was small, watching Count von Count on Sesame Street teaching kids to count, in a rather frightening way! Even now, counting is still a very important skill for me in my profession as a programmer.

Counting allows me to determine if my program will run slow or fast. It allows me to determine if I need to wait a couple of seconds, minutes, hours or even centuries for my computation to finish. Even with the proliferation of supercomputers with teraflops of computing power, an algorithm that is inherently slow will take forever to compute as the input data becomes large enough.

So How Do I Determine If My Algorithm Is Slow?

Let me illustrate the point by applying counting to the two Sorting Algorithms below. In both algorithms, the input is an array of unsorted numbers.

Algorithm 1: Find the least element in the list. This element is moved to the front. Then the least element among the remaining elements is found and put into the second position. This procedure is repeated until the entire list has been sorted.

Algorithm 2: Take the first element $a_1$ and form 2 sublists, the first containing those elements less than $a_1$, in the order they arise, and the second containing all elements greater than $a_1$, in the order they arise. Then $a_1$ is put at the end of the first sublist. Repeat recursively for each sublist until all sublists contain at most one item.
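To make the two procedures concrete, here is a minimal Python sketch of both. The function names `algorithm1` and `algorithm2` are my own labels for the descriptions above (readers may recognize them as selection sort and a simple quicksort):

```python
def algorithm1(a):
    """Repeatedly find the least remaining element and move it to the front."""
    a = list(a)  # work on a copy
    for i in range(len(a)):
        # index of the least element in the unsorted tail a[i:]
        least = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[least] = a[least], a[i]
    return a

def algorithm2(a):
    """Split around the first element, then recurse on each sublist."""
    if len(a) <= 1:
        return list(a)
    pivot, rest = a[0], a[1:]
    smaller = [x for x in rest if x < pivot]   # elements less than the pivot
    larger = [x for x in rest if x >= pivot]   # the remaining elements
    # the pivot goes at the end of the first sublist
    return algorithm2(smaller) + [pivot] + algorithm2(larger)
```

Both produce the same sorted output; what differs, as we are about to count, is how much work they do to get there.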

When you sort a list, the one thing you always do is compare one element to another. This is a fundamental operation of sorting, and this operation is what we are going to count. The algorithm which executes the most comparisons will lose.

Analysis of Algorithm 1

Imagine we have an array $A$ of $n$ integers in unsorted order. According to algorithm 1, to sort this array, we first need to find the least element of array $A$. How many numbers do we need to examine in array $A$ in order to determine the least element? The thing is, we will never know which element is the smallest unless we examine “every” number in the array. But there are $n$ numbers in array $A$. Therefore, in the first iteration, the number of comparisons is $n$.

After the first iteration, we have identified the least element in the array $A$. In the second iteration, we apply the same technique of finding the least element in the remaining list of numbers. So how many comparisons do we need to identify the next smallest number? Since the remaining list contains $n-1$ numbers, the number of comparisons is $n-1$.

There is already a pattern emerging here. It seems that in the third iteration, we will need $n-2$ comparisons. This is indeed the case. So if we put our discovery in a mathematical form, the total number of comparisons $C$ to sort the entire array is therefore:

$\displaystyle C = n + (n - 1) + (n - 2) + ... + 2 + 1 = \sum_{i=0}^{n-1} (n -i )$

We can rearrange the sum of terms such that the summation looks like this:

$\displaystyle (n + 1) + (n - 1 + 2) + (n - 2 + 3) ... = (n+1) + (n+1) + (n+1) + ....$

If $n$ is even, then we have exactly $\frac{n}{2}$ terms of $(n+1)$, that is:

$\displaystyle \sum_{i=0}^{n-1} (n - i) = \frac{ n ( n + 1)}{2}$

What happens when $n$ is odd? For example, if $n = 5$, then

$\displaystyle 5 + 4 + 3 + 2 + 1 = (5+1) + (4+2) + 3 = 6 + 6 + 3 = 2(6) + 3$

If you take a look at the rightmost side, you can see a pattern. It seems that for an arbitrary odd integer $n$, we have

$\displaystyle \sum_{i=0}^{n-1} (n - i) = \frac{n-1}{2} (n+1) + \frac{n+1}{2}$
$\displaystyle \sum_{i=0}^{n-1} (n - i) = \frac{n^2 -1 + n + 1}{2} = \frac{n^2 + n}{2} = \frac{n(n+1)}{2}$

This means that for any $n$, whether odd or even, the number of comparisons executed by Algorithm 1 is

$\displaystyle \sum_{i=0}^{n-1} (n - i) = \frac{ n ( n + 1)}{2}$

Observe that you can also write

$\displaystyle \sum_{i=0}^{n-1} (n - i) = \sum_{i=1}^n i= \frac{ n ( n + 1)}{2}$
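You can also check the closed form against the term-by-term sum directly. Here is a quick Python sanity check (assuming nothing beyond the formulas above):

```python
def total_comparisons(n):
    # the sum n + (n - 1) + ... + 2 + 1, exactly as derived above
    return sum(n - i for i in range(n))

# the closed form n(n + 1)/2 matches for both even and odd n
for n in [1, 2, 5, 100, 101]:
    assert total_comparisons(n) == n * (n + 1) // 2
```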

Analysis of Algorithm 2

On average, we can assume that whenever we pick the first element $a_1$, half of the list will be less than $a_1$ and the other half will be greater than $a_1$. To make things simpler, let's further assume that the number of elements $n$ is a power of 2, that is, $n = 2^m$. This will ensure that whenever we halve a list, we always get a whole number.

So, in the zeroth iteration, the list will be split into two equal parts of size $2^{m}/2 = 2^{m-1}$. After this split, the input consists of two sublists of size $2^{m-1}$ each.

Therefore, in the zeroth iteration, the total number of comparisons is $2^{m-1}$.

In the first iteration, the input to the algorithm consists of the two lists we got above. Since the size of each list is $2^{m-1}$, when we halve each list, we get 2 lists of size $2^{m-1}/2 = 2^{m-2}$. This is also the number of comparisons we need for each list. Since there are 2 lists at this iteration, the number of comparisons at this iteration is therefore $2\times 2^{m-2}$.

In the second iteration, there are 4 lists that are input to the algorithm. Each list has a size of $2^{m-2}$. Halving each list, we get 2 lists of size $2^{m-3}$. This is also the number of comparisons needed to halve each input list. Since there are 4 lists in total at this iteration, the total number of comparisons is therefore $4\times 2^{m-3} = 2^2\times 2^{m-3}$.

At this point, let’s pause for a while and see what we have:

 
Iteration Number   # of Comparisons
               0   $2^0 \times 2^{m-1}$
               1   $2^1 \times 2^{m-2}$
               2   $2^2 \times 2^{m-3}$

At this point, you will realize that a pattern is emerging: for each iteration $i$, the number of comparisons is $2^{i}\times 2^{m-i-1}$.

You can continue this pattern until the number of elements of each list is equal to 1. This will occur on the iteration $i = m -1$. If you total the number of comparisons across all iterations, you will get the total number of comparisons $C$ executed by the algorithm, which is

$\displaystyle C = \sum_{i=0}^{m-1} 2^i\times 2^{m-i -1}$

Applying the rule of exponents:

$\displaystyle = \sum_{i=0}^{m-1} 2^{m-1}$

Since the term being summed is independent of the index $i$, this simply reduces to adding $2^{m-1}$ to itself $m$ times:

$\displaystyle = m2^{m-1}$

Since we assumed $n = 2^m$, taking the base 2 logarithm of both sides gives $m = \lg_2 n$. Substituting this into the result above, we get

$\displaystyle C = \frac{n \lg_2 n }{2}$
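This chain of equalities is easy to verify numerically. The sketch below checks the original sum, the intermediate form $m2^{m-1}$, and the final $\frac{n \lg_2 n}{2}$ against each other for several powers of 2:

```python
def total_comparisons(m):
    # sum of 2^i * 2^(m-i-1) over iterations i = 0 .. m-1
    return sum(2**i * 2**(m - i - 1) for i in range(m))

for m in range(1, 11):
    n = 2**m  # n = 2^m, so lg2(n) = m and n*lg2(n)/2 = n*m/2
    assert total_comparisons(m) == m * 2**(m - 1) == n * m // 2
```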

The Winner

By counting, we were able to compute the number of comparisons for algorithms 1 and 2. Below is a table of the results of our counting:

Algorithm   # of Comparisons for input of size n
        1   $\displaystyle \frac{ n ( n + 1)}{2}$
        2   $\displaystyle \frac{n \lg_2 n }{2}$

Here is a comparison for various values of $n$

    n  algorithm1  algorithm2
    1           1           0
  100        5050         333
 1000      500500        4983
10000    50005000       66439


For an input of size ten thousand, Algorithm 1 will take about 50 million comparisons, while Algorithm 2 takes only about 66 thousand. And the winner is Algorithm 2!
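The table above appears to round the Algorithm 2 column up to the nearest whole comparison; under that assumption, a short script reproduces it:

```python
import math

for n in [1, 100, 1000, 10000]:
    alg1 = n * (n + 1) // 2
    alg2 = math.ceil(n * math.log2(n) / 2)  # rounded up, matching the table
    print(f"{n:>5} {alg1:>10} {alg2:>10}")
```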