A Match Made In Heaven

There are some things which go naturally with each other. For some, it is a person whom they consider a soulmate. In the case of coffee, it is a creamer. For algorithms, it is the data structure. For example, a recursive algorithms suits very well with a tree data structure and a tree data structure suits well with a recursive algorithm.

I assume that you are a programmer or know how to program and that you know what a data structure is. Examples of data structures we usually encounter in daily programming chore is the array. An array is a collection of objects, usually of the same kind. Arrays go well with iterative algorithms like the for loop or while loop. A more interesting kind of data structure is the tree. You can imagine a tree as a hierarchical structure, much like the what you see in Windows Explorer where a filesystem is displayed as a tree of directories. I have encountered programmers, in my career, who don’t know how to navigate a directory structure in code. They usually come up with a spaghetti-like code but which are not able to go deep into the directory structure. However, a good programmer can navigate a directory tree with a few lines of code.

Let’s have an example. Below is a screenshot of a directory structure. It is composed of files and directories (surprise!). We know that directories can contain other files and directories whereas files don’t have that property. In tree language, we call files the “leaves” while directories are the nodes. Well not exactly, because if a directory is empty, it can appear as a leaf. To avoid confusion, let us treat both files and directories in a generic manner and call them nodes. A directory is a special kind of node in that it can contain other nodes, which in this case are files and directories. To traverse a directory structure, we follow the algorithm below:

algorithm traverseTree(Node node):
    print node name
    if node is a directory
        list the children of the directory
        for each child

Notice that the algorithm above calls itself and accepts any node as input, whether it’s a file or a directory. This is an example of a recursive algorithm applied to a tree data structure. Simple and elegant!

You might ask “What is the complexity of the traverseTree algorithm?”. Before we can answer that, let’s agree first on some terminology. We define a graph to be a set of nodes connected to each other by edges. An abstract model of the directory structure above is shown below. The circles represent the nodes while the lines connecting them represent the edges. Therefore a graph G is composed of a set of nodes V and a set of edges E and we denote a graph as G(V,E). Using this definition, you can see that a tree is just a special case of a graph. Let v\in V be a node. The degree of a node \deg(v) is the number of edges it has. For example, if a node is connected to 3 other nodes, then the degree of that node is 3.

An interesting property of the degree of a node is that the total sum of the degrees of each node is equal to 2 times the number of edges, that is,

\displaystyle \sum_{v\in V} \deg(v) = 2\cdot |E|.

To see why this is, assume you have two nodes u and v connected by an edge. Now, what is the degree of u and the degree of v? By definition, the degree of a node is the number of edges it has. Since u has one edge, \deg(u) = 1. In the same way, since v has one edge, \deg(v) = 1. Therefore, \deg(u) + \deg(v) = 1 + 1 = 2, which is equal to twice the number of edges of graph G(V,E).

Let’s now go back to our main task of computing the complexity of the algorithm above. For each node v, the number of times “print node name” is executed is equal to 1 (surprise!). The number of times the statement “if node is a directory” is executed is also 1. Finally, the number of times the for loop is executed is equal to the degree of node v, that is, \deg(v). Since we do this for each node of graph G(V,E), the total executions is equal to

\displaystyle \sum_{v\in V} \Big( 1 + 1 + \deg(v) \Big) = \sum_{v\in V} \Big( 2 + \deg(v) \Big)
\displaystyle = \sum_{v\in V} 2 + \sum_{v\in V} \deg(v)
\displaystyle =  2\cdot \sum_{v\in V} 1 + \sum_{v\in V} \deg(v)

Now, the first summation is just equal to the number of nodes of V, while the second summation is just twice the number of edges. This gives us,

\displaystyle 2\cdot |V| + 2\cdot |E| = 2\cdot \big(|V| + |E| \big) = O(|V| + |E|).

Therefore, the complexity of the algorithm above is Big Oh of the sum of the number of nodes and the number of edges of graph G(V,E).


Published by

Bobby Corpus

Is an IT Architect.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s