Almost 2 years ago, I ran an algorithm to produce the maximal cliques of my facebook friends. My experiment was made possible because of the old facebook REST api. You can find the description of my experiment from this site.
Recently, when I had 351 friends, I tried to do the same experiment, now with a much larger size. To my disappointment, it took forever to compute the maximal cliques.
Why would a network of size 81 take only 315 seconds to compute while a network of size 351 takes all of your patience away? The answer is in the time complexity of the maximal clique algorithm.
The maximal clique algorithm* is as follows:
How this works
Before you call this algorithm, you first number your vertices from 0 to n-1. This is to allow the algorithm to sort the vertices according the number that was assigned to them.
This algorithm is called with the value of . At this point, the algorithm will output the string “”, which stands for an empty clique. The set is initialized to the entire vertex set , the choice set is also initialized to . The sets and are precomputed before the algorithm starts and are defined as follows: (Here, is defined as the set of vertices)
What this means in layman’s terms is that the set is the set of all vertices connected to and the set is the set of all vertices whose number is greater than .
Using the definition above, the choice set is defined as:
The set is defined as
When , we say that the generated set of vertices, is a maximal clique.
For example, suppose we have the following graph:
Running the algorithm over this graph would generate:
The “*” indicates that the generated vertices form a clique.
So how do we answer the question: “Why would a network of size 81 take only 315 seconds to compute while a network of size 351 a very long time”? We compute the running time of the algorithm.
Average Case Analysis
Let be the set of nodes. The number of ways you can connect one node to another node is , these correspond to the number of possible vertices which you can make a graph . Given this set, the power set corresponds to the number of possible graphs of you can create using nodes. The cardinality of this power set is . Let us call this power set .
Let be an arbitrary graph and define be the number of cliques in . By the recursive nature of the algorithm, it produces a tree, called the state space tree, for which the number of nodes is equal to . Since the algorithm is invoke recursively times, the running time of this algorithms is . It remains for us to compute in the average case.
Denote by the average clique size, then
Let’s dissect the above formula. Keeping in mind what you learned in elementary math, the average value is computed over the total number of graphs is , which is equal to . The numerator of the formula is just the sum of the number of cliques of all graphs . That’s how we get the formula above. Although, this does not help us compute the average complexity on first blush. We have to express this explicitly. For example, how do we compute , which is the number of cliques of graph ? A more algorithmic way of counting the number of cliques of is to get all possible set of vertices of and for each set determining if it is a clique or not. Then we just count the number of cliques to get the value of .
Let be a subset of the set of vertices of the graph . Define a function, which returns 1 if is a clique and 0 if is not a clique:
With the definition above, we can then express mathematically as:
Plugging this into the formula for , we get
By the summation property, we are able to switch the order of the summation:
Let’s now focus on the inner summation. It is the summation of the value over all graph . Remember that is 1 if is a clique of for a specific and 0 if not. Imagine to be a clique. Then all its edges are contained in . How many are the edges of then? It is the number of vertices in , which is the cardinality of , given by taken 2 at a time, or . How many possible cliques can you form with as a subset? To compute this, first note that the remaining edges of , if we take away the edges formed by is . If we take the power set of this remaining edges union each subset of it with , we will get all possible graphs with as a clique. Therefore,
Plugging this into the expression for will give us:
Let us now attack this expression. Let’s imagine has vertices, i.e., . Then there are subsets of of size . The summation can now be written as:
By canceling the factor , we get:
If we let , we can simplify the expression of to
Below is the graph of versus .
As you can see from the graph, there is a value such that for all . This means that
It’s just a matter of getting the value of . If we take the ratio of successive terms in the series, we get
The details of the computation can be found here.
Notice that past the point , the value of the above ratio is less than 1, or
For the sake of simplicity, if we let , we have
This inequality is true if the value of because . The implies that . Using this upper bound, we can compute the upper bound of by the following process. First, observe that
Since we have shown above that ,
from which . The average running time of the all-clique algorithm is therefore . For a graph with 81 vertices, the average number of operations is 8.250472e+15. Since it takes 315 seconds to compute this graph on my machine, my computer was able to compute 2.619197e+13 operations, on the average, in one second. For a graph with 351 vertices, the number of operations on the average is 4.092784e+26 which my computer can do in 1.562610e+13 seconds, or 495,500 years!
* For more information, read the book of Donald Kreher and Douglas Stinson entitled “Combinatorial Algorithms: Generation, Enumeration and Search”. You can find the algorithm and the analysis on page