A Very Brief Introduction to Parallel Computing

Imagine you were given a hundred 3-digit numbers to add, how much time would it take you to get the answer? If it would take you 30 seconds to add ten numbers (using a calculator), then it would take you about 300 seconds (or 5 minutes) to add 100 numbers.

Now imagine there are a hundred people and each person has a number. How would a hundred people compute the sum of all numbers? Seems like a recipe for disaster. However, we can do this:

1. Group the people by two, if there is an extra person with no group, this person can join the nearest group (to make a group of 3 people).
2. Each group will add the numbers that they have to get the sum S.
3. Each group will nominate a representative that will carry this new number S. The remaining members can sit down.
4. Repeat step 1 until there is only one person remaining.
5. The last person remaining will have the sum of the 100 numbers.

At the beginning, we have 100 people in groups of two. It takes 3 seconds for them to add their respective numbers. In the next iteration, only 50 people remain that will then add their numbers. Continuing in this way, the remaining number of people will be halved until in the 7th iteration we get our answers. So if each iteration can execute in 3 seconds, then it will take 21 seconds for a hundred people to compute the sum of 100 numbers!

I mentioned that in the 7th iteration, we are able to get the answer. There is a formula to get the number of iterations and it is

\displaystyle \lceil\log_2(n)\rceil

where n is the number of people to start with and the symbol \lceil\rceil is the ceiling function which rounds off the result of the function \log_2(n) to the next highest integer. If n=100, the number of iterations is

\displaystyle \lceil\log_2 100\rceil = 7

Having a hundred people to compute the sum of 100 numbers might be practically infeasible. We might not have the space to accommodate them. However, if we only have say 10 people, we can still have a faster computation.

Map Reduce

Given a hundred numbers, we can group the numbers by 10 and distribute it to 10 people. Each person will then add the numbers they have and combine the sum with the rest of the participants to get the sum of 100 numbers in 1/10th of the time.

Grouping the numbers into smaller subsets and distributing them to each person is called mapping. Combining the sum computed by each person to the total sum is called reduction. This is called map-reduce and the example given is a simple example of map reduce.

A More Complex Example

Let \mathbf A and \mathbf B be two matrices. The product of matrices \mathbf A and \mathbf B is the matrix \mathbf C

\displaystyle  \begin{bmatrix}  c_{00} & c_{01} & \cdots & c_{0n}\\  c_{10} & c_{11} & \cdots & c_{1n}\\  \cdots & \cdots & \cdots & \cdots\\  c_{n0} & \cdots & \cdots & c_{nn}  \end{bmatrix} =  \begin{bmatrix}  a_{00} & a_{01} & \cdots & a_{0n}\\  a_{10} & a_{11} & \cdots & a_{1n}\\  \cdots & \cdots & \cdots & \cdots\\  a_{n0} & \cdots & \cdots & a_{nn}  \end{bmatrix}  \begin{bmatrix}  b_{00} & b_{01} & \cdots & b_{0n}\\  b_{10} & b_{11} & \cdots & b_{1n}\\  \cdots & \cdots & \cdots & \cdots\\  b_{n0} & \cdots & \cdots & b_{nn}  \end{bmatrix}

such that

\displaystyle c_{ij} = \sum_{k=0}^{n-1} a_{ik}b_{kj}

where c_{ij} is the entry of the matrix C on row i and column j.

Here is a sequential algorithm to compute the matrix C:

//a and b are nxn matrices
//c[i][j] is initialized to 0 for all i,j
for(var i=0;i<n;i++){
  for(var j=0;j<n;j++){
    for(var k=0;k<n;k++){
      c[i][j] += a[i][k] * b[k][j]
    }
  }
}

There are 3 loops in the above algorithm, the innermost loop will execute n times to compute the sum of element-wise product of row i and column j. In the diagram below, the inner loop will do the following: multiply each element inside the of the box of matrix A and the elements inside the box of matrix B and take the sum. Since there are n such products, the number of addition operations is n.

Then you have to do this n times for each column of matrix B and n times for each row of matrix A, as shown below, for a total of n^3 operations.

So if you to multiply a matrix with n=100, you will need to execute 100^3 (or 1 million) operations!

Parallel Computing can help us here.

Parallel Matrix Multiplication

The good thing about matrix multiplication is that we can multiply by blocks. For the sake of simplicity, suppose we have 2 square matrices of dimension 100. We can divide the matrices into small square sub-matrices of dimension 25:

A=  \left[\begin{array}{c|c|c|c}  \mathbf A_{00} & \mathbf A_{01} & \mathbf A_{02} & \mathbf A_{03}\\  \hline  \mathbf A_{10} & \mathbf A_{11} & \mathbf A_{12} & \mathbf A_{13}\\  \hline  \mathbf A_{20} & \mathbf A_{21} & \mathbf A_{22} & \mathbf A_{23}\\  \hline  \mathbf A_{30} & \mathbf A_{31} & \mathbf A_{32} & \mathbf A_{33}  \end{array}  \right]  ,  B=  \left[\begin{array}{c|c|c|c}  \mathbf B_{00} & \mathbf B_{01} & \mathbf B_{02} & \mathbf B_{03}\\  \hline  \mathbf B_{10} & \mathbf B_{11} & \mathbf B_{12} & \mathbf B_{13}\\  \hline  \mathbf B_{20} & \mathbf B_{21} & \mathbf B_{22} & \mathbf B_{23}\\  \hline  \mathbf B_{30} & \mathbf B_{31} & \mathbf B_{32} & \mathbf B_{33}  \end{array}  \right]

where each \mathbf A_{ij} and \mathbf B_{ij} are square matrices of dimension 25.

We can then compute the resulting sub-matrix of C using the usual formula:

\displaystyle \mathbf C_{ij} = \sum_{k=0}^{p-1} \mathbf A_{ik}\mathbf B_{jk}

where p=100/25=4.

For example, to compute \mathbf C_{00} we have

\mathbf C_{00} = \mathbf A_{00} \mathbf B_{00} + \mathbf A_{01} \mathbf B_{10} + \mathbf A_{02} \mathbf B_{20} + \mathbf A_{03} \mathbf B_{30}

We then use this mechanism to distribute our sub-matrices to different computers (or in modern parlance “compute nodes”). For this example, we need 4×4=16 compute nodes. Each compute node will contain 2 sub-matrices, one for A and one for B. For easy visualization, we can make a drawing of our compute nodes arranged in a square of side 4 and labelled as shown below:

Now that we have evenly distributed our matrix data to each node, the next question is how do we compute? Notice that not one node contains all the data. Each node has a very limited subset of the data.

The Trick

We can get a clue by looking at the end result of the computation for Node 00. At the end of the computation, Node 00 should have the following result:

\mathbf C_{00} = \mathbf A_{00} \mathbf B_{00} + \mathbf A_{01} \mathbf B_{10} + \mathbf A_{02} \mathbf B_{20} + \mathbf A_{03} \mathbf B_{30}

Looking at the above formula, Node 00 only has the following data

\mathbf A_{00} \text{ and } \mathbf B_{00}

The rest of the sub-matrices are not in its memory. Therefore, Node 00 can only compute

\mathbf A_{00} \times \mathbf B_{00}

We are missing the following products:

\mathbf A_{01}\mathbf B_{10}, \mathbf A_{02}\mathbf B_{20}, \text{ and } \mathbf A_{03}\mathbf B_{30}

The matrix \mathbf A_{01} is with Node 01 and the matrix \mathbf B_{10} is with Node 10. So if Node 01 can send matrix \mathbf A_{01} to Node 00 and Node 10 can send matrix \mathbf B_{10} to Node 00, we can then get the product \mathbf A_{01}\mathbf B_{10}. In fact, if we do a slight rearrangement like the below, we can use the following algorithm to compute matrix \mathbf C:

1. Each node will send the current A sub-matrix to the node on the left and receive a new A sub-matrix from the node on the right. If there is no node on the left, it will send the sub-matrix to the last node on its row.
2. Each node will send the B sub-matrix to the node on top and receive a new B sub-matrix from the node below it. If there is no node on top, it will send the sub-matrix to the last node on its column.
3. Multiply the new sub-matrices and add the result to the current value of the sub-matrix of C that it keeps in memory.
4. Repeat until the number of iterations equal to N, where N^2 is the number of nodes.

The figure below describes this algorithm for Node 00.

Doing this for all nodes, we can visualize the parallel algorithm at work on all nodes as shown in the animation below:

This algorithm is one of my favorite algorithms since it looks like the pumping of blood in to the heart.

How fast is this algorithm?

Given a 2 square matrices of dimension N, the number of sequential computations is O(N^3). Using parallel matrix multiplication above, we divide the matrices into sub-matrices of dimension n. The resulting block matrix is of dimension N/m=p. Each node will now compute the product in parallel using O(n^3) operations per sub-matrix multiplication. Since there are p multiplications per node, the number of operations per node is O(pn^3) The ratio of these two quantities will give us how fast the parallel algorithm is

\displaystyle \frac{N^3}{pn^3} = \frac{1}{p}\frac{N^3}{n^3} = \frac{1}{p}\cdot p^3 = p^2

For our particular example, the theoretical speedup is p^2 = 4^2 = 16, that is, our parallel algorithm can compute the product 16 times faster than the sequential algorithm. If we increase p, we can increase the speedup. However, we can only increase p up to a certain point as the network will then become the bottleneck and will make things slower than the sequential algorithm.

In the next post, we will see how to program parallel algorithms.

When Average Is Not Enough: Thoughts on Designing for Capacity

Designing a system from scratch to handle a workload you don’t know is a challenge. If you put to much hardware, you might be wasting money. You put little, then your users will complain of how slow the system is.

If you’re given only a rate, like 6000 hits/hour, you don’t know how these are distributed in a minute by minute or per second interval. We can make a guess and say that there are about 100 hits per minute or 1.67 hits/sec. If hits come uniformly at that rate, then we can design a system that can handle 2 hits/sec and all users will be happy since all requests will be served quickly and no queueing of requests. But we know it’s not going to happen. There will be some interval where the number of hits is less than 3 and some more than 3.

Theoretically, requests to our server come randomly. Let’s imagine 60 bins represented by seconds in one minute. We also imagine that requests are like balls we throw into the bins. Each bin is equally likely to be landed by a ball. It’s possible that all balls land on only one bin!

throw_balls_random

After throwing the balls into bins, let’s see what we have.

simulation

As you can see, some bins have more than 2 balls (which is the average number of balls in a bin). Therefore if we design our system based on the average, 50% of our users will have a great experience while the other 50% will have a bad experience. Therefore we need to find how many requests per second our server needs to handle so that our users will have a good experience (without overspending).

To determine how many requests per second we need to support, we need to get the probability of getting 4, 5, 6 or more request per second. We will compute the probability starting from 3 requests per second and increment by one until we can get a low enough probability. If we design the system for a rate that has a low probability, we are going to spend money for something that rarely occurs.

Computing the Probability Distribution

We can view the distribution of balls into bins in another way. Imagine labeling each ball with a number from 1 to 60. Each number has an equal chance to be picked. The meaning of this labeling is this: the number that was assigned to the ball is the bin (time bucket) it belongs to. After labeling all balls, what you have is a distribution of balls into bins.

label_balls
Since each ball can be labeled in 60 different ways and there are 100 balls, the number of ways we can label 100 different balls is therefore

\displaystyle 60^{100}

Pick a number from 1-60. Say number 1. Assume 2 balls out of 100 are labeled with number 1. In how many ways can you do this ? Choose the first ball to label. There are 100 ways to choose the ball. Choose the second ball. Now there are 99 ways to choose the second ball. We therefore have 990 ways to select 2 balls and label them 1. Since we don’t really care in what order we picked the ball, we divide 990 with the number of possible arrangements of ball 1 and ball 2, which is 2! (where the exclamation mark stands for “factorial”). So far, the number of ways to label 2 balls with the same number is

\displaystyle \frac{100 \times 99}{2!}

Since these are the only balls with label 1, the third ball can be labeled anything except number 1. In that case, there are 59 ways to label ball 3. In the same way, there are 59 ways to label ball 4. Continuing this reasoning until ball 100, the total ways we can label 2 balls with number 1 and the rest with anything else is therefore:

\displaystyle \frac{100 \times 99}{2!} \times 59^{98}

Notice that the exponent of 59 is 98 since there are 98 balls starting from ball 3 to ball 100.

Therefore, the probability of having two balls in the same bin is

\displaystyle \frac{100 \times 99}{2!} \times \frac{59^{98}}{60^{100}} = 0.2648

We can also write this as

\displaystyle \frac{100!}{2! \times 98!} \times \frac{(60-1)^{98}}{60^{100}} = \binom{100}{2} \frac{(60-1)^{98}}{60^{100}}

In general, if m is the number of balls, n the number of bins and k the number of balls with the same label, then the probability of having k balls within the same bin is given by

\displaystyle \binom{m}{k} \frac{(n-1)^{m-k}}{n^{m}}

,

where

\displaystyle \binom{m}{k} = \frac{m!}{k!(m-k)!}

is the binomial coefficient.

It turns out that this is a probability distribution since the sum of all probabilities from k=0 to k=m is equal to 1. that is

\displaystyle \sum_{k=0}^{n} \binom{m}{k} \frac{(n-1)^{m-k}}{n^{m}} = 1

To see this, recall from the Binomial Theorem that

\displaystyle \big( x + y \big)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k}y^k

If we let x=n-1 and y=1, we can write the above equation as

\displaystyle  \begin{array}{ll}  \displaystyle \sum_{k=0}^{m} \binom{m}{k} \frac{(n-1)^{m-k}}{n^{m}} &= \displaystyle \sum_{k=0}^{m} \binom{m}{k} \frac{(n-1)^{m-k}\cdot 1^k}{n^{m}}\\  &= \displaystyle\frac{(n-1+1)^m}{n^{m}}\\  &= \displaystyle\frac{n^m}{n^m}\\  &= \displaystyle 1  \end{array}

Here is a graph of this probability distribution.

probability_distribution

Here’s the plot data:


probability
1 0.315663315854
2 0.264836171776
3 0.146632456689
4 0.060268424995
5 0.019612775592
6 0.005263315484
7 0.001197945897
8 0.000236035950
9 0.000040895118
10 0.000006307552

We can see that for k=9, the probability of it occurring is .004%. Anything beyond that we can call rare and no need to spend money with.

Just For Fun

What’s the probability that a given bin is empty, that is, there are no balls in it?

Other Probability Distributions

Our computation above was based on a uniform probability distribution. However, there are other distributions that are more suitable for arrival of requests. One of the most widely used is called the Poisson Distribution where you can read from here.

R Code

The R code to generate the simulation:

par(mfrow=c(4,4))
f=function(){
  x=c()
  for(i in 1:100){
    x=c(x,sample(seq(1,60),1,replace=T))
  }
  plot(tabulate(x),type="h", ylab="tx", xlab="secs")
}

for(i in 1:16){
 f()
}

The R code to generate the probability distribution:

p=function(m,n,s){
 prod(seq(m,m-s+1))/factorial(s)*(n-1)^(m-s)
}

tt=c()
for(i in 1:10){
 tt=c(tt,p(100,60,i)/60^100)
}
plot(tt,type="h",xlab="Number of Balls",ylab="Probability")

Dedication

This post is dedicated to my friend Ernesto Adorio, a mathematician. He loves combinatorial mathematics.

Rest in peace my friend! I miss bouncing ideas with you.

New Blog Location

I migrated my blog from a hosted linux server which I maintain into wordpress.com hosting service. Times are changing. I used to have more time before to maintain my site, upgrade my linux applications to the latest security patches, configure my own DNS server, email server, http server, what have you… No I just don’t have the luxury of time. It’s good that we have this hosting providers doing the job for you for free!!!

I thought that migrating my old wordpress blog from the old site to the new site would be a challenge. I was wrong. Thank you wordpress for making my migration painless.

I just followed the instructions from here to export and import my blog:

http://en.support.wordpress.com/import/

and it worked like a charm!

A Tribute to My Manager

janlategahn.jpgI have been under many managers in my IT career which spans over 11 years. They come in all shapes and sizes. But one manager stands out and this is a tribute to him.

My manager’s name is Jan Lategahn. He was my manager for one year in an IT project. But it was one of the most productive years of my life. He was different from all my previous managers because of the following qualities which I will enumerate below.

My manager inspires his subordinates. A manager who does not know how to do the work of his subordinates will find it hard to inspire. The metaphor that usually comes to my mind is that of a Shogun; the most skillful swordsman in his territory. Everybody treats him with respect. The same is true for a manager who is the best programmer in his team.

My manager is an expert programmer. He is an expert in SAP and UNIX. Being a programmer he knows that programming is a creative process and sometimes it is just difficult to predict when you can finish a piece of code. He doesn’t give unrealistic deadlines but if the deadline can’t be moved, he knows how to adjust the scope or to give you all the help you need to succeed.

Once my manager went on a vacation and I was so swamped with work. When he returned to work and found me working till 12 midnight that week he was so kind enough to offer his assistance. I was surprised when he approached my desk and asked me “what can I do to help you?” I asked him to install the server in the uat environment which he did in just two hours. After that he returned and asked for more tasks. I’ve never had any other manager ask me for tasks (which is not surprising; they don’t know how to do it).

When I went to Singapore for my vacation I brought my laptop just to make sure I can do support when extremely needed. I was expecting my manager to call me anytime and ruin my vacation. But instead I got a text message telling me about places I can go and enjoy in Singapore. After that he let me alone in my vacation until I returned to work. Later on he told me he had a rough time during those days and that he must have aged six years when I was as away. I really appreciate what he did and feel so lucky he can do without me.

My manager is not afraid to show his humanity. He is not afraid to show to his subordinates that he sometimes feels tired; that he dislikes unproductive meetings and chatterboxes. He would rather code something than waste his time in meetings. Other managers look like supermen who don’t seem to feel tired and never complain. My manager gives us an example that it’s okay to be human.

Once we had a visit wih the big bosses from onshore. And we had to behave the whole week because they were sitting near our cubicles. On the last day when the big bosses finally left, my manager was not ashamed to show a sigh of relief and say “it was very hard to breathe with them around” or so along those lines. We were laughing, but I really appreciated that reaction. It showed me that my boss is also human and whatever ordinary things we feel, he feels it too.

My manager has a great personality. He always greets people when he arrives in the morning and always finds time to do a quick chat about anything under the sun. I have learned so much from him about electronics, WI-FI hacking, etc.; things not directly related to work. In fact when he’s absent from work due to some illness, I find the team spirit not so happy. He brings life to the team. In my previous work we feel elated when our managers are not around, in my manager’s case the opposite is true. I even have to send a text message to him to ask how he is doing.

And most of all, my manager is a friend to his men. Other manager’s talk about open door policy but always maintain the distance from their subordinates. Not so with my manager. I can always speak my thoughts to him without reservation because he has become a friend.

To my manager, mentor and friend, I wish you only the best in life!

Do It Yourself Supercomputing in Linux Part 1

If you recently purchased a laptop or desktop computer, chances are you have a dual core system. There does not seem to be any indication anymore of us going back to single core unless some technological breakthrough will break the power barrier of CPUs. This means that more and more people will have a high powered computer in their homes without any idea how to harness such power.

What does it mean to have a dual processor? On first impulse, you probably will think it will speed up the execution of your programs. You would probably perceive a significant difference between the response time of your programs in a single core versus a dual core. But why do programs run faster in dual processor systems? For one thing, the speed of each processor is faster as compared to older single processors. The other thing is due to symmetric multiprocessing. An analogy of the dual processor system is a bank with 2 tellers and having a single line for customers. When a teller is done with one customer, it will get the next customer from the line. You can read about the performance of multiprocessor systems in this article.

There is another way you can make use of multiprocessors to make your programs run faster. This is through parallel programming. Parallel computing has been around for a long time already but are usually confined to university laboratories or supercomputing facilities. It has not caught the interest of the common people because they have no access to such machines. However, the future is already about multi-processing. More and more of these machines will reach the masses. This means that highly talented members of the masses get to experiment with parallel computing on a day-to-day basis. We are in for another revolution in computing. Are you ready for this revolution?

Continue reading Do It Yourself Supercomputing in Linux Part 1

The Best Place in Singapore

I was fortunate enough to go to Singapore in October of last year. We arrived in Changi airport about lunch time and it was raining. I was expecting it as some of my friends told me it rains every now and then in Singapore. I was immediately impressed by the airport. Although I have several big airports before when I was in San Franciso, USA and Dubai, this airport was refreshing. We took a cab from the airport to our hotel (Peninsula Excelsior). The scenery was really beautiful as we drove from the airport to the city. The city itself was nothing compared to anything in the Philippines. The streets were so clean and it was very hard for us to find any cigarette butt.

Our first destination was the Zoo. I’m not good with remembering names so I can’t remember the name of the Zoo. But it was the first time I entered a zoo where I saw many animals for the first time. Well, I don’t really go to the zoo, even here in the Philippines. It’s one of those places that are low in my priority. However, I was not disappointed when I went to Singapore zoo.

Our hotel was very near the place of our conference so we just walk it on the way home, utilizing the underground mall that connects Suntec to a train station near our hotel. I can’t remember the name of that underground mall, or if there is a name to it. It was just a pleasant walk. You can shop as you go to the hotel.

We stayed in Singapore for 6 days. Before we left for the Philippines, my wife and I went to Sentosa. It was the place in the top of her list to go. The place was really great. We went to this underwater museum where you can see esoteric fishes and crustaceans. There is already one here in the Philippines but I haven’t gone there yet. We went to Sentosa by train and went out of it by cable car. The view was really great!

On the first day, my wife asked me if I would consider going to Singapore next time for vacation. I told her the city was beautiful, very modern and clean but that there was really no compelling reason to go there. I can live with what we have in the Philippines.

However, all that changed when I went to Borders bookstore and Kinokuniya. There is nothing of such beauty and magnitude in the Philippines as those bookstores. I can spend my entire day just browsing through all the books. And that would be just one row of bookshelves! So the best place in Singapore for me are the bookstores and if I have my way, then the reason I will return to Singapore is to visit the bookstores.

Browsing Books at Borders

John Wheeler, A great physicist!

John Archibald Wheeler is one of my most admired physicist. He just died recently and I want to make a small tribute to him to did so much in General Relativity. Wheeler is the person who coined the word “black hole”, a concept which captivated my imagination when I was yet a kid and a concept which I eventually studied on my own in the University. He wrote one of the best books in General Relativity entitled “Gravitation”. This is a thick and heavy book full of physics and mathematics. It introduces advanced mathematical concepts as you progress in your study of Relativity in a very geometric way. This book has full of illustrations that really whets your appetite for studying advanced physics.

John Archibald Wheeleer

I have bought many books authored by Wheeler. One of them is “Exploring Black Holes“, with Taylor as co-author. I bought this book in Borders bookstore in Singapore. Unfortunately, we don’t have these kinds of books in the Philippines. A friend of mine also lent me a layman’s book written by Wheeler entitled “Geons, Black Holes & Quantum Foam”. This book gave accounts on Wheeler’s contribution to the Manhattan Project and the other great people whom he worked with.

I have a professor before who took his PhD in US who told me that whenever Wheeler gives a lecture at 7 am in the morning, he will always attend it no matter how cold it is in New York. That’s how big an impact Wheeler had on him.

To the great man Wheeler, thank you for inspiring us to study one of the greatest theories in the 20th century!