Performance Arithmetic 2

In the previous article, we learned the various performance laws that we can use to do quick and dirty calculations related to performance. In this article, we will learn how to compute the lower bound on response time. Let us start with a little known law.

Suppose you are a manager of a restaurant and you want to know how many people are dining in your restaurant during a certain time period of the day. Suppose you observe that on the average 3 people leave the restaurant every 30 minutes. Furthermore, the average time a customer spends on your restaurant is 1 hour. Let us denote this time as R. We can compute the throughput of the restaurant as

\displaystyle X =3/30
\displaystyle =0.1

person per minute. If the customers will remain for 1 hour (or 60 minutes) on the average, then we can expect to have 0.1 *60 = 6 customers at any time in the restaurant.
This result is very intuitive and is known as Little’s Law:

\displaystyle N = XR

where N is the number of concurrent transactions on the system and R is the response time.

Example 1: A financial institution has a server that computes the financial indicators for the day for all stocks in a certain stock exchange. Suppose there is 5 jobs are always running for this task. If a job is finished for a company, another job is launched to compute the indicators for another company. In other words, there are always 5 jobs doing this task at any given time. If each computation takes about 15 minutes to complete, how may stocks can be processed in a day?

Using little’s law:

\displaystyle N = X_0 * R
\displaystyle X_0 = \frac{N}{R}
\displaystyle = \frac{5}{15 * 60}
\displaystyle = 0.0055

jobs/s

In one day, only \displaystyle 0.055\times 24\times 3600 = 480 stocks can be computed.

In a call center there are 50 operators that each receive a query from a customer. There is a backend database that supports the query of the operator. Let us call the duration of time where the operator receives a call and composes a query to the database as the think time \displaystyle Z. The time that elapses from the time the operator presses the submit button to the time it receives the response from the database as the response time \displaystyle R. We can view the operators as alternating between two states namely “thinking” and “waiting” for response. Let \displaystyle \bar M be the average number of operators in “thinking” state and \bar N be the average number of operators “waiting” for response. Then since an operator can either be “thinking” or “waiting”, we have \bar M + \bar N = M, where M is the number of operators.

Applying little’s law to the operators in “thinking” state, wehave:

\displaystyle \bar M = X_0 Z
and
\displaystyle \bar N = X_0 R

Adding the two equations we get:

\displaystyle \bar M + \bar N = M
\displaystyle = X_0 Z + X_0 R
\displaystyle =X_0 (Z+R)
Solving for R we get

\displaystyle R=\frac{M}{X0} - Z

This is known as the interactive response time Law.

Example 2: Suppose we were to design a system in which the response time is not to exceed 2 secs and we have 50 operators. If the average think time is 5 minutes, what is the throughput of the system?

From the interactive response law we have:
\displaystyle X_0 = \frac{M}{z+R}
\displaystyle =\frac{50}{5*60 + 2}
\displaystyle =0.1655

transactions/s
or 0.1655 * 24 * 3600 = 14304.64 transactions per day.
We now come to an important consequence of thelittle’s law, that of estimating the lower bound of
the response time of a system.

From Little’s Law, the response time is give by

\displaystyle N =X_0 R
\displaystyle R = \frac{N}{X_0}

The lower bound for R occurs when the throughput is maximum, or when

\displaystyle R \ge \frac{N}{1/\text{max}\{D_i\}} = N \times \text{max}\{D_i\}

Example 3: A webserver serves images and other static files in disk1 with a service time of 15msecs and its dynamic content is put it disk2 which is faster with a service time of 5msecs. On a peak period of 3 hours, there is an average of 50 concurrent users and the CPU utilization is observed to be 35%. A transaction visits disk1 3 times on the average and disk2 2 times. The webserver completes 50000 transactions in 3 hours. Compute the lower bound of the response time of this system.
\displaystyle D_\text{disk1} = S_\text{disk1} V_\text{disk1}
\displaystyle = 0.015 \times 3
\displaystyle =0.045
\displaystyle D_\text{disk2} = S_\text{disk2} V_\text{disk2}
\displaystyle = 0.005 \times 2
\displaystyle =0.01

\displaystyle D_\text{cpu} = \frac{U_\text{cpu}}{X_0}
\displaystyle = \frac{0.35}{50000/(3*3600)}
\displaystyle =0.0756

Since the CPU has the greater service demand,it is the system bottleneck. The maximum throughput this system can attain is 1/0.0756 = 13.22 transactions per second and the response time this throughput is 50 * 0.0756 = 3.78 seconds.

Advertisements

Performance Arithmetic

When analyzing performance issues in computer systems, it is always a good thing to have a feel for the overall picture and be able to make back-of-the-envelope calculations. In this article, we are going to familiarize ourselves with the various performance metrics and how they relate to one another. We then apply these concepts to be able to compute the maximum throughput and minimum response time of a given system.
Performance can be viewed from different perspectives. For a manager, the most important metric is the throughput. How many transactions can be completed in an hour. The greater the throughput, the more money for the business. On the other hand, for a customer, the most important metric is the response time. How fast can I get the response? Do i have to wait forever to get the next page?

Throughput is defined as the number of completed transactions per unit of time. This unit of time is usually in “seconds”. For a website, throughput can be the number of requests in a day. If a site can complete one million transactions per day, then the thoughput is 1,000,000/(24*3600), since there are 24 hours in a day and 3600 seconds in an hour. We denote throughput by X and define it as:

X_i = C_i/T

where C_i is the number of completed transactions of resource i and T is the observation time.

A system can be composed of more than one resource connected with one another. The overall system throughput is denoted by X_0 and defined similarly as above:

X_0 = \frac{C_0}{T}

where C_0 is the number of transactions completed by the system.

Throughput depends on how fast a resource can complete a transaction. The time it takes for a server to complete a transaction is called the service time S_i. If a CPU takes on the average 30 msecs to complete a computation, then the service time for that type of transaction is 30 msecs.

In a typical 8 hour day, an employee does not really consume the whole 8 hours working. There are periods where the employee is idle, or is not doing useful work. The number of hours he/she is working on the average is less than 8 hours. If we denote the number of hours busy as B, then the Utilization is defined as

U=B/T,

where T is the total working hours. If in a one hour observation period, the CPU is utilization is 40%, then it was busy 0.40*(3600 s) = 1440 seconds and idle the rest of the time.

There is an interesting relationship between the utilization, service demand, and throughput. This is known as the Utilization Law:

U_i=X_iS_i

for a given resource i. This can be seen easily by dividing the numerator and denominator of the utilization by \displaystyle C_i:

\displaystyle U_i =\frac{B_i}{T}
\displaystyle =\frac{B_i/C_i}{T/C_i}

But B_i/C_i is average time per transaction completed, which is precisely the service time. Therefore

\displaystyle U_i =\frac{S_i}{T/C_i}
\displaystyle =\frac{S_i}{1/X_i}
\displaystyle = X_iS_i

If a website should process 1,000,000 requests in a day, how fast should the CPU process each request in order to maintain a utilization of 15%? Using the utilization law we have,
\displaystyle S_i =U_i/X_i
\displaystyle =15/(1000000/(24*3600))
\displaystyle = 0.01296
\displaystyle = 12.96\text{msec}

Consider a web server that is composed of a CPU and 2 disks. A typical transaction may use the CPU and both disks one or more times before completing. Let us denote the number of times a transaction visits a resource i as V_i. Then the total time a transaction spends on a resource i is V_iS_i where $S_i$ is the service time of resource i. This is called the Service Demand D_i of resource i:

\displaystyle D_i = V_i S_i.

There is an interesting relationship between the number of times a transaction visits a resource i and the system throughput. If V_i is the number of times a transaction visits resource i, then

\displaystyle X_i = V_i X_0
\displaystyle V_i = \frac{X_i}{X_0}
\displaystyle = \frac{C_i/T}{C_0/T}
\displaystyle = \frac{C_i}{C_0}

By definition of the service demand:

\displaystyle D_i =V_i S_i
\displaystyle = \frac{C_i}{C_0}S_i
\displaystyle = \frac{C_iSi/T}{C_0/T}
\displaystyle =\frac{U_i}{X_0}

The above result is called the Service Demand Law:

\displaystyle D_i= \frac{U_i}{X_0}

The service demand is a very important quantity. It enables us the compute the maximum throughput of the system. Since X_0=U_i/D_i, as the utilization increase to 100%, the throughput also increases. However, when U_i=100%, the throughput cannot be increased any further. Therefore, the throughput is bounded above by

X_0\le\frac{1}{D_i}.

Therefore, in a given system, the resource with the highest service demand limits the throughput and is therefore the bottleneck.

Let’s have an example. Suppose that the utilizations of the CPU,disk1 and disk2 are measured and given as 15%, 35% and 30% respectively. Furthermore, suppose that the current throughput is 500,000 transactions per day. Calculate the maximum throughput of this configuration.

The system throughput is X_0=500000/(24*3600)=5.79. The service demand of the CPU is

\displaystyle D_\text{CPU} = \frac{U_\text{CPU}}{X_0}
\displaystyle = \frac{0.15}{5.79}
\displaystyle =0.026

Similar computations for disk1 and disk2 yeild:

\displaystyle D_\text{disk1} = 0.06
\displaystyle D_\text{disk2}=0.052

Since disk1 has the highest service demand, this resource limits the throughput. The maximum throughput this configuration can attain is:

\displaystyle X_0 \le \frac{1}{D_\text{disk1}}
\displaystyle = \frac{1}{0.06}
\displaystyle = 16.67

In one day, the maximum number of transactions is therefore

\displaystyle 16.67\times 24\times 3600=1,440,000

In the next article, we are going to learn the minimum response time this system will attain. It can never do faster than that minimum.

UNIX Filters

JAVAERO – UNIX was designed with simplicity in mind. Each command is supposed to do only one thing and do it well. However, this does not mean that
unix can only do simple things. On the contrary, highly complex tasks can be done by stringing these commands together. The output of one command can be
the input of another.

$ cat poem.txt| less

In this example, the command “cat” will dump to the console the contents of the file poem.txt. This is then made as input to the less command. This is quite
a simplistic example. In fact we can just do “less poem.txt” to have to same effect. However, this example is just to illustrate the concept of a unix pipe. The symbol “|” is called the pipe symbol and separates commands. The output of “cat poem.txt” is fed to the “less” command. Notice that the word immediately to the right of the “|” symbol will be interpreted by the shell as a command. An error will occur if the command does not exist.

Now that we have some familiarity with unix, let us add more to our arsenal of commands. One very common task we want to do is to find a specified file matching a certain pattern. The command to do this is “find”.

$ find . -name test.c

This command will search for files in the current directory named test.c. I will also search for test.c in all subdirectories contained in the current directory. The first argument to find is the directory to search. In this case, the . tells find to search in the current directory. The option “-name test.c” tells find that the filename matches test.c. In this command, we specified an exact match. We can also tell find to search for all files ending in .c

$ find . -name *.c

The “” backslash is necessary in order for the command to work. The “” is an escape character. We will have more to say about it in the coming sections.

More advanced shell.

Some characters are treated specially by the shell. When the shell encounters them in a command, it does some preprocessing before executing the command. The most common of these characters have something to do filename expansion.

Filename expansion allows us to specify a group of files using a pattern. For example, the pattern *.c will expand into all files whose name end in .c.

$ ls

$ ls *.c

is the same as

$ ls blah.c foo.c …

In other words, the shell first looks for all files ending in .c and constructs a list of all these files and makes them the argument to the “ls” command.

The following characters can be used to construct a pattern for filename expansion.

* – matches one or more characters
? – matches a single character
[] – matches any single character contains within the “[]”. If “^” is the first character of the list, it matches all files not containing any character in the list.

$ ls
README
license.txt
main.c
pr.h

To match main.c and pr.h you can either do

$ ls *.?

or

$ ls *.[hc]

The first command will match all files with a single letter extension. The second will match all .c and .h files and is equivalent to the command ls *.h *.c.

To match the file README, you can

$ ls R*

Exercise: Suppose we have the following files in the directory

$ ls
README
INSTALL
main.c
init.c
Makefile

How do you match README and INsTALL?

Since the characters *, ?, and [] are interpreted by the shell, we must avoid filenames that contain them. For example, if we have a file named *, we can’t just do this:

$ rm *

for this will match all files in the directory and will delete all of them. To match the file named * we must quote the * using the “” backslash character.

$ rm *

will delete the file named *.

However, this expansion will only be applicable to the current files in the directory and not to the files in the subdirectories. To find all files ending in .c in all subdirectories of the current folder, we use the find command.

$ find . -name *.c

Notice the backslash character. This is necessary in order for the shell not to expand the *.c.

REGULAR EXPRESSIONS

A very important skill that any unix professional should acquire is the ability to create and use regular expressions. A regular expression is a way to specify a pattern that can be used in matching text. A regular expressioin is similar to shell filename expansions but they are more powerful. The power of regular expressions is very hard to express in writing, they can only be experienced.

Three important filters in unix make use extensively of regular expressions, namely: sed, grep and awk.

The following characters are used to construct regular expressions.

^ – matches the beginning of a string
$ – matches the end of a string
. – matches any single character
[..] – matches a single character in the list
[^..] – matches a single character NOT in the list
(..) – used to group a regular expression
| – used to indicate alternative regular expression
* – an operator, it specifies that the preceding regular expression be matched zero or more times
+ – similar to the * operator but requires the preceding regular expression to be matched at least once.
? – similar to the * operator but requires the preceding regular expression to be matched at most once.
{N} – also known as a braced regular expression, is an operator that the requires the preceding regex to be matched N times
{N,} – at least N times
{N,M} – N to M times.

The above list will already produce a lot of possible regular expressions. In daily work, we usually only use a few of these constructs.

Let us give examples of the above constructs.

$ touch Readme theReadme

To match the file “Readme”,

$ /bin/ls -1 |grep “^Readme”

To match the file “theReadme”

$ /bin/ls -1 |grep “Readme$”

$ touch main.c main.x test.C test.cxx
$ /bin/ls -1 |grep “.[ch]$”

will match all files ending in .c or .h Notice that the “.” is explicitly matched and should be escaped by a backslash. The $ requires that the character “c” or “h” should be the last character on the line. Without the $, the pattern will also match .cxx.

$ /bin/ls -1 | grep “.[^ch]”

will match all files not containing a c or h after the dot.

Suppose we have the following files:

$ /bin/ls *Readme*
theReadme
Readme
dontReadme

If we want to match on theReadme and Readme, we can write

$ /bin/ls -1|grep “^(the)*Readme”

Notice that we grouped “the” as a regular expression. The * operator after the regex “the” will match zero or more “the” patterns. The ^ in the beginning of the pattern acts as an anchor. It requires that the pattern occur at the beginning of the line. This prevents the file dontReadme to be matched.

$ touch peterson johnson benson

To match peterson and johnson we use the alternation operator

$ /bin/ls -1 |grep “(peter|john)son”

We need to escape the “|” in order to protect it from the shell.

Sed is another tool makes extensive use of regular expressions. Sed is short for stream editor. It has a lot of options but most of the time, sed is used in
find-replace operations.

The syntax of sed find-replace is

sed ‘s/pattern to find/replacement/g’

$ echo “the quick brown fox jumped over another fox”|sed ‘s/fox/cat/g’
the quick brown cat jumped over another cat

In the above example, sed substituted the word fox with cat. Since the word “fox” occurred twice in this string, the substitution occured
twice. To replace only the first occurece, we omit the “g” in the above sed argument.

$ echo “the quick brown fox jumped over another fox”|sed ‘s/fox/cat/’
the quick brown cat jumped over another fox

The pattern “fox” is an exact pattern, We can also specify a regular expression in place of fox, for example,

$ echo “the quick brown fox jumped over another fox”|sed ‘s/f.*x/cat/’
the quick brown cat

This tells sed to substitute the word cat to the string that matched the pattern f.*x. Let us analyze this pattern. It consists of strings that start with “f”, one or more characters in between ( as specified by .*) and ends in y. You might think that sed will match the word “fox”. However, in this case, sed scanned the input string and first meets the letter “f”. Then it examines the next character, which is “o”. Since this satisfied the condion “one or more character in between”, sed accepts this and continues to examine the next character. Upon seeing “x”, sed should probably stop. However, it does not. It continues to scan the whole string until it encounters the last “x”. Therefore, the pattern that matched is “fox jumped over another fox”. In this way, sed is said to match the longest matching pattern.

Editing files using VI editor.

JAVAERO – The vi editor is probably the only editor you will use in the unix command line. It is quite different from the ordinary editor you are familiar in windows. It will take time to get used to. However, when you have mastered it, you’re will be more efficient in your work.

To edit a file named poem.txt, invoke vi like this:

$ vi poem.txt

If the file poem.txt exists, vi will open the file for editing. If not, this will tell vi to create a new file named poem.txt.

Unlike our usual microsoft editors where you can begin typing your text, vi does not allow you to do that. When it is launched it is said to be in “command mode”. In this mode, vi will interpret your keystrokes as commands. These commands are usually what you see in the toolbar of our familiar editors. In vi, there is no toolbar. Commands are invoked using key strokes. In order to type any text at all, you need to be in the “edit” mode. You can do this by pressing the “i” key. This is a command that tell vi to be in the edit mode. Once in the edit mode, you can then start typing. Here is a nice unix poem you can type using vi.

Waka waka bang splat tick tick hash,
Caret quote back-tick dollar dollar dash,
Bang splat equal at dollar under-score,
Percent splat waka waka tilde number four,
Ampersand bracket bracket dot dot slash,
Vertical-bar curly-bracket comma comma CRASH.

One you are done entering the text, you go to the command mode using the escape key (ESC). In the command mode, you can save your text by typing “:wq”.

This is basic vi editing. other advance features we will learn as we progress.

Getting Started With the BASH Shell

Perhaps the first question you want to know when you are at the shell is where you are. The command pwd will give you your current working directory.

$ pwd
/home/bobby

The ls command will give you a list of all files in the current directory.

$ ls

Giving it an argument, ls will determine if the argument is a directory. If it is, it will list the files contained in that directory. If it is not a file, ls will just list down the file as if echoing what you typed. However, if the file does not exist, ls will tell you that fact.

$ ls non-existent
non-existent not found

Some unix commands require a parameter in order to complete successfully. Some, like the ls command above, have default values. A unix command is structured in the following way:

command_name options arguments

options are usually preceded by a “-” sign. Options allow you have more control over the output of a command. For example, the command

$ ls -l

will list down detailed information about the files in the current directory. We will learn how to interpret the output of this command later when we gain more familiarity with the basic commands.

Exercise: List down the files of your parent directory.

Other basic commands.

To change to a directory, we use the cd command. It takes a single argument, which is the directory to change to. If you don’t specify an argument, it will change to your home directory, as defined by the HOME environment variable.

To create a directory, we use the mkdir command. It takes one or more arguments. These arguments are the names of the directories to create.

$ mkdir tmp1 tmp2 tmp3

will create directories tmp1, tmp2 tmp3 as shown by the ls command below.

$ ls -F

tmp1/ tmp2/ tmp3/

Question: What does the -F option to ls do?
We can remove the directories we just created using the rmdir command.

$ rmdir tmp1 tmp2 tmp3

If a directory is not empty, rmdir will refuse to remove that directory unless it is first emptied of all files.

Exercise: How do you force rmdir to remove a non-empty directory.

We can create empty files using the command “touch”.

$ touch me

Will create the file named “me” in the current working directory.

To remove this file, use the “rm” command.

$ rm me

We should use the rm command with caution because you cannot undelete the file you just deleted. You can tell rm to ask you for confirmation to delete the file using the “-i” commandline switch.

$ rm -i me
me: ? (y/n)

We now know how to do some basic commands for unix. But we are still not able to do anything useful. Next we want to do is to be able to view the content of a file. There are many ways to peek
at the contents of files. For short files that will fit the screen, we can use the “cat” command.

$ cat filename

This will dump on the screen the contents of file named filename. If the file is long and will not fit one
screenful, the contents will just flash quickly on the screen and you will just be able to view the end portion
of the file.

To view longer files, we can use the “more” command.

$ more filename

will let you view the file page by page. The more command is also known as a “pager” command because it let’s you view a file page by page. If a file is long, you can scroll the the next page using the space bar. The only problem with the “more” command is that you cannot go back to the previous page. You can only scroll-down, no up.

A more powerful pager than “more” is the “less” command. It allows you to scroll up and down. In this respect, we can say that “less” is more.

$ less long-file

To scroll backwards, press the “b” key.

Shell environment variables.

The shell is not only a command interpreter, it is also a programming environment. By an environment, what we mean is that is gives you the necessary tools to create programs conveniently and according to your own programming style and preference. We will explore that later.

As we have seen before, some commands are able to operate without an argument, like the cd command. The cd commands depends on the environment variable HOME which stores the value of your home directory. To view the value of this variable we use the echo command.

$ echo $HOME
/home/bobby

Using this value, cd command will change to that directory when there are no arguments.

There are predefined environment variables when your account is first created. Many commands depend on the definitions of these variable for proper functioning. (just like the human body depends on some factors in order to operate properly). Among the most important variables you the shell depends are PATH. The path tells the shell where to find programs. Accidentally changing the value of this variable can give you a big headache as the shell cannot anymore find some programs.

You can also define your own environment variables. To define an environment variable, the syntax is

name=value

For example, to define the variable QUOTE to “Health is wealth.”
we type

$ QUOTE=”Health is wealth.”

Notice that the value is enclosed in quotation marks. This will tell the shell that the value contains spaces. If the quotation marks is ommitted an error will occur, like this:

$ QUOTE=Health is wealth
bash: is: command not found

As you can see, the shell interpreted “is” as a command, and having found none, it issues the error “command not found”.

Unix Questions

Unix Questions. This is a commandline exercise. Find a way to make execute these instructions in the fastest way possible.
1. cut and paste the following to a file named the_road_not_taken
TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I.
I took the one less traveled by,
And that has made all the difference
2. count the number of words
3. count the number of unique words
4. create a directory for every unique word
5. move the directories to their uppercase equivalent. For example,
mv Yet YET
6. copy the file the_road_not_taken to each directory with the name of the directory in lower case and extension is .txt
for example:
cp the_road_not_taken WORN/worn.txt
7. rename all .txt files to .doc extension. For example
mv worn.txt worn.doc
8. format the file the_road_not_taken in such a way that it will look like this:

TWO |roads |diverged |in |a |yellow |wood, |
And |sorry |I |could |not |travel |both |
And |be |one |traveler, |long |I |stood |
And |looked |down |one |as |far |as |I |could |
To |where |it |bent |in |the |undergrowth;|