The Power of the Command Line

The command line separates the amateurs from the experts. The command line lies at the heart of the Unix computing power.

The beauty of command line is that you are able to do relatively complicated tasks in just one line. Take for example renaming of doc files to txt files. An ordinary user will issue the command

bobby-corpuss-imac-5:~ bobby$ mv file1.doc file1.txt

for every file the current directory. If the directory turns out to contain so many files, then the poor user will spend so much time trying to do a non-value added work.

A better use of one’s time is to study how the command line works and be able to do the job in just one line of code:

bobby-corpuss-imac-5:~ bobby$ /bin/ls -1 *.doc |sed 's/(.*).doc/mv & 1.txt/g'|bash

To see how this works, let us dissect the command which is composed of 3 commands linked together using the pipe symbol “|”. The pipe allows you to feed the output of one command to the input stream of the next command. The 3 commands used are:

  • /bin/ls
  • sed
  • bash

The first command will list down the files in the current directory and present them in one column. For example,

bobby-corpuss-imac-5:tmp bobby$ /bin/ls -1 *.doc
file1.doc
file2.doc
file3.doc
file4.doc
file5.doc

The next command is quite tricky so let’s conquer it little by little. The command name is sed which is short for Stream Editor. It takes a regular expression that matches any string ending with a doc. The cryptic looking expression (.*) is nothing to worry about. The “” (backslash) you see that precedes the “(” and “)” is an escape character and is used to indicate to the shell that the next character should not be interpreted. The “.*” is a regular expression composed of “.” and “*”. The “.” tells “sed” to match any character. The “*” modifies the behaviour of “.” by telling “sed” to match a sequence of any character zero or more times. If you remove the escape character, the regular expression reads (.*). The parenthesis tells “sed” that whatever string of characters it matches in .* should be saved for later use. The final part of the regular expression says that the matched expression should contain the string doc.

The last paragraph is really a mouthful. If you are new to this, you might need to read it again, and again. That is just the first part of the “sed” input. Sed commands typically accept inputs of the following format:

sed 's/regex/replacement/g'

where regex is a regular expression to match, as discussed above and replacement is the text which can make use of the saved text matched in the regex. The “g” tells sed to apply the substitution for all matches. Let’s now try to dissect the replacement part of the sed input.

The replacement looks like mv & 1.txt. This still looks complicated but we’re almost done. The symbol & stands for the entire string matching the regular expression. The symbol 1 stands for the sequence of characters that matched (.*). Therefore, an input string file1.doc will give the following values to the replacement:

  • & = file1.doc
  • 1 = file1
  • mv & 1.txt = mv file1.doc file1.txt

If we now try to string the first two commands together, we get:

bobby-corpuss-imac-5:tmp bobby$ /bin/ls -1 *.doc |sed 's/(.*).doc/mv & 1.txt/g'
mv file1.doc file1.txt
mv file2.doc file2.txt
mv file3.doc file3.txt
mv file4.doc file4.txt
mv file5.doc file5.txt

If you look closely at the output above, you can see that we are in effect building a command. For example, the first line says mv file1.doc file1.txt which is a command to move file1.doc to file2.doc. The only thing we need to do now is to execute this command, which is conveniently done by piping the output to bash. Executing this command in the shell gives us the following output.

bobby-corpuss-imac-5:tmp bobby$ ls *.doc
file1.doc	file2.doc	file3.doc	file4.doc	file5.doc
bobby-corpuss-imac-5:tmp bobby$ /bin/ls -1 *.doc |sed 's/(.*).doc/mv & 1.txt/g'|bash
bobby-corpuss-imac-5:tmp bobby$ ls *.docls: *.doc: No such file or directory
bobby-corpuss-imac-5:tmp bobby$ ls *.txt
file1.txt	file2.txt	file3.txt	file4.txt	file5.txt

Are you ready to tap the power of the command line?

Advertisements

UNIX Filters

JAVAERO – UNIX was designed with simplicity in mind. Each command is supposed to do only one thing and do it well. However, this does not mean that
unix can only do simple things. On the contrary, highly complex tasks can be done by stringing these commands together. The output of one command can be
the input of another.

$ cat poem.txt| less

In this example, the command “cat” will dump to the console the contents of the file poem.txt. This is then made as input to the less command. This is quite
a simplistic example. In fact we can just do “less poem.txt” to have to same effect. However, this example is just to illustrate the concept of a unix pipe. The symbol “|” is called the pipe symbol and separates commands. The output of “cat poem.txt” is fed to the “less” command. Notice that the word immediately to the right of the “|” symbol will be interpreted by the shell as a command. An error will occur if the command does not exist.

Now that we have some familiarity with unix, let us add more to our arsenal of commands. One very common task we want to do is to find a specified file matching a certain pattern. The command to do this is “find”.

$ find . -name test.c

This command will search for files in the current directory named test.c. I will also search for test.c in all subdirectories contained in the current directory. The first argument to find is the directory to search. In this case, the . tells find to search in the current directory. The option “-name test.c” tells find that the filename matches test.c. In this command, we specified an exact match. We can also tell find to search for all files ending in .c

$ find . -name *.c

The “” backslash is necessary in order for the command to work. The “” is an escape character. We will have more to say about it in the coming sections.

More advanced shell.

Some characters are treated specially by the shell. When the shell encounters them in a command, it does some preprocessing before executing the command. The most common of these characters have something to do filename expansion.

Filename expansion allows us to specify a group of files using a pattern. For example, the pattern *.c will expand into all files whose name end in .c.

$ ls

$ ls *.c

is the same as

$ ls blah.c foo.c …

In other words, the shell first looks for all files ending in .c and constructs a list of all these files and makes them the argument to the “ls” command.

The following characters can be used to construct a pattern for filename expansion.

* – matches one or more characters
? – matches a single character
[] – matches any single character contains within the “[]”. If “^” is the first character of the list, it matches all files not containing any character in the list.

$ ls
README
license.txt
main.c
pr.h

To match main.c and pr.h you can either do

$ ls *.?

or

$ ls *.[hc]

The first command will match all files with a single letter extension. The second will match all .c and .h files and is equivalent to the command ls *.h *.c.

To match the file README, you can

$ ls R*

Exercise: Suppose we have the following files in the directory

$ ls
README
INSTALL
main.c
init.c
Makefile

How do you match README and INsTALL?

Since the characters *, ?, and [] are interpreted by the shell, we must avoid filenames that contain them. For example, if we have a file named *, we can’t just do this:

$ rm *

for this will match all files in the directory and will delete all of them. To match the file named * we must quote the * using the “” backslash character.

$ rm *

will delete the file named *.

However, this expansion will only be applicable to the current files in the directory and not to the files in the subdirectories. To find all files ending in .c in all subdirectories of the current folder, we use the find command.

$ find . -name *.c

Notice the backslash character. This is necessary in order for the shell not to expand the *.c.

REGULAR EXPRESSIONS

A very important skill that any unix professional should acquire is the ability to create and use regular expressions. A regular expression is a way to specify a pattern that can be used in matching text. A regular expressioin is similar to shell filename expansions but they are more powerful. The power of regular expressions is very hard to express in writing, they can only be experienced.

Three important filters in unix make use extensively of regular expressions, namely: sed, grep and awk.

The following characters are used to construct regular expressions.

^ – matches the beginning of a string
$ – matches the end of a string
. – matches any single character
[..] – matches a single character in the list
[^..] – matches a single character NOT in the list
(..) – used to group a regular expression
| – used to indicate alternative regular expression
* – an operator, it specifies that the preceding regular expression be matched zero or more times
+ – similar to the * operator but requires the preceding regular expression to be matched at least once.
? – similar to the * operator but requires the preceding regular expression to be matched at most once.
{N} – also known as a braced regular expression, is an operator that the requires the preceding regex to be matched N times
{N,} – at least N times
{N,M} – N to M times.

The above list will already produce a lot of possible regular expressions. In daily work, we usually only use a few of these constructs.

Let us give examples of the above constructs.

$ touch Readme theReadme

To match the file “Readme”,

$ /bin/ls -1 |grep “^Readme”

To match the file “theReadme”

$ /bin/ls -1 |grep “Readme$”

$ touch main.c main.x test.C test.cxx
$ /bin/ls -1 |grep “.[ch]$”

will match all files ending in .c or .h Notice that the “.” is explicitly matched and should be escaped by a backslash. The $ requires that the character “c” or “h” should be the last character on the line. Without the $, the pattern will also match .cxx.

$ /bin/ls -1 | grep “.[^ch]”

will match all files not containing a c or h after the dot.

Suppose we have the following files:

$ /bin/ls *Readme*
theReadme
Readme
dontReadme

If we want to match on theReadme and Readme, we can write

$ /bin/ls -1|grep “^(the)*Readme”

Notice that we grouped “the” as a regular expression. The * operator after the regex “the” will match zero or more “the” patterns. The ^ in the beginning of the pattern acts as an anchor. It requires that the pattern occur at the beginning of the line. This prevents the file dontReadme to be matched.

$ touch peterson johnson benson

To match peterson and johnson we use the alternation operator

$ /bin/ls -1 |grep “(peter|john)son”

We need to escape the “|” in order to protect it from the shell.

Sed is another tool makes extensive use of regular expressions. Sed is short for stream editor. It has a lot of options but most of the time, sed is used in
find-replace operations.

The syntax of sed find-replace is

sed ‘s/pattern to find/replacement/g’

$ echo “the quick brown fox jumped over another fox”|sed ‘s/fox/cat/g’
the quick brown cat jumped over another cat

In the above example, sed substituted the word fox with cat. Since the word “fox” occurred twice in this string, the substitution occured
twice. To replace only the first occurece, we omit the “g” in the above sed argument.

$ echo “the quick brown fox jumped over another fox”|sed ‘s/fox/cat/’
the quick brown cat jumped over another fox

The pattern “fox” is an exact pattern, We can also specify a regular expression in place of fox, for example,

$ echo “the quick brown fox jumped over another fox”|sed ‘s/f.*x/cat/’
the quick brown cat

This tells sed to substitute the word cat to the string that matched the pattern f.*x. Let us analyze this pattern. It consists of strings that start with “f”, one or more characters in between ( as specified by .*) and ends in y. You might think that sed will match the word “fox”. However, in this case, sed scanned the input string and first meets the letter “f”. Then it examines the next character, which is “o”. Since this satisfied the condion “one or more character in between”, sed accepts this and continues to examine the next character. Upon seeing “x”, sed should probably stop. However, it does not. It continues to scan the whole string until it encounters the last “x”. Therefore, the pattern that matched is “fox jumped over another fox”. In this way, sed is said to match the longest matching pattern.

Editing files using VI editor.

JAVAERO – The vi editor is probably the only editor you will use in the unix command line. It is quite different from the ordinary editor you are familiar in windows. It will take time to get used to. However, when you have mastered it, you’re will be more efficient in your work.

To edit a file named poem.txt, invoke vi like this:

$ vi poem.txt

If the file poem.txt exists, vi will open the file for editing. If not, this will tell vi to create a new file named poem.txt.

Unlike our usual microsoft editors where you can begin typing your text, vi does not allow you to do that. When it is launched it is said to be in “command mode”. In this mode, vi will interpret your keystrokes as commands. These commands are usually what you see in the toolbar of our familiar editors. In vi, there is no toolbar. Commands are invoked using key strokes. In order to type any text at all, you need to be in the “edit” mode. You can do this by pressing the “i” key. This is a command that tell vi to be in the edit mode. Once in the edit mode, you can then start typing. Here is a nice unix poem you can type using vi.

Waka waka bang splat tick tick hash,
Caret quote back-tick dollar dollar dash,
Bang splat equal at dollar under-score,
Percent splat waka waka tilde number four,
Ampersand bracket bracket dot dot slash,
Vertical-bar curly-bracket comma comma CRASH.

One you are done entering the text, you go to the command mode using the escape key (ESC). In the command mode, you can save your text by typing “:wq”.

This is basic vi editing. other advance features we will learn as we progress.

Getting Started With the BASH Shell

Perhaps the first question you want to know when you are at the shell is where you are. The command pwd will give you your current working directory.

$ pwd
/home/bobby

The ls command will give you a list of all files in the current directory.

$ ls

Giving it an argument, ls will determine if the argument is a directory. If it is, it will list the files contained in that directory. If it is not a file, ls will just list down the file as if echoing what you typed. However, if the file does not exist, ls will tell you that fact.

$ ls non-existent
non-existent not found

Some unix commands require a parameter in order to complete successfully. Some, like the ls command above, have default values. A unix command is structured in the following way:

command_name options arguments

options are usually preceded by a “-” sign. Options allow you have more control over the output of a command. For example, the command

$ ls -l

will list down detailed information about the files in the current directory. We will learn how to interpret the output of this command later when we gain more familiarity with the basic commands.

Exercise: List down the files of your parent directory.

Other basic commands.

To change to a directory, we use the cd command. It takes a single argument, which is the directory to change to. If you don’t specify an argument, it will change to your home directory, as defined by the HOME environment variable.

To create a directory, we use the mkdir command. It takes one or more arguments. These arguments are the names of the directories to create.

$ mkdir tmp1 tmp2 tmp3

will create directories tmp1, tmp2 tmp3 as shown by the ls command below.

$ ls -F

tmp1/ tmp2/ tmp3/

Question: What does the -F option to ls do?
We can remove the directories we just created using the rmdir command.

$ rmdir tmp1 tmp2 tmp3

If a directory is not empty, rmdir will refuse to remove that directory unless it is first emptied of all files.

Exercise: How do you force rmdir to remove a non-empty directory.

We can create empty files using the command “touch”.

$ touch me

Will create the file named “me” in the current working directory.

To remove this file, use the “rm” command.

$ rm me

We should use the rm command with caution because you cannot undelete the file you just deleted. You can tell rm to ask you for confirmation to delete the file using the “-i” commandline switch.

$ rm -i me
me: ? (y/n)

We now know how to do some basic commands for unix. But we are still not able to do anything useful. Next we want to do is to be able to view the content of a file. There are many ways to peek
at the contents of files. For short files that will fit the screen, we can use the “cat” command.

$ cat filename

This will dump on the screen the contents of file named filename. If the file is long and will not fit one
screenful, the contents will just flash quickly on the screen and you will just be able to view the end portion
of the file.

To view longer files, we can use the “more” command.

$ more filename

will let you view the file page by page. The more command is also known as a “pager” command because it let’s you view a file page by page. If a file is long, you can scroll the the next page using the space bar. The only problem with the “more” command is that you cannot go back to the previous page. You can only scroll-down, no up.

A more powerful pager than “more” is the “less” command. It allows you to scroll up and down. In this respect, we can say that “less” is more.

$ less long-file

To scroll backwards, press the “b” key.

Shell environment variables.

The shell is not only a command interpreter, it is also a programming environment. By an environment, what we mean is that is gives you the necessary tools to create programs conveniently and according to your own programming style and preference. We will explore that later.

As we have seen before, some commands are able to operate without an argument, like the cd command. The cd commands depends on the environment variable HOME which stores the value of your home directory. To view the value of this variable we use the echo command.

$ echo $HOME
/home/bobby

Using this value, cd command will change to that directory when there are no arguments.

There are predefined environment variables when your account is first created. Many commands depend on the definitions of these variable for proper functioning. (just like the human body depends on some factors in order to operate properly). Among the most important variables you the shell depends are PATH. The path tells the shell where to find programs. Accidentally changing the value of this variable can give you a big headache as the shell cannot anymore find some programs.

You can also define your own environment variables. To define an environment variable, the syntax is

name=value

For example, to define the variable QUOTE to “Health is wealth.”
we type

$ QUOTE=”Health is wealth.”

Notice that the value is enclosed in quotation marks. This will tell the shell that the value contains spaces. If the quotation marks is ommitted an error will occur, like this:

$ QUOTE=Health is wealth
bash: is: command not found

As you can see, the shell interpreted “is” as a command, and having found none, it issues the error “command not found”.

Unix Questions

Unix Questions. This is a commandline exercise. Find a way to make execute these instructions in the fastest way possible.
1. cut and paste the following to a file named the_road_not_taken
TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I.
I took the one less traveled by,
And that has made all the difference
2. count the number of words
3. count the number of unique words
4. create a directory for every unique word
5. move the directories to their uppercase equivalent. For example,
mv Yet YET
6. copy the file the_road_not_taken to each directory with the name of the directory in lower case and extension is .txt
for example:
cp the_road_not_taken WORN/worn.txt
7. rename all .txt files to .doc extension. For example
mv worn.txt worn.doc
8. format the file the_road_not_taken in such a way that it will look like this:

TWO |roads |diverged |in |a |yellow |wood, |
And |sorry |I |could |not |travel |both |
And |be |one |traveler, |long |I |stood |
And |looked |down |one |as |far |as |I |could |
To |where |it |bent |in |the |undergrowth;|