Unix/Linux Shell I/O Redirection (including stdin, stdout, stderr, and Pipes)

Ian! D. Allen – www.idallen.com

Winter 2017 - January to April 2017 - Updated 2019-01-29 15:21 EST

1 Introduction to redirection – changing Input/Output

Shell redirection is a powerful way to change from where commands read input and to where commands send output. It applies to every command run by the shell, so once you learn the shell syntax of how it works, it works on all the commands you can type into the shell.

If you want to hear a simple explanation of the power of shell redirection directly from the bearded men who invented it back in the 1970s, watch either of these historic 1982 videos on The UNIX Operating System:

1.1 Redirection of input – redirection of output

You can redirect the input of a command away from your keyboard, and you can redirect the output of a command away from your screen.

The redirection can be to or from a file using the shell meta-characters ‘>’ or ‘<’ (angle brackets) or it can be to or from a program using the shell meta-character ‘|’ (the pipe symbol).

2 Output Redirection – Standard Output and Standard Error

Commands produce two kinds of output – normal output and error message output – and the shell can redirect each of these separately.

2.1 Normal Output: Standard Output – stdout

In the process of output redirection, the shell (not the command) redirects (diverts) most command output that would normally appear on the screen to some other place. The redirection can be either into a file using a file output redirect meta-character ‘>’, or into the input of another command by separating the commands using a pipe meta-character ‘|’:

$ echo hi >outfile         # redirect echo output into an output file "outfile"
$ echo hi | wc             # redirect echo output into the program "wc"

The normal command output that appears on your screen is called the standard output of the command, abbreviated stdout. It is the expected output that a command generates if everything works. Anything written to stdout can be redirected as shown in the above examples.

2.2 Error Messages: Standard Error – stderr

If something goes wrong, commands produce error messages. These error messages are sent to what is called the standard error output of the command, abbreviated stderr. Error messages are almost always sent to your screen, even if you redirect stdout somewhere else:

$ ls nosuchfile >outfile
ls: cannot access nosuchfile: No such file or directory

Standard error output is not subject to simple output redirection, but with extra syntax the shell can redirect it, too, into a file:

$ ls nosuchfile >outfile 2>errors

In programming terms, stdout and stderr outputs are written to different I/O units, but both end up on your screen, by default. The shell can redirect them separately. More on that later.

2.3 Four Rules for Output Redirection

You need to remember these four things about output redirection:

  1. Redirection is done for the command by the shell, first, before finding and running the command. The shell has no idea whether the command exists or will produce any output.

  2. The shell can only redirect output that is produced by the command. You can only redirect the output that you can see. If there is no visible output on your screen without redirection, adding redirection won’t create any. Re-read this a few times and remember it.

    Before you redirect output into a file or a pipe, look at what output is on your screen with no redirection added. If what is appearing on your screen isn’t right without redirection, adding redirection won’t make it right.

  3. Redirection can only go to one place. You can’t use multiple redirections to send output to multiple places. (See the tee command for a way to send output into multiple files.)

  4. By default, error messages (called “standard error” or stderr) are not redirected; only “normal output” (called “standard output” or stdout) is redirected. (You can also redirect stderr with more shell syntax; see below.)

Summary of Rules for Output Redirection:

  1. Redirection is done first, before running the command.
  2. You can only redirect output that you can (could) see.
  3. Redirection goes only to one place.
  4. Only standard output is redirected, by default.

We will discuss each of these rules in detail, below.
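Here is a sketch of a script that checks the four rules, assuming a Bourne-style shell such as sh or bash and the mktemp command. The scratch directory and the file names in it are made up for this illustration:

```shell
# Sketch only: check the four output redirection rules in a scratch directory.
# The directory and file names are made up for this illustration.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# Rule 1: the shell creates the output file before it looks for the command.
nosuchcommandxxx >"$workdir/rule1" 2>"$workdir/errors" || true
if [ -f "$workdir/rule1" ]; then rule1=created; else rule1=missing; fi

# Rule 2: a command that prints nothing leaves the output file empty.
true >"$workdir/rule2"
if [ -s "$workdir/rule2" ]; then rule2=nonempty; else rule2=empty; fi

# Rule 3: only the right-most output file receives the output.
echo hi >"$workdir/a" >"$workdir/b"
if [ -s "$workdir/a" ]; then rule3=a; else rule3=b; fi

# Rule 4: the stdout file does not receive the error message.
ls "$workdir/nosuchfile" >"$workdir/rule4" 2>"$workdir/errors" || true
if [ -s "$workdir/rule4" ]; then rule4=nonempty; else rule4=empty; fi

echo "rule 1: output file was $rule1"
echo "rule 2: output file is $rule2"
echo "rule 3: output went into file $rule3"
echo "rule 4: stdout file is $rule4"
```

The test [ -s file ] is true only when the file exists and is not empty, which makes it a convenient way to see whether any output arrived.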

3 Output redirection into files using >

The shell meta-character right angle bracket ‘>’ signals that the next word on the command line is an output file (not a program) that the shell should create or truncate (set to empty) and make ready to receive the standard output of a command. Standard output is the normal output that appears on your screen when you run the command:

$ echo hi                  # normal output appears on screen; no redirection
hi
$ echo hi >outfile         # output goes into file "outfile", not on screen
$ cat outfile              # display the contents of outfile on screen
hi

The space before the angle bracket ‘>’ is sometimes required, so always include it and you won’t go wrong. The space after the ‘>’ is optional and most people omit it:

$ echo hi >outfile         # this is the most common usage
$ echo hi > outfile        # space is optional after the >
$ echo hi>outfile          # AVOID THIS - always put a space before the >

Putting the space in front of ‘>’ makes the command line easier to read.

Output redirection means the shell truncates (makes empty) an existing file. An existing file will have its contents removed:

$ echo hello >out          # "hello" goes into the file "out"
$ cat out
hello
$ nosuchcommandxxx >out    # "out" is made empty again
sh: nosuchcommandxxx: command not found
$ cat out                  # command failed -- "out" is still empty
$

The shell always makes the output file empty before trying to run the command. If the command fails or doesn’t produce any standard output, the redirection output file remains empty.

3.1 Rule 1a: Redirection is done by the shell, not by the command

It is the shell that does all redirection, not the command being run.

For output redirection into files, the shell creates or truncates the output file and sets up the redirection, not the command being redirected. The command knows nothing about the redirection – the redirection syntax is removed from the command line by the shell before the command is found and executed:

$ echo one two three                   # echo has three command line arguments
one two three
$ echo one two three >out              # echo still has three arguments
$ cat out
one two three

Shells handle redirection before they go looking for the command name to run. Indeed, redirection happens even if the command is not found or if there is no command at all:

$ >out                                 # file "out" is created or truncated empty
$ wc out
0 0 0 out                              # shell created an empty file

$ nosuchcommandxxx >out                # file "out" is created empty
sh: nosuchcommandxxx: command not found
$ wc out
0 0 0 out                              # shell created an empty file

The shell creates or truncates the file “out” empty, and then it tries to find and run the nonexistent command and fails. The empty file that was created by the shell remains.

3.2 Rule 1b: Redirection file creation and truncation happen first

The shell does the output file creation and truncation before it finds or runs the command. This can affect the output of commands that operate on files:

$ mkdir empty
$ cd empty
$ ls -l
total 0                                 # no files found
$ ls -l >out                            # shell creates "out" first
$ cat out                               # display output
total 0
-rw-r--r--  1 idallen idallen 0 Sep 21 06:02 out

The shell creates the file out before it runs the ls command, so the ls command finds the new output file and the output of ls with redirection is different than the output of ls without.

Because the shell truncates the redirection output file before the shell looks for and runs the command, you cannot use output redirection files as input to the same command:

$ cp /etc/passwd a
$ sort a >a         # WRONG WRONG WRONG! File "a" is truncated to be empty!
$ cat a             # shows empty file
$

Above, the shell makes the file a empty before it runs the sort command. The sort command sorts an empty file, and the empty output goes into the file a, which remains empty.

Never use redirection output files as input files!

3.3 Rule 2: You can only redirect what you can see

Redirection does not create output that wasn’t there without redirection. If you don’t see any output from a command without redirection, adding redirection to the command won’t cause the command to create output.

Adding redirection to a command that generates no output will simply have the shell redirect no output and create an empty file:

$ cp /etc/passwd x             # no output visible on standard output
$ cp /etc/passwd x >out        # file "out" is created empty

$ cd /tmp                      # no output visible on standard output
$ cd /tmp >out                 # file "out" is created empty

$ touch x ; rm x               # no output from rm on standard output
$ touch x ; rm x >out          # file "out" is created empty

You can only redirect the output that you can see! Run a command without redirection and observe what output the command produces. If you don’t see any output, adding redirection won’t create any.

3.4 Rule 3: Output redirection only goes to one place

Output redirection can only go to one place, so adding multiple output redirection to a command line doesn’t do what you might think:

$ date >a >b >c                # output goes into file c; a and b are empty

The right-most output file redirection always gets all the output and the other output redirection files to the left are created empty by the shell.

If you redirect output into both a file and a pipe, the file gets all the output and the pipe gets nothing:

$ date >a | cat     # all output goes into file "a"; cat shows nothing

See the following section on redirection into programs using “|” pipes.
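When you really do need the same output in several files, the tee command mentioned in Rule 3 can make the copies. This is a minimal sketch, assuming a POSIX shell and mktemp; the files a, b, and c live in a scratch directory made up for the example:

```shell
# Sketch only: send one command's output into three files using tee.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# tee copies its standard input into every named file and also to its
# own standard output; here that standard output is redirected into "c".
echo hi | tee "$workdir/a" "$workdir/b" >"$workdir/c"

a=$(cat "$workdir/a")
b=$(cat "$workdir/b")
c=$(cat "$workdir/c")
echo "a=$a b=$b c=$c"
```

Each redirection still goes to only one place; it is the tee program, not the shell, that writes the extra copies.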

3.5 Rule 4: Error messages do not get redirected, by default

By default, the shell only redirects the standard output (normal output) of commands. Error messages from commands are not redirected into the output file and still appear directly on your screen:

$ ls -l nosuchfile /etc/passwd
ls: cannot access nosuchfile: No such file or directory
-rw-r--r-- 1 root root 2542 Jun 24  2014 /etc/passwd

$ ls -l nosuchfile /etc/passwd >out    # standard output goes into "out"
ls: cannot access nosuchfile: No such file or directory

$ cat out                              # show contents of "out"
-rw-r--r-- 1 root root 2542 Jun 24  2014 /etc/passwd

Error messages continue to appear on your screen, even with redirection, so that you know the command you are using had an error.

3.6 Output file redirection syntax can go anywhere

Shells don’t care where in the command line you put the output file redirection. No matter where in the command line you type it, it has the same effect, though most people put it at the end, as in the first example below:

$ echo hi there mom >file                   # echo has three arguments
$ echo hi there >file mom                   # echo has three arguments
$ echo hi >file there mom                   # echo has three arguments
$ echo >file hi there mom                   # echo has three arguments
$ >file echo hi there mom                   # echo has three arguments

All the command lines above are equivalent and create the same output file. The redirection syntax is processed by and removed by the shell before the command runs. The redirection syntax is never counted as arguments to a command.

In every case above, the echo command sees exactly three command line arguments, “hi”, “there”, and “mom”, and its output of all three words is redirected into the output file “file”.

The file redirection is found and done first by the shell, then the redirection syntax is removed from the command line before the command is called. The command actually being run doesn’t see any part of the redirection syntax; the number of arguments is not affected.
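You can confirm that the argument count is unaffected with a small script. This is only a sketch, assuming a POSIX shell and mktemp; count_args is a made-up shell function that prints how many arguments it received:

```shell
# Sketch only: count_args is a hypothetical helper that reports its
# own argument count, so we can see what the command really receives.
count_args() {
    echo "$# arguments"
}

outfile=$(mktemp) || exit 1
trap 'rm -f "$outfile"' EXIT

count_args one two three                  # no redirection; count appears on screen
count_args one two three >"$outfile"      # redirection typed at the end
at_end=$(cat "$outfile")
count_args one >"$outfile" two three      # redirection typed in the middle
in_middle=$(cat "$outfile")

echo "redirection at end:    $at_end"
echo "redirection in middle: $in_middle"
```

Both files should report the same count as the unredirected call, because the shell removes the redirection before the function ever sees its arguments.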

3.7 Output redirection: Appending to files using >>

You can append (add) output to the end of a file, instead of truncating it to empty, using a double right angle bracket “>>”:

$ echo first line  >file        # file is truncated using single >
$ echo second line >>file       # append second line to end of file using >>
$ echo third line  >>file       # append third line to end of file using >>
$ cat file                      # display what is in the file
first line
second line
third line

The redirection output file is always truncated (set to empty) before the command runs, unless you use the append syntax “>>” instead of “>”.
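Appending is most useful inside a loop, where each pass adds to what is already in the file. This is a minimal sketch, assuming a POSIX shell and mktemp; the file and variable names are made up:

```shell
# Sketch only: build a three-line file with one ">" followed by ">>" in a loop.
logfile=$(mktemp) || exit 1
trap 'rm -f "$logfile"' EXIT

echo "first line" >"$logfile"             # single > truncates, then writes
for word in second third; do
    echo "$word line" >>"$logfile"        # >> appends; earlier lines are kept
done

cat "$logfile"                            # display what is in the file
count=$(grep -c 'line' "$logfile")        # count the lines containing "line"
echo "lines in file: $count"
```

If the loop used a single > instead of >>, each pass would truncate the file and only the last line would survive.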

3.8 Summary of Standard Output redirection into files

Redirection is done by the shell, first, before it finds and runs the command. Things happen in this order on the command line:

  1. All redirection is found and done by the shell, no matter where the redirection is typed in the command line. All redirection output files are created or truncated to empty (except when appending). This redirection and truncation happens even if no command executes. (If the redirection fails, the shell does not run any command.)

    The output file is created or truncated to zero size before the command runs.

  2. The shell removes all the redirection syntax from the command line. The command will have no idea that its output is being redirected.

  3. The command (if any) executes and may produce output. The shell executes the command after doing all the redirection.

  4. The output from the command (if any) happens, and it goes into the indicated redirection output file. This happens last. If the command produces no output, the output file will be empty. Adding redirection never creates output. Standard Error Output (error messages) does not get redirected; it goes onto your screen.
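The note in step 1 above, that a failed redirection means no command is run, can be observed with a command that has a visible side effect. This is a sketch only, assuming a POSIX shell and mktemp; the directory “nosuchdir” deliberately does not exist and “marker” is a made-up file name:

```shell
# Sketch only: a failed redirection stops the command from running at all.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# touch would create "marker", but the shell cannot create the output
# file inside the missing directory, so it never runs touch.
# The shell prints its own error message about the redirection.
touch "$workdir/marker" >"$workdir/nosuchdir/out" || true

if [ -e "$workdir/marker" ]; then ran=yes; else ran=no; fi
echo "touch ran: $ran"
```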

3.9 Exercises in shell Standard Output redirection

Explain this sequence of commands:

$ mkdir empty
$ cd empty
$ cp a b
cp: cannot stat 'a': No such file or directory
$ cp a b >a
$                # why is there no error message from cp this time? what is in file a ?

Explain this sequence of commands:

$ date >a
$ cat a
Wed Feb  8 03:01:21 EST 2012

$ cp a b
$ cat b
Wed Feb  8 03:01:21 EST 2012

$ cp a b >a
$ cat b
$                                 # why is file b empty? what is in file a ?

Explain this sequence of commands:

$ rm
rm: missing operand

$ touch file
$ rm >file
rm: missing operand               # why doesn't rm remove "file"?

$ rm nosuchfile
rm: cannot remove 'nosuchfile': No such file or directory

$ rm nosuchfile >nosuchfile
$                                 # why is there no rm error message here?

Is the file nosuchfile in existence after the last command, above?

How many words are in each of these five output files?

$ echo one two three >out1
$ echo one two >out2 three
$ echo one >out3 two three
$ echo >out4 one two three
$ >out5 echo one two three

What is in each file a, b, c, after this command line?

$ echo hi >a >b >c

4 Standard Output (“stdout”) and Standard Error (“stderr”)

Most commands have two separate output “streams” or “units”, numbered 1 and 2:

  1. Standard Output or stdout: the normal output of a command (stream 1)
  2. Standard Error Output or stderr: the error and warning output (stream 2)

The stdout (stream 1) and stderr (stream 2) outputs mix together on your screen. They look the same on the screen, so you can’t tell by looking at your screen what comes out of a program on stdout and what comes out of a program on stderr.

To show a simple example of stdout and stderr both appearing on your screen, use the ls command and give it one file name that exists and one name that does not exist (and thus causes an error message to be displayed on standard error output):

$ ls /etc/passwd nosuchfile                 # no redirection used
ls: nosuchfile: No such file or directory   # this on screen from stderr
/etc/passwd                                 # this on screen from stdout

Both output streams look the same on your screen. The stderr (error message) output often appears first, before stdout, due to internal I/O buffers used by commands for stdout.

The default type of output redirection (whether redirecting to files or to programs using pipes) redirects only standard output and lets standard error go, untouched, to your screen:

$ ls /etc/passwd nosuchfile >out            # shell redirects only stdout
ls: nosuchfile: No such file or directory   # only stderr appears on screen
$ cat out
/etc/passwd                                 # stdout went into the file

Programming information (for programmers only):

The Standard Output is the Unit 1 output from printf and cout statements in C and C++ programs, and from System.out.print and System.out.println in Java.

The Standard Error Output is the Unit 2 output from fprintf(stderr, ...) and cerr statements in C and C++ programs, and from System.err.print and System.err.println in Java.
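A shell script can write to either unit as well. The sketch below assumes a POSIX shell and mktemp; report is a made-up function, and the syntax “>&2” (not otherwise covered in these notes) means “send unit 1 of this one command to the same place as unit 2”:

```shell
# Sketch only: "report" is a hypothetical function that writes one
# line to each output unit.
report() {
    echo "normal output"              # written to stdout (unit 1)
    echo "error message" >&2          # this echo's stdout is pointed at stderr (unit 2)
}

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# Split the two units into two different files.
report >"$workdir/out" 2>"$workdir/errors"

out=$(cat "$workdir/out")
errors=$(cat "$workdir/errors")
echo "unit 1 file holds: $out"
echo "unit 2 file holds: $errors"
```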

4.1 Redirecting Standard Error Output using 2>outfile

Normally, both stdout and stderr appear together on your screen, and redirection only redirects stdout and not stderr.

You can redirect stdout and stderr separately into files using a unit number immediately before the right angle-bracket ‘>’ meta-character:

  1. Standard Output (stdout) is unit 1 – redirect it using >outfile or 1>outfile
  2. Standard Error Output (stderr) is unit 2 – redirect it using 2>outfile

Put the unit number immediately (no blank) before the ‘>’ meta-character to redirect just that output stream:

$ ls /etc/passwd nosuchfile 2>errors        # shell redirects only stderr (unit 2)
/etc/passwd                                 # only stdout (unit 1) appears on screen
$ cat errors
ls: nosuchfile: No such file or directory   # stderr unit 2 went into file "errors"

If you don’t use a unit number before the right angle-bracket >, the default is to assume Unit 1 and redirect just standard output. The default output redirection syntax >foo (no preceding unit number) is a shell shorthand for 1>foo so that >foo and 1>foo are identical.

You can redirect stdout (unit 1) into one file and stderr (unit 2) into another file using two redirections on the same command line:

$ ls /etc/passwd nosuchfile >out 2>errors   # shell redirects each one
$                                           # nothing appears on screen
$ cat out
/etc/passwd                                 # stdout unit 1 went into "out"
$ cat errors
ls: nosuchfile: No such file or directory   # stderr unit 2 went into "errors"

Always use different output file names if you redirect both units. Do not redirect both units into the same output file name; the two outputs will overwrite each other. See the next section.

4.2 Redirecting both stdout and stderr using 2>&1

You need a special syntax “2>&1” to redirect both stdout and stderr safely together into a single file. Read the syntax “2>&1” as “send unit 2 to the same place as unit 1”:

$ ls /etc/passwd nosuchfile >both 2>&1      # redirect both into same file
$                                           # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

The order of the redirections >both and 2>&1 on the command line matters! The stdout redirect “>both” (unit 1) must come first (to the left of) the stderr redirect “2>&1” (unit 2) because you must set where unit 1 goes before you send unit 2 to go “to the same place as unit 1”. Don’t reverse these! Remember: 1 comes before 2.
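To see what reversing the order does, compare the two forms side by side. This is a sketch under the assumption of a POSIX shell and mktemp; both_streams is a made-up function that writes one line to each unit:

```shell
# Sketch only: "both_streams" is a hypothetical function that writes
# one line to stdout and one line to stderr.
both_streams() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# Correct order: unit 1 is sent to the file, then unit 2 follows it there.
both_streams >"$workdir/right" 2>&1

# Reversed order: unit 2 is sent to where unit 1 points at that moment
# (not the file), and only afterward is unit 1 sent to the file.
both_streams 2>&1 >"$workdir/reversed"

right=$(grep -c 'to std' "$workdir/right")
reversed=$(grep -c 'to std' "$workdir/reversed")
echo "lines captured with >file 2>&1: $right"
echo "lines captured with 2>&1 >file: $reversed"
```

In the reversed form the “to stderr” line escapes the file and appears wherever standard output was going before the redirection.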

You must use the special syntax “>both 2>&1” to put both stdout and stderr into the same file. Don’t use the following, which is not the same:

$ ls /etc/passwd nosuchfile >wrong 2>wrong  # WRONG! DO NOT DO THIS!
$ cat wrong
/etc/passwd
ccess nosuchfile: No such file or directory

The WRONG example above shows how stderr and stdout overwrite each other and the result is a mangled output file; don’t do this. Use 2>&1 to send stderr into the same file as stdout.

The modern Bourne shells now have a special shorter syntax for redirecting both stdout and stderr into the same output file:

$ ls /etc/passwd nosuchfile &>both          # redirect both into same file
$                                           # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

You can now use either “&>both” or “>both 2>&1”, but only the latter works in every version of the Bourne shell (back to the 1970s!). When writing shell scripts, use the “>both 2>&1” version for maximum portability. Don’t rely on &>both working everywhere.

5 Output redirection mistakes to avoid

The most common redirection mistake is to use a redirection output file as a command argument input file. There is an obvious way to get this wrong, and a hidden way to get this wrong:

5.1 Obvious misuse of redirection output file as input file: sort out >out

Suppose you want to sort a file and put the sorted output back into the same file. (The sort command is used as the example program below – anything that reads the content of a file and produces output is at risk.) This is the WRONG way to do it:

$ cp /etc/passwd out
$ sort out >out    # WRONG! Redirection output file is used as sort input file!
$ cat out
$                   # File is empty!

Here is the problem with the sort out >out command above:

  1. The shell first finds the output redirection on the command line and truncates (makes empty) the file “out” and gets it ready to receive the standard output of the command being run:
    • Redirection is always done first by the shell, before running the command.
    • The output file (which is also the input file) is now empty.
    • The original contents of “out” are lost – truncated – GONE! – before the shell even goes looking for the sort command to run!
  2. The shell finds and runs the sort command with its one file name argument “out” that is now an empty file.
  3. The sort command opens the empty argument file “out” for reading. Sorting an empty file produces no output.
  4. Standard output of the command has been redirected by the shell to appear in file “out”, so the “no output” goes into file out; the file remains empty.

Result: File “out” is always empty, no matter what was in it before.

There are two safe and correct ways to do this, one of which depends on a special output file feature of the sort command (that may not be available in other commands):

$ sort out >tmp  &&  mv tmp out   # sort into tmp file and rename tmp to out
$ sort -o out out                 # use special sort output file option
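In a script, the temporary-file method is often written with mktemp so that the temporary name cannot collide with an existing file. This is a minimal sketch, assuming a POSIX shell and a mktemp that accepts a template argument; the sample data and variable names are made up:

```shell
# Sketch only: sort a file "in place" by going through a temporary file.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

printf 'pear\napple\nmango\n' >"$workdir/out"     # made-up sample data

# Sort into a temporary file in the same directory, then rename it
# over the original only if sort succeeded.
tmpfile=$(mktemp "$workdir/sort.XXXXXX") || exit 1
sort "$workdir/out" >"$tmpfile" && mv "$tmpfile" "$workdir/out"

first=$(head -n 1 "$workdir/out")
echo "first line after sorting: $first"
```

One thing to be aware of with this approach: the renamed file keeps the restrictive permissions that mktemp gave the temporary file, so you may need to adjust them afterward.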

Here is another incorrect example that uses the same redirection output file as an input file. The result is wrong but is not an empty file this time:

$ date >out
$ wc out         # count the lines, words, and characters in file "out"
1 6 29 out
$ wc out >out    # WRONG! Redirection output file is used as input file!
$ cat out
0 0 0 out        # Using wc on an empty file produces zeroes!

Here is the problem with the wc out >out command above:

  1. The shell first finds the output redirection on the command line and truncates (makes empty) the file “out” and gets it ready to receive the standard output of the command being run:
    • Redirection is always done first by the shell, before running the command.
    • The output file (which is also the input file) is now empty.
    • The original contents of “out” are lost – truncated – GONE! – before the shell even goes looking for the wc command to run!
  2. The shell finds and runs the wc command with its one file name argument “out” that is now an empty file.
  3. The wc command opens the empty argument file “out” for reading. It counts the lines, words, and characters of an empty file and produces one line of output: 0 0 0 out
  4. Standard output of the command has been redirected by the shell to appear in file “out”, so the one line of output goes into file out. The file shows all zeroes, not the word count of the original date.

Result: File “out” always shows zeroes, not the count of the original content.

Here is the only safe and correct way to do this with wc:

$ wc out >tmp  &&  mv tmp out    # do output redirection into tmp file and move it

Other incorrect redirection examples that DO NOT WORK because the redirection output file is being used as an input file:

$ head file >file           # ALWAYS creates an EMPTY FILE
$ tail file >file           # ALWAYS creates an EMPTY FILE
$ uniq file >file           # ALWAYS creates an EMPTY FILE
$ cat  file >file           # ALWAYS creates an EMPTY FILE
$ sort file >file           # ALWAYS creates an EMPTY FILE
$ fgrep 'foo' file >file    # ALWAYS creates an EMPTY FILE
$ wc   file >file           # ALWAYS counts an EMPTY FILE (0 0 0)
$ sum  file >file           # ALWAYS checksums an EMPTY FILE (0)
...etc...

Do not use a redirection output file as an input to a program or a pipeline! Never use the same file name for both input and redirection output – the shell will truncate the file before the command reads it.

5.2 Hidden misuse of redirection output file as GLOB input file

The hidden way to have a redirection output file used as an input file is to have the input file name hidden in a shell GLOB wildcard expansion.

Suppose you want to number the lines in a bunch of files and put the numbered output into a file in the same directory. (The nl command is used as the example program below – anything that reads the content of a file and produces output is at risk.) This is the WRONG way to do it:

$ cp /etc/passwd bar   # create a file larger than a disk block
$ touch foo
$ nl * >foo  # WRONG! GLOB * input files match redirection output file!
^C           # interrupt this command immediately before your disk is full!
$ ls -l
-rw-rw-r--  1 idallen idallen    194172 Feb 15 05:19 bar
-rw-r--r--  1 idallen idallen 289808384 Feb 16 05:20 foo  # HUGE FILE!

Here is what happens to make the output file “foo” grow forever when you type nl * >foo:

  1. The shell first expands the GLOB “*” to match all the pathnames in the current directory, that includes the “bar” and “foo” names: nl bar foo >foo
  2. The shell truncates output redirection file foo and gets file foo ready to receive all the stdout of the command as it runs.
  3. The shell finds and runs the nl command, giving it all the GLOB file name arguments including the names “bar” and “foo”: nl bar foo
  4. The nl command opens the first input file (from the GLOB expansion) named “bar” and sends its numbered output to stdout, which means into the redirection output file foo.
  5. The nl command next opens the next input file (from the GLOB expansion) named “foo” and starts reading lines from the top of the file, numbering them, and writing the numbered output to stdout, which is the bottom of the same file. The output file is the same as the input file and the nl command is reading lines from the same file into which it is writing lines. This never finishes, and the file “foo” grows until all the disk space is used.

Result: An infinite loop by nl reading and writing the same file. Eventually the disk drive fills up as “foo” gets bigger and bigger.

Fix #1: Use a hidden output file name that the GLOB pattern doesn’t match, so that it is never picked up as an input file:

$ nl * >.z

Fix #2: Use an output redirection file in some other directory not matched by the shell GLOB pattern:

$ nl * >../z
$ nl * >/tmp/z

Do not use a wildcard/GLOB file pattern that picks up the name of the output redirection file and causes it to become an unintended input file.
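In a script, Fix #2 can be done by creating the output file with mktemp, which places it outside the directory being matched. This is only a sketch, assuming a POSIX shell, mktemp, and nl; the input file names and contents are made up:

```shell
# Sketch only: number every file in a directory while keeping the
# output file out of reach of the "*" pattern.
workdir=$(mktemp -d) || exit 1
outfile=$(mktemp) || exit 1           # created outside "$workdir"
trap 'rm -rf "$workdir" "$outfile"' EXIT

printf 'one\ntwo\n' >"$workdir/bar"   # made-up input files
printf 'three\n'    >"$workdir/baz"

# The parentheses run cd in a subshell so the rest of the script
# stays in its original directory.
( cd "$workdir" && nl * >"$outfile" )

numbered=$(grep -c '' "$outfile")     # count every line in the output
echo "numbered lines: $numbered"
```

Because the output file is not in the directory where the pattern is expanded, it cannot become one of its own input files.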

6 Summary: Four Rules for Output Redirection

Here are the four rules for Output Redirection again:

  1. Redirection is done first, before running the command.
  2. You can only redirect output that you can (could) see.
  3. Redirection goes only to one place.
  4. Only standard output is redirected, by default.

Never use a redirection output file as an input file!

7 Input Redirection – Standard Input

Many Unix/Linux commands read input from files, if file pathnames are given on the command line. If no file names are given, these commands usually read from what is called Standard Input (“stdin”), which is usually connected to your keyboard. (You can send EOF by typing ^D (Ctrl-D) to get the command to stop reading your keyboard.)

Here is an example of the nl command reading from a file, then reading from stdin (your keyboard) when no files are supplied:

$ nl /etc/passwd      # nl reads content from the file /etc/passwd
[...many lines print here, with line numbers...]
$

$ nl                   # no files; nl reads standard input (your keyboard)
foo                    # you type this line and push ENTER
    1  foo             # this is the line as numbered and output by nl
bar                    # you type this line and push ENTER
    2  bar             # this is the line as numbered and output by nl
^D                     # you signal keyboard EOF by typing ^D (CTRL-D)
$

Examples of commands that may read from pathnames or, if not given any pathnames, from standard input:

less, more, cat, head, tail, sort, wc, grep, fgrep, nl, uniq, etc.

Commands such as the above may read standard input. They will read standard input (which may be your keyboard) only if there are no pathnames to read on the command line:

$ cat foo       # cat opens and reads file "foo"; cat completely ignores stdin
$ cat           # cat opens and reads standard input = your keyboard; use ^D for EOF

$ tail foo      # tail opens and reads "foo"; tail completely ignores stdin
$ tail          # tail opens and reads standard input = your keyboard; use ^D for EOF

$ wc foo        # wc opens and reads file "foo"; wc completely ignores stdin
$ wc            # wc opens and reads standard input = your keyboard; use ^D for EOF

The above is true for all commands that can read from stdin. They only read from stdin if there are no pathnames given on the command line.

To tell a command to stop reading your keyboard, send it an EOF (End-Of-File) indication, usually by typing ^D (Control-D). If you interrupt the command (e.g. by typing ^C), you usually kill the command and the command may not produce any output at all.

7.1 Not all commands read standard input

Not all commands read from standard input, because not all commands read data from files supplied on the command line. Examples of common Unix/Linux commands that don’t read any data from files or standard input:

 ls, date, who, pwd, echo, cd, hostname, ps, sleep   # etc. NEVER READ DATA from STDIN

All the above commands have in common the fact that they never open any files for reading on the command line. If a command never reads any data from any files, it will never read any data from standard input, and it will never read data from your keyboard or anywhere else.

The Unix/Linux copy command cp obviously reads content from files, but it never reads file data from standard input because, as written, it always has to have both a source and destination pathname argument. The cp command must always have an input file name. It never reads file data from standard input.

7.2 Redirection of standard input from a file: <file

The shell meta-character left angle bracket ‘<’ signals that the next word on the command line is an input file (not a program) whose content the shell should make available to a command on standard input. Standard Input is the place that many commands read when they don’t have any pathnames to open. The command may or may not actually read the input made available by the shell; the shell can’t know that.

Using the shell meta-character ‘<’ to do input redirection, the shell changes from where standard input comes for a command, so that it doesn’t come from your keyboard but instead comes from the specified input file.

$ nl                   # no files; nl reads standard input (your keyboard)
foo                    # you type this line and push ENTER
    1  foo             # this is the line as numbered and output by nl
^D                     # you signal keyboard EOF by typing ^D (CTRL-D)
$

$ nl </etc/passwd      # no files; nl reads from standard input (/etc/passwd)
[...many lines print here, with line numbers...]
$
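The same input redirection can be tried in a small script. This is only a sketch: the scratch directory and the file name fruit.txt are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch file; the name "fruit.txt" is an assumption for this demo.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' >"$workdir/fruit.txt"

# nl has no pathname arguments, so it reads standard input.
# The shell opens fruit.txt and attaches it to the standard input of nl.
numbered=$(nl <"$workdir/fruit.txt")
printf '%s\n' "$numbered"

rm -rf "$workdir"
```

The nl command never sees the name fruit.txt; it only sees the lines arriving on its standard input.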

You can only usefully use standard input redirection on a command that would otherwise read your keyboard. If the command doesn’t read your keyboard (standard input) without the redirection, adding the redirection does nothing and is ignored. The redirection only works if, without redirection, the command would read your keyboard.

If (and only if!) a command reads from standard input, the redirected standard input will cause the program to read from whatever file the shell attaches to standard input. Here are examples using the shell to attach files to commands that are all reading standard input:

$ cat file              # reads from file "file"
$ cat                   # reads from stdin (from your keyboard)
$ cat <file             # reads from stdin that is now from file "file"
$ cat file <bar         # reads from file "file" and ignores stdin file "bar"

$ head file             # reads from file "file"
$ head                  # reads from stdin (from your keyboard)
$ head <file            # reads from stdin that is now from file "file"
$ head file <bar        # reads from file "file" and ignores stdin file "bar"

$ sort file             # reads from file "file"
$ sort                  # reads from stdin (from your keyboard)
$ sort <file            # reads from stdin that is now from file "file"
$ sort file <bar        # reads from file "file" and ignores stdin file "bar"

The above is true for all commands that can read from stdin. They only read from stdin if there are no pathnames given on the command line.

The shell does not know which commands will actually read input from standard input; you can attach a file on standard input to any command. A command that ignores standard input will ignore the attached file.

If a command is not reading from standard input, redirecting input into the command will be ignored and do nothing. The shell cannot force a command to read any data from standard input.

For example, the date command and the sleep command never read any data from standard input, and you can’t force them to do so by adding redirection. The redirection is just ignored:

$ date                            # date never reads stdin
Thu Feb 16 05:48:13 EST 2012
$ date <file                      # date never reads stdin and ignores <file
Thu Feb 16 05:48:15 EST 2012

$ sleep 10                        # sleep never reads stdin
$ sleep 10 <file                  # sleep never reads stdin; ignores <file
$ sleep    <file                  # sleep never reads stdin; ignores <file
sleep: too few arguments

Many other common commands never read standard input, and so adding input redirection to these commands does nothing useful:

$ ls -l /bin                      # show pathnames under /bin
$ ls -l /bin <input               # no difference; ls never reads stdin

$ cd /bin                         # change to the /bin directory
$ cd /bin <input                  # no difference; cd never reads stdin

$ cp foo bar                      # cp reads data from foo and writes to bar
$ cp foo bar <file                # no difference; cp never reads stdin

Commands have to want to read stdin. The shell can’t force it.
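As a rough check of this, the sketch below runs ls twice on a scratch directory, once with input redirection and once without, and compares the two outputs. The directory and file names are made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/b"
printf 'ignored input\n' >"$workdir/input"     # hypothetical input file

plain=$(ls "$workdir")
redirected=$(ls "$workdir" <"$workdir/input")  # ls never reads stdin

# The attached file makes no difference to what ls prints.
if [ "$plain" = "$redirected" ]; then
    verdict="same output"
else
    verdict="different output"
fi
echo "$verdict"

rm -rf "$workdir"
```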

7.3 Commands ignore standard input if they are given file names to read

Commands that take pathname arguments do not read standard input if any pathnames are present on the command line. If supplied with pathname arguments, the commands always read the pathnames and ignore stdin.

Here are more examples that DO NOT WORK as input redirection because the command was not reading from standard input when redirection was added. The following command lines all ignore standard input, because all the commands have been given file name arguments to read instead:

$ cat  file1 <file2       # cat  reads from "file1", ignores stdin <file2
$ sort file1 <file2       # sort reads from "file1", ignores stdin <file2
$ head file1 <file2       # head reads from "file1", ignores stdin <file2
$ tail file1 <file2       # tail reads from "file1", ignores stdin <file2

The above is true for all commands that can read from stdin. They only read from stdin if there are no pathnames given on the command line.

If there are pathname arguments on the command line, stdin is not used. In all the above incorrect examples, the shell will open the file file2 and attach it and make it ready on stdin for the command to read; the command itself will ignore stdin and read from the file1 pathname argument supplied on the command line. Attaching the input redirection <file2 on standard input is ignored because the command is reading from the pathname argument.

Commands do not normally read both pathname arguments and standard input; it’s one or the other, and command pathname arguments are always used instead of stdin.
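A minimal sketch of this rule, using two made-up scratch files named file1 and file2: with a pathname argument, cat shows only file1; without one, it shows whatever the shell attached to standard input.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'from file1\n' >"$workdir/file1"
printf 'from file2\n' >"$workdir/file2"

# Pathname argument present: cat reads file1 and ignores stdin (file2).
with_argument=$(cat "$workdir/file1" <"$workdir/file2")

# No pathname argument: cat reads stdin, which the shell attached to file2.
stdin_only=$(cat <"$workdir/file2")

echo "cat file1 <file2 prints: $with_argument"
echo "cat <file2       prints: $stdin_only"

rm -rf "$workdir"
```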

7.4 Syntax and input redirection: wc file vs. wc <file

If a file can be supplied as a command line pathname or attached to a command via standard input, what is the difference? Below are the differences between “wc file” and “wc <file”:

We assume we have put the current date and time into a file:

$ date >file

7.4.1 wc file

$ wc file
1  6 29 file
  • there is no redirection syntax used here; the shell will not open any files
  • the wc command has a pathname argument, which means it ignores stdin
  • the wc command reads data from the file file that it opens itself
  • the wc command is the program that is opening the file argument file, not the shell
  • any errors will come from the wc command, not the shell, and will mention the file name given on the command line, e.g.:
$ wc /etc/shadow
wc: /etc/shadow: Permission denied

Note how it is the wc program that issues the error message, above.

7.4.2 wc <file

Rather than giving the command a pathname argument, you might instead redirect input to the command using shell input redirection:

$ wc <file
1  6 29
  • there is redirection syntax used; the shell is performing standard input redirection from file file, which means standard input for the wc program will come from the file named file
  • the wc command has no pathname arguments, which means it will read from standard input, opened by the shell
  • the shell is the program that is opening the file file, not the wc command
  • because wc has no file name, it can’t print a file name in the output
  • any errors will come from the shell, not from the wc command, and the shell will be the one mentioning the file name, e.g.:
$ wc </etc/shadow
-bash: /etc/shadow: Permission denied

Note how it is the bash shell that issues the error message, above. Because the shell cannot open the file, it will not even look for or run the wc program. Redirection I/O errors mean that no command will be run.
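The difference in who reports the error can be seen in a short script. This sketch uses a made-up pathname that does not exist and captures each error message in a variable; the exact wording of the messages varies between shells and systems.

```shell
#!/bin/sh
workdir=$(mktemp -d)
missing="$workdir/nosuchfile"   # hypothetical pathname that does not exist

# Pathname argument: wc tries to open the file itself and reports the error.
command_error=$(wc "$missing" 2>&1) || wc_status=$?

# Input redirection: the shell tries to open the file first and fails,
# so the message comes from the shell and wc is never started.
shell_error=$( { wc <"$missing"; } 2>&1 ) || redirect_status=$?

# A failed redirection means the command is not run at all: echo prints nothing.
never_ran=$( { echo "echo was run" <"$missing"; } 2>/dev/null ) || true

echo "from wc:     $command_error"
echo "from shell:  $shell_error"
echo "echo output: '$never_ran'"

rm -rf "$workdir"
```

The first message begins with the command name wc; the second begins with the name of the shell or script instead.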

7.4.3 Commands displaying pathnames: wc file vs. wc <file

For commands that display their input pathnames in their output, the difference between giving a pathname on the command line or using stdin is more significant. Normally, the pathname is passed to the command, so the command knows the pathname and prints the name in the output:

$ wc -l /etc/passwd
44 /etc/passwd
  • there is no redirection syntax used; the shell will not open any files
  • wc was passed the file name /etc/passwd as a command line pathname argument and so wc has to open the file itself and knows its name
  • the wc command knows the file name, so it prints the name in the output

If no pathnames are supplied on the command line and all the data comes from standard input, there is no pathname available to the command to indicate in the output:

$ wc -l </etc/passwd
44
  • there is redirection syntax used; the shell is performing standard input redirection from file /etc/passwd, which means standard input for the wc command will come from the file /etc/passwd
  • the wc command has no pathname arguments, which means it will read from standard input, opened by the shell
  • the wc command has no pathname arguments, which means it does not know the name of the file it is reading from stdin
  • the shell is the program that is opening the file /etc/passwd, not the wc command
  • the wc command doesn’t know the file name; only the shell knows the name
  • wc does not print any file name; it wasn’t given any file name
  • wc cannot know the file name, since it didn’t open the file

The above input redirection trick can be useful to get just the number of lines in a file, without also getting the file name as well:

$ echo "The number of lines is:" ; wc -l /etc/passwd
The number of lines is:
44 /etc/passwd                     # wrong - "44 /etc/passwd" is not a number

$ echo "The number of lines is:" ; wc -l </etc/passwd
The number of lines is:
44                                 # correct - just the number, no name
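In a script, the same trick is a common way to put just the line count into a variable. A minimal sketch, assuming a made-up three-line file named sample.txt:

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' >"$workdir/sample.txt"

with_name=$(wc -l "$workdir/sample.txt")     # count followed by the pathname
count_only=$(wc -l <"$workdir/sample.txt")   # count only; wc never sees a name

echo "wc -l file  gives: $with_name"
echo "wc -l <file gives: $count_only"

# Arithmetic expansion strips any padding some versions of wc add.
echo "The number of lines is: $((count_only))"

rm -rf "$workdir"
```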

7.5 Don’t use redirection output file as redirection input file

You already know that using an output redirection file as an input file name argument doesn’t work because the file is truncated by the output redirection. The same is true if you use the output redirection file name as an input redirection file name. Don’t do it:

$ cat  <myfile >myfile             # WRONG! myfile is truncated empty!
$ sort <myfile >myfile             # WRONG! myfile is truncated empty!
$ head <myfile >myfile             # WRONG! myfile is truncated empty!
$ tr a b <myfile >myfile           # WRONG! myfile is truncated empty!

Given the above, why is myfile not left empty in the following case?

$ wc <myfile >myfile               # WRONG! myfile is truncated empty!
$ cat myfile                       # What is in the file "myfile" now?

Hint: What happens when wc counts nothing? Is there no output?
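One common way around the truncation problem is to send the output to a different file and then rename that file over the original. The sketch below shows both the failure and the workaround with sort; the file names broken, safe, and safe.tmp are made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' >"$workdir/broken"
cp "$workdir/broken" "$workdir/safe"

# WRONG: the shell truncates "broken" before sort starts, so sort reads nothing.
sort <"$workdir/broken" >"$workdir/broken"
broken_size=$(wc -c <"$workdir/broken")

# Safer: write to a different file, then rename it over the original.
sort <"$workdir/safe" >"$workdir/safe.tmp" && mv "$workdir/safe.tmp" "$workdir/safe"
safe_contents=$(cat "$workdir/safe")

echo "bytes left in broken: $((broken_size))"
printf '%s\n' "$safe_contents"

rm -rf "$workdir"
```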

8 Redirection into programs using | (pipes)

Since the shell can redirect both the output of programs and the input of programs, it can connect (redirect) the output of one program directly into the input of another program without using any files in between. This output-to-input redirection is called piping and uses the “pipe” meta-character ‘|’ that is usually located above the backslash key ‘\’ on your keyboard. Using it looks like this:

$ date
Mon Feb 27 06:37:52 EST 2012
$ date | wc                       # wc counts the output of "date"
1   6   29

8.1 Three Rules for Pipes

Here are three major rules that apply to useful pipes:

  1. Pipe redirection is done by the shell, first, before file redirection.
  2. The command on the left of the pipe must produce some standard output.
  3. The command on the right of the pipe must want to read standard input.

8.2 Using the pipe meta-character | between commands

The shell meta-character | (“pipe”) is similar to semicolon ; in that it signals the start of another command on the command line. The pipe is different because the standard output (only stdout; not stderr) of the command on the immediate left of the pipe | is attached/connected (“piped”) to the standard input of the command on the immediate right of the pipe:

$ date
Mon Feb 27 06:37:52 EST 2012
$ date | wc                       # wc counts the output of "date"
1   6   29

$ echo hi
hi
$ echo hi | wc                    # wc counts the output of "echo hi"
1   1   3

(Note that the invisible newline character at the end of a line is also counted by wc in the above example.)

It is the shell that is redirecting the standard output of the command on the left into the standard input of the command on the right. As with all redirection, the shell does this redirection before it finds and runs any commands. The commands themselves do not see the redirection.
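A short sketch of the same pipe in a script, capturing each count in a variable. The variable names are chosen only for this illustration.

```shell
#!/bin/sh
# echo writes "hi" plus a newline to stdout; the pipe feeds that to the stdin of wc.
lines=$(echo hi | wc -l)
words=$(echo hi | wc -w)
chars=$(echo hi | wc -c)

# Arithmetic expansion strips any padding some versions of wc add.
echo "lines=$((lines)) words=$((words)) chars=$((chars))"
# prints: lines=1 words=1 chars=3
```

The wc command has no pathname arguments in any of these pipelines, so it reads only what arrives through the pipe.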

8.3 Piped output flows immediately without temporary files

You can approximate some of the behaviour of a pipe between two commands by using an intermediate file for intermediate storage of the output of the first command before using the second command to read that output:

$ nl /etc/passwd >out  # save all the first command's standard output in a file
$ head <out            # use the file as standard input for the second command
[...first ten line-numbered lines display here...]

If you use an intermediate file instead of a pipe, the first command has to finish and put all its output into the intermediate file before the shell can find and run the next command to read the file containing the output of the first command. This is true even if, as in the above example, the second command will only use the first few lines of output from the first command. Without using a pipe, the nl command has to line-number the entire password file before we can run the second command to see the first ten lines. Using a pipe, the output from nl flows into head until ten lines have been displayed, then both commands exit:

$ nl /etc/passwd | head         # use a pipe instead of a temporary file

If the first command takes a long time to run, using a temporary file means an unnecessary delay. Without using pipes:

$ find / -ls >out        # huge output of find has to finish first (slow)
$ less out               # now we can display the output of "find"

Using a pipe, the output from find can start to appear in less right away, before the find command has finished generating all the output:

$ find / -ls | less      # huge output of find goes directly into "less"

Pipes don’t need to wait for the first command to finish before the second command starts reading the output of the first. The output starts flowing immediately through the pipe because both commands are actually running simultaneously.

The pipe also requires no intermediate file to hold the output of the first command, and so as soon as the command on the left of the pipe starts producing standard output, it goes directly into the standard input of the command on the right.

If the command on the left of the pipe never finishes, the command on the right will read all the input that is currently available and then continue to wait for more input, processing it as soon as it appears.

If the command on the left of the pipe does finish, the command on the right sees an EOF (end-of-file) on the pipe (its standard input). As with EOF from a file, EOF usually means that the command on the right will finish processing, produce its last output, and exit.
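The simultaneous running of both commands can be seen with a command that never finishes on its own. The sketch below uses the yes command (an illustrative choice, not one used elsewhere in these notes), which prints the line "y" endlessly. Saving its output in a temporary file first would never complete, but through a pipe the head command reads three lines and exits, after which yes is stopped because nothing is reading the pipe any more.

```shell
#!/bin/sh
# "yes" prints "y" forever; head reads three lines from the pipe and exits.
# Both commands run at the same time, so no temporary file is needed.
first_three=$(yes | head -n 3)
printf '%s\n' "$first_three"
```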

8.4 Pipe-splitting happens before file redirection

As with the semicolon meta-character ;, the shell first recognizes the pipe characters and splits the command line into piped commands, before doing any file redirection. File redirection happens second (after pipe splitting) and, if present, takes precedence over pipe redirection. (The file redirection is done after pipe splitting, so it always wins, leaving nothing for the pipe.)

$ ls -l      | wc              # correct - output of ls goes into the pipe
2 11 57

$ ls -l >out | wc              # WRONG! - output of ls goes into the file
0 0 0                          # wc reads an empty pipe and outputs zeroes

This is why in the above pipe wc has no characters to count from ls:

  1. First, the shell splits the command line on the pipe, redirecting the output of the command on the left into the input of the command on the right, without knowing anything about what the commands might be.
  2. Next, the shell does the standard output file redirection on the ls command on the left of the pipe and changes the ls standard output away from the pipe into the file out.
  3. Finally, the shell finds and runs both commands simultaneously:
    • All the standard output from ls on the left goes into the file out; nothing is available to go into the pipe.
    • The wc command on the right of the pipe counts an empty input from the pipe and outputs zeroes: 0 0 0

Remember: Redirection can only go to one place, and file redirection always wins over pipes, because it is done after pipe splitting:

$ ls /bin >out                 # all output from ls goes into file "out"
$ ls /bin >out | wc            # WRONG! output goes into "out", not into pipe
0 0 0                          # wc counts an empty input from the pipe
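The effect can be checked in a script by counting lines on both sides. This is a sketch using a made-up scratch directory that holds two empty files named a and b.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/b"

# Correct: the output of ls goes into the pipe, so wc counts two names.
piped_lines=$(ls "$workdir" | wc -l)

# WRONG: the file redirection is applied after pipe splitting and takes the
# standard output of ls away from the pipe, so wc reads an empty pipe.
stolen_lines=$(ls "$workdir" >"$workdir/out" | wc -l)

# The names went into the file instead (a, b, and the new file "out" itself).
saved_lines=$(wc -l <"$workdir/out")

echo "through the pipe:         $((piped_lines))"
echo "through the pipe with >:  $((stolen_lines))"
echo "saved in the file:        $((saved_lines))"

rm -rf "$workdir"
```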

8.5 Pipes: You can only redirect what you can see

As with output redirection into files, you can only redirect into a pipe the standard output that you can see. Using redirection never creates output, even when using pipes:

$ ls /bin >out                 # all output from ls goes into file "out"
$ ls /bin >out | wc            # nothing goes into the pipe to "wc"
0 0 0                          # wc counts an empty input from the pipe

$ cp /etc/passwd x             # no output visible on standard output
$ cp /etc/passwd x | wc        # nothing goes into the pipe to "wc"
0 0 0                          # wc counts an empty input from the pipe

$ cd /tmp                      # no output visible on standard output
$ cd /tmp | wc                 # nothing goes into the pipe to "wc"
0 0 0                          # wc counts an empty input from the pipe

$ touch x ; rm x               # no output from rm on standard output
$ touch x ; rm x | wc          # nothing goes into the pipe to "wc"
0 0 0                          # wc counts an empty input from the pipe

You can only redirect output that you can see.

8.6 Redirecting stderr using 2>&1 with pipes

As with file redirection, pipes only redirect Standard Output (stdout) from commands, not Standard Error Output (stderr). Standard Error Output still goes directly to your screen; it does not go into a pipe:

$ ls /etc/passwd nosuchfile            # no redirection used
ls: cannot access nosuchfile: No such file or directory   # STDERR unit 2
/etc/passwd                                               # STDOUT unit 1

$ ls /etc/passwd nosuchfile | wc       # only stdout is redirected to "wc"
ls: cannot access nosuchfile: No such file or directory   # STDERR unit 2
1 1 12                                 # stdout went into the pipe to "wc"

You need the special syntax “2>&1” to redirect both stdout and stderr into a pipe. Recall that “2>&1” means “redirect standard error to go to the same place as standard output”, so if standard output is already going into a pipe (and remember pipe splitting happens first), “2>&1” will send standard error into the pipe too:

$ ls /etc/passwd nosuchfile 2>&1 | wc  # both stdout and stderr redirected
2 10 68                                # wc counts both lines from pipe

The “2>&1” above happens after pipe-splitting; it works because pipe-splitting happens first and Standard Output is already redirected into the pipe. It sends Standard Error to the same place, i.e. into the pipe.
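A minimal sketch of the difference, counting lines rather than characters so that the result does not depend on the exact wording of the error message. The file name exists and the missing name nosuchfile are made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/exists"

# Only stdout enters the pipe; the error message still appears on stderr.
stdout_only=$(ls "$workdir/exists" "$workdir/nosuchfile" | wc -l)

# With 2>&1, stderr follows stdout into the pipe, so wc counts both lines.
both=$(ls "$workdir/exists" "$workdir/nosuchfile" 2>&1 | wc -l)

echo "lines in pipe without 2>&1: $((stdout_only))"
echo "lines in pipe with 2>&1:    $((both))"

rm -rf "$workdir"
```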

8.7 Alternate BASH shell pipe syntax |& instead of 2>&1

Some shells (including the BASH shell) allow a “|&” pipe syntax to redirect both stderr and stdout into the pipe. These are equivalent in the BASH shell:

$ ls /etc/passwd nosuchfile 2>&1 |  wc  # both stdout and stderr redirected (all Bourne-style shells)
$ ls /etc/passwd nosuchfile      |& wc  # both stdout and stderr redirected (BASH and some other shells)

Not all shells recognize the “|&” pipe syntax. (The /bin/sh shell on Ubuntu systems does not!) Don’t use the |& syntax inside a shell script; use the standard “2>&1” instead that works with all Bourne-style shells.

8.8 Using commands without pathnames as filters in pipes

Many Unix/Linux commands can be made to act as filters in pipelines. A filter command has no file name arguments and doesn’t open any files itself. It reads its input lines from its standard input, which is usually connected to a pipe on its left, and writes its output to standard output, which is often connected to another pipe and filter command on its right.

With no file name arguments on the command line, filter commands read from standard input and write to standard output. The shell uses pipes to provide redirection for both standard input and standard output:

$ fgrep "/bin/sh" /etc/passwd | sort | head

The fgrep command above is reading from the filename argument /etc/passwd given on the command line. The output of the fgrep command always goes to standard output, which in the above command pipeline means the output goes into the pipe, not onto the screen.

The sort and head commands above have no file name arguments to read. Without file name arguments, each of the commands reads from its standard input, which is set up to be from the pipes created by the shell.

Both sort and head have no file name arguments and are acting as filter commands. (The fgrep command is technically not a filter – it is reading from the supplied pathname argument, not from standard input.)

Lines of input are sent through a pipe into the standard input of a filter command (such as sort and head, above). The filter command reads the lines from the pipe, filters them in some way, and sends the result into another pipe (or perhaps onto your screen, or into an output file with redirection, if the command is the last one in the pipeline).

Filter commands read from standard input (not from a file name) and they write to standard output.
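The pipeline above can be tried on a small made-up file. In this sketch the account lines are invented (they are not real password data), and grep -F is used in place of fgrep; the two behave the same way.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Made-up account lines in /etc/passwd format; not real data.
cat >"$workdir/passwd.sample" <<'EOF'
zoe:x:1003:1003::/home/zoe:/bin/sh
bob:x:1001:1001::/home/bob:/bin/bash
amy:x:1002:1002::/home/amy:/bin/sh
cal:x:1004:1004::/home/cal:/bin/sh
EOF

# grep -F reads the pathname argument; sort and head have no pathname
# arguments, so they act as filters reading from the pipes.
result=$(grep -F '/bin/sh' "$workdir/passwd.sample" | sort | head -n 2)
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the amy and cal lines are displayed: the bob line is not selected, and the zoe line sorts third and is cut off by head.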

8.9 Using successive filters in pipes

You can only redirect what you can see, so if you use a command to select some lines from a file and then send those lines into a second filter command via a pipe, remember that it is only the selected lines that are being read by that second filter command, not the original file.

Filter commands in pipelines read their input from other commands’ output through pipes; they don’t read directly from files.

Below is an example that shows how a second fgrep in a pipeline searches for its text pattern in the output of the first fgrep, not in the original file.

In the example below, looking for the word mail in the file /etc/services finds five lines. Looking for the word file in the file /etc/services also finds five lines, but they are a different five lines. There are no lines in that file with both words in them:

$ fgrep 'mail' /etc/services
smtp            25/tcp          mail
re-mail-ck      50/tcp                  # Remote Mail Checking Protocol
re-mail-ck      50/udp
mailq           174/tcp                 # Mailer transport queue for Zmailer
mailq           174/udp

$ fgrep 'file' /etc/services
remotefs        556/tcp         rfs_server rfs  # Brunhoff remote filesystem
afs3-fileserver 7000/tcp        bbs             # file server itself
afs3-fileserver 7000/udp        bbs
supfilesrv      871/tcp                         # SUP server
supfiledbg      1127/tcp                        # SUP debugging

$ fgrep 'file' /etc/services | fgrep 'mail'     # pipeline gives NO OUTPUT !!!
$ fgrep 'mail' /etc/services | fgrep 'file'     # pipeline gives NO OUTPUT !!!

The two fgrep pipeline command lines at the end of the above example give no output, because none of the lines that contain the text string file also contain the text string mail, and vice-versa.

In each example pipeline above, the second fgrep is searching for its pattern in the output of the first fgrep, and the second pattern is not in any of the lines output by the first fgrep.

A line in the file would have to contain both text strings mail and file to pass through both fgrep commands in the pipe. The first fgrep selects lines with one text string and then the second fgrep reads the output of the first fgrep and looks for the second text string. Lines must contain both strings to be output.

No lines contain both strings in the example. There is no output.

If we change the second fgrep in the pipeline to select a word that is in the output of the first fgrep, it finds a line to output:

$ fgrep 'mail' /etc/services | fgrep 'Remote'
re-mail-ck      50/tcp                  # Remote Mail Checking Protocol

$ fgrep 'Remote' /etc/services | fgrep 'mail'
re-mail-ck      50/tcp                  # Remote Mail Checking Protocol

The output line is the only line from /etc/services that contains both the word mail and the word Remote in it. It doesn’t matter which word you search for first; the order of the searches doesn’t matter. In both cases, the output is the only line that has both words in it.

Successive filter commands can be used to select lines that contain multiple strings in a line.
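Here is a sketch of the same idea on a small made-up file written in the style of /etc/services (it is not the real file). It uses grep -F, which behaves the same way as fgrep, and shows that the order of the two searches does not change the result.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Made-up lines in the style of /etc/services; not the real file.
cat >"$workdir/services.sample" <<'EOF'
smtp        25/tcp    mail
re-mail-ck  50/tcp    # Remote Mail Checking Protocol
remotefs    556/tcp   # Brunhoff remote filesystem
mailq       174/tcp   # Mailer transport queue
EOF

# Each second grep searches only the lines output by the first grep.
mail_then_remote=$(grep -F 'mail' "$workdir/services.sample" | grep -F 'Remote')
remote_then_mail=$(grep -F 'Remote' "$workdir/services.sample" | grep -F 'mail')

# No line contains both "mail" and "file", so nothing reaches wc.
mail_and_file=$(grep -F 'mail' "$workdir/services.sample" | grep -F 'file' | wc -l)

printf '%s\n' "$mail_then_remote"
printf '%s\n' "$remote_then_mail"
echo "lines with both mail and file: $((mail_and_file))"

rm -rf "$workdir"
```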

8.9.1 Example 1: Count ssh break-in attempts in January

We are asked to count the number of times the machine rejected an SSH break-in attempt in the month of January. Here is a practical example showing the use of a filter command that reads from standard input and writes to standard output.

We need to look for lines in the system log file auth.log that contain both the string 'refused connect' and the date string for January.

Here is a sample auth.log input file that we will use in the following example (484 lines): auth.log. This sample file was taken from an actual /var/log/auth.log file.

First, we need to extract from the log file only the lines that indicate a rejected break-in attempt. Since there could be thousands of lines of output in a real system log file, we always pipe the large output into the head command, which limits the output on our screen to only ten lines:

$ fgrep 'refused connect' auth.log | head
Sep  2 02:51:01 refused connect from 61.174.49.108 (61.174.49.108)
Sep  4 09:05:00 refused connect from 193.107.17.72 (193.107.17.72)
Sep  5 03:27:11 refused connect from 61.144.43.235 (61.144.43.235)
Sep  6 05:53:51 refused connect from 122.225.109.208 (122.225.109.208)
Sep  8 06:28:53 refused connect from 116.10.191.180 (116.10.191.180)
Sep 10 15:30:18 refused connect from 122.225.109.105 (122.225.109.105)
Sep 22 12:11:22 refused connect from 211.143.243.35 (211.143.243.35)
Sep 30 04:11:02 refused connect from 220.177.198.39 (220.177.198.39)
Oct  3 01:09:02 refused connect from 61.174.51.235 (61.174.51.235)
Oct  3 19:54:33 refused connect from 117.21.173.35 (117.21.173.35)

$ fgrep 'refused connect' auth.log | wc
100  800  7055

$ fgrep -c 'refused connect' auth.log
100

Looking at the output, we see that every line has the month abbreviation at the start of the line. We only want January dates, so we use the date string 'Jan ' in another fgrep filter to further restrict the output to only lines containing both 'refused connect' and 'Jan '. (Note the trailing blank in the date string.)

$ fgrep 'refused connect' auth.log | fgrep 'Jan ' | head
Jan  2 15:43:42 refused connect from 221.235.188.212 (221.235.188.212)
Jan  2 15:46:46 refused connect from 221.235.188.212 (221.235.188.212)
Jan  2 15:49:48 refused connect from 221.235.188.212 (221.235.188.212)
[... etc ...]

$ fgrep 'refused connect' auth.log | fgrep 'Jan ' | wc
26  208  1948

$ fgrep 'refused connect' auth.log | fgrep -c 'Jan '
26

Below are the functions of the two commands in the above pipeline. The second fgrep command is acting as a filter command, reading Standard Input from a pipe and writing output to Standard Output (to the screen).

  1. The first fgrep command selects the lines containing the text string 'refused connect' inside the auth.log file. The output of this first command (only lines containing the 'refused connect' string) goes into the first pipe, not onto the screen.
  2. The second fgrep reads the output of the first fgrep from the pipe and only selects (and counts, using the -c option) lines that also contain the date pattern for January 'Jan ' (with a trailing blank). The lines being selected and counted have to contain both the string 'refused connect' from the first fgrep and the string 'Jan ' from the second fgrep. The output of this second fgrep (a count of lines containing both strings: 26) displays on the screen.

When filtering output by date, always look in the file you are filtering to see what date format is being used on each line. Use the date format found in the file.
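The counting pipeline can be tried on a few made-up log lines. In this sketch the file auth.sample is invented for the demonstration (it is not the 484-line sample file), its addresses come from ranges reserved for documentation, and grep -F stands in for fgrep.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Made-up log lines in the same style as the sample; not real log data.
cat >"$workdir/auth.sample" <<'EOF'
Dec 30 23:10:01 refused connect from 192.0.2.10 (192.0.2.10)
Jan  2 15:43:42 refused connect from 198.51.100.7 (198.51.100.7)
Jan  2 15:46:46 session opened for user alice
Jan  9 04:11:02 refused connect from 203.0.113.5 (203.0.113.5)
Feb  1 01:09:02 refused connect from 192.0.2.10 (192.0.2.10)
EOF

# All refused connections, in any month.
all_refused=$(grep -F -c 'refused connect' "$workdir/auth.sample")

# The second grep is a filter: it counts only the piped lines containing 'Jan '.
january_refused=$(grep -F 'refused connect' "$workdir/auth.sample" | grep -F -c 'Jan ')

echo "refused in all months: $all_refused"
echo "refused in January:    $january_refused"

rm -rf "$workdir"
```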

8.9.2 Example 2: Count shells in the password file

The last (seventh) colon-separated field in the system password file /etc/passwd contains the name of the login shell given to the user when the user logs in:

$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
[... etc ...]

(A blank/empty field means use the default shell, which on Linux systems is usually /bin/sh, which is often a link to /bin/bash.)

In this example we must “Count the number of each kind of shell in /etc/passwd and display the top four results sorted in descending numeric order.”

We will build up the answer iteratively using a pipeline:

  • 2-A. Extract just the shell field from each line in the password file.
  • 2-B. Count the identical shells.
  • 2-C. Display the top four most used shells in descending order of use.

Problem 2-A: Extract just the shell field from each line.

Solution 2-A: Use the cut command, which extracts delimiter-separated fields from input lines. Since there could be thousands of lines of output, we pipe the large output into a command that limits the output on our screen to ten lines:

$ cut -d : -f 7 /etc/passwd | head
/bin/bash
/usr/sbin/nologin
/bin/sync
[... etc ...]

We now have a list of shells, in the order that they appear in the password file. On to the next problem: 2-B.

Problem 2-B: Count the identical shells.

Solution 2-B: The uniq command can count identical lines in an input file (or from standard input) using the -c option, but the identical lines have to be adjacent. We can sort the lines to make all the identical shell lines adjacent so that they can be counted, then add uniq -c to count the sorted lines. First, we add the sort to the pipeline, check the output, then we add the uniq -c to the pipeline:

$ cut -d : -f 7 /etc/passwd | sort | head
/bin/bash
/bin/bash
/bin/bash
[... etc ...]

$ cut -d : -f 7 /etc/passwd | sort | uniq -c
   1170 /bin/bash
     23 /bin/false
      1 /bin/sh
      1 /bin/sync
     16 /usr/sbin/nologin
    697 /usr/sbin/nologin_lock.sh

The output of uniq -c shows the counts of each shell, but the counts are not sorted in descending order, and there are more than four lines of output. On to the next problem: 2-C.

Problem 2-C: Display the top four most used shells in descending order of use.

Solution 2-C: First we add another sort to the pipeline, using options to sort the count numbers numerically and in descending (reverse) order, then we add a final head command to limit the output to four lines:

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr
   1170 /bin/bash
    697 /usr/sbin/nologin_lock.sh
     23 /bin/false
     16 /usr/sbin/nologin
      1 /bin/sync
      1 /bin/sh

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 4
   1170 /bin/bash
    697 /usr/sbin/nologin_lock.sh
     23 /bin/false
     16 /usr/sbin/nologin

Summary:

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 4
  1. The cut command picks out colon-delimited field 7 in each line in the password file and sends just those fields (the shell name) into the pipe.
  2. The sort command reads the shell names from the pipe and puts all the shell names in sorted ascending order and sends the sorted names into another pipe.
  3. The uniq command reads the sorted names from the pipe and counts the number of adjacent identical names. The output for each unique name is the count followed by the name. The output goes into another pipe.
  4. The sort command reads the lines containing the count and the shell name and it sorts the lines numerically (using the count field) and in reverse. Those sorted lines go into another pipe.
  5. The head command reads the sorted lines from the pipe and selects only the first four lines. Only those four lines display on the screen.
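
The five steps above can be tried on a small scale without touching the real password file. This is only a sketch: the six-line sample file and its account names are made up for illustration, and the output is limited to the top two shells instead of four.

```shell
# Build a small, hypothetical password-style file (seven colon-delimited fields).
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'alice:x:1000:1000::/home/alice:/bin/bash' \
    'bob:x:1001:1001::/home/bob:/bin/sh' \
    'daemon:x:1:1::/usr/sbin:/usr/sbin/nologin' \
    'carol:x:1002:1002::/home/carol:/bin/bash' \
    'sys:x:3:3::/dev:/usr/sbin/nologin' >"$sample"

# Same pipeline as in the Summary, limited to the top two shells.
top_shells=$(cut -d : -f 7 "$sample" | sort | uniq -c | sort -nr | head -n 2)
printf '%s\n' "$top_shells"

rm -f "$sample"
```

With this sample, /bin/bash (three accounts) comes first and /usr/sbin/nologin (two accounts) second; the exact width of the leading count column depends on your version of uniq.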

8.9.3 Example 3: Count IP addresses used in SSH break-in attempts in January

This Example uses multiple filter commands to find the unique IP addresses used in SSH break-in attempts in January and then count how many times each IP address was used. It uses features of the previous two Examples.

As in the first Example above, we need to look for lines in the system log file auth.log that contain both the string 'refused connect' and the date string 'Jan '. Instead of counting all of them together, we need to extract the IP address from each line and count the number of times each IP address appears. Counting occurrences was a feature of the second Example, above.

Here is the solution, using features of both previous Examples:

$ fgrep 'refused connect' auth.log | fgrep 'Jan ' \
  | awk '{print $NF}' \
  | sort | uniq -c | sort -nr

Below are the functions of the six commands in the above pipeline. Five of the commands are acting as filter commands, reading Standard Input from a pipe and writing output to Standard Output (often, to another pipe, except for the last command that writes on the screen).

This example uses the same sample auth.log input file that we used earlier (484 lines): auth.log

  1. The first fgrep command selects the lines containing the text string 'refused connect' inside the auth.log file. The output of this first command (only lines containing the 'refused connect' string) goes into the first pipe, not onto the screen.
  2. The second fgrep reads the output of the first fgrep from the pipe and only selects lines that also contain the date pattern for January 'Jan '. The lines being selected have to contain both the string 'refused connect' from the first fgrep and the string 'Jan ' from the second fgrep. The output of this second fgrep (lines containing both strings) goes into another pipe.
  3. The awk command reads the selected lines from the pipe. It displays just the last field (NF) on each line, which happens to be the IP address used by the attacker. The awk output (a list of IP addresses, one per line) goes into another pipe. (The list of addresses is not in sorted order; the addresses are in whatever order they appear in the input file.)
  4. The first (leftmost) sort command reads lines of IP addresses from the pipe. It sorts all the IP addresses together so that uniq can count them, and sends the sort output (the sorted lines) into another pipe.
  5. The uniq -c command reads the sorted list of IP addresses from the pipe. It counts how many adjacent addresses are the same and sends the uniq output (lines with the count followed by the IP address with that count) into another pipe.
  6. The sort -nr command reads the lines with the counts and IP addresses from the pipe. It sorts numerically and in reverse (descending) order the lines containing the leading count numbers and sends the second sort output (sorted lines, each containing a count and an IP address) onto the screen.

Note the use of two sort commands. The first sort takes an unordered list of IP addresses and sorts them so that all the same IP addresses are together, so that the uniq command can count them. (The uniq command can only count adjacent lines in an input stream.) Without the first sort, the IP addresses wouldn’t be all together and wouldn’t be counted correctly by uniq. The second sort command sorts the output of uniq numerically and in reverse and puts the IP addresses with the largest counts first. Both sort commands are needed.
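
To see why both sort commands are needed, here is a small sketch you can run without the real auth.log. The five log lines are hypothetical (the host name, process numbers, and documentation-range IP addresses are made up), and the sketch spells fgrep as grep -F, which is the POSIX way to write the same command.

```shell
# Hypothetical log lines; the last field of each "refused connect" line is the IP.
log=$(mktemp)
printf '%s\n' \
    'Jan  2 10:00:01 host sshd[101]: refused connect from 192.0.2.7' \
    'Jan  2 10:00:05 host sshd[102]: refused connect from 198.51.100.9' \
    'Jan  3 11:30:00 host sshd[103]: refused connect from 192.0.2.7' \
    'Feb  1 09:00:00 host sshd[104]: refused connect from 192.0.2.7' \
    'Jan  5 08:15:00 host sshd[105]: refused connect from 192.0.2.7' >"$log"

# Without the first sort: the three Jan entries for 192.0.2.7 are not all
# adjacent, so uniq -c reports that address twice with partial counts.
unsorted=$(grep -F 'refused connect' "$log" | grep -F 'Jan ' \
    | awk '{print $NF}' | uniq -c)

# With both sorts: one line per address, largest count first.
counted=$(grep -F 'refused connect' "$log" | grep -F 'Jan ' \
    | awk '{print $NF}' | sort | uniq -c | sort -nr)

echo "Without the first sort:"; printf '%s\n' "$unsorted"
echo "With both sorts:";        printf '%s\n' "$counted"
rm -f "$log"
```

The unsorted version prints three lines for only two distinct addresses; the full pipeline prints one line per address, with the address seen three times in January at the top.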

8.9.4 Example 4: Select a range of lines from a file

Problem: Display only lines 6-10 of the password file.

Solution: Extract the first 10 lines of the file, and from those 10 lines extract just the last five lines, which are lines 6-10. You can use the nl command to add line numbers to the file to confirm your solution.

$ head /etc/passwd | tail -n 5
$ nl /etc/passwd | head | tail -n 5

Problem: Display only the second-to-last line of the password file.

Solution: Extract the last two lines of the file, and from those last two lines extract just the first line, which is the second-to-last line.

$ tail -n 2 /etc/passwd | head -n 1
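
Both solutions are easy to check on a file whose lines announce their own line numbers. The sketch below assumes a hypothetical 20-line file generated with awk; the file name and contents are for illustration only.

```shell
# A hypothetical 20-line file, each line carrying its own line number.
numbered=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' >"$numbered"

# Lines 6-10: the last five of the first ten.
middle=$(head "$numbered" | tail -n 5)

# Second-to-last line: the first of the last two.
penultimate=$(tail -n 2 "$numbered" | head -n 1)

printf '%s\n' "$middle" "$penultimate"
rm -f "$numbered"
```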

8.9.5 Example 5: Select large files

Problem: Which five (non-hidden) files in the current directory are the largest?

$ ls -s | sort -nr | head -n 5

The -s option outputs the size of the file in blocks as a number at the start of every line, which makes it easy to sort the lines numerically.

Here is another answer that uses some sort options to pick which field to sort:

$ ls -l | sort -k 5,5nr | head -n 5

If we want to sort by file size in bytes, bytes is the fifth field in the output of ls -l. We have to use some options to sort that tell it to sort using the fifth field of every line. The above sort command is sorting by the fifth field, numerically, in reverse.
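
Here is a sketch of the field-based sort on a scratch directory with known file sizes. The directory and the three file names are hypothetical, and the sketch assumes the usual ls -l layout in which the size in bytes is the fifth field.

```shell
# Three hypothetical files of 10, 1000, and 100 bytes in a scratch directory.
dir=$(mktemp -d)
printf '%10s'   '' >"$dir/small"
printf '%1000s' '' >"$dir/large"
printf '%100s'  '' >"$dir/medium"

# Sort the long listing on field 5 (size in bytes), numerically, in reverse,
# then print only the last field (the file name) of each line.
by_size=$(ls -l "$dir" | sort -k 5,5nr | head -n 3 | awk '{print $NF}')
printf '%s\n' "$by_size"

rm -rf "$dir"
```

The "total" line that ls -l prints first has no fifth field, so it sorts as zero and lands at the bottom of the reversed list, out of the way of head.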

8.10 Misuse of pipes

There are many ways to misuse pipes. Here are some common ones.

8.10.1 Give file names to commands where possible

If a command does read from file names supplied on the command line, it is more efficient to let it open its own file name than to use cat to open the file and feed the data to the command on standard input. (There is less data copying done!)

Do not do this (wasteful of processes and I/O and flags you as a novice):

$ cat /etc/passwd | head                # DO NOT DO THIS - INEFFICIENT
$ cat /etc/passwd | sort                # DO NOT DO THIS - INEFFICIENT
$ cat /etc/passwd | fgrep 'root:'       # DO NOT DO THIS - INEFFICIENT

Do this: Give the file name(s) directly to the commands, like this:

$ head /etc/passwd
$ sort /etc/passwd
$ fgrep 'root:' /etc/passwd

Let commands open their own files; don’t feed them with cat and unnecessary pipes.

8.10.2 Commands with file arguments never read data from Standard Input

If a Unix/Linux command that can open and read the contents of pathnames is not given any pathnames to open, it usually reads input lines from standard input (stdin) instead:

$ wc /etc/passwd    # wc reads /etc/passwd, ignores stdin and your keyboard
$ wc                # without a file name, wc reads stdin (your keyboard)

If the command is given a pathname, it reads from the pathname and always ignores standard input, even if you try to send it something:

$ date | wc foo     # WRONG! wc opens and reads file foo; wc ignores stdin

The above applies to every command that reads file content, e.g.:

$ date | head foo   # WRONG! head opens and reads file foo; head ignores stdin
$ date | less foo   # WRONG! less opens and reads file foo; less ignores stdin

If you want a command to read stdin, you cannot give it any file name arguments. Commands with file name arguments ignore standard input; they should not be used on the right side of a pipe.

Commands that are ignoring standard input (because they are opening and reading from pathnames on the command line) will always ignore standard input, no matter what silly things you try to send them on standard input:

$ echo hi | head /etc/passwd   # WRONG: head has a pathname and ignores stdin
$ echo hi | tail /etc/group    # WRONG: tail has a pathname and ignores stdin
$ echo hi | wc .vimrc          # WRONG:   wc has a pathname and ignores stdin
$ sort a | cat b               # WRONG:  cat has a pathname and ignores stdin
$ cat a | sort b               # WRONG: sort has a pathname and ignores stdin

Standard input is thrown away if it is sent to a command that ignores it. The shell cannot make a command read stdin; it’s up to the command. The command must want to read standard input, and it will only want to read standard input if you leave off all the file names.
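
You can confirm this with wc, which counts different things depending on whether it has a pathname. This is a minimal sketch using a hypothetical three-line temporary file:

```shell
# A hypothetical three-line file.
three=$(mktemp)
printf 'one\ntwo\nthree\n' >"$three"

# wc is given a pathname, so it counts the file's 3 lines and never reads "hi".
with_file=$(echo hi | wc -l "$three" | awk '{print $1}')

# wc has no pathname, so it reads the single line arriving on standard input.
without_file=$(echo hi | wc -l | awk '{print $1}')

echo "with pathname: $with_file   without pathname: $without_file"
rm -f "$three"
```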

8.10.3 Some commands never read data from Standard Input

Commands that do not open and process the contents of files usually ignore standard input, no matter what silly things you try to send them on standard input. All these commands will never read standard input:

$ echo hi | ls          # NO: ls doesn't open files - always ignores stdin
$ echo hi | pwd         # NO: pwd doesn't open files - always ignores stdin
$ echo hi | cd          # NO: cd doesn't open files - always ignores stdin
$ echo hi | date        # NO: date doesn't open files - always ignores stdin
$ echo hi | chmod +x .  # NO: chmod doesn't open files - always ignores stdin
$ echo hi | rm foo      # NO: rm doesn't open files - always ignores stdin
$ echo hi | rmdir dir   # NO: rmdir doesn't open files - always ignores stdin
$ echo hi | echo me     # NO: echo doesn't open files - always ignores stdin
$ echo hi | mv a b      # NO: mv doesn't open files - always ignores stdin
$ echo hi | ln a b      # NO: ln doesn't open files - always ignores stdin

Some commands that open and read file contents only operate on file name arguments and never read stdin:

$ echo hi | cp a b      # NO: cp opens arguments - always ignores stdin

Standard input is thrown away if it is sent to a command that ignores it. The shell cannot make a command read stdin; it’s up to the command.

Commands that might read standard input will do so only if no file name arguments are given on the command line. The presence of any file arguments will cause the command to ignore standard input and process the file(s) instead, and that means they cannot be used on the right side of a pipe to read standard input. File name arguments always win over standard input.

8.10.4 Do not use pathnames on filter commands in pipelines

Remember: If a file name is given to a command on the command line, the command ignores standard input and only operates on the file name.

The very long sequence of pipes below is pointless – the last (rightmost) command head has a pathname argument and it will open and read it, ignoring all the standard input coming from all the pipes on the left:

$ fgrep "/bin/sh" /etc/passwd | sort | head /etc/passwd    # WRONG!

The head command is ignoring the standard input coming from the pipe and is reading directly from its /etc/passwd filename argument. The fgrep and sort commands are doing a lot of work for nothing, since head is not reading the output of sort coming down the pipe. The head command is reading from the supplied file name argument /etc/passwd instead. File names take precedence over standard input.

The above long-but-malformed pipeline is equivalent to this (same output):

$ head /etc/passwd

Don’t make the above mistake. Filter commands must not have file name arguments; they must read standard input from the pipe.

If you give a command a file to process, it will ignore standard input, and so a command with a file name must not be used on the right side of any pipe.
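
The equivalence is easy to verify. The sketch below assumes a hypothetical 12-line file and again spells fgrep as grep -F; it compares the malformed pipeline, the plain head command, and the well-formed pipeline in which head has no pathname.

```shell
# A hypothetical 12-line file; lines 11 and 12 fall outside head's default ten.
data=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 12; i++) print "record " i }' >"$data"

# Malformed: head has a pathname, so the grep and sort output is never read.
malformed=$(grep -F 'record 1' "$data" | sort -r | head "$data")

# Equivalent, without the wasted work.
direct=$(head "$data")

# Well-formed: head has no pathname and reads the sorted lines from the pipe.
filtered=$(grep -F 'record 1' "$data" | sort -r | head)

[ "$malformed" = "$direct" ] && echo "malformed pipeline output equals: head file"
printf '%s\n' "$filtered"
rm -f "$data"
```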

8.10.5 Don’t use redirection output file as input file anywhere in pipeline

The following command line redirection is faulty (an input file on the left is also used as an output file on the right); however, it sometimes works for small files:

$ cat foo bar | tr 'a' 'b' | fgrep "lala" | sort | head >foo   # WRONG!

There is a critical race between the first cat command, which is trying to read the data out of file foo, and the shell, which truncates foo to zero when launching the head command at the right end of the pipeline. Depending on the system load and the size of the file, cat may or may not get out all the data before the foo file is truncated or altered by the shell in the redirection at the end of the pipeline. Don’t do this.

Don’t depend on long pipelines saving you from bad redirection! Never redirect output into a file that is being used as input in the same command or anywhere in the command pipeline.
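
With a single command there is no race at all: the shell always truncates the output file before the command starts, so the data is always lost. The sketch below shows that failure and one common workaround (write to a separate temporary file, then rename it). The file names are hypothetical, and the workaround is a general shell idiom, not something specific to these notes.

```shell
# A hypothetical unsorted file, plus an identical copy for the second attempt.
dir=$(mktemp -d)
printf 'pear\napple\nmango\n' >"$dir/foo"
cp "$dir/foo" "$dir/bar"

# WRONG: the shell truncates foo for output before sort ever opens it for input.
sort "$dir/foo" >"$dir/foo"
wrong_size=$(wc -c <"$dir/foo" | tr -d ' ')

# Safer: write to a separate temporary file, then rename it over the original
# only if the command succeeded.
sort "$dir/bar" >"$dir/bar.tmp" && mv "$dir/bar.tmp" "$dir/bar"
right_first=$(head -n 1 "$dir/bar")

echo "foo is now $wrong_size bytes; bar starts with: $right_first"
rm -rf "$dir"
```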

8.11 Summary: Three Rules for Pipes

  1. Pipe redirection is done by the shell, first, before file redirection.
  2. The command on the left of the pipe must produce some standard output.
  3. The command on the right of the pipe must want to read standard input.

Never use a redirection output file as an input file anywhere in a pipeline!

9 Unique STDIN and STDOUT

There is only one standard input and one standard output for each command. Each can only be redirected to one other place. You cannot redirect standard input from two different places, nor can you redirect standard output into two different places.

The Bourne shells (including BASH) do not warn you that you are trying to redirect the input of a command from two or more different places (and that only one of the redirections will work – the others will be ignored):

$ wc <a <b <c <d <e
$ date | wc <file

The Bourne shells (including BASH) do not warn you that you are trying to redirect the output of a command to two or more different places and that only one of the redirections will receive the output (any other output files named are still created or truncated, but are left empty):

$ date >a >b >c >d >e
$ date >out | wc
0 0 0

Some shells (including the “C” shells, but not the Bourne shells) will try to warn you about silly shell redirection mistakes:

csh% date <a <b <c <d
Ambiguous input redirect.

csh% date | cat <file
Ambiguous input redirect.

csh% date >a >b >c
Ambiguous output redirect.

csh% date >a | wc
Ambiguous output redirect.

The C shells tell you that you can’t redirect stdin or stdout to/from more than one place at the same time. Bourne shells do not tell you – they perform the redirections in order and only the last one of each takes effect; any earlier output files are left empty.
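
The Bourne shell behaviour can be observed directly. This sketch assumes a Bourne-type shell such as sh or bash (other shells may behave differently, for example zsh with its multios option), and all the file names are hypothetical.

```shell
# Assumes a Bourne-type shell such as sh or bash.
dir=$(mktemp -d)
printf 'x\n'    >"$dir/one"      # hypothetical one-line input file
printf 'x\ny\n' >"$dir/two"      # hypothetical two-line input file

# Two input redirections: only the last (two) is read.
lines_read=$(wc -l <"$dir/one" <"$dir/two" | tr -d ' ')

# Three output redirections: a and b are created empty; only c gets the text.
echo hello >"$dir/a" >"$dir/b" >"$dir/c"
size_a=$(wc -c <"$dir/a" | tr -d ' ')
size_b=$(wc -c <"$dir/b" | tr -d ' ')
text_c=$(cat "$dir/c")

# File redirection wins over the pipe: wc receives nothing.
piped_bytes=$(echo hello >"$dir/out" | wc -c | tr -d ' ')

echo "lines_read=$lines_read size_a=$size_a size_b=$size_b text_c=$text_c piped_bytes=$piped_bytes"
rm -rf "$dir"
```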

10 Throwing away input/output using /dev/null

There is a special file on every Unix/Linux system into which you can redirect output that you don’t want to keep or see: /dev/null

The following command generates some error output we don’t like to see:

$ cat * >/tmp/out
cat: course_outlines: Is a directory           # errors print on STDERR
cat: jclnotes: Is a directory                  # errors print on STDERR
cat: labs: Is a directory                      # errors print on STDERR
cat: notes: Is a directory                     # errors print on STDERR

We can throw away the errors (stderr, unit 2) into /dev/null:

$ cat * >/tmp/out 2>/dev/null

The file /dev/null never fills up; it just eats and throws away output.

System Administrators: Do not get in the habit of throwing away all the error output of commands! You will also throw away legitimate error messages and nobody will know that these commands are failing.

When used as an input pathname, /dev/null always appears to be empty:

$ wc /dev/null
0 0 0 /dev/null

You can use /dev/null to provide “no input” to a program that would normally read your keyboard:

$ mail -s "Test message" user@example.com </dev/null
$

The mail command reads from standard input; it would normally read your keyboard as the message to send. Redirecting input from /dev/null ensures that there is nothing to read and mail will send a message with no message body and only a subject line.
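
Both uses of /dev/null can be tried without sending any mail. In this sketch, noisy is a hypothetical shell function that stands in for any command that writes to both stdout and stderr:

```shell
# A hypothetical command that writes one line to each output stream.
noisy() {
    echo "wanted output"
    echo "unwanted error" >&2
}

# stderr (unit 2) is discarded; only stdout is captured.
kept=$(noisy 2>/dev/null)

# As an input file, /dev/null is always empty: wc counts zero bytes.
empty_bytes=$(wc -c </dev/null | tr -d ' ')

echo "kept: $kept   bytes read from /dev/null: $empty_bytes"
```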

11 You can only redirect what you can see

This is worth repeating:

People are often misled into thinking that adding redirection to a command will create output that wasn’t there before the redirection was added.

It isn’t so. You can only redirect what you can see.

$ cp /etc/passwd x             # no output visible on standard output
$ cp /etc/passwd x >out        # file "out" is created empty
$ cp /etc/passwd x | wc        # word count counts nothing; output is zeroes

If there was no output on your screen before you added redirection, adding redirection will not create any. You will redirect nothing; no output.

Before you add redirection to a command, look at the output on your screen. If there is no output visible on your screen, why are you bothering to redirect it?

You can only redirect what you can see.
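
A quick way to convince yourself is to measure what the redirection captured. This sketch uses hypothetical file names in a scratch directory:

```shell
dir=$(mktemp -d)
printf 'some data\n' >"$dir/src"     # hypothetical source file

# cp prints nothing on success, so there is nothing to redirect or pipe.
cp "$dir/src" "$dir/copy" >"$dir/out"
out_size=$(wc -c <"$dir/out" | tr -d ' ')
piped_bytes=$(cp "$dir/src" "$dir/copy2" | wc -c | tr -d ' ')

echo "out is $out_size bytes; wc received $piped_bytes bytes"
rm -rf "$dir"
```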

12 tr – a command that only reads Standard Input

The tr command is one of the few (only?) commands that reads standard input and does not allow any pathnames on the command line – you must always supply input to tr on standard input, either through file input redirection or through a pipe:

$ tr 'abc' 'ABC' <in >out                 # correct for a single file

$ cat file1 file2 | tr 'abc' 'ABC' >out   # correct for multiple files

$ tr 'abc' 'ABC' file1 file2 >out         # *** WRONG - ERROR ***
tr: too many arguments

The tr command must always use some kind of Input Redirection to read data.

No version of tr accepts pathnames on the command line. All versions of tr only read standard input.
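
The two correct forms can be tried on small, made-up input files (file1 and file2 below are hypothetical and live in a scratch directory):

```shell
dir=$(mktemp -d)
printf 'a cab\n' >"$dir/file1"       # hypothetical input files
printf 'back\n'  >"$dir/file2"

# One file: input redirection.
single=$(tr 'abc' 'ABC' <"$dir/file1")

# Several files: cat joins them and a pipe feeds tr.
multiple=$(cat "$dir/file1" "$dir/file2" | tr 'abc' 'ABC')

printf '%s\n' "$single" "$multiple"
rm -rf "$dir"
```

This is one of the few cases where cat on the left of a pipe is not wasteful: tr cannot open the files itself.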

12.1 Don’t make input and output file names the same to tr

Don’t make the mistake of using a tr output redirection file as its redirection input file. (This doesn’t work for any command.) See Don’t use redirection output file as redirection input file, above.

12.2 Different requirements in character lists on System V

System V Unix versions of tr demand that character lists appear inside square brackets, e.g.:   tr '[abc]' '[ABC]'

Berkeley Unix and Linux do not need or use the brackets around the lists.

12.3 Example using tr

Problem: convert some selected lower-case letters to upper-case from the “who” command:

$ who | tr 'abc' 'ABC'

Shell question: Are the single quotes required around the two arguments? (Are there any special characters in the arguments that need protection?)

12.4 Don’t use character ranges with tr

Using POSIX character classes such as [:lower:] and [:upper:], you can use tr to convert a lower-case file of text into upper-case.

Warning: Do not use alphabetic character ranges such as a-z or A-Z in tr or any other commands, since the ranges often contain unexpected characters in the character set collating sequence. For full details, see Internationalization and Collating.
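
A minimal sketch of the character-class form, using a made-up line of text as input:

```shell
# Convert lower-case to upper-case using POSIX character classes, not a-z ranges.
shouted=$(printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]')
echo "$shouted"
```

The single quotes matter here: the square brackets are shell meta-characters and must be hidden from the shell.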

13 Do not redirect full-screen programs such as VIM

Full-screen keyboard interactive programs such as the VIM text editor do not behave nicely if you redirect their input or output – they really want to be talking to your keyboard and screen; don’t redirect them or try to run them in the background using &. You can hang your terminal if you try.

If you accidentally redirect the input or output of something such as vim, switch screens or log in a second time using a different terminal and find and kill the hung process.

14 Redirect only stderr into a pipe (ADVANCED!)

It’s easy to redirect only stdout into a pipe; that’s just the way pipes work. In this example below, only stdout is sent into the line numbering program. The error message sent to stderr bypasses the redirection and goes directly onto the screen:

$ ls /etc/passwd nosuchfile | nl
ls: cannot access nosuchfile: No such file or directory
     1  /etc/passwd

It’s also easy to redirect both stdout and stderr into a pipe by sending stderr to the same place as stdout:

$ ls /etc/passwd nosuchfile 2>&1 | nl
     1  ls: cannot access nosuchfile: No such file or directory
     2  /etc/passwd

How do you redirect only stderr into the pipe, and let stdout bypass the pipe and go directly to the screen? This is tricky; on the left of the pipe you have to swap stdout (attached to the pipe) and stderr (attached to the screen). You need a temporary output unit (I use “3”, below) to record and remember where the screen is (redirect unit 3 to the same place as unit 2: “3>&2”), then redirect stderr into the pipe (redirect unit 2 to the same place as unit 1: “2>&1”), then redirect stdout to the screen (redirect unit 1 to the same place as unit 3: “1>&3”):

$ ls /etc/passwd nosuchfile 3>&2 2>&1 1>&3 | nl
     1  ls: cannot access nosuchfile: No such file or directory
/etc/passwd

You seldom need to do this advanced trickery, even inside scripts. But you can do it!
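
If you want to experiment with the swap without depending on the exact wording of an ls error message, here is a sketch. The emit function is hypothetical; it stands in for any command that writes to both streams. The outer "2>&1" on the braces is there only so the sketch can capture in one variable what would normally appear on your screen; it is not part of the technique.

```shell
# A hypothetical command that writes one line to each output stream.
emit() {
    echo "normal output"          # goes to stdout (unit 1)
    echo "error message" >&2      # goes to stderr (unit 2)
}

# Swap the streams on the left of the pipe: unit 3 remembers where stderr
# was going, stderr goes into the pipe, stdout goes where stderr used to go.
# sed marks every line that actually travelled through the pipe.
result=$( { emit 3>&2 2>&1 1>&3 | sed 's/^/piped: /'; } 2>&1 )

printf '%s\n' "$result"
```

Only the error message carries the "piped: " mark; the normal output bypassed the pipe.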

Author: 
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/
