% Unix/Linux Shell I/O Redirection (including stdin, stdout, stderr, and Pipes)
% Ian! D. Allen -- [www.idallen.com]
% Fall 2015 - September to December 2015 - Updated 2019-01-29 15:21 EST

Introduction to redirection -- changing Input/Output
====================================================

Shell redirection is a powerful way to change from where commands read input
and to where commands send output. It applies to every command run by the
shell, so once you learn the shell syntax of how it works, it works on all the
commands you can type into the shell.

> If you want to hear a simple explanation of the power of shell redirection
> directly from the bearded men who invented it back in the 1970s, watch
> either of these historic 1982 videos on **The UNIX Operating System**:
>
> -   [UNIX: Making Computers More Productive] (27 minutes).
> -   [UNIX: Making Computers Easier To Use] (23 minutes).

Redirection of input -- redirection of output
---------------------------------------------

You can redirect the **input** of a command away from your keyboard, and you
can redirect the **output** of a command away from your screen.

The redirection can be to or from a **file** using the shell meta-characters
'`>`' or '`<`' (angle brackets) or it can be to or from a **program** using
the shell meta-character '`|`' (the *pipe* symbol).

Output Redirection -- Standard Output and Standard Error
========================================================

Commands produce two kinds of output -- normal output and error message
output -- and the shell can redirect each of these separately.

Normal Output: Standard Output -- `stdout`
------------------------------------------

In the process of output redirection, the shell (not the command) redirects
(diverts) most command output that would normally appear on the screen to some
other place.
The redirection can be either into a file using a file output redirect
meta-character '`>`', or into the input of another command by separating the
commands using a pipe meta-character '`|`':

    $ echo hi >outfile      # redirect echo output into an output file "outfile"
    $ echo hi | wc          # redirect echo output into the program "wc"

The normal command output that appears on your screen is called the *standard
output* of the command, abbreviated **stdout**. It is the expected output
that a command generates if everything works. Anything written to **stdout**
can be redirected as shown in the above examples.

Error Messages: Standard Error -- `stderr`
------------------------------------------

If something goes wrong, commands produce error messages. These error
messages are sent to what is called the *standard error* output of the
command, abbreviated **stderr**. Error messages are almost always sent to
your screen, even if you redirect **stdout** somewhere else:

    $ ls nosuchfile >outfile
    ls: cannot access nosuchfile: No such file or directory

Standard error output is not subject to simple output redirection, but with
extra syntax the shell can redirect it, too, into a file:

    $ ls nosuchfile >outfile 2>errors

> In programming terms, **stdout** and **stderr** outputs are written to
> different I/O units, but both end up on your screen, by default. The shell
> can redirect them separately. More on that later.

Four Rules for Output Redirection
---------------------------------

You need to remember these four things about output redirection:

1.  Redirection is done for the command by the shell, *before* the shell
    finds and runs the command; the shell has no idea if the command exists
    or will produce any output.

2.  The shell can only redirect output that is produced by the command.
    **You can only redirect the output that you can see.** If there is no
    visible output on your screen without redirection, adding redirection
    won't create any. *Re-read this a few times and remember it.*

    Before you redirect output into a file or a pipe, **look at what output
    is on your screen** with no redirection added. If what is appearing on
    your screen isn't right without redirection, adding redirection won't
    make it right.

3.  Redirection can only go to *one* place. You can't use multiple
    redirections to send output to multiple places. (See the `tee` command
    for a way to send output into multiple files.)

4.  By default, error messages (called "standard error" or **stderr**) are
    not redirected; only "normal output" (called "standard output" or
    **stdout**) is redirected. (You can also redirect *stderr* with more
    shell syntax; see below.)

Summary of Rules for Output Redirection:

> 1.  Redirection is done first, before running the command.
> 2.  You can only redirect output that you can (could) see.
> 3.  Redirection goes only to one place.
> 4.  Only standard output is redirected, by default.

We will discuss each of these rules in detail, below.

Output redirection into files using `>`
=======================================

The shell meta-character **right angle bracket** '`>`' signals that the next
word on the command line is an output file (not a program) that the shell
should create or truncate (set to empty) and make ready to receive the
*standard output* of a command. *Standard output* is the normal output that
appears on your screen when you run the command:

    $ echo hi               # normal output appears on screen; no redirection
    hi
    $ echo hi >outfile      # output goes into file "outfile", not on screen
    $ cat outfile           # display the contents of outfile on screen
    hi

The space before the angle bracket '`>`' is *sometimes* required, so always
include it and you won't go wrong.
The space after the '`>`' is optional and most people omit it:

    $ echo hi >outfile      # this is the most common usage
    $ echo hi > outfile     # space is optional after the >
    $ echo hi>outfile       # AVOID THIS - always put a space before the >

Putting the space in front of '`>`' makes the command line easier to read.

Output redirection means the shell truncates (makes empty) an existing file.
An existing file will have its contents removed:

    $ echo hello >out       # "hello" goes into the file "out"
    $ cat out
    hello
    $ nosuchcommandxxx >out # "out" is made empty again
    sh: nosuchcommandxxx: command not found
    $ cat out               # command failed -- "out" is still empty
    $

The shell always makes the output file empty before trying to run the
command. If the command fails or doesn't produce any standard output, the
redirection output file remains empty.

Rule 1a: Redirection is done by the shell, not by the command
-------------------------------------------------------------

It is the shell that does all redirection, not the command being run.

For output redirection into files, the shell creates or truncates the output
file and sets up the redirection, not the command being redirected. The
command knows nothing about the redirection -- the redirection syntax is
**removed from the command line** by the shell before the command is found
and executed:

    $ echo one two three        # echo has three command line arguments
    one two three
    $ echo one two three >out   # echo still has three arguments
    $ cat out
    one two three

Shells handle redirection before they go looking for the command name to run.
Indeed, redirection happens even if the command is not found or if there is no command at all: $ >out # file "out" is created or truncated empty $ wc out 0 0 0 out # shell created an empty file $ nosuchcommandxxx >out # file "out" is created empty sh: nosuchcommandxxx: command not found $ wc out 0 0 0 out # shell created an empty file The shell creates or truncates the file "`out`" empty, and then it tries to find and run the nonexistent command and fails. The empty file that was created by the shell remains. Rule 1b: Redirection file creation and truncation happen first -------------------------------------------------------------- The shell does the output file creation and truncation *before* it finds or runs the command. This can affect the output of commands that operate on files: $ mkdir empty $ cd empty $ ls -l total 0 # no files found $ ls -l >out # shell creates "out" first $ cat out # display output total 0 -rw-r--r-- 1 idallen idallen 0 Sep 21 06:02 out The shell creates the file `out` before it runs the `ls` command, so the `ls` command finds the new output file and the output of `ls` with redirection is different than the output of `ls` without. Because the shell truncates the redirection output file *before* the shell looks for and runs the command, you cannot use output redirection files as input to the same command: $ cp /etc/passwd a $ sort a >a # WRONG WRONG WRONG! File "a" is truncated to be empty! $ cat a # shows empty file $ Above, the shell makes the file `a` empty before it runs the `sort` command. The `sort` command sorts an empty file, and the empty output goes into the file `a`, which remains empty. **Never use redirection output files as input files!** Rule 2: You can only redirect what you can see ---------------------------------------------- Redirection does not create output that wasn't there without redirection. 
If you don't see any output from a command without redirection, adding redirection to the command won't cause the command to create output. Adding redirection to a command that generates no output will simply have the shell redirect no output and create an empty file: $ cp /etc/passwd x # no output visible on standard output $ cp /etc/passwd x >out # file "out" is created empty $ cd /tmp # no output visible on standard output $ cd /tmp >out # file "out" is created empty $ touch x ; rm x # no output from rm on standard output $ touch x ; rm x >out # file "out" is created empty You can only redirect the output that you can *see*! Run a command without redirection and observe what output the command produces. If you don't see any output, adding redirection won't create any. Rule 3: Output redirection only goes to one place ------------------------------------------------- Output redirection can only go to *one* place, so adding multiple output redirection to a command line doesn't do what you might think: $ date >a >b >c # output goes into file c; a and b are empty The right-most output file redirection always gets all the output and the other output redirection files to the left are created empty by the shell. > If you redirect output into both a file and a pipe, the file gets all the > output and the pipe gets nothing: > > $ date >a | cat # all output goes into file "a"; cat shows nothing > > See the following section on redirection into programs using "`|`" pipes. Rule 4: Error messages do not get redirected, by default -------------------------------------------------------- By default, the shell only redirects the standard output (normal output) of commands. 
Error messages from commands are not redirected into the output file and still appear directly on your screen: $ ls -l nosuchfile /etc/passwd ls: cannot access nosuchfile: No such file or directory -rw-r--r-- 1 root root 2542 Jun 24 2014 /etc/passwd $ ls -l nosuchfile /etc/passwd >out # standard output goes into "out" ls: cannot access nosuchfile: No such file or directory $ cat out # show contents of "out" -rw-r--r-- 1 root root 2542 Jun 24 2014 /etc/passwd Error messages continue to appear on your screen, even with redirection, so that you know the command you are using had an error. Output file redirection syntax can go anywhere ---------------------------------------------- Shells don't care where in the command line you put the output file redirection. No matter where in the command line you type it, it has the same effect, though most people put it at the end, as in the first example below: $ echo hi there mom >file # echo has three arguments $ echo hi there >file mom # echo has three arguments $ echo hi >file there mom # echo has three arguments $ echo >file hi there mom # echo has three arguments $ >file echo hi there mom # echo has three arguments All the command lines above are equivalent and create the same output file. The redirection syntax is processed by and removed by the shell before the command runs. The redirection syntax is never counted as arguments to a command. In every case above the `echo` command sees exactly three command line arguments and all three arguments "`hi`", "`there`", and "`mom`" are all redirected into the output "`file`". The file redirection is found and done first by the shell, then the redirection syntax is removed from the command line before the command is called. The command actually being run doesn't see any part of the redirection syntax; the number of arguments is not affected. 
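
One way to check that the redirection syntax is never counted as an argument is to count the arguments directly. The following is a minimal sketch, not part of the original notes: the function name `count_args`, the output file names, and the scratch directory are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args () {
    echo "$#"
}

tmpdir=$(mktemp -d)     # scratch directory so no real files are touched
cd "$tmpdir" || exit 1

count_args hi there mom >end        # redirection typed at the end
count_args hi >middle there mom     # redirection typed in the middle
>start count_args hi there mom      # redirection typed at the start

# Each file holds the same count, because the shell removed the
# redirection syntax before the function saw its arguments.
cat end middle start
```

All three files contain the number 3, whichever position the redirection was typed in.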
Output redirection: Appending to files using `>>` ------------------------------------------------- You can append (add) output to the end of a file, instead of truncating it to empty, using a double right angle bracket "`>>`": $ echo first line >file # file is truncated using single > $ echo second line >>file # append second line to end of file using >> $ echo third line >>file # append third line to end of file using >> $ cat file # display what is in the file first line second line third line The redirection output file is always truncated (set to empty) before the command runs, unless you use the append syntax "`>>`" instead of "`>`". Summary of Standard Output redirection into files ------------------------------------------------- Redirection is done by the shell, first, before it finds and runs the command. Things happen in this order on the command line: 1. All redirection is found and done by the shell, no matter where the redirection is typed in the command line. All redirection output files are created or truncated to empty (except when appending). This redirection and truncation happens even if no command executes. (If the redirection fails, the shell does not run any command.) **The output file is created or truncated to zero size *before* the command runs.** 2. The shell removes all the redirection syntax from the command line. The command will have no idea that its output is being redirected. 3. The command (if any) executes and may produce output. The shell executes the command *after* doing all the redirection. 4. The output from the command (if any) happens, and it goes into the indicated redirection output file. This happens last. If the command produces no output, the output file will be empty. Adding redirection never creates output. Standard Error Output (error messages) does not get redirected; it goes onto your screen. 
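
The ordering in the summary above can be observed with a short script. This is only a sketch under stated assumptions: `nosuchcommandxxx` is assumed not to be an installed command, `nodir` is assumed not to exist, and the file names are invented for the example.

```shell
#!/bin/sh
# Sketch only: "nosuchcommandxxx" and the directory "nodir" are assumed not to exist.
tmpdir=$(mktemp -d)     # scratch directory
cd "$tmpdir" || exit 1

echo "old contents" >out

# The shell truncates "out" first, then fails to find the command.
if nosuchcommandxxx >out 2>errors; then
    echo "command ran"
else
    echo "command failed, but out was already truncated"
fi
wc -c out               # reports 0 bytes

# If the redirection itself fails, the command is never run:
# "marker" is not created because the shell could not open nodir/out.
if touch marker >nodir/out; then
    echo "touch ran"
else
    echo "redirection failed, touch did not run"
fi
ls marker 2>/dev/null || echo "marker does not exist"
```

The shell prints its own error message for the failed redirection; that message comes from the shell, not from `touch`.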

Exercises in shell Standard Output redirection
----------------------------------------------

Explain this sequence of commands:

    $ mkdir empty
    $ cd empty
    $ cp a b
    cp: cannot stat 'a': No such file or directory
    $ cp a b >a
    $               # why is there no error message from cp this time? what is in file a?

Explain this sequence of commands:

    $ date >a
    $ cat a
    Wed Feb  8 03:01:21 EST 2012
    $ cp a b
    $ cat b
    Wed Feb  8 03:01:21 EST 2012
    $ cp a b >a
    $ cat b
    $               # why is file b empty? what is in file a?

Explain this sequence of commands:

    $ rm
    rm: missing operand
    $ touch file
    $ rm >file
    rm: missing operand         # why doesn't rm remove "file"?
    $ rm nosuchfile
    rm: cannot remove 'nosuchfile': No such file or directory
    $ rm nosuchfile >nosuchfile
    $                           # why is there no rm error message here?

Is the file `nosuchfile` in existence after the last command, above?

How many words are in each of these five output files?

    $ echo one two three >out1
    $ echo one two >out2 three
    $ echo one >out3 two three
    $ echo >out4 one two three
    $ >out5 echo one two three

What is in each file `a`, `b`, `c`, after this command line?

    $ echo hi >a >b >c

Standard Output ("stdout") and Standard Error ("stderr")
========================================================

Most commands have two separate output "streams" or "units", numbered 1 and
2:

1.  **Standard Output** or **stdout**: the normal output of a command
    (stream 1)
2.  **Standard Error Output** or **stderr**: the error and warning output
    (stream 2)

The **stdout** (stream 1) and **stderr** (stream 2) outputs mix together on
your screen. They look the same on the screen, so you can't tell by looking
at your screen what comes out of a program on **stdout** and what comes out
of a program on **stderr**.
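
To see a program write to both streams, here is a minimal sketch of a shell function that sends one line to each. The function name `report` and its messages are invented for illustration; the `>&2` on the second `echo` means "send this output to unit 2 (stderr)".

```shell
#!/bin/sh
# Hypothetical function: writes one line to each output stream.
report () {
    echo "normal output"        # goes to stdout (stream 1)
    echo "error output" >&2     # ">&2" sends this echo to stderr (stream 2)
}

report      # both lines appear together on the screen
```

Run with no redirection, the two lines are indistinguishable on the screen, even though they left the function on different streams.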

To show a simple example of **stdout** and **stderr** both appearing on your
screen, use the `ls` command and give it one file name that exists and one
name that does not exist (and thus causes an error message to be displayed on
standard error output):

    $ ls /etc/passwd nosuchfile                  # no redirection used
    ls: nosuchfile: No such file or directory    # this on screen from stderr
    /etc/passwd                                  # this on screen from stdout

Both output streams look the same on your screen. The **stderr** (error
message) output often appears first, before **stdout**, due to internal I/O
buffers used by commands for **stdout**.

The default type of output redirection (whether redirecting to files or to
programs using pipes) redirects *only* standard output and lets standard
error go, untouched, to your screen:

    $ ls /etc/passwd nosuchfile >out             # shell redirects only stdout
    ls: nosuchfile: No such file or directory    # only stderr appears on screen
    $ cat out
    /etc/passwd                                  # stdout went into the file

> Programming information (for programmers only):
>
> The Standard Output is the Unit 1 output from `printf` and `cout`
> statements in C and C++ programs, and from `System.out.print` and
> `System.out.println` in Java.
>
> The Standard Error Output is the Unit 2 output from `fprintf(stderr` and
> `cerr` statements in C and C++ programs, and from `System.err.print` and
> `System.err.println` in Java.

Redirecting Standard Error Output using `2>outfile`
---------------------------------------------------

Normally, both **stdout** and **stderr** appear together on your screen, and
redirection only redirects **stdout** and not **stderr**.

You can redirect **stdout** and **stderr** separately into files using a unit
number immediately before the right angle-bracket '`>`' meta-character:

1.  Standard Output (stdout) is unit 1 -- redirect it using `>outfile` or
    `1>outfile`
2.
    Standard Error Output (stderr) is unit 2 -- redirect it using `2>outfile`

Put the unit number immediately (no blank) before the '`>`' meta-character to
redirect just that output stream:

    $ ls /etc/passwd nosuchfile 2>errors         # shell redirects only stderr (unit 2)
    /etc/passwd                                  # only stdout (unit 1) appears on screen
    $ cat errors
    ls: nosuchfile: No such file or directory    # stderr unit 2 went into file "errors"

If you don't use a unit number before the right angle-bracket `>`, the
default is to assume Unit 1 and redirect just standard output. The default
output redirection syntax `>foo` (no preceding unit number) is a shell
shorthand for `1>foo` so that `>foo` and `1>foo` are identical.

You can redirect **stdout** (unit 1) into one file and **stderr** (unit 2)
into another file using two redirections on the same command line:

    $ ls /etc/passwd nosuchfile >out 2>errors    # shell redirects each one
    $                                            # nothing appears on screen
    $ cat out
    /etc/passwd                                  # stdout unit 1 went into "out"
    $ cat errors
    ls: nosuchfile: No such file or directory    # stderr unit 2 went into "errors"

Always use different output file names if you redirect both units. Do *not*
redirect both units into the same output file name; the two outputs will
overwrite each other. See the next section.

Redirecting both **stdout** and **stderr** using `2>&1`
-------------------------------------------------------

You need a special syntax "`2>&1`" to redirect both **stdout** and
**stderr** safely together into a single file. Read the syntax "`2>&1`" as
*"send unit 2 to the same place as unit 1"*:

    $ ls /etc/passwd nosuchfile >both 2>&1       # redirect both into same file
    $                                            # nothing appears on screen
    $ cat both
    ls: nosuchfile: No such file or directory
    /etc/passwd

The order of the redirections `>both` and `2>&1` on the command line matters!
The **stdout** redirect "`>both`" (unit 1) must come before (to the left of)
the **stderr** redirect "`2>&1`" (unit 2) because you must set where unit 1
goes *before* you send unit 2 to go "to the same place as unit 1". Don't
reverse these! Remember: 1 comes before 2.

You must use the special syntax "`>both 2>&1`" to put both **stdout** and
**stderr** into the same file. Don't use the following, which is not the
same:

    $ ls /etc/passwd nosuchfile >wrong 2>wrong   # WRONG! DO NOT DO THIS!
    $ cat wrong
    /etc/passwd
    ccess nosuchfile: No such file or directory

The above **WRONG** example shows how **stderr** and **stdout** overwrite
each other and the result is a mangled output file; don't do this. Use
`2>&1` to send **stderr** into the same file as **stdout**.

Some modern Bourne-style shells (for example, Bash) have a special shorter
syntax for redirecting both **stdout** and **stderr** into the same output
file:

    $ ls /etc/passwd nosuchfile &>both           # redirect both into same file
    $                                            # nothing appears on screen
    $ cat both
    ls: nosuchfile: No such file or directory
    /etc/passwd

In those shells you can use either "`&>both`" or "`>both 2>&1`", but only
the latter works in every version of the Bourne shell (back to the 1970s!).
When writing shell scripts, use the "`>both 2>&1`" version for maximum
portability. Don't rely on `&>both` working everywhere.

Output redirection mistakes to avoid
====================================

The most common redirection mistake is to use a redirection output file as a
command argument input file. There is an obvious way to get this wrong, and
a hidden way to get this wrong:

Obvious misuse of redirection output file as input file: `sort out >out`
------------------------------------------------------------------------

Suppose you want to sort a file and put the sorted output back into the same
file.
*(The `sort` command is used as the example program below -- anything that reads the content of a file and produces output is at risk.)* This is the **WRONG** way to do it: $ cp /etc/passwd out $ sort out >out # WRONG! Redirection output file is used as sort input file! $ cat out $ # File is empty! Here is the problem with the `sort out >out` command above: 1. The shell first finds the output redirection on the command line and *truncates* (makes empty) the file "`out`" and gets it ready to receive the standard output of the command being run: - Redirection is *always* done first by the shell, before running the command. - The output file (which is also the input file) is now empty. - The original contents of "`out`" are lost -- truncated -- GONE! -- before the shell even goes looking for the `sort` command to run! 2. The shell finds and runs the `sort` command with its one file name argument "`out`" that is now an empty file. 3. The `sort` command opens the empty argument file "`out`" for reading. Sorting an empty file produces no output. 4. Standard output of the command has been redirected by the shell to appear in file "`out`", so the "no output" goes into file `out`; the file remains empty. Result: File "`out`" is always empty, no matter what was in it before. There are two safe and correct ways to do this, one of which depends on a special output file feature of the `sort` command (that may not be available in other commands): $ sort out >tmp && mv tmp out # sort into tmp file and rename tmp to out $ sort -o out out # use special sort output file option Here is another incorrect example that uses the same redirection output file as an input file. The result is wrong but is not an empty file this time: $ date >out $ wc out # count the lines, words, and characters in file "out" 1 6 29 out $ wc out >out # WRONG! Redirection output file is used as input file! $ cat out 0 0 0 out # Using wc on an empty file produces zeroes! 
Here is the problem with the `wc out >out` command above: 1. The shell first finds the output redirection on the command line and *truncates* (makes empty) the file "`out`" and gets it ready to receive the standard output of the command being run: - Redirection is *always* done first by the shell, before running the command. - The output file (which is also the input file) is now empty. - The original contents of "`out`" are lost -- truncated -- GONE! -- before the shell even goes looking for the `wc` command to run! 2. The shell finds and runs the `wc` command with its one file name argument "`out`" that is now an empty file. 3. The `wc` command opens the empty argument file "`out`" for reading. It counts the lines, words, and characters of an empty file and produces one line of output: `0 0 0 out` 4. Standard output of the command has been redirected by the shell to appear in file "`out`", so the one line of output goes into file `out`. The file shows all zeroes, not the word count of the original date. Result: File "`out`" always shows zeroes, not the count of the original content. Here is the only safe and correct way to do this with `wc`: $ wc out >tmp && mv tmp out # do output redirection into tmp file and move it Other incorrect redirection examples that **DO NOT WORK** because the redirection output file is being used as an input file: $ head file >file # ALWAYS creates an EMPTY FILE $ tail file >file # ALWAYS creates an EMPTY FILE $ uniq file >file # ALWAYS creates an EMPTY FILE $ cat file >file # ALWAYS creates an EMPTY FILE $ sort file >file # ALWAYS creates an EMPTY FILE $ fgrep 'foo' file >file # ALWAYS creates an EMPTY FILE $ wc file >file # ALWAYS counts an EMPTY FILE (0 0 0) $ sum file >file # ALWAYS checksums an EMPTY FILE (0) ...etc... **Do not use a redirection output file as an input to a program or a pipeline!** Never use the same file name for both input and redirection output -- the shell will truncate the file before the command reads it. 
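
The safe "redirect into a temporary file, then rename" pattern shown above can be wrapped in a small function. This is a sketch only: the function name `sort_in_place` and the sample file `fruit` are hypothetical, and it assumes the `mktemp` command is available.

```shell
#!/bin/sh
# Hypothetical helper: sort a file "in place" without ever using it as
# both the input and the redirection output of the same command.
sort_in_place () {
    tmp=$(mktemp "$1.XXXXXX") || return 1   # temporary file next to the original
    if sort "$1" >"$tmp"; then
        mv "$tmp" "$1"      # replace the original only if sort succeeded
    else
        rm -f "$tmp"        # leave the original untouched on failure
        return 1
    fi
}

workdir=$(mktemp -d)        # scratch directory for the demonstration
printf 'pear\napple\nmango\n' >"$workdir/fruit"
sort_in_place "$workdir/fruit"
cat "$workdir/fruit"
```

Note that the renamed file takes on the permissions of the temporary file, which may differ from those of the original.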

Hidden misuse of redirection output file as GLOB input file
-----------------------------------------------------------

The hidden way to have a redirection output file used as an input file is to
have the input file name hidden in a shell GLOB wildcard expansion.

Suppose you want to number the lines in a bunch of files and put the numbered
output into a file in the same directory. *(The `nl` command is used as the
example program below -- anything that reads the content of a file and
produces output is at risk.)* This is the **WRONG** way to do it:

    $ cp /etc/passwd bar    # create a file larger than a disk block
    $ touch foo
    $ nl * >foo             # WRONG! GLOB * input files match redirection output file!
    ^C                      # interrupt this command immediately before your disk is full!
    $ ls -l
    -rw-rw-r-- 1 idallen idallen    194172 Feb 15 05:19 bar
    -rw-r--r-- 1 idallen idallen 289808384 Feb 16 05:20 foo     # HUGE FILE!

Here is what happens to make the output file "`foo`" grow forever when you
type `nl * >foo`:

1.  The shell first expands the GLOB "`*`" to match all the pathnames in the
    current directory, which includes the "`bar`" and "`foo`" names:
    `nl bar foo >foo`

2.  The shell truncates output redirection file `foo` and gets file `foo`
    ready to receive all the **stdout** of the command as it runs.

3.  The shell finds and runs the `nl` command, giving it all the GLOB file
    name arguments including the names "`bar`" and "`foo`": `nl bar foo`

4.  The `nl` command opens the first input file (from the GLOB expansion)
    named "`bar`" and sends its numbered output to **stdout**, which means
    into the redirection output file `foo`.

5.  The `nl` command next opens the next input file (from the GLOB
    expansion) named "`foo`" and starts reading lines from the top of the
    file, numbering them, and writing the numbered output to **stdout**,
    which is the bottom of the **same** file. The output file is the same as
    the input file and the `nl` command is reading lines from the same file
    into which it is writing lines.
This never finishes, and the file "`foo`" grows until all the disk space is used. Result: An infinite loop by `nl` reading and writing the same file. Eventually the disk drive fills up as "`foo`" gets bigger and bigger. **Fix #1:** Use a hidden file name that GLOB doesn't match as an input file: $ nl * >.z **Fix #2:** Use an output redirection file in some other directory not matched by the shell GLOB pattern: $ nl * >../z $ nl * >/tmp/z **Do not use a wildcard/GLOB file pattern that picks up the name of the output redirection file and causes it to become an unintended input file.** Summary: Four Rules for Output Redirection ========================================== Here are the four rules for Output Redirection again: > 1. Redirection is done first, before running the command. > 2. You can only redirect output that you can (could) see. > 3. Redirection goes only to one place. > 4. Only standard output is redirected, by default. Never use a redirection output file as an input file! Input Redirection -- Standard Input =================================== Many Unix/Linux commands read input from files, if file pathnames are given on the command line. If *no* file names are given, these commands usually read from what is called Standard Input ("stdin"), which is usually connected to your keyboard. (You can send `EOF` by typing `^D` (Ctrl-D) to get the command to stop reading your keyboard.) Here is an example of the `nl` command reading from a file, then reading from **stdin** (your keyboard) when no files are supplied: $ nl /etc/passwd # nl reads content from the file /etc/passwd [...many lines print here, with line numbers...] 
    $
    $ nl                    # no files; nl reads standard input (your keyboard)
    foo                     # you type this line and push ENTER
         1  foo             # this is the line as numbered and output by nl
    bar                     # you type this line and push ENTER
         2  bar             # this is the line as numbered and output by nl
    ^D                      # you signal keyboard EOF by typing ^D (CTRL-D)
    $

Examples of commands that may read from pathnames or, if not given any
pathnames, from standard input:

    less, more, cat, head, tail, sort, wc, grep, fgrep, nl, uniq, etc.

Commands such as the above may read standard input. They will read standard
input (which may be your keyboard) *only* if there are *no* pathnames to read
on the command line:

    $ cat foo       # cat opens and reads file "foo"; cat completely ignores stdin
    $ cat           # cat opens and reads standard input = your keyboard; use ^D for EOF

    $ tail foo      # tail opens and reads "foo"; tail completely ignores stdin
    $ tail          # tail opens and reads standard input = your keyboard; use ^D for EOF

    $ wc foo        # wc opens and reads file "foo"; wc completely ignores stdin
    $ wc            # wc opens and reads standard input = your keyboard; use ^D for EOF

The above is true for all commands that can read from **stdin**. They only
read from **stdin** if there are *no pathnames* given on the command line.

> To tell a command to stop reading your keyboard, send it an EOF
> (End-Of-File) indication, usually by typing `^D` (Control-D). If you
> interrupt the command (e.g. by typing `^C`), you usually kill the command
> and the command may not produce any output at all.

Not all commands read standard input
------------------------------------

Not all commands read from standard input, because not all commands read
data from files supplied on the command line. Examples of common Unix/Linux
commands that don't read any data from files or standard input:

    ls, date, who, pwd, echo, cd, hostname, ps, sleep   # etc. NEVER READ DATA from STDIN

All the above commands have in common the fact that they *never* open any
files for reading on the command line.
If a command never reads any data from any files, it will never read any data
from standard input, and it will never read data from your keyboard or
anywhere else.

The Unix/Linux copy command `cp` obviously reads content from files, but it
never reads file data from standard input because, as written, it always has
to have both a source and destination pathname argument. The `cp` command
must always have an input file name. It never reads file data from standard
input.

Redirection of standard input from a file: `<file`
--------------------------------------------------

### `wc file`

    $ wc file
     1  6 29 file

-   there is no redirection syntax used here; the shell will *not* open any
    files
-   the `wc` command has a pathname argument, which means it ignores
    **stdin**
-   the `wc` command reads data from the file `file` that it opens itself
-   the `wc` command is the program that is opening the file argument
    `file`, not the shell
-   any errors will come from the `wc` command, not the shell, and will
    mention the file name given on the command line, e.g.:

        $ wc /etc/shadow
        wc: /etc/shadow: Permission denied

    Note how it is the `wc` program that issues the error message, above.

### `wc <file`

With `<file`, it is the shell (not the command) that opens the file and
connects it to the standard input of the command. Do not use the same file
for both input redirection and output redirection; the shell truncates the
output file before the command runs:

    $ wc <myfile >myfile            # WRONG! myfile is truncated empty!
    $ sort <myfile >myfile          # WRONG! myfile is truncated empty!
    $ head <myfile >myfile          # WRONG! myfile is truncated empty!
    $ tr a-z A-Z <myfile >myfile    # WRONG! myfile is truncated empty!

Given the above, why is `myfile` not left empty in the following case?

    $ wc <myfile >myfile            # WRONG! myfile is truncated empty!
    $ cat myfile                    # What is in the file "myfile" now?

Hint: What happens when `wc` counts nothing? Is there no output?

Redirection into programs using `|` (pipes)
===========================================

Since the shell can redirect both the output of programs and the input of
programs, it can connect (redirect) the output of one program directly into
the input of another program without using any files in between.
This output-to-input redirection is called **piping** and uses the "pipe"
meta-character '`|`' that is usually located above the backslash key '`\`' on
your keyboard.  Using it looks like this:

    $ date
    Mon Feb 27 06:37:52 EST 2012
    $ date | wc                  # wc counts the output of "date"
          1       6      29

Three Rules for Pipes
---------------------

Here are three major rules that apply to useful pipes:

> 1. Pipe redirection is done by the shell, first, before file redirection.
> 2. The command on the left of the pipe must produce some standard output.
> 3. The command on the right of the pipe must want to read standard input.

Using the pipe meta-character `|` between commands
--------------------------------------------------

The shell meta-character `|` ("pipe") is similar to semicolon `;` in that it
signals the start of another command on the command line.  The pipe is
different because the standard output (only **stdout**; not **stderr**) of
the command on the immediate left of the pipe `|` is attached/connected
("piped") to the standard input of the command on the immediate right of the
pipe:

    $ date
    Mon Feb 27 06:37:52 EST 2012
    $ date | wc                  # wc counts the output of "date"
          1       6      29

    $ echo hi
    hi
    $ echo hi | wc               # wc counts the output of "echo hi"
          1       1       3

*(Note that the invisible newline character at the end of a line is also
counted by `wc` in the above example.)*

It is the **shell** that is redirecting the standard output of the command on
the left into the standard input of the command on the right.  As with all
redirection, the shell does this redirection before it finds and runs any
commands.  The commands themselves do not see the redirection.

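The `echo hi | wc` example above can be repeated one count at a time. This is only a minimal sketch, assuming a POSIX-style shell; the variable names `lines`, `words` and `bytes` are made up for the illustration, and `tr -d ' '` is there only to strip the padding blanks that some versions of `wc` print:

```shell
# Pipe the standard output of "echo hi" into the standard input of "wc".
# wc -l counts lines, wc -w counts words, wc -c counts bytes.
lines=$(echo hi | wc -l | tr -d ' ')
words=$(echo hi | wc -w | tr -d ' ')
bytes=$(echo hi | wc -c | tr -d ' ')

# The byte count is 3, not 2, because echo adds a newline after "hi".
echo "lines=$lines words=$words bytes=$bytes"   # prints: lines=1 words=1 bytes=3
```

In each pipeline the shell sets up the pipes first; neither `wc` nor `tr` is told where its input comes from.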

Piped output flows immediately without temporary files
------------------------------------------------------

You can approximate some of the behaviour of a pipe between two commands by
using an intermediate file for intermediate storage of the output of the
first command before using the second command to read that output:

    $ nl /etc/passwd >out   # save all the first command's standard output in a file
    $ head <out             # the second command reads the saved output

    $ find / -ls >out       # huge output of find has to finish first (slow)
    $ less out              # now we can display the output of "find"

Using a pipe, the output from `find` can start to appear in `less` right
away, before the `find` command has finished generating all the output:

    $ find / -ls | less     # huge output of find goes directly into "less"

Pipes don't need to wait for the first command to finish before the second
command starts reading the output of the first.  The output starts flowing
immediately through the pipe because *both* commands are actually running
*simultaneously*.

The pipe also requires no intermediate file to hold the output of the first
command, and so as soon as the command on the left of the pipe starts
producing standard output, it goes directly into the standard input of the
command on the right.

If the command on the left of the pipe never finishes, the command on the
right will read all the input that is currently available and then continue
to wait for more input, processing it as soon as it appears.

If the command on the left of the pipe does finish, the command on the right
sees an EOF (end-of-file) on the pipe (its standard input).  As with EOF from
a file, EOF usually means that the command on the right will finish
processing, produce its last output, and exit.

Pipe-splitting happens before file redirection
----------------------------------------------

As with semicolon meta-characters `;`, the shell recognizes pipe characters
and splits a command line into piped commands first, *before* doing file
redirection.
File redirection happens second (after pipe splitting), and if present, has
precedence over pipe redirection.  (The file redirection is done *after* pipe
splitting, so it always wins, leaving nothing for the pipe.)

    $ ls -l | wc            # correct - output of ls goes into the pipe
          2      11      57
    $ ls -l >out | wc       # WRONG! - output of ls goes into the file
          0       0       0 # wc reads an empty pipe and outputs zeroes

This is why in the above pipe `wc` has no characters to count from `ls`:

1.  First, the shell splits the command line on the pipe, redirecting the
    output of the command on the left into the input of the command on the
    right, without knowing anything about what the commands might be.
2.  Next, the shell does the standard output file redirection on the `ls`
    command on the left of the pipe and changes the `ls` standard output away
    from the pipe into the file `out`.
3.  Finally, the shell finds and runs both commands simultaneously:
    - All the standard output from `ls` on the left goes into the file
      `out`; nothing is available to go into the pipe.
    - The `wc` command on the right of the pipe counts an empty input from
      the pipe and outputs zeroes: `0 0 0`

**Remember:** Redirection can only go to *one* place, and file redirection
always wins over pipes, because it is done after pipe splitting:

    $ ls /bin >out          # all output from ls goes into file "out"
    $ ls /bin >out | wc     # WRONG! output goes into "out", not into pipe
          0       0       0 # wc counts an empty input from the pipe

Pipes: You can only redirect what you can see
---------------------------------------------

As with output redirection into files, you can only redirect into a pipe the
standard output that you can *see*.
Using redirection never creates output, even when using pipes:

    $ ls /bin >out               # all output from ls goes into file "out"
    $ ls /bin >out | wc          # nothing goes into the pipe to "wc"
          0       0       0      # wc counts an empty input from the pipe

    $ cp /etc/passwd x           # no output visible on standard output
    $ cp /etc/passwd x | wc      # nothing goes into the pipe to "wc"
          0       0       0      # wc counts an empty input from the pipe

    $ cd /tmp                    # no output visible on standard output
    $ cd /tmp | wc               # nothing goes into the pipe to "wc"
          0       0       0      # wc counts an empty input from the pipe

    $ touch x ; rm x             # no output from rm on standard output
    $ touch x ; rm x | wc        # nothing goes into the pipe to "wc"
          0       0       0      # wc counts an empty input from the pipe

You can only redirect output that you can *see*.

Redirecting stderr using `2>&1` with pipes
------------------------------------------

As with file redirection, pipes only redirect Standard Output (**stdout**)
from commands, not Standard Error Output (**stderr**).  Standard Error Output
still goes directly to your screen; it does not go into a pipe:

    $ ls /etc/passwd nosuchfile                                # no redirection used
    ls: cannot access nosuchfile: No such file or directory    # STDERR unit 2
    /etc/passwd                                                # STDOUT unit 1

    $ ls /etc/passwd nosuchfile | wc                           # only stdout is redirected to "wc"
    ls: cannot access nosuchfile: No such file or directory    # STDERR unit 2
          1       1      12                                    # stdout went into the pipe to "wc"

You need the special syntax "`2>&1`" to redirect both **stdout** and
**stderr** into a pipe.  Recall that "`2>&1`" means "redirect standard error
to go to the same place as standard output", so if standard output is already
going into a pipe (and remember pipe splitting happens first), "`2>&1`" will
send standard error into the pipe too:

    $ ls /etc/passwd nosuchfile 2>&1 | wc    # both stdout and stderr redirected
          2      10      68                  # wc counts both lines from pipe

The "`2>&1`" above happens *after* pipe-splitting; it works because
pipe-splitting happens first and Standard Output is already redirected into
the pipe.
It sends Standard Error to the same place, i.e. into the pipe.

Alternate BASH shell pipe syntax `|&` instead of `2>&1`
-------------------------------------------------------

Some shells (including the BASH shell) allow a "`|&`" pipe syntax to redirect
both stderr and stdout into the pipe.  These are equivalent in the BASH
shell:

    $ ls /etc/passwd nosuchfile 2>&1 | wc   # both stdout and stderr redirected (all Bourne-style shells)
    $ ls /etc/passwd nosuchfile |& wc       # both stdout and stderr redirected (BASH; not all shells)

Not all shells recognize the "`|&`" pipe syntax.  (The `/bin/sh` shell on
Ubuntu systems does not!)  Don't use the `|&` syntax inside a shell script;
use the standard "`2>&1`" instead, which works with all Bourne-style shells.

Using commands without pathnames as filters in pipes
----------------------------------------------------

Many Unix/Linux commands can be made to act as **filters** in pipelines.  A
filter command has no file name arguments and doesn't open any files itself.
The filter command reads its input lines from its **standard input** that is
usually connected to a pipe on its left.  The filter command writes its
output to **standard output**, which might often be into another pipe and
filter command on its right.  The filter command has no file name arguments
of its own to process.

With no file name arguments on the command line, filter commands read from
standard input and write to standard output.  The shell uses pipes to provide
redirection for both standard input and standard output:

    $ fgrep "/bin/sh" /etc/passwd | sort | head

The `fgrep` command above is reading from the filename argument `/etc/passwd`
given on the command line.  The output of the `fgrep` command always goes to
standard output, which in the above command pipeline means the output goes
into the pipe, not onto the screen.

The `sort` and `head` commands above have no file name arguments to read.
Without file name arguments, each of the commands reads from its standard
input, which is set up to be from the pipes created by the shell.  Both
`sort` and `head` have no file name arguments and are acting as filter
commands.  (The `fgrep` command is technically not a filter -- it is reading
from the supplied pathname argument, not from standard input.)

Lines of input are sent through a pipe into the standard input of a filter
command (such as `sort` and `head`, above).  The filter command reads the
lines from the pipe, filters them in some way, and sends the result into
another pipe (or perhaps onto your screen, or into an output file with
redirection, if the command is the last one in the pipeline).

Filter commands read from standard input (not from a file name) and they
write to standard output.

Using successive filters in pipes
---------------------------------

*You can only redirect what you can see,* so if you use a command to select
some lines from a file and then send those lines into a second filter command
via a pipe, remember that it is only the selected lines that are being read
by that second filter command, not the original file.

Filter commands in pipelines read their input from the output of other
commands, through pipes; they don't read directly from files.

Below is an example that shows how a second `fgrep` in a pipeline searches
for its text pattern in the *output* of the first `fgrep`, not in the
original file.

In the example below, looking for the word `mail` in the file `/etc/services`
finds five lines.  Looking for the word `file` in the file `/etc/services`
also finds five lines, but they are a different five lines.
There are no lines in that file with both words in them:

    $ fgrep 'mail' /etc/services
    smtp            25/tcp          mail
    re-mail-ck      50/tcp          # Remote Mail Checking Protocol
    re-mail-ck      50/udp
    mailq           174/tcp         # Mailer transport queue for Zmailer
    mailq           174/udp

    $ fgrep 'file' /etc/services
    remotefs        556/tcp         rfs_server rfs  # Brunhoff remote filesystem
    afs3-fileserver 7000/tcp        bbs             # file server itself
    afs3-fileserver 7000/udp        bbs
    supfilesrv      871/tcp         # SUP server
    supfiledbg      1127/tcp        # SUP debugging

    $ fgrep 'file' /etc/services | fgrep 'mail'     # pipeline gives NO OUTPUT !!!
    $ fgrep 'mail' /etc/services | fgrep 'file'     # pipeline gives NO OUTPUT !!!

The two `fgrep` pipeline command lines at the end of the above example give
no output, because none of the lines that contain the text string `file` also
contain the text string `mail`, and vice-versa.  In each example pipeline
above, the second `fgrep` is searching for its pattern in the *output* of the
first `fgrep`, and the second pattern is not in any of the lines output by
the first `fgrep`.

A line in the file would have to contain both text strings `mail` and `file`
to pass through both `fgrep` commands in the pipe.  The first `fgrep` selects
lines with one text string and then the second `fgrep` reads the output of
the first `fgrep` and looks for the second text string.  Lines must contain
both strings to be output.  No lines contain both strings in the example.
There is no output.

If we change the second `fgrep` in the pipeline to select a word that *is* in
the output of the first `fgrep`, it finds a line to output:

    $ fgrep 'mail' /etc/services | fgrep 'Remote'
    re-mail-ck      50/tcp          # Remote Mail Checking Protocol

    $ fgrep 'Remote' /etc/services | fgrep 'mail'
    re-mail-ck      50/tcp          # Remote Mail Checking Protocol

The output line is the only line from `/etc/services` that contains both the
word `mail` *and* the word `Remote` in it.  It doesn't matter which word you
search for first; the order of the searches doesn't matter.
In both cases, the output is the only line that has both words in it.

Successive filter commands can be used to select lines that contain multiple
strings in a line.

### Example 1: Count `ssh` break-in attempts in January

We are asked to count the number of times the machine rejected an SSH
break-in attempt in the month of January.

Here is a practical example showing the use of a **filter** command that
reads from standard input and writes to standard output.  We need to look for
lines in the system log file `auth.log` that contain both the string
`'refused connect'` and the date string for January.

Here is a sample `auth.log` input file that we will use in the following
example (484 lines): [`auth.log`]  This sample file was taken from an actual
`/var/log/auth.log` file.

First, we need to extract from the log file only the lines that indicate a
rejected break-in attempt.  Since there could be thousands of lines of output
in a real system log file, we always pipe the large output into a command
`head` that limits the output on our screen to only ten lines:

    $ fgrep 'refused connect' auth.log | head
    Sep  2 02:51:01 refused connect from 61.174.49.108 (61.174.49.108)
    Sep  4 09:05:00 refused connect from 193.107.17.72 (193.107.17.72)
    Sep  5 03:27:11 refused connect from 61.144.43.235 (61.144.43.235)
    Sep  6 05:53:51 refused connect from 122.225.109.208 (122.225.109.208)
    Sep  8 06:28:53 refused connect from 116.10.191.180 (116.10.191.180)
    Sep 10 15:30:18 refused connect from 122.225.109.105 (122.225.109.105)
    Sep 22 12:11:22 refused connect from 211.143.243.35 (211.143.243.35)
    Sep 30 04:11:02 refused connect from 220.177.198.39 (220.177.198.39)
    Oct  3 01:09:02 refused connect from 61.174.51.235 (61.174.51.235)
    Oct  3 19:54:33 refused connect from 117.21.173.35 (117.21.173.35)

    $ fgrep 'refused connect' auth.log | wc
        100     800    7055

    $ fgrep -c 'refused connect' auth.log
    100

Looking at the output, we see that every line has the month abbreviation at
the start of the line.
We only want January dates, so we use the date string `'Jan '` in another
`fgrep` filter to further restrict the output to only lines containing both
`'refused connect'` *and* `'Jan '`.  (Note the trailing blank in the date
string.)

    $ fgrep 'refused connect' auth.log | fgrep 'Jan ' | head
    Jan  2 15:43:42 refused connect from 221.235.188.212 (221.235.188.212)
    Jan  2 15:46:46 refused connect from 221.235.188.212 (221.235.188.212)
    Jan  2 15:49:48 refused connect from 221.235.188.212 (221.235.188.212)
    [... etc ...]

    $ fgrep 'refused connect' auth.log | fgrep 'Jan ' | wc
         26     208    1948

    $ fgrep 'refused connect' auth.log | fgrep -c 'Jan '
    26

Below are the functions of the two commands in the above pipeline.  The
second `fgrep` command is acting as a **filter** command, reading Standard
Input from a pipe and writing output to Standard Output (to the screen).

1.  The first `fgrep` command selects the lines containing the text string
    `'refused connect'` inside the `auth.log` file.  The [output] of this
    first command (only lines containing the `'refused connect'` string) goes
    into the first pipe, not onto the screen.
2.  The second `fgrep` reads *the output of the first `fgrep`* from the pipe
    and only selects (and counts, using the `-c` option) lines that *also*
    contain the date pattern for January `'Jan '` (with a trailing blank).
    The lines being selected and counted have to contain *both* the string
    `'refused connect'` from the first `fgrep` *and* the string `'Jan '` from
    the second `fgrep`.  The [output][1] of this second `fgrep` (a count of
    lines containing both strings: 26) displays on the screen.

> When filtering output by date, always look in the file you are filtering to
> see what format date is being used on each line.  Use the date format found
> in the file.

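The two-filter technique of Example 1 can be tried without the real log file. The sketch below is an illustration only: the five log lines are made up (they are not taken from the sample `auth.log`), the variable names are hypothetical, and `grep -F` is used as the modern spelling of `fgrep`:

```shell
# Build a small, made-up log file (hypothetical data, not the real auth.log).
log=$(mktemp)
cat >"$log" <<'EOF'
Dec 30 23:10:01 refused connect from 192.0.2.10 (192.0.2.10)
Jan  2 15:43:42 refused connect from 192.0.2.10 (192.0.2.10)
Jan  2 15:46:46 session opened for user alice
Jan  5 09:12:00 refused connect from 198.51.100.7 (198.51.100.7)
Feb  1 00:00:05 refused connect from 203.0.113.9 (203.0.113.9)
EOF

# First filter alone: count lines containing 'refused connect'.
refused=$(grep -F -c 'refused connect' "$log")

# Two filters: the second grep reads only the output of the first grep
# from the pipe, so it counts lines that contain BOTH strings.
january=$(grep -F 'refused connect' "$log" | grep -F -c 'Jan ')

echo "refused=$refused january=$january"   # prints: refused=4 january=2
rm -f "$log"
```

The `session opened` line is dated in January but never reaches the second filter, because the first filter did not select it.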

### Example 2: Count shells in the password file

The last (seventh) colon-separated field in the system password file
`/etc/passwd` contains the name of the login shell given to the user when the
user logs in:

    $ head /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    sync:x:4:65534:sync:/bin:/bin/sync
    [... etc ...]

(A blank/empty field means use the *default* shell, which on Linux systems is
usually `/bin/sh`, which is often a link to `/bin/bash`.)

In this example we must "Count the number of each kind of shell in
`/etc/passwd` and display the top four results sorted in descending numeric
order."  We will build up the answer iteratively using a pipeline:

- 2-A. Extract just the shell field from each line in the password file.
- 2-B. Count the identical shells.
- 2-C. Display the top four most used shells in descending order of use.

**Problem 2-A:** Extract just the shell field from each line.

**Solution 2-A:** Use the `cut` command that extracts from input lines fields
separated by a delimiter.  Since there could be thousands of lines of output,
we pipe the large output into a command that limits the output on our screen
to ten lines:

    $ cut -d : -f 7 /etc/passwd | head
    /bin/bash
    /usr/sbin/nologin
    /bin/sync
    [... etc ...]

We now have a list of shells, in the order that they appear in the password
file.  On to the next problem: 2-B.

**Problem 2-B:** Count the identical shells.

**Solution 2-B:** The `uniq` command can count identical lines in an input
file (or from standard input) using the `-c` option, but the identical lines
have to be adjacent.  We can sort the lines to make all the shell lines
adjacent so that they can be counted, then add `uniq -c` to count the sorted
lines.  First, we add the sort to the pipeline, check the output, then we add
the `uniq -c` to the pipeline:

    $ cut -d : -f 7 /etc/passwd | sort | head
    /bin/bash
    /bin/bash
    /bin/bash
    [... etc ...]

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c
       1170 /bin/bash
         23 /bin/false
          1 /bin/sh
          1 /bin/sync
         16 /usr/sbin/nologin
        697 /usr/sbin/nologin_lock.sh

The output of `uniq -c` shows the counts of each shell, but the counts are
not sorted in descending order, and there are more than four lines of output.
On to the next problem: 2-C.

**Problem 2-C:** Display the top four most used shells in descending order of
use.

**Solution 2-C:** First we add another `sort` to the pipeline, using options
to sort the count numbers numerically and in descending (reverse) order, then
we add a final `head` command to limit the output to four lines:

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr
       1170 /bin/bash
        697 /usr/sbin/nologin_lock.sh
         23 /bin/false
         16 /usr/sbin/nologin
          1 /bin/sync
          1 /bin/sh

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 4
       1170 /bin/bash
        697 /usr/sbin/nologin_lock.sh
         23 /bin/false
         16 /usr/sbin/nologin

Summary:

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 4

1.  The `cut` command picks out colon-delimited field 7 in each line in the
    password file and sends just those fields (the shell name) into the pipe.
2.  The `sort` command reads the shell names from the pipe and puts all the
    shell names in sorted ascending order and sends the sorted names into
    another pipe.
3.  The `uniq` command reads the sorted names from the pipe and counts the
    number of adjacent identical names.  The output for each unique name is
    the count followed by the name.  The output goes into another pipe.
4.  The `sort` command reads the lines containing the count and the shell
    name and it sorts the lines numerically (using the count field) and in
    reverse.  Those sorted lines go into another pipe.
5.  The `head` command reads the sorted lines from the pipe and selects only
    the first four lines.  Only those four lines display on the screen.

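The five steps summarized above can be sketched on a tiny stand-in for the password file. Everything in the data is invented for the illustration (the six accounts and the temporary file are hypothetical), and the sketch keeps only the top two shells because the sample is so small:

```shell
# Made-up password-file lines (hypothetical accounts; field 7 is the shell).
pw=$(mktemp)
cat >"$pw" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
bob:x:1001:1001:Bob:/home/bob:/bin/bash
www:x:33:33:www:/var/www:/usr/sbin/nologin
EOF

# cut selects field 7; sort groups identical shells together; uniq -c counts
# each group; sort -nr puts the biggest counts first; head keeps the top two.
top=$(cut -d : -f 7 "$pw" | sort | uniq -c | sort -nr | head -n 2)

# awk is used here only to remove the padding that uniq -c puts before counts.
echo "$top" | awk '{print $1, $2}'
rm -f "$pw"
```

With this sample data the two lines displayed are `3 /bin/bash` and `2 /usr/sbin/nologin`. Removing the first `sort` is a useful experiment: `uniq -c` then counts only runs of adjacent identical lines, and the totals come out wrong.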

### Example 3: Count IP addresses used in SSH break-in attempts in January

In this Example showing the use of multiple **filter** commands, we use
filter commands to find the unique IP addresses used in SSH break-in attempts
in January and then count how many times each IP address was used.  This
Example uses features of the previous two Examples.

As in the first Example above, we need to look for lines in the system log
file `auth.log` that contain both the string `'refused connect'` and the date
string `'Jan '`.  Instead of counting all of them together, we need to
extract the IP address from each line and count the number of times each IP
address appears.  Counting occurrences was a feature of the second Example,
above.

Here is the solution, using features of both previous Examples:

    $ fgrep 'refused connect' auth.log | fgrep 'Jan ' \
      | awk '{print $NF}' \
      | sort | uniq -c | sort -nr

Below are the functions of the six commands in the above pipeline.  Five of
the commands are acting as **filter** commands, reading Standard Input from a
pipe and writing output to Standard Output (often, to another pipe, except
for the last command that writes on the screen).

This example uses the same sample `auth.log` input file that we used earlier
(484 lines): [`auth.log`]

1.  The first `fgrep` command selects the lines containing the text string
    `'refused connect'` inside the `auth.log` file.  The [output] of this
    first command (only lines containing the `'refused connect'` string) goes
    into the first pipe, not onto the screen.
2.  The second `fgrep` reads *the output of the first `fgrep`* from the pipe
    and only selects lines that *also* contain the date pattern for January
    `'Jan '`.  The lines being selected have to contain both the string
    `'refused connect'` from the first `fgrep` *and* the string `'Jan '` from
    the second `fgrep`.  The [output][1] of this second `fgrep` (lines
    containing both strings) goes into another pipe.
3.  The `awk` command reads the selected lines from the pipe.
    It displays just the last field (`NF`) on each line, which happens to be
    the IP address used by the attacker.  The [`awk` output] (a list of IP
    addresses, one per line) goes into another pipe.  (The list of addresses
    is not in sorted order; the addresses are in whatever order they appear
    in the input file.)
4.  The first (leftmost) `sort` command reads lines of IP addresses from the
    pipe.  It sorts all the IP addresses together so that `uniq` can count
    them, and sends the [`sort` output] (the sorted lines) into another pipe.
5.  The `uniq -c` command reads the sorted list of IP addresses from the
    pipe.  It counts how many adjacent addresses are the same and sends the
    [`uniq` output] (lines with the count followed by the IP address with
    that count) into another pipe.
6.  The `sort -nr` command reads the lines with the counts and IP addresses
    from the pipe.  It sorts numerically and in reverse (descending) order
    the lines containing the leading count numbers and sends the
    [second `sort` output] (sorted lines, each containing a count and an IP
    address) onto the screen.

**Note** the use of two `sort` commands.  The first `sort` takes an unordered
list of IP addresses and sorts them so that all the same IP addresses are
together, so that the `uniq` command can count them.  (The `uniq` command can
only count *adjacent* lines in an input stream.)  Without the first `sort`,
the IP addresses wouldn't be all together and wouldn't be counted correctly
by `uniq`.  The second `sort` command sorts the *output* of `uniq`
numerically and in reverse and puts the IP addresses with the largest counts
first.  Both `sort` commands are needed.

### Example 4: Select a range of lines from a file

**Problem:** Display only lines 6-10 of the password file.

Solution: Extract the first 10 lines of the file, and from those 10 lines
extract just the last five lines, which are lines 6-10.  You can use the `nl`
command to add line numbers to the file to confirm your solution.


    $ head /etc/passwd | tail -n 5
    $ nl /etc/passwd | head | tail -n 5

**Problem:** Display only the second-to-last line of the password file.

Solution: Extract the last two lines of the file, and from those last two
lines extract just the first line, which is the second-to-last line.

    $ tail -n 2 /etc/passwd | head -n 1

### Example 5: Select large files

**Problem:** Which five (non-hidden) files in the current directory are
largest?

    $ ls -s | sort -nr | head -n 5

The `-s` option outputs the size of the file in blocks as a number at the
start of every line, which makes it easy to sort the lines numerically.

Here is another answer that uses some sort options to pick which field to
sort:

    $ ls -l | sort -k 5,5nr | head -n 5

If we want to sort by file size in bytes, the size in bytes is the fifth
field in the output of `ls -l`.  We have to give `sort` some options that
tell it to sort using the fifth field of every line.  The above sort command
is sorting by the fifth field, numerically, in reverse.

### Example 6: Using `elinks` to fetch and search formatted web pages

For the examples below, we need a program that fetches formatted web pages
(or RSS pages) from the Internet.  We will use the `elinks` text web browser
with some options.
Because we are typing commands into an interactive shell, we will define a
shell alias for `elinks` and its list of required arguments, to make the
examples below shorter to type and display:

    $ alias ee='elinks -dump -no-numbering -no-references'

*Don't use aliases inside script files that you write.*

**Problem:** Display the dates of the Midterm tests from the
[Course Home Page]:

    $ ee 'http://teaching.idallen.com/cst8207/19w/' | fgrep 'Midterm'

**Problem:** Display weekly course notes file modify dates:

    $ ee 'http://teaching.idallen.com/cst8207/19w/notes/' | fgrep 'notes.txt'

**Problem:** Display the assignment file modify dates from the Course Notes:

    $ ee 'http://teaching.idallen.com/cst8207/19w/notes/' | fgrep 'assignment'

**Problem:** Display current Ottawa weather temperature:

    $ ee 'http://weather.gc.ca/rss/city/on-118_e.xml' | fgrep 'Temperature:'

**Problem:** Display the current BBC weather for Vancouver:

    $ ee 'http://www.bbc.co.uk/weather/6173331' \
      | fgrep -A19 'Observations' | tail -n 20

**Problem:** Display the current Space Weather forecast for Canada:

    $ ee 'http://www.spaceweather.gc.ca/forecast-prevision/short-court/sfst-1-eng.php' \
      | fgrep 'Current Conditions'

**Problem:** Display the current phase of the Moon:

    $ ee 'http://www.die.net/moon/' \
      | fgrep -A2 'Moon Phase' | head -n 3 | tail -n 1

Misuse of pipes
---------------

There are many ways to misuse pipes.  Here are some common ones.

### Give file names to commands where possible

If a command does read from file names supplied on the command line, it is
more efficient to let it open its own file name than to use `cat` to open the
file and feed the data to the command on standard input.  (There is less data
copying done!)


Do not do this (wasteful of processes and I/O and flags you as a novice):

    $ cat /etc/passwd | head             # DO NOT DO THIS - INEFFICIENT
    $ cat /etc/passwd | sort             # DO NOT DO THIS - INEFFICIENT
    $ cat /etc/passwd | fgrep 'root:'    # DO NOT DO THIS - INEFFICIENT

Do this: Give the file name(s) directly to the commands, like this:

    $ head /etc/passwd
    $ sort /etc/passwd
    $ fgrep 'root:' /etc/passwd

**Let commands open their own files; don't feed them with `cat` and
unnecessary pipes.**

### Commands with file arguments never read data from Standard Input

If a Unix/Linux command that can open and read the contents of pathnames is
not given any pathnames to open, it usually reads input lines from standard
input (stdin) instead:

    $ wc /etc/passwd     # wc reads /etc/passwd, ignores stdin and your keyboard
    $ wc                 # without a file name, wc reads stdin (your keyboard)

If the command is given a pathname, it reads from the pathname and *always*
ignores standard input, even if you try to send it something:

    $ date | wc foo      # WRONG! wc opens and reads file foo; wc ignores stdin

The above applies to every command that reads file content, e.g.:

    $ date | head foo    # WRONG! head opens and reads file foo; head ignores stdin
    $ date | less foo    # WRONG! less opens and reads file foo; less ignores stdin

If you want a command to read **stdin**, you *cannot* give it any file name
arguments.  Commands with file name arguments *ignore* standard input; they
should not be used on the right side of a pipe.

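A quick way to convince yourself of this rule is to send a command a known number of lines on standard input while also giving it a pathname. The following is a sketch only, assuming a POSIX-style shell; the temporary file and the variable names are invented for the illustration:

```shell
# A made-up two-line file (hypothetical name and contents).
f=$(mktemp)
printf 'alpha\nbeta\n' >"$f"

# No pathname: wc reads the three lines coming down the pipe.
from_pipe=$(printf '1\n2\n3\n' | wc -l | tr -d ' ')

# Pathname given: wc opens the file itself and never reads the pipe,
# so it reports the two lines in the file, not the three in the pipe.
from_file=$(printf '1\n2\n3\n' | wc -l "$f" | awk '{print $1}')

echo "from_pipe=$from_pipe from_file=$from_file"   # prints: from_pipe=3 from_file=2
rm -f "$f"
```

The three lines sent into the second pipeline are simply thrown away; `wc` gives no warning that its standard input was ignored.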

Commands that are ignoring standard input (because they are opening and
reading from pathnames on the command line) will always ignore standard
input, no matter what silly things you try to send them on standard input:

    $ echo hi | head /etc/passwd   # WRONG: head has a pathname and ignores stdin
    $ echo hi | tail /etc/group    # WRONG: tail has a pathname and ignores stdin
    $ echo hi | wc .vimrc          # WRONG: wc has a pathname and ignores stdin
    $ sort a | cat b               # WRONG: cat has a pathname and ignores stdin
    $ cat a | sort b               # WRONG: sort has a pathname and ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read **stdin**; it's up to the command.
The command must *want* to read standard input, and it will *only* want to
read standard input if you *leave off all the file names*.

### Some commands never read data from Standard Input

Commands that do not open and process the *contents* of files usually ignore
standard input, no matter what silly things you try to send them on standard
input.  All these commands will never read standard input:

    $ echo hi | ls           # NO: ls doesn't open files - always ignores stdin
    $ echo hi | pwd          # NO: pwd doesn't open files - always ignores stdin
    $ echo hi | cd           # NO: cd doesn't open files - always ignores stdin
    $ echo hi | date         # NO: date doesn't open files - always ignores stdin
    $ echo hi | chmod +x .   # NO: chmod doesn't open files - always ignores stdin
    $ echo hi | rm foo       # NO: rm doesn't open files - always ignores stdin
    $ echo hi | rmdir dir    # NO: rmdir doesn't open files - always ignores stdin
    $ echo hi | echo me      # NO: echo doesn't open files - always ignores stdin
    $ echo hi | mv a b       # NO: mv doesn't open files - always ignores stdin
    $ echo hi | ln a b       # NO: ln doesn't open files - always ignores stdin

Some commands that open and read file contents *only* operate on file name
arguments and never read **stdin**:

    $ echo hi | cp a b       # NO: cp opens arguments - always ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read **stdin**; it's up to the command.

Commands that might read standard input will do so only if *no* file name
arguments are given on the command line.  The presence of any file arguments
will cause the command to ignore standard input and process the file(s)
instead, and that means they cannot be used on the right side of a pipe to
read standard input.

File name arguments always win over standard input.

### Do not use pathnames on filter commands in pipelines

Remember: If a file name is given to a command on the command line, the
command ignores standard input and only operates on the file name.

The very long sequence of pipes below is pointless -- the last (rightmost)
command `head` has a pathname argument and it will open and read it, ignoring
all the standard input coming from all the pipes on the left:

    $ fgrep "/bin/sh" /etc/passwd | sort | head /etc/passwd   # WRONG!

The `head` command is ignoring the standard input coming from the pipe and is
reading directly from its `/etc/passwd` filename argument.  The `fgrep` and
`sort` commands are doing a lot of work for nothing, since `head` is not
reading the output of `sort` coming down the pipe.  The `head` command is
reading from the supplied file name argument `/etc/passwd` instead.  File
names take precedence over standard input.

The above long but malformed pipeline is equivalent to this (same output):

    $ head /etc/passwd

Don't make the above mistake.  Filter commands must not have file name
arguments; they must read standard input from the pipe.  If you give a
command a file to process, it will ignore standard input, and so a command
with a file name must not be used on the right side of any pipe.

### Don't use redirection output file as input file anywhere in pipeline

The following command line redirection is faulty (an input file on the left
is also used as an output file on the right); however, it sometimes works for
small files:

    $ cat foo bar | tr 'a' 'b' | fgrep "lala" | sort | head >foo   # WRONG!

There is a critical race between the first `cat` command trying to read the
data out of file `foo` and the shell truncating `foo` to zero length when it
launches the `head` command at the right end of the pipeline.  Depending on
the system load and the size of the file, `cat` may or may not get out all
the data before the `foo` file is truncated or altered by the shell in the
redirection at the end of the pipeline.

Don't do this.  Don't depend on long pipelines saving you from bad
redirection!  Never redirect output into a file that is being used as input
in the same command or anywhere in the command pipeline.

Summary: Three Rules for Pipes
------------------------------

> 1.  Pipe redirection is done by the shell, first, before file redirection.
> 2.  The command on the left of the pipe must produce some standard output.
> 3.  The command on the right of the pipe must want to read standard input.

Never use a redirection output file as an input file anywhere in a pipeline!

Unique STDIN and STDOUT
=======================

There is only one standard input and one standard output for each command.
Each can only be redirected to *one* other place.  You cannot redirect
standard input from two different places, nor can you redirect standard
output into two different places.

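The "only one place" rule can be checked with a small sketch.  It assumes a Bourne-style shell such as `sh` or `bash`; the scratch directory and the file names `a`, `b`, and `c` are examples only and are not part of these notes.

```shell
# Sketch for a Bourne-style shell (sh, bash); file names a, b, c are examples only.
workdir=$(mktemp -d)                 # scratch directory; no real files are touched

# Three output redirections on one command: the shell opens all three files,
# but standard output ends up attached only to the rightmost one.
echo hi >"$workdir/a" >"$workdir/b" >"$workdir/c"

report=""
for f in a b c; do
    if [ -s "$workdir/$f" ]; then    # -s is true only for a non-empty file
        report="$report$f:data "
    else
        report="$report$f:empty "
    fi
done
echo "$report"
rm -r "$workdir"
```

The report should show that only the last file received any data; the earlier files exist but are empty.
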
The Bourne shells (including BASH) do not warn you that you are trying to
redirect the input or output of a command from or to two or more different
places (and that only one of the redirections will work -- the others will be
ignored):

    $ date >a >b >c >d >e

-   The `date` output goes only into the rightmost file `e`.
-   The other four output files are each created and truncated by the shell
    but they are all left empty because only the final redirection into `e`
    wins.

The same applies when a file redirection competes with a pipe:

    $ date >out | wc
    0 0 0

-   The `date` output goes into file `out`.  Nothing goes into the pipe and
    `wc` outputs zeroes.  (File redirection is done second and always wins
    over pipe redirection.)

> Some shells (including the "C" shells, but not the Bourne shells) will try
> to warn you about silly shell redirection mistakes:
>
>     csh% date <a <b <c
>     Ambiguous input redirect.
>
>     csh% date | cat <a
>     Ambiguous input redirect.
>
>     csh% date >a >b >c
>     Ambiguous output redirect.
>
>     csh% date >a | wc
>     Ambiguous output redirect.
>
> The C shells tell you that you can't redirect **stdin** or **stdout**
> to/from more than one place at the same time.  Bourne shells do not tell you
> -- they simply ignore the "extra" redirections and do only the last one of
> each.

Throwing away input/output using `/dev/null`
============================================

There is a special file on every Unix/Linux system into which you can
redirect output that you don't want to keep or see: `/dev/null`.

The following command generates some error output we don't like to see:

    $ cat * >/tmp/out
    cat: course_outlines: Is a directory      # errors print on STDERR
    cat: jclnotes: Is a directory             # errors print on STDERR
    cat: labs: Is a directory                 # errors print on STDERR
    cat: notes: Is a directory                # errors print on STDERR

We can throw away the errors (stderr, unit 2) into `/dev/null`:

    $ cat * >/tmp/out 2>/dev/null

The file `/dev/null` never fills up; it just eats and throws away output.

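A minimal sketch of what is kept and what is lost when **stderr** goes to `/dev/null`.  It assumes `/etc/passwd` exists and that `/nosuchfile` (a made-up name) does not: the error message disappears, but the exit status of the command still records the failure.

```shell
# Sketch: "/nosuchfile" is a made-up name assumed not to exist on your system.
kept=$(ls /etc/passwd /nosuchfile 2>/dev/null)   # stderr is discarded; stdout is captured
rc=$?                                            # exit status of ls survives the redirection

echo "stdout: $kept"
echo "exit status: $rc"
```

Only the name of the file that exists is printed, and the exit status is non-zero even though no error message was seen.
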
> System Administrators: Do not get in the habit of throwing away all the
> error output of commands!  You will also throw away legitimate error
> messages and nobody will know that these commands are failing.

When used as an input pathname, `/dev/null` always appears to be empty:

    $ wc /dev/null
    0 0 0 /dev/null

You can use `/dev/null` to provide "no input" to a program that would
normally read your keyboard:

    $ mail -s "Test message" user@example.com </dev/null

Redirecting the output of a command that produces no output gives you
nothing:

    $ cp /etc/passwd x >out   # file "out" is created empty
    $ cp /etc/passwd x | wc   # word count counts nothing; output is zeroes

If there was no output on your screen before you added redirection, adding
redirection will not create any.  You will redirect nothing; no output.

Before you add redirection to a command, look at the output on your screen.
If there is no output visible on your screen, why are you bothering to
redirect it?  **You can only redirect what you can see.**

`tr` -- a command that only reads Standard Input
================================================

The `tr` command is one of the few (only?) commands that reads standard input
and does *not* allow any pathnames on the command line -- you must *always*
supply input to `tr` on standard input, either through file input redirection
or through a pipe:

    $ tr 'abc' 'ABC' <file1 >out                # correct for a single file
    $ cat file1 file2 | tr 'abc' 'ABC' >out     # correct for multiple files
    $ tr 'abc' 'ABC' file1 file2 >out           # *** WRONG - ERROR ***
    tr: too many arguments

The `tr` command must always use some kind of Input Redirection to read data.
No version of `tr` accepts pathnames on the command line.  All versions of
`tr` *only* read standard input.

Don't make input and output file names the same to `tr`
-------------------------------------------------------

Don't make the mistake of using a `tr` output redirection file as its
redirection input file.  (This doesn't work for any command.)  See [**Don't
use redirection output file as redirection input file**], above.

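One common way around this problem is to send the output to a temporary file and rename it afterward.  The sketch below assumes a scratch directory and the hypothetical file names `data` and `data.tmp`; it is one possible approach, not the only one.

```shell
# Sketch: "data" and "data.tmp" are example names inside a scratch directory.
workdir=$(mktemp -d)
printf 'aabbcc\n' >"$workdir/data"

# WRONG (shown only as a comment): the shell truncates "data" before tr reads it.
#   tr 'abc' 'ABC' <"$workdir/data" >"$workdir/data"

# Write to a different file, then replace the original only if tr succeeded.
tr 'abc' 'ABC' <"$workdir/data" >"$workdir/data.tmp" &&
    mv "$workdir/data.tmp" "$workdir/data"

result=$(cat "$workdir/data")
echo "$result"
rm -r "$workdir"
```

Using `&&` means the original file is left alone if `tr` fails for any reason.
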
Different requirements in character lists on System V
-----------------------------------------------------

System V Unix versions of `tr` demand that character lists appear inside
square brackets, e.g. `tr '[abc]' '[ABC]'`.

Berkeley Unix and Linux do not need or use the brackets around the lists.

Example using `tr`
------------------

**Problem:** convert some selected lower-case letters to upper-case in the
output of the `who` command:

    $ who | tr 'abc' 'ABC'

**Shell question:** Are the single quotes required around the two arguments?
(Are there any special characters in the arguments that need protection?)

Don't use character ranges with `tr`
------------------------------------

Using POSIX character classes such as `[:lower:]` and `[:upper:]`, you can
use `tr` to convert a lower-case file of text into upper-case.

**Warning:** Do not use alphabetic character ranges such as `a-z` or `A-Z` in
`tr` or any other commands, since the ranges often contain unexpected
characters in the character set collating sequence.  For full details, see
[Internationalization and Collating].

Do not redirect full-screen programs such as VIM
================================================

Full-screen keyboard interactive programs such as the VIM text editor do not
behave nicely if you redirect their input or output -- they really want to be
talking to your keyboard and screen; don't redirect them or try to run them
in the background using `&`.  You can hang your terminal if you try.

> If you accidentally redirect the input or output of something such as
> `vim`, switch screens or log in a second time using a different terminal
> and find and kill the hung process.

Redirect *only* stderr into a pipe (ADVANCED!)
==============================================

It's easy to redirect only **stdout** into a pipe; that's just the way pipes
work.  In the example below, only **stdout** is sent into the line numbering
program.
The error message sent to **stderr** bypasses the redirection and goes
directly onto the screen:

    $ ls /etc/passwd nosuchfile | nl
    ls: cannot access nosuchfile: No such file or directory
         1  /etc/passwd

It's also easy to redirect *both* **stdout** and **stderr** into a pipe by
sending **stderr** to the same place as **stdout**:

    $ ls /etc/passwd nosuchfile 2>&1 | nl
         1  ls: cannot access nosuchfile: No such file or directory
         2  /etc/passwd

How do you redirect *only* **stderr** into the pipe, and let **stdout**
bypass the pipe and go directly to the screen?  This is tricky; on the left
of the pipe you have to swap **stdout** (attached to the pipe) and **stderr**
(attached to the screen).  You need a temporary output unit (I use "3",
below) to record and remember where the screen is (redirect unit 3 to the
same place as unit 2: "`3>&2`"), then redirect **stderr** into the pipe
(redirect unit 2 to the same place as unit 1: "`2>&1`"), then redirect
**stdout** to the screen (redirect unit 1 to the same place as unit 3:
"`1>&3`"):

    $ ls /etc/passwd nosuchfile 3>&2 2>&1 1>&3 | nl
         1  ls: cannot access nosuchfile: No such file or directory
    /etc/passwd

You seldom need to do this advanced trickery, even inside scripts.  But you
*can* do it!

    --
    | Ian! D. Allen, BA, MMath - idallen@idallen.ca - Ottawa, Ontario, Canada
    | Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
    | College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
    | Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

  [www.idallen.com]: http://www.idallen.com/
  [Course Home Page]: ..
  [Course Outline]: course_outline.pdf
  [All Weeks]: indexcgi.cgi
  [Plain Text]: 200_redirection.txt
  [UNIX: Making Computers More Productive]: http://www.youtube.com/watch?v=tc4ROCJYbm0
  [UNIX: Making Computers Easier To Use]: https://www.youtube.com/watch?v=XvDZLjaCJuw
  [`auth.log`]: data/redir1.txt
  [output]: data/redir2.txt
  [1]: data/redir3.txt
  [`awk` output]: data/redir4.txt
  [`sort` output]: data/redir5.txt
  [`uniq` output]: data/redir6.txt
  [second `sort` output]: data/redir7.txt
  [**Don't use redirection output file as redirection input file**]: #dont-use-redirection-output-file-as-redirection-input-file
  [Internationalization and Collating]: ../../../cst8177/15w/notes/000_character_sets.html
  [Pandoc Markdown]: http://johnmacfarlane.net/pandoc/