% Selecting fields from input lines using awk
% Ian! D. Allen -- <idallen@idallen.ca> -- [www.idallen.com]
% Fall 2015 - September to December 2015 - Updated 2017-01-20 00:48 EST

Extracting fields from lines: `awk`
===================================

The oddly-named `awk` command can extract a field (or multiple fields), by
field number, from one or more input lines.

By default, `awk` splits each line into fields wherever it finds one or more
*blank* characters (spaces or tabs); leading and trailing blanks are ignored:

    $ echo one two three four five
    one two three four five

    $ echo one two three four five | awk '{ print $1 }'
    one

    $ echo one two three four five | awk '{ print $2 }'
    two

    $ echo one two three four five | awk '{ print $5 }'
    five

As you see above, you tell `awk` which field to extract by using a dollar
sign followed by the number of the field on the line.
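
The opening paragraph mentions extracting multiple fields. As a small sketch
using the same sample line, listing several field numbers separated by commas
in one `print` outputs them on one line with a single space between them:

```shell
# Print the first and third fields; the comma makes awk put one space between them.
echo one two three four five | awk '{ print $1, $3 }'
# Output: one three

# Fields can be printed in any order.
echo one two three four five | awk '{ print $5, $1 }'
# Output: five one
```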

You can also use the built-in variable `NF` (Number of Fields), which holds
the number of fields on the current line, so that `$NF` extracts just the
*last* field from any input line(s):

    $ echo one two three four five | awk '{ print $NF }'
    five

    $ echo one two three four | awk '{ print $NF }'
    four

    $ echo one two three | awk '{ print $NF }'
    three

    $ echo one two | awk '{ print $NF }'
    two

    $ echo one | awk '{ print $NF }'
    one
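
To make the difference between the count and the field itself visible, this
short sketch prints both side by side (the sample lines are the same ones
used above):

```shell
# NF without a dollar sign is the count of fields on the line;
# $NF is the contents of the field with that number (the last field).
echo one two three | awk '{ print NF, $NF }'
# Output: 3 three

echo one two three four five | awk '{ print NF, $NF }'
# Output: 5 five
```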

The first command-line argument to `awk` must be single-quoted to hide the
dollar character inside it from unwanted expansion by the shell.
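
Here is a sketch of what goes wrong without the single quotes. The `set`
line is only there to make the demonstration predictable: it gives the shell
empty values for its own `$1` and `$2`, which is the usual situation at an
interactive prompt.

```shell
set -- "" ""   # make the shell's own $1 and $2 empty for this demonstration

# Single quotes: awk receives the text { print $2 } exactly as typed.
echo one two three | awk '{ print $2 }'
# Output: two

# Double quotes: the shell expands $2 (empty here) before awk starts, so awk
# is given { print  } and prints the whole line instead of one field.
echo one two three | awk "{ print $2 }"
# Output: one two three
```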

If there is more than one argument, the remaining arguments are taken as
pathnames that `awk` opens and reads lines from. With no pathname arguments,
`awk` reads lines from standard input, as in the pipelines above.
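
As a sketch of the pathname form, the following creates two small sample
files (the file names and the temporary directory are made up for this
example) and gives both to one `awk` command:

```shell
# Make two small sample files in a temporary directory.
dir=$(mktemp -d)
printf 'a b c\n1 2 3\n' > "$dir/first.txt"
printf 'd e f\n4 5 6\n' > "$dir/second.txt"

# awk reads first.txt, then second.txt, printing field 3 of every line.
awk '{ print $3 }' "$dir/first.txt" "$dir/second.txt"
# Output: c, 3, f, 6 (one per line)

rm -r "$dir"   # clean up the sample files
```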

The `awk` program can do much more (RTFM), but in this course we only use it
to extract fields from lines.

Extracting a column from a file
-------------------------------

If you extract the same field number from a bunch of input lines, you've
effectively extracted a **column** from the input:

    $ cat file
    a b c
    1 2 3
    d e f
    4 5 6
    g h i

    $ awk '{ print $2 }' file
    b
    2
    e
    5
    h

Remember that the number of spaces between the fields doesn't matter.
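
A quick way to convince yourself is to feed `awk` some deliberately messy
input. This sketch uses made-up lines with uneven runs of spaces, leading
blanks, and a tab (which `awk` also treats as a separator by default):

```shell
# Uneven spacing, leading blanks, and a tab; printf makes the tab (\t) explicit.
printf 'a    b c\n   1 2      3\nd\te f\n' | awk '{ print $2 }'
# Output: b, 2, e (one per line)
```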

Extracting a column and counting it
-----------------------------------

Here is a common pipeline: `fgrep` (equivalent to `grep -F`) selects lines
from a system log file, `awk` extracts one field from each line, and the rest
of the pipeline counts how often each unique value occurs:

    $ fgrep 'refused connect' /var/log/auth.log \
       | awk '{print $NF}' \
       | sort | uniq -c | sort -nr | head

The `awk` program selects the last field (the IP address) from every input
line found by `fgrep`. The first `sort` puts the IP addresses in order so
that identical addresses are next to each other, `uniq -c` replaces each run
of adjacent identical lines with one line prefixed by a count, the second
`sort` (`-nr`, numeric and reversed) puts the lines with the highest counts
first, and `head` shows only the first ten lines.
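
To watch each stage work without needing a real log file, here is a sketch
that runs the same counting stages on a small made-up list of addresses. The
`addresses` variable and its contents are invented for illustration and stand
in for the output of the `awk` step.

```shell
# Made-up stand-in for the awk output: one address per line, unsorted.
# (192.0.2.x and 198.51.100.x are reserved documentation addresses.)
addresses='192.0.2.7
198.51.100.4
192.0.2.7
192.0.2.9
192.0.2.7
198.51.100.4'

# Stage 1: sort puts identical addresses next to each other.
echo "$addresses" | sort

# Stage 2: uniq -c collapses each run of identical lines and prefixes a count.
echo "$addresses" | sort | uniq -c

# Stage 3: sort -nr orders by that count, largest first; head keeps the top ten.
echo "$addresses" | sort | uniq -c | sort -nr | head
```

The last command lists `192.0.2.7` with a count of 3, then `198.51.100.4`
with 2, then `192.0.2.9` with 1. The amount of padding in front of each count
depends on your version of `uniq`.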

You can see how `awk` extracts the last field on every line by selecting just
a few lines of output:

    $ fgrep 'refused connect' /var/log/auth.log \
       | awk '{print $NF}' | head -n5
    (115.239.228.13)
    (173.203.113.140)
    (222.161.4.147)
    (115.231.218.130)
    (115.239.228.11)

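For a numbered field such as `$2` to pick out the same column from every
line, all the input lines usually need to have the same number of fields;
otherwise the field you want has to be the last field on every line, so that
`$NF` finds it.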
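
The following sketch shows the problem on three made-up lines that have
three, two, and one field:

```shell
# Asking for field 3 only works on the line that has three fields.
printf 'a b c\nd e\nf\n' | awk '{ print $3 }'
# Output: c, then two empty lines (a field that does not exist is empty)

# $NF still finds the last field on every line.
printf 'a b c\nd e\nf\n' | awk '{ print $NF }'
# Output: c, e, f (one per line)
```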


  [www.idallen.com]: http://www.idallen.com/
  [Course Home Page]: ..
  [Course Outline]: course_outline.pdf
  [All Weeks]: indexcgi.cgi
  [Plain Text]: 187_selecting_fields_awk.txt
  [Pandoc Markdown]: http://johnmacfarlane.net/pandoc/