Winter 2017 - January to April 2017 - Updated 2017-01-20 00:48 EST
awk
The oddly-named awk command can extract a field (or multiple fields), by field number, from one or more input lines.
By default, fields are separated by any number of blanks (spaces or tabs):
$ echo one two three four five
one two three four five
$ echo one two three four five | awk '{ print $1 }'
one
$ echo one two three four five | awk '{ print $2 }'
two
$ echo one two three four five | awk '{ print $5 }'
five
As you see above, you tell awk which field to extract by using a dollar sign followed by the number of the field on the line.
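Extracting more than one field at a time works the same way. The following is a minimal sketch, not part of the original examples: list the fields after print, separated by commas, and awk prints them with a single space between them. The variable name first_and_third is made up for illustration.

```shell
# Extract the first and third fields from one input line.
# The comma in the print statement tells awk to put a space between them.
first_and_third=$(echo one two three four five | awk '{ print $1, $3 }')

echo "$first_and_third"   # prints: one three
```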
You can also use the built-in variable NF (Number of Fields) as the field number, so that $NF extracts just the last field from any input line(s):
$ echo one two three four five | awk '{ print $NF }'
five
$ echo one two three four | awk '{ print $NF }'
four
$ echo one two three | awk '{ print $NF }'
three
$ echo one two | awk '{ print $NF }'
two
$ echo one | awk '{ print $NF }'
one
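To see why $NF is always the last field, it may help to print NF itself, without the dollar sign. A small sketch (the variable names count and last are only for illustration): NF holds the number of fields on the current line, so on a three-field line $NF means the same thing as $3.

```shell
# NF without a dollar sign is the number of fields on the line.
count=$(echo one two three | awk '{ print NF }')

# $NF with a dollar sign is the field whose number is NF: the last one.
last=$(echo one two three | awk '{ print $NF }')

echo "$count"   # prints: 3
echo "$last"    # prints: three
```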
The first command-line argument to awk must be single-quoted to hide the dollar character inside it from unwanted expansion by the shell.
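The effect of the quoting can be seen without running awk at all, by using echo to show what text the shell would hand over. This is only a sketch: the set command and the argument name SHELLARG are assumptions made so that the shell's own $1 has a visible value.

```shell
# Assumption for illustration: pretend this script was run with one argument.
set -- SHELLARG

# Double quotes: the shell replaces $1 before awk would ever see the text.
double_quoted=$(echo "{ print $1 }")

# Single quotes: the text reaches the command unchanged.
single_quoted=$(echo '{ print $1 }')

echo "$double_quoted"   # prints: { print SHELLARG }
echo "$single_quoted"   # prints: { print $1 }
```

Only the single-quoted version is a program that tells awk to print its first field.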
If there is more than one argument, the remaining arguments are taken as pathnames that awk will open and from which it will read lines.
The awk program can do much more (RTFM), but in this course we only use it to extract fields from lines.
If you extract the same field number from a bunch of input lines, you’ve effectively extracted a column from the input:
$ cat file
a b c
1 2 3
d e f
4 5 6
g h i
$ awk '{ print $2 }' file
b
2
e
5
h
Remember that the number of spaces between the fields doesn’t matter.
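A quick way to convince yourself of this is to feed awk a file with deliberately messy spacing. The sketch below uses a temporary file made with mktemp (an assumption; any file name would do) containing extra spaces, a tab, and leading spaces. The second column still comes out cleanly.

```shell
# Build a small test file with uneven spacing, a tab, and leading spaces.
tmpfile=$(mktemp)
printf 'a    b c\n1\t2   3\n  d e      f\n' > "$tmpfile"

# Extract the second field from every line of the file.
column=$(awk '{ print $2 }' "$tmpfile")

echo "$column"   # prints b, 2, and e, one per line

rm -f "$tmpfile"
```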
Here is a common use of fgrep to select lines from a system log file, awk to extract a field from each line, and other commands to count the unique occurrences:
$ fgrep 'refused connect' /var/log/auth.log \
| awk '{print $NF}' \
| sort | uniq -c | sort -nr | head
The awk program has selected the last field (the IP address) from every input line found by fgrep. The next sort command puts all the IP addresses in order, the uniq command counts adjacent identical lines, the second sort puts the lines with the highest count first (a reverse numeric sort), and the head command shows only the top ten.
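If you have no auth.log to practise on, the same pipeline can be tried on a few made-up lines. Everything in the sample variable below is hypothetical (the host names and the addresses, which come from ranges reserved for documentation), and grep -F is used as the equivalent modern spelling of fgrep.

```shell
# Hypothetical log lines, invented for this sketch.
sample='refused connect from host-a (192.0.2.1)
refused connect from host-b (198.51.100.7)
refused connect from host-a (192.0.2.1)
accepted connect from host-c (203.0.113.5)
refused connect from host-a (192.0.2.1)'

# Select the matching lines, keep the last field, then count and rank.
counts=$(printf '%s\n' "$sample" \
    | grep -F 'refused connect' \
    | awk '{ print $NF }' \
    | sort | uniq -c | sort -nr | head)

# The address seen three times is listed first, then the one seen once.
echo "$counts"
```

The sort before uniq matters: uniq only counts identical lines that are next to each other.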
You can see how awk extracts the last field on every line by selecting just a few lines of output:
$ fgrep 'refused connect' /var/log/auth.log \
| awk '{print $NF}' | head -n5
(115.239.228.13)
(173.203.113.140)
(222.161.4.147)
(115.231.218.130)
(115.239.228.11)
Usually, for awk to extract the field you want, all the input lines must have the same number of fields, or else the field must be the last field on every line.
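The sketch below shows what goes wrong otherwise, using three made-up lines with three, two, and four fields. A fixed field number picks up a different kind of item on each line (or nothing at all), while $NF still finds the last field every time.

```shell
# Hypothetical input: the lines have 3, 2, and 4 fields.
lines='alice 10 ok
bob ok
carol 30 extra ok'

# A fixed field number gives inconsistent results:
# "ok", then an empty line (no third field), then "extra".
third=$(printf '%s\n' "$lines" | awk '{ print $3 }')

# The last field is "ok" on every line.
last=$(printf '%s\n' "$lines" | awk '{ print $NF }')

echo "$third"
echo "$last"
```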