Selecting fields from input lines using awk

Ian! D. Allen – www.idallen.com

Winter 2015 - January to April 2015 - Updated 2017-01-20 00:48 EST

1 Extracting fields from lines: awk

The oddly-named awk command can extract a field (or multiple fields), by field number, from one or more input lines.

The default is to find fields separated by any amount of white space (one or more spaces or tabs):

$ echo one two three four five
one two three four five

$ echo one two three four five | awk '{ print $1 }'
one

$ echo one two three four five | awk '{ print $2 }'
two

$ echo one two three four five | awk '{ print $5 }'
five

As you see above, you tell awk which field to extract by using a dollar sign followed by the number of the field on the line.
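The same notation extends to more than one field at a time. The following is a minimal sketch, not part of the original notes; the variable names (line, first_and_third, swapped) are made up for illustration. A comma between the fields tells awk to print them separated by a single space:

```shell
# Illustrative variable holding the same sample input used above.
line='one two three four five'

# Print the first and third fields; the comma becomes a single space in the output.
first_and_third=$(echo "$line" | awk '{ print $1, $3 }')
echo "$first_and_third"

# Fields may be printed in any order, not just left to right.
swapped=$(echo "$line" | awk '{ print $5, $1 }')
echo "$swapped"
```

The first echo shows "one three" and the second shows "five one".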

You can also use the built-in variable NF (Number of Fields) as the field number: $NF extracts just the last field from each input line, however many fields the line has:

$ echo one two three four five | awk '{ print $NF }'
five

$ echo one two three four | awk '{ print $NF }'
four

$ echo one two three | awk '{ print $NF }'
three

$ echo one two | awk '{ print $NF }'
two

$ echo one | awk '{ print $NF }'
one

The first command-line argument to awk must be single-quoted to hide the dollar character inside it from unwanted expansion by the shell.
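To see why the quoting matters, here is a small sketch comparing single and double quotes. It assumes the shell has no positional parameters of its own (the "set --" line forces that), and the variable names are made up for illustration:

```shell
# Clear the shell's own positional parameters so that the shell's $1 is empty.
set --

# Single quotes: the shell leaves $1 alone and awk receives { print $1 }.
single_quoted=$(echo one two three | awk '{ print $1 }')
echo "$single_quoted"

# Double quotes: the shell replaces $1 with its own (empty) first parameter,
# so awk receives { print  } and prints the whole line instead.
double_quoted=$(echo one two three | awk "{ print $1 }")
echo "$double_quoted"
```

With single quotes the output is "one"; with double quotes the shell has already removed the $1 before awk runs, so the whole line "one two three" comes out.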

If there is more than one argument, the remaining arguments are taken as pathnames that awk will open and from which it will read lines.
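As a rough illustration of pathname arguments, the sketch below creates two small files in a temporary directory and gives both to awk. The file names first.txt and second.txt and the variable names are invented for this example:

```shell
# Make a scratch directory with two small sample files (names are illustrative).
workdir=$(mktemp -d)
printf 'a b c\n1 2 3\n' > "$workdir/first.txt"
printf 'd e f\n4 5 6\n' > "$workdir/second.txt"

# awk reads the files in the order given and prints field 3 of every line.
third_fields=$(awk '{ print $3 }' "$workdir/first.txt" "$workdir/second.txt")
echo "$third_fields"

# Clean up the scratch directory.
rm -r "$workdir"
```

The output is the four lines c, 3, f, 6: the lines of the first file are processed first, followed by the lines of the second.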

The awk program can do much more (RTFM), but in this course we only use it to extract fields from lines.

1.1 Extracting a column from a file

If you extract the same field number from a bunch of input lines, you’ve effectively extracted a column from the input:

$ cat file
a b c
1 2 3
d e f
4 5 6
g h i

$ awk '{ print $2 }' file
b
2
e
5
h

Remember that the number of spaces between the fields doesn’t matter.
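A quick way to convince yourself is to feed awk some deliberately messy input. This sketch uses made-up lines with extra spaces, a tab, and leading spaces; with the default field splitting, all of these count as ordinary separators:

```shell
# Three messy input lines: multiple spaces, a tab (\t), and leading spaces.
second_column=$(printf 'a   b c\n1\t2    3\n   d e      f\n' | awk '{ print $2 }')
echo "$second_column"
```

The second field is still b, 2, and e on the three lines, the same as if each line had single spaces between its fields.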

1.2 Extracting a column and counting it

Here is a common use of fgrep to select lines and awk to extract fields from a system log file and count the unique occurrences:

$ fgrep 'refused connect' /var/log/auth.log \
   | awk '{print $NF}' \
   | sort | uniq -c | sort -nr | head

The awk program has selected the last field (the IP address) from every input line found by fgrep. The first sort command puts all the IP addresses in order, so that identical addresses end up on adjacent lines. The uniq -c command then replaces each run of adjacent identical lines with one line preceded by a count. The second sort is a numeric, reversed sort (-nr) that puts the lines with the highest counts first, and the head command shows only the first ten of those lines.
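If you have no log file to hand, you can try the same pipeline on made-up data. In this sketch the log lines and addresses are invented (they are not real auth.log entries), the variable names are illustrative, and grep -F is used as the equivalent of fgrep:

```shell
# Invented sample lines; only the last field matters to the pipeline.
sample_log='sshd: refused connect from (192.0.2.10)
sshd: refused connect from (198.51.100.7)
sshd: accepted connect from (203.0.113.5)
sshd: refused connect from (192.0.2.10)
sshd: refused connect from (192.0.2.10)
sshd: refused connect from (198.51.100.7)
sshd: refused connect from (203.0.113.99)'

# Select matching lines, keep the last field, then count and rank the values.
top_offenders=$(printf '%s\n' "$sample_log" \
   | grep -F 'refused connect' \
   | awk '{ print $NF }' \
   | sort | uniq -c | sort -nr | head)
echo "$top_offenders"
```

The result has three lines, with counts 3, 2, and 1 in front of the three refused addresses. The "accepted" line never reaches awk because grep has already filtered it out. The exact amount of padding in front of each count depends on your version of uniq.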

You can see how awk extracts the last field on every line by selecting just a few lines of output:

$ fgrep 'refused connect' /var/log/auth.log \
   | awk '{print $NF}' | head -n5
(115.239.228.13)
(173.203.113.140)
(222.161.4.147)
(115.231.218.130)
(115.239.228.11)

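Extracting a field by number only gives a consistent column if the field is in the same position on every line. Usually that means either all the input lines have the same number of fields, or else the field you want is the last field on every line, so that $NF finds it.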
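The following sketch shows what goes wrong when the lines have different numbers of fields. The input words and variable names are made up for illustration:

```shell
# Three lines with three, two, and four fields respectively.
uneven='alpha beta gamma
delta epsilon
zeta eta theta iota'

# Field 3 does not exist on the second line, so awk prints an empty line there.
third=$(printf '%s\n' "$uneven" | awk '{ print $3 }')
echo "$third"

# The last field exists on every line, whatever the line length.
last=$(printf '%s\n' "$uneven" | awk '{ print $NF }')
echo "$last"
```

Asking for field 3 gives gamma, an empty line, and theta, while asking for $NF gives gamma, epsilon, and iota.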

Author: 
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/
