% Selecting fields from input lines using awk
% Ian! D. Allen -- [www.idallen.com]
% Fall 2015 - September to December 2015 - Updated 2017-01-20 00:48 EST

- [Course Home Page]
- [Course Outline]
- [All Weeks]
- [Plain Text]

Extracting fields from lines: `awk`
===================================

The oddly-named `awk` command can extract a field (or multiple fields), by
field number, from one or more input lines. By default, fields are separated
by any number of *blank* characters (spaces or tabs):

    $ echo one two three four five
    one two three four five

    $ echo one two three four five | awk '{ print $1 }'
    one

    $ echo one two three four five | awk '{ print $2 }'
    two

    $ echo one two three four five | awk '{ print $5 }'
    five

As you see above, you tell `awk` which field to extract by using a dollar sign
followed by the number of the field on the line.

You can also use the built-in variable `NF` (Number of Fields) in place of a
field number: `$NF` extracts just the *last* field from any input line(s),
however many fields each line has:

    $ echo one two three four five | awk '{ print $NF }'
    five

    $ echo one two three four | awk '{ print $NF }'
    four

    $ echo one two three | awk '{ print $NF }'
    three

    $ echo one two | awk '{ print $NF }'
    two

    $ echo one | awk '{ print $NF }'
    one

The first command-line argument to `awk` is the program text. It must be
single-quoted to hide the dollar character inside it from unwanted expansion
by the shell. Any remaining arguments are taken as pathnames of files that
`awk` will open and read lines from; with no pathnames, `awk` reads standard
input.

The `awk` program can do much more (RTFM), but in this course we only use it
to extract fields from lines.

Extracting a column from a file
-------------------------------

If you extract the same field number from a bunch of input lines, you've
effectively extracted a **column** from the input:

    $ cat file
    a b c
    1 2 3
    d e f
    4 5 6
    g h i

    $ awk '{ print $2 }' file
    b
    2
    e
    5
    h

Remember that the number of blanks between the fields doesn't matter.

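
The two points above (spacing doesn't matter, and more than one field can be
extracted at once) can be tried out with a small sketch. The sample data and
the shell variable names `demo`, `col2`, and `swapped` are made up for this
illustration and are not part of the course files:

```shell
# Hypothetical sample file: fields separated by uneven runs of spaces and a tab.
demo=$(mktemp)
printf 'a   b c\n1 2      3\nd\te f\n' > "$demo"

# Column 2 comes out the same no matter how many blanks separate the fields.
col2=$(awk '{ print $2 }' "$demo")
echo "$col2"        # prints b, 2, e on separate lines

# Several fields in one print: the comma puts a single space between them.
swapped=$(awk '{ print $3, $1 }' "$demo")
echo "$swapped"     # prints "c a", "3 1", "f d" on separate lines

rm -f "$demo"
```

The temporary file stands in for any text file; the same `awk` programs work
unchanged when the lines arrive on standard input from a pipe.
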
Extracting a column and counting it
-----------------------------------

Here is a common use of `fgrep` to select lines and `awk` to extract fields
from a system log file and count the unique occurrences:

    $ fgrep 'refused connect' /var/log/auth.log \
      | awk '{print $NF}' \
      | sort | uniq -c | sort -nr | head

The `awk` program selects the last field (the IP address) from every input
line found by `fgrep`. The first `sort` puts all the IP addresses in order so
that identical addresses are adjacent, `uniq -c` replaces each run of adjacent
identical lines with one line prefixed by its count, the second `sort` (`-nr`,
a reversed numeric sort) puts the lines with the highest count first, and
`head` shows only the top ten.

You can see how `awk` extracts the last field on every line by selecting just
a few lines of output:

    $ fgrep 'refused connect' /var/log/auth.log \
      | awk '{print $NF}' | head -n5
    (115.239.228.13)
    (173.203.113.140)
    (222.161.4.147)
    (115.231.218.130)
    (115.239.228.11)

For `awk` to extract a consistent column, usually either all the input lines
must have the same number of fields, or else the field you want must be the
last field on every line (so that `$NF` finds it).

-- 
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[www.idallen.com]: http://www.idallen.com/
[Course Home Page]: ..
[Course Outline]: course_outline.pdf
[All Weeks]: indexcgi.cgi
[Plain Text]: 187_selecting_fields_awk.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/