Background
Section 7.3
Programming model (in ).
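awk's programming model has three parts. An optional BEGIN procedure runs once before any input is read. The main procedure runs once for each input record. An optional END procedure runs once after all input has been read. The sketch below shows the order of execution. It assumes a POSIX-compatible awk (for example gawk or mawk), and the two input lines are made up.

```shell
# BEGIN runs once before input, the main block runs once per line, END runs once after input
out=$(printf 'one\ntwo\n' | awk '
BEGIN { print "start" }
      { print "line " NR ": " $0 }
END   { print "end, " NR " lines" }')
echo "$out"
```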
Section 7.5.2
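There are 3 ways to define FS (the field separator):
* Default FS (a single space): fields are separated by runs of spaces/tabs, so multiple successive blanks are combined into one separator (leading and trailing blanks are ignored);
* Any other single character: successive separator characters are not combined, so two in a row delimit an empty field;
* More than one character: FS is treated as a regular expression, so the separator can be a string or pattern instead of a single character.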
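The three FS rules can be checked quickly. This is a sketch assuming a POSIX-compatible awk. The input strings are arbitrary examples.

```shell
# Default FS: runs of blanks count as one separator; leading blanks are ignored
n1=$(printf '  a   b\tc\n' | awk '{ print NF }')
# Single-character FS: adjacent commas are not merged, so "a,,b" has an empty 2nd field
n2=$(printf 'a,,b\n' | awk -F, '{ print NF }')
# Multi-character FS is a regular expression: a comma followed by any number of spaces
f3=$(printf 'a,  b,c\n' | awk -F', *' '{ print $2 }')
echo "$n1 $n2 $f3"   # prints: 3 3 b
```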
Section 7.6
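Every variable has both a string value and a numeric value; which one is used depends on the context. Set CONVFMT (default "%.6g") to change how numbers are converted to strings. (To test: strings used as numbers, and numbers used as strings.)
Variables need no declaration: an uninitialized variable has the numeric value 0 and the string value "" (the null string).
There is no explicit concatenation operator: expressions written next to each other, usually separated by a space, are concatenated.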
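The following sketch tests these conversion rules in both directions. It assumes a POSIX-compatible awk. The variable names x, y, s and z are illustrative only.

```shell
out=$(awk 'BEGIN {
    # An uninitialized variable is 0 as a number and "" as a string
    print x + 1, "[" x "]"
    # A string used as a number: its leading numeric part is used
    print "3abc" + 2
    # A non-integer number used as a string is formatted with CONVFMT
    CONVFMT = "%.2f"
    y = 3.14159
    s = y ""          # concatenating with "" forces a string conversion
    print s
    # Integer values are converted as integers, whatever CONVFMT says
    z = 42
    print z ""
}')
echo "$out"
```

The output is `1 []`, `5`, `3.14` and `42`, one per line. Note that `print y` on its own would be formatted with OFMT, not CONVFMT. That is why the example converts to a string explicitly first.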
Section 7.7
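Two types of system variables:
* Set by the user (they have defaults and are usually changed in the BEGIN procedure): e.g. the field and record separators;
* Updated by awk automatically: e.g. the current record number and the input file name.
Type I: * FS (field separator) * RS (record separator)
Type II: * NR: the number of input records read so far; * NF: the number of fields in the current record; * FILENAME: the name of the current input file.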
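A sketch of both kinds of system variable. FS is set by the user in BEGIN, and FILENAME, NR and NF are maintained by awk. The file name sample.txt and its contents are hypothetical. The example assumes a POSIX shell with mktemp.

```shell
# Work in a scratch directory so the file name is predictable
cd "$(mktemp -d)"
printf 'a:b:c\nd:e\n' > sample.txt
# FS is set by the user in BEGIN; FILENAME, NR and NF are updated by awk
out=$(awk 'BEGIN { FS = ":" } { print FILENAME, NR, NF }' sample.txt)
echo "$out"
```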
$NF (the last field) plays the same role as $ (the last line) in sed.
awk 'END{print}' filename is equivalent to sed -n '$p' filename (quote $p so the shell does not expand it).
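The two equivalences can be compared side by side. This relies on awk keeping the last record in $0 inside END, which POSIX awk, gawk and mawk all do. The input text is made up.

```shell
input='a b c
d e f g'
# $NF is the last field of each record
last_fields=$(printf '%s\n' "$input" | awk '{ print $NF }')
# In END, $0 still holds the last record read
awk_last=$(printf '%s\n' "$input" | awk 'END { print }')
# sed: $ addresses the last line; single quotes stop the shell from expanding $p
sed_last=$(printf '%s\n' "$input" | sed -n '$p')
```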
next is like continue in a while loop in C: it stops processing the current record and restarts the main input loop with the next record.
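A typical use of next is to skip a header line. This is a sketch with made-up numeric input.

```shell
# NR == 1 { next } skips the header; the remaining records are summed
sum=$(printf 'value\n1\n2\n3\n' | awk 'NR == 1 { next } { total += $1 } END { print total }')
echo "$sum"   # prints: 6
```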
Chapter 8
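Arrays in awk are associative, like a dict in Python or a Map in Java: they are indexed by strings, and numeric subscripts are converted to strings.
The system variables ARGV (command-line arguments) and ENVIRON (environment variables) are arrays.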
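A short sketch of associative arrays and of ARGV and ENVIRON. The environment variable FOO and the argument input.txt are hypothetical. A program consisting only of BEGIN never opens input.txt, so the file does not need to exist.

```shell
# count[] is keyed by string: the first field of each line
counts=$(printf 'OXT\nABC\nOXT\n' | awk '
{ count[$1]++ }
END {
    has_xyz = ("XYZ" in count)   # "in" tests membership without creating the key
    print count["OXT"], count["ABC"], has_xyz
}')
# The numeric subscript 1 and the string subscript "1" name the same element
num=$(awk 'BEGIN { a[1] = "x"; k = ("1" in a); print k }')
# ENVIRON holds the environment; ARGV holds the command-line arguments
args=$(FOO=bar awk 'BEGIN { print ENVIRON["FOO"], ARGV[1] }' input.txt)
```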
Ref: sed & awk, 2nd edition, by Dale Dougherty and Arnold Robbins.
Frequently Used Actions
- Print the last line: awk 'END{print $0}'
- Print the second last field: awk '{print $(NF-1)}'
- Print the second last field of the last line: awk 'END{print $(NF-1)}'
- Set a multi-character delimiter: awk -F ', ' 'END{print $(NF-1)}'
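Note: quote the awk program with single quotes, not double quotes. Inside double quotes the shell expands $0, $(NF-1) and similar before awk sees them (unless every $ is escaped as \$). For example, $(NF-1) becomes a command substitution, which causes an error.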
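The actions above can be run on a small made-up input. This sketch assumes a POSIX awk, which keeps both $0 and NF from the last record inside END.

```shell
data='x y z
1 2 3'
# Last line
last=$(printf '%s\n' "$data" | awk 'END{print $0}')
# Second last field of every line
second=$(printf '%s\n' "$data" | awk '{print $(NF-1)}')
# Second last field of the last line
second_of_last=$(printf '%s\n' "$data" | awk 'END{print $(NF-1)}')
# Multi-character delimiter ", "
complex=$(printf 'a, b, c\n' | awk -F ', ' 'END{print $(NF-1)}')
```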
Case study: convert timestamp
The input file:
$ cat test.csv
STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1
OXT,04001702,2019-03-20 11:45:45.779,6.0,669.0
OXT,04001702,2019-03-20 11:57:33.731,6.4,665.0
OXT,02000906,2019-03-20 11:39:52.074,3.5,730.4
OXT,02000906,2019-03-20 11:49:31.618,3.7,717.1
OXT,02000906,2019-03-20 11:50:44.541,3.7,719.2
OXT,02001301,2019-03-20 11:41:45.810,3.7,710.5
OXT,02001301,2019-03-20 11:44:29.355,3.5,734.1
OXT,02001301,2019-03-20 11:45:08.453,3.5,729.1
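The commands below read test.csv from the current directory. To reproduce them, the sample file can be recreated with a here-document. The quoted 'EOF' marker stops the shell from expanding anything inside.

```shell
# Recreate the sample file in the current directory
cat > test.csv <<'EOF'
STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1
OXT,04001702,2019-03-20 11:45:45.779,6.0,669.0
OXT,04001702,2019-03-20 11:57:33.731,6.4,665.0
OXT,02000906,2019-03-20 11:39:52.074,3.5,730.4
OXT,02000906,2019-03-20 11:49:31.618,3.7,717.1
OXT,02000906,2019-03-20 11:50:44.541,3.7,719.2
OXT,02001301,2019-03-20 11:41:45.810,3.7,710.5
OXT,02001301,2019-03-20 11:44:29.355,3.5,734.1
OXT,02001301,2019-03-20 11:45:08.453,3.5,729.1
EOF
```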
We need to remove the milliseconds from the timestamp of each record, for example:
from OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1
to   OXT,04001406,2019-03-20 11:38:19,6.2,679.1
while keeping the header line unchanged.
Solutions:
Option 1: using the ternary operator:
awk -F, '{print (NR==1 ? $0 : $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5)}' test.csv
Here -F sets the field separator and FS holds it; NR is the number of the current record (row).
To concatenate multiple fields, write them one after another, separated by whitespace.
The ternary operator forms an expression, not a statement, so it can be used as the argument of print.
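A quick check of Option 1 on a trimmed copy of test.csv (the header plus two records). substr($3, 1, 19) keeps the first 19 characters of the timestamp, "YYYY-MM-DD hh:mm:ss" (10 + 1 + 8 characters). The whole conditional is wrapped in parentheses so that print receives a single expression.

```shell
cd "$(mktemp -d)"
# A trimmed copy of test.csv: the header plus two records
printf '%s\n' \
  'STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE' \
  'OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1' \
  'OXT,04001702,2019-03-20 11:45:45.779,6.0,669.0' > test.csv
# Header passes through unchanged; other records get the truncated timestamp
out=$(awk -F, '{print (NR == 1 ? $0 : $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5)}' test.csv)
echo "$out"
```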
Or make the code easier to read by assigning the new record to a variable first:
awk -F, '{text = $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5; print (NR==1 ? $0 : text)}' test.csv
Note: use ; to separate statements on the same line. Plain variable names such as text have no $ prefix; $ is the field operator ($1, $NF, ...).
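The difference between a plain variable and a field reference is shown below. i and text are illustrative names. $i means "the field whose number is the value of i".

```shell
# i is a plain variable; $i is the field whose number is the value of i
out=$(echo 'a,b,c' | awk -F, '{ i = 3; text = $1 FS $2; print i; print $i; print text }')
echo "$out"   # prints 3, c and a,b on separate lines
```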
Option 2: using an if-else statement:
awk -F, '{if (NR==1) print $0; else print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5}' test.csv
When the if and else branches are on the same line, don't forget the semicolon that ends the if branch before else.
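Options 1 and 2 should produce identical output. The sketch below compares them on a tiny hypothetical CSV with a simplified header.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'H1,H2,TIME,H4,H5' 'OXT,1,2019-03-20 11:38:19.679,6.2,679.1' > test.csv
# The ";" terminates the print in the if branch before else
opt2=$(awk -F, '{if (NR==1) print $0; else print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5}' test.csv)
# Ternary version for comparison
opt1=$(awk -F, '{print (NR == 1 ? $0 : $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5)}' test.csv)
```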
Or spread the statement over multiple lines, where a newline ends the if branch instead of a semicolon:
$ awk 'BEGIN{FS=","}
{if (NR==1) print $0
else print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5
}' test.csv
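A different approach, not taken from the original notes, modifies only the third field. In awk, assigning to a field rebuilds $0 with OFS between the fields, so OFS must also be set to ",". The bare pattern 1 is always true, so every record is printed with the default action. The sample data is hypothetical.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'H1,H2,TIME,H4,H5' 'OXT,1,2019-03-20 11:38:19.679,6.2,679.1' > test.csv
# NR > 1 leaves the header untouched; assigning to $3 rebuilds $0 using OFS
awk 'BEGIN { FS = OFS = "," } NR > 1 { $3 = substr($3, 1, 19) } 1' test.csv > result.csv
```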
Finally, redirect the output to a file: awk '...' test.csv > result.csv