DarkMatter in Cyberspace

Awk Notes


Background

Section 7.3

Programming model (in ).
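Here is a minimal sketch of that model. It assumes the usual three parts of an awk program:

  • the BEGIN procedure runs once, before any input is read;
  • the main procedure runs once for each input record;
  • the END procedure runs once, after the last record.

The three-line input is made up for illustration.

```shell
# BEGIN: runs once, before the first record is read.
# Main procedure: runs once for every input record (line).
# END: runs once, after the last record; NR still holds the record count.
printf 'apple\nbanana\ncherry\n' | awk '
BEGIN { print "start" }
      { print NR ": " $0 }
END   { print "end, " NR " records" }'
# start
# 1: apple
# 2: banana
# 3: cherry
# end, 3 records
```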

Section 7.5.2

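Three ways to set FS (the field separator):
  • Default FS (a single space): fields are separated by runs of spaces and tabs, so several successive blanks count as one separator, and leading and trailing blanks are ignored;
  • Any other single character: every occurrence separates fields, so successive separators are not combined (two adjacent ones produce an empty field);
  • More than one character: FS is treated as a regular expression, so the separator can be a multi-character string or a pattern rather than a single character.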
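A small sketch of the three cases. The input lines are made up, and the expected output is in the trailing comments.

```shell
# Default FS: leading blanks are skipped and the run "  " and the tab
# each count as one separator, so there are 3 fields.
printf '  a  b\tc\n' | awk '{ print NF }'                           # 3

# Single non-space character: each comma separates fields,
# so ",," produces an empty second field.
printf 'a,,b\n' | awk -F, '{ print NF, "[" $2 "]" }'                # 3 []

# Multi-character FS is a regex: a comma or semicolon plus optional spaces.
printf 'a, b;c\n' | awk -F '[,;] *' '{ print $1 "|" $2 "|" $3 }'    # a|b|c
```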

Section 7.6

Every variable has both a string value and a numeric value; the context decides which one is used. The CONVFMT variable controls number-to-string conversion (its default is "%.6g"). Still to test: strings used as numbers, and numbers used as strings.
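A quick sketch of both directions of conversion; the values are arbitrary. When print outputs a bare number it uses OFMT, not CONVFMT. So the first lines concatenate with "" to force a conversion that goes through CONVFMT.

```shell
awk 'BEGIN {
  x = 3.14159265
  print (x "")         # 3.14159: number -> string uses CONVFMT (default "%.6g")
  CONVFMT = "%.2f"
  print (x "")         # 3.14: same number, new CONVFMT
  print (42 "")        # 42: integral values are converted as integers
  print ("10" + 5)     # 15: string -> number in arithmetic
  print ("3abc" * 2)   # 6: only the leading numeric prefix is used
}'
```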

Variables are not declared; an uninitialized variable has the numeric value 0 and the string value "" (the null string).
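A tiny sketch showing both faces of a variable that was never assigned. The name n is arbitrary.

```shell
awk 'BEGIN {
  print n + 1                          # 1: used as a number, unset n is 0
  print "[" n "]"                      # []: used as a string, it is ""
  if (n == 0 && n == "") print "both"  # both: it compares equal to 0 and to ""
}'
```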

There is no explicit concatenation operator: writing expressions next to each other, usually separated by a space, concatenates them.
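A short sketch of concatenation, including one well-known trap. Binary minus binds tighter than concatenation, so a negative number after a string can turn into a subtraction.

```shell
awk 'BEGIN {
  a = "foo"; b = "bar"
  print a b              # foobar: nothing is inserted between the operands
  print a " " b          # foo bar: add the space yourself
  print -12 " " -24      # -12-24: parsed as -12 (" " - 24), a subtraction
  print -12 " " (-24)    # -12 -24: parentheses keep -24 a separate operand
}'
```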

Section 7.7

Two kinds of system variables:
  • Set by the user: e.g., the field and record separators (usually set in the BEGIN procedure);
  • Updated automatically by awk: e.g., the current record number and the current input file name.

Type I:
  • FS: the field separator
  • RS: the record separator

Type II:
  • NR: the number of input records read so far
  • FILENAME: the name of the current input file
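A sketch that sets both Type I variables in BEGIN and prints the Type II counter NR. The input is made up. It is written with printf and no trailing newline, so the last record is exactly z:3.

```shell
# FS and RS are assigned in BEGIN, before any input is read;
# NR is incremented by awk for every record it reads.
printf 'x:1;y:2;z:3' | awk 'BEGIN { FS = ":"; RS = ";" } { print NR, $1, $2 }'
# 1 x 1
# 2 y 2
# 3 z 3
```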

$NF (the last field, since NF is the number of fields) plays the same role as $ (the last line) in sed.

awk 'END{print}' filename is equivalent to sed -n '$p' filename (quote $p, or the shell expands it as a variable).
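A quick check of both notes on made-up input. It relies on $0 still holding the last record inside END, which is how gawk, mawk and the one true awk behave.

```shell
# $NF is the last field of each record.
printf 'one two\nthree four five\n' | awk '{ print NF, $NF }'
# 2 two
# 3 five

# Both commands print only the last line: last
printf 'first\nlast\n' | awk 'END { print }'
printf 'first\nlast\n' | sed -n '$p'
```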

next works like continue inside a while loop in C: it stops processing the current record and restarts the main input loop with the next record.
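A minimal sketch of next used to skip comment lines. The input is made up.

```shell
# Lines starting with "#" hit "next", so the second rule never sees them.
printf '# header\ndata 1\n# note\ndata 2\n' | awk '/^#/ { next } { print $2 }'
# 1
# 2
```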

Chapter 8

awk arrays are associative arrays, much like a dict in Python or a Map in Java: they are indexed by strings.
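A word-count sketch that uses an array like a dict. The array name count and the input are arbitrary. The for (key in array) loop visits keys in an unspecified order, so the output is piped through sort.

```shell
printf 'a b a\nc a b\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }   # the key is the word itself
END {
  for (w in count) print w, count[w]        # iteration order is unspecified
  if (!("z" in count)) print "z absent"     # "in" tests membership without adding z
}' | sort
# a 3
# b 2
# c 1
# z absent
```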

The system variables ARGV and ENVIRON are arrays.
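A sketch that reads both arrays. The variable GREETING and the arguments foo and bar are made up. ARGV[0] is typically the name of the awk program, so the loop starts at 1. The program has only a BEGIN rule, so awk never tries to open foo or bar as files.

```shell
GREETING=hello awk 'BEGIN {
  for (i = 1; i < ARGC; i++) print i, ARGV[i]   # command-line arguments
  print ENVIRON["GREETING"]                      # environment lookup by name
}' foo bar
# 1 foo
# 2 bar
# hello
```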

Ref: sed & awk, 2nd edition, by Dale Dougherty and Arnold Robbins.

Frequently Used Actions

  • Print the last line: awk 'END{print $0}'

  • Print the second last field: awk '{print $(NF-1)}'

  • Print the second last field of last line: awk 'END{print $(NF-1)}'

  • Use a multi-character delimiter (a comma followed by a space): awk -F ', ' 'END{print $(NF-1)}'
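Running the actions above on a made-up two-line input shows why the last one needs -F ', '. With the default FS, each comma stays attached to its field.

```shell
data='a, b, c
d, e, f'
printf '%s\n' "$data" | awk 'END{print $0}'                 # d, e, f
printf '%s\n' "$data" | awk '{print $(NF-1)}'               # "b," then "e,"
printf '%s\n' "$data" | awk 'END{print $(NF-1)}'            # e,
printf '%s\n' "$data" | awk -F ', ' 'END{print $(NF-1)}'    # e
```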

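Note: wrap the awk program in single quotes, not double quotes. Inside double quotes the shell expands $0, $(NF-1) and so on before awk sees the program, which breaks it and usually raises an error.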
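If double quotes cannot be avoided, every $ meant for awk has to be escaped so the shell passes it through unchanged. A small sketch on made-up input:

```shell
# Escaped: the shell turns \$ into $, so awk receives END{print $(NF-1)}.
printf 'a b c\n' | awk "END{print \$(NF-1)}"    # b
# Single quotes need no escaping at all.
printf 'a b c\n' | awk 'END{print $(NF-1)}'     # b
```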

Case study: convert timestamp

The input file:

$ cat test.csv
STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1
OXT,04001702,2019-03-20 11:45:45.779,6.0,669.0
OXT,04001702,2019-03-20 11:57:33.731,6.4,665.0
OXT,02000906,2019-03-20 11:39:52.074,3.5,730.4
OXT,02000906,2019-03-20 11:49:31.618,3.7,717.1
OXT,02000906,2019-03-20 11:50:44.541,3.7,719.2
OXT,02001301,2019-03-20 11:41:45.810,3.7,710.5
OXT,02001301,2019-03-20 11:44:29.355,3.5,734.1
OXT,02001301,2019-03-20 11:45:08.453,3.5,729.1

We need to remove the milliseconds from the timestamp of each record, for example from OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1 to OXT,04001406,2019-03-20 11:38:19,6.2,679.1, while keeping the header line unchanged.

Solutions:

Option 1, using the ternary operator: awk -F, '{print (NR==1)? $0 : $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5}' test.csv

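Here -F sets the field separator, which the program reads back as FS. NR is the number of the current record (row), so NR==1 is the header. substr($3, 1, 19) keeps the first 19 characters of the timestamp, i.e. 2019-03-20 11:38:19 without the milliseconds.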

To concatenate several fields, list them one after another, separated by spaces.

The ternary operator forms an expression, not a statement; here its value is what print outputs.
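A quick way to check Option 1 without the whole file: feed it just the header and the first data row of test.csv. The command is unchanged from Option 1.

```shell
printf '%s\n' \
  'STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE' \
  'OXT,04001406,2019-03-20 11:38:19.679,6.2,679.1' |
  awk -F, '{print (NR==1)? $0 : $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5}'
# STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
# OXT,04001406,2019-03-20 11:38:19,6.2,679.1
```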

Or make the code easier to read by storing the rebuilt line in a variable: awk -F, '{text = $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5; print (NR==1)? $0 : text}' test.csv

Note: statements are separated by ;, and plain variable names take no $ prefix ($ refers only to fields).
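A small sketch of the difference. The variable n is arbitrary: n is a plain variable, while $n is the field whose number is stored in n.

```shell
printf 'a b c\n' | awk '{ n = 2; print n; print $n; print $(n + 1) }'
# 2
# b
# c
```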

Option 2, using an if-else statement: awk -F, '{if (NR==1) print $0; else print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5}' test.csv

When everything is on one line, don't forget the semicolon that ends the if branch before else.
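Another way to keep the header (this is not from the original post) uses a pattern-action pair with next. NR == 1 prints the header unchanged and skips to the next record, and every other record falls through to the second rule. Here it runs on the header plus one data row taken from test.csv.

```shell
printf '%s\n' \
  'STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE' \
  'OXT,02000906,2019-03-20 11:39:52.074,3.5,730.4' |
  awk -F, 'NR == 1 { print; next } { print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5 }'
# STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
# OXT,02000906,2019-03-20 11:39:52,3.5,730.4
```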

Or spread the program over several lines:

$ awk 'BEGIN{FS=","}
{if (NR==1) print $0
 else print $1 FS $2 FS substr($3, 1, 19) FS $4 FS $5
}' test.csv

Finally, redirect the output to a file: awk '...' test.csv > result.csv
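A different approach, not the one used above: strip the fractional seconds with sub() instead of cutting a fixed 19 characters, so it no longer depends on the timestamp length. Setting FS and OFS to "," makes awk rebuild the modified record with commas. This sketch assumes a scratch directory from mktemp and a two-row copy of test.csv.

```shell
dir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' \
  'STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE' \
  'OXT,04001702,2019-03-20 11:45:45.779,6.0,669.0' > "$dir/test.csv"

# Records after the header: delete a trailing ".digits" from field 3.
# Assigning to $3 rebuilds $0 with OFS, so the commas are kept.
awk 'BEGIN { FS = OFS = "," } NR > 1 { sub(/\.[0-9]+$/, "", $3) } { print }' \
  "$dir/test.csv" > "$dir/result.csv"
cat "$dir/result.csv"
# STATION_ID,DEVICE_ID,MONITOR_TIME,CURRENT_VALUE,VOLTAGE_VALUE
# OXT,04001702,2019-03-20 11:45:45,6.0,669.0
rm -r "$dir"
```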



Published

Apr 10, 2014

Last Updated

Apr 24, 2020

Category

Tech

Tags

  • awk
  • shell
