Tuesday, 10 April 2018

AWK Programming Tutorial - Awk built-in variables FS, OFS, RS, ORS, NF, NR

Awk built-in variables: This is the fourth article in this tutorial series on awk, and in this one we will learn about awk's built-in variables. In case you have missed any of the previous articles, you can find them elsewhere on this blog.


Awk comes with a number of built-in variables. Some of them have a default value that can be changed, e.g. FS ( field separator, whitespace by default ) and RS ( record separator, \n by default ). Others are set by awk itself and are quite useful for analysis or reports, e.g. NF ( number of fields ) and NR ( number of records ). Let's take a look at them one by one.

FS (Field Separator) and OFS (Output Field Separator)

  • With FS, we tell awk which character separates the fields in the input.
  • The default value of this variable is a single space, which awk treats specially: fields are separated by runs of one or more spaces or tabs.
  • This default value can be overridden with a character or a regular expression. For example, we can use a colon ( : ) to separate fields while working on the /etc/passwd file.
  • With OFS, we ask awk to use a particular string to separate the fields in the output.
  • For this variable too, the default value is a single space.
Let's take a look at an example now. For this, we will use a demo CSV file with the contents shown below:

1,"Eldon Base for stackable storage shelf, platinum",Muhammed MacIntyre,3,-213.25,38.94,35,Nunavut,Storage & Organization,0.8
2,"1.7 Cubic Foot Compact ""Cube"" Office Refrigerators",Barry French,293,457.81,208.16,68.02,Nunavut,Appliances,0.58
3,"Cardinal Slant-DÆ Ring Binder, Heavy Gauge Vinyl",Barry French,293,46.71,8.69,2.99,Nunavut,Binders and Binder Accessories,0.39
4,R380,Clay Rozendal,483,1198.97,195.99,3.99,Nunavut,Telephones and Communication,0.58
5,Holmes HEPA Air Purifier,Carlos Soltero,515,30.94,21.78,5.94,Nunavut,Appliances,0.5
6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515,4.43,6.64,4.95,Nunavut,Office Furnishings,0.37
7,"Angle-D Binders with Locking Rings, Label Holders",Carl Jackson,613,-54.04,7.3,7.72,Nunavut,Binders and Binder Accessories,0.38
8,"SAFCO Mobile Desk Side File, Wire Frame",Carl Jackson,613,127.70,42.76,6.22,Nunavut,Storage & Organization,
9,"SAFCO Commercial Wire Shelving, Black",Monica Federle,643,-695.26,138.14,35,Nunavut,Storage & Organization,
10,Xerox 198,Dorothy Badders,678,-226.36,4.98,8.33,Nunavut,Paper,0.38

By default, FS is whitespace. Let's first try extracting the 1st and 3rd columns with the default value of FS, without setting it to a comma.

$ awk '{ print $1, $3 }' input.csv 
1,"Eldon for
2,"1.7 Foot
3,"Cardinal Ring
4,R380,Clay and
5,Holmes Air
6,G.E. Indoor
7,"Angle-D with
8,"SAFCO Desk
9,"SAFCO Wire
10,Xerox Badders,678,-226.36,4.98,8.33,Nunavut,Paper,0.38

And now, using comma ( , ) as the field separator value.

$ awk 'BEGIN { FS = ","; } { print $1, $3 }' input.csv
1  platinum"
2 Barry French
3  Heavy Gauge Vinyl"
4 Clay Rozendal
5 Carlos Soltero
6 Carlos Soltero
7  Label Holders"
8  Wire Frame"
9  Black"
10 Dorothy Badders

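As we can see in the above output, awk uses the default value of OFS, which is a single space. Note also that a plain comma FS does not understand CSV quoting: in rows where the second column is a quoted string containing a comma, the split happens inside the quotes, so $3 is the tail of the product name rather than the customer name. We can override OFS with, say, a pipe ( | ), as shown in the example below: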

$ awk 'BEGIN { FS = ","; OFS = "|" } { print $1, $3 }' input.csv
1| platinum"
2|Barry French
3| Heavy Gauge Vinyl"
4|Clay Rozendal
5|Carlos Soltero
6|Carlos Soltero
7| Label Holders"
8| Wire Frame"
9| Black"
10|Dorothy Badders
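
The colon separator and the regular-expression separator mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming made-up passwd-style sample data rather than a real /etc/passwd; the variable names (sample, logins, second, rebuilt) are ours. It also shows the -F and -v command-line options as an alternative to a BEGIN block, and that OFS only shows up in a bare print once awk has rebuilt the record.

```shell
# Hypothetical passwd-style sample data (not taken from a real system).
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# -F: sets FS to a colon; -v OFS=... sets the output field separator.
# Print the login name ($1) and the login shell ($7).
logins=$(printf '%s\n' "$sample" | awk -F: -v OFS=' -> ' '{ print $1, $7 }')
printf '%s\n' "$logins"
# root -> /bin/bash
# alice -> /bin/zsh

# FS may also be a regular expression: split on either a comma or a semicolon.
second=$(printf 'a,b;c\n' | awk -F'[,;]' '{ print $2 }')
printf '%s\n' "$second"
# b

# A bare print outputs $0 unchanged, so OFS is not used unless the record
# is rebuilt. Assigning to a field (here the no-op $1 = $1) forces the rebuild.
rebuilt=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":"; OFS = "|" } { $1 = $1; print }')
printf '%s\n' "$rebuilt"
# root|x|0|0|root|/root|/bin/bash
# alice|x|1000|1000|Alice|/home/alice|/bin/zsh
```

The $1 = $1 assignment changes no data by itself; it is simply a common way to make awk re-join the fields with OFS.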





RS (Record Separator) and ORS (Output Record Separator)

  • RS and ORS are useful while dealing with multi-line records, where each field of a record sits on its own line.
  • Default value of both these variables is a newline character ( \n ).
  • With the ORS value overridden, we can tell awk to terminate output records with some string other than the newline.
Let's take a look at our demo file, in which records are separated by a blank line ( \n\n ) and each field within a record is separated by a single newline character ( \n ).

$ cat address.txt 
Cecilia Chapman
711-2880 Nulla St.
Mankato Mississippi 96522
(257) 563-7401

Iris Watson
P.O. Box 283 8562 Fusce Rd.
Frederick Nebraska 20620
(372) 587-2335

Celeste Slater
606-3727 Ullamcorper. Street
Roseville NH 11523
(786) 713-8616

Theodore Lowe
Ap #867-859 Sit Rd.
Azusa New York 39531
(793) 151-6230

Now, to display each person's name ( $1 ) and phone number ( $4 ) together on one line per record ( ORS stays \n, while RS is \n\n ), we can use the command below:

$ awk ' BEGIN { FS = "\n"; RS = "\n\n"; ORS = "\n" } { print $1, $4 } ' address.txt 
Cecilia Chapman (257) 563-7401
Iris Watson (372) 587-2335
Celeste Slater (786) 713-8616
Theodore Lowe (793) 151-6230
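
A multi-character RS such as "\n\n" works in gawk and mawk, which treat it as a regular expression, but POSIX leaves the result unspecified when RS has more than one character. A more portable sketch, assuming the same address layout, is to set RS to the empty string, which puts awk in paragraph mode so that blank lines separate records. The example below also overrides OFS and ORS to make their effect visible; the sample is the first two records of address.txt and the variable names are ours.

```shell
# First two records of address.txt, fed to awk on standard input.
addresses='Cecilia Chapman
711-2880 Nulla St.
Mankato Mississippi 96522
(257) 563-7401

Iris Watson
P.O. Box 283 8562 Fusce Rd.
Frederick Nebraska 20620
(372) 587-2335'

# RS = ""  : paragraph mode, one or more blank lines separate records.
# FS = "\n": every line of a record becomes one field.
# OFS      : placed between the name and the phone number.
# ORS      : written after each record, here a newline plus a dashed line.
contacts=$(printf '%s\n' "$addresses" |
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = ": "; ORS = "\n---\n" } { print $1, $4 }')
printf '%s\n' "$contacts"
# Cecilia Chapman: (257) 563-7401
# ---
# Iris Watson: (372) 587-2335
# ---
```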

NF (Number of Fields) and NR (Number of Records)

  • The awk variable NF holds the number of fields in the current record ( $0 ).
  • If we increase the value of NF, awk adds empty fields at the end and rebuilds the record with the fields joined by the value of OFS.
  • Whereas, when we decrease the value of NF, all the fields numbered higher than the new value are discarded.
  • NR is the variable that stores the number of the record currently being processed by awk.
  • There is another variable, FNR, which is useful while dealing with multiple files. It stores the position of the record relative to the current file only.
Let's take a look at the demo file below to illustrate this. If you observe, it has a different number of fields in each record.

$ cat cities.txt 
Washington 18 23 21 19
London 10 7 13 5 -1
Moscow 2 0 -3
Mumbai 24 27

Now, we print the number of fields in each record before the record itself, using the command below:

$ awk '{print NF, $0}' cities.txt 
5 Washington 18 23 21 19
6 London 10 7 13 5 -1
4 Moscow 2 0 -3
3 Mumbai 24 27
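
The effect of assigning to NF, described in the list above, can be sketched on one line of cities.txt. This is a minimal example with variable names of our own choosing; the results in the comments are what gawk and mawk produce, and some other awk versions do not rebuild $0 when NF is decreased. It also uses $NF, which refers to the last field of the record.

```shell
# One line from cities.txt, fed to awk on standard input.
line='London 10 7 13 5 -1'

# $NF is the last field, however many fields the record has.
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
printf '%s\n' "$last"
# -1

# Decreasing NF discards fields 4 and above and rebuilds $0.
trimmed=$(printf '%s\n' "$line" | awk '{ NF = 3; print }')
printf '%s\n' "$trimmed"
# London 10 7

# Increasing NF appends empty fields; the rebuilt record is joined with OFS.
padded=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { NF = 8; print }')
printf '%s\n' "$padded"
# London,10,7,13,5,-1,,
```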

To illustrate the use of NR, we use the same file again. It's pretty straightforward.

$ awk '{print NR, $0}' cities.txt 
1 Washington 18 23 21 19
2 London 10 7 13 5 -1
3 Moscow 2 0 -3
4 Mumbai 24 27

In case there are multiple files, we can print the record number relative to the current input file being processed using the variable FNR.

$ awk '{print FNR, $0}' cities.txt address.txt 
1 Washington 18 23 21 19
2 London 10 7 13 5 -1
3 Moscow 2 0 -3
4 Mumbai 24 27
1 Cecilia Chapman
2 711-2880 Nulla St.
3 Mankato Mississippi 96522
4 (257) 563-7401
5 
6 Iris Watson
7 P.O. Box 283 8562 Fusce Rd.
8 Frederick Nebraska 20620
9 (372) 587-2335
10 
11 Celeste Slater
12 606-3727 Ullamcorper. Street
13 Roseville NH 11523
14 (786) 713-8616
15 
16 Theodore Lowe
17 Ap #867-859 Sit Rd.
18 Azusa New York 39531
19 (793) 151-6230

Observe the line that follows record 4, which is the first line of address.txt. Awk has numbered it 1 because FNR restarts for each input file. Had we used NR here, it would have been numbered 5. You can check this for yourself by replacing FNR with NR in the command above.
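
The difference is easiest to see with NR and FNR printed side by side. The sketch below assumes two throwaway files, a.txt and b.txt, created in a temporary directory just for this illustration; they are not part of the demo files used so far.

```shell
# Create two small hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Washington\nLondon\n' > "$tmpdir/a.txt"
printf 'Moscow\nMumbai\n' > "$tmpdir/b.txt"

# NR keeps counting across all input files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR, $0 }' "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$counts"
# 1 1 Washington
# 2 2 London
# 3 1 Moscow
# 4 2 Mumbai

# Clean up the temporary files.
rm -rf "$tmpdir"
```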

That's it for the scope of this article. Please share your feedback and suggestions in the comments section below and stay tuned for more articles. Thanks for reading.
