Awk Field Separator and Field References

This is the third article from our tutorial series on
awk. The first article was an introduction to awk, and in the second one we created a Hello world program in awk. In this article, we will learn about separating fields and referencing them using awk.

Referencing Fields and Records
In the first article of this tutorial series, Introduction to awk, we covered the following points:
- awk presumes that the input is structured data.
- It interprets each line from the input file(s) as a record.
- Each line has strings/words separated (or delimited) by whitespace or some other character. These separators are referred to as delimiters.
- Each string/word separated by a delimiter is called a field.
Let's consider a familiar example to understand records, fields and delimiters, the /etc/passwd file:

messagebus:x:107:111::/var/run/dbus:/bin/false
uuidd:x:108:112::/run/uuidd:/bin/false
sshd:x:110:65534::/var/run/sshd:/usr/sbin/nologin
foouser:x:1001:1001:,,,:/home/foouser:/bin/bash
In the above file, each line is interpreted as a record. The strings within a line are separated by a colon ( : ), so the colon is the delimiter, and each string between delimiters, i.e. foouser, 1001, /bin/bash, etc., is a field.

In awk, we reference each field using the $ operator followed by a number or an awk variable. We will learn more about awk variables in later articles, to keep things simple here. Thus, we can reference the first field of the record using $1, the second field with $2, the third field with $3 and so on. $0 references the whole record (the input line).

Let's take a look at the following example. We have an input file result.txt with contents as below [snipped]:

Student Subject Marks
James Biology 31
Velma Biology 43
Kibo Biology 81
Louis Biology 11
Phyllis Biology 18
Zenaida Biology 55
Gillian Biology 38
Constance Biology 16
Giselle Biology 73
We can see that the excerpt has 10 records (including the header line) and each record has 3 fields. Now we refer to each record and every field with their respective identifiers.
# Referencing first field
$ awk '{ print $1 }' result.txt
Student
James
Velma
Kibo
...
...

# Referencing second field
$ awk '{ print $2 }' result.txt
Subject
Biology
Biology
Biology
...
...

# Referencing third field
$ awk '{ print $3 }' result.txt
Marks
31
43
81
...
...

# Referencing all fields
$ awk '{ print $3, $1, $2 }' result.txt
Marks Student Subject
31 James Biology
43 Velma Biology
81 Kibo Biology
...
...

# Referencing a record
$ awk '{ print $0 }' result.txt
Student Subject Marks
James Biology 31
Velma Biology 43
Kibo Biology 81
...
...
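The number after $ does not have to be a literal; it can also come from a variable. The following is a minimal sketch, assuming a three-line sample in the same layout as result.txt that is piped in with printf instead of being read from the real file. The variable name n is made up for illustration.

```shell
# Sample input in the same layout as result.txt (illustrative excerpt, piped in via printf)
sample='Student Subject Marks
James Biology 31
Velma Biology 43'

# n holds a field number, so $n refers to the third field of each record
out=$(printf '%s\n' "$sample" | awk '{ n = 3; print $n, $1 }')
printf '%s\n' "$out"
```

For each record this prints the third field followed by the first, the same result as writing $3, $1 directly.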
Field Separator
In the above example, we have not specified any field separator or delimiter in the awk command. By default, awk treats whitespace (one or more spaces or tabs) as the field separator. awk also allows us to set a field separator of our own choice with the -F option followed by the delimiter. Let's check this with the /etc/passwd file, which has fields delimited by a colon.

# /etc/passwd file contents (snipped)
$ cat /etc/passwd
...
messagebus:x:107:111::/var/run/dbus:/bin/false
uuidd:x:108:112::/run/uuidd:/bin/false
sshd:x:110:65534::/var/run/sshd:/usr/sbin/nologin
foouser:x:1001:1001:,,,:/home/foouser:/bin/bash
...

$ awk -F ':' '{ print $3, $1, $7 }' /etc/passwd
...
107 messagebus /bin/false
108 uuidd /bin/false
110 sshd /usr/sbin/nologin
1001 foouser /bin/bash
...
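The difference between the default separator and an explicit one can be seen on single lines. This is a small sketch with input piped in through printf rather than read from a file; the irregular spacing in the first line is made up to show how whitespace is handled.

```shell
# Default separator: leading blanks are ignored and runs of spaces/tabs count as one separator
out_default=$(printf '  James   Biology\t31\n' | awk '{ print $1, $3 }')
printf '%s\n' "$out_default"

# With -F ':' every colon separates two fields, so the empty fifth field is kept
# and the login shell is still found in $7
out_colon=$(printf 'messagebus:x:107:111::/var/run/dbus:/bin/false\n' | awk -F ':' '{ print "[" $5 "]", $7 }')
printf '%s\n' "$out_colon"
```

The square brackets around $5 are only there to make the empty field visible in the output.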
While writing an awk script, we can change the field separator by using the awk variable FS. We need to instruct awk to use the custom delimiter before it starts reading lines from the input file. Here, the BEGIN block comes in handy. The BEGIN block is executed before any input lines are read. Similarly, we have the END block, which is executed once all the lines from the input file have been read. Both BEGIN and END blocks are optional.

So, we can write an awk script passwd.awk as:

BEGIN { FS = ":" }
{ print $3, $1, $7 }
As covered in our first tutorial, we can run the instructions from this script using the -f option as below:

$ awk -f passwd.awk /etc/passwd
...
107 messagebus /bin/false
108 uuidd /bin/false
110 sshd /usr/sbin/nologin
1001 foouser /bin/bash
...
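To see when the BEGIN and END blocks run relative to the main block, here is a minimal sketch that uses two records from the /etc/passwd excerpt, piped in so that it does not depend on the local file. The header text and the closing "done" line are made up for illustration.

```shell
# Two records from the /etc/passwd excerpt, piped in so the sketch is self-contained
out=$(printf '%s\n' \
  'sshd:x:110:65534::/var/run/sshd:/usr/sbin/nologin' \
  'foouser:x:1001:1001:,,,:/home/foouser:/bin/bash' |
awk '
BEGIN { FS = ":"; print "UID NAME SHELL" }   # runs once, before the first line is read
      { print $3, $1, $7 }                   # runs for every input line
END   { print "done" }                       # runs once, after the last line is read
')
printf '%s\n' "$out"
```

The header line appears first and "done" appears last, regardless of how many input lines there are in between.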
To make the output more readable, we can introduce a tab ( \t ) character between the output fields.

$ cat passwd.awk
BEGIN { FS = ":" }
{ print $3 "\t" $1 "\t" $7 }

$ awk -f passwd.awk /etc/passwd
107     messagebus      /bin/false
108     uuidd   /bin/false
110     sshd    /usr/sbin/nologin
1001    foouser /bin/bash
By default, all the instructions from the script are executed on every line of the input file. To execute them only on selected lines, we can introduce pattern matching by enclosing a regular expression within slashes ( /[REGEX]/ ). The instructions from the awk script are then executed only on those lines matching the regex.

To verify this, we use our result.txt file again. From the entire list of students and their marks in various subjects, we can filter the records of students who got 50 marks, whichever the subject may be, by using 50 as the pattern to match. Note that the pattern is matched against the whole line, so /50/ selects any record containing the string 50 anywhere, not only records whose marks are exactly 50:

$ awk ' /50/ {print $1"\t"$2"\t"$3} ' result.txt
Ori Chemistry 50
Hyatt Mathematics 50
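A bare /50/ is tested against the whole record, so it would also select a line whose marks are 150. If only marks of exactly 50 are wanted, one option is to compare the third field with the number instead. A minimal sketch, assuming two sample rows piped in with printf; the second row is made up for illustration and is not part of result.txt.

```shell
# The 150 row is made up for illustration; it is not in result.txt
rows='Ori Chemistry 50
Sample Physics 150'

# /50/ matches the string 50 anywhere in the record, so both rows are selected
loose=$(printf '%s\n' "$rows" | awk '/50/ { print $1 "\t" $3 }')

# $3 == 50 compares the third field with the number 50, so only Ori is selected
exact=$(printf '%s\n' "$rows" | awk '$3 == 50 { print $1 "\t" $3 }')

printf '%s\n' "$loose" "$exact"
```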
Or we can filter only the records of students whose names start with the string Jo. For this, we can use the regex ^Jo with the tilde ( ~ ) operator to match against the first field ( $1 ), which is the name of the student.

$ awk ' $1 ~ /^Jo/ { print $1"\t"$2"\t"$3 }' result.txt
John    Biology 55
Jonas   Mathematics     40
Or we can negate the match by combining the tilde with the bang or logical not operator ( ! ), giving !~, as shown below (the result is too long, hence not shown):
$ awk ' $1 !~ /^Jo/ { print $1"\t"$2"\t"$3 }' result.txt
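Since the full output is long, the effect of ~ and !~ can be checked on a few rows instead. This is a small sketch that assumes a three-row sample piped in with printf; the rows are taken from the excerpts shown earlier, not from the complete result.txt.

```shell
# Three sample rows taken from the excerpts in this article
rows='James Biology 31
John Biology 55
Jonas Mathematics 40'

# Names that start with Jo
match=$(printf '%s\n' "$rows" | awk '$1 ~ /^Jo/ { print $1 }')

# Names that do not start with Jo
nomatch=$(printf '%s\n' "$rows" | awk '$1 !~ /^Jo/ { print $1 }')

printf '%s\n' "$match" "$nomatch"
```

Every record is selected by exactly one of the two patterns, so together the two commands cover the whole input.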
That's all for the scope of this article. Please share your feedback and suggestions in the comments section below and stay tuned for more articles on this topic.