In some of our articles we have learned about one of the most important and useful utilities in Linux for text processing - sed. Using sed, we can do the following (a quick example follows the list):
- Edit one or more files in place
- Simplify and automate file edits on one or more files at a time without using vi
- Write scripts to process and convert text
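As a quick illustration of the first point (a minimal sketch - notes.txt is just a hypothetical file name, and the -i.bak form is GNU sed syntax):

$ sed -i.bak 's/colour/color/g' notes.txt

This edits notes.txt in place, replacing every occurrence of "colour" with "color", and keeps the original as notes.txt.bak.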
sed: sed command in Linux

In this article, we are introducing you to another immensely powerful text processing utility in Bash - awk.
awk programming is very useful for manipulating structured data, especially for creating summarized reports out of data that has some structure (like a table, with columns and rows). Below are some of the common tasks which we can perform using awk (a small taste of this follows the list):
- Interpret a text file as a SQL database and process fields and records
- Perform arithmetic and logical operations
- Perform string operations
- Create and use conditionals and loops
- Define and use functions
- Execute a shell command and process its output
- Extract, analyze and create reports from the data
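For instance, assuming a hypothetical scores.txt with a name in the first column and a mark in the second, a single awk command can filter, do arithmetic and print a tiny report (just a sketch of the kind of thing later articles will cover):

$ awk '$2 > 50 { total += $2; count++ } END { if (count) print "passed:", count, "average:", total/count }' scores.txt

Here the pattern $2 > 50 selects records, the action accumulates totals, and the END block prints the summary once all lines have been processed.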
With everything awk offers us, we can replace an entire shell script with an awk one-liner.

How sed and awk operate

awk and sed both work in a similar way:

- Read an input line from a file
- Store it in a buffer (or create a copy of it, whatever is easier to understand)
- Run instructions from the script on the buffered (or copied) input line. This won't actually change the original input line.
- Replace the original data with processed data (in case of sed).
sed and awk Instructions

- A sed and an awk instruction each consist of two components - a pattern and a process
- The pattern part is a regular expression
- The process part is the action we wish to perform

This is how sed and awk work through a file:
- It reads the first line from the input file and the first instruction from the script.
- It then matches the pattern against the line.
- If there is no match, the current instruction is skipped and the next one is picked up.
- If a match is found, the instruction is executed on the line.
- Once all the instructions have been executed on the current line, the cycle repeats for all the other lines in the input file.
- As soon as all instructions are executed on the current line, sed prints the output. However, awk's behaviour is a bit different, as the subsequent instructions in the script control further processing on the line.
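To make this concrete (a small sketch only - input.txt is a hypothetical file), the awk command below holds two instructions, and every input line is tested against both patterns; only the lines awk is explicitly told to print appear in the output:

$ awk '/error/ { print "E:", $0 } /warning/ { print "W:", $0 }' input.txt

By contrast, a sed command like the one below prints every line of input.txt after applying the substitution, whether or not the pattern matched:

$ sed 's/error/ERROR/' input.txt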
How to use awk?

Till now, it is clear that both sed and awk run instructions on a single line from the input file at a time. We can provide these instructions either on the command line itself, or we can store them in a file. Depending on the scenario, we have two syntaxes for using the awk command:

When using the command line:
$ awk '<instructions>' <input_file/s>
When using a script file:
$ awk -f <script_file> <input_file/s>
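For example (a minimal sketch - marks.awk and data.txt are hypothetical names), the script file simply holds the same pattern/instruction pairs we would otherwise type on the command line:

$ cat marks.awk
/Biology/ { print $1, $3 }

$ awk -f marks.awk data.txt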
The awk instructions in the script file have the same components we discussed earlier - pattern and process. The latter component is a bit more complex than in sed, as it can have variables, conditionals, loops, functions, etc.
awk works better on structured data, so it considers each line from the input file(s) as a record. Being structured data, each line will have strings/words delimited or separated by spaces, tabs or some other character (a comma in the case of CSV files). awk interprets each of those strings/words as a field. We can reference these fields using their column numbers as $1, $2, $3 and so on, where $1 represents the first field of the record, $2 the second field, and so on. $0 is used to reference the whole line or record. Let's understand this from the example below.

Consider that we have an input file with contents as below (snipped):
$ head result.txt
Student Subject Marks
James Biology 31
Velma Biology 43
Kibo Biology 81
Louis Biology 11
Phyllis Biology 18
Zenaida Biology 55
Gillian Biology 38
Constance Biology 16
Giselle Biology 73
In order to print the first field of every line from the input file, we use the reference variable $1 as:

$ awk '{ print $1 }' result.txt
Student
James
Velma
Kibo
Louis
Phyllis
Zenaida
Gillian
Constance
Giselle
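Similarly, $0 refers to the whole record, so the command below simply reproduces every complete line (the same thing a bare { print } would do):

$ awk '{ print $0 }' result.txt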
Note that we have not used any pattern here, so the instruction will simply be executed on each and every line from the input file. In the next example, we take a look at using a regular expression to process only selected lines (those that match the pattern) from the input file. Consider that we just need to find the lines that match the pattern /Ki/ (remember to enclose any pattern inside forward slashes):

$ awk '/Ki/' result.txt
Kibo Biology 81
Kirsten Biology 16
Kieran Biology 45
Kitra Chemistry 47
In the above example, we have not mentioned any instruction, so the default is to print every line that matches the pattern.
In the next example, we will use both the pattern and the instruction. Let's say we want to extract the marks from those records that match the pattern /Ki/. Remember, marks is the 3rd field, so we should reference it accordingly:

$ awk '/Ki/ { print $3 }' result.txt
81
16
45
47
So far, we have not used delimiters in our examples, but just for your information, awk interprets whitespace as the delimiter by default. In order to specify a custom delimiter, say a comma for a CSV file, we can use the option -F (and not -f, which we use to provide a script file as input) as below:

$ awk -F, '/some_pattern/ {print $2}' input_csv_file
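For instance, assuming a hypothetical marks.csv laid out as name,subject,marks, the command below prints the subject column of the rows that mention Biology:

$ awk -F, '/Biology/ { print $2 }' marks.csv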
We can also use multiple instructions in the awk command. These instructions need to be separated by a semicolon, as shown below:

$ awk '/Ki/ { print $3; print $2; print $1 }' result.txt
81
Biology
Kibo
16
Biology
Kirsten
45
Biology
Kieran
47
Chemistry
Kitra
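If we would rather have those three fields on a single line per record, a single print with comma-separated arguments does the job (the comma inserts awk's output field separator, a space by default):

$ awk '/Ki/ { print $3, $2, $1 }' result.txt
81 Biology Kibo
16 Biology Kirsten
45 Biology Kieran
47 Chemistry Kitra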
Awesome! Don't worry if it's too much to take in. We have many articles coming up covering each topic in brief. Just give it a try with simple awk commands. You may face some errors while using awk; you just need to correct your syntax and you should be good. Normally, the causes of errors are:

- Not opening/closing the braces ({ })
- Not opening/closing the single quotes (' ')
- Not opening/closing the slashes for patterns (/ /)
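For example, the first command below is missing its closing brace, so awk will refuse to run it and report a syntax error, while the second one works as expected:

$ awk '/Ki/ { print $3' result.txt
$ awk '/Ki/ { print $3 }' result.txt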
Please put your comments or suggestions in the comments section below and stay tuned for more articles on Awesome awk!