In some of my recent articles on text processing, I have explained the use of the sed command in Linux/Unix. With the sed command, we provide an input file to the command; it reads the file line by line, processes each line and then prints it on STDOUT. So, in brief, it's a row-wise operation. The cut command is similar: there is an input file, there is a processing part, and the processed output can be displayed on STDOUT or saved in a file. A minor difference between the two is that the cut command processes the file in a vertical manner. So, the outcome of the cut command is a single column or multiple columns.
As of now, just remember that the cut command is just a filter that processes the file and extracts columns from it. Basically, using the cut command, we can process a file in order to extract either a column of characters or some fields. Thus, to achieve more clarity about the cut command, we will study it in two parts.
Here we go!
A. Extracting Column of Characters
To begin with, consider a file cuttest.txt with contents as below:

$ cat cuttest.txt
This is line #1
It is line #2
That is line #3
While, this is line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10

Now, just have a look at the basic syntax of the cut command to extract column(s) of characters from a file:
cut -c [RANGE] [FILENAME]
To explain this briefly, we are instructing the cut command to select only the characters specified by RANGE from the file FILENAME.
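If you wish to follow along, the sample file can be recreated with a here-document. This is only a small sketch, assuming a POSIX shell and a writable current directory; the variable name first_chars is made up for illustration. It then applies the syntax above with a RANGE of 1, i.e. the first character of every line.

```shell
# Recreate the sample file used in this article.
cat > cuttest.txt <<'EOF'
This is line #1
It is line #2
That is line #3
While, this is line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10
EOF

# cut -c [RANGE] [FILENAME] with RANGE = 1:
# the first character of each of the ten lines, one per output line.
first_chars=$(cut -c 1 cuttest.txt)
printf '%s\n' "$first_chars"
```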
1. Display a Column of Characters
To begin with, let's display the fourth character from each line of the file.

$ cut -c 4 cuttest.txt
s
i
t
l
s
m
e
s
l
t

This does make sense!
2. Display a Group of Columns of Characters
In order to extract a group of columns, we need to specify a range, with a start and an end, to the cut command. To try it out, let's display the first five characters of each line of the file.

$ cut -c 1-5 cuttest.txt
This
It is
That
While
It's
I am
Mysel
It's
Hello
Last

The conclusion is that a whitespace is also considered a character.
Another variant of this case is when you want to start from a particular column and display till the last one. As an example, we will display everything from the 6th column till the end. So, in this case, we mention the start of the range as '6' and we do not mention any end. Thus, it will print everything from the 6th column onwards.

$ cut -c 6- cuttest.txt
is line #1
 line #2
is line #3
, this is line #4
line #5
line #6
f line #7
me, line #8
, I am line #9
line, line #10

Similarly, to get the first 6 characters from the beginning of each line, we would have an example as follows:

$ cut -c -6 cuttest.txt
This i
It is
That i
While,
It's l
I am l
Myself
It's m
Hello,
Last l

Now, you might be curious: what if I mention neither the start nor the end of the range? Let's see what happens:

$ cut -c - cuttest.txt
cut: invalid range with no endpoint: -

Those who thought that all the columns would be printed are proved wrong. The conclusion is that there has to be a valid range.
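To put the four valid forms of a character range side by side, here is a small sketch that feeds a single line to cut through a pipe instead of a file. The variable names (line, single, closed, open_end, open_start) are made up for this illustration; the line itself is line #9 of cuttest.txt.

```shell
# One sample line, piped to cut on its standard input.
line='Hello, I am line #9'

single=$(printf '%s\n' "$line" | cut -c 4)       # only the 4th character
closed=$(printf '%s\n' "$line" | cut -c 1-5)     # characters 1 through 5
open_end=$(printf '%s\n' "$line" | cut -c 6-)    # character 6 up to the end of the line
open_start=$(printf '%s\n' "$line" | cut -c -6)  # start of the line through character 6

# Brackets make any leading or trailing whitespace visible.
printf '[%s]\n' "$single" "$closed" "$open_end" "$open_start"
```

The results agree with what we saw for line #9 in the examples above: l, Hello, the text starting at the comma, and Hello followed by the comma.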
B. Extracting Field from a File
In order to understand this usage of the cut command, let's consider a CSV file as follows:

$ cat employees.txt
Employee ID, Employee Name, Age, Gender, Department, Salary
101, John Davies, 35, M, Finance, $4000
102, Mary Fernandes, 29, F, Human Resources, $3000
103, Jacob Williams, 40, M, Sales, $4700
104, Sean Anderson, 25, M, Production, $2700
105, Nick Jones, 42, M, Finance, $7500
106, Diana Richardson, 29, F, Finance, $3200
Remember, in order to extract a field from a file, we need a delimiter (i.e. a column separator), based on which the file will be divided into columns so that we can extract any of them. In this case, the syntax would be:

cut -d [DELIMITER] -f [RANGE] [FILENAME]

Here, we are instructing the cut command to use a particular delimiter with option -d and then extract certain fields using option -f.
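The delimiter does not have to be a comma. As a quick sketch of the -d and -f options, assume a colon-separated record; the record and the variable names (record, name, group) are invented purely for this example.

```shell
# Hypothetical colon-separated record, made up for illustration.
record='alice:1001:staff:/home/alice'

name=$(printf '%s\n' "$record" | cut -d ':' -f 1)   # first field
group=$(printf '%s\n' "$record" | cut -d ':' -f 3)  # third field

printf 'name=%s group=%s\n' "$name" "$group"
```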
1. Display a specific field from a file
In case of a CSV file, it is clear that our delimiter will be a comma (,). Now, we need to list the names of the employees working in our organization, i.e. field number 2.

$ cut -d ',' -f 2 employees.txt
 Employee Name
 John Davies
 Mary Fernandes
 Jacob Williams
 Sean Anderson
 Nick Jones
 Diana Richardson

Each value keeps the space that follows the comma in the input, because the cut command splits on the comma alone.
2. Displaying Multiple Fields from a File
Moving forward, let's display more than one field. Suppose we need to include the 'Age' and 'Gender' fields also. For this, we must specify the range: again, a start and an end.

$ cut -d ',' -f 2-4 employees.txt
 Employee Name, Age, Gender
 John Davies, 35, M
 Mary Fernandes, 29, F
 Jacob Williams, 40, M
 Sean Anderson, 25, M
 Nick Jones, 42, M
 Diana Richardson, 29, F

The conclusion, in this case, is that Input Delimiter = Output Delimiter.
Let's have a look at a variant of this case. Suppose we need to extract 'Employee ID', 'Employee Name', 'Department' and 'Salary'. In that case, we need to specify two ranges as below:

$ cut -d ',' -f 1-2,5-6 employees.txt
Employee ID, Employee Name, Department, Salary
101, John Davies, Finance, $4000
102, Mary Fernandes, Human Resources, $3000
103, Jacob Williams, Sales, $4700
104, Sean Anderson, Production, $2700
105, Nick Jones, Finance, $7500
106, Diana Richardson, Finance, $3200

This is just awesome!
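The comma-separated list given to -f can mix ranges and single field numbers. One point worth knowing is that the cut command always writes the selected fields in the order they appear in the input, so listing them in a different order does not rearrange the columns. Here is a small sketch on one row of employees.txt; the variable names (row, ranges, picked, swapped) are made up for this example.

```shell
# One row from employees.txt; single quotes keep $4000 literal.
row='101, John Davies, 35, M, Finance, $4000'

ranges=$(printf '%s\n' "$row" | cut -d ',' -f 1-2,5-6)  # two ranges
picked=$(printf '%s\n' "$row" | cut -d ',' -f 1,6)      # two single fields
swapped=$(printf '%s\n' "$row" | cut -d ',' -f 6,1)     # same fields, listed in reverse

printf '%s\n' "$ranges" "$picked" "$swapped"
```

Here, picked and swapped hold the same text. If the columns really need to be reordered, a different tool (awk, for instance) would be required.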
3. Change the Delimiter in the Output
As we just saw in one of the examples above, by default, Input Delimiter = Output Delimiter. What if I wish to change the output delimiter? Just have a look at the example below:

$ cut -d ',' -f 2-4 --output-delimiter='|' employees.txt
 Employee Name| Age| Gender
 John Davies| 35| M
 Mary Fernandes| 29| F
 Jacob Williams| 40| M
 Sean Anderson| 25| M
 Nick Jones| 42| M
 Diana Richardson| 29| F
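The --output-delimiter option comes from GNU coreutils, so a cut command on some other Unix systems may not accept it. Assuming that situation, a rough equivalent is to pipe the output of cut through tr. This is only a sketch, and it works only as long as the fields themselves contain no commas; the variable names row and piped are made up for this example.

```shell
# One row from employees.txt.
row='101, John Davies, 35, M, Finance, $4000'

# Extract fields 2-4, then translate every remaining comma into a pipe.
piped=$(printf '%s\n' "$row" | cut -d ',' -f 2-4 | tr ',' '|')

printf '%s\n' "$piped"
```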
4. Do not Display Certain Columns
Just like the above example, if we use --complement as an option, the cut command will display all the fields except the specified one.

$ cut -d ',' --complement -f 6 employees.txt
Employee ID, Employee Name, Age, Gender, Department
101, John Davies, 35, M, Finance
102, Mary Fernandes, 29, F, Human Resources
103, Jacob Williams, 40, M, Sales
104, Sean Anderson, 25, M, Production
105, Nick Jones, 42, M, Finance
106, Diana Richardson, 29, F, Finance

That's all for this tutorial. Please do let me know your views on this article in the comment section below, and stay tuned for more awesome articles.