


Saturday, 2 May 2015

Sed Command in Linux - Search and Replace Text in a File

This is the fourth article in the "Super sed" series, in which we will learn how to search for a pattern in a file and replace it with another, using regular expressions. In the previous articles on the sed command, we learned how to print lines in a file, delete lines from a file, and insert or append lines to a file.


Before we jump to the main content, every learner should know what sed is. Here is a brief introduction to the Super sed:
  • sed stands for Stream EDitor. It is based on the ed editor and borrows most of its commands from ed. It was developed by Lee E. McMahon of Bell Labs.
  • sed offers a large range of text transformations, including printing lines, deleting lines, editing files in place, search and replace, and appending and inserting lines.
  • sed is useful whenever you need to perform common editing operations on multiple lines without opening an editor such as vi.
  • Whenever sed is executed on an input file or on input from stdin, it reads the input line by line. It removes the trailing newline from each line and places the line in the "pattern space". The commands are executed on the pattern space once their conditions (such as a regex match) are satisfied, and the result is then printed to stdout.
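
The read, process, print cycle described in the last point can be observed directly. The following is a minimal sketch with made-up two-line input: an empty script changes nothing, so sed simply echoes every line, and -n switches that automatic printing off.

```shell
# Each input line is copied to the pattern space, the script runs,
# and the result is printed automatically.
# An empty script therefore echoes the input unchanged.
printf 'one\ntwo\n' | sed ''

# With -n the automatic print is suppressed, so the explicit p command
# is the only thing that produces output (here: only line 2).
printf 'one\ntwo\n' | sed -n '2p'
```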

Search and Replace a Pattern using sed

Before we start, just remember four points:
  1. The sed "s" command searches each line for a pattern and, where the pattern matches, replaces the matched text with the replacement given in the command. (We will see the syntax a bit later; for now, just remember that there is a search pattern and a replacement.)
  2. ^ means the beginning of a line and $ means the end of a line, so ^$ matches an empty line. This is very useful for deleting empty lines from a file.
  3. sed with option -i edits the file in place. Unless you use -i, the changes are only printed to stdout and are not written to the file.
  4. sed with option -n suppresses the automatic printing of the pattern space (also called the pattern buffer). We will use this option together with the p flag (How and why? Check this article).
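
Points 2 and 3 can be tried safely on a scratch file. This is only a sketch: the file name inplace_demo.txt is made up for the illustration, and the -i.bak form (which keeps a backup copy) is used because a bare -i behaves differently between GNU sed and BSD sed.

```shell
# Create a throwaway file with two empty lines in it.
printf 'first\n\nsecond\n\n' > inplace_demo.txt

# Without -i the result only goes to stdout; the file is untouched.
sed '/^$/d' inplace_demo.txt

# With -i.bak the file itself is rewritten, and the original content
# is kept as inplace_demo.txt.bak.
sed -i.bak '/^$/d' inplace_demo.txt
cat inplace_demo.txt
```

Both commands print the two non-empty lines, but only the second one changes the file on disk.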
For the examples, let us use a file sedtest.txt with the following contents:

$ cat sedtest.txt
This is line #1
It is line #2
That is line #3
While, this is line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10
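
If you want to follow along, one way to create the sample file is a here-document. This is just a convenience sketch; any editor will do as well.

```shell
# Recreate the sample file exactly as listed above.
# The quoted 'EOF' stops the shell from expanding anything in the body.
cat > sedtest.txt <<'EOF'
This is line #1
It is line #2
That is line #3
While, this is line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10
EOF
```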

Syntax for Search and Replace:

The syntax sed uses for 'search and replace' might seem a bit confusing initially, but it's pretty straightforward once understood.

sed 's/SEARCH/REPLACE/OPTIONS' FILE.txt
Let's look deeper at the above syntax:
  • s - The substitute command. As mentioned earlier, it searches for a pattern and replaces what it matches.
  • / - The delimiter. Any other single character can be used in place of /; common choices are # % @ :.
  • SEARCH - What are you searching for? The search pattern goes here.
  • REPLACE - Once the SEARCH pattern is found, what should it be replaced with? The replacement goes here.
  • OPTIONS - To make your work more comfortable, the s command comes with the following options:
    • i : Makes the search for the SEARCH pattern case-insensitive (a GNU sed extension, also written I).
    • g : Replaces ALL the matches of SEARCH on a line with REPLACE.
    • n : A number. Unlike g, it replaces only the 'n'th occurrence of the pattern on each line.
    • w : As mentioned above, to edit the original file in place you have to use the option -i. To write the changed lines to another file instead, keeping the original file intact, you can use the w option followed by a file name. Of course, redirection (> or >>) can be used to save the complete output instead.
    • p : Prints the line if a replacement was made. Combined with -n, it displays only those lines in which replacements were made.
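
Two of the points above are easier to see with a quick experiment: an alternative delimiter, and options used together. The input strings are made up for the illustration, and the I flag assumes GNU sed.

```shell
# Any single character can replace / as the delimiter, which avoids
# escaping every slash when the pattern is a path.
echo '/usr/local/bin' | sed 's#/usr/local#/opt#'     # prints /opt/bin

# Options can be combined: I (case-insensitive, GNU sed) together with g.
echo 'Line one, line two' | sed 's/line/row/Ig'      # prints row one, row two
```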
Let us now begin with the practical examples.

1. sed s command with no option specified

When we do not specify any of the options, only the first match on each line is replaced. Have a look at the example below, in which we replace the letter i with X. Here, SEARCH is the letter 'i', REPLACE is the letter 'X', and no option is given.

Example:

$ sed 's/i/X/' sedtest.txt
ThXs is line #1
It Xs line #2
That Xs line #3
WhXle, this is line #4
It's lXne #5
I am lXne #6
Myself lXne #7
It's me, lXne #8
Hello, I am lXne #9
Last lXne, line #10
So, from the above output, we can conclude that only the first occurrence of the letter 'i' on each line is replaced, while the later occurrences of 'i' on the same line remain unchanged.

2. Replace 'n'th occurrence of a word/string

Suppose that, in the above file sedtest.txt, we need to replace the 2nd occurrence of the string is on each line with the string XX. In this case, SEARCH = is, REPLACE = XX, and we use the option 2 to indicate that we are interested in the 2nd occurrence only.

Example:

$ sed 's/is/XX/2' sedtest.txt
This XX line #1
It is line #2
That is line #3
While, this XX line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10
If you look at lines 1 and 4, sed has skipped the first occurrence of the pattern 'is' (inside the word This or this) and replaced the 2nd one with 'XX'. The other lines contain 'is' at most once, so they are unchanged.
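
A number can also be combined with g. As far as I know this combination is a GNU sed extension, so treat the sketch below as GNU-specific; the input string is made up.

```shell
# A number alone picks exactly one occurrence on the line.
echo 'is is is is' | sed 's/is/XX/2'     # prints is XX is is

# A number followed by g (GNU sed) replaces that occurrence
# and every later one on the same line.
echo 'is is is is' | sed 's/is/XX/2g'    # prints is XX XX XX
```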

3. Replace all the occurrences of a word/string

Now, let us replace all the occurrences of the word 'line' with the word 'sentence'. For this purpose, SEARCH = line, REPLACE=sentence and the option would be g indicating that, we are interested in all the occurrences.

Example:

$ sed 's/line/sentence/g' sedtest.txt
This is sentence #1
It is sentence #2
That is sentence #3
While, this is sentence #4
It's sentence #5
I am sentence #6
Myself sentence #7
It's me, sentence #8
Hello, I am sentence #9
Last sentence, sentence #10
So, as expected, all the lines are replaced by sentences.
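
One caveat with g: a pattern matches anywhere in the text, including inside longer words. The sketch below reuses line 4 of the sample file and assumes GNU sed, where \b matches at a word boundary.

```shell
# Plain 'is' also matches inside "this".
echo 'While, this is line #4' | sed 's/is/XX/g'        # prints While, thXX XX line #4

# \b (GNU sed) anchors the match at word boundaries, so only the
# standalone word "is" is replaced.
echo 'While, this is line #4' | sed 's/\bis\b/XX/g'    # prints While, this XX line #4
```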

4. Display only the altered lines using p

As mentioned earlier, the p option of the s command prints the lines in which substitutions are made. By default, sed also prints the pattern space for every line, so a changed line appears twice; to suppress the automatic printing we use -n.

Example:
Without -n,

$ sed 's/While/Whereas/gp' sedtest.txt
This is line #1
It is line #2
That is line #3
Whereas, this is line #4
Whereas, this is line #4
It's line #5
I am line #6
Myself line #7
It's me, line #8
Hello, I am line #9
Last line, line #10
Could you find 2 'Whereas's?

With -n,

$ sed -n 's/While/Whereas/gp' sedtest.txt
Whereas, this is line #4
So, the conclusion is: with -n, the p option prints only the lines where a replacement has been made.

5. Dump the substitutions into a file

This is similar to option p, but where p displays the changed lines on STDOUT, option w writes them to a file. In this case, we need to give a file name, say changes.txt, to which the changes are written.

Example:

$ sed -n 's/While/Whereas/gpw changes.txt' sedtest.txt
Whereas, this is line #4
$
$ cat changes.txt
Whereas, this is line #4
Here, with p and w together, the lines displayed by p are also written to the output file.
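
The w option also works on its own, without -n and p. In the sketch below the file names wdemo.txt, changed.txt and full.txt are made up for the illustration. Note that everything after w up to the end of the command is taken as the file name, so w has to be the last option.

```shell
# A small two-line input file for the demonstration.
printf 'While, this is line #4\nIt is line #5\n' > wdemo.txt

# All lines go to stdout (redirected to full.txt here), while only the
# lines that s actually changed are written to changed.txt.
sed 's/While/Whereas/w changed.txt' wdemo.txt > full.txt

cat changed.txt    # prints Whereas, this is line #4
```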

6. Delete words/strings using sed

Just imagine, what would happen if we use nothing as a REPLACE pattern?

$ sed 's/line//g' sedtest.txt
This is  #1
It is  #2
That is  #3
While, this is  #4
It's  #5
I am  #6
Myself  #7
It's me,  #8
Hello, I am  #9
Last ,  #10
Whoops! The lines are gone.

Similarly, instead of deleting an entire string, if we wish to delete the last 2 characters of every line, our SEARCH pattern will be ..$, where each period [.] (intentionally used square brackets ;)) matches any single character and $ matches the end of the line. Combined, they mean "the last two characters of a line".

Example:

$ sed 's/..$//g' sedtest.txt
This is line 
It is line 
That is line 
While, this is line 
It's line 
I am line 
Myself line 
It's me, line 
Hello, I am line 
Last line, line #
So, this time, the line numbers are gone. Note that on the last line only the two digits of 10 are removed, so the # remains.
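
Because ..$ always removes exactly two characters, it only strips the whole "#N" suffix when N has a single digit. A pattern that describes the suffix itself is more robust. The following sketch uses two lines from the sample file as input.

```shell
# ' #[0-9]*$' matches a space, a '#', and all the digits up to the end
# of the line, so it works no matter how many digits the number has.
printf 'It is line #2\nLast line, line #10\n' | sed 's/ #[0-9]*$//'
# prints:
# It is line
# Last line, line
```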

7. Replace only if line contains a pattern

Let's say I wish to replace the word line with statement, but only on lines that contain the word Myself. If you have read the previous articles on sed, you could probably do this exercise on your own.

$ sed '/Myself/s/line/statement/g' sedtest.txt
This is line #1
It is line #2
That is line #3
While, this is line #4
It's line #5
I am line #6
Myself statement #7
It's me, line #8
Hello, I am line #9
Last line, line #10
See statement #7, line is a statement now.
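
The part in front of s is an ordinary sed address, so it does not have to be a pattern. A line number works too, and ! inverts the selection. The three-line input below is made up for the illustration.

```shell
# A line number as the address: substitute on line 2 only.
printf 'a line\nMyself line\nb line\n' | sed '2s/line/statement/'
# prints:
# a line
# Myself statement
# b line

# ! negates the address: substitute on every line that does NOT contain Myself.
printf 'a line\nMyself line\nb line\n' | sed '/Myself/!s/line/statement/'
# prints:
# a statement
# Myself line
# b statement
```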

8. Multiple actions in sed command

This time, the scenario is: I have a text file that is heavily commented. I do not want to read those comments, and I want those lines deleted. So, we first remove all the characters of every line that starts with a #, which leaves an empty line. Then, to delete that empty line, we need the d command of sed.

In this case, the SEARCH pattern is ^#.*, which matches a whole line that begins with a #, and the REPLACE pattern is empty. We have already seen how to delete empty lines using sed with the d command.

Example:
$ sed 's/^#.*//g; /^$/d' FILE.txt
Just try this.
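
If you do not have a commented file at hand, here is a small sketch to try it on. The file commented.conf and its contents are made up for the illustration. It also shows one side effect worth knowing: the two-step command removes lines that were empty to begin with, whereas deleting the comment lines directly with an address keeps them.

```shell
# A hypothetical config file with comments and one blank line.
cat > commented.conf <<'EOF'
# main settings
port=8080

# logging
level=info
EOF

# Two commands separated by ';': blank out the comment lines, then delete
# every empty line (including the one that was empty from the start).
sed 's/^#.*//; /^$/d' commented.conf
# prints:
# port=8080
# level=info

# Deleting the comment lines directly keeps the original blank line.
sed '/^#/d' commented.conf
```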

That is all for the fourth article on the sed command. More articles on sed are coming soon, so stay tuned. Of course, do not forget to share your feedback in the comment section below.
