Overview
Sed stands for (s)tream (ed)itor and is useful for taking any input you have and modifying it in some way before sending it along as output, either to the console or as input to the next command in your pipeline. It operates on each line of your input, executing some command on each as the input is read.
Chances are, if you’ve encountered sed before, it came from some sage Linux master on Stack Overflow answering a bash scripting question.
If you’ve used sed yourself for some task, it was likely the substitute command to find and replace some search term.
You may have seen a command like this before:
printf 'this is line one\nthis is line two\n' | sed 's/line/snake/'
this is snake one
this is snake two
However, sed can do so much more, if only you spend the time to figure out what on earth it’s talking about in its manual. In general, a command is defined in the man page as follows:
[address[,address]]function[arguments]
But to unpack this fully, we will need to clear up some strange terminology.
Terminology
- Address
  If all you’ve ever used sed to do is substitute terms using s/term/replacement/, the concept of an address might not make sense at first. If one is provided, it comes before the function (s for substitute, in this case). An address restricts which lines of the input sed will actually operate on. You can also optionally provide a second address to create a range of lines for sed to operate over.
  An address can be one of three things:
  - a number representing which line of input to act on, e.g. 3,7s/apple/banana/ for lines 3 through 7
  - the $ sign, meaning the last line of input
  - a regex, meaning only lines of input which match the regex will be passed along to the function
  In this example, we’re telling sed to execute the substitute command only on lines 2 through 4. For lines within that range, the provided function will be run. Lines outside the range are left untouched and printed as is.
  sed '2,4s/line/fish/'
  In the next example, $ is used as the terminating address to restrict the lines s operates on, here line 3 through the last line.
  sed '3,$s/foo/bar/'
  In this example, three lines contain the word two. If we just ran a substitute command without an address, two would be changed to snake on all of them. But because we are providing a regex address (restricting to lines containing apple), only line four (containing the word apple) actually has the text replaced.
  printf 'line one\nline two\nline two\napple two\n' | sed '/apple/s/two/snake/'
  line one
  line two
  line two
  apple snake
- Function
  The actual thing you want sed to do.
- Arguments
  Some functions take arguments. We’ll cover this a bit more when covering some of those functions.
- Cycle
  Because sed is line oriented, it filters input and executes functions on a per-line basis. Each round of reading in a line from input, checking if the line is within the given address, executing the function, and printing output constitutes one cycle.
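To make the three address forms concrete, here is a minimal sketch that runs each one against the same input. The five-line sample input and the variable names are assumptions for illustration, not taken from the man page.

```shell
# Hypothetical sample input: five numbered lines.
input=$(printf 'line 1\nline 2\nline 3\nline 4\nline 5')

# Numeric range: substitute only on lines 2 through 4.
range_out=$(printf '%s\n' "$input" | sed '2,4s/line/fish/')

# Number through $ (the last line): substitute from line 4 to the end.
last_out=$(printf '%s\n' "$input" | sed '4,$s/line/fish/')

# Regex address: substitute only on lines that match /3/.
regex_out=$(printf '%s\n' "$input" | sed '/3/s/line/fish/')

# Print the three results, each followed by a separator line.
printf '%s\n---\n' "$range_out" "$last_out" "$regex_out"
```

In each result, only the lines selected by the address read fish; every other line passes through unchanged.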
Functions
So far we’ve only shown examples using the (s)ubstitute function. There are many more. Let’s start with a simple one. The p function “Writes the pattern space to standard output.” Ok, hold up. What’s a pattern space? To answer that, we’ll need to dig a bit deeper into how sed works.
And to do that, let’s visualize some of the moving parts of a sed execution with the following input:
echo "This is the first line of text.
And this is another line.
Here's another for you.
This is another line with the word 'line' in it.
and this is the last line." > example.txt
Say we have the given sed command: sed '2,4s/line/REPLACED/'. The addresses are lines 2 through 4. The command is: substitute the text line with REPLACED.
line no. | pattern space | output |
---|---|---|
1 | This is the first line of text. | This is the first line of text. |
2 | And this is another line. | And this is another REPLACED. |
3 | Here's another for you. | Here's another for you. |
4 | This is another line with the word 'line' in it. | This is another REPLACED with the word 'line' in it. |
5 | and this is the last line. | and this is the last line. |
Let’s walk through each line sed read and operated on (remember this is called a cycle). Line one loaded the text into the pattern space, which is a temporary buffer that the given function operates on. Sed loads each line into the pattern space regardless of whether the line falls within the address (if provided).
Next, sed checks if the line is within the provided address (2,4 in this case). Because this is line 1, it does not fall within the address provided, so sed just prints it to output without operating on it.
After this, it clears the pattern space, and a new cycle begins by loading the next line of input into the pattern space.
Line 2 is within the address range provided, so the substitute function will run. We provided the arguments line and REPLACED to the function, and so the output reflects that substitution.
The same happens for line 3, except there was no match for the substitute to replace. Line 4 is the last line in the address range of 2,4, so line is replaced; however, you will notice only the first occurrence of line was replaced. By default, sed replaces only the first match on each line. This can be changed if you provide the g flag to the function: sed 's/foo/bar/g'. Line 5 is outside the address range, and so line is not replaced.
sed '2,4s/line/REPLACED/' example.txt
This is the first line of text.
And this is another REPLACED.
Here's another for you.
This is another REPLACED with the word 'line' in it.
and this is the last line.
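To see the effect of the g flag mentioned above, here is a small sketch that runs the same substitution with and without it. The input is based on the fourth line of the example file; the variable names are only for illustration.

```shell
# A single line containing the search term twice.
text="This is another line with the word 'line' in it."

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/line/REPLACED/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$text" | sed 's/line/REPLACED/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
# This is another REPLACED with the word 'line' in it.
# This is another REPLACED with the word 'REPLACED' in it.
```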
Additionally, sed’s default behavior of printing each line at the end of its cycle, changed or not, can be turned off with the -n flag.
With all of this, we’re now ready to look at the p function again. Consider our example again.
sed '2,4s/line/REPLACED/p' example.txt
This is the first line of text.
And this is another REPLACED.
And this is another REPLACED.
Here's another for you.
This is another REPLACED with the word 'line' in it.
This is another REPLACED with the word 'line' in it.
and this is the last line.
Here the trailing p is a flag on the s function rather than a separate command. It writes the pattern space to standard output only when s actually made a replacement, in addition to the default print at the end of the cycle. This is why lines 2 and 4 appear twice, while line 3 was not printed twice even though it was within the address range.
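For contrast, here is a sketch of p used on its own with an address instead of being attached to s. It recreates the example input in a temporary file; the mktemp path and the variable names are assumptions made so the snippet is self-contained.

```shell
# Recreate the example input in a temporary file (the path is arbitrary).
example=$(mktemp)
cat > "$example" <<'EOF'
This is the first line of text.
And this is another line.
Here's another for you.
This is another line with the word 'line' in it.
and this is the last line.
EOF

# p as its own function: every line in the 2,4 range is printed one extra
# time, including line 3, because no substitution is involved.
p_function_out=$(sed '2,4p' "$example")

printf '%s\n' "$p_function_out"
rm -f "$example"
```

The output has eight lines: lines 2, 3, and 4 each appear twice, once from p and once from the default print at the end of the cycle.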
Now let’s try it without printing by default.
sed -n '2,4s/line/REPLACED/p' example.txt
And this is another REPLACED.
This is another REPLACED with the word 'line' in it.
Is this what you expected? Hopefully by this point it makes sense. Lines are not printed by default because of the -n flag, and only lines where s made a replacement were printed by p.
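The same combination of -n and p works without any substitution at all. The sketch below, again using a temporary copy of the example input (the path and variable names are assumptions), prints a line range and then only the lines matching a regex address.

```shell
# Recreate the example input in a temporary file (the path is arbitrary).
example=$(mktemp)
cat > "$example" <<'EOF'
This is the first line of text.
And this is another line.
Here's another for you.
This is another line with the word 'line' in it.
and this is the last line.
EOF

# -n turns off the automatic print, so p is the only source of output:
# this prints lines 2 through 4 exactly once each.
range_only=$(sed -n '2,4p' "$example")

# With a regex address, only lines matching /last/ are printed.
matching=$(sed -n '/last/p' "$example")

printf '%s\n---\n%s\n' "$range_only" "$matching"
rm -f "$example"
```

The second command prints only "and this is the last line.", since that is the one line matching the address.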
Next time
The more I dug into sed, the more powerful I discovered it really was. This is only the beginning. Hopefully this alone was helpful for understanding the man page on sed.
Next time will be almost entirely examples of using all of the other functions sed provides and fun ways to combine them into surprisingly powerful commands.