What Is the Best Way to Perform a Search and Replace On only Specific Sections of a File? - bash

I have a markdown file with sections separated by headings. I want to perform a search and replace only on specific sections; however, each section has similar content, so a global search and replace would end up affecting all sections. Because of this, I would need to somehow limit the search and replace to only certain sections of the file.
For example, say I wanted to replace all instances of foo with bar under # Section 1, # Section 3, and # Section 4, leaving # Section 2 and # Section 5 unchanged, as shown below:
Sample Input:
# Section 1
- foo
- foo
- Unimportant Item
- foo
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- foo
- Unimportant Item
# Section 4
- foo
- Unimportant Item
- foo
# Section 5
- foo
- foo
Sample Output:
# Section 1
- bar
- bar
- Unimportant Item
- bar
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- bar
- Unimportant Item
# Section 4
- bar
- Unimportant Item
- bar
# Section 5
- foo
- foo
If I didn't have to worry about the individual sections, a global search and replace would be trivial by using
sed -i 's/foo/bar/g' <input_file>
but I'm not sure if sed is capable of checking context to allow what I am looking for.

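Here's a sed version. It keeps the most recent header in the hold space and, on every other line, checks that header before substituting, so it also works when selected sections are adjacent:
sed -E '/^#/h; /^#/!{ x; /^#[^#]\s*Section\s+[134]\s*$/{ x; s/foo/bar/g; x; }; x; }' input.md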

You may use this awk:
awk 'p {sub(/foo$/, "bar")} /^#/ {p = / (Section [134])$/} 1' file
# Section 1
- bar
- bar
- Unimportant Item
- bar
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- bar
- Unimportant Item
# Section 4
- bar
- Unimportant Item
- bar
# Section 5
- foo
- foo
To make it more readable:
awk 'p {                        # if p is 1, i.e. we are inside a selected section
    sub(/foo$/, "bar")          # replace foo with bar
}
/^#/ {                          # if the line starts with #
    p = / (Section [134])$/     # set p to 1 if this header is a selected section, else 0
}
1                               # always true, so every line is printed
' file
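If the set of sections changes often, the same awk logic can take the section numbers as a variable instead of hard-coding them in the regex. This is a minimal sketch, assuming headers look exactly like # Section N; the function name replace_in_sections and the awk variable secs are hypothetical.

```bash
# Hypothetical helper: replace a trailing "foo" with "bar" only inside the
# sections whose numbers match the regex alternation passed as $1 (e.g. '1|3|4').
replace_in_sections() {
  awk -v secs="$1" '
    p { sub(/foo$/, "bar") }                        # inside a selected section
    /^#/ { p = ($0 ~ ("^# Section (" secs ")$")) }  # each header decides for the lines after it
    1                                               # print every line
  '
}

# Usage: sections 1, 3 and 4 from the question
printf '%s\n' '# Section 1' '- foo' '# Section 2' '- foo' \
  '# Section 3' '- foo' '# Section 4' '- foo' '# Section 5' '- foo' |
  replace_in_sections '1|3|4'
```

Passing the alternation through -v keeps the awk program itself unchanged; note that -v processes backslash escapes, so a regex containing backslashes would need them doubled.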

For completeness, this awk answer does the substitution in the whole section, including the header:
awk '/^#/ { in_section = /Section [134]/ } in_section { sub(/foo/, "bar") } 1' input.md
If you want to exclude the headers from the substitution:
awk ' /^#/ { in_section = /Section [134]/; header_line = NR }
in_section && (NR > header_line) { sub(/foo/, "bar") } 1' input.md
Detail
awk '/^#/ {                             # if in a section header
    in_section = /Section [134]/;       # is this a section of interest? (1/0)
    header_line = NR;                   # remember the header line so it can be excluded
}
in_section && (NR > header_line) {      # if in a section of interest and after its header
    sub(/foo/, "bar");                  # substitute text
}
1' input.md                             # 1 is to print all lines

My usual advice whenever you're considering sed -i is to use its older brother ed instead: unlike sed, it was designed from the start to edit files, and it's standardized by POSIX (unlike sed -i), which makes it more portable.
Something like
ed -s input.md <<EOF
/Section 1/;/Section/s/foo/bar/g
/Section 3/;/Section/s/foo/bar/g
/Section 4/;/Section/s/foo/bar/g
w
EOF
Translated: in the block starting with the first line containing Section 1 and ending with the next line containing Section, replace foo with bar. Then do the same substitution in the Section 3 and Section 4 blocks. Finally, write the changes back to disk.

You can always give sed multiple commands with the -e option. Each command keeps its own range state, so the substitution still happens when selected sections follow one another (Sections 3 and 4 here):
sed -e '/# Section 1/,/^#/ s/foo/bar/' -e '/# Section 3/,/^#/ s/foo/bar/' -e '/# Section 4/,/^#/ s/foo/bar/' input.md
Multiple commands can also be placed in a "sed script file":
# content of script.sed
/# Section 1/,/^#/ s/foo/bar/
/# Section 3/,/^#/ s/foo/bar/
/# Section 4/,/^#/ s/foo/bar/
And you execute it like this:
sed -f script.sed input.md
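Writing one range command per section by hand gets repetitive. As a sketch, a small shell function can generate them from a list of section numbers; the function name sed_per_section is hypothetical, and it assumes headers look exactly like # Section N. It joins the commands with newlines into a single script, which should behave like separate -e options because each command in a sed script keeps its own range state.

```bash
# Hypothetical helper: build one "/header/,/^#/ s/foo/bar/" command per
# section number given as an argument, then run them as one sed script.
sed_per_section() {
  script=''
  for s in "$@"; do
    # \$ keeps a literal $ (end-of-line anchor) inside the double quotes
    script="$script/^# Section $s\$/,/^#/ s/foo/bar/
"
  done
  sed "$script"
}

# Sections 3 and 4 are adjacent; each gets its own range
printf '%s\n' '# Section 3' '- foo' '# Section 4' '- foo' '# Section 5' '- foo' |
  sed_per_section 1 3 4
```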

A solution with sed
The key is the range. The first address matches the headers where we want the substitution to begin, and the second matches every header except those. Because the Section 4 header does not match the second address, the range started at Section 3 simply continues through Section 4, so adjacent selected sections need no special handling. Note that a range is inclusive of its first and last lines (i.e. the headers), so the substitution also applies to them.
sed -E '/^# Section [134]/, /^# Section [^134]/ s/foo/bar/' input.md
This one excludes the headers from the substitution:
sed -E '/^# Section [134]/, /^# Section [^134]/ { /^#/!s/foo/bar/ }' input.md
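A quick way to check the complement range on adjacent sections (3 and 4) is to run it on a trimmed copy of the sample. This is a sketch: the wrapper name replace_134 is hypothetical, and it adds a ; before the closing brace, which GNU sed does not need but some other seds may.

```bash
# Hypothetical wrapper around the header-excluding command above.
replace_134() {
  # Range: from a "# Section 1/3/4" header up to the next header that is NOT 1/3/4.
  # Inside it, every line that is not a header gets the substitution.
  sed -E '/^# Section [134]/,/^# Section [^134]/{ /^#/!s/foo/bar/; }'
}

printf '%s\n' '# Section 1' '- foo' '# Section 2' '- foo' \
  '# Section 3' '- foo' '# Section 4' '- foo' '# Section 5' '- foo' |
  replace_134
```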

Related

Use `sed` to replace text in code block with output of command at the top of the code block

I have a markdown file that has snippets of code resembling the following example:
```
$ cat docs/code_sample.sh
#!/usr/bin/env bash
echo "Hello, world"
```
This means there's a file at the location docs/code_sample.sh, whose contents are:
#!/usr/bin/env bash
echo "Hello, world"
I'd like to parse the markdown file with sed (awk or perl works too) and replace the bottom section of the code snippet with whatever the above bash command evaluates to, for example whatever cat docs/code_sample.sh evaluates to.
Perl to the rescue!
perl -0777 -pe 's/(?<=```\n)^(\$ (.*)\n\n)(?^s:.*?)(?=```)/"$1".qx($2)/meg' < input > output
-0777 slurps the whole file into memory
-p prints the input after processing
s/PATTERN/REPLACEMENT/ works similarly to a substitution in sed
/g replaces globally, i.e. as many times as it can
/m makes ^ match start of each line instead of start of the whole input string
/e evaluates the replacement as code
(?<=```\n) means "preceded by three backquotes and a newline"
(?^s:.*?) changes the behaviour of . to match newlines as well, so it matches (frugally because of the *?) the rest of the preformatted block
(?=```) means "followed by three backquotes"
qx runs the parameter in a shell and returns its output
A sed-only solution is easier if you have the GNU version with an e command.
That said, here's a quick, simplistic, and kinda clumsy version I knocked out that doesn't bother to check the values of previous or following lines - it just assumes your format is good, and bulls through without any looping or anything else. Still, for my example code, it worked.
I started by creating two small scripts, a and b, and a markdown file, x.
$: cat a
#! /bin/bash
echo "Hello, World!"
$: cat b
#! /bin/bash
echo "SCREW YOU!!!!"
$: cat x
```
$ cat a
foo
bar
" b a z ! "
```
```
$ cat b
foo
bar
" b a z ! "
```
Then I wrote s which is the sed script.
$: cat s
#!/usr/bin/env bash
sed -En '
/^```$/,/^```$/ {
# for the lines starting with the $ prompt
/^[$] / {
# save the command to the hold space
x
# write the ``` header to the pattern space
s/.*/```/
# print the fabricated header
p
# swap the command back in
x
# the next line should be blank - add it to the current pattern space
N
# first print the line of code as-is with the (assumed) following blank line
p
# scrub the $ (prompt) off the command
s/^[$] //
# execute the command - store the output into the pattern space
e
# print the output
p
# put the markdown footer back
s/.*/```/
# and print that
p
}
# for the (to be discarded) existing lines of "content"
/^[^`$]/d
}
' "$@"
It does the job and might get you started.
$: s x
```
$ cat a
#! /bin/bash
echo "Hello, World!"
```
```
$ cat b
#! /bin/bash
echo "SCREW YOU!!!!"
```
Lots of caveats - better to actually check that the $ follows a line of backticks and is followed by a blank line, maybe make sure nothing bogus could be in the file to get executed... but this does what you asked, with (GNU) sed.
Good luck.
A rare case when use of getline would be appropriate:
$ cat tst.awk
state == "importing" {
    print                                   # keep the "$ cat file" line itself
    while ( (getline line < $NF) > 0 ) {    # then the current contents of that file
        print line
    }
    close($NF)
    state = "imported"
}
$0 == "```" { state = (state ? "" : "importing") }
state != "imported" { print }
$ awk -f tst.awk file
See http://awk.freeshell.org/AllAboutGetline for getline uses and caveats.

Multiple "sed" actions on previous results

Have this input:
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
On the above I need to do 4 (sequential) actions:
select only lines containing "foo" (lowercase)
on the selected lines, remove everything but UPPERCASE letters
delete empty lines (if any are created by the previous action)
and on the remaining lines, enclose every char in square brackets, like [x]
I'm able to solve the above, but need two sed invocations piped together. Script:
#!/bin/bash
data() {
cat <<EOF
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
EOF
}
echo "Result OK"
data | sed -n '/foo/s/[^A-Z]//gp' | sed '/^\s*$/d;s/./[&]/g'
# in the above it is solved using 2 sed invocations
# trying to solve it using only one invocation,
# but the following doesn't do what I need.. :( :(
echo "Variant 2 - trying to use only ONE invocation of sed"
data | sed -n '/foo/s/[^A-Z]//g;/^\s*$/d;s/./[&]/gp'
Output from the above:
Result OK
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Variant 2 - trying to use only ONE invocation of sed
[A][B][C][D][E][F]
[B][A][R][ ][A][B][C]
[A][B][C][D][E][F]
Variant 2 should also output only
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Is it possible to solve the above using only one sed invocation?
sed -n '/foo/{s/[^A-Z]//g;/^$/d;s/./[&]/g;p;}' inputfile
Output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Alternative sed approach:
sed '/foo/!d;s/[^A-Z]//g;/./!d;s/./[&]/g' file
The output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
/foo/!d - deletes all lines that don't contain foo
/./!d - deletes all empty lines
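The same single-invocation script can be spelled out one command per line with comments, which makes the order of the four steps easier to follow. A sketch: the data function mirrors the one in the question, the wrapper name four_steps is hypothetical, and LC_ALL=C is an added assumption so that [A-Z] means ASCII uppercase regardless of locale.

```bash
data() {
  printf '%s\n' 'bar foo' 'foo ABC/DEF' 'BAR ABC' 'ABC foo DEF' 'foo bar'
}

# Hypothetical wrapper: the four steps from the question, in one sed call.
four_steps() {
  LC_ALL=C sed '
    # 1. drop every line that does not contain lowercase "foo"
    /foo/!d
    # 2. delete everything that is not an uppercase letter
    s/[^A-Z]//g
    # 3. drop lines that step 2 left empty
    /./!d
    # 4. wrap each remaining character in square brackets
    s/./[&]/g
  '
}

data | four_steps
```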

Split text file basing on date tag / timestamp

I have a big log file containing date tags. It looks like this:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
[08/11/2015, 14:12]
bar
foo
[09/11/2015, 11:25]
...
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
and so on. I need to split these data into files of days, like:
01.txt:
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
04.txt:
[04/11/2015, 12:21]
foo
bar
etc. How can I do that using standard Unix tools?
I don't think there's a tool that will do it without a little programming, but with Awk the little programming really isn't all that hard.
script.awk
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
if ($1 != old_date)
{
if (outfile != "") close(outfile);
outfile = sprintf("%.2d.txt", ++filenum);
old_date = $1
}
}
{ print > outfile }
The first (bigger) block of code recognizes the date string, which is also in $1 (so the condition could be made more precise by referring to $1, but the benefit is minimal to non-existent). Inside the action, it checks whether the date differs from the last date it remembered. If so, it closes the current output file if one is open (close is part of POSIX awk), generates a new file name, and remembers the date it is now processing.
The second smaller block simply writes the current line to the current file.
Invocation
awk -f script.awk data
This assumes you have a file script.awk; you could give the program text as a command-line argument instead if you prefer. If the whole thing were encapsulated in a shell script, I'd use an inline program rather than a second file, but I find a separate file convenient during development. (The shell script would contain awk '…the script…' "$@" with no separate file.)
Example output files
Given the sample data from the question, the output is in five files, 01.txt .. 05.txt.
$ for file in 0?.txt; do boxecho $file; cat $file; done
************
** 01.txt **
************
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
************
** 02.txt **
************
[04/11/2015, 12:21]
foo
bar
************
** 03.txt **
************
[08/11/2015, 14:12]
bar
foo
************
** 04.txt **
************
[09/11/2015, 11:25]
...
************
** 05.txt **
************
[15/11/2015, 19:22]
...
[15/11/2015, 21:55]
...
$
The boxecho command is a simple script that echoes its arguments in a box of stars:
echo "** $* **" | sed -e h -e 's/./*/g' -e p -e x -e p -e x
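Written as a shell function with one sed command per line, the trick is easier to follow: h and x juggle the text and a row of stars between the pattern and hold spaces. This is a sketch; the answer only shows the one-liner, so packaging it as a function named boxecho is an assumption.

```bash
boxecho() {
  echo "** $* **" | sed '
    # keep a copy of the text line in the hold space
    h
    # turn every character into a star: this is the border
    s/./*/g
    # print the top border
    p
    # swap: the text comes back, the border goes to the hold space
    x
    # print the text line
    p
    # swap again; the border is printed at the end of the cycle
    x
  '
}

boxecho 01.txt
```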
Revised file name format
I would like to have the output as [day].txt or [day].[month].[year].txt, based on the date in the file. Is that possible?
Yes; it is possible and not particularly hard. The split function is one way to break up the value in $1. The regex specifies that open square brackets, slashes and commas are the field separators. There are 5 sub-fields in the value in $1: an empty field before the [, the three numeric components separated by slashes, and an empty field after the comma. The array name, dmy, is mnemonic for the order in which the components are stored.
/^\[[0-3][0-9]\/[01][0-9]\/[12][0-9]{3},/ {
if ($1 != old_date)
{
if (outfile != "") close(outfile)
n = split($1, dmy, "[/\[,]")
outfile = sprintf("%s.%s.%s.txt", dmy[4], dmy[3], dmy[2])
old_date = $1
}
}
{ print > outfile }
Permute the numbers 4, 3, 2 in the sprintf() statement to suit yourself. The given order is year, month, day, which has many merits: it follows the ISO 8601 ordering, and the files sort automatically into date order. I strongly counsel its use, but you may do as you wish. For the sample data shown in the question, the files it generates are:
2015.11.01.txt
2015.11.04.txt
2015.11.08.txt
2015.11.09.txt
2015.11.15.txt
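To see exactly what split() stores, one can run it on a single header line. A small sketch; the function name show_split is hypothetical, and it writes the separator regex as "[/[,]" without the backslash before [, since a bracket is already literal inside a bracket expression.

```bash
# Hypothetical helper: show the pieces split() produces for the first field.
show_split() {
  awk '{
    n = split($1, dmy, "[/[,]")   # $1 is "[01/11/2015,"
    printf "n=%d day=%s month=%s year=%s\n", n, dmy[2], dmy[3], dmy[4]
    printf "%s.%s.%s.txt\n", dmy[4], dmy[3], dmy[2]   # year.month.day, as in the answer
  }'
}

echo '[01/11/2015, 02:19]' | show_split
```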
Here's my idea, using a sed command and an awk script.
$ cat biglog
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar
[04/11/2015, 12:21]
foo
bar
aaa
bbb
[08/11/2015, 14:12]
bar
foo
$ cat sample.awk
#!/bin/awk -f
BEGIN {
FS = "\n"
RS = "\n\n"
}
{
date = substr($1, 2, 2)
filename = date ".txt"
for (i = 2; i <= NF; i++) {
print $i >> filename
}
}
How to use
sed -e 's/^\(\[[0-9][0-9]\)/\n\1/' biglog | sed -e 1d | ./sample.awk
Confirmation
$ ls *.txt
01.txt 04.txt 08.txt
$ cat 01.txt
foo
bar
$ cat 04.txt
foo
bar
aaa
bbb
$ cat 08.txt
bar
foo
Yet another awk:
$ awk -F"[[/,]" -v d="." '/^[\[0-9\/, :\]]*$/{f=$4 d $3 d $2 d"txt"}
{print $0>f}' file
$ ls 20*
2015.11.01.txt 2015.11.04.txt 2015.11.08.txt 2015.11.09.txt 2015.11.15.txt
$ cat 2015.11.01.txt
[01/11/2015, 02:19]
foo
[01/11/2015, 08:40]
bar

Delete lines before and after a match in bash (with sed or awk)?

I'm trying to delete two lines either side of a pattern match from a file full of transactions, i.e. find the match, delete the two lines before it, delete the two lines after it, and then delete the match itself. Then write this back to the original file.
So the input data is
D28/10/2011
T-3.48
PINITIAL BALANCE
M
^
and my command is
sed -i '/PINITIAL BALANCE/,+2d' test.txt
However this is only deleting two lines after the pattern match and then deleting the pattern match. I can't work out any logical way to delete all 5 lines of data from the original file using sed.
an awk one-liner may do the job:
awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file
test:
kent$ cat file
######
foo
D28/10/2011
T-3.48
PINITIAL BALANCE
M
x
bar
######
this line will be kept
here
comes
PINITIAL BALANCE
again
blah
this line will be kept too
########
kent$ awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file
######
foo
bar
######
this line will be kept
this line will be kept too
########
Some explanation:
awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}   # if a match is found, add its line number and those of the 2 lines on each side to array "d"
{a[NR]=$0}                                        # save all lines in array "a", indexed by line number
END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}    # finally, print only the lines whose number is not in "d"
' file                                            # your input file
sed will do it:
sed '/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'
It works this way:
if the pattern space holds only one line, sed appends the next one
if it holds only two, it appends a third
if the three lines match LINE + LINE + LINE containing PINITIAL BALANCE, it appends the two following lines, deletes all of them, and starts the next cycle
if not, it prints the first line of the pattern space, deletes it, and restarts the cycle with the remaining lines without reading new input
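The first command can be laid out one sed command per line, with the explanation turned into comments. A sketch of the same logic; the wrapper name drop_two_around is hypothetical, and it relies on GNU sed, whose N prints the pattern space before exiting when there is no next line, so the last lines of the file are not lost.

```bash
# Hypothetical wrapper: delete each PINITIAL BALANCE line plus 2 lines on each side.
drop_two_around() {
  sed '
    # make sure the pattern space holds at least two lines...
    /\n/!N
    # ...and at least three
    /\n.*\n/!N
    # if the third line matches, read the two lines after it and delete all five
    /\n.*\n.*PINITIAL BALANCE/{
      $d
      N
      N
      d
    }
    # otherwise print the oldest line, remove it, and restart the cycle
    # with the remaining lines (D does not read a new line here)
    P
    D
  '
}

printf '%s\n' a b c 'PINITIAL BALANCE' d e f | drop_two_around
```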
To handle the pattern appearing on the very first line, modify the script like this:
sed '1{/PINITIAL BALANCE/{N;N;d}};/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'
It does fail, however, if another PINITIAL BALANCE appears in the lines that are about to be deleted. Then again, some of the other solutions fail in that case too =)
For such a task, I would probably reach for a more advanced tool like Perl:
perl -ne 'push @x, $_;
  if (@x > 4) {
    if ($x[2] =~ /PINITIAL BALANCE/) { undef @x }
    else { print shift @x }
  }
  END { print @x }' input-file > output-file
This will remove 5 lines from the input file: the 2 lines before the match, the matched line, and the 2 lines after it. You can change the total number of lines removed by modifying @x > 4 (this removes 5 lines), and which line of the window is tested by modifying $x[2] (testing the third line is what makes the two lines before the match get removed too).
A simpler, easier-to-understand solution might be:
awk '/PINITIAL BALANCE/ {print NR-2 "," NR+2 "d"}' input_filename \
| sed -f - input_filename > output_filename
awk is used to generate a sed script that deletes the lines in question, and sed writes the result to output_filename.
This uses two processes which might be less efficient than the other answers.
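Looking at the intermediate script makes the two-step idea concrete. A sketch using a throwaway temporary file; it assumes the first match is at line 3 or later, because a match on line 1 or 2 would make awk print a start address of 0 or less, which sed rejects.

```bash
tmp=$(mktemp)
printf '%s\n' a b c 'PINITIAL BALANCE' d e f > "$tmp"

# Step 1: awk prints one "first,lastd" sed command per match (here: 2,6d)
awk '/PINITIAL BALANCE/ {print NR-2 "," NR+2 "d"}' "$tmp"

# Step 2: sed reads that script from stdin (-f -) and applies it to the same file
awk '/PINITIAL BALANCE/ {print NR-2 "," NR+2 "d"}' "$tmp" | sed -f - "$tmp"

rm -f "$tmp"
```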
This might work for you (GNU sed):
sed ':a;$q;N;s/\n/&/2;Ta;/\nPINITIAL BALANCE$/!{P;D};$q;N;$q;N;d' file
Save this code into a file named grep.sed:
H
s:.*::
x
s:^\n::
:r
/PINITIAL BALANCE/ {
N
N
d
}
/.*\n.*\n/ {
P
D
}
$p
x
d
and run a command like this:
sed -i -f grep.sed FILE
Or, as a one-liner:
sed -i 'H;s:.*::;x;s:^\n::;:r;/PINITIAL BALANCE/{N;N;d;};/.*\n.*\n/{P;D;};$p;x;d' FILE

Delete n1 previous lines and n2 lines following with respect to a line containing a pattern

sed -e '/XXXX/,+4d' fv.out
I have to find a particular pattern in a file and delete 5 lines above and 4 lines below it simultaneously. I found out that the line above removes the line containing the pattern and four lines below it.
sed -e '/XXXX/,~5d' fv.out
The sed manual says that ~ represents the lines which are followed by the pattern. But when I tried it, the lines following the pattern were deleted.
So, how do I delete 5 lines above and 4 lines below a line containing the pattern simultaneously?
One way using sed, assuming that the matching lines are not close to each other:
Content of script.sed:
## If line doesn't match the pattern...
/pattern/ ! {
## Append line to 'hold space'.
H
## Copy content of 'hold space' to 'pattern space' to work with it.
g
## If there are more than 5 lines saved, print and remove the first
## one. It's like a FIFO.
/\(\n[^\n]*\)\{6\}/ {
## Delete the first '\n' automatically added by previous 'H' command.
s/^\n//
## Print until first '\n'.
P
## Delete data printed just before.
s/[^\n]*//
## Save updated content to 'hold space'.
h
}
### Added to fix an error pointed out by potong in comments.
### =======================================================
## If last line, print lines left in 'hold space'.
$ {
x
s/^\n//
p
}
### =======================================================
## Read next line.
b
}
## If line matches the pattern...
/pattern/ {
## Remove all content of 'hold space'. It has the five previous
## lines, which won't be printed.
x
s/^.*$//
x
## Read next four lines and append them to 'pattern space'.
N ; N ; N ; N
## Delete all.
s/^.*$//
}
Run like:
sed -nf script.sed infile
A solution using awk:
awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
nlines == 5 { print lines[NR%5]; nlines-- }
lines2del == 0 { lines[NR%5] = $0; nlines++ }
lines2del > 0 { lines2del-- }
END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
Update:
This is the script explained:
I remember the last 5 lines in the array lines using rotating indexes (NR%5; NR is the record number, in this case the line number).
If I find the pattern in the current line ($0 ~ "XXXX"; $0 being the current record, in this case a line, and ~ being the Extended Regular Expression match operator), I reset the number of buffered lines and note that I have 5 lines to delete (including the current line).
If I already have 5 lines buffered, I print the oldest one, i.e. the line read 5 lines ago, which sits at index NR%5.
If I do not have lines to delete (which is also true right after printing that oldest line), I put the current line in the buffer and increment the number of lines. Note how the number of lines is decremented and then incremented again when a line is printed.
If lines need to be deleted, I do not print anything and decrement the number of lines to delete.
At the end of the script, I print all the lines left in the array.
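The rotating-buffer idea generalizes to any number of lines before and after the match. Below is a sketch, not the answer's code: the function name del_around and the awk variables pat, B and A are hypothetical, B must be at least 1 because it is used as the modulus, and, like the answer, it forgets the buffered lines whenever the pattern is seen.

```bash
# Hypothetical helper: delete a line matching $1, the $2 lines before it
# and the $3 lines after it, keeping only $2 lines in memory.
del_around() {
  awk -v pat="$1" -v B="$2" -v A="$3" '
    $0 ~ pat { skip = A + 1; n = 0 }             # drop the match and A lines after; forget the buffer
    skip > 0 { skip--; next }                    # still inside the deleted block
    n == B   { print buf[NR % B]; n-- }          # buffer full: print the oldest line
             { buf[NR % B] = $0; n++ }           # remember the current line
    END { for (i = NR - n + 1; i <= NR; i++) print buf[i % B] }   # flush the buffer
  '
}

# The question: 5 lines above and 4 below a line containing XXXX
# del_around XXXX 5 4 < fv.out
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | del_around '^7$' 2 3
```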
My original version of the script was the following, but I ended up optimizing it to the above version:
awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
lines2del == 0 && nlines == 5 { print lines[NR%5]; lines[NR%5] = $0 }
lines2del == 0 && nlines < 5 { lines[NR%5] = $0; nlines++ }
lines2del > 0 { lines2del-- }
END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
awk is a great tool! I strongly recommend finding a tutorial on the net and reading it. One important thing: awk works with Extended Regular Expressions (ERE). Their syntax is a little different from the Basic Regular Expressions (BRE) used by sed, and most of what can be done with BRE can also be done with ERE (back-references being the notable exception).
The idea is to read 5 lines without printing them. If you find the pattern, delete the unprinted lines and the 4 lines below. If you do not find the pattern, remember the current line and print the 1st unprinted line. At the end, print what is still unprinted.
sed -n -e '/XXXX/,+4{x;s/.*//;x;d}' -e '1,5H' -e '6,${H;g;s/\n//;P;s/[^\n]*//;h}' -e '${g;s/\n//;p;d}' fv.out
Of course, this only works if you have one occurrence of your pattern in the file. If you have many, you need to read 5 new lines after finding your pattern, and it gets complicated if you again have your pattern in those lines. In this case, I think sed is not the right tool.
This might work for you:
sed 'H;$!d;g;s/\([^\n]*\n\)\{5\}[^\n]*PATTERN\([^\n]*\n\)\{5\}//g;s/.//' file
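or this (GNU awk, which treats a multi-character RS as a regular expression; --re-interval enables the {5} intervals in older gawk versions):
awk --re-interval -vORS='' -vRS='([^\n]*\n){5}[^\n]*PATTERN([^\n]*\n){5}' 1 file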
a more efficient sed solution:
sed ':a;/PATTERN/,+4d;/\([^\n]*\n\)\{5\}/{P;D};$q;N;ba' file
If you are happy to output the result to a file instead of stdout, vim can do it quite efficiently:
vim -c 'g/pattern/-5,+4d' -c 'w! outfile|q!' infile
or
vim -c 'g/pattern/-5,+4d' -c 'x' infile
to edit the file in-place.
