MacOS SED to find a second matching line and insert lines above it - bash

I am looking for a BASH sed script that can open an .mdx file and search line by line and find the second line that has the value I'm searching for: three hyphens like this ---. Then, I'm hoping to insert two lines of redirect information above that second set of hyphens.
The second occurrence of these three hyphens could be on any line following line 1, so I would need a script that is smart enough to search until it finds the second one.
I'll need something that runs on macOS and can update the file in place.
Here's my input file:
---
title: Some kind of title
---
I'd like to locate that second instance of three hyphens and insert new text above it like this:
---
title: Some kind of title
redirects:
- /some/kind/of/directory/path
---
In my shell script, I have a variable that contains that redirect path, so I would somehow need to pass that variable along with a hard-coded redirects: to sed.
I looked at a variety of options, including the POSIX option included here, but it just deletes the second occurrence. Perhaps there's an easy way I could modify that to update?
Let me know if you need more to understand what I'm looking for.

This is one way of doing it:
testpath="/some/kind/of/directory/path"
sed "s|---|redirects:\n\ \ -\ $testpath\n---|" file.mdx | sed '/---/,$!d'
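This works by adding the "redirects: path" lines directly above both "---" lines, and then deleting the top copy. It will fail miserably if there are more than two "---" lines in the file. Note that it relies on sed interpreting \n in the replacement as a newline, which GNU sed does; the BSD sed shipped with macOS may not.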
To do it inline:
testpath="/some/kind/of/directory/path"
sed -i .bak "s|---|redirects:\n\ \ -\ $testpath\n---|" test.txt && sed -i _bak2 '/---/,$!d' test.txt

You can find out the line number of the 2nd --- and then insert before that line.
Example (tested on macos):
$ cat file
---
title: Some kind of title
---
$ cat foo.sh
path=/some/kind/of/directory/path
n=$( sed -n '/^---$/=' file | sed -n 2p )
sed -e "$n i\\
redirects:\\
- $path
" file
$ bash foo.sh
---
title: Some kind of title
redirects:
- /some/kind/of/directory/path
---
$
(Use sed -i to update the file in place; on macOS, -i takes a backup suffix argument, which may be empty: sed -i ''.)
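As a rough sketch of how the line-number approach above might be wrapped for in-place use: the snippet below skips the edit when there is no second --- line, and uses the attached-suffix form -i.bak, which both GNU and BSD sed should accept. The temporary file and its contents are made up for illustration.

```shell
# Hypothetical demo file; in practice this would be the .mdx file.
file=$(mktemp)
printf '%s\n' '---' 'title: Some kind of title' '---' > "$file"
path=/some/kind/of/directory/path

# Line number of the second line that is exactly "---" (empty if there is none).
n=$(sed -n '/^---$/=' "$file" | sed -n 2p)

if [ -n "$n" ]; then
    # Insert the two lines before line $n, editing in place with a backup.
    sed -i.bak -e "${n}i\\
redirects:\\
- $path
" "$file" && rm -f "$file.bak"
fi

cat "$file"
```

This assumes the path contains no backslashes or newlines, since it is pasted into the sed script as-is.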

Are you hellbent on using sed for this? Generally Awk is both more versatile and more readable.
awk '/^---$/ { print; hyphens=1; next }
hyphens && /^title: / { print; print "redirects:\n - /some/kind/of/directory/path"; next }
{ hyphens=0 } 1' file.mdx >newfile.mdx
In brief, we keep track of whether the previous line was three hyphens; if it was, and the current line matches the regex ^title: , print the additional lines. Otherwise, we reset the state variable and print. (The final 1 is a common Awk idiom to avoid having to say { print } explicitly.)
Unfortunately, standard Awk has no -i option. If you can use GNU Awk, it has an option -i inplace which emulates the (also nonstandard, but common) -i option of sed. Otherwise, just write to a temporary file and move it back onto the original afterwards; that's what -i does behind the scenes, too.
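For what it's worth, here is a minimal sketch of the temporary-file approach, keyed on the second --- line itself rather than on the title line, with the path passed in through -v. The demo file, the variable names (redirect, seen) and the sample contents are assumptions made for illustration.

```shell
# Hypothetical demo file and redirect path, for illustration only.
file=$(mktemp)
printf '%s\n' '---' 'title: Some kind of title' '---' 'Body text' > "$file"
redirect="/some/kind/of/directory/path"

tmp=$(mktemp)
# Count lines that are exactly "---"; right before printing the second one,
# emit the two new lines. Then move the result back over the original.
awk -v path="$redirect" '
    /^---$/ && ++seen == 2 { print "redirects:"; print "- " path }
    { print }
' "$file" > "$tmp" && mv "$tmp" "$file"

cat "$file"
```

Any further --- lines later in the file are left alone, because the counter only equals 2 once.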

Related

sed - replace pattern with content of file, while the name of the file is the pattern itself

Starting from the previous question, I have another one. If I make this work, I can just delete several lines of script :D
I want to transform this line:
sed -i -r -e "/$(basename "$token_file")/{r $token_file" -e "d}" "$out_dir_rug"/rug.frag
into this line:
sed -i -r -e "/(##_[_a-zA-Z0-9]+_##)/{r $out_dir_frags_rug/\1" -e "d}" "$out_dir_rug"/rug.frag
The idea is the following. Originally (the first line), I searched for some patterns, and then replaced those patterns with their associated files. The names of the files are the patterns themselves.
Example:
Pattern: ##_foo_##
File name: ##_foo_##
Content of file ##_foo_##:
first line of foo text
second line of foo text
so the text
bar
##_foo_##
bar
would become
bar
first line of foo text
second line of foo text
bar
In my second attempt, I used sed for both locating the patterns, and for the actual replacement.
The result is that the patterns are found, but replaced with pretty much nothing.
Is sed supposed to be able to do the replacement I want? If yes, how should I change my command?
Note: a file usually has several different patterns (I call them tokens), and the same pattern may appear more than one time.
So an input file might look like:
bar
bar
##_foo_##
bar
##_haa_##
bar
##_foo_##
and so on
I already tried to replace the / in the address with ,, to no useful result. Escaping the / in the path to \/ also does not help.
I verified that the path to the replacement files is good by adding the next line, just before the sed:
echo "$out_dir_frags_rug"
The names of the files are the patterns themselves.
If you need anything "dynamic", then sed is not enough for it. As sed can't do "eval" - can't reinterpret the content of pattern buffer or hold buffer as commands (that would be amazing!) - you can't use the line as part of the command.*
You can use bash, untested, written here:
while IFS= read -r line; do
    if [[ "$line" =~ ^##_([_a-zA-Z0-9]+)_## ]]; then
        # BASH_REMATCH[1] is the inner name (foo); use "$line" instead if the file is named ##_foo_##
        cat "${BASH_REMATCH[1]}"
    else
        printf "%s\n" "$line"
    fi
done < inputfile
but that would be slow, since bash is slow at reading lines. A similar design could work in a POSIX shell with POSIX tools, by replacing the [[ bash extension with some grep + sed or awk.
awk would be waaaaaaay faster; something along these lines, also untested:
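awk '/^##_[_a-zA-Z0-9]+_##$/{
    gsub(/^##_/, "", $0);
    gsub(/_##$/, "", $0);
    file = $0   # inner name, e.g. foo; skip the two gsub calls if the file is named ##_foo_##
    # "> 0" stops on EOF and on read errors (getline returns -1 for a missing file)
    while ((getline tmp < file) > 0) print tmp;
    close(file)   # lets a repeated token re-read the file from the start
    next
}
{print}
' inputfile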
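To make the idea concrete for the case where each file is named exactly like its token (as the question describes), here is a small self-contained sketch. The scratch directory, the sample files and the variable name expanded are made up for illustration.

```shell
# Hypothetical scratch directory and sample files, for illustration only.
dir=$(mktemp -d)
printf '%s\n' 'first line of foo text' 'second line of foo text' > "$dir/##_foo_##"
printf '%s\n' 'bar' '##_foo_##' 'bar' '##_foo_##' > "$dir/input.txt"

expanded=$(
    cd "$dir" &&
    awk '
        /^##_[_a-zA-Z0-9]+_##$/ {
            file = "./" $0                 # the token doubles as the file name
            while ((getline tmp < file) > 0) print tmp
            close(file)                    # so a repeated token is read again
            next
        }
        { print }
    ' input.txt
)

printf '%s\n' "$expanded"
```

Because the file is read with getline rather than through a shell command, the token is never interpreted by a shell.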
That said, for your specific problem, instead of reinventing the wheel and writing yet another templating and preprocessing tool, I would advise to concentrate on researching existing solutions. A simple cpp file with the following content can be preprocessed with C preprocessor:
bar
bar
#include "foo"
bar
#include "haa"
bar
#include "foo"
and so on
It's clear to anyone what it means and it has a very standardized format, and you also get all the #ifdef conditional expressions, macros and macro functions that you can use - but you can't start lines with #, dunno if that's important. For endless ultimate templating power, I could recommend m4 from the standard unix commands.
* With GNU sed you can, however, use the e flag of the s command to execute the resulting pattern space as a shell command. I forgot about it when writing this answer, as it's rarely used, and I would strongly advise against using the e flag: finding the proper quoting for the subshell is hard (impossible?) and it's very easy to abuse. Anyway, the following could work:
sed -n '/^##_\(.*\)_##$/!{p;d;}; s//cat \1/ep'
but with the following input it may cause harm on your system:
some input file
##_$(rm /)_##
^^^^^^^ - will be executed in subshell and remove all your files
I think proper quoting would be something along these lines (untested):
sed -n '/^##_\(.*\)_##$/!{p;d;}; s//\1/; '"s/'/'\\\\''/g; s/.*/cat '&'/ep"
but I would go with existing tools like cpp or m4 anyway.
With sed
Yes, this is possible with GNU sed.
With this input file input.txt:
= bar =
##_foo_##
= bar2 =
##_foo_##
= bar3 =
And the ##_foo_## file you gave in your question, the command
sed -E '
/^##_[_a-zA-Z0-9]+_##$/ {
s|^|cat ./|
e
}
' input.txt
... will yield:
= bar =
first line of foo text
second line of foo text
= bar2 =
first line of foo text
second line of foo text
= bar3 =
This command can also be shortened to this one-liner:
sed -E '/^##_[_a-zA-Z0-9]+_##$/ s|^|cat ./|e' input.txt
Explanation
GNU sed has a special command e that executes the command found in pattern space and then replaces the content of the pattern space with the output of the command.
When the above program encounters a line matching your pattern ##_file_##, it prepends cat ./ to the pattern space and executes it with e.
The s/.../.../e command is a shortened version that does exactly the same, the command being executed only if a successful substitution occurred.
Contrary to what KamilCuk says in their answer, both sed commands above are perfectly safe and don't need any escaping/quoting because they are executed on a known harmless pattern that cannot be tricked to execute anything else than the expected cat.
Of course, this is designed to work with that ##_file_## pattern you gave in your question. Allowing spaces or other fancy characters in your pattern may break things since they might be interpreted by the shell.
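A quick way to convince yourself of this is to check which tokens the pattern actually lets through. The sketch below uses grep -E with the same regex; the hostile token is a made-up example and is only ever treated as data here.

```shell
# Same whitelist regex as in the sed command above.
pattern='^##_[_a-zA-Z0-9]+_##$'

# grep -c prints the number of matching lines; "|| true" keeps a zero-match
# result from being treated as a failure.
safe=$(printf '%s\n' '##_foo_##' | grep -Ec "$pattern" || true)
hostile=$(printf '%s\n' '##_$(echo pwned)_##' | grep -Ec "$pattern" || true)

printf 'plain token matches: %s\nhostile token matches: %s\n' "$safe" "$hostile"
```

A line containing $, parentheses or spaces never matches, so it is printed unchanged and never reaches cat.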
With awk
Here is the equivalent with awk:
awk '
/^##_[_a-zA-Z0-9]+_##$/ {
system("cat ./" $0)
next
}
1
' input.txt
This command can also be shortened to this one-liner:
awk '! /^##_[_a-zA-Z0-9]+_##$/ || system("cat ./" $0)' input.txt
Explanation
This is very similar to the sed commands above: when awk meets the pattern ##_file_## it builds the corresponding cat command and executes it with system() then it skips to the next input line with next. Lines that don't match the pattern are printed as is (the 1 line).
Of course, the command being interpreted by the shell, the same caveat applies here: both awk commands are perfectly safe and don't need any escaping/quoting as long as your pattern stays that simple.

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables should generally be limited to small files. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables or read the file twice.
A solution using sed:
foo=$(sed -n '/^---$/q;p' file.txt)
bar=$(sed -n '1,/^---$/b;p' file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
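Here is a small runnable sketch of the same split on the sample data. The second command is a slight variation, not the answer's own: it deletes everything up to and including the delimiter instead of branching past p. The temporary file is made up for illustration.

```shell
# Hypothetical sample file matching the question's example.
file=$(mktemp)
printf '%s\n' 'aa' 'bbb' '---' 'cccc' 'dd' > "$file"

# Everything before the first line that is exactly "---".
part1=$(sed -n '/^---$/q;p' "$file")
# Everything after it: delete from line 1 through the delimiter, print the rest.
part2=$(sed '1,/^---$/d' "$file")

printf 'part1:\n%s\npart2:\n%s\n' "$part1" "$part2"
```

Command substitution strips the trailing newline, so part1 ends with bbb and part2 ends with dd.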
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If your version of coreutils is 8.22 or later, the --suppress-matched option can be used and the sed post-processing is not required:
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"

Find two string in same line and then replace using sed

I am doing a find and replace using sed in a bash script. I want to search each file for lines containing the words files and no. If both words are present on the same line, then replace red with green; otherwise do nothing.
sed -i -e '/files|no s/red/green' $file
But I am unable to do so. I am not receiving any error and the file doesn't get updated.
What am I doing wrong here, or what is the correct way of achieving my result?
/files|no/ means to match lines with either files or no, it doesn't require both words on the same line.
To match the words in either order, use /files.*no|no.*files/.
sed -i -r -e '/files.*no|no.*files/s/red/green/' "$file"
Notice that you need another / at the end of the pattern, before s, and the s operation requires / at the end of the replacement.
And you need the -r option to make sed use extended regexp; otherwise you have to use \| instead of just |.
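As an illustration, the sketch below runs the same command on a few made-up lines, using -E (which current GNU and BSD sed both accept as the extended-regexp switch) and leaving out -i so that nothing is modified. The sample lines and the variable name result are assumptions.

```shell
# Hypothetical sample input.
file=$(mktemp)
printf '%s\n' 'no red files here' 'red files only' 'files: no, red' 'no red' > "$file"

# Replace red with green only on lines containing both words, in either order.
result=$(sed -E '/files.*no|no.*files/s/red/green/' "$file")

printf '%s\n' "$result"
```

Only the first and third lines change. Note that these are substring matches, so no would also match inside a word such as none.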
This might work for you (GNU sed):
sed '/files/{/no/s/red/green/}' file
or:
sed '/files/!b;/no/s/red/green/' file
This method allows for easy extension e.g. foo, bar and baz:
sed '/foo/!b;/bar/!b;/baz/!b;s/red/green/' file
or fee, fie, foe and fix:
sed '/fee/!b;/fie/!b;/foe/!b;/fix/!b;s/bacon/cereal/' file
An awk version
awk '/files/ && /no/ {sub(/red/,"green")} 1' file
/files/ && /no/ files and no have to be on the same line, in any order
sub(/red/,"green") replace red with green. Use gsub(/red/,"green") if there are multiple red
1 always true, do the default action, print the line.

How to insert a specific character at a specific line of a file using sed or awk?

I want to edit a specific line of a file from the command line instead of using vi. Here's the thing: if the line starts with a #, remove the # to uncomment it; otherwise, add a # to comment it out. I'd like to use sed or awk, but my attempts don't work as expected.
This is the file.
what are you doing now?
what are you gonna do? stab me?
this is interesting.
This is a test.
go big
don't be rude.
For example, I just want to add the # at the beginning of line 4, This is a test, if it doesn't start with #. And if it starts with #, then remove the #.
I've already tried via sed & gawk (awk)
gawk -i inplace '$1!="#" {print "#",$0;next};{print substr($0,3,length-1)}' file
sed -i /test/s/^#// file # make it uncomment
sed -i /test/s/^/#/ file # make it comment
I don't know how to do an if/else in sed. I could only make it work with a single command, then use another regex to do the opposite.
With gawk, it works on the target line, but it messes up the rest of the file.
This might work for you (GNU sed):
sed '4{s/^/#/;s/^##//}' file
On line 4, prepend a # to the line and, if there are now two #'s, remove them.
Could also be written:
sed '4s/^/#/;4s/^##//' file
This will remove # from the start of line 4 or add it if it wasn't already there:
sed -i '4s/^#/\n/; 4s/^[^\n]/#&/; 4s/^\n//' File
The above assume GNU sed. If you have BSD/MacOS sed, some minor changes will be required.
When sed reads a new line, the one thing that we know for sure about the new line is that it does not contain \n. (If it did, it would be two lines, not one.) Using this knowledge, the script works by:
4s/^#/\n/
If the fourth line starts with #, replace # with \n. (The \n serves as a notice that the line had originally been commented out.)
4s/^[^\n]/#&/
If the fourth line now starts with anything other than \n (meaning that it was not originally commented), put a # in front.
4s/^\n//
If the fourth line now starts with \n, remove it.
Alternative: Modifying lines that contain test
To comment/uncomment lines that contain test:
sed '/test/{s/^#/\n/; s/^[^\n]/#&/; s/^\n//}' File
Alternative: using awk
The exact same logic can be applied using awk. If we want to comment/uncomment line 4:
awk 'NR==4 {sub(/^#/, "\n"); sub(/^[^\n]/, "#&"); sub(/^\n/, "")} 1' File
If we want to comment/uncomment any line containing test:
awk '/test/ {sub(/^#/, "\n"); sub(/^[^\n]/, "#&"); sub(/^\n/, "")} 1' File
Alternative: using sed but without newlines
To comment/uncomment any line containing test:
sed '/test/{s/^#//; t; s/^/#/; }' File
How it works:
s/^#//; t
If the line begins with #, then remove it.
t tells sed that, if the substitution succeeded, then it should skip the rest of the commands.
s/^/#/
If we get to this command, that means that the substitution did not succeed (meaning the line was not originally commented out), so we insert #.
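To see the toggle in action, here is a small sketch that applies the same logic twice. It is written with separate -e fragments, which is an assumption about what is most portable between GNU and BSD sed; the function name toggle and the sample file are made up for illustration.

```shell
# Hypothetical sample file.
file=$(mktemp)
printf '%s\n' 'this is interesting.' 'This is a test.' 'go big' > "$file"

# Toggle a leading # on every line containing "test".
toggle() {
    sed -e '/test/{' -e 's/^#//' -e 't' -e 's/^/#/' -e '}' "$1"
}

once=$(toggle "$file")            # the matching line gets commented out
printf '%s\n' "$once" > "$file"
twice=$(toggle "$file")           # the same line is uncommented again

printf 'after one pass:\n%s\nafter two passes:\n%s\n' "$once" "$twice"
```

Running the toggle twice returns the file to its original contents, which is a handy sanity check for this kind of edit.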
If you end up on a system with a sed that doesn't support in-place editing, you can fall back to its uncle ed:
ed -s file 2>/dev/null <<EOF
4 s/^/#/
s/^##//
w
q
EOF
(Standard error is redirected to /dev/null because in ed, unlike sed, it's an error if s doesn't replace anything and a question mark is thus printed to standard error.)
$ awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1' file
what are you doing now?
what are you gonna do? stab me?
this is interesting.
#This is a test.
go big
don't be rude.
$ awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1' file |
awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1'
what are you doing now?
what are you gonna do? stab me?
this is interesting.
This is a test.
go big
don't be rude.

Sed/Awk to delete second occurrence of string - platform independent

I'm looking for a line in bash that would work on both linux as well as OS X to remove the second line containing the desired string:
Header
1
2
...
Header
10
11
...
Should become
Header
1
2
...
10
11
...
My first attempt was using the deletion option of sed:
sed -i '/^Header.*/d' file.txt
But well, that removes the first occurrence as well.
"How to delete the matching pattern from given occurrence" suggests using something like this:
sed -i '/^Header.*/{2,$d}' file.txt
But on OS X that gives the error
sed: 1: "/^Header.*/{2,$d}": extra characters at the end of d command
Next, I tried substitution, where I know how to use 2,$, and subsequent empty line deletion:
sed -i '2,$s/^Header.*//' file.txt
sed -i '/^\s*$/d' file.txt
This works on Linux, but on OS X, as mentioned in "sed command with -i option failing on Mac, but works on Linux", you'd have to use
sed -i '' '2,$s/^Header.*//' file.txt
sed -i '' '/^\s*$/d' file.txt
And this one in return doesn't work on Linux.
My question then: isn't there a simple way to make this work in any Bash? It doesn't have to be sed, but it should be as shell independent as possible, and I need to modify the file itself.
Since this is file-dependent and not line-dependent, awk can be a better tool.
Just keep a counter on how many times this happened:
awk -v patt="Header" '$0 == patt && ++f==2 {next} 1' file
This skips a line only when it matches the given string exactly and is the second such line. All other lines are printed normally.
I would recommend using awk for this:
awk '!/^Header/ || !f++' file
This prints all lines that don't start with "Header". Short-circuit evaluation means that if the left hand side of the || is true, the right hand side isn't evaluated. If the line does start with Header, the second part !f++ is only true once.
$ cat file
baseball
Header and some other stuff
aardvark
Header for the second time and some other stuff
orange
$ awk '!/^Header/ || !f++' file
baseball
Header and some other stuff
aardvark
orange
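Since the question asks for the file itself to be modified on both Linux and OS X, one possible sketch is to combine an awk filter with a temporary file, which avoids the differing sed -i syntaxes altogether. This variant counts matches so that only the second Header line is dropped; the sample file and the counter name n are assumptions.

```shell
# Hypothetical sample file matching the question's example.
file=$(mktemp)
printf '%s\n' 'Header' '1' '2' 'Header' '10' '11' > "$file"

tmp=$(mktemp)
# Skip only the second line starting with "Header"; print everything else,
# then replace the original file with the filtered copy.
awk '/^Header/ && ++n == 2 { next } 1' "$file" > "$tmp" && mv "$tmp" "$file"

cat "$file"
```

The mv only runs if awk succeeded, so a failed run leaves the original file untouched.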
This might work for you (GNU sed):
sed -i '1b;/^Header/d' file
Ignore the first line and then remove any occurrence of a line beginning with Header.
To remove subsequent occurrences of the first line regardless of the string, use:
sed -ri '1h;1b;G;/^(.*)\n\1$/!P;d' file

Resources