sed query to edit yaml file

I have a config.yaml file which contains, among other values, the following list of Kafka brokers, which I want to remove from the config using a bash script:
kafka.brokers:
- "node003"
- "node004"
I am doing this currently by invoking vi from inside the script using the command:
vi $CONF_BENCHMARK/config.yaml -c ":%s/kafka.brokers:\(\n\s*-\s".*"\)*/kafka.brokers:/g" -c ':wq!'
I understand that sed is a more appropriate tool for the same task, but when I try to translate the above regex to sed, it does not work.
sed -i -e "s/kafka.brokers:\(\n\s*-\s".*"\)*/kafka.brokers:/g" $CONF_BENCHMARK/config.yaml
Am I doing something wrong?

awk to the rescue!
sed is line-based; this should work...
$ awk 's{if(/\s*-\s*"[^"]*"/) next; else s=0} /kafka.brokers:/{s=1}1' file
Explanation
if(/\s*-\s*"[^"]*"/) next: if the list-item pattern matches, skip to the next line
s{if(/\s... : check the pattern only if s is set
/kafka.brokers:/{s=1}: when the header is seen, set s
1: shorthand for printing the line (if not skipped)
s{... else s=0}: if s was set but the pattern did not match, reset s
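Note that \s is not POSIX awk syntax: GNU awk accepts it, but other awks (mawk, BSD awk) may not. A minimal sketch of the same state-flag idea using POSIX [[:space:]] classes instead, assuming any POSIX awk. Anchoring the item pattern with ^ and the function name remove_brokers are additions for illustration, not part of the answer above:

```shell
#!/bin/sh
# Sketch (assumption: a POSIX awk). Drops the - "..." items that follow a
# kafka.brokers: header, keeping the header and every other line.
# Reads the files given as arguments, or stdin when there are none.
remove_brokers() {
  awk '
    # Inside the block: skip list items; the first non-item line ends the block.
    s { if (/^[[:space:]]*-[[:space:]]*"[^"]*"/) next; else s = 0 }
    # The header starts a block (the header line itself is still printed).
    /kafka\.brokers:/ { s = 1 }
    # Print every line that was not skipped.
    1
  ' "$@"
}

printf '%s\n' 'a: 1' 'kafka.brokers:' '- "node003"' '- "node004"' 'b: 2' | remove_brokers
```

With this sample input it should print a: 1, kafka.brokers: and b: 2. Like the original, it writes to stdout; redirect to a temporary file and move it over config.yaml to change the file.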

As others have pointed out, you need to be explicit to get sed working with multiple lines.
The real answer is to use AWK; a beautiful answer is provided by karakfa. But for educational purposes I will provide a sed answer:
sed '
/kafka.brokers/ {
:a
$be
N
/\n[[:space:]]*-[[:space:]]"[^\n]*"[^\n]*$/ba
s/\n.*\(\n\)/\1/
P;D
:e
s/\n.*//
}
' input
Basically, sed keeps appending lines to the pattern space, starting at the kafka.brokers line, until \n[[:space:]]*-[[:space:]]"[^\n]*"[^\n]*$ no longer matches.
This leaves the pattern space with one trailing line in it, i.e.:
kafka.brokers:\n - "node003"\n - "node004"\nother stuff$
Replacing everything \n.*\(\n\) with a newline leaves the following pattern space:
kafka.brokers:\nother stuff$
P;D prints the first line of the pattern space and then restarts the cycle with the remaining pattern space (without reading a new line). This makes the script support input such as:
kafka.brokers:
- "node003"
- "node004"
kafka.brokers:
- "node005"
more_input

Your Vim pattern matches across multiple lines, but sed works line-by-line. (That is, it first tries to match your pattern against kafka.brokers: and fails, then it tries to match - "node003", and so on.) Your instinct to use something other than Vim was right, but sed probably isn't the best tool for the job here.
This answer addresses the problem of matching multi-line patterns with sed in more detail.
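If GNU sed is available, its -z option (a GNU extension) splits input on NUL bytes instead of newlines, so an ordinary text file arrives as one pattern space and \n can be matched much as in the Vim command. A sketch of the question's substitution along those lines, assuming GNU sed; [[:blank:]] (space or tab) stands in for \s so the pattern cannot run across line breaks, and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch, assuming GNU sed: -z makes the whole (NUL-free) file one pattern
# space, so \n in the regex can match the line breaks between list items.
strip_kafka_brokers() {
  sed -z 's/kafka\.brokers:\(\n[[:blank:]]*-[[:blank:]]*"[^"]*"\)*/kafka.brokers:/g' "$@"
}

printf '%s\n' 'a: 1' 'kafka.brokers:' '- "node003"' '- "node004"' 'b: 2' | strip_kafka_brokers
```

Adding -i, as in the question's command, would make GNU sed edit $CONF_BENCHMARK/config.yaml in place.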
My personal recommendation would be to use a scripting language like Python or Perl to deal with complicated pattern-matching. You can run a Python command with python -c <command>, for instance, just like you did with Vim, or you could write a small Python script that you call from your Bash script. It's a little more complicated than a sed one-liner, but it will probably save you a lot of debugging and make your script easier to maintain and modify.

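Consider using yq instead of sed or awk. Deleting the key kafka.brokers then becomes as simple as:
yq d "$CONF_BENCHMARK/config.yaml" '"kafka.brokers"'
(This is the older mikefarah yq v3 command syntax; yq v4 replaced d with the del operator, e.g. yq eval 'del(."kafka.brokers")' config.yaml.)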
The following snippet demonstrates the yq delete feature:
cat <<EOF | yq d - '"kafka.brokers"'
some:
path: value
kafka.brokers:
- "node003"
- "node004"
EOF
... and results in the output
some:
path: value

Related

Regex to match characters between two specific characters in shell script

I want to clean my file before/after saving, so I have to delete the unnecessary characters in it. Sadly, even though my regex works in Regex101, it does not work in the shell script I wrote.
I am getting my list from Kubernetes via
kubectl get pods -n $1 -o jsonpath='{range .items[*]}{#.spec.containers[*].image}{","}{#.status.containerStatuses[*].imageID}{"\n"}{end}'
Then I save it to a temp file and use sed to clean it: the regex should match (and sed should delete) every character between , and #, including the # itself. I am escaping them since they are special characters.
sed -i 's/(?<=\,)(.*?)(?<=\#)//g' temp
The problem is that this regex works fine (for example in Regex101) but not with the sed command. I even tried awk but got the same output.
awk '!/(?<=\,)(.*?)(?<=\#)/' temp
Am I missing something or is the regex acting differently somehow in Unix/shell?
Thanks for any input.
Example content of the file (for test):
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,docker-pullable://docker.elastic.co/elasticsearch/elasticsearch#sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,docker-pullable://bitnami/mongodb#sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,docker-pullable://bitnami/postgresql#sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,docker-pullable://bitnami/zookeeper#sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
Expected output:
docker.elastic.co/elasticsearch/elasticsearch:7.17.5,sha256:76344d5f89b13147743db0487eb76b03a7f9f0cd55abe8ab887069711f2ee27d
docker.io/bitnami/kafka:3.3.1-debian-11-r11,sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/mongodb:6.0.3-debian-11-r0,sha256:e7438d7964481c0bcfcc8f31bca2d73022c0b7ba883143091a71ae01be6d9edb
docker.io/bitnami/postgresql:14.1.0-debian-10-r80,sha256:6eb9c4ab3444e395df159e2cad21f283e4bf30802958467590c886f376dc9959
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462
You are trying to use Perl extensions (the lookbehind (?<=...) and the non-greedy .*?) which are not supported by more traditional regex tools like sed and Awk.
Perhaps see also Why are there so many different regular expression dialects? and the Stack Overflow regex tag info page.
If I guess correctly what you are trying to do, you simply want
sed -i 's/,[^#]*#/,/g' temp
The /g flag is unnecessary if you only expect one match per line.
Neither , nor # is a regex metacharacter; they do not require escaping.
Usually you would want to avoid using a temporary file or sed -i; perhaps simply
kubectl blah blah | sed 's/,[^#]*#/,/' > temp
to create the file, or remove the redirection if you want to pipe the results further.
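To check the substitution against the question's data, a small sketch that feeds two of the sample lines (copied verbatim) through it, plus an awk equivalent using sub(), since the question also tried awk. The function names are hypothetical:

```shell
#!/bin/sh
# ",[^#]*#" = a comma, then a run of non-# characters, then the "#".
# Replacing it with "," keeps the image name and the sha256 digest.
clean_with_sed() {
  sed 's/,[^#]*#/,/' "$@"
}

# The same ERE in awk: sub() replaces the first match on each line.
clean_with_awk() {
  awk '{ sub(/,[^#]*#/, ","); print }' "$@"
}

sample='docker.io/bitnami/kafka:3.3.1-debian-11-r11,docker-pullable://bitnami/kafka#sha256:be29db0e37b6ab13df5fc14988a4aa64ee772c7f28b4b57898015cf7435ff662
docker.io/bitnami/zookeeper:3.8.0-debian-11-r47,docker-pullable://bitnami/zookeeper#sha256:0f3169499c5ee02386c3cb262b2a0d3728998d9f0a94130a8161e389f61d1462'

printf '%s\n' "$sample" | clean_with_sed
printf '%s\n' "$sample" | clean_with_awk
```

Both should reproduce the corresponding lines of the expected output from the question.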

sed - replace pattern with content of file, while the name of the file is the pattern itself

Starting from the previous question, I have another one. If I make this work, I can just delete several lines of script :D
I want to transform this line:
sed -i -r -e "/$(basename "$token_file")/{r $token_file" -e "d}" "$out_dir_rug"/rug.frag
into this line:
sed -i -r -e "/(##_[_a-zA-Z0-9]+_##)/{r $out_dir_frags_rug/\1" -e "d}" "$out_dir_rug"/rug.frag
The idea is the following. Originally (the first line), I searched for some patterns, and then replaced those patterns with their associated files. The names of the files are the patterns themselves.
Example:
Pattern: ##_foo_##
File name: ##_foo_##
Content of file ##_foo_##:
first line of foo text
second line of foo text
so the text
bar
##_foo_##
bar
would become
bar
first line of foo text
second line of foo text
bar
In my second attempt, I used sed for both locating the patterns, and for the actual replacement.
The result is that the patterns are found, but replaced with pretty much nothing.
Is sed supposed to be able to do the replacement I want? If yes, how should I change my command?
Note: a file usually has several different patterns (I call them tokens), and the same pattern may appear more than one time.
So an input file might look like:
bar
bar
##_foo_##
bar
##_haa_##
bar
##_foo_##
and so on
I already tried to replace the / in the address with ,, to no useful result. Escaping the / in the path to \/ also does not help.
I verified that the path to the replacement files is good by adding the following line just before the sed:
echo "$out_dir_frags_rug"
The names of the files are the patterns themselves.
If you need anything "dynamic", then sed is not enough. Since sed can't do "eval" (it can't reinterpret the content of the pattern buffer or hold buffer as commands, which would be amazing!), you can't use the line as part of the command.*
You can use bash (untested, written here):
while IFS= read -r line; do
  if [[ "$line" =~ ^##_[_a-zA-Z0-9]+_##$ ]]; then
    # the whole token (e.g. ##_foo_##) is also the file name
    cat "$line"
  else
    printf "%s\n" "$line"
  fi
done < inputfile
but that would be slow: bash is slow at reading lines. A similar design could work in a POSIX shell with POSIX tools, by replacing the [[ bash extension with some grep + sed or awk.
An awk script would be waaaaaaay faster, something along these lines (also untested):
awk '/^##_[_a-zA-Z0-9]+_##$/{
  file = $0                                   # the token itself is the file name
  while ((getline tmp < file) > 0) print tmp  # > 0 stops on EOF and on errors
  close(file)                                 # lets a repeated token be read again
  next
}
{print}
' inputfile
That said, for your specific problem, instead of reinventing the wheel and writing yet another templating and preprocessing tool, I would advise concentrating on researching existing solutions. A simple file with the following content can be preprocessed with the C preprocessor (cpp):
bar
bar
#include "foo"
bar
#include "haa"
bar
#include "foo"
and so on
It's clear to anyone what it means, it has a very standardized format, and you also get all the #ifdef conditional expressions, macros and macro functions that you can use; but you can't start your own lines with #, dunno if that's important. For endless ultimate templating power, I could recommend m4 from the standard Unix commands.
* With GNU sed, however, you can execute the result of an s command's replacement in a shell by using the e flag. I forgot about it when writing this answer, as it's rarely used, and I would strongly advise against the e flag: finding the proper quoting for the subshell is hard (impossible?) and it's very easy to abuse. Anyway, the following could work (the ./ prefix keeps the shell from reading the leading # as the start of a comment):
sed -n '/^##_\(.*\)_##$/!{p;d;}; s//cat .\/&/ep'
but with the following input it may cause harm to your system:
some input file
##_$(rm /)_##
^^^^^^^ - $(rm /) will be executed in a subshell; a nastier command there could remove all your files
I think proper quoting would be something along these lines (untested):
sed -n '/^##_\(.*\)_##$/!{p;d;}; '"s/'/'\\\\''/g; s/.*/cat '&'/ep"
but I would go with existing tools like cpp or m4 anyway.
With sed
Yes, this is possible with GNU sed.
With this input file input.txt:
= bar =
##_foo_##
= bar2 =
##_foo_##
= bar3 =
And the ##_foo_## file you gave in your question, the command
sed -E '
/^##_[_a-zA-Z0-9]+_##$/ {
s|^|cat ./|
e
}
' input.txt
... will yield:
= bar =
first line of foo text
second line of foo text
= bar2 =
first line of foo text
second line of foo text
= bar3 =
This command can also be shortened to this one-liner:
sed -E '/^##_[_a-zA-Z0-9]+_##$/ s|^|cat ./|e' input.txt
Explanation
GNU sed has a special command e that executes the command found in pattern space and then replaces the content of the pattern space with the output of the command.
When the above program encounters a line matching your pattern ##_file_##, it prepends cat ./ to the pattern space and executes it with e.
The s/.../.../e command is a shortened version that does exactly the same, the command being executed only if a substitution occurred.
Contrary to what KamilCuk says in their answer, both sed commands above are perfectly safe and don't need any escaping/quoting, because they are executed on a known harmless pattern that cannot be tricked into executing anything other than the expected cat.
Of course, this is designed to work with that ##_file_## pattern you gave in your question. Allowing spaces or other fancy characters in your pattern may break things since they might be interpreted by the shell.
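To try the one-liner end to end, a sketch that rebuilds the example above in a throwaway directory, assuming GNU sed (the e flag is a GNU extension) and mktemp -d; the directory handling is illustrative only:

```shell
#!/bin/sh
# Sketch, assuming GNU sed: recreate the example files and run the one-liner.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# The token file is named after the token itself, as in the question.
printf '%s\n' 'first line of foo text' 'second line of foo text' > '##_foo_##'
printf '%s\n' '= bar =' '##_foo_##' '= bar2 =' '##_foo_##' '= bar3 =' > input.txt

# A token line becomes "cat ./##_foo_##"; e runs it and its output
# replaces the pattern space.
sed -E '/^##_[_a-zA-Z0-9]+_##$/ s|^|cat ./|e' input.txt
```

The ./ prefix matters for more than the path: in the shell, an unquoted word starting with # begins a comment, so a bare cat ##_foo_## would run cat with no arguments.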
With awk
Here is the equivalent with awk:
awk '
/^##_[_a-zA-Z0-9]+_##$/ {
system("cat ./" $0)
next
}
1
' input.txt
This command can also be shortened to this one-liner:
awk '! /^##_[_a-zA-Z0-9]+_##$/ || system("cat ./" $0)' input.txt
Explanation
This is very similar to the sed commands above: when awk meets the pattern ##_file_## it builds the corresponding cat command and executes it with system() then it skips to the next input line with next. Lines that don't match the pattern are printed as is (the 1 line).
Of course, since the command is interpreted by the shell, the same caveat applies here: both awk commands are perfectly safe and don't need any escaping/quoting as long as your pattern stays that simple.

sed substitute and show line number

I'm working in bash trying to use sed substitution on a file and show both the line number where the substitution occurred and the final version of the line. For a file with lines that contain foo, trying with
sed -n 's/foo/bar/gp' filename
will show me the lines where substitution occurred, but I can't figure out how to include the line number. If I try to use = as a flag to print the current line number like
sed -n 's/foo/bar/gp=' filename
I get
sed: -e expression #1, char 14: unknown option to `s'
I can accomplish the goal with awk like
awk '{if (sub("foo","bar",$0)){print NR $0}}' filename
but I'm curious if there's a way to do this with one line of sed. If possible I'd love to use a single sed statement without a pipe.
I can't think of a way to do it without listing the search pattern twice and using command grouping.
sed -n "/foo/{s/foo/bar/g;=;p;}" filename
EDIT: mklement0 helped me out there by mentioning that if the regex in an s command is empty, sed reuses the last regex it applied, as described in the manual. So you could get away with it like this:
sed -n "/foo/{s//bar/g;=;p;}" filename
Before that, I figured out a way not to repeat the pattern, but it uses branches and labels. "In most cases," the docs specify, "use of these commands indicates that you are probably better off programming in something like awk or Perl. But occasionally one is committed to sticking with sed, and these commands can enable one to write quite convoluted scripts." [source]
sed -n "s/foo/bar/g;tp;b;:p;=;p" filename
This does the following:
s/foo/bar/g does your substitution.
tp will jump to :p iff a substitution happened.
b (branch with no label) jumps to the end of the script, so (with -n) nothing is printed and sed moves on to the next line.
:p defines label p, which is the target for the tp command above.
= and p will print the line number and then the line.
End of script, so go back and process the next line.
See? Much less readable...and maybe a distant cousin of :(){ :|:& };:. :)
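A quick way to confirm that the two forms behave the same is to run both on a tiny input. A sketch, assuming GNU sed (which ends a label at ;, as the branch version relies on); the function names are hypothetical:

```shell
#!/bin/sh
# -n: print nothing unless asked; = prints the line number on its own line.
numbered_grouped() {
  # The empty regex in s// reuses /foo/ from the address.
  sed -n '/foo/{s//bar/g;=;p;}' "$@"
}

numbered_branch() {
  # t jumps to :p only if the s command substituted; b ends the cycle.
  sed -n 's/foo/bar/g;tp;b;:p;=;p' "$@"
}

printf '%s\n' 'alpha' 'foo one' 'beta' 'foo foo' | numbered_grouped
printf '%s\n' 'alpha' 'foo one' 'beta' 'foo foo' | numbered_branch
```

Each call should print 2, bar one, 4 and bar bar on separate lines: sed's = always writes the number on a line of its own.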
It cannot be done in any reasonable way with sed, here's how to really do it clearly and simply in awk:
awk 'sub(/foo/,"bar"){print NR, $0}' filename
sed is an excellent tool for simple substitutions on a single line, for anything else use awk.

Insert line after match using sed

For some reason I can't seem to find a straightforward answer to this and I'm on a bit of a time crunch at the moment. How would I go about inserting a choice line of text after the first line matching a specific string using the sed command. I have ...
CLIENTSCRIPT="foo"
CLIENTFILE="bar"
And I want insert a line after the CLIENTSCRIPT= line resulting in ...
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Try doing this using GNU sed:
sed '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
If you want to edit the file in place, use
sed -i '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
Output
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Doc
See the sed documentation and search for a\ (append).
Note the standard sed syntax (as in POSIX, so supported by all conforming sed implementations around (GNU, OS/X, BSD, Solaris...)):
sed '/CLIENTSCRIPT=/a\
CLIENTSCRIPT2="hello"' file
Or on one line:
sed -e '/CLIENTSCRIPT=/a\' -e 'CLIENTSCRIPT2="hello"' file
(-expressions (and the contents of -files) are joined with newlines to make up the sed script sed interprets).
The -i option for in-place editing is also a GNU extension; some other implementations (like FreeBSD's) support -i '' for that.
Alternatively, for portability, you can use perl instead:
perl -pi -e '$_ .= qq(CLIENTSCRIPT2="hello"\n) if /CLIENTSCRIPT=/' file
Or you could use ed or ex:
printf '%s\n' /CLIENTSCRIPT=/a 'CLIENTSCRIPT2="hello"' . w q | ex -s file
A sed command that works on macOS (at least OS X) and other Unixes alike (i.e. it doesn't require GNU sed like Gilles' (currently accepted) one does):
sed -e '/CLIENTSCRIPT="foo"/a\'$'\n''CLIENTSCRIPT2="hello"' file
This works in bash, and maybe in other shells that know the $'\n' quoting style. Everything can be on one line and works with older/POSIX sed commands. If there might be multiple lines matching CLIENTSCRIPT="foo" (or your equivalent) and you wish to add the extra line only the first time, you can rework it as follows:
sed -e '/^ *CLIENTSCRIPT="foo"/b ins' -e b -e ':ins' -e 'a\'$'\n''CLIENTSCRIPT2="hello"' -e ': done' -e 'n;b done' file
(this creates a loop after the line insertion code that just cycles through the rest of the file, never getting back to the first sed command again).
You might notice I added '^ *' to the matching pattern, so that a commented-out copy of that line is not matched while an indented one still is. It's not 100% perfect but covers some other situations likely to be common. Adjust as required...
These two solutions also get round the problem (for the generic solution to adding a line) that, if your inserted line contains unescaped backslashes or ampersands, they will be interpreted by sed and likely not come out the same, just as \n is; e.g. \0 would be replaced by the matched text. This is especially handy if you're adding a line that comes from a variable, where you'd otherwise have to escape everything first using ${var//} or another sed statement etc.
This solution is also a little less messy in scripts (though the quoting and \n are not easy to read) when you don't want to put the text for the a command at the start of a line, say in a function with indented lines. I've taken advantage of the fact that the shell expands $'\n' to a newline; it does not do that for a regular single-quoted '\n'.
It's getting long enough, though, that I think perl or even awk might win on readability.
A POSIX compliant one using the s command:
sed '/CLIENTSCRIPT="foo"/s/.*/&\
CLIENTSCRIPT2="hello"/' file
Maybe a bit late to post an answer for this, but I found some of the above solutions a bit cumbersome.
I tried simple string replacement in sed and it worked:
sed 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
The & stands for the matched string, followed by \n and the new line. Note that \n in the replacement is a GNU sed extension; BSD/macOS sed does not turn it into a newline.
As mentioned, if you want to do it in-place:
sed -i 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
Another thing. You can match using an expression:
sed -i 's/CLIENTSCRIPT=.*/&\nCLIENTSCRIPT2="hello"/' file
Hope this helps someone
The awk variant:
awk '1;/CLIENTSCRIPT=/{print "CLIENTSCRIPT2=\"hello\""}' file
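The question asks for the line after the first match only, while the commands above act on every matching line. A sketch of a first-match-only awk variant; the done flag and the function name are illustrative, not taken from the answers:

```shell
#!/bin/sh
# Print every line; after the first line starting with CLIENTSCRIPT=,
# also print the new line, then never insert again.
insert_after_first() {
  awk '
    { print }
    !done && /^CLIENTSCRIPT=/ { print "CLIENTSCRIPT2=\"hello\""; done = 1 }
  ' "$@"
}

printf '%s\n' 'CLIENTSCRIPT="foo"' 'CLIENTFILE="bar"' 'CLIENTSCRIPT="again"' | insert_after_first
```

POSIX awk has no in-place option, so to change the file, write the output to a temporary file and move it over the original.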
I had a similar task, and was not able to get the above perl solution to work.
Here is my solution:
perl -i -pe "BEGIN{undef $/;} s/^\[mysqld\]$/[mysqld]\n\ncollation-server = utf8_unicode_ci\n/sgm" /etc/mysql/my.cnf
Explanation:
Uses a regular expression to search my /etc/mysql/my.cnf file for a line that contains only [mysqld] and replaces it with
[mysqld]
collation-server = utf8_unicode_ci
effectively adding the collation-server = utf8_unicode_ci line after the line containing [mysqld].
I had to do this recently as well, for both Mac and Linux, and after browsing through many posts and trying many things, I never got what I wanted: a simple, easy-to-understand solution using well-known standard commands with simple patterns, a one-liner, portable, and expandable to more constraints. Then I looked at it from a different perspective and realized I could drop the "one-liner" requirement if a "two-liner" met the rest of my criteria. In the end I came up with this solution, which works on both Ubuntu and Mac and which I wanted to share with everyone:
insertLine=$(( $(grep -n "foo" sample.txt | cut -f1 -d: | head -1) + 1 ))
sed -i -e "$insertLine"' i\'$'\n''bar'$'\n' sample.txt
In the first command, grep looks for line numbers containing "foo", cut/head select the first occurrence, and the arithmetic increments that line number by 1, since I want to insert after the occurrence. (If "foo" is not found, the expression evaluates to 1 and "bar" ends up at the top of the file.)
The second command is an in-place file edit: i\ followed by an ANSI-C-quoted newline, then "bar" and another newline, inserts the line "bar" before line $insertLine, i.e. right after the "foo" line. Each of these two commands can be expanded to more complex operations and matching.

Need help interpreting this sed command

I'm looking through an Oracle script I found online, but it runs a sed command to filter results from a trace file. I'm running Oracle on a Windows server, so the sed command isn't recognized.
host sed -n '/scattered/s/.*p3=//p' &Trace_Name | sort -n | tail -1
I've tried reading the online documentation, but am still not sure how to interpret what this command is trying to filter. Would anyone be so kind as to help me interpret what this command is trying to filter? Or better yet, what I can run from a Windows command prompt to achieve the same result.
Thanks!
It says "on lines that contain 'scattered' replace zero or more of any character followed by 'p3=' with nothing (delete it, in other words) and print the result" (-n says don't print lines unless there's an explicit print command).
For this example input:
abc organized p3=123
def scattered p3=456
ghi ordered p3=789
The output would be:
456
The sed command is searching a file for strings that match a pattern. The "-n" option suppresses output that is not explicitly printed. The "p" at the end says to print the line if the substitution succeeded.
sort -n does a numerical sort.
tail -1 prints the last line, only.
So, it seems to be searching for scattered disk reads and printing the line with the biggest value.
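To see why sort -n matters, a sketch of the whole pipeline on made-up trace lines: the first three are the example input from the earlier answer, the fourth is added here, and the function name is hypothetical:

```shell
#!/bin/sh
#  sed: on lines containing "scattered", delete everything through "p3=" and print
#  sort -n: numeric sort, so 1000 sorts after 456 (a plain sort would not)
#  tail -n 1: keep the last, i.e. largest, value (POSIX spelling of tail -1)
largest_scattered_p3() {
  sed -n '/scattered/s/.*p3=//p' "$@" | sort -n | tail -n 1
}

printf '%s\n' 'abc organized p3=123' 'def scattered p3=456' \
  'ghi ordered p3=789' 'jkl scattered p3=1000' | largest_scattered_p3
```

This should print 1000; with a plain sort, "1000" would sort before "456" and the pipeline would print 456 instead.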
I think the regular expression pattern is eliminating everything up to and including "p3=". s/from/to/ is a substitute command.
The sed command works with cygwin, a unix-like shell for Windows.
To address the other part of your question, the unxutils project ports many GNU utilities to Win32, including sed.
