Insert text with indentation preserved on the next line of a matched string - yaml

I have a file with YAML data as below:
cat x.yml
foo:
  - bar: 1
  - zoo: 2
I am able to insert the text, but this messes up the indentation (see the inserted line):
sed -r '/^[ ]+- bar:/a- hoo: 3' x.yml
foo:
  - bar: 1
- hoo: 3
  - zoo: 2
Then I tried to backreference the leading spaces, but it seems that does not work with the a (append) command.
sed -r '/^([ ]+)- bar:/a\1- hoo: 3' x.yml
foo:
  - bar: 1
1- hoo: 3
  - zoo: 2
Any help to get the following using a one-liner?
foo:
  - bar: 1
  - hoo: 3
  - zoo: 2

I suggest switching to GNU sed's s command:
sed -E 's/( *)- bar:.*/&\n\1- hoo: 3/' file
Output:
foo:
  - bar: 1
  - hoo: 3
  - zoo: 2
See: man sed and The Stack Overflow Regular Expressions FAQ
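If you want the file changed in place rather than printed to stdout, GNU sed's -i option can be added to the same command (a sketch, assuming GNU sed):
sed -E -i 's/( *)- bar:.*/&\n\1- hoo: 3/' file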

The best option is probably to use a parser. If you know exactly where the values should be, you can just pop them in there; otherwise you have to loop and look for the "bar" key. This example uses the Perl YAML module.
use strict;
use warnings;
use YAML;
my $yaml = do { local $/; <> };
my $href = Load($yaml);
for (0 .. $#{ $href->{foo} }) {
    if (grep { $_ eq "bar" } keys %{ $href->{foo}[$_] }) {
        splice @{ $href->{foo} }, $_+1, 0, { hoo => 1 };
    }
}
print Dump $href;
It outputs:
foo:
  - bar: 1
  - hoo: 1
  - zoo: 2
Otherwise you can use Perl like so:
$ perl -pe's/^( *- *)bar.*\K/$1hoo: 1\n/s' x.yml
foo:
  - bar: 1
  - hoo: 1
  - zoo: 2
Capture from the beginning of line ^ a dash surrounded by spaces. Expect "bar", then absorb everything after it into the regex match, including the newline at the end (hence the /s modifier). Keep (\K) everything that was matched, and after it add the captured indent-plus-dash string, plus your new content and a newline. Done.
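To write the change back to x.yml instead of printing it, the same one-liner should work with Perl's in-place switch (a sketch; -i.bak keeps a backup copy):
$ perl -i.bak -pe's/^( *- *)bar.*\K/$1hoo: 1\n/s' x.yml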

First off, I agree with Inian that a YAML parser would be more appropriate here.
Nevertheless, you could use the s command and capture groups instead, like
$ sed -r 's/^([ ]+)- bar:(.+)$/\1- bar:\2\n\1- hoo: 3/' x.yml
which gives
foo:
  - bar: 1
  - hoo: 3
  - zoo: 2

Related

Get the YAML path of a given line in a file

Using yq (or any other tool), how can I return the full YAML path of an arbitrary line number?
e.g. with this file:
a:
  b:
    c: "foo"
  d: |
    abc
    def
I want to get the full path of line 2; it should yield: a.b.c. Line 0 -> a, line 4 -> a.d (multiline support), etc.
Any idea how I could achieve that?
Thanks
I have coded two solutions that differ slightly in their behaviour (see the remarks below).
Both use the YAML processor mikefarah/yq.
I have also tried to solve the problem using kislyuk/yq, but it is not suitable, because its input_line_number operator only works in combination with the --raw-input option.
Version 1
FILE='sample.yml'
export LINE=1
yq e '[..
  | select(line == env(LINE))
  | {"line": line,
     "path": path | join("."),
     "type": type,
     "value": .}
]' $FILE
Remarks
LINE=3 returns two results, because line 3 contains two nodes:
  the key 'c' of map 'a.b'
  the string value 'foo' of key 'c'
LINE=5 does not return a match, because the multiline text node starts in line 4.
The results are wrapped in an array, as multiple nodes can be returned.
Output for LINE=1
- line: 1
  path: ""
  type: '!!map'
  value:
    a:
      b:
        c: "foo"
    d: |-
      abc
      def
Output for LINE=2
- line: 2
  path: a
  type: '!!map'
  value:
    b:
      c: "foo"
Output for LINE=3
- line: 3
  path: a.b
  type: '!!map'
  value:
    c: "foo"
- line: 3
  path: a.b.c
  type: '!!str'
  value: "foo"
Output for LINE=4
- line: 4
  path: d
  type: '!!str'
  value: |-
    abc
    def
Output for LINE=5
[]
Version 2
FILE='sample.yml'
export LINE=1
if [[ $(wc -l < $FILE) -lt $LINE ]]; then
  echo "$FILE has less than $LINE lines"
  exit
fi
yq e '[..
  | select(line <= env(LINE))
  | {"line": line,
     "path": path | join("."),
     "type": type,
     "value": .}
]
| sort_by(.line, .type)
| .[-1]' $FILE
Remarks
At most one node is returned, even if there are more nodes in the selected row, so the result does not have to be wrapped in an array.
Which node of a line is returned can be controlled by the sort_by function, which can be adapted to your own needs.
In this case, text nodes are preferred over maps because "!!map" is sorted before "!!str".
LINE=3 returns only the text node of line 3 (not the node of type "!!map").
LINE=5 returns the multiline text node starting at line 4.
LINE=99 does not return the last multiline text node of sample.yml, because the maximum number of lines is checked in bash beforehand.
Output for LINE=1
line: 1
path: ""
type: '!!map'
value:
  a:
    b:
      c: "foo"
  d: |-
    abc
    def
Output for LINE=2
line: 2
path: a
type: '!!map'
value:
  b:
    c: "foo"
Output for LINE=3
line: 3
path: a.b.c
type: '!!str'
value: "foo"
Output for LINE=4
line: 4
path: d
type: '!!str'
value: |-
  abc
  def
Output for LINE=5
line: 4
path: d
type: '!!str'
value: |-
  abc
  def
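For convenience, either version can be wrapped in a small script that takes the file and the line number as arguments (a minimal sketch around Version 1, assuming mikefarah/yq v4; the script name yamlpath.sh is made up):
#!/usr/bin/env bash
# yamlpath.sh (hypothetical name), usage: ./yamlpath.sh sample.yml 3
FILE="$1"
export LINE="$2"
yq e '[.. | select(line == env(LINE)) | {"line": line, "path": path | join("."), "type": type, "value": .}]' "$FILE"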
Sharing my findings since I've spent too much time on this.
As @Inian mentioned, line numbers won't necessarily be accurate.
yq does provide us with the line operator, but I was not able to find a decent way of mapping that from an input line number.
That said, if you're sure the input file will not contain any multi-line values, you could do something like this:
Use awk to get the key of your input line, e.g. line 3 -> c
This assumes the value will never contain :; the regex can be edited if needed to work around this
Select row in awk
Trim leading and trailing spaces from a string in awk
export searchKey=$(awk -F':' 'FNR == 3 { gsub(/ /,""); print $1 }' ii)
Use yq to recursively (..) loop over the values, and create each path using (path | join("."))
yq e '.. | (path | join("."))' ii
Filter the values from step 2, using a regex to keep only those paths that end in the key from step 1 (strenv(searchKey))
yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
Print the path if it's found
Some examples from my local machine, where your input file is named ii and both the awk and yq commands are wrapped in a bash function:
$ function getPathByLineNumber () {
    key=$1
    export searchKey="$(awk -v key=$key -F':' 'FNR == key { gsub(/ /, ""); print $1 }' ii)"
    yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
  }
$
$ yq e . ii
a:
  b:
    c: "foo"
$
$ getPathByLineNumber 1
a
$ getPathByLineNumber 2
a.b
$ getPathByLineNumber 3
a.b.c

yq: Add new value to list in alphabetical order

I have a simple yaml file called foo.yaml
foo:
  - a
  - c
bar:
  - foo: bar
    foo2: bar2
I'm trying to add a new value (b) to foo, in alphabetical order. I can add the value with +=, but it doesn't get alphabetized:
$ yq '.foo += "b"' foo.yaml
foo:
  - a
  - c
  - b
bar:
  - foo: bar
    foo2: bar2
If I use + I can use sort, but then I only get the raw values, e.g.:
$ yq '.foo + "b" | sort()' foo.yaml
- a
- b
- c
I tried to set this into a bash variable and then use it with =, but it ends up as a multi-line string:
$ variable=$(yq '.foo + "b" | sort()' foo.yaml)
$ yq ".foo = \"$variable\"" foo.yaml
foo: |-
  - a
  - b
  - c
bar:
  - foo: bar
    foo2: bar2
Is there an easier way to insert a new value into foo alphabetically, while keeping the rest of the yaml intact?
The reason you are getting the raw values is that you've told yq to traverse into 'foo'. Instead try:
yq '.foo = (.foo + "b" | sort)' file.yaml
yields:
foo:
  - a
  - b
  - c
bar:
  - foo: bar
    foo2: bar2
Explanation:
you need to update the entry in 'foo'
then, in brackets, set the new value. Normally you can use +=, but because you want to sort I've used '='
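If you want the change written back to the file rather than printed, the same expression can be combined with yq's -i (in-place) flag (a sketch, assuming mikefarah yq v4):
yq -i '.foo = (.foo + "b" | sort)' foo.yaml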
Disclaimer: I wrote yq

What Is the Best Way to Perform a Search and Replace On only Specific Sections of a File?

I have a markdown file with sections separated by headings. I want to perform a search and replace only on specific sections; however, each section has similar content, so a global search and replace would end up affecting all sections. Because of this, I would need to somehow limit the search and replace to only certain sections of the file.
For example, say I wanted to replace all instances of foo with bar under # Section 1, # Section 3, and # Section 4, leaving # Section 2 and # Section 5 unchanged, as shown below.
Sample Input:
# Section 1
- foo
- foo
- Unimportant Item
- foo
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- foo
- Unimportant Item
# Section 4
- foo
- Unimportant Item
- foo
# Section 5
- foo
- foo
Sample Output:
# Section 1
- bar
- bar
- Unimportant Item
- bar
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- bar
- Unimportant Item
# Section 4
- bar
- Unimportant Item
- bar
# Section 5
- foo
- foo
If I didn't have to worry about the individual sections, a global search and replace would be trivial by using
sed -i 's/foo/bar/g' <input_file>
but I'm not sure if sed is capable of checking context to allow what I am looking for.
Here's a sed version:
sed -E '/^#\s+Section\s+[134]\s*$/,/^#\s+Section\s+[^134]/ s/foo/bar/' input.md
You may use this awk:
awk 'p {sub(/foo$/, "bar")} /^#/ {p = / (Section [134])$/} 1' file
# Section 1
- bar
- bar
- Unimportant Item
- bar
- Unimportant Item
# Section 2
- foo
- Unimportant Item
# Section 3
- bar
- Unimportant Item
# Section 4
- bar
- Unimportant Item
- bar
# Section 5
- foo
- foo
To make it more readable:
awk 'p {                         # p is 1 if the most recent header matched a target section
  sub(/foo$/, "bar")             # replace foo with bar
}
/^#/ {                           # if line starts with # (a header)
  p = / (Section [134])$/        # set p = 1/0 depending on whether it matches Section 1, 3 or 4
} 1' file
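If you also want the file modified in place and have GNU awk 4.1 or later, the same program can be run with gawk's inplace extension (a sketch):
gawk -i inplace 'p {sub(/foo$/, "bar")} /^#/ {p = / (Section [134])$/} 1' file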
For completeness, this awk answer does the substitutions in the whole section, including the header:
awk '/^#/ { in_section = /Section [1|3|4]/ } in_section { sub(/foo/, "bar") } 1' input.md
If you want to exclude the headers from the substitution:
awk '/^#/ { in_section = /Section [1|3|4]/; header_line = NR }
     in_section && (NR > header_line) { sub(/foo/, "bar") } 1' input.md
Detail
awk '/^#/ {                             # if in section header
       in_section = /Section [1|3|4]/;  # determine if section of interest (1/0)
       header_line = NR;                # value of header line to exclude
     }
     in_section && (NR > header_line) { # if in section of interest and after header line
       sub(/foo/, "bar");               # substitute text
     } 1' input.md                      # 1 is to print all lines
My usual advice whenever you're considering sed -i is to use its older brother ed instead, as, unlike sed, it's intended from the get-go to edit files. (It's also POSIX-standard, unlike sed -i, and thus more portable.)
Something like
ed -s input.md <<EOF
/Section 1/;/Section/s/foo/bar/g
/Section 3/;/Section/s/foo/bar/g
/Section 4/;/Section/s/foo/bar/g
w
EOF
Translated: In the block starting with the first line containing Section 1 and ending with the next Section line, replace foo with bar. Then do the same substitution in the Section 3 and Section 4 blocks. Finally, write the changes back to disk.
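To preview the result without modifying the file, the same script can print the edited buffer instead of writing it (a sketch: ,p prints every line and Q quits without saving):
ed -s input.md <<EOF
/Section 1/;/Section/s/foo/bar/g
/Section 3/;/Section/s/foo/bar/g
/Section 4/;/Section/s/foo/bar/g
,p
Q
EOF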
You can always provide multiple commands to sed with the -e option so that substitution occurs even when the sections are one after another:
sed -e '/# Section 1/,/#/ s/foo/bar/' -e '/# Section 3/,/#/ s/foo/bar/' -e '/# Section 4/,/#/ s/foo/bar/' input.md
Multiple commands can also be placed in a "sed script file":
# content of script.sed
/# Section 1/,/#/ s/foo/bar/
/# Section 3/,/#/ s/foo/bar/
/# Section 4/,/#/ s/foo/bar/
And you execute it like this:
sed -f script.sed input.md
A solution with sed
The key is in the range. The first addressing pattern matches the header(s) where we want the substitution to begin, and the second matches all headers except the ones matched by the first. Note that the substitute command is inclusive of the first and last lines in the range (i.e. the headers).
sed -E '/^# Section [134]/, /^# Section [^134]/ s/foo/bar/' input.md
This one excludes the headers from the substitution:
sed -E '/^# Section [134]/, /^# Section [^134]/ { /^#/!s/foo/bar/ }' input.md
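To check exactly what would change before touching the file, you can diff the original against the command's output (a sketch, assuming bash for the process substitution):
diff input.md <(sed -E '/^# Section [134]/,/^# Section [^134]/s/foo/bar/' input.md)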

bash script increment number at end with zero prefix

Given a file with contents like this:
foo: 8.3.1
bar: 803001
I need a bash script to read this file, increment only the last number, and treat the second line's last three digits as room for the z in x.y.z to grow to three digits, then overwrite the original file with the new data:
Input 1:
foo: 8.3.1
bar: 803001
Output 1:
foo: 8.3.2
bar: 803002
Input 2:
foo: 8.3.9
bar: 803009
Output 2:
foo: 8.3.10
bar: 803010
Input 3:
foo: 8.3.199
bar: 803199
Output 3:
foo: 8.3.200
bar: 803200
I could do this in 2 seconds in Java, but I need to do it in a shell script or I'll face endless taunting from the build team.
Short of some rough string splitting, any slick sed command would be a big help!
awk to the rescue!
awk -v d='.' '{n=split($2,v,d);
               if (n>1) $2=v[1] d v[2] d v[3]+1;
               else $2++}1' file
Depending on what else is in the file, you may need to qualify the replacement with a condition before the block, as sketched below.
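For example, a guarded variant that only touches the foo: and bar: lines might look like this (a sketch):
awk -v d='.' '$1 == "foo:" || $1 == "bar:" {   # only touch the foo:/bar: lines
  n = split($2, v, d);
  if (n > 1) $2 = v[1] d v[2] d v[3]+1;        # x.y.z -> bump z
  else $2++                                    # plain integer -> increment
} 1' file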
while read key val _; do
  val_left="${val%.*}" val_right="${val##*.}"
  printf '%s ' "$key"
  [[ $val = *.* ]] && printf '%s.' "$val_left"
  printf '%d\n' $(( 1+val_right ))
done
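The loop reads stdin and writes stdout, so to overwrite the original you can save it as a script and go through a temporary file (a sketch; bump.sh and versions.txt are placeholder names):
# bump.sh and versions.txt are placeholder names
bash bump.sh < versions.txt > versions.txt.tmp && mv versions.txt.tmp versions.txt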
Bikeshedding and slightly golfing. Increment the number formed by the digits at the end of the line:
perl -i -pe '/^(foo|bar):/ && s/\d+$/$&+1/e;' input

Multiple "sed" actions on previous results

Have this input:
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
On the above I need to do 4 (sequential) actions:
select only lines containing "foo" (lowercase)
on the selected lines, remove everything but UPPERCASE letters
delete empty lines (if any are created by the previous action)
and on the lines remaining from the above, enclose every char with [x]
I'm able to solve the above, but I need two sed invocations piped together. Script:
#!/bin/bash
data() {
cat <<EOF
bar foo
foo ABC/DEF
BAR ABC
ABC foo DEF
foo bar
EOF
}
echo "Result OK"
data | sed -n '/foo/s/[^A-Z]//gp' | sed '/^\s*$/d;s/./[&]/g'
# in the above it is solved using 2 sed invocations
# trying to solve it using only one invocation,
# but the following doesn't do what I need.. :( :(
echo "Variant 2 - trying to use only ONE invocation of sed"
data | sed -n '/foo/s/[^A-Z]//g;/^\s*$/d;s/./[&]/gp'
output from the above:
Result OK
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Variant 2 - trying to use only ONE invocation of sed
[A][B][C][D][E][F]
[B][A][R][ ][A][B][C]
[A][B][C][D][E][F]
Variant 2 should also output only
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Is it possible to solve the above using only one sed invocation?
sed -n '/foo/{s/[^A-Z]//g;/^$/d;s/./[&]/g;p;}' inputfile
Output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
Alternative sed approach:
sed '/foo/!d;s/[^A-Z]//g;/./!d;s/./[&]/g' file
The output:
[A][B][C][D][E][F]
[A][B][C][D][E][F]
/foo/!d - deletes all lines that don't contain foo
/./!d - deletes all empty lines
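For example, piping the data function from the question through this command yields the two expected lines:
data | sed '/foo/!d;s/[^A-Z]//g;/./!d;s/./[&]/g'
[A][B][C][D][E][F]
[A][B][C][D][E][F]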
