How should I extract and combine parts of files based on a common text string effectively in bash? - bash

Suppose I have two similar files:
a.yaml
data:
- name: a1
args: ["cmd", "something"]
config:
- name: some
val: thing
- name: a2
args: ["cmd2", "else"]
[...other array values...]
tags: ["something-in-a"]
values: ["else-in-a"]
substitutions:
key1: a-value
key2: a-value
key3: a-value
b.yaml
data:
- name: b1
args: ["cmd", "something"]
config:
- name: some
val: thing
- name: b2
args: ["cmd2", "else"]
[...other array values...]
tags: ["something-in-b"]
values: ["else-in-b"]
substitutions:
key1: b-value
key2: b-value
key3: b-value
My goal is to combine parts of a and b file such that I have a new file which consists of file content before substitutions: from b.yaml and content including and after substitutions: from a.yaml
So in this case, my desired output would be like this:
c.yaml
data:
- name: b1
args: ["cmd", "something"]
config:
- name: some
val: thing
- name: b2
args: ["cmd2", "else"]
[...other array values...]
tags: ["something-in-b"]
values: ["else-in-b"]
substitutions:
key1: a-value
key2: a-value
key3: a-value
The parts before and after substitutions: in both file contents might have different lengths.
Currently, my method is like this:
head -q -n `awk '/substitution/ {print FNR-1}' b.yaml` b.yaml >! c.yaml ; \
tail -q -n `awk '/substitution/ {ROWNUM=FNR} END {print NR-ROWNUM+1}' a.yaml` a.yaml >> c.yaml; \
rm a.yaml b.yaml; mv c.yaml a.yaml; # optional newfile renaming to original
But I wonder if there's an alternative or better method for combining parts of different files based on a common text string in bash?

Use awk, you just need to flag the flow based on the string:
awk '$1 == "substitutions:"{skip = FNR==NR ? 1:0}!skip' b.yaml a.yaml
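Explanation:
FNR==NR: true while awk is reading the first file (b.yaml), false once it moves on to the second (a.yaml). The substitutions: line therefore turns skipping on in b.yaml and off again in a.yaml.
!skip: print the line while skip is 0 or unset; otherwise drop it.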
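To see the flag logic in isolation, here is a minimal sketch that wraps the same awk program in a function and runs it on two throwaway files. The function name splice_at, the temporary directory and the shortened sample files are assumptions made for illustration; they are not part of the answer above.

```shell
#!/usr/bin/env bash
# splice_at MARKER HEAD_FILE TAIL_FILE  (hypothetical helper name)
# Prints HEAD_FILE up to, but not including, its MARKER line,
# then TAIL_FILE from its MARKER line to the end.
splice_at() {
    awk -v marker="$1" '
        $1 == marker { skip = (FNR == NR) }  # 1 in the first file, 0 in the second
        !skip                                # print unless skipping
    ' "$2" "$3"
}

# Throwaway sample files, much shorter than the ones in the question.
workdir=$(mktemp -d)
printf '%s\n' 'data:' '- name: b1' 'substitutions:' '  key1: b-value' > "$workdir/b.yaml"
printf '%s\n' 'data:' '- name: a1' 'substitutions:' '  key1: a-value' > "$workdir/a.yaml"

splice_at 'substitutions:' "$workdir/b.yaml" "$workdir/a.yaml" > "$workdir/c.yaml"
cat "$workdir/c.yaml"
```

With these sample files, c.yaml should hold the data: part of b.yaml followed by the substitutions: part of a.yaml. The sketch assumes the first file is non-empty, since FNR==NR cannot tell the files apart otherwise.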

{
grep -B9999 'substitutions:' b.yaml | head -n -1
grep -A9999 'substitutions:' a.yaml
} > c.yaml
A oneliner:
{ grep -B9999 'substitutions:' b.yaml | head -n -1; grep -A9999 'substitutions:' a.yaml; } > c.yaml
The -A9999 and -B9999 are a bit dirty, here's a solution with sed's:
{
sed '/substitutions:/,$d' b.yaml
echo substitutions:
sed '1,/substitutions:/d' a.yaml
} > c.yaml

Related

How to merge two .yaml files such that shared keys between the files uses only one of their values?

I am attempting to merge two yaml files and would like any shared keys under a specific key to use values from one of the yaml files, and not merge both. This problem may be better described using an example. Given file1.yaml and file2.yaml, I am trying to achieve the following:
file1.yaml
name: 'file1'
paths:
path1:
content: "t"
path2:
content: "b"
file2.yaml
name: 'file2'
paths:
path1:
value: "t"
My ideal result in merging is the following file:
file3.yaml
name: 'file2'
paths:
path1:
value: "t"
path2:
content: "b"
Specifically, I would like to overwrite any key under paths such that if both yaml files have the same key under paths, then only the value from file2 is used. Is there some tool that enables this? I was looking into yq but I'm not sure if that tool would work.
Please specify which implementation of yq you are using. They are quite similar, but sometimes differ a lot.
For instance, using kislyuk/yq, you can use input to access the second file, which you can provide alongside the first one:
yq -y 'input as $in | .name = $in.name | .paths += $in.paths' file1.yaml file2.yaml
name: file2
paths:
path1:
value: t
path2:
content: b
With mikefarah/yq, you'd use load with providing the second file in the code, while only the first one is your regular input:
yq 'load("file2.yaml") as $in | .name = $in.name | .paths += $in.paths' file1.yaml
name: 'file2'
paths:
path1:
value: "t"
path2:
content: "b"

bash search and replace a line after a certain line

I have a big yaml file containing multiple declaration blocks, related to different services.
The structure is similar to the following (but repeated for multiple applications):
- name: commerce-api
type: helm
version: 0.0.5
I would like to find the block of code that is containing commerce-api and replace the version property value with something else.
The thing is, I wrote this script:
bumpConfig() {
LINE=$(awk "/- name: $1$/{print NR + $2}" "$CONFIG_YML")
sed -i "" -E "${LINE}s/version: $3.*$/\version: $4/" "$CONFIG_YML"
}
bumpConfig "commerce-api" 2 "$OLD_APP_VERSION" "$NEW_APP_VERSION"
Which is kind of allowing me to do what I want, but the only problem is that, the property version is not always on the third line.
How can I make my script to look for the first occurrence of version given the service name to be commerce-api?
Is this even possible using awk?
Adding some variation to the input file:
$ cat config.yml
- name: commerce-api-skip
type: helm
version: 0.0.5
- name: commerce-api
type: helm
bogus line1: bogus value1
version: 0.0.5
bogus line2: bogus value2
- name: commerce-api-skip-too
type: helm
version: 0.0.5
One awk idea:
bumpConfig() {
awk -v name="$1" -v old="$2" -v new="$3" '
/- name: / { replace=0
if ($NF == name)
replace=1
}
replace && $1=="version:" { if ($NF == old)
$0=substr($0,1,index($0,old)-1) new
}
1
' "${CONFIG_YML}"
}
Taking for a test drive:
CONFIG_YML='config.yml'
name='commerce-api'
OLD_APP_VERSION='0.0.5'
NEW_APP_VERSION='0.0.7'
bumpConfig "${name}" "${OLD_APP_VERSION}" "${NEW_APP_VERSION}"
This generates:
- name: commerce-api-skip
type: helm
version: 0.0.5
- name: commerce-api
type: helm
bogus line1: bogus value1
version: 0.0.7
bogus line2: bogus value2
- name: commerce-api-skip-too
type: helm
version: 0.0.5
Once OP is satisfied with the result:
if running GNU awk the file can be updated 'in place' via: awk -i inplace -v name="$1" ...
otherwise the output can be saved to a temp file, which is then moved over the original: awk -v name="$1" ... > tmpfile; mv tmpfile "${CONFIG_YML}"
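The temp-file route can be sketched as a small function. This is one possible arrangement, not the answer's own code: the name bump_version_in_place, the argument order and the use of mktemp are assumptions, and the awk program is a condensed variant of the idea above.

```shell
#!/usr/bin/env bash
# bump_version_in_place FILE NAME OLD NEW  (hypothetical helper name)
bump_version_in_place() {
    local file=$1 name=$2 old=$3 new=$4 tmp
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk -v name="$name" -v old="$old" -v new="$new" '
        /^- name: / { in_block = ($NF == name) }    # entering a new list item
        in_block && $1 == "version:" && ($NF "") == (old "") {
            sub(/version:.*/, "version: " new)       # leading indentation is untouched
        }
        1                                            # print every line
    ' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"      # replace the original only if awk succeeded
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Small demo on a throwaway file.
cfg=$(mktemp)
printf '%s\n' '- name: commerce-api-skip' '  version: 0.0.5' \
    '- name: commerce-api' '  type: helm' '  version: 0.0.5' > "$cfg"
bump_version_in_place "$cfg" commerce-api 0.0.5 0.0.7
cat "$cfg"
```

Only the version line inside the commerce-api block should change. Note that mv replaces the file with the temp file, so the result carries the temp file's permissions; copying the contents back with cat "$tmp" > "$file" is an alternative when the original mode must be kept.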
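Entirely in sed:
sed -i '' "/^- name: $1\$/,/^- name:/ s/^\([[:space:]]*\)version: $3/\1version: $4/" "$CONFIG_YML"
/^- name: $1\$/,/^- name:/ restricts the s command to just the lines between the requested name and the next - name: line. The \([[:space:]]*\) group keeps whatever indentation the version line has. Note that sed -i '' is the BSD/macOS form; with GNU sed use -i on its own.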
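To check what such an address range actually covers, it can help to print the selected lines instead of editing them. A minimal sketch, assuming a shortened, indented copy of the sample config held in a shell variable (the variable names are made up for the demo):

```shell
#!/usr/bin/env bash
config=$(printf '%s\n' \
    '- name: commerce-api-skip' '  version: 0.0.5' \
    '- name: commerce-api' '  type: helm' '  version: 0.0.5' \
    '- name: commerce-api-skip-too' '  version: 0.0.5')

# -n plus p prints only the lines inside the address range.
selected=$(printf '%s\n' "$config" | sed -n '/^- name: commerce-api$/,/^- name:/p')
printf '%s\n' "$selected"
```

The range runs from the commerce-api line up to and including the next - name: line. That closing line belongs to the following block, but it holds no version key, so a substitution limited to this range only touches the requested block.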
#!/bin/bash
OLD_APP_VERSION=0.0.5
NEW_APP_VERSION=0.0.7
CONFIG_YML=config.yml
bumpConfig() {
gawk -i inplace -v name="$1" -v old="$2" -v new="$3" '
1
/^- name: / && $3 == name {
while (getline > 0) {
if (/^ version: / && $2 == old)
$0 = " version: " new
print
if (!NF || /^-/ || /^ version: /)
break
}
}
' "${CONFIG_YML}"
}
bumpConfig commerce-api "${OLD_APP_VERSION}" "${NEW_APP_VERSION}"

Insert element using github.com/mikefarah/yq command in a file

I have below content in yml file:
category:
toolSettings: settings.xml
The snippet below needs to be added under the existing category key:
env:
variables:
- user: ABC
- passowrd: BCD
Expected Output:
category:
env:
variables:
- user: ABC
- passowrd: BCD
toolSettings: settings.xml
Tried below:
yq e '."category" +=({env: {variables:[ {"user":"ABC"},{"passowrd":"BCD"}]}})' jules.yml > tmp.yml
yq -i '.category.env.variables[0].user="ABC"' jules.yml > tmp.yml
yq -i '.category.env.variables[1].passowrd="BCD"' jules.yml > tmp.yml
But none of the above are working.
github.com/mikefarah/yq,
yq Version: 4.26.1
One should use either yq -i or a redirection on stdout, not both together.
The following code:
# Create original file
cat >jules.yml <<'EOF'
category:
toolSettings: settings.xml
EOF
# Edit that file in-place
yq -i '
.category.env.variables.user = "ABC"
| .category.env.variables.password = "DEF"
' jules.yml
# Write file to output
echo "New jules.yml file follows:"
echo "---"
cat jules.yml
...leaves jules.yml with the content:
category:
toolSettings: settings.xml
env:
variables:
user: ABC
password: DEF
...as you can see at https://replit.com/#CharlesDuffy2/EnragedPreviousScans#runme.bash

Get the YAML path of a given line in a file

Using yq (or any other tool), how can I return the full YAML path of an arbitrary line number ?
e.g. with this file :
a:
b:
c: "foo"
d: |
abc
def
I want to get the full path of line 2; it should yield: a.b.c. Line 0 -> a, Line 4 -> a.d (multiline support), etc.
Any idea how I could achieve that?
Thanks
I have coded two solutions that differ slightly in their behaviour (see remarks below)
Use the YAML processor mikefarah/yq.
I have also tried to solve the problem using kislyuk/yq, but it is not suitable, because the operator input_line_number only works in combination with the --raw-input option.
Version 1
FILE='sample.yml'
export LINE=1
yq e '[..
| select(line == env(LINE))
| {"line": line,
"path": path | join("."),
"type": type,
"value": .}
]' $FILE
Remarks
LINE=3 returns two results, because line 3 contains two nodes
the key 'c' of map 'a.b'
the string value 'foo' of key 'c'.
LINE=5 does not return a match, because the multiline text node starts in line 4.
the results are wrapped in an array, as multiple nodes can be returned
Output for LINE=1
- line: 1
path: ""
type: '!!map'
value:
a:
b:
c: "foo"
d: |-
abc
def
Output for LINE=2
- line: 2
path: a
type: '!!map'
value:
b:
c: "foo"
Output for LINE=3
- line: 3
path: a.b
type: '!!map'
value:
c: "foo"
- line: 3
path: a.b.c
type: '!!str'
value: "foo"
Output for LINE=4
- line: 4
path: d
type: '!!str'
value: |-
abc
def
Output for LINE=5
[]
Version 2
FILE='sample.yml'
export LINE=1
if [[ $(wc -l < $FILE) -lt $LINE ]]; then
echo "$FILE has less than $LINE lines"
exit
fi
yq e '[..
| select(line <= env(LINE))
| {"line": line,
"path": path | join("."),
"type": type,
"value": .}
]
| sort_by(.line, .type)
| .[-1]' $FILE
Remarks
at most one node is returned, even if there are more nodes in the selected row. So the result does not have to be wrapped in an array.
Which node of one line is returned can be controlled by the sort_by function, which can be adapted to your own needs.
In this case, text nodes are preferred over maps because "!!map" is sorted before "!!str".
LINE=3 returns only the text node of line 3 (not node of type "!!map")
LINE=5 returns the multiline text node starting at line 4
LINE=99 does not return the last multiline text node of sample.yml because the maximum number of lines is checked in bash beforehand.
Output for LINE=1
line: 1
path: ""
type: '!!map'
value:
a:
b:
c: "foo"
d: |-
abc
def
Output for LINE=2
line: 2
path: a
type: '!!map'
value:
b:
c: "foo"
Output for LINE=3
line: 3
path: a.b.c
type: '!!str'
value: "foo"
Output for LINE=4
line: 4
path: d
type: '!!str'
value: |-
abc
def
Output for LINE=5
line: 4
path: d
type: '!!str'
value: |-
abc
def
Sharing my findings since I've spent too much time on this.
As @Inian mentioned, line numbers won't necessarily be accurate.
yq does provide the line operator, but I was not able to find a decent way of mapping an input line number onto it.
That said, if you're sure the input file will not contain any multi-line values, you could do something like this:
1. Use awk to get the key of your input line, e.g. 3 --> c
This assumes the value will never contain :; the regex can be edited if needed to work around this.
See also: Select row in awk; Trim leading and trailing spaces from a string in awk
export searchKey=$(awk -F':' 'FNR == 3 { gsub(/ /,""); print $1 }' ii)
2. Use yq to recursively (..) loop over the values, and create each path using (path | join("."))
yq e '.. | (path | join("."))' ii
3. Filter the values from step 2, using a regex that keeps only those paths that end in the key from step 1 (strenv(searchKey))
yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
4. Print the path if it's found
Some examples from my local machine, where your input file is named ii and both awk + yq commands are wrapped in a bash function
$ function getPathByLineNumber () {
key=$1
export searchKey="$(awk -v key=$key -F':' 'FNR == key { gsub(/ /, ""); print $1 }' ii)"
yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
}
$
$
$
$
$ yq e . ii
a:
b:
c: "foo"
$
$
$ getPathByLineNumber 1
a
$ getPathByLineNumber 2
a.b
$ getPathByLineNumber 3
a.b.c
$
$
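For simple block-style files, the path can also be derived without yq by tracking indentation in awk. This is a different technique from the two answers above, offered as a rough sketch only: the helper name yaml_path_at_line is hypothetical, line numbers are 1-based, and list items, flow style, quoted keys and scalar lines containing ": " are not handled.

```shell
#!/usr/bin/env bash
# yaml_path_at_line FILE LINE  (hypothetical helper; LINE is 1-based)
yaml_path_at_line() {
    awk -v target="$2" '
        /^ *[^ #-][^:]*:( |$)/ {                    # line looks like "key:" or "key: value"
            key = $0
            sub(/^ +/, "", key)                     # strip indentation
            indent = length($0) - length(key)
            sub(/:.*/, "", key)                     # keep only the key name
            while (depth > 0 && ind[depth] >= indent)
                depth--                             # leave deeper or sibling levels
            depth++
            ind[depth] = indent
            keys[depth] = key
        }
        FNR == target {                             # report the path in effect on this line
            path = keys[1]
            for (i = 2; i <= depth; i++) path = path "." keys[i]
            print path
            exit
        }
    ' "$1"
}

# Demo file: the sample from the question, with d nested under a.
sample=$(mktemp)
printf '%s\n' 'a:' '  b:' '    c: "foo"' '  d: |' '    abc' '    def' > "$sample"
for n in 1 2 3 4 5 6; do
    printf '%s: %s\n' "$n" "$(yaml_path_at_line "$sample" "$n")"
done
```

Lines inside the block scalar report the path of the key that opened it, which gives a basic form of the multiline support asked for.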

My sed command to insert lines into a file is not working - is special characters the issue?

I am trying to add a couple of lines in a text file with sed.
I think I have special characters that are giving me the issue.
I want to insert lines between
username: system:node:{{EC2PrivateDNSName}}
and
kind: ConfigMap
This is what I want to insert -
- groups:
- eks-role
- system:master
rolearn: arn:aws:iam::xxxxx:role/eks
username: eks
mapUsers: |
- userarn: arn:aws:iam::xxxxx:user/test-ecr
username: test-ecr
groups:
- eks-role
I have also tried using forward slashes around the special characters to no avail.
Here is the sed command I have now that does not work - it seems not to insert anything. I assume it can't find the line "username: system:node:{{EC2PrivateDNSName}}".
sed '/^username\: system:node\:{{EC2PrivateDNSName}}$/r'<(
echo " - groups:"
echo " - eks-role"
echo " - system:master"
echo " rolearn: arn:aws:iam::xxxxx:role/eks"
echo " username: eks"
echo " mapUsers: |"
echo " - userarn: arn:aws:iam::xxxxx:user/test-ecr"
echo " username: ecr"
echo " groups:"
echo " - eks-role"
) -i -- temp-aws-auth.yaml
Here is the contents of the file that I want to insert into -
apiVersion: v1
data:
mapRoles: |
- groups:
- system:bootstrappers
- system:nodes
rolearn: arn:aws:iam::xxxxx:role/eksctl-ops-nodegroup-linux-ng-sys-NodeInstanceRole-763ALQD2ZGXK
username: system:node:{{EC2PrivateDNSName}}
kind: ConfigMap
metadata:
creationTimestamp: "2020-12-09T15:54:56Z"
name: aws-auth
namespace: kube-system
resourceVersion: "1298"
UPDATE: Taking into consideration OPs answer/comment re: missing spaces, and a bit more fiddling, I was able to get the following sed command to work, too:
sed '/^.*username.*EC2PrivateDNSName.*$/r'<(cat replace.txt) temp-aws-auth.yaml
Assumptions:
OP is unable to use a yaml-aware tool to perform the edit
username ... EC2PrivateDNSName only shows up in one place in the file (or, alternatively, it shows up in multiple places and OP wishes to add a new line after each occurrence)
Replacement data:
$ cat replace.txt
- groups:
- eks-role
- system:master
rolearn: arn:aws:iam::xxxxx:role/eks
username: eks
mapUsers: |
- userarn: arn:aws:iam::xxxxx:user/test-ecr
username: test-ecr
groups:
- eks-role
NOTE: If the replacement data is in a variable it can be fed into awk as a herestring.
One awk idea:
awk '
FNR==NR { a[++n]=$0 # store first file (replacement data) into an array, in order
          next } # skip to next line in first file
{ print } # print current line of second file
/username.*EC2PrivateDNSName/ { for (i=1; i<=n; i++) # on a match, dump array a[] to stdout in its original order
                                    print a[i] # (for (i in a) does not guarantee any order)
                              }
' replace.txt temp-aws-auth.yaml
Or as a single-line:
awk 'FNR==NR {a[++n]=$0; next} {print} /username.*EC2PrivateDNSName/ {for (i=1; i<=n; i++) print a[i]}' replace.txt temp-aws-auth.yaml
This generates:
apiVersion: v1
data:
mapRoles: |
- groups:
- system:bootstrappers
- system:nodes
rolearn: arn:aws:iam::xxxxx:role/eksctl-ops-nodegroup-linux-ng-sys-NodeInstanceRole-763ALQD2ZGXK
username: system:node:{{EC2PrivateDNSName}}
- groups:
- eks-role
- system:master
rolearn: arn:aws:iam::xxxxx:role/eks
username: eks
mapUsers: |
- userarn: arn:aws:iam::xxxxx:user/test-ecr
username: test-ecr
groups:
- eks-role
kind: ConfigMap
metadata:
creationTimestamp: "2020-12-09T15:54:56Z"
name: aws-auth
namespace: kube-system
resourceVersion: "1298"
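The two-file awk pattern can be tried on a cut-down copy of the data. A minimal sketch, assuming a hypothetical helper named insert_after_match and shortened sample files; it buffers the snippet with a counter so the inserted lines keep their order.

```shell
#!/usr/bin/env bash
# insert_after_match PATTERN SNIPPET_FILE TARGET_FILE  (hypothetical helper name)
insert_after_match() {
    awk -v pattern="$1" '
        FNR == NR { lines[++n] = $0; next }   # first file: buffer the snippet in order
        { print }                             # second file: echo the current line
        $0 ~ pattern {                        # after a matching line, emit the snippet
            for (i = 1; i <= n; i++) print lines[i]
        }
    ' "$2" "$3"
}

workdir=$(mktemp -d)
printf '%s\n' '    - groups:' '      - eks-role' '      username: eks' > "$workdir/replace.txt"
printf '%s\n' 'mapRoles: |' \
    '      username: system:node:{{EC2PrivateDNSName}}' \
    'kind: ConfigMap' > "$workdir/aws-auth.yaml"

insert_after_match 'username.*EC2PrivateDNSName' \
    "$workdir/replace.txt" "$workdir/aws-auth.yaml" > "$workdir/patched.yaml"
cat "$workdir/patched.yaml"
```

The snippet should appear, in its original order, between the username line and kind: ConfigMap. As with any FNR==NR script, this assumes the snippet file is not empty.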
I found out the issue with my original command - Sed needs the spaces included in the line it is looking for!
Since the line I was looking for has spaces in it :
' username: system:node:{{EC2PrivateDNSName}}'
I had to add the spaces to my sed statement :
sed '/^ username\: system:node\:{{EC2PrivateDNSName}}$/r'<(
Thanks for the feedback!
Happy holidays!!
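One way to avoid counting spaces by hand is to let the pattern accept any leading whitespace. A small sketch of that idea, using made-up file names in a temporary directory and a shortened version of the data:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '%s\n' 'mapRoles: |' \
    '      username: system:node:{{EC2PrivateDNSName}}' \
    'kind: ConfigMap' > "$workdir/aws-auth.yaml"
printf '%s\n' '    - groups:' '      - eks-role' > "$workdir/replace.txt"

# [[:space:]]* accepts any amount of leading indentation, including none.
# The r command appends the contents of replace.txt after each matching line.
sed "/^[[:space:]]*username: system:node:{{EC2PrivateDNSName}}\$/r $workdir/replace.txt" \
    "$workdir/aws-auth.yaml" > "$workdir/patched.yaml"
cat "$workdir/patched.yaml"
```

The braces need no escaping here because sed's default (basic) regular expressions treat a bare { as a literal character.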
This might work for you (GNU sed & cat):
cat <<\! |sed ':a;/username: system:node:{{EC2PrivateDNSName}}/{n;/kind: ConfigMap/!ba;h;s/.*/cat -/ep;g}' file
- groups:
- eks-role
- system:master
rolearn: arn:aws:iam::xxxxx:role/eks
username: eks
mapUsers: |
- userarn: arn:aws:iam::xxxxx:user/test-ecr
username: test-ecr
groups:
- eks-role
!
Make a here-document with the lines to be inserted.
Pipe the here-document through to a sed command.
If a line contains username: system:node:{{EC2PrivateDNSName}}, print it and fetch the next line.
If the following line does not contain kind: ConfigMap go to the start of the sed cycle and start again.
Otherwise, copy the current line to the hold space, replace the current line with the lines to be inserted from the here-document and print them, and then overwrite the replacement with the copy from the hold space.
N.B. The replacement lines are inserted into the document by way of the substitute command and the e flag, that evaluates what is substituted into the pattern space i.e. the cat - that is the here-document that is passed through via the pipe.
