bash script to edit yaml using awk [duplicate] - bash

This question already has an answer here:
Need help on bash awk to update Yaml file by finding the pattern in the file
(1 answer)
Closed 1 year ago.
I am new to bash and looking for some advice on the issue described below.
I have the config file below.
impulse.yaml
- job_name: orch
  value: CPST
- group: indalco
  value1: wr
- monitor:
    - name: quid
      cnt: 2
    - name: kwid
      cnt: 3
    - name: knid
      cnt: 4
- interval: 3m
- static_configs:
    - targets: targets1
      labels:
        group: BSA
        gid: geo
        dc: lba
If I run the bash script, it should update the file like below.
impulse.yaml after the update:
- job_name: orch
  value: CPST
- group: indalco
  value1: wr
- monitor:
    - name: quid
      cnt: 2
    - name: kwid
      cnt: 3
    - name: knid
      cnt: 4
    - name: orch_vm1
    - name: orch_vm2
- interval: 3m
- static_configs:
    - targets: targets1
      labels:
        group: BSA
        gid: geo
        dc: lba
------------------ bash script ------------------
getline() {
  awk '
    BEGIN { ln = 1; find_monitor = 0 }
    (find_monitor == 1 && $0 ~ /^[a-z]/) { exit }
    ($0 ~ /^- monitor:/) { find_monitor = 1; ln = NR }
    END { print ln }' "${1}"
}

word="monitor" # no use of this variable
filename="impulse.yaml"
for vm_name in orch_vm1 orch_vm2; do
  line=$(getline "$filename" "$word")
  sed -i -e ${line}"a\    - name: \"${vm_name}\" " "$filename"
done
The code right now inserts at the beginning of the monitor section of the YAML file, like below, but the entries need to be added at the end of the monitor section, before the interval section. Please advise what pattern-matching technique can be applied.
- job_name: orch
  value: CPST
- group: indalco
  value1: wr
- monitor:
    - name: orch_vm1
    - name: orch_vm2
    - name: quid
      cnt: 2
    - name: kwid
      cnt: 3
    - name: knid
      cnt: 4
- interval: 3m
- static_configs:
    - targets: targets1
      labels:
        group: BSA
        gid: geo
        dc: lba

I agree with @LéaGris' comment. Structured data like YAML should be parsed according to its defined syntax; traditional line-oriented command-line tools like awk and sed can't do this. yq is the closest analogue to jq for YAML.
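For example, with mikefarah/yq v4 the desired edit can be expressed against the structure itself rather than line numbers (a sketch; the entry names come from the question):

yq -i '
  (.[] | select(has("monitor")).monitor) += [{"name": "orch_vm1"}, {"name": "orch_vm2"}]
' impulse.yaml

This appends the new entries at the end of the monitor list, which is exactly the placement asked for. If awk is a hard requirement, key the insertion off the line that ends the monitor section (/^- interval:/) in a single pass, rather than computing the section's starting line number for sed.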

Related

Update value in yaml using yq and returning the whole yaml content

spec:
  templates:
    - name: some_step1
      script:
        image: some_image
        imagePullPolicy: IfNotPresent
    - name: some_step2
      activeDeadlineSeconds: 300
      script:
        image: some_image
        imagePullPolicy: IfNotPresent
        env:
          - name: HOST
            value: "HOST"
          - name: CONTEXT
            value: "CONTEXT"
I would like to update the value for CONTEXT. The following command updates the value; however, it does not return the whole YAML:
yq '.spec.templates | map(select(.name == "some_step2")).[].script.env | map(select(.name == "CONTEXT").value = "CONTEXT1")' test.yaml
The output of the command above:
- name: HOST
  value: "HOST"
- name: CONTEXT
  value: "CONTEXT1"
How can I change the yq command above so that it updates the value for CONTEXT and returns the whole YAML?
I am using mikefarah/yq version v4.30.6.
Use parentheses around the LHS of the assignment to retain the context, i.e. (…) = ….
( .spec.templates[]
| select(.name == "some_step2").script.env[]
| select(.name == "CONTEXT").value
) = "CONTEXT1"
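As a full invocation against the file from the question:

yq '(
  .spec.templates[]
  | select(.name == "some_step2").script.env[]
  | select(.name == "CONTEXT").value
) = "CONTEXT1"' test.yaml

which prints the whole document with just that one value updated: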
spec:
  templates:
    - name: some_step1
      script:
        image: some_image
        imagePullPolicy: IfNotPresent
    - name: some_step2
      activeDeadlineSeconds: 300
      script:
        image: some_image
        imagePullPolicy: IfNotPresent
        env:
          - name: HOST
            value: "HOST"
          - name: CONTEXT
            value: "CONTEXT1"

promtail: transform the whole log line based on regex

I'm having some challenges coercing my log lines into a certain format.
I'm running one promtail instance on several log files, some of which are logfmt and others free-form.
My objective is to transform the free-form ones into the same logfmt as the others, independent of any other labeling. That means the actual payload (log line) pushed to my qryn instance is supposed to have the same format, and I wouldn't even be able to "see" the original, free-form log line downstream. This should enable me to use a simple | logfmt in Grafana, regardless of the log source.
I have tried several approaches, but I can't get the log line replaced: while I can extract to labels in every way conceivable, I can't replace the actual log line.
A (slightly redacted) promtail-config.yml:
server:
  disable: true
positions:
  filename: ${RUNDIR}/.logs/positions.yaml
clients:
  - url: http://mylocalqryn:33100/loki/api/v1/push
    batchwait: 5s
    timeout: 30s
scrape_configs:
  - job_name: consolidated-logs
    # https://grafana.com/docs/loki/latest/clients/promtail/pipelines/
    # https://grafana.com/docs/loki/latest/clients/promtail/stages/template/
    pipeline_stages:
      - match:
          selector: '{ Program="freeformlog" }'
          stages:
            - regex:
                expression: '^(?P<time>^[0-9-:TZ.+]*)\s+(?P<level>[A-z]*)\s+(?P<Function>[0-9A-z:.]*)\s+(?P<msg>.*$)'
            - timestamp:
                format: RFC3339
                source: time
            - template:
                source: level
                template: '{{ ToLower .Value }}'
            - labels:
                level:
                msg:
                Function:
            - replace:
                expression: '.*'
                replace: 'time="{{ .timestamp }}" level="{{ .level }}" msg="{{ .msg }}" Host="{{ .Host }}" Program="{{ .Program }}" Function="{{ .Function }}"'
    static_configs:
      - targets:
          - localhost
        labels:
          Host: ${HOST:-"_host-unknown_"}
          Program: logfmtcompat
          __path__: ${RUNDIR}/.logs/logfmtcompat.log
      - targets:
          - localhost
        labels:
          Host: ${HOST:-"_host-unknown_"}
          Program: freeformlog
          __path__: ${RUNDIR}/.logs/freeformlog.log
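One approach worth sketching (untested against this exact setup): promtail's template stage can create a new entry in the extracted map, and the output stage then sets the log line from that entry, which is the documented way to rewrite the whole payload. The field name message is an arbitrary choice here, and only regex-extracted fields are referenced:

- template:
    source: message   # entry is created in the extracted map if it doesn't exist
    template: 'time="{{ .time }}" level="{{ .level }}" msg="{{ .msg }}" Function="{{ .Function }}"'
- output:
    source: message   # promtail replaces the log line with this entry

These two stages would go at the end of the stages list under the match block, in place of the replace stage, which only substitutes matched text within the existing line.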

yq update yaml array using other yaml file's array

I don't have any detailed knowledge of yq.
template1.yaml
spec:
  template:
    temp:
      vars:
        - name: first
          env: []
template2.yaml
env:
-name: "first"
value: 1
-name: "two"
value: 2
I want to add the env array of template2.yaml to template1.yaml's env array using yq. How can I do this?
Which tool called yq are you using?
Using mikefarah/yq (tested with v4.20.2):
yq '
.spec.template.temp.vars[].env += load("template2.yaml").env
' template1.yaml
Using kislyuk/yq (tested with v3.0.2):
yq -y '
.spec.template.temp.vars[].env += input.env
' template1.yaml template2.yaml
Output:
spec:
  template:
    temp:
      vars:
        - name: first
          env:
            - name: "first"
              value: 1
            - name: "two"
              value: 2
Note: This assumes your template2.yaml actually looks more like this (with a space after each -):
env:
  - name: "first"
    value: 1
  - name: "two"
    value: 2
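To write the merged result back to template1.yaml instead of printing it, mikefarah/yq also supports in-place editing via the -i flag:

yq -i '.spec.template.temp.vars[].env += load("template2.yaml").env' template1.yaml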

Task with loop in Argo workflow

I want to introduce a for loop in a workflow that consists of 2 individual tasks. The second will depend on the first, each using a different template, and the second should iterate with {{item}}. For each iteration, I want to know whether the default is to execute only the second task, or whether the whole flow will be re-executed.
To repeat the second step only, use withItems/withParam (there is no withArtifact, though you can get the same behavior with data). These loops repeat only the specific step they are declared on, once per item/parameter.
- name: create-resources
  inputs:
    parameters:
      - name: env
      - name: giturl
      - name: resources
      - name: awssecret
  dag:
    tasks:
      - name: resource
        template: resource-create
        arguments:
          parameters:
            - name: env
              value: "{{inputs.parameters.env}}"
            - name: giturl
              value: "{{inputs.parameters.giturl}}"
            - name: resource
              value: "{{item}}"
            - name: awssecret
              value: "{{inputs.parameters.awssecret}}"
        withParam: "{{inputs.parameters.resources}}"

  ############# For parallel execution use steps ##############
  steps:
    - - name: resource
        template: resource-create
        arguments:
          parameters:
            - name: env
              value: "{{inputs.parameters.env}}"
            - name: giturl
              value: "{{inputs.parameters.giturl}}"
            - name: resource
              value: "{{item}}"
            - name: awssecret
              value: "{{inputs.parameters.awssecret}}"
        withParam: "{{inputs.parameters.resources}}"
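Note that withParam expects a JSON array serialized as a string, so the resources parameter handed to this template would look something like this (hypothetical values):

arguments:
  parameters:
    - name: resources
      value: '["bucket", "queue", "topic"]'

Each array element becomes one {{item}}, and only the step carrying withParam is repeated per element.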

Multiple json_query in Ansible?

I have the following yaml file.
resources:
  - apiVersion: v1
    kind: Deployment
    metadata:
      labels:
        app: test
      name: test-cluster-operator
      namespace: destiny001
    spec:
      selector:
        matchLabels:
          name: test-cluster-operator
          test.io/kind: cluster-operator
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            name: test-cluster-operator
            test.io/kind: cluster-operator
        spec:
          containers:
            - args:
                - /path/test/bin/cluster_operator_run.sh
              env:
                - name: MY_NAMESPACE
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.namespace
              imagePullPolicy: IfNotPresent
              livenessProbe:
                failureThreshold: 3
                httpGet:
                  path: /healthy
                  port: 8080
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              name: test-cluster-operator
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /ready
                  port: 8080
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              resources:
                limits:
                  cpu: '1'
                  memory: 256Mi
                requests:
                  cpu: 200m
                  memory: 256Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
                - mountPath: /var/data
                  name: data-cluster-operator
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: test-cluster-operator
          serviceAccountName: test-cluster-operator
          terminationGracePeriodSeconds: 30
          volumes:
            - name: data-cluster-operator
              persistentVolumeClaim:
                claimName: data-cluster-operator
I am trying to get the value of the env variable called MY_NAMESPACE.
This is what I tried in Ansible to get to the env tree path.
- name: "set test fact"
  set_fact:
    myresult: "{{ yaml_file_variable | json_query(\"resources[?metadata.name=='test-cluster-operator'].spec.template.spec\") | json_query(\"containers[?name=='test-cluster-operator'].env\") }}"

- name: "debug"
  debug:
    msg: "{{ myresult }}"
This produces an empty list, although the first json_query on its own works well.
How do I use json_query correctly in this case?
Can I achieve this with just one json_query?
EDIT:
I seem to be closer to a solution, but the result is a list and not a string, which I find annoying.
- name: "set test fact"
  set_fact:
    myresult: "{{ yaml_file_variable | json_query(\"resources[?metadata.name=='test-cluster-operator'].spec.template.spec\") | json_query(\"[].containers[?name=='test-cluster-operator']\") | json_query(\"[].env[?name=='MY_NAMESPACE'].name\") }}"
This prints - - MY_NAMESPACE instead of just MY_NAMESPACE.
Do I have to use the first filter every time after json_query? I know for sure that there is only one containers element. I don't understand why json_query returns a list.
This is finally working, but I have no idea whether it's the correct way to do it.
- name: "set test fact"
  set_fact:
    myresult: "{{ yaml_file_variable | json_query(\"resources[?metadata.name=='test-cluster-operator'].spec.template.spec\") | first | json_query(\"containers[?name=='test-cluster-operator']\") | first | json_query(\"env[?name=='MY_NAMESPACE'].valueFrom\") | first }}"
json_query uses JMESPath, and JMESPath always returns a list. This is why your first example isn't working: the first query returns a list, but the second is trying to query a key. You've corrected that in the second attempt with [].
You're also missing the JMESPath pipe expression |, which works pretty much as you might expect: the result of the first query can be piped into a new one. Note that this is separate from Ansible filters using the same character.
This query:
resources[?metadata.name=='test-cluster-operator'].spec.template.spec | [].containers[?name=='test-cluster-operator'][].env[].valueFrom
should give you the following output:
[
  {
    "fieldRef": {
      "apiVersion": "v1",
      "fieldPath": "metadata.namespace"
    }
  }
]
Your task should look like this:
- name: "set test fact"
  set_fact:
    myresult: "{{ yaml_file_variable | json_query(\"resources[?metadata.name=='test-cluster-operator'].spec.template.spec | [].containers[?name=='test-cluster-operator'][].env[].valueFrom\") | first }}"
To answer your other question: yes, you'll need the first filter. As mentioned, JMESPath will always return a list, so if you just want the value of a key you'll need to pull it out.
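For example, to reduce the result all the way to a plain string (a sketch building on the task above; my_fieldpath is a made-up variable name), you can use Jinja2 attribute access on the dict returned by first:

- name: "set test fact"
  set_fact:
    my_fieldpath: "{{ (yaml_file_variable | json_query(\"resources[?metadata.name=='test-cluster-operator'].spec.template.spec | [].containers[?name=='test-cluster-operator'][].env[].valueFrom\") | first).fieldRef.fieldPath }}"

With the document above, my_fieldpath would be the string metadata.namespace.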