yq command is not writing contents to the file persistently - yaml

I am using the yq tool to write data to a YAML file, but I am unable to write the data persistently. When I execute the commands below, the console shows the output I expect, but only the first section is written to the file. Any help is greatly appreciated.
yq version: 3.4.1
Commands:
yq n affinity-controller.fresh_install "False" > history.yaml
yq w -d'*' history.yaml snapshot-validation-webhook.fresh_install "False"
Contents of history.yaml after executing the above commands:
affinity-controller:
  fresh_install: False
Expected output:
affinity-controller:
  fresh_install: False
snapshot-validation-webhook:
  fresh_install: False

For yq v3, according to the docs, you should use the -i (write in-place) flag:
yq w -i history.yaml snapshot-validation-webhook.fresh_install "False"
For yq v4 (please note the leading dot):
yq e '.snapshot-validation-webhook.fresh_install = "False"' -i history.yaml
Tested and verified on localhost.
Snap users: Please note that there's a bug in yq:4.30.7
https://github.com/mikefarah/yq/issues/1521
Symptom: You'll get the following error when running yq:
Error: chown /tmp/temp1636774104: operation not permitted
Downgrade yq to the last working version (4.30.3): sudo snap refresh yq --channel=v4/stable
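The root cause is that yq w without -i only prints the result to stdout; the file on disk is untouched. The same stdout-versus-in-place distinction can be sketched with sed (used here purely as a stand-in, since it has the same kind of -i flag; GNU sed and a throwaway demo.yaml assumed):

```shell
# A filter without -i writes to stdout only; the file is unchanged.
printf 'fresh_install: x\n' > demo.yaml
sed 's/x/y/' demo.yaml            # edited text goes to stdout; demo.yaml unchanged
grep -q 'x' demo.yaml && echo "file unchanged"
sed -i 's/x/y/' demo.yaml         # -i applies the edit to the file itself
grep -q 'y' demo.yaml && echo "file updated"
```

The first yq n command in the question only worked because its stdout was explicitly redirected with >; the second command printed to the console and was never written back.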


Dynamically fetch values using env variable from yaml file using yq

I'm trying to get the values of tenants in the YAML file below using yq. The intent is to fetch the value dynamically based on an environment variable.
Let's assume there's an env variable var="az-dev"; then the tenants of az-dev should be retrieved.
I have made some attempts, shown below, but with no luck.
YAML file
tenantlist:
  az-dev:
    tenants:
      - myazdev
  az-stage:
    tenants:
      - myazstg1 myazstg2
  aw-dev:
    tenants:
      - myawdev1 myawdev2
  aw-stage:
    tenants:
      - myawstg1 myawstg2
Tries:
var="az-dev" yq e '.tenantlist | select(. == env(var))' file.txt
var=az-dev; yq '.tenantlist |= strenv(var) has("az-dev")' file.txt
Any help would be appreciated. TIA.
With mikefarah/yq, you can simply index the required key name with [..] and get the corresponding tenants list:
var="az-dev" yq '.tenantlist[strenv(var)].tenants[]' yaml
Or just pick the keys of interest from the map (available since v4.22.1)
var="az-dev" yq '.tenantlist | pick([strenv(var)]) | .[].tenants[]' yaml
Note: Since 4.18.1, yq's eval/e command is the default command and no longer needs to be specified.
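If yq isn't at hand to try this, the same dynamic-key lookup can be sketched with a python3 heredoc standing in for strenv(var); the data mirrors the question's YAML, and python3 here is only an illustration of what the yq expression does:

```shell
# The env variable picks the key, then the tenants list under it is printed,
# mirroring: yq '.tenantlist[strenv(var)].tenants[]'
var="az-dev" python3 - <<'EOF'
import os
tenantlist = {"az-dev":   {"tenants": ["myazdev"]},
              "az-stage": {"tenants": ["myazstg1 myazstg2"]}}
key = os.environ["var"]            # yq: strenv(var)
for tenant in tenantlist[key]["tenants"]:
    print(tenant)                  # prints: myazdev
EOF
```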

Extract the lines using sed or awk and save them in file

Dear Stackoverflow Community,
I am trying to extract some lines from a file and save them.
kubeadm init prints two kubeadm join commands.
I want to extract the first one and save it to a file, and similarly extract the second one and save it to another file.
Below is the text that I am trying to extract from the file:
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 10.0.0.0:6443 --token jh88qi.uch1l58ri160bve1 \
--discovery-token-ca-cert-hash sha256:f9c9ab441d913fec7d157c20f1c5e93c496123456ac4ec14ca8e02ab7f916d7fb \
--control-plane --certificate-key 179e288571e33d3d68f5691b6d8e7cefa4657550fc0886856a52e2431hjkl7155
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.0.0:6443 --token jh88qi.uch1l58ri160bve1 \
--discovery-token-ca-cert-hash sha256:f9c9ab441d913fec7d157c20f1c5e93c496123456ac4ec14ca8e02ab7f916d7fb
Goal -
Extract both kubeadm join commands and save them in different files for automation.
Commands used so far:
sed -ne '/--control-plane --certificate-key/p' token
With the above command, I want to extract value if I can and save it in a file.
The other command:
awk '/kubeadm join/{x=NR+2}(NR<=x){print}' token
(token is the filename.)
You didn't show the expected output, so it's a bit of a guess, but this:
awk -v RS= '/^ *kubeadm join/{print > ("out"NR); close("out"NR)}' file
should do what I think you want given the input you provided.
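To see why this works: -v RS= switches awk to paragraph mode, so each blank-line-separated block is one record, NR numbers those records, and each matching record is written to a file named out<NR>. A self-contained sketch with placeholder token and key values standing in for the real ones:

```shell
# Build a miniature version of the kubeadm init output (values are placeholders).
cat > token <<'EOF'
You can now join any number of the control-plane node:

  kubeadm join 10.0.0.0:6443 --token PLACEHOLDER \
    --control-plane --certificate-key PLACEHOLDER

Then you can join any number of worker nodes:

  kubeadm join 10.0.0.0:6443 --token PLACEHOLDER
EOF
# Paragraph mode: each blank-line-separated block is one record.
awk -v RS= '/^ *kubeadm join/{print > ("out"NR); close("out"NR)}' token
ls out*                 # the two join commands land in out2 and out4
```

The non-matching paragraphs (records 1 and 3) are simply skipped, so only the two join commands produce files.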

Snakemake on cluster error: 'Wildcards' object has no attribute 'output'

I'm running into the error 'Wildcards' object has no attribute 'output' (similar to this earlier question: 'Wildcards' object has no attribute 'output') when I submit Snakemake to my cluster. I'm wondering if you have any suggestions for making this compatible with the cluster?
While my rule annotate_snps works when I test it locally, I get the following error on the cluster:
input: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk.vcf.gz
output: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_rename.vcf.gz, results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_tmp.vcf.gz, results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_ann.vcf.gz
log: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_annotate_snps.log
jobid: 1139
wildcards: samp=CI226380_S4, mapper=bwa, ref=H37Rv
WorkflowError in line 173 of /oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile:
'Wildcards' object has no attribute 'output'
My rule is defined as:
rule annotate_snps:
    input:
        vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_gatk.vcf.gz'
    log:
        'results/{samp}/vars/{samp}_{mapper}_{ref}_annotate_snps.log'
    output:
        rename_vcf=temp('results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_rename.vcf.gz'),
        tmp_vcf=temp('results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_tmp.vcf.gz'),
        ann_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_ann.vcf.gz'
    params:
        bed=config['bed_path'],
        vcf_header=config['vcf_header']
    shell:
        '''
        # Rename Chromosome to be consistent with snpEff/Ensembl genomes.
        zcat {input.vcf} | sed 's/NC_000962.3/Chromosome/g' | bgzip > {output.rename_vcf}
        tabix {output.rename_vcf}
        # Run snpEff
        java -jar -Xmx8g {config[snpeff]} eff {config[snpeff_db]} {output.rename_vcf} -dataDir {config[snpeff_datapath]} -noStats -no-downstream -no-upstream -canon > {output.tmp_vcf}
        # Also use bed file to annotate vcf
        bcftools annotate -a {params.bed} -h {params.vcf_header} -c CHROM,FROM,TO,FORMAT/PPE {output.tmp_vcf} > {output.ann_vcf}
        '''
Thank you so much in advance!
The rule definition itself looks consistent, apart from the multiple references to the contents of config, e.g. config[snpeff].
One thing to check is whether the config on the single machine and on the cluster is the same; if it isn't, some content may be confusing Snakemake, e.g. if somehow config[snpeff] == "wildcards.output" (or something similar).
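For what it's worth, the error text itself comes from a Python attribute lookup during the format pass Snakemake runs over the shell string, so a stray {wildcards.output} anywhere in the expanded command reproduces it exactly. A minimal stand-in using a python3 heredoc (the Wildcards class here is just a mock, not Snakemake's):

```shell
python3 - <<'EOF'
# Mock of Snakemake formatting a shell command: if the formatted string
# contains {wildcards.output}, attribute lookup fails with the exact
# message from the question.
class Wildcards:
    samp = "CI226380_S4"

try:
    "echo {wildcards.output}".format(wildcards=Wildcards())
except AttributeError as err:
    print(err)        # 'Wildcards' object has no attribute 'output'
EOF
```

So if the cluster-side config injects a string containing such a placeholder, that would explain why the rule only fails there.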

Passing a string containing curly brackets to helm caused: Error: failed parsing --set data: key

${CLIENT_ID} and ${CLIENT_SECRET} are both sourced from a YAML-based properties file, e.g.:
CLIENT_ID: 11111111-1111-1111-1111-111111111111
CLIENT_SECRET: '{aes}AM+JYP8t9ga1m6s111x1fjdfePL10v90RmbgWFdOjVdD/wlnszAbJad8aOI4qqMv6eSGaW2nfTF4PG2OYH+rx9K052TXNP6PGAAcRph9pl11'
using:
PROPERTIES_FILE="properties.yaml"
CLIENT_ID=$(yq r "${PROPERTIES_FILE}" CLIENT_ID)
CLIENT_SECRET=$(yq r "${PROPERTIES_FILE}" CLIENT_SECRET)
and then passed into my helm command for deploying my app:
echo ${CLIENT_ID}
# 11111111-1111-1111-1111-111111111111
echo ${CLIENT_SECRET}
# {aes}AM+JYP8t9ga1m6s111x1fjdfePL10v90RmbgWFdOjVdD/wlnszAbJad8aOI4qqMv6eSGaW2nfTF4PG2OYH+rx9K052TXNP6PGAAcRph9pl11
helm upgrade -i --debug --namespace mynamespace release \
-f "charts/app/values.yaml" \
--set "app.configmap.dependancy.client_id=${CLIENT_ID}" \
--set "app.configmap.dependancy.client_secret=${CLIENT_SECRET}" \
"charts/app/"
charts/app/values.yaml contains:
app:
  ..
  configmap:
    dependancy:
      client_id: ""
      client_secret: ""
The problem is, I get this error when running the helm command:
Error: failed parsing --set-string data: key "AM+JYP8t9ga1m6s111x1fjdfePL10v90RmbgWFdOjVdD/wlnszAbJad8aOI4qqMv6eSGaW2nfTF4PG2OYH+rx9K052TXNP6PGAAcRph9pl11" has no value
No resources found.
Any idea why the prefixed {aes} is causing problems when passed into helm like this? The command works if I remove the {aes} prefix.
Helm tries to make it possible to pass structured data in using --set, and that's what's tripping you up here. In particular, from the docs:
Lists can be expressed by enclosing values in { and }. For example, --set name={a, b, c} translates to name: [a, b, c].
So if you --set 'key={aes}AM+JYP8...', the {aes} part looks like that list syntax, but then there's content after it that Helm doesn't understand.
You can backslash-escape the curly braces, though that's a little tricky to do in shell syntax. There's also a --set-string option (documented in the Helm 2 documentation but still present in Helm 3) that might be able to do this. Possibly the most straightforward path, though, is to write out your own YAML values file:
#!/bin/sh
PROPERTIES_FILE="properties.yaml"
CLIENT_ID=$(yq r "${PROPERTIES_FILE}" CLIENT_ID)
CLIENT_SECRET=$(yq r "${PROPERTIES_FILE}" CLIENT_SECRET)
cat >values.tmp.yaml <<EOF
app:
  configmap:
    dependancy:
      client_id: "${CLIENT_ID}"
      client_secret: "${CLIENT_SECRET}"
EOF
helm upgrade -i --debug --namespace mynamespace release \
-f values.tmp.yaml charts/app
(You can pass multiple -f options to helm upgrade if you need to. The chart's values.yaml is read automatically and is overridden by command-line options.)
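A quick sanity check of the values-file route, with placeholder credentials standing in for the ones read from properties.yaml: the braces need no escaping because YAML's double quotes protect them.

```shell
# Placeholder values standing in for the real CLIENT_ID / CLIENT_SECRET.
CLIENT_ID="11111111-1111-1111-1111-111111111111"
CLIENT_SECRET='{aes}PLACEHOLDERSECRET'
cat >values.tmp.yaml <<EOF
app:
  configmap:
    dependancy:
      client_id: "${CLIENT_ID}"
      client_secret: "${CLIENT_SECRET}"
EOF
grep 'client_secret' values.tmp.yaml   # the {aes} prefix survives intact
```

Because the value never passes through Helm's --set parser, the {aes} prefix is plain string data by the time Helm reads it.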

Snakemake conda env parameter is not taken from config.yaml file

I use a conda env that I create manually, not automatically using Snakemake. I do this to keep tighter version control.
Anyway, in my config.yaml I have the following line:
conda_env: '/rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake'
Then, at the start of my Snakefile, I read that variable (reading variables from config directly in your shell section does not seem to work, am I right?):
conda_env = config['conda_env']
Then in a shell section I reference said parameter like this:
rule rsem_quantify:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
    output:
        os.path.join(analyzed_dir, '{sample}.genes.results'),
        os.path.join(analyzed_dir, '{sample}.STAR.genome.bam')
    threads: 8
    shell:
        '''
        #!/bin/bash
        source activate {conda_env}
        rsem-calculate-expression \
            --paired-end \
            {input} \
            {rsem_ref_base} \
            {analyzed_dir}/{wildcards.sample} \
            --strandedness reverse \
            --num-threads {threads} \
            --star \
            --star-gzipped-read-file \
            --star-output-genome-bam
        '''
Notice the {conda_env}. Now this gives me the following error:
Could not find conda environment: None
You can list all discoverable environments with `conda info --envs`.
Now, if I change {conda_env} to its value directly, /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake, it does work! I don't have any trouble reading other parameters with this method (like rsem_ref_base and analyzed_dir in the example rule above).
What could be wrong here?
Highest regards,
Freek.
The pattern I use is to load variables into params, so something along the lines of:
rule rsem_quantify:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
    output:
        os.path.join(analyzed_dir, '{sample}.genes.results'),
        os.path.join(analyzed_dir, '{sample}.STAR.genome.bam')
    params:
        conda_env=config['conda_env']
    threads: 8
    shell:
        '''
        #!/bin/bash
        source activate {params.conda_env}
        rsem-calculate-expression \
        ...
        '''
That said, I'd never do this with a conda environment, because Snakemake has conda environment management built in. See the section on Integrated Package Management in the docs for details. It makes reproducibility much more manageable.
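For reference, the integrated route the docs describe looks roughly like this (the env file name and its contents are hypothetical); running snakemake with --use-conda then creates and activates the environment automatically:

```python
# Hypothetical sketch of Snakemake's built-in conda integration;
# "envs/rsem.yaml" is an illustrative env file holding pinned versions.
rule rsem_quantify:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz')
    output:
        os.path.join(analyzed_dir, '{sample}.genes.results')
    conda:
        "envs/rsem.yaml"
    threads: 8
    shell:
        'rsem-calculate-expression --paired-end {input} ...'
```

Version pinning then lives in the env file under version control, which gives the same tight control the question is after.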