pre-commit-hooks: YAML validation fails when merging several anchors - yaml

(Side note: this is a follow-up to https://sourceforge.net/p/ruamel-yaml/tickets/313/)
I'm building a GitLab CI pipeline by defining a .gitlab-ci.yml file, see https://docs.gitlab.com/ee/ci/yaml/.
As my CI consists of several very similar build steps, I'm using YAML anchors quite heavily, for example to define a common cache and before-script. I saw that "the correct way" of merging several YAML anchors, according to the spec, is:
before-script: &before-script
  ...
cache: &cache
  ...
ci-step:
  image: ABC
  <<: [*before-script, *cache]
  script: ...
However, the following also works fine with GitLab CI and IMHO is much nicer to read:
...
ci-step:
  image: abc
  <<: *before-script
  script: ...
  <<: *cache
This also makes it possible to put different merge keys at different positions.
All is fine so far, because it is working in GitLab CI.
Now we are using https://github.com/pre-commit/pre-commit-hooks to validate YAML files in our repository. pre-commit-hooks uses ruamel.yaml internally for YAML validation.
As a result, the pre-commit hook fails with the following error message:
while constructing a mapping
  in ".gitlab-ci.yml", line xx, column y
found duplicate key "<<"
  in ".gitlab-ci.yml", line zz, column y
How can I prevent this exception from happening in the ruamel.yaml library when the duplicated key is <<?
It would also be possible to update pre-commit-hooks to set allow_duplicate_keys = True (see yaml-duplicate-keys). But that would also allow other duplicate keys, which is not ideal.

The normal way to prevent duplicate keys from throwing an error is by setting .allow_duplicate_keys, as you indicated. If you set that, the values for duplicate keys 'later' in the mapping overwrite previous values. In PyYAML, from which ruamel.yaml was derived, this behaviour is the side effect of a bug.
However, duplicating << is IMO more problematic, as
<<: *a
<<: *b
is undefined and might be expected to work as if the YAML document contained:
<<: [*a, *b]
or contained:
<<: [*b, *a]
or only:
<<: *b
or:
<<: *a
And depending on which key-value pairs a and b refer to, these can all produce different outcomes for the mapping in which the merge is applied.
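You can see the divergence with a loader that follows the merge-key spec, where entries earlier in the merge sequence take precedence; a minimal sketch using ruamel.yaml's safe loader:
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load("""\
a: &a {x: 1, y: 2}
b: &b {x: 3, z: 4}
ab: {<<: [*a, *b]}
ba: {<<: [*b, *a]}
""")
# entries earlier in the merge list win for duplicate keys
print(data['ab']['x'])  # 1 (from a)
print(data['ba']['x'])  # 3 (from b)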
To prevent the error from being thrown on merge keys only, you need to adapt the loader. But make sure you don't try to use or dump the result: garbage in means garbage out.
import ruamel.yaml

yaml_str = """\
before-script: &before-script
  x: 1
cache: &cache
  y: 2
ci-step:
  image: ABC
  <<: *before-script
  script: DEF
  <<: *cache
"""

class MyConstructor(ruamel.yaml.SafeConstructor):
    def flatten_mapping(self, node):
        # Instead of merging, drop every merge key so the duplicate-key
        # check never sees '<<'. Only advance the index when nothing was
        # deleted; otherwise the entry following a merge key is skipped.
        index = 0
        while index < len(node.value):
            key_node, value_node = node.value[index]
            if key_node.tag == 'tag:yaml.org,2002:merge':
                del node.value[index]
            else:
                index += 1

yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MyConstructor
data = yaml.load(yaml_str)
print(list(data['ci-step'].keys()))
which gives:
['image', 'script']
You should complain to GitLab that it allows invalid YAML, which is especially bad because it has no defined loading behaviour. And if they insist on continuing to support this kind of invalid YAML, they should tell you what it means for the mapping in which it occurs.

Related

(bitbucket-pipelines.)yml file has weird indentation?

Why is every section in this file indented by 2 except the step?
image: atlassian/default-image:2
pipelines:
  default:
    - step:
        deployment: production
        script:
          - git submodule update --recursive --init
          - zip -r $FILENAME . -x bitbucket-pipelines.yml *.git*
          - pipe: atlassian/bitbucket-upload-file:0.1.6
            variables:
              BITBUCKET_USERNAME: $BITBUCKET_USERNAME
              BITBUCKET_APP_PASSWORD: $BITBUCKET_APP_PASSWORD
              FILENAME: $FILENAME
And it gives an error if changed to two?
https://bitbucket-pipelines.prod.public.atl-paas.net/validator
This topic seems to say it's because YAML does not consider - to be the first character? But after pipe there is the same kind of sequence with only two?
https://community.atlassian.com/t5/Bitbucket-questions/Is-it-intentional-that-bitbucket-pipelinese-yml-indentation/qaq-p/582084
You will usually see two things in YAML:
Sequences — in other programming languages, these are commonly referred to as arrays, lists, ...:
- apple
- banana
- pear
Mappings — in other programming languages, these are commonly referred to as objects, hashes, dictionaries, associative arrays, ...:
question: how do I foobar?
body: Can you help?
tag: yaml
So if you want to store a mapping of mappings you would do:
pier1:
  boat: yes
  in_activity: yes
  comments: needs some painting in 2023
  ## The mapping above this comment refers to pier1,
  ## as it is indented one level below the key `pier1`
pier2:
  boat: no
  in_activity: no
  comments: currently inactive, needs urgent repair
  ## The mapping above this comment refers to pier2,
  ## as it is indented one level below the key `pier2`
While, if you store a list in a mapping, you could go without the indent; indeed:
fruits:
- apple
- banana
- pear
is strictly equal to:
fruits:
  - apple
  - banana
  - pear
But, if you are trying to indent the first step of your pipeline only, like so:
pipelines:
  default:
    - step:
      deployment: production
You end up with valid YAML, but a YAML document that no longer has the same meaning. Your original YAML means: I have a default pipeline with a list of actions, the first action being a step that consists of a deployment to production. The incorrectly indented YAML means: I have a default pipeline with a list of actions, the first action having two properties, the first property being an empty step, and the second property saying it is a deployment to production. So, here, the deployment key that was a property of the step mapping has become a property of the first element of the list default!
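You can check this by loading both snippets with a parser (a minimal sketch, assuming Python with PyYAML installed):
import yaml  # PyYAML

correct = yaml.safe_load("""
default:
- step:
    deployment: production
""")
wrong = yaml.safe_load("""
default:
- step:
  deployment: production
""")
print(correct)  # {'default': [{'step': {'deployment': 'production'}}]}
print(wrong)    # {'default': [{'step': None, 'deployment': 'production'}]}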
To indent all this as you would like, you will have to go:
pipelines:
  default:
    - step:
        deployment: production
        ## ^-- This property is now further indented too
So, you end up with the YAML:
image: atlassian/default-image:2
pipelines:
  default:
    - step:
        deployment: production
        script:
          - git submodule update --recursive --init
          - zip -r $FILENAME . -x bitbucket-pipelines.yml *.git*
          - pipe: atlassian/bitbucket-upload-file:0.1.6
            variables:
              BITBUCKET_USERNAME: $BITBUCKET_USERNAME
              BITBUCKET_APP_PASSWORD: $BITBUCKET_APP_PASSWORD
              FILENAME: $FILENAME

How to specify a condition for a loop in yaml pipelines

I'm trying to download multiple artifacts onto different servers (like web, db) using environments. Currently I have added the DownloadPipelineArtifact@2 task in a separate file and am using a template to add that task to azure-pipelines.yml. As I have multiple artifacts, I'm trying to use a for loop, where I'm running into issues.
#azure-pipelines.yml
- template: artifacts-download.yml
  parameters:
    pipeline:
      - pipeline1
      - pipeline2
      - pipeline3
    path:
      - path1
      - path2
      - path3
I need to write a loop in YAML so that it downloads the pipeline1 artifacts to path1, and so on. Can someone please help?
Object-type parameters are your friend. They are incredibly powerful. As qBasicBoy answered below, you'll want to make sure that you group the multiple properties together. If you're finding that you have a high number of properties per object, though, you can use a multi-line equivalent.
The following is an equivalent parameter structure to what qBasicBoy posted:
parameters:
- name: pipelines
  type: object
  default:
    - Name: pipeline1
      Path: path1
    - Name: pipeline2
      Path: path2
    - Name: pipeline3
      Path: path3
An example where you can stack many properties onto a single object is as follows:
parameters:
- name: big_honkin_object
  type: object
  default:
    config:
      - appA: this
        appB: is
        appC: a
        appD: really
        appE: long
        appF: set
        appG: of
        appH: properties
      - appA: and
        appB: here
        appC: we
        appD: go
        appE: again
        appF: making
        appG: more
        appH: properties
    settings:
      startuptype: service
      recovery: no
You can, in essence, create an entire dumping ground for everything that you want to do by sticking it in one single object structure and properly segmenting everything. Sure, you could have had "startuptype" and "recovery" as separate string parameters with defaults of "service" and "no" respectively, but this way, we can pass a single large parameter from a high level pipeline to a called template, rather than passing a huge list of parameters AND defining said parameters in the template yaml scripts (remember, that's necessary!).
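For instance, the calling pipeline can then hand the whole object down to a template in one line (a sketch; the template file name deploy-apps.yml is hypothetical):
- template: deploy-apps.yml
  parameters:
    big_honkin_object: ${{ parameters.big_honkin_object }}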
If you then want to access JUST a single setting, you can do something along the lines of:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      # Write your PowerShell commands here.
      Write-Host "Apps start as a "${{ parameters.big_honkin_object.settings.startuptype }}
      Write-Host "Do the applications recover? "${{ parameters.big_honkin_object.settings.recovery }}
This will give you the following output:
Apps start as a service
Do the applications recover? no
YAML and Azure Pipelines are incredibly powerful tools. I can't recommend enough going through the entire contents of learn.microsoft.com on the subject. You'll spend a couple hours there, but you'll come out the other end with an incredible knowledge of how these pipelines can be tailored to do everything you could ever NOT want to do yourself!
Notable links that helped me a TON (only learned this a couple months ago):
How to work with the YAML language in Pipelines
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema
How to compose expressions (also contains useful functions like convertToJSON for your object parameters!)
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/expressions?view=azure-devops
How to create variables (separate from parameters, still useful)
https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables?view=azure-devops&tabs=yaml
SLEEPER ALERT!!! Templates are HUGELY helpful!!!
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops
You could use an object with multiple properties:
parameters:
- name: pipelines
  type: object
  default:
    - { Name: pipeline1, Path: path1 }
    - { Name: pipeline2, Path: path2 }
    - { Name: pipeline3, Path: path3 }

steps:
- ${{ each pipeline in parameters.pipelines }}:
  # use pipeline.Name or pipeline.Path
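Putting this together, artifacts-download.yml could look roughly like the sketch below; the DownloadPipelineArtifact@2 inputs assume each artifact is named after its pipeline, so adjust them to your setup:
# artifacts-download.yml
parameters:
- name: pipelines
  type: object
  default: []

steps:
- ${{ each pipeline in parameters.pipelines }}:
  - task: DownloadPipelineArtifact@2
    displayName: Download ${{ pipeline.Name }} to ${{ pipeline.Path }}
    inputs:
      artifact: ${{ pipeline.Name }}
      path: ${{ pipeline.Path }}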

Ruby, parsing YAML and outputting value

I'm pretty new to Ruby and all the documentation on this subject has confused me a bit, so here goes. I'm using InSpec to test my infrastructure, and I want it to consume some variables from the YAML file used by Ansible. This effectively means I can share vars from Ansible code and use them in Ruby.
The YAML file looks like this:
- name: Converge
  hosts: all
  vars:
    elasticsearch_config:
      cluster.name: "{{ elasticsearch_cluster_name }}"
      node.name: "es-test-node"
      path.data: "/var/lib/elasticsearch"
      path.logs: "/var/log/elasticsearch"
    elasticsearch_cluster_name: test
  pre_tasks:
  roles:
    - elasticsearch
  post_tasks:
At this point, I'm just playing around with Ruby code to extract that, and have:
require 'yaml'

parsed = begin
  YAML.load(File.open("../../playbook.yml"))
rescue ArgumentError => e
  puts "Could not parse YAML: #{e.message}"
end
puts parsed
Which outputs the hash:
{"name"=>"Converge", "hosts"=>"all", "vars"=>{"elasticsearch_config"=>{"cluster.name"=>"{{ elasticsearch_cluster_name }}", "node.name"=>"es-test-node", "path.data"=>"/var/lib/elasticsearch", "path.logs"=>"/var/log/elasticsearch"}, "elasticsearch_cluster_name"=>"test"}, "pre_tasks"=>nil, "roles"=>["elasticsearch"], "post_tasks"=>nil}
So far so good. This all makes sense to me. Now, I would like to pull values out of this data and use them in the Ruby code, referencing them by the keys. So, if I wanted to get the value of vars.elasticsearch_config.node.name, how would I go about doing this?
YAML.load reads the document into an array, so you must get the first element in your example:
loaded_yaml[0]["vars"]["elasticsearch_config"]["node.name"]
The reason for this is that the document you are parsing begins with a single dash, indicating a list item. Even though there is only one item in the list, Psych (the YAML engine) still places it into an array representing a list. This is also why you got a no implicit conversion of String into Integer error. Note that the response you get is enclosed in square brackets:
=> [{"name"=>"Converge", "hosts"=>"all", "vars"=>{"elasticsearch_config"=>{"cluster.name"=>"{{ elasticsearch_cluster_name }}", "node.name"=>"es-test-node", "path.data"=>"/var/lib/elasticsearch", "path.logs"=>"/var/log/elasticsearch"}, "elasticsearch_cluster_name"=>"test"}, "pre_tasks"=>nil, "roles"=>["elasticsearch"], "post_tasks"=>nil}]
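If you prefer not to chain brackets, dig walks the same path and returns nil instead of raising when an intermediate key is missing (a minimal sketch; Array#dig/Hash#dig require Ruby 2.3+):
require 'yaml'

parsed = YAML.load_file('../../playbook.yml')
puts parsed.dig(0, 'vars', 'elasticsearch_config', 'node.name')
# => es-test-node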

Ruby YAML/Psych dump custom comments

I have various lang files in YAML. Every time I update one file, I want to track all the new keys and write them, with the original text, into all the other lang files, delimited by ### NEW and ### END. The translator will then periodically fetch them and update the translations.
Example:
# it.yml
a: A
b: B
some:
  new: Key
# en.yml
a: A
b: B
After I run the program, en.yml should be:
a: A
b: B
### NEW
some:
  new: Key
### END
I've already done most of the work: I have updated the en object and tagged the new keys in a custom object, but I cannot understand how to tell Psych to print out ### NEW and ### END.
Any help will be appreciated!
You can find the code here: https://gist.github.com/Iazel/4f5c9fcd9b33c3ea994c
EDIT:
After A LOT of searching, it seems that libyaml doesn't support comments, hence I don't think what I want to do is possible... not without extending libyaml, at least.
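One workaround, since the markers only ever wrap whole top-level keys, is to dump the new subtree separately and write the markers as plain text around it (a sketch; new_keys stands in for whatever your diff collects):
require 'yaml'

new_keys = { 'some' => { 'new' => 'Key' } }  # keys missing from en.yml

File.open('en.yml', 'a') do |f|
  f.puts '### NEW'
  # to_yaml prefixes a "---" document marker; strip it so en.yml
  # remains a single YAML document
  f.puts new_keys.to_yaml.sub(/\A---\n/, '')
  f.puts '### END'
end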

Updating YML on a production server

I am using a YML file to store trivial data.
I can create yml:
File.open("data.yml", "w") do |yaml|
  yaml.write(@some_hash.to_yaml)
end
And open yml:
path = File.expand_path(File.dirname(__FILE__))
@trivial_data = YAML.load_file("#{path}/../../../config/data.yml")
But I don't know how to update a file. Say I want to add another row:
4:
  agent_id: 332
  last: Wade
  first: Jason
  suffix: Sr
  rep_number: 2
How do I open and update the YAML file? And is this a good idea on a production server?
Combine what you have and that's what you should do:
path = File.expand_path(File.dirname(__FILE__))
trivial_data = YAML.load_file("#{path}/../../../config/data.yml")
# ... manipulate data ...
File.open("data.yml", "w") do |yaml|
  yaml.write(trivial_data.to_yaml)
end
You can't add something to a file without writing to it. YAML is a serialization language, and it doesn't make much sense to try to manipulate it directly. There is no simpler way (that I know of) that isn't horribly prone to errors.
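For the row in the question, the "manipulate data" step could look like this (a sketch; note that top-level keys such as 4 load as Integers, not Strings):
trivial_data[4] = {
  'agent_id'   => 332,
  'last'       => 'Wade',
  'first'      => 'Jason',
  'suffix'     => 'Sr',
  'rep_number' => 2,
}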
