YAML How many spaces per indent? - ruby

Is there any difference if I use one, two, or four spaces per indent level in YAML?
Are there specific rules for the number of spaces per structure type?
For example, four spaces for nested maps, one space per list item, etc.?
I am writing a YAML configuration file for Elastic Beanstalk .ebextensions and I am having a really hard time constructing it correctly. Although my YAML passes a YAML validator, Elastic Beanstalk seems to understand a different structure.

There is no requirement in YAML to indent any concrete number of spaces. There is also no requirement to be consistent. So for example, this is valid YAML:
a:
 b:
     - c
     -  d
     - e
f:
    "ghi"
Some rules might be of interest:
Flow content (i.e. everything that starts with { or [) can span multiple lines, but must be indented at least as many spaces as the surrounding current block level.
Block list items can (but don't need to) have the same indentation as the surrounding block level because - is considered part of the indentation:
a: # top-level key
- b # value of that key, which is a list
- c
c: # next top-level key
 d # non-list value which must be more indented
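As a rough check of the points above, here is a small Ruby sketch using Psych (Ruby's bundled YAML library). The three documents and the variable names are made up for illustration; they differ only in indentation width and should all load to the same structure.

```ruby
require "yaml"

# Hypothetical documents: same structure, indented with 1, 2 and 4 spaces.
variants = {
  1 => "a:\n b:\n  - c\n  - d\nf:\n \"ghi\"\n",
  2 => "a:\n  b:\n    - c\n    - d\nf:\n  \"ghi\"\n",
  4 => "a:\n    b:\n        - c\n        - d\nf:\n    \"ghi\"\n",
}

# Load every variant and compare the resulting Ruby objects.
loaded = variants.transform_values { |text| YAML.safe_load(text) }
same_structure = loaded.values.uniq.size == 1

puts "all variants load identically: #{same_structure}"
```

If a tool such as Elastic Beanstalk reads the file differently, the indentation width itself is unlikely to be the cause; it is more likely that a key ended up at a different nesting level than intended.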

The YAML spec for v 1.2 merely says that
In YAML block styles, structure is determined by indentation. In general, indentation is defined as a zero or more space characters at the start of a line.
To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.
The amount of indentation is a presentation detail and must not be used to convey content information.
So you can set the indent depth to your preference, as long as you use spaces and not tabs. Interestingly, IntelliJ uses 2 spaces by default.
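The "spaces, not tabs" rule can be observed directly. This is a minimal sketch assuming Ruby's Psych; the two tiny documents are invented for the example. The space-indented one loads, while the tab-indented one is expected to be rejected with a Psych::SyntaxError.

```ruby
require "yaml"

# Hypothetical documents: one indents with spaces, the other with a tab.
with_spaces = "parent:\n  child: 1\n"
with_tab    = "parent:\n\tchild: 1\n"

spaces_result = YAML.safe_load(with_spaces)

# Capture the parser error instead of letting it abort the script.
tab_error = begin
  YAML.safe_load(with_tab)
  nil
rescue Psych::SyntaxError => e
  e
end

p spaces_result
puts "tab-indented document rejected: #{!tab_error.nil?}"
```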

INDENTATION
The suggested syntax for YAML files is to use 2 spaces for indentation, but YAML will follow whatever indentation system that the individual file uses. Indentation of two spaces works very well for SLS files given the fact that the data is uniform and not deeply nested.
NESTED DICTIONARIES
When dictionaries are nested within other data structures (particularly lists), the indentation logic sometimes changes. Examples of where this might happen include context and default options from the file.managed state:
/etc/http/conf/http.conf:
  file:
    - managed
    - source: salt://apache/http.conf
    - user: root
    - group: root
    - mode: 644
    - template: jinja
    - context:
        custom_var: "override"
    - defaults:
        custom_var: "default value"
        other_var: 123
Notice that while the indentation is two spaces per level, for the values under the context and defaults options there is a four-space indent. If only two spaces are used to indent, then those keys will be considered part of the same dictionary that contains the context key, and so the data will not be loaded correctly. If using a double indent is not desirable, then a deeply-nested dict can be declared with curly braces:
/etc/http/conf/http.conf:
  file:
    - managed
    - source: salt://apache/http.conf
    - user: root
    - group: root
    - mode: 644
    - template: jinja
    - context: {
      custom_var: "override" }
    - defaults: {
      custom_var: "default value",
      other_var: 123 }
you can read more from this link
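To see why the extra indent under a list item matters, here is a small sketch with Ruby's Psych. The excerpts are shortened, hypothetical versions of the state above, not Salt code. With a two-space indent the key lands next to context in the same dictionary; with four spaces, or with curly braces, it becomes the value of context.

```ruby
require "yaml"

# Hypothetical excerpts of a list item whose value should be a nested dict.
two_space  = "- context:\n  custom_var: override\n"
four_space = "- context:\n    custom_var: override\n"
flow_style = "- context: {\n      custom_var: override }\n"

shallow = YAML.safe_load(two_space)   # custom_var becomes a sibling of context
nested  = YAML.safe_load(four_space)  # custom_var is nested under context
braces  = YAML.safe_load(flow_style)  # flow mapping, same result as `nested`

p shallow
p nested
puts "braces match the four-space form: #{braces == nested}"
```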

Related

Helm-Charts(yaml): Regex expression broken

I am working with https://github.com/prometheus-community/helm-charts and am running into some issues with a couple of regex queries that are part of our basic YAML deployments. The issue I'm having is specifically with the Node exporter part of the prometheus chart. I have configured this:
nodeExporter:
  extraArgs: {
    collector.filesystem.ignored-fs-types="^(devpts|devtmpfs|mqueue|proc|securityfs|binfmt_misc|debugfs|overlay|pstore|selinuxfs|tmpfs|hugetlbfs|nfsd|cgroup|configfs|rpc_pipefs|sysfs|autofs|rootfs)$",
    collector.filesystem.ignored-mount-points="^/etc/.+$",
    collector.netstat.fields="*",
    collector.diskstats.ignored-devices="^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p|dm-)\d+$", # BROKEN
    collector.netclass.ignored-devices=^(?:tun|kube|veth|dummy|docker).+$, # BROKEN
    collector.nfs
  }
  tolerations:
    - operator: Exists
As noted above, these two lines with regex are broken:
collector.diskstats.ignored-devices="^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p|dm-)\d+$", # BROKEN
collector.netclass.ignored-devices=^(?:tun|kube|veth|dummy|docker).+$, # BROKEN
There seems to be a problem with the | character right before "nvme" in the first one, and with the ?: in the second. I believe it has something to do with the regex/YAML format, but I'm not sure how to correct this.
With {, you are beginning a YAML flow mapping. It typically contains comma-separated key-value pairs, though you can also, like in this example, give single values instead, which will make them a key with null value.
In YAML, as soon as you enter a flow-style collection, all special flow-indicators cannot be used in plain scalars anymore. Special flow indicators are {}[],. A plain scalar is a non-quoted textual value.
The first broken value is illegal because it contains [ and ]. The second broken value is actually legal according to the specification, but quite a few YAML implementations choke on it because ? is also used as the indicator for a mapping key.
You have several options:
Quote the scalars. Since none of them contain single quotes, enclosing each in single quotes will do the trick. Generally you can also double-quote them, but then you need to escape all double-quote characters and all backslashes in there, which does not help readability.
nodeExporter:
  extraArgs: {
    collector.filesystem.ignored-fs-types="^(devpts|devtmpfs|mqueue|proc|securityfs|binfmt_misc|debugfs|overlay|pstore|selinuxfs|tmpfs|hugetlbfs|nfsd|cgroup|configfs|rpc_pipefs|sysfs|autofs|rootfs)$",
    collector.filesystem.ignored-mount-points="^/etc/.+$",
    collector.netstat.fields="*",
    'collector.diskstats.ignored-devices="^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p|dm-)\d+$"',
    'collector.netclass.ignored-devices=^(?:tun|kube|veth|dummy|docker).+$',
    collector.nfs
  }
  tolerations:
    - operator: Exists
Use block scalars. Block scalars are generally the best way to enter scalars with lots of special characters because they are ended via indentation and therefore can contain any special character. Block scalars can only occur in other block structures, so you'd need to make extraArgs a block mapping:
nodeExporter:
  extraArgs:
    ? collector.filesystem.ignored-fs-types="^(devpts|devtmpfs|mqueue|proc|securityfs|binfmt_misc|debugfs|overlay|pstore|selinuxfs|tmpfs|hugetlbfs|nfsd|cgroup|configfs|rpc_pipefs|sysfs|autofs|rootfs)$"
    ? collector.filesystem.ignored-mount-points="^/etc/.+$"
    ? collector.netstat.fields="*"
    ? |-
      collector.diskstats.ignored-devices="^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p|dm-)\d+$"
    ? |-
      collector.netclass.ignored-devices=^(?:tun|kube|veth|dummy|docker).+$
    ? collector.nfs
  tolerations:
    - operator: Exists
As you can see, this is now using the previously mentioned ? as key indicator.
Since it is a block mapping, you don't need the commas anymore.
|- starts a literal block scalar from which the final linebreak is stripped.
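Both options can be checked outside of Helm. The sketch below assumes Ruby's Psych and uses a shortened, hypothetical extraArgs with only two entries; how the chart later turns the keys into command-line flags is not covered. It loads the single-quoted flow form and the block form with ? and |- keys, and compares them.

```ruby
require "yaml"

# One of the regex arguments from the question, kept verbatim.
regex_arg = 'collector.netclass.ignored-devices=^(?:tun|kube|veth|dummy|docker).+$'

# Option 1: flow mapping with the problematic scalar single-quoted.
quoted_flow = "extraArgs: { '#{regex_arg}', collector.nfs }\n"

# Option 2: block mapping with explicit keys and a literal block scalar.
block_keys = "extraArgs:\n  ? |-\n    #{regex_arg}\n  ? collector.nfs\n"

from_flow  = YAML.safe_load(quoted_flow)
from_block = YAML.safe_load(block_keys)

# In both forms every entry is a key whose value is null (nil in Ruby).
p from_flow["extraArgs"].keys
puts "both forms load identically: #{from_flow == from_block}"
```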

Why isn't two-spaced YAML parsed like four-spaced YAML?

I'm seeing strange behavior when parsing YAML (using Ruby 2.5/Psych) created using two space indentations. The same file, indented with four spaces per line works -- to my mind -- as expected.
Two spaces:
windows:
  - shell:
    panes:
      - echo hello
results in the following hash:
{"windows"=>[{"shell"=>nil, "panes"=>["echo hello"]}]}
Whereas using four space indentations:
windows:
    - shell:
        panes:
            - echo hello
results in:
{"windows"=>[{"shell"=>{"panes"=>["echo hello"]}}]}
I just skimmed through the spec and didn't see anything relevant to this issue.
Is this behavior expected? If so, I'd greatly appreciate links to resources explaining why.
While Wayne's solution is correct, the explanation seems a bit off, so I'll throw in mine:
In YAML, the - for block sequence items (like ? and : for block mappings) is treated as indentation (spec):
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.
Moreover, all block collections (sequences and mappings) take their indentation from their first item (since there is no explicit starting indicator). So in the line - shell:, the - defines the indentation level of the newly started sequence, while at the same time, shell: defines the indentation level of the newly started mapping, which is the content of the sequence item. Note how the - is treated as indentation for defining the indentation level of the mapping.
Now, revisiting your first example:
windows:
  - shell:
    panes:
      - echo hello
panes: is on the same level as shell:. This means that YAML parses it as key of the mapping started by shell:, meaning that the key shell has an empty value. Mapping values of implicit keys, if not on the same line, must always be indented more than the corresponding mapping key (spec):
The block node’s properties may span across several lines. In this case, they must be indented by at least one more space than the block collection, regardless of the indentation of the block collection entries.
OTOH, in the second example:
windows:
    - shell:
        panes:
            - echo hello
panes: is on a deeper indentation level compared to shell:. This means that it is parsed as value of the key shell and thus starts a new, nested block mapping.
Finally, mind that since - is treated as part of the indentation, "indenting by two spaces" could also mean this:
windows:
- shell:
    panes:
    - echo hello
Note how the - are not more indented than their mapping keys. This works because the spec says:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
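The three layouts discussed above can be compared side by side. This is a minimal sketch with Psych; the variable names are invented, and the documents are the ones from the question plus the compact variant. Only the first layout should leave shell without a value.

```ruby
require "yaml"

# Layout 1: panes aligned with shell, so it is a sibling key.
two_space = "windows:\n  - shell:\n    panes:\n      - echo hello\n"

# Layout 2: panes indented deeper than shell, so it is the value of shell.
four_space = "windows:\n    - shell:\n        panes:\n            - echo hello\n"

# Layout 3: "-" at the same level as the parent key, panes still deeper than shell.
compact = "windows:\n- shell:\n    panes:\n    - echo hello\n"

flat    = YAML.safe_load(two_space)
nested  = YAML.safe_load(four_space)
compact_result = YAML.safe_load(compact)

p flat["windows"].first.keys    # two keys: shell and panes
p nested["windows"].first.keys  # one key: shell
puts "compact layout matches the nested one: #{compact_result == nested}"
```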
The trouble is that you cannot simply replace every two spaces with four spaces. That is because in this pair of lines:
  - shell:
    panes:
these two spaces in the second line:
    panes:
  ^^
are an abbreviation for the "- " in the line above. If the second line were not abbreviated, then the pair of lines would be:
  - shell:
  - panes:
So when doubling the indentation, the second of these lines should only have its first pair of spaces doubled, not the second. That would yield the correct indentation for the pair:
    - shell:
      panes:
So, if you only expand the first pair of spaces in the "panes:" line, you get:
windows:
    - shell:
      panes:
        - git status
Which correctly parses to the expected result.

Antlr4 handling of yaml unquoted multi-line strings

I am trying to build a parser for a limited set of YAML syntax similar to what is shown below using Antlr 4.7:
name:
    last: Smith
    first: John
address:
    street: 123 Main St
        Suite 100
    city: Boston
    state: MA
    zip: 12345
I have a grammar (derived from the Python 3 grammar) that works correctly if I put quotes around the "value" strings but fails if I remove them. It seems that defining the "value" string so matching terminates before the next "tag:" portion of a new block or a "tag: " portion of a new assign statement is the trick.
Does anyone have any ideas or working samples that handle this use case?
It is the indentation of a non-empty line that should end the matching of a plain scalar. If that indentation is not more than the indentation of the current mapping, the scalar ends there.
For example:
mapping:
  key: value with
    multiple lines
  key2:
    other value
Here, the value with multiple lines ends at the line with key2:, because it is not indented more than the current mapping (i.e. the value of mapping: above). Of course, the last newline character and the indentation of key2: is not a part of that scalar's content.
In the YAML specification, this is handled by a production
s-indent(n) ::= s-space × n
Now in our case, the inner mapping has an indentation of n=2, so your scalar would be matched by something like
plain-scalar-part (s-indent(3) s-white* plain-scalar-part)*
(I don't know Antlr syntax, just assume these are all non-terminals). After the (possibly empty) first line, you match an indentation of more than the parent mapping (so 3 spaces in this case), then there might be even more whitespace (which is not part of the content), and then more content follows. For simplicity, I ignored possible empty lines.
This will not match the line key2: because it has too little indentation, which is how the matching of the scalar will end.
Now I do not know how to do something like s-indent(n) in Antlr, but the Python grammar should give you the right pointers.
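Independent of Antlr, the termination rule can be sketched in a few lines of Ruby. read_plain_scalar is a hypothetical helper written for this example, not part of any library, and it simplifies real YAML by letting a blank line end the scalar. The result is cross-checked against Psych on the example document above.

```ruby
require "yaml"

# Hypothetical helper: joins the first line of a plain scalar with every
# following line that is indented deeper than the parent mapping.
def read_plain_scalar(first_part, following_lines, parent_indent)
  parts = [first_part.strip]
  following_lines.each do |line|
    indent = line[/\A */].size
    # Stop at a blank line (a simplification) or at a line that is not
    # indented more than the parent mapping.
    break if line.strip.empty? || indent <= parent_indent
    parts << line.strip
  end
  parts.join(" ")
end

# Lines that follow "  key: value with" in the example; the mapping has n = 2.
rest = ["    multiple lines", "  key2:", "    other value"]
scalar = read_plain_scalar("value with", rest, 2)

# Cross-check against a real YAML parser on the full document.
document = "mapping:\n  key: value with\n    multiple lines\n  key2:\n    other value\n"
reference = YAML.safe_load(document)["mapping"]["key"]

puts scalar
puts "matches Psych: #{scalar == reference}"
```

In a generated lexer the same comparison would need access to the current indentation level, which is presumably why grammars for indentation-sensitive languages track it in lexer state.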

YAML file syntax

I'm working with a YAML file that I'm not supposed to break. The problem is that I'm not familiar with the format, so I'm not sure whether I can change some of its formatting.
The source file we received looks like this:
- items:
  - heading: Maps
    description: >
      Integrate 3D buildings and tacos.
    image_path: /music/images/v2/web_api-music.png
After processing the files, it looks like this:
- items:
  - heading: Maps
    description: > Integrate 3D buildings and tacos.
    image_path: /music/images/v2/web_api-music.png
Does it break the code if the line break is missing between the greater than sign and the string? Would it have any potential impact on the UI format?
Also, does it matter if there are extra spaces before "Integrate 3D buildings and tacos", like below?
- items:
  - heading: Maps
    description: >
          Integrate 3D buildings and tacos.
    image_path: /music/images/v2/web_api-music.png
Thank you and happy Thanksgiving!
It is easiest to check your files using some online YAML validator. For example: yamllint. Also, there are libraries for many languages, so if possible I recommend you use one of these to process your yaml files.
Your first processed file is not valid. There should be a newline after the >, or you can leave out the >.
Your last example is valid. The amount of indentation doesn't matter. From the spec:
The amount of indentation is a presentation detail and must not be used to convey content information.
[...]
Each node must be indented further than its parent node. All sibling nodes must use the exact same indentation level. However the content of each sibling node may be further indented independently.
See the spec for the version of yaml you are interested in
Generally speaking, the > is only significant at the end of the line, and means that the subsequent indented block should be folded onto this line, with all newlines and leading/trailing space removed (replaced by single spaces). So
- heading: Maps
  description: >
    Integrate 3D buildings and tacos.
  image_path: /music/images/v2/web_api-music.png
would be equivalent to
- heading: Maps
  description: Integrate 3D buildings and tacos.
and leaving the > in when you remove the newline essentially adds it to the string value.
Changing the amount of indentation of any given block is generally irrelevant, as long as the lines of a block are consistently indented.
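A quick way to confirm this is to load the variants with a real parser. The sketch below assumes Ruby's Psych and uses shortened, hypothetical one-key documents. Note that a folded scalar keeps a single trailing newline by default, so the loaded string is not byte-for-byte the same as a plain one-line value. With Psych, text on the same line as > is expected to be rejected as a syntax error; other parsers may behave differently.

```ruby
require "yaml"

# Hypothetical one-key documents based on the description field.
shallow   = "description: >\n  Integrate 3D buildings and tacos.\n"
deep      = "description: >\n        Integrate 3D buildings and tacos.\n"
same_line = "description: > Integrate 3D buildings and tacos.\n"

shallow_value = YAML.safe_load(shallow)["description"]
deep_value    = YAML.safe_load(deep)["description"]

# Text directly after ">" on the same line: capture the parser's reaction.
same_line_error = begin
  YAML.safe_load(same_line)
  nil
rescue Psych::SyntaxError => e
  e
end

p shallow_value
puts "extra indentation changes nothing: #{shallow_value == deep_value}"
puts "same-line text rejected: #{!same_line_error.nil?}"
```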

Indenting a YAML sequence inside a mapping

Should the following be valid?
parent:
- child
- child
So what we have is a sequence of values inside a mapping.
The specific question is about whether the indentation for the 2nd and 3rd lines is valid. The Ruby YAML.dump generated this code, but the Yaml parser here rejects it, because the child lines are not indented.
i.e. it wants something like:
parent:
  - child
  - child
Who is right?
Looking at the YAML spec, it's certainly not obvious, and the line
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation
doesn't help much.
Yes, that is legal YAML. The relevant text from the spec is here:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
and the subsequent example 8.22:
sequence: !!seq
- entry
- !!seq
 - nested
mapping: !!map
 foo: bar
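For what it's worth, Ruby's own parser accepts both layouts. A minimal sketch with Psych, using the two documents from the question (variable names are invented): both load to the same hash, and the output of YAML.dump loads back to the original data.

```ruby
require "yaml"

# The two layouts from the question.
unindented = "parent:\n- child\n- child\n"
indented   = "parent:\n  - child\n  - child\n"

from_unindented = YAML.safe_load(unindented)
from_indented   = YAML.safe_load(indented)

# Round trip: whatever layout YAML.dump emits, it loads back to the same data.
round_trip = YAML.safe_load(YAML.dump(from_unindented))

puts "layouts load identically: #{from_unindented == from_indented}"
puts "dump round-trips: #{round_trip == from_unindented}"
```

A parser that rejects the unindented form is therefore stricter than the spec requires.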

Resources