Indenting a YAML sequence inside a mapping

Indenting a YAML sequence inside a mapping - yaml

Should the following be valid?
parent:
- child
- child
So what we have is a sequence of values inside a mapping.
The specific question is about whether the indentation for the 2nd and 3rd lines is valid. The Ruby YAML.dump generated this code, but the Yaml parser here rejects it, because the child lines are not indented.
i.e. it wants something like:
parent:
- child
- child
Who is right?
Looking at the YAML spec, it's certainly not obvious, and the line
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation
doesn't help much.

Yes, that is legal YAML. The relevant text from the spec is here:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
and the subsequent example 8.22:
sequence: !!seq
- entry
- !!seq
- nested
mapping: !!map
foo: bar

Related

Reading and writing back yaml files with multi-line strings

I have to read a yaml file, modify it and write back using pyYAML. Every thing works fine except when there is multi-line string values in single quotes e.g. if input yaml file looks like
FOO:
- Bar: '{"HELLO":
"WORLD"}'
then reading it as data=yaml.load(open("foo.yaml")) and writing it yaml.dump(data, fref, default_flow_style=False) generates something like
FOO:
- Bar: '{"HELLO": "WORLD"}'
i.e. without the extra line for Bar value. Strange thing is that if input file has something like
FOO:
- Bar: '{"HELLO":
"WORLD"}'
i.e. one extra new line for Bar value then writing it back generates the correct number of new lines. Any idea what I am doing wrong?

You are not doing anything wrong, but you probably should have read more of the YAML specification.
According to the (outdated) 1.1 spec that PyYAML implements, within
single quoted scalars:
In a multi-line single-quoted scalar, line breaks are subject to (flow) line folding, and any trailing white space is excluded from the content.
And line-folding:
Line folding allows long lines to be broken for readability, while retaining the original semantics of a single long line. When folding is done, any line break ending an empty line is preserved. In addition, any specific line breaks are also preserved, even when ending a non-empty line.
This means that your first two examples are the same, as the
line-break is read as if there is a space.
The third example is different, because it actually contains a newline after loading, because "any line break ending an empty line is preserved".
In order to understand why that dumps back as it was loaded, you have to know that PyYAML doesn't
maintain any information about the quoting (nor about the single newline in the first example), it
just loads that scalar into a Python string. During dumping PyYAML evaluates how that string
can best be written and the options it considers (unless you try to force things using the default_style argument to dump()): plain style, single quoted style, double quoted style.
PyYAML will use plain style (without quotes) when possible, but since
the string starts with {, this leads to confusion (collision) with
that character's use as the start of a flow style mapping. So quoting
is necessary. Since there are also double quotes in the string, and
there are no characters that need backslash escaping the "cleanest"
representation that PyYAML can choose is single quoted style, and in
that style it needs to represent a line-break by including an emtpy
line withing the single quoted scalar.
I would personally prefer using a block style literal scalar to represent your last example:
FOO:
- Bar: |
{"HELLO":
"WORLD"}
but if you load, then dump that using PyYAML its readability would be lost.
Although worded differently in the YAML 1.2 specification (released almost 10 years ago) the line-folding works the same, so this would "work" in a similar way with a more up-to-date YAML loader/dumper. My package ruamel.yaml, for loading/dumping YAML 1.2 will properly maintain the block style if you set the attribute preserve_quotes = True on the YAML() instance, but it will still get rid of the newline in your first example. This could be implemented (as is shown by ruamel.yaml preserving appropriate newline positions in folded style block scalars), but nobody ever asked for that, probably because if people want that kind of control over wrapping they use a block style to start with.

Use of --- in yaml

I came across this yaml document:
--- !ruby/object:MyClass
myint: 100
mystring: hello world
What does the line:
--- !ruby/object:MyClass
mean?

In YAML, --- is the end of directives marker.
A YAML document may begin with a number of YAML directives (currently, two directives are defined, %YAML and %TAG). Since a text node (for example) can also start with a % character, there needs to be a way to distinguish between directives and text. This is achieved using the end of directives marker --- which signals the end of the directives and the beginning of the document.
Since directives are allowed to be empty, --- can also serve as a document separator.
YAML also has an end of document marker .... However, this is not often used, because and end of directives marker / document separator also implies the end of the document. You need it if you want to have multiple documents with directives within the same stream or when you want to indicate that a document is finished without necessarily starting a new one (e.g. in cases where there may be significant time passing between the end of one document and the start of another).
Many YAML emitters, and Psych is no exception, always emit an end of directives marker at the beginning of each document. This allows you to easily concatenate multiple documents into a single stream without doing any additional processing of the documents.
The other half of that line, !ruby/object:MyClass, is a tag. A tag is used to give a type to the following node. In YAML, every node has a type, even if it is implicit. You can also write the tag explicitly, for example text nodes have the type (tag) !!str. This can be useful in certain circumstances, for example here:
!!str 2018-10-31
This tells YAML that 2018-10-31 is text, not a date.
!ruby/object:MyClass is a tag used by Psych to indicate that the node is a serialized Ruby Object which is an instance of class MyClass. This way, when deserializing the document, Psych knows what class to instantiate and how to treat the node.

According to yaml.org, '---' indicates the start of a document.
https://yaml.org/spec/1.2/spec.html
for official specifications.

Why isn't two-spaced YAML parsed like four-spaced YAML?

I'm seeing strange behavior when parsing YAML (using Ruby 2.5/Psych) created using two space indentations. The same file, indented with four spaces per line works -- to my mind -- as expected.
Two spaces:
windows:
- shell:
panes:
- echo hello
results in the following hash:
{"windows"=>[{"shell"=>nil, "panes"=>["echo hello"]}]}
Whereas using four space indentations:
windows:
- shell:
panes:
- echo hello
results in:
{"windows"=>[{"shell"=>{"panes"=>["echo hello"]}}]}
I just skimmed through the spec and didn't see anything relevant to this issue.
Is this behavior expected? If so, I'd greatly appreciate links to resources explaining why.

While Wayne's solution is correct, the explanation seems a bit off, so I'll throw in mine:
In YAML, the - for block sequence items (like ? and : for block mappings) is treated as indentation (spec):
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.
Moreover, all block collections (sequences and mappings) take their indentation from their first item (since there is no explicit starting indicator). So in the line - shell:, the - defines the indentation level of the newly started sequence, while at the same time, shell: defines the indentation level of the newly started mapping, which is the content of the sequence item. Note how the - is treated as indentation for defining the indentation level of the mapping.
Now, revisiting your first example:
windows:
- shell:
panes:
- echo hello
panes: is on the same level as shell:. This means that YAML parses it as key of the mapping started by shell:, meaning that the key shell has an empty value. Mapping values of implicit keys, if not on the same line, must always be indented more than the corresponding mapping key (spec):
The block node’s properties may span across several lines. In this case, they must be indented by at least one more space than the block collection, regardless of the indentation of the block collection entries.
OTOH, in the second example:
windows:
- shell:
panes:
- echo hello
panes: is on a deeper indentation level compared to shell:. This means that it is parsed as value of the key shell and thus starts a new, nested block mapping.
Finally, mind that since - is treated as part of the indentation, „indenting by two spaces“ could also mean this:
windows:
- shell:
panes:
- echo hello
Note how the - are not more indented than their mapping keys. This works because the spec says:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).

The trouble is that you cannot simply replace every two spaces with four spaces. That is because in this pair of lines:
- shell:
panes:
these two spaces in the second line:
panes:
^^
Are an abbrevation for the "- " in the line above. If the second line were not abbreviated, then the pair of lines would be:
- shell:
- panes:
So when doubling the indentation, the second of these line should only have its first pair of spaces doubled, not the second. That would yield the correct indentation for the pair:
- shell:
panes:
So, if you only expand the first pair of spaces in the "panes:" line, you get:
windows:
- shell:
panes:
- git status
Which correctly parses to the expected result.

Antlr4 handling of yaml unquoted multi-line strings

I am trying to build a parser for a limited set of YAML syntax similar to what is shown below using Antlr 4.7:
name:
last: Smith
first: John
address:
street: 123 Main St
Suite 100
city: Boston
state: MA
zip: 12345
I have a grammar (derived from the Python 3 grammar) that works correctly if I put quotes around the "value" strings but fails if I remove them. It seems that defining the "value" string so matching terminates before the next "tag:" portion of a new block or a "tag: " portion of a new assign statement is the trick.
Does anyone have any ideas or working samples that handle this use case?

It is the indentation of a non-empty line that should end the matching of a plain scalar. If that indentation is not more than the indentation of the current mapping, the scalar ends there.
For example:
mapping:
key: value with
multiple lines
key2:
other value
Here, the value with multiple lines ends at the line with key2:, because it is not indented more than the current mapping (i.e. the value of mapping: above). Of course, the last newline character and the indentation of key2: is not a part of that scalar's content.
In the YAML specification, this is handled by a production
s-indent(n) ::= s-space × n
Now in our case, the inner mapping has an indentation of n=2, so your scalar would be matched by something like
plain-scalar-part (s-indent(3) s-white* plain-scalar-part)*
(I don't know Antlr syntax, just assume these are all non-terminals). After the (possibly empty) first line, you match an indentation of more than the parent mapping (so 3 spaces in this case), then there might be even more whitespace (which is not part of the content), and then more content follows. For simplicity, I ignored possible empty lines.
This will not match the line key2: because it has too few indentation, which is how the matching of the scalar will end.
Now I do not know how to do something like s-indent(n) in Antlr, but the Python grammar should give you the right pointers.

YAML How many spaces per indent?

Is there any difference if i use one space, two or four spaces per indent level in YAML?
Are there any specific rules for space numbers per Structure type??
For example 4 spaces for nesting maps , 1 space per list item etc??
I am writing a yaml configuration file for elastic beanstalk .ebextensions and i am having really hard time constructing this correctly. Although i have valid yaml in YAML Validator elastic beanstalk seems to understand a different structure.

There is no requirement in YAML to indent any concrete number of spaces. There is also no requirement to be consistent. So for example, this is valid YAML:
a:
b:
- c
- d
- e
f:
"ghi"
Some rules might be of interest:
Flow content (i.e. everything that starts with { or [) can span multiple lines, but must be indented at least as many spaces as the surrounding current block level.
Block list items can (but don't need to) have the same indentation as the surrounding block level because - is considered part of the indentation:
a: # top-level key
- b # value of that key, which is a list
- c
c: # next top-level key
d # non-list value which must be more indented

The YAML spec for v 1.2 merely says that
In YAML block styles, structure is determined by indentation. In general, indentation is defined as a zero or more space characters at the start of a line.
To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.
The amount of indentation is a presentation detail and must not be used to convey content information.
So you can set the indent depth to your preference, as long as you use spaces and not tabs. Interestingly, IntelliJ uses 2 spaces by default.

INDENTATION
The suggested syntax for YAML files is to use 2 spaces for indentation, but YAML will follow whatever indentation system that the individual file uses. Indentation of two spaces works very well for SLS files given the fact that the data is uniform and not deeply nested.
NESTED DICTIONARIES
When dictionaries are nested within other data structures (particularly lists), the indentation logic sometimes changes. Examples of where this might happen include context and default options from the file.managed state:
/etc/http/conf/http.conf:
file:
- managed
- source: salt://apache/http.conf
- user: root
- group: root
- mode: 644
- template: jinja
- context:
custom_var: "override"
- defaults:
custom_var: "default value"
other_var: 123
Notice that while the indentation is two spaces per level, for the values under the context and defaults options there is a four-space indent. If only two spaces are used to indent, then those keys will be considered part of the same dictionary that contains the context key, and so the data will not be loaded correctly. If using a double indent is not desirable, then a deeply-nested dict can be declared with curly braces:
/etc/http/conf/http.conf:
file:
- managed
- source: salt://apache/http.conf
- user: root
- group: root
- mode: 644
- template: jinja
- context: {
custom_var: "override" }
- defaults: {
custom_var: "default value",
other_var: 123 }
you can read more from this link

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio