Why isn't two-spaced YAML parsed like four-spaced YAML?

I'm seeing strange behavior when parsing YAML (using Ruby 2.5/Psych) that was written with two-space indentation. The same file, indented with four spaces per level, works (to my mind) as expected.
Two spaces:
windows:
  - shell:
    panes:
      - echo hello
results in the following hash:
{"windows"=>[{"shell"=>nil, "panes"=>["echo hello"]}]}
Whereas using four space indentations:
windows:
    - shell:
        panes:
            - echo hello
results in:
{"windows"=>[{"shell"=>{"panes"=>["echo hello"]}}]}
I just skimmed through the spec and didn't see anything relevant to this issue.
Is this behavior expected? If so, I'd greatly appreciate links to resources explaining why.

While Wayne's solution is correct, the explanation seems a bit off, so I'll throw in mine:
In YAML, the - for block sequence items (like ? and : for block mappings) is treated as indentation (spec):
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.
Moreover, all block collections (sequences and mappings) take their indentation from their first item (since there is no explicit starting indicator). So in the line - shell:, the - defines the indentation level of the newly started sequence, while at the same time, shell: defines the indentation level of the newly started mapping, which is the content of the sequence item. Note how the - is treated as indentation for defining the indentation level of the mapping.
Now, revisiting your first example:
windows:
  - shell:
    panes:
      - echo hello
panes: is on the same level as shell:. This means that YAML parses it as a key of the mapping started by shell:, so the key shell has an empty value. Mapping values of implicit keys, if not on the same line, must always be indented more than the corresponding mapping key (spec):
The block node’s properties may span across several lines. In this case, they must be indented by at least one more space than the block collection, regardless of the indentation of the block collection entries.
OTOH, in the second example:
windows:
    - shell:
        panes:
            - echo hello
panes: is on a deeper indentation level than shell:. This means that it is parsed as the value of the key shell and thus starts a new, nested block mapping.
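A minimal Ruby sketch of the two cases above, assuming Ruby's bundled yaml library (Psych); the variable names are illustrative only. Both strings are the question's files, and the only difference is where panes: starts relative to shell:.

```ruby
require 'yaml'

# Two spaces per level: "panes:" starts in the same column as "shell:",
# so both are keys of one mapping and "shell" gets no value (nil).
two_spaces = <<~YAML
  windows:
    - shell:
      panes:
        - echo hello
YAML

# Four spaces per level: "panes:" starts to the right of "shell:",
# so it opens a nested mapping that becomes the value of "shell".
four_spaces = <<~YAML
  windows:
      - shell:
          panes:
              - echo hello
YAML

p YAML.load(two_spaces)
p YAML.load(four_spaces)
```

The printed hashes have the same structure as the two results in the question (the exact inspect formatting varies slightly between Ruby versions).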
Finally, note that since - is treated as part of the indentation, "indenting by two spaces" could also mean this:
windows:
- shell:
    panes:
    - echo hello
Note how the - indicators are not indented more than their mapping keys. This works because the spec says:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
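A small Ruby/Psych check of that compact layout (a sketch; the variable name is made up). Each - sits in the column of the key that owns the sequence, yet panes: is still nested under shell: because it starts two columns to the right of it.

```ruby
require 'yaml'

# "-" counts as indentation, so the sequences need no extra indent
# relative to their keys; only "panes:" must be right of "shell:".
compact = <<~YAML
  windows:
  - shell:
      panes:
      - echo hello
YAML

p YAML.load(compact)
```

This loads to the same nested structure as the four-space file in the question.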

The trouble is that you cannot simply replace every two spaces with four spaces. That is because in this pair of lines:
  - shell:
    panes:
these two spaces in the second line:
    panes:
  ^^
are an abbreviation for the "- " in the line above. If the second line were not abbreviated, the pair of lines would be:
  - shell:
  - panes:
So when doubling the indentation, only the first pair of spaces in the second of these lines should be doubled, not the second pair. That yields the correct indentation for the pair:
    - shell:
      panes:
So, if you only expand the first pair of spaces in the "panes:" line, you get:
windows:
    - shell:
      panes:
          - git status
Which correctly parses to the expected result.
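A rough Ruby/Psych sketch comparing the two conversions, under the assumption that the starting point is the two-space file; the gsub call stands in for a naive "double every leading space" search-and-replace and is not a tool from the answer.

```ruby
require 'yaml'

two_spaces = <<~YAML
  windows:
    - shell:
      panes:
        - git status
YAML

# Naive conversion: double every run of leading spaces on every line.
naive = two_spaces.gsub(/^( +)/) { |spaces| spaces * 2 }

# Conversion described above: the two spaces standing in for "- " are kept.
careful = <<~YAML
  windows:
      - shell:
        panes:
            - git status
YAML

p YAML.load(careful) == YAML.load(two_spaces)  # same structure
p YAML.load(naive) == YAML.load(two_spaces)    # structure changed
```

The naive version turns panes: into a child of shell:, while the careful version keeps it a sibling, exactly as in the two-space original.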

Related

Sphinx issues mysterious error in literal blocks

In Sphinx (the ReStructuredText publishing system), are there any obscure rules that limit what a literal block can contain?
Background: My document contains many literal blocks that follow a double-colon paragraph, like this:
Background:... follow a double-colon paragraph, like this::

    $ sudo su
    # echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
This block (with a different preceding paragraph) is one of the ones that issues an error: "WARNING: Inconsistent literal block quoting." The message indicates that the error is in the "echo" line. In the HTML output the literal block contains only the "sudo" line; the "echo" line is treated as ordinary text.
I haven't been able to identify any common property in the lines that report errors, or anything that distinguishes them, as a class, from lines in other literal blocks that don't get errors.
I stripped down the project to isolate the problem, and I identified it that way.
I had a numbered list item that contained a double-colon literal block that was indented only as far as the list item's text, like this:
2. Set up the... directory::

   $ A Linux command
   $ Another Linux command
   $ And ANOTHER Linux command
   $ etc.
When I indented the literal block further, the problem went away.
I was misled by two things:
The message does not point to the first line in the literal block, but to some apparently random line within it. In the case above, it pointed to the fifth line (out of eight) in the block!
In most cases this form of indentation, although incorrect, works just fine.
Isolating the problem is a brute-force method of solving it, but is often effective when deduction fails. I'll keep that in mind in the future.

Reading and writing back yaml files with multi-line strings

I have to read a YAML file, modify it and write it back using PyYAML. Everything works fine except when there are multi-line string values in single quotes, e.g. if the input YAML file looks like
FOO:
  - Bar: '{"HELLO":
      "WORLD"}'
then reading it with data=yaml.load(open("foo.yaml")) and writing it back with yaml.dump(data, fref, default_flow_style=False) generates something like
FOO:
- Bar: '{"HELLO": "WORLD"}'
i.e. without the extra line in the Bar value. The strange thing is that if the input file has something like
FOO:
  - Bar: '{"HELLO":

      "WORLD"}'
i.e. one extra empty line in the Bar value, then writing it back generates the correct number of newlines. Any idea what I am doing wrong?
You are not doing anything wrong, but you probably should have read more of the YAML specification.
According to the (outdated) 1.1 spec that PyYAML implements, within single-quoted scalars:
In a multi-line single-quoted scalar, line breaks are subject to (flow) line folding, and any trailing white space is excluded from the content.
And line-folding:
Line folding allows long lines to be broken for readability, while retaining the original semantics of a single long line. When folding is done, any line break ending an empty line is preserved. In addition, any specific line breaks are also preserved, even when ending a non-empty line.
This means that your first two examples are the same, as the line break is read as if it were a space.
The third example is different: it actually contains a newline after loading, since "any line break ending an empty line is preserved".
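PyYAML is Python, but the folding rule itself is easy to see with Ruby's Psych, which is built on libyaml and, as far as I know, folds single-quoted scalars the same way. A minimal sketch under that assumption, loading the first and the third example from the question:

```ruby
require 'yaml'

# One line break between non-empty lines: folded into a single space,
# and the continuation line's leading spaces are dropped.
folded = <<~YAML
  FOO:
    - Bar: '{"HELLO":
        "WORLD"}'
YAML

# An empty line inside the scalar: one real "\n" survives loading.
with_blank = <<~YAML
  FOO:
    - Bar: '{"HELLO":

        "WORLD"}'
YAML

folded_value = YAML.load(folded)["FOO"][0]["Bar"]
blank_value  = YAML.load(with_blank)["FOO"][0]["Bar"]

p folded_value  # "{\"HELLO\": \"WORLD\"}"
p blank_value   # "{\"HELLO\":\n\"WORLD\"}"
```

The loaded strings carry no trace of the original quoting or line layout, which is why the dumper has to decide the output style on its own.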
In order to understand why that dumps back as it was loaded, you have to know that PyYAML doesn't maintain any information about the quoting (nor about the single newline in the first example); it just loads that scalar into a Python string. During dumping, PyYAML evaluates how that string can best be written, and the options it considers (unless you try to force things using the default_style argument to dump()) are plain style, single-quoted style and double-quoted style.
PyYAML will use plain style (without quotes) when possible, but since the string starts with {, this leads to confusion (collision) with that character's use as the start of a flow-style mapping. So quoting is necessary. Since there are also double quotes in the string, and there are no characters that need backslash escaping, the "cleanest" representation that PyYAML can choose is single-quoted style, and in that style it needs to represent a line break by including an empty line within the single-quoted scalar.
I would personally prefer using a block style literal scalar to represent your last example:
FOO:
- Bar: |
    {"HELLO":
    "WORLD"}
but if you load, then dump that using PyYAML its readability would be lost.
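For comparison, a small Ruby/Psych sketch (not PyYAML) of loading that literal block and round-tripping it. It only asserts that the value survives; which output style Psych picks when dumping is left to Psych, and I am not claiming it matches PyYAML's choice.

```ruby
require 'yaml'

literal = <<~YAML
  FOO:
  - Bar: |
      {"HELLO":
      "WORLD"}
YAML

data  = YAML.load(literal)
value = data["FOO"][0]["Bar"]

# The loaded value is a plain String; nothing records that it was a literal
# block (the "|" keeps one trailing newline by default).
p value  # "{\"HELLO\":\n\"WORLD\"}\n"

# The dumper chooses a style from the string's content alone; whatever it
# chooses, reloading gives back an equal structure.
p YAML.load(YAML.dump(data)) == data
```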
Although worded differently in the YAML 1.2 specification (released almost 10 years ago), the line folding works the same, so this would "work" in a similar way with a more up-to-date YAML loader/dumper. My package ruamel.yaml, for loading/dumping YAML 1.2, will properly maintain the block style if you set the attribute preserve_quotes = True on the YAML() instance, but it will still get rid of the newline in your first example. This could be implemented (as is shown by ruamel.yaml preserving appropriate newline positions in folded style block scalars), but nobody ever asked for that, probably because if people want that kind of control over wrapping they use a block style to start with.

YAML How many spaces per indent?

Is there any difference if I use one, two or four spaces per indent level in YAML?
Are there any specific rules for the number of spaces per structure type?
For example, 4 spaces for nesting maps, 1 space per list item, etc.?
I am writing a YAML configuration file for Elastic Beanstalk .ebextensions and I am having a really hard time constructing it correctly. Although my YAML is valid according to YAML Validator, Elastic Beanstalk seems to understand a different structure.
There is no requirement in YAML to indent any concrete number of spaces. There is also no requirement to be consistent. So for example, this is valid YAML:
a:
  b:
       - c
       -  d
       - e
f:
      "ghi"
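A quick way to convince yourself, sketched in Ruby with Psych (any conforming parser should agree): load the irregular example next to a conventionally indented version of the same data and compare. The variable names are illustrative.

```ruby
require 'yaml'

# The irregular example above: different, inconsistent indent widths.
irregular = <<~YAML
  a:
    b:
         - c
         -  d
         - e
  f:
        "ghi"
YAML

# The same data with a consistent two-space layout.
regular = <<~YAML
  a:
    b:
      - c
      - d
      - e
  f: "ghi"
YAML

p YAML.load(irregular) == YAML.load(regular)  # true
```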
Some rules might be of interest:
Flow content (i.e. everything that starts with { or [) can span multiple lines, but must be indented at least as many spaces as the surrounding current block level.
Block list items can (but don't need to) have the same indentation as the surrounding block level because - is considered part of the indentation:
a: # top-level key
- b # value of that key, which is a list
- c
c: # next top-level key
 d # non-list value which must be more indented
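Both rules can be checked with a short Ruby/Psych sketch. The e: key with a flow sequence is my own addition for illustrating the first rule; the rest is the example above.

```ruby
require 'yaml'

rules = <<~YAML
  a: # top-level key
  - b # value of that key, which is a list
  - c
  c: # next top-level key
   d # non-list value which must be more indented
  e: [x,
    y] # flow content may span lines, kept indented
YAML

p YAML.load(rules)
```

The result maps a to the list of b and c, c to the string d, and e to the list of x and y; the comments are dropped.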
The YAML spec for v 1.2 merely says that
In YAML block styles, structure is determined by indentation. In general, indentation is defined as a zero or more space characters at the start of a line.
To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.
The amount of indentation is a presentation detail and must not be used to convey content information.
So you can set the indent depth to your preference, as long as you use spaces and not tabs. Interestingly, IntelliJ uses 2 spaces by default.
INDENTATION
The suggested syntax for YAML files is to use 2 spaces for indentation, but YAML will follow whatever indentation system that the individual file uses. Indentation of two spaces works very well for SLS files given the fact that the data is uniform and not deeply nested.
NESTED DICTIONARIES
When dictionaries are nested within other data structures (particularly lists), the indentation logic sometimes changes. Examples of where this might happen include context and default options from the file.managed state:
/etc/http/conf/http.conf:
  file:
    - managed
    - source: salt://apache/http.conf
    - user: root
    - group: root
    - mode: 644
    - template: jinja
    - context:
        custom_var: "override"
    - defaults:
        custom_var: "default value"
        other_var: 123
Notice that while the indentation is two spaces per level, for the values under the context and defaults options there is a four-space indent. If only two spaces are used to indent, then those keys will be considered part of the same dictionary that contains the context key, and so the data will not be loaded correctly. If using a double indent is not desirable, then a deeply-nested dict can be declared with curly braces:
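The effect described here is the same one as in the first question on this page. A trimmed-down Ruby/Psych sketch (not Salt's own loader, which I am assuming behaves like any YAML parser here):

```ruby
require 'yaml'

# Four spaces under "- context:": custom_var is nested inside context.
four = <<~YAML
  file:
    - managed
    - context:
        custom_var: "override"
YAML

# Only two spaces: custom_var lines up with "context" and becomes its sibling.
two = <<~YAML
  file:
    - managed
    - context:
      custom_var: "override"
YAML

p YAML.load(four)["file"][1]
p YAML.load(two)["file"][1]
```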
/etc/http/conf/http.conf:
  file:
    - managed
    - source: salt://apache/http.conf
    - user: root
    - group: root
    - mode: 644
    - template: jinja
    - context: {
      custom_var: "override" }
    - defaults: {
      custom_var: "default value",
      other_var: 123 }
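A Ruby/Psych sketch checking that the curly-brace form loads to nested mappings even though the continuation lines only use two spaces (the file path and the other state options are left out for brevity):

```ruby
require 'yaml'

# Inside { ... } the block indentation rules no longer decide the nesting.
braced = <<~YAML
  file:
    - managed
    - context: {
      custom_var: "override" }
    - defaults: {
      custom_var: "default value",
      other_var: 123 }
YAML

p YAML.load(braced)["file"]
```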
You can read more at this link.

tmLanguage syntax highlighting with begin-end rules without highlighting a begin that doesn't have an end

I am creating a simple postfix programming language. The syntax is as follows:
For example, 2 3 add adds the two integers 2 and 3 together, and "hello, world!" puts writes the string "hello, world!" to STDOUT.
The defining of new functions that can be used, is done as follows:
"fourth_pow"
"This function raises the argument on the top of the stack to the fourth power."
[
dup dup dup mul mul mul
]
def
Whitespace and newlines are not significant in the language grammar. This means that the above definition could also have been written as "fourth_pow" "..." [ dup dup dup mul mul mul] def (but of course, this is less readable).
I now want to highlight the syntax of this language, such that in a definition statement as above, the newly defined function name and the def keyword are highlighted.
A snippet from the .tmLanguage file (in YAML-format) is:
definition:
  name: "entity.other.function.jux-beta_syntax"
  begin: ("([a-zA-Z_][\w]*[?!]?)")
  end: (def)
  beginCaptures:
    '1': {name: "entity.name.function.jux-beta_syntax"}
  contentName: "support.constant.jux-beta_syntax"
  patterns:
    - include: '#comment'
    - include: '#quotation_start'
    - include: '#quotation_end'
    - include: '#string'
    - include: '#identifier'
    - include: '#integer'
(See the whole file here)
This 'works', but it means that when I start a new string at the end of the file, it is highlighted as if it were a function definition. This is because the 'begin' part of the #definition rule matches. What I want is to colour it only if the match can be closed.
Is there a way to do so using the tmLanguage format?
The major problem is that, at least in Sublime Text (I haven't tested TextMate or VSCode), the regex engine can only match one line at a time using .tmLanguage files. Therefore, something like this:
(?m:("([a-zA-Z_][\w]*[?!]?)")(.*)(def))
using the ?m: multiline option (. matches newlines) won't work, although it works just fine in Ruby (the regex engine is based on Oniguruma). Because syntax highlighting only goes one line at a time, your beginCaptures matches as soon as a double-quoted string is entered. Of course, once the body of the function and def are written, everything highlights appropriately.
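To make the difference concrete, here is a Ruby sketch that runs the same pattern over the whole definition as one string, with and without (?m:...). This only shows what the pattern would do if the engine could see several lines at once; it is not how the .tmLanguage highlighter feeds text to its regexes.

```ruby
# Without the m flag, "." stops at a newline, so the name and "def"
# would have to be on the same line.
single_line = /("([a-zA-Z_][\w]*[?!]?)")(.*)(def)/
# In Ruby, (?m:...) lets "." match newlines, so the match can span lines.
multi_line = /(?m:("([a-zA-Z_][\w]*[?!]?)")(.*)(def))/

definition = <<~JUX
  "fourth_pow"
  "This function raises the argument on the top of the stack to the fourth power."
  [
    dup dup dup mul mul mul
  ]
  def
JUX

# A string that was just started and has no closing "def" yet.
unfinished = %("new_name"\n)

p single_line.match(definition)    # nil
p multi_line.match(definition)[2]  # "fourth_pow"
p multi_line.match(unfinished)     # nil
```

With the whole text available, the multi-line pattern never matches an unfinished definition, which is exactly the behaviour a line-at-a-time highlighter cannot reproduce.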
Unfortunately, I'm not fluent enough in the new sublime-syntax format to give you a relevant working example, but you may want to check it out in the most recent versions of Sublime Text 3. It functions using a stack, so you can push and pop contexts, allowing matching across multiple lines. In the Selected Examples section, look at Bracket Balancing as an example: it highlights closing parentheses ) that have no matching opening paren (.
Good luck!

Indenting a YAML sequence inside a mapping

Should the following be valid?
parent:
- child
- child
So what we have is a sequence of values inside a mapping.
The specific question is about whether the indentation for the 2nd and 3rd lines is valid. Ruby's YAML.dump generated this code, but the YAML parser here rejects it, because the child lines are not indented.
i.e. it wants something like:
parent:
  - child
  - child
Who is right?
Looking at the YAML spec, it's certainly not obvious, and the line
The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation
doesn't help much.
Yes, that is legal YAML. The relevant text from the spec is here:
Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
and the subsequent example 8.22:
sequence: !!seq
- entry
- !!seq
 - nested
mapping: !!map
 foo: bar
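A short Ruby/Psych sketch showing that both layouts from the question load to the same data, and that Psych's own dumper writes the unindented form (which is where the question's input came from):

```ruby
require 'yaml'

flush    = "parent:\n- child\n- child\n"     # sequence entries not indented
indented = "parent:\n  - child\n  - child\n" # sequence entries indented

p YAML.load(flush) == YAML.load(indented)  # true

# Psych (via libyaml) emits sequences under a mapping key without extra indent.
puts YAML.dump({ "parent" => ["child", "child"] })
```

A parser that rejects the flush form is stricter than the spec quoted above.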
