YAML comments in multi-line strings

Does YAML support comments in multi-line strings?
I'm trying to do things like this, but the validator is throwing errors:
key:
  #comment
  value
  #comment
  value
  value #comments here don't work either

No. Per the YAML 1.2 spec, "Comments must not appear inside scalars", which is exactly the case here. There is no way in YAML to escape the octothorpe symbol (#), so within a multi-line string there is no way to disambiguate a comment from the raw string value.
You can however interleave comments within a collection. For example, if you really needed to, you could break your string into a sequence of strings one per line:
key: #comment
- value line 1
#comment
- value line 2
#comment
- value line 3
Should work...
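A minimal sketch of how the sequence approach could be consumed, assuming Ruby's bundled YAML library (Psych) and a consumer that wants the original multi-line text back. The variable names are illustrative only; the point is that the parser drops the comments and the items can be joined again afterwards.

```ruby
require 'yaml'

# The commented sequence from the answer above, one string per intended line.
source = [
  "key: #comment",
  "- value line 1",
  "#comment",
  "- value line 2",
  "#comment",
  "- value line 3",
].join("\n")

data = YAML.load(source)

# The comments never reach the loaded data; only the three items remain,
# so joining them rebuilds the multi-line string.
text = data["key"].join("\n")
puts text
```

The cost of this layout is that every reader of the file has to know the value is a list that needs joining.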

Related

Reading and writing back yaml files with multi-line strings

I have to read a yaml file, modify it and write it back using pyYAML. Everything works fine except when there are multi-line string values in single quotes, e.g. if the input yaml file looks like
FOO:
- Bar: '{"HELLO":
    "WORLD"}'
then reading it as data=yaml.load(open("foo.yaml")) and writing it yaml.dump(data, fref, default_flow_style=False) generates something like
FOO:
- Bar: '{"HELLO": "WORLD"}'
i.e. without the extra line for Bar value. Strange thing is that if input file has something like
FOO:
- Bar: '{"HELLO":

    "WORLD"}'
i.e. one extra new line for Bar value then writing it back generates the correct number of new lines. Any idea what I am doing wrong?
You are not doing anything wrong, but you probably should have read more of the YAML specification.
According to the (outdated) 1.1 spec that PyYAML implements, within single quoted scalars:
In a multi-line single-quoted scalar, line breaks are subject to (flow) line folding, and any trailing white space is excluded from the content.
And line-folding:
Line folding allows long lines to be broken for readability, while retaining the original semantics of a single long line. When folding is done, any line break ending an empty line is preserved. In addition, any specific line breaks are also preserved, even when ending a non-empty line.
This means that your first two examples are the same, as the line-break is read as if there is a space.
The third example is different, because it actually contains a newline after loading, because "any line break ending an empty line is preserved".
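The folding rule can be checked with a short sketch. It uses Ruby's bundled YAML library (Psych) rather than PyYAML, so it is an analogous check and not the asker's exact setup; the continuation-line indentation and the variable names are assumptions made for the illustration.

```ruby
require 'yaml'

# First example from the question: a single line break inside the quotes.
one_break = [
  "FOO:",
  "- Bar: '{\"HELLO\":",
  "    \"WORLD\"}'",
].join("\n")

# Third example: an empty line inside the quotes.
empty_line = [
  "FOO:",
  "- Bar: '{\"HELLO\":",
  "",
  "    \"WORLD\"}'",
].join("\n")

folded    = YAML.load(one_break)["FOO"][0]["Bar"]
preserved = YAML.load(empty_line)["FOO"][0]["Bar"]

p folded     # the single line break was folded into one space
p preserved  # the empty line survived as one newline
```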
In order to understand why that dumps back as it was loaded, you have to know that PyYAML doesn't maintain any information about the quoting (nor about the single newline in the first example); it just loads that scalar into a Python string. During dumping, PyYAML evaluates how that string can best be written, and the options it considers (unless you try to force things using the default_style argument to dump()) are: plain style, single quoted style, double quoted style.
PyYAML will use plain style (without quotes) when possible, but since the string starts with {, this would collide with that character's use as the start of a flow style mapping, so quoting is necessary. Since there are also double quotes in the string, and there are no characters that need backslash escaping, the "cleanest" representation that PyYAML can choose is single quoted style, and in that style it needs to represent a line break by including an empty line within the single quoted scalar.
I would personally prefer using a block style literal scalar to represent your last example:
FOO:
- Bar: |
    {"HELLO":
    "WORLD"}
but if you load, then dump that using PyYAML its readability would be lost.
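As a rough illustration of what the block style literal loads as, here is a sketch using Ruby's bundled YAML library (Psych), not PyYAML. The helper `load_bar` and the comparison with the `|-` (strip) indicator are additions for the example: a plain `|` keeps one trailing newline, which the single quoted version does not have.

```ruby
require 'yaml'

# Hypothetical helper: builds the document with the given block scalar
# header ("|" or "|-") and returns the loaded value of Bar.
def load_bar(header)
  source = [
    "FOO:",
    "- Bar: #{header}",
    "    {\"HELLO\":",
    "    \"WORLD\"}",
    "",
  ].join("\n")
  YAML.load(source)["FOO"][0]["Bar"]
end

clipped  = load_bar("|")   # default chomping keeps one final newline
stripped = load_bar("|-")  # strip chomping removes it

p clipped
p stripped
```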
Although worded differently in the YAML 1.2 specification (released almost 10 years ago) the line-folding works the same, so this would "work" in a similar way with a more up-to-date YAML loader/dumper. My package ruamel.yaml, for loading/dumping YAML 1.2 will properly maintain the block style if you set the attribute preserve_quotes = True on the YAML() instance, but it will still get rid of the newline in your first example. This could be implemented (as is shown by ruamel.yaml preserving appropriate newline positions in folded style block scalars), but nobody ever asked for that, probably because if people want that kind of control over wrapping they use a block style to start with.

Any issues with using ''' block string for block commenting in yaml?

I have been using ''' for block comments in yaml. Like:
'''
This
is
a
comment
'''
I have noticed that this approach isn't one of the answers to the How do you block comment in yaml question. Is there a reason not to do this (other than terrible multi-line string formatting glitches in VIM)? Does it get loaded into memory, or is there something else that could be problematic?
YAML comments start with a #, separated from other tokens by whitespace, and terminate at the end of the line.
If you do:
'''
This
is
a
comment
'''
You specify a scalar node, that starts and ends with one (1) single quote. That is because in single quoted style scalar nodes, you can insert a single quote by escaping it with a single quote. Since YAML does line unwrapping the above loads as the string ' This is a comment ' (the string including the quotes).
However if you insert that as comment after a scalar node like 42 as in:
answer: 42 '''
  This
  is
  a
  comment
  '''
You still have valid YAML, but this will load e.g. in Python as a dict with a key answer and an associated value of 42 ''' This is a comment '''. A string, which would probably give you some error if you expected the integer value 42.
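Both cases can be reproduced with a small sketch. It uses Ruby's bundled YAML library (Psych) instead of Python, and it assumes the continuation lines of the second document are indented under the key; the variable names are illustrative.

```ruby
require 'yaml'

# A document consisting only of the ''' "comment".
standalone = "'''\nThis\nis\na\ncomment\n'''\n"

# The same text placed after a value; continuation lines are indented.
after_value = "answer: 42 '''\n  This\n  is\n  a\n  comment\n  '''\n"

as_scalar  = YAML.load(standalone)
as_mapping = YAML.load(after_value)

p as_scalar   # a string that includes the two escaped quotes
p as_mapping  # the value is a string, not the integer 42
```

In neither case is the text treated as a comment: it is parsed and kept in memory as part of the data.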
Based on the spec, use # only:
http://yaml.org/spec/1.2/spec.html#comment/
As to why? Short of 'Because they said so' I would guess that some of the readability of YAML is lost with multiline comments.
Your use of ''' is the standard for Python docstrings.

Antlr4 handling of yaml unquoted multi-line strings

I am trying to build a parser for a limited set of YAML syntax similar to what is shown below using Antlr 4.7:
name:
  last: Smith
  first: John
address:
  street: 123 Main St
    Suite 100
  city: Boston
  state: MA
  zip: 12345
I have a grammar (derived from the Python 3 grammar) that works correctly if I put quotes around the "value" strings but fails if I remove them. It seems that defining the "value" string so matching terminates before the next "tag:" portion of a new block or a "tag: " portion of a new assign statement is the trick.
Does anyone have any ideas or working samples that handle this use case?
It is the indentation of a non-empty line that should end the matching of a plain scalar. If that indentation is not more than the indentation of the current mapping, the scalar ends there.
For example:
mapping:
  key: value with
    multiple lines
  key2:
    other value
Here, the value with multiple lines ends at the line with key2:, because it is not indented more than the current mapping (i.e. the value of mapping: above). Of course, the last newline character and the indentation of key2: is not a part of that scalar's content.
In the YAML specification, this is handled by a production
s-indent(n) ::= s-space × n
Now in our case, the inner mapping has an indentation of n=2, so your scalar would be matched by something like
plain-scalar-part (s-indent(3) s-white* plain-scalar-part)*
(I don't know Antlr syntax, just assume these are all non-terminals). After the (possibly empty) first line, you match an indentation of more than the parent mapping (so 3 spaces in this case), then there might be even more whitespace (which is not part of the content), and then more content follows. For simplicity, I ignored possible empty lines.
This will not match the line key2: because it has too little indentation, which is how the matching of the scalar ends.
Now I do not know how to do something like s-indent(n) in Antlr, but the Python grammar should give you the right pointers.
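The indentation rule itself can be sketched outside of Antlr. The following Ruby helper is hypothetical (`read_plain_scalar` and its parameters are not from any grammar or library) and ignores empty lines inside the scalar, as the explanation above does; it only shows where the scalar stops.

```ruby
# Hypothetical helper: collects continuation lines of a plain scalar.
# lines         - all input lines
# start         - index of the first possible continuation line
# first_part    - scalar text found on the key's own line (may be empty)
# parent_indent - indentation of the mapping that holds the key
# Returns [scalar_text, index_of_first_unconsumed_line].
def read_plain_scalar(lines, start, first_part, parent_indent)
  parts = [first_part.strip]
  i = start
  while i < lines.length
    line = lines[i]
    indent = line[/\A */].length
    # Stop at an empty line (not handled in this sketch) or at a line
    # that is not indented more than the parent mapping.
    break if line.strip.empty? || indent <= parent_indent
    parts << line.strip
    i += 1
  end
  [parts.reject(&:empty?).join(" "), i]
end

lines = [
  "mapping:",
  "  key: value with",
  "    multiple lines",
  "  key2:",
  "    other value",
]

first_text, first_next   = read_plain_scalar(lines, 2, "value with", 2)
second_text, second_next = read_plain_scalar(lines, 4, "", 2)
```

In a lexer-based grammar the same comparison would typically live in the code that emits INDENT and DEDENT tokens, which is the part the Python grammar already demonstrates.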

What does $/ mean in Ruby?

I was reading about Ruby serialization (http://www.skorks.com/2010/04/serializing-and-deserializing-objects-with-ruby/) and came across the following code. What does $/ mean? I assume $ refers to an object?
array = []
$/="\n\n"
File.open("/home/alan/tmp/blah.yaml", "r").each do |object|
  array << YAML::load(object)
end
$/ is a pre-defined variable. It's used as the input record separator, and has a default value of "\n".
Functions like gets use $/ to determine how to separate the input. For example:
$/="\n\n"
str = gets
puts str
So you have to press Enter twice to end the input for str.
Reference: Pre-defined variables
This code is trying to read each object into an array element, so you need to tell it where one ends and the next begins. The line $/="\n\n" sets the string Ruby uses to break your file apart into records.
$/ is known as the "input record separator" and is the value used to split up your file when you are reading it in. By default this value is a newline, so when you read a file, each line is returned as one record. By setting this value, you are telling Ruby that a single newline does not end a record; it should use the given string instead.
For example, if I have a comma separated file, I can write $/="," then if I do something like your code on a file like this:
foo, bar, magic, space
I would get the records directly, without having to split again (note that each record keeps its trailing separator unless you chomp it):
["foo,", " bar,", " magic,", " space"]
So your line will look for two newline characters, and split on each group of two instead of on every newline. You will only get two newline characters following each other when one line is empty. So this line tells Ruby, when reading files, break on empty lines instead of every line.
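The effect of the separator can be tried without touching the global variable. This sketch passes the separator explicitly to String#each_line, which plays the same role that $/ plays as the default; the sample text and variable names are made up for the illustration.

```ruby
require 'yaml'

# Two small YAML documents separated by an empty line (sample data).
text = "a: 1\nb: 2\n\nc: 3\n"

# Split on "\n\n" instead of on every newline; each record keeps its separator.
records = text.each_line("\n\n").to_a
objects = records.map { |record| YAML.load(record) }

# The comma example from above: chomp(",") removes the trailing separator.
fields = "foo, bar, magic, space".each_line(",").map { |field| field.chomp(",") }

p records
p objects
p fields
```

Passing the separator as an argument keeps the change local, whereas assigning to $/ affects every later read in the program.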
I found in this page something probably interesting:
http://www.zenspider.com/Languages/Ruby/QuickRef.html#18
$/ # The input record separator (eg #gets). Defaults to newline.
The $ means it is a global variable.
This one is special, however, because it is used by Ruby itself: Ruby uses that variable as the input record separator.
For a full list with the special global variables see:
http://www.rubyist.net/~slagell/ruby/globalvars.html

String#split in Ruby not behaving as expected

File.open(path, 'r').each do |line|
  row = line.chomp.split('\t')
  puts "#{row[0]}"
end
path is the path of file having content like name, age, profession, hobby
I'm expecting output to be name only but I am getting the whole line.
Why is it so?
The question already has an accepted answer, but it's worth noting what the cause of the original problem was:
This is the problem part:
split('\t')
Ruby has several forms of quoted strings, which differ in ways that are usually useful.
Quoting from Ruby Programming at wikibooks.org:
...double quotes are designed to interpret escaped characters such as new lines and tabs so that they appear as actual new lines and tabs when the string is rendered for the user. Single quotes, however, display the actual escape sequence, for example displaying \n instead of a new line.
Read further in the linked article to see the use of %q and %Q strings. Or Google for "ruby string delimiters", or see this SO question.
So '\t' is interpreted as "backslash+t", whereas "\t" is a tab character.
String#split will also take a Regexp, which in this case might remove the ambiguity:
split(/\t/)
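A short sketch of the difference, assuming the file's fields are separated by real tab characters (the sample line is invented for the example):

```ruby
# One line as it might be read from the file, with tab-separated fields.
line = "name\tage\tprofession\thobby\n"

single = line.chomp.split('\t')  # separator is backslash + t: never found
double = line.chomp.split("\t")  # separator is a real tab character
regexp = line.chomp.split(/\t/)  # in a regexp, \t always means a tab

p '\t'.length  # two characters
p "\t".length  # one character
p single[0]    # the whole line
p double[0]    # just the first field
```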
Your question was not very clear.
split("\n") - if you want to split by lines
split - if you want to split by whitespace
As far as I can tell, you do not need chomp in those two cases, because split already drops the trailing "\n".
