Multi-line in a sequence in YAML - syntax

I would like to have multiple lines in a sequence in YAML. This is how I do it, but I have issues with parsing it in python:
Element: |
  - multiple lines
    come here
Doing it this way, when I parse it with Python, I still see the - in the parsed data. It seems that YAML does not understand this is a list.

Your input is not a list. YAML knows about mappings (constructed as a Python dict), sequences (constructed as a Python list), and scalars.
Normally - is the block sequence entry indicator. But because the | after the key Element starts a literal block scalar on the first line, everything indented that follows it is part of that scalar (constructed as a Python string).
What you want to do is bring the indicator outside of the literal scalar:
Element:
- |
  multiple lines
  come here
If you load that in Python in a variable data then data['Element'][0] will be the string 'multiple lines\ncome here\n'. That is: every newline in your literal scalar will be a newline in your string, and there will be a single final newline on that string independent of how many empty lines follow (this is clipping). If you want the end to have no newline, then use |- (stripping), and if you want all newlines until outdenting then use |+ (keeping). Those additions to the | are called chomping indicators.
If you have the above in a file called input.yaml:
from pathlib import Path
import ruamel.yaml
path = Path('input.yaml')  # avoid the name "input", which shadows the Python builtin
yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load(path)
print(f'{data["Element"][0]!r}') # print the representation, so you can see where the newlines are
which gives:
'multiple lines\ncome here\n'
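The difference between the two layouts, and the effect of the chomping indicators, can be checked with a short script. This is a minimal sketch that assumes the PyYAML package (import yaml) is installed instead of ruamel.yaml; load_block is a hypothetical helper name, and the key next is only there to end the scalar by outdenting.

```python
import yaml  # PyYAML; assumed to be installed

# The "-" is inside the literal scalar, so the value is one string.
inside = yaml.safe_load("Element: |\n  - multiple lines\n    come here\n")
# The "-" is outside the literal scalar, so the value is a list with one string.
outside = yaml.safe_load("Element:\n- |\n  multiple lines\n  come here\n")

print(repr(inside["Element"]))   # a str that still contains "- "
print(repr(outside["Element"]))  # a list holding one str


def load_block(indicator):
    """Load two text lines followed by two empty lines, using the given block header."""
    doc = f"key: {indicator}\n  multiple lines\n  come here\n\n\nnext: end\n"
    return yaml.safe_load(doc)["key"]


clipped = load_block("|")    # clipping: a single final newline
stripped = load_block("|-")  # stripping: no final newline
kept = load_block("|+")      # keeping: all trailing newlines
for value in (clipped, stripped, kept):
    print(repr(value))
```

Printing with repr makes the trailing newlines visible, which is the only place where the three variants differ.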

Use this syntax (for the yaml Python package, at least)
stuff:
- 'this is a multiline
string'
In other words, quote the string and unindent its continuation.
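Note that a quoted scalar spread over two lines does not keep the line break: it is folded into a single space. A minimal sketch, assuming PyYAML is installed; here the continuation line is indented, which is a choice made for this example and not part of the answer above.

```python
import yaml  # PyYAML; assumed to be installed

# A single-quoted scalar that continues on the next line.
doc = "stuff:\n  - 'this is a multiline\n    string'\n"
data = yaml.safe_load(doc)

# The line break inside the quotes is folded into one space.
print(repr(data["stuff"][0]))
```

If the newline itself has to survive loading, a literal block scalar (|) is the style to use.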

Related

How do I save a multiline string to a YAML file?

I have several YAML files that store SQL scripts in them (as multiline strings). I have a Python script that takes all of these scripts and aggregates them into a single table.
Whenever I make an update to a YAML file, it converts the SQL text to a regular string (with \n's to indicate line breaks). Is there a way to preserve the multiline formatting when I make updates to the YAML file?
For multi-line scalars, you can use block scalars. The pipe character | denotes the start of a literal block.
For example:
Data: |
  Some data, here and a special character like ':'
  Another line of data on a separate line
Also you can check the YAML Multiline
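The question is about writing the file back, so the dumper has to be told to emit literal blocks. One possible approach, sketched here under the assumption that the files are written with PyYAML, is a custom string representer. LiteralDumper, represent_str and the sample data are hypothetical names made up for this example.

```python
import yaml  # PyYAML; assumed to be installed


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass; leaves the global SafeDumper unchanged."""


def represent_str(dumper, value):
    # Ask for literal block style only when the string contains a line break.
    style = "|" if "\n" in value else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", value, style=style)


LiteralDumper.add_representer(str, represent_str)

scripts = {
    "name": "daily_totals",
    "sql": "SELECT day, SUM(amount)\nFROM sales\nGROUP BY day\n",
}
text = yaml.dump(scripts, Dumper=LiteralDumper, default_flow_style=False)
print(text)
```

The style is only a request: PyYAML may still fall back to a quoted string for values it cannot write as a block, for example lines that end in a space.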

YAML - keep text formatting in new document

What I have:
a: some meta info
b: more meta info
c: actual nicely
formatted text that
has line breaks
I'm looking to move c to a new YAML document by using the document separator ---:
a: some meta info
b: more meta info
---
actual nicely
formatted text that has line breaks
and so on
But when I use the second alternative, I lose formatting such as newlines.
Is there a way I can use the latter YAML format and keep the line breaks?
I'm currently using the ruamel.yaml library to read this YAML, and the call below to load my file:
yaml.load_all(f, Loader=yaml.Loader)
If you want the line breaks to be in your loaded value, I recommend making the second document a literal style scalar.
If you have input.yaml:
a: some meta info
b: more meta info
--- |
actual nicely
formatted text that
has line breaks
then this program:
from pathlib import Path
import ruamel.yaml
path_name = Path('input.yaml')
yaml = ruamel.yaml.YAML()
for data in yaml.load_all(path_name):
    print(repr(data))
gives:
ordereddict([('a', 'some meta info'), ('b', 'more meta info')])
'actual nicely\nformatted text that\nhas line breaks\n'
Please note that some YAML libraries do (incorrectly) assume that a literal style scalar at the root level of a document needs to be indented.
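For a library that expects the indentation, indenting the body of the root-level literal scalar is a portable workaround, since the indentation is not part of the loaded value. A minimal sketch, assuming PyYAML is installed and using an in-memory string instead of input.yaml:

```python
import yaml  # PyYAML; assumed to be installed

# Second document: a literal scalar at the root, with an indented body.
source = (
    "a: some meta info\n"
    "b: more meta info\n"
    "--- |\n"
    "  actual nicely\n"
    "  formatted text that\n"
    "  has line breaks\n"
)
documents = list(yaml.safe_load_all(source))
for document in documents:
    print(repr(document))
```

The first document loads as a dict and the second as one string with its line breaks intact.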

Yaml - multi line syntax without delimiter

Is it possible in Yaml to have multi-line syntax for strings without an additional character generated between newlines?
Folded (>) syntax puts spaces, literal syntax (|) puts newlines between lines.
The summary here does not give a solution: In YAML, how do I break a string over multiple lines?.
E.g.
>-
line1_
line2
generates line1<space>line2 - I would like to have line1_line2 without additional token.
Use a double-quoted string:
"line1_\
line2"
By escaping the newline character, it is completely removed instead of being translated into a space. It is not possible to do this with block scalars because they have no escape sequences.
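The two behaviours can be compared directly. A minimal sketch, assuming PyYAML is installed; the continuation lines are indented here, and that leading whitespace is dropped by the parser in both cases.

```python
import yaml  # PyYAML; assumed to be installed

# Folded block scalar: the line break becomes a space.
folded = yaml.safe_load(">-\n  line1_\n  line2\n")

# Double-quoted scalar: the backslash escapes the line break, so nothing is inserted.
escaped = yaml.safe_load('"line1_\\\n  line2"\n')

print(repr(folded))
print(repr(escaped))
```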

GoldParser: Accept programs not ending with an empty line

I'm rewriting a GoldParser grammar for VBScript. In VBScript, statements are terminated by either a newline or ':'. Therefore I use the following terminal:
NewLine = {All Newline}
| ':'
Because every statement has to end with the NewLine terminal, only programs ending with an empty line are accepted. How can I extend the NewLine terminal to also accept programs not ending with an empty line? I tried the following:
NewLine = {All Newline}
| ':'
| {EOF}
This does not work because the {EOF} (End of File) group does not exist.
EOF is a special token and I'm not aware of any syntax allowing you to use it in a production rule. It is emitted when the tokenizer receives no more data, and as such it is not a control character you could use in a terminal definition either.
That being said, you have different possibilities to parse the (strictly speaking invalid) input. The simplest may be to just append a newline at the end of the string or text being tokenized. While this will not make it parse correctly in the GOLD Builder test window, it will make your code process the data as expected and it will not add complexity to the grammar.
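The workaround of appending a newline is independent of the parser engine. As an illustration only, here it is in Python; ensure_trailing_newline is a hypothetical helper name, and the call into the actual GOLD engine is left out because it depends on the engine in use.

```python
def ensure_trailing_newline(source: str) -> str:
    """Return source with a final newline so the last statement is terminated."""
    if source.endswith(("\n", "\r")):
        return source  # already terminated, leave it alone
    return source + "\n"


program = ensure_trailing_newline('MsgBox "done"')
print(repr(program))
```

The returned string is what would be handed to the tokenizer in place of the raw input.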

Escape an & (ampersand) at the start of a YAML entry?

An ampersand at the start of a YAML entry is normally seen as a label for a set of data that can be referenced later. How do you escape a legitimate ampersand at the start of a YAML entry? For example:
---
- news:
    news_text: &ldquo;Text!&rsquo;
I am looking to not have &ldquo be a label within the YAML file, but rather, when I parse the YAML file, to have the news_text come back with the &ldquo; in the entry.
Just put quotes around the text
require 'yaml'
data = <<END
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
END
puts YAML::load(data).inspect
# produces => [{"news"=>{"news_text"=>"&ldquo;Text!&rsquo;"}}]
You probably can enclose the text in quotes:
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
Besides, you can probably just as well use the proper characters there:
---
- news:
    news_text: “Text!’
Putting escapes specific to a totally different markup language into a document written in another markup language seems ... odd to me, somehow.
Or you could put the string on the next line, if you put a '>' or '|' at the spot where the string used to be. Using the '|' character your parser will keep your custom line breaks, while '>' turns it into one long string, ignoring line breaks.
- news:
    news_text: >
      &ldquo;Text!&rsquo;
Putting the entire string in single quotes would do what you want:
---
- news:
    news_text: '&ldquo;Text!&rsquo;'
But, I think that any yaml library should be smart enough to do that for you?
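Whether a library quotes such a value on output can be checked by dumping it and loading it back. A minimal sketch in Python, assuming PyYAML is installed; other libraries may choose a different quoting style.

```python
import yaml  # PyYAML; assumed to be installed

entry = [{"news": {"news_text": "&ldquo;Text!&rsquo;"}}]

# Dump the data; a leading "&" cannot be written as a plain scalar.
text = yaml.safe_dump(entry, default_flow_style=False)
print(text)

# Load it back to confirm that the value survives the round trip.
reloaded = yaml.safe_load(text)
print(reloaded == entry)
```

In the dumped text the value appears in quotes, so the leading ampersand is not read as an anchor when the file is loaded again.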

Resources