YAML - keep text formatting in new document

What I have:
a: some meta info
b: more meta info
c: actual nicely
formatted text that
has line breaks
I'm looking to move c into its own YAML document, using the document separator ---, like this:
a: some meta info
b: more meta info
---
actual nicely
formatted text that has line breaks
and so on
But when I use the second layout, I lose formatting such as the line breaks.
Is there a way I can use the latter layout and keep the line breaks?
I'm currently using ruamel.yaml library to read this yaml and below function to load my file.
yaml.load_all(f, Loader=yaml.Loader)

If you want the line breaks to be in your loaded value, I recommend making the second document a literal style scalar.
If you have input.yaml:
a: some meta info
b: more meta info
--- |
actual nicely
formatted text that
has line breaks
then this program:
from pathlib import Path
import ruamel.yaml
path_name = Path('input.yaml')
yaml = ruamel.yaml.YAML()
for data in yaml.load_all(path_name):
    print(repr(data))
gives:
ordereddict([('a', 'some meta info'), ('b', 'more meta info')])
'actual nicely\nformatted text that\nhas line breaks\n'
Please note that some YAML libraries do (incorrectly) assume that a literal style scalar at the root level of a document needs to be indented.

Related

How do I save a multiline string to a YAML file?

I have several YAML files that store SQL scripts in them (as multiline strings). I have a Python script that takes all of these scripts and aggregates them into a single table.
Whenever I make an update to a YAML file, it converts the SQL text to a regular string (with \n's to indicate line breaks). Is there a way to preserve the multiline formatting when I make updates to the YAML file?
For multi-line scalars you can use block scalars. The pipe character | denotes the start of a literal block, which keeps the line breaks.
For example:
Data: |
  Some data, here and a special character like ':'
  Another line of data on a separate line
You can also check the YAML Multiline reference site.

MongoDB Error parsing YAML Config illegal map value for replica set

Here is my /etc/mongodb.conf - using MongoDB 3.6.
I'm having a challenge with the config file being parsed when starting mongod.
I have a single space after each colon and two spaces on each new line
I took the replica set example from mongoDB docs here: https://docs.mongodb.com/manual/reference/configuration-options/#replication-options
dbpath=/home/ubuntu/data/db
logpath=/home/ubuntu/data/db/log/mongo.log
logappend=true
journal=true

replication:
  replSetName: rep
net:
  bindIp: 127.0.0.1
  port: 27017
Starting mongod with this file fails with:
Error parsing YAML config file: yaml-cpp: error at line 6, column 12: illegal map value
try 'mongod --help' for more information
I don't know what you think the first four lines do, but they are certainly not YAML; they use a format resembling .properties files, with an = separating the property name from the value.
Since = is simply content in YAML, the parser reads the first six lines as a single multiline scalar, so the value of those lines in YAML is the scalar
dbpath=/home/ubuntu/data/db logpath=/home/ubuntu/data/db/log/mongo.log logappend=true journal=true
replication
(Single line breaks are folded into a space, an empty line generates a line break.)
The error happens because YAML does not allow a multiline scalar to be an implicit mapping key. Implicit keys are scalars that precede a : on the same line and thereby form a mapping key.
You fix the error by removing the first four lines, or transforming them into proper YAML. It is unclear what your intention with those lines is since not every name has a corresponding setting in the documentation you linked.

Multi-line in a sequence in YAML

I would like to have multiple lines in a sequence in YAML. This is how I do it, but I have issues with parsing it in python:
Element: |
  - multiple lines
  come here
Doing it this way, when I parse it with Python, I still see the - in the parsed data. It seems that YAML does not understand this is a list.
Your input is not a list. YAML has two kinds of collections: mappings (constructed as a Python dict) and sequences (constructed as a Python list).
Normally - is the block sequence entry indicator, but because the | after the key Element starts a literal block scalar, everything indented below it is part of that scalar (constructed as a Python string), including the -.
What you want to do is bring the indicator outside of the literal scalar:
Element:
- |
  multiple lines
  come here
If you load that in Python in a variable data then data['Element'][0] will be the string 'multiple lines\ncome here\n'. That is: every newline in your literal scalar will be a newline in your string, and there will be a single final newline on that string independent of how many empty lines follow (this is clipping). If you want the end to have no newline, then use |- (stripping), and if you want all newlines until outdenting then use |+ (keeping). Those additions to the | are called chomping indicators.
If you have the above in a file called input.yaml:
from pathlib import Path
import ruamel.yaml
path = Path('input.yaml')  # named 'path' so it does not shadow the built-in input()
yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load(path)
print(f'{data["Element"][0]!r}')  # print the representation, so you can see where the newlines are
which gives:
'multiple lines\ncome here\n'
Use this syntax (for the yaml Python package, at least)
stuff:
- 'this is a multiline
string'
In other words, quote the string and unindent its continuation.

Remove YAML header from markdown file

How to remove a YAML header like this one from a text file in Ruby:
---
date: 2013-02-02 11:22:33
title: "Some Title"
Foo: Bar
...
---
(The YAML is surrounded by three dashes (-))
I tried
text.gsub(/---(.*)---/, '') # text is the variable which contains the full text of the file
but it didn't work.
The solution mentioned above will match from the first occurrence of --- to the last occurrence of --- and everything in between. That means if --- appears later on in your file you'll strip out not only the header, but some of the rest of the content.
This regex will only remove the yaml header:
/\A---(.|\n)*?---/
The \A anchors the match at the very beginning of the text, so only a header at the start of the file is matched, and the ? makes the * non-greedy, so matching stops at the second ---.
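The same idea in Python, as a minimal sketch with the standard re module (re.DOTALL plays the role of (.|\n) in the Ruby regexes). It contrasts the non-greedy pattern with a greedy one on a made-up document whose body contains another --- later on.

```python
import re

text = '---\ntitle: "Some Title"\n---\nBody text\n\n---\n\nMore text\n'

# Non-greedy: \A anchors at the start, .*? stops at the first closing ---.
non_greedy = re.sub(r"\A---.*?---\n?", "", text, count=1, flags=re.DOTALL)

# Greedy: .* runs to the LAST --- in the text and swallows part of the body.
greedy = re.sub(r"\A---.*---\n?", "", text, count=1, flags=re.DOTALL)

print(repr(non_greedy))
print(repr(greedy))
```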
Found a solution, regex should be:
/---(.|\n)*---/

Escape an & (ampersand) at the start of a YAML entry?

An ampersand at the start of a YAML entry is normally seen as a label for a set of data that can be referenced later. How do you escape a legitimate ampersand at the start of a YAML entry. For example:
---
- news:
    news_text: &ldquo;Text!&rsquo;
I don't want &ldquo; to be treated as a label (anchor) in the YAML file; when I parse the file, news_text should come back with the &ldquo; in the entry.
Just put quotes around the text
require 'yaml'
data = <<END
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
END
puts YAML::load(data).inspect
# produces => [{"news"=>{"news_text"=>"&ldquo;Text!&rsquo;"}}]
You probably can enclose the text in quotes:
---
- news:
    news_text: "&ldquo;Text!&rsquo;"
Besides, you can probably just as well use the proper characters there:
---
- news:
    news_text: “Text!’
Putting escapes specific to a totally different markup language into a document written in another markup language seems ... odd to me, somehow.
Or you could put the string on the next line, with a '>' or '|' where the string used to be. With '|' the parser keeps your line breaks, while '>' folds them into spaces, turning the text into one long line.
- news:
    news_text: >
      &ldquo;Text!&rsquo;
Putting the entire string in single quotes would do what you want:
---
- news:
    news_text: '&ldquo;Text!&rsquo;'
But, I think that any yaml library should be smart enough to do that for you?

Resources