How do I save a multiline string to a YAML file?

I have several YAML files that store SQL scripts in them (as multiline strings). I have a Python script that takes all of these scripts and aggregates them into a single table.
Whenever my script writes an update back to a YAML file, the SQL text is converted to a regular string (with \n escapes marking the line breaks). Is there a way to preserve the multiline formatting when the YAML file is updated?

For multi-line scalars, you can use block scalars. The pipe character | denotes the start of a literal block, and the lines that belong to it must be indented.
For example:
Data: |
  Some data, here and a special character like ':'
  Another line of data on a separate line
See also YAML Multiline for an overview of YAML's multi-line styles.
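Writing the block by hand covers the YAML side. The \n escapes in the question appear when a Python script writes the file back. Here is a minimal sketch that assumes the script uses PyYAML, which is third-party (`pip install pyyaml`); the question does not say which library it uses. It registers a representer that requests the literal block style `|` for any string containing a newline. The name `str_presenter` and the sample query are hypothetical.

```python
import yaml  # PyYAML (third-party)


def str_presenter(dumper, data):
    """Request the literal block style (|) for multi-line strings only."""
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register for both yaml.dump() (Dumper) and yaml.safe_dump() (SafeDumper).
# Note: this modifies the dumper classes globally for the whole process.
yaml.add_representer(str, str_presenter)
yaml.add_representer(str, str_presenter, Dumper=yaml.SafeDumper)

# Hypothetical SQL script, as it might be stored in one of the YAML files.
sql = "SELECT id, name\nFROM users\nWHERE active = 1\n"
print(yaml.safe_dump({"query": sql}, sort_keys=False))
```

PyYAML treats the style as a request, not a guarantee. If any line ends in trailing spaces, it silently falls back to a quoted style, so strip trailing whitespace from the SQL first. Strings without a final newline are emitted as `|-`.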

Related

Converting a TXT file with double quotes to a pipe-delimited format using sed

I'm trying to convert TXT files into pipe-delimited text files.
Let's say I have a file called sample.csv:
aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z
I'd like to convert this into an output that looks like this:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
Now after tons of searching, I have come the closest using this sed command:
sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\|/g;s/"([^"]+)"$/\1/;s/\v/"/g'
However, the output that I received was:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|pppqqq|rrr" sss|ttt,"uuu|Z
The expected value for the 9th column is ppp"qqq", but the command removed the double quotes and produced pppqqq.
I have been playing around with this for a while, but to no avail.
Any help regarding this would be highly appreciated.
As suggested in the comments, sed and other Unix text tools are not recommended for CSV this complex. It is much better to use a dedicated CSV parser, like str_getcsv in PHP:
$s = 'aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z';
echo implode('|', str_getcsv($s));
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|nnnooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
The problem with sample.csv is that it mixes non-quoted fields (containing quotes) with fully quoted fields (that should be treated as such).
You can't have both at the same time. Either all fields are (treated as) unquoted and quotes are preserved, or all fields containing a quote (or separator) are fully quoted and the quotes inside are escaped with another quote.
So, sample.csv should become:
"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z
to give you the desired result (using a csv parser):
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
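The same point can be checked with Python's standard csv module, used here in place of the PHP parser above. Once sample.csv is rewritten in the fully quoted form, a default `csv.reader` (comma separator, `"` quote character, `""` as an escaped quote) produces exactly the desired fields. The function name `to_pipe_delimited` is hypothetical.

```python
import csv
import io

# The fully quoted version of sample.csv from the answer above.
line = '"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z'


def to_pipe_delimited(csv_text: str) -> list[str]:
    """Parse standard CSV and join each record's fields with '|'."""
    reader = csv.reader(io.StringIO(csv_text))  # default dialect handles "" escapes
    return ["|".join(row) for row in reader]


for record in to_pipe_delimited(line):
    print(record)
```

Joining with `|` does not quote anything. A field that itself contains a `|` would therefore make the output ambiguous, although the sample data has none. For a real file, open it with `open("sample.csv", newline="")` and pass the file object to `csv.reader`.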
I had the same problem.
I got the right result with https://www.papaparse.com/demo.
Papa Parse is FOSS on GitHub, so you can check how it works.
With the input [ "aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z ]
the result appears in the browser console (screenshot: https://i.stack.imgur.com/OB5OM.png).

Multi-line in a sequence in YAML

I would like to have multiple lines in a sequence in YAML. This is how I do it, but I have issues with parsing it in python:
Element: |
  - multiple lines
    come here
Doing it this way, when I parse it with Python, I still see the - in the parsed data. It seems that YAML does not understand this is a list.
Your input is not a list. For collections, YAML only knows about mappings (constructed as a Python dict) and sequences (constructed as a Python list).
Normally - is the block sequence entry indicator. But because the | starts a block-style literal scalar as the value for the key Element, everything indented after it is part of this scalar (constructed as a Python string), including the -.
What you want to do is bring the indicator outside of the literal scalar:
Element:
- |
  multiple lines
  come here
If you load that in Python in a variable data then data['Element'][0] will be the string 'multiple lines\ncome here\n'. That is: every newline in your literal scalar will be a newline in your string, and there will be a single final newline on that string independent of how many empty lines follow (this is clipping). If you want the end to have no newline, then use |- (stripping), and if you want all newlines until outdenting then use |+ (keeping). Those additions to the | are called chomping indicators.
If you have the above in a file called input.yaml:
from pathlib import Path
import ruamel.yaml
path = Path('input.yaml')
yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load(path)  # YAML.load() also accepts a pathlib.Path
print(f'{data["Element"][0]!r}')  # print the representation, so you can see where the newlines are
which gives:
'multiple lines\ncome here\n'
Use this syntax (for the yaml Python package, at least)
stuff:
- 'this is a multiline
string'
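In other words, quote the string and unindent its continuation line. Note that a line break inside a quoted scalar is folded into a single space, so this loads as the one-line string 'this is a multiline string' rather than keeping the newline.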

BASH: Replacing special character groups

I have a rather tricky request...
We use a special application which is connected to an Oracle database. For control reasons, the application uses special characters which are defined by the application and saved in a LONG field of the database.
My task is to query the LONG field periodically and check for changes. To do that, I use a bash script to write the content to a file and compare the old and the new file with md5sum.
When there's a difference, I want to send the old file via mail. The problem is that the old file contains these special characters, and I don't know how to replace them with, for example, a string which describes them.
I tried to replace them based on their ASCII code, but this didn't work. I've also tried to replace them by their appearance in the file (they look like this: ^P). This didn't work either.
When I view the file in a text editor like nano, the characters are visible as described above. But when I use cat on the file, the content is only displayed up to the first such control character.
As far as I know, there is no way to replace them while querying the database, because the content is in a LONG field.
I hope you can help me.
Thank you in advance.
Marco
^P is the Control-P character, which is decimal 16 or hexadecimal 0x10, also known as the Data Link Escape (DLE) character in ASCII.
To replace all occurrences of 0x10 in a file with another string we can use our friend gsed:
gsed "s/\x10/Data Link Escape/g" yourfile.txt
This should replace every byte with the hex value 0x10 with the text string "Data Link Escape". You'll probably want to use a different string; this is just an example.
Depending on the system you're using, you may be able to use the standard sed command if your version of sed recognizes the \xNN single-character escape codes. If there are multiple hex characters you need to replace, you may want to create a file containing your sed commands, one for each hexadecimal character you need to replace, and pass that file to sed or gsed with -f; consult the sed or gsed man pages for details.
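If the mail is assembled in Python rather than bash, the one-command-per-character idea can be expressed as a lookup table applied in a single pass. This is a minimal sketch. The label table `LABELS` and the function `describe_controls` are hypothetical, and the labels simply use the standard ASCII names (0x10 is DLE, 0x12 is DC2).

```python
import re

# Hypothetical label table: ASCII control byte -> descriptive text.
LABELS = {
    0x10: "<DLE>",  # ^P, Data Link Escape
    0x12: "<DC2>",  # ^R, Device Control 2
}


def describe_controls(data: bytes, labels: dict[int, str]) -> bytes:
    """Replace every byte listed in `labels` with its label, in one pass."""
    if not labels:
        return data
    # Build a byte character class such as b"[\x10\x12]" from the table keys.
    char_class = b"[" + b"".join(re.escape(bytes([b])) for b in labels) + b"]"
    return re.sub(char_class, lambda m: labels[m.group()[0]].encode("ascii"), data)


# For a real file you might use: describe_controls(Path("old.txt").read_bytes(), LABELS)
print(describe_controls(b"start\x10middle\x12end", LABELS))
```

A single regex pass avoids a subtle problem with chaining replacements: if a label ever contained one of the other control bytes, a later replacement would not rewrite text that an earlier one inserted.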
Share and enjoy.
You can use xxd to change the string to its hex representation, then use xxd -r to convert back.
Or, you can use uuencode and uudecode.
One option is to run the file through cat -v. This replaces nonprinting characters with visible representations (using the ^ notation for control characters):
$ echo $'\x10\x12\x13\x14\x16' | cat -v
^P^R^S^T^V
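The cat -v approach can also be reproduced in Python, for example when building the mail body there. The sketch below follows GNU cat -v's notation as I understand it: ^X for control characters, ^? for DEL, an M- prefix for bytes >= 0x80, and TAB and newline left unchanged. The names `cat_v` and `_caret` are hypothetical.

```python
def _caret(value: int) -> str:
    """Render one 7-bit value: controls as ^@..^_, DEL as ^?, printable as-is."""
    if value < 0x20:
        return "^" + chr(value + 0x40)  # e.g. 0x10 -> "^P"
    if value == 0x7F:
        return "^?"
    return chr(value)


def cat_v(data: bytes) -> str:
    """Approximate `cat -v`: make nonprinting bytes visible, keep TAB and LF."""
    out = []
    for byte in data:
        if byte in (0x09, 0x0A):  # cat -v leaves TAB and newline alone
            out.append(chr(byte))
        elif byte >= 0x80:  # high bytes get an "M-" prefix
            out.append("M-" + _caret(byte & 0x7F))
        else:
            out.append(_caret(byte))
    return "".join(out)


print(cat_v(b"\x10\x12\x13\x14\x16"))  # prints ^P^R^S^T^V
```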

sed/awk/bash to replace text between two strings with external file contents

I'm looking to write a script/command that'll take inputFile1, look for a specific start and end string in it, and replace all the text in between them with the full contents of inputFile2.
Ideally, but not mandatory, this should work without a need to escape special characters, so I can put the strings in variables that get called by the script (that way I could easily reuse it multiple times).
As an example, I have file inputYes.txt with contents:
DummyOne
Start
That
What
Yes
End
DummyTwo
And inputNo.txt with contents:
This
Why
Not
And I want the script to search inputYes.txt for the strings Start and End, and replace all the text in between with the contents of inputNo.txt, and write to the file.
So after running it, inputYes.txt should read
DummyOne
Start
This
Why
Not
End
DummyTwo
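sed '/end_string/rinputFile2
/start_string/,/end_string/d' inputFile1
This deletes the start_string and end_string lines along with everything between them, and prints inputFile2 in their place. To keep the marker lines, as in the expected output above, exclude them from the deletion and read the file after the start line:
sed -e '/start_string/,/end_string/{/start_string/!{/end_string/!d;};}' -e '/start_string/r inputFile2' inputFile1
With GNU sed, add -i to write the result back to inputFile1.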
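The same marker-preserving replacement can be sketched in Python. It uses plain substring matching, so the start and end strings need no escaping, which was the question's wish. Like a sed range, matching restarts if the start string appears again after an end line. If no end line follows, everything after the start line is dropped. The function name `replace_between` is hypothetical.

```python
def replace_between(text: str, start: str, end: str, replacement: str) -> str:
    """Replace the lines strictly between a line containing `start` and the
    next line containing `end` with `replacement`, keeping both marker lines."""
    if replacement and not replacement.endswith("\n"):
        replacement += "\n"  # keep the end marker on its own line
    out = []
    inside = False
    for line in text.splitlines(keepends=True):
        if not inside:
            out.append(line)
            if start in line:
                inside = True
                out.append(replacement)
        elif end in line:
            inside = False
            out.append(line)
        # lines strictly inside the range are dropped
    return "".join(out)


yes = "DummyOne\nStart\nThat\nWhat\nYes\nEnd\nDummyTwo\n"
no = "This\nWhy\nNot\n"
print(replace_between(yes, "Start", "End", no), end="")
```

To edit the file in place, read inputYes.txt and inputNo.txt with `pathlib.Path(...).read_text()`, then write the result back with `write_text()`.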

Preserve new lines in YAML

How do I format a YAML document like this so that PyYAML can parse it properly?
Data: Some data, here and a special character like ':'
Another line of data on a separate line
I know that the ':' character is special so I have to surround the whole thing in quotations like so:
Data: "Some data, here and a special character like ':'
Another line of data on a separate line"
And in order to add a new line, I have to add '\n':
Data: "Some data, here and a special character like ':'\n
Another line of data on a separate line"
Is there anyway to format the YAML document so I don't have to add the '\n's in order to have a new line?
For multi-line scalars, you can use blocks. The character | denotes the start of a block. Use:
Data: |
  Some data, here and a special character like ':'
  Another line of data on a separate line
If the extra trailing newline that NullUserException's solution adds is a problem, you should use |- instead:
Data: |-
  Some data, here and a special character like ':'
  Another line of data on a separate line
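The effect of the chomping indicators discussed here can be compared directly by loading them. This sketch assumes PyYAML is installed and reuses the example text. It covers clip (`|`), strip (`|-`) and keep (`|+`). The keys are hypothetical, and the final `end` key exists only to show where a kept trailing blank line stops.

```python
import yaml  # PyYAML (third-party)

doc = """\
clip: |
  Some data, here and a special character like ':'
  Another line of data on a separate line

strip: |-
  Some data, here and a special character like ':'
  Another line of data on a separate line

keep: |+
  Some data, here and a special character like ':'
  Another line of data on a separate line

end: 1
"""

data = yaml.safe_load(doc)
for key in ("clip", "strip", "keep"):
    # repr() makes the trailing newlines visible
    print(key, repr(data[key]))
```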

Resources