Keep text delimiters after !!str "some text" or !!str 'some text' - yaml

I have some data that is used to generate SQL, so it matters which text delimiters are used (single quotes ' delimit string literals but double quotes " delimit identifiers, at least in Oracle DB).
For a load procedure generator I used this:
someKey: !!str 'Some SQL text'
and expected that someKey would contain the whole string including single quotes: 'Some SQL text'.
However, js-yaml.safeLoad() interprets the data as Some SQL text which is not what I wanted.
The workaround is easy, I can put the literal into additional quotes:
someKey: "'Some SQL text'"
which gives the expected result. However, I am not sure why we need the !!str tag in YAML at all if it does virtually nothing (it is useful only for forcing number literals, true, false and null to be read as strings) and is almost the same as putting double quotes around the text.
I would prefer to post this in some YAML-spec-related forum, but there seems to be none.
Apart from the standard workaround, is there any trick that would do what I originally wanted, i.e. interpret any content after the object key as a string (trimming any leading and trailing spaces) without adding the extra double quotes?

In YAML, !!str is a predefined tag denoting a string scalar. If you specify it, even things that without the tag (or without quotes) would not be considered string scalars, like 123, True or null, are loaded as strings.
Some string scalars need quotes, e.g. if they start with a single or double quote, if special characters need backslash escaping, or if there is a ": " (colon, space) in the string (which could confuse the parser into interpreting the string scalar as a key-value pair).
However, putting !!str before something doesn't make it quoted (which should be obvious, as the tag doesn't say what kind of quoting to use, and single-quoted scalars have vastly different rules from double-quoted scalars).
Your workaround is not a workaround; it is just one of the ways YAML lets you specify a string scalar that starts and ends with a single quote. Another way is:
someKey: |-
  'Some SQL text'
Within literal block style scalars, quotes (single or double) are taken as is, even at the beginning of the scalar. The - makes sure you don't get an extra newline after the final '.
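To see the difference between the tag and the quoting styles, one can load each variant and print the result. The sketch below uses Ruby's bundled YAML library (Psych) instead of js-yaml; that substitution is an assumption made only to keep the example self-contained, and the variants are expected to load the same way in other conforming parsers.

```ruby
require 'yaml'

# Three ways of writing the value; only the last two keep the single quotes.
tagged  = YAML.safe_load("someKey: !!str 'Some SQL text'")
wrapped = YAML.safe_load(%q(someKey: "'Some SQL text'"))
literal = YAML.safe_load("someKey: |-\n  'Some SQL text'\n")

# !!str forces the string type; it does not change how the quotes are parsed.
forced = YAML.safe_load("n: !!str 123")

p tagged['someKey']   # "Some SQL text"
p wrapped['someKey']  # "'Some SQL text'"
p literal['someKey']  # "'Some SQL text'"
p forced['n']         # "123"
```

Only the double-quoted wrapper and the literal block keep the single quotes as part of the value.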

Related

Single backslash for Ruby gsub replacement value?

Does anyone know how to provide a single backslash as the replacement value in Ruby's gsub method? I thought using double backslashes for the replacement value would result in a single backslash but it results in two backslashes.
Example: "a\\b".gsub("\\", "\\\\")
Result: a\\b
I also get the same result using a block:
Example: "a\\b".gsub("\\"){"\\"}
Result: a\\b
Obviously I can't use a single backslash for the replacement value since that would just serve to escape the quote that follows it. I've also tried using single (as opposed to double) quotes around the replacement value but still get two backslashes in the result.
EDIT: Thanks to the commenters I now realize my confusion was with how the Rails console reports the result of the operation (i.e. a\\b). Although the strings 'a\b' and 'a\\b' appear to be different, they both have the same length:
'a\b'.length (3)
'a\\b'.length (3)
You can represent a single backslash by either "\\" or '\\'. Try this in irb, where
"\\".size
correctly outputs 1, showing that you indeed have only one character in this string, not 2 as you think. You can also do a
puts "\\"
Similarly, your example
puts("a\\b".gsub("\\", "\\\\"))
correctly prints
a\b
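The distinction between what a string contains and how the console displays it can be checked directly. This is a small sketch; the variable names are only illustrative.

```ruby
s = "a\\b" # three characters: a, backslash, b

# In a replacement *string*, gsub reads a pair of backslashes as one escaped backslash.
same = s.gsub("\\", "\\\\")

# A block's return value is inserted as is, so two backslashes stay two.
doubled = s.gsub("\\") { "\\\\" }

puts s.length        # 3
puts same            # a\b
p same               # "a\\b"  (inspect form, as shown by irb or the Rails console)
puts doubled.length  # 4
```

The block form is the simpler choice when the replacement must contain literal backslashes, because no second level of escaping applies to it.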

Difference between with or without quotes for a value defined in an Ansible variable [duplicate]

I am trying to write a YAML dictionary for internationalisation of a Rails project. I am a little confused though, as in some files I see strings in double-quotes and in some without. A few points to consider:
example 1 - all strings use double quotes;
example 2 - no strings (except the last two) use quotes;
the YAML cookbook says: Enclosing strings in double quotes allows you to use escaping to represent ASCII and Unicode characters. Does this mean I need to use double quotes only when I want to escape some characters? If yes - why do they use double quotes everywhere in the first example - only for the sake of unity / stylistic reasons?
the last two lines of example 2 use ! - the non-specific tag, while the last two lines of the first example don't - and they both work.
My question is: what are the rules for using the different types of quotes in YAML?
Could it be said that:
in general, you don't need quotes;
if you want to escape characters use double quotes;
use ! with single quotes, when... ?!?
After a brief review of the YAML cookbook cited in the question and some testing, here's my interpretation:
In general, you don't need quotes.
Use quotes to force a string, e.g. if your key or value is 10 but you want it to return a String and not a Fixnum, write '10' or "10".
Use quotes if your value includes special characters, (e.g. :, {, }, [, ], ,, &, *, #, ?, |, -, <, >, =, !, %, @, \).
Single quotes let you put almost any character in your string, and won't try to parse escape codes. '\n' would be returned as the string \n.
Double quotes parse escape codes. "\n" would be returned as a line feed character.
The exclamation mark introduces a tag, e.g. !ruby/sym to return a Ruby symbol.
Seems to me that the best approach would be to not use quotes unless you have to, and then to use single quotes unless you specifically want to process escape codes.
Update
"Yes" and "No" should be enclosed in quotes (single or double) or else they will be interpreted as TrueClass and FalseClass values:
en:
  yesno:
    'yes': 'Yes'
    'no': 'No'
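The rules in this answer can be checked by loading a small document and inspecting the resulting Ruby values. This is a minimal sketch using Ruby's bundled YAML library (Psych); the key names are made up for illustration.

```ruby
require 'yaml'

# The single-quoted heredoc delimiter keeps Ruby from touching the backslashes.
doc = YAML.safe_load(<<~'YAML')
  plain_int: 10
  quoted_int: '10'
  single: '\n'
  double: "\n"
  plain_yes: yes
  quoted_yes: 'yes'
YAML

p doc['plain_int']   # 10
p doc['quoted_int']  # "10"
p doc['single']      # "\\n"  (backslash followed by n, two characters)
p doc['double']      # "\n"   (one line feed character)
p doc['plain_yes']   # true
p doc['quoted_yes']  # "yes"
```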
While Mark's answer nicely summarizes when the quotes are needed according to the YAML language rules, I think what many of the developers/administrators are asking themselves, when working with strings in YAML, is "what should be my rule of thumb for handling the strings?"
It may sound subjective, but the number of rules you have to remember if you want to use quotes only when the language spec really requires them is somewhat excessive for something as simple as specifying one of the most common data types. Don't get me wrong, you will eventually remember them when working with YAML regularly, but what if you use it only occasionally and haven't developed the habit of writing YAML? Do you really want to spend time remembering all the rules just to specify a string correctly?
The whole point of a "rule of thumb" is to save cognitive effort and to handle a common task without thinking about it. Our "CPU" time can arguably be used for something more useful than handling strings correctly.
From this purely practical perspective, I think the best rule of thumb is to single-quote strings. The rationale behind it:
Single quoted strings work for all scenarios, except when you need to use escape sequences.
The only special character you have to handle within a single-quoted string is the single quote itself.
These are just 2 rules to remember for some occasional YAML user, minimizing the cognitive effort.
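The two rules above can be sketched as a tiny helper and checked by loading the result back. The helper name yaml_single_quote is hypothetical, and the sample values are chosen only for illustration.

```ruby
require 'yaml'

# Hypothetical helper: wrap a value in single quotes and double any embedded
# single quote, which is the only escaping a single-quoted scalar needs.
def yaml_single_quote(value)
  "'" + value.gsub("'", "''") + "'"
end

samples = ["it's 10:30", 'C:\temp', 'say "hi"', 'yes', '# not a comment']
results = samples.map do |text|
  line = "key: #{yaml_single_quote(text)}"
  loaded = YAML.safe_load(line)['key']
  puts "#{line}  round-trips: #{loaded == text}"
  loaded == text
end
```

Values containing line breaks or other characters that need escape sequences are outside this sketch; those are the cases where the double-quoted style is required.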
There have been some great answers to this question.
However, I would like to extend them and provide some context from the new official YAML v1.2.2 specification (released October 1st 2021), which is the "true source" for all things concerning YAML.
There are three different styles that can be used to represent strings, each of them with their own (dis-)advantages:
YAML provides three flow scalar styles: double-quoted, single-quoted and plain (unquoted). Each provides a different trade-off between readability and expressive power.
Double-quoted style:
The double-quoted style is specified by surrounding " indicators. This is the only style capable of expressing arbitrary strings, by using \ escape sequences. This comes at the cost of having to escape the \ and " characters.
Single-quoted style:
The single-quoted style is specified by surrounding ' indicators. Therefore, within a single-quoted scalar, such characters need to be repeated. This is the only form of escaping performed in single-quoted scalars. In particular, the \ and " characters may be freely used. This restricts single-quoted scalars to printable characters. In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces.
Plain (unquoted) style:
The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style. In addition to a restricted character set, a plain scalar must not be empty or contain leading or trailing white space characters. It is only possible to break a long plain line where a space character is surrounded by non-spaces.
Plain scalars must not begin with most indicators, as this would cause ambiguity with other YAML constructs. However, the :, ? and - indicators may be used as the first character if followed by a non-space “safe” character, as this causes no ambiguity.
TL;DR
With that being said, according to the official YAML specification one should:
Whenever applicable use the unquoted style since it is the most readable.
Use the single-quoted style (') if characters such as " and \ are being used inside the string to avoid escaping them and therefore improve readability.
Use the double-quoted style (") when the first two options aren't sufficient, i.e. in scenarios where more complex line breaks are required or non-printable characters are needed.
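As an illustration of the three styles, the same text can be written in each of them and loaded for comparison. This is a sketch using Ruby's bundled YAML library (Psych); the keys are made up for the example.

```ruby
require 'yaml'

# The single-quoted heredoc delimiter passes the YAML text through untouched.
doc = YAML.safe_load(<<~'YAML')
  plain: say "hi" \o/
  single: 'say "hi" \o/'
  double: "say \"hi\" \\o/"
  tabbed: "col1\tcol2"
  apostrophe: 'it''s'
YAML

# Plain and single-quoted need no escaping for " and \; double-quoted does.
puts doc['plain'] == doc['single']
puts doc['single'] == doc['double']

# Only the double-quoted style can express the tab through an escape sequence.
p doc['tabbed']

# Inside single quotes, the quote itself is written twice.
p doc['apostrophe']
```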
Strings in YAML only need quotes if (the beginning of) the value can be misinterpreted as a data type or the value contains ": " (a colon followed by a space, which could get misinterpreted as a key).
For example
foo: '{{ bar }}'
needs quotes, because it can be misinterpreted as datatype dict, but
foo: barbaz{{ bam }}
does not, since it does not begin with a critical char. Next,
foo: '123'
needs quotes, because it can be misinterpreted as datatype int, but
foo: bar1baz234
bar: 123baz
do not, because they cannot be misinterpreted as int.
foo: 'yes'
needs quotes, because it can be misinterpreted as datatype bool
foo: "bar:baz:bam"
is quoted so that the colons cannot be misinterpreted as key separators (strictly, this only happens when a colon is followed by a space).
These are just examples. Using yamllint helps you avoid starting values with a wrong token
foo#bar:/tmp$ yamllint test.yaml
test.yaml
3:4 error syntax error: found character '#' that cannot start any token (syntax)
and is a must when working productively with YAML.
Quoting all strings as some suggest, is like using brackets in python. It is bad practice, harms readability and throws away the beautiful feature of not having to quote strings.
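The cases listed in this answer can be verified by loading them and printing the class of each value. This is a sketch with Ruby's bundled YAML library (Psych); other parsers may resolve some plain scalars differently, so treat the result as specific to this loader.

```ruby
require 'yaml'

doc = YAML.safe_load(<<~'YAML')
  int_like: 123
  quoted_int: '123'
  mixed: 123baz
  template: barbaz{{ bam }}
  colons: bar:baz:bam
YAML

doc.each { |key, value| puts "#{key}: #{value.inspect} (#{value.class})" }

# A colon followed by a space inside a plain value is read as another key,
# which is not allowed at this position and makes the parser give up.
parse_failed =
  begin
    YAML.safe_load('foo: bar: baz')
    false
  rescue Psych::SyntaxError
    true
  end
puts "plain 'bar: baz' value rejected: #{parse_failed}"
```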
I had this concern when working on a Rails application with Docker.
My most preferred approach is to generally not use quotes. This includes not using quotes for:
variables like ${RAILS_ENV}
values separated by a colon (:) like postgres-log:/var/log/postgresql
other strings values
I, however, use double-quotes for integer values that need to be converted to strings like:
docker-compose version like version: "3.8"
port numbers like "8080:8080"
image "traefik:v2.2.1"
However, for booleans, floats, integers and other values that must keep their native type, please do not use double quotes, because quoting would turn them into strings.
Here's a sample docker-compose.yml file to explain this concept:
version: "3"
services:
  traefik:
    image: "traefik:v2.2.1"
    command:
      - --api.insecure=true # Don't do that in production
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
That's all.
I hope this helps
If you are trying to escape a string in pytest tavern, !raw could be helpful to avoid parsing of strings to yaml:
some: !raw "{test: 123}"
Check for more info:
https://tavern.readthedocs.io/en/latest/basics.html#type-conversions
Here's a small function (not optimized for performance) that quotes your strings with single quotes if needed and tests if the result could be unmarshalled into the original value: https://go.dev/play/p/AKBzDpVz9hk.
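Instead of testing for the rules, it feeds the unquoted value to the YAML unmarshaller and checks whether the unmarshalled value matches the original.
// yamlQuote returns value unchanged if it loads back as the same string when
// written as a plain scalar, and single-quoted (with ' doubled) otherwise.
// It needs the fmt and strings packages plus a YAML package providing Unmarshal.
func yamlQuote(value string) string {
	input := fmt.Sprintf("key: %s", value)
	var res struct {
		Value string `yaml:"key"`
	}
	if err := yaml.Unmarshal([]byte(input), &res); err != nil || value != res.Value {
		// Inside single quotes, a single quote is escaped by doubling it.
		quoted := strings.ReplaceAll(value, `'`, `''`)
		return fmt.Sprintf("'%s'", quoted)
	}
	return value
}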
version: "3.9"
services:
  seunggabi:
    image: seunggabi:v1.0.0
    command:
      api:
        insecure: true
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
docker compose -f docker-compose.yaml up
If you use Docker Compose v2, you don't need to quote booleans.
Only the version needs quotes.

how to check if a string contains any form of an apostrophe not just single quote and ruby

I am using the following code to check if a string contains an apostrophe:
string.scan(/’|'/)
I have included two types of single quotation because I found that using just the standard ' did not catch some strings that contain an apostrophe using the ’
My concern is that if I am checking strings that may contain other fonts or styles my regex won't catch the apostrophe.
Is there a more general approach that would catch all forms of an apostrophe?
The straight single quote is the generic vertical quotation mark:
straight single quote (')
Curly quotes are the quotation marks used in good typography. There are two curly single quote characters:
the opening single quote (‘)
the closing single quote (’)
Going by the above three variants:
You may try this:
string.scan(/['‘’]/)
Those would probably be the most common ones :
/[‘’']/
If you just need to check whether a string matches a regex, you shouldn't use scan:
"apostrophe's" =~ /[‘’']/ #=> 10
=~ will stop at the first match.
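If more than the three common variants should be caught, the character class can be extended with further code points. The set below is an illustrative assumption, not an exhaustive list of everything that may be used as an apostrophe, and the helper name contains_apostrophe? is hypothetical.

```ruby
# Characters commonly used as an apostrophe (illustrative, not exhaustive):
# U+0027 APOSTROPHE, U+2018 LEFT SINGLE QUOTATION MARK,
# U+2019 RIGHT SINGLE QUOTATION MARK, U+02BC MODIFIER LETTER APOSTROPHE,
# U+FF07 FULLWIDTH APOSTROPHE.
APOSTROPHE_LIKE = /[\u0027\u2018\u2019\u02BC\uFF07]/

# Hypothetical helper: true if the text contains any of the characters above.
def contains_apostrophe?(text)
  text.match?(APOSTROPHE_LIKE)
end

puts contains_apostrophe?("apostrophe's")       # true
puts contains_apostrophe?("apostrophe\u2019s")  # true
puts contains_apostrophe?("no quote here")      # false
```

Writing the characters as \u escapes keeps the source file free of look-alike glyphs that are easy to confuse in an editor.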

Lookahead containing the same token as left/right anchors

Got a variation of the classic "regex quoted strings" problem. I need to pick out strings that look like this:
"foo bar bar"
from a long string like this
token token "maybe quoted token that can also contain spaces"
Each of the tokens can be quoted or unquoted (this is easy to take care of using alternating groups) but sometimes I have quoted strings which have literal quotes inside them (not escaped in any way), the only usable thing being that those quotes never have spaces on either side (since that would create a delimiter). Those tokens look like this: "foo-bar"baz"
My initial thought was /"(?:[^"]|" )*"/ but that doesn't seem to work because a token like this: "here is some"quotes" gets split in two.
How should I do this? Platform is Ruby 2.1
Use this:
"(?:[^"]|"\w)+"
or
"(?:[^"]|"\S)+"
You can play with sample strings in the regex demo.
Explanation
" matches the opening quote
The non-capturing group (?:[^"]|"\w) matches...
One [^"] non-quote character, OR |
One quote and a word character "\w
+ one or more times
" closing quote
Further Refinements
If you want to allow quotes in other contexts, for instance escaped quotes, just add them to the alternation:
"(?:\\"|[^"]|"\w)+"
To allow quotes to be followed not just by a word char but any non-space:
"(?:\\"|[^"]|"\S)+"
This one may also suit your needs:
".*?"(?!\S)
To match also non-quoted tokens:
".*?"(?!\S)|\S+
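Since the platform is Ruby, the patterns from the answers can be tried with String#scan. The sketch below combines the "\S variant for quoted tokens with a \S+ alternative for unquoted ones; that combination and the sample line are assumptions made for illustration.

```ruby
# Quoted token: an opening ", then any run of non-quote characters or quotes
# that are immediately followed by a non-space character, then the closing ".
# Anything else is an unquoted token made of non-space characters.
TOKEN = /"(?:[^"]|"\S)+"|\S+/

line = 'token token "maybe quoted token" "foo-bar"baz" "here is some"quotes"'
tokens = line.scan(TOKEN)

puts tokens
# token
# token
# "maybe quoted token"
# "foo-bar"baz"
# "here is some"quotes"
```

An inner quote is only kept as part of a token when a non-space character follows it, which is the property the question relies on.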

Resources