I am trying to split a string by three consecutive newlines ("\n\n\n"). I was trying str.split('\n\n\n') and it didn't work, but when I changed to str.split("\n\n\n"), it started to work. Could anyone explain to me why such behaviour happens?
A string in single quotes is almost a raw string: only \\ and \' are treated as escapes. So '\n\n\n' is three backslashes and three n's, not three line feeds as you expected. Only double-quoted strings process escapes such as \n.
puts 'abc\nabc' # => abc\nabc
puts "abc\nabc" # => abc
# abc
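Applied to the split from the question, a minimal sketch (the sample text is made up for illustration):

```ruby
text = "first\n\n\nsecond\n\n\nthird"

# Double quotes: "\n" is a real line feed, so the separator is 3 characters long.
double_quoted = text.split("\n\n\n")

# Single quotes: '\n' is a backslash followed by the letter n (6 characters in total).
# That sequence never occurs in text, so split returns the whole string as one element.
single_quoted = text.split('\n\n\n')

p "\n\n\n".length   # => 3
p '\n\n\n'.length   # => 6
p double_quoted     # => ["first", "second", "third"]
p single_quoted     # => ["first\n\n\nsecond\n\n\nthird"]
```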
Single-quoted strings keep their actual/literal contents, e.g.
1.9.3-p194 :003 > puts 'Hi\nThere'
Hi\nThere
=> nil
Whereas double-quoted strings interpret escape sequences such as \n (and also perform #{} interpolation), so the line feed is actually produced, e.g.
1.9.3-p194 :004 > puts "Hi\nThere"
Hi
There
=> nil
Best practice Recommendations:
Choose single quotes over double quotes when possible (use double quotes as needed for interpolation).
When nesting 'Quotes inside "quotes" somewhere' put the double ones inside the single quotes
In single-quoted string literals, backslashes need not be doubled
'\n' == '\\n'
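A quick way to check this yourself. The variable names are only for illustration; the point is that \\ and \' are the only escapes a single-quoted literal recognises:

```ruby
backslash_n   = '\n'     # backslash + n, nothing is interpreted
doubled       = '\\n'    # \\ collapses to one backslash, so this is the same string
one_backslash = '\\'     # the only way to end a single-quoted literal with a backslash
with_quote    = 'it\'s'  # \' yields a plain single quote

p backslash_n == doubled   # => true
p backslash_n.length       # => 2
p one_backslash.length     # => 1
p with_quote               # => "it's"
p backslash_n == "\n"      # => false (the double-quoted one is a single line feed)
```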
Related
s = "#main= 'quotes'"
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong, I expect to get "#main= \\'quotes\\'"
when I don't use escape char, then it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
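To make the comparison concrete, here is a small sketch that runs both forms on the string from the question and checks that they agree (the variable names are mine):

```ruby
s = "#main= 'quotes'"

# Replacement string: the literal "\\\\'" holds \\' (3 characters);
# gsub then turns \\ into one backslash and leaves the quote alone.
with_string = s.gsub("'", "\\\\'")

# Block form: the block's return value is inserted verbatim,
# so there is no second escaping pass to worry about.
with_block = s.gsub("'") { |m| '\\' + m }

puts with_string              # prints: #main= \'quotes\'
puts with_block               # prints: #main= \'quotes\'
p with_string == with_block   # => true
```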
s = "#main = 'quotes'"
s.gsub "'", "\\\\'"
Since a single \ has to be written as \\ in a string literal, if you want to get a double backslash you have to type four of them.
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source
I don't understand this Ruby code:
>> puts '\\ <- single backslash'
# \ <- single backslash
>> puts '\\ <- 2x a, because 2 backslashes get replaced'.sub(/\\/, 'aa')
# aa <- 2x a, because two backslashes get replaced
so far, all as expected. but if we search for 1 with /\\/, and replace with 2, encoded by '\\\\', why do we get this:
>> puts '\\ <- only 1 ... replace 1 with 2'.sub(/\\/, '\\\\')
# \ <- only 1 backslash, even though we replace 1 with 2
and then, when we encode 3 with '\\\\\\', we only get 2:
>> puts '\\ <- only 2 ... 1 with 3'.sub(/\\/, '\\\\\\')
# \\ <- 2 backslashes, even though we replace 1 with 3
anyone able to understand why a backslash gets swallowed in the replacement string? this happens on 1.8 and 1.9.
Quick Answer
If you want to sidestep all this confusion, use the much less confusing block syntax. Here is an example that replaces each backslash with 2 backslashes:
"some\\path".gsub('\\') { '\\\\' }
Gruesome Details
The problem is that when using sub (and gsub), without a block, ruby interprets special character sequences in the replacement parameter. Unfortunately, sub uses the backslash as the escape character for these:
\& (the entire match)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
Like any escaping, this creates an obvious problem. If you want to include the literal value of one of the above sequences (e.g. \1) in the output string you have to escape it. So, to get Hello \1, you need the replacement string to be Hello \\1. And to represent this as a string literal in Ruby, you have to escape those backslashes again like this: "Hello \\\\1"
So, there are two different escaping passes. The first one takes the string literal and creates the internal string value. The second takes that internal string value and replaces the sequences above with the matching data.
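The two passes can be watched separately in a short sketch. The "x" input and the (x) group are made up purely so that \1 has something to refer to:

```ruby
# Pass 1 (the Ruby parser): four typed backslashes become two in the string value.
replacement = "Hello \\\\1"
puts replacement           # prints: Hello \\1
p replacement.length       # => 9

# Pass 2 (sub): \\ becomes a single literal backslash, so the 1 stays a plain digit.
escaped = "x".sub(/(x)/, replacement)
puts escaped               # prints: Hello \1

# With only one level of escaping, sub sees \1 and inserts the first group instead.
unescaped = "x".sub(/(x)/, "Hello \\1")
puts unescaped             # prints: Hello x
```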
If a backslash is not followed by a character that matches one of the above sequences, then the backslash (and the character that follows) will pass through unaltered. This also affects a backslash at the end of the string -- it will pass through unaltered. It's easiest to see this logic in the rubinius code; just look for the to_sub_replacement method in the String class.
Here are some examples of how String#sub is parsing the replacement string:
1 backslash \ (which has a string literal of "\\")
Passes through unaltered because the backslash is at the end of the string and has no characters after it.
Result: \
2 backslashes \\ (which have a string literal of "\\\\")
The pair of backslashes match the escaped backslash sequence (see \\ above) and gets converted into a single backslash.
Result: \
3 backslashes \\\ (which have a string literal of "\\\\\\")
The first two backslashes match the \\ sequence and get converted to a single backslash. Then the final backslash is at the end of the string so it passes through unaltered.
Result: \\
4 backslashes \\\\ (which have a string literal of "\\\\\\\\")
Two pairs of backslashes each match the \\ sequence and get converted to a single backslash.
Result: \\
2 backslashes with character in the middle \a\ (which have a string literal of "\\a\\")
The \a does not match any of the escape sequences so it is allowed to pass through unaltered. The trailing backslash is also allowed through.
Result: \a\
Note: The same result could be obtained from: \\a\\ (with the literal string: "\\\\a\\\\")
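The cases above can be checked mechanically. This is just a sketch: the "x" input is arbitrary, and the hash maps each replacement string (written as a double-quoted literal) to the result the list above predicts:

```ruby
# key:   the replacement string handed to sub
# value: what sub is expected to insert for it
cases = {
  "\\"        => "\\",      # 1 backslash   -> 1 (trailing, passes through)
  "\\\\"      => "\\",      # 2 backslashes -> 1
  "\\\\\\"    => "\\\\",    # 3 backslashes -> 2
  "\\\\\\\\"  => "\\\\",    # 4 backslashes -> 2
  "\\a\\"     => "\\a\\",   # \a\   -> \a\
  "\\\\a\\\\" => "\\a\\"    # \\a\\ -> \a\
}

cases.each do |replacement, expected|
  actual = "x".sub("x", replacement)
  status = actual == expected ? "ok" : "MISMATCH"
  puts "#{replacement} -> #{actual} (#{status})"
end
```

Every line should end in "ok" on MRI.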
In hindsight, this could have been less confusing if String#sub had used a different escape character. Then there wouldn't be the need to double escape all the backslashes.
This is an issue because the backslash (\) serves as an escape character for both Regexps and Strings. You could use the special sequence \& to reduce the number of backslashes in the gsub replacement string.
foo.gsub(/\\/,'\&\&\&') #for some string foo replace each \ with \\\
EDIT: I should mention that the value of \& is from a Regexp match, in this case a single backslash.
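A runnable version of that idea, with a made-up Windows-style path standing in for foo:

```ruby
foo = 'C:\temp\logs'   # single quotes: each \ here is one literal backslash

# \& is replaced by the matched text, which is always a single backslash here.
tripled = foo.gsub(/\\/, '\&\&\&')

# \0 is another spelling of \& and gives the same result.
same = foo.gsub(/\\/, '\0\0\0')

puts foo            # prints: C:\temp\logs
puts tripled        # prints: C:\\\temp\\\logs
p tripled == same   # => true
```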
Also, I thought that there was a special way to create a string that disabled the escape character, but apparently not. None of these will produce two slashes:
puts "\\"
puts '\\'
puts %q{\\}
puts %Q{\\}
puts """\\"""
puts '''\\'''
puts <<EOF
\\
EOF
argh, right after I typed all this out, I realised that \ is used to refer to groups in the replacement string. I guess this means that you need a literal \\ in the replacement string to get one replaced \. To get a literal \\ you need four \s, so to replace one with two you actually need eight(!).
# Double every occurrence of \. There's eight backslashes on the right there!
>> puts '\\'.sub(/\\/, '\\\\\\\\')
anything I'm missing? any more efficient ways?
Clearing up a little confusion on the author's second line of code.
You said:
>> puts '\\ <- 2x a, because 2 backslashes get replaced'.sub(/\\/, 'aa')
# aa <- 2x a, because two backslashes get replaced
2 backslashes aren't getting replaced here. You're replacing 1 escaped backslash with two a's ('aa'). That is, if you used .sub(/\\/, 'a'), you would only see one 'a'
'\\'.sub(/\\/, 'anything') #=> anything
the pickaxe book mentions this exact problem, actually. here's another alternative (from page 130 of the latest edition)
str = 'a\b\c' # => "a\b\c"
str.gsub(/\\/) { '\\\\' } # => "a\\b\\c"
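For comparison, here are the three spellings that came up in this thread side by side. As far as I can tell they all produce the same string, so the choice is mostly about readability:

```ruby
str = 'a\b\c'   # single quotes: a, backslash, b, backslash, c

with_block = str.gsub(/\\/) { '\\\\' }    # block result is inserted as-is
with_eight = str.gsub(/\\/, '\\\\\\\\')   # 8 typed -> 4 in the string -> 2 inserted
with_match = str.gsub(/\\/, '\&\&')       # the matched backslash, twice

puts str          # prints: a\b\c
puts with_block   # prints: a\\b\\c
p with_block == with_eight && with_eight == with_match   # => true
```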
How can i say "all symbols except backslash" in Ruby character class?
/'[^\]*'/.match("'some string \ hello'") => should be nil
The variant with two backslashes doesn't work
/'[^\\]*'/.match("'some string \ hello'") => 'some string \ hello' BUT should be nil
Your problem is not with your regex; you got that right. Your problem is that your test string does not have a backslash in it. It has an escaped space, instead. Try this:
str = "'some string \\ hello'"
puts str #=> 'some string \ hello'
p /'[^\\]*'/.match(str) #=> nil
You need to escape the backslash:
[^\\]*
because backslash is the escape character in regular expressions, thus escaping the closing bracket here.
If you want to verify that the whole string contains non-backslash characters, then you need anchors:
^[^\\]*$
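One caveat worth knowing: in Ruby, ^ and $ anchor at line boundaries, not at the ends of the string, and [^\\] happily matches a newline. If the goal is "the whole string has no backslash", \A and \z are probably what you want. A sketch, assuming Ruby 2.4 or later for match? (the sample strings are invented):

```ruby
line_anchored   = /^[^\\]*$/     # the pattern as given above
string_anchored = /\A[^\\]*\z/   # assumption: the whole string must be backslash-free

clean     = "plain text"
dirty     = 'has \ backslash'            # single quotes keep the backslash
two_lines = "clean line\ndirty \\ line"  # backslash only on the second line

[clean, dirty, two_lines].each do |s|
  puts "#{s.inspect}: line=#{line_anchored.match?(s)} string=#{string_anchored.match?(s)}"
end
```

The first two strings give the same answer with both patterns (true, then false). The third one is where they differ: the line-anchored pattern matches its clean first line, while the string-anchored pattern rejects it.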
There's actually not a backslash in your string to match against. Try taking a look at just your input:
"'some string \ hello'".length # => 20
"a\ b".length # => 3
The "\ " in double quotes is being escaped into just a space:
irb(main):014:0> " "[0].ord # => 32
irb(main):015:0> "\ "[0].ord # => 32
irb(main):016:0> "\ ".size #=> 1
If you want to match no backslash, you'll need two of them in the test string as well, as in your second regex, which looks good to me:
/'[^\\]*'/.match("'some string \\ hello'") # => nil
This applies to C# rather than Ruby, but it might help you.
[^\\\\]*
four slashes instead of two.
I don't understand what is going on here. How should I feed gsub to get the string "Yaho\'o"?
>> "Yaho'o".gsub("Y", "\\Y")
=> "\\Yaho'o"
>> "Yaho'o".gsub("'", "\\'")
=> "Yahooo"
\' means $' which is everything after the match.
Escape the \ again and it works
"Yaho'o".gsub("'", "\\\\'")
"Yaho'o".gsub("'", "\\\\'")
Because you're escaping the escape character as well as escaping the single quote.
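If it helps to see what \' actually refers to, the match globals can be captured inside a block. This is only an illustrative sketch; pre, match and post are names I made up:

```ruby
str = "Yaho'o"
pre = match = post = nil

# Inside the block the match globals describe the current match.
str.sub(/'/) do
  pre   = $`   # what \` would insert
  match = $&   # what \& or \0 would insert
  post  = $'   # what \' would insert
  ""
end

p pre     # => "Yaho"
p match   # => "'"
p post    # => "o"

puts str.gsub("'", "\\'")     # prints: Yahooo  (quote replaced by the post-match "o")
puts str.gsub("'", "\\\\'")   # prints: Yaho\'o
```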
This will also do it, and it's a bit more readable:
def escape_single_quotes(str)
str.gsub(/'/) { |x| "\\#{x}" }
end
If you want to escape both a single-quote and a backslash, so that you can embed that string in a double-quoted ruby string, then the following will do that for you:
def escape_single_quotes_and_backslash(str)
str.gsub(/\\|'/) { |x| "\\#{x}" }
end
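A short usage sketch for the two helpers. The definitions are repeated so the snippet runs on its own, and the inputs are invented:

```ruby
def escape_single_quotes(str)
  str.gsub(/'/) { |x| "\\#{x}" }
end

def escape_single_quotes_and_backslash(str)
  str.gsub(/\\|'/) { |x| "\\#{x}" }
end

quoted = escape_single_quotes("Yaho'o")
both   = escape_single_quotes_and_backslash("a\\b'c")   # the input is a\b'c

puts quoted   # prints: Yaho\'o
puts both     # prints: a\\b\'c
```

Because the block's return value is inserted as-is, one typed pair of backslashes per inserted backslash is enough here.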