Escaping '“' with regular double quotes using Ruby regex - ruby

I have text that has these fancy double quotes: '“' and I would like to replace them with regular double quotes using Ruby gsub and regex. Here's an example and what I have so far:
sentence = 'This is a quote, “Hey guys!”'
I couldn't figure out how to escape double quotes so I tried using 34.chr:
sentence.gsub("“",34.chr). This gets me close but leaves a back slash in front of the double quote:
sentence.gsub("“",34.chr) => 'This is a quote, \"Hey guys!”'

The backslashes are only showing up in irb due to the way it prints out the result of a statement. If you instead pass the gsubed string to another method such as puts, you'll see the "real" representation after escape sequences are translated.
1.9.0 > sentence = 'This is a quote, “Hey guys!”'
=> "This is a quote, \342\200\234Hey guys!\342\200\235"
1.9.0 > sentence.gsub('“', "'")
=> "This is a quote, 'Hey guys!\342\200\235"
1.9.0 > puts sentence.gsub('“', "'")
This is a quote, 'Hey guys!”
=> nil
Note also that after the output of puts, we see => nil indicating that the call to puts returned nil.
You probably noticed that the funny quote is still on the end of the output to puts: this is because the quotes are two different escape sequences, and we only named one. But we can take care of that with a regex in gsub:
1.9.0 > puts sentence.gsub(/(“|”)/, 34.chr)
This is a quote, "Hey guys!"
=> nil
Also, in many cases you can swap single quotes and double quotes in Ruby strings -- double quotes perform expansion while single quotes do not. Here are a couple ways you can get a string containing just a double quote:
1.9.0 > '"' == 34.chr
=> true
1.9.0 > %q{"} == 34.chr
=> true
1.9.0 > "\"" == 34.chr
=> true

Related

Ruby replace double blackslash with single blackslash

In Ruby I am trying to replace double backslash \\ with single backslash \ in string but it's not working.
This is my string
line = "this\\is\\line"
desired output
"this\is\line'
This is what i tried
line.gsub("\\", "\") # didn't work
line.gsub("\\", "/\") # didn't work
line.gsub("\\", '\') # didn't work with single quote as well
line.gsub("\\", '/\') # din't work with single quote as well
You are being tricked - it actually is working, but the console is displaying it escaped with \ in line. Use puts to actually see what it is being set without it being escaped with backslashes.
So #1: line = "this\\is\\line" is actually this\is\line. Proof:
irb(main):015:0> line = "this\\is\\line"
=> "this\\is\\line"
irb(main):016:0> puts line
this\is\line
So to actually make a string with double backslashes, you need: line = "this\\\\is\\\\line". Proof:
irb(main):017:0> line = "this\\\\is\\\\line"
=> "this\\\\is\\\\line"
irb(main):018:0> puts line
this\\is\\line
So finally, once you actually have a string with double backslashes, this is the gsub you want: line.gsub("\\\\", "\\")
irb(main):020:0> line = "this\\\\is\\\\line"
=> "this\\\\is\\\\line"
irb(main):021:0> puts line.gsub("\\\\", "\\")
this\is\line

ruby - weird duplication with backtick in gsub [duplicate]

s = "#main= 'quotes'
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong, I expect to get "#main= \\'quotes\\'"
when I don't use escape char, then it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
s = "#main = 'quotes'
s.gsub "'", "\\\\'"
Since \it's \\equivalent if you want to get a double backslash you have to put four of ones.
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source

Replacing '&' with '\&' in Ruby using String#sub

I'm trying to replace every & in a string with \& using String#gsubin Ruby. What I see is confusing me as I was hoping to get milk \& honey:
irb(main):009:0> puts "milk & honey".sub(/&/,'\ &')
milk \ & honey
=> nil
irb(main):010:0> puts "milk & honey".sub(/&/,'\&')
milk & honey
=> nil
irb(main):011:0> puts "milk & honey".sub(/&/,'\\&')
milk & honey
=> nil
irb(main):012:0>
This is on Ruby 2.0.0p481 on OS X. (I was using String#sub above but plan to use String#gsub for the general case with more than one & in a string.)
When you pass a string as the replacement value to String#sub (or String#gsub), it is first scanned for backreferences to the original string. Of particular interest here, the sequence \& is replaced by whatever part of the string matched the whole regular expression:
puts "bar".gsub(/./, '\\&\\&') # => bbaarr
Note that, despite appearances, the Ruby string literal '\\&\\&' represents a string with only four characters, not six:
puts '\\&\\&' # => \&\&
That's because even single-quoted Ruby strings are subject to backslash-substitution, in order to allow the inclusion of single-quotes inside single-quoted strings. Only ' or another backslash itself trigger substitution; a backslash followed by anything else is taken as simply a literal backslash. That means that you can usually get literal backslashes without doubling them:
puts '\&\&' # still => \&\&
But that's a fiddly detail to rely on, as the next character could change the interpretation. The safest practice is doubling all backslashes that you want to appear literally in a string.
Now in this case, we want to somehow get a literal backslash-ampersand back out of sub. Fortunately, just like the Ruby string parser, sub allows us to use doubled backslashes to indicate that a backslash should be taken as literal instead of as the start of a backreference. We just need to double the backslash in the string that sub receives - which means doubling both of the backslashes in the string's literal representation, taking us to a total of four backslashes in that form:
puts "milk & honey".sub(/&/, '\\\\&')
You can get away with only three backslashes here if you like living dangerously. :)
Alternatively, you can avoid all the backslash-counting and use the block form, where the replacement is obtained by calling a block of code instead of parsing a static string. Since the block is free to do any sort of substitution or string munging it wants, its return value is not scanned for backslash substitutions like the string version is:
puts "milk & honey".sub(/&/) { '\\&' }
Or the "risky" version:
puts "milk & honey".sub(/&/) { '\&' }
Just triple the \:
puts "milk & honey".sub(/&/,'\\\&')
See the IDEONE demo
In Ruby regex, \& means the entire regex, that is why it should be escaped, and then we need to add the literal \. More patterns available are listed below:
\& (the entire regex)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
Block representation is easier and more human-readable and maintainable:
puts "milk & honey".sub(/&/) { '\&' }

Triple single quote vs triple double quote in Ruby

Why might you use ''' instead of """, as in Learn Ruby the Hard Way, Chapter 10 Study Drills?
There are no triple quotes in Ruby.
Two String literals which are juxtaposed are parsed as a single String literal. So,
'Hello' 'World'
#=> "HelloWorld"
is the same as
'HelloWorld'
#=> "HelloWorld"
And
'' 'Hello' ''
#=> "Hello"
is the same as
'''Hello'''
#=> "Hello"
is the same as
'Hello'
#=> "Hello"
Since adding an empty string literal does not change the result, you can add as many empty strings as you want:
""""""""""""'''''Hello'''''''''
#=> "Hello"
There are no special rules for triple single quotes vs. triple double quotes, because there are no triple quotes. The rules are simply the same as for quotes.
I assume the author confused Ruby and Python, because a triple-quote will not work in Ruby the way author thought it would. It'll just work like three separate strings ('' '' '').
For multi-line strings one could use:
%q{
your text
goes here
}
=> "\n your text\n goes here\n "
or %Q{} if you need string interpolation inside.
Triple-quotes ''' are the same as single quotes ' in that they don't interpolate any #{} sequences, escape characters (like "\n"), etc.
Triple-double-quotes (ugh) """ are the same as double-quotes " in that they do interpolation and escape sequences.
This is further down on the same page you linked.
The triple-quoted versions """ ''' allows for multi-line strings... as does the singly-quoted ' and ", so I don't know why both are available.
In Ruby """ supports interpolation, ''' does not.
Rubyists use triple quotes for multi-line strings (similar to 'heredocs').
You could just as easily use one of these characters.
Just like normal strings the double quotes will allow you to use variables inside of your strings (also known as 'interpolation').
Save this to a file called multiline_example.rb and run it:
interpolation = "(but this one can use interpolation)"
single = '''
This is a multi-line string.
'''
double = """
This is also a multi-line string #{interpolation}.
"""
puts single
puts double
This is the output:
$ ruby multiline_string_example.rb
This is a multi-line string.
This is also a multi-line string (but this one can use interpolation).
$
Now try it the other way around:
nope = "(this will never get shown)"
single = '''
This is a multi-line string #{nope}.
'''
double = """
This is also a multi-line string.
"""
puts single
puts double
You'll get this output:
$ ruby multiline_example.rb
This is a multi-line string #{nope}.
This is also a multi-line string.
$
Note that in both examples you got some extra newlines in your output. That's because multiline strings keep any newlines inside them, and puts adds a newline to every string.

How to use escape characters in strings

I've been working my way through the Ruby Koans and am confused by the "escape clauses and single quoted strings" examples.
One example shows that I can't really use escape characters in this way, but immediately after, the following example is given:
def test_single_quotes_sometimes_interpret_escape_characters
string = '\\\''
assert_equal 2, string.size # <-- my answer is correct according to the program
assert_equal "\\'", string # <-- my answer is correct according to the program
end
This has confused me on two fronts:
Single quotes can sometimes be used with escape characters.
Why is the string size 2, when assert_equal is "\\\'"? (I personally thought the answer was "\'", which would make more sense with regards to size).
You can break your string into two pieces to clarify things:
string = '\\' + '\''
Each part is a string of length one; '\\' is the single character \ and '\'' is the single character '. When you put them together you get the two character string \'.
There are two characters that are special within a single quoted string literal: the backslash and the single quote itself. The single quote character is, of course, used to delimit the string so you need something special to get a single quote into a single quoted string, the something special is the backslash so '\'' is a single quoted string literal that represents a string containing one single quote character. Similarly, if you need to get a backslash into a single quoted string literal you escape it with another backslash so '\\' has length one and contains one backslash.
The single quote character has no special meaning within a double quoted string literal so you can say "'" without any difficulty. The backslash, however, does have a special meaning in double quoted strings so you have to say "\\" to get a single backslash into your double quoted string.
Consider your guess off "\'". The single quote has no special meaning within a double quoted string and escaping something that doesn't need escaping just gives you your something back; so, if c is a character that doesn't need to be escaped within a double quoted string, then \c will be just c. In particular, "\'" evaluates to "'" (i.e. one single quote within a double quoted string).
The result is that:
'\\\'' == "\\'"
"\\\"" == '\\"'
"\'" == '\''
"\'" == "'"
'\\\''.length == 2
"\\\"".length == 2
"\'".length == 1
"'".length == 1
The Wikibooks reference that Kassym gave covers these things.
I usually switch to %q{} (similar to single quoting) or %Q{} (similar to double quoting) when I need to get quotes into strings, all the backslashes make my eyes bleed.
This might be worth a read : http://en.wikibooks.org/wiki/Ruby_Programming/Strings
ruby-1.9.3-p0 :002 > a = '\\\''
=> "\\'"
ruby-1.9.3-p0 :003 > a.size
=> 2
ruby-1.9.3-p0 :004 > puts a
\'
In single quotes there are only two escape characters : \\ and \'.

Resources