How to use escape characters in strings - ruby

I've been working my way through the Ruby Koans and am confused by the "escape clauses and single quoted strings" examples.
One example shows that I can't really use escape characters in this way, but immediately after, the following example is given:
def test_single_quotes_sometimes_interpret_escape_characters
string = '\\\''
assert_equal 2, string.size # <-- my answer is correct according to the program
assert_equal "\\'", string # <-- my answer is correct according to the program
end
This has confused me on two fronts:
Single quotes can sometimes be used with escape characters.
Why is the string size 2, when assert_equal is "\\\'"? (I personally thought the answer was "\'", which would make more sense with regards to size).

You can break your string into two pieces to clarify things:
string = '\\' + '\''
Each part is a string of length one; '\\' is the single character \ and '\'' is the single character '. When you put them together you get the two character string \'.
There are two characters that are special within a single quoted string literal: the backslash and the single quote itself. The single quote character is, of course, used to delimit the string so you need something special to get a single quote into a single quoted string, the something special is the backslash so '\'' is a single quoted string literal that represents a string containing one single quote character. Similarly, if you need to get a backslash into a single quoted string literal you escape it with another backslash so '\\' has length one and contains one backslash.
The single quote character has no special meaning within a double quoted string literal so you can say "'" without any difficulty. The backslash, however, does have a special meaning in double quoted strings so you have to say "\\" to get a single backslash into your double quoted string.
Consider your guess off "\'". The single quote has no special meaning within a double quoted string and escaping something that doesn't need escaping just gives you your something back; so, if c is a character that doesn't need to be escaped within a double quoted string, then \c will be just c. In particular, "\'" evaluates to "'" (i.e. one single quote within a double quoted string).
The result is that:
'\\\'' == "\\'"
"\\\"" == '\\"'
"\'" == '\''
"\'" == "'"
'\\\''.length == 2
"\\\"".length == 2
"\'".length == 1
"'".length == 1
The Wikibooks reference that Kassym gave covers these things.
I usually switch to %q{} (similar to single quoting) or %Q{} (similar to double quoting) when I need to get quotes into strings, all the backslashes make my eyes bleed.

This might be worth a read : http://en.wikibooks.org/wiki/Ruby_Programming/Strings
ruby-1.9.3-p0 :002 > a = '\\\''
=> "\\'"
ruby-1.9.3-p0 :003 > a.size
=> 2
ruby-1.9.3-p0 :004 > puts a
\'
In single quotes there are only two escape characters : \\ and \'.

Related

Regex match number of backslashes but not more than x number

Im trying to match double back slashes in a string, but only when there is 2 and not 3 so I can swap out the 2 for 3.
I know that \\{2} will match double back slash except it will also match the first 2 slashes when 3 are present.
For example in the string
{"files":{"windows": {"%windir%\\\System32\\drivers\\etc\\lmhosts.sam":{"ignore":{"id":32}},"%windir%\\System32\\\drivers\\etc":{"ignore":{"id":32}},"%windir%\\System32\\drivers\\etc\\hosts":{"ignore":{"id":32}}}}}
There are multiple double slashes that I wish to match and replace, but there are also a few triple slashes which I wish to leave alone.
So, my question, how do match the double slash when it does not sit adjacent to another slash?
Heres a Regex101 link to toy with.
https://regex101.com/r/kWIscW/1
Also, doing this in Ruby.
How about:
\b\\{2}\b
To define that you \\ are the only one characters evaluated
Another possibility to is looking behind and look ahead, however, not sure your regex engine supports it:
(?<=[^\\])\\{2}(?=[^\\])
r = /
(?<!\\) # do not match a backslash, negative lookbehind
\\\\ # match two backslashes
(?!\\) # do not match a backslash, negative lookahead
/x # free-spacing regex definition mode
str = "\\\\\ are two backslashes and here are three \\\\\\ of 'em"
puts str
# \\ are two backslashes and here are three \\\ of 'em
str.scan(r)
#=> ["\\\\"]
Note that s = "\\\\\ " is two backslashes followed by an escaped space.
s.size
#=> 3
s[0].ord
#=> 92
92.chr
#=> "\\"
s[1].ord
#=> 92
s[2].ord
#=> 32
Let's first address backslashes in string literals,
"\\" is one backslash
"\\\\" are two backslashes
"\\\\\\" are three backslashes
Why? Backslash is the escape sequence in string literals, eg "\n" is a linebreak, and hence a backslash must be escaped with a backslash to encode one backslash.
Now, try this
string = "\\\\aaa\\\\bbb\\\\\\ccc"
string.gsub(/\\+/) { |match| match.size == 2 ? '/' : match }
# => "/aaa/bbb\\\\\\ccc"
How does this work?
/\\+/ matches any sequence of backslashes
match.size == 2 filters those that have length 2
And then we just replace those

ruby - weird duplication with backtick in gsub [duplicate]

s = "#main= 'quotes'
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong, I expect to get "#main= \\'quotes\\'"
when I don't use escape char, then it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
s = "#main = 'quotes'
s.gsub "'", "\\\\'"
Since \it's \\equivalent if you want to get a double backslash you have to put four of ones.
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source

Replacing '&' with '\&' in Ruby using String#sub

I'm trying to replace every & in a string with \& using String#gsubin Ruby. What I see is confusing me as I was hoping to get milk \& honey:
irb(main):009:0> puts "milk & honey".sub(/&/,'\ &')
milk \ & honey
=> nil
irb(main):010:0> puts "milk & honey".sub(/&/,'\&')
milk & honey
=> nil
irb(main):011:0> puts "milk & honey".sub(/&/,'\\&')
milk & honey
=> nil
irb(main):012:0>
This is on Ruby 2.0.0p481 on OS X. (I was using String#sub above but plan to use String#gsub for the general case with more than one & in a string.)
When you pass a string as the replacement value to String#sub (or String#gsub), it is first scanned for backreferences to the original string. Of particular interest here, the sequence \& is replaced by whatever part of the string matched the whole regular expression:
puts "bar".gsub(/./, '\\&\\&') # => bbaarr
Note that, despite appearances, the Ruby string literal '\\&\\&' represents a string with only four characters, not six:
puts '\\&\\&' # => \&\&
That's because even single-quoted Ruby strings are subject to backslash-substitution, in order to allow the inclusion of single-quotes inside single-quoted strings. Only ' or another backslash itself trigger substitution; a backslash followed by anything else is taken as simply a literal backslash. That means that you can usually get literal backslashes without doubling them:
puts '\&\&' # still => \&\&
But that's a fiddly detail to rely on, as the next character could change the interpretation. The safest practice is doubling all backslashes that you want to appear literally in a string.
Now in this case, we want to somehow get a literal backslash-ampersand back out of sub. Fortunately, just like the Ruby string parser, sub allows us to use doubled backslashes to indicate that a backslash should be taken as literal instead of as the start of a backreference. We just need to double the backslash in the string that sub receives - which means doubling both of the backslashes in the string's literal representation, taking us to a total of four backslashes in that form:
puts "milk & honey".sub(/&/, '\\\\&')
You can get away with only three backslashes here if you like living dangerously. :)
Alternatively, you can avoid all the backslash-counting and use the block form, where the replacement is obtained by calling a block of code instead of parsing a static string. Since the block is free to do any sort of substitution or string munging it wants, its return value is not scanned for backslash substitutions like the string version is:
puts "milk & honey".sub(/&/) { '\\&' }
Or the "risky" version:
puts "milk & honey".sub(/&/) { '\&' }
Just triple the \:
puts "milk & honey".sub(/&/,'\\\&')
See the IDEONE demo
In Ruby regex, \& means the entire regex, that is why it should be escaped, and then we need to add the literal \. More patterns available are listed below:
\& (the entire regex)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
Block representation is easier and more human-readable and maintainable:
puts "milk & honey".sub(/&/) { '\&' }

Remove hex escape from string

I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method allows you to see the escape sequences in a string by nullifying the escape sequences, like this:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one slash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan /x(..)/ do |groups_arr|
result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
"? #An optional quote mark
\\ #A literal '\'
x #An 'x'
(..) #Any two characters, captured in group 1
"? #An optional quote mark
/xm) do
Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
result << sprintf('%x', int_code)
end
p result
--output:--
"feff"
Why are you calling inspect? That's adding the extra quotes..
Also, putting that in double quotes means the \x is interpolated. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"

Create Regex from String stored in variable with escaped characters (Ruby)

I'm trying to build a regex from a string object, which happens to be stored in a variable.
The problem I'm facing is that escaped sequences (in the string) such "\d" doesn't make to the resulting regex.
Regexp.new("\d") => /d/
If I use single quotes, tough, it works flawless.
Regexp.new('\d') => /\d/
But, as my string is stored in a variable, I always get the double-quoted string.
Is there a way to turn a double-quoted string to single-quoted string, so that I could use in the Regexp constructor ?
(I'd like to use the string interpolation feature of the double quotes)
ex.:
email_pattern = "/[a-z]*\.com"
whole_pattern = "to: #{email_pattern}"
Regexp.new(whole_pattern)
For better readability, I'd like to avoid escaping escape characters.
"\\d"
The problem is, that you end up with completely different strings, depending on whether you use single or double quotes:
"\d".chars.to_a
#=> ["d"]
'\d'.chars.to_a
#=> ["\\", "d"]
so when you are using double quotes, the single \ is immediately lost and cannot be recovered by definition, for example:
"\d" == "d"
#=> true
so you can never know what the string contained before the escaping took place. As #FrankSchmitt suggested, use the double backslash or stick with single quotes. There's no other way.
There's an option, though. You can define your regex parts as regexes themselves, instead of strings. They behave exactly as expected:
regex1 = /\d/
#=> /\d/
regex2 = /foobar/
#=> /foobar/
Then, you can build your final regex with #{}-style interpolation, instead of building the regex source from strings:
regex3 = /#{regex1} #{regex2}/
#=> /(?-mix:\d) (?-mix:foobar)/
Reflecting your example this would translate to:
email_regex = /[a-z]*\.com/
whole_regex = /to: #{email_regex}/
#=> /to: (?-mix:[a-z]*\.com)/
You may also find Regexp#escape interesting. (see the docs)
If you run into further escaping problems (with the slashes), you can also use the alternative Regexp literal syntax with %r{<your regex here>}, in which you do not need to escape the / character. For example:
%r{/}
#=> /\//
There's no getting around escaping the backslash \ with \\, though.
Either create your string with single quotes:
s = '\d'
r = Regexp.new(s)
or quote the backslash:
s = "\\d"
r = Regexp.new(s)
Both should work.

Resources