I'm trying to replace every & in a string with \& using String#gsub in Ruby. What I see is confusing me, as I was hoping to get milk \& honey:
irb(main):009:0> puts "milk & honey".sub(/&/,'\ &')
milk \ & honey
=> nil
irb(main):010:0> puts "milk & honey".sub(/&/,'\&')
milk & honey
=> nil
irb(main):011:0> puts "milk & honey".sub(/&/,'\\&')
milk & honey
=> nil
irb(main):012:0>
This is on Ruby 2.0.0p481 on OS X. (I was using String#sub above but plan to use String#gsub for the general case with more than one & in a string.)
When you pass a string as the replacement value to String#sub (or String#gsub), it is first scanned for backreferences to the original string. Of particular interest here, the sequence \& is replaced by whatever part of the string matched the whole regular expression:
puts "bar".gsub(/./, '\\&\\&') # => bbaarr
Note that, despite appearances, the Ruby string literal '\\&\\&' represents a string with only four characters, not six:
puts '\\&\\&' # => \&\&
That's because even single-quoted Ruby strings are subject to backslash-substitution, in order to allow the inclusion of single-quotes inside single-quoted strings. Only ' or another backslash itself trigger substitution; a backslash followed by anything else is taken as simply a literal backslash. That means that you can usually get literal backslashes without doubling them:
puts '\&\&' # still => \&\&
But that's a fiddly detail to rely on, as the next character could change the interpretation. The safest practice is doubling all backslashes that you want to appear literally in a string.
Now in this case, we want to somehow get a literal backslash-ampersand back out of sub. Fortunately, just like the Ruby string parser, sub allows us to use doubled backslashes to indicate that a backslash should be taken as literal instead of as the start of a backreference. We just need to double the backslash in the string that sub receives - which means doubling both of the backslashes in the string's literal representation, taking us to a total of four backslashes in that form:
puts "milk & honey".sub(/&/, '\\\\&')
You can get away with only three backslashes here if you like living dangerously. :)
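To check the backslash counting, it may help to look at the characters each literal actually hands to sub. This is only a sketch; the helper name replacement_report is hypothetical and not part of Ruby:

```ruby
# Show what String#sub receives for a given literal, and what it produces.
# `replacement_report` is a hypothetical helper used only for illustration.
def replacement_report(replacement)
  {
    chars:  replacement.chars,                    # characters left after literal parsing
    result: "milk & honey".sub(/&/, replacement)  # what sub does with those characters
  }
end

two   = replacement_report('\\&')    # sub sees \&  : backreference to the whole match
three = replacement_report('\\\&')   # sub sees \\& : escaped backslash, then a plain &
four  = replacement_report('\\\\&')  # sub sees \\& : the same three characters

puts two[:result]    # milk & honey
puts three[:result]  # milk \& honey
puts four[:result]   # milk \& honey
p four[:chars]       # ["\\", "\\", "&"]
```

The three- and four-backslash literals produce the same three-character string, which is why both work.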
Alternatively, you can avoid all the backslash-counting and use the block form, where the replacement is obtained by calling a block of code instead of parsing a static string. Since the block is free to do any sort of substitution or string munging it wants, its return value is not scanned for backslash substitutions like the string version is:
puts "milk & honey".sub(/&/) { '\\&' }
Or the "risky" version:
puts "milk & honey".sub(/&/) { '\&' }
Just triple the \:
puts "milk & honey".sub(/&/,'\\\&')
See the IDEONE demo
In a Ruby replacement string, \& means the entire match, which is why it has to be escaped, and then we need to add the literal \. The available patterns are listed below:
\& (the entire match)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
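A small sketch that exercises each of the patterns listed above. The demo helper and the sample strings are made up for illustration:

```ruby
# `demo` is a hypothetical helper: it replaces the "-" in "left-right"
# with the given replacement string, which sub scans for backreferences.
def demo(replacement)
  "left-right".sub(/(-)/, replacement)
end

puts demo('<\&>')    # whole match       => left<->right
puts demo('<\0>')    # same as \&        => left<->right
puts demo('<\1>')    # first group       => left<->right
puts demo('<\`>')    # pre-match string  => left<left>right
puts demo("<\\'>")   # post-match string => left<right>right
puts demo('<\\\\>')  # literal backslash => left<\>right

# \+ refers to the last group that matched.
puts "ab".sub(/(a)(b)/, '<\+>')  # => <b>
```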
Block representation is easier and more human-readable and maintainable:
puts "milk & honey".sub(/&/) { '\&' }
s = "#main= 'quotes'"
s.gsub "'", "\\'" # => "#main= quotes'quotes"
This seems to be wrong, I expect to get "#main= \\'quotes\\'"
when I don't use escape char, then it works as expected.
s.gsub "'", "*" # => "#main= *quotes*"
So there must be something to do with escaping.
Using ruby 1.9.2p290
I need to replace single quotes with back-slash and a quote.
Even more inconsistencies:
"\\'".length # => 2
"\\*".length # => 2
# As expected
"'".gsub("'", "\\*").length # => 2
"'a'".gsub("'", "\\*") # => "\\*a\\*" (length==5)
# WTF next:
"'".gsub("'", "\\'").length # => 0
# Doubling the content?
"'a'".gsub("'", "\\'") # => "a'a" (length==3)
What is going on here?
You're getting tripped up by the specialness of \' inside a regular expression replacement string:
\0, \1, \2, ... \9, \&, \`, \', \+
Substitutes the value matched by the nth grouped subexpression, or by the entire match, pre- or postmatch, or the highest group.
So when you say "\\'", the double \\ becomes just a single backslash and the result is \' but that means "The string to the right of the last successful match." If you want to replace single quotes with escaped single quotes, you need to escape more to get past the specialness of \':
s.gsub("'", "\\\\'")
Or avoid the toothpicks and use the block form:
s.gsub("'") { |m| '\\' + m }
You would run into similar issues if you were trying to escape backticks, a plus sign, or even a single digit.
The overall lesson here is to prefer the block form of gsub for anything but the most trivial of substitutions.
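For comparison, here are the three variants side by side. This is just a sketch; escape_single_quotes is a hypothetical helper name:

```ruby
# Hypothetical helper: put a backslash in front of every single quote.
# The block's return value is used as-is and is not scanned for backreferences.
def escape_single_quotes(text)
  text.gsub("'") { |quote| "\\" + quote }
end

s = "#main= 'quotes'"

puts escape_single_quotes(s)  # prints: #main= \'quotes\'
puts s.gsub("'", "\\\\'")     # prints: #main= \'quotes\'
puts s.gsub("'", "\\'")       # prints: #main= quotes'quotes  (\' expanded to the post-match)
```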
s = "#main = 'quotes'"
s.gsub "'", "\\\\'"
Since "\\" in a string literal is equivalent to a single backslash, you have to write four of them to get a double backslash.
You need to escape the \ as well:
s.gsub "'", "\\\\'"
Outputs
"#main= \\'quotes\\'"
A good explanation found on an outside forum:
The key point to understand IMHO is that a backslash is special in
replacement strings. So, whenever one wants to have a literal
backslash in a replacement string one needs to escape it and hence
have [two] backslashes. Coincidentally a backslash is also special in a
string (even in a single quoted string). So you need two levels of
escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
replacement backslash.
source
I'm reviewing a line of Ruby code in a pull request. I'm not sure if this is a bug or a feature that I haven't seen before:
puts "A string of Ruby that"\
"continues on the next line"
Is the backslash a valid character to concatenate these strings? Or is this a bug?
That is valid code.
The backslash is a line continuation. Your code has two quoted runs of text; the runs appear like two strings, but are really just one string because Ruby concatenates whitespace-separated runs.
Example of three quoted runs of text that are really just one string:
"a" "b" "c"
=> "abc"
Example of three quoted runs of text that are really just one string, using \ line continuations:
"a" \
"b" \
"c"
=> "abc"
Example of three strings, using + line continuations and also concatenations:
"a" +
"b" +
"c"
=> "abc"
Other line continuation details: "Ruby interprets semicolons and newline characters as the ending of a statement. However, if Ruby encounters operators, such as +, -, or backslash at the end of a line, they indicate the continuation of a statement." - Ruby Quick Guide
The backslash character does not concatenate any strings. It prevents the line-break from meaning that those two lines are different statements. Think of the backslash as the opposite of the semicolon. The semicolon lets two statements occupy one line; the backslash lets one statement occupy two lines.
What you are not realizing is that a string literal can be written as multiple successive literals. This is legal Ruby:
s = "A string of Ruby that" "continues on the same line"
puts s
Since that is legal, it is legal to put a line break between the two string literals - but then you need the backslash, the line-continuation character, to tell Ruby that these are in fact the same statement, spread over two lines.
s = "A string of Ruby that" \
"continues on the same line"
puts s
If you omit the backslash, it is still legal, but doesn't give the result you might be hoping for; the string literal on the second line is simply thrown away.
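The difference is easy to see by wrapping both versions in methods. A minimal sketch; the method names are invented for the example:

```ruby
# With the backslash: one statement, and the adjacent literals form one string.
def with_continuation
  "A string of Ruby that " \
    "continues on the next line"
end

# Without the backslash: two statements. The first literal is assigned to s;
# the second literal is evaluated on its own and then discarded.
def without_continuation
  s = "A string of Ruby that "
  "continues on the next line"
  s
end

puts with_continuation     # A string of Ruby that continues on the next line
puts without_continuation  # A string of Ruby that
```

Running the second method with warnings enabled may also produce a warning about the unused literal.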
This is not a case of concatenated strings. It is one single string. "foo" "bar" is a syntactic construct that allows you to break up a string in your code, but it is identical to "foobar". In contrast, "foo" + "bar" is the true concatenation, invoking the concatenation method + on object "foo".
You can verify this by dumping the YARV instructions. Compare:
RubyVM::InstructionSequence.compile('"foo" + "bar"').to_a
# .... [:putstring, "foo"], [:putstring, "bar"] ....
RubyVM::InstructionSequence.compile('"foo" "bar"').to_a
# .... [:putstring, "foobar"] ....
The backslash in front of the newline will cancel the newline, so it does not terminate the statement; without it, it would not be one string, but two strings in separate lines.
I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
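A slightly expanded sketch of the same idea, assuming Ruby 2.4 or later for String#unpack1. The helper names to_hex and from_hex are hypothetical:

```ruby
# Bytes -> hex text. unpack("H*") returns a one-element array;
# unpack1 returns that single element directly.
def to_hex(bytes)
  bytes.unpack1("H*")
end

# Hex text -> bytes, the inverse direction, via Array#pack.
def from_hex(hex)
  [hex].pack("H*")
end

puts to_hex("\xfe\xff")    # feff
p from_hex("feff").bytes   # [254, 255]
```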
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
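Looking at the bytes makes the difference concrete. A short sketch; the constant names are only for illustration:

```ruby
DOUBLE_QUOTED = "\xfe"  # hex escape is processed: a single byte, 0xFE
SINGLE_QUOTED = '\xfe'  # no hex escapes here: backslash, x, f, e

p DOUBLE_QUOTED.bytes   # [254]
p SINGLE_QUOTED.bytes   # [92, 120, 102, 101]
p SINGLE_QUOTED.chars   # ["\\", "x", "f", "e"]
```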
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
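The inspect() method lets you see the escape sequences in a string by nullifying them, that is, by returning a string in which each backslash is a literal character. The effect is the same as escaping the backslashes yourself, like this: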
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one backslash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan(/x(..)/) do |groups_arr|
  result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
"? #An optional quote mark
\\ #A literal '\'
x #An 'x'
(..) #Any two characters, captured in group 1
"? #An optional quote mark
/xm) do
Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
  # %02x zero-pads each byte to two hex digits, so 0x0a becomes "0a" rather than "a"
  result << sprintf('%02x', int_code)
end
p result
--output:--
"feff"
Why are you calling inspect? That's adding the extra quotes..
Also, putting that in double quotes means the \x is interpreted as the start of an escape sequence. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"
I'm trying to build a regex from a string object, which happens to be stored in a variable.
The problem I'm facing is that escape sequences in the string, such as "\d", don't make it into the resulting regex.
Regexp.new("\d") => /d/
If I use single quotes, though, it works flawlessly.
Regexp.new('\d') => /\d/
But, as my string is stored in a variable, I always get the double-quoted string.
Is there a way to turn a double-quoted string into a single-quoted string, so that I can use it in the Regexp constructor?
(I'd like to use the string interpolation feature of the double quotes)
ex.:
email_pattern = "/[a-z]*\.com"
whole_pattern = "to: #{email_pattern}"
Regexp.new(whole_pattern)
For better readability, I'd like to avoid escaping escape characters.
"\\d"
The problem is, that you end up with completely different strings, depending on whether you use single or double quotes:
"\d".chars.to_a
#=> ["d"]
'\d'.chars.to_a
#=> ["\\", "d"]
so when you are using double quotes, the single \ is immediately lost and cannot be recovered by definition, for example:
"\d" == "d"
#=> true
so you can never know what the string contained before the escaping took place. As @FrankSchmitt suggested, use the double backslash or stick with single quotes. There's no other way.
There's an option, though. You can define your regex parts as regexes themselves, instead of strings. They behave exactly as expected:
regex1 = /\d/
#=> /\d/
regex2 = /foobar/
#=> /foobar/
Then, you can build your final regex with #{}-style interpolation, instead of building the regex source from strings:
regex3 = /#{regex1} #{regex2}/
#=> /(?-mix:\d) (?-mix:foobar)/
Reflecting your example this would translate to:
email_regex = /[a-z]*\.com/
whole_regex = /to: #{email_regex}/
#=> /to: (?-mix:[a-z]*\.com)/
You may also find Regexp.escape interesting. (see the docs)
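As a rough sketch of how Regexp.escape fits in: it is useful when part of the pattern comes from plain text that should match literally. The helper recipient_pattern and the sample inputs are assumptions made for this example, and match? needs Ruby 2.4 or later:

```ruby
# Hypothetical helper: builds a pattern for "to: <lowercase letters><suffix>",
# where suffix is ordinary text that must match literally.
def recipient_pattern(suffix)
  /to: [a-z]*#{Regexp.escape(suffix)}/
end

pattern = recipient_pattern(".com")

p pattern.source                     # "to: [a-z]*\\.com"
p pattern.match?("to: example.com")  # true
p pattern.match?("to: examplexcom")  # false, the dot is matched literally
```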
If you run into further escaping problems (with the slashes), you can also use the alternative Regexp literal syntax with %r{<your regex here>}, in which you do not need to escape the / character. For example:
%r{/}
#=> /\//
There's no getting around escaping the backslash \ with \\, though.
Either create your string with single quotes:
s = '\d'
r = Regexp.new(s)
or quote the backslash:
s = "\\d"
r = Regexp.new(s)
Both should work.
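A quick way to convince yourself, sketched with a hypothetical helper named digit_pattern? (match? assumes Ruby 2.4 or later):

```ruby
# Hypothetical helper: does the regex built from this string match a digit?
def digit_pattern?(pattern_string)
  Regexp.new(pattern_string).match?("7")
end

p '\d' == "\\d"           # true: both are backslash followed by d
p digit_pattern?('\d')    # true
p digit_pattern?("\\d")   # true
p digit_pattern?("\d")    # false: the string is just "d", so the regex is /d/
```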
I've been working my way through the Ruby Koans and am confused by the "escape clauses and single quoted strings" examples.
One example shows that I can't really use escape characters in this way, but immediately after, the following example is given:
def test_single_quotes_sometimes_interpret_escape_characters
string = '\\\''
assert_equal 2, string.size # <-- my answer is correct according to the program
assert_equal "\\'", string # <-- my answer is correct according to the program
end
This has confused me on two fronts:
Single quotes can sometimes be used with escape characters.
Why is the string size 2, when assert_equal is "\\\'"? (I personally thought the answer was "\'", which would make more sense with regards to size).
You can break your string into two pieces to clarify things:
string = '\\' + '\''
Each part is a string of length one; '\\' is the single character \ and '\'' is the single character '. When you put them together you get the two character string \'.
There are two characters that are special within a single quoted string literal: the backslash and the single quote itself. The single quote character is, of course, used to delimit the string so you need something special to get a single quote into a single quoted string, the something special is the backslash so '\'' is a single quoted string literal that represents a string containing one single quote character. Similarly, if you need to get a backslash into a single quoted string literal you escape it with another backslash so '\\' has length one and contains one backslash.
The single quote character has no special meaning within a double quoted string literal so you can say "'" without any difficulty. The backslash, however, does have a special meaning in double quoted strings so you have to say "\\" to get a single backslash into your double quoted string.
Consider your guess of "\'". The single quote has no special meaning within a double quoted string and escaping something that doesn't need escaping just gives you your something back; so, if c is a character that doesn't need to be escaped within a double quoted string, then \c will be just c. In particular, "\'" evaluates to "'" (i.e. one single quote within a double quoted string).
The result is that:
'\\\'' == "\\'"
"\\\"" == '\\"'
"\'" == '\''
"\'" == "'"
'\\\''.length == 2
"\\\"".length == 2
"\'".length == 1
"'".length == 1
The Wikibooks reference that Kassym gave covers these things.
I usually switch to %q{} (similar to single quoting) or %Q{} (similar to double quoting) when I need to get quotes into strings, all the backslashes make my eyes bleed.
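For anyone who hasn't used them, a small sketch of both forms. The method names and the sample text are made up for illustration:

```ruby
# %q behaves like single quotes: no interpolation, and \d stays as typed.
def lower_q_example
  %q{it's "quoted", no #{interpolation} and \d stays as typed}
end

# %Q behaves like double quotes: #{...} is interpolated.
def upper_q_example(name)
  %Q{it's "quoted" for #{name}}
end

puts lower_q_example          # it's "quoted", no #{interpolation} and \d stays as typed
puts upper_q_example("Ruby")  # it's "quoted" for Ruby
```

Neither form needs a backslash in front of ' or ", which is the whole attraction.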
This might be worth a read : http://en.wikibooks.org/wiki/Ruby_Programming/Strings
ruby-1.9.3-p0 :002 > a = '\\\''
=> "\\'"
ruby-1.9.3-p0 :003 > a.size
=> 2
ruby-1.9.3-p0 :004 > puts a
\'
In single quotes there are only two escape sequences: \\ and \'.