I'm reviewing a line of Ruby code in a pull request. I'm not sure if this is a bug or a feature that I haven't seen before:
puts "A string of Ruby that"\
"continues on the next line"
Is the backslash a valid character to concatenate these strings? Or is this a bug?
That is valid code.
The backslash is a line continuation. Your code has two quoted runs of text; they look like two strings, but are really just one string, because Ruby's parser joins adjacent string literals that are separated only by whitespace.
Example of three quoted runs of text that are really just one string:
"a" "b" "c"
=> "abc"
Example of three quoted runs of text that are really just one string, using \ line continuations:
"a" \
"b" \
"c"
=> "abc"
Example of three strings, using + line continuations and also concatenations:
"a" +
"b" +
"c"
=> "abc"
Other line continuation details: "Ruby interprets semicolons and newline characters as the ending of a statement. However, if Ruby encounters operators, such as +, -, or backslash at the end of a line, they indicate the continuation of a statement." - Ruby Quick Guide
The backslash character does not concatenate any strings. It prevents the line-break from meaning that those two lines are different statements. Think of the backslash as the opposite of the semicolon. The semicolon lets two statements occupy one line; the backslash lets one statement occupy two lines.
What you are not realizing is that a string literal can be written as multiple successive literals. This is legal Ruby:
s = "A string of Ruby that" "continues on the same line"
puts s
Since that is legal, it is legal to put a line break between the two string literals - but then you need the backslash, the line-continuation character, to tell Ruby that these are in fact the same statement, spread over two lines.
s = "A string of Ruby that" \
"continues on the same line"
puts s
If you omit the backslash, it is still legal, but doesn't give the result you might be hoping for; the string literal on the second line is simply thrown away.
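As a minimal sketch of that difference (the method names are made up for this illustration), compare the same statement with and without the backslash:

```ruby
# Hypothetical helper names, chosen only for this illustration.

# With the backslash, both literals belong to one statement and the parser
# joins them into a single string.
def with_backslash
  s = "A string of Ruby that " \
      "continues on the next line"
  s
end

# Without the backslash, the assignment ends at the line break. The second
# literal is a separate statement whose value is discarded.
def without_backslash
  s = "A string of Ruby that "
  "continues on the next line"
  s
end

puts with_backslash     # prints the full sentence
puts without_backslash  # prints only the first half
```

Running the second method with warnings enabled (ruby -w) should also flag the stray literal, which is a handy way to catch a forgotten backslash.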
This is not a case of concatenated strings. It is one single string. "foo" "bar" is a syntactic construct that allows you to break up a string in your code, but it is identical to "foobar". In contrast, "foo" + "bar" is the true concatenation, invoking the concatenation method + on object "foo".
You can verify this by dumping the YARV instructions. Compare:
RubyVM::InstructionSequence.compile('"foo" + "bar"').to_a
# .... [:putstring, "foo"], [:putstring, "bar"] ....
RubyVM::InstructionSequence.compile('"foo" "bar"').to_a
# .... [:putstring, "foobar"] ....
The backslash in front of the newline will cancel the newline, so it does not terminate the statement; without it, it would not be one string, but two strings in separate lines.
Related
I'm trying to replace every & in a string with \& using String#gsub in Ruby. What I see is confusing me, as I was hoping to get milk \& honey:
irb(main):009:0> puts "milk & honey".sub(/&/,'\ &')
milk \ & honey
=> nil
irb(main):010:0> puts "milk & honey".sub(/&/,'\&')
milk & honey
=> nil
irb(main):011:0> puts "milk & honey".sub(/&/,'\\&')
milk & honey
=> nil
irb(main):012:0>
This is on Ruby 2.0.0p481 on OS X. (I was using String#sub above but plan to use String#gsub for the general case with more than one & in a string.)
When you pass a string as the replacement value to String#sub (or String#gsub), it is first scanned for backreferences to the original string. Of particular interest here, the sequence \& is replaced by whatever part of the string matched the whole regular expression:
puts "bar".gsub(/./, '\\&\\&') # => bbaarr
Note that, despite appearances, the Ruby string literal '\\&\\&' represents a string with only four characters, not six:
puts '\\&\\&' # => \&\&
That's because even single-quoted Ruby strings are subject to backslash-substitution, in order to allow the inclusion of single-quotes inside single-quoted strings. Only ' or another backslash itself trigger substitution; a backslash followed by anything else is taken as simply a literal backslash. That means that you can usually get literal backslashes without doubling them:
puts '\&\&' # still => \&\&
But that's a fiddly detail to rely on, as the next character could change the interpretation. The safest practice is doubling all backslashes that you want to appear literally in a string.
Now in this case, we want to somehow get a literal backslash-ampersand back out of sub. Fortunately, just like the Ruby string parser, sub allows us to use doubled backslashes to indicate that a backslash should be taken as literal instead of as the start of a backreference. We just need to double the backslash in the string that sub receives - which means doubling both of the backslashes in the string's literal representation, taking us to a total of four backslashes in that form:
puts "milk & honey".sub(/&/, '\\\\&')
You can get away with only three backslashes here if you like living dangerously. :)
Alternatively, you can avoid all the backslash-counting and use the block form, where the replacement is obtained by calling a block of code instead of parsing a static string. Since the block is free to do any sort of substitution or string munging it wants, its return value is not scanned for backslash substitutions like the string version is:
puts "milk & honey".sub(/&/) { '\\&' }
Or the "risky" version:
puts "milk & honey".sub(/&/) { '\&' }
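Since the question mentions moving to gsub for the general case, here is a small sketch of both approaches applied to a string with more than one &. The method names are hypothetical, chosen only for this example:

```ruby
# Hypothetical helper names, used only for illustration.

# Replacement-string form: the source literal '\\\\&' is the three-character
# string \\& ; gsub then reads \\ as one literal backslash followed by &.
def escape_amp_with_string(text)
  text.gsub(/&/, '\\\\&')
end

# Block form: the block's return value is inserted as is, so the
# two-character string \& is enough.
def escape_amp_with_block(text)
  text.gsub(/&/) { '\\&' }
end

puts escape_amp_with_string("milk & honey & tea")  # prints milk \& honey \& tea
puts escape_amp_with_block("milk & honey & tea")   # prints milk \& honey \& tea
```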
Just triple the \:
puts "milk & honey".sub(/&/,'\\\&')
See the IDEONE demo
In the replacement string, \& stands for the entire match, which is why the backslash must be escaped before a literal \ can be added. More patterns available are listed below:
\& (the entire match)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
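A few of the patterns listed above can be tried out with a short sketch like the following. The helper names are hypothetical, and single-quoted literals are used so that \0, \1 and \2 reach sub and gsub unchanged:

```ruby
# Hypothetical helpers illustrating some replacement-string backreferences.

# \1 and \2 refer to the first and second captured groups.
def swap_first_two_words(text)
  text.sub(/(\w+) (\w+)/, '\2 \1')
end

# \0 (like \&) refers to the whole match.
def bracket_ampersands(text)
  text.gsub(/&/, '[\0]')
end

puts swap_first_two_words("hello world")  # prints world hello
puts bracket_ampersands("milk & honey")   # prints milk [&] honey
```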
Block representation is easier and more human-readable and maintainable:
puts "milk & honey".sub(/&/) { '\&' }
Why might you use ''' instead of """, as in Learn Ruby the Hard Way, Chapter 10 Study Drills?
There are no triple quotes in Ruby.
Two String literals which are juxtaposed are parsed as a single String literal. So,
'Hello' 'World'
#=> "HelloWorld"
is the same as
'HelloWorld'
#=> "HelloWorld"
And
'' 'Hello' ''
#=> "Hello"
is the same as
'''Hello'''
#=> "Hello"
is the same as
'Hello'
#=> "Hello"
Since adding an empty string literal does not change the result, you can add as many empty strings as you want:
""""""""""""'''''Hello'''''''''
#=> "Hello"
There are no special rules for triple single quotes vs. triple double quotes, because there are no triple quotes. The rules are simply the same as for quotes.
I assume the author confused Ruby and Python, because a triple quote will not work in Ruby the way the author thought it would. It'll just work like three separate strings ('' '' '').
For multi-line strings one could use:
%q{
your text
goes here
}
=> "\n your text\n goes here\n "
or %Q{} if you need string interpolation inside.
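To see these points side by side, here is a minimal sketch with hypothetical method names. The squiggly heredoc (<<~) is not mentioned above; it is included on the assumption that Ruby 2.3 or later is available, as another common way to write multi-line text:

```ruby
# Hypothetical helper names, used only for illustration.

# Parsed as three adjacent literals: '' 'Hello' ''
def triple_single_quoted
  '''Hello'''
end

# Parsed as "" "Hello #{name}" "" ; the middle literal still interpolates.
def triple_double_quoted(name)
  """Hello #{name}"""
end

# Squiggly heredoc (assumes Ruby 2.3+): strips the common leading
# indentation and interpolates like a double-quoted string.
def heredoc_text(name)
  <<~TEXT
    Dear #{name},
    your text goes here
  TEXT
end

puts triple_single_quoted           # prints Hello
puts triple_double_quoted("World")  # prints Hello World
puts heredoc_text("Ann")
```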
Triple-quotes ''' are the same as single quotes ' in that they don't interpolate any #{} sequences, escape characters (like "\n"), etc.
Triple-double-quotes (ugh) """ are the same as double-quotes " in that they do interpolation and escape sequences.
This is further down on the same page you linked.
The triple-quoted versions """ and ''' allow for multi-line strings... as do the singly-quoted ' and ", so I don't know why both are available.
In Ruby """ supports interpolation, ''' does not.
Rubyists use triple quotes for multi-line strings (similar to 'heredocs').
You could just as easily use a single one of these quote characters.
Just like normal strings the double quotes will allow you to use variables inside of your strings (also known as 'interpolation').
Save this to a file called multiline_example.rb and run it:
interpolation = "(but this one can use interpolation)"
single = '''
This is a multi-line string.
'''
double = """
This is also a multi-line string #{interpolation}.
"""
puts single
puts double
This is the output:
$ ruby multiline_example.rb
This is a multi-line string.
This is also a multi-line string (but this one can use interpolation).
$
Now try it the other way around:
nope = "(this will never get shown)"
single = '''
This is a multi-line string #{nope}.
'''
double = """
This is also a multi-line string.
"""
puts single
puts double
You'll get this output:
$ ruby multiline_example.rb
This is a multi-line string #{nope}.
This is also a multi-line string.
$
Note that in both examples you got some extra newlines in your output. That's because multiline strings keep any newlines inside them, and puts adds a newline to every string.
I've recently been coding in Ruby and have come from Python, where single and double quotes made no difference to how the code worked as far as I know.
I moved to Ruby to see how it worked, and to investigate the similarities between Ruby and Python.
I was using single-quoted strings once and noticed this:
hello = 'hello'
x = '#{hello} world!'
puts x
It printed '#{hello} world!' rather than 'hello world!'.
After noticing this I tried double quotes and the problem was fixed. Now I'm not sure why that is.
Do single and double quotes change this or is it because of my editor (Sublime text 3)? I'm also using Ruby version 2.0 if it works differently in previous versions.
In Ruby, double quotes are interpolated, meaning the code in #{} is evaluated as Ruby. Single quotes are treated as literals (meaning the code isn't evaluated).
var = "hello"
"#{var} world" #=> "hello world"
'#{var} world' #=> "#{var} world"
For some extra-special magic, Ruby also offers another way to create strings:
%Q() # behaves like double quotes
%q() # behaves like single quotes
For example:
%Q(#{var} world) #=> "hello world"
%q(#{var} world) #=> "#{var} world"
You should read the Literals section of the official Ruby documentation.
It is very concise, so you need to read carefully. But it explains the difference between double-quoted and single-quoted strings, and how they are equivalent to %Q/.../ and %q/.../ respectively.
If you enclose a Ruby string in single quotes, you can't use interpolation. That's how Ruby works.
Single-quoted strings don't process escape sequences (apart from \\ and \') and they don't do string interpolation.
For a better understanding, take a look at String concatenation vs. interpolation
To answer your question, you have to use "" when you want to do string interpolation:
name = 'world'
puts "Hello #{name}" # => "Hello world"
Using escape sequence:
puts 'Hello\nworld' # => "Hello\nworld"
puts "Hello\nworld" # => "Hello
world"
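One way to check how each kind of literal treats backslashes is to look at string lengths. This is a small sketch with a hypothetical method name:

```ruby
# Hypothetical helper, used only for illustration.
def escape_lengths
  {
    double_quoted_newline: "\n".length,    # 1: \n becomes a single newline character
    single_quoted_newline: '\n'.length,    # 2: a backslash followed by the letter n
    single_quoted_backslash: '\\'.length,  # 1: \\ is one of the two escapes single quotes process
    single_quoted_quote: '\''.length       # 1: \' is the other one
  }
end

p escape_lengths
```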
Ruby supports single-quoted strings, which can be used like this:
>> 'foo'
=> "foo"
>> 'foo' + 'bar'
=> "foobar"
In the example above, single-quoted and double-quoted strings are identical: using double quotes in place of single quotes gives the same output.
The problem you ran into with interpolation arises because Ruby does not interpolate into single-quoted strings. Here is an example:
>> '#{foo} bar'
=> "\#{foo} bar"
Note that irb displays return values as double-quoted strings, which is why the # is escaped with a backslash.
Single-quoted strings are often useful because they are truly literal.
For string interpolation, the essential difference between single and double quotes is that double quotes allow interpolation and escape sequences, while single quotes do not.
Let's take an example:
name = "Mike"
puts "Hello #{name} \n How are you?"
The Ruby code above interpolates the variable name, written inside #{} braces, replacing it with its value, Mike. It also prints the string How are you? on a separate line, because of the \n escape sequence.
Output:
Hello Mike
How are you?
If you do the same with single quotes, the entire string is treated as literal text and printed as is, including the escape sequence.
name = 'Mike'
puts 'Hello #{name} \n How are you?'
Output:
Hello #{name} \n How are you?
I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
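A slightly expanded sketch of the same idea, with hypothetical method names: String#unpack1 (assuming Ruby 2.4 or later) returns the first unpacked value directly, and Array#pack with the same "H*" directive goes the other way.

```ruby
# Hypothetical helper names, used only for illustration.

# Bytes to a lowercase hex string; unpack1 (assumes Ruby 2.4+) is
# equivalent to unpack("H*").first.
def bytes_to_hex(bytes)
  bytes.unpack1("H*")
end

# Hex string back to the raw bytes.
def hex_to_bytes(hex)
  [hex].pack("H*")
end

puts bytes_to_hex("\xfe\xff")  # prints feff
p hex_to_bytes("feff").bytes   # prints [254, 255]
```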
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method lets you see the escape sequences in a string by nullifying them. The result is like the string you would get if you had escaped each backslash yourself:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one backslash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan(/x(..)/) do |groups_arr| # parentheses avoid the ambiguous-first-argument warning
  result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
  "?     #An optional quote mark
  \\     #A literal '\'
  x      #An 'x'
  (..)   #Any two characters, captured in group 1
  "?     #An optional quote mark
/xm) do
  Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
  result << sprintf('%02x', int_code) # zero-pad so a byte like 0x0a becomes "0a", not "a"
end
p result
--output:--
"feff"
Why are you calling inspect? That's adding the extra quotes..
Also, putting that in double quotes means the \x is processed as an escape sequence. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"
The following two statements will generate the same result:
arr = %w(abc def ghi jkl)
and
arr = ["abc", "def", "ghi", "jkl"]
In which cases should %w be used?
In the case above, I want an array ["abc", "def", "ghi", "jkl"]. Which is the ideal way: the former (with %w) or the latter?
When to use %w[...] vs. a regular array? I'm sure you can think up reasons simply by looking at the two, and then typing them in, and thinking about what you just did.
Use %w[...] when you have a list of single words you want to turn into an array. I use it when I have parameters I want to loop over, or commands I know I'll want to add to in the future, because %w[...] makes it easy to add new elements to the array. There's less visual noise in the definition of the array.
Use a regular array of strings when you have elements that have embedded white-space that would trick %w. Use it for arrays that have to contain elements that are not strings. Enclosing the elements inside " and ' with intervening commas causes visual-noise, but it also makes it possible to create arrays with any object type.
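A short sketch of the trade-off, using made-up method names and sample data: %w splits on whitespace, a backslash-escaped space keeps a multi-word element together, and a regular array is needed once the elements are not all strings.

```ruby
# Hypothetical helper names and sample data, used only for illustration.

# Embedded whitespace "tricks" %w into producing three elements.
def cities_split_on_whitespace
  %w[New York Boston]
end

# Escaping the space with a backslash keeps "New York" as one element.
def cities_with_escaped_space
  %w[New\ York Boston]
end

# A regular array literal can hold any kind of object.
def mixed_objects
  ["New York", 42, :boston, nil]
end

p cities_split_on_whitespace  # three elements
p cities_with_escaped_space   # two elements
p mixed_objects
```

Once elements need escaped spaces, the quoted array is arguably easier to read, which matches the advice above.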
So, you pick when to use one or the other when it makes the most sense to you. It's called "programmer's choice".
As you correctly noted, they generate the same result. So, when deciding, choose one that produces simpler code. In this case, it's the %w operator. In the case of your previous question, it's the array literal.
Using %w allows you to avoid using quotes around strings.
Moreover, there are more shortcuts like these:
%W - array of double-quoted words (interpolation and escape sequences are processed)
%r - regular expression
%q - single-quoted string
%Q - double-quoted string
%x - shell command
More information is available in "What does %w(array) mean?"
This is the way I remember it:
%Q/%q is for strings
%Q is for double-quoted strings (useful for when you have multiple quote characters in a string).
Instead of doing this:
"I said \"Hello World\""
You can do:
%Q{I said "Hello World"}
%q is for single-quoted strings. Remember that single-quoted strings do not support string interpolation or escape sequences such as \n. By "do not support", I mean that a single-quoted string will not process the escape sequence as a special character; the escape sequence simply remains part of the string as literal characters.
Instead of doing this:
'I said \'Hello World\''
You can do:
%q{I said 'Hello World'}
But note that if you have an escape sequence in string, that will not be processed and instead treated as a literal backslash and n character:
result = %q{I said Hello World\n}
=> "I said Hello World\\n"
puts result
I said Hello World\n
Notice the literal \n was not treated as a line break, but it is with %Q:
result = %Q{I said Hello World\n}
=> "I said Hello World\n"
puts result
I said Hello World
%W/%w is for array elements
%W is used for double-quoted array elements. This means that it will support string interpolation and escape sequences:
Instead of doing this:
orange = "orange"
result = ["apple", "#{orange}", "grapes"]
=> ["apple", "orange", "grapes"]
you can do this:
result = %W(apple #{orange} grapes\n)
=> ["apple", "orange", "grapes\n"]
puts result
apple
orange
grapes
Notice the escape sequence \n caused a newline break after grapes. That would not happen with %w. %w is used for single-quoted array elements. And of course single quoted strings do not support interpolation and escape sequences.
Instead of doing this:
result = ['a', 'b', 'c']
you can do:
result = %w{a b c}
But look what happens when we try this:
result = %w{a b c\n}
=> ["a", "b", "c\\n"]
puts result
a
b
c\n
Remember not to confuse these constructs with %x (an alternative to the ` backtick, which is used to run unix commands), %r (an alternative to the // regular expression syntax, useful when you have a lot of / characters in your regular expressions and do not want to escape them), and finally %s (which is used for symbols).
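The constructs discussed above can be compared in one place with a sketch like this. The method name and sample values are hypothetical, and Regexp#match? assumes Ruby 2.4 or later; %x is left out because it would run a shell command.

```ruby
# Hypothetical helper and sample values, used only for illustration.
def percent_literal_samples(fruit)
  {
    # %W behaves like double quotes: interpolation and \n are processed.
    upper_w: %W[apple #{fruit} grapes\n],
    # %w behaves like single quotes: #{...} and \n stay as literal text.
    lower_w: %w[apple #{fruit} grapes\n],
    # %s builds a symbol.
    symbol: %s(some_name),
    # %r builds a regex without having to escape the / characters
    # (match? assumes Ruby 2.4+).
    path_matches: %r{/usr/local/bin}.match?("/usr/local/bin/ruby")
  }
end

p percent_literal_samples("orange")
```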