Why do two strings separated by space concatenate in Ruby? - ruby

Why does this work in Ruby:
"foo" "bar"
# => "foobar"
I'm unsure as to why the strings were concatenated instead of a syntax error being given.
I'm curious as to whether or not this is expected behavior and whether or not it's something the parser is responsible for wrangling (two strings without operators is considered a single string) or the language definition itself is specifying this behavior (implicit concat).

In C and C++, string literals next to each other are concatenated. As these languages influenced Ruby, I'd guess it inherits from there.
And it is documented in Ruby now: see this answer and this page in the Ruby repo which states:
Adjacent string literals are automatically concatenated by the interpreter:
"con" "cat" "en" "at" "ion" #=> "concatenation"
"This string contains "\
"no newlines." #=> "This string contains no newlines."
Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a percent-string is not last.
%q{a} 'b' "c" #=> "abc"
"a" 'b' %q{c} #=> NameError: uninitialized constant q

Implementation details can be found in parse.y file in Ruby source code. Specifically, here.
A Ruby string is either a tCHAR (e.g. ?q), a string1 (e.g. "q", 'q', or %q{q}), or a recursive definition of the concatenation of string1 and string itself, which results in string expressions like "foo" "bar", 'foo' "bar" or ?f "oo" 'bar' being concatenated.

Related

What can a ruby symbol (syntax) contain?

I want to create regular expression to match ruby symbols, but I need to know what the exact syntax for a symbol is.
Until now I am aware of the following:
:'string'
:"string"
:__underline
:method
:exclamation!
:question?
:#instance
:$global
It's not entirely clear what you are talking about.
If you are talking about what a Symbol can contain, the answer is: anything and everything, including newlines, arbitrary whitespace, control characters, arbitrarily weird and obscure Unicode characters, and everything else.
If you are talking about the various ways of writing Symbol literals, here's my best understanding:
bare : literal: any valid Ruby identifier (e.g. :foo, :Foo, :#foo, :##foo, :$foo, :$:, …)
single-quoted : literal: everything that's valid in a single-quoted String literal, including escape sequences such as :'\'' and :'\\'
double-quoted : literal: everything that's valid in a double-quoted String literal, including escape sequences such as :"\"", :"\\", and :"\n", as well as string interpolation, which allows you to inject the results of arbitrary Ruby code into the Symbol, e.g. :"#{if rand < 0.5 then RUBY_VERSION else ENV['HOME'] end}"
single-quoted Array of Symbols literal: everything that's valid in a single-quoted Array of Strings literal, e.g. %i|foo bar baz| (equivalent to [:foo, :bar, :baz]), %i(foo\ bar baz) (equivalent to [:'foo bar', :baz]), %i:foo bar: (equivalent to [:foo, :bar])
double-quoted Array of Symbols literal: everything that's valid in a double-quoted Array of Strings literal, e.g. %I|foo #{bar} baz|, etc.
Symbol hash keys in the key: value syntax: every valid Ruby label, e.g. {foo: 42}
Symbol hash keys in the quoted 'key': value syntax: every valid Ruby String literal, including escape sequences and interpolation, e.g. {"foo\n#{bar}": 42}
There are of course a lot of other expressions that evaluate to Symbols:
method definition expressions: def foo;end # => :foo
String#to_sym (alias String#intern): 'foo bar'.to_sym # => :'foo bar'
really, any method that may return a Symbol
http://www.cse.buffalo.edu/~regan/cse305/RubyBNF.pdf enumerates the context-free grammar productions that define Ruby's syntax. CFGs are inherently more powerful than REs, so you might want to consider a different tool for this job--but you can certainly look at this document and try to construct a regexp that matches all cases.

How String concatenation works in ruby? [duplicate]

This question already has answers here:
Why do two strings separated by space concatenate in Ruby?
(2 answers)
Closed 8 years ago.
How the following line of code concatenates a string in ruby?
2.1.0 :052 > value = "Kamesh" "Waran"
=> "KameshWaran"
I understand that '+' is a method on String class which concatenates the strings passed. How the space(' ') can be the operator/method?
Can anyone elaborate how the space(' ') concatenate strings?
The space is not an operator. This only works for string literals, and is just part of the literal syntax, like the double-quotes. If you have two string literals with nothing but whitespace between them, they get turned into a single string. It's a convention borrowed from later versions of C.
irb(main):001:0> foobar = "foo" "bar"
=> "foobar"
irb(main):002:0> foo="foo"
=> "foo"
irb(main):003:0> bar="bar"
=> "bar"
irb(main):004:0> foo bar
NoMethodError: undefined method `foo' for main:Object
from (irb):4
from /usr/local/var/rbenv/versions/2.1.3/bin/irb:11:in `<main>'
irb(main):005:0>
If you do a search on this site you get an answer.
Found on :
Why do two strings separated by space concatenate in Ruby?
Implementation details can be found in parse.y file in Ruby source
code. Specifically, here.
A Ruby string is either a tCHAR (e.g. ?q), a string1 (e.g. "q", 'q',
or %q{q}), or a recursive definition of the concatenation of string1
and string itself, which results in string expressions like "foo"
"bar", 'foo' "bar" or ?f "oo" 'bar' being concatenated.

When to use %w?

The following two statements will generate the same result:
arr = %w(abc def ghi jkl)
and
arr = ["abc", "def", "ghi", "jkl"]
In which cases should %w be used?
In the case above, I want an array ["abc", "def", "ghi", "jkl"]. Which is the ideal way: the former (with %w) or the later?
When to use %w[...] vs. a regular array? I'm sure you can think up reasons simply by looking at the two, and then typing them in, and thinking about what you just did.
Use %w[...] when you have a list of single words you want to turn into an array. I use it when I have parameters I want to loop over, or commands I know I'll want to add to in the future, because %w[...] makes it easy to add new elements to the array. There's less visual noise in the definition of the array.
Use a regular array of strings when you have elements that have embedded white-space that would trick %w. Use it for arrays that have to contain elements that are not strings. Enclosing the elements inside " and ' with intervening commas causes visual-noise, but it also makes it possible to create arrays with any object type.
So, you pick when to use one or the other when it makes the most sense to you. It's called "programmer's choice".
As you correctly noted, they generate the same result. So, when deciding, choose one that produces simpler code. In this case, it's the %w operator. In the case of your previous question, it's the array literal.
Using %w allows you to avoid using quotes around strings.
Moreover, there are more shortcuts like these:
%W - double quotes
%r - regular expression
%q - single-quoted string
%Q - double-quoted string
%x - shell command
More information is available in "What does %w(array) mean?"
This is the way I remember it:
%Q/%q is for strings
%Q is for double-quoted strings (useful for when you have multiple quote characters in a string).
Instead of doing this:
“I said \“Hello World\””
You can do:
%Q{I said “Hello World”}
%q is for single-quoted strings (remember single quoted strings do not support string interpolation or escape sequences e.g. \n. And when I say does not "support", I mean that single quoted strings will need process the escape sequence as a special character, in other words, the escape sequence will just be part of the string literal)
Instead of doing this:
‘I said \’Hello World\’’
You can do:
%q{I said 'Hello World'}
But note that if you have an escape sequence in string, that will not be processed and instead treated as a literal backslash and n character:
result = %q{I said Hello World\n}
=> "I said Hello World\\n"
puts result
I said Hello World\n
Notice the literal \n was not treated as a line break, but it is with %Q:
result = %Q{I said Hello World\n}
=> "I said Hello World\n"
puts result
I said Hello World
%W/%w is for array elements
%W is used for double-quoted array elements. This means that it will support string interpolation and escape sequences:
Instead of doing this:
orange = "orange"
result = ["apple", "#{orange}", "grapes"]
=> ["apple", "orange", "grapes”]
you can do this:
result = %W(apple #{orange} grapes\n)
=> ["apple", "orange", "grapes\n"]
puts result
apple
orange
grapes
Notice the escape sequence \n caused a newline break after grapes. That would not happen with %w. %w is used for single-quoted array elements. And of course single quoted strings do not support interpolation and escape sequences.
Instead of doing this:
result = [‘a’, ‘b’, ‘c’]
you can do:
result = %w{a b c}
But look what happens when we try this:
result = %w{a b c\n}
=> ["a", "b", "c\\n"]
puts result
a
b
c\n
Remember do not confuse these constructs with %x (alternative for ` backtick which is used to run unix commands), %r (alternative for // regular expression syntax useful when you have a lot of / characters in your regular expressions and do not want to escape them) and finally %s (which is sued for symbols).

What does %w(array) mean?

I'm looking at the documentation for FileUtils.
I'm confused by the following line:
FileUtils.cp %w(cgi.rb complex.rb date.rb), '/usr/lib/ruby/1.6'
What does the %w mean? Can you point me to the documentation?
%w(foo bar) is a shortcut for ["foo", "bar"]. Meaning it's a notation to write an array of strings separated by spaces instead of commas and without quotes around them. You can find a list of ways of writing literals in zenspider's quickref.
I think of %w() as a "word array" - the elements are delimited by spaces and it returns an array of strings.
Here are all % literals:
%w() array of strings
%r() regular expression.
%q() string
%x() a shell command (returning the output string)
%i() array of symbols (Ruby >= 2.0.0)
%s() symbol
%() (without letter) shortcut for %Q()
The delimiters ( and ) can be replaced with a lot of variations, like [ and ], |, !, etc.
When using a capital letter %W() you can use string interpolation #{variable}, similar to the " and ' string delimiters. This rule works for all the other % literals as well.
abc = 'a b c'
%w[1 2#{abc} d] #=> ["1", "2\#{abc}", "d"]
%W[1 2#{abc} d] #=> ["1", "2a b c", "d"]
There is also %s that allows you to create any symbols, for example:
%s|some words| #Same as :'some words'
%s[other words] #Same as :'other words'
%s_last example_ #Same as :'last example'
Since Ruby 2.0.0 you also have:
%i( a b c ) # => [ :a, :b, :c ]
%i[ a b c ] # => [ :a, :b, :c ]
%i_ a b c _ # => [ :a, :b, :c ]
# etc...
%W and %w allow you to create an Array of strings without using quotes and commas.
Though it's an old post, the question keep coming up and the answers don't always seem clear to me, so, here's my thoughts:
%w and %W are examples of General Delimited Input types, that relate to Arrays. There are other types that include %q, %Q, %r, %x and %i.
The difference between the upper and lower case version is that it gives us access to the features of single and double quotes. With single quotes and (lowercase) %w, we have no code interpolation (#{someCode}) and a limited range of escape characters that work (\\, \n). With double quotes and (uppercase) %W we do have access to these features.
The delimiter used can be any character, not just the open parenthesis. Play with the examples above to see that in effect.
For a full write up with examples of %w and the full list, escape characters and delimiters, have a look at "Ruby - %w vs %W – secrets revealed!"
Instead of %w() we should use %w[]
According to Ruby style guide:
Prefer %w to the literal array syntax when you need an array of words (non-empty strings without spaces and special characters in them). Apply this rule only to arrays with two or more elements.
# bad
STATES = ['draft', 'open', 'closed']
# good
STATES = %w[draft open closed]
Use the braces that are the most appropriate for the various kinds of percent literals.
[] for array literals(%w, %i, %W, %I) as it is aligned with the standard array literals.
# bad
%w(one two three)
%i(one two three)
# good
%w[one two three]
%i[one two three]
For more read here.
Excerpted from the documentation for Percent Strings at http://ruby-doc.org/core/doc/syntax/literals_rdoc.html#label-Percent+Strings:
Besides %(...) which creates a String, the % may create other types of object. As with strings, an uppercase letter allows interpolation and escaped characters while a lowercase letter disables them.
These are the types of percent strings in ruby:
...
%w: Array of Strings
I was given a bunch of columns from a CSV spreadsheet of full names of users and I needed to keep the formatting, with spaces. The easiest way I found to get them in while using ruby was to do:
names = %(Porter Smith
Jimmy Jones
Ronald Jackson).split("\n")
This highlights that %() creates a string like "Porter Smith\nJimmyJones\nRonald Jackson" and to get the array you split the string on the "\n" ["Porter Smith", "Jimmy Jones", "Ronald Jackson"]
So to answer the OP's original question too, they could have wrote %(cgi\ spaeinfilename.rb;complex.rb;date.rb).split(';') if there happened to be space when you want the space to exist in the final array output.

What does the question mark at the end of a method name mean in Ruby?

What is the purpose of the question mark operator in Ruby?
Sometimes it appears like this:
assert !product.valid?
sometimes it's in an if construct.
It is a code style convention; it indicates that a method returns a boolean value (true or false) or an object to indicate a true value (or “truthy” value).
The question mark is a valid character at the end of a method name.
https://docs.ruby-lang.org/en/2.0.0/syntax/methods_rdoc.html#label-Method+Names
Also note ? along with a character acts as shorthand for a single-character string literal since Ruby 1.9.
For example:
?F # => is the same as "F"
This is referenced near the bottom of the string literals section of the ruby docs:
There is also a character literal notation to represent single
character strings, which syntax is a question mark (?) followed by a
single character or escape sequence that corresponds to a single
codepoint in the script encoding:
?a #=> "a"
?abc #=> SyntaxError
?\n #=> "\n"
?\s #=> " "
?\\ #=> "\\"
?\u{41} #=> "A"
?\C-a #=> "\x01"
?\M-a #=> "\xE1"
?\M-\C-a #=> "\x81"
?\C-\M-a #=> "\x81", same as above
?あ #=> "あ"
Prior to Ruby 1.9, this returned the ASCII character code of the character. To get the old behavior in modern Ruby, you can use the #ord method:
?F.ord # => will return 70
It's a convention in Ruby that methods that return boolean values end in a question mark. There's no more significance to it than that.
In your example it's just part of the method name. In Ruby you can also use exclamation points in method names!
Another example of question marks in Ruby would be the ternary operator.
customerName == "Fred" ? "Hello Fred" : "Who are you?"
It may be worth pointing out that ?s are only allowed in method names, not variables. In the process of learning Ruby, I assumed that ? designated a boolean return type so I tried adding them to flag variables, leading to errors. This led to me erroneously believing for a while that there was some special syntax involving ?s.
Relevant: Why can't a variable name end with `?` while a method name can?
In your example
product.valid?
Is actually a function call and calls a function named valid?. Certain types of "test for condition"/boolean functions have a question mark as part of the function name by convention.
I believe it's just a convention for things that are boolean. A bit like saying "IsValid".
It's also used in regular expressions, meaning "at most one repetition of the preceding character"
for example the regular expression /hey?/ matches with the strings "he" and "hey".
It's also a common convention to use with the first argument of the test method from Kernel#test
test ?d, "/dev" # directory exists?
# => true
test ?-, "/etc/hosts", "/etc/hosts" # are the files identical
# => true
as seen in this question here

Resources