When to use %w? - ruby

The following two statements will generate the same result:
arr = %w(abc def ghi jkl)
and
arr = ["abc", "def", "ghi", "jkl"]
In which cases should %w be used?
In the case above, I want an array ["abc", "def", "ghi", "jkl"]. Which is the ideal way: the former (with %w) or the latter?

When to use %w[...] vs. a regular array? I'm sure you can think up reasons simply by looking at the two, and then typing them in, and thinking about what you just did.
Use %w[...] when you have a list of single words you want to turn into an array. I use it when I have parameters I want to loop over, or commands I know I'll want to add to in the future, because %w[...] makes it easy to add new elements to the array. There's less visual noise in the definition of the array.
Use a regular array of strings when you have elements with embedded white-space that would trick %w. Use it for arrays that have to contain elements that are not strings. Enclosing the elements inside " or ' with intervening commas causes visual noise, but it also makes it possible to create arrays with any object type.
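A minimal sketch of the trade-offs described above (the example words are made up for illustration). %w and the array literal build equal arrays. Embedded spaces split an element unless they are escaped. %w only ever produces Strings.

```ruby
# A list of single words: %w avoids the quotes and commas.
commands = %w[start stop restart]
literal  = ["start", "stop", "restart"]
p(commands == literal)            # => true

# Embedded white-space "tricks" %w: every space starts a new element.
p(%w[New York Los Angeles])       # => ["New", "York", "Los", "Angeles"]

# Keep the spaces with a regular array literal, or escape them with a backslash.
cities  = ["New York", "Los Angeles"]
escaped = %w[New\ York Los\ Angeles]
p(cities == escaped)              # => true

# %w always produces Strings, so mixed types need a regular array literal.
p(%w[1 two])                      # => ["1", "two"]
mixed = [1, :two, "three"]
p(mixed.map(&:class))             # => [Integer, Symbol, String]
```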
So, you pick when to use one or the other when it makes the most sense to you. It's called "programmer's choice".

As you correctly noted, they generate the same result. So, when deciding, choose one that produces simpler code. In this case, it's the %w operator. In the case of your previous question, it's the array literal.

Using %w allows you to avoid using quotes around strings.
Moreover, there are more shortcuts like these:
%W - array of double-quoted strings (interpolation, escapes)
%r - regular expression
%q - single-quoted string
%Q - double-quoted string
%x - shell command
More information is available in "What does %w(array) mean?"

This is the way I remember it:
%Q/%q is for strings
%Q is for double-quoted strings (useful for when you have multiple quote characters in a string).
Instead of doing this:
"I said \"Hello World\""
You can do:
%Q{I said "Hello World"}
%q is for single-quoted strings (remember that single-quoted strings do not support string interpolation or escape sequences such as \n; the only backslash sequences they recognize are \\ and an escaped closing quote. In other words, a single-quoted string does not process an escape sequence as a special character: the backslash and the following character simply become part of the string literal)
Instead of doing this:
'I said \'Hello World\''
You can do:
%q{I said 'Hello World'}
But note that an escape sequence in the string will not be processed; instead it is treated as a literal backslash followed by an n character:
result = %q{I said Hello World\n}
=> "I said Hello World\\n"
puts result
I said Hello World\n
Notice the literal \n was not treated as a line break, but it is with %Q:
result = %Q{I said Hello World\n}
=> "I said Hello World\n"
puts result
I said Hello World
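One detail the %q examples leave out: like a single-quoted string, %q still gives a backslash special meaning before another backslash and before the delimiter, and balanced pairs of the delimiter may nest. A small sketch using only core Ruby:

```ruby
# In %q{...}, "\\" is one backslash and "\}" is a literal closing brace.
escaped = %q{a\\b \} c}
p(escaped)          # => "a\\b } c"
p(escaped.length)   # => 7

# Balanced pairs of the delimiter can nest without any escaping.
nested = %q{if (x) { y }}
p(nested)           # => "if (x) { y }"

# Other backslash sequences, such as \n, stay as two characters.
p(%q{\n}.length)    # => 2
```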
%W/%w is for array elements
%W is used for double-quoted array elements. This means that it will support string interpolation and escape sequences:
Instead of doing this:
orange = "orange"
result = ["apple", "#{orange}", "grapes"]
=> ["apple", "orange", "grapes"]
you can do this:
result = %W(apple #{orange} grapes\n)
=> ["apple", "orange", "grapes\n"]
puts result
apple
orange
grapes
Notice the escape sequence \n caused a newline break after grapes. That would not happen with %w. %w is used for single-quoted array elements. And of course single quoted strings do not support interpolation and escape sequences.
Instead of doing this:
result = ['a', 'b', 'c']
you can do:
result = %w{a b c}
But look what happens when we try this:
result = %w{a b c\n}
=> ["a", "b", "c\\n"]
puts result
a
b
c\n
Remember not to confuse these constructs with %x (an alternative to backticks, which run shell commands), %r (an alternative to the // regular expression syntax, useful when your pattern contains many / characters that you don't want to escape) and finally %s (which is used for symbols).
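A short sketch of the two literals mentioned last, %r and %s. The path is an arbitrary example, and %x is left out because it would run a shell command:

```ruby
path = "/usr/lib/ruby/1.6/cgi.rb"

# With // every slash in the pattern must be escaped...
slashes = /\/usr\/lib\/ruby\/(\d+\.\d+)/
# ...while %r{} lets the slashes appear as-is.
percent = %r{/usr/lib/ruby/(\d+\.\d+)}

p(path[slashes, 1])   # => "1.6"
p(path[percent, 1])   # => "1.6"

# %s builds a Symbol, the same one :"..." would build.
p(%s{some words} == :"some words")   # => true
```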

Related

Triple single quote vs triple double quote in Ruby

Why might you use ''' instead of """, as in Learn Ruby the Hard Way, Chapter 10 Study Drills?
There are no triple quotes in Ruby.
Two String literals which are juxtaposed are parsed as a single String literal. So,
'Hello' 'World'
#=> "HelloWorld"
is the same as
'HelloWorld'
#=> "HelloWorld"
And
'' 'Hello' ''
#=> "Hello"
is the same as
'''Hello'''
#=> "Hello"
is the same as
'Hello'
#=> "Hello"
Since adding an empty string literal does not change the result, you can add as many empty strings as you want:
""""""""""""'''''Hello'''''''''
#=> "Hello"
There are no special rules for triple single quotes vs. triple double quotes, because there are no triple quotes. The rules are simply the same as for quotes.
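A quick way to check this in a script: each ''' is parsed as an empty literal next to 'Hello', and the parser joins juxtaposed literals into one String.

```ruby
# Juxtaposed literals are joined by the parser into one String.
p('Hello' 'World')          # => "HelloWorld"

# ''' is just an empty literal followed by the start of the next one.
p('''Hello''' == 'Hello')   # => true

# The same holds for double quotes, including interpolation.
name = "Ruby"
p("""Hi #{name}""")         # => "Hi Ruby"
```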
I assume the author confused Ruby and Python, because a triple-quote will not work in Ruby the way the author thought it would. It'll just work like three separate strings ('' '' '').
For multi-line strings one could use:
%q{
your text
goes here
}
=> "\nyour text\ngoes here\n"
or %Q{} if you need string interpolation inside.
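Heredocs are another option for multi-line text. A sketch, assuming Ruby 2.3 or later for the squiggly heredoc <<~, which strips the common leading indentation and does not start with a newline:

```ruby
name = "Ruby"

# %q keeps every character between the braces, including the leading newline.
quoted = %q{
your text
goes here
}
p(quoted)   # => "\nyour text\ngoes here\n"

# A squiggly heredoc removes the common indentation and interpolates like "".
# Writing the opener as <<~'TEXT' would disable interpolation, like %q.
doc = <<~TEXT
  Hello, #{name}
  goes here
TEXT
p(doc)      # => "Hello, Ruby\ngoes here\n"
```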
Triple-quotes ''' are the same as single quotes ' in that they don't interpolate any #{} sequences, escape characters (like "\n"), etc.
Triple-double-quotes (ugh) """ are the same as double-quotes " in that they do interpolation and escape sequences.
This is further down on the same page you linked.
The triple-quoted versions """ and ''' allow multi-line strings... but so do the plain ' and ", so I don't know why both are available.
In Ruby """ supports interpolation, ''' does not.
Rubyists use triple quotes for multi-line strings (similar to 'heredocs').
You could just as easily use one of these characters.
Just like normal strings the double quotes will allow you to use variables inside of your strings (also known as 'interpolation').
Save this to a file called multiline_example.rb and run it:
interpolation = "(but this one can use interpolation)"
single = '''
This is a multi-line string.
'''
double = """
This is also a multi-line string #{interpolation}.
"""
puts single
puts double
This is the output:
$ ruby multiline_example.rb
This is a multi-line string.
This is also a multi-line string (but this one can use interpolation).
$
Now try it the other way around:
nope = "(this will never get shown)"
single = '''
This is a multi-line string #{nope}.
'''
double = """
This is also a multi-line string.
"""
puts single
puts double
You'll get this output:
$ ruby multiline_example.rb
This is a multi-line string #{nope}.
This is also a multi-line string.
$
Note that in both examples you got some extra newlines in your output. That's because multi-line strings keep any newlines inside them, including the line break right after the opening quotes; puts then adds a newline only if the string doesn't already end with one.
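A sketch that captures puts output in a StringIO (from the standard library) to check where the extra lines come from:

```ruby
require "stringio"

single = '''
This is a multi-line string.
'''
# The literal begins with the line break that follows the opening quotes.
p(single)                 # => "\nThis is a multi-line string.\n"

# puts writes the string unchanged here, because it already ends with "\n".
out = StringIO.new
out.puts single
p(out.string == single)   # => true

# Stripping the surrounding newlines removes the blank line.
trimmed = StringIO.new
trimmed.puts single.strip
p(trimmed.string)         # => "This is a multi-line string.\n"
```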

Ruby backslash to continue string on a new line?

I'm reviewing a line of Ruby code in a pull request. I'm not sure if this is a bug or a feature that I haven't seen before:
puts "A string of Ruby that"\
"continues on the next line"
Is the backslash a valid character to concatenate these strings? Or is this a bug?
That is valid code.
The backslash is a line continuation. Your code has two quoted runs of text; the runs appear like two strings, but are really just one string because Ruby concatenates whitespace-separated runs.
Example of three quoted runs of text that are really just one string:
"a" "b" "c"
=> "abc"
Example of three quoted runs of text that are really just one string, using \ line continuations:
"a" \
"b" \
"c"
=> "abc"
Example of three strings, using + line continuations and also concatenations:
"a" +
"b" +
"c"
=> "abc"
Other line continuation details: "Ruby interprets semicolons and newline characters as the ending of a statement. However, if Ruby encounters operators, such as +, -, or backslash at the end of a line, they indicate the continuation of a statement." - Ruby Quick Guide
The backslash character does not concatenate any strings. It prevents the line-break from meaning that those two lines are different statements. Think of the backslash as the opposite of the semicolon. The semicolon lets two statements occupy one line; the backslash lets one statement occupy two lines.
What you are not realizing is that a string literal can be written as multiple successive literals. This is legal Ruby:
s = "A string of Ruby that" "continues on the same line"
puts s
Since that is legal, it is legal to put a line break between the two string literals - but then you need the backslash, the line-continuation character, to tell Ruby that these are in fact the same statement, spread over two lines.
s = "A string of Ruby that" \
"continues on the same line"
puts s
If you omit the backslash, it is still legal, but doesn't give the result you might be hoping for; the string literal on the second line is simply thrown away.
This is not a case of concatenated strings. It is one single string. "foo" "bar" is a syntactic construct that allows you to break up a string in your code, but it is identical to "foobar". In contrast, "foo" + "bar" is the true concatenation, invoking the concatenation method + on object "foo".
You can verify this by dumping the YARV instructions. Compare:
RubyVM::InstructionSequence.compile('"foo" + "bar"').to_a
# .... [:putstring, "foo"], [:putstring, "bar"] ....
RubyVM::InstructionSequence.compile('"foo" "bar"').to_a
# .... [:putstring, "foobar"] ....
The backslash in front of the newline will cancel the newline, so it does not terminate the statement; without it, it would not be one string, but two strings in separate lines.
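A sketch comparing the three forms discussed in these answers. All three produce equal strings; the first two are a single literal at parse time, while + is a method call at run time:

```ruby
# Adjacent literals on one line form a single String.
s1 = "A string of Ruby that " "continues on the same line"

# The backslash keeps the statement going, so this is the same single literal.
s2 = "A string of Ruby that " \
     "continues on the same line"

# A trailing + also continues the statement, but it calls String#+.
s3 = "A string of Ruby that " +
     "continues on the same line"

p(s1 == s2 && s2 == s3)   # => true
```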

Eloquent way to format string in Ruby?

str = 'foo_bar baz __goo'
Should print as
Foo Bar Baz Goo
Tried to use split /(\s|_)/, but it returns '_' and ' ' and multiple spaces...?
Ruby 1.9.3
try this:
str.split(/[\s_]+/).map(&:classify).join(" ")
if you have access to ActiveSupport (note that classify also singularizes, so "posts" would become "Post"), or
str.split(/[\s_]+/).map(&:capitalize).join(" ")
if you want plain Ruby.
Assuming you mean /(\s|_)/ (direction of the slashes matters!), your regular expression is pretty close. The reason you're getting the delimiters (spaces and underscores) in your result is the parentheses: they instruct the splitter to include the delimiters in the returned array.
The reason you are getting extra empty strings is that you are splitting on \s, which matches exactly one space (or tab), or '_', which matches exactly one underscore. If you want to treat any number of spaces or underscores as a single delimiter, you need to add + to your regex - it means "one or more of the previous thing".
But \s|_+ means "a space, or one or more underscores". You want to apply the + to the whole expression, not just the _. That brings us back to the parentheses. In this case, you want to group the two alternatives together without capturing (and returning) them; the syntax for that is (?:...). So this is the result:
str.split(/(?:\s|_)+/)
Now, if you want to normalize case, you want to run capitalize on each string, which you can do with map, like this:
str.split(/(?:\s|_)+/).map { |s| s.capitalize }
or use the shortcut:
str.split(/(?:\s|_)+/).map(&:capitalize)
So far, all these solutions return an array of strings, which you can do a variety of things with. But if you just want to put them back together into a single string, you can use join. For instance, to put them together with a single space between them:
str.split(/(?:\s|_)+/).map(&:capitalize).join ' '
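Putting the steps together on the question's string, with the capturing version included for comparison:

```ruby
str = 'foo_bar baz __goo'

# A capturing group makes split return the delimiters too, and splitting on
# one character at a time leaves empty strings between consecutive delimiters.
p(str.split(/(\s|_)/))
# => ["foo", "_", "bar", " ", "baz", " ", "", "_", "", "_", "goo"]

# A non-capturing group with + treats each run of delimiters as one.
p(str.split(/(?:\s|_)+/))                              # => ["foo", "bar", "baz", "goo"]
p(str.split(/(?:\s|_)+/).map(&:capitalize).join(' '))  # => "Foo Bar Baz Goo"
```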
Try splitting the string on:
[\s_]+
Use /[\ _]+/: it looks for one or more occurrences of a space or an underscore, so it can eat up multiple underscores, spaces, or a combination of both. After that you get an array, so you use map to transform each element. Then you can put them back together using join. See examples:
Get them in a list -
str.split(/[\ _]+/).map {|s| s.capitalize }
=> ["Foo", "Bar", "Baz", "Goo"]
Get them as a whole string -
str.split(/[\ _]+/).map {|s| s.capitalize }.join(" ")
=> "Foo Bar Baz Goo"
Here is a pure Ruby version:
str = 'foo_bar baz __goo'
str.split(/[ _]+/).map{|s| s[0].upcase+s[1..-1]}.join(" ")
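One edge case worth checking with the split-based versions: a leading space or underscore makes split return an empty first element, and ""[0] is nil, so s[0].upcase would raise NoMethodError. A sketch using scan instead, which returns only the words; the helper name title_words is hypothetical:

```ruby
# Hypothetical helper: collect runs of characters that are neither
# white-space nor underscores, then capitalize each run.
def title_words(str)
  str.scan(/[^\s_]+/).map(&:capitalize).join(" ")
end

p('_foo'.split(/[ _]+/))                   # => ["", "foo"]
p(title_words('foo_bar baz __goo'))        # => "Foo Bar Baz Goo"
p(title_words('__leading and trailing_ ')) # => "Leading And Trailing"
```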

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
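The same cleanup without mutating the original array, using map instead of each with gsub!. The sample words are made up to include diacritics:

```ruby
words   = ["Marc-André!", "b12398!", "ça va?"]

# [[:alnum:]] is Unicode-aware for UTF-8 strings, so é and ç are kept.
cleaned = words.map { |s| s.gsub(/[^[:alnum:]]/, "") }
p(cleaned)   # => ["MarcAndré", "b12398", "çava"]

# map returned new strings; the original array is untouched.
p(words)     # => ["Marc-André!", "b12398!", "ça va?"]
```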
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using Ruby since you tagged that in your post. You could go through the array, put each string through a test using a regexp, and remove or keep it based on the regexp you use.
A regexp you might use might go something like this:
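[^.!,'"^#-]
That will match any character that is not one of the characters inside the brackets. (Inside a character class, a ^ that is not first is literal and a - placed last is literal; writing ^-# instead would make Ruby read it as an invalid range.) However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.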
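A sketch of that blacklist idea in Ruby, using the characters listed in the question. The class is written as [.!,'"^#-] so that the ^ (not first) and the - (last) are both literal:

```ruby
# Characters from the question: . ! , ' " ^ - #
BLACKLIST = /[.!,'"^#-]/

p("Marc-André, \"hi\"!".gsub(BLACKLIST, ""))   # => "MarcAndré hi"
p("#1 ^caret^".gsub(BLACKLIST, ""))            # => "1 caret"
```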
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A...\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.
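A sketch of why the anchors matter, assuming Ruby 2.4+ for String#match?: ^ and $ match at line boundaries, so a multi-line string can pass a check it shouldn't.

```ruby
s = "hello\n<script>"

# ^ and $ are line anchors: only the first line has to be alphanumeric.
p(s.match?(/^\p{Alnum}+$/))     # => true
# \A and \z anchor to the whole string.
p(s.match?(/\A\p{Alnum}+\z/))   # => false

# \Z (capital) also allows one trailing newline; \z does not.
p("abc\n".match?(/\A\p{Alnum}+\Z/))   # => true
p("abc\n".match?(/\A\p{Alnum}+\z/))   # => false
```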

What does %w(array) mean?

I'm looking at the documentation for FileUtils.
I'm confused by the following line:
FileUtils.cp %w(cgi.rb complex.rb date.rb), '/usr/lib/ruby/1.6'
What does the %w mean? Can you point me to the documentation?
%w(foo bar) is a shortcut for ["foo", "bar"]. Meaning it's a notation to write an array of strings separated by spaces instead of commas and without quotes around them. You can find a list of ways of writing literals in zenspider's quickref.
I think of %w() as a "word array" - the elements are delimited by spaces and it returns an array of strings.
Here are the common % literals:
%w() array of strings
%r() regular expression
%q() string
%x() a shell command (returning the output string)
%i() array of symbols (Ruby >= 2.0.0)
%s() symbol
%() (without letter) shortcut for %Q()
The delimiters ( and ) can be replaced with a lot of variations, like [ and ], |, !, etc.
When using a capital letter %W() you can use string interpolation #{variable}, similar to the difference between the " and ' string delimiters. The same uppercase/lowercase rule applies to %q/%Q and %i/%I.
abc = 'a b c'
%w[1 2#{abc} d] #=> ["1", "2\#{abc}", "d"]
%W[1 2#{abc} d] #=> ["1", "2a b c", "d"]
There is also %s that allows you to create any symbols, for example:
%s|some words| #Same as :'some words'
%s[other words] #Same as :'other words'
%s_last example_ #Same as :'last example'
Since Ruby 2.0.0 you also have:
%i( a b c ) # => [ :a, :b, :c ]
%i[ a b c ] # => [ :a, :b, :c ]
%i_ a b c _ # => [ :a, :b, :c ]
# etc...
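A small sketch of the %i/%I difference and of interchangeable delimiters (%I needs Ruby 2.0 or later):

```ruby
n = 2

# Lowercase %i: no interpolation, the text b#{n} is kept verbatim.
lower = %i[a b#{n}]
p(lower.map(&:to_s))   # => ["a", "b\#{n}"]

# Uppercase %I: interpolation happens before each Symbol is built.
upper = %I[a b#{n}]
p(upper)               # => [:a, :b2]

# The delimiter is interchangeable; these all build the same Array.
p(%w(x y) == %w[x y] && %w[x y] == %w|x y| && %w|x y| == %w<x y>)   # => true
```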
%W and %w allow you to create an Array of strings without using quotes and commas.
Though it's an old post, the question keeps coming up and the answers don't always seem clear to me, so here are my thoughts:
%w and %W are examples of General Delimited Input types, that relate to Arrays. There are other types that include %q, %Q, %r, %x and %i.
The difference between the uppercase and lowercase versions is that they give us access to the features of double and single quotes respectively. With single quotes and (lowercase) %w, we have no code interpolation (#{someCode}) and only a few backslash sequences work (\\, an escaped delimiter, and in %w an escaped space). With double quotes and (uppercase) %W we do have access to these features, including escapes such as \n.
The delimiter used can be any character, not just the open parenthesis. Play with the examples above to see that in effect.
For a full write up with examples of %w and the full list, escape characters and delimiters, have a look at "Ruby - %w vs %W – secrets revealed!"
Instead of %w() we should use %w[]
According to Ruby style guide:
Prefer %w to the literal array syntax when you need an array of words (non-empty strings without spaces and special characters in them). Apply this rule only to arrays with two or more elements.
# bad
STATES = ['draft', 'open', 'closed']
# good
STATES = %w[draft open closed]
Use the braces that are the most appropriate for the various kinds of percent literals.
[] for array literals (%w, %i, %W, %I), as it is aligned with the standard array literals.
# bad
%w(one two three)
%i(one two three)
# good
%w[one two three]
%i[one two three]
For more, see the Ruby Style Guide.
Excerpted from the documentation for Percent Strings at http://ruby-doc.org/core/doc/syntax/literals_rdoc.html#label-Percent+Strings:
Besides %(...) which creates a String, the % may create other types of object. As with strings, an uppercase letter allows interpolation and escaped characters while a lowercase letter disables them.
These are the types of percent strings in ruby:
...
%w: Array of Strings
I was given a bunch of columns from a CSV spreadsheet of full names of users and I needed to keep the formatting, with spaces. The easiest way I found to get them in while using ruby was to do:
names = %(Porter Smith
Jimmy Jones
Ronald Jackson).split("\n")
This highlights that %() creates a string like "Porter Smith\nJimmy Jones\nRonald Jackson", and to get the array you split the string on "\n", giving ["Porter Smith", "Jimmy Jones", "Ronald Jackson"].
So, to answer the OP's original question too, they could have written %(cgi\ spaceinfilename.rb;complex.rb;date.rb).split(';') if a file name happened to contain a space that needed to survive in the final array.
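For the file-name case specifically, %w itself can keep a space inside an element if the space is backslash-escaped, and the multi-line approach can use String#lines(chomp: true) (Ruby 2.4+) instead of split. The file name below is hypothetical:

```ruby
# A backslash-escaped space stays inside one %w element.
files = %w(cgi\ file.rb complex.rb date.rb)
p(files)   # => ["cgi file.rb", "complex.rb", "date.rb"]

# The %() approach from the answer, splitting on line breaks.
names = %(Porter Smith
Jimmy Jones
Ronald Jackson).lines(chomp: true)
p(names)   # => ["Porter Smith", "Jimmy Jones", "Ronald Jackson"]
```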
