What does `:|` do in Ruby? - ruby

I found the following syntax in another question, and I have been unable to find any documentation on what it's doing. I'm assuming it's syntactic sugar of some sort:
[array1, array2, array3, array4].compact.reduce([], :|)
It allows one of the arrays to be nil instead of an array, and seems to work like a charm. Can anyone point me in the right direction to understand what is going on?
The original question is here: Merge arrays if not nil and not empty

It's a symbol, like :test, but a single character symbol.
The two-argument version of reduce accepts a method name as its second argument; here that name is :|, meaning the | method. On arrays, | is a set operation: it "or"s the arrays together, giving you the unique elements found in either array (their union). This isn't a particularly idiomatic use of reduce; you could achieve the same thing by replacing the reduce call with .flatten.uniq, as long as the arrays hold no nested arrays (flatten flattens every level, so .flatten(1).uniq is the closer equivalent).
If you wanted to add the numbers, you could use :+, or to multiply you could use :*.
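A minimal sketch of the behaviour described above. The helper name merge_present and the sample values are made up for illustration; only compact, reduce, Array#|, flatten and uniq come from Ruby itself.

```ruby
# Hypothetical helper: union of every array that is not nil.
def merge_present(*arrays)
  arrays.compact.reduce([], :|)
end

p merge_present([1, 2, 3], nil, [3, 4], [4, 5, 1])   # prints [1, 2, 3, 4, 5]

# Array#| on its own: duplicates dropped, first-seen order kept.
p([1, 2, 2, 3] | [3, 4, 1])                          # prints [1, 2, 3, 4]

# Other operator methods can be named the same way.
p [1, 2, 3, 4].reduce(:+)                            # prints 10
p [1, 2, 3, 4].reduce(:*)                            # prints 24

# flatten.uniq differs once the inputs hold nested arrays.
nested = [[1, [2]], [[2], 3]]
p nested.reduce([], :|)    # prints [1, [2], 3]
p nested.flatten.uniq      # prints [1, 2, 3]
p nested.flatten(1).uniq   # prints [1, [2], 3]
```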

It's the same thing as this:
[array1, array2, array3, array4].compact.reduce([]) do |memo, array|
  memo | array
end
Although it has syntactic sugar, Array#| is a method which you can see the docs for here. As the docs say:
Set Union — Returns a new array by joining ary with other_ary, excluding any duplicates and preserving the order from the original array
When the block of reduce takes this particular form (calling a single method on memo, passing the iteration's element as an argument), you can omit the block and just pass the method name.

Related

File.open('file.txt') vs. File.open('file.txt').readlines

I checked using File.open('file.txt').class and File.open('file.txt').readlines.class and the former one returns File and the latter returns Array.
I understand this difference, but if I do something like:
File.open('file.txt').collect {|l| l.upcase}
=== File.open('file.txt').readlines.collect {|l| l.upcase}
it returns true. So are there any differences between the two objects when each item in the object is being passed to a block as an argument?
And also, I was assuming that the arguments that are passed to the block in both expressions are both a line in the file as a string which makes the comparison return true, is that correct? If so, how do I know what kind of argument will be passed to the block when I write the code? Do I have to check something like the documentation or the source code?
For example, I know how
['a','b','c'].each_with_index { |num, index| puts "#{index + 1}: #{num}" }
works and take this for granted. But how do I know the first argument should be each item in the array and the second the index, instead of the reverse?
Hope that makes sense, thanks!
Get comfortable doing some Ruby introspection in irb.
irb(main):001:0> puts File.ancestors.inspect
[File, IO, File::Constants, Enumerable, Object, Kernel, BasicObject]
This result shows the classes and modules that File inherits from, and the list includes the Enumerable module, so File objects respond to Enumerable's methods. So what object is returned from File.readlines? An Array I think, let's check.
ri File.readlines
IO.readlines(name, sep=$/ [, open_args]) -> array
IO.readlines(name, limit [, open_args]) -> array
IO.readlines(name, sep, limit [, open_args]) -> array
This may be overkill, but we can verify that the Enumerable methods exist within an Array.
irb(main):003:0> puts Array.ancestors.inspect
[Array, Enumerable, Object, Kernel, BasicObject]
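The same checks can be done programmatically rather than by reading the ancestors list. This is only a sketch of the kind of introspection meant above, using Module#include? and UnboundMethod#owner to see where a method is defined.

```ruby
# Does the class mix in Enumerable?
p File.include?(Enumerable)               # prints true
p Array.include?(Enumerable)              # prints true

# Which class or module actually defines a given method?
p File.instance_method(:each).owner       # prints IO
p File.instance_method(:collect).owner    # prints Enumerable
p Array.instance_method(:collect).owner   # prints Array
```

So collect on a File is the generic Enumerable version driven by IO#each, which yields lines, while Array supplies its own collect.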
I'll try to make this response as compact as possible.
The second question: if you operate on objects that come from the standard library, you can always refer to their documentation to be 100% sure which arguments to expect when passing a block. For instance, you will often use methods like each, select, and map on different data structures (Array, Hash, ...). Until you get used to them, you can find all the information about them in the docs, for example: http://ruby-doc.org/core-2.2.0/Array.html
If you are operating on non-core data structures (for example a class that comes from a gem you include), you should always browse its documentation or its sources on GitHub.
The first question: the result may be the same when using different methods, but at a deeper level there may be differences. Your case is about files, and a file's content can be processed in two ways. The first is to read everything into memory and operate on an array of strings; the second is to read the file line by line, which may take longer but will not use as much memory.
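A rough sketch of the two approaches side by side. The helper names and the temporary file are assumptions made so the example is self-contained; the block form of File.open is used here so the file handle gets closed, which the one-liners in the question do not do.

```ruby
require "tempfile"

# Hypothetical helper: File#each yields one line at a time to collect.
def upcase_lines_streamed(path)
  File.open(path) { |file| file.collect { |line| line.upcase } }
end

# Hypothetical helper: readlines builds the whole Array first.
def upcase_lines_eager(path)
  File.readlines(path).collect { |line| line.upcase }
end

# Throwaway file with made-up contents.
sample = Tempfile.new("lines")
sample.write("one\ntwo\nthree\n")
sample.close

p upcase_lines_streamed(sample.path)   # prints ["ONE\n", "TWO\n", "THREE\n"]
p upcase_lines_streamed(sample.path) == upcase_lines_eager(sample.path)   # prints true

# File.foreach also streams line by line and closes the file for you.
File.foreach(sample.path) { |line| puts line.upcase }

sample.unlink
```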
In Enumerable#each_with_index the arguments are in the same order as they are in the method name itself, first the element, then the index. Same thing with each_with_object.
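A quick way to confirm the argument order is to try it; the sample data below is made up.

```ruby
letters = %w[a b c]

# each_with_index: element first, index second.
letters.each_with_index { |letter, index| puts "#{index + 1}: #{letter}" }
# prints "1: a", "2: b", "3: c" on separate lines

# each_with_object: element first, the carried object second.
shouted = letters.each_with_object([]) { |letter, memo| memo << letter.upcase }
p shouted   # prints ["A", "B", "C"]
```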

Updating Hash Values in Ruby Clarified

I was going to comment on the original question but I don't have the reputation to do so yet....
I too was wondering how to easily update all the values in a hash, or if there was some kind of equivalent .map! method for hashes. Someone put up this elegant solution:
hash.update(hash){|key,v1| expression}
on this question:
Ruby: What is the easiest method to update Hash values?
My question is how does the block know to iterate over each element in the hash? For example, I'd have to call .each on a hash to access each element normally so why isn't it something like:
hash.update(hash.each) do |key ,value|
value+=1
end
In the block with {|key, value| expression} I am accessing each individual hash element yet I don't have to explicitly tell the system this? Why not? Thank you very much.
Hash#update is an alias for Hash#merge! which is more descriptive.
When calling the method with a block, the following happens (excerpt from the docs):
If [a] block is specified, [...] the value of each duplicate key is
determined by calling the block with the key [...]
So, the above code works like this:
The hash is merged with itself, and for each duplicate key the block is called. As we merge the hash with itself, every newly added key is a duplicate and therefore the block is invoked. The result is that every value in the hash gets replaced by the result of the expression.
Hash#update takes a hash as the first parameter, and an optional block as the second parameter. If the second parameter is left out, the method will internally loop on each key-value pair in the supplied hash and use them to merge into the original hash.
If the block (second parameter) is supplied, the method does exactly the same thing. It loops over each key-value in the supplied hash and merges it in. The only difference is where a collision is found (the original hash already has an entry for a specific key). In this case the block is called to help resolve the conflict.
Based on this understanding, simply passing the hash into itself will cause it to loop over every key-value because that's how update always works. Calling .each would be redundant.
To see this more clearly, take a look at the source code for the #update method, and note the internal call to rb_hash_foreach in either logic branch.
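A small sketch of that behaviour, with made-up data and a hypothetical helper name (bump_values!). The first part shows that the block only runs on collisions; the second merges a hash into itself so that every key collides. On Ruby 2.4 and later, Hash#transform_values! is a more direct way to get the same effect.

```ruby
# Merging a different hash: the block runs only for the shared key.
prices = { apple: 1, pear: 2 }
collisions = []
prices.update({ pear: 10, plum: 5 }) do |key, old_value, new_value|
  collisions << key
  old_value + new_value
end
p collisions                                  # prints [:pear]
p prices == { apple: 1, pear: 12, plum: 5 }   # prints true

# Hypothetical helper: merging a hash into itself makes every key collide.
def bump_values!(hash)
  hash.update(hash) { |_key, old_value, _new_value| old_value + 1 }
end

counts = { a: 1, b: 2 }
bump_values!(counts)
p counts == { a: 2, b: 3 }                    # prints true

# The same result without the self-merge (Ruby 2.4+).
counts.transform_values! { |value| value + 1 }
p counts == { a: 3, b: 4 }                    # prints true
```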

Sum array of numbers [duplicate]

Q: Write a method, sum which takes an array of numbers and returns the sum of the numbers.
A:
def sum(nums)
  total = 0
  i = 0
  while i < nums.count
    total += nums[i]
    i += 1
  end
  # return total
  total
end
There has to be another way to solve this without using while, right? Anyone know how?
Edit: This is not an exam or test. This is a practice problem provided on github for app academy. They provide the question and answer as an example. I just read however that good programmers don't like to use while or unless, so I was curious if I could learn something to solve this problem a better way. Like with enumerable? (Noob at Ruby here, obviously..)
Also, I would love any walkthrough or methods that I should learn.. This question is also different because I am asking for specific examples using this data.
The usual way of doing that would be this:
def sum(nums) nums.reduce(&:+) end
which is short for something like this:
def sum(nums) nums.reduce(0) { |total, num| total + num } end
I see that Neil posted a similar solution while I was typing this, so I'll just note that reduce and inject are two names for the same method - Ruby has several aliases like this so that people used to different other languages can find what they're looking for. He also left off the &, which is optional when using a named method for reduce/inject, but not in other cases.
Explanation follows.
In Ruby you don't normally use explicit loops (for, while, etc.). Instead you call methods on the collection you're iterating over, and pass them a block of code to execute on each item. Ruby's syntax places the block after the arguments to the method, between either do...end or {...}, so it looks like traditional imperative flow control, but it works differently.
The basic iteration method is each:
[1,2,3].each do |i| puts i end
That calls the block do |i| puts i end three times, passing it 1, then passing it 2, and finally passing it 3. The |i| is a block parameter, which tells Ruby where to put the value(s) passed into the block each time.
But each just throws away the return value of the block calls (in this case, the three nils returned by puts). If you want to do something with those return values, you have to call a different method. For example, map returns an array of the return values:
[1,2,3].map do |i| puts i end
#=> [nil, nil, nil]
That's not very interesting here, but it becomes more useful if the block returns something:
[1,2,3].map do |i| 2*i end
#=> [2,4,6]
If you want to combine the results into a single aggregate return value instead of getting back an array that's the same size as the input, that's when you reach for reduce. In addition to a block, it takes an extra argument, and the block itself is also called with an extra argument. The extra parameter corresponding to this argument is called the "accumulator"; the first time the block is called, it gets the argument originally passed to reduce, but from then on, it gets the return value of the previous call to the block, which is how each block call can pass information along to the next.
That makes reduce more general than map; in fact, you can build map out of reduce by passing in an empty array and having the block add to it:
[1,2,3].reduce([]) do |a,i| a + [2*i] end
#=> [2,4,6]
But since map is already defined, you would normally just use it for that, and only use reduce to do things that are more, well, reductive:
[1,2,3].reduce(0) do |s, i| s + 2*i end
#=> 12
...which is what we're doing in solving your problem.
Neil and I took a couple of extra shortcuts. First, if a block does nothing but call a single method on its first parameter, passing any remaining parameters as arguments, and return the result, you can get an equivalent block by prefixing &: to the method name. That is, this:
some_array.reduce(x) do |a,b| a.some_method(b) end
can be rewritten more simply as this:
some_array.reduce(x, &:some_method)
and since a + b in Ruby is really just a more-familiar way of writing the method call a.+(b), that means that you can add up numbers by just passing in &:+:
[1,2,3].reduce(0, &:+)
#=> 6
Next, the initial accumulator value for reduce is optional; if you leave it out, then the first time the block is called, it gets the first two elements of the array. So you can leave off the 0:
[1,2,3].reduce(&:+)
#=> 6
Finally, you normally need the & any time you are passing in a block that is not a literal chunk of code. You can turn blocks into Proc objects and store them in variables and in general treat them like any other value, including passing them as regular arguments to method calls. So when you want to use one as the block on a method call instead, you indicate that with the &.
Some methods, including reduce, will also accept a bare Symbol (like :+) and create the Proc/block for you; and Neil took advantage of that fact. But other iterator methods, such as map, don't work that way:
irb(main):001:0> [-1,2,-3].map(:abs)
ArgumentError: wrong number of arguments (1 for 0)
from (irb):1:in `map'
from (irb):1
from /usr/bin/irb:12:in `<main>'
So I just always use the &.
irb(main):002:0> [-1,2,-3].map(&:abs)
#=> [1, 2, 3]
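What the & does can be seen by building the Proc objects by hand. This is only an illustration; the variable names are made up.

```ruby
# A block captured as a Proc object and stored in a variable.
doubler = proc { |i| 2 * i }
p [1, 2, 3].map(&doubler)        # prints [2, 4, 6]

# &:abs asks the Symbol for a Proc through Symbol#to_proc.
abs_proc = :abs.to_proc
p abs_proc.call(-3)              # prints 3
p [-1, 2, -3].map(&abs_proc)     # prints [1, 2, 3]

# The resulting Proc calls the named method on its first argument and
# passes the rest along, which is why &:+ works as a two-argument block.
p(:+.to_proc.call(1, 2))         # prints 3
```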
There are lots of good online tutorials for Ruby. For more general information about map/reduce and related concepts, and how to apply them to problem-solving, you should search for introductions to "functional programming", which is called that because it treats "functions" (that is, blocks of executable code, which in Ruby are realized as Proc objects) as values just like numbers and strings, which can be passed around, assigned to variables, etc.
Probably the most idiomatic way of doing this in Ruby is:
nums.inject(:+)
. . . although this basically hides all the working, so it depends what the test is trying to test.
Documentation for Enumerable#inject
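For comparison, here are the variants discussed above side by side. The method names are hypothetical; Array#sum is a built-in available since Ruby 2.4.

```ruby
def sum_with_block(nums)
  nums.reduce(0) { |total, num| total + num }
end

def sum_with_symbol(nums)
  nums.reduce(0, :+)
end

def sum_builtin(nums)
  nums.sum   # Array#sum, Ruby 2.4 and later
end

p sum_with_block([1, 2, 3])    # prints 6
p sum_with_symbol([1, 2, 3])   # prints 6
p sum_builtin([1, 2, 3])       # prints 6

# Without a starting value, an empty array reduces to nil rather than 0.
p([].reduce(:+))               # prints nil
p([].reduce(0, :+))            # prints 0
```

Passing 0 explicitly is the safer choice whenever the array might be empty.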

Can't all or most cases of `each` be replaced with `map`?

The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial, and you usually do not need to continue a method chain after each, as in each{...}.another_method (I do not think I have ever seen such a case, and even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used can be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map or each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p ("aaaa".."zzzz").map{|word| puts word} #huge and useless array of nil's
I agree with what you said. Enumerable#each simply returns the original object it was called on, while Enumerable#map collects the return value of the block for each element and returns a new array of those values, leaving the original object untouched.
Since Enumerable#each simply returns the original object itself, it can be very well preferred over the map when it comes to cases where you need to simply iterate or traverse over elements.
In fact, Enumerable#each is a simple and universal way of doing a traditional iterating for loop, and each is much preferred over for loops in Ruby.
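The difference in return values is easy to check directly; the sample array is made up.

```ruby
numbers = [1, 2, 3]

each_result = numbers.each { |n| n * 2 }   # block results are discarded
map_result  = numbers.map  { |n| n * 2 }   # block results are collected

p each_result.equal?(numbers)   # prints true: each returns the receiver itself
p map_result                    # prints [2, 4, 6]
p numbers                       # prints [1, 2, 3]: map left the original alone
```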
You can see the significant difference between map and each when you're composing these enumerators.
For example, you need to get a new array with indexes in it (note that with_index yields the element first, then the index):
array.each.with_index.map { |element, index| [index, element] }
Or for example you just need to apply some method to all elements in array and print result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
And there are plenty of other examples where the difference between each and map is key to writing code in a functional style.
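A short sketch of both compositions with made-up data. with_index yields the element first and the index second, and a Method object can stand in for a block through &.

```ruby
letters = %w[a b c]
pairs = letters.each.with_index.map { |element, index| [index, element] }
p pairs                       # prints [[0, "a"], [1, "b"], [2, "c"]]

add_two = 2.method(:+)

# each: run the method for its side effect, results discarded.
[1, 2, 3].each { |a| puts add_two.call(a) }   # prints 3, 4, 5

# map: keep the results instead.
p [1, 2, 3].map(&add_two)     # prints [3, 4, 5]
```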

How does "(1..4).inject(&:+)" work in Ruby

I find this code in Ruby to be pretty intriguing
(1..4).inject(&:+)
Ok, I know what inject does, and I know this code is basically equivalent to
(1..4).inject(0) {|a,n| a + n}
but how exactly does it work?
Why is &:+ the same as writing the block {|a,n| a + n}?
Why doesn't it need an initial value? I'm OK with the initial value being 0, but (1..4).inject(&:*) also works, and there the initial value must be 1...
From Ruby documentation:
If you specify a symbol instead, then each element in the collection will be passed to the named method of memo
So, specifying a symbol is equivalent to passing the following block:
{|memo, a| memo.send(sym, a)}
If you do not explicitly specify an initial value for memo, then the first element of collection is used as the initial value of memo.
So, there is no magic, Ruby simply takes the first element as the initial value and starts injecting from the second element. You can check it by writing [].inject(:+): it returns nil as opposed to [].inject(0, :+) which returns 0.
Edit: I didn't notice the ampersand. You don't need it; inject will work with a bare symbol. But if you do write it, the symbol is converted to a block, which can be useful with other methods.
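To make the rules above concrete, here is a hypothetical re-implementation. The names my_inject and NO_INITIAL are invented for illustration; the real Enumerable#inject is written in C and handles more cases than this sketch.

```ruby
# Sentinel meaning "no initial value was given" (nil could be a real value).
NO_INITIAL = Object.new

def my_inject(collection, initial = NO_INITIAL, sym = nil, &block)
  # my_inject(list, :+) passes the symbol in the first optional position.
  if sym.nil? && block.nil? && initial.is_a?(Symbol)
    sym = initial
    initial = NO_INITIAL
  end

  # A symbol stands for the block { |memo, a| memo.send(sym, a) }.
  block = ->(memo, a) { memo.send(sym, a) } if sym

  items = collection.to_a
  if initial.equal?(NO_INITIAL)
    memo, *rest = items   # first element seeds memo, iteration starts at the second
  else
    memo = initial
    rest = items
  end

  rest.each { |a| memo = block.call(memo, a) }
  memo
end

p my_inject(1..4, :+)                       # prints 10
p my_inject(1..4, :*)                       # prints 24
p my_inject(1..4, 0) { |a, n| a + n }       # prints 10
p my_inject(1..4, &:+)                      # prints 10
p my_inject([], :+)                         # prints nil
p my_inject([], 0, :+)                      # prints 0
```

Because the first element seeds memo, neither 0 nor 1 has to be supplied, which is why both inject(&:+) and inject(&:*) work without an initial value.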
