Why is uniq needed in this problem? (Ruby) - ruby

New to coding, and trying to study independently. Working on a problem on Ruby, where we create a method (filter_out!) that mutates the original array to remove true elements when the array is passed through a proc. The only stipulation is that we aren't allowed to use Array#reject!.
Can anyone explain to me why
arr.uniq.each { |ele| arr.delete(ele) if prc.call(ele)}
works and
arr.each { |ele| arr.delete(ele) if prc.call(ele)}
does not? Here are two example problems:
arr_2 = [1, 7, 3, 5 ]
filter_out!(arr_2) { |x| x.odd? }
p arr_2 # []
arr_3 = [10, 6, 3, 2, 5 ]
filter_out!(arr_3) { |x| x.even? }
p arr_3 # [3, 5]
I've looked it up and understand #uniq removes duplicate values, right?
Any insight would be greatly appreciated-thank you!
edit: Looked into it more, is it because uniq will use the return value of the proc for comparison? Still not sure why this is needed for some numbers, but not all.

The only stipulation is that we aren't allowed to use Array#reject!.
This is an incredibly braindead stipulation. You could just use Array#select! and invert the condition, or use Array#reject together with Array#replace, for example.
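The two workarounds just mentioned can be sketched roughly as follows. The method names `filter_out_with_select!` and `filter_out_with_replace!` are made up for this illustration; they are not part of the exercise.

```ruby
# Hypothetical helper names, chosen only for this sketch.

# Variant 1: Array#select! with the condition inverted.
# select! returns nil when nothing changed, so return arr explicitly.
def filter_out_with_select!(arr, &prc)
  arr.select! { |ele| !prc.call(ele) }
  arr
end

# Variant 2: build the filtered copy with Array#reject (non-bang),
# then swap it into the original array with Array#replace.
def filter_out_with_replace!(arr, &prc)
  arr.replace(arr.reject { |ele| prc.call(ele) })
end

arr_2 = [1, 7, 3, 5]
filter_out_with_select!(arr_2) { |x| x.odd? }
p arr_2 # []

arr_3 = [10, 6, 3, 2, 5]
filter_out_with_replace!(arr_3) { |x| x.even? }
p arr_3 # [3, 5]
```

Both variants decide element by element, so they never iterate over an array that is shrinking underneath them.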
Can anyone explain to me why
arr.uniq.each { |ele| arr.delete(ele) if prc.call(ele)}
works and
arr.each { |ele| arr.delete(ele) if prc.call(ele)}
Whenever you don't quite understand what a piece of code does, a good idea is to just run it yourself with a pen and a piece of paper.
So, let's just do that. Assume arr = [1, 2, 4, 5, 2] and prc = -> e { e.even? }.
Array#each iterates through the array, we don't exactly know how it does it (that is the whole idea of abstraction), but we can imagine that it keeps some kind of index to remember which part of the array it is currently at.
So, during the first iteration of the array, the index is at the first element, and the array looks like this:
[1, 2, 4, 5, 2]
#↑
Array#each passes 1 to the block, which in turn passes it to prc, which returns false, thus the block doesn't do anything.
Array#each increases the index, so now our array looks like this:
[1, 2, 4, 5, 2]
#   ↑
Array#each passes 2 to the block, which in turn passes it to prc, which returns true. As a result, the block now passes 2 to Array#delete, which deletes every element from the array that is equal to 2. So, now the array looks like this:
[1, 4, 5]
#   ↑
Array#each increases the index, so now our array looks like this:
[1, 4, 5]
#      ↑
Array#each passes 5 to the block, which in turn passes it to prc, which returns false, thus the block doesn't do anything.
Array#each increases the index, so now our array looks like this:
[1, 4, 5]
#         ↑
Since index is past the end of the array, the iteration is done, and the result is [1, 4, 5].
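As you can see, the problem is that you are mutating the array while iterating over it: after the deletion the remaining elements shift left, so the 4 slides into a position the index has already passed and is never handed to the block.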
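The pen-and-paper run above can also be reproduced in code. This is only a small sketch; the `visited` array is an addition for illustration that records which elements Array#each actually yields.

```ruby
arr = [1, 2, 4, 5, 2]
visited = [] # illustration only: elements that reached the block

arr.each do |ele|
  visited << ele
  arr.delete(ele) if ele.even?
end

p visited # [1, 2, 5]  (the 4 was never yielded)
p arr     # [1, 4, 5]  (so the 4 is still there)
```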
I've looked it up and understand #uniq removes duplicate values, right?
Correct, but that has nothing to do with why it makes your code work. Or, more precisely, makes it seem to work because there are actually a lot of other problems with your code as well.
It is not the fact that Array#uniq removes duplicate values that is relevant, it is the fact that Array#uniq returns a new array. Since Array#uniq returns a new array, you are no longer mutating an array while iterating over it, you are mutating one array while iterating over another.
You could have used any method of Array or one of its ancestors that returns a new array, for example Object#clone, Object#dup, Array#map, or Array#select, or even something really creative such as arr + []:
arr.select { true }.each {|ele| arr.delete(ele) if prc.call(ele) }
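Note that it has to be an actual copy. A lazy Enumerable such as the Enumerator returned by Array#each is not enough, because it still iterates over the original array when it runs; you would have to turn it into a new array first, for example with Enumerable#to_a:
arr.each.to_a.each {|ele| arr.delete(ele) if prc.call(ele) }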
edit: Looked into it more, is it because uniq will use the return value of the proc for comparison? Still not sure why this is needed for some numbers, but not all.
Since you are not passing the proc to Array#uniq, it cannot possibly use it for anything, so it is clearly impossible for this to be the explanation.
Note that, as I explained above, there are more problems with your code than just mutating an array while iterating over it.
Even if your original code did work, it would actually still be broken. You are using Array#delete to delete the element for which prc returns true. The problem is that Array#delete deletes all elements in the array that are #== to the element you are passing as an argument, even the ones for which the prc might return false.
Here is a trivial example:
a = [2, 2, 2]
i = -1
filter_out!(a) { (i += 1) == 1 }
This should only filter out the second element, but actually deletes all of them, so the result is [] when it actually should be [2, 2].
Your version with Array#uniq makes this problem even worse, because after running Array#uniq, the array you iterate over doesn't even have a second element any more! The block is called only once and returns false, so nothing is deleted and the result is [2, 2, 2], when it actually should be [2, 2].
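The Array#delete problem also shows up without a stateful block. The sketch below wraps the questioner's one-liner in a method with a made-up name, `filter_out_uniq!`, and relies on 1 and 1.0 being `==` to each other while only one of them is an Integer.

```ruby
# Hypothetical wrapper around the one-liner from the question.
def filter_out_uniq!(arr, &prc)
  arr.uniq.each { |ele| arr.delete(ele) if prc.call(ele) }
  arr
end

mixed = [1, 1.0]
filter_out_uniq!(mixed) { |x| x.is_a?(Integer) }

# Only the Integer 1 satisfies the block, but Array#delete compares with ==,
# and 1 == 1.0, so the Float is removed as well.
p mixed # []
```

A correct filter would have left `[1.0]` behind.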
The next problem is the name of the method: filter_out!. Bang methods (i.e. methods whose name ends with an exclamation mark !) are used to mark the more surprising method of a pair of methods. You should never have a bang method on its own. The name filter_out! should only be used if and only if there is also a filter_out method. So, the method should be named filter_out.
And the last problem is that you are mutating the argument that is passed into the method. That is an absolute no-no. You never, ever mutate an argument that is passed into a method. Never. You never break somebody else's toys, you never touch somebody else's privates.
Mutation, in general, should be avoided as much as possible, since it can lead to confusing and hard to track down bugs, as you so beautifully have discovered yourself. Only if absolutely necessary should you mutate state, and only state that you yourself own. Never mutate somebody else's state. Instead, filter_out should return a new array with the results.
Here are a couple of examples what filter_out could look like:
def filter_out(enum)
enum.select {|el| !yield el }
end
def filter_out(enum)
enum.reduce([]) {|acc, el| if yield el then acc else acc + [el] end }
end
def filter_out(enum)
enum.each_with_object([]) {|el, acc| acc << el unless yield el }
end
Note: Personally, I am not a big fan of Enumerable#each_with_object, because it relies on mutation, so I would avoid that solution as much as possible.
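As a quick usage sketch of the first non-mutating variant (the variable names `original` and `filtered` are only for illustration), the caller's array stays untouched and the filtered result comes back as a new array:

```ruby
def filter_out(enum)
  enum.select { |el| !yield(el) }
end

original = [10, 6, 3, 2, 5]
filtered = filter_out(original) { |x| x.even? }

p filtered # [3, 5]
p original # [10, 6, 3, 2, 5]
```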
However, the real problem seems to be that whatever course or book or tutorial or class you are following seems to be pretty terrible, both at teaching Ruby and at teaching good Software Engineering practices, such as testing and debugging.

The behaviour you have witnessed has nothing to do with the fact that a proc is present. The problem is that you are invoking a method (Array#each) on a receiver (the array) that is altered (mutated) in the method's block.
Suppose:
arr = [true, false, true]
arr.each do |x| puts "x = #{x}, arr = #{arr}"
arr.delete(x)
puts "arr = #{arr}"
end
#=> [false]
The following is printed:
x = true, arr = [true, false, true]
arr = [false]
You were perhaps expecting to see an empty array returned, not [false].
As you see from the printed output, only arr[0] #=> true was passed to each's block. That caused arr.delete(x) to delete both instances of true from arr, causing arr to then equal [false]. Ruby then attempts to pass arr[1] to the block but finds that arr has only one element so she concludes she is finished and returns arr #=> [false].
Had arr = [true, true, true] the return value (an empty array) would be correct, but only because all elements are removed when the first element of arr is passed to the block. This is analogous to the example in the question of removing all odd elements of [1, 7, 3, 5].
Even when the code is technically correct it is considered bad practice to invoke a method on a receiver that alters the receiver in its block.
If we first invoke uniq we are no longer iterating over the array we are modifying, so all is well:
a = arr.uniq
#=> [true, false]
a.each do |x| puts "x = #{x}, arr = #{arr}"
arr.delete(x)
puts "arr = #{arr}"
end
#=> [true, false]
The following is printed:
x = true, arr = [true, false, true]
arr = [false]
x = false, arr = [false]
arr = []
Similarly,
a = arr.dup
#=> [true, false, true]
a.each do |x| puts "x = #{x}, arr = #{arr}"
arr.delete(x)
puts "arr = #{arr}"
end
#=> [true, false, true]
The following is printed:
x = true, arr = [true, false, true]
arr = [false]
x = false, arr = [false]
arr = []
x = true, arr = []
arr = []

Related

ruby syntax code involving hashes

I was looking at code regarding how to return a mode from an array and I ran into this code:
def mode(array)
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
answer.select { |k,v| v == answer.values.max}.keys
end
I'm trying to conceptualize what the syntax means behind it as I am fairly new to Ruby and don't exactly understand how hashes are being used here. Any help would be greatly appreciated.
Line by line:
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
This assembles a hash of counts. I would not have called the variable answer because it is not the answer, it is an intermediary step. The inject() method (also known as reduce()) allows you to iterate over a collection, keeping an accumulator (e.g. a running total or in this case a hash collecting counts). It needs a starting value of {} so that the hash exists when attempting to store a value. Given the array [1,2,2,2,3,4,5,6,6] the counts would look like this: {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}.
answer.select { |k,v| v == answer.values.max}.keys
This selects all elements in the above hash whose value is equal to the maximum value, in other words the highest. Then it identifies the keys associated with the maximum values. Note that it will list multiple values if they share the maximum value.
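To make the two steps concrete, here is the same method with the intermediate hash named `counts` instead of `answer` (the renaming is for this sketch only), run on the array from the explanation above and on an array with a tie:

```ruby
def mode(array)
  # Step 1: build a hash of value => number of occurrences.
  counts = array.inject({}) { |hash, value| hash[value] = array.count(value); hash }
  # Step 2: keep the entries with the highest count and return their keys.
  highest = counts.values.max
  counts.select { |_value, count| count == highest }.keys
end

p mode([1, 2, 2, 2, 3, 4, 5, 6, 6]) # [2]
p mode([1, 1, 2, 2, 3])             # [1, 2]
```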
An alternative:
If you didn't care about returning multiple, you could use group_by as follows:
array.group_by{|x|x}.values.max_by(&:size).first
or, in Ruby 2.2+:
array.group_by(&:itself).values.max_by(&:size).first
The inject method acts like an accumulator. Here is a simpler example:
sum = [1,2,3].inject(0) { |current_tally, new_value| current_tally + new_value }
The 0 is the starting point.
So after the first line, we have a hash that maps each number to the number of times it appears.
The mode calls for the most frequent element, and that is what the next line does: selects only those who are equal to the maximum.
I believe your question has been answered, and @Mark mentioned different ways to do the calculations. I would like to just focus on other ways to improve the first line of code:
answer = array.inject ({}) { |k, v| k[v] = array.count(v); k }
First, let's create some data:
array = [1,2,1,4,3,2,1]
Use each_with_object instead of inject
My suspicion is that the code might be fairly old, as Enumerable#each_with_object, which was introduced in v. 1.9, is arguably a better choice here than Enumerable#inject (aka reduce). If we were to use each_with_object, the first line would be:
answer = array.each_with_object ({}) { |v,k| k[v] = array.count(v) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
each_with_object returns the object, a hash held by the block variable k.
As you see, each_with_object is very similar to inject, the only differences being:
it is not necessary to return k from the block to each_with_object, as it is with inject (the reason for that annoying ; k at the end of inject's block);
the block variable for the object (k) follows v with each_with_object, whereas it precedes v with inject; and
when not given a block, each_with_object returns an enumerator, meaning it can be chained to other methods (e.g., arr.each_with_object({}).with_index ...).
Don't get me wrong, inject remains an extremely powerful method, and in many situations it has no peer.
Two more improvements
In addition to replacing inject with each_with_object, let me make two other changes:
answer = array.uniq.each_with_object ({}) { |k,h| h[k] = array.count(k) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
In the original expression, the object returned by inject (sometimes called the "memo") was represented by the block variable k, which I am using to represent a hash key ("k" for "key"). Similarly, as the object is a hash, I chose to use h for its block variable. Like many others, I prefer to keep the block variables short and use names that indicate object type (e.g., a for array, h for hash, s for string, sym for symbol, and so on).
Now suppose:
array = [1,1]
then inject would pass the first 1 into the block and then compute k[1] = array.count(1) #=> 2, so the hash k returned to inject would be {1=>2}. It would then pass the second 1 into the block, again compute k[1] = array.count(1) #=> 2, overwriting 1=>2 in k with 1=>2; that is, not changing it at all. Doesn't it make more sense to just do this for the unique values of array? That's why I have: array.uniq....
Even better: use a counting hash
This is still quite inefficient--all those counts. Here's a way that reads better and is probably more efficient:
array.each_with_object(Hash.new(0)) { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Let's have a look at this in gory detail. Firstly, the docs for Hash#new read, "If obj is specified [i.e., Hash.new(obj)], this single object will be used for all default values." This means that if:
h = Hash.new('cat')
and h does not have a key dog, then:
h['dog'] #=> 'cat'
Important: The last expression is often misunderstood. It merely returns the default value. str = "It does *not* add the key-value pair 'dog'=>'cat' to the hash." Let me repeat that: puts str.
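A short check of that point, as a sketch; the variable name `looked_up` is only for illustration:

```ruby
h = Hash.new('cat')

looked_up = h['dog']  # reading a missing key returns the default...
p looked_up           # "cat"
p h.key?('dog')       # false (...but the key is not stored)
p h                   # {}

h['dog'] = 'woof'     # only an explicit assignment adds the pair
p h.key?('dog')       # true
```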
Now let's see what's happening here:
enum = array.each_with_object(Hash.new(0))
#=> #<Enumerator: [1, 2, 1, 4, 3, 2, 1]:each_with_object({})>
We can see the contents of the enumerator by converting it to an array:
enum.to_a
#=> [[1, {}], [2, {}], [1, {}], [4, {}], [3, {}], [2, {}], [1, {}]]
These seven elements are passed into the block by the method each:
enum.each { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Pretty cool, eh?
We can simulate this using Enumerator#next. The first value of enum ([1, {}]) is passed to the block and assigned to the block variables:
k,h = enum.next
#=> [1, {}]
k #=> 1
h #=> {}
and we compute:
h[k] += 1
#=> h[k] = h[k] + 1 (what '+=' means)
# = 0 + 1 = 1 (h[k] on the right equals the default value
# of 0 since `h` has no key `k`)
so now:
h #=> {1=>1}
Next, each passes the second value of enum into the block and similar calculations are performed:
k,h = enum.next
#=> [2, {1=>1}]
k #=> 2
h #=> {1=>1}
h[k] += 1
#=> 1
h #=> {1=>1, 2=>1}
Things are a little different when the third element of enum is passed in, because h now has a key 1:
k,h = enum.next
#=> [1, {1=>1, 2=>1}]
k #=> 1
h #=> {1=>1, 2=>1}
h[k] += 1
#=> h[k] = h[k] + 1
#=> h[1] = h[1] + 1
#=> h[1] = 1 + 1 => 2
h #=> {1=>2, 2=>1}
The remaining calculations are performed similarly.
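Putting the walkthrough together, here is a sketch that runs the counting hash on the example array. The comparison with Enumerable#tally is an addition not in the answer above; tally only exists in Ruby 2.7 and later, hence the guard.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

# Counting hash: missing keys read as 0, so += 1 works on first sight.
counts = array.each_with_object(Hash.new(0)) { |k, h| h[k] += 1 }
p counts # {1=>3, 2=>2, 4=>1, 3=>1}

# On Ruby 2.7+, Enumerable#tally produces the same counts in one call.
tallied = array.respond_to?(:tally) ? array.tally : counts
p tallied == counts # true
```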

How does to_enum(:method) receive its block here?

This code, from an example I found, counts the number of elements in the array which are equal to their index. But how ?
[4, 1, 2, 0].to_enum(:count).each_with_index{|elem, index| elem == index}
I could not have done it only with chaining, and the order of evaluation within the chain is confusing.
What I understand is we're using the overload of Enumerable#count which, if a block is given, counts the number of elements yielding a true value. I see that each_with_index has the logic for whether the item is equal to its index.
What I don't understand is how each_with_index becomes the block argument of count, or why the each_with_index works as though it was called directly on [4,1,2,0]. If map_with_index existed, I could have done:
[4,1,2,0].map_with_index{ |e,i| e==i ? e : nil}.compact
but help me understand this enumerable-based style please - it's elegant!
Let's start with a simpler example:
[4, 1, 2, 0].count{|elem| elem == 4}
=> 1
So here the count method returns 1 since the block returns true for one element of the array (the first one).
Now let's look at your code. First, Ruby creates an enumerator object when we call to_enum:
[4, 1, 2, 0].to_enum(:count)
=> #<Enumerator: [4, 1, 2, 0]:count>
Here the enumerator is waiting to execute the iteration, using the [4, 1, 2, 0] array and the count method. Enumerators are like a pending iteration, waiting to happen later.
Next, you call the each_with_index method on the enumerator, and provide a block:
...each_with_index{|elem, index| elem == index}
This calls the Enumerator#each_with_index method on the enumerator object you created above. What Enumerator#each_with_index does is start the pending iteration, using the given block. But it also passes an index value to the block, along with the values from the iteration. Since the pending iteration was set up to use the count method, the enumerator will call Array#count. This passes each element from the array back to the enumerator, which passes them into the block along with the index. Finally, Array#count counts up the true values, just like with the simpler example above.
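For comparison, here is a sketch that runs the expression from the question next to two more conventional chains. The alternatives are not from the original example, and the variable names are only for illustration.

```ruby
arr = [4, 1, 2, 0]

# The expression from the question: count drives the iteration,
# each_with_index supplies the index to the block.
via_to_enum = arr.to_enum(:count).each_with_index { |elem, index| elem == index }

# The other way round: each_with_index drives, count consumes the pairs.
via_count = arr.each_with_index.count { |elem, index| elem == index }

# What the questioner wanted from a "map_with_index": the matching elements.
matching = arr.each_with_index.select { |elem, index| elem == index }.map(&:first)

p via_to_enum # 2
p via_count   # 2
p matching    # [1, 2]
```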
For me the key to understanding this is that you're using the Enumerator#each_with_index method.
The answer is but a click away: the documentation for Enumerator:
Most [Enumerator] methods [but presumably also Kernel#to_enum and Kernel#enum_for] have two forms: a block form where the contents are evaluated for each item in the enumeration, and a non-block form which returns a new Enumerator wrapping the iteration.
It is the second that applies here:
enum = [4, 1, 2, 0].to_enum(:count) # => #<Enumerator: [4, 1, 2, 0]:count>
enum.class # => Enumerator
enum_ewi = enum.each_with_index
# => #<Enumerator: #<Enumerator: [4, 1, 2, 0]:count>:each_with_index>
enum_ewi.class # => Enumerator
enum_ewi.each {|elem, index| elem == index} # => 2
Note in particular irb's return from the third line. It goes on to say, "This allows you to chain Enumerators together." and gives map.with_index as an example.
Why stop here?
enum_ewi == enum_ewi.each.each.each # => true
yet_another = enum_ewi.each_with_index
# => #<Enumerator: #<Enumerator: #<Enumerator: [4, 1, 2, 0]:count>:each_with_index>:each_with_index>
yet_another.each_with_index {|e,i| puts "e = #{e}, i = #{i}"}
e = [4, 0], i = 0
e = [1, 1], i = 1
e = [2, 2], i = 2
e = [0, 3], i = 3
yet_another.each_with_index {|e,i| e.first.first == i} # => 2
(Edit 1: replaced example from docs with one pertinent to the question. Edit 2: added "Why stop here?)
Nice answer @Cary. I'm not exactly sure how the block makes its way through the chain of objects, but despite appearances, the block is being executed by the count method, as in this stack trace, even though its variables are bound to those yielded by each_with_index
enum = [4, 1, 2, 0].to_enum(:count)
enum.each_with_index{|e,i| raise "--" if i==3; puts e; e == i}
4
1
2
RuntimeError: --
from (irb):243:in `block in irb_binding'
from (irb):243:in `count'
from (irb):243:in `each_with_index'
from (irb):243

Ruby inject with index and brackets

I try to clean my Code. The first Version uses each_with_index. In the second version I tried to compact the code with the Enumerable.inject_with_index-construct, that I found here.
It works now, but seems to me as obscure as the first code.
And even worse, I don't understand the brackets around element,index in
.. .inject(groups) do |group_container, (element,index)|
but they are necessary
What is the use of these brackets?
How can I make the code clear and readable?
FIRST VERSION -- WITH "each_with_index"
class Array
# splits as good as possible to groups of same size
# elements are sorted. I.e. low elements go to the first group,
# and high elements to the last group
#
# the default for number_of_groups is 4
# because the intended use case is
# splitting statistic data in 4 quartiles
#
# a = [1, 8, 7, 5, 4, 2, 3, 8]
# a.sorted_in_groups(3) # => [[1, 2, 3], [4, 5, 7], [8, 8]]
#
# b = [[7, 8, 9], [4, 5, 7], [2, 8]]
# b.sorted_in_groups(2) {|sub_ary| sub_ary.sum } # => [ [[2, 8], [4, 5, 7]], [[7, 8, 9]] ]
def sorted_in_groups(number_of_groups = 4)
groups = Array.new(number_of_groups) { Array.new }
return groups if size == 0
average_group_size = size.to_f / number_of_groups.to_f
sorted = block_given? ? self.sort_by {|element| yield(element)} : self.sort
sorted.each_with_index do |element, index|
group_number = (index.to_f / average_group_size).floor
groups[group_number] << element
end
groups
end
end
SECOND VERSION -- WITH "inject" AND index
class Array
def sorted_in_groups(number_of_groups = 4)
groups = Array.new(number_of_groups) { Array.new }
return groups if size == 0
average_group_size = size.to_f / number_of_groups.to_f
sorted = block_given? ? self.sort_by {|element| yield(element)} : self.sort
sorted.each_with_index.inject(groups) do |group_container, (element,index)|
group_number = (index.to_f / average_group_size).floor
group_container[group_number] << element
group_container
end
end
end
What is the use of these brackets?
It's a very nice feature of ruby. I call it "destructuring array assignment", but it probably has an official name too.
Here's how it works. Let's say you have an array
arr = [1, 2, 3]
Then you assign this array to a list of names, like this:
a, b, c = arr
a # => 1
b # => 2
c # => 3
You see, the array was "destructured" into its individual elements. Now, to the each_with_index. As you know, it's like a regular each, but also yields an index. inject doesn't care about all this, it takes input elements and passes them to its block as is. If the input element is an array (elem/index pair from each_with_index), then we can either take it apart in the block body
sorted.each_with_index.inject(groups) do |group_container, pair|
element, index = pair
# or
# element = pair[0]
# index = pair[1]
# rest of your code
end
Or destructure that array right in the block signature. Parentheses there are necessary to give ruby a hint that this is a single parameter that needs to be split in several.
Hope this helps.
lines = %w(a b c)
indexes = lines.each_with_index.inject([]) do |acc, (el, ind)|
acc << ind - 1 if el == "b"
acc
end
indexes # => [0]
What is the use of these brackets?
To understand the brackets, first you need to understand how destructuring works in Ruby. The simplest example I can think of is this:
1.8.7 :001 > [[1,3],[2,4]].each do |a,b|
1.8.7 :002 > puts a, b
1.8.7 :003?> end
1
3
2
4
You should know how the each method works, and that its block receives one parameter. So what happens when you declare two parameters? Ruby takes the first element [1,3] and tries to split (destructure) it in two, and the result is a=1 and b=3.
Now, inject passes two arguments to its block, so the signature usually looks like |a,b|. With a signature like |group_container, (element,index)| we are in fact taking the first argument as is and destructuring the second into two variables (so, if the second argument is [1,3], element=1 and index=3). The parentheses are needed because with |group_container, element, index| Ruby could not know whether we want to destructure the first or the second argument, so the parentheses work as disambiguation.
(In fact, things work a bit differently under the hood, but let's set that aside for this question.)
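The three possible block signatures can be compared side by side. This is a small sketch with made-up variable names; it only shows what each signature receives from inject.

```ruby
letters = %w(a b c)

# One parameter for the whole [element, index] pair.
with_pair = letters.each_with_index.inject([]) do |acc, pair|
  acc << "#{pair[1]}:#{pair[0]}"
end

# Parentheses destructure the pair right in the block signature.
with_parens = letters.each_with_index.inject([]) do |acc, (element, index)|
  acc << "#{index}:#{element}"
end

# Without parentheses the pair is not split: element receives the whole
# pair and the third parameter stays nil.
without_parens = letters.each_with_index.inject([]) do |acc, element, index|
  acc << [element, index]
end

p with_pair      # ["0:a", "1:b", "2:c"]
p with_parens    # ["0:a", "1:b", "2:c"]
p without_parens # [[["a", 0], nil], [["b", 1], nil], [["c", 2], nil]]
```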
Seems like there are already some answers with good explanations. I want to add some information regarding the clear and readable part.
Instead of the solution you chose, it is also a possibility to extend Enumerable and add this functionality.
module Enumerable
# The block parameter is not needed but creates more readable code.
def inject_with_index(memo = self.first, &block)
skip = memo.equal?(self.first)
index = 0
self.each_entry do |entry|
if skip
skip = false
else
memo = yield(memo, index, entry)
end
index += 1
end
memo
end
end
This way you can call inject_with_index like so:
# m = memo, i = index, e = entry
(1..3).inject_with_index(0) do |m, i, e|
puts "m: #{m}, i: #{i}, e: #{e}"
m + i + e
end
#=> 9
If you do not pass an initial value, the first element will be used, thus not executing the block for the first element.
In case, someone is here from 2013+ year, you have each_with_object and with_index for your needs:
records.each_with_object({}).with_index do |(record, memo), index|
memo[record.uid] = "#{index} in collection"
end
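A runnable sketch of that pattern follows. The answer does not show what `records` contains, so the Struct with a `uid` field and the sample values are assumptions made up for illustration.

```ruby
# Assumed stand-in for whatever objects `records` holds in the answer.
record_class = Struct.new(:uid)
records = [record_class.new("u1"), record_class.new("u2")]

labels = {}
# each_with_object yields (record, memo); with_index packs those into one
# array and appends the index, hence the parentheses around (record, memo).
records.each_with_object(labels).with_index do |(record, memo), index|
  memo[record.uid] = "#{index} in collection"
end

p labels # {"u1"=>"0 in collection", "u2"=>"1 in collection"}
```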

what's different between each and collect method in Ruby [duplicate]

From this code I don't know the difference between the two methods, collect and each.
a = ["L","Z","J"].collect{|x| puts x.succ} #=> M AA K
print a.class #=> Array
b = ["L","Z","J"].each{|x| puts x.succ} #=> M AA K
print b.class #=> Array
Array#each takes an array and applies the given block over all items. It doesn't affect the array or create a new object. It is just a way of looping over items. Also it returns self.
arr=[1,2,3,4]
arr.each {|x| puts x*2}
Prints 2,4,6,8 and returns [1,2,3,4] no matter what
Array#collect is same as Array#map and it applies the given block of code on all the items and returns the new array. simply put 'Projects each element of a sequence into a new form'
arr.collect {|x| x*2}
Returns [2,4,6,8]
And In your code
a = ["L","Z","J"].collect{|x| puts x.succ} #=> M AA K
a is an Array but it is actually an array of Nil's [nil,nil,nil] because puts x.succ returns nil (even though it prints M AA K).
And
b = ["L","Z","J"].each{|x| puts x.succ} #=> M AA K
also is an Array. But its value is ["L","Z","J"], because it returns self.
Array#each just takes each element and puts it into the block, then returns the original array. Array#collect takes each element and puts it into a new array that gets returned:
[1, 2, 3].each { |x| x + 1 } #=> [1, 2, 3]
[1, 2, 3].collect { |x| x + 1 } #=> [2, 3, 4]
each is for when you want to iterate over an array, and do whatever you want in each iteration. In most (imperative) languages, this is the "one size fits all" hammer that programmers reach for when you need to process a list.
For more functional languages, you only do this sort of generic iteration if you can't do it any other way. Most of the time, either map or reduce will be more appropriate (collect and inject in ruby)
collect is for when you want to turn one array into another array
inject is for when you want to turn an array into a single value
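The three roles can be seen next to each other in a short sketch (the variable names are only for illustration):

```ruby
nums = [1, 2, 3]

# each: run the block for its side effects; the return value is nums itself.
seen = []
returned = nums.each { |x| seen << x * 2 }

# collect (alias map): one array in, a new array of block results out.
doubled = nums.collect { |x| x * 2 }

# inject (alias reduce): fold the whole array into a single value.
total = nums.inject(0) { |sum, x| sum + x }

p returned.equal?(nums) # true
p seen                  # [2, 4, 6]
p doubled               # [2, 4, 6]
p total                 # 6
```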
Here are the two source code snippets, according to the docs...
VALUE
rb_ary_each(VALUE ary)
{
long i;
RETURN_ENUMERATOR(ary, 0, 0);
for (i=0; i<RARRAY_LEN(ary); i++) {
rb_yield(RARRAY_PTR(ary)[i]);
}
return ary;
}
# .... .... .... .... .... .... .... .... .... .... .... ....
static VALUE
rb_ary_collect(VALUE ary)
{
long i;
VALUE collect;
RETURN_ENUMERATOR(ary, 0, 0);
collect = rb_ary_new2(RARRAY_LEN(ary));
for (i = 0; i < RARRAY_LEN(ary); i++) {
rb_ary_push(collect, rb_yield(RARRAY_PTR(ary)[i]));
}
return collect;
}
rb_yield() returns the value returned by the block (see also this blog post on metaprogramming).
So each just yields and returns the original array, while collect creates a new array and pushes the results of the block into it; then it returns this new array.
Source snippets: each, collect
The difference is what it returns. In your example above
a == [nil,nil,nil] (the value of puts x.succ) while b == ["L", "Z", "J"] (the original array)
From the ruby-doc, collect does the following:
Invokes block once for each element of
self. Creates a new array containing
the values returned by the block.
Each always returns the original array. Makes sense?
each is a method defined by all classes that include the Enumerable module. Called without a block, each returns an Enumerator object (Enumerable::Enumerator in Ruby 1.8). This is what other Enumerable methods use to iterate through the object. The each method of each class behaves differently.
In the Array class, when a block is passed to each, it performs the statements of the block on each element, but in the end returns self. This is useful when you don't need an array, but you maybe just want to choose elements from the array and use them as arguments to other methods. collect and map return a new array with the return values of executing the block on each element. You can use map! and collect! to perform operations on the original array.
I think an easier way to understand it would be as below:
nums = [1, 1, 2, 3, 5]
square = nums.each { |num| num ** 2 } # => [1, 1, 2, 3, 5]
Instead, if you use collect:
square = nums.collect { |num| num ** 2 } # => [1, 1, 4, 9, 25]
And plus, you can use .collect! to mutate the original array.

Skip over iteration in Enumerable#collect

(1..4).collect do |x|
next if x == 3
x + 1
end # => [2, 3, nil, 5]
# desired => [2, 3, 5]
If the condition for next is met, collect puts nil in the array, whereas what I'm trying to do is put no element in the returned array if the condition is met. Is this possible without calling delete_if { |x| x == nil } on the returned array?
My code excerpt is heavily abstracted, so looking for a general solution to the problem.
There is method Enumerable#reject which serves just the purpose:
(1..4).reject{|x| x == 3}.collect{|x| x + 1}
The practice of directly using an output of one method as an input of another is called method chaining and is very common in Ruby.
BTW, map (or collect) is used for direct mapping of input enumerable to the output one. If you need to output different number of elements, chances are that you need another method of Enumerable.
Edit: If you are bothered by the fact that some of the elements are iterated twice, you can use less elegant solution based on inject (or its similar method named each_with_object):
(1..4).each_with_object([]){|x,a| a << x + 1 unless x == 3}
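I would simply call .compact on the resultant array, which removes any instances of nil in an array. If you'd like it to modify the existing array, use .compact!, but be aware that .compact! returns nil when there was nothing to remove, so it is only safe to use its return value when you know at least one nil is present: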
(1..4).collect do |x|
next if x == 3
x
end.compact!
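The difference between the two return values matters as soon as the block happens to produce no nil at all. A small sketch, with variable names chosen only for this illustration:

```ruby
with_nil    = (1..4).collect { |x| x + 1 unless x == 3 } # [2, 3, nil, 5]
without_nil = (1..4).collect { |x| x + 1 }               # [2, 3, 4, 5]

p with_nil.compact     # [2, 3, 5]
p without_nil.compact  # [2, 3, 4, 5]

# compact! mutates in place and returns nil if nothing was removed.
p with_nil.compact!    # [2, 3, 5]
p without_nil.compact! # nil
```

So chaining `.compact` is the safer choice when the result of the expression is what gets used.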
In Ruby 2.7+, it’s possible to use filter_map for this exact purpose. From the docs:
Returns an array containing truthy elements returned by the block.
(0..9).filter_map {|i| i * 2 if i.even? } #=> [0, 4, 8, 12, 16]
{foo: 0, bar: 1, baz: 2}.filter_map {|key, value| key if value.even? } #=> [:foo, :baz]
For the example in the question: (1..4).filter_map { |x| x + 1 unless x == 3 }.
See this post for comparison with alternative methods, including benchmarks.
just a suggestion, why don't you do it this way:
result = []
(1..4).each do |x|
next if x == 3
result << x
end
result # => [1, 2, 4]
in that way you saved another iteration to remove nil elements from the array. hope it helps =)
i would suggest to use:
(1..4).to_a.delete_if {|x| x == 3}
instead of the collect + next statement.
You could pull the decision-making into a helper method, and use it via Enumerable#reduce:
def potentially_keep(list, i)
if i === 3
list
else
list.push i
end
end
# => :potentially_keep
(1..4).reduce([]) { |memo, i| potentially_keep(memo, i) }
# => [1, 2, 4]
