When is the Enumerator::Yielder#yield method useful? - ruby

The question "Meaning of the word yield" mentions the Enumerator::Yielder#yield method. I haven't used it before, and wonder under what circumstances it would be useful.
Is it mainly useful when you want to create an infinite list of items, such as the Sieve of Eratosthenes, and when you need to use an external iterator?

"How to create an infinite enumerable of Times?" talks about constructing and lazy iterators, but my favorite usage is wrapping an existing Enumerable with additional functionality (any enumerable, without needing to know what it really is, whether it's infinite or not etc).
A trivial example would be implementing the each_with_index method (or, more generally, with_index method):
module Enumerable
  def my_with_index
    Enumerator.new do |yielder|
      i = 0
      self.each do |e|
        yielder.yield e, i
        i += 1
      end
    end
  end

  def my_each_with_index
    self.my_with_index.each do |e, i|
      yield e, i
    end
  end
end

[:foo, :bar, :baz].my_each_with_index do |e, i|
  puts "#{i}: #{e}"
end
#=> 0: foo
#=> 1: bar
#=> 2: baz
Extending to something not already implemented in the core library, such as cyclically assigning values from a given array to each element of an enumerable (say, for coloring table rows):
module Enumerable
  def with_cycle(values)
    Enumerator.new do |yielder|
      self.each do |e|
        v = values.shift
        yielder.yield e, v
        values.push v
      end
    end
  end
end
p (1..10).with_cycle([:red, :green, :blue]).to_a # works with any Enumerable, such as Range
#=>[[1, :red], [2, :green], [3, :blue], [4, :red], [5, :green], [6, :blue], [7, :red], [8, :green], [9, :blue], [10, :red]]
The whole point is that these methods return an Enumerator, which you then combine with the usual Enumerable methods, such as select, map, inject etc.
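For instance, chaining the with_cycle enumerator above straight into select (a small added illustration):
(1..10).with_cycle([:red, :green, :blue]).select { |n, color| color == :red }
#=> [[1, :red], [4, :red], [7, :red], [10, :red]]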

For example, you can use it to construct Rack response bodies inline, without defining a separate class. An Enumerator can also work "outside-in": the consumer calls each on it (or next, for external iteration) and receives every value in sequence, on demand. For example, you can make a Rack response body returning a sequence of numbers:
run ->(env) {
  body = Enumerator.new do |y|
    9.times { |i| y.yield(i.to_s) }
  end
  [200, {'Content-Length' => '9'}, body]
}
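Rack consumes the body simply by calling each on it, so you can sanity-check the enumerator on its own (a small added illustration):
body = Enumerator.new do |y|
  9.times { |i| y.yield(i.to_s) }
end
body.each { |chunk| print chunk }   # prints 012345678 (9 bytes, matching the Content-Length header)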

Since Mladen mentioned getting other answers, I thought I would give an example of something I just did earlier today while writing an application that will receive data from multiple physical devices, analyze the data, and connect related data (that we see from multiple devices). This is a long-running application, and if I never threw away data (say, at least a day old with no updates), then it would grow infinitely large.
In the past, I would have done something like this:
delete_old_stuff if rand(300) == 0
accomplishing the cleanup with random numbers. However, this is not deterministic: I know that it will run approximately once every 300 evaluations (i.e. seconds), but it won't be exactly once every 300 times.
What I wrote up earlier looks like this:
counter = Enumerator.new do |y|
  a = (1..300)   # 300 yields per cycle, so the cleanup below runs once every 300 calls to next
  loop do
    a.each do |b|
      y.yield b
    end
    delete_old_stuff
  end
end
and I can replace delete_old_stuff if rand(300) == 0 with counter.next
Now, I'm sure there is a more efficient or pre-made way of doing this, but being sparked to play with Enumerator::Yielder#yield by your question and the linked question, this is what I came up with.
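In the main loop that ends up looking roughly like the sketch below (handle_incoming_data is a hypothetical placeholder for the per-iteration work, not a name from the original application):
loop do
  handle_incoming_data   # hypothetical per-second work the application does
  counter.next           # after every 300 values, the enumerator's block runs delete_old_stuff
end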

It seems to be useful when you have multiple objects you want to enumerate over, but flat_map isn't suitable, and you want to chain the enumeration with another action:
module Enumerable
  def count_by
    items_grouped_by_criteria = group_by {|object| yield object}
    counts = items_grouped_by_criteria.map {|key, array| [key, array.length]}
    Hash[counts]
  end
end

def calculate_letter_frequencies
  each_letter.count_by {|letter| letter}
end

def each_letter
  filenames = ["doc/Quickstart", "doc/Coding style"]
  # Joining the text of each file into a single string would be memory-intensive
  enumerator = Enumerator.new do |yielder|
    filenames.each do |filename|
      text = File.read(filename)
      text.chars.each {|letter| yielder.yield(letter)}
    end
  end
  enumerator
end
calculate_letter_frequencies
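To get a quicker feel for count_by on its own, here it is on a plain string enumerator (illustration only):
"bookkeeper".each_char.count_by { |letter| letter }
#=> {"b"=>1, "o"=>2, "k"=>2, "e"=>3, "p"=>1, "r"=>1}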

Related

How does Ruby Enumerator chaining work exactly?

Consider the following code:
[1,2,3].map.with_index { |x, i| x * i }
# => [0,2,6]
How does this work exactly?
My mental model of map is that it iterates and applies a function to each element. Is with_index somehow passing a function to the enumerator [1,2,3].map, in which case what would that function be?
This SO thread shows how enumerators pass data through, but doesn't answer the question. Indeed, if you replace map with each then the behaviour is different:
[1,2,3].each.with_index { |x, i| x * i }
# => [1,2,3]
map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?
Todd's answer is excellent, but I feel like seeing some more Ruby code might be beneficial. Specifically, let's try to write each and map on Array ourselves.
I won't use any Enumerable or Enumerator methods directly, so we see how it's all working under the hood (I'll still use for loops, and those technically call #each under the hood, but that's only cheating a little)
First, there's each. each is easy. It iterates over the array and applies a function to each element, before returning the original array.
def my_each(arr, &block)
  for i in 0..arr.length-1
    block[arr[i]]
  end
  arr
end
Simple enough. Now what if we don't pass a block? Let's change it up a bit to support that. We effectively want to delay the act of doing the each to allow the Enumerator to do its thing.
def my_each(arr, &block)
  if block
    for i in 0..arr.length-1
      block[arr[i]]
    end
    arr
  else
    Enumerator.new do |y|
      my_each(arr) { |*x| y.yield(*x) }
    end
  end
end
So if we don't pass a block, we make an Enumerator that, when consumed, calls my_each, using the enumerator's yielder object as a block. The y object is a funny thing, but you can just think of it as basically being the block you'll eventually pass in. So, in
my_each([1, 2, 3]).with_index { |x, i| x * i }
Think of y as being like the { |x, i| x * i } bit. It's a bit more complicated than that, but that's the idea.
Incidentally, on Ruby 2.7 and later, the Enumerator::Yielder object got its own #to_proc, so if you're on a recent Ruby version, you can just do
Enumerator.new do |y|
  my_each(arr, &y)
end
rather than
Enumerator.new do |y|
  my_each(arr) { |*x| y.yield(*x) }
end
Now let's extend this approach to map. Writing map with a block is easy. It's just like each but we accumulate the results.
def my_map(arr, &block)
  result = []
  for i in 0..arr.length-1
    result << block[arr[i]]
  end
  result
end
Simple enough. Now what if we don't pass a block? Let's do the exact same thing we did for my_each. That is, we're just going to make an Enumerator and, inside that Enumerator, we call my_map.
def my_map(arr, &block)
  if block
    result = []
    for i in 0..arr.length-1
      result << block[arr[i]]
    end
    result
  else
    Enumerator.new do |y|
      my_map(arr) { |*x| y.yield(*x) }
    end
  end
end
Now, the Enumerator knows that, whenever it eventually gets a block, it's going to use my_map on that block at the end. We can see that these two functions actually behave, on arrays, like each and map do:
my_each([1, 2, 3]).with_index { |x, i| x * i }   # [1, 2, 3]
my_map([1, 2, 3]).with_index { |x, i| x * i }    # [0, 2, 6]
So your intuition was spot on:
map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?
That's exactly what it does. map creates an Enumerator whose block knows to call map at the end, whereas each does the same but with each. Of course, in reality, all of this is implemented in C for efficiency and bootstrapping reasons, but the fundamental idea is still there.
Using Array#map without a block simply returns an Enumerator, where each element is then fed to Enumerator#with_index and the results of the block are returned as a collection. It's not complicated, and is similar to (but perhaps cleaner than) the following code. Using Ruby 3.0.1:
results = []
[1, 2, 3].each_with_index { results << _1 * _2 }
results
#=> [0, 2, 6]
Using Array#each doesn't return a collection from the block. It just returns self or another enumerator, so the expected behavior is different by design.
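A quick comparison makes the difference visible (illustration added here, not from the original answer):
[1, 2, 3].map  { |x| x * 10 }   #=> [10, 20, 30]   (collects the block's results)
[1, 2, 3].each { |x| x * 10 }   #=> [1, 2, 3]      (returns the receiver; block results are discarded)
[1, 2, 3].each                  #=> #<Enumerator: [1, 2, 3]:each>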

Using range.each vs while-loop to work with sequence of numbers in Ruby

Total beginner here, so I apologize if a) this question isn't appropriate or b) I haven't asked it properly.
I'm working on simple practice problems in Ruby and I noticed that while I arrived at a solution that works, when my solution runs in a visualizer, it shows premature return values for the array. Is this problematic? I'm also wondering if there's any reason (stylistically, conceptually, etc.) why you would want to use a while-loop vs. iterating over a range for a problem like this or FizzBuzz.
Thank you for any help/advice!
The practice problem is:
# Write a method which collects all numbers between small_num and big_num into
# an array. Ex: range(2, 5) => [2, 3, 4, 5]
My solution:
def range(small_num, big_num)
  arr = []
  (small_num..big_num).each do |num|
    arr.push(num)
  end
  return arr
end
The provided solution:
def range(small_num, big_num)
  collection = []
  i = small_num
  while i <= big_num
    collection << i
    i += 1
  end
  collection
end
Here's a simplified version of your code:
def range(small_num, big_num)
  arr = []
  (small_num..big_num).each do |num|
    arr << num
  end
  arr
end
Where the << or push function does technically have a return value, and that return value is the modified array. This is just how Ruby works. Every method must return something even if that something is "nothing" in the form of nil. As with everything in Ruby even nil is an object.
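You can see that return value directly in irb (a small illustration of the point above):
arr = []
arr << 1        #=> [1]      (<< returns the modified array itself)
arr.push(2)     #=> [1, 2]   (so does push)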
You're not obligated to use the return values, though if you did want to you could. Here's a version with inject:
def range(small_num, big_num)
  (small_num..big_num).inject([]) do |arr, num|
    arr << num
  end
end
Where the inject method takes the return value of each block and feeds it in as the "seed" for the next round. As << returns the array this makes it very convenient to chain.
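Traced out for a small call, each block's return value becomes the next seed (an illustrative trace of the inject version above):
range(2, 4)   #=> [2, 3, 4]
# pass 1: arr = [],     num = 2  ->  arr << 2 returns [2]
# pass 2: arr = [2],    num = 3  ->  arr << 3 returns [2, 3]
# pass 3: arr = [2, 3], num = 4  ->  arr << 4 returns [2, 3, 4], the final result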
The most minimal version is, of course:
def range(small_num, big_num)
  (small_num..big_num).to_a
end
Or as Sagar points out, using the splat operator:
def range(small_num, big_num)
  [*small_num..big_num]
end
Where when you splat something, you're in effect expanding its values directly into the array instead of storing the whole Range as a single element.
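Seen in isolation (a small added illustration):
[2..5]    #=> [2..5]          (a one-element array holding the Range object)
[*2..5]   #=> [2, 3, 4, 5]    (the splat expands the Range into its values)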

Ruby's factor method explanation

class Integer
  def factors
    1.upto(Math.sqrt(self)).select {|i| (self % i).zero?}.inject([]) do |f, i|
      f << self/i unless i == self/i
      f << i
    end.sort
  end
end
[45, 53, 64].each {|n| puts "#{n} : #{n.factors}"}
The above Ruby code is a method to find all factors of an integer. There are several places where I am not too sure about the syntax.
inject([]) - I have seen inject(:+) and inject(:*), which add / multiply each element into the result. I wonder if this one pushes into the (initially empty) array? What's the best way to explain this method?
After the inject([]) there is a do |f,i| block. I am not too sure if this i is different from the i declared outside the block. I assume not? And I am not too sure what the block is trying to achieve.
end.sort: I haven't seen it before.
Would be grateful to have advice on this block of code! Thanks!
my_array.inject(&:+) is the same as my_array.inject do |a, b| a + b end.
Without an explicit initial value, the first a will be the first element of my_array and the first b will be the second element. (If you pass an initial value, as in my_array.inject(0), then the first a is 0 and the first b is the first element.)
The inside i is set by the inject method and is a different variable from the i in the select block; it receives each element of the array of divisors that select returned. inject does something like yield(accumulator, element) for each of those elements, and they become your |f, i|.
In end.sort, end closes the block passed to inject, so .sort is called on the array that inject returns.
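Putting those pieces together, here is roughly what the inject([]) call does for 45 (an expanded, illustrative rewrite, not part of the original code):
f = []
[1, 3, 5].each do |i|              # 45's divisors up to Math.sqrt(45), as produced by the select
  f << 45 / i unless i == 45 / i   # push the complementary factor: 45, 15, 9
  f << i                           # push the divisor itself: 1, 3, 5 (and return f, as inject's block must)
end
f.sort                             #=> [1, 3, 5, 9, 15, 45]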

Higher order iterators in Ruby?

I was reading a Ruby question about the .each iterator, and someone stated that using .each can be a code smell if higher order iterators are better suited for the task. What are higher order iterators in Ruby?
edit: Jörg W Mittag, the author of the StackOverflow answer that I was referring to mentioned that he meant to write higher level iterators, but he also explained what they are very well below.
Oops. I meant higher-level iterators, not higher-order. Every iterator is of course by definition higher-order.
Basically, iteration is a very low-level concept. The purpose of programming is to communicate intent to the other stakeholders on the team. "Initializing an empty array, then iterating over another array and adding the current element of this array to the first array if it is divisible by two without a remainder" is not communicating intent. "Selecting all even numbers" is.
In general, you almost never iterate over a collection just for iteration's sake. You either want to
transform each element in some way (that's usually called map, in Ruby and Smalltalk it's collect and in .NET and SQL it's Select),
reduce the whole collection down to some single value, e.g. computing the sum or the average or the standard deviation of a list of football scores (in category theory, that's called a catamorphism, in functional programming it is fold or reduce, in Smalltalk it's inject:into:, in Ruby it's inject and in .NET Aggregate),
keep only the elements that satisfy a certain condition (filter in most functional languages, select in Smalltalk and Ruby, also find_all in Ruby, Where in .NET and SQL),
filter out the elements that satisfy a condition, i.e. keep those that do not (reject in Smalltalk and Ruby)
find the first element that satisfies a condition (find in Ruby)
count the elements that satisfy a condition (count in Ruby)
check if all elements (all?), at least one element (any?) or no elements (none?) satisfy a condition
group the elements into buckets based on some discriminator (group_by in Ruby, .NET and SQL)
partition the collection into two collections based on some predicate (partition)
sort the collection (sort, sort_by)
combine multiple collections into one (zip)
and so on and so forth …
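A few of these side by side on a throwaway array (an added illustration, just to put Ruby names on the list above):
scores = [3, 1, 4, 1, 5, 9, 2, 6]
scores.map {|s| s * 10 }        #=> [30, 10, 40, 10, 50, 90, 20, 60]
scores.select {|s| s.even? }    #=> [4, 2, 6]
scores.reduce(:+)               #=> 31
scores.group_by {|s| s.odd? }   #=> {true=>[3, 1, 1, 5, 9], false=>[4, 2, 6]}
scores.partition {|s| s > 3 }   #=> [[4, 5, 9, 6], [3, 1, 1, 2]]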
Almost never is your goal to just iterate over a collection.
In particular, reduce aka. inject aka. fold aka. inject:into: aka. Aggregate aka. catamorphism is your friend. There's a reason why it has such a fancy-sounding mathematical name: it is extremely powerful. In fact, most of what I mentioned above, can be implemented in terms of reduce.
Basically, what reduce does, is it "reduces" the entire collection down to a single value, using some function. You have some sort of accumulator value, and then you take the accumulator value and the first element and feed it into the function. The result of that function then becomes the new accumulator, which you pair up with the second element and feed to the function and so on.
The most obvious example of this is summing a list of numbers:
[4, 8, 15, 16, 23, 42].reduce(0) {|acc, elem|
  acc + elem
}
So, the accumulator starts out as 0, and we pass the first element 4 into the + function. The result is 4, which becomes the new accumulator. Now we pass the next element 8 in and the result is 12. And this continues till the last element and the result is that they were dead the whole time. No, wait, the result is 108.
Ruby actually allows us to take a couple of shortcuts: If the element type is the same as the accumulator type, you can leave out the accumulator and Ruby will simply pass the first element as the first value for the accumulator:
[4, 8, 15, 16, 23, 42].reduce {|acc, elem|
  acc + elem
}
Also, we can use Symbol#to_proc here:
[4, 8, 15, 16, 23, 42].reduce(&:+)
And actually, if you pass reduce a Symbol argument, it will treat it as the name of the function to use for the reduction operation:
[4, 8, 15, 16, 23, 42].reduce(:+)
However, summing is not all that reduce can do. In fact, I find this example a little dangerous. Everybody I showed this to immediately understood, "Aah, so that's what a reduce is", but unfortunately some also thought that summing numbers is all reduce is, and that's definitely not the case. In fact, reduce is a general method of iteration, by which I mean that reduce can do anything that each can do. In particular, you can store arbitrary state in the accumulator.
For example, I wrote above that reduce reduces the collection down to a single value. But of course that "single value" can be arbitrarily complex. It could, for example, be itself a collection. Or a string:
class Array
  def mystery_method(foo)
    drop(1).reduce("#{first}") {|s, el| s << foo.to_str << el.to_s }
  end
end
This is an example how far you can go with playing tricks with the accumulator. If you try it out, you'll of course recognize it as Array#join:
class Array
  def join(sep=$,)
    drop(1).reduce("#{first}") {|s, el| s << sep.to_str << el.to_s }
  end
end
Note that nowhere in this "loop" do I have to keep track of whether I'm at the last or second-to-last element. Nor is there any conditional in the code. There is no potential for fencepost errors here. If you think about how to implement this with each, you would have to somehow keep track of the index and check whether you are at the last element and then have an if in there, to prevent emitting the separator at the end.
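For contrast, here is roughly what a join built on each would look like; the index bookkeeping and the conditional are exactly what the reduce version above avoids (a sketch only; join_with_each is just an illustrative name, not the real Array#join):
class Array
  def join_with_each(sep)
    s = ""
    each_with_index do |el, i|
      s << sep.to_str unless i == 0   # the fencepost check that the reduce version never needs
      s << el.to_s
    end
    s
  end
end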
Since I wrote above that all iteration can be done with reduce, I might just as well prove it. Here are Ruby's Enumerable methods, implemented in terms of reduce instead of each as they normally are. (Note that I only just started and have only gotten as far as g so far.)
module Enumerable
  def all?
    reduce(true) {|res, el| res && yield(el) }
  end

  def any?
    reduce(false) {|res, el| res || yield(el) }
  end

  alias_method :map, def collect
    reduce([]) {|res, el| res << yield(el) }
  end

  def count
    reduce(0) {|res, el| yield(el) ? res + 1 : res }
  end

  alias_method :find, def detect
    reduce(nil) {|res, el| res || (el if yield el) }
  end

  def drop(n=1)
    reduce([]) {|res, el| res.tap {|res| res << el unless (n -= 1) >= 0 }}
  end

  def drop_while
    reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
  end

  def each
    reduce(nil) {|_, el| yield el }
  end

  def each_with_index
    tap { reduce(-1) {|i, el| (i+1).tap {|i| yield el, i }}}
  end

  alias_method :select, def find_all
    reduce([]) {|res, el| res.tap {|res| res << el if yield el }}
  end

  def grep(pattern)
    reduce([]) {|res, el| res.tap {|res| res << yield(el) if pattern === el }}
  end

  def group_by
    reduce(Hash.new {|hsh, key| hsh[key] = [] }) {|res, el| res.tap {|res|
      res[yield(el)] << el
    }}
  end

  def include?(obj)
    reduce(false) {|res, el| res || el == obj }
  end

  def reject
    reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
  end
end
[Note: I made some simplifications for the purpose of this post. For example, according to the standard Ruby Enumerable protocol, each is supposed to return self, so you'd have to slap an extra line in there; other methods behave slightly differently, depending on what kind and how many arguments you pass in and so on. I left those out because they distract from the point I am trying to make.]
They're talking about more specialized methods such as map, filter or inject. For example, instead of this:
even_numbers = []
numbers.each {|num| even_numbers << num if num.even?}
You should do this:
even_numbers = numbers.select {|num| num.even?}
It says what you want to do but encapsulates all the irrelevant technical details in the select method. (And incidentally, in Ruby 1.8.7 or later, you can just write even_numbers = numbers.select(&:even?), so even more concise if slightly Perl-like.)
These aren't normally called "higher-order iterators," but whoever wrote that probably just had a minor mental mixup. It's a good principle whatever terminology you use.
From the usual definition of "higher-order", I would say a higher-order iterator is an iterator which takes an iterator as an argument or returns an iterator. So something like enum_for, maybe. However, I don't think this is what the person meant.
I think the person meant iterators like map or select which are higher-order functions, but did not realize that each is of course also a higher-order function. So basically this is just a case of terminological confusion.
The point of the poster presumably was that you should not use each in cases where map, select or inject could naturally be used instead. And to make that point he used a term, that didn't really make sense in that context.
I get this question a fair bit, so I blogged about the most commonly used iterators: select and reject. In the post there are examples of where 'each' is used incorrectly and how to correct the code to use either 'select' or 'reject'. Anyway, I hope it helps.
http://www.natontesting.com/2011/01/01/rubys-each-select-and-reject-methods/
I just wrote a blog post that is very relevant to this question. The reason you want to use higher-order functions is that doing so elevates the programmer to a higher level of abstraction, to the point that a problem can be expressed declaratively, pushing the implementation down either into the Ruby standard library or lower-level code.
http://www.railstutors.com/blog/declarative-thinking-with-higher-order-functions-and-blocks#.UG5x6fl26jJ

First order array difference in Ruby

What's the slickest, most Ruby-like way to do this?
[1, 3, 10, 5].diff
should produce
[2, 7, -5]
that is, an array of first order differences. I've come up with a solution which I'll add below, but it requires Ruby 1.9 and isn't all that slick. What else is possible?
I like this functional style:
module Enumerable
  def diff
    each_cons(2).map {|pair| pair.reverse.reduce :- }
  end
end
EDIT: I just realized that the reverse is totally unnecessary. If this were a functional language, I would have used pattern matching, but Ruby doesn't support pattern matching. It does, however, support destructuring bind, which is a good enough approximation for pattern matching in this case.
each_cons(2).map {|first, second| second - first}
No smiley, though.
I like how this sounds if you just read it out loud from left to right: "For each pair, apply the difference between the first and second elements of the pair." In fact, I normally don't like the name collect and prefer map instead, but in this case that reads even better:
each_cons(2).collect {|first, second| second - first}
"For each pair, collect the difference between its elements." Sounds almost like a definition of first order difference.
Yet another way... Seems the shortest so far :)
module Enumerable
  def diff
    self[1..-1].zip(self).map {|x| x[0] - x[1]}
  end
end
The concept comes from functional programming, of course:
module Enumerable
  def diff
    self.inject([0]) { |r, x| r[-1] += x; r << -x }[1..-2]
  end
end
[1,3,10,5].diff
Note that you don't need any separate intermediate variables here.
Here's the fastest way I could find (faster than all the others suggested here as of this moment, in both 1.8 and 1.9):
module Enumerable
  def diff
    last = nil
    map do |x|
      r = last ? x - last : nil
      last = x
      r
    end.compact
  end
end
With this close runner-up:
module Enumerable
  def diff
    r = []
    1.upto(size-1) {|i| r << self[i] - self[i-1] }
    r
  end
end
Of the others here, testr's self-described "feeble" attempt is the next fastest, but it's still slower than either of these.
And if speed is no object, here's my aesthetic favorite:
module Enumerable
  def diff!
    [-shift + first] + diff! rescue []
  end

  def diff
    dup.diff!
  end
end
But this is (for reasons I don't entirely understand) an order of magnitude slower than any other suggestion here!
Minor variation on Jörg W Mittag's:
module Enumerable
  def diff
    each_cons(2).map {|a, b| b - a }
  end
end
# Attempt, requires Ruby 1.9.
module Enumerable
  def diff
    each_cons(2).with_object([]) {|x, array| array << x[1] - x[0] }
  end
end
Example:
[1,3,10,5].diff
=> [2, 7, -5]
Another way to do it.
module Enumerable
  def diff
    result = []
    each_with_index { |x, i|
      return result if i == (self.length - 1)
      result << self[i+1] - x
    }
  end
end
My feeble attempt...
module Enumerable
  def diff
    na = []
    self.each_index { |x| na << self[x] - self[x-1] if x > 0 }
    na
  end
end
p [1,3,10,5].diff #returned [2, 7, -5]
