Fibers vs. explicit enumerators - ruby

I am toying around with Ruby to learn the language. Currently I'm trying to wrap my head around the concept of fibers. According to this answer, they are fairly often used for creating (infinite) external enumerators. On the other hand, this seems to overlap with the concept of so called explicit enumerators.
Say, I want to write a code snippet that fires consecutive prime numbers (yes, the following algorithm has a runtime of O(scary)). I can implement it by using fibers:
prime_fiber = Fiber.new do
primes = [2]
Fiber.yield 2
current = 1
loop do
current += 2
unless primes.find {|value| (current % value) == 0}
Fiber.yield current
primes << current
end
end
end
ARGV[0].to_i.times {print "#{prime_fiber.resume}, "}
This does not emit an enumerator object by itself, although it is not difficult to create one out of it. In contrast, I can also utilize an explicitly defined enumerator, which has the added benefit of already being an enumerator object:
prime_enum = Enumerator.new do |yielder|
primes = [2]
yielder.yield 2
current = 1
loop do
current += 2
unless primes.find {|value| (current % value) == 0}
yielder.yield current
primes << current
end
end
end
ARGV[0].to_i.times {print "#{prime_enum.next}, "}
# I could also write:
# p prime_enum.first(ARGV[0].to_i)
Both methods allow me to implement some sort of co-routines and they seem to be interchangeable to me. So when do I prefer one over the other? Is there some commonly agreed practice? I find it difficult to get all those idioms in my head, so I apologize in advance if this is considered a dumb question.

I would use Enumerator, it allows you to use take, take_while, even each if your sequence is finite. While Fiber is designed for light weight concurrency and is pretty limited as enumerator.
prime_enum.take(ARGV[0].to_i).each { |x| puts x }
or
prime_enum.take_while { |x| x < ARGV[0].to_i }.each { |x| puts x }

Related

Ruby: Understanding .to_enum better

I have been reading this:
https://docs.ruby-lang.org/en/2.4.0/Enumerator.html
I am trying to understand why someone would use .to_enum, I mean how is that different than just an array? I see :scan was passed into it, but what other arguments can you pass into it?
Why not just use .scan in the case below? Any advice on how to understand .to_enum better?
"Hello, world!".scan(/\w+/) #=> ["Hello", "world"]
"Hello, world!".to_enum(:scan, /\w+/).to_a #=> ["Hello", "world"]
"Hello, world!".to_enum(:scan).each(/\w+/).to_a #=> ["Hello", "world"]
Arrays are, necessarily, constructs that are in memory. An array with a a lot of entries takes up a lot of memory.
To put this in context, here's an example, finding all the "palindromic" numbers between 1 and 1,000,000:
# Create a large array of the numbers to search through
numbers = (1..1000000).to_a
# Filter to find palindromes
numbers.select do |i|
is = i.to_s
is == is.reverse
end
Even though there's only 1998 such numbers, the entire array of a million needs to be created, then sifted through, then kept around until garbage collected.
An enumerator doesn't necessarily take up any memory at all, not in a consequential way. This is way more efficient:
# Uses an enumerator instead
numbers = (1..1000000).to_enum
# Filtering code looks identical, but behaves differently
numbers.select do |i|
is = i.to_s
is == is.reverse
end
You can even take this a step further by making a custom Enumerator:
palindromes = Enumerator.new do |y|
1000000.times do |i|
is = (i + 1).to_s
y << i if (is == is.reverse)
end
end
This one doesn't even bother with filtering, it just emits only palindromic numbers.
Enumerators can also do other things like be infinite in length, whereas arrays are necessarily finite. An infinite enumerator can be useful when you want to filter and take the first N matching entries, like in this case:
# Open-ended range, new in Ruby 2.6. Don't call .to_a on this!
numbers = (1..).to_enum
numbers.lazy.select do |i|
is = i.to_s
is == is.reverse
end.take(1000).to_a
Using .lazy here means it does the select, then filters through take with each entry until the take method is happy. If you remove the lazy it will try and evaluate each stage of this to completion, which on an infinite enumerator never happens.

What is prefered way to loop in Ruby?

Why is each loop preferred over for loop in Ruby? Is there a difference in time complexity or are they just syntactically different?
Yes, these are two different ways of iterating over, But hope this calculation helps.
require 'benchmark'
a = Array( 1..100000000 )
sum = 0
Benchmark.realtime {
a.each { |x| sum += x }
}
This takes 5.866932 sec
a = Array( 1..100000000 )
sum = 0
Benchmark.realtime {
for x in a
sum += x
end
}
This takes 6.146521 sec.
Though its not a right way to do the benchmarking, there are some other constraints too. But on a single machine, each seems to be a bit faster than for.
The variable referencing an item in iteration is temporary and does not have significance outside of the iteration. It is better if it is hidden from outside of the iteration. With external iterators, such variable is located outside of the iteration block. In the following, e is useful only within do ... end, but is separated from the block, and written outside of it; it does not look easy to a programmer:
for e in [:foo, :bar] do
...
end
With internal iterators, the block variable is defined right inside the block, where it is used. It is easier to read:
[:foo, :bar].each do |e|
...
end
This visibility issue is not just for a programmer. With respect to visibility in the sense of scope, the variable for an external iterator is accessible outside of the iteration:
for e in [:foo] do; end
e # => :foo
whereas in internal iterator, a block variable is invisible from outside:
[:foo].each do |e|; end
e # => undefined local variable or method `e'
The latter is better from the point of view of encapsulation.
When you want to nest the loops, the order of variables would be somewhat backwards with external iterators:
for a in [[:foo, :bar]] do
for e in a do
...
end
end
but with internal iterators, the order is more straightforward:
[[:foo, :bar]].each do |a|
a.each do |e|
...
end
end
With external iterators, you can only use hard-coded Ruby syntax, and you also have to remember the matching between the keyword and the method that is internally called (for calls each), but for internal iterators, you can define your own, which gives flexibility.
each is the Ruby Way. Implements the Iterator Pattern that has decoupling benefits.
Check also this: "for" vs "each" in Ruby
An interesting question. There are several ways of looping in Ruby. I have noted that there is a design principle in Ruby, that when there are multiple ways of doing the same, there are usually subtle differences between them, and each case has its own unique use, its own problem that it solves. So in the end you end up needing to be able to write (and not just to read) all of them.
As for the question about for loop, this is similar to my earlier question whethe for loop is a trap.
Basically there are 2 main explicit ways of looping, one is by iterators (or, more generally, blocks), such as
[1, 2, 3].each { |e| puts e * 10 }
[1, 2, 3].map { |e| e * 10 )
# etc., see Array and Enumerable documentation for more iterator methods.
Connected to this way of iterating is the class Enumerator, which you should strive to understand.
The other way is Pascal-ish looping by while, until and for loops.
for y in [1, 2, 3]
puts y
end
x = 0
while x < 3
puts x; x += 1
end
# same for until loop
Like if and unless, while and until have their tail form, such as
a = 'alligator'
a.chop! until a.chars.last == 'g'
#=> 'allig'
The third very important way of looping is implicit looping, or looping by recursion. Ruby is extremely malleable, all classes are modifiable, hooks can be set up for various events, and this can be exploited to produce most unusual ways of looping. The possibilities are so endless that I don't even know where to start talking about them. Perhaps a good place is the blog by Yusuke Endoh, a well known artist working with Ruby code as his artistic material of choice.
To demonstrate what I mean, consider this loop
class Object
def method_missing sym
s = sym.to_s
if s.chars.last == 'g' then s else eval s.chop end
end
end
alligator
#=> "allig"
Aside of readability issues, the for loop iterates in the Ruby land whereas each does it from native code, so in principle each should be more efficient when iterating all elements in an array.
Loop with each:
arr.each {|x| puts x}
Loop with for:
for i in 0..arr.length
puts arr[i]
end
In the each case we are just passing a code block to a method implemented in the machine's native code (fast code), whereas in the for case, all code must be interpreted and run taking into account all the complexity of the Ruby language.
However for is more flexible and lets you iterate in more complex ways than each does, for example, iterating with a given step.
EDIT
I didn't come across that you can step over a range by using the step() method before calling each(), so the flexibility I claimed for the for loop is actually unjustified.

Ruby find in array with offset

I'm looking for a way to do the following in Ruby in a cleaner way:
class Array
def find_index_with_offset(offset, &block)
[offset..-1].find &block
end
end
offset = array.find_index {|element| element.meets_some_criterion?}
the_object_I_want =
array.find_index_with_offset(offset+1) {|element| element.meets_another_criterion?}
So I'm searching a Ruby array for the index of some object and then I do a follow-up search to find the first object that matches some other criterion and has a higher index in the array. Is there a better way to do this?
What do I mean by cleaner: something that doesn't involve explicitly slicing the array. When you do this a couple of times, calculating the slicing indices gets messy fast. I'd like to keep operating on the original array. It's easier to understand and less error-prone.
NB. In my actual code I haven't monkey-patched Array, but I want to draw attention to the fact that I expect I'm duplicating existing functionality of Array/Enumerable
Edits
Fixed location of offset + 1 as per Mladen Jablanović's comment; rewrite error
Added explanation of 'cleaner' as per Mladen Jablanović's comment
Cleaner is here obviously subjective matter. If you aim for short, I don't think you could do better than that. If you want to be able to chain multiple such finds, or you are bothered by slicing, you can do something like this:
module Enumerable
def find_multi *procs
return nil if procs.empty?
find do |e|
if procs.first.call(e)
procs.shift
next true if procs.empty?
end
false
end
end
end
a = (1..10).to_a
p a.find_multi(lambda{|e| e % 5 == 0}, lambda{|e| e % 3 == 0}, lambda{|e| e % 4 == 0})
#=> 8
Edit: And if you're not concerned with the performance you could do something like:
array.drop_while{|element|
!element.meets_some_criterion?
}.drop(1).find{|element|
element.meets_another_criterion?
}

Do something infinitely many times with an index

In more ruby way of doing project euler #2 , part of the code is
while((v = fib(i)) < 4_000_000)
s+=v if v%2==0
i+=1
end
Is there a way to change i += 1 into a more functional programming style construct?
The best I can think of is
Float::MAX.to_i.times do |i|
v = fib(i)
break unless v < 4_000_000
s += v if v%2==0
end
because you can't call .times on a floating point number.
Numeric.step has default parameters of infinity (the limit) and 1 (the step size).
1.step do |i|
#...
end
For fun, you might even want to try
1.step.size
There’s a predefined (in 1.9.2) constant Float::INFINITY, so you could write
1.upto(Float::INFINITY) do |i|
...
end
(You could also use Enumerator and take_while, turning the problem inside out to make it look more like Haskell or Python, but take_while is greedy and builds an array.)
Ruby 2.5 introduced the open-ended Range:
(1..).each do |i|
#...
end

Higher order iterators in Ruby?

I was reading a Ruby question about the .each iterator, and someone stated that using .each can be a code smell if higher order iterators are better suited for the task. What are higher order iterators in Ruby?
edit: Jörg W Mittag, the author of the StackOverflow answer that I was referring to mentioned that he meant to write higher level iterators, but he also explained what they are very well below.
Oops. I meant higher-level iterators, not higher-order. Every iterator is of course by definition higher-order.
Basically, iteration is a very low-level concept. The purpose of programming is to communicate intent to the other stakeholders on the team. "Initializing an empty array, then iterating over another array and adding the current element of this array to the first array if it is divisible by two without a remainder" is not communicating intent. "Selecting all even numbers" is.
In general, you almost never iterate over a collection just for iteration's sake. You either want to
transform each element in some way (that's usually called map, in Ruby and Smalltalk it's collect and in .NET and SQL it's Select),
reduce the whole collection down to some single value, e.g. computing the sum or the average or the standard deviation of a list of football scores (in category theory, that's called a catamorphism, in functional programming it is fold or reduce, in Smalltalk it's inject:into:, in Ruby it's inject and in .NET Aggregate),
filter out all elements that satisfy a certain condition (filter in most functional languages, select in Smalltalk and Ruby, also find_all in Ruby, Where in .NET and SQL),
filter out all elements that do not satisfy a condition (reject in Smalltalk and Ruby)
find the first element that satisfies a condition (find in Ruby)
count the elements thats satisfy a condition (count in Ruby)
check if all elements (all?), at least one element (any?) or no elements (none?) satisfy a condition
group the elements into buckets based on some discriminator (group_by in Ruby, .NET and SQL)
partition the collection into two collections based on some predicate (partition)
sort the collection (sort, sort_by)
combine multiple collections into one (zip)
and so on and so forth …
Almost never is your goal to just iterate over a collection.
In particular, reduce aka. inject aka. fold aka. inject:into: aka. Aggregate aka. catamorphism is your friend. There's a reason why it has such a fancy-sounding mathematical name: it is extremely powerful. In fact, most of what I mentioned above, can be implemented in terms of reduce.
Basically, what reduce does, is it "reduces" the entire collection down to a single value, using some function. You have some sort of accumulator value, and then you take the accumulator value and the first element and feed it into the function. The result of that function then becomes the new accumulator, which you pair up with the second element and feed to the function and so on.
The most obvious example of this is summing a list of numbers:
[4, 8, 15, 16, 23, 42].reduce(0) {|acc, elem|
acc + elem
}
So, the accumulator starts out as 0, and we pass the first element 4 into the + function. The result is 4, which becomes the new accumulator. Now we pass the next element 8 in and the result is 12. And this continues till the last element and the result is that they were dead the whole time. No, wait, the result is 108.
Ruby actually allows us to take a couple of shortcuts: If the element type is the same as the accumulator type, you can leave out the accumulator and Ruby will simply pass the first element as the first value for the accumulator:
[4, 8, 15, 16, 23, 42].reduce {|acc, elem|
acc + elem
}
Also, we can use Symbol#to_proc here:
[4, 8, 15, 16, 23, 42].reduce(&:+)
And actually, if you pass reduce a Symbol argument it will treat as the name of the function to use for the reduction operation:
[4, 8, 15, 16, 23, 42].reduce(:+)
However, summing is not all that reduce can do. In fact, I find this example a little dangerous. Everybody I showed this to, immediately understood, "Aah, so that's what a reduce is", but unortunately some also thought that summing numbers is all reduce is, and that's definitely not the case. In fact, reduce is a general method of iteration, by which I mean that reduce can do anything that each can do. In particular, you can store arbitrary state in the accumulator.
For example, I wrote above that reduce reduces the collection down to a single value. But of course that "single value" can be arbitrarily complex. It could, for example, be itself a collection. Or a string:
class Array
def mystery_method(foo)
drop(1).reduce("#{first}") {|s, el| s << foo.to_str << el.to_s }
end
end
This is an example how far you can go with playing tricks with the accumulator. If you try it out, you'll of course recognize it as Array#join:
class Array
def join(sep=$,)
drop(1).reduce("#{first}") {|s, el| s << sep.to_str << el.to_s }
end
end
Note that nowhere in this "loop" do I have to keep track of whether I'm at the last or second-to-last element. Nor is there any conditional in the code. There is no potential for fencepost errors here. If you think about how to implement this with each, you would have to somehow keep track of the index and check whether you are at the last element and then have an if in there, to prevent emitting the separator at the end.
Since I wrote above that all iteration can be done with reduce, I might just as well prove it. Here's Ruby's Enumerable methods, implemented in terms of reduce instead of each as they normally are. (Note that I only just started and have only arrived at g yet.)
module Enumerable
def all?
reduce(true) {|res, el| res && yield(el) }
end
def any?
reduce(false) {|res, el| res || yield(el) }
end
alias_method :map, def collect
reduce([]) {|res, el| res << yield(el) }
end
def count
reduce(0) {|res, el| res + 1 if yield el }
end
alias_method :find, def detect
reduce(nil) {|res, el| if yield el then el end unless res }
end
def drop(n=1)
reduce([]) {|res, el| res.tap {|res| res << el unless n -= 1 >= 0 }}
end
def drop_while
reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
end
def each
reduce(nil) {|_, el| yield el }
end
def each_with_index
tap { reduce(-1) {|i, el| (i+1).tap {|i| yield el, i }}}
end
alias_method :select, def find_all
reduce([]) {|res, el| res.tap {|res| res << el if yield el }}
end
def grep(pattern)
reduce([]) {|res, el| res.tap {|res| res << yield(el) if pattern === el }}
end
def group_by
reduce(Hash.new {|hsh, key| hsh[key] = [] }) {|res, el| res.tap {|res|
res[yield el] = el
}}
end
def include?(obj)
reduce(false) {|res, el| break true if res || el == obj }
end
def reject
reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
end
end
[Note: I made some simplifications for the purpose of this post. For example, according to the standard Ruby Enumerable protocol, each is supposed to return self, so you'd have to slap an extra line in there; other methods behave slightly differently, depending on what kind and how many arguments you pass in and so on. I left those out because they distract from the point I am trying to make.]
They're talking about more specialized methods such as map, filter or inject. For example, instead of this:
even_numbers = []
numbers.each {|num| even_numbers << num if num.even?}
You should do this:
even_numbers = numbers.select {|num| num.even?}
It says what you want to do but encapsulates all the irrelevant technical details in the select method. (And incidentally, in Ruby 1.8.7 or later, you can just write even_numbers = numbers.select(&:even?), so even more concise if slightly Perl-like.)
These aren't normally called "higher-order iterators," but whoever wrote that probably just had a minor mental mixup. It's a good principle whatever terminology you use.
From the usual definition of "higer-order" I would say a higher-order iterator is an iterator which takes an iterator as an argument or returns an iterator. So something like enum_for maybe. However I don't think this is what the person meant.
I think the person meant iterators like map or select which are higher-order functions, but did not realize that each is of course also a higher-order function. So basically this is just a case of terminological confusion.
The point of the poster presumably was that you should not use each in cases where map, select or inject could naturally be used instead. And to make that point he used a term, that didn't really make sense in that context.
I get this question a fair bit, so I blogged about the most commonly used iterators: select and reject. In the post there are examples of where 'each' is used incorrectly and how to correct the code to use either 'select' or 'reject'. Anyway, I hope it helps.
http://www.natontesting.com/2011/01/01/rubys-each-select-and-reject-methods/
I just wrote a blog that is very relavent to this question - The reason you want to use higher order functions is that doing so elevates the programmer to a higher level of abstraction, to a point that a problem can be expressed declaratively, pushing the implementation down either to the Ruby standard library or lower level code.
http://www.railstutors.com/blog/declarative-thinking-with-higher-order-functions-and-blocks#.UG5x6fl26jJ

Resources