Ruby enumerable reset? - ruby

I'm having trouble understanding exactly how much state a ruby enumerable keeps.
I know some python, so I was expecting that after I take an item from an enumerable, it's gone and the next item will be returned as I take another item.
Strangely, this does happen when I use next but not when I use anything like take or first.
Here's an example:
a = [1,2,3].to_enum
# => #<Enumerator: [1, 2, 3]:each>
a.take(2)
# => [1, 2]
a.next
# => 1
a.next
# => 2
a.take(2)
# => [1, 2]
a.next
# => 3
a.next
# StopIteration: iteration reached an end
# from (irb):58:in `next'
# from (irb):58
# from /usr/bin/irb:12:in `<main>'
a.take(2)
# => [1, 2]
It seems like the enumerable keeps state between next calls, but resets before each take call?

It may be a little confusing, but it's important to note that in Ruby there is the Enumerator class and the Enumerable module.
The Enumerator class includes Enumerable (like most enumerable objects such as Array, Hash, etc.).
The next method is provided as part of the Enumerator, which indeed has an internal state. You can consider an Enumerator very close to the concept of Iterator exposed by other languages.
When you instantiate the Enumerator, the internal pointer points to the first item in the collection.
2.1.5 :021 > a = [1,2,3].to_enum
=> #<Enumerator: [1, 2, 3]:each>
2.1.5 :022 > a.next
=> 1
2.1.5 :023 > a.next
=> 2
This is not the only purpose of the Enumerator (otherwise it would probably be called Iterator). However, this is one of its documented features.
An Enumerator can also be used as an external iterator. For example, #next returns the next value of the iterator or raises StopIteration if the Enumerator is at the end.
e = [1,2,3].each # returns an enumerator object.
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # raises StopIteration
But as I said before, the Enumerator class includes Enumerable. It means every instance of an Enumerator exposes the Enumerable methods that are designed to work on a collection. In this case, the collection is the one the Enumerator is wrapped on.
take is a generic Enumerable method. It is designed to return the first N elements from enum. It's important to note that enum refers to any generic class that includes Enumerable, not to the Enumerator. Therefore, take(2) will return the first two elements of the collection, regardless of the position of the pointer inside the Enumerator instance.
Let me show you a practical example. I can create a custom class, and implement Enumerable.
class Example
  include Enumerable

  def initialize(array)
    @array = array
  end

  # Enumerable only needs each; it delegates to the wrapped array here.
  def each(*args, &block)
    @array.each(*args, &block)
  end
end
I can mix Enumerable, and as long as I provide an implementation for each I get all the other methods for free, including take.
e = Example.new([1, 2, 3])
=> #<Example:0x007fa9529be760 @array=[1, 2, 3]>
e.take(2)
=> [1, 2]
As expected, take returns the first 2 elements. take ignores anything else of my implementation, exactly as in Enumerable, including states or pointers.
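One way to see this for yourself is to count how often each gets called. The following is only a sketch: LoggedList and each_calls are names made up for illustration, not part of Ruby. It suggests that every take starts a fresh pass through each instead of resuming an earlier one.

```ruby
# Hypothetical class for illustration: counts how many times each is invoked.
class LoggedList
  include Enumerable

  attr_reader :each_calls

  def initialize(items)
    @items = items
    @each_calls = 0
  end

  def each(&block)
    return to_enum(:each) unless block

    @each_calls += 1 # every Enumerable method that iterates lands here
    @items.each(&block)
    self
  end
end

list = LoggedList.new([1, 2, 3])
first_two = list.take(2) # starts a new pass over the collection
again     = list.take(2) # starts another new pass, not a continuation

p first_two       # => [1, 2]
p again           # => [1, 2]
p list.each_calls # => 2
```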

Per the documentation, Enumerable#take returns the first n elements of the enumeration, not the next n elements from the cursor. Only the methods defined on Enumerator itself operate on that internal cursor; the Enumerable mix-in is just a collection of methods for enumerating, which don't necessarily share cursors.
If you wanted, you could implement Enumerator#take to do what you expect:
class Enumerator
def take(n = 1)
n.times.map { self.next }
end
end
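If you'd rather not reopen Enumerator, a small helper can do the same job. This is a minimal sketch; take_next is a hypothetical name, and stopping quietly at the end of the sequence is my own design choice rather than anything Ruby prescribes.

```ruby
# Hypothetical helper: pulls up to n values from the enumerator's cursor.
def take_next(enum, n)
  taken = []
  n.times { taken << enum.next }
  taken
rescue StopIteration
  # The cursor ran out early; return whatever was collected so far.
  taken
end

e = [1, 2, 3].to_enum
first_batch  = take_next(e, 2) # advances the cursor past 1 and 2
second_batch = take_next(e, 2) # only 3 is left
from_start   = e.take(2)       # Enumerable#take ignores the cursor

p first_batch  # => [1, 2]
p second_batch # => [3]
p from_start   # => [1, 2]
```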

Related

Need Ruby method to convert an array of strings into a Hash

I need Ruby method to convert an array of strings into a Hash where each key is a string and each value is the 1-indexed index of the string in the original array.
hashify(%w(a b c))
# should return
{'a' => 1, 'b' => 2, 'c' => 3}
Even though I think I'm helping someone do their homework, I can't resist taking a golf swing, because Ruby is awesome:
%w(a b c).each.with_index(1).to_h
Also, defining "hashify" is VERY un-Ruby-like. I'd suggest alternatives, but it's probably a homework assignment anyways and you don't seem to want to learn it.
def hashify(array)
array.each.with_index(1).to_h
end
hashify(%w(a b c))
#=> { "a" => 1, "b" => 2, "c" => 3 }
There are (clearly) multiple ways you could achieve your goal in Ruby.
If you consider the expression %w(a b c).map.with_index(1).to_h, you can see that it is a matter of stringing together a few methods provided to us by the Enumerable module and the Enumerator class.
By first sending the :map message to the array, we receive back an Enumerator, which provides us the handy :with_index method. As you can see in the docs, with_index accepts an offset in an argument, so offsetting your indices by 1 is as simple as passing 1 as your argument to :with_index.
Finally, we call :to_h on the enumerator to receive the desired hash.
# fore!
def hashify(array)
array.map.with_index(1).to_h
end
> hashify %w(a b c)
=> {"a"=>1, "b"=>2, "c"=>3}
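To make the chain less magical, it may help to look at the intermediate values. The variable names below are just for illustration.

```ruby
words = %w(a b c)

# map without a block returns an Enumerator instead of an Array.
mapper = words.map

# with_index(1) pairs each element with an index that starts at 1.
pairs = mapper.with_index(1).to_a

# to_h turns the [key, value] pairs into a Hash.
hash = pairs.to_h

p mapper.class # => Enumerator
p pairs        # => [["a", 1], ["b", 2], ["c", 3]]
p hash         # => {"a"=>1, "b"=>2, "c"=>3}
```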
Try this, this will add a method to_hash_one_indexed to the Array class.
class Array
def to_hash_one_indexed
map.with_index(1).to_h
end
end
Then to call it:
%w(a b c).to_hash_one_indexed
#=> {"a"=>1, "b"=>2, "c"=>3}

Differences between [1,2,3].to_enum and [1,2,3].enum_for in Ruby

In Ruby I'm trying to understand the difference between the to_enum and enum_for methods. Before I ask my question, I've provided some sample code and two examples to help with context.
Sample code:
# replicates group_by method on Array class
class Array
def group_by2(&input_block)
return self.enum_for(:group_by2) unless block_given?
hash = Hash.new {|h, k| h[k] = [] }
self.each { |e| hash[ input_block.call(e) ] << e }
hash
end
end
Example # 1:
irb (main)> puts [1,2,3].group_by2.inspect
=> #<Enumerator: [1, 2, 3]:group_by2>
In example #1, calling group_by2 on the array [1,2,3] without passing in a block returns an enumerator generated with the command self.enum_for(:group_by2).
Example #2
irb (main)> puts [1,2,3].to_enum.inspect
=> #<Enumerator: [1, 2, 3]:each>
In example #2, the enumerator is generated by calling the to_enum method on the array [1,2,3]
Question:
Do the enumerators generated in examples 1 and 2 behave differently in any way? I can see from the inspected outputs that they show slightly different labels, but I can't find any difference in the enumerators' behavior.
# Output for example #1
#<Enumerator: [1, 2, 3]:group_by2> # label reads ":group_by2"
# Output for example #2
#<Enumerator: [1, 2, 3]:each> # label reads ":each"
p [1, 2, 3].to_enum
p [1, 2, 3].enum_for
--output:--
#<Enumerator: [1, 2, 3]:each>
#<Enumerator: [1, 2, 3]:each>
From the docs:
to_enum
Creates a new Enumerator which will enumerate by calling method on
obj, passing args if any.
...
enum_for
Creates a new Enumerator which will enumerate by calling method on
obj, passing args if any.
Ruby is a language that often has synonymous method names.
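Since the two methods are synonyms, any difference between the enumerators in the question comes from the method symbol they wrap, not from to_enum versus enum_for. Here is a rough sketch, reusing the group_by2 method from the question, of one place where that shows up: chaining with_index and passing a block.

```ruby
class Array
  # Same method as in the question: groups elements by the block's result.
  def group_by2(&input_block)
    return enum_for(:group_by2) unless block_given?

    hash = Hash.new { |h, k| h[k] = [] }
    each { |e| hash[input_block.call(e)] << e }
    hash
  end
end

# Wraps :each, so the block's result is discarded and each returns the array.
via_each = [1, 2, 3].to_enum.with_index { |e, i| i.even? }

# Wraps :group_by2, so the block's result decides the group for each element.
via_group_by2 = [1, 2, 3].group_by2.with_index { |e, i| i.even? }

p via_each      # => [1, 2, 3]
p via_group_by2 # => {true=>[1, 3], false=>[2]}
```

Calling next on either enumerator still walks the elements 1, 2, 3 in order, which is probably why the two looked identical.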
Followup question:
Does the symbol in the command [1,2,3].to_enum(:foo) serve a purpose,
other than replacing :each with :foo in the output?
Yes. By default, ruby hooks up the enumerator to the receiver's each() method. Some classes do not have an each() method, for instance String:
str = "hello\world"
e = str.to_enum
puts e.next
--output:--
1.rb:3:in `next': undefined method `each' for "helloworld":String (NoMethodError)
from 1.rb:3:in `<main>'
to_enum() allows you to specify the method you would like the enumerator to use:
str = "hello\nworld"
e = str.to_enum(:each_line)
puts e.next
--output:--
hello
Now, suppose you have the array [1, 2, 3] and you want to create an enumerator for it. An array has an each() method, but an enumerator built on each() returns each element of the array and then ends. What if you want an enumerator that starts over from the beginning of the array once it reaches the end?
e = [1, 2, 3].to_enum(:cycle)
10.times do
puts e.next()
end
--output:--
1
2
3
1
2
3
1
2
3
1
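The docs quoted earlier also mention "passing args if any". As a small sketch of what that means, the extra arguments after the method symbol are handed to that method, here each_slice with a slice size of 2.

```ruby
# The 2 is forwarded to each_slice, so the enumerator yields pairs.
pairs = [1, 2, 3, 4].to_enum(:each_slice, 2)
first_pair  = pairs.next
second_pair = pairs.next

# enum_for accepts the same arguments, since it is a synonym for to_enum.
all_pairs = [1, 2, 3, 4].enum_for(:each_slice, 2).to_a

p first_pair  # => [1, 2]
p second_pair # => [3, 4]
p all_pairs   # => [[1, 2], [3, 4]]
```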

How to #rewind the internal position under #each?

I'm trying to write code where the enumeration sequence is rewound to the beginning.
I think rewind is appropriate for this application, but I'm not sure how to implement it under an each iterator passing to a block? In the Ruby-Docs example, next is used to move the internal position by one at a time. With a block, it would move autonomously.
There aren't many good examples online for this specifically. My workaround at the moment is to nest an iterator under a loop and use break inside the iterator. When the iterator breaks, the loop resets the enumeration sequence.
Is there a better way—as I'm sure there is—of doing this?
Use the Enumerator#rewind method from the Ruby core class library.
Rewinds the enumeration sequence to the beginning. If the enclosed object responds to a "rewind" method, it is called.
a = [1,2,3,4]
enum = a.each
enum # => #<Enumerator: [1, 2, 3, 4]:each>
enum.next # => 1
enum.next # => 2
enum.rewind # => #<Enumerator: [1, 2, 3, 4]:each>
enum.next # => 1
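As for doing this "under a block": as far as I know, rewind only affects the cursor used by next, so one option is to drive the iteration yourself with Kernel#loop, which stops quietly when next raises StopIteration. A minimal sketch, where the rewound flag and the "rewind once after seeing 3" rule are only there for the demonstration:

```ruby
enum = [1, 2, 3, 4].each
seen = []
rewound = false

# loop rescues StopIteration, so it ends when the enumerator is exhausted.
loop do
  value = enum.next
  seen << value

  # Rewind once, the first time we see 3 (arbitrary condition for the demo).
  if value == 3 && !rewound
    enum.rewind
    rewound = true
  end
end

p seen # => [1, 2, 3, 1, 2, 3, 4]
```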

Ruby: Module, Mixins and Blocks confusing?

Following is the code I tried to run from the Ruby Programming Book
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_modules.html
Why doesn't the product method give the right output?
I ran it with irb test.rb. And I am running Ruby 1.9.3p194.
module Inject
def inject(n)
each do |value|
n = yield(n, value)
end
n
end
def sum(initial = 0)
inject(initial) { |n, value| n + value }
end
def product(initial = 1)
inject(initial) { |n, value| n * value }
end
end
class Array
include Inject
end
[1, 2, 3, 4, 5].sum ## 15
[1, 2, 3, 4, 5].product ## [[1], [2], [3], [4], [5]]
Since that code example was written, Array has gained a #product method and you're seeing the output of that particular method. Rename your module's method to something like product_new.
Add this line at the end of your code :
p Array.ancestors
and you get (in Ruby 1.9.3) :
[Array, Inject, Enumerable, Object, Kernel, BasicObject]
Array is a subclass of Object and has a superclass pointer to Object. As Enumerable is mixed in (included) by Array, the superclass pointer of Array points to Enumerable, and from there to Object. When you include Inject, the superclass pointer of Array points to Inject, and from there to Enumerable. When you write
[1, 2, 3, 4, 5].product
the method search mechanism starts at the instance object [1, 2, 3, 4, 5], goes to its class Array, and finds product (new in 1.9) there. If you run the same code in Ruby 1.8, the method search mechanism starts at the instance object [1, 2, 3, 4, 5], goes to its class Array, does not find product, goes up the superclass chain, and finds product in Inject, and you get the result 120 as expected.
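You can also ask Ruby directly which definition wins. This is a sketch for Ruby 1.9 or later; InjectDemo is a stand-in name for the Inject module from the question, reduced to the one method that clashes.

```ruby
# Hypothetical stand-in for the question's Inject module.
module InjectDemo
  def product(initial = 1)
    inject(initial) { |n, value| n * value }
  end
end

class Array
  include InjectDemo
end

# The owner is the class or module whose definition the lookup finds first.
owner = Array.instance_method(:product).owner

# Array's own product (a cartesian product) shadows the module's version.
builtin_result = [1, 2, 3].product

# The module's version is still reachable by binding it explicitly.
module_result = InjectDemo.instance_method(:product).bind([1, 2, 3]).call

p owner          # => Array
p builtin_result # => [[1], [2], [3]]
p module_result  # => 6
```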
You find a good explanation of Modules and Mixins with graphic pictures in the Pickaxe http://pragprog.com/book/ruby3/programming-ruby-1-9
I remembered that some people were asking for a prepend method that inserts a module between the instance and its class, so that the search mechanism finds the module's methods before those of the class. I searched SO for "[ruby]prepend module instead of include" and found, among others, this:
Why does including this module not override a dynamically-generated method?
By the way: in Ruby 2.0, there are two features which help you with both your problems.
Module#prepend prepends a mixin to the inheritance chain, so that methods defined in the mixin override methods defined in the module/class it is being mixed into.
Refinements allow lexically scoped monkeypatching.
Here they are in action (you can get a current build of YARV 2.0 via RVM or ruby-build easily):
module Sum
def sum(initial=0)
inject(initial, :+)
end
end
module ArrayWithSum
refine Array do
prepend Sum
end
end
class Foo
using ArrayWithSum
p [1, 2, 3].sum
# 6
end
p [1, 2, 3].sum
# NoMethodError: undefined method `sum' for [1, 2, 3]:Array
using ArrayWithSum
p [1, 2, 3].sum
# 6
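Module#prepend also works on its own, without refinements. Here is a small sketch on an ordinary class (Loud and Greeter are made-up names), so that no core class gets touched: the prepended module sits in front of the class in the ancestor chain and can reach the original method through super.

```ruby
# Hypothetical module and class, for illustration only.
module Loud
  def greet
    super.upcase # super reaches the method defined in the class itself
  end
end

class Greeter
  prepend Loud

  def greet
    "hello"
  end
end

greeting = Greeter.new.greet
chain    = Greeter.ancestors.first(2)

p greeting # => "HELLO"
p chain    # => [Loud, Greeter]
```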
In response to @zeronone "How can we avoid such namespace clashes?"
The first rule is to avoid monkeypatching core classes wherever possible. A better way to do this (IMO) would be to subclass Array:
class MyArray < Array
include Inject
# or you could just dispense with the module and define this directly.
end
xs = MyArray.new([1, 2, 3, 4, 5])
# => [1, 2, 3, 4, 5]
xs.sum
# => 15
xs.product
# => 120
[1, 2, 3, 4, 5].product
# => [[1], [2], [3], [4], [5]]
Ruby may be an OO language, but because it is so dynamic sometimes (I find) subclassing gets forgotten as a useful way to do things, and hence there is an over reliance on the basic data structures of Array, Hash and String, which then leads to far too much re-opening of these classes.
The following code is not very elaborate. It is just meant to show that you already have the means today, such as the hooks Ruby calls when certain events occur, to check which method (from the including class or the included module) will or will not be used.
module Inject
def self.append_features(p_host) # don't use included, it's too late
puts "#{self} included into #{p_host}"
methods_of_this_module = self.instance_methods(false).sort
print "methods of #{self} : "; p methods_of_this_module
first_letter = []
methods_of_this_module.each do |m|
first_letter << m[0, 2]
end
print 'selection to reduce the display : '; p first_letter
methods_of_host_class = p_host.instance_methods(true).sort
subset = methods_of_host_class.select { |m| m if first_letter.include?(m[0, 2]) }
print "methods of #{p_host} we are interested in: "; p subset
methods_of_this_module.each do |m|
puts "#{self.name}##{m} will not be used" if methods_of_host_class.include? m
end
super # <-- don't forget it !
end
Rest as in your post. Execution :
$ ruby -v
ruby 1.8.6 (2010-09-02 patchlevel 420) [i686-darwin12.2.0]
$ ruby -w tinject.rb
Inject included into Array
methods of Inject : ["inject", "product", "sum"]
selection to reduce the display : ["in", "pr", "su"]
methods of Array we are interested in: ["include?", "index",
..., "inject", "insert", ..., "instance_variables", "private_methods", "protected_methods"]
Inject#inject will not be used
$ rvm use 1.9.2
...
$ ruby -v
ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-darwin12.2.0]
$ ruby -w tinject.rb
Inject included into Array
methods of Inject : [:inject, :product, :sum]
selection to reduce the display : ["in", "pr", "su"]
methods of Array we are interested in: [:include?, :index, ..., :inject, :insert,
..., :private_methods, :product, :protected_methods]
Inject#inject will not be used
Inject#product will not be used

Pass arguments by reference to a block with the splat operator

It seems that the arguments are copied when using the splat operator to pass arguments to a block by reference.
I have this:
def method
a = [1,2,3]
yield(*a)
p a
end
method {|x,y,z| z = 0}
#=> this prints and returns [1, 2, 3] (the third argument was not modified)
How can I pass these arguments by reference? It seems to work if I pass the array directly, but the splat operator would be much more practical, intuitive and maintainable here.
In Ruby when you write x = value you are creating a new local variable x whether it existed previously or not (if it existed the name is simply rebound and the original value remains untouched). So you won't be able to change a variable in-place this way.
Integers are immutable. So if you send an integer there is no way you can change its value. Note that you can change mutable objects (strings, hashes, arrays, ...):
def method
a = [1, 2, "hello"]
yield(*a)
p a
end
method { |x,y,z| z[1] = 'u' }
# [1, 2, "hullo"]
Note: I've tried to answer your question, now my opinion: updating arguments in methods or blocks leads to buggy code (you have no referential transparency anymore). Return the new value and let the caller update the variable itself if so inclined.
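As a sketch of that last suggestion, the block can simply return the new value and the method can store it. update_third is a hypothetical name, and writing the result into the third slot is just one possible convention.

```ruby
# Hypothetical method: the block computes a value, the method does the update.
def update_third
  a = [1, 2, 3]
  a[2] = yield(*a) # the block's return value replaces the third element
  a
end

result = update_third { |x, y, z| 0 }

p result # => [1, 2, 0]
```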
The problem here is the = sign. It makes the local variable z be assigned to another object.
Take this example with strings:
def method
a = ['a', 'b', 'c']
yield(*a)
p a
end
method { |x,y,z| z.upcase! } # => ["a", "b", "C"]
This clearly shows that z is the same as the third object of the array.
Another point here is your example is numeric. Fixnums have fixed ids; so, you can't change the number while maintaining the same object id. To change Fixnums, you must use = to assign a new number to the variable, instead of self-changing methods like inc! (such methods can't exist on Fixnums).
Yes. An Array contains references to objects. When you use yield(*a), the block works with variables that point to the same objects that are in the array. Now look at this code sample:
daz@daz-pc:~/projects/experiments$ irb
irb(main):001:0> a = 1
=> 1
irb(main):002:0> a.object_id
=> 3
irb(main):003:0> a = 2
=> 2
irb(main):004:0> a.object_id
=> 5
So in the block you don't change the old object; you just point the variable at another object. The array still contains a reference to the old object.
Look at the debugging stuff:
def m
a = [1, 2]
p a[0].object_id
yield(*a)
p a[0].object_id
end
m { |a, b| p a.object_id; a = 0; p a.object_id }
Output:
3
3
1
3
How can I pass these arguments by reference?
You can't pass arguments by reference in Ruby. Ruby is pass-by-value. Always. No exceptions, no ifs, no buts.
It seems to work if I pass the array directly
I highly doubt that. You simply cannot pass arguments by reference in Ruby. Period.
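What the asker probably observed is mutation of a shared object, which is not the same thing as pass by reference. A small sketch (with_array is a made-up name): mutating the array the block receives is visible to the caller, while rebinding the block parameter is not.

```ruby
# Hypothetical method: yields the array itself rather than its elements.
def with_array
  a = [1, 2, 3]
  yield(a)
  a
end

# Mutating the object that arr points to: the caller sees the change.
mutated = with_array { |arr| arr[2] = 0 }

# Rebinding the block-local variable arr: the caller's array is untouched.
rebound = with_array { |arr| arr = [9, 9, 9] }

p mutated # => [1, 2, 0]
p rebound # => [1, 2, 3]
```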
