How can I get a lazy array in Ruby? - ruby

How can I get a lazy array in Ruby?
In Haskell, I can talk about [1..], which is an infinite list, lazily generated as needed. I can also do things like iterate (+2) 0, which applies whatever function I give it to generate a lazy list. In this case, it would give me all even numbers.
I'm sure I can do such things in Ruby, but can't seem to work out how.

With Ruby 1.9 you can use the Enumerator class. This is an example from the docs:
fib = Enumerator.new { |y|
a = b = 1
loop {
y << a
a, b = b, a + b
}
}
p fib.take(10) #=> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
Also, this is a nice trick:
Infinity = 1.0/0
range = 5..Infinity
p range.take(10) #=> [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
This one only works for consecutive values though.

Recently Enumerable::Lazy has been added to ruby trunk. We'll see it in ruby 2.0.
In particular:
a = data.lazy.map(&:split).map(&:reverse)
will not be evaluated immediately.
The result is instance of Enumerable::Lazy, that can be lazy chained any further. If you want to get an actual result - use #to_a, #take(n) (#take is now lazy too, use #to_a or #force), etc.
If you want more on this topic and my C patch - see my blog post Ruby 2.0 Enumerable::Lazy

Lazy range (natural numbers):
Inf = 1.0/0.0
(1..Inf).take(3) #=> [1, 2, 3]
Lazy range (even numbers):
(0..Inf).step(2).take(5) #=> [0, 2, 4, 6, 8]
Note, you can also extend Enumerable with some methods to make working with lazy ranges (and so on) more convenient:
module Enumerable
def lazy_select
Enumerator.new do |yielder|
each do |obj|
yielder.yield(obj) if yield(obj)
end
end
end
end
# first 4 even numbers
(1..Inf).lazy_select { |v| v.even? }.take(4)
output:
[2, 4, 6, 8]
More info here:
http://banisterfiend.wordpress.com/2009/10/02/wtf-infinite-ranges-in-ruby/
There are also implementations of lazy_map, and lazy_select for the Enumeratorclass that can be found here:
http://www.michaelharrison.ws/weblog/?p=163

In Ruby 2.0.0, they were introduced new method "Lazy" in Enumerable class.
You can check the lazy function core and usage here..
http://www.ruby-doc.org/core-2.0/Enumerator/Lazy.html
https://github.com/yhara/enumerable-lazy
http://shugomaeda.blogspot.in/2012/03/enumerablelazy-and-its-benefits.html

As I already said in my comments, implementing such a thing as lazy arrays wouldn't be sensible.
Using Enumerable instead can work nicely in some situations, but differs from lazy lists in some points: methods like map and filter won't be evaluated lazily (so they won't work on infinite enumerables) and elements that have been calculated once aren't stored, so if you access an element twice, it's calculated twice.
If you want the exact behavior of haskell's lazy lists in ruby, there's a lazylist gem which implements lazy lists.

This will loop to infinity:
0.step{|i| puts i}
This will loop to infinity twice as fast:
0.step(nil, 2){|i| puts i}
This will go to infinity, only if you want it to (results in an Enumerator).
table_of_3 = 0.step(nil, 3)

I surprised no one answered this question appropriately yet
So, recently I found this method Enumerator.produce which in conjunction with .lazy does exactly what you described but in ruby-ish fashion
Examples
Enumerator.produce(0) do
_1 + 2
end.lazy
.map(&:to_r)
.take(1_000)
.inject(&:+)
# => (999000/1)
def fact(n)
= Enumerator.produce(1) do
_1 + 1
end.lazy
.take(n)
.inject(&:*)
fact 6 # => 720

The right answer has already identified the "lazy" method, but the example provided was not too useful. I will give a better example of when it is appropriate to use lazy with arrays. As stated, lazy is defined as an instance method of the module Enumerable, and it works on EITHER objects that implement Enumerable module (e.g. arrays - [].lazy) or enumerators which are the return value of iterators in the enumerable module (e.g. each_slice - [].each_slice(2).lazy). Note that in Enumerable module, some of the instance methods return more primitive values like true or false, some return collections like arrays and some return enumerators. Some return enumerators if a block is not given.
But for our example, the IO class also has an iterator each_line, which returns an enumerator and thus can be used with "lazy". The beautiful thing about returning an enumerator is that it does not actually load the collection (e.g. large array) in memory that it is working on. Rather, it has a pointer to the collection and then stories the algorithm (e.g. each_slice(2)) that it will use on that collection, when you want to process the collection with something like to_a, for example.
So if you are working with an enumerator for a huge performance boost, now you can attach lazy to the enumerator. So instead of iterating through an entire collection to match this condition:
file.each_line.select { |line| line.size == 5 }.first(5)
You can invoke lazy:
file.each_line.lazy.select { |line| line.size == 5 }.first(5)
If we're scanning a large text file for the first 5 matches, then once we find the 5 matches, there is no need to proceed the execution. Hence, the power of lazy with any type of enumerable object.

Ruby Arrays dynamically expand as needed. You can apply blocks to them to return things like even numbers.
array = []
array.size # => 0
array[0] # => nil
array[9999] # => nil
array << 1
array.size # => 1
array << 2 << 3 << 4
array.size # => 4
array = (0..9).to_a
array.select do |e|
e % 2 == 0
end
# => [0,2,4,6,8]
Does this help?

Related

Why would #each_with_object and #inject switch the order of block parameters?

#each_with_object and #inject can both be used to build a hash.
For example:
matrix = [['foo', 'bar'], ['cat', 'dog']]
some_hash = matrix.inject({}) do |memo, arr|
memo[arr[0]] = arr
memo # no implicit conversion of String into Integer (TypeError) if commented out
end
p some_hash # {"foo"=>["foo", "bar"], "cat"=>["cat", "dog"]}
another_hash = matrix.each_with_object({}) do |arr, memo|
memo[arr[0]] = arr
end
p another_hash # {"foo"=>["foo", "bar"], "cat"=>["cat", "dog"]}
One of the key differences between the two is #each_with_object keeps track of memo through the entire iteration while #inject sets memo equal to the value returned by the block on each iteration.
Another difference is the order or the block parameters.
Is there some intention being communicated here? It doesn't make sense to reverse the block parameters of two similar methods.
They have a different ancestry.
each_with_object has been added to Ruby 1.9 in 2007
inject goes back to Smalltalk in 1980
I guess if the language were designed with both methods from the begin they would likely expect arguments in the same order. But this is not how it happened. inject has been around since the begin of Ruby whereas each_with_object has been added 10 years later only.
inject expects arguments in the same order as Smalltalk's inject:into:
collection inject: 0 into: [ :memo :each | memo + each ]
which does a left fold. You can think of the collection as a long strip of paper that is folded up from the left and the sliding window of the fold function is always the part that has already been folded plus the next element of the remaining paper strip
# (memo = 0 and 1), 2, 3, 4
# (memo = 1 and 2), 3, 4
# (memo = 3 and 3), 4
# (memo = 6 and 4)
Following the Smalltalk convention made sense back then since all the initial methods in Enumerable are taken from Smalltalk and Matz did not want to confuse people who are familiar with Smalltalk.
Nor could anyone have the foresight to know that would happen in 2007 when each_with_object was introduced to Ruby 1.9 and the order of argument reflects the lexical order of the method name, which is each ... object.
And hence these two methods expect arguments in different orders for historical reasons.

I'm learning Ruby and I don't clearly understand what :* means in inject(:*) [duplicate]

This question already has answers here:
How does this ruby injection magic work?
(3 answers)
Closed 7 years ago.
I'm writing a method for calculating the factorial of a number and I found something similar to this in my search.
def factorial(number)
(1..number).inject(:*) || 1
end
It works and I understand what the inject function is doing, but I don't clearly understand what the (:\*) part really means.
I know it must be a shorthand version of writing {|num, prod| num*prod}, but I would love a clear explanation. Thanks!!
:* is simply the method name for * of the method for inject to execute. If you look at the documentation for inject http://ruby-doc.org/core-2.2.2/Enumerable.html#method-i-inject
It states that
If you specify a symbol instead, then each element in the collection will be passed to the named method of memo. In either case, the result becomes the new value for memo. At the end of the iteration, the final value of memo is the return value for the method.
So taken that inject { |memo, obj| block }
The following are equal
ary = [1,2,3]
ary.inject(:*)
#=> 6
ary.inject { |memo, obj| memo.*(obj) }
#=> 6
Short explanation
:* is a symbol. Symbols are immutable strings. :* is like "*" except it's immutable.
In ruby, multiplication is a method invocation too. It's equivalent invoking the .*(second) method of the first multiplier with the second multiplier as an argument. In fact, you can type 3.*(4) instead of 3*4. 3*4 is just syntactic sugar as far as ruby is concerned.
Method invocation in ruby can be invoked by public_sending symbol messages to objects. 3.public_send(:*, 4) will also work just like 3*4.
The argument to inject is interpreted as what type of message should be public_senT, that is, what method should be invoked from the internals of the inject method.
Longer explanation
You can think of
[ 1, 2, 3, 4 ].inject(:*)
as injecting '*' between each adjacent pair of each enumerable object that inject is invoked on:
[ 1, 2, 3, 4 ].inject(:*) == 1 * 2 * 3 * 4
Of course 1 * 2 * 3 * 4 is equivalent to going from left to right, and applying :* on your running tally and the next number to get your next tally, and then returning the final tally.
module Enumerable
def inject_asterisk
tally = first
rest = slice(1, length - 1)
rest.each do |next_num|
tally = tally * next_num
end
return tally
end
end
[2, 3, 5].inject_asterisk #=> 30
You can generalize this by making the operation that combines the tally and next_number to get your next tally an argument function. Blocks in ruby serve basically as argument functions that always have a reserved spot.
module Enumerable
def inject_block(&block)
tally = first
rest = slice(1, length - 1)
rest.each do |next_num|
tally = block.call(tally, next_num)
end
return tally
end
end
[2, 3, 5].inject_block {|tally, next_num| tally + next_num } #=> 10
If your block is always going to be of the form
{|tally, next_num| tally.method_of_tally(next_num) }
as it is in this case (remember tally + next_num <==> tally.+(next_num) <==> tally.public_send(:+,next_num), you can decide to only pass :method_of_tally as the argument and imply the block.
module Enumerable
def my_inject(method_of_tally_symbol, &block)
if method_of_tally_symbol
block = Proc.new { |tally, next_num|
tally.public_send(method_of_tally_symbol, next_num)
}
end
tally = first
rest = slice(1, length - 1)
rest.each do |next_num|
tally = block.call(tally, next_num)
end
return tally
end
end
[2, 3, 5].my_inject(:+) #=> 10
It's all about extracting repeated patterns into reusable components so that you don't have to type as much.
It means symbol to proc and it's a shortcut. Typically you would write something like
array.map { |e| e.join }
with symbol to proc, the shorthand would be
array.map(&:join)
inject and reduce are similar, but you don't need the & in those cases
For example, if you have an array of numbers called numbers
To sum the numbers, you could do
numbers.inject(&:+)
or you could leave off the ampersand
numbers.inject(:+)
http://ruby-doc.org/core-2.2.2/Enumerable.html#method-i-inject
Inject is a method on enumerable that combines the elements of said enumerable using a symbol (as in your case) or a block (as in your proposed longhand).
For example:
(5..10).reduce(:*) is equivalent to (5..10).inject { |prod, n| prod + n }

Why does this enumerator.to_a returns []?

I just encounter this piece of code
Enumerator.new((1..100), :take, 5).to_a
# => []
Does anyone know why it returns an empty array and not an array of 5 integers?
From the docs, Enumerator#new:
new(obj, method = :each, *args)
In the second, deprecated, form, a generated Enumerator iterates over
the given object using the given method with the given arguments
passed.
Use of this form is discouraged. Use Kernel#enum_for or Kernel#to_enum
instead.
This second usage (which you should not use according to the documentation), needs a each-like method (that's it, one that yields values). take returns values, but does not yield them, so you get an empty enumerable.
Note that in Ruby 2 it will be plain simple to perform a lazy take:
2.0.0dev> xs = (1..100).lazy.take(5)
#=> #<Enumerator::Lazy: #<Enumerator::Lazy: 1..100>:take(5)>
2.0.0dev> xs.to_a
#=> [1, 2, 3, 4, 5]

Can someone explain a real-world, plain-language use for inject in Ruby?

I'm working on learning Ruby, and came across inject. I am on the cusp of understanding it, but when I'm the type of person who needs real world examples to learn something. The most common examples I come across are people using inject to add up the sum of a (1..10) range, which I could care less about. It's an arbitrary example.
What would I use it for in a real program? I'm learning so I can move on to Rails, but I don't have to have a web-centric example. I just need something that has a purpose I can wrap my head around.
Thanks all.
inject can sometimes be better understood by its "other" name, reduce. It's a function that operates on an Enumerable (iterating through it once) and returns a single value.
There are many interesting ways that it can be used, especially when chained with other Enumerable methods, such as map. Often times, it can be a more concise and expressive way of doing something, even if there is another way to do it.
An example like this may seem useless at first:
range.inject {|sum, x| sum += x}
The variable range, however, doesn't have to be a simple explicit range. It could be (for example) a list of values returned from your database. If you ran a database query that returned a list of prices in a shopping cart, you could use .inject to sum them all and get a total.
In the simple case, you can do this in the SQL query itself. In a more difficult case, such as where some items have tax added to them and some don't, something like inject can be more useful:
cart_total = prices.inject {|sum, x| sum += price_with_tax(x)}
This sort of thing is also particularly useful when the objects in the Enumerable are complex classes that require more detailed processing than a simple numerical value would need, or when the Enumerable contains objects of different types that need to be converted into a common type before processing. Since inject takes a block, you can make the functionality here as complex as you need it to be.
Here are a couple of inject() examples in action:
[1, 2, 3, 4].inject(0) {|memo, num| memo += num; memo} # sums all elements in array
The example iterates over every element of the [1, 2, 3, 4] array and adds the elements to the memo variable (memo is commonly used as the block variable name). This example explicitly returns memo after every iteration, but the return can also be implicit.
[1, 2, 3, 4].inject(0) {|memo, num| memo += num} # also works
inject() is conceptually similar to the following explicit code:
result = 0
[1, 2, 3, 4].each {|num| result += num}
result # result is now 10
inject() is also useful to create arrays and hashes. Here is how to use inject() to convert [['dogs', 4], ['cats', 3], ['dogs', 7]] to {'dogs' => 11, 'cats' => 3}.
[['dogs', 4], ['cats', 3], ['dogs', 7]].inject({'dogs' => 0, 'cats' => 0}) do |memo, (animal, num)|
memo[animal] = num
memo
end
Here is a more generalized and elegant solution:
[['dogs', 4], ['cats', 3], ['dogs', 7]].inject(Hash.new(0)) do |memo, (animal, num)|
memo[animal] = num
memo
end
Again, inject() is conceptually similar to the following code:
result = Hash.new(0)
[['dogs', 4], ['cats', 3], ['dogs', 7]].each do |animal, num|
result[animal] = num
end
result # now equals {'dogs' => 11, 'cats' => 3}
Instead of a range, imagine you've got a list of sales prices for some item on eBay and you want to know the average price. You can do that by injecting + and then dividing by the length.
ActiveRecord scopes are a typical case. If we call scoped on a model, we get an object on which we can chain additional scopes. This lets us use inject to build up a search scope from, say, a params hash:
search_params = params.slice("first_name","last_name","city","zip").
reject {|k,v| v.blank?}
search_scope = search_params.inject(User.scoped) do |memo, (k,v)|
case k
when "first_name"
memo.first_name(v)
when "last_name"
memo.last_name(v)
when "city"
memo.city(v)
when "zip"
memo.zip(v)
else
memo
end
(Note: if NO params are supplied, this brings back the whole table, which might not be what you wanted.)
My favorite explanation for inject or it's synonym reduce is:
reduce takes in an array and reduces it to a single value. It does this by iterating through a list, keeping and transforming a running total along the way.
I found it in a wonderful article at
http://railspikes.com/2008/8/11/understanding-map-and-reduce

How to find a min/max with Ruby

I want to use min(5,10), or Math.max(4,7). Are there functions to this effect in Ruby?
You can do
[5, 10].min
or
[4, 7].max
They come from the Enumerable module, so anything that includes Enumerable will have those methods available.
v2.4 introduces own Array#min and Array#max, which are way faster than Enumerable's methods because they skip calling #each.
#nicholasklick mentions another option, Enumerable#minmax, but this time returning an array of [min, max].
[4, 5, 7, 10].minmax
=> [4, 10]
You can use
[5,10].min
or
[4,7].max
It's a method for Arrays.
All those results generate garbage in a zealous attempt to handle more than two arguments. I'd be curious to see how they perform compared to good 'ol:
def max (a,b)
a>b ? a : b
end
which is, by-the-way, my official answer to your question.
If you need to find the max/min of a hash, you can use #max_by or #min_by
people = {'joe' => 21, 'bill' => 35, 'sally' => 24}
people.min_by { |name, age| age } #=> ["joe", 21]
people.max_by { |name, age| age } #=> ["bill", 35]
In addition to the provided answers, if you want to convert Enumerable#max into a max method that can call a variable number or arguments, like in some other programming languages, you could write:
def max(*values)
values.max
end
Output:
max(7, 1234, 9, -78, 156)
=> 1234
This abuses the properties of the splat operator to create an array object containing all the arguments provided, or an empty array object if no arguments were provided. In the latter case, the method will return nil, since calling Enumerable#max on an empty array object returns nil.
If you want to define this method on the Math module, this should do the trick:
module Math
def self.max(*values)
values.max
end
end
Note that Enumerable.max is, at least, two times slower compared to the ternary operator (?:). See Dave Morse's answer for a simpler and faster method.

Resources