I have a Ruby array containing some string values. I need to:
Find all elements that match some predicate
Run the matching elements through a transformation
Return the results as an array
Right now my solution looks like this:
def example
matchingLines = #lines.select{ |line| ... }
results = matchingLines.map{ |line| ... }
return results.uniq.sort
end
Is there an Array or Enumerable method that combines select and map into a single logical statement?
I usually use map and compact together along with my selection criteria as a postfix if. compact gets rid of the nils.
jruby-1.5.0 > [1,1,1,2,3,4].map{|n| n*3 if n==1}
=> [3, 3, 3, nil, nil, nil]
jruby-1.5.0 > [1,1,1,2,3,4].map{|n| n*3 if n==1}.compact
=> [3, 3, 3]
Ruby 2.7+
There is now!
Ruby 2.7 is introducing filter_map for this exact purpose. It's idiomatic and performant, and I'd expect it to become the norm very soon.
For example:
numbers = [1, 2, 5, 8, 10, 13]
enum.filter_map { |i| i * 2 if i.even? }
# => [4, 16, 20]
Here's a good read on the subject.
Hope that's useful to someone!
You can use reduce for this, which requires only one pass:
[1,1,1,2,3,4].reduce([]) { |a, n| a.push(n*3) if n==1; a }
=> [3, 3, 3]
In other words, initialize the state to be what you want (in our case, an empty list to fill: []), then always make sure to return this value with modifications for each element in the original list (in our case, the modified element pushed to the list).
This is the most efficient since it only loops over the list with one pass (map + select or compact requires two passes).
In your case:
def example
results = #lines.reduce([]) do |lines, line|
lines.push( ...(line) ) if ...
lines
end
return results.uniq.sort
end
Another different way of approaching this is using the new (relative to this question) Enumerator::Lazy:
def example
#lines.lazy
.select { |line| line.property == requirement }
.map { |line| transforming_method(line) }
.uniq
.sort
end
The .lazy method returns a lazy enumerator. Calling .select or .map on a lazy enumerator returns another lazy enumerator. Only once you call .uniq does it actually force the enumerator and return an array. So what effectively happens is your .select and .map calls are combined into one - you only iterate over #lines once to do both .select and .map.
My instinct is that Adam's reduce method will be a little faster, but I think this is far more readable.
The primary consequence of this is that no intermediate array objects are created for each subsequent method call. In a normal #lines.select.map situation, select returns an array which is then modified by map, again returning an array. By comparison, the lazy evaluation only creates an array once. This is useful when your initial collection object is large. It also empowers you to work with infinite enumerators - e.g. random_number_generator.lazy.select(&:odd?).take(10).
If you have a select that can use the case operator (===), grep is a good alternative:
p [1,2,'not_a_number',3].grep(Integer){|x| -x } #=> [-1, -2, -3]
p ['1','2','not_a_number','3'].grep(/\D/, &:upcase) #=> ["NOT_A_NUMBER"]
If we need more complex logic we can create lambdas:
my_favourite_numbers = [1,4,6]
is_a_favourite_number = -> x { my_favourite_numbers.include? x }
make_awesome = -> x { "***#{x}***" }
my_data = [1,2,3,4]
p my_data.grep(is_a_favourite_number, &make_awesome) #=> ["***1***", "***4***"]
I'm not sure there is one. The Enumerable module, which adds select and map, doesn't show one.
You'd be required to pass in two blocks to the select_and_transform method, which would be a bit unintuitive IMHO.
Obviously, you could just chain them together, which is more readable:
transformed_list = lines.select{|line| ...}.map{|line| ... }
Simple Answer:
If you have n records, and you want to select and map based on condition then
records.map { |record| record.attribute if condition }.compact
Here, attribute is whatever you want from the record and condition you can put any check.
compact is to flush the unnecessary nil's which came out of that if condition
No, but you can do it like this:
lines.map { |line| do_some_action if check_some_property }.reject(&:nil?)
Or even better:
lines.inject([]) { |all, line| all << line if check_some_property; all }
I think that this way is more readable, because splits the filter conditions and mapped value while remaining clear that the actions are connected:
results = #lines.select { |line|
line.should_include?
}.map do |line|
line.value_to_map
end
And, in your specific case, eliminate the result variable all together:
def example
#lines.select { |line|
line.should_include?
}.map { |line|
line.value_to_map
}.uniq.sort
end
def example
#lines.select {|line| ... }.map {|line| ... }.uniq.sort
end
In Ruby 1.9 and 1.8.7, you can also chain and wrap iterators by simply not passing a block to them:
enum.select.map {|bla| ... }
But it's not really possible in this case, since the types of the block return values of select and map don't match up. It makes more sense for something like this:
enum.inject.with_index {|(acc, el), idx| ... }
AFAICS, the best you can do is the first example.
Here's a small example:
%w[a b 1 2 c d].map.select {|e| if /[0-9]/ =~ e then false else e.upcase end }
# => ["a", "b", "c", "d"]
%w[a b 1 2 c d].select.map {|e| if /[0-9]/ =~ e then false else e.upcase end }
# => ["A", "B", false, false, "C", "D"]
But what you really want is ["A", "B", "C", "D"].
You should try using my library Rearmed Ruby in which I have added the method Enumerable#select_map. Heres an example:
items = [{version: "1.1"}, {version: nil}, {version: false}]
items.select_map{|x| x[:version]} #=> [{version: "1.1"}]
# or without enumerable monkey patch
Rearmed.select_map(items){|x| x[:version]}
If you want to not create two different arrays, you can use compact! but be careful about it.
array = [1,1,1,2,3,4]
new_array = map{|n| n*3 if n==1}
new_array.compact!
Interestingly, compact! does an in place removal of nil. The return value of compact! is the same array if there were changes but nil if there were no nils.
array = [1,1,1,2,3,4]
new_array = map{|n| n*3 if n==1}.tap { |array| array.compact! }
Would be a one liner.
Your version:
def example
matchingLines = #lines.select{ |line| ... }
results = matchingLines.map{ |line| ... }
return results.uniq.sort
end
My version:
def example
results = {}
#lines.each{ |line| results[line] = true if ... }
return results.keys.sort
end
This will do 1 iteration (except the sort), and has the added bonus of keeping uniqueness (if you don't care about uniq, then just make results an array and results.push(line) if ...
Here is a example. It is not the same as your problem, but may be what you want, or can give a clue to your solution:
def example
lines.each do |x|
new_value = do_transform(x)
if new_value == some_thing
return new_value # here jump out example method directly.
else
next # continue next iterate.
end
end
end
Related
I want to parse the formal list from https://www.loc.gov/marc/bibliographic/ecbdlist.html into a nested structure of hashes and arrays.
At first, I used a recursive approach - but ran into the problem that Ruby (and BTW also Python) can handle only less than 1000 recursive calls (stack level too deep).
I found "select_before" and it seemed great:
require 'pp'
# read list into array and get rid of unnecessary lines
marc = File.readlines('marc21.txt', 'r:utf-8')[0].lines.map(&:chomp).select { |line| line if !line.match(/^\s*$/) && !line.match(/^--.+/) }
# magic starts here
marc = marc.slice_before { |line| line[/^ */].size == 0 }.to_a
marc = marc.inject({}) { |hash, arr| hash = hash.merge( arr[0] => arr[1..-1] ) }
I now want to iterate these steps throughout the array. As the indentation levels in the list vary ([0, 2, 3, 4, 5, 6, 8, 9, 10, 12] not all of them always present), I use a helper method get_indentation_map to use only the smallest amount of indentation in each iteration.
But adding only one level (far from the goal of turning the whole array into the new structure), I get the error "no implicit conversion of Regex into Integer" the reason of which I fail to see:
def get_indentation_map( arr )
arr.map { |line| line[/^ */].size }
end
# starting again after slice_before of the unindented lines (== 0)
marc = marc.inject({}) do |hash, arr|
hash = hash.merge( arr[0] => arr[1..-1] ) # so far like above
# now trying to do the same on the next level
hash = hash.inject({}) do |h, a|
indentation_map = get_indentation_map( a ).uniq.sort
# only slice before smallest indentation
a = a.slice_before { |line| line[/^ */].size == indentation_map[0] }.to_a
h = h.merge( a[0] => a[1..-1] )
end
hash
end
I would be very grateful for hints how to best parse this list. I aim at a json-like structure in which every entry is the key for the further indented lines (if there are). Thanks in advance.
For example, to return the 10,000th prime number I could write:
require 'prime'
Prime.first(10000).last #=> 104729
But creating a huge intermediate array, just to retrieve its last element feels a bit cumbersome.
Given that Ruby is such an elegant language, I would have expected something like:
Prime.at(9999) #=> 104729
But there is no Enumerable#at.
Is the above workaround intended or is there a more direct way to get the nth element of an Enumerable?
The closest thing I can think of to a hypothetical at method is drop, which skips the indicated number of elements. It tries to return an actual array though so you need to combine it with lazy if you are using with infinite sequences, e.g.
Prime.lazy.drop(9999).first
What about this?
Prime.to_enum.with_index(1){|e, i| break e if i == 10000}
# => 104729
For enumerators that are not lazy, you probably want to put lazy on the enumerator.
Based on sawa's solution, here's a more succinct alternative:
Prime.find.with_index(1) { |_, i| i == 10000 }
#=> 104729
or
Prime.find.with_index { |_, i| i == 9999 }
#=> 104723
module Enumerable
def at(n)
enum = is_a?(Enumerator) ? self : each
(n-1).times { enum.next }
enum.next
end
end
(0..4).at(3)
#=> 2
{ a:1, b:2, c:3, d:4, e:5 }.at(3)
#=> [:c, 3]
'0ab_1ab_2ab_3ab_4ab'.gsub(/.(?=ab)/).at(3)
#=> "2"
require 'prime'; Prime.at(10000)
#=> 104729
e = 1.step; e.at(10)
#=> 10
[0,1,2,3,4].at(6)
#=> nil
'0ab_1ab_2ab_3ab_4ab'.gsub(/.(?=ab)/).at(6)
#=> StopIteration (iteration reached an end)
Note that '0ab_1ab_2ab_3ab_4ab'.gsub(/.(?=ab)/).class #=> Enumerator.
Some refinements would be needed, such as checking that n is an integer greater than zero and improving the exception handling, such as dealing with the case when self is not an enumerator but has no method each.
What is the idiomatic Ruby way to write this code?
Given an array, I would like to iterate through each element of that array, but skip the first one. I want to do this without allocating a new array.
Here are two ways I've come up with, but neither feels particularly elegant.
This works but seems way too verbose:
arr.each_with_index do |elem, i|
next if i.zero? # skip the first
...
end
This works but allocates a new array:
arr[1..-1].each { ... }
Edit/clarification: I'd like to avoid allocating a second array. Originally I said I wanted to avoid "copying" the array, which was confusing.
Using the internal enumerator is certainly more intuitive, and you can do this fairly elegantly like so:
class Array
def each_after(n)
each_with_index do |elem, i|
yield elem if i >= n
end
end
end
And now:
arr.each_after(1) do |elem|
...
end
I want to do this without creating a copy of the array.
1) Internal iterator:
arr = [1, 2, 3]
start_index = 1
(start_index...arr.size).each do |i|
puts arr[i]
end
--output:--
2
3
2) External iterator:
arr = [1, 2, 3]
e = arr.each
e.next
loop do
puts e.next
end
--output:--
2
3
OK, maybe this is bad form to answer my own question. But I've been racking my brain on this and poring over the Enumerable docs, and I think I've found a good solution:
arr.lazy.drop(1).each { ... }
Here's proof that it works :-)
>> [1,2,3].lazy.drop(1).each { |e| puts e }
2
3
Concise: yes. Idiomatic Ruby… maybe? What do you think?
I'm looking for something similar to #detect in enumerables, but not quite. This is what enumerable does:
[1, 2, 3].detect {|i| i > 1 } #=> 2
it returns the first instance of the array which matches the condition. Now, my purpose is to return the value of the block. Concern is not exactly the conditions, but for instance, the first which is not nil. Something like this:
[var1, var2, var3].wanted_detect {|var| another_function(var) }
in which the function would return the first result of another_function call which isn't nil.
Mapping the values of applying the method on the variables and then using detect is not an option. This one would ideally have to work in lazy enumerators, for which the early mapping of all possible values is a no-go
[var1, var2, var3].lazy.map { |var| another_function(var) }.reject(&:nil?).first
If you don't have access to Enumerable#lazy, it is easy enough to implement what you want:
module Enumerable
def wanted_detect
self.each do |obj|
val = yield obj
return val if val
end
end
end
Demo:
[1, 2, 3, 4].wanted_detect { |x| x*x if x > 2 }
# => 9
EDIT: Sorry, I missed the last paragraph till falsetru pointed it out.
Thanks for the comments, falsetru.
Why would you create a proxy reference to an object in Ruby, by using the to_enum method rather than just using the object directly? I cannot think of any practical use for this, trying to understand this concept & where someone might use it, but all the examples I have seen seem very trivial.
For example, why use:
"hello".enum_for(:each_char).map {|c| c.succ }
instead of
"hello".each_char.map {|c| c.succ }
I know this is a very simple example, does anyone have any real-world examples?
Most built-in methods that accept a block will return an enumerator in case no block is provided (like String#each_char in your example). For these, there is no reason to use to_enum; both will have the same effect.
A few methods do not return an Enumerator, though. In those case you might need to use to_enum.
# How many elements are equal to their position in the array?
[4, 1, 2, 0].to_enum(:count).each_with_index{|elem, index| elem == index} #=> 2
As another example, Array#product, #uniq and #uniq! didn't use to accept a block. In 1.9.2, this was changed, but to maintain compatibility, the forms without a block can't return an Enumerator. One can still "manually" use to_enum to get an enumerator:
require 'backports/1.9.2/array/product' # or use Ruby 1.9.2+
# to avoid generating a huge intermediary array:
e = many_moves.to_enum(:product, many_responses)
e.any? do |move, response|
# some criteria
end
The main use of to_enum is when you are implementing your own iterative method. You typically will have as a first line:
def my_each
return to_enum :my_each unless block_given?
# ...
end
I think it has something to do with internal and external Iterators. When you return an enumerator like this:
p = "hello".enum_for(:each_char)
p is an external enumerator. One advantage of external iterators is that:
External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators…. But on the other hand, internal iterators are easier to use, because they define the iteration logic for you. [From The Ruby Programming Language book, ch. 5.3]
So, with external iterator you can do, e.g.:
p = "hello".enum_for(:each_char)
loop do
puts p.next
end
Let's say we want to take an array of keys and an array of values and sew them up in a Hash:
With #to_enum
def hashify(k, v)
keys = k.to_enum(:each)
values = v.to_enum(:each)
hash = []
loop do
hash[keys.next] = values.next
# No need to check for bounds,
# as #next will raise a StopIteration which breaks from the loop
end
hash
end
Without #to_enum:
def hashify(k, v)
hash = []
keys.each_with_index do |key, index|
break if index == values.length
hash[key] = values[index]
end
hash
end
It's much easier to read the first method, don't you think? Not a ton easier, but imagine if we were somehow manipulating items from 3 arrays? 5? 10?
This isn't quite an answer to your question, but hopefully it is relevant.
In your second example you are calling each_char without passing a block. When called without a block each_char returns an Enumerator so your examples are actually just two ways of doing the same thing. (i.e. both result in the creation of an enumerable object.)
irb(main):016:0> e1 = "hello".enum_for(:each_char)
=> #<Enumerator:0xe15ab8>
irb(main):017:0> e2 = "hello".each_char
=> #<Enumerator:0xe0bd38>
irb(main):018:0> e1.map { |c| c.succ }
=> ["i", "f", "m", "m", "p"]
irb(main):019:0> e2.map { |c| c.succ }
=> ["i", "f", "m", "m", "p"]
It's great for large or infinite generator objects.
E.g., the following will give you an enumerator for the whole Fibonacci seequence, from 0 to infinity.
def fib_sequence
return to_enum(:fib_sequence) unless block_given?
yield 0
yield 1
x,y, = 0, 1
loop { x,y = y,x+y; yield(y) }
end
to_enum effectively allows you to write this with regular yields without having to mess with Fibers.
You can then slice it as you want, and it will be very memory efficient, since no arrays will be stored in memory:
module Slice
def slice(range)
return to_enum(:slice, range) unless block_given?
start, finish = range.first, range.max + 1
copy = self.dup
start.times { copy.next }
(finish-start).times { yield copy.next }
end
end
class Enumerator
include Slice
end
fib_sequence.slice(0..10).to_a
#=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
fib_sequence.slice(10..20).to_a
#=> [55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]