Split array into sub-arrays based on value - ruby

I was looking for an Array equivalent of String#split in Ruby Core, and was surprised to find that it did not exist. Is there a more elegant way than the following to split an array into sub-arrays based on a value?
class Array
def split( split_on=nil )
inject([[]]) do |a,v|
a.tap{
if block_given? ? yield(v) : v==split_on
a << []
else
a.last << v
end
}
end.tap{ |a| a.pop if a.last.empty? }
end
end
p (1..9 ).to_a.split{ |i| i%3==0 },
(1..10).to_a.split{ |i| i%3==0 }
#=> [[1, 2], [4, 5], [7, 8]]
#=> [[1, 2], [4, 5], [7, 8], [10]]
Edit: For those interested, the "real-world" problem which sparked this request can be seen in this answer, where I've used @fd's answer below for the implementation.

Sometimes partition is a good way to do things like that:
(1..6).partition { |v| v.even? }
#=> [[2, 4, 6], [1, 3, 5]]

I tried golfing it a bit, still not a single method though:
(1..9).chunk{|i|i%3==0}.reject{|sep,ans| sep}.map{|sep,ans| ans}
Or faster:
(1..9).chunk{|i| i%3==0 ? nil : true}.map{|sep,ans| ans}  # chunk drops elements whose key is nil
Also, Enumerable#chunk seems to be Ruby 1.9+, but it is very close to what you want.
For example, the raw output would be:
(1..9).chunk{ |i|i%3==0 }.to_a
=> [[false, [1, 2]], [true, [3]], [false, [4, 5]], [true, [6]], [false, [7, 8]], [true, [9]]]
(The to_a is to make irb print something nice, since chunk gives you an enumerator rather than an Array)
Edit: Note that the above elegant solutions are 2-3x slower than the fastest implementation:
module Enumerable
def split_by
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
end
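As a side note on chunk itself: its documentation reserves the key values nil and :_separator to mean "drop this element", so the separators can be discarded without a separate reject step. A minimal sketch, where the :run key is an arbitrary placeholder chosen for this example and not part of any API:

```ruby
# Let chunk drop the separator elements by returning :_separator for them.
# :run is an arbitrary key for this sketch; any truthy, non-reserved value works.
pieces = (1..10).chunk { |i| i % 3 == 0 ? :_separator : :run }
                .map { |_key, run| run }

p pieces  #=> [[1, 2], [4, 5], [7, 8], [10]]
```

Unlike the inject version in the question, adjacent separators do not produce empty sub-arrays here, which may or may not be what you want.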

Here are benchmarks aggregating the answers (I'll not be accepting this answer):
require 'benchmark'
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
%w[ split_with_inject split_with_inject_no_tap split_with_each
split_with_chunk split_with_chunk2 split_with_chunk3 ].each do |method|
x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
end
end
#=> user system total real
#=> split_with_inject 1.857000 0.015000 1.872000 ( 1.879188)
#=> split_with_inject_no_tap 1.357000 0.000000 1.357000 ( 1.353135)
#=> split_with_each 1.123000 0.000000 1.123000 ( 1.123113)
#=> split_with_chunk 3.962000 0.000000 3.962000 ( 3.984398)
#=> split_with_chunk2 3.682000 0.000000 3.682000 ( 3.687369)
#=> split_with_chunk3 2.278000 0.000000 2.278000 ( 2.281228)
The implementations being tested (on Ruby 1.9.2):
class Array
def split_with_inject
inject([[]]) do |a,v|
a.tap{ yield(v) ? (a << []) : (a.last << v) }
end.tap{ |a| a.pop if a.last.empty? }
end
def split_with_inject_no_tap
result = inject([[]]) do |a,v|
yield(v) ? (a << []) : (a.last << v)
a
end
result.pop if result.last.empty?
result
end
def split_with_each
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
def split_with_chunk
chunk{ |o| !!yield(o) }.reject{ |b,a| b }.map{ |b,a| a }
end
def split_with_chunk2
chunk{ |o| !!yield(o) }.map{ |b,a| b ? nil : a }.compact
end
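def split_with_chunk3
# NOTE: chunk drops elements whose block value is nil, so as written this
# keeps the separator runs rather than the pieces between them. The timing
# above was measured with this code.
chunk{ |o| yield(o) || nil }.map{ |b,a| b && a }.compact
end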
end

Other Enumerable methods you might want to consider are each_slice and each_cons.
I don't know how general you want it to be, but here's one way:
>> (1..9).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
=> nil
>> (1..10).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
[10]

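Here is another one (with a benchmark comparing it to the fastest split_with_each from https://stackoverflow.com/a/4801483/410102). Note that split_with_each_2 sorts the elements into matches and non-matches, as Enumerable#partition does, rather than splitting on separators: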
require 'benchmark'
class Array
def split_with_each
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
def split_with_each_2
u, v = [], []
each{ |x| (yield x) ? (u << x) : (v << x) }
[u, v]
end
end
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
%w[ split_with_each split_with_each_2 ].each do |method|
x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
end
end
user system total real
split_with_each 2.730000 0.000000 2.730000 ( 2.742135)
split_with_each_2 2.270000 0.040000 2.310000 ( 2.309600)

Related

How to get each value of inject loop

I want to get each value of inject.
For example [1,2,3].inject(3){|sum, num| sum + num} returns 9, and I want to get all values of the loop.
I tried [1,2,3].inject(3).map{|sum, num| sum + num}, but it didn't work.
The code I wrote is this, but I feel it's redundant.
a = [1,2,3]
result = []
a.inject(3) do |sum, num|
v = sum + num
result << v
v
end
p result
# => [4, 6, 9]
Is there a way to use inject and map at the same time?
Using a dedicated Enumerator fits perfectly here, but I would show a more generic approach:
[1,2,3].inject(map: [], sum: 3) do |acc, num|
acc[:map] << (acc[:sum] += num)
acc
end
#=> {:map => [4, 6, 9], :sum => 9}
That way (using hash as accumulator) one might collect whatever she wants. Sidenote: better use Enumerable#each_with_object here instead of inject, because the former does not produce a new instance of an object on each subsequent iteration:
[1,2,3].each_with_object(map: [], sum: 3) do |num, acc|
acc[:map] << (acc[:sum] += num)
end
The best I could think of:
def partial_sums(arr, start = 0)
sum = 0
arr.each_with_object([]) do |elem, result|
sum = elem + (result.empty? ? start : sum)
result << sum
end
end
partial_sums([1,2,3], 3)
You could use an enumerator:
enum = Enumerator.new do |y|
[1, 2, 3].inject(3) do |sum, n|
y << sum + n
sum + n
end
end
enum.take([1,2,3].size) #=> [4, 6, 9]
Obviously you can wrap this up nicely in a method, but I'll leave that for you to do. Also, I don't think there's much wrong with your attempt; it works nicely.
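One possible way to wrap the enumerator idea in a method is sketched below. The name partial_results and its signature are made up for this illustration; the block plays the role of the inject block.

```ruby
# Hypothetical helper: returns an Enumerator over every intermediate
# value that inject would produce, instead of only the final one.
def partial_results(enum, initial, &block)
  Enumerator.new do |y|
    enum.inject(initial) do |acc, item|
      value = block.call(acc, item)
      y << value   # hand the intermediate value to the consumer
      value        # and keep folding with it
    end
  end
end

sums = partial_results([1, 2, 3], 3) { |sum, n| sum + n }
p sums.to_a  #=> [4, 6, 9]
```

Because the result is an Enumerator, callers can also use take, first, or lazy on it rather than materializing every value.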
def doit(arr, initial_value)
arr.each_with_object([initial_value]) { |e,a| a << e+a[-1] }.drop 1
end
arr = [1,2,3]
initial_value = 4
doit(arr, initial_value)
#=> [5, 7, 10]
This lends itself to being generalized.
def gen_doit(arr, initial_value, op)
arr.each_with_object([initial_value]) { |e,a| a << a[-1].send(op,e) }.drop 1
end
gen_doit(arr, initial_value, :+) #=> [5,7,10]
gen_doit(arr, initial_value, '-') #=> [3, 1, -2]
gen_doit(arr, initial_value, :*) #=> [4, 8, 24]
gen_doit(arr, initial_value, '/') #=> [4, 2, 0]
gen_doit(arr, initial_value, :**) #=> [4, 16, 4096]
gen_doit(arr, initial_value, '%') #=> [0, 0, 0]

Consecutive letter frequency

I am trying to write code to determine consecutive frequency of letters within a string.
For example:
"aabbcbb" => ["a",2],["b",2],["c", 1], ["b", 2]
My code gives me the first letter frequency but doesn't move on to the next.
def encrypt(str)
array = []
count = 0
str.each_char do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
return [array, count]
array = []
end
end
end
p "aabbcbb".chars.chunk{|c| c}.map{|c, a| [c, a.size]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
"aabbcbb".chars.slice_when(&:!=).map{|a| [a.first, a.length]}
# => [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
There's a simple regular expression-based solution involving back-references:
"aabbbcbb".scan(/((.)\2*)/).map { |m,c| [c, m.length] }
# => [["a", 2], ["b", 3], ["c", 1], ["b", 2]]
But I would prefer the chunk method for clarity (and almost certainly efficiency).
Actually out of curiosity, I wrote a quick benchmark and scan is a little more than four times faster than chunk.map, but I'd still use chunk.map for clarity unless you're actually doing this hundreds of thousands of times:
require 'benchmark'
N = 10000
data = ('a'..'z').map { |c| c * 10 }.join("")
Benchmark.bm do |bm|
bm.report do
N.times { data.chars.chunk{ |c| c }.map { |c, a| [c, a.size] } }
end
bm.report do
N.times { data.scan(/((.)\2*)/).map { |m,c| [c, m.size] } }
end
end
user system total real
0.800000 0.010000 0.810000 ( 0.803824)
0.190000 0.000000 0.190000 ( 0.192915)
You need to build up an array of results, rather than simply stopping at the first one:
def consecutive_frequencies(str)
str.each_char.reduce([]) do |frequencies_arr, char|
if frequencies_arr.last && frequencies_arr.last[0] == char
frequencies_arr.last[1] += 1
else
frequencies_arr << [char, 1]
end
frequencies_arr
end
end
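The rule the code above implements (extend the last run when the character repeats, otherwise start a new one) can also be expressed with Enumerable#chunk_while, assuming Ruby 2.3 or later. This is only a sketch, and run_lengths is a name invented for it:

```ruby
# Hypothetical helper name; requires Ruby 2.3+ for Enumerable#chunk_while.
def run_lengths(str)
  str.each_char
     .chunk_while { |prev, curr| prev == curr }  # keep equal neighbours together
     .map { |run| [run.first, run.size] }        # one [letter, count] pair per run
end

p run_lengths("aabbcbb")  #=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
p run_lengths("")         #=> []
```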
@steenslag gave the answer I would have given, so I'll try something different.
"aabbcbb".each_char.with_object([]) { |c,a| (a.any? && c == a.last.first) ?
a.last[-1] += 1 : a << [c, 1] }
#=> [["a", 2], ["b", 2], ["c", 1], ["b", 2]]
def encrypt(str)
count = 0
array = []
str.chars do |letter|
if array.empty?
array << letter
count += 1
elsif array.last == letter
count += 1
else
puts "[#{array}, #{count}]"
array.clear
count = 0
array << letter
count += 1
end
end
puts "[#{array}, #{count}]"
end
There are several errors in your implementation. I would try a hash (rather than an array) and use something like this; note that it counts the total occurrences of each letter, not consecutive runs:
def encrypt(str)
count = 0
hash = {}
str.each_char do |letter|
if hash.key?(letter)
hash[letter] += 1
else
hash[letter] = 1
end
end
return hash
end
puts encrypt("aabbcbb")

How to make a method argument both an array and a hash simultaneously

I need to pass a method an argument that is an array and a hash simultaneously, but I don't know how. Here is an example:
def method(here should be this argument)
end
def show(*a)
p a
if a.length.even? == true
p Hash[*a]
else
p "hash conversion not possible"
end
end
show(1,2,3,4)
show(1,2,3)
output:
[1, 2, 3, 4]
{1=>2, 3=>4}
[1, 2, 3]
"hash conversion not possible"
EDIT:
Based on the OP's comment, here is code the OP could use:
def add(*arg)
@entries = {}
arg.each_slice(2) do |a| @entries[a.first] = a.last end
p @entries
end
add(1,2,3,4)
Output:
{1=>2, 3=>4}
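On Ruby 2.1 or later, the pairing step can probably be shortened with each_slice(2).to_h. A minimal sketch, where pairs_to_hash is a hypothetical name and raising on an odd argument count is a choice made for this example only:

```ruby
# Hypothetical helper: treat a flat argument list as key/value pairs.
def pairs_to_hash(*args)
  # An odd count would leave a key without a value, so reject it up front.
  raise ArgumentError, "need an even number of arguments" if args.length.odd?
  args.each_slice(2).to_h
end

p pairs_to_hash(1, 2, 3, 4)
```

The method still receives a plain array (args) and derives the hash from it, which is the same idea as the answer above.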

Determine if one array contains all elements of another array

I need to tell if an array contains all of the elements of another array with duplicates.
[1,2,3].contains_all? [1,2] #=> true
[1,2,3].contains_all? [1,2,2] #=> false (this is where (a1-a2).empty? fails)
[2,1,2,3].contains_all? [1,2,2] #=> true
So the first array must contain at least as many of each unique element as the second array does.
This question answers it for those using an array as a set, but I need to control for duplicates.
Update: Benchmarks
On Ruby 1.9.3p194
def bench
puts Benchmark.measure {
10000.times do
[1,2,3].contains_all? [1,2]
[1,2,3].contains_all? [1,2,2]
[2,1,2,3].contains_all? [1,2,2]
end
}
end
Results in:
Rohit 0.100000 0.000000 0.100000 ( 0.104486)
Chris 0.040000 0.000000 0.040000 ( 0.040178)
Sergio 0.160000 0.020000 0.180000 ( 0.173940)
sawa 0.030000 0.000000 0.030000 ( 0.032393)
Update 2: Larger Arrays
@a1 = (1..10000).to_a
@a2 = (1..1000).to_a
@a3 = (1..2000).to_a
def bench
puts Benchmark.measure {
1000.times do
@a1.contains_all? @a2
@a1.contains_all? @a3
@a3.contains_all? @a2
end
}
end
Results in:
Rohit 9.750000 0.410000 10.160000 ( 10.158182)
Chris 10.250000 0.180000 10.430000 ( 10.433797)
Sergio 14.570000 0.070000 14.640000 ( 14.637870)
sawa 3.460000 0.020000 3.480000 ( 3.475513)
class Array
def contains_all? other
other = other.dup
each{|e| if i = other.index(e) then other.delete_at(i) end}
other.empty?
end
end
Here's a naive and straightforward implementation (not the most efficient one, likely). Just count the elements and compare both elements and their occurrence counts.
class Array
def contains_all? ary
# group the arrays, so that
# [2, 1, 1, 3] becomes {1 => 2, 2 => 1, 3 => 1}
my_groups = group_and_count self
their_groups = group_and_count ary
their_groups.each do |el, cnt|
if !my_groups[el] || my_groups[el] < cnt
return false
end
end
true
end
private
def group_and_count ary
ary.reduce({}) do |memo, el|
memo[el] ||= 0
memo[el] += 1
memo
end
end
end
[1, 2, 3].contains_all? [1, 2] # => true
[1, 2, 3].contains_all? [1, 2, 2] # => false
[2, 1, 2, 3].contains_all? [1, 2, 2] # => true
[1, 2, 3].contains_all? [] # => true
[].contains_all? [1, 2] # => false
It seems you need a multiset. Check out this gem, I think it does what you need.
You can use it and do something like this (if the intersection is equal to the second multiset, then the first one includes all of its elements):
@ms1 & @ms2 == @ms2
Counting the number of occurrences and comparing them seems to be the obvious way to go.
class Array
def contains_all? arr
h = self.inject(Hash.new(0)) {|h, i| h[i] += 1; h}
arr.each do |i|
return false unless h.has_key?(i)
return false if h[i] == 0
h[i] -= 1
end
true
end
end
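On Ruby 2.7 or later, the same count-and-compare idea can be sketched with Enumerable#tally. The helper below is a standalone function with a made-up name (includes_with_counts?), not a core method and not a patch to Array:

```ruby
# Hypothetical standalone helper; needs Ruby 2.7+ for Enumerable#tally.
def includes_with_counts?(haystack, needles)
  available = haystack.tally  # element => number of occurrences in haystack
  # Every needed element must occur at least as often in haystack.
  needles.tally.all? { |elem, needed| available.fetch(elem, 0) >= needed }
end

p includes_with_counts?([1, 2, 3], [1, 2])        #=> true
p includes_with_counts?([1, 2, 3], [1, 2, 2])     #=> false
p includes_with_counts?([2, 1, 2, 3], [1, 2, 2])  #=> true
```

Both arrays are traversed once to build the counts, so the work grows roughly linearly with their combined size.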
class Array
def contains_all?(ary)
ary.uniq.all? { |x| count(x) >= ary.count(x) }
end
end
test
irb(main):131:0> %w[a b c c].contains_all? %w[a b c]
=> true
irb(main):132:0> %w[a b c c].contains_all? %w[a b c c]
=> true
irb(main):133:0> %w[a b c c].contains_all? %w[a b c c c]
=> false
irb(main):134:0> %w[a b c c].contains_all? %w[a]
=> true
irb(main):135:0> %w[a b c c].contains_all? %w[x]
=> false
irb(main):136:0> %w[a b c c].contains_all? %w[]
=> true
The following version is faster and shorter in code.
class Array
def contains_all?(ary)
ary.all? { |x| count(x) >= ary.count(x) }
end
end
Answering with my own implementation, but definitely want to see if someone can come up with a more efficient way. (I won't accept my own answer)
class Array
def contains_all?(a2)
a2.inject(self.dup) do |copy, el|
if copy.include? el
index = copy.index el
copy.delete_at index
else
return false
end
copy
end
true
end
end
And the tests:
1.9.3p194 :016 > [1,2,3].contains_all? [1,2] #=> true
=> true
1.9.3p194 :017 > [1,2,3].contains_all? [1,2,2] #=> false (this is where (a1-a2).empty? fails)
=> false
1.9.3p194 :018 > [2,1,2,3].contains_all? [1,2,2] #=> true
=> true
This solution will only iterate through both lists once, and hence run in linear time. It might however be too much overhead if the lists are expected to be very small.
class Array
def contains_all?(other)
return false if other.size > size
elem_counts = other.each_with_object(Hash.new(0)) { |elem,hash| hash[elem] += 1 }
each do |elem|
elem_counts.delete(elem) if (elem_counts[elem] -= 1) <= 0
return true if elem_counts.empty?
end
false
end
end
If you can't find a built-in method, you can build one using Ruby's include? method (note that this treats the arrays as sets, so it does not account for duplicates).
Official documentation: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-include-3F
Usage:
array = [1, 2, 3, 4]
array.include? 3 #=> true
Then, you can do a loop:
def array_includes_all?( array, comparision_array )
contains = true
for i in comparision_array do
unless array.include? i
contains = false
end
end
return contains
end
array_includes_all?( [1,2,3,2], [1,2,2] ) #=> true

ruby methods that either yield or return Enumerator

In recent versions of Ruby, many methods in Enumerable return an Enumerator when they are called without a block:
[1,2,3,4].map
#=> #<Enumerator: [1, 2, 3, 4]:map>
[1,2,3,4].map { |x| x*2 }
#=> [2, 4, 6, 8]
I want to do the same thing in my own methods, like so:
class Array
def double(&block)
# ???
end
end
arr = [1,2,3,4]
puts "with block: yielding directly"
arr.double { |x| p x }
puts "without block: returning Enumerator"
enum = arr.double
enum.each { |x| p x }
The core libraries insert a guard return to_enum(:name_of_this_method, arg1, arg2, ..., argn) unless block_given?. In your case:
class Array
def double
return to_enum(:double) unless block_given?
each { |x| yield 2*x }
end
end
>> [1, 2, 3].double { |x| puts(x) }
2
4
6
>> ys = [1, 2, 3].double.select { |x| x > 3 }
#=> [4, 6]
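The same guard also covers methods that take arguments: they are simply forwarded to to_enum. A minimal sketch using a small stand-alone class instead of patching Array; NumberList and each_scaled are names invented for this example:

```ruby
# Hypothetical class used only to illustrate the to_enum guard.
class NumberList
  def initialize(*numbers)
    @numbers = numbers
  end

  # Yields each number multiplied by factor, or returns an Enumerator
  # that will do so when no block is given.
  def each_scaled(factor)
    return to_enum(:each_scaled, factor) unless block_given?
    @numbers.each { |n| yield n * factor }
    self
  end
end

list = NumberList.new(1, 2, 3)
list.each_scaled(10) { |n| puts n }       # prints 10, 20, 30
p list.each_scaled(10).map { |n| n + 1 }  #=> [11, 21, 31]
p list.each_scaled(2).with_index.to_a     #=> [[2, 0], [4, 1], [6, 2]]
```

Forwarding factor matters because the Enumerator calls each_scaled again later, with a block, and needs the original arguments to do so.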
Use Enumerator.new:
class Array
def double(&block)
Enumerator.new do |y|
each do |x|
y.yield x*2
end
end.each(&block)
end
end
Another approach might be:
class Array
def double(&block)
map {|y| y*2 }.each(&block)
end
end
The easiest way for me:
class Array
def iter
@lam = lambda {|e| puts e*3}
each &@lam
end
end
array = [1,2,3,4,5,6,7]
array.iter
=>
3
6
9
12
15
18
21
