Faster to consolidate regexes in a hash of regexes in Ruby? - ruby

I have a string that I want to map to an integer. Many strings can map to the same integer, so I'm using regexes to match strings that should map to the same integer.
Example:
str = "hello"
REGEXES.each do |key, val|
if str =~ key
print val
end
end
where REGEXES is a hash that maps regex to integer.
Which is better:
REGEXES = [/hello/ => 2, /foo/ => 2, /bar/ => 3]
or
REGEXES = [/(hello|foo)/ => 2, /bar/ => 3]

Benchmark is your friend:
require 'benchmark'
str = 'hello'
num = 1000000
Benchmark.bmbm do |x|
x.report('individual keys:') do
regexes = [/hello/ => 2, /foo/ => 2, /bar/ => 3]
num.times do
regexes.each {|key, val| str =~ key}
end
end
x.report('combined keys: ') do
regexes = [/(hello|foo)/ => 2, /bar/ => 3]
num.times do
regexes.each {|key, val| str =~ key}
end
end
end
Result:
Rehearsal ----------------------------------------------------
individual keys:   1.600000   0.010000   1.610000 (  1.780246)
combined keys:     1.610000   0.010000   1.620000 (  1.761067)
------------------------------------------- total: 3.230000sec
                       user     system      total        real
individual keys:   1.570000   0.000000   1.570000 (  1.589879)
combined keys:     1.590000   0.010000   1.600000 (  1.678724)
As you can see, in this case there's not much difference.
I'd suggest that you try it with your full hash of regexes/integers, and see if the difference is more significant. If there is, well, there's your answer. If there's not, you can probably feel free to use whichever makes more sense.

I'd go with second version since it'll reduce loop cycles count. Still if there're too much values mapping to the same int, you can split them up in separate hash elements.

Related

Counting unique occurrences of values in a hash

I'm trying to count occurrences of unique values matching a regex pattern in a hash.
If there's three different values, multiple times, I want to know how much each value occurs.
This is the code I've developed to achieve that so far:
def trim(results)
open = []
results.map { |k, v| v }.each { |n| open << n.to_s.scan(/^closed/) }
puts open.size
end
For some reason, it returns the length of all the values, not just the ones I tried a match on. I've also tried using results.each_value, to no avail.
Another way:
hash = {a: 'foo', b: 'bar', c: 'baz', d: 'foo'}
hash.each_with_object(Hash.new(0)) {|(k,v),h| h[v]+=1 if v.start_with?('foo')}
#=> {"foo"=>2}
or
hash.each_with_object(Hash.new(0)) {|(k,v),h| h[v]+=1 if v =~ /^foo|bar/}
#=> {"foo"=>2, "bar"=>1}
Something like this?
hash = {a: 'foo', b: 'bar', c: 'baz', d: 'foo'}
groups = hash.group_by{ |k, v| v[/(?:foo|bar)/] }
# => {"foo"=>[[:a, "foo"], [:d, "foo"]],
# "bar"=>[[:b, "bar"]],
# nil=>[[:c, "baz"]]}
Notice that there is a nil key, which means the regex didn't match anything. We can get rid of it because we (probably) don't care. Or maybe you do care, in which case, don't get rid of it.
groups.delete(nil)
This counts the number of matching "hits":
groups.map{ |k, v| [k, v.size] }
# => [["foo", 2], ["bar", 1]]
group_by is a magical method and well worthy of learning.
def count(hash, pattern)
hash.each_with_object({}) do |(k, v), counts|
counts[k] = v.count{|s| s.to_s =~ pattern}
end
end
h = { a: ['open', 'closed'], b: ['closed'] }
count(h, /^closed/)
=> {:a=>1, :b=>1}
Does that work for you?
I think it worths to update for RUBY_VERSION #=> "2.7.0" which introduces Enumerable#tally:
h = {a: 'foo', b: 'bar', c: 'baz', d: 'foo'}
h.values.tally #=> {"foo"=>2, "bar"=>1, "baz"=>1}
h.values.tally.select{ |k, _| k=~ /^foo|bar/ } #=> {"foo"=>2, "bar"=>1}

Ruby: Sorting an array of strings, in alphabetical order, that includes some arrays of strings

Say I have:
a = ["apple", "pear", ["grapes", "berries"], "peach"]
and I want to sort by:
a.sort_by do |f|
f.class == Array ? f.to_s : f
end
I get:
[["grapes", "berries"], "apple", "peach", "pear"]
Where I actually want the items in alphabetical order, with array items being sorted on their first element:
["apple", ["grapes", "berries"], "peach", "pear"]
or, preferably, I want:
["apple", "grapes, berries", "peach", "pear"]
If the example isn't clear enough, I'm looking to sort the items in alphabetical order.
Any suggestions on how to get there?
I've tried a few things so far yet can't seem to get it there. Thanks.
I think this is what you want:
a.sort_by { |f| f.class == Array ? f.first : f }
I would do
a = ["apple", "pear", ["grapes", "berries"], "peach"]
a.map { |e| Array(e).join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Array#sort_by clearly is the right method, but here's a reminder of how Array#sort would be used here:
a.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end.map {|e| (e.is_a? Array) ? e.join(', ') : e }
#=> ["apple", "grapes, berries", "peach", "pear"]
#theTinMan pointed out that sort is quite a bit slower than sort_by here, and gave a reference that explains why. I've been meaning to see how the Benchmark module is used, so took the opportunity to compare the two methods for the problem at hand. I used #Rafa's solution for sort_by and mine for sort.
For testing, I constructed an array of 100 random samples (each with 10,000 random elements to be sorted) in advance, so the benchmarks would not include the time needed to construct the samples (which was not insignificant). 8,000 of the 10,000 elements were random strings of 8 lowercase letters. The other 2,000 elements were 2-tuples of the form [str1, str2], where str1 and str2 were each random strings of 8 lowercase letters. I benchmarked with other parameters, but the bottom-line results did not vary significantly.
require 'benchmark'
# n: total number of items to sort
# m: number of two-tuples [str1, str2] among n items to sort
# n-m: number of strings among n items to sort
# k: length of each string in samples
# s: number of sorts to perform when benchmarking
def make_samples(n, m, k, s)
s.times.with_object([]) { |_, a| a << test_array(n,m,k) }
end
def test_array(n,m,k)
a = ('a'..'z').to_a
r = []
(n-m).times { r << a.sample(k).join }
m.times { r << [a.sample(k).join, a.sample(k).join] }
r.shuffle!
end
# Here's what the samples look like:
make_samples(6,2,4,4)
#=> [["bloj", "izlh", "tebz", ["lfzx", "rxko"], ["ljnv", "tpze"], "ryel"],
# ["jyoh", "ixmt", "opnv", "qdtk", ["jsve", "itjw"], ["pnog", "fkdr"]],
# ["sxme", ["emqo", "cawq"], "kbsl", "xgwk", "kanj", ["cylb", "kgpx"]],
# [["rdah", "ohgq"], "bnup", ["ytlr", "czmo"], "yxqa", "yrmh", "mzin"]]
n = 10000 # total number of items to sort
m = 2000 # number of two-tuples [str1, str2] (n-m strings)
k = 8 # length of each string
s = 100 # number of sorts to perform
samples = make_samples(n,m,k,s)
Benchmark.bm('sort_by'.size) do |bm|
bm.report 'sort_by' do
samples.each do |s|
s.sort_by { |f| f.class == Array ? f.first : f }
end
end
bm.report 'sort' do
samples.each do |s|
s.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end
end
end
end
user system total real
sort_by 1.360000 0.000000 1.360000 ( 1.364781)
sort 4.050000 0.010000 4.060000 ( 4.057673)
Though it was never in doubt, #theTinMan was right! I did a few other runs with different parameters, but sort_by consistently thumped sort by similar performance ratios.
Note the "system" time is zero for sort_by. In other runs it was sometimes zero for sort. The values were always zero or 0.010000, leading me to wonder what's going on there. (I ran these on a Mac.)
For readers unfamiliar with Benchmark, Benchmark#bm takes an argument that equals the amount of left-padding desired for the header row (user system...). bm.report takes a row label as an argument.
You are really close. Just switch .to_s to .first.
irb(main):005:0> b = ["grapes", "berries"]
=> ["grapes", "berries"]
irb(main):006:0> b.to_s
=> "[\"grapes\", \"berries\"]"
irb(main):007:0> b.first
=> "grapes"
Here is one that works:
a.sort_by do |f|
f.class == Array ? f.first : f
end
Yields:
["apple", ["grapes", "berries"], "peach", "pear"]
a.map { |b| b.is_a?(Array) ? b.join(', ') : b }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Replace to_s with join.
a.sort_by do |f|
f.class == Array ? f.join : f
end
# => ["apple", ["grapes", "berries"], "peach", "pear"]
Or more concisely:
a.sort_by {|x| [*x].join }
# => ["apple", ["grapes", "berries"], "peach", "pear"]
The problem with to_s is that it converts your Array to a string that starts with "[":
"[\"grapes\", \"berries\"]"
which comes alphabetically before the rest of your strings.
join actually creates the string that you had expected to sort by:
"grapesberries"
which is alphabetized correctly, according to your logic.
If you don't want the arrays to remain arrays, then it's a slightly different operation, but you will still use join.
a.map {|x| [*x].join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Sort a Flattened Array
If you just want all elements of your nested array flattened and then sorted in alphabetical order, all you need to do is flatten and sort. For example:
["apple", "pear", ["grapes", "berries"], "peach"].flatten.sort
#=> ["apple", "berries", "grapes", "peach", "pear"]

All combinations for hash of arrays [duplicate]

This question already has answers here:
Turning a Hash of Arrays into an Array of Hashes in Ruby
(7 answers)
Closed 9 years ago.
Summary
Given a Hash where some of the values are arrays, how can I get an array of hashes for all possible combinations?
Test Case
options = { a:[1,2], b:[3,4], c:5 }
p options.self_product
#=> [{:a=>1, :b=>3, :c=>5},
#=> {:a=>1, :b=>4, :c=>5},
#=> {:a=>2, :b=>3, :c=>5},
#=> {:a=>2, :b=>4, :c=>5}]
When the value for a particular key is not an array, it should simply be included as-is in each resulting hash, the same as if it were wrapped in an array.
Motivation
I need to generate test data given a variety of values for different options. While I can use [1,2].product([3,4],[5]) to get the Cartesian Product of all possible values, I'd rather use hashes to be able to label both my input and output so that the code is more self-explanatory than just using array indices.
I suggest a little pre-processing to keep the result general:
options = { a:[1,2], b:[3,4], c:5 }
options.each_key {|k| options[k] = [options[k]] unless options[k].is_a? Array}
=> {:a=>[1, 2], :b=>[3, 4], :c=>[5]}
I edited to make a few refinements, principally the use of inject({}):
class Hash
def self_product
f, *r = map {|k,v| [k].product(v).map {|e| Hash[*e]}}
f.product(*r).map {|a| a.inject({}) {|h,e| e.each {|k,v| h[k]=v}; h}}
end
end
...though I prefer #Phrogz's '2nd attempt', which, with pre-processing 5=>[5], would be:
class Hash
def self_product
f, *r = map {|k,v| [k].product(v)}
f.product(*r).map {|a| Hash[*a.flatten]}
end
end
First attempt:
class Hash
#=> Given a hash of arrays get an array of hashes
#=> For example, `{ a:[1,2], b:[3,4], c:5 }.self_product` yields
#=> [ {a:1,b:3,c:5}, {a:1,b:4,c:5}, {a:2,b:3,c:5}, {a:2,b:4,c:5} ]
def self_product
# Convert array values into single key/value hashes
all = map{|k,v| [k].product(v.is_a?(Array) ? v : [v]).map{|k,v| {k=>v} }}
#=> [[{:a=>1}, {:a=>2}], [{:b=>3}, {:b=>4}], [{:c=>5}]]
# Create the product of all mini hashes, and merge them into a single hash
all.first.product(*all[1..-1]).map{ |a| a.inject(&:merge) }
end
end
p({ a:[1,2], b:[3,4], c:5 }.self_product)
#=> [{:a=>1, :b=>3, :c=>5},
#=> {:a=>1, :b=>4, :c=>5},
#=> {:a=>2, :b=>3, :c=>5},
#=> {:a=>2, :b=>4, :c=>5}]
Second attempt, inspired by #Cary's answer:
class Hash
def self_product
first, *rest = map{ |k,v| [k].product(v.is_a?(Array) ? v : [v]) }
first.product(*rest).map{ |x| Hash[x] }
end
end
In addition to being more elegant, the second answer is also about 4.5x faster than the first when creating a large result (262k hashes with 6 keys each):
require 'benchmark'
Benchmark.bm do |x|
n = *1..8
h = { a:n, b:n, c:n, d:n, e:n, f:n }
%w[phrogz1 phrogz2].each{ |n| x.report(n){ h.send(n) } }
end
#=> user system total real
#=> phrogz1 4.450000 0.050000 4.500000 ( 4.502511)
#=> phrogz2 0.940000 0.050000 0.990000 ( 0.980424)

simple hash merge by array of keys and values in ruby (with perl example)

In Perl to perform a hash update based on arrays of keys and values I can do something like:
#hash{'key1','key2','key3'} = ('val1','val2','val3');
In Ruby I could do something similar in a more complicated way:
hash.merge!(Hash[ *[['key1','key2','key3'],['val1','val2','val3']].transpose ])
OK but I doubt the effectivity of such procedure.
Now I would like to do a more complex assignment in a single line.
Perl example:
(#hash{'key1','key2','key3'}, $key4) = &some_function();
I have no idea if such a thing is possible in some simple Ruby way. Any hints?
For the Perl impaired, #hash{'key1','key2','key3'} = ('a', 'b', 'c') is a hash slice and is a shorthand for something like this:
$hash{'key1'} = 'a';
$hash{'key2'} = 'b';
$hash{'key3'} = 'c';
In Ruby 1.9 Hash.[] can take as its argument an array of two-valued arrays (in addition to the old behavior of a flat list of alternative key/value arguments). So it's relatively simple to do:
mash.merge!( Hash[ keys.zip(values) ] )
I do not know perl, so I'm not sure what your final "more complex assignment" is trying to do. Can you explain in words—or with the sample input and output—what you are trying to achieve?
Edit: based on the discussion in #fl00r's answer, you can do this:
def f(n)
# return n arguments
(1..n).to_a
end
h = {}
keys = [:a,:b,:c]
*vals, last = f(4)
h.merge!( Hash[ keys.zip(vals) ] )
p vals, last, h
#=> [1, 2, 3]
#=> 4
#=> {:a=>1, :b=>2, :c=>3}
The code *a, b = some_array will assign the last element to b and create a as an array of the other values. This syntax requires Ruby 1.9. If you require 1.8 compatibility, you can do:
vals = f(4)
last = vals.pop
h.merge!( Hash[ *keys.zip(vals).flatten ] )
You could redefine []= to support this:
class Hash
def []=(*args)
*keys, vals = args # if this doesn't work in your version of ruby, use "keys, vals = args[0...-1], args.last"
merge! Hash[keys.zip(vals.respond_to?(:each) ? vals : [vals])]
end
end
Now use
myhash[:key1, :key2, :key3] = :val1, :val2, :val3
# or
myhash[:key1, :key2, :key3] = some_method_returning_three_values
# or even
*myhash[:key1, :key2, :key3], local_var = some_method_returning_four_values
you can do this
def some_method
# some code that return this:
[{:key1 => 1, :key2 => 2, :key3 => 3}, 145]
end
hash, key = some_method
puts hash
#=> {:key1 => 1, :key2 => 2, :key3 => 3}
puts key
#=> 145
UPD
In Ruby you can do "parallel assignment", but you can't use hashes like you do in Perl (hash{:a, :b, :c)). But you can try this:
hash[:key1], hash[:key2], hash[:key3], key4 = some_method
where some_method returns an Array with 4 elements.

Searching for range overlaps in Ruby hashes

Say you have the following Ruby hash,
hash = {:a => [[1, 100..300],
[2, 200..300]],
:b => [[1, 100..300],
[2, 301..400]]
}
and the following functions,
def overlaps?(range, range2)
range.include?(range2.begin) || range2.include?(range.begin)
end
def any_overlaps?(ranges)
# This calls to_proc on the symbol object; it's syntactically equivalent to
# ranges.sort_by {|r| r.begin}
ranges.sort_by(&:begin).each_cons(2).any? do |r1, r2|
overlaps?(r1, r2)
end
end
and it's your desire to, for each key in hash, test whether any range overlaps with any other. In hash above, I would expect hash[:a] to make me mad and hash[:b] to not.
How is this best implemented syntactically?
hash.each{|k, v| puts "#{k} #{any_overlaps?( v.map( &:last )) ? 'overlaps' : 'is ok'}."}
output:
a overlaps.
b is ok.
Here's another way to write any_overlaps:
def any_overlaps?(ranges)
(a = ranges.map { |r| [r.first, r.last] }.sort_by(&:first).flatten) != a.sort
end
any_overlaps? [(51..60),(11..20),(18..30),(0..10),(31..40)] # => true
any_overlaps? [(51..60),(11..20),(21..30),(0..10),(31..40)] # => false

Resources