How to merge multiple hashes? - ruby

Right now, I'm merging two hashes like this:
department_hash = self.parse_department html
super_saver_hash = self.parse_super_saver html
final_hash = department_hash.merge(super_saver_hash)
Output:
{:department=>{"Pet Supplies"=>{"Birds"=>16281, "Cats"=>245512,
"Dogs"=>513926, "Fish & Aquatic Pets"=>46811, "Horses"=>14805,
"Insects"=>364, "Reptiles & Amphibians"=>5816, "Small
Animals"=>19769}}, :super_saver=>{"Free Super Saver
Shipping"=>126649}}
But now I want to merge more in the future. For example:
department_hash = self.parse_department html
super_saver_hash = self.parse_super_saver html
categories_hash = self.parse_categories html
How to merge multiple hashes?

How about:
[department_hash, super_saver_hash, categories_hash].reduce &:merge

You can just call merge again:
h1 = {foo: :bar}
h2 = {baz: :qux}
h3 = {quux: :garply}
h1.merge(h2).merge(h3)
#=> {:foo=>:bar, :baz=>:qux, :quux=>:garply}

You can do below way using Enumerable#inject:
h = {}
arr = [{:a=>"b"},{"c" => 2},{:a=>4,"c"=>"Hi"}]
arr.inject(h,:update)
# => {:a=>4, "c"=>"Hi"}
arr.inject(:update)
# => {:a=>4, "c"=>"Hi"}

It took me a while to figure out how to merge multi-nested hashes after going through this Question and its Answers. It turned out I was iterating through the collections of hashes incorrectly, causing all kinds of problems with null values.
This sample command-line app shows how to merge multiple hashes with a combination of store and merge!, depending on whether or not they were top-level hash keys. It uses command-line args with a few known key name for categorization purposes.
Full code from the Gist URL is provided below as a courtesy:
# Ruby - A nested hash example
# Load each pair of args on the command-line as a key-value pair
# For example from CMD.exe:
# call ruby.exe ruby_nested_hash_example.rb Age 30 Name Mary Fav_Hobby Ataraxia Fav_Number 42
# Output would be:
# {
# "data_info": {
# "types": {
# "nums": {
# "Age": 30,
# "Fav_Number": 42
# },
# "strings": {
# "Name": "Mary",
# "Fav_Hobby": "Ataraxia"
# }
# },
# "data_id": "13435436457"
# }
# }
if (ARGV.count % 2 != 0) || (ARGV.count < 2)
STDERR.puts "You must provide an even amount of command-line args to make key-value pairs.\n"
abort
end
require 'json'
cmd_hashes = {}
nums = {}
strings = {}
types = {}
#FYI `tl` == top-level
all_tl_keys = {}
data_info = {}
data_id = {:data_id => "13435436457"}
_key = ""
_value = ""
element = 0
ARGV.each do |i|
if element % 2 == 0
_key=i
else
if (i.to_i!=0) && (i!=0)
_value=i.to_i
else
_value=i
end
end
if (_key != "") && (_value != "")
cmd_hashes.store(_key, _value)
_key = ""
_value = ""
end
element+=1
end
cmd_hashes.each do |key, value|
if value.is_a? Numeric
nums.store(key, value)
else
strings.store(key, value)
end
end
if nums.size > 0; types.merge!(:nums => nums) end
if strings.size > 0; types.merge!(:strings => strings) end
if types.size > 0; all_tl_keys.merge!(:types => types) end
if data_id.size > 0; all_tl_keys.merge!(data_id) end
if all_tl_keys.size > 0; data_info.merge!(:data_info => all_tl_keys) end
if data_info.size > 0; puts JSON.pretty_generate(data_info) end

Suppose you are having arr = [{x: 10},{y: 20},{z: 30}]
then do
arr.reduce(:merge)

Related

Ruby find atrribute in array of Objects with highest/lowest occurence and return attribute

Hello I am wondering if anyone may be able to give some assistance with two functions I am working on for a Ruby project. I have an array of Objects, and I need to get a certain attribute with the highest and lowest occurrence in each function respectively.
So far I have this, which works, but seems a bit too verbose:
def most_visited_port(time)
most_visited_port_name = ""
most_visited = 0
ending_port_array = []
#ships.each do |ship|
ending_port_array << ship.ending_port
most_visited = ending_port_array.sort.max_by { |v| ending_port_array.count(v) } != ending_port_array.sort.reverse.max_by { |v| ending_port_array.count(v) } ? false : ending_port_array.max_by { |v| ending_port_array.count(v) }
end
#ships.each do |ship|
if time.to_date === ship.time_arrived.to_date && ship.ending_port == most_visited
most_visited_port_name = ship.ending_port_name
end
end
pp most_visited_port_name
end
def least_visited_port(time)
least_visited_port_name = ""
least_visited = 0
ending_port_array = []
#ships.each do |ship|
ending_port_array << ship.ending_port
least_visited = ending_port_array.sort.min_by { |v| ending_port_array.count(v) } != ending_port_array.sort.reverse.min_by { |v| ending_port_array.count(v) } ? false : ending_port_array.min_by { |v| ending_port_array.count(v) }
end
#ships.each do |ship|
if time.to_date === ship.time_arrived.to_date && ship.ending_port == least_visited
least_visited_port_name = ship.ending_port_name
end
end
pp least_visited_port_name
end
Here is a sample of the array of Objects format:
[#<FleetShip:0x0000000108444450
#average_speed=46.02272727272727,
#beginning_port=7,
#beginning_port_name="Summermill",
#distance=81.0,
#ending_port=3,
#ending_port_name="Seamont",
#id=0,
#ship_name="Alpha",
#time_arrived=2016-06-12 08:05:36 -0500,
#time_left=2016-06-12 06:20:00 -0500>,
#<FleetShip:0x0000000108444400
#average_speed=32.01932579334578,
#beginning_port=7,
#beginning_port_name="Summermill",
#distance=81.0,
#ending_port=3,
#ending_port_name="Seamont",
#id=1,
#ship_name="Sea Ghost",
#time_arrived=2016-06-12 11:07:47 -0500,
#time_left=2016-06-12 08:36:00 -0500>]
But could anyone give some assistance on a possibly simpler or more concise way to pull it off?
For fun, you could use each_with_object to build a hash of ending_port values and their frequency, and then retrieve the most frequent.
I'm going to use a much simpler example.
A = Struct.new(:b)
c = [A.new(3), A.new(2), A.new(1), A.new(3), A.new(1), A.new(3), A.new(3)]
most_freq_b = c.each_with_object({}) { |x, h|
h[x.b] ||= 0
h[x.b] += 1
}.max_by(&:last).first
# => 3
This does not account for situations where more than one value for b occurs equal numbers of times. We can tweak it though, to accomplish this.
A = Struct.new(:b)
c = [A.new(3), A.new(2), A.new(1), A.new(3), A.new(1)]
freq = c.each_with_object({}) { |x, h|
h[x.b] ||= 0
h[x.b] += 1
}
highest_freq = freq.values.max
most_freq_b = freq.select { |_, v| v == highest_freq }.keys
# => [3, 1]
Alternatively, we can provide a default value of 0 for the hash, simplifying part of the code.
freq = c.each_with_object(Hash.new(0)) { |x, h|
h[x.b] += 1
}

How to solve the "retrieve_values" problem

I'm working on this problem:
Write a method retrieve_values that takes in two hashes and a key. The method should return an array containing the values from the two hashes that correspond with the given key.
def retrieve_values(hash1, hash2, key)
end
dog1 = {"name"=>"Fido", "color"=>"brown"}
dog2 = {"name"=>"Spot", "color"=> "white"}
print retrieve_values(dog1, dog2, "name") #=> ["Fido", "Spot"]
puts
print retrieve_values(dog1, dog2, "color") #=> ["brown", "white"]
puts
I came up with a working solution:
def retrieve_values(hash1, hash2, key)
arr = []
hash1.each { |key| } && hash2.each { |key| }
if key == "name"
arr << hash1["name"] && arr << hash2["name"]
elsif key == "color"
arr << hash1["color"] && arr << hash2["color"]
end
return arr
end
I then looked at the 'official' solution:
def retrieve_values(hash1, hash2, key)
val1 = hash1[key]
val2 = hash2[key]
return [val1, val2]
end
What is wrong with my code? Or is it an acceptable "different" approach?
Line with hash1.each { |key| } && hash2.each { |key| } just does nothing it is not needed even in your solution.
This part a bit difficult to read arr << hash1["name"] && arr << hash2["name"]. It mutates the array two times in one line, this kind of style could lead to bugs.
Also, your code sticks only to two keys name and color:
dog1 = {"name"=>"Fido", "color"=>"brown", "age" => 1}
dog2 = {"name"=>"Spot", "color"=> "white", "age" => 2}
> retrieve_values(dog1, dog2, "age")
=> []
The official solution will return [1, 2].
You don't need here to explicitly use return keyword, any block of code returns the last evaluated expression. But it is a matter of style guide.
It is possible to simplify even the official solution:
def retrieve_values(hash1, hash2, key)
[hash1[key], hash2[key]]
end

Ruby merge duplicates in string

If I have a string like this
str =<<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
If a number in the first value shows up again, I want to add their second values together. So the final string would look like this
7312357006,1246.221
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
If the final output is an array that's fine too.
There are lots of ways to do this in Ruby. One particularly terse way is to use String#scan:
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
data = Hash.new(0)
str.scan(/(\d+),([\d.]+)/) {|k,v| data[k] += v.to_f }
p data
# => { "7312357006" => 1246.221,
# "3214058234" => 3499.2,
# "1324958723" => 232.1,
# "3214173443" => 234.1,
# "6134513494" => 23.2 }
This uses the regular expression /(\d+),([\d.]+)/ to extract the two values from each line. The block is called with each pair as arguments, which are then merged into the hash.
This could also be written as a single expression using each_with_object:
data = str.scan(/(\d+),([\d.]+)/)
.each_with_object(Hash.new(0)) {|(k,v), hsh| hsh[k] += v.to_f }
# => (same as above)
There are likewise many ways to print the result, but here are a couple I like:
puts data.map {|kv| kv.join(",") }.join("\n")
# => 7312357006,1246.221
# 3214058234,3499.2
# 1324958723,232.1
# 3214173443,234.1
# 6134513494,23.2
# or:
puts data.map {|k,v| "#{k},#{v}\n" }.join
# => (same as above)
You can see all of these in action on repl.it.
Edit: Although I don't recommend either of these for the sake of readability, here's more just for kicks (requires Ruby 2.4+):
data = str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.transform_values {|a| a.sum(&:to_f) }
...or, to going straight to a string:
puts str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.map {|k,vs| "#{k},#{vs.sum(&:to_f)}\n" }.join
Since repl.it is stuck on Ruby 2.3: Try it online!
You could achieve this using each_with_object, as below:
str = "7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1"
# convert the string into nested pairs of floats
# to briefly summarise the steps: split entries by newline, strip whitespace, split by comma, convert to floats
arr = str.split("\n").map(&:strip).map { |el| el.split(",").map(&:to_f) }
result = arr.each_with_object(Hash.new(0)) do |el, hash|
hash[el.first] += el.last
end
# => {7312357006.0=>1246.221, 3214058234.0=>3499.2, 1324958723.0=>232.1, 3214173443.0=>234.1, 6134513494.0=>23.2}
# You can then call `to_a` on result if you want:
result.to_a
# => [[7312357006.0, 1246.221], [3214058234.0, 3499.2], [1324958723.0, 232.1], [3214173443.0, 234.1], [6134513494.0, 23.2]]
each_with_object iterates through each pair of data, providing them with access to an accumulator (in this the hash). By following this approach, we can add each entry to the hash, and add together the totals if they appear more than once.
Hope that helps - let me know if you've any questions.
def combine(str)
str.each_line.with_object(Hash.new(0)) do |s,h|
k,v = s.split(',')
h.update(k=>v.to_f) { |k,o,n| o+n }
end.reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair }
end
puts combine str
7312357006,1246.22
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
Notes:
using String#each_line is preferable to str.split("\n") as the former returns an enumerator whereas the latter returns a temporary array. Each element generated by the enumerator is line of str that (unlike the elements of str.split("\n")) ends with a newline character, but that is of no concern.
see Hash::new, specifically when a default value (here 0) is used. If a hash has been defined h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero (h is not changed). When Ruby encounters the expression h[k] += 1, the first thing she does is expand it to h[k] = h[k] + 1. If h has been defined with a default value of zero, and h does not have a key k, h[k] on the right of the equality (syntactic sugar1 for h.[](k)) returns zero.
see Hash#update (aka merge!). h.update(k=>v.to_f) is syntactic sugar for h.update({ k=>v.to_f })
see Kernel#sprint for explanations of the formatting directives %s and %g.
the receiver for the expression reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair } (in the penultimate line), is the following hash.
{"7312357006"=>1246.221, "3214058234"=>3499.2, "1324958723"=>232.1,
"3214173443"=>234.1, "6134513494"=>23.2}
1 Syntactic sugar is a shortcut allowed by Ruby.
Implemented this solution as hash was giving me issues:
d = []
s.split("\n").each do |line|
x = 0
q = 0
dup = false
line.split(",").each do |data|
if x == 0 and d.include? data then dup = true ; q = d.index(data) elsif x == 0 then d << data end
if x == 1 and dup == false then d << data end
if x == 1 and dup == true then d[q+1] = "#{'%.2f' % (d[q+1].to_f + data.to_f).to_s}" end
if x == 2 and dup == false then d << data end
x += 1
end
end
x = 0
s = ""
d.each do |val|
if x == 0 then s << "#{val}," end
if x == 1 then s << "#{val}\n ; x = 0" end
x += 1
end
puts(s)

Merging Ranges using Sets - Error - Stack level too deep (SystemStackError)

I have a number of ranges that I want merge together if they overlap. The way I’m currently doing this is by using Sets.
This is working. However, when I attempt the same code with a larger ranges as follows, I get a `stack level too deep (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example
EDIT
I have put the full code I’m using here:
Basically it is used to add html tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
merged_hsps = merge_ranges(hsps)
sp = format_signalp(merged_hsps, signalp)
hsp_class = (merged_hsps - sp[1]) - sp[0]
rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten
end
def format_signalp(merged_hsps, sp)
sp_class = sp - merged_hsps
sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
[sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
results = []
results += sets_to_hash(sp[0], 'sp')
results += sets_to_hash(sp[1], 'sphsp')
results += sets_to_hash(hsp_class, 'hsp')
results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
return nil if set.nil?
hashes = []
merged_set = set.divide { |i, j| (i - j).abs == 1 }
merged_set.each do |s|
hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
end
hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
ranges = ranges.sort_by(&:begin)
t = ranges.each_cons(2).to_a
t.each do |r1, r2|
if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
(r1.cover? r2.begin) || (r1.cover? r2.end)
ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
ranges.delete(r1)
ranges.delete(r2)
t.delete [r1,r2]
end
end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
range, *rest = ranges
return if range.nil?
# Find the index of the first range in `rest` that overlaps this one
other_idx = rest.find_index do |other|
range.cover?(other.begin) || other.cover?(range.begin)
end
if other_idx
# An overlapping range was found; remove it from `rest` and merge
# it with this one
other = rest.slice!(other_idx)
merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
# Try again with the merged range and the remaining `rest`
merge_ranges(merged, *rest)
else
# No overlapping range was found; move on
[ range, *merge_ranges(*rest) ]
end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
I believe your resulting set has too many items (2881) to be used with divide, which if I understood correctly, would require 2881^2881 iterations, which is such a big number (8,7927981983090337174360463368808e+9966) that running it would take nearly forever even if you didn't get stack level too deep error.
Without using sets, you can use this code to merge the ranges:
module RangeMerger
def merge(range_b)
if cover?(range_b.first) && cover?(range_b.last)
self
elsif cover?(range_b.first)
self.class.new(first, range_b.last)
elsif cover?(range_b.last)
self.class.new(range_b.first, last)
else
nil # Unmergable
end
end
end
module ArrayRangePusher
def <<(item)
if item.kind_of?(Range)
item.extend RangeMerger
each_with_index do |own_item, idx|
own_item.extend RangeMerger
if new_range = own_item.merge(item)
self[idx] = new_range
return self
end
end
end
super
end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits which was about merging ranges.

String to array to multidimensional hash in ruby

I don't really know if the title is correct, but the question is quite simple:
I have a value and a key.
The key is as follows:
"one.two.three"
Now, how can I set this hash:
params['one']['two']['three'] = value
You can try to do it with this code:
keys = "one.two.three".split '.' # => ["one", "two", "three"]
params = {}; value = 1; i = 0; # i is an index of processed keys array element
keys.reduce(params) { |hash, key|
hash[key] = if (i += 1) == keys.length
value # assign value to the last key in keys array
else
hash[key] || {} # initialize hash if it is not initialized yet (won't loose already initialized hashes)
end
}
puts params # {"one"=>{"two"=>{"three"=>1}}}
Use recursion:
def make_hash(keys)
keys.empty? ? 1 : { keys.shift => make_hash(keys) }
end
puts make_hash("one.two.three".split '.')
# => {"one"=>{"two"=>{"three"=>1}}}
You can use the inject method:
key = "one.two.three"
value = 5
arr = key.split(".").reverse
arr[1..-1].inject({arr[0] => value}){ |memo, i| {i => memo} }
# => {"one"=>{"two"=>{"three"=>5}}}

Resources