Merge duplicates in array of hashes - ruby

I have an array of hashes in Ruby:
[
{name: 'one', tags: 'xxx'},
{name: 'two', tags: 'yyy'},
{name: 'one', tags: 'zzz'},
]
and I'm looking for a clean Ruby solution that merges all the duplicates in that array (by merging I mean concatenating the tags param), so that the above example is transformed to:
[
{name: 'one', tags: 'xxx, zzz'},
{name: 'two', tags: 'yyy'},
]
I can iterate through each array element, check if there is a duplicate, merge it with the original entry and delete the duplicate, but I feel there must be a better solution, and that this approach has caveats I don't know about. Thanks for any clue.

One way I can think of:
arr = [
{name: 'one', tags: 'xxx'},
{name: 'two', tags: 'yyy'},
{name: 'one', tags: 'zzz'},
]
merged_array_hash = arr.group_by { |h1| h1[:name] }.map do |k,v|
{ :name => k, :tags => v.map { |h2| h2[:tags] }.join(", ") }
end
merged_array_hash
# => [{:name=>"one", :tags=>"xxx, zzz"}, {:name=>"two", :tags=>"yyy"}]
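As a rough sketch, the same group_by idea can be wrapped in a method with a configurable separator. The method name merge_by_name and the separator keyword are illustrative assumptions, not part of the question; the input array is left unmodified.

```ruby
# Hypothetical helper: groups records by :name and joins the :tags of each group.
def merge_by_name(records, separator: ", ")
  records.group_by { |h| h[:name] }.map do |name, group|
    { name: name, tags: group.map { |h| h[:tags] }.join(separator) }
  end
end

arr = [
  { name: 'one', tags: 'xxx' },
  { name: 'two', tags: 'yyy' },
  { name: 'one', tags: 'zzz' }
]

merged = merge_by_name(arr)
# merged holds one hash per distinct :name, in order of first appearance.
```

Because group_by and map both build new objects, arr still contains the three original hashes afterwards.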

Here's a way that makes use of the form of Hash#update (aka Hash#merge!) that takes a block for determining the merged value of every key that is present in both of the hashes being merged.
Code
def combine(a)
a.each_with_object({}) { |g,h| h.update({ g[:name]=>g }) { |k,hv,gv|
{ name: k, tags: hv[:tags]+", "+gv[:tags] } } }.values
end
Example
a = [{name: 'one', tags: 'uuu'},
{name: 'two', tags: 'vvv'},
{name: 'one', tags: 'www'},
{name: 'six', tags: 'xxx'},
{name: 'one', tags: 'yyy'},
{name: 'two', tags: 'zzz'}]
combine(a)
#=> [{:name=>"one", :tags=>"uuu, www, yyy"},
# {:name=>"two", :tags=>"vvv, zzz" },
# {:name=>"six", :tags=>"xxx" }]
Explanation
Suppose
a = [{name: 'one', tags: 'uuu'},
{name: 'two', tags: 'vvv'},
{name: 'one', tags: 'www'}]
b = a.each_with_object({})
#=> #<Enumerator: [{:name=>"one", :tags=>"uuu"},
# {:name=>"two", :tags=>"vvv"},
# {:name=>"one", :tags=>"www"}]:each_with_object({})>
We can convert the enumerator b to an array to see what values it will pass into its block:
b.to_a
#=> [[{:name=>"one", :tags=>"uuu"}, {}],
# [{:name=>"two", :tags=>"vvv"}, {}],
# [{:name=>"one", :tags=>"www"}, {}]]
The first value passed to the block and assigned to the block variables is:
g,h = [{:name=>"one", :tags=>"uuu"}, {}]
g #=> {:name=>"one", :tags=>"uuu"}
h #=> {}
The first merge operation is now performed (the merged h is returned):
h.update({ g[:name] => g })
#=> h.update({ "one" => {:name=>"one", :tags=>"uuu"} })
#=> {"one"=>{:name=>"one", :tags=>"uuu"}}
h does not have the key "one", so update's block is not invoked.
Next, the enumerator b passes the following into the block:
g #=> {:name=>"two", :tags=>"vvv"}
h #=> {"one"=>{:name=>"one", :tags=>"uuu"}}
so we execute:
h.update({ g[:name] => g })
#=> h.update({ "two"=>{:name=>"two", :tags=>"vvv"} })
#=> {"one"=>{:name=>"one", :tags=>"uuu"},
# "two"=>{:name=>"two", :tags=>"vvv"}}
Again, h does not have the key "two", so the block is not used.
Lastly, each_with_object passes the final tuple into the block:
g #=> {:name=>"one", :tags=>"www"}
h #=> {"one"=>{:name=>"one", :tags=>"uuu"},
# "two"=>{:name=>"two", :tags=>"vvv"}}
and we execute:
h.update({ g[:name] => g })
#=> h.update({ "one"=>{:name=>"one", :tags=>"www"} })
h has a key/value pair with key "one":
"one"=>{:name=>"one", :tags=>"uuu"}
update's block is therefore executed to determine the merged value. The following values are passed to that block's variables:
k #=> "one"
hv #=> {:name=>"one", :tags=>"uuu"} <h's value for "one">
gv #=> {:name=>"one", :tags=>"www"} <g's value for "one">
and the block calculation creates this hash (as the merged value for the key "one"):
{ name: k, tags: hv[:tags]+", "+gv[:tags] }
#=> { name: "one", tags: "uuu" + ", " + "www" }
#=> { name: "one", tags: "uuu, www" }
So the merged hash now becomes:
h #=> {"one"=>{:name=>"one", :tags=>"uuu, www"},
# "two"=>{:name=>"two", :tags=>"vvv" }}
All that remains is to extract the values:
h.values
#=> [{:name=>"one", :tags=>"uuu, www"}, {:name=>"two", :tags=>"vvv"}]
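To check the walkthrough above, here is a small sketch that records each key for which update's block actually runs. The block_calls array is an assumption added purely for illustration.

```ruby
a = [{ name: 'one', tags: 'uuu' },
     { name: 'two', tags: 'vvv' },
     { name: 'one', tags: 'www' }]

block_calls = [] # illustrative: collects the keys for which update's block ran

result = a.each_with_object({}) do |g, h|
  h.update(g[:name] => g) do |k, hv, gv|
    block_calls << k
    { name: k, tags: hv[:tags] + ", " + gv[:tags] }
  end
end.values

# block_calls contains only "one": the block runs solely when the key
# is already present in h, which happens for the third element.
```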

Related

Extend an array of hash with values from an array

I have this array
types = ['first', 'second', 'third']
and this array of hashes
data = [{query: "A"}, {query: "B"}, {query:"C", type: 'first'}]
Now I have to "extend" each hash in data with each type, unless that combination already exists. All existing keys of the hash (e.g. :query) must be copied too.
So the final result must be:
results = [
{query: "A", type: 'first'}, {query: "A", type: "second"}, {query: "A", type: "third"},
{query: "B", type: 'first'}, {query: "B", type: "second"}, {query: "B", type: "third"},
{query: "C", type: 'first'}, {query: "C", type: "second"}, {query: "C", type: "third"}
]
The data array is quite big, so performance matters.
You can use Array#product to combine both arrays and Hash#merge to add the :type key:
data.product(types).map { |h, t| h.merge(type: t) }
#=> [
# {:query=>"A", :type=>"first"}, {:query=>"A", :type=>"second"}, {:query=>"A", :type=>"third"},
# {:query=>"B", :type=>"first"}, {:query=>"B", :type=>"second"}, {:query=>"B", :type=>"third"},
# {:query=>"C", :type=>"first"}, {:query=>"C", :type=>"second"}, {:query=>"C", :type=>"third"}
# ]
Note that the above will replace existing values for :type with the values from the types array. (there can only be one :type per hash)
If you need more complex logic, you can pass a block to merge which handles existing / conflicting keys, e.g.:
h = { query: 'C', type: 'first' }
t = 'third'
h.merge(type: t) { |h, v1, v2| v1 } # preserve existing value
#=> {:query=>"C", :type=>"first"}
h.merge(type: t) { |h, v1, v2| [v1, v2] } # put both values in an array
#=> {:query=>"C", :type=>["first", "third"]}
We see that each hash in data is mapped to an array of three hashes, and the resulting array of three arrays is then flattened. This suggests skipping a step by calling Enumerable#flat_map on data. The construct is as follows.
n = types.size
#=> 3
data.flat_map { |h| n.times.map { |i| ... } }
where ... produces a hash such as
{:query=>"A", :type=>"second"}
Next we see that the value of :type in the array of hashes returned equals "first", then "second", then "third", then "first" again, and so on. That is, the value cycles among the elements of types. Also, the fact that one of the hashes in data already has a key :type is irrelevant, as it will be overwritten. Therefore, for each value of i (0, 1 or 2) in map's block above, we wish to merge h with { type: types[i%n] }:
n = types.size
data.flat_map { |h| n.times.map { |i| h.merge(type: types[i%n]) } }
#=> [{:query=>"A", :type=>"first"}, {:query=>"A", :type=>"second"},
# {:query=>"A", :type=>"third"},
# {:query=>"B", :type=>"first"}, {:query=>"B", :type=>"second"},
# {:query=>"B", :type=>"third"},
# {:query=>"C", :type=>"first"}, {:query=>"C", :type=>"second"},
# {:query=>"C", :type=>"third"}]
We may alternatively make use of the method Array#cycle.
enum = types.cycle
#=> #<Enumerator: ["first", "second", "third"]:cycle>
As the name of the method suggests,
enum.next
#=> "first"
enum.next
#=> "second"
enum.next
#=> "third"
enum.next
#=> "first"
enum.next
#=> "second"
...
ad infinitum. Before continuing let me reset the enumerator.
enum.rewind
See Enumerator#next and Enumerator#rewind.
n = types.size
data.flat_map { |h| n.times.map { h.merge(type: enum.next) } }
#=> <as above>
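A possibly simpler variant, offered as a sketch rather than as the approach of either answer above: map over types directly inside flat_map, which avoids both the index arithmetic and the stateful external enumerator.

```ruby
types = ['first', 'second', 'third']
data  = [{ query: "A" }, { query: "B" }, { query: "C", type: 'first' }]

# For each hash, build one copy per type; flat_map concatenates the groups.
results = data.flat_map { |h| types.map { |t| h.merge(type: t) } }

# Hash#merge returns a new hash, so the elements of data are not modified.
```

Unlike the cycle version, this does not depend on the enumerator having been rewound before the call.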

Merge Ruby Hash values with same key

Is it possible to merge only selected keys? For example:
h = [
{a: 1, b: "Hello", c: "Test1"},
{a: 2, b: "Hey", c: "Test1"},
{a: 3, b: "Hi", c: "Test2"}
]
Expected Output
[
{a: 1, b: "Hello, Hey", c: "Test1"}, # See here, I don't want key 'a' to be merged
{a: 3, b: "Hi", c: "Test2"}
]
My Try
g = h.group_by{|k| k[:c]}.values
OUTPUT =>
[
[
{:a=>1, :b=>"Hello", :c=>"Test1"},
{:a=>2, :b=>"Hey", :c=>"Test1"}
], [
{:a=>3, :b=>"Hi", :c=>"Test2"}
]
]
g.each do |v|
if v.length > 1
c = v.reduce({}) do |s, l|
s.merge(l) { |_, a, b| [a, b].uniq.join(", ") }
end
end
p c #{:a=>"1, 2", :b=>"Hello, Hey", :c=>"Test1"}
end
So, the output I get is
{:a=>"1, 2", :b=>"Hello, Hey", :c=>"Test1"}
But, I needed
{a: 1, b: "Hello, Hey", c: "Test1"}
NOTE: This is just a test array of hashes I made up to ask my question. The actual hashes have a lot of keys, so please don't reply with key-comparison answers.
I need a less complex solution.
I can't see a simpler version of your code. To make it fully work, you can use the first argument in the merge block instead of dismissing it to differentiate when you need to merge a and b or when you just use a. Your line becomes:
s.merge(l) { |key, a, b| key == :a ? a : [a, b].uniq.join(", ") }
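Put together, the corrected line might be used as in the sketch below. The keep_first list is a hypothetical name introduced here so that more than one key can be exempted from joining; it is not from the question.

```ruby
h = [
  { a: 1, b: "Hello", c: "Test1" },
  { a: 2, b: "Hey",   c: "Test1" },
  { a: 3, b: "Hi",    c: "Test2" }
]

keep_first = [:a] # hypothetical: keys whose first value is kept instead of joined

merged = h.group_by { |row| row[:c] }.values.map do |group|
  # reduce without an initial value returns the sole element of a one-hash group.
  group.reduce do |acc, row|
    acc.merge(row) do |key, old_val, new_val|
      keep_first.include?(key) ? old_val : [old_val, new_val].uniq.join(", ")
    end
  end
end
```

The uniq call is what keeps :c as "Test1" rather than "Test1, Test1" when both hashes carry the same value.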
Maybe you can consider this option, but I don't know if it is less complex:
h.group_by { |h| h[:c] }.values.map { |tmp| tmp[0].merge(*tmp[1..]) { |key, oldval, newval| key == :b ? [oldval, newval].join(' ') : oldval } }
#=> [{:a=>1, :b=>"Hello Hey", :c=>"Test1"}, {:a=>3, :b=>"Hi", :c=>"Test2"}]
The first part groups the hashes by :c
h.group_by { |h| h[:c] }.values #=> [[{:a=>1, :b=>"Hello", :c=>"Test1"}, {:a=>2, :b=>"Hey", :c=>"Test1"}], [{:a=>3, :b=>"Hi", :c=>"Test2"}]]
Then it maps to merge the first elements with others using Hash#merge
h.each_with_object({}) do |g,h|
h.update(g[:c]=>g) { |_,o,n| o.merge(b: "#{o[:b]}, #{n[:b]}") }
end.values
#=> [{:a=>1, :b=>"Hello, Hey", :c=>"Test1"},
# {:a=>3, :b=>"Hi", :c=>"Test2"}]
This uses the form of Hash#update that employs a block (here { |_,o,n| o.merge(b: "#{o[:b]}, #{n[:b]}") }) to determine the values of keys that are present in both hashes being merged. The first block variable holds the common key. I’ve used an underscore for that variable mainly to signal to the reader that it is not used in the block calculation. See the doc for definitions of the other two block variables.
Note that the receiver of values equals the following.
h.each_with_object({}) do |g,h|
h.update(g[:c]=>g) { |_,o,n| o.merge(b: "#{o[:b]}, #{n[:b]}") }
end
#=> { "Test1"=>{:a=>1, :b=>"Hello, Hey", :c=>"Test1"},
# "Test2"=>{:a=>3, :b=>"Hi", :c=>"Test2"} }

What is best way to create an array of hashes from an array?

Given this array:
array = ['one', 'two']
what is the best way to turn that into something like the following?
[{value: 'one', label: 'one'}, {value: 'two', label: 'two'}]
Use Array#map, which iterates over your collection and returns an array. In your case, just return the hash directly:
array.map { |a| {value: a, label: a} }
# => [{:value=>"one", :label=>"one"}, {:value=>"two", :label=>"two"}]
The best way is Array#map, but as an alternative you can also look at Enumerable#each_with_object:
array = ['one', 'two']
array.each_with_object([]) { |e, a| a << {value: e, label: e} }
#=> [{:value=>"one", :label=>"one"}, {:value=>"two", :label=>"two"}]

Ruby chain two 'group_by' methods

I have an array of objects that looks like this:
[
{day: 'Monday', class: 1, name: 'X'},
{day: 'Monday', class: 2, name: 'Y'},
{day: 'Tuesday', class: 1, name: 'Z'},
{day: 'Monday', class: 1, name: 'T'}
]
I want to group them by days, and then by classes i.e.
groupedArray['Monday'] => {'1' => [{name: 'X'}, {name: 'T'}], '2' => [{name: 'Y'}]}
I've seen there is a
group_by { |a| [a.day, a.class]}
But this creates a hash with a [day, class] key.
Is there a way I can achieve this, without having to group them first by day, and then iterate through each day, and group them by class, then pushing them into a new hash?
arr = [
{day: 'Monday', class: 1, name: 'X'},
{day: 'Monday', class: 2, name: 'Y'},
{day: 'Tuesday', class: 1, name: 'Z'},
{day: 'Monday', class: 1, name: 'T'}
]
One way of obtaining the desired hash is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. Here that is done twice, first when values of :day are the same, then for each such occurrence, when the values of :class are the same (for a given value of :day).
arr.each_with_object({}) { |g,h|
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] }) { |_,h1,h2|
h1.update(h2) { |_,p,q| p+q } } }
#=> {"Monday" =>{"1"=>[{:name=>"X"}, {:name=>"T"}], "2"=>[{:name=>"Y"}]},
# "Tuesday"=>{"1"=>[{:name=>"Z"}]}}
The steps are as follows.
enum = arr.each_with_object({})
#=> #<Enumerator: [{:day=>"Monday", :class=>1, :name=>"X"},
# {:day=>"Monday", :class=>2, :name=>"Y"},
# {:day=>"Tuesday", :class=>1, :name=>"Z"},
# {:day=>"Monday", :class=>1, :name=>"T"}]:each_with_object({})>
We can see the values that will be generated by this enumerator by converting it to an array:
enum.to_a
#=> [[{:day=>"Monday", :class=>1, :name=>"X"}, {}],
# [{:day=>"Monday", :class=>2, :name=>"Y"}, {}],
# [{:day=>"Tuesday", :class=>1, :name=>"Z"}, {}],
# [{:day=>"Monday", :class=>1, :name=>"T"}, {}]]
The empty hash in each array is the hash being built and returned. It is initially empty, but will be partially formed as each element of enum is processed.
The first element of enum is passed to the block (by Enumerator#each) and the block variables are assigned using parallel assignment (sometimes called multiple assignment):
g,h = enum.next
#=> [{:day=>"Monday", :class=>1, :name=>"X"}, {}]
g #=> {:day=>"Monday", :class=>1, :name=>"X"}
h #=> {}
We now perform the block calculation:
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
#=> {}.update("Monday"=>{ "1"=>[{name: "X"}] })
#=> {"Monday"=>{"1"=>[{:name=>"X"}]}}
This operation returns the updated value of h, the hash being constructed.
Note that update's argument
"Monday"=>{ "1"=>[{name: "X"}] }
is shorthand for
{ "Monday"=>{ "1"=>[{name: "X"}] } }
Because the key "Monday" was not present in both hashes being merged (h had no keys), the block
{ |_,h1,h2| h1.update(h2) { |_,p,q| p+q } }
was not used to determine the value of "Monday".
Now the next value of enum is passed to the block and the block variables are assigned:
g,h = enum.next
#=> [{:day=>"Monday", :class=>2, :name=>"Y"},
# {"Monday"=>{"1"=>[{:name=>"X"}]}}]
g #=> {:day=>"Monday", :class=>2, :name=>"Y"}
h #=> {"Monday"=>{"1"=>[{:name=>"X"}]}}
Note that h was updated. We now perform the block calculation:
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
# {"Monday"=>{"1"=>[{:name=>"X"}]}}.update("Monday"=>{ "2"=>[{name: "Y"}] })
Both hashes being merged share the key "Monday". We therefore must use the block to determine the merged value of "Monday":
{ |k,h1,h2| h1.update(h2) { |m,p,q| p+q } }
#=> {"1"=>[{:name=>"X"}]}.update("2"=>[{name: "Y"}])
#=> {"1"=>[{:name=>"X"}], "2"=>[{:name=>"Y"}]}
See the doc for update for an explanation of the block variables k, h1 and h2 for the outer update and m, p and q for the inner update. k and m are the values of the common key. As they are not used in the block calculations, I have replaced them with underscores, which is common practice.
So now:
h #=> { "Monday" => { "1"=>[{ :name=>"X" }], "2"=>[{ :name=>"Y"}] } }
Prior to this operation the hash h["Monday"] did not yet have a key "2", so the second update did not require use of the block
{ |_,p,q| p+q }
This block is used, however, when the last element of enum is merged into h, since the values of both :day and :class are the same for the two hashes being merged.
The remaining calculations are similar.
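For comparison, here is a sketch of the two-level grouping the question hoped to avoid, written with group_by and Hash#transform_values (available since Ruby 2.4). It is an alternative to the update-based answer, not a restatement of it, and it builds the same nested hash.

```ruby
arr = [
  { day: 'Monday',  class: 1, name: 'X' },
  { day: 'Monday',  class: 2, name: 'Y' },
  { day: 'Tuesday', class: 1, name: 'Z' },
  { day: 'Monday',  class: 1, name: 'T' }
]

grouped = arr.group_by { |h| h[:day] }.transform_values do |same_day|
  # Second level: group the day's records by class (as a string key),
  # then keep only the :name of each record.
  same_day.group_by { |h| h[:class].to_s }.transform_values do |same_class|
    same_class.map { |h| { name: h[:name] } }
  end
end
```

With this, grouped['Monday'] gives the per-class hash the question asks for.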

Removing hashes that have identical values for particular keys

I have an Array of Hashes with the same keys, storing people's data.
I want to remove the hashes that have the same values for the keys :name and :surname. The rest of the values can differ, so calling uniq! on array won't work.
Is there a simple solution for this?
You can pass a block to uniq or uniq!, the value returned by the block is used to compare two entries for equality:
irb> people = [{name: 'foo', surname: 'bar', age: 10},
{name: 'foo', surname: 'bar', age: 11}]
irb> people.uniq { |p| [p[:name], p[:surname]] }
=> [{:name=>"foo", :surname=>"bar", :age=>10}]
arr=[{name: 'john', surname: 'smith', phone:123456789},
{name: 'thomas', surname: 'hardy', phone: 671234992},
{name: 'john', surname: 'smith', phone: 666777888}]
# [{:name=>"john", :surname=>"smith", :phone=>123456789},
# {:name=>"thomas", :surname=>"hardy", :phone=>671234992},
# {:name=>"john", :surname=>"smith", :phone=>666777888}]
arr.uniq {|h| [h[:name], h[:surname]]}
# [{:name=>"john", :surname=>"smith", :phone=>123456789},
# {:name=>"thomas", :surname=>"hardy", :phone=>671234992}]
unique_people = {}
person_array.each do |person|
unique_people["#{person[:name]} #{person[:surname]}"] = person
end
array_of_unique_people = unique_people.values
This should do the trick.
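One difference between the approaches above is worth seeing side by side: uniq with a block keeps the first record of each name/surname pair, whereas overwriting entries in a hash keeps the last. The sketch below uses an array as the hash key instead of an interpolated string; that is a change made here (an assumption, not the answer's code) so that names containing spaces cannot collide.

```ruby
people = [
  { name: 'john',   surname: 'smith', phone: 123456789 },
  { name: 'thomas', surname: 'hardy', phone: 671234992 },
  { name: 'john',   surname: 'smith', phone: 666777888 }
]

# uniq keeps the first record seen for each [name, surname] pair.
first_wins = people.uniq { |person| [person[:name], person[:surname]] }

# Assigning into a hash overwrites earlier records, so the last one survives.
last_wins = people.each_with_object({}) do |person, seen|
  seen[[person[:name], person[:surname]]] = person
end.values
```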
a.delete_if do |h|
a.select{|i| i[:name] == h[:name] and i[:surname] == h[:surname] }.count > 1
end
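If the intent is to drop every record whose name/surname pair occurs more than once (an assumption about what the snippet above is meant to do), a sketch that counts the pairs once and leaves the original array untouched could look like this. The variable names are illustrative.

```ruby
a = [
  { name: 'john',   surname: 'smith', phone: 123456789 },
  { name: 'thomas', surname: 'hardy', phone: 671234992 },
  { name: 'john',   surname: 'smith', phone: 666777888 }
]

# Count how often each [name, surname] pair occurs (single pass).
counts = Hash.new(0)
a.each { |h| counts[[h[:name], h[:surname]]] += 1 }

# Keep only the records whose pair is unique; reject returns a new array.
singles = a.reject { |h| counts[[h[:name], h[:surname]]] > 1 }
```

This avoids rescanning the whole array for each element and does not modify a while iterating over it.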
