Merge duplicate values in json using ruby - ruby

I have the following item.json file
{
"items": [
{
"brand": "LEGO",
"stock": 55,
"full-price": "22.99"
},
{
"brand": "Nano Blocks",
"stock": 12,
"full-price": "49.99"
},
{
"brand": "LEGO",
"stock": 5,
"full-price": "199.99"
}
]
}
There are two items with the brand LEGO, and I want to output the total stock for each brand.
In the Ruby file item.rb I have this code:
require 'json'
path = File.join(File.dirname(__FILE__), '../data/products.json')
file = File.read(path)
products_hash = JSON.parse(file)
products_hash["items"].each do |brand|
puts "Stock no: #{brand["stock"]}"
end
This prints the stock number for each item individually, whereas I need the stock for the two "LEGO" items to be summed and displayed once.
Does anyone have a solution for this?

json = File.open(path,'r:utf-8',&:read) # in case the JSON uses UTF-8
items = JSON.parse(json)['items']
stock_by_brand = items
.group_by{ |h| h['brand'] }
.map do |brand,array|
[ brand,
array
.map{ |item| item['stock'] }
.inject(:+) ]
end
.to_h
#=> {"LEGO"=>60, "Nano Blocks"=>12}
It works like this:
Enumerable#group_by takes the array of items and creates a hash mapping the brand name to an array of all item hashes with that brand
Enumerable#map turns each brand/array pair in that hash into an array of the brand (unchanged) followed by:
Enumerable#map on the array of items picks out just the "stock" counts, and then
Enumerable#inject sums them all together
Array#to_h then turns that array of two-value arrays into a hash, mapping the brand to the sum of stock values.
If you want simpler code that's less functional and possibly easier to understand:
stock_by_brand = {} # an empty hash
items.each do |item|
stock_by_brand[ item['brand'] ] ||= 0 # initialize to zero if unset
stock_by_brand[ item['brand'] ] += item['stock']
end
p stock_by_brand #=> {"LEGO"=>60, "Nano Blocks"=>12}
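On Ruby 2.4 or later, the grouping approach shown earlier in this answer can probably be condensed with Hash#transform_values and Enumerable#sum. This is a minimal sketch, assuming the parsed items array from the question; the method name stock_by_brand and the inline JSON string are only illustrative:

```ruby
require 'json'

# Hypothetical helper: sums the "stock" values of items sharing a "brand".
def stock_by_brand(items)
  items
    .group_by { |item| item['brand'] }                                # brand => [item, ...]
    .transform_values { |group| group.sum { |item| item['stock'] } }  # brand => total stock
end

# Illustrative input, shortened from the question's file.
json = '{"items":[{"brand":"LEGO","stock":55},' \
       '{"brand":"Nano Blocks","stock":12},{"brand":"LEGO","stock":5}]}'
stock_by_brand(JSON.parse(json)['items'])
#=> {"LEGO"=>60, "Nano Blocks"=>12}
```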

To see what your JSON string looks like, let's create it from your hash, which I've denoted h:
require 'json'
j = JSON.generate(h)
#=> "{\"items\":[{\"brand\":\"LEGO\",\"stock\":55,\"full-price\":\"22.99\"},{\"brand\":\"Nano Blocks\",\"stock\":12,\"full-price\":\"49.99\"},{\"brand\":\"LEGO\",\"stock\":5,\"full-price\":\"199.99\"}]}"
After reading that from a file, into the variable j, we can now parse it to obtain the value of "items":
arr = JSON.parse(j)["items"]
#=> [{"brand"=>"LEGO", "stock"=>55, "full-price"=>"22.99"},
# {"brand"=>"Nano Blocks", "stock"=>12, "full-price"=>"49.99"},
# {"brand"=>"LEGO", "stock"=>5, "full-price"=>"199.99"}]
One way to obtain the desired tallies is to use a counting hash:
arr.each_with_object(Hash.new(0)) {|g,h| h.update(g["brand"]=>h[g["brand"]]+g["stock"])}
#=> {"LEGO"=>60, "Nano Blocks"=>12}
Hash.new(0) creates an empty hash (represented by the block variable h) with a default value of zero. That means that h[k] returns zero if the hash does not have a key k. (Note 1 below explains the k=>v argument syntax passed to update.)
For the first element of arr (represented by the block variable g) we have:
g["brand"] #=> "LEGO"
g["stock"] #=> 55
Within the block, therefore, the calculation is:
g["brand"] => h[g["brand"]]+g["stock"]
#=> "LEGO" => h["LEGO"] + 55
Initially h has no keys, so h["LEGO"] returns the default value of zero, resulting in { "LEGO"=>55 } being merged into the hash h. As h now has a key "LEGO", h["LEGO"] will not return the default value in subsequent calculations.
Another approach is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged:
arr.each_with_object({}) {|g,h| h.update(g["brand"]=>g["stock"]) {|_,o,n| o+n}}
#=> {"LEGO"=>60, "Nano Blocks"=>12}
1 k=>v is shorthand for { k=>v } when it appears as a method's argument.
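The default-value behaviour described above can be checked in isolation. The following is only a sketch; the helper name tally_stock is hypothetical and not part of the answer:

```ruby
# Hypothetical helper: tallies stock per brand using a zero-default hash.
def tally_stock(items)
  totals = Hash.new(0)  # reading a missing key returns 0
  items.each { |item| totals[item['brand']] += item['stock'] }
  totals
end

h = Hash.new(0)
h['LEGO']        #=> 0 (the default; reading does not store the key)
h.key?('LEGO')   #=> false
h['LEGO'] += 55  # same as h['LEGO'] = h['LEGO'] + 55
h                #=> {"LEGO"=>55}
```

Reading a missing key does not add it to the hash; only the assignment does.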

Related

Merging Three hashes and getting this resultant hash

I have read the xls and have formed these three hashes
hash1=[{'name'=>'Firstname',
'Locator'=>'id=xxx',
'Action'=>'TypeAndWait'},
{'name'=>'Password',
'Locator'=>'id=yyy',
'Action'=>'TypeAndTab'}]
Second Hash
hash2=[{'Test Name'=>'Example',
'TestNumber'=>'Test1'},
{'Test Name'=>'Example',
'TestNumber'=>'Test2'}]
My third hash
hash3=[{'name'=>'Firstname',
'Test1'=>'four',
'Test2'=>'Five',
'Test3'=>'Six'},
{'name'=>'Password',
'Test1'=>'Vicky',
'Test2'=>'Sujin',
'Test3'=>'Sivaram'}]
Now my resultant hash is
result={"Example"=>
{"Test1"=>
{'Firstname'=>
["id=xxx","four", "TypeAndWait"],
'Password'=>
["id=yyy","Vicky", "TypeAndTab"]},
"Test2"=>
{'Firstname'=>
["id=xxx","Five", "TypeAndWait"],
'Password'=>
["id=yyy","Sujin", "TypeAndTab"]}}}
I have gotten this result, but it took 60 lines of code. I don't think such a long program should be necessary in Ruby, and I strongly believe there is an easier way to achieve this. Can someone help me?
The second hash determines which test cases have to be read. For example, Test3 is not present in the second hash, so the resultant hash doesn't have Test3.
We are given three arrays, which I've renamed arr1, arr2 and arr3. (hash1, hash2 and hash3 are not especially good names for arrays. :-))
arr1 = [{'name'=>'Firstname', 'Locator'=>'id=xxx', 'Action'=>'TypeAndWait'},
{'name'=>'Password', 'Locator'=>'id=yyy', 'Action'=>'TypeAndTab'}]
arr2 = [{'Test Name'=>'Example', 'TestNumber'=>'Test1'},
{'Test Name'=>'Example', 'TestNumber'=>'Test2'}]
arr3=[{'name'=>'Firstname', 'Test1'=>'four', 'Test2'=>'Five', 'Test3'=>'Six'},
{'name'=>'Password', 'Test1'=>'Vicky', 'Test2'=>'Sujin', 'Test3'=>'Sivaram'}]
The drivers are the values "Test1" and "Test2" in the hashes that are elements of arr2. Nothing else in that array is needed, so let's extract those values (of which there could be any number, but here there are just two).
a2 = arr2.map { |h| h['TestNumber'] }
#=> ["Test1", "Test2"]
Next we need to rearrange the information in arr3 by creating a hash whose keys are the elements of a2.
h3 = a2.each_with_object({}) { |test,h|
h[test] = arr3.each_with_object({}) { |f,g| g[f['name']] = f[test] } }
#=> {"Test1"=>{"Firstname"=>"four", "Password"=>"Vicky"},
# "Test2"=>{"Firstname"=>"Five", "Password"=>"Sujin"}}
Next we need to rearrange the content of arr1 by creating a hash whose keys match the keys of values of h3.
h1 = arr1.each_with_object({}) { |g,h| h[g['name']] = g.reject { |k,_| k == 'name' } }
#=> {"Firstname"=>{"Locator"=>"id=xxx", "Action"=>"TypeAndWait"},
# "Password"=>{"Locator"=>"id=yyy", "Action"=>"TypeAndTab"}}
It is now a simple matter of extracting information from these three objects.
{ 'Example'=>
a2.each_with_object({}) do |test,h|
h[test] = h3[test].each_with_object({}) do |(k,v),g|
f = h1[k]
g[k] = [f['Locator'], v, f['Action']]
end
end
}
#=> {"Example"=>
# {"Test1"=>{"Firstname"=>["id=xxx", "four", "TypeAndWait"],
# "Password"=>["id=yyy", "Vicky", "TypeAndTab"]},
# "Test2"=>{"Firstname"=>["id=xxx", "Five", "TypeAndWait"],
# "Password"=>["id=yyy", "Sujin", "TypeAndTab"]}}}
What you call hash{1,2,3} are arrays in the first place. Also, I am pretty sure you have mistyped hash1#Locator and/or hash3#name. The code below works for this exact data, but it should not be hard to update it to reflect any changes.
hash2.
map(&:values).
group_by(&:shift).
map do |k, v|
[k, v.flatten.map do |k, v|
[k, hash3.map do |h3|
# lookup a hash from hash1
h1 = hash1.find do |h1|
h3['name'].start_with?(h1['Locator'])
end
# can it be nil btw?
[
h1['name'],
[
h3['name'][/.*(?=-id)/],
h3[k],
h1['Action']
]
]
end.to_h]
end.to_h]
end.to_h

Creating a hash where the keys are strings, values are numbers

I have an array:
["Melanie", "149", "Joe", "2", "16", "216", "Sarah"]
I want to create a hash:
{"Melanie"=>[149], "Joe"=>[2, 16, 216] "Sarah"=>nil}
How would I accomplish this when the keys and values are in the same array?
All values would be integers (although they are in string form in the array.) All keys start and end with a letter.
Your expected hash is invalid. Therefore, it is impossible to get what you wrote that you want.
From your question, it looks reasonable to expect the values to be arrays. In that case, you can do it like this:
["Melanie", "149", "Joe", "2", "16", "216", "Sarah"]
.slice_before(/[a-z]/i).map{|k, *v| [k, v.map(&:to_i)]}.to_h
# => {"Melanie"=>[149], "Joe"=>[2, 16, 216], "Sarah"=>[]}
With a little modification, you can let the value be a number instead of an array when the array length is one, but that is not a good design; it would introduce special cases.
Try this
def numeric?(x)
x.chars.all? { |y| ('0'..'9').include?(y) }
end
array = ["Melanie", "149", "Joe", "2", "16", "216", "Sarah"]
keys = array.select { |x| not numeric?(x) }
map = {}
keys.each do |k|
from = array.index(k) + 1
to = array.index( keys[keys.index(k) + 1] )
map[k] = to ? array[from...to] : array[from..from]
end
p map
Output:
{"Melanie"=>["149"], "Joe"=>["2", "16", "216"], "Sarah"=>[]}
Here's another way:
arr = ["Melanie", "149", "Joe", "2", "16", "216", "Sarah"]
class String
def integer?
!!(self =~ /^-?\d+$/)
end
end
Hash[*arr.each_with_object([]) { |s,a| s.integer? ? a[-1] << s.to_i : a<<s<<[] }].
tap { |h| h.each_key { |k| h[k] = nil if h[k].empty? } }
#=> {"Melanie"=>[149], "Joe"=>[2, 16, 216], "Sarah"=>nil}
There are three components to your question, and I will try to answer them separately.
Regarding storing a multi-valued mapping, while there are specialized solutions available, the most common recommendation is just to store a hash whose values are arrays. That is, for your use case, your primary data structure is a hash whose keys are strings and whose values are arrays of integers. Depending on your desired behavior for duplicates and the like, you may wish to substitute a different data structure for the values, possibly a set.
Regarding identifying strings containing numbers and strings not containing numbers, well, that depends on exactly what your non-number-containing strings could instead contain, but a good starting point would be to perform a regular expression match for digits. You didn't specify whether your allowable numeric strings represented integers, floating points, etc. The particular answer to that may affect your overall strategy. Unfortunately, input parsing and validation is a complex and messy topic in the general case.
Regarding the actual conversion process, I would recommend the following strategy. Iterate through your input array. Check each string for whether it is numeric or non-numeric. If it is non-numeric, store that as the current key in a local. Also, in your hash, create a mapping from that key to a new empty array. If, instead, the string is numeric, convert it into a number, and add it to the array under the appropriate key.
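The strategy described above might be sketched as follows. The method name group_numbers_by_name is hypothetical, and the sketch assumes that a numeric string is an optional minus sign followed only by digits; it leaves an empty array for keys that have no numbers:

```ruby
# Hypothetical helper following the strategy above.
# Assumption: numeric strings look like "149" or "-3" (integers only).
def group_numbers_by_name(strings)
  current_key = nil
  strings.each_with_object({}) do |s, result|
    if s.match?(/\A-?\d+\z/)
      # Numeric: append to the array of the most recent key.
      raise ArgumentError, "number #{s} appears before any key" if current_key.nil?
      result[current_key] << s.to_i
    else
      # Non-numeric: remember it as the current key and start an empty array.
      current_key = s
      result[current_key] = []
    end
  end
end

group_numbers_by_name(["Melanie", "149", "Joe", "2", "16", "216", "Sarah"])
#=> {"Melanie"=>[149], "Joe"=>[2, 16, 216], "Sarah"=>[]}
```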
I don't know if there's a pretty way to do it. I'd do something like this:
def numeric?(string)
# `!!` converts parsed number to `true`
!!Kernel.Float(string)
rescue TypeError, ArgumentError
false
end
def my_method(input_array)
# associate values with proper key and stores result in output
curr_key = nil
output = {}
input_array.each do |e|
if !numeric?(e)
output[e] = []
curr_key = e
else
# use Float if values may be floating-point
output[curr_key] << Integer(e, 10)
end
end
output.each do |k, v|
output[k] = v.empty? ? nil : v
end
output
end
Source for numeric method.

Sorting an array of hashes by a date field

I have an object with many arrays of hashes, one of which I want to sort by a value in the 'date' key.
@array['info'][0] = {"name"=>"personA", "date"=>"23/09/1980"}
@array['info'][1] = {"name"=>"personB", "date"=>"01/04/1970"}
@array['info'][2] = {"name"=>"personC", "date"=>"03/04/1975"}
I have tried various methods using Date.parse and collect but am unable to find a good solution.
Edit:
To be clear I want to sort the original array in place
@array['info'].sort_by { |i| Date.parse i['date'] }.collect
How might one solve this elegantly, the 'Ruby-ist' way? Thanks.
Another way, which doesn't require converting the date strings to date objects, is the following.
Code
def sort_by_date(arr)
arr.sort_by { |h| h["date"].split('/').reverse }
end
If arr is to be sorted in place, use Array#sort_by! rather than Enumerable#sort_by.
Example
arr = [{ "name"=>"personA", "date"=>"23/09/1980" },
{ "name"=>"personB", "date"=>"01/04/1970" },
{ "name"=>"personC", "date"=>"03/04/1975" }]
sort_by_date(arr)
#=> [{ "name"=>"personB", "date"=>"01/04/1970" },
# { "name"=>"personC", "date"=>"03/04/1975" },
# { "name"=>"personA", "date"=>"23/09/1980" }]
Explanation
For arr in the example, sort_by passes the first element of arr into its block and assigns it to the block variable:
h = { "name"=>"personA", "date"=>"23/09/1980" }
then computes:
a = h["date"].split('/')
#=> ["23", "09", "1980"]
and then:
b = a.reverse
#=> ["1980", "09", "23"]
Similarly, we obtain b equal to:
["1970", "04", "01"]
and
["1975", "04", "03"]
for each of the other two elements of arr.
If you look at the docs for Array#<=> you will see that these three arrays are ordered as follows:
["1970", "04", "01"] < ["1975", "04", "03"] < ["1980", "09", "23"]
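There is no need to convert the string elements to integers, because the day, month and year strings here are zero-padded to a fixed width, so string order matches numeric order.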
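The element-by-element comparison described above, and the reason zero-padding matters, can be checked directly. This is only a sketch; sort_by_date! is a hypothetical in-place variant of the sort_by_date method above:

```ruby
# Arrays compare element by element; strings compare character by character.
(["1970", "04", "01"] <=> ["1975", "04", "03"])  #=> -1
(["1975", "04", "03"] <=> ["1980", "09", "23"])  #=> -1

# String comparison only matches numeric order when components are zero-padded:
("9" <=> "10")   #=> 1  (unpadded, "9" sorts after "10")
("09" <=> "10")  #=> -1

# Hypothetical in-place variant, using Array#sort_by!.
def sort_by_date!(arr)
  arr.sort_by! { |h| h["date"].split('/').reverse }
end
```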
Looks fine overall. Although you can drop the collect call since it's not needed and use sort_by! to modify the array in-place (instead of reassigning):
@array['info'].sort_by! { |x| Date.parse x['date'] }
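Note that Date.parse needs require 'date' and guesses the format of the string. If the dates are known to be dd/mm/yyyy, stating the format with Date.strptime may be safer. A minimal sketch, in which the method name sort_by_parsed_date! and the info variable are assumptions for illustration:

```ruby
require 'date'

# Hypothetical helper: sorts the records in place by their "date" field,
# parsing each value explicitly as day/month/year.
def sort_by_parsed_date!(records)
  records.sort_by! { |r| Date.strptime(r['date'], '%d/%m/%Y') }
end

info = [{ "name" => "personA", "date" => "23/09/1980" },
        { "name" => "personB", "date" => "01/04/1970" },
        { "name" => "personC", "date" => "03/04/1975" }]
sort_by_parsed_date!(info)
info.map { |r| r['name'] }  #=> ["personB", "personC", "personA"]
```

Unlike the string-splitting approach, Date.strptime raises an error on a string that does not match the format, which may or may not be what you want.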

Search one hash for its keys, grab the values, put them in an array and merge the first hash with the second

I have a hash with items like
{'people'=>'50'},
{'chairs'=>'23'},
{'footballs'=>'5'},
{'crayons'=>'1'},
and I have another hash with
{'people'=>'http://www.thing.com/this-post'},
{'footballs'=>'http://www.thing.com/that-post'},
{'people'=>'http://www.thing.com/nice-post'},
{'footballs'=>'http://www.thing.com/other-post'},
{'people'=>'http://www.thing.com/thingy-post'},
{'footballs'=>'http://www.thing.com/the-post'},
{'people'=>'http://www.thing.com/the-post'},
{'crayons'=>'http://www.thing.com/the-blah'},
{'chairs'=>'http://www.thing.com/the-page'},
and I want something like the following, which takes the first hash, looks through the second one, grabs all the links for each word and puts them into an array appended onto the end of the hash somehow.
{'people', '50' => {'http://www.thing.com/this-post', 'http://www.thing.com/nice-post', 'http://www.thing.com/thingy-post'}},
{'footballs', '5' => {'http://www.thing.com/the-post', 'http://www.thing.com/the-post'}},
{'crayons', '1' => {'http://www.thing.com/the-blah'}},
{'chairs', '23' => {'chairs'=>'http://www.thing.com/the-page'}},
I am very new to Ruby, and I have tried quite a few combinations, and I need some help.
Excuse the example, I hope that it makes sense.
What you have is a mix of hashes, arrays, and something in the middle. I'm going to assume the following inputs:
categories = {'people'=>'50',
'chairs'=>'23',
'footballs'=>'5',
'crayons'=>'1'}
and:
posts = [['people', 'http://www.thing.com/this-post'],
['footballs','http://www.thing.com/that-post'],
['people','http://www.thing.com/nice-post'],
['footballs','http://www.thing.com/other-post'],
['people','http://www.thing.com/thingy-post'],
['footballs','http://www.thing.com/the-post'],
['people','http://www.thing.com/the-post'],
['crayons','http://www.thing.com/the-blah'],
['chairs','http://www.thing.com/the-page']]
and the following output:
[['people', '50', ['http://www.thing.com/this-post', 'http://www.thing.com/nice-post', 'http://www.thing.com/thingy-post', 'http://www.thing.com/the-post']],
['chairs', '23', ['http://www.thing.com/the-page']],
['footballs', '5', ['http://www.thing.com/that-post', 'http://www.thing.com/other-post', 'http://www.thing.com/the-post']],
['crayons', '1', ['http://www.thing.com/the-blah']]]
In which case what you would need is:
categories.map do |name, count|
[name, count, posts.select do |category, _|
category == name
end.map { |_, post| post }]
end
You need to understand the different syntax for Array and Hash in Ruby:
Hash:
{ 'key1' => 'value1',
'key2' => 'value2' }
Array:
[ 'item1', 'item2', 'item3', 'item4' ]
A Hash in Ruby (as in most other languages) can't have more than one instance of any single key. A literal such as {'key1' => 1, 'key1' => 2} does not raise an error, but the later value overrides the earlier one, so you end up with {'key1' => 2}.
Since there is some confusion about the format of the data, I will suggest how you might effectively structure both the input and the output. I will first present some code you could use, then give an example of how it's used, then explain what is happening.
Code
def merge_em(hash, array)
hash_keys = hash.keys
new_hash = hash_keys.each_with_object({}) { |k,h|
h[k] = { qty: hash[k], http: [] } }
array.each do |h|
h.keys.each do |k|
(new_hash.update({k=>h[k]}) { |k,g,http|
{qty: g[:qty], http: (g[:http] << http)}}) if hash_keys.include?(k)
end
end
new_hash
end
Example
Here is a hash that I have modified to include a key/value pair that does not appear in the array below:
hash = {'people' =>'50', 'chairs' =>'23', 'footballs'=>'5',
'crayons'=> '1', 'cat_lives'=> '9'}
Below is your array of hashes that is to be merged into hash. You'll see I've added a key/value pair to the hash with key "chairs". As I hope to make clear, the code is no different (i.e., not simplified) if we know in advance that each hash has only one key/value pair. (Aside: if, for example, we want the key from a hash h that is known to have only one key, we still have to pull out all the keys into an array and then take the only element of the array: h.keys.first.)
I have also added a hash to the array that has no key that is among hash's keys.
array =
[{'people' =>'http://www.thing.com/this-post'},
{'footballs'=>'http://www.thing.com/that-post'},
{'people' =>'http://www.thing.com/nice-post'},
{'footballs'=>'http://www.thing.com/other-post'},
{'people' =>'http://www.thing.com/thingy-post'},
{'footballs'=>'http://www.thing.com/the-post'},
{'people' =>'http://www.thing.com/the-post'},
{'crayons' =>'http://www.thing.com/the-blah'},
{'chairs' =>'http://www.thing.com/the-page',
'crayons' =>'http://www.thing.com/blah'},
{'balloons' =>'http://www.thing.com/the-page'}
]
We now merge the information from array into hash, and at the same time change the structure of hash to something more suitable:
result = merge_em(hash, array)
#=> {"people" =>{:qty=>"50",
# :http=>["http://www.thing.com/this-post",
# "http://www.thing.com/nice-post",
# "http://www.thing.com/thingy-post",
# "http://www.thing.com/the-post"]},
# "chairs" =>{:qty=>"23",
# :http=>["http://www.thing.com/the-page"]},
# "footballs"=>{:qty=>"5",
# :http=>["http://www.thing.com/that-post",
# "http://www.thing.com/other-post",
# "http://www.thing.com/the-post"]},
# "crayons" =>{:qty=>"1",
# :http=>["http://www.thing.com/the-blah",
# "http://www.thing.com/blah"]},
# "cat_lives"=>{:qty=>"9",
# :http=>[]}}
I've assumed you want to look up the content of result with hash's keys. It is therefore convenient to make the values associated with those keys hashes themselves, with keys :qty and :http. The former is for the values in hash (the naming may be wrong); the latter is an array containing the strings drawn from array.
This way, if we want the value for the key "crayons", we could write:
result["crayons"]
#=> {:qty=>"1",
# :http=>["http://www.thing.com/the-blah", "http://www.thing.com/blah"]}
or
irb(main):133:0> result["crayons"][:qty]
#=> "1"
irb(main):134:0> result["crayons"][:http]
#=> ["http://www.thing.com/the-blah", "http://www.thing.com/blah"]
Explanation
Let's go through this line-by-line. First, we need to reference hash.keys more than once, so let's make it a variable:
hash_keys = hash.keys
#=> ["people", "chairs", "footballs", "crayons", "cat_lives"]
We may as well convert this hash to the output format now. We could do it during the merge operation below, but I think it is clearer to do it as a separate step:
new_hash = hash_keys.each_with_object({}) { |k,h|
h[k] = { qty: hash[k], http: [] } }
#=> {"people" =>{:qty=>"50", :http=>[]},
# "chairs" =>{:qty=>"23", :http=>[]},
# "footballs"=>{:qty=>"5", :http=>[]},
# "crayons" =>{:qty=>"1", :http=>[]},
# "cat_lives"=>{:qty=>"9", :http=>[]}}
Now we merge each (hash) element of array into new_hash:
array.each do |h|
h.keys.each do |k|
(new_hash.update({k=>h[k]}) { |k,g,http|
{ qty: g[:qty], http: (g[:http] << http) } }) if hash_keys.include?(k)
end
end
The first hash h from array that is passed into the block by each is:
{'people'=>'http://www.thing.com/this-post'}
which is assigned to the block variable h. We next construct an array of h's keys:
h.keys #=> ["people"]
The first of these keys, "people" (pretend there were more, as there would be in the penultimate element of array) is passed by its each into the inner block, whose block variable, k, is assigned the value "people". We then use Hash#update (aka merge!) to merge the hash:
{k=>h[k]} #=> {"people"=>'http://www.thing.com/this-post'}
into new_hash, but only because:
hash_keys.include?(k)
#=> ["people", "chairs", "footballs", "crayons", "cat_lives"].include?("people")
#=> true
evaluates to true. Note that this will evaluate to false for the key "balloons", so the hash in array with that key will not be merged. update's block:
{ |k,g,http| { qty: g[:qty], http: (g[:http] << http) } }
is crucial. This is update's way of determining the value of a key that is in both new_hash and in the hash being merged, {k=>h[k]}. The three block variables are assigned the following values by update:
k : the key ("people")
g : the current value of `new_hash[k]`
#=> `new_hash["people"] => {:qty=>"50", :http=>[]}`
http: the value of the key/value being merged
#=> 'http://www.thing.com/this-post'
We want the merged hash value for key "people" to be:
{ qty: g[:qty], http: (g[:http] << http) }
#=> { qty: "50", http: ([] << 'http://www.thing.com/this-post') }
#=> { qty: "50", http: ['http://www.thing.com/this-post'] }
so now:
new_hash
#=> {"people" =>{:qty=>"50", :http=>['http://www.thing.com/this-post']},
# "chairs" =>{:qty=>"23", :http=>[]},
# "footballs"=>{:qty=>"5", :http=>[]},
# "crayons" =>{:qty=>"1", :http=>[]},
# "cat_lives"=>{:qty=>"9", :http=>[]}}
We do the same for each of the other elements of array.
Lastly, we need to return the merged new_hash, so we make it the last line of the method:
new_hash
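For comparison, the same output structure can probably be built without the block form of update. This is a sketch rather than a drop-in replacement; the method name merge_links is hypothetical and Hash#transform_values needs Ruby 2.4 or later:

```ruby
# Hypothetical alternative to merge_em with the same output shape.
def merge_links(quantities, link_hashes)
  # Start every known key with its quantity and an empty list of links.
  result = quantities.transform_values { |qty| { qty: qty, http: [] } }
  link_hashes.each do |links|
    links.each do |name, url|
      # Keys that are not in quantities (such as "balloons") are skipped.
      result[name][:http] << url if result.key?(name)
    end
  end
  result
end

merge_links({ 'crayons' => '1', 'cat_lives' => '9' },
            [{ 'crayons' => 'http://www.thing.com/the-blah' },
             { 'balloons' => 'http://www.thing.com/the-page' }])
#=> {"crayons"=>{:qty=>"1", :http=>["http://www.thing.com/the-blah"]},
#    "cat_lives"=>{:qty=>"9", :http=>[]}}
```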
You could also do this
cat = [{'people'=>'50'},
{'chairs'=>'23'},
{'footballs'=>'5'},
{'crayons'=>'1'}]
pages = [{'people'=>'http://www.thing.com/this-post'},
{'footballs'=>'http://www.thing.com/that-post'},
{'people'=>'http://www.thing.com/nice-post'},
{'footballs'=>'http://www.thing.com/other-post'},
{'people'=>'http://www.thing.com/thingy-post'},
{'footballs'=>'http://www.thing.com/the-post'},
{'people'=>'http://www.thing.com/the-post'},
{'crayons'=>'http://www.thing.com/the-blah'},
{'chairs'=>'http://www.thing.com/the-page'}]
cat.map do |c|
c.merge(Hash['pages',pages.collect{|h| h[c.keys.pop]}.compact])
end
#=> [{"people"=>"50", "pages"=>["http://www.thing.com/this-post", "http://www.thing.com/nice-post", "http://www.thing.com/thingy-post", "http://www.thing.com/the-post"]},
#    {"chairs"=>"23", "pages"=>["http://www.thing.com/the-page"]},
#    {"footballs"=>"5", "pages"=>["http://www.thing.com/that-post", "http://www.thing.com/other-post", "http://www.thing.com/the-post"]},
#    {"crayons"=>"1", "pages"=>["http://www.thing.com/the-blah"]}]
Which is closer to your request but far less usable than some of the other posts.

Consolidate duplicate values of a certain key from an array of hashes into array

I have an array of hashes:
connections = [
{:name=>"John Doe", :number=>"5551234567", :count=>8},
{:name=>"Jane Doe", :number=>"5557654321", :count=>6},
{:name=>"John Doe", :number=>"5559876543", :count=>3}
]
If the :name value is a duplicate, as is the case with John Doe, it should combine the :number values into an array. The count is not important anymore, so the output should be in the following format:
{"John Doe"=>["5551234567","5559876543"],
"Jane Doe"=>["5557654321"]}
What I have so far is:
k = connections.inject(Hash.new{ |h,k| h[k[:name]] = [k[:number]] }) { |h,(k,v)| h[k] << v ; h }
But this only outputs
{"John Doe"=>["5559876543", nil], "Jane Doe"=>["5557654321", nil]}
This works:
connections.group_by do |h|
h[:name]
end.inject({}) do |h,(k,v)|
h.merge( { k => (v.map do |i| i[:number] end) } )
end
# => {"John Doe"=>["5551234567", "5559876543"], "Jane Doe"=>["5557654321"]}
Step by step...
connections is the same as in your post:
connections
# => [{:name=>"John Doe", :number=>"5551234567", :count=>8},
# {:name=>"Jane Doe", :number=>"5557654321", :count=>6}, {:name=>"John Doe",
# :number=>"5559876543", :count=>3}]
First we use group_by to combine the hash entries with the same :name:
connections.group_by do |h| h[:name] end
# => {"John Doe"=>[{:name=>"John Doe", :number=>"5551234567", :count=>8},
# {:name=>"John Doe", :number=>"5559876543", :count=>3}],
# "Jane Doe"=>[{:name=>"Jane Doe", :number=>"5557654321", :count=>6}]}
That's great, but we want the values of the result hash to be just the numbers that show up as values of the :number key, not the full original entry hashes.
Given just one of the list values, we can get the desired result this way:
[{:name=>"John Doe", :number=>"5551234567", :count=>8},
{:name=>"John Doe", :number=>"5559876543", :count=>3}].map do |i|
i[:number]
end
# => ["5551234567", "5559876543"]
But we want to do that to all of the list values at once, while keeping the association with their keys. It's basically a nested map operation, but the outer map runs across a Hash instead of an Array.
You can in fact do it with map. The only tricky part is that map on a Hash doesn't return a Hash, but an Array of nested [key,value] Arrays. By wrapping the call in a Hash[...] constructor, you can turn the result back into a Hash:
Hash[
connections.group_by do |h|
h[:name]
end.map do |k,v|
[ k, (v.map do |i| i[:number] end) ]
end
]
That returns the same result as my original full answer above, and is arguably clearer, so you might want to just use that version.
But the mechanism I used instead was inject. It's like map, but instead of just returning an Array of the return values from the block, it gives you full control over how the return value is constructed out of the individual block calls:
connections.group_by do |h|
h[:name]
end.inject({}) do |h,(k,v)|
h.merge( { k => (v.map do |i| i[:number] end) } )
end
That creates a new Hash, which starts out empty (the {} passed to inject), and passes it to the do block (where it shows up as h) along with the first key/value pair in the Hash returned by group_by. That block creates another new Hash with the single key passed in and the result of transforming the value as we did above, and merges that into the passed-in one, returning the new value - basically, it adds one new key/value pair to the Hash, with the value transformed into the desired form by the inner map. The new Hash is returned from the block, so it becomes the new value of h for the next time through the block.
(We could also just assign the entry into h directly with h[k] = v.map ..., but the block would then need to return h afterward as a separate statement, since it is the return value of the block, and not the value of h at the end of the block's execution, that gets passed to the next iteration.)
As an aside: I used do...end instead of {...} around my blocks to avoid confusion with the {...} used for Hash literals. There is no semantic difference; it's purely a matter of style. In standard Ruby style, you would use {...} for single-line blocks, and restrict do...end to blocks that span more than one line.
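The two points above (assigning into h directly and returning it from the block, and mapping over the grouped hash) can be sketched like this. The method names are hypothetical, and Hash#transform_values assumes Ruby 2.4 or later:

```ruby
# Hypothetical helper: inject with direct assignment. The block must end
# with h, because its return value is passed to the next iteration.
def numbers_by_name_inject(connections)
  connections.group_by { |c| c[:name] }.inject({}) do |h, (name, group)|
    h[name] = group.map { |c| c[:number] }
    h
  end
end

# Hypothetical helper: the same result via transform_values, which keeps
# the keys and maps only the values, so no Hash[...] wrapper is needed.
def numbers_by_name(connections)
  connections
    .group_by { |c| c[:name] }
    .transform_values { |group| group.map { |c| c[:number] } }
end

connections = [
  { name: "John Doe", number: "5551234567", count: 8 },
  { name: "Jane Doe", number: "5557654321", count: 6 },
  { name: "John Doe", number: "5559876543", count: 3 }
]
numbers_by_name(connections)
#=> {"John Doe"=>["5551234567", "5559876543"], "Jane Doe"=>["5557654321"]}
```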
In one line:
k = connections.each.with_object({}) {|conn,result| (result[conn[:name]] ||= []) << conn[:number] }
More readable:
result = Hash.new {|h,k| h[k] = [] }
connections.each {|conn| result[conn[:name]] << conn[:number] }
result #=> {"John Doe"=>["5551234567", "5559876543"], "Jane Doe"=>["5557654321"]}
names = {}
connections.each{ |c| names[c[:name]] ||= []; names[c[:name]].push(c[:number]) }
puts names

Resources