So, I have a hash with arrays, like this one:
{"name": ["John","Jane","Chris","Mary"], "surname": ["Doe","Doe","Smith","Martins"]}
I want to merge them into an array of hashes, combining the corresponding elements.
The result should look like this:
[{"name"=>"John", "surname"=>"Doe"}, {"name"=>"Jane", "surname"=>"Doe"}, {"name"=>"Chris", "surname"=>"Smith"}, {"name"=>"Mary", "surname"=>"Martins"}]
Any idea how to do that efficiently?
Please note that the real-world use scenario could contain a variable number of hash keys.
Try this
h[:name].zip(h[:surname]).map do |name, surname|
  { 'name' => name, 'surname' => surname }
end
I suggest writing the code to permit arbitrary numbers of attributes. It's no more difficult than assuming there are two (:name and :surname), yet it provides greater flexibility, accommodating, for example, future changes to the number or naming of attributes:
def squish(h)
  keys = h.keys.map(&:to_s)
  h.values.transpose.map { |a| keys.zip(a).to_h }
end
h = { name: ["John", "Jane", "Chris"],
      surname: ["Doe", "Doe", "Smith"],
      age: [22, 34, 96]
    }
squish(h)
#=> [{"name"=>"John", "surname"=>"Doe", "age"=>22},
# {"name"=>"Jane", "surname"=>"Doe", "age"=>34},
# {"name"=>"Chris", "surname"=>"Smith", "age"=>96}]
The steps for the example above are as follows:
b = h.keys
#=> [:name, :surname, :age]
keys = b.map(&:to_s)
#=> ["name", "surname", "age"]
c = h.values
#=> [["John", "Jane", "Chris"], ["Doe", "Doe", "Smith"], [22, 34, 96]]
d = c.transpose
#=> [["John", "Doe", 22], ["Jane", "Doe", 34], ["Chris", "Smith", 96]]
d.map { |a| keys.zip(a).to_h }
#=> [{"name"=>"John", "surname"=>"Doe", "age"=>22},
# {"name"=>"Jane", "surname"=>"Doe", "age"=>34},
# {"name"=>"Chris", "surname"=>"Smith", "age"=>96}]
In the last step the first value of d is passed to map's block and the block variable a is assigned its value:
a = d.first
#=> ["John", "Doe", 22]
e = keys.zip(a)
#=> [["name", "John"], ["surname", "Doe"], ["age", 22]]
e.to_h
#=> {"name"=>"John", "surname"=>"Doe", "age"=>22}
The remaining calculations are similar.
If your dataset is really big, you could consider using Enumerator::Lazy.
That way Ruby does not build an intermediate array after each step of the chain (the final .to_a still builds the result array).
This is how @Ursus's answer can be improved:
h[:name]
.lazy
.zip(h[:surname])
.map { |name, surname| { 'name' => name, 'surname' => surname } }
.to_a
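To see what laziness buys, here is a small sketch (the counter `built` is a hypothetical name added only for illustration). With `lazy`, each element flows through `zip` and `map` one at a time, so asking for just the first two hashes builds only two of them instead of materializing every pair and every hash first.

```ruby
h = { name: ["John", "Jane", "Chris", "Mary"],
      surname: ["Doe", "Doe", "Smith", "Martins"] }

built = 0 # counts how many hashes the map block actually builds

first_two = h[:name]
  .lazy
  .zip(h[:surname])
  .map { |name, surname|
    built += 1 # one increment per hash that is really constructed
    { 'name' => name, 'surname' => surname }
  }
  .first(2) # pulls only two elements through the lazy chain

p first_two
p built # 2
```

With an eager `map`, `built` would reach 4 before `first(2)` ever ran.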
Another option for the case where:
[...] the real-world use scenario could contain a variable number of hash keys
h = {
'name': ['John','Jane','Chris','Mary'],
'surname': ['Doe','Doe','Smith','Martins'],
'whathever': [1, 2, 3, 4, 5]
}
You could use Object#then (Ruby 2.6+) with a splat operator in a one-liner:
h.values.then { |a, *b| a.zip(*b) }.map { |e| h.keys.zip(e).to_h }
#=> [{:name=>"John", :surname=>"Doe", :whathever=>1}, {:name=>"Jane", :surname=>"Doe", :whathever=>2}, {:name=>"Chris", :surname=>"Smith", :whathever=>3}, {:name=>"Mary", :surname=>"Martins", :whathever=>4}]
The first part works this way:
h.values.then { |a, *b| a.zip(*b) }
#=> [["John", "Doe", 1], ["Jane", "Doe", 2], ["Chris", "Smith", 3], ["Mary", "Martins", 4]]
The last part maps each element, zipping it with the original keys and then calling Array#to_h to convert the pairs into a hash.
Here I removed the call to .to_h to show the intermediate result:
h.values.then { |a, *b| a.zip(*b) }.map { |e| h.keys.zip(e) }
#=> [[[:name, "John"], [:surname, "Doe"], [:whathever, 1]], [[:name, "Jane"], [:surname, "Doe"], [:whathever, 2]], [[:name, "Chris"], [:surname, "Smith"], [:whathever, 3]], [[:name, "Mary"], [:surname, "Martins"], [:whathever, 4]]]
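The sample data above deliberately gives 'whathever' five elements while the other arrays have four, which is where zip and transpose part ways. A minimal sketch: Array#transpose raises IndexError when the rows differ in length, while Array#zip stops at the receiver's length and pads shorter arguments with nil. Because the receiver here (the :name array) is the shortest, the fifth value is silently dropped.

```ruby
h = {
  name: ['John', 'Jane', 'Chris', 'Mary'],
  surname: ['Doe', 'Doe', 'Smith', 'Martins'],
  whathever: [1, 2, 3, 4, 5]
}

# zip stops at the length of its receiver (4 here), so the 5th value is dropped
zipped = h.values.then { |a, *b| a.zip(*b) }

# transpose insists on equal-length rows and raises otherwise
transpose_error =
  begin
    h.values.transpose
    nil
  rescue IndexError => e
    e
  end

# when the receiver is the longest array, zip pads the missing slots with nil
padded = [1, 2, 3].zip([:a, :b])

p zipped.size           # 4
p transpose_error.class # IndexError
p padded                # [[1, :a], [2, :b], [3, nil]]
```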
[h[:name], h[:surname]].transpose.map do |name, surname|
  { 'name' => name, 'surname' => surname }
end
I am new to Ruby and I struggle to understand sort_by!. This method seems to do something magical that I really don't understand.
Here is a simple example:
pc = ["Z6","Z5","Z4"]
c = [
  {
    id: "Z4",
    name: "zlah1"
  },
  {
    id: "Z5",
    name: "blah2"
  },
  {
    id: "Z6",
    name: "clah3"
  }
]
c.sort_by! do |c|
  pc.index c[:id]
end
This procedure returns:
=> [{:id=>"Z6", :name=>"clah3"}, {:id=>"Z5", :name=>"blah2"}, {:id=>"Z4", :name=>"zlah1"}]
It somehow reverses the array order. How does it do that? pc.index c[:id] just returns a number. What does this method do under the hood? The documentation is not very beginner-friendly.
Suppose we are given the following:
order = ['Z6', 'Z5', 'Z4']
array = [{id: 'Z4', name: 'zlah1'},
{id: 'Z5', name: 'blah2'},
{id: 'Z6', name: 'clah3'},
{id: 'Z5', name: 'dlah4'}]
Notice that I added a fourth hash ({id: 'Z5', name: 'dlah4'}) to the array given in the question. I did this so that two elements of array would have the same value for the key :id ("Z5").
Now let's consider how Ruby might implement the following:
array.sort_by { |hash| order.index(hash[:id]) }
#=> [{:id=>"Z6", :name=>"clah3"},
# {:id=>"Z5", :name=>"blah2"},
# {:id=>"Z5", :name=>"dlah4"},
# {:id=>"Z4", :name=>"zlah1"}]
That could be done in four steps.
Step 1: Create a hash that maps the values of the sort criterion to the values of sort_by's receiver
sort_map = array.each_with_object(Hash.new { |h,k| h[k] = [] }) do |hash,h|
  h[order.index(hash[:id])] << hash
end
#=> {2=>[{:id=>"Z4", :name=>"zlah1"}],
# 1=>[{:id=>"Z5", :name=>"blah2"}, {:id=>"Z5", :name=>"dlah4"}],
# 0=>[{:id=>"Z6", :name=>"clah3"}]}
h = Hash.new { |h,k| h[k] = [] } creates an empty hash with a default proc that operates as follows when evaluating:
h[k] << hash
If h has a key k this operation is performed as usual. If, however, h does not have a key k the proc is called, causing the operation h[k] = [] to be performed, after which h[k] << hash is executed as normal.
The values in this hash must be arrays, rather than individual elements of array, due to the possibility that, as here, two elements of sort_by's receiver map to the same key. Note that this operation has nothing to do with the particular mapping of the elements of sort_by's receiver to the sort criterion.
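A quick way to see why the block form Hash.new { |h,k| h[k] = [] } is used rather than the tempting Hash.new([]): the block runs for every missing key and stores a fresh array under it, whereas Hash.new([]) hands back one shared default object and never stores the key, so << mutates that shared array. A small sketch (variable names are illustrative):

```ruby
# Block form: a missing key triggers the block, which stores a new array under it
grouped = Hash.new { |hash, key| hash[key] = [] }
grouped[2] << 'zlah1'
grouped[1] << 'blah2'
grouped[1] << 'dlah4'
p grouped # maps 2 to ["zlah1"] and 1 to ["blah2", "dlah4"]

# Object form: every missing key returns the SAME array, and nothing is stored
shared = Hash.new([])
shared[2] << 'zlah1'
shared[1] << 'blah2'
p shared     # {} (no key was ever assigned)
p shared[99] # ["zlah1", "blah2"] (the shared default array was mutated)
```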
Step 2: Sort the keys of sort_map
keys = sort_map.keys
#=> [2, 1, 0]
sorted_keys = keys.sort
#=> [0, 1, 2]
Step 3: Map sorted_keys to the values of sort_map
sort_map_values = sorted_keys.map { |k| sort_map[k] }
#=> [[{:id=>"Z6", :name=>"clah3"}],
# [{:id=>"Z5", :name=>"blah2"}, {:id=>"Z5", :name=>"dlah4"}],
# [{:id=>"Z4", :name=>"zlah1"}]]
Step 4: Flatten sort_map_values
sort_map_values.flatten
#=> [{:id=>"Z6", :name=>"clah3"},
# {:id=>"Z5", :name=>"blah2"},
# {:id=>"Z5", :name=>"dlah4"},
# {:id=>"Z4", :name=>"zlah1"}]
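The four steps above are one way to picture sort_by. Ruby's documentation for Enumerable#sort_by describes its current implementation as generating an array of tuples that pair each element with its mapped value, which is the pattern usually called a Schwartzian transform (decorate, sort, undecorate). The sketch below mimics that description; my_sort_by is a hypothetical helper, not Ruby's source. Since the same documentation says sort_by's result is not guaranteed to be stable, the sketch adds each element's original index as a tie-breaker so the two "Z5" hashes keep their original order.

```ruby
order = ['Z6', 'Z5', 'Z4']
array = [{ id: 'Z4', name: 'zlah1' },
         { id: 'Z5', name: 'blah2' },
         { id: 'Z6', name: 'clah3' },
         { id: 'Z5', name: 'dlah4' }]

# Hypothetical helper mimicking decorate-sort-undecorate (not Ruby's actual source)
def my_sort_by(items)
  # decorate: compute each sort key exactly once, remembering the original position
  decorated = items.each_with_index.map { |item, i| [yield(item), i, item] }
  # sort on [key, original index]; the index breaks ties, so equal keys keep their order
  # (assumes every key is comparable, e.g. no nil returned by order.index)
  sorted = decorated.sort { |x, y| x.first(2) <=> y.first(2) }
  # undecorate: keep only the original elements
  sorted.map(&:last)
end

sorted = my_sort_by(array) { |hash| order.index(hash[:id]) }
p sorted.map { |hash| hash[:name] } # ["clah3", "blah2", "dlah4", "zlah1"]
```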
One of the advantages of using sort_by rather than sort (with a block) is that the sort criterion (here order.index(hash[:id])) is computed only once for each element of sort_by's receiver, whereas sort would recompute these values for each pairwise comparison in its block. The time savings can be considerable if this operation is computationally expensive.
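The recomputation claim can be checked by counting how often the sort criterion runs. In this sketch, key_for and calls are hypothetical names introduced for the experiment. sort_by evaluates the criterion once per element, while sort with a block evaluates it twice per pairwise comparison; since sorting n elements needs at least n - 1 comparisons, sort ends up evaluating it more often (the exact count depends on the comparisons Ruby makes).

```ruby
order = ['Z6', 'Z5', 'Z4']
array = [{ id: 'Z4', name: 'zlah1' },
         { id: 'Z5', name: 'blah2' },
         { id: 'Z6', name: 'clah3' },
         { id: 'Z5', name: 'dlah4' }]

calls = 0
# Hypothetical sort criterion that records each time it is evaluated
key_for = lambda do |hash|
  calls += 1
  order.index(hash[:id])
end

array.sort_by { |hash| key_for.call(hash) }
sort_by_calls = calls # one evaluation per element

calls = 0
array.sort { |a, b| key_for.call(a) <=> key_for.call(b) }
sort_calls = calls # two evaluations per comparison

p sort_by_calls # 4
p sort_calls    # more than 4; the exact number depends on the comparisons made
```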
order = ['Z6', 'Z5', 'Z4']
array = [{id: 'Z4', name: 'zlah1'},
{id: 'Z5', name: 'blah2'},
{id: 'Z6', name: 'clah3'}]
array.sort_by { |hash| order.index(hash[:id]) }
#=> [{:id=>"Z6", :name=>"clah3"}, {:id=>"Z5", :name=>"blah2"}, {:id=>"Z4", :name=>"zlah1"}]
This doesn't magically reverse the order of the array. To explain what happens, we first need to understand what order.index(hash[:id]) does. This is easier to see with the map method.
array.map { |hash| order.index(hash[:id]) }
#=> [2, 1, 0]
As you can see, the first element, with id 'Z4', returns the number 2, since 'Z4' has index 2 in the order array. The same happens for every other element. These returned values are used to sort the objects: sort_by always sorts in ascending order, so the elements are arranged as if their keys were [0, 1, 2]. However, the elements themselves are not replaced by those numbers; each number is only used to compare its element with the others. This results in:
#=> [{:id=>"Z6", :name=>"clah3"}, {:id=>"Z5", :name=>"blah2"}, {:id=>"Z4", :name=>"zlah1"}]
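Since the question is about sort_by! specifically: the bang version uses the same ordering but rearranges the receiver in place and returns that same array object, while sort_by leaves the receiver alone and returns a new array. A short sketch with the question's data (using item as the block variable so it does not shadow the outer c):

```ruby
pc = ['Z6', 'Z5', 'Z4']
c = [{ id: 'Z4', name: 'zlah1' },
     { id: 'Z5', name: 'blah2' },
     { id: 'Z6', name: 'clah3' }]

# sort_by: returns a new, sorted array; c is unchanged
copy = c.sort_by { |item| pc.index(item[:id]) }
p c.map { |item| item[:id] }    # ["Z4", "Z5", "Z6"]
p copy.map { |item| item[:id] } # ["Z6", "Z5", "Z4"]

# sort_by!: sorts c itself and returns the very same object
result = c.sort_by! { |item| pc.index(item[:id]) }
p c.map { |item| item[:id] }    # ["Z6", "Z5", "Z4"]
p result.equal?(c)              # true
```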
I have a map function in ruby which returns an array of arrays with two values in each, which I want to have in a different format.
What I want to have:
"countries": [
{
"country": "Canada",
"count": 12
},
{and so on... }
]
But map obviously returns my values as arrays:
"countries": [
[
"Canada",
2
],
[
"Chile",
1
],
[
"China",
1
]
]
Using Array#to_h, I can bring it closer to the format I actually want:
"countries": {
"Canada": 2,
"Chile": 1,
"China": 1,
}
I have tried reduce/inject and each_with_object, but in both cases I do not understand how to access the incoming parameters. Searching here turns up many similar problems, but I haven't found a way to adapt them to my case.
I hope you can help me find a short and elegant solution.
You are given two arrays:
countries= [['Canada', 2], ['Chile', 1], ['China', 1]]
keys = [:country, :count]
You could write
[keys].product(countries).map { |arr| arr.transpose.to_h }
#=> [{:country=>"Canada", :count=>2},
# {:country=>"Chile", :count=>1},
# {:country=>"China", :count=>1}]
or simply
countries.map { |country, cnt| { country: country, count: cnt } }
#=> [{:country=>"Canada", :count=>2},
# {:country=>"Chile", :count=>1},
# {:country=>"China", :count=>1}]
but the first has the advantage that no code needs to change if the names of the keys change. In fact, there would be no need to change the code if the arrays countries and keys both changed, provided countries[i].size == keys.size for all i = 0..countries.size-1. (See the example at the end.)
The initial step for the first calculation is as follows.
a = [keys].product(countries)
#=> [[[:country, :count], ["Canada", 2]],
# [[:country, :count], ["Chile", 1]],
# [[:country, :count], ["China", 1]]]
See Array#product. We now have
a.map { |arr| arr.transpose.to_h }
map passes the first element of a to the block and sets the block variable arr to that value:
arr = a.first
#=> [[:country, :count], ["Canada", 2]]
The block calculation is then performed:
b = arr.transpose
#=> [[:country, "Canada"], [:count, 2]]
b.to_h
#=> {:country=>"Canada", :count=>2}
So we see that a[0] (arr) is mapped to {:country=>"Canada", :count=>2}. The next two elements of a are then passed to the block and similar calculations are made, after which map returns the desired array of three hashes. See Array#transpose and Array#to_h.
Here is a second example using the same code.
countries= [['Canada', 2, 9.09], ['Chile', 1, 0.74],
['China', 1, 9.33], ['France', 1, 0.55]]
keys = [:country, :count, :area]
[keys].product(countries).map { |arr| arr.transpose.to_h }
#=> [{:country=>"Canada", :count=>2, :area=>9.09},
# {:country=>"Chile", :count=>1, :area=>0.74},
# {:country=>"China", :count=>1, :area=>9.33},
# {:country=>"France", :count=>1, :area=>0.55}]
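The proviso countries[i].size == keys.size matters because the product/transpose version enforces it loudly: Array#transpose raises IndexError when a row has the wrong number of fields, whereas a keys.zip(row) variant would quietly fill the missing fields with nil. A minimal sketch wrapping the technique in a reusable method (rows_to_hashes is a hypothetical name):

```ruby
# Hypothetical helper: turn each row of values into a hash keyed by `keys`
def rows_to_hashes(keys, rows)
  [keys].product(rows).map { |arr| arr.transpose.to_h }
end

keys = [:country, :count]
countries = [['Canada', 2], ['Chile', 1], ['China', 1]]
good = rows_to_hashes(keys, countries)

# A row with a missing field makes transpose raise instead of producing nil values
bad_rows = [['Canada', 2], ['Chile']]
error =
  begin
    rows_to_hashes(keys, bad_rows)
  rescue IndexError => e
    e
  end

p error.class                      # IndexError
p keys.zip(['Chile']).to_h[:count] # nil (the zip variant just pads with nil)
```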
Just out of curiosity:
countries = [['Canada', 2], ['Chile', 1], ['China', 1]]
countries.map(&%i[country count].method(:zip)).map(&:to_h)
#⇒ [{:country=>"Canada", :count=>2},
# {:country=>"Chile", :count=>1},
# {:country=>"China", :count=>1}]
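The one-liner works because %i[country count].method(:zip) is a Method object bound to the array [:country, :count], and & converts it to a proc, so map calls [:country, :count].zip(row) for each row. A sketch unpacking it step by step (zipper is an illustrative name):

```ruby
countries = [['Canada', 2], ['Chile', 1], ['China', 1]]

zipper = %i[country count].method(:zip) # Method object bound to [:country, :count]

pairs = zipper.call(['Canada', 2])      # same as [:country, :count].zip(['Canada', 2])
p pairs                                 # [[:country, "Canada"], [:count, 2]]

# &zipper turns the Method into a proc, so these two lines are equivalent
long_form  = countries.map { |row| zipper.call(row) }.map(&:to_h)
short_form = countries.map(&zipper).map(&:to_h)
p long_form == short_form               # true
```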
How can I convert a string of JSON data to a multidimensional array?
# Begin with JSON
json_data = '[
{"id":1,"name":"Don"},
{"id":2,"name":"Bob"},
...
]'
# do something here to convert the JSON data to array of arrays.
# End with multidimensional arrays
array_data = [
["id", "name"],
[1,"Don"],
[2,"Bob"],
...
]
For readability and efficiency, I would do it like this:
require 'json'
json_data = '[{"id":1,"name":"Don"},{"id":2,"name":"Bob"}]'
arr = JSON.parse(json_data)
#=> [{"id"=>1, "name"=>"Don"}, {"id"=>2, "name"=>"Bob"}]
keys = arr.first.keys
#=> ["id", "name"]
arr.map! { |h| h.values_at(*keys) }.unshift(keys)
#=> [["id", "name"], [1, "Don"], [2, "Bob"]]
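One property of this approach worth knowing: the header row comes from the first record, and values_at(*keys) pulls values in that same order from every record, so records that list their keys in a different order still line up, and a record missing a key gets nil in that column. A small sketch with made-up records (not from the question), written without the in-place map!:

```ruby
require 'json'

# Made-up input: the second record lists its keys in another order, the third lacks "name"
json_data = '[{"id":1,"name":"Don"},{"name":"Bob","id":2},{"id":3}]'

records = JSON.parse(json_data)
keys = records.first.keys                     # header taken from the first record
rows = records.map { |h| h.values_at(*keys) } # values in header order, nil if missing
array_data = [keys] + rows

p array_data # [["id", "name"], [1, "Don"], [2, "Bob"], [3, nil]]
```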
This should do the trick:
require 'json'
json_data = '[{"id":1,"name":"Don"},{"id":2,"name":"Bob"}]'
JSON.parse(json_data).inject([]) { |result, e| result + [e.keys, e.values] }.uniq
First, we read the JSON into an array with JSON.parse. For each element, we collect its keys and its values using inject, which results in the following array:
[
["id", "name"],
[1, "Don"],
["id", "name"],
[2, "Bob"]
]
To get rid of the repeating key-arrays, we call uniq and are done.
[
["id", "name"],
[1, "Don"],
[2, "Bob"]
]
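A caveat with the uniq step: it removes any repeated row, not just the repeated key rows, so two identical records collapse into one. A sketch with made-up duplicate data, comparing it with the values_at approach from the earlier answer:

```ruby
require 'json'

# Made-up input containing the same record twice
json_data = '[{"id":1,"name":"Don"},{"id":1,"name":"Don"}]'
parsed = JSON.parse(json_data)

with_uniq = parsed.inject([]) { |result, e| result + [e.keys, e.values] }.uniq
keys = parsed.first.keys
with_values_at = [keys] + parsed.map { |h| h.values_at(*keys) }

p with_uniq      # [["id", "name"], [1, "Don"]] (one data row lost)
p with_values_at # [["id", "name"], [1, "Don"], [1, "Don"]]
```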
Adding to @tessi's answer, we can avoid using uniq if we combine with_index and inject.
require 'json'
json_data = '[{"id":1,"name":"Don"},{"id":2,"name":"Bob"}]'
array_data = JSON.parse(json_data).each.with_index.inject([]) { |result, (e, i)| result + (i == 0 ? [e.keys, e.values] : [e.values]) }
puts array_data.inspect
The result is:
[["id", "name"], [1, "Don"], [2, "Bob"]]
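A variation on the same idea, offered as a sketch rather than a fix: result + [...] builds a new array on every iteration, whereas each_with_index.with_object([]) appends to a single accumulator. The block receives each element and its index packed as one pair, destructured with (e, i), followed by the accumulator.

```ruby
require 'json'

json_data = '[{"id":1,"name":"Don"},{"id":2,"name":"Bob"}]'

array_data = JSON.parse(json_data).each_with_index.with_object([]) do |(e, i), result|
  result << e.keys if i.zero? # header row only once, from the first record
  result << e.values          # one row of values per record
end

p array_data # [["id", "name"], [1, "Don"], [2, "Bob"]]
```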
I have a structure with a cartesian product that looks like this (and could go out to arbitrary depth)...
variables = ["var1","var2",...]
myhash = {
{"var1"=>"a", "var2"=>"a", ...}=>1,
{"var1"=>"a", "var2"=>"b", ...}=>2,
{"var1"=>"b", "var2"=>"a", ...}=>3,
{"var1"=>"b", "var2"=>"b", ...}=>4,
}
... it has a fixed structure, but I'd like simple indexing, so I'm trying to write a method to convert it to this:
nested = {
"a"=> {
"a"=> 1,
"b"=> 2
},
"b"=> {
"a"=> 3,
"b"=> 4
}
}
Any clever ideas (that allow for arbitrary depth)?
Maybe like this (not the cleanest way):
def cartesian_to_map(myhash)
  {}.tap do |hash|
    myhash.each do |h|
      (hash[h[0]["var1"]] ||= {}).merge!({h[0]["var2"] => h[1]})
    end
  end
end
Result:
puts cartesian_to_map(myhash).inspect
{"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
Here is my example.
It uses a method index(hash, fields) that takes the hash, and the fields you want to index by.
It's dirty, and uses a local variable to pass up the current level in the index.
I bet you can make it much nicer.
def index(hash, fields)
  # store the last index of the fields
  last_field = fields.length - 1
  # our indexed version
  indexed = {}
  hash.each do |key, value|
    # our current point in the indexed hash
    point = indexed
    fields.each_with_index do |field, i|
      key_field = key[field]
      if i == last_field
        point[key_field] = value
      else
        # ensure the next point is a hash
        point[key_field] ||= {}
        # move our point one level deeper
        point = point[key_field]
      end
    end
  end
  # return our indexed hash
  indexed
end
You can then just call
index(myhash, ["var1", "var2"])
And it should look like what you want
index({
  {"var1"=>"a", "var2"=>"a"} => 1,
  {"var1"=>"a", "var2"=>"b"} => 2,
  {"var1"=>"b", "var2"=>"a"} => 3,
  {"var1"=>"b", "var2"=>"b"} => 4,
}, ["var1", "var2"]) ==
{
  "a"=> {
    "a"=> 1,
    "b"=> 2
  },
  "b"=> {
    "a"=> 3,
    "b"=> 4
  }
}
It seems to work.
(see it as a gist
https://gist.github.com/1126580)
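The same walk-down-the-levels idea can be written more compactly. This sketch assumes every key hash contains all of the listed fields, and nest is a hypothetical name. It collects the field values as a path, uses inject to create (or reuse) one nested hash per level except the last, stores the value at the last level, and the result can then be indexed with Hash#dig.

```ruby
# Hypothetical helper: nest `hash` by the values of `fields`, one level per field
def nest(hash, fields)
  hash.each_with_object({}) do |(key, value), nested|
    *path, last = fields.map { |field| key[field] }
    # walk (creating as needed) one hash per level for every field but the last
    leaf = path.inject(nested) { |node, step| node[step] ||= {} }
    leaf[last] = value
  end
end

myhash = {
  { "var1" => "a", "var2" => "a" } => 1,
  { "var1" => "a", "var2" => "b" } => 2,
  { "var1" => "b", "var2" => "a" } => 3,
  { "var1" => "b", "var2" => "b" } => 4
}

nested = nest(myhash, ["var1", "var2"])
p nested.dig("b", "a") # 3
```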
Here's an ugly-but-effective solution:
nested = Hash[ myhash.group_by{ |h,n| h["var1"] } ].tap{ |nested|
  nested.each do |v1,a|
    nested[v1] = a.group_by{ |h,n| h["var2"] }
    nested[v1].each{ |v2,a| nested[v1][v2] = a.flatten.last }
  end
}
p nested
#=> {"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
You might consider an alternative representation that is easier to map to and (IMO) just as easy to index:
paired = Hash[ myhash.map{ |h,n| [ [h["var1"],h["var2"]], n ] } ]
p paired
#=> {["a", "a"]=>1, ["a", "b"]=>2, ["b", "a"]=>3, ["b", "b"]=>4}
p paired[["a","b"]]
#=> 2
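The paired representation also generalizes to arbitrary depth without extra code: use the values of the listed variables, in order, as the array key. A sketch assuming Ruby 2.5 or later (for Hash#transform_keys), with the question's variables list trimmed to two entries:

```ruby
variables = ["var1", "var2"]
myhash = {
  { "var1" => "a", "var2" => "a" } => 1,
  { "var1" => "a", "var2" => "b" } => 2,
  { "var1" => "b", "var2" => "a" } => 3,
  { "var1" => "b", "var2" => "b" } => 4
}

# Each key hash becomes the array of its values, in the order given by variables
paired = myhash.transform_keys { |h| h.values_at(*variables) }

p paired[["a", "b"]] # 2
```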