How to output sorted hash in ruby template - ruby

I'm building a config file for one of our inline apps. It's essentially a JSON file. I'm having a lot of trouble getting Puppet/Ruby 1.8 to output the hash/JSON the same way each time.
I'm currently using
<%= require "json"; JSON.pretty_generate data %>
While this produces human-readable output, it doesn't guarantee the same key order each time, which means that Puppet often sends out change notifications for identical data.
I've also tried
<%= require "json"; JSON.pretty_generate Hash[*data.sort.flatten] %>
Which will generate the same data/order each time. The problem comes when data has a nested array.
data => { beanstalkd => [ "server1", ] }
becomes
"beanstalkd": "server1",
instead of
"beanstalkd": ["server1"],
I've been fighting with this on and off for a few days now, so I would appreciate some help.
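To see why the second attempt loses the nested array, here is a minimal sketch. It assumes Ruby 1.9 or later (where hashes keep insertion order), and the "alpha" key is invented sample data, not part of the original config.

```ruby
require "json"

# Hypothetical sample data, modelled on the example above.
data = { "beanstalkd" => ["server1"], "alpha" => 1 }

# Array#flatten is recursive, so the nested ["server1"] array is dissolved too.
flattened = data.sort.flatten

# Rebuilding from the flat list turns the one-element array into a bare string.
lossy = Hash[*flattened]

# Passing the sorted [key, value] pairs directly keeps each value untouched.
intact = Hash[data.sort]

puts JSON.pretty_generate(intact)
```

Hash[data.sort] only orders the top level; nested hashes would still need a recursive sort.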

Since hashes in Ruby 1.9 and later preserve insertion order, and the question is tagged with ruby, here's a method that will sort a hash recursively (without affecting ordering of arrays):
def sort_hash(h)
  {}.tap do |h2|
    h.sort.each do |k,v|
      h2[k] = v.is_a?(Hash) ? sort_hash(v) : v
    end
  end
end
h = {a:9, d:[3,1,2], c:{b:17, a:42}, b:2 }
p sort_hash(h)
#=> {:a=>9, :b=>2, :c=>{:a=>42, :b=>17}, :d=>[3, 1, 2]}
require 'json'
puts sort_hash(h).to_json
#=> {"a":9,"b":2,"c":{"a":42,"b":17},"d":[3,1,2]}
Note that this will fail catastrophically if your hash has keys that cannot be compared. (If your data comes from JSON, this will not be the case, since all keys will be strings.)
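If the config also contains hashes inside arrays, sort_hash leaves those untouched. The following sketch is one possible extension; deep_sort is a hypothetical name, the sample data is invented, and it assumes Ruby 1.9+ and keys that can be compared with each other.

```ruby
require "json"

# Hypothetical helper: like sort_hash, but it also descends into arrays
# so that hashes nested inside them get sorted keys as well.
def deep_sort(obj)
  case obj
  when Hash
    obj.sort.each_with_object({}) { |(k, v), out| out[k] = deep_sort(v) }
  when Array
    obj.map { |e| deep_sort(e) } # element order is left alone
  else
    obj
  end
end

# Two hashes with the same content but different insertion order.
data = { "b" => [{ "z" => 1, "a" => 2 }], "a" => { "y" => 1, "x" => 2 } }
same = { "a" => { "x" => 2, "y" => 1 }, "b" => [{ "a" => 2, "z" => 1 }] }

puts JSON.pretty_generate(deep_sort(data))
```

Both inputs should serialize to the same text, which is what Puppet needs in order to stop reporting spurious changes.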

A hash is, in general, an unordered data structure. Some languages (Ruby, for example) have an ordered version of it, but in most languages you shouldn't rely on any specific order in a hash.
If order is important to you, you should use an array. So, your hash
{a: 1, b: 2}
becomes this
[{a: 1}, {b: 2}]
I think this doesn't force too many changes in your code.
Workaround to your situation
Try this:
data = {beanstalkId: ['server1'], ccc: 2, aaa: 3}
data2 = data.keys.sort.map {|k| [k, data[k]]}
puts Hash[data2]
#=> {:aaa=>3, :beanstalkId=>["server1"], :ccc=>2}

Related

How to reduce a Ruby hash by a given array of keys?

I'm looking for a simple way to copy/reduce a hash, but only include the keys/values specified in an array of keys.
original_hash = { one: 1, two: 'too', three: 3 }
wanted_keys = [:one, :three]
new_hash = # do something with the hash
expect(new_hash).to eq({ one: 1, three: 3 })
If the hash has a very large number of keys and/or the array of wanted keys is very large (improbable as that may be),
original_hash.select { |k, v| wanted_keys.include?(k) }
would be relatively inefficient because a linear search of wanted_keys is required for each of original_hash's keys. Here are two ways to speed things up. (@Lucas' solution is a third way.)
Convert wanted_keys to a set
require 'set'
wanted_keys_set = wanted_keys.to_set
original_hash.select { |k, v| wanted_keys_set.include?(k) }
#=> {:one=>1, :three=>3}
Match wanted_keys with the values of those keys in original_hash and then convert the resulting array to a hash
wanted_keys.zip(original_hash.values_at(*wanted_keys)).to_h
#=> {:one=>1, :three=>3}
Prior to Ruby v2.1, when Array#to_h made its debut, this would be written
Hash[wanted_keys.zip(original_hash.values_at(*wanted_keys))]
This makes your spec pass :) (Hash#slice needs Ruby 2.5+ or Active Support.)
original_hash.slice(*wanted_keys)
If you happen to be using Rails, you could use Hash#slice
require "active_support/core_ext/hash"
original_hash = { one: 1, two: 'too', three: 3 }
wanted_keys = [:one, :three]
new_hash = original_hash.slice(*wanted_keys)
#=> {:one=>1, :three=>3}
The implementation of Hash#slice lives in the Active Support core extensions, in rails/activesupport/lib/active_support/core_ext/hash/slice.rb
If you don't want to iterate over the whole hash, you can iterate over the wanted keys with each_with_object:
original_hash = {one: 1, two: 'too', three: 3}
wanted_keys = [:one, :three]
# iterate only over the array of keys; key? also keeps entries whose value is nil or false
new_hash = wanted_keys.each_with_object({}) do |key, exp|
  exp[key] = original_hash[key] if original_hash.key?(key)
end
You could try something like (inefficient solution below)
original_hash.select{|k,v| wanted_keys.include? k }
I'm not entirely up on my Ruby-fu, so I'm not sure whether this returns a list or a Hash.
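On Ruby 1.9 and later, Hash#select returns a Hash (on 1.8 it returned an array of pairs). The sketch below compares the approaches from the answers above, assuming Ruby 2.1+ for Array#to_h; the :four key is an invented example of a wanted key that is missing from the hash.

```ruby
require "set"

original_hash = { one: 1, two: "too", three: 3 }
wanted_keys = [:one, :three]
wanted_set = wanted_keys.to_set

by_select = original_hash.select { |k, _v| wanted_keys.include?(k) }
by_set = original_hash.select { |k, _v| wanted_set.include?(k) }
by_zip = wanted_keys.zip(original_hash.values_at(*wanted_keys)).to_h

# The approaches differ when a wanted key is absent from the hash:
# select skips it, while values_at fills in nil for it.
with_missing = [:one, :four]
select_missing = original_hash.select { |k, _v| with_missing.include?(k) }
zip_missing = with_missing.zip(original_hash.values_at(*with_missing)).to_h

p by_select, select_missing, zip_missing
```

Which behaviour is preferable depends on whether a missing key should be silently dropped or show up as nil.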

Flatten deep nested hash to array for sha1 hashing

I want to compute a unique SHA1 hash from a Ruby hash. I thought about:
1. (Deep) converting the hash into an array
2. Sorting the array
3. Joining the array with an empty string
4. Calculating the SHA1
Consider the following hash:
hash = {
foo: "test",
bar: [1,2,3]
hello: {
world: "world",
arrays: [
{foo: "bar"}
]
}
}
How can I get this kind of nested hash into an array like
[:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
I would then sort the array, join it with array.join("") and compute the sha1 hash like this:
require 'digest/sha1'
Digest::SHA1.hexdigest hash_string
How could I flatten the hash like I described above?
Is there already a gem for this?
Is there a quicker / easier way to solve this? I have a large amount of objects to convert (~700k), so performance does matter.
EDIT
Another problem that I figured out from the answers below involves these two hashes:
a = {a: "a", b: "b"}
b = {a: "b", b: "a"}
When flattening and sorting them, these two hashes produce the same output, even though a == b => false.
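The collision can be reproduced in a few lines. This is a minimal sketch for flat hashes only; the lambda names naive and paired are made up for illustration.

```ruby
a = { a: "a", b: "b" }
b = { a: "b", b: "a" }

# Flatten, stringify, sort, join: the recipe outlined in the question.
naive = ->(h) { h.flatten.map(&:to_s).sort.join }

# Sorting whole [key, value] pairs keeps each value attached to its key.
paired = ->(h) { h.sort.map { |k, v| "#{k}=#{v}" }.join(",") }

puts naive.call(a) == naive.call(b)   # the two hashes collide
puts paired.call(a) == paired.call(b) # the two hashes stay distinct
```

Sorting individual elements discards which value belonged to which key, while sorting pairs does not.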
EDIT 2
The use case for this whole thing is product data comparison. The product data is stored inside a hash, then serialized and sent to a service that creates / updates the product data.
I want to check if anything has changed inside the product data, so I generate a hash from the product content and store it in a database. The next time the same product is loaded, I calculate the hash again, compare it to the one in the DB and decide whether the product needs an update or not.
EDIT: As you clarified, two hashes with the same keys in a different order should give the same string. I would reopen the Hash class to add a custom flatten method:
class Hash
  def custom_flatten
    sort.map { |pair| ["key: #{pair[0]}", pair[1]] }
        .flatten
        .map { |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }
        .flatten
  end
end
Explanation:
sort converts the hash to a sorted array of pairs (so that hashes whose keys are in a different order produce the same result)
.map{|pair| ["key: #{pair[0]}", pair[1]]} is a trick to differentiate keys from values in the final flattened array, to avoid the problem of {a: {b: {c: :d}}}.custom_flatten == {a: :b, c: :d}.custom_flatten
flatten converts an array of arrays into a single array of values
map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem } calls custom_flatten again on any sub-hash that is left.
Then you just need to use :
require 'digest/sha1'
Digest::SHA1.hexdigest hash.custom_flatten.to_s
I am not aware of a gem that does something like what you are looking for. There is a Hash#flatten method in Ruby, but it does not flatten nested hashes recursively. Here is a straightforward recursive function that will flatten in the way that you requested in your question:
def completely_flatten(hsh)
hsh.flatten(-1).map{|el| el.is_a?(Hash) ? completely_flatten(el) : el}.flatten
end
This will yield
hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "earth",
    arrays: [
      {my: "example"}
    ]
  }
}
completely_flatten(hash)
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
To get the string representation you are looking for (before computing the SHA1 hash), convert everything in the array to a string before sorting; otherwise the elements cannot be meaningfully compared and sort raises an error:
hash_string = completely_flatten(hash).map(&:to_s).sort.join
#=> "123arraysbarearthexamplefoohellomytestworld"
The question is how to "flatten" a hash. There is a second, implicit, question concerning sha1, but, by SO rules, that needs to be addressed in a separate question. You can "flatten" any hash or array as follows.
Code
def crush(obj)
recurse(obj).flatten
end
def recurse(obj)
case obj
when Array then obj.map { |e| recurse e }
when Hash then obj.map { |k,v| [k, recurse(v)] }
else obj
end
end
Example
crush({
foo: "test",
bar: [1,2,3],
hello: {
world: "earth",
arrays: [{my: "example"}]
}
})
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
crush([[{ a:1, b:2 }, "cat", [3,4]], "dog", { c: [5,6] }])
#=> [:a, 1, :b, 2, "cat", 3, 4, "dog", :c, 5, 6]
Use Marshal for Fast Serialization
You haven't articulated a useful reason to change your data structure before hashing. Therefore, you should consider marshaling for speed unless your data structures contain unsupported objects like bindings or procs. For example, using your hash variable with the syntax corrected:
require 'digest/sha1'
hash = {
foo: "test",
bar: [1,2,3],
hello: {
world: "world",
arrays: [
{foo: "bar"}
]
}
}
Digest::SHA1.hexdigest Marshal.dump(hash)
#=> "f50bc3ceb514ae074a5ab9672ae5081251ae00ca"
Marshal is generally faster than other serialization options. If all you need is speed, that will be your best bet. However, you may find that JSON, YAML, or a simple #to_s or #inspect meet your needs better for other reasons. As long as you are comparing similar representations of your object, the internal format of the hashed object is largely irrelevant to ensuring you have a unique or unmodified object.
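One caveat worth checking for the product-comparison use case: Marshal.dump writes the pairs in insertion order, so two equal hashes built in a different order give different digests. A minimal sketch, assuming flat hashes with comparable keys (the variable names are illustrative):

```ruby
require "digest/sha1"

first = { a: 1, b: 2 }
second = { b: 2, a: 1 } # equal content, different insertion order

digest = ->(h) { Digest::SHA1.hexdigest(Marshal.dump(h)) }

equal_hashes = (first == second)                         # Hash#== ignores order
equal_digests = (digest.call(first) == digest.call(second))

# Sorting the pairs first gives both hashes the same insertion order.
equal_sorted = (digest.call(Hash[first.sort]) == digest.call(Hash[second.sort]))

puts equal_hashes, equal_digests, equal_sorted
```

If key order can vary between loads, normalizing the order before dumping avoids false "changed" results.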
Any solution based on flattening the hash will fail for nested hashes. A robust solution is to explicitly sort the keys of each hash recursively (from ruby 1.9.x onwards, hash keys order is preserved), and then serialize it as a string and digest it.
require 'digest/sha1'
def canonize_hash(h)
  r = h.map { |k, v| [k, v.is_a?(Hash) ? canonize_hash(v) : v] }
  Hash[r.sort]
end
def digest_hash(hash)
  Digest::SHA1.hexdigest canonize_hash(hash).to_s
end
digest_hash({ foo: "foo", bar: "bar" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
digest_hash({ bar: "bar", foo: "foo" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
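canonize_hash does not descend into arrays, so hashes nested inside an array (like arrays: [{foo: "bar"}] in the question) keep their original key order. The sketch below is one way to cover that case; canonical and stable_digest are hypothetical names, and it assumes that converting keys with to_s is acceptable (so :a and "a" would be treated as the same key) and that the data is JSON-serializable.

```ruby
require "digest/sha1"
require "json"

# Recursively rebuild hashes with string keys in sorted order,
# descending into arrays without reordering their elements.
def canonical(obj)
  case obj
  when Hash
    pairs = obj.map { |k, v| [k.to_s, canonical(v)] }
    Hash[pairs.sort_by(&:first)]
  when Array
    obj.map { |e| canonical(e) }
  else
    obj
  end
end

def stable_digest(obj)
  Digest::SHA1.hexdigest(JSON.generate(canonical(obj)))
end

x = { foo: "x", list: [{ b: 1, a: 2 }] }
y = { list: [{ a: 2, b: 1 }], foo: "x" }
puts stable_digest(x) == stable_digest(y)
```

Serializing through JSON rather than to_s keeps the string form independent of how a given Ruby version formats Hash#inspect.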

Ruby all possible permutations of an array of arrays (one liner?)

Questions similar to this have been asked before on SO, but they're not quite what I need and I can't seem to arrive at my solution through altering/modifying those approaches.
In any case, I have an array of arrays, as follows:
b= [["1"],["2"],["3"],["4"],["5"],["6"]]
(If it makes it easier to arrive at a solution, b can also be a one dimensional array, as follows: ["1","2","3","4","5","6"]. Either type of input works for my needs.)
and I would like to generate the following:
[["123456"],["213456"],["312456"],...]
where each array in the output array is a unique permutation of the six numbers. I would also take it as a single array (e.g., ["123456", "213456",...]). The order of the output isn't particularly important as long as each entry is unique and no number repeats in a string (e.g., "112345" isn't allowed). All 6 numbers must also be used in each entry, so I'm not interested in incremental output like "123", either.
As much as this sounds like it, this isn't a homework problem. I could brute-force this thing and get the output I need. I just feel like there has to be a better, more elegant, solution.
With Array#permutation:
permutations = (1..6).to_a.permutation.map(&:join)
# ["123456", "123465", "123546", ..., "654312", "654321"]
Ruby does this natively :)
From the ruby documentation :
a = [1, 2, 3]
a.permutation.to_a #=> [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
a.permutation(1).to_a #=> [[1],[2],[3]]
a.permutation(2).to_a #=> [[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]]
a.permutation(3).to_a #=> [[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
a.permutation(0).to_a #=> [[]] # one permutation of length 0
a.permutation(4).to_a #=> [] # no permutations of length 4
http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-permutation
You should definitely have a look at Permutation Gem. Example from documentation
perm = Permutation.new(3)
# => #<Permutation:0x57dc94 @last=5, @rank=0, @size=3>
colors = [:r, :g, :b]
# => [:r, :g, :b]
perm.map { |p| p.project(colors) }
# => [[:r, :g, :b], [:r, :b, :g], [:g, :r, :b], [:g, :b, :r], [:b, :r, :g],
# [:b, :g, :r]]
UPDATE
If you are using Ruby > 1.8.6, Array#permutation is built in.
This should do it:
b.permutation.map { |i| [i.flatten.join] }
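For the array-of-arrays input from the question, the answers above can be combined as follows. This is a small sketch; the variable names strings and wrapped are illustrative.

```ruby
b = [["1"], ["2"], ["3"], ["4"], ["5"], ["6"]]

# Flatten to a one-dimensional array, then join each permutation into a string.
strings = b.flatten.permutation.map(&:join)

# Wrap each string in its own array if the nested output shape is preferred.
wrapped = strings.map { |s| [s] }

puts strings.size         # 6! = 720 permutations
puts strings.uniq.size    # all of them are distinct
```

Because the six inputs are distinct, every permutation is unique and no digit repeats within a string.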

Sort items in a nested hash by their values

I'm being sent a nested hash that needs to be sorted by its values. For example:
@foo = {"a"=>{"z"=>5, "y"=>3, "x"=>88}, "b"=>{"a"=>2, "d"=>-5}}
When running the following:
@foo["a"].sort{|a,b| a[1]<=>b[1]}
I get:
[["y", 3], ["z", 5], ["x", 88]]
This is great, it's exactly what I want. The problem is I'm not always going to know what all the keys are that are being sent to me so I need some sort of loop. I tried to do the following:
@foo.each do |e|
  e.sort{|a,b| a[1]<=>b[1]}
end
This makes sense to me, since if I manually call @foo.first[0] I get
"a"
and @foo.first[1] returns
{"z"=>5, "y"=>3, "x"=>88}
but for some reason this isn't sorting properly (i.e. not at all). I assume this is because the each is calling sort on the entire hash object rather than on "a"'s values. How do I access the values of the nested hash without knowing what its key is?
You might want to loop over the hash like this:
@foo.each do |key, value|
  @foo[key] = value.sort{ |a,b| a[1]<=>b[1] }
end
@foo = {"a"=>{"z"=>5, "y"=>3, "x"=>88}, "b"=>{"a"=>2, "d"=>-5}}
@bar = Hash[ @foo.map{ |key,values| [ key, values.sort_by(&:last) ] } ]
Or, via a less-tricky path:
@bar = {}
@foo.each do |key,values|
  @bar[key] = values.sort_by{ |key,value| value }
end
In both cases @bar turns out to be:
p @bar
#=> {
#=> "a"=>[["y", 3], ["z", 5], ["x", 88]],
#=> "b"=>[["d", -5], ["a", 2]]
#=> }
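Both versions above leave arrays of pairs as the inner values. If nested hashes are wanted instead, the sorted pairs can be converted back; this sketch assumes Ruby 2.4 or later for Hash#transform_values (and 2.1+ for Array#to_h) and uses a local variable rather than the instance variable.

```ruby
foo = { "a" => { "z" => 5, "y" => 3, "x" => 88 }, "b" => { "a" => 2, "d" => -5 } }

# Sort each inner hash by value, then turn the sorted pairs back into a hash.
# Hashes preserve insertion order, so the sorted order is kept.
sorted = foo.transform_values do |inner|
  inner.sort_by { |_key, value| value }.to_h
end

p sorted
```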
My coworker came up with a slightly more flexible solution that will recursively sort a nested hash of any depth:
def deep_sort_by(&block)
  Hash[self.map do |key, value|
    [if key.respond_to? :deep_sort_by
      key.deep_sort_by(&block)
    else
      key
    end,
    if value.respond_to? :deep_sort_by
      value.deep_sort_by(&block)
    else
      value
    end]
  end.sort_by(&block)]
end
You can inject it into all hashes and then just call it like this:
myMap.deep_sort_by { |obj| obj }
The code would be similar for an array. We published it as a gem for others to use, see blog post for additional details.
Disclaimer: I work for this company.
In your example, e is a temporary array containing a [key,value] pair, in this case the character key and the nested hash. So e.sort{|a,b|...} is going to try to compare the character to the hash, and fails with a runtime error. I think you probably meant to type e[1].sort{...}. But even that is not going to work correctly, because you don't store the sorted hash anywhere: @foo.each returns the original @foo and leaves it unchanged.
The better solution is the one suggested by @Pan Thomakos:
@foo.each do |key, value|
  @foo[key] = value.sort{ |a,b| a[1]<=>b[1] }
end

Convert array-of-hashes to a hash-of-hashes, indexed by an attribute of the hashes

I've got an array of hashes representing objects as a response to an API call. I need to pull data from some of the hashes, and one particular key serves as an id for the hash object. I would like to convert the array into a hash with the keys as the ids, and the values as the original hash with that id.
Here's what I'm talking about:
api_response = [
{ :id => 1, :foo => 'bar' },
{ :id => 2, :foo => 'another bar' },
# ..
]
ideal_response = {
1 => { :id => 1, :foo => 'bar' },
2 => { :id => 2, :foo => 'another bar' },
# ..
}
There are two ways I could think of doing this, plus a possible third:
1. Map the data to the ideal_response (below)
2. Use api_response.find { |x| x[:id] == i } for each record I need to access.
3. A method I'm unaware of, possibly involving a way of using map to build a hash, natively.
My method of mapping:
keys = api_response.map { |x| x[:id] }
mapped = Hash[*keys.zip(api_response).flatten]
I can't help but feel like there is a more performant, tidier way of doing this. Option 2 is very performant when there are a very minimal number of records that need to be accessed. Mapping excels here, but it starts to break down when there are a lot of records in the response. Thankfully, I don't expect there to be more than 50-100 records, so mapping is sufficient.
Is there a smarter, tidier, or more performant way of doing this in Ruby?
Ruby <= 2.0
> Hash[api_response.map { |r| [r[:id], r] }]
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
However, Hash::[] is pretty ugly and breaks the usual left-to-right OOP flow. That's why Facets proposed Enumerable#mash:
> require 'facets'
> api_response.mash { |r| [r[:id], r] }
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
This basic abstraction (convert enumerables to hashes) was asked to be included in Ruby long ago, alas, without luck.
Note that your use case is covered by Active Support: Enumerable#index_by
Ruby >= 2.1
[UPDATE] Still no love for Enumerable#mash, but now we have Array#to_h. It creates an intermediate array, but it's better than nothing:
> object = api_response.map { |r| [r[:id], r] }.to_h
Something like:
ideal_response = api_response.group_by{|i| i[:id]}
#=> {1=>[{:id=>1, :foo=>"bar"}], 2=>[{:id=>2, :foo=>"another bar"}]}
It uses Enumerable's group_by, which works on collections, returning matches for whatever key value you want. Because it expects to find multiple occurrences of matching key-value hits it appends them to arrays, so you end up with a hash of arrays of hashes. You could peel back the internal arrays if you wanted but could run a risk of overwriting content if two of your hash IDs collided. group_by avoids that with the inner array.
Accessing a particular element is easy:
ideal_response[1][0] #=> {:id=>1, :foo=>"bar"}
ideal_response[1][0][:foo] #=> "bar"
The way you show at the end of the question is another valid way of doing it. Both are reasonably fast and elegant.
For this I'd probably just go:
ideal_response = api_response.each_with_object(Hash.new) { |o, h| h[o[:id]] = o }
Not super pretty with the multiple brackets in the block but it does the trick with just a single iteration of the api_response.
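The approaches from the answers above can be checked side by side. This sketch also uses the block form of to_h, which is an addition not mentioned in the answers and assumes Ruby 2.6 or later; the duplicates data is invented to show what happens when two records share an id.

```ruby
api_response = [
  { id: 1, foo: "bar" },
  { id: 2, foo: "another bar" }
]

by_hash_brackets = Hash[api_response.map { |r| [r[:id], r] }]
by_map_to_h = api_response.map { |r| [r[:id], r] }.to_h
by_block_to_h = api_response.to_h { |r| [r[:id], r] } # Ruby 2.6+
by_each = api_response.each_with_object({}) { |r, h| h[r[:id]] = r }

# With a duplicated id, the later record overwrites the earlier one.
duplicates = [{ id: 1, foo: "first" }, { id: 1, foo: "second" }]
last_wins = duplicates.to_h { |r| [r[:id], r] }

p by_block_to_h, last_wins
```

If ids may collide and every record must be kept, group_by is the safer choice, at the cost of the extra inner array.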
