Ruby Hash: type casting - ruby

I'm trying to get a better grasp on writing Ruby and working with hashes and their values.
1. Say you have a hash:
FOO = {'baz' => [1,2,3,4,5]}
Goal: convert each value into a string in the 'Ruby' way.
I've come across multiple examples of using .each, e.g.
FOO.each { |k,v| FOO[k] = v.to_s }
However, this renders the array encapsulated in a string, e.g. "[1, 2, 3, 4, 5]", where it should be ["1", "2", "3", "4", "5"].
2. When type casting is performed on a hash that holds an array of values, is the result a new array, or simply a change in the type of each value (e.g. 1 becomes "1" when .to_s is applied, say when the value is passed through an each enumerator like the one above)?
An explanation would be greatly appreciated. I'm new to Ruby.

In the each block, k and v are the key-value pair. In your case, 'baz' is the key and [1,2,3,4,5] is the value. Since you're calling v.to_s, it converts the whole array to a string rather than the individual elements.
You can do something like this to achieve what you want.
foo = { 'baz' => [1,2,3,4,5] }
foo.each { |k, v| foo[k] = v.map(&:to_s) }
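On the second question, a minimal sketch (the variable names original, same and same_id are only illustrative) suggests that map builds a new array of new strings and leaves the integer array alone, whereas map! replaces the elements inside the same array object:

```ruby
foo = { 'baz' => [1, 2, 3, 4, 5] }
original = foo['baz']               # keep a reference to the integer array

foo.each { |k, v| foo[k] = v.map(&:to_s) }

foo['baz']                          #=> ["1", "2", "3", "4", "5"]
original                            #=> [1, 2, 3, 4, 5] (untouched)
foo['baz'].equal?(original)         #=> false, map returned a new array

# map! swaps the elements inside the existing array instead
same = [1, 2, 3]
same_id = same.object_id
same.map!(&:to_s)
same                                #=> ["1", "2", "3"]
same.object_id == same_id           #=> true
```

Either way the integers themselves are not changed; to_s returns a new String for each of them.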

You can use Hash#transform_values:
foo = { 'baz' => [1, 2, 3, 4, 5] }
foo.transform_values { |v| v.map(&:to_s) } #=> {"baz"=>["1", "2", "3", "4", "5"]}
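Note that transform_values returns a new hash and leaves foo as it was. A short sketch, assuming Ruby 2.4 or later (where both methods are available), shows the difference from the in-place variant transform_values!:

```ruby
foo = { 'baz' => [1, 2, 3, 4, 5] }

converted = foo.transform_values { |v| v.map(&:to_s) }
converted['baz']   #=> ["1", "2", "3", "4", "5"]
foo['baz']         #=> [1, 2, 3, 4, 5] (receiver unchanged)

# The bang version rewrites the values of the receiver itself
foo.transform_values! { |v| v.map(&:to_s) }
foo['baz']         #=> ["1", "2", "3", "4", "5"]
```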

Related

Ruby using regex in select block

I've been having a lot of trouble sifting out regex matches. I could use scan, but since it only operates on a string and I don't want to join the array in question, that is much more tedious. I want to be able to do something like this:
array = ["a1d", "6dh","th3"].select{|x| x =~ /\d/}
# => ["1", "6", "3"]
However, this never seems to work. Is there a workaround, or do I just need to use scan?
Try: Array#map
> array = ["a1d", "6dh","th3"].map {|x| x[/\d+/]}
#=> ["1", "6", "3"]
Note: select returns a new array containing all elements of ary for which the given block returns a true value.
In your case every element contains a digit, so the block returns a truthy value for each one and select hands back the original elements. map, on the other hand, runs the block on each element and returns a new array containing the block's results.
You can use grep with a block:
array = ["a1d", "6dh", "th3"]
array.grep(/(\d)/) { $1 }
#=> ["1", "6", "3"]
It passes each matching element to the block and returns an array containing the block's results.
$1 is a special global variable containing the first capture group.
Unlike map, only matching elements are returned:
array = ["a1d", "foo", "6dh", "bar", "th3"]
array.grep(/(\d)/) { $1 }
#=> ["1", "6", "3"]
array.map { |s| s[/\d/] }
#=> ["1", nil, "6", nil, "3"]
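If Ruby 2.7 or later is available (an assumption about your environment), Enumerable#filter_map is another way to get the same effect without the $1 global: it keeps the block's result only when that result is truthy. A small sketch:

```ruby
array = ["a1d", "foo", "6dh", "bar", "th3"]

# s[/\d/] is nil for "foo" and "bar", so filter_map drops those entries
digits = array.filter_map { |s| s[/\d/] }
#=> ["1", "6", "3"]

# On older Rubies, map followed by compact behaves the same way here
digits_compat = array.map { |s| s[/\d/] }.compact
#=> ["1", "6", "3"]
```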
Depending on your requirements, you may wish to construct a hash.
arr = ["a1d", "6dh", "th3", "abc", "3for", "rg6", "def"]
arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |str,h| h[str[/\d+/]] << str }
#=> {"1"=>["a1d"], "6"=>["6dh", "rg6"], "3"=>["th3", "3for"], nil=>["abc", "def"]}
Hash.new { |h,k| h[k] = [] } creates an empty hash with a default block, in which the block variable h is the hash itself. If the hash does not have a key k, the block is executed, adding the key-value pair k=>[] to the hash, after which h[k] << str is executed.
The above is a condensed (and Ruby-like) way of writing the following.
h = {}
arr.each do |str|
  s = str[/\d+/]
  h[s] = [] unless h.key?(s)
  h[s] << str
end
h
# => {"1"=>["a1d"], "6"=>["6dh", "rg6"], "3"=>["th3", "3for"], nil=>["abc", "def"]}
The each_with_object expression could alternatively be written
arr.each_with_object({}) { |str,h| (h[str[/\d+/]] ||= []) << str }
h[str[/\d+/]] ||= [] sets h[str[/\d+/]] to an empty array if the hash h does not have a key str[/\d+/].
See Enumerable#each_with_object and Hash::new.
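The behaviour of the default block can be observed in isolation. The sketch below also contrasts it with Hash.new([]), which is a common pitfall: there a single default array is shared and the key is never stored. The variable names are only illustrative.

```ruby
h = Hash.new { |hash, key| hash[key] = [] }
h.key?("1")        #=> false
h["1"] << "a1d"    # missing key: the block runs and stores [] first
h["1"] << "x1"     # key exists now, so the block is not run
h                  #=> {"1"=>["a1d", "x1"]}

shared = Hash.new([])
shared["1"] << "a1d"   # mutates the one shared default array
shared                 #=> {} (no key was ever stored)
shared["2"]            #=> ["a1d"]
```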
@Stefan suggests
arr.group_by { |str| str[/\d+/] }
#=> {"1"=>["a1d"], "6"=>["6dh", "rg6"], "3"=>["th3", "3for"], nil=>["abc", "def"]}
What can I say?

Flatten deep nested hash to array for sha1 hashing

I want to compute a unique SHA1 hash from a Ruby hash. I thought about:
1. (Deep) converting the hash into an array
2. Sorting the array
3. Joining the array with an empty string
4. Calculating the SHA1
Consider the following hash:
hash = {
foo: "test",
bar: [1,2,3]
hello: {
world: "world",
arrays: [
{foo: "bar"}
]
}
}
How can I get this kind of nested hash into an array like
[:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
I would then sort the array, join it with array.join("") and compute the sha1 hash like this:
require 'digest/sha1'
Digest::SHA1.hexdigest hash_string
How could I flatten the hash like I described above?
Is there already a gem for this?
Is there a quicker / easier way to solve this? I have a large amount of objects to convert (~700k), so performance does matter.
EDIT
Another problem that the answers below made me notice concerns these two hashes:
a = {a: "a", b: "b"}
b = {a: "b", b: "a"}
When flattening and sorting, these two hashes produce the same output, even though a == b => false.
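The collision can be reproduced with a few lines. This is only a sketch of the flatten, sort and join steps described above; flat_a and flat_b are illustrative names:

```ruby
require 'digest/sha1'

a = { a: "a", b: "b" }
b = { a: "b", b: "a" }

# Flatten to [key, value, key, value], stringify, sort, join
flat_a = a.flatten.map(&:to_s).sort.join   #=> "aabb"
flat_b = b.flatten.map(&:to_s).sort.join   #=> "aabb"

a == b                                                         #=> false
Digest::SHA1.hexdigest(flat_a) == Digest::SHA1.hexdigest(flat_b) #=> true
```

Sorting the whole flattened array throws away which value belonged to which key, which is why the two digests coincide.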
EDIT 2
The use case for this whole thing is product data comparison. The product data is stored inside a hash, then serialized and sent to a service that creates / updates the product data.
I want to check whether anything has changed inside the product data, so I generate a hash from the product content and store it in a database. The next time the same product is loaded, I calculate the hash again, compare it to the one in the DB and decide whether the product needs an update or not.
EDIT: As you detailed, two hashes with keys in a different order should give the same string. I would reopen the Hash class to add a custom flatten method:
class Hash
  def custom_flatten()
    self.sort.map{|pair| ["key: #{pair[0]}", pair[1]]}.flatten.map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }.flatten
  end
end
Explanation :
sort converts the hash to a sorted array of pairs (so that hashes with the same keys in a different order compare equal)
.map{|pair| ["key: #{pair[0]}", pair[1]]} is a trick to differentiate keys from values in the final flattened array, to avoid the problem of {a: {b: {c: :d}}}.custom_flatten == {a: :b, c: :d}.custom_flatten
flatten converts an array of arrays into a single array of values
map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem } calls custom_flatten recursively on any sub-hash that is left.
Then you just need to use :
require 'digest/sha1'
Digest::SHA1.hexdigest hash.custom_flatten.to_s
I am not aware of a gem that does what you are looking for. There is a Hash#flatten method in Ruby, but it does not flatten nested hashes recursively. Here is a straightforward recursive function that will flatten in the way that you requested in your question:
def completely_flatten(hsh)
  hsh.flatten(-1).map{|el| el.is_a?(Hash) ? completely_flatten(el) : el}.flatten
end
This will yield
hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "earth",
    arrays: [
      {my: "example"}
    ]
  }
}
completely_flatten(hash)
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
To get the string representation you are looking for (before computing the SHA1 hash), convert every element of the array to a string before sorting. Otherwise elements of different types cannot be compared with each other and sort raises an error:
hash_string = completely_flatten(hash).map(&:to_s).sort.join
#=> "123arraysbarearthexamplefoohellomytestworld"
The question is how to "flatten" a hash. There is a second, implicit, question concerning sha1, but, by SO rules, that needs to be addressed in a separate question. You can "flatten" any hash or array as follows.
Code
def crush(obj)
  recurse(obj).flatten
end
def recurse(obj)
  case obj
  when Array then obj.map { |e| recurse e }
  when Hash then obj.map { |k,v| [k, recurse(v)] }
  else obj
  end
end
Example
crush({
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "earth",
    arrays: [{my: "example"}]
  }
})
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
crush([[{ a:1, b:2 }, "cat", [3,4]], "dog", { c: [5,6] }])
#=> [:a, 1, :b, 2, "cat", 3, 4, "dog", :c, 5, 6]
Use Marshal for Fast Serialization
You haven't articulated a useful reason to change your data structure before hashing. Therefore, you should consider marshaling for speed unless your data structures contain unsupported objects like bindings or procs. For example, using your hash variable with the syntax corrected:
require 'digest/sha1'
hash = {
  foo: "test",
  bar: [1,2,3],
  hello: {
    world: "world",
    arrays: [
      {foo: "bar"}
    ]
  }
}
Digest::SHA1.hexdigest Marshal.dump(hash)
#=> "f50bc3ceb514ae074a5ab9672ae5081251ae00ca"
Marshal is generally faster than other serialization options. If all you need is speed, that will be your best bet. However, you may find that JSON, YAML, or a simple #to_s or #inspect meet your needs better for other reasons. As long as you are comparing similar representations of your object, the internal format of the hashed object is largely irrelevant to ensuring you have a unique or unmodified object.
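One caveat worth checking for the comparison use case described in the question: Marshal.dump writes hash entries in insertion order, so two hashes that are == can still serialize, and therefore digest, differently. A minimal sketch with illustrative variables x and y:

```ruby
require 'digest/sha1'

x = { a: 1, b: 2 }
y = { b: 2, a: 1 }   # same pairs, inserted in a different order

x == y                                   #=> true
Marshal.dump(x) == Marshal.dump(y)       #=> false

digest_x = Digest::SHA1.hexdigest(Marshal.dump(x))
digest_y = Digest::SHA1.hexdigest(Marshal.dump(y))
digest_x == digest_y                     #=> false
```

If the data is always built in the same order this does not matter; otherwise the keys would need to be put into a canonical order before dumping.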
Any solution based on flattening the hash will fail for nested hashes. A robust solution is to explicitly sort the keys of each hash recursively (from ruby 1.9.x onwards, hash keys order is preserved), and then serialize it as a string and digest it.
require 'digest/sha1'
def canonize_hash(h)
  r = h.map { |k, v| [k, v.is_a?(Hash) ? canonize_hash(v) : v] }
  Hash[r.sort]
end
def digest_hash(hash)
  Digest::SHA1.hexdigest canonize_hash(hash).to_s
end
digest_hash({ foo: "foo", bar: "bar" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
digest_hash({ bar: "bar", foo: "foo" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
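The canonize_hash method above does not descend into arrays, so hashes nested inside an array keep their original key order. A possible extension is sketched below; deep_canonize and deep_digest are hypothetical helper names, keys are ordered by their to_s form (so :a and "a" would tie), and array element order is deliberately treated as significant:

```ruby
require 'digest/sha1'

# Hypothetical helper: rebuilds obj with every hash's keys in sorted order,
# descending into arrays as well as hashes.
def deep_canonize(obj)
  case obj
  when Hash
    obj.sort_by { |k, _| k.to_s }.map { |k, v| [k, deep_canonize(v)] }.to_h
  when Array
    obj.map { |e| deep_canonize(e) }
  else
    obj
  end
end

# Hypothetical helper: digest of the canonical form's inspect string
def deep_digest(obj)
  Digest::SHA1.hexdigest(deep_canonize(obj).inspect)
end

p1 = { name: "x", variants: [{ size: 1, color: "red" }] }
p2 = { variants: [{ color: "red", size: 1 }], name: "x" }

deep_digest(p1) == deep_digest(p2)                                   #=> true
deep_digest({ a: "a", b: "b" }) == deep_digest({ a: "b", b: "a" })   #=> false
```

Because the digest is taken from the inspect output, stored digests are only comparable as long as they were produced by a Ruby version with the same inspect format.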

Sorting an array of hashes by a date field

I have an object with many arrays of hashes, one of which I want to sort by a value in the 'date' key.
@array['info'][0] = {"name"=>"personA", "date"=>"23/09/1980"}
@array['info'][1] = {"name"=>"personB", "date"=>"01/04/1970"}
@array['info'][2] = {"name"=>"personC", "date"=>"03/04/1975"}
I have tried various methods using Date.parse and collect, but am unable to find a good solution.
Edit:
To be clear I want to sort the original array in place
@array['info'].sort_by { |i| Date.parse i['date'] }.collect
How might one solve this elegantly, the 'Ruby-ist' way? Thanks
Another way, which doesn't require converting the date strings to date objects, is the following.
Code
def sort_by_date(arr)
  arr.sort_by { |h| h["date"].split('/').reverse }
end
If arr is to be sorted in place, use Array#sort_by! rather than Enumerable#sort_by.
Example
arr = [{ "name"=>"personA", "date"=>"23/09/1980" },
{ "name"=>"personB", "date"=>"01/04/1970" },
{ "name"=>"personC", "date"=>"03/04/1975" }]
sort_by_date(arr)
#=> [{ "name"=>"personB", "date"=>"01/04/1970" },
# { "name"=>"personC", "date"=>"03/04/1975" },
# { "name"=>"personA", "date"=>"23/09/1980" }]
Explanation
For arr in the example, sort_by passes the first element of arr into its block and assigns it to the block variable:
h = { "name"=>"personA", "date"=>"23/09/1980" }
then computes:
a = h["date"].split('/')
#=> ["23", "09", "1980"]
and then:
b = a.reverse
#=> ["1980", "09", "23"]
Similarly, we obtain b equal to:
["1970", "04", "01"]
and
["1975", "04", "03"]
for each of the other two elements of arr.
If you look at the docs for Array#<=> you will see that these three arrays are ordered as follows:
["1970", "04", "01"] < ["1975", "04", "03"] < ["1980", "09", "23"]
There is no need to convert the string elements to integers.
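The ordering can be checked directly with Array#<=>, which compares element by element. The sketch below also shows why the string comparison relies on the date parts being zero-padded to a fixed width, which is true of the data in the question:

```ruby
a = ["1970", "04", "01"]
b = ["1975", "04", "03"]
c = ["1980", "09", "23"]

a <=> b          #=> -1
b <=> c          #=> -1
[c, a, b].sort   #=> [a, b, c]

# Strings compare character by character, so unpadded parts would misorder
"9" <=> "10"     #=> 1
"09" <=> "10"    #=> -1
```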
Looks fine overall, although you can drop the collect call, since it's not needed, and use sort_by! to modify the array in place (instead of reassigning):
@array['info'].sort_by! { |x| Date.parse x['date'] }
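If you would rather not depend on how Date.parse guesses the day and month order, Date.strptime with an explicit format is one option. A minimal sketch, in which the local variable info stands in for @array['info']:

```ruby
require 'date'

info = [
  { "name" => "personA", "date" => "23/09/1980" },
  { "name" => "personB", "date" => "01/04/1970" },
  { "name" => "personC", "date" => "03/04/1975" }
]

# '%d/%m/%Y' states explicitly that the strings are day/month/year
info.sort_by! { |h| Date.strptime(h["date"], '%d/%m/%Y') }

info.map { |h| h["name"] }   #=> ["personB", "personC", "personA"]
```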

add up values from 2 arrays based on duplicate values of the other one

A similar question has been answered here. However, I'd like to know how I can add up/group the numbers from one array based on the duplicate values of another array.
test_names = ["TEST1", "TEST1", "TEST2", "TEST3", "TEST2", "TEST4", "TEST4", "TEST4"]
numbers = ["5", "4", "3", "2", "9", "7", "6", "1"]
The ideal result I'd like to get is a hash or an array with:
{"TEST1" => 9, "TEST2" => 12, "TEST3" => 2, "TEST4" => 14}
Another way I found you can do:
test_names.zip(numbers).each_with_object(Hash.new(0)) {
|arr, hsh| hsh[arr[0]] += arr[1].to_i }
You can do it like this:
my_hash = Hash.new(0)
test_names.each_with_index {|name, index| my_hash[name] += numbers[index].to_i}
my_hash
#=> {"TEST1"=>9, "TEST2"=>12, "TEST3"=>2, "TEST4"=>14}
I wish to follow @squidguy's example and use Enumerable#zip, but with a different twist:
{}.tap { |h| test_names.zip(numbers.map(&:to_i)) { |a|
h.update([a].to_h) { |_,o,n| o+n } } }
#=> {"TEST1"=>9, "TEST2"=>12, "TEST3"=>2, "TEST4"=>14}
Object#tap is here just a substitute for Enumerable#each_with_object or for having h={} initially and a last line with just h.
I'm using the form of Hash#update (aka merge!) that takes a block for determining the merged value for each key that is present in both the original hash (h) and the hash being merged ([a].to_h). There are three block variables, the shared key (which we don't use here, so I've replaced it with the placeholder _), and the values for that key for the original hash (o) and for the hash being merged (n).
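For comparison, the same totals can also be reached by grouping the zipped pairs by name and summing each group. This sketch assumes Ruby 2.4 or later for transform_values and Array#sum; totals is an illustrative name:

```ruby
test_names = ["TEST1", "TEST1", "TEST2", "TEST3", "TEST2", "TEST4", "TEST4", "TEST4"]
numbers    = ["5", "4", "3", "2", "9", "7", "6", "1"]

totals = test_names.zip(numbers)            # [["TEST1", "5"], ["TEST1", "4"], ...]
                   .group_by(&:first)       # name => array of [name, number] pairs
                   .transform_values { |pairs| pairs.sum { |_, n| n.to_i } }
#=> {"TEST1"=>9, "TEST2"=>12, "TEST3"=>2, "TEST4"=>14}
```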

Creating a hash from two arrays with identical values in Ruby

I'm having issues creating a hash from 2 arrays when values are identical in one of the arrays.
e.g.
names = ["test1", "test2"]
numbers = ["1", "2"]
Hash[names.zip(numbers)]
works perfectly; it gives me exactly what I need: {"test1"=>"1", "test2"=>"2"}
However, if the values in names are identical, it doesn't work correctly:
names = ["test1", "test1"]
numbers = ["1", "2"]
Hash[names.zip(numbers)]
shows {"test1"=>"2"}; however, I expect the result to be {"test1"=>"1", "test1"=>"2"}
Any help is appreciated.
Hashes can't have duplicate keys. Ever.
If they were permitted, how would you access "2"? If you write myhash["test1"], which value would you expect?
Rather, if you expect to have several values under one key, make a hash of arrays.
names = ["test1", "test1", "test2"]
numbers = ["1", "2", "3"]
Hash.new.tap { |h| names.zip(numbers).each { |k, v| (h[k] ||= []) << v } }
# => {"test1"=>["1", "2"], "test2"=>["3"]}
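The overwrite and one alternative way of building the hash of arrays can be seen side by side. This is a sketch assuming Ruby 2.4 or later for transform_values; overwritten and grouped are illustrative names:

```ruby
names   = ["test1", "test1", "test2"]
numbers = ["1", "2", "3"]

# With a plain hash, the later pair for "test1" replaces the earlier one
overwritten = Hash[names.zip(numbers)]
#=> {"test1"=>"2", "test2"=>"3"}

# Grouping the pairs by name keeps every value
grouped = names.zip(numbers)
               .group_by(&:first)
               .transform_values { |pairs| pairs.map(&:last) }
#=> {"test1"=>["1", "2"], "test2"=>["3"]}
```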
