Ruby - Merge two hashes with no like keys based on matching value

I would like to find an efficient way to merge two hashes together and the resulting hash must contain all original data and a new key/value pair based on criteria below. There are no keys in common between the two hashes, however the key in one hash matches the value of a key in the adjacent hash.
Also note that the second hash is actually an array of hashes.
I am working with a relatively large data set, so I'm looking for an efficient solution, but I'd like to keep the code readable since it will likely end up in production.
Here is the structure of my data:
# Hash
hsh1 = { "devicename1"=>"active", "devicename2"=>"passive", "devicename3"=>"passive" }
# Array of Hashes
hsh2 = [ { "host" => "devicename3", "secure" => true },
         { "host" => "devicename2", "secure" => true },
         { "host" => "devicename1", "secure" => false } ]
Here is what I need to accomplish:
I need to merge the data from hsh1 into hsh2, keeping all of the original key/value pairs in hsh2 and adding a new key called activation_status using the data in hsh1.
The resulting hsh2 would be as follows:
hsh2 = [{ "host"=>"devicename3", "secure"=>true, "activation_status"=>"passive" },
        { "host"=>"devicename2", "secure"=>true, "activation_status"=>"passive" },
        { "host"=>"devicename1", "secure"=>false, "activation_status"=>"active" }]
This may already be answered on StackOverflow but I looked for some time and couldn't find a match. My apologies in advance if this is a duplicate.

I suggest something along the lines of:
hsh3 = hsh2.map do |nestling|
  host = nestling["host"]
  status = hsh1[host]
  nestling["activation_status"] = status
  nestling
end
Which of course you can shrink down a bit. This version uses fewer variables and edits hsh2 in place:
hsh2.each do |nestling|
  nestling["activation_status"] = hsh1[nestling["host"]]
end

This will do it:
hsh2.map { |h| h.merge 'activation_status' => hsh1[h['host']] }
However, I think it will make a copy of the data instead of just walking the array of hashes and adding the appropriate key=>value pair. I don't think it would have a huge impact on performance unless your data set is large enough to consume a significant portion of the memory allocated to your app.
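If some hosts might be missing from the status hash, Hash#fetch with a default keeps the merge from silently storing nil. A minimal sketch with made-up data (the status/machines variable names and the "unknown" fallback are illustrative, not from the question):

```ruby
status   = { "devicename1" => "active", "devicename2" => "passive" }
machines = [{ "host" => "devicename1", "secure" => false },
            { "host" => "devicename9", "secure" => true }]   # no status entry for this host

# fetch with a default value avoids nil activation statuses
merged = machines.map do |m|
  m.merge("activation_status" => status.fetch(m["host"], "unknown"))
end
# => [{"host"=>"devicename1", "secure"=>false, "activation_status"=>"active"},
#     {"host"=>"devicename9", "secure"=>true, "activation_status"=>"unknown"}]
```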

Related

Unable to sort a multi-level hash (nested hash) in perl 5.30

As part of a migration from Perl 5.8 to Perl 5.30, I am unable to get a nested Perl hash in sorted order. I tried disabling the hash randomization features in 5.30 (set PERL_PERTURB_KEYS=0
and set PERL_HASH_SEED=0x00), but the sorting still doesn't apply to the multi-level/nested hash.
Apart from sorting the keys in a foreach in the Perl code, is there any other way, such as disabling environment variables or configuration, to get the values in sorted order with Perl 5.30? I also tried the deprecated Deep::Hash::Util (nested hash sort from Perl 5.6), but it did not work with the nested/multi-level hash.
Ex:
not-working:
$VAR3 = 'Mapping_1';
$VAR4 = {
  '2' => {
    'ShortName' => 'Mapping_Tx2',
    'FileName' => 'Appl_1.arxml',
  },
  '1' => {
    'ShortName' => 'Mapping_Tx1',
    'FileName' => 'Appl_1.arxml',
  }
};
working:
$VAR3 = 'Mapping_1';
$VAR4 = {
  '1' => {
    'ShortName' => 'Mapping_Tx1',
    'FileName' => 'Appl_1.arxml',
  },
  '2' => {
    'ShortName' => 'Mapping_Tx2',
    'FileName' => 'Appl_1.arxml',
  }
};
Perl's hashes are not and were not ordered. If your code relied on particular orderings in Perl 5.8, it was probably buggy and only worked by accident.
You have two reasonable options here.
You can stop trying to make the data structure itself keep a particular order, and instead choose an order when you use the data structure. For example, to iterate over all hash keys in a stable order you might use for (sort keys %hash) instead of for (keys %hash). This is almost always the correct approach.
Use a different data structure that maintains order. If you need a stable ordering but do not need fast key→value access, consider an array. Otherwise, consider creating a class that implements the data structure you need, probably by using a hash internally and also an array to store the order. Which approach to choose depends on the particular order you want. For example, Hash::Ordered maintains the insertion order.

Iterate through hashes to find values predefined in an array

I have an array with hashes:
test = [
  {"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
  {"type"=>1338, "age"=>18, "name"=>"John Doe"},
  {"type"=>1339, "age"=>22, "name"=>"Carl Adley"},
  {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}
]
I am interested in getting all the hashes where the name key equals a value that can be found in an array:
get_hash_by_name = ["John Doe","Anna Brent"]
Which would end up in the following:
# test_sorted = would be:
# {"type"=>1338, "age"=>18, "name"=>"John Doe"}
# {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}
I probably have to iterate with test.each somehow, but I'm still trying to get a grasp of Ruby. Happy for all help!
Here's something to meditate on:
Iterating over an array to find something is slow, even if it's a sorted array. Computer languages have various structures we can use to improve the speed of lookups, and in Ruby Hash is usually a good starting point. Where an Array is like reading from a sequential file, a Hash is like reading from a random-access file, we can jump right to the record we need.
Starting with your test array-of-hashes:
test = [
  {'type'=>1337, 'age'=>12, 'name'=>'Eric Johnson'},
  {'type'=>1338, 'age'=>18, 'name'=>'John Doe'},
  {'type'=>1339, 'age'=>22, 'name'=>'Carl Adley'},
  {'type'=>1340, 'age'=>25, 'name'=>'Anna Brent'},
  {'type'=>1341, 'age'=>13, 'name'=>'Eric Johnson'},
]
Notice that I added an additional "Eric Johnson" record. I'll get to that later.
I'd create a hash that mapped the array of hashes to a regular hash where the key of each pair is a unique value. The 'type' key/value pair appears to fit that need well:
test_by_types = test.map { |h| [h['type'], h] }.to_h
# => {1337=>{"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
# 1338=>{"type"=>1338, "age"=>18, "name"=>"John Doe"},
# 1339=>{"type"=>1339, "age"=>22, "name"=>"Carl Adley"},
# 1340=>{"type"=>1340, "age"=>25, "name"=>"Anna Brent"},
# 1341=>{"type"=>1341, "age"=>13, "name"=>"Eric Johnson"}}
Now test_by_types is a hash using the type value to point to the original hash.
If I create a similar hash based on names, where each name, unique or not, points to the type values, I can do fast lookups:
test_by_names = test.each_with_object(
  Hash.new { |h, k| h[k] = [] }
) { |e, h| h[e['name']] << e['type'] }
# => {"Eric Johnson"=>[1337, 1341],
# "John Doe"=>[1338],
# "Carl Adley"=>[1339],
# "Anna Brent"=>[1340]}
Notice that "Eric Johnson" points to two records.
Now, here's how we look up things:
get_hash_by_name = ['John Doe', 'Anna Brent']
test_by_names.values_at(*get_hash_by_name).flatten
# => [1338, 1340]
In one quick lookup Ruby returned the matching types by looking up the names.
We can take that output and grab the original hashes:
test_by_types.values_at(*test_by_names.values_at(*get_hash_by_name).flatten)
# => [{"type"=>1338, "age"=>18, "name"=>"John Doe"},
# {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}]
Because this is running against hashes, it's fast. The hashes can be BIG and it'll still run very fast.
Back to "Eric Johnson"...
When dealing with people's names, collisions are likely, which is why test_by_names allows multiple type values; with one lookup all the matching records can be retrieved:
test_by_names.values_at('Eric Johnson').flatten
# => [1337, 1341]
test_by_types.values_at(*test_by_names.values_at('Eric Johnson').flatten)
# => [{"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
# {"type"=>1341, "age"=>13, "name"=>"Eric Johnson"}]
This will be a lot to chew on if you're new to Ruby, but the Ruby documentation covers it all, so dig through the Hash, Array and Enumerable class documentation.
Also, *, AKA "splat", explodes the array elements from the enclosing array into separate parameters suitable for passing into a method. It's covered in Ruby's syntax documentation on calling methods.
If you're familiar with database design this will look very familiar, because it's similar to how we do database lookups.
The point of all of this is that it's really important to consider how you're going to store your data when you first ingest it into your program. Do it wrong and you'll jump through major hoops trying to do useful things with it. Do it right and the code and data will flow through very easily, and you'll be able to massage/extract/combine the data easily.
Said differently, Arrays are containers useful for holding things you want to access sequentially, such as jobs you want to print, sites you need to access in order, or files you want to delete in a specific order, but they're lousy when you want to look up and work with a record randomly.
Knowing which container is appropriate is important, and for this particular task, it appears that an array of hashes isn't appropriate, since there's no fast way of accessing specific ones.
And that's why I made my comment above asking what you were trying to accomplish in the first place. See "What is the XY problem?" and "XyProblem" for more about that particular question.
You can use select and include?, so
test.select { |object| get_hash_by_name.include? object['name'] }
…should do the job.
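For large inputs, Array#include? is a linear scan per element; converting the lookup list to a Set first keeps the same shape while giving constant-time membership tests. A sketch using the question's data (trimmed to three records):

```ruby
require 'set'

test = [
  { "type" => 1338, "age" => 18, "name" => "John Doe" },
  { "type" => 1339, "age" => 22, "name" => "Carl Adley" },
  { "type" => 1340, "age" => 25, "name" => "Anna Brent" }
]

# Set membership checks are O(1) instead of scanning the array each time
wanted = Set.new(["John Doe", "Anna Brent"])

matches = test.select { |h| wanted.include?(h["name"]) }
# => [{"type"=>1338, "age"=>18, "name"=>"John Doe"},
#     {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}]
```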

How to return array of hashes with modified values

I've been successfully converting an array of objects into an array of hashes. But I also want to modify the objects slightly as well, before getting the combined hash.
This is what I do to convert array of objects into a combined hash:
prev_vars.map(&:to_h).reduce({}, :merge)
{ "b"=>#<Money fractional:400 currency:GBP> }
But what I want instead, which requires additionally calling to_i, is:
{ "b"=> 4 }
I got this working using this line, but I am looking for a more elegant solution:
prev_vars.map(&:to_h).reduce({}) { |combined, v| combined.merge({v.keys[0] => v.values[0].to_i}) }
How large is prev_vars? map(&:to_h) could require a fair amount of memory overhead, because it instantiates an entirely new array. Instead, I'd recommend switching the order: first #reduce, then #to_h:
prev_vars.reduce({}) do |combined, var|
  combined.merge! var.to_h.transform_values!(&:to_i)
end
Note the use of #merge! rather than #merge so that a new hash is not created for combined for each iteration of the loop.
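The same idea also works with #each_with_object, which hands the accumulator to every iteration so the block doesn't need to return it. Since the Money class isn't available here, a Struct stands in for the objects in prev_vars (FakeVar is purely illustrative):

```ruby
# Hypothetical stand-in for the Money-valued objects in prev_vars;
# all that matters is that #to_h returns a one-pair hash.
FakeVar = Struct.new(:key, :val) do
  def to_h
    { key => val }
  end
end

prev_vars = [FakeVar.new("a", 3.25), FakeVar.new("b", 4.0)]

# each_with_object passes the same hash through every iteration
combined = prev_vars.each_with_object({}) do |var, acc|
  acc.merge!(var.to_h.transform_values(&:to_i))
end
# => {"a"=>3, "b"=>4}
```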

Filter large json file with Ruby

As a total beginner of programming, I am trying to filter a JSON file for my master's thesis at university. The file contains approximately 500 hashes of which 115 are the ones I am interested in.
What I want to do:
(1) Filter the file and select the hashes I am interested in
(2) For each selected hash, return only some specific keys
The format of the array with the hashes ("loans") included:
{"header": {
"total":546188,
"page":868,
"date":"2013-04-11T10:21:24Z",
"page_size":500},
"loans": [{
"id":427853,
"name":"Peter Pan",
...,
"status":"expired",
"paid_amount":525,
...,
"activity":"Construction Supplies",
"sector":"Construction"," },
... ]
}
Being specific, I would like to have the following:
(1) Filter out the "loans" hashes with "status":"expired"
(2) Return for each such "expired" loan certain keys only: "id", "name", "activity", ...
(3) Eventually, export all that into one file that I can analyse in Excel or with some stats software (SPSS or Stata)
What I have come up with myself so far is this:
require 'rubygems'
require 'json'
toberead = File.read('loans_868.json')
another = JSON.parse(toberead)
read = another.select {|hash| hash['status'] == 'expired'}
puts hash
This is obviously totally incomplete. And I feel totally lost.
Right now, I don't know where and how to continue. Despite having googled and read through tons of articles on how to filter JSON...
Is there anyone who can help me with this?
The JSON will be parsed as a hash; 'header' is one key, 'loans' is another key.
So after your JSON.parse line, you can do:
loans = another['loans']
Now loans is an array of hashes, each hash representing one of your loans.
You can then do:
expired_loans = loans.select {|loan| loan['status'] == 'expired'}
puts expired_loans
to get at your desired output.
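To cover steps (2) and (3) of the question, Hash#values_at can pull just the keys you want, and the standard csv library can write a file that Excel or a stats package will open. A sketch with an inline sample payload shaped like the question's data; swap the string for File.read('loans_868.json'), and note that wanted_keys is just an illustrative subset:

```ruby
require 'json'
require 'csv'

# Inline sample in the question's shape (replace with File.read('loans_868.json'))
json = '{"header":{"page_size":2},"loans":[' \
       '{"id":1,"name":"A","activity":"X","status":"expired"},' \
       '{"id":2,"name":"B","activity":"Y","status":"funded"}]}'

loans   = JSON.parse(json)['loans']
expired = loans.select { |loan| loan['status'] == 'expired' }

wanted_keys = %w[id name activity]   # extend with whichever keys you need
csv = CSV.generate do |out|
  out << wanted_keys                 # header row
  expired.each { |loan| out << loan.values_at(*wanted_keys) }
end
puts csv
# To write a file instead: CSV.open('expired_loans.csv', 'w') { |out| ... }
```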

Ruby: Ordered array and add to array at next number if number exists

I was wondering if anyone knew of an easy way to organize an array by numbers, but if a number already exists, push the item to the next number that doesn't exist. I was thinking of just creating a multi-dimensional ordered array where, if numbers clash (such as 2 pages having order 1), the first would be [1][1] and the second would be [1][2], but is there a better way to handle this?
Edit; an example:
page1 -> sets order to 1
page2 -> sets order to 1
page3 -> sets order to 2
Normally I would go through and read each page's YAML configuration, get the order, and then use that number to set _site.sidebar[_config["order"]], but in this case the orders would clash and the entry wouldn't be added. So I'm looking for a way to allow for user mistakes but preserve order: keep the first one found as one, but if a one already exists, shift the array down and put the second one as two.
This sounds like you're implementing a hash table, using 'number' as the hash. There are all kinds of algorithms for that; look up hash table algorithms.
Here is the final snippet on how I implemented what I was asking about, just in case somebody else stumbles upon this thread looking for the same sort of thing. I basically just wanted to preserve the order, in my actual application of the code I used a normal multi-dimensional array since "order" was pulled from YAML front so it is it's own variable.
data = []
demo = {
  "page_1" => {
    "order" => 1,
    "data" => "Hello World 1"
  },
  "page_2" => {
    "order" => 2,
    "data" => "Hello World 2"
  },
  "page_3" => {
    "order" => 1,
    "data" => "Hello World 3"
  },
  "page_4" => {
    "order" => "a",
    "data" => "Hello World 4"
  }
}
demo.each do |key, page|
  local_data = page["data"]
  order = page["order"].to_i
  data[order] ||= []
  data[order] << local_data
end
puts data.flatten.join(" ").strip
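An alternative that skips the intermediate buckets: sorting on the pair [order, original index] is stable for duplicate orders, so the first page found keeps its place ahead of later pages with the same order. A small sketch with illustrative data (the pages pairs stand in for the name/order values pulled from the YAML front matter):

```ruby
# Each pair is [page_name, declared_order]; page_1 and page_2 both claim order 1
pages = [["page_3", 2], ["page_1", 1], ["page_2", 1]]

# Sort by declared order; ties fall back to original position, so
# the first-found page with a duplicate order comes first
ordered = pages.each_with_index
               .sort_by { |(_, order), idx| [order, idx] }
               .map     { |(name, _), _| name }
# => ["page_1", "page_2", "page_3"]
```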