categorize by hash value - ruby

I have an array of hashes with values like:
by_person = [{ :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"]}, {:person => "John Doe", :filenames => ["Report.pdf"] }]
I would like to end up with another array of hashes (by_file) that has one entry per unique filename, each listing the people associated with that file:
by_file = [{ :filename => "Report.pdf", :people => ["Jane Smith", "John Doe"] }, { :filename => "File2.pdf", :people => ["Jane Smith"] }]
I have tried:
by_file = []
by_person.each do |person|
person[:filenames].each do |file|
unless by_file.include?(file)
# list people that are included in file
by_person_each_file = by_person.select{|person| person[:filenames].include?(file)}
by_person_each_file.each do |person|
by_file << {
:file => file,
:people => person[:person]
}
end
end
end
end
as well as:
by_file.map(&:to_a).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
Any feedback is appreciated, thanks!

Doesn't seem too tricky, but the way you're compiling it isn't very efficient:
by_person = [{ :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"]}, {:person => "John Doe", :filenames => ["Report.pdf"] }]
by_file = by_person.each_with_object({ }) do |entry, index|
entry[:filenames].each do |filename|
set = index[filename] ||= [ ]
set << entry[:person]
end
end.collect do |filename, people|
{
filename: filename,
people: people
}
end
puts by_file.inspect
# => [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]}, {:filename=>"File2.pdf", :people=>["Jane Smith"]}]
This makes use of a hash to group the people by filename, essentially inverting your structure, and then converts that into the final format in a second pass. This is more efficient than working with the final format during compilation as that's not indexed and requires an expensive linear search to find the correct container to insert into.
An alternate method is to create a default hash constructor that makes the structure you're looking for:
by_file_hash = Hash.new do |h,k|
h[k] = {
filename: k,
people: [ ]
}
end
by_person.each do |entry|
entry[:filenames].each do |filename|
by_file_hash[filename][:people] << entry[:person]
end
end
by_file = by_file_hash.values
puts by_file.inspect
# => [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]}, {:filename=>"File2.pdf", :people=>["Jane Smith"]}]
This may or may not be easier to understand.
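A third variation, offered only as a sketch, flattens the data into [filename, person] pairs first and lets Enumerable#group_by do the indexing. The variable name pairs is just illustrative and not from the question:

```ruby
by_person = [
  { :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"] },
  { :person => "John Doe",   :filenames => ["Report.pdf"] }
]

# Step 1: one [filename, person] pair per file a person is attached to.
pairs = by_person.flat_map do |entry|
  entry[:filenames].map { |filename| [filename, entry[:person]] }
end

# Step 2: group the pairs by filename, then keep only the person names.
by_file = pairs.group_by(&:first).map do |filename, grouped|
  { filename: filename, people: grouped.map(&:last) }
end

puts by_file.inspect
```

This builds an intermediate array of pairs, so it trades a little memory for code that reads as two separate steps.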

This is one way to do it.
Code
def convert(by_person)
by_person.each_with_object({}) do |hf,hp|
hf[:filenames].each do |fname|
hp.update({ fname=>[hf[:person]] }) { |_,oh,nh| oh+nh }
end
end.map { |fname,people| { :filename => fname, :people=>people } }
end
Example
by_person = [{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]},
{:person=>"John Doe", :filenames=>["Report.pdf"]}]
convert(by_person)
#=> [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]},
# {:filename=>"File2.pdf", :people=>["Jane Smith"]}]
Explanation
For by_person in the example:
enum1 = by_person.each_with_object({})
#=> #<Enumerator: [{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]},
#     {:person=>"John Doe", :filenames=>["Report.pdf"]}]:each_with_object({})>
Let's see what values the enumerator enum1 will pass into the block:
enum1.to_a
#=> [[{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]}, {}],
# [{:person=>"John Doe", :filenames=>["Report.pdf"]}, {}]]
As will be shown below, the empty hash in the first element of the enumerator will no longer be empty when the second element is passed into the block.
The first element is assigned to the block variables as follows (I've indented to indicate the block level):
hf = {:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]}
hp = {}
enum2 = hf[:filenames].each
#=> #<Enumerator: ["Report.pdf", "File2.pdf"]:each>
enum2.to_a
#=> ["Report.pdf", "File2.pdf"]
"Report.pdf" is passed to the inner block, assigned to the block variable:
fname = "Report.pdf"
and
hp.update({ "Report.pdf"=>["Jane Smith"] }) { |_,oh,nh| oh+nh }
#=> {"Report.pdf"=>["Jane Smith"]}
is executed, returning the updated value of hp.
Here the block for Hash#update (aka Hash#merge!) is not consulted. It is only needed when the hash hp and the merging hash (here { fname=>["Jane Smith"] }) have one or more common keys. For each common key, the key and the corresponding values from the two hashes are passed to the block. This is elaborated below.
Next, enum2 passes "File2.pdf" into the block and assigns it to the block variable:
fname = "File2.pdf"
and executes
hp.update({ "File2.pdf"=>["Jane Smith"] }) { |_,oh,nh| oh+nh }
#=> {"Report.pdf"=>["Jane Smith"], "File2.pdf"=>["Jane Smith"]}
which returns the updated value of hp. Again, update's block was not consulted. We're now finished with Jane, so enum1 next passes its second and last value into the block and assigns the block variables as follows:
hf = {:person=>"John Doe", :filenames=>["Report.pdf"]}
hp = {"Report.pdf"=>["Jane Smith"], "File2.pdf"=>["Jane Smith"]}
Note that hp has now been updated. We then have:
enum2 = hf[:filenames].each
#=> #<Enumerator: ["Report.pdf"]:each>
enum2.to_a
#=> ["Report.pdf"]
enum2 assigns
fname = "Report.pdf"
and executes:
hp.update({ "Report.pdf"=>["John Doe"] }) { |_,oh,nh| oh+nh }
#=> {"Report.pdf"=>["Jane Smith", "John Doe"], "File2.pdf"=>["Jane Smith"]}
In making this update, hp and the hash being merged both have the key "Report.pdf". The following values are therefore passed to the block variables |k,oh,nh|:
k = "Report.pdf"
oh = ["Jane Smith"]
nh = ["John Doe"]
We don't need the key, so I've replaced it with an underscore. The block returns
["Jane Smith"]+["John Doe"] #=> ["Jane Smith", "John Doe"]
which becomes the new value for the key "Report.pdf".
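The two cases (no common key versus a common key) can be checked in isolation. This small sketch uses a hypothetical counter, calls, that is not part of the answer's code, to record how often the block runs:

```ruby
hp = { "Report.pdf" => ["Jane Smith"] }
calls = 0 # hypothetical counter, only here to observe the block

# No common key: the block is not called, the new pair is simply added.
hp.update({ "File2.pdf" => ["Jane Smith"] }) { |_, oh, nh| calls += 1; oh + nh }
calls_after_first = calls

# Common key "Report.pdf": the block receives the key, the old value and
# the new value, and its return value becomes the stored value.
hp.update({ "Report.pdf" => ["John Doe"] }) { |_, oh, nh| calls += 1; oh + nh }

puts calls_after_first
puts calls
puts hp.inspect
```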
Before turning to the final step, I'd like to suggest that you consider stopping here. That is, rather than constructing an array of hashes, one for each file, just leave it as a hash with the files as keys and arrays of persons the values:
{ "Report.pdf"=>["Jane Smith", "John Doe"], "File2.pdf"=>["Jane Smith"] }
The final step is straightforward:
hp.map { |fname,people| { :filename => fname, :people=>people } }
#=> [{ :filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"] },
# { :filename=>"File2.pdf", :people=>["Jane Smith"] }]

Related

Questions on implementing hashes in ruby

I'm new to Ruby and am working on a problem that involves hashes and keys. The problem asks me to implement a method, #pet_types, that accepts a hash as an argument. The hash uses people's names as keys, and the values are arrays of pet types that the person owns. My question is about using Hash#each to iterate through each element inside the arrays. Is there any difference between solving the problem with hash.each and with hash.sort.each?
I spent several hours coming up with different solutions and still cannot figure out what the difference is between the two approaches below.
I include my code in repl.it: https://repl.it/H0xp/6 or you can see below:
# Pet Types
# ------------------------------------------------------------------------------
# Implement a method, #pet_types, that accepts a hash as an argument. The hash uses people's
# names as keys, and the values are arrays of pet types that the person owns.
# Example input:
# {
# "yi" => ["dog", "cat"],
# "cai" => ["dog", "cat", "mouse"],
# "venus" => ["mouse", "pterodactyl", "chinchilla", "cat"]
# }
def pet_types(owners_hash)
results = Hash.new {|h, k| h[k] = [ ] }
owners_hash.sort.each { |k, v| v.each { |pet| results[pet] << k } }
results
end
puts "-------Pet Types-------"
owners_1 = {
"yi" => ["cat"]
}
output_1 = {
"cat" => ["yi"]
}
owners_2 = {
"yi" => ["cat", "dog"]
}
output_2 = {
"cat" => ["yi"],
"dog" => ["yi"]
}
owners_3 = {
"yi" => ["dog", "cat"],
"cai" => ["dog", "cat", "mouse"],
"venus" => ["mouse", "pterodactyl", "chinchilla", "cat"]
}
output_3 = {
"dog" => ["cai", "yi"],
"cat" => ["cai", "venus", "yi"],
"mouse" => ["cai", "venus"],
"pterodactyl" => ["venus"],
"chinchilla" => ["venus"]
}
# method 2
# The 2nd and 3rd method should return a hash that uses the pet types as keys and the values should
# be a list of the people that own that pet type. The names in the output hash should
# be sorted alphabetically
# switched_hash = Hash.new()
# owners_hash.each do |owner, pets_array|
# pets_array.each do |pet|
# select_owners = owners_hash.select { |owner, pets_array| owners_hash[owner].include?(pet) }
# switched_hash[pet] = select_owners.keys.sort
# end
# end
# method 3
#switched_hash
# pets = Hash.new {|h, k| h[k] = [ ] } # NOT the same as: pets = Hash.new( Array.new ), which shares one array across all keys
# owners = owners_hash.keys.sort
# owners.each do |owner|
# owners_hash[owner].each do |pet|
# pets[pet] << owner
# end
# end
# pets
# Example output:
# output_3 = {
# "dog" => ["cai", "yi"],
# "cat" => ["cai", "venus", "yi"], ---> (sorted alphabetically!)
# "mouse" => ["cai", "venus"],
# "pterodactyl" => ["venus"],
# "chinchilla" => ["venus"]
# }
I first solved this problem using a hash with a default block. Then I tried to rewrite it using pets_hash. My final code is the following:
def pet_types(owners_hash)
pets_hash = Hash.new { |k, v| v = [] }
owners_hash.each do |owner, pets|
pets.each do |pet|
pets_hash[pet] += [owner]
end
end
pets_hash.values.each(&:sort!)
pets_hash
end
puts "-------Pet Types-------"
owners_1 = {
"yi" => ["cat"]
}
output_1 = {
"cat" => ["yi"]
}
owners_2 = {
"yi" => ["cat", "dog"]
}
output_2 = {
"cat" => ["yi"],
"dog" => ["yi"]
}
owners_3 = {
"yi" => ["dog", "cat"],
"cai" => ["dog", "cat", "mouse"],
"venus" => ["mouse", "pterodactyl", "chinchilla", "cat"]
}
output_3 = {
"dog" => ["cai", "yi"],
"cat" => ["cai", "venus", "yi"],
"mouse" => ["cai", "venus"],
"pterodactyl" => ["venus"],
"chinchilla" => ["venus"]
}
puts pet_types(owners_1) == output_1
puts pet_types(owners_2) == output_2
puts pet_types(owners_3) == output_3
Hash#sort has the same effect (at least for my basic test) as Hash#to_a followed by Array#sort.
hash = {b: 2, a: 1}
hash.to_a.sort # => [[:a, 1], [:b, 2]]
hash.sort # => the same
Now let's look at #each, both on Hash and Array.
When you provide two arguments to the block, that can handle both cases. For the hash, the first argument will be the key and the second will be the value. For the nested array, the values essentially get splatted out to the args:
[[:a, 1, 2], [:b, 3, 4]].each { |x, y, z| puts "#{x}-#{y}-#{z}" }
# => a-1-2
# => b-3-4
So basically, you should think of Hash#sort as a shortcut for Hash#to_a followed by Array#sort, and recognize that #each works the same on a hash as on a hash converted to an array (a nested array). In this case, it doesn't matter which approach you take. Clearly, if you need to iterate in key order, then you should use sort.
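To make the comparison concrete, here is a sketch of both approaches side by side. The method names pet_types_sorted_first and pet_types_sorted_after are made up for this illustration:

```ruby
# Sort the owners before iterating, so names are appended in order.
def pet_types_sorted_first(owners_hash)
  results = Hash.new { |h, k| h[k] = [] }
  owners_hash.sort.each { |owner, pets| pets.each { |pet| results[pet] << owner } }
  results
end

# Iterate in insertion order, then sort each list of names afterwards.
def pet_types_sorted_after(owners_hash)
  results = Hash.new { |h, k| h[k] = [] }
  owners_hash.each { |owner, pets| pets.each { |pet| results[pet] << owner } }
  results.each_value(&:sort!)
  results
end

owners = {
  "yi"    => ["dog", "cat"],
  "cai"   => ["dog", "cat", "mouse"],
  "venus" => ["mouse", "pterodactyl", "chinchilla", "cat"]
}

first = pet_types_sorted_first(owners)
after = pet_types_sorted_after(owners)
puts first == after # Hash#== ignores key order
```

The two results can differ in the order of their keys (the pet types), because the owners are visited in a different order, but Hash#== does not take key order into account.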

Extracting Data from array of hashes Ruby

Given that I have the following array of hashes
@response = { "0"=>{"forename_1"=>"John", "surname_1"=>"Smith"},
"1"=>{"forename_1"=>"Chris", "surname_1"=>"Jenkins"},
"2"=>{"forename_1"=>"Billy", "surname_1"=>"Bob"},
"Status" => 100
}
I am looking to create an array of the forename_1 and surname_1 values combined, so desired output would be
["John Smith", "Chris Jenkins", "Billy Bob"]
So I can get this far, but need further assistance
# Delete the Status as not needed
@response.delete_if { |k| ["Status"].include? k }
@response.each do |key, value|
puts key
#This will print out 0 1 2
puts value
# This will print {"forename_1"=>"John", "surname_1"=>"Smith"}, {"forename_1"=>"Chris", "surname_1"=>"Jenkins"}, {"forename_1"=>"Billy", "surname_1"=>"Bob"}
puts value.keys
# prints ["forename_1", "surname_1"], ["forename_1", "surname_1"], ["forename_1", "surname_1"]
puts value.values
# prints ["John", "Smith"], ["Chris", "Jenkins"], ["Billy", "Bob"]
value.map { |v| v["forename_1"] }
# However i get no implicit conversion of String into Integer error
end
What am I doing wrong here?
Thanks
Another way :
@response.values.grep(Hash).map { |t| t.values.join(' ')}
What you have to do is get the values of the @response hash, filter out anything that is not an instance of Hash, and then join the forename and the surname together. I would do something like this:
@response.values.grep(Hash).map { |h| "#{h['forename_1']} #{h['surname_1']}" }
# => ["John Smith", "Chris Jenkins", "Billy Bob"]
@response.values.map{ |res|
[res["forename_1"] , res["surname_1"]].join(' ') if res.is_a?(Hash)
}.compact
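As for the error in the question: value is itself a hash, so calling map on it yields each entry as a [key, value] array, and indexing an array with a string raises the TypeError. A minimal sketch of that behaviour, using one of the inner hashes from the question (the variable names pairs, message and full_name are only for illustration):

```ruby
value = { "forename_1" => "John", "surname_1" => "Smith" }

# With a single block parameter, Hash#map passes each entry as a
# two-element array: [key, value].
pairs = value.map { |v| v }

message = nil
begin
  # v is an Array here, and Array#[] expects an Integer index.
  value.map { |v| v["forename_1"] }
rescue TypeError => e
  message = e.message
end

# Reading the keys directly from the inner hash avoids the problem.
full_name = "#{value['forename_1']} #{value['surname_1']}"

puts pairs.inspect
puts message
puts full_name
```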

Searching through two multidimensional arrays and grouping together similar subarrays

I am trying to search through two multidimensional arrays to find any elements in common in a given subarray and then put the results in a third array where the entire subarrays with similar elements are grouped together (not just the similar elements).
The data is imported from two CSVs:
require 'csv'
array = CSV.read('primary_csv.csv')
#=> [["account_num", "account_name", "primary_phone", "second_phone", "status"],
#=> ["11111", "John Smith", "8675309", " ", "active"],
#=> ["11112", "Tina F.", "5551234", "5555678" , "disconnected"],
#=> ["11113", "Troy P.", "9874321", " ", "active"]]
# and so on...
second_array = CSV.read('customer_service.csv')
#=> [["date", "name", "agent", "call_length", "phone", "second_phone", "complaint"],
#=> ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", " ", "rude"],
#=> ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]]
# and so on...
If any number is present as an element in a subarray on both primary.csv and customer_service.csv, I want that entire subarray (as opposed to just the common elements) put into a third array, results_array. The desired output based upon the above sample is:
results_array = [["11111", "John Smith", "8675309", " ", "active"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]] # and so on...
I then want to export the array into a new CSV, where each subarray is its own row of the CSV. I intend to iterate over each subarray by joining it with a , to make it comma delimited and then put the results into a new CSV:
results_array.map! { |j| j.join(",") }
File.open("results.csv", "w") {|f| f.puts results_array}
#=> 11111,John Smith,8675309, ,active
#=> 3/2/15,Mrs. Smith,Stew,1:45,9995678,8675309,says shes not a customer
# and so on...
How can I achieve the desired output? I am aware that the final product will look messy because similar data (for example, phone number) will be in different columns. But I need to find a way to generally group the data together.
Suppose a1 and a2 are the two arrays (excluding header rows).
Code
def combine(a1, a2)
h2 = a2.each_with_index
.with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
a1.each_with_object([]) do |arr, b|
d = arr.each_with_object([]) do |str, d|
s = str.strip
d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end
b << d.uniq.unshift(arr) if d.any?
end
end
def number?(str)
str =~ /^\d+$/
end
Example
Here is your example, modified somewhat:
a1 = [
["11111", "John Smith", "8675309", "", "active" ],
["11112", "Tina F.", "5551234", "5555678", "disconnected"],
["11113", "Troy P.", "9874321", "", "active" ]
]
a2 = [
["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"],
["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309", "surly"],
["3/7/15", "Cher", "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]
combine(a1, a2)
#=> [[["11111", "John Smith", "8675309", "",
# "active"],
# ["3/2/15", "Mrs. Smith", "Stew", "1:45",
# "9995678", "8675309", "surly"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ],
# [["11112", "Tina F.", "5551234", "5555678",
# "disconnected"],
# ["3/1/15", "Mary ?", "Bob X", "5:00",
# "5551234", "", "rude"]
# ],
# [["11113", "Troy P.", "9874321", "",
# "active"],
# ["3/7/15", "Cher", "Sonny", "7:45",
# "9874321", "8675309", "Hey Jude"]
# ]
# ]
Explanation
First, we define a helper:
def number?(str)
str =~ /^\d+$/
end
For example:
number?("8675309") #=> 0 ("truthy")
number?("3/1/15") #=> nil
Now index a2 on the values that represent numbers:
h2 = a2.each_with_index
.with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
#=> {"5551234"=>[0], "9995678"=>[1], "8675309"=>[1, 2], "9874321"=>[2]}
This says, for example, that the "numeric" field "8675309" is contained in elements at offsets 1 and 2 of a2 (i.e., for Mrs. Smith and Cher).
We can now simply run through the elements of a1 looking for matches.
The code:
arr.each_with_object([]) do |str, d|
s = str.strip
d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end
steps through the elements of arr, assigning each to the block variable str. For example, if arr holds the first element of a1, str will in turn equal "11111", "John Smith", and so on. After s = str.strip, this says that if s has a numerical representation and there is a matching key in h2, the (initially empty) array d is concatenated with the elements of a2 given by the value of h2[s].
After completing this loop we see if d contains any elements of a2:
b << d.uniq.unshift(arr) if d.any?
If it does, we remove duplicates, prepend the array with arr and save it to b.
Note that this allows one element of a2 to match multiple elements of a1.
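For the export step, joining fields by hand with "," breaks as soon as a field contains a comma or a quote, so it may be safer to let the csv library do the quoting. The following is only a sketch: groups stands in for the return value of combine, and CSV.generate is used so that nothing is written to disk (CSV.open("results.csv", "w") accepts rows the same way with csv << row):

```ruby
require 'csv'

# Stand-in for the nested array returned by combine(a1, a2).
groups = [
  [["11112", "Tina F.", "5551234", "5555678", "disconnected"],
   ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"]]
]

# Each group is an array of rows; write every row as one CSV line.
csv_text = CSV.generate do |csv|
  groups.each do |group|
    group.each { |row| csv << row }
  end
end

puts csv_text
```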

Iterating over an array to create a nested hash

I am trying to create a nested hash from an array that has several elements saved to it. I've tried experimenting with each_with_object, each_with_index, each and map.
class Person
attr_reader :name, :city, :state, :zip, :hobby
def initialize(name, hobby, city, state, zip)
@name = name
@hobby = hobby
@city = city
@state = state
@zip = zip
end
end
steve = Person.new("Steve", "basketball","Dallas", "Texas", 75444)
chris = Person.new("Chris", "piano","Phoenix", "Arizona", 75218)
larry = Person.new("Larry", "hunting","Austin", "Texas", 78735)
adam = Person.new("Adam", "swimming","Waco", "Texas", 76715)
people = [steve, chris, larry, adam]
people_array = people.map do |person|
person = person.name, person.hobby, person.city, person.state, person.zip
end
Now I just need to turn it into a hash. One issue I am having is, when I'm experimenting with other methods, I can turn it into a hash, but the array is still inside the hash. The expected output is just a nested hash with no arrays inside of it.
# Expected output ... create the following hash from the peeps array:
#
# people_hash = {
# "Steve" => {
# "hobby" => "golf",
# "address" => {
# "city" => "Dallas",
# "state" => "Texas",
# "zip" => 75444
# }
# # etc, etc
Any hints on making sure the hash is a nested hash with no arrays?
This works:
person_hash = Hash[people_array.map do |user|
[user[0], Hash['hobby', user[1], 'address', Hash['city', user[2], 'state', user[3], 'zip', user[4]]]]
end]
Basically, just use Ruby's Hash[] method to convert each of the sub-arrays into a hash.
Why not just pass people?
people.each_with_object({}) do |instance, h|
h[instance.name] = { "hobby" => instance.hobby,
"address" => { "city" => instance.city,
"state" => instance.state,
"zip" => instance.zip } }
end

Consolidate duplicate values of a certain key from an array of hashes into array

I have an array of hashes:
connections = [
{:name=>"John Doe", :number=>"5551234567", :count=>8},
{:name=>"Jane Doe", :number=>"5557654321", :count=>6},
{:name=>"John Doe", :number=>"5559876543", :count=>3}
]
If the :name value is a duplicate, as is the case with John Doe, it should combine the :number values into an array. The count is not important anymore, so the output should be in the following format:
{"John Doe"=>["5551234567","5559876543"],
"Jane Doe"=>["5557654321"]}
What I have so far is:
k = connections.inject(Hash.new{ |h,k| h[k[:name]] = [k[:number]] }) { |h,(k,v)| h[k] << v ; h }
But this only outputs
{"John Doe"=>["5559876543", nil], "Jane Doe"=>["5557654321", nil]}
This works:
connections.group_by do |h|
h[:name]
end.inject({}) do |h,(k,v)|
h.merge( { k => (v.map do |i| i[:number] end) } )
end
# => {"John Doe"=>["5551234567", "5559876543"], "Jane Doe"=>["5557654321"]}
Step by step...
connections is the same as in your post:
connections
# => [{:name=>"John Doe", :number=>"5551234567", :count=>8},
# {:name=>"Jane Doe", :number=>"5557654321", :count=>6}, {:name=>"John Doe",
# :number=>"5559876543", :count=>3}]
First we use group_by to combine the hash entries with the same :name:
connections.group_by do |h| h[:name] end
# => {"John Doe"=>[{:name=>"John Doe", :number=>"5551234567", :count=>8},
# {:name=>"John Doe", :number=>"5559876543", :count=>3}],
# "Jane Doe"=>[{:name=>"Jane Doe", :number=>"5557654321", :count=>6}]}
That's great, but we want the values of the result hash to be just the numbers that show up as values of the :number key, not the full original entry hashes.
Given just one of the list values, we can get the desired result this way:
[{:name=>"John Doe", :number=>"5551234567", :count=>8},
{:name=>"John Doe", :number=>"5559876543", :count=>3}].map do |i|
i[:number]
end
# => ["5551234567", "5559876543"]
But we want to do that to all of the list values at once, while keeping the association with their keys. It's basically a nested map operation, but the outer map runs across a Hash instead of an Array.
You can in fact do it with map. The only tricky part is that map on a Hash doesn't return a Hash, but an Array of nested [key,value] Arrays. By wrapping the call in a Hash[...] constructor, you can turn the result back into a Hash:
Hash[
connections.group_by do |h|
h[:name]
end.map do |k,v|
[ k, (v.map do |i| i[:number] end) ]
end
]
That returns the same result as my original full answer above, and is arguably clearer, so you might want to just use that version.
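On Ruby 2.4 or later, Hash#transform_values should express the same "map over the values, keep the keys" idea without the Hash[...] wrapper. This is a sketch that goes beyond the original answer, not a replacement for it:

```ruby
connections = [
  { :name => "John Doe", :number => "5551234567", :count => 8 },
  { :name => "Jane Doe", :number => "5557654321", :count => 6 },
  { :name => "John Doe", :number => "5559876543", :count => 3 }
]

# group_by builds name => [entry, ...]; transform_values then replaces
# each list of entries with just the list of numbers.
result = connections
  .group_by { |h| h[:name] }
  .transform_values { |entries| entries.map { |h| h[:number] } }

puts result.inspect
```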
But the mechanism I used instead was inject. It's like map, but instead of just returning an Array of the return values from the block, it gives you full control over how the return value is constructed out of the individual block calls:
connections.group_by do |h|
h[:name]
end.inject({}) do |h,(k,v)|
h.merge( { k => (v.map do |i| i[:number] end) } )
end
That creates a new Hash, which starts out empty (the {} passed to inject), and passes it to the do block (where it shows up as h) along with the first key/value pair in the Hash returned by group_by. That block creates another new Hash with the single key passed in and the result of transforming the value as we did above, and merges that into the passed-in one, returning the new value - basically, it adds one new key/value pair to the Hash, with the value transformed into the desired form by the inner map. The new Hash is returned from the block, so it becomes the new value of h for the next time through the block.
(We could also just assign the entry into h directly with h[k] = v.map ..., but the block would then need to return h afterward as a separate statement, since it is the return value of the block, and not the value of h at the end of the block's execution, that gets passed to the next iteration.)
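A sketch of that assignment-based variant, next to each_with_object, which passes the same object to every iteration and so needs no explicit return value. The names grouped, with_inject and with_object are illustrative only:

```ruby
connections = [
  { :name => "John Doe", :number => "5551234567", :count => 8 },
  { :name => "Jane Doe", :number => "5557654321", :count => 6 },
  { :name => "John Doe", :number => "5559876543", :count => 3 }
]

grouped = connections.group_by { |h| h[:name] }

# inject: the block's return value becomes the next accumulator,
# so h has to be the last expression.
with_inject = grouped.inject({}) do |h, (k, v)|
  h[k] = v.map { |i| i[:number] }
  h
end

# each_with_object: the block's return value is ignored; note that the
# block parameters come in the opposite order.
with_object = grouped.each_with_object({}) do |(k, v), h|
  h[k] = v.map { |i| i[:number] }
end

puts with_inject == with_object
```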
As an aside: I used do...end instead of {...} around my blocks to avoid confusion with the {...} used for Hash literals. Apart from precedence ({...} binds more tightly to the method call immediately before it), the two forms behave the same, so the choice is largely a matter of style. In standard Ruby style, you would use {...} for single-line blocks, and restrict do...end to blocks that span more than one line.
In one line:
k = connections.each.with_object({}) {|conn,result| (result[conn[:name]] ||= []) << conn[:number] }
More readable:
result = Hash.new {|h,k| h[k] = [] }
connections.each {|conn| result[conn[:name]] << conn[:number] }
result #=> {"John Doe"=>["5551234567", "5559876543"], "Jane Doe"=>["5557654321"]}
names = {}
connections.each{ |c| names[c[:name]] ||= []; names[c[:name]].push(c[:number]) }
puts names
