Counting multiple fields in a hash - ruby

Problem:
I need to extract certain keys and count them in a hash, as a sample consider:
data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"},
{"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
{"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
{"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
{"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]
I want to count the number of records for each [id (extracted from name), priority] pair, so that at the end I will have something like:
#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}
I'm doing the following but I have a feeling that I'm overcomplicating it:
#!/usr/bin/env ruby
data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"},
{"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
{"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
{"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
{"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]
# (1) trash some keys, just because I don't need them
data.each do |d|
d.delete 'owner'
# in the real data I have about 4 or 5 that I'm trashing
d['id'] = d['name'].scan(/[a-z][a-z][a-z][a-z][0-9]/)[0] # only valid ids
d.delete 'name'
end
puts data
#output:
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}
#{"priority"=>"2", "id"=>nil}
# (2) reject invalid keys
data = data.reject { |d| d['id'].nil? }
puts data
#output:
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}
# (3) count
counts = Hash.new(0)
data.each do |d|
counts[d] += 1
end
puts counts
#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}
Any suggestions on improving my method of counting?

There are many ways to do this. (You may have noticed that I've done a lot of editing of my answer, explaining in some detail how a method works, only to realize there's a better way to do it, so out comes the machete.) Here are two solutions. The first was inspired by the approach you took, but I've tried to package it to be more Ruby-like. I'm not sure what constitutes a valid "name", so I've put that determination in a separate method that can be easily changed.
Code
def name_valid?(name)
name[0..3] == "name"
end
data.each_with_object(Hash.new(0)) {|h,g|
(g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1) if name_valid?(h["name"])}
#=> {{"id"=>"name1", "priority"=>"1"}=>2,
# {"id"=>"name2", "priority"=>"1"}=>1,
# {"id"=>"name2", "priority"=>"2"}=>1}
Explanation
Enumerable#each_with_object is given an initially-empty hash with default value zero, which is represented by the block variable g. g is built by adding hash elements created from the elements of data:
g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1
If the hash g has the key
{"id"=>h["name"],"priority"=>h["priority"]}
the value associated with the key is incremented by one. If g does not have this key,
g[{"id"=>h["name"],"priority"=>h["priority"]}]
evaluates to the default value, zero, when
g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1
is invoked, so the value becomes 1.
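The default-value behaviour described above can be checked in isolation. This is a minimal sketch; the variable names (counts, key) are illustrative and not part of the answer's code.

```ruby
counts = Hash.new(0)
key = { "id" => "name1", "priority" => "1" }

# Reading a missing key returns the default (0) but does not store the key.
value_before = counts[key]
stored_before = counts.key?(key)

# += expands to counts[key] = counts[key] + 1, which does store the key.
counts[key] += 1
counts[key] += 1

value_after = counts[key]
```

After the two increments, value_after is 2, while value_before was 0 and stored_before was false.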
Alternative Method
Code
data.each_with_object({}) do |h,g|
hash = { { "id"=>h["name"], "priority"=>h["priority"] } => 1 }
g.update(hash) { |k, vg, _| vg + 1 } if name_valid?(h["name"])
end
#=> {{"id"=>"name1", "priority"=>"1"}=>2,
# {"id"=>"name2", "priority"=>"1"}=>1,
# {"id"=>"name2", "priority"=>"2"}=>1}
Explanation
Here, I've used Hash#update (aka Hash#merge!) to merge a single-pair hash built from each element of data into the initially-empty hash g (provided the value of "name" is valid). update's block
{ |k, vg, _| vg + 1 }
is invoked if and only if the hash being merged into (g) and the hash being merged (hash) have the same key, k, in which case the block's return value (here vg + 1) becomes the new value for that key. Note the third block variable is the value for the key k in the hash hash. As we do not use that value, I've replaced it with the placeholder _.
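To see when update's block runs, here is a small sketch with made-up keys ("a" and "b" are assumptions for illustration only):

```ruby
g = { "a" => 1 }

# "a" is already in g, so the block runs and its result replaces the old value.
g.update({ "a" => 1 }) { |_key, old_value, _new_value| old_value + 1 }

# "b" is not in g, so the block is skipped and the pair is simply added.
g.update({ "b" => 1 }) { |_key, old_value, _new_value| old_value + 1 }

#=> g is now {"a"=>2, "b"=>1}
```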

Depending on what you mean by "something like", this might do the trick (note that it does not filter out invalid names):
data.group_by { |h| [h["name"], h["priority"]] }.map { |k, v| { k => v.size } }
=> [{["name1", "1"]=>2}, {["name2", "1"]=>1}, {["name2", "2"]=>1}, {["nae954me2", "2"]=>1}]
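If the id has to be extracted with the pattern from the question and invalid names skipped, a single pass can do both without mutating data. This is only a sketch: ID_PATTERN is a hypothetical constant that condenses the question's regex, and it may not match the real validity rule.

```ruby
# Hypothetical constant: four lowercase letters followed by one digit,
# equivalent to the regex used in the question.
ID_PATTERN = /[a-z]{4}[0-9]/

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"},
        {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
        {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
        {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
        {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

counts = data.each_with_object(Hash.new(0)) do |row, acc|
  id = row["name"][ID_PATTERN] # first match, or nil when the name is invalid
  next if id.nil?              # skip records without a valid id
  acc[{ "priority" => row["priority"], "id" => id }] += 1
end
```

Because nothing is deleted from the rows, the original data array stays intact and the extra keys ("owner" and friends) are simply ignored.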

Related

How would I remove a nested value from a hash that occurs multiple times

I have a hash which is say,
hash = {"lock_version"=>4,
"exhibition_quality"=>false,
"within"=>["FID6", "S2"],
"repository"=>
{"ref"=>"/repositories/2",
"repository"=>{"ref"=>"/repositories/2",
"within"=>["FID6", "S2"]
}
}
}
This hash is being passed through another function. How can I delete from "within"=>["FID6", "S2"] a value with the pattern FID (in this example FID6) without mutating the original hash as well? This is just a shortened version of the hash, but there are other instances where the hash is super long and the "within" key-value pair appears multiple times. Note: This program is using Ruby 2.4.
I have been asked to clarify how this question is different from a previous question I asked, so here is a little more clarification, because I've done more work on it since. This specific key-value pair, "within"=>["FID6", "S2"], now appears deeply nested (the entire hash is about 2 pages long, which is why I didn't copy and paste it). I can't split the hash where the "repository" is because it appears nested in other key values. What I'm asking now is just whether there is a way to match that "within" key-value pair no matter how deep it is. Thanks everyone for the suggestions.
Code
def defidder(h)
  h.each_with_object({}) do |(k,v),result|
    result[k] =
      case v
      when Array
        # strip FID entries only from "within" arrays; keep other arrays unchanged
        k == "within" ? v.reject { |s| s.match?(/\AFID\d+\z/) } : v
      when Hash
        defidder(v)
      else
        v
      end
  end
end
Example
I've added another layer of hash nesting to the example given in the question:
hash = {
"lock_version"=>4,
"exhibition_quality"=>false,
"within"=>["FID6", "S2"],
"repository"=>{
"ref"=>"/repositories/2",
"repository"=>{"ref"=>"/repositories/2"},
"within"=>["FID6", "S2"],
"1more"=>{ a: 1, "within"=>["FID999", "S7"] }
}
}
defidder hash
#=> {
# "lock_version"=>4,
# "exhibition_quality"=>false,
# "within"=>["S2"],
# "repository"=>{
# "ref"=>"/repositories/2",
# "repository"=>{"ref"=>"/repositories/2"},
# "within"=>["S2"],
# "1more"=>{:a=>1, "within"=>["S7"]}
# }
# }
We may verify hash was not mutated.
hash
#=> {
# "lock_version"=>4,
# "exhibition_quality"=>false,
# "within"=>["FID6", "S2"],
# "repository"=>{
# "ref"=>"/repositories/2",
# "repository"=>{"ref"=>"/repositories/2"},
# "within"=>["FID6", "S2"],
# "1more"=>{ a: 1, "within"=>["FID999", "S7"] }
# }
# }
Assuming:
Only nested hashes and no hashes in arrays.
No objects in hash.
This works with your example and works with examples I created with the assumptions above:
cloned_hash = Marshal.load(Marshal.dump(hash))
def remove_key_value_pair(key, value, hash)
if hash.key?(key) && hash[key] == value
hash.delete(key)
end
hash.each{|k, v| remove_key_value_pair(key, value, v) if v.is_a? Hash }
end
# call with
remove_key_value_pair("within", ["FID6", "S2"], cloned_hash)
This will run into a SystemStackError if the hash has a lot of nesting.
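If deep nesting is a concern, the traversal can be written with an explicit stack instead of recursion. The following is a sketch under the same assumptions as above (only nested hashes, no hashes inside arrays); strip_fids is a hypothetical name, and unlike remove_key_value_pair it removes only the FID entries rather than the whole pair. The Marshal round trip is still used for the deep copy and is assumed to cope with the structure.

```ruby
# Hypothetical helper: returns a deep copy of hash in which every
# "within" array has its FID<digits> entries removed.
def strip_fids(hash)
  copy = Marshal.load(Marshal.dump(hash)) # deep copy, original stays untouched
  stack = [copy]
  until stack.empty?
    node = stack.pop
    node.each do |key, value|
      if key == "within" && value.is_a?(Array)
        # Reassigning an existing key while iterating is allowed in Ruby.
        node[key] = value.reject { |s| s.is_a?(String) && s.match?(/\AFID\d+\z/) }
      elsif value.is_a?(Hash)
        stack.push(value) # visit nested hashes later instead of recursing
      end
    end
  end
  copy
end

nested = {
  "within" => ["FID6", "S2"],
  "repository" => {
    "ref" => "/repositories/2",
    "repository" => { "ref" => "/repositories/2", "within" => ["FID6", "S2"] }
  }
}
cleaned = strip_fids(nested)
```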

How to compare ruby hash with same key?

I have two hashes like this:
hash1 = Hash.new
hash1["part1"] = "test1"
hash1["part2"] = "test2"
hash1["part3"] = "test3"
hash2 = Hash.new
hash2["part1"] = "test1"
hash2["part2"] = "test2"
hash2["part3"] = "test4"
Expected output: part3
Basically, I want to iterate over both hashes and print out "part3", because the value for "part3" differs between them. I can guarantee that the keys for both hashes will be the same; only the values might be different. I want to print out the keys whose values are different.
I have tried iterating over both hashes at once and comparing values, but it does not seem to give the right solution.
The cool thing about Ruby is that it is so high level that it is often basically English:
Print keys from the first hash if the values in the two hashes are different:
hash1.keys.each { |key| puts key if hash1[key] != hash2[key] }
Select the first hash keys that have different values in the two hashes and print each of them:
hash1.keys.select { |key| hash1[key] != hash2[key] }.each { |key| puts key }
Edit: I'll leave this should it be of interest, but @ndn's solution is certainly better.
p hash1.merge(hash2) { |_,v1,v2| v1==v2 }.reject { |_,v| v }.keys
# ["part3"]
hash1["part1"] = "test99"
p hash1.merge(hash2) { |_,v1,v2| v1==v2 }.reject { |_,v| v }.keys
# ["part1", "part3"]
This uses the form of Hash#merge that employs a block (here { |_,v1,v2| v1==v2 }) to determine the values of keys that are present in both hashes being merged. See the doc for an explanation of the three block variables, _, v1 and v2. The first block variable equals the common key. I've used the local variable _ for that, as is customary when the variable is not used in the block calculation.
The steps (for the original hash1):
g = hash1.merge(hash2) { |_,v1,v2| v1==v2 }
#=> {"part1"=>true, "part2"=>true, "part3"=>false}
h = g.reject { |_,v| v }
#=> {"part3"=>false}
h.keys
#=> ["part3"]
The obvious way is that of ndn. Here is a solution without blocks: convert both hashes to arrays of pairs, concatenate them, subtract the pairs the two have in common, convert the result back to a hash, and ask for its keys.
Next time it would be better to include what you tried so far.
((hash1.to_a + hash2.to_a) - (hash1.to_a & hash2.to_a)).to_h.keys
# ["part3"]
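For comparison, a one-pass sketch that relies on the question's guarantee that both hashes have the same keys. It is offered as an illustration only and is not claimed to be any particular answerer's solution; differing_keys is an illustrative name.

```ruby
hash1 = { "part1" => "test1", "part2" => "test2", "part3" => "test3" }
hash2 = { "part1" => "test1", "part2" => "test2", "part3" => "test4" }

# Drop every pair whose value is the same in hash2, then keep the keys.
differing_keys = hash1.reject { |key, value| hash2[key] == value }.keys

puts differing_keys
# prints: part3
```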

Ruby iterate through hash and compare value pairs

My Ruby assignment is to iterate through a hash and return the key associated with the lowest value, without using any of the following methods:
#keys #values #min #sort #min_by
I don't understand how to iterate through the hash and store each pair as it comes through, compare it to the last pair that came through, and return the lowest key. This is my code to show you my thought process, but it of course does not work. Any thoughts on how to do this? Thanks!
def key_for_min_value(name_hash)
index = 0
lowest_hash = {}
name_hash.collect do |key, value|
if value[index] < value[index + 1]
lowest = value
index = index + 1
key_for_min_value[value]
return lowest
end
end
end
Track min_value and key_for_min_value. Iterate through the hash, and any time the current value is lower than min_value, update both of these vars. At the end of the loop, return key_for_min_value.
I didn't include sample code because, hey, this is homework. :) Good luck!
One way to do it is to transform the hash into an array:
def key_for_min_value(name_hash)
# Convert hash to array
name_a = name_hash.to_a
# Default key value
d_value= 1000
d_key= 0
# Iterate new array
name_a.each do |i|
# If current value is lower than default, change value&key
if i[1] < d_value
d_value = i[1]
d_key = i[0]
end
end
return d_key
end
You might need to change d_value to something higher or find something more creative :)
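One way to avoid picking a magic starting value is to start from nil and let the first pair initialise the minimum. The sketch below is one possible shape, not the only one; it uses only Hash#each and none of the methods the assignment forbids.

```ruby
def key_for_min_value(name_hash)
  min_key = nil
  min_value = nil
  name_hash.each do |key, value|
    # The first pair always wins; later pairs win only if strictly smaller.
    if min_value.nil? || value < min_value
      min_key = key
      min_value = value
    end
  end
  min_key # nil for an empty hash
end

key_for_min_value({ "a" => 1, "b" => 2, "c" => 0 })  #=> "c"
key_for_min_value({ "x" => 5000, "y" => 2000 })      #=> "y"
key_for_min_value({})                                #=> nil
```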
We can use the Enumerable#reduce method to compare entries and pick the one with the smallest value. Each hash entry is passed to the reduce block as a two-element array, so I am using Array#first and Array#last to access the key and the value.
h = {"a" => 1, "b" => 2, "c" => 0}
p h.reduce{ |f, s| f.last > s.last ? s : f }.first
#=> "c"

Comparing values of one hash to many hashes to get inverse document frequency in ruby

I'm trying to find the inverse document frequency for a categorization algorithm and am having trouble getting it the way that my code is structured (with nested hashes), and generally comparing one hash to many hashes.
My training code looks like this so far:
def train!
  @data = {}
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
Basically, I have a hash with 4 categories (subject to change), and for each have word count, book count, and a frequency hash which shows term frequency for the category. How do I get the frequency of individual words from one category compared against the frequency of the words shown in all categories? I know how to do the comparison for one set of hash keys against another, but am not sure how to loop through a nested hash to get the frequency of terms against all other terms, if that makes sense.
Edit to include predicted outcome -
I'd like to return a hash of nested hashes (one for each category) that shows the word as the key, and the number of other categories in which it appears as the value, i.e. {:category1 => {:word => 3, :other => 2, :third => 1}, :category2 => {:another => 1, ...}}. Alternatively, an array of category names as the value, instead of the number of categories, would also work.
I've tried creating a new hash as follows, but it's turning up empty:
def train!
  @data = {}
  @all_words = Hash.new([]) #new hash for all words, default value is empty array
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
        @all_words[token] << category #should insert category name if the word appears, right?
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
If someone can help me figure out why the @all_words hash is empty when the code is run, I may be able to get the rest.
I haven't gone through it all, but you certainly have an error:
@all_words[token] << category #should insert category name if the word appears, right?
Nope. @all_words[token] returns the default array (the same object on every lookup), but it does not create a new slot holding an empty array, as you're assuming. The << only mutates that shared default, so the statement never adds a key to the @all_words hash.
Try these 2 changes and see if it helps:
@all_words = {} # ditch the default value
...
(@all_words[token] ||= []) << category # lazy-init the array, and append
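The difference is easy to reproduce outside the training code. In this sketch the names shared and per_key are illustrative; the block form of Hash.new is shown as an alternative to the ||= idiom above, on the assumption that a fresh array per word is what is wanted.

```ruby
# Default object: every missing key returns the SAME array, and nothing is stored.
shared = Hash.new([])
shared["word"] << :category1
shared_is_empty = shared.empty?   # true, no key was ever added
leaked_default  = shared["other"] # [:category1], the mutated shared default

# Default block: a fresh array is created and stored on first access.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key["word"] << :category1
per_key["word"] << :category2
#=> per_key is now {"word"=>[:category1, :category2]}
```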

How do I merge two arrays of hashes based on same hash key value?

So I have two arrays of hashes:
a = [{"b"=>123,"c"=>456}, {"b"=>456,"c"=>555}]
b = [{"c"=>456,"d"=>789}, {"b"=>222,"c"=>444}]
How would I concatenate them with the condition that the value of the key c is equivalent in both a and b? Meaning I want to be able to concatenate with the condition of a['c'] == b['c']
This is the result I want to get:
final_array = [{"b"=>123,"c"=>456,"d"=>789}, {"b"=>456,"c"=>555}, {"b"=>222,"c"=>444}]
a = [{"b"=>123,"c"=>456}, {"b"=>456,"c"=>555}]
b = [{"c"=>456,"d"=>789}, {"b"=>222,"c"=>444}]
p a.zip(b).map{|h1,h2| h1["c"] == h2["c"] ? h1.merge(h2) : [h1 ,h2]}.flatten
# => [{"b"=>123, "c"=>456, "d"=>789}, {"b"=>456, "c"=>555}, {"b"=>222, "c"=>444}]
a = [{"b"=>123,"c"=>456}, {"b"=>456,"c"=>555}]
b = [{"c"=>456,"d"=>789}, {"b"=>222,"c"=>444}]
def merge_hashes_with_equal_values(array_of_hashes, key)
array_of_hashes.sort { |a,b| a[key] <=> b[key] }.
chunk { |h| h[key] }.
each_with_object([]) { |h, result| result << h.last.inject(&:merge) }
end
p merge_hashes_with_equal_values(a + b, 'c')
# => [{"b"=>222, "c"=>444}, {"c"=>456, "d"=>789, "b"=>123}, {"b"=>456, "c"=>555}]
Concatenate the arrays first, and pass it to the method with the hash key to combine on. Sorting that array then places the hashes to merge next to each other in another array, which makes merging a bit easier to program for. Here I chose #chunk to handle detection of continuous runs of hashes with equal keys to merge, and #each_with_object to compile the final array.
Since this method takes one array to work on, the length of the starting arrays does not need to be equal, and the ordering of those arrays does not matter. A downside is that the keys to operate on must contain a sortable value (no nils, for example).
Here is yet another approach to the problem, this one using a hash to build the result:
def merge_hashes_with_equal_values(array_of_hashes, key)
result = Hash.new { |h,k| h[k] = {} }
remainder = []
array_of_hashes.each_with_object(result) do |h, answer|
if h.has_key?(key)
answer[h.fetch(key)].merge!(h)
else
remainder << h
end
end.values + remainder
end
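A shorter variation on the same idea uses Enumerable#group_by. This is a sketch, with merge_by_key as a hypothetical name; like the method above it keeps hashes that lack the key as-is, and it builds new hashes instead of modifying the inputs.

```ruby
def merge_by_key(array_of_hashes, key)
  with_key, without_key = array_of_hashes.partition { |h| h.key?(key) }
  merged = with_key
           .group_by { |h| h[key] }                        # bucket hashes by the key's value
           .values
           .map { |group| group.reduce({}) { |acc, h| acc.merge(h) } } # merge each bucket into a new hash
  merged + without_key
end

a = [{"b"=>123,"c"=>456}, {"b"=>456,"c"=>555}]
b = [{"c"=>456,"d"=>789}, {"b"=>222,"c"=>444}]
final_array = merge_by_key(a + b, "c")
#=> [{"b"=>123, "c"=>456, "d"=>789}, {"b"=>456, "c"=>555}, {"b"=>222, "c"=>444}]
```

Since group_by preserves first-seen order, no sorting is needed and the key's values do not have to be comparable.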
Enumerable#flat_map and Hash#update are well suited to this purpose (note that update modifies the hashes of a in place):
a = [{"b"=>123,"c"=>456}, {"b"=>456,"c"=>555}]
b = [{"c"=>456,"d"=>789}, {"b"=>222,"c"=>444}]
p a.zip(b).flat_map{|k,v| next k.update(v) if k["c"] == v["c"];[k,v]}
# >> [{"b"=>123, "c"=>456, "d"=>789}, {"b"=>456, "c"=>555}, {"b"=>222, "c"=>444}]

Resources