Ruby select latest duplicated values from an array of hash - ruby
Let's say I have this kind of array:
a = [
{key: "cat", value: 1},
{key: "dog", value: 2},
{key: "mouse", value: 5},
{key: "rat", value: 3},
{key: "cat", value: 5},
{key: "rat", value: 2},
{key: "cat", value: 1},
{key: "cat", value: 1}
]
I want to get only the latest values found for "cat".
I know how to select all of them, like this:
a.select do |e|
e[:key] == "cat"
end
But I'm looking for a way to get just the last 3 of them.
The desired result would be:
[
{key: "cat", value: 5},
{key: "cat", value: 1},
{key: "cat", value: 1}
]
thanks!
In a comment on the question, @Stefan suggested:
a.select { |e| e[:key] == "cat" }.last(3)
Provided a is not too large that is likely what you should use. However, if a is large, and especially if it contains many elements (hashes) h for which h[:key] #=> "cat", it likely would be more efficient to iterate backwards from the end of the array and terminate ("short-circuit") as soon as three elements h have been found for which h[:key] #=> "cat". This also avoids the construction of a potentially-large temporary array (a.select { |e| e[:key] == "cat" }).
One way to do that is as follows.
a.reverse_each.with_object([]) do |h,arr|
arr.insert(0,h) if h[:key] == "cat"
break arr if arr.size == 3
end
#=> [{:key=>"cat", :value=>5},
# {:key=>"cat", :value=>1},
# {:key=>"cat", :value=>1}]
See Array#reverse_each, Enumerator#with_object and Array#insert. Note that because reverse_each and with_object both return enumerators, chaining them produces an enumerator as well:
a.reverse_each.with_object([])
#=> #<Enumerator: #<Enumerator: [{:key=>"cat", :value=>1},
# ...
# {:key=>"cat", :value=>1}]:reverse_each>:with_object([])>
It might be ever-so-slightly faster to replace the block calculation with
arr << h if h[:key] == "cat"
break arr.reverse if arr.size == 3
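If a contains fewer than three elements h for which h[:key] #=> "cat", the returned array arr will have arr.size < 3. It is therefore necessary to check whether the returned array contains three elements. Note also that in this case the arr << h variant returns its matches in reverse order, because break arr.reverse is never reached and with_object returns arr as it stands.
This check must also be performed when @Stefan's suggested code is used, as (for example)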
a.select { |e| e[:key] == "cat" }.last(99)
#=> [{:key=>"cat", :value=>1},
# {:key=>"cat", :value=>5},
# {:key=>"cat", :value=>1},
# {:key=>"cat", :value=>1}]
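The short-circuiting idea above can be wrapped in a small reusable method. The following is only a sketch: the name last_matching and its parameters are hypothetical and do not come from either answer. It collects matches while walking backwards, stops as soon as n have been found, and always returns them in their original order, even when fewer than n exist.

```ruby
# Hypothetical helper, not part of the original answers: returns the last
# `n` elements for which the block is truthy, in their original order.
def last_matching(array, n)
  return [] if n <= 0

  found = []
  array.reverse_each do |element|
    found << element if yield(element)
    break if found.size == n   # short-circuit once enough matches are found
  end
  found.reverse                # restore the original left-to-right order
end

a = [
  {key: "cat", value: 1},
  {key: "dog", value: 2},
  {key: "mouse", value: 5},
  {key: "rat", value: 3},
  {key: "cat", value: 5},
  {key: "rat", value: 2},
  {key: "cat", value: 1},
  {key: "cat", value: 1}
]

last_three = last_matching(a, 3) { |h| h[:key] == "cat" }
all_cats   = last_matching(a, 99) { |h| h[:key] == "cat" }
```

The caller still has to check last_three.size if exactly three elements are required.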
Related
Best practice for writing complex, three-part, interchangeable "uniq" ruby block
I have an array of hashes:
array = [
  {foo: 1, bar1: 2, bar2: 3, bar3: 4},
  {foo: 2, bar1: 3, bar2: 4, bar3: 5},
  {foo: 3, bar1: 4, bar2: 5, bar4: 6},
  # etc.
]
I want to eliminate some redundant results from this array. Specifically, I want to eliminate any results where foo, bar1, and bar2 are identical across multiple objects, which can easily be done like so:
array.uniq! { |object| [object[:foo], object[:bar1], object[:bar2]] }
However, there is an additional edge case where I must also eliminate one of the following objects, which I don't know how to solve:
{foo: 1, bar1: 3, bar2: 2,...}
{foo: 1, bar1: 2, bar2: 3,...}
Specifically, bar1 and bar2 may be switched in some of the data, and I only want unique results where those two are collectively the same pair (2, 3 should be considered redundant with 3, 2).
After fully writing up this question I realized I had an answer, but I'm not sure how ideal it is. I simply combined the two interchangeable values into a single array and then sorted it, which guarantees that the result is identical even if the two values are switched:
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
I'd love to know if anyone has better solutions. Also, unsurprisingly, inserting a uniq! call into a large sorting action is causing some performance issues, so I'm exploring ways to further optimize it by adding additional filters etc. This is all for a cache for an API endpoint.
Since you have special equality rules, it seems like the most performant solution would be to override the Object#hash and Object#eql? methods, as these are what Array#uniq uses. If you have millions of records this may well be necessary for adequate performance.
require 'pp'
class MyHash < Hash
  def hash
    # Note that the XOR operator is commutative, so the three values
    # can be in any order and still output the same hash.
    self[:foo].hash ^ self[:bar1].hash ^ self[:bar2].hash
  end
  def eql?(other)
    # I think this is a bit ugly, and welcome suggestions for better
    # performance and readability.
    # The outer parentheses are required: && binds more tightly than ||.
    self[:foo] == other[:foo] && (
      (self[:bar1] == other[:bar1] && self[:bar2] == other[:bar2]) ||
      (self[:bar1] == other[:bar2] && self[:bar2] == other[:bar1])
    )
  end
end
a = MyHash[foo: 10, bar1: 2, bar2: 3, ignored: 'a']
b = MyHash[foo: 10, bar1: 3, bar2: 2, ignored: 'b']
c = MyHash[foo: 20, bar1: 2, bar2: 3, ignored: 'c']
d = MyHash[foo: 20, bar1: 3, bar2: 2, ignored: 'd']
e = MyHash[foo: 2, bar1: 20, bar2: 3, ignored: 'e']
f = MyHash[foo: 3, bar1: 2, bar2: 20, ignored: 'f']
puts a.hash #=> 3556565295874809176
puts b.hash #=> 3556565295874809176
puts c.hash #=> 2914353897173641784
puts d.hash #=> 2914353897173641784
puts e.hash #=> 2914353897173641784
puts f.hash #=> 2914353897173641784
array = [a, b, c, d, e, f]
pp array
#=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
#    {:foo=>10, :bar1=>3, :bar2=>2, :ignored=>"b"},
#    {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
#    {:foo=>20, :bar1=>3, :bar2=>2, :ignored=>"d"},
#    {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
#    {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
pp array.uniq
#=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
#    {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
#    {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
#    {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
If you just have thousands of records then the solution you proposed should be completely fine.
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
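As a quick check of the block-based approach on plain hashes, here is a minimal sketch. The records variable and its sample data are made up for illustration (they mirror the MyHash examples), and the non-destructive uniq is used instead of uniq! so the input is left untouched.

```ruby
# Illustrative data only; these records are assumptions, not the asker's real data.
records = [
  {foo: 10, bar1: 2,  bar2: 3, ignored: "a"},
  {foo: 10, bar1: 3,  bar2: 2, ignored: "b"},  # same pair as "a", swapped
  {foo: 20, bar1: 2,  bar2: 3, ignored: "c"},
  {foo: 2,  bar1: 20, bar2: 3, ignored: "e"}
]

# Sorting the interchangeable pair makes [2, 3] and [3, 2] produce the same key.
unique = records.uniq { |r| [r[:foo], [r[:bar1], r[:bar2]].sort] }

kept = unique.map { |r| r[:ignored] }
```

uniq keeps the first record seen for each key, so "b" is dropped in favour of "a".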
Convert Array of Hashes to a Hash
I'm trying to convert the following:
dep = [
  {id: 1, depen: 2},
  {id: 1, depen: 3},
  {id: 3, depen: 4},
  {id: 5, depen: 3},
  {id: 3, depen: 6}
]
into a single hash:
# {1=>2, 1=>3, 3=>4, 5=>3, 3=>6}
I tried a solution I found on another question:
dep.each_with_object({}) { |g,h| h[g[:id]] = g[:dep_id] }
However, the output removed elements and gave me:
# {1=>3, 3=>6, 5=>2}
where the last element is also incorrect.
You cannot have a hash like {1=>2, 1=>3, 3=>4, 5=>3, 3=>6}. All keys of a hash must be unique. If you want a hash mapping each id to a list of its dependencies, you can use:
result = dep.
  group_by { |obj| obj[:id] }.
  transform_values { |objs| objs.map { |obj| obj[:depen] } }
Or:
result = dep.reduce({}) do |memo, val|
  memo[val[:id]] ||= []
  memo[val[:id]].push val[:depen]
  memo
end
Both produce {1=>[2, 3], 3=>[4, 6], 5=>[3]}
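To see why elements seem to disappear when each id is used directly as a key, here is a small sketch using the question's dep array. The variable names flat and grouped are illustrative. Each later assignment to an existing key replaces the earlier value, so only the last depen per id survives, whereas grouping keeps them all.

```ruby
dep = [
  {id: 1, depen: 2},
  {id: 1, depen: 3},
  {id: 3, depen: 4},
  {id: 5, depen: 3},
  {id: 3, depen: 6}
]

# Assigning to the same key twice overwrites the earlier value.
flat = dep.each_with_object({}) { |g, h| h[g[:id]] = g[:depen] }

# Grouping collects every depen under its id instead.
grouped = dep.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
  h[g[:id]] << g[:depen]
end
```

Here flat ends up with three entries, one per distinct id, while grouped holds all five depen values.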
Sort hash by key which is a string
Assuming I get back a string: "27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41" I turn that into an array by calling str.split(',') Then turning it into a hash by calling arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } I would get back a hash that looks like {"1"=>2, "6"=>1, "39"=>23, "36"=>23, "34"=>39, "32"=>31, "30"=>18, "3"=>8, "2"=>10, "28"=>36, "29"=>21, "26"=>41, "27"=>48, "49"=>1, "44"=>4, "43"=>14, "42"=>34, "48"=>2, "40"=>9, "41"=>10, "11"=>1, "17"=>15, "12"=>1} However, I'd like to sort that hash by key. I've tried the solutions listed here. I believe my problem is related to the fact they keys are strings. The closest I got was using Hash[h.sort_by{|k,v| k.to_i}]
Hashes shouldn't be treated as a sorted data structure. They have other advantages and use case as to return their values sequentially. As Mladen Jablanović already pointed out a array of tuples might be the better data structure when you need a sorted key/value pair. But in current versions of Ruby there actually exists a certain order in which key/value pairs are returned when you call for example each on a hash and that is the order of insertion. Using this behavior you can just build a new hash and insert all key/value pairs into that new hash in the order you want them to be. But keep in mind that the order will break when you add more entries later on. string = "27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41" sorted_number_count_tupels = string.split(','). group_by(&:itself). map { |k, v| [k, v.size] }. 
sort_by { |(k, v)| k.to_i } #=> [["1",2],["2",10],["3",8],["6",1],["11",1],["12",1],["17",15],["26",41],["27",48],["28",36],["29",21],["30",18],["32",31],["34",39],["36",23],["39",23],["40",9],["41",10],["42",34],["43",14],["44",4],["48",2],["49",1]] sorted_number_count_hash = sorted_number_count_tupels.to_h #=> { "1" => 2, "2" => 10, "3" => 8, "6" => 1, "11" => 1, "12" => 1, "17" => 15, "26" => 41, "27" => 48, "28" => 36, "29" => 21, "30" => 18, "32" => 31, "34" => 39, "36" => 23, "39" => 23, "40" => 9, "41" => 10, "42" => 34, "43" => 14, "44" => 4, "48" => 2, "49" => 1}
Suppose you started with
str = "27,2,2,2,41,26,26,26,48,48,41,6,11,1,41"
and created the following hash:
h = str.split(',').inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
#=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1}
I removed compact because the array str.split(',') contains only (possibly empty) strings, no nils. Before continuing, you may want to change this last step to
h = str.split(/\s*,\s*/).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1}
Splitting on the regex allows for the possibility of one or more spaces before or after each comma, and Enumerable#each_with_object avoids the need for that pesky ; h. (Notice the block variables are reversed.) Then
h.sort_by { |k,_| k.to_i }.to_h
#=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
creates a new hash that contains h's key-value pairs sorted by the integer representations of the keys. See Enumerable#sort_by. Notice we've created two hashes. Here's a way to do that by modifying h in place.
h.keys.sort_by(&:to_i).each { |k| h[k] = h.delete(k) }
#=> ["1", "2", "6", "11", "26", "27", "41", "48"] (each always returns the receiver)
h
#=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
Lastly, another alternative is to sort str.split(',') before creating the hash.
str.split(',').sort_by(&:to_i).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
Notes
compact
String#split cannot return a nil element, so compact won't be useful here. split might return an empty string, though:
p "1,,2,3".split(',') # ["1", "", "2", "3"]
p "1,,2,3".split(',').compact # ["1", "", "2", "3"]
p "1,,2,3".split(',').reject(&:empty?) # ["1", "2", "3"]
inject
If you have to use two statements inside an inject block, each_with_object might be a better idea:
arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
can be rewritten as:
arr.compact.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
Hash or Array?
If you need to sort results, an Array of pairs might be more suitable than a Hash.
String or Integer?
If you can accept integers as keys, it might make your code easier to write.
Refactoring
Here's one way to rewrite your code:
str.split(',')
  .reject(&:empty?)
  .map(&:to_i)
  .group_by(&:itself)
  .map { |k, v| [k, v.size] }
  .sort
It outputs:
[[1, 2], [2, 10], [3, 8], [6, 1], [11, 1], [12, 1], [17, 15], [26, 41], [27, 48], [28, 36], [29, 21], [30, 18], [32, 31], [34, 39], [36, 23], [39, 23], [40, 9], [41, 10], [42, 34], [43, 14], [44, 4], [48, 2], [49, 1]]
If you really want a Hash, you can add .to_h:
{1=>2, 2=>10, 3=>8, 6=>1, 11=>1, 12=>1, 17=>15, 26=>41, 27=>48, 28=>36, 29=>21, 30=>18, 32=>31, 34=>39, 36=>23, 39=>23, 40=>9, 41=>10, 42=>34, 43=>14, 44=>4, 48=>2, 49=>1}
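You can assign the result of arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } to a variable and sort its keys:
num = arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
num.keys.sort
This returns the keys as a sorted array; it does not reorder the hash itself. Because the keys are strings, they are compared as strings rather than as numbers, so use num.keys.sort_by(&:to_i) for numeric order.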
A Ruby hash keeps the order in which its keys were added. If the array is small enough to sort, I would just change str.split(',') to str.split(',').sort_by(&:to_i) so that the values, and therefore also your hash, come out sorted.
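On newer Rubies the counting step itself can be shortened. The sketch below assumes Ruby 2.7 or later, where Enumerable#tally is available, and reuses the shortened sample string from one of the answers above; counts and sorted are illustrative names.

```ruby
# Assumes Ruby >= 2.7 (Enumerable#tally).
str = "27,2,2,2,41,26,26,26,48,48,41,6,11,1,41"

# tally counts occurrences; sort_by orders the pairs by numeric key value;
# to_h rebuilds a hash whose insertion order is the sorted order.
counts = str.split(',').tally
sorted = counts.sort_by { |k, _| k.to_i }.to_h
```

As noted above, the ordering only holds until more keys are inserted into sorted.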
Join an array with a block natively
Is there a native way to join all elements of an array into a single element, like so:
[ {a: "a"}, {b: "b"} ].join do | x, y | x.merge(y) end
to output something like:
{ a: "a", b: "b" }
The fact that I used hashes in my array is just an example. I could also write:
[ 0, 1, 2, 3 ].join do | x, y | x + y end
and end up with 6 as the value.
Enumerable#inject covers both of these cases:
a = [{a: "a"}, {b: "b"}]
a.inject(:merge) #=> {:a=>"a", :b=>"b"}
b = [0, 1, 2, 3]
b.inject(:+) #=> 6
inject "sums" an array using the provided method. In the first case, the "addition" of the sum and the current element is done by merging, and in the second case, through addition.
If the array is empty, inject returns nil. To make it return something else, specify an initial value (thanks @Hellfar):
[].inject(0, :+) #=> 0
[ {a: "a"}, {b: "b"} ].inject({}){|sum, e| sum.merge e}
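The difference between seeding inject and not seeding it shows up with an empty array. This is a small sketch with illustrative variable names: with an initial value the result is that value, while without one the result is nil.

```ruby
hashes = [{a: "a"}, {b: "b"}]

# With an initial value, the block form behaves sensibly on empty input.
merged       = hashes.inject({}) { |acc, h| acc.merge(h) }
merged_empty = [].inject({}) { |acc, h| acc.merge(h) }

# Without an initial value, an empty receiver yields nil.
unseeded_empty = [].inject(:merge)

# For numeric addition, the seeded form returns 0 for an empty array.
total       = [0, 1, 2, 3].inject(0, :+)
total_empty = [].inject(0, :+)
```

Choosing {} or 0 as the seed means callers do not need a separate nil check.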
Accessing values in nested hash
I'm trying to filter nested hashes and pull various keys and values. Here is the hash I'm looking at:
exp = {
  fam: {cty: "bk", ins: 3},
  spec: {cty: "man", ins: 2},
  br: {cty: "qns", ins: 1},
  aha: {cty: "man", ins: 0}
}
I'm trying to find all the hash keys where cty is "man". I'd like to run something where the result is the following hash:
e = { spec: {cty: "man", ins: 2}, aha: {cty: "man", ins: 0} }
I tried this and it seems like it almost works:
exp.each do |e, c, value|
  c = :cty.to_s
  value = "man"
  if e[c] == value
    puts e
  end
end
But the result I get is:
=> true
instead of what I'm looking for:
e = { spec: {cty: "man", ins: 2}, aha: {cty: "man", ins: 0} }
To start, you need to understand what iterating over a hash will give you. Consider this:
exp = {
  fam: {cty: "bk", ins: 3},
  spec: {cty: "man", ins: 2},
  br: {cty: "qns", ins: 1},
  aha: {cty: "man", ins: 0}
}
exp.map { |e, c, value| [e, c, value] }
# => [[:fam, {:cty=>"bk", :ins=>3}, nil], [:spec, {:cty=>"man", :ins=>2}, nil], [:br, {:cty=>"qns", :ins=>1}, nil], [:aha, {:cty=>"man", :ins=>0}, nil]]
This is basically what you're doing as you loop and Ruby passes the block the key/value pairs. You're telling Ruby to give you the current hash key in e, the current hash value in c and, since there's nothing else being passed in, the value parameter becomes nil.
Instead, you need one block variable for the key and one for the value:
exp.map { |k, v| [k, v] }
# => [[:fam, {:cty=>"bk", :ins=>3}], [:spec, {:cty=>"man", :ins=>2}], [:br, {:cty=>"qns", :ins=>1}], [:aha, {:cty=>"man", :ins=>0}]]
Notice that the nil values are gone.
Rewriting your code taking that into account, plus refactoring it for simplicity:
exp = {
  fam: {cty: 'bk', ins: 3},
  spec: {cty: 'man', ins: 2},
  br: {cty: 'qns', ins: 1},
  aha: {cty: 'man', ins: 0}
}
exp.each do |k, v|
  if v[:cty] == 'man'
    puts k
  end
end
# >> spec
# >> aha
Now it's printing the keys you want, so it becomes easy to grab the entire hashes. select is the appropriate method to use when you're trying to locate specific things:
e = exp.select { |k, v| v[:cty] == 'man' }
# => {:spec=>{:cty=>"man", :ins=>2}, :aha=>{:cty=>"man", :ins=>0}}
Older versions of Ruby didn't maintain hash output from the hash iterators, so we'd have to coerce back to a hash:
e = exp.select { |k, v| v[:cty] == 'man' }.to_h
# => {:spec=>{:cty=>"man", :ins=>2}, :aha=>{:cty=>"man", :ins=>0}}
e = {}
exp.each do |k,v|
  if v[:cty] == "man"
    e[k] = v
  end
end
p e
or even
e = exp.select do |k,v|
  v[:cty] == "man"
end
As the Tin Man pointed out, there are two parameters you can pass in to a block (the code between do and end in this case) when iterating through a hash: one for its key and the other for its value.
To iterate through a hash (and print out its values)
h = { a: "hello", b: "bonjour", c: "hola" }
using the .each method, you can do:
h.each do |key, value|
  puts value
end
The result will be:
hello
bonjour
hola
=> {:a=>"hello", :b=>"bonjour", :c=>"hola"}
Please note that the value "returned" is the hash we iterated through, which evaluates to true in Ruby. (Anything other than nil or false will evaluate to true in Ruby. See What evaluates to false in Ruby?)
This is important because the reason you got true in your code (which in fact should be {:fam=>{:cty=>"bk", :ins=>3}, :spec=>{:cty=>"man", :ins=>2}, :br=>{:cty=>"qns", :ins=>1}, :aha=>{:cty=>"man", :ins=>0}}), rather than the filtered hash you wanted, is that the value returned by the .each method for a hash is the hash itself (which evaluates to true).
That is why osman created an empty hash e = {}, so that during each iteration of the hash we can populate the newly created hash e with the key and value we want.
This also explains why he can do:
e = exp.select do |k,v|
  v[:cty] == "man"
end
Here the code depends on the select method returning a new hash with the keys and values we want (rather than the original hash, as is the case for the .each method).
But if you do
e = exp.each do |k,v|
  v[:cty] == "man"
end
the variable e will be assigned the original hash exp itself, which is not what we want. Therefore it's very important to understand what the returned value is when applying a method.
For more information about return values (and Ruby in general), I highly recommend the free e-book from LaunchSchool "Introduction to Programming with Ruby" (https://launchschool.com/books/ruby).
This not only helped me recognise the importance of return values, but also gave me a solid foundation in Ruby programming in general, which is really useful if you are planning to learn Ruby on Rails (which I'm doing now :)).
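The point about return values can be checked directly. The following sketch uses the question's exp hash; from_each and from_select are illustrative names. It compares what each and select hand back when given the same block.

```ruby
exp = {
  fam:  {cty: "bk",  ins: 3},
  spec: {cty: "man", ins: 2},
  br:   {cty: "qns", ins: 1},
  aha:  {cty: "man", ins: 0}
}

# each ignores the block's result and returns the receiver itself.
from_each = exp.each { |_k, v| v[:cty] == "man" }

# select builds a new hash from the pairs for which the block is truthy.
from_select = exp.select { |_k, v| v[:cty] == "man" }

same_object   = from_each.equal?(exp)   # identity check, not just equality
selected_keys = from_select.keys
```

Only from_select contains the filtered pairs; from_each is the very same object as exp.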
The simplest way to dig into a nested hash is:
class Hash
  def deep_find(key, object=self, found=nil)
    if object.respond_to?(:key?) && object.key?(key)
      return object[key]
    elsif object.is_a? Enumerable
      object.find { |*a| found = deep_find(key, a.last) }
      return found
    end
  end
end
# deep_find is an instance method, so call it on a hash instance
# (hash here stands for your nested hash):
hash.deep_find(key)
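If reopening the core Hash class is not desirable, the same depth-first search can be written as a standalone method. This is only a sketch of an alternative, not the answerer's code: the two-argument signature is an assumption, it only descends into Hash and Array values, and it returns the first non-nil value found for the key.

```ruby
# Standalone variant (hypothetical; does not modify the Hash class).
# Returns the first value found for `key`, searching depth-first, or nil.
def deep_find(object, key)
  case object
  when Hash
    return object[key] if object.key?(key)
    object.each_value do |value|
      found = deep_find(value, key)
      return found unless found.nil?
    end
    nil
  when Array
    object.each do |element|
      found = deep_find(element, key)
      return found unless found.nil?
    end
    nil
  end
end

exp = {
  fam:  {cty: "bk",  ins: 3},
  spec: {cty: "man", ins: 2}
}

first_cty = deep_find(exp, :cty)
missing   = deep_find(exp, :nope)
nested    = deep_find({a: [{b: {c: 1}}]}, :c)
```

Note that this finds a single value by key; for the question's goal of filtering by a nested value, select remains the better fit.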