Sorting specific objects in an array - ruby

I'm new to Ruby and looking to sort only certain items in my collection.
For example, given the following array, I only want to sort the objects whose type property is 'sort':
object = [{
type: 'sort',
id: 3
}, {
type: 'notsort',
id: 4
}, {
type: 'sort',
id: 1
}, {
type: 'sort',
id: 0
}
]
I need the order of those objects to follow the id order below.
sortIdOrder = [0, 1, 3]
The end result should look like:
object = [{
type: 'notsort',
id: 4
}, {
type: 'sort',
id: 0
},{
type: 'sort',
id: 1
}, {
type: 'sort',
id: 3
}]
As you can see, the 'sort' objects are ordered by id according to sortIdOrder. The 'notsort' objects can go either at the start or at the end.

Sorting can be expensive, so one should not sort when the desired order is known, as it is here.
I've assumed that the :id values are unique, as the question would not make sense otherwise.
First partition the hashes into those to be sorted and the rest.
sortees, nonsortees = object.partition { |h| h[:type] == 'sort' }
#=> [[{:type=>"sort", :id=>3}, {:type=>"sort", :id=>1}, {:type=>"sort", :id=>0}],
# [{:type=>"notsort", :id=>4}]]
so
sortees
#=> [{:type=>"sort", :id=>3}, {:type=>"sort", :id=>1}, {:type=>"sort", :id=>0}]
nonsortees
#=> [{:type=>"notsort", :id=>4}]
I'll put the elements of sortees in the desired order and then concatenate that array with nonsortees, placing the hashes that are not to be sorted at the end.
I order the elements of sortees by creating a hash with one key-value pair g[:id]=>g for each element g (a hash) of sortees. That allows me to use Hash#values_at to pull out the desired hashes in the specified order.
sortees.each_with_object({}) { |g,h| h[g[:id]] = g }.
values_at(*sortIdOrder).
concat(nonsortees)
#=> [{:type=>"sort", :id=>0}, {:type=>"sort", :id=>1}, {:type=>"sort", :id=>3},
# {:type=>"notsort", :id=>4}]
Note that
sortees.each_with_object({}) { |g,h| h[g[:id]] = g }
#=> {3=>{:type=>"sort", :id=>3}, 1=>{:type=>"sort", :id=>1},
# 0=>{:type=>"sort", :id=>0}}
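If the hashes that are not to be sorted should instead come first, the same pieces can simply be prepended rather than appended (a small variation on the steps above, reusing sortees, nonsortees and sortIdOrder as defined earlier):
ordered = sortees.each_with_object({}) { |g,h| h[g[:id]] = g }.values_at(*sortIdOrder)
nonsortees + ordered
#=> [{:type=>"notsort", :id=>4}, {:type=>"sort", :id=>0},
#    {:type=>"sort", :id=>1}, {:type=>"sort", :id=>3}]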

A not very performant one-liner:
object.sort_by{|o| sortIdOrder.index(o[:id]) || -1}
This makes the notsort objects appear at the head of the sorted array. Since sort_by computes the key once per element and each key computation scans sortIdOrder, it is roughly an O(n*m + n*log(n)) approach, where n is the size of object and m is the size of sortIdOrder. That is fine when object and sortIdOrder are small.
A more performant one for large arrays is
order = sortIdOrder.each.with_index.with_object(Hash.new(-1)) {|(id, index), h| h[id] = index}
object.sort_by{|o| order[o[:id]]}
This is an O(m + nlog(n)) algorithm but requires more memory.
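For the sample data in the question, the precomputed lookup hash and the final result look like this (shown only as an illustration; ids missing from sortIdOrder fall back to the default of -1 and therefore sort first):
order
#=> {0=>0, 1=>1, 3=>2}
object.sort_by{|o| order[o[:id]]}
#=> [{:type=>"notsort", :id=>4}, {:type=>"sort", :id=>0},
#    {:type=>"sort", :id=>1}, {:type=>"sort", :id=>3}]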

You can use sort with a block that sorts by :type, then :id. (Note that this orders by the raw :id rather than by sortIdOrder; the two happen to coincide here because sortIdOrder lists the ids in ascending order.)
object.sort {|a, b| [a[:type], a[:id]] <=> [b[:type], b[:id]] }
[{:type=>"notsort", :id=>4},
{:type=>"sort", :id=>0},
{:type=>"sort", :id=>1},
{:type=>"sort", :id=>3}]

I'd go with something like this:
object.sort_by do |o|
  [
    (o[:type] == 'sort') ? 0 : 1,
    sortIdOrder.index(o[:id]) || sortIdOrder.length
  ]
end
When sorting by an array, you're essentially sorting by the first element, except when two first elements are equal, in which case you sort by the second element, and so on. In the code above, (o[:type] == 'sort') ? 0 : 1 ensures that everything with a type of 'sort' comes first and everything else after, even if the type is nil, or 5, or whatever you like (note the comparison is against the string 'sort', since that is what the data contains). The sortIdOrder.index(o[:id]) term orders the 'sort' items the way you want, and the || sortIdOrder.length fallback places items with no :id, or whose :id is not found in sortIdOrder, after the rest of their group instead of raising a comparison error. If your data set is very large, you may want to tweak this further so that the sortIdOrder lookup is skipped entirely for non-sort items.
Enumerable#sort_by only has to call the block once for each element, and then perform fast comparisons on the results; Enumerable#sort has to call the block on pairs of elements, which means it's called more often:
irb(main):015:0> ary = %w{9 8 7 6 5 4 3 2 1}
=> ["9", "8", "7", "6", "5", "4", "3", "2", "1"]
irb(main):016:0> a = 0; ary.sort_by {|x| puts x; a+=1; x.to_i }; puts "Total: #{a}"
9
8
7
6
5
4
3
2
1
Total: 9
=> nil
irb(main):017:0> a = 0; ary.sort {|x,y| puts "#{x},#{y}"; a+=1; x.to_i <=> y.to_i }; puts "Total: #{a}"
9,5
5,1
8,5
2,5
7,5
3,5
6,5
4,5
6,8
8,9
7,8
6,7
1,3
3,4
2,3
1,2
Total: 16
=> nil
In these cases, it's really not very important, because hash access is fast anyway (though sort_by is still more legible), but in cases where calculating the attribute you want to sort by is even moderately expensive, sort_by can be quite a bit faster. The block form of sort is mostly useful if the comparison logic itself is complicated.
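To make the difference concrete, here is a minimal sketch (not from the original answer; the array size and the deliberately slow key function are arbitrary illustrations) that benchmarks both forms when computing the sort key is expensive:
require 'benchmark'

words = Array.new(5_000) { rand(1_000_000).to_s }
slow_key = ->(s) { 50.times.reduce(s.to_i) { |acc, _| acc * 1 } } # stand-in for a costly key calculation

Benchmark.bm(7) do |bm|
  bm.report('sort_by') { words.sort_by { |w| slow_key.(w) } }                   # key computed n times
  bm.report('sort')    { words.sort { |a, b| slow_key.(a) <=> slow_key.(b) } }  # key computed for every comparison
end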

Maybe I'm late, but my solution is:
Rails Solution:
object.partition { |hash| hash[:id].in?(sortIdOrder) }.flatten.reverse
Ruby Solution:
object.partition { |hash| sortIdOrder.include? hash[:id] }.flatten.reverse
Both of these give the desired result for the example data (note that the ordering comes from reversing the partitioned array rather than from sortIdOrder itself; it works here because the 'sort' hashes happen to appear in descending id order in the input):
=> [{:type=>"notsort", :id=>4},
{:type=>"sort", :id=>0},
{:type=>"sort", :id=>1},
{:type=>"sort", :id=>3}]

Related

Is there any way to check if hashes in an array contain similar key-value pairs in Ruby?

For example, I have
array = [ {name: 'robert', nationality: 'asian', age: 10},
{name: 'robert', nationality: 'asian', age: 5},
{name: 'sira', nationality: 'african', age: 15} ]
I want to get the result as
array = [ {name: 'robert', nationality: 'asian', age: 15},
{name: 'sira', nationality: 'african', age: 15} ]
since there are two Roberts with the same nationality.
Any help would be much appreciated.
I have tried array.uniq! {|e| e[:name] && e[:nationality] }, but I also want to add the numbers from the two hashes, i.e. 10 + 5.
P.S.: The array can have any number of hashes.
I would start with something like this:
array = [
{ name: 'robert', nationality: 'asian', age: 10 },
{ name: 'robert', nationality: 'asian', age: 5 },
{ name: 'sira', nationality: 'african', age: 15 }
]
array.group_by { |e| e.values_at(:name, :nationality) }
.map { |_, vs| vs.first.merge(age: vs.sum { |v| v[:age] }) }
#=> [
# {
# :name => "robert",
# :nationality => "asian",
# :age => 15
# }, {
# :name => "sira",
# :nationality => "african",
# :age => 15
# }
# ]
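For clarity, the intermediate grouping produced by group_by looks like this for the sample array (shown only as an illustration of the step above):
array.group_by { |e| e.values_at(:name, :nationality) }
#=> {["robert", "asian"]=>[{:name=>"robert", :nationality=>"asian", :age=>10},
#                          {:name=>"robert", :nationality=>"asian", :age=>5}],
#    ["sira", "african"]=>[{:name=>"sira", :nationality=>"african", :age=>15}]}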
Let's take a look at what you want to accomplish and go from there. You have a list of some objects, and you want to merge certain objects together if they have the same name and nationality. So we have a key by which we will merge. Let's put that in programming terms.
key = proc { |x| [x[:name], x[:nationality]] }
We've defined a procedure which takes a hash and returns its "key" value. If this procedure returns the same value (according to eql?) for two hashes, then those two hashes need to be merged together. Now, what do we mean by "merge"? You want to add the ages together, so let's write a merge function.
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
If we have two values x and y such that key[x] and key[y] are the same, we want to merge them by making a copy of x and adding y's age to it. That's exactly what this procedure does. Now that we have our building blocks, we can write the algorithm.
We want to produce an array at the end, after merging using the key procedure we've written. Fortunately, Ruby has a handy function called each_with_object which will do something very nice for us. The method each_with_object will execute its block for each element of the array, passing in a predetermined value as the other argument. This will come in handy here.
result = array.each_with_object({}) do |x, hsh|
# ...
end.values
Since we're using keys and values to do the merge, the most efficient way to do this is going to be with a hash. Hence, we pass in an empty hash as the extra object, which we'll modify to accumulate the merge results. At the end, we don't care about the keys anymore, so we write .values to get just the objects themselves. Now for the final pieces.
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
Let's break this down. If the hash already includes key[x], which is the key for the object x that we're looking at, then we want to merge x with the value that is currently at key[x]. This is where we add the ages together. This approach works cleanly because the merge operation is associative (in mathematical terms, it forms a semigroup). You don't need to worry too much about that; addition is a textbook example of an associative operation, so it works here.
Anyway, if the key doesn't exist in the hash, we want to put the current value in the hash at the key position. The resulting hash from merging is returned, and then we can get the values out of it to get the result you wanted.
key = proc { |x| [x[:name], x[:nationality]] }
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
result = array.each_with_object({}) do |x, hsh|
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
end.values
Now, my complexity theory is a bit rusty, but if Ruby implements its hash type efficiently (which I'm fairly certain it does), then this merge algorithm is O(n), which means it will take a linear amount of time to finish, given the problem size as input.
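Running the assembled snippet on the sample array gives the expected output (shown here for completeness):
result
#=> [{:name=>"robert", :nationality=>"asian", :age=>15},
#    {:name=>"sira", :nationality=>"african", :age=>15}]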
array.each_with_object(Hash.new(0)) { |g,h| h[[g[:name], g[:nationality]]] += g[:age] }.
map { |(name, nationality),age| { name:name, nationality:nationality, age:age } }
[{ :name=>"robert", :nationality=>"asian", :age=>15 },
{ :name=>"sira", :nationality=>"african", :age=>15 }]
The two steps are as follows.
a = array.each_with_object(Hash.new(0)) { |g,h| h[[g[:name], g[:nationality]]] += g[:age] }
#=> { ["robert", "asian"]=>15, ["sira", "african"]=>15 }
This uses the class method Hash::new to create a hash with a default value of zero (the hash is represented by the block variable h). Once this hash has been obtained, it is a simple matter to construct the desired hash:
a.map { |(name, nationality),age| { name:name, nationality:nationality, age:age } }
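For the sample array this returns the result shown at the top of the answer:
#=> [{ :name=>"robert", :nationality=>"asian", :age=>15 },
#    { :name=>"sira", :nationality=>"african", :age=>15 }]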

How to get the highest and the lowest points in the hash?

#participants[id] = {nick: nick, points: 1}
=> {"1"=>{:nick=>"Test", :points=>3}, "30"=>{:nick=>"AnotherTest", :points=>5}, "20"=>{:nick=>"Newtest", :points=>3}}
I want the entries with the lowest points (IDs 1 and 20) to come first and ID 30 to come last. How do I do that?
If you use Enumerable#max_by or Enumerable#min_by you can do the following:
data = {
"1" => {nick: "U1", points: 3},
"30" => {nick: "U30", points: 5},
"20" => {nick: "U20", points: 3}
}
max_id, max_data = data.max_by {|k,v| v[:points]}
puts max_id # => 30
puts max_data # => {nick: "U30", points: 5}
Same thing works with #min_by() and if you want to get back Hash you do this:
minimal = Hash[*data.min_by {|k,v| v[:points]}]
puts minimal # => {"1"=>{:nick=>"U1", :points=>3}}
The methods min_by and max_by will always return one record. If you want to get all records with the same points, you have to take the min/max data and do another "lookup" like this:
min_id, min_data = data.min_by {|k,v| v[:points]}
all_minimal = data.select {|k,v| v[:points] == min_data[:points]}
puts all_minimal
# => {"1"=>{:nick=>"U1", :points=>3}, "20"=>{:nick=>"U20", :points=>3}}
Hashes are not a suitable data structure for this operation; they are meant for looking up a value by key in O(1) time.
You are better off using a sorted array or a tree if you are interested in comparisons, or a heap if you only care about the maximum or minimum value, as #Vadim suggested.
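A quick sketch of the sorted-array idea, using the data hash from the answer above (the relative order of entries with equal points is not guaranteed):
sorted = data.sort_by { |_, v| v[:points] }
sorted.first #=> one of the lowest-points entries, e.g. ["1", {:nick=>"U1", :points=>3}]
sorted.last  #=> ["30", {:nick=>"U30", :points=>5}]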
Use Enumerable#minmax_by:
h = { "1"=>{:nick=>"Test", :points=>3},
"30"=>{:nick=>"AnotherTest", :points=>5},
"20"=>{:nick=>"Newtest", :points=>3}}
h.minmax_by { |_,g| g[:points] }
#=> [[ "1", {:nick=>"Test", :points=>3}],
# ["30", {:nick=>"AnotherTest", :points=>5}]]
You can work around the fact that min_by and max_by return only one result by combining sort_by and chunk:
data.sort_by{|_,v| v[:points]}.chunk{|(_,v)| v[:points]}.first.last.map(&:first)
#=> ["1", "20"]

Count array object types in one iteration

I have parsed JSON with this format:
{"body" => [{"type" => "user", ...}, {"type" => "admin", ...}]}
I want to count the objects by type, but I don't want to iterate the array three times (that's how many different types I have), so this won't do:
#user_count = json["body"].count{|a| a['type'] == "user"}
#admin_count = json["body"].count{|a| a['type'] == "admin"}
...
Is there a smart way to count the object types without doing an .each block and using if statements?
You could use each_with_object to create a hash with type => count pairs:
json['body'].each_with_object(Hash.new(0)) { |a, h| h[a['type']] += 1 }
#=> {"user"=>5, "admin"=>7, ...}
You can store counts into a hash using one each:
counts = { "user" => 0, "admin" => 0, "whatever" => 0 }
json["body"].each do |a|
counts[a.type] += 1
end
counts["user"] #=> 1
counts["admin"] #=> 2
counts["whatever"] #=> 3

Ruby: Sorting an array of strings, in alphabetical order, that includes some arrays of strings

Say I have:
a = ["apple", "pear", ["grapes", "berries"], "peach"]
and I want to sort by:
a.sort_by do |f|
f.class == Array ? f.to_s : f
end
I get:
[["grapes", "berries"], "apple", "peach", "pear"]
Where I actually want the items in alphabetical order, with array items being sorted on their first element:
["apple", ["grapes", "berries"], "peach", "pear"]
or, preferably, I want:
["apple", "grapes, berries", "peach", "pear"]
If the example isn't clear enough, I'm looking to sort the items in alphabetical order.
Any suggestions on how to get there?
I've tried a few things so far but can't seem to get there. Thanks.
I think this is what you want:
a.sort_by { |f| f.class == Array ? f.first : f }
I would do
a = ["apple", "pear", ["grapes", "berries"], "peach"]
a.map { |e| Array(e).join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Array#sort_by clearly is the right method, but here's a reminder of how Array#sort would be used here:
a.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end.map {|e| (e.is_a? Array) ? e.join(', ') : e }
#=> ["apple", "grapes, berries", "peach", "pear"]
#theTinMan pointed out that sort is quite a bit slower than sort_by here, and gave a reference that explains why. I've been meaning to see how the Benchmark module is used, so took the opportunity to compare the two methods for the problem at hand. I used #Rafa's solution for sort_by and mine for sort.
For testing, I constructed an array of 100 random samples (each with 10,000 random elements to be sorted) in advance, so the benchmarks would not include the time needed to construct the samples (which was not insignificant). 8,000 of the 10,000 elements were random strings of 8 lowercase letters. The other 2,000 elements were 2-tuples of the form [str1, str2], where str1 and str2 were each random strings of 8 lowercase letters. I benchmarked with other parameters, but the bottom-line results did not vary significantly.
require 'benchmark'
# n: total number of items to sort
# m: number of two-tuples [str1, str2] among n items to sort
# n-m: number of strings among n items to sort
# k: length of each string in samples
# s: number of sorts to perform when benchmarking
def make_samples(n, m, k, s)
s.times.with_object([]) { |_, a| a << test_array(n,m,k) }
end
def test_array(n,m,k)
a = ('a'..'z').to_a
r = []
(n-m).times { r << a.sample(k).join }
m.times { r << [a.sample(k).join, a.sample(k).join] }
r.shuffle!
end
# Here's what the samples look like:
make_samples(6,2,4,4)
#=> [["bloj", "izlh", "tebz", ["lfzx", "rxko"], ["ljnv", "tpze"], "ryel"],
# ["jyoh", "ixmt", "opnv", "qdtk", ["jsve", "itjw"], ["pnog", "fkdr"]],
# ["sxme", ["emqo", "cawq"], "kbsl", "xgwk", "kanj", ["cylb", "kgpx"]],
# [["rdah", "ohgq"], "bnup", ["ytlr", "czmo"], "yxqa", "yrmh", "mzin"]]
n = 10000 # total number of items to sort
m = 2000 # number of two-tuples [str1, str2] (n-m strings)
k = 8 # length of each string
s = 100 # number of sorts to perform
samples = make_samples(n,m,k,s)
Benchmark.bm('sort_by'.size) do |bm|
bm.report 'sort_by' do
samples.each do |s|
s.sort_by { |f| f.class == Array ? f.first : f }
end
end
bm.report 'sort' do
samples.each do |s|
s.sort do |s1,s2|
t1 = (s1.is_a? Array) ? s1.first : s1
t2 = (s2.is_a? Array) ? s2.first : s2
t1 <=> t2
end
end
end
end
user system total real
sort_by 1.360000 0.000000 1.360000 ( 1.364781)
sort 4.050000 0.010000 4.060000 ( 4.057673)
Though it was never in doubt, #theTinMan was right! I did a few other runs with different parameters, but sort_by consistently thumped sort by similar performance ratios.
Note the "system" time is zero for sort_by. In other runs it was sometimes zero for sort. The values were always zero or 0.010000, leading me to wonder what's going on there. (I ran these on a Mac.)
For readers unfamiliar with Benchmark, Benchmark.bm takes an argument giving the width reserved for the row labels, which determines the left-padding of the header row (user system...). bm.report takes a row label as an argument.
You are really close. Just switch .to_s to .first.
irb(main):005:0> b = ["grapes", "berries"]
=> ["grapes", "berries"]
irb(main):006:0> b.to_s
=> "[\"grapes\", \"berries\"]"
irb(main):007:0> b.first
=> "grapes"
Here is one that works:
a.sort_by do |f|
f.class == Array ? f.first : f
end
Yields:
["apple", ["grapes", "berries"], "peach", "pear"]
a.map { |b| b.is_a?(Array) ? b.join(', ') : b }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Replace to_s with join.
a.sort_by do |f|
f.class == Array ? f.join : f
end
# => ["apple", ["grapes", "berries"], "peach", "pear"]
Or more concisely:
a.sort_by {|x| [*x].join }
# => ["apple", ["grapes", "berries"], "peach", "pear"]
The problem with to_s is that it converts your Array to a string that starts with "[":
"[\"grapes\", \"berries\"]"
which comes alphabetically before the rest of your strings.
join actually creates the string that you had expected to sort by:
"grapesberries"
which is alphabetized correctly, according to your logic.
If you don't want the arrays to remain arrays, then it's a slightly different operation, but you will still use join.
a.map {|x| [*x].join(", ") }.sort
# => ["apple", "grapes, berries", "peach", "pear"]
Sort a Flattened Array
If you just want all elements of your nested array flattened and then sorted in alphabetical order, all you need to do is flatten and sort. For example:
["apple", "pear", ["grapes", "berries"], "peach"].flatten.sort
#=> ["apple", "berries", "grapes", "peach", "pear"]

Convert cartesian product to nested hash in ruby

I have a structure with a cartesian product that looks like this (and could go out to arbitrary depth)...
variables = ["var1","var2",...]
myhash = {
{"var1"=>"a", "var2"=>"a", ...}=>1,
{"var1"=>"a", "var2"=>"b", ...}=>2,
{"var1"=>"b", "var2"=>"a", ...}=>3,
{"var1"=>"b", "var2"=>"b", ...}=>4,
}
... it has a fixed structure, but I'd like simple indexing, so I'm trying to write a method to convert it to this:
nested = {
"a"=> {
"a"=> 1,
"b"=> 2
},
"b"=> {
"a"=> 3,
"b"=> 4
}
}
Any clever ideas (that allow for arbitrary depth)?
Maybe like this (not the cleanest way, and note it is hard-coded to the two keys "var1" and "var2"):
def cartesian_to_map(myhash)
{}.tap do |hash|
myhash.each do |h|
(hash[h[0]["var1"]] ||= {}).merge!({h[0]["var2"] => h[1]})
end
end
end
Result:
puts cartesian_to_map(myhash).inspect
{"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
Here is my example.
It uses a method index(hash, fields) that takes the hash, and the fields you want to index by.
It's dirty, and uses a local variable to pass up the current level in the index.
I bet you can make it much nicer.
def index(hash, fields)
# store the last index of the fields
last_field = fields.length - 1
# our indexed version
indexed = {}
hash.each do |key, value|
# our current point in the indexed hash
point = indexed
fields.each_with_index do |field, i|
key_field = key[field]
if i == last_field
point[key_field] = value
else
# ensure the next point is a hash
point[key_field] ||= {}
# move our point up
point = point[key_field]
end
end
end
# return our indexed hash
indexed
end
You can then just call
index(myhash, ["var1", "var2"])
And it should look like what you want
index({
{"var1"=>"a", "var2"=>"a"} => 1,
{"var1"=>"a", "var2"=>"b"} => 2,
{"var1"=>"b", "var2"=>"a"} => 3,
{"var1"=>"b", "var2"=>"b"} => 4,
}, ["var1", "var2"])
==
{
"a"=> {
"a"=> 1,
"b"=> 2
},
"b"=> {
"a"=> 3,
"b"=> 4
}
}
It seems to work.
(see it as a gist
https://gist.github.com/1126580)
Here's an ugly-but-effective solution:
nested = Hash[ myhash.group_by{ |h,n| h["var1"] } ].tap{ |nested|
nested.each do |v1,a|
nested[v1] = a.group_by{ |h,n| h["var2"] }
nested[v1].each{ |v2,a| nested[v1][v2] = a.flatten.last }
end
}
p nested
#=> {"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
You might consider an alternative representation that is easier to map to and (IMO) just as easy to index:
paired = Hash[ myhash.map{ |h,n| [ [h["var1"],h["var2"]], n ] } ]
p paired
#=> {["a", "a"]=>1, ["a", "b"]=>2, ["b", "a"]=>3, ["b", "b"]=>4}
p paired[["a","b"]]
#=> 2
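Since the question mentions an arbitrary list of variables, that alternative representation can also be built generically from the variables array (a sketch, assuming variables = ["var1", "var2"] as in the question):
paired = Hash[ myhash.map { |h,n| [h.values_at(*variables), n] } ]
paired[["a","b"]]
#=> 2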
