Sorting By Multiple Conditions in Ruby

I have a collection of Post objects and I want to be able to sort them based on these conditions:
First, by category (news, events, labs, portfolio, etc.)
Then by date, if the post has one, or by position, if a specific index was set for it
Some posts will have dates (news and events), others will have explicit positions (labs, and portfolio).
I want to be able to call posts.sort!, so I've overridden <=>, but am looking for the most effective way of sorting by these conditions. Below is a pseudo method:
def <=>(other)
  # first, everything is sorted into
  # smaller chunks by category
  self.category <=> other.category
  # then, per category, by date or position
  if self.date and other.date
    self.date <=> other.date
  else
    self.position <=> other.position
  end
end
It seems like I'd have to actually sort two separate times, rather than cramming everything into that one method. Something like sort_by_category, then sort!. What is the most ruby way to do this?

You should always sort by the same criteria to ensure a meaningful order. If comparing two nil dates, it is fine for the position to decide the order, but if comparing one nil date with a set date, you have to decide which goes first, irrespective of the position (for example by mapping nil to a day way in the past).
Otherwise imagine the following:
a.date = nil ; a.position = 1
b.date = Time.now - 1.day ; b.position = 2
c.date = Time.now ; c.position = 0
By your original criteria, you would have: a < b (by position), b < c (by date), and c < a (by position again). So, which one is the smallest?
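The cycle is easy to reproduce. The following is only a sketch: Item is a hypothetical stand-in for Post, and the one-day offset is written in plain seconds instead of ActiveSupport's 1.day.

```ruby
# Hypothetical stand-in for Post; only the two fields that matter here.
Item = Struct.new(:name, :date, :position) do
  # The comparison from the question: dates when both are set, else positions.
  def <=>(other)
    if date && other.date
      date <=> other.date
    else
      position <=> other.position
    end
  end
end

now = Time.now
a = Item.new("a", nil, 1)
b = Item.new("b", now - 86_400, 2)   # one day earlier, in seconds
c = Item.new("c", now, 0)

puts a <=> b   # -1: a sorts before b (by position)
puts b <=> c   # -1: b sorts before c (by date)
puts c <=> a   # -1: c sorts before a (by position), which closes the cycle
```

With a comparison like this, the result of sort depends on which pairs happen to be compared.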
You also want to do the sort at once. For your <=> implementation, use #nonzero?:
def <=>(other)
  return nil unless other.is_a?(Post)
  # AGES_AGO must be defined elsewhere as a date earlier than any real post date
  (self.category <=> other.category).nonzero? ||
    ((self.date || AGES_AGO) <=> (other.date || AGES_AGO)).nonzero? ||
    (self.position <=> other.position).nonzero? ||
    0
end
If you use your comparison criteria just once, or if those criteria are not universal and you thus don't want to define <=>, you could use sort with a block:
post_ary.sort{|a, b| (a.category <=> ...).nonzero? || ... }
Better still, there is sort_by and sort_by! which you can use to build an array for what to compare in which priority:
post_ary.sort_by{|a| [a.category, a.date || AGES_AGO, a.position] }
Besides being shorter, sort_by has the advantage that the resulting order is always consistent: every element is reduced to a key, and the keys are compared the same way each time.
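As a rough, self-contained illustration of the sort_by approach: the Post struct, the sample data and the value chosen for AGES_AGO below are all assumptions made for the sketch, not part of the original question.

```ruby
require "date"

# Hypothetical stand-in for the question's Post class.
Post = Struct.new(:title, :category, :date, :position)

# Sentinel for posts without a date; any date earlier than all real ones works.
AGES_AGO = Date.new(1900, 1, 1)

posts = [
  Post.new("launch party", "events", Date.new(2011, 5, 2), 0),
  Post.new("robot arm",    "labs",   nil,                  2),
  Post.new("first event",  "events", Date.new(2011, 1, 9), 0),
  Post.new("laser maze",   "labs",   nil,                  1),
]

# Arrays compare element by element, so the key order is the priority order.
sorted = posts.sort_by { |post| [post.category, post.date || AGES_AGO, post.position] }

puts sorted.map(&:title)
# first event
# launch party
# laser maze
# robot arm
```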
Notes:
sort_by! was introduced in Ruby 1.9.2. You can require 'backports/1.9.2/array/sort_by' to use it with older Rubies.
I'm assuming that Post is not a subclass of ActiveRecord::Base (in which case you'd want the sort to be done by the db server).

Alternatively, you could do the whole sort in one fell swoop by comparing arrays. The only gotcha is the case where one of the attributes is nil, although that can still be handled with an appropriate nil guard if you know the data set. Also, it's not clear from your pseudocode whether the date and position comparisons are listed in priority order or are alternatives (i.e. use date if it exists for both, else use position). The first solution assumes category, followed by date, followed by position:
def <=>(other)
  [self.category, self.date, self.position] <=> [other.category, other.date, other.position]
end
The second assumes it's date or position:
def <=>(other)
  if self.date && other.date
    [self.category, self.date] <=> [other.category, other.date]
  else
    [self.category, self.position] <=> [other.category, other.position]
  end
end

Related

Idiomatic lazy sorting by multiple criteria

In Ruby, the most common way to sort by multiple criteria is to use sort_by with the sorting function returning an array of the values corresponding to each sorting criterion, in order of decreasing importance, e.g.:
Dir["*"].sort_by { |f| [test(?s, f) || 0, test(?M, f), f] }
will sort the directory entries by size, then by mtime, then finally by the filename. This is efficient to the extent that it uses a Schwartzian transform to only calculate the size and mtime of each file once, not once per comparison. However it is not truly lazy, since it calculates the mtime for every single file, but if (say) every file in the directory had a different size, it should not be necessary to calculate any mtimes.
This is not a big problem in this case, since looking up the mtime immediately after looking up the size should be efficient due to caching at the kernel level (e.g. IIRC on Linux they both come from a stat(2) syscall), and I wouldn't be surprised if Ruby has its own optimizations too. But imagine if the second criterion was not the mtime, but (say) the number of occurrences of a string within the file, and the files in question are huge. In this case you'd really want lazy evaluation, to avoid reading the whole of these huge files if sorting by size is sufficient.
At the time of writing, the Wikibooks entry for Algorithm Implementation/Sorting/Schwartzian transform suggests this solution:
sorted_files =
  Dir["*"].                      # Get all files
    # compute tuples of name, size, modtime
    collect{|f| [f, test(?s, f), test(?M, f)]}.
    sort {|a, b|                 # sort
      a[1] <=> b[1] or           # -- by increasing size
      b[2] <=> a[2] or           # -- by age descending
      a[0] <=> b[0]              # -- by name
    }.collect{|a| a[0]}          # extract original name
This kind of approach is copied from Perl, where
sort {
$a->[1] <=> $b->[1] # sort first numerically by size (smallest first)
or $b->[2] <=> $a->[2] # then numerically descending by modtime age (oldest first)
or $a->[0] cmp $b->[0] # then stringwise by original name
}
works beautifully because Perl has a quirk where 0 or $foo evaluates to $foo. But in Ruby, it's broken because 0 or foo evaluates to 0. So in effect, the Wikibooks implementation totally ignores mtimes and filenames, and only sorts by size. I've dusted off my Wikibooks account so that I can fix this, but I'm wondering: what is the cleanest way of combining the results of multiple <=> spaceship operator comparisons in Ruby?
I'll give a concrete-ish example to clarify the question. Let's assume we have two types of evaluation which may be required as criteria during the sort. The first is relatively cheap:
def size(a)
  # get the size of file `a`, and if we're feeling keen,
  # memoize the results
  ...
end
The second is expensive:
def matches(a)
  # count the number of occurrences of a string
  # in file `a`, which could be a large file, and
  # memoize the results
  ...
end
And we want to sort first by size ascending, then descending by number of matches. We can't use a Schwartzian transform, because that would non-lazily call matches() on every item.
We could define a helper like
def nil_if_equal(result)
  result == 0 ? nil : result
end
and then do:
sort {|a, b|
  nil_if_equal(size(a) <=> size(b)) or
  matches(b) <=> matches(a)
}
If there are n criteria to sort by, then you'd need n-1 invocations of nil_if_equal here, since only the last sorting criterion doesn't require it.
So is there a more idiomatic way than this which can avoid the need for nil_if_equal?
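For what it's worth, Numeric#nonzero? (also used in the first thread above) returns nil for zero and the receiver otherwise, so it can stand in for nil_if_equal. A minimal sketch follows; the size and matches lambdas and the calls counter are hypothetical, added only to show that the expensive criterion is evaluated lazily.

```ruby
# In Ruby, 0 is truthy, so `or` / `||` never falls through on a tie.
puts((0 || 5).inspect)            # 0
# nonzero? returns nil for 0 and the receiver otherwise.
puts((0.nonzero? || 5).inspect)   # 5

calls = Hash.new(0)
# Hypothetical cheap and expensive criteria; `calls` counts the expensive one.
size    = ->(s) { s.size }
matches = ->(s) { calls[s] += 1; s.count("a") }

words = ["bab", "foo", "so", "bar"]
sorted = words.sort do |a, b|
  (size.(a) <=> size.(b)).nonzero? ||   # ascending by size
    (matches.(b) <=> matches.(a))       # then descending by matches
end

p sorted
p calls.key?("so")   # false: "so" has a unique size, so it is never counted
```

Unlike the sort_by approach, this recomputes matches on every comparison that reaches it, so memoizing the expensive criterion still matters.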
No idea how idiomatic it is, but here's a way to use sort_by again. Instead of, for example,
['bab', 'foo', 'so', 'bar'].sort_by { |s| [s.size, count_a(s), count_b(s)] }
do this to make count_a(s) and count_b(s) lazy and memoized:
['bab', 'foo', 'so', 'bar'].sort_by { |s| [s.size, lazy{count_a(s)}, lazy{count_b(s)}] }
My lazy makes the block act like a lazy and memoizing version of the value it yields.
Demo output, showing we only count what's necessary (i.e., don't count in 'so' since it has a unique size and don't count 'b' in 'foo' since its 'a'-count is unique among the size-3 strings):
Counting 'a' in 'bab'.
Counting 'a' in 'foo'.
Counting 'a' in 'bar'.
Counting 'b' in 'bab'.
Counting 'b' in 'bar'.
["so", "foo", "bar", "bab"]
Demo code:
def lazy(&block)
  def block.value
    (@value ||= [self.yield])[0]
  end
  def block.<=>(other)
    value <=> other.value
  end
  block
end
def count_a(s)
  puts "Counting 'a' in '#{s}'."
  s.count('a')
end
def count_b(s)
  puts "Counting 'b' in '#{s}'."
  s.count('b')
end
p ['bab', 'foo', 'so', 'bar'].sort_by { |s| [s.size, lazy{count_a(s)}, lazy{count_b(s)}] }
A different way to make value memoizing: If it ever gets called, it immediately replaces itself with a method just returning the stored value:
def block.value
  def self.value; @value end
  @value = self.yield
end

Building dynamic if statement based on user-defined input

I have a table, with increasing values each row (year in the code below).
I have a target, that specifies a "threshold". The target is user defined, it can contain a value for one or multiple columns of the table. This means you never know how many columns are specified in the target.
I want to match the first row in the table, where the values in the row are greater than the values in the target. I currently have this:
class Target < ActiveRecord::Base
  def loop_sheets(sheets, year_array)
    result = nil
    elements = self.class.column_names[1..-3].map(&:to_sym)
    to_match = elements.select{|e| self.send(e) != nil }
    condition = to_match.map do |attr|
      "row[:#{attr}] > #{attr}"
    end.join " and "
    year_array.each do |year|
      sheets.each do |sheet|
        row = sheet.calculation.datatable.select { |r| r[:year] == year }.first
        does_match = eval(condition)
        if does_match
          result = {
            :year => row[:year],
            :sheet_name => sheet.name
          }
          return result
        end
      end
    end
    return result
  end
end
This works perfectly, but now the algorithm is fixed to use AND matching. I want to support OR matching as well as AND matching. Also I want to avoid using eval, there has to be a more elegant way. Also I want to reduce the complexity of this code as much as possible.
How could I rewrite this code to meet these requirements? Any suggestion is appreciated.
To avoid using eval: Ruby can create code dynamically, so do that instead of adding strings together. All you have to do is take the strings away!
conditions = to_match.map do |attr|
  # send(attr) reads this target's own threshold for the column
  proc {|row| row[attr] > send(attr) }
end
Now you have an array of runnable blocks that take the row as their argument and return the result of the condition (return keyword not required). If you're just doing and, it's as simple as:
does_match = conditions.all? {|c| c.call(row) }
which will be true only if all the conditions return a truthy value (i.e. not false or nil).
As for supporting OR logic, if you are happy to just support ORing all of the conditions (e.g. replacing "and" with "or") then this will do it:
does_match = conditions.any? {|c| c.call(row) }
but if you want to support ORing some and ANDing others, you'll need some way to group them together, which is more complex.
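To make the all?/any? switch concrete, here is a small self-contained sketch. The target and row hashes, the column names and the matches? helper are hypothetical; in the real model the thresholds would come from the Target record.

```ruby
# Hypothetical data: a target with thresholds for some columns, and one table row.
target = { revenue: 100, profit: 20 }
row    = { year: 2015, revenue: 150, profit: 10 }

# One proc per threshold; each closes over its own column and limit.
conditions = target.map do |column, limit|
  proc { |r| r[column] > limit }
end

# The matching mode is just the name of an Enumerable predicate.
def matches?(conditions, row, mode)
  predicate = { and: :all?, or: :any? }.fetch(mode)
  conditions.public_send(predicate) { |c| c.call(row) }
end

puts matches?(conditions, row, :and)   # false: profit is below its threshold
puts matches?(conditions, row, :or)    # true: revenue is above its threshold
```

Using fetch means an unknown mode raises a KeyError instead of silently matching nothing.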

Is it possible to sort a list of objects depending on the individual object's response to a method?

I am wanting to display a gallery of products where I include products that are both for sale and not for sale. Only I want the products that are for sale to appear at the front of the list and the objects that are not for sale will appear at the end of the list.
An easy way for me to accomplish this is to make two lists, then merge them (one list of on_sale? objects and one list of not on_sale? objects):
available_products = []
sold_products = []
@products.each do |product|
  if product.on_sale?
    available_products << product
  else
    sold_products << product
  end
end
. . . But because of an oddity in the structure of my existing app, this would require an excessive amount of refactoring (I lose my pagination, and I would rather not refactor). It would be easier if there were a way to sort the existing list of objects by my product model's method on_sale? which returns a boolean value.
Is it possible to more elegantly iterate through an existing list and sort it by this true or false value in Rails? I only ask because there is so much I'm not aware of hidden within this framework / language (Ruby), and I'd like to know if the work has been done before me.
Sure. Ideally we'd do something like this using sort_by!:
@products.sort_by! {|product| product.on_sale?}
or the snazzier
@products.sort_by!(&:on_sale?)
but sadly, <=> doesn't work for booleans (see Why doesn't sort or the spaceship (flying saucer) operator (<=>) work on booleans in Ruby?), so sort_by can't order boolean keys either, and we need to use this trick (thanks rohit89!)
@products.sort_by! {|product| product.on_sale? ? 0 : 1}
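The failure and the workaround can be checked in isolation. This is just a sketch using bare booleans rather than product objects:

```ruby
# Booleans define no ordering, so the spaceship operator gives up.
p(true <=> false)   # nil

# sort_by therefore fails as soon as it has to compare true with false.
error = begin
  [true, false].sort_by { |flag| flag }
  nil
rescue ArgumentError => e
  e
end
puts error.class    # ArgumentError

# Mapping the flag to an integer gives sort_by keys it can compare.
flags = [false, true, false].sort_by { |flag| flag ? 0 : 1 }
p flags             # [true, false, false]
```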
If you want to get fancier, the sort method takes a block, and inside that block you can use whatever logic you like, including type conversion and multiple keys. Try something like this:
@products.sort! do |a,b|
  a_value = a.on_sale? ? 0 : 1
  b_value = b.on_sale? ? 0 : 1
  a_value <=> b_value
end
or this:
@products.sort! do |a,b|
  b.on_sale?.to_s <=> a.on_sale?.to_s
end
(putting b before a because you want "true" values to come before "false")
or if you have a secondary sort:
@products.sort! do |a,b|
  if a.on_sale? != b.on_sale?
    b.on_sale?.to_s <=> a.on_sale?.to_s
  else
    a.name <=> b.name
  end
end
Note that sort returns a new collection, which is usually a cleaner, less error-prone solution, but sort! modifies the contents of the original collection, which you said was a requirement.
@products.sort_by {|product| product.on_sale? ? 0 : 1}
This is what I did when I had to sort based on booleans.
No need to sort:
products_grouped = @products.partition(&:on_sale?).flatten(1)
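A short sketch of what partition does here, using a hypothetical Product struct in place of the real model. Each half keeps the original relative order, which sort and sort_by do not guarantee for equal keys.

```ruby
# Hypothetical stand-in for the Product model.
Product = Struct.new(:name, :on_sale) do
  def on_sale?
    on_sale
  end
end

products = [
  Product.new("lamp",  false),
  Product.new("chair", true),
  Product.new("desk",  false),
  Product.new("sofa",  true),
]

# partition returns [matching, non_matching]; each half keeps the original order.
grouped = products.partition(&:on_sale?).flatten(1)

p grouped.map(&:name)   # ["chair", "sofa", "lamp", "desk"]
```

Note that the result is a plain Array, so anything attached to the original collection object (such as pagination state) would not carry over.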
Ascending and descending order can be switched by interchanging "false" and "true":
Products.sort_by {|product| product.on_sale? == true ? "false" : "true" }

Ruby find in array with offset

I'm looking for a way to do the following in Ruby in a cleaner way:
class Array
  def find_index_with_offset(offset, &block)
    self[offset..-1].find(&block)
  end
end
offset = array.find_index {|element| element.meets_some_criterion?}
the_object_I_want =
  array.find_index_with_offset(offset+1) {|element| element.meets_another_criterion?}
So I'm searching a Ruby array for the index of some object and then I do a follow-up search to find the first object that matches some other criterion and has a higher index in the array. Is there a better way to do this?
What do I mean by cleaner: something that doesn't involve explicitly slicing the array. When you do this a couple of times, calculating the slicing indices gets messy fast. I'd like to keep operating on the original array. It's easier to understand and less error-prone.
NB. In my actual code I haven't monkey-patched Array, but I want to draw attention to the fact that I expect I'm duplicating existing functionality of Array/Enumerable
Edits
Fixed location of offset + 1 as per Mladen Jablanović's comment; rewrite error
Added explanation of 'cleaner' as per Mladen Jablanović's comment
Cleaner is obviously a subjective matter here. If you aim for short, I don't think you could do better than that. If you want to be able to chain multiple such finds, or you are bothered by slicing, you can do something like this:
module Enumerable
  def find_multi *procs
    return nil if procs.empty?
    find do |e|
      if procs.first.call(e)
        procs.shift
        next true if procs.empty?
      end
      false
    end
  end
end
a = (1..10).to_a
p a.find_multi(lambda{|e| e % 5 == 0}, lambda{|e| e % 3 == 0}, lambda{|e| e % 4 == 0})
#=> 8
Edit: And if you're not concerned with the performance you could do something like:
array.drop_while{|element|
  !element.meets_some_criterion?
}.drop(1).find{|element|
  element.meets_another_criterion?
}
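Here is the same chain on plain integers, with "multiple of 5" and "multiple of 3" as stand-in criteria. The each_with_index variant is an additional assumption on my part: it avoids building intermediate arrays by checking the index inside the block.

```ruby
array = (1..10).to_a

result = array
  .drop_while { |n| n % 5 != 0 }   # skip everything before the first multiple of 5
  .drop(1)                         # skip that first match itself
  .find { |n| n % 3 == 0 }         # first multiple of 3 after it

p result   # 6

# Same search without slicing: compare indices while scanning the original array.
offset = array.find_index { |n| n % 5 == 0 }
pair   = array.each_with_index.find { |n, i| i > offset && n % 3 == 0 }
same   = pair && pair.first        # find returns [element, index] or nil

p same     # 6
```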

sorting on a new field

So I had an array of events which I ordered by calling sort on it:
all.sort {|a,b| b.time <=> a.time}
Now recently I have added a new field to my object to include an "uploaded_at" time. I would like to sort first by "Time" and then by "Uploaded_at" time (as the "Time" field is simply a date without any time on it).
I need to bear in mind that old "Events" will not have a value for "Uploaded_at", so this method might not exist. How can I go about that? (I do not care about the order of two events that have the same Time and no uploaded_at values.)
Try something like this:
all.sort_by { |x| [x.time, (x.uploaded_at rescue Time.utc(1970))] }
You're going to have to handle the nil values in uploaded_at with some care. The problem is that x <=> nil and nil <=> x will be nil except when x.nil?, and sorting requires the <=> operator to return an Integer (a Fixnum in older Rubies).
One option is to map nils to some non-nil value that will always sort properly, Time.new(0) perhaps:
all.sort_by { |a| [a.time, a.uploaded_at || Time.new(0)] }
Array#<=> operates component-by-component, and the above removes the nil problem. If you want nils at the other end, then choose something large instead of Time.new(0).
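A self-contained sketch of the sentinel approach follows. The Event struct, the sample data and the EPOCH constant are assumptions for illustration; Time.at(0) plays the same role as Time.new(0) above.

```ruby
require "date"

# Hypothetical stand-in for the event model; older events have no uploaded_at.
Event = Struct.new(:name, :time, :uploaded_at)

EPOCH = Time.at(0)   # sentinel that sorts before any real upload time

events = [
  Event.new("b",       Date.new(2012, 3, 1), Time.utc(2012, 3, 1, 18, 0)),
  Event.new("a",       Date.new(2012, 3, 1), Time.utc(2012, 3, 1, 9, 0)),
  Event.new("old",     Date.new(2012, 3, 1), nil),
  Event.new("earlier", Date.new(2012, 2, 1), nil),
]

# The nil guard keeps every key comparable: [Date, Time] on both sides.
sorted = events.sort_by { |e| [e.time, e.uploaded_at || EPOCH] }

p sorted.map(&:name)   # ["earlier", "old", "a", "b"]
```

Calling reverse on the result gives the newest-first order that the original sort block produced.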
Another option is to handle the nils manually:
all.sort do |a, b|
  x = a.time <=> b.time
  if(x != 0)
    x
  elsif(a.uploaded_at.nil? && b.uploaded_at.nil?)
    0
  elsif(a.uploaded_at.nil?)
    1
  elsif(b.uploaded_at.nil?)
    -1
  else
    a.uploaded_at <=> b.uploaded_at
  end
end
You would, of course, adjust the elsif(a.uploaded_at.nil?) and elsif(b.uploaded_at.nil?) branches to put the nils where you want them.
