Beginless date ranges don't allow include? - ruby

Ruby 2.7 introduced beginless ranges. Now you can have:
(..5)
(5..10)
(10..)
With integers, .include? works as expected:
(..5).include?(6) # false
(..5).include?(5) # true
(..5).include?(2) # true
(..5).include?(-100) # true
The same does not work for date ranges however:
(..Date.tomorrow).include?(Date.today) # RangeError (cannot get the first element of beginless range)
Funnily, it works the other way round:
(Date.yesterday..).include?(Date.today) # true
And finally:
(Date.yesterday..).include?(Date.today - 2.days) # Seems to loop forever.
This is such weird behaviour. All three cases give a different result, and only one of them actually works as intended.
I mean, I guess it would be understandable if a range with some kind of "continuous" logic to it were hard to check for inclusion. But relatively simple classes like Date should at least work. Date is almost like an Integer anyway. And even Float can do this, so I don't see why Date or DateTime shouldn't.
My use case is that the database might return nil for either of the two dates that I'm querying. These are start and end dates that I want to use in a range, but I can't be sure that neither of them is nil. A nil would be fine for my logic, but it would result in a beginless range, which can't handle .include?.
I can easily make my use case work with some ugly manual checks, but that's not the elegant Ruby way. Am I missing something here? Or is this a feature that's just not there yet?

With Range#include?, you are actually iterating over the range and comparing each element with the tested element. Only for numeric ranges is this optimized internally to behave as you apparently expect. To quote the docs:
Returns true if obj is an element of the range, false otherwise. If begin and end are numeric, comparison is done according to the magnitude of the values.
Thus, instead of Range#include? you likely want to use Range#cover? here, which only checks the boundaries of the range (and which behaves the same as Range#include? when the boundaries are numeric):
Returns true if obj is between the begin and end of the range.
This tests begin <= obj <= end when exclude_end? is false and begin <= obj < end when exclude_end? is true.
[...]
Returns false if the begin value of the range is larger than the end value. Also returns false if one of the internal calls to <=> returns nil (indicating the objects are not comparable).
With your examples, Range#cover? does the right thing:
(..Date.tomorrow).cover?(Date.today)
# => true
(Date.yesterday..).cover?(Date.today)
# => true
(Date.yesterday..).cover?(Date.today - 2.days)
# => false
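For the nilable start/end dates described in the question, a minimal sketch might look like the following. It assumes Ruby 2.7 or later and plain Date from the standard library; active_on? and its parameter names are hypothetical and not part of any library.

```ruby
require 'date'

# Hypothetical helper: either bound may be nil (e.g. a NULL column).
# Range.new(nil, x) builds a beginless range, Range.new(x, nil) an endless one.
def active_on?(date, starts_on, ends_on)
  Range.new(starts_on, ends_on).cover?(date)
end

today = Date.new(2020, 6, 15)

active_on?(today, nil, Date.new(2020, 12, 31))  # beginless range
active_on?(today, Date.new(2020, 1, 1), nil)    # endless range
active_on?(today, Date.new(2020, 7, 1), nil)    # range starts after `today`
```

Because cover? only compares against the bounds, none of these calls iterate, so none of them can hang or raise on a missing bound.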

TL;DR
This is either a bug in how Date objects are compared in an endless Range, or a known issue with how certain iterators work with an endless Range. I provide an explanation and some workarounds below.
Analysis & Explanation
There are some surprising-but-documented behaviors of Ruby's beginless and endless Range objects. The documentation calls them "implementation details," and describes them as follows:
begin of beginless range and end of endless range are nil;
each of beginless range raises an exception;
each of endless range enumerates infinite sequence (may be useful in combination with Enumerable#take_while or similar methods);
(1..) and (1...) are not equal, although technically representing the same sequence.
As a result, you're somewhat at the mercy of how iteration is implemented for a given object type or method. Pragmatically, it seems that there are some optimizations for Integer ranges that allow for code like:
(1..).include? 999_999_999
#=> true
(1..).to_a
#=> RangeError (cannot convert endless range to an array)
to be performed (or to fail) quickly, but your particular code is (pragmatically speaking) attempting to reify infinity. As Date.yesterday is not a core Ruby method, it may also be an issue with how the Range is constructed by whatever library has monkeypatched your Date class. However, even when refactored to vanilla Ruby 2.7.1, ((Date.today - 1)..).include?(Date.today - 2) will hang.
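The documented behaviour quoted above (each on an endless range enumerates an infinite sequence) can be sketched with plain Date as follows. The dates are arbitrary and the variable names are made up for illustration; the point is that any method with a reason to stop early returns, while a failed include? search never finds one.

```ruby
require 'date'

start = Date.new(2020, 1, 1)

# Stops after three elements, so the infinite sequence is never exhausted.
first_three = (start..).first(3)

# take_while stops at the first element that fails the block.
first_week = (start..).take_while { |d| d < Date.new(2020, 1, 8) }

# A bounds check needs no iteration at all.
before_start = (start..).cover?(start - 1)
```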
Working Around the Behavior
Whether the behavior above is a bug or a design choice is a question for the Ruby Core Team. However, you can work around it very easily by checking bounds rather than iterating. If you must iterate, then don't try to iterate over infinity. For example:
require 'date'
def distant_future
  # roughly 5 millennia from today
  Date.today + (365 * 5_000)
end
def yesterday
  Date.today - 1
end
def two_days_ago
  yesterday - 1
end
# slow, but returns in about 0m1.046s on my system
(yesterday .. distant_future).include? two_days_ago
By using something large but less than infinity as the end of your range, you allow the iteration to return. You can make this more performant in two ways:
Shortening your date range, reducing the number of potential iterations.
Checking for a date near the front of your range, requiring fewer iterations to match.
As an example, iterating over 1,825,000 days only to find you don't have a match takes noticeable time. On the other hand, the following returns almost instantly:
(two_days_ago .. distant_future).include? yesterday
#=> true
Every language has its share of bugs and rough edges. This appears to be one of them. Either way, I would recommend avoiding iteration over beginless/endless Date ranges in the interests of pragmatism.

Related

Ruby: Help improving hashing algorithm

I am still relatively new to Ruby as a language, but I know there are a lot of convenience methods built into the language. I am trying to generate a "hash" to check against in a low level block-chain verifier and I am wondering if there are any "convenience methods" that I could use to try to make this hashing algorithm more efficient. I think I can make this more efficient by utilizing Ruby's max integer size, but I'm not sure.
Below is the current code, which takes in a string to hash, unpacks it into an array of Unicode code points, does computationally intensive math on each of those values, adds up all of the results, takes that sum modulo 65,536, and then returns the hex representation of that value.
def generate_hash(string)
  unpacked_string = string.unpack('U*')
  sum = 0
  unpacked_string.each do |x|
    sum += (x**2000) * ((x + 2)**21) - ((x + 5)**3)
  end
  new_val = sum % 65_536 # Gives a number from 0 to 65,535
  new_val.to_s(16)
end
On very large block-chains there is a very large performance hit which I am trying to get around. Any help would be great!
First and foremost, it is extremely unlikely that you are going to create anything that is more efficient than simply using String#hash. This is a case of you trying to build a better mousetrap.
Honestly, your hashing algorithm is very inefficient. The entire point of a hash is to be a fast, low-overhead way of quickly getting a "unique" (as unique as possible) integer to represent any object to avoid comparing by values.
Using that as a premise, if you start doing any type of intense computation in a hash algorithm, it is already counter-productive. Once you start implementing modulo and pow functions, it is inefficient.
Usually best practice involves taking a value(s) of the object that can be represented as integers, and performing bit operations on them, typically with prime numbers to help reduce hash collisions.
def hash
  h = value1 ^ 393
  h += value2 ^ 17
  h
end
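As a fuller sketch of the same idea, the class below defines hash together with eql?, which Ruby's Hash needs in order to confirm a match once two hash codes agree. Pair, value1 and value2 are hypothetical names chosen for this example, and the multiplier 31 is just a conventional small prime, not something Ruby requires.

```ruby
# Hypothetical value object with two integer-like fields.
class Pair
  attr_reader :value1, :value2

  def initialize(value1, value2)
    @value1 = value1
    @value2 = value2
  end

  # Cheap combination: multiply by a small prime, then XOR.
  # No exponentiation and no deliberate big-number arithmetic.
  def hash
    (value1.hash * 31) ^ value2.hash
  end

  # Hash lookups call eql? to confirm a match after the hash codes agree.
  def eql?(other)
    other.is_a?(Pair) && value1 == other.value1 && value2 == other.value2
  end
  alias_method :==, :eql?
end

lookup = { Pair.new(1, 2) => 'found' }
hit  = lookup[Pair.new(1, 2)]  # same fields: same hash code, eql? succeeds
miss = lookup[Pair.new(2, 1)]  # different fields: no match
```

In everyday Ruby, [value1, value2].hash is a common shortcut that delegates the mixing to the built-in Array#hash.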
In your example, you are for some reason forcing the hash into the range of a 16-bit unsigned integer, when typically 32 bits are used, although if you are comparing on the Ruby side, this would be 31 bits due to how Ruby tags Fixnum values (on 64-bit builds the immediate-integer range is correspondingly wider). Fixnum was deprecated on the Ruby side, as it should have been, but internally the same threshold still exists between how a Bignum and a Fixnum are handled. The Integer class simply provides one interface on the Ruby side, as those two really should never have been exposed outside of the C code.
In your specific example using strings, I would simply symbolize them. This gives a quick and efficient way to determine whether two strings are equal with hardly any overhead, and comparing two symbols is exactly the same as comparing two integers. There is a caveat to this method if you are comparing a vast number of strings. On Ruby versions before 2.2, once a symbol is created it stays alive for the life of the program: any additional strings equal to it will return the same symbol, but you cannot reclaim the memory of the symbol (just a few bytes) for as long as the program runs. Since Ruby 2.2, symbols created dynamically from strings can be garbage collected, but on older versions this is not a good way to compare thousands and thousands of unique strings.
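If the output of the original generate_hash has to stay exactly the same, one option is to keep the intermediate numbers small with modular exponentiation. The sketch below assumes Ruby 2.5 or later, where Integer#pow accepts a modulus as its second argument; generate_hash_fast and MODULUS are names made up for this example.

```ruby
MODULUS = 65_536

# Original version from the question, kept for comparison.
def generate_hash(string)
  sum = 0
  string.unpack('U*').each do |x|
    sum += (x**2000) * ((x + 2)**21) - ((x + 5)**3)
  end
  (sum % MODULUS).to_s(16)
end

# Same arithmetic, but the large powers are reduced modulo 65,536 as they
# are computed, so no multi-thousand-digit integers are ever built.
def generate_hash_fast(string)
  sum = 0
  string.unpack('U*').each do |x|
    term = x.pow(2000, MODULUS) * (x + 2).pow(21, MODULUS) - (x + 5)**3
    sum = (sum + term) % MODULUS
  end
  sum.to_s(16)
end
```

Both versions agree because addition, subtraction and multiplication can each be reduced modulo 65,536 at any step without changing the final remainder, and Ruby's % returns a non-negative result for a positive modulus.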

Ruby: target-less 'case', compared to 'if'

(I have asked this question already at Ruby Forum, but it didn't draw any answer, so I'm crossposting it now)
From my understanding, the following pieces of code are equivalent under
Ruby 1.9 and higher:
# (1)
case
when x < y
  foo
when a > b
  bar
else
  baz
end
# (2)
if x < y
  foo
elsif a > b
  bar
else
  baz
end
So far I would have always used (2), out of a habit. Could someone think
of a particular reason, why either (1) or (2) is "better", or is it just
a matter of taste?
CLARIFICATION: Some users have objected, that this question would just be "opinion-based", and hence not suited to this forum. I therefore think that I did not make myself clear enough: I don't want to start a discussion on personal programming style. The reason why I brought up this topic is this:
I was surprised that Ruby offers two very different syntaxes (target-less case and if-elsif) for what seems to me exactly the same purpose, particularly since the if-elsif syntax is the one virtually every programmer is familiar with. I wouldn't even consider the 'target-less case' to be "syntactic sugar", because it doesn't allow me to express the programming logic more concisely than 'if-elsif'.
So I wonder in what situation I might want to use the 'target-less case' construct. Does it give a performance advantage? Is it different from if-elsif in some subtle way which I just don't notice?
ADDITIONAL FINDINGS regarding the implementation of target-less case:
Olivier Poulin has pointed out that a target-less case statement would explicitly use the === operator against the value "true", which would cause a (tiny) performance penalty for 'case' compared to 'if' (and is one more reason why I don't see why someone might want to use it).
However, when checking the documentation of the case statement for Ruby 1.9 and Ruby 2.0, I found that they describe it differently, but both at least suggest that === might NOT be used in this case. In the case of Ruby 1.9:
Case statements consist of an optional condition, which is in the position of an argument to case, and zero or more when clauses. The first when clause to match the condition (or to evaluate to Boolean truth, if the condition is null) “wins”
Here it says, that if the condition (i.e. what comes after 'case') is null (i.e. does not exist), the first 'when' clause which evaluates to true is the one being executed. No reference to === here.
In Ruby 2.0, the wording is completely different:
The case expression can be used in two ways. The most common way is to compare an object against multiple patterns. The patterns are matched using the +===+ method [.....]. The other way to use a case expression is like an if-elsif expression: [example of target-less case is given here].
It hence says that === is used in the "first" way (case with target), while the target-less case "is like" if-elsif. No mention of === here.
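One way to check this on a given interpreter, rather than relying on the wording of the docs, is to look at the compiled instructions. This is a sketch that assumes MRI, since RubyVM::InstructionSequence is not available on every Ruby implementation, and the instruction names it prints vary between versions. The snippets are only compiled, never run, so x, y, a and b do not need to exist.

```ruby
# MRI-specific sketch: compile both forms and print their instructions.
case_src = <<~RUBY
  case
  when x < y then :foo
  when a > b then :bar
  else :baz
  end
RUBY

if_src = <<~RUBY
  if x < y then :foo
  elsif a > b then :bar
  else :baz
  end
RUBY

case_disasm = RubyVM::InstructionSequence.compile(case_src).disasm
if_disasm   = RubyVM::InstructionSequence.compile(if_src).disasm

puts case_disasm
puts if_disasm
```

Comparing the two listings side by side shows whether your Ruby emits a === call for the target-less form or just plain conditional branches.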
Midwire ran a few benchmarks and concluded that if/elsif is faster
than case because case “implicitly compares using the more expensive
=== operator”.
Here is where I got this quote. It compares if/elsif statements to case.
It is very thorough and explores the differences in the instruction sequence, definitely will give you an idea on which is better.
The main thing I took from the post, though, is that there are no huge differences between if/elsif and case; the two can usually be used interchangeably.
Some major differences can present themselves depending on how many cases you have.
n = 1 (last clause matches)
if: 7.4821e-07
threequal_if: 1.6830500000000001e-06
case: 3.9176999999999997e-07
n = 15 (first clause matches)
if: 3.7357000000000003e-07
threequal_if: 5.0263e-07
case: 4.3348e-07
As you can see, if the last clause is matched, if/elsif runs much slower than case, while if the first clause is matched, it's the other way around.
This difference comes from the fact that if/elsif uses branchunless, while case uses branchif in their instruction sequences.
Here is a test I did on my own with a target-less case vs if/elsif statements (using "=="). The first time is case, while the second time is if/elsif.
First test, 5 when statements, 5 if/elsif, the first clause is true for both.
Time elapsed 0.052023 milliseconds
Time elapsed 0.031467999999999996 milliseconds
Second test, 5 when statements, 5 if/elsif, the last(5th) clause is true for both.
Time elapsed 0.001224 milliseconds
Time elapsed 0.028578 milliseconds
As you can see, just as we saw before, when the first clause is true, if/elsif perform better than case, while case has a massive performance advantage when the last clause is true.
CONCLUSION
Running more tests has shown that it probably comes down to probability. If you think the answer is going to come earlier in your list of clauses, use if/elsif, otherwise case seems to be faster.
The main thing this has shown is that case and if/elsif are close enough in efficiency that choosing one over the other comes down to probability and taste.
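To repeat this kind of measurement on your own machine, a rough sketch using the standard Benchmark module could look like this. The method names, the arguments and the iteration count are all arbitrary choices for illustration, and the timings will differ between Ruby versions and systems, so treat any single run with caution.

```ruby
require 'benchmark'

# Hypothetical functions: the same three-way branch written both ways.
def with_case(x, y, a, b)
  case
  when x < y then :foo
  when a > b then :bar
  else :baz
  end
end

def with_if(x, y, a, b)
  if x < y
    :foo
  elsif a > b
    :bar
  else
    :baz
  end
end

ITERATIONS = 200_000

# These arguments make both conditions false, so the else branch is taken.
case_time = Benchmark.realtime { ITERATIONS.times { with_case(2, 1, 1, 2) } }
if_time   = Benchmark.realtime { ITERATIONS.times { with_if(2, 1, 1, 2) } }

puts format('case: %.6f s, if/elsif: %.6f s', case_time, if_time)
```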

Bubble Sort method

I am just learning Ruby, and KevinC's response (in this link) makes sense to me with one exception. I don't understand why the code is wrapped in arr.each do |i| #while... end. That part seems redundant to me, as the 'while' loop is already hitting each of the positions. Can someone explain?
The inner loop finds a bubble and carries it up; if it finds another, lighter bubble, it switches them around and carries the lighter one. So you need several passes through the array to find all the bubbles and carry them to the correct place, since you can't float several bubbles at the same time.
EDIT:
The each is really misused in KevinC's code, since it is not used for its normal purpose: yielding elements of the collection. Instead of arr.each, it would be better to use arr.size.times - as it would be more informative to the reader. Redefining the i within the block is adding insult to injury. While none of this will cause the code to be wrong as such, it is misleading.
The other problem with the code is that it does not provide an early termination condition (swapped in most other answers on that question). In theory, bubble sort could find the array sorted in the first pass, in which case the other size - 1 passes are unnecessary. KevinC's code would still keep making passes over the already sorted array, never realising it is done.
As for a rewrite into block-less code, it is certainly possible, but you need to understand that block syntax is very idiomatic in Ruby, and non-block loops are almost unheard of in the Ruby world. While Ruby has for, it is pretty much never used. But...
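For illustration, here is a minimal bubble sort with the early termination described above. It is not KevinC's code, which is not reproduced in this thread; bubble_sort and its variable names are assumptions made for this sketch.

```ruby
# Bubble sort with an early exit; returns a sorted copy of the input.
def bubble_sort(array)
  arr = array.dup
  (arr.size - 1).times do |pass|
    swapped = false
    # After each pass the largest remaining element is in its final place,
    # so the inner loop can stop one position earlier every time.
    (0...(arr.size - 1 - pass)).each do |i|
      next unless arr[i] > arr[i + 1]

      arr[i], arr[i + 1] = arr[i + 1], arr[i]
      swapped = true
    end
    break unless swapped # no swaps means the array is already sorted
  end
  arr
end

bubble_sort([3, 1, 2])
```

The outer loop is what provides the repeated passes, and the swapped flag lets an already sorted array finish after a single pass.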
arr.each do |i|
  ...
end
is equivalent to
for i in arr
  ...
end
which is, again, at least for the array case, equivalent to
index = 0
while index < arr.size
  i = arr[index]
  ...
  index += 1
end

Ruby performance in .each loops

Consider the following two pieces of Ruby code
Example 1
name = user.first_name
round_number = rounds.count
users.each do |u|
  puts "#{name} beat #{u.first_name} in round #{round_number}"
end
Example 2
users.each do |u|
  puts "#{user.first_name} beat #{u.first_name} in #{rounds.count}"
end
For both pieces of code imagine
# user.rb
def first_name
  name.split.first
end
So in a classical analysis of algorithms, the first piece of code would be more efficient. However, in most modern compiled languages, the compiler would optimize the second piece of code to look like the first, eliminating the need to optimize code in such a manner.
Will Ruby optimize or cache values for this code before execution? Should my Ruby code look like example 1 or example 2?
Example 1 will run faster, as first_name() is only called once, and its value is stored in the variable.
In Example 2 Ruby will not memoize this value automatically, since the value could have changed between iterations of the each() loop.
Therefore, expensive-to-calculate methods should be explicitly memoized if they are expected to be used more than once without the return value changing.
Making use of Ruby's Benchmark Module can be useful when making decisions like this. It will likely only be worth memoizing if there are a lot of values in users, or if first_name() is expensive to calculate.
A compiler can only perform this optimization if it can prove that the method has no side effects. This is even more difficult in Ruby than most languages, as everything is mutable and can be overridden at runtime. Whether it happens or not is implementation dependent, but since it's hard to do in Ruby, most do not. I actually don't know of any that do at the time of this posting.
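Explicit memoization, as recommended above, is usually written with ||= and an instance variable. The sketch below uses a hypothetical User class; the split_calls counter exists only to show how often the expensive part actually runs and would not appear in real code.

```ruby
# Hypothetical User class for illustration.
class User
  attr_reader :split_calls

  def initialize(name)
    @name = name
    @split_calls = 0
  end

  # Recomputed on every call.
  def first_name
    @split_calls += 1
    @name.split.first
  end

  # Computed once, then cached. Only safe if @name never changes afterwards.
  def memoized_first_name
    @memoized_first_name ||= begin
      @split_calls += 1
      @name.split.first
    end
  end
end

plain = User.new('Ada Lovelace')
3.times { plain.first_name }
plain_calls = plain.split_calls      # one split per call

cached = User.new('Ada Lovelace')
3.times { cached.memoized_first_name }
cached_calls = cached.split_calls    # split ran only on the first call
```

Note that ||= re-runs the computation whenever the cached value is nil or false, so this pattern fits methods that return a truthy value.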

What are the differences between an array and a range in ruby?

Just wondering what the subtle difference between an array and a range is. I came across an example where x = *(1..10) gives x as an array, while *(1..10) == (1..10).to_a throws an error. This suggests to me that there is a subtle difference between the two, and I'm just curious what it is.
Firstly, when you're not in the middle of an assignment or parameter-passing, *(1..10) is a syntax error because the splat operator doesn't parse that way. That's not really related to arrays or ranges per se, but I thought I'd clear up why that's an error.
Secondly, arrays and ranges are really apples and oranges. An array is an object that's a collection of arbitrary elements. A range is an object that has a "start" and an "end", and knows how to move from the start to the end without having to enumerate all the elements in between.
Finally, when you convert a range to an array with to_a, you're not really "converting" it so much as you're saying, "start at the beginning of this range and keep giving me elements until you reach the end". In the case of "(1..10)", the range is giving you 1, then 2, then 3, and so on, until you get to 10.
One difference is that a range does not store every element separately, unlike an array.
r = (1..1000000) # very fast
r.to_a # sloooooow
You lose the ability to index to an arbitrary point, however.
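The points from these answers can be tried out in a few lines. This is only a sketch with made-up variable names; it uses the range from the question.

```ruby
range = (1..10)
array = range.to_a

x = *range                 # splat in an assignment builds an Array
same = [*range] == array   # splat inside an array literal works too

range.class                # Range: stores only its two endpoints
array.class                # Array: stores all ten elements

covered = range.cover?(5)              # answered by comparing with the endpoints
fifth   = array[4]                     # Arrays support indexing
indexable = range.respond_to?(:[])     # Ranges do not define []
```

Wrapping the splat in an array literal, as in [*range], is what makes the comparison from the question parse.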
