Best practice for chaining comparisons in Ruby

In Python, it is possible to chain comparisons like so:
0 < 1 < 2 and 5 > 4 > 3
The only way I can figure out to accomplish something similar in Ruby is:
0 < 1 && 1 < 2 && 5 > 4 && 4 > 3
I find this fairly unappealing visually.
I've searched Google and found some class extensions to make Ruby work like Python, but I was wondering if there's an easier way to chain comparisons using just core Ruby?

1.between?(0,2)
between? works for any class that includes the Comparable module, e.g. dates, strings, etc.
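For instance (a quick sketch; note that between? is inclusive at both ends, whereas the < chains above are strict):

require 'date'

1.between?(0, 2)        # => true  (inclusive: 0 <= 1 <= 2)
4.between?(3, 5)        # => true  (the 5 > 4 > 3 half of the chain)
'b'.between?('a', 'c')  # => true  (String includes Comparable)
Date.today.between?(Date.new(2000, 1, 1), Date.new(2100, 1, 1))  # => true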

If you have basic lower and upper bounds you can use Enumerable#include? for a range comparison like:
i = 10
(5..20).include?(i)
So you know 5 is your lower bound and 20 is your upper bound.
Enumerable#include? uses == under the hood, so it has to walk the range and compare each element; this performs poorly for large ranges.
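As an aside (my addition, not from the original answer): Range#cover? only compares against the two endpoints using <=>, so it avoids walking the range entirely:

(5..20).cover?(10)                 # => true, just two comparisons
(1..10_000_000).cover?(9_999_999)  # => true, no iteration needed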
Edit: steenslag's answer above is way better. Use that!

Related

Constraint Satisfaction Problems with solvers VS Systematic Search

We have to solve a difficult problem where we need to check a lot of complex rules from multiple sources against a system, in order to decide whether the system satisfies those rules or how it should be changed to satisfy them.
We initially started using Constraint Satisfaction Problem algorithms (using Choco) to try to solve it, but since the number of rules and variables turned out to be smaller than anticipated, we are now looking at building a list of all possible configurations in a database and using multiple queries based on the rules to filter this list and find the solutions that way.
Are there limitations or disadvantages to doing a systematic search compared to using CSP solver algorithms for a reasonable number of rules and variables? Will it impact performance significantly? Will it reduce the kinds of constraints we can implement?
As an example:
You have to imagine it with a much bigger number of variables, much bigger (but always discrete) domains of definition, and a bigger number of rules (some much more complex), but instead of describing the problem as:
x in (1,6,9)
y in (2,7)
z in (1,6)
y = x + 1
z = x if x < 5 OR z = y if x > 5
And giving it to a solver, we would build a table:
X | Y | Z
1 | 2 | 1
6 | 2 | 1
9 | 2 | 1
1 | 7 | 1
6 | 7 | 1
9 | 7 | 1
1 | 2 | 6
6 | 2 | 6
9 | 2 | 6
1 | 7 | 6
6 | 7 | 6
9 | 7 | 6
And use queries like the following (this is just an example to aid understanding; in reality we would use SPARQL against a semantic database):
SELECT X, Y, Z WHERE Y = X + 1
INTERSECT
SELECT X, Y, Z WHERE (Z = X AND X < 5) OR (Z = Y AND X > 5)
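To make the comparison concrete, here is a minimal sketch of that systematic search in plain Ruby (purely illustrative, using the toy domains and rules above; the real system would issue SPARQL queries instead):

xs = [1, 6, 9]
ys = [2, 7]
zs = [1, 6]

# Enumerate the full table of configurations, then filter by the rules.
solutions = xs.product(ys, zs).select do |x, y, z|
  y == x + 1 && ((x < 5 && z == x) || (x > 5 && z == y))
end

p solutions  # => [[1, 2, 1]]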
CSP allows you to combine deterministic generation of values (through the rules) with heuristic search. The beauty happens when you customize both of those for your problem. The rules are just one part. Equally important is the choice of the search algorithm/generator. You can cull a lot of the search space.
While I cannot make predictions about the performance of your SQL solution, I must say that it strikes me as somewhat roundabout. One specific problem will appear if your rules change: you may find that you have to start over from scratch. Also, the RDBMS will fully materialize all of the subqueries, which may explode.
I'd suggest implementing one working prototype with CSP and one with SQL, each for a simple subset of your requirements. You will then get a good feeling for what works and what does not. Be sure to think about long-term maintenance as well.
Full disclaimer: my last contact with CSP was decades ago at university, as part of my master's (I implemented a CSP search engine not unlike Choco, of course a bit more rudimentary, and did a bit of research on the topic). But the field will certainly have evolved since then.

Should I evaluate a range?

Re-asking this here, since it doesn't belong in the Code Review SE.
I was always taught never to have static expressions in code, as they are unnecessary operations that will always produce the same output. For example, you would never write if 6 < 7 (aside from people slapping the occasional while true around).
That being said, I have a functioning bash script as follows:
#!/usr/bin/env bash
for i in {0..9}
do
...some stuff...
done
However, PyCharm is giving me hell for this, reiterating the concern from my first paragraph. Its counter-suggestion is to have:
#!/usr/bin/env bash
for i in 0 1 2 3 4 5 6 7 8 9
do
...some stuff...
done
The logic is that it will not have to evaluate the range itself, thus increasing speed.
My Question
I think that the range looks nicer and, as far as I know, it won't actually affect speed (I don't mean noticeably, I mean at all), as it is simply iterating as it goes. Am I incorrect in thinking so?
It's a peeve of mine to waste cycles, but it's a greater peeve of mine to write grotesque looking code.
The best practice approach in bash or other shells adopting ksh extensions is a C-style for loop:
for ((i=0; i<=9; i++)); do
echo "Doing some stuff with $i"
done
This has advantages over the {0..9} syntax: it works with variables ({$min..$max} doesn't work, because brace expansion happens before variable expansion does) and it avoids needing to store the full list in memory at once. It also has advantages over 0 1 2 3 4 5 6 7 8 9: the latter is hard to check for typos (it's tricky to visually spot the problems in 0 1 2 3 5 4 6 7 8 9 or 0 1 2 3 4 6 7 8 9).

Rearrange Variables in an equation with Ruby

(To begin with, I am a beginner in Ruby.)
I want to rearrange variables in a linear equation (there may be 2 or more variables).
I have an equation
a + 2*b - 1 = 0
I would like Ruby to give me
a = 1 - 2*b
Or alternatively
b = (1-a)/2
Is there a way to do this in Ruby? (It is possible in MATLAB, which, in my case, seems like overkill...)
Thanks in advance
Try the symbolic gem.
It should provide you with the tools you need.
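If pulling in a gem is overkill, here is a minimal hand-rolled sketch for the linear case (my own illustration, not the symbolic gem's API; solve_for is a made-up helper): represent the equation as coefficients plus a constant in sum(coeff*var) + constant = 0 form, move everything except the target variable to the other side, and divide by the target's coefficient.

# Solve a linear equation, given as { variable => coefficient } plus a
# constant term, for one target variable. Returns a formatted string.
def solve_for(coeffs, constant, target)
  others = coeffs.reject { |var, _| var == target }
  terms  = others.map { |var, c| c >= 0 ? "- #{c}*#{var}" : "+ #{-c}*#{var}" }
  "#{target} = (#{-constant} #{terms.join(' ')}) / #{coeffs[target]}"
end

# a + 2*b - 1 = 0
puts solve_for({ a: 1, b: 2 }, -1, :a)  # => a = (1 - 2*b) / 1
puts solve_for({ a: 1, b: 2 }, -1, :b)  # => b = (1 - 1*a) / 2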

Direct way of converting BASE-14 to BASE-7

Given (3AC) in base 14, convert it into base 7.
A simple approach is to first convert 3AC into base 10 and then into base 7, which results in 2105.
I was just wondering: does there exist any direct way of converting from base 14 to base 7?
As others have said, there is no straightforward technique, because 14 is not a power of 7.
However, you don't need to go through base 10. One approach is to write routines that perform base-7 arithmetic (specifically addition and multiplication by a small integer), and then use them to process each base-14 digit in turn, most significant first: multiply the accumulator by 14 and add the digit, all within base 7.
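A minimal sketch of that accumulator idea (my illustration; mul_add_base7 is a made-up helper): the number lives as an array of base-7 digits, least significant first, and base 10 never appears.

# Multiply a base-7 digit array by mult and add a value, staying in base 7.
def mul_add_base7(digits, mult, add)
  carry = add
  digits = digits.map do |d|
    t = d * mult + carry
    carry = t / 7
    t % 7
  end
  while carry > 0   # append any leftover carry as higher digits
    digits << carry % 7
    carry /= 7
  end
  digits
end

acc = [0]
[3, 10, 12].each { |d| acc = mul_add_base7(acc, 14, d) }  # 3, A, C from 3AC
puts acc.reverse.join  # => 2105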
I have found one approach.
There is no need to go through base 10 and then to base 7. It can be done using this formula!
If a number X is represented in base 14 by the digits
X = a(n) a(n-1) a(n-2) .... a(0)
then in base 7 we can write it as
X = .... r q p
where (all divisions are integer divisions, and t(k) denotes the value before the mod is taken):
t(0) = (2^0)*a(0);             p = t(0) % 7
t(1) = (2^1)*a(1) + t(0)/7;    q = t(1) % 7
t(2) = (2^2)*a(2) + t(1)/7;    r = t(2) % 7
..........
t(n) = (2^n)*a(n) + t(n-1)/7;  nth digit = t(n) % 7
(and the digits go further, because a number in base 14 will require more digits in base 7: keep taking the final carry t(n)/7, mod 7, until it is exhausted).
The logic is simple, based on the properties of positional bases and the fact that 7 is half of 14: each base-14 digit a(k) contributes a(k)*14^k = (2^k)*a(k)*7^k, so at base-7 place k it contributes (2^k)*a(k), plus whatever carries in from below. Otherwise this would have been a tedious task.
E.g., here we are given 3AC:
C = 12;
so the last digit is (2^0 * 12) % 7 = 5
A = 10;
the next digit is (2^1 * 10 + 12/7) % 7 = (20 + 1) % 7 = 21 % 7 = 0
next is 3;
the next digit is (2^2 * 3 + 21/7) % 7 = (12 + 3) % 7 = 15 % 7 = 1
next is nothing (0);
the next digit is (2^3 * 0 + 15/7) % 7 = (0 + 2) % 7 = 2 % 7 = 2
Hence, in base 7 the number will be 2105. This method may seem confusing and difficult, but with a little practice it can come in very handy for solving similar problems. Also, even if the number is very long, like 287AC23B362, we don't have to unnecessarily compute the base-10 value, which would take at least some time, and can compute the base-7 form directly!
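The same procedure, mechanised as a short Ruby sketch (my illustration; the function name is made up). It takes the base-14 digit values least significant first and returns the base-7 digits in the same order:

def base14_to_base7(digits14)
  out   = []
  carry = 0
  digits14.each_with_index do |d, k|
    t = (2**k) * d + carry   # the t(k) from the formula above
    out << t % 7
    carry = t / 7
  end
  while carry > 0            # "next is nothing (0)": drain the final carry
    out << carry % 7
    carry /= 7
  end
  out
end

# 3AC in base 14, least significant digit first: C=12, A=10, 3
puts base14_to_base7([12, 10, 3]).reverse.join  # => 2105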
No, there's not really an easy way to do what you wish, because 14 is not a power of 7.
The only tricks I know of for something like this (e.g. easily going from hex to binary) require that one base be a power of the other.
The link gives a reasonably clear answer. In short, it's a bit of a pain with the methods I know.

Aggregating automatically-generated feature vectors

I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider; it is basically a set of rules:
A B C D E Result
1 2 b 5 3 X
1 2 c 5 4 X
1 2 e 5 2 X
We take a subject and get its values for A-E, then try matching the rules in sequence. If one matches we return the first result.
C is a discrete value, which could be any of a-e. The rest are just integers.
The ruleset has been automatically generated from our old system and has an extremely large number of rules (~25 million). The old rules were if statements, e.g.
result("X") if $A >= 1 && $A <= 10 && $C eq 'A';
As you can see, the old rules often do not even use some features, or accept ranges. Some are more annoying:
result("Y") if ($A == 1 && $B == 2) || ($A == 2 && $B == 4);
The ruleset needs to be much smaller, as it has to be maintained by humans, so I'd like to shrink the rules so that the first example would become:
A B C D E Result
1 2 bce 5 2-4 X
The upshot is that we can split the ruleset by the Result column and shrink each independently. However, I cannot think of an easy way to identify and shrink down the ruleset. I've tried clustering algorithms but they choke because some of the data is discrete, and treating it as continuous is imperfect. Another example:
A B C Result
1 2 a X
1 2 b X
(repeat a few hundred times)
2 4 a X
2 4 b X
(ditto)
In an ideal world, this would be two rules:
A B C Result
1 2 * X
2 4 * X
That is: not only would the algorithm identify the relationship between A and B, but would also deduce that C is noise (not important for the rule)
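For what it's worth, here is a toy sketch of one such merging pass in Ruby (entirely illustrative; the rule representation and the domain of C are made up): group rules that agree on everything except C, then collapse a group into a wildcard rule when its C values cover the whole domain.

DOMAIN_C = %w[a b c d e]

rules = [
  { A: 1, B: 2, C: 'a', result: 'X' },
  { A: 1, B: 2, C: 'b', result: 'X' },
  { A: 1, B: 2, C: 'c', result: 'X' },
  { A: 1, B: 2, C: 'd', result: 'X' },
  { A: 1, B: 2, C: 'e', result: 'X' },
]

merged = rules
  .group_by { |r| r.reject { |k, _| k == :C } }  # key: the rule minus C
  .flat_map do |rest, group|
    cs = group.map { |r| r[:C] }.sort
    cs == DOMAIN_C.sort ? [rest.merge(C: '*')] : group
  end

p merged  # => [{:A=>1, :B=>2, :result=>"X", :C=>"*"}]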
Does anyone have an idea of how to go about this problem? Any language or library is fair game, as I expect this to be a mostly one-off process. Thanks in advance.
Check out the Weka machine learning library for Java. The API is a little bit crufty, but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka contains. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise). You could try a decision tree such as J48, as these are usually easy to visualize/interpret.
Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If you can, you could begin by separating the rules into groups by result.
Then, for each result, do the following. Considering each feature as a dimension, and the allowed values for a feature as the metric along that dimension, construct a huge Karnaugh map representing the entire rule set.
The map has two uses. One: look into automated implementations of the Quine-McCluskey algorithm, which minimizes such maps. A lot of work has been done in this area. There are even a few programs available, although probably none of them will deal with a Karnaugh map of the size you're going to make.
Two: when you have created your final reduced rule set, iterate over all combinations of all values for all features again, and construct another Karnaugh map using the reduced rule set. If the maps match, your rule sets are equivalent.
-Al.
You could try a neural network approach, trained via backpropagation, assuming you have or can randomly generate (based on the old ruleset) a large set of data that hit all your classes. Using a hidden layer of appropriate size will allow you to approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but due to the training paradigm should have no issue with your discrete inputs.
This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and negatives (although, it being a one-off process, you get an arbitrary degree of confidence by checking a gargantuan validation set).
