What's the benefit of the three-way comparison operator (<=>) in C++20?

I know its syntax.
I'm just wondering what the benefit is, or whether it even makes sense.
Without it, we must write code like this:
void func1(int x, int y) {
    if (x > y)
        doSomeThing();
    else if (x < y)
        doSomeElse();
    else
        explosive();
}
With it, we can do this:
void func1(int x, int y) {
    auto result = x <=> y;
    if (result > 0)
        doSomeThing();
    else if (result < 0)
        doSomeElse();
    else
        explosive();
}
Apart from returning a comparison result, I can NOT see any benefit of this feature.
Someone says it can make our code more readable, but I don't think so.
It's quite obvious that the former example is more readable.
As for returning a result, like this:
auto func1(int x, int y) {
    return x <=> y;   // note: the result is a comparison category type, not an int
}
It looks like we get much more readability, but we still need to check the value with another if/else somewhere, e.g. outside of func1.

I can NOT see any benefit of this feature.
Then you are thinking too narrowly about what is actually happening.
Direct use of <=> does not exist to serve the needs of ints. It can be used for comparing them, but ints are not why the functionality exists.
It exists for types that actually have complex comparison logic. Consider std::string. To know if one string is "less than" another, you have to iterate through both strings and compare each character. When you find a non-equivalent one, you have your answer.
Your code, when applied to string, does the comparison twice: once with less than and once with greater than. The problem is this: the first comparison already found the first non-equal character. But the second comparison does not know where that is. So it must start from the very beginning, doing the exact same comparisons the first one did.
That's a lot of repeated work for an answer that the first comparison already computed. In fact, there is a really easy way to compute <=> for strings: subtract the corresponding character in the second string from the one in the first. If the value is zero, then they are equal. If the result is negative, the first string is less; if it's positive, the first string is greater.
Which is... exactly what <=> returns, isn't it? By using <=>, you do two expensive comparisons in the space of one; testing the return value is immaterial next to the cost of them.
The more complex your comparison logic, the more you are likely to save with <=> if you need to categorize them into less/greater/equal.
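To make that concrete, here is a minimal sketch of a single-pass three-way string comparison (my illustration, not the actual standard-library implementation; compare3 is just an illustrative name):
#include <algorithm>   // std::min
#include <compare>     // std::strong_ordering
#include <cstddef>     // std::size_t
#include <string>

// Sketch only: one pass finds the first differing character and settles
// all three outcomes (less/equal/greater) at once.
std::strong_ordering compare3(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return a[i] <=> b[i];   // the first mismatch decides the order
    }
    return a.size() <=> b.size();   // equal prefixes: the shorter string is less
}
Callers can then test the single returned value against 0, exactly as in your second example, without ever re-scanning the strings.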
It should also be noted that processors often have special opcodes to tell if an integer is negative, zero, or positive. If we look at the x86 assembly for your integer comparison:
func1(int, int):
        cmp     edi, esi
        jle     .LBB0_1
        jmp     a()@PLT                 # TAILCALL
.LBB0_1:
        jge     .LBB0_2
        jmp     b()@PLT                 # TAILCALL
.LBB0_2:
        jmp     c()@PLT                 # TAILCALL
We can see that it only executes cmp once; the jle and jge instructions reuse the flags that single comparison set. The <=> version compiles to the same assembly, so the compiler fully understands these as synonyms.
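For reference, here is the <=> version that produces that same assembly (a sketch assuming the same hypothetical a/b/c handlers the listing tail-calls):
#include <compare>

void a(); void b(); void c();   // the handlers the assembly tail-calls

void func1(int x, int y) {
    auto r = x <=> y;            // std::strong_ordering for ints
    if (r > 0)
        a();
    else if (r < 0)
        b();
    else
        c();
}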

Related

perl6 min and max of mixed Str and Int arguments

Which type gets converted first by the min and max routines when the arguments contain a mixture of Str and Int?
> say ("9", "10").max
9
> say ("9", "10").max.WHAT
(Str)
> say (9, "10").max
9
> say (9, "10").max.WHAT
(Int) # if convert to Int first, result should be 10
> say ("9", 10).max
9
> say ("9", 10).max.WHAT
(Str) # if convert to Str first, result should be 9
> say (9, "10").min
10
> say (9, "10").min.WHAT
(Str) # does min and max convert Str or Int differently?
If min or max converts arguments to be the type of the first argument, the results here are still inconsistent.
Thank you for your enlightenment !!!
Both min and max use the cmp infix operator to do the comparisons. If the types differ, then this logic is used (rewritten slightly to be pure Perl 6, whereas the real one uses an internals shortcut):
multi sub infix:<cmp>($a, $b) {
    $a<> =:= $b<>
        ?? Same
        !! $a.Stringy cmp $b.Stringy
}
Effectively, if the two things point to the exact same object, then they are the Same, otherwise stringify both and then compare. Thus:
say 9 cmp 10; # uses the (Int, Int) candidate, giving Less
say "9" cmp "10"; # uses the (Str, Str) candidate, giving More
say 9 cmp "10"; # delegates to "9" cmp "10", giving More
say "9" cmp 10; # delegates to "9" cmp "10", giving More
The conversion to a string is done for the purpose of comparison (as an implementation detail of cmp), and so has no impact upon the value that is returned by min or max, which will be that found in the input list.
Well, jnthn has answered. His answers are always authoritative and typically wonderfully clear and succinct too. This one is no exception. :) But I'd started so I'll finish and publish...
A search for "method min" in the Rakudo sources yields 4 matches of which the most generic is a match in core/Any-iterable-methods.pm6.
It might look difficult to understand but nqp is actually essentially a simple subset of P6. The key thing is it uses cmp to compare each value that is pulled from the sequence of values being compared against the latest minimum (the $pulled cmp $min bit).
Next comes a search for "sub infix:<cmp>" in the Rakudo sources. This yields 14 matches.
These will all have to be looked at to confirm what the source code does when comparing these various types of value. Note also that the logic is applied pairwise to each pair of values, which is slightly weird to think about. So if there are three values a, b, and c, each of a different type, then the logic will be: a is the initial minimum; then there'll be a b cmp a, using whatever cmp logic wins for that combination of types in that order; and then c cmp d, where d is whichever value won the b cmp a comparison, using whatever cmp logic suits that pair of types in that order.
Let's start with the first one -- the match in core/Order.pm6 -- which is presumably a catchall if none of the other matches are more specific:
If both arguments of cmp are numeric, then comparison is a suitable numeric comparison (e.g. if they're both Ints then comparison is of two arbitrary-precision integers).
If one argument is numeric but not the other, then -Inf and Inf are sorted to the start and end, but otherwise comparison is done after both arguments are coerced with .Stringy.
Otherwise, both arguments are coerced with .Stringy.
So, that's the default.
Next, one would have to go through the individual overloads. For example, the next one is the cmp ops in core/allomorphs.pm6, and we see how, for allomorphic types (IntStr etc.), comparison is numeric first, then string if that doesn't settle it. Note the comment:
we define cmp ops for these allomorphic types as numeric first, then Str. If you want just one half of the cmp, you'll need to coerce the args
Anyhoo, I see jnthn's posted yet another great answer so it's time to wrap this one. :)

Efficient way to write ordering instances?

I'm working on a basic Haskell exercise that is set up as follows: a data definition declares Zero to be a NaturalNumber, and a series of numbers (written out by name, so, for instance, four) up to ten is constructed from it.
I didn't have too much trouble with understanding how the declaration of Eq instances works (apart from not having been given an exact explanation for the syntax), but I'm having trouble with declaring all instances I need for Ord -- I need to be able to construct an ordering over the entire set of numbers, such that I'll get True if I input "ten > nine" or something.
Right now, I have this snippet of code. The first two lines should be correct, as I copied them (as I was supposed to) from the exercise itself.
instance Ord NaturalNumber where
    compare Zero Zero     = EQ
    compare Zero (S Zero) = LT
    compare (S Zero) Zero = GT
    compare x (S x)       = LT
The first four lines work fine, but they can't deal with cases like compare four five, and anything similar to what I typed in the last line doesn't work. Even if I type in something like compare four four = EQ, I get a "conflicting definitions" error, presumably because the x appears twice. If I write something like compare two one = GT instead, I get a "pattern match(es) are overlapped" warning, but it works. However, I also get the result GT when I input compare one two into the actual Haskell platform, so clearly something isn't working. This happens even if I add compare one two = LT below that line.
So clearly I can't finish off this description of Ord instances by writing every instance I could possibly need, and even if I could, it would be incredibly inefficient to write out all 100 instances by hand.
Might anyone be able to provide me with a hint as to how I can resolve this problem and finish off the construction of an ordering mechanism?
What this task focuses on is finding base cases and recursion rules. The first two lines you were given were
instance Ord NaturalNumber where
    compare Zero Zero = EQ
This is the first base case, in words:
zero is equal to zero
The other two base cases are:
zero is less than the successor of any NaturalNumber
the successor of any NaturalNumber is greater than zero
Note that your lines three and four only say that 0 < 1 and 1 > 0, but nothing about any other nonzero numbers.
The recursion rule, then, is that it makes no difference if you compare two nonzero numbers, or the numbers they are successors of:
comparing 1 + x and 1 + y is the same as comparing x and y.
Codifying that into Haskell should give you the solution to this exercise.
You'll need to organize your instances in a way that will cover all possible patterns. To make it simpler, remember how your numbers are defined:
one = S Zero
two = S one -- or S (S Zero)
and think in terms of S and Zero, not one, two etc. (they are merely aliases). Once you do this, it should become clear that you're missing a case like:
compare (S x) (S y) = compare x y
Edit:
As Jakob Runge noticed, the following base clauses should also be improved:
compare Zero (S Zero) = LT
compare (S Zero) Zero = GT
As they're written, they allow comparison only between zero and one. You should change them to cover comparison between zero and any positive number:
compare Zero (S _) = LT
compare (S _) Zero = GT
Your compare function needs to be recursive. You will want your last case to capture the situation where both arguments are the successor of something, and then recurse on what they are the successors of. Additionally, your middle two cases are probably not what you want, as they will only capture the following cases:
1 > 0
0 < 1
You would like this to be more general, so that you can handle cases like:
S x > 0, for all x
0 < S x, for all x

Speed of Comparison operators

In languages such as... well, anything, both the < and <= operators (and their opposites) exist. Which would be faster, and how are they interpreted?
if (x <= y) { blah; }
or
if (x < y + 1) { blah; }
Assuming no compiler optimizations (a big assumption), the first will be faster, as <= is implemented by a single jle instruction, whereas the latter requires an addition followed by a jl instruction.
http://en.wikibooks.org/wiki/X86_Assembly/Control_Flow#Jump_if_Less
I wouldn't worry about this at all as far as performance goes. Using C as an example, on a simple test I ran with GCC 4.5.1 targeting x86 (with -O2), the (x <= y) operation compiled to:
// if (x <= y) {
//     printf("x <= y\n");
// }
//
// `x` is [esp+28]
// `y` is [esp+24]
mov  eax, DWORD PTR [esp+24]  // load `y` into eax
cmp  DWORD PTR [esp+28], eax  // compare with `x`
jle  L5                       // if x <= y, jump to the `true` block
L2:
// ...
ret
L5:  // this prints "x <= y\n"
mov  DWORD PTR [esp], OFFSET FLAT:LC1
call _puts
jmp  L2                       // jump back to the code after the if statement
and the (x < y + 1) operation compiled to:
// if (x < y + 1) {
//     printf("x < y+1\n");
// }
//
// `x` is [esp+28]
// `y` is [esp+24]
mov  eax, DWORD PTR [esp+28]  // load `x` into eax
cmp  DWORD PTR [esp+24], eax  // compare with `y`
jl   L3                       // jump past the `true` block if (y < x)
mov  DWORD PTR [esp], OFFSET FLAT:LC2
call _puts
L3:
So you might have a difference of a jump around a jump or so, but you should really only be concerned about this kind of thing for the odd time when it really is a hot spot. Of course there may be differences between languages, and exactly what happens might depend on the types of the objects being compared. But I'd still not worry about this at all as far as performance is concerned (until it becomes a demonstrated performance issue, and I'll be surprised if that ever happens more than once or twice in my lifetime).
So, I think the only two reasons to worry about which test to use are:
correctness - of course, this trumps any other consideration
style/readability
While you might not think there's much to the style/readability consideration, I do worry about this a little. In my C and C++ code today, I'd favor using the < operator over <= because I think loops tend to terminate 'better' using a < than a <= test. So, for example:
iterating over an array by index, you should typically use an index < number_of_elements test
iterating over an array using pointers to elements should use a ptr < (array + number_of_elements) test
Actually, even in C, I now tend to use a ptr != (array + number_of_elements) test, since I've gotten used to STL iterators where the < relation won't work.
In fact, if I see a <= test in a for loop condition, I take a close look - often there's a bug lurking. I consider it an anti-pattern.
Now, I'll grant that a lot of this may not hold for other languages, but I'd be surprised if, when I'm using another language, there's ever a performance issue I'll have to worry about because I chose to use < over <=.
What data-type?
If y is INT_MAX, then the first expression is true no matter what x is (assuming x is the same or smaller type), while the second expression is always false.
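A quick sketch of that edge case. With signed int, y + 1 overflows (undefined behavior in C and C++), so the well-defined unsigned variant is shown alongside:
#include <climits>
#include <iostream>

int main() {
    int x = 5;
    int y = INT_MAX;
    std::cout << (x <= y) << '\n';       // 1: always true when y == INT_MAX
    // (x < y + 1) would overflow signed int: undefined behavior in C/C++.
    // With unsigned arithmetic the wraparound is well defined, and the
    // test degenerates to ux < 0, which is always false:
    unsigned ux = 5, uy = UINT_MAX;
    std::cout << (ux < uy + 1) << '\n';  // 0: uy + 1 wraps around to 0
}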
If the answer doesn't need to be right, you can get it even faster.
Have you considered that those two expressions aren't equivalent? If x and y are floating-point numbers, they may not give the same result. That is the reason both comparison operators exist.
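For instance (a small C++ illustration with made-up values):
#include <iostream>

int main() {
    double x = 1.2, y = 0.5;
    std::cout << (x <= y) << '\n';      // 0: 1.2 is not <= 0.5
    std::cout << (x < y + 1) << '\n';   // 1: 1.2 < 1.5, so the two tests disagree
    // At large magnitudes the + 1 can vanish entirely in rounding:
    double big = 9007199254740992.0;    // 2^53
    std::cout << (big + 1 == big) << '\n';  // 1: big + 1 rounds back to big
}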
Prefer the first one.
In some languages with dynamic types the running environment has to figure out what the type of y is and execute the appropriate + operator.
Leaving this as vague as you have has caused this to be an unanswerable question. Performance cannot be evaluated unless you have software and hardware to measure - what language? what language implementation? what target CPU architecture? etc.
That being said, both <= and < are often identical performance-wise, because they are logically equivalent to > and >=, just with swapped destinations for the underlying goto's (branch instructions), or swapped logic for the underlying "true/false" evaluation.
If you're programming in C or C++, the compiler may be able to figure out what you're doing, and swap in the faster alternative, anyway.
Write code that is understandable, maintainable, correct, and performant, in that order. For performance, find tools to measure the performance of your whole program, and spend your time wisely. Optimize bottlenecks only until your program is fast enough. Spend the time you save by making better code, or making more cool features :)

Finding perfect numbers between 1 and 100

How can I generate all perfect numbers between 1 and 100?
A perfect number is a positive integer that is equal to the sum of its proper divisors. For example, 6 (= 1 + 2 + 3) is a perfect number.
So I suspect Frank is looking for an answer in Prolog, and yes it does smell rather homeworky...
For fun I decided to write up my answer. It took me about 50 lines.
So here is the outline of what my predicates look like. Maybe it will help get you thinking the Prolog way.
is_divisor(+Num,+Factor)
divisors(+Num,-Factors)
divisors(+Num,+N,-Factors)
sum(+List,-Total)
sum(+List,+Sofar,-Total)
is_perfect(+N)
perfect(+N,-List)
The + and - are not really part of the parameter names. They are a documentation clue about what the author expects to be instantiated. (NB) "+Foo" means you expect Foo to have a value when the predicate is called. "-Foo" means you expect Foo to be a variable when the predicate is called, and to be given a value by the time it finishes. (Kind of like input and output, if it helps to think of it that way.)
Whenever you see a pair of predicates like sum/2 and sum/3, odds are the sum/2 one is like a wrapper to the sum/3 one which is doing something like an accumulator.
I didn't bother to make it print them out nicely. You can just query it directly in the Prolog command line:
?- perfect(100,L).
L = [28, 6] ;
fail.
Another thing that might be helpful, that I find with Prolog predicates, is that there are generally two kinds. One kind simply checks whether something is true. For this kind of predicate, you want everything else to fail. These don't tend to need to be recursive.
Others will want to go through a range (of numbers or a list) and always return a result, even if it is 0 or []. For these types of predicates you need to use recursion and think about your base case.
HTH.
NB: This is called "mode", and you can actually specify modes and have the compiler/interpreter enforce them, but I personally just use them in documentation. I also tried to find a page with info about Prolog modes, but I couldn't find a good link. :(
I'm not sure if this is what you were looking for, but you could always just print out "6, 28"...
Well, it looks like you need to loop up to n/2, that is, half of n. Divide the number by each candidate, and if there is no remainder, include the candidate in the total. Once you have exhausted half of n, check whether the accumulated total equals the number you are testing.
For instance:
#include <iostream>
using namespace std;

int main(void)
{
    int total = 0;
    for (int i = 1; i <= 100; i++)
    {
        for (int j = 1; j <= i / 2; j++)
        {
            if (!(i % j))           // j divides i evenly
            {
                total += j;
            }
        }
        if (i == total)
        {
            cout << i << " is perfect" << endl;
        }
        total = 0;                  // reset for the next candidate
    }
    return 0;
}

Palindrome detection efficiency

I got curious about Jon Limjap's interview mishap and started to look for efficient ways to do palindrome detection. I checked the palindrome golf answers, and it seems to me that the answers contain only two algorithms: reversing the string, and checking from tail and head.
def palindrome_short(s):
    length = len(s)
    for i in xrange(0, length / 2):
        if s[i] != s[(length - 1) - i]:
            return False
    return True

def palindrome_reverse(s):
    return s == s[::-1]
I think neither of these methods is used for detecting exact palindromes in huge DNA sequences. I looked around a bit and didn't find any free article about what an ultra-efficient way of doing this might be.
A good way might be parallelizing the first version in a divide-and-conquer approach, assigning a pair of char arrays 1..n and length-1-n..length-1 to each thread or processor.
What would be a better way?
Do you know any?
Given only one palindrome, you will have to do it in O(N), yes. You can get more efficiency with multi-processors by splitting the string as you said.
Now say you want to do exact DNA matching. These strings are thousands of characters long, and they are very repetitive. This gives us the opportunity to optimize.
Say you split a 1000-char long string into 5 pairs of 100,100. The code will look like this:
isPal(w[0:100], w[-100:]) and isPal(w[100:200], w[-200:-100])
etc... The first time you do these matches, you will have to process them. However, you can add all results you've done into a hashtable mapping pairs to booleans:
isPal = {("ATTAGC", "CGATTA"): True, ("ATTGCA", "CAGTAA"): False}
etc... this will take way too much memory, though. For pairs of 100,100, the hash map will have 2*4^100 elements. Say that you only store two 32-bit hashes of strings as the key, you will need something like 10^55 megabytes, which is ridiculous.
Maybe if you use smaller strings, the problem can be made tractable. Then you'll have a huge hashmap, but at least a palindrome check for, let's say, 10x10 pairs will take O(1), so checking if a 1000-char string is a palindrome will take 100 lookups instead of 500 compares. It's still O(N), though...
Another variant of your second function. We don't need to check that the right halves of the normal and reversed strings are equal.
def palindrome_reverse(s):
    l = len(s) // 2
    # compare the first half with the reversed last half
    return s[:l] == s[:-l - 1:-1]
Obviously, you're not going to be able to get better than O(n) asymptotic efficiency, since each character must be examined at least once. You can get better multiplicative constants, though.
For a single thread, you can get a speedup using assembly. You can also do better by examining data in chunks larger than a byte at a time, but this may be tricky due to alignment considerations. You'll do even better to use SIMD, if you can examine chunks as large as 16 bytes at a time.
If you wanted to parallelize it, you could divide the string (of length L) into N pieces, and have processor i compare the segment [i*L/(2N), (i+1)*L/(2N)) with the segment [L-(i+1)*L/(2N), L-i*L/(2N)).
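As a sketch of the chunk-at-a-time idea mentioned above (my illustration, assuming a little-endian machine and GCC/Clang's __builtin_bswap64), you can compare eight bytes from each end per step:
#include <cstdint>
#include <cstring>
#include <string>

bool is_palindrome_chunked(const std::string& s) {
    std::size_t i = 0, j = s.size();
    while (j - i >= 16) {                      // take 8 bytes from each end
        std::uint64_t head, tail;
        std::memcpy(&head, s.data() + i, 8);
        std::memcpy(&tail, s.data() + j - 8, 8);
        if (head != __builtin_bswap64(tail))   // byte-reverse the tail word
            return false;
        i += 8;
        j -= 8;
    }
    while (i < j) {                            // finish the remainder bytewise
        if (s[i++] != s[--j])
            return false;
    }
    return true;
}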
There isn't, unless you do a fuzzy match. Which is what they probably do in DNA (I've done EST searching in DNA with Smith-Waterman, but that is obviously much harder than matching for a palindrome or reverse-complement in a sequence).
They are both O(N), so I don't think there is any particular efficiency problem with either of these solutions. Maybe I am not creative enough, but I can't see how it would be possible to compare N elements in fewer than N steps, so something like O(log N) is definitely not possible IMHO.
Parallelism might help, but it still wouldn't change the big-O rank of the algorithm, since it is equivalent to running it on a faster machine.
Comparing from the center is always much more efficient, since you can bail out early on a miss, but it also allows you to do faster maximal palindrome searches, regardless of whether you are looking for the maximal radius or for all non-overlapping palindromes.
The only real parallelization is if you have multiple independent strings to process. Splitting into chunks will waste a lot of work for every miss, and there are always many more misses than hits.
On top of what others have said, I'd also add a few pre-check criteria for really large inputs:
quick check whether the tail character matches the head character;
if NOT, just exit early by returning Boolean-False
if (input-length < 4) {
    # the quick check just now already confirmed it's a palindrome
    return Boolean-True
} else if (200 < input-length) {
    # adjust this parameter to your preferences
    #
    # e.g. make it 20 for inputs longer than 8000, etc.,
    # or make it scale with input size,
    # either logarithmically or at a fixed ratio like 2.5%
    #
    reverse the last (N = 4) characters/bytes of the input
    if that **DOES NOT** match the first N chars/bytes {
        return Boolean-False  # early exit:
                              # no point reversing the rest of it
                              # when head and tail don't even match
    } else {
        if N was substantial,
        trim out the head and tail of the input
        you've already confirmed, to avoid duplicated work;
        remember to also update the variable(s)
        you've elected to track the input size with
    }
    [ optional 1: if the substring of N characters you've
      just checked happened to contain only a single
      repeated character, call it C,
      then locate the index position P of the first
      character that isn't C;
      if P == input-size,
      then you've already proven the entire string
      is a nonstop repeat of one single character,
      which, by definition, must be a palindrome,
      so just return Boolean-True;
      but if P is past the half-way point,
      you've also proven it cannot possibly be a
      palindrome, because the second half contains a
      component that doesn't exist in the first half,
      so just return Boolean-False ]
    [ optional 2: for extremely long inputs,
      like over 200,000 chars,
      take the N chars right at the middle,
      and see if the reversed version matches;
      if that fails, exit early and save time ]
}
if all pre-checks passed,
then simply do it BAU style:
reverse the second half of it,
and see if it's the same as the first half
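Here's a compact C++ sketch of the head/tail pre-check plus the BAU comparison (my illustration; N = 4 and the function name are arbitrary):
#include <algorithm>
#include <cstddef>
#include <string>

bool prechecked_palindrome(const std::string& s) {
    const std::size_t N = 4;
    if (s.size() >= 2 * N) {
        for (std::size_t k = 0; k < N; ++k)
            if (s[k] != s[s.size() - 1 - k])
                return false;              // early exit: the ends don't match
    }
    // BAU style: compare the first half with the mirrored second half
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}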
With Python, short code can be faster, since it pushes the work into the faster internals of the VM (and there are caches and other such effects):
def ispalin(x):
    return all(x[a] == x[-a - 1] for a in xrange(len(x) >> 1))
You can use a hashtable to track characters, along with a counter whose value increases every time you find an element not in the table/map, and decreases when you find an element that's already there. For an odd-length string the counter should end back at 1, and for an even-length string it should hit 0. I hope this approach is right. (Note that this really checks whether the characters could be rearranged into a palindrome, not whether the string itself is one.)
See the snippet below.
// `s` refers to the string (Hashtable is java.util.Hashtable), e.g.:
String s = "abbcaddc";
int count = 0;
int length = s.length();
Hashtable<Character, Integer> textMap = new Hashtable<Character, Integer>();
char[] charA = s.toCharArray();
for (int i = 0; i < charA.length; i++)
{
    if (!textMap.containsKey(charA[i]))
    {
        textMap.put(charA[i], ++count);
    }
    else
    {
        textMap.remove(charA[i]);   // toggle: a second sighting cancels the first
        --count;
    }
}
if (length % 2 != 0)
{
    if (count == 1)
        System.out.println("(odd case: palindrome)");
    else
        System.out.println("(odd case: not palindrome)");
}
else
{
    if (count == 0)
        System.out.println("(even case: palindrome)");
    else
        System.out.println("(even case: not palindrome)");
}
public class Palindrome {
    private static boolean isPalindrome(String s) {
        if (s == null)
            return false;               // uninitialized String? return false
        if (s.isEmpty())                // an empty String is a palindrome
            return true;
        // we want to check characters on opposite sides of the string
        // and stop in the middle <divide and conquer>
        int left = 0;                   // left-most char
        int right = s.length() - 1;     // right-most char
        while (left < right) {          // this elegantly handles odd-length strings
            if (s.charAt(left) != s.charAt(right))
                return false;           // the left char must equal the right, else it's not a palindrome
            left++;                     // converge left toward right
            right--;                    // converge right toward left
        }
        return true;                    // the loop reached the middle without a mismatch
    }
}
Though we are doing n/2 comparisons, it's still O(n).
This could also be done using threads, but the calculations get messy; best to avoid it. This doesn't skip special characters and is case-sensitive. I have code that handles that, but this code can be modified to handle it easily.
