cost of == operator vs < or > operators [closed] - performance

This is really just an academic question; I'm curious to know which one is faster. I'm guessing the difference is negligible, but I'd still like to know.
if( (x == 1) || (x == 2) )
vs
if (x < 3)
thanks!

In the form you provided there is an evident difference in complexity: the first snippet uses three operators, the second just one. But let's set that aside and assume you simply want to compare > (or <) with == (or !=). If you have ever looked at the assembler your compiler produces while examining your programs (I'd bet you haven't), you would notice code such as
if (x < 3) /* action */;
being translated into something like this (for an x86 CPU):
cmp %eax, 3 // <-- compare EAX with 3. This modifies the flags register(*1) (namely ZF, CF, SF and so on).
jge ##skip  // <-- this instruction tells the CPU to jump when certain flag conditions are met(*2).
            //     So the action body is executed only when the jump is *not* taken
            //     (when x is *not* greater than or equal to 3).
/* here is action body */
##skip:
// ...
Now consider this code:
if (x == 3) /* action */;
It will give almost the same assembly (of course, it may differ from mine, but not semantically):
cmp %eax, 3
jne ##skip // <-- here is the only difference
/* action is run when x is equal to 3 */
##skip:
...
Both of these operators (jge and jne, like the other conditional jumps) do their job at the same speed (CPUs are built that way, though it obviously depends on the architecture). What has more impact on performance is the jump distance (the difference between code positions), branch mispredictions (when the processor guesses the jump wrongly), cache misses, and so on. Some instructions are also more efficient than others (they use fewer bytes, for example), but remember one thing: do not pay them too much attention. It is always better to make algorithmic optimizations than to save pennies on individual comparisons. Let the compiler handle them for you -- it is far more competent at such questions. Focus on your algorithms, code readability, program architecture, fault tolerance. Make speed the last factor.
(*1): http://en.wikipedia.org/wiki/FLAGS_register_%28computing%29
(*2): http://www.unixwiz.net/techtips/x86-jumps.html
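If you want to check this yourself, here is a minimal C sketch (the function names are my own) that you can feed to gcc -O2 -S or clang -O2 -S and compare the output for the two forms from the question. Many compilers will even fold the two equality tests into a single range check, which nicely illustrates the point about leaving this to the compiler.
int either_one_or_two(int x)
{
    /* two comparisons joined with ||; optimizers often rewrite this
       as a single unsigned range check such as (unsigned)(x - 1) <= 1 */
    return (x == 1) || (x == 2);
}

int less_than_three(int x)
{
    /* a single relational comparison: one cmp plus one setcc/jcc */
    return x < 3;
}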

Related

Stack implementation the Trollface way [closed]

In my software engineering course, I encountered the following characteristic of a stack, condensed by me: what you push is what you pop. The fully axiomatic version is the one I linked here.
Being a natural-born troll, I immediately invented the Troll Stack. If it already has more than one element on it, pushing results in a random permutation of those elements. I promptly got into an argument with the lecturers about whether this nonsense implementation actually violates the axioms. I said no, the top element stays where it is. They said yes, somehow you can recursively apply the push-pop axiom to get "deeper", which I don't see. Who is right?
The violated axiom is pop(push(s,x)) = s. Take a stack s with n > 1 distinct entries. If you implement push such that push(s,x) is s'x with s' being a random permutation of s, then since pop is a function, you have a problem: how do you reverse random_permutation() such that pop(push(s,x)) = s? The preimage of s' might have been any of the n! > 1 permutations of s, and no matter which one you map to, there are n! - 1 > 0 other original permutations s'' for which pop(push(s'',x)) != s''.
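If you would rather run the counterexample than argue about it on paper, here is a minimal C sketch (the TrollStack type, the shuffling push and all names are my own illustration, not anyone's real code). The new element still ends up on top, so top(push(s,x)) = x holds, but pop(push(s,x)) almost never gives back the original stack:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP 16

typedef struct { int data[CAP]; int size; } TrollStack;

/* push that randomly permutes the elements already on the stack
   (Fisher-Yates shuffle; seeding of rand() omitted for brevity) */
static void troll_push(TrollStack *s, int x)
{
    for (int i = s->size - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = s->data[i]; s->data[i] = s->data[j]; s->data[j] = tmp;
    }
    s->data[s->size++] = x;   /* x still ends up on top */
}

static int troll_pop(TrollStack *s) { return s->data[--s->size]; }

int main(void)
{
    TrollStack s = { {1, 2, 3, 4}, 4 };   /* a stack with 4 distinct entries */
    TrollStack before = s;

    troll_push(&s, 99);
    (void)troll_pop(&s);                  /* pop(push(s, 99)) */

    printf("pop(push(s,x)) == s ? %s\n",
           memcmp(before.data, s.data, 4 * sizeof(int)) == 0 ? "yes" : "no");
    return 0;
}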
In cases like this, which may be easy for everyone else to see but not for you (hence your use of the word "troll"), it always helps to simply run the "program" on a piece of paper.
Write down what happens when you push and pop a few times, and you will see.
You should also be able to see how those axioms correspond very closely to the actual behaviour of your stack; they are not just there for fun, but they deeply (in multiple meanings of the word) specify the data structure with its methods. You could even view them as a "formal system" describing the ins and outs of stacks.
Note that it is still good for you to be sceptical; this leads to a) better insight and b) detection of errors your superiors make. In this case they are right, but there are cases where it can save you a lot of time (e.g. while searching for the solution to the "MU" riddle in "Gödel, Escher, Bach", which would be an excellent read for you, I think).

Is there ever a point to swap two variables without using a third?

I know not to use them, but there are techniques to swap two variables without using a third, such as
x ^= y;
y ^= x;
x ^= y;
and
x = x + y
y = x - y
x = x - y
In class the prof mentioned that these were popular 20 years ago when memory was very limited and are still used in high-performance applications today. Is this true? My understanding as to why it's pointless to use such techniques is that:
Using the third variable can never be the bottleneck.
The optimizer does this anyway.
So is there ever a good time to not swap with a third variable? Is it ever faster?
Comparing them to each other, is the XOR method or the +/- method faster? Most architectures have a unit for addition/subtraction and for XOR, so wouldn't that mean they are all the same speed? Or does a CPU having a unit for an operation not mean that all such operations run at the same speed?
These techniques are still important to know for the programmers who write the firmware of your average washing machine or so. Lots of that kind of hardware still runs on Z80 CPUs or similar, often with no more than 4K of memory or so. Outside of that scene, knowing this kind of algorithmic "trickery" has, as you say, as good as no real practical use.
(I do want to remark, though, that the programmers who remember and know this kind of stuff often turn out to be better programmers, even for "regular" applications, than their "peers" who won't bother, precisely because the latter often take the "memory is big enough anyway" attitude too far.)
There's no point to it at all. It is an attempt to demonstrate cleverness. Considering that it doesn't work in many cases (floating point, pointers, structs), is unreadable, and uses three dependent operations which will be much slower than just exchanging the values, it's absolutely pointless and demonstrates a failure to actually be clever.
You are right: if it were faster, optimising compilers would detect the pattern whenever two values are exchanged and substitute it. It's easy enough to do. Compilers do in fact notice when you exchange two variables and may emit no code at all, simply using the swapped variables from that point on. For example, if you exchange x and y and then write a += x; b += y;, the compiler may just change this to a += y; b += x;. The xor or add/subtract pattern, on the other hand, is rare enough that it won't be recognised and won't get improved.
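One concrete failure mode worth adding to that list (my own illustration, assuming a swap routine that takes pointers): if both arguments alias the same object, the XOR version silently zeroes it, while the plain temporary-variable swap is harmless.
#include <stdio.h>

/* XOR swap: breaks when a and b point to the same object */
static void xor_swap(int *a, int *b)
{
    *a ^= *b;   /* if a == b, this sets the value to 0 ...  */
    *b ^= *a;   /* ... and the remaining steps keep it at 0 */
    *a ^= *b;
}

/* swap with a temporary: always correct */
static void tmp_swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

int main(void)
{
    int x = 42;
    xor_swap(&x, &x);
    printf("after xor_swap(&x, &x): %d\n", x);   /* prints 0  */

    x = 42;
    tmp_swap(&x, &x);
    printf("after tmp_swap(&x, &x): %d\n", x);   /* prints 42 */
    return 0;
}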
Yes, there is, especially in assembly code.
Processors have only a limited number of registers. When the registers are pretty full, this trick can avoid spilling a register to another memory location (possibly in an unfetched cache line).
I've actually used the 3-way xor to swap a register with a memory location in the critical path of high-performance hand-coded lock routines for x86, where register pressure was high and there was no (lock-safe!) place to put the temp. (On x86 it is useful to know that the XCHG instruction with memory has a high cost associated with it, because it includes its own implicit lock, whose effect I did not want. Given that x86 has a LOCK prefix opcode, this was really unnecessary, but historical mistakes are just that.)
Moral: every solution, no matter how ugly it looks in isolation, likely has some uses. It's good to know them; you can always decline to use them when they're inappropriate. And where they are useful, they can be very effective.
Such a construct can be useful on many members of the PIC series of microcontrollers which require that almost all operations go through a single accumulator ("working register") [note that while this can sometimes be a hindrance, the fact that it's only necessary for each instruction to encode one register address and a destination bit, rather than two register addresses, makes it possible for the PIC to have a much larger working set than other microcontrollers].
If the working register holds a value and it's necessary to swap its contents with those of RAM, the alternative to:
xorwf other,w ; w=(w ^ other)
xorwf other,f ; other=(w ^ other)
xorwf other,w ; w=(w ^ other)
would be
movwf temp1 ; temp1 = w
movf other,w ; w = other
movwf temp2 ; temp2 = w
movf temp1,w ; w = temp1 [old w]
movwf other ; other = w
movf temp2,w ; w = temp2 [old other]
Three instructions and no extra storage, versus six instructions and two extra registers.
Incidentally, another trick which can be helpful when one wishes to make another register hold the maximum of its present value and W, and the value of W will not be needed afterward, is
subwf other,w ; w = other-w
btfss STATUS,C ; Skip next instruction if carry set (other >= W)
subwf other,f ; other = other-w [i.e. other-(other-oldW), i.e. old W]
I'm not sure how many other processors have a subtract instruction but no non-destructive compare, but on such processors that trick can be a good one to know.
These tricks are not very likely to be useful if you want to exchange two whole words in memory or two whole registers. Still, you could take advantage of them if you have no free registers (or only one free register for a memory-to-memory swap) and there is no "exchange" instruction available (as when swapping two SSE registers in x86), or the "exchange" instruction is too expensive (like the register-memory xchg in x86), and it is not possible to avoid the exchange or lower the register pressure.
But if your variables are two bitfields in single word, a modification of 3-XOR approach may be a good idea:
y = (x ^ (x >> d)) & mask
x = x ^ y ^ (y << d)
This snippet is from Knuth's "The Art of Computer Programming", Vol. 4A, Sec. 7.1.3. Here y is just a temporary variable. Both bitfields to exchange are in x; mask selects one bitfield, and d is the distance between the bitfields.
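To make that concrete, here is the same delta swap in C, exchanging the two nibbles of a byte (the constants and variable names are mine, chosen only to match the roles of mask and d in the quoted formula):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x    = 0xA3;   /* two 4-bit fields: 0xA (upper) and 0x3 (lower) */
    uint32_t mask = 0x0F;   /* selects the lower bitfield                    */
    int      d    = 4;      /* distance between the two bitfields            */

    /* delta swap from Knuth, TAOCP Vol. 4A, Sec. 7.1.3 */
    uint32_t y = (x ^ (x >> d)) & mask;
    x = x ^ y ^ (y << d);

    printf("0x%02X\n", (unsigned)x);   /* prints 0x3A: the nibbles swapped */
    return 0;
}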
You could also use tricks like this in hardness proofs (to preserve planarity). See, for example, the crossover gadget on this slide (page 7). This is from recent lectures in "Algorithmic Lower Bounds" by Prof. Erik Demaine.
Of course it is still useful to know. What is the alternative?
c = a
a = b
b = c
three operations with three resources rather than three operations with two resources?
Sure, the instruction set may have an exchange instruction, but that only comes into play if you are 1) writing assembly or 2) the optimizer recognizes the pattern as a swap and encodes that instruction. You could use inline assembly, but that is not portable and a pain to maintain; if you call an asm function instead, the compiler has to set up for the call, burning a bunch more resources and instructions. Although it can be done, you are not likely to actually exploit the instruction set's feature unless the language has a swap operation.
Now, the average programmer doesn't NEED to know this any more than back in the day; folks will bash this kind of premature optimization, and unless you know the trick and use it often, undocumented code like this is not obvious, which makes it bad programming because it is unreadable and unmaintainable.
It is still a valuable programming education and exercise, for example, to have someone invent a test to prove that it actually swaps for all combinations of bit patterns (see the sketch below). And just like doing an xor reg,reg on x86 to zero a register, it can give a small but real performance boost in highly optimized code.
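Such a test might look like this sketch (names are mine); for 8-bit values it can simply be exhaustive, which also shows why wider types push you toward reasoning about the XOR identities instead:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* check the xor swap for every pair of 8-bit values */
    for (unsigned a0 = 0; a0 < 256; a0++) {
        for (unsigned b0 = 0; b0 < 256; b0++) {
            uint8_t a = (uint8_t)a0, b = (uint8_t)b0;
            a ^= b;
            b ^= a;
            a ^= b;
            if (a != b0 || b != a0) {
                printf("FAILED for a=%u b=%u\n", a0, b0);
                return 1;
            }
        }
    }
    printf("xor swap verified for all 65536 pairs of 8-bit values\n");
    return 0;
}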

MPI prime numbers [closed]

I wanted to find a parallel algorithm which finds prime numbers using the MPI library. I found this one, but when I run it in Code::Blocks, I always get
Sorry - this exercise requires an even number of tasks.
evenly divisible into 2500000 . Try 4 or 8.
What does it mean? How can I obtain the number of tasks?
https://computing.llnl.gov/tutorials/mpi/samples/C/mpi_prime.c
What does it mean?
It means that you probably have to take a look at the source code and try to understand how it works. High Performance Mark has already pointed to the right MPI call, and if you look at the beginning of the main function, you'll see these lines:
MPI_Comm_size(MPI_COMM_WORLD,&ntasks);
if (((ntasks%2) !=0) || ((LIMIT%ntasks) !=0)) {
    printf("Sorry - this exercise requires an even number of tasks.\n");
    printf("evenly divisible into %d. Try 4 or 8.\n",LIMIT);
    MPI_Finalize();
    exit(0);
}
Obviously it requires an even number of MPI processes (otherwise ntasks%2 != 0) and this number should also divide LIMIT (which is equal to 2500000 in this case). MPI programs should be executed through the MPI launcher, which in most cases is called mpiexec or mpirun. It takes the number of processes as a parameter. If you do not run the code through mpiexec, most MPI implementations behave as if the program was started using
mpiexec -np 1 ./program
1 is not even, hence the first part of the if condition evaluates to true and the abort code gets executed.
What you should do is run the program in a terminal using mpiexec -np <# of procs> executable, where <# of procs> is the desired number of MPI processes and executable is the name of the executable file. <# of procs> should be even and should divide 2500000. I would suggest going with 2, 4 or 8; 10 would also do. You won't see any improvement in speed unless your development system has a multicore CPU and/or several CPUs.
You mention Code::Blocks. See here for some ideas on how to make it run MPI programs through mpiexec.
The usual way to get the number of processes during the execution of an MPI program is to call the MPI_COMM_SIZE subroutine, like this
call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
where num_procs is an integer which will equal the number of processes after the call has completed. I expect that what you call a task is the same as what I call a process.
Note that I've written the call in Fortran; C and C++ bindings are also available, though the latter seem to be going out of favour.
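For reference, the C binding looks like this minimal sketch (variable names are mine; the linked mpi_prime.c uses MPI_Comm_size in essentially the same way):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int num_procs, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);  /* number of processes ("tasks") */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this process's id             */

    if (rank == 0)
        printf("running with %d processes\n", num_procs);

    MPI_Finalize();
    return 0;
}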

Should 'else' be kept or dropped in cases where it's not needed? [closed]

This is a sort of trivial question but something I've been wondering about.
In terms of style (I assume the performance is identical), is it better to keep an 'else' in an if statement where it's not necessary?
For example, which of the following is better:
if (x < 10)
doSomething();
else if (x > 20)
doSomethingElse();
or
if (x < 10)
doSomething();
if (x > 20)
doSomethingElse();
Another case:
if (x < 10)
return;
else doSomething();
or
if (x < 10)
return;
doSomething();
Thanks,
Luke
In the first example, definitely keep the else in - it prevents an extra evaluation if the first part is true.
(i.e. if you do:
if (x < 10)
doSomething();
if (x > 20)
doSomethingElse();
both ifs are always evaluated.
However, if you do:
if (x < 10)
doSomething();
else if (x > 20)
doSomethingElse();
the second part is only evaluated if the first is false. [If x < 10, you don't even want to check if x > 20, it's a waste of an evaluation...])
The second is a personal design decision; choose which appeals more to you, fits the particular logic more, or follows your company's standards if they have any.
Leave the elses in where they enhance code clarity. In the first example, leave the else, since it is subtly different, saving an evaluation. The second case is less clear-cut; typically I use elses after returns for alternate cases, but leave out the else when the return is for error handling.
If you're breaking an if statement across more than one line, then for Turing's sake use brackets! So:
if (x < 10) return;
Or
if (x < 10) {
return;
}
But not
if (x < 10)
return;
The else issue is subjective, but my view on it is that an if/else is conceptually breaking your code into more-or-less equal cases. Your first example is handling two conceptually similar but mutually exclusive cases, so I find an else to be appropriate. With two separate if statements, it appears at first glance as though the two conditionals are independent, which they are not.
When you return on an if, the circumstances are different. The cases are automatically mutually exclusive, since the return will prevent the other code from running anyway. If the alternative case (without the return) is short, and especially if it returns as well, then I tend to use the else. If, as is common, the alternative code is long or complex, then I don't use an else, treating the if as more of a guard clause (see the sketch below).
It's mostly a matter of clarity and readability, and both of those factors are subjective.
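To illustrate the guard-clause reading mentioned above, here is a small sketch with made-up names; the early return handles the rejected case, so the main path below needs no else and no extra indentation:
#include <stdio.h>

static int handle_value(int x)
{
    if (x < 10)
        return -1;                      /* guard: reject and bail out early */

    printf("handling x = %d\n", x);     /* the "normal" path, un-indented   */
    return 0;
}

int main(void)
{
    handle_value(5);    /* stopped by the guard  */
    handle_value(42);   /* reaches the main path */
    return 0;
}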
On a few platforms, code may sometimes execute faster with the 'else' statements omitted. For example:
if (a & 1)
b |= 2;
if (!(a & 1))
b &= ~2;
will generate four instructions for a Microchip PIC, and execute in a constant four cycles. An alternative:
if (a & 1)
b |= 2;
else
b &= ~2;
would require five instructions, and the execution time would be four or five cycles depending upon whether (a & 1) was true or not.
Unless you know that leaving in a 'redundant' test will make your code faster, use the 'else' statements to eliminate them.
Generally I would stay away from anything that isn't necessary. More code means more room for bugs.
However, if code that isn't necessary for the flow of the program makes it that much clearer to someone reading it, then it may be necessary after all.
In your first example, no number would be less than 10 and greater than 20, but sometimes the logic is not as obvious. The else makes it easy to see that these conditions are part of the same block of code, and therefore should be included.
That else in your second example doesn't actually change the flow of the program so it really isn't necessary. In fact, you might consider reworking the logic a little:
if (x >= 10) {
    doSomething();
}
Now you don't even have to worry about the return statement, or the extra else block.
Edit: added brackets... for Turing's sake!
As a general rule, always leave them in, and add a comment explaining why it is OK that the else has no code in it. That way, those who follow in your footsteps won't have to ask the question
"What is the implication of the else? Should there be one there or not?"
If you find yourself unable to come up with the correct wording to explain why the else is not significant, then there is a good chance that it is.
Missing elses are generally a code smell.

Why differ(!=,<>) is faster than equal(=,==)?

I've seen comments on SO saying "<> is faster than =" or "!= faster than ==" in an if() statement.
I'd like to know why that is so. Could you show an example in asm?
Thanks! :)
EDIT:
Source
Here is what he did.
function Check(var MemoryData: Array of byte; MemorySignature: Array of byte; Position: integer): boolean;
var
  i: byte;
begin
  Result := True; //moved to the top. Your function always returned 'True'. Is this what you wanted?
  for i := 0 to Length(MemorySignature) - 1 do //are you sure??? Perhaps you want High(MemorySignature) here...
  begin
    {!} if MemorySignature[i] <> $FF then //speedup - '<>' evaluates faster than '='
    begin
      Result := memorydata[i + position] <> MemorySignature[i]; //speedup.
      if not Result then
        Break; //added this! - speedup. We already know the result, so there is no need to scan to the end.
    end;
  end;
end;
I'd claim that this is flat out wrong except perhaps in very special circumstances. Compilers can refactor one into the other effortlessly (by just switching the if and else cases).
It could have something to do with branch prediction on the CPU. Static branch prediction would predict that a branch simply wouldn't be taken and fetch the next instruction. However, hardly anybody uses that anymore. Other than that, I'd say it's bull because the comparisons should be identical.
I think there's some confusion in your previous question about what the algorithm was that you were trying to implement, and therefore in what the claimed "speedup" purports to do.
Here's some disassembly from Delphi 2007, optimization on. (Note: optimization off changed the code a little, but not in a relevant way.)
Unit70.pas.31: for I := 0 to 100 do
004552B5 33C0 xor eax,eax
Unit70.pas.33: if i = j then
004552B7 3B02 cmp eax,[edx]
004552B9 7506 jnz $004552c1
Unit70.pas.34: k := k+1;
004552BB FF05D0DC4500 inc dword ptr [$0045dcd0]
Unit70.pas.35: if i <> j then
004552C1 3B02 cmp eax,[edx]
004552C3 7406 jz $004552cb
Unit70.pas.36: l := l + 1;
004552C5 FF05D4DC4500 inc dword ptr [$0045dcd4]
Unit70.pas.37: end;
004552CB 40 inc eax
Unit70.pas.31: for I := 0 to 100 do
004552CC 83F865 cmp eax,$65
004552CF 75E6 jnz $004552b7
Unit70.pas.38: end;
004552D1 C3 ret
As you can see, the only difference between the two cases is a jz vs. a jnz instruction. These WILL run at the same speed. What's likely to affect things much more is how often the branch is taken, and whether the entire loop fits into cache.
For .Net languages
If you look at the IL from the string.op_Equality and string.op_Inequality methods, you will see that both internally call string.Equals.
But op_Inequality inverts the result, which is two more IL statements.
I would say the performance is the same, with maybe a small (very small, very very small) advantage for the == statement. But I believe that the optimizer & JIT compiler will remove this.
A spontaneous thought: most other things in your code will affect performance more than the choice between == and != (or = and <> depending on language).
When I ran a test in C# over 1000000 iterations of comparing strings (containing the alphabet, a-z, with the last two letters reversed in one of them), the difference was between 0 and 1 milliseconds.
It has been said before: write code for readability; change into more performant code when it has been established that it will make a difference.
Edit: repeated the same test with byte arrays; same thing; the performance difference is negligible.
It could also be a result of misinterpretation of an experiment.
Most compilers/optimizers assume a branch is taken by default. If you invert the operator and the if-then-else order, so that the branch now taken is the ELSE clause, that might cause an additional speed effect in heavily computational code (*).
(*) Obviously you need to do a lot of operations for that to matter. But it can matter in the tightest loops, e.g. in codecs or image analysis/machine vision, where you have 50 MByte/s of data to trawl through.
... and even then I only stoop to this level for really heavily reused code. For ordinary business code it is not worth it.
I'd claim this is flat out wrong, full stop. The test for equality is always the same as the test for inequality. With string (or complex structure) testing, you're always going to break out at exactly the same point; until that break point is reached, the answer for equality is unknown.
I strongly doubt there is any speed difference. For integral types, for example, you get a CMP instruction and either JZ (jump if zero) or JNZ (jump if not zero), depending on whether you used = or ≠. There is no speed difference here and I'd expect that to hold true at higher levels too.
If you can provide a small example that clearly shows a difference, then I'm sure the Stack Overflow community could explain why. However, I think you might have difficulty constructing a clear example. I don't think there will be any performance difference noticeable at any reasonable scale.
Well, it could be or it couldn't be: that is the question :-)
The thing is, this depends highly on the programming language you are using.
Since all your statements will eventually end up as instructions to the CPU, the one that uses the fewest instructions to achieve the result will be the fastest.
For example, if you want to know whether bits x equal bits y, you could use the instruction that XORs the two inputs; if the result is anything but 0, they are not the same. And how would you know that the result is anything but 0? By using an instruction that tests whether the result is non-zero (there's a tiny sketch of the idea below).
So that is already 2 instructions, but since most CPUs have a compare instruction that executes in a single cycle, it is a bad example.
The point I am making is still the same: you can't make such general statements without specifying the programming language and the CPU architecture.
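Here is that XOR idea as a tiny C sketch (purely illustrative; it is not what any particular compiler emits). The XOR result is zero exactly when every bit matches, so testing it against zero expresses == and != with the same amount of work:
#include <stdio.h>

int main(void)
{
    unsigned a = 0x2A, b = 0x2A;

    /* a ^ b == 0 exactly when all bits match; the equality and
       inequality tests differ only in which branch they take */
    if ((a ^ b) == 0)
        printf("equal\n");
    else
        printf("not equal\n");
    return 0;
}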
This list of ASM instructions (assuming it's on x86) might help:
Jump if greater
Jump on equality
Comparison between two registers
(Disclaimer: I have nothing more than very basic experience with writing assembler, so I could be off the mark.)
However, it obviously depends purely on what assembly instructions the Delphi compiler is producing; without seeing that output it's guesswork. I'm going to keep my Donald Knuth quote in, as caring about this kind of thing for all but a niche set of applications (games, mobile devices, high-performance server apps, safety-critical software, missile launchers, etc.) is, in my view, the last thing you should worry about:
"We should forget about small
efficiencies, say about 97% of the
time: premature optimization is the
root of all evil."
If you're writing one of those or similar then obviously you do care, but you didn't specify it.
Just guessing, but given you want to preserve the logic, you cannot just replace
if A = B then
with
if A <> B then
To conserve the logic, the original code must have been something like
if not (A = B) then
or
if A <> B then
else
and that may truly be a little bit slower than the direct test for inequality.
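To make that concrete, here is the same point in C-like form (my own illustration; read == and != for Delphi's = and <>): inverting the comparison alone changes the behaviour, so a faithful rewrite has to move the body into the else branch.
#include <stdio.h>

static void do_something(void) { puts("branch taken"); }

static void original(int a, int b)
{
    if (a == b)                  /* the work sits in the 'then' branch */
        do_something();
}

static void wrong_rewrite(int a, int b)
{
    if (a != b)                  /* inverting the test alone flips the meaning */
        do_something();
}

static void equivalent_rewrite(int a, int b)
{
    if (a != b)
        ;                        /* empty 'then' branch */
    else
        do_something();          /* the work moved here, so behaviour is preserved */
}

int main(void)
{
    original(1, 1);              /* prints "branch taken"          */
    wrong_rewrite(1, 1);         /* prints nothing: not equivalent */
    equivalent_rewrite(1, 1);    /* prints: equivalent to original */
    return 0;
}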
