Can we use branch prediction along with atomic operations? - linux-kernel

I wanted to know whether we can use the branch prediction macros (likely/unlikely) together with atomic operations. Is there any side effect in the statement below?

    atomic_t v = ATOMIC_INIT(0);

    atomic_inc(&v);
    if (unlikely(atomic_read(&v) == 2)) {
            /* Some Operation */
    }

There is no difference between using likely/unlikely on atomic versus non-atomic operations. The only purpose of those macros is to let the compiler generate code that performs better when one of the two branches of a condition is much more likely than the other.
So for a "normal" operation you would have for example:
if (unlikely(--x)) if (likely(--x))
do_a(); do_a();
else else
do_b(); do_b();
*decrement x* *decrement x*
jnz not_zero jz zero
call do_b call do_a
not_zero: zero:
call do_a call do_b
While in the case of an atomic operation you would simply have:

    if (unlikely(atomic_dec_return(&x)))
            do_a();
    else
            do_b();

compiling to:

            *atomically decrement x*
            jnz not_zero
            call do_b
            jmp done
    not_zero:
            call do_a
    done:

and:

    if (likely(atomic_dec_return(&x)))
            do_a();
    else
            do_b();

compiling to:

            *atomically decrement x*
            jz zero
            call do_a
            jmp done
    zero:
            call do_b
    done:
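For reference, the kernel defines these macros as thin wrappers around GCC's __builtin_expect (in include/linux/compiler.h; branch-profiling config variants aside, they are essentially):

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

The !!(x) normalises any truthy value to 1, so the hint works on arbitrary expressions, including the result of an atomic_read().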

Related

Which one is faster: the ! or the == operation?

I wanted to know if there is any difference in performance when using the ! operator in place of the == operator.
This is my understanding, so please correct me if I am wrong:
! operator - inverts all the bits, and for an integer it goes over all 32 bits to flip them; it works with one operand and maps to the NOT instruction in assembly.
== operator - works with two operands and involves a CMP and eventually a JMP in assembly, which is costly.
For a simple statement like the following, which one performs better?

    function() {
        return (some operation) == 0;
    }

or

    function() {
        return !(some operation);
    }
Languages: C++, Java, Python
Platform: Linux
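For the compiled languages, a minimal C sketch (function names are illustrative) shows why there is no difference: in C-family languages ! is logical negation, not a bitwise flip, and !x is defined to yield exactly (x == 0). Any optimizing compiler therefore emits identical code for both forms, typically a single test/sete sequence on x86:

    /* Illustrative sketch: both functions are semantically identical,
     * since !x is defined as (x == 0), and compile to the same code. */
    int is_zero_eq(int x)  { return x == 0; }
    int is_zero_not(int x) { return !x; }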

Comparison of "if" statement use against direct use of logical operators

To start with, I'll show some code:
    // Declarations
    bool cmp = filter();

    // case 1: short-circuit operators as control flow
    cmp && mainOperation();
    cmp || elseOperation();

    // case 2: conditional operator
    cmp ? mainOperation() : elseOperation();

    // case 3: explicit gotos
    if (!cmp) goto other;
    mainOperation();
    goto end;
    other:
    elseOperation();
    end: ;

    // case 0: plain if/else
    if (cmp) {
        mainOperation();
    } else {
        elseOperation();
    }
I'm actually not sure what the differences between these snippets are from a complexity point of view. I'd like to know which cases compile to the same thing as case 0; that is, which constructs produce the same instructions as the plain if statement.
Use case 0. It's readable, it's what any serious developer would use, it's the code that you are not asked to change in a code review, it's the code that I can read without thinking "what kind of xxxxx wrote this".
If you are even thinking about using another version to make your code run faster, then you need to learn how to save microseconds or milliseconds, not nanoseconds.
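As a rough illustration (a sketch, not from the original thread; function names mirror the question): with optimizations enabled, mainstream compilers generally produce the same branch structure for case 0, the conditional operator of case 2, and the goto ladder of case 3. The short-circuit pair in case 1 tests cmp twice at the source level, but since cmp cannot change between the two statements, an optimizer typically folds it into the same shape as well:

    #include <stdbool.h>
    #include <stdio.h>

    static bool filter(void)        { return true; }
    static bool mainOperation(void) { puts("main"); return true; }
    static bool elseOperation(void) { puts("else"); return true; }

    /* case 0: the plain if/else */
    static void case0(void) {
        bool cmp = filter();
        if (cmp)
            mainOperation();
        else
            elseOperation();
    }

    /* case 1: short-circuit pair; compare the generated assembly of
     * case0 and case1 (e.g. gcc -O2 -S) to check the equivalence */
    static void case1(void) {
        bool cmp = filter();
        cmp && mainOperation();
        cmp || elseOperation();
    }

    int main(void) { case0(); case1(); return 0; }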

If/else Performance

I'm trying to figure out the difference in CPU usage and performance between two if/else styles. Take the following functions:
    function f(x):
        if (condition) return true;
        else return false;

    function f'(x):
        if (condition) return true;
        return false;
The purpose of the function is not important; in both cases you want to return true if the 'if' condition holds and false otherwise. Both pieces of code do the same thing. With regard to performance and CPU usage, is there any difference between the two once the else is removed and sequential execution handles the 'else' case instead, or is any difference simply lost when the code is compiled?
There is no difference between the two functions. Any half-decent compiler would generate identical code for them.
Because the if branch ends with a return, the else in the first program is redundant. When the program is translated to machine instructions, you end up with something like this:

    start:   LD $condition   -- Check condition
             JZ else_br      -- Conditional jump
             LD true_val
             RET             -- Return true
    else_br: LD false_val
             RET             -- Return false
In the second program the else branch is empty, so the sequence of instructions is the same.
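To check this concretely, here is a C sketch (the condition is a stand-in): any optimizing compiler emits identical code for both versions, and will usually reduce either one to a direct return of the condition:

    #include <stdbool.h>

    /* Both versions compile to identical code; with optimizations on,
     * either typically reduces to the equivalent of "return x > 0;". */
    bool f_with_else(int x) {
        if (x > 0) return true;
        else return false;
    }

    bool f_without_else(int x) {
        if (x > 0) return true;
        return false;
    }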

Why does GCC always emit jle/jg?

I wrote some test code to look at the assembly for a character comparison, and GCC always emits a jle/jg pair, whether or not the condition includes equality.
Example 1:
if ( 'A' < test && test < 'Z' )
0x000000000040054d <+32>: cmp BYTE PTR [rbp-0x1],0x41
0x0000000000400551 <+36>: jle 0x40056a <main+61>
0x0000000000400553 <+38>: cmp BYTE PTR [rbp-0x1],0x59
0x0000000000400557 <+42>: jg 0x40056a <main+61>
Example 2:
if ( 'A' <= test && test <= 'Z' )
0x000000000040054d <+32>: cmp BYTE PTR [rbp-0x1],0x40
0x0000000000400551 <+36>: jle 0x40056a <main+61>
0x0000000000400553 <+38>: cmp BYTE PTR [rbp-0x1],0x5a
0x0000000000400557 <+42>: jg 0x40056a <main+61>
I thought this was an optimization issue, but GCC gives the same result even when I compile with -O0.
How can I get jl/jg for 'A' < test && test < 'Z', and jle/jge for 'A' <= test && test <= 'Z'?
You can see that the first example compares against the constants 0x41 and 0x59, while the second compares against 0x40 and 0x5a. In other words, the compiler rewrites the inclusive condition as

    if ( 'A'-1 < test && test < 'Z'+1 )

and then generates the same code for both.
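You can verify the transformation directly with a minimal sketch: the two functions below are semantically identical, so GCC emits exactly the same instructions for both.

    /* Sketch: the inclusive bounds are rewritten as strict ones with
     * adjusted constants ('A'-1 == 0x40, 'Z'+1 == 0x5b), so these two
     * functions compile to identical code. */
    int in_range_a(char test) { return 'A' <= test && test <= 'Z'; }
    int in_range_b(char test) { return 'A' - 1 < test && test < 'Z' + 1; }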
UPDATE: Just to make clear why I think the compiler prefers JL over JLE: JLE depends on ZF in addition to SF and OF, while JL does not. Therefore JLE introduces an extra flag dependency which could potentially hurt instruction-level parallelism, even if the instruction timing itself is the same. So the choice is clear: transform the code to use the simpler instruction.
In general, you can't force the compiler to emit a particular instruction. In this case, you might succeed if you get rid of the constants, so that the compiler cannot adjust them. Note that, due to the nature of your expression, the compiler will probably still reverse one of the tests and thus bring in an equality. You might be able to work around that by using goto. Obviously, both of these changes will generate worse code.
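For what it's worth, here is one possible shape of those two workarounds combined (a sketch only; the bounds become parameters so there are no constants to adjust, and goto keeps each test in its written direction; whether a given GCC version actually emits jl/jg here is not guaranteed):

    /* Hypothetical sketch: variable bounds plus goto. The compiler may
     * still invert the branches, so inspect the assembly to confirm. */
    int in_range(char test, char lo, char hi) {
        if (lo < test) goto check_upper;
        return 0;
    check_upper:
        if (test < hi) return 1;
        return 0;
    }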

How to recognize what is, and what is not tail recursion?

Sometimes it's simple enough (if the self-call is the last statement, it's tail recursion), but there are still cases that confuse me. A professor told me that "if there's no instruction to execute after the self-call, it's tail recursion". How about these examples (disregard the fact that they don't make much sense)?
a) This one should be tail recursive, seeing how the self-call is the last statement and there's nothing left to execute after it.
    function foo(n)
    {
        if (n == 0)
            return 0;
        else
            return foo(n - 2);
    }
b) But how about this one? It should be a tail call, because if the condition is true nothing else will be executed after the call, yet the call is not the last statement of the function.
    function foo(n)
    {
        if (n != 0)
            return foo(n - 2);
        else
            return 0;
    }
c) How about this one? In both branches, the self-call is the last thing executed:
    function foo(n)
    {
        if (n == 0)
            return 0;
        else
        {
            if (n > 100)
                return foo(n - 2);
            else
                return foo(n - 1);
        }
    }
It might help you to think about this in terms of how tail-call optimisations are actually implemented. That's not part of the definition, of course, but it does motivate the definition.
Typically when a function is called, the calling code will store any register values that it will need later, on the stack. It will also store a return address, indicating the next instruction after the call. It will do whatever it needs to do to ensure that the stack pointer is set up correctly for the callee. Then it will jump to the target address[*] (in this case, the same function). On return, it knows the return value is in the place specified by the calling convention (register or stack slot).
For a tail call, the caller doesn't do this. It ignores any register values, because it knows it won't need them later. It sets up the stack pointer so that the callee will use the same stack the caller did, and it doesn't set itself up as the return address, it just jumps to the target address. Thus, the callee will overwrite the same stack region, it will put its return value in the same location that the caller would have put its return value, and when it returns, it will not return to its caller, but will return to its caller's caller.
Therefore, informally, a function is tail-recursive when it is possible for a tail call optimisation to occur, and when the target of the tail call is the function itself. The effect is more or less the same as if the function contained a loop, and instead of calling itself, the tail call jumps to the start of the loop. This means there must be no variables needed after the call (and indeed no "work to do", which in a language like C++ means nothing to be destructed), and the return value of the tail call must be returned by the caller.
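To make that concrete, here is a sketch in C of what example (a) effectively turns into after tail-call optimisation: the recursive call becomes an update of the parameter and a jump back to the top.

    /* Sketch: example (a) with the tail call turned into a loop. */
    int foo(int n)
    {
    top:
        if (n == 0)
            return 0;
        n = n - 2;
        goto top;
    }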
This is all for simple/trivial tail recursion. There are transformations that can be used to make something tail-recursive which isn't already, for example introducing extra parameters that store information used by the "bottom-most" level of recursion to do work that would otherwise be done on the "way out". For instance:
    int triangle(int n) {
        if (n == 0) return 0;
        return n + triangle(n - 1);
    }
can be made tail-recursive, either by the programmer or automatically by a smart enough compiler, like this:
    int triangle(int n, int accumulator = 0) {
        if (n == 0) return accumulator;
        return triangle(n - 1, accumulator + n);
    }
Therefore, the former function might be described as "tail recursive" by someone who's talking about a smart enough language/compiler. Be prepared for that variant usage.
[*] Storing a return address, moving the stack pointer, and jumping, may or may not be wrapped up in a single opcode by the architecture, but even if not that's typically what happens.
All your functions are tail recursive. The phrase
"no instruction left after the self-call"
means: after the self-call you return from the function, i.e. no more code has to be executed; it does not mean that there is no further line of code in the function.
Yep; I think your professor meant that if, on every path, the final instruction is a recursive call, then it is tail recursion.
So all three examples are tail-recursive.
All three examples are tail recursive. Generally speaking, it is tail recursion if the result of the function (the expression following the return keyword) is a lone call to the function itself. No other operator may be involved at the outermost level of the expression. If the call to itself is only part of an expression, the machine must execute the call and then return into the evaluation of that expression; that is, the call was not at the tail of the function's execution but in the middle of an expression. This does not, however, apply to any parameters the recursive call takes: anything is allowed there, including recursive calls to itself (e.g. return foo(foo(0));). The optimisation of calls into jumps is only possible for the outer call then, of course.
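A small contrast to illustrate the point (hypothetical functions): in the first, the recursive call is the entire return expression; in the second, it is only one operand of an addition, so only the first is a tail call.

    /* tail(): the recursive call is the whole result (tail position).
     * not_tail(): the addition happens after the call returns, so the
     * call sits in the middle of an expression and is not a tail call. */
    int tail(int n)     { return n ? tail(n - 1) : 0; }
    int not_tail(int n) { return n ? 1 + not_tail(n - 1) : 0; }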
