Will the compiler optimize away called-once functions? - compilation

I've been reading "Clean Code" by Robert C Martin, and one of the pieces of advice is to use more but smaller functions; i.e. instead of
int main(){
// do one thing
// ... 10 lines of code
// do another thing
// ... another 10 lines of code
// one last thing
// ... another 10 lines
return 0;
}
you ought to
int main(){
doOneThing();
doAnotherThing();
oneLastThing();
}
void doOneThing(){
// ... 10 lines of code
}
// ... you get the idea
However, from my understanding of low-level languages, I know that when a function is called, variables are pushed onto the stack and the stack pointer is adjusted, etc., whereas for the continuous code there is no need to do this.
On the other hand, compiler optimisations can do cool things like inline class methods. Assuming doOneThing is called exactly once, could a compiler analyse this code, deduce that the body can be inlined into main(), and eliminate the function call and the associated runtime overhead altogether?

A previous version of the question was about C++ or C. I am convinced that it needs the context of a language to discuss this topic in a meaningful way, so I chose to stay with C++.
C++ is not really "low level". The code you write is not instructions for your CPU. Between your code and what actually happens at runtime sits one of the most sophisticated pieces of software: your compiler. When you turn on optimizations, your compiler will analyze your code and try to produce the most efficient executable that behaves at runtime as your code would if it were taken literally (i.e. no optimizations).
This is governed by the so-called as-if rule. Calling a function is not observable behavior. If the function is small, the compiler will inline the call to the function.
On the other hand, of course calling a function has a cost, but that cost is comparatively small. You only need to start worrying about this overhead when the function is very small, like only 1 or 2 lines, and (and this is important!) for some reason the compiler cannot inline the call. This can be due to the function being virtual, for example.
You are asking if a compiler can optimize it, so I'll pick just one example. GCC with -O3 will produce the following output for the following code:
int foo() { return 42;}
int bar() { return 0;}
int moo() { return 1;}
int main() {
return foo() + bar() + moo();
}
output:
foo():
mov eax, 42
ret
bar():
xor eax, eax
ret
moo():
mov eax, 1
ret
main:
mov eax, 43
ret
You can see that in main no function is called. The compiler examined the expression foo() + bar() + moo() and realized that it always equals 43. No function has to be called to return 43.
This is a silly example, but in the general case, if you do want to see what the compiler did, you need to do the same: look at the compiler's output.
And to do that you need to write the code first. It is of no use to speculate beforehand which code would be more or less efficient. You first need to write the code. And because you need to do that anyhow, you can write clean, simple, human-comprehensible code. That's the code your compiler understands best and knows how to optimize, because that's also what other programmers write.
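For the structure from the question specifically, here is a sketch you could paste into Compiler Explorer (the helper bodies are just placeholders I made up):
#include <cstdio>
// Helpers are static, so the compiler can see that main() is their only caller.
static void doOneThing()     { std::puts("one thing"); }
static void doAnotherThing() { std::puts("another thing"); }
static void oneLastThing()   { std::puts("one last thing"); }
int main() {
doOneThing();
doAnotherThing();
oneLastThing();
return 0;
}
With g++ -O2 you should typically see the three helpers disappear entirely and main call puts directly; GCC even has a dedicated option for this case, -finline-functions-called-once, which is enabled at the usual optimization levels.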

Related

Hello world exit code

I just compiled my hello world C program with gcc and ran it in Ubuntu. Since I ran it through emacs, I got the exit code of the program: 13. Why 13? I didn't specify anything, so why didn't it default to 0? When I put an exit function at the end, I could change it, but I'm wondering what the significance of this default is.
Code:
#include<stdio.h>
int main()
{
printf("Hello, world!");
}
As of C99, reaching the end of main without a return is the same as if you'd returned zero (only main, not all functions in general). Before C99 (and I believe gcc defaults to C89/90 as a baseline), it was not defined what would happen, so you should be explicitly returning zero if that's what you need.
Or you could adopt C99/C11 by compiling with -std=c99 or -std=c11.
In terms of why 13, it's neither relevant nor portable, but it's likely that the return code is whatever happens to be in the eax register (or the equivalent if you're using a different calling convention or architecture). For x86, that would probably still be the value that was returned from printf, which returns the number of characters printed - and "Hello, world!" is exactly 13 characters.
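If you want the portable behaviour regardless of the selected C standard, just return an explicit status; a minimal sketch:
#include <stdio.h>
int main(void)
{
printf("Hello, world!");
return 0; /* exit code is now 0 no matter which C standard is in use */
}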
Alternatively, you can use void main() instead of int main() so you don't have to return anything (note that void main() is non-standard in hosted C); if you use int main, then you should provide a return statement.

What are the limitations on the use of output registers in avr-gcc inline assembly?

Output registers in inline assembly must be declared with the "=" constraint, meaning "write-only" [1]. What exactly does this mean - is it truly forbidden to read and modify them within the assembly? For example, consider this code:
uint8_t one ()
{
uint8_t res;
asm("ldi %[res],0\n"
"inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
The assembly sets the output register to 0 then increments it. Is this breaking the "write-only" constraint?
UPDATE
I'm seeing problems where my inline asm breaks when I change it to work directly on an output register, as opposed to using r16 for the computation and finally mov'ing r16 into the output register. The code is here: http://ideone.com/JTpYma . It prints results to serial; you just need to define F_CPU and BAUD. The problem appears only with gcc-4.8.0, not with gcc-4.7.2.
[1] http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
The compiler doesn't care whether you read it or not, it just won't put the initial value of the variable into the register. Your example is entirely legal, but people often wrongly expect to get result 2 from this code:
uint8_t one ()
{
uint8_t res = 1;
asm("inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
Since it's only an output constraint, the initial value of res is not guaranteed to be loaded into the register. In fact, the initializer may even be optimized away on the assumption that the asm block will overwrite it anyway. The above code is compiled to this by my version of avr-gcc:
inc r24
ret
As you can see, the compiler indeed removed the loading of 1 into res and hence into r24, thus producing an undefined result.
Update
The problem with the updated program in the question is that it also has an input register operand. By default the compiler assumes that all inputs are consumed before the outputs are assigned, so it considers it safe to allocate overlapping registers. That's clearly not the case for your example. You should use an "early clobber" modifier (&) for the output. This is what the manual has to say about that:
& Means (in a particular alternative) that this operand is an
earlyclobber operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is used as an input operand or as part of any
memory address.
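Applied to a variant of the example in the question that also takes an input, the constraints would look something like this (a sketch; add_one is a made-up name, and the "d" constraint is used because ldi only accepts the upper registers r16-r31):
#include <stdint.h>
uint8_t add_one (uint8_t in)
{
uint8_t res;
asm("ldi %[res],1\n"
"add %[res],%[in]\n"
: [res] "=&d" (res) /* "&" = earlyclobber: written before %[in] is last read */
: [in] "r" (in)
);
return res;
}
Without the "&", gcc may allocate the same register for %[res] and %[in], and the ldi would then clobber the input before the add reads it.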
Nobody said gcc inline asm was easy :D

Compile time barriers - compiler code reordering - gcc and pthreads

AFAIK there are pthread functions that act as memory barriers (e.g. here: clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about a compile-time barrier, i.e. is the compiler (especially gcc) aware of this?
In other words - e.g. - is pthread_create() a reason for gcc not to perform reordering?
For example in code:
a = 1;
pthread_create(...);
Is it certain that reordering will not take place?
What about invocations from different functions:
void fun(void) {
pthread_create(...);
...
}
a = 1;
fun();
Is fun() also a compile-time barrier (assuming pthread_create() is)?
What about functions in different translation units?
Please note that I am interested in the general gcc and pthreads behavior specification, not necessarily x86-specific behavior (various different embedded platforms are in focus).
I am also not interested in other compilers/thread libraries behavior.
Because functions such as pthread_create() are external functions, the compiler must ensure that any side effects that could be visible to an external function (such as a write to a global variable) are complete before calling the function. The compiler couldn't reorder the write to a until after the function call in the first case (assuming a was global or otherwise potentially accessible externally).
This is behavior that is necessary for any C compiler, and really has little to do with threads.
However, if the variable a was a local variable, the compiler might be able to reorder it until after the function call (a might not even end up in memory at all for that matter), unless something like the address of a was taken and made available externally somehow (like passing it as the thread parameter).
For example:
int a;
void foo(void)
{
a = 1;
pthread_create(...); // the compiler can't reorder the write to `a` past
// the call to `pthread_create()`
// ...
}
void bar(void)
{
int b;
b = 1;
pthread_create(...); // `b` can be initialized after calling `pthread_create()`
// `b` might not ever even exist except as something
// passed on the stack or in a register to `printf()`
printf( "%d\n", b);
}
I'm not sure if there's a document that outlines this in more detail - this is covered largely by C's 'as if' rule. In C99 that's in 5.1.2.3/3 "Program execution". C is specified by an abstract machine with sequence points where side effects must be complete, and programs must follow that abstract machine model except where the compiler can deduce that the side effects aren't needed.
In my foo() example above, the compiler would generally not be able to deduce that setting a = 1; isn't needed by pthread_create(), so the side effect of setting a to the value 1 must be completed before calling pthread_create(). Note that if there are compilers that perform global optimizations that can deduce that a isn't used elsewhere, they could delay or elide the assignment. However, in that case nothing else is using the side effect, so there would be no problem with that.
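As a counterpoint to bar() above, here is a sketch (with a made-up worker function) of the case where the local's address does escape, so the compiler can no longer defer the store:
#include <pthread.h>
#include <stdio.h>
static void *worker(void *arg)
{
printf("%d\n", *(int *)arg);
return NULL;
}
void baz(void)
{
pthread_t t;
int b = 1; // local variable...
pthread_create(&t, NULL, worker, &b); // ...but its address escapes here, so the
// write to `b` must be complete before the call
pthread_join(t, NULL);
}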

likely/unlikely equivalent for MSVC

The GCC compiler supports the __builtin_expect built-in, which is used to define likely and unlikely macros.
eg.
#define likely(expr) (__builtin_expect(!!(expr), 1))
#define unlikely(expr) (__builtin_expect(!!(expr), 0))
Is there an equivalent for the Microsoft Visual C compiler, or something similar?
The C++20 standard will include [[likely]] and [[unlikely]] branch prediction attributes.
The latest revision of attribute proposal can be found from http://wg21.link/p0479
The original attribute proposal can be found from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0479r0.html
Programmers should prefer PGO. Attributes can easily reduce performance if applied incorrectly or if they later become incorrect as the program changes.
According to http://www.akkadia.org/drepper/cpumemory.pdf (page 57), it still makes sense to use static branch prediction even if the CPU predicts correctly dynamically.
The reason for that is that the L1i cache will be used even more efficiently if the static prediction is done right.
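For reference, a minimal sketch of the attribute syntax on an if/else (the function and messages are just placeholders):
#include <cstdio>
void report(int err)
{
if (err == 0) [[likely]] { // expected hot path
std::puts("ok");
} else [[unlikely]] { // expected cold path
std::printf("error %d\n", err);
}
}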
I say just punt
There is nothing like it. There is __assume(), but don't use it, it's a different kind of optimizer directive.
Really, the reason the GNU builtin is wrapped in a macro is so you can just get rid of it automatically if __GNUC__ is not defined. There isn't anything the least bit necessary about those macros and I bet you will not notice the run-time difference.
Summary
Just get rid of (null out) *likely on non-GNU. You won't miss it.
According to the Branch and Loop Reorganization to Prevent Mispredicts document from Intel:
In order to effectively write your code to take advantage of these
rules, when writing if-else or switch statements, check the most
common cases first and work progressively down to the least common.
Unfortunately you cannot write something like
#define if_unlikely(cond) if (!(cond)); else
because the MSVC optimizer as of VS10 ignores such a "hint".
As I prefer to deal with errors first in my code, I seem to write less efficient code.
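For illustration, a small sketch (dup_string is a made-up helper) of ordering the branches that way without any compiler hints:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Put the common (success) case in the if branch and the rare (error) case
   in the else branch, per the Intel advice above. */
char *dup_string(const char *s)
{
char *copy = (char *)malloc(strlen(s) + 1);
if (copy != NULL) { /* most common case first */
strcpy(copy, s);
return copy;
} else { /* rare failure handled last */
fputs("out of memory\n", stderr);
return NULL;
}
}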
Fortunately, the second time the CPU encounters the branch it will use its statistics instead of a static hint.
__assume should be similar.
However, if you want to do this really well you should use Profile Guided Optimization rather than static hints.
I know this question is about Visual Studio, but I'm going to try to answer for as many compilers as I can (including Visual Studio)…
A decade later there is progress! As of Visual Studio 2019 MSVC still doesn't support anything like this (even though it's the most popular builtin/intrinsic), but as Pauli Nieminen mentioned above, C++20 has likely / unlikely attributes which can be used to create likely/unlikely macros, and MSVC usually adds support for new C++ standards pretty quickly (unlike C), so I expect Visual Studio 2021 to support them.
Currently (2019-10-14) only GCC supports these attributes, and even then only applied to labels, but it is sufficient to at least do some basic testing. Here is a quick implementation which you can test on Compiler Explorer:
#define LIKELY(expr) \
( \
([](bool value){ \
switch (value) { \
[[likely]] case true: \
return true; \
[[unlikely]] case false: \
return false; \
} \
}) \
(expr))
#define UNLIKELY(expr) \
( \
([](bool value){ \
switch (value) { \
[[unlikely]] case true: \
return true; \
[[likely]] case false: \
return false; \
} \
}) \
(expr))
Edit (2022-05-02): MSVC 2022 supports C++20, including [[likely]]/[[unlikely]], but generates absolutely terrible code for this (see the comments on this post)... don't use it there.
You'll probably want to #ifdef around it to support compilers that can't handle it, but luckily most compilers support __builtin_expect:
GCC 3.0
clang
ICC since at least 13, probably much longer.
Oracle Development Studio 12.6+, but only in C++ mode.
ARM 4.1
IBM XL C/C++ since at least 10.1, probably longer.
TI since 6.1
TinyCC since 0.9.27
GCC 9+ also supports __builtin_expect_with_probability. It's not available anywhere else, but hopefully one day… It takes a lot of the guesswork out of trying to figure out whether to use likely/unlikely or not: you just set the probability and the compiler (theoretically) does the right thing.
Also, clang supports a __builtin_unpredictable (since 3.8, but test for it with __has_builtin(__builtin_unpredictable)). Since a lot of compilers are based on clang these days it probably works in them, too.
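If you just need the basics, the #ifdef wrapper suggested above can be as small as this (a sketch with made-up macro names; it only special-cases GCC-compatible compilers, whereas Hedley below covers many more):
#if defined(__GNUC__) || defined(__clang__)
#define MY_LIKELY(expr) (__builtin_expect(!!(expr), 1))
#define MY_UNLIKELY(expr) (__builtin_expect(!!(expr), 0))
#else
#define MY_LIKELY(expr) (!!(expr))
#define MY_UNLIKELY(expr) (!!(expr))
#endif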
If you want this all wrapped up and ready to go, you might be interested in one of my projects, Hedley. It's a single public-domain C/C++ header which works on pretty much all compilers and contains lots of useful macros, including HEDLEY_LIKELY, HEDLEY_UNLIKELY, HEDLEY_UNPREDICTABLE, HEDLEY_PREDICT, HEDLEY_PREDICT_TRUE, and HEDLEY_PREDICT_FALSE. It doesn't have the C++20 version quite yet, but it should be there soon…
Even if you don't want to use Hedley in your project, you might want to check the implementations there instead of relying on the lists above; I'll probably forget to update this answer with new information, but Hedley should always be up-to-date.
Now MS says they have implemented the likely/unlikely attributes,
but in fact there isn't any difference between using "likely" and not using it.
I have compiled the two versions below and they produce the same result.
int main()
{
int i = rand() % 2;
if (i) [[likely]]
{
printf("Hello World!\n");
}
else
{
printf("Hello World2%d!\n",i);
}
}
int main()
{
int i = rand() % 2;
if (i)
{
printf("Hello World!\n");
}
else [[likely]]
{
printf("Hello World2%d!\n",i);
}
}
int pdb._main (int argc, char **argv, char **envp);
0x00401040 push ebp
0x00401041 mov ebp, esp
0x00401043 push ecx
0x00401044 call dword [rand] ; pdb.__imp__rand
; 0x4020c4
0x0040104a and eax, 0x80000001
0x0040104f jns 0x401058
0x00401051 dec eax
0x00401052 or eax, 0xfffffffe ; 4294967294
0x00401055 add eax, 1
0x00401058 je 0x40106d
0x0040105a push str.Hello_World ; pdb.___C__0O_NFOCKKMG_Hello_5World__CB_6
; 0x402108 ; const char *format
0x0040105f call pdb._printf ; int printf(const char *format)
0x00401064 add esp, 4
0x00401067 xor eax, eax
0x00401069 mov esp, ebp
0x0040106b pop ebp
0x0040106c ret
0x0040106d push 0
0x0040106f push str.Hello_World2_d ; pdb.___C__0BB_DODJFBPJ_Hello_5World2__CFd__CB_6
; 0x402118 ; const char *format
0x00401074 call pdb._printf ; int printf(const char *format)
0x00401079 add esp, 8
0x0040107c xor eax, eax
0x0040107e mov esp, ebp
0x00401080 pop ebp
0x00401081 ret
As the question is old, the answers saying there's no [[likely]] / [[unlikely]] in MSVC, or that there's no impact, are obsolete.
Latest MSVC supports [[likely]] / [[unlikely]] in /std:c++20 and /std:c++latest modes.
See the demo on Godbolt's compiler explorer that shows the difference.
As can be seen from the link above, one visible effect on x86/x64 for an if-else statement is that the forward conditional jump goes to the unlikely branch. Before C++20 and a supporting VS version, the same could be achieved by placing the likely branch into the if part and the unlikely branch into the else part, negating the condition as needed.
Note that the effect of such optimization is minimal. For frequently called code in a tight loop, the dynamic branch prediction would do the right thing anyway.

MSVC equivalent to '__builtin_return_address'

With msvc, is there an equivalent to gcc's "__builtin_return_address"?
I'm looking to find the address of the calling function, 1 level deep.
_ReturnAddress
From MSDN:
The _ReturnAddress intrinsic provides
the address of the instruction in the
calling function that will be executed
after control returns to the caller
Note that on some platforms, the result could be misleading due to tail folding - the compiler might have your inner function return 2 levels deep. This can commonly occur for code like this:
int DoSomething()
{
return DoSomethingSpecial();
}
The compiler could generate code so DoSomethingSpecial returns directly to the caller of DoSomething.
Also, the return address is not trustworthy enough to make security decisions; see here.
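For completeness, a minimal usage sketch (log_caller is just an illustrative name):
#include <intrin.h>
#include <stdio.h>
#pragma intrinsic(_ReturnAddress)
void log_caller(void)
{
/* Address of the instruction in the caller that executes after we return. */
printf("called from %p\n", _ReturnAddress());
}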

Resources