Consider the following simplified example function:
void foo(void) {
int t;
asm("push %0\n\t"
"push %0\n\t"
"call bar"
:
: "m" (t)
:
);
}
If I compile it with Cygwin x86 gcc 4.8.3 without optimization, the push instructions become:
push -4(%ebp)
push -4(%ebp)
This is good. The problem happens with -O:
push 12(%esp)
push 12(%esp)
This is clearly wrong. The first push changes esp, and the second push then accesses the wrong location. I have read that adding "%esp" to the clobber list should fix it, but it does not help. How can I make GCC use a frame pointer or properly consider esp changes?
(Assume that returning from the bar function will set esp to the value it had before the asm statement. I just need to call a thiscall function and am using inline assembly for that.)
I can simply use __attribute__((optimize("-fno-omit-frame-pointer"))) to prevent the problem-causing optimization in this one function. This might be the best solution. – dreamlayers
Related
I'm trying to run a simple code in assembly - I want to save an address to memory.
I'm moving the address into a register and then moving it into the memory, but for some reason the memory isn't updated.
.data
str1: .asciz "atm course number is 234118"
str2: .asciz "234118"
result: .space 8
.text
.global main
main:
xorq %rax, %rax
xorq %rbx, %rbx
leaq str1, %rax
mov %rax, result(,%rbx,1)
ret
What am I doing wrong?
Your debugger is looking at the wrong instance of result. Your code was always fine (although inefficient; use mov %rax, result(%rip) and don't zero an index, or use mov %rax, result(%rbx,,) to use the byte offset as a "base", not "index", which is more efficient).
glibc contains several result symbols, and in GDB info var result shows:
All variables matching regular expression "result":
Non-debugging symbols:
0x000000000040404b result # in your executable, at a normal static address
0x00007ffff7f54f20 result_type
0x00007ffff7f821b8 cached_result
0x00007ffff7f846a0 result # in glibc, at a high address
0x00007ffff7f85260 result # where the dynamic linker puts shared libs
0x00007ffff7f85660 result
0x00007ffff7f86ab8 result
0x00007ffff7f86f48 result
When I do p /x &result to see what address the debugger resolved that symbol to, I get one of the glibc instances, not the instance in your .data section. Specifically, I get 0x7ffff7f85660 as the address, with the content = 0.
When I print the value with a cast to p /x (unsigned long)result, or dump the memory with GDB's x command, I find a 0 there after the store.
(gdb) x /xg &result
0x7ffff7f85660 <result>: 0x0000000000000000
It looks like your system picked a different instance, one that contained a pointer to a libc address or something. I can't copy-paste from your image. These other result variables are probably static int result or whatever inside various .c files in glibc. (And BTW, that looks like a sign of poor coding style; usually you want to return a value instead of set a global or static. But glibc is old and/or maybe there's some justification for some of those.)
Your result: is the asm a compiler would make for static void* result if it didn't get optimized away. Except it would put it in .bss instead of .data because it's zero-initialized.
You're using SASM. I used GDB to get more details on exactly what's going on. Looking at the address of result in SASM's debug pane might have helped. But now that we've identified the problem using GDB, we can change your source to fix it for SASM.
You can use .globl result to make it an externally-visible symbol so it "wins" when the debugger is looking for symbols.
I added that and compiled again with gcc -g -no-pie store.s. It works as expected now, with p /x (unsigned long)result giving 0x404028
I was wondering if there was any way that would allow me to specify anything other than eax, ebx, ecx and edx as output operands.
Lets say I want to put the content of r8 in a variable, Is it possible to write something like this :
__asm__ __volatile__ (""
:"=r8"(my_var)
: /* no input */
);
It is not clear why would you need to put contents of a specific register into a variable, given a volatile nature of the most of them.
GNU C only has specific-register constraints for the original 8 registers, like "=S"(rsi). For r8..r15, your only option (to avoid needing a mov instruction inside the asm statement) is a register-asm variable.
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
You may want to use an extra input/output constraint to control where you sample the value of r8. (e.g. "+rm"(some_other_var) will make this asm statement part of a data dependency chain in your function, but that will also prevent constant-propagation and other optimizations.) asm volatile may help with controlling the ordering, but that's not guaranteed.
It sometimes works to omit the __asm__ ("" :"=r"(my_var)); statement using the register local as an operand, but it's only guaranteed to work if you do use it: https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables. (And see discussion in comments on a previous version of this answer which suggested you could skip that part.) It doesn't make your code any slower, so don't skip that part to make sure your code is safe in general.
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don’t provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable’s declaration. Then use the local variable for the asm operand and specify any constraint letter that matches the register
P.S. This is a GCC extension that may not be portable, but should be available on all compilers that support GNU C inline asm syntax.
gcc doesn't have specific-register constraints at all for some architectures, like ARM, so this technique is the only way for rare cases where you want to force specific registers for input or output operands.
Example:
int get_r8d(void) {
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
return my_var * 2; // do something interesting with the value
}
compiled with gcc7.3 -O3 on the Godbolt compiler explorer
get_r8d():
lea eax, [r8+r8] # gcc can use it directly without a MOV first
ret
It should be possible, based on the answer here:
https://stackoverflow.com/a/43197401/3569229
#include <stdint.h>
uint64_t getsp( void )
{
uint64_t sp;
asm( "mov %%r8, %0" : "=rm" ( sp ));
return sp;
}
You can find a list of register names here: https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html
So your code above would be changed to:
__asm__ __volatile__ ("mov %%r8, %0"
:"=rm"(my_var)
: /* no input */
);
guys! I have a problem. How do I get address of class member function by asm in GCC?
In VS2012, we can do below code to get address.
asm {mov eax, offset TEST::foo}
But, in GCC?
__asm__ __volatile__(
"movq offset %1, %%rdi"
"movq %%rdi, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
It failed...
AT&T syntax doesn't use the offset keyword. And besides, you've asked the compiler to put &TEST::foo in a register already.
__asm__ (
"mov %1, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
Or better:
__asm__ ( "" // no instructions
:"=r"(addr)
:"0"(&TEST::foo) // same register as operand 0
);
Or even better: addr = &TEST::foo; https://gcc.gnu.org/wiki/DontUseInlineAsm for this, because it stops the compiler from knowing what's going on.
But if you are going to use inline asm, make sure you let the compiler do as much for you as it can. Use constraints to tell it where you want the input, and where you left the output. If the first or last instruction of an inline-asm statement is a mov, usually that means you're doing it wrong. (See the inline-assembly tag wiki for some links to guides on how to write GNU C inline asm that doesn't suck.
Bugs in your original: you didn't declare a clobber on RDI, so the compiler will still assume you didn't modify it.
You don't need volatile if the only reason to run the code in the asm statement is to produce the output operands, not for side effects. Leaving out volatile lets the compiler optimize around it, and even drop it entirely if the output is unused.
Consider the following code:
#include <stdio.h>
void main() {
uint32_t num = 2;
__asm__ __volatile__ ("CPUID");
__asm__ __volatile__ ("movl $1, %%ecx":);
__asm__ __volatile__ ("andl $0, %%ecx": "=r"(num));
printf("%i\n", num);
}
My initial expectation was that this code would print 0, and it does if I comment out the CPUID line, but as-is it was giving me garbage. After some trial, error, and research I realized that I was getting the value of a random register. Apparently GCC doesn't assume that I want the result of the statement being executed.
The problem is that I've seen (other people's) code that relies on that statement properly getting the result of the AND, regardless of what is going on with the other registers. Obviously such code is broken, given my observations, and the "=r" should be replaced with "=c".
My question is, can we ever rely on the "=r" constraint behaving consistently or according to the obvious expectation? Or is GCC's implementation too opaque/weird/other and it's best just to avoid it in every situation?
In order to use the =r output specifier you need to give gcc the freedom to pick the register that it wants to use. You do that by specifying the inputs and outputs generically with %0 for the output and the inputs starting with %1 for the first input.
In your case you are saying that num can be in a register. But there is nothing in the asm instruction that uses the output register. So gcc will essentially ignore this.
The reason that you are getting a different value if you comment or don't comment the CPUID instruction is that CPUID can write to eax,ebx,ecx, and edx. I tried your example on my system and got 0 as the result in both cases. But I noticed that the assembly that is generated is printing the value of eax. So I guess when I ran this program CPUID was writing 0 to eax.
If you did want to use the =r constraint you would need to do something like this:
asm("CPUID \n\t"
"movl $1, %0 \n\t"
"andl $0, %0 \n\t"
:"=r"(num) );
Otherwise if your asm code specifically mentions a register then you will need to specify it in the constraint list. In your example that means using =c.
GCC compiler supports __builtin_expect statement that is used to define likely and unlikely macros.
eg.
#define likely(expr) (__builtin_expect(!!(expr), 1))
#define unlikely(expr) (__builtin_expect(!!(expr), 0))
Is there an equivalent statement for the Microsoft Visual C compiler, or something equivalent ?
C++20 standard will include [[likely]] and [[unlikely]] branch prediction attributes.
The latest revision of attribute proposal can be found from http://wg21.link/p0479
The original attribute proposal can be found from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0479r0.html
Programmers should prefer PGO. Attributes can easily reduce performance if applied incorrectly or they later become incorrect when program changes.
According to http://www.akkadia.org/drepper/cpumemory.pdf (page 57), it still makes sense to use static branch prediction even if CPU predicts correctly dynamically.
The reason for that is that L1i cache will be used even more efficiently if static prediction was done right.
I say just punt
There is nothing like it. There is __assume(), but don't use it, it's a different kind of optimizer directive.
Really, the reason the gnu builtin is wrapped in a macro is so you can just get rid of it automatically if __GNUC__ is not defined. There isn't anything the least bit necessary about those macros and I bet you will not notice the run time difference.
Summary
Just get rid of (null out) *likely on non-GNU. You won't miss it.
According to Branch and Loop Reorganization to Prevent Mispredicts document from Intel:
In order to effectively write your code to take advantage of these
rules, when writing if-else or switch statements, check the most
common cases first and work progressively down to the least common.
Unfortunately you cannot write something like
#define if_unlikely(cond) if (!(cond)); else
because MSVC optimizer as of VS10 ignores such "hint".
As I prefer to deal with errors first in my code, I seem to write less efficient code.
Fortunately, second time CPU encounters the branch it will use its statistics instead of a static hint.
__assume should be similar.
However, if you want to do this really well you should use Profile Guided Optimization rather than static hints.
I know this question is about Visual Studio, but I'm going to try to answer for as many compilers as I can (including Visual Studio)…
A decade later there is progress! As of Visual Studio 2019 MSVC still doesn't support anything like this (even though it's the most popular builtin/intrinsic), but as Pauli Nieminen mentioned above C++20 has likely / unlikely attributes which can be used to create likely/unlikely macros and MSVC usually adds support for new C++ standards pretty quickly (unlike C) so I expect Visual Studio 2021 to support them.
Currently (2019-10-14) only GCC supports these attributes, and even then only applied to labels, but it is sufficient to at least do some basic testing. Here is a quick implementation which you can test on Compiler Explorer:
#define LIKELY(expr) \
( \
([](bool value){ \
switch (value) { \
[[likely]] case true: \
return true; \
[[unlikely]] case false: \
return false; \
} \
}) \
(expr))
#define UNLIKELY(expr) \
( \
([](bool value){ \
switch (value) { \
[[unlikely]] case true: \
return true; \
[[likely]] case false: \
return false; \
} \
}) \
(expr))
Edit (2022-05-02): MSVC 2022 supports C++20, including [[likely]]/[[unlikely]], but generates absolutely terrible code for this (see the comments on this post)... don't use it there.
You'll probably want to #ifdef around it to support compilers that can't handle it, but luckily most compilers support __builtin_expect:
GCC 3.0
clang
ICC since at least 13, probably much longer.
Oracle Development Studio 12.6+, but only in C++ mode.
ARM 4.1
IBM XL C/C++ since at least 10.1, probably longer.
TI since 6.1
TinyCC since 0.9.27
GCC 9+ also supports __builtin_expect_with_probability. It's not available anywhere else, but hopefully one day… It takes a lot of the guesswork out of trying to figure out whether to use ilkely/unlikely or not—you just set the probability and the compiler (theoretically) does the right thing.
Also, clang supports a __builtin_unpredictable (since 3.8, but test for it with __has_builtin(__builtin_unpredictable)). Since a lot of compilers are based on clang these days it probably works in them, too.
If you want this all wrapped up and ready to go, you might be interested in one of my projects, Hedley. It's a single public-domain C/C++ header which works on pretty much all compilers and contains lots of useful macros, including HEDLEY_LIKELY, HEDLEY_UNLIKELY, HEDLEY_UNPREDICTABLE, HEDLEY_PREDICT, HEDLEY_PREDICT_TRUE, and HEDLEY_PREDICT_FALSE. It doesn't have the C++20 version quite yet, but it should be there soon…
Even if you don't want to use Hedley in your project, you might want to check the the implementations there instead of relying on the lists above; I'll probably forget to update this answer with new information, but Hedley should always be up-to-date.
Now MS said they have implemented likely/unlikely attributes
But in fact there isn't any different between using "likely" or not using.
I have compiled these codes and is produce same result.
int main()
{
int i = rand() % 2;
if (i) [[likely]]
{
printf("Hello World!\n");
}
else
{
printf("Hello World2%d!\n",i);
}
}
int main()
{
int i = rand() % 2;
if (i)
{
printf("Hello World!\n");
}
else [[likely]]
{
printf("Hello World2%d!\n",i);
}
}
int pdb._main (int argc, char **argv, char **envp);
0x00401040 push ebp
0x00401041 mov ebp, esp
0x00401043 push ecx
0x00401044 call dword [rand] ; pdb.__imp__rand
; 0x4020c4
0x0040104a and eax, 0x80000001
0x0040104f jns 0x401058
0x00401051 dec eax
0x00401052 or eax, 0xfffffffe ; 4294967294
0x00401055 add eax, 1
0x00401058 je 0x40106d
0x0040105a push str.Hello_World ; pdb.___C__0O_NFOCKKMG_Hello_5World__CB_6
; 0x402108 ; const char *format
0x0040105f call pdb._printf ; int printf(const char *format)
0x00401064 add esp, 4
0x00401067 xor eax, eax
0x00401069 mov esp, ebp
0x0040106b pop ebp
0x0040106c ret
0x0040106d push 0
0x0040106f push str.Hello_World2_d ; pdb.___C__0BB_DODJFBPJ_Hello_5World2__CFd__CB_6
; 0x402118 ; const char *format
0x00401074 call pdb._printf ; int printf(const char *format)
0x00401079 add esp, 8
0x0040107c xor eax, eax
0x0040107e mov esp, ebp
0x00401080 pop ebp
0x00401081 ret
As the question is old, the answers saying there's no [[likely]] / [[unlikely]] in MSVC, or that there's no impact are obsolete.
Latest MSVC supports [[likely]] / [[unlikely]] in /std:c++20 and /std:c++latest modes.
See demo on Godbolt's compiler explorer that shows the difference.
As can be seen from the link above, one visible effect on x86/x64 for if-else statement is that the conditional jump forward will be for unlikely branch. Before C++20 and supporting VS version the same could be achieved by placing the likely branch into if part, and the unlikely branch into else part, negating the condition as needed.
Note that the effect of such optimization is minimal. For frequently called code in a tight loop, the dynamic branch prediction would do the right thing anyway.