PPC64 subroutine call from inline assembly - powerpc

I am trying to write a simple function using inline assembly in C for powerpc64, my function calls another function and I have a couple of questions related to that.
1) How do I save the LR register before branching using 'bl ' to the subroutine?
Specifically, for this code:
void func(void *arg1, void *arg2)
{
void *result;
__asm__ volatile (
...
...
"bl <address>\n" //Call to subroutine
"nop\n"
...
: [result]"=r"(result)
: [arg1]"r"(arg1),
[arg2]"r"(arg2)
);
return result;
}
The compiler generates the prologue code for this without the "mflr 0;std 0, 16(1)" instructions to save LR since it does not know that a subroutine is being called in my assembly code. Do I include these instructions in my assembly code? If so, how do I know the stack size created by the compiler prologue code to get to the LR save area of the function calling 'func'? (from powerpc assembly tutorials on developerworks the LR registers needs to be saved in the 'calling' function's stack frame)
2) I believe I will need to save arg1 and arg2 before calling the subroutine, which is the right place to temp store these parameters before making a subroutine call - the parameter save area or non-volatile registers? I just want to know the right way this is done in production quality ppc64 code
Thanks in advance!

AFAIK there is no way to do this properly. Inline asm is not designed for calling functions.
You can't reliably know the size of the stack frame the compiler has generated, in fact the entire function could be inlined, or as you observed the compiler might not generate a stack frame at all.
But you don't have to store LR in the caller stack frame, it's best if you do, but it's not 100% required. So just put it in a non-volatile, mark that register as clobbered, and restore it on the way back.
You shouldn't need to save arg1 and arg2, but what you must do is mark all the volatile registers as clobbered. Then the compiler will save anything that is in volatile registers (like arg1 and arg2) before it calls your asm. Also remember that some CR fields might be clobbered. I'd also add 'memory' to the clobbers so that GCC is pessimistic about optimising across the asm.
If you do all that it might work, unless I'm forgetting something :)

Related

HW register value (R15) not saved during a function call to external library

My code is written in C++, and compiled with gcc version 4.7.2.
It's linked with 3rd party library, which is written in C, and compiled with gcc 4.5.2.
My code calls a function initStuff(). During the debug I found out that the value of R15 register before the call to initStuff() is not the same as the value upon return from that function.
As a quick hack I did:
asm(" mov %%r15, %0" : "=r" ( saveR15 ) );
initStuff();
asm(" mov %0, %%r15;" : : "r" (saveR15) );
which seems to work for now.
Who is to blame here? How can I find if it's a compiler issue, or maybe compatibility issue?
gcc on x86-64 follows the System V ABI, which defines r15 as a callee-saved register; any function which uses this register is supposed to save and restore it.
So if this third-party function is not doing so, it is failing to conform to the ABI, and unless this is documented, it is to blame. AFAIK this part of the ABI has been stable forever, so if compiler-generated code (with default options) is failing to save and restore r15, that would be a compiler bug. More likely some part of the third-party code uses assembly language and is buggy, or conceivably it was built with non-standard compiler options.
You can either dig into it, or as a workaround, write a wrapper around it that saves and restores r15. Your current workaround is not really safe, since the compiler might reorder your asm statements with respect to surrounding code. You should instead put the call to initStuff inside a single asm block with the save-and-restore (declaring it as clobbering all caller-saved registers), or write a "naked" assembly wrapper which does the save/restore and call, and call it instead. (Make sure to preserve stack alignment.)

How to see result of MASM directives such as PROC, .SETFRAME. .PUSHREG

Writing x64 Assembly code using MASM, we can use these directives to provide frame unwinding information. For example, from .SETFRAME definition:
These directives do not generate code; they only generate .xdata and .pdata.
Since these directives don't produce any code, I cannot see their effects in Disassembly window. So, I don't see any difference, when I write assembly function with or without these directives. How can I see the result of these directives - using dumpbin or something else?
How to write code that can test this unwinding capability? For example, I intentionally write assembly code that causes an exception. I want to see the difference in exception handling behavior, when function is written with or without these directives.
In my case caller is written in C++, and can use try-catch, SSE etc. - whatever is relevant for this situation.
Answering your question:
How can I see the result of these directives - using dumpbin or something else?
You can use dumpbin /UNWINDINFO out.exe to see the additions to the .pdata resulting from your use of .SETFRAME.
The output will look something like the following:
00000054 00001530 00001541 000C2070
Unwind version: 1
Unwind flags: None
Size of prologue: 0x04
Count of codes: 2
Frame register: rbp
Frame offset: 0x0
Unwind codes:
04: SET_FPREG, register=rbp, offset=0x00
01: PUSH_NONVOL, register=rbp
A bit of explanation to the output:
The second hex number found in the output is the function address 00001530
Unwind codes express what happens in the function prolog. In the example what happens is:
RBP is pushed to the stack
RBP is used as the frame pointer
Other functions may look like the following:
000000D8 000016D0 0000178A 000C20E4
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x05
Count of codes: 2
Unwind codes:
05: ALLOC_SMALL, size=0x20
01: PUSH_NONVOL, register=rbx
Handler: 000A2A50
One of the main differences here is that this function has an exception handler. This is indicated by the Unwind flags: EHANDLER UHANDLER as well as the Handler: 000A2A50.
Probably your best bet is to have your asm function call another C++ function, and have your C++ function throw a C++ exception. Ideally have the code there depend on multiple values in call-preserved registers, so you can make sure they get restored. But just having unwinding find the right return addresses to get back into your caller requires correct metadata to indicate where that is relative to RSP, for any given RIP.
So create a situation where a C++ exception needs to unwind the stack through your asm function; if it works then you got the stack-unwind metadata directives correct. Specifically, try{}catch in the C++ caller, and throw in a C++ function you call from asm.
That thrower can I think be extern "C" so you can call it from asm without name mangling. Or call it via a function pointer, or just look at MSVC compiler output and copy the mangled name into asm.
Apparently Windows SEH uses the same mechanism as plain C++ exceptions, so you could potentially set up a catch for the exception delivered by the kernel in response to a memory fault from something like mov ds:[0], eax (null deref). You could put this at any point in your function to make sure the exception unwind info was correct about the stack state at every point, not just getting back into sync before a function-call.
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170&viewFallbackFrom=vs-2019 has details about the metadata.
BTW, the non-Windows (e.g. GNU/Linux) equivalent of this metadata is DWARF .cfi directives which create a .eh_frame section.
I don't know equivalent details for Windows, but I do know they use similar metadata that makes it possible to unwind the stack without relying on RBP frame pointers. This lets compilers make optimized code that doesn't waste instructions on push rbp / mov rbp,rsp and leave in function prologues/epilogues, and frees up RBP for use as a general-purpose register. (Even more useful in 32-bit code where 7 instead of 6 registers besides the stack pointer is a much bigger deal than 15 vs. 14.)
The idea is that given a RIP, you can look up the offset from RSP to the return address on the stack, and the locations of any call-preserved registers. So you can restore them and continue unwinding into the parent using that return address.
The metadata indicates where each register was saved, relative to RSP or RBP, given the current RIP as a search key. In functions that use an RBP frame pointer, one piece of metadata can indicate that. (Other metadata for each push rbx / push r12 says which call-preserved regs were saved in which order).
In functions that don't use RBP as a frame pointer, every push / pop or sub/add RSP needs metadata for which RIP it happened at, so given a RIP, stack unwinding can see where the return address is, and where those saved call-preserved registers are. (Functions that use alloca or VLAs thus must use RBP as a frame pointer.)
This is the big-picture problem that the metadata has to solve. There are a lot of details, and it's much easier to leave things up to a compiler!

Context switch using arm inline assembly

I have another question about an inline assembly instruction concerning a context switching. This code may work but I'm not sure at 100% so I submit this code to the pros of stackoverflow ;-)
I'm compiling using gcc (no optimization) for an arm7TDMI. At some point, the code must do a context switching.
/* Software Interrupt */
/* we must save lr in case it is called from SVC mode */
#define ngARMSwi(code) __asm__("SWI %0" : : "I"(code) : "lr")
// Note : code = 0x23
When I check the compiled code, I get this result :
svc 0x00000023
The person before me who coded this wrote "we must save lr" but in the compiled code, I don't see any traces of lr being saved.
The reason I think that code could be wrong is that the program run for some time before going into a reset exception and one of the last thing the code execute is a context switch...
The __asm__ statement lists lr as a clobbered register. This means that the compiler will save the register if it needs to.
As you're not seeing any save, I think you can assume the compiler was not using that register (in your testcase, at least).
I think that SWI instruction should be called in the user mode. if this is right. The mode of ARM is switched to SVC mode after this instruction. then the ARM core does the copy operation that the CPSR is copied into SPSR_svc and LR is copied into LR_svc. this should be used for saving the user mode cpu's context to return from svc mode. if your svc exception handler use lr like calling another function the lr register should be required to be preserved like using stack between the change of the mode. i guess the person before you wrote like that to talk about this situation.

inline assembly error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I have an inline AT&T style assembly block, which works with XMM registers and there are no problems in Release configuration of my XCode project, however I've stumbled upon this strange error (which is supposedly a GCC bug) in Debug configuration... Can I fix it somehow? There is nothing special in assembly code, but I am using a lot of memory constraints (12 constraints), can this cause this problem?
Not a complete answer, sorry, but the comments section is too short for this ...
Can you post a sample asm("..." :::) line that demonstrates the problem ?
The use of XMM registers is not the issue, the error message indicates that GCC wanted to create code like, say:
movdqa (%rax),%xmm0
i.e. memory loads/stores through pointers held in general registers, and you specified more memory locations than available general-purpose regs (it's probably 12 in debug mode because because RBP, RSP are used for frame/stackpointer and likely RBX for the global offset table and RAX reserved for returns) without realizing register re-use potential.
You might be able to eek things out by doing something like:
void *all_mem_args_tbl[16] = { memarg1, memarg2, ... };
void *trashme;
asm ("movq (%0), %1\n\t"
"movdqa (%1), %xmm0\n\t"
"movq 8(%0), %1\n\t"
"movdqa (%1), %xmm1\n\t"
...
: "r"all_mem_args_tbl : "r"(trashme) : ...);
i.e. put all the mem locations into a table that you pass as operand, and then manage the actual general-purpose register use on your own. It might be two pointer accesses through the indirection table, but whether that makes a difference is hard to say without knowing your complete assembler code piece.
The Debug configuration uses -O0 by default. Since this flag disables optimisations, the compiler is probably not being able to allocate registers given the constraints specified by your inline assembly code, resulting in register starvation.
One solution is to specify a different optimisation level, e.g. -Os, which is the one used by default in the Release configuration.

GCC's extended version of asm

I never thought I'd be posting an assembly question. :-)
In GCC, there is an extended version of the asm function. This function can take four parameters: assembly-code, output-list, input-list and overwrite-list.
My question is, are the registers in the overwrite-list zeroed out? What happens to the values that were previously in there (from other code executing).
Update: In considering my answers thus far (thank you!), I want to add that though a register is listed in the clobber-list, it (in my instance) is being used in a pop (popl) command. There is no other reference.
No, they are not zeroed out. The purpose of the overwrite list (more commonly called the clobber list) is to inform GCC that, as a result of the asm instructions the register(s) listed in the clobber list will be modified, and so the compiler should preserve any which are currently live.
For example, on x86 the cpuid instruction returns information in four parts using four fixed registers: %eax, %ebx, %ecx and %edx, based on the input value of %eax. If we were only interested in the result in %eax and %ebx, then we might (naively) write:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2) );
This would get the first and second parts of the result in C variables input_res1 and res2; however if GCC was using %ecx and %edx to hold other data; they would be overwritten by the cpuid instruction without gcc knowing. To prevent this; we use the clobber list:
int input_res1 = 0; // also used for first part of result
int res2;
__asm__("cpuid" : "+a"(input_res1), "=b"(res2)
: : "%ecx", "%edx" );
As we have told GCC that %ecx and %edx will be overwritten by this asm call, it can handle the situation correctly - either by not using %ecx or %edx, or by saving their values to the stack before the asm function and restoring after.
Update:
With regards to your second question (why you are seeing a register listed in the clobber list for a popl instruction) - assuming your asm looks something like:
__asm__("popl %eax" : : : "%eax" );
Then the code here is popping an item off the stack, however it doesn't care about the actual value - it's probably just keeping the stack balanced, or the value isn't needed in this code path. By writing this way, as opposed to:
int trash // don't ever use this.
__asm__("popl %0" : "=r"(trash));
You don't have to explicitly create a temporary variable to hold the unwanted value. Admittedly in this case there isn't a huge difference between the two, but the version with the clobber makes it clear that you don't care about the value from the stack.
If by "zeroed out" you mean "the values in the registers are replaced with 0's to prevent me from knowing what some other function was doing" then no, the registers are not zeroed out before use. But it shouldn't matter because you're telling GCC you plan to store information there, not that you want to read information that's currently there.
You give this information to GCC so that (reading the documentation) "you need not guess which registers or memory locations will contain the data you want to use" when you're finished with the assembly code (eg., you don't have to remember if the data will be in the stack register, or some other register).
GCC needs a lot of help for assembly code because "The compiler ... does not parse the assembler instruction template and does not know what it means or even whether it is valid assembler input. The extended asm feature is most often used for machine instructions the compiler itself does not know exist."
Update
GCC is designed as a multi-pass compiler. Many of the passes are in fact entirely different programs. A set of programs forming "the compiler" translate your source from C, C++, Ada, Java, etc. into assembly code. Then a separate program (gas, for GNU Assembler) takes that assembly code and turns it into a binary (and then ld and collect2 do more things to the binary). Assembly blocks exist to pass text directly to gas, and the clobber-list (and input list) exist so that the compiler can do whatever set up is needed to pass information between the C, C++, Ada, Java, etc. side of things and the gas side of things, and to guarantee that any important information currently in registers can be protected from the assembly block by copying it to memory before the assembly block runs (and copying back from memory afterward).
The alternative would be to save and restore every register for every assembly code block. On a RISC machine with a large number of registers that could get expensive (the Itanium has 128 general registers, another 128 floating point registers and 64 1-bit registers, for instance).
It's been a while since I've written any assembly code. And I have much more experience using GCC's named registers feature than doing things with specific registers. So, looking at an example:
#include <stdio.h>
long foo(long l)
{
long result;
asm (
"movl %[l], %[reg];"
"incl %[reg];"
: [reg] "=r" (result)
: [l] "r" (l)
);
return result;
}
int main(int argc, char** argv)
{
printf("%ld\n", foo(5L));
}
I have asked for an output register, which I will call reg inside the assembly code, and that GCC will automatically copy to the result variable on completion. There is no need to give this variable different names in C code vs assembly code; I only did it to show that it is possible. Whichever physical register GCC decides to use -- whether it's %%eax, %%ebx, %%ecx, etc. -- GCC will take care of copying any important data from that register into memory when I enter the assembly block so that I have full use of that register until the end of the assembly block.
I have also asked for an input register, which I will call l both in C and in assembly. GCC promises that whatever physical register it decides to give me will have the value currently in the C variable l when I enter the assembly block. GCC will also do any needed recordkeeping to protect any data that happens to be in that register before I enter the assembly block.
What if I add a line to the assembly code? Say:
"addl %[reg], %%ecx;"
Since the compiler part of GCC doesn't check the assembly code it won't have protected the data in %%ecx. If I'm lucky, %%ecx may happen to be one of the registers GCC decided to use for %[reg] or %[l]. If I'm not lucky, I will have "mysteriously" changed a value in some other part of my program.
I suspect the overwrite list is just to give GCC a hint not to store anything of value in these registers across the ASM call; since GCC doesn't analyze what ASM you're giving it, and certain instructions have side-effects that touch other registers not explicitly named in the code, this is the way to tell GCC about it.

Resources