Write inline assembly without clobber lists? - gcc

Is it possible to write inline assembly (Intel syntax) with GCC or Clang, without needing to understand the clobber list "stuff"?
I'm going to guess "no" because the clobber list "stuff" ensures you don't over-write the register the compiler wrote to (immediately before your inline assembly begins)?

GNU C Basic inline asm statements (no operand/clobber lists) are not recommended for basically anything except maybe the body of an __attribute__((naked)) function. Why can't local variable be used in GNU C basic inline asm statements? (globals can't safely be used either.)
https://gcc.gnu.org/wiki/DontUseInlineAsm says to see ConvertBasicAsmToExtended for reasons not to use Basic asm statements. You can't really do anything safely in Basic asm; even asm("cli"); can get reordered with any memory accesses that aren't volatile.
If you're going to use inline asm at all (instead of writing a stand-alone function in asm, or C with intrinsics), you need to describe your string of asm instruction in exact detail to the compiler, in terms of a black box with input and/or output operands, and/or clobbers. See https://stackoverflow.com/tags/inline-assembly/info for links to guides, including some SO answers about using input / output constraints.
Think hard before deciding it's really worth using GNU C inline asm for anything. If you can get the compiler to emit the same instructions another way, that's almost always better. Intrinsics or pure C allow constant-propagation optimization; inline asm doesn't (unless you do stuff like if(_builtin_constant_p(x)) { pure C version } else { inline asm version }).
Intel syntax: in GCC, compile with -masm=intel so your asm template will be part of an Intel-syntax .s, and the compiler will substitute in operands in Intel syntax. (Like dword ptr [rsp] instead of (%rsp) for "m"(my_int)).
In clang I'm not sure there's any convenient way to use Intel-syntax in normal asm statements.
There is one other option though, if you don't care about efficient code (but then why are you using asm?): clang supports -fasm-blocks to allow syntax like MSVC's inefficient style of inline asm. And yes, this uses Intel syntax.
Is there any way to complie a microsoft style inline-assembly code on a linux platform? shows how inefficient the resulting code is: full of compiler-generated instructions to store input variables to memory for the asm{} block to read them. Because MSVC-style asm blocks can't do inputs or outputs in registers. (Clang doesn't support the leave-a-value-in-EAX method for getting a single value out so the output has to be stored/reloaded as well.)
You don't get to specify clobbers for this, so I assume an asm block implies a "memory" clobber, along with clobbers on all registers you write. (Or maybe even just mention.)
I would not recommend this; it's basically not possible to wrap a single instruction or handful of instructions efficiently this way. Only if you're writing a whole loop can you amortize the overhead of getting inputs into an asm{} block.

Related

GCC [for ARM] force no floating point

I would like to create a build of my embedded C code which specifically checks that floating point operations aren't introduced into it by accident. I've tried adding +nofp to my [cortex-m3] processor architecture but GCC for ARM doesn't like that (probably because the cortex-m3 doesn't have a floating point unit). I've tried specifying -mfpu=none but that isn't a permitted option. I've tried leaving -lm off the linker command-line but the linker seems too clever to be fooled by that and is compiling code with double in it and resolving pow() anyway.
This post: https://gcc.gnu.org/legacy-ml/gcc-help/2011-07/msg00093.html from 2011 hints that GCC has no such option, since no-one is interested in it, which surprises me as it seems like a common thing to want, at least from an embedded standpoint, to avoid accidental C-library bloat.
Does anyone know of a way to do this with GCC/newlib without me having to go through and manually hack stuff out of the C library file it chooses?
It is not just a library issue. Your target will use soft-fp, and the compiler will supply floating point code to implement arithmetic operators regardless of the library.
The solution I generally apply is to scan the map file for instances of the compiler supplied floating-point routines. If your code is "fp clean" there will be no such references. The math library and any other code that perform floating-point arithmetic operations will use these operator implementations, so you only need look for these operator calls and can ignore the Newlib math library functions.
The internal soft-fp routines are listed at https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html. It is probably feasible to manually check the mapfile for fp symbols but you might write yourself a script or tool to scan the map file for these names to check your. The cross-reference section of the map file will list all modules these symbols are used in so you can use that to identify where the floating point code is used.
The Newlib stdio functions support floating-point by default. If your formatted I/O is limited to printf() you can use iprintf() instead or you can rebuild Newlib with FLOATING_POINT undefined to remove floating point support from all but scanf() (no idea why). You can then use the map file technique again to find "banned" formatted I/O functions (although these are likely to also use the floating point operator functions in any case, so you will already have spotted them indirectly).
An alternative is to use an alternative stdio library to override the Newlib versions. There are any number of "tiny printf" implementations available you could use. If you link such a library as object code or list its library ahead of Newlib in the link command, it will override the Newlib versions.

PowerPC GCC Print registers in assembly without % sign

On Matt Godbolt's Compiler Explorer website, you can compile code using various pre-installed compilers. When using PowerPC gcc 4.8 the registers cannot be distinguished from immediates (for example addi 11,31,16).
However, when the -mregnames option is used, all registers are marked with %r followed by the register index. How do I omit just the % sign to get r1 instead of %r1?
For example, void nop () {} with gcc4.8 PowerPC -O0 -mregnames:
nop():
stwu %r1,-16(%r1)
stw %r31,12(%r1)
mr %r31,%r1
addi %r11,%r31,16
lwz %r31,-4(%r11)
mr %r1,%r11
blr
When targeting PowerPC, you basically have two options for the syntax of assembly listings:
You can either use the IBM syntax (common on IBM assemblers), where the registers do not use any type of special prefix: they are just referred to with numbers. Yes, this makes it difficult to distinguish them from immediates.
Or, you can use Gnu/AT&T syntax, which always prefixes registers with % symbols (and an r, in this case). This not only makes it easier to distinguish between registers and immediates, but it also makes it possible to distinguish between integer registers (%r?) and floating-point registers (%f?).
There is no intermediate option, where you get the r (or f) prefix, but no leading %. If you need this, you can do like Jester suggested and post-process the output, using the regular expression %r[0-9]+ for matching.
An update:
powerpc-linux-gnu-gcc version 5.4.0 (the default package with Ubuntu 16.04)
When using -mregnames, you can use "%r0" or "r0" or "0" format for a register name in assembly source code files.
For disassembling, powerpc-linux-gnu-objdump defaults to the "r0" format (which I agree is easier to read).
In the example from that webpage, it looks like it is showing the listing output from the compiler, instead of using objdump. I do not know of a way to control the listing output format.

gcc/clang: How to force ordering of items on the stack?

Consider the following code:
int a;
int b;
Is there a way to force that a precedes b on the stack?
One way to do the ordering would be to put b in a function:
void foo() {
int b;
}
...
int a;
foo();
However, that would generally work only if b isn't inlined.
Maybe there's a different way to do that? Putting an inline assembler between the two declarations may do a trick, but I am not sure.
Your initial question was about forcing a function call to not be inlined.
To improve on Jordy Baylac's answer, you might try to declare the function within the block calling it, and perhaps use a statement expr:
#define FOO_WITHOUT_INLINING(c,i) ({ \
extern int foo (char, int) __attribute__((noinline)); \
int r = foo(c,i); \
r; })
(If the type of foo is unknown, you could use typeof)
However, I still think that your question is badly formulated (and is meaningless, if one avoid reading your comments which should really go inside the question, which should have mentioned your libmill). By definition of inlining, a compiler can inline any function as it wants without changing the semantics of the program.
For example, a user of your library might legitimately compile it with -flto -O2 (both at compiling and at linking stage). I don't know what would happen then.
I believe you might redesign your code, perhaps using -fsplit-stack; are you implementing some call/cc in C? Then look inside the numerous existing implementations of it, and inside Gabriel Kerneis CPC.... See also setcontext(3) & longjmp(3)
Perhaps you might need to use somewhere the return_twice (and/or nothrow) function attribute of GCC, or some _Pragma like GCC optimize
Then you edited your question to change it completely (asking about order of variables on the call stack), still without mentioning in the question your libmill and its go macro (as you should; comments are volatile so should not contain most of the question).
But the C compiler is not even supposed to have a call stack (an hypothetical C99 conforming compiler could do whole program optimization to avoid any call stack) in the compiled program. And GCC is certainly allowed to put some variables outside of the call stack (e.g. only in registers) and it is doing that. And some implementations (IA64 probably) have two call stacks.
So your changed question is completely meaniningless: a variable might not sit on the stack (e.g. only be in a register, or even disappear completely if the compiler can prove it is useless after some other optimizations), and the compiler is allowed to optimize and use the same call stack slot for two variables (and GCC is doing such an optimization quite often). So you cannot force any order on the call stack layout.
If you need to be sure that two local variables a & b have some well defined order on the call stack, make them into a struct e.g.
struct { int _a, _b; } _locals;
#define a _locals._a
#define b _locals._b
then, be sure to put the &_locals somewhere (e.g. in a volatile global or thread-local variable). Since some versions of GCC (IIRC 4.8 or 4.7) had some optimization passes to reorder the fields of non-escaping struct-s
BTW, you might customize GCC with your MELT extension to help about that (e.g. introduce your own builtin or pragma doing part of the work).
Apparently, you are inventing some new dialect of C (à la CPC); then you should say that!
below there is a way, using gcc attributes:
char foo (char, int) __attribute__ ((noinline));
and, as i said, you can try -fno-inline-functions option, but this is for all functions in the compilation process
It is still unclear for me why you want function not to be inline-d, but here is non-pro solution I am proposing:
You can make this function in separate object something.o file.
Since you will include header only, there will be no way for the compiler to inline the function.
However linker might decide to inline it later at linking time.

GCC __attribute__((always_inline)) and lambdas, is this syntax correct?

I am using GCC 4.6 as part of the lpcxpresso ide for a Cortex embedded processor. I have very limited code size, especially when compiling in debug mode. Using attribute((always_inline)) has so far proven to be a good tool to inline trivial functions and this saves a lot of code bloat in debug mode while still maintaining readability. I expect it to be somewhat mainstream and supported in the future because it is mentioned here http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348c/CIAJGAIH.html
Now to my question: Is this the correct Syntax for declaring a Lambda always inline?
#define ALWAYS_INLINE __attribute__((always_inline))
[](volatile int &i)ALWAYS_INLINE{i++;}
It does work, my question is will it continue to work in future and what can I do to ensure it works in the future. If I ever switch to another major compiler that supports c++11 will I find a similar keyword which I can replace the attribute((always_inline)) with?
If I were to meet my fairy godmother I would wish for a compiler directive which causes all lambdas which are constructed as temporaries with empty constructors and bound by reference to be automatically inlined even in debug mode. Any ideas?
Will it continue to work in future?
Likely but, always_inline is compiler specific and since there is no standard specifying its exact behavior with lambda, there is no guaranty that this will continue to work in the future.
What can I do to ensure it works?
This depends on the compiler not you. If a future version drops support for always_inline with lambda, you have to stick with a version that works or code your own preprocessor that inlines lambdas with an always_inline-like keyword.
If I ever switch to another major compiler that supports c++11 will I
find a similar keyword?
Likely but again, there is no guaranty. The only real standard is the C++ inline keyword and it is not applicable to lambdas. For non-lambda it only suggests inlining and tells the compiler that a function may be defined in different compile units.

Equivalent for GCC's naked attribute

I've got an application written in pure C, mixed with some functions that contain pure ASM. Naked attribute isn't available for x86 (why? why?!) and my asm functions don't like when prologue and epilogue is messing with the stack. Is it somehow possible to create a pure assembler function that can be referenced from C code parts? I simply need the address of such ASM function.
Just use asm() outside a function block. The argument of asm() is simply ignored by the compiler and passed directly on to the assembler. For complex functions a separate assembly source file is the better option to avoid the awkward syntax.
Example:
#include <stdio.h>
asm("_one: \n\
movl $1,%eax \n\
ret \n\
");
int one();
int main() {
printf("result: %d\n", one());
return 0;
}
PS: Make sure you understand the calling conventions of your platform. Many times you can not just copy/past assembly code.
PPS: If you care about performance, use extended asm instead. Extended asm essentially inlines the assembly code into your C/C++ code and is much faster, especially for short assembly functions. For larger assembly functions a seperate assembly source file is preferable, so this answer is really a hack for the rare case that you need a function pointer to a small assembly function.
Good news everyone. GCC developers finally implemented attribute((naked)) for x86. The feature will be available in GCC 8.
Certainly, just create a .s file (assembly source), which is run through gas (the assembler) to create a normal object file.

Resources