C assembly pushl not working for WriteFile function - winapi

I used GCC inline assembly to call _GetStdHandle@4 to get the output handle, then called _WriteFile@20 to write my string to the console using that handle.
I used pushl in each macro to pass the parameters, but WriteFile fails with error 6 (invalid handle) even though the handle is valid, so something is wrong with how the arguments are passed. I used the "g" constraint for each argument because I see no reason to move the parameters into registers and then push the registers. If I use "r" instead, the program works, but the parameters are first moved into registers and the registers are pushed, which is exactly what I want to avoid.
This code prints nothing, and the problem is in the WriteFile call. With "r" for the WriteFile parameters the string is printed. Why can't I use "g" to avoid moving the parameters into registers?
typedef void * HANDLE;
#define GetStdHandle(result, handle) \
__asm ( \
"pushl %1\n\t" \
"call _GetStdHandle@4" \
: "=a" (result) \
: "g" (handle))
#define WriteFile(result, handle, buf, buf_size, written_bytes) \
__asm ( \
"pushl $0\n\t" \
"pushl %1\n\t" \
"pushl %2\n\t" \
"pushl %3\n\t" \
"pushl %4\n\t" \
"call _WriteFile@20" \
: "=a" (result) \
: "g" (written_bytes), "g" (buf_size), "g" (buf), "g" (handle))
int main()
{
HANDLE handle;
int write_result;
unsigned long written_bytes;
GetStdHandle(handle, -11);
if(handle != INVALID_HANDLE_VALUE)
{
WriteFile(write_result, handle, "Hello", 5, & written_bytes);
}
return 0;
}
The assembly code generated for this program is:
.file "main.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "Hello\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB25:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
/APP
pushl $-11
call _GetStdHandle@4
# 0 "" 2
/NO_APP
movl %eax, 12(%esp)
cmpl $-1, 12(%esp)
je L2
leal 4(%esp), %eax
/APP
pushl $0
pushl %eax
pushl $5
pushl $LC0
pushl 12(%esp)
call _WriteFile@20
# 0 "" 2
/NO_APP
movl %eax, 8(%esp)
L2:
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE25:
.ident "GCC: (MinGW.org GCC-6.3.0-1) 6.3.0"
What is the problem?

I would question the need for calling the WINAPI through wrappers like this rather than calling them directly. You can declare prototypes for the stdcall calling convention with
__attribute__((stdcall))
If you don't need to use inline assembly you shouldn't. GCC's inline assembly is hard to get right. Getting it wrong can make the code appear to work until one day it doesn't, especially if optimizations are enabled. David Wohlferd has a good article on why you shouldn't use inline assembly if you don't need to.
The primary problem can be seen in this section of generated code:
pushl $0
pushl %eax
pushl $5
pushl $LC0
pushl 12(%esp)
call _WriteFile@20
GCC has computed the memory operand (handle) for the first parameter as 12(%esp) . The problem is that you have altered ESP with the previous pushes and now offset 12(%esp) is no longer where handle is.
To get around this problem you can pass the values through registers or as immediates (where possible). Rather than using the g constraint, which also allows memory operands (m), use ri so that only registers and immediates are allowed. This prevents memory operands from being generated. If you pass pointers through registers you will also need to add the "memory" clobber.
The STDCALL(WINAPI) calling convention allows a function to destroy EAX, ECX, and EDX (AKA the volatile registers). It is possible that GetStdHandle and WriteFile will clobber ECX and EDX as well as return a value in EAX. You need to ensure that ECX and EDX are listed as clobbers as well (or have a constraint that marks it as output), otherwise the compiler may assume the values in those registers are the same before and after the inline assembly blocks are completed. If they are different it could cause subtle bugs.
With these changes your code could look something like:
#define INVALID_HANDLE_VALUE (void *)-1
typedef void *HANDLE;
#define GetStdHandle(result, handle) \
__asm ( \
"pushl %1\n\t" \
"call _GetStdHandle@4" \
: "=a" (result) \
: "g" (handle) \
: "ecx", "edx")
#define WriteFile(result, handle, buf, buf_size, written_bytes) \
__asm __volatile ( \
"pushl $0\n\t" \
"pushl %1\n\t" \
"pushl %2\n\t" \
"pushl %3\n\t" \
"pushl %4\n\t" \
"call _WriteFile@20" \
: "=a" (result) \
: "ri" (written_bytes), "ri" (buf_size), "ri" (buf), "ri" (handle) \
: "memory", "ecx", "edx")
int main()
{
HANDLE handle;
int write_result;
unsigned long written_bytes;
GetStdHandle(handle, -11);
if(handle != INVALID_HANDLE_VALUE)
{
WriteFile(write_result, handle, "Hello", 5, &written_bytes);
}
return 0;
}
Notes:
I marked the WriteFile inline assembly as __volatile so that the optimizer can't remove the entire block if it thinks result isn't being used. The compiler doesn't know that the call has the side effect of updating the display, so the inline assembly has to be marked volatile to keep it from being removed.
GetStdHandle doesn't have a problem with potential memory operands because there are no further uses of constraints after the initial push %1. The problem you are encountering is only an issue when ESP has been modified (via a PUSH/POP or change to ESP directly) and there is a possible use of a memory constraint in that inline assembly afterwards.

Related

Using sigaction and setitimer system calls to implement assembly language timer on BSD/OS X

I'm trying to implement a timer routine in 32-bit assembler on OS X Lion using sigaction() & setitimer() system calls. The idea is to set a timer using setitimer() & then have the generated alarm signal invoke a handler function previously setup through sigaction(). I have such a mechanism working on Linux, but cannot seem to get it working on OS X. I know the system call convention is different between OS X & Linux, and that OS X has a 16 byte alignment requirement. Despite compensating for these, I'm still not able to get it working (usually a "Bus error: 10" error). Thinking I did something wrong with the alignment, I wrote a simple C program that does what I want & then used clang 3.2 to generate the assembly code. Then I modified the machine-generated assembly by replacing the calls to sigaction() & setitimer() with the appropriate system & int $0x80 calls, as well as stack alignment instructions. The resulting program still doesn't work.
Here is the C program, sigaction.c, that I used to generate the assembly. Note that I commented out the printf & sleep stuff so the resulting assembly code would be easier to read:
//#include <stdio.h>
#include <signal.h>
#include <sys/time.h>
struct sigaction action;
void handler(int arg) {
// printf("HERE!\n");
}
int main() {
action.__sigaction_u.__sa_handler = handler;
action.sa_mask = 0;
action.sa_flags = 0;
// printf("sigaction size: %d\n", sizeof(action));
int fd = sigaction(14, &action, 0);
struct itimerval timer;
timer.it_interval.tv_sec = 1;
timer.it_interval.tv_usec = 0;
timer.it_value.tv_sec = 1;
timer.it_value.tv_usec = 0;
// printf("itimerval size: %d\n", sizeof(timer));
fd = setitimer(0, &timer, 0);
while (1) {
// sleep(60);
}
return 0;
}
Here is the assembly code generated using "clang -arch i386 -S sigaction.c" on the above file:
.section __TEXT,__text,regular,pure_instructions
.globl _handler
.align 4, 0x90
_handler: ## @handler
## BB#0:
pushl %ebp
movl %esp, %ebp
pushl %eax
movl 8(%ebp), %eax
movl %eax, -4(%ebp)
addl $4, %esp
popl %ebp
ret
.globl _main
.align 4, 0x90
_main: ## @main
## BB#0:
pushl %ebp
movl %esp, %ebp
pushl %esi
subl $52, %esp
calll L1$pb
L1$pb:
popl %eax
movl $14, %ecx
movl L_action$non_lazy_ptr-L1$pb(%eax), %edx
movl $0, %esi
leal _handler-L1$pb(%eax), %eax
movl $0, -8(%ebp)
movl %eax, (%edx)
movl $0, 4(%edx)
movl $0, 8(%edx)
movl $14, (%esp)
movl %edx, 4(%esp)
movl $0, 8(%esp)
movl %esi, -36(%ebp) ## 4-byte Spill
movl %ecx, -40(%ebp) ## 4-byte Spill
calll _sigaction
movl $0, %ecx
leal -32(%ebp), %edx
movl %eax, -12(%ebp)
movl $1, -32(%ebp)
movl $0, -28(%ebp)
movl $1, -24(%ebp)
movl $0, -20(%ebp)
movl $0, (%esp)
movl %edx, 4(%esp)
movl $0, 8(%esp)
movl %ecx, -44(%ebp) ## 4-byte Spill
calll _setitimer
movl %eax, -12(%ebp)
LBB1_1: ## =>This Inner Loop Header: Depth=1
jmp LBB1_1
.comm _action,12,2 ## @action
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_action$non_lazy_ptr:
.indirect_symbol _action
.long 0
.subsections_via_symbols
If I compile the assembly code using "clang -arch i386 sigaction.s -o sigaction", debug it using lldb & place a breakpoint in the handler function, the handler function is indeed called every second, so I know the assembly code is correct (ditto for the C code).
Now if I replace the call to sigaction() with:
# calll _sigaction
movl $0x2e, %eax
subl $0x04, %esp
int $0x80
addl $0x04, %esp
and the call to setitimer() with:
# calll _setitimer
movl $0x53, %eax
subl $0x04, %esp
int $0x80
addl $0x04, %esp
the assembly code no longer works, and generates the same "Bus error: 10" that my hand-coded assembly code does.
I've tried removing the subl/addl instructions that I'm using to align the stack as well as changing the values to make sure the stack is aligned on 16-byte boundaries, but nothing seems to work. I either get the bus error, a segmentation fault, or the code just hangs without calling the handler function.
One thing I did notice during debugging is that the sigaction call appears to have a lengthy wrapper around the underlying system call. If you disassemble both functions from within lldb, you will see sigaction() has a lengthy wrapper but setitimer does not. Not sure this means anything, but perhaps the sigaction() wrapper is massaging the data before passing it along. I tried debugging that code, but haven't found anything definitive yet.
If anyone knows how to get the above assembly code working by replacing the sigaction() & setitimer() functions with the appropriate system calls, it would be greatly appreciated. I can then take those changes & apply them to my hand-coded routines.
Thanks.
Update: I stripped down my hand-written assembly code to a manageable size & was able to get it working using the sigaction() & setitimer() library calls, but still haven't figured out why the syscalls don't work. Here's the code (timer.s):
.globl _main
.data
.set ITIMER_REAL, 0x00
.set SIGALRM, 0x0e
.set SYS_SIGACTION, 0x2e
.set SYS_SETITIMER, 0x53
.set TRAP, 0x80
itimerval:
interval_tv_sec:
.long 0
interval_tv_usec:
.long 0
value_tv_sec:
.long 0
value_tv_usec:
.long 0
sigaction:
sa_handler:
.long handler
sa_mask:
.long 0
sa_flags:
.long 0
.text
handler:
pushl %ebp
movl %esp, %ebp
movl %ebp, %esp
popl %ebp
ret
_main:
pushl %ebp
movl %esp, %ebp
subl $0x0c, %esp
movl $SIGALRM, %ebx
movl $sigaction, %ecx
movl $0x00, %edx
pushl %edx
pushl %ecx
pushl %ebx
# subl $0x04, %esp
call _sigaction
# movl $SYS_SIGACTION, %eax
# int $0x80
addl $0x0c, %esp
# addl $0x10, %esp
movl $ITIMER_REAL, %ebx
movl $0x01, interval_tv_sec # Successive calls every 1 second
movl $0x00, interval_tv_usec
movl $0x01, value_tv_sec # Initial call in 1 second
movl $0x00, value_tv_usec
movl $itimerval, %ecx
movl $0x00, %edx
pushl %edx
pushl %ecx
pushl %ebx
# subl $0x04, %esp
call _setitimer
# movl $SYS_SETITIMER, %eax
# int $0x80
addl $0x0c, %esp
# addl $0x10, %esp
loop:
jmp loop
When compiled with "clang -arch i386 timer.s -o timer" & debugged with lldb, the handler routine is called every second. I left my efforts at making the code work with syscalls in the code - they are commented out around the sigaction() & setitimer() calls. If for no other reason than to educate myself (and others), I would still like to get the sys call version working if possible, and if not, understand the reason why it doesn't work.
Thanks again.
Update 2: I got the setitimer syscall working. Here's the modified code:
pushl %edx
pushl %ecx
pushl %ebx
subl $0x04, %esp
movl $SYS_SETITIMER, %eax
int $0x80
addl $0x10, %esp
But the same edits do not work for the sigaction sys call, which leads me back to my original conclusion - the sigaction() library function is doing something extra before making the actual syscall. This snippet from dtruss seems to suggest the same:
With sigaction() syscall (not working):
sigaction(0xE, 0x2030, 0x0) = 0 0
setitimer(0x0, 0x2020, 0x0) = 0 0
With sigaction() library call (working):
sigaction(0xE, 0xBFFFFC40, 0x0) = 0 0
setitimer(0x0, 0x2028, 0x0) = 0 0
As you can see, the 2nd argument is different between the two versions. It seems the address of the sigaction structure (0x2030) is passed directly when using the syscall, but something else is passed when using the library call. I'm guessing that the "something else" is generated in the sigaction() library function.
Update 3: I discovered that the same exact problem exists on FreeBSD 9.1. The setitimer syscall works, but the sigaction syscall does not. Like OS X, the sigaction() library call does work.
BSD has a few sigaction syscalls - so far, I've only tried the same one I was using in OS X - 0x2e. Perhaps one of the other sigaction syscalls will work. Knowing that BSD has the same behavior will make this easier to track down, as I can pull the C source code. Plus this opens the problem up to a much wider group of people who may already know what the problem is.
Based on my understanding of how syscalls work coupled with the fact that sigaction does work on Linux, I can't help but to think I am doing something wrong in my code. However, the fact that replacing the int $0x80 call with the sigaction() library function causes my code to work seems to contradict this. There is an entire chapter on assembly language programming in the FreeBSD developer manual, as well as a section on making system calls, so what I'm doing should be possible:
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/x86-system-calls.html
Unless someone can point out what, if anything, I am doing wrong, I think the next step is for me to look at the BSD sources for sigaction(). As I mentioned previously, I looked at the disassembled version of sigaction on OS X & found it to be quite lengthy compared to other syscalls & filled with magic numbers. Hopefully looking at the C code will make clear what it's doing that causes it to work. In the end, it could be something as simple as passing in the wrong sigaction struct (there are several of them) or failing to set some bit somewhere.

Unexpected GCC inline ASM behaviour (clobbered variable overwritten)

On my computer, the compiled executable appears to skip "mov %2, %%ax" at the top of the loop when "add %1, %%ax" is uncommented.
Can anyone double-check this or comment?
#include <stdio.h>
int main() {
short unsigned result, low ,high;
low = 0;
high = 1;
__asm__ (
"movl $10, %%ecx \n\t"
"loop: mov %2, %%ax \n\t"
// "add %1, %%ax \n\t" // uncomment and result = 10
"mov %%ax, %0 \n\t"
"subl $1, %%ecx \n\t"
"jnz loop"
: "=r" (result)
: "r" (low) , "r" (high)
: "%ecx" ,"%eax" );
printf("%d\n", result);
return 0;
}
The generated assembly follows:
movl $1, %esi
xorl %edx, %edx
/APP
movl $10 ,%ecx
loop: mov %si, %ax
mov %dx, %bx
add %bx, %ax
mov %ax, %dx
subl $1, %ecx
jnz loop
/NO_APP
Thanks to Jester, the solution is:
: "=&r" (result) // early clobber modifier
GCC inline assembly is advanced programming, with a lot of pitfalls. Make sure you actually need it and can't replace it with a standalone assembly module, or with C code using intrinsics or vector support.
If you insist on inline assembly, you should be prepared to at least look at the generated assembly code and try to figure out any mistakes from there. Obviously the compiler does not omit anything that you write into the asm block, it just substitutes the arguments. If you look at the generated code, you might see something like this:
add %dx, %ax
mov %ax, %dx
Apparently the compiler picked dx for both argument 0 and 1. It is allowed to do that, because by default it assumes that the input arguments are consumed before any outputs are written. To signal that this is not the case, you must use an early clobber modifier for your output operand, so it would look like "=&r".
PS: Even when inline assembly seems to work, it may have hidden problems that will bite you another day, when the compiler happens to make other choices. You should really avoid it.

gcc compilation and assemblying

I am trying to create an executable with gcc. I have two files: virtualstack.c (the C code below) and stack.s, which contains Intel x86 assembly code written in AT&T syntax (shown below the C code). My command line is gcc -c virtualstack.c -s stack.s, but I get two errors (line 3 in stack.s): "missing symbol name in directive" and "no such instruction: _stack_create". I thought I had correctly declared the C function in assembly by prefixing it with an underscore (_). I would be very grateful for any comments.
C code:
#include <stdio.h>
#include <stdlib.h>
extern void stack_create(void);
int main(void)
{
stack_create();
return 0;
}
Assembly code:
.global _stack_create
.type, @function
_stack_create
pushl %ebp
movl $5, %esp
movl %esp, ebp
movl $21, %edx
pushl %edx
I will try to explain a method for investigating such cases.
1) It is always a good idea to make the compiler work for you. So let's start with this code (let's call it assemble.c):
#include <stdio.h>
#include <stdlib.h>
/* stub stuff */
void __attribute__ ((noinline))
stack_create(void) { }
int
main(void)
{
stack_create();
return 0;
}
Now compile it to assembly with gcc -S -g0 assemble.c. The stack_create function was assembled to the following (your results may differ, so please follow these steps yourself):
.text
.globl stack_create
.type stack_create, @function
stack_create:
pushq %rbp
movq %rsp, %rbp
popq %rbp
ret
.size stack_create, .-stack_create
2) Now all you need is to take this template and fill it with your stuff:
.text
.globl stack_create
.type stack_create, @function
stack_create:
pushq %rbp
movq %rsp, %rbp
# Go and put your favorite stuff here!
# (pushl is the 32-bit form; in a 64-bit build use pushq with 64-bit registers)
pushl %ebp
movl $5, %esp
movl %esp, %ebp
movl $21, %edx
pushl %edx
... etc ...
popq %rbp
ret
.size stack_create, .-stack_create
And of course put it in a separate .s file, say stack.s.
3) Now let's compile everything together. Remove the stub from assemble.c and compile as:
gcc assemble.c stack.s
I got no errors, and I believe you will get none either.
The main lesson: don't try to write details like sections, function labels, etc. by hand in assembler. The compiler knows better how to do that, so use its knowledge instead.

Why is GCC std::atomic increment generating inefficient non-atomic assembly?

I've been using gcc's Intel-compatible builtins (like __sync_fetch_and_add) for quite some time, using my own atomic template. The "__sync" functions are now officially considered "legacy".
C++11 supports std::atomic<> and its descendants, so it seems reasonable to use that instead, since it makes my code standard compliant, and the compiler will produce the best code either way, in a platform independent manner, that is almost too good to be true.
Incidentally, I'd only have to text-replace atomic with std::atomic, too. There's a lot in std::atomic (re: memory models) that I don't really need, but default parameters take care of that.
Now for the bad news. As it turns out, the generated code is, from what I can tell, ... utter crap, and not even atomic at all. Even a minimum example that increments a single atomic variable and outputs it has no fewer than 5 non-inlined function calls to ___atomic_flag_for_address, ___atomic_flag_wait_explicit, and __atomic_flag_clear_explicit (fully optimized), and on the other hand, there is not a single atomic instruction in the generated executable.
What gives? There is of course always the possibility of a compiler bug, but with the huge number of reviewers and users, such rather drastic things are generally unlikely to go unnoticed. Which means, this is probably not a bug, but intended behaviour.
What is the "rationale" behind so many function calls, and how is atomicity implemented without atomicity?
As-simple-as-it-can-get example:
#include <atomic>
int main()
{
std::atomic_int a(5);
++a;
__builtin_printf("%d", (int)a);
return 0;
}
produces the following .s:
movl $5, 28(%esp) #, a._M_i
movl %eax, (%esp) # tmp64,
call ___atomic_flag_for_address #
movl $5, 4(%esp) #,
movl %eax, %ebx #, __g
movl %eax, (%esp) # __g,
call ___atomic_flag_wait_explicit #
movl %ebx, (%esp) # __g,
addl $1, 28(%esp) #, MEM[(__i_type *)&a]
movl $5, 4(%esp) #,
call _atomic_flag_clear_explicit #
movl %ebx, (%esp) # __g,
movl $5, 4(%esp) #,
call ___atomic_flag_wait_explicit #
movl 28(%esp), %esi # MEM[(const __i_type *)&a], __r
movl %ebx, (%esp) # __g,
movl $5, 4(%esp) #,
call _atomic_flag_clear_explicit #
movl $LC0, (%esp) #,
movl %esi, 4(%esp) # __r,
call _printf #
(...)
.def ___atomic_flag_for_address; .scl 2; .type 32; .endef
.def ___atomic_flag_wait_explicit; .scl 2; .type 32; .endef
.def _atomic_flag_clear_explicit; .scl 2; .type 32; .endef
... and the mentioned functions look e.g. like this in objdump:
004013c4 <__atomic_flag_for_address>:
mov 0x4(%esp),%edx
mov %edx,%ecx
shr $0x2,%ecx
mov %edx,%eax
shl $0x4,%eax
add %ecx,%eax
add %edx,%eax
mov %eax,%ecx
shr $0x7,%ecx
mov %eax,%edx
shl $0x5,%edx
add %ecx,%edx
add %edx,%eax
mov %eax,%edx
shr $0x11,%edx
add %edx,%eax
and $0xf,%eax
add $0x405020,%eax
ret
The others are somewhat simpler, but I don't find a single instruction that would really be atomic (other than some spurious xchg which are atomic on X86, but these seem to be rather NOP/padding, since it's xchg %ax,%ax following ret).
I'm absolutely not sure what such a rather complicated function is needed for, and how it's meant to make anything atomic.
It is an inadequate compiler build.
Check your c++config.h. It should look like this, but it doesn't:
/* Define if builtin atomic operations for bool are supported on this host. */
#define _GLIBCXX_ATOMIC_BUILTINS_1 1
/* Define if builtin atomic operations for short are supported on this host.
*/
#define _GLIBCXX_ATOMIC_BUILTINS_2 1
/* Define if builtin atomic operations for int are supported on this host. */
#define _GLIBCXX_ATOMIC_BUILTINS_4 1
/* Define if builtin atomic operations for long long are supported on this
host. */
#define _GLIBCXX_ATOMIC_BUILTINS_8 1
These macros are defined or not depending on configure tests, which check host machine support for __sync_XXX functions. These tests are in libstdc++v3/acinclude.m4, AC_DEFUN([GLIBCXX_ENABLE_ATOMIC_BUILTINS] ....
On your installation, it's evident from the MEM[(__i_type *)&a] put in the assembly file by -fverbose-asm that the compiler uses macros from atomic_0.h, for example:
#define _ATOMIC_LOAD_(__a, __x) \
({typedef __typeof__(_ATOMIC_MEMBER_) __i_type; \
__i_type* __p = &_ATOMIC_MEMBER_; \
__atomic_flag_base* __g = __atomic_flag_for_address(__p); \
__atomic_flag_wait_explicit(__g, __x); \
__i_type __r = *__p; \
atomic_flag_clear_explicit(__g, __x); \
__r; })
With a properly built compiler, with your example program, c++ -m32 -std=c++0x -S -O2 -march=core2 -fverbose-asm should produce something like this:
movl $5, 28(%esp) #, a.D.5442._M_i
lock addl $1, 28(%esp) #,
mfence
movl 28(%esp), %eax # MEM[(const struct __atomic_base *)&a].D.5442._M_i, __ret
mfence
movl $.LC0, (%esp) #,
movl %eax, 4(%esp) # __ret,
call printf #
There are two implementations. One that uses the __sync primitives and one that does not. Plus a mixture of the two that only uses some of those primitives. Which is selected depends on macros _GLIBCXX_ATOMIC_BUILTINS_1, _GLIBCXX_ATOMIC_BUILTINS_2, _GLIBCXX_ATOMIC_BUILTINS_4 and _GLIBCXX_ATOMIC_BUILTINS_8.
At least the first one is needed for the mixed implementation; all are needed for the fully atomic one. It seems that whether they are defined depends on the target machine (they may not be defined for -march=i386 and should be defined for -march=i686).

gcc inline asm embedding pointer to .rodata in .text, x86

I'm trying to embed a pointer to a string in the code section using inline assembler, but gcc is adding a $ to the start of the symbol name, causing a link error. Here is a minimal example:
static const char str[] = "bar";
int main()
{
__asm__ __volatile__
(
"jmp 0f\n\t"
".long %0\n\t"
"0:"
:
: "i" ( str )
);
return 0;
}
building with
gcc -Wall -save-temps test.c -o test
gives the error
test.o: In function `main':
test.c:(.text+0x6): undefined reference to `$str'
Looking at the .s temp file, I can see the additional $ prepended to str:
.file "test.c"
.section .rodata
.type str, @object
.size str, 4
str:
.string "bar"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
#APP
# 4 "test.c" 1
jmp 0f
.long $str
0:
# 0 "" 2
#NO_APP
movl $0, %eax
leave
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5"
.section .note.GNU-stack,"",@progbits
I think I am doing this the correct way, as the same approach works with gcc on ppc:
<clip>
b 0f
.long str
0:
</clip>
Then again, maybe it is just "luck" that it works for ppc. Is the issue that $ is used as the prefix for immediates in AT&T syntax?
In this simple example I can work around the issue by hardcoding the symbol name, "str", in the inline assembler, but I really need it to be an input constraint.
Does anyone have any ideas on how to get this working on x86 targets?
Thanks,
- Luke
The same thing happens using clang, probably because the code generator doesn't know the operand is being used in a .long rather than as an immediate instruction operand. You could try something like:
const char str[] = "bar";
#define string(str) __asm__ __volatile__ \
( \
"jmp 0f\n\t" \
".long " #str "\n\t" \
"0:" \
)
int main()
{
string(str);
return 0;
}
(I had to remove the "static" on str because the compiler optimized it out as not being referenced.)
