I'm trying to use .ascii directive in the gcc extended asm command but I keep getting compiler errors. What is the exact syntax for directives inside extended asm?
I tried the following options, but none of them worked:
asm ("NOP;"
".ASCII ""ABC"""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
".ASCII "ABC""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
.ASCII "ABC"
);
I got "error: expected ‘:’ or ‘)’ before ‘/’ token"
The syntax for directives inside the asm is identical to writing GNU Assembler, so you can reference the GNU Assembler manual for the relevant syntax.
Example:
#include <stdio.h>
int
main (void)
{
char *string;
asm (".pushsection .rodata\n"
"0:\n"
" .ascii \"Testing 1 2 3!\\0\"\n" /* .ascii adds no NUL terminator; puts needs one */
" .popsection\n"
" mov $0b, %0\n":"=rm" (string));
puts (string);
}
In the example we use an extended asm to copy the address of a string to a char * and then pass that to puts to print the string.
The string needs to be placed into the appropriate linker section, not just added to the current one (usually the code section, i.e. .text). So you begin with .pushsection, which saves the current section on the assembler's section stack and switches to the section you want the string stored in. In this example that is the read-only data section (.rodata), where most strings live. Then .popsection takes you back to whatever section the compiler left you in, and you do your operation with the string address. The trick is to use a local label like 0 to reference the string and let the assembler and linker compute the offset for you. This may require more work if you are building PIE or PIC code, depending on how much more complicated your references become or whether they require relocations.
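For the PIE/PIC case mentioned above, one option on x86-64 is to take the address with a RIP-relative lea instead of an absolute immediate. The following is only a minimal sketch under stated assumptions: x86-64, an ELF target, and AT&T syntax (the default). The names embedded_string and print_embedded are hypothetical, and .asciz is used so the string is NUL-terminated. On other targets the sketch falls back to a plain C string literal.

```c
#include <stdio.h>

/* Hypothetical helper: returns the address of a string that the asm
   statement itself emits into .rodata. */
static const char *embedded_string(void)
{
#if defined(__x86_64__) && defined(__ELF__)
    const char *s;
    __asm__ (".pushsection .rodata\n"          /* save current section, switch to .rodata */
             "0:\n"                            /* local label marking the string */
             "\t.asciz \"Testing 1 2 3!\"\n"   /* .asciz appends the NUL terminator */
             "\t.popsection\n"                 /* back to the compiler's section */
             "\tlea 0b(%%rip), %0\n"           /* RIP-relative, so no absolute address */
             : "=r" (s));
    return s;
#else
    /* Assumption: on other targets just use an ordinary literal. */
    return "Testing 1 2 3!";
#endif
}

/* Hypothetical wrapper mirroring the puts call in the example above. */
static void print_embedded(void)
{
    puts(embedded_string());
}
```

Calling print_embedded() from main should print the string. The "=r" constraint (rather than "=rm") is deliberate here, because lea needs a register destination.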
Related
I'm following along in some books on assembler and find that they happily use fprintf with stdout. They simply reference stdout as a known symbol. I tried to do that in my code, and the linker complains that stdout is not found. I tried variation on spelling including a leading underscore. I also disassembled printf. It sets up a call to vfprintf using the following line:
leaq 0x41879948(%rip), %rax ; __stdoutp
So it seems like I should be able to do something like:
leaq __stdoutp(%rip), %rdi
Didn't work. Linker complains of undefined symbol. Tried with variations. Left off the p suffix, one or both of the underscores. Nothing worked.
Any ideas, or insider knowledge I can follow to access to this symbol?
Based on @PeterCordes' suggestion of looking at the compiler-generated asm, I wrote a C program:
#include <stdio.h>
int main() { fprintf(stdout, "Hello, world!\n"); }
I ran clang -S main.c on it and had a look at the asm. Here is how the clang-generated asm accesses the stdout global:
movq ___stdoutp@GOTPCREL(%rip), %rax
movq (%rax), %rdi
It turns out this is stated in the ABI (section 3.5.4):
Position-independent code cannot contain absolute address. To access a global symbol the address of the symbol has to be loaded from the Global Offset Table. The address of the entry in the GOT can be obtained with a %rip-relative instruction in the small model.
stdout in the C code refers to the __stdoutp symbol. This becomes ___stdoutp in assembler because macOS prefixes C symbols with an underscore, similar to how main becomes _main.
ABI section 5.2 explains the GOT this way:
Global offset tables hold absolute addresses in private data, thus making the addresses available without compromising the position-independence and shareability of a program’s text.
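The same two-step access (load the GOT entry's contents with a %rip-relative instruction, then dereference) can be sketched from C with inline asm. This is an illustration under assumptions, not a recommended way to get stdout: the function name stdout_via_got is hypothetical, the Linux branch assumes the C library exports the variable under the symbol name stdout, and the macOS branch uses the ___stdoutp name shown above. Anything else falls back to plain C.

```c
#include <stdio.h>

/* Hypothetical helper: fetch the stdout pointer through its GOT entry. */
static FILE *stdout_via_got(void)
{
#if defined(__x86_64__) && defined(__linux__)
    FILE **slot;
    /* Load the address of the variable from the GOT (assumed symbol: stdout). */
    __asm__ ("movq stdout@GOTPCREL(%%rip), %0" : "=r" (slot));
    return *slot;   /* second step: read the FILE * stored there */
#elif defined(__x86_64__) && defined(__APPLE__)
    FILE **slot;
    /* Same idea with the macOS symbol name from the clang output above. */
    __asm__ ("movq ___stdoutp@GOTPCREL(%%rip), %0" : "=r" (slot));
    return *slot;
#else
    return stdout;  /* fallback: let the compiler do it */
#endif
}
```

A call such as fprintf(stdout_via_got(), "Hello, world!\n") should then behave like the fprintf(stdout, ...) call in the C program above.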
I have a trivial example C program:-
#include <stdio.h>
int main()
{
printf("hello world!");
return 1;
}
I use the following command to compile it and generate assembly:-
riscv32-unknown-elf-gcc -S hello.c -o hello.asm
Which generates the following assembly: -
.file "hello.c"
.option nopic
.section .rodata
.align 2
.LC0:
.string "hello world!"
.text
.align 2
.globl main
.type main, @function
main:
addi sp,sp,-16
sw ra,12(sp)
sw s0,8(sp)
addi s0,sp,16
lui a5,%hi(.LC0)
addi a0,a5,%lo(.LC0)
call printf
li a5,1
mv a0,a5
lw ra,12(sp)
lw s0,8(sp)
addi sp,sp,16
jr ra
.size main, .-main
.ident "GCC: (GNU) 7.2.0"
There is an expected call printf line but because there is no implementation of the printf inside this assembly file I would have expected to see it request an external implementation with something like this...
.global printf
But there is no such line in the assembly. I thought that without the global directive it meant that the linker will only try and resolve it to labels inside this single assembly file. I thought that was the whole point of the global directive, so that all the labels are local to the single assembly file unless exported using .global for access from other object files or import from another object file by also using .global.
What am I missing here?
.global would mark a label in the current file as having global scope (available to other modules). Maybe you meant .extern. Although .extern can be used to say a label is external, the directive is actually ignored by GNU Assembler. From the manual:
.extern is accepted in the source program--for compatibility with other assemblers--but it is ignored. as treats all undefined symbols as external.
as = GNU assembler.
GNU assembler assumes that any label it doesn't know about in the current file is an external reference. It is up to the linker to determine if it is undefined or not. That is why you don't see any directive marking printf as being external. In GNU assembler it just isn't necessary.
Note: Part of the confusion may be in that assemblers like NASM/YASM require an explicit extern statement to denote that a symbol is not within the local module being assembled. Those assemblers will return with an error that a symbol was undefined. This is one difference between GNU Assembler and NASM/YASM.
The .global directive doesn't import labels; it is essentially an export. It only marks labels in the current file as globally available to other modules. It is not used for importing labels from other modules. From the manual:
.global makes the symbol visible to ld. If you define symbol in your partial program, its value is made available to other partial programs that are linked with it. Otherwise, symbol takes its attributes from a symbol of the same name from another file linked into the same program.
Both spellings (‘.globl’ and ‘.global’) are accepted, for compatibility with other assemblers.
There is a .global main directive to mark main as global. Without it the linker will assume that main is essentially a static label specific to a module and not usable by other modules. The C runtime library needs access to main since main must be called as the last step of transferring control to the entry of your C code.
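Both halves of this (exporting with .globl, and undefined symbols being treated as external without any directive) can be seen in one C file. The sketch below assumes x86-64, an ELF target, and AT&T syntax; asm_answer and call_asm_answer are hypothetical names. The file-scope asm defines and exports a function, while the C caller only needs a declaration; the call the compiler emits for it carries no .extern or .globl for the callee.

```c
/* Declaration only: tells the C compiler the symbol exists somewhere. */
int asm_answer(void);

#if defined(__x86_64__) && defined(__ELF__)
/* Definition written directly in assembly. Without the .globl line the
   symbol would stay local to this object file. */
__asm__ (".pushsection .text\n"
         ".globl asm_answer\n"
         ".type asm_answer, @function\n"
         "asm_answer:\n"
         "\tmovl $42, %eax\n"      /* return value goes in eax */
         "\tret\n"
         ".size asm_answer, .-asm_answer\n"
         ".popsection\n");
#else
/* Assumption: on other targets provide the same function in plain C. */
int asm_answer(void) { return 42; }
#endif

/* The compiler emits a plain call to asm_answer here and leaves the
   resolution of the symbol to the assembler and linker. */
static int call_asm_answer(void)
{
    return asm_answer();
}
```

Compiling this with -S and searching the output for asm_answer is a quick way to confirm that the only .globl for it comes from the hand-written block.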
What is the correct gnu assembly syntax for doing the following:
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
max_buffer_sectors: .word ((0x9fc00 - $data_buffer) / 512) # <=== need help here
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
Specifically, I'm writing a toy OS. The code above is in 16-bit real mode. I'm setting up a data buffer that will be dumped back to the boot disk. I want to calculate the number of sectors there are between where data_buffer gets placed in memory, and the upper bound of that data buffer. (Address 0x9fc00 is where the buffer would run into RAM reserved for other purposes.)
I know I could write assembly code to calculate this; but, since it is a constant known at build time, I'm curious if I can get the assembler to calculate it for me.
I'm running into three specific problems:
(1) If I use $data_buffer I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: can't resolve `L0' {*ABS* section} - `$data_buffer' {*UND* section}
which I find confusing, because I should use $ when I want the memory address of a label, correct?
(2) If I use data_buffer instead of $data_buffer, I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: value of 653855 too large for field of 2 bytes at 31
make: *** [obj/boot/dd_test.o] Error 1
which seems to suggest that the assembler is complaining about the size of the intermediate value (which does not need to fit in a 16-bit word).
(3) And, of course, what's up with the missing ')'?
When you use expressions in GNU assembler they have to resolve to absolute values. GNU assembler doesn't know where the code will finally be placed in memory; that is the linker's job. Because of that, the absolute address of data_buffer isn't known until linking is done, so it is considered relocatable. If you take an absolute value like 0x9fc00 and subtract a relocatable value from it, you get a relocatable value. Relocatable values can't be used in constant (absolute) expressions.
All is not lost. The linker itself will know the absolute address once it arranges everything in memory. You seem to suggest you already use a linker script which means the work you have to do is minimal. You can use the linker to compute the value of max_buffer_sectors.
Your linker script will have a SECTIONS directive like:
SECTIONS
{
[your section contents here]
}
You can create a linker symbol max_buffer_sectors with something like:
SECTIONS
{
max_buffer_sectors = (0x9fc00 - (data_buffer)) / 512;
[your section contents here]
}
This will allow the linker to compute the size, since it will know the absolute address of data_buffer in memory.
Your GNU assembly file will need a bit of tweaking:
.globl data_buffer
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
You'll notice I used .globl data_buffer. This exports the symbol and makes it global so that the linker can use it.
You can then use the symbol max_buffer_sectors in code like:
mov $max_buffer_sectors, %ax
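As a sanity check on the arithmetic the linker performs, here is the same computation as a small C sketch. The function name is hypothetical, and the address used in the usage note is made up for illustration, since the real placement of data_buffer depends on the linker script. It also shows why the result fits in a 16-bit .word even though the intermediate byte count does not.

```c
#include <stdint.h>

#define BUFFER_LIMIT 0x9fc00u  /* upper bound of the buffer, from the question */
#define SECTOR_SIZE  512u      /* bytes per sector */

/* Hypothetical helper: whole sectors between data_buffer_addr and the limit. */
static uint16_t max_buffer_sectors_for(uint32_t data_buffer_addr)
{
    /* The byte count can need more than 16 bits ... */
    uint32_t bytes = BUFFER_LIMIT - data_buffer_addr;
    /* ... but the sector count is at most 0x9fc00 / 512 = 1278. */
    return (uint16_t)(bytes / SECTOR_SIZE);
}
```

For a made-up placement of data_buffer at 0x8000, max_buffer_sectors_for(0x8000) gives (0x9fc00 - 0x8000) / 512 = 1214.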
I have the following code, which compiles fine with gcc ./example.c. The program calls the function add_two, which simply adds two integers. To use Intel syntax within the extended assembly instructions I first need to switch to Intel syntax and then back to AT&T. According to the GCC documentation it is possible to switch to Intel syntax entirely by using gcc -masm=intel ./example.c.
Whenever I try to compile it with the switch -masm=intel it won't compile, and I don't understand why. I already tried deleting the .intel_syntax directive, but it still doesn't compile.
#include <stdio.h>
int add_two(int, int);
int main(){
int src = 3;
int dst = 5;
printf("summe = %d \n", add_two(src, dst));
return 0;
}
int add_two(int src, int dst){
int sum;
asm (
".intel_syntax;" //switch to intel syntax
"mov %0, %1;"
"add %0, %2;"
".att_syntax;" //switch to at&t syntax
: "=r" (sum) //output
: "r" (src), "r" (dst) //input
);
return sum;
}
The error message by compiling the above mentioned program with gcc -masm=intel ./example.c is:
/tmp/ccEQGI4U.s: Assembler messages:
/tmp/ccEQGI4U.s:55: Error: junk `PTR [rbp-4]' after expression
/tmp/ccEQGI4U.s:55: Error: too many memory references for `mov'
/tmp/ccEQGI4U.s:56: Error: too many memory references for `mov'
Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC and I think ICC, and with any constraints you use. Other methods don't. (See Can I use Intel syntax of x86 assembly with GCC? for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)
That also works in clang 14 and later. (Which isn't released yet but the patch is part of current trunk; see https://reviews.llvm.org/D113707).
Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. But even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect-alternatives like asm ("add {att | intel}" : ... )!
clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction. e.g. Godbolt showing clang13 -masm=intel turning add %0, 1 into add dword ptr [1], eax, but clang trunk producing add eax, 1.
Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.
Clang does support Intel-syntax inside MSVC-style asm-blocks, but that's terrible (no constraints, so inputs / outputs have to go through memory).
If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode so you can't let %0 expand to an AT&T-syntax register name.
-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel-syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's asm, hence the error messages like PTR [rbp-4] looking like junk to the assembler (which is expecting AT&T syntax).
The "too many memory references for mov" error is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx).
Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work but it's problematic. And incompatible with the preferred method of -masm=intel.
Problems with the "sandwich" method:
When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).
It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel-syntax).
Consider this function:
int foo(int x) {
asm(".intel_syntax noprefix \n\t"
"add %0, 1 \n\t"
".att_syntax"
: "+r"(x)
);
return x;
}
It assembles as follows with GCC (Godbolt):
movl %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax
The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.
On a Mac, even using real GCC the asm output has to assemble with an as that's based on clang, not GNU Binutils.
Using clang on that source code complains:
<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^
(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line the clang error message works better but the source is a mess.)
I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $ but IDK if GAS silently ignores that, too, in Intel syntax mode.
PS: your asm statement has a bug: you forgot an early-clobber on your output operand so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.
But using mov as the first or last instruction of an asm-template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r", and let the compiler worry about data movement.) If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.
void atomic_inc(int *p) {
asm( "lock add{l $1, %0 | %0, 1}"
: "+m" (*p)
:: "memory"
);
}
compiles with gcc -O2 (-masm=att is the default) to
atomic_inc(int*):
lock addl $1, (%rdi)
ret
Or with -masm=intel to:
atomic_inc(int*):
lock add DWORD PTR [rdi], 1
ret
Notice that the l suffix is required for AT&T, and the dword ptr is required for intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.
This works with clang, but only the AT&T version ever gets used.
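Applying the earlier advice to the question's add_two, here is a sketch of two possible rewrites that use dialect alternatives so they should assemble with either -masm=att or -masm=intel under GCC. The function names are hypothetical, and the x86-only guard with a plain C fallback is an assumption added for portability. The first variant wraps only the add and lets the compiler handle data movement through a "+r" operand; the second keeps the mov/add pair but marks the output as early-clobber.

```c
/* Variant 1: wrap just the add; dst is both read and written ("+r"). */
static int add_two_wrapped(int src, int dst)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("add{l %1, %0 | %0, %1}"
             : "+r" (dst)
             : "r" (src)
             : "cc");
#else
    dst += src;   /* fallback for non-x86 targets */
#endif
    return dst;
}

/* Variant 2: keep mov + add, but "=&r" stops the compiler from giving
   the output the same register as an input read after the mov. */
static int add_two_earlyclobber(int src, int dst)
{
    int sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("mov{l %1, %0 | %0, %1}\n\t"
             "add{l %2, %0 | %0, %2}"
             : "=&r" (sum)
             : "r" (src), "r" (dst)
             : "cc");
#else
    sum = src + dst;   /* fallback for non-x86 targets */
#endif
    return sum;
}
```

Both variants should return 8 for the question's inputs of 3 and 5. The first is usually preferable because it leaves register allocation and any needed mov to the compiler.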
Note that -masm= also affects the default inline assembler syntax:
Output assembly instructions using selected dialect. Also affects
which dialect is used for basic "asm" and extended "asm". Supported
choices (in dialect order) are att or intel. The default is att.
Darwin does not support intel.
That means that your first .intel_syntax directive is superfluous and the final .att_syntax is wrong because your GCC call compiles C to Intel assembler code.
IOW, either stick to -masm=intel or sandwich your inline Intel assembler code sections between .intel_syntax noprefix and .att_syntax prefix directives - but don't do both.
Note that the sandwich method isn't compatible with all inline assembler constraints - e.g. a constraint that involves m (i.e. memory operand) would insert an operand in ATT syntax which would yield an error like 'Error: junk (%rbp) after expression'. In those cases you have to use -masm=intel.
All is in the title.
For some reasons I have to do it like this.
But when I compile my code, GCC (or GAS maybe...) displays the following error:
.../Temp/cc1C1fjs.s:19: Error: immediate operand illegal with absolute jump
Code:
int main ( int argc, char **argv )
{
/* Some code */
( (void(*)()) &&label)();
/* Some code */
return 0;
label:
asm ("push %ebp");
asm ("mov %esp,%ebp");
/* Some code */
printf("Hello world");
asm ("leave");
asm("ret");
}
I'm sure that this should work, because I tried creating a thread with the CreateThread function (I'm on Windows), specifying the address of label as the entry point, and it works perfectly well.
So how can I make the compiler accept this syntax?
Or is there another way of doing that?
I don't have a solution for you, but I do have a couple of suggestions:
Run gcc -S file.c and look at line #19 to see if you can spot what the actual problem is.
Look through the rest of the (short) .s file to see if anything is obviously amiss. For example, my version of gcc seems to decide that everything after return 0 is dead code, so none of your asm code nor the printf actually make it to the assembler.
Can't this code be moved into a function? This way you'll get the prologue/epilogue for free; taking the address would also be less fraught with difficulty.
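The last suggestion can be sketched as follows. This is only an illustration with hypothetical names (label_body, invoke, label_calls): the code that lived after the label moves into an ordinary function, so the compiler generates the prologue and epilogue, and its address can be passed around as a normal function pointer.

```c
#include <stdio.h>

/* Counts how often label_body ran; only here so the effect is observable. */
static int label_calls;

/* The code that used to follow "label:"; no hand-written push/leave/ret. */
static void label_body(void)
{
    label_calls++;
    puts("Hello world");
}

/* Calls through a function pointer, replacing ((void(*)()) &&label)(). */
static void invoke(void (*entry)(void))
{
    entry();
}
```

Calling invoke(label_body) from main runs the body once and returns normally. A pointer like this can also be handed to a thread-creation API, provided the function's signature matches what that API expects.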
I fixed a part of the problem:
@aix you are right, GCC removes everything in the main function after return 0;. I fixed this by replacing it with:
asm("leave");
asm("xor %eax,%eax");
asm("ret");
Now the code after my label is generated. Running gcc -S file.c and then gcc file.s -o file.exe still displays the error, of course, and the line it points to contains call *$L2 (L2 is label in my C file). It works if I replace that with call L2. Now the code after my label and after my call in main is executed, and the program terminates properly with status 0.
But I don't want to have to do that every time I compile.
Is it normal for GCC to write call *$L2 rather than call L2?