gcc with intel x86-32 bit assembly : accessing C function arguments - gcc

I am doing an operating system implementation work.
Here's the code first :
//generate software interrupt
void generate_interrupt(int n) {
asm("mov al, byte ptr [n]");
asm("mov byte ptr [genint+1], al");
asm("jmp genint");
asm("genint:");
asm("int 0");
}
I am compiling above code with -masm=intel option in gcc. Also,
this is not complete code to generate software interrupt.
My problem is I am getting error as n undefined, how do I resolve it, please help?
Also it promts error at link time not at compile time, below is an image

When you are using GCC, you must use GCC-style extended asm to access variables declared in C, even if you are using Intel assembly syntax. The ability to write C variable names directly into an assembly insert is a feature of MSVC, which GCC does not copy.
For constructs like this, it is also important to use a single assembly insert, not several in a row; GCC can and will rearrange assembly inserts relative to the surrounding code, including relative to other assembly inserts, unless you take specific steps to prevent it.
This particular construct should be written
void generate_interrupt(unsigned char n)
{
asm ("mov byte ptr [1f+1], %0\n\t"
"jmp 1f\n"
"1:\n\t"
"int 0"
: /* no outputs */ : "r" (n));
}
Note that I have removed the initial mov and any insistence on involving the A register, instead telling GCC to load n into any convenient register for me with the "r" input constraint. It is best to do as little as possible in an assembly insert, and to leave the choice of registers to the compiler as much as possible.
I have also changed the type of n to unsigned char to match the actual requirements of the INT instruction, and I am using the 1f local label syntax so that this works correctly if generate_interrupt is made an inline function.
Having said all that, I implore you to find an implementation strategy for your operating system that does not involve self-modifying code. Well, unless you plan to get a whole lot more use out of the self-modifications, anyway.

This isn't an answer to your specific question about passing parameters into inline assembly (see #zwol's answer). This addresses using self modifying code unnecessarily for this particular task.
Macro Method if Interrupt Numbers are Known at Compile-time
An alternative to using self modifying code is to create a C macro that generates the specific interrupt you want. One trick is you need to a macro that converts a number to a string. Stringize macros are quite common and documented in the GCC documentation.
You could create a macro GENERATE_INTERRUPT that looks like this:
#define STRINGIZE_INTERNAL(s) #s
#define STRINGIZE(s) STRINGIZE_INTERNAL(s)
#define GENERATE_INTERRUPT(n) asm ("int " STRINGIZE(n));
STRINGIZE will take a numeric value and convert it into a string. GENERATE_INTERRUPT simply takes the number, converts it to a string and appends it to the end of the of the INT instruction.
You use it like this:
GENERATE_INTERRUPT(0);
GENERATE_INTERRUPT(3);
GENERATE_INTERRUPT(255);
The generated instructions should look like:
int 0x0
int3
int 0xff
Jump Table Method if Interrupt Numbers are Known Only at Run-time
If you need to call interrupts only known at run-time then one can create a table of interrupt calls (using int instruction) followed by a ret. generate_interrupt would then simply retrieve the interrupt number off the stack, compute the position in the table where the specific int can be found and jmp to it.
In the following code I get GNU assembler to generate the table of 256 interrupt call each followed by a ret using the .rept directive. Each code fragment fits in 4 bytes. The result code generation and the generate_interrupt function could look like:
/* We use GNU assembly to create a table of interrupt calls followed by a ret
* using the .rept directive. 256 entries (0 to 255) are generated.
* generate_interrupt is a simple function that takes the interrupt number
* as a parameter, computes the offset in the interrupt table and jumps to it.
* The specific interrupted needed will be called followed by a RET to return
* back from the function */
extern void generate_interrupt(unsigned char int_no);
asm (".pushsection .text\n\t"
/* Generate the table of interrupt calls */
".align 4\n"
"int_jmp_table:\n\t"
"intno=0\n\t"
".rept 256\n\t"
"\tint intno\n\t"
"\tret\n\t"
"\t.align 4\n\t"
"\tintno=intno+1\n\t"
".endr\n\t"
/* generate_interrupt function */
".global generate_interrupt\n" /* Give this function global visibility */
"generate_interrupt:\n\t"
#ifdef __x86_64__
"movzx edi, dil\n\t" /* Zero extend int_no (in DIL) across RDI */
"lea rax, int_jmp_table[rip]\n\t" /* Get base of interrupt jmp table */
"lea rax, [rax+rdi*4]\n\t" /* Add table base to offset = jmp address */
"jmp rax\n\t" /* Do sepcified interrupt */
#else
"movzx eax, byte ptr 4[esp]\n\t" /* Get Zero extend int_no (arg1 on stack) */
"lea eax, int_jmp_table[eax*4]\n\t" /* Compute jump address */
"jmp eax\n\t" /* Do specified interrupt */
#endif
".popsection");
int main()
{
generate_interrupt (0);
generate_interrupt (3);
generate_interrupt (255);
}
If you were to look at the generated code in the object file you'd find the interrupt call table (int_jmp_table) looks similar to this:
00000000 <int_jmp_table>:
0: cd 00 int 0x0
2: c3 ret
3: 90 nop
4: cd 01 int 0x1
6: c3 ret
7: 90 nop
8: cd 02 int 0x2
a: c3 ret
b: 90 nop
c: cc int3
d: c3 ret
e: 66 90 xchg ax,ax
10: cd 04 int 0x4
12: c3 ret
13: 90 nop
...
[snip]
Because I used .align 4 each entry is padded out to 4 bytes. This makes the address calculation for the jmp easier.

Related

How to have GCC combine "move r10, r3; store r10" into a "store r3"?

I'm working Power9 and utilizing the hardware random number generator instruction called DARN. I have the following inline assembly:
uint64_t val;
__asm__ __volatile__ (
"xor 3,3,3 \n" // r3 = 0
"addi 4,3,-1 \n" // r4 = -1, failure
"1: \n"
".byte 0xe6, 0x05, 0x61, 0x7c \n" // r3 = darn 3, 1
"cmpd 3,4 \n" // r3 == -1?
"beq 1b \n" // retry on failure
"mr %0,3 \n" // val = r3
: "=g" (val) : : "r3", "r4", "cc"
);
I had to add a mr %0,3 with "=g" (val) because I could not get GCC to produce expected code with "=r3" (val). Also see Error: matching constraint not valid in output operand.
A disassembly shows:
(gdb) b darn.cpp : 36
(gdb) r v
...
Breakpoint 1, DARN::GenerateBlock (this=<optimized out>,
output=0x7fffffffd990 "\b", size=0x100) at darn.cpp:77
77 DARN64(output+i*8);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.ppc64le libgcc-4.8.5-28.el7_5.1.ppc64le libstdc++-4.8.5-28.el7_5.1.ppc64le
(gdb) disass
Dump of assembler code for function DARN::GenerateBlock(unsigned char*, unsigned long):
...
0x00000000102442b0 <+48>: addi r10,r8,-8
0x00000000102442b4 <+52>: rldicl r10,r10,61,3
0x00000000102442b8 <+56>: addi r10,r10,1
0x00000000102442bc <+60>: mtctr r10
=> 0x00000000102442c0 <+64>: xor r3,r3,r3
0x00000000102442c4 <+68>: addi r4,r3,-1
0x00000000102442c8 <+72>: darn r3,1
0x00000000102442cc <+76>: cmpd r3,r4
0x00000000102442d0 <+80>: beq 0x102442c8 <DARN::GenerateBlock(unsigned char*, unsigned long)+72>
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
Notice GCC faithfully reproduces the:
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
How do I get GCC to fold the two instructions into:
0x00000000102442d8 <+84>: stdu r3,8(r9)
GCC will never remove text that's part of the asm template; it doesn't even parse it other than substituting in for %operand. It's literally just a text substitution before the asm is sent to the assembler.
You have to leave out the mr from your inline asm template, and tell gcc that your output is in r3 (or use a memory-destination output operand, but don't do that). If your inline-asm template ever starts or ends with mov instructions, you're usually doing it wrong.
Use register uint64_t foo asm("r3"); to force "=r"(foo) to pick r3 on platforms that don't have specific-register constraints.
(Despite ISO C++17 removing the register keyword, this GNU extension still works with -std=c++17. You can also use register uint64_t foo __asm__("r3"); if you want to avoid the asm keyword. You probably still need to treat register as a reserved word in source that uses this extension; that's fine. ISO C++ removing it from the base language doesn't force implementations to not use it as part of an extension.)
Or better, don't hard-code a register number. Use an assembler that supports the DARN instruction. (But apparently it's so new that even up-to-date clang lacks it, and you'd only want this inline asm as a fallback for gcc too old to support the __builtin_darn() intrinsic)
Using these constraints will let you remove the register setup, too, and use foo=0 / bar=-1 before the inline asm statement, and use "+r"(foo).
But note that darn's output register is write-only. There's no need to zero r3 first. I found a copy of IBM's POWER ISA instruction set manual that is new enough to include darn here: https://wiki.raptorcs.com/w/images/c/cb/PowerISA_public.v3.0B.pdf#page=96
In fact, you don't need to loop inside the asm at all, you can leave that to the C and only wrap the one asm instruction, like inline-asm is designed for.
uint64_t random_asm() {
register uint64_t val asm("r3");
do {
//__asm__ __volatile__ ("darn 3, 1");
__asm__ __volatile__ (".byte 0x7c, 0x61, 0x05, 0xe6 # gcc asm operand = %0\n" : "=r" (val));
} while(val == -1ULL);
return val;
}
compiles cleanly (on the Godbolt compiler explorer) to
random_asm():
.L6: # compiler-generated label, no risk of name clashes
.byte 0x7c, 0x61, 0x05, 0xe6 # gcc asm operand = 3
cmpdi 7,3,-1 # compare-immediate
beq 7,.L6
blr
Just as tight as your loop, with less setup. (Are you sure you even need to zero r3 before the asm instruction?)
This function can inline anywhere you want it to, allowing gcc to emit a store instruction that reads r3 directly.
In practice, you'll want to use a retry counter, as advised in the manual: if the hardware RNG is broken, it might give you failure forever so you should have a fallback to a PRNG. (Same for x86's rdrand)
Deliver A Random Number (darn) - Programming Note
When the error value is obtained, software is
expected to repeat the operation. If a non-error
value has not been obtained after several attempts,
a software random number generation method
should be used. The recommended number of
attempts may be implementation specific. In the
absence of other guidance, ten attempts should be
adequate.
xor-zeroing is not efficient on most fixed-instruction-width ISAs, because a mov-immediate is just as short so there's no need to detect and special-case an xor. (And thus CPU designs don't spend transistors on it). Moreover, dependency rules for the PPC asm equivalent of C++11 std::memory_order_consume require it to carry a dependency on the input register, so it couldn't be dependency-breaking even if the designers wanted it to. xor-zeroing is only a thing on x86 and maybe a few other variable-width ISAs.
Use li r3, 0 like gcc does for int foo(){return 0;} https://godbolt.org/z/-gHI4C.

What is the role of the clobber list? [duplicate]

This function "strcpy" aims to copy the content of src to dest, and it works out just fine: display two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried to change the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), the error occurred: ‘asm’ operand has impossible constraints. I just cannot understand. I thought that "0"/"1" here specified the same constraint as the 0th/1th output variable. the constraint of 0th output is =&S, te constraint of 1th output is =&D. If I change 0-->S, 1-->D, there shouldn't be any wrong. What's the matter with it?
Does "clobbered registers" or the earlyclobber operand(&) have any use? I try to remove "&" or "memory", the result of either circumstance is the same as the original one: output two lines of "Hello_src" strings. So why should I use the "clobbered" things?
The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. Apparently using the 0/1 style overrides that behavior.
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.

Calling printf in extended inline ASM

I'm trying to output the same string twice in extended inline ASM in GCC, on 64-bit Linux.
int main()
{
const char* test = "test\n";
asm(
"movq %[test], %%rdi\n" // Debugger shows rdi = *address of string*
"movq $0, %%rax\n"
"push %%rbp\n"
"push %%rbx\n"
"call printf\n"
"pop %%rbx\n"
"pop %%rbp\n"
"movq %[test], %%rdi\n" // Debugger shows rdi = 0
"movq $0, %%rax\n"
"push %%rbp\n"
"push %%rbx\n"
"call printf\n"
"pop %%rbx\n"
"pop %%rbp\n"
:
: [test] "g" (test)
: "rax", "rbx","rcx", "rdx", "rdi", "rsi", "rsp"
);
return 0;
}
Now, the string is outputted only once. I have tried many things, but I guess I am missing some caveats about the calling convention. I'm not even sure if the clobber list is correct or if I need to save and restore RBP and RBX at all.
Why is the string not outputted twice?
Looking with a debugger shows me that somehow when the string is loaded into rdi for the second time it has the value 0 instead of the actual address of the string.
I cannot explain why, it seems like after the first call the stack is corrupted? Do I have to restore it in some way?
Specific problem to your code: RDI is not maintained across a function call (see below). It is correct before the first call to printf but is clobbered by printf. You'll need to temporarily store it elsewhere first. A register that isn't clobbered will be convenient. You can then save a copy before printf, and copy it back to RDI after.
I do not recommend doing what you are suggesting (making function calls in inline assembler). It will be very difficult for the compiler to optimize things. It is very easy to get things wrong. David Wohlferd wrote a very good article on reasons not to use inline assembly unless absolutely necessary.
Among other things the 64-bit System V ABI mandates a 128-byte red zone. That means you can't push anything onto the stack without potential corruption. Remember: doing a CALL pushes a return address on the stack. Quick and dirty way to resolve this problem is to subtract 128 from RSP when your inline assembler starts and then add 128 back when finished.
The 128-byte area beyond the location pointed to by %rsp is considered to
be reserved and shall not be modified by signal or interrupt handlers.8 Therefore,
functions may use this area for temporary data that is not needed across function
calls. In particular, leaf functions may use this area for their entire stack frame,
rather than adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.
Another issue to be concerned about is the requirement for the stack to be 16-byte aligned (or possibly 32-byte aligned depending on the parameters) prior to any function call. This is required by the 64-bit ABI as well:
The end of the input argument area shall be aligned on a 16 (32, if __m256 is
passed on stack) byte boundary. In other words, the value (%rsp + 8) is always
a multiple of 16 (32) when control is transferred to the function entry point.
Note: This requirement for 16-byte alignment upon a CALL to a function is also required on 32-bit Linux for GCC >= 4.5:
In context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)
Since we call printf in inline assembler we should ensure that we align the stack to a 16-byte boundary before making the call.
You also have to be aware that when calling a function some registers are preserved across a function call and some are not. Specifically those that may be clobbered by a function call are listed in Figure 3.4 of the 64-bit ABI (see previous link). Those registers are RAX, RCX, RDX, RD8-RD11, XMM0-XMM15, MMX0-MMX7, ST0-ST7 . These are all potentially destroyed so should be put in the clobber list if they don't appear in the input and output constraints.
The following code should satisfy most of the conditions to ensure that inline assembler that calls another function will not inadvertently clobber registers, preserves the redzone, and maintains 16-byte alignment before a call:
int main()
{
const char* test = "test\n";
long dummyreg; /* dummyreg used to allow GCC to pick available register */
__asm__ __volatile__ (
"add $-128, %%rsp\n\t" /* Skip the current redzone */
"mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
"and $-16, %%rsp\n\t" /* Align stack to 16-byte boundary */
"mov %[test], %%rdi\n\t" /* RDI is address of string */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[test], %%rdi\n\t" /* RDI is address of string again */
"xor %%eax, %%eax\n\t" /* Variadic function set AL. This case 0 */
"call printf\n\t"
"mov %[temp], %%rsp\n\t" /* Restore RSP */
"sub $-128, %%rsp\n\t" /* Add 128 to RSP to restore to orig */
: [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
before all inputs consumed so use & for early clobber*/
: [test]"r"(test), /* Choose available register as input operand */
"m"(test) /* Dummy constraint to make sure test array
is fully realized in memory before inline
assembly is executed */
: "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
"xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
"xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
"mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm6",
"st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
);
return 0;
}
I used an input constraint to allow the template to choose an available register to be used to pass the str address through. This ensures that we have a register to store the str address between the calls to printf. I also get the assembler template to choose an available location for storing RSP temporarily by using a dummy register. The registers chosen will not include any one already chosen/listed as an input/output/clobber operand.
This looks very messy, but failure to do it correctly could lead to problems later as you program becomes more complex. This is why calling functions that conform to the System V 64-bit ABI within inline assembler is generally not the best way to do things.

gcc arm -- ensuring args are retained when inlining functions with inline asm statements

I have a series of functions that are ultimately implemented with an SVC call. For instance:
void func(int arg) {
asm volatile ("svc #123");
}
as you might imagine, the SVC operates on 'arg' which is presumably in a register. if i explictly add a 'noinline' attribute to the definition, everything works as you'd expect.
but, were the function inlined at a higher optimization level, the code that loads 'arg' into a register would be omitted -- as there is apprently no reference to 'arg'.
I've tried adding a 'used' attribute to the declaration of 'arg' itself -- but gcc apparently yields a warning in this case.
I've also tried adding "dummy" asm statements such as
asm ("" : "=r"(arg));
But this didn't appear to work in general. (maybe i need to say volatile here as well???)
Anyway, it seems unfortunate to have an explicit function call for a routine whose body essentially consists of one asm statement.
A relevant recipe is in the GCC manual, in Assembler Instructions with C Expression Operands section, that uses sysint with the same role of your svc instruction. The idea is to define a local register variable with a specified register, and then use extended asmsyntax to add inputs and outputs to the inline assembly block.
I tried to compile the following code:
#include <stdint.h>
__attribute__((always_inline))
uint32_t func(uint32_t arg) {
register uint32_t r0 asm("r0") = arg;
register uint32_t result asm("r0");
asm volatile ("svc #123":"=r" (result) : "0" (r0));
return result;
}
uint32_t foo(void) {
return func(2);
}
This is the disassembly of the compiled (with -O2 flag) object file:
00000000 <func>:
0: ef00007b svc 0x0000007b
4: e12fff1e bx lr
00000008 <foo>:
8: e3a00002 mov r0, #2
c: ef00007b svc 0x0000007b
10: e12fff1e bx lr
func is expanded inline and the argument is put in r0 correctly. I believe volatile is necessary, because if you don't make use of the return value of the service call, then the compiler might assume that the assembly piece of code is not necessary.
You should have a single asm block, compiler is still free to treat two asm blocks individually until otherwise specified. Meaning requirements put on second asm block won't have any effect on the first one.
You are assuming registers will be in their right places because of the calling convention.
What about something like this? (didn't test)
void func(int arg) {
asm volatile (
"mov r0, %[code]\n\t"
"svc #123"
:
: [code]"r" (code)
);
}
For more information, see ARM GCC Inline Assembler Cookbook.

Rewrite Intel-style assembly code into GCC inline assembly

How to write this assembly code as inline assembly? Compiler: gcc(i586-elf-gcc). The GAS syntax confuses me. Please give tell me how to write this as inline assembly that works for gcc.
.set_video_mode:
mov ah,00h
mov al,13h
int 10h
.init_mouse:
mov ax,0
int 33h
Similar one I have in assembly. I wrote them separate as assembly routines to call them from my C program. I need to call these and some more interrupts from C itself.
Also I need to put some values in some registers depending on which interrupt routine I'm calling. Please tell me how to do it.
All that I want to do is call interrupt routines from C. It's OK for me even to do it using int86() but i don't have source code of that function.
I want int86() so that i can call interrupts from C.
I am developing my own tiny OS so i got no restrictions for calling interrupts or for any direct hardware access.
I've not tested this, but it should get you started:
void set_video_mode (int x, int y) {
register int ah asm ("ah") = x;
register int al asm ("al") = y;
asm volatile ("int $0x10"
: /* no outputs */
: /* no inputs */
: /* clobbers */ "ah", "al");
}
I've put in two 'clobbers' as an example, but you'll need to set the correct list of clobbers so that the compiler knows you've overwritten register values (maybe none).
First, keep in mind GCC doesn't support 16-bit code yet, so you'll end up compiling 32-bit code in 16-bit mode, which is very inefficient but doable (it is used, for example, by Linux and SeaBIOS). It can be done with the following at the begging of each file:
__asm__ (".code16gcc");
Newer GCC versions (since 4.9 IIRC) support the -m16 flag that does the same thing.
Also, there's no mouse driver available unless you load it previous to your kernel running init_mouse.
You seem to be using an API commonly available in several x86 DOS.
asm can take care of the register assignments, so the code can be reduced to:
void set_video_mode(int mode)
{
mode &= 255;
__asm__ __volatile__ (
"int $0x10"
: "+a" (mode) /* %eax = mode & 255 => %ah = 0, %al = mode */
);
}
void init_mouse(void)
{
/* XXX it is really important to check the IDT entry isn't 0 */
int tmp = 0;
__asm__ __volatile__ (
"int $0x33"
: "+a" (tmp) /* %eax = 0*/
:: "ebx" /* %ebx is also clobbered by DOS mouse drivers */
);
}
The asm statement is documented in the GCC manual, although perhaps not in enough depth and lacks x86 examples. The outputs (after first colon) have a distinctively obscure syntax, while the rest is far easier to understand (the second colon specifies the inputs and the third the clobbered registers, flags and/or memory).
The outputs must be prefixed with =, meaning you don't care the previous value it may have had, or +, meaning you want to use it as an input too. In this context we use that instead of an input because the value is modified by the interrupt and you're not allowed to specify input registers in the clobbered list (because the compiler is forbidden from using them).

Resources