I'm compiling a 32 bit binary but want to embed some 64 bit assembly in it.
void method() {
asm("...64 bit assembly...");
}
Of course when I compile I get errors about referring to bad registers because the registers are 64 bit.
evil.c:92: Error: bad register name `%rax'
Is it possible to add some annotations so gcc will process the asm sections using the 64bit assembler instead. I have a workaround which is compile separately, map in a page with PROT_EXEC|PROT_WRITE and copy in my code but this is very awkward.
No, this isn't possible. You can't run 64-bit assembly from a 32-bit binary, as the processor will not be in long mode while running your program.
Copying 64-bit code to an executable page will result in that code being interpreted incorrectly as 32-bit code, which will have unpredictable and undesirable results.
Don't try to put 64-bit machine-code inside a compiler-generated function. It might work since the encoding for function prologue/epilogue is the same in 32 and 64-bit, but it would be cleaner to just have a separate block of 64-bit code.
The easiest thing is probably to assemble that block in a separate file, using GAS .code64 or NASM BITS 64 to get 64-bit code in an object file you can link into a 32-bit executable.
You said in a comment you're thinking of using this for a kernel exploit against a 64-bit kernel from a 32-bit user-space process, so you just need some code bytes in an executable part of your process's memory and a way to get a pointer to that block. This is certainly plausible; if you can gain control of the kernel's RIP from a 32-bit process, this is what you want, because kernel code will always be running in long mode.
If you were doing something with 64-bit userspace code in a process that started in 32-bit mode, you could maybe far jmp to the block of 64-bit code (as #RossRidge suggests), using a known value for the kernel's __USER_CS 64-bit code segment descriptor. syscall from 64-bit code should return in 64-bit mode, but if not, try the int 0x80 ABI. It always returns to the mode you were in, saving/restoring cs and ss along with rip and rflags. (What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?)
.rodata is part of the test segment of your executable, so just get the compiler to put bytes in a const array. Fun fact: const int main = 195; compiles to a program that exits without segfaulting, because 195 = 0xc3 = the x86 encoding for ret (and x86 is little-endian). For an arbitrary-length machine-code sequence, const char funcname[] = { 0x90, 0x90, ..., 0xc3 } will work. The const is necessary, otherwise it will go in .data (read/write/noexec) instead of .rodata.
You could use const char funcname[] __attribute__((section(".text"))) = { ... }; to control what section it goes in (e.g. .text along with compiler-generated functions), or even a linker script to get more control.
If you really want to do it all in one .c file, instead of using the easier solution of a separately-assembled pure asm source:
To assemble some 64-bit code along with compiler-generated 32-bit code, use the .code64 GAS directive in an asm statement *outside of any functions. IDK if there's any guarantee on what section will be active when gcc emits your asm how gcc will mix that asm with its asm, but it won't put it in the middle of a function.
asm(".pushsection .text \n\t" // AFAIK, there's no guarantee how this will mix with compiler asm output
".code64 \n\t"
".p2align 4 \n\t"
".globl my_codebytes \n\t" // optional
"my_codebytes: \n\t"
"inc %r10d \n\t"
"my_codebytes_end: \n\t"
//"my_codebytes_len: .long . - my_codebytes\n\t" // store the length in memory. Optional
".popsection \n\t"
#ifdef __i386
".code32" // back to 32-bit interpretation for gcc's code
// "\n\t inc %r10" // uncomment to check that it *doesn't* assemble
#endif
);
#ifdef __cplusplus
extern "C" {
#endif
// put C names on the labels.
// They are *not* pointers, their addresses are link-time constants
extern char my_codebytes[], my_codebytes_end[];
//extern const unsigned my_codebytes_len;
#ifdef __cplusplus
}
#endif
// This expression for the length isn't a compile-time constant, so this isn't legal C
//static const unsigned len = &my_codebytes_end - &my_codebytes;
#include <stddef.h>
#include <unistd.h>
int main(void) {
size_t len = my_codebytes_end - my_codebytes;
const char* bytes = my_codebytes;
// do whatever you want. Writing it to stdout is one option!
write(1, bytes, len);
}
This compiles and assembles with gcc and clang (compiler explorer).
I tried it on my desktop to double check:
peter#volta$ gcc -m32 -Wall -O3 /tmp/foo.c
peter#volta$ ./a.out | hd
00000000 41 ff c2 |A..|
00000003
This is the correct encoding for inc %r10d :)
The program also works when compiled without -m32, because I used #ifdef to decide whether to use .code32 at the end or not. (There's no push/pop mode directive like there is for sections.)
Of course, disassembling the binary will show you:
00000580 <my_codebytes>:
580: 41 inc ecx
581: ff c2 inc edx
because the disassembler doesn't know to switch to 64-bit disassembly for that block. (I wonder if ELF has attributes for that... I didn't use any assembler directives or linker scripts to generate such attributes, if such a thing exists.)
Switching between long mode and compatibility mode is done by changing CS. User mode code cannot modify the descriptor table, but it can perform a far jump or far call to a code segment that is already present in the descriptor table. In Linux the required descriptor is present (in my experience; this may not be true for all installations).
Here is sample code for 64-bit Linux (Ubuntu) that starts in 32-bit mode, switches to 64-bit mode, runs a function, and then switches back to 32-bit mode. Build with gcc -m32.
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
extern bool switch_cs(int cs, bool (*f)());
extern bool check_mode();
int main(int argc, char **argv)
{
int cs = 0x33;
if (argc > 1)
cs = strtoull(argv[1], 0, 16);
printf("switch to CS=%02x\n", cs);
bool r = switch_cs(cs, check_mode);
if (r)
printf("cs=%02x: 64-bit mode\n", cs);
else
printf("cs=%02x: 32-bit mode\n", cs);
return 0;
}
.intel_syntax noprefix
.text
.code32
.globl switch_cs
switch_cs:
mov eax, [esp+4]
mov edx, [esp+8]
push 0
push edx
push eax
push offset .L2
lea eax, [esp+8]
lcall [esp]
add esp, 16
ret
.L2:
call [eax]
lret
.code64
.globl check_mode
check_mode:
xor eax, eax
// In 32-bit mode, this instruction is executed as
// inc eax; test eax, eax
test rax, rax
setz al
ret
Related
I want to get the value of EIP from the following code, but the compilation does not pass
Command :
gcc -o xxx x86_inline_asm.c -m32 && ./xxx
file contetn x86_inline_asm.c:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned int eip_val;
__asm__("mov %0,%%eip":"=r"(eip_val));
return 0;
}
How to use the inline assembly to get the value of EIP, and it can be compiled successfully under x86.
How to modify the code and use the command to complete it?
This sounds unlikely to be useful (vs. just taking the address of the whole function like void *tmp = main), but it is possible.
Just get a label address, or use . (the address of the current line), and let the linker worry about getting the right immediate into the machine code. So you're not architecturally reading EIP, just reading the value it currently has from an immediate.
asm volatile("mov $., %0" : "=r"(address_of_mov_instruction) );
AT&T syntax is mov src, dst, so what you wrote would be a jump if it assembled.
(Architecturally, EIP = the end of an instruction while it's executing, so arguably you should do
asm volatile(
"mov $1f, %0 \n\t" // reference label 1 forward
"1:" // GAS local label
"=r"(address_after_mov)
);
I'm using asm volatile in case this asm statement gets duplicated multiple times inside the same function by inlining or something. If you want each case to get a different address, it has to be volatile. Otherwise the compiler can assume that all instances of this asm statement produce the same output. Normally that will be fine.
Architecturally in 32-bit mode you don't have RIP-relative addressing for LEA so the only good way to actually read EIP is call / pop. Reading program counter directly. It's not a general-purpose register so you can't just use it as the source or destination of a mov or any other instruction.
But really you don't need inline asm for this at all.
Is it possible to store the address of a label in a variable and use goto to jump to it? shows how to use the GNU C extension where &&label takes its address.
int foo;
void *addr_inside_function() {
foo++;
lab1: ; // labels only go on statements, not declarations
void *tmp = &&lab1;
foo++;
return tmp;
}
There's nothing you can safely do with this address outside the function; I returned it just as an example to make the compiler put a label in the asm and see what happens. Without a goto to that label, it can still optimize the function pretty aggressively, but you might find it useful as an input for an asm goto(...) somewhere else in the function.
But anyway, it compiles on Godbolt to this asm
# gcc -O3 -m32
addr_inside_function:
.L2:
addl $2, foo
movl $.L2, %eax
ret
#clang -O3 -m32
addr_inside_function:
movl foo, %eax
leal 1(%eax), %ecx
movl %ecx, foo
.Ltmp0: # Block address taken
addl $2, %eax
movl %eax, foo
movl $.Ltmp0, %eax # retval = label address
retl
So clang loads the global, computes foo+1 and stores it, then after the label computes foo+2 and stores that. (Instead of loading twice). So you still can't usefully jump to the label from anywhere, because it depends on having foo's old value in eax, and on the desired behaviour being to store foo+2
I don't know gcc inline assembly syntax for this, but for masm:
call next0
next0: pop eax ;eax = eip for this line
In the case of Masm, $ represents the current location, and since call is a 5 byte instruction, an alternative syntax without a label would be:
call $+5
pop eax
I am doing an operating system implementation work.
Here's the code first :
//generate software interrupt
void generate_interrupt(int n) {
asm("mov al, byte ptr [n]");
asm("mov byte ptr [genint+1], al");
asm("jmp genint");
asm("genint:");
asm("int 0");
}
I am compiling above code with -masm=intel option in gcc. Also,
this is not complete code to generate software interrupt.
My problem is I am getting error as n undefined, how do I resolve it, please help?
Also it promts error at link time not at compile time, below is an image
When you are using GCC, you must use GCC-style extended asm to access variables declared in C, even if you are using Intel assembly syntax. The ability to write C variable names directly into an assembly insert is a feature of MSVC, which GCC does not copy.
For constructs like this, it is also important to use a single assembly insert, not several in a row; GCC can and will rearrange assembly inserts relative to the surrounding code, including relative to other assembly inserts, unless you take specific steps to prevent it.
This particular construct should be written
void generate_interrupt(unsigned char n)
{
asm ("mov byte ptr [1f+1], %0\n\t"
"jmp 1f\n"
"1:\n\t"
"int 0"
: /* no outputs */ : "r" (n));
}
Note that I have removed the initial mov and any insistence on involving the A register, instead telling GCC to load n into any convenient register for me with the "r" input constraint. It is best to do as little as possible in an assembly insert, and to leave the choice of registers to the compiler as much as possible.
I have also changed the type of n to unsigned char to match the actual requirements of the INT instruction, and I am using the 1f local label syntax so that this works correctly if generate_interrupt is made an inline function.
Having said all that, I implore you to find an implementation strategy for your operating system that does not involve self-modifying code. Well, unless you plan to get a whole lot more use out of the self-modifications, anyway.
This isn't an answer to your specific question about passing parameters into inline assembly (see #zwol's answer). This addresses using self modifying code unnecessarily for this particular task.
Macro Method if Interrupt Numbers are Known at Compile-time
An alternative to using self modifying code is to create a C macro that generates the specific interrupt you want. One trick is you need to a macro that converts a number to a string. Stringize macros are quite common and documented in the GCC documentation.
You could create a macro GENERATE_INTERRUPT that looks like this:
#define STRINGIZE_INTERNAL(s) #s
#define STRINGIZE(s) STRINGIZE_INTERNAL(s)
#define GENERATE_INTERRUPT(n) asm ("int " STRINGIZE(n));
STRINGIZE will take a numeric value and convert it into a string. GENERATE_INTERRUPT simply takes the number, converts it to a string and appends it to the end of the of the INT instruction.
You use it like this:
GENERATE_INTERRUPT(0);
GENERATE_INTERRUPT(3);
GENERATE_INTERRUPT(255);
The generated instructions should look like:
int 0x0
int3
int 0xff
Jump Table Method if Interrupt Numbers are Known Only at Run-time
If you need to call interrupts only known at run-time then one can create a table of interrupt calls (using int instruction) followed by a ret. generate_interrupt would then simply retrieve the interrupt number off the stack, compute the position in the table where the specific int can be found and jmp to it.
In the following code I get GNU assembler to generate the table of 256 interrupt call each followed by a ret using the .rept directive. Each code fragment fits in 4 bytes. The result code generation and the generate_interrupt function could look like:
/* We use GNU assembly to create a table of interrupt calls followed by a ret
* using the .rept directive. 256 entries (0 to 255) are generated.
* generate_interrupt is a simple function that takes the interrupt number
* as a parameter, computes the offset in the interrupt table and jumps to it.
* The specific interrupted needed will be called followed by a RET to return
* back from the function */
extern void generate_interrupt(unsigned char int_no);
asm (".pushsection .text\n\t"
/* Generate the table of interrupt calls */
".align 4\n"
"int_jmp_table:\n\t"
"intno=0\n\t"
".rept 256\n\t"
"\tint intno\n\t"
"\tret\n\t"
"\t.align 4\n\t"
"\tintno=intno+1\n\t"
".endr\n\t"
/* generate_interrupt function */
".global generate_interrupt\n" /* Give this function global visibility */
"generate_interrupt:\n\t"
#ifdef __x86_64__
"movzx edi, dil\n\t" /* Zero extend int_no (in DIL) across RDI */
"lea rax, int_jmp_table[rip]\n\t" /* Get base of interrupt jmp table */
"lea rax, [rax+rdi*4]\n\t" /* Add table base to offset = jmp address */
"jmp rax\n\t" /* Do sepcified interrupt */
#else
"movzx eax, byte ptr 4[esp]\n\t" /* Get Zero extend int_no (arg1 on stack) */
"lea eax, int_jmp_table[eax*4]\n\t" /* Compute jump address */
"jmp eax\n\t" /* Do specified interrupt */
#endif
".popsection");
int main()
{
generate_interrupt (0);
generate_interrupt (3);
generate_interrupt (255);
}
If you were to look at the generated code in the object file you'd find the interrupt call table (int_jmp_table) looks similar to this:
00000000 <int_jmp_table>:
0: cd 00 int 0x0
2: c3 ret
3: 90 nop
4: cd 01 int 0x1
6: c3 ret
7: 90 nop
8: cd 02 int 0x2
a: c3 ret
b: 90 nop
c: cc int3
d: c3 ret
e: 66 90 xchg ax,ax
10: cd 04 int 0x4
12: c3 ret
13: 90 nop
...
[snip]
Because I used .align 4 each entry is padded out to 4 bytes. This makes the address calculation for the jmp easier.
I write a boot loader in asm and want to add some compiled C code in my project.
I created a test function here:
test.c
__asm__(".code16\n");
void print_str() {
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
}
And here is the asm code (the boot loader):
hw.asm
[org 0x7C00]
[BITS 16]
[extern print_str] ;nasm tip
start:
mov ax, 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
call print_str ;call function
mov si, welcome
call print_string
jmp mainloop
mainloop:
mov si, prompt
call print_string
mov di, buffer
call get_str
mov si, buffer
cmp byte [si], 0
je mainloop
mov si, buffer
;call print_string
mov di, cmd_version
call strcmp
jc .version
jmp mainloop
.version:
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
jmp mainloop
name db 'MOS', 0
version db 'v0.1', 0
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
prompt db '>', 0
line_return db 0x0D, 0x0A, 0
buffer times 64 db 0
cmd_version db 'version', 0
%include "functions/print.asm"
%include "functions/getstr.asm"
%include "functions/strcmp.asm"
times 510 - ($-$$) db 0
dw 0xaa55
I need to call the c function like a simple asm function
Without the extern and the call print_str, the asm script boot in VMWare.
I tried to compile with:
nasm -f elf32
But i can't call org 0x7C00
Compiling & Linking NASM and GCC Code
This question has a more complex answer than one might believe, although it is possible. Can the first stage of a bootloader (the original 512 bytes that get loaded at physical address 0x07c00) make a call into a C function? Yes, but it requires rethinking how you build your project.
For this to work you can no longer us -f bin with NASM. This also means you can't use the org 0x7c00 to tell the assembler what address the code expects to start from. You'll need to do this through a linker (either us LD directly or GCC for linking). Since the linker will lay things out in memory we can't rely on placing the boot sector signature 0xaa55 in our output file. We can get the linker to do that for us.
The first problem you will discover is that the default linker scripts used internally by GCC don't lay things out the way we want. We'll need to create our own. Such a linker script will have to set the origin point (Virtual Memory Address aka VMA) to 0x7c00, place the code from your assembly file before the data and place the boot signature at offset 510 in the file. I'm not going to write a tutorial on Linker scripts. The Binutils Documentation contains almost everything you need to know about linker scripts.
OUTPUT_FORMAT("elf32-i386");
/* We define an entry point to keep the linker quiet. This entry point
* has no meaning with a bootloader in the binary image we will eventually
* generate. Bootloader will start executing at whatever is at 0x07c00 */
ENTRY(start);
SECTIONS
{
. = 0x7C00;
.text : {
/* Place the code in hw.o before all other code */
hw.o(.text);
*(.text);
}
/* Place the data after the code */
.data : SUBALIGN(2) {
*(.data);
*(.rodata*);
}
/* Place the boot signature at LMA/VMA 0x7DFE */
.sig 0x7DFE : {
SHORT(0xaa55);
}
/* Place the uninitialised data in the area after our bootloader
* The BIOS only reads the 512 bytes before this into memory */
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss)
. = ALIGN(4);
__bss_end = .;
}
__bss_sizeb = SIZEOF(.bss);
/* Remove sections that won't be relevant to us */
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
This script should create an ELF executable that can be converted to a flat binary file with OBJCOPY. We could have output as a binary file directly but I separate the two processes out in the event I want to include debug information in the ELF version for debug purposes.
Now that we have a linker script we must remove the ORG 0x7c00 and the boot signature. For simplicity sake we'll try to get the following code (hw.asm) to work:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
call print_str ; call function
/* Halt the processor so we don't keep executing code beyond this point */
cli
hlt
You can include all your other code, but this sample will still demonstrate the basics of calling into a C function.
Assume the code above you can now generate the ELF object from hw.asm producing hw.o using this command:
nasm -f elf32 hw.asm -o hw.o
You compile each C file with something like:
gcc -ffreestanding -c kmain.c -o kmain.o
I placed the C code you had into a file called kmain.c . The command above will generate kmain.o. I noticed you aren't using a cross compiler so you'll want to use -fno-PIE to ensure we don't generate relocatable code. -ffreestanding tells GCC the C standard library may not exist, and main may not be the program entry point. You'd compile each C file in the same way.
To link this code to a final executable and then produce a flat binary file that can be booted we do this:
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
You specify all the object files to link with the LD command. The LD command above will produce a 32-bit ELF executable called kernel.elf. This file can be useful in the future for debugging purposes. Here we use OBJCOPY to convert kernel.elf to a binary file called kernel.bin. kernel.bin can be used as a bootloader image.
You should be able to run it with QEMU using this command:
qemu-system-i386 -fda kernel.bin
When run it may look like:
You'll notice the letter A appears on the last line. This is what we'd expect from the print_str code.
GCC Inline Assembly is Hard to Get Right
If we take your example code in the question:
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
The compiler is free to reorder these __asm__ statements if it wanted to. The int $0x10 could appear before the MOV instructions. If you want these 3 lines to be output in this exact order you can combine them into one like this:
__asm__ __volatile__("mov $'A' , %al\n\t"
"mov $0x0e, %ah\n\t"
"int $0x10");
These are basic assembly statements. It's not required to specify __volatile__on them as they are already implicitly volatile, so it has no effect. From the original poster's answer it is clear they want to eventually use variables in __asm__ blocks. This is doable with extended inline assembly (the instruction string is followed by a colon : followed by constraints.):
With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
asm [volatile] ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])
This answer isn't a tutorial on inline assembly. The general rule of thumb is that one should not use inline assembly unless you have to. Inline assembly done wrong can create hard to track bugs or have unusual side effects. Unfortunately doing 16-bit interrupts in C pretty much requires it, or you write the entire function in assembly (ie: NASM).
This is an example of a print_chr function that take a nul terminated string and prints each character out one by one using Int 10h/ah=0ah:
#include <stdint.h>
__asm__(".code16gcc\n");
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
hw.asm would be modified to look like this:
push welcome
call print_str ;call function
The idea when this is assembled/compiled (using the commands in the first section of this answer) and run is that it print out the welcome message. Unfortunately it will almost never work, and may even crash some emulators like QEMU.
code16 is Almost Useless and Should Not be Used
In the last section we learn that a simple function that takes a parameter ends up not working and may even crash an emulator like QEMU. The main problem is that the __asm__(".code16\n"); statement really doesn't work well with the code generated by GCC. The Binutils AS documentation says:
‘.code16gcc’ provides experimental support for generating 16-bit code from gcc, and differs from ‘.code16’ in that ‘call’, ‘ret’, ‘enter’, ‘leave’, ‘push’, ‘pop’, ‘pusha’, ‘popa’, ‘pushf’, and ‘popf’ instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. ‘.code16gcc’ also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates.
.code16gcc is what you really need to be using, not .code16. This force GNU assembler on the back end to emit address and operand prefixes on certain instructions so that the addresses and operands are treated as 4 bytes wide, and not 2 bytes.
The hand written code in NASM doesn't know it will be calling C instructions, nor does NASM have a directive like .code16gcc. You'll need to modify the assembly code to push 32-bit values on to the stack in real mode. You will also need to override the call instruction so that the return address needs to be treated as a 32-bit value, not 16-bit. This code:
push welcome
call print_str ;call function
Should be:
jmp 0x0000:setcs
setcs:
cld
push dword welcome
call dword print_str ;call function
GCC has a requirement that the direction flag be cleared before calling any C function. I added the CLD instruction to the top of the assembly code to make sure this is the case. GCC code also needs to have CS to 0x0000 to work properly. The FAR JMP does just that.
You can also drop the __asm__(".code16gcc\n"); on modern GCC that supports the -m16 option. -m16 automatically places a .code16gcc into the file that is being compiled.
Since GCC also uses the full 32-bit stack pointer it is a good idea to initialize ESP with 0x7c00, not just SP. Change mov sp, 0x7C00 to mov esp, 0x7C00. This ensures the full 32-bit stack pointer is 0x7c00.
The modified kmain.c code should now look like:
#include <stdint.h>
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
and hw.asm:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov esp, 0x7C00
jmp 0x0000:setcs ; Set CS to 0
setcs:
cld ; GCC code requires direction flag to be cleared
push dword welcome
call dword print_str ; call function
cli
hlt
section .data
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
These commands can be build the bootloader with:
gcc -fno-PIC -ffreestanding -m16 -c kmain.c -o kmain.o
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
When run with qemu-system-i386 -fda kernel.bin it should look simialr to:
In Most Cases GCC Produces Code that Requires 80386+
There are number of disadvantages to GCC generated code using .code16gcc:
ES=DS=CS=SS must be 0
Code must fit in the first 64kb
GCC code has no understanding of 20-bit segment:offset addressing.
For anything but the most trivial C code, GCC doesn't generate code that can run on a 286/186/8086. It runs in real mode but it uses 32-bit operands and addressing not available on processors earlier than 80386.
If you want to access memory locations above the first 64kb then you need to be in Unreal Mode(big) before calling into C code.
If you want to produce real 16-bit code from a more modern C compiler I recommend OpenWatcom C
The inline assembly is not as powerful as GCC
The inline assembly syntax is different but it is easier to use and less error prone than GCC's inline assembly.
Can generate code that will run on antiquated 8086/8088 processors.
Understands 20-bit segment:offset real mode addressing and supports the concept of far and huge pointers.
wlink the Watcom linker can produce basic flat binary files usable as a bootloader.
Zero Fill the BSS Section
The BIOS boot sequence doesn't guarantee that memory is actually zero. This causes a potential problem for the zero initialized region BSS. Before calling into C code for the first time the region should be zero filled by our assembly code. The linker script I originally wrote defines a symbol __bss_start that is the offset of the BSS memory and __bss_sizeb is the size in bytes. Using this info you can use the STOSB instruction to easily zero fill it. At the top of hw.asm you can add:
extern __bss_sizeb
extern __bss_start
And after the CLD instruction and before calling any C code you can do the zero fill this way:
; Zero fill the BSS section
mov cx, __bss_sizeb ; Size of BSS computed in linker script
mov di, __bss_start ; Start of BSS defined in linker script
rep stosb ; AL still zero, Fill memory with zero
Other Suggestions
To reduce the bloat of the code generated by the compiler it can be useful to use -fomit-frame-pointer. Compiling with -Os can optimize for space (rather than speed). We have limited space (512 bytes) for the initial code loaded by the BIOS so these optimizations can be beneficial. The command line for compiling could appear as:
gcc -fno-PIC -fomit-frame-pointer -ffreestanding -m16 -Os -c kmain.c -o kmain.o
I write a boot loader in asm and want to add some compiled C code in my project.
Then you need to use a 16-bit x86 compiler, such as OpenWatcom.
GCC cannot safely build real-mode code, as it is unaware of some important features of the platform, including memory segmentation. Inserting the .code16 directive will make the compiler generate incorrect output. Despite appearing in many tutorials, this piece of advice is simply incorrect, and should not be used.
First i want to express how to link C compiled code with assembled file.
I put together some Q/A in SO and reach to this.
C code:
func.c
//__asm__(".code16gcc\n");when we use eax, 32 bit reg we cant use this as truncate
//problem
#include <stdio.h>
int x = 0;
int madd(int a, int b)
{
return a + b;
}
void mexit(){
__asm__ __volatile__("mov $0, %ebx\n");
__asm__ __volatile__("mov $1, %eax \n");
__asm__ __volatile__("int $0x80\n");
}
char* tmp;
///how to direct use of arguments in asm command
void print_str(int a, char* s){
x = a;
__asm__("mov x, %edx\n");// ;third argument: message length
tmp = s;
__asm__("mov tmp, %ecx\n");// ;second argument: pointer to message to write
__asm__("mov $1, %ebx\n");//first argument: file handle (stdout)
__asm__("mov $4, %eax\n");//system call number (sys_write)
__asm__ __volatile__("int $0x80\n");//call kernel
}
void mtest(){
printf("%s\n", "Hi");
//putchar('a');//why not work
}
///gcc -c func.c -o func
Assembly code:
hello.asm
extern mtest
extern printf
extern putchar
extern print_str
extern mexit
extern madd
section .text ;section declaration
;we must export the entry point to the ELF linker or
global _start ;loader. They conventionally recognize _start as their
;entry point. Use ld -e foo to override the default.
_start:
;write our string to stdout
push msg
push len
call print_str;
call mtest ;print "Hi"; call printf inside a void function
; use add inside func.c
push 5
push 10
call madd;
;direct call of <stdio.h> printf()
push eax
push format
call printf; ;printf(format, eax)
call mexit; ;exit to OS
section .data ;section declaration
format db "%d", 10, 0
msg db "Hello, world!",0xa ;our dear string
len equ $ - msg ;length of our dear string
; nasm -f elf32 hello.asm -o hello
;Link two files
;ld hello func -o hl -lc -I /lib/ld-linux.so.2
; ./hl run code
;chain to assemble, compile, Run
;; gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo &&./hl
Chain commands for assemble, compile and Run
gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo && ./hl
Edit[toDO]
Write boot loader code instead of this version
Some explanation on how ld, gcc, nasm works.
I'm trying to write trampolines for x86 and amd64 so that a given function invocation is immediately vectored to an address stored at a known memory location (the purpose is to ensure the first target address lives within a given DLL (windows)).
The following code is attempting to use _fn as a memory location (or group of them) to start actual target addresses:
(*_fn[IDX])(); // rough equivalent in C
.globl _asmfn
_asmfn:
jmp *_fn+8*IDX(%rip)
The IDX is intended to be constructed using some CPP macros to provide a range of embedded DLL vectors each uniquely mapped to a slot in the _fn array of function pointers.
This works in a simple test program, but when I actually put it into a shared library (for the moment testing on OSX), I get a bus error when attempting to vector to the _asmfn code:
Invalid memory access of location 0x10aa1f320 rip=0x10aa1f320
The final target of this code is Windows, though I haven't tried it there yet (I figured I could at least prove out the assembly in a test case on OSX/intel first). Is the amd64 jump at least nominally correct, or have I missed something?
A good reference on trampolines on amd64.
EDIT
The jump does work properly on windows 7 (finally got a chance to test). However, I'm still curious to know why it is failing on OSX. The bus error is caused by a KERN_PROTECTION_FAILURE, which would appear to indicate that OS protections are preventing execution of that code. The target address is allocated memory (it's a trampoline generated by libffi), but I believe it to be properly marked as executable memory. If it's an executable memory issue, that would explain why my standalone test code works (the callback trampoline is compiled, not allocated).
When using PC-relative addressing, keep in mind that the offset must be within +- 2GB. That means your jump table and trampoline can't be too far away from each other. Regarding trampolines as such, what can be done on Windows x64 to transfer without requiring to clobber any registers is:
a sequence:
PUSH <high32>
MOV DWORD PTR [ RSP - 4 ], <low32>
RET
this works both on Win64 and UN*X x86_64. Although on UN*X, if the function uses the redzone then you're clobbering ...
a sequence:
JMP [ RIP ]
.L: <tgtaddr64>
again, applicable to both Win64 and UN*X x86_64.
a sequence:
MOV DWORD PTR [ RSP + c ], <low32>
MOV DWORD PTR [ RSP + 8 ], <high32>
JMP [ RSP + 8 ]
this is Win64-specific as it (ab)uses part of the 32-Byte "argument space" reserved (just above the return address on the stack) by the Win64 ABI; the UN*X x86_64 equiv to this would be to (ab)use part of the 128-Byte "red zone" reserved (just below the return address on the stack) there:
MOV DWORD PTR [ RSP - c ], <low32>
MOV DWORD PTR [ RSP - 8 ], <high32>
JMP [ RSP - 8 ]
Both are only usable if it's acceptable to clobber (overwrite) what's in there at the point of invoking the trampoline.
If is possible to directly construct such a position-independent register-neutral trampoline in memory - like this (for method 1.):
#include <stdint.h>
#include <stdio.h>
char *mystr = "Hello, World!\n";
int main(int argc, char **argv)
{
struct __attribute__((packed)) {
char PUSH;
uint32_t CONST_TO_PUSH;
uint32_t MOV_TO_4PLUS_RSP;
uint32_t CONST_TO_MOV;
char RET;
} mycode = {
0x68, ((uint32_t)printf),
0x042444c7, (uint32_t)((uintptr_t)printf >> 32),
0xc3
};
void *buf = /* fill in an OS-specific way to get an executable buffer */;
memcpy(buf, &mycode, sizeof(mycode));
__asm__ __volatile__(
"push $0f\n\t" // this is to make the "jmp" return
"jmp *%0\n\t"
"0:\n\t" : : "r"(buf), "D"(mystr), "a"(0));
return 0;
}
Note that this doesn't take into account whether any nonvolatile registers are being clobbered by the function "invoked"; I've also left out how to make the trampoline buffer executable (the stack ordinarily isn't on Win64/x86_64).
#HarryJohnston had the right of it, the permissions issue was encountered on OS X only. The code runs fine on its target windows environment.
How to write this assembly code as inline assembly? Compiler: gcc(i586-elf-gcc). The GAS syntax confuses me. Please give tell me how to write this as inline assembly that works for gcc.
.set_video_mode:
mov ah,00h
mov al,13h
int 10h
.init_mouse:
mov ax,0
int 33h
Similar one I have in assembly. I wrote them separate as assembly routines to call them from my C program. I need to call these and some more interrupts from C itself.
Also I need to put some values in some registers depending on which interrupt routine I'm calling. Please tell me how to do it.
All that I want to do is call interrupt routines from C. It's OK for me even to do it using int86() but i don't have source code of that function.
I want int86() so that i can call interrupts from C.
I am developing my own tiny OS so i got no restrictions for calling interrupts or for any direct hardware access.
I've not tested this, but it should get you started:
void set_video_mode (int x, int y) {
register int ah asm ("ah") = x;
register int al asm ("al") = y;
asm volatile ("int $0x10"
: /* no outputs */
: /* no inputs */
: /* clobbers */ "ah", "al");
}
I've put in two 'clobbers' as an example, but you'll need to set the correct list of clobbers so that the compiler knows you've overwritten register values (maybe none).
First, keep in mind GCC doesn't support 16-bit code yet, so you'll end up compiling 32-bit code in 16-bit mode, which is very inefficient but doable (it is used, for example, by Linux and SeaBIOS). It can be done with the following at the begging of each file:
__asm__ (".code16gcc");
Newer GCC versions (since 4.9 IIRC) support the -m16 flag that does the same thing.
Also, there's no mouse driver available unless you load it previous to your kernel running init_mouse.
You seem to be using an API commonly available in several x86 DOS.
asm can take care of the register assignments, so the code can be reduced to:
void set_video_mode(int mode)
{
mode &= 255;
__asm__ __volatile__ (
"int $0x10"
: "+a" (mode) /* %eax = mode & 255 => %ah = 0, %al = mode */
);
}
void init_mouse(void)
{
/* XXX it is really important to check the IDT entry isn't 0 */
int tmp = 0;
__asm__ __volatile__ (
"int $0x33"
: "+a" (tmp) /* %eax = 0*/
:: "ebx" /* %ebx is also clobbered by DOS mouse drivers */
);
}
The asm statement is documented in the GCC manual, although perhaps not in enough depth and lacks x86 examples. The outputs (after first colon) have a distinctively obscure syntax, while the rest is far easier to understand (the second colon specifies the inputs and the third the clobbered registers, flags and/or memory).
The outputs must be prefixed with =, meaning you don't care the previous value it may have had, or +, meaning you want to use it as an input too. In this context we use that instead of an input because the value is modified by the interrupt and you're not allowed to specify input registers in the clobbered list (because the compiler is forbidden from using them).