I am working on an ARMv7 (Cortex-A7) system, and I want to read CPSR from a C file in either ARM mode or THUMB mode.
First, I used an inline assembly instruction in a C function as follows:
__asm__ volatile("mrs %0, CPSR\n" : "=r"(regval));
When I compiled the C file with -mthumb and ran the code under GDB, regval was 0x60000010, which is NOT the 0x60000030 that GDB shows for $cpsr!
So how do I write a function to read CPSR in either ARM or THUMB mode?
Update: compiler options
a) I built the code with the following command line to select THUMB mode.
arm-linux-gnueabi-gcc -g2 backtrace.c -mcpu=cortex-a7 -static -mthumb -o tbacktrace
Running tbacktrace with qemu and GDB, I got different values:
(gdb) p/x regval
$7 = 0x60000010
(gdb) p/x $cpsr
$8 = 0x60000030
The question is why my mrs %0, CPSR shows CPSR as being in ARM mode, instead of the THUMB mode the code was built for.
b) When building the code without -mcpu=cortex-a7,
arm-linux-gnueabi-gcc -g2 backtrace.c -mthumb -o tbacktrace
the following error was reported.
$ arm-linux-gnueabi-gcc -g2 backtrace.c -mthumb -o tbacktrace
/tmp/ccOg2tlo.s: Assembler messages:
/tmp/ccOg2tlo.s:2256: Error: selected processor does not support `mrs r3,CPSR' in Thumb mode
/tmp/ccOg2tlo.s:2398: Error: selected processor does not support `mrs r3,CPSR' in Thumb mode
c) Built without -mcpu or -mthumb, the code compiles and runs fine.
So I think there should be some other way to get the right CPSR in both ARM and THUMB modes.
Update: more disassembly.
arm-linux-gnueabi-objdump -M force-thumb -d a.elf shows following,
4000014c: 0ff0 lsrs r0, r6, #31
4000014e: e92d 0f30 stmdb sp!, {r4, r5, r8, r9, sl, fp}
40000152: ee30 3407 cdp 4, 3, cr3, cr0, cr7, {0}
40000156: e210 b.n 4000057a <__aeabi_f2d+0x16>
40000158: 3ba3 subs r3, #163 ; 0xa3
4000015a: e1a0 b.n 4000049e <__adddf3+0x1f6>
4000015c: 001b movs r3, r3
4000015e: 0a00 lsrs r0, r0, #8
40000160: a000 add r0, pc, #0 ; (adr r0, 40000164 <B_Loop1>)
40000162: e3a0 b.n 400008a6 <__udivmoddi4+0x19a>
......
400002a8 <__adddf3>:
400002a8: b530 push {r4, r5, lr}
400002aa: ea4f 0441 mov.w r4, r1, lsl #1
400002ae: ea4f 0543 mov.w r5, r3, lsl #1
400002b2: ea94 0f05 teq r4, r5
400002b6: bf08 it eq
400002b8: ea90 0f02 teqeq r0, r2
400002bc: bf1f itttt ne
Here is part of the project's code, which is built with -mthumb -mcpu=cortex-a7.
As Nate and Frant mentioned, I think the code is running in THUMB mode, and checking the T bit of CPSR to detect ARM or THUMB mode is unnecessary. Is that correct?
A way to detect THUMB or ARM mode
After reading Nate's and Frant's comments, I had an idea for detecting which mode the CPU is in without reading the T bit of CPSR: read the PC register twice in consecutive instructions and check the difference. If it is 2 (the length of a 16-bit THUMB instruction), the CPU is running in THUMB mode; if it is 4 (the length of an ARM instruction), the CPU is in ARM mode.
The code is as follows,
register uint32_t pc1, pc2;
asm volatile("mov %0, pc\n mov %1, pc" : "=r"(pc1), "=r"(pc2));
I built the code with and without -mthumb, using -Os, and it seems to detect THUMB or ARM mode correctly.
CPSR.c:
#include <stdint.h>
int main(int argc, char* argv[]) {
uint32_t regval;
asm volatile("mrs %0, CPSR" : "=r"(regval));
return regval;
}
If you don't use -mcpu=cortex-a7, your compiler will default to another CPU:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -S CPSR.c
cat CPSR.s
.cpu arm7tdmi
.arch armv4t
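The ARM7TDMI-S was introduced in 2001, and, as pointed out by your assembler, its architecture (ARMv4T) does not support mrs r3,CPSR in Thumb mode: the 16-bit Thumb instruction set of ARMv4T has no MRS encoding, and the 32-bit Thumb encoding seen in the disassembly below requires a newer architecture such as ARMv7-A. Therefore, you must specify -mcpu=cortex-a7: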
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -mcpu=cortex-a7 -O0 -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -S CPSR.c
cat CPSR.s
.cpu cortex-a7
.arch armv7-a
CPU and architecture are now as expected.
Testing your code on real hardware - a Cortex-A7 running u-boot - in Arm and Thumb mode:
Arm:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -mcpu=cortex-a7 -marm -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -o CPSR-arm.elf CPSR.c
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000080800000
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objcopy -O srec CPSR-arm.elf CPSR-arm.srec
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objdump -j .text -D CPSR-arm.elf
CPSR-arm.elf: file format elf32-littlearm
Disassembly of section .text:
80800000 <main>:
80800000: e52db004 push {fp} ; (str fp, [sp, #-4]!)
80800004: e28db000 add fp, sp, #0
80800008: e24dd01c sub sp, sp, #28
8080000c: e50b0010 str r0, [fp, #-16]
80800010: e50b1014 str r1, [fp, #-20] ; 0xffffffec
80800014: e50b2018 str r2, [fp, #-24] ; 0xffffffe8
80800018: e10f3000 mrs r3, CPSR
8080001c: e50b3008 str r3, [fp, #-8]
80800020: e51b3008 ldr r3, [fp, #-8]
80800024: e1a00003 mov r0, r3
80800028: e28bd000 add sp, fp, #0
8080002c: e49db004 pop {fp} ; (ldr fp, [sp], #4)
80800030: e12fff1e bx lr
I.MX7d running u-boot:
# loads
## Ready for S-Record download ...
## First Load Addr = 0x80800000
## Last Load Addr = 0x80800033
## Total Size = 0x00000034 = 52 Bytes
CACHE: Misaligned operation at range [80800000, 80800034]
## Start Addr = 0x80800000
# go 0x80800000
## Starting application at 0x80800000 ...
## Application terminated, rc = 0x200000D3
Thumb:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -mcpu=cortex-a7 -mthumb -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -o CPSR-thumb.elf CPSR.c
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000080800000
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objcopy -O srec CPSR-thumb.elf CPSR-thumb.srec
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objdump -j .text -D CPSR-thumb.elf
CPSR-thumb.elf: file format elf32-littlearm
Disassembly of section .text:
80800000 <main>:
80800000: b480 push {r7}
80800002: b087 sub sp, #28
80800004: af00 add r7, sp, #0
80800006: 60f8 str r0, [r7, #12]
80800008: 60b9 str r1, [r7, #8]
8080000a: 607a str r2, [r7, #4]
8080000c: f3ef 8300 mrs r3, CPSR
80800010: 617b str r3, [r7, #20]
80800012: 697b ldr r3, [r7, #20]
80800014: 4618 mov r0, r3
80800016: 371c adds r7, #28
80800018: 46bd mov sp, r7
8080001a: bc80 pop {r7}
8080001c: 4770 bx lr
I.MX7d running u-boot:
# loads
## Ready for S-Record download ...
## First Load Addr = 0x80800000
## Last Load Addr = 0x8080001D
## Total Size = 0x0000001E = 30 Bytes
CACHE: Misaligned operation at range [80800000, 8080001e]
## Start Addr = 0x80800000
#
# go 0x80800001
## Starting application at 0x80800001 ...
## Application terminated, rc = 0x200000D3
Bottom line: both versions returned the same value for CPSR, i.e. 0x200000D3.
To the question
How to write a function to read ARM CPSR in either ARM or THUMB mode?
The answer would then be: The way you did.
Asking why p/x regval and p/x $cpsr do not return the same value should be the topic of a different question, maybe on the GDB forum.
Update #1: Nate Eldredge explained why the value read into the register always has the T bit set to zero.
Testing on a different Cortex-A7 (Allwinner H3), a JLink probe and the Ozone debugger, we can see that even though the value read by the MRS instruction is 0x200000D3, the value of CPSR_USR read by the JTAG probe and Ozone is 0x200001F3 when executing the Thumb version, and 0x200000D3 when executing the Arm version:
Arm:
Thumb:
This would I.M.H.O. perfectly validate his explanation.
Update #2
Still using the JLink debug probe, but in combination with JLinkGDBServerExe and arm-none-eabi-gdb 12.1 in TUI mode:
Arm:
Thumb:
The value of the CPSR register read by the JTAG probe is the one you would expect, i.e. it has the T bit set in Thumb mode.
You probably would get the same result in Linux using a TRACE32 JTAG probe.
Not sure this could be useful, but note that some pre-defined symbols differ when building an Arm or Thumb executable:
/opt/arm/11/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -dM -E -mcpu=cortex-a7 -marm - < /dev/null | grep -i arm
#define __ARM_SIZEOF_WCHAR_T 4
#define __ARM_FEATURE_SAT 1
#define __ARM_ARCH_ISA_ARM 1
#define __ARMEL__ 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_SIZEOF_MINIMAL_ENUM 1
#define __ARM_FEATURE_LDREX 15
#define __ARM_PCS 1
#define __ARM_FEATURE_QBIT 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_32BIT_STATE 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_ARCH_ISA_THUMB 2
#define __ARM_ARCH 7
#define __ARM_FEATURE_UNALIGNED 1
#define __arm__ 1
#define __ARM_ARCH_7A__ 1
#define __ARM_FEATURE_SIMD32 1
#define __ARM_FEATURE_COPROC 15
#define __ARM_FEATURE_DSP 1
#define __ARM_ARCH_EXT_IDIV__ 1
#define __ARM_EABI__ 1
/opt/arm/11/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -dM -E -mcpu=cortex-a7 -mthumb - < /dev/null | grep -i thumb
#define __thumb2__ 1
#define __THUMB_INTERWORK__ 1
#define __thumb__ 1
#define __ARM_ARCH_ISA_THUMB 2
#define __THUMBEL__ 1
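You could therefore test #ifdef __thumb__ (or #ifdef __thumb2__) in your code to know, at compile time, whether a given source file is being built as Thumb code. Note that __arm__ is defined for both -marm and -mthumb builds (the second listing above only shows the symbols matching "thumb"), so #ifdef __arm__ alone does not distinguish the Arm version from the Thumb version.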
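As a minimal sketch of this compile-time check, assuming the GCC/Clang predefined macros listed above: __thumb__ is only defined when the file is compiled as Thumb code, while __arm__ is defined for any 32-bit Arm target. The enum and the function name build_isa are made up for illustration.

```c
/* Instruction set this translation unit was compiled for.
 * The enum and function names are hypothetical, not part of any library. */
enum isa_kind { ISA_OTHER = 0, ISA_ARM, ISA_THUMB };

enum isa_kind build_isa(void)
{
#if defined(__thumb__)
    return ISA_THUMB;   /* built with -mthumb (Thumb or Thumb-2 encodings) */
#elif defined(__arm__)
    return ISA_ARM;     /* built with -marm (32-bit Arm encodings) */
#else
    return ISA_OTHER;   /* not a 32-bit Arm target at all */
#endif
}
```

The result only describes how this particular source file was compiled. It says nothing about the state of the code that calls it, since Arm and Thumb objects can be linked into the same program.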
The instruction is working as designed and documented.
The discrepancy is in bit 5, which according to the ARMv7-A Architecture Reference Manual, is the T bit, indicating whether the processor is in Thumb state. It's one of the "execution state bits". Lower down on that page, under "Accessing the execution state bits", it says:
The execution state bits, other than the E bit, are RAZ [read as zero] when read by an MRS instruction.
So mrs rN, CPSR masks off those bits. I'm not sure why it's designed this way. But in principle you should already know whether you're in Thumb state or not, so it shouldn't really be necessary to read this information from CPSR.
On the other hand, gdb doesn't get its CPSR value from mrs rN, CPSR. I haven't checked, but I presume what happens is this: when your program hits a breakpoint, an exception is generated. This causes CPSR to be saved into SPSR (without masking any bits!), and the kernel's exception handler retrieves it from there to store as part of the saved context of your process, along with register values, etc. The saved context is made available to the debugger via appropriate system calls (e.g. ptrace(2)) and that's how it is able to display register contents and such. In particular, it gets the CPSR value that was saved at the breakpoint and which isn't masked.
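To make the bit arithmetic concrete, here is a small sketch that decodes the two values from the question. It assumes the ARMv7-A CPSR layout (mode in bits 4:0, T in bit 5); the macro and function names are invented for this example.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu       /* M[4:0]: processor mode field */
#define CPSR_T_BIT     (1u << 5)   /* T: Thumb execution state bit */

/* 1 if the T bit is set in the given CPSR value, 0 otherwise. */
int cpsr_thumb_bit(uint32_t cpsr)
{
    return (cpsr & CPSR_T_BIT) != 0;
}

/* Processor mode field, e.g. 0x10 = User, 0x13 = Supervisor. */
uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & CPSR_MODE_MASK;
}

/* Bits that differ between two CPSR snapshots. */
uint32_t cpsr_diff(uint32_t a, uint32_t b)
{
    return a ^ b;
}
```

With the values from the question, cpsr_diff(0x60000010, 0x60000030) is 0x20, i.e. only the T bit differs: the value read by MRS has it masked to zero, while the value GDB reports has it set.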
I understand when to use a clobber list (e.g. listing a register which is modified in the assembly so that it doesn't get chosen for use as an input register, etc.), but I can't wrap my head around the earlyclobber constraint &. If you list your outputs, wouldn't that already mean that inputs can't use the selected register (aside from matching digit constraints)?
For example:
asm(
"movl $1, %0;"
"addl $3, %0;"
"addl $4, %1;" // separate bug: modifies input-only operand
"addl %1, %0;"
: "=g"(num_out)
: "g"(num_in)
:
);
Would & even be needed for the output variables? The compiler should know the register that was selected for the output, and thus know not to use it for the input.
By default, the compiler assumes all inputs will be consumed before any output registers are written to, so that it's allowed to use the same registers for both. This leads to better code when possible, but if the assumption is wrong, things will fail catastrophically. The "early clobber" marker is a way to tell the compiler that this output will be written before all the input has been consumed, so it cannot share a register with any input.
GNU C inline asm syntax was designed to wrap a single instruction as efficiently as possible. You can put multiple instructions in an asm template, but the defaults (assuming that all inputs are read before any outputs are written) are designed around wrapping a single instruction.
It's the same constraint syntax as GCC uses in its machine-description files that teach the compiler what instructions are available in an ISA.
Minimal educational example
Here I provide a minimal educational example that attempts to make what https://stackoverflow.com/a/15819941/895245 mentioned clearer.
This specific code is of course not useful in practice, and could be achieved more efficiently with a single lea 1(%q[in]), %[out] instruction; it is just a simple educational example.
main.c
#include <assert.h>
#include <inttypes.h>
int main(void) {
uint64_t in = 1;
uint64_t out;
__asm__ (
"mov %[in], %[out];" /* out = in */
"inc %[out];" /* out++ */
"mov %[in], %[out];" /* out = in */
"inc %[out];" /* out++ */
: [out] "=&r" (out)
: [in] "r" (in)
:
);
assert(out == 2);
}
Compile and run:
gcc -ggdb3 -std=c99 -O3 -Wall -Wextra -pedantic -o main.out main.c
./main.out
This program is correct and the assert passes, because & forces the compiler to choose different registers for in and out.
This is because & tells the compiler that in might be used after out was written to, which is actually the case here.
Therefore, the only way to not wrongly modify in is to put in and out in different registers.
The disassembly:
gdb -nh -batch -ex 'disassemble/rs main' main.out
contains:
0x0000000000001055 <+5>: 48 89 d0 mov %rdx,%rax
0x0000000000001058 <+8>: 48 ff c0 inc %rax
0x000000000000105b <+11>: 48 89 d0 mov %rdx,%rax
0x000000000000105e <+14>: 48 ff c0 inc %rax
which shows that GCC chose rax for out and rdx for in.
If we remove the & however, the behavior is unspecified.
In my test system, the assert actually fails, because the compiler tries to minimize register usage, and compiles to:
0x0000000000001055 <+5>: 48 89 c0 mov %rax,%rax
0x0000000000001058 <+8>: 48 ff c0 inc %rax
0x000000000000105b <+11>: 48 89 c0 mov %rax,%rax
0x000000000000105e <+14>: 48 ff c0 inc %rax
therefore using rax for both in and out.
The result of this is that out is incremented twice, and equals 3 instead of 2 in the end.
Tested in Ubuntu 18.10 amd64, GCC 8.2.0.
More practical examples
multiplication implicit output registers
non-hardcoded scratch registers: GCC: Prohibit use of some registers
I write a boot loader in asm and want to add some compiled C code in my project.
I created a test function here:
test.c
__asm__(".code16\n");
void print_str() {
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
}
And here is the asm code (the boot loader):
hw.asm
[org 0x7C00]
[BITS 16]
[extern print_str] ;nasm tip
start:
mov ax, 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
call print_str ;call function
mov si, welcome
call print_string
jmp mainloop
mainloop:
mov si, prompt
call print_string
mov di, buffer
call get_str
mov si, buffer
cmp byte [si], 0
je mainloop
mov si, buffer
;call print_string
mov di, cmd_version
call strcmp
jc .version
jmp mainloop
.version:
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
jmp mainloop
name db 'MOS', 0
version db 'v0.1', 0
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
prompt db '>', 0
line_return db 0x0D, 0x0A, 0
buffer times 64 db 0
cmd_version db 'version', 0
%include "functions/print.asm"
%include "functions/getstr.asm"
%include "functions/strcmp.asm"
times 510 - ($-$$) db 0
dw 0xaa55
I need to call the C function like a simple asm function.
Without the extern and the call print_str, the asm code boots in VMware.
I tried to compile with:
nasm -f elf32
But then I can't use org 0x7C00.
Compiling & Linking NASM and GCC Code
This question has a more complex answer than one might believe, although it is possible. Can the first stage of a bootloader (the original 512 bytes that get loaded at physical address 0x07c00) make a call into a C function? Yes, but it requires rethinking how you build your project.
For this to work you can no longer use -f bin with NASM. This also means you can't use org 0x7c00 to tell the assembler what address the code expects to start from. You'll need to do this through a linker (either use LD directly or GCC for linking). Since the linker will lay things out in memory, we can't rely on placing the boot sector signature 0xaa55 in our output file ourselves. We can get the linker to do that for us.
The first problem you will discover is that the default linker scripts used internally by GCC don't lay things out the way we want. We'll need to create our own. Such a linker script will have to set the origin point (Virtual Memory Address aka VMA) to 0x7c00, place the code from your assembly file before the data and place the boot signature at offset 510 in the file. I'm not going to write a tutorial on Linker scripts. The Binutils Documentation contains almost everything you need to know about linker scripts.
OUTPUT_FORMAT("elf32-i386");
/* We define an entry point to keep the linker quiet. This entry point
* has no meaning with a bootloader in the binary image we will eventually
* generate. Bootloader will start executing at whatever is at 0x07c00 */
ENTRY(start);
SECTIONS
{
. = 0x7C00;
.text : {
/* Place the code in hw.o before all other code */
hw.o(.text);
*(.text);
}
/* Place the data after the code */
.data : SUBALIGN(2) {
*(.data);
*(.rodata*);
}
/* Place the boot signature at LMA/VMA 0x7DFE */
.sig 0x7DFE : {
SHORT(0xaa55);
}
/* Place the uninitialised data in the area after our bootloader
* The BIOS only reads the 512 bytes before this into memory */
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss)
. = ALIGN(4);
__bss_end = .;
}
__bss_sizeb = SIZEOF(.bss);
/* Remove sections that won't be relevant to us */
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
This script should create an ELF executable that can be converted to a flat binary file with OBJCOPY. We could have output as a binary file directly but I separate the two processes out in the event I want to include debug information in the ELF version for debug purposes.
Now that we have a linker script we must remove the ORG 0x7c00 and the boot signature. For simplicity's sake we'll try to get the following code (hw.asm) to work:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
call print_str ; call function
; Halt the processor so we don't keep executing code beyond this point
cli
hlt
You can include all your other code, but this sample will still demonstrate the basics of calling into a C function.
With the code above you can now generate the ELF object hw.o from hw.asm using this command:
nasm -f elf32 hw.asm -o hw.o
You compile each C file with something like:
gcc -ffreestanding -c kmain.c -o kmain.o
I placed the C code you had into a file called kmain.c. The command above will generate kmain.o. I noticed you aren't using a cross compiler, so you'll want to use -fno-PIE to ensure we don't generate position-independent code. -ffreestanding tells GCC the C standard library may not exist, and main may not be the program entry point. You'd compile each C file in the same way.
To link this code to a final executable and then produce a flat binary file that can be booted we do this:
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
You specify all the object files to link with the LD command. The LD command above will produce a 32-bit ELF executable called kernel.elf. This file can be useful in the future for debugging purposes. Here we use OBJCOPY to convert kernel.elf to a binary file called kernel.bin. kernel.bin can be used as a bootloader image.
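Before booting the image, it can be worth checking that the linker script really placed the signature at offset 510. The following is only a sketch with a made-up helper name; it assumes the first 512 bytes of kernel.bin have already been read into a buffer, for example with fread.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the first 512-byte sector in img ends with the boot
 * signature. SHORT(0xaa55) is stored little-endian, so the bytes on
 * disk are 0x55 at offset 510 followed by 0xAA at offset 511.
 * has_boot_signature is a hypothetical helper name. */
int has_boot_signature(const uint8_t *img, size_t len)
{
    if (img == NULL || len < 512)
        return 0;   /* too short to be a complete boot sector */
    return img[510] == 0x55 && img[511] == 0xAA;
}
```

If this check fails on kernel.bin, the usual suspects are a linker script that was not passed with -T or code and data that grew past offset 510.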
You should be able to run it with QEMU using this command:
qemu-system-i386 -fda kernel.bin
When run it may look like:
You'll notice the letter A appears on the last line. This is what we'd expect from the print_str code.
GCC Inline Assembly is Hard to Get Right
If we take your example code in the question:
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
The compiler is free to reorder these __asm__ statements if it wanted to. The int $0x10 could appear before the MOV instructions. If you want these 3 lines to be output in this exact order you can combine them into one like this:
__asm__ __volatile__("mov $'A' , %al\n\t"
"mov $0x0e, %ah\n\t"
"int $0x10");
These are basic assembly statements. It's not required to specify __volatile__ on them as they are already implicitly volatile, so it has no effect. From the original poster's answer it is clear they want to eventually use variables in __asm__ blocks. This is doable with extended inline assembly (the instruction string is followed by a colon : and then the constraints):
With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
asm [volatile] ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])
This answer isn't a tutorial on inline assembly. The general rule of thumb is that one should not use inline assembly unless you have to. Inline assembly done wrong can create hard to track bugs or have unusual side effects. Unfortunately doing 16-bit interrupts in C pretty much requires it, or you write the entire function in assembly (ie: NASM).
This is an example of a print_str function that takes a NUL-terminated string and prints each character one by one using Int 10h/AH=0Eh:
#include <stdint.h>
__asm__(".code16gcc\n");
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | (unsigned char)*str++),
"b" (0x0000));
}
}
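The value passed through the "a" constraint packs two byte registers into AX. As a small sketch (the helper name teletype_ax is made up), this is how AH and AL end up in one 16-bit value, and why the character should go through unsigned char first:

```c
#include <stdint.h>

/* Value to load into AX for the BIOS Int 10h teletype function:
 * AH = 0x0E (function number), AL = character code.
 * teletype_ax is a hypothetical helper, not part of any library. */
uint16_t teletype_ax(char c)
{
    /* Where plain char is signed (as on x86), a character >= 0x80 would
     * be sign-extended and overwrite AH, so cast to unsigned char first. */
    return (uint16_t)((0x0Eu << 8) | (unsigned char)c);
}
```

For example, teletype_ax('A') yields 0x0E41: AH is 0x0E and AL is 0x41.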
hw.asm would be modified to look like this:
push welcome
call print_str ;call function
The idea is that when this is assembled/compiled (using the commands in the first section of this answer) and run, it prints out the welcome message. Unfortunately it will almost never work, and may even crash some emulators like QEMU.
code16 is Almost Useless and Should Not be Used
In the last section we learn that a simple function that takes a parameter ends up not working and may even crash an emulator like QEMU. The main problem is that the __asm__(".code16\n"); statement really doesn't work well with the code generated by GCC. The Binutils AS documentation says:
‘.code16gcc’ provides experimental support for generating 16-bit code from gcc, and differs from ‘.code16’ in that ‘call’, ‘ret’, ‘enter’, ‘leave’, ‘push’, ‘pop’, ‘pusha’, ‘popa’, ‘pushf’, and ‘popf’ instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. ‘.code16gcc’ also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates.
.code16gcc is what you really need to be using, not .code16. This forces the GNU assembler on the back end to emit address and operand size prefixes on certain instructions so that the addresses and operands are treated as 4 bytes wide, and not 2 bytes.
The hand-written code in NASM doesn't know it will be calling C functions, nor does NASM have a directive like .code16gcc. You'll need to modify the assembly code to push 32-bit values onto the stack in real mode. You will also need to override the call instruction so that the return address is pushed as a 32-bit value, not a 16-bit one. This code:
push welcome
call print_str ;call function
Should be:
jmp 0x0000:setcs
setcs:
cld
push dword welcome
call dword print_str ;call function
GCC requires that the direction flag be cleared before calling any C function. I added the CLD instruction to the top of the assembly code to make sure this is the case. GCC code also needs CS set to 0x0000 to work properly. The FAR JMP does just that.
You can also drop the __asm__(".code16gcc\n"); on modern GCC that supports the -m16 option. -m16 automatically places a .code16gcc into the file that is being compiled.
Since GCC also uses the full 32-bit stack pointer it is a good idea to initialize ESP with 0x7c00, not just SP. Change mov sp, 0x7C00 to mov esp, 0x7C00. This ensures the full 32-bit stack pointer is 0x7c00.
The modified kmain.c code should now look like:
#include <stdint.h>
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | (unsigned char)*str++),
"b" (0x0000));
}
}
and hw.asm:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov esp, 0x7C00
jmp 0x0000:setcs ; Set CS to 0
setcs:
cld ; GCC code requires direction flag to be cleared
push dword welcome
call dword print_str ; call function
cli
hlt
section .data
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
After assembling hw.asm with NASM as before, these commands build the bootloader:
gcc -fno-PIC -ffreestanding -m16 -c kmain.c -o kmain.o
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
When run with qemu-system-i386 -fda kernel.bin it should look similar to:
In Most Cases GCC Produces Code that Requires 80386+
There are a number of disadvantages to GCC-generated code using .code16gcc:
ES=DS=CS=SS must be 0
Code must fit in the first 64kb
GCC code has no understanding of 20-bit segment:offset addressing.
For anything but the most trivial C code, GCC doesn't generate code that can run on a 286/186/8086. It runs in real mode but it uses 32-bit operands and addressing not available on processors earlier than 80386.
If you want to access memory locations above the first 64kb then you need to be in Unreal Mode(big) before calling into C code.
If you want to produce real 16-bit code from a more modern C compiler I recommend OpenWatcom C
The inline assembly is not as powerful as GCC's
The inline assembly syntax is different but it is easier to use and less error prone than GCC's inline assembly.
Can generate code that will run on antiquated 8086/8088 processors.
Understands 20-bit segment:offset real mode addressing and supports the concept of far and huge pointers.
wlink the Watcom linker can produce basic flat binary files usable as a bootloader.
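For readers unfamiliar with 20-bit segment:offset addressing, the translation can be sketched in a few lines of C. The function name is made up, and the 20-bit mask models an 8086 (or a later CPU with the A20 line disabled); with A20 enabled the address would not wrap.

```c
#include <stdint.h>

/* Physical address of a real-mode segment:offset pair:
 * segment * 16 + offset, wrapped to 20 bits as on an 8086.
 * real_mode_phys is a hypothetical helper name. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

This shows why 0x0000:0x7C00 and 0x07C0:0x0000 name the same byte (0x7C00), and why code that assumes all segment registers are zero can only reach the first 64 KiB through 16-bit offsets.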
Zero Fill the BSS Section
The BIOS boot sequence doesn't guarantee that memory is actually zero. This causes a potential problem for the zero initialized region BSS. Before calling into C code for the first time the region should be zero filled by our assembly code. The linker script I originally wrote defines a symbol __bss_start that is the offset of the BSS memory and __bss_sizeb is the size in bytes. Using this info you can use the STOSB instruction to easily zero fill it. At the top of hw.asm you can add:
extern __bss_sizeb
extern __bss_start
And after the CLD instruction and before calling any C code you can do the zero fill this way:
; Zero fill the BSS section
mov cx, __bss_sizeb ; Size of BSS computed in linker script
mov di, __bss_start ; Start of BSS defined in linker script
rep stosb ; AL still zero, Fill memory with zero
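For comparison, the same zero fill can be sketched in C. This is an illustration under assumptions, not a replacement for the assembly: it can only run once the C environment is usable, and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise zero fill, the C counterpart of `rep stosb` with AL = 0.
 * The volatile qualifier is there so the compiler does not turn the
 * loop into a call to memset, which a -nostdlib build may not provide.
 * zero_fill is a hypothetical helper name. */
void zero_fill(volatile uint8_t *start, size_t size_bytes)
{
    for (size_t i = 0; i < size_bytes; i++)
        start[i] = 0;
}
```

In the bootloader the start address and size would come from the linker script symbols __bss_start and __bss_sizeb. Doing the fill in assembly before the first call into C remains the safer choice, because the C code may already rely on zero-initialized variables.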
Other Suggestions
To reduce the bloat of the code generated by the compiler it can be useful to use -fomit-frame-pointer. Compiling with -Os can optimize for space (rather than speed). We have limited space (512 bytes) for the initial code loaded by the BIOS so these optimizations can be beneficial. The command line for compiling could appear as:
gcc -fno-PIC -fomit-frame-pointer -ffreestanding -m16 -Os -c kmain.c -o kmain.o
I write a boot loader in asm and want to add some compiled C code in my project.
Then you need to use a 16-bit x86 compiler, such as OpenWatcom.
GCC cannot safely build real-mode code, as it is unaware of some important features of the platform, including memory segmentation. Inserting the .code16 directive will make the compiler generate incorrect output. Despite appearing in many tutorials, this piece of advice is simply incorrect, and should not be used.
First, I want to show how to link compiled C code with an assembled file.
I put together several Q&As from SO and arrived at this.
C code:
func.c
// __asm__(".code16gcc\n"); // not used here: with 32-bit registers such as eax it causes truncation problems
#include <stdio.h>
int x = 0;
int madd(int a, int b)
{
return a + b;
}
void mexit(){
    /* exit(0): a single asm statement with operands, so the compiler knows which registers are used */
    __asm__ __volatile__("int $0x80" : : "a"(1), "b"(0));
}
/* Pass the arguments as asm operands instead of copying them through globals */
void print_str(int a, char* s){
    int ret;
    __asm__ __volatile__("int $0x80"   /* call kernel */
        : "=a"(ret)                    /* eax: return value of sys_write */
        : "0"(4),                      /* eax: system call number (sys_write) */
          "b"(1),                      /* ebx: first argument, file handle (stdout) */
          "c"(s),                      /* ecx: second argument, pointer to message to write */
          "d"(a)                       /* edx: third argument, message length */
        : "memory");
    (void)ret;
}
void mtest(){
printf("%s\n", "Hi");
//putchar('a');//why not work
}
///gcc -c func.c -o func
Assembly code:
hello.asm
extern mtest
extern printf
extern putchar
extern print_str
extern mexit
extern madd
section .text ;section declaration
;we must export the entry point to the ELF linker or
global _start ;loader. They conventionally recognize _start as their
;entry point. Use ld -e foo to override the default.
_start:
;write our string to stdout
push msg
push len
call print_str;
call mtest ;print "Hi"; call printf inside a void function
; use add inside func.c
push 5
push 10
call madd;
;direct call of <stdio.h> printf()
push eax
push format
call printf; ;printf(format, eax)
call mexit; ;exit to OS
section .data ;section declaration
format db "%d", 10, 0
msg db "Hello, world!",0xa ;our dear string
len equ $ - msg ;length of our dear string
; nasm -f elf32 hello.asm -o hello
;Link two files
;ld hello func -o hl -lc -I /lib/ld-linux.so.2
; ./hl run code
;chain to assemble, compile, Run
;; gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo &&./hl
Chain commands for assemble, compile and Run
gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo && ./hl
Edit[toDO]
Write boot loader code instead of this version
Some explanation on how ld, gcc, nasm works.
I would like to implement header files in my C code, which consists partly of GCC inline assembly for 16-bit real mode, but I seem to have linking problems. This is what my header file console.h looks like:
#ifndef CONSOLE_H
#define CONSOLE_H
extern void kprintf(char*);
#endif
and this is console.c:
#include "console.h"
void kprintf(char *string)
{
for(int i=0;string[i]!='\0';i++)
{
asm("mov $0x0e,%%ah;"
"mov $0x00,%%bh;"
"mov %0,%%al;"
"int $0x10"::"g"(string[i]):"eax", "ebx");
}
}
the last one hellworld.c:
asm("jmp main");
#include "console.h"
void main()
{
asm("mov $0x1000,%ax;"
"mov %ax,%es;"
"mov %ax,%ds");
char string[]="hello world";
kprintf(string);
asm(".rept 512;"
"hlt;"
".endr");
}
My bootloader is in bootloader.asm:
org 0x7c00
bits 16
section .text
mov ax,0x1000
mov ss,ax
mov sp,0x000
mov esp,0xfffe
xor ax,ax
mov es,ax
mov ds,ax
mov [bootdrive],dl
mov bh,0
mov bp,zeichen
mov ah,13h
mov bl,06h
mov al,1
mov cx,6
mov dh,010h
mov dl,01h
int 10h
load:
mov dl,[bootdrive]
xor ah,ah
int 13h
jc load
load2:
mov ax,0x1000
mov es,ax
xor bx,bx
mov ah,2
mov al,1
mov cx,2
xor dh,dh
mov dl,[bootdrive]
int 13h
jc load2
mov ax,0
mov es,ax
mov bh,0
mov bp,zeichen3
mov ah,13h
mov bl,06h
mov al,1
mov cx,13
mov dh,010h
mov dl,01h
int 10h
mov ax,0x1000
mov es,ax
mov ds,ax
jmp 0x1000:0x000
zeichen db 'hello2'
zeichen3 db 'soweit so gut'
bootdrive db 0
times 510 - ($-$$) hlt
dw 0xaa55
Now I use the following buildscript build.sh:
#!/bin/sh
nasm -f bin bootloader.asm -o bootloader.bin
gcc hellworld.c -m16 -c -o hellworld.o -nostdlib -ffreestanding
gcc console.c -m16 -c -o console.o -nostdlib link.ld -ffreestanding
ld -melf_i386 -Ttext=0x0000 console.o hellworld.o -o hellworld.elf
objcopy -O binary hellworld.elf hellworld.bin
cat bootloader.bin hellworld.bin >disk.img
qemu-system-i386 disk.img
and the linkscript link.ld:
/*
* link.ld
*/
OUTPUT_FORMAT(elf32-i386)
SECTIONS
{
. = 0x0000;
.text : { *(.startup); *(.text) }
.data : { *(.data) }
.bss : { *(.bss) }
}
Unfortunately it isn't working because it doesn't print the expected hello world. I think there must be something wrong with the linking command:
ld -melf_i386 -Ttext=0x0000 console.o hellworld.o link.ld -o hellworld.elf
How do I link header-files in 16-bit mode correctly?
When I write the kprintf function directly in hellworld.c, it works correctly. I am using Linux Mint Cinnamon 18 (64-bit) for development.
The header files are not really the issue at all. Restructuring the code and splitting it into multiple objects has exposed issues with how you build and with how jmp main is placed into the final kernel file.
I have created a set of files that make all the adjustments discussed below if you wish to test the complete set of changes to see if they rectify your problems.
Although you show the linker script, you aren't actually using it. In your build file you have:
ld -melf_i386 -Ttext=0x0000 console.o hellworld.o -o hellworld.elf
It should be:
ld -melf_i386 -Tlink.ld console.o hellworld.o -o hellworld.elf
When using -c (compiles but doesn't link) with GCC don't specify link.ld as a linker script. The linker script can be specified at link time when you invoke LD. This line:
gcc console.c -m16 -c -o console.o -nostdlib link.ld -ffreestanding
Should be:
gcc console.c -m16 -c -o console.o -nostdlib -ffreestanding
For this linker script to place the jmp main first in the output kernel file, you need to change:
asm("jmp main");
To:
asm(".pushsection .startup\r\n"
"jmp main\r\n"
".popsection\r\n");
The .pushsection temporarily changes the section to .startup, outputs the instruction jmp main and then restores the section with .popsection to whatever it was before. The linker script deliberately places anything in the .startup section before anything else. This ensures the jmp main (or any other instructions you place there) appear as the very first instructions of the output kernel file. The \r\n can be replaced by ; (semicolon). \r\n makes for prettier output if you ever have GCC generate an assembly file.
As mentioned in the comments of a now-deleted question, your kernel file exceeds the size of a single sector. When you don't have a linker script, the default one will place the data section after the code. Your code repeats the hlt instruction 512 times, so your kernel is larger than one sector (512 bytes), and your bootloader only reads a single sector with Int 13h/AH=2h.
To rectify this remove:
asm(".rept 512;"
"hlt;"
".endr");
And replace it with:
asm("cli;"
"hlt;");
You should be mindful that as your kernel grows you'll need to adjust the number of sectors read in bootloader.asm to ensure all of the kernel is loaded into memory.
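As a rough aid for that adjustment, the sketch below rounds a kernel size up to whole 512-byte sectors, which is the count the bootloader would have to pass in AL for Int 13h/AH=2h. The helper names sectors_needed and kernel_fits are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512u /* bytes per sector, as assumed by the bootloader */

/* Round a byte count up to whole sectors (hypothetical helper). */
size_t sectors_needed(size_t bytes)
{
    return bytes / SECTOR_SIZE + (bytes % SECTOR_SIZE != 0);
}

/* Nonzero if a bootloader that reads `sectors_read` sectors
   loads the whole kernel of `kernel_bytes` bytes. */
int kernel_fits(size_t kernel_bytes, size_t sectors_read)
{
    assert(sectors_read > 0); /* a loader that reads nothing is a bug */
    return sectors_needed(kernel_bytes) <= sectors_read;
}
```

For instance, if hellworld.bin were 600 bytes (an assumed figure), sectors_needed(600) gives 2, so the mov al,1 before the second int 13h in bootloader.asm would have to become mov al,2.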
To keep QEMU and other virtual machines happy, I also suggest that you generate a disk image of a well-known size and place the bootloader and kernel inside it. Rather than:
cat bootloader.bin hellworld.bin >disk.img
Use this:
dd if=/dev/zero of=disk.img bs=1024 count=1440
dd if=bootloader.bin of=disk.img seek=0 conv=notrunc
dd if=hellworld.bin of=disk.img seek=1 conv=notrunc
The first command creates a zero-filled file of 1440 KiB, the exact size of a 1.44MB floppy. The second command writes bootloader.bin into the first sector without truncating the disk image. The third command writes the kernel file into the disk image starting at the second sector (dd's default block size is 512 bytes, so seek=1 skips one sector), again without truncating it.
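To make the resulting layout concrete, here is a minimal in-memory sketch of what the two seek= commands do: a blob is copied to byte offset seek * 512 and everything else in the image is left alone. The function place_at_sector is a hypothetical illustration, not how dd is implemented, and unlike dd it refuses to grow the image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u /* dd's default block size, which seek= counts in */

/* Copy `len` bytes of `blob` into `img` starting at sector `seek`,
   leaving the rest of the image untouched (like conv=notrunc).
   Returns 0 on success, -1 if the blob would run past the image end. */
int place_at_sector(unsigned char *img, size_t img_size,
                    const unsigned char *blob, size_t len, size_t seek)
{
    assert(img != NULL && blob != NULL);
    if (seek > img_size / SECTOR_SIZE)
        return -1;
    size_t offset = seek * SECTOR_SIZE;
    if (len > img_size - offset)
        return -1;
    memcpy(img + offset, blob, len);
    return 0;
}
```

With the bootloader at sector 0 and the kernel at sector 1, the kernel's first byte lands at offset 512 of the image, which is the sector the bootloader reads with CX=2 (cylinder 0, sector 2).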
I have also made a slightly improved linker script available. It discards some of the cruft that the linker may otherwise insert into the kernel, and it explicitly places sections such as .rodata (read-only data).
/*
* link.ld
*/
OUTPUT_FORMAT(elf32-i386)
SECTIONS
{
. = 0x0000;
.text : { *(.startup); *(.text) }
.data : { *(.data); *(.rodata) }
.bss : { *(COMMON); *(.bss) }
/DISCARD/ : {
*(.eh_frame);
*(.comment);
*(.note.gnu.build-id);
}
}
Other Comments
Not related to your question but this code can be removed:
asm("mov $0x1000,%ax;"
"mov %ax,%es;"
"mov %ax,%ds");
You do this in bootloader.asm, so setting these segment registers again with the same value won't do anything useful.
You can improve the extended assembly template by using input constraints to pass the values you need via register EAX(AX) and EBX(BX) rather than coding the moves inside the template. Your code could have looked like:
void kprintf(const char *string)
{
while (*string)
{
asm("int $0x10"
:
:"a"((0x0e<<8) | *string++), /* AH = 0x0e, AL = char to print */
"b"(0)); /* BH = 0x00 page #
BL = 0x00 unused in text mode */
}
}
<< is the C bit shift left operator. 0x0e<<8 would shift 0x0e left 8 bits which would be 0x0e00. | is bitwise OR which effectively places the character to print in the lower 8 bits. That value is then passed into the EAX register by the assembly template via input constraint "a".
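The packing can be checked on any host compiler. The sketch below uses a hypothetical helper, pack_ax, that builds the same 16-bit value from its AH and AL halves; it is only an illustration of the shift-and-OR, not part of the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Combine AH (high byte) and AL (low byte) into the 16-bit AX value. */
uint16_t pack_ax(unsigned ah, unsigned al)
{
    assert(ah <= 0xFF && al <= 0xFF); /* each half is one byte */
    return (uint16_t)((ah << 8) | al);
}
```

pack_ax(0x0e, 'A') yields 0x0e41: function 0x0e in AH and the ASCII code of 'A' in AL. One caveat, assuming plain char is signed as it is with GCC on x86: a character above 0x7f would sign-extend and overwrite AH, so casting to unsigned char before the OR is the safer form.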
It is hard to say without knowing what your bootloader.asm does, but:
The link order must be wrong;
ld -melf_i386 -Ttext=0x0000 console.o hellworld.o -o hellworld.elf
should be:
ld -melf_i386 -Ttext=0x0000 hellworld.o console.o -o hellworld.elf
(Edit: I see that you have a linker script which would remove the need for this re-arrangement, but you're not using it for the link).
I suspect that your bootloader loads a single sector, and your padding:
asm(".rept 512;"
"hlt;"
".endr");
... prevents the code from the other object file from ever being loaded, since it pads hellworld.o to (more than) the size of a sector.
The problem has nothing to do with the use of header files. It arises because you have two compilation units which become separate objects, and their combined size when linked is larger than a sector (512 bytes).
I'm using Keil uVision with the GCC compiler (Sourcery CodeBench Lite for ARM EABI) to program the STM32F4 Cortex-M4 chip.
The compiler control strings I have set are:
-march=armv7e-m -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -std=gnu99 -fsingle-precision-constant
When the debugger encounters some mathematical functions (e.g. asinf(), atan2f() etc), it stops.
I have checked that the arguments for these functions are also single-precision.
I think it is because of some missing compiler directives for the use of VFP floating point, but was unable to identify it.
Is there anything I have missed out?
The disassembly code of an example I did:
The debugger can evaluate atan2f(0.3,0.4), but stops at 0x0803B9CA when it evaluates atan2f(a,b). I don't know why the call works with literal numbers but not with variables.
377: float a = 0.3;
0x0803B9BA 4B1E LDR r3,[pc,#120] ; #0x0803BA34
0x0803B9BC 63BB STR r3,[r7,#0x38]
378: float b = 0.4;
379:
0x0803B9BE 4B1E LDR r3,[pc,#120] ; #0x0803BA38
0x0803B9C0 637B STR r3,[r7,#0x34]
380: float c = atan2f(0.3,0.4);
0x0803B9C2 4B1E LDR r3,[pc,#120] ; #0x0803BA3C
0x0803B9C4 633B STR r3,[r7,#0x30]
381: float d = atan2f(a,b);
382:
0x0803B9C6 6BB8 LDR r0,[r7,#0x38]
0x0803B9C8 6B79 LDR r1,[r7,#0x34]
0x0803B9CA F004F993 BL.W atan2f (0x0803FCF4)
0x0803B9CE 62F8 STR r0,[r7,#0x2C]
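On the STM32F4 you first need to enable the FPU. Otherwise the first floating-point instruction faults: the core raises a UsageFault (NOCP, no coprocessor), which escalates to HardFault_Handler unless the UsageFault handler has been enabled. That would also fit your listing: atan2f(0.3,0.4) is folded into a constant at compile time (line 380 only loads and stores a value), whereas atan2f(a,b) really calls the library routine, which presumably executes FPU instructions.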
You can do it in C/C++ anywhere before you use floating point instructions (maybe at the beginning of main()?). Assuming you use the CMSIS library and have the core_m4.h included (maybe through stm32f4xx.h):
void cortexm4f_enable_fpu() {
/* set CP10 and CP11 Full Access */
SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));
}
The alternative is assembler code in the startup file:
/*enable fpu begin*/
ldr r0, =0xe000ed88 /*; enable cp10,cp11 */
ldr r1,[r0]
ldr r2, =0xf00000
orr r1,r1,r2
str r1,[r0]
/*enable fpu end*/
(I found the code somewhere on the internet, don't know where though. I used it myself, it works).
Maybe your problem is located there?
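The C version and the assembly version set the same bits. As a quick host-side check (it does not touch any hardware register), the sketch below rebuilds the mask from the 2-bit CPACR field of each coprocessor. The helper names cpacr_full_access and fpu_enable_mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask granting full access (0b11) to coprocessor `cp` in CPACR;
   each coprocessor owns the 2-bit field at bits [2*cp+1 : 2*cp]. */
uint32_t cpacr_full_access(unsigned cp)
{
    assert(cp <= 15); /* CPACR has fields for CP0..CP15 */
    return (uint32_t)3u << (cp * 2);
}

/* Combined mask for CP10 and CP11, the two fields that control the FPU. */
uint32_t fpu_enable_mask(void)
{
    return cpacr_full_access(10) | cpacr_full_access(11);
}
```

fpu_enable_mask() evaluates to 0x00F00000, the same constant the startup code loads into r2 before the orr, and 0xe000ed88 is the address that SCB->CPACR refers to in the C version.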