I'm trying to get familiar with the linking and startup procedures in ARM Cortex-M4 microcontrollers. Looking through the linker scripts almost all the sections are marked as loadable.
At first I thought that meant it would be copied from flash to RAM, but then I learned that doing that is handled in a different way. So what does it mean for a section in flash to be loadable? Isn't it already loaded and run from the location in flash? Also, I'm referring to a section section containing instructions.
Does loadable in this context mean loading by the debugger into the device?
a fully functional cortex-m program
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<10;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
(can use cortex-m4 either works)
so.list
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000011 andeq r0, r0, r1, lsl r0
8: 00000017 andeq r0, r0, r7, lsl r0
c: 00000017 andeq r0, r0, r7, lsl r0
00000010 <reset>:
10: f000 f804 bl 1c <notmain>
14: e7ff b.n 16 <hang>
00000016 <hang>:
16: e7fe b.n 16 <hang>
00000018 <dummy>:
18: 4770 bx lr
1a: 46c0 nop ; (mov r8, r8)
0000001c <notmain>:
1c: b510 push {r4, lr}
1e: 2400 movs r4, #0
20: 0020 movs r0, r4
22: 3401 adds r4, #1
24: f7ff fff8 bl 18 <dummy>
28: 2c0a cmp r4, #10
2a: d1f9 bne.n 20 <notmain+0x4>
2c: 2000 movs r0, #0
2e: bc10 pop {r4}
30: bc02 pop {r1}
32: 4708 bx r1
being a less complicated linker script and program the elf has less stuff
part of readelf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000034 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 010034 00002d 00 0 0 1
[ 3] .comment PROGBITS 00000000 010061 000011 01 MS 0 0 1
[ 4] .symtab SYMTAB 00000000 010074 0000f0 10 5 12 4
[ 5] .strtab STRTAB 00000000 010164 00003d 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 0101a1 00003a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00034 0x00034 R E 0x10000
hexdump -C so.bin
00000000 00 10 00 20 11 00 00 00 17 00 00 00 17 00 00 00 |... ............|
00000010 00 f0 04 f8 ff e7 fe e7 70 47 c0 46 10 b5 00 24 |........pG.F...$|
00000020 20 00 01 34 ff f7 f8 ff 0a 2c f9 d1 00 20 10 bc | ..4.....,... ..|
00000030 02 bc 08 47 |...G|
00000034
a "binary" comes in many flavors, it is an unfortunately poorly named term as it is so confusing. Everything you see in the disassembly above is required for the program to run, that is the real binary part of this, it has to be loaded wherever you need it loaded to run. if on an operating system then the operating system reads the elf file extracts the loadable sections with their addresses/offsets and loads them into memory before launching at the entry point. Take advantage of tools already having the elf file format and we can re-use some of that for microcontrollers, we cant/dont normally use the entry point as that makes no sense, we have to make the vector/entry point match what the hardware needs, in this case a vector table.
The hexdump coming from the objcopy also shows the parts we have to have visible to the processor for the program to run, or loaded in address/memory space (flash is in memory or address space in this case).
But the "binary" file, the elf also contains debug symbols in case you want to pull up a debugger, add some more options on the toolchain command line and you get even more information about where these items are in the source code file so you could in some cases see the high level langauge when single stepping through code. Those are not loadable sections, they are descriptive of the program or just there to help out, they are not machine code nor data that the processor needs to execute the program so they do not need to be loaded into the memory space.
yet another "binary" file format
arm-none-eabi-objcopy so.elf -O ihex so.hex
cat so.hex
:100000000010002011000000170000001700000081
:1000100000F004F8FFE7FEE77047C04610B5002483
:1000200020000134FFF7F8FF0A2CF9D1002010BCA2
:0400300002BC0847BF
:00000001FF
it has a little extra information but almost all of it is the part we have to load into the address space for the program to run
another binary file format
S00A0000736F2E7372656338
S1130000001000201100000017000000170000007D
S113001000F004F8FFE7FEE77047C04610B500247F
S113002020000134FFF7F8FF0A2CF9D1002010BC9E
S107003002BC0847BB
S9030000FC
also most of it is program.
elf is just another file format (fairly popular with gnu tools, but still just another file format), it contains the machine code and data required plus a bunch of other stuff, the machine code and data is what we have to load into ram if this is an operating system, or what we ideally load into flash for a microcontroller, but not all microcontrollers are equal some are ram only based and the program is downloaded via usb during enumeration (into ram). and other solutions, or if debugging you could probably have items loaded into ram depending on the mcu and tools, although that is not how the thing boots so that wouldnt really be a good binary.
if you feel the need to zero .bss and to have any .data then you need additional information, the offset and size of .bss and the offset and contents of .data, the bootstrap then zeros and copies those items, and you need that information in the non-volatile flash/rom, it is just more data that is required of the machine code and data needed to run the program. If you are letting others write the code for you then there are possibly tailored linker scripts and bootstrap code that allows you to just have .data items and push a build button on a gui and it all magically is in place when your entry point (main() by convention and or standard) starts execution or the code that represents your high level code at the C entry point.
Related
Well I have written a bootloader in assembly and trying to load a C kernel from it.
This is the bootloader:
bits 16
xor ax,ax
jmp 0x0000:boot
extern kernel_main
global boot
boot:
mov ah, 0x02 ; load second stage to memory
mov al, 1 ; numbers of sectors to read into memory
mov dl, 0x80 ; sector read from fixed/usb disk ;0 for floppy; 0x80 for hd
mov ch, 0 ; cylinder number
mov dh, 0 ; head number
mov cl, 2 ; sector number
mov bx, 0x8000 ; load into es:bx segment :offset of buffer
int 0x13 ; disk I/O interrupt
mov ax, 0x2401
int 0x15 ; enable A20 bit
mov ax, 0x3
int 0x10 ; set vga text mode 3
cli
lgdt [gdt_pointer] ; load the gdt table
mov eax, cr0
or eax,0x1 ; set the protected mode bit on special CPU reg cr0
mov cr0, eax
jmp CODE_SEG:boot2 ; long jump to the code segment
gdt_start:
dq 0x0
gdt_code:
dw 0xFFFF
dw 0x0
db 0x0
db 10011010b
db 11001111b
db 0x0
gdt_data:
dw 0xFFFF
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_pointer:
dw gdt_end - gdt_start
dd gdt_start
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
bits 32
boot2:
mov ax, DATA_SEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; mov esi,hello
; mov ebx,0xb8000
;.loop:
; lodsb
; or al,al
; jz haltz
; or eax,0x0100
; mov word [ebx], ax
; add ebx,2
; jmp .loop
;haltz:
;hello: db "Hello world!",0
mov esp,kernel_stack_top
jmp kernel_main
cli
hlt
times 510 -($-$$) db 0
dw 0xaa55
section .bss
align 4
kernel_stack_bottom: equ $
resb 16384 ; 16 KB
kernel_stack_top:
and this is the C kernel:
__asm__("cli\n");
void kernel_main(void){
const char string[] = "012345678901234567890123456789012345678901234567890123456789012";
volatile unsigned char* vid_mem = (unsigned char*) 0xb8000;
int j=0;
while(string[j]!='\0'){
*vid_mem++ = (unsigned char) string[j++];
*vid_mem++ = 0x09;
}
for(;;);
}
Now I am compiling both the source separately into an ELF output file.
And linking them through a linker script and output a raw binary file and load it with qemu.
Linker script:
ENTRY(boot)
OUTPUT_FORMAT("binary")
SECTIONS{
. = 0x7c00;
.boot1 : {
*(.boot)
}
.kernel : AT(0x7e00){
*(.text)
*(.rodata)
*(.data)
_bss_start = .;
*(.bss)
*(COMMON)
_bss_end = .;
*(.comment)
*(.symtab)
*(.shstrtab)
*(.strtab)
}
/DISCARD/ : {
*(.eh_frame)
}
}
with a build script:
nasm -f elf32 boot.asm -o boot.o
/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-gcc -m32 kernel.c -o kernel.o -e kernel_main -Ttext 0x0 -nostdlib -ffreestanding -std=gnu99 -mno-red-zone -fno-exceptions -nostdlib -Wall -Wextra
/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-ld boot.o kernel.o -o kernel.bin -T linker3.ld
qemu-system-x86_64 kernel.bin
But I am running into a little problem.
notice that string in the C kernel
const char string[] = "012345678901234567890123456789012345678901234567890123456789012";
when its size is equal to or less than 64 bytes (along with the null termination). then the program works correctly.
however when the string size increases from 64 bytes then the program seems to not work
I was trying to debug it myself and observed that when the string size is less than or equal to 64 bytes then the output ELF file, the kernel.o have following contents :
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x1
Start of program headers: 52 (bytes into file)
Start of section headers: 4412 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 4
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 001000 0000bd 00 AX 0 0 1
[ 2] .eh_frame PROGBITS 000000c0 0010c0 000034 00 A 0 0 4
[ 3] .comment PROGBITS 00000000 0010f4 000011 01 MS 0 0 1
[ 4] .shstrtab STRTAB 00000000 001105 000034 00 0 0 1
[ 5] .symtab SYMTAB 00000000 001254 0000a0 10 6 6 4
[ 6] .strtab STRTAB 00000000 0012f4 00002e 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x00000000 0x00000000 0x000f4 0x000f4 R E 0x1000
Section to Segment mapping:
Segment Sections...
00 .text .eh_frame
There is no dynamic section in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Intel 80386 is not currently supported.
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 000000c0 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 FILE LOCAL DEFAULT ABS kernel.c
5: 00000000 0 FILE LOCAL DEFAULT ABS
6: 00000001 188 FUNC GLOBAL DEFAULT 1 kernel_main
7: 000010f4 0 NOTYPE GLOBAL DEFAULT 2 __bss_start
8: 000010f4 0 NOTYPE GLOBAL DEFAULT 2 _edata
9: 000010f4 0 NOTYPE GLOBAL DEFAULT 2 _end
No version information found in this file.
However when the size of the string is more than 64 bytes the contents are like this:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x1
Start of program headers: 52 (bytes into file)
Start of section headers: 4432 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 8
Section header string table index: 5
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 001000 000083 00 AX 0 0 1
[ 2] .rodata PROGBITS 00000084 001084 000041 00 A 0 0 4
[ 3] .eh_frame PROGBITS 000000c8 0010c8 000038 00 A 0 0 4
[ 4] .comment PROGBITS 00000000 001100 000011 01 MS 0 0 1
[ 5] .shstrtab STRTAB 00000000 001111 00003c 00 0 0 1
[ 6] .symtab SYMTAB 00000000 001290 0000b0 10 7 7 4
[ 7] .strtab STRTAB 00000000 001340 00002e 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x00000000 0x00000000 0x00100 0x00100 R E 0x1000
Section to Segment mapping:
Segment Sections...
00 .text .rodata .eh_frame
There is no dynamic section in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Intel 80386 is not currently supported.
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 00000084 0 SECTION LOCAL DEFAULT 2
3: 000000c8 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 FILE LOCAL DEFAULT ABS kernel.c
6: 00000000 0 FILE LOCAL DEFAULT ABS
7: 00000001 130 FUNC GLOBAL DEFAULT 1 kernel_main
8: 00001100 0 NOTYPE GLOBAL DEFAULT 3 __bss_start
9: 00001100 0 NOTYPE GLOBAL DEFAULT 3 _edata
10: 00001100 0 NOTYPE GLOBAL DEFAULT 3 _end
No version information found in this file.
I noticed that the string is now in the .rodata section with a size of 41 hex or 65 bytes, which has to be mapped to a segment, possibly the 0th segment which is NULL.
And that the program is unable to find the .rodata.
I am unable to make it work. I understand the ELF structure but I don't know how to work with them.
Two serious issues cause most of the problems are:
You load the second sector of the disk to 0x0000:0x8000 when all of the code expect the kernel to be loaded after the bootloader at 0x0000:0x7e00
You compile your kernel.c straight to an executable name kernel.o. You should compile it to a proper object file so it can go through the expected linking phase when you run ld.
To fix the problem with the kernel being loaded into memory at the wrong memory location, change:
mov bx, 0x8000 ; load into es:bx segment :offset of buffer
to:
mov bx, 0x7e00 ; load into es:bx segment :offset of buffer
To fix the issue of compiling kernel.cto an executable ELF file named kernel.o remove the -e kernel_main -Ttext 0x0 and replace it with -c. -c option forces GCC to produce an object file that can be properly linked with LD. Change:
/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-gcc -m32 kernel.c -o kernel.o -e kernel_main -Ttext 0x0 -nostdlib -ffreestanding -std=gnu99 -mno-red-zone -fno-exceptions -nostdlib -Wall -Wextra
to:
/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-gcc -m32 -c kernel.c -o kernel.o -nostdlib -ffreestanding -std=gnu99 -mno-red-zone -fno-exceptions -Wall -Wextra
Reason for Failure with Longer Strings
The reason the string with less than 64 bytes worked is because the compiler generated code in a position independent way by initializing the array on the stack with immediate values. When the size reached 64 bytes the compiler placed the string into the .rodata section and then initialized the array on the stack by copying it from the .rodata. This made your code position dependent. Your code was loaded at the wrong offsets and had incorrect origin points yielding code referencing incorrect addresses, so it failed.
Other Observations
You should initialize your BSS (.bss) section to 0 before calling kernel_main. This can be done in assembly by iterating through all the bytes from offset _bss_start to offset _bss_end.
The .comment section will be emitted into your binary file wasting bytes as a result. You should put it in the /DISCARD/ section.
You should place the BSS section in your linker script after all the others so it doesn't take up space in kernel.bin
In boot.asm you should set SS:SP (stack pointer) near the beginning before reading disk sectors. It should be set to a place that won't interfere with your code. This is especially important when reading data into memory from disk since you don't know where the BIOS placed the current stack. You don't want to read on top of the current stack area. Setting it just below the bootloader at 0x0000:0x7c00 should work.
Before calling into C code you should clear the direction flag to ensure string instructions use forward movement. You can do this by using the CLD instruction.
In boot.asmyou can make your code more generic by using the boot drive number passed by the BIOS in the DL register rather than hard coding it to the value 0x80 (0x80 being the first hard drive)
You might consider turning on optimization with -O3, or using optimization level -Os to optimize for code size.
Your linker script doesn't quite work the way you expect although it produces the correct results. You never declared .boot section in your NASM file so nothing actually gets placed in the .boot1 output section in the linker script. It works because it gets included in the .text section in the .kernel output section.
It is preferable to remove the padding and boot signature from the assembly file and move it to the linker script
Instead of having your linker script output a binary file directly, it is more useful to output to the default ELF executable format. You can then use OBJCOPY to convert the ELF file to a binary file. This allows you to build with debug information which will appear as part of the ELF executable. The ELF executable can be used to symbolically debug your binary kernel in QEMU.
Rather than use LD directly for linking, use GCC. This has the advantage that the libgcc library can be added without specifying the full path to the library. libgcc is a set of routines that may be needed for C code generation with GCC
Revised source code, linker script and build commands with the observations above taken into account:
boot.asm:
bits 16
section .boot
extern kernel_main
extern _bss_start
extern _bss_len
global boot
jmp 0x0000:boot
boot:
; Place realmode stack pointer below bootloader where it doesn't
; get in our way
xor ax, ax
mov ss, ax
mov sp, 0x7c00
mov ah, 0x02 ; load second stage to memory
mov al, 1 ; numbers of sectors to read into memory
; Remove this, DL is already set by BIOS to current boot drive number
; mov dl, 0x80 ; sector read from fixed/usb disk ;0 for floppy; 0x80 for hd
mov ch, 0 ; cylinder number
mov dh, 0 ; head number
mov cl, 2 ; sector number
mov bx, 0x7e00 ; load into es:bx segment :offset of buffer
int 0x13 ; disk I/O interrupt
mov ax, 0x2401
int 0x15 ; enable A20 bit
mov ax, 0x3
int 0x10 ; set vga text mode 3
cli
lgdt [gdt_pointer] ; load the gdt table
mov eax, cr0
or eax,0x1 ; set the protected mode bit on special CPU reg cr0
mov cr0, eax
jmp CODE_SEG:boot2 ; long jump to the code segment
gdt_start:
dq 0x0
gdt_code:
dw 0xFFFF
dw 0x0
db 0x0
db 10011010b
db 11001111b
db 0x0
gdt_data:
dw 0xFFFF
dw 0x0
db 0x0
db 10010010b
db 11001111b
db 0x0
gdt_end:
gdt_pointer:
dw gdt_end - gdt_start
dd gdt_start
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start
bits 32
boot2:
mov ax, DATA_SEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; Zero out the BSS area
cld
mov edi, _bss_start
mov ecx, _bss_len
xor eax, eax
rep stosb
mov esp,kernel_stack_top
call kernel_main
cli
hlt
section .bss
align 4
kernel_stack_bottom: equ $
resb 16384 ; 16 KB
kernel_stack_top:
kernel.c:
void kernel_main(void){
const char string[] = "01234567890123456789012345678901234567890123456789012345678901234";
volatile unsigned char* vid_mem = (unsigned char*) 0xb8000;
int j=0;
while(string[j]!='\0'){
*vid_mem++ = (unsigned char) string[j++];
*vid_mem++ = 0x09;
}
for(;;);
}
linker3.ld:
ENTRY(boot)
SECTIONS{
. = 0x7c00;
.boot1 : {
*(.boot);
}
.sig : AT(0x7dfe){
SHORT(0xaa55);
}
. = 0x7e00;
.kernel : AT(0x7e00){
*(.text);
*(.rodata*);
*(.data);
_bss_start = .;
*(.bss);
*(COMMON);
_bss_end = .;
_bss_len = _bss_end - _bss_start;
}
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
Commands to build this bootloader and kernel:
nasm -g -F dwarf -f elf32 boot.asm -o boot.o
i686-elf-gcc -g -O3 -m32 kernel.c -c -o kernel.o -ffreestanding -std=gnu99 \
-mno-red-zone -fno-exceptions -Wall -Wextra
i686-elf-gcc -nostdlib -Wl,--build-id=none -T linker3.ld boot.o kernel.o \
-lgcc -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
To symbolically debug the 32-bit kernel with QEMU you can launch QEMU this way:
qemu-system-i386 -fda kernel.bin -S -s &
gdb kernel.elf \
-ex 'target remote localhost:1234' \
-ex 'break *kernel_main' \
-ex 'layout src' \
-ex 'continue'
This will start up your kernel.bin file in QEMU and then remotely connect the GDB debugger. The layout should show the source code and break on kernel_main.
In the LPC4088 user manual (p. 876) we can read that LPC4088 microcontroler has a really extraordinary startup procedure:
This looks like a total nonsense and I need someone to help me clear things out... In the world of ARM I've heard countless times to put vector table looking like this:
reset: b _start
undefined: b undefined
software_interrupt: b software_interrupt
prefetch_abort: b prefetch_abort
data_abort: b data_abort
nop
interrupt_request: b interrupt_request
fast_interrupt_request: b fast_interrupt_request
exactly at location 0x00000000 in my binary file, but why would we do that if this location is shadowed at boot with a boot ROM vector table which can't even be changed as it is read-only?! So where can we put our own vector table? I thought about putting it at 0x1FFF0000 so it would be transferred to location 0x00000000 at reset but can't do that because of read-only area...
Now to the second part. ARM expects to find exactly 8 vectors at 0x00000000 and at reset boot ROM checks if sum of 8 vectors is zero and only if this is true user code executes. To pass this check we need to sum up first 7 vectors and save it's 2's complement to the last vector which is a vector for fast interrupt requests residing at 0x0000001C. Well this is only true if your code is 4-bytes aligned (ARM encoding) but is it still true if your code is 2-bytes aligned (Thumb encoding) which is the case with all Cortex-M4 cores that can only execute Thumb encoded opcodes... So why did they explicitly mention that 2's complement of the sum has to be at 0x0000001C when this will never come in to play with Cortex-M4. Is 0x0000000E the proper address to save the 2's complement to?
And third part. Why would boot ROM even check if sum of first 8 vectors is zero when they are already in boot ROM?! And are read-only!
Can you see something is weird here? I need someone to explain to me the unclarities in the above three paragraphs...
you need to read the arm documentation as well as the nxp documentation. The non-cortex-m cores boot differently than the cortex-m cores you keep getting stuck there.
The cortex m is documented in the armv7m ARM ARM (architectural reference manual). It is based on VECTORS not INSTRUCTIONS. An address to the handler not an instruction like in full sized arm cores. Exception 7 is documented as reserved (for the ARM7TDMI based mcus from them it was the reserved vector they used for this checksum as well). Depending on the arm core you are using they expect as many as 144 or 272 (exceptions plus up to 128 or 256 interrupts depending on what the core supports).
(note the aarch64 processor, armv8 in 64 bit mode also boots differently than the traditional full sized 32 bit arm processor, even bigger table).
This checksum thing is classic NXP and makes sense, no reason to launch into an erased or not properly prepared flash and brick or hang.
.cpu cortex-m0
.thumb
.thumb_func
.globl _start
_start:
.word 0x20001000 # 0 SP load
.word reset # 1 Reset
.word hang # 2 NMI
.word hang # 3 HardFault
.word hang # 4 MemManage
.word hang # 5 BusFault
.word hang # 6 UsageFault
.word 0x00000000 # 7 Reserved
.thumb_func
hang: b hang
.thumb_func
reset:
b hang
which gives:
Disassembly of section .text:
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000023 andeq r0, r0, r3, lsr #32
8: 00000021 andeq r0, r0, r1, lsr #32
c: 00000021 andeq r0, r0, r1, lsr #32
10: 00000021 andeq r0, r0, r1, lsr #32
14: 00000021 andeq r0, r0, r1, lsr #32
18: 00000021 andeq r0, r0, r1, lsr #32
1c: 00000000 andeq r0, r0, r0
00000020 <hang>:
20: e7fe b.n 20 <hang>
00000022 <reset>:
22: e7fd b.n 20 <hang>
now make an ad-hoc tool that does the checksum and adds it to the binary
Looking at the above program as words this is the program:
0x20001000
0x00000023
0x00000021
0x00000021
0x00000021
0x00000021
0x00000021
0xDFFFEF38
0xE7FDE7FE
and if you flash it the bootloader should be happy with it and let it run.
Now that is assuming the checksum is word based if it is byte based then you would want a different number.
99% of baremetal programming is reading and research. If you had a binary from them already built or used a sandbox that supports this processor or family you could examine the binary built and see how all of this works. Or look at someones github examples or blog to see how this works. They did document this, and they have used this scheme for many years now before they were NXP, so nothing really new...Now is it a word based or byte based checksum, the documentation implies word based and that makes more sense. but a simple experiment and/or looking at sandbox produced binaries would have resolved that.
How I did it for this answer.
#include <stdio.h>
unsigned int data[8]=
{
0x20001000,
0x00000023,
0x00000021,
0x00000021,
0x00000021,
0x00000021,
0x00000021,
0x00000000,
};
int main ( void )
{
unsigned int ra;
unsigned int rb;
rb=0;
for(ra=0;ra<7;ra++)
{
rb+=data[ra];
}
data[7]=(-rb);
rb=0;
for(ra=0;ra<8;ra++)
{
rb+=data[ra];
printf("0x%08X 0x%08X\n",data[ra],rb);
}
return(0);
}
output:
0x20001000 0x20001000
0x00000023 0x20001023
0x00000021 0x20001044
0x00000021 0x20001065
0x00000021 0x20001086
0x00000021 0x200010A7
0x00000021 0x200010C8
0xDFFFEF38 0x00000000
then cut and pasted stuff into the answer.
How I have done it in the past is make an adhoc util that I call from my makefile that operates on the objcopied .bin file and either modifies that one or creates a new .bin file that has the checksum applied. You should be able to write that in 20-50 lines of code, choose your favorite language.
another comment question:
.cpu cortex-m0
.thumb
.word one
.word two
.word three
.thumb_func
one:
nop
two:
.thumb_func
three:
nop
Disassembly of section .text:
00000000 <one-0xc>:
0: 0000000d andeq r0, r0, sp
4: 0000000e andeq r0, r0, lr
8: 0000000f andeq r0, r0, pc
0000000c <one>:
c: 46c0 nop ; (mov r8, r8)
0000000e <three>:
e: 46c0 nop ; (mov r8, r8)
the .thumb_func affects the label AFTER...
GCC and Clang compilers seem to employ some dark magic. The C code just negates the value of a double, but the assembler instructions involve bit-wise XOR and the instruction pointer. Can somebody explain what is happening and why is it an optimal solution. Thank you.
Contents of test.c:
void function(double *a, double *b) {
*a = -(*b); // This line.
}
The resulting assembler instructions:
(gcc)
0000000000000000 <function>:
0: f2 0f 10 06 movsd xmm0,QWORD PTR [rsi]
4: 66 0f 57 05 00 00 00 xorpd xmm0,XMMWORD PTR [rip+0x0] # c <function+0xc>
b: 00
c: f2 0f 11 07 movsd QWORD PTR [rdi],xmm0
10: c3 ret
(clang)
0000000000000000 <function>:
0: f2 0f 10 06 movsd xmm0,QWORD PTR [rsi]
4: 0f 57 05 00 00 00 00 xorps xmm0,XMMWORD PTR [rip+0x0] # b <function+0xb>
b: 0f 13 07 movlps QWORD PTR [rdi],xmm0
e: c3 ret
The assembler instruction at address 0x4 represents "This line", however I can't understand how it works. The xorpd/xorps instructions are supposed to be bit-wise XOR and PTR [rip] is the instruction pointer.
I suspect that at the moment of execution rip is pointing somewhere near the 0f 57 05 00 00 00 0f strip of bytes, but I can't quite figure out, how is this working and why do both compilers choose this approach.
P.S. I should point out that this is compiled using -O3
for me the output of gcc with the -S -O3 options for the same code is:
.file "test.c"
.text
.p2align 4,,15
.globl function
.type function, #function
function:
.LFB0:
.cfi_startproc
movsd (%rsi), %xmm0
xorpd .LC0(%rip), %xmm0
movsd %xmm0, (%rdi)
ret
.cfi_endproc
.LFE0:
.size function, .-function
.section .rodata.cst16,"aM",#progbits,16
.align 16
.LC0:
.long 0
.long -2147483648
.long 0
.long 0
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",#progbits
here the xorpd instruction uses instruction pointer relative addressing with the offset which points to .LC0 label with the 64 bit value 0x8000000000000000(the 63rd bit is set to one).
.LC0:
.long 0
.long -2147483648
if your compiler was big endian these lines where swaped.
xoring the double value with 0x8000000000000000 sets the sign bit(which is the 63rd bit) to one for a negative value.
clang uses xorps instruction for the same manner this xors the first 32bit of the double value.
if you run object dump with -r option it will show you the relocations that should be done on the program before running it.
objdump -d test.o -r
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <function>:
0: f2 0f 10 06 movsd (%rsi),%xmm0
4: 66 0f 57 05 00 00 00 xorpd 0x0(%rip),%xmm0 # c <function+0xc>
b: 00
8: R_X86_64_PC32 .LC0-0x4
c: f2 0f 11 07 movsd %xmm0,(%rdi)
10: c3 retq
Disassembly of section .text.startup:
0000000000000000 <main>:
0: 31 c0 xor %eax,%eax
2: c3 retq
here at <function + 0xb> we have a relocation of type R_X86_64_PC32.
PS: I'm using gcc 6.3.0
xorps xmm0,XMMWORD PTR [rip+0x0]
Any part of an instruction surrounded by [] is an indirect reference to memory.
In this case a reference to the memory at address RIP+0
(I doubt it is actually RIP+0, you might have edited the actual offset)
The X64 instruction set adds instruction pointer relative addressing. This means you can have (usually read-only) data in your program that you can address easily even if the program is moved around in memory.
A XOR xmm0,Y inverts all bits in xmm0 that are set in Y.
Negation involves inverting the sign bit, so that's why xor is used. Specifically xorpd/s because we are dealing with double resp. single floats.
I am trying to make an embedded system. I have some C code, however, before the main function runs, some pre-initialization is needed. Is there a way to tell the gcc compiler, that a certain function is to be put in the .init section rather than the .text section?
this is the code:
#include <stdint.h>
#define REGISTERS_BASE 0x3F000000
#define MAIL_BASE 0xB880 // Base address for the mailbox registers
// This bit is set in the status register if there is no space to write into the mailbox
#define MAIL_FULL 0x80000000
// This bit is set in the status register if there is nothing to read from the mailbox
#define MAIL_EMPTY 0x40000000
struct Message
{
uint32_t messageSize;
uint32_t requestCode;
uint32_t tagID;
uint32_t bufferSize;
uint32_t requestSize;
uint32_t pinNum;
uint32_t on_off_switch;
uint32_t end;
};
struct Message m =
{
.messageSize = sizeof(struct Message),
.requestCode =0,
.tagID = 0x00038041,
.bufferSize = 8,
.requestSize =0,
.pinNum = 130,
.on_off_switch = 1,
.end = 0,
};
void _start()
{
__asm__
(
"mov sp, #0x8000 \n"
"b main"
);
}
/** Main function - we'll never return from here */
int main(void)
{
uint32_t mailbox = MAIL_BASE + REGISTERS_BASE + 0x18;
volatile uint32_t status;
do
{
status = *(volatile uint32_t *)(mailbox);
}
while((status & 0x80000000));
*(volatile uint32_t *)(MAIL_BASE + REGISTERS_BASE + 0x20) = ((uint32_t)(&m) & 0xfffffff0) | (uint32_t)(8);
while(1);
}
EDIT: using __attribute__(section("init")) doesn't seem to be working
Dont understand why you think you need a .init section for baremetal. A complete working example for a pi zero (using .init)
start.s
.section .init
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
so.c
unsigned int data=5;
unsigned int bss;
unsigned int centry ( void )
{
return(0);
}
so.ld
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.init : { *(.init*) } > ram
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
build
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -T so.ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Disassembly of section .init:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
Disassembly of section .text:
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
Note if you do it right you dont need to init .bss in the bootstrap (put .data after .bss and make sure there is at least one item in .data)
hexdump -C so.bin
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 00 00 a0 e3 |................|
00000010 1e ff 2f e1 00 00 00 00 05 00 00 00 |../.........|
0000001c
if you want them in separate places then yes your linker script gets instantly more complicated as well as your bootstrap (with lots of room for error).
The only thing the extra work that .init buys you here IMO is that you can re-arrange the linker command line
arm-none-eabi-ld -T so.ld so.o start.o -o so.elf
get rid of .init all together
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
No issues, works fine. Just have to know that with gnu ld (and probably others) if you dont call out something in the linker script then it fills things in in the order presented (on the command line).
Whether or not a compiler uses sections or what they are named is compiler specific, so you have to dig into the compiler specific options to see if there are any to change the defaults. Using C to bootstrap C is more work than it is worth, gcc will accept assembly files if it is a Makefile issue you are having problems with, very rarely is there a reason to use inline assembly when you can use real assembly and have something more reliable and maintainable. In real assembly then these things are trivial.
.section .helloworld
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
Disassembly of section .helloworld:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: ebfffffd bl 8000 <_start>
8008: eafffffe b 8008 <bss>
Disassembly of section .text:
00008000 <centry>:
8000: e3a00000 mov r0, #0
8004: e12fff1e bx lr
Disassembly of section .bss:
00008008 <bss>:
8008: 00000000 andeq r0, r0, r0
Disassembly of section .data:
0000800c <data>:
800c: 00000005 andeq r0, r0, r5
real assembly is generally used for bootstrapping, no compiler games required, no needing to go back and maintain the code every so often because of compiler games, porting is easier, etc.
I am trying to use gcc for embedded program generation. The gcc came from the Mingw toolset,
which is the same gcc I use to generate regular applications under Windows. I am trying to
generate a straight binary, located at 0x1000, for use in an embedded i386/32 bit system.
It gets tested under Bochs.
My linker file looks like this:
/* a list of files to link */
INPUT(test.o)
/* output format */
OUTPUT_FORMAT("pe-i386")
/* output filename */
/* OUTPUT_FILENAME("test.out") */
/* list of our memory sections */
MEMORY
{
ram : o = 0x1000, l = 256k
}
/* place sections */
SECTIONS
{
/* code and constants */
.text :
{
__text_start = .;
*(.text)
*(.strings)
__text_end = .;
} > ram
/* initialized data */
.data :
{
__data_start = .;
*(.data)
__data_end = .;
} > ram
/* uninitialized data */
.bss :
{
__bss_start = . ;
*(.bss)
*(COMMON)
__bss_end = . ;
} > ram
}
Basically, it means concatenate everything to a target address in ram at 0x1000.
If I select any target besides a PE file I get:
Cannot perform PE operations on a non PE file...
Which other messages here say is a known issue with ld on windows. So I perform another
step to create a binary using objcopy, which works fine.
Here are the output sections:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000038 ffc01000 ffc01000 00001000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rdata 00000010 ffc02000 ffc02000 00002000 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .eh_frame 00000038 ffc03000 ffc03000 00003000 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
So the sections are being aligned on a page frame address (4kb), which I like.
Here is the disassembly of the .text frame:
test.pe: file format pe-i386
Disassembly of section .text:
ffc01000 <_crap>:
ffc01000: 55 push ebp
ffc01001: 89 e5 mov ebp,esp
ffc01003: 83 ec 10 sub esp,0x10
ffc01006: c7 45 f8 00 20 00 00 mov DWORD PTR [ebp-0x8],0x2000
ffc0100d: c7 45 fc 00 80 0b 00 mov DWORD PTR [ebp-0x4],0xb8000
ffc01014: eb 14 jmp ffc0102a <_crap+0x2a>
ffc01016: 8b 45 f8 mov eax,DWORD PTR [ebp-0x8]
ffc01019: 8a 00 mov al,BYTE PTR [eax]
ffc0101b: 66 98 cbw
ffc0101d: 8b 55 fc mov edx,DWORD PTR [ebp-0x4]
ffc01020: 66 89 02 mov WORD PTR [edx],ax
ffc01023: 83 45 fc 02 add DWORD PTR [ebp-0x4],0x2
ffc01027: ff 45 f8 inc DWORD PTR [ebp-0x8]
ffc0102a: 83 7d f8 00 cmp DWORD PTR [ebp-0x8],0x0
ffc0102e: 75 e6 jne ffc01016 <_crap+0x16>
ffc01030: b8 00 00 00 00 mov eax,0x0
ffc01035: c9 leave
ffc01036: c3 ret
ffc01037: 90 nop
So I have no idea how or why it located the code at 0xffc01000, that is not what I
specified in the linker instruction file.
With the idea that this was some artifact of the PE file format, I tried to select
other output formats like elf, but if I do that I hit the "cannot perform PE operations
on non PE output file" bug.
The questions are:
Why does it locate to 0xffc01000? This doesn't appear to make sense even in the context
of a Windows program.
Is there a way to properly locate the program?
Do I need to use a i386 gcc build that is not tied to Windows?
Thank you,
Scott Moore