Getting a label address into a register in Thumb assembly - ARMv5 - gcc

I am trying to get the address of a label in Thumb assembly and I am having some trouble.
I already read this post, but it does not help me, and I will explain why.
I am writing a simple program in Thumb assembly (unfortunately I cannot use Thumb-2).
Consider this code:
.arch armv5te
.syntax unified
.text
.thumb
.thumb_func
thumbnow:
PUSH {LR}              @ 0x0
LDR R0, =loadValues    @ 0x2
POP {PC}               @ 0x4
.align
loadValues:
.word 0xdeadbee1       @ 0x8
.word 0xdeadbee2       @ 0xC
.word 0xdeadbee3       @ 0x10
I am using the arm-linux-gnueabi toolchain to assemble that.
My microcontroller doesn't have an MMU, so the memory addresses are static: no virtual pages etc.
What I am trying to do is make R0 hold the value 0x8 here, so that I can then access the three words like this:
LDR R1, [R0]
LDR R2, [R0,#4]
LDR R3, [R0,#8]
This does not seem possible with LDR, though, because the value does not fit in a MOV instruction. The assembler documentation states that if the value cannot fit in a MOV instruction, it is placed in a literal pool.
So my question is: is it possible in Thumb assembly to get the actual address of the label if the value of that address cannot fit in a MOV instruction?

Starting with this
.thumb
ldr r0,=hello
adr r0,hello
nop
nop
nop
nop
hello:
.word 0,1,2,3
gives this unlinked
00000000 <hello-0xc>:
0: 4806 ldr r0, [pc, #24] ; (1c <hello+0x10>)
2: a002 add r0, pc, #8 ; (adr r0, c <hello>)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
0000000c <hello>:
c: 00000000 andeq r0, r0, r0
10: 00000001 andeq r0, r0, r1
14: 00000002 andeq r0, r0, r2
18: 00000003 andeq r0, r0, r3
1c: 0000000c andeq r0, r0, r12
linked
00001000 <hello-0xc>:
1000: 4806 ldr r0, [pc, #24] ; (101c <hello+0x10>)
1002: a002 add r0, pc, #8 ; (adr r0, 100c <hello>)
1004: 46c0 nop ; (mov r8, r8)
1006: 46c0 nop ; (mov r8, r8)
1008: 46c0 nop ; (mov r8, r8)
100a: 46c0 nop ; (mov r8, r8)
0000100c <hello>:
100c: 00000000 andeq r0, r0, r0
1010: 00000001 andeq r0, r0, r1
1014: 00000002 andeq r0, r0, r2
1018: 00000003 andeq r0, r0, r3
101c: 0000100c andeq r1, r0, r12
Either way, r0 ends up holding the address of the start of the data, and you can then offset into that data from the caller or wherever.
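To check the two encodings by hand, the address each instruction refers to can be computed from the instruction address and the low byte of the encoding. This is only a sketch in C, with a function name made up for the illustration; it assumes the 16-bit Thumb forms shown in the listings, where the low 8 bits hold a word offset from the word-aligned PC.

```c
#include <stdint.h>

/* Address referenced by a 16-bit Thumb "ldr rt, [pc, #imm]" or
 * "adr rd, label" (add rd, pc, #imm) located at insn_addr.
 * Hypothetical helper: both encodings keep the offset, counted in
 * words, in their low 8 bits. */
uint32_t thumb_pc_rel_target(uint32_t insn_addr, uint16_t encoding)
{
    uint32_t pc   = insn_addr + 4;                  /* PC reads 4 bytes ahead in Thumb state */
    uint32_t base = pc & ~(uint32_t)3;              /* base is the PC rounded down to a word */
    uint32_t imm  = (uint32_t)(encoding & 0xFF) * 4;
    return base + imm;
}
```

For the linked listing, thumb_pc_rel_target(0x1000, 0x4806) gives 0x101c, the literal-pool slot that holds 0x0000100c, while thumb_pc_rel_target(0x1002, 0xa002) gives 0x100c, the address of hello itself.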
Edit
.thumb
adr r0,hello
nop
nop
nop
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:2: Error: address calculation needs a strongly defined nearby symbol
So the tool won't turn that into a load from the pool for you.
For what you want to do, I think the PC-relative add (adr) is the best you are going to get. You can try other toolchains, since all of this is toolchain specific: assembly language is defined by the assembler, not the target, so each toolchain's assembler can differ in the language it accepts. Within GNU, how the linker and assembler work together has also changed over time; the linker now patches up things it didn't use to.
You could of course go into the linker and add code to perform this optimization. The problem is most likely that by link time the linker only expects to resolve an address in the pool, which is easy for it because it doesn't have to change the instruction. The assembler would have to leave information for the linker saying that this is not just a "fill this memory location with an address" fixup. So you would modify gas to allow adr to work here, and if the linker can't resolve the offset within the instruction, it bails out with an error.
Or you could just hard-code what you want and maintain it. I am not sure why the adr solution isn't adequate.
mov r0,#8 is a valid thumb instruction.

Related

Does arm-none-eabi-ld rewrite the bl instruction?

I'm trying to understand why some Cortex-M0 code behaves differently when it is linked versus unlinked. In both cases it is loaded to 0x20000000. It looks like despite my best efforts to generate position independent code by passing -fPIC to the compiler, the bl instruction appears to differ after the code has passed through the linker. Am I reading this correctly, is that just a part of the linker's job in ARM Thumb, and is there a better way to generate a position independent function call?
Linked:
20000000:
20000000: 0003 movs r3, r0
20000002: 4852 ldr r0, [pc, #328]
20000004: 4685 mov sp, r0
20000006: 0018 movs r0, r3
20000008: f000 f802 bl 20000010
2000000c: 46c0 nop ; (mov r8, r8)
2000000e: 46c0 nop ; (mov r8, r8)
Unlinked:
00000000:
0: 0003 movs r3, r0
2: 4852 ldr r0, [pc, #328]
4: 4685 mov sp, r0
6: 0018 movs r0, r3
8: f7ff fffe bl 10
c: 46c0 nop ; (mov r8, r8)
e: 46c0 nop ; (mov r8, r8)
start.s
.globl _start
_start:
.word 0x20001000
.word reset
.word hang
.word hang
.thumb
.thumb_func
reset:
bl notmain
.thumb_func
hang:
b .
notmain.c
unsigned int x;
unsigned int fun ( unsigned int );
void notmain ( void )
{
x=fun(x+5);
}
fun.c
unsigned int y;
unsigned int fun ( unsigned int z )
{
return(y+z+1);
}
memmap
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -fPIC -O2 -c -mthumb fun.c -o fun.o
arm-none-eabi-gcc -fPIC -O2 -c -mthumb notmain.c -o notmain.o
arm-none-eabi-ld -T memmap start.o notmain.o fun.o -o so.elf
produces
20000000 <_start>:
20000000: 20001000 andcs r1, r0, r0
20000004: 20000011 andcs r0, r0, r1, lsl r0
20000008: 20000015 andcs r0, r0, r5, lsl r0
2000000c: 20000015 andcs r0, r0, r5, lsl r0
20000010 <reset>:
20000010: f000 f802 bl 20000018 <notmain>
20000014 <hang>:
20000014: e7fe b.n 20000014 <hang>
...
20000018 <notmain>:
20000018: b510 push {r4, lr}
2000001a: 4b06 ldr r3, [pc, #24] ; (20000034 <notmain+0x1c>)
2000001c: 4a06 ldr r2, [pc, #24] ; (20000038 <notmain+0x20>)
2000001e: 447b add r3, pc
20000020: 589c ldr r4, [r3, r2]
20000022: 6823 ldr r3, [r4, #0]
20000024: 1d58 adds r0, r3, #5
20000026: f000 f809 bl 2000003c <fun>
2000002a: 6020 str r0, [r4, #0]
2000002c: bc10 pop {r4}
2000002e: bc01 pop {r0}
20000030: 4700 bx r0
20000032: 46c0 nop ; (mov r8, r8)
20000034: 00000032 andeq r0, r0, r2, lsr r0
20000038: 00000000 andeq r0, r0, r0
2000003c <fun>:
2000003c: 4b03 ldr r3, [pc, #12] ; (2000004c <fun+0x10>)
2000003e: 4a04 ldr r2, [pc, #16] ; (20000050 <fun+0x14>)
20000040: 447b add r3, pc
20000042: 589b ldr r3, [r3, r2]
20000044: 681b ldr r3, [r3, #0]
20000046: 3301 adds r3, #1
20000048: 1818 adds r0, r3, r0
2000004a: 4770 bx lr
2000004c: 00000010 andeq r0, r0, r0, lsl r0
20000050: 00000004 andeq r0, r0, r4
Disassembly of section .got:
20000054 <.got>:
20000054: 20000068 andcs r0, r0, r8, rrx
20000058: 2000006c andcs r0, r0, ip, rrx
Disassembly of section .got.plt:
2000005c <_GLOBAL_OFFSET_TABLE_>:
...
Disassembly of section .bss:
20000068 <x>:
20000068: 00000000 andeq r0, r0, r0
2000006c <y>:
2000006c: 00000000 andeq r0, r0, r0
When the code wants to find the global variable x, it takes the program counter plus a linker-supplied/modified offset (0x32) to locate the global offset table, then uses a second offset to select the table entry that holds the address of x. The same goes for y. So when you relocate the image, you will need to modify the global offset table at run time or load time, depending on your setup.
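The arithmetic behind that lookup can be written out as a small C sketch. The function and parameter names are made up for this illustration; the two word values are the literals that follow each function in the listing above.

```c
#include <stdint.h>

/* Address of the GOT slot read by a PIC sequence of the form
 *     ldr r3, [pc, #..]   ; r3 = base_prel (first literal word)
 *     ldr r2, [pc, #..]   ; r2 = got_brel  (second literal word)
 *     add r3, pc          ; located at add_insn_addr
 *     ldr r4, [r3, r2]
 * Hypothetical helper; names are not taken from the toolchain. */
uint32_t got_slot_address(uint32_t add_insn_addr,
                          uint32_t base_prel,
                          uint32_t got_brel)
{
    uint32_t pc       = add_insn_addr + 4;   /* Thumb PC reads 4 bytes ahead */
    uint32_t got_base = pc + base_prel;      /* r3 after "add r3, pc" */
    return got_base + got_brel;              /* address used by "ldr r4, [r3, r2]" */
}
```

With the values from notmain, got_slot_address(0x2000001e, 0x32, 0) is 0x20000054, the .got entry holding 0x20000068 (x). For fun, got_slot_address(0x20000040, 0x10, 4) is 0x20000058, the entry holding 0x2000006c (y).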
If I get rid of those global variables, then apart from the vector table, which is hard-coded and not PIC (and wasn't compiled anyway), this is all position independent.
20000000 <_start>:
20000000: 20001000 andcs r1, r0, r0
20000004: 20000011 andcs r0, r0, r1, lsl r0
20000008: 20000015 andcs r0, r0, r5, lsl r0
2000000c: 20000015 andcs r0, r0, r5, lsl r0
20000010 <reset>:
20000010: f000 f802 bl 20000018 <notmain>
20000014 <hang>:
20000014: e7fe b.n 20000014 <hang>
...
20000018 <notmain>:
20000018: b508 push {r3, lr}
2000001a: 2005 movs r0, #5
2000001c: f000 f804 bl 20000028 <fun>
20000020: 3006 adds r0, #6
20000022: bc08 pop {r3}
20000024: bc02 pop {r1}
20000026: 4708 bx r1
20000028 <fun>:
20000028: 3001 adds r0, #1
2000002a: 4770 bx lr
back to this version
unsigned int y;
unsigned int fun ( unsigned int z )
{
return(y+z+1);
}
position independent
00000000 <fun>:
0: 4b03 ldr r3, [pc, #12] ; (10 <fun+0x10>)
2: 4a04 ldr r2, [pc, #16] ; (14 <fun+0x14>)
4: 447b add r3, pc
6: 589b ldr r3, [r3, r2]
8: 681b ldr r3, [r3, #0]
a: 3301 adds r3, #1
c: 1818 adds r0, r3, r0
e: 4770 bx lr
10: 00000008 andeq r0, r0, r8
14: 00000000 andeq r0, r0, r0
not position independent
00000000 <fun>:
0: 4b02 ldr r3, [pc, #8] ; (c <fun+0xc>)
2: 681b ldr r3, [r3, #0]
4: 3301 adds r3, #1
6: 1818 adds r0, r3, r0
8: 4770 bx lr
a: 46c0 nop ; (mov r8, r8)
c: 00000000 andeq r0, r0, r0
The position-independent code has to do a bit more work to access the external variable. The position-dependent code still does some work because the variable is external, but not as much. In both cases the linker fills in the required items to make it link.
The ELF object file contains the information the linker needs to do this.
Relocation section '.rel.text' at offset 0x1a4 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000010 00000a19 R_ARM_BASE_PREL 00000000 _GLOBAL_OFFSET_TABLE_
00000014 00000b1a R_ARM_GOT_BREL 00000004 y
or
Relocation section '.rel.text' at offset 0x174 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000000c 00000a02 R_ARM_ABS32 00000004 y
notmain had these with PIC
Relocation section '.rel.text' at offset 0x1cc contains 3 entries:
Offset Info Type Sym.Value Sym. Name
0000000e 00000a0a R_ARM_THM_CALL 00000000 fun
0000001c 00000b19 R_ARM_BASE_PREL 00000000 _GLOBAL_OFFSET_TABLE_
00000020 00000c1a R_ARM_GOT_BREL 00000004 x
and without.
Relocation section '.rel.text' at offset 0x198 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000008 00000a0a R_ARM_THM_CALL 00000000 fun
00000014 00000b02 R_ARM_ABS32 00000004 x
So in short, the toolchain is doing its job and you don't need to redo it. Note that this has nothing to do with ARM or Thumb: any time you use the object-and-linker model and allow an object to reference external items, the linker has to patch things up to glue the code together. That's just how it works.
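The Info column in the relocation listings above packs two fields, and splitting it shows which symbol-table entry and which relocation type the linker is being asked to apply. The helper names below are made up for this sketch; the type numbers are the ones assigned by the ARM ELF ABI to the four relocations that appear in the listings.

```c
#include <stdint.h>

/* ELF32 relocation info word: symbol index in the upper 24 bits,
 * relocation type in the low 8 bits. Hypothetical helper names. */
uint32_t elf32_r_sym(uint32_t info)  { return info >> 8; }
uint32_t elf32_r_type(uint32_t info) { return info & 0xFF; }

/* ARM relocation type numbers seen in the listings. */
enum {
    R_ARM_ABS32     = 2,   /* absolute 32-bit address stored in a word */
    R_ARM_THM_CALL  = 10,  /* Thumb bl to a symbol */
    R_ARM_BASE_PREL = 25,  /* PC-relative offset to the GOT base */
    R_ARM_GOT_BREL  = 26   /* offset of a symbol's entry within the GOT */
};
```

For example, the Info value 00000a0a splits into symbol index 10 and type 10 (R_ARM_THM_CALL), which is the entry that tells the linker to rewrite the bl.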

How to align to cache line GCC ldr pc-relative

On ARM, GCC usually uses a PC-relative load to get constants into registers. The idea is that the constant is stored at a known distance from the instruction that loads it. For example, the following instruction loads a constant from the address of the instruction + 8 + offset:
ldr r0, [pc, #offset]
As a result, the .text segment interleaves instructions and data. The data is usually stored at the end of the function's code, e.g.
00010860 <call_weak_fn>:
10860: e59f3014 ldr r3, [pc, #20] ; 1087c <call_weak_fn+0x1c>
10864: e59f2014 ldr r2, [pc, #20] ; 10880 <call_weak_fn+0x20>
10868: e08f3003 add r3, pc, r3
1086c: e7932002 ldr r2, [r3, r2]
10870: e3520000 cmp r2, #0
10874: 012fff1e bxeq lr
10878: e1a00000 nop ; (mov r0, r0)
1087c: 00089790 muleq r8, r0, r7
10880: 00000074 andeq r0, r0, r4, ror r0
For a research project, I would like to ensure that code and constants never reside on the same cache line (i.e. the same 64-byte-aligned block).
Is it possible to align the constants generated by GCC?
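To make the condition concrete, here is a small C sketch that checks whether a load instruction and the literal it reads fall in the same 64-byte block. It does not change where GCC places the constants; it only tests the property asked about, and the function names are made up for the illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_LINE_BYTES 64u   /* line size stated in the question */

/* Address read by an ARM-state "ldr rX, [pc, #offset]" at insn_addr. */
uint32_t arm_literal_address(uint32_t insn_addr, uint32_t offset)
{
    return insn_addr + 8 + offset;   /* PC reads 8 bytes ahead in ARM state */
}

/* True when both addresses fall in the same 64-byte-aligned block. */
bool same_cache_line(uint32_t a, uint32_t b)
{
    return (a / CACHE_LINE_BYTES) == (b / CACHE_LINE_BYTES);
}
```

In the call_weak_fn listing, the load at 0x10860 reads 0x1087c, which is on the same line as the instruction, while the load at 0x10864 reads 0x10880, which starts the next line.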

How do I link to external THUMB code?

I'm writing THUMB code for an embedded core (ARM7TDMI) that needs to be linked to existing THUMB code. I'm using the GNU ARM embedded toolchain. I cannot get the linker to treat the existing external code as THUMB; it seems to always think that it's ARM. The existing code that I'm linking to is absolutely static and cannot be changed/recompiled (it's a plain binary sitting on a ROM chip, basically).
Here is an example program, multiply.c, that demonstrates the issue:
extern int externalFunction(int x);
int multiply(int x, int y)
{
return externalFunction(x * y);
}
Compiled using:
arm-none-eabi-gcc -o multiply.o -c -O3 multiply.c -march=armv4t -mtune=arm7tdmi -mthumb
arm-none-eabi-ld -o linked.o multiply.o -T symbols.txt
Where symbols.txt is a simple linker script:
SECTIONS
{
.text 0x8000000 : { *(.text) }
}
externalFunction = 0x8002000;
When I objdump -d linked.o, I get:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: f000 f804 bl 8000010 <__externalFunction_from_thumb>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
08000010 <__externalFunction_from_thumb>:
8000010: 4778 bx pc
8000012: 46c0 nop ; (mov r8, r8)
8000014: ea0007f9 b 8002000 <externalFunction>
Instead of branching directly to 0x8002000, it branches to a stub that switches to ARM mode first and then branches to 0x8002000 in ARM mode. I want that BL to branch directly to 0x8002000 and stay in THUMB mode, so that I'd get this instead:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: ???? ???? bl 8002000 <__externalFunction>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
ABI and calling convention issues aside, how do I achieve this?
One way to do it is to force the branch you want yourself:
branchto.s
.thumb
.thumb_func
.globl branchto
branchto:
bx r0
so.c
extern unsigned int externalFunction;
extern int branchto ( unsigned int, int );
int fun ( int x )
{
return(branchto(externalFunction,x)+3);
}
so.ld
SECTIONS
{
.text 0x8000000 : { *(.text) }
}
externalFunction = 0x8002001;
producing
08000000 <fun>:
8000000: 4b04 ldr r3, [pc, #16] ; (8000014 <fun+0x14>)
8000002: b510 push {r4, lr}
8000004: 0001 movs r1, r0
8000006: 6818 ldr r0, [r3, #0]
8000008: f000 f806 bl 8000018 <branchto>
800000c: 3003 adds r0, #3
800000e: bc10 pop {r4}
8000010: bc02 pop {r1}
8000012: 4708 bx r1
8000014: 08002001 stmdaeq r0, {r0, sp}
08000018 <branchto>:
8000018: 4700 bx r0
Ross Ridge's solution in the comments works
static int (* const externalFunction)(int x) = (int (*)(int)) 0x80002001;
int fun ( int x )
{
return((* externalFunction)(x)+3);
}
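but the hard-coded address is then in the code, not the linker script, if that matters; I was trying to solve that and couldn't. Note that the constant in this listing, 0x80002001, has one more zero than the 0x8002001 used in the linker script, and the disassembly below reflects that.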
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: 4b03 ldr r3, [pc, #12] ; (8000010 <fun+0x10>)
8000004: f000 f806 bl 8000014 <fun+0x14>
8000008: 3003 adds r0, #3
800000a: bc10 pop {r4}
800000c: bc02 pop {r1}
800000e: 4708 bx r1
8000010: 80002001 andhi r2, r0, r1
8000014: 4718 bx r3
8000016: 46c0 nop ; (mov r8, r8)
I prefer the assembly solution for something like this, to force the exact instruction I want. Naturally, if you had linked in the external function, it would/should have just worked (there are some exceptions, but GNU is getting really good at resolving calls to and from ARM/Thumb for you in the linker).
I don't actually see it as a GNU bug. Instead, they need a way in the linker script to declare that variable as a Thumb function address (or likewise as an ARM function address) rather than just some generic linker-defined variable, just like .thumb_func does (or a longer function/procedure declaration):
.word branchto
.thumb
.globl branchto
branchto:
bx r0
8000018: 0800001c stmdaeq r0, {r2, r3, r4}
0800001c <branchto>:
800001c: 4700 bx r0
.word branchto
.thumb
.thumb_func
.globl branchto
branchto:
bx r0
8000018: 0800001d stmdaeq r0, {r0, r2, r3, r4}
0800001c <branchto>:
800001c: 4700 bx r0
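The only difference between the two listings above is bit 0 of the word emitted for branchto: 0x0800001c without .thumb_func and 0x0800001d with it. A minimal C sketch of how that bit is used by bx follows; the function names are invented for the illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Value a function-pointer word should hold for a Thumb function
 * whose first instruction is at code_addr. */
uint32_t thumb_function_pointer(uint32_t code_addr)
{
    return code_addr | 1u;        /* bit 0 set requests Thumb state */
}

/* Instruction set state selected by "bx" for a register value. */
bool bx_selects_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* Address where execution continues after "bx". */
uint32_t bx_destination(uint32_t target)
{
    return target & ~1u;          /* bit 0 is not part of the address */
}
```

This is why the linker script assigns externalFunction = 0x8002001 for code that lives at 0x8002000: bx_selects_thumb(0x08002001) is true and bx_destination(0x08002001) is 0x08002000.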
Just from reading the GNU linker documentation, there may be hope of getting what you want:
SECTIONS
{
.text0 0x08000000 : { so.o }
.text1 0x08002000 (NOLOAD) : { ex.o }
}
with ex.o coming from a dummy function to make everyone happy
int externalFunction ( int x )
{
return(x);
}
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: f001 fffd bl 8002000 <externalFunction>
8000006: 3003 adds r0, #3
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
and the NOLOAD keeps the dummy function out of the binary.
arm-none-eabi-objcopy so.elf -O srec --srec-forceS3 so.srec
S00A0000736F2E7372656338
S3150800000010B501F0FDFF033010BC02BC0847C0461E
S315080000104743433A2028474E552920362E322E305C
S31508000020004129000000616561626900011F000046
S3150800003000053454000602080109011204140115CA
S31008000040011703180119011A011E021E
S70500000000FA
Note that it wasn't perfect: some extra garbage got pulled in, perhaps symbols
08000000 <fun>:
8000000: b510 push {r4, lr}
8000002: f001 fffd bl 8002000 <externalFunction>
8000006: 3003 adds r0, #3
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
8000010: 3a434347
8000014: 4e472820
8000018: 36202955
800001c: 302e322e
8000020: 00294100
8000024: 65610000
8000028: 00696261
800002c: 00001f01
8000030: 54340500
8000034: 08020600
8000038: 12010901
800003c: 15011404
8000040: 18031701
8000044: 1a011901
which you can also see in the srec, but the 0x08002000 code is not there, so your actual external function will get called.
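If you want to check mechanically which addresses an srec covers, each S3 line carries its own load address and length. The following C sketch parses one S3 record and verifies its checksum. The function names are made up for this example, and it assumes the standard S-record layout: a count byte, a 4-byte address, the data, then a ones' complement checksum.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse one S3 record given without its trailing newline. On success,
 * store the load address and the number of data bytes and return true.
 * Return false for other record types, malformed lines, or a bad
 * checksum. */
bool srec_s3_range(const char *line, uint32_t *addr, uint32_t *data_len)
{
    size_t n = strlen(line);
    if (n < 4 || line[0] != 'S' || line[1] != '3' || (n - 2) % 2 != 0)
        return false;

    uint8_t bytes[256];
    size_t count = 0;
    for (size_t i = 2; i + 1 < n; i += 2) {
        int hi = hex_digit(line[i]);
        int lo = hex_digit(line[i + 1]);
        if (hi < 0 || lo < 0 || count == sizeof bytes)
            return false;
        bytes[count++] = (uint8_t)(hi * 16 + lo);
    }

    /* bytes[0] counts what follows it: 4 address bytes, data, checksum. */
    if (count < 6 || bytes[0] != count - 1)
        return false;

    /* Checksum is the ones' complement of the sum of all other bytes. */
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < count; i++)
        sum = (uint8_t)(sum + bytes[i]);
    if ((uint8_t)~sum != bytes[count - 1])
        return false;

    *addr = ((uint32_t)bytes[1] << 24) | ((uint32_t)bytes[2] << 16) |
            ((uint32_t)bytes[3] << 8) | bytes[4];
    *data_len = (uint32_t)(count - 6);
    return true;
}
```

Fed the records from the dump above, the first S3 line reports address 0x08000000 with 16 data bytes and the last one reports 0x08000040 with 11 data bytes, so nothing is placed at 0x08002000.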
I would go with just writing the instruction you want, or with function pointers and an assignment if you don't want any asm.
The other comments/answers using long branches do work, but it would still be nice to have a direct BL call and avoid the unnecessary load.
I believe I've found a workaround here. Create a dummy file (let's call it ext.c) with:
__attribute__((naked)) int externalFunction(int x){}
Compile this file to ext.o (same way as you compile multiply.c). This generates a dummy object file with a correctly decorated function symbol for externalFunction, whose address gets overridden by the linker script, resulting in the desired BL instruction:
Disassembly of section .text:
08000000 <multiply>:
8000000: b510 push {r4, lr}
8000002: 4348 muls r0, r1
8000004: f001 fffc bl 8002000 <externalFunction>
8000008: bc10 pop {r4}
800000a: bc02 pop {r1}
800000c: 4708 bx r1
800000e: 46c0 nop ; (mov r8, r8)
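To confirm that the bl really targets 0x8002000, the two halfwords can be decoded by hand. The C sketch below follows the Thumb BL encoding with S, J1, J2, imm10 and imm11 fields; the function name is made up for the illustration. On ARMv4T cores such as the ARM7TDMI the instruction is defined as a pair of 16-bit halves, but the encodings in these listings have J1 = J2 = 1, for which both views give the same target.

```c
#include <stdint.h>

/* Branch target of a Thumb BL, given the address of its first
 * halfword and the two halfwords as objdump prints them. */
uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = (hw1 >> 10) & 1;
    uint32_t imm10 = hw1 & 0x3FF;
    uint32_t j1    = (hw2 >> 13) & 1;
    uint32_t j2    = (hw2 >> 11) & 1;
    uint32_t imm11 = hw2 & 0x7FF;

    uint32_t i1 = !(j1 ^ s);      /* I1 = NOT(J1 EOR S) */
    uint32_t i2 = !(j2 ^ s);      /* I2 = NOT(J2 EOR S) */

    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) |
                   (imm10 << 12) | (imm11 << 1);
    if (s)
        imm |= 0xFE000000u;       /* sign-extend the 25-bit offset */

    /* PC reads 4 bytes ahead; unsigned wraparound handles negative offsets. */
    return insn_addr + 4 + imm;
}
```

For the listing above, thumb_bl_target(0x8000004, 0xf001, 0xfffc) is 0x8002000, and the stub call from the question, thumb_bl_target(0x8000004, 0xf000, 0xf804), is 0x8000010.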

What happens when executing an illegal NEON instruction in thumb2 elf?

Say we have a Thumb-2 ELF file with the following disassembly snippet from objdump:
00279ae0 <some_func>:
279ae0: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
279ae4: 4606 mov r6, r0
279ae6: f8df 9338 ldr.w r9, [pc, #824]
279aea: f44f 7380 mov.w r3, #256
....
279af2: 44f9 add r9, pc
279af4: ed2d 8b02 vpush {d8}
279af8: f8d6 108c ldr.w r1, [r6, #140]
1)
If I change the instruction at 279af2 to an illegal encoding such as "ffff", then on execution the process gets a SIGILL/ILL_ILLOPC when it runs into ffff.
2)
If I change the instruction at 279af4 to the illegal encoding ed2d ffff, the process just exits WITHOUT receiving any signal and without any output in kmsg. I really want to know why this happens only with NEON instructions. In this case I would expect some error hint, but there is none. Where can I find error hints other than the kernel message log?
Thank you guys so much.

Hardfault on STM32F030 startup, __libc_init_array

I'm trying to get a STM32Cube project compiled using arm-none-eabi-gcc and a Makefile.
I have specified:
CFLAGS = -mthumb\
-march=armv6-m\
-mlittle-endian\
-mcpu=cortex-m0\
-ffunction-sections\
-fdata-sections\
-MMD\
-std=c99\
-Wall\
-g\
-D$(PART)\
-c
and:
LDFLAGS = -Wl,--gc-sections\
-Wl,-T$(LDFILE)\
-Wl,-v
The firmware builds without problems, but when I boot the MCU I get stuck in Hard Fault.
Stack trace is:
#0 HardFault_Handler () at ./Src/main.c:156
#1 <signal handler called>
#2 0x0800221c in ____libc_init_array_from_thumb ()
#3 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
#4 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
and I go straight to Hard Fault when stepping to bl __libc_init_array in the startup file.
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system intitialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
Any ideas what could be wrong?
My arm-none-eabi-gcc version is 4.8.4 20140725 (release)
[edit]
The disassembly of the calls
08002218 <____libc_init_array_from_thumb>:
8002218: 4778 bx pc
800221a: 46c0 nop ; (mov r8, r8)
800221c: eafff812 b 800026c <__libc_init_array>
0800026c <__libc_init_array>:
800026c: e92d4070 push {r4, r5, r6, lr}
8000270: e59f506c ldr r5, [pc, #108] ; 80002e4 <__libc_init_array+0x78>
8000274: e59f606c ldr r6, [pc, #108] ; 80002e8 <__libc_init_array+0x7c>
8000278: e0656006 rsb r6, r5, r6
800027c: e1b06146 asrs r6, r6, #2
8000280: 12455004 subne r5, r5, #4
8000284: 13a04000 movne r4, #0
8000288: 0a000005 beq 80002a4 <__libc_init_array+0x38>
800028c: e2844001 add r4, r4, #1
8000290: e5b53004 ldr r3, [r5, #4]!
8000294: e1a0e00f mov lr, pc
8000298: e12fff13 bx r3
800029c: e1560004 cmp r6, r4
80002a0: 1afffff9 bne 800028c <__libc_init_array+0x20>
80002a4: e59f5040 ldr r5, [pc, #64] ; 80002ec <__libc_init_array+0x80>
80002a8: e59f6040 ldr r6, [pc, #64] ; 80002f0 <__libc_init_array+0x84>
80002ac: e0656006 rsb r6, r5, r6
80002b0: eb0007ca bl 80021e0 <_init>
80002b4: e1b06146 asrs r6, r6, #2
80002b8: 12455004 subne r5, r5, #4
80002bc: 13a04000 movne r4, #0
80002c0: 0a000005 beq 80002dc <__libc_init_array+0x70>
80002c4: e2844001 add r4, r4, #1
80002c8: e5b53004 ldr r3, [r5, #4]!
80002cc: e1a0e00f mov lr, pc
80002d0: e12fff13 bx r3
80002d4: e1560004 cmp r6, r4
80002d8: 1afffff9 bne 80002c4 <__libc_init_array+0x58>
80002dc: e8bd4070 pop {r4, r5, r6, lr}
80002e0: e12fff1e bx lr
80002e4: 08002258 .word 0x08002258
80002e8: 08002258 .word 0x08002258
80002ec: 08002258 .word 0x08002258
80002f0: 08002260 .word 0x08002260
[edit 2]
The register values from gdb:
(gdb) info reg
r0 0x20000000 536870912
r1 0x1 1
r2 0x0 0
r3 0x40021000 1073876992
r4 0xffffffff -1
r5 0xffffffff -1
r6 0xffffffff -1
r7 0x20001fd0 536879056
r8 0xffffffff -1
r9 0xffffffff -1
r10 0xffffffff -1
r11 0xffffffff -1
r12 0xffffffff -1
sp 0x20001fd0 0x20001fd0
lr 0xfffffff9 -7
pc 0x800067c 0x800067c <HardFault_Handler+4>
xPSR 0x61000003 1627389955
That __libc_init_array is ARM code, not Thumb, hence the M0 will fall over trying to execute some nonsense it doesn't understand (actually, it never quite gets there since it faults on the attempt to switch to ARM state in the bx, but hey, same difference...)
You'll need to make sure you use pure-Thumb versions of any libraries - a Cortex-M-specific toolchain might be a better bet than a generic ARM one. If you have a multilib toolchain, I'd suggest checking the output of arm-none-eabi-gcc --print-multi-lib to make sure you've specified all the relevant options to get proper Cortex-M libraries, and if you're using a separate link step, make sure you invoke it with LD=arm-none-eabi-gcc (plus the relevant multilib options), rather than LD=arm-none-eabi-ld.
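The register dump and the veneer in the question can be read the same way. This is a small C sketch with invented function names; it assumes the ARMv6-M layout, where the low 6 bits of xPSR hold the exception number and bit 24 is the Thumb state bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* Exception number held in the low bits of xPSR (the IPSR field). */
uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x3Fu;          /* 6-bit field on ARMv6-M */
}

/* Thumb state bit (T) of xPSR. */
bool xpsr_thumb_bit(uint32_t xpsr)
{
    return ((xpsr >> 24) & 1u) != 0;
}

/* Value that "bx pc" branches to when executed in Thumb state. */
uint32_t thumb_bx_pc_value(uint32_t insn_addr)
{
    return insn_addr + 4;         /* PC reads 4 bytes ahead in Thumb state */
}
```

xpsr_exception_number(0x61000003) is 3, the HardFault exception. thumb_bx_pc_value(0x08002218) is 0x0800221c, which has bit 0 clear, so the veneer's bx pc asks for ARM state, and that matches the faulting frame at 0x0800221c in the backtrace.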
