What happens when executing an illegal NEON instruction in thumb2 elf? - linux-kernel

Say we have an thumb2 elf file with following disassemble snippet by objdump:
00279ae0 <some_func>:
279ae0: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
279ae4: 4606 mov r6, r0
279ae6: f8df 9338 ldr.w r9, [pc, #824]
279aea: f44f 7380 mov.w r3, #256
....
279af2: 44f9 add r9, pc
279af4: ed2d 8b02 vpush {d8}
279af8: f8d6 108c ldr.w r1, [r6, #140]
1)
if I modify line 279af2 to some illegal instructions, like "ffff", than uppon executing, process will get a SIGILL/ILL_ILLOPC when running into ffff
2)
if I modify line 279af4 to illegal instructions ed2d ffff, process will just exit WITHOUT any signal received or any output in kmsg...... I really want to know why this happens only to NEON instructions? In this case, I'm expecting some error hint, but there is none... where can I find extra error hint other than kernel message?
Thank you guys so much.

Related

No FPU support with gcc for ARM Cortex M?

I have the following function from a well known benchmark that I am compiling with gcc-arm-none-eabi-10-2020-q4-major:
#include <unistd.h>
double b[1000], c[1000];
void tuned_STREAM_Scale(double scalar)
{
ssize_t j;
for (j = 0; j < 1000; j++)
b[j] = scalar* c[j];
}
I am using the following compiler options:
arm-none-eabi-gcc -O3 -mcpu=cortex-m7 -mthumb -mfloat-abi=hard -mfpu=fpv5-sp-d16 -c test.c
However, if I check the compiled code, the compiler seems unable to use a basic FPU multiply instruction, and just uses the __aeabi_dmul function from libgcc (we can however see that a FPU vmov is used):
00000000 <tuned_STREAM_Scale>:
0: e92d 41f0 stmdb sp!, {r4, r5, r6, r7, r8, lr}
4: 4c08 ldr r4, [pc, #32] ; (28 <tuned_STREAM_Scale+0x28>)
6: 4d09 ldr r5, [pc, #36] ; (2c <tuned_STREAM_Scale+0x2c>)
8: f504 58fa add.w r8, r4, #8000 ; 0x1f40
c: ec57 6b10 vmov r6, r7, d0
10: e8f4 0102 ldrd r0, r1, [r4], #8
14: 4632 mov r2, r6
16: 463b mov r3, r7
18: f7ff fffe bl 0 <__aeabi_dmul>
1c: 4544 cmp r4, r8
1e: e8e5 0102 strd r0, r1, [r5], #8
22: d1f5 bne.n 10 <tuned_STREAM_Scale+0x10>
24: e8bd 81f0 ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
If I compare with another compiler, the code is incomparably more efficient:
00000000 <tuned_STREAM_Scale>:
0: 4808 ldr r0, [pc, #32] ; (24 <tuned_STREAM_Scale+0x24>)
2: b580 push {r7, lr}
4: 4b06 ldr r3, [pc, #24] ; (20 <tuned_STREAM_Scale+0x20>)
6: 27c8 movs r7, #200 ; 0xc8
8: c806 ldmia r0!, {r1, r2}
a: ec42 1b11 vmov d1, r1, r2
e: ee20 1b01 vmul.f64 d1, d0, d1
12: 1e7f subs r7, r7, #1
14: ec52 1b11 vmov r1, r2, d1
18: c306 stmia r3!, {r1, r2}
1a: d1f5 bne.n 8 <tuned_STREAM_Scale+0x8>
1c: bd80 pop {r7, pc}
If I check inside gcc package the various libgcc object files depending on CPU or FPU options, I cannot find any FPU instructions in __aeabi_dmul or any other function.
I find very strange that gcc is not able to use a basic FPU multiplication, and I could not find in any documentation or README this limitation, so I am wondering if I am not doing anything wrong. I have checked older gcc versions and I still have this problem. Would it be due to gcc or to the compiled binaries from ARM?
The clue is in the compiler options you already posted:
-mfpu=fpv5-sp-d16 "sp" means single precision.
You told it not to generate hardware double instructions, which is correct for most Cortex-M7 processors because they can't execute them. If you have an M7 which can then you need to set the correct fpu argument.

Getting an label address to a register on THUMB assembly - Armv5

I am trying to get the the address of a label in thumb assembly and I am having some trouble.
I already read this post but that cannot help me and I will explain why.
I am writing an simple program with Thumb assembly ( unfortunately I cannot use Thumb2 ).
Let's consider this code:
.arch armv5te
.syntax unified
.text
.thumb
.thumb_func
thumbnow:
0x0 PUSH {LR}
0x2 LDR R0, =loadValues
0x4 POP {PC}
.align
loadValues:
0x8 .word 0xdeadbee1
0xC .word 0xdeadbee2
0x10 .word 0xdeadbee3
I am using the arm-linux-gnueabi toolchain to assemble that.
My microcontroller doesn't have an MMU so the memory address are static, no virtual pages etc.
The thing that I am trying to do is to make R0 having the value of 0x8 here so that then I can access the three words like this:
LDR R1, [R0]
LDR R2, [R0,#4]
LDR R3, [R0,#8]
This is not possible with LDR though because the value in the word is not possible to fit in a MOV command. The documentation of the assembler states that if the value cannot fit in a MOV command then it will put the value in a literal pool.
So my question is, is it possible in Thumb assembly to get the actual address of the label if the content of the address cannot fit in a MOV command?
Starting with this
.thumb
ldr r0,=hello
adr r0,hello
nop
nop
nop
nop
hello:
.word 0,1,2,3
gives this unlinked
00000000 <hello-0xc>:
0: 4806 ldr r0, [pc, #24] ; (1c <hello+0x10>)
2: a002 add r0, pc, #8 ; (adr r0, c <hello>)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 46c0 nop ; (mov r8, r8)
a: 46c0 nop ; (mov r8, r8)
0000000c <hello>:
c: 00000000 andeq r0, r0, r0
10: 00000001 andeq r0, r0, r1
14: 00000002 andeq r0, r0, r2
18: 00000003 andeq r0, r0, r3
1c: 0000000c andeq r0, r0, r12
linked
00001000 <hello-0xc>:
1000: 4806 ldr r0, [pc, #24] ; (101c <hello+0x10>)
1002: a002 add r0, pc, #8 ; (adr r0, 100c <hello>)
1004: 46c0 nop ; (mov r8, r8)
1006: 46c0 nop ; (mov r8, r8)
1008: 46c0 nop ; (mov r8, r8)
100a: 46c0 nop ; (mov r8, r8)
0000100c <hello>:
100c: 00000000 andeq r0, r0, r0
1010: 00000001 andeq r0, r0, r1
1014: 00000002 andeq r0, r0, r2
1018: 00000003 andeq r0, r0, r3
101c: 0000100c andeq r1, r0, r12
both ways r0 will return the address to the start of data from which you can then offset into that data from the caller or wherever.
Edit
.thumb
adr r0,hello
nop
nop
nop
arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:2: Error: address calculation needs a strongly defined nearby symbol
So the tool won't turn that into a load from the pool for you.
For what you want to do I think the pc relative add (adr) is the best you are going to get. You can try other toolchains as all of this is language and toolchain specific (assembly language is defined by the assembler not the target and for each toolchain (with an assembler) there can be differences in the language). Over time within gnu, how the linker and assembler worked together has changed, the linker patches up things it didn't used to.
You could of course go into the linker and add code to it to perform this optimization, the problem is most likely that by link time the linker is looking to resolve an address in the pool which is easy for it to do it doesn't have to change the instruction, the assembler would have to leave information for the linker that this is not just a fill this memory location with an address thing, either you modify gas to allow adr to work, and then if the linker cant resolve it within the instruction then the linker bails out with an error.'
Or you could just hard-code what you want and maintain it. I am not sure why the adr solution isn't adequate.
mov r0,#8 is a valid thumb instruction.

How to align to cache line GCC ldr pc-relative

In ARM, GCC uses the PC-relative load is usually used to load constants into registers. The idea is that you store the constant relative to the instruction loading the constant. E.g. the following instruction can be used to load a constant from the address PC+8+offset
ldr r0, [pc, #offset]
As result, the .text segment interleaves instructions and data. The latter usually stored at the end of function's code. E.g.
00010860 <call_weak_fn>:
10860: e59f3014 ldr r3, [pc, #20] ; 1087c <call_weak_fn+0x1c>
10864: e59f2014 ldr r2, [pc, #20] ; 10880 <call_weak_fn+0x20>
10868: e08f3003 add r3, pc, r3
1086c: e7932002 ldr r2, [r3, r2]
10870: e3520000 cmp r2, #0
10874: 012fff1e bxeq lr
10878: e1a00000 nop ; (mov r0, r0)
1087c: 00089790 muleq r8, r0, r7
10880: 00000074 andeq r0, r0, r4, ror r0
For a research project, I would like to ensure that code and constant never reside on the same cache line (i.e. block 64 bytes aligned).
Is it possible to align the constants generated by GCC?

"Access to unaligned memory location, bad address=ffffff"

I'm trying to read integers from an input.txt file, below is my read loop where I'm attempting to read and store the integers into an array. I keep getting "Access to unaligned memory location, bad address=ffffff" on any line after the line with "LDR R2, [R2,R5,LSL #2]...im using ARM SIM. Does anyone know what I'm doing wrong?
start:
MOV R5, #0 #int i
MOV R1, #0
swi SWI_Open
LDR R1,=InFileH
STR R0,[R1]
MOV R3, #0
readloop:
LDR R0, =InFileH
LDR R0, [R0]
swi SWI_RdInt
CMP R0, #0
BEQ readdone
#the int is now in R0
MOV R1, R0
LDR R3,=a
STR R2,[R3,R5,LSR#2]
MOV R2, R1
ADD R5, R5, #1 #i++
bal readloop
readdone:
MOV R0, #0
swi SWI_Close
swi SWI_Exit
.data
.align 4
InFileH: .skip 4
InFile: .asciz "numbers.txt"
OutFile: .asciz "numsort.txt"
OutFileH: .skip 4
NewLine: .asciz "\n"
a: .skip 400
i had faced similar issue while programming arm assembly
this was because it was expecting offset in multiples of 4
STR R2, [R1, #2]
the above instruction throws the similar error. so it was resolved by using
STR R2, [R1, #4]
for better understanding clickhere

Hardfault on STM32F030 startup, __libc_init_array

I'm trying to get a STM32Cube project compiled using arm-none-eabi-gcc and a Makefile.
I have specified:
CFLAGS = -mthumb\
-march=armv6-m\
-mlittle-endian\
-mcpu=cortex-m0\
-ffunction-sections\
-fdata-sections\
-MMD\
-std=c99\
-Wall\
-g\
-D$(PART)\
-c
and:
LDFLAGS = -Wl,--gc-sections\
-Wl,-T$(LDFILE)\
-Wl,-v
The FW builds without problems.but when I boot the MCU i get stuck in Hard Fault.
Stack trace is:
#0 HardFault_Handler () at ./Src/main.c:156
#1 <signal handler called>
#2 0x0800221c in ____libc_init_array_from_thumb ()
#3 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
#4 0x080021be in LoopFillZerobss () at Src/startup_stm32f030x8.s:103
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
and I go straight to Hard Fault when stepping to bl __libc_init_array in the startup file.
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system intitialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
Any ideas what could be wrong?
My arm-none-eabi-gcc version is 4.8.4 20140725 (release)
[edit]
The disassembly of the calls
08002218 <____libc_init_array_from_thumb>:
8002218: 4778 bx pc
800221a: 46c0 nop ; (mov r8, r8)
800221c: eafff812 b 800026c <__libc_init_array>
0800026c <__libc_init_array>:
800026c: e92d4070 push {r4, r5, r6, lr}
8000270: e59f506c ldr r5, [pc, #108] ; 80002e4 <__libc_init_array+0x78>
8000274: e59f606c ldr r6, [pc, #108] ; 80002e8 <__libc_init_array+0x7c>
8000278: e0656006 rsb r6, r5, r6
800027c: e1b06146 asrs r6, r6, #2
8000280: 12455004 subne r5, r5, #4
8000284: 13a04000 movne r4, #0
8000288: 0a000005 beq 80002a4 <__libc_init_array+0x38>
800028c: e2844001 add r4, r4, #1
8000290: e5b53004 ldr r3, [r5, #4]!
8000294: e1a0e00f mov lr, pc
8000298: e12fff13 bx r3
800029c: e1560004 cmp r6, r4
80002a0: 1afffff9 bne 800028c <__libc_init_array+0x20>
80002a4: e59f5040 ldr r5, [pc, #64] ; 80002ec <__libc_init_array+0x80>
80002a8: e59f6040 ldr r6, [pc, #64] ; 80002f0 <__libc_init_array+0x84>
80002ac: e0656006 rsb r6, r5, r6
80002b0: eb0007ca bl 80021e0 <_init>
80002b4: e1b06146 asrs r6, r6, #2
80002b8: 12455004 subne r5, r5, #4
80002bc: 13a04000 movne r4, #0
80002c0: 0a000005 beq 80002dc <__libc_init_array+0x70>
80002c4: e2844001 add r4, r4, #1
80002c8: e5b53004 ldr r3, [r5, #4]!
80002cc: e1a0e00f mov lr, pc
80002d0: e12fff13 bx r3
80002d4: e1560004 cmp r6, r4
80002d8: 1afffff9 bne 80002c4 <__libc_init_array+0x58>
80002dc: e8bd4070 pop {r4, r5, r6, lr}
80002e0: e12fff1e bx lr
80002e4: 08002258 .word 0x08002258
80002e8: 08002258 .word 0x08002258
80002ec: 08002258 .word 0x08002258
80002f0: 08002260 .word 0x08002260
[edit 2]
The register values from gdb:
(gdb) info reg
r0 0x20000000 536870912
r1 0x1 1
r2 0x0 0
r3 0x40021000 1073876992
r4 0xffffffff -1
r5 0xffffffff -1
r6 0xffffffff -1
r7 0x20001fd0 536879056
r8 0xffffffff -1
r9 0xffffffff -1
r10 0xffffffff -1
r11 0xffffffff -1
r12 0xffffffff -1
sp 0x20001fd0 0x20001fd0
lr 0xfffffff9 -7
pc 0x800067c 0x800067c <HardFault_Handler+4>
xPSR 0x61000003 1627389955
That __libc_init_array is ARM code, not Thumb, hence the M0 will fall over trying to execute some nonsense it doesn't understand (actually, it never quite gets there since it faults on the attempt to switch to ARM state in the bx, but hey, same difference...)
You'll need to make sure you use pure-Thumb versions of any libraries - a Cortex-M-specific toolchain might be a better bet than a generic ARM one. If you have a multilib toolchain, I'd suggest checking the output of arm-none-eabi-gcc --print-multi-lib to make sure you've specified all the relevant options to get proper Cortex-M libraries, and if you're using a separate link step, make sure you invoke it with LD=arm-none-eabi-gcc (plus the relevant multilib options), rather than LD=arm-none-eabi-ld.

Resources