Problems uploading and debugging binaries on LPC4088 because of Boot ROM - debugging

I am trying to upload this simple assembly program:
.global _start
.text
reset: b _start
undefined: b undefined
software_interrupt: b software_interrupt
prefetch_abort: b prefetch_abort
data_abort: b data_abort
nop
interrupt_request: b interrupt_request
fast_interrupt_request: b fast_interrupt_request
_start:
mov r0, #0
mov r1, #1
increase:
add r0, r0, r1
cmp r0, #10
bne increase
decrease:
sub r0, r0, r1
cmp r0, #0
bne decrease
b increase
stop: b stop
to my LPC4088 (I am using the Embedded Artists LPC4088 QSB) via SEGGER's J-Link, so that I could later debug it using GDB.
First I compiled my sources with full debugging symbols using the GCC toolchain:
arm-none-eabi-as -g -gdwarf-2 -o program.o program.s
arm-none-eabi-ld -Ttext=0x0 -o program.elf program.o
arm-none-eabi-objcopy -O binary program.elf program.bin
But uploading program.bin to the LPC4088 was unsuccessful. Then user @old_timer reminded me in the comments that the LPC4088's boot ROM performs a checksum test after every reset, as described on page 876 of the LPC4088 user manual.
So I made sure my binary would pass the checksum test by following the steps described here. First I created a C source file, checksum.c:
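#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char **argv) {
    int fw;
    uint32_t word, crc = 0;
    unsigned char buf[28];
    if (argc != 2) {
        fprintf(stderr, "usage: %s image.bin\n", argv[0]);
        return 1;
    }
    fw = open(argv[1], O_RDWR);
    if (fw < 0) { perror("open"); return 1; }
    // read first 28 bytes (vector table entries 0 to 6)
    if (read(fw, buf, 28) != 28) { fprintf(stderr, "cannot read 28 bytes\n"); close(fw); return 1; }
    // sum entries 0 to 6; memcpy avoids unaligned access and aliasing problems
    // (assumes a little-endian host, matching the little-endian target)
    for (int count = 0; count < 7; count++) {
        memcpy(&word, buf + count * 4, 4);
        crc += word;
    }
    // 2's complement, so that entries 0 to 7 sum to zero
    crc = ~crc + 1;
    // write it at offset 0x0000001C (entry 7)
    if (lseek(fw, 0x0000001C, SEEK_SET) < 0 || write(fw, &crc, 4) != 4) { perror("lseek/write"); close(fw); return 1; }
    close(fw);
    return 0;
}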
I compiled it using gcc -o checksum.bin checksum.c and then ran it on the original program.bin like this: ./checksum.bin program.bin. This produced a modified program.bin in which the value at 0x1C had indeed changed. Comparing hex dumps of the original and the modified file (images not reproduced), the bytes at 0x1C changed from FE FF FF EA to 04 00 60 9D, i.e. the little-endian word 0xEAFFFFFE (the branch-to-self b fast_interrupt_request) became 0x9D600004. Nothing else was modified.
I then opened the terminal application JLinkExe, which presented a prompt. At the prompt I:
powered on my board using power on,
connected to the LPC4088 using command connect,
halted the MCU using command h,
erased the entire flash memory using command erase,
uploaded my modified binary to flash using loadbin program.bin 0x0,
set the program counter to 0x4 using SetPC 0x4,
started stepping through the program using s.
When I started stepping through the program, I got errors from the very first step, as can be seen at the end of this JLinkExe session:
SEGGER J-Link Commander V6.30a (Compiled Jan 31 2018 18:14:21)
DLL version V6.30a, compiled Jan 31 2018 18:14:14
Connecting to J-Link via USB...O.K.
Firmware: J-Link V9 compiled Jan 29 2018 15:41:50
Hardware version: V9.30
S/N: 269300437
License(s): FlashBP, GDB
OEM: SEGGER-EDU
VTref = 3.293V
Type "connect" to establish a target connection, '?' for help
J-Link>connect
Please specify device / core. <Default>: LPC4088
Type '?' for selection dialog
Device>
Please specify target interface:
J) JTAG (Default)
S) SWD
TIF>
Device position in JTAG chain (IRPre,DRPre) <Default>: -1,-1 => Auto-detect
JTAGConf>
Specify target interface speed [kHz]. <Default>: 4000 kHz
Speed>
Device "LPC4088" selected.
Connecting to target via JTAG
TotalIRLen = 4, IRPrint = 0x01
JTAG chain detection found 1 devices:
#0 Id: 0x4BA00477, IRLen: 04, CoreSight JTAG-DP
Scanning AP map to find all available APs
AP[1]: Stopped AP scan as end of AP map has been reached
AP[0]: AHB-AP (IDR: 0x24770011)
Iterating through AP map to find AHB-AP to use
AP[0]: Core found
AP[0]: AHB-AP ROM base: 0xE00FF000
CPUID register: 0x410FC241. Implementer code: 0x41 (ARM)
Found Cortex-M4 r0p1, Little endian.
FPUnit: 6 code (BP) slots and 2 literal slots
CoreSight components:
ROMTbl[0] # E00FF000
ROMTbl[0][0]: E000E000, CID: B105E00D, PID: 000BB00C SCS-M7
ROMTbl[0][1]: E0001000, CID: B105E00D, PID: 003BB002 DWT
ROMTbl[0][2]: E0002000, CID: B105E00D, PID: 002BB003 FPB
ROMTbl[0][3]: E0000000, CID: B105E00D, PID: 003BB001 ITM
ROMTbl[0][4]: E0040000, CID: B105900D, PID: 000BB9A1 TPIU
ROMTbl[0][5]: E0041000, CID: B105900D, PID: 000BB925 ETM
Cortex-M4 identified.
J-Link>h
PC = 000001B2, CycleCnt = 825F97DB
R0 = 00000000, R1 = 20098038, R2 = 2009803C, R3 = 000531FB
R4 = 00000000, R5 = 00000000, R6 = 12345678, R7 = 00000000
R8 = 6C2030E3, R9 = 0430DB64, R10= 10000000, R11= 00000000
R12= 899B552C
SP(R13)= 1000FFF0, MSP= 1000FFF0, PSP= 6EBAAC08, R14(LR) = 00000211
XPSR = 21000000: APSR = nzCvq, EPSR = 01000000, IPSR = 000 (NoException)
CFBP = 00000000, CONTROL = 00, FAULTMASK = 00, BASEPRI = 00, PRIMASK = 00
FPS0 = 93310C50, FPS1 = 455D159C, FPS2 = 01BA3FC2, FPS3 = E851BEED
FPS4 = D937E8F4, FPS5 = 82BD7BF6, FPS6 = 8F16D263, FPS7 = B0E8C039
FPS8 = 302C0A38, FPS9 = 8007BC9C, FPS10= 9A1A276F, FPS11= 76C9DCFE
FPS12= B2FFFA20, FPS13= B55786BB, FPS14= 2175F73E, FPS15= 5D35EC5F
FPS16= 98917B32, FPS17= C964EEB6, FPS18= FEDCA529, FPS19= 1703B679
FPS20= 2F378232, FPS21= 973440E3, FPS22= 928C911C, FPS23= 20A1BF55
FPS24= 4AE3AD0C, FPS25= 4F47CC1E, FPS26= C7B418D5, FPS27= 3EAB9244
FPS28= 73C795D0, FPS29= A359C85E, FPS30= 823AEA80, FPS31= EC9CBCD5
FPSCR= 00000000
J-Link>erase
Erasing device (LPC4088)...
J-Link: Flash download: Only internal flash banks will be erased.
To enable erasing of other flash banks like QSPI or CFI, it needs to be enabled via "exec EnableEraseAllFlashBanks"
Comparing flash [100%] Done.
Erasing flash [100%] Done.
Verifying flash [100%] Done.
J-Link: Flash download: Total time needed: 3.357s (Prepare: 0.052s, Compare: 0.000s, Erase: 3.301s, Program: 0.000s, Verify: 0.000s, Restore: 0.002s)
Erasing done.
J-Link>loadbin program.bin 0x0
Downloading file [program.bin]...
Comparing flash [100%] Done.
Erasing flash [100%] Done.
Programming flash [100%] Done.
Verifying flash [100%] Done.
J-Link: Flash download: Bank 0 # 0x00000000: 1 range affected (4096 bytes)
J-Link: Flash download: Total time needed: 0.076s (Prepare: 0.056s, Compare: 0.001s, Erase: 0.000s, Program: 0.005s, Verify: 0.000s, Restore: 0.012s)
O.K.
J-Link>SetPC 0x4
J-Link>s
**************************
WARNING: T-bit of XPSR is 0 but should be 1. Changed to 1.
**************************
J-Link>s
****** Error: Failed to read current instruction.
J-Link>s
****** Error: Failed to read current instruction.
J-Link>s
****** Error: Failed to read current instruction.
J-Link>
So the code being executed must have come from somewhere, and it may be the LPC4088's boot ROM, which is remapped to 0x0 at boot time, as stated on page 907 of the LPC4088 user manual.
Do you have any idea on how to overcome this Boot ROM & checksum problem, so I could debug my program normally?
After a while I found out that the warning:
**************************
WARNING: T-bit of XPSR is 0 but should be 1. Changed to 1.
**************************
is actually saying that I am trying to execute ARM instructions on a Cortex-M4, which is Thumb-only! The T-bit mentioned in the warning is described on page 100 of the ARMv7-M Architecture Reference Manual.
And this is exactly what user @old_timer is saying.

You are trying to run ARM instructions (0xExxxxxxx is a big giveaway, not to mention the exception table being a lot of 0xEAxxxxxx instructions) on a Cortex-M4. The Cortex-M boots differently (from a vector table rather than executable instructions) and is Thumb only (the Thumb-2 extensions in ARMv7-M are also... just Thumb; don't be confused by that. Which Thumb-2 extensions a core supports does matter, but the early/original Thumb is portable across all of them). So whether or not you need an additional checksum somewhere, like the older ARM7TDMI-based NXP chips need in order for the bootloader to let the user/application code run, you first need something that will run on the Cortex-M4.
Start with this. Yes, I know you have a Cortex-M4; use cortex-m0 for now.
so.s
.cpu cortex-m0
.thumb
.thumb_func
.globl _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
# ...
.thumb_func
hang: b hang
.thumb_func
reset:
mov r1,#0
outer:
mov r0,#0xFF
inner:
nop
nop
add r1,#1
sub r0,#1
bne inner
nop
nop
b outer
Build:
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld -Ttext=0 so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf -O binary so.bin
Examine so.list to make sure the vector table is correct:
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 0000000f andeq r0, r0, pc
8: 0000000d andeq r0, r0, sp
0000000c <hang>:
c: e7fe b.n c <hang>
0000000e <reset>:
e: 2100 movs r1, #0
00000010 <outer>:
10: 20ff movs r0, #255 ; 0xff
00000012 <inner>:
12: 46c0 nop ; (mov r8, r8)
14: 46c0 nop ; (mov r8, r8)
16: 3101 adds r1, #1
18: 3801 subs r0, #1
1a: d1fa bne.n 12 <inner>
1c: 46c0 nop ; (mov r8, r8)
1e: 46c0 nop ; (mov r8, r8)
20: e7f6 b.n 10 <outer>
The reset entry point is 0x00E, which is correctly indicated in the vector table at offset 0x4 as 0x00F (the address with bit 0 set, marking it as Thumb code). You can flash it to 0x000, reset, and see if it works (you need a debugger to stop it and check whether it is stepping through that code).
To run from SRAM: there is nothing position-dependent here, so you can load the .bin as-is to 0x20000000 and execute from 0x2000000E (or whatever address your toolchain ends up creating for the reset entry point).
Or you can remove the vector table:
.cpu cortex-m0
.thumb
.thumb_func
reset:
mov r1,#0
outer:
mov r0,#0xFF
inner:
nop
nop
add r1,#1
sub r0,#1
bne inner
nop
nop
b outer
And link with -Ttext=0x20000000, then download to SRAM and start execution with the debugger at 0x20000000.
You should see r0 counting down from 0xFF to 0 over and over, while r1 just keeps counting forever, rolls over, and keeps counting. So if you stop it, check the registers, resume, stop again, etc., you should see that activity.

Related

How to write a function to read ARM CPSR in either ARM or THUMB mode?

I am working on an ARMv7 (Cortex-A7) system, and I want to read CPSR from C file in either ARM mode or THUMB mode.
Firstly, I used the embedded ASSEMBLY instruction in C function as follows,
__asm__ volatile("mrs %0, CPSR\n" : "=r"(regval));
When I compiled the C file with -mthumb and ran the code under GDB, regval was 0x60000010, which is NOT the 0x60000030 shown by GDB!
So how to write a function to read CPSR in either ARM or THUMB mode?
Update with compiler options
a) Build the code with the following command line to select THUMB mode:
arm-linux-gnueabi-gcc -g2 backtrace.c -mcpu=cortex-a7 -static -mthumb -o tbacktrace
Running tbacktrace under QEMU and GDB, I got different values:
(gdb) p/x regval
$7 = 0x60000010
(gdb) p/x $cpsr
$8 = 0x60000030
The question is why my mrs %0, CPSR shows the CPU in ARM mode instead of the THUMB mode the code was built for.
b) When I build the code without specifying -mcpu=cortex-a7,
arm-linux-gnueabi-gcc -g2 backtrace.c -mthumb -o tbacktrace
the following errors are reported:
$ arm-linux-gnueabi-gcc -g2 backtrace.c -mthumb -o tbacktrace
/tmp/ccOg2tlo.s: Assembler messages:
/tmp/ccOg2tlo.s:2256: Error: selected processor does not support `mrs r3,CPSR' in Thumb mode
/tmp/ccOg2tlo.s:2398: Error: selected processor does not support `mrs r3,CPSR' in Thumb mode
c) Built without -mcpu or -mthumb, the code builds and runs fine.
So I think there must be some other way to get the right CPSR in both ARM and THUMB modes.
Update with more assembly code:
arm-linux-gnueabi-objdump -M force-thumb -d a.elf shows the following:
4000014c: 0ff0 lsrs r0, r6, #31
4000014e: e92d 0f30 stmdb sp!, {r4, r5, r8, r9, sl, fp}
40000152: ee30 3407 cdp 4, 3, cr3, cr0, cr7, {0}
40000156: e210 b.n 4000057a <__aeabi_f2d+0x16>
40000158: 3ba3 subs r3, #163 ; 0xa3
4000015a: e1a0 b.n 4000049e <__adddf3+0x1f6>
4000015c: 001b movs r3, r3
4000015e: 0a00 lsrs r0, r0, #8
40000160: a000 add r0, pc, #0 ; (adr r0, 40000164 <B_Loop1>)
40000162: e3a0 b.n 400008a6 <__udivmoddi4+0x19a>
......
400002a8 <__adddf3>:
400002a8: b530 push {r4, r5, lr}
400002aa: ea4f 0441 mov.w r4, r1, lsl #1
400002ae: ea4f 0543 mov.w r5, r3, lsl #1
400002b2: ea94 0f05 teq r4, r5
400002b6: bf08 it eq
400002b8: ea90 0f02 teqeq r0, r2
400002bc: bf1f itttt ne
Here is part of the project's code, built with -mthumb -mcpu=cortex-a7.
As Nate and Frant mentioned, I think the code is running in THUMB mode, and checking the T bit of CPSR to detect ARM or THUMB mode is unnecessary. Is that correct?
A way to detect THUMB or ARM mode
After reading Nate's and Frant's comments, I had an idea for detecting which mode the CPU is in without reading the T bit of CPSR: read the PC register twice and check the difference. If it is 2 (the length of a 16-bit THUMB instruction), the CPU is in THUMB mode; if it is 4 (the length of an ARM instruction), the CPU is in ARM mode.
The code is as follows:
register uint32_t pc1, pc2;
asm volatile("mov %0, pc\n mov %1, pc" : "=r"(pc1), "=r"(pc2));
I built the code with -Os, with and without -mthumb, and it seems to detect THUMB or ARM mode correctly.
CPSR.c:
#include <stdint.h>
int main(int argc, char* argv[]) {
uint32_t regval;
asm volatile("mrs %0, CPSR" : "=r"(regval));
return regval;
}
If you don't use -mcpu=cortex-a7, your compiler will default to another CPU:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -S CPSR.c
cat CPSR.s
.cpu arm7tdmi
.arch armv4t
The ARM7TDMI-S was introduced in 2001 and, as your assembler points out, does not support mrs r3,CPSR in Thumb mode. Therefore, you must specify -mcpu=cortex-a7:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -mcpu=cortex-a7 -O0 -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -S CPSR.c
cat CPSR.s
.cpu cortex-a7
.arch armv7-a
CPU and architecture are now as expected.
Testing your code on real hardware - a Cortex-A7 running u-boot - in Arm and Thumb mode:
Arm:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -mcpu=cortex-a7 -marm -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -o CPSR-arm.elf CPSR.c
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000080800000
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objcopy -O srec CPSR-arm.elf CPSR-arm.srec
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objdump -j .text -D CPSR-arm.elf
CPSR-arm.elf: file format elf32-littlearm
Disassembly of section .text:
80800000 <main>:
80800000: e52db004 push {fp} ; (str fp, [sp, #-4]!)
80800004: e28db000 add fp, sp, #0
80800008: e24dd01c sub sp, sp, #28
8080000c: e50b0010 str r0, [fp, #-16]
80800010: e50b1014 str r1, [fp, #-20] ; 0xffffffec
80800014: e50b2018 str r2, [fp, #-24] ; 0xffffffe8
80800018: e10f3000 mrs r3, CPSR
8080001c: e50b3008 str r3, [fp, #-8]
80800020: e51b3008 ldr r3, [fp, #-8]
80800024: e1a00003 mov r0, r3
80800028: e28bd000 add sp, fp, #0
8080002c: e49db004 pop {fp} ; (ldr fp, [sp], #4)
80800030: e12fff1e bx lr
I.MX7d running u-boot:
# loads
## Ready for S-Record download ...
## First Load Addr = 0x80800000
## Last Load Addr = 0x80800033
## Total Size = 0x00000034 = 52 Bytes
CACHE: Misaligned operation at range [80800000, 80800034]
## Start Addr = 0x80800000
# go 0x80800000
## Starting application at 0x80800000 ...
## Application terminated, rc = 0x200000D3
Thumb:
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc -O0 -mcpu=cortex-a7 -mthumb -nostartfiles -nostdlib -Wl,--section-start=.text=0x80800000 -o CPSR-thumb.elf CPSR.c
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000080800000
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objcopy -O srec CPSR-thumb.elf CPSR-thumb.srec
/opt/arm/10/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-objdump -j .text -D CPSR-thumb.elf
CPSR-thumb.elf: file format elf32-littlearm
Disassembly of section .text:
80800000 <main>:
80800000: b480 push {r7}
80800002: b087 sub sp, #28
80800004: af00 add r7, sp, #0
80800006: 60f8 str r0, [r7, #12]
80800008: 60b9 str r1, [r7, #8]
8080000a: 607a str r2, [r7, #4]
8080000c: f3ef 8300 mrs r3, CPSR
80800010: 617b str r3, [r7, #20]
80800012: 697b ldr r3, [r7, #20]
80800014: 4618 mov r0, r3
80800016: 371c adds r7, #28
80800018: 46bd mov sp, r7
8080001a: bc80 pop {r7}
8080001c: 4770 bx lr
I.MX7d running u-boot:
# loads
## Ready for S-Record download ...
## First Load Addr = 0x80800000
## Last Load Addr = 0x8080001D
## Total Size = 0x0000001E = 30 Bytes
CACHE: Misaligned operation at range [80800000, 8080001e]
## Start Addr = 0x80800000
#
# go 0x80800001
## Starting application at 0x80800001 ...
## Application terminated, rc = 0x200000D3
Bottom line: both versions returned the same value for CPSR, i.e. 0x200000D3.
To the question
How to write a function to read ARM CPSR in either ARM or THUMB mode?
The answer would then be: The way you did.
Why p/x regval and p/x $cpsr do not return the same value should be the topic of a different question, maybe on the GDB forum.
Update #1: Nate Eldredge explained why the value read into the register always has the T bit cleared.
Testing on a different Cortex-A7 (Allwinner H3), a JLink probe and the Ozone debugger, we can see that even though the value read by the MRS instruction is 0x200000D3, the value of CPSR_USR read by the JTAG probe and Ozone is 0x200001F3 when executing the Thumb version, and 0x200000D3 when executing the Arm version:
(Ozone screenshots of the Arm and Thumb runs are not reproduced here.)
This would I.M.H.O. perfectly validate his explanation.
Update #2
Still using the JLink debug probe, but in combination with JLinkGDBServerExe and arm-none-eabi-gdb 12.1 in TUI mode:
(GDB TUI screenshots of the Arm and Thumb runs are not reproduced here.)
The value for the CPSR register read by the JTAG probe is the one you would expect, i.e. has the Tbit set in Thumb mode.
You probably would get the same result in Linux using a TRACE32 JTAG probe.
Not sure this could be useful, but note that some pre-defined symbols differ when building an Arm or Thumb executable:
/opt/arm/11/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -dM -E -mcpu=cortex-a7 -marm - < /dev/null | grep -i arm
#define __ARM_SIZEOF_WCHAR_T 4
#define __ARM_FEATURE_SAT 1
#define __ARM_ARCH_ISA_ARM 1
#define __ARMEL__ 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_SIZEOF_MINIMAL_ENUM 1
#define __ARM_FEATURE_LDREX 15
#define __ARM_PCS 1
#define __ARM_FEATURE_QBIT 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_32BIT_STATE 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_ARCH_ISA_THUMB 2
#define __ARM_ARCH 7
#define __ARM_FEATURE_UNALIGNED 1
#define __arm__ 1
#define __ARM_ARCH_7A__ 1
#define __ARM_FEATURE_SIMD32 1
#define __ARM_FEATURE_COPROC 15
#define __ARM_FEATURE_DSP 1
#define __ARM_ARCH_EXT_IDIV__ 1
#define __ARM_EABI__ 1
/opt/arm/11/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -dM -E -mcpu=cortex-a7 -mthumb - < /dev/null | grep -i thumb
#define __thumb2__ 1
#define __THUMB_INTERWORK__ 1
#define __thumb__ 1
#define __ARM_ARCH_ISA_THUMB 2
#define __THUMBEL__ 1
You could therefore use #ifdef __thumb__ (or #ifdef __thumb2__) in your code in order to know whether you are building the Thumb or the Arm version. Note that __arm__ alone does not tell them apart, because GCC defines it for both.
The instruction is working as designed and documented.
The discrepancy is in bit 5, which, according to the ARMv7-A Architecture Reference Manual, is the T bit, indicating whether the processor is in Thumb state. It's one of the "execution state bits". Lower down on that page, under "Accessing the execution state bits", it says:
The execution state bits, other than the E bit, are RAZ [read as zero] when read by an MRS instruction.
So mrs rN, CPSR masks off those bits. I'm not sure why it's designed this way. But in principle you should already know whether you're in Thumb state or not, so it shouldn't really be necessary to read this information from CPSR.
On the other hand, gdb doesn't get its CPSR value from mrs rN, CPSR. I haven't checked, but I presume what happens is this: when your program hits a breakpoint, an exception is generated. This causes CPSR to be saved into SPSR (without masking any bits!), and the kernel's exception handler retrieves it from there to store as part of the saved context of your process, along with register values, etc. The saved context is made available to the debugger via appropriate system calls (e.g. ptrace(2)) and that's how it is able to display register contents and such. In particular, it gets the CPSR value that was saved at the breakpoint and which isn't masked.

Why is hello world in assembly for ARM mac 'invalid'?

The other answers don't tell me how to compile; I'm stuck.
I have a simple hello world in assembly
.global start
.align 2
start: mov X0, #1
adr X1, hello
mov X2, #13
mov X16, #4
svc 0
mov X0, #0
mov X16, #1
svc 0
hello: .ascii "Hello\n"
I compiled it using clang hello.s -nostdlib -static
The file command says:
% file ./a.out
./a.out: Mach-O 64-bit executable arm64
objdump shows the following; perhaps UNKNOWN_ARCHITECTURE is the problem?
./a.out: file format mach-o-arm64
Disassembly of section .text:
0000000100003fd8 <start>:
100003fd8: d2800020 mov x0, #0x1 // #1
100003fdc: 100000e1 adr x1, 100003ff8 <hello>
100003fe0: d28001a2 mov x2, #0xd // #13
100003fe4: d2800090 mov x16, #0x4 // #4
100003fe8: d4000001 svc #0x0
100003fec: d2800000 mov x0, #0x0 // #0
100003ff0: d2800030 mov x16, #0x1 // #1
100003ff4: d4000001 svc #0x0
0000000100003ff8 <hello>:
100003ff8: 6c6c6548 ldnp d8, d25, [x10, #-320]
100003ffc: Address 0x0000000100003ffc is out of bounds.
Disassembly of section LC_THREAD.UNKNOWN_ARCHITECTURE.0:
0000000000000000 <LC_THREAD.UNKNOWN_ARCHITECTURE.0>:
...
100: 00003fd8 udf #16344
104: 00000001 udf #1
...
Running it in zsh prints "killed" and exits with code 137 (128 + 9, i.e. SIGKILL).
This is what dtruss says:
% sudo dtruss ./a.out
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to execute ./a.out: Bad executable (or shared library)
Where did I go wrong? I'm on an M2.
The kernel on arm64 macOS does not allow static binaries. It's as simple as that, see Why does macOS kill static executables created by clang?
But you don't need your binary to be static. Just rename start to _main and compile with clang hello.s and it will work.

How can I see the full backtrace using kgdb to debug an ARM Linux module?

I worked my way through all of the free Linux training materials created by Free Electrons. In the last lab, we learn to use kgdb to remotely debug a simple crash in a loadable module. The crash is caused by a null pointer dereference in a memzero function call.
I am using Linux kernel 4.9 and a BeagleBone Black as the target, all according to the recommendations for the labs, and I've had no problems up to this point. My host is Ubuntu xenial and I am using standard packages for the ARM toolchain (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) and gdb (7.11.1-0ubuntu1~16.04) debugger.
gdb is able to read the symbol tables from vmlinux and from the module with the bug in it, which is called drvbroken.ko. The module has a bug in its init function, so it crashes immediately when I insmod it.
gdb output:
(gdb) backtrace
#0 __memzero () at arch/arm/lib/memzero.S:69
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) list 69
64 ldmeqfd sp!, {pc} # 1/2 quick exit
65 /*
66 * No need to correct the count; we're only testing bits from now on
67 */
68 tst r1, #32 # 1
69 stmneia r0!, {r2, r3, ip, lr} # 4
70 stmneia r0!, {r2, r3, ip, lr} # 4
71 tst r1, #16 # 1 16 bytes or more?
72 stmneia r0!, {r2, r3, ip, lr} # 4
73 ldr lr, [sp], #4 # 1
The result is the same whether I build the kernel with CONFIG_ARM_UNWIND (the default) or disable that and use CONFIG_FRAME_POINTER (the old method recommended by the lab notes).
I tried the same procedure in kdb, and here I see a very long backtrace that includes the calling functions. The caller of memzero is cdev_init.
kdb output:
Entering kdb (current=0xde616240, pid 106) on processor 0 Oops: (null)
due to oops # 0xc04c2be0
CPU: 0 PID: 106 Comm: insmod Tainted: G O 4.9.0-dirty #1
Hardware name: Generic AM33XX (Flattened Device Tree)
task: de616240 task.stack: de676000
PC is at __memzero+0x40/0x7c
LR is at 0x0
pc : [<c04c2be0>] lr : [<00000000>] psr: 00000013
sp : de677da4 ip : 00000000 fp : de677dbc
r10: bf000240 r9 : 219a3868 r8 : 00000000
r7 : de65c7c0 r6 : de6420c0 r5 : bf0000b4 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : fffffffc r0 : 00000000
Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 9e69c019 DAC: 00000051
CPU: 0 PID: 106 Comm: insmod Tainted: G O 4.9.0-dirty #1
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
... pruned function calls related to kdb itself ...
[<c08326cc>] (do_page_fault) from [<c010138c>] (do_DataAbort+0x3c/0xbc)
r10:bf000240 r9:de676000 r8:de677d50 r7:00000000 r6:c08326cc r5:00000817
r4:c0d0bb2c
[<c0101350>] (do_DataAbort) from [<c0831d04>] (__dabt_svc+0x64/0xa0)
Exception stack(0xde677d50 to 0xde677d98)
7d40: 00000000 fffffffc 00000000 00000000
7d60: 00000000 bf0000b4 de6420c0 de65c7c0 00000000 219a3868 bf000240 de677dbc
7d80: 00000000 de677da4 00000000 c04c2be0 00000013 ffffffff
r8:00000000 r7:de677d84 r6:ffffffff r5:00000013 r4:c04c2be0
[<c02bf44c>] (cdev_init) from [<bf002048>] (init_module+0x48/0xb4 [drvbroken])
r5:bf002000 r4:bf000480
[<bf002000>] (init_module [drvbroken]) from [<c01018d4>] (do_one_initcall+0x44/0x180)
r5:bf002000 r4:ffffe000
[<c0101890>] (do_one_initcall) from [<c024fa2c>] (do_init_module+0x64/0x1d8)
r8:00000001 r7:de65c7c0 r6:de6420c0 r5:c0dbfa84 r4:bf000240
[<c024f9c8>] (do_init_module) from [<c01e10e8>] (load_module+0x1d6c/0x23d8)
r6:c0d0512c r5:c0dbfa84 r4:c0d4c70f
[<c01df37c>] (load_module) from [<c01e18ac>] (SyS_init_module+0x158/0x17c)
r10:00000051 r9:de676000 r8:e0a95100 r7:00000000 r6:000ac118 r5:00004100
It is pretty easy to figure out where to look for the bug with this information, but alas, it is not possible to get a line number or list the source directly from kdb. This is much easier in gdb, assuming that I can get a full backtrace.

Windows enforces a READ-ONLY .text section, even though it is disabled by the ld linker

In the toy program below, I declare a variable in the .text section and write to it, which gives a segmentation fault, since the .text section is marked READ-ONLY:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100a
End of assembler dump.
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401000 in start ()
(gdb)
Here is the objdump output:
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 00000200 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00000014 00402000 00402000 00000400 2**2
CONTENTS, ALLOC, LOAD, DATA
However, linking with the --omagic switch (which makes the .text section writable) yields the following:
ld --omagic -o test.exe test.obj
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 000001d0 2**4
CONTENTS, ALLOC, LOAD, CODE
1 .idata 00000014 00402000 00402000 000003d0 2**2
CONTENTS, ALLOC, LOAD, DATA
But debugging this using GDB gives the following (weird) results:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: dec %ebp
0x00401001 <+1>: pop %edx
0x00401002 <+2>: nop
0x00401003 <+3>: add %al,(%ebx)
0x00401005 <+5>: add %al,(%eax)
0x00401007 <+7>: add %al,(%eax,%eax,1)
End of assembler dump.
(gdb) stepi
0x00401001 in start ()
(gdb) stepi
0x00401002 in start ()
(gdb) stepi
0x00401003 in start ()
(gdb) stepi
0x00401005 in start ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401005 in start ()
(gdb)
First of all, I still get a segmentation fault, and why has the disassembled code changed?
How can I link the .text section as writable on Windows 10 x64?
Toy program:
BITS 32
section .text
global _start
_start:
mov [var], dword 2
var: dd 0
ret
For some reason, ld completely changes the PE executable linked using the --omagic option.
A quick comparison of the files using the cmp utility shows:
137 177 222
141 0 320
142 6 5
213 0 320
214 2 1
217 142 205
218 154 353
397 0 320
398 2 1
437 0 320
438 4 3
465 0 307
...
So there are many differences, although in principle ld should only change the flags in the .text section header, i.e. set IMAGE_SCN_MEM_WRITE.
Changing the flags manually in HxD, i.e. setting the byte at offset 0x19F to 0xE0, solves the issue.
Here is a trial run of the program with var and ret swapped (otherwise the program crashes):
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100b
0x0040100a <+10>: ret
End of assembler dump.
(gdb) stepi
0x0040100a in start ()
(gdb) disassemble
Dump of assembler code for function start:
0x00401000 <+0>: movl $0x2,0x40100b
=> 0x0040100a <+10>: ret
End of assembler dump.
(gdb) x/wx var
0x40100b <var>: 0x00000002
(gdb)
and we see things work as expected.
My conclusion is that ld somehow generates a badly formatted PE executable, and I see that @RossRidge has the answer (ld doesn't respect the file alignment of sections).
The --omagic flag causes the GNU linker to generate a bad PE/COFF executable. Sections must be aligned in the file to a minimum file alignment of 512 bytes, but the linker puts the .text section at file offset 0x1d0.
Instead of using the --omagic flag, generate your executable normally and then use objcopy to change the flags in the section header:
ld -o test-tmp.exe test.obj
$(OBJCOPY) --set-section-flags .text=code,data,alloc,contents,load test-tmp.exe test.exe

Can Libffi be built for Cortex-M3?

I'm trying to build the foreign function interface library for a Cortex-M3 processor using GCC. According to http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html:
-mthumb
Generate code for the Thumb instruction set. The default is to use the 32-bit ARM instruction set. This option automatically enables either 16-bit Thumb-1 or mixed 16/32-bit Thumb-2 instructions based on the -mcpu=name and -march=name options. This option is not passed to the assembler. If you want to force assembler files to be interpreted as Thumb code, either add a `.thumb' directive to the source or pass the -mthumb option directly to the assembler by prefixing it with -Wa.
I've tried passing various arguments to the assembler and can't seem to figure it out. Typical output is as follows:
Building file: ../source/ffi/sysv.S
Invoking: GCC Assembler
arm-bare_newlib_cortex_m3_nommu-eabi-gcc -Wa,-mthumb-interwork -I"/home/neil/m3projects/robovero/firmware/include" -o"source/ffi/sysv.o" "../source/ffi/sysv.S"
../source/ffi/sysv.S: Assembler messages:
../source/ffi/sysv.S:145: Error: selected processor does not support ARM opcodes
../source/ffi/sysv.S:147: Error: attempt to use an ARM instruction on a Thumb-only processor -- `stmfd sp!,{r0-r3,fp,lr}'
...
Can I use libffi on Cortex-M3 without becoming an assembly expert?
It might be worth noting that when I invoke arm-bare_newlib_cortex_m3_nommu-eabi-as directly I get different errors.
I modified sysv.S as follows. The error is caused by the ".arm" directive; when targeting Cortex-M3 it should be commented out.
#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
#undef __THUMB_INTERWORK__
#endif
#if __ARM_ARCH__ >= 5
# define call_reg(x) blx x
#elif defined (__ARM_ARCH_4T__)
# define call_reg(x) mov lr, pc ; bx x
# if defined(__thumb__) || defined(__THUMB_INTERWORK__)
# define __INTERWORKING__
# endif
#else
# define call_reg(x) mov lr, pc ; mov pc, x
#endif
/* Conditionally compile unwinder directives. */
#ifdef __ARM_EABI__
#define UNWIND
#else
#define UNWIND #
#endif
#if defined(__thumb__) && !defined(__THUMB_INTERWORK__)
.macro ARM_FUNC_START name
.text
.align 0
.thumb
.thumb_func
#ifdef __APPLE__
ENTRY($0)
#else
ENTRY(\name)
#endif
#ifndef __ARM_ARCH_7M__ /* not cortex-m3 */
bx pc
nop
.arm
#endif
UNWIND .fnstart
/* A hook to tell gdb that we've switched to ARM mode. Also used to call
directly from other local arm routines. */
#ifdef __APPLE__
_L__$0:
#else
_L__\name:
#endif
.endm
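The patch above hinges on GCC predefining __ARM_ARCH_7M__ when compiling for an ARMv7-M core such as the Cortex-M3. As a rough sketch of the same compile-time test from C (my own illustration, not part of libffi), the function below classifies the target from predefined macros alone. The __ARM_ARCH_PROFILE check comes from the ARM C Language Extensions and is an added assumption that also covers other Thumb-only M-profile cores:

```c
#include <assert.h>
#include <string.h>

/* Sketch: classify the compilation target using only predefined macros.
   __ARM_ARCH_7M__ is the GCC macro the patched sysv.S tests; the
   __ARM_ARCH_PROFILE test (ACLE) is an assumption added to catch other
   M-profile (Thumb-only) cores as well. */
static const char *isa_summary(void)
{
#if defined(__ARM_ARCH_7M__) || \
    (defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'M')
    /* Thumb-only core: a ".arm" section cannot be assembled for it. */
    return "thumb-only";
#elif defined(__arm__) && defined(__thumb__)
    /* ARM-capable 32-bit core, but this file is being compiled as Thumb. */
    return "thumb-on-arm-capable";
#elif defined(__arm__)
    /* ARM-capable 32-bit core compiled as ARM. */
    return "arm";
#else
    /* x86, AArch64 and anything else that is not 32-bit ARM. */
    return "not-arm32";
#endif
}

/* Convenience predicate: nonzero when ".arm" code cannot be used. */
static int is_thumb_only(void)
{
    return strcmp(isa_summary(), "thumb-only") == 0;
}
```

Built with -mcpu=cortex-m3 it should report "thumb-only"; on an x86 or AArch64 host it reports "not-arm32".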
I hate to say it, but this is a porting effort. It is doable without being an assembler expert, but you will need to learn some. Going from Thumb to ARM is easy; for Thumb-2 I would have to look it up, but much of Thumb-2 is just Thumb instructions. Thumb has a one-to-one mapping to ARM instructions, but not the other way around. Thumb mostly limits you to the lower 8 registers on all the workhorse instructions, with special versions or special instructions to use the upper registers. So many of your ARM instructions are going to turn into more than one Thumb instruction.
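To make the low-register point concrete: in the 16-bit Thumb encoding of ADDS Rd, Rn, Rm (ADD register, encoding T1 in the ARMv7-M Architecture Reference Manual, as I read it), each register field is only 3 bits wide, so r8-r15 simply cannot be named. The encoder below is a hypothetical helper written for illustration only; it returns -1 whenever an operand is a high register, which is the case where the assembler has to fall back to another form or an extra instruction:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode the 16-bit Thumb "ADDS Rd, Rn, Rm"
   (ADD register, encoding T1). Bit layout:
     15..9 = 0001100, 8..6 = Rm, 5..3 = Rn, 2..0 = Rd
   Each register field is 3 bits, so only r0-r7 can be expressed. */
static int32_t encode_adds_t1(unsigned rd, unsigned rn, unsigned rm)
{
    if (rd > 7 || rn > 7 || rm > 7)
        return -1;  /* high register: needs a different (e.g. 32-bit Thumb-2) encoding */
    return (int32_t)(0x1800u | (rm << 6) | (rn << 3) | rd);
}
```

For example, encode_adds_t1(0, 0, 1) yields 0x1840 (adds r0, r0, r1), while any operand involving fp (r11) is rejected.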
First, see if there is a build option to build this package without using assembler, or go into that directory and see if there is something you can do in the makefile to use a C implementation instead of the assembler. I assume there is a serious performance issue with using C, which is why there is assembler to start with. Thumb-2 in theory is more efficient than ARM, but that does not necessarily carry over to a direct ARM-to-Thumb-2 port. So with some experience you may be able to hand-port to Thumb-2 and keep some performance.
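If the only foreign calls you actually need pass a handful of word-sized integers, a C-only dispatcher can stand in for the assembly trampoline. The sketch below is hypothetical (call_with_longs and the fnN typedefs are not libffi API) and assumes every target takes and returns long; it picks the call signature with a switch on the argument count and does not handle floats, structs or variadic functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C-only fallback, not part of libffi. Every target is
   assumed to take and return long. void (*)(void) is used as the
   generic function-pointer type; each call casts back to the exact type. */
typedef void (*any_fn)(void);
typedef long (*fn0)(void);
typedef long (*fn1)(long);
typedef long (*fn2)(long, long);
typedef long (*fn3)(long, long, long);
typedef long (*fn4)(long, long, long, long);

static long call_with_longs(any_fn fn, size_t nargs, const long *args)
{
    switch (nargs) {
    case 0: return ((fn0)fn)();
    case 1: return ((fn1)fn)(args[0]);
    case 2: return ((fn2)fn)(args[0], args[1]);
    case 3: return ((fn3)fn)(args[0], args[1], args[2]);
    case 4: return ((fn4)fn)(args[0], args[1], args[2], args[3]);
    default:
        assert(!"call_with_longs: more than four arguments");
        return 0;
    }
}

/* Example targets. */
static long add2(long a, long b) { return a + b; }
static long weighted4(long a, long b, long c, long d)
{
    return a + 2 * b + 3 * c + 4 * d;
}
```

With args = {2, 3}, call_with_longs((any_fn)add2, 2, args) returns 5. Casting through void (*)(void) and back to the exact function type before calling keeps this within standard C; the trade-off is that every supported signature has to be spelled out at compile time, which is precisely what libffi avoids.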
EDIT:
I downloaded the file in question. The defines at the top imply that it is aware of both Thumb and ARMv7-M. Is that how you got to the point of changing stm to push?
The assembler is telling you the truth: ARM assembly code can't be assembled to run on a Thumb-2-only processor like the M3. There is no way for the assembler to map the ARM instruction mnemonics to opcodes that make sense to a Cortex-M3. You'll need to port the assembly files to Thumb-2 assembly code to get things working. Depending on what the original assembly code does, you might get lucky and be able to port it to C instead, but that may cost you a major performance hit.
Add "-Wa,-mimplicit-it=thumb" to the gcc CFLAGS to avoid the "thumb conditional instruction should be in IT block" error, and patch sysv.S as follows:
--- libffi.orig/src/arm/sysv.S
+++ libffi/src/arm/sysv.S
@@ -91,6 +91,10 @@
# define __ARM_ARCH__ 7
#endif
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+#undef __THUMB_INTERWORK__
+#endif
+
#if __ARM_ARCH__ >= 5
# define call_reg(x) blx x
#elif defined (__ARM_ARCH_4T__)
@@ -121,9 +125,11 @@
#else
ENTRY(\name)
#endif
+#ifndef __ARM_ARCH_7M__ /* not cortex-m3 */
bx pc
nop
.arm
+#endif
UNWIND .fnstart
/* A hook to tell gdb that we've switched to ARM mode. Also used to call
directly from other local arm routines. */
@@ -164,6 +170,10 @@ _L__\name:
#endif
.endm
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ .syntax unified
+#endif
+
@ r0: ffi_prep_args
@ r1: &ecif
@ r2: cif->bytes
@@ -180,7 +190,11 @@ ARM_FUNC_START ffi_call_SYSV
UNWIND .setfp fp, sp
@ Make room for all of the new args.
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ sub sp, sp, r2
+#else
sub sp, fp, r2
+#endif
@ Place all of the ffi_prep_args in position
mov r0, sp
@@ -193,7 +207,12 @@ ARM_FUNC_START ffi_call_SYSV
ldmia sp, {r0-r3}
@ and adjust stack
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ mov lr, sp
+ sub lr, fp, lr @ cif->bytes == fp - sp
+#else
sub lr, fp, sp @ cif->bytes == fp - sp
+#endif
ldr ip, [fp] @ load fn() in advance
cmp lr, #16
movhs lr, #16
@@ -305,7 +324,13 @@ ARM_FUNC_START ffi_closure_SYSV
beq .Lretlonglong
.Lclosure_epilogue:
add sp, sp, #16
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ ldr ip, [sp, #4]
+ ldr sp, [sp]
+ mov pc, ip
+#else
ldmfd sp, {sp, pc}
+#endif
.Lretint:
ldr r0, [sp]
b .Lclosure_epilogue
@@ -381,7 +406,12 @@ LSYM(Lbase_args):
ldmia sp, {r0-r3}
@ and adjust stack
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ mov lr, sp
+ sub lr, ip, lr @ cif->bytes == (fp - 64) - sp
+#else
sub lr, ip, sp @ cif->bytes == (fp - 64) - sp
+#endif
ldr ip, [fp] @ load fn() in advance
cmp lr, #16
movhs lr, #16
@@ -469,7 +499,13 @@ ARM_FUNC_START ffi_closure_VFP
.Lclosure_epilogue_vfp:
add sp, sp, #72
+#ifdef __ARM_ARCH_7M__ /* cortex-m3 */
+ ldr ip, [sp, #4]
+ ldr sp, [sp]
+ mov pc, ip
+#else
ldmfd sp, {sp, pc}
+#endif
.Lretfloat_vfp:
flds s0, [sp]
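Two of the rewrites in this patch are easy to check with a toy model. As I understand the ARMv7-M encoding rules, Thumb-2 does not accept SP as the second source register of SUB and does not allow SP in an LDM register list, which is why the patch copies sp into lr first and splits "ldmfd sp, {sp, pc}" into two loads. The C sketch below uses hypothetical names; registers are plain variables and memory is a small word array. It shows the value the "cmp lr, #16" / "movhs lr, #16" pair computes (in Thumb-2 the conditional movhs must sit in an IT block, which -mimplicit-it=thumb lets the assembler insert) and that the new epilogue has to load the return address before it overwrites sp:

```c
#include <assert.h>
#include <stdint.h>

/* "mov lr, sp ; sub lr, fp, lr ; cmp lr, #16 ; movhs lr, #16":
   size of the outgoing argument area (cif->bytes == fp - sp), clamped
   to the 16 bytes that "ldmia sp, {r0-r3}" moved into registers. */
static uint32_t clamp_reg_arg_bytes(uint32_t fp, uint32_t sp)
{
    uint32_t lr = sp;   /* mov lr, sp                            */
    lr = fp - lr;       /* sub lr, fp, lr                        */
    if (lr >= 16u)      /* cmp lr, #16 ; movhs (unsigned >=)     */
        lr = 16u;
    return lr;
}

#define MEM_WORDS 16u

typedef struct {
    uint32_t mem[MEM_WORDS];  /* word at byte address a is mem[a / 4] */
    uint32_t sp, ip, pc;
} cpu_state;

static uint32_t load_word(const cpu_state *c, uint32_t addr)
{
    assert(addr % 4u == 0 && addr / 4u < MEM_WORDS);
    return c->mem[addr / 4u];
}

/* ARM original: ldmfd sp, {sp, pc}. Both words are read relative to the
   old sp: sp (lower-numbered register) from [sp], pc from [sp + 4]. */
static void epilogue_arm(cpu_state *c)
{
    uint32_t base = c->sp;
    c->sp = load_word(c, base);
    c->pc = load_word(c, base + 4u);
}

/* Patched Thumb-2: ldr ip, [sp, #4] ; ldr sp, [sp] ; mov pc, ip.
   The return address must be fetched before sp is overwritten. */
static void epilogue_thumb2(cpu_state *c)
{
    c->ip = load_word(c, c->sp + 4u);
    c->sp = load_word(c, c->sp);
    c->pc = c->ip;
}
```

For example, clamp_reg_arg_bytes(0x100, 0xE0) returns 16 (a 32-byte argument area) and clamp_reg_arg_bytes(0x100, 0xF8) returns 8. Swapping the two loads in epilogue_thumb2 would read the return address relative to the new sp, which is why the patch orders them as it does. The model ignores the Thumb bit and anything outside these few instructions.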
