ARM cross-compiler generating invalid branch targets in standard C functions - gcc

I'm working on a custom embedded project (using PlatformIO to set up the build environment), and I have found that calls to standard C functions like memset and memcpy generate bogus code. The disassembly shows that instructions in both of those functions (and others I've tried from the stdlib) branch unconditionally to locations that contain no code, which of course causes the MCU (a Cortex-M4, the Atmel SAMD51) to hard-fault as it tries to execute nonsense. There are no compiler errors, only runtime errors in the form of hard-faults due to invalid instructions.
I believe something is wrong with my compilation environment: PlatformIO ships libraries for an Adafruit board with the same processor, and that project links the functions above correctly. Note that I am cross-compiling from a Mac. Just below are the disassemblies of the memset function from the Adafruit and custom projects:
Adafruit:
0x000012de: 02 44 add r2, r0
0x000012e0: 03 46 mov r3, r0
0x000012e2: 93 42 cmp r3, r2
0x000012e4: 00 d1 bne.n 0x12e8 <memset+10>
0x000012e6: 70 47 bx lr
0x000012e8: 03 f8 01 1b strb.w r1, [r3], #1
0x000012ec: f9 e7 b.n 0x12e2 <memset+4>
Custom:
0x000005b4: 00 30 adds r0, #0
0x000005b6: a0 e1 b.n 0x8fa <--- branch to address with no code and hard-fault
0x000005b8: 02 20 movs r0, #2
0x000005ba: 80 e0 b.n 0x6be
0x000005bc: 02 00 movs r2, r0
0x000005be: 53 e1 b.n 0x868
0x000005c0: 1e ff 2f 01 vrhadd.u16 d0, d14, d31
0x000005c4: 01 10 asrs r1, r0, #32
0x000005c6: c3 e4 b.n 0xffffff50
0x000005c8: fb ff ff ea ; <UNDEFINED> instruction: 0xfffbeaff
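(For reference, the correct Adafruit listing above is just a byte-at-a-time memset -- presumably newlib-nano's size-optimized version; roughly this in C:)

/* Rough C equivalent of the Adafruit memset above -- presumably
   newlib-nano's size-optimized, byte-at-a-time implementation. */
#include <stddef.h>

void *memset(void *dst, int c, size_t n)
{
    unsigned char *p   = dst;
    unsigned char *end = p + n;      /* add r2, r0 */

    while (p != end)                 /* cmp r3, r2 / bne */
        *p++ = (unsigned char)c;     /* strb.w r1, [r3], #1 */

    return dst;
}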
Even without the nonsensical branch targets, the custom version has a totally different form from the Adafruit one, suggesting to me that something is going horribly wrong with linking. I assume the issue is at the linking stage, not during compilation of the individual object files. Linking between files that exist solely within my project causes no issues; local branching is correct. The weirdness seems confined to linking prebuilt libraries.
I should mention that the Adafruit project also includes Arduino code, so part of that compilation process involves C++, whereas mine is purely C. I have based most of the compiler flags and build environment on the Adafruit project, as it was the best reference for my own, but I am not using Arduino in any form.
Here's how the linker is called for each of the two projects.
Adafruit (g++ can be interchanged with gcc with no error):
arm-none-eabi-g++ -o .pio/build/adafruit_grandcentral_m4/firmware.elf -T flash_without_bootloader.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 -Os -mcpu=cortex-m4 -mthumb -Wl,--gc-sections -Wl,--check-sections -Wl,--unresolved-symbols=report-all -Wl,--warn-common -Wl,--warn-section-align --specs=nosys.specs --specs=nano.specs .pio/build/adafruit_grandcentral_m4/src/main.cpp.o -L.pio/build/adafruit_grandcentral_m4 -L/Users/work-reese/.platformio/packages/framework-arduino-samd-adafruit/variants/grand_central_m4/linker_scripts/gcc -L/Users/work-reese/.platformio/packages/framework-cmsis/CMSIS/Lib/GCC -Wl,--start-group .pio/build/adafruit_grandcentral_m4/libFrameworkArduinoVariant.a .pio/build/adafruit_grandcentral_m4/libFrameworkArduino.a -larm_cortexM4lf_math -lm -Wl,--end-group
Custom:
arm-none-eabi-ar rc .pio/build/commonsense/libFrameworkCommonSense.a .pio/build/commonsense/FrameworkCommonSense/commonsense.o .pio/build/commonsense/FrameworkCommonSense/cortex_handlers.o .pio/build/commonsense/FrameworkCommonSense/led.o .pio/build/commonsense/FrameworkCommonSense/pinConfig.o .pio/build/commonsense/FrameworkCommonSense/startup.o
arm-none-eabi-ranlib .pio/build/commonsense/libFrameworkCommonSense.a
arm-none-eabi-gcc -o .pio/build/commonsense/firmware.elf -T commonsense_linker.ld -mfpu=fpv4-sp-d16 -mthumb -Wl,--gc-sections -Wl,--check-sections -Wl,--unresolved-symbols=report-all -Wl,--warn-common -Wl,--warn-section-align --specs=nosys.specs --specs=nano.specs -mcpu=cortex-m4 .pio/build/commonsense/src/main.o -L.pio/build/commonsense -L/Users/work-reese/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/lib -L/Users/work-reese/.platformio/packages/framework-cmsis/CMSIS/Lib/GCC -L/Users/work-reese/.platformio/packages/framework-commonsense/linker -Wl,--start-group .pio/build/commonsense/libFrameworkCommonSense.a -larm_cortexM4lf_math -lc_nano -lm -Wl,--end-group
This is using the ARM cross-compiler, version 7.2.1, and the toolchain contains distributions of libc, libc_nano, libm, etc. All the necessary libraries appear to be present.
Please note I included a few extra lines for the custom version's link step above so you can see what libFrameworkCommonSense.a is built from. None of those files make any stdlib calls; note also that cortex_handlers does not call __libc_init_array in the reset handler, because that was also causing hard-faults in the same way as memset. The linker script is identical between the two projects; once again, I borrowed heavily from the Adafruit project for the interrupt handlers and startup code, but I hadn't seen any actual differences between the environments until now.
Adding the --print-multi-lib option shows several multilib variants that should work, namely thumb/v7e-m/fpv4-sp/softfp;#mthumb#march=armv7e-m#mfpu=fpv4-sp-d16#mfloat-abi=softfp, which should be selected given the compiler flags. Weirdly, the build fails when printing the multilib options, complaining that the object files to archive (with arm-none-eabi-ar) are not present in the build directory; this is probably of no concern.
Here's the compilation for the main file, which includes the calls to memset and memcpy:
arm-none-eabi-gcc -o .pio/build/commonsense/src/main.o -c -std=gnu11 -mfpu=fpv4-sp-d16 -Og -g3 -mlong-calls --specs=nano.specs -specs=nosys.specs -fdata-sections -ffunction-sections -mfloat-abi=softfp -march=armv7e-m -mfpu=fpv4-sp-d16 -marm -mthumb-interwork -ffunction-sections -fdata-sections -Wall -mthumb -nostdlib --param max-inline-insns-single=500 -mcpu=cortex-m4 -DPLATFORMIO=50003 -D__SAMD51P20A__ -D__SAMD51__ -D__FPU_PRESENT -DARM_MATH_CM4 -DENABLE_CACHE -DVARIANT_QSPI_BAUD_DEFAULT=50000000 -DDEBUG -DADAFRUIT_LINKER -DF_CPU=120000000L -Iinclude -Isrc -I/Users/work-reese/.platformio/packages/framework-cmsis/CMSIS/Include -I/Users/work-reese/.platformio/packages/framework-cmsis-atmel/CMSIS/Device/ATMEL -I/Users/work-reese/.platformio/packages/framework-cmsis-atmel/CMSIS/Device/ATMEL/samd51 -I/Users/work-reese/.platformio/packages/framework-commonsense -I/Users/work-reese/.platformio/packages/framework-commonsense/core -I/Users/work-reese/.platformio/packages/framework-commonsense/hal -I/Users/work-reese/.platformio/packages/framework-commonsense/hal/include -I/Users/work-reese/.platformio/packages/framework-commonsense/hal/utils/include -I/Users/work-reese/.platformio/packages/framework-commonsense/hal/utils/src -I/Users/work-reese/.platformio/packages/framework-commonsense/hal/src -I/Users/work-reese/.platformio/packages/framework-commonsense/hpl -I/Users/work-reese/.platformio/packages/framework-commonsense/hri -I/Users/work-reese/.platformio/packages/framework-commonsense/sample src/main.c
Does anyone know why I would be seeing this behavior with incorrectly linked library functions? I've bashed on it for nearly a week, throwing many combinations of compiler flags at it to no avail. I feel there's something I'm overlooking but don't know what. I'm happy to provide any additional information.
Side question: What is __libc_init_array(), and how necessary is it to call it during program startup? I see it in the reset handler of the Adafruit and Atmel Studio projects. It's declared locally as a function prototype in their startup files, but reproducing the same thing in my own environment causes a hard-fault as soon as the processor tries to call that function. I should think it is part of libc or similar.

I have found that calls to standard C functions like memset, memcpy are generating bogus code. The disassembly shows that instructions in both those functions (and others I've tried from stdlib) branch unconditionally to locations that contain no code, which of course causes the MCU (a Cortex M4, the Atmel D51) to hard-fault as it tries executing nonsense code.
Actually, that is ARM code rather than Thumb code. When you try to disassemble it as Thumb it's nonsense, but disassembled as ARM it looks plausible.
Of course, your processor can only execute Thumb code, not ARM code, and in any event even a processor that could would have to encounter it while in ARM mode. So there's no mystery about the hard fault.
What is unclear is exactly how you are ending up with ARM code in a thumb project. At first glance it appears your actual invocations of the compiler are specifying thumb, so I'd guess the problem code is actually arriving as a result of linking the wrong library.

It seems the issue arises when using the compiler flag -mfloat-abi=softfp. I switched over to -mfloat-abi=hard and the linking issues went away. I confirmed that the wrong set of switches breaks the Adafruit environment as well.
It still seems strange that I would get such an error based on whether I use hardware exclusively for floating point versus a mix of software emulation and hardware. None of my code was using floating point, either.
Part of the reason I chose 'softfp' is that the FreeRTOS port I found said I should use that switch. Hopefully this doesn't preclude me from using it.
My question about __libc_init_array() still remains, as calling it still produces a hard fault -- its disassembly also looks strange, with branches to odd places (e.g., into the exception table).
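For reference, __libc_init_array comes from newlib and is roughly the following (a simplified sketch of newlib's misc/init.c; the array bounds are symbols defined by the linker script, and _init normally comes from crti.o/crtn.o). It just walks the constructor tables and calls _init in between, so a fault inside it usually points at those tables or _init ending up at bogus addresses in a custom linker script or startup file:

/* Simplified sketch of newlib's __libc_init_array (see newlib, misc/init.c). */
#include <stddef.h>

extern void (*__preinit_array_start[])(void);
extern void (*__preinit_array_end[])(void);
extern void (*__init_array_start[])(void);
extern void (*__init_array_end[])(void);
extern void _init(void);

void __libc_init_array(void)
{
    size_t count, i;

    count = __preinit_array_end - __preinit_array_start;
    for (i = 0; i < count; i++)
        __preinit_array_start[i]();   /* pre-initialisation constructors */

    _init();                          /* code collected in the .init section */

    count = __init_array_end - __init_array_start;
    for (i = 0; i < count; i++)
        __init_array_start[i]();      /* C++ / __attribute__((constructor)) functions */
}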

Related

Weird GCC behaviour with ARM assembler: ANDSEQ instruction

If I try to assemble this program:
.text
main:
andseq r1,r3,r2,lsl #13
With the command gcc -c test.s, I get the following error:
Error: bad instruction `andseq r1,r3,r2,lsl#13'
After some tries I replaced andseq with andeqs, and now it compiles fine.
But if I dump the resulting obj file with objdump -d test.o I get this:
Disassembly of section .text:
00000000 <main>:
0: 00131682 andseq r1, r3, r2, lsl #13
Note how the instruction is decoded as andseq ....
Am I missing something? Is this a bug?
My system is Raspbian GNU/Linux 8, and my gcc is gcc (Raspbian 4.9.2-10) 4.9.2. I have also tested with gcc-8.1.0 (edit: not really, see the edit below), with the same results.
EDIT:
In fact, it seems I'm using the same binutils with gcc-8, so I have really only tested GNU assembler (GNU Binutils for Raspbian) 2.25. I'll try a more recent assembler.
For compatibility with old assembly files, GNU as defaults to divided syntax for ARM assembly. In divided syntax, andeqs is the correct mnemonic for the instruction you desire. You can issue a .syntax unified directive to select unified syntax, in which andseq is the correct mnemonic.
GNU objdump on the other hand only knows unified syntax, which explains the apparent inconsistency.
For new developments, I advise you to consistently use unified syntax if possible.
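As a quick check from C (a hypothetical file; this assumes ARM state, e.g. building with -marm, since a conditional instruction in Thumb would additionally need an IT block), issuing the directive in an inline-assembly block lets the UAL mnemonic assemble:

/* demo.c (hypothetical): gcc -marm -c demo.c
   Note: the .syntax directive applies to the rest of the assembly
   generated for this translation unit. */
void demo(void)
{
    __asm__ volatile (
        ".syntax unified            \n\t"
        "andseq r1, r3, r2, lsl #13 \n\t"  /* reads whatever is in r2/r3; syntax demo only */
        ::: "r1", "cc");
}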
There is a good UAL vs. pre-UAL mnemonic table in the ARMv8 Architecture Reference Manual, Appendix K6 "Legacy Instruction Syntax for AArch32 Instruction Sets".
One of the entries of that table is:
Pre-UAL syntax UAL equivalent
AND<c>S ANDS<c>
where eq is one of the possible condition codes <c>.

Cross compiling - Error: selected processor does not support `fmrx r3,fpexc' in ARM mode - Beaglebone

I'm trying to cross-compile a file to flash into the Beaglebone Black.
All works fine, but if I try to enable the FPU with
#define set_en_bit_in_fpexc() do { \
        int dummy; \
        __asm__ __volatile__ ("fmrx %0,fpexc\n\t"         /* read FPEXC */      \
                              "orr %0,%0,#0x40000000\n\t" /* set the EN bit */  \
                              "fmxr fpexc,%0"             /* write it back */   \
                              : "=r" (dummy) : :); \
    } while (0)
I get the following errors:
Error: selected processor does not support `fmrx r3,fpexc' in ARM mode
Error: selected processor does not support `fmxr fpexc,r3' in ARM mode
I also tried with thumb mode, but I get the same errors.
Of course, if I remove the part of the code that initializes the FPU, it works fine.
Why do I get these errors?
Makefile
[...]
CROSSPATH?=/usr/bin
CROSSPFX=$(CROSSPATH)/arm-none-eabi-
CC=$(CROSSPFX)gcc
AS=$(CROSSPFX)as
LD=$(CROSSPFX)ld
NM=$(CROSSPFX)nm
OBJCOPY=$(CROSSPFX)objcopy
OBJDUMP=$(CROSSPFX)objdump
CFLAGS=-Wall -Wextra -O2 -ffreestanding
ARCHFLAGS=-mcpu=cortex-a8 -march=armv7-a -mfpu=neon
CCARCHFLAGS=$(ARCHFLAGS) -marm
[...]
I'm on Arch, kernel 4.8.1
P.S. My professor uses the Linaro cross-compiler and it works just fine.
Most of the Linaro toolchains are configured for ARMv7 hard-float by default (certainly the Linux ones, I'm less sure about the bare-metal ones). Looking at the configuration of the arm-none-eabi toolchain as packaged by Arch, I surmise it's just using the GCC defaults for things like that, which implies something like ARMv4t, and crucially, soft-float ABI.
Whilst the -mfpu option controls code generation in terms of which floating-point instructions may be used, apparently it's the float ABI which controls whether it'll let you do things which really only make sense on a hardware FPU, rather than under floating-point emulation.
When it's not configured by default, you need to explicitly select a floating-point ABI implying an actual hardware FPU, i.e. -mfloat-abi=hard (or -mfloat-abi=softfp, but there's really no reason to use that unless you need to link against other soft-float code).
-mfpu=vfpv3-d16 -mfloat-abi=hard
Just to give a more direct solution, I had to add -mfpu=vfpv3-d16.
Test code a.S:
fmrx r2, fpscr
Working command:
sudo apt-get install binutils-arm-linux-gnueabihf
arm-linux-gnueabihf-as -mfpu=vfpv3-d16 -mfloat-abi=hard a.S
Note that -mfloat-abi=hard is enabled by default on this particular build of arm-linux-gnueabihf-as, and could be omitted.
The default value of -mfloat-abi likely depends on how GCC itself was configured at build time, e.g. with:
./configure --with-float=soft
as documented at https://gcc.gnu.org/install/configure.html. You can get the flags used for your toolchain build with gcc -v, as mentioned at: What configure options were used when building gcc / libstdc++? I could not, however, easily determine its default value if not given.
You may also be interested in -mfloat-abi=softfp, which uses hardware floating-point instructions in the executable but keeps the soft-float calling convention: ARM compilation error, VFP registered used by executable, not object file
The possible values of -mfpu= can be found at: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/ARM-Options.html#ARM-Options
Also note that FMRX is the pre-UAL syntax for VMRS, which is the newer recommended mnemonic; see also: Are ARM instructuons SWI and SVC exactly same thing?
Tested on Ubuntu 16.04, arm-linux-gnueabihf-as 2.26.1.

gcc LTO appears to strip debugging symbols

I have a project, running on an ARM Cortex-M4 processor, where I'm trying to enable the gcc link-time optimization (LTO) feature.
Currently my compilation and linking flags are:
CFLAGS = -ggdb -ffunction-sections -Og
LDFLAGS = -Wl,-gc-sections
Everything works fine with these flags and I'm able to correctly debug the project.
Then I tried adding -flto to CFLAGS. Although the program works fine, I'm no longer able to debug the project, with gdb complaining of missing debugging symbols. Running objdump -g on the ELF file (with LTO enabled) gives the following output:
xxx.elf: file format elf32-littlearm
Contents of the .debug_frame section:
00000000 0000000c ffffffff CIE
Version: 1
Augmentation: ""
Code alignment factor: 2
Data alignment factor: -4
Return address column: 14
DW_CFA_def_cfa: r13 ofs 0
00000010 00000018 00000000 FDE cie=00000000 pc=08002a3c..08002a88
DW_CFA_advance_loc: 2 to 08002a3e
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r4 at cfa-16
DW_CFA_offset: r5 at cfa-12
DW_CFA_offset: r6 at cfa-8
DW_CFA_offset: r14 at cfa-4
DW_CFA_nop
0000002c 0000000c ffffffff CIE
Version: 1
Augmentation: ""
Code alignment factor: 2
Data alignment factor: -4
Return address column: 14
DW_CFA_def_cfa: r13 ofs 0
0000003c 0000000c 0000002c FDE cie=0000002c pc=08002a88..08002a98
Note the missing .debug_info section. Going back to the project settings and only removing -flto from CFLAGS solves the problem. objdump -g on the ELF file without LTO now shows a .debug_info section, filled with the proper references to the functions in my project, and debugging works fine again.
How to get LTO and debug symbols to play well together?
Edit: forgot to include my gcc information. I'm using the GNU ARM Embedded Toolchain, and the test was performed on versions 5.4-2016q2 and 5.4-2016q3.
That's because these GCC versions do not properly support combining -flto with -g.
You can find the details in the GCC online docs, Optimize Options:
"Combining -flto with -g is currently experimental and expected to produce unexpected results."
In practice, much of the -g debug information is dropped when -flto is used.
The situation should have improved by now.
GCC 8 finally got the early debug info improvements:
http://hubicka.blogspot.com/2018/06/gcc-8-link-time-and-interprocedural.html
While it was possible to build with LTO and -g and debug the resulting binary, the debug information was kind of messed-up C, instead of debug info corresponding to the language the program was originally written in. This is finally solved. [...] The main idea is to produce DWARF early during compilation, store it into object files and during link-time just copy necessary fragments to final object files without need for compiler to parse it and update it.
But note that -gsplit-dwarf won't work with LTO.
One can also try marking symbols with __attribute__((used)), or using the symbols in a way that does not change their values, so that they are not optimized away.
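A minimal sketch of the attribute((used)) idea (hypothetical names; this only forces the compiler to keep otherwise-unreferenced definitions alive through LTO, it is not a general fix for missing .debug_info):

/* Keep definitions even if LTO sees no references to them, so the symbols
   are still there to inspect while debugging. Names are made up. */
static volatile int debug_counter __attribute__((used)) = 0;

__attribute__((used, noinline))
static int traced_add(int a, int b)
{
    return a + b;   /* noinline keeps a distinct, breakpoint-able function */
}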

Using compile flag -ffunction-sections with debug symbols

I am compiling a C file with the gcc flag -ffunction-sections, to move every function into its own section. The assembler throws the error:
job_queue.s:2395: Error: operation combines symbols in different segments
The compiler's assembly output at line 2395 is given here:
.section .debug_ranges,info
.Ldebug_ranges0:
.4byte .LBB7-.Ltext0
The symbol LBB7 is in the function (and thus the section) named ".text.add_event_handler"
The symbol Ltext0 is in the (otherwise empty) section named: ".text"
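(For context, a hypothetical minimal file shows what -ffunction-sections does: each function is emitted into its own .text.<name> section while the plain .text section stays empty, which is how a debug expression can end up combining symbols from different sections.)

/* two_funcs.c (hypothetical): compiled with -ffunction-sections, foo and bar
   land in their own sections (.text.foo and .text.bar) rather than in .text. */
int foo(int x)
{
    return x + 1;
}

int bar(int x)
{
    return foo(x) * 2;
}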
GCC --version gives:
pic30-elf-gcc.exe (GCC) 4.0.3 (dsPIC30, Microchip v3_30) (B) Build date: Jun 29 2011
If I use the compiler flag -g0 (to turn off debug info) everything compiles and runs perfectly.
My question:
Is this GCC output clearly wrong? It seems to me that GCC should have calculated the symbol LBB7's offset from the beginning of the .text.add_event_handler section instead of the .text section.
I suspect I am misunderstanding something, because I cannot find anyone having the same difficulty on Google.
The GCC output is definitely wrong. Perhaps it's fixed in newer GCC versions. If you can't upgrade your compiler, try compiling with -gdwarf-2 or, failing that, with -gdwarf-2 -gstrict-dwarf (for -gstrict-dwarf you'll have to upgrade the compiler too).
What this option does is instruct GCC to generate (strict) DWARF 2, which does not include support for non-contiguous address ranges, introduced in DWARF 3.
Of course, this may degrade the debugging information quality somewhat, YMMV.

How to use gcc and -msoft-float on an i386/x86-64? [duplicate]

Is it (easily) possible to use software floating point on i386 Linux without incurring the expense of trapping into the kernel on each call? I've tried -msoft-float, but it seems the normal (Ubuntu) C libraries don't have an FP library included:
$ gcc -m32 -msoft-float -lm -o test test.c
/tmp/cc8RXn8F.o: In function `main':
test.c:(.text+0x39): undefined reference to `__muldf3'
collect2: ld returned 1 exit status
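(The original test.c isn't shown; a hypothetical minimal version is enough to trigger the missing helper -- any double arithmetic compiled with -msoft-float turns into a library call such as __muldf3:)

/* test.c (hypothetical minimal reproducer) */
#include <stdio.h>

int main(void)
{
    volatile double a = 3.14, b = 2.71;   /* volatile prevents constant folding */
    printf("%f\n", a * b);                /* the multiply becomes a __muldf3 call */
    return 0;
}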
It is surprising that gcc doesn't support this natively as the code is clearly available in the source within a directory called soft-fp. It's possible to compile that library manually:
$ svn co svn://gcc.gnu.org/svn/gcc/trunk/libgcc/ libgcc
$ cd libgcc/soft-fp/
$ gcc -c -O2 -msoft-float -m32 -I../config/arm/ -I.. *.c
$ ar -crv libsoft-fp.a *.o
There are a few C files which don't compile due to errors, but the majority do. After copying libsoft-fp.a into the directory with our source files, they now compile fine with -msoft-float:
$ gcc -g -m32 -msoft-float test.c -lsoft-fp -L.
A quick inspection using
$ objdump -D --disassembler-options=intel a.out | less
shows that, as expected, no x87 floating-point instructions are used, and the code also runs considerably slower -- by a factor of 8 in my example, which uses lots of division.
Note: I would've preferred to compile the soft-float library with
$ gcc -c -O2 -msoft-float -m32 -I../config/i386/ -I.. *.c
but that results in loads of error messages like
adddf3.c: In function '__adddf3':
adddf3.c:46: error: unknown register name 'st(1)' in 'asm'
It seems the i386 version is not well maintained, as st(1) refers to one of the x87 registers, which are obviously not available when using -msoft-float.
Strangely (or luckily), the ARM version compiles fine on i386 and seems to work just fine.
Unless you want to bootstrap your entire toolchain by hand, you could start with the uClibc toolchain (the i386 version, I imagine) -- soft float is (AFAIK) not directly supported for "native" compilation on Debian and derivatives, but it can be used via the "embedded" approach of the uClibc toolchain.
GCC does not support this without some extra libraries. From the i386 documentation:
-msoft-float
Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.
On machines where a function returns floating point results in the 80387 register stack, some floating point opcodes may be emitted even if -msoft-float is used.
Also, you cannot set the -mfpmath unit to "none"; it has to be sse, 387, or both.
However, according to this GNU wiki page, there are fp-soft and ieee. There is also SoftFloat.
(For ARM there is -mfloat-abi=softfp, but it does not seem like something similar is available for 386 SX).
It does not seem like tcc supports software floating point numbers either.
Good luck finding a library that works for you.
G'day,
Unless you're targeting a platform that doesn't have built-in FP support, I can't think of a reason why you'd want to emulate FP support.
Doesn't your 386 platform have external FPU support? Pity it's not a 486 with the FPU built in!
In my experience, any soft emulation is bound to be much slower than its hardware equivalent.
That's why I finished up writing a package in Ada to target the onboard 68k FPU instead of using the soft emulation provided by the compiler manufacturer at the time. They finished up bundling it in their compiler, as a matter of fact.
Edit: Just seen your comment below. Hmmm, if you don't need a full suite of FP support, is it possible to roll your own for the few math functions you do need? That's how the Ada package I mentioned got started.
HTH
cheers,
