How to prevent GCC from inserting memset during link-time optimization? - gcc

While developping a bare metal firmware in C for a RV32IM target (RISC-V), I encountered a linking error when LTO is enabled:
/home/duranda/riscv/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /tmp/firmware.elf.5cZNyC.ltrans0.ltrans.o: in function `.L0 ':
/home/duranda/whatever/firmware.c:493: undefined reference to `memset'
There are however no call to memset in my firmware. The memset is inserted by GCC during optimization as described here. The build is optimized for size using GCC -Os and -flto -fuse-linker-plugin flags. In addition, the -fno-builtin-memset -nostdinc -fno-tree-loop-distribute-patterns -nostdlib -ffreestanding flags are used to prevent the use of memset during optimization and to not include standard libs.
How to prevent memset insertion during LTO? Note that the firmware should not be linked against libc. I also tried providing a custom implementation of memset but the linker does not want to use it for memset inserted during optimization (still throws undefined reference).

I hit similar issue servers years ago and tried to fixed that, but it turns out I misunderstanding the meaning of -fno-builtin[1], -fno-builtin not guaranteed GCC won't call memcpy, memmove or memset implicitly.
I guess the simplest solution is, DO NOT compile your libc.c with -flto, or in another word, compile libc.c with -fno-lto.
That's my guess about what happen, I don't have know how to reproduce what you see, so it might incorrect,
During the first phase of LTO, LTO will collect any symbol you used in program
And then ask linker to provide those files, and discard any unused symbol.
Then read those files into GCC and optimize again, in this moment gcc using some built-in function to optimize or code gen, but it not pull-in before.
The symbol reference is created at LTO stage, which is too late pull in any symbol in current GCC LTO flow, and in this case, memset is discard in earlier stage...
So you might have question about why compile libc.c with -fno-lto will work? because if it didn't involved into LTO flow, which means it won't be discarded in the LTO flow.
Some sample program to show the gcc will call memset even you compile with -fno-builtin, aarch64 gcc and riscv gcc will generate a function call to memset.
// $ riscv64-unknown-elf-gcc x.c -o - -O3 -S -fno-builtin
struct bar {
int a[100];
};
struct bar y;
void foo(){
struct bar x = {{0}};
y = x;
}
Here is the corresponding gcc source code[2] for this case.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2014-August/397382.html
[2] https://github.com/riscv/riscv-gcc/blob/riscv-gcc-10.2.0/gcc/expr.c#L3143

I'm not sure -fno-builtin-* does what you think it does. If you use those flags, then GCC will try to call an external function. If you don't use those flags, GCC will instead just insert inline code instead of relying on the library.
So it would appear to me you should simply not use any -fno-builtin flags.

Related

ARM GCC Cortex-M7 bare-metal compiler flags for proper floating-point stack alignment?

Using the arm-none-eabi-gcc compiler from ARM I find that printf("%lf") prints garbage values when the stack is not 8-byte-aligned prior to calling. If I manually align the stack, then it always works:
// This always works, but fails without the get/set_MSP manipulation when
// the MSP is not 8-byte-aligned
unsigned msp = __get_MSP();
__set_MSP(msp & ~7);
// Prints wrong floating-point value when stack is not 8-byte-aligned.
// No difference using %f vs. %lf
printf("%lf\n", 1.234);
__set_MSP(msp);
arm-none-eabi-gcc -v returns:
gcc version 10.2.1 20201103 (release) (GNU Arm Embedded Toolchain 10-2020-q4-major)
and relevant compiler flags are:
-O3 -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb -Wall -fdata-sections -ffunction-sections -fno-strict-aliasing -std=gnu++17 -fno-rtti -fno-use-cxa-atexit
Can anyone suggest any compiler flags I may be missing (or shouldn't be specifying?) -- or other solutions?
Thanks.
Gcc is just generating code that produces the 8 byte alignment that is mandated by the ARM ABI specification. This is specifically pointed out in ABI for the ArmĀ® Architecture Advisory Note
From Section 3.1 "The need to align SP to a multiple of 8 at conforming call sites"
In return for preserving the natural alignment of data, conforming code
is permitted to rely on that alignment. To support aligning data allocated on the stack, the stack pointer (SP) is required to be 8-byte aligned on entry to a conforming function...

Getting assember output from GCC/Clang in LTO mode

Normally, one can get GCC's optimized assembler output from a source file using the -S flag in GCC and Clang, as in the following example.
gcc -O3 -S -c -o foo.s foo.c
But suppose I compile all of my source files using -O3 -flto to enable link-time whole-program optimizations and want to see the final compiler-generated optimized assembly for a function, and/or see where/how code gets inlined.
The result of compiling is a bunch of .o files which are really IR files disguised as object files, as expected. In linking an executable or shared library, these are then smushed together, optimized as a whole, and then compiled into the target binary.
But what if I want assembly output from this procedure? That is, the assembly source that results after link-time optimizations, during the compilation of IR to assembly, and before the actual assembly and linkage into the final executable.
I tried simply adding a -S flag to the link step, but that didn't really work.
I know disassembling the executable is possible, even interleaving with source, but sometimes it's nicer to look at actual compiler-generated assembly, especially with -fverbose-asm.
For GCC just add -save-temps to linker command:
$ gcc -flto -save-temps ... *.o -o bin/libsortcheck.so
$ ls -1
...
libsortcheck.so.ltrans0.s
For Clang the situation is more complicated. In case you use GNU ld (default or -fuse-ld=ld) or Gold linker (enabled via -fuse-ld=gold), you need to run with -Wl,-plugin-opt=emit-asm:
$ clang tmp.c -flto -Wl,-plugin-opt=emit-asm -o tmp.s
For newer (11+) versions of LLD linker (enabled via -fuse-ld=lld) you can generate asm with -Wl,--lto-emit-asm.

Passing multiple -std switches to g++

Is it safe to assume that running g++ with
g++ -std=c++98 -std=c++11 ...
will compile using C++11? I haven't found an explicit confirmation in the documentation, but I see the -O flags behave this way.
The GCC manual doesn't state that the
last of any mutually exclusive -std=... options specified takes effect. The first occurrence
or the last occurrence are the only alternatives. There are numerous
GCC flags that take mutually exclusive alternative values from a finite set - mutually
exclusive, at least modulo the language of a translation unit. Let's call them mutex options for short.
It is a seemingly random rarity for it to be documented that the last setting takes effect. It is
documented for the -O options as you've noted, and in general terms for mutually exclusive warning options, perhaps
others. It's never documented that the first of multiple setting takes effect, because
it's never true.
The documentation leans - with imperfect consistency - on the historical conventions
of command usage in unix-likes OSes. If a command accepts a mutex option
then the last occurrence of the option takes effect. If the command were - unusually -
to act only on the first occurrence of the option then it would be a bug for
the command to accept subsequent occurrences at all: it should give a usage error.
This is custom and practice. The custom facilitates scripting with tools that
respect it, e.g. a script can invoke a tool passing a default setting of some
mutex option but enable the user to override that setting via a parameter of the script,
whose value can simply be appended to the default invocation.
In the absence of official GCC documentation to the effect you want, you might get
reassurance by attempting to find any GCC mutex option for which it is not
the case that the last occurrence takes effect. Here's one stab:
I'll compile and link this program:
main.cpp
#include <cstdio>
#if __cplusplus >= 201103L
static const char * str = "C++11";
#else
static const char * str = "Not C++11";
#endif
int main()
{
printf("%s\n%d\n",str,str); // Format `%d` for `str` mismatch
return 0;
}
with the commandline:
g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp
which requests contradictory option pairs:
-std=c++98 -std=c++11: Conform to C++98. Conform to C++11.
-m32 -m64: Produce 32-bit code. Produce 64-bit code.
-O0 -O1: Do not optimise at all. Optimize to level 1.
-g3 -g0: Emit maximum debugging info. Emit no debugging info.
-Wformat -Wno-format. Sanity-check printf arguments. Don't sanity check them.
-o wrong -o right. Output program wrong. Output program right
It builds successfully with no diagnostics:
$ echo "[$(g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp 2>&1)]"
[]
It outputs no program wrong:
$ ./wrong
bash: ./wrong: No such file or directory
It does output a program right:
$ ./right
C++11
-1713064076
which tells us it was compiled to C++11, not C++98.
The bug exposed by the garbage -1713064076 was not diagnosed because
-Wno-format, not -Wformat, took effect.
It is a 64-bit, not 32-bit executable:
$ file right
right: ELF 64-bit LSB shared object, x86-64 ...
It was optimized -O1, not -O0, because:
$ "[$(nm -C right | grep str)]"
[]
shows that the local symbol str is not in the symbol table.
And it contains no debugging information:
echo "[$(readelf --debug-dump right)]"
[]
as per -g0, not -g3.
Since GCC is open-source software, another way of resolving doubts
about its behaviour that is available to C programmers, at least,
is to inspect the relevant source code, available via git source-control at
https://github.com/gcc-mirror/gcc.
The relevant source code for your question is in file gcc/gcc/c-family/c-opts.c,
function,
/* Handle switch SCODE with argument ARG. VALUE is true, unless no-
form of an -f or -W option was given. Returns false if the switch was
invalid, true if valid. Use HANDLERS in recursive handle_option calls. */
bool
c_common_handle_option (size_t scode, const char *arg, int value,
int kind, location_t loc,
const struct cl_option_handlers *handlers);
It is essentially a simple switch ladder over option settings enumerated by scode - which
is OPT_std_c__11 for option -std=c++11 - and leaves no doubt that it
puts an -std option setting into effect regardless of what setting was in effect previously. You can look at branches other than master
(gcc-{5|6|7}-branch) with the same conclusion.
It's not uncommon to find GCC build system scripts that rely on the validity of
overriding an option setting by appending a new setting. Legalistically, this
is usually counting on undocumented behaviour, but there's a better
chance of Russia joining NATO than of GCC ceasing to take the last setting that
it parses for a mutex option.

How to (cross-)compile to both ARM hard- and soft-float (softfp) with a single GCC (cross-)compiler?

I'd like to use a single (cross-)compiler to compile code for different ARM calling conventions: since I always want to use floating point and NEON instructions, I just want to select the hard-float calling convention or the soft-float (softfp) calling convention.
My compiler defaults to hard-float, but it supports both architectures that I need:
$ arm-linux-gnueabihf-gcc -print-multi-lib
.;
arm-linux-gnueabi;#marm#march=armv4t#mfloat-abi=soft
$
When I compile with the default parameters:
$ arm-linux-gnueabihf-g++ -Wall -o hello_world_armhf hello_world.cpp
It succeeds without any errors.
If I compile with the parameters returned by -print-multi-lib:
$ arm-linux-gnueabihf-g++ -marm -march=armv4t -mfloat-abi=soft -Wall -o hello_world hello_world.cpp
It again compiles without error (By the way, how can I test that the resultant code is hard- or soft-float?)
Unfortunately, if I try this:
$ arm-linux-gnueabihf-g++ -march=armv7-a -mthumb-interwork -mfloat-abi=softfp -mfpu=neon -Wall -o hello_world hello_world.cpp
[...]/gcc/bin/../lib/gcc/arm-linux-gnueabihf/4.7.3/../../../../arm-linux-gnueabihf/bin/ld: error: hello_world uses VFP register arguments, /tmp/ccwvfDJo.o does not
[...]/gcc/bin/../lib/gcc/arm-linux-gnueabihf/4.7.3/../../../../arm-linux-gnueabihf/bin/ld: failed to merge target specific data of file /tmp/ccwvfDJo.o
collect2: error: ld returned 1 exit status
$
I've tested some other permutations of the parameters, but it seems that anything other than the combination shown by -print-multi-lib results in an error.
I've read ARM compilation error, VFP registered used by executable, not object file but the problem there was that some parts of the binary were soft- and some were hard-float. I have a single C++ file to compile...
What parameter(s) I miss to be able to compile with -march=armv7-a -mthumb-interwork -mfloat-abi=softfp -mfpu=neon?
How is it possible that the error is about VFP register arguments while I explicitly have -mfloat-abi=softfp in the command line which prohibits VFP register arguments?
Thanks!
For the records, hello_world.cpp contains the following:
#include <iostream>
int main()
{
std::cout << "Hello, world!" << std::endl;
return 0;
}
You need another compiler with corresponding multilib support.
You can check multilib support with next command.
arm-none-eabi-gcc -print-multi-lib
.;
thumb;#mthumb
fpu;#mfloat-abi=hard
armv6-m;#mthumb#march=armv6s-m
armv7-m;#mthumb#march=armv7-m
armv7e-m;#mthumb#march=armv7e-m
armv7-ar/thumb;#mthumb#march=armv7
cortex-m7;#mthumb#mcpu=cortex-m7
armv7e-m/softfp;#mthumb#march=armv7e-m#mfloat-abi=softfp#mfpu=fpv4-sp-d16
armv7e-m/fpu;#mthumb#march=armv7e-m#mfloat-abi=hard#mfpu=fpv4-sp-d16
armv7-ar/thumb/softfp;#mthumb#march=armv7#mfloat-abi=softfp#mfpu=vfpv3-d16
armv7-ar/thumb/fpu;#mthumb#march=armv7#mfloat-abi=hard#mfpu=vfpv3-d16
cortex-m7/softfp/fpv5-sp-d16;#mthumb#mcpu=cortex-m7#mfloat-abi=softfp#mfpu=fpv5-sp-d16
cortex-m7/softfp/fpv5-d16;#mthumb#mcpu=cortex-m7#mfloat-abi=softfp#mfpu=fpv5-d16
cortex-m7/fpu/fpv5-sp-d16;#mthumb#mcpu=cortex-m7#mfloat-abi=hard#mfpu=fpv5-sp-d16
cortex-m7/fpu/fpv5-d16;#mthumb#mcpu=cortex-m7#mfloat-abi=hard#mfpu=fpv5-d16
https://stackoverflow.com/questions/37418986/how-to-interpret-the-output-of-gcc-print-multi-lib
How to interpret the output of gcc -print-multi-lib
With this configuration gcc -mfloat-abi=hard not only will build your files using FPU instructions but also link them with corresponding libs, avoiding "X uses VFP register arguments, Y does not" error.
The above-mentioned -print-multi-lib output produced by gcc with this patch and --with-multilib-list=armv6-m,armv7,armv7-m,armv7e-m,armv7-r,armv7-a,cortex-m7 configuration option.
If you are interested in building your own gcc with Cortex-A series multilib support, just use --with-multilib-list=aprofile configuration option for any arm*-*-* target without any patches (at list with gcc-6.2.0).
As per Linaro FAQ if your compiler prints arm-linux-gnueabi;#marm#march=armv4t#mfloat-abi=soft then you can only use -march=armv4t. If you want to use -march=armv7-a you need to build compiler yourself.
Following link could be helpful in building yourself GCC ARM Builds

gcc/ld: undefined reference to unused function

I'm using gcc 4.3.4 and ld 2.20.51 in Cygwin under Windows 7. Here's a simplified version of my problem:
foo.o contains function foo_bar() which calls bar() in bar.o
bar.o contains function bar()
main.c calls functions in foo.o, but foo_bar() is not in the call chain
If I try to compile main.c and link it to foo.o, I get an undefined reference to _foo_bar error from ld. As you can see from my Makefile except below, I've tried using flags for putting each function in its own section and having the linker discard unused sections.
COMPILE_CYGWIN = gcc -iquote$(INCDIR)
COMPILE = $(COMPILE_CYGWIN) -g -MMD -MP -Wall -ffunction-sections -Wl,-gc-sections $(DEFINE)
main_OBJECTS = main.o foo.o
main.exe : $(main_OBJECTS)
$(COMPILE) -o main.exe $(main_OBJECTS)
The function foo_bar() is a short function that provides a connection between two networking layers in a protocol stack. Some programs don't need it, so they won't link in the other object files related to the upper layer of the stack. It's a small function, and seems inappropriate to put it into its own .o file.
I don't understand why ld throws the error -- nothing is calling foo_bar(), so there's no need to include bar() in the final executable. A coworker has just told me that ld is not a "smart linker", so maybe what I'm trying to do isn't possible?
Unless the linker is from Cyberdyne Systems it has no way to know exactly which functions will actually be called. It only knows which ones are referenced. Even Skynet's linker can't predict what run-time decisions will be made or what will happen if you load a module dynamically at run-time and it starts calling various global functions1.
So, if you link in module m and it references function f, you will need to link with whatever module has f.
1. This problem is related to the Halting Problem and has been proven undecidable.
I hit the similar issue and I find this page:
http://lists.gnu.org/archive/html/bug-gnu-utils/2004-09/msg00098.html
Highligt:
The GNU linker still works at .o file granularity.
Gcc pulls in foo.o and then find bar() was undefined.
You'd better put foo_bar() into another .o file.

Resources