Thread sanitizer complaining of unexpected memory map - parallel-processing

I am trying to test the usage of -fsanitize=thread for gcc, and its complaining of unexpected memory mapping, maybe there might have been some change in the kernel, and thats the reason for it. Is there any thing I could do to make it work ?
This is what I am doing ...
mfrw#kp ...fpp/asgn/as2 %
mfrw#kp ...fpp/asgn/as2 % cat tiny.cpp
#include <pthread.h>
int global;
void *thread(void *x) {
global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, thread, NULL);
global = 43;
pthread_join(t, NULL);
return global;
}
mfrw#kp ...fpp/asgn/as2 % g++ tiny.cpp -fsanitize=thread -pie -fPIC -g -O1 -o tinyrace -pthread
mfrw#kp ...fpp/asgn/as2 % uname -a
Linux kp 4.4.33-1-MANJARO #1 SMP PREEMPT Fri Nov 18 18:06:44 UTC 2016 x86_64 GNU/Linux
mfrw#kp ...fpp/asgn/as2 % gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
mfrw#kp ...fpp/asgn/as2 % ./tinyrace
FATAL: ThreadSanitizer: unexpected memory mapping 0x55e38776b000-0x55e38776c000
mfrw#kp ...fpp/asgn/as2 %

It is to do with your compilation option: -pie -fPIC.
If I compiled your code (in Ubuntu 16.04, latest update) with:
g++ -fsanitize=thread -pie -fPIC tinyrace.c -g -O1 -o tinyrace -pthread
I will get the same error.
But if changed to:
g++ -fsanitize=thread tinyrace.c -g -O1 -o tinyrace -pthread
Then the race condition alert is printed:
./tinyrace
==================
WARNING: ThreadSanitizer: data race (pid=12032)
Write of size 4 at 0x00000060108c by thread T1:
#0 thread(void*) /home/tteikhua/tinyrace.c:5 (tinyrace+0x000000400a5d)
#1 <null> <null> (libtsan.so.0+0x0000000230d9)
Previous write of size 4 at 0x00000060108c by main thread:
#0 main /home/tteikhua/tinyrace.c:11 (tinyrace+0x000000400ab1)
Location is global 'global' of size 4 at 0x00000060108c (tinyrace+0x00000060108c)
Thread T1 (tid=12034, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027577)
#1 main /home/tteikhua/tinyrace.c:10 (tinyrace+0x000000400aa7)
SUMMARY: ThreadSanitizer: data race /home/tteikhua/tinyrace.c:5 thread(void*)

Yes, it's due to changes in the kernel and it's not GCC-specific, clang exposes the same behaviour.
There is a corresponding bug in GCC tracker, which references fix in the upstream. Comments mention kernels 4.1+, but I hit this problem on 3.16.
As mentioned in the answer by Peter Teoh, it might work if you omit pie/pic options, but the proper fix is in newer thread sanitizer used by newer compilers (after September 2016, but it's not clear whether GCC 6.x branch got the fix).

Related

CPU2017 benchmark 510.parest_r build failed with gcc9.3 and gcc9.4

everyone.
I'm try to build CPU2017 intrate and fprate test set on aarch64 server with gcc9.3. All the benchmark build successed, except 510.parest_r. Then i try build it with gcc9.4, meet the same error. I used the Example-gcc-linux-aarch64.cfg as configure file, just edit the gcc path.
Here is the failed info:
/home/gcc9.3/bin/g++ -std=c++03 -mabi=lp64 -c -o source/me-tomography/synthetic_data.o -DSPEC -DNDEBUG -Iinclude -I. -DSPEC_AUTO_SUPPRESS_OPENMP -O3 -DSPEC_LP64 source/me-tomography/synthetic_data.cc
/home/gcc9.3/bin/g++ -std=c++03 -mabi=lp64 -c -o source/multigrid/mg_base.o -DSPEC -DNDEBUG -Iinclude -I. -DSPEC_AUTO_SUPPRESS_OPENMP -O3 -DSPEC_LP64 source/multigrid/mg_base.cc
/home/gcc9.3/bin/g++ -std=c++03 -mabi=lp64 -c -o source/me-tomography/measurements.o -DSPEC -DNDEBUG -Iinclude -I. -DSPEC_AUTO_SUPPRESS_OPENMP -O3 -DSPEC_LP64 source/me-tomography/measurements.cc
init2.c:52: MPFR assertion failed: p >= 2 && p <= ((mpfr_prec_t)((mpfr_uprec_t)(~(mpfr_uprec_t)0)>>1))
during GIMPLE pass: forwprop
source/me-tomography/measurements.cc: In constructor 'METomography::Measurements::ReferencedMeasurements::RatioMinusRatio<dim, number>::RatioMinusRatio(const libparest::Slave::Stationary::ProblemDescription&, const dealii::Function<dim>&, const std::set<unsigned char>&) [with int dim = 3; number = double]':
source/me-tomography/measurements.cc:1739:7: internal compiler error: Aborted
1739 | RatioMinusRatio<dim,number>::
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
0xafbd97 crash_signal
../.././gcc/toplev.c:326
0xffff9e304d78 __GI_raise
../sysdeps/unix/sysv/linux/raise.c:51
0xffff9e2f1aab __GI_abort
/build/glibc-RIFKjK/glibc-2.31/stdlib/abort.c:79
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
specmake: *** [/home/spec/cpu2017_aarch64/benchspec/Makefile.defaults:356: source/me-tomography/measurements.o] Error 1
specmake: *** Waiting for unfinished jobs....
The failed info seem be caused by the MPRF float percision setting?
I tried build 510.parest_r with llvm-10, build success.
By the way, i build same gcc9.3 in x86_64 server, build 510.parest_r success.
You've found a bug in an older version of GCC (or possibly your system's RAM is failing, but unlikely if it consistently crashes at the same place). Or perhaps a bug in MPFR, although that seems less likely.
If you preprocess that source (add -E or -save-temps to the command line that crashed) and put it on https://godbolt.org/, does it still crash the same way with current ARM64 GCC, e.g. a nightly build of trunk? (https://godbolt.org/z/K6GnrYrj1 is ARM64 GCC trunk, with your command line args without the preprocessor stuff, which won't matter when compiling CPP output.)
If it still crashes with current GCC, then file a bug report on https://gcc.gnu.org/bugzilla/, ideally with a MCVE of the part of the source that triggers the bug. (Remove as many parts of the file as you can while preserving the crash behaviour. e.g. take out tons of stuff, undo if that makes it compile.)
If it doesn't crash with newer GCC, it might already be a known bug, or got fixed by accident, or a different MPFR or other library version mattered. In that case, maybe not worth reporting it upstream. Or if you do make sure to include the fact that the range of affected versions doesn't include GCC12 or current trunk. Probably this Stack Overflow Q&A is sufficient for future users to know that it's a known bug.

ld unrecognized emulation mode /tmp/ccK2pwtc.o on ubuntu

I'm using the following machine:
Linux version 5.4.0-42-generic (buildd#lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
The program
int main() {
return 0;
}
when compiled
gcc -Wl,-m main.c
returns
/usr/bin/ld: unrecognised emulation mode: /tmp/ccJI1LRo.o
Supported emulations: elf_x86_64 elf32_x86_64 elf_i386 elf_iamcu elf_l1om elf_k1om i386pep i386pe
collect2: error: ld returned 1 exit status
I am not sure where to start as to knowing why.
Also, gcc and ld versions:
> gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> ld --version
GNU ld (GNU Binutils for Ubuntu) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
The -Wl,-m argument specifies that gcc should pass the -m argument to the linker.
When the linker is called, the -m option is used, and it's followed by a intermediate object file that's been compiled (/tmp/ccJI1LRo.o), as opposed to a supported emulation, which leads to the error message you've encountered.
The linker's -m option requires an argument for specifying an emulation. The output in your question lists supported emulations.

clang vs gcc behavioral difference with AddressSanitizer

I have a sample code with memory leaks. Though clang displays the leak correctly, I am unable to achieve the same with gcc. gcc version I am using is 4.8.5-39
Code:
#include <stdlib.h>
void *p;
int main() {
p = malloc(7);
p = 0; // The memory is leaked here.
return 0;
}
CLANG:
clang -fsanitize=address -g memory-leak.c ; ASAN_OPTIONS=detect_leaks=1 ./a.out
=================================================================
==15543==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x465289 in __interceptor_malloc (/u/optest/a.out+0x465289)
#1 0x47b549 in main /u/optest/memory-leak.c:4
#2 0x7f773fe14544 in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:266
SUMMARY: AddressSanitizer: 7 byte(s) leaked in 1 allocation(s).
GCC:
gcc -fsanitize=address -g memory-leak.c ; ASAN_OPTIONS=detect_leaks=1 ./a.out
I need to use gcc. Can someone help me understand why gcc isn't behaving like clang and what should I do to make it work.
Can someone help me understand why gcc isn't behaving like clang
Gcc-4.8 was released on March 22, 2013.
The patch to support -fsanitize=leak was sent on Nov. 15, 2013, and is likely not backported into the 4.8 release.
GCC-8.3 has no trouble detecting this leak:
$ gcc -g -fsanitize=address memory-leak.c && ./a.out
=================================================================
==166614==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x7fcaf3dc5330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
#1 0x55d297afc162 in main /tmp/memory-leak.c:4
#2 0x7fcaf394052a in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: 7 byte(s) leaked in 1 allocation(s).
This item in Asan's FAQ is relevant:
Q: Why didn't ASan report an obviously invalid
memory access in my code?
A1: If your errors is too obvious, compiler might have
already optimized it out by the time Asan runs.

Should gcc builtins always be resolved during the compilation step, or the linker step?

In running gcc 3.4.3 on a Solaris 5.11 box I see that builtin functions are left undefined during compilation, but are resolved against libgcc.a when the Solaris linker links against libgcc.a.
On gcc 4.5.2 on another Solaris box, the same builtin functions are resolved during compilation of the .o, and the linker isn't concerned about them.
A boiled down example file compiled on gcc 3.4.3:
# cat ctzll.c
int func(unsigned long long x)
{
return __builtin_ctzll(x);
}
# cat main.c
int main(void)
{
return func(0xc0ffee);
}
First compile ctzll.c, and check the symbols:
# nm ctzll.o
U __ctzdi2
0000000000000000 T func
Now compile main.c, and link the objects:
# gcc -m64 main.c -c
# gcc -m64 ctzll.o main.o -v
Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/specs
Configured with: /build/i386/components/gcc3/gcc-3.4.3/configure --prefix=/usr/sfw --mandir=/usr/sfw/share/man --infodir=/usr/sfw/share/info --without-gnu-ld --with-ld=/usr/bin/ld --enable-languages=c,c++,f77,objc --enable-shared --with-gnu-as --with-as=/usr/gnu/bin/as
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-20050802)
/usr/sfw/libexec/gcc/i386-pc-solaris2.11/3.4.3/collect2 -V -Y P,/lib/64:/usr/lib/64:/usr/sfw/lib/64 -R /lib/64:/usr/lib/64:/usr/sfw/lib/64 -Qy /usr/lib/amd64/crt1.o /usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/amd64/crtbegin.o -L/usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/amd64 -L/usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/../../../amd64 -L/lib/amd64 -L/usr/lib/amd64 ctzll.o main.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/amd64/crtend.o /usr/lib/amd64/crtn.o
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.2276
# nm a.out | grep ctz
0000000000400fd0 T __ctzdi2
So, if I understand correctly, the Solaris linker resolved __ctzdi2 (the internal representation of __builtin_ctzll).
Now, compile with gcc 4.5.2 on the other Solaris machine:
#gcc -m64 ctzll.c -c
# nm ctzll.o
0000000000000000 T func
The symbol has been resolved fine in the object file, and it has been inlined into the .o assembly as:
8: 48 0f bc 45 f8 bsf -0x8(%rbp),%rax
Is the 3.4.3 compiler behaving correctly? I'd have expected the actual compilation to handle the builtins, like 4.5.2, by referencing the 64 bit version of libgcc.a. The lack of uniformity across the compilers is causing upstream problems in my project as the builtin remains undefined and the linker isn't resolving the symbols as I'm not linking against the OS specific 64 bit libraries (libgcc.a). I'm not sure if the 3.4.3 compiler is misconfigured causing the .o files to have undefined builtin functions caught by linking, or if the newer compilers are just smarter and I need to add the 64 bit libraries to the linker to handler the older compiler.
The 3.4.3 seems to show a valid libgcc.a which contains the definition of _ctzdi2:
# gcc -m64 -print-libgcc-file-name
/usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/amd64/libgcc.a
# nm /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/amd64/libgcc.a | grep ctzdi2
0000000000000000 T __ctzdi2
_ctzdi2.o:
The whole point of libgcc.a is to implement builtin operations that gcc can't emit code for (which includes not only __builtin_ functions, but also things like 'long long' math operations on some 32-bit platforms).
Obviously the newer gcc got smarter about emitting code.

Link error when compiling gcc atomic operation in 32-bit mode

I have the following program:
~/test> cat test.cc
int main()
{
int i = 3;
int j = __sync_add_and_fetch(&i, 1);
return 0;
}
I'm compiling this program using GCC 4.2.2 on Linux running on a multi-cpu 64-bit Intel machine:
~/test> uname --all
Linux doom 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
When I compile the program in 64-bit mode, it compiles and links fine:
~/test> /share/tools/gcc-4.2.2/bin/g++ test.cc
~/test>
When I compile it in 32-bit mode, I get the following error:
~/test> /share/tools/gcc-4.2.2/bin/g++ -m32 test.cc
/tmp/ccEVHGkB.o(.text+0x27): In function `main':
: undefined reference to `__sync_add_and_fetch_4'
collect2: ld returned 1 exit status
~/test>
Although I will never actually run on a 32-bit processor, I do need a 32-bit executable so I can link with some 32-bit libraries.
My 2 questions are:
Why do I get a link error when I compile in 32-bit mode?
Is there some way to get the program to compile and link, while still being able to link with a 32-bit library?
The answer from Dan Udey was close, close enough in fact to allow me to find the real solution.
According to the man page "-mcpu" is a deprecated synonym for "-mtune" and just means "optimize for a particular CPU (but still run on older CPUs, albeit less optimal)". I tried this, and it did not solve the issue.
However, "-march=" means "generate code for a particular CPU (and don't run on older CPUs)". When I tried this it solved the problem: specifying a CPU of i486 or better got rid of the link error.
~/test> /share/tools/gcc-4.2.2/bin/g++ -m32 test.cc
/tmp/ccYnYLj6.o(.text+0x27): In function `main':
: undefined reference to `__sync_add_and_fetch_4'
collect2: ld returned 1 exit status
~/test> /share/tools/gcc-4.2.2/bin/g++ -m32 -march=i386 test.cc
/tmp/ccOr3ww8.o(.text+0x22): In function `main':
: undefined reference to `__sync_add_and_fetch_4'
collect2: ld returned 1 exit status
~/test> /share/tools/gcc-4.2.2/bin/g++ -m32 -march=i486 test.cc
~/test> /share/tools/gcc-4.2.2/bin/g++ -m32 -march=pentium test.cc
From the GCC page on Atomic Builtins:
Not all operations are supported by
all target processors. If a particular
operation cannot be implemented on the
target processor, a warning will be
generated and a call an external
function will be generated. The
external function will carry the same
name as the builtin, with an
additional suffix `_n' where n is the
size of the data type.
Judging from your compiler output, which refers to __sync_add_and_fetch_4, this is what's happening. For some reason, GCC is not generating the external function properly.
This is likely why you're only getting an error in 32-bit mode - when compiling for 64-bit mode, it compiles for your processor more closely. When compiling for 32-bit, it may well be using a generic arch (i386, for example) which does not natively support those features. Try specifying a specific architecture for your chip family (Xeon, Core 2, etc.) via -mcpu and see if that works.
If not, you'll have to figure out why GCC isn't including the appropriate function that it should be generating.

Resources