Valgrind is reporting leaked blocks, apparently one per thread, in the following code:
#include <iostream>
#include <thread>
#include <mutex>
#include <list>
#include <chrono>
std::mutex cout_mutex;
struct Foo
{
    Foo()
    {
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n';
    }
    ~Foo()
    {
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n';
    }
    void
    hello_world()
    {
        std::lock_guard<std::mutex> lock( cout_mutex );
        std::cout << __PRETTY_FUNCTION__ << '\n';
    }
};
void
hello_world_thread()
{
    thread_local Foo foo;
    // must access, or the thread local variable may not be instantiated
    foo.hello_world();
    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}
int main()
{
    for ( int i = 0; i < 100; ++i )
    {
        std::list<std::thread> threads;
        for ( int j = 0; j < 10; ++j )
        {
            std::thread thread( hello_world_thread );
            threads.push_back( std::move( thread ) );
        }
        while ( ! threads.empty() )
        {
            threads.front().join();
            threads.pop_front();
        }
    }
}
Compiler version:
$ g++ --version
g++ (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
GCC build options:
--enable-shared
--enable-threads=posix
--enable-__cxa_atexit
--enable-clocale=gnu
--enable-cxx-flags='-fno-omit-frame-pointer -g3'
--enable-languages=c,c++
--enable-libstdcxx-time=rt
--enable-checking=release
--enable-build-with-cxx
--disable-werror
--disable-multilib
--disable-bootstrap
--with-system-zlib
Program compilation options:
g++ -std=gnu++11 -Og -g3 -Wall -Wextra -fno-omit-frame-pointer thread_local.cc
valgrind version:
$ valgrind --version
valgrind-3.8.1
Valgrind options:
valgrind --leak-check=full --verbose ./a.out > /dev/null
Tail-end of valgrind output:
==1786== HEAP SUMMARY:
==1786== in use at exit: 24,000 bytes in 1,000 blocks
==1786== total heap usage: 3,604 allocs, 2,604 frees, 287,616 bytes allocated
==1786==
==1786== Searching for pointers to 1,000 not-freed blocks
==1786== Checked 215,720 bytes
==1786==
==1786== 24,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
==1786== at 0x4C29969: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:329)
==1786== by 0x4E8E53E: __cxa_thread_atexit (atexit_thread.cc:119)
==1786== by 0x401036: hello_world_thread() (thread_local.cc:34)
==1786== by 0x401416: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (functional:1732)
==1786== by 0x4EE4830: execute_native_thread_routine (thread.cc:84)
==1786== by 0x5A10E99: start_thread (pthread_create.c:308)
==1786== by 0x573DCCC: clone (clone.S:112)
==1786==
==1786== LEAK SUMMARY:
==1786== definitely lost: 24,000 bytes in 1,000 blocks
==1786== indirectly lost: 0 bytes in 0 blocks
==1786== possibly lost: 0 bytes in 0 blocks
==1786== still reachable: 0 bytes in 0 blocks
==1786== suppressed: 0 bytes in 0 blocks
==1786==
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
--1786--
--1786-- used_suppression: 2 dl-hack3-cond-1
==1786==
==1786== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
Constructors and destructors were run once for each thread:
$ ./a.out | grep 'Foo::Foo' | wc -l
1000
$ ./a.out | grep hello_world | wc -l
1000
$ ./a.out | grep 'Foo::~Foo' | wc -l
1000
Notes:
If you change the number of threads created, the number of leaked blocks matches the number of threads.
The code is structured in a way that would permit reuse of the resource in question (i.e. the leaked block) across threads, if GCC were implemented to do so.
From the valgrind stacktrace, thread_local.cc:34 is the line: thread_local Foo foo;
Due to the sleep_for() call, a program run takes about 10 seconds or so.
Any idea if this memory leak is in GCC, a result of my config options, or is some bug in my program?
It seems that the leak comes from the dynamic initialization.
Here is an example with an int:
thread_local int num = 4; // static initialization
This example does not leak: I tried it with 2 threads and there was no leak at all.
But now:
int func()
{
    return 4;
}
thread_local int num2 = func(); // dynamic initialization
This one leaks! With 2 threads it gives: total heap usage: 8 allocs, 6 frees, 428 bytes allocated...
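For reference, a self-contained version of that experiment might look like this (my sketch; the thread count and the valgrind invocation are assumptions on my part):
// Build: g++ -std=gnu++11 -g tls_dynamic_init.cc -pthread
// Run:   valgrind --leak-check=full ./a.out
// Per the observation above, switching the initializer back to a plain
// constant (static initialization) should make the extra blocks go away.
#include <thread>

int func() { return 4; }

thread_local int num2 = func();  // dynamic initialization

void touch()
{
    // must access, or the thread-local variable may not be instantiated
    volatile int v = num2;
    (void)v;
}

int main()
{
    std::thread t1(touch);
    std::thread t2(touch);
    t1.join();
    t2.join();
}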
I would suggest using a workaround like:
thread_local Foo *foo = new Foo; // dynamic initialization
Don't forget, at the end of the thread's execution, to do:
delete foo;
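Applied to the original program, that workaround might look like this (a sketch; it assumes hello_world_thread() is entered exactly once per thread):
void
hello_world_thread()
{
    // the pointer itself is trivially destructible, so no destructor has to
    // be registered to run for it at thread exit
    thread_local Foo* foo = new Foo;
    foo->hello_world();
    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
    delete foo;   // if the thread never reaches this line, the Foo itself leaks
}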
But this workaround has one problem: what if the thread exits with an error before your delete? Leak again...
It seems that there is no great solution. Maybe we should report this to the g++ developers?
Try removing thread_local and using the following code:
void
hello_world_thread()
{
    Foo foo;
    // must access, or the thread local variable may not be instantiated
    foo.hello_world();
    // keep the thread around momentarily
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}
foo within hello_world_thread lives on the stack of each thread, so every thread maintains its own copy of foo; there is no need to explicitly mark it as thread_local. thread_local should be used when you have something like a static or namespace-level variable but want each thread to maintain its own copy of it.
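As a small illustration of that distinction (my example, not from the original post): a namespace-scope thread_local keeps one instance per thread across calls, whereas a plain local is recreated on every call:
#include <iostream>
#include <mutex>
#include <thread>

std::mutex io_mutex;
thread_local int calls = 0;   // one counter per thread, shared across calls

void work()
{
    ++calls;        // persists between calls made by the same thread
    int local = 0;  // recreated (and discarded) on every call
    ++local;
    std::lock_guard<std::mutex> lock(io_mutex);
    std::cout << "thread " << std::this_thread::get_id()
              << " calls=" << calls << " local=" << local << '\n';
}

int main()
{
    std::thread t([]{ work(); work(); });  // prints calls=1, then calls=2
    work();                                // main thread's counter is independent
    t.join();
}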
Here is a test program I wrote:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main( int argc, const char* argv[] )
{
    const char name[1024] = "/dev/shm/test_file";
    off_t len = atol(argv[argc - 1]);
    char buf[1024];
    FILE * f = fopen(name, "w");
    for (int i = 0; i < len; i++) {
        int ret = fwrite(buf, 1024, 1, f);
        if (ret != 1) {
            printf("disk full\n");
        }
    }
    if ( fclose(f) != 0)
        printf("failed to close\n");
    return 0;
}
I filled /dev/shm until it was almost full:
tmpfs 36G 36G 92K 100% /dev/shm
and ran
$ ./a.out 93
failed to close
My glibc:
$ /lib/libc.so.6
GNU C Library stable release version 2.12, by Roland McGrath et al.
The kernel version is 2.6.32-642.13.1.el6.x86_64.
I understand that this behavior is caused by fwrite trying to cache the data in memory (I tried setvbuf(NULL...) and fwrite immediately returned failure). But this seems a little different from the definition:
The fwrite() function shall return the number of elements successfully
written, which may be less than nitems if a write error is
encountered. If size or nitems is 0, fwrite() shall return 0 and the
state of the stream remains unchanged. Otherwise, if a write error
occurs, the error indicator for the stream shall be set, [CX] [Option
Start] and errno shall be set to indicate the error. [Option End]
The data was not successfully written to disk, yet the return value is 1 and no errno is set.
In this test case fclose catches the failure, but it could just as well be caught by something like ftell, which is quite confusing.
I am wondering if this happens with all versions of glibc, and whether this would be considered a bug.
The data was not successfully written to disk
The standard doesn't talk about the disk. It talks about data being successfully written to the stream (which it has been).
I am wondering if this happens to all versions of glibc
Most likely.
and would this be consider a bug.
It's a bug in your interpretation of the requirements on fwrite.
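If you want to learn about the failure closer to the fwrite() call instead of at fclose(), two common options are to make the stream unbuffered with setvbuf(), or to fflush() after writing and check ferror()/errno. A minimal sketch (mine, not part of the original answer):
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[1024] = { 0 };
    FILE *f = fopen("/dev/shm/test_file", "w");
    if (!f)
        return 1;
    /* Option 1: disable buffering so fwrite() goes to the kernel directly
       and can report ENOSPC itself. */
    setvbuf(f, NULL, _IONBF, 0);
    if (fwrite(buf, sizeof buf, 1, f) != 1)
        printf("fwrite failed: %s\n", strerror(errno));
    /* Option 2 (with a buffered stream): flush explicitly and check the
       stream's error indicator instead of relying only on fclose(). */
    if (fflush(f) != 0 || ferror(f))
        printf("flush failed: %s\n", strerror(errno));
    if (fclose(f) != 0)
        printf("failed to close\n");
    return 0;
}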
I am learning KLEE now and I wrote a simple program:
#include "klee/klee.h"
#include <stdio.h>
#include <stdlib.h>
int test(int *p)
{
    int *q = (int *) malloc(sizeof(int));
    if ((*p) == (*q)) {
        printf("reading uninitialized heap memory");
    }
    return 0;
}
int main()
{
    int *p = (int *) malloc(sizeof(int));
    test(p);
    return 0;
}
First I generate LLVM bitcode, and then I run KLEE on the bitcode.
Following is the full output:
KLEE: output directory is "/Users/yjy/WorkSpace/Test/klee-out-13"
Using STP solver backend
KLEE: WARNING: undefined reference to function: printf
KLEE: WARNING ONCE: calling external: printf(140351601907424)
reading uninitialized heap memory
KLEE: done: total instructions = 61
KLEE: done: completed paths = 4
KLEE: done: generated tests = 4
I suppose that KLEE should give me an error that the q pointer is not initialized, but it doesn't. Why doesn't KLEE give me an error or warning about this? Can KLEE not detect this error? Thanks in advance!
TL;DR: KLEE has not implemented this feature. Clang can check this directly.
KLEE currently supports add/sub/mul/div overflow checking. To use this feature, you have to compile the source code with clang -fsanitize=signed-integer-overflow or clang -fsanitize=unsigned-integer-overflow.
The idea is that a function call (e.g. __ubsan_handle_add_overflow) is inserted into the bitcode when you use the clang sanitizer. KLEE then handles the overflow checking once it reaches that call.
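For completeness, that workflow might look roughly like this (a sketch on my part; the file name, include path, and exact flags are assumptions):
/* overflow.c -- build and run, e.g.:
 *   clang -I <path-to-klee-include> -emit-llvm -c -g \
 *         -fsanitize=signed-integer-overflow overflow.c -o overflow.bc
 *   klee overflow.bc
 */
#include "klee/klee.h"

int main(void)
{
    int a, b;
    klee_make_symbolic(&a, sizeof a, "a");
    klee_make_symbolic(&b, sizeof b, "b");
    /* the sanitizer instruments this addition; KLEE then reports the
       path on which it overflows */
    return a + b;
}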
Clang supports MemorySanitizer, AddressSanitizer, and UndefinedBehaviorSanitizer. They are defined in the projects/compiler-rt/lib directory. MemorySanitizer is the one you are looking for: it is a detector of uninitialized reads.
You can remove the KLEE function call and check with clang directly.
➜ ~ clang -g -fsanitize=memory st.cpp
➜ ~ ./a.out
==16031==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x490954 (/home/hailin/a.out+0x490954)
#1 0x7f21b72f382f (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#2 0x41a1d8 (/home/hailin/a.out+0x41a1d8)
SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/hailin/a.out+0x490954)
Exiting
I have an application that runs on an ARM Cortex-M based MCU and is written in C and C++. I use gcc and g++ to compile it and would like to completely disable any heap usage.
In the MCU startup file the heap size is already set to 0. In addition to that, I would also like to disallow any accidental heap use in the code.
In other words, I would like the linker (and/or the compiler) to give me an error when the malloc, calloc, free functions or the new, new[], delete, delete[] operators are used.
So far I've tried -nostdlib which gives me issues like undefined reference to _start. I also tried -nodefaultlibs but that one still does not complain when I try to call malloc. What is the right way to do this?
Notes:
This app runs on “bare metal”, there is no operating system.
I would also like to avoid any malloc usage in 3rd-party code (vendor-specific libraries, the standard library, printf etc.).
I'm fully okay with not using the parts of the C / C++ standard libraries that would require dynamic memory allocations.
I'd prefer a compile-time rather than a run-time solution.
I'm not sure it's the best way to go, however you can use the --wrap flag of ld (which can be passed through gcc using -Wl).
The idea is that --wrap allows you to ask ld to redirect the "real" symbol to your custom one; for example, if you do --wrap=malloc, then ld will look for your __wrap_malloc function to be called instead of the original malloc.
Now, if you do --wrap=malloc without defining __wrap_malloc, you will get away with it as long as nobody uses malloc, but if anyone references it you'll get a linking error.
$ cat test-nomalloc.c
#include <stdlib.h>
int main() {
#ifdef USE_MALLOC
    malloc(10);
#endif
    return 0;
}
$ gcc test-nomalloc.c -Wl,--wrap=malloc
$ gcc test-nomalloc.c -DUSE_MALLOC -Wl,--wrap=malloc
/tmp/ccIEUu9v.o: In function `main':
test-nomalloc.c:(.text+0xa): undefined reference to `__wrap_malloc'
collect2: error: ld returned 1 exit status
For new you can use the mangled names _Znwm (operator new(unsigned long)) and _Znam (operator new[](unsigned long)), which is what every new expression should come down to in the end.
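For example, the full link line might look something like the following (untested sketch; note that the mangled names depend on the target's size_t, so on a 32-bit ARM target operator new mangles to _Znwj/_Znaj rather than _Znwm/_Znam):
g++ main.cpp -Wl,--wrap=malloc -Wl,--wrap=calloc -Wl,--wrap=realloc -Wl,--wrap=free -Wl,--wrap=_Znwm -Wl,--wrap=_Znam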
(posted as an answer because it won't fit in a comment)
If the OS you're running supports the use of LD_PRELOAD, this code should detect attempts to use the heap:
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* remove the LD_PRELOAD from the environment so it
   doesn't kill any child process the app may spawn */
static void lib_init(void) __attribute__((constructor));
static void lib_init( void )
{
    unsetenv( "LD_PRELOAD" );
}
void *malloc( size_t bytes )
{
    kill( getpid(), SIGSEGV );
    return( NULL );
}
void *calloc( size_t n, size_t bytes )
{
    kill( getpid(), SIGSEGV );
    return( NULL );
}
void *realloc( void *ptr, size_t bytes )
{
    kill( getpid(), SIGSEGV );
    return( NULL );
}
void *valloc( size_t bytes )
{
    kill( getpid(), SIGSEGV );
    return( NULL );
}
void *memalign( size_t alignment, size_t bytes )
{
    kill( getpid(), SIGSEGV );
    return( NULL );
}
int posix_memalign( void **ptr, size_t alignment, size_t bytes )
{
    *ptr = NULL;
    kill( getpid(), SIGSEGV );
    return( -1 );
}
Assuming new is implemented using malloc() and delete is implemented using free(), that will catch all heap usage and give you a core file with a stack trace, assuming core files are enabled.
Compile the file:
gcc [-m32|-m64] -shared heapdetect.c -o heapdetect.so
Run your app:
LD_PRELOAD=/path/to/heapdetect.so /your/app/here args ...
I've boiled this down to a simple self-contained example. The main thread enqueues 1000 items, and a worker thread tries to dequeue concurrently. ThreadSanitizer complains that there's a race between the read and the write of one of the elements, even though there is an acquire-release memory barrier sequence protecting them.
#include <atomic>
#include <thread>
#include <cassert>
struct FakeQueue
{
    int items[1000];
    std::atomic<int> m_enqueueIndex;
    int m_dequeueIndex;
    FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
    void enqueue(int x)
    {
        auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
        items[tail] = x; // <- element written
        m_enqueueIndex.store(tail + 1, std::memory_order_release);
    }
    bool try_dequeue(int& x)
    {
        auto tail = m_enqueueIndex.load(std::memory_order_acquire);
        assert(tail >= m_dequeueIndex);
        if (tail == m_dequeueIndex)
            return false;
        x = items[m_dequeueIndex]; // <- element read -- tsan says race!
        ++m_dequeueIndex;
        return true;
    }
};
FakeQueue q;
int main()
{
    std::thread th([&]() {
        int x;
        for (int i = 0; i != 1000; ++i)
            q.try_dequeue(x);
    });
    for (int i = 0; i != 1000; ++i)
        q.enqueue(i);
    th.join();
}
ThreadSanitizer output:
==================
WARNING: ThreadSanitizer: data race (pid=17220)
Read of size 4 at 0x0000006051c0 by thread T1:
#0 FakeQueue::try_dequeue(int&) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 (issue49+0x000000402bcd)
#1 main::{lambda()#1}::operator()() const <null> (issue49+0x000000401132)
#2 _M_invoke<> /usr/include/c++/5.3.1/functional:1531 (issue49+0x0000004025e3)
#3 operator() /usr/include/c++/5.3.1/functional:1520 (issue49+0x0000004024ed)
#4 _M_run /usr/include/c++/5.3.1/thread:115 (issue49+0x00000040244d)
#5 <null> <null> (libstdc++.so.6+0x0000000b8f2f)
Previous write of size 4 at 0x0000006051c0 by main thread:
#0 FakeQueue::enqueue(int) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:16 (issue49+0x000000402a90)
#1 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:44 (issue49+0x000000401187)
Location is global 'q' of size 4008 at 0x0000006051c0 (issue49+0x0000006051c0)
Thread T1 (tid=17222, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027a67)
#1 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) <null> (libstdc++.so.6+0x0000000b9072)
#2 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:41 (issue49+0x000000401168)
SUMMARY: ThreadSanitizer: data race /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 FakeQueue::try_dequeue(int&)
==================
ThreadSanitizer: reported 1 warnings
Command line:
g++ -std=c++11 -O0 -g -fsanitize=thread issue49.cpp -o issue49 -pthread
g++ version: 5.3.1
Can anybody shed some light onto why tsan thinks this is a data race?
UPDATE
It seems like this is a false positive. To appease ThreadSanitizer, I've added annotations (see here for the supported ones and here for an example). Note that detecting whether tsan is enabled in GCC via a macro has only recently been added, so I had to manually pass -D__SANITIZE_THREAD__ to g++ for now.
#if defined(__SANITIZE_THREAD__)
#define TSAN_ENABLED
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define TSAN_ENABLED
#endif
#endif
#ifdef TSAN_ENABLED
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr) \
AnnotateHappensBefore(__FILE__, __LINE__, (void*)(addr))
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr) \
AnnotateHappensAfter(__FILE__, __LINE__, (void*)(addr))
extern "C" void AnnotateHappensBefore(const char* f, int l, void* addr);
extern "C" void AnnotateHappensAfter(const char* f, int l, void* addr);
#else
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr)
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr)
#endif
struct FakeQueue
{
    int items[1000];
    std::atomic<int> m_enqueueIndex;
    int m_dequeueIndex;
    FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
    void enqueue(int x)
    {
        auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
        items[tail] = x;
        TSAN_ANNOTATE_HAPPENS_BEFORE(&items[tail]);
        m_enqueueIndex.store(tail + 1, std::memory_order_release);
    }
    bool try_dequeue(int& x)
    {
        auto tail = m_enqueueIndex.load(std::memory_order_acquire);
        assert(tail >= m_dequeueIndex);
        if (tail == m_dequeueIndex)
            return false;
        TSAN_ANNOTATE_HAPPENS_AFTER(&items[m_dequeueIndex]);
        x = items[m_dequeueIndex];
        ++m_dequeueIndex;
        return true;
    }
};
// main() is as before
Now ThreadSanitizer is happy at runtime.
This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78158. Disassembling the binary produced by GCC shows that it doesn't instrument the atomic operations at -O0.
As a workaround, you can either build your code with GCC with -O1/-O2, or get yourself a fresh Clang build and use it to run ThreadSanitizer (this is the recommended way, as TSan is being developed as part of Clang and only backported to GCC).
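For example (my commands; adjust paths and flags as needed):
# GCC workaround: enable optimization so the atomic operations get instrumented
g++ -std=c++11 -O1 -g -fsanitize=thread issue49.cpp -o issue49 -pthread
# or use a recent Clang, where -O0 also works
clang++ -std=c++11 -O0 -g -fsanitize=thread issue49.cpp -o issue49 -pthread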
The comments above are invalid: TSan can easily comprehend the happens-before relation between the atomics in your code (one can check that by running the above reproducer under TSan in Clang).
I also wouldn't recommend using the AnnotateHappensBefore()/AnnotateHappensAfter() for two reasons:
you shouldn't need them in most cases; they denote that the code is doing something really complex (in which case you may want to double-check you're doing it right);
if you make an error in your lock-free code, spraying it with annotations may mask that error, so that TSan won't notice it.
ThreadSanitizer is not good at counting; it cannot understand that the writes to items always happen before the reads.
ThreadSanitizer can see that the stores to m_enqueueIndex happen before the loads, but it does not understand that the store to items[m_dequeueIndex] must happen before the load when tail > m_dequeueIndex.
I've just tried out GCC 4.8's new exciting feature AddressSanitizer.
The program
#include <iostream>
int main(int argc, const char * argv[], const char * envp[]) {
    int *x = nullptr;
    int y = *x;
    std::cout << y << std::endl;
    return 0;
}
compiles fine using
g++-4.8 -std=gnu++0x -g -fsanitize=address -fno-omit-frame-pointer -Wall ~/h.cpp -o h
but when I run the program I get
ASAN:SIGSEGV
=================================================================
==7531== ERROR: AddressSanitizer crashed on unknown address 0x000000000000 (pc 0x000000400aac sp 0x7fff11ce0fd0 bp 0x7fff11ce1000 T0)
AddressSanitizer can not provide additional info.
#0 0x400aab (/home/per/h+0x400aab)
#1 0x7fc432e1b76c (/lib/x86_64-linux-gnu/libc-2.15.so+0x2176c)
Stats: 0M malloced (0M for red zones) by 0 calls
Stats: 0M realloced by 0 calls
Stats: 0M freed by 0 calls
Stats: 0M really freed by 0 calls
Stats: 0M (0 full pages) mmaped in 0 calls
mmaps by size class:
mallocs by size class:
frees by size class:
rfrees by size class:
Stats: malloc large: 0 small slow: 0
This seems like an incorrect way to report a memory error. Have I missed some compilation or link flags?
This is the intended way to report a NULL dereference.
You can run the program output through asan_symbolize.py (should be present in your GCC tree) to get symbol names and line numbers in the source file.
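Usage is typically something along these lines (a sketch; it assumes the script is on your PATH and pipes the output through c++filt for demangling):
./h 2>&1 | asan_symbolize.py | c++filt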
I cannot find any asan_symbolize.py in GCC 4.8 or 4.9.
I added a workaround at https://code.google.com/p/address-sanitizer/issues/detail?id=223