Segmentation fault on using boost syslog and cpp-netlib - boost

After I added a Boost.Log syslog sink to my source code, a segmentation fault started appearing inside the cpp-netlib library.
I was able to prepare a minimal code snippet that reproduces the problem:
#include <boost/network/protocol/http/client.hpp>
#include <boost/log/utility/setup/file.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/syslog_backend.hpp>
#include <boost/make_shared.hpp>
#include <iostream>
#include <string>
using namespace boost::network;
using namespace boost::network::http;
namespace sinks = boost::log::sinks;
int main()
{
    // Plain HTTP GET through cpp-netlib; the crash happens during this request.
    client::request request_("http://www.boost.org/");
    client client_;
    client::response response_ = client_.get(request_);
    std::string body_ = body(response_);
    std::cout << "body: " << body_;
    // Syslog sink setup; execution never gets this far when the crash occurs.
    using syslog_sinkT = sinks::synchronous_sink <sinks::syslog_backend>;
    boost::shared_ptr <sinks::syslog_backend> backend = boost::make_shared <sinks::syslog_backend> ();
    boost::shared_ptr<syslog_sinkT> sink = boost::make_shared <syslog_sinkT> (backend);
}
When the last 2 lines are commented out, the segmentation fault disappears and everything works fine.
The gdb stack trace (approximate; it may vary):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf7c29b40 (LWP 19874)]
0x00000000 in ?? ()
(gdb) where
#0 0x00000000 in ?? ()
#1 0x083c376c in boost::asio::detail::task_io_service_operation::complete (bytes_transferred=260, ec=..., owner=...,
this=0xf6900710)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/task_io_service_o$
eration.hpp:38
#2 boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:372
#3 boost::asio::detail::task_io_service::run (ec=..., this=0x84ea280)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:149
#4 boost::asio::io_service::run (this=0x84e9a94)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/impl/io_service.ipp:59
#5 0x083b5766 in boost::_mfi::mf0<unsigned int, boost::asio::io_service>::operator() (p=<optimized out>, this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/mem_fn_template.hpp:49
#6 boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> >::operator()<unsigned int, boost::_mfi::mf0<unsigned int, boost::a$
io::io_service>, boost::_bi::list0> (a=<synthetic pointer>, f=..., this=0x84ea94c)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:249
#7 boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boo$
t::asio::io_service*> > >::operator() (this=0x84ea944)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:1222
#8 boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::$
ist1<boost::_bi::value<boost::asio::io_service*> > > >::run (this=0x84ea828)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/thread/detail/thread.hpp:116
#9 0x0840a4f8 in boost::(anonymous namespace)::thread_proxy (param=0x84ea828) at libs/thread/src/pthread/thread.cpp:167
#10 0xf7de1f70 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#11 0xf7d18bee in clone () from /lib/i386-linux-gnu/libc.so.6
The problem occurs on Ubuntu 14.04 with cpp-netlib 0.11.2 and with both Boost versions 1_58_0 and 1_60_0. Boost, cpp-netlib and my application are all compiled with -std=c++11.
Note 1. The segmentation fault occurs inside cpp-netlib before execution reaches the syslog_backend creation. The mere presence of the last 2 lines is enough to reproduce the SIGSEGV.
Note 2. It reproduces only with syslog_backend. Other logging targets (file, console) work fine.
My best guess is that the problem lies inside Boost, during the initialisation of static variables, but I have no proof of this.
Any suggestions?

It seems I used too many compile options when building Boost and cpp-netlib.
I rebuilt both Boost and cpp-netlib, this time with as few additional options as possible.
And it works fine.
EDIT: I found the option that causes the error: BOOST_ASIO_ENABLE_HANDLER_TRACKING. It was defined when Boost was compiled, but it was not defined when cpp-netlib and my application were compiled.
https://svn.boost.org/trac/boost/ticket/11945
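The mismatch described above is an instance of a more general hazard: a macro that changes a class layout has to be set identically for a library and for everything that links against it. The sketch below uses two hypothetical structs to show how the same function-pointer member can land at different offsets depending on such a macro. They are stand-ins, not the real Boost.Asio types, and the tracking_id field is only an assumption about what handler tracking might add. Code compiled with one view that reads an object created with the other view would fetch the pointer from the wrong place, which would be consistent with the jump to 0x00000000 in the trace.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout as seen by code compiled WITH the tracking macro.
struct operation_with_tracking {
    std::uint64_t tracking_id;  // assumed extra field added by the macro
    void* next;
    void (*func)();
};

// Hypothetical layout as seen by code compiled WITHOUT the macro.
struct operation_without_tracking {
    void* next;
    void (*func)();
};

// Offset at which each side expects to find the completion function pointer.
constexpr std::size_t func_offset_with_tracking =
    offsetof(operation_with_tracking, func);
constexpr std::size_t func_offset_without_tracking =
    offsetof(operation_without_tracking, func);

// The two views disagree, so mixing them in one program is undefined behaviour.
static_assert(func_offset_with_tracking != func_offset_without_tracking,
              "layouts differ when the macro is set inconsistently");
```

In practice the usual remedy is to pass the same preprocessor definitions (here, BOOST_ASIO_ENABLE_HANDLER_TRACKING or its absence) to every component that includes the affected headers.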

Related

Why calling dlopen sometimes breaks my application by damaging class variables content?

I am trying to load a library with dlopen().
But the call to dlopen() sometimes (not always) corrupts my class variables, and then the app crashes with a segmentation fault.
The code below is not the exact code (it is pseudocode), but it shows what happens:
#include <cstdio>
#include <dlfcn.h>
class MyClass {
public:
    int MyVar;
    void Print() { printf("Simply breakpoint\n"); }
    // The returned handle is ignored here, as in the original pseudocode.
    void LoadLibrary() { dlopen("/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100", RTLD_LAZY); }
    MyClass() {
        MyVar = 12345;
        printf("MyVar address %p\n", (void*)&MyVar);
        Print();
        LoadLibrary();
    }
};
int main()
{
    MyClass obj;
}
I debug it with gdb as follows:
>gdb MyApp
>break Print
>run
When it stops at the Print breakpoint, I see the printed address of MyVar:
MyVar address 0x7fff900bc2bc
Also I can check its content.
Then I do
>watch *0x7fff900bc2bc
Hardware watchpoint 2: *0x7fff900bc2bc
>cont
When execution continues, it stops on an unexpected write to MyVar:
Thread 1 "MyApp" hit Hardware watchpoint 2: *0x7fff900bc2bc
Old value = 12345
New value = 32767
memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
356 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) backtrace
#0 memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
#1 0x00007ffff7fde759 in _dl_map_object_deps (map=map@entry=0x7fff90145110, preloads=preloads@entry=0x0,
npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648)
at dl-deps.c:446
#2 0x00007ffff7fe4db0 in dl_open_worker (a=a@entry=0x7fffa6fd80f0) at dl-open.c:571
#3 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>,
args=<optimized out>) at dl-error-skeleton.c:208
#4 0x00007ffff7fe460a in _dl_open (file=0x42d8ee0 "/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100",
mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fffffffea88, env=0x54037d0)
at dl-open.c:837
#5 0x00007ffff57bc34c in dlopen_doit (a=a@entry=0x7fffa6fd8310) at dlopen.c:66
#6 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=exception@entry=0x7fffa6fd82b0,
operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:208
#7 0x00007ffff53dd9f3 in __GI__dl_catch_error (objname=0x7fff900d8770, errstring=0x7fff900d8778,
mallocedp=0x7fff900d8768, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#8 0x00007ffff57bcb59 in _dlerror_run (operate=operate@entry=0x7ffff57bc2f0 <dlopen_doit>,
args=args@entry=0x7fffa6fd8310) at dlerror.c:170
#9 0x00007ffff57bc3da in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x000000000209ec5b in MyClass::LoadLibrary() ()
.......
From the stack backtrace I see that MyVar is overwritten during the call to dlopen().
But why?
What am I doing wrong?
How can I resolve it?
Unfortunately, I cannot show all the source code because it is huge and involves many different components, many threads and many 3rd-party libraries.
I cannot simply link my app dynamically with libavcodec, because it is already statically linked into a 3rd-party library, and unfortunately that library is built without the features I need (no VAAPI support). Dynamic linking causes symbol conflicts.
That is why I decided to try loading libavcodec manually with dlopen() and getting all the required function pointers with dlsym().
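For reference, the manual loading approach described above can be sketched as follows, with the return values of dlopen() and dlsym() checked through dlerror(). The helper name load_symbol is hypothetical and not part of the original application. This only illustrates the loading pattern and does not by itself address the memory corruption. On older glibc versions the program may need to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: opens a shared library and looks up one symbol.
// Returns the symbol address, or nullptr after printing the loader's error.
void* load_symbol(const char* library_path, const char* symbol_name) {
    void* handle = dlopen(library_path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    dlerror();  // clear any stale error state before calling dlsym
    void* symbol = dlsym(handle, symbol_name);
    const char* error = dlerror();
    if (error != nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", error);
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately left open so the symbol stays valid.
    return symbol;
}
```

A caller would cast the returned pointer to the expected function type before using it, and would normally keep the handle around if the library ever needs to be unloaded.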
But why? What I am doing wrong? How to resolve?
You didn't say which version of GLIBC you are using (or which distribution).
The code in GLIBC-2.27 dl-deps.c reads:
struct link_map **l_initfini = (struct link_map **)
    malloc ((2 * nneeded + 1) * sizeof needed[0]);
if (l_initfini == NULL)
    _dl_signal_error (ENOMEM, map->l_name, NULL,
                      N_("cannot allocate dependency list"));
l_initfini[0] = l;
memcpy (&l_initfini[1], needed, nneeded * sizeof needed[0]);
memcpy (&l_initfini[nneeded + 1], l_initfini,
        nneeded * sizeof needed[0]); // line 446
You also didn't say whether MyClass is heap or stack allocated.
One way that the GLIBC code could write over your variable is if you have already corrupted the heap earlier. This is especially likely if MyClass is in fact heap-allocated (which it appears to be, given the 0x7fff900bc2bc address).
The fact that this "write over" happens only some of the time is also symptomatic of heap corruption.
As the very first step, I would run the program under Valgrind and make sure that no heap corruption (heap buffer overflow, free unallocated, double-free, etc.) is detected before LoadLibrary() runs.

core dump stack indicates SIGSEGV due to vector<vector<int>> usage

I have a code snippet that is behaving strangely. The code simply aims to implement radix sort and bucket sort. When I comment out either one of the sorts in main and run it, it works perfectly. But when I enable both of them, I get a core dump. The strange part is that, according to the stack, the core dump originates in stl_vector.h.
The code is here: https://rextester.com/RUUDP10453
When I enable only one of the sorts in main, as below, it works fine.
//doRadixSort(arr, size);
doBucketSort(arr, size);
or
doRadixSort(arr, size);
//doBucketSort(arr, size);
But when both are enabled, there is a segmentation fault after both sorts have completed, as indicated by the output of
cout << "i am here at exit" << endl;
The core dump stack hints at the vector-of-vectors buckets. But I have allocated it and reserved the required memory, so I need some expertise to dig out why this is happening. I spent about 2 hours debugging this in Eclipse CDT with no lead.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_free (av=0x7f66d702eb00 <main_arena>, p=0xf98020, have_lock=0) at malloc.c:3976
3976 >= ((char *) av->top + chunksize(av->top)), 0))
(gdb) where
#0 _int_free (av=0x7f66d702eb00 <main_arena>, p=0xf98020, have_lock=0) at malloc.c:3976
#1 0x00007f66d6cf33dc in __GI___libc_free (mem=<optimized out>) at malloc.c:2966
#2 0x00000000004030fa in __gnu_cxx::new_allocator<std::vector<int, std::allocator<int> > >::deallocate (this=0x7fffc6ffa060, __p=0xf98030) at /usr/include/c++/6.3.1/ext/new_allocator.h:110
#3 0x0000000000402d23 in std::allocator_traits<std::allocator<std::vector<int, std::allocator<int> > > >::deallocate (__a=..., __p=0xf98030, __n=10) at /usr/include/c++/6.3.1/bits/alloc_traits.h:442
#4 0x00000000004027ac in std::_Vector_base<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::_M_deallocate (this=0x7fffc6ffa060, __p=0xf98030, __n=10)
at /usr/include/c++/6.3.1/bits/stl_vector.h:178
#5 0x00000000004025e4 in std::_Vector_base<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::~_Vector_base (this=0x7fffc6ffa060, __in_chrg=<optimized out>)
at /usr/include/c++/6.3.1/bits/stl_vector.h:160
#6 0x000000000040211d in std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::~vector (this=0x7fffc6ffa060, __in_chrg=<optimized out>)
at /usr/include/c++/6.3.1/bits/stl_vector.h:427
#7 0x0000000000401d4b in doBucketSort (arr=0x7fffc6ffa100, size=@0x7fffc6ffa0f8: 12) at tako.cpp:97
#8 0x0000000000401e29 in main (argc=1, argv=0x7fffc6ffa218) at tako.cpp:141
(gdb)
Alternatively, I found that the code below also works; it is equivalent to using resize().
vector<vector<int>> buckets;
constexpr size_t size=10, bucketSize=10;
buckets.reserve(bucketSize);
// push_back creates the elements; reserve alone only sets the capacity
for(size_t i=0; i<bucketSize; ++i)
    buckets.push_back({ });
for(size_t i=0; i<bucketSize; ++i)
    buckets[i].reserve(size);
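The distinction that matters here is that reserve() only changes a vector's capacity, while resize() (or push_back) actually creates elements. Indexing a vector that has only been reserved is undefined behaviour and can corrupt the heap, which may only show up later when a destructor frees memory. A minimal sketch of the resize-based variant, assuming a hypothetical helper make_buckets that is not part of the original program:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds bucket_count empty buckets, each with room
// for per_bucket_capacity values before it needs to reallocate.
std::vector<std::vector<int>> make_buckets(std::size_t bucket_count,
                                           std::size_t per_bucket_capacity) {
    std::vector<std::vector<int>> buckets;
    // resize() constructs bucket_count empty inner vectors, so that
    // buckets[0] .. buckets[bucket_count - 1] are valid objects.
    buckets.resize(bucket_count);
    for (auto& bucket : buckets) {
        // reserve() only pre-allocates storage; each bucket stays empty.
        bucket.reserve(per_bucket_capacity);
    }
    return buckets;
}
```

With this helper, make_buckets(10, 10) yields ten empty buckets that can be filled with push_back; indices 0 to 9 are valid, and index 10 is not.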

conflict in symbols exposed by tcmalloc and glibc

I was recently debugging a crash in a product and identified the cause to be a conflict in the memory allocation symbols exposed by glibc and tcmalloc. I wrote the following sample code for exposing this issue:
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <assert.h>
#include <stdlib.h>
int main()
{
struct addrinfo hints = {0}, *res = NULL;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
int rc = getaddrinfo("myserver", NULL, &hints, &res);
assert(rc == 0);
return 0;
}
I compiled it using the following command:
g++ temp.cpp -g -lresolv
I executed the program using the following command:
LD_PRELOAD=/path/to/libtcmalloc_minimal.so.4 ./a.out
The program crashes with the following stack:
#0 0x00007ffff6c7c875 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff6c7de51 in *__GI_abort () at abort.c:92
#2 0x00007ffff6cbd8bf in __libc_message (do_abort=2, fmt=0x7ffff6d8c460 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3 0x00007ffff6cc30c8 in malloc_printerr (action=2, str=0x7ffff6d88fec "free(): invalid pointer", ptr=) at malloc.c:6282
#4 0x00007ffff6cc810c in *__GI___libc_free (mem=) at malloc.c:3733
#5 0x00007ffff6839e89 in _nss_dns_gethostbyname4_r (name=0x400814 "myserver", pat=0x7fffffffdfa8, buffer=0x7fffffffd9b0 "myserver.mydomain.com", buflen=1024, errnop=0x7fffffffdfbc, herrnop=0x7fffffffdf98, ttlp=0x0) at nss_dns/dns-host.c:341
#6 0x00007ffff6d11917 in gaih_inet (name=0x400814 "myserver", service=0x7fffffffdf88, req=0x7fffffffe1d0, pai=0x7fffffffe160, naddrs=0x7fffffffe168) at ../sysdeps/posix/getaddrinfo.c:880
#7 0x00007ffff6d14301 in *__GI_getaddrinfo (name=0x400814 "myserver", service=0x0, hints=0x7fffffffe1d0, pai=0x7fffffffe200) at ../sysdeps/posix/getaddrinfo.c:2452
#8 0x00000000004006f0 in main () at temp.cpp:12
The reason is that the free() called by _nss_dns_gethostbyname4_r() in libnss_dns.so comes from libc.so, while the corresponding malloc() was called from libresolv.so and came from libtcmalloc_minimal.so. The addresses of tcmalloc's malloc() and free() end up in the GOT of libresolv.so, which leads to this crash. The crash goes away if I don't link my program against libresolv.so.
Now for my question: is there any documentation that explains how to use tcmalloc safely and avoid crashes like this?
glibc has some documentation for interposing malloc:
Replacing malloc
Something else must be going on here, though. Typical builds of glibc and tcmalloc will get this right (even in fairly old versions of either package).
My best guess is you are using some SUSE glibc variant, which uses RTLD_DEEPBIND for NSS modules. This results in a known issue with malloc interposition. SUSE suggests setting the RTLD_DEEPBIND=0 environment variable as a workaround.

`gdb` unable to unwind a stack

Consider following (broken) code:
#include <iostream>
#include <memory>
using namespace std;
class Test {
public:
unique_ptr<string> s;
Test() : s(NULL) {
}
void update(string& st) {
s = unique_ptr<string>(&(st));
}
};
void update(Test& t) {
string s("Hello to you");
t.update(s);
}
int main() {
Test t;
update(t);
cout << *t.s << endl;
}
The error here is in Test::update(): it does not make its own copy of the object, so the unique_ptr ends up owning a stack variable. When the program is run under macOS, you get:
$ ./test
Hello t��E]�
test(44981,0x7fff99ba93c0) malloc: *** error for object 0x7fff5d45b690: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[1] 44981 abort ./test
I was able to debug this case successfully using lldb, even without setting a breakpoint in malloc_error_break, simply by running the application until it stopped on SIGABRT.
lldb ./test
(lldb) target create "./test"
Current executable set to './test' (x86_64).
(lldb) run
Process 44993 launched: './test' (x86_64)
Hello t��_�
test(44993,0x7fff99ba93c0) malloc: *** error for object 0x7fff5fbff680: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Process 44993 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff90d6cd42 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff90d6cd42 <+10>: jae 0x7fff90d6cd4c ; <+20>
0x7fff90d6cd44 <+12>: movq %rax, %rdi
0x7fff90d6cd47 <+15>: jmp 0x7fff90d65caf ; cerror_nocancel
0x7fff90d6cd4c <+20>: retq
Target 0: (test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff90d6cd42 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff90e5a457 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff90cd2420 libsystem_c.dylib`abort + 129
frame #3: 0x00007fff90dc1fe7 libsystem_malloc.dylib`free + 530
frame #4: 0x0000000100001f7b test`Test::~Test() [inlined] std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::operator(this=0x00007fff5fbff730, __ptr="\a\x94\x99�\x7f\0\0��_�\x7f\0\0\x80�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\0��_�\x7f\0\0\x15\x1e\0\0\x01\0\0\0\x80�_�\x7f\0\n0")(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) const at memory:2397
frame #5: 0x0000000100001f46 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::reset(this=0x00007fff5fbff730, __p="") at memory:2603
frame #6: 0x0000000100001ef3 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~unique_ptr(this=0x00007fff5fbff730) at memory:2571
frame #7: 0x0000000100001ef3 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~unique_ptr(this=0x00007fff5fbff730) at memory:2571
frame #8: 0x0000000100001ef3 test`Test::~Test(this=0x00007fff5fbff730) at main.cpp:6
frame #9: 0x0000000100001e15 test`Test::~Test(this=0x00007fff5fbff730) at main.cpp:6
frame #10: 0x0000000100001ab6 test`main at main.cpp:28
frame #11: 0x00007fff90c3e235 libdyld.dylib`start + 1
Now I see that the problem is in Test destructor, and from here it's a piece of cake.
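One possible correction, sketched under the assumption that Test is meant to own its own string (the question does not say so explicitly): update() allocates a heap copy, so the unique_ptr only ever deletes memory that came from new.

```cpp
#include <memory>
#include <string>

class Test {
public:
    std::unique_ptr<std::string> s;

    // Store a heap-allocated copy instead of the address of the argument,
    // so the destructor deletes memory that was obtained from new.
    void update(const std::string& st) {
        s.reset(new std::string(st));
    }
};

// Mirrors the free function from the question: the local string is
// destroyed on return, but the copy held by t.s stays valid.
void update(Test& t) {
    std::string local("Hello to you");
    t.update(local);
}
```

If the object should instead refer to a string owned elsewhere, a non-owning pointer or reference with a clearly longer-lived target would be the alternative; unique_ptr is only appropriate when ownership is transferred.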
Unfortunately, trying to debug this case using gdb under macOS was a total failure. Here is what I've done:
$ gdb ./test
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.7.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...Reading symbols from /Users/bazhenov/Developer/linear-counter/tests/test/test.dSYM/Contents/Resources/DWARF/test...done.
done.
(gdb) run
Starting program: /Users/bazhenov/Developer/linear-counter/tests/test/test
[New Thread 0x1403 of process 45204]
warning: unhandled dyld version (15)
Hello tQ�_�
test(45204,0x7fff99ba93c0) malloc: *** error for object 0x7fff5fbff650: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Thread 2 received signal SIGABRT, Aborted.
0x00007fff90d6cd42 in ?? ()
(gdb) bt
#0 0x00007fff90d6cd42 in ?? ()
#1 0x00007fff90e5a457 in ?? ()
#2 0x00007fff5fbff590 in ?? ()
#3 0x0000030700000000 in ?? ()
#4 0x00007fff5fbff590 in ?? ()
#5 0x00007fff5fbff650 in ?? ()
#6 0x00007fff5fbff5a0 in ?? ()
#7 0x00007fff90cd2420 in ?? ()
#8 0xffffffff00000018 in ?? ()
#9 0x00007fff5fbff5b0 in ?? ()
#10 0x00007fffffffffdf in ?? ()
#11 0x00000001000c4000 in ?? ()
#12 0x00007fff5fbff5f0 in ?? ()
#13 0x00007fff90dc1fe7 in ?? ()
#14 0x378b45e65b700074 in ?? ()
#15 0x00007fff99ba00ac in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb)
The question is: why does gdb fail to unwind the stack correctly, and what options do I have if I need to get a correct backtrace using gdb?
why gdb fails to unwind the stack correctly
There are some problems with gdb on macOS Sierra; see this post and the gdb bug report.
what options do I have if I need to get correct backtrace using gdb
You can try to downgrade macOS (I don't know whether that is possible) or apply the temporary patch from the bug report above.

helgrind does not detect recursive locking of std::mutex

I observed that helgrind does not detect a recursive lock on a non-recursive C++11 std::mutex. The problem is, however, detected when pthread_mutex_lock is used directly.
Two simple testcases to demonstrate the problem:
#include <mutex>
#include <pthread.h>
// Test code: C++11 std::mutex
// helgrind does not detect recursive locking
void test_cpp11()
{
    std::mutex m;
    m.lock();
    m.lock();  // second lock from the same thread
}
// pthread-based test code
// helgrind does detect recursive locking
void test_pth()
{
    pthread_mutex_t m;
    pthread_mutex_init(&m, 0);
    pthread_mutex_lock(&m);
    pthread_mutex_lock(&m);  // second lock from the same thread
}
gdb shows that the same pthread library functions are being called:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ffff78c2657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff78c2480 in __GI___pthread_mutex_lock (mutex=0x7fffffffe450) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004008ad in test_pth() ()
#1 0x00007ffff78c2657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff78c2480 in __GI___pthread_mutex_lock (mutex=0x7fffffffe450) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004007f7 in __gthread_mutex_lock(pthread_mutex_t*) ()
#4 0x00000000004008ec in std::mutex::lock() ()
#5 0x0000000000400857 in test_cpp11() ()
This was observed with g++ 4.7.3, 4.8.2 and 4.9.0 on Ubuntu 14.04 64-bit.
Does anyone have an idea what might be the reason and what might be done to get helgrind to detect the recursive locking?
Not an answer to the original question but I think it's worth mentioning that one should always check the program with both helgrind and drd. The drd tool successfully detects the problem in both scenarios.

Resources