helgrind does not detect recursive locking of std::mutex - c++11

I observed that helgrind won't detect a recursive lock on a non-recursive c++11 std::mutex. The problem is however detected when using pthread_mutex_lock.
Two simple testcases to demonstrate the problem:
// Test code: C++11 std::mutex
// helgrind does not detect recursive locking
void test_cpp11()
{
std::mutex m;
m.lock();
m.lock();
}
// pthread-based test code
// helgrind does detect recursive locking
void test_pth()
{
pthread_mutex_t m;
pthread_mutex_init(&m, 0);
pthread_mutex_lock(&m);
pthread_mutex_lock(&m);
}
gdb shows that the same pthread library functions are being called:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ffff78c2657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff78c2480 in __GI___pthread_mutex_lock (mutex=0x7fffffffe450) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004008ad in test_pth() ()
#1 0x00007ffff78c2657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff78c2480 in __GI___pthread_mutex_lock (mutex=0x7fffffffe450) at ../nptl/pthread_mutex_lock.c:79
#3 0x00000000004007f7 in __gthread_mutex_lock(pthread_mutex_t*) ()
#4 0x00000000004008ec in std::mutex::lock() ()
#5 0x0000000000400857 in test_cpp11() ()
This was observed with g++ 4.7.3, 4.8.2 and 4.9.0 on Ubuntu 14.04 64-bit.
Does anyone have an idea what might be the reason and what might be done to get helgrind to detect the recursive locking?

Not an answer to the original question but I think it's worth mentioning that one should always check the program with both helgrind and drd. The drd tool successfully detects the problem in both scenarios.

Related

Why calling dlopen sometimes breaks my application by damaging class variables content?

I am trying to load library with dlopen().
But call to this dlopen() function sometimes (not always) damages my class variables and then app goes to segmentation fault.
Below is not precise code (pseudocode), but explanation what happens:
class MyClass {
public:
int MyVar;
void Print() { printf("Simply breakpoint\n"); };
void LoadLibrary() { dlopen("/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100",RTLD_LAZY); };
MyClass() {
MyVar = 12345;
printf("MyVar address %p\n",&MyVar);
Print();
LoadLibrary();
};
}
void main()
{
MyClass obj;
}
I do debug it with gdb following way:
>gdb MyApp
>break Print
>run
when it stops at Print function breakpoint I see printed address of variable MyVar.
MyVar address 0x7fff900bc2bc
Also I can check its content.
Then I do
>watch *0x7fff900bc2bc
Hardware watchpoint 2: *0x7fff900bc2bc
>cont
When it continues it breaks on unexpected writing to my variable MyVar:
Thread 1 "MyApp" hit Hardware watchpoint 2: *0x7fff900bc2bc
Old value = 12345
New value = 32767
memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
356 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) backtrace
#0 memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
#1 0x00007ffff7fde759 in _dl_map_object_deps (map=map#entry=0x7fff90145110, preloads=preloads#entry=0x0,
npreloads=npreloads#entry=0, trace_mode=trace_mode#entry=0, open_mode=open_mode#entry=-2147483648)
at dl-deps.c:446
#2 0x00007ffff7fe4db0 in dl_open_worker (a=a#entry=0x7fffa6fd80f0) at dl-open.c:571
#3 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>,
args=<optimized out>) at dl-error-skeleton.c:208
#4 0x00007ffff7fe460a in _dl_open (file=0x42d8ee0 "/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100",
mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fffffffea88, env=0x54037d0)
at dl-open.c:837
#5 0x00007ffff57bc34c in dlopen_doit (a=a#entry=0x7fffa6fd8310) at dlopen.c:66
#6 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=exception#entry=0x7fffa6fd82b0,
operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:208
#7 0x00007ffff53dd9f3 in __GI__dl_catch_error (objname=0x7fff900d8770, errstring=0x7fff900d8778,
mallocedp=0x7fff900d8768, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#8 0x00007ffff57bcb59 in _dlerror_run (operate=operate#entry=0x7ffff57bc2f0 <dlopen_doit>,
args=args#entry=0x7fffa6fd8310) at dlerror.c:170
#9 0x00007ffff57bc3da in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x000000000209ec5b in MyClass::LoadLibrary() ()
.......
From stack backtrace I see that MyVar is damaged by call to dlopen()
But why?
What I am doing wrong?
How to resolve?
Unfortunately I cannot show all source code because it is huge and involves many different components, many threads, many 3rd party libraries.
I cannot simply dynamically link my app with libavcodec because it is already statically linked in 3rd party library but 3rd party library is built without required features unfortunately (without VAAPI support). Dynamic linking makes symbol conflicts.
That is why I was decided try to load libavcodec manually by dlopen() and get all required function pointers from dlsym().
But why? What I am doing wrong? How to resolve?
You didn't say which version of GLIBC you are using (or which distribution).
The code in GLIBC-2.27 dl-deps.c reads:
struct link_map **l_initfini = (struct link_map **)
malloc ((2 * nneeded + 1) * sizeof needed[0]);
if (l_initfini == NULL)
_dl_signal_error (ENOMEM, map->l_name, NULL,
N_("cannot allocate dependency list"));
l_initfini[0] = l;
memcpy (&l_initfini[1], needed, nneeded * sizeof needed[0]);
memcpy (&l_initfini[nneeded + 1], l_initfini,
nneeded * sizeof needed[0]); // line 446
You also didn't say whether MyClass is heap or stack allocated.
One way that the GLIBC code could write over your variable is when you have already corrupted heap earlier. This is especially likely if MyClass is in fact heap-allocated (which it appears to be given the 0x7fff900bc2bc address).
The fact that this "write over" happens only some of the time is also symptomatic of heap corruption.
As the very first step, I would run the program under Valgrind and make sure that no heap corruption (heap buffer overflow, free unallocated, double-free, etc.) is detected before LoadLibrary() runs.

conflict in symbols exposed by tcmalloc and glibc

I was recently debugging a crash in a product and identified the cause to be a conflict in the memory allocation symbols exposed by glibc and tcmalloc. I wrote the following sample code for exposing this issue:
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <assert.h>
#include <stdlib.h>
int main()
{
struct addrinfo hints = {0}, *res = NULL;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
int rc = getaddrinfo("myserver", NULL, &hints, &res);
assert(rc == 0);
return 0;
}
I compiled it using the following command:
g++ temp.cpp -g -lresolv
I executed the program using the following command:
LD_PRELOAD=/path/to/libtcmalloc_minimal.so.4 ./a.out
The program crashes with the following stack:
#0 0x00007ffff6c7c875 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff6c7de51 in *__GI_abort () at abort.c:92
#2 0x00007ffff6cbd8bf in __libc_message (do_abort=2, fmt=0x7ffff6d8c460 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3 0x00007ffff6cc30c8 in malloc_printerr (action=2, str=0x7ffff6d88fec "free(): invalid pointer", ptr=) at malloc.c:6282
#4 0x00007ffff6cc810c in *__GI___libc_free (mem=) at malloc.c:3733
#5 0x00007ffff6839e89 in _nss_dns_gethostbyname4_r (name=0x400814 "myserver", pat=0x7fffffffdfa8, buffer=0x7fffffffd9b0 "myserver.mydomain.com", buflen=1024, errnop=0x7fffffffdfbc, herrnop=0x7fffffffdf98, ttlp=0x0) at nss_dns/dns-host.c:341
#6 0x00007ffff6d11917 in gaih_inet (name=0x400814 "myserver", service=0x7fffffffdf88, req=0x7fffffffe1d0, pai=0x7fffffffe160, naddrs=0x7fffffffe168) at ../sysdeps/posix/getaddrinfo.c:880
#7 0x00007ffff6d14301 in *__GI_getaddrinfo (name=0x400814 "myserver", service=0x0, hints=0x7fffffffe1d0, pai=0x7fffffffe200) at ../sysdeps/posix/getaddrinfo.c:2452
#8 0x00000000004006f0 in main () at temp.cpp:12
The reason for this is that the free() function called by _nss_dns_gethostbyname4_r() from libnss_dns.so is from libc.so while the corresponding malloc() was called from libresolv.so from libtcmalloc_minimal.so. The addresses of tcmalloc's malloc() and free() functions are getting into the GOT of libresolv.so leading to this crash. The crash goes away if I don't link my program to libresolv.so.
Now for my question. Is there any documentation which explains how to safely use tcmalloc to avoid crashes like this ?
glibc has some documentation for interposing malloc:
Replacing malloc
Something else must be going here, though. Typical builds of glibc and glibc will get this right (even in fairly old versions of either package).
My best guess is you are using some SUSE glibc variant, which uses RTLD_DEEPBIND for NSS modules. This results in a known issue with malloc interposition. SUSE suggests setting the RTLD_DEEPBIND=0 environment variable as a workaround.

Does MSP430 GCC support newer C++ standards? (like 11, 14, 17)

I'm writing some code that would greatly benefit from the concise syntax of lambdas, which were introduced with C++ 11. Is this supported by the compiler?
How do I specify the compiler flags when compiling using Energia or embedXcode?
As of February 2018, up to C++14 is supported with some limitations:
http://processors.wiki.ti.com/index.php/C%2B%2B_Support_in_TI_Compilers
There isn't much about this topic on the TI site, or, at least, I don't know enough C++ to give you a detailed and precise response.
The implementation of the embedded ABI is described in this document that is mainly a derivation of the Itanium C++ ABI. It explains nothing about the implementation of lambdas nor the auto, keyword (or probably I'm not able to derive this information from the documentation).
Thus I decided to directly test in Energia. Apparently the g++ version is 4.6.3, thus it should support both.
And in fact (from a compilation point of view, I don't have my MSP here to test the code) it can compile something like:
// In template.hpp
#ifndef TEMPLATE_HPP_
#define TEMPLATE_HPP_
template<class T>
T func(T a) {
auto c = [&](int n) { return n + a; };
return c(0);
}
#endif /* TEMPLATE_HPP_ */
// in the sketch main
#include "template.hpp"
void setup() { int b = func<int>(0); }
void loop() { }
(the template works only if in an header, in the main sketch raises an error). To compile this sketch I had to modify one internal file of the editor. The maximum supported standard seems to be -std=c++0x, and the compilation flags are in the file:
$ENERGIA_ROOT/hardware/energia/msp430/platform.txt
in my setup the root is in /opt/energia. Inside that file I modified line 32 (compiler.cpp.flags) and added the option. Notice that -std=c++11 is not supported (raises an error).
compiler.cpp.flags=-std=c++0x -c -g -O2 {compiler.mlarge_flag} {compiler.warning_flags} -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD
Unfortunately I have zero experience with embedXcode :\
Mimic std::function
std::function is not provided, thus you have to write some sort of class that mimics it. Something like:
// callback.hpp
#ifndef CALLBACK_HPP_
#define CALLBACK_HPP_
template <class RET, class ARG>
class Callback {
RET (*_f)(ARG);
public:
Callback() : _f(0) { };
Callback(RET (*f)(ARG)) : _f(f) { };
bool is_set() const { return (_f) ? true : false; }
RET operator()(ARG a) const { return is_set() ? _f(a) : 0; }
};
#endif /* CALLBACK_HPP_ */
// sketch
#include "callback.hpp"
// | !! empty capture!
void setup() { // V
auto clb = Callback<int, char>([](char c) { return (int)c; });
if (clb.is_set())
auto b = clb('a');
}
void loop() {}
may do the work, and it uses a simple trick:
The closure type for a lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function having the same parameter and return types as the closure type’s function call operator. [C++11 standard 5.1.2]
As soon as you leave the capture empty, you are assured to have a "conversion" to a function pointer, thus you can store it without issues. The code I have written:
requires a first template RET that is the returned type
requires a second template ARG that is one argument for the callback. In the majority of the case you may consider to use void* as common argument (cast a struct pointer in a void pointer and use it as argument, to counter-cast in the function, the operation costs nothing)
implements two constructors: the empty constructor initialize the function pointer to NULL, while the second directly assigns the callback. Notice that the copy constructor is missing, you need to implement it.
implements a method to call the function (overloading the operator ()) and to check if the callback actually exists.
Again: this stuff compiles with no warnings, but I don't know if it works on the MSP430, since I cannot test it (it works on a common amd64 linux system).

Segmentation fault on using boost syslog and cpp-netlib

After adding boost syslog into source code, segmentation fault appears inside cpp-netlib library.
I was able to prepare minimum working code snippet to reproduce the problem.
#include <boost/network/protocol/http/client.hpp>
#include <boost/log/utility/setup/file.hpp>
#include <boost/log/sinks/syslog_backend.hpp>
#include <iostream>
using namespace boost::network;
using namespace boost::network::http;
namespace sinks = boost::log::sinks;
int main()
{
client::request request_("http://www.boost.org/");
client client_;
client::response response_ = client_.get(request_);
std::string body_ = body(response_);
std::cout << "body: " << body_;
using syslog_sinkT = sinks::synchronous_sink <sinks::syslog_backend>;
boost::shared_ptr <sinks::syslog_backend> backend = boost::make_shared <sinks::syslog_backend> ();
boost::shared_ptr<syslog_sinkT> sink = boost::make_shared <syslog_sinkT> (backend);
}
When last 2 lines are commented, segmentation fault disappears and everything works fine.
gdb stack trace (approximately, may vary):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf7c29b40 (LWP 19874)]
0x00000000 in ?? ()
(gdb) where
#0 0x00000000 in ?? ()
#1 0x083c376c in boost::asio::detail::task_io_service_operation::complete (bytes_transferred=260, ec=..., owner=...,
this=0xf6900710)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/task_io_service_o$
eration.hpp:38
#2 boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:372
#3 boost::asio::detail::task_io_service::run (ec=..., this=0x84ea280)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:149
#4 boost::asio::io_service::run (this=0x84e9a94)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/impl/io_service.ipp:59
#5 0x083b5766 in boost::_mfi::mf0<unsigned int, boost::asio::io_service>::operator() (p=<optimized out>, this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/mem_fn_template.hpp:49
#6 boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> >::operator()<unsigned int, boost::_mfi::mf0<unsigned int, boost::a$
io::io_service>, boost::_bi::list0> (a=<synthetic pointer>, f=..., this=0x84ea94c)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:249
#7 boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boo$
t::asio::io_service*> > >::operator() (this=0x84ea944)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:1222
#8 boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::$
ist1<boost::_bi::value<boost::asio::io_service*> > > >::run (this=0x84ea828)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/thread/detail/thread.hpp:116
#9 0x0840a4f8 in boost::(anonymous namespace)::thread_proxy (param=0x84ea828) at libs/thread/src/pthread/thread.cpp:167
#10 0xf7de1f70 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#11 0xf7d18bee in clone () from /lib/i386-linux-gnu/libc.so.6
Problem exists on Ubuntu 14.04 with cpp-netlib 0.11.2 and both boost versions 1_58_0 and 1_60_0. Boost, cpp-netlib and my application are compiled with -std=c++11.
Note 1. Segmentation fault appears inside cpp-netlib before reaching syslog_backend creation. Only presence of last 2 lines guarantees SIGSEGV reproduction.
Note 2. Reproduces only with syslog_backend. Any other logging targets (file, consol) work fine.
The best idea I have is the problem may lay inside boost during static variables initialisation, but I have no proves regarding this version.
Any suggestions?
Seems like I used too many compile options for building both boost and cpp-netlib.
I prepared new build for both boost and cpp-netlib once again, but this time I used as less additional options as possible.
And it works fine.
EDIT: I found the key which causes the error. It's BOOST_ASIO_ENABLE_HANDLER_TRACKING It was defined during the boost compilation, but wasn't defined during compilation of cpp-netlib and my application.
https://svn.boost.org/trac/boost/ticket/11945

Pthreads in Mac OS X - Mutexes issue

I'm trying to learn how to program parallel algorithms in C using POSIX threads. My environment is a Mac OS X 10.5.5 with gcc 4.
Compiling:
gcc -Wall -D_REENTRANT -lpthread source.c -o test.o
So, my problem is, if I compile this in a Ubuntu 9.04 box, it runs smoothly in thread order, on Mac looks like mutexes doesn't work and the threads don't wait to get the shared information.
Mac:
#1
#0
#2
#5
#3
#4
ubuntu
#0
#1
#2
#3
#4
#5
Any ideas?
Follow below the source code:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREADS 6
pthread_mutex_t mutexsum;
pthread_t threads[NUM_THREADS];
long Sum;
void *SumThreads(void *threadid){
int tmp;
int i,x[10],y[10];
// Para cada x e y do vetor, jogamos o valor de i, só para meio didáticos
for (i=0; i<10 ; i++){
x[i] = i;
y[i] = i;
}
tmp = Sum;
for (i=0; i<10 ; i++){
tmp += (x[i] * y[i]);
}
pthread_mutex_lock (&mutexsum);
Sum += tmp;
printf("Im thread #%ld sum until now is: %ld\n",threadid,Sum);
pthread_mutex_unlock (&mutexsum);
return 0;
}
int main(int argc, char *argv[]){
int i;
Sum = 0;
pthread_mutex_init(&mutexsum, NULL);
for(i=0; i<NUM_THREADS; i++){
pthread_create(&threads[i], NULL, SumThreads, (void *)i);
}
pthread_exit(NULL);
}
There is nothing on your code that will make your threads running in ANY order. If in Ubuntu is running on some order, it might be because you are just lucky. Try running 1000 times in Ubuntu and see if you get the same results over and over again.
The thing is, that you can't control the way the scheduler will make your threads access the processor(s). So, when you iterate through the for loop is creating your threads, you can't assume that the first call to pthread_create will get to run first, or will get to lock the mutex you are creating first. It's up to the scheduler which it at the OS level, and you can't control it, unless you write your own kernel :-).
If you want a serial behavior why would you run your code in separate threads in the first place? If it is just for experimentation, then one solution I can think of using pthread_signal to wake a specific thread up and make it running... Then the woken up thread can wake up the second one and so on so forth.
Hope it helps.
To my recollection, the variable you have protected isn't actually being shared amongst the processes. It exists in its own context inside each of the threads. So, it's really just a matter of when each thread gets scheduled that determines what will print.
I don't think one simple mutex will allow you to guarantee correctness, if correctness is defined as printing 0, 1, 2, 3 ...
what your code is doing is creating multiple execution contexts, using the code in your sum function as its execution code. the variable you are protecting, unless declared as static, will be unique to each call of that function.
in the end, it is coincidence that you are getting one system to print out correctly, because you have no logical method of blocking threads until it is their proper turn.
I don't do pthreads in C or any other language (but I do thread programming on high-performace computers) so this 'answer' might be useless to you;
What in your code requires the threads to pass the mutex in thread id order ? I see that the threads are created in id order, but what requires them to execute in that order /
If you do require your threads to execute in id order, why ? It seems a bit as if you are creating threads, then serialising them. To what end ?
When I program in threads and worry about execution order, I often try creating a very large number of threads and seeing what happens to the execution order.
As I say, ignore this if my lack of understanding of C and pthreads is too poor.

Resources