pthread_recursive_mutex - assertion failed - boost

I'm using the ROS (Robot Operating System) framework. If you are familiar with ROS: my code does not use activity servers, just plain publishers, subscribers and services. Unfortunately, I'm hitting a pthread_recursive_mutex error. The following is the error and its backtrace.
If anyone is familiar with the ROS stack, could you please share the potential causes of this runtime error?
I can give more information about the runtime error. Help much appreciated. Thanks
/usr/include/boost/thread/pthread/recursive_mutex.hpp:113: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.

The lock method implementation merely asserts the pthread return value:
void lock()
{
BOOST_VERIFY(!posix::pthread_mutex_lock(&m));
}
This means that, according to the docs, one of the following happened:
(EAGAIN) The mutex could not be acquired because the maximum number of recursive locks for mutex has been exceeded.
This would indicate you have some kind of imbalance in your locks (not at this call site, because unique_lock<> makes sure that doesn't happen) or are just racking up threads that are all waiting for the same lock.
(EOWNERDEAD) The mutex is a robust mutex and the process containing the previous owning thread terminated while holding the mutex lock. The mutex lock shall be acquired by the calling thread and it is up to the new owner to make the state consistent.
Boost does not deal with this case and simply asserts. This would also not likely occur if all your threads use thread-safe lock guards (scoped_lock, unique_lock, shared_lock, lock_guard). It could, however, occur if you use the lock() (and unlock()) functions manually somewhere and the thread exits without unlock()ing.
There are some other ways in which (particularly checked) mutexes can fail, but those would not apply to boost::recursive_mutex.
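To find out which of these cases you are hitting, it can help to log the return code before asserting. Here is a minimal sketch; describe_lock_result and verify_lock_result are hypothetical helpers (not part of Boost or pthreads), and the EINVAL entry is my assumption, based on glibc returning it for a mutex that has already been destroyed:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>

// Hypothetical helper: maps the return value of pthread_mutex_lock()
// to a short diagnostic string.
std::string describe_lock_result(int rc) {
    switch (rc) {
    case 0:          return "ok";
    case EAGAIN:     return "EAGAIN: maximum number of recursive locks exceeded";
    case EOWNERDEAD: return "EOWNERDEAD: previous owner terminated while holding the lock";
    case EINVAL:     return "EINVAL: mutex is invalid (assumption: e.g. already destroyed)";
    default:         return "unexpected error " + std::to_string(rc);
    }
}

// Logs the reason first, then asserts like BOOST_VERIFY would.
void verify_lock_result(int rc) {
    if (rc != 0) {
        std::fprintf(stderr, "pthread_mutex_lock failed: %s\n",
                     describe_lock_result(rc).c_str());
    }
    assert(rc == 0);
}
```

You would call verify_lock_result(pthread_mutex_lock(&m)) in a debug build of your own locking code; it does not change what Boost itself does.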

This looks like a use-after-free problem, where a mutex has already been destroyed, probably because its owning object was deleted.
I have had some success using Valgrind to hunt down this type of bug. Install it using apt install valgrind, and add launch-prefix="valgrind" to the <node> in your launch file. It will be super slow, but it's quite adept at pinpointing these issues.
Take this buggy program for example:
struct Test
{
int a;
};
int main()
{
Test* test = new Test();
test->a = 42;
delete test;
test->a = 0; // BUG!
}
valgrind ./testprog yields
==8348== Invalid write of size 4
==8348== at 0x108601: main (test.cpp:11)
==8348== Address 0x5b7ec80 is 0 bytes inside a block of size 4 free'd
==8348== at 0x4C3168B: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8348== by 0x108600: main (test.cpp:10)
==8348== Block was alloc'd at
==8348== at 0x4C303EF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8348== by 0x1085EA: main (test.cpp:8)
Note how it will not only tell you where the buggy access happened (test.cpp:11), but also where the Test object was deleted (test.cpp:10), and where it was initially created (test.cpp:8).
Good luck in your bug hunt!
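Once Valgrind has shown which object died too early, one common way to keep callbacks from touching its mutex is to hand them a weak_ptr instead of a raw pointer. This is only a sketch under assumptions: Tracker, make_callback and last_value are hypothetical names, not ROS API, and your object would need to be owned by a shared_ptr.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a node-side object that owns a mutex.
class Tracker : public std::enable_shared_from_this<Tracker> {
public:
    // Returns a callback that is safe to invoke after the Tracker is gone.
    std::function<bool(int)> make_callback() {
        std::weak_ptr<Tracker> weak = shared_from_this();
        return [weak](int value) {
            std::shared_ptr<Tracker> self = weak.lock();
            if (!self) {
                return false;  // owner already destroyed: do not touch its mutex
            }
            std::lock_guard<std::recursive_mutex> guard(self->mutex_);
            self->last_value_ = value;
            return true;
        };
    }

    int last_value() {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return last_value_;
    }

private:
    std::recursive_mutex mutex_;
    int last_value_ = 0;
};
```

The callback returns false instead of locking a destroyed mutex, so a late message is dropped rather than tripping the assertion.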


KSPIN_LOCK blocks when acquiring from Driver's main thread

I have a KSPIN_LOCK which is shared among a Windows driver's main thread and some threads I created with PsCreateSystemThread. The problem is that the main thread blocks if I try to acquire the spinlock and doesn't unblock. I'm very confused as to why this happens. It's probably somehow connected to the fact that the main thread runs at driver IRQL, while the other threads run at PASSIVE_LEVEL as far as I know.
NOTE: If I only run the main thread, acquiring/releasing the lock works just fine.
NOTE: I'm using the functions KeAcquireSpinLock and KeReleaseSpinLock to acquire/release the lock.
Here's my checklist for a "stuck" spinlock:
Make sure the spinlock was initialized with KeInitializeSpinLock. If the KSPIN_LOCK holds uninitialized garbage, then the first attempt to acquire it will likely spin forever.
Check that you're not acquiring it recursively/nested. KSPIN_LOCK does not support recursion, and if you try it, it will spin forever.
Normal spinlocks must be acquired at IRQL <= DISPATCH_LEVEL. If you need something that works at DIRQL, check out [1] and [2].
Check for leaks. If one processor acquires the spinlock, but forgets to release it, then the next processor will spin forever when trying to acquire the lock.
Ensure there are no memory-safety issues. If code randomly writes a non-zero value on top of the spinlock, that'll cause it to appear to be acquired, and the next acquisition will spin forever.
Some of these issues can be caught easily and automatically with Driver Verifier; use it if you're not using it already. Other issues can be caught if you encapsulate the spinlock in a little helper that adds your own asserts. For example:
typedef struct _MY_LOCK {
KSPIN_LOCK Lock;
ULONG OwningProcessor;
KIRQL OldIrql;
} MY_LOCK;
void MyInitialize(MY_LOCK *lock) {
KeInitializeSpinLock(&lock->Lock);
lock->OwningProcessor = (ULONG)-1;
}
void MyAcquire(MY_LOCK *lock) {
ULONG current = KeGetCurrentProcessorIndex();
NT_ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
NT_ASSERT(current != lock->OwningProcessor); // check for recursion
KeAcquireSpinLock(&lock->Lock, &lock->OldIrql);
NT_ASSERT(lock->OwningProcessor == (ULONG)-1); // check lock was inited
lock->OwningProcessor = current;
}
void MyRelease(MY_LOCK *lock) {
NT_ASSERT(KeGetCurrentProcessorIndex() == lock->OwningProcessor);
lock->OwningProcessor = (ULONG)-1;
KeReleaseSpinLock(&lock->Lock, lock->OldIrql);
}
Wrappers around KSPIN_LOCK are common. The KSPIN_LOCK is like a race car that has all the optional features stripped off to maximize raw speed. If you aren't counting microseconds, you might reasonably decide to add back the heated seats and FM radio by wrapping the low-level KSPIN_LOCK in something like the above. (And with the magic of #ifdefs, you can always take the airbags out of your retail builds, if you need to.)
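If you want to play with the same idea outside a driver, here is a rough user-mode analogue. It is a sketch only: CheckedSpinLock is a hypothetical class, it tracks the owning thread rather than the owning processor, and it reports recursion by returning false instead of asserting.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical user-mode analogue of MY_LOCK, for experiments only.
class CheckedSpinLock {
public:
    // Returns false instead of spinning forever when the calling thread
    // already owns the lock (the recursion case from the checklist).
    bool acquire() {
        const std::thread::id me = std::this_thread::get_id();
        if (owner_.load(std::memory_order_relaxed) == me) {
            return false;
        }
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until the holder releases
        }
        owner_.store(me, std::memory_order_relaxed);
        return true;
    }

    void release() {
        // Only the owning thread may release the lock.
        assert(owner_.load(std::memory_order_relaxed) == std::this_thread::get_id());
        owner_.store(std::thread::id(), std::memory_order_relaxed);
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

A second acquire() from the same thread returns false right away, which is the situation that would hang forever with a bare KSPIN_LOCK.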

Why finalizer is never called?

var p = &sync.Pool{
New: func() interface{} {
return &serveconn{}
},
}
func newServeConn() *serveconn {
sc := p.Get().(*serveconn)
runtime.SetFinalizer(sc, (*serveconn).finalize)
fmt.Println(sc, "SetFinalizer")
return sc
}
func (sc *serveconn) finalize() {
fmt.Println(sc, "finalize")
*sc = serveconn{}
runtime.SetFinalizer(sc, nil)
p.Put(sc)
}
The above code tries to reuse objects via SetFinalizer, but while debugging I found that the finalizer is never called. Why?
UPDATE
This may be related: https://github.com/golang/go/issues/2368
The above code tries to reuse objects via SetFinalizer, but while debugging I found that the finalizer is never called. Why?
The finalizer is only called on an object when the GC marks it as unused and then tries to sweep (free) it at the end of the GC cycle.
As a corollary, if a GC cycle is never performed during the runtime of your program, the finalizers you set may never be called.
Just in case you hold a wrong assumption about Go's GC, it may be worth noting that Go does not employ reference counting on values; instead, it uses a GC which works in parallel with the program, and the sessions during which it works happen periodically and are triggered by certain parameters, such as pressure on the heap produced by allocations.
A couple of assorted notes regarding finalizers:
When the program terminates, no GC is forcibly run. A corollary of this is that a finalizer is not guaranteed to run at all.
If the GC finds a finalizer on an object about to be freed, it calls the finalizer but does not free the object. The object itself will be freed only at the next GC cycle, wasting the memory.
All in all, you appear to be trying to implement destructors.
Please don't: make your objects implement the sort-of standard method called Close and state in the contract of your type that the programmer is required to call it when they're done with the object.
When a programmer wants to call such a method no matter what, they use defer.
Note that this approach works perfectly for all types in the Go stdlib which wrap resources provided by the OS, such as file and socket descriptors. So there is no need to pretend your types are somehow different.
Another useful thing to keep in mind is that Go was explicitly engineered to be a no-nonsense, no-frills, no-magic, in-your-face language, and you're just trying to add magic to it.
Please don't; those who like deciphering layers of magic program in different languages.

Boost interprocess process crash when Memory allocation will lead to dead lock [duplicate]

I have a need for interprocess synchronization around a piece of hardware. Because this code will need to work on Windows and Linux, I'm wrapping with Boost Interprocess mutexes. Everything works well except my method for checking abandonment of the mutex. There is the potential that this can happen and so I must prepare for it.
I've abandoned the mutex in my testing and, sure enough, when I use scoped_lock to lock the mutex, the process blocks indefinitely. I figured the way around this is by using the timeout mechanism on scoped_lock (since much time spent Googling for methods to account for this don't really show much, boost doesn't do much around this because of portability reasons).
Without further ado, here's what I have:
#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
typedef boost::interprocess::named_recursive_mutex MyMutex;
typedef boost::interprocess::scoped_lock<MyMutex> ScopedLock;
MyMutex* pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
{
// ScopedLock lock(*pGate); // this blocks indefinitely
boost::posix_time::ptime timeout(boost::posix_time::microsec_clock::local_time() + boost::posix_time::seconds(10));
ScopedLock lock(*pGate, timeout); // a 10 second timeout that returns immediately if the mutex is abandoned ?????
if(!lock.owns()) {
delete pGate;
boost::interprocess::named_recursive_mutex::remove("MutexName");
pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
}
}
That, at least, is the idea. Three interesting points:
When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?
When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.
So, what am I missing on using these objects? Perhaps it's staring me in the face, but I can't see it and so I'm asking for help.
I should also mention that, because of how this hardware works, if the process cannot gain ownership of the mutex within 10 seconds, the mutex is abandoned. In fact, I could probably wait as little as 50 or 60 milliseconds, but 10 seconds is a nice "round" number of generosity.
I'm compiling on Windows 7 using Visual Studio 2010.
Thanks,
Andy
When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected
The best solution for your problem would be if Boost had support for robust mutexes. However, Boost currently does not support them. There is only a plan to emulate robust mutexes, because only Linux has native support for them. The emulation is still just planned by Ion Gaztanaga, the library author.
Check this link about a possible way of hacking robust mutexes into the Boost libs:
http://boost.2283326.n4.nabble.com/boost-interprocess-gt-1-45-robust-mutexes-td3416151.html
Meanwhile you might try to use atomic variables in a shared segment.
Also take a look at this stackoverflow entry:
How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?
When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?
This is very strange, you should not get this behavior. However:
The timed lock is possibly implemented in terms of the try lock. Check this documentation:
http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/scoped_lock.html#idp57421760-bb
This means the implementation of the timed lock might throw an exception internally and then return false.
inline bool windows_mutex::timed_lock(const boost::posix_time::ptime &abs_time)
{
sync_handles &handles =
windows_intermodule_singleton<sync_handles>::get();
//This can throw
winapi_mutex_functions mut(handles.obtain_mutex(this->id_));
return mut.timed_lock(abs_time);
}
Possibly, the handle cannot be obtained, because the mutex is abandoned.
When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.
I am not sure about this one, but I think the named mutex is implemented using shared memory. If you are using Linux, check for the file /dev/shm/MutexName. On Linux, a file descriptor remains valid until it is closed, even if you have removed the file itself, e.g. with boost::interprocess::named_recursive_mutex::remove.
Check out the BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING and BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS compile flags. Define the first symbol in your code to force the interprocess mutexes to time out and the second symbol to define the timeout duration.
I helped to get them added to the library to solve the abandoned mutex issue. It was necessary to add it due to many interprocess constructs (like message_queue) that rely on the simple mutex rather than the timed mutex. There may be a more robust solution in the future, but this solution has worked just fine for my interprocess needs.
I'm sorry I can't help you with your code at the moment; something is not working correctly there.
BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING is not so good. It throws an exception and does not help much. To work around the exceptional behaviour I wrote this macro. It works just fine for common purposes. In this sample named_mutex is used. The macro creates a scoped lock with a timeout, and if the lock cannot be acquired for EXCEPTIONAL reasons, it will unlock it afterwards. This way the program can lock it again later and does not freeze or crash immediately.
#define TIMEOUT 1000
#define SAFELOCK(pMutex) \
boost::posix_time::ptime wait_time \
= boost::posix_time::microsec_clock::universal_time() \
+ boost::posix_time::milliseconds(TIMEOUT); \
boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(*pMutex, wait_time); \
if(!lock.owns()) { \
pMutex->unlock(); }
But even this is not optimal, because the code to be locked now runs unlocked once. This may cause problems. You can easily extend the macro however. E.g. run code only if lock.owns() is true.
boost::interprocess::named_mutex has three definitions:
On Windows, you can define a macro to use the native Windows mutex instead of the Boost mutex. You can then catch the abandoned-mutex exception, and you should unlock the mutex when that happens.
On Linux, Boost uses pthread_mutex, but it does not set the robust attribute as of version 1_65_1.
So I implemented interprocess_mutex myself using the system APIs (Windows Mutex, and Linux pthread_mutex in process-shared mode). Note that a Windows Mutex lives in the kernel rather than in a file.
Craig Graham answered this in a reply already but I thought I'd elaborate because I found this, didn't read his message, and beat my head against it to figure it out.
On a POSIX system, timed lock calls:
timespec ts = ptime_to_timespec(abs_time);
pthread_mutex_timedlock(&m_mut, &ts)
Where abs_time is the ptime that the user passes into interprocess timed_lock.
The problem is that abs_time must be in UTC, not local time.
Assume that you want to wait for 10 seconds: if your local time is behind UTC (a negative UTC offset), the deadline is already in the past when read as UTC, and timed_lock() returns immediately;
if your local time is ahead of UTC, timed_lock() returns only after your UTC offset plus 10 seconds.
The following ptime times out an interprocess mutex in 10 seconds:
boost::posix_time::ptime now = boost::posix_time::second_clock::universal_time() +
boost::posix_time::seconds(10);
If I use ::local_time() instead of ::universal_time(), the deadline is already in the past in my time zone, so it returns immediately.
The documentation fails to mention this.
I haven't tried it, but digging into the code a bit, it looks like the same problem would occur on a non-POSIX system.
If BOOST_INTERPROCESS_POSIX_TIMEOUTS is not defined, the function ipcdetail::try_based_timed_lock(*this, abs_time) is called.
It uses universal time as well, waiting on while(microsec_clock::universal_time() < abs_time).
This is only speculation, as I don't have quick access to a Windows system to test this on.
For full details, see https://www.boost.org/doc/libs/1_76_0/boost/interprocess/sync/detail/common_algorithms.hpp
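To make the arithmetic concrete, here is a small sketch that does not need Boost. effective_wait is a hypothetical helper of mine; it only models what happens when a deadline built from local time is interpreted as UTC, with utc_offset meaning local time minus UTC (negative west of Greenwich).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: how long a timed lock effectively waits when the
// deadline was computed as local_time() + requested but is read as UTC.
std::chrono::seconds effective_wait(std::chrono::seconds utc_offset,
                                    std::chrono::seconds requested) {
    assert(requested >= std::chrono::seconds(0));
    // The deadline is shifted by the whole UTC offset.
    const std::chrono::seconds shifted = utc_offset + requested;
    // A deadline in the past means the call returns immediately.
    return std::max(shifted, std::chrono::seconds(0));
}
```

With an offset of -5 hours and a 10 second request this gives 0 seconds (an immediate return); with +2 hours it gives 7210 seconds. Building the deadline from universal_time() corresponds to an offset of zero, which gives the intended 10 seconds.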

Is returning while holding a spinlock automatically unsafe?

The venerated book Linux Driver Development says that
The flags argument passed to spin_unlock_irqrestore must be the same variable passed to spin_lock_irqsave. You must also call spin_lock_irqsave and spin_unlock_irqrestore in the same function; otherwise your code may break on some architectures.
Yet I can't find any such restriction required by the official documentation bundled with the kernel code itself. And I find driver code that violates this guidance.
Obviously it isn't a good idea to call spin_lock_irqsave and spin_unlock_irqrestore from separate functions, because you're supposed to minimize the work done while holding a lock (with interrupts disabled, no less!). But have changes to the kernel made it possible if done with care, was it never actually against the API contract, or is it still verboten to do so?
If the restriction has been removed at some point, did it apply to version 3.10.17?
This is just a guess, but the book might be unclearly referring to a potential bug which could happen if you try to use a nonlocal variable or storage location for flags.
Basically, flags has to be private to the current execution context, which is why spin_lock_irqsave is a macro which takes the name of the flags. While flags is being saved, you don't have the spinlock yet.
How this is related to locking and unlocking in a different function:
Consider two functions that some driver developer might write:
void my_lock(my_object *ctx)
{
spin_lock_irqsave(&ctx->mylock, ctx->myflags); /* BUG */
}
void my_unlock(my_object *ctx)
{
spin_unlock_irqrestore(&ctx->mylock, ctx->myflags);
}
This is a bug because at the time ctx->myflags is written, the lock is not yet held, and it is a shared variable visible to other contexts and processors. The local flags must be saved to a private location on the stack. Then, once the caller owns the lock, a copy of the flags can be saved into the exclusively owned object. In other words, it can be fixed like this:
void my_lock(my_object *ctx)
{
unsigned long flags;
spin_lock_irqsave(&ctx->mylock, flags);
ctx->myflags = flags;
}
void my_unlock(my_object *ctx)
{
unsigned long flags = ctx->myflags; /* probably unnecessary */
spin_unlock_irqrestore(&ctx->mylock, flags);
}
If it couldn't be fixed like that, it would be very difficult to implement higher level primitives which need to wrap IRQ spinlocks.
How it could be arch-dependent:
Suppose that spin_lock_irqsave expands into machine code which saves the current flags in some register, then acquires the lock, and then saves that register into specified flags destination. In that case, the buggy code is actually safe. If the expanded code saves the flags into the actual flags object designated by the caller and then tries to acquire the lock, then it's broken.
I have never seen that constraint anywhere aside from the book. Probably the information in the book is just outdated, or simply wrong.
In the current kernel (and at least since 2.6.32, which is when I started working with it), the actual locking is done through many levels of nested calls from spin_lock_irqsave (see, e.g., __raw_spin_lock_irqsave, which is called in the middle). So locking and unlocking in different functions can hardly be a reason for a malfunction.

TCriticalSection.LockCount and negative values [duplicate]

I am debugging a deadlock issue, and the call stack shows that threads are waiting on some events.
The code uses a critical section as its synchronization primitive, and I think there is some issue here.
Also, the debugger is pointing to a critical section that is owned by some other thread, but its lock count is -2.
As per my understanding, a lock count > 0 means that the critical section is locked by one or more threads.
So is there any possibility that I am looking at the right critical section, the one that could be the culprit in the deadlock?
In what scenarios can a critical section have a negative lock count?
Beware: since Windows Server 2003 (for client OS this is Vista and newer) the meaning of LockCount has changed and -2 is a completely normal value, commonly seen when a thread has entered a critical section without waiting and no other thread is waiting for the CS. See Displaying a Critical Section:
In Microsoft Windows Server 2003 Service Pack 1 and later versions of Windows, the LockCount field is parsed as follows:
The lowest bit shows the lock status. If this bit is 0, the critical section is locked; if it is 1, the critical section is not locked.
The next bit shows whether a thread has been woken for this lock. If this bit is 0, then a thread has been woken for this lock; if it is 1, no thread has been woken.
The remaining bits are the ones-complement of the number of threads waiting for the lock.
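The three rules quoted above can be applied mechanically. A minimal sketch (decode_lock_count and LockCountInfo are hypothetical names of mine, not a Windows API; the function only implements the bit layout as quoted):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for the decoded LockCount field.
struct LockCountInfo {
    bool locked;                    // lowest bit is 0
    bool waiter_woken;              // next bit is 0
    std::uint32_t waiting_threads;  // ones-complement of the remaining bits
};

LockCountInfo decode_lock_count(std::int32_t lock_count) {
    const std::uint32_t bits = static_cast<std::uint32_t>(lock_count);
    LockCountInfo info;
    info.locked = (bits & 1u) == 0;
    info.waiter_woken = (bits & 2u) == 0;
    info.waiting_threads = (~bits) >> 2;
    return info;
}
```

For -2 this yields locked, no thread woken, and zero waiting threads, which is the "completely normal" case described above. By the same rules, -1 decodes as not locked with nothing waiting.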
I am assuming that you are talking about CCriticalSection class in MFC. I think you are looking at the right critical section. I have found that the critical section's lock count can go negative if the number of calls to Lock() function is less than the number of Unlock() calls. I found that this generally happens in the following type of code:
void f()
{
CSingleLock lock(&m_synchronizer, TRUE);
//Some logic here
m_synchronizer.Unlock();
}
At first glance this code looks perfectly safe. However, note that I am using CCriticalSection's Unlock() method directly instead of CSingleLock's Unlock() method. Now what happens is that when the function exits, CSingleLock in its destructor calls Unlock() of the critical section again and its lock count becomes negative. After this the application will be in a bad shape and strange things start to happen. If you are using MFC critical sections then do check for this type of problem.
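The double unlock can be reproduced without MFC. In this sketch CountingLock is a hypothetical stand-in that only counts Lock/Unlock calls (it is not CCriticalSection), and std::lock_guard / std::unique_lock play the role of CSingleLock:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a critical section that only counts calls.
struct CountingLock {
    int count = 0;
    void lock()   { ++count; }
    void unlock() { --count; }
};

// Mirrors the buggy pattern: the direct call and the guard both unlock.
int buggy(CountingLock& cs) {
    {
        std::lock_guard<CountingLock> guard(cs);
        cs.unlock();     // BUG: bypasses the guard
    }                    // guard's destructor unlocks a second time
    return cs.count;
}

// Unlocking through the guard keeps the calls balanced.
int balanced(CountingLock& cs) {
    {
        std::unique_lock<CountingLock> guard(cs);
        guard.unlock();  // the guard records that it no longer owns cs
    }                    // destructor does nothing
    return cs.count;
}
```

Starting from a fresh CountingLock, buggy() leaves the count at -1 while balanced() leaves it at 0. The same reasoning applies to calling CSingleLock's own Unlock() instead of the critical section's.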
