std::condition_variable why does it need a std::mutex - c++11

I am not sure if I really understand why std::condition_variable needs a additional std::mutex as a parameter? Should it not be locking by its self?
#include <iostream>
#include <condition_variable>
#include <thread>
#include <chrono>
std::condition_variable cv;
std::mutex cv_m;
int i = 0;
bool done = false;
void waits()
{
std::unique_lock<std::mutex> lk(cv_m);
std::cout << "Waiting... \n";
cv.wait(lk, []{return i == 1;});
std::cout << "...finished waiting. i == 1\n";
done = true;
}
void signals()
{
std::this_thread::sleep_for(std::chrono::seconds(1));
std::cout << "Notifying falsely...\n";
cv.notify_one(); // waiting thread is notified with i == 0.
// cv.wait wakes up, checks i, and goes back to waiting
std::unique_lock<std::mutex> lk(cv_m);
i = 1;
while (!done)
{
std::cout << "Notifying true change...\n";
lk.unlock();
cv.notify_one(); // waiting thread is notified with i == 1, cv.wait returns
std::this_thread::sleep_for(std::chrono::seconds(1));
lk.lock();
}
}
int main()
{
std::thread t1(waits), t2(signals);
t1.join();
t2.join();
}
Secondary, in the example they unlock the mutex first (signals method). Why are they doing this? Shoulden't they first lock and then unlock after notify?

The mutex protects the predicate, that is, the thing that you are waiting for. Since the thing you are waiting for is, necessarily, shared between threads, it must be protected somehow.
In your example above, i == 1 is the predicate. The mutex protects i.
It may be helpful to take a step back and think about why we need condition variables. One thread detects some state that prevents it from making forward progress and needs to wait for some other thread to change that state. This detection of state has to take place under a mutex because the state must be shared (otherwise, how could another thread change that state?).
But the thread can't release the mutex and then wait. What if the state changed after the mutex was released but before the thread managed to wait? So you need an atomic "unlock and wait" operation. That's specifically what condition variables provide.
With no mutex, what would they unlock?
The choice of whether to signal the condition variable before or after releasing the lock is a complex one with advantages on both sides. Generally speaking, you will get better performance if you signal while holding the lock.

A good rule of thumb to remember when working with multiple threads is that, when you ask a question, the result may be a lie. That is, the answer may have changed since it was given to you. The only way to reliably ask a question is to make it effectively single-threaded. Enter mutexes.
A condition variable waits for a trigger so that it can check its condition. To check its condition, it needs to ask a question.
If you do not lock before waiting, then it is possible that you ask the question and get the condition, and you are told that the condition is false. This becomes a lie as the trigger occurs and the condition becomes true. But you don't know this, since there's no mutex making this effectively single-threaded.
Instead, you wait on the condition variable for a trigger that will never fire, because it already did. This deadlocks.

Related

boost asio: Is it thread safe to call tcp::socket::async_read_some() when handler is protected by a strand

I'm struggle to full understand Boost ASIO and strands. I was under the impression that the call to socket::async_read_some() was safe as long as the handler was wrapped in a strand. This appears not to be the case since the code eventually throws an exception.
In my situation a third party library is making the Session::readSome() calls. I'm using a reactor pattern with the ASIO layer under the third party library. When data arrives on the socket the 3rd party is called to do the read. The pattern is used since it is necessary to abort the read operation at any time and have the 3rd party library error out and return its thread. The third party expected a blocking read so the code mimics it with a conditional variable.
Given the example below what is the proper way to do this? Do I need to wrap the async_read_some() call in a dispatch() or post() so it runs through a strand too?
Note: Compiler is c++14 ;-(
Example representative code:
Session::Session (ba::io_context& ioContext):
m_sessionStrand ( ioContext.get_executor() ),
m_socket ( m_sessionStrand )
{}
int32_t Session::readSome (unsigned char* pBuffer, uint32_t bufferSizeToRead, boost::system::error_code& errorCode)
{
// The 3d party expects a synchronous read so we mimic the behavior
// with a async_read and then wait for the results. With this pattern
// we can unblock the read elsewhere - for or example calling close on the socket -
// and still give the 3d party the illusion of a synchronous read.
// In such a cases the 3rd party will receive an error code
// on the read and return it's thread.
// Nothing to do
if ( bufferSizeToRead == 0) return 0;
// Create a mutable buffer
ba::mutable_buffer buffer (pBuffer, bufferSizeToRead);
std::size_t result = 0;
errorCode.clear();
// Setup conditional
m_readerPause.exchange(true);
auto readHandler = [&result, &errorCode, self=shared_from_this()](boost::system::error_code ec, std::size_t bytesRead)
{
result = bytesRead;
errorCode = ec;
// Signal that we got results
std::unique_lock<std::mutex> lock{m_readerMutex};
m_readerPause.exchange(false);
m_readerPauseCV.notify_all();
};
m_socket.async_read_some(buffer, ba::bind_executor (m_sessionStrand, readHandler));
// We pause the 3rd party read thread until we get the read results back - or an error occurs
{
std::unique_lock<std::mutex> lock{m_readerMutex};
m_readerPauseCV.wait (lock, [this]{ return !m_readerPause.load(std::memory_order_acquire); } );
}
return result;
}
The exception occurs in epoll_reactor.ipp. There is a race condition between the read and closing the socket.
void epoll_reactor::start_op(int op_type, socket_type descriptor,
epoll_reactor::per_descriptor_data& descriptor_data, reactor_op* op,
bool is_continuation, bool allow_speculative)
{
if (!descriptor_data)
{
op->ec_ = boost::asio::error::bad_descriptor;
post_immediate_completion(op, is_continuation);
return;
}
mutex::scoped_lock descriptor_lock(descriptor_data->mutex_);
if (descriptor_data->shutdown_) //!! SegFault here: descriptor_data == NULL*
{
post_immediate_completion(op, is_continuation);
return;
}
...
}
Thanks in advance for any insights in the proper way to handle this situation using ASIO.
The strand doesn't "protect" the handler. Instead, it protects some shared state (which you control) by synchronizing handler execution. It's exactly like a mutex for async execution.
According to this logic all code running on the strand can touch the shared resources, and conversely, code not guaranteed to be on the strand can not be allowed to touch them.
In your code, the shared resources consist of at least buffer, result, m_socket. It would be more complete to include the m_sessionStrand, m_readerPauseCV, m_readerMutex, m_readerPause but all of these are implicitly threadsafe the way they are used¹.
Your code looks to do things safely in these regards. However it makes a few unfortunate detours that make it harder than necessary to check/reason about the code:
it uses more (local) shared state to communicate results from the handler
it doesn't make explicit what the mutex and/or the strand protect
it employs both a mutex and a strand which conceptually compete for the same responsibility
it employs both a condition and an atomic bool, which again compete for the same responsibility
it does manual strand binding, which muddies the expectations about what the native executor for the m_socket object is expected to be
the initial read is not protected. This means that if Session::readSome is invoked from a "wild" thread, it will use member functions without synchronizing with any other operations that may be pending on the m_socket.
the atomic_bool mutations are spelled in Very Convoluted Ways(TM), which serve to show you (presumably) understand the memory model, but make the code harder to review without tangible merit. Clearly, the blocking synchronization will (far) outweigh any benefit of explicit memory acquisition order. I suggest to at least "normalize" the spelling as atomic_bool was explicitly designed to afford:
//m_readerPause.exchange(true);
m_readerPause = true;
and
m_readerPauseCV.wait(lock, [this] { return !m_readerPause; });
since you are emulating blocking IO, there is no merit capturing shared_from_this() in the lambda. Lifetime should be guaranteed by the calling party any ways.
Interestingly, you didn't show this capture, which is required for the lambda to compile, assuming you didn't use global variables.
Kudos for explicitly clearing the error_code output variable. This is oft forgotten. Technically, you did forget about with the (questionable?) early exit when (bufferSizeToRead == 0)... You might have a slightly unorthodox caller contract where this makes sense.
To be generic I'd suggest to perform the zero-length read as it might behave differently depending on the transport connected.
Last, but not least, m_socket.[async_]read_some is rarely what you require on application protocol level. I'll leave this one to you, as you might have this exceptional edge-case scenario.
Simplifying
Conceptually, I'd like to write:
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) {
ec.clear();
size_t result = 0;
std::tie(ec, result) = m_socket
.async_read_some(ba::buffer(buf, size),
ba::as_tuple(ba::use_future))
.get();
return result;
}
This uses futures to get the blocking behaviour while being cancelable. Sadly, contrary to expectation there is currently a limitation that prevents combining as_tuple and use_future.
So, we have to either ignore partial success scenarios (significant result when !ec):
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) try {
ec.clear();
return m_socket
.async_read_some(ba::buffer(buf, size), ba::use_future)
.get();
} catch (boost::system::system_error const& se) {
ec = se.code();
return 0;
}
I suspect that member-async_read_some doesn't have a partial success mode. However, let's still give it thought, seeing that I warned before that async_read_some is rarely what you need anyways:
int32_t Session::readSome(unsigned char* buf, uint32_t size, error_code& ec) {
std::promise<std::tuple<size_t, error_code> > p;
m_socket.async_read_some(ba::buffer(buf, size), [&p](error_code ec_, size_t n_) { p.set_value({n_, ec_}); });
size_t result;
std::tie(result, ec) = p.get_future().get();
return result;
}
Still considerably easier.
Interim Result
Self contained example with the current approach:
Live On Coliru
#include <boost/asio.hpp>
namespace ba = boost::asio;
using ba::ip::tcp;
using boost::system::error_code;
using CharT = /*unsigned*/ char; // for ease of output...
struct Session : std::enable_shared_from_this<Session> {
tcp::socket m_socket;
Session(ba::any_io_executor ex) : m_socket(make_strand(ex)) {
m_socket.connect({{}, 7878});
}
int32_t readSome(CharT* buf, uint32_t size, error_code& ec) {
std::promise<std::tuple<size_t, error_code>> p;
m_socket.async_read_some(ba::buffer(buf, size), [&p](error_code ec_, size_t n_) {
p.set_value({n_, ec_});
});
size_t result;
std::tie(result, ec) = p.get_future().get();
return result;
}
};
#include <iomanip>
#include <iostream>
int main() {
ba::thread_pool ioc;
auto s = std::make_shared<Session>(ioc.get_executor());
error_code ec;
CharT data[10];
while (auto n = s->readSome(data, 10, ec))
std::cout << "Received " << quoted(std::string(data, n)) << " (" << ec.message() << ")\n";
ioc.join();
}
Testing with
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp
for resp in FOO LONG_BAR_QUX_RESPONSE; do nc -tln 7878 -w 0 <<< $resp; done&
set -x
sleep .2; ./a.out
sleep .2; ./a.out
Prints
+ sleep .2
+ ./a.out
Received "FOO
" (Success)
+ sleep .2
+ ./a.out
Received "LONG_BAR_Q" (Success)
Received "UX_RESPONS" (Success)
Received "E
" (Success)
External Synchronization (Cancellation?)
Now, code not show implies that other operations may act on m_socket, if at least only to cancel operations in flight³. If this situation arises you have add the missing synchronization, either using the mutex or the strand.
I suggest not introducing the competing synchronization mechanism, even though not "incorrect". It will
lead to simpler code
allow you to solidify your understanding of the use of the strand.
So, let's make sure that the operation runs on the strand:
int32_t readSome(CharT* buf, uint32_t size, error_code& ec) {
std::promise<size_t> p;
post(m_socket.get_executor(), [&] {
m_socket.async_read_some(ba::buffer(buf, size),
[&](error_code ec_, size_t n_) { ec = ec_; p.set_value(n_); });
});
return p.get_future().get();
}
void cancel() {
post(m_socket.get_executor(),
[self = shared_from_this()] { self->m_socket.cancel(); });
}
See it Live On Coliru
Exercising Cancellation
int main() {
ba::thread_pool ioc(1);
auto s = std::make_shared<Session>(ioc.get_executor());
std::thread th([&] {
std::this_thread::sleep_for(5s);
s->cancel();
});
error_code ec;
CharT data[10];
do {
auto n = s->readSome(data, 10, ec);
std::cout << "Received " << quoted(std::string(data, n)) << " (" << ec.message() << ")\n";
} while (!ec);
ioc.join();
th.join();
}
Again, Live On Coliru
¹ Technically in a multi-thread situation you need to notify the CV under the lock to allow for fair scheduling, i.e. to prevent waiter starvation. However your scenario is so isolated that you can get away with being somewhat sloppy.
² by default tcp::socket type-erases the executor with any_io_executor, but you could use basic_stream_socket<tcp, strand<io_context::executor_type> > to remove that cost if your executor type is statically known
³ Of course, POSIX sockets include full duplex scenarios, where read and write operations can be in flight simultaneoulsy.
UPDATE: redirect_error
Just re-discovered redirect_error which allows something close to as_tuple:
auto readSome(CharT* buf, uint32_t size, error_code& ec) {
return m_socket
.async_read_some(ba::buffer(buf, size),
ba::redirect_error(ba::use_future, ec))
.get();
}
void cancel() { m_socket.cancel(); }
This only suffices when readSome and cancel are guaranteed to be invoked on the strand.

Boost process continuously read output

I'm trying to read outputs/logs from different processes and display them in a GUI. The processes will be running for long time and produce huge output. I'm planning to stream the output from those processes and display them according to my needs. All the while allow my gui application to take user inputs and perform other actions.
What I've done here is, from main thread launch two threads for each process. One for launching the process and another for reading output from the process.
This is the solution I've come up thus far.
// Process Class
class MyProcess {
namespace bp = boost::process;
boost::asio::io_service mService; // member variable of the class
bp::ipstream mStream // member variable of the class
std::thread mProcessThread, mReaderThread // member variables of the class.
public void launch();
};
void
MyProcess::launch()
{
mReaderThread = std::thread([&](){
std::string line;
while(getline(mStream, line)) {
std::cout << line << std::endl;
}
});
mProcessThread = std::thread([&]() {
auto c = boost::child ("/path/of/executable", bp::std_out > mStream, mService);
mService.run();
mStream.pipe().close();
}
}
// Main Gui class
class MyGui
{
MyProcess process;
void launchProcess();
}
MyGui::launchProcess()
{
process.launch();
doSomethingElse();
}
The program is working as expected so far. But I'm not sure if this is the correct solution. Please let me know if there's any alternative/better/correct solution
Thanks,
Surya
The most striking conceptual issues I see are
Process are asynchronous, no need to add a thread to run them.¹
You prematurely close the pipe:
mService.run();
mStream.pipe().close();
Run is not "blocking" in the sense that it will not wait for the child to exit. You could use wait to achieve that. Other than that, you can just remove the close() call.
With the close means you will lose all or part of the output. You might not see any of the output if the child process takes a while before it outputs the first data.
You are accessing the mStream from multiple threads without synchronization. This invokes Undefined Behaviour because it opens a Data Race.
In this case you can remove the immediate problem by removing the mStream.close() call mentioned before, but you must take care to start the reader-thread only after the child has been initialized.
Strictly speaking the same caution should be taken for std::cout.
You are passing the io_service reference, but it's not being used. Just dropping it seems like a good idea.
The destructor of MyProcess needs to detach or join the threads. To prevent Zombies, it needs to detach or reap the child pid too.
In combination with the lifetime of mStream detaching the reader thread is not really an option, as mStream is being used from the thread.
Let's put out the first fixes first, and after that I'll suggest show some more simplifications that make sense in the scope of your sample.
First Fixes
I used a simple bash command to emulate a command generating 1000 lines of ping:
Live On Coliru
#include <boost/process.hpp>
#include <thread>
#include <iostream>
namespace bp = boost::process;
/////////////////////////
class MyProcess {
bp::ipstream mStream;
bp::child mChild;
std::thread mReaderThread;
public:
~MyProcess();
void launch();
};
void MyProcess::launch() {
mChild = bp::child("/bin/bash", std::vector<std::string> {"-c", "yes ping | head -n 1000" }, bp::std_out > mStream);
mReaderThread = std::thread([&]() {
std::string line;
while (getline(mStream, line)) {
std::cout << line << std::endl;
}
});
}
MyProcess::~MyProcess() {
if (mReaderThread.joinable()) mReaderThread.join();
if (mChild.running()) mChild.wait();
}
/////////////////////////
class MyGui {
MyProcess _process;
public:
void launchProcess();
};
void MyGui::launchProcess() {
_process.launch();
// doSomethingElse();
}
int main() {
MyGui gui;
gui.launchProcess();
}
Simplify!
In the current model, the thread doesn't pull it's weight.
I you'd use io_service with asynchronous IO instead, you could even do away with the whole thread to begin with, by polling the service from inside your GUI event loop².
If you're gonna have it, and since child processes naturally execute asynchronously³ you could simply do:
Live On Coliru
#include <boost/process.hpp>
#include <thread>
#include <iostream>
std::thread launch(std::string const& command, std::vector<std::string> args = {}) {
namespace bp = boost::process;
return std::thread([=] {
bp::ipstream stream;
bp::child c(command, args, bp::std_out > stream);
std::string line;
while (getline(stream, line)) {
// TODO likely post to some kind of queue for processing
std::cout << line << std::endl;
}
c.wait(); // reap PID
});
}
The demo displays exactly the same output as earlier.
¹ In fact, adding threads is asking for trouble with fork
² or perhaps idle tick or similar idea. Qt has a ready-made integration (How to integrate Boost.Asio main loop in GUI framework like Qt4 or GTK)
³ on all platforms supported by Boost Process

Using ThreadSafe variable macros in LabWindows/CVI

I am using the thread safe variable macros in the LabWindows/CVI environment and have observed that it is possible to get a pointer to a thread safe variable before it has been released. (from a previous request)
Because the data I am interesting in protecting is a struct, I am unable to explicitly set nesting level, so I assume that the nesting level stays at 0, i.e. that once a single thread safe pointer has been issued, a request for second will be denied until the first has been released. However, I have observed this not to be true while stepping through a debug session. Execution continues through the DefineThreadSafeVar(CLI, SafeCli); statement by continuing to use the F8 step-into key, and subsequent requests for a pointer to the thread-safe variable are granted without ever having released the original.
My expectations are that these macros should prevent access to a thread-safe variable once a pointer to it has been issued and not yet released.
Are my expectations incorrect?
Or have I implemented the calls incorrectly?
Here is my source code:
#include <utility.h>
typedef struct {
int hndl;
int connect;
int sock;
}CLI;
DefineThreadSafeVar(CLI, SafeCli);
void func1(void);
void func2(void);
int main(void)
{
InitializeSafeCli();
func1();
return 0;
}
void func1(void)
{
CLI *safe;
safe = GetPointerToSafeCli();//original issue
safe->connect = 2;
safe->hndl = 3;
safe->sock = 4;
func2();
safe->connect;
safe->hndl;
safe->sock;
ReleasePointerToSafeCli();
}
void func2(void)
{
CLI *safe;
safe = GetPointerToSafeCli();//request is granted. previous issue had not been released.
//shouldn't request have been denied ?
safe->connect = 5;//variable is modified.
safe->hndl = 6;
safe->sock = 7;
}
In your case, you are calling func2() within func1() and subsequently is within the same call stack and thus the same thread. You're granted access because you are requesting the pointer from within the same thread that already has access to the pointer.
GetPointerToSafeCli() is a waiting call. Had it been called from thread A and then again in thread B before ReleasePointerToSafeCli() was called back in thread A, thread B would wait until the pointer was released before granting access.
LabWindows/CVI - Programming with DefineThreadSafeScalarVar

How to sink rvalue in a path where it's not used

When a function takes an rvalue reference which it doesn't use in some branches, what should it do with the rvalue to maintain the semantic correctness of it's signature and to be consistent about the ownership of the data. Considering the sample code below:
#include <memory>
#include <list>
#include <iostream>
struct Packet {};
std::list<std::unique_ptr<Packet>> queue;
void EnQueue(bool condition, std::unique_ptr<Packet> &&pkt) {
if (condition) queue.push_back(std::move(pkt));
else /* How to consume the pkt? */;
}
int main()
{
std::unique_ptr<Packet> upkt1(new Packet());
std::unique_ptr<Packet> upkt2(new Packet());
EnQueue(true, std::move(upkt1));
EnQueue(false, std::move(upkt2));
std::cout << "raw ptr1: " << upkt1.get() << std::endl;
std::cout << "raw ptr2: " << upkt2.get() << std::endl;
return 0;
}
The signature of the Enqueue function indicates that it will take ownership of the data passed to it but this is only true if it hits the if path, instead if it hits the else path the function effectively doesn't use the rvalue and the ownership is returned back to the caller, which is illustrated by the fact that upkt2.get is not NULL after returning from the function. The net effect is that the behaviour of EnQueue is inconsistent with it's signature.
The question now is - whether this is an acceptable behaviour or should the EnQueue function be changed to be consistent, if so how?
I see three ways of dealing with this.
Document that after the function exits, "pkt is left in a valid, but unspecified state." That way, the caller cannot assume anything about the parameter afterwards; if they need it cleared no matter what, they can do it explicitly. If they don't, they will not pay for any internal cleanup they would not use.
If you want to make the signature 100% clear on accepting ownership, just take pkt by value instead of rvalue reference (as suggested by #Quentin in comments).
Construct a temporary from pkt:
if (condition) queue.push_back(std::move(pkt));
else auto sink(std::move(pkt));

OpenACC 2.0 routine: data locality

Take the following code, which illustrates the calling of a simple routine on the accelerator, compiled on the device using OpenACC 2.0's routine directive:
#include <iostream>
#pragma acc routine
int function(int *ARRAY,int multiplier){
int sum=0;
#pragma acc loop reduction(+:sum)
for(int i=0; i<10; ++i){
sum+=multiplier*ARRAY[i];
}
return sum;
}
int main(){
int *ARRAY = new int[10];
int multiplier = 5;
int out;
for(int i=0; i<10; i++){
ARRAY[i] = 1;
}
#pragma acc enter data create(out) copyin(ARRAY[0:10],multiplier)
#pragma acc parallel present(out,ARRAY[0:10],multiplier)
if (function(ARRAY,multiplier) == 50){
out = 1;
}else{
out = 0;
}
#pragma acc exit data copyout(out) delete(ARRAY[0:10],multiplier)
std::cout << out << std::endl;
}
How does function know to use the device copies of ARRAY[0:10] and multiplier when it is called from within a parallel region? How can we enforce the use of the device copies?
When your routine is called within a device region (the parallel in your code), it is being called by the threads on the device, which means those threads will only have access to arrays on the device. The compiler may actually choose to inline that function, or it may be a device-side function call. That means that you can know that when the function is called from the device it will be receiving device copies of the data because the function is essentially inheriting the present data clause from the parallel region. If you still want to convince yourself that you're running on the device once inside the function, you could call acc_on_device, but that only tells you that you're running on the accelerator, not that you received a device pointer.
If you want to enforce the use of device copies more than that, you could make the routine nohost so that it would technically not be valid to call from the host, but that doesn't really do what you're asking, which is to do a check on the GPU that the array really is a device array.
Keep in mind though that any code inside a parallel region that is not inside a loop will be run gang-redundantly, so the write to out is likely a race condition, unless you happen to be running with one gang or you write to it using an atomic.
Basically, when you involved "data" clause, the device will create/copy data to the device memory, then the block of code that defined with "acc routine" will be executed on the device. Notice that the memory between host and device does not share unlike multi-threading (OpenMP). So yes, "function" will be using the device copies of ARRAY and multiplier as long as it is under data segment. Hope this helps! :)
You should assign the function with one parallelism level such as gang/worker/vector. It's a more accurate way.
The routine will use the date in device memory.

Resources