C++ class with static mutex & race condition - c++11

I have this C++ class with a static mutex as a private member, used to protect cout in a public member function. But when I call that function on an object of the class from two threads, I get a race condition. Not sure why?
class ThreadSafePrint
{
public:
    void myprint(int threadNumber)
    {
        std::lock_guard<std::mutex> gaurd(mymutex);
        cout << "Thread " << threadNumber << endl;
    }
private:
    static std::mutex mymutex;
};

std::mutex ThreadSafePrint::mymutex;

int main()
{
    ThreadSafePrint obj;
    std::vector<std::thread> workers;
    int threadNumber;
    // create 2 threads and pass a number
    for(int i=0; i<2; ++i)
    {
        // threadNumber = 0 for 1st thread
        if(i==0)
        {
            threadNumber = i;
        }
        // threadNumber = 1 for 2nd thread
        if(i==1)
        {
            threadNumber = i;
        }
        workers.push_back(std::thread([&obj,&threadNumber]()
        {
            obj.myprint(threadNumber);
        }));
    }
    // join all threads
    std::for_each(workers.begin(), workers.end(), [](std::thread & th)
    {
        th.join();
    });
    return 0;
}
Here are some results:
>> ./mythreads
Thread 1
Thread 1
>> ./mythreads
Thread 0
Thread 0

You capture a reference to the local variable threadNumber in two worker threads, access it in both threads, and mutate it in the main thread without any synchronisation. This is indeed a race condition. Capture by value instead.
workers.push_back(std::thread([&obj, threadNumber]()

You have to capture threadNumber by value, not by reference.
Replace:
workers.push_back(std::thread([&obj,&threadNumber]()
with:
workers.push_back(std::thread([&obj,threadNumber]()
Otherwise the second loop iteration will also modify the threadNumber that the first thread reads through its reference.
#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
#include <algorithm>

class ThreadSafePrint
{
public:
    void myprint(int threadNumber)
    {
        std::lock_guard<std::mutex> gaurd(mymutex);
        std::cout << "Thread " << threadNumber << std::endl;
    }
private:
    static std::mutex mymutex;
};

std::mutex ThreadSafePrint::mymutex;

int main()
{
    ThreadSafePrint obj;
    std::vector<std::thread> workers;
    int threadNumber;
    // create 2 threads and pass a number
    for(int i=0; i<2; ++i)
    {
        // threadNumber = 0 for 1st thread
        if(i==0)
        {
            threadNumber = i;
        }
        // threadNumber = 1 for 2nd thread
        if(i==1)
        {
            threadNumber = i;
        }
        workers.push_back(std::thread([&obj,threadNumber]()
        {
            obj.myprint(threadNumber);
        }));
    }
    // join all threads
    std::for_each(workers.begin(), workers.end(), [](std::thread & th)
    {
        th.join();
    });
    return 0;
}

When you create your threads, you explicitly ask the compiler to provide the thread with access to the same instance of the variable threadNumber that the main function/thread is using.
[&threadNumber]
Again: this is an explicit share.
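For illustration (an editorial addition, not part of the original answer), here is a minimal single-threaded sketch of the difference between the two capture modes; the names are arbitrary:
#include <iostream>

int main()
{
    int n = 0;
    auto byRef = [&n]() { return n; }; // shares the variable: sees whatever n is when called
    auto byVal = [n]()  { return n; }; // copies n at this point (0)
    n = 42;
    std::cout << byRef() << "\n"; // prints 42
    std::cout << byVal() << "\n"; // prints 0
    return 0;
}
With [&threadNumber] every worker reads the same shared object while main keeps writing to it; with [threadNumber] each closure gets its own copy taken at the moment the thread is created.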
Indeed, your code suggests that you might want to get a better grasp of the language before experimenting with threading; this code is very strange:
int threadNumber;
// create 2 threads and pass a number
for(int i=0; i<2; ++i)
{
    // threadNumber = 0 for 1st thread
    if(i==0)
    {
        threadNumber = i;
    }
    // threadNumber = 1 for 2nd thread
    if(i==1)
    {
        threadNumber = i;
    }
It's unclear why anyone would write this instead of:
for (int i = 0; i < 2; ++i) {
    workers.push_back(std::thread([&obj, i] () {
        obj.myprint(i);
    }));
}
Even this still has a number of design oddities: why are you passing obj by reference? It's an empty class with one static member; you could just as easily avoid the capture and write:
for (int i = 0; i < 2; ++i) {
    workers.emplace_back([] (int threadNumber) {
        ThreadSafePrint obj;
        obj.myprint(threadNumber);
    }, i); // pass `i` -> `threadNumber`
}


C++11 std::threads not exiting

Could you please check the following code, which is not exiting even after the condition becomes false?
I'm trying to print the numbers 1 to 10 from the first thread, 11 to 20 from the second thread, and so on; I have 10 threads, and whenever count reaches 100 the program should terminate safely by ending all the threads. But that is not happening: after printing, it gets stuck and I don't understand why.
Is there a data race? Please guide.
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mu;
int count = 1;
bool isDone = true;
std::condition_variable cv;

void Print10(int tid)
{
    std::unique_lock<std::mutex> lock(mu);
    while(isDone){
        cv.wait(lock, [tid](){ return ((count/10) == tid); });
        for(int i = 0; i < 10; i++)
            std::cout << "tid=" << tid << " count=" << count++ << "\n";
        isDone = count < 100; //!(count == (((tid+1)*10)+1));
        std::cout << "tid=" << tid << " isDone=" << isDone << "\n";
        cv.notify_all();
    }
}

int main()
{
    std::vector<std::thread> vec;
    for(int i = 0; i < 10; i++)
    {
        vec.push_back(std::thread(Print10, i));
    }
    for(auto &th : vec)
    {
        if(th.joinable())
            th.join();
    }
}
I believe the following code should work for you
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>

using namespace std;

mutex mu;
int count = 1;
bool isDone = true;
condition_variable cv;

void Print10(int tid)
{
    unique_lock<std::mutex> lock(mu);
    // Wait until condition --> wait till count/10 == tid
    while(count/10 != tid)
        cv.wait(lock);
    // Core logic
    for(int i = 0; i < 10; i++)
        cout << "tid=" << tid << " count=" << count++ << "\n";
    // Wake all waiting threads; only the one whose turn has come leaves its
    // wait loop (notify_one could wake the wrong thread, which would go back
    // to waiting with nobody left to notify it, stalling the program)
    cv.notify_all();
}

int main()
{
    std::vector<std::thread> vec;
    for(int i = 0; i < 10; i++)
    {
        vec.push_back(std::thread(Print10, i));
    }
    for(auto &th : vec)
    {
        if(th.joinable())
            th.join();
    }
    return 0;
}

No data while CPU profiling - Visual Studio

I tried to profile the performance of my code, and this is what I get (screenshot omitted; the profiler shows no data).
I took the code from the Microsoft docs topic about profiling:
#include <iostream>
#include <limits>
#include <mutex>
#include <random>
#include <functional>
#include <thread>   // needed for std::thread
#include <vector>   // needed for std::vector

//.cpp file code:
static constexpr int MIN_ITERATIONS = std::numeric_limits<int>::max() / 1000;
static constexpr int MAX_ITERATIONS = MIN_ITERATIONS + 10000;

long long m_totalIterations = 0;
std::mutex m_totalItersLock;

int getNumber()
{
    std::uniform_int_distribution<int> num_distribution(MIN_ITERATIONS, MAX_ITERATIONS);
    std::mt19937 random_number_engine; // pseudorandom number generator
    auto get_num = std::bind(num_distribution, random_number_engine);
    int random_num = get_num();
    auto result = 0;
    {
        std::lock_guard<std::mutex> lock(m_totalItersLock);
        m_totalIterations += random_num;
    }
    // we're just spinning here
    // to increase CPU usage
    for (int i = 0; i < random_num; i++)
    {
        result = get_num();
    }
    return result;
}

void doWork()
{
    std::wcout << L"The doWork function is running on another thread." << std::endl;
    auto x = getNumber();
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; ++i) {
        threads.push_back(std::thread(doWork));
        std::cout << "The Main() thread calls this after starting the new thread" << std::endl;
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}
Still, I'm getting different output (or no output, actually). Can someone help me, please? I'm trying to do this on Visual Studio Community 2019.

C++ random set seed failed

I am trying to set the seed of the C++ std::default_random_engine:
#include <random>
#include <time.h>
#include <iostream>

using namespace std;

void print_rand();

int main() {
    for (int i{0}; i < 20; ++i) {
        print_rand();
    }
    return 0;
}

void print_rand() {
    default_random_engine e;
    e.seed(time(0));
    cout << e() << endl;
}
It seems that the printed numbers are all the same. How can I set the seed so that the generated numbers vary with the time?
You have to seed only once instead of every time the function is called. Then you will get different values. I will move the functionality to main() to demonstrate this.
#include <random>
#include <time.h>
#include <iostream>

int main() {
    std::default_random_engine e;
    e.seed(time(0));
    for (int i{0}; i < 20; ++i) {
        std::cout << e() << std::endl;
    }
    return 0;
}
See Live Demo
As #P.W. said, you should seed only once. A minimal change in that direction would be using a static variable with the seed given to the constructor:
#include <random>
#include <time.h>
#include <iostream>

void print_rand();

int main() {
    for (int i{0}; i < 20; ++i) {
        print_rand();
    }
    return 0;
}

void print_rand() {
    static std::default_random_engine e(time(0));
    std::cout << e() << std::endl;
}
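As a further, purely editorial note (not part of either answer): if you want to avoid depending on the clock at all, the engine can be seeded once from std::random_device. A minimal sketch:
#include <random>
#include <iostream>

void print_rand() {
    // Seed exactly once, from the OS entropy source instead of time(0).
    static std::default_random_engine e(std::random_device{}());
    std::cout << e() << std::endl;
}

int main() {
    for (int i{0}; i < 20; ++i) {
        print_rand();
    }
    return 0;
}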

Stored lambda function calls are very slow - fix or workaround?

In an attempt to make a more usable version of the code I wrote for an answer to another question, I used a lambda function to process an individual unit. This is a work in progress. I've got the "client" syntax looking pretty nice:
// for loop split into 4 threads, calling doThing for each index
parloop(4, 0, 100000000, [](int i) { doThing(i); });
However, I have an issue. Whenever I call the saved lambda, it takes up a ton of CPU time. doThing itself is an empty stub. If I just comment out the internal call to the lambda, then the speed returns to normal (4 times speedup for 4 threads). I'm using std::function to save the reference to the lambda.
My question is: is there some better way that the standard library manages lambdas internally for large sets of data that I haven't come across?
// Includes and the timer object were missing from the posted fragment; added here so it compiles.
#include <iostream>
#include <vector>
#include <thread>
#include <functional>
#include <algorithm>
#include <cmath>
#include <chrono>

using namespace std;

std::chrono::high_resolution_clock timer; // clock used for the measurements below

struct parloop
{
public:
    std::vector<std::thread> myThreads;
    int numThreads, rangeStart, rangeEnd;
    std::function<void (int)> lambda;

    parloop(int _numThreads, int _rangeStart, int _rangeEnd, std::function<void(int)> _lambda) //
        : numThreads(_numThreads), rangeStart(_rangeStart), rangeEnd(_rangeEnd), lambda(_lambda) //
    {
        init();
        exit();
    }

    void init()
    {
        myThreads.resize(numThreads);
        for (int i = 0; i < numThreads; ++i)
        {
            myThreads[i] = std::thread(myThreadFunction, this, chunkStart(i), chunkEnd(i));
        }
    }

    void exit()
    {
        for (int i = 0; i < numThreads; ++i)
        {
            myThreads[i].join();
        }
    }

    int rangeJump()
    {
        return ceil(float(rangeEnd - rangeStart) / float(numThreads));
    }

    int chunkStart(int i)
    {
        return rangeJump() * i;
    }

    int chunkEnd(int i)
    {
        return std::min(rangeJump() * (i + 1) - 1, rangeEnd);
    }

    static void myThreadFunction(parloop *self, int start, int end) //
    {
        std::function<void(int)> lambda = self->lambda;
        // we're just going to loop through the numbers and print them out
        for (int i = start; i <= end; ++i)
        {
            lambda(i); // commenting this out speeds things up back to normal
        }
    }
};

void doThing(int i) // "payload" of the lambda function
{
}

int main()
{
    auto start = timer.now();
    auto stop = timer.now();
    // run 4 trials of each number of threads
    for (int x = 1; x <= 4; ++x)
    {
        // test between 1-8 threads
        for (int numThreads = 1; numThreads <= 8; ++numThreads)
        {
            start = timer.now();
            // this is the line of code which calls doThing in the loop
            parloop(numThreads, 0, 100000000, [](int i) { doThing(i); });
            stop = timer.now();
            cout << numThreads << " Time = " << std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count() / 1000000.0f << " ms\n";
            //cout << "\t\tsimple list, time was " << deltaTime2 / 1000000.0f << " ms\n";
        }
    }
    cin.ignore();
    cin.get();
    return 0;
}
I'm using std::function to save the reference to the lambda.
That's one possible problem, as std::function is not a zero-runtime-cost abstraction. It is a type-erased wrapper that has a virtual-call-like cost when invoking operator() and could also potentially heap-allocate (which could mean a cache miss per call).
If you want to store your lambda in such a way that does not introduce additional overhead and that allows the compiler to inline it, you should use a template parameter. This is not always possible, but might fit your use case. Example:
template <typename TFunction>
struct parloop
{
public:
    std::thread **myThreads;
    int numThreads, rangeStart, rangeEnd;
    TFunction lambda;

    parloop(TFunction&& _lambda,
            int _numThreads, int _rangeStart, int _rangeEnd)
        : lambda(std::move(_lambda)),
          numThreads(_numThreads), rangeStart(_rangeStart),
          rangeEnd(_rangeEnd)
    {
        init();
        exit();
    }
    // ...
To deduce the type of the lambda, you can use a helper function:
template <typename TF, typename... TArgs>
auto make_parloop(TF&& lambda, TArgs&&... xs)
{
    return parloop<std::decay_t<TF>>(
        std::forward<TF>(lambda), std::forward<TArgs>(xs)...);
}
Usage:
auto p = make_parloop([](int i) { doThing(i); },
                      numThreads, 0, 100000000);
I wrote an article that's related to the subject:
"Passing functions to functions"
It contains some benchmarks that show how much assembly is generated for std::function compared to a template parameter and other solutions.
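To make the idea concrete, here is a minimal self-contained sketch of the template-parameter approach (an editorial illustration that reuses the chunking idea from the question with simplified arithmetic; it is not the article's code):
#include <algorithm>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

template <typename TFunction>
struct parloop
{
    std::vector<std::thread> myThreads;
    int numThreads, rangeStart, rangeEnd;
    TFunction lambda; // concrete closure type: calls can be inlined

    parloop(TFunction&& _lambda, int _numThreads, int _rangeStart, int _rangeEnd)
        : numThreads(_numThreads), rangeStart(_rangeStart), rangeEnd(_rangeEnd),
          lambda(std::move(_lambda))
    {
        // split [rangeStart, rangeEnd) into numThreads roughly equal chunks
        int chunk = (rangeEnd - rangeStart + numThreads - 1) / numThreads;
        for (int t = 0; t < numThreads; ++t)
        {
            int start = rangeStart + t * chunk;
            int end = std::min(start + chunk, rangeEnd);
            myThreads.emplace_back([this, start, end]
            {
                for (int i = start; i < end; ++i)
                    lambda(i); // direct call through the concrete closure type
            });
        }
        for (auto& th : myThreads)
            th.join();
    }
};

template <typename TF, typename... TArgs>
auto make_parloop(TF&& lambda, TArgs&&... xs)
{
    return parloop<std::decay_t<TF>>(
        std::forward<TF>(lambda), std::forward<TArgs>(xs)...);
}

void doThing(int) {} // empty payload, as in the question

int main()
{
    make_parloop([](int i) { doThing(i); }, 4, 0, 100000000);
    return 0;
}
Because the closure's concrete type is a template parameter, the compiler can inline the call inside the hot loop, which is exactly what the type-erased std::function tends to prevent.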

boost::variant vs. polymorphism, very different performance results with clang and gcc

I'm trying to figure out how much the execution time of boost::variant differs from a polymorphism approach. In my first test I got very different results with gcc 4.9.1 and clang+llvm 3.5.
You can find the code below. Here are my results:
clang+llvm
polymorphism: 2.16401
boost::variant: 3.83487
gcc:
polymorphism: 2.46161
boost::variant: 1.33326
I compiled both with -O3.
Is someone able to explain that?
code
#include <iostream>
#include <vector>
#include <algorithm>
#include <memory>   // needed for std::unique_ptr
#include <boost/variant.hpp>
#include <boost/variant/apply_visitor.hpp>
#include <ctime>

struct value_type {
    value_type() {}
    virtual ~value_type() {}
    virtual void inc() = 0;
};

struct int_type : value_type {
    int_type() : value_type() {}
    virtual ~int_type() {}
    void inc() { value += 1; }
private:
    int value = 0;
};

struct float_type : value_type {
    float_type() : value_type() {}
    virtual ~float_type() {}
    void inc() { value += 1; }
private:
    float value = 0;
};

void dyn_test() {
    std::vector<std::unique_ptr<value_type>> v;
    for (int i = 0; i < 1024; i++) {
        if (i % 2 == 0)
            v.emplace_back(new int_type());
        else
            v.emplace_back(new float_type());
    }
    for (int i = 0; i < 900000; i++) {
        std::for_each(v.begin(), v.end(), [](auto &item) { item->inc(); });
    }
}

struct visitor : boost::static_visitor<> {
    template <typename T> void operator()(T &item) { item += 1; }
};

using mytype = boost::variant<int, float>;

void static_test() {
    std::vector<mytype> v;
    for (int i = 0; i < 1024; i++) {
        if (i % 2 == 0)
            v.emplace_back(0);
        else
            v.emplace_back(0.f);
    }
    visitor vi;
    for (int i = 0; i < 900000; i++) {
        std::for_each(v.begin(), v.end(), boost::apply_visitor(vi));
    }
}

template <typename F> double measure(F f) {
    clock_t start = clock();
    f();
    clock_t end = clock();
    float seconds = (float)(end - start) / CLOCKS_PER_SEC;
    return seconds;
}

int main() {
    std::cout << "polymorphism: " << measure([] { dyn_test(); }) << std::endl;
    std::cout << "boost::variant: " << measure([] { static_test(); }) << std::endl;
    return 0;
}
assembler
gcc
clang+llvm
Clang is known to miscompile some std::vector functions from various Standard libraries, due to some edge cases in their inliner. I don't know if those have been fixed by now but quite possibly not. Since unique_ptr is smaller and simpler than boost::variant it's more likely that it does not trigger these edge cases.
The code you post is practically "Why boost::variant is great". The polymorphic version adds a dynamic allocation per element and a pointer chase through scattered heap memory, on top of the indirections that both approaches perform. That's a heavy hit (relatively).
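To illustrate the layout difference that answer points at (an editorial sketch, not part of the original answer; the base/derived names are made up for illustration): a vector of variants keeps its values inline in one contiguous buffer, while a vector of unique_ptr keeps only pointers, each pointing at a separately allocated object.
#include <boost/variant.hpp>
#include <iostream>
#include <memory>
#include <vector>

struct base { virtual ~base() = default; virtual void inc() = 0; };
struct derived : base { int v = 0; void inc() override { v += 1; } };

int main() {
    // Values live directly inside the vector's contiguous buffer.
    std::vector<boost::variant<int, float>> inline_storage(4, boost::variant<int, float>(0));

    // Only pointers live in the buffer; every element is a separate heap allocation.
    std::vector<std::unique_ptr<base>> indirect_storage;
    for (int i = 0; i < 4; ++i)
        indirect_storage.emplace_back(new derived());

    for (auto& p : indirect_storage)
        p->inc(); // one pointer chase plus one virtual dispatch per element

    std::cout << "sizeof(boost::variant<int, float>) = "
              << sizeof(boost::variant<int, float>) << "\n"
              << "sizeof(std::unique_ptr<base>)      = "
              << sizeof(std::unique_ptr<base>) << "\n";
    return 0;
}
The variant stores its current value in place (roughly the size of its largest alternative plus a discriminator), which is why iterating over the variant vector tends to be far more cache-friendly than chasing one heap pointer per element.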
