Why does async() writing to a local (stack) variable not crash? - C++11

This is some code I wrote to test async() writing to local variables.
The function test uses async() to execute write_A(A* a) and uses future.wait_for() to wait for the result for a period of time. If it times out, test returns and A a2 is freed at the same time, because it is allocated on the stack. The write in write_A through A* a should then crash. But in fact the program works fine. Why can write_A, executed by async, write to a freed local stack variable?
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <unistd.h> // sleep()
using namespace std;
struct A {
    string name;
    string address;
};
int write_A(A* a) {
    sleep(3);
    a->name = "tractor";
    a->address = "unknow";
    cout << "write_A return" << endl;
    return 0;
}
void test(A* a) {
    A a2;
    future<int> fut = async(launch::async, write_A, &a2);
    auto status = fut.wait_for(chrono::milliseconds(1000));
    if (status == future_status::ready) {
        int ret = fut.get();
        *a = a2;
        cout << "succ" << endl;
    } else {
        cout << "timeout" << endl;
    }
}
void test2() {
    A a;
    test(&a);
}
int main()
{
    test2();
    sleep(5);
    return 0;
}
I expected the program to crash, because write_A writes through a pointer to an object that has been released by the time test returns. But the program outputs:
timeout
write_A return

The object
future<int> fut
is destroyed at the end of the test function. Because it was returned by async() with launch::async, its destructor blocks until the shared state is ready, which means until write_A has finished. So the pointer to a2 that write_A holds stays valid the whole time.
Local objects are destroyed in the reverse order of their creation, so in this case
A a2;
future<int> fut = async(launch::async, write_A, &a2);
fut is destroyed first, and its destructor waits until write_A has completed. Only then is a2 destroyed.
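The blocking destructor can be observed by timing the scope. This is a minimal sketch, not the asker's program: Record and elapsed_ms_when_future_goes_out_of_scope are hypothetical names, and the 300 ms / 50 ms values are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

struct Record {
    std::string name;
};

// Starts a slow writer on a local object, gives up waiting after 50 ms,
// then leaves the scope. Returns the total time spent, in milliseconds.
long long elapsed_ms_when_future_goes_out_of_scope() {
    using namespace std::chrono;
    const auto begin = steady_clock::now();
    {
        Record local;
        std::future<int> fut = std::async(std::launch::async, [&local] {
            std::this_thread::sleep_for(milliseconds(300));
            local.name = "tractor";  // still safe: local outlives the task
            return 0;
        });
        assert(fut.valid());
        fut.wait_for(milliseconds(50));  // gives up long before the task ends
        // Scope exit: fut is destroyed first and blocks until the task is
        // done; only afterwards is local destroyed.
    }
    return duration_cast<milliseconds>(steady_clock::now() - begin).count();
}
```

Although the wait gives up after 50 ms, the function should not return before the task's 300 ms sleep has finished. A future obtained from a std::promise or std::packaged_task does not block in its destructor, so this protection is specific to std::async.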

Member function captured by lambda asynchronously dispatch issue

As we know, if a lambda captures class members and is called asynchronously after the class object has been released, it should crash. But if the lambda captures "this" and calls this->memFunc() after the class object has been released, it seems to work fine. I cannot understand why it doesn't crash. See code v1.
class A
{
public:
void func()
{
std::ofstream myfile("example.txt");
if (myfile.is_open())
{
myfile << "Write from child thread.\n";
myfile.close();
}
else
{
std::cout << "Unable to open file";
}
}
void detach()
{
std::thread t([this]() {
std::this_thread::sleep_for(std::chrono::milliseconds(3000));
func();
});
t.detach();
}
};
int main()
{
{
A a;
a.detach();
}
std::cout << "main end" << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(5000));
return 0;
}
v2:
#define RETURN_FROM_LAMBDA_IF_DEAD(x) \
auto sharedThis = x.lock(); \
if(!sharedThis) \
return;
class A: public std::enable_shared_from_this<A>
{
public:
void func()
{
std::ofstream myfile("example.txt");
if (myfile.is_open())
{
myfile << "Write from child thread.\n";
myfile.close();
}
else
{
std::cout << "Unable to open file";
}
}
void detach()
{
std::thread t([weakThis = weak_from_this(), this]() {
RETURN_FROM_LAMBDA_IF_DEAD(weakThis);
std::this_thread::sleep_for(std::chrono::milliseconds(3000));
func();
});
t.detach();
}
};
int main()
{
{
A a;
a.detach();
}
std::cout << "main end" << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(5000));
return 0;
}
Class A has a member function detach() that creates a child thread. The child thread runs a lambda in which A's member function func() is called.
When the main thread prints "main end", the object a should already be released, so when the child thread calls a's func(), it should crash. But a's func() runs fine and "example.txt" is created successfully.
Why can a's func() be called even after a has been released?
To make sure a has been released by the time the child thread calls func(), I added a weak pointer check; see code v2.
This time, the child thread returns from the lambda directly and does not call func(). It means object a has indeed been released when the child thread starts to run.
Could anyone give some guidance?
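Two points may help here. Calling a member function on a destroyed object is undefined behavior, and a crash is not guaranteed; func() touches no data members, so nothing actually reads the released storage. Also, weak_from_this() (C++17) only yields a lockable weak_ptr when the object is owned by a std::shared_ptr. For a stack object such as A a; it is always empty, so the early return in v2 fires whether or not a is still alive. A minimal sketch of that behavior, with Widget and the helper functions as hypothetical names:

```cpp
#include <cassert>
#include <memory>

struct Widget : std::enable_shared_from_this<Widget> {};

// True if a weak_ptr obtained from the object can currently be locked.
bool can_lock(Widget& w) {
    return w.weak_from_this().lock() != nullptr;
}

bool stack_object_lockable() {
    Widget w;            // not owned by any shared_ptr
    return can_lock(w);  // weak_from_this() is empty even though w is alive
}

bool shared_object_lockable() {
    auto p = std::make_shared<Widget>();
    assert(p != nullptr);
    return can_lock(*p);  // owned by a shared_ptr, so lock() succeeds
}

bool expired_after_release() {
    auto p = std::make_shared<Widget>();
    std::weak_ptr<Widget> weak = p->weak_from_this();
    p.reset();            // last owner gone, the object is destroyed
    return weak.expired();
}
```

For the weak pointer check to say anything about lifetime, the object would have to be created with std::make_shared<A>() rather than on the stack.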

Why is atomic_thread_fence(memory_order_seq_cst) needed in a lock-free queue that already uses seq_cst CAS?

This is a lock-free queue in which only one thread executes push and pop, while other threads execute steal.
However, I can't understand why steal() needs std::atomic_thread_fence(std::memory_order_seq_cst).
As I see it, steal() has only one store operation, namely _top.compare_exchange_strong, and it already uses memory_order_seq_cst. So why does it need a seq_cst fence as well?
template <typename T>
class WorkStealingQueue {
public:
WorkStealingQueue() : _bottom(1), _top(1) { }
~WorkStealingQueue() { delete [] _buffer; }
int init(size_t capacity) {
if (capacity & (capacity - 1)) {
LOG(ERROR) << "Invalid capacity=" << capacity
<< " which must be power of 2";
return -1;
}
_buffer = new(std::nothrow) T[capacity];
_capacity = capacity;
return 0;
}
// Steal one item from the queue.
// Returns true on stolen.
// May run in parallel with push() pop() or another steal().
bool steal(T* val) {
size_t t = _top.load(std::memory_order_acquire);
size_t b = _bottom.load(std::memory_order_acquire);
if (t >= b) {
// Permit false negative for performance considerations.
return false;
}
do {
std::atomic_thread_fence(std::memory_order_seq_cst);
b = _bottom.load(std::memory_order_acquire);
if (t >= b) {
return false;
}
*val = _buffer[t & (_capacity - 1)];
} while (!_top.compare_exchange_strong(t, t + 1,
std::memory_order_seq_cst,
std::memory_order_relaxed));
return true;
}
// Pop an item from the queue.
// Returns true on popped and the item is written to `val'.
// May run in parallel with steal().
// Never run in parallel with push() or another pop().
bool pop(T* val) {
const size_t b = _bottom.load(std::memory_order_relaxed);
size_t t = _top.load(std::memory_order_relaxed);
if (t >= b) {
// fast check since we call pop() in each sched.
// Stale _top which is smaller should not enter this branch.
return false;
}
const size_t newb = b - 1;
_bottom.store(newb, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
t = _top.load(std::memory_order_relaxed);
if (t > newb) {
_bottom.store(b, std::memory_order_relaxed);
return false;
}
*val = _buffer[newb & (_capacity - 1)];
if (t != newb) {
return true;
}
// Single last element, compete with steal()
const bool popped = _top.compare_exchange_strong(
t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed);
_bottom.store(b, std::memory_order_relaxed);
return popped;
}
// Push an item into the queue.
// Returns true on pushed.
// May run in parallel with steal().
// Never run in parallel with pop() or another push().
bool push(const T& x) {
const size_t b = _bottom.load(std::memory_order_relaxed);
const size_t t = _top.load(std::memory_order_acquire);
if (b >= t + _capacity) { // Full queue.
return false;
}
_buffer[b & (_capacity - 1)] = x;
_bottom.store(b + 1, std::memory_order_release);
return true;
}
private:
DISALLOW_COPY_AND_ASSIGN(WorkStealingQueue);
std::atomic<size_t> _bottom;
size_t _capacity;
T* _buffer;
std::atomic<size_t> BAIDU_CACHELINE_ALIGNMENT _top;
};
You do not have to use a seq-cst-fence, but then you would have to make the operations on _bottom sequentially consistent. The reason is that it must be guaranteed that the load operation in steal sees the updated value written in pop. Otherwise you could have a race condition where the same item could be returned twice (once from pop and once from steal).
For comparison you can take a look at my implementation of the Chase-Lev-Deque: https://github.com/mpoeter/xenium/blob/master/xenium/chase_work_stealing_deque.hpp
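The kind of guarantee a pair of seq_cst fences provides can be illustrated with the classic store-buffering litmus test. This is a simplified two-variable stand-in, not the queue itself: x and y are hypothetical variables, and the shape (store, fence, load of the other variable) only mirrors what pop() does with _bottom and _top.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the litmus test. Returns true if both threads read 0,
// i.e. each one missed the other's store.
bool both_missed_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Counts how many rounds ended with both threads missing the other's store.
int count_both_missed(int rounds) {
    assert(rounds > 0);
    int missed = 0;
    for (int i = 0; i < rounds; ++i) {
        missed += both_missed_once() ? 1 : 0;
    }
    return missed;
}
```

The two fences are totally ordered, so the thread whose fence comes second must observe the other thread's store, and the count stays at zero. Without the fences (or with only acquire/release ordering), both loads could legally return 0, which is the analogue of pop and steal each missing the other's update.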

Stored lambda function calls are very slow - fix or workaround?

In an attempt to make a more usable version of the code I wrote for an answer to another question, I used a lambda function to process an individual unit. This is a work in progress. I've got the "client" syntax looking pretty nice:
// for loop split into 4 threads, calling doThing for each index
parloop(4, 0, 100000000, [](int i) { doThing(i); });
However, I have an issue. Whenever I call the saved lambda, it takes up a ton of CPU time. doThing itself is an empty stub. If I just comment out the internal call to the lambda, then the speed returns to normal (4 times speedup for 4 threads). I'm using std::function to save the reference to the lambda.
My question is - Is there some better way that the stl library internally manages lambdas for large sets of data, that I haven't come across?
struct parloop
{
public:
std::vector<std::thread> myThreads;
int numThreads, rangeStart, rangeEnd;
std::function<void (int)> lambda;
parloop(int _numThreads, int _rangeStart, int _rangeEnd, std::function<void(int)> _lambda) //
: numThreads(_numThreads), rangeStart(_rangeStart), rangeEnd(_rangeEnd), lambda(_lambda) //
{
init();
exit();
}
void init()
{
myThreads.resize(numThreads);
for (int i = 0; i < numThreads; ++i)
{
myThreads[i] = std::thread(myThreadFunction, this, chunkStart(i), chunkEnd(i));
}
}
void exit()
{
for (int i = 0; i < numThreads; ++i)
{
myThreads[i].join();
}
}
int rangeJump()
{
return ceil(float(rangeEnd - rangeStart) / float(numThreads));
}
int chunkStart(int i)
{
return rangeJump() * i;
}
int chunkEnd(int i)
{
return std::min(rangeJump() * (i + 1) - 1, rangeEnd);
}
static void myThreadFunction(parloop *self, int start, int end) //
{
std::function<void(int)> lambda = self->lambda;
// we're just going to loop through the numbers and print them out
for (int i = start; i <= end; ++i)
{
lambda(i); // commenting this out speeds things up back to normal
}
}
};
void doThing(int i) // "payload" of the lambda function
{
}
int main()
{
auto start = timer.now();
auto stop = timer.now();
// run 4 trials of each number of threads
for (int x = 1; x <= 4; ++x)
{
// test between 1-8 threads
for (int numThreads = 1; numThreads <= 8; ++numThreads)
{
start = timer.now();
// this is the line of code which calls doThing in the loop
parloop(numThreads, 0, 100000000, [](int i) { doThing(i); });
stop = timer.now();
cout << numThreads << " Time = " << std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count() / 1000000.0f << " ms\n";
//cout << "\t\tsimple list, time was " << deltaTime2 / 1000000.0f << " ms\n";
}
}
cin.ignore();
cin.get();
return 0;
}
I'm using std::function to save the reference to the lambda.
That's one possible problem, as std::function is not a zero-runtime-cost abstraction. It is a type-erased wrapper that has a virtual-call like cost when invoking operator() and could also potentially heap-allocate (which could mean a cache-miss per call).
If you want to store your lambda in such a way that does not introduce additional overhead and that allows the compiler to inline it, you should use a template parameter. This is not always possible, but might fit your use case. Example:
template <typename TFunction>
struct parloop
{
public:
std::thread **myThreads;
int numThreads, rangeStart, rangeEnd;
TFunction lambda;
parloop(TFunction&& _lambda,
int _numThreads, int _rangeStart, int _rangeEnd)
: lambda(std::move(_lambda)),
numThreads(_numThreads), rangeStart(_rangeStart),
rangeEnd(_rangeEnd)
{
init();
exit();
}
// ...
To deduce the type of the lambda, you can use a helper function:
template <typename TF, typename... TArgs>
auto make_parloop(TF&& lambda, TArgs&&... xs)
{
return parloop<std::decay_t<TF>>(
std::forward<TF>(lambda), std::forward<TArgs>(xs)...);
}
Usage:
auto p = make_parloop([](int i) { doThing(i); },
numThreads, 0, 100000000);
I wrote an article that's related to the subject:
"Passing functions to functions"
It contains some benchmarks that show how much assembly is generated for std::function compared to a template parameter and other solutions.
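The difference between the two ways of accepting a callable can be seen in a much smaller setting. A minimal sketch, with sum_erased, sum_templated and square as hypothetical names that are not part of the parloop code:

```cpp
#include <cassert>
#include <functional>

// Type-erased: every call goes through std::function's indirection,
// which the compiler usually cannot inline.
long long sum_erased(int begin, int end,
                     const std::function<long long(int)>& f) {
    assert(begin <= end);
    long long total = 0;
    for (int i = begin; i < end; ++i) total += f(i);
    return total;
}

// Templated: the concrete callable type is known at compile time,
// so the call can be inlined into the loop.
template <typename F>
long long sum_templated(int begin, int end, F&& f) {
    assert(begin <= end);
    long long total = 0;
    for (int i = begin; i < end; ++i) total += f(i);
    return total;
}

long long square(int i) { return static_cast<long long>(i) * i; }
```

Both versions compute the same result; only the call mechanism differs. With an empty payload such as doThing, that per-call overhead is essentially all that is being measured.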

ffmpeg av_read_frame() needs a very long time to stop

I use ffmpeg to decode RTSP video.
When it reaches the end of the file, it blocks in av_read_frame() for a long time. Why?
Various reasons can cause long blocking, but you can control the processing time at the I/O layer.
Use the AVFormatContext::interrupt_callback member to set an interrupt handler.
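// my_time_t, my_time_duration and my_get_local_time() are placeholders for your own clock helpers.
class timeout_handler {
public:
    explicit timeout_handler(unsigned int TimeoutMs)
        : timeout_ms_(TimeoutMs), lastTime_(my_get_local_time()) {}
    // Restart the countdown with a new timeout.
    void reset(unsigned int TimeoutMs) {
        timeout_ms_ = TimeoutMs;
        lastTime_ = my_get_local_time();
    }
    bool is_timeout() {
        const my_time_duration actualDelay = my_get_local_time() - lastTime_;
        return actualDelay > timeout_ms_;
    }
    // Matches the interrupt callback signature; a nonzero return aborts the blocking I/O.
    static int check_interrupt(void * t) {
        return t && static_cast<timeout_handler *>(t)->is_timeout();
    }
public:
    unsigned int timeout_ms_;
    my_time_t lastTime_;
};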
/// .................
AVFormatContext * ic;
timeout_handler * th = new timeout_handler(kDefaultTimeout);
/// .................
ic->interrupt_callback.opaque = (void*)th ;
ic->interrupt_callback.callback = &timeout_handler::check_interrupt;
/// open input
// avformat_open_input(&ic, ... );
// etc
/// .................
/// before any I/O operations, for example:
th->reset(kDefaultTimeout);
int e = AVERROR(EAGAIN);
while (AVERROR(EAGAIN) == e)
e = av_read_frame(ic, &packet);
// If the time exceeds the limit, the process is interrupted at the next I/O operation.
This problem occurs because av_read_frame() gets stuck in an endless network loop.
I had the same problem and solved it with an interrupt callback; please refer to the sample code.
First initialize your context and set the interrupt callback:
AVFormatContext *_formatCtx;
//Initialize format context
_formatCtx=avformat_alloc_context();
//Initialize interrupt callback
AVIOInterruptCB icb={interruptCallBack,(__bridge void *)(self)};
_formatCtx->interrupt_callback=icb;
Now handle the interrupt in your callback:
int interruptCallBack(void *ctx){
//once your preferred time is out you can return 1 and exit from the loop
if(timeout){
//exit
return 1;
}
//continue
return 0;
}
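The timeout flag in the callback above is left undefined. One way to compute it is a deadline based on std::chrono::steady_clock. This is a sketch under assumptions: Deadline and interrupt_on_deadline are hypothetical names, and a pointer to the Deadline object would be what you store in interrupt_callback.opaque.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical deadline object whose address is passed as the opaque pointer.
struct Deadline {
    std::chrono::steady_clock::time_point expires_at;

    // Call before each blocking I/O operation to restart the countdown.
    void arm(std::chrono::milliseconds budget) {
        assert(budget.count() >= 0);
        expires_at = std::chrono::steady_clock::now() + budget;
    }

    bool expired() const {
        return std::chrono::steady_clock::now() >= expires_at;
    }
};

// Same shape as the callbacks in the answers: int(void*), where a nonzero
// return asks the blocking operation to abort.
int interrupt_on_deadline(void* opaque) {
    const Deadline* deadline = static_cast<const Deadline*>(opaque);
    return (deadline != nullptr && deadline->expired()) ? 1 : 0;
}
```

steady_clock is used rather than a wall clock so that system time adjustments cannot trigger or suppress the timeout.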

Sharing an object between threads

How would you set the object data that is shared between threads and needs to be updated once after the complete cycle of (say) two threads in busy loop?
CRITICAL_SECTION critical_section_;
int value; //needs to be updated once after the cycle of any number of threads running in busy loop
void ThreadsFunction(int i)
{
while (true)
{
EnterCriticalSection(&critical_section_);
/* Lines of Code */
LeaveCriticalSection(&critical_section_);
}
}
Edit: The value can be an object of any class.
Two suggestions:
Make the object itself thread safe.
Pass the object into the thread as instance data
I'll use C++ as a reference in my example. You can easily transpose this to pure C if you want.
// MyObject is the core data you want to share between threads
struct MyObject
{
    int value;
    int othervalue;
    // add all the other members you want here
};
class MyThreadSafeObject
{
private:
    CRITICAL_SECTION _cs;
    MyObject _myobject;
    bool _fLocked;
public:
    MyThreadSafeObject()
    {
        _fLocked = false;
        InitializeCriticalSection(&_cs);
    }
    ~MyThreadSafeObject()
    {
        DeleteCriticalSection(&_cs);
    }
    // add "getter and setter" methods for each member in MyObject
    void SetValue(int x)
    {
        EnterCriticalSection(&_cs);
        _myobject.value = x;
        LeaveCriticalSection(&_cs);
    }
    int GetValue()
    {
        int x;
        EnterCriticalSection(&_cs);
        x = _myobject.value;
        LeaveCriticalSection(&_cs);
        return x;
    }
    void SetOtherValue(int x)
    {
        EnterCriticalSection(&_cs);
        _myobject.othervalue = x;
        LeaveCriticalSection(&_cs);
    }
    int GetOtherValue()
    {
        int x;
        EnterCriticalSection(&_cs);
        x = _myobject.othervalue;
        LeaveCriticalSection(&_cs);
        return x;
    }
    // and if you need to access the whole object directly without using a critsec lock on each variable access, add lock/unlock methods
    bool Lock(MyObject** ppObject)
    {
        EnterCriticalSection(&_cs);
        *ppObject = &_myobject;
        _fLocked = true;
        return true;
    }
    bool UnLock()
    {
        if (_fLocked == false)
            return false;
        _fLocked = false;
        LeaveCriticalSection(&_cs);
        return true;
    }
};
Then, create your object and thread as follows:
MyThreadSafeObject* pObjectThreadSafe = new MyThreadSafeObject();
MyObject* pObject = NULL;
// now initialize your object
pObjectThreadSafe->Lock(&pObject);
pObject->value = 0; // initialize value and all the other members of pObject to what you want them to be.
pObject->othervalue = 0;
pObjectThreadSafe->UnLock();
pObject = NULL;
// Create your threads, passing the pointer to MyThreadSafeObject as your instance data
DWORD dwThreadID = 0;
HANDLE hThread = CreateThread(NULL, 0, ThreadFunction, pObjectThreadSafe, 0, &dwThreadID);
And your thread will operate as follows
DWORD __stdcall ThreadFunction(void* pData)
{
    MyThreadSafeObject* pObjectThreadSafe = (MyThreadSafeObject*)pData;
    int x = 0; // placeholder for the value your code computes
    while (true)
    {
        /* lines of code */
        pObjectThreadSafe->SetValue(x);
        /* lines of code */
    }
}
If you want to implement a thread-safe update of an integer, you are better off using the InterlockedIncrement, InterlockedDecrement or InterlockedExchangeAdd functions. See http://msdn.microsoft.com/en-us/library/ms684122(VS.85).aspx.
If you do need to use EnterCriticalSection and LeaveCriticalSection, you will find an example at http://msdn.microsoft.com/en-us/library/ms686908(v=VS.85).aspx, but I recommend calling EnterCriticalSection inside a __try block and LeaveCriticalSection in the __finally part of that block.
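For readers not tied to the Win32 API, the same two ideas have standard C++ counterparts. This sketch swaps in std::atomic for the Interlocked functions and std::lock_guard for the __try/__finally pairing; it is an analogue under that assumption, not the Win32 code itself, and count_with_atomic and count_with_mutex are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Standard C++ stand-in for InterlockedIncrement: an atomic counter.
int count_with_atomic(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Standard C++ stand-in for a critical section: lock_guard releases the
// mutex on scope exit, even if the guarded code throws.
int count_with_mutex(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    int counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &m, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The atomic version fits a single integer; the mutex version generalizes to an object of any class, which matches the edit in the question.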
