G++ -cilkplus random behavior with std::vectors - c++11

The following (reduced) code is handled very badly by a series of GCC versions:
#include <vector>
#include <cilk/cilk.h>
void walk(std::vector<int> v, int bnd, unsigned size) {
if (v.size() < size)
for (int i=0; i<bnd; i++) {
std::vector<int> vnew(v);
vnew.push_back(i);
cilk_spawn walk(vnew, bnd, size);
}
}
int main(int argc, char **argv) {
std::vector<int> v{};
walk(v , 5, 5);
}
Specifically:
G++ 5.3.1 crashes with an internal compiler error:
red.cpp: In function ‘<built-in>’:
red.cpp:20:39: internal compiler error: in lower_stmt, at gimple-low.c:397
cilk_spawn walk(vnew, bnd, size);
G++ 6.3.1 produces code that works perfectly well when executed on one core,
but with more cores it sometimes segfaults and sometimes reports a double free. A student
running g++ 7 on Arch Linux reported a similar result.
My question: is there something wrong with this code? Am I invoking some
undefined behavior, or is it simply a bug I should report?

Answering my own question:
According to https://gcc.gnu.org/ml/gcc-help/2017-03/msg00078.html, it is indeed a bug in GCC. In a cilk_spawn, the temporary is destroyed in the parent and not in the child. So if the fork into a separate thread really occurs, the temporary might be destroyed too early.

Related

warning #3180: unrecognized OpenMP #pragma

I am having a really hard time getting OpenMP code to work on my Mac, compiling from the Terminal with the icc compiler. I get the warning shown in the title. Please help me correct it.
The code is pasted below. It never works for the OpenMP for construct, nor for reductions. The pragma is simply not recognized. I would appreciate it if you tried the code yourself.
#include <stdio.h>
#include <omp.h>
int main()
{
#pragma omp parallel for
{
for(int i=0;i<3;i++)
{
printf("Hello");
}
}
return 0;
}
To add to my comment, the correct version of the code is
#include <stdio.h>
#include <omp.h>
int main()
{
#pragma omp parallel for
for(int i=0;i<3;i++)
{
printf("Hello");
}
return 0;
}
The proper compiler command line is icc -fopenmp ... -o bla.exe bla.c (assuming that the file is named bla.c). Please replace ... with the other command line options that you will need for your code to compile.
UPDATE: The proper compiler command line for the new OpenMP compilers from Intel is to use -fiopenmp (needs -fopenmp-targets=spir64 for GPUs).
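A quick way to check whether the OpenMP flag actually took effect is to test the standard `_OPENMP` macro, which an OpenMP-capable compiler defines only when OpenMP support is switched on. The sketch below is written in C++ rather than C, and the names `sum_to` and `openmp_enabled` are purely illustrative. The loop gives the same answer whether or not the pragma is honoured, so the same file can be built with and without the flag for comparison.

```cpp
// Hypothetical helper: sums 0..n-1, in parallel if OpenMP is enabled.
long sum_to(int n) {
    long total = 0;
    // Without an OpenMP flag this pragma is ignored (possibly with a
    // warning) and the loop simply runs serially.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Reports whether this translation unit was compiled with OpenMP enabled.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}
```

If `openmp_enabled()` returns false, the pragma was not processed, which points at the command line rather than at the code.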

Why does ThreadSanitizer report a race with this lock-free example?

I've boiled this down to a simple self-contained example. The main thread enqueues 1000 items, and a worker thread tries to dequeue concurrently. ThreadSanitizer complains that there's a race between the read and the write of one of the elements, even though there is an acquire-release memory barrier sequence protecting them.
#include <atomic>
#include <thread>
#include <cassert>
struct FakeQueue
{
int items[1000];
std::atomic<int> m_enqueueIndex;
int m_dequeueIndex;
FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
void enqueue(int x)
{
auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
items[tail] = x; // <- element written
m_enqueueIndex.store(tail + 1, std::memory_order_release);
}
bool try_dequeue(int& x)
{
auto tail = m_enqueueIndex.load(std::memory_order_acquire);
assert(tail >= m_dequeueIndex);
if (tail == m_dequeueIndex)
return false;
x = items[m_dequeueIndex]; // <- element read -- tsan says race!
++m_dequeueIndex;
return true;
}
};
FakeQueue q;
int main()
{
std::thread th([&]() {
int x;
for (int i = 0; i != 1000; ++i)
q.try_dequeue(x);
});
for (int i = 0; i != 1000; ++i)
q.enqueue(i);
th.join();
}
ThreadSanitizer output:
==================
WARNING: ThreadSanitizer: data race (pid=17220)
Read of size 4 at 0x0000006051c0 by thread T1:
#0 FakeQueue::try_dequeue(int&) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 (issue49+0x000000402bcd)
#1 main::{lambda()#1}::operator()() const <null> (issue49+0x000000401132)
#2 _M_invoke<> /usr/include/c++/5.3.1/functional:1531 (issue49+0x0000004025e3)
#3 operator() /usr/include/c++/5.3.1/functional:1520 (issue49+0x0000004024ed)
#4 _M_run /usr/include/c++/5.3.1/thread:115 (issue49+0x00000040244d)
#5 <null> <null> (libstdc++.so.6+0x0000000b8f2f)
Previous write of size 4 at 0x0000006051c0 by main thread:
#0 FakeQueue::enqueue(int) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:16 (issue49+0x000000402a90)
#1 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:44 (issue49+0x000000401187)
Location is global 'q' of size 4008 at 0x0000006051c0 (issue49+0x0000006051c0)
Thread T1 (tid=17222, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027a67)
#1 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) <null> (libstdc++.so.6+0x0000000b9072)
#2 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:41 (issue49+0x000000401168)
SUMMARY: ThreadSanitizer: data race /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 FakeQueue::try_dequeue(int&)
==================
ThreadSanitizer: reported 1 warnings
Command line:
g++ -std=c++11 -O0 -g -fsanitize=thread issue49.cpp -o issue49 -pthread
g++ version: 5.3.1
Can anybody shed some light onto why tsan thinks this is a data race?
UPDATE
It seems like this is a false positive. To appease ThreadSanitizer, I've added annotations (see here for the supported ones and here for an example). Note that detecting whether tsan is enabled in GCC via a macro has only recently been added, so I had to manually pass -D__SANITIZE_THREAD__ to g++ for now.
#if defined(__SANITIZE_THREAD__)
#define TSAN_ENABLED
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define TSAN_ENABLED
#endif
#endif
#ifdef TSAN_ENABLED
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr) \
AnnotateHappensBefore(__FILE__, __LINE__, (void*)(addr))
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr) \
AnnotateHappensAfter(__FILE__, __LINE__, (void*)(addr))
extern "C" void AnnotateHappensBefore(const char* f, int l, void* addr);
extern "C" void AnnotateHappensAfter(const char* f, int l, void* addr);
#else
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr)
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr)
#endif
struct FakeQueue
{
int items[1000];
std::atomic<int> m_enqueueIndex;
int m_dequeueIndex;
FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
void enqueue(int x)
{
auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
items[tail] = x;
TSAN_ANNOTATE_HAPPENS_BEFORE(&items[tail]);
m_enqueueIndex.store(tail + 1, std::memory_order_release);
}
bool try_dequeue(int& x)
{
auto tail = m_enqueueIndex.load(std::memory_order_acquire);
assert(tail >= m_dequeueIndex);
if (tail == m_dequeueIndex)
return false;
TSAN_ANNOTATE_HAPPENS_AFTER(&items[m_dequeueIndex]);
x = items[m_dequeueIndex];
++m_dequeueIndex;
return true;
}
};
// main() is as before
Now ThreadSanitizer is happy at runtime.
This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78158. Disassembling the binary produced by GCC shows that it doesn't instrument the atomic operations at -O0.
As a workaround, you can either build your code with GCC with -O1/-O2, or get yourself a fresh Clang build and use it to run ThreadSanitizer (this is the recommended way, as TSan is being developed as part of Clang and only backported to GCC).
The comments above are invalid: TSan can easily comprehend the happens-before relation between the atomics in your code (one can check that by running the above reproducer under TSan in Clang).
I also wouldn't recommend using the AnnotateHappensBefore()/AnnotateHappensAfter() for two reasons:
you shouldn't need them in most cases; they denote that the code is doing something really complex (in which case you may want to double-check you're doing it right);
if you make an error in your lock-free code, spraying it with annotations may mask that error, so that TSan won't notice it.
ThreadSanitizer is not good at counting: it cannot understand that writes to the items always happen before the reads.
ThreadSanitizer can see that the stores to m_enqueueIndex happen before the loads, but it does not understand that the store to items[m_dequeueIndex] must happen before the load when tail > m_dequeueIndex.
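For reference, the release/acquire pairing that the question relies on can be exercised in a small self-checking form. This is only a sketch of the same publishing pattern, not the asker's queue: `run_spsc`, `published` and `in_order` are made-up names, and the consumer drains everything published so far instead of taking one item per call.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative single-producer/single-consumer run. Returns how many
// values the consumer observed with the expected contents.
int run_spsc(int count) {
    std::vector<int> items(count);
    std::atomic<int> published{0};   // number of slots that are safe to read
    int in_order = 0;

    std::thread consumer([&] {
        int next = 0;
        while (next < count) {
            // Acquire load pairs with the producer's release store below.
            int limit = published.load(std::memory_order_acquire);
            for (; next < limit; ++next) {
                if (items[next] == next) ++in_order;   // plain read
            }
            std::this_thread::yield();
        }
    });

    for (int i = 0; i < count; ++i) {
        items[i] = i;                                        // plain write
        published.store(i + 1, std::memory_order_release);   // publish slot i
    }
    consumer.join();
    return in_order;   // reading in_order is safe after join()
}
```

Every read of `items[next]` is ordered after the corresponding write because the consumer only reads slots below a value it obtained from the acquire load.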

gcc "not inlined" warning

Does gcc's inline __attribute__((__always_inline__)) generate a warning when the compiler can't inline the function?
I ask because VS does (http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx):
If the compiler cannot inline a function declared with __forceinline,
it generates a level 1 warning.
You need -Winline to get warnings about non-inlined functions.
If you want to verify this you can try taking the address of an inline function (which prevents it from being inlined) and then you should see a warning.
#include <stdio.h>
static inline __attribute__ ((always_inline)) int add(int a, int b)
{
return a + b;
}
int main(void)
{
printf("%d\n", add(21, 21));
printf("%p\n", add);
return 0;
}
EDIT
I've been trying to produce a warning with the above code and other examples without success. It seems that the behaviour of current versions of gcc and clang may have changed in this area. I'll delete this answer if I can't come up with a better example that generates a warning.
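One likely reason no warning appears: the GCC documentation for always_inline says that failure to inline such a function is diagnosed as an error, while -Winline concerns functions that are merely declared inline. When code has to build with both compilers, the two spellings are often hidden behind a macro. The following is a minimal sketch of that idea. `FORCE_INLINE` is a made-up name, not something either compiler provides.

```cpp
// FORCE_INLINE is a hypothetical project-local macro.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline   // fallback: an ordinary inline hint
#endif

// The function we want inlined at every direct call site.
FORCE_INLINE int add(int a, int b) {
    return a + b;
}

// A direct call: this is the kind of call site the attribute applies to.
int add_direct(int a, int b) {
    return add(a, b);
}
```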

Dereferencing void* warnings on Xcode

I'm aware of this SO question and this SO question. The element
of novelty in this one is in its focus on Xcode, and in its use of
square brackets to dereference a pointer to void.
The following program compiles with no warning in Xcode 4.5.2, compiles
with a warning on GCC 4.2 and, even though I don't have Visual Studio
right now, I remember that it would consider this a compiler
error, and MSDN and Internet agree.
#include <stdio.h>
int main(int argc, const char * argv[])
{
int x = 24;
void *xPtr = &x;
int *xPtr2 = (int *)&xPtr[1];
printf("%p %p\n", xPtr, xPtr2);
}
If I change the third line of the body of main to:
int *xPtr2 = (int *)(xPtr + 1);
It compiles with no warnings on both GCC and Xcode.
I would like to know how I can turn this silence into warnings or errors on
GCC and especially Xcode/LLVM, including for the fact that function main is int but
does not explicitly return any value (by the way, I think -Wall does
the trick on GCC).
That isn't wrong at all.
The compiler doesn't know how big the memory behind the pointer is; a void[] is roughly a void*.
That's why char* used as strings need to be \0-terminated.
You cannot turn on a warning for that, as it isn't possible to determine the size of the memory pointed to by a pointer at compile time.
void *v = NULL;
((int *)v)[1] = 0; // invalid: there is no memory behind v
void *v = malloc(sizeof(int)*2);
((int *)v)[1] = 0; // valid: v points to room for two ints
*Note: typed inline on SO, sorry for any non-working code.
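On the warning itself: ISO C does not define arithmetic or indexing on void*. GCC accepts it as an extension that treats the pointee size as 1, and documents -Wpointer-arith (also enabled by -Wpedantic) to warn about it; Clang accepts a flag of the same name. Independently of warnings, the ambiguity goes away if the element size is made explicit with a cast. The sketch below uses C++ (where void* arithmetic is rejected outright), and `int_at` and `byte_offset` are illustrative names only.

```cpp
#include <cstddef>

// Steps in units of int: the cast states the element size explicitly.
int *int_at(void *p, std::size_t index) {
    return static_cast<int *>(p) + index;
}

// Steps in units of one byte, which is what the GNU void* extension does.
void *byte_offset(void *p, std::size_t bytes) {
    return static_cast<unsigned char *>(p) + bytes;
}
```

With an array of two ints, `int_at(xs, 1)` and `byte_offset(xs, sizeof(int))` refer to the same address, whereas `byte_offset(xs, 1)` points into the middle of the first element.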

Problems with remove_if in VS2010 when using sets

I have the following code.
#include <set>
#include <algorithm>
#include <functional> // bind2nd, less_equal
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
typedef set<long> MySet;
MySet a;
for( int i = 0; i < 10; ++i)
{
a.insert(i);
}
MySet::iterator start,end,last;
start = a.begin();
end = a.end();
last = remove_if(start,end,bind2nd(less_equal<long>(),5));
return 0;
}
This used to compile fine under VS2005. However, using VS2010 I get the following error:
Error 1 error C3892: '_Next' : you cannot assign to a variable that is const c:\program files\microsoft visual studio 10.0\vc\include\algorithm
If I make the container a vector, everything is fine.
I'm guessing something has changed in the standard that I'm not aware of. Can someone please shed some light on why this no longer works?
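A std::set always keeps its elements in sorted order, so its iterators give only const access to the elements. std::remove_if attempts to move the elements you don't want removed to the beginning of the collection by assigning through those iterators, which is what error C3892 is complaining about. Allowing it would violate set's invariant of maintaining the elements in sorted order.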
The code never should have worked. Older compilers might not have enforced the rules tightly enough to let you know that it wasn't supposed to work, but (apparently) your current one does.
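What usually replaces remove_if on a set is an explicit erase loop, since erasing one element at a time never reorders the rest. The sketch below assumes a C++11 library, where set::erase returns the iterator following the erased element. `erase_le` is an illustrative name, and the vector overload is included only to contrast it with the erase-remove idiom that does work for sequence containers.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Erase every element <= limit from a set, one node at a time.
void erase_le(std::set<long> &s, long limit) {
    for (auto it = s.begin(); it != s.end(); ) {
        if (*it <= limit)
            it = s.erase(it);   // erase returns the next valid iterator
        else
            ++it;
    }
}

// Erase-remove idiom: fine for vector, whose elements are assignable.
void erase_le(std::vector<long> &v, long limit) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [limit](long x) { return x <= limit; }),
            v.end());
}
```

Applied to a set holding 0 through 9 with a limit of 5, the loop leaves 6, 7, 8 and 9 in place.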

Resources