Why does ThreadSanitizer report a race with this lock-free example? - c++11

I've boiled this down to a simple self-contained example. The main thread enqueues 1000 items, and a worker thread tries to dequeue concurrently. ThreadSanitizer complains that there's a race between the read and the write of one of the elements, even though there is an acquire-release memory barrier sequence protecting them.
#include <atomic>
#include <thread>
#include <cassert>
struct FakeQueue
{
int items[1000];
std::atomic<int> m_enqueueIndex;
int m_dequeueIndex;
FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
void enqueue(int x)
{
auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
items[tail] = x; // <- element written
m_enqueueIndex.store(tail + 1, std::memory_order_release);
}
bool try_dequeue(int& x)
{
auto tail = m_enqueueIndex.load(std::memory_order_acquire);
assert(tail >= m_dequeueIndex);
if (tail == m_dequeueIndex)
return false;
x = items[m_dequeueIndex]; // <- element read -- tsan says race!
++m_dequeueIndex;
return true;
}
};
FakeQueue q;
int main()
{
std::thread th([&]() {
int x;
for (int i = 0; i != 1000; ++i)
q.try_dequeue(x);
});
for (int i = 0; i != 1000; ++i)
q.enqueue(i);
th.join();
}
ThreadSanitizer output:
==================
WARNING: ThreadSanitizer: data race (pid=17220)
Read of size 4 at 0x0000006051c0 by thread T1:
#0 FakeQueue::try_dequeue(int&) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 (issue49+0x000000402bcd)
#1 main::{lambda()#1}::operator()() const <null> (issue49+0x000000401132)
#2 _M_invoke<> /usr/include/c++/5.3.1/functional:1531 (issue49+0x0000004025e3)
#3 operator() /usr/include/c++/5.3.1/functional:1520 (issue49+0x0000004024ed)
#4 _M_run /usr/include/c++/5.3.1/thread:115 (issue49+0x00000040244d)
#5 <null> <null> (libstdc++.so.6+0x0000000b8f2f)
Previous write of size 4 at 0x0000006051c0 by main thread:
#0 FakeQueue::enqueue(int) /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:16 (issue49+0x000000402a90)
#1 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:44 (issue49+0x000000401187)
Location is global 'q' of size 4008 at 0x0000006051c0 (issue49+0x0000006051c0)
Thread T1 (tid=17222, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027a67)
#1 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) <null> (libstdc++.so.6+0x0000000b9072)
#2 main /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:41 (issue49+0x000000401168)
SUMMARY: ThreadSanitizer: data race /home/cameron/projects/concurrentqueue/tests/tsan/issue49.cpp:26 FakeQueue::try_dequeue(int&)
==================
ThreadSanitizer: reported 1 warnings
Command line:
g++ -std=c++11 -O0 -g -fsanitize=thread issue49.cpp -o issue49 -pthread
g++ version: 5.3.1
Can anybody shed some light onto why tsan thinks this is a data race?
UPDATE
It seems like this is a false positive. To appease ThreadSanitizer, I've added annotations (see here for the supported ones and here for an example). Note that detecting whether tsan is enabled in GCC via a macro has only recently been added, so I had to manually pass -D__SANITIZE_THREAD__ to g++ for now.
#if defined(__SANITIZE_THREAD__)
#define TSAN_ENABLED
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define TSAN_ENABLED
#endif
#endif
#ifdef TSAN_ENABLED
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr) \
AnnotateHappensBefore(__FILE__, __LINE__, (void*)(addr))
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr) \
AnnotateHappensAfter(__FILE__, __LINE__, (void*)(addr))
extern "C" void AnnotateHappensBefore(const char* f, int l, void* addr);
extern "C" void AnnotateHappensAfter(const char* f, int l, void* addr);
#else
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr)
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr)
#endif
struct FakeQueue
{
int items[1000];
std::atomic<int> m_enqueueIndex;
int m_dequeueIndex;
FakeQueue() : m_enqueueIndex(0), m_dequeueIndex(0) { }
void enqueue(int x)
{
auto tail = m_enqueueIndex.load(std::memory_order_relaxed);
items[tail] = x;
TSAN_ANNOTATE_HAPPENS_BEFORE(&items[tail]);
m_enqueueIndex.store(tail + 1, std::memory_order_release);
}
bool try_dequeue(int& x)
{
auto tail = m_enqueueIndex.load(std::memory_order_acquire);
assert(tail >= m_dequeueIndex);
if (tail == m_dequeueIndex)
return false;
TSAN_ANNOTATE_HAPPENS_AFTER(&items[m_dequeueIndex]);
x = items[m_dequeueIndex];
++m_dequeueIndex;
return true;
}
};
// main() is as before
Now ThreadSanitizer is happy at runtime.

This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78158. Disassembling the binary produced by GCC shows that it doesn't instrument the atomic operations on O0.
As a workaround, you can either build your code with GCC with -O1/-O2, or get yourself a fresh Clang build and use it to run ThreadSanitizer (this is the recommended way, as TSan is being developed as part of Clang and only backported to GCC).
The comments above are invalid: TSan can easily comprehend the happens-before relation between the atomics in your code (one can check that by running the above reproducer under TSan in Clang).
I also wouldn't recommend using the AnnotateHappensBefore()/AnnotateHappensAfter() for two reasons:
you shouldn't need them in most cases; they denote that the code is doing something really complex (in which case you may want to double-check you're doing it right);
if you make an error in your lock-free code, spraying it with annotations may mask that error, so that TSan won't notice it.

The ThreadSanitizer is not good at counting, it cannot understand that writes to the items always happen before the reads.
The ThreadSanitizer can find that the stores of m_enqueueIndex happen before the loads, but it does not understand that the store to items[m_dequeueIndex] must happen before the load when tail > m_dequeueIndex.

Related

Implementing stub function and linking it to a definition in a library during run time

When I was reading about using shared libraries, I learnt how the definitions of standard C functions, like printf, are resolved during run-time.
I want to implement functions in my project in the similar manner. I can have stub definition of functions for compiling and linking phase. And actual definition of the functions will be present in a library on the device where I'll run my executable.
Suppose I have a main function:
#include<stdio.h>
#include"sum.h"
int main()
{
int num = 10;
int result = 0;
result = sum(num);
printf("Sum = %d\n",result);
return 0;
}
And my sum.h looks like:
#ifndef SUM_H
#define SUM_H
#ifdef __cplusplus
extern "C" {
#endif
#ifndef __EXPORT
#ifdef _USERDLL
#define __EXPORT __declspec(dllexport)
#else
#define __EXPORT __declspec(dllimport)
#endif // _USER_DLL
#endif // __EXPORT
__EXPORT int sum(int num);
}
#endif
#endif
And while builiding this executable, I'll use stub definition in sum_stub.c file:
// sum_stub.c
#include<stdio.h>
#include"sum.h"
int sum(int num) {
int res = 0;
printf("Inside stubbed function. Result=%d\n",res);
return res;
}
Let the executable that is build using above files is get_sum.exe
The actual function that will calculate sum is compiled as a shared library, say sum.dll.
// sum.c that will be compiled to sum.dll
#include<stdio.h>
#include"sum.h"
int sum(int num) {
int res = 0;
int i=0;
for (i=0; i<num; i++)
res = res + i;
return res;
}
Now as I run my executable, get_sum.exe, how can I link sum.dll at runtime so that correct function definition is used (inside sum.dll) instead of the stubbed one, that I used while compiling the executable?
I am looking for a way to implement it on windows target machine i.e. by using MSVC build tools and clang compiler.
What you are looking for is called "Delay-loaded DLL". Details on overriding default DLL loading code are provided in MSDN article on Linker support for delay-loaded DLLs.

G++ -cilkplus random behavior with std::vectors

The following (reduced) code is very badly handled by the series of GCC
#include <vector>
#include <cilk/cilk.h>
void walk(std::vector<int> v, int bnd, unsigned size) {
if (v.size() < size)
for (int i=0; i<bnd; i++) {
std::vector<int> vnew(v);
vnew.push_back(i);
cilk_spawn walk(vnew, bnd, size);
}
}
int main(int argc, char **argv) {
std::vector<int> v{};
walk(v , 5, 5);
}
Specifically:
G++ 5.3.1 crash:
red.cpp: In function ‘<built-in>’:
red.cpp:20:39: internal compiler error: in lower_stmt, at gimple-low.c:397
cilk_spawn walk(vnew, bnd, size);
G++ 6.3.1 create a code which works perfectly well if executed on one core
but segfault sometime, signal a double free some other times if using more cores. A student who
has a arch linux g++7 reported a similar result.
My question : is there something wrong with that code. Am I invoking some
undefined behavior or is it simply a bug I should report ?
Answering my own question:
According to https://gcc.gnu.org/ml/gcc-help/2017-03/msg00078.html its indeed a bug in GCC. The temporary is destroyed in the parent and not in the children in a cilk_spawn. So if the thread fork really occur, it might be destroyed too early.

Dereferencing void* warnings on Xcode

I'm aware of this SO question and this SO question. The element
of novelty in this one is in its focus on Xcode, and in its use of
square brackets to dereference a pointer to void.
The following program compiles with no warning in Xcode 4.5.2, compiles
with a warning on GCC 4.2 and, even though I don't have Visual Studio
right now, I remember that it would consider this a compiler
error, and MSDN and Internet agree.
#include <stdio.h>
int main(int argc, const char * argv[])
{
int x = 24;
void *xPtr = &x;
int *xPtr2 = (int *)&xPtr[1];
printf("%p %p\n", xPtr, xPtr2);
}
If I change the third line of the body of main to:
int *xPtr2 = (int *)(xPtr + 1);
It compiles with no warnings on both GCC and Xcode.
I would like to know how can I turn this silence into warnings or errors, on
GDB and especially Xcode/LLVM, including the fact that function main is int but
does not explicitly return any value (By the way I think -Wall does
the trick on GDB).
that isnt wrong at all...
the compiler doesnt know how big the pointer is ... a void[] ~~ void*
thats why char* used as strings need to be \0-terminated
you cannot turn on a warning for that as it isnt possible to determine a 'size of memory pointer to by a pointer' at compile time
void *v = nil;
*v[1] = 0 //invalid
void *v = malloc(sizeof(int)*2);
*v[1] = 0 //valid
*note typed inline on SO -- sorry for any non-working code

How to find the address & length of a C++ function at runtime (MinGW)

As this is my first post to stackoverflow I want to thank you all for your valuable posts that helped me a lot in the past.
I use MinGW (gcc 4.4.0) on Windows-7(64) - more specifically I use Nokia Qt + MinGW but Qt is not involved in my Question.
I need to find the address and -more important- the length of specific functions of my application at runtime, in order to encode/decode these functions and implement a software protection system.
I already found a solution on how to compute the length of a function, by assuming that static functions placed one after each other in a source-file, it is logical to be also sequentially placed in the compiled object file and subsequently in memory.
Unfortunately this is true only if the whole CPP file is compiled with option: "g++ -O0" (optimization level = 0).
If I compile it with "g++ -O2" (which is the default for my project) the compiler seems to relocate some of the functions and as a result the computed function length seems to be both incorrect and negative(!).
This is happening even if I put a "#pragma GCC optimize 0" line in the source file,
which is supposed to be the equivalent of a "g++ -O0" command line option.
I suppose that "g++ -O2" instructs the compiler to perform some global file-level optimization (some function relocation?) which is not avoided by using the #pragma directive.
Do you have any idea how to prevent this, without having to compile the whole file with -O0 option?
OR: Do you know of any other method to find the length of a function at runtime?
I prepare a small example for you, and the results with different compilation options, to highlight the case.
The Source:
// ===================================================================
// test.cpp
//
// Intention: To find the addr and length of a function at runtime
// Problem: The application output is correct when compiled with: "g++ -O0"
// but it's erroneous when compiled with "g++ -O2"
// (although a directive "#pragma GCC optimize 0" is present)
// ===================================================================
#include <stdio.h>
#include <math.h>
#pragma GCC optimize 0
static int test_01(int p1)
{
putchar('a');
putchar('\n');
return 1;
}
static int test_02(int p1)
{
putchar('b');
putchar('b');
putchar('\n');
return 2;
}
static int test_03(int p1)
{
putchar('c');
putchar('\n');
return 3;
}
static int test_04(int p1)
{
putchar('d');
putchar('\n');
return 4;
}
// Print a HexDump of a specific address and length
void HexDump(void *startAddr, long len)
{
unsigned char *buf = (unsigned char *)startAddr;
printf("addr:%ld, len:%ld\n", (long )startAddr, len);
len = (long )fabs(len);
while (len)
{
printf("%02x.", *buf);
buf++;
len--;
}
printf("\n");
}
int main(int argc, char *argv[])
{
printf("======================\n");
long fun_len = (long )test_02 - (long )test_01;
HexDump((void *)test_01, fun_len);
printf("======================\n");
fun_len = (long )test_03 - (long )test_02;
HexDump((void *)test_02, fun_len);
printf("======================\n");
fun_len = (long )test_04 - (long )test_03;
HexDump((void *)test_03, fun_len);
printf("Test End\n");
getchar();
// Just a trick to block optimizer from eliminating test_xx() functions as unused
if (argc > 1)
{
test_01(1);
test_02(2);
test_03(3);
test_04(4);
}
}
The (correct) Output when compiled with "g++ -O0":
[note the 'c3' byte (= assembly 'ret') at the end of all functions]
======================
addr:4199344, len:37
55.89.e5.83.ec.18.c7.04.24.61.00.00.00.e8.4e.62.00.00.c7.04.24.0a.00.00.00.e8.42
.62.00.00.b8.01.00.00.00.c9.c3.
======================
addr:4199381, len:49
55.89.e5.83.ec.18.c7.04.24.62.00.00.00.e8.29.62.00.00.c7.04.24.62.00.00.00.e8.1d
.62.00.00.c7.04.24.0a.00.00.00.e8.11.62.00.00.b8.02.00.00.00.c9.c3.
======================
addr:4199430, len:37
55.89.e5.83.ec.18.c7.04.24.63.00.00.00.e8.f8.61.00.00.c7.04.24.0a.00.00.00.e8.ec
.61.00.00.b8.03.00.00.00.c9.c3.
Test End
The erroneous Output when compiled with "g++ -O2":
(a) function test_01 addr & len seem correct
(b) functions test_02, test_03 have negative lengths,
and fun. test_02 length is also incorrect.
======================
addr:4199416, len:36
83.ec.1c.c7.04.24.61.00.00.00.e8.c5.61.00.00.c7.04.24.0a.00.00.00.e8.b9.61.00.00
.b8.01.00.00.00.83.c4.1c.c3.
======================
addr:4199452, len:-72
83.ec.1c.c7.04.24.62.00.00.00.e8.a1.61.00.00.c7.04.24.62.00.00.00.e8.95.61.00.00
.c7.04.24.0a.00.00.00.e8.89.61.00.00.b8.02.00.00.00.83.c4.1c.c3.57.56.53.83.ec.2
0.8b.5c.24.34.8b.7c.24.30.89.5c.24.08.89.7c.24.04.c7.04.
======================
addr:4199380, len:-36
83.ec.1c.c7.04.24.63.00.00.00.e8.e9.61.00.00.c7.04.24.0a.00.00.00.e8.dd.61.00.00
.b8.03.00.00.00.83.c4.1c.c3.
Test End
This is happening even if I put a "#pragma GCC optimize 0" line in the source file, which is supposed to be the equivalent of a "g++ -O0" command line option.
I don't believe this is true: it is supposed to be the equivalent of attaching __attribute__((optimize(0))) to subsequently defined functions, which causes those functions to be compiled with a different optimisation level. But this does not affect what goes on at the top level, whereas the command line option does.
If you really must do horrible things that rely on top level ordering, try the -fno-toplevel-reorder option. And I suspect that it would be a good idea to add __attribute__((noinline)) to the functions in question as well.

getline on MacOSX 10.6 crashing C compiler?

I'm having a really hard time getting an R library installed that requires some compilation in C. I'm using a Mac OSX Snow Leopard machine and trying to install this R package (here).
I've looked at the thread talking about getline on macs and have tried a few of these fixes, but nothing is working! I'm a newbie and don't know any C, so that may be why! Can anyone give me some tips on how I could modify files in this package to get it to install?? Anyhelp would be pathetically appreciated! Here's the error I'm getting:
** libs
** arch - i386
g++ -arch i386 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -D_FASTMAP -DMAQ_LONGREADS -fPIC -g -O2 -c bed2vector.C -o bed2vector.o
In file included from /usr/include/c++/4.2.1/backward/strstream:51,
from bed2vector.C:8:
/usr/include/c++/4.2.1/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header. Please consider using one of the 32 headers found in section 17.4.1.2 of the C++ standard. Examples include substituting the <X> header for the <X.h> header for C++ includes, or <iostream> instead of the deprecated header <iostream.h>. To disable this warning use -Wno-deprecated.
bed2vector.C: In function ‘int get_a_line(FILE*, BZFILE*, int, std::string&)’:
bed2vector.C:74: error: no matching function for call to ‘getline(char**, size_t*, FILE*&)’
make: *** [bed2vector.o] Error 1
chmod: /Library/Frameworks/R.framework/Resources/library/spp/libs/i386/*: No such file or directory
ERROR: compilation failed for package 'spp'
The easiest solution is probably to add a static definition for getline() to bed2vector.c. This might be good enough:
/* PASTE AT TOP OF FILE */
#include <stdio.h> /* flockfile, getc_unlocked, funlockfile */
#include <stdlib.h> /* malloc, realloc */
#include <errno.h> /* errno */
#include <unistd.h> /* ssize_t */
extern "C" ssize_t getline(char **lineptr, size_t *n, FILE *stream);
/* PASTE REMAINDER AT BOTTOM OF FILE */
ssize_t
getline(char **linep, size_t *np, FILE *stream)
{
char *p = NULL;
size_t i = 0;
if (!linep || !np) {
errno = EINVAL;
return -1;
}
if (!(*linep) || !(*np)) {
*np = 120;
*linep = (char *)malloc(*np);
if (!(*linep)) {
return -1;
}
}
flockfile(stream);
p = *linep;
for (int ch = 0; (ch = getc_unlocked(stream)) != EOF;) {
if (i > *np) {
/* Grow *linep. */
size_t m = *np * 2;
char *s = (char *)realloc(*linep, m);
if (!s) {
int error = errno;
funlockfile(stream);
errno = error;
return -1;
}
*linep = s;
*np = m;
}
p[i] = ch;
if ('\n' == ch) break;
i += 1;
}
funlockfile(stream);
/* Null-terminate the string. */
if (i > *np) {
/* Grow *linep. */
size_t m = *np * 2;
char *s = (char *)realloc(*linep, m);
if (!s) {
return -1;
}
*linep = s;
*np = m;
}
p[i + 1] = '\0';
return ((i > 0)? i : -1);
}
This doesn't handle the case where the line is longer than the maximum value that ssize_t can represent. If you run into that case, you've likely got other problems.
Zeroth question: Have you considered using a package manager like fink or MacPorts rather than compiling yourself? I know that fink has an R package.
First question: How is the R build managed? Is there a ./configure? If so have you looked at the options to it? Does it use make? Scons? Some other dependency manager?
Second question: Have you told the build system that you are working on a Mac? Can you specify that you don't have a libc with native getline?
If the build system doesn't support Mac OS---but I image that R's does---you are probably going to have to download the standalone version, and hack the build to include it. How exactly you do that depends on the build system. And you may need to hack the source some.

Resources