Random number generator of L'Ecuyer with Bays-Durham - thread-safety

I am using Monte Carlo simulations to estimate the digits of pi. So far so good, but once I introduced OpenMP I realized that ran2, arguably the best RNG, is not thread-safe! The implementation is here. Since I have not worked with OpenMP, nor much with multi-threading in general, I am stuck on making it thread-safe under OpenMP.
What I know so far is that a function is already thread-safe if it doesn't modify non-local memory and doesn't call any function that does. In this case, ran2 has three static variables, which are shared and modified by every thread that calls it.
One possible solution is to enclose each call to ran2 in a critical section, but that defeats the purpose because I get no speedup.
Can somebody give me pointers on how to proceed? Any references would be great too!

What is generally done to make a procedure thread-safe is to move its previously static data into thread-local data. Look, for instance, at the man page of rand_r(), which is a thread-safe version of rand().
So in your version of L'Ecuyer:
Define a struct (say state) that holds the static data.
Redefine ran2() to take an additional parameter that is a pointer to this struct state, and modify the code accordingly. Let ran2_r() be the new name.
Define a local struct state in every thread to hold that thread's state.
The state probably needs to be seeded. You can use omp_get_thread_num() to derive a per-thread seed and initialize the state properly when entering the thread.
Now you just need to call your new ran2_r() with a pointer to this state. The procedure will modify it, but the modifications are stored in the thread-local state variable.
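The steps above can be sketched roughly as follows. This is only an illustration under assumptions: RngState, make_state, ran_r and estimate_pi are hypothetical names, the seeding scheme is made up, and the arithmetic is a simple Park-Miller generator standing in for the body of ran2 (whose state would also carry idum2, iy and the shuffle table).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state struct: holds what would otherwise be `static` inside
// the generator. For ran2 it would also carry idum2, iy and iv[].
struct RngState {
    std::int64_t seed;
};

// Assumed seeding scheme: offset a base seed by the thread index so that
// every thread starts from a different, nonzero value.
inline RngState make_state(int thread_index, std::int64_t base_seed = 12345) {
    return RngState{base_seed + 7919LL * (thread_index + 1)};
}

// Reentrant generator in the style of rand_r(): all mutable data lives in
// *state. Park-Miller with Schrage's method, used here as a stand-in for ran2.
inline double ran_r(RngState* state) {
    const std::int64_t IA = 16807, IM = 2147483647, IQ = 127773, IR = 2836;
    std::int64_t k = state->seed / IQ;
    state->seed = IA * (state->seed - k * IQ) - IR * k;
    if (state->seed < 0) state->seed += IM;
    return static_cast<double>(state->seed) / static_cast<double>(IM);
}

// Monte Carlo estimate of pi that touches only the caller's private state.
inline double estimate_pi(RngState* state, long samples) {
    long inside = 0;
    for (long i = 0; i < samples; ++i) {
        double x = ran_r(state);
        double y = ran_r(state);
        if (x * x + y * y <= 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

Under OpenMP, each thread would build its own state inside the parallel region, for example with make_state(omp_get_thread_num()), and the per-thread hit counts would be combined with a reduction. No critical section is needed because no two threads ever share a state.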

Related

Container Thread Safety

I'm aware of Container Thread Safety topic listed there:
https://en.cppreference.com/w/cpp/container
But I want to know: can I use non-const member functions and const member functions concurrently without blocking (a mutex)?
More specific:
Can I use std::vector::push_back and std::vector::size concurrently?
Can I use std::set::insert and std::set::size concurrently?
This usually doesn't make practical sense, but I don't need the size to be exact when I use it; I just need a result that was valid at the time of the call.
P.S. My doubts come from https://www.cplusplus.com/reference/set/set/insert/ where they say for std::set::insert that
Concurrently accessing existing elements is safe
So maybe getting the size of the container is also safe.
The main thread-safety rule of the standard library containers is that if more than one thread accesses a shared container, and at least one of those accesses goes through a non-const member function, then the threads must be synchronized. Without synchronization, the behavior is undefined.
If you take a look at the C++ reference for std::vector::size(), it says:
Data Races
The container is accessed. No contained elements are accessed: concurrently accessing or modifying them is safe.
As mentioned, the vector itself is accessed during the call to .size(), so you may not call non-const member functions on the same vector at the same time. If one thread calls push_back while another calls .size() without synchronization, the behavior of your program is undefined.
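One way to get a size that is merely "valid at the time of the call" is to route both operations through the same mutex. The sketch below assumes a hypothetical wrapper class named GuardedVector; it is not part of the standard library, and a real design might prefer a different locking granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper: every access to the vector takes the same mutex,
// so push_back() and size() can never overlap.
class GuardedVector {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(value);
    }

    // Returns a size that was correct at some instant during the call.
    // It may already be stale by the time the caller uses it.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so that const size() can lock it
    std::vector<int> data_;
};
```

Writer threads would call push_back() and any other thread could poll size() on the same GuardedVector object. The lock is held only for the duration of each call, so the cost is small compared with the undefined behavior of the unsynchronized version.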

Go lang global variables without goroutines overwriting

I'm writing a CMS in Go and have a session type (user id, page contents to render, etc). Ideally I'd like that type to be a global variable so I don't have to propagate it through all the nested functions. However, a global variable like that would obviously mean that each new session overwrites its predecessor, which, needless to say, would be an epic fail.
Some languages do offer a way of having globals that are preserved within a thread (i.e. the value of that global is sandboxed within that thread). While I'm aware that goroutines are not threads, I wondered if there was a similar method at my disposal or if I'd have to pass a local pointer to my session type down through the various nested routines.
I'm guessing channels wouldn't do this? From what I can gather (and please correct me if I'm wrong here), they're basically just a safe way of sharing global variables?
edit: I'd forgotten about this question! Anyhow, an update for anyone who is curious. This question was written back when I was new to Go and the CMS was basically my first project. I was coming from a C background with familiarity with POSIX threads, but I quickly realised a better approach was to write the code in a more functional design, with session objects passed down as pointers in function parameters. This gave me the context-sensitive local scope I was after while also minimizing the amount of data I was copying about. However, being a 7 year old project and one that was at the start of my transition to Go, it's fair to say the project could do with a major rewrite anyway, as a lot of mistakes were made. That's a concern for another day though - currently it works and I have enough other projects on the go.
You'll want to use something like a Context:
http://blog.golang.org/context
Basically, the pattern is to create a Context for each unique thing you want to do (a web request, in your case). Use context.WithValue to embed variables in the context, then always pass it as the first parameter to other functions that do further work, including those running in other goroutines.
Getting a variable back out of the context is a matter of calling its Value method with the same key, from within any goroutine. From the above link:
A Context is safe for simultaneous use by multiple goroutines. Code can pass a single Context to any number of goroutines and cancel that Context to signal all of them.
I had an implementation where I was explicitly sending variables as method parameters, and I discovered that embedding these variables using contexts significantly cleaned up my code.
Using a Context also helps because it provides ways to end long-running tasks by using channels, select, and a concept called a "done channel." See this article for a great basic review and implementation:
http://blog.golang.org/pipelines
I'd recommend reading the pipelines article first for a good flavor of how to manage communication among goroutines, then the context article for a better idea of how to level-up and start embedding variables to pass around.
Good luck!
Don't use global variables. Use Go goroutine-local variables.
go-routine Id..
There are already goroutine-local variables: they are called function arguments, function return values, and local variables.
Russ
If you have more than one user, then wouldn't you need that info for each connection? So I would think that you'd have a struct per connected user. It would be idiomatic Go to pass a pointer to that struct when setting up the worker goroutine, or passing the pointer over a channel.

What is the design rationale behind HandleScope?

V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing themselves like most internal ref_ptr type helpers.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use it in a stack-like way. There is no "invalid" mark in that stack. All the objects from bottom to top of the stack are valid object references. The way to ensure this is the HandleScope. It keeps things hierarchical. With reference counted pointers you can do something like this:
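// Requires <memory>; Object is any class constructible from an int.
shared_ptr<Object>* f() {
    shared_ptr<Object> a(new Object(1));   // object 1 is destroyed when f() returns
    shared_ptr<Object>* b = new shared_ptr<Object>(new Object(2));
    return b;                              // object 2 outlives f()
}
void g() {
    shared_ptr<Object>* p = f();
    shared_ptr<Object> c = *p;
    delete p;                              // c is now the only owner of object 2
}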
Here object 1 is created first, then object 2 is created, then the function returns and object 1 is destroyed, then object 2 is destroyed. The key point is that there is a point in time when object 1 is invalid but object 2 is still valid. That's what HandleScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object referenced by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle becomes collectible when leaving the nearest enclosing scope that contains a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so keeps all the objects around for the whole duration of the computation, which would be very bad indeed if the loop iterates over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
Personally, I'd make sure to construct a HandleScope:
At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
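The trade-off between one scope for a whole loop and one scope per iteration can be illustrated with a toy model. To be clear, this is not V8 code and not how V8 is implemented: HandleStack, ToyScope and peak_height are hypothetical names, and the "handles" are plain integers. It only shows the idea of a scope remembering the stack height on entry and truncating back to it on exit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the handle stack: one slot per live local handle.
struct HandleStack {
    std::vector<int> slots;
};

// Toy stand-in for a scope: records the stack height on entry and truncates
// back to it on exit, releasing every handle created in between with a
// single resize instead of one release per handle.
class ToyScope {
public:
    explicit ToyScope(HandleStack& stack)
        : stack_(stack), mark_(stack.slots.size()) {}
    ~ToyScope() { stack_.slots.resize(mark_); }
    ToyScope(const ToyScope&) = delete;
    ToyScope& operator=(const ToyScope&) = delete;

private:
    HandleStack& stack_;
    std::size_t mark_;
};

// Creates per_iteration handles in each loop pass and returns the peak stack
// height. With a scope per iteration the peak stays at per_iteration;
// without one, handles pile up until the outer scope ends.
inline std::size_t peak_height(HandleStack& stack, int iterations,
                               int per_iteration, bool scope_per_iteration) {
    ToyScope outer(stack);
    std::size_t peak = 0;
    auto body = [&] {
        for (int h = 0; h < per_iteration; ++h) stack.slots.push_back(h);
        peak = std::max(peak, stack.slots.size());
    };
    for (int i = 0; i < iterations; ++i) {
        if (scope_per_iteration) {
            ToyScope inner(stack);  // released at the end of this iteration
            body();
        } else {
            body();
        }
    }
    return peak;
}
```

In this model, ten iterations creating three handles each peak at 3 slots with a per-iteration scope and at 30 slots without one, which mirrors the memory argument made above for loops with many iterations.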
Disclaimer: This may not be an official answer, more of a conjecture on my part; but the V8 documentation is hardly useful on this topic. So I may be proven wrong.
From my understanding, gained while developing various V8-based backend applications, it's a means of handling the difference between the C++ and JavaScript environments.
Imagine the following sequence, in which a self-dereferencing pointer can break the system:
JavaScript calls a C++ wrapped V8 function: let's say helloWorld()
The C++ function creates a v8::Handle with the value "hello world =x"
C++ returns the value to the V8 virtual machine
The C++ function does its usual cleaning up of resources, including dereferencing of handles
Another C++ function / process overwrites the freed memory space
V8 reads the handle, and the data is no longer the same: "hell!#(#..."
And that's just the surface of the complicated inconsistency between the two. Hence, to tackle the various issues of connecting the JavaScript VM (Virtual Machine) to the C++ interfacing code, I believe the development team decided to simplify the issue via the following...
All variable handles are to be stored in "buckets", aka HandleScopes, to be built / compiled / run / destroyed by their respective C++ code when needed.
Additionally, all function handles are to refer only to C++ static functions (I know this is irritating), which ensures the "existence" of the function call regardless of constructors / destructors.
Think of it from a development point of view: it marks a very strong distinction between the JavaScript VM development team and the C++ integration team (Chrome dev team?), allowing both sides to work without interfering with one another.
Lastly, it could also be for the sake of simplicity when emulating multiple VMs, as V8 was originally meant for Google Chrome. A simple HandleScope creation and destruction whenever we open / close a tab makes for much easier GC management, especially in cases where you have many VMs running (one per tab in Chrome).

Why do we need boost::thread_specific_ptr?

Why do we need boost::thread_specific_ptr, or in other words what can we not easily do without it?
I can see why pthread provides pthread_getspecific() etc. These functions are useful for cleaning up after dead threads, and handy to call from C-style functions (the obvious alternative being to pass a pointer everywhere that points to some memory allocated before the thread was created).
In contrast, the constructor of boost::thread takes a callable object by value, and everything non-static in that object becomes thread local once it is copied. I cannot see why I would want to use boost::thread_specific_ptr in preference to a class member, any more than I would want to use a global variable in OOP code.
Do I horribly misunderstand anything? A very brief example would help, please. Many thanks.
thread_specific_ptr simply provides portable thread local data access. You don't have to be managing your threads with Boost.Thread to get value from this. The canonical example is the one cited in the Boost docs for this class:
One example is the C errno variable, used for storing the error code related to functions from the Standard C library. It is common practice (and required by POSIX) for compilers that support multi-threaded applications to provide a separate instance of errno for each thread, in order to avoid different threads competing to read or update the value.
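The errno case quoted above can be sketched with per-thread storage. Note the substitution: this example uses the standard C++11 thread_local keyword rather than boost::thread_specific_ptr, which offers a pointer-like interface to the same kind of storage. The names last_error, failing_call and error_seen_by_worker are made up for illustration.

```cpp
#include <cassert>
#include <thread>

// errno-like variable: every thread gets its own copy, initialized to 0.
thread_local int last_error = 0;

// Hypothetical library call that reports failure through last_error.
// It needs no extra parameter to find the calling thread's copy.
inline void failing_call(int code) { last_error = code; }

// Sets last_error on a worker thread and returns what that worker saw.
// The calling thread's own copy of last_error is left untouched.
inline int error_seen_by_worker(int code) {
    int seen = 0;
    std::thread worker([&seen, code] {
        failing_call(code);
        seen = last_error;  // reads the worker's copy
    });
    worker.join();
    return seen;
}
```

The point for the question is that failing_call() is a free function with a fixed signature: it cannot receive a pointer to a class member, yet it still needs one value per thread. That is the situation where thread-specific storage is hard to replace with members of the thread's callable.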

Can I be sure that the code I write is always executed in the same thread?

I normally work on single threaded applications and have generally never really bothered with dealing with threads. My understanding of how things work - which certainly, may be wrong - is that as long as we're always dealing with single threaded code (i.e. no forks or anything like that) it will always be executed in the same thread.
Is this assumption correct? I have a fuzzy idea that UI libraries/frameworks may spawn off threads of their own to handle GUI stuff (which accounts for the fact that the Windows task manager tells me that my 'single threaded' application is actually running on 10 threads) but I'm guessing that this shouldn't affect me?
How does this apply to COM? For instance, if I were to create an instance of a COM component in my code; and that COM component writes some information to a thread-based location (using System.Threading.Thread.GetData for instance) will my application be able to get hold of that information?
So in summary:
In single threaded code, can I be sure that whatever I store in a thread-based location can be retrieved from anywhere else in the code?
If that single threaded code were to create an instance of a COM component which stores some information in a thread-based location, can that be similarly retrievable from anywhere else?
UI usually has the opposite constraint (sadly): it's single threaded and everything must happen on that thread.
The easiest way to check if you are always in the same thread (for, say, a function) is to have an integer variable set at -1, and have a check function like (say you are in C#):
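private int m_ThreadId = -1;  // -1 means "no thread recorded yet"

void AssertSingleThread()
{
    // The first caller records its thread id; every later call must match it.
    if (m_ThreadId < 0) m_ThreadId = Thread.CurrentThread.ManagedThreadId;
    Debug.Assert(m_ThreadId == Thread.CurrentThread.ManagedThreadId);
}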
That said:
I don't really understand question #1. Why store data in a thread-based location if your purpose is to have a global scope?
About the second question, most COM code runs on a single thread and, most often, on the thread where your UI message processing lives. This is because most COM code is designed to be compatible with VB6, which is single-threaded.
The reason your program has about 10 threads is that both Windows (if you use some of its features, like completion ports or some kinds of timers) and the CLR (for example for the GC or, again, some types of timers) may create threads in your process space (technically, any program with enough privileges can too).
Consider a model with a single dataStore class living in your main thread, which all threads read from and write to. This avoids a lot of the problems that might arise from threads accessing each other's data all over the shop.
It is a simple idea, until you reach the fun part of threading: concurrency and synchronization. Simply put, if two threads want to read and write the same variable inside dataStore at the same time, you have a problem.
Java handles this by allowing you to declare a variable or method synchronized, allowing only one thread access at a time.
I believe some .NET objects have Lock and Synchronized methods defined on them, but I know no more than this.
