I have a collection of data, with "agents" that operate on that data. I have also built up a dependency list between the agents, because some agents depend on what other agents do, and I have a structure that enforces which data within the collection each agent may access or mutate. I am stuck on how to actually implement this design; a job pool seems too simple to handle the dependencies. The design is very similar to flow-based programming (if I understand it correctly), but data is operated on in bulk.
My first thought was to have a tree hierarchy of the tasks that need to be done:
    / | \
  a1  a2  a3
   |  /
   a4
For example, I can run a1, a2, and a3 concurrently. But to run a4, a1 and a2 need to be finished.
What would be the best tools to set this up? Should I use a signal/slot implementation, roll my own channels, or use futures and promises to emulate a channel/signal/slot system? Maybe each Node could hold its number of dependencies, and each time one of the agents it depends on finishes, a counter is incremented; the Node runs once the counter matches the number of deps. Or this could be implemented as a "gate" type of structure that holds the number of deps and sends a signal (or whatever) to the agent when the deps are satisfied.
I could make my own TaskManager that does the scheduling, but I'd rather call each top-level Node once and have the hierarchy traversed automatically. Is there something else entirely I could try? I'm interested in any crazy ideas you might have.
I'm leaning towards something like this, using "signals" and "slots":
+----------+   +----------+
|  Actor1  |   |  Actor2  |
| update() |   | update() |
+----|-----+   +----|-----+
     \____     _____/
          \   /
      +----------+
      |   Gate   |
      +----------+
      |  Actor4  |
      | update() |
      +----------+
How is this type of problem typically tackled? I would like to keep it somewhat generic and use popular libraries if I can. I also need good response times since this will be running in the update() loop of a game engine.
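For illustration only, the counter/gate idea from the question could be sketched roughly as below. Node, addDependent() and run() are made-up names, not taken from any library, and the sketch assumes that every node is armed once and run once; a real update() loop would have to reset the counters each frame.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node: its work runs once all of its dependencies have finished.
class Node {
public:
    explicit Node(std::function<void()> work) : work_(std::move(work)) {}

    // Declare that this node must finish before `next` may run.
    void addDependent(Node& next) {
        dependents_.push_back(&next);
        next.pending_.fetch_add(1, std::memory_order_relaxed);
    }

    // Call this on nodes without dependencies; dependents are triggered automatically.
    void run() {
        work_();
        for (Node* next : dependents_) {
            // fetch_sub returns the previous value, so 1 means this was the
            // last outstanding dependency and the dependent may run now.
            if (next->pending_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                next->run();
            }
        }
    }

private:
    std::function<void()> work_;
    std::vector<Node*> dependents_;
    std::atomic<int> pending_{0};  // number of unfinished dependencies
};
```

With a1 and a2 registered as dependencies of a4, calling run() on a1, a2 and a3 (from a job pool or directly) is enough: whichever of a1 and a2 finishes last runs a4 on its own thread.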
You have an object which encapsulates data, and agents working on that data, which together are the underpinnings for the Visitor design pattern. According to the Gang of Four, the intent of the Visitor pattern is to
Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.
The abstract base visitor and visited classes would have at least the following functionality:
class DataVisitor; // forward declaration
class DataElement {
public:
    virtual ~DataElement() = default;
    virtual void Accept(DataVisitor*);
protected:
    DataElement() = default;
};
class SpreadData; // assumed to be a concrete element type defined elsewhere
class DataVisitor {
public:
    virtual ~DataVisitor() = default;
    virtual void visit_SpreadData(SpreadData*) = 0;
    virtual void gendata(DataElement*) = 0;
protected:
    DataVisitor() = default;
};
void DataElement::Accept(DataVisitor* v) {
    v->gendata(this); // double dispatch
}
For the chain of responsibility that you are setting up, using a Mediator would be a good start - intent:
Define an object that encapsulates how a set of objects interact. Mediator promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.
For that we would modify at least the DataVisitor class to contain a private mediator class. The abstract mediator base class might look something like:
class Mediator {
public:
    virtual ~Mediator() = default;
    virtual void mediate() = 0;
    std::list<DataVisitor*> get_visitors() const { return visitors_; }
protected:
    Mediator() = default;
    virtual void CreateVisitors() = 0;
    std::list<DataVisitor*> visitors_;
};
In a concrete mediator class, the one that specifies the behavior in your diagram, the main ingredients are:
define a private std::mutex, a std::condition_variable, and possibly a thread-safe queue;
in CreateVisitors(), push a1, a2, a3, a4 onto the visitors_ list;
create a thread method for each a_i: a1_thread(), a2_thread() and a3_thread() run immediately, while a4_thread() waits (until signalled via notify_one()) for a1 and a2 to complete, possibly using the results of gendata() that they have pushed to the queue;
fill in mediate():
std::vector<std::thread> threadv;
threadv.emplace_back(&ConcreteMediator::a1_thread, this);
// ... likewise for a2_thread and a3_thread
threadv.emplace_back(&ConcreteMediator::a4_thread, this);
std::for_each(threadv.begin(), threadv.end(), std::mem_fn(&std::thread::join));
This is a pretty unsophisticated setup, but it certainly maintains the loose coupling (Mediator) and scalability (Visitor) that you desire.
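To make the waiting part concrete, here is a minimal sketch of such a concrete mediator, assuming the agents can be reduced to plain member functions. The Visitor plumbing is left out, and the names log, done_ and prerequisite() are hypothetical; a1 and a2 are folded into one function that takes the agent id.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the concrete mediator:
// a1, a2 and a3 start at once, a4 waits until a1 and a2 are done.
class ConcreteMediator {
public:
    std::vector<int> log;  // ids of the agents in completion order (guarded by m_)

    void mediate() {
        std::vector<std::thread> threadv;
        threadv.emplace_back(&ConcreteMediator::prerequisite, this, 1);  // a1
        threadv.emplace_back(&ConcreteMediator::prerequisite, this, 2);  // a2
        threadv.emplace_back(&ConcreteMediator::a3_thread, this);
        threadv.emplace_back(&ConcreteMediator::a4_thread, this);
        for (std::thread& t : threadv) t.join();
    }

private:
    void prerequisite(int id) {
        {
            std::lock_guard<std::mutex> lock(m_);
            log.push_back(id);  // the agent's real work would go here
            ++done_;
        }
        cv_.notify_one();  // a4 is the only waiter
    }
    void a3_thread() {
        std::lock_guard<std::mutex> lock(m_);
        log.push_back(3);
    }
    void a4_thread() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return done_ == 2; });  // blocks until a1 and a2 finished
        log.push_back(4);
    }

    std::mutex m_;
    std::condition_variable cv_;
    int done_ = 0;  // number of finished prerequisites of a4
};
```

Spawning four threads on every frame is probably too slow for a game loop, so in practice the same wait/notify logic would more likely sit on top of a persistent thread pool.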
I'm building a publish-subscribe class (called SystemInterface), which is responsible for receiving updates from its instances and publishing them to subscribers.
Adding a subscriber callback function is trivial and has no issues, but removing it yields an error, because std::function is not comparable in C++.
std::vector<std::function<void()>> subs;
void subscribe(std::function<void()> f) {
    subs.push_back(f);
}
void unsubscribe(std::function<void()> f) {
    std::remove(subs.begin(), subs.end(), f); // Error
}
I've narrowed it down to five solutions to this error:
1. Register the function using a weak_ptr, where the subscriber must keep the returned shared_ptr alive. Solution example at this link.
2. Instead of registering in a vector, map the callback function by a custom key, unique per callback function. Solution example at this link
3. Use a vector of function pointers. Example
4. Make the callback function comparable by utilizing the address.
5. Use an interface class (parent class) to call a virtual function. In my design, all intended classes inherit from a parent class called ServiceCore, so instead of registering a callback function, I could just register a ServiceCore reference in the vector.
Note that the SystemInterface class has a per-instance ID field, which is managed by ServiceCore and supplied to SystemInterface when a ServiceCore child instance is constructed.
From my perspective, the first solution is neat and would work, but it requires handling on the subscriber side, which I would rather avoid.
The second solution would make my implementation more complex. My implementation currently looks like this:
using namespace std;

enum INFO_SUB_IMPORTANCE : uint8_t
{
    INFO_SUB_PRIMARY, // Only gets the important updates.
    INFO_SUB_ALL      // Gets all updates
};

using CBF = function<void(string,string)>;
using REQINF_SUBS = map<string, INFO_SUBTREE>; // Keyed by an iterator; explaining that is outside the scope of this question.
using INFSRC_SUBS = map<string, INFO_SUBTREE>;

REQINF_SUBS infoSubrs;
INFSRC_SUBS sourceSubrs;
WILD_SUBS wildSubrs;

void subscribeInfo(string info, INFO_SUB_IMPORTANCE imp, CBF f) { /* ... */ }
void subscribeSource(string source, INFO_SUB_IMPORTANCE imp, CBF f) { /* ... */ }
void subscribeWild(INFO_SUB_IMPORTANCE imp, CBF f) { /* ... */ }
The second solution would require INFO_SUBTREE to be an extended map, but can be keyed by an ID:
using KEY_T = uint32_t; // or string...
For the third solution, I'm not aware of the limitations of using function pointers, nor of the consequences of the fourth solution.
The fifth solution would eliminate the need to deal with CBFs, but it would be more complex on the subscriber side: a subscriber has to override the virtual function and so receives all updates in one place, which in turn requires filtering on the message ID and directing the payload to the intended routines using multiple if/else blocks, whose number grows with the number of subscriptions.
What I'm looking for is advice on the best available option.
Regarding your proposed solutions:
1. That would work. It can be made easy for the caller: have subscribe() create the shared_ptr and corresponding weak_ptr objects, and let it return the shared_ptr.
2. Then the caller must not lose the key. In a way this is similar to the above.
3. This of course is less generic, and then you can no longer have (the equivalent of) captures.
4. You can't: there is no way to get the address of the function stored inside a std::function. You can do &f inside subscribe() but that will only give you the address of the local variable f, which will go out of scope as soon as you return.
5. That works, and is in a way similar to 1 and 2, although now the "key" is provided by the caller.
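As a rough illustration of the first point (subscribe() creating the shared_ptr and keeping only a weak_ptr), something along these lines should work. WeakPublisher and publish() are names I made up for the sketch; they are not part of the question's SystemInterface.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using Callback = std::function<void()>;

// Hypothetical publisher: a subscription lives exactly as long as the
// shared_ptr token that subscribe() hands back to the caller.
class WeakPublisher {
public:
    std::shared_ptr<Callback> subscribe(Callback f) {
        auto token = std::make_shared<Callback>(std::move(f));
        subs_.push_back(token);  // converted to weak_ptr: the publisher does not keep it alive
        return token;
    }

    void publish() {
        // Drop subscriptions whose token has been destroyed.
        subs_.erase(std::remove_if(subs_.begin(), subs_.end(),
                                   [](const std::weak_ptr<Callback>& w) { return w.expired(); }),
                    subs_.end());
        for (const auto& weak : subs_) {
            if (auto cb = weak.lock()) (*cb)();
        }
    }

private:
    std::vector<std::weak_ptr<Callback>> subs_;
};
```

Unsubscribing is then just a matter of resetting or destroying the returned token; the sketch does not handle callbacks that subscribe from inside publish().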
Options 1, 2 and 5 are similar in that there is some other data stored in subs that refers to the actual std::function: either a std::shared_ptr, a key or a pointer to a base class. I'll present option 6 here, which is kind of similar in spirit but avoids storing any extra data:
Store a std::function<void()> directly, and return the index in the vector where it was stored. When removing an item, don't std::remove() it, but just set it to nullptr. Next time subscribe() is called, it checks if there is an empty element in the vector and reuses it:
std::vector<std::function<void()>> subs;
std::size_t subscribe(std::function<void()> f) {
    if (auto it = std::find(subs.begin(), subs.end(), nullptr); it != subs.end()) {
        *it = f;
        return std::distance(subs.begin(), it);
    } else {
        subs.push_back(f);
        return subs.size() - 1;
    }
}
void unsubscribe(std::size_t index) {
    subs[index] = nullptr;
}
The code that actually calls the functions stored in subs must now of course first check for empty (nullptr) entries. The above works because nullptr is treated as the "empty" function, and there is an operator==() overload that can check a std::function against nullptr, thus making std::find() work.
One drawback of option 6 as shown above is that a std::size_t is a rather generic type. To make it safer, you might wrap it in a class SubscriptionHandle or something like that.
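A minimal sketch of what such a handle could look like, assuming the vector lives inside a publisher class. SlotPublisher, publish() and the exact shape of SubscriptionHandle are hypothetical, just one way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical strongly typed wrapper, so that an arbitrary std::size_t
// cannot be passed to unsubscribe() by accident.
class SubscriptionHandle {
public:
    explicit SubscriptionHandle(std::size_t index) : index_(index) {}
    std::size_t index() const { return index_; }
private:
    std::size_t index_;
};

class SlotPublisher {
public:
    SubscriptionHandle subscribe(std::function<void()> f) {
        for (std::size_t i = 0; i < subs_.size(); ++i) {
            if (!subs_[i]) {  // reuse a slot emptied by unsubscribe()
                subs_[i] = std::move(f);
                return SubscriptionHandle(i);
            }
        }
        subs_.push_back(std::move(f));
        return SubscriptionHandle(subs_.size() - 1);
    }

    void unsubscribe(SubscriptionHandle h) { subs_.at(h.index()) = nullptr; }

    void publish() const {
        for (const auto& f : subs_) {
            if (f) f();  // skip empty slots
        }
    }

private:
    std::vector<std::function<void()>> subs_;
};
```

The handle still does not protect against using a stale handle after its slot has been reused; a generation counter stored next to the index would be one way to catch that.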
As for the best solution: option 1 is quite heavy-weight. Options 2 and 5 are very reasonable, but 6 is, I think, the most efficient.
What is the cleaner way of extracting predicates which will have multiple uses: methods or class fields?
The two examples:
1. Class field
void someMethod() {
    IntStream.range(1, 100)
        .filter(isOverFifty) // ... rest of the pipeline
}
private IntPredicate isOverFifty = number -> number > 50;
2. Method
void someMethod() {
    IntStream.range(1, 100)
        .filter(isOverFifty()) // ... rest of the pipeline
}
private IntPredicate isOverFifty() {
    return number -> number > 50;
}
For me, the field way looks a little bit nicer, but is this the right way? I have my doubts.
Generally you cache things that are expensive to create and these stateless lambdas are not. A stateless lambda will have a single instance created for the entire pipeline (under the current implementation). The first invocation is the most expensive one - the underlying Predicate implementation class will be created and linked; but this happens only once for both stateless and stateful lambdas.
A stateful lambda will use a different instance for each element and it might make sense to cache those, but your example is stateless, so I would not.
If you still want that (for readability, I assume), I would put it in a utility class, say Predicates. It would then be reusable across different classes as well, something like this:
public final class Predicates {
    private Predicates() {
    }
    public static IntPredicate isOverFifty() {
        return number -> number > 50;
    }
}
You should also note that using Predicates.isOverFifty inside a Stream and using x -> x > 50, while semantically the same, will have different memory usage.
In the first case, only a single instance (and class) will be created and served to all clients, while the second (x -> x > 50) will create not only a different instance, but also a different class for each of its clients (think of the same expression used in different places inside your application). This happens because the linkage happens per CallSite, and in the second case the CallSite is always different.
But that is something you should not rely on (or probably even consider). These objects and classes are fast to build and fast for the GC to remove, so use whatever fits your needs.
To answer this, it helps to expand those lambda expressions into old-fashioned Java. You can then see that these are the two forms we already use in our code, so the answer is that it all depends on how you write a particular code segment.
private IntPredicate isOverFifty = new IntPredicate() {
    public boolean test(int number) {
        return number > 50;
    }
};
private IntPredicate isOverFifty() {
    return new IntPredicate() {
        public boolean test(int number) {
            return number > 50;
        }
    };
}
1) In the field case, a predicate is always allocated for each new object. That is not a big deal if you have only a few instances, like a service. But if this is a value object of which there can be N instances, it is not a good solution. Also keep in mind that someMethod() may not be called at all. One possible solution is to make the predicate a static field.
2) In the method case, the predicate is created once for every someMethod() call, and the GC will discard it afterwards.
I have an Entity-Component System.
Components : classes with data, but no complex functions
Entity : integer + list of Components (<=1 instance per type per entity)
Systems : a lot of functions with minimal data, doing complex game logic
Here is a sample. A bullet entity pewpew is exploded, so it will be set to be invisible (at #2):-
class Component_Projectile : public ComponentBase{
    public: GraphicObject* graphicObject=nullptr; //#1
    public: Entity whoCreateMe=nullptr;
    public: Entity targetEnemy=nullptr;
    //.... other fields ....
};
class System_Projectile : public SystemBase {
    public: void explodeDamage(Entity pewpew){ //called every time-step
        Pointer<Component_Projectile> comPro=pewpew;
        comPro->graphicObject->setVisible(false); //#2
        //.... generate some cool particles, damage surrounded object, etc
    }
    //.... other functions ....
};
It works OK.
New Version
Half a year later, I realized that my architecture looks inconsistent.
I cache all game-logic objects using Entity.
But I cache physics objects and graphic objects by direct pointer (e.g. #1).
I have a crazy idea :
Physics objects and graphic objects should also be game-logic objects!
They should be encapsulated into Entity.
Now the code will be (the changes are marked with #1 and #2):-
class Component_Projectile : public ComponentBase{
    public: Entity graphicObject=nullptr; //#1
    public: Entity whoCreateMe=nullptr;
    public: Entity targetEnemy=nullptr;
    //.... other fields ....
};
class System_Projectile : public SystemBase {
    public: void explodeDamage(Entity pewpew){ //called every time-step
        Pointer<Component_Projectile> comPro=pewpew;
        system_graphic->setVisible(comPro->graphicObject,false); //#2
        //.... generate some cool particles, damage surrounded object, etc
    }
    //.... other functions ....
};
After playing with it for a week, I can summarize the pros and cons of this approach as below :-
Advantages
1. Drastically reduced coupling between game logic and the graphics/physics engine
All graphics-specific functions are now encapsulated inside 1-3 systems.
No other game system (e.g. System_Projectile) refers to a hard-coded type such as GraphicObject.
2. Dramatically increased flexibility in design
The old vanilla graphic object is not just a graphic object anymore!
I can change it to something else, and especially add special features that are totally insane / too specific for the physics/graphics engine, e.g.
rainbow blinking graphic
strange gravity, magnet
swap in/out many physics-object types in the same Entity (not sure)
3. Reduced compile time
It has accidentally become a pimpl idiom.
For example, System_Projectile doesn't have to #include "GraphicObject.h" any more.
Disadvantages
1. I have to encapsulate many of the graphic/physics objects' functions.
For example,
is implemented as
public: void setVisible(Entity entity,bool visible){
It is tedious, but not very hard work.
It can be partially alleviated by <...>.
2. Performance is (only) a little bit worse.
It needs a few additional indirections.
3. Code is less readable.
The new version is harder to read.
comPro->graphicObject->setVisible(false); //old version
system_graphic->setVisible(comPro->graphicObject,false); //new version
4. Type information is lost, so Ctrl+Space (auto-completion) is less usable
In the old version, I could easily press Ctrl+Space in this code :-
comPro->graphicObject-> (ctrl+space)
It is now harder. I have to think about which system I want to call.
In most places in the code, the advantages outweigh the disadvantages, so I decided to use the new version.
How to alleviate the disadvantages, especially number 3 and 4?
Design-pattern? C++ magic?
I may use Entity-Component System in a wrong way. (?)
I am migrating a project that ran on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution for a single-writer, multiple-readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit the usual acquire-release ordering:
class RWSync {
    std::atomic<int> version; // incremented after every modification
    std::atomic_bool invalid; // true during write
public:
    RWSync() : version(0), invalid(0) {}
    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            do { // wait until the object is valid
                currentVersion = version.load(std::memory_order_acquire);
            } while (invalid.load(std::memory_order_acquire));
            lambda();
            // check if something changed
        } while (version.load(std::memory_order_acquire) != currentVersion
            || invalid.load(std::memory_order_acquire));
    }
    void beginWrite() {
        invalid.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    void endWrite() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        version.fetch_add(1, std::memory_order_release);
        invalid.store(false, std::memory_order_release);
    }
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free, and does it work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea as the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid), you may use a single version variable with the semantics "odd values are invalid". This is known as a sequence lock ("seqlock") mechanism.
Reducing number of atomic variables simplifies things a lot:
class RWSync {
    // Incremented before and after every modification.
    // Odd values mean that the object is in an invalid state.
    std::atomic<int> version;
public:
    RWSync() : version(0) {}
    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            currentVersion = version.load(std::memory_order_seq_cst);
            // This may reduce calls to lambda(), nothing more
            if(currentVersion | 1) continue;
            lambda();
            // Repeat while something changed or the object is in an invalid state.
        } while ((currentVersion | 1) ||
            version.load(std::memory_order_seq_cst) != currentVersion);
    }
    void beginWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Invalidation requires sequential order
        version.store(currentVersion + 1, std::memory_order_seq_cst);
    }
    void endWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Release order is sufficient for marking an object as valid
        version.store(currentVersion + 1, std::memory_order_release);
    }
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all previous modifications of the object have been completed. Release memory order is sufficient for that.
beginWrite() makes sure that a reader will detect that the object is in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order. Because of that, the reader uses seq_cst memory order too.
As for fences, it is better to incorporate them into the preceding/following atomic operation: the compiler knows how to make the result fast.
Explanations of some modifications of the original code:
1) An atomic modification like fetch_add() is intended for cases where concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific mechanisms.
An atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which differentiate between load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provides a total order between these accesses.
The accepted answer is not correct. I guess the code should be something like "currentVersion & 1" instead of "currentVersion | 1". A subtler mistake is that a reader thread can enter lambda() and, after that, the writer thread can run beginWrite() and write to the non-atomic payload. In this situation, the write to the payload and the read of the payload have no happens-before relationship, and concurrent access (without a happens-before relationship) to a non-atomic variable is a data race. Note that the single total order of memory_order_seq_cst does not imply a happens-before relationship; the two are consistent, but they are different things.
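One common way around the data race described above is to make the payload itself atomic and access it with relaxed operations, combined with fences. The following is only a sketch under that assumption, with a made-up two-int payload (Pair) and a made-up class name (SeqLockedPair); it is not the code from the question or the accepted answer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload: two ints that must always be seen as a consistent pair.
struct Pair { int a; int b; };

class SeqLockedPair {
public:
    // Must only be called from the single writer thread. Never blocks.
    void write(Pair p) {
        unsigned v = version_.load(std::memory_order_relaxed);
        version_.store(v + 1, std::memory_order_relaxed);     // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);  // odd version is ordered before the payload stores
        a_.store(p.a, std::memory_order_relaxed);
        b_.store(p.b, std::memory_order_relaxed);
        version_.store(v + 2, std::memory_order_release);     // even: payload is consistent again
    }

    // May be called from any number of reader threads; retries until the
    // version was even and unchanged around the payload reads.
    Pair read() const {
        for (;;) {
            unsigned before = version_.load(std::memory_order_acquire);
            Pair p{a_.load(std::memory_order_relaxed), b_.load(std::memory_order_relaxed)};
            std::atomic_thread_fence(std::memory_order_acquire);  // payload loads are ordered before the re-check
            unsigned after = version_.load(std::memory_order_relaxed);
            if (before == after && (before & 1u) == 0) return p;
        }
    }

private:
    std::atomic<unsigned> version_{0};
    std::atomic<int> a_{0};
    std::atomic<int> b_{0};
};
```

Because every payload access is atomic, a reader that overlaps with a write no longer has undefined behaviour; it may read a torn pair, but the version check then rejects it and the read is retried.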
We have people who run code for simulations, testing etc. on some supercomputers that we have. What would be nice is if, as part of a build process, we could check not only that the code compiles but also that the output matches some pattern which indicates that we are getting meaningful results.
For example, the researcher may know that the value of x must be within some bounds. If it is not, then a logical error has been made in the code (assuming it compiles and there is no compile-time error).
Are there any pre-written packages for this kind of thing? The code is written in FORTRAN, C, C++ etc.
Any specific or general advice would be appreciated.
I expect most unit testing frameworks could do this; supply a toy test data set and see that the answer is sane in various different ways.
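Even without a framework, the idea can be sketched in a few lines of C++. Everything here (Result, Bounds, findViolations) is a hypothetical name for illustration; the results would come from running the simulation on a small, known data set, and the bounds from the researcher.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical output value of one run on a toy data set.
struct Result { std::string name; double value; };

// Hypothetical expected range supplied by the researcher.
struct Bounds { std::string name; double min; double max; };

// Returns the names of all checked quantities that are missing,
// not finite, or outside their expected range.
std::vector<std::string> findViolations(const std::vector<Result>& results,
                                        const std::vector<Bounds>& bounds) {
    std::vector<std::string> bad;
    for (const Bounds& b : bounds) {
        bool ok = false;
        for (const Result& r : results) {
            if (r.name == b.name) {
                ok = std::isfinite(r.value) && r.value >= b.min && r.value <= b.max;
                break;
            }
        }
        if (!ok) bad.push_back(b.name);
    }
    return bad;
}
```

A small driver that prints the returned names and exits with a non-zero status when the list is not empty is enough for a build system to treat the sanity check as a failed step.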
A good way to ensure that the resulting value of any computation (whether final or intermediate) meets certain constraints is to use an object-oriented programming language like C++, and define data types that internally enforce the conditions that you are checking for. You can then use those data types as the return value of any computation to ensure that said conditions are met for the value returned.
Let's look at a simple example. Assume that you have a member function inside an Airplane class, as part of a flight control system, that estimates the mass of the airplane instance as a function of the number of passengers and the amount of fuel that the plane has at that moment. One way to declare the Airplane class and an airplaneMass() member function is the following:
class Airplane {
public:
    int airplaneMass() const; // note the plain int return type
};
However, a better way to implement the above would be to define a type AirplaneMass that can be used as the function's return type instead of int. AirplaneMass can internally ensure (in its constructor and any overloaded operators) that the value it encapsulates meets certain constraints. An example implementation of the AirplaneMass data type could be the following:
class AirplaneMass {
public:
    // AirplaneMass constructor
    AirplaneMass(int m) {
        if (m < MIN || m > MAX) {
            // throw exception or log constraint violation
        }
        // if the value of m meets the constraints,
        // assign it to the internal value.
        mass_ = m;
    }
    /* range checking should also be done in the implementation
       of overloaded operators. For instance, you may want to
       make sure that the resultant of the ++ operation for
       any instance of AirplaneMass also lies within the
       specified constraints. */
private:
    int mass_;
};
Thereafter, you can redeclare class Airplane and its airplaneMass() member function as follows:
class Airplane {
public:
    AirplaneMass airplaneMass() const;
    // note the more specific AirplaneMass return type
};
The above will ensure that the value returned by airplaneMass() is between MIN and MAX. Otherwise, an exception will be thrown, or the error condition will be logged.
I had to do that for conversions this month. I don't know if it will help you, but it seemed quite a simple solution to me.
First, I defined a tolerance level. (Java-ish example code...)
private static final double TOLERANCE = 0.000000000001D;
Then I defined a new "areEqual" method which checks whether the difference between both values is lower than the tolerance level.
private static boolean areEqual(double a, double b) {
    return (Math.abs(a - b) < TOLERANCE);
}
If I get a false somewhere, it means the check has probably failed. I can adjust the tolerance to see if it's just a precision problem or really a bad result. Works quite well in my situation.
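Since the code in the question is FORTRAN/C/C++, the same check might look roughly like this in C++. The relative tolerance is my own addition, not part of the Java version above; it is there because a fixed absolute tolerance becomes too strict for values of large magnitude. The default tolerances are arbitrary placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b agree within an absolute tolerance,
// or within a tolerance relative to the larger of the two magnitudes.
bool areEqual(double a, double b, double absTol = 1e-12, double relTol = 1e-9) {
    const double diff = std::fabs(a - b);
    if (diff <= absTol) return true;  // handles values close to zero
    return diff <= relTol * std::max(std::fabs(a), std::fabs(b));
}
```

If either argument is NaN, every comparison is false and the function returns false, which is usually what a sanity check wants.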