I am trying to use MPI Distributed Simulation feature of NS-3.
I have implemented an application and a node class in my module.
I also have a factory class as a singleton object. Do I have to consider synchronization using monitors and Mutex in my singleton class?
In some of the functions I am changing class variables and therefore it looks like that I should consider thread safety but I am not sure how MPI works and if it actually creates one instance of the object or it creates separate objects in every process.
Thanks
The ns-3 MPI support distributes ns-3 nodes across mpi computing nodes so, if you have one process-level factory singleton, there will be one instance of this factory on each mpi processing node and it will not share its state with other instances of the factory on other nodes.
In general, it is considered a really bad idea to use global state (i.e., state that is shared between node instances) with MPI-based simulations.
Related
I'm new to Dask
what i'm trying to find is "shared array between processes and it needed to be writable by any proccess"
could someone can show me that?
Top
a way to implement shared writable array in dask
Dask's internal abstraction is a DAG, a functional graph in which it is assumed that tasks act the same should you rerun them ("functionally pure"), since it's always possible that a task runs in two places, or that a worker which holds a task's output dies.
Dask does not, therefore, support mutable data structures as task inputs/outputs normally. However, you can execute tasks that create mutation as a side-effect, such as any of the functions that write to disk.
If you are prepared to set up your own shared memory and pass around handles to this, there is nothing stopping you from making functions that mutate that memory. The caveats around tasks running multiple times hold, and you would be on your own. There is no mechanism currently to do this kind of thing for you, but it is something I personally intend to investigate within the next few months.
According to the MPI implementation of Storm the workers manage connections to other workers and maintain a mapping from task to task. Also, transferring takes in a task id and a tuple, and it serializes the tuple and puts it onto a "transfer queue”.
The question is, if there is a way to organise scheduling, such that certain tasks of an operator communicate to only certain tasks of the following operator at a given time according to the application’s topology (could ZeroMQ possibly do something like this?).
Q : "If there is a way to organise scheduling, such that certain tasks of an operator communicate to only certain tasks of the following operator at a given time according to the application’s topology ( could ZeroMQ possibly do something like this? )."
Obviously could,it does allow smart & flexible creation of signalling/messaging meta-plane(s) infrastructure(s) for the distributed-computing, improving itself in doing this for about the last 12+ years.
The #HristoIlliev attached comment's URL details that Apache-Storm itself reports to already use the ZeroMQ-layer for its own services *[in ver.0.8.0, almost all implementation (source-code) links unfortunately already dead there]:
The implementation for distributed mode uses ZeroMQ code
The implementation for local mode uses in-memory Java queues (so that it's easy to use Storm locally without needing to get ZeroMQ installed) code
...
Tasks listen on an in-memory ZeroMQ port for messages from the virtual port code
So the topology-related part of your question is related to the decision already made on this subject in the "outer" Apache-Storm architecture, that was done.
Tasks are responsible for message routing. A tuple is emitted either to a direct stream (where the task id is specified) or a regular stream. In direct streams, the message is only sent if that bolt subscribes to that direct stream. In regular streams, the stream grouping functions are used to determine the task ids to send the tuple to.
The MPI does the same for the HPC-focused computing ecosphere, since FORTRAN jobs started to run on first HPC distributed infrastructures. Due to the most of the HPC-computing problems were "simply" scaled onto larger footprints of the computing hardware, the MPI focus was more on efficiency of such uniform scaling, not visiting thus the opposite corner of adaptive, almost ad-hoc setup of message-passing infrastructure, layered topologies of specialised ZeroMQ Scalable Formal Communication Archetypes Patterns, so each of the tools focus on other factors.
If you feel you want to read a bit more on ZeroMQ, this answer might help to fast understand the core underlying concepts.
Assuming
I am building a stateless micro-service which exposes a few simple API endpoints (e.g., a read-through cache, save an entry to database),
I am using the non-blocking database clients e.g., mysql or redis and
I always want my microservices to speak to each other via HTTP (by placing EC2 instances behind a load balancer)
Questions
When will I want to use more than 1 standard verticles (i.e., write the whole microservice as a single verticle and deploy n instances of it (n = num of event-loop threads))?. Won't adding more verticles only add to serialization and context-switching costs?
Lets say I split the microservice into multiple standard verticles (for whatever reason). Wouldn't deploying n (n = num of event-loop threads) instances of each always give better performance than deploying a different ratio of instances. As each verticle is just a listener on an address and it will mean every event-loop thread can handle all kinds of messages and they are load balanced already.
When will I want to run my application in cluster mode? Based on docs, I get the feeling that cluster mode makes sense only when you have multiple verticles and that too when you have an actual use-case for clustering e.g., different EC2 instances handle requests for different users to help with data locality (say using ignite)
P.S., please help if even if you can answer one of the above questions.
I always want my microservices to speak to each other via HTTP (by placing EC2 instances behind a load balancer)
It doesn't make much sence to use Vertx if you already went for this overcomplicated approach.
Vertx is using Event Bus for in-cluster communication, eliminating the need for HTTP as well as LB in front.
Answers:
Why should it? If verticles are not talking to each other, where the serialization overhead should occur?
If your verticles are using non-blocking calls (and thus are multithreded), you won't see any difference between 1 or N instances on the same machine. Also if your verticle starts a (HTTP) server over a certain port, then the all instances will share that single server accross all thread (vertx is doing some magic reroutings here)
Cluster mode is the thing which I mentioned in the beginning. This is the proper way to distribute and scale you microservices.
A verticle is a way to structure your code. So, you'd want verticle of another type probably when your main verticle grows too big. How big? That depends on your preferences. I try to keep them rather small, about 200 LOC at the most, to do one thing.
Not necessarily. Different verticles can perform very different tasks, at different paces. Having N instances of all of them is not necessarily bad, but rather redundant.
Probably never. Clustered mode was a thing before microservices. Using it adds another level of complexity (cluster manager, for example Hazelcast), and it also means you cannot be as polyglot.
I want to know the applicability of the Akka Actor model.
I know it is useful in the case a huge number of Actor instances are created and destroyed. e.g. a call server, where every incoming call creates an actor instance and communicates with few other actors and get killed after the call is over.
Is it also useful in the following scenario :
A server has a few processing elements (10~50) implemented over Actors. The lifetime of these processing elements is infinite. some of them do not maintain state and a few maintain state. The processing elements process the message and pass the message to other actors in a fixed manner. The system receives a huge number of messages from outside and gets passed through processing elements and goes out of the system.
My gut feeling is that we cannot get any advantage by using Akka Actor model and even implementing this server in Scala. Because the use case for which Akka is designed, is not applicable here. If the scale-up meant that processing elements be increased dynamically then it would be applicable.
For fixed topologies, I think if i implement it in Java, it is going to be more beneficial in terms of raw performance. The 'immutability' feature of Scala leads to more copies and so reduces performance. So i believe i better stick to Java.
Is my understanding correct? I a nut shell i want to know why i should leave Java and use Scala/Akka for the application scenario above. and my target is to process 1 million messages per second.
If this question is still actual...
Scala vs. Java
Scala gives productivity to developers.
Immutability decreases debugging to almost zero level.
GC perfectly copes with waste immutables.
Akka Actors vs. other means
Akka has dispatcher that distributes all tasks across fixed thread pool. This allows to evenly consume available resources. This approach is much better than the fixed worker threads — the processing resources are provided to the tasks not DataFlow nodes.
DataFlow implementation
There is a SynapseGrid library that is built on top of Akka Actors and allows easy construction of DataFlow systems distributed over fixed immortal Actors. It can even draw the DataFlow diagram (in .dot format) of the whole system.
(The library is more convenient to be used with Scala.)
I'm wondering if there's an approved practice in a multi-threaded app. Should I have one DAO per thread or simply make one DAO a thread safe singleton.
This really depends a lot on the mechanism you're using for data access. If you have a very scalable data access, and lots of threads, using some form of thread static data access can be advantageous.
If you don't have scalable data access, your provider doesn't support multiple threads per process, or you just don't need the scalability at that point, using a singleton with appropriate synchronization is simpler and easier to implement.
For most business style applications, I personally think the singleton approach is easier to maintain, and probably better - if for no other reason than it's much, much easier to test effectively. Having multiple threads for data access is likely not required, as the data access is probably not going to be a bottleneck that effects usability (if you design correctly, and batch requests appropriately).
Use the approach that best suits your application architecture, unless:
1) Your data access objects are expensive to create, in which case you should lean toward a thread-safe singleton.
2) Your objects maintain mutable state, as in the Active Record pattern. (Immutable DAO configuration state, like timeout thresholds, doesn't count.)