Actors: Thread based vs Event driven - actor

Reading about actor based programming, I came across thread based actors and event driven actors.
What's the difference between two? And when to use which?

The Actor Model itself is a concurrency specification. From the Actor Model perspective, there is no concept of "events", "threads" or a "processes", only Actors and messages.
This is actually one of the benefits of the Actor model in that it takes a different approach to concurrency that provides greater isolation. Any shared information must be exchanged via messages and the processing of a particular message is atomic, so no special concurrency synchronization techniques are required during execution of an Actor's message handler.
By contrast, threads run in a shared memory space, which requires careful coordination of memory updates (via mutexes and locks) or simulated exchange of information (via queues... with locks or mutexes). Processes implement a concurrency model that is more similar to Actors in that they provide more isolation, but do not have native support for message exchange so the information exchange must often be constructed and managed within the process via pipes or sockets and corresponding protocols and resource management; Actors provide this message exchange functionality "out of the box" as it were.
Practically speaking, most Operating Systems provide the "process" and "thread" concurrency implementations instead of Actors, so the Actor framework you utilize will need to map your Actors onto the operating system's underlying concurrency model. Some do this by implementing each Actor as a separate thread, some by implementing Actors as separate processes, some by using one or more threads and iterating through the set of Actors to pass messages. Even event-driven implementations such as Beam (the Erlang VM) will need to perform this mapping if they wish to utilize multiple processor contexts and there is an Operating System between the VM and the hardware.
When you write your Actors, you are writing to that concurrency abstraction and should not be concerned with any concurrency model remapping handled by your Actor Framework: that is the purpose and the purview of the framework.
Unfortunately, since the OS manages the system resources, and the OS's concurrency model (along with your language) is usually built around a thread and/or process model, you will need to consider the effect of the mapping when working with system resources. For example, if your Actor is written in C and will be making a blocking read() syscall on a file or socket, an event driven model will cause all Actors to halt, whereas a thread- or process-based Actor model should allow the other Actors to continue processing while the other Actor is blocked on the read() call. I this same scenario but using CPython (for example), both an event-driven and a threaded Actor model would still block all other Actors due to the CPython GIL, but a process-based Python Actor model would still be able to allow concurrent Actor execution. Many Actor Frameworks will provide additional functionality to help maintain the Actor model abstractions when using system resources, so you will need to check the details of your Framework when writing Actors that interact with OS-managed resources (especially in a blocking manner).
Some Actor Frameworks are capable of changing the implementation mode based on startup configuration parameters (e.g. running as event-driven, threaded, or process-based depending on a start parameter). This can be very useful; for example, simple testing using a cooperative, single-threaded implementation allows repeatable and understandable test results while the production application still runs with full concurrency and no modifications.
In general, when writing Actors you should try to use Actor-based concurrency practices and migrate any external interfaces that might be affected by the underlying implementation to the edge of your design where they can be easily redesigned if needed.

Related

Is there a built-in concurrency control in boost::interprocess::message_queue?

In a multiple producer-single consumer setting, is there a built-in concurrency control mechanism among the producer processes that are calling send() on the message_queue?
What happens if there are multiple instances of the consumer processes all calling receive() on the message_queue?
To the question whether message_queue is thread-safe, the answer is yes.
message_queue is implemented in terms of shared_memory_object using some control structures (ipcdetail::mq_hdr_t) which include synchronization primitives (like interprocess_mutex).
These are used to prevent unguarded concurrent access.

What does JMS Session single-threadedness mean?

What is the exact nature of the thread-unsafety of a JMS Session and its associated constructs (Message, Consumer, Producer, etc)? Is it just that access to them must be serialized, or is it that access is restricted to the creating thread only?
Or is it a hybrid case where creation can be distinguished from use, i.e. one thread can create them only and then another thread can be the only one to use them? This last possibility would seem to contradict the statement in this answer which says "In fact you must not use it from two different threads at different times either!"
But consider the "Server Side" example code from the ActiveMQ documentation.
The Server class has data members named session (of type Session) and replyProducer (of type MessageProducer) which are
created in one thread: whichever one invokes the Server() constructor and thereby invokes the setupMessageQueueConsumer() method with the actual creation calls; and
used in another thread: whichever one invokes the onMessage() asynchronous callback.
(In fact, the session member is used in both threads too: in one to create the replyProducer member, and in the other to create a message.)
Is this official example code working by accident or by design? Is it really possible to create such objects in one thread and then arrange for another thread to use them?
(Note: in other messaging infrastructures, such as Solace, it's possible to specify the thread on which callbacks occur, which could be exploited to get around this "thread affinity of objects" restriction, but no such API call is defined in JMS, as far as I know.)
JMS specification says a session object should not be used across threads except when calling Session.Close() method. Technically speaking if access to Session object or it's children (producer, consumer etc) is serialized then Session or it's child objects can be accessed across threads. Having said that, since JMS is an API specification, it's implementation differs from vendor to vendor. Some vendors might strictly enforce the thread affinity while some may not. So it's always better to stick to JMS specification and write code accordingly.
The official answer appears to be a footnote to section 4.4. "Session" on p.60 in the JMS 1.1 specification.
There are no restrictions on the number of threads that can use a Session object or those it creates. The restriction is that the resources of a Session should not be used concurrently by multiple threads. It is up to the user to insure that this concurrency restriction is met. The simplest way to do this is to use one thread. In the case of asynchronous delivery, use one thread for setup in stopped mode and then start asynchronous delivery. In more complex cases the user must provide explicit synchronization.
Whether a particular implementation abides by this is another matter, of course. In the case of the ActiveMQ example, the code is conforming because all inbound message handling is through a single asynchronous callback.

Can Akka actors and Java8 CompletableFuture's be safely combined?

As far as I understood the Akka documentation, an Akka ActorSystem contains its own thread pool to execute actors. I use Akka in a Java application that also uses Java8 futures; the latter are executed by the ForkJoinPool.commonPool(). So, actors and futures use different pools, which may defeat certain assumptions hidden in the two schedulers (e.g. the Akka scheduler might assume that futures are run on the Akka pool). Could this create any performance problems?
There are no hidden assumptions concerning the execution of Actors and Futures: the only guarantee that we give is that any given Actor is only executed on at most one thread at any given time. Futures observe no such restrictions, they run whenever the ExecutionContext (or ThreadPool) decides to run them.
You will of course have to observe all the same caveats when combining Actors with Java8 Futures that also apply to Scala Futures, see the docs. In particular, never touch anything (no fields, no methods) of an Actor from within a Future task or callback. Only the ActorRef is safe.

boost.asio - do i need to use locks if sharing database type object between different async handlers?

I'm making a little server for a project, I have a log handler class which contains a log implemented as a map and some methods to act on it (add entry, flush to disk, commit etc..)
This object is instantiated in the server Class, and I'm passing the address to the session so each session can add entries to it.
The sessions are async, the log writes will happen in the async_read callback. I'm wondering if this will be an issue and if i need to use locks?
The map format is map<transactionId map<sequenceNum, pair<head, body>>, each session will access a different transactionId, so there should be no clashes as far as i can figure. Also hypothetically, if they were all writing to the same place in memory -- something large enough that the operation would not be atomic; would i need locks? As far as I understand each async method dispatches a thread to handle the operation, which would make me assume yes. At the same time I read that one of the great uses of async functions is the fact that synchronization primitives are not needed. So I'm a bit confused.
First time using ASIO or any type of asynchronous functions altogether, and i'm not a very experienced coder. I hope the question makes sense! The code seems to run fine so far, but i'm curios if it's correct.
Thank you!
Asynchronous handlers will only be invoked in application threads processing the io_service event loop via run(), run_one(), poll(), or poll_one(). The documentation states:
Asynchronous completion handlers will only be called from threads that are currently calling io_service::run().
Hence, for a non-thread safe shared resource:
If the application code only has one thread, then there is neither concurrency nor race conditions. Thus, no additional form of synchronization is required. Boost.Asio refers to this as an implicit strand.
If the application code has multiple threads processing the event-loop and the shared resource is only accessed within handlers, then synchronization needs to occur, as multiple threads may attempt to concurrently access the shared resource. To resolve this, one can either:
Protect the calls to the shared resource via a synchronization primitive, such as a mutex. This question covers using mutexes within handlers.
Use the same strand to wrap() the ReadHandlers. A strand will prevent concurrent invocation of handlers dispatched through it. For more details on the usage of strands, particularly for composed operations, such as async_read(), consider reading this answer.
Rather than posting the entire ReadHandler into the strand, one could limit interacting with the shared resource to a specific set of functions, and these functions are posted as CompletionHandlers to the same strand. This subtle difference between this and the previous solution is the granularity of synchronization.
If the application code has multiple threads and the shared resource is accessed from threads processing the event loop and from threads not processing the event loop, then synchronization primitives, such as a mutex, needs to be used.
Also, even if a shared resource is small enough that writes and reads are always atomic, one should prefer using explicit and proper synchronization. For example, although the write and read may be atomic, without proper memory fencing to guarantee memory visibility, a thread may not observe a chance in memory even though the actual memory has chanced. Boost.Asio's will perform the proper memory barriers to guarantee visibility. For more details, on Boost.Asio and memory barriers, consider reading this answer.

How to design and structure a program that uses Actors

From Joe Armstrong's dissertation, he specified that an Actor-based program should be designed by following three steps. The thing is, I don't understand how the steps map to a real world problem or how to apply them. Here's Joe's original suggestion.
We identify all the truly concurrent activities in our real world activity.
We identify all message channels between the concurrent activities.
We write down all the messages which can flow on the different message channels.
Now we write the program. The structure of the program should exactly follow the structure of the problem. Each real world concurrent activity should be mapped onto exactly one concurrent process in our programming language. If there is a 1:1 mapping of the problem onto the program we say that the program is isomorphic to the problem.
It is extremely important that the mapping is exactly 1:1. The reason for this is that it minimizes the conceptual gap between the problem and the solution. If this mapping is not 1:1 the program will quickly degenerate, and become difficult to understand. This degeneration is often observed when non-CO languages are used to solve concurrent problems. Often the only way to get the program to work is to force several independent activities to be controlled by the same language thread or process. This leads to an inevitable loss of clarity, and makes the programs subject to complex and irreproducible interference errors.
I think #1 is fairly easy to figure out. It's #2 (and 3) where I get lost. To illustrate my frustration I stubbed out a small service available in this gist (Ruby service with callbacks).
Looking at that example service I can see how to answer #1. We have 5 concurrent services.
Start
LoginGateway
LogoutGateway
Stop
Subscribe
Some of those services don't work (or shouldn't) depending on the state the service is in. If the service hasn't been Started, then Login/Logout/Subscribe make no sense. Does this kind of state information have any relevance to Joe's 3 steps?
Anyway, given the example/mock service in that gist, I'm wondering how someone would go about designing a program to wrap this service up in an Actory fashion. I would just like to see a list of guidelines on how to apply Joe's 3 steps. Bonus points for writing some code (any language).
Generally, when structuring an application to use actors you have to identify the concurrent features of your application, which can be tricky to get the hang of. You identify 5 concurrent "services":
Start
LoginGateway
LogoutGateway
Stop
Subscribe
1, 4 and 5 seem to be types of messages that can flow through the system, 2 and 3 I'm not sure how to describe. Your gist is rather large and not super clear to me, but it looks like you've got some kind of message queue system. The actions a User can take are:
Log in to the system
Log out of the system
Subscribe to a Queue of messages
I'll assume logging in and out requires some auth step. I'll assume further that if the user fails the auth step their connection is broken but that creating a connection is not sufficient authentication.
The actions the System takes are:
Handling User actions
Routing messages to subscribers of a Queue
If that's not broadly true, let me know and I'll change this answer. (I'll assume that the messages that get sent to users are not generated by users but are an intrinsic part of the System; maybe we're discussing a monitoring service.) Anyhow, what is concurrent here? A few things:
Users act independently of one another
Queues have separate states
An actor based architecture represents each concurrent entity as its own process. The User is a finite state machine which authenticates, subscribes to a queue, alternatively receives messages and subscribes to more queues and eventually disconnects. In Erlang/OTP we'd represent this by a gen_fsm. The User process carries all the state needed to interact with the client which, if we're exposing a service over a network, would be a socket.
Authentication implies that the System is itself a 'process', though, more likely than not it's really a collection of processes which in Erlang/OTP we call an application. I digress. For simplification we'll assume that System is itself a single process which has some well-defined protocol and a state that keeps user credentials. User logins are, then, a well-defined message from a User process to the System process and the response therefrom. If there were no authentication we'd have no need for a System process as the only state related to a User would be a socket.
The careful reader will ask where do we accept socket connections for each User? Ah, good question. There's another concurrent entity in not mentioned, which we'll call here the Listener. It's another process that only listens for connections, creates a User for each new established socket and hands over ownership to the new User process, then loops back to listen.
The Queue is also a finite state machine. From its start state it accepts User subscription requests via a well-defined protocol, broadcasts messages to subscribers or accepts unsubscribe requests from User processes. This implies that the Queue has an internal store of User processes, the details of which are very dependent on language and need. In Erlang/OTP, for example, each Queue process would be a gen_server which stored User process ids--or PIDs--in a list and for each message to transmit simply did a multi-send to each User process in the list.
(In Erlang/OTP we'd user supervisors to ensure that processes stay alive and are restarted on death, which simplifies greatly the amount of work an Erlang developer has to do to ensure reliability in an actor-based architecture.)
Basically, to restate what Joe wrote, actor based architecture boils down to these points:
identify concurrent entities in the system and represent them in the implementation by processes,
decide how your processes will send messages (a primitive operation in Erlang/OTP, but something that has to be implemented explicitly in C or Ruby) and
create well-defined protocols between entities in the system which hide state modification.
It's been said that the Internet is the world's most successful actor based architecture and, really, that's not far off.

Resources