Can I use akka persistence when the actor state is only increase in size? - event-sourcing

I'm playing with akka persistence trying to implement a service where my state is a potentially very big (well let's say it won't fit in RAM) list of some entities. Lets say user want all history on all entities to be available. Can I do that in akka persistence?
Right now my actor state looks like that.
case class System(var processes: Map[Long, Process] = Map()) {
def updated(event: Event): System = event match {
case ProcessDetectedEvent(time, activitySets, id, processType) =>
val process = Process(activitySets.coordinates, time, activitySets.channels, id, processType, false)
copy(processes = processes + (id -> process))
case ProcessMovedEvent(id, activitySets, time) =>
val process = Process(activitySets.coordinates, time, activitySets.channels, id, processes(id).processType, false)
copy(processes = processes + (id -> process))
case ProcessClosedEvent(time, id) =>
val currentProcess = processes(id)
val process = Process(currentProcess.coordinates, time, currentProcess.channels, id, currentProcess.processType, true)
copy(processes = processes + (id -> process))
case _ => this
}
}
As you can see the map of Processes is stored in memory, so the application can run out of memory if the number of processes would be large.

Akka persistence is used by a stateful actor to recover its internal state when the actor is started, restarted after a JVM crash or by a supervisor, or migrated in a cluster. In this case application/JVM may crash with OutOfMemory exception in the long run. And when this actor restarts, the persistence recovery mechanism would recreate all the Process's information in the map. But again the total memory would be high and the application can again crash while running. So persistence in this case would not be helpful to avoid the application crash unless you persist only a partial list of processes to reduce the memory.
So first you need to figure out a way to solve this memory exception. May be you can try the following options.
Try to increase the JVM heap size during JVM restart after the OutOfMemory exception.
While recovering the state, replay only a selected list of messages so that total memory used would be low but the state is incomplete.
If the list of messages to be replayed during recovery is too large, snapshots can be used to reduce the state recovery time.

Perhaps you want to think about whether there are meaningful ways to partition your data set into scopes that do have some reasonable bounds. Then you could represent each scope by a persistent actor, and if you need information that spans your entire store, you would have to have some kind of coordinator for managing the scopes and iterating over them. But depending on how sophisticated this starts getting, at some point, I'd have to wonder if you'd be reinventing map-reduce or Spark.

I think what you might be looking for (at least it's another option) is snapshots.
When using event sourcing and event reply the generally advised approach is to use a snapshot every so often.
So when you get back your events you get back a snapshot and then the events that took place since that snapshot. This means you have less objects streamed from your event storage (less memory) and less things to process and apply (faster)... but this does come with it's own trade-offs... which I won't discuss here.
This again only covers the most common scenarios. If your eventing handling changes then you may need to rebuild your events... although this would raise some rather serious and interesting questions about how you are building your system.
I haven't looked too closely, but akka might have this notion of snapshoting built into it. If not, there's a learning curve and lots of trial and error on the road ahead as you start hitting all those roads less travelled in ideal approaches but the real world throws at you.

Related

How to implement a cache in a Vertx application?

I have an application that at some point has to perform REST requests towards another (non-reactive) system. It happens that a high number of requests are performed towards exactly the same remote resource (the resulting HTTP request is the same).
I was thinking to avoid flooding the other system by using a simple cache in my app.
I am in full control of the cache and I have proper moments when to invalidate it, so this is not an issue. Without this cache, I'm running into other issues, like connection timeout or read timeout, the other system having troubles with high load.
Map<String, Future<Element>> cache = new ConcurrentHashMap<>();
Future<Element> lookupElement(String id) {
String key = createKey(id);
return cache.computeIfAbsent(key, key -> {
return performRESTRequest(id);
}.onSucces( element -> {
// some further processing
}
}
As I mentioned lookupElement() is invoked from different worker threads with same id.
The first thread will enter in the computeIfAbsent and perform the remote quest while the other threads will be blocked by ConcurrentHashMap.
However, when the first thread finishes, the waiting threads will receive the same Future object. Imagine 30 "clients" reacting to the same Future instance.
In my case this works quite fine and fast up to a particular load, but when the processing input of the app increases, resulting in even more invocations to lookupElement(), my app becomes slower and slower (although it reports 300% CPU usage, it logs slowly) till it starts to report OutOfMemoryException.
My questions are:
Do you see any Vertx specific issue with this approach?
Is there a more Vertx friendly caching approach I could use when there is a high concurrency on the same cache key?
Is it a good practice to cache the Future?
So, a bit unusual to respond to my own question, but I managed to solve the problem.
I was having two dilemmas:
Is ConcurentHashMap and computeIfAbsent() appropriate for Vertx?
Is it safe to cache a Future object?
I am using this caching approach in two places in my app, one for protecting the REST endpoint, and one for some more complex database query.
What was happening is that for the database query there was up to 1300 "clients" waiting for a response. Or 1300 listeners waiting for an onSuccess() of the same Future. When the Future was emitting strange things were happening. Some kind of thread strangulation.
I did a bit of refactoring eliminating this concurrency on the same resource/key, but I did kept both caches and things went back to normal.
In conclusion I think my caching approach is safe as long as we have enough spreading or in other words, we don't have such a high concurrency on the same resource. Having 20-30 listeners on the same Future works just fine.

Turn recovery on after first message

I have a persistent actor which receives many messages. Fist message is CREATE (case class) and next messages are UPDATEs (case classes). So if it receives CREATE then it should not go into persistence to run recovery because the storage is empty for this actor. It's performance wasting from my perspective.
Is there any possibility to do not call recovery for particular input message (the first one which is CREATE), please?
A persistent actor will always have to hit the database, because there is no other way to know whether it having existed before - it could have been created in a previous instance of the application that was stopped or it could have been created on a different node in a cluster.
In general a good pattern for performance is to keep the actor in memory after it has been hit the first time, as that will allow as fast responses as possible. The most common way to do this is using Cluster Sharding (which you can read more about in the docs here: https://doc.akka.io/docs/akka/current/cluster-sharding.html?language=scala#cluster-sharding
I have never heard of anyone seeing the hit for an empty persistent actor as a performance problem and I'm not sure it is possible to solve that in a general way, so if you have such a problem and somehow can know the actor was never created before you can not do that with Akka Persistence but would have to build a special solution for that yourself.

Low loading into cache speed

I'm using Infinispan 6.0.0 in a 3-node setup (distributed caching with 2 replicas for each entry, no writes into persistent store) and I'm just reading the file line-by-line and storing that lines' contents into the cache. The speed seems a bit low to me (I can achieve more writes onto the SSD (persistent storage) than into RAM with Infinispan), but there isn't any obvious bottleneck in the test code (I'm using buffered input streams, and their limits certainly aren't reached. As for now, I'm able to write 100K entries each ~45 seconds and that doesn't satisfy me. Assume simplified code snippet:
while ((s = reader.readLine()) != null) {
cache.put(s.substring(0,2), s.substring(2,5));
}
And CacheManager is created as follows:
return new DefaultCacheManager(
GlobalConfigurationBuilder.defaultClusteredBuilder()
.transport().addProperty("configurationFile", "jgroups.xml").build(),
new ConfigurationBuilder()
.clustering().cacheMode(CacheMode.DIST_ASYNC).hash().numOwners(2)
.transaction().transactionMode(TransactionMode.TRANSACTIONAL).lockingMode(LockingMode.OPTIMISTIC)
.build());
What could I be possibly doing wrong?
I am not fully aware of all the asynchronous mode specialities, but I'd afraid that something in the two-phase commit (Prepare and Commit) might force some blocking RPC => waiting for network latency => slow down.
Do you need transactional behaviour? If not, switch them off. If you really need it, you may disable just the autocommit feature and load the cluster via non-transactional operations. Or, you may try one phase commits.
Another option could be mass loading via putAll (with tens or hundreds of entries, depends on your entry size), but routing of this message is not really smart. In transactional mode it could behave a bit better, I guess.
The last option if you just want to load the cluster fast and then operate on it could be transferring the bulk data to each node without Infinispan (using your own JGroups channel, or just with sockets), and loading all nodes with the CACHE_MODE_LOCAL flag.
By default Infinispan follows the Map.put() contract of returning the previous value, so even though you are using the DIST_ASYNC cache mode you're still implicitly performing a synchronous cache.get() for every put.
You can avoid this in two ways:
configurationBuilder.unsafe().unreliableReturnValues(true) will suppress the remote lookup for all the operations on the cache.
cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES).put(k, v) will suppress the remote lookup for a single operation.

Big difference in throughput when using MSMQ as subscription storage for Nservicebus as opposed to RavenDB

Recently we noticed that our Nservicebus subscribers were not able to handle the increasing load. We have a fairly constant input stream of events (measurement data from embedded devices), so it is very important that the throughput follows the input.
After some profiling, we concluded that it was not the handling of the events that was taking a lot of time, but rather the NServiceBus process of retrieving and publishing events. To try to get a better idea of what goes on, I recreated the Pub/Sub sample (http://particular.net/articles/nservicebus-step-by-step-publish-subscribe-communication-code-first).
On my laptop, using all the NServiceBus defaults, the maximum throughput of the Ordering.Server is about 10 events/second. The only thing it does is
class PlaceOrderHandler : IHandleMessages<PlaceOrder>
{
public IBus Bus { get; set; }
public void Handle(PlaceOrder message)
{
Bus.Publish<OrderPlaced>
(e => { e.Id = message.Id; e.Product = message.Product; });
}
}
I then started to play around with configuration settings. None seem to have any impact on this (very low) performance:
Configure.With()
.DefaultBuilder()
.UseTransport<Msmq>()
.MsmqSubscriptionStorage();
With this configuration, the throughput instantly went up to 60 messages/sec.
I have two questions:
When using MSMQ as subscription storage, the performance is much better than RavenDB. Why does something as trivial as the storage for subscription data have such an impact?
I would have expected a much higher performance. Are there any other configuration settings that I should use to get at least one order of magnitude better than this? On our servers, the maximum throughput when running this sample is about 200 msg/s. This is far from spectacular for a system that doesn't even do anything useful yet.
MSMQ doesn't have native pub/sub capabilities so NServiceBus adds support this by storing the list of subscribers and then looping over that list sending a copy of the event to each of the subscribers. This translates to X message queuing operations where X is the number of subscribers. This explains why RabbitMQ is faster since it has native pub/sub so you would only need one operation against the broker.
The reason the storage based on a msmq queue is faster is that it's a local storage (can't be used if you need to scaleout the endpoint) and that means that we can cache the data since that can't be any other endpoint instances updating the storage. In short this means that we get away with a in memory lookup which as you can see is the fastest option.
There are plans to add native caching across all storages:
https://github.com/Particular/NServiceBus/issues/1320
200 msg/s sounds quite low, what number do you get if you skip the bus.Publish? (just to get a base line)
Possibility 1: distributed transactions
Distributed transactions are created when processing messages because of the combination Queue-Database.
Try measuring without transactional handling of the messages. How does that compare?
Possibility 2: msmq might not be the best queueing system for your needs
Ever considered switching to rabbitmq for transport? I have very good experiences with RabbitMq in combination with MassTransit. Way exceed the numbers you are mentioning in your question.

When multi MessageConsumer connect to same queue(Websphere MQ),how to load balance message-consumer?

I am Using WebSphere MQ 7,and I have two clients connected to the same QMgr and consuming messages from same queue, like following code:
while (true) {
TextMessage message = (TextMessage) consumer.receive(1000);
if (message != null) {
System.out.println("*********************" + message.getText());
}
}
I found only one client always retrieve messages. Is there any method to let consume-message load balancing in two client? Any config options in MQ Server side?
When managing queue handles, it is MUCH faster for WMQ to put them in a stack rather than a LIFO queue. So if the messages arrive on the queue slower than it takes to process them, it is possible that an instance will process the message and perform another GET, which WMQ pushes down on the stack. The result is that only one instance will see messages in a low-volume use case.
In larger environments where there are many instances waiting on messages, it is possible that activity will round-robin amongst a portion of those instances while the other instances starve for messages. For example, with 10 GETters on the queue you may see three processing messages and 7 idle.
Although this is considerably faster for MQ, it is confusing to customers who are not aware of how it works internally and so they open PMRs asking this exact question. IBM had to choose among several alternatives:
Adding several code paths to manage by stack for performance when fully loaded, versus manage by LIFO for apparent balancing when lightly loaded. This bloats the code, adds many new decision points to introduce errors and solves a problem that was one of perception rather than reliability or performance.
Educate the customers as to how it works. Of course, once you document it, then you can't change it. The way I found out about this was attending the "WMQ Internals" presentation at IMPACT. It's not in the Infocenter so IBM can change it, but it is available for customers.
Do nothing. Although this is the best result from the code design point of view, the behavior is counter-intuitive. Users need to understand why things do not behave as expected and will waste time trying to find the configuration that results in the desired behavior, or open a PMR.
I don't know for sure that it still works this way but I expect that it does. The way I used to test it was to put many messages on the queue at once and then see how they were distributed. If you drop about 50 messages on the queue in one unit of work, you should see a better distribution between the two instances.
How do you drop 50 messages on the queue at once? First generate them with the applications turned off or to a spare queue. If you generated them in the target queue, use the Q program to move them to the spare queue. Now start the apps and make sure the queue's IPPROC count equals however many instances of the app you started. Using Q again, copy all of the messages to the original queue in a single unit of work. Since they all become available on the queue at once, your two app instances should both immediately be passed a message. If you used copy instead of move, you can repeat this as often as required.
Your client is not doing much, so one instance can probably handle the full load. Try implementing a more realistic workload, or, simpler yet, put a Thread.sleep in the client.

Resources