How to handle and update a shared HashMap in an Actor System? - spring-boot

Hi, I have metadata in a HashMap object that needs to be accessible inside my
Actor class so it can be looked up. If the key is found, I should use it
inside the Actor business logic. If it is not found, I need to create the key
and value and update the HashMap object.
So how do I handle this in an Actor System? As you know, multiple actor
instances could each hit the missing key simultaneously and generate their own
entry, which would result in duplicates and inconsistency. What is the
industry standard for handling this scenario? Please advise on how to handle it.

If I understand you correctly, your situation is such that you have one hash map that is accessed in multiple actors, and you want to know how to keep a consistent state in the hash map across all the actors.
There should be one publisher actor and several subscriber actors. The publisher holds the canonical copy of the hash map. The publisher first sends a copy of the hash map to all the subscriber actors. These subscribers start performing business logic on the hash map.
When the business logic in a subscriber actor wants to update the hash map, it sends a message to the publisher actor. The subscriber does not update the hash map in its local actor state. Instead, it waits for the published hash map from the publisher.
The publisher actor accepts the key-value pair from the subscriber and uses it to update its canonical copy of the hash map. It then publishes that updated hash map to all the subscribers.
There are two ways for the subscriber to send its key-value pair to the publisher: asynchronously with tell, or synchronously with ask. Tell is lightweight; ask is heavier. Ask has the advantage that there is no gap between sending the update to the publisher and receiving the updated hash map back. Of course, the subscriber that did the ask will receive two copies of the updated hash map: first as the response to the ask, and again when the publisher publishes the hash map to all the subscribers. This does not cause any issues, since Akka guarantees message ordering between a given sender and receiver pair. For that reason, and because no local updates are made to the hash map, the second published copy of the hash map will never make a recent write appear to be temporarily deleted. Just to be safe, you may want to include a flag in the published message telling the subscriber actors which of them can ignore the published hash map because it has already received it as the response to an ask.
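To make this concrete, here is a minimal sketch of the publisher side using Akka's classic Java API. The MapPublisher, Subscribe, PutEntry, and MapSnapshot names are illustrative, not from any library:

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class MapPublisher extends AbstractActor {
        private final Map<String, String> canonical = new HashMap<>();
        private final Set<ActorRef> subscribers = new HashSet<>();

        public static final class Subscribe {}

        public static final class PutEntry {
            public final String key;
            public final String value;
            public PutEntry(String key, String value) { this.key = key; this.value = value; }
        }

        public static final class MapSnapshot {
            public final Map<String, String> map;
            public MapSnapshot(Map<String, String> map) { this.map = map; }
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(Subscribe.class, msg -> {
                    // register the subscriber and send it the current map
                    subscribers.add(getSender());
                    getSender().tell(new MapSnapshot(new HashMap<>(canonical)), getSelf());
                })
                .match(PutEntry.class, put -> {
                    // update the canonical copy, reply to the sender (covers
                    // the ask case), then publish to every subscriber
                    canonical.put(put.key, put.value);
                    MapSnapshot snapshot = new MapSnapshot(new HashMap<>(canonical));
                    getSender().tell(snapshot, getSelf());
                    for (ActorRef sub : subscribers) {
                        sub.tell(snapshot, getSelf());
                    }
                })
                .build();
        }
    }

A subscriber sends PutEntry either with tell (fire-and-forget) or with ask, in which case it also receives the snapshot as the reply, as described above.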
This solution will guarantee a consistent hash map state, but this synchronization method may not be adequate for your application. The subscriber actors can overwrite each other's changes, and that may not be what you want. To prevent this, it may be necessary to partition the business logic across the various subscriber actors.

Related

Is it possible to define a single saga which will process many messages

My team is considering whether we can use MassTransit as our primary solution for sagas over RabbitMQ (vs. NServiceBus). I admit that our experience with solutions like MassTransit and NServiceBus is minimal, and we have only just started to introduce messaging into our system, so I'm sorry if my question is simple or even stupid.
However, when I reviewed the MassTransit documentation, I was not sure whether one of our cases can be solved.
The case looks like:
One of our components will produce up to 100 messages that will be sent to a queue. These messages are the result of a single operation in the system. All of the messages will have the same correlation id and the same internal publication id.
1) Is it possible to define a single saga instance (by correlation id) that will wait until it receives all the messages from the queue and then process them as a single batch?
2) Otherwise, is there any way to ensure that all of the sent messages were processed (a consistent batch)? I assume that the correlation id will serve as the way to find an existing saga instance (a singleton). Ideally, I would like to complete a saga instance once the system has processed every message belonging to a single group (one publication).
I looked at CompositeEvent too, but I am not sure whether I could use it to ensure that every message was processed before completing the saga for a specific correlation id.
Can you explain how this could be achieved, and which mechanism I should look into in order to correlate many messages with the same id to a single saga and then complete it once all of the messages have been consumed?
Thank you in advance for any response
What you describe is how correlation by id works. It is like that out of the box.
So, in short - when you configure correlation for your messages correctly, all messages with the same correlation id will be handled by the same saga instance.
Concerning the second question - unless you publish a separate event that informs the saga how many messages it should expect, how would it know that? You can definitely schedule a long timeout and assume that all the messages will be received by the saga within it, but that's not reliable.
Composite events won't help here: they allow messages of different types to be handled as one once all of them have arrived, but they don't count the number of messages of each type. A composite event just waits for one message of each type.
The ability to receive a series of messages and then operate on them in a batch is a common case, so much so that there is a sample showing how to do just that:
Batch Sample
Each saga instance has a unique correlation identifier, and as long as those messages can be correlated to that single instance, MassTransit will manage the concurrency (either optimistic or pessimistic, and depending upon the saga storage engine).
I'd suggest reviewing the state machine in the sample, and seeing how that compares to your scenario.

What guarantees does Kafka Streams provide when using a RocksDB state store with a changelog?

I'm building a Kafka Streams application that generates change events by comparing every new calculated object with the last known object.
So for every message on the input topic, I update an object in a state store and every once in a while (using punctuate), I apply a calculation on this object and compare the result with the previous calculation result (coming from another state store).
To make sure this operation is consistent, I do the following after punctuate triggers:
write a tuple of the old and new value to the state store
compare the two values, create change events, and context.forward() them, so the events go to the results topic
replace the tuple with just the new value and write it back to the state store
I use this tuple for scenarios where the application crashes or rebalances, so I can always send out the correct set of events before continuing.
Now, I noticed the resulting events are not always consistent, especially if the application frequently rebalances. It looks like in rare cases the Kafka Streams application emits events to the results topic, but the changelog topic is not up to date yet. In other words, I produced something to the results topic, but my changelog topic is not at the same state yet.
So, when I do a stateStore.put() and the method call returns successfully, are there any guarantees about when it will appear on the changelog topic?
Can I enforce a changelog flush? When I do context.commit(), when will that flush+commit happen?
To get complete consistency, you will need to enable processing.guarantee="exactly_once" -- otherwise, after a failure, you might get inconsistent results.
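For reference, a minimal sketch of enabling it (the config keys are from the public StreamsConfig API; the application id and bootstrap address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "change-event-app"); // placeholder name
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // with exactly-once, state store updates, changelog writes and output
    // records are committed atomically as one transaction
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);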
If you want to stay with "at_least_once", you might want to use a single store and update the store after processing is done (i.e., after calling forward()). This minimizes the time window in which inconsistencies can arise.
And yes, if you call context.commit(), then before the input topic offsets are committed, all stores will be flushed to disk and all pending producer writes will be flushed as well.
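To illustrate the single-store suggestion, here is a rough sketch of a punctuation callback that forwards the change events first and only then overwrites the stored value. It uses the older Processor API; Result, recalculate(), and diff() are hypothetical stand-ins for the application's own types and logic:

    // 'store' is a KeyValueStore<String, Result> fetched from the
    // ProcessorContext in init(); 'context' is the ProcessorContext
    private void punctuate(long timestamp) {
        try (KeyValueIterator<String, Result> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, Result> entry = it.next();
                Result previous = entry.value;
                Result current = recalculate(entry.key); // hypothetical application logic
                if (!current.equals(previous)) {
                    // emit the change events first...
                    context.forward(entry.key, diff(previous, current)); // diff() is hypothetical
                    // ...then overwrite the stored value, minimizing the window
                    // in which emitted events and stored state can diverge
                    store.put(entry.key, current);
                }
            }
        }
        // ask Streams to commit: stores (and changelog writes) are flushed and
        // pending producer records sent before input offsets are committed
        context.commit();
    }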

How to solve the two generals problem between an event store and a persistence layer?

The Two Generals Problem - event store and persistence layer?
I would like to understand how the industry actually deals with this problem!
Microservice 1 persists object X into database A. At the same time, so that microservice 2 can feed on the data from microservice 1, microservice 1 writes the same object X to event store B.
Now, the question I have is, where do I write object X first?
If database A comes first and then event store B, is it fair to roll back at the application level if database A is down? Also, what should the error handling be if database A is online and has persisted object X, but event store B is down?
What should the error handling look like if we do it the other way around from point 1?
I do understand that in today's world of distributed, highly available systems, things going down is rare. But it can happen. I want to understand what needs to be done when either the database or the event store system/cluster is down.
In general you want to avoid relying on a two-phase commit of the kind you describe.
In general (presuming an event-sourced system; not sure if that's implicit in your question or an option for you - perhaps SqlStreamStore might be relevant in your context?), this is typically managed by having something project from a single authoritative set of events on a pull basis: each projection that drives an associated action against some downstream system maintains a pointer to how far it has got in the base stream, and restarts from there if interrupted.
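A rough sketch of that pull-based projection loop; the EventStore, Checkpoint, and Event types here are all hypothetical, standing in for whatever client and checkpoint storage you actually use:

    import java.util.List;
    import java.util.function.Consumer;

    // hypothetical interfaces for the authoritative stream and the checkpoint
    interface EventStore { List<Event> readAfter(long position, int max); }
    interface Checkpoint { long load(); void save(long position); }
    record Event(long position, String payload) {}

    class Projector {
        void run(EventStore store, Checkpoint checkpoint, Consumer<Event> apply) {
            long position = checkpoint.load();       // restart from where we got to
            while (true) {
                for (Event event : store.readAfter(position, 100)) {
                    apply.accept(event);             // downstream action (e.g. write to DB A)
                    position = event.position();
                    checkpoint.save(position);       // pointer to how far we've projected
                }
            }
        }
    }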
First of all, an event store is a type of persistence: it stores the application's state as a series of events, as opposed to a flat persistence that stores only the last projected state.
Microservice 1 persists object X into database A. At the same time, so that microservice 2 can feed on the data from microservice 1, microservice 1 writes the same object X to event store B.
You are trying to have two sources of truth that must be kept in sync by some sort of distributed transaction, which is not very scalable.
This is an unusual way of using an event store. In general, an event store is the canonical source of information, the single source of truth; you are trying to use it as a communication channel. The event store is the persistence of an event-sourced Aggregate (see Domain-Driven Design).
I see two options:
You could refactor your architecture and make object X an event-sourced entity whose persistence is the event store. Then have a read model subscribe to the event store and build a flat representation of object X that is persisted in database A. In other words, write first to the event store and then to database A (but in an eventually consistent manner!). This is a big jump, and you should really think about whether you want to go event-sourced.
You could use CQRS without event sourcing. This means that after every modification, object X emits one or more domain events, which are persisted in database A in the same local transaction as object X itself (see the sketch below). Microservice 2 could then subscribe to database A to get the emitted events; how to subscribe depends on the type of database.
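A rough sketch of that single local transaction in plain JDBC; the object_x and domain_events tables and the method parameters are made up for illustration (this is essentially the transactional outbox pattern):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // persist the new state of object X and its domain events atomically
    void saveWithEvents(DataSource dataSource, String objectId,
                        String newStateJson, String eventJson) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement updateX = con.prepareStatement(
                     "UPDATE object_x SET state = ? WHERE id = ?");
                 PreparedStatement insertEvent = con.prepareStatement(
                     "INSERT INTO domain_events (aggregate_id, type, payload) VALUES (?, ?, ?)")) {
                updateX.setString(1, newStateJson);
                updateX.setString(2, objectId);
                updateX.executeUpdate();

                insertEvent.setString(1, objectId);
                insertEvent.setString(2, "ObjectXModified"); // illustrative event type
                insertEvent.setString(3, eventJson);
                insertEvent.executeUpdate();

                con.commit(); // both writes succeed together or not at all
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }

Microservice 2 (or a relay process) can then poll or tail the domain_events table to pick up what was emitted.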
I have a feeling you are using the event store as a channel of communication instead of using it as a database. If you want microservice 2 to feed on the data from microservice 1, you could communicate via REST services.
Of course, relying on REST services might make you less resilient to outages. In that case, using a piece of technology dedicated to communication would be the right way to go (I'm thinking of MQ/topics, such as RabbitMQ, Kafka, etc.).
Then, once your services are talking to each other, you will still need to persist your data... but only at one single location.
Therefore, you will need to define where you want to store the data.
Ask yourself:
Who will have governance of the data persistence?
Is it Microservice1? If so, then every time Microservice2 needs to read the data, it will make a REST call to Microservice1.
Is it the other way around? Does Microservice2 have governance of the data, with Microservice1 consuming it?
It could be a third microservice that you haven't even created yet. It depends on how you applied your separation of concerns.
Let's take an example:
Microservice1's responsibility is to process our data and export it to PDF and other formats.
Microservice2's responsibility is to expose a service for a legacy partner that requires our data to be returned in a very proprietary representation.
Who is going to store the data here?
Microservice1 should not be the one to persist the data: its job is only to convert the data to other formats. If it needs some data, it will fetch it from whoever has governance of the data.
Microservice2 should not be the one to persist the data either. After all, we may have a number of other microservices similar to this one, but for other partners, with different proprietary formats.
If there is a service where you can do CRUD operations, that's your guy. If you don't have such a service, maybe you can find an existing microservice that wouldn't have conflicting responsibilities.
For instance: if I have a Microservice3 that, every time my ObjectX changes, sends a PDF representation of it to some address and notifies all my partners that their data is out of date, then that microservice looks like a good candidate to become the "governor of the data" for this part of the domain and be the one-stop shop for writing to and reading from the database.

Difference between Observer, Pub/Sub, and Data Binding

What is the difference between the Observer Pattern, Publish/Subscribe, and Data Binding?
I searched around a bit on Stack Overflow and did not find any good answers.
What I have come to believe is that data binding is a generic term and there are different ways of implementing it such as the Observer Pattern or the Pub/Sub pattern. With the Observer pattern, an Observable updates its Observers. With Pub/Sub, 0-many publishers can publish messages of certain classes and 0-many subscribers can subscribe to messages of certain classes.
Are there other patterns of implementing "data binding"?
There are two major differences between Observer/Observable and Publisher/Subscriber patterns:
The Observer/Observable pattern is mostly implemented in a synchronous way, i.e., the observable calls the appropriate method of all its observers when some event occurs. The Publisher/Subscriber pattern is mostly implemented in an asynchronous way (using a message queue).
In the Observer/Observable pattern, the observers are aware of the observable. In Publisher/Subscriber, publishers and subscribers don't need to know each other; they simply communicate with the help of message queues.
As you mentioned correctly, data binding is a generic term and it can be implemented using either Observer/Observable or Publisher/Subscriber method. Data is the Publisher/Observable.
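To make the second difference concrete, here is a toy sketch in Java: the subject calls the observers it holds directly, whereas with pub/sub both sides know only a broker and a topic name. All class names here are made up, and a real broker would typically queue messages and deliver them asynchronously rather than with direct calls:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // Observer: the subject holds direct references to its observers
    class Subject {
        private final List<Consumer<String>> observers = new ArrayList<>();
        void addObserver(Consumer<String> o) { observers.add(o); }
        void setState(String s) {
            observers.forEach(o -> o.accept(s)); // synchronous, direct calls
        }
    }

    // Pub/Sub: publishers and subscribers know only the broker and a topic
    class Broker {
        private final Map<String, List<Consumer<String>>> topics = new HashMap<>();
        void subscribe(String topic, Consumer<String> handler) {
            topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }
        void publish(String topic, String message) {
            topics.getOrDefault(topic, List.of()).forEach(h -> h.accept(message));
        }
    }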
Here's my take on the three:
Data Binding
Essentially, at the core this just means "the value of property X on object Y is semantically bound to the value of property A on object B." No assumptions are made as to how Y knows about, or is fed, changes to object B.
Observer, or Observable/Observer
A design pattern by which an object is imbued with the ability to notify others of specific events - typically done using actual events, which are kind of like slots in the object with the shape of a specific function/method. The observable is the one who provides notifications, and the observer receives those notifications. In .NET, the observable can expose an event and the observer subscribes to that event with an "event handler" shaped hook. No assumptions are made about the specific mechanism by which notifications occur, nor about the number of observers one observable can notify.
Pub/Sub
Another name (perhaps with more "broadcast" semantics) for the Observable/Observer pattern, which usually implies a more "dynamic" flavor: observers can subscribe or unsubscribe to notifications, and one observable can "shout out" to multiple observers. In .NET, one can use standard events for this, since events are a form of MulticastDelegate and so can support delivery of events to multiple subscribers, as well as unsubscription. Pub/Sub has a slightly different meaning in certain contexts, usually involving more "anonymity" between event and eventer, which can be facilitated by any number of abstractions, usually involving some "middle man" (such as a message queue) who knows all parties, while the individual parties don't know about each other.
Data Binding, Redux
In many "MVC-like" patterns, the observable exposes some manner of "property changed notification" that also contains information about the specific property changed. The observer is implicit, usually created by the framework, and subscribes to these notifications via some binding syntax to specifically identify an object and property, and the "event handler" just copies the new value over, potentially triggering any update or refresh logic.
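Java's built-in java.beans classes illustrate this flavor of binding: the observable fires a change notification that names the specific property, and the binding copies the new value over. The Model class and the commented-out label are illustrative:

    import java.beans.PropertyChangeListener;
    import java.beans.PropertyChangeSupport;

    class Model {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private String title;

        void setTitle(String newTitle) {
            String old = this.title;
            this.title = newTitle;
            // the notification carries the property name plus old and new values
            changes.firePropertyChange("title", old, newTitle);
        }

        void addListener(PropertyChangeListener l) {
            changes.addPropertyChangeListener(l);
        }
    }

    // a "binding" is then just a listener that copies the new value into a target:
    // model.addListener(evt -> label.setText((String) evt.getNewValue()));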
Data Binding, re-Redux
An alternative implementation for data binding? OK, here's a stupid one:
a background thread is started that constantly checks the bound property on an object.
if that thread detects that the value of the property has changed since the last check, it copies the value over to the bound item.
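A sketch of that polling approach (PollingBinding is a made-up name, and a real implementation would also need to worry about thread safety and shutdown):

    import java.util.Objects;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;
    import java.util.function.Supplier;

    class PollingBinding<T> {
        private T last;

        PollingBinding(Supplier<T> source, Consumer<T> target) {
            ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
            poller.scheduleAtFixedRate(() -> {
                T current = source.get();
                if (!Objects.equals(last, current)) { // changed since last check?
                    last = current;
                    target.accept(current);           // copy the value over
                }
            }, 0, 100, TimeUnit.MILLISECONDS);
        }
    }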
I am a bit amused that all the answers here try to explain the subtle difference between the Observer and Pub/Sub patterns without giving any concrete examples. I bet most readers still don't know how to implement each one after reading that one is synchronous and the other is asynchronous.
One thing to note: the goal of both patterns is to decouple code.
The Observer is a design pattern where an object (known as a subject) maintains a list of objects depending on it (observers), automatically notifying them of any changes to state.
Observer pattern
This means an observable object keeps a list of all its observers (which are usually functions), and it can traverse this list and invoke those functions whenever it sees fit.
see this observer pattern example for details.
This pattern is good when you want to listen for any data change on an object and update other UI views correspondingly.
But the con is that an observable maintains only one array for keeping observers
(in the example, the array is observersList).
It does NOT differentiate how the update is triggered, because it has only one notify function, which triggers all the functions stored in that array.
If we want to group observer handlers based on different events, we just need to change that observersList into an object like:
var events = {
    "event1": [handler1, handler2],
    "event2": [handler3]
};
see this pubsub example for details.
People call this variation pub/sub: you can trigger different sets of functions depending on the event you publish.
I agree with your conclusion about both patterns. Nevertheless, for me, I use Observer when I'm in the same process, and I use Pub/Sub in inter-process scenarios, where all parties know only the common channel, not each other.
I don't know of other patterns, or let me put it this way: I've never needed another pattern for this task. Even most MVC frameworks and data-binding implementations usually use the observer concept internally.
If you're interested in inter-process communication, I recommend you:
"Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions" -
https://www.enterpriseintegrationpatterns.com/
This book contains a lot of ideas about how to send messages between processes or classes, which can be used even for intra-process communication tasks (it helped me program in a more loosely coupled way).
I hope this helps!
One concrete difference is that an observable is always involved when an observer no longer wants to observe, whereas a subscriber can cancel its subscription without the publisher ever being aware of that intent to unsubscribe.

Google App Engine: Message class using list properties for receivers

I have a message model and I want it to have several receivers, possibly a lot of them.
I would also like to be able to tell for each receiver if the message was viewed or not (read/unread). Also I would like a receiver to be able to delete the message.
The two possible solutions are the following; for each I have a Message model and a User model.
For the first (using the ideas presented here http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html)
I have a MessageReceivers class which has a ListProperty containing the users that will receive the message, and I set its parent to the message. I query this with messages = db.GqlQuery('SELECT __key__ FROM MessageReceivers WHERE receivers = :1', user) and then do a db.get([ key.parent() for key in messages ]).
The problem I have with this is that I'm not sure how to store the state of the message: whether it is read or not, and, as a follow-on issue, whether the user has new messages. An additional issue would be the overhead of deleting a message (the user would have to be removed from the receivers list property).
For the second: I have a MessageReceiver entity for each receiver; it has links to the message and to the user, and it also stores the state (read/unread).
Which of these two approaches do you consider to have better performance? And in the case of the first, do you have any suggestions for handling the status of the message?
I've implemented the first option in production. The drawback is that a ListProperty is limited to 2500 entries if you use a custom index. Shameless plug: see my blog post http://bravenewmethod.wordpress.com/2011/03/23/developing-on-google-app-engine-for-production/
Read-state storing: I did this by implementing an entity that stored unread messages up to a few months back and then simply assumed the older ones were read. Even simpler is to query the messages in date order, store the last known message timestamp in an entity, and assume all older messages are read. I don't recommend keeping a long history in an entity with a huge list property, because reading and storing such entities can get really slow.
Message deletion is expensive; there's no way around that.
If you need to store state per message, your best option is to write one entity per recipient, with read state (and anything else, such as flags, etcetera), rather than using the index relation pattern.
