Oracle BPEL receive message (Oracle SOA 12.2.1.4.0)

I would like to insert into a BPEL flow a sort of event listener that waits for a message.
I thought about implementing this with the "Receive Message" component, but I don't understand how it should be configured to intercept
one and only one message, namely the one related to the current instance of the flow.
I defined a variable CorrelationId to store a unique identifier; next, on the "Receive Message" component, I defined a correlation set, but I don't understand how to pass the CorrelationId to it.

Not sure how this composite gets called, but you could receive the message(s) in one composite and either put them on a JMS queue, with a second composite that dequeues and processes them, or put the messages into a table and have the second composite poll the table using the database adapter with maxTransactionSize=1.

Related

Spring boot kafka: Microservice multi instances, concurrency and partitions

I have a question about the way messages are published and read in Kafka in a microservice architecture with multiple instances of the same microservice writing and reading.
My main problem is that the microservices that publish and read are configured with autoscaling, but with a default instance count of 1.
The point is that I have an entity, let's call it "Event", that is stored in the database, and each entity has its own ID there. When a specific command is executed on a specific entity (say, the one with entityID = ajsha87), a message must be published that will be read by a consumer. If the messages for the same entity are written to different partitions and consumed at the same time (a concurrency issue), I will have a lot of problems.
My question is whether, based on the entityID for example, I can set the partition to which all events of that specific entity will be published. For another entity with a different ID I don't care about the partition, but the messages for the same entity must always be published to the same partition, so that a consumer never reads a message (2), published after a message (1), before message (1) itself.
Is there any mechanism to do that, or do I have to randomly choose a partition ID each time I save the entity and store it in the database, so that its messages are always published there?
The same happens with consumers. Only one consumer should read a given partition at a time, because otherwise consumer number 1 could read message (1) from partition (1), related to entity (ID=78198), and then another consumer could read message (2) from partition (1), related to the same entity, and process message (2) before message (1).
Is there any mechanism to subscribe each instance to only one partition, in line with the microservice autoscaling?
Another option would be to dynamically assign a partition to each new publisher instance, but I don't know how to configure that dynamically so that different partition IDs are set according to the microservice instance.
I am using Spring Boot, by the way.
Thanks for your answers and recommendations, and sorry if my English is not good enough.
If you use the hash partitioner as the partitioner in the producer config (this is the default partitioner in many libraries) and use the same key for the same entity (say, the one with entityID = ajsha87), Kafka takes care of sending all messages with the same key to the same partition.
If you are using a consumer group, one consumer instance takes responsibility for one partition, and all messages published to that partition are consumed by that instance only. The instance can change if there is a rebalance when scaling up, but messages in the same partition will still be read by a single consumer instance.
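A minimal, broker-free sketch of why this works. Kafka's default partitioner hashes the serialized key with murmur2; here an MD5-based stand-in (the function name `partition_for` is illustrative) shows the property that matters: the key-to-partition mapping is deterministic, so every message with the same key lands in the same partition.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition the way Kafka's default
    partitioner does (hash of key modulo partition count):
    same key -> same partition, always."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for entity "ajsha87" land in one partition, so a single
# consumer in the group sees them in publication order.
p1 = partition_for("ajsha87", 12)
p2 = partition_for("ajsha87", 12)
print(p1 == p2)  # True: deterministic mapping
```

With Spring Kafka this amounts to calling `kafkaTemplate.send(topic, entityId, payload)`, where the second argument is the record key; no manual partition bookkeeping in the database is needed.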

Version number in event sourcing aggregate?

I am building microservices. One of my microservices uses CQRS and event sourcing. Integration events are raised in the system, and I am saving my aggregates in the event store while also updating my read model.
My question is: why do we need a version in the aggregate when we are updating the event stream of that aggregate? I read that we need it for consistency, that events are to be replayed in sequence, and that we need to check the version before saving (https://blog.leifbattermann.de/2017/04/21/12-things-you-should-know-about-event-sourcing/). I still can't get my head around this, since events are raised and saved in order, so I really need a concrete example to understand what benefit we get from versions and why we even need them.
Many thanks,
Imran
Let me describe a case where aggregate versions are useful:
In our reSolve framework, the aggregate version is used for optimistic concurrency control.
I'll explain it by example. Let's say an InventoryItem aggregate accepts the commands AddItems and OrderItems. AddItems increases the number of items in stock; OrderItems decreases it.
Suppose you have an InventoryItem aggregate #123 with one event, ITEMS_ADDED with a quantity of 5. The state of aggregate #123 says there are 5 items in stock.
So your UI shows users that there are 5 items in stock. User A decides to order 3 items, user B 4 items. Both issue OrderItems commands almost at the same time; let's say user A is first by a couple of milliseconds.
Now, if you have a single instance of aggregate #123 in memory, in a single thread, you don't have a problem: the first command, from user A, would succeed, the event would be applied, the state would say the quantity is 2, and the second command, from user B, would fail.
In a distributed or serverless system, where the commands from A and B are handled in separate processes, both commands would succeed and bring the aggregate into an incorrect state if we didn't use some form of concurrency control. There are several ways to do this: pessimistic locking, a command queue, an aggregate repository, or optimistic locking.
Optimistic locking seems to be the simplest and most practical solution:
We say that every aggregate has a version: the number of events in its stream. So our aggregate #123 has version 1.
When an aggregate emits an event, the event data carries the aggregate version. In our case, the ITEMS_ORDERED events from users A and B will both have an event aggregate version of 2. Obviously, aggregate event versions must be sequentially increasing, so all we need to do is put a database constraint on the event store saying that the tuple {aggregateId, aggregateVersion} must be unique on write.
Let's see how our example would work in a distributed system with optimistic concurrency control:
User A issues an OrderItems command for aggregate #123.
Aggregate #123 is restored from events {version 1, quantity 5}.
User B issues an OrderItems command for aggregate #123.
Another instance of aggregate #123 is restored from events {version 1, quantity 5}.
The instance of the aggregate for user A performs the command; it succeeds, and the event ITEMS_ORDERED {aggregateId 123, version 2} is written to the event store.
The instance of the aggregate for user B performs the command; it succeeds and produces the event ITEMS_ORDERED {aggregateId 123, version 2}, but the attempt to write it to the event store fails with a concurrency exception.
On such an exception, the command handler for user B just repeats the whole procedure: this time aggregate #123 is restored in the state {version 2, quantity 2}, and the command is handled correctly (here it is rejected, since only 2 items remain in stock and user B wants 4).
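A minimal sketch of this optimistic check, using an in-memory SQLite table as a stand-in for the event store (the table and column names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    aggregate_id INTEGER,
    version      INTEGER,
    type         TEXT,
    payload      TEXT,
    PRIMARY KEY (aggregate_id, version))""")  -- the {aggregateId, version} constraint
""".replace('--', '--'))
```
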
I hope this clears the case where aggregate versions are useful.
Yes, this is right. You need the version or a sequence number for consistency.
Two things you want:
Correct ordering
Usually events are idempotent in nature, because in a distributed system idempotent messages or events are easier to deal with. Idempotent messages are ones that give the same result even when applied multiple times. Updating a register with a fixed value (say, one) is idempotent, but incrementing a counter by one is not.
In distributed systems, when A sends a message to B, B acknowledges A. But if B consumes the message and, due to some network error, the acknowledgement to A is lost, A doesn't know whether B received the message, so it sends the message again. Now B applies the message again, and if the message is not idempotent, the final state will be wrong. So you want idempotent messages. But if you fail to apply these idempotent messages in the same order as they were produced, your state will again be wrong. This ordering can be achieved using the version ID or a sequence number. If your event store is an RDBMS, you cannot order your events without some such sort key. Kafka likewise has the offset ID, and the client keeps track of the offset up to which it has consumed.
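The difference between the two kinds of message can be shown in a few lines of toy Python (the names are illustrative and not tied to any broker):

```python
# A toy state, and two ways a consumer might apply a message to it.
state = {"counter": 0}

def apply_set(msg):
    """Idempotent: writes a fixed target value."""
    state["counter"] = msg["value"]

def apply_increment(msg):
    """Not idempotent: applies a relative change."""
    state["counter"] += msg["delta"]

# At-least-once delivery redelivers the same message after a lost ack.
apply_set({"value": 1}); apply_set({"value": 1})
print(state["counter"])  # 1: the duplicate was harmless

state["counter"] = 0
apply_increment({"delta": 1}); apply_increment({"delta": 1})
print(state["counter"])  # 2: the duplicate corrupted the state
```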
Deduplication
Secondly, what if your messages are not idempotent? Or what if your messages are idempotent, but the consumer invokes some external service in a non-deterministic way? In such cases you need exactly-once semantics, because if you apply the same message twice, your state will be wrong. Here, too, you need the version ID or sequence number: if, at the consumer end, you keep track of the version IDs you have already processed, you can dedupe based on the ID. In Kafka, you might then want to store the offset ID at the consumer end.
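A sketch of a consumer that uses the sequence number for both ordering and deduplication (the event shape and class name are illustrative):

```python
class DedupingConsumer:
    """Apply events exactly once and in order, using the producer's
    per-stream sequence number (a.k.a. version or offset)."""

    def __init__(self):
        self.last_seen = 0   # highest sequence already processed
        self.log = []        # stand-in for the real side effects

    def handle(self, event):
        seq = event["seq"]
        if seq <= self.last_seen:
            return           # duplicate: already processed, skip it
        if seq != self.last_seen + 1:
            raise RuntimeError(f"gap: expected {self.last_seen + 1}, got {seq}")
        self.log.append(event["name"])  # apply the side effect
        self.last_seen = seq            # persist alongside the side effect

c = DedupingConsumer()
for e in [{"seq": 1, "name": "in transit"},
          {"seq": 1, "name": "in transit"},   # redelivered after a lost ack
          {"seq": 2, "name": "delivered"}]:
    c.handle(e)
print(c.log)  # ['in transit', 'delivered']: the duplicate was dropped
```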
Further clarifications based on comments:
The author of the article in question assumed an RDBMS as an event store. The version id or the event sequence is expected to be generated by the producer. Therefore, in your example, the "delivered" event will have a higher sequence than the "in transit" event.
The problem happens when you want to process your events in parallel. What if one consumer gets the "delivered" event and another consumer gets the "in transit" event? Clearly, you have to ensure that all events of a particular order are processed by the same consumer. In Kafka, you solve this problem by choosing the order ID as the partition key. Since one partition is processed by only one consumer, you know you'll always get "in transit" before "delivered". But multiple orders will be spread across the different consumers within the same consumer group, so you still get parallel processing.
Regarding the aggregate ID, I think this is analogous to a topic in Kafka. Since the author assumed an RDBMS store, he needs some identifier to segregate different categories of messages; in Kafka you do that by creating separate topics, and also a consumer group per aggregate type.

DocumentDB unique concurrent insert?

I have a horizontally scaled, event-source-driven application that runs on an Azure Service Bus topic and a Service Bus queue. Some events, used to build up my domain model's state, are received through the topic by all my servers, while the ones on the queue (received much more often and not mutating domain model state) are distributed among the servers in order to spread the load.
Now, every time one of my servers receives an event through the queue or the topic, it stores it in DocumentDB, which it uses as an event store.
Now here's the problem: how can I be sure that the same document is not inserted twice? Let's say three servers receive the same event and they all try to store it. How can I make it fail for two of the servers if they all decide to do it at the same time? Is there any form of unique constraint I can set in DocumentDB, or some kind of transaction scope, to prevent the document from being inserted twice?
The id property for each document has a uniqueness constraint. You can use this constraint to ensure that duplicate documents are not written to a collection.
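The write pattern this enables is "insert, and treat a conflict as success", since a conflict means another server already stored the same event. A toy sketch of the idea, with a plain dict standing in for the collection (the real service signals a duplicate id with an HTTP 409 conflict error; the names here are illustrative):

```python
class Conflict(Exception):
    """Stands in for the 409 error returned on a duplicate document id."""

store = {}

def create_document(doc):
    """Stand-in for the store's insert: rejects a duplicate id."""
    if doc["id"] in store:
        raise Conflict(doc["id"])
    store[doc["id"]] = doc

def store_event(event):
    """Each server calls this; only the first insert of an event id wins."""
    try:
        create_document(event)
        return "stored"
    except Conflict:
        return "already stored"   # another server got there first: fine

# Three servers receive and try to store the same event.
results = [store_event({"id": "evt-42", "body": "..."}) for _ in range(3)]
print(results)  # ['stored', 'already stored', 'already stored']
```

The key design point is deriving the document id deterministically from the event itself (e.g. the event's own identifier), so that all servers compute the same id for the same event.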

Windows Azure Run Once Routine

I'm trying to initialize my data in my Azure Data Tables, but I only want this to happen once on the server at startup (i.e., via the WebRole RoleEntryPoint OnStart routine). The problem is that if I have multiple instances starting up at the same time, then potentially any of those instances can add records to the same table at the same time, duplicating the data at runtime.
Is there an overarching routine for all instances? An application object into which I can shove a value and then check it in each of the instances to see whether the tables have been created or not? A singleton of some sort that Azure exposes?
Cheers
Rob
No, but you could use a Blob lease as a mutex. You could also use a table lock in SQL Azure, if you're using that.
You could also use a Queue, and drop a message in there and then just one role would pick up the message and process it.
You could create a new single instance role that does this job on role start.
To be really paranoid about this and address the event of failure in the middle of writing the data, you can do something even more complex.
A queue message is a great way to ensure transactional capabilities as long as the work you are doing can be idempotent.
Each instance adds a message to a queue.
Each instance polls the queue and, on receiving a message:
  Reads the locking row from the table.
  If the ‘create data state’ value is ‘unclaimed’:
    Attempts to update the row with an ‘in process’ value and a timeout expiration timestamp based on the amount of time needed to create the data.
    If the update is successful, the instance owns the task of creating the data:
      So it creates the data,
      updates the ‘create data state’ to ‘committed’,
      and deletes the message.
    Else, if the update is unsuccessful, the instance does not own the task:
      So it just deletes the message.
  Else, if the ‘create data state’ value is ‘in process’, it checks whether the current time is past the expiration timestamp:
    That would imply that the ‘in process’ instance failed.
    So it tries all over again to set the state to ‘in process’, deletes the incompletely written rows,
    and tries recreating the data, updating the state, and deleting the message.
  Else, if the ‘create data state’ value is ‘committed’:
    It just deletes the queue message, since the work has been done.
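The conditional update at the heart of the scheme above can be sketched with SQLite, where an atomic UPDATE ... WHERE on the state column plays the role of the claim (the state names follow the steps above; the schema is illustrative):

```python
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lock (id INTEGER PRIMARY KEY, state TEXT, expires REAL)")
db.execute("INSERT INTO lock VALUES (1, 'unclaimed', 0)")

def try_claim(timeout_secs=60):
    """Atomically claim the init task. The WHERE clause matches only if
    the row is unclaimed, or 'in process' but expired (a failed owner);
    concurrent callers race on the same row and only one UPDATE matches."""
    cur = db.execute(
        "UPDATE lock SET state = 'in process', expires = ? "
        "WHERE id = 1 AND (state = 'unclaimed' "
        "                  OR (state = 'in process' AND expires < ?))",
        (time.time() + timeout_secs, time.time()))
    db.commit()
    return cur.rowcount == 1   # True: this instance owns the task

def commit_work():
    """Mark the data as created so no instance ever re-runs the init."""
    db.execute("UPDATE lock SET state = 'committed' WHERE id = 1")
    db.commit()

first = try_claim()    # this instance wins the claim and creates the data
second = try_claim()   # a concurrent instance loses and just drops its message
commit_work()
third = try_claim()    # after commit, nobody re-runs the initialization
print(first, second, third)  # True False False
```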

How to handle set based consistency validation in CQRS?

I have a fairly simple domain model involving a list of Facility aggregate roots. Given that I'm using CQRS and an event-bus to handle events raised from the domain, how could you handle validation on sets? For example, say I have the following requirement:
Facilities must have a unique name.
Since I'm using an eventually consistent database on the query side, the data in it is not guaranteed to be accurate at the time the event processor processes an event.
For example, suppose a FacilityCreatedEvent is sitting in the query database's event-processing queue, waiting to be processed and written into the database, and a new CreateFacilityCommand is sent to the domain. The domain services query the read database to see whether any other Facility is already registered with that name, and find none, because the first FacilityCreatedEvent has not yet been processed and written to the store. The new CreateFacilityCommand therefore succeeds and raises another FacilityCreatedEvent, which blows up when the event processor tries to write it into the database and finds that another Facility already exists with that name.
The solution I went with was to add a System aggregate root that maintains a list of the current Facility names. When creating a new Facility, I use the System aggregate (there is only one System, as a global object / singleton) as a factory for it. If the given facility name already exists, it throws a validation error.
This keeps the validation constraints within the domain and does not rely on the eventually consistent query store.
Three approaches are outlined in Eventual Consistency and Set Validation:
If the problem is rare or not important, deal with it administratively, possibly by sending a notification to an admin.
Dispatch a DuplicateFacilityNameDetected event, which could kick off an automated resolution process.
Maintain a Service that knows about used Facility names, maybe by listening to domain events and maintaining a persistent list of names. Before creating any new Facility, check with this service first.
Also see this related question: Uniqueness validation when using CQRS and Event sourcing
In this case, you can implement a simple CRUD-style service that basically does an insert into a SQL table with a primary key constraint.
The insert will only happen once. When a duplicate command carrying a value that should exist only once hits the aggregate, the aggregate calls the service; the service's insert operation fails with a violation of the primary key constraint and throws an error; the whole process fails and no events are generated, so nothing is reported to the query side, except perhaps a record of the failure in a table used for eventual-consistency checking, which the user can query to learn the status of the command processing. To check that, just query the Command Status view model repeatedly with the command GUID.
Obviously, when the command holds a value that does not yet exist in the primary key table, the operation is a success.
The primary key table should only ever be accessed through the service; and because you implemented event sourcing, you can replay the events to rebuild it.
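A minimal sketch of such a uniqueness service with SQLite (the table name and command flow are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE facility_names (name TEXT PRIMARY KEY)")

def reserve_name(name: str) -> bool:
    """The uniqueness service: at most one successful insert per name."""
    try:
        db.execute("INSERT INTO facility_names VALUES (?)", (name,))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False   # primary key violation: the name is taken

def handle_create_facility(name: str):
    """Command handler: consult the service before emitting any event."""
    if not reserve_name(name):
        return []                                  # fail: no events generated
    return [{"type": "FacilityCreated", "name": name}]

print(len(handle_create_facility("North Plant")))  # 1: event emitted
print(len(handle_create_facility("North Plant")))  # 0: duplicate rejected
```

Rebuilding the table from the event stream is then just replaying every FacilityCreated event through reserve_name.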
Since the uniqueness check would be done before the data is written, a better method is to build an event-tracking service that sends a notification when the process has finished or been terminated.
