How to handle set based consistency validation in CQRS? - validation

I have a fairly simple domain model involving a list of Facility aggregate roots. Given that I'm using CQRS and an event-bus to handle events raised from the domain, how could you handle validation on sets? For example, say I have the following requirement:
Facility's must have a unique name.
Since I'm using an eventually consistent database on the query side, the data in it is not guaranteed to be accurate at the time the event processesor processes the event.
For example, a FacilityCreatedEvent is in the query database event processing queue waiting to be processed and written into the database. A new CreateFacilityCommand is sent to the domain to be processed. The domain services query the read database to see if there are any other Facility's registered already with that name, but returns false because the CreateNewFacilityEvent has not yet been processed and written to the store. The new CreateFacilityCommand will now succeed and throw up another FacilityCreatedEvent which would blow up when the event processor tries to write it into the database and finds that another Facility already exists with that name.

The solution I went with was to add a System aggregate root that could maintain a list of the current Facility names. When creating a new Facility, I use the System aggregate (only one System as a global object / singleton) as a factory for it. If the given facility name already exists, then it will throw a validation error.
This keeps the validation constraints within the domain and does not rely on the eventually consistent query store.

Three approaches are outlined in Eventual Consistency and Set Validation:
If the problem is rare or not important, deal with it administratively, possibly by sending a notification to an admin.
Dispatch a DuplicateFacilityNameDetected event, which could kick off an automated resolution process.
Maintain a Service that knows about used Facility names, maybe by listening to domain events and maintaining a persistent list of names. Before creating any new Facility, check with this service first.
Also see this related question: Uniqueness validation when using CQRS and Event sourcing

In this case, you may implement a simple CRUD style service that basically does an insert in a Sql table with a primary key constraint.
The insert will only happen once. When duplicate commands with the same value that should only exist one time hits the aggregate, the aggregate calls the service, the service fails the Insert operation due to a violation of the Primary Key constraint, throws an error, the whole process fails and no events are generated, no reporting in the query side, maybe a reporting of the failure in a table for eventual consistency checking where the user can query to know the status of the command processing. To check that, just query again and again the Command Status View Model with the Command Guid.
Obviously, when the command holds a value that does not exists in the table for primary key checking, the operation is a success.
The table of the primary key constraint should be only be used as a service, but, because you implemented Event sourcing, you can replay the events to rebuild the table of primary key constraint.

Because uniqueness check would be done before data writing, so the better method is to build a event-tracking service, which would send a notification when the process finished or terminated.

Related

What should be projection primary key on query side - CQRS, Event Sourcing, Microservices

I have one thing that confuses me.
I have 2 microservices.
One creates commands and other consumes commands and produces events (events are stored in Event Store).
In my example aggregates have Guid as Entity ID, and Guid is created when aggregate is created.
Thing that confuses me is, should that key (generated on write side) be transfered via Event to query side (microservice that created command)?
Or maybe query side (projection) should have separate id in read DB.
Or maybe I should generate some shared key?
What is best solution here?
I think it all depends on your setup.
If you are doing CQRS, and you have a separate read-service (within the same bounded context), then it is up to the read-side service to model the data as it wish, either reusing the same keys or not.
If you are communicating between two different services (separate bounded contexts) then I recommend you create new primary keys in the receiving service and use the incoming key as a foreign key. Just as you would do with relationships between two tables in a SQL-database.
I think this depends on your requirements. Is there a specific reason to have different keys?
Given that you are using Guids as your PK, it seems simplest to reuse the PKs assigned by the write side.
Some reasons you might want to keep the keys consistent:
During command processing an ID was returned to the client that they may have cached and should reasonably expect to be able to use that key when querying the read side.
If your write side data is long-lived and there is an bug on your read side output, it is gonna be much easier to debug what went wrong if your keys are consistent on write and read side.
Entities in the write side will use the write side Guid PK of another entity as its FK. When you emit an event for this new dependent entity you would want the read side to be able to build the relationship back to the principal.
This is kind of an odd question.
Your primary key on a projection could literally be anything or you might not even have one.
There is no "correct answer" for this question ... It depends entirely on the projection.
What if my projection was say just a flattening out of information associated to an aggregate ... As example we have an "order" and we make a row per order showing summary information about that order. Using an "OrderId" here would seemingly make some sense as my primary key.
What if my projection was building out counts of orders by Product? Well then using a "ProductItemId" would make a lot more sense.
What if in either of these cases the Ids themselves ("OrderId" and "ProductItemId") could change? Well then using another key might make a lot of sense.
What if this is an append-only table? I might not even want to have a key.
Again, there is not a ... correct ... answer here there are many situations that you may run into.

How to avoid concurrent requests to a lambda

I have a ReportGeneration lambda that takes request from client and adds following entries to a DDB table.
Customer ID <hash key>
ReportGenerationRequestID(UUID) <sort key>
ExecutionStartTime
ReportExecutionStatus < workflow status>
I have enabled DDB stream trigger on this table and a create entry in this table triggers the report generation workflow. This is a multi-step workflow that takes a while to complete.
Where ReportExecutionStatus is the status of the report processing workflow.
I am supposed to maintain the history of all report generation requests that a customer has initiated.
Now What I am trying to do is avoid concurrent processing requests by the same customer, so if a report for a customer is already getting generated don’t create another record in DDB ?
Option Considered :
query ddb for the customerID(consistent read) :
- From the list see if any entry is either InProgress or Scheduled
If not then create a new one (consistent write)
Otherwise return already existing
Issue: If customer clicks in a split second to generate report, two lambdas can be triggered, causing 2 entires in DDB and two parallel workflows can be initiated something that I don’t want.
Can someone recommend what will be the best approach to ensure that there are no concurrent executions (2 worklflows) for the same Report from same customer.
In short when one execution is in progress another one should not start.
You can use ConditionExpression to only create the entry if it doesn't already exist - if you need to check different items, than you can use DynamoDB Transactions to check if another item already exists and if not, create your item.
Those would be the ways to do it with DynamoDB, getting a higher consistency.
Another option would be to use SQS FIFO queues. You can group them by the customer ID, then you wouldn't have concurrent processing of messages for the same customer. Additionally with this SQS solution you get all the advantages of using SQS - like automated retry mechanisms or a dead letter queue.
Limiting the number of concurrent Lambda executions is not possible as far as I know. That is the whole point of AWS Lambda, to easily scale and run multiple Lambdas concurrently.
That said, there is probably a better solution for your problem using a DynamoDB feature called "Strongly Consistent Reads"
By default reads to DynamoDB (if you use the AWS SDK) are eventually consistent, causing the behaviour you observed: Two writes to the same table are made but your Lambda only was able to notice one of those writes.
If you use Strongly consistent reads, the documentation states:
When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful.
So your Lambda needs to do a strongly consistent read to your table to check if the customer already has a job running. If there is already a job running the Lambda does not create a new job.

CQRS Event-sourcing and own database per microservice

I have some questions above event-sourcing and cqrs in microservices architecture.
I understand that after send command some microservice executes it and emits event. Event-store subcsribes on it and saves inside his database. Also some ReadModel basing on this event generates and saves optimized data inside read database.
My first question is - Can microservice has his own database and store
data inside it too? Or maybe in event-sourcing approach microservices
don't have their own databases and everything is only stored inside
event store?
My second question is - when I execute command in microservice and
need some data for validation purposes do I need call ReadModel or
what? Assuming microservices haven't got their own databases I have no
choice?
Can microservice has his own database and store data inside it too?
Definitely, microservice can have its own database. But let's use terms from ES/CQRS. Database can represent Event Store (append-only log of immutabale events) and Read Model - some database used to answer queries which is populated by proseccing events.
So, microservice can have its own Read model, populated from events from other microservices.
Or microservice can process commands and save events to the shared Event Store.
Or microservice can process commands and save events to its own Event store.
Choice is yours, and it depends on degree of separation you want to achieve among microservices.
I would put all events that usually consumed together into same Event store. Which means I should be able to query for these events and have a single ordered stream as a result.
when I execute command in microservice and need some data for validation purposes do I need call ReadModel or what?
Command is executed by Aggregate, that has its own state. This state is built by processing all events for this aggregate, and this state should be used to validate a command.
You cannot/should not talk to Read Models in the command handler, primarily because those read models are not consistent with aggregate state. Aggregate state is consistent.
You can query Read Model before sending a command (to make sure it can be sent). But in command handler you need to rely on aggregate state only.
There is a famous case of registering user with requirement of a unique name. As a primary validation, in your UI code you can query read model and tell user that entered name is taken. If name is not taken, UI lets user issue a command. I'm assuming your Aggregate root is user.
But when processing this command ({id:123, type:CREATE_USER, name:somename}) you cannot check that "somename" is taken, because aggregate state for user 123 does not contain a list of taken names. You can potentially query some AllUsernames read model, but it can be milliseconds old, and some other user could take this "somename" already. So in this scenario, you will find a duplication during adding names to read model. And at that point you can do some compensation action - usually issue a command to suspend a user with duplicated name and ask him to re-register or change his name somehow.
It may seems strange, but if you have a really distributed system with several replicas of user list, you'll have the same problem, so why not just embrace the fact that data is always not fully consistent, and just deal with it?

DocumentDB unique concurrent insert?

I have a horizontally event-source driven application that runs using an Azure Service Bus Topic and a Service Bus Queue. Some events for building up my domain model's state are received through the topic by all my servers, while the ones on the queue (the ones received a lot more often and not mutating domain model state) are distributed among the servers in order to distribute the load.
Now, every time one of my servers receives an event through the queue or topic, it stores it in a DocumentDB which it uses as event store.
Now here's the problem. How can I be sure that the same document is not inserted twice? Let's say 3 servers receive the same event. They all try to store it. How can I make it fail for 2 of the servers in the case they decide to do it all at the same time? Is there any form of unique constraint I can set in DocumentDB or some kind of transaction scope to prevent the document from being inserted twice?
The id property for each document has a uniqueness constraint. You can use this constraint to ensure that duplicate documents are not written to a collection.

Windows Azure Run Once Routine

I'm trying to initialize my data in my Azure Data Tables but I only want this to happen once on the server at startup (i.e. via the WebRole Role Entry OnStart routine). The problem is if I have multiple instances starting up at the same time then potentially either one of those instances can add records to the same table at the same time hence duplicating the data at runtime.
Is there there like an overarching routine for all instances? An application object in which I can shove a value into and check it in each of the instances to see if the tables have been created or not? A singleton of some sort that azure exposes?
Cheers
Rob
No, but you could use a Blob lease as a mutex. You could also use a table lock in SQL Azure, if you're using that.
You could also use a Queue, and drop a message in there and then just one role would pick up the message and process it.
You could create a new single instance role that does this job on role start.
To be really paranoid about this and address the event of failure in the middle of writing the data, you can do something even more complex.
A queue message is a great way to ensure transactional capabilities as long as the work you are doing can be idempotent.
Each instance adds a message to a queue.
Each instance polls the queue and on receiving a message
Reads the locking row from the table.
If the ‘create data state’ value is ‘unclaimed’
Attempts to update the row with a ‘in process’ value and a timeout expiration timestamp based on the amount of time needed to create the data.
if the update is successful, the instance owns the task of creating the data
So create the data
update the ‘create data state’ to ‘committed’
delete the message
else if the update is unsuccessful the instance does not own the task
so just delete the message.
Else if the ‘create data’ value is ‘in process’, check if the current time is past the expiration timestamp.
That would imply that the ‘in process’ failed
So try all over again to set the state to ‘in process’, delete the incomplete written rows
And try recreating the data, updating the state and deleting the message
Else if the ‘create data’ value is ‘committed’
Just delete the queue message, since the work has been done

Resources