Is there a way to combine a query and a command in CQRS? - events

I have a project built using CQRS, but I can't figure out how to implement one use case.
The user needs to be able to make a Query which will return a set of data for them to view. However, I also need to save the data they got at the same time.
Is there a way to do this within a Query without violating CQRS' principles? Or would the Query and Command need to be two separate API calls one after another?

In CQRS it is your client who can do both command and queries. This client is not necessary a piece of UI.
It can be an API endpoint handler, which would
receive a query
forward it to the query endpoint
wait for the answer
send an answer to the caller
send a command to store the answer

Is there a way to do this within a Query without violating CQRS' principles?
It depends.
If "save the data" means "make some change to the domain model"... well, that would be pretty weird.
Asking a question should not change the answer. -- Bertrand Meyer
On the other hand, logging/telemetry are pretty normal ways to track the activity of an application, so that should be fine.
There are some realities of a distributed system on an unreliable network that you need to be aware of (what should the behavior be if the telemetry system is not available? What are the consequences of recording queries that don't actually reach the client (because the network is unreliable).

As #VoiceOfUnreason stated, it may be somewhat strange to effect domain changes when querying data.
However, it may be that you could swop that around.
For instance, perhaps one could query a forecast of sorts. We would want to store that forecast. It then seems as though the query results in us having to save the result. This appears to break CQS at some level since each query would result in a change of state.
If we swop that around and first request a forecast via the domain handling and then that produces a result, or even a pointer to the result, then the query would be something you could perform on the data multiple times without "breaking" CQS.

Related

CQRS commands and GraphQL mutations

I've just learned about CQRS, and I would like to combine it in a project with a GraphQL based API. However, in order to do that, a question has come to my mind: according to CQRS, commands have to not return anything after its execution. However, according to GraphQL conventions, mutations have to return the updated state of the entity.
How should I deal with that? Are CQRS and GraphQL incompatible? The only solution that comes to my mind is, in order to resolve the mutation, first execute a command and later a query, in order to get the response object. Is there anything better than that? It doesn't look very efficent to me...
Thanks in advance
How should I deal with that?
Real answer? Ignore the "have to not return anything" constraint; the underlying assumptions behind that constraint don't hold, so you shouldn't be leaning to hard on it.
How exactly to do that is going to depend on your design.
For example, if you are updating the domain model in the same process that handles the HTTP Request, then it is a perfectly reasonable thing to (a) save the domain model, (b) run your view projection on the copy of the model that you just saved, (c) and then return the view.
In other words, the information goes through exactly the same transformations it would "normally", except that we perform those transformations synchronously, rather than asynchronously.
If the model is updated in a different process, then things get trickier, since more message passing is required, and you may need to deal with timeouts. For instance, you can imagine a solution where you send the command, and then poll the "read side" until that model is updated to reflect your changes.
It's all trade offs, and those trade-offs are an inevitable consequence of choosing a distributed architecture. We don't choose CQRS because it makes everything better, we choose CQRS because it makes some things better, other things worse, and we are in a context where the things it makes better are more important than the things it makes worse.
I am considering similar, i.e. using GraphQL predominantly for interfacing with the read-side of a system based on CQRS.
On the write-side, however, I am considering using a Web or REST API that has one end-point that accepts commands.
Remember, in CQRS you don't directly mutate entities but submit a command signalling your intent / desire to do something.
Alternatively, thinking out loud here, it may be possible to use mutations in GraphQL to create commands and track their status using subscriptions.

Possible to implement a "job" pattern with GraphQL

Is there a reasonable way to implement a job-based query paradigm in GraphQL?
In particular, something like the following:
Caller submits a search request
Backend returns a job ID
Caller receives status updates on the job as it runs
Caller separately can retrieve pages of data from the job results
I guess the problem I see here is that we are splitting up the process into two steps: One is making the request and the second is retrieving data. As a result, the fields requested in the first request do not correspond with what is returned (just a job ID). And similarly, a call to retrieve results has the same issue.
Subscriptions don't really solve this problem either, I don't believe. They might help with requesting data that might take a long time to return I think, but that isn't quite the same as a job-based API.
Maybe this is a niche use case, and I have no doubt that it wasn't what GraphQL was initially built to solve. But, I'm just wondering if this is something doable, or if this is more of trying to fit a square peg into a round hole.

GraphQL Asynchronous query results

I'm trying to implement a batch query interface with GraphQL. I can get a request to work synchronously without issue, but I'm not sure how to approach making the result asynchronous. Basically, I want to be able to kick off the query and return a pointer of sorts to where the results will eventually be when the query is done. I'd like to do this because the queries can sometimes take quite a while.
In REST, this is trivial. You return a 202 and return a Location header pointing to where the client can go to fetch the result. GraphQL as a specification does not seem to have this notion; it appears to always want requests to be handled synchronously.
Is there any convention for doing things like this in GraphQL? I very much like the query specification but I'd prefer to not leave the client HTTP connection open for up to a few minutes while a large query is executed on the backend. If anything happens to kill that connection the entire query would need to be retried, even if the results themselves are durable.
What you're trying to do is not solved easily in a spec-compliant way. Apollo introduced the idea of a #defer directive that does pretty much what you're looking for but it's still an experimental feature. I believe Relay Modern is trying to do something similar.
The idea is effectively the same -- the client uses a directive to mark a field or fragment as deferrable. The server resolves the request but leaves the deferred field null. It then sends one or more patches to the client with the deferred data. The client is able to apply the initial request and the patches separately to its cache, triggering the appropriate UI changes each time as usual.
I was working on a similar issue recently. My use case was to submit a job to create a report and provide the result back to the user. Creating a report takes couple of minutes which makes it an asynchronous operation. I created a mutation which submitted the job to the backend processing system and returned a job ID. Then I periodically poll the jobs field using a query to find out about the state of the job and eventually the results. As the result is a file, I return a link to a different endpoint where it can be downloaded (similar approach Github uses).
Polling for actual results is working as expected but I guess this might be better solved by subscriptions.

Compensating Events on CQRS/ES Architecture

So, I'm working on a CQRS/ES project in which we are having some doubts about how to handle trivial problems that would be easy to handle in other architectures
My scenario is the following:
I have a customer CRUD REST API and each customer has unique document(number), so when I'm registering a new customer I have to verify if there is another customer with that document to avoid duplicity, but when it comes to a CQRS/ES architecture where we have eventual consistency, I found out that this kind of validations can be very hard to address.
It is important to notice that my problem is not across microservices, but between the command application and the query application of the same microservice.
Also we are using eventstore.
My current solution:
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%. That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
Altough this works, there are 2 things that bother me here, the first thing is my command application relying on the query application, so if my query application is down, my command is affected (today I just return false on my validation if query is down but still...) and second thing is, should a query/read model really be able to emit events? And if so, what is the correct way of doing it? Should the command have some kind of API for that? Or should the query emit the event directly to eventstore using some common shared library? And if I have more than one view/read? Which one should I choose to handle this?
Really hope someone could shine a light into these questions and help me this these matters.
For reference, you may want to be reviewing what Greg Young has written about Set Validation.
I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right?
That's exactly right - your read model is stale copy, and may not have all of the information collected by the write model.
That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
This spelling doesn't quite match the usual designs. The more common implementation is that, if we detect a problem when reading data, we send a command message to the write model, telling it to straighten things out.
This is commonly referred to as a process manager, but you can think of it as the automation of a human supervisor of the system. Conceptually, a process manager is an event sourced collection of messages to be sent to the command model.
You might also want to consider whether you are modeling your domain correctly. If documents are supposed to be unique, then maybe the command model should be using the document number as a key in the book of record, rather than using the customer. Or perhaps the document id should be a function of the customer data, rather than being an arbitrary input.
as far as I know, eventstore doesn't have transactions across different streams
Right - one of the things you really need to be thinking about in general is where your stream boundaries lie. If set validation has significant business value, then you really need to be thinking about getting the entire set into a single stream (or by finding a way to constrain uniqueness without using a set).
How should I send a command message to the write model? via API? via a message broker like Kafka?
That's plumbing; it doesn't really matter how you do it, so long as you are sure that the command runs within its own transaction/unit of work.
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%.
No, you cannot safely rely on the query side, which is eventually consistent, to prevent the system to step into an invalid state.
You have two options:
You permit the system to enter in a temporary, pending state and then, eventually, you will bring it into a valid permanent state; for this you could allow the command to pass, yield CustomerRegistered event and using a Saga/Process manager you verify against a uniquely-indexed-by-document-collection and issue a compensating command (not event!), i.e. UnregisterCustomer.
Instead of sending a command, you create&start a Saga/Process that preallocates the document in a uniquely-indexed-by-document-collection and if successfully then send the RegisterCustomer command. You can model the Saga as an entity.
So, in both solution you use a Saga/Process manager. In order for the system to be resilient you should make sure that RegisterCustomer command is idempotent (so you can resend it if the Saga fails/is restarted)
You've butted up against a fairly common problem. I think the other answer by VoicOfUnreason is worth reading. I just wanted to make you aware of a few more options.
A simple approach I have used in the past is to create a lookup table. Your command tries to register the key in a unique constraint table. If it can reserve the key the command can go ahead.
Depending on the nature of the data and the domain you could let this 'problem' occur and raise additional events to mark it. If it is something that's important to the business/the way the application works then you can deal with it either manually or at the time via compensating commands. if the latter then it would make sense to use a process manager.
In some (rare) cases where speed/capacity is less of an issue then you could consider old-fashioned locking and transactions. Admittedly these are much better suited to CRUD style implementations but they can be used in CQRS/ES.
I have more detail on this in my blog post: How to Handle Set Based Consistency Validation in CQRS
I hope you find it helpful.

How to update/migrate data when using CQRS and an EventStore?

So I'm currently diving the CQRS architecture along with the EventStore "pattern".
It opens applications to a new dimension of scalability and flexibility as well as testing.
However I'm still stuck on how to properly handle data migration.
Here is a concrete use case:
Let's say I want to manage a blog with articles and comments.
On the write side, I'm using MySQL, and on the read side ElasticSearch, now every time a I process a Command, I persist the data on the write side, dispatch an Event to persist the data on the read side.
Now lets say I've some sort of ViewModel called ArticleSummary which contains an id, and a title.
I've a new feature request, to include the article tags to my ArticleSummary, I would add some dictionary to my model to include the tags.
Given the tags did already exist in my write layer, I would need to update or use a new "table" to properly use the new included data.
I'm aware of the EventLog Replay strategy which consists in replaying all the events to "update" all the ViewModel, but, seriously, is it viable when we do have a billion of rows?
Is there any proven strategies? Any feedbacks?
I'm aware of the EventLog Replay strategy which consists in replaying
all the events to "update" all the ViewModel, but, seriously, is it
viable when we do have a billion of rows?
I would say "yes" :)
You are going to write a handler for the new summary feature that would update your query side anyway. So you already have the code. Writing special once-off migration code may not buy you all that much. I would go with migration code when you have to do an initial update of, say, a new system that requires some data transformation once off, but in this case your infrastructure would exist.
You would need to send only the relevant events to the new handler so you also wouldn't replay everything.
In any event, if you have a billion rows of data your servers would probably be able to handle the load :)
Im currently using the NEventStore by JOliver.
When we started, we were replaying our entire store back through our denormalizers/event handlers when the application started up.
We were initially keeping all our data in memory but knew this approach wouldn't be viable in the long term.
The approach we use currently is that we can replay an individual denormalizer, which makes things a lot faster since you aren't unnecessarily replaying events through denomalizers that haven't changed.
The trick we found though was that we needed another representation of our commits so we could query all the events that we handled by event type - a query that cannot be performed against the normal store.

Resources