I am thinking about something that is loosely connected with CQRS: the Request-Reply pattern. With HTTP transport, for example, we put a Request-Id header on the request, at minimum for tracking purposes - in my case for monitoring between different microservices. If the incoming request already contains one, it is rewritten into a Correlation-Id header. As far as I can tell, this happens in the transport layer (infrastructure). My question is whether that Request-Id (sometimes called Message-Id) should be supplied by the business layer, for example taken directly from the command we are executing, with some mechanism doing this auto-magically - e.g. ICommand requiring that an Id is present?
Or is it a totally different thing that exists only in the infrastructure layer (transport)? If so, how do you correlate the transport id with the business command id? At least one log/trace/tracking entry has to carry both identifiers? Is there a pattern I have missed? Moreover, do you think CorrelationId belongs in the business command or not?
IMHO concepts such as correlation id, causation id, request id, message id, etc. belong to the infrastructure layer, as they are not part of the business rules.
However, I've added a metadata attribute to my Command and Event objects to save this kind of info which helps me to manage the correlation and causation relationship between commands and events.
By having this metadata attribute in the form of an associative array (hash map, dictionary or whatever key-value format), you leave your code open to persisting any tracking info you may need in the future without polluting your Application and Domain layers too much.
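For illustration, here is a rough sketch of what that could look like in plain Java. All class names here are hypothetical, not a prescribed API:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical base class: the business payload lives in the subclass,
// while tracking info lives in an open-ended metadata map.
public abstract class Command {

    private final Map<String, String> metadata = new HashMap<>();

    protected Command() {
        metadata.put("messageId", UUID.randomUUID().toString());
    }

    // The infrastructure layer (transport middleware, message bus) fills
    // these in; the domain code never reads them.
    public void putMetadata(String key, String value) {
        metadata.put(key, value);
    }

    public Map<String, String> getMetadata() {
        return Map.copyOf(metadata);
    }
}

// Example business command: only business fields are first-class members.
class PlaceOrder extends Command {
    final String orderId;

    PlaceOrder(String orderId) {
        this.orderId = orderId;
    }
}

The transport layer can then copy an incoming Request-Id/Correlation-Id header into the metadata map before dispatching, and read it back when publishing resulting events, without the domain ever touching those keys.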
I want to write a REST(ful) application with Spring Boot and Spring Data JPA.
Let's assume that for business reasons I have a database with the following tables:
customer(id number, first_name text, last_name text, type text);
customer_type(type text, description text);
where:
id is generated by the database at insertion time
the type column in the customer table is a foreign key to the type column in the customer_type table; it is immutable from the microservice's point of view, just a lookup table.
Assuming I want to create APIs for CRUD operations on a customer but want to minimize API calls when just reading, I suppose I need the following operations:
GET /customer/{id}
POST /customer
PUT /customer/{id}
DELETE /customer/{id}
How should the body be structured?
For the GET operation the response should be
{
    "id": 123,
    "firstName": "John",
    "lastName": "Doe",
    "customerType": {
        "type": "P",
        "description": "Premium Customer"
    }
}
But for POST I imagine I need to avoid sending the id and to send just the customer type, since the description is immutable and the client only needs it for displaying the information on screen - but this leads to a request body that differs from the one returned by the GET operation.
The same applies to the PUT operation, but should the id field be sent as well? How do I handle the case where the id in the API path differs from the id in the request body, if one is sent?
DELETE should not be a problem since it just deletes the row in customer table.
Thank you
How should the body be structured?
Let's take a step back first and quickly discuss what you are basically trying to achieve when following a REST architecture, and why and how REST provides those mechanisms.
REST is an architectural style that helps decouple clients from servers by introducing indirection mechanisms which may seem odd at first, but in the end give you the level of decoupling that allows servers to introduce changes which clients will naturally adapt to. Such indirection mechanisms include attaching URIs to link-relation names, using form-based representation formats to tell a client how to create requests, content-type negotiation to return representations supported and understood by the other side, and so forth. If you don't need such properties, i.e. client and server always go hand in hand in regard to changes and communicate via predefined messages, REST is probably not the best style to follow. If, however, you have a server that is contacted by various clients not under your control, or a client that has to contact various servers also not under your direct control, this is where REST truly starts to shine, provided all parties adhere to these concepts.
One of REST's premises is that the server will teach clients everything they need to know in order to construct requests. If you look at the Web, where HTML is used basically everywhere, you will see that HTML defines forms, which allow a server to explain to a client what properties of a resource it expects as input. On top of that, the form also tells the client which HTTP operation to use, which target URI to send the request to, and which media type to represent the state in. In HTML this is usually, implicitly, application/x-www-form-urlencoded, which chains properties together like this:
firstName=Roman&lastName=Vottner&role=Dev
or the like. This is in essence what HATEOAS, or hypertext as the engine of application state, is all about. You use built-in controls of the exchanged media type to let a client progress through its task, instead of having it consult external documentation to look up the "API" of some service. E.g. a form could state that an input only allows numeric values, that a sub-portion of the form represents a date/time picker widget which a client could render to the user accordingly, or that an element represents a slider with a given range of admissible values, and the like.
What the actual representation format you have to send to the server looks like depends on the instructed media type. E.g. HAL Forms uses application/json by default and also specifies that application/x-www-form-urlencoded needs to be supported. Other media types have to be negotiated explicitly between client and server. Ion states that application/json or application/ion+json have to be negotiated via the Content-Type request header.
In plain application/json the url-encoded payload from above could simply be expressed as:
{
    "firstName": "Roman",
    "lastName": "Vottner",
    "role": "Dev"
}
and this is OK as the server basically instructed you to send this data in that format.
There are further media types available that are worth a closer look to see whether they fit your needs or not. E.g. Hydra takes a slightly different approach by connecting Linked Data to REST; its affordances are called operations, and it allows you to describe resources and their properties through Linked Data classes. So the presence of an affordance for a certain resource tells you what you can do with that resource, e.g. update its state, and therefore also which class it belongs to and therefore which properties it has.
This should just illustrate how the negotiated media type ultimately decides what the actual representation sent to the server has to look like.
In regard to whether to put resource identifiers in the payload or not: it depends. Usually resources are identified by the URI/IRI, and this, as a whole, is the identifier of the resource. In your application, though, you will reference related domain objects through their ID, which does not necessarily need to be, and probably also should not be, part of the IRI itself. E.g. let's assume we retrieve a resource that represents an order. That order contains the user's name and address, the various items that were ordered including some metadata describing those items, and so on. In such a case it usually makes sense to add the orderId which you use in your application, even though the URI may already contain that information. Users of that API are usually not interested in those URIs but in the actual content, and might never even see those URIs if they are hidden behind automated processes or user interfaces. If a user now wants to print out that order, s/he has all the information needed to file a complaint later on via phone, for example. In other cases, e.g. if you design a resource to be an all-purpose, clipboard-like copy&paste location, an ID does not make any sense, unless you allow the user to explicitly reference one of those states directly.
The reason why IDs should not be part of the URI itself stems from the fact that a URI shouldn't change if the actual resource does not change. E.g. we have a customer who went through a merger a couple of years ago. They used to expose all their products via their own URIs that exposed the productId as part of the URI. During the merger they tried to combine the various different data models to reduce the number of systems they had to operate, while serving each of their customers with the same data as before, as the underlying products didn't change. As they tried to stay "backwards" compatible to support their customers' legacy systems, they quickly noticed that exposing those productIds as part of the URI was causing them trouble. If they had earlier used a mapping table from, say, exposed UUIDs to internal productIds (again an introduction of indirection), they could have reduced their whole data model, and thus complexity, by a lot, while being able to change the mapping from internal productId to UUID on the fly and still allowing their clients to look up the product information.
Long story short, as can hopefully be seen, the structure of a representation depends on the exchanged media type. There are loads of different media types available; use the ones that allow you to describe resources to clients, such as HAL/HAL Forms, Ion, Hydra, and so on. In regard to URIs: don't overengineer them. They are, as a whole, just a pointer to a resource, and clients are usually interested in the content, not the URI! As such, make use of indirection features like link-relation names, content-type negotiation and so forth to remove the direct coupling of clients to services, and instead rely more on the document type exchanged. The media type here basically becomes the contract of the message. Through mappings on the client and server side, resources in various representations can be "translated" to an object which you can use in your application.
As you've tagged your question with spring-boot and spring-data-jpa, you might want to look into spring-hateoas. It supports HAL out of the box; HAL Forms can be used via affordances, though the media type needs to be enabled explicitly for it, otherwise you might miss out on the form template in the responses. Hydra support in spring-hateoas seems to be added through hydra-java, which implements the Spring HATEOAS SPI. While Amazon provides Ion implementations for various programming languages, including Java, it does not yet support Spring HATEOAS or Spring in general; here a custom SPI implementation may be necessary.
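As a rough illustration of the HAL support in spring-hateoas (CustomerDto and the inline lookup are placeholders, not part of any of these libraries):

import org.springframework.hateoas.EntityModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import static org.springframework.hateoas.server.mvc.WebMvcLinkBuilder.linkTo;
import static org.springframework.hateoas.server.mvc.WebMvcLinkBuilder.methodOn;

@RestController
public class CustomerController {

    record CustomerDto(Long id, String firstName, String lastName) {}

    // Wrapping the plain DTO in an EntityModel makes the HAL response carry
    // a self link (and any further link relations you register) instead of
    // forcing clients to hard-code URIs.
    @GetMapping("/customer/{id}")
    public EntityModel<CustomerDto> getCustomer(@PathVariable Long id) {
        CustomerDto dto = new CustomerDto(id, "John", "Doe"); // stand-in for a repository lookup
        return EntityModel.of(dto,
                linkTo(methodOn(CustomerController.class).getCustomer(id)).withSelfRel());
    }
}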
For PUT operations you need to send the id of the entity that you want to update.
If you want to generate the same response as you would get in GET, then you need to write a DTO and map details accordingly.
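A minimal sketch of such a DTO, assuming JPA entities named Customer and CustomerType with the obvious getters (those getters are assumptions based on the tables above):

// Response DTO shaped like the GET payload; the request DTO for POST/PUT
// could omit the id and carry only the type code.
public class CustomerResponse {
    public Long id;
    public String firstName;
    public String lastName;
    public CustomerTypeDto customerType;

    public static CustomerResponse from(Customer entity) {
        CustomerResponse dto = new CustomerResponse();
        dto.id = entity.getId();
        dto.firstName = entity.getFirstName();
        dto.lastName = entity.getLastName();
        dto.customerType = new CustomerTypeDto(
                entity.getType().getType(), entity.getType().getDescription());
        return dto;
    }
}

class CustomerTypeDto {
    public String type;
    public String description;

    CustomerTypeDto(String type, String description) {
        this.type = type;
        this.description = description;
    }
}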
Let's assume we host two microservices: RealEstate and Candidate.
The RealEstate service is responsible for managing rental properties, landlords and so forth.
The Candidate service provides commands to apply for a rental property.
There would be a CandidateForRentalProperty command which requires the RentalPropertyId and all necessary Candidate information.
Now the crucial point: Different types of RentalPropertys require a different set of Candidate information.
Therefore the commands and aggregates were split up:
Commands: CandidateForParkingLot, CandidateForFlat, and so forth.
Aggregates: ParkingLotCandidature, FlatCandidature, and so forth.
The UI asks the read model to decide which command has to be called.
It's reasonable for me to validate the Candidate information and all the business logic involved with it in the Candidate domain layer, but to leave out validating whether the correct command was called based on the given RentalPropertyId. Reason: multiple aggregates are involved in this validation.
The microservice should be autonomous, and its read model consumes events from the RealEstate domain, hence it's not guaranteed to be up to date. We don't want to reject candidates based on that but rather use eventual consistency.
Yes, this could lead to inept Candidate information used for a certain kind of RentalProperty. Someone could just call the CandidateForFlat command with a parking lot rental property id.
But how do we handle the cases in which this happens?
The RealEstate domain does not know anything about Candidates.
Would there be an event handler which checks if there is something wrong and execute an appropriate command to compensate?
On the other hand, this "mapping" is domain logic and I'd like to accommodate it in the domain layer. But I don't know who is accountable for this kind of compensating measure. Would the Candidate aggregate be informed, with something like IneptApplicationTypeUsed?
As an aside - commands are usually imperative verbs. ApplyForFlat might be a better spelling than CandidateForFlat.
The pattern you are probably looking for here is that of an exception report; when the candidate service matches a CandidateForFlat message with a ParkingLot identifier, then the candidate service emits as an output a message saying "hey, we've got a problem here".
If a follow-up message fixes the problem -- the candidate service gets an updated message that fixes the identifier in the CandidateForFlat message, or the candidate service gets an update from real estate announcing that the identifier actually points to a Flat -- then the candidate service can emit another message: "never mind, the problem has been fixed".
I tend to find in this pattern that the input commands to the service are really all just variations of handle(Event); the user submitted, the http request arrived; the only question is whether or not the microservice chooses to track that event. In other words, the "command" stream is just another logical event source that the microservice is subscribed to.
As you said, validation of commands should be performed at the point of command generation - on the client side - where read models are available.
Command processing is performed by the aggregate, so it cannot and should not check the validity or existence of other aggregates. It should trust the command issuer.
If commands come from an untrusted environment like a public API, then your API gateway becomes a client, and it should have the necessary read models to validate references.
If you want to accept a command fast and check it later, then log events like ClientAppliedForParkingLot and have a Saga/Process manager handle the further workflow by keeping its internal state and issuing commands like AcceptApplication or RejectApplication.
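A rough sketch of such a process manager; apart from the event and command names mentioned above, every type here is a placeholder:

// Hypothetical process manager: the application is accepted or rejected
// later, once it can be checked against the local read model.
public class CandidatureProcessManager {

    private final RentalPropertyReadModel properties; // assumed local read model
    private final CommandBus commandBus;               // assumed command dispatcher

    public CandidatureProcessManager(RentalPropertyReadModel properties, CommandBus commandBus) {
        this.properties = properties;
        this.commandBus = commandBus;
    }

    public void on(ClientAppliedForParkingLot event) {
        // accessors are assumed record-style getters
        if (properties.isParkingLot(event.rentalPropertyId())) {
            commandBus.send(new AcceptApplication(event.applicationId()));
        } else {
            commandBus.send(new RejectApplication(event.applicationId(), "not a parking lot"));
        }
    }
}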
I understand the need for validation but I don't think the example you gave calls for cross-Aggregate (or cross-microservice for that matter) compensating measures as stated in the Q title.
Verifications like checking that the ID the client gave along with the flat rental command matches a flat and not a parking lot, that the client has permission to do that, and so forth, are legitimate. But letting the client create such commands in the wild and waiting for an external actor to come around and enforce these rules seems subpar because the rules could be made intrinsic properties of the object originating the process.
So what I'd recommend is to change the entry point into the operation - to create the Candidature Aggregate Root as part of another Aggregate Root's behavior. If that other Aggregate (RentalProperty in our case) lives in another Bounded Context/microservice, you can maintain a list of RentalProperties in the Candidate Bounded Context with just the amount of info needed, and initiate the Candidature from there.
So you would have
FlatCandidatureHandler ==loads==> RentalProperty ==creates==> FlatCandidature
or
FlatCandidatureHandler ==checks existence==> local RentalProperty data
==creates==> FlatCandidature
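As a rough sketch of the second variant, with the command spelled ApplyForFlat as suggested above (all collaborator types are placeholders):

public class FlatCandidatureHandler {

    private final RentalPropertyLookup rentalProperties; // local data fed by RealEstate events
    private final CandidatureRepository candidatures;

    public FlatCandidatureHandler(RentalPropertyLookup rentalProperties,
                                  CandidatureRepository candidatures) {
        this.rentalProperties = rentalProperties;
        this.candidatures = candidatures;
    }

    public void handle(ApplyForFlat command) {
        // existence check against the local copy before the aggregate is created
        if (!rentalProperties.isKnownFlat(command.rentalPropertyId())) {
            throw new IllegalArgumentException("rental property is not a known flat");
        }
        FlatCandidature candidature =
                FlatCandidature.create(command.rentalPropertyId(), command.candidateInfo());
        candidatures.save(candidature);
    }
}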
As a side note, what could actually necessitate compensating actions are factors extrinsic to the root object of the process. For instance, if the property becomes unavailable in the meantime. Then whatever Aggregate holds that information should emit an event when that happens, and the compensation should be initiated.
CQRS states: command should not query read side.
Ok. Let's take following example:
The user needs to create orders with order lines; each order line contains product_id, price and quantity.
The client sends requests to the server with the order information and the list of order lines.
The server (command handler) should not trust the client and needs to validate whether the provided products (product_ids) exist (otherwise, there will be a lot of garbage).
Since the command handler is not allowed to query the read side, it has to validate this information somehow on the write side.
What we have on the write side: repositories. In terms of DDD, repositories operate only with Aggregate Roots; a repository can only GET BY ID and SAVE.
In this case, the only option is to load all product aggregates, one by one (the repository has only a GET BY ID method).
Note: event sourcing is used for persistence, so it would be problematic and inefficient to load multiple aggregates at once to avoid multiple requests to the repository.
What is the best solution for this case?
P.S.: One solution is to redesign the UI (more like a task-based UI), e.g. the user first creates the order (with general info), then adds products one by one (each addition a separate HTTP request), but I still need to support bulk operations (e.g. an API for third-party applications).
The short answer: pass a domain service (see Evans, chapter 5) to the aggregate along with the other command arguments.
CQRS states: command should not query read side.
That's not an absolute -- there are trade-offs involved when you include a query in your command handler; that doesn't mean that you cannot do it.
In domain-driven-design, we have the concept of a domain service, which is a stateless mechanism by which the aggregate can learn information from data outside of its own consistency boundary.
So you can define a service that validates whether or not a product exists, and pass that service to the aggregate as an argument when you add the item. The work of computing whether the product exists would be abstracted behind the service interface.
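A minimal sketch of that shape; ProductCatalog, ProductId, Money and OrderLine are hypothetical types, not taken from any particular framework:

import java.util.ArrayList;
import java.util.List;

// Domain service interface: the aggregate only sees this abstraction,
// not where the product data actually comes from.
interface ProductCatalog {
    boolean exists(ProductId productId);
}

public class Order {

    private final List<OrderLine> lines = new ArrayList<>();

    // The service is passed in alongside the command arguments; the
    // aggregate stays unaware of repositories and read models.
    public void addLine(ProductId productId, int quantity, Money price, ProductCatalog catalog) {
        if (!catalog.exists(productId)) {
            throw new IllegalArgumentException("unknown product: " + productId);
        }
        lines.add(new OrderLine(productId, quantity, price));
    }
}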
But what you need to keep in mind is this: products, presumably, are defined outside of the order aggregate. That means that they can be changing concurrently with your check to verify the product_id. From the point of view of correctness, there's no real difference between checking the validity of the product_id in the aggregate, or in the application's command handler, or in the client code. In all three places, the product state that you are validating against can be stale.
Udi Dahan shared an interesting observation years ago:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If the client validated the data one hundred milliseconds ago when composing the command, and the data was valid then, what should the behavior of the aggregate be?
Think about a command to add a product that is composed concurrently with an order of that same product - should the correctness of the system, from a business perspective, depend on the order that those two commands happen to arrive?
Another thing to keep in mind is that, by introducing this check into your aggregate, you are coupling the ability to change the aggregate to the availability of the domain service. What is supposed to happen if the domain service can't reach the data it needs (because the read model is down, or whatever)? Does it block? Throw an exception? Make a guess? Does this choice ripple back into the design of the aggregate? And so on.
I'm developing a small CQRS+ES framework and developing applications with it. In my system, I need to log certain client actions and use them for analytics and statistics, and maybe in the future do something in the domain with them. For example, a client (on the web) downloads some resource(s) and I need to save the date, time, type (download, partial, ...), the region or country (maybe from the IP), etc. Afterwards, in some view, the client can see the download count or some more complex report. I'm not sure how to implement this feature.
The first solution is to create an analytics context and some aggregate, and on each client action send a command like IncreaseDownloadCounter(resourceId), then handle the command, raise domain events and update the view. But in this scenario the download has already occurred before I send the command, so it is not really a command, and on the other hand version conflicts increase.
The second solution is to raise an event from the client side and update the view model based on it, but with this kind of handling my event is not stored in the event store, because it is not raised by a command and never changes any domain context. And if I do store it in the event store, there is no aggregate to handle it when it is fetched for some other use.
The third solution is to raise an event from the client side and store it in another database, maybe with a dedicated table per event type. But this way I have multiple event stores with different schemas, it becomes difficult to recreate view models and to trace events for rebuilding context state, and if in the future I add some domain that uses this type of events, it will be hard to use them.
What is the best approach and solution for this scenario?
The first solution is to create an analytics context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, push on the requirement that the state be recovered after a restart. You might be able to get by with building your view off of an in-memory model of the event counts, and use something more practical for the representation of the events.
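For illustration, a tiny in-memory projection along those lines; ResourceDownloaded and its accessor are placeholders for whatever the client-side event looks like:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Counts downloads per resource as the client events stream in. After a
// restart it is rebuilt by replaying the stored events, so the map itself
// does not need to be durable.
public class DownloadCounterView {

    private final Map<String, Long> countsByResource = new ConcurrentHashMap<>();

    public void on(ResourceDownloaded event) {
        countsByResource.merge(event.resourceId(), 1L, Long::sum);
    }

    public long countFor(String resourceId) {
        return countsByResource.getOrDefault(resourceId, 0L);
    }
}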
Thanks for your complete answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.) and collect client events in that repository, and have some offline process read them and either update the read model or create commands to do something in the domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending saving some claim or authorization token of the user and the sending app, for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.
I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandary.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
    <CustomerId>1234</CustomerId>
    <FullName>Bob</FullName>
    <AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
    <CustomerId>1234</CustomerId>
    <FullName>Bob</FullName>
    <AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schemas of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However, it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
    <CustomerId>1234</CustomerId>
    <AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
    <CustomerId>1234</CustomerId>
    <Before>
        <FullName>Bob</FullName>
        <AccountLevel>Silver</AccountLevel>
    </Before>
    <After>
        <FullName>Bob</FullName>
        <AccountLevel>Gold</AccountLevel>
    </After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with @Matt's answer -- "CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service when it seems that you (presumably) already have a source of truth for Customer data. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for needing to do so), consider passing only the data necessary to construct a query against the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
    <CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
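A rough sketch of such a consumer; apart from the CustomerChanged event name, every type here is a placeholder:

// The event only signals that something changed; the consumer pulls the
// current state from the owning service when it actually needs it.
public class CustomerChangedListener {

    private final CustomerServiceClient customerService; // assumed client for the source of truth
    private final LocalCustomerStore localStore;          // assumed local copy, if one is kept at all

    public CustomerChangedListener(CustomerServiceClient customerService,
                                   LocalCustomerStore localStore) {
        this.customerService = customerService;
        this.localStore = localStore;
    }

    public void on(CustomerChanged event) {
        Customer current = customerService.fetch(event.customerId());
        localStore.put(current);
    }
}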
There is no "one-size-fits-all" solution -- sometimes it does make more sense to pass the relevant data with the event. Again, I agree with @Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question and your comments to this answer and to Matt's talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
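A small sketch of that ambiguity, assuming Jackson-style binding to a static type (the class below is just an illustration):

import com.fasterxml.jackson.databind.ObjectMapper;

public class CustomerModified {
    public Long customerId;
    public String firstName;    // null: unchanged, or explicitly cleared?
    public String accountLevel;
}

class Demo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Both payloads deserialize with firstName == null; the static type
        // cannot tell "field omitted" apart from "field set to null".
        CustomerModified omitted = mapper.readValue(
                "{\"customerId\":1234,\"accountLevel\":\"Gold\"}", CustomerModified.class);
        CustomerModified explicitNull = mapper.readValue(
                "{\"customerId\":1234,\"firstName\":null,\"accountLevel\":\"Gold\"}", CustomerModified.class);
        System.out.println(omitted.firstName + " / " + explicitNull.firstName); // null / null
    }
}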
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details. For example, you could have a CustomerRegistered event with all the details of the customer that was registered, and later in the stream a CustomerMadeGoldAccount event that only really needs to capture the customer id of the customer whose account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event. Imagine having hundreds of properties on a customer: if every command that changed a single property had to raise an event with all the properties before and after, this would get unwieldy pretty quickly. It's also difficult to determine why a change occurred if you just publish a generic CustomerModified event, which is often a question asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs enough data about the entity to validate that the command can be executed; it doesn't even need to read the whole customer entity.
Subscribers of the events also only need to build up state for the things they are interested in. For example, perhaps an 'account level' widget is listening to these events; all it needs to keep around is the customer ids and account levels so that it can display what account level each customer is at.
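For illustration, such a subscriber could be as small as this (the event accessors are assumed; the event names are the ones from above):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keeps only what the 'account level' widget needs to render, built up
// from the granular events.
public class AccountLevelWidgetProjection {

    private final Map<String, String> accountLevelByCustomerId = new ConcurrentHashMap<>();

    public void on(CustomerRegistered event) {
        accountLevelByCustomerId.put(event.customerId(), event.accountLevel());
    }

    public void on(CustomerMadeGoldAccount event) {
        accountLevelByCustomerId.put(event.customerId(), "Gold");
    }

    public String accountLevelFor(String customerId) {
        return accountLevelByCustomerId.get(customerId);
    }
}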
Instead of trying to convey everything through the payload XML's fields, you can distinguish between different operations based on:
1. Different endpoint URLs depending on the operation (this is preferred).
2. An opcode (operation code) element in the XML which tells which operation is to be used to handle the incoming request (closer to your examples).
There are a few enterprise patterns applicable to your business case - messaging and its variants - and if your system is extensible, then an Enterprise Service Bus should be used. An ESB allows reliable handling and processing of events.