CheckGenesis use case(s) - substrate

In today's Substrate Collaborative Learning, the SignedExtension impl for CheckGenesis came up (see this riot conversation for validation-related discussion).
Going back to first principles, what are the use case(s) for CheckGenesis?

When a user submits a transaction to a Substrate-based blockchain, extra signed data is attached to the transaction to ensure it is applied to the chain state that the user intended it for. You can see what kind of additional signed data is attached in the node template.
The purpose of CheckGenesis is to ensure that the transaction is submitted to the correct chain instead of a different one. Without CheckGenesis, the following attack would be possible.
Alice pays Bob a few tokens on a chain that they both commonly use.
The transaction goes through as expected and Bob receives the tokens.
Bob notices that Alice re-uses her key on another chain.
So he submits her transaction to that second chain as well.
The transaction also goes through on the second chain, and Bob receives a second payment.
By referring to the chain that the transaction is intended for in the signed data, Alice can prevent that attack.
As cryptographic advice, you should not reuse keys across applications in general. Not all blockchains are based on Substrate, and not all chains include this check.
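To make the replay scenario concrete, here is a minimal sketch in Python (not Substrate's Rust). It assumes an HMAC as a stand-in for a real public-key signature scheme, and the names `sign_tx`, `verify_tx` and the two genesis hashes are hypothetical, not Substrate APIs. It only illustrates why binding the genesis hash into the signed payload makes the signature useless on a different chain.

```python
import hashlib
import hmac

# Hypothetical genesis hashes for two chains (illustrative values only).
GENESIS_A = hashlib.sha256(b"chain-a-genesis").digest()
GENESIS_B = hashlib.sha256(b"chain-b-genesis").digest()


def sign_tx(key: bytes, call: bytes, genesis_hash: bytes) -> bytes:
    """Sign the call together with the genesis hash of the intended chain.

    HMAC stands in for a real signature scheme (an assumption made to keep
    the sketch dependency-free); a real chain verifies with a public key.
    """
    return hmac.new(key, call + genesis_hash, hashlib.sha256).digest()


def verify_tx(key: bytes, call: bytes, signature: bytes, own_genesis: bytes) -> bool:
    """Each chain checks the signature against its own genesis hash."""
    expected = hmac.new(key, call + own_genesis, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


alice_key = b"alice-secret-key"
call = b"transfer(bob, 5)"
sig = sign_tx(alice_key, call, GENESIS_A)

accepted_on_a = verify_tx(alice_key, call, sig, GENESIS_A)  # intended chain
accepted_on_b = verify_tx(alice_key, call, sig, GENESIS_B)  # Bob's replay attempt
```

The replay on the second chain fails because that chain reconstructs the signed payload with its own genesis hash, which differs from the one Alice signed.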

Related

Is there an elegant way of putting domain validations inside the domain layer?

I am new to domain-driven design, and one thing bothers me when I write a domain model: how to handle domain validation?
I am designing a library management system where a user can search through books and see whether a book is in stock.
If it is not, the user can create a request for the book, so some kind of queue is created. The rule is that a request is only allowed when we don't have the book in stock. Right now I keep the quantity inside the book entity, so that is not a problem, but what if I have different bounded contexts for requesting books and for the book catalog? Then I must somehow contact another vertical/service and ask (validate) that the book quantity is zero before creating the book aggregate.
I am also checking whether the user has a valid membership card, whether the book is already borrowed by them, and whether the user has active requests for any book.
Things that bother me:
I need to know exactly what to include in the aggregate before passing it to the domain model because of validations. I am not sure that is the safest approach, because the accuracy of my validations will depend on the specification/query, etc.
Another very important thing: when an application layer method starts executing and something is not valid, the client will get validation messages only for the code that was executed, and there is a good chance that more things are preventing execution. This can be really inconvenient if the user is filling out a form.
First thoughts for solving this problem:
I have a command/handler architecture and I am using MediatR, so I am thinking of moving domain validations between the command and the handler. That will solve my problems for now, but that approach will spread domain knowledge across bounded contexts, and the domain model will not be smart enough to guard against invalid actions. More precisely, before executing an application method (handler) I will need to think about what I need to validate.
So I am really curious. Is there any clear way of handling domain validations inside domain model?
Is there any clear way of handling domain validations inside domain model?
Yes; they require work and careful thinking.
One aspect of careful thinking is to distinguish message validation from domain logic. Message validation is an isolated thing, a message is valid or not according to the schema of the message -- are all of the required fields present, is the data in the right form, are the numbers in the allowed range, and so on. Really, we're asking the question "did the client fill out the form correctly?"
Integrating a valid message with previously known information (aka, the "state" of the domain model) is a domain logic concern. State is chosen deliberately - the domain model is a state machine for the bookkeeping of your domain.
Depending on your domain, and the information that is available, there can be states that mean that the client doesn't get what they want. "The road less traveled" doesn't mean that things are invalid.
Furthermore, if your system is distributed (different pieces of data are the responsibility of different authorities), then any locally cached copies of that data are necessarily stale, and may be out of date. See Pat Helland's Memories, Guesses, and Apologies. That we will sometimes produce an incorrect answer is an inevitable consequence of distributing the work. If we're responsible, then we performed a cost benefit analysis to ensure that the expected benefits of distributing the work offset the expected risks.
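The split between message validation and domain logic can be sketched as follows. This is a minimal illustration in Python using the library example from the question; `RequestBook`, `validate_message` and `BookRequests` are hypothetical names, and the stock count is assumed to be a possibly stale copy of catalog data.

```python
from dataclasses import dataclass, field


@dataclass
class RequestBook:
    """Hypothetical command message sent by the client."""
    member_id: str
    book_id: str


def validate_message(msg: RequestBook) -> list[str]:
    """Schema-level checks only; returns every problem, not just the first."""
    errors = []
    if not msg.member_id:
        errors.append("member_id is required")
    if not msg.book_id:
        errors.append("book_id is required")
    return errors


@dataclass
class BookRequests:
    """Hypothetical aggregate: the bookkeeping for requests on one book."""
    book_id: str
    copies_in_stock: int  # possibly stale copy of catalog data
    queue: list[str] = field(default_factory=list)

    def request(self, member_id: str) -> str:
        # Domain logic: each branch is a legitimate state transition or
        # outcome, not a malformed input.
        if self.copies_in_stock > 0:
            return "available-no-request-needed"
        if member_id in self.queue:
            return "already-requested"
        self.queue.append(member_id)
        return "queued"


errors = validate_message(RequestBook(member_id="", book_id=""))
book = BookRequests(book_id="b1", copies_in_stock=0)
first = book.request("m1")
second = book.request("m1")
```

Because `validate_message` collects all schema problems in one pass, a client filling out a form sees every issue at once, while the aggregate only ever decides what a well-formed message means given its current state.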

Role of off-chain workers

I'm trying to build a mental model of the role of off-chain workers in Substrate. The bigger picture seems to be that they move logic that was otherwise done by oracles into the Substrate node, triggered by predefined transactions. There are two use cases I was thinking of specifically:
1: Validating file formats: an incoming transaction proposes a file accessible via URL or IPFS hash, and its format needs to be validated. An off-chain worker fetches the file, asserts the format (size, encoding, content, whatever) and, if correct, submits another transaction saying it's valid.
2: Key generation: let's assume there is a separate service distributed with the Substrate node, which manages keys for each instance. Node A runs a key sharing algorithm (like Shamir's secret sharing) via this external service between participants A, B and C, then makes a transaction creating a group (A,B,C) on-chain. This transaction triggers all nodes that are in this group to run off-chain workers and call into their local key store to verify that they have the key. They can all mark it on-chain afterwards.
As far as I understand it correctly, off-chain workers are triggered in every node after block execution. In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these. What is a good way of reaching consensus on the validity of the file? Is it also possible without economic incentives like staking? It would be problematic with tokens having no value in the network, e.g. in enterprise settings. Is this even the right use case for off-chain workers? The second example should not suffer from this issue; we just need all parties to verify that they have the key.
Where does the thought process above go wrong, and why?
As far as I understand it correctly, off-chain workers are triggered in every node after block execution.
Yes and no. There is a CLI flag for it. And at the time of this writing it says:
--offchain-worker <ENABLED>
Should execute offchain workers on every block.
By default it's only enabled for nodes that are authoring new blocks. [default: WhenValidating] [possible
values: Always, Never, WhenValidating]
In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these.
I think it is the responsibility of the receiving function (aka. Call) to handle and incentivise this. For example, there could be a reward opportunity to validate an address. But, if it has already been submitted by another transaction, you will get slashed (or even if not, you do pay some transaction fee, for nothing). In such cases, you can assume that not all participants will submit a transaction. They will only do it when there is a chance of improvement, which should be depicted by your potential reward/slash scheme.
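As a rough illustration of that reward/slash idea, here is a toy model in Python. It is not a Substrate pallet, and `FileValidationPallet`, `submit_verdict`, `should_submit`, `REWARD` and `PENALTY` are all hypothetical. It only shows how charging duplicates discourages every node from submitting; it says nothing about whether the first verdict is correct.

```python
class FileValidationPallet:
    """Toy stand-in for an on-chain module; not a Substrate API."""

    REWARD = 10
    PENALTY = 3  # plays the role of a fee or slash for redundant submissions

    def __init__(self):
        self.verdicts = {}   # file_hash -> (is_valid, reporter)
        self.balances = {}   # account -> balance

    def submit_verdict(self, who: str, file_hash: str, is_valid: bool) -> bool:
        """Reward the first report for a file; charge every later duplicate."""
        if file_hash in self.verdicts:
            self.balances[who] = self.balances.get(who, 0) - self.PENALTY
            return False
        self.verdicts[file_hash] = (is_valid, who)
        self.balances[who] = self.balances.get(who, 0) + self.REWARD
        return True

    def should_submit(self, file_hash: str) -> bool:
        """What a rational off-chain worker would check before sending a transaction."""
        return file_hash not in self.verdicts


pallet = FileValidationPallet()
pallet.submit_verdict("node-a", "file-1", True)    # first report is rewarded
pallet.submit_verdict("node-b", "file-1", True)    # duplicate is charged
worth_sending = pallet.should_submit("file-1")     # a third node can skip it
```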
Is this even the right use case for off-chain workers?
I am no expert here, but I think at least the validation example is a good example. It is just a matter of finding a good incentive + anti-spam slashing.
I am less familiar with the second example, so no comments on that.

How to handle dependent behavior in a domain class?

Let's say I've got a domain class whose functions are to be called in a sequence. Each function does its job, but if the previous step in the sequence is not done yet, it throws an error. The other way is that each function first completes the steps it depends on and then executes its own logic. I feel that this second way is not good practice, since I am adding multiple responsibilities, and the caller won't know which operations can happen when they invoke a method.
My question is how to handle dependent scenarios in DDD. Is it the responsibility of the caller to invoke the methods in the right sequence? Or do we make the methods handle the dependent operations before its own logic?
Is it the responsibility of the caller to invoke the methods in the right sequence?
It's OK if those methods have a business meaning. For example, the client may book a flight, and then book a hotel room. Both of those are things the client understands, and it is the client's logic to call them in this sequence. On the other hand, inserting the reservation into the database, then committing (or whatever) is technical. The client should not have to deal with that at all. The same goes for "initializing" an object, then calling other methods, then calling "close".
Requiring a sequence of technical calls is a form of temporal coupling, it is considered a bad practice, and is not directly related to DDD.
The solution is to model the problem better. There is probably a higher level use-case the caller wants achieved with this call sequence. So instead of publishing the individual "steps" required, just support the higher use-case as a whole.
In general you should always design with the goal to get any sequence of valid calls to actually mean something (as far as the language allows).
Update: A possible model for the mentioned "File" domain:
public interface LocalFile {
    RemoteFile upload();
}
public interface RemoteFile {
    RemoteFile convert(...);
    LocalFile download();
}
From my point of view, what you are describing is the orchestration of domain model operations. That's the job of the application layer, the layer upon domain model. You should have an application service that would call the domain model methods in the right sequence, and it also should take into account whether some step has left any task undone, and in such case, tell the next step to perform it.
TLDR; Scroll to the bottom for the answer, but the backstory will give some good context.
If the caller into your domain must know the order in which to call things, then you have missed an opportunity to encapsulate business logic in your domain, which is a symptom of an anemic domain.
@RobertBräutigam made a very good point:
Requiring a sequence of technical calls is a form of temporal coupling, it is considered a bad practice, and is not directly related to DDD.
This is true, but it is worse when you do it with your domain model because non-domain concerns get intermixed with domain concerns. Intent becomes lost in a sea of non business logic. If you can, you look for a higher-order aggregate that encapsulates the ordering. To borrow Robert's example, rather than booking a flight then a hotel room, and forcing that on the client, you could have a Vacation aggregate take both and validate it.
I know that sounds wrong in your case, and I suspect you're right. There's a clear dependency that can't happen all at once, so that can't be the end of the story. When you have a clear dependency with intermediate transactions that must occur before the "final" state, we have... orchestration (think sagas, distributed transactions, domain events and all that goodness).
What you describe with file operations spans across transactions. The manipulation (state change) of a domain is transactional at each point in a distributed transaction, but is not transactional overall. So when @choquero70 says
you are describing is the orchestration of domain model operations. That's the job of the application layer, the layer upon domain model.
that's also correct. Orchestration is key. Each step must manipulate the state of the domain once, and once only, and leave it in a valid state, but it is OK for there to be multiple steps.
Each of those individual points along the timeline are valid moments in the state of your domain.
So, back to your model. If you expose a single interface with multiple possible calls to all steps, then you leave yourself open to things being called out of order. Make this impossible or at least improbable. Orchestration is not just about what to do, but what to prevent from happening. Create smaller interfaces/classes to avoid accidentally increasing the "surface area" of what could be misused accidentally.
In this way, you are guiding the caller on what to do next by feeding them valid intermediate states. But, and this is the important part, the burden on what to call in what order is not on the caller. Sure, the caller could know what to do, but why force it.
Your basic algorithm is the same: upload, transform, download.
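One way to picture "feeding the caller valid intermediate states" is a small type-per-state sketch. This is a Python rendering of the LocalFile/RemoteFile idea, written under assumptions: the `name` and `fmt` fields and the `target_fmt` parameter are hypothetical, since the original interface leaves the arguments of `convert` unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    name: str
    fmt: str

    def convert(self, target_fmt: str) -> "RemoteFile":
        # Converting a remote file yields another remote file.
        return RemoteFile(self.name, target_fmt)

    def download(self) -> "LocalFile":
        return LocalFile(self.name, self.fmt)


@dataclass(frozen=True)
class LocalFile:
    name: str
    fmt: str

    def upload(self) -> RemoteFile:
        # Only a local file can be uploaded; the result exposes the next valid steps.
        return RemoteFile(self.name, self.fmt)


# upload, transform, download: each call returns the state that allows the next one.
result = LocalFile("report", "docx").upload().convert("pdf").download()
```

There is no way to call `convert` on a `LocalFile` or `upload` on a `RemoteFile`, so the out-of-order sequences never need a runtime check.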
Is it the responsibility of the caller to invoke the methods in the right sequence?
Not exactly. It is the responsibility of the caller to choose from legitimate choices given the state of your domain. It's "your" responsibility to present these choices via business methods on your correctly modeled moment/interval aggregate suitable for the caller to use.
Or do we make the methods handle the dependent operations before its own logic?
If you've set up orchestration correctly, this won't be necessary. But it does make sense to validate anyway.
On a side note, each step of the orchestration you do should be very linear in nature. I tell my developers to be suspicious of an orchestration step that has an if statement in it. If there's an if it's likely better to be part of another orchestration step or encapsulated in business logic.

How to use compensating measures in a CQRS- and DDD-based application

Let's assume we host two microservices: RealEstate and Candidate.
The RealEstate service is responsible for managing rental properties, landlords and so forth.
The Candidate service provides commands to apply for a rental property.
There would be a CandidateForRentalProperty command which requires the RentalPropertyId and all necessary Candidate information.
Now the crucial point: Different types of RentalPropertys require a different set of Candidate information.
Therefore the commands and aggregates were split up:
Commands: CandidateForParkingLot, CandidateForFlat, and so forth.
Aggregates: ParkingLotCandidature, FlatCandidature, and so forth.
The UI asks the read model to decide which command has to be called.
It's reasonable for me to validate the Candidate information and all the business logic involved with that in the Candidate domain layer, but leave out validation whether the correct command got called based on the given RentalPropertyId. Reason: Multiple aggregates are involved in this validation.
The microservice should be autonomous and its read model consumes events from the RealEstate domain, hence it's not guaranteed to be up to date. We don't want to reject candidates based on that but rather use eventual consistency.
Yes, this could lead to inept Candidate information used for a certain kind of RentalProperty. Someone could just call the CandidateForFlat command with a parking lot rental property id.
But how do we handle the cases in which this happens?
The RealEstate domain does not know anything about Candidates.
Would there be an event handler which checks if there is something wrong and execute an appropriate command to compensate?
On the other hand, this "mapping" is domain logic and I'd like to accommodate it in the domain layer. But I don't know who's accountable for this kind of compensating measure. Would the Candidate aggregate be informed, like IneptApplicationTypeUsed or something like that?
As an aside - commands are usually imperative verbs. ApplyForFlat might be a better spelling than CandidateForFlat.
The pattern you are probably looking for here is that of an exception report; when the candidate service matches a CandidateForFlat message with a ParkingLot identifier, then the candidate service emits as an output a message saying "hey, we've got a problem here".
If a follow up message fixes the problem -- the candidate service gets an updated message that fixes the identifier in the CandidateForFlat message, or the candidate service gets an update from real estate announcing that the identifier actually points to a Flat, then the candidate service can emit another message "never mind, the problem has been fixed"
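A minimal sketch of that exception-report flow, in Python and under assumptions: the class `CandidateService`, its method names and the message names `ProblemReported` / `ProblemResolved` are hypothetical. The service re-evaluates its known applications whenever either side changes and emits a report when a mismatch appears or disappears.

```python
class CandidateService:
    """Toy candidate service that emits exception reports as output messages."""

    def __init__(self):
        self.property_kinds = {}   # property_id -> kind, from RealEstate events (may lag)
        self.applications = {}     # application_id -> (expected_kind, property_id)
        self.open_problems = set()
        self.outbox = []           # messages emitted by this service

    def on_property_registered(self, property_id: str, kind: str) -> None:
        self.property_kinds[property_id] = kind
        self._recheck()

    def apply(self, application_id: str, expected_kind: str, property_id: str) -> None:
        # Used for new applications and for corrections to existing ones.
        self.applications[application_id] = (expected_kind, property_id)
        self._recheck()

    def _recheck(self) -> None:
        for app_id, (expected, property_id) in self.applications.items():
            known = self.property_kinds.get(property_id)
            mismatch = known is not None and known != expected
            if mismatch and app_id not in self.open_problems:
                self.open_problems.add(app_id)
                self.outbox.append(("ProblemReported", app_id))
            elif not mismatch and app_id in self.open_problems:
                self.open_problems.remove(app_id)
                self.outbox.append(("ProblemResolved", app_id))


svc = CandidateService()
svc.on_property_registered("p1", "ParkingLot")
svc.apply("a1", "Flat", "p1")              # flat application against a parking lot
svc.on_property_registered("p1", "Flat")   # real estate corrects the property kind
```

Note that an unknown property is not treated as a problem here, which matches the idea of not rejecting candidates just because the local read model lags behind.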
I tend to find in this pattern that the input commands to the service are really all just variations of handle(Event); the user submitted, the http request arrived; the only question is whether or not the microservice chooses to track that event. In other words, the "command" stream is just another logical event source that the microservice is subscribed to.
As you said, validation of commands should be performed at the point of command generation - at client side - where read models are available.
Command processing is performed by aggregate, so it cannot and should not check validity or existence of other aggregates. So it should trust a command issuer.
If commands come from an untrusted environment like a public API, then your API gateway becomes a client, and it should have the necessary read models to validate references.
If you want to accept a command fast and check it later, then log events like ClientAppliedForParkingLot, and have a Saga/Process manager handle further workflow by keeping its internal state, and issuing commands like AcceptApplication or RejectApplication.
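The "accept fast, check later" variant could look roughly like this. It is a toy process manager in Python; the class name `ApplicationProcess` and the handler `on_property_kind_known` are hypothetical, while the event and command names are the ones mentioned above.

```python
class ApplicationProcess:
    """Toy process manager: remembers applications and decides later."""

    def __init__(self):
        self.pending = {}   # application_id -> property_id
        self.commands = []  # commands issued as a result of the workflow

    def on_client_applied_for_parking_lot(self, application_id: str, property_id: str) -> None:
        # Accept fast: just record that the application happened.
        self.pending[application_id] = property_id

    def on_property_kind_known(self, property_id: str, kind: str) -> None:
        # Check later: decide once the kind of the property is known.
        for app_id, pid in list(self.pending.items()):
            if pid != property_id:
                continue
            name = "AcceptApplication" if kind == "ParkingLot" else "RejectApplication"
            self.commands.append((name, app_id))
            del self.pending[app_id]


process = ApplicationProcess()
process.on_client_applied_for_parking_lot("a1", "p1")
process.on_client_applied_for_parking_lot("a2", "p2")
process.on_property_kind_known("p1", "ParkingLot")
process.on_property_kind_known("p2", "Flat")
```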
I understand the need for validation but I don't think the example you gave calls for cross-Aggregate (or cross-microservice for that matter) compensating measures as stated in the Q title.
Verifications like checking that the ID the client gave along with the flat rental command matches a flat and not a parking lot, that the client has permission to do that, and so forth, are legitimate. But letting the client create such commands in the wild and waiting for an external actor to come around and enforce these rules seems subpar because the rules could be made intrinsic properties of the object originating the process.
So what I'd recommend is to change the entry point into the operation - to create the Candidature Aggregate Root as part of another Aggregate Root's behavior. If that other Aggregate (RentalProperty in our case) lives in another Bounded Context/microservice, you can maintain a list of RentalProperties in the Candidate Bounded Context with just the amount of info needed, and initiate the Candidature from there.
So you would have
FlatCandidatureHandler ==loads==> RentalProperty ==creates==> FlatCandidature
or
FlatCandidatureHandler ==checks existence==> local RentalProperty data ==creates==> FlatCandidature
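A sketch of the first variant, where the handler loads a local RentalProperty and lets it create the candidature. This is Python with hypothetical names (`apply_for_flat`, `handle_flat_candidature`, the `kind` field); the local `properties` dictionary stands in for the reduced RentalProperty data kept in the Candidate bounded context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatCandidature:
    rental_property_id: str
    candidate_name: str


@dataclass(frozen=True)
class RentalProperty:
    """Local, minimal copy of RealEstate data kept in the Candidate context."""
    property_id: str
    kind: str  # e.g. "Flat" or "ParkingLot"

    def apply_for_flat(self, candidate_name: str) -> FlatCandidature:
        # The rule lives on the object that originates the process.
        if self.kind != "Flat":
            raise ValueError(f"{self.property_id} is a {self.kind}, not a Flat")
        return FlatCandidature(self.property_id, candidate_name)


def handle_flat_candidature(properties: dict, property_id: str, candidate_name: str) -> FlatCandidature:
    """Hypothetical handler: loads the RentalProperty, which creates the candidature."""
    rental_property = properties[property_id]
    return rental_property.apply_for_flat(candidate_name)


properties = {
    "p1": RentalProperty("p1", "Flat"),
    "p2": RentalProperty("p2", "ParkingLot"),
}
candidature = handle_flat_candidature(properties, "p1", "Alice")
```

With this entry point, a CandidateForFlat command carrying a parking lot id fails at creation time instead of needing a later compensation.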
As a side note, what could actually necessitate compensating actions are factors extrinsic to the root object of the process. For instance, if the property becomes unavailable in the meantime. Then whatever Aggregate holds that information should emit an event when that happens and the compensation should be initiated.

CQRS + Microservices: How to handle relations / validation?

Scenario:
I have 2 Microservices (which both use CQRS + Event Sourcing internally)
Microservice 1 manages Contacts (= Aggregate Root)
Microservice 2 manages Invoices (= Aggregate Root)
The recipient of an invoice must be a valid contact.
CreateInvoiceCommand:
{
"content": "my invoice content",
"recipient": "42"
}
I have now read many times that the write side (= the command handler) shouldn't call the read side.
Taking this into account, the Invoices Microservice must listen to all ContactCreated and ContactDeleted events in order to know if the given recipient id is valid.
Then I'd have thousands of Contacts within the Invoices Microservice, even if I know that only a few of them will ever receive an Invoice.
Is there any best practice to handle those scenarios?
The recipient of an invoice must be a valid contact.
So the first thing you need to be aware of - if two entities are part of different aggregates, you can't really implement "apply a change to this entity only if that entity satisfies a specification", because that entity could change between the moment you evaluate the specification and the moment you perform the write.
In other words - you can only get eventual consistency across an aggregate boundary.
The aggregate is the authority for its own state, but for everything else (for example, the contents of the command message), it pretty much has to accept that some external authority has checked the data.
There are a couple of approaches you can take here:
1) You can blindly accept that the recipient specified in the command is valid.
2) You can try to verify the validity of the recipient from some external authority (aka: a read model of some other aggregate) between receiving it from the untrusted source and submitting it to the domain model.
3) You can blindly accept the command as described, but treat the invoice as provisional until the validity of the recipient is confirmed. That means there is a second command to run on the invoice that certifies the recipient.
Note - from the point of view of the model, these different commands are equivalent, but at the application layer they don't need to be -- you can restrict access to the command to trusted sources (don't make it part of the public api, require authorization that is only available to trusted sources, etc).
Approach #3 is the most microservicy, as the two commands can be separated in time -- you can accept the CreateInvoice command as soon as it arrives, and certify the recipient asynchronously.
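Approach 3 can be sketched like this, in Python and with hypothetical names: `certify_recipient` is the second command, `reject_recipient` and `certify_later` are additions made only for this illustration, and `known_contacts` stands in for whatever trusted source eventually confirms the recipient.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    content: str
    recipient: str
    status: str = "provisional"

    def certify_recipient(self) -> None:
        """Second command; assumed to be restricted to trusted sources."""
        self.status = "certified"

    def reject_recipient(self) -> None:
        # Hypothetical counterpart for recipients that turn out to be invalid.
        self.status = "rejected"


def create_invoice(content: str, recipient: str) -> Invoice:
    # Accepted immediately, without consulting the Contacts service.
    return Invoice(content, recipient)


def certify_later(invoice: Invoice, known_contacts: set) -> None:
    # Runs asynchronously, once contact data is available.
    if invoice.recipient in known_contacts:
        invoice.certify_recipient()
    else:
        invoice.reject_recipient()


invoice = create_invoice("my invoice content", "42")
status_on_arrival = invoice.status
certify_later(invoice, known_contacts={"42", "43"})
```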
Where would you put approach 4), where the Invoices Microservice has its own Contacts Store which gets updated whenever there's a ContactCreated or ContactDeleted event? Then both entities are part of the same service and boundary. Now it should be possible to make things consistent, right?
No. You've made the two entities part of the same service, but the problem was never that they were in different services, but that they are in separate aggregates -- meaning we can be changing the entity states concurrently, which means that we can't ensure that they are immediately synchronized.
If you wanted immediate consistency, you need a model that draws your boundaries differently.
For instance, if the invoice entities were modeled as part of the Contacts aggregate, then the aggregate can ensure the invariant that new invoices require a valid recipient -- the domain model uses the copy of the state in memory to confirm that the recipient was valid when we loaded, and the write into the book of record verifies that the book of record hadn't changed since the load happened.
The write of the aggregate state is a compare-and-swap in the book of record; if some concurrent process had invalidated the recipient, the CAS operation would fail.
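The compare-and-swap write can be illustrated with a version number per aggregate. This is a minimal in-memory sketch in Python; `BookOfRecord`, `StaleWrite` and the shape of the stored state are hypothetical, and a real store would perform the version check atomically.

```python
class StaleWrite(Exception):
    """Raised when the book of record changed since the aggregate was loaded."""


class BookOfRecord:
    """In-memory stand-in for the store that holds aggregate state."""

    def __init__(self):
        self._rows = {}  # aggregate_id -> (version, state)

    def load(self, aggregate_id):
        return self._rows.get(aggregate_id, (0, None))

    def compare_and_swap(self, aggregate_id, expected_version, new_state):
        current_version, _ = self.load(aggregate_id)
        if current_version != expected_version:
            raise StaleWrite(f"expected v{expected_version}, found v{current_version}")
        self._rows[aggregate_id] = (current_version + 1, new_state)


store = BookOfRecord()
store.compare_and_swap("contact-42", 0, {"valid": True, "invoices": []})

# Two processes load the same version of the Contacts aggregate.
version, state = store.load("contact-42")

# Process A invalidates the contact first.
store.compare_and_swap("contact-42", version, {"valid": False, "invoices": []})

# Process B tries to add an invoice based on its now stale copy.
try:
    store.compare_and_swap("contact-42", version, {"valid": True, "invoices": ["inv-1"]})
    invoice_write_failed = False
except StaleWrite:
    invoice_write_failed = True
```

The same mechanism is what makes any concurrent change to the aggregate, not just an invalidation, reject the invoice write.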
The trade off, of course, is that any change to the Contact aggregate would also cause the invoice to fail; concurrent editing of different invoices with the same recipient goes out the window.
Aggregates are all or nothing; they aren't separable.
Now, one out might be that your Invoice aggregate has a part that must be immediately consistent with the recipient, and another part where eventually consistent, or even inconsistent, is acceptable. In which case your goal is to refactor the model.
The recipient of an invoice must be a valid contact.
This is a business rule. The question should be asked, what does this business rule mean for my application? Who should take responsibility for implementing this rule, or can the responsibility be shared?
One possibility is that, yes, the business rule is about invoices so it should be the responsibility of the Invoices Service to implement it.
However, the business rule is really about the creation of invoices. And the owner of invoice creation in your architecture is, strangely, not the Invoices Service. The reason for this is that the name of the command is CreateInvoiceCommand.
Let's think about this - the Invoices Service will never just create an invoice on its own. It just provides the capability. In this architecture, the actual owner of invoice creation is the sender of the command.
Using this line of reasoning, if the business rule is saying that invoice creation cannot happen against an invalid recipient, then it becomes the responsibility of the command sender to ensure this business rule is implemented.
This would be a very different scenario if, rather than receiving a command, the Invoices Service subscribed to events. As an example, an event called WidgetSold. In this scenario, the owner of invoice creation clearly would be the Invoicing service, and so the business rule would be implemented there instead.
If the user clicks the create invoice for contact 42 button, it's the
user's responsibility to take care that contact 42 exists
Yes, that is correct. The user's intention is to create an invoice. The business rules around invoice creation should, therefore, be enforced at this point. How this happens (or whether this happens at all) is a different question.
But what if the user doesn't care? Then it would create an invoice
with an invalid recipient id.
Also correct. As you say, there are side-effects to this approach, one of which is that you can end up with inconsistent data across your system. That is one of the realities of SOA.
Isn't this somehow similar to this: The Invoice has a currencyCode
property, it's a String.
I don't know if I agree or not. Is asking "is this a valid ISO currency?" different from asking "is entity 42 valid according to another system?" I would think so.
Isn't it kinda the same as given recipient is not null and is valid
according to my Contacts Database?
I agree that in reality, you could implement this validation in the service. I am just saying that I don't think it's the right place for it. If you wanted to do this, you would have to either call out to another service or store all contacts locally, as you framed your question originally. I think it's simpler to just do it outside of the service.
I think that the answer depends on how resilient you want the system to be, that is, how to handle the situation in which the Contacts Microservice is down (not responding or very slow).
1. You want to be very resilient
If the Contacts Microservice is down, you want to be able to emit invoices for some (maybe most) of the contacts. In this case you listen to the ContactCreated and ContactDeleted events and maintain an (eventually consistent) local list of valid contacts; they should be named according to the Ubiquitous Language in this bounded context, like Payers (or something like that). Then, in the Application layer, when building the CreateInvoiceCommand you check that the Payer is valid and create the command.
2. You don't need to be resilient
If the Contacts Microservice is down, you refuse to generate invoices. In this case, when building the command you make a request to the Contacts Microservice API endpoint and verify that the Payer is valid.
In any case, you check for contact's validity before the command is dispatched.
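The resilient option might be sketched as follows, in Python. `Payers` and `build_create_invoice_command` are hypothetical names; the event handlers correspond to the ContactCreated and ContactDeleted events, and the command is the JSON-like dictionary from the question.

```python
class Payers:
    """Eventually consistent local list of valid contacts, fed by Contacts events."""

    def __init__(self):
        self._ids = set()

    def on_contact_created(self, contact_id: str) -> None:
        self._ids.add(contact_id)

    def on_contact_deleted(self, contact_id: str) -> None:
        self._ids.discard(contact_id)

    def is_valid(self, contact_id: str) -> bool:
        return contact_id in self._ids


def build_create_invoice_command(payers: Payers, content: str, recipient: str) -> dict:
    """Application-layer check performed before the command is dispatched."""
    if not payers.is_valid(recipient):
        raise ValueError(f"unknown payer {recipient}")
    return {"content": content, "recipient": recipient}


payers = Payers()
payers.on_contact_created("42")
command = build_create_invoice_command(payers, "my invoice content", "42")
```

For the non-resilient option, `is_valid` would instead be a synchronous request to the other service, with the rest of the application-layer code unchanged.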
