WaterfallStep design vs. SOLID principles in Bot Framework v4

After moving over from Bot Framework v3 and studying the docs/samples, I'm still not understanding why the WaterfallStep dialog handling in v4 was implemented in such a way.
Why was it chosen to process the result of the previous step in the next step within a waterfall?
Given a waterfall with three steps (PromptName, PromptAge and PromptLocation), I see the following:
Method naming: for the 2nd and 3rd prompts, method naming gets unclear. Naturally we would write AskForAge() or AskForLocation(), but this is misleading, as the next point shows.
SOLID principles: Isn't the "single responsibility principle" violated, as we do two things in each step? Storing the previous response while asking for the next one in the same method ultimately leads to method names like AskForLocationAndStoreAge().
Code duplication: Due to the fact that each step assumes concrete input from its previous step, it can't easily be reused, nor can the order be changed. Even the simplest samples are hard to read.
I'm looking for some clarification on why the design was chosen this way or what I have missed in the concept.

This question seems largely opinion-based and therefore I don't know that it is appropriate for Stack Overflow. It turns out there is actually a good place to ask these kinds of questions, and it's the BotBuilder GitHub repo. I will attempt to answer it all the same.
Why was it chosen to process the result of the previous step in the next step within a waterfall?
This has to do with how bot conversations work. In its most basic form, without any dialogs or state or anything, a bot works by answering questions or otherwise responding to messages from the user. That is to say, a bot program's entire lifespan is to receive one message, process it, and provide a response while the message it received is still "active" in some sense. Then every following message the bot receives from the user is processed in the same way as though the first message was never received, like the message is being processed by an entirely new instance of the program with no memory of previous messages.
To make bot conversations feel more continuous, bot state and dialogs are needed. With dialogs and specifically prompt dialogs, the bot is now able to ask the user a question rather than just answer questions the user asked. In order to do this, the bot needs to store information about the conversation somewhere so that when the next message is received from the user the new "instance" of the bot program will know that the message from the user should be interpreted as a response to the question that the previous instance of the program asked. It's like leaving a note for someone working the next shift to let them know about something that happened during the previous shift that they need to follow up on.
Knowing all this, it seems only natural to process the result of the previous step in the next step within a waterfall because of the nature of conversations and dialogs. In a waterfall dialog containing prompts, the bot will receive messages pertaining to the last message the bot sent. So it needs to process the result of the previous step in the next step. It also needs to respond to the message, and in a waterfall that often means asking another question.
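To make that concrete, here is a minimal sketch using the JavaScript SDK (botbuilder-dialogs). The dialog id 'profileWaterfall' and the prompt ids 'TEXT_PROMPT'/'NUMBER_PROMPT' are illustrative and assume the corresponding prompt dialogs are registered alongside the waterfall. Note how each step receives the user's answer to the previous step's question as step.result:

import { WaterfallDialog, WaterfallStepContext, DialogTurnResult } from 'botbuilder-dialogs';

const waterfall = new WaterfallDialog('profileWaterfall', [
    async (step: WaterfallStepContext<any>): Promise<DialogTurnResult> => {
        // Turn 1: nothing to store yet; just ask the first question.
        return step.prompt('TEXT_PROMPT', 'What is your name?');
    },
    async (step: WaterfallStepContext<any>): Promise<DialogTurnResult> => {
        // Turn 2: the reply to "name" arrives here as step.result.
        step.values.name = step.result;
        return step.prompt('NUMBER_PROMPT', 'How old are you?');
    },
    async (step: WaterfallStepContext<any>): Promise<DialogTurnResult> => {
        // Turn 3: the reply to "age" arrives here.
        step.values.age = step.result;
        return step.prompt('TEXT_PROMPT', 'Where do you live?');
    },
    async (step: WaterfallStepContext<any>): Promise<DialogTurnResult> => {
        step.values.location = step.result;
        return step.endDialog(step.values);
    },
]);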
Isn't the "single responsibility principle" violated, as we do two things in each step? Storing the previous response while asking for the next one in the same method ultimately leads to method names like AskForLocationAndStoreAge().
As I understand it, the single responsibility principle refers to classes and not methods. If it does refer to methods, then that principle may well be violated or bent in some way in this case. However, it doesn't have to be. You are free to make multiple methods to handle each step, or even to make multiple steps to handle each message. If you wanted, your waterfall could contain a step that processes the result of the previous prompt and then continues on into the next step which makes a new prompt.
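For example, a rough sketch along those lines (same illustrative prompt ids and JavaScript SDK as above): named functions keep each responsibility separate, and a step that only stores the previous answer can fall through to the next step with step.next() without waiting for another message from the user.

import { WaterfallDialog, WaterfallStepContext, DialogTurnResult } from 'botbuilder-dialogs';

async function askForName(step: WaterfallStepContext<any>): Promise<DialogTurnResult> {
    return step.prompt('TEXT_PROMPT', 'What is your name?');
}

async function storeName(step: WaterfallStepContext<any>): Promise<DialogTurnResult> {
    step.values.name = step.result;
    return step.next(undefined); // fall through to the next step in the same turn
}

async function askForAge(step: WaterfallStepContext<any>): Promise<DialogTurnResult> {
    return step.prompt('NUMBER_PROMPT', 'How old are you?');
}

async function storeAgeAndFinish(step: WaterfallStepContext<any>): Promise<DialogTurnResult> {
    step.values.age = step.result;
    return step.endDialog(step.values);
}

const splitWaterfall = new WaterfallDialog('splitProfileWaterfall',
    [askForName, storeName, askForAge, storeAgeAndFinish]);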
Due to the fact that each step assumes concrete input from its previous step, it can't easily be reused, nor can the order be changed. Even the simplest samples are hard to read.
You ultimately have control over how the input is validated/interpreted so it can be as concrete as you want. The reusability of a dialog or waterfall step has everything to do with how similar the different things you want to do are, the same as in any area of programming. If the samples are hard to read, I recommend raising issues with those samples. Of course, you can still raise an issue with the design of the SDK itself in the appropriate repo, but please do consider including suggestions for the way you think it should be instead.

Related

Commitment Level

I've written a basic RPC client which polls the state of a Solana account, looking for a specific condition (i.e. a unique int64 id being written to it). When the condition arises, I call a smart contract which takes the same account as a mutable argument.
Before doing anything, the program checks for the same condition. However, this check fails. I understand we're dealing with a distributed system and that state may be inconsistent for a period of time, but I can call repeatedly for over 30 seconds and it fails each time before ultimately succeeding.
I've read about the concept of commitment levels, but I always assumed the account state passed into the smart contract would be the latest state of the world (i.e. processed). What I appear to be observing is that it's more like the finalized state.
Can anyone shed some light on what might be going on here?
I will try and come up with a minimal code example to demonstrate the problem but just wanted to ask the question first, to see if anyone can point me in the right direction.
Thanks
So if you look at the docs you linked, processed has a note:
the block may still be skipped by the cluster
This is a very important note if you're only looking for account state changes and don't want any that may turn out to be false. There are a number of reasons why a slot can be skipped or a transaction rejected by the cluster.
If any of the above happens, then the account state accepted by the cluster as a whole may not match what you see at processed, only what you see at finalized.
In the end, my specific problem came down to the pre-flight checks using the 'finalized' commitment level while my logic for polling the account was using 'confirmed'. Modifying the preflightCommitment argument on sendTransaction fixed the problem for me.
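For anyone hitting the same thing, here is a rough sketch of what that alignment looks like with @solana/web3.js; the endpoint, the polling logic and the transaction construction are placeholders, and the point is only that the commitment used for polling and the preflightCommitment used when sending match:

import { Connection, Keypair, PublicKey, Transaction, sendAndConfirmTransaction } from '@solana/web3.js';

const connection = new Connection('https://api.devnet.solana.com', 'confirmed');

async function pollThenCall(account: PublicKey, tx: Transaction, payer: Keypair): Promise<void> {
    // Poll the account at 'confirmed' rather than 'processed' (which may sit on a skipped fork).
    const info = await connection.getAccountInfo(account, 'confirmed');
    if (info === null) return;

    // ...inspect info.data for the condition here...

    // Send with a matching preflightCommitment so the preflight simulation
    // runs against the same view of the chain that the polling loop saw.
    await sendAndConfirmTransaction(connection, tx, [payer], {
        commitment: 'confirmed',
        preflightCommitment: 'confirmed',
    });
}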

Compensating Events on CQRS/ES Architecture

So, I'm working on a CQRS/ES project in which we are having some doubts about how to handle trivial problems that would be easy to handle in other architectures.
My scenario is the following:
I have a customer CRUD REST API, and each customer has a unique document (number), so when registering a new customer I have to verify that there isn't another customer with that document, to avoid duplicates. But in a CQRS/ES architecture with eventual consistency, I've found that this kind of validation can be very hard to address.
It is important to notice that my problem is not across microservices, but between the command application and the query application of the same microservice.
Also we are using eventstore.
My current solution:
So what I do today is: in my command application, before saving the CustomerCreated event, I ask the query application (which uses PostgreSQL) whether there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? My query side can be out of sync, so I cannot trust it 100%. That's when my second validation kicks in: while my query application is processing the events and saving them to my PostgreSQL, I check again whether there is a customer with that document, and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, thereby finishing that customer's stream in eventstore.
Although this works, two things bother me here. The first is that my command application relies on the query application, so if my query application is down, my command side is affected (today I just return false from my validation if the query side is down, but still...). The second is: should a query/read model really be able to emit events? And if so, what is the correct way of doing it? Should the command side expose some kind of API for that? Or should the query side emit the event directly to eventstore using some common shared library? And if I have more than one view/read model, which one should handle this?
I really hope someone can shed some light on these questions and help me with these matters.
For reference, you may want to be reviewing what Greg Young has written about Set Validation.
I ask the query application (which uses PostgreSQL) whether there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right?
That's exactly right - your read model is a stale copy, and may not have all of the information collected by the write model.
That's when my second validation kicks in: while my query application is processing the events and saving them to my PostgreSQL, I check again whether there is a customer with that document, and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, thereby finishing that customer's stream in eventstore.
This spelling doesn't quite match the usual designs. The more common implementation is that, if we detect a problem when reading data, we send a command message to the write model, telling it to straighten things out.
This is commonly referred to as a process manager, but you can think of it as the automation of a human supervisor of the system. Conceptually, a process manager is an event sourced collection of messages to be sent to the command model.
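A very rough sketch of that shape (all names here -- CommandBus, DeactivateCustomer, the in-memory map -- are hypothetical, and a real process manager would persist its own state): the read side detects the problem but only ever sends a command back to the write model; it never appends events itself.

interface CustomerRegistered { type: 'CustomerRegistered'; customerId: string; document: string; }

interface CommandBus {
    send(command: { type: string; [key: string]: unknown }): Promise<void>;
}

class DuplicateDocumentProcessManager {
    constructor(
        private readonly commandBus: CommandBus,
        // document -> first customerId seen; illustrative only, a real one would be persisted
        private readonly documentsSeen: Map<string, string> = new Map(),
    ) {}

    async handle(event: CustomerRegistered): Promise<void> {
        const existing = this.documentsSeen.get(event.document);
        if (existing !== undefined && existing !== event.customerId) {
            // Don't rewrite history or reject the event: ask the write model to
            // straighten things out, and let it record whatever actually happens.
            await this.commandBus.send({
                type: 'DeactivateCustomer',
                customerId: event.customerId,
                reason: 'duplicate document ' + event.document,
            });
            return;
        }
        this.documentsSeen.set(event.document, event.customerId);
    }
}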
You might also want to consider whether you are modeling your domain correctly. If documents are supposed to be unique, then maybe the command model should be using the document number as a key in the book of record, rather than using the customer. Or perhaps the document id should be a function of the customer data, rather than being an arbitrary input.
as far as I know, eventstore doesn't have transactions across different streams
Right - one of the things you really need to be thinking about in general is where your stream boundaries lie. If set validation has significant business value, then you really need to be thinking about getting the entire set into a single stream (or by finding a way to constrain uniqueness without using a set).
How should I send a command message to the write model? via API? via a message broker like Kafka?
That's plumbing; it doesn't really matter how you do it, so long as you are sure that the command runs within its own transaction/unit of work.
So what I do today is: in my command application, before saving the CustomerCreated event, I ask the query application (which uses PostgreSQL) whether there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? My query side can be out of sync, so I cannot trust it 100%.
No, you cannot safely rely on the query side, which is eventually consistent, to prevent the system from entering an invalid state.
You have two options:
You permit the system to enter a temporary, pending state and then, eventually, bring it into a valid permanent state. For this, you let the command pass, yield a CustomerRegistered event and, using a Saga/Process manager, verify against a collection uniquely indexed by document and issue a compensating command (not an event!), i.e. UnregisterCustomer.
Instead of sending a command, you create and start a Saga/Process that pre-allocates the document in a collection uniquely indexed by document and, if that succeeds, sends the RegisterCustomer command. You can model the Saga as an entity.
So, in both solutions you use a Saga/Process manager. In order for the system to be resilient, you should make sure that the RegisterCustomer command is idempotent (so you can resend it if the Saga fails or is restarted).
You've butted up against a fairly common problem. I think the other answer by VoiceOfUnreason is worth reading. I just wanted to make you aware of a few more options.
A simple approach I have used in the past is to create a lookup table. Your command tries to register the key in a uniquely constrained table; if it can reserve the key, the command can go ahead.
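Something like the following, sketched with node-postgres (the table and function names are made up for the example): the INSERT into a uniquely constrained table is attempted before the event is appended, and a unique-violation means the document is already taken.

import { Pool } from 'pg';

const pool = new Pool(); // connection settings taken from the environment

// Assumed schema:
// CREATE TABLE customer_documents (document text PRIMARY KEY, customer_id uuid NOT NULL);

export async function tryReserveDocument(document: string, customerId: string): Promise<boolean> {
    try {
        await pool.query(
            'INSERT INTO customer_documents (document, customer_id) VALUES ($1, $2)',
            [document, customerId],
        );
        return true; // key reserved; safe to go ahead and append the event
    } catch (err: any) {
        if (err.code === '23505') { // unique_violation: another customer already holds this document
            return false;
        }
        throw err;
    }
}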
Depending on the nature of the data and the domain, you could let this 'problem' occur and raise additional events to mark it. If it is something that's important to the business or the way the application works, then you can deal with it either manually or at the time via compensating commands. If the latter, it would make sense to use a process manager.
In some (rare) cases where speed/capacity is less of an issue, you could consider old-fashioned locking and transactions. Admittedly these are much better suited to CRUD-style implementations, but they can be used in CQRS/ES.
I have more detail on this in my blog post: How to Handle Set Based Consistency Validation in CQRS
I hope you find it helpful.

Design of notification events

I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandary.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schemas of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However, it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<Before>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</Before>
<After>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with @Matt's answer--"CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service, when it seems that you (presumably) already have a source of truth for customer. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for needing to do so), consider passing only the data necessary to construct a query of the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
<CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
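A sketch of what the consumer side of that looks like (the URL and the fetch call are illustrative; any transport to the owning service works):

interface CustomerChanged { customerId: string; }

async function onCustomerChanged(event: CustomerChanged): Promise<void> {
    // The event is only a signal; pull current state from the source of truth.
    const response = await fetch('https://customers.example.com/customers/' + event.customerId);
    if (!response.ok) {
        throw new Error('Failed to load customer ' + event.customerId + ': ' + response.status);
    }
    const customer = await response.json();
    // ...update the local copy (if one is kept at all) from `customer`...
}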
There is not a "one-size-fits-all" solution--sometimes it does make more sense to pass the relevant data with the event. Again, I agree with @Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question, and your comments on this answer and on Matt's, talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
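A small illustration of the ambiguity (hypothetical delta type): once the payload is bound to a static type, "omitted because unchanged" and "explicitly cleared" are easy to conflate, and in serializers or languages with no notion of "absent" they become indistinguishable.

interface CustomerModifiedDelta {
    customerId: string;
    fullName?: string | null; // optional (didn't change) AND nullable (was cleared)
    accountLevel?: string;
}

// {"customerId":"1234"}                  -> fullName was not part of the change
// {"customerId":"1234","fullName":null}  -> fullName was explicitly cleared
// TypeScript can still tell these apart ('fullName' in delta), but many
// serializers and statically typed targets collapse both to null/default,
// at which point the distinction is lost to the consumer.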
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details. For example, you could have a CustomerRegistered event with all the details for the customer that was registered, and then, later in the stream, a CustomerMadeGoldAccount event that only really needs to capture the id of the customer whose account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event. Imagine having hundreds of properties for a customer: if every command that changed a single property had to raise an event with all the properties before and after, this would get unwieldy pretty quickly. It's also difficult to determine why the change occurred if you just publish a generic CustomerModified event, which is often a question that is asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs enough data about the entity to validate that the command can be executed; it doesn't even need to read the whole customer entity.
Subscribers of the events also only need to build up state for the things they are interested in. For example, perhaps an 'account level' widget is listening to these events; all it needs to keep around is the customer ids and account levels so that it can display what account level each customer is at.
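A sketch of that widget's projection (event and field names are illustrative): the events carry intent, and the subscriber folds only the pieces it cares about into its own state.

type CustomerEvent =
    | { type: 'CustomerRegistered'; customerId: string; fullName: string; accountLevel: string }
    | { type: 'CustomerMadeGoldAccount'; customerId: string };

class AccountLevelProjection {
    private readonly levels = new Map<string, string>(); // customerId -> account level

    apply(event: CustomerEvent): void {
        switch (event.type) {
            case 'CustomerRegistered':
                this.levels.set(event.customerId, event.accountLevel);
                break;
            case 'CustomerMadeGoldAccount':
                this.levels.set(event.customerId, 'Gold');
                break;
        }
    }

    levelOf(customerId: string): string | undefined {
        return this.levels.get(customerId);
    }
}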
Instead of trying to convey everything through the payload XML's fields, you can distinguish between different operations based on:
1. Different endpoint URLs depending on the operation (this is preferred).
2. An opcode (operation code) element in the XML which indicates which operation should handle the incoming request (closer to your examples).
There are a few enterprise patterns applicable to your business case - messaging and its variants - and if your system is extensible, an Enterprise Service Bus (ESB) can be used. An ESB allows reliable handling and processing of events.

EventSourcing fix business logic bugs

I'm doing some study of event sourcing before applying it (or not).
Quick question: when using the EventSourcing pattern we can imagine this scenario to handle an event:
command sent
the command handler receives the previous command and validates it, then
the command handler persists this event and publishes it
the business model applies this event (business logic algorithm v1, for example), mutating its internal state
We can replay all the events and reconstruct the business object state.
How do we handle business logic bugs (business logic algorithm v1 contains a nasty bug)?
I've read that we can fix the bug and replay the events, and then we get the business model back into a valid state.
But what happens if the fixed business rule, when applied to event #0, would have caused the 'future' commands to fail? In other words, event #1, event #2, ..., event #n depended on the state of the domain model after applying event #0. How can we fix the cascading failures?
I don't have a specific use case, but we can imagine an account whose balance is currently positive. Applying event #0 increments the balance, but this was a bug: the developer wanted to reduce the balance. Event #1 is a purchase that was valid because of the positive balance at the time.
The developer fixes the bug and replays the events. Event #0 now decreases the balance, which becomes negative. Event #1 is replayed: what happens?
Do we need to handle this case with 'compensation'? How?
Thanks in advance for your comments and for any external resources that could help (articles, blogs).
bye
Minor correction
When using the EventSourcing pattern we can imagine this scenario to handle an event
command sent
the command handler receives the previous command and validates it, then
the business model verifies that the command can be satisfied without violating the business invariant, and calculates the ensuing events
the command handler persists these events and publishes them
The command handler (specifically, the anti-corruption layer) is responsible for making sure that the command is well formed. The business model decides if the command is permitted by the business.
The good news: the events are just state changes; all of the rule validation is already done. When you fix the bug in the domain object so that it produces the correct events in response to the command, you aren't changing the way the event is applied.
And you certainly aren't changing the history -- if the ATM gave away $20 that it wasn't supposed to, you can't get the money back by editing the record.
What that means is that deploying the bug fix keeps the problem from getting worse; but it doesn't do anything for the event histories that are incorrect.
Compensating events are the right answer here. Ever have a grocery clerk double-scan an item and have to back one of them out? If you look closely, you'll see all three items:
+1 candy bar
+1 candy bar
-1 candy bar
That's the idiom of the compensating event being appended to end of the stream.
So if the error first appeared in event #0, and [event #1 .. event #99] have since been played on top of it, the remedy for the error is to publish a compensating event #100.
Notice that this is exactly what bookkeepers would do. You put the wrong sign on the entry on line #1, add a bunch more entries, realize your mistake, and add a new entry that compensates for the earlier mistake.
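As a toy sketch of the account example (names and amounts are illustrative): the current balance is just a fold over the stream, and the remedy for the wrongly applied event #0 is an appended compensating event, not an edit of the history.

type BalanceEvent =
    | { type: 'BalanceIncreased'; amount: number }
    | { type: 'BalanceDecreased'; amount: number };

function balance(events: BalanceEvent[]): number {
    return events.reduce(
        (total, e) => e.type === 'BalanceIncreased' ? total + e.amount : total - e.amount,
        0,
    );
}

const stream: BalanceEvent[] = [
    { type: 'BalanceIncreased', amount: 20 }, // event #0: the bug credited 20 instead of debiting it
    { type: 'BalanceDecreased', amount: 40 }, // compensating event appended later (agreed with the domain experts)
];
console.log(balance(stream)); // -20, the balance the fixed rule intended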
More good news: in mature business processes, there are already mitigation procedures in place to handle various contingencies. So you can grab a meeting with your domain experts, and doodle on the whiteboard explaining the problem, and your experts should be able to show you the right way to compensate for it. Everything after that is feature management (does the mitigation need to be automated? Does the system need to do the mitigation automatically, or can it let human experts tell it what mitigation to apply, etc. etc.)

Data validation: fail fast, fail early vs. complete validation

Regarding data validation, I've heard that the options are to "fail fast, fail early" or "complete validation". The first approach fails on the very first validation error, whereas the second one builds up a list of failures and presents it.
I'm wondering about this in the context of both server-side and client-side data validation. Which method is appropriate in what context, and why?
My personal preference for data-validation on the client-side is the second method which informs the user of all failing constraints. I'm not informed enough to have an opinion about the server-side, although I would imagine it depends on the business logic involved.
Part of why this is confusing is that people don't discuss orthogonality as part of the criteria. 'Fail early' is useful so that the error is caught where it happens, not downstream. But for orthogonal failures there is no downstream, or there are multiple downstreams.
For example, most user forms are full of independent questions: username, password, email. Since they're independent, wait until all three arrive and describe all the errors at once. Making the user go through three submit-check-complain cycles is preposterous.
For trivial errors such as invalid inputs or missing data, it's up to you how convenient you want to make the system for the user. It could be very annoying, for example, if somebody was importing a full spreadsheet of data into a system and the first row failed and you said "first row failed." The user fixes the first row, imports, and gets the message "second row failed." Imagine there are 65536 rows. You already know you're not going to do anything with the data, but do you want to make the user's life easier? Again, these are the trivial errors I'm discussing, and of course you will be designing a system that validates all data before processing.
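For the trivial-error case, the difference is just whether the validator stops or accumulates; a minimal sketch (field names invented for the example):

interface RowError { row: number; message: string; }

function validateRows(rows: Array<{ email?: string; age?: number }>): RowError[] {
    const errors: RowError[] = [];
    rows.forEach((row, i) => {
        // Keep going after a failure; collect everything instead of throwing early.
        if (!row.email || !row.email.includes('@')) {
            errors.push({ row: i + 1, message: 'invalid or missing email' });
        }
        if (row.age === undefined || row.age < 0) {
            errors.push({ row: i + 1, message: 'invalid or missing age' });
        }
    });
    return errors;
}

const problems = validateRows([{ email: 'bob' }, { email: 'a@b.c', age: 30 }]);
// -> both problems with row 1 are reported in a single pass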
For more serious errors that either you do not expect or are more than mere validation issues, revert to fail fast and hard.
