I've written a basic rpc client which polls the state of an Solana account to look for a specific condition (i.e. a unique int64 Id being written to it). When the condition arises, I call a smart contract which takes the same account as a mutable argument.
Before doing anything, the program checks for the same condition. However this check fails. I understand we're dealing with a distributed system and that state maybe inconsistent for a period of time, but I can repeatedly call for over 30 secs and it fails each time, before ultimately succeeding.
I've read about the concept of commitment levels but always assumed the account state passed into the smart contract would be the latest state of the world (i.e. processed)? What I appear to be observing is it's more like the finalised state.
Can anyone shed some light on what might be going on here?
I will try and come up with a minimal code example to demonstrate the problem but just wanted to ask the question first, to see if anyone can point me in the right direction.
Thanks
So if you look at the docs you linked, processed has a note:
the block may still be skipped by the cluster
This is a very important note if you're only looking for account state changes and don't want some that may be false. There's a number of reasons that a slot can be skipped, or a transaction could be rejected by the cluster.
If any of the above happens, then the account state that is accepted by the cluster as a whole may not be reflected in processed, but finalized.
In the end my specific problem came down to pre-flight checks using the 'finalized' commitment level when my logic for polling the account was using 'confirmed'. Modifying the preflightCommitment argument on sendTransaction fixed the problem for me.
Related
I'm trying to build a mental model of the role of off-chain workers in substrate. The bigger picture seems to be that they move logic inside the substrate node, that was otherwise done by oracles, triggering on predefined transactions. There are two use cases I was thinking of specifically:
1: Validating file formats: incoming transaction proposes a file accessible via url or ipfs hash, and it's format needs to be validated. An off-chain worker fetches the file, asserts format (size, encoding, content, whatever) and if correct submits another transaction saying it's valid.
2: Key generation: let's assume there is a separate service distributed with the substrate node, which manages keys for each instance. Node A runs a key sharing algorithm (like Shamir's secret sharing) via this external service between participants A, B and C, then makes a transaction creating a group (A,B,C) on-chain. This transaction triggers all nodes that are in this group to run off-chain workers, call into their local key store verifying having the key. They can all mark it on-chain afterwards.
As far as I understand it correctly, off-chain workers are triggered in every node after block execution. In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these. What is a good way of reaching consensus on the validity of the file? Is it also possible without economic incentives like staking? It would be problematic with tokens having no value in the network, e.g in enterprise settings. Is this even the right use case for off-chain workers? The second example should not suffer from such issue, we just need all parties to verify having the key.
Where does the thought process above go wrong, and why?
As far as I understand it correctly, off-chain workers are triggered in every node after block execution.
Yes and no. There is a CLI flag for it. And at the time of this writing it says:
--offchain-worker <ENABLED>
Should execute offchain workers on every block.
By default it's only enabled for nodes that are authoring new blocks. [default: WhenValidating] [possible
values: Always, Never, WhenValidating]
In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees the correctness of these.
I think it is the responsibility of the receiving function (aka. Call) to handle and incentivise this. For example, there could be a reward opportunity to validate an address. But, if it has already been submitted by another transaction, you will get slashed (or even if not, you do pay some transaction fee, for nothing). In such cases, you can assume that not all participants will submit a transaction. They will only do it when there is a chance of improvement, which should be depicted by your potential reward/slash scheme.
Is this even the right use case for off-chain workers?
I am no expert here, but I think at least the validation example is a good example. It is just a matter of finding a good incentive + anti-spam slashing.
I am less familiar with the second example, so no comments on that.
After moving over from Bot Framework v3 and studying the docs/samples, I'm still not understanding why the WaterfallStep dialog handling in v4 was implemented in such a way.
Why was it chosen to process the result of the previous step in the next step within a waterfall?
Given a waterfall with 3 Steps PromptName, PromptAge and PromptLocation, I see the following:
Method naming: Given the 2nd and 3rd prompt, method naming gets unclear. Naturally we would do AskForAge() or AskForLocation() but this is misleading due to the next point
SOLID Principals: Isn't "single responsibility principal" violated as we do two things in each step? Storing the previous response while asking for the next one in the same method, which ultimately leads in method names like AskForLocationAndStoreAge()
Code duplication: Due to the fact that each step assumes concrete input from its previous step, it can't be reused easily nor can order be changed. Even the simplest samples are hard to read.
I'm looking for some clarification on why the design was chosen this way or what I have missed in the concept.
This question seems largely opinion-based and therefore I don't know that it is appropriate for Stack Overflow. It turns out there is actually a good place to ask these kinds of questions, and it's the BotBuilder GitHub repo. I will attempt to answer it all the same.
Why was it chosen to process the result of the previous step in the next step within a waterfall?
This has to do with how bot conversations work. In its most basic form, without any dialogs or state or anything, a bot works by answering questions or otherwise responding to messages from the user. That is to say, a bot program's entire lifespan is to receive one message, process it, and provide a response while the message it received is still "active" in some sense. Then every following message the bot receives from the user is processed in the same way as though the first message was never received, like the message is being processed by an entirely new instance of the program with no memory of previous messages.
To make bot conversations feel more continuous, bot state and dialogs are needed. With dialogs and specifically prompt dialogs, the bot is now able to ask the user a question rather than just answer questions the user asked. In order to do this, the bot needs to store information about the conversation somewhere so that when the next message is received from the user the new "instance" of the bot program will know that the message from the user should be interpreted as a response to the question that the previous instance of the program asked. It's like leaving a note for someone working the next shift to let them know about something that happened during the previous shift that they need to follow up on.
Knowing all this, it seems only natural to process the result of the previous step in the next step within a waterfall because of the nature of conversations and dialogs. In a waterfall dialog containing prompts, the bot will receive messages pertaining to the last message the bot sent. So it needs to process the result of the previous step in the next step. It also needs to respond to the message, and in a waterfall that often means asking another question.
Isn't "single responsibility principal" violated as we do two things in each step? Storing the previous response while asking for the next one in the same method, which ultimately leads in method names like AskForLocationAndStoreAge()
As I understand it, the single responsibility principle refers to classes and not methods. If it does refer to methods, then that principle may well be violated or bent in some way in this case. However, it doesn't have to be. You are free to make multiple methods to handle each step, or even to make multiple steps to handle each message. If you wanted, your waterfall could contain a step that processes the result of the previous prompt and then continues on into the next step which makes a new prompt.
Due to the fact that each step assumes concrete input from its previous step, it can't be reused easily nor can order be changed. Even the simplest samples are hard to read.
You ultimately have control over how the input is validated/interpreted so it can be as concrete as you want. The reusability of a dialog or waterfall step has everything to do with how similar the different things you want to do are, the same as in any area of programming. If the samples are hard to read, I recommend raising issues with those samples. Of course, you can still raise an issue with the design of the SDK itself in the appropriate repo, but please do consider including suggestions for the way you think it should be instead.
This is more of a theorical question.
Well, imagine that I have two programas that work simultaneously, the main one only do something when he receives a flag marked with true from a secondary program. So, this main program has a function that will keep asking to the secondary for the value of the flag, and when it gets true, it will do something.
What I learned at college is that the polling is the simplest way of doing that. But when I started working as an developer, coworkers told me that this method generate some overhead or it's waste of computation, by asking every certain amount of time for a value.
I tried to come up with some ideas for doing this in a different way, searched on the internet for something like this, but didn't found a useful way about how to do this.
I read about interruptions and passive ways that can cause the main program to get that data only if was informed by the secondary program. But how this happen? The main program will need a function to check for interruption right? So it will not end the same way as before?
What could I do differently?
There is no magic...
no program will guess when it has new information to be read, what you can do is decide between two approaches,
A -> asks -> B
A <- is informed <- B
whenever use each? it depends in many other factors like:
1- how fast you need the data be delivered from the moment it is generated? as far as possible? or keep a while and acumulate
2- how fast the data is generated?
3- how many simoultaneuos clients are requesting data at same server
4- what type of data you deal with? persistent? fast-changing?
If you are building something like a stocks analyzer where you need to ask the price of stocks everysecond (and it will change also everysecond) the approach you mentioned may be the best
if you are writing a chat based app like whatsapp where you need to check if there is some new message to the client and most of time wont... publish subscribe may be the best
but all of this is a very superficial look into a high impact architecture decision, it is not possible to get the best by just looking one factor
what i want to show is that
coworkers told me that this method generate some overhead or it's
waste of computation
it is not a right statement, it may be in some particular scenario but overhead will always exist in distributed systems
The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.
If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There's many ways to accomplish this in every language and it doesn't have to be tied to tcp/ip requests.
I'll add a few links for you shortly.
Well, in most of languages you won't implement such a low level. But theorically speaking, there are different waiting strategies, you are talking about active waiting. Doing this you can easily eat all your memory.
Most of languages implements libraries to allow you to start a process as a service which is at passive waiting and it is triggered when a request comes.
So, I'm working on a CQRS/ES project in which we are having some doubts about how to handle trivial problems that would be easy to handle in other architectures
My scenario is the following:
I have a customer CRUD REST API and each customer has unique document(number), so when I'm registering a new customer I have to verify if there is another customer with that document to avoid duplicity, but when it comes to a CQRS/ES architecture where we have eventual consistency, I found out that this kind of validations can be very hard to address.
It is important to notice that my problem is not across microservices, but between the command application and the query application of the same microservice.
Also we are using eventstore.
My current solution:
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%. That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
Altough this works, there are 2 things that bother me here, the first thing is my command application relying on the query application, so if my query application is down, my command is affected (today I just return false on my validation if query is down but still...) and second thing is, should a query/read model really be able to emit events? And if so, what is the correct way of doing it? Should the command have some kind of API for that? Or should the query emit the event directly to eventstore using some common shared library? And if I have more than one view/read? Which one should I choose to handle this?
Really hope someone could shine a light into these questions and help me this these matters.
For reference, you may want to be reviewing what Greg Young has written about Set Validation.
I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right?
That's exactly right - your read model is stale copy, and may not have all of the information collected by the write model.
That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
This spelling doesn't quite match the usual designs. The more common implementation is that, if we detect a problem when reading data, we send a command message to the write model, telling it to straighten things out.
This is commonly referred to as a process manager, but you can think of it as the automation of a human supervisor of the system. Conceptually, a process manager is an event sourced collection of messages to be sent to the command model.
You might also want to consider whether you are modeling your domain correctly. If documents are supposed to be unique, then maybe the command model should be using the document number as a key in the book of record, rather than using the customer. Or perhaps the document id should be a function of the customer data, rather than being an arbitrary input.
as far as I know, eventstore doesn't have transactions across different streams
Right - one of the things you really need to be thinking about in general is where your stream boundaries lie. If set validation has significant business value, then you really need to be thinking about getting the entire set into a single stream (or by finding a way to constrain uniqueness without using a set).
How should I send a command message to the write model? via API? via a message broker like Kafka?
That's plumbing; it doesn't really matter how you do it, so long as you are sure that the command runs within its own transaction/unit of work.
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%.
No, you cannot safely rely on the query side, which is eventually consistent, to prevent the system to step into an invalid state.
You have two options:
You permit the system to enter in a temporary, pending state and then, eventually, you will bring it into a valid permanent state; for this you could allow the command to pass, yield CustomerRegistered event and using a Saga/Process manager you verify against a uniquely-indexed-by-document-collection and issue a compensating command (not event!), i.e. UnregisterCustomer.
Instead of sending a command, you create&start a Saga/Process that preallocates the document in a uniquely-indexed-by-document-collection and if successfully then send the RegisterCustomer command. You can model the Saga as an entity.
So, in both solution you use a Saga/Process manager. In order for the system to be resilient you should make sure that RegisterCustomer command is idempotent (so you can resend it if the Saga fails/is restarted)
You've butted up against a fairly common problem. I think the other answer by VoicOfUnreason is worth reading. I just wanted to make you aware of a few more options.
A simple approach I have used in the past is to create a lookup table. Your command tries to register the key in a unique constraint table. If it can reserve the key the command can go ahead.
Depending on the nature of the data and the domain you could let this 'problem' occur and raise additional events to mark it. If it is something that's important to the business/the way the application works then you can deal with it either manually or at the time via compensating commands. if the latter then it would make sense to use a process manager.
In some (rare) cases where speed/capacity is less of an issue then you could consider old-fashioned locking and transactions. Admittedly these are much better suited to CRUD style implementations but they can be used in CQRS/ES.
I have more detail on this in my blog post: How to Handle Set Based Consistency Validation in CQRS
I hope you find it helpful.
I'm making some study of eventsourcing before applying it (or not).
Quick question : When using EventSourcing pattern we can imagine this scenario to handle an event :
command sent
command handler receive the previous command, validate it then
command handler persist this event and publish it
business model apply (business logic algorithm v1 for example) this event mutating its internal state
We can replay all the events and reconstruct the business object state.
How to handle business logic bugs (business logic algorithm v1 contains a nasty bugs).
I read we can fix the bug and replay the events and then we got the business model in a valid state once again.
But what happens if when fixing the business rule when applying event#1 would have caused the 'futurs' commands to fails ? In other words, the event#2, event#3, event#n was dependend of the state of the domain model after applying event#0. How can we fix the cascading events failure ?
I don't have a specific usecase : but we can imagine an account where balance is currently positive. Applying Event#0 increment the balance but this was a bug, the developer wanted to reduce the balance. Event#1 is a purchase that was valid because of the positive balance at this time.
The developer fixes the bug and replay the events. Event#0 decrease the balance which becomes negative. Event#1 is replayed : what happens ?
Do we need to handle this case with 'compensation' ? how ?
thanks in advance for your comments, external ressources that can be of any help (articles, blogs).
bye
Minor correction
When using EventSourcing pattern we can imagine this scenario to handle an event
command sent
command handler receive the previous command, validate it then
business model verifies that the command can be satisfied without violating the business invariant, and calculates the ensuing events
command handler persist this events and publish them
The command handler (specifically, the anti-corruption layer) is responsible for making sure that the command is well formed. The business model decides if the command is permitted by the business.
The good news: the events are just state changes; all of the rule validation is already done. When you fix the bug in the domain object so that it produces the correct events in response to the command, you aren't changing the way the event is applied.
And you certainly aren't changing the history -- if the ATM gave away $20 that it wasn't supposed to, you can't get the money back by editing the record.
What that means is that deploying the bug fix keeps the problem from getting worse; but it doesn't do anything for the event histories that are incorrect.
Compensating events are the right answer here. Ever have a grocery clerk double scan an item, and have to back one of them out? If you look closely, you'll see all three items
+1 candy bar
+1 candy bar
-1 candy bar
That's the idiom of the compensating event being appended to end of the stream.
So if the error showed first appeared in event #0, and then [event #1 .. event #99] have been played on top of that, the remedy for the error is to publish a compensating event #100.
Notice that this is exactly what book keepers would do. You put the wrong sign on the entry on line #1, add a bunch more entries, realize your mistake, and add a new entry that compensates for the earlier mistake.
More good news: in mature business processes, there are already mitigation procedures in place to handle various contingencies. So you can grab a meeting with your domain experts, and doodle on the whiteboard explaining the problem, and your experts should be able to show you the right way to compensate for it. Everything after that is feature management (does the mitigation need to be automated? Does the system need to do the mitigation automatically, or can it let human experts tell it what mitigation to apply, etc. etc.)