I have read the documentation on two-way communication in oneM2M. I understand that one way of doing it is the subscription/notification mechanism. Let's assume we have an example like the figure below. The IN-AE (smartphone) wants to turn on the light controlled by ADN-AE2. Assume that registration and resource creation are already done, and that ADN-AE2 has already subscribed to the light's container.
As a result, oneM2M covers how the IN-AE sends the light control request and how ADN-AE2 (or ADN-AE1) receives and executes it. But what if ADN-AE2 cannot control the light and fails while executing the request? What should the scenario be then? The contentInstance has already been created from the request sent by the IN-AE.
1- Should ADN-AE2 create another contentInstance containing the previous state of the container?
2- Should ADN-AE2 remove the contentInstance that was not executed? (And what about other subscribers that received the notification and executed it successfully?)
What is the recommended approach when the actuator cannot perform the action?
http://www.onem2m.org/tr-0034/procedures/actuator-switch-control
This is a general problem in asynchronous communication. What happens when the receiver never receives a command, or is simply not able to execute it (for whatever reason, for example because it is busy doing other things or because the request is outside certain parameters)?
Both your options are valid, but they won't work when the ADN is offline, because then ADN-AE2 cannot perform either procedure. And even when the ADN is always online, both procedures have problems when more than one other AE wants to control ADN-AE2, or when the IN-AE impatiently sets the desired state again and again. This often results in race conditions.
I would suggest re-thinking the communication scheme between the two AEs and splitting the original Container in two: one Container for the target state, and another Container for the actual state. The target Container is used by AEs other than ADN-AE2 to set a desired state. ADN-AE2 is notified of the creation of a new ContentInstance and acts accordingly. It then creates a new ContentInstance in the state Container that reflects the new internal state, and this notifies the IN-AE so it can reflect the change.
The following drawing reflects an example resource structure for this pattern:
AE ─┬─ Container_target ─── ContentInstances* ◀═══ Container for desired state, set by other AEs
│
└─ Container_state ─── ContentInstances* ◀═══ Container for actual state, set only by this AE
This is a common pattern in asynchronous communication when the connectivity between nodes is not always reliable, or when the outcome of a state-change request is not guaranteed. The responsibility for executing requests lies with ADN-AE2; the policy for how often to retry or how to react to failure lies with the IN-AE.
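As an illustration, here is a minimal sketch of the two writes over the oneM2M HTTP binding, in Python with the requests library. The CSE address, AE name, and payload values are assumptions made up for the example; the X-M2M-* headers and the ty=4 content type come from the HTTP binding.

    import uuid
    import requests

    CSE_URL = "http://cse.example.com:8080/cse-in"   # hypothetical CSE base
    HEADERS = {
        "X-M2M-Origin": "C-IN-AE",                   # originator (the smartphone AE)
        "X-M2M-RI": str(uuid.uuid4()),               # unique request identifier
        "Content-Type": "application/json;ty=4",     # ty=4 -> <contentInstance>
    }

    # 1. The IN-AE writes the *desired* state into Container_target.
    requests.post(f"{CSE_URL}/lightAE/Container_target",
                  headers=HEADERS,
                  json={"m2m:cin": {"con": "on"}})

    # 2. ADN-AE2 is notified, tries to switch the light, and reports the
    #    *actual* outcome into Container_state (here the attempt failed,
    #    so the actual state stays "off").
    requests.post(f"{CSE_URL}/lightAE/Container_state",
                  headers={**HEADERS, "X-M2M-Origin": "C-ADN-AE2",
                           "X-M2M-RI": str(uuid.uuid4())},
                  json={"m2m:cin": {"con": "off"}})

The IN-AE can now compare the two containers: a target value that never shows up in the state Container indicates a failed (or not yet executed) request, and the IN-AE decides whether to retry.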
I am attempting to accomplish something along these lines with Quarkus and Narayana:
client calls service to start a process that takes a while: /lra/start
This call sets off an LRA, and returns an LRA id used to track the status of the action
client can keep polling some endpoint to determine status
service eventually finishes and marks the action done through the coordinator
client sees that the action has completed, is given the result or makes another request to get that result
Is this a valid use case? Am I visualizing the correct way this tool can work? Based on how the linked guide reads, it seems that the endpoints are more of a passthrough to the coordinator, notifying it that we start and end an LRA. Is there a more programmatic way to interact with the coordinator?
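For reference, here is a minimal sketch of the polling flow I have in mind (Python; the /lra/status and /lra/result endpoints are my own hypothetical API, while Closed and Cancelled are terminal state names from the LRA spec):

    import time
    import requests

    BASE = "http://localhost:8080"                     # hypothetical service URL

    # Kick off the long-running action; the service returns the LRA id.
    lra_id = requests.post(f"{BASE}/lra/start").text

    # Poll until the LRA reaches a terminal state.
    while True:
        status = requests.get(f"{BASE}/lra/status/{lra_id}").text
        if status in ("Closed", "Cancelled"):
            break
        time.sleep(1)

    # Fetch the result once the action has completed.
    result = requests.get(f"{BASE}/lra/result/{lra_id}").text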
Yes, it might be a valid use case, but in any case please read the MicroProfile LRA specification - https://github.com/eclipse/microprofile-lra.
The idea you describe is more or less one LRA participant executing in a new LRA while you poll the status of this execution. This is not exactly what LRA is intended for, but it can certainly be used this way.
The main idea of LRA is the composition of distributed transactions based on the saga pattern. Basically, the point is to coordinate multiple services to achieve consistent results with an eventual consistency guarantee. So the main benefit arises when you can propagate the LRA through different services that either all complete their actions, or all have their compensation callbacks called in case of failure (and, of course, only for the services that executed their actions in the first place). Here is also an example with LRA propagation: https://github.com/xstefank/quarkus-lra-trip-example.
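To make the propagation concrete, here is a conceptual sketch (plain Python rather than the Java annotations, with made-up service URLs); the one real detail is that the LRA context travels in the Long-Running-Action HTTP header, which the spec names LRA_HTTP_CONTEXT_HEADER:

    import requests

    LRA_HEADER = "Long-Running-Action"   # LRA_HTTP_CONTEXT_HEADER in the spec

    def book_trip(incoming_headers):
        # The caller's LRA id arrives in the header; forwarding it enlists
        # the downstream services in the *same* LRA, so a failure in either
        # one triggers the compensation callbacks of both.
        lra_id = incoming_headers[LRA_HEADER]
        requests.post("http://flight-svc/book", headers={LRA_HEADER: lra_id})
        requests.post("http://hotel-svc/book", headers={LRA_HEADER: lra_id})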
EDIT: Sorry, I forgot to add the programmatic API that allows the same interactions as the annotations - https://github.com/jbosstm/narayana/blob/master/rts/lra/client/src/main/java/io/narayana/lra/client/NarayanaLRAClient.java. However, note that it is not in the specification and is specific to Narayana.
SMB2 CHANGE_NOTIFY looks promising, as if it could deliver enough information on subdirectory or subtree updates from the server, so we can keep our listing of remote directory up-to-date by handling the response.
However, it's not a subscription to an event stream, just a one-off command receiving one response, so I suspect it can only be used as a hint to invalidate our cache and re-read the directory. Between receiving one response and sending another CHANGE_NOTIFY request, there could be additional changes, and we would miss their details.
Is there any way around this problem? Or is re-reading the directory upon learning that it has been updated a necessary step?
I want to understand possible solutions at the protocol level (you can imagine I'm using a customized client that I can make do what I want, with some common servers like Windows or smbd3).
Strictly speaking, not even re-reading the directory listing should save you, since the directory can change between re-reading the listing and submitting another CHANGE_NOTIFY request. The race condition just moves to a different spot.
Except there is no race condition.
This took a little digging, but it’s all there in the specification. In MS-SMB2 v20200826 §3.3.5.19 ‘Receiving an SMB2 CHANGE_NOTIFY Request’, it is stated:
The server MUST process a change notification request in the object store as specified by the algorithm in section 3.3.1.3.
In §3.3.1.3 ‘Algorithm for Change Notifications in an Object Store’, we have:
The server MUST implement an algorithm that monitors for changes on an object store. The effect of this algorithm MUST be identical to that used to offer the behavior specified in [MS-CIFS] sections 3.2.4.39 and 3.3.5.59.4.
And in MS-CIFS v20201001 §3.3.5.59.4 ‘Receiving an NT_TRANSACT_NOTIFY_CHANGE Request’ there is this:
If the client has not issued any NT_TRANSACT_NOTIFY_CHANGE Requests on this FID previously, the server SHOULD allocate an empty change notification buffer and associate it with the open directory. The size of the buffer SHOULD be at least equal to the MaxParameterCount field in the SMB_COM_NT_TRANSACT Request (section 2.2.4.62.1) used to transport the NT_TRANSACT_NOTIFY_CHANGE Request. If the client previously issued an NT_TRANSACT_NOTIFY_CHANGE Request on this FID, the server SHOULD already have a change notification buffer associated with the FID. The change notification buffer is used to collect directory change information *in between* NT_TRANSACT_NOTIFY_CHANGE (section 2.2.7.4) calls that reference the same FID.
Emphasis mine. This agrees with how Samba implements it (the fsp structure persists between individual requests). I wouldn’t expect Microsoft to do worse than that by not keeping the promise they made in their own specification.
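This buffering is exactly what makes the simple re-arm loop sound. A sketch of the client side, with a hypothetical SMB client API (none of these method names come from a real library):

    def watch_directory(conn, dir_handle, apply_change):
        while True:
            # Arm (or re-arm) the notification. Changes that happened while
            # we were processing the previous response were collected in the
            # server-side buffer for this handle and are returned here.
            req_id = conn.send_change_notify(dir_handle, watch_tree=True)
            for change in conn.recv_response(req_id).changes:
                apply_change(change)   # update the local directory listing

The remaining loss case is a buffer overflow on the server, which the protocol signals explicitly (STATUS_NOTIFY_ENUM_DIR); that is the one situation where you do have to fall back to re-reading the directory.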
I'm trying to build a mental model of the role of off-chain workers in Substrate. The bigger picture seems to be that they move logic that was otherwise handled by oracles inside the Substrate node, triggered by predefined transactions. There are two use cases I was thinking of specifically:
1: Validating file formats: an incoming transaction proposes a file accessible via URL or IPFS hash, and its format needs to be validated. An off-chain worker fetches the file, asserts the format (size, encoding, content, whatever) and, if correct, submits another transaction saying it's valid.
2: Key generation: let's assume there is a separate service distributed with the Substrate node, which manages keys for each instance. Node A runs a key-sharing algorithm (like Shamir's secret sharing) via this external service between participants A, B and C, then makes a transaction creating a group (A,B,C) on-chain. This transaction triggers all nodes in this group to run off-chain workers that call into their local key stores to verify that they hold the key. They can all mark this on-chain afterwards.
As far as I understand, off-chain workers are triggered in every node after block execution. In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees their correctness. What is a good way of reaching consensus on the validity of the file? Is it possible without economic incentives like staking? Staking would be problematic with tokens that have no value in the network, e.g. in enterprise settings. Is this even the right use case for off-chain workers? The second example should not suffer from this issue; we just need all parties to verify that they hold the key.
Where does the thought process above go wrong, and why?
As far as I understand, off-chain workers are triggered in every node after block execution.
Yes and no. There is a CLI flag for it. And at the time of this writing it says:
--offchain-worker <ENABLED>
Should execute offchain workers on every block.
By default it's only enabled for nodes that are authoring new blocks. [default: WhenValidating] [possible
values: Always, Never, WhenValidating]
In the former use case, this would result in lots of transactions validating just one file, and nothing guarantees their correctness.
I think it is the responsibility of the receiving function (a.k.a. the Call) to handle and incentivise this. For example, there could be a reward opportunity for being the one to validate. But if the validation has already been submitted by another transaction, you get slashed (or, even if not, you pay a transaction fee for nothing). In such cases you can assume that not all participants will submit a transaction. They will only do it when there is a chance of improvement, which should be captured by your reward/slash scheme.
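A conceptual sketch of that decision logic (real off-chain workers are written in Rust inside the runtime; everything below is a hypothetical Python pseudo-API, just to show the shape):

    def offchain_worker(chain, file_hash, fetch_and_validate):
        # Check on-chain state first: if another participant's validation
        # transaction already landed, submitting again would only cost a fee
        # for nothing (or, under a stricter scheme, get us slashed).
        if chain.get_validation(file_hash) is not None:
            return
        verdict = fetch_and_validate(file_hash)   # download + format checks
        chain.submit_signed_tx("mark_validated", file_hash, verdict)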
Is this even the right use case for off-chain workers?
I am no expert here, but I think the file-validation case, at least, is a good fit. It is just a matter of finding a good incentive plus anti-spam slashing.
I am less familiar with the second example, so no comments on that.
I am designing and developing a microservice platform based on the specifications of http://microservices.io/
The entire framework integrates through sockets, thus removing the overhead of multiple HTTP requests (as in most REST APIs).
A service-registry host receives the registrations of multiple microservice hosts, each microservice being responsible for one business domain. Another host, which we call a router (or API gateway), is responsible for exposing the microservices for consumption by third parties.
We will use the Saga structure (in choreography style) to distribute the requests, so we have some questions:
Should a microservice publish its events to a process manager, or should they be passed directly to the next microservice in the chain of events? (The same logic applies to rollback.)
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
I think the main point is: in this router-and-microservice structure, who is responsible for building the Sagas and propagating their events?
The article Patterns for Microservices — Sync vs. Async does a great job defining many of the terms used here and has animated gifs demonstrating sync vs. async and orchestrated vs. choreographed as well as hybrid setups.
I know the OP answered his own question for his use case, but I want to try to address the questions raised a bit more generally in light of the linked article.
Should a microservice publish its events to a process manager, or should they be passed directly to the next microservice in the chain of events?
To use a more general term, a process manager is an orchestrator. A concrete implementation may involve a stateful actor that orchestrates a workflow, keeping track of its progress in some way. Since a saga is a workflow itself (composed of both forward and compensating actions), it would be the job of the process manager to keep track of the state of the saga until completion (success or failure). This typically involves the actor sending synchronous calls to services and waiting for some result before going on to the next step. Parallel operations can of course be introduced and what not, but the point is that this actor dictates the progression of the saga, as the sketch below illustrates.
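A minimal orchestrator sketch (the service-call functions are placeholders): it runs the forward actions in order, remembers the compensations of the steps that succeeded, and plays them back in reverse if a later step fails.

    def run_saga(steps):
        """steps: list of (action, compensation) pairs of callables."""
        completed = []                     # the orchestrator's tracked state
        for action, compensation in steps:
            try:
                action()
                completed.append(compensation)
            except Exception:
                # A forward action failed: undo everything done so far,
                # newest first, then surface the failure.
                for comp in reversed(completed):
                    comp()
                raise

    # Example, with hypothetical service-call functions:
    # run_saga([(reserve_flight, cancel_flight),
    #           (reserve_hotel, cancel_hotel),
    #           (charge_card, refund_card)])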
This is fundamentally different from the choreography model. With this model there is no central actor keeping track of the state of a saga, but rather the saga progresses implicitly via the events that each step emits. Arguably, this is a more pure case of an event-driven model since there is no coordination.
That said, the challenge with this model is observing the state at any given point in time. With the orchestration model above, in theory, each actor could be queried for the state of the saga. In the choreographed model we don't have this luxury, so in practice a correlation ID is added to every message belonging to (in this case) a saga. If the messages are queryable in some way (the event bus supports it, or through some other storage means), then the messages corresponding to a saga can be queried and the saga state reconstructed (effectively an event-sourced model).
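A sketch of that reconstruction, assuming (and these are assumptions about your infrastructure) that every message carries a correlation ID and that the store can be queried by it:

    def saga_state(message_store, correlation_id):
        # Pull every message tagged with this saga's ID and fold them, in
        # order, into a view of how far the saga has progressed --
        # effectively event sourcing the saga itself.
        events = message_store.query(correlation_id=correlation_id)
        return [e.type for e in sorted(events, key=lambda e: e.timestamp)]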
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
This is an interesting question by itself, and one that I have been thinking about quite a lot. The easiest and default answer would be to hard-code the saga plans and map them to the incoming message types: message A triggers plan X, message B triggers plan Y, etc.
However, I have been thinking about what a control plane might look like that manages these plans and provides a mechanism for pushing changes dynamically to message handlers and/or orchestrators. The two specific use cases I have in mind are changes in authorization policies and dynamically adding new steps to a plan.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
The way I have approached this is to include references to the large data when they are objects such as files. For data that are inherently streams, the message can reference a parallel channel that the consumer reads from once it receives the message. I think the important distinction is to decouple the messages driving the workflow from where the data is physically materialized, which depends on the data's representation.
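A sketch of that "reference, not payload" idea (sometimes called a claim check); the URI scheme and the blob_store client are assumptions:

    # The saga message stays small and carries only a reference.
    event = {
        "type": "ReportGenerated",
        "correlation_id": "saga-42",
        "payload_ref": "s3://reports/report-42.parquet",  # reference, not data
    }

    def on_report_generated(event, blob_store, process):
        data = blob_store.get(event["payload_ref"])  # dereference out-of-band
        process(data)                                # the actual work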
For microservices, every microservice should be responsible for its own business domain.
Should a microservice publish its events to a process manager, or should they be passed directly to the next microservice in the chain of events? (The same logic applies to rollback.)
Events are not passed to the next microservice; they are published, and every microservice interested in an event subscribes to it.
If rollback is required, you should consider orchestration instead.
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
The microservice that publishes the event will certainly know how to build it. There is no chain of events, because every microservice interested in the event subscribes to it separately.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
Only publish the data others may be interested in, not everything. In most cases the data are not large, and a message queue can handle them efficiently.
I'm developing a small CQRS+ES framework and developing applications with it. In my system, I need to log some client actions and use them for analytics, statistics and maybe, in the future, for something in the domain. For example, a client (on the web) downloads some resource(s), and I need to save the date, time, type (download, partial, ...), region or country (maybe from the IP), etc.; after that, in some view, the client can see the download count or some complex report. I'm not sure how to implement this feature.
The first solution is to create an analytics context and some aggregate; on each client action I send a command like IncreaseDownloadCounter(resourceId), handle the command, raise domain events and update the view. But in this scenario the download has already occurred before I send the command, so it is not really a command, and on the other side version conflicts increase.
The second solution is to raise an event from the client side and update the view model based on it. But handled this way, my event is not stored in the event store, because it is not raised by a command and never changes any domain context. And if I do store it in the event store, there is no aggregate to handle it when it is fetched for some other use.
The third solution is to raise an event from the client side and store it in another database, maybe with a dedicated table for each event type. But then I have multiple event stores with different schemas, which makes it difficult to recreate view models and to trace events when rebuilding context state; if in the future I add a domain that uses this type of event, consuming them will be hard.
What is the best approach and solution for this scenario?
The first solution is to create an analytics context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, such client events vastly outnumber domain writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing them separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is written instead to a log...).
If you are supporting an operational view, push back on the requirement that the state be recoverable after a restart. You might be able to get by with building your view off an in-memory model of the event counts, and use something more practical for the representation of the events.
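A sketch of that separation, with an append-only stream for the client events next to (not inside) the domain event store; the field names are assumptions:

    import json
    import time

    def record_client_event(log_file, event_type, payload):
        # No aggregate id or version here: these events are facts about the
        # outside world, not state transitions guarded by a domain model.
        entry = {"type": event_type, "at": time.time(), "data": payload}
        log_file.write(json.dumps(entry) + "\n")

    with open("client-events.ndjson", "a") as f:
        record_client_event(f, "ResourceDownloaded",
                            {"resource": "video-7", "country": "DE"})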
Thanks for your complete answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.), collect the client events in that repository, and have some offline process read them and update the read model or create commands to do something in the domain space?
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending saving some claim or authorization token of the user and the sending app, for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.