No state machine in elsa-workflows? - elsa-workflows

I love the elsa-workflows project, as I used WWF heavily in the past. However, many of my workflows were state machines. I can't see any in Elsa. Are there any plans to support this?

Elsa 2 does not support the state machine model (only the flowchart model), but I am planning on revising the engine for Elsa 3 which would allow any type of model, including state machine and simple sequential flows like we have in Windows WF.
After I answered with the above, I started to think ahead about the state machine architecture for V3, during which I realized we can already implement the state machine model today with V2.
All it would take is a simple new activity called e.g. "State" that has an infinite number of outcomes. This State activity would simply set a workflow variable called e.g. "StateMachineState" or "CurrentState". Each outbound connection would be connected to any trigger responsible for transitioning into the next state. This could be a message from a service bus, a timer, an HTTP request, or anything else that's available with Elsa.
The only real change that would need to be added to make the user experience smooth is the ability to keep adding connections without having to specify them manually from the activity editor. With the current design, we could probably just automatically add an extra outcome to the activity. So initially there would just be e.g. "Transition 1". When that one becomes connected, a "Transition 2" would appear.
Anyway, I am revising my answer to: it's not here yet, but:
You can implement it yourself today, and
I will add an initial version of the State machine model to either Elsa 2.1 or 2.2, depending on any hidden gotchas I might have failed to see.
I just pushed a change that includes a State activity.
With this, you can now easily implement a state machine by adding State activities to your workflow. Here's an example of a traffic light state machine:
This workflow kick-starts automatically after 5 seconds, after which it transitions into the "Green" state. It stays there for 10 seconds before transitioning into the "Yellow" state. After 5 seconds, it transitions into the "Red" state, and after another 5 seconds it transitions back to the "Green" state. Then it repeats.
To use the State activity, you specify two things:
State name.
Allowed transitions (the traffic light example includes only one transition per state, but you can specify more than just one).
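As a rough, language-neutral illustration of the idea (this is not Elsa's API), the traffic light can be modelled as a table of states, each with a name and its allowed transitions. The names `State`, `TRAFFIC_LIGHT` and `run` are hypothetical, and the timers are simulated with a counter rather than real delays.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """A state with a name and its allowed transitions (hypothetical model)."""
    name: str
    # Maps a transition name to (delay in seconds, name of the next state).
    transitions: dict = field(default_factory=dict)


# Traffic light from the description above: one timer-driven transition per state.
TRAFFIC_LIGHT = {
    "Green": State("Green", {"To Yellow": (10, "Yellow")}),
    "Yellow": State("Yellow", {"To Red": (5, "Red")}),
    "Red": State("Red", {"To Green": (5, "Green")}),
}


def run(machine, start, steps, startup_delay=5):
    """Simulate `steps` timer-driven transitions and return (time, state) pairs."""
    clock = startup_delay          # the workflow kicks off after 5 seconds
    current = start                # plays the role of the "CurrentState" variable
    history = [(clock, current)]
    for _ in range(steps):
        # Take the first (and here only) allowed transition of the current state.
        delay, target = next(iter(machine[current].transitions.values()))
        clock += delay
        current = target
        history.append((clock, current))
    return history


history = run(TRAFFIC_LIGHT, "Green", steps=4)
```

A state with several allowed transitions would simply carry more entries in `transitions`. A real engine would then wait for whichever trigger fires first instead of always taking the first entry.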


Spring State Machine - Persist Libraries and Final State - Stops Listening

I was looking at Spring State Machine (spending a small amount of time evaluating it, before being moved onto another project).
I wanted to use Papyrus and UML modeling for an Order Flow. This worked. I had a REST interface working. I expanded to look at the persistence demo and created a number of state machines using a cross-reference id.
I used Thymeleaf to show the various orders and their states, and to send events.
This all seemed to work UNTIL any one of the state machines entered a "Final State" (the one that looks like a bullseye). At this point the AbstractPersistStateMachineHandler stopped triggering/listening and onPersist no longer fired.
Is there an issue with using a "final state" together with the persistence approach?
If I reworked it so that this last state was a "normal" state (but with no exits), then it worked fine. From a state model perspective, though, that probably doesn't accurately show that we have reached the end of the lifecycle.
A lot of what I did was based on: \spring-statemachine\spring-statemachine-samples\datapersist

DDD dealing with Eventual consistency for multiple aggregates inside a bounded context with NoSQL

I am currently working on a DDD Geolocation application that has two separate aggregate roots inside one bounded context. Due to frequent coordinate updates I am using redis to persist my data which doesn't allow rollbacks.
My first aggregate root is a trip object containing driver (users), passengers (list of users), etc.
My second aggregate root is user position updates
When a coordinate update is sent, I generate and fire an "UpdateUserPostionEvent". As a side effect, at a certain point I also generate and fire an "UpdateTripEvent", which updates the coordinates of drivers/passengers.
My question is: how can I deal with eventual consistency if I am firing my "UpdateLiveTripEvent" asynchronously? My UpdateLiveTripEventHandler has several points of failure. Besides logging an error, how can I deal with this inconsistency?
I am using a library called MediatR and its INotificationHandler, which as far as I know is "fire and forget".
Edit: I ended up finding this SO post that describes exactly what I need (a saga/process manager), but unfortunately I am unable to find any kind of saga implementation for handling events within the same BC. All the examples I am seeing involve a service bus.
Same or different Bounded Context; with or without Sagas; it does not matter.
Why would event handling fail? Because of domain rules or because of infrastructure.
Domain rules:
A raised event handled by an aggregate (the event handler uses the aggregate to apply the event) should NEVER fail because of domain rules.
If the "target" aggregate has domain rules that reject the event, your aggregate design is wrong. Commands/operations can be rejected by domain rules. Events cannot be rejected (or undone) by domain rules.
An event should be raised only once the "origin" aggregate has checked all the domain rules for the operation. The "target" aggregate applies the event and may raise another event with values it has calculated. Those calculations are domain rules too, but their purpose is to continue the consistency "chain" with good segregation of responsibilities, never to reject the event. That is why events should be named in the past tense: they have already happened.
Event simulation:
Agg1: Hey buddies! User did this cool thing and everything seems to be OK. --> UserDidThisCoolThingEvent
Agg2: Whoa, that is awesome! I'm gonna put +3 in User points. --> UserReceivedSomePointsEvent
Agg3: +3 points to this user? The user just reached 100 points. That is a lot! I'm gonna convert this User into a VIP User. --> UserTurnedIntoVIPEvent
Agg4: A new VIP User? Let's notify it to the rest of the Users to create some envy ;)
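The simulation above can be sketched as a tiny in-process dispatcher. This is only an illustration under assumptions: `on`, `publish`, the handler names and the 100-point threshold wiring are hypothetical, and it is not MediatR's API. Handlers never reject an event. They apply it and may return follow-up events.

```python
from collections import defaultdict, deque

handlers = defaultdict(list)   # event name -> list of handler functions
points = defaultdict(int)      # user -> points (state of "Agg2")
vips = set()                   # VIP users (state of "Agg3")
log = []                       # every event, in the order it was applied


def on(event_name):
    """Register a handler for an event name."""
    def register(func):
        handlers[event_name].append(func)
        return func
    return register


def publish(event_name, **data):
    """Apply an event and every follow-up event it causes, in order."""
    queue = deque([(event_name, data)])
    while queue:
        name, payload = queue.popleft()
        log.append(name)
        for handler in handlers[name]:
            # Handlers never reject; they may return follow-up events.
            queue.extend(handler(**payload) or [])


@on("UserDidThisCoolThingEvent")
def award_points(user):
    points[user] += 3
    return [("UserReceivedSomePointsEvent", {"user": user, "total": points[user]})]


@on("UserReceivedSomePointsEvent")
def promote_if_needed(user, total):
    if total >= 100 and user not in vips:
        vips.add(user)
        return [("UserTurnedIntoVIPEvent", {"user": user})]
    return []


points["alice"] = 97
publish("UserDidThisCoolThingEvent", user="alice")
```

Note that `promote_if_needed` decides whether to raise a further event, but it has no way to refuse the points that were already awarded.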
Infrastructure:
Fix it and apply the event. ;) Even "by hand" if needed, once your persistence engine, network and/or machine is up again.
Use automatic retries for short-lived failures, and error queues/logs so that you do not lose your events (and can apply them later) during a long outage.
Event sourcing also helps with this, because you can always reapply the persisted events to the "target" aggregate. No extra effort is needed to keep the events somewhere else (e.g. in event logs), because your domain persistence is also your event store.
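A minimal sketch of the retry-then-park idea, assuming infrastructure failures surface as `ConnectionError`. The names `apply_with_retry`, `replay` and the sample event are hypothetical, and a plain list stands in for a real error queue.

```python
import time


def apply_with_retry(event, handler, error_queue, attempts=3, delay=0.0):
    """Try to apply an event; park it in the error queue if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            handler(event)
            return True
        except ConnectionError:
            # Infrastructure failure: wait briefly and try again.
            if attempt < attempts:
                time.sleep(delay)
    error_queue.append(event)      # keep the event so it can be applied later
    return False


def replay(error_queue, handler):
    """Re-apply parked events; return the ones that are still failing."""
    remaining = []
    for event in error_queue:
        try:
            handler(event)
        except ConnectionError:
            remaining.append(event)
    return remaining


# Demonstration with a handler that is "down" for its first four calls.
calls = {"count": 0}
applied = []


def flaky_handler(event):
    calls["count"] += 1
    if calls["count"] <= 4:
        raise ConnectionError("store unavailable")
    applied.append(event)


errors = []
first = apply_with_retry("TripUpdated#1", flaky_handler, errors)  # three failed attempts
still_failing = replay(errors, flaky_handler)                      # fourth call fails
errors = replay(still_failing, flaky_handler)                      # fifth call succeeds
```

The event is never dropped: it either gets applied or stays in the queue until a later replay succeeds.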

Spark streaming mapWithState timeout without remove

Imagine a use case where events are streaming in per user but only the first week of events are of interest. Within that time frame stateful logic is taking place using mapWithState. After that period the user incoming events should be disregarded.
As the user's state takes memory, it makes sense to change it after the user's week period to a simple already-seen-marker.
If any event comes in for that user a week or more after their first event, it is easy to change the state to that already-seen-marker.
But if no events come after that week, the state never changes to that already-seen-marker, and it will continue to occupy memory forever.
As far as I understand, adding a timeout to the user's state will not help, as you are not allowed to change the state of a timed-out key (which makes sense, as it is going to be removed).
Is there a simple way to achieve this use case?
From what I understand, mapGroupsWithState in Spark 2.2 has richer timeouts that can be used not only to remove a state, but also to change it.
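The desired behaviour can be sketched in plain Python. This is a simulation of the logic only, not Spark's API: `process_batch`, the `SEEN` marker and the state layout are hypothetical. The second loop plays the role of a timeout callback that is allowed to change the state instead of removing it.

```python
WEEK = 7 * 24 * 3600
SEEN = "already-seen"          # lightweight marker replacing the full state


def process_batch(states, events, now):
    """Update per-user state for one micro-batch, then handle expired users.

    `states` maps user -> SEEN, or a dict with 'first_seen' and 'events'.
    `events` is a list of (user, payload) tuples arriving at time `now`.
    """
    for user, payload in events:
        state = states.get(user)
        if state == SEEN:
            continue                               # past the first week: ignore
        if state is None:
            state = {"first_seen": now, "events": []}
            states[user] = state
        if now - state["first_seen"] < WEEK:
            state["events"].append(payload)

    # Timeout pass: runs even for users with no new events in this batch.
    for user, state in states.items():
        if state != SEEN and now - state["first_seen"] >= WEEK:
            states[user] = SEEN                    # change the state, do not remove it
    return states


states = {}
process_batch(states, [("u1", "click"), ("u2", "view")], now=0)
process_batch(states, [("u1", "buy")], now=3600)
snapshot = list(states["u1"]["events"])
process_batch(states, [], now=WEEK)                # no events, states still expire
process_batch(states, [("u1", "late")], now=WEEK + 10)
```

After the empty batch at `now=WEEK`, both users hold only the marker, although neither sent an event after their first week.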

State machine: validation before initial save in database?

This is a question regarding state machines in general, I don't need help with the actual implementation. Imagine a state machine that formalizes a simple bug report, from inception to its final demise. The bug might transition across states such as "NEW", "CONFIRMED", "RESOLVED", "REOPENED", and "CLOSED". Along with every state transition there is also some accompanying validation code, which could for instance make sure that when moving from NEW to CONFIRMED we have recorded who confirmed it.
My question is related to the initial state – when the bug is just "NEW". It's tempting to say that initial validation is not part of the state machine (e.g. making sure that the bug actually has some description, for instance, before saving it with state "NEW" in the database). But isn't that also a state transition, from "just created" to "NEW"? Shouldn't that transition be validated like any other transition? Isn't it artificial and sub-optimal to separate the initial validation from all other validations?
On the other hand, if we do create a "fake" initial state (say, "CREATED"), along with its respective transition ("CREATED" --> "NEW"), then what happens when that transition isn't validated? If it is validated, it's all good – we switch states and we save the object with the new state (actually called "NEW" here) in the database. But if it doesn't validate then we obviously don't want to save it in the database, and that breaks the state machine pattern by not having an initial state and a final state (we would have an initial state, albeit a fake one – "CREATED" –, but two final states – "CLOSED" and "DELETED"). Not only that, but the "DELETED" state would also be fake, in that there will never be any persistent objects with that state (just as there will never be any persistent objects with state "CREATED").
How do you handle this issue?
OK, after further investigation it looks like the pattern does solve my issue by itself: in some (most?) state machine models, there is in fact an initial transition that ends in the initial state. So there is a "fake" initial state as far as the actual code is concerned, but that state must not be considered a real state of the state machine.
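A minimal sketch of that idea, with `None` standing for the pseudo-state before the first save. The transition table is an assumption (the question only names the states), and `validate`, `transition` and the dictionary used as a database are hypothetical.

```python
class ValidationError(Exception):
    pass


# Allowed transitions; None stands for the initial pseudo-state ("not yet created").
TRANSITIONS = {
    None: {"NEW"},
    "NEW": {"CONFIRMED"},
    "CONFIRMED": {"RESOLVED"},
    "RESOLVED": {"REOPENED", "CLOSED"},
    "REOPENED": {"RESOLVED"},
    "CLOSED": set(),
}


def validate(bug, target):
    """Per-transition validation rules (illustrative only)."""
    if target == "NEW" and not bug.get("description"):
        raise ValidationError("a bug needs a description")
    if target == "CONFIRMED" and not bug.get("confirmed_by"):
        raise ValidationError("record who confirmed the bug")


def transition(bug, target, database):
    """Validate and apply a transition; persist only if it succeeds."""
    current = bug.get("state")             # None until the first save
    if target not in TRANSITIONS[current]:
        raise ValidationError(f"{current} -> {target} is not allowed")
    validate(bug, target)
    bug["state"] = target
    database[bug["id"]] = dict(bug)        # the pseudo-state is never stored
    return bug


database = {}
try:
    transition({"id": 1, "description": ""}, "NEW", database)
except ValidationError:
    pass                                    # rejected before anything was saved

bug = transition({"id": 2, "description": "Crash on save"}, "NEW", database)
```

The initial validation goes through the same `transition` function as every other change, and a failed initial transition leaves nothing in the database, so no "DELETED" state is needed.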

Why is infinite loop protection being triggered on my CRM workflow?

I should have added from the outset - this is in Microsoft Dynamics CRM 2011
I know CRM well, but I'm at a loss to explain behaviour on my current deployment.
Please read the outline of my scenario to help me understand which of my presumptions / understandings is wrong (and therefore what is causing this error). It's not consistent with my expectations.
Basic Scenario
Requirement demands that a web service is called every X minutes (it adds pending items to a database index)
I've opted to use a workflow / custom entity trigger model (i.e. I have a custom entity which has a CREATE plugin registered. The plugin executes my logic. An accompanying workflow is started when "completed" time + [timeout period] expires. On expiry, it creates a new trigger record and the workflow ends).
The plugin logic works just fine. The workflow concept works fine to a point, but after a period of time the workflow stalls with a failure:
This workflow job was canceled because the workflow that started it included an infinite loop. Correct the workflow logic and try again. For information about workflow logic, see Help.
So in a nutshell - standard infinite loop detection. I understand the concept and why it exists.
Specific deployment
Firstly, I think it's quite safe for us to ignore the content of the plugin code in this scenario. It works fine, it's atomic and hardly touches CRM (to be clear, it is a pre-event plugin which runs the remote web service, awaits a response and then sets the "completed on" date/time attribute on my Trigger record before passing the Target entity back into the pipeline). So long as a Trigger record is created, this code runs and does what it should.
Having discounted the content of the plugin, there might be an issue that I don't appreciate in having the plugin registered on the pre-create step of the entity...
So that leaves the workflow itself. It's a simple one. It runs thusly:
On creation of a new Trigger entity...
it has a Timeout of Trigger.new_completedon + 15 minutes
on timeout, it creates a new Trigger record (with no "completed on" value - this is set by the plugin remember)
That's all - no explicit "end workflow" (though I've just added one now and will set it testing...)
With this set-up, I manually create a new Trigger record and the process spins nicely into action. Roll forwards 1h 58 mins (based on the last cycle I ran - remembering that my plugin code may take a minute to finish running), after 7 successful execution cycles (i.e. new workflow jobs being created and completed), the 8th one fails with the aforementioned error.
What I already know (correct me where I'm wrong)
Recursion depth, by default, is set to 8. If a workflow / plugin calls itself 8 times then an infinite loop is detected.
Recursion depth is reset every one hour (or 10 minutes - see "Warnings" in linked blog?)
Recursion depth settings can be set via PowerShell or SDK code using the Deployment Web Service in an on-premise deployment only (via the Set-CrmSetting Cmdlet)
What I don't want to hear (please)
"Change recursion depth settings"
I cannot change the Deployment recursion depth settings as this is not an option in an online scenario - ultimately I will be deploying to CRM Online too.
"Increase the timeout period on your workflow"
This is not an option either - the reindex needs to occur every 15 minutes, ideally sooner.
#Boone suggested below that the recursion depth timeout is reset after 60 minutes of inactivity rather than every 60 minutes. Therein lies the first misunderstanding.
While discussing with #alex, I suggested that there may be some persistence of CorrelationId between creating an entity via the workflow and the workflow that ultimately gets spawned... Well, there is. The CorrelationId is the same in both the plugin and the workflow and any records that spool from that thread. I am now looking at ways to decouple the CorrelationId (or perhaps the creation of records) from the entity and the workflow.
For the one-hour "reset" to take place, you have to have NO activity for an hour. It doesn't reset just 1 hour from the original. So since you have an activity every 15 minutes, it never has a chance to reset. I don't know that this is set in stone anywhere, but it matches my experience.
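The difference between "reset every hour" and "reset after an idle hour" can be seen in a toy model. This is not CRM's actual implementation: `first_failure` is hypothetical, and it assumes that each run in the same correlation chain increases the depth by one and that a run is cancelled when the depth reaches the limit of 8.

```python
def first_failure(interval_minutes, runs, max_depth=8, reset_after=60):
    """Return the 1-based index of the first run cancelled by loop protection.

    Toy model: each run in the same correlation chain increases the depth by
    one, and the depth only drops back after `reset_after` minutes with no
    activity. Returns None if no run is cancelled.
    """
    depth = 0
    last_activity = None
    now = 0
    for run in range(1, runs + 1):
        idle = None if last_activity is None else now - last_activity
        if idle is not None and idle >= reset_after:
            depth = 0                      # a full idle hour resets the counter
        depth += 1
        if depth >= max_depth:
            return run
        last_activity = now
        now += interval_minutes
    return None


every_17_minutes = first_failure(17, runs=20)
every_61_minutes = first_failure(61, runs=20)
```

Under these assumptions a cycle of roughly 17 minutes is cancelled on its eighth run, while a cycle longer than the idle window never is.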
In CRM 4 it was possible to create a CRM Service (Google "creating a CRM service in the child pipeline") and reset the correlation ID (using CorrelationToken.NewToken()). I don't see anything so easy in the 2011 SDK, and I have no idea whether this trick worked in the online environment. Is 2011 Online backwards compatible with CRM 4 plug-ins?
One thing you could try would be to use the IExecutionContext.CorrelationId to scavenge the asyncoperation (System Job) table. But according to the metadata, the attributes I think might be useful (CorrelationId, CorrelationUpdatedTime, Depth) are NOT valid for update. Maybe you could delete the rows? Even that may not help.
I doubt this can be solved like this.
I'd suggest a different approach: deploy a simple application alongside CRM and let it call the web service, which in turn can use the XRM endpoints in order to change the records.
Or, you can try something like this upon your crm service initialization in the plugin (dug it up from one of my plugins) leaving your workflow untouched:
CrmService service = new CrmService();
// Initialize the service here, then...
// Build a correlation token from the current plugin execution context.
CorrelationToken corToken = new CorrelationToken();
corToken.CorrelationId = context.CorrelationId;
corToken.CorrelationUpdatedTime = context.CorrelationUpdatedTime;
// WILD GUESS: enforce unlimited depth?
corToken.Depth = 0; // THIS WAS: context.Depth;
// Update the correlation token on the service.
service.CorrelationTokenValue = corToken;
I admit I don't really remember much about this (code dates back to about 2 years ago), but it might help.
