Spring state machine "dynamic fork"

I'm trying to model a business process using the Spring state machine. So far I've been very successful with it, but I'm stuck trying to model a dynamic bit, where
the user is in state A
in that state he can create a short (predefined) task for a different user (a small state machine)
those users have to execute that state machine flow to the end
it should be possible to spawn many tasks concurrently
the user returns to state A once all the tasks he created have completed.
Here is a graphical representation of what I'm trying to achieve.
I think I could do this if I represent each task as its own state machine, but I would prefer to avoid that route as it would complicate the application. Ideally I would have just one state machine configuration.
In the Spring reference I found the fork pseudo state, which may be what I'm looking for; however, the official example repo only covers a static configuration (https://github.com/spring-projects/spring-statemachine/blob/master/docs/src/reference/asciidoc/sm-examples.adoc#statemachine-examples-tasks) where each task is already defined (T1, T2, T3). For my application, however, I would want to be able to add "T4" at runtime.
In essence I would like to know whether my requirements can be fulfilled with a single state machine and whether I could use fork() for my needs. If that's not the case, I will welcome any advice that points me in the right direction.

As I commented over the weekend, if you need a "dynamic" configuration then the easiest way to do it is using the "dynamic builder interfaces", the same as in the other examples. They were basically added to be able to use SSM outside of a Spring application context. The tasks recipe uses this model, as it supports running a DAG of tasks using hierarchical regions and submachines.
You don't necessarily need fork, because entering parallel regions via their initial states is equivalent. You do, however, need join to wait for the parallel regions to finish their execution.
While that recipe provides some background on how things can be done, we hopefully have something better on our roadmap, which is supposed to add a DSL that should make these kinds of custom implementations much easier to build.
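For illustration only, here is a minimal sketch of that dynamic builder approach, building a machine outside of an application context. The state and event names ("A", "TASKS", "DONE", "COMPLETE_" + taskId) are placeholders I made up, and the regions/join wiring used by the tasks recipe is omitted; fork/join and regions would be configured through the same builder.

import org.springframework.statemachine.StateMachine;
import org.springframework.statemachine.config.StateMachineBuilder;
import org.springframework.statemachine.config.StateMachineBuilder.Builder;

public class DynamicMachineFactory {

    // Builds a machine at runtime without a Spring application context.
    public StateMachine<String, String> buildFor(Iterable<String> taskIds) throws Exception {
        Builder<String, String> builder = StateMachineBuilder.builder();

        builder.configureStates()
                .withStates()
                .initial("A")
                .state("TASKS")
                .end("DONE");

        builder.configureTransitions()
                .withExternal()
                .source("A").target("TASKS").event("START_TASKS");

        // One transition per task known at build time; adding a "T4" simply means
        // rebuilding the machine with one more entry in taskIds.
        for (String taskId : taskIds) {
            builder.configureTransitions()
                    .withExternal()
                    .source("TASKS").target("DONE").event("COMPLETE_" + taskId);
        }

        return builder.build();
    }
}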

Related

How to apply machine learning for streaming data in Apache NiFi

I have a processor that generates time series data in JSON format. Based on the received data I need to make a forecast using machine learning algorithms in Python, then write the new forecast values to another flow file.
The problem is that such a Python script must perform many heavy preprocessing operations when it starts: queries to a database, creating a complex data structure, initializing forecasting models, etc.
If you use ExecuteStreamCommand, then the script will be run for each flow file. Is this true?
Can I make a Python script in NiFi that starts once and receives flow files many times, storing the history of previously received data? Or do I need to make an HTTP service that will receive data from NiFi?
You have a few options:
Build a custom processor. This is my suggested approach. The code would need to be in Java (or Groovy, which provides a more Python-like experience) but would not have Python dependencies, etc. However, I have seen examples of this approach for ML model application (see Tim Spann's examples) and this is generally very effective. The initialization and individual flowfile trigger logic is cleanly separated, and performance is good.
Use InvokeScriptedProcessor. This will allow you to write the code in Python and separate the initialization (pre-processing, DB connections, etc., onScheduled in NiFi processor parlance) from the execution phase (onTrigger). Some examples exist, but I have not personally pursued this with Python specifically. You can use Python dependencies but not "native modules" (i.e. compiled C code), as the execution engine is still Jython.
Use ExecuteStreamCommand. Not strongly recommended. As you mention, every invocation would require the preprocessing steps to occur, unless you designed your external application in such a way that it ran a long-lived "server" component and each ESC command sent data to it and returned an individual response. I don't know what your existing Python application looks like, but this would likely involve complicated changes. Tim has another example using CDSW to host and deploy the model and NiFi to send it data via HTTP to evaluate.
Make a custom processor that can do that. Java is more appropriate here; I believe you can do pretty much everything with Java, you just need to find the right libraries. There might be some initialization and preprocessing involved, but all of that can be handled in the processor's init phase in NiFi, which lets you preserve the state of certain components.
In my use case I had to build a custom processor that could take in images and count the number of people in each image. For that I loaded a deep learning model once in the init method, and afterwards the onTrigger method reused the reference to that model every time it processed an image.
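To make that lifecycle split concrete, here is a skeletal sketch of such a custom processor. The ForecastModel type and its load/predict methods are hypothetical placeholders for whatever heavy initialization and per-record work you have; only the NiFi lifecycle hooks (@OnScheduled, onTrigger) are the point.

import java.util.Collections;
import java.util.Set;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class ForecastProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Flow files with forecast values")
            .build();

    // Expensive shared state, loaded once per schedule rather than once per flow file.
    private volatile ForecastModel model; // ForecastModel is a hypothetical wrapper

    @OnScheduled
    public void setup(final ProcessContext context) {
        // Heavy initialization: DB queries, building data structures, loading the model, etc.
        model = ForecastModel.load();
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Reuse the already-initialized model for every incoming flow file.
        flowFile = session.write(flowFile, (in, out) -> out.write(model.predict(in)));
        session.transfer(flowFile, REL_SUCCESS);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }
}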

How does one store data in a block?

My use case is very simple. I want to create a chain where peers can store some public data. What is the best way to accomplish that in Substrate?
I think I should implement a custom runtime for that, but I'm not sure how to create a transaction that sends data. I didn't find anything on that.
You may be looking for something like the system module's remark transaction.
It allows users to submit arbitrary pieces of data and have them attested to by the blockchain. That feature is available in any Substrate-based blockchain, including the node template.
A good place to start learning how to build custom runtimes and explore your idea more is the Proof of Existence tutorial.

Create a StateMachineInterceptor to persist StateMachineContext

I'm struggling to persist my state machine following the recipes and examples available. I'm working with the master branch and my state machine uses hierarchical states, regions and orthogonal states. The first example I followed is spring-statemachine-samples/persist, but it seems to deal only with a basic FSM. The second one I tried is LocalStateMachineInterceptor, but it does not seem to work with hierarchical states. Also, I can't find any way to persist a history state via a StateMachinePersist.
Is there an example of a complex FSM with persistence anywhere?
I have to be honest that persistence is a relatively uncovered topic in the samples and docs once things get more complicated. It is something I'm currently working on to make easier, because as a user you shouldn't have to care; there should be relatively clean APIs to do it. So stay tuned for those.
Having said that, before we get the code clearer on this:
StateMachinePersist leads to StateMachineContext, and there is some code in the tests, namely StateMachineResetTests, which shows some ways to do these things. There was also a question, gh127, where I wrote something about the internals of resetting a machine, which is what persistence effectively does.
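To make that reset pattern concrete, here is a minimal, illustrative sketch (not the eventual API): an in-memory StateMachinePersist keyed by an arbitrary String id, plus a restore method following the stop/reset/start sequence used in StateMachineResetTests. The map and the id key are placeholders.

import java.util.HashMap;
import java.util.Map;

import org.springframework.statemachine.StateMachine;
import org.springframework.statemachine.StateMachineContext;
import org.springframework.statemachine.StateMachinePersist;

public class InMemoryStateMachinePersist implements StateMachinePersist<String, String, String> {

    // Saved contexts keyed by some business id (an order id, for example).
    private final Map<String, StateMachineContext<String, String>> contexts = new HashMap<>();

    @Override
    public void write(StateMachineContext<String, String> context, String contextObj) throws Exception {
        contexts.put(contextObj, context);
    }

    @Override
    public StateMachineContext<String, String> read(String contextObj) throws Exception {
        return contexts.get(contextObj);
    }

    // Restore: stop the machine, push the saved context into every region, start again.
    public void restore(StateMachine<String, String> machine, String contextObj) throws Exception {
        StateMachineContext<String, String> context = read(contextObj);
        machine.stop();
        machine.getStateMachineAccessor()
                .doWithAllRegions(access -> access.resetStateMachine(context));
        machine.start();
    }
}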
History state, yes, that's my bad; for some reason it slipped off my radar. Thanks for pointing it out! I created an issue for it, gh182.

Independent substates in a state machine

I need some suggestions. I am trying to implement an online order process with Spring state machine and am trying to construct a state diagram before I get to work. Now say my order can be canceled by three different admin users: CanceledByAdmin1, CanceledByAdmin2 and CanceledByAdmin3. Should I make them substates of a Cancel state or create three different states? Keeping in mind that all canceled states are final states and independent of each other, I don't know if making substates does anything other than simplifying the paper diagram. Any help would be appreciated.
As far as Spring Statemachine goes, we can have only one terminate state, and trying to make that a collection of substates is a bit awkward because once you enter it, the state machine should stop all processing. Though this is an area I've probably overlooked and could try to enhance.
While you could probably have a state S1 with three substates S11/S11E, S12/S12E and S13/S13E, with triggerless transitions from S11 to S11E and the same for the other substates, even this feels a bit weird because none of those would actually terminate the root state machine.
I guess the question is what you're trying to accomplish.
If you only want to keep around the information of who/which cancelled the order, you could use a simple single terminate state and, during the transition to that terminate state, add/modify extended state variables with this info.
Extended state variables are usually used to overcome exactly this problem of suddenly having an astronomical number of states just to keep arbitrary information around. I know that in this example you only have three, but what if you had 10, or 100? If you ever need to add even one more, you have to change the state machine configuration and recompile. With extended state variables you would not need to do that.
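As a small illustration of that idea (the variable name "cancelledBy" and the "admin" message header are made-up names, not anything prescribed by the library), a transition action into the single terminate state could look like this:

import org.springframework.statemachine.StateContext;
import org.springframework.statemachine.action.Action;

public class RecordCancellerAction implements Action<String, String> {

    @Override
    public void execute(StateContext<String, String> context) {
        // Copy the canceller from the event's message header into the extended state.
        Object admin = context.getMessageHeaders().get("admin");
        context.getExtendedState().getVariables().put("cancelledBy", admin);
    }
}

The event would then be sent with that header attached, for example via MessageBuilder.withPayload("CANCEL").setHeader("admin", "admin1").build(), so supporting a fourth or tenth admin never touches the state machine configuration.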

How would you implement a Workflow system?

I need to implement a Workflow system.
For example, to export some data, I need to:
Use an XSLT processor to transform an XML file
Use the resulting transformation to convert into an arbitrary data structure
Use the resulting (file or data) and generate an archive
Move the archive into a given folder.
I started by creating two kinds of classes: a Workflow, which is responsible for adding new Step objects and running them.
Each Step implements a StepInterface.
My main concern is that every step depends on the previous one (except the first), and I'm wondering what would be the best way to handle such problems.
I thought of looping over the steps and providing each one the result of the previous (if any), but I'm not really happy with it.
Another idea would have been to allow a "previous" Step to be set on a Step, like:
$s = new Step();
$s->setPreviousStep($previousStep);
But I lose the utility of a Workflow class.
Any ideas, advices?
By the way, I'm also concerned about success or failure of the whole workflow, meaning that if any step fails I need to roll back or clean up the previous data.
I implemented a similar workflow engine last year (closed source though, so no code that I can share). Here are a few ideas based on that experience:
StepInterface - can do what you're doing right now - abstract a single step.
Additionally, provide a rollback capability, but I think a step should know when it fails and clean up before proceeding further. An abstract step can handle this for you (template method).
You might want to consider branching based on the StepResult - so you could do a StepMatcher that takes a stepResult object and a conditional - its sub-steps are executed only if the conditional returns true.
You could also do a StepException to handle exceptional flows if a step errors out. Ideally, this is something that you can define either at a workflow level (do this if any step fails) and/or at a step level.
I'd taken the approach that a step returns a well defined structure (StepResult) that's available to the next step. If there's bulky data (say a large file etc), then the URI/locator to the resource is passed in the StepResult.
Your workflow is going to need a context to work with - in the example you quote, this would be the name of the file, the location of the archive and so on - so think of a WorkflowContext (see the sketch just below).
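Purely as a sketch of how those pieces could hang together (names like Step, StepResult and WorkflowContext are placeholders, and it's written in Java even though your snippet is PHP; the shape translates directly):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface Step {
    // Each step receives the shared context plus the previous step's result (null for the first step).
    StepResult execute(WorkflowContext context, StepResult previous) throws Exception;
    void rollback(WorkflowContext context);
}

// A small, well-defined structure handed from one step to the next;
// bulky data (a large file, say) is referenced by a URI/locator rather than embedded.
class StepResult {
    final String resourceLocator;
    StepResult(String resourceLocator) { this.resourceLocator = resourceLocator; }
}

// Workflow-wide data: the input file, the archive folder, and so on.
class WorkflowContext {
    final String inputFile;
    final String archiveFolder;
    WorkflowContext(String inputFile, String archiveFolder) {
        this.inputFile = inputFile;
        this.archiveFolder = archiveFolder;
    }
}

class Workflow {
    private final List<Step> steps;
    Workflow(List<Step> steps) { this.steps = steps; }

    void run(WorkflowContext context) {
        Deque<Step> executed = new ArrayDeque<>();
        StepResult previous = null;
        try {
            for (Step step : steps) {
                previous = step.execute(context, previous);
                executed.push(step);
            }
        } catch (Exception e) {
            // Roll back the already-executed steps in reverse order on any failure.
            while (!executed.isEmpty()) {
                executed.pop().rollback(context);
            }
            throw new RuntimeException("Workflow failed and was rolled back", e);
        }
    }
}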
Additional thoughts
You might want to consider the following too - if this is something that you're planning to implement as a large scale service/server:
Steps could be in libraries that were dynamically loaded
Workflow definition in an XML/JSON file - again, dynamically reloaded when edited.
Remote invocation and call back - submit job to remote service with a callback API. when the remote service calls back, the workflow execution is picked up at the subsequent step in the flow.
Parallel execution where possible etc.
Stateless design
Rolling back can fit into this structure easily, as each Step will implement its own rollback() method, which the workflow can call (preferably in reverse order) if any of the steps fail.
As for the main question, it really depends on how sophisticated you want to get. On a basic level, you can define a StepResult interface, which is returned by each step and passed on to the next one. The obvious problem with this approach is that each step has to "know" which implementation of StepResult to expect. For small systems this may be acceptable; for larger systems you'd probably need some kind of configurable mapping framework that can be told how to convert the result of the previous step into the input of the next one. So the Workflow calls a Step, the Step returns a StepResult, the Workflow then calls a StepResultConverter (which is your configurable mapping thingy), the StepResultConverter returns a StepInput, the Workflow then calls the next Step with that StepInput, and so on.
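A tiny, self-contained sketch of that converter hop (independent of the earlier sketch; again, all type names are placeholders):

import java.util.List;
import java.util.Map;

interface StepInput { }
interface StepResult { }

interface Step {
    StepResult run(StepInput input) throws Exception;
}

// The configurable mapping piece: adapts the previous step's result into the next step's input.
interface StepResultConverter {
    StepInput convert(StepResult previous);
}

class ConvertingWorkflow {
    private final List<Step> steps;
    private final Map<Step, StepResultConverter> converters; // one converter configured per step

    ConvertingWorkflow(List<Step> steps, Map<Step, StepResultConverter> converters) {
        this.steps = steps;
        this.converters = converters;
    }

    void run(StepInput initialInput) throws Exception {
        StepInput input = initialInput;
        StepResult result = null;
        for (Step step : steps) {
            // Workflow -> Step -> StepResult -> StepResultConverter -> StepInput -> next Step ...
            if (result != null) {
                input = converters.get(step).convert(result);
            }
            result = step.run(input);
        }
    }
}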
I've had great success implementing workflow using a finite state machine. It can be as simple or as complicated as you like, with multiple workflows linking to each other. Generally an FSM can be implemented as a simple table; the current state of a given object is tracked in a history table by keeping a journal of the object's transitions and simply retrieving the last entry. So a transition would be of the form:
nextState = TransLookup(currState, Event, [Condition])
If you are implementing a front end you can use this transition information to construct a list of the events available to a given object in its current state.
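For illustration, a bare-bones version of that table-driven lookup (the optional [Condition] part is omitted, and the state/event names are made up):

import java.util.HashMap;
import java.util.Map;

public class TransitionTable {

    // Keyed by "currentState|event"; the value is the next state.
    private final Map<String, String> table = new HashMap<>();

    public void add(String currState, String event, String nextState) {
        table.put(currState + "|" + event, nextState);
    }

    // The equivalent of nextState = TransLookup(currState, Event); null means the event
    // is not available in the current state, which is also how a front end can build
    // the list of events offered for a given object.
    public String lookup(String currState, String event) {
        return table.get(currState + "|" + event);
    }

    public static void main(String[] args) {
        TransitionTable orders = new TransitionTable();
        orders.add("NEW", "pay", "PAID");
        orders.add("PAID", "ship", "SHIPPED");

        System.out.println(orders.lookup("NEW", "pay"));   // PAID
        System.out.println(orders.lookup("NEW", "ship"));  // null
    }
}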
