Merge two distributed asynchronous tasks results and trigger an action - spring-boot

I have the following complex scenario and looking forward for the best solution:
- The first async process calls a third-party service and gets an Id
- The second async process gets an attribute from a client and tries two update a record in database based on the Id from the first part and the new attribute.
Also I would mention that the both above async tasks are triggered inside different APIs(spring boot based restful APIS) and they are not inside the same method. In other words, it is a kinda of problem in a distributed system.
The silliest solution would be having a loop to wait for the first async task result and complete the whole process.
Any suggestion?

Related

Long running async method vs firing an event upon completion

I have to create a library that communicates with a device via a COM port.
In the one of the functions, I need to issue a command, then wait for several seconds as it performs a test (it varies from 10 to 1000 seconds) and return the result of the test:
One approach is to use async-await pattern:
public async Task<decimal> TaskMeasurementAsync(CancellationToken ctx = default)
{
PerformTheTest();
// Wait till the test is finished
await Task.Delay(_duration, ctx);
return ReadTheResult();
}
The other that comes to mind is to just fire an event upon completion.
The device performs a test and the duration is specified prior to performing it. So in either case I would either have to use Task.Delay() or Thread.Sleep() in order to wait for the completion of the task on the device.
I lean towards async-await as it easy to build in the cancellation and for the lack of a better term, it is self contained, i.e. I don't have to declare an event, create a EventArgs class etc.
Would appreciate any feedback on which approach is better if someone has come across a similar dilemma.
Thank you.
There are several tools available for how to structure your code.
Events are a push model (so is System.Reactive, a.k.a. "LINQ over events"). The idea is that you subscribe to the event, and then your handler is invoked zero or more times.
Tasks are a pull model. The idea is that you start some operation, and the Task will let you know when it completes. One drawback to tasks is that they only represent a single result.
The coming-soon async streams are also a pull model - one that works for multiple results.
In your case, you are starting an operation (the test), waiting for it to complete, and then reading the result. This sounds very much like a pull model would be appropriate here, so I recommend Task<T> over events/Rx.

Is there a way to execute the compensation from faulted activity

Lets assume I have defined the routing slip activity. Within Execute method I would like to make several asynchronous service calls. Lets assume 3 service calls. Two of them succeed and one fails. Then I would like to execute compensate action of this activity in order to compensate the changes introduced by two succeeded service calls. From what I see the compensation only runs for previous activities, the current activity compensation has no chance to be invoked when there is exception somewhere in it. Is there a way to deal with it or I should change the approach?
I would like to achive sth similar to
using MassTransit.
You should have three separate activities, and execute them in order, so that as they succeed individually, they are added to the log. If an activity fails, the previous activities will be compensated.
By having all three calls in a single activity, you're going against the entire reason for having the routing slip and activities.

Restful triggering of Camunda process definitions from nodejs

I’m a beginner at Camunda/BPMN and I want to use it to control what is going on in nodejs, mostly likely using a REST API, at least for now. (Unless folks have a better idea for how nodejs should talk to Camunda.) My goal is to deliver systems where non-programmers can update the business logic in very practical ways.
I'd like to trigger the start of perhaps more-than-one process by sending a REST message, say to reflect that "a new insurance policy has been sold" and that might trigger the instantiation of say 2 processes on Monday but perhaps on Tuesday we add a third and now the same REST API call should now trigger more activity on Wednesday. (I figure it is better for nodejs to know about events but not about the process definitions. After all, my goal is to use Camunda as a sort of business logic server for my application. The less the nodejs code needs to know, the better.)
Which REST API should I be using to express the message that, say "a new insurance policy has been sold"? When I look at:
https://docs.camunda.org/manual/develop/reference/rest/signal/post-signal/
I find it very confusing. What should "name" match in the biz process definitions? I assume I don't need an executionId? I assume I can leave out tenantId?
Would some string in the message match the ID of a start event in one or more process definitions (or what has to match what)?
When I look at a process, is there an easy way to tell what variables I need to supply to start that process running?
Should I perhaps avoid using this event-oriented style of kicking off processes and just use the POST /process-definition/key/{key}/start? It would seem to me to be better form to trigger activity with events or signals or something like that rather than to have my nodejs code know about the specific process definition by name.
Should I be using events or signals in this case?
I gather that the start event should not be a "None Start Event" but I'm not clear on what type of start event TO use if I want automatic triggering based on events or signals or something? Would a "Non-interrupting - Message Start Event" be the right sort? I'm finding this confusing.
Once I have triggered the process to start, what does nodejs need to send to step the process forward from one task in that instance to the next?
Thanks!
In order to instantiate a new workflow instance you have the following possibilities:
Start exactly one instance:
Start a workflow instance by its known "key": https://docs.camunda.org/manual/develop/reference/rest/process-definition/post-start-process-instance/
Start a workflow by a message start event: https://docs.camunda.org/manual/develop/reference/rest/message/post-message/. A message can only start one specific workflow instance, it is not allowed that this is not a unique relationship. The message start event is the one you have to use in your BPMN process model. See also https://docs.camunda.org/manual/develop/reference/bpmn20/events/message-events/. This might indeed be the better approach to make your client independent of the process definition key.
Start multiple instances:
- Start a workflow instance by a BPMN signal event: https://docs.camunda.org/manual/develop/reference/rest/signal/post-signal/. The signal name could start many instances as once.
The name of the message or name of signal would be configured in the BPMN model. Both could work for your use case.
Once a process instance is started it will move automatically execute the next steps.
Probably following this example (https://blog.bernd-ruecker.com/use-camunda-without-touching-java-and-get-an-easy-to-use-rest-based-orchestration-and-workflow-7bdf25ac198e) step by step can give you some better idea?

Batch processing business flows

I would like to ask about batch processing. I need to process 100 000 business flows which consist from these steps: generate PDF (asynchronous), send mail and upload document to archive system. I am considering to use activiti with spring boot (async service tasks) because I have control over failed jobs and I can easily retry them etc. I don't know if it's good idea to use activiti or camunda or some other tool.
You could use a multi instance call activity. With the multi instance you can specify how of the call activity should be executed (in your case 100_000 times). The call activity will call your process model to archive the pdf. For each call (instance of the multi instance) you can define a variable which should be forwared to your called process, so it is possible to have a list of PDF file names in the main process and forward a name to each sub process.
The main process could look like this:
Make sure that you make the multi instance async before to use the asynchronous continuation, otherwise this will not work with 100_000 instances.
The multi instance call activity could look like this:
<bpmn:callActivity id="Task_0fl5th9" name="archiving pdf" calledElement="archivePdf">
<bpmn:incoming>SequenceFlow_04xoo79</bpmn:incoming>
<bpmn:outgoing>SequenceFlow_0036ezx</bpmn:outgoing>
<bpmn:multiInstanceLoopCharacteristics camunda:asyncBefore="true" camunda:collection="pdfNames" camunda:elementVariable="pdfName">
<bpmn:loopCardinality xsi:type="bpmn:tFormalExpression">100_000</bpmn:loopCardinality>
</bpmn:multiInstanceLoopCharacteristics>
</bpmn:callActivity>

Asynchronous methods of ApiController -- what's the profit? When to use?

(This probably duplicates the question ASP.NET MVC4 Async controller - Why to use?, but about webapi, and I do not agree with answers in there)
Suppose I have a long running SQL request. Its data should be than serialized to JSON and sent to browser (as a response for xhr request). Sample code:
public class DataController : ApiController
{
public Task<Data> Get()
{
return LoadDataAsync(); // Load data asynchronously?
}
}
What actually happens when I do $.getJson('api/data', ...) (see this poster http://www.asp.net/posters/web-api/ASP.NET-Web-API-Poster.pdf):
[IIS] Request is accepted by IIS.
[IIS] IIS waits for one thread [THREAD] from the managed pool (http://msdn.microsoft.com/en-us/library/0ka9477y(v=vs.110).aspx) and starts work in it.
[THREAD] Webapi Creates new DataController object in that thread, and other classes.
[THREAD] Uses task-parallel lib to start a sql-query in [THREAD2]
[THREAD] goes back to managed pool, ready for other processing
[THREAD2] works with sql driver, reads data as it ready and invokes [THREAD3] to reply for xhr request
[THREAD3] sends response.
Please, feel free to correct me, if there's something wrong.
In the question above, they say, the point and profit is, that [THREAD2] is not from The Managed Pool, however MSDN article (link above) says that
By default, parallel library types like Task and Task<TResult> use thread pool threads to run tasks.
So I make a conclusion, that all THREE THREADS are from managed pool.
Furthermore, if I used synchronous method, I would still keep my server responsive, using only one thread (from the precious thread pool).
So, what's the actual point of swapping from 1 thread to 3 threads? Why not just maximize threads in thread pool?
Are there any clearly useful ways of using async controllers?
I think the key misunderstanding is around how async tasks work. I have an async intro on my blog that may help.
In particular, a Task returned by an async method does not run any code. Rather, it is just a convenient way to notify callers of the result of that method. The MSDN docs you quoted only apply to tasks that actually run code, e.g., Task.Run.
BTW, the poster you referenced has nothing to do with threads. Here's what happens in an async database request (slightly simplified):
Request is accepted by IIS and passed to ASP.NET.
ASP.NET takes one of its thread pool threads and assigns it to that request.
WebApi creates DataController etc.
The controller action starts an asynchronous SQL query.
The request thread returns to the thread pool. There are now no threads processing the request.
When the result arrives from the SQL server, a thread pool thread reads the response.
That thread pool thread notifies the request that it is ready to continue processing.
Since ASP.NET knows that no other threads are handling that request, it just assigns that same thread the request so it can finish it off directly.
If you want some proof-of-concept code, I have an old Gist that artificially restricts the ASP.NET thread pool to the number of cores (which is its minimum setting) and then does N+1 synchronous and asynchronous requests. That code just does a delay for a second instead of contacting a SQL server, but the general principle is the same.
The profit of asynchronous actions is that while the controller is waiting for the sql query to finish no threads are allocated for this request, while if you used a synchronous method a thread would be locked up in the execution of this method from the start to end of that method. While SQL server is doing its job the thread isn't doing anything but waiting.
If you use asynchronous methods this same thread can respond to other requests while SQL server is doing its thing.
I believe your steps are wrong at step 4, I don't think it will create a new thread to do the SQL query. At 6 there isn't created a new thread, it is just one of the available threads that will be used to continue from where the first thread left off. The thread at 6 could be the same as started the async operation.
The point of async is not to make an application multi threaded but rather to let a single threaded application carry on with something different instead of waiting for a response from an external call that is executing on a different thread or process.
Consider a desk top application that shows stock prices from different exchanges. The application needs to make a couple of REST / http calls to get some data from each of the remote stock exchange servers.
A single threaded application would make the first call, wait doing nothing until it got the first set of prices, update it's window, then make a call to the next external stock price server, again wait doing nothing until it got the prices, update it's window... etc..
We could go all multi threaded kick off the requests in parallel and update the screen in parallel, but since most of the time is spent waiting for a response from the remote server this seems overkill.
It might be better for the thread to:
Make a request for the first server but instead of waiting for the answer leave a marker, a place to come back to when the prices arrive and move on to issuing the second request, again leaving a marker of a place to come back to...etc.
When all the requests have been issued the application execution thread can go on to dealing with user input or what ever is required.
Now when a response from one of the servers is received the thread can be directed to continue from the marker it laid down previously and update the window.
All of the above could have been coded long hand, single threaded, but was so horrendous going multi threaded was often easier. Now the process of leaving the marker and coming back is done by the compiler when we write async/await. All single threaded.
There are two key points here:
1) Multi threaded does still happen! The processing of our request for stock prices happens on a different thread (on a different machine). If we were doing db access the same would be true. In examples where the wait is for a timer, the timer runs on a different thread. Our application though is single threaded, the execution point just jumps around (in a controlled manner) while external threads execute
2) We lose the benefit of async execution as soon as the application requires an async operation to complete. Consider an application showing the price of coffee from two exchanges, the application could initiate the requests and update it's windows asynchronously on a single thread, but now if the application also calculated the difference in price between two exchanges it would have to wait for the async calls to complete. This is forced on us because an async method (such as one we might write to call an exchange for a stock price) does not return the stock price but a Task, which can be thought of as a way to get back to the marker that was laid down so the function can complete and return the stock price.
This means that every function that calls an async function needs to be async or to wait for the "other thread/process/machine" call at the bottom of the call stack to complete, and if we are waiting for the bottom call to complete well why bother with async at all?
When writing a web api, IIS or other host is the desktop application, we write our controller methods async so that the host can execute other methods on our thread to service other requests while our code is waiting for a response from work on a different thread/process/machine.
I my opinion the following describes a clear advantage of async controllers over synchronous ones.
A web application using synchronous methods to service high latency
calls where the thread pool grows to the .NET 4.5 default maximum of
5, 000 threads would consume approximately 5 GB more memory than an
application able the service the same requests using asynchronous
methods and only 50 threads. When you’re doing asynchronous work,
you’re not always using a thread. For example, when you make an
asynchronous web service request, ASP.NET will not be using any
threads between the async method call and the await. Using the thread
pool to service requests with high latency can lead to a large memory
footprint and poor utilization of the server hardware.
from Using Asynchronous Methods in ASP.NET MVC 4

Resources