Batch processing business flows - spring

I would like to ask about batch processing. I need to process 100,000 business flows, each consisting of these steps: generate a PDF (asynchronously), send a mail, and upload the document to an archive system. I am considering using Activiti with Spring Boot (async service tasks) because it gives me control over failed jobs and I can easily retry them. I don't know whether Activiti, Camunda, or some other tool is the better choice.

You could use a multi-instance call activity. With the multi-instance you can specify how often the call activity should be executed (in your case 100,000 times, once per PDF). The call activity will call your process model that archives the PDF. For each call (instance of the multi-instance) you can define a variable that is forwarded to the called process, so it is possible to have a list of PDF file names in the main process and forward one name to each subprocess.
Make sure that you mark the multi-instance as asyncBefore in order to use asynchronous continuation; otherwise this will not work with 100,000 instances.
The multi-instance call activity could look like this:
<bpmn:callActivity id="Task_0fl5th9" name="archiving pdf" calledElement="archivePdf">
  <bpmn:incoming>SequenceFlow_04xoo79</bpmn:incoming>
  <bpmn:outgoing>SequenceFlow_0036ezx</bpmn:outgoing>
  <!-- the collection drives the number of instances (one per PDF name),
       so no loopCardinality is needed; camunda:asyncBefore turns each
       instance into a separate job that can be retried on failure -->
  <bpmn:multiInstanceLoopCharacteristics camunda:asyncBefore="true" camunda:collection="pdfNames" camunda:elementVariable="pdfName" />
</bpmn:callActivity>
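To feed the collection, a minimal sketch of starting the main process from Spring could look like this (the process key "mainProcess" and the bean wiring are assumptions; RuntimeService is the Camunda Java API):

import java.util.List;
import java.util.Map;
import org.camunda.bpm.engine.RuntimeService;
import org.springframework.stereotype.Service;

@Service
public class ArchiveStarter {

    private final RuntimeService runtimeService;

    public ArchiveStarter(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    public void startArchiving(List<String> pdfNames) {
        // "pdfNames" must match the camunda:collection attribute of the
        // multi-instance call activity; each instance receives one element
        // of the list as its "pdfName" variable.
        Map<String, Object> variables = Map.of("pdfNames", pdfNames);
        runtimeService.startProcessInstanceByKey("mainProcess", variables);
    }
}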

Related

Difference between @OnScheduled and onTrigger of Apache NiFi?

I'm trying to implement a custom processor which reads messages from RocketMQ. Basically I need to:
- create a MessageConsumer once,
- then call the MessageConsumer to consume messages repeatedly.
Between @OnScheduled and onTrigger, which one should I use, and how do I achieve this?
You can create the MessageConsumer in a method annotated with @OnScheduled, store it as a field in the processor class, and then invoke it inside the onTrigger() method.
The @OnScheduled method will be called whenever the processor is scheduled to run (i.e. a user clicks/invokes the API to "start" the processor). The onTrigger() method runs every time the processor actually performs some unit of work (i.e. when one or more flowfiles are pulled from the incoming queue and operated on, or when the timer fires if this is the first processor in a flow segment). The Apache NiFi Developer Guide has more information on this, as well as a common scenarios and patterns section which may be helpful.
I would also look at the source code for ConsumeJMS and AbstractJMSProcessor as it is a similar pattern.
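A minimal sketch of that pattern, assuming a hypothetical MessageConsumer stand-in for the RocketMQ client (the NiFi lifecycle methods are the point here, not the consumer itself):

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.annotation.lifecycle.OnStopped;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class ConsumeRocketMQ extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Messages pulled from RocketMQ")
            .build();

    // created once when the processor is started, reused across onTrigger calls
    private volatile MessageConsumer consumer;

    @OnScheduled
    public void setup(ProcessContext context) {
        // hypothetical factory; replace with the real RocketMQ client setup
        this.consumer = MessageConsumer.connect("rocketmq-broker:9876");
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // called repeatedly for as long as the processor is running
        byte[] message = consumer.poll();
        if (message == null) {
            context.yield(); // nothing to consume right now, back off
            return;
        }
        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, out -> out.write(message));
        session.transfer(flowFile, REL_SUCCESS);
    }

    @OnStopped
    public void tearDown() {
        if (consumer != null) {
            consumer.close();
        }
    }

    /** Hypothetical stand-in for the RocketMQ consumer API. */
    interface MessageConsumer {
        static MessageConsumer connect(String broker) { throw new UnsupportedOperationException(); }
        byte[] poll();
        void close();
    }
}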

Merge two distributed asynchronous task results and trigger an action

I have the following complex scenario and am looking for the best solution:
- The first async process calls a third-party service and gets an Id
- The second async process gets an attribute from a client and tries to update a record in the database, based on the Id from the first process and the new attribute.
Also, I should mention that both of the above async tasks are triggered inside different APIs (Spring Boot based RESTful APIs) and are not inside the same method. In other words, it is a kind of distributed-systems problem.
The silliest solution would be having a loop that waits for the first async task's result and then completes the whole process (sketched below).
Any suggestion?
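That naive polling approach might look like this sketch (all names here are hypothetical; it assumes the first API persists the Id somewhere both services can read):

public class RecordUpdater {

    // hypothetical collaborators: a shared store where the first async process
    // saves the Id, and a repository used to update the record
    private final IdStore idStore;
    private final RecordRepository repository;

    public RecordUpdater(IdStore idStore, RecordRepository repository) {
        this.idStore = idStore;
        this.repository = repository;
    }

    // returns true if the record was updated before the timeout
    public boolean updateWhenIdArrives(String requestKey, String newAttribute)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + 30_000; // give up after 30 s
        while (System.currentTimeMillis() < deadline) {
            String id = idStore.findId(requestKey); // null until the first task finishes
            if (id != null) {
                repository.update(id, newAttribute);
                return true;
            }
            Thread.sleep(500); // blocks a thread while polling, which is why it is "silly"
        }
        return false; // the Id never showed up
    }

    interface IdStore { String findId(String requestKey); }
    interface RecordRepository { void update(String id, String newAttribute); }
}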

Run NiFi flow once and notify me when it is finished

I use the REST API in my program. I made a process group that converts a MongoDB collection to a JSON file.
I want the schedule to run only one time, so I set the "Run schedule" to 10000 sec; I will then stop the group once the data flow has run once. I added a Notify processor and a DistributedMapCacheService, but the DistributedMapCacheClientService of the Notify processor only communicates with the DistributedMapCacheService inside NiFi itself; it never notifies my program.
I tried to use my own socket server, but I only get the message "nifi" and nothing more.
My question is: if I only want the schedule to run once and then stop, how do I know when to stop it? Or is there some other way to achieve my purpose, like detecting whether the JSON file exists, or using incremental data? (If the schedule runs twice, the data will be duplicated.)
As @daggett said, you can do it in a synchronous way: use HandleHttpRequest as the trigger and HandleHttpResponse to manage the response.
For an asynchronous way you have several options for the notification, like PutTCP, PostHTTP, GetHTTP, FTP, the file system, XMPP, or whatever; a sketch of a TCP listener follows below.
Whether a second run duplicates elements depends on the processors you use; some of them keep state, others don't. If you are facing problems with repeated elements you can use the DetectDuplicate processor.
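If you go the asynchronous route with PutTCP, a minimal sketch of the listener on your program's side could look like this (the port is an assumption, and it assumes PutTCP is configured with a newline as the outgoing message delimiter):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FlowDoneListener {

    public static void main(String[] args) throws Exception {
        // PutTCP's "Hostname"/"Port" properties would point at this server
        try (ServerSocket server = new ServerSocket(9999)) {
            while (true) {
                try (Socket socket = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(socket.getInputStream(),
                                     StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // each line is the content of one flowfile sent by PutTCP
                        System.out.println("notification from NiFi: " + line);
                    }
                }
            }
        }
    }
}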

Serial consumption between message types

I have a MassTransit system that will consume 2 message types, one for a batch process, the other for CRUD operations on a single entity. Whilst the batch process is running, the CRUD operations should not be de-queued.
Is this possible to achieve using MassTransit? It seems the exchange binding to the message type name would potentially make this behavior difficult.
A solution would be to use one message type to denote both operations and then interrogate the message contents to discern between single and batch, but this feels like a code smell. Also, this would require concurrency configuration to ensure only one consumer is ever active.
Can anyone help with an alternative solution here? Essentially, we need to pause all message consumption whilst an event driven process is running.
Thanks in advance.
By pause, do you mean that you want the CRUD operations to be able to occur without being blocked by the batch process? If it's only a matter of not having the two separate message types get in the way of each other, the most logical solution is to use two separate queues: one receive endpoint for the batch process and another for the CRUD operations.
Now, if you truly need to separate the batch process such that it doesn't happen during the CRUD operations, that will require more work. And what if you receive a CRUD operation while the batch process is already running?
I think separate queues are your best solution, however.

Restful triggering of Camunda process definitions from nodejs

I’m a beginner at Camunda/BPMN and I want to use it to control what is going on in nodejs, most likely using a REST API, at least for now. (Unless folks have a better idea for how nodejs should talk to Camunda.) My goal is to deliver systems where non-programmers can update the business logic in very practical ways.
I'd like to trigger the start of perhaps more than one process by sending a REST message, say to reflect that "a new insurance policy has been sold". That might trigger the instantiation of, say, 2 processes on Monday, but perhaps on Tuesday we add a third, and now the same REST API call should trigger more activity on Wednesday. (I figure it is better for nodejs to know about events but not about the process definitions. After all, my goal is to use Camunda as a sort of business logic server for my application. The less the nodejs code needs to know, the better.)
Which REST API should I be using to express the message that, say, "a new insurance policy has been sold"? When I look at:
https://docs.camunda.org/manual/develop/reference/rest/signal/post-signal/
I find it very confusing. What should "name" match in the business process definitions? I assume I don't need an executionId? And I assume I can leave out tenantId?
Would some string in the message match the ID of a start event in one or more process definitions (or what has to match what)?
When I look at a process, is there an easy way to tell what variables I need to supply to start that process running?
Should I perhaps avoid using this event-oriented style of kicking off processes and just use the POST /process-definition/key/{key}/start? It would seem to me to be better form to trigger activity with events or signals or something like that rather than to have my nodejs code know about the specific process definition by name.
Should I be using events or signals in this case?
I gather that the start event should not be a "None Start Event", but I'm not clear on what type of start event to use if I want automatic triggering based on events or signals. Would a non-interrupting Message Start Event be the right sort? I'm finding this confusing.
Once I have triggered the process to start, what does nodejs need to send to step the process forward from one task in that instance to the next?
Thanks!
In order to instantiate a new workflow instance you have the following possibilities:
Start exactly one instance:
- Start a workflow instance by its known key: https://docs.camunda.org/manual/develop/reference/rest/process-definition/post-start-process-instance/
- Start a workflow instance by a message start event: https://docs.camunda.org/manual/develop/reference/rest/message/post-message/. A message can only start one specific workflow; the relationship between the message name and the process definition must be unique. The message start event is the one you have to use in your BPMN process model. See also https://docs.camunda.org/manual/develop/reference/bpmn20/events/message-events/. This might indeed be the better approach to make your client independent of the process definition key.
Start multiple instances:
- Start workflow instances by a BPMN signal event: https://docs.camunda.org/manual/develop/reference/rest/signal/post-signal/. One signal can start many instances at once.
The name of the message or the name of the signal is configured in the BPMN model. Both could work for your use case.
Once a process instance is started, it will automatically execute the next steps.
Perhaps following this example (https://blog.bernd-ruecker.com/use-camunda-without-touching-java-and-get-an-easy-to-use-rest-based-orchestration-and-workflow-7bdf25ac198e) step by step will give you a better idea?
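For example, correlating the "new insurance policy has been sold" message would be a POST to the message endpoint. A minimal sketch (in Java rather than nodejs, but the payload shape is the same; the engine URL, message name, and variable are assumptions):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PolicySoldNotifier {

    public static void main(String[] args) throws Exception {
        // "messageName" must match the message name on the message start event(s)
        // in your BPMN model; variables use the REST API's typed value format
        String body = """
                {
                  "messageName": "NewPolicySold",
                  "processVariables": {
                    "policyId": { "value": "P-1234", "type": "String" }
                  }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/engine-rest/message"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}

Every process definition whose message start event carries that message name is a candidate; because the message-to-definition relationship is unique, adding a third process on Tuesday means modeling a new message start event rather than changing the caller. If you want one call to fan out to several processes, the signal endpoint is the one that starts many instances at once.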
