How to know when all the fragments have been processed - apache-nifi

I have the following flow:
Following is an explanation of the flow:
1- I have a large file that I split into multiple fragments with the SplitRecord processor.
2- All the fragments pass through an ExecuteStreamCommand processor, which does some processing on each fragment.
3- The processing in the ExecuteStreamCommand processor can either fail or succeed.
4- At the end, how can I know that all the fragments have been processed (whether each one failed or passed)?
I am hoping to achieve this using NiFi processors.

Related

NiFi: MergeRecord is generating duplicates

I'm having some trouble with the MergeRecord processor in NiFi. You can see the whole NiFi flow below: I'm getting a JSON array from an API, then I split it, apply some filters, and then I want to build the JSON array again.
Nifi workflow
I'm able to build the correct JSON array from all the chunks, but the problem is that the processor keeps generating data indefinitely. When I execute the job step by step (by starting/stopping every processor one by one) everything is fine, but when MergeRecord is running it generates the same data over and over, even if I stop the beginning of the flow (so there is no more input...).
You can see a screenshot below of the data stacking up in the "merged" box.
data stacked
I scheduled this processor to run every 10 seconds, and after 30 seconds you can see that it executed 3 times and generated the same file 3 times, even though there is no more data upstream. It's strange because when I look at the "original" box of the processor I can see the correct original amount of data (18.43 KB), but the merged part keeps increasing...
Here is the configuration of the MergeRecord:
configuration
I suppose I'm missing something, but I don't know what!
Thank you for your help,
Regards,
Thomas

Working of onTrigger - nifi custom processor

I just started learning about custom processors in NiFi. I want to understand a specific case of how onTrigger works. I am doing some operations in the onTrigger function using property values that are defined on the processor in the NiFi flow.
Example: a property of the custom processor takes a string separated by ',', and in the onTrigger function I write code that converts the string into a String array and removes the extra whitespace.
My question is: will this operation run every time a flowfile passes through the custom processor, or will the value be converted only once?
I tried going through the official development docs but couldn't find information on this.
The Java code of a processor is compiled when you run a Maven build to produce the NAR file. The code is not compiled by NiFi itself.
You then deploy a NAR file to a NiFi instance by placing it in the lib directory, and then you use components from that NAR in your flow by adding them to the canvas.
Once a component is on the canvas and it is started, then the onTrigger method is called according to the scheduling strategy.
Whatever code is in onTrigger will run for every execution of the processor, so your code to read the property and split the value will run every time.
If the property supports expression language from flow files, then you need to run this code every time in onTrigger because the resulting value could be different for every flow file.
If the property does not support expression language from flow files, then you can instead use a method annotated with @OnScheduled to process the property value into whatever you need and store it in a member variable of the processor; this way it only happens once.
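For illustration, here is a minimal sketch of the two patterns described above. The class name (SplitKeysProcessor), property name ("Keys"), and relationship are made up for this example; it assumes the standard nifi-api classes and a modern Java runtime.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

public class SplitKeysProcessor extends AbstractProcessor {

    // Hypothetical property: a comma-separated list with no flow-file
    // expression language support, so its value cannot differ per flow file.
    static final PropertyDescriptor KEYS = new PropertyDescriptor.Builder()
            .name("Keys")
            .description("Comma-separated list of keys")
            .required(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    private volatile List<String> keys;

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return List.of(KEYS);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @OnScheduled
    public void parseKeys(final ProcessContext context) {
        // Runs once when the processor is started, not once per flow file.
        keys = Arrays.stream(context.getProperty(KEYS).getValue().split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // onTrigger runs on every scheduled execution, but "keys" was parsed
        // only once in parseKeys(). If the property supported expression
        // language per flow file, the split/trim would have to happen here,
        // using context.getProperty(KEYS).evaluateAttributeExpressions(flowFile).
        flowFile = session.putAttribute(flowFile, "keys.count", String.valueOf(keys.size()));
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```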

NiFi: Passing a flowfile to the next processor by datetime from filename

I have a NiFi flow where flowfiles travel toward an ExecuteStreamCommand processor, which runs a Python script to do some processing. The files in the flow are named with a date format (MMdd_HHmm), for example: 0505_1015.csv, 0505_1030.csv, 0505_1045.csv, and so on. But in my flow I need each file to run in order (if several are in the flow at the same time), so it is not first in, first out; the file whose name has the earliest date and time needs to run and go into the processor first.
So, from the example, if 0505_1015.csv, 0505_1030.csv, and 0505_1045.csv are currently in the flow waiting for the next processor, 0505_1015.csv should get into the processor first, and then, after it finishes running, 0505_1030.csv should be executed.
So, if the files are in the flow, they should run starting from the earliest datetime. As shown in the picture, if there is more than one flowfile in the flow, the one that executes first should be the one with the earliest datetime in its name.
I have read a bit about EnforceOrder and prioritizers and found this post, but I still cannot figure out how to do this.
Thank you

Does Apache NiFi support batch processing?

I need to know if Apache NiFi supports running processors until completion.
"the execution of a series of processors in process group wait for anothor process group results execution to be complete".
For example:
Suppose there are three processors in NiFi UI.
P1-->P2-->P3
P-->Processor
Now I need to run P1; once it has run completely, run P2, and so on, so that they run in sequence, each waiting for the previous one to complete.
EDIT-1:
Just as an example: I have data at a web URL. I can download that data using the GetHTTP processor, then store the content with PutFile. Once the file is saved in the PutFile directory, run FetchFile to process that file into my database, as in the workflow below.
GetHTTP-->PutFile-->FetchFile-->DB
Is this possible?
NiFi itself is not really a batch processing system; it is a data flow system more geared towards continuous processing. Having said that, there are some techniques you can use to do batch-like operations, depending on which processors you're using.
The Split processors (SplitText, SplitJSON, etc.) write attributes to the flow files that include a "fragment.identifier" which is unique for all splits created from an incoming flow file, and "fragment.count" which is the total number of those splits. Processors like MergeContent use those attributes to process a whole batch (aka fragment), so the output from those kinds of processors would occur after an entire batch/fragment has been processed.
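As a rough sketch of the same idea outside of MergeContent, a custom processor could read those two attributes itself to notice when every fragment of a batch (whether it succeeded or failed upstream) has passed through. Everything below (the class name, relationship names, and the batch.identifier attribute) is hypothetical; a real implementation would also need to handle missing attributes and processor restarts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class FragmentCompletionTracker extends AbstractProcessor {

    static final Relationship REL_FRAGMENTS = new Relationship.Builder()
            .name("fragments")
            .description("Every fragment passes straight through here")
            .build();
    static final Relationship REL_BATCH_COMPLETE = new Relationship.Builder()
            .name("batch complete")
            .description("One signal flow file per fully processed batch")
            .build();

    // fragment.identifier -> number of fragments seen so far
    private final ConcurrentHashMap<String, AtomicInteger> seen = new ConcurrentHashMap<>();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_FRAGMENTS, REL_BATCH_COMPLETE);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        final String id = flowFile.getAttribute("fragment.identifier");
        final int expected = Integer.parseInt(flowFile.getAttribute("fragment.count"));
        final int count = seen.computeIfAbsent(id, k -> new AtomicInteger()).incrementAndGet();

        if (count >= expected) {
            // The last fragment of this batch has arrived (pass or fail):
            // emit a zero-byte child flow file as a "batch complete" signal.
            FlowFile signal = session.create(flowFile);
            signal = session.putAttribute(signal, "batch.identifier", id);
            session.transfer(signal, REL_BATCH_COMPLETE);
            seen.remove(id);
        }
        session.transfer(flowFile, REL_FRAGMENTS);
    }
}
```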
Another technique is to write an empty file in a temp directory when the job is complete; a ListFile processor (pointing at that temp directory) would then issue a flow file when the file is detected.
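For instance, if the job itself were a Java program, its last step might simply drop an empty marker file into that directory. The directory path and file name below are hypothetical examples only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteCompletionMarker {
    public static void main(String[] args) throws IOException {
        // Hypothetical temp directory that a ListFile processor is watching.
        final Path markerDir = Path.of("/tmp/batch-signals");
        Files.createDirectories(markerDir);
        // Zero-byte marker; ListFile emits a flow file once it detects it.
        Files.write(markerDir.resolve("job-complete.marker"), new byte[0]);
    }
}
```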
Can you describe more about the processors in your flow, and how you would know when a batch was complete?

NiFi ExecuteProcess processor

I am using the ExecuteProcess processor to execute a shell script, but I want to make sure that the flow goes to the next processor only if the script executed successfully. Is there any way to check this?
In addition to JDP10101's answer, you could alternatively consider using ExecuteStreamCommand (with a GenerateFlowFile in front of it to trigger its execution) instead. ExecuteStreamCommand writes an attribute "execution.status" that gives the exit code for the process. You could use this with RouteOnAttribute to handle success (presumably execution.status == 0) and failure (execution.status != 0).
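For example (the property name here is just illustrative), you could give RouteOnAttribute a property named success with the value ${execution.status:equals('0')}, and send the unmatched relationship, which would then carry non-zero exit codes, down your failure path.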
ExecuteProcess runs a command outside of NiFi, so determining what "success"/"failure" means can be difficult. What your command outputs in the event of a "failure" will determine what your flow logic needs to be.
First, you will probably want to set "Redirect Error Stream" to true so that the error stream ends up in the content of the FlowFile. Then you will need to determine what series of characters indicates a "failure"; this could be as simple as "ERROR" or something more complex, depending on your process. Once you determine what a "failure" means, use RouteText to route off the FlowFiles whose content contains those characters.
