I have a NiFi flow with the SplitJson processor. The split parts are processed by some other processors, and then the Notify processor notifies the Wait processor; everything works fine there. But I can't figure out how to transfer the results of that processing back to the original FlowFile that waits at the Wait processor. I want to combine all the results in one place (probably in the original FlowFile) so I can do something with the final combined result. How can I do that?
I have found the solution, thanks to SivaprasannaSethuraman.
I just have to merge my split FlowFiles before calling the Notify processor. Then, at the Wait processor, I only have to wait for one event instead of waiting for ${fragment.count} events. That way I have my expected merged result in that single event at the Wait processor.
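For reference, a minimal sketch of the processor settings this approach implies (property names are from the standard MergeContent, Notify and Wait processors; the signal identifier expression is an assumption, since the thread does not spell it out):

MergeContent (after the per-split processing, before Notify)
    Merge Strategy: Defragment                       # reassemble the splits into one FlowFile
Notify (on the merged FlowFile)
    Release Signal Identifier: <same expression as on Wait>   # assumption: any expression that evaluates to the
                                                               # same value on the merged FlowFile and on the waiting original
Wait (on the original FlowFile)
    Release Signal Identifier: <same expression as on Notify>
    Target Signal Count: 1                           # a single event, not ${fragment.count}

The combined result travels with the merged FlowFile, so only a single release signal is needed.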
Hello, I am trying to build a data flow in NiFi that goes ListFile -> FetchFile -> data transformation -> PutKudu.
However, I want to somehow pause the FetchFile processor, or hold on to the FlowFiles received from the local server, until the previous FlowFile has gone completely through the downstream flow (past the PutKudu processor), and only then release the next FlowFile, one at a time.
I looked at the Wait and Notify processors but could not find a way to notify the next FlowFile in the queue to be released.
Any help is appreciated.
I think you can achieve what you want by putting all the processors after FetchFile in a Process Group with an Input Port. Connect FetchFile to the Process Group, then configure the Process Group itself to have a FlowFile Concurrency of 1.
This should cause the Input Port inside the Process Group to accept the next FlowFile only when the previous one is done.
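A rough sketch of the configuration this refers to (setting names as they appear in recent NiFi releases; treat the exact labels as an assumption if you are on an older version):

ListFile -> FetchFile -> [Process Group: Input Port -> data transformation -> PutKudu]

Process Group settings:
    Process Group FlowFile Concurrency: Single FlowFile Per Node   # only one FlowFile inside the group at a time
    Process Group Outbound Policy: Batch Output                    # optional: hold output until that FlowFile is fully done

With Single FlowFile Per Node, the Input Port will not pull the next FlowFile from the FetchFile connection until the current one has left the group.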
Before we get into my problem, a quick (and simplified) overview of my NiFi Flow:
I get a CSV file from a file server, convert it into JSON, and split the array. So I now have one FlowFile for each line of my original CSV file. I then transform each of the FlowFiles, depending on some conditions. Some of the FlowFiles will be routed to multiple transformations, which means some FlowFiles will be split again (by the RouteOnAttribute processor) and become multiple FlowFiles once more. In the end, I want to merge the files back into one big file. So my flow looks something like this:
GetFile -> ConvertRecord -> SplitJson -> RouteOnAttribute to several transformations -> Funnel Everything -> MergeContent
Now, I want to make sure that every FlowFile that belongs to that first CSV file ends up in the same final file. I tried using the scheduling time in the MergeContent processor, but that is far too unreliable. I looked into Notify and Wait, but I simply cannot get them to work. I don't want to wait right after my split and continue on success, because I would lose all my transformations. I also cannot use the ${fragment.count} attribute that my split processor gives me, because one of my FlowFiles might have been routed to different transformations and ended up as multiple FlowFiles.
I cannot find a way to merge every FlowFile from that original CSV into one final file. Can someone help?
Thanks in advance!
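To illustrate why the fragment-count approach described above breaks down, this is roughly what a Defragment-based MergeContent would look like, with made-up numbers:

MergeContent
    Merge Strategy: Defragment        # correlates on fragment.identifier and expects fragment.count fragments

# Hypothetical example: SplitJson sets fragment.count = 10 on every split.
# If one split is routed to 3 transformations, 12 FlowFiles eventually reach MergeContent,
# but each still carries fragment.count = 10, so the bin is considered complete after the
# first 10 and the remaining 2 are left over, which is exactly the problem the question describes.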
Suppose I have a directory which contains multiple files. I want to list the directory, fetch all the files, and process them. But if there is a FlowFile with a particular filename (e.g., file.txt), then I want to process that FlowFile first, before processing any other. Please note that I can't list the directory again due to my use case limitations; it has to be done in a single flow.
You can start with something similar to the flow below. Use Wait/Notify to implement a gate-like mechanism. But I think that, for this to work as expected, you need to set a Run Schedule for ListFile, and the execution interval should be greater than the Expiration Duration of the Wait processor. That way, if the specific file is not present in the list produced by that execution attempt, the remaining files will still be processed before the next execution of ListFile and won't be stuck in the Wait processor's queue.
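The original answer referred to a flow screenshot that is not reproduced here; based only on what the answer describes, the gate could be sketched roughly like this (the routing condition and the durations are assumptions):

ListFile (Run Schedule: e.g. 5 min)             # must be longer than the Wait Expiration Duration below
    -> RouteOnAttribute on ${filename}
         matches file.txt -> FetchFile -> process -> Notify (Release Signal Identifier: a fixed value, e.g. "gate")
         everything else  -> Wait (Release Signal Identifier: "gate", Expiration Duration: e.g. 1 min)
                               success -> FetchFile -> process
                               expired -> FetchFile -> process   # if file.txt was not listed, let the rest through

So the other files are held at the Wait gate until file.txt has been processed, or until the gate expires because file.txt was not in that listing.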
I am creating a flow for processing some data from multiple sources (same platform, different customers). Each FlowFile is generated by triggering the HandleHttpRequest processor. I can only process one file at a time for a given customer. The process is also asynchronous (I keep looping until I receive the response from the API saying that the process has finished).
What I have right now is a Wait/Notify flow, so after one FlowFile gets processed, Wait will release another file to process. However, this only works for one customer. What I want is to have a dynamic number of Wait processors, or one Wait processor that can release FlowFiles conditionally (by attribute).
Example:
I have customers A and B. Each has generated FlowFiles with the attribute
customer: ${cust_name}
These FlowFiles have been stopped at the Wait processor and are waiting for a notification from the Notify processor. The order of these files is unknown (the order of the files for a single customer is always sorted). This means that the queue can look like this: (A3 B3 A2 A1 B2 B1). What I want is to notify the Wait processor to release the next A element or the next B element, by attribute.
Is something like this possible?
I found the solution to what I wanted to achieve!
So I have a Wait processor accepting files with an attribute customer, which has either the value A or B.
The files then flow in a loop through the Wait processor's wait relationship, back into the Wait processor.
What happens is that the order in which the files enter the wait queue is always the same, and the Wait processor always looks only at the first entry in the queue.
To achieve the perpetual cycling of FlowFiles, you need to configure the wait queue with the FirstInFirstOutPrioritizer.
However, this does not guarantee that the Wait processor will release the oldest FlowFile, because the wait queue is constantly changing.
But there is a solution for this: the Wait Penalty Duration property. It makes the processor skip the first file in the queue if it did not match the signal, then the second, the third, and so on, until the desired oldest file is found (or the penalty expires). You can find the whole conversation here: https://github.com/apache/nifi/pull/3540
It works with the Run Schedule set to 0 and the wait queue otherwise at default settings.
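Putting that together, a rough sketch of the configuration described (the signal identifier expression is an assumption based on the customer attribute used above; the durations are placeholders):

Wait
    Release Signal Identifier: ${customer}      # one signal per customer (A, B, ...)
    Wait Penalty Duration: e.g. 3 sec           # lets it skip queued files whose signal has not arrived yet
    Run Schedule: 0 sec
    wait relationship looped back into the Wait processor itself
        (connection configured with FirstInFirstOutPrioritizer)
Notify (run after a customer's file has finished processing)
    Release Signal Identifier: ${customer}      # releases the next waiting file for that same customer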
I have a scenario in NiFi where complex task executions happen in many different processors. I would like to append one more processor at the end of this flow. This processor should wait for all the other processors to finish and then execute, just once.
How would it be possible to achieve this?
You may want to look at the Wait/Notify processors for performing this type of coordination...
https://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.Wait/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.Notify/index.html
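The linked post walks through the general pattern; as a minimal sketch (the counter name, identifier and count below are placeholders, and this assumes every upstream branch can be routed through a Notify when it finishes):

# Each upstream branch ends in a Notify:
Notify
    Release Signal Identifier: ${batch.id}      # placeholder: some identifier shared by the whole run
    Signal Counter Name: done                   # placeholder counter name
    Signal Counter Delta: 1

# A single trigger FlowFile sits at a Wait until everything has signalled:
Wait
    Release Signal Identifier: ${batch.id}
    Signal Counter Name: done
    Target Signal Count: <number of upstream FlowFiles/branches>

# The success relationship of Wait then feeds the final processor, so it runs exactly once per run.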