I am creating a flow for processing data from multiple sources (same platform, different customers). Each FlowFile is generated by triggering the HandleHttpRequest processor. I can only process one file at a time for a given customer. The process is also asynchronous (I poll in a loop until the API responds that processing has finished).
What I have right now is a Wait/Notify flow: after one FlowFile is processed, Wait releases the next file. However, this only works for a single customer. What I want is either a dynamic number of Wait processors, or a single Wait processor that can release FlowFiles conditionally (by attribute).
Example:
I have customer A and B. Each has generated FlowFiles with attribute
customer: ${cust_name}
These FlowFiles are held at the Wait processor, waiting for a notification from the Notify processor. The overall order of the files is unknown (though the files for any single customer are always in order), so the queue can look like this: A3 B3 A2 A1 B2 B1. What I want is to notify the Wait processor to release the next A element or the next B element, selected by attribute.
Is something like this possible?
I found the solution to what I wanted to achieve!
So I have a Wait processor accepting files with a customer attribute, which has a value of either A or B.
The files loop back into the Wait processor through its wait relationship. The order in which they re-enter the wait queue is therefore always the same, and the Wait processor only ever looks at the first entry in the queue.
To keep the FlowFiles cycling, you need to configure the wait queue with the FirstInFirstOutPrioritizer.
However, this alone does not guarantee that the Wait processor will release the oldest matching FlowFile, because the wait queue is constantly changing.
But there is a solution for this: the Wait Penalty Duration property. With it, the processor skips the first file in the queue if it does not match the signal, then the second, the third, and so on, until the desired oldest matching file is found (or the penalty expires). You can find the whole conversation here: https://github.com/apache/nifi/pull/3540
It works with the Run Schedule set to 0 and the wait queue at its default settings.
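To make the skip-and-penalize idea concrete, here is a rough Python sketch of the behavior described above. This is an illustration of the idea only, not NiFi's actual implementation; the queue layout and field names are assumptions.

```python
import time

def next_releasable(queue, released_signals, penalty_seconds):
    """Return the first queued FlowFile whose release signal is present.

    A file whose signal has not arrived yet is penalized, so a later run
    can look past it to older files deeper in the queue.
    """
    now = time.monotonic()
    for flowfile in queue:
        if flowfile.get("penalized_until", 0.0) > now:
            continue  # still penalized from an earlier non-match: skip it
        if flowfile["customer"] in released_signals:
            queue.remove(flowfile)
            return flowfile  # route to the 'success' relationship
        # No signal for this file yet: penalize it so the next run
        # can inspect the files queued behind it.
        flowfile["penalized_until"] = now + penalty_seconds
    return None  # nothing releasable on this run

queue = [{"customer": "A"}, {"customer": "B"}, {"customer": "A"}]
print(next_releasable(queue, {"B"}, penalty_seconds=5))  # releases the B file,
                                                         # skipping the first A
```

Without the penalty step, the function would stop at the first A forever, which is exactly the stall the Wait Penalty Duration setting avoids.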
Consider this flow:
It's a simple flow that authenticates to an HTTP API and handles success/failure. On the failure path, you can see I added a ControlRate processor, and there are 2 FlowFiles in its queue. I have it set to pass only one FlowFile every 30 seconds (Time Duration = 30 sec, Maximum Rate = 1). So the queue will keep filling while the authentication process keeps failing.
What I want is to essentially drop all but the first FlowFile in this queue, because I don't want it to continue re-triggering the authentication processor after we get a successful authentication.
I believe I can accomplish this by setting the FlowFile Expiration (on the highlighted queue) to be just longer than the 30 second Time Duration of the ControlRate processor. But this seems a bit arbitrary and not quite correct in my mind.
Is there a way to say "take first, drop rest" for the highlighted queue?
Hello, I am trying to build a NiFi data flow: ListFile -> FetchFile -> data transformation -> PutKudu.
However, I want to pause the FetchFile processor, or hold the FlowFiles fetched from the local server, until the previous FlowFile has gone all the way through the downstream flow (past the PutKudu processor), and only then release the next FlowFile, one at a time.
I looked at the Wait and Notify processors but could not find a way to notify the next FlowFile in the queue to be released.
Any help is appreciated.
I think you can achieve what you want by putting all the processors after FetchFile in a Process Group with an Input Port. Connect FetchFile to the Process Group, then configure the Process Group itself with a FlowFile Concurrency of 1 (Single FlowFile Per Node).
This causes the Input Port inside the Process Group to accept a new file only once the previous one is done.
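The admission behavior can be simulated with a one-permit semaphore. This is a sketch of the assumed semantics, not NiFi internals; the file names and the sleep standing in for the transformation + PutKudu work are made up.

```python
import threading
import time

# One permit = at most one FlowFile inside the Process Group at a time.
gate = threading.Semaphore(1)
completed = []

def process_group(flowfile):
    time.sleep(0.01)            # stands in for transformation + PutKudu
    completed.append(flowfile)
    gate.release()              # group is empty again: admit the next file

threads = []
for flowfile in ["f1.csv", "f2.csv", "f3.csv"]:
    gate.acquire()              # Input Port blocks while a file is in flight
    t = threading.Thread(target=process_group, args=(flowfile,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(completed)                # files finish strictly one at a time, in order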
Suppose I have a directory containing multiple files. I want to list the directory, fetch all the files, and process them. But if there is a FlowFile with a particular filename (e.g., file.txt), I want to process that FlowFile first, before any other. Please note that I can't list the directory again due to my use case limitations; it has to happen in a single flow.
You can start with something similar to the flow below. Use Wait/Notify to implement a gate-like mechanism. For this to work as expected, you need to set a Run Schedule on ListFile whose interval is greater than the Expiration Duration of the Wait processor. That way, if the specific file is not present in a given listing, the other files still get processed before the next ListFile execution instead of being stuck in the Wait processor's queue.
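The timing constraint above can be stated as a small check. The concrete durations are assumed example values, not anything from the original flow:

```python
# The Wait processor's Expiration Duration must be shorter than ListFile's
# Run Schedule, so files that never receive a signal leave the Wait queue
# before the next listing runs.
listfile_run_schedule_sec = 60   # assumed ListFile Run Schedule
wait_expiration_sec = 30         # assumed Wait Expiration Duration

def expires_before_next_listing(queued_at_sec):
    """A file queued at t seconds expires at t + wait_expiration_sec."""
    return queued_at_sec + wait_expiration_sec < listfile_run_schedule_sec

print(expires_before_next_listing(0))   # True: file is gone before t=60
```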
I am trying to set up Wait and Notify processors to execute a final flow when several complex flows finish. However, I do not understand why the counters do not reset.
On top of that, the Wait processor sends FlowFiles to the "success" path when the counter is greater than the "Target Signal Count" property.
Deleting the cache clients and servers and creating new ones resets the counters, and the problem does not reappear until I empty the queues after the Notify processor.
With a hardcoded release signal identifier, the Wait processor will pass FlowFiles to the success relationship only once: when the signal counter reaches the target signal count of 2. When another FlowFile is generated, the same release signal identifier is used again, so the Notify processor increases the count beyond 2.
On top of that, the wait process sends the flowfiles to the "success" path when the counter is greater than "Target Signal Count" attribute.
No. FlowFiles are sent to the success relationship when the signal count exactly matches your target signal count.
You can solve this problem in two ways.
First, you can use a dynamic release signal identifier that changes every time a new flowfile is generated. GenerateFlowFile creates a filename attribute, which you can use. This way you use a new cache key for every new flowfile, so you get a fresh counter for every new root flowfile.
Second, you can use PutDistributedMapCache to update the counter manually.
I created a flow to test the first solution:
GenerateFlowFile contains some simple text:
SplitText simply splits the text, resulting in one flowfile per line. The original flowfile is then routed to the Wait processor:
Notice that the filename attribute is used to set a dynamic release signal identifier. ${fragment.count} is provided by the SplitText processor and holds the total number of splits, i.e. the number of lines in this case. Now you have to increase the counter with the Notify processor:
Once all lines have been routed to Notify, a signal for the counter named chunks is released, and the Wait processor routes the original flowfile to the success relationship.
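A minimal Python sketch of the counter semantics just described (an illustration, not NiFi code): Notify increments a counter stored under the release signal identifier, and Wait releases the original flowfile once the count reaches ${fragment.count}. The dictionary cache, the example filename, and the split count below are assumptions.

```python
cache = {}  # stands in for the DistributedMapCache server

def notify(signal_id, counter_name="chunks", delta=1):
    """Increment a named counter under the release signal identifier."""
    counters = cache.setdefault(signal_id, {})
    counters[counter_name] = counters.get(counter_name, 0) + delta

def wait(signal_id, target_count, counter_name="chunks"):
    """True once the signal count reaches the target (fragment.count)."""
    return cache.get(signal_id, {}).get(counter_name, 0) >= target_count

signal = "file1.txt"   # the filename attribute gives each run a fresh counter
fragment_count = 3     # ${fragment.count} from SplitText
for _ in range(fragment_count):
    notify(signal)     # one Notify per processed split
print(wait(signal, fragment_count))  # True: original flowfile is released
```

Because the signal identifier comes from the filename, a new root flowfile starts a brand-new counter instead of inflating the old one, which is exactly why the hardcoded identifier failed.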
PS: to dive deeper into Wait/Notify you might check out this blog post: How to wait for all fragments to be processed, then do something?.
I have a NiFi flow with the SplitJson processor. The split parts are processed by some other processors, then the Notify processor notifies the Wait processor, and everything works there. But I can't figure out how to transfer the results of that processing back to the original FlowFile waiting at the Wait processor. I want to combine all the results in one place (probably on the original FlowFile) so I can do something with the final combined result. How can I do that?
I have found the solution, thanks to SivaprasannaSethuraman!
I just have to merge my split flowfiles before calling the Notify processor. Then at the Wait processor I only have to wait for one event instead of ${fragment.count} events, so I get my expected merged result in that single event at the Wait processor.
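A sketch of this accepted approach (illustrative names, not NiFi code): merge all split results into one flowfile first, for example with a MergeContent processor, then send a single Notify, so Wait only needs a Target Signal Count of 1.

```python
cache = {}  # stands in for the DistributedMapCache server

def notify(signal_id):
    cache[signal_id] = cache.get(signal_id, 0) + 1

def wait(signal_id, target_count=1):
    return cache.get(signal_id, 0) >= target_count

splits = ["line1\n", "line2\n", "line3\n"]  # processed split results
merged = "".join(splits)   # all results combined into one flowfile first
notify("file1.txt")        # a single Notify for the merged flowfile
print(wait("file1.txt"))   # True: Wait needs only one event now
```

The merged content then travels alongside the single signal, so the combined result is available at the Wait processor without reassembling anything there.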