I am using the ExecuteProcess processor to execute a shell script, but I want to make sure the flow goes to the next processor only if the script executed successfully. Is there any way to check this?
In addition to JDP10101's answer, you could consider using ExecuteStreamCommand (with a GenerateFlowFile in front of it to trigger its execution) instead. ExecuteStreamCommand writes an "execution.status" attribute containing the exit code of the process. You could use this with RouteOnAttribute to handle success (presumably execution.status == 0) and failure (execution.status != 0).
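For example, if the command ExecuteStreamCommand runs simply propagates the exit code of the real job, that code flows straight into execution.status. A minimal sketch, assuming a hypothetical wrapper script (the paths are placeholders):

#!/bin/sh
# Wrapper invoked by ExecuteStreamCommand. Its exit code becomes the
# "execution.status" attribute on the outgoing FlowFile.
/path/to/real_job.sh    # placeholder for your actual command
exit $?                 # propagate the job's exit code

A RouteOnAttribute property with an expression like ${execution.status:equals('0')} would then capture the success case.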
ExecuteProcess is running a command outside of NiFi, so determining what "success"/"failure" means can be difficult. What your command outputs in the event of a "failure" will determine what your flow logic looks like.
First, you will probably want to set "Redirect Error Stream" to true in order to have the error stream included in the content of the FlowFile. Then you will need to determine what series of characters indicates a "failure". This could be as simple as "ERROR" or more complex, depending on your process. Once you determine what a "failure" means, use RouteText to route off FlowFiles whose content contains those characters.
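For instance, if the script emits a recognizable token on its error stream, that token will appear in the FlowFile content once "Redirect Error Stream" is true. A minimal sketch, assuming a hypothetical script (the command path is a placeholder):

#!/bin/sh
# Run by ExecuteProcess with "Redirect Error Stream" set to true, so
# anything written to stderr lands in the FlowFile content as well.
if ! /path/to/do_work.sh; then    # placeholder for your actual command
    echo "ERROR: processing failed" >&2
    exit 1
fi

RouteText with its Matching Strategy set to "Contains" and a routing property of ERROR could then pull those failure FlowFiles aside.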
Suppose I have a directory that contains multiple files. I want to list the directory, fetch all the files, and process them. But if there is a FlowFile with a particular filename (e.g., file.txt), then I want to process that FlowFile first, before processing any other. Please note I can't list the directory again due to my use-case limitations; it has to be done in a single flow.
You can start with something similar to the flow below. Use Wait-Notify to implement a gate-like mechanism. For this to work as expected, I think you need to set a Run Schedule for ListFile whose execution interval is greater than the Expiration Duration of the Wait processor, so that if a specific file is not present in the list for a given execution attempt, the files that were listed are still processed before the next execution of ListFile instead of being stuck in the Wait processor's queue!
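As a rough sketch of that gate (the signal key here is only an example), Wait and Notify are paired on a shared Release Signal Identifier:

Notify -> Release Signal Identifier: ${filename}
Wait   -> Release Signal Identifier: ${filename}, Expiration Duration: 10 min

Both processors point at the same distributed map cache service, and ListFile's Run Schedule is then set to an interval longer than that Expiration Duration, so FlowFiles that never receive a signal are released via Wait's "expired" relationship before the next listing.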
Hello, I have a simple NiFi process group consisting of a GetTwitter processor connected to an ExecuteStreamCommand:
GetTwitter ---> ExecuteStreamCommand
The ExecuteStreamCommand reads the FlowFile and does some analytics. Since the models it loads into memory are quite heavy, I was wondering whether there is a way to keep the command alive between invocations and still get the resulting FlowFile, without the process being closed each time.
Thank you in advance
I have a NiFi flow where FlowFiles travel toward an ExecuteStreamCommand processor that executes a Python script to do some processing. The files running through the flow are named with a date format (MMdd_HHmm), for example: 0505_1015.csv, 0505_1030.csv, 0505_1045.csv, and so on. But in my flow I need the files to run in order (when several are in the flow at the same time), so it is not first-in-first-out: the file whose name has the earliest date and time needs to go into the processor first.
So, from the example, if 0505_1015.csv, 0505_1030.csv, and 0505_1045.csv are currently in the flow waiting for the next processor, 0505_1015.csv will get into the processor first; then, after it finishes running, 0505_1030.csv will be executed.
So, whenever files are in the flow, they should run starting from the earliest datetime. As shown in the picture, if there is more than one FlowFile in the flow, the one that executes first is the one with the earliest datetime in its name.
I have read about EnforceOrder and prioritizers and found this post, but I still cannot figure out how to do this.
Thank you
Like the question states, is there some way to synchronize NiFi process groups or pipelines that don't/can't connect in the UI?
E.g., I have a process where I want to getFTP->putHDFS->moveHDFS (which actually ends up being getFTP->putHDFS->listHDFS->moveHDFS, see https://stackoverflow.com/a/50166151/8236733). However, listHDFS does not seem to take any incoming connections. Trying to do something with process groups like P1{getFTP->putHDFS->outport}->P2{inport->listHDFS->moveHDFS} runs into the same problem (listHDFS can't seem to take any incoming connections). We don't want to moveHDFS before we ever even get anything from getFTP, but given the above, I don't see how these actions can be synchronized to occur in the right order.
New to NiFi, but I imagine this is a common use case and there must be some NiFi-ish way of doing this that I am missing. Advice on this would be appreciated. Thanks.
I'm not sure what requirement is preventing you from writing the file retrieved from FTP directly to the desired HDFS location, or if this is a "write n files to HDFS with a . starting the filename and then rename all when a certain threshold is reached" scenario.
ListHDFS does not take any incoming relationships because it should not be triggered by an incoming event, but rather runs on a timer/CRON schedule. Every time it runs, it produces n flowfiles, each of which references an HDFS file detected as written to the filesystem since the last execution. To do this, the processor stores local state.
Your flow segments do not need to be connected in this case. You'll have "flow segment A" which performs the FTP -> HDFS writing (GetFTP -> PutHDFS) and you'll have an independent "flow segment B" which lists the HDFS directory, reads the file descriptors (but not the content of the file unless you use FetchHDFS as well) and moves them (ListHDFS -> MoveHDFS). The ListHDFS processor will run constantly, but if it does not detect any new files during a run, it will simply yield and perform a no-op. Once the PutHDFS processor completes the task of writing a file to the HDFS file system, on the next ListHDFS execution, it will detect that file and generate a flowfile describing it.
You can tune the scheduling to your liking, but in general this is a very common pattern in NiFi flows.
I need to know if Apache NiFi supports running processors until completion.
"the execution of a series of processors in process group wait for anothor process group results execution to be complete".
For example:
Suppose there are three processors in NiFi UI.
P1-->P2-->P3
(P = processor)
Now I need to run P1; only once it has completely finished should P2 run, and so on. They should run like a sequence, but each one waits for the previous one to complete.
EDIT-1:
Just as an example: I have data at a web URL. I can download that data using the GetHTTP processor and store it as file content with PutFile. Once the file is saved in the PutFile directory, FetchFile should run to process that file into my database, like the workflow below.
GetHTTP-->PutFile-->FetchFile-->DB
Is this possible?
NiFi itself is not really a batch processing system; it is a data flow system geared more towards continuous processing. Having said that, there are some techniques you can use to do batch-like operations, depending on which processors you're using.
The Split processors (SplitText, SplitJSON, etc.) write attributes to the flow files that include a "fragment.identifier" which is unique for all splits created from an incoming flow file, and "fragment.count" which is the total number of those splits. Processors like MergeContent use those attributes to process a whole batch (aka fragment), so the output from those kinds of processors would occur after an entire batch/fragment has been processed.
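As a concrete illustration (the identifier value is a made-up placeholder), splitting one incoming file into three pieces yields FlowFiles carrying attributes along these lines:

fragment.identifier = <same UUID on all three splits>
fragment.count      = 3

MergeContent's "Defragment" merge strategy uses exactly these attributes to hold its output until all three pieces have arrived.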
Another technique is to write an empty file in a temp directory when the job is complete, then a ListFile processor (pointing at that temp directory) would issue a flow file when the file is detected.
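A sketch of that marker-file technique (the directory and naming scheme are arbitrary): the final step of the batch job just drops an empty file where ListFile is watching.

#!/bin/sh
# Last step of the batch job: create an empty completion marker in the
# directory that the ListFile processor monitors.
mkdir -p /tmp/flow-signals    # hypothetical signal directory
touch "/tmp/flow-signals/$(date +%Y%m%d%H%M%S).done"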
Can you describe more about the processors in your flow, and how you would know when a batch was complete?