I am playing around with NiFi custom processor.
How can I inject an instance of org.apache.nifi.web.StandardNiFiServiceFacade into my custom processor instance?
Background:
I am trying to stop the processor after it has executed once. I understand that NiFi processors are meant for stream processing rather than batch processing, where a job is executed just once, but to leverage NiFi's execution support this needs to be done. From further experimentation, it looks like I could do this if an instance of StandardNiFiServiceFacade were available in the custom processor instance.
This is intentionally not made available to the processor API. If you are certain you want to have the processor tell the controller to stop scheduling it, it can make an HTTP/REST call to the API, just as the user interface or other programmatic clients do.
Processors should, however, never be doing this. They are either scheduled to execute or not scheduled to execute. If the conditions for performing some function are no longer present, the processor can check for them and short-circuit its onTrigger call by simply returning; if the conditions are present, it can do its work.
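A minimal sketch of that short-circuit pattern, assuming a hypothetical processor class and an internal flag representing "the work has already been done":

```java
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class OneShotStyleProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    // Hypothetical flag representing "the one-time work has already been done".
    private volatile boolean workAlreadyDone = false;

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // Short-circuit: if the conditions for doing work are no longer present,
        // simply return instead of trying to stop the processor from inside.
        if (workAlreadyDone) {
            return;
        }

        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        // ... perform the one-time work on the flowfile here ...

        session.transfer(flowFile, REL_SUCCESS);
        workAlreadyDone = true;
    }
}
```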
If you are triggering this custom processor from an upstream processor such as GenerateFlowFile, you may be able to use ExecuteScript to emulate a "one-and-done" job trigger; check out my blog post for Groovy scripts that might help you achieve what you're trying to do.
Related
How can I start the dataflow I created without accessing the Apache NiFi interface? Is it possible to trigger a run by running a .bat file? I am new to Apache NiFi and somewhat clueless about its limitations.
I saved the dataflow as a template and want to start it without accessing the Apache NiFi interface.
There are several ways to start a processor.
Timer driven
This is the default mode. The Processor will be scheduled to run on a regular interval. The interval at which the Processor is run is defined by the 'Run Schedule' option (see below).

CRON driven
When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. The CRON driven scheduling value is a string of six required fields and one optional field, each separated by a space.

Event driven
When this mode is selected, the Processor will be triggered to run by an event, and that event occurs when FlowFiles enter Connections feeding this Processor. This mode is currently considered experimental and is not supported by all Processors. When this mode is selected, the 'Run Schedule' option is not configurable, as the Processor is not triggered to run periodically but as the result of an event. Additionally, this is the only mode for which the 'Concurrent Tasks' option can be set to 0. In this case, the number of threads is limited only by the size of the Event-Driven Thread Pool that the administrator has configured.
You can read more about it in the Scheduling part of the NiFi User Guide.
If you specifically want to start a processor from a .bat file, you can use cURL. For that, your flow must start with either ListenHTTP or HandleHttpRequest. For example, if ListenHTTP listens on port 8089 and your NiFi instance is accessible via my-nifi-instance.com, then you will have a webhook like my-nifi-instance.com:8089/webhook that will initiate the flow.
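If it helps, here is a minimal Java sketch of that trigger call (the host, port, and path are the assumed example values above; from a .bat file the equivalent would be a single cURL POST against the same URL):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TriggerNiFiFlow {

    public static void main(String[] args) throws Exception {
        // Assumed example endpoint exposed by a ListenHTTP processor.
        URL url = new URL("http://my-nifi-instance.com:8089/webhook");

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);

        // An empty body is enough to trigger the flow; any payload sent here
        // becomes the content of the FlowFile that ListenHTTP emits.
        connection.getOutputStream().close();

        System.out.println("NiFi responded with HTTP " + connection.getResponseCode());
        connection.disconnect();
    }
}
```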
Since you are asking a very basic question, I encourage you to start by reading the Apache NiFi User Guide.
I am developing a new NiFi processor for my data flow. I make code changes in Eclipse, create a new .nar file, and copy it to the NiFi lib directory to test it.
On every nar update, NiFi needs a restart, which takes a significant amount of time.
Is there a better way of testing a new .nar in NiFi? Restarting NiFi for every small change slows development down considerably.
There are a few options for rapid prototyping and testing that make developing Apache NiFi processors easier.
Model your code in ExecuteScript -- using the ExecuteScript processor means you can make code changes to the domain-related code (whatever you type into the processor Script Body property or a file referenced by Script File) without having to build anything or restart the application. You can replay the same flowfiles through the updated code using the provenance replay feature. You can also test your scripts directly with Matt Burgess' NiFi Script Tester tool. Once you have acceptable behavior, take the script body and migrate it to a custom processor that can be deployed.
Use the unit testing and integration testing features of NiFi -- the test harnesses and "runners" provided by the core framework will allow you to simulate flow scenarios in automated tests before deploying the entire application. It takes a little time to build out the first flow, but once you do, it's a repeatable and understandable process which you can use to cover edge cases and ensure desired behavior.
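For example, here is a minimal sketch of a unit test built on the nifi-mock TestRunner (the processor class, property, and relationship names are placeholders for your own):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class MyCustomProcessorTest {

    @Test
    public void testHappyPath() {
        // Create a runner around the processor under test (hypothetical class name).
        TestRunner runner = TestRunners.newTestRunner(MyCustomProcessor.class);

        // Configure properties exactly as you would in the NiFi UI (hypothetical descriptor).
        runner.setProperty(MyCustomProcessor.SOME_PROPERTY, "some-value");

        // Enqueue test content and trigger the processor once.
        runner.enqueue("hello world".getBytes(StandardCharsets.UTF_8));
        runner.run();

        // Assert routing and inspect the output flowfile.
        runner.assertAllFlowFilesTransferred(MyCustomProcessor.REL_SUCCESS, 1);
        List<MockFlowFile> results = runner.getFlowFilesForRelationship(MyCustomProcessor.REL_SUCCESS);
        results.get(0).assertContentEquals("hello world");
    }
}
```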
Just check how testing is done for the standard NiFi processors and do the same. For example, look at the DBCP service tests: https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-dbcp-service/src
For those tests you don't need to start NiFi.
I need to delete a record from inside a background job. When I do that, it triggers the afterDelete function, which is great because I have some logic in afterDelete, but it fails because there is no request.user in that context, which my afterDelete logic relies on.
I need to schedule this job to run from my dashboard – I'm not using the REST endpoint.
Is there a way to pass that context to afterDelete? How do I handle the situation?
Edit: This is all happening inside the context of Cloud Code. No Parse Android/iOS SDK is being used.
I am using JSF 2, Spring 4, and Hibernate 4 in my application. I have a Spring service layer, DAO layer, models, and so on. I want to schedule some of the services so that they are executed automatically at a specified time; usually these services would perform some kind of data mapping from an Excel file to the database.
I want to perform these tasks without user intervention, and the scheduler should take care of all this data mapping.
Note: I call these services from my view, and the same services should also be used by the scheduler to perform the data mapping.
I am a complete newbie and have never used any kind of scheduler. So my questions:
1) What should I use to schedule these tasks?
2) I am confused about Spring Batch and Spring Scheduler. Do they both perform scheduling? If not, what is the actual use of Spring Batch?
3) Is Spring Scheduler by itself sufficient to perform this scheduling?
Any help would be highly appreciated.
1) What should I use to schedule these tasks?
Basically you need the classes that support the operations you want to do (Excel creation from database queries): Spring, in both cases.
2) I am confused about Spring Batch and Spring Scheduler. Do they both perform scheduling? If not, what is the actual use of Spring Batch?
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques.
Spring Scheduler just runs a method at a certain time. It is not as robust: it only executes the logic involved in a process (calling a method of a class) at a predefined time, with no statistics and no job restart.
3) Is Spring Scheduler by itself sufficient to perform this scheduling?
Yes, it is. If you are not already familiar with Spring Batch, learning it will take more time than simply calling the methods you already have.
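For example, a minimal sketch of scheduling one of your existing services with Spring's @Scheduled (the service and method names are assumptions; the cron expression fires every day at 02:00):

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Enables detection of @Scheduled annotations (with XML config, use <task:annotation-driven/> instead).
@Configuration
@EnableScheduling
class SchedulingConfig {
}

@Component
public class ExcelImportScheduler {

    // The same service you already call from the JSF view (hypothetical name).
    @Autowired
    private ExcelMappingService excelMappingService;

    // Spring cron format: second minute hour day-of-month month day-of-week.
    // This example fires every day at 02:00.
    @Scheduled(cron = "0 0 2 * * *")
    public void runExcelToDatabaseMapping() {
        excelMappingService.mapExcelToDatabase();
    }
}
```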
Scheduler: a scheduler is a software product that allows an enterprise to schedule and track computer batch tasks. The scheduler just runs the process.
I have created apps in the past whose web pages called the persistence layer to get query results or to insert, delete, etc. against a DB. However, nothing was left running in the background except the persistence layer. Now I need to develop an app with a process that is always running in the background, waiting for messages to arrive through a ZeroMQ messaging system (I cannot change this at this point). I am a little lost as to how to set up the object so that it is always running and yet I can still control it or query results from it.
Is there any tutorial/examples that covers this configuration?
Thanks,
You could use some kind of timer to start a method every second that looks at a specific resource and processes the input taken from it.
If you use Spring, then you could have a look at the @Scheduled annotation.
If your input is some kind of Java method invocation, then have a look at the java.util.concurrent package and at concurrent programming in general. But be aware that there are some restrictions on creating your own threads in an EJB environment.
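As a minimal sketch, assuming a single-threaded ExecutorService that runs the listener loop (receiveFromZeroMq() is a hypothetical placeholder for your actual ZeroMQ receive call; in a full EJB container you would obtain the executor from the container rather than creating threads yourself):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class MessageListenerService {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicLong processedCount = new AtomicLong();

    /** Starts the background loop that waits for messages. */
    public void start() {
        if (running.compareAndSet(false, true)) {
            executor.submit(this::listenLoop);
        }
    }

    /** Lets the rest of the application query the state of the listener. */
    public long getProcessedCount() {
        return processedCount.get();
    }

    /** Stops the loop and releases the thread. */
    public void stop() {
        running.set(false);
        executor.shutdownNow();
    }

    private void listenLoop() {
        while (running.get()) {
            // Hypothetical placeholder: block on your ZeroMQ socket's recv() here.
            byte[] message = receiveFromZeroMq();
            if (message != null) {
                processedCount.incrementAndGet();
                // ... hand the message to your domain logic / persistence layer ...
            }
        }
    }

    private byte[] receiveFromZeroMq() {
        // Placeholder so the sketch compiles; replace with the actual ZeroMQ receive call.
        return null;
    }
}
```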