How can I start the dataflow I created without accessing the Apache NiFi interface? Is it possible to trigger a run by executing a .bat file? I am new to Apache NiFi and somewhat unclear on its limitations.
I saved the dataflow as a template and want to start it without accessing the Apache NiFi interface.
There are several ways to start a processor.
Timer driven
This is the default mode. The Processor will be scheduled to run on a regular interval. The interval at which the Processor is run is defined by the 'Run Schedule' option (see below).
CRON driven
When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. The CRON driven scheduling value is a string of six required fields and one optional field, each separated by a space.
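For example, a 'Run Schedule' of

    0 0 13 * * ?

would trigger the Processor at 1:00 PM every day; the six fields are seconds, minutes, hours, day of month, month, and day of week (the optional seventh field is the year).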
Event driven
When this mode is selected, the Processor will be triggered to run by an event, and that event occurs when FlowFiles enter Connections feeding this Processor. This mode is currently considered experimental and is not supported by all Processors. When this mode is selected, the 'Run Schedule' option is not configurable, as the Processor is not triggered to run periodically but as the result of an event. Additionally, this is the only mode for which the 'Concurrent Tasks' option can be set to 0. In this case, the number of threads is limited only by the size of the Event-Driven Thread Pool that the administrator has configured.
You can read more about it in the Scheduling part of the NiFi User Guide.
If you specifically want to start a processor from a .bat file, you can use cURL. For that, your flow must start with either ListenHTTP or HandleHttpRequest. E.g. if ListenHTTP listens on port 8089 and your NiFi instance is accessible via my-nifi-instance.com, then you will have a webhook like my-nifi-instance.com:8089/webhook that will initiate the flow.
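As a minimal sketch of such a .bat file (assuming the ListenHTTP setup above, with the processor's Base Path property set to webhook, and cURL available on the Windows machine):

    @echo off
    rem POST to the ListenHTTP endpoint; the request body becomes the
    rem content of the FlowFile that enters the flow.
    curl -X POST -d "start" http://my-nifi-instance.com:8089/webhook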
Since you are asking a very basic question, I encourage you to start by reading the Apache NiFi User Guide.
I was reading about Spring Batch and I read the below:
Spring Batch is not a scheduling framework. There are many good enterprise schedulers (such as Quartz, Tivoli, Control-M, etc.) available in both the commercial and open source spaces. It is intended to work in conjunction with a scheduler, not replace a scheduler.
Source: https://docs.spring.io/spring-batch/docs/current/reference/html/spring-batch-intro.html#springBatchBackground
So what is the difference between Spring Batch and Tivoli?
Spring Batch is mainly designed to provide a runtime for Java batch workloads.
IBM Workload Scheduler (Tivoli) / HCL Workload Automation, like other schedulers, doesn't run the workload directly; it is used to trigger any kind of workload (jobs), including Spring Batch, in on-prem, hybrid, and multi-cloud environments, including Kubernetes.
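To make that division of labor concrete, here is a minimal sketch of an external trigger launching a Spring Batch job. Spring's own @Scheduled stands in for an enterprise scheduler, and the job name and bean wiring are illustrative assumptions:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class NightlyJobTrigger {

        private final JobLauncher jobLauncher; // Spring Batch provides the runtime...
        private final Job importJob;           // ...the scheduler only decides when to run it

        public NightlyJobTrigger(JobLauncher jobLauncher, Job importJob) {
            this.jobLauncher = jobLauncher;
            this.importJob = importJob;
        }

        // An enterprise scheduler (IBM Workload Scheduler, Control-M, ...) would
        // invoke an entry point like this; @Scheduled is just the simplest stand-in.
        @Scheduled(cron = "0 0 2 * * *") // every night at 02:00
        public void runNightly() throws Exception {
            jobLauncher.run(importJob, new JobParametersBuilder()
                    .addLong("runAt", System.currentTimeMillis()) // unique parameters per run
                    .toJobParameters());
        }
    }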
IBM Workload Scheduler can trigger jobs based on calendars and times, taking free/working days into account, and on complex run cycles (e.g. 3 working days before the end of each month).
In addition, it can trigger workload based on dependencies on other jobs, so that a job can start as soon as its predecessor (running on any other system) has completed successfully, or run only if the predecessor completed with a specific return code or result. You can also use logical resources and limits to control how many jobs using the same machine or resource can run at the same time.
It can also be used to trigger workload based on events, e.g. when a new file is uploaded.
In recent releases IBM Workload Scheduler / HCL Workload Automation also added built-in capabilities to transfer files.
IBM Workload Scheduler / HCL Workload Automation is also key to centralized monitoring and recovery of failures, to centralized security (granting different teams access only to their own jobs), and to centralized governance (e.g. auditing any change to and recovery of jobs).
It is also able to forecast job durations and when every job will run, and to generate alerts if jobs are running too long or if, based on their predecessors, they are expected to miss their deadline.
I am playing around with a custom NiFi processor.
How can I inject an instance of org.apache.nifi.web.StandardNiFiServiceFacade into my custom processor instance?
Background:
I am trying to stop the processor after it has executed once. I understand that NiFi processors are meant for stream processing rather than batch processing, in which the job is executed just once. But to leverage NiFi's execution support, this needs to be done. From further experimentation, it appears I could do that with an instance of StandardNiFiServiceFacade available in the custom processor instance.
This is not made available to the processor API intentionally. If you are certain you want to have the processor tell the controller to stop scheduling it, then it can make an HTTP/REST API call, just as the user interface or programmatic API calls would.
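As a rough sketch of such a call (assuming a recent NiFi 1.x REST API; the base URL, processor id, and revision version are illustrative, and a secured instance would also need authentication):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class StopProcessorCall {
        // Asks the NiFi REST API to stop a processor, exactly as the UI would.
        // The caller must know the processor's id and current revision version.
        static void stopProcessor(String nifiUrl, String processorId, long revisionVersion)
                throws Exception {
            URL url = new URL(nifiUrl + "/nifi-api/processors/" + processorId + "/run-status");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            String body = "{\"revision\":{\"version\":" + revisionVersion + "},"
                    + "\"state\":\"STOPPED\"}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("NiFi responded: " + conn.getResponseCode());
        }
    }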
Processors should, however, never be doing this. They are either scheduled to execute or not scheduled to execute. If the conditions for performing some function no longer hold, the processor can check for them, short-circuit its onTrigger call, and simply return. If the conditions are present, it can do the work.
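A minimal sketch of that recommended pattern; the workDone flag is a placeholder for whatever real condition applies in your processor:

    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    public class OneShotStyleProcessor extends AbstractProcessor {

        private volatile boolean workDone = false; // flipped once the one-off work completes

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            if (workDone) {
                // Nothing left to do: short-circuit and return instead of
                // asking the framework to stop scheduling this processor.
                context.yield();
                return;
            }
            // ... perform the one-off work here ...
            workDone = true;
        }
    }

The processor stays scheduled, but once the work is done every subsequent onTrigger call yields immediately and costs essentially nothing.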
If you are triggering this custom processor from an upstream processor such as GenerateFlowFile, you may be able to leverage ExecuteScript to emulate a "one-and-done" job trigger; check out my blog post for Groovy script(s) that might help you achieve what you're trying to do.
We are trying to revamp our batch job scheduling and monitoring process across the entire enterprise. Currently all our batch jobs are scheduled using Unix crontab and monitored via log files generated by shell scripts.
This process has a lot of disadvantages, and as the number of applications grows it gets really complicated.
Two copies of each application need to be deployed: one to the app server and one standalone (since business logic is shared between both). This complicates our build process too.
There is no easy-to-use web UI for us to see the status of jobs and manually rerun failed jobs remotely without getting onto the Unix box.
There is no failover or load-balanced batch processing.
So I was thinking of using Quartz (with our existing Spring apps) in our applications, deploying them to the app servers, and no longer relying on the Unix crontab.
Is there a way I can write a centralized web application from where I can schedule and monitor jobs running on different quartz schedulers on different app servers?
P.S: I know quartzdesk.com is one solution, but I don't want to enable RMI on my JVM.
You could use the Spring Boot scheduler as an orchestrator and call REST APIs for remote (or local, if you are small) execution. This way, as your app grows, you could easily leverage a load balancer.
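As a sketch of that idea (the job-trigger endpoint each app server exposes, and its URL, are hypothetical assumptions; scheduling also requires @EnableScheduling on a configuration class):

    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;
    import org.springframework.web.client.RestTemplate;

    @Component
    public class BatchOrchestrator {

        private final RestTemplate rest = new RestTemplate();

        // Hypothetical endpoint each app server would expose to start a job;
        // the orchestrator only decides when and where to call it.
        @Scheduled(cron = "0 30 1 * * *") // 01:30 every night
        public void triggerNightlyExport() {
            rest.postForEntity(
                    "http://app-server-1.internal/jobs/nightly-export/run",
                    null, Void.class);
        }
    }

Because the orchestrator only issues HTTP calls, the same pattern gives you a single place to add monitoring, retries, and a status UI.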
If you have the possibility of using cloud services (like Amazon, Azure, or Google Cloud), this can be done easily using their own load balancers. They also support Docker and can take care of any peaks in utilization.
I am developing a new NiFi processor for my data flow. I make code changes in Eclipse, create a new .nar file, and copy it to the NiFi lib directory to test it.
On every .nar update, NiFi needs a restart, which takes a significant amount of time.
Is there a better way of testing your new .nar in NiFi? Restarting NiFi for every small change reduces your development speed.
There are a few options for rapid prototyping and testing that make developing Apache NiFi processors easier.
Model your code in ExecuteScript -- using the ExecuteScript processor means you can make code changes to the domain-related code (whatever you type into the processor Script Body property or a file referenced by Script File) without having to build anything or restart the application. You can replay the same flowfiles through the updated code using the provenance replay feature. You can also test your scripts directly with Matt Burgess' NiFi Script Tester tool. Once you have acceptable behavior, take the script body and migrate it to a custom processor that can be deployed.
Use the unit testing and integration testing features of NiFi -- the test harnesses and "runners" provided by the core framework will allow you to simulate flow scenarios in automated tests before deploying the entire application. It takes a little time to build out the first flow, but once you do, it's a repeatable and understandable process which you can use to cover edge cases and ensure desired behavior.
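For the second option, a minimal sketch of such a test (MyProcessor and its REL_SUCCESS relationship are placeholders for your own processor and its success relationship):

    import org.apache.nifi.util.TestRunner;
    import org.apache.nifi.util.TestRunners;
    import org.junit.Test;

    public class MyProcessorTest {

        @Test
        public void testHappyPath() {
            // Runs the processor in-process: no NiFi restart required.
            TestRunner runner = TestRunners.newTestRunner(new MyProcessor());
            runner.enqueue("some test content".getBytes()); // simulate an incoming FlowFile
            runner.run();                                   // a single onTrigger invocation
            runner.assertAllFlowFilesTransferred(MyProcessor.REL_SUCCESS, 1);
        }
    }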
Just check how testing is done for the standard NiFi processors, and do the same. For example, look at DBCP: https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-dbcp-service/src
For those tests you don't need to start nifi.
I am trying to understand how the various components of Mesos work together, and found this excellent tutorial that contains the following architectural overview:
I have a few concerns about this that aren't made clear (either in the article or in the official Mesos docs):
Where are the Schedulers running? Are there "Scheduler nodes" where only the Schedulers should be running?
If I was writing my own Mesos framework, what Scheduler functionality would I need to implement? Is it just a binary yes/no or accept/reject for Offers sent by the Master? Any concrete examples?
If I was writing my own Mesos framework, what Executor functionality would I need to implement? Any concrete examples?
What's a concrete example of a Task that would be sent to an Executor?
Are Executors "pinned" (permanently installed on) Slaves, or do they float around in an "on demand" type fashion, being installed and executed dynamically/on-the-fly?
Great questions!
I believe it would be really helpful to have a look at a sample framework such as Rendler. This will probably answer most of your questions and give you a feeling for the framework internals.
Let me now try to answer the questions which might still be open after this.
Scheduler Location
Schedulers do not run on any special nodes, but keep in mind that schedulers can fail over as well (like any part of a distributed system).
Scheduler functionality
Have a look at Rendler or at the framework development guide.
Executor functionality/Task
I believe Rendler is a good example to understand the Task/Executor relationship. Just start reading the README/description on the main github page.
Executor pinning
Executors are started on a node when the first Task requiring such an executor is sent to that node. After that, the executor remains on that node.
Hope this helped!
To add to js84's excellent response,
Scheduler Location: Many users like to launch the schedulers via another framework like Marathon to ensure that if the scheduler or its node dies, then it can be restarted elsewhere.
Scheduler functionality: After registering with Mesos, your scheduler will start getting resource offers in the resourceOffers() callback, in which your scheduler should launch (at least) one task on a subset (or all) of the resources being offered. You'll probably also want to implement the statusUpdate() callback to handle task completion/failure.
Note that you may not even need to implement your own scheduler if an existing framework like Marathon/Chronos/Aurora/Kubernetes could suffice.
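A bare-bones sketch of those two callbacks (a real implementation would implement the full org.apache.mesos.Scheduler interface, which has several more methods; the decision logic here is deliberately trivial):

    import java.util.List;
    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.Protos.TaskStatus;
    import org.apache.mesos.SchedulerDriver;

    public class SketchScheduler /* implements org.apache.mesos.Scheduler */ {

        public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
            for (Offer offer : offers) {
                // Inspect offer.getResourcesList() and either launch tasks on it
                // via driver.launchTasks(...), or turn the offer down:
                driver.declineOffer(offer.getId());
            }
        }

        public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
            // React to task completion/failure, e.g. relaunch on TASK_FAILED.
            System.out.println("Task " + status.getTaskId().getValue()
                    + " is now " + status.getState());
        }
    }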
Executor functionality: You usually don't need to create a custom executor if you just want to launch a linux process or docker container and know when it completes. You could just use the default mesos-executor (by specifying a CommandInfo directly in TaskInfo, instead of embedded inside an ExecutorInfo). If, however, you want to build a custom executor, at minimum you need to implement launchTask(), and ideally also killTask().
Example Task: An example task could be a simple linux command like sleep 1000 or echo "Hello World", or a docker container (via ContainerInfo) like image: 'mysql'. Or, if you use a custom executor, then the executor defines what a task is and how to run it, so a task could instead run as another thread in the executor's process, or just become an item in a queue in a single-threaded executor.
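To make that concrete, a sketch of building such a task for the default executor (the names and ids are illustrative; a real framework would also attach the cpu/mem resources it consumes from the offer):

    import org.apache.mesos.Protos.CommandInfo;
    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.Protos.TaskID;
    import org.apache.mesos.Protos.TaskInfo;

    public class TaskBuilder {
        // A CommandInfo set directly on the TaskInfo means the default
        // mesos-executor runs it; no custom ExecutorInfo is needed.
        public static TaskInfo helloTask(Offer offer) {
            return TaskInfo.newBuilder()
                    .setName("hello")
                    .setTaskId(TaskID.newBuilder().setValue("hello-1"))
                    .setSlaveId(offer.getSlaveId())
                    .setCommand(CommandInfo.newBuilder()
                            .setValue("echo \"Hello World\""))
                    .build();
        }
    }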
Executor pinning: The executor is distributed via CommandInfo URIs, just like any task binaries, so they do not need to be preinstalled on the nodes. Mesos will fetch and run it for you.
Schedulers: a scheduler is essentially a strategy for accepting or rejecting offers. You can write your own scheduler or use an existing one like Chronos. In the scheduler you should evaluate the resources available and then either accept or reject the offer.
Scheduler functionality: for example, suppose you have a task which needs 8 CPUs to run, but the offer from Mesos is for 6 CPUs; that won't serve the need, so you can reject it.
Executor functionality: the executor handles state-related information about your task, through a set of APIs you need to implement, such as reporting the status of the assigned task on the Mesos slave, or how many CPUs are currently available on the Mesos slave where the executor is running.
A concrete example of an executor: Chronos.
Being installed and executed dynamically/on-the-fly: this is not possible; you need to preconfigure the executors. However, you can replicate the executors using autoscaling.