I am using the DirectRunner in Apache Beam, and I want to build my own DevOps UI. I am mostly done with the UI: I can show pipeline details (failure/success), display metrics results, etc.
What I am now interested in is displaying the pipeline graph (the DAG), which shows the complete design of my running pipeline.
I appreciate your help in advance.
You could refer to the DataflowRunner code: it generates a pipeline JSON with all the worker steps and uploads it to the cloud, where it is later used to build the pipeline graph shown in GCP.
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
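If you only need the graph structure for your own UI rather than Dataflow's JSON format, you can also walk the transform hierarchy yourself with Beam's PipelineVisitor. Here is a minimal sketch; the class name and plain-text output are just for illustration, and you would emit DOT/JSON for your UI instead:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;

public class DagDumper {
  // Prints every composite and primitive transform in topological order.
  public static void dump(Pipeline pipeline) {
    pipeline.traverseTopologically(new Pipeline.PipelineVisitor.Defaults() {
      @Override
      public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
        System.out.println("composite: " + node.getFullName());
        return CompositeBehavior.ENTER_TRANSFORM; // descend into children
      }

      @Override
      public void visitPrimitiveTransform(TransformHierarchy.Node node) {
        System.out.println("primitive: " + node.getFullName());
      }
    });
  }
}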
I am looking for the best approach for deploying NiFi flows from my DEV environment to TEST/PROD environments.
The links below give an overview of how to achieve this; basically, they explain that we have to make use of the NiFi CLI to automate the deployment.
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
https://bryanbende.com/development/2018/01/19/apache-nifi-how-do-i-deploy-my-flow
But I was wondering whether there is a way to create a general script that can be used to deploy different types of flows. Since the variables we need to set differ from one processor to another, I am not sure how to do this.
Any help is appreciated.
I am the primary maintainer of NiPyAPI, a Python client for working with Apache NiFi. I have an example script covering the steps you are requesting, though it is not part of the official Apache project.
https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/demo/fdlc.py
We have started to explore and use NiFi as a basic ETL tool for data flows.
We got to know about Kylo, a data-lake-specific tool that works on top of NiFi.
Is there any industry usage or pattern where Kylo is being used, or any article describing its use cases and when to prefer it over custom Hadoop components like NiFi/Spark?
Please take a look at the following two resources:
1) Kylo's website: The home page lists domains where Kylo is being used.
2) Kylo FAQs: Useful information that can help you understand Kylo's architecture and how it compares with other tools.
Kylo is designed to work with NiFi and Spark, and does not replace them. You can build custom Spark jobs and execute them via the ExecuteSparkJob NiFi processor provided by Kylo.
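For context, a job run this way is just a regular Spark application with a main class. A minimal sketch in Java (the class name, paths, and transformation are placeholders):

import org.apache.spark.sql.SparkSession;

public class ExampleKyloJob {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("ExampleKyloJob").getOrCreate();
    // Placeholder logic: read JSON from args[0], write Parquet to args[1].
    spark.read().json(args[0]).write().parquet(args[1]);
    spark.stop();
  }
}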
I need to retrieve the list of available EMR release labels in order to run my Java application, which starts an EC2 instance and executes a Hadoop job. The main problem is that EMR release labels are specific to each region, so I need to get this list dynamically. A sample of the code used in my application:
runJobFlowRequest.setReleaseLabel("emr-4.8.0");
Does anyone have an idea where I could obtain this list of release labels programmatically via an Amazon API, in order to use it in my application?
I do not think we have any API exposed to get this list programmatically. Usually most regions will have the same release labels, and the EMR Console should show the most up-to-date list for a specific region.
You can also see the list in the documentation: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-relguide-links.html
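Since there is no list API, one pragmatic option is to stop hardcoding the label and make it configurable per deployment. A minimal sketch, where the system property name is just an example:

import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;

public class EmrRequestFactory {
  // Reads the release label from configuration instead of hardcoding it,
  // so it can be updated per region/deployment without a code change.
  public static RunJobFlowRequest newRequest() {
    String releaseLabel = System.getProperty("emr.releaseLabel", "emr-4.8.0");
    RunJobFlowRequest request = new RunJobFlowRequest();
    request.setReleaseLabel(releaseLabel);
    return request;
  }
}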
The Kubernetes scheduler has two parts: predicates and priorities. The source code is in kubernetes/plugin/pkg/scheduler. I want to add a new priority algorithm to the default priorities. Can anyone guide me through the detailed steps? Thanks a lot!
Maybe I should do the following steps:
Add my own priority algorithm under kubernetes/plugin/pkg/scheduler/algorithm/priorities
Register that priority algorithm
Build/recompile the whole k8s project and install/deploy a new k8s cluster
Test whether that priority takes effect, perhaps by giving it a high weight.
If there are more detailed articles and documents, they would help me a lot! The more detailed the better! Thanks a lot!
k8s version: 1.2.0, 1.4.0 or later.
You can run your scheduler as a Kubernetes deployment.
Kelsey Hightower has an example scheduler coded up on GitHub.
The meat and bones of it is here: https://github.com/kelseyhightower/scheduler/blob/master/bestprice.go
And the deployment YAML is here.
Essentially, you package it up as a Docker container and deploy it.
Take note of the way the example interacts with the k8s API using this package; to do it this way you'll need a similar wrapper, but it's much easier than building/recompiling the whole k8s package.
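For illustration, the deployment for a custom scheduler looks roughly like the sketch below. The image name is a placeholder, and note that on the 1.2/1.4 releases mentioned above you would use the extensions/v1beta1 API group rather than apps/v1 (pods opt in to the custom scheduler via a scheduler-name annotation on those versions, or spec.schedulerName on newer ones):

# Sketch of running a custom scheduler as a Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-scheduler
  template:
    metadata:
      labels:
        app: custom-scheduler
    spec:
      containers:
        - name: custom-scheduler
          image: example/bestprice-scheduler:latest  # placeholder image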
I have gone through the documentation link below:
https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#reporting-tasks
But I still need a sample, or a workflow showing how it is used in NiFi.
A ReportingTask is a way to push information like metrics and statistics out of NiFi to an external system. It is a global component that you define in the controller settings, similar to controller services, so it is not on the canvas.
The available reporting tasks are in the documentation below the processors:
https://nifi.apache.org/docs.html
You can also look at the examples in the NiFi code:
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-reporting-tasks
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-ambari-bundle/nifi-ambari-reporting-task
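If it helps to see the shape of one, here is a minimal sketch of a custom ReportingTask; the logging body is just illustrative, and a real task would push these stats to your external system:

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.controller.status.ProcessGroupStatus;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

@Tags({"example", "metrics"})
@CapabilityDescription("Logs root process group statistics on each trigger.")
public class LogStatsReportingTask extends AbstractReportingTask {

  @Override
  public void onTrigger(ReportingContext context) {
    // Snapshot of the root process group: flow file and byte counts, etc.
    ProcessGroupStatus status = context.getEventAccess().getControllerStatus();
    getLogger().info("FlowFiles received: {}, bytes read: {}",
        new Object[]{status.getFlowFilesReceived(), status.getBytesRead()});
  }
}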