Get NiFi environment details as DEV/TEST/PROD - apache-nifi

I have a requirement to print the current environment name (DEV/TEST/PROD) based on the environment NiFi is running in.
I tried ${hostname(true)}, but it returns the hostname. How can I get the environment info as DEV/TEST/PROD?
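One approach (a sketch, not from the original thread; it assumes you control each node's environment and relies on NiFi Expression Language falling back to environment variables, the same trick used in the keytab answer below):

# set once per environment, before NiFi starts (NIFI_ENV is a hypothetical variable name)
export NIFI_ENV="DEV"    # "TEST" or "PROD" on the other environments

# then in any processor property that supports EL:
# ${NIFI_ENV} resolves to DEV/TEST/PROD on that node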

Related

Right Keytab for NiFi Processor

I have a NiFi 3-node cluster (installed via Hortonworks DataFlow - HDF) in a Kerberized environment. As part of the installation, Ambari created a nifi service keytab.
Can I use this nifi.service.keytab for configuring processors like PutHDFS that talk to Hadoop services?
The nifi.service.keytab is machine specific and always expects principal names with machine information, e.g. nifi/HOSTNAME@REALM.
If I configure my processor with nifi/NODE1_Hostname@REALM, I see a Kerberos authentication exception on the other two nodes.
How do I dynamically resolve the hostname so that each node uses the nifi service keytab correctly?
The keytab principal name field supports Apache NiFi Expression Language, so you can use an expression like nifi/${hostname()}@REALM, and each node will resolve that expression independently to something like nifi/host1.nifi.com@REALM or nifi/host2.nifi.com@REALM, etc.
If you do not want it to be the literal hostname, you can also set an environment variable on each node (export NIFI_KEYTAB_HOSTNAME="modified_host_format_1", etc.) and reference the environment variable in the EL expression the same way: nifi/${NIFI_KEYTAB_HOSTNAME}@REALM.
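Before wiring up the EL expression, it can help to check which principals a node's keytab actually contains; klist is standard MIT Kerberos tooling, and the path below is the usual Ambari-managed default, which may differ on your nodes:

klist -kt /etc/security/keytabs/nifi.service.keytab
# expect per-node entries like: nifi/host1.nifi.com@REALM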

Spark Cluster EC2 - SSH Error

When setting up a basic cluster using the supplied spark-ec2 script, the cluster is created (1 master, 1 slave) but I continuously get SSH hostname resolution errors.
This is because the hostnames are empty.
As I understand it, the point of the script is that it creates the instances, so it should know the names and be able to resolve them as part of the setup.
So the question is: am I supposed to configure the ec2 script before trying to launch the cluster?
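For reference, a sketch of the usual flow with the legacy spark-ec2 script (key pair and cluster names are placeholders). If setup fails partway, for example because the instances are not yet reachable over SSH, the script can retry on the same instances with --resume:

./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 1 launch my-cluster
# retry setup on the already-created instances instead of launching new ones:
./spark-ec2 -k my-keypair -i ~/my-keypair.pem launch my-cluster --resume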

Determine Deployment Group from appspec.yml

I am using the elb scripts from https://github.com/awslabs/aws-codedeploy-samples/tree/master/load-balancing/elb to remove my ec2 instances from the load balancer before I do my code updates.
I need to define the load balancer in the ELB_LIST variable of the common_functions.sh bash script. This load balancer will be different for each environment (or deployment group).
Is there a way I can set this variable, from within this bash script, based on which deployment group I am deploying to?
The application artifacts will be the same, but deployed to different environments or groups and hence, different load balancers.
After searching the AWS forums, I see CodeDeploy now supports deployment-specific environment variables.
So I can reference the deployment group from within bash and set the load balancer:
if [ "$DEPLOYMENT_GROUP_NAME" == "Staging" ]
then
ELB_LIST="STAGING-ELB"
fi
Reference: http://blogs.aws.amazon.com/application-management/post/Tx1PX2XMPLYPULD/Using-CodeDeploy-Environment-Variables
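If more than two environments are involved, a case statement keeps this readable (a sketch; the group and load balancer names are hypothetical):

# Map each CodeDeploy deployment group to its load balancer
case "$DEPLOYMENT_GROUP_NAME" in
    Staging)    ELB_LIST="STAGING-ELB" ;;
    Production) ELB_LIST="PROD-ELB" ;;
esac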

A way to find the zookeeper ip's

On our Cloudera CDH cluster we have 3 ZooKeeper nodes running.
For parcelling purposes I'm looking for a way to get those nodes' IPs dynamically instead of hard-coding them.
Is there any environment variable or REST call that I'm missing?
I must have missed it, but the env var ZK_QUORUM does the trick!
From cloudera/cm_ext on GitHub:
If you add a dependency to the ZooKeeper service, then any process in that service (e.g. role daemon process, command process, client config deployment process) will get the ZooKeeper quorum in the ZK_QUORUM environment variable. This can be used in the control script to add configuration properties for the ZooKeeper quorum.
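As an illustration, a control script could pass the quorum through to an application config (a sketch; the properties file and placeholder token are hypothetical):

# ZK_QUORUM arrives as e.g. "host1:2181,host2:2181,host3:2181"
echo "Using ZooKeeper quorum: $ZK_QUORUM"
sed -i "s|{{ZK_QUORUM}}|$ZK_QUORUM|" myapp.properties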

Assigning the host name to a Hadoop job using Weka in AWS

I have been using the wekaDistributedHadoop1.0.4 and wekaDistributedBase1.0.2 packages on my local machine to run some of the basic jobs. There is a field "HDFS host" which must be filled in order to run the jobs. I have been using "localhost" since I have been testing on my local machine, and this works fine. I blindly tried "localhost" when running on AWS EMR, but the job failed. What I would like to know is: what host name should I enter into the field so that Weka will call on the correct master? Is it the public DNS name provided when starting the cluster, or is there a method in the API that gets that address for me?
If you want to do it manually: create a cluster and keep it alive. You can find the master's address in the Amazon EC2 instances management console, under the Elastic MapReduce master/slave security groups. Find it, log in to the master node, and fill in the conf file with the right name.
If you need to do it automatically: write a shell script executed as a bootstrap action. You can refer to https://serverfault.com/questions/279297/what-is-the-easiest-way-to-get-a-ec2-public-dns-inside-a-running-instance
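A minimal bootstrap sketch along those lines, reading the node's public DNS from the EC2 instance metadata service (169.254.169.254 is the standard metadata endpoint; the conf file name is hypothetical):

# ask the metadata service for this instance's public hostname
PUBLIC_DNS=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
echo "hdfs.host=$PUBLIC_DNS" >> weka-job.conf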
