NiFi Clustering: Embedded ZooKeeper setup issue

I have followed the NiFi clustering steps in the NiFi Admin Guide, but the NiFi nodes are not forming a working cluster with the embedded ZooKeeper. Am I missing something? Please help.
The configuration in zookeeper.properties is as follows. 192.168.99.101 is the IP address of the local host where NiFi is running, listening on port 9090:
clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=192.168.99.101:2888:3888
The configuration pertaining to ZooKeeper in nifi.properties is as follows:
nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
nifi.zookeeper.connect.string=192.168.99.101:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=

Following the detailed ZooKeeper-based NiFi clustering steps documented in these articles helped: Pierre Villard on NiFi clustering and Elton Atkins on NiFi clustering.
Also, following Matt Clarke's advice to use a dedicated external ZooKeeper ensemble instead of the embedded ZooKeeper helped.
Documenting what helped me in case it helps someone else who struggles with a similar problem.
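In my case, the pieces that were easy to miss with the embedded ZooKeeper were the myid file and the cluster settings in nifi.properties. A sketch of both follows; the protocol port (9998) is an assumption, and the rest matches the values above.

# Create the myid file so ZooKeeper knows this host is server.1
mkdir -p ./state/zookeeper
echo 1 > ./state/zookeeper/myid

# Cluster settings in nifi.properties (sketch)
nifi.web.http.host=192.168.99.101
nifi.web.http.port=9090
nifi.cluster.is.node=true
nifi.cluster.node.address=192.168.99.101
nifi.cluster.node.protocol.port=9998
# Let a lone node elect the flow immediately instead of waiting for more voters
nifi.cluster.flow.election.max.candidates=1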

Related

How to monitor a Hadoop cluster with ELK

I'm looking into the possibilities of monitoring a Hadoop cluster with the ELK/EFK stack. I have searched the public domain but couldn't find anything relevant.
Any help in this regard would be highly appreciated.
It's not clear what you're trying to monitor.
Everything in Hadoop is mostly a Java process, so adding a JMX exporter like the Prometheus JMX Exporter or Jolokia would expose metrics over REST, and from there you would have to periodically poll those into Elasticsearch.
To enable JMX, you'd have to edit the hadoop-env.sh scripts, I believe, for YARN and HDFS, to control the JVM options. Hive, Spark, HBase, etc. all have similar scripts.
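For example, a minimal sketch of turning on remote JMX for the HDFS NameNode in hadoop-env.sh; the port is arbitrary and the env var name is the Hadoop 2 one (Hadoop 3 uses HDFS_NAMENODE_OPTS instead):

# hadoop-env.sh — expose the NameNode's JMX remotely (sketch)
# Auth/SSL disabled here only for brevity; don't do this on an exposed network.
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"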
There's a general example of the Jolokia approach here: https://www.elastic.co/blog/monitoring-java-applications-with-metricbeat-and-jolokia
Other than that, Filebeat and Metricbeat operate the same as on any other system.
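If you go the Jolokia route from that blog post, the Metricbeat side looks roughly like this sketch; the host/port and the MBean mapping are assumptions to adapt per daemon:

# modules.d/jolokia.yml — poll a Jolokia agent attached to a Hadoop JVM
- module: jolokia
  metricsets: ["jmx"]
  period: 10s
  hosts: ["localhost:8778"]
  namespace: "hadoop"
  jmx.mappings:
    - mbean: "java.lang:type=Memory"
      attributes:
        - attr: HeapMemoryUsage
          field: memory.heap_usage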
If you used Cloudera Manager or Ambari to manage your cluster, then monitoring would be provided for you by those tools.

Does Hadoop itself contain fault-tolerance failover functionality?

I just installed a new version of Hadoop 2. If I configure a Hadoop cluster and bring it up, how can I know whether data transmission has failed and a failover is needed?
Do I have to install other components like ZooKeeper to track/enable any HA events?
Thanks!
High Availability is not enabled by default. I would highly encourage you to read the Hadoop documentation from Apache (http://hadoop.apache.org/); it gives an overview of the architecture and the services that run on a Hadoop cluster.
ZooKeeper is required for many Hadoop services to coordinate their actions across the entire cluster, regardless of whether the cluster is HA or not. More information can be found in the Apache ZooKeeper documentation (http://zookeeper.apache.org/).
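To make the HDFS side concrete: NameNode HA with automatic failover is driven by configuration plus ZooKeeper, roughly along these lines. The nameservice and host names below are assumptions, and the shared edits / JournalNode settings are omitted for brevity.

<!-- hdfs-site.xml (sketch): two NameNodes behind one logical nameservice -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>

<!-- core-site.xml (sketch): the ZooKeeper quorum used by the failover controllers -->
<property><name>ha.zookeeper.quorum</name><value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value></property>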

Storm UI not working

We are executing a Storm topology in pseudo mode.
The Storm topology is executing fine and we are able to connect to the Storm UI (port 8080).
But the Storm UI is not displaying the running topology information.
Restarting the Storm UI process did not help either.
Does Storm need special configuration to display a running topology in the Storm UI?
You only have to set the ui.port option in storm.yaml, like ui.port: 8080, and make sure the provided port is not already in use. Also, you don't need a running supervisor just to check whether the Storm UI comes up; running Nimbus and starting the UI is enough.
Provide ui.port in the storm.yaml file; the default port is 8080.
Start the Storm UI with bin/storm ui.
I was facing the same issue because my port was already in use, so I set the port number manually:
just add ui.port: 8090 to the storm.yaml file inside the conf folder of Apache Storm, and re-run the command storm ui.
Now type http://localhost:8090/ into Google Chrome or any other browser.
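Putting that together, a minimal conf/storm.yaml for a pseudo-distributed setup might look like this sketch (localhost and the alternate port are assumptions):

# conf/storm.yaml — minimal pseudo-distributed sketch
storm.zookeeper.servers:
  - "localhost"
nimbus.seeds: ["localhost"]   # older pre-1.0 Storm versions use nimbus.host instead
ui.port: 8090                 # YAML requires the space after the colon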
What version of Storm are you running?
Check to make sure both Nimbus AND a Supervisor are running. I have seen that if a topology is deployed with no supervisor running then nothing is displayed.
I was also facing the same issue. Since the default port 8080 was already in use, you might be getting a 404 there.
As suggested above, just use ui.port: 8081 or anything other than 8080 that is not in use.
Mind the space between : and 8081; I ran into a problem with that as well. If you face a problem, check that space and include it.
Also, if you still face an issue after this, run bin/zkCli.sh -server yourhostname from your ZooKeeper installation and try it.
Good luck !!
When running in pseudo mode, we often forget to give the topology a name. If we don't provide a name for the topology at the time of submitting it, it won't show up in the Storm UI.
Check the following (a combined sketch follows below):
Supervisor is running
Nimbus is running
ZooKeeper is running
you have given the topology a name
Thanks
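As that combined sketch, the whole sequence looks roughly like this; the jar, class, and topology names are hypothetical, and how the name is consumed depends on your main class (typically it passes an argument to StormSubmitter.submitTopology):

# Start the daemons (each in its own terminal, or backgrounded)
bin/storm nimbus &
bin/storm supervisor &
bin/storm ui &

# Submit the topology WITH a name so it shows up in the UI
bin/storm jar my-topology.jar com.example.MyTopologyMain my-topology-name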

Monitoring a Hadoop cluster with Ganglia

I'm new to Hadoop and I'm trying to monitor a multi-node cluster with Ganglia. gmond is set up on all nodes, and the Ganglia monitor only on the master. However, there are Hadoop metrics graphs only for the master node, and just system metrics for the slaves. Do these Hadoop metrics on the master include the slave metrics as well? Or is there a mistake in the configuration files? Any help would be appreciated.
I think you should read this in order to understand how metrics flow between master and slave.
However, I would like to note that, in general, Hadoop-based or HBase-based metrics are emitted/sent directly to the master server (by master server, I mean the server on which gmetad is installed). All other OS-related metrics are first collected by the gmond installed on the corresponding slave and then relayed to the gmond installed on the master server.
So, if you are not getting any OS-related metrics from the slave servers, there is some misconfiguration in your gmond.conf. To learn more about how to configure Ganglia, please read this. It helped me and could help you too, if you go through it carefully.
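For reference, the two places that carry those two metric flows look roughly like this; the master hostname is an assumption and 8649 is gmond's default port:

# gmond.conf on each slave (sketch): forward OS metrics to the master's gmond
udp_send_channel {
  host = master.example.com
  port = 8649
  ttl = 1
}

# hadoop-metrics2.properties (sketch): emit Hadoop metrics directly to Ganglia
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=master.example.com:8649
datanode.sink.ganglia.servers=master.example.com:8649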
There is a mistake in your configuration files; more precisely, in how the data is transmitted/collected, whichever approach you use.

One-node Apache NiFi cluster

I am setting up a secure cluster with NiFi 1.7.1 using an external ZooKeeper.
While I do plan to have 3 nodes in the cluster, at the moment I am only going to start one node.
Is that a valid cluster from NiFi's standpoint?
The reason for my question is that I get an SSL handshake error during startup, and I want to rule out that having a single node is the cause of this problem.
Thanks
Vijay
Yes, a single node can be run as a cluster, so the node count by itself should not be the cause of the SSL handshake error.
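For what it's worth, the cluster-related settings for a one-node cluster are the same as for three. A sketch, where the hostname, protocol port, and ZooKeeper address are assumptions:

# nifi.properties (sketch): one node joining itself as a cluster
nifi.cluster.is.node=true
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk.example.com:2181
# With one candidate, the lone node elects the flow immediately instead of waiting
nifi.cluster.flow.election.max.candidates=1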
