One-node Apache NiFi cluster

I am setting up a secure cluster with NiFi 1.7.1 using an external ZooKeeper.
While I do plan to have 3 nodes in the cluster, at the moment I am only going to start one node.
Is that a valid cluster from a NiFi standpoint?
The reason for my question is that I get an SSL handshake error during startup, and I want to rule out that having a single node is the cause of this problem.
Thanks
Vijay

Yes, a single node can be run as a cluster.
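For reference, a minimal sketch of the cluster-related nifi.properties entries for a one-node cluster; the hostnames and ports below are placeholders, not taken from your setup:

  # nifi.properties (cluster-related entries only)
  nifi.cluster.is.node=true
  nifi.cluster.node.address=nifi-node1.example.com
  nifi.cluster.node.protocol.port=11443
  # with only one candidate, the node elects its own flow immediately
  nifi.cluster.flow.election.max.candidates=1
  nifi.cluster.flow.election.max.wait.time=1 min
  # external ZooKeeper ensemble
  nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

Setting nifi.cluster.flow.election.max.candidates=1 keeps the lone node from waiting for the other two nodes during flow election when it starts up.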

Related

NiFi: Failed to connect node to cluster because local flow is different than cluster flow

After rebooting the server, NiFi does not start. Before the server reboot, I was able to shutdown/start NiFi without any issues.
I ensured that the 3 config files (flow.xml.gz, authorizations.xml, and users.xml) are identical on all the nodes.
2019-12-08 14:36:10,085 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1026)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1009)
at org.apache.nifi.NiFi.<init>(NiFi.java:158)
at org.apache.nifi.NiFi.<init>(NiFi.java:72)
at org.apache.nifi.NiFi.main(NiFi.java:297)
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed Authorizer is not inheritable by the flow controller because of Authorizer differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty.
Also, I ruled out any ZooKeeper corruption issue by deleting the znode for NiFi in the ZooKeeper cluster.
I am on NiFi 1.9.1.
Any help is highly appreciated.
This means there is a difference in authorizations.xml or users.xml, most likely authorizations.xml. I would try copying those two files from one of the other nodes over to the node that is having the problem; this ensures they are exactly the same.
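If it helps, a rough sketch of that copy; the hostname, user, and paths are placeholders for your own layout:

  # on the node that fails to join, with NiFi stopped
  ./bin/nifi.sh stop
  scp admin@good-node:/opt/nifi/conf/authorizations.xml ./conf/authorizations.xml
  scp admin@good-node:/opt/nifi/conf/users.xml ./conf/users.xml
  ./bin/nifi.sh start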
A suggestion, in case you can't copy flow.xml.gz (as it was with me) because of various process restrictions: you can stop the NiFi service on the problematic node, rename the existing flow.xml.gz to a backup (just to be sure you don't lose it), and restart the NiFi service.
NiFi will automatically generate a new flow.xml.gz and connect the node to the cluster. It worked for me, hence sharing.
Thanks
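A sketch of that sequence, assuming a default conf/ layout (adjust the paths to your install):

  ./bin/nifi.sh stop
  mv ./conf/flow.xml.gz ./conf/flow.xml.gz.bak   # keep a backup rather than deleting it
  ./bin/nifi.sh start                            # the node inherits the flow from the cluster and writes a new flow.xml.gz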

What is the best way to test Hadoop?

I have completed the Hadoop cluster setup with 3 journal nodes for QJM, 4 datanodes, 2 namenodes, and 3 ZooKeeper servers, but I need to confirm whether the connectivity between them has been established successfully. So I am searching for a tool which can perform the following tasks:
1) Check which namenode is currently in the active state
2) Check whether both namenodes are communicating with each other successfully
3) Check whether all journal nodes are communicating with each other successfully
4) Check whether all ZooKeeper servers are communicating with each other successfully
5) Check which ZooKeeper server is currently playing the leader role
Is there any tool, or are there any commands, available to check the above?
Can anyone please help me solve this?
Using Ambari you can monitor the complete cluster's performance and health.
Also, if you want to validate your Hadoop jobs programmatically, you can use the concept of Counters in Hadoop.
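For the command-line checks asked about above, the stock HDFS and ZooKeeper tools cover most of it. A sketch, assuming the NameNode IDs configured in hdfs-site.xml are nn1 and nn2 and zk-host stands in for each ZooKeeper host (both placeholders):

  hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
  hdfs haadmin -getServiceState nn2
  hdfs dfsadmin -report                # lists live/dead datanodes as seen by the active namenode
  zkServer.sh status                   # run on each ZooKeeper host; prints Mode: leader or follower
  echo ruok | nc zk-host 2181          # replies "imok" if that ZooKeeper server is up (four-letter words must be allowed)

JournalNode connectivity shows up in the active NameNode's log and in each JournalNode's web UI rather than through a dedicated command.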

Storm-ZooKeeper transaction logs extremely large

I'm using a ZooKeeper cluster (3 machines) for my Storm cluster (4 machines). The problem is that, because of the topologies deployed on the Storm cluster, the ZooKeeper transaction logs grow extremely large and fill the ZooKeeper disk. What is really strange is that those logs are not divided into multiple files; instead I have one big transaction log file on every ZooKeeper machine, so the autopurge settings in my ZooKeeper configuration have no effect on those files.
Is there a way to solve this problem from the ZooKeeper side, or can I change the way Storm uses ZooKeeper to minimize the size of those logs?
Note: I'm using ZooKeeper 3.6.4 and Storm 0.9.6.
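For context, the ZooKeeper settings that control how often the transaction log rolls over and which old files get purged live in zoo.cfg; a sketch with illustrative values (tune them to your write volume):

  # zoo.cfg
  snapCount=100000              # take a snapshot and start a new txn log after this many transactions (the default; lower it to roll more often)
  autopurge.snapRetainCount=3   # keep only the 3 newest snapshots plus the txn logs they need
  autopurge.purgeInterval=1     # run the purge task every hour (0 disables it)

Autopurge only removes files older than the retained snapshots, so if snapshots never roll over it will appear to have no effect on a single giant transaction log.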
I was able to resolve this problem by using Pacemaker to process heartbeats from the workers instead of ZooKeeper. That allowed me to avoid writing to the ZooKeeper disk to maintain consistency and to use an in-memory store instead. In order to be able to use Pacemaker I upgraded to Storm 1.0.2.
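In case it saves someone a lookup, this is roughly how Pacemaker gets wired in through storm.yaml on Storm 1.x; I'm recalling the keys from the Pacemaker docs, so treat them as assumptions and verify against the docs for your version:

  # storm.yaml (host name is a placeholder)
  pacemaker.host: "pacemaker-host.example.com"
  pacemaker.port: 6699
  # keep worker heartbeats in Pacemaker's in-memory store instead of ZooKeeper
  storm.cluster.state.store: "org.apache.storm.pacemaker.pacemaker_state_factory"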

Is there a way to restrict the application master to launch on particular nodes?

I have a cluster set up with nodes that are not reliable and can go down (they are AWS spot instances). I am trying to make sure that my application master only launches on the reliable nodes (AWS on-demand instances) of the cluster. Is there a workaround for this? My cluster is managed by Hortonworks Ambari.
This can be achieved by using node labels. I was able to use the Spark property spark.yarn.am.nodeLabelExpression to restrict my application master to a set of nodes while running Spark on YARN. Add the node labels to whichever nodes you want to use for application masters.
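A rough sketch of those steps; the label name ondemand and the host name are placeholders, and node labels must first be enabled with yarn.node-labels.enabled=true plus a yarn.node-labels.fs-store.root-dir in yarn-site.xml (which Ambari can manage for you):

  # create a non-exclusive label and attach it to the on-demand hosts
  yarn rmadmin -addToClusterNodeLabels "ondemand(exclusive=false)"
  yarn rmadmin -replaceLabelsOnNode "ondemand-host-1.example.com=ondemand"
  # pin only the application master to the labeled nodes
  spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.am.nodeLabelExpression=ondemand \
    <your application jar and arguments>

Only the AM is pinned; executors without a label expression keep running in the default (unlabeled) partition, i.e. on the spot nodes.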

Does Zookeeper need to have its own server with HAMR?

This is in regard to a big data analytics engine put out by http://hamrtech.com
Does Zookeeper have to be on its own server with HAMR?
No, it does not have to be part of the HAMR cluster, but every node within the cluster must have access to it.
