I am seeing some errors in my NiFi cluster. I have a 3-node secured NiFi cluster, and the errors below appear on 2 of the nodes:
ERROR [main] org.apache.nifi.web.server.JettyServer Unable to load flow due to:
java.io.IOException: org.apache.nifi.cluster.ConnectionException:
Failed to connect node to cluster due to: java.io.IOException:
Could not begin listening for incoming connections in order to load balance data across the cluster.
Please verify the values of the 'nifi.cluster.load.balance.port' and 'nifi.cluster.load.balance.host'
properties as well as the 'nifi.security.*' properties
See the clustering configuration guide for the list of clustering options you have to configure. For load balancing, you'll need to specify ports that are open in your firewall so that the nodes can communicate. You'll also need to make sure that each host has its node hostname property set, its host ports set, and that there are no firewall restrictions between the nodes and your Apache ZooKeeper cluster.
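For illustration, here is a minimal sketch of the relevant nifi.properties entries on one node (the hostnames and ports are placeholders, not values from the original post):

# nifi.properties on the first node (example values only)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=11443
# Host/port used for load-balanced connections; 6342 is NiFi's default port
nifi.cluster.load.balance.host=nifi-node1.example.com
nifi.cluster.load.balance.port=6342

Each of these ports, along with the ZooKeeper client port, must be reachable between the nodes.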
If you want to simplify the setup to play around, you can use the information in the clustering configuration section of the admin guide to set up an embedded ZooKeeper node within each NiFi instance. However, I would recommend setting up an external ZooKeeper cluster. A little more work, but ultimately worth it.
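As a sketch of the embedded-ZooKeeper variant (hostnames again placeholders), each node's nifi.properties would include something like:

nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=nifi-node1.example.com:2181,nifi-node2.example.com:2181,nifi-node3.example.com:2181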
If I want to use distCp on an on-prem hadoop cluster, so it can 'push' data to external cloud storage, what firewall considerations must be made in order to leverage this tool? What ports does the actual transfer of data take place on? Is it via SSH, and/or port 8020? I need to make sure network connectivity is provided for source to destination, but with the least amount of privileges ascribed to it. (i.e., only opening ports that are absolutely needed)
I do not believe SSH is used for the actual data transfer; it only comes into play if, for example, you SSH into the cluster to start the command.
At a minimum, it would be the RPC and data-transfer ports for the NameNodes and DataNodes, so whatever you've configured for fs.defaultFS, dfs.namenode.rpc-address, and dfs.datanode.address.
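As an illustrative sketch (hostnames are placeholders, and the defaults shown are the usual Hadoop 2.x ones, so check your own configs), the values to inspect would look something like:

fs.defaultFS = hdfs://nn1.example.com:8020 (core-site.xml)
dfs.namenode.rpc-address = nn1.example.com:8020 (hdfs-site.xml)
dfs.datanode.address = 0.0.0.0:50010 (hdfs-site.xml)

The firewall would then need to allow the destination cluster's NameNode RPC port (8020 here) and DataNode transfer port (50010 here) from the nodes on the source cluster that run the distCp map tasks.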
For example, I have 3 NiFi nodes in a NiFi cluster. Example addresses of these nodes:
192.168.12.50:8080(primary)
192.168.54.60:8080
192.168.95.70:8080
I know that I can access the NiFi REST API from all NiFi nodes. I have a GetHTTP processor that gets the cluster summary from the REST API, and this processor runs only on the primary node. I set the "URL" property of this processor to 192.168.12.50:8080/nifi-api/controller/cluster.
But if the primary node goes down, a new primary node will be elected, and I will no longer be able to reach 192.168.12.50:8080 from the new primary node, because that node is down. So I will not be able to get the cluster summary from the REST API.
In this case, can I use "localhost:8080/nifi-api/controller/cluster" instead of "192.168.12.50:8080/nifi-api/controller/cluster" on each node in the NiFi cluster?
It depends on a few things. If you are running securely, then you have certificates generated for each node, specific to its hostname; the host in the web request needs to match the host in the certificate, so you can't use localhost in that case.
It also depends on how NiFi's web server is configured. If nifi.web.http.host or nifi.web.https.host has a specific hostname specified, then the web server is bound only to that hostname and may not accept connections with a different hostname. In a default unsecured setup, if you leave nifi.web.http.host blank, then it binds to all interfaces.
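For illustration, a default unsecured nifi.properties might contain (the port is the example value from the question):

# A blank host binds the web server to all interfaces
nifi.web.http.host=
nifi.web.http.port=8080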
You may be able to use the expression language hostname() function to obtain the hostname of the current node, so you could make the URL something like "http://${hostname()}:8080/nifi-api/controller/cluster".
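If your certificates use fully qualified names, note that hostname() optionally takes a boolean argument; hostname(true) returns the fully qualified hostname, so under the same assumptions as above the URL would look like:

http://${hostname(true)}:8080/nifi-api/controller/cluster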
We have 2 locations connected by VPN.
Currently we have 2 independent Graylog servers.
We want to create some kind of cluster, so we can reach the logs on both sides even if the VPN is down.
It would be something like this (diagram not included):
We already tried to create an Elasticsearch cluster, but this is not the way.
If the VPN is down, the whole cluster is down and the logs are not available on either side.
I found this article: https://www.elastic.co/blog/scaling_elasticsearch_across_data_centers_with_kafka
with a topology like this (diagram not included):
but I have no idea how to configure Apache Kafka so that it acts as a broker for Graylog and as an input for the syslog server.
Any help, other ideas, or links would be much appreciated.
I have a question about clustering in Elasticsearch, specifically about reconnection within the cluster.
I have 2 Elasticsearch servers on 2 different machines within a network. Both Elasticsearch instances are in the same cluster.
In an error scenario the network connection could be broken. I simulate this behaviour by pulling the network cable on one server.
After reconnecting the server to the network, the clustering no longer works. When I put some data into one Elasticsearch instance, the data is not transferred to the other.
Does anybody know if there are settings that control the reconnection?
Best Regards
Thomas
Why not just put all the Elasticsearch servers behind a load balancer with a single DNS name? There could be an issue with a server that goes down and needs manual intervention; after the problem is corrected, the server will automatically become available under the load balancer again.
Did you check whether all the nodes joined the cluster again?
You may want to try the following APIs:
Check nodes status
http://es-host:9200/_nodes
Check cluster status
http://es-host:9200/_cluster/health
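For example, a small script along these lines (es-host is a placeholder; the fields used are the standard ones these endpoints return) can confirm that both nodes rejoined:

import json
from urllib.request import urlopen

ES_HOST = "http://es-host:9200"  # placeholder; point this at either node

# _cluster/health reports the overall status and how many nodes have joined
health = json.load(urlopen(ES_HOST + "/_cluster/health"))
print(health["status"], health["number_of_nodes"])

# _nodes lists every node currently in the cluster
nodes = json.load(urlopen(ES_HOST + "/_nodes"))
for node_id, info in nodes["nodes"].items():
    print(node_id, info["name"])

If number_of_nodes is still 1 after the cable is plugged back in, the second node never rejoined, and the node itself or its discovery settings need attention.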
I have a clustered NameNode setup. The NameNodes are configured to be active and passive.
When I make a WebHDFS call, the URL to be provided is
http://<HOST>:<PORT>/webhdfs/v1/<PATH>
Since I have 2 NameNodes available, I have 2 URLs available:
http://<HOST1>:<PORT>/webhdfs/v1/<PATH> - it's active now
http://<HOST2>:<PORT>/webhdfs/v1/<PATH> - it's passive now
My question is: the NameNodes can fail over at any time. What value do I provide for HOST? Should I give the service name? Is there a virtual IP, normally configured in the HDP platform, that takes care of the redirection?
Or should I place a load balancer or gateway in front of the NameNodes so that the failover is handled without any impact on the calling application?
It's a bug; it doesn't work in HA mode.
You have to explicitly put in the active NameNode URL every time the NameNode changes its state.
https://hortonworks.jira.com/browse/BUG-30030
You will get an exception if you're talking to an inactive NameNode.
See my answer here: Any command to get active namenode for nameservice in hadoop?
You must determine the active NameNode first, then issue your WebHDFS API request to it. Issuing WebHDFS API requests to a standby NameNode will result in an HTTP 403 error.
There is no automatic way to determine the active NameNode when using WebHDFS yet. You can use the hdfs command line client to query the configuration, or alternatively, loop through the NameNodes, issue JMX API requests to the /jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus endpoint, and parse the output, as in the sketch below.
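A minimal sketch of the JMX approach (the NameNode addresses and the 50070 web port are placeholders; the NameNodeStatus bean exposes a State field that reads "active" or "standby"):

import json
from urllib.request import urlopen

# Placeholder NameNode web addresses; replace with your own pair
NAMENODES = ["http://nn1.example.com:50070", "http://nn2.example.com:50070"]
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"

def find_active_namenode():
    for nn in NAMENODES:
        try:
            beans = json.load(urlopen(nn + JMX_QUERY))["beans"]
        except OSError:
            continue  # node unreachable; try the next one
        if beans and beans[0].get("State") == "active":
            return nn
    return None

active = find_active_namenode()
if active:
    print("Active NameNode:", active)
    # Then issue WebHDFS requests against it, e.g.:
    # urlopen(active + "/webhdfs/v1/tmp?op=LISTSTATUS")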