Cluster has stale Kerberos client configuration - hadoop

I am using Cloudera Manager 5.3.0. One of my clusters has this configuration issue; the other cluster has no problem.
I can't find a clue to solve this issue, and hours of googling didn't help me. Thanks!
cluster1
Cluster has stale Kerberos client configuration.

The /etc/krb5.conf file on your new node is probably different from the one present on your other nodes (or simply not there at all). Either run 'Deploy Kerberos Client Configuration' from your cluster menu in CM (this will create the krb5.conf file from CM's Kerberos configuration), or copy a working krb5.conf file from another node to the new one.
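If you go the manual-copy route, a rough sketch of the check and copy (run from a node where Kerberos works; new-node is a placeholder hostname, and writing to /etc needs root):
# compare the file on the new node against this known-good node
md5sum /etc/krb5.conf
ssh new-node "md5sum /etc/krb5.conf"
# if the checksums differ, push the working file over
scp /etc/krb5.conf root@new-node:/etc/krb5.conf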

Related

NiFi: Failed to connect node to cluster because local flow is different than cluster flow

After rebooting the server, NiFi does not start. Before the server reboot, I was able to shutdown/start NiFi without any issues.
I ensured that the 3 config files (flow.xml.gz, authorizations.xml, and users.xml) are identical on all the nodes.
2019-12-08 14:36:10,085 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1026)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1009)
at org.apache.nifi.NiFi.<init>(NiFi.java:158)
at org.apache.nifi.NiFi.<init>(NiFi.java:72)
at org.apache.nifi.NiFi.main(NiFi.java:297)
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed Authorizer is not inheritable by the flow controller because of Authorizer differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty.
Also, I ruled out any ZooKeeper corruption issue by deleting the znode for NiFi in the ZooKeeper cluster.
I am on NiFi 1.9.1
Any help is highly appreciated.
This means there is a difference in authorizations.xml or users.xml, most likely authorizations.xml. I would try copying those two files from one of the other nodes over to the node that is having the problem; this will ensure they are exactly the same.
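A minimal sketch of that copy (the hostname bad-node and the conf path are placeholders; adjust to your install):
# run from a healthy node, with NiFi stopped on the problematic one
ssh bad-node "sudo systemctl stop nifi"
scp /opt/nifi/conf/authorizations.xml /opt/nifi/conf/users.xml bad-node:/opt/nifi/conf/
ssh bad-node "sudo systemctl start nifi"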
A suggestion, in case you can't copy flow.xml.gz (as it was with me) because of various process restrictions: you can stop the NiFi service on the problematic node, rename the existing flow.xml.gz to a backup (just to be sure we don't lose it) and restart the NiFi service.
NiFi will automatically generate a new flow.xml.gz and connect the node to the cluster. It worked for me, hence sharing. A rough sketch of the steps is below.
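Assuming a default conf directory and a systemd-managed service (both are placeholders for whatever your install uses):
sudo systemctl stop nifi
# keep the old flow as a backup instead of deleting it
sudo mv /opt/nifi/conf/flow.xml.gz /opt/nifi/conf/flow.xml.gz.bak
# on startup NiFi inherits the cluster flow and writes a fresh flow.xml.gz
sudo systemctl start nifi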
Thanks

Cloudera node /etc/krb5.conf replaced at every reboot

I have a question: why are my Cloudera nodes replacing the file /etc/krb5.conf at every reboot? I'm trying to make modifications, and when someone issues a reboot the file is replaced by the old config file again.
Both CDH and HDP distros have an option to let their Hadoop cluster manager (Cloudera Manager vs. Ambari) also manage the Kerberos client config on all nodes.
Or rather, they have an option not to let it manage it for you...
From CDH 6.3 documentation
Choose whether Cloudera Manager should deploy and manage the krb5.conf on your cluster or not ...this page will let you configure the properties that will be emitted in it. In particular, the safety valves on this page can be used to configure cross-realm authentication.
From HDP 3.1 documentation
(Optional) To manage your Kerberos client krb5.conf manually (and not have Ambari manage the krb5.conf), expand the Advanced krb5-conf section and uncheck the "Manage" option.
(Optional) To not have Ambari install the Kerberos client libraries on all hosts, expand the Advanced kerberos-env section and uncheck the "Install OS-specific Kerberos client package(s)" option.

system_auth replication in Cassandra

I'm trying to configure authentication on Cassandra. It seems like, because of the replication strategy used for system_auth, user credentials can't be replicated to all the nodes in the cluster, so I end up getting "Incorrect credentials" on one node and a successful connection on another.
This is a related question. The answer there says you have to make sure the credentials are always on all nodes.
How do I do that? The option offered there says you have to alter the keyspace to set the replication factor equal to the number of nodes in the cluster, then run a repair on each node. That's a ton of work if you want your Cassandra cluster to be dynamically scalable. If I add one node today and another node some other day, alter the keyspace replication each time and then keep restarting nodes manually, that will end up in some kind of chaos.
Hours of googling actually led to a passing mention of EverywhereStrategy, but I don't see it mentioned as available anywhere in the docs. How do people configure APIs to work with Cassandra authentication then, if you can't be sure that your user is actually present on the node that you're specifying as the contact point?
Obviously, I'm talking about true scale, where you can change the size of the cluster without restarting each node.
When you enable authentication in Cassandra, then yes, you have to increase the system_auth keyspace replication_factor to N (the total number of nodes) and run a complete repair, but you don't need to restart the nodes after you add a new node.
If the repair is taking too long, you can optimize it, for example by repairing only the system_auth keyspace:
nodetool repair system_auth
(or)
nodetool repair -pr system_auth
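A rough sketch of the keyspace change that precedes the repair, assuming a cluster using NetworkTopologyStrategy with a datacenter named dc1 and 5 nodes (both are placeholders for your topology):
# log in with a superuser (cassandra/cassandra is the default) and raise system_auth replication to match the node count
cqlsh -u cassandra -p cassandra -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 5};"
# then repair the keyspace on each node
nodetool repair system_auth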
As per the Cassandra documentation, a complete repair should be done regularly. For more details on repair see the links below:
http://www.datastax.com/dev/blog/repair-in-cassandra
https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra/
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
Answering your questions:
Question: How do people configure APIs to work with Cassandra authentication then, if you can't be sure that your user actually present on node, that you're specifying as contact point?
Answer: I'm using Cassandra 2.2 and the Astyanax Thrift API from my Spring project, and with it I am able to handle Cassandra authentication effectively. Please specify which version of Cassandra you are using and which driver you use to connect: the CQL driver or the Astyanax Thrift API?
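Whichever driver you use, one quick way to sanity-check that a user's credentials have replicated to every node is to authenticate against each node directly (hostnames, user, and password below are placeholders):
# try logging in against each node's native transport port in turn
for host in node1 node2 node3; do
  cqlsh "$host" 9042 -u myuser -p mypass -e "SELECT release_version FROM system.local;" && echo "$host: auth OK" || echo "$host: auth FAILED"
done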
Question: Obviously, talking about true scale, when you can change the size of cluster without doing restarts of each node.
Answer: Yes, you can scale your Cassandra cluster without restarting nodes; please check the DataStax documentation for the Cassandra 2.2 version:
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/operations/opsAddNodeToCluster.html
Check the DataStax docs for the version you are using.

How To Sync the contents to all nodes

I was following this tutorial trying to install and configure Spark on my cluster.
My cluster (5 nodes) is hosted on AWS and installed from Cloudera Manager.
It is mentioned in the tutorial that "Sync the contents of /etc/spark/conf to all nodes." after the modification of the configuration file.
I am really wondering what the easiest way is to make that happen. I read a post with a question similar to mine HERE. Based on my understanding, the configuration files for Hadoop, HDFS, etc. are managed by ZooKeeper or Cloudera Manager, so for those it might make sense to use a CM deploy or ZooKeeper to make it happen.
However, Spark's configuration file is totally outside ZooKeeper's scope. How can I "sync" it to the other nodes?
Many thanks!
Why don't you mount the same EBS volume via NFS to /etc/spark/conf or one of its parents so the files are automatically synced?
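If you'd rather just push the files out than set up a shared mount, a plain rsync loop is one rough way to do it (hostnames are placeholders, and this assumes passwordless SSH between the nodes):
# push the edited Spark config from this node to the rest of the cluster
for host in node2 node3 node4 node5; do
  rsync -av /etc/spark/conf/ "$host":/etc/spark/conf/
done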

Cloudera installation Doubts?

I am new to Cloudera. I installed Cloudera on my system successfully, and I have two doubts:
Consider a machine with some nodes already running Hadoop with some data. Can we install Cloudera to use the existing Hadoop without making any changes or modifications to the data stored in the existing Hadoop?
I installed Cloudera on my machine, and I have another three machines to add to the cluster. I want to know: do I need to install Cloudera on those three machines before adding them to the cluster, or can we add a node to the cluster without installing Cloudera on that particular node?
Thanks in advance; can anyone please give some information about the above questions?
Answer to questions -
1. If you want to migrate to CDH from an existing Apache distribution, you can follow this link.
Excerpt:
Overview
The migration process does require a moderate understanding of Linux
system administration. You should make a plan before you start. You
will be restarting some critical services such as the name node and
job tracker, so some downtime is necessary. Given the value of the
data on your cluster, you’ll also want to be careful to take recent
back ups of any mission-critical data sets as well as the name node
meta-data.
Backing up your data is most important if you’re upgrading from a
version of Hadoop based on an Apache Software Foundation release
earlier than 0.20.
2. The CDH binaries need to be installed and configured on all the nodes to have a CDH-based cluster up and running.
From the Cloudera Manual
You can migrate the data from a CDH3 (or any Apache Hadoop) cluster to a CDH4 cluster by
using a tool that copies out data in parallel, such as the DistCp tool
offered in CDH4.
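For illustration, a parallel copy with DistCp might look roughly like this (the NameNode hostnames and paths are placeholders):
# run from the destination (CDH4) cluster, pulling data from the old cluster over HFTP
hadoop distcp hftp://old-namenode:50070/user/data hdfs://new-namenode:8020/user/data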
Other sources
Regarding your second question,
Again from the manual page
Important:
Before proceeding, you need to decide:
As a general rule:
The NameNode and JobTracker run on the same "master" host unless
the cluster is large (more than a few tens of nodes), and the master
host (or hosts) should not
run the Secondary NameNode (if used), DataNode or TaskTracker
services. In a large cluster, it is especially important that the
Secondary NameNode (if used) runs on a separate machine from the
NameNode. Each node in the cluster except the master host(s) should
run the DataNode and TaskTracker services.
Additionally, if you use Cloudera Manager, it will automatically do all the necessary setup, i.e. install the selected components on the nodes in the cluster.
Off-topic: I had a bad habit of not referring to the manual properly. Have a good look at it; it answers all our questions.
Answer to your second question,
You can add them directly, after installing a few prerequisites such as openssh-clients, firewall configuration, and Java.
These machines (the existing node and the three new nodes) should accept the same username and password, or you should set up passwordless SSH to these hosts (see the sketch below).
You should be connected to the internet while adding the nodes.
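A rough sketch of the passwordless SSH setup from the Cloudera Manager host to the new machines (hostnames are placeholders):
# generate a key on the Cloudera Manager host (skip if one already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# copy the public key to each machine being added
for host in node1 node2 node3; do
  ssh-copy-id "$host"
done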
I hope it will help you:)
