NameNode Format error "failure to login for principal: X from keytab Y: Unable to obtain password from user" with Kerberos in a Hadoop cluster - hadoop

I've been setting up Kerberos on my Hadoop cluster on Ubuntu 20.04.1 LTS, and when I try to reformat the NameNode from the command line after changing all the config files and setting everything up (including principals and keytabs), I'm met with the error:
Exiting with status 1: org.apache.hadoop.security.KerberosAuthException: failure to login: for principal: hdfs/hadoopmaster.406bigdata.com@406BIGDATA.COM from keytab /etc/security/keytabs/hdfs.service.keytab javax.security.auth.login.LoginException: Unable to obtain password from user
This is taking place on my master node, with host name "hadoopmaster". Keytabs are stored in /etc/security/keytabs and when checking the keytabs using klist -t -k -e, the keytab has the correct principal "hdfs/hadoopmaster.406bigdata.com@406BIGDATA".
My hdfs-site.xml file contains the following properties (it includes more, but they are not shown below as they shouldn't be relevant to the error):
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/keytabs/hdfs.service.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/hadoopmaster.406bigdata.com@406BIGDATA.COM</value>
</property>
I also have YARN set up with keytabs and principals; it starts up fine (the log files have been checked and show no errors) and I can access the WebUI.
I've tried moving the keytabs outside of the root directory, double-checked the /etc/hosts file, and verified that the keytab file has the correct permissions and ownership, but nothing has fixed the issue.

What happens when you su to the hdfs user and try to use the keytab? That is, does the hdfs user have permission to access the file?
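As a quick sanity check (a hedged sketch; the paths and principal below are the ones from the question), switch to the user that runs the NameNode and try to log in from the keytab directly:
# switch to the account that runs the NameNode (assuming it is 'hdfs')
sudo su - hdfs
# confirm the keytab is readable by this user
ls -l /etc/security/keytabs/hdfs.service.keytab
# try to obtain a ticket with the exact principal named in hdfs-site.xml
kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/hadoopmaster.406bigdata.com@406BIGDATA.COM
klist
If the kinit fails with the same "Unable to obtain password from user" message, the keytab is either unreadable by that user or does not contain that exact principal (or a matching encryption type).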

Related

"Cannot set priority of datanode process" Unable to start datanode due to keytab password issue

I am attempting to start a datanode on a kerberised hadoop cluster, however when I run the command:
sudo ./hdfs --config $HADOOP_HOME/etc/hadoop --daemon start datanode
I am met with the error "ERROR: Cannot set priority of datanode process". In the logs I can see that the error is:
org.apache.hadoop.security.KerberosAuthException: failure to login: for principal: hdfs/hadoopworker1.securerealm.com@SECUREREALM.COM from keytab /etc/security/keytabs/hdfs.service.keytab javax.security.auth.login.LoginException: Unable to obtain password from user
I can confirm that the keytab definitely exists in the specified location and that the specified principal exists. The principal and keytab are referenced the same way in hdfs-site.xml and yarn-site.xml. My master node references its keytabs the same way without issue, but the keytabs for my workers seem to be causing problems.
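A hedged way to narrow this down (the path and principal are taken from the question; the datanode user is an assumption) is to verify, on the worker itself, that the keytab really contains the principal and is readable by the user the datanode runs as:
# list the principals, timestamps, and encryption types stored in the keytab on the worker
klist -kte /etc/security/keytabs/hdfs.service.keytab
# check ownership/permissions; the user that launches the datanode must be able to read it
ls -l /etc/security/keytabs/hdfs.service.keytab
# try a manual login from the keytab with the exact principal named in hdfs-site.xml
kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/hadoopworker1.securerealm.com@SECUREREALM.COM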

Access hdfs from outside the cluster

I have a Hadoop cluster on AWS and I am trying to access it from outside the cluster through a Hadoop client. I can successfully run hdfs dfs -ls and see all the contents, but when I try to put or get a file I get this error:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.FsShell.displayError(FsShell.java:304)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:289)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
I have hadoop 2.6.0 installed in both my cluster and my local machine. I have copied the conf files of the cluster to the local machine and have these options in hdfs-site.xml (along with some other options).
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.enable</name>
<value>false</value>
</property>
My core-site.xml contains a single property in both the cluster and the client:
<property>
<name>fs.defaultFS</name>
<value>hdfs://public-dns:9000</value>
<description>NameNode URI</description>
</property>
I found similar questions but wasn't able to find a solution to this.
How about you SSH into that machine?
I know this is a very bad idea, but to get the work done you can first copy the file onto the cluster/master using scp, then SSH in and run hdfs dfs -put on that copied local file.
You can also automate this via a script, but again, this is just to get the work done for now.
Wait for someone else to answer to know the proper way!
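For what it's worth, a minimal sketch of that workaround (the hostname, user, and file paths are placeholders, not from the question):
# copy the file to the master node, then push it into HDFS from there
scp ./data.csv ubuntu@master.example.com:/tmp/data.csv
ssh ubuntu@master.example.com 'hdfs dfs -put /tmp/data.csv /user/ubuntu/ && rm /tmp/data.csv'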
I had a similar issue with my cluster when running hadoop fs -get and I was able to resolve it. Just check whether all your datanodes are resolvable by FQDN (Fully Qualified Domain Name) from your local host. In my case the nc command succeeded using the datanodes' IP addresses but not with their hostnames.
Run the below command (50010 is the default datanode port):
for i in $(cat /<host list file>); do nc -vz $i 50010; done
When you run any hadoop command, it tries to connect to the datanodes using their FQDNs, and that's where it gives this weird NPE.
Do the export below and then run your hadoop command:
export HADOOP_ROOT_LOGGER=DEBUG,console
You will see that this NPE occurs when it is trying to connect to a datanode for data transfer.
I had some Java code that was also doing the equivalent of hadoop fs -get using the APIs, and there the exception was clearer:
java.lang.Exception: java.nio.channels.UnresolvedAddressException
Let me know if this helps you.
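In addition, a hedged way to check name resolution directly from the client (the hostnames below are placeholders; use the FQDNs from your own datanode list):
# confirm each datanode FQDN resolves from the client machine
for host in datanode1.example.com datanode2.example.com; do
  getent hosts "$host" || echo "cannot resolve $host"
done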

Kerberos defaulting to wrong principal when accessing hdfs from remote server

I've configured kerberos to access hdfs from a remote server and I am able to authenticate and generate a ticket but when I try to access hdfs I am getting an error:
09/02 15:50:02 WARN ipc.Client: Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hdp.stack.com@GLOBAL.STACK.COM
In our krb5.conf file, we defined the admin_server and kdc under a different realm:
DEV.STACK.COM = {
admin_server = hdp.stack.com
kdc = hdp.stack.com
}
Why is it defaulting to a different realm that is also defined in our krb5.conf (GLOBAL.STACK.COM)? I have ensured that all our Hadoop XML files use @DEV.STACK.COM.
Any ideas? Any help much appreciated!
In your krb5.conf, you should define explicitly which machine belongs to which realm, starting with generic rules and then adding exceptions.
E.g.
[domain_realm]
stack.com = GLOBAL.STACK.COM
stack.dev = DEV.STACK.COM
vm123456.dc01.stack.com = DEV.STACK.COM
srv99999.dc99.stack.com = DEV.STACK.COM
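After updating [domain_realm], a quick hedged check (the user principal below is a placeholder) is to obtain a ticket and confirm which realm actually gets used:
# obtain a ticket in the intended realm and inspect it
kinit someuser@DEV.STACK.COM
klist
# the "Default principal" and the krbtgt entry should both show DEV.STACK.COM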
We have a Spark cluster that needs to connect to HDFS on Cloudera, and we were getting the same error.
8/08/07 15:04:45 WARN Client: Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/host1@REALM.DEV
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/host1@REALM.DEV; Host Details : local host is: "workstation1.local/10.1.0.62"; destination host is: "host1":8020;
Based on the above post by ohshazbot and another post on the Cloudera site https://community.cloudera.com/t5/Cloudera-Manager-Installation/Kerberos-issue/td-p/29843, we modified the core-site.xml file (in the Spark cluster, ....spark/conf/core-site.xml), adding the following property, and the connection was successful:
<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>*</value>
</property>
I recently bumped into an issue with this using HDP 2.2 on 2 separate clusters, and after hooking up a debugger to the client I found the issue, which may affect other flavors of Hadoop.
There is an hdfs config property, dfs.namenode.kerberos.principal.pattern, which dictates whether a principal is valid. If the principal doesn't match the pattern AND doesn't match the current cluster's principal, then you get the exception you saw. In HDP 2.3 and higher, as well as CDH 5.4 and higher, it looks like this is set to a default of *, which means everything matches. But in HDP 2.2 it's not in the defaults, so this error occurs whenever you try to talk to a remote kerberized HDFS from your existing kerberized HDFS.
Simply adding this property with *, or any other pattern which matches both local and remote principal names, resolves the issue.
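If you want to confirm what pattern your client is actually using, a hedged check (assuming the hdfs client on your machine picks up the same configuration files) is:
# print the effective value of the pattern; the command reports the key as missing if it is not set
hdfs getconf -confKey dfs.namenode.kerberos.principal.pattern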
Run:
ls -lart
and check for the keytab file, e.g. .example.keytab.
If the keytab file is there, run:
klist -kt keytabfilename
e.g. klist -kt .example.keytab
You'll get a principal like example@EXAMPLE.COM in the output.
Then run:
kinit -kt .example.keytab example@EXAMPLE.COM
i.e. kinit -kt <keytab file> <principal>
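If the kinit succeeds, you can confirm a ticket was actually obtained for that principal:
# show the cached ticket-granting ticket
klist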

Cloudera Hadoop access with Kerberos gives TokenCache error : Can't get Master Kerberos principal for use as renewer

I am trying to access Cloudera Hadoop setup (HIVE + Impala) from Mac Book Pro OS X 10.8.4.
We have Cloudera CDH-4.3.0 installed on Linux servers. I have extracted CDH-4.2.0 tarball to my Mac Book Pro.
I have set up the proper configuration and Kerberos credentials so that commands like 'hadoop fs -ls /' work and the HIVE shell starts up.
However when I do 'show databases' command it gives following error:
> hive
> show databases;
>
Failed with exception java.io.IOException:java.io.IOException: Can't get Master Kerberos principal for use as renewer
The error is related to TokenCache.
When I searched for the error, it seems the following method, 'obtainTokensForNamenodesInternal', throws this error when it tries to get a delegation token for a specific FS and fails.
http://hadoop.apache.org/docs/current/api/src-html/org/apache/hadoop/mapreduce/security/TokenCache.html
On the client side I don't see any errors in the HIVE shell logs. I have also tried using the CDH 4.3.0 tarballs with the same configuration and I get the same error.
Any help or pointers for resolving this error would be highly appreciated.
It seems that you have not configured Kerberos for YARN.
Add the following configuration to your yarn-site.xml:
<property>
<name>yarn.nodemanager.principal</name>
<value>yarn_principal/fqdn@_HOST</value>
</property>
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn_principal/fqdn@_HOST</value>
</property>
Create a new Gateway YARN role instance on the host from Cloudera Manager. It will automatically set up and update yarn-site.xml.
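A hedged way to confirm the principals are actually present on the client (the config path below is an assumption; adjust it to wherever your client configuration lives):
# look for the YARN principal entries the TokenCache needs
grep -A1 'principal' /etc/hadoop/conf/yarn-site.xml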

hadoop hdfs points to file:/// not hdfs://

So I installed Hadoop via Cloudera Manager (cdh3u5) on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
Now, it goes without saying that I can access hdfs:// through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop filesystem as file:///
Question is, has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this issue.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>
The reason Accumulo is confused is that it's using the same default configuration to figure out where HDFS is... and it's defaulting to file://
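As a hedged sanity check after editing core-site.xml (the paths below assume the layout from the answer):
# confirm the property is set in the client configuration
grep -A1 'fs.default.name' $HADOOP_HOME/conf/core-site.xml
# this should now list HDFS contents rather than the local filesystem
hadoop fs -ls /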
We should specify the namenode metadata directory and datanode data directory, as well as the default filesystem:
dfs.name.dir / dfs.namenode.name.dir (namenode metadata, in hdfs-site.xml),
dfs.data.dir / dfs.datanode.data.dir (datanode data, in hdfs-site.xml),
fs.default.name (in core-site.xml),
and then format the namenode.
To format the HDFS NameNode:
hadoop namenode -format
Enter 'Yes' to confirm formatting the namenode. Restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure the client configuration is deployed correctly and that it points to the actual cluster endpoints.
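A hedged way to double-check the directories and default filesystem the daemons will actually use (hdfs getconf is available on Hadoop 2.x clients; on very old CDH3 installs it may not exist):
# print the effective values from the loaded configuration
hdfs getconf -confKey fs.default.name
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir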
