Kerberos Authentication on Hadoop Cluster

I have prepared a two-node cluster with plain Apache Hadoop. These nodes act as Kerberos clients to another machine, which acts as the Kerberos server.
The KDC database and the hdfs principals on each machine have been created, along with their keytab files, using the proper encryption types (AES).
The required hdfs-site.xml, core-site.xml, mapred-site.xml, yarn-site.xml and container-executor.cfg files have been modified. For unlimited-strength cryptography, the JCE policy files have also been placed in the $JAVA_HOME/lib/security directory.
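(For context, on the core-site.xml side such a setup normally boils down to at least the two standard properties below; this is a generic reference snippet, not the asker's exact files:)
<!-- core-site.xml: switch Hadoop from simple to Kerberos authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>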
Starting the NameNode daemon works fine, but when accessing HDFS with
hadoop fs -ls /
we get the error below:
15/02/06 15:17:12 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "xxxxxxx/10.122.48.12"; destination host is: "xxxxxxx":8020;
If anyone has prior experience running Kerberos on top of Hadoop, kindly suggest a solution to the above issue.

To use Hadoop commands, you first need to obtain a Kerberos ticket with kinit:
kinit [-kt user_keytab username]
Once that's done, you can list the ticket with:
klist
See Cloudera's documentation for more details: Verify that Kerberos Security is Working
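For programmatic clients, the equivalent of kinit is a keytab login through Hadoop's UserGroupInformation API. A minimal sketch, with a hypothetical principal and keytab path:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHdfsLs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Must match the cluster's core-site.xml
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Hypothetical principal and keytab path - substitute your own
        UserGroupInformation.loginUserFromKeytab(
                "hdfs/node1.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/hdfs.keytab");
        // Equivalent of: hadoop fs -ls /
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus stat : fs.listStatus(new Path("/"))) {
            System.out.println(stat.getPath());
        }
    }
}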

Unable to access Hadoop CLI after enabling Kerberos

I've followed this tutorial: CDH Hadoop Kerberos. The NameNode and DataNodes are able to start properly, and I can see all the DataNodes listed in the WebUI (0.0.0.0:50070), but I'm unable to access the Hadoop CLI. I also followed this tutorial, Certain Java versions cannot read credentials cache, and I'm still unable to use the Hadoop CLI.
[root@local9 hduser]# hadoop fs -ls /
20/11/03 12:24:32 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:24:32 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:24:32 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "local9/192.168.2.9"; destination host is: "local9":8020;
[root@local9 hduser]# klist
Ticket cache: KEYRING:persistent:0:krb_ccache_hVEAjWz
Default principal: hdfs/local9@FBSPL.COM
Valid starting       Expires              Service principal
11/03/2020 12:22:42  11/04/2020 12:22:42  krbtgt/FBSPL.COM@FBSPL.COM
        renew until 11/10/2020 12:22:12
[root@local9 hduser]# kinit -R
[root@local9 hduser]# klist
Ticket cache: KEYRING:persistent:0:krb_ccache_hVEAjWz
Default principal: hdfs/local9@FBSPL.COM
Valid starting       Expires              Service principal
11/03/2020 12:24:50  11/04/2020 12:24:50  krbtgt/FBSPL.COM@FBSPL.COM
        renew until 11/10/2020 12:22:12
[root@local9 hduser]# hadoop fs -ls /
20/11/03 12:25:04 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:25:04 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:25:04 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "local9/192.168.2.9"; destination host is: "local9":8020;
Any help would be greatly appreciated.
I figured out the issue.
It's a credential cache bug in Red Hat: Red Hat Bugzilla – Bug 1029110
Then I found this document about Kerberos on Cloudera: Manage krb5.conf
Finally, the solution was to comment out this line in /etc/krb5.conf:
default_ccache_name = KEYRING:persistent:%{uid}
I was able to access the Hadoop CLI after commenting out this line.
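(For reference, the relevant [libdefaults] stanza ends up looking roughly like this after the change; illustrative only, with the realm taken from the transcript above:)
[libdefaults]
    default_realm = FBSPL.COM
    # Commented out so libkrb5 falls back to a FILE: credential cache
    # (e.g. /tmp/krb5cc_<uid>), which the JDK's GSS-API code can read:
    #default_ccache_name = KEYRING:persistent:%{uid}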

Kerberos HBase Zookeeper fails

I'm trying to Kerberise my HBase cluster and I'm getting some problems with Zookeeper. When I start HBase, I get this error in the Master log:
ERROR [main-SendThread(X.X.X.X:2181)] client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
ERROR [main-SendThread(X.X.X.X:2181)] zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
DEBUG [main-EventThread] zookeeper.ZKWatcher: master:16000-0x16c236187be0000, quorum=Y.Y.Y.Y:2181,X.X.X.X:2181, baseZNode=/hbase Received ZooKeeper Event, type=None, state=AuthFailed, path=null
DEBUG [main] zookeeper.ZooKeeper: Close called on already closed client
In the Zookeeper log, I get:
WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to /X.X.X.X:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
I checked my firewall; the ports are open.
For the configuration, I followed the HBase Reference Guide :
http://hbase.apache.org/book.html#zk.sasl.auth
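(For context, the zk.sasl.auth section of that guide drives client authentication to Zookeeper through a JAAS file along these lines; the keytab path and principal here are placeholders, not the asker's values:)
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  storeKey=true
  keyTab="/path/to/hbase.keytab"
  principal="hbase/host.example.com@EXAMPLE.COM";
};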
At first I thought it was a problem with my keytab, but Hadoop works fine with it.
I'm running HBase 2.0.5 and Hadoop 3.1.2, and the Zookeeper is the one provided by HBase.
Following @SamsonScharfrichter's comment, I've tried a few things:
I created and specified in /etc/hosts the FQDNs of my servers and modified my configurations to reflect this change.
I changed the hostnames of my servers to the FQDNs.
I tried to nslookup my hostnames; that didn't work, since they are only specified in /etc/hosts.
None of it changed anything; I'm still getting the error. My guess is that Kerberos looks for a DNS server on my public NIC and not my private one. I don't know why it struggles so hard to find my servers, since Hadoop has absolutely no problem with them.
EDIT - I set up a private DNS server on my network. DNS is working great, but I'm still getting the error. I'm about to give up.
EDIT 2 - I installed tshark on the node with the error. Apparently I get a frame with the message:
Error: KRB5KDC_ERR_C_PRINCIPAL_UNKNOWN
which is weird, since I verified my keytab and the principals listed in kadmin. Maybe there are default principals that I'm not using?

PriviledgedActionException (failed to find any kerberos tgt)

I am connecting to HDFS using Kerberos as the authentication mechanism. I am running a job which takes 3 days to complete, and I am getting this error:
org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:user_name(auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
I have also set up a crontab entry which does kinit with the same keytab every hour, but I still get the same error and the job fails to complete.
Please help me.
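(An external kinit from cron generally cannot help a JVM that is already running, because Hadoop caches its login inside the process. A common remedy is to relogin from the keytab inside the long-running job itself; a minimal sketch using Hadoop's UserGroupInformation API, with a hypothetical principal and keytab path:)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// At job startup: log in from the keytab once.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(
        "user_name@EXAMPLE.COM", "/path/to/user_name.keytab");

// Then periodically (e.g. hourly) from within the job:
// re-acquires the TGT from the keytab when it is close to expiry.
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();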

Spark and secured phoenix not working on yarn

I am trying to connect to secured Phoenix through Spark on YARN using JDBC, and I can see in the logs that it connects successfully:
JDBC URL: jdbc:phoenix:zookeeper_quorum:/hbase-secure:someprincial@REALM.COM:/path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 INFO ConnectionQueryServicesImpl: Trying to connect to a secure cluster with keytab:/path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 DEBUG UserGroupInformation: hadoop login
18/02/27 09:30:22 DEBUG UserGroupInformation: hadoop login commit
18/02/27 09:30:22 DEBUG UserGroupInformation: using kerberos user:someprincial@REALM.COM
18/02/27 09:30:22 DEBUG UserGroupInformation: Using user: "someprincial@REALM.COM" with name someprincial@REALM.COM
18/02/27 09:30:22 DEBUG UserGroupInformation: User entry: "someprincial@REALM.COM"
18/02/27 09:30:22 INFO UserGroupInformation: Login successful for user someprincial@REALM.COM using keytab file /path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 INFO ConnectionQueryServicesImpl: Successfull login to secure cluster!!
But later, when AbstractRpcClient is called, it runs into a problem: it no longer uses KERBEROS authentication in UserGroupInformation, and it seems to pick up the OS user instead of the one I provided in the JDBC URL.
18/02/27 09:30:23 DEBUG AbstractRpcClient: RPC Server Kerberos principal name for service=ClientService is hbase/hbaseprincipa@REALM.COM
18/02/27 09:30:23 DEBUG AbstractRpcClient: Use KERBEROS authentication for service ClientService, sasl=true
18/02/27 09:30:23 DEBUG AbstractRpcClient: Connecting to some.host.name/10.000.145.544:16020
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedAction as:someuser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
18/02/27 09:30:23 DEBUG HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/hbaseprincipa@REALM.COM
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedActionException as:someuser (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedAction as:someuser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:637)
18/02/27 09:30:23 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/02/27 09:30:23 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:379)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:201)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:338)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 26 more
This issue only happens when I run on YARN; when I run locally, it uses the same UserGroupInformation and connects to ClientService without any issues.
Do you have any idea why this is happening?
I have already included in my (executor) classpath all the needed configuration files such as hbase-site.xml, core-site.xml and hdfs-site.xml, and I have also set the JAAS config file.
I've noticed locally that in the beginning the UGI is the one from my OS; then, when I try to connect to Phoenix, Phoenix (ConnectionQueryServicesImpl.java) overrides the UGI with the one I indicated in the JDBC URL, so subsequent connections use the correct UGI.
When running in the cluster, it seems this is not the case: even though I connect to Phoenix successfully, when the UGI is used again it picks up the one from the OS, even though I am running in the same executor.
Note that RpcClientImpl uses CurrentUser, which is based on the OS user.
In my driver, whenever I get the CurrentUser, it uses Kerberos authentication with the principal, assuming kinit was done or the keytab and principal were provided on the spark-submit command line.
In the executor, when there is a valid token on the node, LoginUser is set to Kerberos authentication but CurrentUser is set to simple authentication using the OS information.
How can I make the executor change the CurrentUser?
Anyway, I was able to solve it by forcibly doing the call as the LoginUser with the UserGroupInformation.doAs() method.
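(A minimal sketch of that workaround; the jdbcUrl variable is a placeholder for the Phoenix URL used above:)
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.security.UserGroupInformation;

// Run the Phoenix/HBase calls as the Kerberos LoginUser instead of
// the executor's OS-derived CurrentUser.
UserGroupInformation.getLoginUser().doAs(
    (PrivilegedExceptionAction<Void>) () -> {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            // ... run Phoenix queries here ...
        }
        return null;
    });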
After several weeks, I finally figured it out. The key is setting spark.yarn.security.credentials.hbase.enabled to true.
Submit Spark as follows:
spark-submit \
--master yarn \
--keytab my-keytab \
--principal my-principal \
--conf spark.yarn.security.credentials.hbase.enabled=true \
# other configs
And in the executor, create the Phoenix connection without a keytab and principal:
// Only quorum, port and znode in the URL - no principal or keytab needed,
// since the HBase delegation token obtained by YARN is used instead.
String url = "jdbc:phoenix:2.1.8.1:2181:/hbase";
Connection conn = DriverManager.getConnection(url, properties);
