Hadoop Kerberos: hdfs command fails with 'Failed to find any Kerberos tgt' even though I got a ticket using kinit - hadoop

I set up Kerberos authentication for my Hadoop cluster. When I try to get a Kerberos ticket using kinit, it stores the ticket in krb5cc_0:
$ sudo klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hduser/stwhdrm01@FDATA.COM
Valid starting Expires Service principal
01/04/2018 10:15:14 01/05/2018 10:15:14 krbtgt/FDATA.COM@FDATA.COM
But when I try to list an HDFS directory on the command line I get the following error:
$ hdfs dfs -ls /
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_1001
18/01/04 10:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/01/04 10:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
My /etc/krb5.conf:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = FDATA.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
default_ccache_name = FILE:/tmp/krb5cc_0
[realms]
FDATA.COM = {
kdc = kdc.fdata.com
admin_server = kdc.fdata.com
}
[domain_realm]
.fdata.com = FDATA.COM
fdata.com = FDATA.COM
OS: Centos 7
Kerberos: MIT Kerberos 1.5.1
Hadoop: Apache Hadoop 2.7.3
Why are hdfs and kinit using different Kerberos ccache files?

Because you called kinit with sudo, not as yourself, your klist output shows the Kerberos ticket for root (cache /tmp/krb5cc_0). The hdfs command runs as your own user, so it looks in /tmp/krb5cc_1001, as the KinitOptions debug line shows.
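The split can be made concrete. A minimal sketch, assuming both caches are the default per-uid FILE caches (the paths mirror the question; the hduser uid of 1001 is inferred from the debug output, and is only illustrative):

```shell
# MIT krb5's default FILE ccache is /tmp/krb5cc_<uid>, so "sudo kinit"
# (uid 0) and a plain "hdfs dfs" run as hduser consult different files.
root_cache="FILE:/tmp/krb5cc_0"           # written by: sudo kinit
user_uid="$(id -u hduser 2>/dev/null || echo 1001)"
user_cache="FILE:/tmp/krb5cc_${user_uid}" # read by: hdfs dfs -ls /
echo "sudo klist sees: ${root_cache}"
echo "hdfs looks in:   ${user_cache}"
# Fix: run kinit without sudo (or export the same KRB5CCNAME for both).
```

Running kinit as the same user that runs hdfs, or exporting one explicit KRB5CCNAME before both commands, makes both tools agree on a single cache.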

Related

Unable to access Hadoop CLI after enabling Kerberos

I've followed this tutorial: CDH Hadoop Kerberos. The NameNode and DataNodes start properly and I can see all the DataNodes listed on the WebUI (0.0.0.0:50070), but I'm unable to access the Hadoop CLI. I also followed Certain Java versions cannot read credentials cache, and I still cannot use the Hadoop CLI.
[root@local9 hduser]# hadoop fs -ls /
20/11/03 12:24:32 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:24:32 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:24:32 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "local9/192.168.2.9"; destination host is: "local9":8020;
[root@local9 hduser]# klist
Ticket cache: KEYRING:persistent:0:krb_ccache_hVEAjWz
Default principal: hdfs/local9@FBSPL.COM
Valid starting Expires Service principal
11/03/2020 12:22:42 11/04/2020 12:22:42 krbtgt/FBSPL.COM@FBSPL.COM
renew until 11/10/2020 12:22:12
[root@local9 hduser]# kinit -R
[root@local9 hduser]# klist
Ticket cache: KEYRING:persistent:0:krb_ccache_hVEAjWz
Default principal: hdfs/local9@FBSPL.COM
Valid starting Expires Service principal
11/03/2020 12:24:50 11/04/2020 12:24:50 krbtgt/FBSPL.COM@FBSPL.COM
renew until 11/10/2020 12:22:12
[root@local9 hduser]# hadoop fs -ls /
20/11/03 12:25:04 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:25:04 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
20/11/03 12:25:04 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "local9/192.168.2.9"; destination host is: "local9":8020;
Any Help would be greatly appreciated.
I figured out the issue.
It's a credential cache bug in Red Hat: Red Hat Bugzilla – Bug 1029110
Then I found this Cloudera document on Kerberos: Manage krb5.conf
Finally, the solution was to comment out this line in /etc/krb5.conf:
default_ccache_name = KEYRING:persistent:%{uid}
I was able to access the Hadoop CLI after commenting out this line.
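For reference, the relevant edit in /etc/krb5.conf looks like this (a sketch of the answer above; when default_ccache_name is absent, libkrb5 falls back to a FILE cache under /tmp, which Java's GSS code can read):

```ini
[libdefaults]
# Commented out: Java cannot read the Linux kernel-keyring credential cache.
# default_ccache_name = KEYRING:persistent:%{uid}
```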

Can't access HDFS when Hadoop cluster kerberized

I successfully kerberized a test Hortonworks cluster. Ambari created keytabs for the services and they all started. There is HA for the NameNodes: the standby NameNode starts quickly, while the active NameNode takes much longer. The NameNode UI shows that everything is correct, and I can log in using Kerberos.
Namenodes are nn1.zim.com and nn2.zim.com
What can be wrong with this configuration?
I log in as hdfs and load the keytab with kinit -kt. When I try to list HDFS I get this error:
[root#nn1 hdfs]# hdfs dfs -ls /
18/12/02 16:18:22 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/12/02 16:18:22 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "nn1.zim.com/192.168.50.10"; destination host is: "nn2.zim.com":8020; , while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over nn2.zim.com/192.168.50.11:8020 after 1 failover attempts. Trying to failover after sleeping for 1123ms.
Kerberos principal for hosts are:
nn1.zim.com/192.168.50.10@ZIM.COM
nn1.zim.com@ZIM.COM
nn2.zim.com/192.168.50.11@ZIM.COM
nn2.zim.com@ZIM.COM
host/nn1.zim.com@ZIM.COM
host/nn2.zim.com@ZIM.COM
The krb5.conf:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
default_realm = ZIM.COM
default_ccache_name = KEYRING:persistent:%{uid}
[realms]
ZIM.COM = {
kdc = kb.zim.com
admin_server = kb.zim.com
}
[domain_realm]
.zim.com = ZIM.COM
zim.com = ZIM.COM
SOLUTION: Two Kerberos principals have to be created for each host: the FQDN one and the short one. I had created only the FQDN one (nn1.zim.com), which was the cause of the issue. After creating the second principal (nn1), everything started to work.
When you work with Active Directory, both types of principals are created automatically when the AD Computer object is created.
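As a sketch of the fix, the missing short-name principals can be generated alongside the FQDN ones. The loop below only prints the kadmin commands (a dry run, using the hostnames from the question); you would feed its output to kadmin.local on the KDC to actually create them:

```shell
# Dry run: emit an addprinc line for both the FQDN and the short form of
# each NameNode host. Pipe the output to kadmin.local to apply for real.
for name in nn1.zim.com nn1 nn2.zim.com nn2; do
  echo "addprinc -randkey host/${name}@ZIM.COM"
done
```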

Spark and secured phoenix not working on yarn

I am trying to connect to secured Phoenix through Spark on YARN using JDBC, and I can see in the logs that it connects successfully:
JDBC URL: jdbc:phoenix:zookeeper_quorum:/hbase-secure:someprincial@REALM.COM:/path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 INFO ConnectionQueryServicesImpl: Trying to connect to a secure cluster with keytab:/path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 DEBUG UserGroupInformation: hadoop login
18/02/27 09:30:22 DEBUG UserGroupInformation: hadoop login commit
18/02/27 09:30:22 DEBUG UserGroupInformation: using kerberos user:someprincial@REALM.COM
18/02/27 09:30:22 DEBUG UserGroupInformation: Using user: "someprincial@REALM.COM" with name someprincial@REALM.COM
18/02/27 09:30:22 DEBUG UserGroupInformation: User entry: "someprincial@REALM.COM"
18/02/27 09:30:22 INFO UserGroupInformation: Login successful for user someprincial@REALM.COM using keytab file /path/to/keytab/someprincipal.keytab
18/02/27 09:30:22 INFO ConnectionQueryServicesImpl: Successfull login to secure cluster!!
But later, when AbstractRpcClient is called, it fails: UserGroupInformation is no longer using KERBEROS authentication, and it seems to pick up the OS user instead of the one I provided in the JDBC URL:
18/02/27 09:30:23 DEBUG AbstractRpcClient: RPC Server Kerberos principal name for service=ClientService is hbase/hbaseprincipa@REALM.COM
18/02/27 09:30:23 DEBUG AbstractRpcClient: Use KERBEROS authentication for service ClientService, sasl=true
18/02/27 09:30:23 DEBUG AbstractRpcClient: Connecting to some.host.name/10.000.145.544:16020
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedAction as:someuser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
18/02/27 09:30:23 DEBUG HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/hbaseprincipa@REALM.COM
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedActionException as:someuser (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/02/27 09:30:23 DEBUG UserGroupInformation: PrivilegedAction as:someuser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:637)
18/02/27 09:30:23 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/02/27 09:30:23 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:379)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:201)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:338)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 26 more
This issue only happens when I run on YARN; when I run locally, it uses the same UserGroupInformation and connects to ClientService without any issues.
Do you have any idea why is this happening?
I already included in my classpath (executor) all the needed configurations like hbase-site.xml, core-site.xml, hdfs-site.xml, I also set the JAAS config file.
I've noticed locally that in the beginning the UGI is taken from my OS; then, when I connect to Phoenix, Phoenix (ConnectionQueryServicesImpl.java) overrides the UGI with the one I indicated in the JDBC URL, so subsequent connections use the correct UGI.
When running in the cluster it does not behave like that: even though I connected to Phoenix successfully, when the UGI is used again it is the one from the OS - and I am running in the same executor.
Notice that RpcClientImpl uses CurrentUser, which is based on the OS user.
In my driver, whenever I get the CurrentUser, it uses Kerberos authentication with the principal - assuming kinit has been done, or the keytab and principal are provided on the spark-submit command line.
In the executor, when there is a valid token on the node, LoginUser is set to Kerberos authentication but CurrentUser is set to simple authentication using OS information.
How can I make the executor change the CurrentUser?
Anyway, I was able to solve it by forcing the update via the LoginUser, using the UserGroupInformation.doAs() method.
After several weeks, I finally figured it out. The key is setting spark.yarn.security.credentials.hbase.enabled to true.
Submit spark as following:
spark-submit \
--master yarn \
--keytab my-keytab \
--principal my-principal \
--conf spark.yarn.security.credentials.hbase.enabled=true \
# other configs
And in the executor, create the Phoenix connection without keytab and principal:
String url = "jdbc:phoenix:2.1.8.1:2181:/hbase";
Connection conn = DriverManager.getConnection(url, properties);

KrbException connecting to Hadoop cluster with Zookeeper client - UNKNOWN_SERVER

My Zookeeper client is having trouble connecting to the Hadoop cluster.
This works fine from a Linux VM, but I am using a Mac.
I set the -Dsun.security.krb5.debug=true flag on the JVM and get the following output:
Found ticket for solr@DDA.MYCO.COM to go to krbtgt/DDA.MYCO.COM@DDA.MYCO.COM expiring on Sat Apr 29 03:15:04 BST 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for solr@DDA.MYCO.COM to go to krbtgt/DDA.MYCO.COM@DDA.MYCO.COM expiring on Sat Apr 29 03:15:04 BST 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbKdcReq send: kdc=oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com UDP:88, timeout=30000, number of retries =3, #bytes=682
>>> KDCCommunication: kdc=oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com UDP:88, timeout=30000,Attempt =1, #bytes=682
>>> KrbKdcReq send: #bytes read=217
>>> KdcAccessibility: remove oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError:
cTime is Thu Dec 24 11:18:15 GMT 2015 1450955895000
sTime is Fri Apr 28 15:15:06 BST 2017 1493388906000
suSec is 925863
error code is 7
error Message is Server not found in Kerberos database
cname is solr@DDA.MYCO.COM
sname is zookeeper/oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com@DDA.MYCO.COM
msgType is 30
KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
at org.apache.zookeeper.client.ZooKeeperSaslClient$2.run(ZooKeeperSaslClient.java:366)
at org.apache.zookeeper.client.ZooKeeperSaslClient$2.run(ZooKeeperSaslClient.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:362)
at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:348)
at org.apache.zookeeper.client.ZooKeeperSaslClient.sendSaslPacket(ZooKeeperSaslClient.java:420)
at org.apache.zookeeper.client.ZooKeeperSaslClient.initialize(ZooKeeperSaslClient.java:458)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1057)
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
... 18 more
ERROR 2017-04-28 15:15:07,046 5539 org.apache.zookeeper.client.ZooKeeperSaslClient [main-SendThread(oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com:2181)]
An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed
[Caused by GSSException: No valid credentials provided
(Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)])
occurred when evaluating Zookeeper Quorum Member's received SASL token.
This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly.
You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment.
Zookeeper Client will go to AUTH_FAILED state.
I've tested Kerberos config as follows:
>kinit -kt /etc/security/keytabs/solr.headless.keytab solr
>klist
Credentials cache: API:3451691D-7D5E-49FD-A27C-135816F33E4D
Principal: solr@DDA.MYCO.COM
Issued Expires Principal
Apr 28 16:58:02 2017 Apr 29 04:58:02 2017 krbtgt/DDA.MYCO.COM@DDA.MYCO.COM
Following the instructions from Hortonworks, I managed to get the Kerberos ticket stored in a file:
>klist -c FILE:/tmp/krb5cc_501
Credentials cache: FILE:/tmp/krb5cc_501
Principal: solr@DDA.MYCO.COM
Issued Expires Principal
Apr 28 17:10:25 2017 Apr 29 05:10:25 2017 krbtgt/DDA.MYCO.COM@DDA.MYCO.COM
I also tried the JVM option suggested in the stack trace (-Dsun.net.spi.nameservice.provider.1=dns,sun), but this led to a different error along the lines of Client session timed out, which suggests this JVM param prevents the client from connecting in the first place.
==EDIT==
Seems that the Mac version of Kerberos is not the latest:
> krb5-config --version
Kerberos 5 release 1.7-prerelease
I just tried brew install krb5 to install a newer version, then adjusted my PATH to point to the new version.
> krb5-config --version
Kerberos 5 release 1.15.1
This has had no effect whatsoever on the outcome.
NB this works fine from a linux VM on my Mac, using exactly the same jaas.conf, keytab files, and krb5.conf.
krb5.conf:
[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = DDA.MYCO.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
[realms]
DDA.MYCO.COM = {
admin_server = oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
kdc = oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
}
Reverse DNS:
I checked that the FQDN hostname I'm connecting to can be found using a reverse DNS lookup:
> host 10.252.132.160
160.132.252.10.in-addr.arpa domain name pointer oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com.
This is exactly as per the response to the same command from the linux VM.
===WIRESHARK ANALYSIS===
Using Wireshark configured with the system keytabs allows a bit more detail in the analysis.
Here I have found that a failed call looks like this:
client -> host AS-REQ
host -> client AS-REP
client -> host AS-REQ
host -> client AS-REP
client -> host TGS-REQ <-- this call is detailed below
host -> client KRB error KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
The erroneous TGS-REQ call shows the following:
Kerberos
tgs-req
pvno: 5
msg-type: krb-tgs-req (12)
padata: 1 item
req-body
Padding: 0
kdc-options: 40000000 (forwardable)
realm: DDA.MYCO.COM
sname
name-type: kRB5-NT-UNKNOWN (0)
sname-string: 2 items
SNameString: zookeeper
SNameString: oc-10-252-134-51.nat-ucfc2z3b.usdv1.mycloud.com
till: 1970-01-01 00:00:00 (UTC)
nonce: 797021964
etype: 3 items
ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
ENCTYPE: eTYPE-DES3-CBC-SHA1 (16)
ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
Here is the corresponding successful call from the linux box, which is followed by several more exchanges.
Kerberos
tgs-req
pvno: 5
msg-type: krb-tgs-req (12)
padata: 1 item
req-body
Padding: 0
kdc-options: 40000000 (forwardable)
realm: DDA.MYCO.COM
sname
name-type: kRB5-NT-UNKNOWN (0)
sname-string: 2 items
SNameString: zookeeper
SNameString: d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
till: 1970-01-01 00:00:00 (UTC)
nonce: 681936272
etype: 3 items
ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
ENCTYPE: eTYPE-DES3-CBC-SHA1 (16)
ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
So it looks like the client is sending
oc-10-252-134-51.nat-ucfc2z3b.usdv1.mycloud.com
as the server host, when it should be sending:
d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
So the question is, how do I fix that? Bear in mind this is a Java piece of code.
My /etc/hosts has the following:
10.252.132.160 b3e073.ddapoc.ucfc2z3b.usdv1.mycloud.com
10.252.134.51 d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
10.252.132.139 d7cc18.ddapoc.ucfc2z3b.usdv1.mycloud.com
And my krb5.conf file has:
kdc = d7cc18.ddapoc.ucfc2z3b.usdv1.mycloud.com
kdc = b3e073.ddapoc.ucfc2z3b.usdv1.mycloud.com
kdc = d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
I tried adding -Dsun.net.spi.nameservice.provider.1=file,dns as a JVM param but got the same result.
I fixed this by setting up a local dnsmasq instance to supply the forward and reverse DNS lookups.
So now from the command line, host d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com returns 10.252.134.51
See also here and here.
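The two captures above differ only in the hostname embedded in the service principal: the client constructs the SPN as zookeeper/<resolved-hostname>@REALM, so whichever name DNS (or /etc/hosts) returns is the one the KDC must know. A small illustration using the two names from the traces:

```shell
# The SPN the client requests is built from the hostname it resolved.
# NAT name (failing on the Mac) vs. internal name (working on the VM):
for h in oc-10-252-134-51.nat-ucfc2z3b.usdv1.mycloud.com \
         d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com; do
  echo "zookeeper/${h}@DDA.MYCO.COM"
done
```

Only the second SPN exists in the KDC database, which is why fixing name resolution (dnsmasq here) fixes the UNKNOWN_SERVER error.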
Looks like a DNS issue.
Could this SO question help you resolve your problem?
Also, here is a Q&A about the problem.
It could also be caused by a non-Sun JVM.

Kerberos Authentication on Hadoop Cluster

I have prepared a 2-node cluster with plain Apache Hadoop. These nodes act as Kerberos clients to another machine which acts as the Kerberos server.
The KDC DB and the hdfs principals on each machine are created, with their keytab files and proper encryption types, using AES.
The required hdfs-site, core-site, mapred-site, yarn-site and container-executor.cfg files are modified. Also, for unlimited-strength security, the JCE policy files are placed in the $JAVA_HOME/lib/security directory.
When starting the namenode daemon, it works fine. But while accessing HDFS as
hadoop fs -ls /
we get the below error:
15/02/06 15:17:12 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "xxxxxxx/10.122.48.12"; destination host is: "xxxxxxx":8020;
If anyone has prior knowledge of, or has worked with, Kerberos on top of Hadoop, kindly suggest a solution to the above issue.
To use Hadoop commands, you need to run kinit to get a Kerberos ticket first:
kinit [-kt user_keytab username]
Once it's done, you can list the ticket with:
klist
See Cloudera's doc for more details: Verify that Kerberos Security is Working