KrbException connecting to Hadoop cluster with Zookeeper client - UNKNOWN_SERVER - hadoop

My Zookeeper client is having trouble connecting to the Hadoop cluster.
This works fine from a Linux VM, but I am using a Mac.
I set the -Dsun.security.krb5.debug=true flag on the JVM and get the following output:
Found ticket for solr@DDA.MYCO.COM to go to krbtgt/DDA.MYCO.COM@DDA.MYCO.COM expiring on Sat Apr 29 03:15:04 BST 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for solr@DDA.MYCO.COM to go to krbtgt/DDA.MYCO.COM@DDA.MYCO.COM expiring on Sat Apr 29 03:15:04 BST 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbKdcReq send: kdc=oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com UDP:88, timeout=30000, number of retries =3, #bytes=682
>>> KDCCommunication: kdc=oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com UDP:88, timeout=30000,Attempt =1, #bytes=682
>>> KrbKdcReq send: #bytes read=217
>>> KdcAccessibility: remove oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError:
cTime is Thu Dec 24 11:18:15 GMT 2015 1450955895000
sTime is Fri Apr 28 15:15:06 BST 2017 1493388906000
suSec is 925863
error code is 7
error Message is Server not found in Kerberos database
cname is solr@DDA.MYCO.COM
sname is zookeeper/oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com@DDA.MYCO.COM
msgType is 30
KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
at org.apache.zookeeper.client.ZooKeeperSaslClient$2.run(ZooKeeperSaslClient.java:366)
at org.apache.zookeeper.client.ZooKeeperSaslClient$2.run(ZooKeeperSaslClient.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:362)
at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:348)
at org.apache.zookeeper.client.ZooKeeperSaslClient.sendSaslPacket(ZooKeeperSaslClient.java:420)
at org.apache.zookeeper.client.ZooKeeperSaslClient.initialize(ZooKeeperSaslClient.java:458)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1057)
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
... 18 more
ERROR 2017-04-28 15:15:07,046 5539 org.apache.zookeeper.client.ZooKeeperSaslClient [main-SendThread(oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com:2181)]
An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed
[Caused by GSSException: No valid credentials provided
(Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)])
occurred when evaluating Zookeeper Quorum Member's received SASL token.
This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly.
You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment.
Zookeeper Client will go to AUTH_FAILED state.
I've tested Kerberos config as follows:
>kinit -kt /etc/security/keytabs/solr.headless.keytab solr
>klist
Credentials cache: API:3451691D-7D5E-49FD-A27C-135816F33E4D
Principal: solr@DDA.MYCO.COM
Issued Expires Principal
Apr 28 16:58:02 2017 Apr 29 04:58:02 2017 krbtgt/DDA.MYCO.COM@DDA.MYCO.COM
Following the instructions from Hortonworks, I managed to get the Kerberos ticket stored in a file:
>klist -c FILE:/tmp/krb5cc_501
Credentials cache: FILE:/tmp/krb5cc_501
Principal: solr@DDA.MYCO.COM
Issued Expires Principal
Apr 28 17:10:25 2017 Apr 29 05:10:25 2017 krbtgt/DDA.MYCO.COM@DDA.MYCO.COM
I also tried the JVM option suggested in the stack trace (-Dsun.net.spi.nameservice.provider.1=dns,sun), but this led to a different error along the lines of Client session timed out, which suggests that this JVM param prevents the client from connecting correctly in the first place.
==EDIT==
It seems that the Mac's bundled version of Kerberos is not the latest:
> krb5-config --version
Kerberos 5 release 1.7-prerelease
I tried brew install krb5 to install a newer version, then adjusted my PATH to point to the new version.
> krb5-config --version
Kerberos 5 release 1.15.1
This has had no effect whatsoever on the outcome.
NB: this works fine from a Linux VM on my Mac, using exactly the same jaas.conf, keytab files, and krb5.conf.
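(The jaas.conf itself isn't shown in the post; for context, a typical client-side jaas.conf for this setup would be the standard ZooKeeper Client section, sketched here with the keytab path and principal taken from the commands above — illustrative only:)
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/solr.headless.keytab"
  storeKey=true
  useTicketCache=false
  principal="solr@DDA.MYCO.COM";
};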
krb5.conf:
[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = DDA.MYCO.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
[realms]
DDA.MYCO.COM = {
admin_server = oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
kdc = oc-10-252-132-139.nat-ucfc2z3b.usdv1.mycloud.com
}
Reverse DNS:
I checked that the FQDN hostname I'm connecting to can be found using a reverse DNS lookup:
> host 10.252.132.160
160.132.252.10.in-addr.arpa domain name pointer oc-10-252-132-160.nat-ucfc2z3b.usdv1.mycloud.com.
This matches the response to the same command on the Linux VM.
===WIRESHARK ANALYSIS===
Using Wireshark configured with the system keytabs allows a bit more detail in the analysis.
Here I have found that a failed call looks like this:
client -> host AS-REQ
host -> client AS-REP
client -> host AS-REQ
host -> client AS-REP
client -> host TGS-REQ <-- this call is detailed below
host -> client KRB error KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
The erroneous TGS-REQ call shows the following:
Kerberos
tgs-req
pvno: 5
msg-type: krb-tgs-req (12)
padata: 1 item
req-body
Padding: 0
kdc-options: 40000000 (forwardable)
realm: DDA.MYCO.COM
sname
name-type: kRB5-NT-UNKNOWN (0)
sname-string: 2 items
SNameString: zookeeper
SNameString: oc-10-252-134-51.nat-ucfc2z3b.usdv1.mycloud.com
till: 1970-01-01 00:00:00 (UTC)
nonce: 797021964
etype: 3 items
ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
ENCTYPE: eTYPE-DES3-CBC-SHA1 (16)
ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
Here is the corresponding successful call from the Linux box, which is followed by several more exchanges.
Kerberos
tgs-req
pvno: 5
msg-type: krb-tgs-req (12)
padata: 1 item
req-body
Padding: 0
kdc-options: 40000000 (forwardable)
realm: DDA.MYCO.COM
sname
name-type: kRB5-NT-UNKNOWN (0)
sname-string: 2 items
SNameString: zookeeper
SNameString: d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
till: 1970-01-01 00:00:00 (UTC)
nonce: 681936272
etype: 3 items
ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
ENCTYPE: eTYPE-DES3-CBC-SHA1 (16)
ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
So it looks like the client is sending
oc-10-252-134-51.nat-ucfc2z3b.usdv1.mycloud.com
as the server host, when it should be sending:
d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
So the question is: how do I fix that? Bear in mind this is Java code.
My /etc/hosts has the following:
10.252.132.160 b3e073.ddapoc.ucfc2z3b.usdv1.mycloud.com
10.252.134.51 d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
10.252.132.139 d7cc18.ddapoc.ucfc2z3b.usdv1.mycloud.com
And my krb5.conf file has:
kdc = d7cc18.ddapoc.ucfc2z3b.usdv1.mycloud.com
kdc = b3e073.ddapoc.ucfc2z3b.usdv1.mycloud.com
kdc = d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com
I tried adding -Dsun.net.spi.nameservice.provider.1=file,dns as a JVM param but got the same result.
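(A quick way to see which name the JVM itself resolves for that address is a small diagnostic like the one below — a sketch only, not part of the ZooKeeper client. InetAddress.getCanonicalHostName() performs a reverse lookup on the resolved address, which seems to be where the oc-10-252-134-51... name is coming from.)
import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // hostname taken from the question; replace as needed
        InetAddress addr = InetAddress.getByName("d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com");
        System.out.println("address:        " + addr.getHostAddress());
        // reverse lookup of the resolved address; if this prints the oc-... name,
        // the JVM's name resolution is the likely culprit
        System.out.println("canonical name: " + addr.getCanonicalHostName());
    }
}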

I fixed this by setting up a local dnsmasq instance to supply the forward and reverse DNS lookups.
So now from the command line, host d59407.ddapoc.ucfc2z3b.usdv1.mycloud.com returns 10.252.134.51
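For anyone hitting the same thing, a minimal sketch of the dnsmasq side (assuming a Homebrew install; dnsmasq answers forward and reverse lookups from /etc/hosts by default, so the three host entries above are enough once the Mac's DNS is pointed at 127.0.0.1):
# /usr/local/etc/dnsmasq.conf -- path assumed for a Homebrew install
listen-address=127.0.0.1
# A and PTR answers come from /etc/hosts (default dnsmasq behaviour),
# so the three entries above are enough
# forward everything else to the normal resolver (replace with your usual DNS server)
server=10.0.0.1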

Looks like some DNS issue.
Could this SO question help you resolve your problem?
Also, here is a Q&A about the problem.
It could also be because of a non-Sun JVM.

Related

WebLogic Server 12c can't log in to localhost - Exception weblogic.nodemanager.NMConnectException: Connection refused

I use an Oracle WebLogic Server 12c, and when I start it from Eclipse it shows this error. It doesn't let me log in to localhost, nor access it from another computer in a different domain; the log says the server is not reachable.
This is the exception that appears in Eclipse.
This Exception occurred at Wed Nov 13 08:11:07 CET 2019.
weblogic.nodemanager.NMConnectException: Connection refused: connect. Could not connect to NodeManager. Check that it is running at localhost/127.0.0.1:5556.
Problem invoking WLST - Traceback (innermost last):
File "C:\Oracle\Middleware\Oracle_Home\user_projects\domains\base_domain\bin\scripts_manejados\StartBT.wlst", line 1, in ?
File "<iostream>", line 111, in nmConnect
File "<iostream>", line 552, in raiseWLSTException
WLSTException: Error occurred while performing nmConnect : Cannot connect to Node Manager. : Connection refused: connect. Could not connect to NodeManager. Check that it is running at localhost/127.0.0.1:5545.
This is my nodemanager.properties:
#Tue Nov 12 09:45:58 CET 2019
#Node manager properties
#Fri Jun 24 14:55:43 CEST 2016
DomainsFile=C\:\\Oracle\\Middleware\\Oracle_Home\\user_projects\\domains\\base_domain\\nodemanager\\nodemanager.domains
LogLimit=0
PropertiesVersion=12.1.3
AuthenticationEnabled=true
NodeManagerHome=C\:\\Oracle\\Middleware\\Oracle_Home\\user_projects\\domains\\base_domain\\nodemanager
JavaHome=C\:\\Program Files\\Java\\jdk1.7.0_75
LogLevel=INFO
DomainsFileEnabled=true
StartScriptName=startWebLogic.cmd
ListenAddress=localhost
NativeVersionEnabled=true
ListenPort=5540
LogToStderr=true
SecureListener=false
LogCount=1
StopScriptEnabled=false
QuitEnabled=false
LogAppend=true
StateCheckInterval=500
CrashRecoveryEnabled=false
StartScriptEnabled=true
LogFile=C\:\\Oracle\\Middleware\\Oracle_Home\\user_projects\\domains\\base_domain\\nodemanager\\nodemanager.log
LogFormatter=weblogic.nodemanager.server.LogFormatter
ListenBacklog=50
And this is my StartBT.wlst script, used to start the server and specify some preferences:
nmConnect('weblogic','AXLWL20040','localhost','5521','base_domain','C:/Oracle/Middleware/Oracle_Home/user_projects/domains/base_domain','plain')
nmStart('AdminServer')
nmDisconnect()
I want to make it accessible, not only locally.
You have configured the ports wrongly. Your Node Manager listens on port 5540:
ListenPort=5540
but your script passes a different port to nmConnect:
nmConnect('weblogic','AXLWL20040','localhost','5521','base_domain','C:/Oracle/Middleware/Oracle_Home/user_projects/domains/base_domain','plain')
and the exception shows the connection attempt going to yet another port, 5545:
WLSTException: ... Could not connect to NodeManager. Check that it is running at localhost/127.0.0.1:5545
The port passed to nmConnect must match the Node Manager's ListenPort.
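A sketch of the corrected call, assuming you keep ListenPort=5540 and leave everything else as in your script:
nmConnect('weblogic','AXLWL20040','localhost','5540','base_domain','C:/Oracle/Middleware/Oracle_Home/user_projects/domains/base_domain','plain')
Alternatively, change ListenPort in nodemanager.properties to the port your script uses. Note that ListenAddress=localhost also limits the Node Manager to loopback connections, so making it reachable from other machines would require changing that as well.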

Java 8 cryptography issue

I have developed a biometric authentication system on Java 8u144 and an Active Directory password reset using LDAPS on Java 8u191. When I tried to combine them, the biometric encryption first failed with an invalid key size error. I installed the JCE Unlimited Strength policy files; the biometric part then started working, but the LDAPS connection still fails during the SSL handshake with "PKIX path building failed".
I am not able to fix it. Please help me out; I am running out of time.
I am getting the following exception:
%% Invalidated: [Session-1, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384]
http-nio-8084-exec-2, SEND TLSv1.2 ALERT: fatal, description = certificate_unknown
http-nio-8084-exec-2, WRITE: TLSv1.2 Alert, length = 2
[Raw write]: length = 7
0000: 15 03 03 00 02 02 2E .......
http-nio-8084-exec-2, called closeSocket()
http-nio-8084-exec-2, handling exception: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
On Java 8u144 without the JCE policy files the code was working, but the biometric part needed unlimited strength. I have tried Java 8u144, 8u162, and 8u191, and am currently using 8u162.
The exception above appears only after the JCE upgrade.
Kindly guide me on how to get the certificate chain for this.
Note: LDP.exe works successfully on the client, and OpenSSL reports "unable to verify first certificate".
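(For reference, the usual way to pull the server's certificate chain and trust it in the JVM is something like the following sketch; your-ad-host and ad-root-ca.cer are placeholders, and the LDAPS port is assumed to be 636:)
$ openssl s_client -connect your-ad-host:636 -showcerts
$ keytool -importcert -alias ad-ldaps -file ad-root-ca.cer -keystore "$JAVA_HOME/jre/lib/security/cacerts"
The default cacerts password is changeit. The OpenSSL "unable to verify first certificate" message usually means OpenSSL wasn't given the CA certificates (e.g. via -CAfile) or the server is sending an incomplete chain.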

Hadoop Kerberos: hdfs command 'Failed to find any Kerberos tgt' even though I already got a ticket using kinit

I set up Kerberos authentication for a Hadoop cluster. When I get a Kerberos ticket using kinit, it stores the ticket in krb5cc_0:
$ sudo klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hduser/stwhdrm01@FDATA.COM
Valid starting Expires Service principal
01/04/2018 10:15:14 01/05/2018 10:15:14 krbtgt/FDATA.COM@FDATA.COM
But when I tried to list an HDFS directory on the command line, I got the following error:
$ hdfs dfs -ls /
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_1001
18/01/04 10:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/01/04 10:07:48 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
My /etc/krb5.conf:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = FDATA.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
default_ccache_name = FILE:/tmp/krb5cc_0
[realms]
FDATA.COM = {
kdc = kdc.fdata.com
admin_server = kdc.fdata.com
}
[domain_realm]
.fdata.com = FDATA.COM
fdata.com = FDATA.COM
OS: CentOS 7
Kerberos: MIT Kerberos 1.5.1
Hadoop: Apache Hadoop 2.7.3
Why are hdfs and kinit using different Kerberos ccache files?
Because you called kinit with sudo, not as yourself. Your klist output shows the Kerberos ticket for root.
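A minimal sketch of the fix, using the principal from the question and running as your own user, so the ticket lands in the cache the JVM actually reads (/tmp/krb5cc_1001 in the debug output above):
$ kinit hduser/stwhdrm01@FDATA.COM
$ klist
$ hdfs dfs -ls /
Alternatively, exporting KRB5CCNAME to the same cache path before running both kinit and hdfs should also line them up.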

One node in hadoop cluster failure

I recently configured a 10-node HDP Hadoop cluster; each node runs SLES 11.
On the master node I have configured all the master services and clients, plus the Ambari server. The remaining nodes run the other slave services and their clients.
NTP sync is on, and the other prerequisites are also fine.
I am experiencing weird behaviour on the Hadoop cluster: after starting all the services, within a few hours one of the nodes goes down.
When I experienced this the first time, I restarted that particular node and added it back to the cluster.
Now my master node has the same issue, due to which the whole cluster is down. I have checked the logs, but there is no indication of what caused the failure.
I am clueless: what is the root cause of the node failures in the Hadoop cluster?
Below are the logs.
From the system which went down, /var/log/messages:
notice)=0', processed='source(src)=6830'
Apr 23 05:22:43 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:23:49 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:24:17 lnx1863 sudo: root : TTY=pts/0 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/du -h /
Apr 23 05:24:55 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:25:22 lnx1863 kernel: [248531.127254] megasas: Found FW in FAULT state, will reset adapter.
Apr 23 05:25:22 lnx1863 kernel: [248531.127260] megaraid_sas: resetting fusion adapter.
Apr 23 05:25:22 lnx1863 kernel: [248531.127427] megaraid_sas: Reset not supported, killing adapter.
namenode logs:
INFO 2015-04-23 05:27:43,665 Heartbeat.py:78 - Building Heartbeat: {responseId = 7607, timestamp = 1429781263665, commandsInProgress = False, componentsMapped = True}
INFO 2015-04-23 05:28:44,053 security.py:135 - Encountered communication error. Details: SSLError('The read operation timed out',)
ERROR 2015-04-23 05:28:44,053 Controller.py:278 - Connection to http://localhost was lost (details=Request to https://localhost:8441/agent/v1/heartbeat/localhostip failed due to Error occured during connecting to the server: The read operation timed out)
INFO 2015-04-23 05:29:16,061 NetUtil.py:48 - Connecting to https://localhost:8440/connection_info
INFO 2015-04-23 05:29:16,118 security.py:93 - SSL Connect being called.. connecting to the server

Why does tinyproxy require an upstream proxy?

Today I configured a basic tinyproxy.
I expected it to act as a proxy for the Ubuntu repositories, but when trying to download packages from the repositories I got this in the tinyproxy log:
CONNECT Mar 27 17:30:46 [20348]: Connect (file descriptor 9): [unknown] [192.168.2.30]
CONNECT Mar 27 17:30:46 [20348]: Request (file descriptor 9): GET http://br.archive.ubuntu.com/ubuntu/pool/main/t/tdb/python-tdb_1.2.12-1_amd64.deb HTTP/1.1
INFO Mar 27 17:30:46 [20348]: No upstream proxy for br.archive.ubuntu.com
ERROR Mar 27 17:30:56 [20348]: opensock: Could not retrieve info for br.archive.ubuntu.com
INFO Mar 27 17:30:56 [20348]: no entity
I'm stuck on some misconception. Doesn't tinyproxy send requests to outside servers directly?
I supplied an external proxy server to fix this:
upstream 117.79.64.29:80
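For what it's worth, the opensock error above suggests the proxy host simply could not resolve br.archive.ubuntu.com, so checking name resolution on the proxy box itself (a quick diagnostic, not a tinyproxy setting) may remove the need for an upstream proxy:
$ host br.archive.ubuntu.com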
