ambari-agent no longer able to contact server at http://<dom>:8440? - hadoop

We're currently running Hortonworks 2.6.5.0:
$ hadoop version
Hadoop 2.7.3.2.6.5.0-292
Subversion git@github.com:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1
Compiled by jenkins on 2018-05-11T07:53Z
Compiled with protoc 2.5.0
From source with checksum abed71da5bc89062f6f6711179f2058
This command was run using /usr/hdp/2.6.5.0-292/hadoop/hadoop-common-2.7.3.2.6.5.0-292.jar
The OS is CentOS 7:
$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
We recently started noticing these issues in the ambari-agent's log file:
$ grep -iE "error|warn" /var/log/ambari-agent/*
/var/log/ambari-agent/ambari-agent.log:WARNING 2018-07-30 14:03:50,982 NetUtil.py:124 - Server at https://hbase26-2.mydom.com:8440 is not reachable, sleeping for 10 seconds...
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:00,986 NetUtil.py:96 - EOF occurred in violation of protocol (_ssl.c:579)
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:00,990 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions.
/var/log/ambari-agent/ambari-agent.log:WARNING 2018-07-30 14:04:00,990 NetUtil.py:124 - Server at https://hbase26-2.aa.mydom.com:8440 is not reachable, sleeping for 10 seconds...
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:10,993 NetUtil.py:96 - EOF occurred in violation of protocol (_ssl.c:579)
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:10,994 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions.
/var/log/ambari-agent/ambari-agent.log:WARNING 2018-07-30 14:04:10,994 NetUtil.py:124 - Server at https://hbase26-2.aa.mydom.com:8440 is not reachable, sleeping for 10 seconds...
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:20,996 NetUtil.py:96 - EOF occurred in violation of protocol (_ssl.c:579)
/var/log/ambari-agent/ambari-agent.log:ERROR 2018-07-30 14:04:20,997 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions.
When these started occurring we could no longer manage any aspects of the Hadoop cluster through Ambari. All the services showed little yellow question marks and said "heartbeat lost".
Multiple restarts did not let us resume using Ambari and ultimately regain control of our cluster.

This issue turned out to be a TLS protocol mismatch: the agent was attempting to connect to the server's CA service on port 8440 using TLSv1.1, which the server could no longer deal with.
We noticed that the service was in fact running:
$ netstat -tapn|grep 8440
tcp 0 0 0.0.0.0:8440 0.0.0.0:* LISTEN 1203/java
But curl requests to this port would fail unless we disabled TLS checks via the --insecure switch. This was our first clue that the problem was related to TLS.
Further investigation led us to NetUtil.py (part of Ambari), which seemed OK. Other leads included:
$ cat /etc/ambari-agent/conf/ambari-agent.ini
...
[security]
ssl_verify_cert = 0
...
And this:
$ grep -E '\[https|verify' /etc/python/cert-verification.cfg
[https]
#verify=platform_default
verify=disable
None of these worked. What ultimately did work was forcing ambari-agent to use TLSv1.2 instead of TLSv1.1:
$ grep -E "\[security|force" /etc/ambari-agent/conf/ambari-agent.ini
[security]
force_https_protocol=PROTOCOL_TLSv1_2
And then restarting the agent with ambari-agent restart.
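To confirm the setting is actually present on every node, here is a minimal Python sketch that reads the [security] section of an agent config. The function name forced_protocol is made up for illustration; only the section and key names come from the config shown above.

```python
import configparser


def forced_protocol(ini_text):
    """Return the [security] force_https_protocol value, or None if unset."""
    # interpolation=None so that any '%' in other values is left alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    return parser.get("security", "force_https_protocol", fallback=None)


# Usage (path taken from the post above):
#   with open("/etc/ambari-agent/conf/ambari-agent.ini") as fh:
#       print(forced_protocol(fh.read()))
```

If this prints None on a host, that agent has not been told to use TLSv1.2 explicitly.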
I was able to piece this all together using wisps of hints scattered all over the Internet. I'm putting this here in the hopes it will help any other poor souls that have this happen to their Hadoop/Hortonworks cluster.
References
Ambari agent- [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
Java/Python Updates and Ambari Agent TLS Settings
Openssl error upon host registration
Cleaning up Ambari Metrics System Data
Why did this happen?
After further digging I found a thread titled: Disabling TLSv1 & TLS1.1 - Enabling TLSv1.2. It's apparently now mandatory to configure your Ambari agents to use TLSv1.2.
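If you want to see for yourself which protocol versions the server on port 8440 accepts, a rough Python sketch follows. It assumes Python 3.7+ (the Ambari agent itself runs on an older Python), and the helper names make_context and server_accepts are hypothetical. Certificate verification is switched off, like curl --insecure, so that only the protocol version is being tested.

```python
import socket
import ssl


def make_context(version):
    """Client context pinned to exactly one TLS version, verification off."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # has to be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # same effect as curl --insecure
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def server_accepts(host, port, version, timeout=5.0):
    """Return True if a handshake pinned to `version` succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            ctx = make_context(version)
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except (ssl.SSLError, OSError):
        return False


# Usage (hostname taken from the log lines above):
#   server_accepts("hbase26-2.aa.mydom.com", 8440, ssl.TLSVersion.TLSv1_2)
#   server_accepts("hbase26-2.aa.mydom.com", 8440, ssl.TLSVersion.TLSv1_1)
```

Note that a False result for TLSv1.1 can also come from the local OpenSSL build refusing that version, so treat it as a hint rather than proof.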

Related

Elastic-APM Invalid index name [_license]

We are trying to get elastic-apm installed (for now on our development systems).
According to Homebrew, we have the latest elasticsearch-oss (7.10.2), kibana-oss (7.10.2) and today installed apm-server-oss (which is at version 7.13.0).
Running apm-server test output, we get:
% apm-server test output
elasticsearch: http://localhost:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: ::1, 127.0.0.1
dial up... OK
TLS... WARN secure connection disabled
talk to server... ERROR Connection marked as failed because the onConnect callback failed: could not connect to a compatible version of Elasticsearch: 400 Bad Request: {"error":{"root_cause":[{"type":"invalid_index_name_exception","reason":"Invalid index name [_license], must not start with '_'.","index_uuid":"_na_","index":"_license"}],"type":"invalid_index_name_exception","reason":"Invalid index name [_license], must not start with '_'.","index_uuid":"_na_","index":"_license"},"status":400}
Because the documentation on getting APM going is somewhat obtuse, perhaps this is a configuration issue. But how to investigate further?
Is the solution to install an earlier version of apm-server? If so, how do I actually do that with Homebrew?
I had the same issue when using the non-OSS versions. I managed to fix it by upgrading Elasticsearch and Kibana to 7.13.2.
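The version numbers in this thread suggest the mismatch is the trigger: apm-server 7.13.0 talking to Elasticsearch 7.10.2 failed, and it worked once Elasticsearch was at 7.13.2. Assuming that is the cause, a small Python sketch can flag the situation. The function names are made up; the only API used is the Elasticsearch root endpoint, which reports version.number.

```python
import json
import urllib.request


def version_tuple(version):
    """'7.13.0' -> (7, 13, 0); assumes plain dotted numeric versions."""
    return tuple(int(part) for part in version.split("."))


def elasticsearch_version(base_url="http://localhost:9200", timeout=5.0):
    """Read version.number from the Elasticsearch root endpoint."""
    with urllib.request.urlopen(base_url, timeout=timeout) as resp:
        return json.load(resp)["version"]["number"]


def apm_is_newer(apm_version, es_version):
    """True if apm-server is newer than the Elasticsearch it talks to."""
    return version_tuple(apm_version) > version_tuple(es_version)


# Usage:
#   apm_is_newer("7.13.0", elasticsearch_version())
```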

Babeltrace Connection Refused

I'm using LTTng for live debugging. The target machine which I'm debugging has connectivity to only one other machine (say M1), which in turn has connectivity to the external world. I've started lttng-relayd on M1. M1 and my dev host can ping each other. On the target machine, I've created an lttng session as:
lttng create --live 100000 -U net://M1's-ipaddr
I've enabled filters and have started the session.
Now on my dev host (or for that matter any other machine) when I run
babeltrace -i lttng-live net://M1's-ipaddr
I get the below error:
Connect: Connection refused
[error] Connection failed
[warning] [Context] Cannot open_trace of format lttng-live at path net://M1's-ipaddr.
[warning] [Context] cannot open trace "net://M1's-ipaddr" for reading.
[error] opening trace "net://M1's-ipaddr" for reading.
I searched for this error but could not find much help. My babeltrace version is 1.4.0.
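"Connection refused" normally means nothing accepted the TCP connection on the port the viewer tried. A quick way to check this from the dev host is a plain TCP probe, sketched below in Python. It assumes lttng-relayd is listening for live viewers on its default port 5344; if relayd was started with a different live port, pass that instead. The name port_open is made up for illustration.

```python
import socket

# Assumed default lttng-relayd live-viewer port; adjust if relayd was
# started with a different live port.
LIVE_PORT = 5344


def port_open(host, port=LIVE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage: port_open("<M1's IP address>")
```

If this returns False from the dev host but True when run on M1 itself, a firewall or the relayd bind address is the likelier culprit than babeltrace.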

What might cause the Kubernetes API server to fail to write the client CA configmap?

I'm experiencing that the Kubernetes API server fails to start during cluster bootstrapping with the following error log, apparently due to being unable to initialize its "client CA configmap":
E1029 14:35:56.211083 5 client_ca_hook.go:78] Timeout: request did not complete within allowed duration
F1029 14:35:56.211121 5 hooks.go:126] PostStartHook "ca-registration" failed: unable to initialize client CA configmap: timed out waiting for the condition
It seems to happen here in the Kubernetes source code. What might cause this error?
See the full log here.
Update: It seems that my etcd cluster isn't accessible from master nodes, even though the same command works from etcd member machines:
$ sudo ETCDCTL_API=3 etcdctl --cacert=/opt/tectonic/tls/etcd-client-ca.crt \
--cert=/opt/tectonic/tls/etcd-client.crt --key=/opt/tectonic/tls/etcd-client.key \
--endpoints=https://coreos-testing-etcd-0.socialfoodie.club:2379 \
endpoint health
https://coreos-testing-etcd-0.socialfoodie.club:2379 is unhealthy: failed to connect: grpc: timed out when dialing
Error: unhealthy cluster
I found out that, despite the cryptic error message in the API server, the cause is that it can't write to the etcd cluster. The reason was that the API server was configured with a different client certificate authority than the one the etcd cluster was using, due to a timing issue with copying certificates in my Terraform cluster setup. I figured out that the CA was the problem by using curl to contact the etcd cluster instead of etcdctl, as it gave a clear error message.
Thanks to @johnharris85 for suggesting etcd connectivity being an issue!
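The same check that curl performed can be sketched in Python: make an HTTPS request to etcd's /health endpoint with the client certificate and CA, and surface the TLS error text instead of a generic timeout. The function names here are hypothetical, and the file paths in the usage comment are the ones from the etcdctl command above.

```python
import ssl
import urllib.request


def health_url(endpoint):
    """'https://host:2379' -> 'https://host:2379/health'."""
    return endpoint.rstrip("/") + "/health"


def check_etcd(endpoint, cacert, cert, key, timeout=5.0):
    """Return the /health body, or a string describing the failure."""
    try:
        ctx = ssl.create_default_context(cafile=cacert)
        ctx.load_cert_chain(certfile=cert, keyfile=key)
        url = health_url(endpoint)
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.read().decode()
    except OSError as exc:
        # ssl.SSLError and urllib.error.URLError are both OSError subclasses.
        return "error: %r" % (exc,)


# Usage:
#   check_etcd("https://coreos-testing-etcd-0.socialfoodie.club:2379",
#              "/opt/tectonic/tls/etcd-client-ca.crt",
#              "/opt/tectonic/tls/etcd-client.crt",
#              "/opt/tectonic/tls/etcd-client.key")
```

A CA mismatch would typically show up here as a certificate verification or handshake error rather than a timeout.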

Can't connect remotely to WAS 8.5 full profile installed on Ubuntu 14.04 from RAD 9.5 installed on OSX

WAS 8.5 full profile isn't officially supported on OSX by IBM, so the only option for developing on OSX is to install the stub runtime and connect to a WAS installed remotely. I'm trying to set up such a scenario, but something has gone wrong and I can't connect to my WAS.
Here's my installation:
Installed on OSX El Capitan:
RAD 9.5 with WAS 8.5 stub runtime (WebSphere Application Server traditional V8.5 stub)
Installed Virtual Box with Ubuntu Desktop edition 14.04
Ubuntu's hostname: anatoly-ubuntu-vm and it's accessible from host, i.e. ping anatoly-ubuntu-vm works fine
Installed on Ubuntu:
WAS 8.5 full profile at /opt/IBM/WebSphere
Created AppSrv01 profile at /opt/IBM/WebSphere/AppServer/profiles
WAS was installed as the root user; IBM Installation Manager required root permission when it was started
My connection settings in RAD:
server name: WebSphere Application Server traditional V8.5 stub at anatoly-ubuntu-vm
hostname: anatoly-ubuntu-vm
Runtime environment: WebSphere Application Server traditional V8.5 stub
Connection type:
I've tried both RMI (port 2809) and SOAP (port 8880); neither option worked.
Enable the server to start remotely is checked, and under Select the operating system running the remote server I chose the Linux option with my username and password. I've tried my regular Ubuntu account and root; neither worked.
Server profile path defined as /opt/IBM/WebSphere/AppServer/profiles/AppSrv01
When I try to start the server I get the following exception:
The following problems has occurred when starting the server.
CTGRI0001E The application could not establish a connection to
anatoly-ubuntu-vm .
What am I doing wrong?
UPDATE 1:
After writing this post I figured out that the SSH server wasn't installed and configured at all, as described here: Requirements for using Remote Execution and Access (RXA). Now I've installed it and it seems to connect, but it gets stuck at 23% at the stage Preparing launch delegate, and after a while it throws the following error:
The following problems has occurred when starting the server. The
server may not be started in the correct mode. You can restart the
server to desired mode if it is started. CTGRI0075E A file transfer to
or from the system named [anatoly-ubuntu-vm] timed out before the
transfer could complete. The current timeout interval is set to 240000
milliseconds, and might need to be increased.
UPDATE 2:
As far as I can see, despite the error message the server is started, and I can even connect to the web console at anatoly-ubuntu-vm:9060/console/ibm, but it looks like neither the SOAP connection nor the RMI connection works. When I run Test Connection from the Settings overview page in RAD, I get the following error:
The connection failed after trying to use all the available connection
types.
Verify the port values are correct and the server has been started. If
the security of the server is enabled, verify the "Security is enabled
on this server" check box is selected, and the user ID and password
are provided. You can specify this in the server editor or when
creating a new server.
For a Technote with details on the most common server connection
problem, see http://www.ibm.com/support/docview.wss?uid=swg21266028.
The last connection attempt failed with the following exception:
ADMC0016E: The system cannot create a SOAP connector to connect to
host anatoly-ubuntu-vm at port 8880.
UPDATE 3
As @DanielBarbarian suggested, I tried running ./wsadmin.sh -port 8880, and it worked and returned:
Connected to process "server1" on node anatoly-ubuntu-vmNode01 using SOAP connector; The type of process is: UnManagedProcess
These are my port settings:
UPDATE 4
When I run telnet anatoly-ubuntu-vm 8880 from the OSX host, I get the following response (the IP address has been changed for privacy):
anatoly-mac:~ anatoly$ telnet anatoly-ubuntu-vm 8880
Trying 192.168.10.10...
Connected to anatoly-ubuntu-vm
Escape character is '^]'.
HTTP/1.1 408 Request Timeout
Content-Type: text/html
Content-Length: 117
Connection: close
<HTML><TITLE>408 - Request Timeout</TITLE><BODY>
<h1>408 Connection timed out while reading request</h1></BODY>
</HTML>Connection closed by foreign host

tor not working with sqlmap

root@kali:~# sqlmap --tor --tor-type=SOCKS5 -u http://www.target.com/abc.php?cat=50
sqlmap/1.0-dev - automatic SQL injection and database takeover tool
http://sqlmap.org
[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program
[*] starting at 14:18:00
[14:18:00] [WARNING] increasing default value for option '--time-sec' to 10 because switch '--tor' was provided
[14:18:00] [INFO] setting Tor SOCKS proxy settings
[14:18:00] [INFO] testing connection to the target URL
[14:18:00] [CRITICAL] unable to connect to the target URL or proxy. sqlmap is going to retry the request
[14:18:00] [WARNING] please make sure that you have Tor installed and running so you could successfully use switch '--tor' (e.g. 'https://help.ubuntu.com/community/Tor')
[14:18:01] [CRITICAL] unable to connect to the target URL or proxy. sqlmap is going to retry the request
[14:18:02] [CRITICAL] unable to connect to the target URL or proxy. sqlmap is going to retry the request
[14:18:03] [CRITICAL] unable to connect to the target URL or proxy
[*] shutting down at 14:18:03
How can I fix this?
You have to install Tor (apt-get install tor) and then run tor in a terminal. After that you can run sqlmap --tor --tor-type socks5 --tor-port=9050 --check-tor -u http://www.target.com/abc.php?cat=50 to send the requests through Tor.
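Before rerunning sqlmap it can help to confirm that something on port 9050 really answers as a SOCKS5 proxy. Below is a minimal Python sketch that sends the SOCKS5 greeting (version 5, one method, "no authentication") and checks for the matching reply. The function name speaks_socks5 is made up, and the sketch assumes Tor is on its usual 127.0.0.1:9050 with no authentication required.

```python
import socket


def speaks_socks5(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if host:port answers a SOCKS5 no-auth greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Greeting: SOCKS version 5, 1 method offered, method 0 (no auth).
            sock.sendall(b"\x05\x01\x00")
            reply = sock.recv(2)
    except OSError:
        return False
    # Expected reply: version 5, selected method 0.
    return reply == b"\x05\x00"


# Usage: speaks_socks5()  # False means Tor is not listening where expected
```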
I had a very similar issue. I solved it by using proxychains and editing the proxychains conf file to route requests through Tor. It is easy enough to do.
I do not have Kali open, but you can find the file by opening a terminal and typing:
locate proxychains.conf
