OOZIE : Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ] - hadoop

I'm trying to execute an Oozie job following this example:
https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html
While executing
oozie job -run -config target/example/job.properties
I get the following error:
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 8 sec. Retry count = 4
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection refused (Connection refused)
Any idea why the connection is being refused?

The Oozie client (command line) is not able to connect to the Oozie server. Find the Oozie server URL and do one of the following:
Set (export) the Oozie server URL as an environment variable: export OOZIE_URL=http://hostname:11000/oozie
Pass the -oozie parameter to the oozie command: oozie job -oozie http://hostname:11000/oozie -run -config target/example/job.properties
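A quick way to confirm the URL is right is to ask the server for its status first. This is only a sketch; hostname stands in for your actual Oozie server host:
curl http://hostname:11000/oozie/v1/admin/status   # REST status endpoint; returns the system mode if the server is up
export OOZIE_URL=http://hostname:11000/oozie
oozie admin -status                                # should print "System mode: NORMAL"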

Related

Ambari on HDP cluster + ambari-metrics-collector service does not start

We have an issue with the ambari-metrics-collector service (HDP cluster version 2.6.4 with 8 nodes).
The Ambari Metrics Collector service either fails to start, or starts for a few seconds and then fails.
Details about the metrics collector version:
rpm -qa | grep metrics
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
All machines are RHEL 7.2.
We performed the following steps to try to resolve the problem:
1. Restart the metrics-collector service:
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ start'
or
ambari-metrics-collector stop
ambari-metrics-collector start
2. Restart ambari-metrics-monitor on all nodes:
ambari-metrics-monitor stop
ambari-metrics-monitor start
3. Clean the folder /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/:
mv /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0 /tmp/bck/zookeeper/
Then restart the metrics-collector service.
4. Tune the metrics-collector parameters according to https://docs.cloudera.com/HDPDocuments/Ambari-2.2.1.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
We updated the following parameters in Ambari:
metrics_collector_heap_size=1024
hbase_regionserver_heapsize=1024
hbase_master_heapsize=512
hbase_master_xmn_size=128
Status for now: steps 1-4 didn't help.
From the logs we can see the following:
log file - ambari-metrics-collector.log
2020-06-25 09:06:14,474 WARN org.apache.zookeeper.ClientCnxn: Session 0x172eab71f310002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-06-25 09:06:14,575 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys671.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
log file - hbase-ams-master-master02.sys671.com.log
2020-06-25 09:38:18,799 WARN [RS:0;master02:51842-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-25 09:38:20,437 INFO [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server master02.sys671.com/23.2.35.171:61181. Will not attempt to authenticate using SASL (unknown error)
2020-06-25 09:38:20,438 WARN [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
We also do not see the port (timeline.metrics.service.webapp.address) listening:
netstat -tulpn | grep 6188
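One more check that may narrow this down, based on the log above: whether the embedded AMS ZooKeeper on master02.sys671.com:61181 is listening at all (host and port are the ones from our logs; adjust for your cluster):
netstat -tulpn | grep 61181
echo ruok | nc master02.sys671.com 61181   # a healthy ZooKeeper answers "imok"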
Any advice on how to continue from this point?
We would appreciate any help with this problem.

Hive is not working with HDP with Ambari

I have installed HDP with Ambari 2.6.1. It mostly did everything automatically, but Hive is unable to start properly.
I saw a post somewhere, so I deleted the PID file and killed the process as well, hoping that a restart would fix it, but now it's showing heartbeat lost on the machine.
Please guide me on what I should do.
I am listing the errors I got from Ambari.
1. Hive Metastore:
Metastore on machine2.ambari.local failed:
Error: Could not open client transport with JDBC Uri: jdbc:hive2://machine2.ambari.local:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Connection failed on host machine2.ambari.local:10000 (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute ldap_password=ldap_password) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl timeou...
Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 203, in execute timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__ self.env.run() File "/
2. Hive Server2:
Error: Could not open client transport with JDBC Uri: jdbc:hive2://machine2.ambari.local:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Connection failed on host machine2.ambari.local:10000 (Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py", line 212, in execute ldap_password=ldap_password) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/hive_check.py", line 81, in check_thrift_port_sasl timeou...
There are a few different issues in this question:
HiveServer2 alert: The following error indicates that port 10000 on machine2.ambari.local is not reachable; either no HiveServer2 process is listening on port 10000, or there is a proxy issue.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://machine2.ambari.local:10000/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused)
Heartbeat lost: The question mentions "heartbeat lost on the machine"; this is usually because the ambari-agent process is no longer running on the host for which the heartbeat loss is reported.
After deleting the PID file and killing the process, the agent does not restart automatically by default.
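As a rough sketch of how to verify both points (the host machine2.ambari.local and port 10000 are taken from the alert above):
netstat -tulpn | grep 10000   # is HiveServer2 actually listening on its Thrift port?
ambari-agent status           # on the host reporting heartbeat lost: is the agent still running?
ambari-agent start            # if not, start it so the heartbeat comes back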

Error while connecting Oozie server: connection timed out

I am trying to run a Pig program using Oozie from the command prompt, but I am getting an error like:
Connection exception has occurred [ java.net.ConnectException Connection timed out ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Connection timed out ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Connection timed out ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Connection timed out ]. Trying after 8 sec. Retry count = 4
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 4. Exception = Connection timed out
and I am running this command:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
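A connection timeout usually means the host/port is wrong or blocked, so a first check (assuming the Oozie server really should be on localhost:11000, as in the command above) is whether anything is listening there:
netstat -tulpn | grep 11000
curl http://localhost:11000/oozie/v1/admin/status   # returns the system mode if the server is up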

Storm - Supervisors launched but not connecting to Nimbus

I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 ZooKeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and the Supervisors. When Nimbus is started I have storm.local.hostname commented out, as shown above.
However, when starting Supervisors on the respective nodes, I uncomment storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance, if I were launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus is launched successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, the Storm UI often shows only 2 of them as connected. However, if I ssh into these nodes and run jps, I can see that the supervisor process is running on ALL of them.
The supervisors that do end up connecting are not always the same ones, so it is definitely not a problem with those specific nodes.
Another thing to note: if I try to execute a topology on whichever nodes did connect, it does not get registered by the cluster, and I cannot see that topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
The tail end of nimbus.log has the following lines:
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (nimbus log) indicates that Nimbus cannot connect to the ZooKeeper cluster. Please check that the ZooKeeper cluster (storage14/storage15) is accessible from storage01: not only that the nodes are reachable, but also that you can telnet to the ZooKeeper servers, e.g. "telnet storage14 (and/or storage15) 2181".
Once the ZK connectivity issue is gone, please try starting the supervisors again.
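For example, run something like this from storage01 (the hostnames and the default ZooKeeper client port 2181 are taken from the storm.yaml above):
ping -c 1 storage14
telnet storage14 2181
telnet storage15 2181
echo ruok | nc storage14 2181   # a healthy ZooKeeper answers "imok"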

HBase program not able to connect to ZooKeeper on localhost

I am new to HBase and I am sure I installed it correctly. In my terminal I am able to start the HBase shell, but even a simple create statement gives me the following error:
(Note: I am trying to run it in standalone mode.)
WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:193)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
........
Also, I have written only 2 lines in Eclipse, and when I try to run it,
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException {
    // reads hbase-site.xml from the classpath; the connection fails here if ZooKeeper is unreachable
    Configuration config = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(config);
}
it gives me the following error:
15/03/18 22:25:37 INFO zookeeper.ClientCnxn: Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
15/03/18 22:25:38 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
What could possibly be the issue?
My /etc/hosts file has 127.0.0.1 localhost.
Is there anything else that I need to do or change?
http://hbase.apache.org/book.html#quickstart is very good documentation for setting up HBase; you might be missing some configuration.
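In standalone mode HBase runs its own ZooKeeper inside the HMaster process, so a quick sanity check (the process name and the default client port 2181 assume an unmodified standalone install) is whether the master is running and something answers on 2181:
jps | grep HMaster             # is the standalone HBase master running?
netstat -tulpn | grep 2181     # is the embedded ZooKeeper listening?
echo ruok | nc localhost 2181  # a healthy ZooKeeper answers "imok"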
