java.net.ConnectException error when running yarn - hadoop

I'm getting an error when running a job on YARN. HDFS and YARN both start up fine, jps shows everything running normally, HDFS works perfectly in pseudo-distributed mode, and I have triple- and quadruple-checked my configuration files. Whenever I attempt to run a YARN job, however, this happens:
INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From serverA/IPaddress to serverB:30170 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 6 failover attempts. Trying to failover after sleeping for 44428ms.
Yarn then attempts to connect over and over again until I forcefully quit the process. Any ideas why this is happening?

Can you see the YARN web UI?
How did you start HDFS and YARN?
You can try ./sbin/start-all.sh

Related

What is this error on spark-submit with HDFS HA and YARN?

Here is my error log:
$ /spark-submit --master yarn --deploy-mode cluster pi.py
...
2021-12-23 01:31:04,330 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1895)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:860)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
, while invoking ClientNamenodeProtocolTranslatorPB.setPermission over master/172.17.0.2:8020. Trying to failover immediately.
...
Why do I get this error?
NOTE: The Spark master runs on 'master', so the spark-submit command is run on 'master'.
NOTE: The Spark workers run on 'worker1', 'worker2', and 'worker3'.
NOTE: The ResourceManager runs on 'master' and 'master2'.
ADD: When the error log above is printed, master2's DFSZKFailoverController disappears from the jps output.
ADD: When the error log above is printed, master's NameNode disappears from the jps output.
It happens when Spark is unable to access HDFS.
If HA is configured correctly, the HDFS client handles the StandbyException by failing over to the other NameNode in the HA pair and then retrying the operation.
Point the client at the active NameNode's URI manually and check whether you still get the same error. If the error goes away, HA is not configured properly.
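To make the failover behaviour described above concrete, here is a toy model in Python. It is only a sketch: the real HDFS client is written in Java and its failover is driven by the HA settings in hdfs-site.xml. The names StandbyError, NameNode, and call_with_failover are made up for illustration and are not part of any Hadoop API.

```python
class StandbyError(Exception):
    """Hypothetical stand-in for Hadoop's StandbyException in this toy model."""


class NameNode:
    """Toy NameNode that rejects writes unless it is the active one."""

    def __init__(self, name, active):
        self.name = name
        self.active = active

    def set_permission(self, path, mode):
        if not self.active:
            raise StandbyError(
                "Operation category WRITE is not supported in state standby"
            )
        return "%s: chmod %o %s" % (self.name, mode, path)


def call_with_failover(namenodes, operation, max_attempts=4):
    """Run operation against each NameNode in turn until one accepts it."""
    last_error = None
    for attempt in range(max_attempts):
        # Round-robin over the configured NameNodes.
        node = namenodes[attempt % len(namenodes)]
        try:
            return operation(node)
        except StandbyError as exc:
            # This node is in standby: remember the error and try the next one.
            last_error = exc
    raise RuntimeError(
        "no active NameNode after %d attempts" % max_attempts
    ) from last_error
```

With a standby 'master' and an active 'master2', a call such as call_with_failover(nodes, lambda nn: nn.set_permission("/user/spark", 0o755)) is rejected by the first node and succeeds on the second. If every node is in standby, as can happen when a NameNode or its failover controller has died, the retries are exhausted and the call fails.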

HBase Master turning off after start. Setup for HBase on Hadoop for a single-cluster DB on my local machine

I have installed Hadoop (2.9.1) and HBase (2.1) on my Linux machine with the appropriate configurations.
1) I start all Hadoop components. Using jps, I can see all the components running. This step works fine.
2) When I start HBase, all the HBase components start as well. Using jps, I can see that the required components are running. However, within 10 seconds, HMaster shuts down.
Below are the contents of the HBase master log file. The errors are pretty much the same in both the master and regionserver log files.
2018-08-17 17:13:14,255 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
I understand that there is some port connection problem, but I don't quite know what changes to make or where to make them.
Thank you in advance for your guidance.

Something is wrong with my Hadoop cluster

I have built a Hadoop cluster on ECS on Aliyun from Alibaba.com (it's like AWS). The OS is Ubuntu 12.04 and the Hadoop version is 2.7.1.
The cluster consists of one master and two slaves.
I can start it successfully. Every node works well, and I can use ssh to access the two slave nodes from the master node. Every node is started.
But when I run the wordcount program, something goes wrong. The error is as follows:
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:38635 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
After I added Port 38635 to the file /etc/ssh/sshd_config, I ran the wordcount program again. The error still occurred; the only difference is that the port is no longer 38635:
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:46656 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
How can I fix this problem? Ports 38635 and 46656 have both been added to /etc/ssh/sshd_config, yet each time I run the wordcount program the error appears again with a new port in the message.

SocketTimeoutException in hadoop fs -getmerge

I'm running hadoop fs -getmerge and getting the following error:
12/10/30 09:24:45 INFO hdfs.DFSClient: Failed to connect to /[IP], add to deadNodes and continue
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel
I get this error with a different IP on each try, and I don't see any suspicious errors or warnings in the DataNode logs.
Any thoughts?
HDFS reads are served directly by the DataNodes that hold the blocks.
A common cause, especially when the failure is this consistent, is missing Client ➜ DataNode connectivity, owing to firewalls or other reasons.
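One way to narrow down a client-to-DataNode connectivity problem is to probe each DataNode's data-transfer port from the client machine. The sketch below uses only the Python standard library. The helper names probe and probe_all are hypothetical, and the host list and port are placeholders you would replace with your own DataNode addresses (the data-transfer port is set by dfs.datanode.address and commonly defaults to 50010 on Hadoop 1.x/2.x).

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are likely being dropped (e.g. a firewall).
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return "error: %s" % exc


def probe_all(nodes, timeout=3.0):
    """Probe every (host, port) pair and return a dict of results."""
    return {(host, port): probe(host, port, timeout) for host, port in nodes}


# Example call (placeholder addresses, not taken from the question):
# probe_all([("datanode1", 50010), ("datanode2", 50010)])
```

A "timeout" result from the client while the same port reports "open" when probed from the DataNode itself usually points at a firewall or routing rule between the two machines rather than at HDFS.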

Error in copying files to HDFS

I tried installing Hadoop on two nodes. Both nodes are up and running. The NameNode runs on Ubuntu 10.10 and the DataNode on Fedora 13. While copying a file from the local file system to HDFS, I encountered the following errors.
The terminal showed:
12/04/12 02:19:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.211.87.162:9200
12/04/12 02:19:15 INFO hdfs.DFSClient: Abandoning block blk_-1069539184735421145_1014
The log file in namenode showed:
2012-10-16 16:17:56,723 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.6.2.26:50010, storageID=DS-880164535-10.18.13.10-50010-1349721715148, infoPort=50075, ipcPort=50020):DataXceiver
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:662)
The number of available DataNodes is reported as 2. I've disabled the firewall and SELinux.
The following changes have also been made in hdfs-site.xml:
dfs.socket.timeout -> 360000
dfs.datanode.socket.write.timeout -> 3600000
dfs.datanode.max.xcievers -> 1048576
Both nodes run sun-java6-jdk. The DataNode machine also has OpenJDK installed, but the path settings point to Sun Java.
Yet the same error persists.
What might be the solution?
That's because your firewall is on.
Try:
sudo /etc/init.d/iptables stop
If you are on Ubuntu, run:
sudo ufw disable
This should solve the issue.
The exception log says that the reason for the failure is "No route to host".
Try ping 10.6.2.26 to test your network connection.
