Hadoop yarn node -list shows slaves as localhost.localdomain:#somenumber, connection refused exception - hadoop

I get a connection refused exception from localhost.localdomain/127.0.0.1 to localhost.localdomain:55352 when trying to run the wordcount program.
yarn node -list gives
hduser@localhost:/usr/local/hadoop/etc/hadoop$ yarn node -list
15/05/27 07:23:54 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.111.72:8040
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
localhost.localdomain:32991 RUNNING localhost.localdomain:8042 0
localhost.localdomain:55352 RUNNING localhost.localdomain:8042 0
master /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#127.0.1.1 ubuntu-Standard-PC-i440FX-PIIX-1996
192.168.111.72 master
192.168.111.65 slave1
192.168.111.66 slave2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
slave /etc/hosts:
127.0.0.1 localhost.localdomain localhost
#127.0.1.1 ubuntu-Standard-PC-i440FX-PIIX-1996
192.168.111.72 master
#192.168.111.65 slave1
#192.168.111.66 slave2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
What I understand is that the master is wrongly trying to connect to the slaves on localhost. Please help me resolve this. Any suggestion is appreciated. Thank you.

Here is the code showing how the NodeManager builds the NodeId:
private NodeId buildNodeId(InetSocketAddress connectAddress,
    String hostOverride) {
  if (hostOverride != null) {
    connectAddress = NetUtils.getConnectAddress(
        new InetSocketAddress(hostOverride, connectAddress.getPort()));
  }
  return NodeId.newInstance(
      connectAddress.getAddress().getCanonicalHostName(),
      connectAddress.getPort());
}
The NodeManager derives the node's hostname from the bind address via getCanonicalHostName(), so for the address 127.0.0.1 it ends up with whatever name 127.0.0.1 reverse-resolves to.
So in your case, on the slave hosts, localhost.localdomain is the default hostname for the address 127.0.0.1, and a possible solution is to change the first line of /etc/hosts on your slaves respectively to:
127.0.0.1 slave1 localhost.localdomain localhost
and
127.0.0.1 slave2 localhost.localdomain localhost
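As an aside (an assumption on my part, not something the code above guarantees): Hadoop 2.x also has a yarn.nodemanager.hostname property in yarn-site.xml, which may let you pin the slave's advertised hostname explicitly instead of relying on reverse resolution. A sketch for slave1 (use slave2 on the other node):
<configuration>
  <!-- Assumed alternative: report the real hostname rather than the name
       that 127.0.0.1 reverse-resolves to. Verify against your Hadoop version. -->
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>slave1</value>
  </property>
</configuration>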

Related

Datanode denied communication with namenode because hostname cannot be resolved

I am running a hadoop cluster in kubernetes, with 4 journalnodes and 2 namenodes. Sometimes my datanodes cannot register with the namenodes.
17/06/08 07:45:32 INFO datanode.DataNode: Block pool BP-541956668-10.100.81.42-1496827795971 (Datanode Uuid null) service to hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 beginning handshake with NN
17/06/08 07:45:32 ERROR datanode.DataNode: Initialization failed for Block pool BP-541956668-10.100.81.42-1496827795971 (Datanode Uuid null) service to hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.100.9.45, hostname=10.100.9.45): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=b1babba6-9a6f-40dc-933b-08885cbd358e, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-bceaa23f-ba3d-4749-a542-74cda1e82e07;nsid=177502984;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:863)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1279)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:95)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28539)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
It says:
hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.100.9.45, hostname=10.100.9.45)
However, I can ping hadoop-namenode-0.myhadoopcluster, 10.100.81.42, 10.100.9.45 in both the datanode and the namenode.
/etc/hosts in datanode:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.100.9.45 hadoop-datanode-0.myhadoopcluster.default.svc.cluster.local hadoop-datanode-0
/etc/hosts in namenode:
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.100.81.42 hadoop-namenode-0.myhadoopcluster.default.svc.cluster.local hadoop-namenode-0
And I have already set dfs.namenode.datanode.registration.ip-hostname-check to false in hdfs-site.xml
I guess the problem may be related to DNS. In other similar questions, hadoop is not deployed in kubernetes or a docker container, so I posted this one. Please do not tag it as a duplicate...
In my situation, I added the following three configuration properties to both the namenode and the datanode (an hdfs-site.xml sketch follows the list):
dfs.namenode.datanode.registration.ip-hostname-check: false
dfs.client.use.datanode.hostname: false (default)
dfs.datanode.use.datanode.hostname: false (default)
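Expressed in hdfs-site.xml, that looks roughly like this (a minimal sketch; the last two values are the defaults, so they only need to be listed if you have overridden them elsewhere):
<configuration>
  <!-- Let datanodes register even when their IP does not reverse-resolve to a hostname. -->
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <!-- Default values, shown here only for completeness. -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>false</value>
  </property>
</configuration>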
I hope you found a resolution to the issue by now.
I ran into a similar problem last week; my cluster is set up in a different environment, but the problem context is the same.
Essentially, reverse DNS lookup needs to be set up to solve this issue: if the cluster is using a DNS resolver, this needs to be configured at the DNS server level; if the NameNodes look up DataNodes in the /etc/hosts file, then there need to be entries for the DataNodes there.
I have updated an old question in the Hortonworks Community Forum; the link is below:
https://community.hortonworks.com/questions/24320/datanode-denied-communication-with-namenode.html?childToView=135321#answer-135321

Connection refused error in hadoop-2.6.0 while running word count program

I would like to know the root cause of this error. When I run the conventional word count program in hadoop-2.6.0, the following exception is generated. Your suggestions will be much appreciated.
16/03/16 11:21:40 INFO mapreduce.Job: map 0% reduce 0%
16/03/16 11:21:40 INFO mapreduce.Job: Job job_1458107299826_0002 failed with state FAILED due to: Application application_1458107299826_0002 failed 2 times due to Error launching appattempt_1458107299826_0002_000002. Got exception: java.net.ConnectException: Call From abc-OptiPlex-3020/127.0.1.1 to abc-OptiPlex-3020:52890 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
My /etc/hosts file (from the master system) is as follows:
127.0.0.1 localhost
127.0.1.1 abc-OptiPlex-3020
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.12.106 HadoopMaster
192.168.12.105 HadoopSlave1
Try this:
Install the openssh server and client, then make sure you can log in to each node with:
ssh username@hostname
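The Hadoop start scripts launch the daemons over SSH, so the master typically needs passwordless SSH to itself and to every slave. A sketch of the usual setup, assuming a Debian/Ubuntu system and using the hostnames from your /etc/hosts (replace username with your Hadoop user):
sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa                  # on the master, accept defaults, empty passphrase
ssh-copy-id username@HadoopMaster  # copy the key to every node, master included
ssh-copy-id username@HadoopSlave1
ssh username@HadoopSlave1          # should now log in without a password prompt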

Error in Hbase Standalone mode

My standalone HBase starts, but when I run any command in the shell it errors out...
I have tried many solutions; nothing worked :(
Error message:
hbase(main):001:0> create 'emp' 'data'
16/02/08 18:42:57 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
16/02/08 18:42:57 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
.
.
.
.
My /etc/hosts:
127.0.0.1 localhost
127.0.0.1 ankit21
192.168.43.11 pooja
192.168.43.143 laptop#kalpesh
192.168.43.72 aditi
# The following lines are desirable for IPv6 capable hosts(43.177)
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
JPS:
ankit@ankit21:/usr/lib/hbase/hbase-0.94.8$ jps
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
7609 HMaster
7756 Jps
Well, you need to start your ZooKeeper service. After that, run the jps command to check; you should be able to find "QuorumPeer" listed as the ZooKeeper service.
Before starting the ZooKeeper service, make sure you have configured zoo.cfg in the conf directory. You need to add the machine IP to zoo.cfg along with the port for ZooKeeper.
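If you run ZooKeeper yourself rather than letting HBase manage it, HBase also has to be pointed at it. A minimal hbase-site.xml sketch, assuming ZooKeeper runs on this same machine (ankit21) at the default client port 2181; adjust the host and port to whatever you put in zoo.cfg:
<configuration>
  <!-- Where the HBase client and master look for ZooKeeper. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ankit21</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>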

Hadoop slave cannot connect to master, even when service is running and ports are open

I'm running hadoop 2.5.1 and I'm having a problem with slaves connecting to the master. My goal is to set up a hadoop cluster. I hope someone can help; I've been pondering this for too long already! :)
This is what shows up in the slave's log file:
2014-10-18 22:14:07,368 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: master/192.168.0.104:8020
This is my core-site.xml file (same on master and slave):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master/</value>
  </property>
</configuration>
This is my hosts file ((almost) the same on master and slave). I have hard-coded addresses there without any success:
127.0.0.1 localhost
192.168.0.104 xubuntu: xubuntu
192.168.0.104 master
192.168.0.194 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
netstat from master:
xubuntu@xubuntu:/usr/local/hadoop/logs$ netstat -atnp | grep 8020
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.0.104:8020 0.0.0.0:* LISTEN 26917/java
tcp 0 0 192.168.0.104:52114 192.168.0.104:8020 ESTABLISHED 27046/java
tcp 0 0 192.168.0.104:8020 192.168.0.104:52114 ESTABLISHED 26917/java
Nmap from master to master:
Starting Nmap 6.40 ( http://nmap.org ) at 2014-10-18 22:36 EEST
Nmap scan report for master (192.168.0.104)
Host is up (0.000072s latency).
rDNS record for 192.168.0.104: xubuntu:
PORT STATE SERVICE
8020/tcp open unknown
..and nmap from slave to master (even though the port is open, the slave doesn't connect to it):
ubuntu@ubuntu:/usr/local/hadoop/logs$ nmap master -p 8020
Starting Nmap 6.40 ( http://nmap.org ) at 2014-10-18 22:35 EEST
Nmap scan report for master (192.168.0.104)
Host is up (0.14s latency).
PORT STATE SERVICE
8020/tcp open unknown
What is this all about? The problem is not the firewall. I have also read every thread there is on this, without any success. I'm frustrated by this.. :(
At least one of your problems is that you are using the old configuration name for HDFS. For version 2.5.1 the configuration name should be fs.defaultFS instead of fs.default.name. I also suggest defining the port in the value, so the value would be hdfs://master:8020.
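For example, a minimal core-site.xml along those lines (a sketch; adjust the port if your NameNode listens somewhere other than the default 8020):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
</configuration>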
Sorry, I'm not a linux guru, so I don't know about nmap, but does telnetting from the slave to the master on that port work?

HBase exception

When I use HBase in pseudo-cluster mode, I get the exception below. It would be really great if somebody could shed some light on this issue and help me resolve it.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Wed Feb 06 15:22:23 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable#29422384, java.io.IOException: java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=file:/home/688697/hbase/test/c28d92322c97364af59b09d4f4b4a95f/cf/c5de203afb5647c0b90c6c18d58319e9, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=0deptempname0/cf:email/1360143938898/Put, lastKey=4191151deptempname4191151/cf:place/1360143938898/Put, avgKeyLen=45, avgValueLen=7, entries=17860666, length=1093021429, cur=10275517deptempname10275517/cf:place/1360143938898/Put/vlen=4]
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:104)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:289)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:138)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3004)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2951)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2968)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2155)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/688697/hbase/test/c28d92322c97364af59b09d4f4b4a95f/cf/c5de203afb5647c0b90c6c18d58319e9 at 37837312 exp: -819174049 got: 1765448374
at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:320)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:211)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:229)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:193)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:431)
at org.apache.hadoop.fs.FSInputChecker.seek(FSInputChecker.java:412)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:48)
at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:318)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1047)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:266)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readNextDataBlock(HFileReaderV2.java:452)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:416)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:99)
... 12 more
Wed Feb 06 15:22:24 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable#29422384, java.io.IOException: java.io.IOException: java.lang.IllegalArgumentException
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1079)
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1068)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2182)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:216)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:395)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:99)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:326)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:138)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3004)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2951)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2968)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2155)
... 5 more
The root cause of this issue resides in your /etc/hosts file. If you check your /etc/hosts file you will find an entry something like the one below (in my case my machine is named domainnameyouwanttogive):
127.0.0.1 localhost
127.0.1.1 domainnameyouwanttogive
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The root cause is that domainnameyouwanttogive resolves to 127.0.1.1, which is incorrect, as it should resolve to 127.0.0.1 (or an external IP). As my external IP is 192.168.58.10, I created the following /etc/hosts configuration:
127.0.0.1 localhost
192.168.43.3 domainnameyouwanttogive
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
This ensures that hostname resolution for your local processes is done correctly and that you can start your HBase installation correctly on your development system.
Also, please make sure that your hadoop namenode is running with the same domain name you are using for HBase.
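In other words (a sketch, assuming the NameNode RPC port is 8020; use whatever port your fs.defaultFS / fs.default.name actually specifies), the authority in hbase.rootdir should match the name the namenode is running under:
In core-site.xml:
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://domainnameyouwanttogive:8020</value>
  </property>
In hbase-site.xml:
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://domainnameyouwanttogive:8020/hbase</value>
  </property>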
