Connection refused error in hadoop-2.6.0 while running word count program - hadoop

I would like to know the root cause of this error. When I run the conventional word count program in hadoop-2.6.0, the following exception is generated. Your suggestions will be much appreciated.
16/03/16 11:21:40 INFO mapreduce.Job: map 0% reduce 0%
16/03/16 11:21:40 INFO mapreduce.Job: Job job_1458107299826_0002 failed with state FAILED due to: Application application_1458107299826_0002 failed 2 times due to Error launching appattempt_1458107299826_0002_000002. Got exception: java.net.ConnectException: Call From abc-OptiPlex-3020/127.0.1.1 to abc-OptiPlex-3020:52890 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
My /etc/hosts file, from the master system, is as follows:
127.0.0.1 localhost
127.0.1.1 abc-OptiPlex-3020
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.12.106 HadoopMaster
192.168.12.105 HadoopSlave1

Try this:
Install the OpenSSH server and client, then verify that you can reach every node with
ssh username@hostname
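On Ubuntu, for example, the installation and a passwordless key exchange might look like the sketch below (hduser and the node names are assumptions borrowed from elsewhere on this page; adapt them to your cluster):

# Install the OpenSSH server and client (Debian/Ubuntu)
sudo apt-get install openssh-server openssh-client

# Generate a key pair and copy it to each node so the Hadoop
# start scripts can log in without a password prompt
ssh-keygen -t rsa -P ""
ssh-copy-id hduser@HadoopMaster
ssh-copy-id hduser@HadoopSlave1

# Verify that the login works non-interactively
ssh hduser@HadoopSlave1 hostname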

Related

Datanode denied communication with namenode because hostname cannot be resolved

I run a Hadoop cluster in Kubernetes, with 4 journalnodes and 2 namenodes. Sometimes my datanodes cannot register with the namenodes.
17/06/08 07:45:32 INFO datanode.DataNode: Block pool BP-541956668-10.100.81.42-1496827795971 (Datanode Uuid null) service to hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 beginning handshake with NN
17/06/08 07:45:32 ERROR datanode.DataNode: Initialization failed for Block pool BP-541956668-10.100.81.42-1496827795971 (Datanode Uuid null) service to hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.100.9.45, hostname=10.100.9.45): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=b1babba6-9a6f-40dc-933b-08885cbd358e, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-bceaa23f-ba3d-4749-a542-74cda1e82e07;nsid=177502984;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:863)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4529)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1279)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:95)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28539)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
It says:
hadoop-namenode-0.myhadoopcluster/10.100.81.42:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.100.9.45, hostname=10.100.9.45)
However, I can ping hadoop-namenode-0.myhadoopcluster, 10.100.81.42, and 10.100.9.45 from both the datanode and the namenode.
/etc/hosts in datanode:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.100.9.45 hadoop-datanode-0.myhadoopcluster.default.svc.cluster.local hadoop-datanode-0
/etc/hosts in namenode:
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.100.81.42 hadoop-namenode-0.myhadoopcluster.default.svc.cluster.local hadoop-namenode-0
And I have already set dfs.namenode.datanode.registration.ip-hostname-check to false in hdfs-site.xml
I suspect the problem is related to DNS. In other, similar questions Hadoop was not deployed in Kubernetes or a Docker container, so I posted this one. Please do not tag it as a duplicate...
In my situation, I set three configuration properties on both the namenode and the datanode (see the hdfs-site.xml sketch after this list):
dfs.namenode.datanode.registration.ip-hostname-check: false
dfs.client.use.datanode.hostname: false (default)
dfs.datanode.use.datanode.hostname: false (default)
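In hdfs-site.xml form, that might look like the following sketch (the last two values are the defaults and are listed only for explicitness):

<!-- hdfs-site.xml: relax hostname checks so datanodes whose IPs
     lack reverse DNS records can still register with the namenode -->
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>false</value>
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>false</value>
</property>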
I hope you found a resolution to the issue by now.
I ran into a similar problem last week; my cluster is set up in a different environment, but the problem context is the same.
Essentially, reverse DNS lookup needs to be set up to solve this issue. If the cluster uses a DNS resolver, this needs to be configured at the DNS server level; if the namenodes look up datanodes in the /etc/hosts file, there needs to be an entry for each datanode there.
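A quick way to check both directions from the namenode is sketched below, using the addresses from this question (getent consults /etc/hosts as well as DNS):

# forward lookup: hostname -> IP
getent hosts hadoop-datanode-0.myhadoopcluster.default.svc.cluster.local

# reverse lookup: IP -> hostname; this is the lookup the namenode
# performs when ip-hostname-check is left at its default of true
nslookup 10.100.9.45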
I have updated an old question in a Hortonworks Community forum post; the link is below:
https://community.hortonworks.com/questions/24320/datanode-denied-communication-with-namenode.html?childToView=135321#answer-135321

Why does the slave node suddenly lose its connection to the master node in Hadoop?

I have set up Hadoop 2.7.2 on a cluster with a master (Ubuntu 15.10) and two slaves (slave2, slave3) hosted on the master in VirtualBox.
I have run several examples like wordcount and they all work fine. But when I try to run my own job, say Myjob, it runs well at first, then after a while it is invariably interrupted by this error:
INFO ipc.Client: Retrying connect to server: slave3/xxx.216.227.176(the ip of slave):38046. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
Sometimes it is slave2, sometimes slave3. And my ssh session to that slave shows that the connection was closed by the remote host.
But VirtualBox shows that the slave is running fine, and I can re-ssh to it. However, all the Hadoop processes have been killed.
I should mention that my own job runs longer than the example jobs.
At first I thought it might be an error in my config files, so I reinstalled Hadoop on the master and the slaves. But the error persists.
Then I thought it might be caused by the network config on the slave nodes, so I changed the last field of each slave's IP (e.g. xxx.xxx.xxx.183 to xxx.xxx.xxx.176) and reinstalled Hadoop.
I reran the job, and this time it ran longer than usual. But at the end, when the map stage was mostly finished (map 86% reduce 28%), it failed with the same error:
INFO ipc.Client: Retrying connect to server: slave3/125.xxx.227.xxx(the ip of slave):38046. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
There is also this in yarn-user-resourcemanager-Master.log:
java.net.ConnectException: Call From Master/xxx.216.227.186 to slave2:44592 failed on connection exception: java.net.ConnectException: refuse to connect; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
It seems that the longer the app runs, the more likely it is to fail.
Here is my hosts file:
127.0.0.1 localhost
#127.0.1.1 Master
xxx.216.227.186 Master
xxx.216.227.185 slave1 # slave1 has a problem and thus does not connect to the cluster
xxx.216.227.176 slave2
xxx.216.227.166 slave3
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Why? How to fix it? Thanks!

Error in HBase standalone mode

My standalone HBase starts, but when I run any command in the shell it errors out...
I have tried many solutions; nothing worked :(
Error message:
hbase(main):001:0> create 'emp' 'data'
16/02/08 18:42:57 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
16/02/08 18:42:57 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
.
.
.
.
My /etc/hosts:
127.0.0.1 localhost
127.0.0.1 ankit21
192.168.43.11 pooja
192.168.43.143 laptop#kalpesh
192.168.43.72 aditi
# The following lines are desirable for IPv6 capable hosts(43.177)
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
JPS:
ankit@ankit21:/usr/lib/hbase/hbase-0.94.8$ jps
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
7609 HMaster
7756 Jps
Well, you need to start your ZooKeeper service. After that, run the jps command to check; you should see a QuorumPeer process for ZooKeeper.
Before starting the ZooKeeper service, make sure you have configured zoo.cfg in the conf directory. You need to add the machine IP to zoo.cfg along with the port for ZooKeeper.
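A minimal zoo.cfg might look like the sketch below (the dataDir path and the example server address are assumptions to adapt):

# conf/zoo.cfg -- minimal standalone ZooKeeper configuration
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# for a multi-node quorum you would also list each server, e.g.
# server.1=192.168.43.11:2888:3888

Then start the service and confirm the process is up:

bin/zkServer.sh start
jps    # should now list QuorumPeerMain (or HQuorumPeer when HBase manages ZooKeeper itself)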

Hadoop 2.6 multinode cluster failed on connection exception when running example jar

Every example Hadoop 2.6 MapReduce application gives the same error: java.net.ConnectException: Connection refused. The error output is:
hduser@localhost:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /usr/local/hadoop/input output_wordcount
15/05/26 06:01:14 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.111.72:8040
15/05/26 06:01:15 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 06:01:15 INFO mapreduce.JobSubmitter: number of splits:1
15/05/26 06:01:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432599812585_0002
15/05/26 06:01:16 INFO impl.YarnClientImpl: Submitted application application_1432599812585_0002
15/05/26 06:01:16 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1432599812585_0002/
15/05/26 06:01:16 INFO mapreduce.Job: Running job: job_1432599812585_0002
15/05/26 06:01:37 INFO mapreduce.Job: Job job_1432599812585_0002 running in uber mode : false
15/05/26 06:01:37 INFO mapreduce.Job: map 0% reduce 0%
15/05/26 06:01:37 INFO mapreduce.Job: Job job_1432599812585_0002 failed with state FAILED due to: Application application_1432599812585_0002 failed 2 times due to Error launching appattempt_1432599812585_0002_000002. Got exception: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:56148 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more
. Failing the application.
15/05/26 06:01:37 INFO mapreduce.Job: Counters: 0
My /etc/hosts looks like this:
127.0.0.1 localhost.localdomain localhost
127.0.1.1 ubuntu-Standard-PC-i440FX-PIIX-1996
192.168.111.72 master
192.168.111.65 slave1
192.168.111.66 slave2
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
I commented the IPv6 lines out after trying many other possibilities. I am wondering where the error actually is. Thanks in advance for any reply.
Thanks for your reply @Ashok. But jps on the master and the slaves shows all daemons are running. Attaching the output:
Master
hduser@localhost:~$ jps
23518 Jps
10442 NameNode
10752 SecondaryNameNode
12348 ResourceManager
Slave1
hduser@localhost:~$ jps
28691 NodeManager
13987 Jps
27298 DataNode
And same for slave2.
Found the solution!!
Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:56148 failed on connection exception: java.net.ConnectException: Connection refused;
Both the master and the slaves had the hostname localhost.localdomain in /etc/hostname.
I changed the hostnames of the slaves to slave1 and slave2. That worked.
Thank you everyone for your time.
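For reference, the change on each slave amounts to something like the sketch below (run with sudo; the names and IPs match the /etc/hosts file above):

# On the first slave: give the node a unique hostname so YARN can
# route container-launch RPCs to the right machine, not localhost
echo slave1 | sudo tee /etc/hostname
sudo hostname slave1      # apply immediately, without a reboot

# Also make sure every node's /etc/hosts maps the name to the
# node's real IP, e.g. 192.168.111.65 slave1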
It seems your namenode is not running, or some of the other daemons are not running. Also make sure you can ping between the nodes.
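A quick round of checks along those lines (a sketch; run the first two on the master):

jps                 # expect NameNode, SecondaryNameNode, ResourceManager
ping -c 3 slave1    # basic reachability from the master
ssh slave1 jps      # expect DataNode and NodeManager on each slave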

HBase exception

When I use HBase in pseudo-cluster mode, I get the exception below. It would be really great if somebody could shed some light on this issue and help resolve it.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Wed Feb 06 15:22:23 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@29422384, java.io.IOException: java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=file:/home/688697/hbase/test/c28d92322c97364af59b09d4f4b4a95f/cf/c5de203afb5647c0b90c6c18d58319e9, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=0deptempname0/cf:email/1360143938898/Put, lastKey=4191151deptempname4191151/cf:place/1360143938898/Put, avgKeyLen=45, avgValueLen=7, entries=17860666, length=1093021429, cur=10275517deptempname10275517/cf:place/1360143938898/Put/vlen=4]
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:104)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:289)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:138)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3004)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2951)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2968)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2155)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/688697/hbase/test/c28d92322c97364af59b09d4f4b4a95f/cf/c5de203afb5647c0b90c6c18d58319e9 at 37837312 exp: -819174049 got: 1765448374
at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:320)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:211)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:229)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:193)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:431)
at org.apache.hadoop.fs.FSInputChecker.seek(FSInputChecker.java:412)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:48)
at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:318)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1047)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:266)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readNextDataBlock(HFileReaderV2.java:452)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:416)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:99)
... 12 more
Wed Feb 06 15:22:24 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@29422384, java.io.IOException: java.io.IOException: java.lang.IllegalArgumentException
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1079)
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1068)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2182)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:216)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:395)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:99)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:326)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:138)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3004)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2951)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:2968)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2155)
... 5 more
The root cause of this issue resides in your /etc/hosts file. If you check your /etc/hosts file, you will find an entry something like the one below (in my case my machine is named domainnameyouwanttogive):
127.0.0.1 localhost
127.0.1.1 domainnameyouwanttogive
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The root cause is that domainnameyouwanttogive resolves to 127.0.1.1, which is incorrect: it should resolve to 127.0.0.1 (or to an external IP). As my external IP is 192.168.58.10, I created the following /etc/hosts configuration:
127.0.0.1 localhost
192.168.43.3 domainnameyouwanttogive
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
This ensures that hostname resolution for the processes on your localhost is done correctly, and you can start your HBase installation properly on your development system.
Also, please make sure that your Hadoop namenode runs under the same domain name you are using for HBase.
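To confirm the change took effect, check what the hostname actually resolves to (a sketch; getent consults /etc/hosts the same way your HBase processes will):

hostname -f                           # should print domainnameyouwanttogive
getent hosts domainnameyouwanttogive  # should show the external IP, not 127.0.1.1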