DataNodes can't talk to NameNode - hadoop

Set up a Hadoop cluster of 3 nodes. One of them got both NameNode and DataNode roles while other two are just DataNodes.
I started all nodes and services but in summary it shows only one of DataNodes's status is live. Status of other nodes are not even showing.
My question is what is the difference between being started and being live? And why other nodes don't have a status at all?
I guess the issue is datanodes can't talk to namenode. So as Azwaw pointed out, I checked /etc/hosts file. It was like this:
127.0.0.1 nnode.domain nnode localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.1.4.212 nnode.domain nnode
192.1.5.124 dnode02.domain dnode02
192.1.5.125 dnode01.domain dnode01
I changed first line to:
127.0.0.1 localhost.localdomain localhost localhost4 localhost4.localdomain4
Now I can establish connection with nnode.domain:50070, however errors at datanode side changed. Here log piece from datanode:
2015-05-15 10:08:21,721 ERROR datanode.DataNode (DataXceiver.java:run(253)) - dnode01.domain:50010:DataXceiver error processing unknown operation src: /127.0.0.1:49000 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:212)
at java.lang.Thread.run(Thread.java:745)
2015-05-15 10:08:23,670 INFO datanode.DataNode (BPServiceActor.java:register(782)) - Block pool BP-2116866246-127.0.0.1-1431441630609 (Datanode Uuid null) service to nnode.domain/192.1.4.212:8020 beginning handshake with NN
2015-05-15 10:08:23,674 ERROR datanode.DataNode (BPServiceActor.java:run(840)) - Initialization failed for Block pool BP-2116866246-127.0.0.1-1431441630609 (Datanode Uuid null) service to nnode.domain/192.1.4.212:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=192.1.4.1, hostname=192.1.4.1): DatanodeRegistration(0.0.0.0, datanodeUuid=7f1be518-1255-4a6a-b31c-22be5dc47673, infoPort=50075, ipcPort=8010, storageInfo=lv=-56;cid=CID-51d1dfd0-9376-44a7-b581-c14eec95fd74;nsid=450599258;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:887)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5282)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1082)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26378)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
This is odd, there is no host with 192.1.4.1 IP address. Why would datanodes try to connect 192.1.4.1?
Unresolved datanode registration: hostname cannot be resolved (ip=192.1.4.1, hostname=192.1.4.1)

"Datanodes 3/3 started" means 3 datanodes process running
Datanodes status "1 live / 0 Dead / 0 Decommissioning" means your namenode is able to communicate with one node.
It seems to be a network problem (make sure HDFS ports are open on your firewall). The alive Datanode is probably on same machine as your Namenode.

Moving NameNode to same network with DataNodes solved the problem.
DataNodes are in 192.1.5.* network.
NameNode was in 192.1.4.* network.
After moving NameNode to 192.1.5.* did the trick for my case.

Related

Cloudera Manager Health Issue: NameNode Connectivity, Web Server Status

Below is a snapshot of the health issues reported on CM. The datanodes in the list keep changing. Some errors from the datanode logs :
3:59:31.859 PM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode
datanode05.hadoop.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.248.200.113:45252 dest: /10.248.200.105:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
5:46:03.606 PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode
Exception for BP-846315089-10.248.200.4-1369774276029:blk_-780307518048042460_200374997
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.248.200.105:50010 remote=/10.248.200.122:43572]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
Snapshot:
I am unable to figure out the root cause of the issue. I can manually connect from one datanode to another without issues, I don't believe it is a network issue. Also, the missing blocks and under-replicated block counts change (up & down) as well.
Cloudera Manager : Cloudera Standard 4.8.1
CDH 4.7
Any help in resolving this issue is appreciated.
Update: Jan 01, 2016
For the datanodes listed as bad, when I see the dadanode logs, I see this message a lot...
11:58:30.066 AM INFO org.apache.hadoop.hdfs.server.datanode.DataNode
Receiving BP-846315089-10.248.200.4-1369774276029:blk_-706861374092956879_36606459 src: /10.248.200.123:56795 dest: /10.248.200.112:50010
Why is this datanode receiving a lot of blocks from other datanodes around the same time? It seems that because of this activity the datanode cannot respond to the namenode request in time and thus timing out. All bad datanodes show the same pattern.
Similar question got answered
hdfs data node disconnected from namenode.
Please check your firewall. Use
telnet ipaddress port
to check the connectivity.

On hadoop 2.2.0, My datanode could't start up

everyone, I have a little problem when I build the Hadoop Cluster
My node install CentOS 6.5,java1.7.60 and hadoop 2.2.0
I want to build a master and three slaves
I try to build it like this
But in the end of this, I try to start up my namenode and datanode
My /etc/hosts like this:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.10 master
10.10.10.11 slave1
10.10.10.12 slave2
10.10.10.13 slave3
Just like this when I type:
$ hadoop namenode -fromat:
java.net.UnknownHostException: hadoop01.hadoopcluster: hadoop01.hadoopcluster
at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:914)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:550)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:144)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:837)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
Caused by: java.net.UnknownHostException: hadoop01.hadoopcluster
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 8 more
14/07/01 16:50:46 WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address
java.net.UnknownHostException: hadoop01.hadoopcluster: hadoop01.hadoopcluster
at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:287)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:58)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:914)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:550)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:144)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:837)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
Caused by: java.net.UnknownHostException: hadoop01.hadoopcluster
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 8 more
and try to issue start-dfs.sh and start-yarn.sh:
$ start-dfs.sh
14/07/01 16:55:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: namenode running as process 2395. Stop it first.
slave2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-hadoop03.hadopcluster.out
slave1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-hadoop02.hadopcluster.out
slave3: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-hadoop04.hadopcluster.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 2564. Stop it first.
14/07/01 16:55:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-hadoop01.hadoopcluster.out
slave1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-hadoop02.hadopcluster.out
slave3: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-hadoop04.hadopcluster.out
slave2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-hadoop03.hadopcluster.out
and type jps:
$ jps
2564 SecondaryNameNode
5591 Jps
2395 NameNode
I only look like this, didn't have DataNode, NodeManager, ResourceManger... etc,
Is it any where wrong when I setting it?
Could anyone can suggest me something, thanks!
DHCP (Dynamic Host Configuration Protocol)
it is a server deployed on an IP network and it is used to allocate IP addresses to its clients. Therefore you had to configure the DHCP on both the server and clients sides.
On the server side:
getting the package: isc-dhcp-server
editing /etc/network interfaces to configure interfaces, choose a static IP address for the DHCP server
specifying the local network interface DHCP will listen to (/etc/default/isc-dhcp-server)
editing configuration file /etc/dhcp/dhcpd.conf to choose the IP addresses that will be used by the local network, define the DNS server
On the client side:
editing /etc/network interfaces to configure interfaces
You can make sure that it is well installed and configured you can use ifconfig and ping commands.
DNS (Domain Name System)
intalling bind9
editing /etc/bind/named.conf to add hosts
Good luck.

hdfs data node disconnected from namenode

I get from time to time the following errors in cloudera manager:
This DataNode is not connected to one or more of its NameNode(s).
and
The Cloudera Manager agent got an unexpected response from this role's web server.
(usually together, sometimes only one of them)
In most references to these errors in SO and Google, the issue is a configuration problem (and the data node never connects to the name node)
In my case the data nodes usually connect at start up, but loose the connection after some time - so it doesn't appear to be a bad configuration.
Any other options?
Is it possible to force the data node to reconnect to the name node?
Is it possible to "ping" the name node from the data node (simulate the connection attempt of the data node)
Could it be some kind of resource problem (to many open files \ connections)?
sample logs (the errors vary from time to time)
2014-02-25 06:39:49,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
2014-02-25 06:39:49,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.56.144.18:50010, dest: /10.56.144.28:48089, bytes: 132096, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1315770947_27, offset: 0, srvID: DS-990970275-10.56.144.18-50010-1384349167420, blockid: BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440, duration: 480291679056
2014-02-25 06:39:49,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.144.18, storageID=DS-990970275-10.56.144.18-50010-1384349167420, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster16;nsid=7043943;c=0):Got exception while serving BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440 to /10.56.144.28:48089
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
2014-02-25 06:39:49,181 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: host.com:50010:DataXceiver error processing READ_BLOCK operation src: /10.56.144.28:48089 dest: /10.56.144.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
Hadoop uses specific ports to communicate between the DataNode and the NameNode. It could be that a firewall is blocking those specific ports. Check the default ports in the Cloudera WebSite and test the connectivity to the NameNode with specific ports.
If you're using Linux then please make sure that you have configured these properties correctly:
Disable SELINUX
type the command getenforce on CLI and if it shows enforcing, means it is enabled. Change it fro /etc/selinux/config file.
Disable Firewall
Make sure you have NTP service installed.
Make sure your server can SSH to all client nodes.
Make sure all the nodes have FQDN(Fully Qualified Domain Name) and have an entry in /etc/hosts with name and IP.
If these settings are right in the place then please attach the log of any of your datanode which got disconnected.
I ran into this error
"This DataNode is not connected to one or more of its NameNode(s). "
and I solved it by turning off safe mode and restart HDFS service
I realize you took some steps to test this, but intermittent disconnects still make it sound like a Connectivity issue.
If nodes really don't come back after a disconnect, that may be a configuration issue, which could well be completely independent from the reason why they disconnect in the first place.

hadoop - Connection refused on namenode

I've searched web and stackoverflow for a long time but it was not useful.
I have installed hadoop yarn 2.2.0 in 2 node cluster setup. but something goes wrong.
when I start hadoop daemons using start-dfs.sh and start-yarn.sh on master node, they successfully run in master and slave (my master's hostname is RM and my slave's hostname is slv). they can ssh each other successfully. but when I want to run a job, this error appears:
14/01/02 04:22:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/02 04:22:56 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/QuasiMonteCarlo_1388665371850_813553673/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
and in datanode log this log exists:
2014-01-02 04:40:31,616 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: RM/192.168.1.101:9000
2014-01-02 04:40:37,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: RM/192.168.1.101:9000. Already tried 0 time(s)$
2014-01-02 04:40:38,619 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: RM/192.168.1.101:9000. Already tried 1 time(s)$
2014-01-02 04:40:39,620 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: RM/192.168.1.101:9000. Already tried 2 time(s)$
2014-01-02 04:40:40,621 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: RM/192.168.1.101:9000. Already tried 3 time(s)
I checked the 9000 port on the master node and the output is this:
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 10227/java
I guess the problem is caused by the reason that in the slave node when I
telnet RM 9000
it says
Trying 192.168.1.101...
telnet: Unable to connect to remote host: Connection refused
however
telnet RM
the output is :
Trying 192.168.1.101...
Connected to RM.
Escape character is '^]'.
Ubuntu 12.04.2 LTS
RM login:
for additional information my /etc/hosts on master and slave is as below:
127.0.0.1 RM|slv localhost
192.168.1.101 RM
192.168.1.103 slv
can anybody suggest me a solution?
any help is really appreciated.
thanks
I think the problem is that your master is listening on 127.0.0.1:9000, so datanode can't connect because it is not listening at 192.168.1.101:9000 (theoretically, a good place to listen is 0.0.0.0:9000 since avoids this problems, but seems this configuration is not accepted).
Maybe will fix modifying your /etc/hosts deleting the first line, or try first just with:
127.0.0.1 localhost
192.168.1.101 RM
192.168.1.103 slv
-- edit: read comments bellow
I had the same problem, I changed
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
in core-site.xml to
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-address:8020</value>
</property>
and it worked
I encountered the same issue. After ran jps, we can see all the namenode and datanode are running up. but cannot see the active node in the web page. And I found I put 127.0.0.1 master in /etc/hosts. After removed it. the slaves can telnet master 9000.
My /etc/hosts looks like:
127.0.0.1 localhost
192.168.139.129 slave1
192.168.139.130 slave2
192.168.139.128 master

Error in copying files to HDFS

I tried installing hadoop in two nodes. Both the nodes are up and running. The namenode runs on Ubuntu 10.10 and Datanode on Fedora 13. While copying the file from local file system to hdfs I encountered the following errors.
The terminal showed:
12/04/12 02:19:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.OException: Bad connect ack with firstBadLink as 10.211.87.162:9200
12/04/12 02:19:15 INFO hdfs.DFSClient: Abandoning block blk_-1069539184735421145_1014
The log file in namenode showed:
2012-10-16 16:17:56,723 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.6.2.26:50010, storageID=DS-880164535-10.18.13.10-50010-1349721715148, infoPort=50075, ipcPort=50020):DataXceiver
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:662)
Datanodes available are indicated as 2. I've disabled the firewall and selinux.
The following changes have also been made in the hdfs-site.xml
dfs.socket.timeout -> 360000
dfs.datanode.socket.write.timeout -> 3600000
dfs.datanode.max.xcievers -> 1048576
Both the nodes run sun-java6-jdk, The datanode contains Openjdk but the path settings have been made for sun java.
Yet the same error persists.
What might be the solution.
That's because your firewall is on.
try
sudo /etc/init.d/iptables stop
If you are on Ubuntu, do
sudo ufw disable
this should solve the issue.
The exception log mentioned tha the failure reason is No route to host.
Try ping 10.6.2.26 to test your network connection.

Resources