I am getting many errors and cannot figure out why the DataNode on my slave VM will not connect to my master VM. Any suggestions are welcome, and I will try them.
To start, one of them is this error in my slave VM log:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000
Because of this, I can't run the job I want on my master VM:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
which gives me this error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/QuasiMonteCarlo_1386793331690_1605707775/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
In addition, hdfs dfsadmin -report (on the master VM) shows all zeros:
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
For this setup, I built three Ubuntu VMs on OpenStack: one master and two slaves.
On the master, /etc/hosts contains:
127.0.0.1 localhost
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
50.50.1.8 slave1
50.50.1.4 slave2
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
<name>dfs.permissions</name>
<value>false</value>
mapred-site.xml
<name>mapreduce.framework.name</name>
<value>yarn</value>
My slaves file contains two lines: slave1 and slave2.
None of the logs on the master VM contain errors, but the slave VM logs the connection error shown above, and the NodeManager log also shows an error:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
From my Slave Machine:
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
and on my /etc/hosts
127.0.0.1 localhost
50.50.1.8 ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
The jps output:
master
15863 ResourceManager
15205 SecondaryNameNode
14967 NameNode
16194 Jps
slave
1988 Jps
1365 DataNode
1894 NodeManager
Of all the errors shown, the one below is the main reason the slave could not connect to the master:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
Basically, 0.0.0.0:8031 is the address used for yarn.resourcemanager.resource-tracker.address, so I checked with lsof -i :8031 and found that the port was not open. Since I'm using OpenStack (cloud), I added 8031 and the other ports that were showing errors to the allowed ports, and voilà, it worked as intended.
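A quick way to repeat that kind of port check from the slave side is a small TCP probe. The following is only a minimal sketch: the function name port_is_open is made up for illustration, and it assumes Python 3 is available on the node.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling port_is_open with the master's hostname and 8031 from the slave should return True once the port is allowed through the cloud security rules. A False result does not say whether the cause is a firewall, a daemon that is not running, or a daemon bound to a different address.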
I struggled a lot and finally got it working after running "systemctl stop firewalld". Before this I had also disabled SELinux and IPv6.
In my case, I used hdfs datanode -format to format the datanode server and hdfs namenode -format to format the namenode server. Before that, make sure to delete all the files in the data folders that are listed in the hdfs-site file.
Related
I set up a Hadoop 2.6 cluster using two nodes of 8 cores each on Ubuntu 12.04. sbin/start-dfs.sh and sbin/start-yarn.sh both succeed, and I can see the following after running jps on the master node.
22437 DataNode
22988 ResourceManager
24668 Jps
22748 SecondaryNameNode
23244 NodeManager
The jps outcome on the slave node is
19693 DataNode
19966 NodeManager
I then run the PI example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100
which gives me this error log:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
The problem seems to be with the HDFS file system, since the command bin/hdfs dfs -mkdir /user fails with a similar exception.
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
where xxx.ww.y.zz is the ip-address of Master-R5-Node
I have checked and followed all the recommendations of ConnectionRefused on Apache and on this site.
Despite the week long effort, I cannot get it fixed.
Thanks.
Many things can lead to the problem I faced, but I finally fixed it with some of the following steps.
Make sure that you have the needed permissions on the /hadoop and hdfs temporary files. (You have to figure out where those are for your particular case.)
Remove the port number from fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml. It should look like this:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://my.master.ip.address/</value>
<description>NameNode URI</description>
</property>
</configuration>
Add the following two properties to $HADOOP_CONF_DIR/hdfs-site.xml:
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Voila! You should now be up and running!
I set up Hadoop (2.6.0) in multi-machine mode: 1 namenode + 3 datanodes. When I ran start-all.sh, all the daemons (namenode, datanode, resource manager, node manager) started fine. I checked with the jps command, and the results on each node were as below:
NameNode :
7300 ResourceManager
6942 NameNode
7154 SecondaryNameNode
DataNodes:
3840 DataNode
3924 NodeManager
And I also uploaded sample text file on HDFS at: /user/hadoop/data/sample.txt. Absolutely no error at that moment.
But when I tried to run a MapReduce job with the Hadoop examples jar:
hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/data/sample.txt /user/hadoop/output
I got this error:
15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 running in uber mode : false
15/04/08 03:31:26 INFO mapreduce.Job: map 0% reduce 0%
15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 failed with state FAILED due to: Application application_1428478232474_0001 failed 2 times due to Error launching appattempt_1428478232474_0001_000002. Got exception: java.net.ConnectException: Call From hadoop/127.0.0.1 to localhost:53245 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more Failing the application.
15/04/08 03:31:26 INFO mapreduce.Job: Counters: 0
About the configuration: I made sure that the namenode can ssh to the datanodes and vice versa without a password prompt. I also disabled IPv6 and modified the /etc/hosts file:
127.0.0.1 localhost hadoop
192.168.56.102 hadoop-nn
192.168.56.103 hadoop-dn1
192.168.56.104 hadoop-dn2
192.168.56.105 hadoop-dn3
I don't know why MapReduce can't run although the namenode and datanodes work all right. I'm stuck here; can you help me find the reason?
Thank you
Edit :
Here is the config in hdfs-site.xml (namenode):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoop_stores/hdfs/namenode</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-nn:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-nn:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
On the datanodes:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoop_stores/hdfs/data/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-nn:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-nn:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
Here's the result of the command hadoop fs -ls /user/hadoop/data:
hadoop@hadoop:~/DATA$ hadoop fs -ls /user/hadoop/data 15/04/09 00:23:27
Found 2 items
-rw-r--r-- 3 hadoop supergroup 29 2015-04-09 00:22 /user/hadoop/data/sample.txt
-rw-r--r-- 3 hadoop supergroup 27 2015-04-09 00:22 /user/hadoop/data/sample1.txt
hadoop fs -ls /user/hadoop/output
ls: `/user/hadoop/output': No such file or directory
Found the solution! See this post: "yarn shows data nodes id/name as localhost"
Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:56148 failed on connection exception: java.net.ConnectException: Connection refused;
Both the master and the slaves had the hostname localhost.localdomain in /etc/hostname.
I changed the hostnames of the slaves to slave1 and slave2. That worked.
Thank you everyone for your time.
@kate Make sure /etc/hostname on the namenode and datanodes is not set to localhost. Just type ~# hostname in a terminal to check. You can set a new hostname with the same command.
The /etc/hosts on my master and workers (slaves) looks like this:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#127.0.1.1 localhost
192.168.111.72 master
192.168.111.65 worker1
192.168.111.66 worker2
hostname of worker1
hduser@worker1:/mnt/hdfs/datanode$ cat /etc/hostname
worker1
and worker2
hduser@worker2:/usr/local/hadoop/logs$ cat /etc/hostname
worker2
Also, you probably don't want the "hadoop" hostname mapped to the loopback interface, i.e.
127.0.0.1 localhost hadoop
See point (1) in https://wiki.apache.org/hadoop/ConnectionRefused.
Thank you.
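The loopback point above can be checked mechanically. Here is a rough sketch, assuming Python 3; loopback_aliases is a hypothetical helper name, and skipping names that contain "localhost" or "loopback" is only a heuristic for the usual default entries.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return hostnames that an /etc/hosts-style text maps to a loopback address.

    Hypothetical helper for illustration. Names containing "localhost" or
    "loopback" are skipped, since those are the expected default entries.
    """
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address, skip the line
        if is_loopback:
            found.extend(
                n for n in names if "localhost" not in n and "loopback" not in n
            )
    return found
```

Passing it the contents of /etc/hosts from the question (the line 127.0.0.1 localhost hadoop) would flag hadoop, while the master/worker file shown in this answer yields an empty list.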
FIREWALL ISSUE:
java.net.ConnectException: Connection refused
This error might be due to firewall issues. Do this in terminal:
sudo apt-get install iptables-persistent
sudo iptables -L
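sudo mkdir -p /usr/iptables-backup
# the redirect must run as root too, so wrap the command in sh -c
sudo sh -c 'iptables-save > /usr/iptables-backup/iptables.v4.rules'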
Check whether the file is created before continuing (since this will be used to restore firewall if something goes wrong).
Now, flush the iptables rules (i.e. stop the firewall):
sudo iptables -F
Now try,
sudo iptables -L
This command should return no rules. Now, try to run your map/reduce job.
Note: If you want to restore iptables to previous condition, type this in terminal:
sudo iptables-restore < /usr/iptables-backup/iptables.v4.rules
I am trying to set up an Apache Hadoop 2.3.0 cluster. I have a master and three slave nodes. The slave nodes are listed in the $HADOOP_HOME/etc/hadoop/slaves file, and I can telnet from the slaves to the master NameNode on port 9000. However, when I start the datanode on any of the slaves, I get the following exception:
2014-08-03 08:04:27,952 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed
for block pool Block pool BP-1086620743-xx.xy.23.162-1407064313305
(Datanode Uuid null) service to
server1.mydomain.com/xx.xy.23.162:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode because hostname cannot be
resolved .
The following are the contents of my core-site.xml.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://server1.mydomain.com:9000</value>
</property>
</configuration>
Also in my hdfs-site.xml I am not setting any value for dfs.hosts or dfs.hosts.exclude properties.
Thanks.
Each node needs a fully qualified, unique hostname.
Your error says
hostname cannot be resolved
Can you cat the /etc/hosts file on each slave and make sure each one has a distinct hostname?
After that, try again.
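To see which node names fail to resolve on a given machine, something like the following sketch may help. It assumes Python 3; unresolvable_hosts is a hypothetical helper name, and the resolve parameter exists only so the lookup can be swapped out.

```python
import socket


def unresolvable_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames from the given list that cannot be resolved.

    Hypothetical helper for illustration. It checks forward lookups only
    (name to IP) and does not cover reverse lookups (IP to name).
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(name)
    return failed
```

Running it on the master and on each slave with the full list of cluster hostnames should return an empty list everywhere once the /etc/hosts files are consistent.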
logs
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user@namenode:10001</value>
</property>
</configuration>
I put this in /etc/hosts:
192.168.12.196 namenode
in masters
user@namenode
in slaves
localhost
and my namenode is on user@192.168.12.196
If I run jps on all nodes, it shows the datanode, namenode, and job/tasktracker working fine.
You need to change localhost to namenode in the slaves and masters files and restart Hadoop once; then it will work fine.
for better view
Thanks for your Comment
If I put the hostname in the namenode's slaves file, it runs the datanode and namenode on the same node.
The configuration of my masters and slaves files is as follows:
on namenode's masters
'user@namenode'
on namenode's slaves
hdname1@data1 (data1 maps to the node's IP and hdname1 is the user)
hdname2@data2
on datanode's masters
user@namenode
on datanode's slaves
hdname1@data1
When I start hadoopnode1 using start-all.sh, it successfully starts the services on the master and slave (see the jps command output for the slave). But when I look at the live nodes in the admin screen, the slave node doesn't show up. Running hadoop fs -ls / from the master works perfectly, but from the slave it shows an error message:
#hadoopnode2:~/hadoop-0.20.2/conf$ hadoop fs -ls /
12/05/28 01:14:20 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 0 time(s).
12/05/28 01:14:21 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 1 time(s).
12/05/28 01:14:22 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 2 time(s).
12/05/28 01:14:23 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 3 time(s).
.
.
.
12/05/28 01:14:29 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 10 time(s).
It looks like the slave (hadoopnode2) is not able to find or connect to the master node (hadoopnode1).
Please point out what I am missing.
Here are the setting from Master and Slave nodes -
P.S. - Master and slave running same version of Linux and Hadoop and SSH is working perfectly,
because I can start the slave from master node
Also, the settings in core-site.xml, hdfs-site.xml and mapred-site.xml are the same on master (hadoopnode1) and slave (hadoopnode2).
OS - Ubuntu 10
Hadoop Version -
oop#hadoopnode1:~/hadoop-0.20.2/conf$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
-- Master (hadoopnode1)
hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode1 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ jps
9923 Jps
7555 NameNode
8133 TaskTracker
7897 SecondaryNameNode
7728 DataNode
7971 JobTracker
masters -> hadoopnode1
slaves -> hadoopnode1
hadoopnode2
--Slave (hadoopnode2)
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode2 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ jps
1959 DataNode
2631 Jps
2108 TaskTracker
masters - hadoopnode1
core-site.xml
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/tmp/hadoop/hadoop-${user.name}</value>
<description>A base for other temp directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnode1:8020</value>
<description>The name of the default file system</description>
</property>
</configuration>
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoopnode1:8021</value>
<description>The host and port that the MapReduce job tracker runs at.If "local", then jobs are run in process as a single map</description>
</property>
</configuration>
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication</description>
</property>
</configuration>
Check your services with sudo jps.
If the master is not displayed, this is what you need to do:
1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (e.g. with vim VERSION)
4. Record the namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (e.g. with vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.
This should work. Best of luck.
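The comparison in the steps above can also be scripted instead of done by eye. This is a minimal sketch assuming Python 3 and that the VERSION files contain key=value lines with a namespaceID key; the helper names are hypothetical.

```python
def read_namespace_id(version_text):
    """Extract the namespaceID value from the text of a VERSION file.

    Hypothetical helper for illustration; assumes key=value lines and
    treats lines starting with '#' as comments.
    """
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "namespaceID":
            return value.strip()
    return None  # no namespaceID entry found


def namespace_ids_match(name_version_text, data_version_text):
    """Return True only if both texts carry the same namespaceID."""
    name_id = read_namespace_id(name_version_text)
    data_id = read_namespace_id(data_version_text)
    return name_id is not None and name_id == data_id
```

One would read the two VERSION files from the name and data directories mentioned above and pass their contents in. A False result suggests the datanode directory was initialized against a different namenode format.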
In the web GUI you can see the number of nodes your cluster has. If you see fewer than you expected, make sure that the /etc/hosts file on the master lists only these hosts (for a 2-node cluster):
192.168.0.1 master
192.168.0.2 slave
If you see any 127.0.1.1.... IP entries, comment them out, because Hadoop will see them first as hosts.
Check the namenode and datanode logs. (Should be in $HADOOP_HOME/logs/). Most likely issue could be that the namenode and datanode IDs don't match. Delete the hadoop.tmp.dir from all nodes and format the namenode ($HADOOP_HOME/bin/hadoop namenode -format) again, then try again.
I think the problem is on slave 2: it should listen on the same port, 8020, instead of listening on 8021.
Add new node hostname to slaves file and start data node & task tracker on new node.
Indeed there are two errors in your case.
can't connect to hadoop master node from slave
That's a network problem. Test it: curl 192.168.1.120:8020
Normal response: curl: (52) Empty reply from server
In my case, I got a host not found error, so take a look at the firewall settings.
data node down:
That's a Hadoop problem. Raze2dust's method is good. Here's another way if you see an Incompatible namespaceIDs error in your log:
Stop Hadoop and edit the value of namespaceID in /current/VERSION to match the value of the current namenode, then start Hadoop.
You can always check available datanodes using: hadoop fsck /