Hadoop: datanode not connecting to namenode, cluster summary on localhost:50070 shows 0 - hadoop

Logs:
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user#namenode:10001</value>
</property>
</configuration>
In /etc/hosts I put:
192.168.12.196 namenode
in masters:
user#namenode
in slaves:
localhost
My namenode is on user#192.168.12.196.
If I run jps on every node, it shows the datanode, namenode, and job/tasktracker working fine.

You need to change localhost to namenode in the slaves and masters files and restart Hadoop once; then it will work fine.
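That is, on the node where you run start-all.sh, the two files would look like this sketch (keeping the user#host form from the question):
conf/masters:
user#namenode
conf/slaves:
user#namenode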

Thanks for your comment.
If I put the hostname in the namenode's slaves file, it runs the datanode and namenode on the same node.
For a better view, the configuration of my masters and slaves files is as follows:
in the namenode's masters:
'user#namenode'
in the namenode's slaves:
hdname1#data1 (data1 is the hostname of the node and hdname1 is the user)
hdname2#data2
in the datanode's masters:
user#namenode
in the datanode's slaves:
hdname1#data1
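For the data1/data2 hostnames above to resolve, every node's /etc/hosts needs entries for all of the hosts, along the lines of this sketch (the data1/data2 addresses are placeholders for their real IPs):
192.168.12.196  namenode
192.168.12.x    data1
192.168.12.y    data2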

Related

Not able to run dump in pig

I am trying to dump a relation but I am getting the following error.
I have tried start-all.sh and I have tried formatting the namenode using hadoop namenode -format.
But I cannot figure out what is wrong.
Error:
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Start the JobHistoryServer
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Pig, when run in MapReduce mode, expects the JobHistoryServer to be available.
To configure the JobHistoryServer, add these properties to mapred-site.xml, replacing hostname with the actual name of the host where the process is started:
<property>
<name>mapreduce.jobhistory.address</name>
<value>hostname:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hostname:19888</value>
</property>
I would first ensure I am able to connect to the namenode from an HDFS client on an edge node. If not, there is some problem or inconsistency with the namenode configuration in your core-site.xml file, either with the port or the hostname.
Once you are able to run the command below without any issues, also ensure the namenode is not in safe mode (which prevents any writes) at http://namenode_host:50070:
hadoop fs -ls /
Then I would proceed with Pig. Based on your error, it looks like the HDFS client is unable to reach the namenode for some reason, which could be a firewall or configuration issue.
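Putting that together, a quick sanity-check sequence before running Pig might look like this sketch (assuming HADOOP_HOME is set and you are on the edge node):
hadoop fs -ls /                  # namenode reachable with the client's core-site.xml
hdfs dfsadmin -safemode get      # should print "Safe mode is OFF"
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver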

Error starting datanode on hadoop

I'm trying to run a hadoop cluster via Docker. I have one virtual machine as the namenode and another for the datanode, but the datanode gives me this error running start-dfs.sh:
namenode: namenode running as process 130. Stop it first.
The command jps on the datanode does not show the namenode running. Then I try to start it by hand, using:
hadoop namenode
And it fails with this error:
java.net.BindException: Problem binding to [namenode:9000] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
So far it seems that the namenode is not accessible or is not listening on port 9000. But the network setup is correct: if I execute on the datanode:
telnet namenode 9000
It correctly connects to the namenode, and the command netstat -apn | grep 9000 from namenode shows the incoming connection. If I shut down dfs on namenode (stop-dfs.sh), the telnet command from datanode fails with "Connection closed by foreign host."
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- I have tried with 1 and 2 too -->
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
Thanks!

Hadoop Datanode slave is not connecting to my master

Because of many errors, I can't figure out why my datanode slave VM is not connecting to my master VM. Any suggestion is welcome, so I can try it.
To start, one of the errors is this one in my slave VM log:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000
Because of this, I can't run the job that I want on my master VM:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
which gives me this error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/QuasiMonteCarlo_1386793331690_1605707775/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
and even so, hdfs dfsadmin -report (on the master VM) gives me all zeros:
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
For this, I set up 3 Ubuntu VMs on OpenStack, one as the master and the others as slaves.
On the master, /etc/hosts contains:
127.0.0.1 localhost
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
50.50.1.8 slave1
50.50.1.4 slave2
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
<name>dfs.permissions</name>
<value>false</value>
mapred-site.xml
<name>mapreduce.framework.name</name>
<value>yarn</value>
And my slaves file contains the lines slave1 and slave2.
All the logs on the master VM contain no errors, but the slave VM gives that connection error, and the NodeManager gives me an error too in its log:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
From my Slave Machine:
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
and on my /etc/hosts
127.0.0.1 localhost
50.50.1.8 ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
The jps output:
master
15863 ResourceManager
15205 SecondaryNameNode
14967 NameNode
16194 Jps
slave
1988 Jps
1365 DataNode
1894 NodeManager
Of all the errors showing, the one below is the main reason the master is not able to connect to the slave:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
Basically, 0.0.0.0:8031 is the default for yarn.resourcemanager.resource-tracker.address, so I checked with lsof -i :8031 and the port wasn't enabled/open/allowed. Since I'm using OpenStack (cloud), I opened 8031 and the other ports that were showing errors in the security group rules, and voilà, it worked as intended.
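For reference, a minimal yarn-site.xml sketch that points the NodeManagers at the ResourceManager host instead of the 0.0.0.0 default (the hostname is the master's from this question; adjust it to your own):
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:8031</value>
</property>
<!-- alternatively, set yarn.resourcemanager.hostname once and let the per-service addresses default from it -->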
I struggled a lot and finally got it working after running "systemctl stop firewalld"; before this I also disabled SELinux and IPv6.
In my case, I used hdfs datanode -format to format the datanode and hdfs namenode -format to format the namenode. Before that, make sure to delete all the files in the data folders that are referenced in the hdfs-site.xml file.
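Roughly, that clean-up looks like the sketch below, using the dfs.namenode.name.dir and dfs.datanode.data.dir paths from the hdfs-site.xml above (warning: this destroys all HDFS data):
# on the namenode
rm -rf /home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode/*
hdfs namenode -format
# on each datanode
rm -rf /home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode/*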

How to change address 'hadoop jar' command is connecting to?

I have been trying to start a MapReduce job on my cluster with the following command:
bin/hadoop jar myjar.jar MainClass /user/hduser/input /user/hduser/output
But I get the following error over and over again, until connection is refused:
13/08/08 00:37:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
I then checked with netstat to see if the service was listening to the correct port:
~> sudo netstat -plten | grep java
tcp 0 0 10.1.1.4:54310 0.0.0.0:* LISTEN 10022 38365 11366/java
tcp 0 0 10.1.1.4:54311 0.0.0.0:* LISTEN 10022 32164 11829/java
Now I notice that my service is listening on 10.1.1.4:54310, which is the IP of my master, but the 'hadoop jar' command seems to be connecting to 127.0.0.1 (localhost, which is the same machine) and therefore doesn't find the service. Is there any way to force 'hadoop jar' to look at 10.1.1.4 instead of 127.0.0.1?
My NameNode, DataNode, JobTracker, TaskTracker, ... are all running. I even checked for DataNode and TaskTracker on the slaves and it all seems to be working. I can check the WebUI on the master and it shows my cluster is online.
I expect the problem to be DNS related, since the 'hadoop jar' command finds the correct port but always uses the 127.0.0.1 address instead of 10.1.1.4.
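To see what 'master' actually resolves to on the node running 'hadoop jar', something like the following can be checked (a sketch; output varies by distro):
getent hosts master
grep -E 'master|127\.0' /etc/hosts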
UPDATE
Configuration in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Configuration in mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Configuration in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Although it seemed to be a DNS issue, it was actually Hadoop trying to resolve a reference to localhost hardcoded in the code. I was deploying someone else's jar and assumed it was correct. Upon further inspection I found the reference to localhost and changed it to master, solving my issue.

Adding Data Node to hadoop cluster

When I start hadoopnode1 using start-all.sh, it successfully starts the services on the master and the slave (see the jps command output for the slave below). But when I look at the live nodes in the admin screen, the slave node doesn't show up. Even the hadoop fs -ls / command runs perfectly from the master, but from the slave it shows this error message:
#hadoopnode2:~/hadoop-0.20.2/conf$ hadoop fs -ls /
12/05/28 01:14:20 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 0 time(s).
12/05/28 01:14:21 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 1 time(s).
12/05/28 01:14:22 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 2 time(s).
12/05/28 01:14:23 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 3 time(s).
.
.
.
12/05/28 01:14:29 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 10 time(s).
It looks like the slave (hadoopnode2) is not able to find/connect to the master node (hadoopnode1).
Please point out what I am missing.
Here are the settings from the master and slave nodes -
P.S. - The master and slave are running the same version of Linux and Hadoop, and SSH is working perfectly,
because I can start the slave from the master node.
Also, the settings for core-site.xml, hdfs-site.xml and mapred-site.xml are the same on the master (hadoopnode1) and the slave (hadoopnode2).
OS - Ubuntu 10
Hadoop Version -
hadoop#hadoopnode1:~/hadoop-0.20.2/conf$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
-- Master (hadoopnode1)
hadoop#hadoopnode1:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode1 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop#hadoopnode1:~/hadoop-0.20.2/conf$ jps
9923 Jps
7555 NameNode
8133 TaskTracker
7897 SecondaryNameNode
7728 DataNode
7971 JobTracker
masters -> hadoopnode1
slaves -> hadoopnode1
hadoopnode2
--Slave (hadoopnode2)
hadoop#hadoopnode2:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode2 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop#hadoopnode2:~/hadoop-0.20.2/conf$ jps
1959 DataNode
2631 Jps
2108 TaskTracker
masters - hadoopnode1
core-site.xml
hadoop#hadoopnode2:~/hadoop-0.20.2/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/tmp/hadoop/hadoop-${user.name}</value>
<description>A base for other temp directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnode1:8020</value>
<description>The name of the default file system</description>
</property>
</configuration>
hadoop#hadoopnode2:~/hadoop-0.20.2/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoopnode1:8021</value>
<description>The host and port that the MapReduce job tracker runs at.If "local", then jobs are run in process as a single map</description>
</property>
</configuration>
hadoop#hadoopnode2:~/hadoop-0.20.2/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication</description>
</property>
</configuration>
Check your services with sudo jps.
If the expected daemons are not displayed, this is what you need to do:
1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (i.e. by vim VERSION)
4. Record the namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (i.e. by vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.
This should work. Best of luck.
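A command-line sketch of steps 2-7, assuming the /app/hadoop/tmp paths above (substitute your own hadoop.tmp.dir):
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
# copy the printed value into the datanode's VERSION file:
vim /app/hadoop/tmp/dfs/data/current/VERSION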
In the web GUI you can see the number of nodes your cluster has. If you see fewer than you expected, then make sure that the /etc/hosts file on the master contains only these hosts (for a 2-node cluster):
192.168.0.1 master
192.168.0.2 slave
If you see any 127.0.1.1... entries, comment them out, because Hadoop will see them first as the host(s).
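For example, the master's /etc/hosts might end up looking like this sketch (IPs taken from the two lines above; the 127.0.1.1 line is the one to comment out):
127.0.0.1   localhost
#127.0.1.1  master
192.168.0.1 master
192.168.0.2 slave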
Check the namenode and datanode logs (they should be in $HADOOP_HOME/logs/). The most likely issue is that the namenode and datanode namespaceIDs don't match. Delete the hadoop.tmp.dir contents from all nodes, format the namenode again ($HADOOP_HOME/bin/hadoop namenode -format), and then try again.
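A sketch of that clean-up, assuming the hadoop.tmp.dir value from the core-site.xml in the question (this wipes all HDFS data):
# on every node
rm -rf /var/tmp/hadoop/hadoop-*
# on the master only
$HADOOP_HOME/bin/hadoop namenode -format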
I think the issue is in slave 2. Slave 2 should listen on the same port, 8020, instead of listening on 8021.
Add the new node's hostname to the slaves file and start the datanode and tasktracker on the new node.
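A minimal sketch of that, using the Hadoop 0.20.2 scripts from the question (run on the new node after adding its hostname to the master's conf/slaves file):
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker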
Indeed, there are two errors in your case.
Can't connect to the Hadoop master node from the slave:
That's a network problem. Test it: curl 192.168.1.120:8020
Normal response: curl: (52) Empty reply from server
In my case, I got a host not found error, so just take a look at your firewall settings.
Datanode down:
That's a Hadoop problem. Raze2dust's method is good. Here is another way if you see an Incompatible namespaceIDs error in your log:
stop Hadoop, edit the value of namespaceID in /current/VERSION to match the value of the current namenode, then start Hadoop again.
You can always check available datanodes using: hadoop fsck /
