When I start hadoopnode1 using start-all.sh, it successfully starts the services on the master and the slave (see the jps output for the slave). But when I look at the live nodes in the admin screen, the slave node doesn't show up. Running hadoop fs -ls / from the master works perfectly, but from the slave it shows an error message:
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ hadoop fs -ls /
12/05/28 01:14:20 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 0 time(s).
12/05/28 01:14:21 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 1 time(s).
12/05/28 01:14:22 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 2 time(s).
12/05/28 01:14:23 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 3 time(s).
.
.
.
12/05/28 01:14:29 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 10 time(s).
It looks like the slave (hadoopnode2) is not able to find or connect to the master node (hadoopnode1).
Please point out what I am missing.
Here are the settings from the master and slave nodes.
P.S. The master and slave run the same version of Linux and Hadoop, and SSH is working perfectly, because I can start the slave from the master node.
The settings in core-site.xml, hdfs-site.xml and mapred-site.xml are also the same on the master (hadoopnode1) and the slave (hadoopnode2).
OS - Ubuntu 10
Hadoop Version -
hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
-- Master (hadoopnode1)
hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode1 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ jps
9923 Jps
7555 NameNode
8133 TaskTracker
7897 SecondaryNameNode
7728 DataNode
7971 JobTracker
masters -> hadoopnode1
slaves -> hadoopnode1
hadoopnode2
--Slave (hadoopnode2)
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode2 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ jps
1959 DataNode
2631 Jps
2108 TaskTracker
masters - hadoopnode1
core-site.xml
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/tmp/hadoop/hadoop-${user.name}</value>
<description>A base for other temp directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnode1:8020</value>
<description>The name of the default file system</description>
</property>
</configuration>
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoopnode1:8021</value>
<description>The host and port that the MapReduce job tracker runs at.If "local", then jobs are run in process as a single map</description>
</property>
</configuration>
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication</description>
</property>
</configuration>
Check your services with sudo jps; the master should not be displayed. What you need to do:
1. Restart Hadoop.
2. Go to /app/hadoop/tmp/dfs/name/current.
3. Open VERSION (e.g. with vim VERSION).
4. Record the namespaceID.
5. Go to /app/hadoop/tmp/dfs/data/current.
6. Open VERSION (e.g. with vim VERSION).
7. Replace the namespaceID with the namespaceID you recorded in step 4.
This should work. Best of luck.
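If you would rather compare the two IDs than eyeball them, something like the following might help. It is only a rough sketch: it assumes the VERSION files are plain key=value text, and the function names are my own, not part of Hadoop.

```python
def parse_version(text):
    """Parse the key=value lines of a Hadoop VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the timestamp comment at the top of the file.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def namespace_ids_match(name_version_text, data_version_text):
    """Return True only if both files carry the same namespaceID."""
    name_id = parse_version(name_version_text).get("namespaceID")
    data_id = parse_version(data_version_text).get("namespaceID")
    return name_id is not None and name_id == data_id


# Made-up sample contents, for illustration only.
name_sample = "#comment line\nnamespaceID=1001\nstorageType=NAME_NODE\n"
data_sample = "namespaceID=2002\nstorageType=DATA_NODE\n"
print(namespace_ids_match(name_sample, data_sample))
```

To use it on a real node, read the two VERSION files from the directories in the steps above and pass their contents to namespace_ids_match.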
In the web GUI you can see the number of nodes your cluster has. If you see fewer than you expected, make sure that the /etc/hosts file on the master lists only the cluster hosts (for a 2-node cluster):
192.168.0.1 master
192.168.0.2 slave
If you see any 127.0.1.1 ... entry, comment it out, because Hadoop will see it first as the host.
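A quick way to spot such entries is to scan the hosts file for cluster names that are mapped to a loopback address. This is a minimal sketch; the function name and the sample hosts content are made up for illustration.

```python
import ipaddress


def loopback_aliases(hosts_text, cluster_names):
    """Return (ip, name) pairs where a cluster hostname maps to a loopback address."""
    hits = []
    for raw in hosts_text.splitlines():
        # Drop comments, so already commented-out entries are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        if ip.is_loopback:
            hits.extend((fields[0], name) for name in fields[1:] if name in cluster_names)
    return hits


# Illustrative hosts content, not taken from a real machine.
sample = """127.0.0.1 localhost
127.0.1.1 master
192.168.0.1 master
192.168.0.2 slave
"""
print(loopback_aliases(sample, {"master", "slave"}))
```

Any pair it reports is a line you probably want to comment out on that node.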
Check the namenode and datanode logs (they should be in $HADOOP_HOME/logs/). The most likely issue is that the namenode and datanode IDs don't match. Delete the hadoop.tmp.dir directory on all nodes, format the namenode again ($HADOOP_HOME/bin/hadoop namenode -format), then try again.
I think the problem is on slave 2: it should use the same port, 8020, instead of 8021.
Add the new node's hostname to the slaves file and start the DataNode and TaskTracker on the new node.
Indeed, there are two errors in your case.
1. Can't connect to the Hadoop master node from the slave
That's a network problem. Test it: curl 192.168.1.120:8020
Normal response: curl: (52) Empty reply from server
In my case, I got a host not found error, so take a look at the firewall settings.
2. Data node down
That's a Hadoop problem. Raze2dust's method is good. Here's another way, if you see an Incompatible namespaceIDs error in your log:
stop Hadoop, edit the value of namespaceID in /current/VERSION to match the value of the current namenode, then start Hadoop.
You can always check the available datanodes using: hadoop fsck /
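The same reachability test as the curl command can be done with a plain TCP connection attempt. Here is a small sketch; can_connect is my own helper name, and the host and port in the usage note are simply the ones from the question.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from the slave, can_connect("hadoopnode1", 8020) returning False would point to a firewall, a wrong /etc/hosts entry, or a namenode that is listening only on a loopback address.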
I'm trying to run a hadoop cluster via Docker. I have one virtual machine as the namenode and another for the datanode, but the datanode gives me this error running start-dfs.sh:
namenode: namenode running as process 130. Stop it first.
The command jps on the datanode does not show the namenode running. Then I try to start it by hand, using:
hadoop namenode
And it fails with this error:
java.net.BindException: Problem binding to [namenode:9000] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
So far it seems that the namenode is not accessible or is not listening on port 9000. But the network setup is correct: if I execute on the datanode:
telnet namenode 9000
It correctly connects to the namenode, and the command netstat -apn | grep 9000 from namenode shows the incoming connection. If I shut down dfs on namenode (stop-dfs.sh), the telnet command from datanode fails with "Connection closed by foreign host."
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- I have tried with 1 and 2 too -->
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
Thanks!
I have installed my three-node Hadoop cluster (master, slave1 and slave2).
I would like to install HBase in fully distributed mode. I am thinking of installing the HBase Master and ZooKeeper on my Hadoop cluster's MASTER machine (i.e. the NameNode), and the Region Servers on the SLAVE1 and SLAVE2 machines (i.e. the DataNodes). Is this the correct approach?
Sorry, this may be a simple question, but I am new to NoSQL systems and want to do this installation.
I would really appreciate it if someone could share a reference document for this installation.
Thanks in advance.
In order to configure hbase and zookeeper on three nodes, i.e., 1 master and 2 slave nodes, you will need to edit hbase-site.xml, regionservers, hbase-env.sh (found in $HBASE_HOME/conf) and zoo.cfg (found in $ZOOKEEPER_HOME/conf).
Let us name your master node master and your slave nodes slave1 and slave2, and assume your hadoop, hbase and zookeeper folders are in the /usr/local/cluster/ folder. Change the following files:
1. hbase-site.xml:
<configuration>
<property>
<name>hbase.master</name>
<value>master:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1,slave2</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/cluster/zk-tmp</value>
</property>
</configuration>
2. hbase-env.sh:
--add these lines--
export JAVA_HOME=/usr/lib/jvm/default-java
export HBASE_HOME=/usr/local/cluster/hbase
export HADOOP_HOME=/usr/local/cluster/hadoop
--modify these lines--
export HBASE_PID_DIR=/usr/local/cluster/zk-tmp
export HBASE_MANAGES_ZK=false
3. regionservers:
(delete the localhost and add these lines if you just want your regionservers in slave1 and slave2 only)
slave1
slave2
4. zoo.cfg:
--modify these lines--
dataDir=/usr/local/cluster/zk-tmp
--add these lines(since you start zookeeper server on master node)--
server.0=master:2888:3888
5. /etc/hosts:
Edit the /etc/hosts file and comment out the line with 127.0.1.1 (to avoid loopback address problems)
--add these lines--
your-master-node-ip master
your-slave1-node-ip slave1
your-slave2-node-ip slave2
Note: Do steps 1 to 5 in master, slave1 and slave2 nodes.
6. Start zookeeper server in master node:
$ZOOKEEPER_HOME/bin/zkServer.sh start
7. Start hbase processes in master node:
$HBASE_HOME/bin/start-hbase.sh
8. Check your hbase and zookeeper processes: Results for jps command in each node should contain-
--master--
QuorumPeerMain
HMaster
HRegionServer
--slave1--
HRegionServer
--slave2--
HRegionServer
9. Stopping zookeeper and hbase:
$ZOOKEEPER_HOME/bin/zkServer.sh stop
$HBASE_HOME/bin/stop-hbase.sh
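Since a misplaced tag in hbase-site.xml is easy to miss, it may be worth sanity-checking the file before starting anything. The sketch below is one possible check, not an official tool; read_site_properties is a name I made up, and it only assumes the usual configuration/property/name/value layout.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} from a *-site.xml document, rejecting malformed properties."""
    root = ET.fromstring(xml_text)
    props = {}
    # Only direct children of <configuration> are valid properties.
    for prop in root.findall("property"):
        if prop.find("property") is not None:
            raise ValueError("nested <property> element")
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("property without <name> or <value>")
        props[name.strip()] = value.strip()
    return props


sample = """<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>"""
print(read_site_properties(sample))
```

Reading $HBASE_HOME/conf/hbase-site.xml on each node and passing its contents through this function also makes it easy to confirm that all three nodes carry the same values.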
I am trying to set up an Apache Hadoop 2.3.0 cluster. I have a master and three slave nodes; the slave nodes are listed in the $HADOOP_HOME/etc/hadoop/slaves file, and I can telnet from the slaves to the master NameNode on port 9000. However, when I start the datanode on any of the slaves I get the following exception:
2014-08-03 08:04:27,952 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed
for block pool Block pool BP-1086620743-xx.xy.23.162-1407064313305
(Datanode Uuid null) service to
server1.mydomain.com/xx.xy.23.162:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode because hostname cannot be
resolved .
The following are the contents of my core-site.xml.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://server1.mydomain.com:9000</value>
</property>
</configuration>
Also in my hdfs-site.xml I am not setting any value for dfs.hosts or dfs.hosts.exclude properties.
Thanks.
Each node needs a unique, fully qualified hostname.
Your error says
hostname cannot be resolved
Can you cat the /etc/hosts file on each slave and make sure each one has a distinct hostname?
After that, try again.
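To see which names a given node can actually resolve, a forward lookup over the cluster hostnames is a reasonable first check. This is just a sketch with helper names of my own; it does not cover reverse lookups, which the namenode may also rely on.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IP address for hostname, or None if it cannot be resolved."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def unresolved(hostnames, lookup=socket.gethostbyname):
    """Return the hostnames from the list that do not resolve on this machine."""
    return [name for name in hostnames if resolve(name, lookup) is None]
```

Calling unresolved() with the namenode's hostname and every entry of the slaves file, on the master and on each slave, should return an empty list everywhere.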
logs
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user@namenode:10001</value>
</property>
</configuration>
I put in /etc/hosts
192.168.12.196 namenode
in masters
user@namenode
in slaves
localhost
and my namenode is on user@192.168.12.196
If I run jps on all nodes, it shows the datanode, namenode and job/tasktracker working fine.
You need to change localhost to namenode in the slaves and masters files and restart Hadoop once; it will then work fine.
For a better view:
Thanks for your comment.
If I put the hostname in the slaves file of the namenode, it runs the datanode and namenode on the same node.
The configuration of my masters and slaves files is as follows.
on namenode's masters
'user@namenode'
on namenode's slaves
hdname1@data1 (data1 is the IP of the node and hdname1 is the user)
hdname2@data2
on datanode's masters
user@namenode
on datanode's slaves
hdname1@data1
Because of the many errors, I can't figure out why my datanode slave VM is not connecting to my master VM. Any suggestion is welcome, so I can try it.
To start, one of them is this error in my slave VM log:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000
Because of this, I can't run the job that I want on my master VM:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
which gives me this error
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ubuntu/QuasiMonteCarlo_1386793331690_1605707775/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
and even so, hdfs dfsadmin -report (on the master VM) gives me all zeros
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 0 (0 total, 0 dead)
For that, I built 3 Ubuntu VMs on OpenStack, one for the master and the others as slaves.
On the master, /etc/hosts is set up as:
127.0.0.1 localhost
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
50.50.1.8 slave1
50.50.1.4 slave2
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
<name>dfs.permissions</name>
<value>false</value>
mapred-site.xml
<name>mapreduce.framework.name</name>
<value>yarn</value>
And my slaves file contains one line each for slave1 and slave2.
None of the logs on the master VM contain errors, but on the slave VM I get that connection error, and the NodeManager log also shows an error:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
From my Slave Machine:
core-site.xml
<name>fs.default.name</name>
<value>hdfs://ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8:9000</value>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hadoop-2.2.0/tmp</value>
hdfs-site.xml
<name>dfs.namenode.name.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/ubuntu/hadoop-2.2.0/etc/hdfs/datanode</value>
and on my /etc/hosts
127.0.0.1 localhost
50.50.1.8 ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76
50.50.1.9 ubuntu-378e53c1-3e1f-4f6e-904d-00ef078fe3f8
The JPS
master
15863 ResourceManager
15205 SecondaryNameNode
14967 NameNode
16194 Jps
slave
1988 Jps
1365 DataNode
1894 NodeManager
Of all the errors shown, the one below is the main reason the slave was not able to connect to the master:
Error starting NodeManager org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From ubuntu-e6df65dc-bf95-45ca-bad5-f8ddcc272b76/50.50.1.8 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused;
Basically, 0.0.0.0:8031 is the port of yarn.resourcemanager.resource-tracker.address, so I checked using lsof -i :8031 and the port wasn't enabled/open/allowed. Since I'm using OpenStack (cloud), I added 8031 and the other ports that were showing errors to the allowed ports, and voilà, it worked as intended.
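For anyone hitting the same message, the address the NodeManager tried to reach can be pulled out of the log line mechanically. The sketch below is only an illustration; the regex and function names are mine, not part of Hadoop. As far as I know, 0.0.0.0 is the default value of yarn.resourcemanager.hostname, so a call to 0.0.0.0:8031 may also mean the ResourceManager address is not set in that node's yarn-site.xml, which is worth ruling out alongside a blocked port.

```python
import re

# Matches the "Call From <source> to <host>:<port> failed" part of the exception text.
CALL_RE = re.compile(r"Call From (?P<src>\S+) to (?P<host>[^\s:]+):(?P<port>\d+) failed")


def connection_target(log_line):
    """Return (host, port) the caller tried to reach, or None if the line does not match."""
    match = CALL_RE.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def targets_wildcard(log_line):
    """Return True if the failed call was addressed to the 0.0.0.0 wildcard address."""
    target = connection_target(log_line)
    return target is not None and target[0] == "0.0.0.0"
```

Feeding it the NodeManager error above gives the host 0.0.0.0 and port 8031, so targets_wildcard returns True for that line.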
I struggled a lot and finally got it working after running "systemctl stop firewalld". Before this I also disabled SELinux and IPv6.
In my case, I used hdfs datanode -format to format the datanode server and hdfs namenode -format to format the namenode server. Before that, make sure to delete all the files in the data folders listed in the hdfs-site file.