YARN: Slave DataNode not doing work despite looking up and healthy - hadoop

I have a two-node Hadoop 2.7.1 installation on Ubuntu 12.04 LTS. All the daemons are up and running after sbin/start-dfs.sh && sbin/start-yarn.sh, as confirmed by jps.
However, only the master node (which doubles as a slave) is doing any work, while the slave node's DataNode sits idle.
The weird thing is that the tmp folder on the slave node is being populated with nm-local-dir (created by the ResourceManager, I presume).
All the configuration seems fine, and I have tried every trick the Internet can suggest, but to no avail.
Any ideas?

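Stop the cluster, delete the local DataNode and NameNode storage folders on the master node and on all the slave nodes, then run hdfs namenode -format and start the cluster again.
I think the problem is an inconsistency in the clusterID between the master and the slaves: a DataNode whose stored clusterID differs from the NameNode's will not register with it.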
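
Before wiping anything, the clusterID theory can be checked by comparing the clusterID line in the VERSION file that each NameNode and DataNode storage directory keeps under current/. The following is only a rough sketch: the function names and node labels are made up for illustration, and it assumes you have already copied the text of each VERSION file off the nodes.

```python
def read_cluster_id(version_text):
    """Return the clusterID value from the text of an HDFS VERSION file, or None."""
    for line in version_text.splitlines():
        line = line.strip()
        # VERSION is a Java-properties style file: key=value, '#' starts a comment.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "clusterID":
            return value.strip()
    return None


def mismatched_nodes(namenode_version, datanode_versions):
    """Return the names of DataNodes whose clusterID differs from the NameNode's.

    datanode_versions maps a node label (hypothetical, e.g. "slave1") to the
    text of that node's VERSION file.
    """
    expected = read_cluster_id(namenode_version)
    return sorted(
        name
        for name, text in datanode_versions.items()
        if read_cluster_id(text) != expected
    )
```

If mismatched_nodes returns an empty list, the clusterIDs agree and the cause is probably elsewhere.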

Related

Adding a node to hadoop cluster without restarting master

I have created a Hadoop cluster and want to add a new node to it as a slave without restarting the master node.
How can this be achieved?
DataNodes and NodeManagers can be added without restarting the NameNode(s) or ResourceManager(s).
More specifically, the following commands need to be run on the machines hosting these services:
Namenode
hdfs dfsadmin -refreshNodes
ResourceManager
yarn rmadmin -refreshNodes
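
If you want to script the two refresh steps, a minimal sketch could look like the following. The names REFRESH_COMMANDS and refresh_nodes are hypothetical, the ResourceManager command is spelled with the yarn launcher as in Hadoop 2.x, and by default the function only returns the command without running it.

```python
import subprocess

# Commands from the answer above; each is meant to run on the host of the named service.
REFRESH_COMMANDS = {
    "NameNode": ["hdfs", "dfsadmin", "-refreshNodes"],
    "ResourceManager": ["yarn", "rmadmin", "-refreshNodes"],
}


def refresh_nodes(service, dry_run=True):
    """Return the refresh command for a service; run it only when dry_run is False."""
    command = REFRESH_COMMANDS[service]
    if not dry_run:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, check=True)
    return command
```

Calling refresh_nodes("NameNode", dry_run=False) on the NameNode host would actually execute the command; keeping dry_run=True is a way to review what would be run first.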

Ambari shows NameNode as stopped, but the NameNode is actually still working

We are using HDP 2.7.1.2.3 with Ambari 2.1.2.
After we finished the setup, the status of every node was correct.
But one day Ambari suddenly showed the NameNode as stopped (we had not changed any Ambari or NameNode configuration).
However, we can still use HBase and run MapReduce jobs,
so we think the NameNode itself is running normally.
We tried restarting the NameNode and checked the ambari-server log.
It shows:
ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=NAMENODE, oldState=STARTING, currentState=STARTED
HeartBeatHandler:657 - State of service component NAMENODE of service HDFS of cluster wae has changed from STARTED to INSTALLED
We don't understand why its status changes from "STARTED" to "INSTALLED".
On the NameNode side, we checked ambari-agent.log.
It shows one warning:
[Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
We think it is irrelevant.
Why does Ambari think the NameNode is stopped?
Is there any way to fix this issue?
Run ambari-server restart from a Linux terminal on the Ambari server node.
Then run ambari-agent restart from a Linux terminal on every node in the cluster.
You can run hdfs dfsadmin -report from the terminal as the hdfs user to confirm that all the nodes are up and running.

Two hadoop nodes on same machine while a second machine not joining the cluster

I have a test cluster of two machines, with Hadoop installed on both. I've configured the cluster, but the admin UI shows two nodes running on the same master machine, and the other machine has no Hadoop node.
On the master machine the following services are running:
~$ jps
26310 ResourceManager
27593 Jps
26216 DataNode
26135 NameNode
26557 NodeManager
26701 JobHistoryServer
On the slave machine:
~$ jps
2614 DataNode
2920 Jps
2707 NodeManager
I don't know why the slave is not joining the cluster (it did before). I tried shutting down all the daemons on both machines, formatting HDFS, and then restarting everything, but that didn't help. Any help figuring out what's causing this behavior is appreciated.
Fixed: the two machines had the same hostname! So I just renamed the slave.
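
A clash like this is easy to miss by eye. As a rough illustration (the function name and the input shape are assumptions, not part of Hadoop), you could collect the output of hostname from each machine and look for names that more than one machine reports:

```python
from collections import defaultdict


def duplicate_hostnames(reported):
    """Map each hostname reported by more than one machine to those machines.

    reported maps a machine identifier (for example its IP address) to the
    hostname that machine reports, e.g. the output of `hostname` run on it.
    """
    machines_by_name = defaultdict(list)
    for machine, hostname in reported.items():
        # Hostnames are case-insensitive, so compare them lowercased.
        machines_by_name[hostname.strip().lower()].append(machine)
    return {
        name: sorted(machines)
        for name, machines in machines_by_name.items()
        if len(machines) > 1
    }
```

An empty result means every machine reports a distinct hostname.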

Hadoop DataNode service is not starting on the slaves

I am trying to configure a hadoop-1.0.3 multi-node cluster with one master and two slaves on my laptop using VMware Workstation.
When I run start-all.sh from the master, all the daemon processes start on the master node (namenode, datanode, tasktracker, jobtracker, secondarynamenode), but the DataNode and TaskTracker do not start on the slave nodes. Passwordless SSH is enabled, and I can ssh to both the master and the slaves from my master node without a password.
Please help me resolve this.
Stop the cluster.
If you have explicitly defined a tmp directory location in core-site.xml, remove all the files under that directory.
If you have explicitly defined DataNode and NameNode directories in hdfs-site.xml, delete all the files under those directories.
If you have not defined anything in core-site.xml or hdfs-site.xml, remove all the files under /tmp/hadoop-<name of your Hadoop user>.
Format the namenode.
It should work!
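
To see which directories the steps above refer to on a given node, you could read them out of the site files. This is a sketch only: the helper name is hypothetical, it takes the XML text of core-site.xml or hdfs-site.xml as a string, and the property names are the Hadoop 1.x ones (Hadoop 2.x renames the last two to dfs.namenode.name.dir and dfs.datanode.data.dir).

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x property names that point at local storage directories.
STORAGE_PROPERTIES = ("hadoop.tmp.dir", "dfs.name.dir", "dfs.data.dir")


def storage_dirs(site_xml, wanted=STORAGE_PROPERTIES):
    """Return {property: [directories]} for storage properties set in a *-site.xml string."""
    found = {}
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in wanted and value:
            # dfs.name.dir and dfs.data.dir accept a comma-separated list of paths.
            found[name] = [d.strip() for d in value.split(",") if d.strip()]
    return found
```

If the result is empty for both files, nothing was defined explicitly and the default location under /tmp applies.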

Cluster not working with CDH4 tarball installation

I am trying to install CDH4 from the tarball version but am running into issues. The steps I took are as follows:
I downloaded the tarball from https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs
I first untarred the hadoop-0.20-mapreduce-0.20.2+1341 tar file.
I made the configuration changes in hadoop-0.20-mapreduce-0.20.2+1341, since I wanted MRv1 and not YARN.
The first step in the CDH4 installation guide was to configure HDFS.
I made the relevant changes in:
core-site.xml
hdfs-site.xml
mapred-site.xml
masters (my namenode)
slaves (my datanodes)
I copied the Hadoop configuration to all the nodes in the cluster.
I formatted the namenode.
After the format I had to start the cluster, but I could not find a start-all.sh script in the bin folder, so I started it with the command
bin/start-mapred.sh
The logs show that the jobtracker started and that tasktrackers started on the slave nodes,
but when I run jps I can see only
jobtracker
jps
I then started a datanode on the datanode machine with the command
bin/hadoop-daemon.sh start datanode
It shows that the datanode started.
The namenode is not starting, and neither is the tasktracker.
When I checked my logs I could see
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: webapps/hdfs not found in CLASSPATH
I am not sure what is stopping my cluster from working.
Earlier I had CDH3 running, so I stopped the CDH3 cluster and then started installing CDH4. I also changed all the directories in hdfs-site.xml, i.e. pointed them to new empty directories for the namenode and datanode rather than the ones used by CDH3,
but still nothing seems to help.
I also turned off the firewall, since I do have root access, but it still did not work.
Any help on the above would be greatly appreciated.
Thank you for your kind reply, but I do not have a start-dfs.sh file in the bin folder.
The only files in the /home/hadoop-2.0.0-mr1-cdh4.2.0/bin folder are:
start-mapred.sh
stop-mapred.sh
hadoop-daemon.sh
hadoop-daemons.sh
hadoop-config.sh
rcc
slaves.sh
hadoop
The commands I am now using are as follows.
For starting the datanode:
for x in /home/hadoop-2.0.0-mr1-cdh4.2.0/bin/hadoop-* ; do $x start datanode ; done ;
For starting the namenode:
bin/start-mapred.sh
I am still working on the same issue.
Hi, sorry for the misunderstanding above. The following commands can be run to start your datanodes and namenode.
To start the namenode:
hadoop-daemon.sh start namenode
To start the datanodes:
hadoop-daemons.sh start datanode
To start the secondarynamenode:
hadoop-daemons.sh --hosts masters start secondarynamenode
The jobtracker daemon will start on your master node, and tasktracker daemons will start on each of your datanodes, after you run the command
bin/start-mapred.sh
In a Hadoop cluster setup, jps on the master node will show only the jobtracker daemon, and on each of your datanodes jps will show the tasktracker daemon running.
Then you have to start HDFS by running the following command on your master node
bin/start-dfs.sh
This command starts the namenode daemon on your namenode machine (in this configuration your master node itself, I believe) and the datanode daemons on each of your slave nodes.
Now you can run jps on each of your datanodes and it will give the output
TaskTracker
DataNode
Jps
I think this link will be useful:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
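
The jps checks described in this answer can also be done mechanically. A minimal sketch, assuming you paste in the text that jps printed on a node and list the daemons you expect there (the function name is hypothetical, and jps prints class names such as DataNode and TaskTracker with that capitalization):

```python
def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class name>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)
```

For a slave node in an MRv1 setup you would pass ["DataNode", "TaskTracker"] as expected; an empty list back means both daemons are running.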

Resources