NameNode HA using supervisord - hadoop

I am getting an error while implementing Hadoop NameNode HA using supervisord.
When I run the commands without supervisord, everything works fine,
but supervisord seems to be having problems, especially with zkfc.
The supervisord configuration currently used on the NameNode is as follows:
[program:NameNode]
command=/home/deploy/hadoop/bin/hdfs namenode
user=deploy
[program:ResourceManager]
command=/home/deploy/hadoop/bin/yarn resourcemanager
user=deploy
[program:DFSZKFailoverController]
command=/home/deploy/hadoop/bin/hdfs zkfc
user=deploy
What I'm wondering is,
What is the difference between 'hdfs --daemon start {something}' and 'hdfs {something}'?
When I checked the hadoop/bin/hdfs script, I could not see what the difference is.
When the two commands are run under supervisord: the first command is not recognized by supervisord, so when the daemon dies supervisord does not restart it, but NameNode HA works while the daemon is alive.
The second command is recognized by supervisord, and when the daemon dies supervisord automatically restarts it, but NameNode HA does not work.
That is, when the active NameNode dies, the other NameNode does not automatically become active. However, if I restart zkfc on all NameNodes with 'supervisorctl restart', an active NameNode is elected, but only that one time.
If the NameNode uses 'hdfs namenode' and zkfc uses 'hdfs --daemon start zkfc', NameNode HA works, but it fails over automatically when zkfc dies.
In short, I'm curious about the difference between the two commands and why zkfc does not behave normally with the second command.
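For reference, my understanding of the difference (a sketch of the Hadoop 3.x launcher behaviour, not verified against every version): with --daemon start, bin/hdfs nohups the JVM, redirects its output to the log directory, writes a pid file and exits, so supervisord only ever sees the short-lived launcher; without --daemon, the same JVM stays attached to the foreground, which is what supervisord needs in order to track and restart it. One way to observe this from a shell (the pid-file path assumes the default HADOOP_PID_DIR of /tmp and the deploy user):
# backgrounded form: the launcher returns immediately and leaves a pid file behind
/home/deploy/hadoop/bin/hdfs --daemon start zkfc
cat /tmp/hadoop-deploy-zkfc.pid

# foreground form: the JVM stays attached to this shell until it exits or is killed
/home/deploy/hadoop/bin/hdfs zkfc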

Related

Datanode is not starting in hadoop-hbase start?

I am running the following script to start all the HBase and Hadoop processes in my HBase setup on a virtual machine.
#!/bin/sh
start-dfs.sh
start-yarn.sh
start-hbase.sh
#hbase-daemon.sh start rest
hbase-daemon.sh start thrift
Earlier, all the processes used to run properly. But recently I force-shut-down my virtual machine without stopping the HBase- and Hadoop-related processes, and after that my DataNode process stopped. Later I formatted my NameNode, following a suggestion I found online. Now the NameNode comes up properly, but the DataNode process does not. When I check the running Java processes with jps, the DataNode process is missing:
4672 NodeManager
5474 ThriftServer
4098 NameNode
4408 SecondaryNameNode
5723 Jps
4555 ResourceManager
5372 HRegionServer
5246 HMaster
5182 HQuorumPeer
Earlier, the DataNode process used to come up properly. Is this because I formatted my NameNode? Do I need to change any config data or something else?
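For what it's worth, a common cause of this after formatting the NameNode is a clusterID mismatch: formatting generates a new clusterID, and DataNodes still carrying the old one refuse to register. A rough way to check (the metadata paths are placeholders; use whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to in your hdfs-site.xml):
# compare the clusterID recorded on the NameNode and on the DataNode
grep clusterID /path/to/namenode/dir/current/VERSION
grep clusterID /path/to/datanode/dir/current/VERSION
# the DataNode log normally states the mismatch explicitly
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log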

Do we need to put namenode in safe mode before restarting the job tracker?

I have a Hadoop cluster running Cloudera's CDH3, the equivalent of Apache Hadoop 0.20.2. I want to restart the JobTracker, as there are some jobs which are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs are still in "Job Cleanup: Pending" status. Anyway, I want to restart the JobTracker and see if that cleans up the jobs. I know the command to restart the JobTracker, but I am not sure whether I need to put the NameNode in safe mode before I restart the JobTracker.
You can try to kill the unwanted jobs using hadoop job -kill <Job-ID> and check the command's exit status with echo "$?". If that doesn't work, a restart is the only option.
The Hadoop JobTracker and NameNode are independent components, so there is no need to put the NameNode into safe mode before a JobTracker restart. You can restart the JobTracker process alone (and the TaskTrackers if required).
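A minimal sequence along the lines of both answers (the job ID is a placeholder, and hadoop-daemon.sh is assumed to be run from Hadoop's bin directory on the JobTracker host):
# try to kill the stuck job and check the exit status
hadoop job -kill job_201301010000_0001
echo "$?"
# if the job stays stuck in cleanup, restart only the JobTracker
./hadoop-daemon.sh stop jobtracker
./hadoop-daemon.sh start jobtracker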

Ambari shows NameNode is stopped but the NameNode is actually still working

We are using HDP 2.7.1.2.3 with Ambari 2.1.2
After finishing setup, every node's status was correct.
But one day Ambari suddenly showed the NameNode as stopped (we didn't change any Ambari or NameNode configuration).
However, we can still use HBase and run MapReduce,
so we think the NameNode's status should be normal.
We tried restarting the NameNode and checked the ambari-server log.
It shows:
ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=NAMENODE, oldState=STARTING, currentState=STARTED
HeartBeatHandler:657 - State of service component NAMENODE of service HDFS of cluster wae has changed from STARTED to INSTALLED
We don't understand why its status changed from "STARTED" to "INSTALLED".
On the NameNode side, we checked ambari-agent.log.
It shows one warning:
[Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
We think it is irrelevant.
Why does Ambari think the NameNode is stopped?
Is there any way we can fix this issue?
Run the command ambari-server restart from a Linux terminal on the Ambari server node.
Run the command ambari-agent restart from a Linux terminal on every node in the cluster.
You can run the command hdfs dfsadmin -report from the terminal as the hdfs user to confirm that all the nodes are up and running.
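Put together as a rough sequence (running the report via sudo -u hdfs is an assumption about how the hdfs user is reached on your nodes):
# on the Ambari server node
ambari-server restart
# on every node in the cluster
ambari-agent restart
# as the hdfs user, confirm that all DataNodes are reporting in
sudo -u hdfs hdfs dfsadmin -report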

Hadoop - Restart datanode and tasktracker

I want to bring down a single DataNode and TaskTracker so that some new changes I've made in my mapred-site.xml, such as mapred.reduce.child.java.opts, take effect. How do I do that? However, I don't want to bring down the whole cluster, since I have active jobs running.
Also, how can that be done while ensuring that the NameNode does not copy the data blocks of a "temporarily down" DataNode onto another node?
To stop
You can stop the DataNodes and TaskTrackers from the NameNode's hadoop bin directory.
./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh stop datanode
This script checks the slaves file in Hadoop's conf directory to stop the DataNodes, and the same applies to the TaskTrackers.
To start
Again, the script checks the slaves file in Hadoop's conf directory to start the DataNodes and TaskTrackers.
./hadoop-daemon.sh start tasktracker
./hadoop-daemon.sh start datanode
In Hadoop 2.7.2, the TaskTracker is long gone; to manually restart the services out on the slaves:
yarn-daemon.sh stop nodemanager
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
SSH into the DataNode/TaskTracker machine and cd into Hadoop's bin directory.
Invoke
./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh stop datanode
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start tasktracker
I'm not sure whether restarting the TaskTracker is required for the changes in mapred-site.xml to take effect. Please leave a comment so that I can correct my answer if needed.
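Once the DataNode is back up, a quick sanity check (Hadoop 1.x syntax, matching the era of this question; run from Hadoop's bin directory) might be:
# confirm the restarted DataNode has re-registered with the NameNode
./hadoop dfsadmin -report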

What is the best way to start and stop the Hadoop ecosystem from the command line?

I see there are several ways we can start the Hadoop ecosystem:
start-all.sh & stop-all.sh
which say they are deprecated; use start-dfs.sh & start-yarn.sh instead.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager
EDIT: I think there have to be some specific use cases for each command.
start-all.sh & stop-all.sh : Used to start and stop hadoop daemons all at once. Issuing it on the master machine will start/stop the daemons on all the nodes of a cluster. Deprecated as you have already noticed.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh : Same as above but start/stop HDFS and YARN daemons separately on all the nodes from the master machine. It is advisable to use these commands now over start-all.sh & stop-all.sh
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager : To start individual daemons on an individual machine manually. You need to go to a particular node and issue these commands.
Use case : Suppose you have added a new DN to your cluster and you need to start the DN daemon only on this machine,
bin/hadoop-daemon.sh start datanode
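If the new node should also run a NodeManager (on a YARN-era cluster), the per-node counterpart would presumably be yarn-daemon.sh, which lives under sbin/ in Hadoop 2.x:
sbin/yarn-daemon.sh start nodemanager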
Note : You should have ssh enabled if you want to start all the daemons on all the nodes from one machine.
Hope this answers your query.
From the Hadoop page:
start-all.sh
This will start up a NameNode, DataNode, JobTracker and a TaskTracker on your machine.
start-dfs.sh
This will bring up HDFS with the NameNode running on the machine you ran the command on. On such a machine you would need start-mapred.sh to start the JobTracker separately.
start-all.sh / stop-all.sh have to be run on the master node.
You would use start-all.sh on a single-node cluster (i.e. where you have all the services on the same node; the NameNode is also the DataNode and is the master node).
In a multi-node setup:
You would use start-all.sh on the master node, and it would start what is necessary on the slaves as well.
Alternatively,
Use start-dfs.sh on the node you want the Namenode to run on. This will bring up HDFS with the Namenode running on the machine you ran the command on and Datanodes on the machines listed in the slaves file.
Use start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring up the Map/Reduce cluster with Jobtracker running on the machine you ran the command on and Tasktrackers running on machines listed in the slaves file.
hadoop-daemon.sh, as stated by Tariq, is used on each individual node. The master node will not start the services on the slaves. In a single-node setup this acts the same as start-all.sh. In a multi-node setup you will have to access each node (master as well as slaves) and execute it on each of them.
Have a look at start-all.sh: it calls the config script, followed by the dfs and mapred start scripts.
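For illustration, the old (Hadoop 1.x) start-all.sh is roughly structured like this (simplified, not copied verbatim):
# start-all.sh (simplified): load the shared config, then delegate to the two start scripts
bin=`dirname "$0"`
. "$bin"/hadoop-config.sh
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR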
Starting
start-dfs.sh (starts the namenode and the datanode)
start-mapred.sh (starts the jobtracker and the tasktracker)
Stopping
stop-dfs.sh
stop-mapred.sh