DataNode automatically getting restarted in the CDH5 cluster

We have set up a cluster with 6 slave nodes. I am trying to see how replication happens when one of the DataNodes dies.
I logged into one of the slaves and killed the DataNode with the kill -9 command. After some time the DataNode was restarted automatically and HDFS returned to a healthy status. I can verify this because the PID of the DataNode has changed.
I don't see any documentation on this DataNode behavior. Is this an Apache Hadoop feature or a Cloudera CDH feature? Any reference to the documentation is appreciated.
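For reference, this is roughly the experiment described above, written out as shell commands; the pgrep pattern and the sleep interval are illustrative assumptions, not taken from the question:

# On the slave node: note the current DataNode PID, kill it hard, then
# check again after a while to see whether a new process has appeared.
OLD_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
echo "DataNode PID before kill: $OLD_PID"
kill -9 "$OLD_PID"
sleep 60   # give whatever supervises the process time to react
NEW_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
echo "DataNode PID after kill:  $NEW_PID"   # a different PID means something restarted it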

Since the PID of the DataNode has changed, this is not the DataNode restarting itself; something else is starting a new process. If you manage your cluster with Cloudera Manager, there is an option to restart the DataNode daemon when it fails (Automatically Restart Process), and it is enabled by default. When the DataNode process fails or is killed, the Cloudera SCM agent starts the DataNode daemon again because this option is set.
To find the Automatically Restart Process option: choose the HDFS service -> go to the Configuration section -> search for "automatic restart".
This feature is also available in the CM 4.x releases.
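A quick way to confirm this on a Cloudera Manager-managed host is to look at the DataNode's parent process, since CM launches roles under the agent's supervisord. This is only a sketch of that check, assuming pgrep is available and there is a single DataNode per host:

# Find the DataNode process and print the name of its parent process.
DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
PARENT_PID=$(ps -o ppid= -p "$DN_PID" | tr -d ' ')
ps -o comm= -p "$PARENT_PID"   # on a CM-managed host this should typically show
                               # supervisord, which restarts the role after a kill -9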

Related

Datanode is not starting in hadoop-hbase start?

I am running the following script to start all the HBase and Hadoop processes in my HBase setup in a virtual machine.
#!/bin/sh
start-dfs.sh                    # HDFS: NameNode, DataNode, SecondaryNameNode
start-yarn.sh                   # YARN: ResourceManager, NodeManager
start-hbase.sh                  # HBase: HMaster, HRegionServer, HQuorumPeer
#hbase-daemon.sh start rest
hbase-daemon.sh start thrift    # HBase Thrift server
Earlier all the processes used to start properly. But recently I force-shut-down my virtual machine without stopping the HBase and Hadoop processes, and after that my DataNode process stopped coming up. Later I formatted my NameNode, following a suggestion I found online. Now the NameNode starts properly but the DataNode process does not come up. When I check the running Java processes with jps, the DataNode is missing:
4672 NodeManager
5474 ThriftServer
4098 NameNode
4408 SecondaryNameNode
5723 Jps
4555 ResourceManager
5372 HRegionServer
5246 HMaster
5182 HQuorumPeer
Earlier the DataNode process used to come up properly. Is this because I formatted my NameNode? Do I need to change any config data or something else as well?
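There is no accepted answer in this thread, but the usual cause after formatting a NameNode is a clusterID mismatch between the NameNode and the DataNode. A sketch of how to check this; the log and data directory paths below are placeholders that depend on your install and hdfs-site.xml:

# Look for the failure reason in the DataNode log.
grep -i "clusterID" /path/to/hadoop/logs/hadoop-*-datanode-*.log

# Compare the clusterID recorded by the NameNode and the DataNode
# (directories come from dfs.namenode.name.dir and dfs.datanode.data.dir).
cat /path/to/namenode/dir/current/VERSION
cat /path/to/datanode/dir/current/VERSION
# If the clusterIDs differ, either copy the NameNode's clusterID into the
# DataNode's VERSION file or, if the data is disposable, wipe the DataNode
# directory and start the DataNode again.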

Get list of executed job on Hadoop cluster after cluster reboot

I have a Hadoop 2.7.4 cluster. For some reason I have to restart the cluster, and I need the job IDs of the jobs that were executed on the cluster before the reboot. The command mapred job -list only provides details of currently running or waiting jobs.
You can see a list of all jobs on the Yarn Resource Manager Web UI.
In your browser, go to http://<ResourceManagerIPAddress>:8088/
On the YARN cluster I am currently testing on, the job history is still listed there even though I restarted the services several times.
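If you prefer the command line, the YARN CLI can also list completed applications still known to the ResourceManager (it only covers what the RM has retained across the restart); a small sketch:

# List applications in terminal states; for MapReduce jobs the application IDs
# correspond to the job IDs (application_... and job_... share the same suffix).
yarn application -list -appStates FINISHED,FAILED,KILLED

The MapReduce JobHistory Server (web UI on port 19888 by default) also keeps finished job IDs independently of the ResourceManager.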

Do we need to put namenode in safe mode before restarting the job tracker?

I have a Hadoop cluster running Cloudera's CDH3, the equivalent of Apache Hadoop 0.20.2. I want to restart the JobTracker because there are some jobs that are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs are still in "Job Cleanup: Pending" status. In any case, I want to restart the JobTracker and see if that cleans up the jobs. I know the command to restart the JobTracker, but I am not sure whether I need to put the NameNode in safe mode before I restart the JobTracker.
You can try to kill the unwanted jobs with hadoop job -kill <Job-ID> and check the command's exit status with echo "$?". If that doesn't work, a restart is the only option.
The Hadoop JobTracker and NameNode are independent components; there is no need to put the NameNode into safe mode before a JobTracker restart. You can restart the JobTracker process alone (and the TaskTrackers if required).
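A sketch of what that looks like on a 0.20-style cluster; the stop-mapred.sh/start-mapred.sh scripts ship with Apache Hadoop 0.20, while a packaged CDH3 install would typically use its service init scripts instead, and the job ID below is hypothetical:

# Try to kill the stuck job and check the exit status first.
hadoop job -kill job_201301010000_0001   # hypothetical job ID
echo "$?"                                # 0 means the kill command itself succeeded

# If the job stays stuck, restart only the MapReduce daemons;
# HDFS and the NameNode are left untouched, so no safe mode is needed.
stop-mapred.sh
start-mapred.sh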

Ambari shows namenode is stopped but the namenode is actually still working

We are using HDP 2.7.1.2.3 with Ambari 2.1.2
After finishing the setup, every node's status was correct.
But one day Ambari suddenly showed the NameNode as stopped (we didn't change any configuration of Ambari or the NameNode).
However, we can still use HBase and run MapReduce, so we think the NameNode status should be normal.
We tried restarting the NameNode and checked the ambari-server log.
It shows:
ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=NAMENODE, oldState=STARTING, currentState=STARTED
HeartBeatHandler:657 - State of service component NAMENODE of service HDFS of cluster wae has changed from STARTED to INSTALLED
We don't understand why its status changed from "STARTED" to "INSTALLED".
On the NameNode side, we checked ambari-agent.log.
It shows one warning:
[Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
We think it is irrelevant.
Why does Ambari think the NameNode is stopped?
Is there any way we can fix this issue?
Run the command ambari-server restart from a Linux terminal on the Ambari server node.
Run the command ambari-agent restart from a Linux terminal on all the nodes in the cluster.
You can run the command hdfs dfsadmin -report from the terminal as the hdfs user to confirm that all the nodes are up and running.
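Put together as shell commands (running the report via sudo -u hdfs is an assumption about how your users are set up):

# On the Ambari server node:
ambari-server restart

# On every node in the cluster:
ambari-agent restart

# Then verify that HDFS sees all DataNodes as live:
sudo -u hdfs hdfs dfsadmin -report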

How to bring down your namenode?

How do I bring down my NameNode in Hadoop 1.2.1 on CentOS and swap the NameNode with a DataNode instance? I also have to make sure no data is lost during the process.
I am using Hadoop 1.2.1 with master, slave 1 and slave 2 nodes.
I am looking for the Unix commands or the changes I need to make in the configuration files.
Please ask for any particular details if needed!
You can take a backup of the NameNode metadata and stop the NameNode. Install the NameNode packages on the other node of interest and put the backup copy of the metadata into the NameNode data directory. Now start the NameNode; it should pick up your old metadata. Remember to change the NameNode details in all config files.
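A rough outline of those steps as Hadoop 1.x commands; the dfs.name.dir path and the host name are placeholders, and fs.default.name in core-site.xml must be updated on every node to point at the new host:

# On the old master: freeze the namespace, flush it to disk, then stop the NameNode.
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop-daemon.sh stop namenode

# Copy the NameNode metadata (dfs.name.dir from hdfs-site.xml) to the new node.
scp -r /path/to/dfs/name newmaster:/path/to/dfs/name

# On the new node: after updating fs.default.name in core-site.xml on ALL nodes,
# start the NameNode there; it should load the copied metadata.
hadoop-daemon.sh start namenode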
