I am struggling to figure out how to determine the status of the different daemons in Hadoop. I only know the status of two daemons (namenode, jobtracker), since those are the defaults in Cloudera. How do I check the rest?
Use jps (requires the JDK to be installed) to see all the daemons.
Besides jps, you can scan the output of the command below to check whether the processes are running:
ps -ef | grep hadoop
You can also use the following once the server is running
./hadoop dfsadmin -report
Another place to look is the web interfaces of the namenode and jobtracker for more details.
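As a quick reference, the default web UI addresses on a Hadoop 1.x / CDH-style install look roughly like this (the hostnames are placeholders, and your distribution may use different ports):
http://<namenode-host>:50070 (NameNode web UI)
http://<jobtracker-host>:50030 (JobTracker web UI)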
Related
I have Hadoop 3.1.3 and I can upload a file in Hadoop pseudo-distributed mode; I can also display the contents of the file.
But when I run the jps command I get the following output:
10912 DataNode
13072 ResourceManager
4480 NodeManager
6584 Jps
664 Namenode
I am unable to find the secondary namenode. Is there a problem with my configuration or the Hadoop installation?
You're assuming that the secondary namenode is started in pseudo-distributed mode?
If the basic commands work, then it's fine.
You need to look at the log files to know whether something is broken before asking elsewhere.
In general, I always suggest you use Apache Ambari to provision a Hadoop cluster.
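On the point about checking log files: they usually live under $HADOOP_HOME/logs, though the path and file naming vary by distribution, so treat the pattern below as an assumption:
tail -n 100 $HADOOP_HOME/logs/hadoop-*-secondarynamenode-*.log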
You can start the SecondaryNameNode manually and observe the startup logs to see if there's anything wrong:
hdfs secondarynamenode
If there's no error, run jps again and hopefully you see SecondaryNameNode listed.
I'd suggest running hdfs --help and checking out all of the options; there's a lot of good stuff there.
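If you would rather run it in the background than in the foreground, Hadoop 3.x also has a daemon mode (check that your 3.1.3 install accepts this syntax, as this is an assumption about your setup):
hdfs --daemon start secondarynamenode
hdfs --daemon status secondarynamenode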
I am new to Hadoop and I have installed Hadoop 2.7.2 on two machines, master and slave1. I have followed this tutorial. It was not mentioned in the tutorial, but I have also edited the JAVA_HOME and HADOOP_CONF_DIR variables in hadoop-env.sh. In the end I have Hadoop installed on both machines. On the master, NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager are running, and on slave1, DataNode and NodeManager are running.
I am able to go to master:8088 in the browser, and when I go to http://master:8088/cluster/nodes, only the master node is listed. I am not able to reach isci17:8088, and it does not show up as a live node. Why could that be?
Port 8088 is the ResourceManager web UI port, so if the ResourceManager is running on the master you probably won't have it on the slave.
You should also be able to go to the NameNode web UI on port 50070 to see its status, e.g. http://master:50070/, and to the MapReduce JobHistory Server web UI at http://hostname:19888/.
If you have access to a terminal session, you can run the following command on each server as a root/sudo user to see which ports are listening on that server:
sudo lsof -i tcp | grep -i LISTEN
You can also run Hadoop CLI commands that will give you info.
You can run the following to check Hadoop's ports in a terminal session:
hdfs portmap
Other health checks on the command line:
hdfs classpath
hdfs getconf -namenodes
hdfs dfsadmin -report -live
hdfs dfsadmin -report -dead
hdfs dfsadmin -printTopology
Depending on whether the hadoop CLI command is on your PATH, you might have to find the executable and run ./hdfs. Also, depending on your distro/version, you might have to replace the hdfs command with the hadoop command.
If you want to see your cluster configuration, check your /etc/hadoop/conf folder along with /etc/hadoop/hive. You will find about 5-10 *-site.xml files. These configuration files contain your cluster's configuration, with the hostnames and ports.
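For example (assuming the stock /etc/hadoop/conf layout mentioned above; adjust the path for your distro):
ls /etc/hadoop/conf/*-site.xml
grep -h 'hdfs://' /etc/hadoop/conf/*-site.xml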
How can I get the following information on the Hadoop cluster?
1. namenode and jobtracker name
2. list of all nodes with their roles on the cluster
To get namenode info:
hdfs getconf -confKey fs.defaultFS
For the jobtracker (ResourceManager):
hdfs getconf -confKey yarn.resourcemanager.address.rm2
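If your cluster is not set up with ResourceManager HA (i.e. there is no rm2 id, which is an assumption about your setup), the un-suffixed key is probably the one you want:
hdfs getconf -confKey yarn.resourcemanager.address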
I am using a Cloudera-based cluster and also working on EMR.
In both clusters I can find the information in the configuration directory.
To get the namenode information, go into the core-site.xml file and look for fs.defaultFS, as #daemon12 said.
Here is a straightforward way to get it.
For the namenode information, use the command below:
cat /etc/hadoop/conf/core-site.xml | grep '8020'
Here is the result
<value>hdfs://10.872.22.1:8020</value>
The value inside the value tag is the namenode information.
Similarly, to get the jobtracker information, do the following:
cat /etc/hadoop/conf/yarn-site.xml | grep '8032'
Here is the result
<value>10.872.12.32:8032</value>
Again, the jobtracker value is inside the value tag.
Generally, the NN and JT information is used to run Oozie jobs, and this method will help you for that purpose.
DISCLAIMER: I am grepping the output of cat based on the namenode and jobtracker port numbers, which are 8020 and 8032 respectively. These are the widely known default ports for the NN and JT in Hadoop. If your organization uses different ones, please grep for those to get a more appropriate result.
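If your cluster does use non-default ports, grepping for the property name rather than the port number is more robust, for example (same /etc/hadoop/conf path assumed):
grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml
grep -A1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml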
Along with the command-line way of getting information, you can get similar information in the browser as well:
http://<namenode>:50070 (for general Hadoop information)
http://<namenode>:50030 (for JobTracker-related information)
These are default ports. You can check here for more information.
With the correct authorization granted (e.g. sudo -u hdfs), you may try:
hdfs dfsadmin -report
When I run jps I get the following output:
17223 JobTracker
16897 DataNode
17518 Jps
17451 TaskTracker
17129 SecondaryNameNode
8571 FsShell
The namenode is not showing up.
It seems like you are using the same user to start all the daemons, so if the namenode is not showing up in the jps output, the namenode daemon probably got killed or was not started properly. You can use the following command to check whether the namenode process is running:
ps aux | grep -i namenode
If it is not running, you may need to format your namenode before starting the HDFS service: stop all HDFS daemons using the stop-dfs.sh script, then format your namenode using the command below, and start HDFS using the start-dfs.sh script.
hadoop namenode -format
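Putting those steps together, a minimal sketch (assuming the sbin scripts are on your PATH, and keeping in mind that formatting wipes the existing HDFS metadata):
stop-dfs.sh
hadoop namenode -format
start-dfs.sh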
Go through the SO post below if you are hitting this situation:
Hadoop namenode needs to be formatted after every computer start
If you are looking to check all running JVMs on the host via 'jps',
you need to run the command as root. Otherwise, 'jps' will only show
JVMs running as your currently logged-in user.
Please see this link for more:
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/1dlxmB_GVuU
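For example, to list every user's JVMs (assuming jps is on root's PATH):
sudo jps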
You should always check the logs. :)
I would like to see whether the HDFS file system for Hadoop is working properly. I know that jps lists the daemons that are running, but I don't actually know which daemons to look for.
I ran the following commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager
Only namenode, resourcemanager, and nodemanager appeared when I entered jps.
Which daemons are supposed to be running in order for hdfs/Hadoop to function? Also, what could you do to fix hdfs if it is not running?
Use any of the following approaches to check your daemons' status.
The jps command will list all active daemons.
The one below is the most appropriate:
hadoop dfsadmin -report
This will list the details of the datanodes, which is essentially your HDFS.
You can also cat any file available at an HDFS path.
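For example (the path below is just a placeholder; use any file you know exists in HDFS):
hdfs dfs -cat /tmp/sample.txt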
So, I spent two weeks validating my setup (it was fine) and finally found this command:
sudo -u hdfs jps
Initially my plain jps command was showing only one process, but Hadoop 2.6 under Ubuntu 14.04 LTS was up. I was using sudo to run the startup scripts.
Here is the startup sequence that works, with jps listing multiple processes:
sudo su hduser
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
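While still in the hduser shell, running jps again should now list the daemons started by those scripts (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager):
jps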