How can I get the following information about a Hadoop cluster?
1. namenode and jobtracker name
2. list of all nodes with their roles on the cluster
To get namenode info:
hdfs getconf -confKey fs.defaultFS
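For the JobTracker (the ResourceManager on YARN; rm2 is a ResourceManager ID from an HA setup, so on a non-HA cluster query yarn.resourcemanager.address instead):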
hdfs getconf -confKey yarn.resourcemanager.address.rm2
I am using a Cloudera-based cluster and am also working on EMR.
On both clusters I can find this information in the configuration directory.
To get the NameNode information, open core-site.xml and look for fs.defaultFS, as @daemon12 said.
Here is a straightforward way to get it.
For the NameNode information, use the command below:
cat /etc/hadoop/conf/core-site.xml | grep '8020'
Here is the result:
<value>hdfs://10.872.22.1:8020</value>
The value inside the <value> tag is the NameNode information.
Similarly, to get the JobTracker information, run:
cat /etc/hadoop/conf/yarn-site.xml | grep '8032'
Here is the result:
<value>10.872.12.32:8032</value>
Again, the JobTracker value is inside the <value> tag.
The NN and JT information is generally needed to run Oozie jobs, and this method will help with that.
DISCLAIMER: I am grepping the output of cat for the NameNode and JobTracker port numbers, 8020 and 8032 respectively. These are the widely used default ports for the NN and JT (on YARN, 8032 is the ResourceManager port). If your organization uses different ports, grep for those instead to get a more accurate result.
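Grepping for a port number breaks when a cluster uses non-default ports or when the same port appears in several properties. A minimal sketch that looks a property up by name instead, assuming the usual Hadoop *-site.xml layout of <property> blocks containing <name> and <value> elements. get_site_property is a hypothetical helper; it does not expand ${...} variables, skip XML comments, or fall back to *-default.xml values.

```bash
# get_site_property FILE NAME
# Hypothetical helper: prints the <value> of the <property> whose <name> is
# exactly NAME in a Hadoop *-site.xml file; exit status 1 if not found.
# Sketch only: ignores XML comments and ${var} substitution.
get_site_property() {
  local file=$1 name=$2
  awk -v want="$name" '
    # Join all lines so <name>/<value> may sit on the same or separate lines.
    { buf = buf $0 " " }
    END {
      while ((s = index(buf, "<property>")) > 0) {
        buf = substr(buf, s + length("<property>"))
        e = index(buf, "</property>")
        if (e == 0) break
        block = substr(buf, 1, e - 1)
        buf = substr(buf, e + length("</property>"))
        if (tag(block, "name") == want) { print tag(block, "value"); exit 0 }
      }
      exit 1   # property not found
    }
    # tag(b, t): trimmed text between <t> and </t> in b, or "" if absent.
    function tag(b, t,    otag, ctag, i, v) {
      otag = "<" t ">"; ctag = "</" t ">"
      i = index(b, otag)
      if (i == 0) return ""
      v = substr(b, i + length(otag))
      i = index(v, ctag)
      if (i == 0) return ""
      v = substr(v, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      return v
    }' "$file"
}
```

For example, get_site_property /etc/hadoop/conf/core-site.xml fs.defaultFS for the NameNode, or get_site_property /etc/hadoop/conf/yarn-site.xml yarn.resourcemanager.address for the ResourceManager. Where the hdfs command is available, hdfs getconf -confKey is the more reliable choice, because it resolves the configuration the way Hadoop itself does.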
In addition to the command line, you can get similar information in the browser:
http://<namenode>:50070 (for general Hadoop information)
http://<jobtracker>:50030 (for JobTracker-related information)
These are the default ports in Hadoop 1.x (50070 is also the NameNode default in Hadoop 2.x). In Hadoop 3 the NameNode web UI defaults to port 9870, and on YARN clusters the ResourceManager web UI defaults to port 8088.
With the appropriate authorization (for example, running as the hdfs user with sudo -u hdfs), you can try:
hdfs dfsadmin -report
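To turn the report into the list of nodes asked about in the question, you can filter it. A minimal sketch, assuming the Hadoop 2.x/3.x report layout in which DataNodes are grouped under "Live datanodes (N):" and "Dead datanodes (N):" headers and each node block starts with a "Name: <ip>:<port> (<hostname>)" line. list_datanodes is a hypothetical helper, and other Hadoop versions may format the report differently. The report covers DataNodes only, not the master daemons.

```bash
# list_datanodes
# Hypothetical helper: reads `hdfs dfsadmin -report` output on stdin and prints
# one "<state> <address>" line per DataNode, e.g. "live 10.0.0.1:9866 (dn1)".
# Assumes the Hadoop 2.x/3.x layout; other versions may format it differently.
list_datanodes() {
  awk '
    /^Live datanodes/ { state = "live"; next }
    /^Dead datanodes/ { state = "dead"; next }
    /^Name: / {
      sub(/^Name: /, "")
      label = (state == "") ? "unknown" : state
      print label, $0
    }'
}
```

Usage: sudo -u hdfs hdfs dfsadmin -report | list_datanodes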
Related
I have Hadoop 3.1.3 running in pseudo-distributed mode. I can upload a file and display its contents,
but when I run the jps command, I get the following output:
10912 DataNode
13072 ResourceManager
4480 NodeManager
6584 Jps
664 Namenode
I can't find the Secondary NameNode. Is there a problem with my configuration or my Hadoop installation?
Are you assuming that the Secondary NameNode is started in pseudo-distributed mode?
If the basic commands work, then it's fine.
Check the log files to find out whether something is broken before asking elsewhere.
In general, I always suggest using Apache Ambari to provision a Hadoop cluster.
You can start the Secondary NameNode manually and watch the startup logs to see if anything is wrong:
hdfs secondarynamenode
If there's no error, run jps again and hopefully you see SecondaryNameNode listed.
I'd suggest running hdfs --help and checking out all of the options, there's a lot of good stuff there.
This might be a stupid question, but I need to know.
For example: why do we need the hadoop fs -ls command to list files? Why can't we just use ls?
My guess is that in pseudo-distributed mode, part of the local filesystem is handed to the Hadoop filesystem and is only accessible to the NameNode daemon. Please explain.
ls lists the files on your computer's local filesystems.
You can set the fs.defaultFS property to file:/// (the default); then both will behave the same, but this is not considered pseudo-distributed mode.
Pseudo-distributed mode requires that you specify a list of DataNode and NameNode volumes on each respective system in the cluster, and hdfs dfs commands will only list the files that the NameNode knows about.
It's called pseudo-distributed only because it runs on a single node. Once you have that working, adding another node should be straightforward, given appropriate network connectivity.
I am struggling to determine the status of the different Hadoop daemons. I can see the status of only two of them (NameNode and JobTracker), since those are shown by default in Cloudera. How do I check the rest?
Use jps (requires a JDK) to see all the daemons.
Besides jps, you can scan the output of the following command to check whether the processes are running:
ps -ef | grep hadoop
You can also use the following once the server is running:
./hadoop dfsadmin -report
The NameNode and JobTracker web interfaces are another place to look for more details.
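To check the remaining daemons in one go, you can compare jps output against the daemons you expect on that host. A minimal sketch; missing_daemons is a hypothetical helper, and the expected class names depend on your Hadoop version (JobTracker/TaskTracker on MRv1, ResourceManager/NodeManager on YARN).

```bash
# missing_daemons DAEMON...
# Hypothetical helper: reads `jps` output on stdin and prints each named
# daemon that is not running; exit status 1 if any are missing.
missing_daemons() {
  local running d status=0
  # jps prints "<pid> <MainClass>"; keep only the class names.
  running=$(awk '{ print $2 }')
  for d in "$@"; do
    # -x: whole-line match, -F: literal string, so "NameNode" does not
    # match "SecondaryNameNode".
    if ! grep -qxF -- "$d" <<< "$running"; then
      echo "$d"
      status=1
    fi
  done
  return "$status"
}
```

Usage: jps | missing_daemons NameNode DataNode SecondaryNameNode JobTracker TaskTracker. Run it as the user that owns the daemons (or as root), since jps only lists the JVMs it can see.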
17223 JobTracker
16897 DataNode
17518 Jps
17451 TaskTracker
17129 SecondaryNameNode
8571 FsShell
The NameNode is not listed.
It seems you are using the same user to start all the daemons, so if the NameNode is not showing up in the jps output, the NameNode daemon was probably killed or did not start properly. You can use the following command to check whether the NameNode process is running:
ps aux | grep -i namenode
If it is not running, you may need to format the NameNode before starting the HDFS service: stop all HDFS daemons with the stop-dfs.sh script, format the NameNode with the command below, and then start HDFS with the start-dfs.sh script. Note that formatting erases all existing HDFS metadata.
hadoop namenode -format
If you are hitting the situation below, go through this SO post:
Hadoop namenode needs to be formatted after every computer start
If you are looking to check all running JVMs on the host via 'jps', you need to run the command as root. Otherwise, 'jps' will only show JVMs running as your currently logged-in user.
Please see this link for more:
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/1dlxmB_GVuU
You should always check the logs. :)
I want to access HDFS with fully qualified names such as:
hadoop fs -ls hdfs://machine-name:8020/user
I can also simply access HDFS with:
hadoop fs -ls /user
However, I am writing test cases that should work on different distributions (HDP, Cloudera, MapR, etc.) and that involve accessing HDFS files with qualified names.
I understand that hdfs://machine-name:8020 is defined in core-site.xml as fs.default.name, but this seems to differ across distributions. For example, the scheme is maprfs instead of hdfs on MapR, and IBM BigInsights doesn't even have core-site.xml in $HADOOP_HOME/conf.
There doesn't seem to be a way to make Hadoop tell me what's defined in fs.default.name through its command-line options.
How can I get the value defined in fs.default.name reliably from the command line?
The tests will always run on the NameNode, so the machine name is easy. Getting the port number (8020) is harder, though. I tried lsof and netstat but still couldn't find a reliable way.
The command below, available from Apache Hadoop 2.7.0 onwards, can be used to get the values of Hadoop configuration properties. fs.default.name is deprecated in Hadoop 2.0; fs.defaultFS is its replacement. I'm not sure whether this works in the case of maprfs.
hdfs getconf -confKey fs.defaultFS # ( new property )
or
hdfs getconf -confKey fs.default.name # ( old property )
I'm not sure whether any command-line utilities are available for retrieving configuration property values in MapR or Hadoop 0.20. In that situation, you can retrieve the value of a configuration property in Java instead:
import org.apache.hadoop.conf.Configuration;
Configuration conf = new Configuration(); // loads core-default.xml and core-site.xml from the classpath
System.out.println(conf.get("fs.default.name"));
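Once you have the fs.defaultFS value (for example from hdfs getconf -confKey fs.defaultFS), the port can be read from the URI itself rather than from lsof or netstat. A minimal sketch using only bash parameter expansion; parse_fs_uri is a hypothetical helper, and it does not handle IPv6 literals or user-info. HA nameservice URIs such as hdfs://mycluster, and URIs like file:/// or maprfs:///, carry no port, which the function reports as -.

```bash
# parse_fs_uri URI
# Hypothetical helper: splits a fs.defaultFS value into "scheme host port".
# Host and port are printed as "-" when absent (HA nameservice URIs such as
# hdfs://mycluster, or file:/// and maprfs:///). Not handled: user-info,
# IPv6 literals.
parse_fs_uri() {
  local uri=$1 scheme rest authority host port
  scheme=${uri%%://*}
  if [ "$scheme" = "$uri" ]; then
    echo "parse_fs_uri: not a URI: $uri" >&2
    return 1
  fi
  rest=${uri#*://}
  authority=${rest%%/*}          # drop any path component
  host=${authority%%:*}
  if [ "$host" = "$authority" ]; then
    port=-                       # no explicit port in the URI
  else
    port=${authority#*:}
  fi
  echo "$scheme ${host:--} $port"
}
```

Usage: read -r scheme host port < <(parse_fs_uri "$(hdfs getconf -confKey fs.defaultFS)")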
fs.default.name is deprecated.
Use: hdfs getconf -confKey fs.defaultFS
I found this answer while looking for the HDFS URI, which is generally a URL pointing to the NameNode. hdfs getconf -confKey fs.defaultFS gives me the name of the nameservice, but that doesn't help me build the HDFS URI.
I tried the command below to get a list of the NameNodes instead:
hdfs getconf -namenodes
This gave me a list of all the NameNodes, primary first, followed by secondary. After that, constructing the HDFS URI was simple:
hdfs://<primarynamenode>/
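The same construction can be scripted. A minimal sketch: namenode_uri is a hypothetical helper that reads the whitespace-separated host list printed by hdfs getconf -namenodes and builds the URI from the first entry. In an HA setup the host order comes from the configuration, so it is worth confirming that the first host is the active NameNode.

```bash
# namenode_uri
# Hypothetical helper: reads `hdfs getconf -namenodes` output on stdin
# (whitespace-separated hostnames) and prints hdfs://<first-host>/.
# No port is added, so the client uses its default NameNode RPC port;
# append :<port> yourself if your NameNode listens elsewhere.
namenode_uri() {
  local first=
  read -r first _ || true        # tolerate input without a trailing newline
  if [ -z "$first" ]; then
    echo "namenode_uri: no NameNode hosts on stdin" >&2
    return 1
  fi
  echo "hdfs://$first/"
}
```

Usage: hdfs getconf -namenodes | namenode_uri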
You can use:
hdfs getconf -confKey fs.default.name
Yes, hdfs getconf -namenodes will show the list of NameNodes.