Any command to get the active namenode for a nameservice in Hadoop?

The command:
hdfs haadmin -getServiceState machine-98
works only if you already know the machine name. Is there a command like:
hdfs haadmin -getServiceState <nameservice>
which can tell you the IP/hostname of the active namenode?

To print out the namenodes use this command:
hdfs getconf -namenodes
To print out the secondary namenodes:
hdfs getconf -secondaryNameNodes
To print out the backup namenodes:
hdfs getconf -backupNodes
Note: These commands were tested using Hadoop 2.4.0.
Update 10-31-2014:
Here is a Python script that reads the NameNodes involved in Hadoop HA from the config file and determines which of them is active by using the hdfs haadmin command. The script is not fully tested, as I do not have HA configured; I only tested the parsing using a sample file based on the Hadoop HA documentation. Feel free to use and modify as needed.
#!/usr/bin/env python
# coding: UTF-8
import xml.etree.ElementTree as ET
import subprocess as SP

if __name__ == "__main__":
    hdfsSiteConfigFile = "/etc/hadoop/conf/hdfs-site.xml"

    tree = ET.parse(hdfsSiteConfigFile)
    root = tree.getroot()
    hasHadoopHAElement = False
    activeNameNode = None
    for property in root:
        if "dfs.ha.namenodes" in property.find("name").text:
            hasHadoopHAElement = True
            nameserviceId = property.find("name").text[len("dfs.ha.namenodes")+1:]
            nameNodes = property.find("value").text.split(",")
            for node in nameNodes:
                # get the namenode machine address, then check if it is the active node
                for n in root:
                    prefix = "dfs.namenode.rpc-address." + nameserviceId + "."
                    elementText = n.find("name").text
                    if prefix in elementText:
                        nodeAddress = n.find("value").text.split(":")[0]

                        args = ["hdfs haadmin -getServiceState " + node]
                        p = SP.Popen(args, shell=True, stdout=SP.PIPE, stderr=SP.PIPE)

                        for line in p.stdout.readlines():
                            if "active" in line.lower():
                                print "Active NameNode: " + node
                                break
                        for err in p.stderr.readlines():
                            print "Error executing Hadoop HA command: ", err
            break
    if not hasHadoopHAElement:
        print "Hadoop High-Availability configuration not found!"

Found this:
https://gist.github.com/cnauroth/7ff52e9f80e7d856ddb3
This works out of the box on my CDH5 namenodes, although I'm not sure other hadoop distributions will have http://namenode:50070/jmx available - if not, I think it can be added by deploying Jolokia.
Example:
curl 'http://namenode1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeStatus",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
"State" : "active",
"NNRole" : "NameNode",
"HostAndPort" : "namenode1.example.com:8020",
"SecurityEnabled" : true,
"LastHATransitionTime" : 1436283324548
} ]
}
So by firing off one HTTP request to each namenode (this should be quick) we can figure out which one is active.
It's also worth noting that if you talk to the WebHDFS REST API on a standby namenode you will get a 403 Forbidden and the following JSON:
{"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby"}}

In a High Availability Hadoop cluster, there will be 2 namenodes - one active and one standby.
To find the active namenode, we can run a test hdfs command against each namenode; the one for which it succeeds is the active node.
The command below executes successfully if the namenode is active and fails if it is the standby:
hadoop fs -test -e hdfs://<Name node>/
Unix script
active_node=''
if hadoop fs -test -e hdfs://<NameNode-1>/ ; then
    active_node='<NameNode-1>'
elif hadoop fs -test -e hdfs://<NameNode-2>/ ; then
    active_node='<NameNode-2>'
fi
echo "Active Dev Name node : $active_node"

You can also do it in bash with hdfs CLI calls. The caveat is that this takes a bit more time, since it makes a few calls in succession, but some may prefer it to a Python script.
This was tested with Hadoop 2.6.0.
get_active_nn(){
    ha_name=$1  # Needs the NameServiceID
    ha_ns_nodes=$(hdfs getconf -confKey dfs.ha.namenodes.${ha_name})
    active=""
    for node in $(echo ${ha_ns_nodes//,/ }); do
        state=$(hdfs haadmin -getServiceState $node)
        if [ "$state" == "active" ]; then
            active=$(hdfs getconf -confKey dfs.namenode.rpc-address.${ha_name}.${node})
            break
        fi
    done
    if [ -z "$active" ]; then
        >&2 echo "ERROR: no active namenode found for ${ha_name}"
        exit 1
    else
        echo $active
    fi
}

None of the existing answers seemed to combine the three steps of:
Identifying the namenodes from the cluster.
Resolving the node names to host:port.
Checking the status of each node (without requiring cluster admin privileges).
The solution below combines hdfs getconf calls with a JMX service call for node status.
#!/usr/bin/env python
from subprocess import check_output
import urllib, json, sys

def get_name_nodes(clusterName):
    ha_ns_nodes = check_output(['hdfs', 'getconf', '-confKey',
                                'dfs.ha.namenodes.' + clusterName])
    nodes = ha_ns_nodes.strip().split(',')
    nodeHosts = []
    for n in nodes:
        nodeHosts.append(get_node_hostport(clusterName, n))
    return nodeHosts

def get_node_hostport(clusterName, nodename):
    hostPort = check_output(
        ['hdfs', 'getconf', '-confKey',
         'dfs.namenode.rpc-address.{0}.{1}'.format(clusterName, nodename)])
    return hostPort.strip()

def is_node_active(nn):
    jmxPort = 50070
    host, port = nn.split(':')
    url = "http://{0}:{1}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus".format(
        host, jmxPort)
    nnstatus = urllib.urlopen(url)
    parsed = json.load(nnstatus)
    return parsed.get('beans', [{}])[0].get('State', '') == 'active'

def get_active_namenode(clusterName):
    for n in get_name_nodes(clusterName):
        if is_node_active(n):
            return n

clusterName = (sys.argv[1] if len(sys.argv) > 1 else None)
if not clusterName:
    raise Exception("Specify cluster name.")

print 'Cluster: {0}'.format(clusterName)
print "Nodes: {0}".format(get_name_nodes(clusterName))
print "Active Name Node: {0}".format(get_active_namenode(clusterName))

From the Java API, you can use HAUtil.getAddressOfActive(fileSystem).

You can use a curl command against the Ambari REST API to find out the active and standby NameNodes.
For example:
curl -u username -H "X-Requested-By: ambari" -X GET
http://cluster-hostname:8080/api/v1/clusters/<cluster-name>/services/HDFS
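If you specifically want the active NameNode out of Ambari, the host_components endpoint can be filtered on the NameNode HA state. A sketch, assuming a cluster named mycluster and that your Ambari version exposes the metrics/dfs/FSNamesystem/HAState field on NAMENODE host components (verify this on your version):
#!/usr/bin/env python
# Ask Ambari for the host running the NAMENODE component whose HA state is "active".
# Ambari host, credentials and cluster name are placeholders; the HAState metric
# path is an assumption -- check that it exists on your Ambari version.
import requests

AMBARI = "http://cluster-hostname:8080"
CLUSTER = "mycluster"

url = (AMBARI + "/api/v1/clusters/" + CLUSTER + "/host_components"
       "?HostRoles/component_name=NAMENODE"
       "&metrics/dfs/FSNamesystem/HAState=active"
       "&fields=HostRoles/host_name")
resp = requests.get(url, auth=("username", "password"),
                    headers={"X-Requested-By": "ambari"})
resp.raise_for_status()
for item in resp.json().get("items", []):
    print("Active NameNode host: " + item["HostRoles"]["host_name"])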

I found the commands below by simply typing 'hdfs'; a couple of them could be useful for anyone who comes here looking for help.
hdfs getconf -namenodes
The command above gives you the NameNode host(s), e.g. hn1.hadoop.com.
hdfs getconf -secondaryNameNodes
The command above gives you the available secondary NameNode(s), e.g. hn2.hadoop.com.
hdfs getconf -backupNodes
The command above gets you the backup nodes, if any.
hdfs getconf -nnRpcAddresses
The command above gives you the NameNode RPC addresses (host and port), e.g. hn1.hadoop.com:8020.

On HDFS 2.6.0, this is what worked for me:
ubuntu@platform2:~$ hdfs getconf -confKey dfs.ha.namenodes.arkin-platform-cluster
nn1,nn2
ubuntu@platform2:~$ sudo -u hdfs hdfs haadmin -getServiceState nn1
standby
ubuntu@platform2:~$ sudo -u hdfs hdfs haadmin -getServiceState nn2
active

Here is an example of bash code that returns the active namenode even if you do not have a local Hadoop installation.
It also works faster, as curl calls are usually quicker than hadoop CLI calls.
Checked on Cloudera 7.1.
#!/bin/bash
export nameNode1=myNameNode1
export nameNode2=myNameNode2
active_node=''
T1=`curl --silent --insecure --request GET "https://${nameNode1}:9871/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep "\"State\" : \"active\"" | wc -l`
if [ "$T1" == "1" ]
then
    active_node=${nameNode1}
else
    T1=`curl --silent --insecure --request GET "https://${nameNode2}:9871/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep "\"State\" : \"active\"" | wc -l`
    if [ "$T1" == "1" ]
    then
        active_node=${nameNode2}
    fi
fi
echo "Active Dev Name node : $active_node"

#!/usr/bin/python
import subprocess


def getActiveNameNode():
    # List the configured namenodes, then probe each one: the active
    # namenode is the one for which a simple fs test succeeds.
    cmd_string = "hdfs getconf -namenodes"
    process = subprocess.Popen(cmd_string, shell=True, stdout=subprocess.PIPE,
                               universal_newlines=True)
    out, err = process.communicate()
    for val in out.strip().split():
        cmd_str = "hadoop fs -test -e hdfs://" + val + "/"
        process = subprocess.Popen(cmd_str, shell=True, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, universal_newlines=True)
        out, err = process.communicate()
        if process.returncode == 0:
            return val


def main():
    out = getActiveNameNode()
    print(out)


if __name__ == '__main__':
    main()

You can simply use the command below; I have tested this on Hadoop 3.0:
hdfs haadmin -getAllServiceState
It returns the state of all the NameNodes.
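If you need just the active host from that output inside a script, a small sketch like this works (assuming the usual output format of one host:port and state pair per line):
#!/usr/bin/env python3
# Parse `hdfs haadmin -getAllServiceState` output and print the active NameNode(s).
# Assumes the usual "host:port   state" one-pair-per-line output format.
import subprocess

out = subprocess.check_output(["hdfs", "haadmin", "-getAllServiceState"],
                              universal_newlines=True)
for line in out.splitlines():
    parts = line.split()
    if len(parts) >= 2 and parts[-1].lower() == "active":
        print("Active NameNode: " + parts[0])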

more /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1353,namenode1357</value>
</property>
hdfs#:/home/ubuntu$ hdfs haadmin -getServiceState namenode1353
active
hdfs#:/home/ubuntu$ hdfs haadmin -getServiceState namenode1357
standby

Related

Apache Airflow pass data from BashOperator to SparkSubmitOperator

I am trying to log in to the server 100.18.10.182 and trigger my spark-submit job on the server 100.18.10.36 from the .182 server in Apache Airflow. I have used a BashOperator (a shell script to ssh into the 100.18.10.182 server), and for the spark-submit job I have used a SparkSubmitOperator downstream of the BashOperator.
I am able to execute the BashOperator successfully, but the SparkSubmitOperator fails with:
Cannot execute: Spark submit
I think this is because I am unable to pass my SSH session (to the .182 server) into the next SparkSubmitOperator, or it may be due to some other issue related to --jars or --packages; I'm not sure.
I was thinking of using xcom_push to push some data from my BashOperator and xcom_pull to read it in the SparkSubmitOperator, but I am not sure how to pass it in a way that keeps the server logged in so that my SparkSubmitOperator gets triggered from that box itself.
Airflow dag code:
t2 = BashOperator(
    task_id='test_bash_operator',
    bash_command="/Users/hardikgoel/Downloads/Work/airflow_dir/shell_files/airflow_prod_ssh_script.sh ",
    dag=dag)
t2

t3_config = {
    'conf': {
        "spark.yarn.maxAppAttempts": "1",
        "spark.yarn.executor.memoryOverhead": "8"
    },
    'conn_id': 'spark_default',
    'packages': 'com.sparkjobs.SparkJobsApplication',
    'jars': '/var/spark/spark-jobs-0.0.1-SNAPSHOT-1/spark-jobs-0.0.1-SNAPSHOT.jar firstJob',
    'driver_memory': '1g',
    'total_executor_cores': '21',
    'executor_cores': 7,
    'executor_memory': '48g'
}

t3 = SparkSubmitOperator(
    task_id='t3',
    **t3_config)

t2 >> t3
Shell Script code:
#!/bin/bash
USERNAME=hardikgoel
HOSTS="100.18.10.182"
SCRIPT="pwd; ls"
ssh -l ${USERNAME} ${HOSTS} "${SCRIPT}"
echo "SSHed successfully"
if [ ${PIPESTATUS[0]} -eq 0 ]; then
echo "successfull"
fi

Custom job script submission to PBS via Dask?

I have a PBS job script with an executable that writes results to an out file.
### some lines
PBS_O_EXEDIR="path/to/software"
EXECUTABLE="executablefile"
OUTFILE="out"
### Copy application directory on compute node
[ -d $PBS_O_EXEDIR ] || mkdir -p $PBS_O_EXEDIR
[ -w $PBS_O_EXEDIR ] && \
rsync -Cavz --rsh=$SSH $HOST:$PBS_O_EXEDIR `dirname $PBS_O_EXEDIR`
[ -d $PBS_O_WORKDIR ] || mkdir -p $PBS_O_WORKDIR
rsync -Cavz --rsh=$SSH $HOST:$PBS_O_WORKDIR `dirname $PBS_O_WORKDIR`
# Change into the working directory
cd $PBS_O_WORKDIR
# Save the jobid in the outfile
echo "PBS-JOB-ID was $PBS_JOBID" > $OUTFILE
# Run the executable
$PBS_O_EXEDIR/$EXECUTABLE >> $OUTFILE
In my project, I have to use Dask for this job submission and to monitor the jobs. Therefore, I have configured the jobqueue.yaml file like this:
jobqueue:
  pbs:
    name: htc_calc

    # Dask worker options
    cores: 4        # Total number of cores per job
    memory: 50GB    # Total amount of memory per job

    # PBS resource manager options
    shebang: "#!/usr/bin/env bash"
    walltime: '00:30:00'
    exe_dir: "/home/r/rb11/softwares/FPLO/bin"
    excutable: "fplo18.00-57-x86_64"
    outfile: "out"
    job-extra: "exe_dir/executable >> outfile"
However, I got this error while submitting jobs via Dask.
qsub: directive error: e
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3d8c4a56a8>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py:284> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 1\nCommand:\nqsub /tmp/tmpwyvkfcmi.sh\nstdout:\n\nstderr:\nqsub: directive error: e \n\n',)>)
Traceback (most recent call last):
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/ioloop.py", line 779, in _discard_future_result
    future.result()
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/asyncio/futures.py", line 294, in result
    raise self._exception
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/asyncio/tasks.py", line 240, in _step
    result = coro.send(None)
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py", line 317, in _correct_state_internal
    await w # for tornado gen.coroutine support
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py", line 41, in _
    await self.start()
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 285, in start
    out = await self._submit_job(fn)
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 268, in _submit_job
    return self._call(shlex.split(self.submit_command) + [script_filename])
  File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 368, in _call
    "stderr:\n{}\n".format(proc.returncode, cmd_str, out, err)
RuntimeError: Command exited with non-zero exit code.
Exit code: 1
Command:
qsub /tmp/tmpwyvkfcmi.sh
stdout:
stderr:
qsub: directive error: e
How do I specify a custom bash script in Dask?
Dask is used for distributing Python applications. In the case of Dask Jobqueue it works by submitting a scheduler and workers to the batch system, which connect together to form their own cluster. You can then submit Python work to the Dask scheduler.
It looks like from your example you are trying to use the cluster setup configuration to run your own bash application instead of Dask.
In order to do this with Dask you should return the jobqueue config to the defaults and instead write a Python function which calls your bash script.
import os

from dask_jobqueue import PBSCluster
cluster = PBSCluster()
cluster.scale(jobs=10)  # Deploy ten single-node jobs

from dask.distributed import Client
client = Client(cluster)  # Connect this local process to remote workers

client.submit(os.system, "/path/to/your/script")  # Run the script as a task on a worker
However, it seems like Dask may not be a good fit for what you are trying to do. You would probably be better off just submitting your job to PBS normally.

How to get active namenode hostname from Cloudera Manager REST API?

I'm able to access the Cloudera manager rest API.
curl -u username:password http://cmhost:port/api/v10/clusters/clusterName
How do I find the active namenode and resource manager hostnames?
I couldn't find anything relevant in the API docs:
http://cloudera.github.io/cm_api/apidocs/v10/index.html
Note: Cluster is configured with high availability
You need to use this endpoint:
http://cloudera.github.io/cm_api/apidocs/v10/path__clusters_-clusterName-services-serviceName-roles-roleName-.html
Then do the following:
For each Name Node:
$ curl -u username:password \
http://cmhost:port/api/v10/clusters/CLNAME/services/HDFS/roles/NN_NAME
Replacing:
CLNAME with your clusterName
HDFS with your HDFS serviceName
NN_NAME with your NameNode name
This will return the apiRole object which has a field called haStatus. The one that shows "ACTIVE" is the active NameNode.
For the Resource Manager do similar steps:
For each Resource Manager:
$ curl -u username:password \
http://cmhost:port/api/v10/clusters/CLNAME/services/YARN/roles/RM_NAME
Where:
YARN with your YARN serviceName
RM_NAME with your Resource Manager name
Once you have the right NameNode and Resource Manager, use:
http://cloudera.github.io/cm_api/apidocs/v10/path__hosts_-hostId-.html
to map the hostId to the hostname.
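Putting those calls together, something like the following sketch works for the NameNode side; the Cloudera Manager host, credentials, cluster/service names and the role names (NN_NAME_1, NN_NAME_2 here) are placeholders you have to fill in from your own cluster:
#!/usr/bin/env python
# For each NameNode role, read haStatus from the CM API and resolve the hostId
# of the ACTIVE one to a hostname. CM host, credentials, cluster, service and
# role names below are placeholders.
import requests

CM = "http://cmhost:port"
AUTH = ("username", "password")
CLUSTER, SERVICE = "CLNAME", "HDFS"
NN_ROLES = ["NN_NAME_1", "NN_NAME_2"]   # your NameNode role names

for role_name in NN_ROLES:
    role = requests.get("{0}/api/v10/clusters/{1}/services/{2}/roles/{3}".format(
        CM, CLUSTER, SERVICE, role_name), auth=AUTH).json()
    if role.get("haStatus") == "ACTIVE":
        host = requests.get("{0}/api/v10/hosts/{1}".format(
            CM, role["hostRef"]["hostId"]), auth=AUTH).json()
        print("Active NameNode: " + host["hostname"])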
You can get a bulk of HDFS-related information for hosts by using the REST API:
$ python build.py username:password cmhost:port
$ cat build.py
import sys
import json
import requests

args = sys.argv
if len(args) != 3:
    print "Usage: python %s login:password host:port" % args[0]
    exit(1)

LP = args[1]
CM = args[2]

host = {}
hosts = requests.get('http://'+LP+'@'+CM+'/api/v10/hosts').json()
for h in hosts['items']:
    host[h['hostId']] = h['hostname']

nameservices = requests.get('http://'+LP+'@'+CM+'/api/v10/clusters/cluster/services/hdfs/nameservices').json()
for ns in nameservices['items']:
    print('hdfs.NS:' + ns['name'])

services = requests.get('http://'+LP+'@'+CM+'/api/v10/clusters/cluster/services').json()
for s in services['items']:
    if (s['name'] == 'hdfs'):
        roles = requests.get('http://'+LP+'@'+CM+'/api/v10/clusters/cluster/services/' + s['name'] + '/roles').json()
        srv = {}
        for r in roles['items']:
            suff = '.' + r.get('haStatus') if r.get('haStatus') else ''
            key = s['name'] + '.' + r['type'] + suff
            srv[key] = srv.get(key) + ',' + host[r['hostRef']['hostId']] if srv.get(key) else host[r['hostRef']['hostId']]
        for s in srv:
            print(s + ":" + ','.join(sorted(srv[s].split(','))))
Then you'll get something like this, just grep for hdfs.NAMENODE.ACTIVE (or slightly change the python script):
hdfs.NS:H1
hdfs.NAMENODE.ACTIVE:h6
hdfs.NAMENODE.STANDBY:h1
hdfs.FAILOVERCONTROLLER:h1,h2,h3
hdfs.DATANODE:h1
hdfs.HTTPFS:h1,h2,h3
hdfs.GATEWAY:h1,h2,h3
hdfs.JOURNALNODE:h4,h5
hdfs.BALANCER:h7

How to check the namenode status?

As a developer, how can I check the current state of a given namenode, i.e. whether it is active or standby? I have tried the getServiceState command, but that is only intended for admins with superuser access. Is there any command that can be run from the edge node to get the status of a provided namenode?
Finally, I got an answer to this.
As a developer, one cannot execute dfsadmin commands due to the restriction. To check namenode availability I used the below if loop in a shell script, which did the trick. It won't tell you explicitly which namenode is active, but with the loop you can easily execute the desired program accordingly.
if hdfs dfs -test -e hdfs://namenodeip/* ; then
    echo exist
else
    echo not exist
fi
I tried your solution but that didn't work. Here's mine, which works perfectly for me (bash script).
until curl "http://<namenode_ip>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -q 'active'; do
    printf "Waiting for namenode!"
    sleep 5
done
Explanation:
Running this curl request outputs the namenode's status as JSON (sample below), which has a State field indicating its status. So I'm simply checking for the 'active' text in the curl output. For any other language, you just have to make the same request and check its output.
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeStatus",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
"NNRole" : "NameNode",
"HostAndPort" : "<namenode_ip>:8020",
"SecurityEnabled" : false,
"LastHATransitionTime" : 0,
"State" : "active"
} ]
}
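The same wait loop in Python, for instance, is just a request plus a field check. A sketch; the host and the 50070 port are placeholders (Hadoop 3.x uses 9870):
#!/usr/bin/env python3
# Poll the NameNode's JMX status until it reports "active", mirroring the bash loop above.
# The host and web UI port are placeholders.
import json
import time
import urllib.request

def namenode_state(host, port=50070):
    url = "http://{0}:{1}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            beans = json.loads(resp.read().decode())["beans"]
    except Exception:       # namenode not reachable yet
        return ""
    return beans[0].get("State", "") if beans else ""

while namenode_state("namenode1.example.com") != "active":   # placeholder host
    print("Waiting for namenode!")
    time.sleep(5)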

Hadoop Pig fs test command

Wondering what this line means? I searched around but cannot find a reference for this command:
Pig.fs("test -e " + pathToCheck) == 0:
Pig.fs(...) runs an HDFS shell command from an embedded Pig script and returns its exit status, so Pig.fs("test -e " + pathToCheck) == 0 checks whether the path exists. Using the command line tool, run hadoop fs -help and you get:
-test -[defsz] <path> :
Answer various questions about <path>, with result via exit status.
-d return 0 if <path> is a directory.
-e return 0 if <path> exists.
-f return 0 if <path> is a file.
-s return 0 if file <path> is greater than zero bytes in size.
-z return 0 if file <path> is zero bytes in size, else return 1.
