How to get active namenode hostname from Cloudera Manager REST API? - hadoop

I'm able to access the Cloudera Manager REST API:
curl -u username:password http://cmhost:port/api/v10/clusters/clusterName
How do I find the active namenode and resource manager hostnames?
I couldn't find anything relevant from API docs.
http://cloudera.github.io/cm_api/apidocs/v10/index.html
Note: Cluster is configured with high availability

You need to use this endpoint:
http://cloudera.github.io/cm_api/apidocs/v10/path__clusters_-clusterName-services-serviceName-roles-roleName-.html
Then do the following:
For each Name Node:
$ curl -u username:password \
http://cmhost:port/api/v10/clusters/CLNAME/services/HDFS/roles/NN_NAME
Replacing:
CLNAME with your clusterName
HDFS with your HDFS serviceName
NN_NAME with your NameNode name
This will return the apiRole object which has a field called haStatus. The one that shows "ACTIVE" is the active NameNode.
For the Resource Manager do similar steps:
For each Resource Manager:
$ curl -u username:password \
http://cmhost:port/api/v10/clusters/CLNAME/services/YARN/roles/RM_NAME
Replacing:
YARN with your YARN serviceName
RM_NAME with your Resource Manager role name
Once you have the right NameNode and Resource Manager, use:
http://cloudera.github.io/cm_api/apidocs/v10/path__hosts_-hostId-.html
to map the hostId to the hostname.
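For example (HOST_ID is a placeholder for the hostId value taken from the role's hostRef):
$ curl -u username:password \
http://cmhost:port/api/v10/hosts/HOST_ID
The returned apiHost object contains the hostname field you are after.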

You can get bulk HDFS-related information for hosts by using the REST API:
$ python build.py username:password cmhost:port
$ cat build.py
import sys
import json
import requests

args = sys.argv
if len(args) != 3:
    print "Usage: python %s login:password host:port" % args[0]
    exit(1)
LP = args[1]  # login:password
CM = args[2]  # cmhost:port
host = {}
# map hostId -> hostname
hosts = requests.get('http://' + LP + '@' + CM + '/api/v10/hosts').json()
for h in hosts['items']:
    host[h['hostId']] = h['hostname']
nameservices = requests.get('http://' + LP + '@' + CM + '/api/v10/clusters/cluster/services/hdfs/nameservices').json()
for ns in nameservices['items']:
    print('hdfs.NS:' + ns['name'])
services = requests.get('http://' + LP + '@' + CM + '/api/v10/clusters/cluster/services').json()
srv = {}
for s in services['items']:
    if s['name'] == 'hdfs':
        roles = requests.get('http://' + LP + '@' + CM + '/api/v10/clusters/cluster/services/' + s['name'] + '/roles').json()
        for r in roles['items']:
            # group role hosts under service.ROLETYPE[.haStatus]
            suff = '.' + r.get('haStatus') if r.get('haStatus') else ''
            key = s['name'] + '.' + r['type'] + suff
            srv[key] = srv.get(key) + ',' + host[r['hostRef']['hostId']] if srv.get(key) else host[r['hostRef']['hostId']]
for s in srv:
    print(s + ':' + ','.join(sorted(srv[s].split(','))))
Then you'll get something like this; just grep for hdfs.NAMENODE.ACTIVE (or slightly change the python script):
hdfs.NS:H1
hdfs.NAMENODE.ACTIVE:h6
hdfs.NAMENODE.STANDBY:h1
hdfs.FAILOVERCONTROLLER:h1,h2,h3
hdfs.DATANODE:h1
hdfs.HTTPFS:h1,h2,h3
hdfs.GATEWAY:h1,h2,h3
hdfs.JOURNALNODE:h4,h5
hdfs.BALANCER:h7
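To pull out just the active NameNode, pipe the output through grep:
$ python build.py username:password cmhost:port | grep 'hdfs.NAMENODE.ACTIVE'
hdfs.NAMENODE.ACTIVE:h6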

Related

Nomad job using exec fails when running any bash command

I have tried everything and I just can’t get an exec type job to run. I tried it on 3 different clusters and it fails on all.
The job prunes docker containers and just runs docker system prune -a.
This is the config section:
driver = "exec"
config {
command = "bash"
args = ["-c",
" docker system prune -a "]
}
No logs and containers are not pruned:
job "docker-cleanup" {
type = "system"
constraint {
attribute = "${attr.kernel.name}"
operator = "="
value = "linux"
}
datacenters = ["dc1"]
group "docker-cleanup" {
restart {
interval = "24h"
attempts = 0
mode = "fail"
}
task "docker-system-prune" {
driver = "exec"
config {
command = "bash"
args = ["-c",
" docker system prune -a "]
}
resources {
cpu = 100
memory = 50
network {
mbits = 1
}
}
}
}
}
What am I doing wrong?
I would suggest you provide the output to make it easier to analyze.
One thing you can try is to add the full path to the bash executable:
driver = "exec"
config {
  command = "/bin/bash"
  args = ["-c", " docker system prune -a "]
}
Furthermore, you are missing the "--force" parameter on system prune; without it, docker system prune asks for confirmation:
docker system prune --all --force
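As a sanity check outside Nomad, you can run the same one-liner directly on a client node to confirm it completes without prompting (this is just a local test, not part of the job spec):
$ /bin/bash -c "docker system prune --all --force"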
Also, as far as I know, arguments must each be provided as a separate list element when you invoke a binary directly rather than through a shell:
driver = "exec"
config {
  command = "docker"
  args = ["system", "prune", "-a", "--force"]
}

How to connect Opentracing application to a remote Jaeger collector

I am using the Jaeger UI to display traces from my application. It works fine when both the application and Jaeger run on the same server, but I need to run my Jaeger collector on a different server. I tried using JAEGER_ENDPOINT, JAEGER_AGENT_HOST and JAEGER_AGENT_PORT, but it failed.
I don't know whether the values I set for these variables are wrong or not. Does this require any configuration settings inside the application code?
Can you point me to any documentation for this problem?
On server 2, install Jaeger:
$ docker run -d --name jaeger \
-p 5775:5775/udp \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 14268:14268 \
-p 9411:9411 \
jaegertracing/all-in-one:latest
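Before wiring up the client, it may be worth checking from server 1 that server 2 is reachable, for example (hostname is a placeholder):
$ curl -s -o /dev/null -w '%{http_code}\n' http://EnterServer2HostName:16686/
A 200 here means the Jaeger UI port answers; note the agent ports (5775/6831/6832) are UDP and will not respond to curl.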
On server 1, set these environment variables:
JAEGER_SAMPLER_TYPE=probabilistic
JAEGER_SAMPLER_PARAM=1
JAEGER_SAMPLER_MANAGER_HOST_PORT=(EnterServer2HostName):5778
JAEGER_REPORTER_LOG_SPANS=false
JAEGER_AGENT_HOST=(EnterServer2HostName)
JAEGER_AGENT_PORT=6831
JAEGER_REPORTER_FLUSH_INTERVAL=1000
JAEGER_REPORTER_MAX_QUEUE_SIZE=100
application-server-id=server-x
Change the tracer registration code on server 1 as below, so that it picks up the configuration from the environment variables:
@Produces
@Singleton
public static io.opentracing.Tracer jaegerTracer() {
    String serverInstanceId = System.getProperty("application-server-id");
    if (serverInstanceId == null) {
        serverInstanceId = System.getenv("application-server-id");
    }
    return new Configuration("ApplicationName" + (serverInstanceId != null && !serverInstanceId.isEmpty() ? "-" + serverInstanceId : ""),
            Configuration.SamplerConfiguration.fromEnv(),
            Configuration.ReporterConfiguration.fromEnv())
        .getTracer();
}
Hope this works!
Check this link for integrating Elasticsearch as the persistent storage backend, so that traces are not removed once the Jaeger instance is stopped:
How to configure Jaeger with elasticsearch?
Specify "JAEGER_AGENT_HOST" and ensure "local_agent" is not specified in tracer config file.
Below is the working solution for Python
import os
os.environ['JAEGER_AGENT_HOST'] = "123.XXX.YYY.ZZZ"  # Specify remote Jaeger agent here
# os.environ['JAEGER_AGENT_PORT'] = "6831"  # optional, default: "6831"

from jaeger_client import Config

config = Config(
    config={
        'sampler': {
            'type': 'const',
            'param': 1,
        },
        # ENSURE 'local_agent' is not specified
        # 'local_agent': {
        #     'reporting_host': "127.0.0.1",
        #     'reporting_port': 6831,
        # },
        'logging': True,
    },
    service_name="your-service-name-here",
)
# create tracer object here and voila!
Guidance of Jaeger: https://www.jaegertracing.io/docs/1.33/getting-started/
Jaeger-Client features: https://www.jaegertracing.io/docs/1.33/client-features/
Flask-OpenTracing: https://github.com/opentracing-contrib/python-flask
OpenTelemetry-Python: https://opentelemetry.io/docs/instrumentation/python/getting-started/

Any command to get active namenode for nameservice in hadoop?

The command:
hdfs haadmin -getServiceState machine-98
Works only if you know the machine name. Is there any command like:
hdfs haadmin -getServiceState <nameservice>
which can tell you the IP/hostname of the active namenode?
To print out the namenodes use this command:
hdfs getconf -namenodes
To print out the secondary namenodes:
hdfs getconf -secondaryNameNodes
To print out the backup namenodes:
hdfs getconf -backupNodes
Note: These commands were tested using Hadoop 2.4.0.
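For example, on an HA cluster the first command might print something like this (hostnames hypothetical):
$ hdfs getconf -namenodes
nn1.example.com nn2.example.com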
Update 10-31-2014:
Here is a python script that will read the NameNodes involved in Hadoop HA from the config file and determine which of them is active by using the hdfs haadmin command. This script is not fully tested, as I do not have HA configured; I have only tested the parsing, using a sample file based on the Hadoop HA documentation. Feel free to use and modify as needed.
#!/usr/bin/env python
# coding: UTF-8
import xml.etree.ElementTree as ET
import subprocess as SP

if __name__ == "__main__":
    hdfsSiteConfigFile = "/etc/hadoop/conf/hdfs-site.xml"
    tree = ET.parse(hdfsSiteConfigFile)
    root = tree.getroot()
    hasHadoopHAElement = False
    activeNameNode = None
    for property in root:
        if "dfs.ha.namenodes" in property.find("name").text:
            hasHadoopHAElement = True
            nameserviceId = property.find("name").text[len("dfs.ha.namenodes")+1:]
            nameNodes = property.find("value").text.split(",")
            for node in nameNodes:
                # get the namenode machine address, then check if it is the active node
                for n in root:
                    prefix = "dfs.namenode.rpc-address." + nameserviceId + "."
                    elementText = n.find("name").text
                    if prefix in elementText:
                        nodeAddress = n.find("value").text.split(":")[0]
                args = ["hdfs haadmin -getServiceState " + node]
                p = SP.Popen(args, shell=True, stdout=SP.PIPE, stderr=SP.PIPE)
                for line in p.stdout.readlines():
                    if "active" in line.lower():
                        print "Active NameNode: " + node
                        break
                for err in p.stderr.readlines():
                    print "Error executing Hadoop HA command: ", err
                    break
    if not hasHadoopHAElement:
        print "Hadoop High-Availability configuration not found!"
Found this:
https://gist.github.com/cnauroth/7ff52e9f80e7d856ddb3
This works out of the box on my CDH5 namenodes, although I'm not sure other hadoop distributions will have http://namenode:50070/jmx available - if not, I think it can be added by deploying Jolokia.
Example:
curl 'http://namenode1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "State" : "active",
    "NNRole" : "NameNode",
    "HostAndPort" : "namenode1.example.com:8020",
    "SecurityEnabled" : true,
    "LastHATransitionTime" : 1436283324548
  } ]
}
So by firing off one HTTP request to each namenode (this should be quick), we can figure out which one is the active one.
It's also worth noting that if you talk to the WebHDFS REST API on an inactive namenode, you will get a 403 Forbidden and the following JSON:
{"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby"}}
In a High Availability Hadoop cluster, there will be 2 namenodes - one active and one standby.
To find the active namenode, we can try executing the test hdfs command on each of the namenodes and find the active name node corresponding to the successful run.
Below command executes successfully if the name node is active and fails if it is a standby node.
hadoop fs -test -e hdfs://<Name node>/
Unix script:
active_node=''
if hadoop fs -test -e hdfs://<NameNode-1>/ ; then
  active_node='<NameNode-1>'
elif hadoop fs -test -e hdfs://<NameNode-2>/ ; then
  active_node='<NameNode-2>'
fi
echo "Active Dev Name node : $active_node"
You can do it in bash with hdfs cli calls, too, with the noted caveat that this takes a bit more time since it makes a few calls to the API in succession; still, some may prefer this to using a python script.
This was tested with Hadoop 2.6.0
get_active_nn(){
  ha_name=$1  # needs the NameServiceID
  ha_ns_nodes=$(hdfs getconf -confKey dfs.ha.namenodes.${ha_name})
  active=""
  for node in $(echo ${ha_ns_nodes//,/ }); do
    state=$(hdfs haadmin -getServiceState $node)
    if [ "$state" == "active" ]; then
      active=$(hdfs getconf -confKey dfs.namenode.rpc-address.${ha_name}.${node})
      break
    fi
  done
  if [ -z "$active" ]; then
    >&2 echo "ERROR: no active namenode found for ${ha_name}"
    exit 1
  else
    echo $active
  fi
}
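Usage might look like this (nameservice ID and output hypothetical):
$ get_active_nn mycluster
nn1.example.com:8020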
After reading all the existing answers, none seemed to combine the three steps of:
Identifying the namenodes from the cluster.
Resolving the node names to host:port.
Checking the status of each node (without requiring cluster admin privs).
The solution below combines hdfs getconf calls and a JMX service call for node status.
#!/usr/bin/env python
from subprocess import check_output
import urllib, json, sys

def get_name_nodes(clusterName):
    ha_ns_nodes = check_output(['hdfs', 'getconf', '-confKey',
                                'dfs.ha.namenodes.' + clusterName])
    nodes = ha_ns_nodes.strip().split(',')
    nodeHosts = []
    for n in nodes:
        nodeHosts.append(get_node_hostport(clusterName, n))
    return nodeHosts

def get_node_hostport(clusterName, nodename):
    hostPort = check_output(
        ['hdfs', 'getconf', '-confKey',
         'dfs.namenode.rpc-address.{0}.{1}'.format(clusterName, nodename)])
    return hostPort.strip()

def is_node_active(nn):
    jmxPort = 50070
    host, port = nn.split(':')
    url = "http://{0}:{1}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus".format(
        host, jmxPort)
    nnstatus = urllib.urlopen(url)
    parsed = json.load(nnstatus)
    return parsed.get('beans', [{}])[0].get('State', '') == 'active'

def get_active_namenode(clusterName):
    for n in get_name_nodes(clusterName):
        if is_node_active(n):
            return n

clusterName = (sys.argv[1] if len(sys.argv) > 1 else None)
if not clusterName:
    raise Exception("Specify cluster name.")

print 'Cluster: {0}'.format(clusterName)
print "Nodes: {0}".format(get_name_nodes(clusterName))
print "Active Name Node: {0}".format(get_active_namenode(clusterName))
From the Java API, you can use HAUtil.getAddressOfActive(fileSystem).
You can do a curl command to find out the Active and Standby Namenodes via Ambari, for example:
curl -u username -H "X-Requested-By: ambari" -X GET \
http://cluster-hostname:8080/api/v1/clusters/CLUSTER_NAME/services/HDFS
replacing CLUSTER_NAME with your cluster's name.
I found the below by simply typing 'hdfs': a couple of helpful commands that could be useful for someone coming here seeking help.
hdfs getconf -namenodes
This above command will give you the service id of the namenode. Say, hn1.hadoop.com
hdfs getconf -secondaryNameNodes
This above command will give you the service id of the available secondary namenodes. Say, hn2.hadoop.com
hdfs getconf -backupNodes
This above command will get you the service id of backup nodes, if any.
hdfs getconf -nnRpcAddresses
This above command will give you info of name service id along with rpc port number. Say, hn1.hadoop.com:8020
On HDFS 2.6.0, this is what worked for me:
ubuntu@platform2:~$ hdfs getconf -confKey dfs.ha.namenodes.arkin-platform-cluster
nn1,nn2
ubuntu@platform2:~$ sudo -u hdfs hdfs haadmin -getServiceState nn1
standby
ubuntu@platform2:~$ sudo -u hdfs hdfs haadmin -getServiceState nn2
active
Here is an example of bash code that returns the active name node even if you do not have a local hadoop installation.
It also works faster, as curl calls are usually quicker than hadoop commands.
Checked on Cloudera 7.1.
#!/bin/bash
export nameNode1=myNameNode1
export nameNode2=myNameNode2
active_node=''
T1=`curl --silent --insecure --request GET https://${nameNode1}:9871/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus | grep "\"State\" : \"active\"" | wc -l`
if [ $T1 == 1 ]
then
  active_node=${nameNode1}
else
  T1=`curl --silent --insecure --request GET https://${nameNode2}:9871/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus | grep "\"State\" : \"active\"" | wc -l`
  if [ $T1 == 1 ]
  then
    active_node=${nameNode2}
  fi
fi
echo "Active Dev Name node : $active_node"
#!/usr/bin/python
import subprocess
import sys
import os, errno

def getActiveNameNode():
    cmd_string = "hdfs getconf -namenodes"
    process = subprocess.Popen(cmd_string, shell=True, stdout=subprocess.PIPE)
    out, err = process.communicate()
    nameNodes = out.split()
    for val in nameNodes:
        cmd_str = "hadoop fs -test -e hdfs://" + val
        process = subprocess.Popen(cmd_str, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = process.communicate()
        # the test command succeeds (empty stderr) only against the active namenode
        if err == "":
            return val

def main():
    out = getActiveNameNode()
    print(out)

if __name__ == '__main__':
    main()
You can simply use the below command; I have tested this in hadoop 3.0.
hdfs haadmin -getAllServiceState
It returns the state of all the NameNodes.
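For example (hostnames hypothetical):
$ hdfs haadmin -getAllServiceState
nn1.example.com:8020        active
nn2.example.com:8020        standby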
more /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1353,namenode1357</value>
</property>
hdfs@:/home/ubuntu$ hdfs haadmin -getServiceState namenode1353
active
hdfs@:/home/ubuntu$ hdfs haadmin -getServiceState namenode1357
standby

Send to github via curl command line (Windows)

I asked a related question and realized I wasn't asking the right question (i.e., this isn't about git).
The question is how to push a project to GitHub without first creating the project in the cloud, using R. Currently you can do this from the git command line in RStudio using info from this question.
Now I'm trying to turn that into R code from a Windows machine (Linux was easy). I'm stuck at the first step, using curl from the command line via an R system call. I'll show what I have and then the error message (thanks to SimonO101 for getting me this far). Per his comments below, I've edited heavily to reflect the problem as it stands:
R Code:
repo <- "New"
user <- "trinker"
password <- "password"
url <- "http://curl.askapache.com/download/curl-7.23.1-win64-ssl-sspi.zip"
tmp <- tempfile( fileext = ".zip" )
download.file(url,tmp)
unzip(tmp, exdir = tempdir())
system(paste0(tempdir(), "/curl http://curl.haxx.se/ca/cacert.pem -o " ,
tempdir() , "/curl-ca-bundle.crt"))
cmd1 <- paste0(tempdir(), "/curl -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd1)
cmd2 <- paste0(tempdir(), "/curl -k -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd2)
Error Messages (same for both approaches):
> system(cmd1)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    12    0     0  100    12      0     24 --:--:-- --:--:-- --:--:--    30
100    47  100    35  100    12     65     22 --:--:-- --:--:-- --:--:--    83
{
  "message": "Bad credentials"
}
I know all the files are there because:
> dir(tempdir())
[1] "curl-ca-bundle.crt" "curl.exe" "file1aec62fa980.zip" "file1aec758c1415.zip"
It can't be my password or user name because this works on Linux Mint (the only difference is the part before curl):
repo <- "New"
user <- "trinker"
password <- "password"
cmd1 <- paste0("curl -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd1)
NOTE: Windows 7 machine. R 2.14.1
EDIT - After OP offered bounty
OK, it turns out it has to do with some crazy Windows character escaping on the command line. Essentially, the problem was that we were passing improperly formatted JSON requests to GitHub.
You can use shQuote to properly format the offending portion of the curl request for Windows. We can test the platform type to see if we need to include special formatting for Windows cases, like so:
repo <- "NewRepository"
json <- paste0(" { \"name\":\"" , repo , "\" } ") #string we desire formatting
os <- .Platform$OS.type #check if we are on Windows
if( os == "windows" ){
json <- shQuote(json , type = "cmd" )
cmd1 <- paste0( tempdir() ,"/curl -i -u \"" , user , ":" , password , "\" https://api.github.com/user/repos -d " , json )
}
This worked on my Windows 7 box without any problems. I can update the GitHub script if you want?
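For reference, the fully escaped command that cmd1 should now produce looks roughly like this (USER, PASSWORD and REPO are placeholders); a successful call returns HTTP/1.1 201 Created:
curl -i -u "USER:PASSWORD" https://api.github.com/user/repos -d "{ \"name\":\"REPO\" }"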
OLD ANSWER
I did some digging around here and here, and it might be that the answer to your problem is to update the curl-ca-bundle. It may also help on Windows to get R to use the internet2.dll.
repo <- "New"
user <- "trinker"
password <- "password"
url <- "http://curl.askapache.com/download/curl-7.23.1-win64-ssl-sspi.zip"
tmp <- tempfile( fileext = ".zip" )
download.file(url,tmp)
unzip(tmp, exdir = tempdir())
system( paste0( "curl http://curl.haxx.se/ca/cacert.pem -o " , tempdir() , "/curl-ca-bundle.crt" ) )
system( paste0( tempdir(),"/curl", " -u \'USER:PASS\' https://api.github.com/user/repos -d \'{\"name\":\"REPO\"}\'") )
Again, I can't test this as I don't have access to my Windows box, but updating the certificate authority file seems to have helped a few other people. From the curl website, the Windows version of curl should look for the curl-ca-bundle.crt file in the following order:
application's directory
current working directory
Windows System directory (e.g. C:\windows\system32)
Windows Directory (e.g. C:\windows)
all directories along %PATH%

How do I determine if an application is running using wsadmin Jython script?

I can get a list of installed applications, but how do I get the status using Jython?
I don't think there is any direct method to get the application's running status. You can get the object from AdminControl using the following code:
serverstatus = AdminControl.completeObjectName('type=Application,name=your_application_name,*')
print serverstatus
If serverstatus returns an empty string, the application is not running; if the application is running, its details will be printed.
Here is what I use based on Snehan's answer.
import string
def getAppStatus(appName):
    # If objectName is blank, then the application is not running.
    objectName = AdminControl.completeObjectName('type=Application,name=' + appName + ',*')
    if objectName == "":
        appStatus = 'Stopped'
    else:
        appStatus = 'Running'
    return appStatus
def appStatusInfo():
    appsString = AdminApp.list()
    appList = string.split(appsString, '\r\n')
    print '============================'
    print ' Status | Application '
    print '============================'
    # Print apps and their status
    for x in appList:
        print getAppStatus(x) + ' | ' + x
    print '============================'

appStatusInfo()
Sample output
============================
Status | Application
============================
Running | DefaultApplication
Running | IBMUTC
Stopped | some-ear
Running | another-ear
============================
The following IBM documentation should help:
WAS InfoCenter: Querying the application state using wsadmin scripting
IBM Technote: Listing enterprise application status using wsadmin script
To summarize, if the application is running on an application server, an Application MBean will be registered. In order to determine if the application is running, you can query for the presence of these MBeans.
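A script like the ones above is typically run through the wsadmin tool, e.g. (script filename hypothetical):
wsadmin.sh -lang jython -f appStatus.py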
Matthieu Cormier's script needs one more modification.
Here we go.
This version works with any line separator; generally AdminApp.list() will use "\n" as the line separator, but it depends on the platform.
import string

def getAppStatus(appName):
    # If objectName is blank, then the application is not running.
    objectName = AdminControl.completeObjectName('type=Application,name=' + appName + ',*')
    if objectName == "":
        appStatus = 'Stopped'
    else:
        appStatus = 'Running'
    return appStatus

def appStatusInfo():
    Apps = AdminApp.list().split(java.lang.System.getProperty("line.separator"))
    print '============================'
    print ' Status | Application '
    print '============================'
    # Print apps and their status
    for x in Apps:
        print "X value", x
        print getAppStatus(x) + ' | ' + x
    print '============================'

appStatusInfo()
