Configuring Prometheus JMX exporter for Hadoop2 - hadoop

I am trying to scrape metrics from the following Hadoop 2 daemons, running on an EC2 instance, using the Prometheus JMX exporter:
hadoop namenode
hadoop datanode
yarn resourcemanager
yarn nodemanager
I am trying to run the JMX exporter as a Java agent with all four daemons. For this I have added extra Java options in hadoop-env.sh and yarn-env.sh:
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
A sample prometheus_config.yml for the ResourceManager metric NumAllSources is as follows:
rules:
  - pattern: Hadoop<service=ResourceManager, name=MetricsSystem, sub=Stats><>NumAllSources
    name: sources
    labels:
      app_id: "hadoop_rm"
I get the following exception when I restart the ResourceManager or the other daemons with the new configs and Java opts:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:382)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:397)
Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_scrape_duration_seconds
at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:54)
at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:128)
Any suggestions on how to fix this?

While #chanhou's solution will work, I wanted to keep my edits in hadoop-env.sh, so I went with:
if ! grep -q jmx_prometheus_javaagent <<<"$HADOOP_NAMENODE_OPTS"; then
  HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/caesarli/platform/jmx_prometheus_javaagent-0.12.0.jar=11099:/home/caesarli/platform/hadoop-2.8.4/etc/hadoop/jmx-name.yaml"
fi
and similar for the HADOOP_DATANODE_OPTS.
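Either way, once a daemon restarts cleanly you can check that the agent is serving metrics on the port you configured (11099 in the snippet above; jmx_scrape_duration_seconds, the name from the original error, is a metric the exporter itself registers):
$ curl -s http://localhost:11099/metrics | grep jmx_scrape_duration_seconds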

That is because the -javaagent option is declared multiple times in $HADOOP_OPTS. When you call /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode, hadoop-daemon.sh eventually calls /usr/local/hadoop/bin/hdfs to start the related service.
During that process hadoop-config.sh is sourced multiple times, and if you echo $HADOOP_OPTS in the shell script /usr/local/hadoop/bin/hdfs you will find multiple -javaagent entries there.
A workaround is to declare HADOOP_OPTS="$HADOOP_OPTS -javaagent:..." in /usr/local/hadoop/bin/hdfs, so that only one -javaagent appears in HADOOP_OPTS.
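A minimal sketch of that workaround, reusing the agent path from the question and guarding against the repeated sourcing, might look like this inside /usr/local/hadoop/bin/hdfs:
# append the agent only if it is not already present in HADOOP_OPTS
case "$HADOOP_OPTS" in
  *jmx_prometheus_javaagent*) ;;  # already added by an earlier source, do nothing
  *) HADOOP_OPTS="$HADOOP_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml" ;;
esac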

I think this is because you are using the same port (9102) for all the registrations; changing the ports will help.
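For example (the specific port numbers are arbitrary choices), the exports from the question could give each daemon its own agent port:
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9103:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9104:/home/ec2-user/jmx_exporter/prometheus_config.yml"
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9105:/home/ec2-user/jmx_exporter/prometheus_config.yml"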

Related

What is this error on spark-submit by HDFS HA yarn

here is my error log:
$ /spark-submit --master yarn --deploy-mode cluster pi.py
...
2021-12-23 01:31:04,330 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1895)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:860)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
, while invoking ClientNamenodeProtocolTranslatorPB.setPermission over master/172.17.0.2:8020. Trying to failover immediately.
...
Why do I get this error?
NOTE: The Spark master runs on 'master', so the spark-submit command is run on 'master'.
NOTE: The Spark workers run on 'worker1', 'worker2', and 'worker3'.
NOTE: The ResourceManager runs on 'master' and 'master2'.
ADD: When the above error is logged, master2's DFSZKFailoverController disappears from the jps output.
ADD: When the above error is logged, master's NameNode disappears from the jps output.
It happens when Spark is unable to access HDFS.
If configured correctly, the HDFS client will handle the StandbyException by failing over to the other NameNode in the HA pair and then reattempting the operation.
Replace the NameNode URI with the active NameNode's address manually and check whether you still get the same error; if you don't, HA is not configured properly.
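As a quick check (the service IDs nn1/nn2 here are examples; use the names from your dfs.ha.namenodes setting), you can ask each NameNode for its HA state:
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2
One should report active and the other standby; anything else points at the failover controller (ZKFC) setup, consistent with the DFSZKFailoverController disappearing from jps above.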

Hbase shell gives NativeException: java.lang.ExceptionInInitializerError

I have configured HBase on my local machine; below are my jps tasks:
$ jps
17389 HQuorumPeer
16554 TaskTracker
17894 Jps
16362 JobTracker
15786 NameNode
16078 DataNode
16267 SecondaryNameNode
But when I run
$ hbase shell
it gives me the following error:
NativeException: java.lang.ExceptionInInitializerError:
java.lang.reflect.InvocationTargetException
initialize at /home/rahul/hbase-1.2.4/lib/ruby/hbase/hbase.rb:42
(root) at /home/rahul/hbase-1.2.4/bin/hirb.rb:131
Can anyone help me solve this error? I have wasted several hours on it. Help is really appreciated.
Unfortunately this error is very generic and can occur for a number of reasons. I recently experienced this using the hbase command on version HBase 1.2.0-cdh5.16.1 when the wrong URI was configured in core-site.xml and hbase-site.xml (fs.defaultFS and hbase.rootdir respectively). The only way I diagnosed this was to try connecting programmatically via the Java API (e.g. by following https://www.baeldung.com/hbase), which gave me the full stack trace of the exception that caused the NativeException.
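For illustration, a consistent pair might look like the following; the host and port are placeholder values, and the point is only that hbase.rootdir must live on the same URI as fs.defaultFS:
<!-- core-site.xml (host/port are example values) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>
<!-- hbase-site.xml: must point at the same NameNode URI -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>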

Running Spark on Yarn Client

I have recently set up a multinode Hadoop HA (NameNode & ResourceManager) cluster (3 nodes). The installation is complete and all daemons run as expected.
Daemons on NN1:
2945 JournalNode
3137 DFSZKFailoverController
6385 Jps
3338 NodeManager
22730 QuorumPeerMain
2747 DataNode
3228 ResourceManager
2636 NameNode
Daemons on NN2:
19620 Jps
3894 QuorumPeerMain
16966 ResourceManager
16808 NodeManager
16475 DataNode
16572 JournalNode
17101 NameNode
16702 DFSZKFailoverController
Daemons on DN1:
12228 QuorumPeerMain
29060 NodeManager
28858 DataNode
29644 Jps
28956 JournalNode
I am interested in running Spark jobs on my YARN setup.
I have installed Scala and Spark on NN1 and I can successfully start Spark by issuing the following command:
$ spark-shell
Now, I have no knowledge about Spark. I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
Should I install Spark & Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode? If not, how can I submit Spark jobs from the NN1 (primary NameNode) host?
I have copied the Spark assembly JAR to HDFS, as suggested in a blog I read:
-rw-r--r-- 3 hduser supergroup 187548272 2016-04-04 15:56 /user/spark/share/lib/spark-assembly.jar
I have also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client but I end up with the error below; I have no idea whether I am doing it all correctly or need other settings to be done first.
[hduser#ptfhadoop01v spark-1.6.0]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 2 --queue thequeue lib/spark-examples*.jar 10
16/04/04 17:27:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/04 17:27:51 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '2').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to specify the number of executors
- Or set SPARK_EXECUTOR_INSTANCES
- spark.executor.instances to configure the number of instances in the spark config.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/04 17:27:58 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[hduser#ptfhadoop01v spark-1.6.0]$
Please help me resolve this, and explain how to run Spark on YARN in client or cluster mode.
Now, I have no knowledge about Spark. I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
It's highly recommended that you read the official documentation of Spark on YARN at http://spark.apache.org/docs/latest/running-on-yarn.html.
You can use spark-shell with --master yarn to connect to YARN. You need to have the proper configuration files, e.g. yarn-site.xml, on the machine you run spark-shell from.
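A minimal sketch of that (the path is an example; point it at the directory that holds your yarn-site.xml):
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
spark-shell --master yarn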
Should I install Spark & Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode?
No. You don't have to install anything on YARN, since Spark will distribute the necessary files for you.
If not, how can I submit Spark jobs from the NN1 (primary NameNode) host?
Start with spark-shell --master yarn and see if you can execute the following code:
(0 to 5).toDF.show
If you see a table-like output, you're done. Otherwise, provide the error(s).
I have also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client but I end up with the error below; I have no idea whether I am doing it all correctly or need other settings to be done first.
Remove the SPARK_JAR variable. Don't use it, as it's not needed and might cause trouble. Read the official documentation at http://spark.apache.org/docs/latest/running-on-yarn.html to understand the basics of Spark on YARN and beyond.
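For example (a sketch; the HDFS path is the one from the question, and spark.yarn.jar is the Spark 1.x property the deprecation warning itself points to):
# remove the deprecated variable from the current shell, and delete the export from ~/.bashrc
unset SPARK_JAR
# optionally point Spark 1.x at the uploaded assembly instead, in conf/spark-defaults.conf:
# spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly.jar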
Adding this property to hdfs-site.xml solved the issue (the mycluster suffix must match the name in your dfs.nameservices setting):
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
In client mode you'd run it something like the following, for a simple word count example:
spark-submit --class org.sparkexample.WordCount --master yarn-client wordcount-sample-plain-1.0-SNAPSHOT.jar input.txt output.txt
I think you got the spark-submit command wrong there. There is no --master yarn set up.
I would highly recommend using an automated provisioning tool to set up your cluster quickly instead of a manual approach.
Refer to Cloudera or Hortonworks tools. You can use them to get set up in no time and submit jobs easily, without doing all this configuration manually.
Reference: https://hortonworks.com/products/hdp/

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When trying to execute a Sqoop job which has my Hadoop program passed as a jar file in the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs run by the same Hadoop user execute successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
Here is how I solved it. We use CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, from the command line:
hadoop jar...
The problem was that the new hosts didn't get the so-called "YARN gateway". Cloudera calls the pack of configs related to a service, copied to /etc/hadoop/conf, a "gateway".
So I just clicked "Deploy Client Configuration" in the CM UI. The YARN client configuration was copied to each YARN NodeManager node, and that solved the problem.

hbase-master fails to start

I am trying to set up a Hadoop development environment. I am using CDH4 and following the installation instructions on their website, https://ccp.cloudera.com/display/CDH4DOC/.
I got to the point where I was able to install CDH4 in pseudo-distributed mode, and I am following the part regarding "Components that require additional configuration".
I have installed the HBase master package, but when I try to start the service I get the following error:
$ sudo /sbin/service hbase-master start
starting master, logging to /var/log/hbase/hbase-hbase-master-slc01euu.out
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
I suppose it has something to do with some environment variable (I believe HADOOP_HOME), but I am not sure where to look, since all the previous processes (NameNode, DataNode, JobTracker, TaskTracker) started with no problem.
When I search for the HADOOP_HOME variable it says it is undefined.
Do you guys have any idea about how I could solve this?
Thanks a lot in advance.
Please set HADOOP_HOME to your Hadoop installation directory,
e.g.: export HADOOP_HOME=/usr/hadoop
Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
Place a copy of hdfs-site.xml (and core-site.xml) under the ${HBASE_HOME}/conf folder.
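A minimal sketch of both suggestions together (the paths are examples; adjust to your layout):
# in ${HBASE_HOME}/conf/hbase-env.sh
export HADOOP_HOME=/usr/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:${HADOOP_CONF_DIR}

# and copy the HDFS client configs next to HBase's own:
cp ${HADOOP_CONF_DIR}/core-site.xml ${HADOOP_CONF_DIR}/hdfs-site.xml ${HBASE_HOME}/conf/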
