What is this error on spark-submit with HDFS HA YARN?

Here is my error log:
$ /spark-submit --master yarn --deploy-mode cluster pi.py
...
2021-12-23 01:31:04,330 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1895)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:860)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
, while invoking ClientNamenodeProtocolTranslatorPB.setPermission over master/172.17.0.2:8020. Trying to failover immediately.
...
Why do I get this error?
NOTE: The Spark master runs on 'master', so the spark-submit command is run on 'master'.
NOTE: Spark workers run on 'worker1', 'worker2', and 'worker3'.
NOTE: The ResourceManager runs on 'master' and 'master2'.
ADDED: When the above error is logged, master2's DFSZKFailoverController has disappeared from the jps output.
ADDED: When the above error is logged, master's NameNode has disappeared from the jps output.

This happens when Spark is unable to access HDFS.
If HA is configured correctly, the HDFS client handles the StandbyException by failing over to the other NameNode and then retrying the operation.
Replace the nameservice URI with the active NameNode's URI manually and check whether you still get the same error; if the error goes away, HA is not configured properly on the client.
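As a quick sanity check from the submitting host, the commands below show which NameNode is currently active and whether client-side failover is wired up. This is only a sketch: the nameservice name mycluster and the NameNode IDs nn1/nn2 are assumptions and must match dfs.nameservices and dfs.ha.namenodes.<nameservice> in your hdfs-site.xml.
# Show which NameNode is active and which is standby (IDs are assumptions)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Confirm the client has a failover proxy provider for the nameservice
hdfs getconf -confKey dfs.client.failover.proxy.provider.mycluster
# A read through the logical nameservice URI should succeed no matter which
# NameNode is active; if it fails, the client-side HA config is broken
hdfs dfs -ls hdfs://mycluster/
If both NameNodes report standby, or the active NameNode process has died (as the jps observations above suggest), a WRITE operation has nowhere to go, which matches the StandbyException and the immediate failover attempt in the log.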

Related

Hbase shell gives NativeException: java.lang.ExceptionInInitializerError

I have configured HBase on my local machine; below are my jps processes:
$ jps
17389 HQuorumPeer
16554 TaskTracker
17894 Jps
16362 JobTracker
15786 NameNode
16078 DataNode
16267 SecondaryNameNode
But when I run
$ hbase shell
it gives me the following error:
NativeException: java.lang.ExceptionInInitializerError:
java.lang.reflect.InvocationTargetException
initialize at /home/rahul/hbase-1.2.4/lib/ruby/hbase/hbase.rb:42
(root) at /home/rahul/hbase-1.2.4/bin/hirb.rb:131
Can anyone help me solve this error? I have wasted several hours on it. Help is really appreciated.
Unfortunately this error is very generic and can occur for a number of reasons. I recently experienced it using the hbase command on HBase 1.2.0-cdh5.16.1 when the wrong URI was configured in core-site.xml and hbase-site.xml (fs.defaultFS and hbase.rootdir respectively). The only way I diagnosed this was to try connecting programmatically via the Java API (e.g. by following https://www.baeldung.com/hbase), which gave me the full stack trace of the exception that caused the NativeException.
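For reference, a quick way to compare the two values (fs.defaultFS and hbase.rootdir) on a node is sketched below; the hbase-site.xml path is a typical location and is an assumption, so adjust it to your installation.
# fs.defaultFS as the Hadoop client resolves it
hdfs getconf -confKey fs.defaultFS
# hbase.rootdir as configured for HBase (the config path is an assumption)
grep -A 1 'hbase.rootdir' /etc/hbase/conf/hbase-site.xml
# The scheme, host and port of hbase.rootdir should match fs.defaultFS,
# e.g. hdfs://namenode-host:8020 and hdfs://namenode-host:8020/hbase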

How to submit a spark job on a remote master node in yarn client mode?

I need to submit Spark apps/jobs to a remote Spark cluster. I currently have Spark on my machine, plus the IP address of the master node, and I want to use yarn-client mode. By the way, my machine is not in the cluster.
I submit my job with this command
./spark-submit --class SparkTest --deploy-mode client /home/vm/app.jar
I have the address of my master hardcoded into my app in the form
val spark_master = spark://IP:7077
And yet all I get is the error
16/06/06 03:04:34 INFO AppClient$ClientEndpoint: Connecting to master spark://IP:7077...
16/06/06 03:04:34 WARN AppClient$ClientEndpoint: Failed to connect to master IP:7077
java.io.IOException: Failed to connect to /IP:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /IP:7077
Or instead if I use
./spark-submit --class SparkTest --master yarn --deploy-mode client /home/vm/test.jar
I get
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Do I really need to have Hadoop configured on my workstation as well? All the work will be done remotely and this machine is not part of the cluster.
I am using Spark 1.6.1.
First of all, if you are setting conf.setMaster(...) in your application code, it takes the highest precedence (over the --master argument). If you want to run in yarn-client mode, do not use MASTER_IP:7077 in the application code. Instead, supply the Hadoop client configuration files to your driver as follows.
Set the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR to point to the directory that contains the client configurations.
http://spark.apache.org/docs/latest/running-on-yarn.html
Depending on which Hadoop features your Spark application uses, some of these config files will be used to look up configuration. If you are using Hive (through HiveContext in spark-sql), it will look for hive-site.xml; hdfs-site.xml is used to look up the NameNode coordinates for reading from and writing to HDFS in your job.
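A minimal sketch of setting this up on the submitting workstation is shown below; the configuration directory path is a placeholder, not a value from the question.
# Directory holding core-site.xml, hdfs-site.xml and yarn-site.xml copied
# from the cluster (this path is an assumption)
export HADOOP_CONF_DIR=/home/vm/hadoop-conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR
# Let --master yarn resolve the ResourceManager from yarn-site.xml instead
# of hardcoding spark://IP:7077 in the application
./spark-submit --class SparkTest --master yarn --deploy-mode client /home/vm/app.jar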

Running Spark on Yarn Client

I have recently set up a multinode Hadoop HA (NameNode & ResourceManager) cluster (3 nodes). The installation is complete and all daemons run as expected.
Daemon in NN1 :
2945 JournalNode
3137 DFSZKFailoverController
6385 Jps
3338 NodeManager
22730 QuorumPeerMain
2747 DataNode
3228 ResourceManager
2636 NameNode
Daemon in NN2 :
19620 Jps
3894 QuorumPeerMain
16966 ResourceManager
16808 NodeManager
16475 DataNode
16572 JournalNode
17101 NameNode
16702 DFSZKFailoverController
Daemon in DN1 :
12228 QuorumPeerMain
29060 NodeManager
28858 DataNode
29644 Jps
28956 JournalNode
I am interested in running Spark jobs on my YARN setup.
I have installed Scala and Spark on NN1 and I can successfully start Spark by issuing the following command:
$ spark-shell
Now, I have no knowledge about Spark; I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
Should I install Spark and Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode? If not, how can I submit Spark jobs from the NN1 (primary NameNode) host?
I have copied the Spark assembly JAR to HDFS as suggested in a blog I read:
-rw-r--r-- 3 hduser supergroup 187548272 2016-04-04 15:56 /user/spark/share/lib/spark-assembly.jar
I also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client but I ended up with the error below; I have no idea whether I am doing it all correctly or whether other settings need to be done first.
[hduser@ptfhadoop01v spark-1.6.0]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 2 --queue thequeue lib/spark-examples*.jar 10
16/04/04 17:27:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/04 17:27:51 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '2').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to specify the number of executors
- Or set SPARK_EXECUTOR_INSTANCES
- spark.executor.instances to configure the number of instances in the spark config.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/04 17:27:58 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[hduser@ptfhadoop01v spark-1.6.0]$
Please help me resolve this and explain how to run Spark on YARN in client or cluster mode.
Now, I have no knowledge about Spark; I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
It's highly recommended that you read the official documentation of Spark on YARN at http://spark.apache.org/docs/latest/running-on-yarn.html.
You can use spark-shell with --master yarn to connect to YARN. You need to have the proper configuration files on the machine you run spark-shell from, e.g. yarn-site.xml.
Should I install Spark and Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode?
No. You don't have to install anything on YARN since Spark will distribute the necessary files for you.
If not, how can I submit Spark jobs from the NN1 (primary NameNode) host?
Start with spark-shell --master yarn and see if you can execute the following code:
(0 to 5).toDF.show
If you see a table-like output, you're done. Else, provide the error(s).
I also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client but I ended up with the error below; I have no idea whether I am doing it all correctly or whether other settings need to be done first.
Remove the SPARK_JAR variable. Don't use it, as it's not needed and might cause trouble. Read the official documentation at http://spark.apache.org/docs/latest/running-on-yarn.html to understand the basics of Spark on YARN and beyond.
Adding this property to hdfs-site.xml solved the issue:
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
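After adding the property, a quick client-side check is sketched below; mycluster is assumed to match the value of dfs.nameservices in your setup, so the key suffix must be adjusted accordingly.
# Confirm the failover proxy provider is visible to HDFS clients
hdfs getconf -confKey dfs.client.failover.proxy.provider.mycluster
# A listing through the logical nameservice URI proves the client can find
# the active NameNode and fail over when needed
hdfs dfs -ls hdfs://mycluster/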
In client mode you'd run something like the following for a simple word count example:
spark-submit --class org.sparkexample.WordCount --master yarn-client wordcount-sample-plain-1.0-SNAPSHOT.jar input.txt output.txt
I think you got the spark-submit command wrong there. There is no --master yarn set up.
I would highly recommend using an automated provisioning tool to set up your cluster quickly instead of a manual approach.
Refer to the Cloudera or Hortonworks tools. You can use them to get set up in no time and submit jobs easily without doing all of this configuration manually.
Reference: https://hortonworks.com/products/hdp/

Job via Oozie HDP 2.1 not creating job.splitmetainfo

When trying to execute a Sqoop job which has my Hadoop program passed as a jar file in the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs with the same Hadoop user are executed successfully.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1541)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1396)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1363)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:976)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1241)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1041)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1452)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1448)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1381)
So here is the way I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, using the command line:
hadoop jar...
The problem was that the new hosts didn't get the so-called "YARN gateway". Cloudera uses "gateway" to name the pack of configs related to a service that gets copied to /etc/hadoop/conf. So I just clicked "Deploy Client Configuration" in the CM UI. The YARN client configuration was copied to each YARN NodeManager node, and that solved the problem.

ZooKeeper exists failed after 3 retries

I am running Hadoop-1.2.1 and HBase-0.94.11 in a pseudo-distributed mode.
Due to a power failure, the Hadoop and HBase setup went down. The next time I restarted my machine and the pseudo-distributed setup, HBase stopped working with the following errors in the HBase shell:
13/11/27 13:53:27 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
13/11/27 13:53:27 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
at org.apache.hadoop.hbase.zookeeper.ClusterId.getId(ClusterId.java:50)
at org.apache.hadoop.hbase.zookeeper.ClusterId.hasId(ClusterId.java:44)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:720)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:789)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:129)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
The following processes are running:
hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode
Are you sure your ZooKeeper process is running? (Your jps listing doesn't show an entry for QuorumPeerMain.) jps may not show all running Java processes; try ps axww | grep QuorumPeerMain.
If your ZooKeeper refuses to start, check its logs to see if there are any stack trace clues.
It's straightforward: the ZooKeeper quorum process is not running. If it had been, there would have been another Java process in the listing:
hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode
xxxxx HQuorumPeer
ZooKeeper is required for an HBase cluster, as it manages the cluster.
Possible solutions:
By default HBase manages ZooKeeper itself, i.e. starting and stopping the ZooKeeper quorum (the cluster of ZooKeeper nodes). To verify the setting, look in the file conf/hbase-env.sh (in your HBase directory); there must be a line:
export HBASE_MANAGES_ZK=true
This basically tells HBase whether it should manage its own instance of ZooKeeper or not. If it is set to false, change it to true.
Also verify the HBase configuration at conf/hbase-site.xml.
The minimum configuration that should work for pseudo-distributed mode is:
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/<yourusername>/zookeeper</value>
  </property>
</configuration>
Now stop HBase, if it has been running:
$ ./bin/stop-hbase.sh
make the necessary changes and start it again:
$ ./bin/start-hbase.sh
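Once HBase is back up, you can confirm that the managed ZooKeeper instance is actually running; a small sketch is below (2181 is the default ZooKeeper client port and is an assumption).
# The HBase-managed ZooKeeper shows up as HQuorumPeer in jps
jps | grep HQuorumPeer
# A healthy ZooKeeper answers 'imok' to the ruok four-letter-word command
echo ruok | nc localhost 2181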
