Worker is restarted continuously with ClosedChannelException in supervisor - apache-storm

A worker on one of the supervisors is restarted continuously and throws a ClosedChannelException. If I run the same topology on another Storm cluster, in a different environment, it runs without any errors.
Below is the error I can see in the Storm UI.
java.lang.RuntimeException: java.nio.channels.ClosedChannelException
at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:103)
at org.apache.storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69)
at org.apache.storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:129)
at org.apache.storm.daemon.executor$fn__7990$fn__8005$fn__8036.invoke(executor.clj:648)
at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:484)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:127)
at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79)
at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:75)
at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:65)
at org.apache.storm.kafka.PartitionManager.<init>(PartitionManager.java:94)
at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98)
... 6 mo
Can anyone please help me find the exact issue? Please let me know if you need any more information.

I faced this issue, and the problem was that the ZooKeeper host names were not being resolved from the worker host.
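A quick way to test this from the worker host is to try resolving each host name the topology depends on. The following is only a minimal sketch: the function name and the injectable `resolve` parameter are illustrative choices, not part of Storm, and the host names you pass in have to come from your own topology configuration.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.getaddrinfo):
    """Return the host names that cannot be resolved on this machine.

    `resolve` defaults to the system resolver (DNS plus /etc/hosts);
    it is a parameter only so the check can be exercised without a network.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host, None)
        except socket.gaierror:
            # Name resolution failed for this host.
            failed.append(host)
    return failed
```

Run on the worker host, `unresolved_hosts([...])` with the ZooKeeper (and Kafka broker) host names returns an empty list when everything resolves. Any name it returns needs a DNS entry or an /etc/hosts line on that machine.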

Related

After kerberizing an HDP cluster, the JournalNode service fails to start

Below is the error I get after the cluster was kerberized.
Exception in thread "main" java.io.IOException: Login failure for jn/keystone.mwbsys.com#EXAMPLE.COM from keytab /etc/security/keytabs/jn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
I added keystone.mwbsys.com to the /etc/hosts file and then restarted the JournalNodes, which fixed the issue. I know this is not a permanent solution, but it worked.

Debugging Spark standalone cluster with IDEA

I am trying to debug a Spark application on a local cluster with one master node and one worker node. I have successfully set up the master and worker nodes using the Spark standalone cluster manager with start-master.sh, and it works. But I want to see how a Spark application works inside the cluster, so I want to start the cluster in debug mode. I read the start-master.sh code, mocked the arguments, and started the org.apache.spark.deploy.master.Master main method. Unfortunately it throws a NoClassDefFoundError and I can't open the web UI. I want to know where the problem is.
The error is:
Exception in thread "dispatcher-event-loop-1" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:81)
at org.apache.spark.deploy.master.ui.MasterWebUI.initialize(MasterWebUI.scala:48)
at org.apache.spark.deploy.master.ui.MasterWebUI.<init>(MasterWebUI.scala:43)
at org.apache.spark.deploy.master.Master.onStart(Master.scala:131)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:122)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:216)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.util.thread.ThreadPool
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
My debug configuration was attached as a screenshot (image not available).
Thanks!
I would suggest not using a Spark standalone cluster for debugging at all.
You can run Spark locally in your IDE with breakpoints.
Spark has a local mode that can use the local filesystem in place of HDFS.
See the following link for more on how to write test cases for local mode in Spark:
http://bytepadding.com/big-data/spark/word-count-in-spark/

Hbase shell gives NativeException: java.lang.ExceptionInInitializerError

I have configured HBase on my local machine. Below is my jps output:
$ jps
17389 HQuorumPeer
16554 TaskTracker
17894 Jps
16362 JobTracker
15786 NameNode
16078 DataNode
16267 SecondaryNameNode
But when I run
$ hbase shell
it gives me the following error:
NativeException: java.lang.ExceptionInInitializerError:
java.lang.reflect.InvocationTargetException
initialize at /home/rahul/hbase-1.2.4/lib/ruby/hbase/hbase.rb:42
(root) at /home/rahul/hbase-1.2.4/bin/hirb.rb:131
Can anyone help me solve this error? I have spent several hours on it. Help is really appreciated.
Unfortunately this error is very generic and can occur for a number of reasons. I recently experienced this using the hbase command on version HBase 1.2.0-cdh5.16.1 when the wrong URI was configured in core-site.xml and hbase-site.xml (fs.defaultFS and hbase.rootdir respectively). The only way I diagnosed this was to try connecting programmatically via the Java API (e.g. by following https://www.baeldung.com/hbase), which gave me the full stack trace of the exception that caused the NativeException.
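One check that can be done before going to the Java API is to read both properties and compare them. This is a rough sketch under an assumption that does not hold for every setup: that hbase.rootdir is meant to live on the filesystem named by fs.defaultFS. The helper names are made up for illustration, and a mismatch is only a hint, not proof of the cause.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_property(xml_text, name):
    """Return the <value> of the named <property> in a Hadoop-style XML config, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def same_filesystem(default_fs, rootdir):
    """True when both URIs have the same scheme and authority (host:port)."""
    a, b = urlparse(default_fs), urlparse(rootdir)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)
```

To use it, pass the text of core-site.xml and hbase-site.xml to `read_property` with the names `fs.defaultFS` and `hbase.rootdir`, then hand the two values to `same_filesystem`. A `False` result (for example, a different port) is worth investigating first.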

Failed to get broadcast_1_piece0 of broadcast_1 in Spark Streaming job

I am running Spark jobs on YARN in cluster mode. The job gets its messages from a Kafka direct stream. I am using broadcast variables and checkpointing every 30 seconds. When I start the job for the first time, it runs fine without any issue. If I kill the job and restart it, it throws the exception below in the executor upon receiving a message from Kafka:
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1178)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:177)
at net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:1)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Does anyone have an idea how to resolve this error?
Spark version: 1.5.0
CDH 5.5.1
In my experience, when only the first run works, the cause has always involved the checkpoint data. Moreover, the checkpoint is only used once there is something to process, which is why the failure appears on the first message from Kafka.
I suggest you check whether the job is really dead; the process may still be running on the machine that executed it.
Try running a simple ps -fe and see if something is still running. If two processes try to use the same checkpoint folder, it will always fail.
Hope this helps.
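The ps -fe check can be scripted along these lines. It is just a sketch: the helper names are illustrative, and the marker string you search for (for instance the main class name of your job) is something you have to pick yourself.

```python
import subprocess


def list_processes():
    """Run `ps -fe` and return its standard output as text."""
    result = subprocess.run(["ps", "-fe"], capture_output=True, text=True, check=True)
    return result.stdout


def matching_processes(ps_output, marker):
    """Return the ps lines (header skipped) whose text contains marker."""
    lines = ps_output.splitlines()[1:]
    return [line for line in lines if marker in line]
```

Calling `matching_processes(list_processes(), "LogDataStreamProcessor")` on the machine that ran the job lists any leftover process whose command line mentions that class. An empty list means nothing matching is still running there.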

Storm-YARN : Application container fails to launch

I am running a Storm (Trident) topology that reads Avro from Kafka and writes the records to HBase.
The topology runs as expected in LocalCluster mode, but when using StormSubmitter I'm facing the issues below.
In distributed Hadoop mode I'm getting the error below [1] while launching the YARN application.
In Hadoop local mode (one box only), YARN spawns the Nimbus server and storm-ui, but no supervisors are running to execute the spouts/bolts in the topology. I guess the reason might be insufficient memory (4G to run the topology + hbase, hdfs, kafka, zookeeper etc.).
Can you help me understand the reason for this container failure? There are no errors or info messages in the application logs.
[1] The YARN container fails to launch with the error below when running:
storm-yarn launch /homext/storm-yarn.yml --queue default -appname storm-yarn-demo --stormZip /tmp/storm-0.9.zip
Application application_1415038356032_0304 failed 2 times due to AM Container for appattempt_1415038356032_0304_000002 exited with exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
.Failing this attempt.. Failing the application.
This log is insufficient to diagnose the problem. All it says is that the container failed to launch. You should look into the container output. Check ${yarn.nodemanager.log-dirs} on the nodes: there will be an application folder (application_1415038356032_0304), and inside it a container folder for each attempt (...1415038356032_0304_000002) containing the stderr, stdout and syslog of that attempt. Read those and you'll likely identify the problem.
If these don't exist, look in ${yarn.nodemanager.local-dirs}, where you'll find the container launch script (I think it is called launch_container.sh) for this app/container attempt. It contains the actual command used to launch the container. Try running that from the shell prompt and see what you get.
If it fails at an early stage then the logs can be found in HDFS under:
/tmp/logs/<user>/logs/
This should give enough information to diagnose the problem.
In my case I found a log file:
/tmp/logs/hdfs/logs/application_1426618997634_0004/vagrant-cdh-node4_8041
With some errors like:
/bin/bash: /usr/lib/jvm/java-7-oracle/bin/java: No such file or directory
And fixing the JAVA_HOME environment variable did the trick.
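Exit code 127 is what a shell returns when the command it was asked to run cannot be found, which fits the missing java binary above. A small check like the following, run on each node, can confirm whether JAVA_HOME points at a usable binary. It is a sketch only, and the function name and messages are made up for illustration.

```python
import os


def java_binary_problem(java_home):
    """Return a description of what is wrong with java_home, or None if bin/java is usable."""
    if not java_home:
        return "JAVA_HOME is not set"
    java = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java):
        return f"{java}: no such file"
    if not os.access(java, os.X_OK):
        return f"{java}: not executable"
    return None
```

For example, `java_binary_problem(os.environ.get("JAVA_HOME"))` returns `None` when the binary is in place, and otherwise a short message naming the path that was checked. Note that the environment seen by the NodeManager may differ from your login shell, so the value it actually uses is the one to test.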
