Apache Storm upgrade from 1.0.3 to 2.2.0 and not all workers are used - apache-storm

I am upgrading from Apache Storm 1.0.3 to 2.2.0 and facing a peculiar issue where the topology tasks are not running on all the workers and instead run only on the same worker as the spout.
I am using shuffleGrouping and have configured 6 workers and a total of 51 executors across 2 bolts and 1 spout. When I start the topology, all 6 workers start as expected, but only the one worker with the spout is doing all the work.
Do I need any special configuration for 2.2.0? All my topologies work fine with version 1.0.3.
As a test I am also running the ExclamationTopology from storm-starter, but I am seeing the same issue of only one worker doing all the work.
Thanks in advance for any help.
Regards,

When upgrading to Storm 2.0.0 I was baffled by the same behavior you are observing. The reason lies in the shuffleGrouping you are using. According to the Storm performance guide:
Load Aware messaging
When load aware messaging is enabled (default), shuffle grouping takes additional factors into consideration for message routing. Impact of this on performance is dependent on the topology and its deployment footprint (i.e. distribution over process and machines). Consequently it is useful to assess the impact of setting topology.disable.loadaware.messaging to true or false for your specific case.
This leads to the spout delegating all tuples to the same worker it finds itself on. To fix this and go back to the old behavior, set topology.disable.loadaware.messaging to true in your topology, e.g. with
conf.put("topology.disable.loadaware.messaging", true);

Related

Heron topologies keep running after deactivating

I'm currently working on Heron & Apache Storm for some resource management and scheduling research.
I noticed that after submitting topologies to Heron, they start running and taking resources, but after deactivating them, it appears that they are still running in the background and take 100% of CPU and RAM! Am I missing something? The way I understand it, and based on the Heron docs, deactivating topologies should halt them and stop them from processing new tuples:
deactivate the topology. Once deactivated, the topology will stop processing but remain running in the cluster.
But when I check the Heron UI after deactivation, the topology is still processing new tuples, because the emit count keeps changing! When I kill it, everything goes back to normal. Is this normal? And if not, what's the problem?
You can try updating the version of Heron to narrow down the problem. I have run Heron 0.17.1 and 0.17.5, and there is no problem with this.
Deactivating a topology stops spouts/sources from pulling any new data, but the bolts will continue to process until all pending data has been drained.
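For illustration, here is a minimal sketch of how deactivation surfaces in the Storm-compatible spout API (which Heron's compatibility layer mirrors): nextTuple() stops being invoked while the topology is deactivated, and the activate()/deactivate() callbacks fire around the transition. The queue source below is hypothetical, and package names and signatures follow recent Apache Storm releases:

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class QueueSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Not called while the topology is deactivated, so no new data is pulled in.
        String msg = pollExternalQueue(); // hypothetical call to the external source
        if (msg != null) {
            collector.emit(new Values(msg));
        }
    }

    @Override
    public void deactivate() {
        // Called once on deactivation; a real spout might pause its queue consumer here.
    }

    @Override
    public void activate() {
        // Called when the topology is (re)activated.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msg"));
    }

    private String pollExternalQueue() {
        return null; // placeholder for a real queue client
    }
}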

Worker node execution in Apache Storm

A Storm topology is deployed using the storm command on machine X. Worker nodes are running on machine Y.
Once the topology has been deployed, it is ready to process tuples, and the workers process requests and responses.
Can anyone please explain how the worker nodes identify their work and data? I am not sure how a worker node has access to code that the developer never deployed to it.
If the topology code is accessible to the worker nodes, can you please tell me where it is located, and also explain how the worker nodes execute it?
First, you're asking a fairly complex question. I've been using Storm for a while and don't understand much about how it works internally. Here is a good article talking about the internals of Storm. It's over two years old but should still be highly relevant. I believe that Netty is now used as the internal messaging transport; it's mentioned as being experimental in the article.
As far as code being run on worker nodes goes, there is a configuration entry in storm.yaml:
storm.local.dir
When you upload the topology, I believe Storm copies the jar to that location. So every worker machine will have the necessary jar in its configured storm.local.dir. So even though you only upload to the one machine, Storm will distribute the jar to the necessary workers. (That's from memory and I'm not in a spot to test it at the moment.)
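As a rough illustration (the paths are typical defaults and version-dependent, not taken from the question), the relevant storm.yaml entry and the place where a supervisor materializes the uploaded topology artifacts look roughly like this:

storm.local.dir: "/var/storm"

# after submission, each supervisor downloads the topology artifacts from Nimbus into:
#   /var/storm/supervisor/stormdist/<topology-id>/stormjar.jar    (the uploaded jar)
#   /var/storm/supervisor/stormdist/<topology-id>/stormcode.ser   (serialized topology)
#   /var/storm/supervisor/stormdist/<topology-id>/stormconf.ser   (topology configuration)

The workers launched by that supervisor then run with stormjar.jar on their classpath, which is how code built on machine X ends up executing on machine Y.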

Apache Storm - Nimbus, Supervisors, Workers getting stopped silently

I am using Apache Storm 0.9.5 and Java 1.7.
I am facing the issue below.
There is a sudden death of all Storm processes.
I ran the topology once and observed it for 1 or 2 days without sending any data.
After that, when I check the processes, they are no longer running.
Also, I have set -XX:MaxPermSize=512m in storm.yaml on all the nodes for nimbus, supervisor and workers.
But when I look at the GC logs, they say:
PSPermGen total 27136K, used 26865K [0x0000000760000000, 0x0000000761a80000, 0x0000000780000000)
object space 27136K, 99% used [0x0000000760000000,0x0000000761a3c480,0x0000000761a80000)
Only 27MB is allotted for PermGen space. Is Storm not taking the 512MB of RAM?
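(For reference, on Storm 0.9.x a PermGen flag is normally passed to each daemon through the childopts entries in storm.yaml; the heap sizes below are illustrative defaults, not values taken from this question:

nimbus.childopts: "-Xmx1024m -XX:MaxPermSize=512m"
supervisor.childopts: "-Xmx256m -XX:MaxPermSize=512m"
worker.childopts: "-Xmx768m -XX:MaxPermSize=512m"

A -XX flag placed anywhere else in storm.yaml would not reach the JVMs.)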
Please let me know why all these processes are dying suddenly.
Thank you.
I added a monitoring process, supervisord, to monitor the nimbus master and the supervisors. This way I made sure the required processes are always up and running.
Since Storm falls under the fail-fast design category, a separate monitoring process is required to have 24/7 HA support for the nimbus and supervisor processes.
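A minimal supervisord sketch of that setup (installation paths and program names are assumptions):

[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
autostart=true
autorestart=true

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
autostart=true
autorestart=true

Both storm nimbus and storm supervisor run in the foreground, so supervisord can restart them whenever they die.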

Spark Streaming: What are things we should monitor to keep the streaming running?

I have a Spark project running on a 4-core, 16GB instance (both master and worker). Can anyone tell me what all the things are that I should keep monitoring so that my cluster/jobs never go down?
I have created a small list which includes the following items; please extend the list if you know more:
Monitor Spark master/worker for failures
Monitor HDFS for getting filled up or going down
Monitor network connectivity for master/workers
Monitor Spark jobs for getting killed
That's a good list. But in addition to those, I would actually monitor the status of the receivers of the streaming application (assuming you are using some non-HDFS source of data), i.e. whether they are connected or not. To be honest, this was tricky to do with older versions of Spark Streaming, as the instrumentation to get the receiver status didn't quite exist. However, with Spark 1.0 (to be released very soon), you can use the org.apache.spark.streaming.StreamingListener interface to get events regarding the status of the receivers.
A sneak peek at the to-be-released Spark 1.0 docs is at
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/streaming-programming-guide.html

How YARN manages endless jobs like Storm

A couple of days ago Yahoo posted about the Storm-on-YARN project http://developer.yahoo.com/blogs/ydn/storm-yarn-released-open-source-143745133.html which makes it possible to run Storm on YARN.
That's a big improvement; however, I have two questions regarding running tasks like Storm with YARN. Tasks like Storm don't have a limit on execution time... I mean, when you run Storm you expect it to work for days or months - listening to a queue or whatever.
I mean there is a set of tasks that have no limit on execution time (I'd expect them to report 0% progress).
1) What about timeouts? A regular M/R job is killed when it hangs; how do you prevent that here? I walked through the code but didn't find any special handling for this.
2) Also, MR1 had a queue where jobs waited for execution: when the cluster finished one job, it picked up the next job from the queue. What about YARN? If I push an endless Storm-like job A and then job B, will job B ever be executed?
Sorry if my questions seem ridiculous; maybe I'm missing or not understanding something.
Hadoop's JobTracker was (and is) responsible for both cluster resources and the application lifecycle. YARN is only responsible for managing cluster resources; the application lifecycle is the responsibility of the application.
This change means that YARN can be used to manage any distributed paradigm. MR2 is of course the initial implementation (map/reduce over YARN), but you can see other implementations like the Storm-on-YARN you mentioned or Hortonworks' intention to integrate SQL into Hadoop, etc.
You can take a look at a library called Weave from Continuuity that provides a simple API for building distributed apps on YARN.
