Heron topologies keep running after deactivating - apache-storm

I'm currently working on Heron and Apache Storm for some resource management and scheduling research.
I noticed that after submitting topologies to Heron, they start running and consuming resources, but after deactivating them they appear to still be running in the background, taking 100% of CPU and RAM! Am I missing something? The way I understand it, based on the Heron docs, deactivating a topology should halt it and stop it from processing new tuples:
deactivate the topology. Once deactivated, the topology will stop processing but remain running in the cluster.
But when I check the Heron UI after deactivation, the topology is still processing new tuples, because the emit count keeps changing! When I kill the topologies, everything goes back to normal. Is this expected, and if not, what's the problem?

You can try updating your version of Heron to narrow down the problem. I have run Heron 0.17.1 and 0.17.5 and have not seen this problem.

Deactivating a topology stops spouts/sources from pulling any new data, but the bolts will continue to process until all pending data has been drained.
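To make the expected behaviour concrete, here is a minimal, hypothetical spout written against the Storm-compatible spout API that Heron also exposes (the class, field and stream names are illustrative, not taken from the question). When a topology is deactivated the framework calls deactivate() and stops asking the spout for new data via nextTuple(); tuples already emitted keep flowing through the bolts until they are drained.

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class CountingSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private volatile boolean active = true;  // toggled by activate()/deactivate()
    private long counter = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void activate() {
        active = true;   // invoked when the topology is (re)activated
    }

    @Override
    public void deactivate() {
        active = false;  // invoked when the topology is deactivated
    }

    @Override
    public void nextTuple() {
        // While deactivated the framework stops scheduling nextTuple() for new data;
        // the flag just makes that state visible inside the spout.
        if (!active) {
            return;
        }
        collector.emit(new Values(counter++));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("count"));
    }
}

If the spouts' own emit counts keep growing after heron deactivate, the deactivate signal is evidently not reaching them, which points at a deployment or version problem rather than expected behaviour.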

Related

AWS EMR Metric Server - Cluster Driver is throwing Insufficient Memory Error

This is in relation to my previous post (here) regarding the OOM I'm experiencing on a driver after running some Spark steps.
I have a cluster with 2 nodes in addition to the master, running the job as client. It's a small job that is not very memory intensive.
I've paid particular attention to the Hadoop processes via htop; they are the user-generated ones and also the highest memory consumers. The main culprit is the amazon.emr.metric.server process, followed by the state pusher process.
As a test I killed the process, and the memory shown by Ganglia dropped quite drastically, whereby I was then able to run 3-4 consecutive jobs before the OOM happened again. This behaviour repeats whenever I manually kill the process.
My question really is regarding the default behaviour of these processes and whether what I'm witnessing is the norm or whether something crazy is happening.

How to stop a running task and continue it in a Hadoop cluster

I'm testing "shutting down servers using a UPS" while a Hadoop task is running, and I have two questions.
1. Can a running task be saved, so that it continues the remaining work after rebooting (on all nodes)?
2. If 1 is not supported, is it safe to start the shutdown process while Hadoop tasks are running? Or is there anything I have to do to preserve the Hadoop system (cluster)?
No, you can't "save" a task in an intermediate state. If you shut down Hadoop while some jobs are running, you could end up with intermediate data from abandoned jobs occupying space. Apart from that, you can shut down the system while jobs are running.
It is not possible to save the state of running tasks with Hadoop as of now. It would be an extremely difficult process, since all resource allocations happen based on the current load of the system, and after restarting your entire cluster the workload might be entirely different, so restoring the state would not make sense.
Answering your second question: Hadoop was designed to tolerate node failures, temporary problems accessing files, and network outages. Individual tasks might fail, and the system then restarts them on another node. From the cluster's point of view it is safe to shut down nodes; the only thing to keep in mind is that the job will ultimately fail and you need to re-submit it after bringing the cluster back to life. One problem that might arise when shutting down the cluster with the power switch is that temporary files do not get cleaned up, but this is usually not a major problem.
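As a hedged illustration of the "re-submit after the cluster comes back" point, here is a minimal driver sketch. The class name, the identity map/reduce job and the timestamped output path are illustrative assumptions, not anything from the question; the only point is that each attempt builds and submits a fresh job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ResubmittingDriver {

    // Build a fresh Job for every attempt; a submitted Job instance cannot be reused.
    private static boolean runOnce(Configuration conf, String in, String out) throws Exception {
        Job job = Job.getInstance(conf, "identity-copy");
        job.setJarByClass(ResubmittingDriver.class);
        // No mapper/reducer set, so Hadoop uses the identity implementations;
        // that is enough to exercise submission, shuffle and completion.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(in));
        // Timestamped output directory so a half-written dir from a failed run never blocks a retry.
        FileOutputFormat.setOutputPath(job, new Path(out + "-" + System.currentTimeMillis()));
        return job.waitForCompletion(true);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        boolean done = false;
        while (!done) {
            try {
                done = runOnce(conf, args[0], args[1]);
            } catch (Exception e) {
                // Cluster unreachable or job aborted mid-run (e.g. during a planned shutdown):
                // wait for the cluster to come back and submit the job again from scratch.
                Thread.sleep(60_000L);
            }
        }
    }
}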

Worker node execution in Apache Storm

A Storm topology has been deployed using the Storm command on machine X. Worker nodes are running on machine Y.
Once the topology has been deployed, it is ready to process tuples, and the workers are handling requests and responses.
Can anyone please explain how the worker nodes identify the work and the data? I am not sure how a worker node gets access to code that was never deployed to it by the developer.
If the topology code is accessible to the worker nodes, can you please tell me where it is located and also explain how the worker nodes execute it?
One, you're asking a fairly complex question. I've been using Storm for a while and still don't understand much about how it works internally. Here is a good article on the internals of Storm. It's over two years old but should still be highly relevant. I believe Netty is now used as the internal messaging transport; it's mentioned as experimental in the article.
As far as code being run on worker nodes goes, there is a configuration entry in storm.yaml,
storm.local.dir
When you upload the topology, I believe Storm copies the jar to that location. So every worker machine will have the necessary jar in its configured storm.local.dir. Even though you only upload to one machine, Storm will distribute the jar to the necessary workers. (That's from memory and I'm not in a spot to test it at the moment.)
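For reference, that setting is a single entry in storm.yaml on every node; the path below is an illustrative assumption, not a required value:

storm.local.dir: "/var/storm"

When a topology is submitted, Nimbus stores the jar and each supervisor downloads it into its own storm.local.dir (typically under a supervisor/stormdist/<topology-id>/ directory as stormjar.jar, alongside the serialized topology and configuration), which is how worker machines end up running code the developer never copied to them by hand.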

Apache Storm - Nimbus, Supervisors, Workers getting stopped silently

I am using Apache Storm 0.9.5 and Java 1.7.
I am facing the issue below.
All Storm processes die suddenly.
I ran the topology once and observed it for 1 or 2 days without sending any data.
After that, when I check the processes, they are no longer running.
I have also set -XX:MaxPermSize=512m in storm.yaml on all the nodes for nimbus, the supervisors and the workers.
But when I look at the GC logs, they say:
PSPermGen total 27136K, used 26865K [0x0000000760000000, 0x0000000761a80000, 0x0000000780000000)
object space 27136K, 99% used [0x0000000760000000,0x0000000761a3c480,0x0000000761a80000)
Only 27 MB is allotted for PermGen space. Is Storm not taking the 512 MB of RAM?
Please let me know why all these processes are dying suddenly.
Thank you.
I added a monitoring process, supervisord, to monitor the Nimbus master and the supervisors. This way I made sure the required processes are always up and running.
Since Storm falls into the fail-fast design category, a separate monitoring process is required to get 24/7 HA support for the nimbus and supervisor processes.
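As a hedged sketch of that setup, the supervisord program sections for the two daemons might look like this (the install path and user are assumptions; adjust them to your environment):

[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
user=storm
autostart=true
autorestart=true

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
user=storm
autostart=true
autorestart=true

On the PermGen question itself: JVM flags for Storm daemons and workers are normally passed through the childopts settings in storm.yaml, for example (the heap sizes here are only illustrative):

nimbus.childopts: "-Xmx1024m -XX:MaxPermSize=512m"
supervisor.childopts: "-Xmx256m -XX:MaxPermSize=512m"
worker.childopts: "-Xmx768m -XX:MaxPermSize=512m"

so it is worth confirming that the flag actually ends up in the childopts string of the process whose GC log you are reading.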

Spark Streaming: What are things we should monitor to keep the streaming running?

I have a Spark project running on a 4-core, 16 GB instance (both master and worker). Can anyone tell me what I should keep monitoring so that my cluster/jobs never go down?
I have created a small list with the following items; please extend it if you know more:
Monitor the Spark master/worker for failures
Monitor HDFS for filling up or going down
Monitor network connectivity for the master/worker
Monitor Spark jobs for getting killed
That's a good list. But in addition to those, I would also monitor the status of the receivers of the streaming application (assuming you are using some non-HDFS source of data), i.e. whether they are connected or not. To be honest, this was tricky to do with older versions of Spark Streaming, as the instrumentation to get the receiver status didn't quite exist. However, with Spark 1.0 (to be released very soon), you can use the org.apache.spark.streaming.StreamingListener interface to get events regarding the status of the receivers.
A sneak peek at the to-be-released Spark 1.0 docs is at
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/streaming-programming-guide.html
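As a rough sketch of that approach, written against a newer Spark release than the 1.0 mentioned above (one where org.apache.spark.streaming.scheduler.StreamingListener exposes no-op default methods that a Java class can selectively override; the class name and the plain logging are illustrative):

import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverError;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped;

public class ReceiverMonitor implements StreamingListener {

    @Override
    public void onReceiverStarted(StreamingListenerReceiverStarted started) {
        System.out.println("Receiver started: " + started.receiverInfo());
    }

    @Override
    public void onReceiverError(StreamingListenerReceiverError error) {
        // Wire this into whatever alerting you use instead of just printing.
        System.err.println("Receiver error: " + error.receiverInfo());
    }

    @Override
    public void onReceiverStopped(StreamingListenerReceiverStopped stopped) {
        System.err.println("Receiver stopped: " + stopped.receiverInfo());
    }
}

The listener is then registered on the streaming context before it is started, e.g. streamingContext.addStreamingListener(new ReceiverMonitor()).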
