Apache Storm cluster not assigning topology's components to all available workers

My topology is configured to use 14 workers, and I currently have 16 workers available in the cluster.
But when I submit the topology, the "Worker Resources" section of the Storm UI shows that all 14 workers are up, yet components (Topology Components) are assigned to only 7 of them. The rest show N/A (see snapshot below).
The worker/supervisor log files show no meaningful errors as to why some workers are not being assigned components.
I have been googling for the past 6+ hours to no avail.
My environment:
Apache Storm 2.1.0
Zookeeper 3.4.9
One master node (Ubuntu 18.04 LTS)
Three supervisor nodes (Ubuntu 18.04 LTS)

What are the parallelism hints for each of the components (spouts and bolts) in your topology? If they do not add up to 14 or more, you will not have entries on all of the worker processes.
Remember that you can also set the parallelism of system components such as the ackers and (in metrics V1) the metrics consumers.
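As a minimal sketch (WordSpout and CountBolt are placeholder classes, not components from your topology), this is roughly how the parallelism hints, worker count, and acker count are set when building and submitting a topology:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Parallelism hints: 4 spout executors + 10 bolt executors = 14,
        // enough to place at least one executor on each of 14 workers.
        // WordSpout and CountBolt are placeholders for your own components.
        builder.setSpout("words", new WordSpout(), 4);
        builder.setBolt("counter", new CountBolt(), 10).shuffleGrouping("words");

        Config conf = new Config();
        conf.setNumWorkers(14);  // worker processes requested from the cluster
        conf.setNumAckers(14);   // ackers are system components whose executors also need placing

        StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
    }
}
```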

Related

Storm Topology Submitted Remotely not running

We created a Storm topology and tested it in local mode, and everything worked great. We then built it and submitted it to a cluster with Nimbus and one supervisor offering 4 slots. The topology appears in the Storm UI and is shown as active, with 4 slots used on the cluster. But when the topology is clicked, there are no spouts, no bolts, and no statistics. Nothing is written to our Redis database either. So we are wondering if there is something we are not doing.
Storm version: 2.0.0
OS: Linux Mint 19.1 Cinnamon

Mesos slave doesn't offer resources

I have a Mesos cluster with 1 master and 1 slave.
I followed this tutorial to set up the configuration of both the master and the slave, and I can see the Mesos web UI at http://master:5050 and the Marathon UI at http://master:8080.
Also, Activated=1, which I guess means that the Mesos slave is successfully connected to the Mesos master.
However, in the Resources section, the Offered property is always equal to 0.
The total available resources are 4 CPUs, 13 GB of RAM, and 38 GB of disk.
Can anyone help with this issue?

Calculating yarn.nodemanager.resource.cpu-vcores for a yarn cluster with multiple spark clients

If I have 3 spark applications all using the same yarn cluster, how should I set
yarn.nodemanager.resource.cpu-vcores
in each of the 3 yarn-site.xml?
(each Spark application is required to have its own yarn-site.xml on the classpath)
Does this value even matter in the client-side yarn-site.xml files?
If it does:
Let's say the cluster has 16 cores.
Should the value in each yarn-site.xml be 5 (for a total of 15, leaving 1 core for system processes)? Or should I set each one to 15?
(Note: Cloudera indicates one core should be left for system processes here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ however, they do not go into detail about using multiple clients against the same cluster)
Assume Spark is running with yarn as the master, and running in cluster mode.
Are you talking about the server-side configuration for each YARN NodeManager? If so, it would typically be set to a little less than the number of CPU cores (or virtual cores, if you have hyperthreading) on each node in the cluster. So if you have 4 nodes with 4 cores each, you could dedicate, for example, 3 per node to the YARN NodeManager, and your cluster would have a total of 12 virtual CPUs.
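As a sketch only, using the 4-cores-per-node example above, the NodeManager-side entry in each node's yarn-site.xml might look like this (the value is illustrative):

```xml
<!-- yarn-site.xml on each NodeManager host: reserve roughly one core
     for the OS and other daemons, advertise the rest to YARN. -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>3</value>
</property>
```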
Then you request the desired resources when submitting the Spark job (see http://spark.apache.org/docs/latest/submitting-applications.html for example) to the cluster and YARN will attempt to fulfill that request. If it can't be fulfilled, your Spark job (or application) will be queued up or there will eventually be a timeout.
You can configure different resource pools in YARN to guarantee a specific amount of memory/CPU resources to such a pool, but that's a little bit more advanced.
If you submit your Spark application in cluster mode, you have to consider that the Spark driver will run on a cluster node and not on your local machine (the one that submitted it), so it will require at least 1 additional virtual CPU.
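For illustration only (the class name and values are made up, and this assumes the job is launched with spark-submit --master yarn in cluster mode), a per-application resource request could look roughly like this:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceRequestExample {
    public static void main(String[] args) {
        // Executor resources are requested from YARN when the SparkContext starts;
        // if the cluster's advertised vcores/memory cannot cover the request,
        // the application waits in the YARN queue (or eventually times out).
        SparkConf conf = new SparkConf()
                .setAppName("resource-request-example")
                .set("spark.executor.instances", "2")
                .set("spark.executor.cores", "3")
                .set("spark.executor.memory", "4g");
        // Driver resources (spark.driver.cores / spark.driver.memory) should be
        // passed to spark-submit itself in cluster mode, since the driver JVM is
        // already running by the time this code executes.

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... application logic ...
        sc.stop();
    }
}
```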
Hope that clarifies things a little for you.

Does Storm support an HA host, e.g. for the Nimbus host?

If so, from which version does Storm support this? I want to know because I want to upgrade my Storm version (it is currently 0.10.0).
High availability for Nimbus was introduced in Storm 1.0.0 (see https://storm.apache.org/2016/04/12/storm100-released.html)
However, even for prior Storm versions, the missing HA for Nimbus was not a critical issue, because a failing Nimbus does not affect running topologies. The only problem while Nimbus is down is that no interaction with the cluster is possible from outside (e.g., submitting new topologies).
Workers are HA too, i.e., supervisors can restart failing workers. Supervisors themselves are not HA; however, the tasks they host will be redistributed automatically to other supervisors if a supervisor fails.
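As a rough illustration of what the HA setup looks like once you are on 1.x or later (the hostnames are placeholders): every node's storm.yaml lists the candidate Nimbus hosts, and the non-leader instances act as standbys that can take over leadership:

```yaml
# storm.yaml on every node (Storm 1.0+); replaces the old single nimbus.host setting
nimbus.seeds: ["nimbus1.example.com", "nimbus2.example.com"]
```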

Need help regarding storm

1) What happens if Nimbus fails? Can we convert some other node into a Nimbus?
2) Where is the output of a topology stored? When a bolt emits a tuple, where is it stored?
3) What happens if ZooKeeper fails?
Nimbus is itself a failure-tolerant process, which means it doesn't store its state in memory but in an external database (ZooKeeper). So if Nimbus crashes (an unlikely scenario), on the next start it will resume processing right where it stopped. Nimbus usually needs to be set up to be monitored by an external monitoring system, such as Monit, which will check the Nimbus process state periodically and restart it if any problem occurs. I suggest you read the Storm project's wiki for further information.
Nimbus is the master node of a Storm cluster, and it isn't possible to have multiple Nimbus nodes. (Update: the Storm community is now (as of 5/2014) actively working on making the Nimbus daemon fault-tolerant in a failover manner, by having multiple Nimbuses heartbeating each other.)
The tuple is "stored" in the tuple tree, and it is passed to the next bolt in the topology's execution chain as topology execution progresses. As for physical storage, tuples are probably held in an in-memory structure and serialized as necessary to be distributed among the cluster's nodes. The complete Storm cluster state itself is stored in ZooKeeper. Storm doesn't concern itself with persistent storage of a topology's or a bolt's output; it is your job to persist the results of the processing.
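To make that last point concrete, here is a minimal sketch of a bolt using the org.apache.storm packages of Storm 2.x; saveToStore() is a placeholder for whatever external store (Redis, a database, etc.) you persist results to. It anchors the emitted tuple to the input so it joins the tuple tree, and acks the input when done:

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class PersistingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");

        // Persisting results is the application's job, not Storm's;
        // saveToStore() is a placeholder for a write to Redis, a database, etc.
        saveToStore(word);

        // Anchor the emitted tuple to the input so it joins the tuple tree,
        // then ack the input so the spout can mark it fully processed.
        collector.emit(input, new Values(word));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    private void saveToStore(String word) {
        // placeholder: write to your external store here
    }
}
```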
As with Nimbus, ZooKeeper in a real, production Storm cluster must be configured for reliability, and for ZooKeeper that means having an odd number of ZooKeeper nodes running on different servers. You can find more information on configuring a ZooKeeper production cluster in the ZooKeeper Administrator's Guide. If ZooKeeper were to fail (although that is a highly unlikely scenario in a properly configured ZooKeeper cluster), the Storm cluster wouldn't be able to continue processing, since all of the cluster's state is stored in ZooKeeper.
Regarding question 1), this bug report and subsequent comment from Storm author and maintainer Nathan Marz clarifies the issue:
Storm is not designed for having topologies partially running. When you bring down the master, it is unable to reassign failed workers. We are working on Nimbus failover. Nimbus is fault-tolerant to the process restarting, which has made it fault-tolerant enough for our and most people's use cases.
