Best Storm Parallelism from various examples - parallel-processing

I am applying parallelism to my Storm topology. I have set the number of worker nodes to 1.
Example #1: number of tasks = number of executors
I set both the number of tasks and the number of executors for a particular component to "2".
Example #2: number of tasks < number of executors
I set the number of tasks to "1" and the number of executors to "2" for a particular component.
Example #3: number of tasks > number of executors
I set the number of tasks to "5" and the number of executors to "1" for a particular component.
I don't understand which of these examples leads to the best parallelism for the topology. Which one actually gives the benefits of Storm parallelism? Please help me understand this.

Did you read this article? https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
To get good performance, you should set the number of executors to the number of available cores (each executor runs in its own thread). Using more tasks than executors is only beneficial if you want to change the parallelism at runtime.
Your "example#2" is not a valid configuration: #tasks >= #executors must always be true (otherwise, there would be threads with no work).
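To make the three examples concrete, here is a minimal Java sketch of how tasks could be spread over executor threads. It is only a model, not Storm's scheduler: the class TaskLayout and the method assign are hypothetical names, the round-robin spread is an assumption, and surplus executors are simply dropped in the model because they would have no task to run.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskLayout {

    // Hypothetical helper: spreads task ids 0..numTasks-1 over executor threads.
    // Assumption of this model: an executor without a task is pointless, so the
    // effective executor count never exceeds the task count.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("need at least one task and one executor");
        }
        int executors = Math.min(numTasks, numExecutors);
        List<List<Integer>> layout = new ArrayList<>();
        for (int e = 0; e < executors; e++) {
            layout.add(new ArrayList<>());
        }
        // Round-robin spread (an assumption, chosen for simplicity).
        for (int t = 0; t < numTasks; t++) {
            layout.get(t % executors).add(t);
        }
        return layout;
    }

    public static void main(String[] args) {
        // Each inner list is one executor thread and the task ids it runs.
        System.out.println("Example #1 (2 tasks, 2 executors): " + assign(2, 2));
        System.out.println("Example #2 (1 task, 2 executors):  " + assign(1, 2));
        System.out.println("Example #3 (5 tasks, 1 executor):  " + assign(5, 1));
    }
}
```

In this model only Example #1 uses two threads. Example #3 runs all five tasks one after another on a single thread, so it gains no parallelism now, only room to add executors later.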

Related

Flink: how does the parallelism set in the Jobmanager UI relate to task slots?

Let's say I have 8 task managers with 16 task slots. If I submit a job using the Jobmanager UI and set the parallelism to 8, do I only utilise 8 task slots?
What if I have 8 task managers with 8 slots, and submit the same job with a parallelism of 8? Is it exactly the same thing? Or is there a difference in the way the data is processed?
Thank you.
The total number of task slots in a Flink cluster defines the maximum parallelism, but the number of slots used may exceed the actual parallelism. Consider, for example, this job:
If run with parallelism of two in a cluster with 2 task managers, each offering 3 slots, the scheduler will use 5 task slots, like this:
However, if the base parallelism is increased to six, then the scheduler will do this (note that the sink remains at a parallelism of one in this example):
See Flink's Distributed Runtime Environment for more information.

Apache Storm: why and how to choose number of tasks per executor?

According to the official documentation:
How many instances to create for a spout/bolt. A task runs on a thread with zero or more other tasks for the same spout/bolt. The number of tasks for a spout/bolt is always the same throughout the lifetime of a topology, but the number of executors (threads) for a spout/bolt can change over time. This allows a topology to scale to more or less resources without redeploying the topology or violating the constraints of Storm (such as a fields grouping guaranteeing that the same value goes to the same task)
My questions are:
Under what circumstances would I choose to run multiple tasks in one executor?
If I do use multiple tasks in one executor, what might be reasons to choose a different number of tasks per executor for my spout and my bolt (such as 2 tasks per bolt executor but only 1 task per spout executor)?
I thought https://stackoverflow.com/a/47714449/8845188 was a fine answer, but I'll try to reword it as examples:
The number of tasks for a component (e.g. spout or bolt) is set in stone when you submit the topology, while the number of executors can be changed without redeploying the topology. The number of executors is always less than or equal to the number of tasks for a component.
Question 1
You wouldn't normally have a reason to choose running e.g. 2 tasks in 1 executor, but if you currently have a low load but expect a high load later, you may choose to submit the topology with a high number of tasks but a low number of executors. You could of course just submit the topology with as many executors as you expect to need, but using many threads when you only need a few is inefficient due to context switching and/or potential resource contention.
For example, let's say you submit your topology so the spout has 4 tasks and 4 executors (one task per executor). When your load increases, you can't scale further because 4 is the maximum number of executors you can have. You now have to redeploy the topology in order to scale with the load.
Let's say instead you submit your topology so the spout has 32 tasks and 4 executors (8 tasks per executor). When the load increases, you can increase the number of executors to 32, even though you started out with only 4. You can do this scaling up without redeploying the topology.
Question 2
Let's say your topology has a spout A, and a bolt B. Let's say bolt B does some heavyweight work (e.g. can do 10 tuples per executor per second), while the spout is lightweight (e.g. can do 1000 tuples per executor per second). Let's say your load is initially 20 messages per second into the topology, but you expect that to grow.
In this case it makes sense that you might configure your spout with 1 executor and 1 task, since it's likely to be idle most of the time. At the same time you want to configure your bolt with a high number of tasks so you can scale the number of executors for it, and at least 2-3 executors to start.
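The arithmetic behind these two answers can be sketched in a few lines of Java. This is a rough model under stated assumptions: throughput is assumed to scale linearly with the number of executors, and ExecutorPlanner, executorsNeeded and fitsWithoutRedeploy are hypothetical names, not part of Storm.

```java
public class ExecutorPlanner {

    // Executors needed to keep up with the load, assuming (as a simplification)
    // that throughput grows linearly with the executor count.
    static int executorsNeeded(double loadPerSec, double tuplesPerExecutorPerSec) {
        if (loadPerSec < 0 || tuplesPerExecutorPerSec <= 0) {
            throw new IllegalArgumentException("load must be >= 0 and rate must be > 0");
        }
        return (int) Math.max(1, Math.ceil(loadPerSec / tuplesPerExecutorPerSec));
    }

    // The executor count of a running component is bounded by its task count,
    // so a higher demand means redeploying with more tasks.
    static boolean fitsWithoutRedeploy(int executorsNeeded, int numTasks) {
        return executorsNeeded <= numTasks;
    }

    public static void main(String[] args) {
        // Numbers from the example: spout handles 1000 tuples/s per executor,
        // bolt handles 10 tuples/s per executor, initial load is 20 messages/s.
        System.out.println("spout executors at 20/s: " + executorsNeeded(20, 1000));
        System.out.println("bolt executors at 20/s:  " + executorsNeeded(20, 10));

        // Hypothetical later load of 200 messages/s.
        int later = executorsNeeded(200, 10);
        System.out.println("bolt executors at 200/s: " + later);
        System.out.println("reachable with 4 tasks:  " + fitsWithoutRedeploy(later, 4));
        System.out.println("reachable with 32 tasks: " + fitsWithoutRedeploy(later, 32));
    }
}
```

The 200 messages per second figure is made up for illustration. The point is only that the bolt's task count, fixed at submit time, decides whether the later demand can be met without redeploying.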
Config#TOPOLOGY_TASKS -> How many tasks to create per component.
A task performs the actual data processing and is run within its parent executor’s thread of execution. Each spout or bolt that you implement in your code executes as many tasks across the cluster.
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads <= #tasks.
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread (which is usually what you want anyways).
Also be aware that:
The number of executor threads can be changed after the topology has been started.
The number of tasks of a topology is static.
There is another situation where having multiple tasks per executor can make sense.
Let's suppose you have 2 tasks of the same bolt running on a single executor (thread). Let's also suppose you are calling a relatively long-running (maybe 1 second) database subroutine and the result is needed before proceeding further.
Case 1 - Your database call runs on the executor thread, which blocks until the call returns, so you gain nothing by running 2 tasks.
Case 2 - You refactor your database call to run on a newly spawned thread. In this case, your main executor thread does not hang and can start processing the second bolt task while the spawned thread fetches data from the database.
Unless you introduce your own parallelism within the component, I see no performance gain and no reason to run multiple tasks, apart from the maintenance reasons mentioned in other answers.

Does Storm support task or data parallelism?

I am trying to learn the parallelism and scalability features offered by Storm and read the following article: http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html. I am confused about whether Storm supports data or task parallelism. What I could understand (I may be wrong) is that Storm supports task parallelism, since the degree of parallelism is restricted by the number of tasks in the topology. If this is the case, then how can it be used for large-scale parallel data processing, which requires data parallelism?
Any help would be greatly appreciated. Thanks :)
Storm does not follow textbook terminology. In fact, Storm supports data, task, and pipelined parallelism.
If you have an operator and assign it a parallelism larger than one (parallelism_hint), you get as many threads as the parameter specifies, each executing the same code on different data, i.e., you get data parallelism. You can additionally set number_of_tasks (which must be >= parallelism_hint) to split the input data into number_of_tasks partitions/substreams (i.e., more partitions than executors). Some executor threads then need to process multiple partitions/substreams (called tasks in Storm). This does not increase the parallelism (though perhaps the concurrency). However, it allows you to change the number of executors at runtime.
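The split into task partitions can be pictured with a small Java sketch. It assumes a key-based grouping in which a hash of the key modulo the number of tasks selects the task. That is a simplification for illustration, not Storm's actual hashing, and TaskRouting, taskFor and executorFor are hypothetical names.

```java
public class TaskRouting {

    // Simplified stand-in for a key-based grouping: the key alone decides the
    // task, so the mapping stays stable while the task count stays fixed.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Assumption: tasks are spread round-robin over the executor threads.
    static int executorFor(int task, int numExecutors) {
        return task % numExecutors;
    }

    public static void main(String[] args) {
        int numTasks = 8; // fixed when the topology is submitted
        String[] keys = {"alice", "bob", "carol"};
        for (String key : keys) {
            int task = taskFor(key, numTasks);
            // Changing the executor count moves tasks between threads,
            // but never changes which task a key belongs to.
            System.out.println(key + " -> task " + task
                    + " (executor " + executorFor(task, 2) + " of 2, "
                    + "executor " + executorFor(task, 4) + " of 4)");
        }
    }
}
```

Because the executor count never enters taskFor, rescaling the threads leaves every key on its task, which is why the task count rather than the executor count has to stay fixed.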
As you have multiple spouts and bolts in your topology, and all those spouts and bolts are executed in different threads and even on different machines, you have task parallelism here (not to be confused with Storm's usage of the term task!). As there are producer/consumer relationships between spouts and bolts, you also get pipeline parallelism here, which is a special form of task parallelism. Another form of task parallelism in Storm is the ability to run multiple topologies at the same time.

How is the work divided amongst Storm Workers?

How does Apache Storm divide the tasks amongst its workers? I read that Storm does it by itself, and that it is a function of parallelism. What I don't know is how to figure out which node does what, and how many nodes would do which task, so that I can calculate the optimal number of nodes required.
Assume that the hardware configuration of the nodes is not all the same.
By default, Storm uses "round robin" scheduling, i.e., it loops over all supervisors with available slots and assigns the parallel instances of spouts/bolts. If no more free slots are available, single workers are assigned multiple spout/bolt instances.
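A toy version of that round-robin idea in Java is shown below. It is a sketch, not Storm's scheduler: slots are plain strings, the slot and instance names are made up, and RoundRobinSketch and assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {

    // Deals spout/bolt instances over the worker slots in turn. Once every
    // slot has one instance, the loop wraps around and slots get several.
    static Map<String, List<String>> assign(List<String> slots, List<String> instances) {
        if (slots.isEmpty()) {
            throw new IllegalArgumentException("no worker slots available");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String slot : slots) {
            plan.put(slot, new ArrayList<>());
        }
        for (int i = 0; i < instances.size(); i++) {
            plan.get(slots.get(i % slots.size())).add(instances.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up slot names: one slot on each of two supervisor nodes.
        List<String> slots = Arrays.asList("node1:6700", "node2:6700");
        List<String> instances = Arrays.asList("spout-0", "bolt-0", "bolt-1");
        assign(slots, instances).forEach(
                (slot, assigned) -> System.out.println(slot + " runs " + assigned));
    }
}
```

Note that this model treats every slot alike, so it says nothing about nodes with different hardware.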
You need to have a look at the Storm UI. The metrics complete latency, capacity, execute latency, process latency, and failed tuples will give you "hints" on how many executors and tasks you should allocate for each bolt.

What do multiple tasks inside an executor in Storm signify?

What is the benefit of using multiple tasks in an executor in a Storm topology? I couldn't understand whether, apart from doing multiple things, we gain any speed or parallelism.
Michael G. Noll wrote a great tutorial that should help you to understand Storm parallelism.
Usually a topology runs one task per executor. However, since you cannot increase the number of tasks while a topology is running, you can declare multiple tasks per executor in order to scale up parallelism over time.
There is no specific use case to have multiple tasks per executor other than the possibility to increase the topology parallelism.

Resources