Apache Storm UI window

In the Apache Storm UI, Window specifies the past period of time for which the statistics apply, e.g. 10 min, 3 h, or 1 day. But when a topology is running, is the number of tuples emitted/transferred computed over this window time? If I look at the actual runtime, 10 minutes is quite long, yet the UI shows "10 min" statistics before 10 actual minutes have elapsed, which doesn't make sense to me.
For example: emitted = 1764260 tuples, so will the emission rate be 1764260 / 600 ≈ 2940 tuples/sec?

It does not display an average; it displays the total number of tuples emitted in the last period of time (10 min, 3 h or 1 day).
Therefore, if you started the application 2 minutes ago, it will display all tuples emitted the last two minutes and you'll see that the number increases until you get to 10 minutes.
After 10 minutes, it will only show the number of tuples emitted in the last 10 minutes, and not an average of the tuples emitted. So if, for example, you started the application 30 minutes ago, it will display the number of tuples emitted between minutes 20 to 30.
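As a rough worked example of that arithmetic (the 1,764,260 figure is simply the number from the question; the uptime value is an assumption you would read from the UI in practice), a rate only makes sense once the topology has been up for at least the full window:

// Sketch: turning a Storm UI "emitted" counter into a rate.
// Before the topology has been up for a full window, the counter covers
// less time than the window label suggests, so divide by the shorter span.
public class EmitRate {
    public static void main(String[] args) {
        long emittedInWindow = 1_764_260;   // example value from the question
        long windowSeconds = 600;           // the "10 min" window
        long uptimeSeconds = 900;           // assumed topology uptime; read the real value from the UI

        long effectiveSeconds = Math.min(windowSeconds, uptimeSeconds);
        double tuplesPerSecond = (double) emittedInWindow / effectiveSeconds;
        System.out.printf("~%.0f tuples/sec over the last %d s%n", tuplesPerSecond, effectiveSeconds);
        // With the numbers above: roughly 2940 tuples/sec.
    }
}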

Related

Throughput in Apache Storm

I want to know what exactly throughput means in Apache Storm. Is it the number of tuples processed / total time?
If so, what is the total number of tuples emitted? I don't see the exact significance of total tuples emitted / time. Please let me know.
You need to look at the execute count of the sink bolts (the end ones in your topology that don't connect to any other bolts). This is the throughput and is reported for the last 10 mins, 3 hrs, 1 day and all time. Dividing the values by the time period in seconds will give you the throughput in tuples per second.
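A minimal sketch of that division, assuming two hypothetical sink bolts whose "Executed" counts were read manually from the 10-minute column of the Storm UI (the bolt names and numbers are placeholders):

// Throughput sketch: sum the executed counts of the sink bolts for a window
// and divide by the window length in seconds.
import java.util.Map;

public class SinkThroughput {
    public static void main(String[] args) {
        // "Executed" values for the last 10 minutes, read from the Storm UI.
        Map<String, Long> executedLast10Min = Map.of(
                "sink-bolt-a", 1_200_000L,
                "sink-bolt-b", 564_260L);

        long windowSeconds = 600;
        long totalExecuted = executedLast10Min.values().stream().mapToLong(Long::longValue).sum();
        System.out.printf("Throughput: %.0f tuples/sec%n", (double) totalExecuted / windowSeconds);
        // 1,764,260 / 600 ≈ 2940 tuples/sec with these placeholder numbers.
    }
}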

JMeter ramp-up vs duration

Let's say that I have the following configuration:
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
If I add a duration of 2 minutes, so:
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
Duration (minutes) : 2
How is JMeter going to react if each thread takes about 10 seconds to complete one iteration?
Thanks in advance
Both Loop Count and Duration (if both are present) are taken into account, whichever comes first. So in the first configuration you are not limiting loop count or duration, and the script will run "forever". In the second case, loop count is still not limited, but duration is, so the test will stop 2 minutes after the very first user starts, and that time includes the ramp-up. Stopping means not starting new samplers and hard-stopping all running samplers.
In your case, 150 users will finish starting after 30 sec. That means the first thread to run will complete 3 iterations (x10 sec) by the time the last thread just started its first.
Within the remaining 90 sec, all threads will complete roughly 8-9 iterations.
So for the first thread you should expect 11-12 iterations, and for the very last thread to start, 8-9 iterations; the remaining threads fall anywhere between those numbers. Assuming groups of roughly 30 threads each executed the same number of iterations (somewhere between 8 and 12), you get roughly 1500 iterations in total, i.e. about 10 per thread across 150 threads (could be a little over or under). The last iteration of each thread may be incomplete (e.g. some samplers did not get to run before the test ran out of time).
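As a back-of-the-envelope sketch of that estimate (this is plain arithmetic, not anything JMeter reports itself; the 10-second iteration time is the assumption from the question):

// Rough model of the second configuration: 150 threads, 30 s ramp-up,
// 120 s total duration, ~10 s per iteration, ignoring partial iterations.
public class IterationEstimate {
    public static void main(String[] args) {
        int threads = 150;
        double rampUpSec = 30;
        double durationSec = 120;
        double iterationSec = 10;

        long totalIterations = 0;
        for (int t = 0; t < threads; t++) {
            // Thread t starts this many seconds after the test begins.
            double startOffset = rampUpSec * t / threads;
            // Whole iterations it can fit before the hard stop.
            totalIterations += (long) Math.floor((durationSec - startOffset) / iterationSec);
        }
        System.out.println("Estimated total iterations: " + totalIterations);
        // Prints an estimate around 1500, in line with the reasoning above.
    }
}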
Generally, since duration may leave unfinished iterations, I think it's only good as a fallback or threshold in pipeline automation. For example: a run is configured to complete 1000 iterations (which should take about 16 min if an iteration takes 10 sec), so duration is set to 24 min, giving about 50% slack. The duration limit won't be needed if performance is decent, but if execution takes extremely long we can hard-stop it at 24 min, since there's no point in continuing: we already know something is wrong.

Why is the average response time decreasing when we increase the number of users?

I am using JMeter to run a performance test with different numbers of users. With 1 user the average response time is 1.4 seconds; with more users it would seem logical for the average response time to go up, but instead it is decreasing. Can anyone explain why? The test scenario is that I interact a few times (2-3 interactions) with a chat bot.
Please help me understand these confusing results below:
1 user - 30 seconds - 1.3 seconds (average response time)
5 users - 60 seconds - 0.92 seconds (average response time)
10 users - 60 seconds - 0.93 seconds (average response time)
20 users - 120 seconds - 0.92 seconds (average response time)
The first iteration of the first user often involves some overhead on the client side (most commonly DNS resolution) and can involve some overhead on the server side (server "warm-up"). That overhead does not recur in the following iterations or for later users.
Thus what you see as a reduction in average time is actually a reduction of the impact that the slower "first user, first iteration" execution has on the overall result. This is why it's important to collect a sufficient sample, so that such a local spike no longer matters much. My rule of thumb is at least 10000 iterations before looking at any averages, although the comfort level is up to each tester to set.
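As an illustration of that dilution effect (the 4.5 s first response and 0.9 s steady-state time are made-up numbers, not taken from the question):

// Shows how a single slow "first iteration" skews a small sample and
// fades out as the sample grows.
public class AverageSkew {
    static double averageWithWarmup(int samples, double firstSec, double steadySec) {
        // One slow first iteration, the rest at the steady-state response time.
        return (firstSec + (samples - 1) * steadySec) / samples;
    }

    public static void main(String[] args) {
        System.out.printf("20 samples:    %.2f s%n", averageWithWarmup(20, 4.5, 0.9));     // ~1.08 s
        System.out.printf("130 samples:   %.2f s%n", averageWithWarmup(130, 4.5, 0.9));    // ~0.93 s
        System.out.printf("10000 samples: %.3f s%n", averageWithWarmup(10000, 4.5, 0.9));  // ~0.900 s
    }
}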
Also, when increasing the number of users you should not expect the average to get worse unless you have reached a saturation point; rather, it should stay stable. So if you expect your app to support no more than 20 users, then your result is surprising, but if you expect the application to support 20000 users, you should not see any degradation of the average at 20 users.
To test whether this is what is happening, try running 1 user for much longer, so that the total number of iterations is similar to the 20-user run. Roughly, since the 20-user test ran for 120 sec and produced about 20 times as many iterations, a single user needs about 20 × 120 sec ≈ 40 min to reach a similar number of iterations.

Storm: Min/max aggregation across several sliding windows with varying sizes

I wonder what the best practice is to approach the following problem in Apache Storm.
I have a single spout that generates a stream of integer values with an explicit timestamp attached. The goal is to perform min/max aggregation over this stream with several sliding windows of different sizes:
last hour
last day, i.e. last 24 hours
Last hour is easy:
topology.setBolt("1h", /* windowed min/max bolt */ ...
        .withWindow(Duration.hours(1), Duration.seconds(10))
        .withTimestampField("timestamp"))
    .shuffleGrouping("spout");
However, for longer periods I am concerned about the queue sizes of the windows. When I consume the tuples directly from the spout as with the last-hour aggregation, every single tuple would end up in the queue.
One possibility would be to consume the tuples from the pre-aggregated "1h" bolt. However, since I am using explicit timestamps, late tuples arriving from the "1h" bolt are ignored. A 1 hour lag is not an option as this delays the evaluation of the window. Is there a way to "allow" late tuples without impacting the timeliness of the results?
Of course I could also store away an aggregate every hour and then compute the minimum over the last 24 hours including the latest value from the "1h" stream. But I am curious if there is a way to do this properly using Storm means.
Update 1
Thanks to arunmahadevan's answer I changed the 1h min bolt to emit the minimum tuple with the maximum timestamp of all tuples in the respective 1h window. That way the consuming bolt does not discard the tuple due to late arrival. I also introduced a new field original-timestamp to retain the original timestamp of the minimum tuple.
Update 2
I finally found an even better way by only emitting state changes in the 1h min bolt. Storm does not advance the time in the consuming bolt as long as no new tuples are received, so the late-arrival issue is avoided. Also, I get to keep the original timestamp without copying it into a separate field.
I think periodically emitting the min from "1h" to "24h" bolt should work and keep the "24h" queue size in check.
If you configure a lag, the bolt's execute is invoked only after that lag (i.e. when the event time crosses the end of the sliding interval plus the lag).
Let's say the "1h" bolt is configured with a lag of 1 min: execute will be invoked for the tuples between 01:00 and 02:00 only after the event time crosses 02:01 (i.e. the bolt has seen an event with timestamp >= 02:01). The execute will, however, only receive the tuples between 01:00 and 02:00.
Now if you compute the last one-hour minimum and emit the result to a "24h" bolt that has a sliding interval of, say, 1 hr and lag = 0, it will trigger once an incoming event's timestamp crosses the next hour. If you emitted the 01:00-02:00 min with a timestamp of 02:00, the "24h" window will trigger (for the events between 02:00 the previous day and 02:00) as soon as it receives the min event, since the event time crossed the next hour and the configured lag is 0.
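A minimal wiring sketch of what the answer and the updates describe. HourlyMinBolt, DailyMinBolt and ValueSpout are hypothetical placeholder classes (HourlyMinBolt would emit its minimum, e.g. only on state changes, with a "timestamp" field), and the 1-minute lag is likewise just an assumed value:

// Sketch only: two-level windowed aggregation as discussed above.
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseWindowedBolt.Duration;

public class MinTopologySketch {
    public static void main(String[] args) {
        TopologyBuilder topology = new TopologyBuilder();
        topology.setSpout("spout", new ValueSpout());   // placeholder spout emitting (value, timestamp)

        // 1-hour window sliding every 10 s, tolerating 1 min of late tuples from the spout.
        topology.setBolt("1h", new HourlyMinBolt()
                .withWindow(Duration.hours(1), Duration.seconds(10))
                .withTimestampField("timestamp")
                .withLag(Duration.minutes(1)))
            .shuffleGrouping("spout");

        // 24-hour window over the pre-aggregated hourly minima, sliding every hour, lag 0.
        topology.setBolt("24h", new DailyMinBolt()
                .withWindow(Duration.hours(24), Duration.hours(1))
                .withTimestampField("timestamp")
                .withLag(Duration.seconds(0)))
            .shuffleGrouping("1h");
    }
}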

Average waiting time in Round Robin scheduling

Waiting time is defined as how long each process has to wait before it gets its time slice.
In scheduling algorithms such as Shortest Job First and First Come First Serve, we can find that waiting time easily: we just queue up the jobs and see how long each one had to wait before it got serviced.
When it comes to Round Robin or any other preemptive algorithm, a long-running job spends a little time on the CPU, is preempted, waits for a while for its next turn, and at some point in one of its turns runs to completion. I wanted to find out the best way to understand the 'waiting time' of jobs in such a scheduling algorithm.
I found a formula which gives waiting time as:
Waiting Time = (Final Start Time - Previous Time in CPU - Arrival Time)
But I fail to understand the reasoning for this formula. For example, consider a job A with a burst time of 30 units, where round-robin preemption happens every 5 units. There are two more jobs, B (10 units) and C (15 units).
The order in which these will be serviced would be:
0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55
Waiting time for A = 40 - 5 - 0
I chose 40 because after 40, A never waits. It just gets its time slices and runs to completion.
I chose 5 because A previously ran between 30 and 35.
0 is the start time.
Well, I have a doubt about this formula: why is the slice where A ran from 15 to 20 not accounted for?
Intuitively, I am unable to see how this gives us the waiting time for A when we account only for the slice just before the final start and then subtract the arrival time.
According to me, the waiting time for A should be:
Final start time - (sum of all the time it spent in processing).
If this formula is wrong, why?
Please help clarify my understanding of this concept.
You've misunderstood what the formula means by "previous time in CPU". This actually means the same thing as what you call "sum of all times it spend in the processing". (I guess "previous time in CPU" is supposed to be short for "total time previously spent running on the CPU", where "previously" means "before the final start".)
You still need to subtract the arrival time because the process obviously wasn't waiting before it arrived. (Just in case this is unclear: The "arrival time" is the time when the job was submitted to the scheduler.) In your example, the arrival time for all processes is 0, so this doesn't make a difference there, but in the general case, the arrival time needs to be taken into account.
Edit: If you look at the example on the webpage you linked to, process P1 takes two time slices of four time units each before its final start, and its "previous time in CPU" is calculated as 8, consistent with the interpretation above.
Waiting time = (time of the process's last start) - (time quantum × (n - 1))
Here n denotes the number of times the process appears in the Gantt chart.
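To double-check those numbers, here is a small simulation sketch of the example above (quantum 5, A = 30, B = 10, C = 15, all arriving at time 0; waiting time is computed as completion time minus arrival time minus burst time, which agrees with the corrected formula):

// Round-robin simulation for the example in this question.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class RoundRobinWaiting {
    public static void main(String[] args) {
        int quantum = 5;
        Map<String, Integer> burst = new LinkedHashMap<>();
        burst.put("A", 30);
        burst.put("B", 10);
        burst.put("C", 15);

        Map<String, Integer> remaining = new LinkedHashMap<>(burst);
        Deque<String> queue = new ArrayDeque<>(burst.keySet());
        int time = 0;

        while (!queue.isEmpty()) {
            String job = queue.pollFirst();
            int slice = Math.min(quantum, remaining.get(job));
            time += slice;
            remaining.put(job, remaining.get(job) - slice);
            if (remaining.get(job) > 0) {
                queue.addLast(job);   // not finished yet: back to the end of the queue
            } else {
                // All jobs arrive at time 0, so waiting = completion - burst.
                System.out.println(job + " finishes at " + time
                        + ", waiting time = " + (time - burst.get(job)));
            }
        }
        // Output: B finishes at 25 (waiting 15), C at 40 (waiting 25), A at 55 (waiting 25),
        // matching the Gantt chart 0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55.
    }
}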
