Differences between the __execute-count value and values gathered by the Metrics Reporting API v2 - apache-storm

I have run a topology and used the Meter type from the Metrics Reporting API v2. In the execute method I mark this metric, so it records an event whenever the execute method is called. But when I compare this value with __execute-count, I see huge differences. Does anyone know why this happens?
These are the values from my log which are gathered at the same time:
9:v7 __execute-count {v0:v7=44500}
9:v7 tuple_inRate.count 664129
Update:
When I use the mark method on the Meter metric, I get different results than with the Counter metric. But I still do not understand why the values from the Counter metric (the tuple counter) are not the same as __execute-count.
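For context, here is a minimal sketch of the kind of bolt described above, assuming Storm 2.x (where TopologyContext exposes the V2 registerMeter method); the class and metric names are illustrative, not the actual code:

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import com.codahale.metrics.Meter;

// Sketch only: a V2 meter marked once per execute() call, to compare with __execute-count.
public class CountingBolt extends BaseRichBolt {
    private transient Meter tupleInRate;
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Registered through the Metrics Reporting API v2.
        this.tupleInRate = context.registerMeter("tuple_inRate");
    }

    @Override
    public void execute(Tuple input) {
        tupleInRate.mark();   // marked for every tuple, unlike the sampled __execute-count
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}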

As given in this answer, Storm's internal metrics are only estimates based on a percentage of the real data flow. By default it uses 5% of the incoming tuples to make those estimates, which can lead to inaccuracies at extremely high or low throughputs.
EDIT: The documentation describes the following:
In general all of these tuple count metrics are randomly sub-sampled unless otherwise stated. This means that the counts you see both on the UI and from the built in metrics are not necessarily exact. In fact by default we sample only 5% of the events and estimate the total number of events from that. The sampling percentage is configurable per topology through the topology.stats.sample.rate config. Setting it to 1.0 will make the counts exact, but be aware that the more events we sample the slower your topology will run (as the metrics are counted in the same code path as tuples are processed). This is why we have a 5% sample rate as the default.
EDIT 2: In this post, there is more information about the estimation:
The way it works is that if you choose a sampling rate of 0.05, it will pick a random element of the next 20 events in which to increase the count by 20. So if you have 20 tasks for that bolt, your stats could be off by +-380.
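As an illustration (this is not Storm's actual implementation, just a sketch of the scheme the quote describes):

import java.util.concurrent.ThreadLocalRandom;

// Count-by-sampling sketch: with a 0.05 rate, one randomly chosen event out of
// every 20 bumps the counter by 20, so the reported total is only an estimate.
class SampledCounter {
    private final int window;          // e.g. 20 for a 0.05 sample rate
    private long estimatedCount = 0;
    private int position = 0;
    private int chosen;

    SampledCounter(double sampleRate) {
        this.window = (int) Math.round(1.0 / sampleRate);
        this.chosen = ThreadLocalRandom.current().nextInt(window);
    }

    void onEvent() {
        if (position == chosen) {
            estimatedCount += window;                // one hit stands in for 'window' events
        }
        if (++position == window) {                  // start the next window
            position = 0;
            chosen = ThreadLocalRandom.current().nextInt(window);
        }
    }

    long estimate() { return estimatedCount; }
}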
By the way, __execute-count is just an increasing number, while your tuple_inRate.count is a rate, isn't it?

Related

Algorithm / data structure for rate of change calculation with limited memory

Certain sensors are to trigger a signal based on the rate of change of the value rather than a threshold.
For instance, heat detectors in fire alarms are supposed to trigger an alarm quicker if the rate of temperature rise is higher: A temperature rise of 1K/min should trigger an alarm after 30 minutes, a rise of 5K/min after 5 minutes and a rise of 30K/min after 30 seconds.
 
I am wondering how this is implemented in embedded systems, where resources are scarce. Is there a clever data structure to minimize the data stored?
 
The naive approach would be to measure the temperature every 5 seconds or so and keep the data for 30 minutes. On these data one can calculate change rates over arbitrary time windows. But this requires a lot of memory.
 
I thought about small windows (e.g. 10 seconds) for which min and max are stored, but this would not save much memory.
 
From a mathematical point of view, the examples you have described can be greatly simplified:
1K/min for 30 mins equals a total change of 30K
5K/min for 5 mins equals a total change of 25K
Obviously there is some adjustment to be made because you have picked round numbers for the example, but it sounds like what you care about is having a single threshold for the total change. This makes sense because taking the integral of a differential results in just a delta.
However, if we disregard the numeric example and just focus on your original question then here are some answers:
First, it has already been mentioned in the comments that one byte every five seconds for half an hour is really not very much memory at all for almost any modern microcontroller, as long as you are able to keep your main RAM turned on between samples, which you usually can.
If however you need to discard the contents of RAM between samples to preserve battery life, then a simpler method is just to calculate one differential at a time.
In your example you want to have a much higher sample rate (every 5 seconds) than the time you wish to calculate the delta over (e.g. 30 minutes). You can reduce your storage needs to a single data point if you make your sample rate equal to your delta period. The single previous value could be stored in a small battery-retained memory (e.g. the backup registers on an STM32).
Obviously if you choose this approach you will have to compromise between accuracy and latency, but maybe 30 seconds would be a suitable timebase for your temperature alarm example.
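A minimal sketch of that "one differential at a time" idea, with illustrative names and a hypothetical 30-second sample period:

// Sketch: sample once per delta period and keep only the previous reading.
class RateOfRiseAlarm {
    private final double thresholdKelvin;         // total rise per period that triggers the alarm
    private double previousSample = Double.NaN;   // would live in battery-retained memory on an MCU

    RateOfRiseAlarm(double thresholdKelvin) {
        this.thresholdKelvin = thresholdKelvin;
    }

    // Called once per delta period, e.g. every 30 seconds.
    boolean onSample(double temperatureKelvin) {
        boolean alarm = !Double.isNaN(previousSample)
                && (temperatureKelvin - previousSample) >= thresholdKelvin;
        previousSample = temperatureKelvin;       // only one value is retained between samples
        return alarm;
    }
}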
You can also set several thresholds of K/sec, and then allocate counters to count how many consecutive times each threshold has been exceeded. This requires only one extra integer per threshold.
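A sketch of those per-threshold counters, again with illustrative names and values:

// One integer per K/s threshold, counting consecutive periods above it.
class ThresholdCounters {
    private final double[] thresholdsKPerSec;   // e.g. {1.0 / 60, 5.0 / 60, 30.0 / 60}
    private final int[] requiredHits;           // consecutive periods before the alarm fires
    private final int[] consecutive;

    ThresholdCounters(double[] thresholdsKPerSec, int[] requiredHits) {
        this.thresholdsKPerSec = thresholdsKPerSec;
        this.requiredHits = requiredHits;
        this.consecutive = new int[thresholdsKPerSec.length];
    }

    // Feed the rise measured over the last period (in K/s); true if any threshold fires.
    boolean onPeriod(double riseKPerSec) {
        boolean alarm = false;
        for (int i = 0; i < thresholdsKPerSec.length; i++) {
            consecutive[i] = riseKPerSec >= thresholdsKPerSec[i] ? consecutive[i] + 1 : 0;
            if (consecutive[i] >= requiredHits[i]) {
                alarm = true;
            }
        }
        return alarm;
    }
}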
In signal processing terms, the procedure you want to perform is:
Apply a low-pass filter to smooth quick variations in the temperature
Take the derivative of its output
The cut-off frequency of the filter would be set according to the time frame. There are 2 ways to do this.
You could apply a FIR (finite impulse response) filter, which is a weighted moving average over the time frame of interest. Naively, this requires a lot of memory, but it's not bad if you do a multi-stage decimation first to reduce your sample rate. It ends up being a little complicated, but you have fine control over the response.
You could apply an IIR (infinite impulse response) filter, which utilizes feedback of the output. The exponential moving average is the simplest example of this. These filters require far less memory -- only a few samples' worth -- but your control over the precise shape of the response is limited. A classic example like the Butterworth filter would probably be great for your application, though.
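As a sketch of the IIR route, an exponential moving average followed by a first difference could look like this (alpha and the sample period are illustrative, untuned values):

// Low-pass filter (exponential moving average) plus a discrete derivative.
class SmoothedRateEstimator {
    private final double alpha;             // smoothing factor in (0, 1]; smaller = heavier smoothing
    private final double samplePeriodSec;
    private double smoothed = Double.NaN;

    SmoothedRateEstimator(double alpha, double samplePeriodSec) {
        this.alpha = alpha;
        this.samplePeriodSec = samplePeriodSec;
    }

    // Feed one raw temperature sample; returns the smoothed rate of change in K/s.
    double update(double temperature) {
        double previous = smoothed;
        smoothed = Double.isNaN(smoothed)
                ? temperature
                : alpha * temperature + (1 - alpha) * smoothed;   // exponential moving average
        return Double.isNaN(previous) ? 0.0 : (smoothed - previous) / samplePeriodSec;
    }
}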

JMeter deviation is high but the report has zero errors

Ramp-up - 400
Threads - 100
Loop count - 10
The deviation is higher than the average value. As far as I know, the deviation should be less than (or about half of) the average, yet the report has 0 errors.
Can anyone tell me what it means when the deviation is higher than the average, and whether developers will need to fix this?
Also, am I setting the ramp-up time correctly? What should the ramp-up period be in general for 100 users? When I run the same test with a ramp-up of 100, I get timeout errors in my report.
As per JMeter Glossary:
Standard Deviation is a measure of the variability of a data set. This is a standard statistical measure. See, for example: Standard Deviation entry at Wikipedia. JMeter calculates the population standard deviation (e.g. STDEVP function in spreadsheets), not the sample standard deviation (e.g. STDEV).
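For reference, a minimal sketch of the population formula (STDEVP-style, divide by n) that the glossary refers to, as opposed to the sample formula (divide by n - 1):

// Population standard deviation over a set of response times.
class StdDev {
    static double population(long[] responseTimes) {
        double mean = 0;
        for (long t : responseTimes) mean += t;
        mean /= responseTimes.length;

        double sumSquares = 0;
        for (long t : responseTimes) {
            double d = t - mean;
            sumSquares += d * d;
        }
        return Math.sqrt(sumSquares / responseTimes.length);   // population: divide by n, not n - 1
    }
}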
As per Understanding Your Reports: Part 3 - Key Statistics Performance Testers Need to Understand
Standard Deviations
The standard deviation is the measurement of the density of the cluster of the data around the sought value (mean). Low standard deviation means that points are closer to the mean. High standard deviation means the points are farther away. This parameter can help determine how reliable the data is. If the standard deviation is high, this means that results vary very much, and the analysis should be conducted accordingly.
If you have a standard deviation higher than the average response time, it basically means that your response times vary a lot: some samplers take much longer than the average while others are well below it. Not sure if there is anything to fix there; maybe it's expected that some samplers last longer than others. For example, a "Logout" operation is normally very quick and "search" operations can last longer, so if your user does multiple searches and only one logout, the deviation will be higher than the average. You can look at e.g. the 90%, 95% and 99% lines of the Aggregate Report listener to see the response times that the given percentage of users got for each and every action (and overall), compare the values with your NFRs or SLAs, and raise issues if necessary.
Per se, a deviation higher than the average doesn't necessarily mean that there is a performance problem; you need to correlate the other metrics with the business requirements.

How to understand the arrival rate of Apache Storm disruptor queues

Regarding Storm metrics: I do not understand the relationship between the send queue arrival rate and the receive queue arrival rate.
For example, with ACKing enabled, if a spout receives one tuple and emits one tuple, will the ratio of the receive queue arrival rate to the send queue arrival rate be 1:2?
Also, if the system is not stable, can this equation change?
Spout instances in Storm do not have a receive queue (only a send queue), so I assume you are referring to bolts?
Although it is a little old, this article by Michael Noll gives a good overview of the internal queues within the workers.
To answer your question: the ratio between the queues will not always be 2:1. The disruptor queues report their metrics averaged over the user-configurable topology.builtin.metrics.bucket.size.secs period, so this will obscure some of the difference. Also, all metrics are subject to a sample ratio, set by the topology.stats.sample.rate config variable, which by default covers only 5% of transferred tuples; this can also cause the reported numbers to be off.
Also, depending on the code in your bolts, 1 input tuple may produce many output tuples so you would have to take this into account in any ratios you were calculating.
You refer to the stability of an equation in your question. The arrival rate is not based on any queuing theory equation; it is simply the number of tuples that are put on the queue in a metrics bucket period divided by the period length in seconds. However, Storm does report a queue sojourn time metric. This is based on a very simple queuing theory equation that is not reliable for unstable queue systems and should be avoided.
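As an illustration (not Storm's internal code), the reported arrival rate amounts to something like this:

// Tuples enqueued during a bucket divided by the bucket length in seconds.
class ArrivalRate {
    private long enqueuedInBucket = 0;

    void onEnqueue() { enqueuedInBucket++; }

    // Called once per metrics bucket, i.e. every topology.builtin.metrics.bucket.size.secs seconds.
    double flush(double bucketSeconds) {
        double rate = enqueuedInBucket / bucketSeconds;   // tuples per second
        enqueuedInBucket = 0;
        return rate;
    }
}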

Performance impact of using setStatsSampleRate/topology.stats.sample.rate

What is the performance impact of setting topology.stats.sample.rate: 1.0 in the YAML?
How does this work?
topology.stats.sample.rate configures the rate at which Storm samples tuples when calculating topology statistics.
The default value in defaults.yaml is 0.05, which means only five out of every 100 events are taken into account.
A value of 1.0 means that statistics are calculated for every tuple.
Is this going to decrease performance? Most likely many will say yes, but since each environment is different, I would say it is better to measure it yourself: increase and decrease the value and measure the throughput of your topology.
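If you want to experiment, one way to flip the setting per topology from code (assuming a recent Storm release; older ones use the backtype.storm package) is:

import org.apache.storm.Config;

public class SampleRateConfig {
    public static Config buildConfig() {
        Config conf = new Config();
        // Exact counts: the metrics code now runs for every single tuple.
        conf.setStatsSampleRate(1.0);
        // Equivalent raw key; 0.05 is the 5% default from defaults.yaml:
        // conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 0.05);
        return conf;
    }
}

The same key can of course be set in the topology or cluster YAML instead; the Java setter just writes the topology.stats.sample.rate entry into the config map.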

Metrics do not decay when no values are reported

I am using Codahale metrics for monitoring purposes. Let's say there is a spike in latency at some point and later no values are reported because there is no traffic; the value in the graph then stays as is (I am using a histogram). At times this gives the impression that the spike persists and might need to be addressed, when it actually means that no values were reported after that point, so the graph doesn't decay. Am I missing a config parameter here, or is this behaviour expected?
The way we update the metrics is
metrics.processingTime.update(processingTime);
So, when there is no traffic, we don't update this metric.
I know that the histogram takes into account data points from the past (over an irregular period of time) in order to present a statistical picture of the data.
When there are no new data points, only the old outlier is taken into account and keeps being averaged over and over.
The meters have the same behaviour, displaying the data through moving averages over 1, 5 and 15 minutes.
One solution in the histogram case is to use HdrHistogram and flush it periodically, as sketched below.
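A minimal sketch of that approach, assuming the org.hdrhistogram:HdrHistogram library (Recorder and getIntervalHistogram come from that library; the wrapper class is illustrative):

import org.HdrHistogram.Histogram;
import org.HdrHistogram.Recorder;

class LatencyRecorder {
    private final Recorder recorder = new Recorder(3);   // 3 significant value digits

    void recordProcessingTime(long micros) {
        recorder.recordValue(micros);
    }

    // Call once per reporting interval: getIntervalHistogram() resets the recorder,
    // so quiet periods produce an empty snapshot instead of repeating an old spike.
    Histogram flush() {
        return recorder.getIntervalHistogram();
    }
}

Another option, if you want to stay within the Codahale API, is to construct the histogram with a time-windowed reservoir (e.g. SlidingTimeWindowReservoir) so that old samples age out of the snapshot.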
