How to interpret the Throughput metric for Semantic Segmentation? - metrics

I am beginning to train a semantic segmentation model in AWS SageMaker, and it provides the following metrics for the output. I understand mIOU, loss, and pixel accuracy, but I do not know what throughput is or how to interpret it. Please see the image below and let me know if you need additional information.

Throughput is reported in records per second (i.e. images per second). It shows how fast the algorithm can iterate over training or validation data. For example, with a throughput of 30 records/sec it would take a minute to iterate over 1800 images.
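As a rough illustration of how to use the number, the time for one pass over a dataset is just its size divided by the throughput. The sketch below assumes a hypothetical dataset of 1800 images and the 30 records/sec figure above:

// Rough estimate of how long one pass over the data takes, given the
// reported throughput. Dataset size and throughput are example values.
public class ThroughputEstimate {
    public static void main(String[] args) {
        double recordsPerSecond = 30.0; // reported Throughput metric
        int trainingImages = 1800;      // assumed dataset size
        double secondsPerEpoch = trainingImages / recordsPerSecond;
        System.out.printf("Estimated time per epoch: %.0f seconds%n", secondsPerEpoch);
        // prints: Estimated time per epoch: 60 seconds
    }
}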

Related

Algorithm / data structure for rate of change calculation with limited memory

Certain sensors are supposed to trigger a signal based on the rate of change of a value rather than on a fixed threshold.
For instance, heat detectors in fire alarms are supposed to trigger an alarm quicker if the rate of temperature rise is higher: A temperature rise of 1K/min should trigger an alarm after 30 minutes, a rise of 5K/min after 5 minutes and a rise of 30K/min after 30 seconds.
 
I am wondering how this is implemented in embedded systems, where resources are scarce. Is there a clever data structure to minimize the data stored?
 
The naive approach would be to measure the temperature every 5 seconds or so and keep the data for 30 minutes. On these data one can calculate change rates over arbitrary time windows. But this requires a lot of memory.
 
I thought about small windows (e.g. 10 seconds) for which min and max are stored, but this would not save much memory.
 
From a mathematical point of view, the examples you have described can be greatly simplified:
1K/min for 30 mins equals a total change of 30K
5K/min for 5 mins equals a total change of 25K
Obviously there is some adjustment to be made because you have picked round numbers for the example, but it sounds like what you care about is having a single threshold for the total change. This makes sense because taking the integral of a differential results in just a delta.
However, if we disregard the numeric example and just focus on your original question then here are some answers:
First, it has already been mentioned in the comments that one byte every five seconds for half an hour is really not very much memory at all for almost any modern microcontroller, as long as you are able to keep your main RAM turned on between samples, which you usually can.
If however you need to discard the contents of RAM between samples to preserve battery life, then a simpler method is just to calculate one differential at a time.
In your example you want to have a much higher sample rate (every 5 seconds) than the time you wish to calculate the delta over (e.g. 30 mins). You can reduce your storage needs to a single data point if you make your sample rate equal to your delta period. The single previous value could be stored in a small battery-retained memory (e.g. backup registers on an STM32).
Obviously if you choose this approach you will have to compromise between accuracy and latency, but maybe 30 seconds would be a suitable timebase for your temperature alarm example.
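A minimal sketch of that single-stored-value approach, assuming the sampling routine is called once per delta period; the class name, the 15 K threshold, and the calling convention are illustrative, not from the original post:

// Sketch: compare the current reading against a single stored sample taken
// one delta period ago. Only "previous" must survive between samples.
public class DeltaAlarm {
    private static final double THRESHOLD_K = 15.0;  // assumed total-change threshold
    private double previous = Double.NaN;

    // Called once per delta period, e.g. every 30 seconds.
    public boolean sample(double currentTemperature) {
        boolean alarm = !Double.isNaN(previous)
                && (currentTemperature - previous) >= THRESHOLD_K;
        previous = currentTemperature;  // the only value kept between samples
        return alarm;
    }
}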
You can also set several thresholds in K/sec, and then allocate counters to count how many consecutive times each threshold has been exceeded. This requires only one extra integer per threshold.
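Here is a sketch of those per-threshold counters, with the rates and counts derived from the 1 K/min, 5 K/min, and 30 K/min example in the question at a 5-second sample interval; the class structure itself is just illustrative:

// Sketch: one consecutive-exceedance counter per rate threshold.
public class RateThresholdCounters {
    private final double[] thresholds = {0.08, 0.42, 2.5}; // ~1, 5, 30 K/min expressed per 5 s sample
    private final int[] requiredCounts = {360, 60, 6};     // 30 min, 5 min, 30 s worth of samples
    private final int[] counters = new int[thresholds.length];

    // Call once per sample with the change since the previous sample.
    public boolean update(double deltaSinceLastSample) {
        boolean alarm = false;
        for (int i = 0; i < thresholds.length; i++) {
            if (deltaSinceLastSample >= thresholds[i]) {
                counters[i]++;                       // consecutive exceedance continues
                if (counters[i] >= requiredCounts[i]) {
                    alarm = true;
                }
            } else {
                counters[i] = 0;                     // streak broken, reset
            }
        }
        return alarm;
    }
}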
In signal processing terms, the procedure you want to perform is:
Apply a low-pass filter to smooth quick variations in the temperature
Take the derivative of its output
The cut-off frequency of the filter would be set according to the time frame. There are 2 ways to do this.
You could apply a FIR (finite impulse response) filter, which is a weighted moving average over the time frame of interest. Naively, this requires a lot of memory, but it's not bad if you do a multi-stage decimation first to reduce your sample rate. It ends up being a little complicated, but you have fine control over the response.
You could apply an IIR (infinite impulse response) filter, which uses feedback from the output. The exponential moving average is the simplest example of this. These filters require far less memory -- only a few samples' worth -- but your control over the precise shape of the response is limited. A classic design like the Butterworth filter would probably be great for your application, though.
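A minimal sketch of the IIR route, using an exponential moving average as the low-pass stage and then differencing its output to estimate the rate of change; the smoothing factor is an illustrative parameter you would tune to your time frame:

// Sketch: exponential moving average (simple IIR low-pass) followed by a
// first difference, giving a smoothed rate of change per sample interval.
public class SmoothedRate {
    private final double alpha;          // smoothing factor, 0 < alpha <= 1 (assumed, tune to taste)
    private double smoothed = Double.NaN;

    public SmoothedRate(double alpha) {
        this.alpha = alpha;
    }

    // Call once per sample; returns the change of the filtered signal since the last call.
    public double update(double rawValue) {
        if (Double.isNaN(smoothed)) {
            smoothed = rawValue;         // seed the filter with the first sample
        }
        double previousSmoothed = smoothed;
        smoothed = alpha * rawValue + (1 - alpha) * smoothed;
        return smoothed - previousSmoothed;   // discrete derivative of the filtered signal
    }
}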

JMeter deviation is higher than the average but the report has zero errors

Ramp-up: 400
Threads: 100
Loop count: 10
The deviation is higher than the average value. As far as I know, the deviation should be less than (or about half of) the average, yet the report has 0 errors.
Can anyone tell me what it means when the deviation is higher, and whether developers need to fix this?
Also, am I setting the ramp-up time correctly? What should the ramp-up period be in general for 100 users? When I set the ramp-up to 100 for the same input, I get timeout errors in my report.
As per JMeter Glossary:
Standard Deviation is a measure of the variability of a data set. This is a standard statistical measure. See, for example: Standard Deviation entry at Wikipedia. JMeter calculates the population standard deviation (e.g. STDEVP function in spreadsheets), not the sample standard deviation (e.g. STDEV).
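For reference, a small sketch of the population standard deviation that the glossary describes (the STDEVP variant, dividing by n rather than by n - 1 as the sample standard deviation does):

// Sketch: population standard deviation, the variant JMeter reports.
public class StdDev {
    public static double population(double[] values) {
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;

        double sumSq = 0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / values.length);  // divide by n (STDEVP), not n - 1 (STDEV)
    }
}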
As per Understanding Your Reports: Part 3 - Key Statistics Performance Testers Need to Understand
Standard Deviations
The standard deviation is the measurement of the density of the cluster of the data around the sought value (mean). Low standard deviation means that points are closer to the mean. High standard deviation means the points are farther away. This parameter can help determine how reliable the data is. If the standard deviation is high, this means that results vary very much, and the analysis should be conducted accordingly.
If your standard deviation is higher than the average response time, it basically means that the response times are spread very widely around the average, typically because some samplers take far longer than the others. Not sure there is anything to fix there; it may simply be expected that some samplers last longer than others. For example, a "Logout" operation is normally very quick while "search" operations can take longer, so if your user does multiple searches and only one logout, the deviation will be high relative to the average. You can look at e.g. the 90%, 95%, and 99% lines of the Aggregate Report listener to see the response time that the given percentage of users stays under, for each action and overall, compare those values with your NFRs or SLAs, and raise issues if necessary.
A deviation higher than the average does not per se mean that there is a performance problem; you need to correlate the other metrics with the business requirements.

Specific Cache Hit Rate calculation

Scenario:
Suppose we have an infinite cache memory size. Caching is limited only by a timeout; the value of this timeout is half an hour. The cache is initially empty.
Problem:
We have 50,000 distinct requests. Our system queries randomly at a rate of 15 requests/second, i.e. 27,000 requests in half an hour. What kind of curve or average value of cache hit rate could we expect for the first 5 hours?
Note: This scenario is fixed. I need an approach to find out the hit rate. If you think the tag is wrong, please suggest an appropriate tag.
I think you're right and this is a math question (certainly not a programming problem).
One approach is to consider the extremes: what is the hit rate for the first query when the system starts running? For the second query? After one second? After 10? After a minute? And what is the likelihood that any random query will be found in the cache once the system has been running a long time?
These are a few specific values, and together they give you a curve. I don't think great numeric precision is necessary; the long-term average and the shape of the curve are more interesting.
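One way to get both the curve and the long-term average is a quick Monte Carlo simulation of the stated scenario. The sketch below assumes uniformly random, independent requests over the 50,000 keys and an expire-after-write policy (an entry lives exactly 30 minutes from the moment it was cached, with no refresh on hits). Under those assumptions the long-run hit rate works out to roughly λT / (1 + λT) ≈ 0.35, with λ = 15/50,000 requests per second per key and T = 1800 s; the simulation shows how the curve climbs towards that value over the first 5 hours:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Monte Carlo sketch: 50,000 distinct keys requested uniformly at random,
// 15 requests/second, entries expiring 30 minutes after they were cached.
// The expire-after-write policy and uniform access are my assumptions.
public class CacheHitRateSim {
    public static void main(String[] args) {
        final int distinctKeys = 50_000;
        final int requestsPerSecond = 15;
        final long ttlSeconds = 30 * 60;
        final long totalSeconds = 5 * 60 * 60;

        Map<Integer, Long> cachedAt = new HashMap<>(); // key -> second it was cached
        Random random = new Random(42);

        long hits = 0, requests = 0;
        for (long second = 0; second < totalSeconds; second++) {
            for (int r = 0; r < requestsPerSecond; r++) {
                int key = random.nextInt(distinctKeys);
                Long insertedAt = cachedAt.get(key);
                requests++;
                if (insertedAt != null && second - insertedAt < ttlSeconds) {
                    hits++;                      // still within the 30-minute timeout
                } else {
                    cachedAt.put(key, second);   // miss: (re)cache the key now
                }
            }
            if ((second + 1) % ttlSeconds == 0) {
                System.out.printf("after %d min: hit rate so far = %.3f%n",
                        (second + 1) / 60, (double) hits / requests);
            }
        }
    }
}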

Suppress jitter in measured values

In my application I receive measurement data from an external library: a dataset of 20 3D points, 30 times per second. I've noticed some jitter in the data and I'm looking for a way to suppress, or at least flatten, peaks in the data stream without significantly slowing down the whole system. Unfortunately I have to rely on the last dataset I receive, so I can't record the data (say, in a queue) and filter out "bad" values afterwards. But I could record data and try fitting an incoming value to the recorded data.
Is there any proven and reliable algorithm for this problem?
Thanks in advance
FUX

Metrics don't decay when no values are reported

I am using Codahale metrics for monitoring purposes. Let's say there is a spike in latency at some point, and later no values are reported because there is no traffic; the value in the graph then stays as is (I am using a histogram). At times this gives the impression that the spike persists and needs to be addressed, when it actually means that no values were reported after that point, so the graph doesn't decay. Am I missing any config parameter in this case, or is this behaviour expected?
The way we update the metrics is
metrics.processingTime.update(processingTime);
So, when there is no traffic, we don't update this metric.
I know that the histogram takes into consideration datapoints from the past (for an irregular period of time) in order to display a statistical image of the data.
When there are no new datapoints, only the old outlier keeps being taken into consideration and averaged over and over.
The meters have the same behavior, displaying the data through moving averages over 1, 5, and 15 minutes.
The solution in the histogram case is to use HdrHistogram and flush it periodically.
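A sketch of that "flush periodically" idea using HdrHistogram's Recorder; the reporting interval, the millisecond unit, and the way the snapshot is printed are my assumptions, not part of the original setup:

import org.HdrHistogram.Histogram;
import org.HdrHistogram.Recorder;

// Sketch: record latencies into an HdrHistogram Recorder and flush it on a
// fixed interval, so each snapshot only contains values from that interval
// and an old spike cannot linger in the graph.
public class IntervalLatencyRecorder {
    private final Recorder recorder = new Recorder(3); // 3 significant value digits

    public void update(long processingTimeMillis) {
        recorder.recordValue(processingTimeMillis);
    }

    // Call from a scheduled task, e.g. once a minute.
    public void flushAndReport() {
        Histogram interval = recorder.getIntervalHistogram(); // returns and resets the interval data
        if (interval.getTotalCount() == 0) {
            System.out.println("no traffic in this interval");
            return;
        }
        System.out.printf("p99=%d max=%d (last interval only)%n",
                interval.getValueAtPercentile(99.0), interval.getMaxValue());
    }
}

An alternative, if you prefer to stay within Dropwizard/Codahale Metrics itself, is a histogram backed by a SlidingTimeWindowReservoir, which only keeps values from a fixed recent window.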
