Algorithm for a stable 'download time remaining' in a download window

While displaying the download status in a window, I have information like:
1) Total file size (f)
2) Downloaded file size (f')
3) Current download speed (s)
A naive time-remaining calculation would be (f - f')/s, but this value is way too shaky (6m remaining / 2h remaining / 5m remaining! deja vu?! :)
Is there a calculation that is both more stable and not wildly wrong (e.g. showing 1h even when the download is about to complete)?

We solved a similar problem in the following way. We weren't interested in how fast the download was over the entire time, just roughly how long it was expected to take based on recent activity but, as you say, not so recent that the figures would be jumping all over the place.
The reason we weren't interested in the entire time frame was that a download could do 1M/s for half an hour then switch up to 10M/s for the next ten minutes. That first half hour will drag down the average speed quite severely, despite the fact that you're now honkin' along at quite a pace.
We created a circular buffer with each cell holding the amount downloaded in a 1-second period. The circular buffer size was 300, allowing for 5 minutes of historical data, and every cell was initialized to zero.
We also maintained a total (the sum of all entries in the buffer, so also initially zero) and the count (zero, obviously).
Every second, we would figure out how much data had been downloaded since the last second and then:
subtract the current cell from the total.
put the current figure into that cell and advance the cell pointer.
add that current figure to the total.
increase the count if it wasn't already 300.
update the figure displayed to the user, based on total / count.
Basically, in pseudo-code:
def init (sz):
    buffer = new int[sz]
    for i = 0 to sz - 1:
        buffer[i] = 0
    total = 0
    count = 0
    index = 0
    maxsz = sz

def update (kbps):
    total = total - buffer[index] + kbps
    buffer[index] = kbps
    index = (index + 1) % maxsz
    if count < maxsz:
        count = count + 1
    return total / count
You can change your resolution (1 second) and history (300) to suit your situation, but we found 5 minutes was more than long enough: it smoothed out the irregularities while still adjusting to more permanent changes in a timely fashion.

Smooth s (exponential moving avg. or similar).
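For example, a minimal sketch of smoothing the speed with an exponential moving average, assuming one speed sample per second; alpha is an illustrative choice, not a prescribed value:

def make_speed_smoother(alpha=0.1):
    # Exponential moving average of the download speed.
    # Smaller alpha -> steadier (but slower to react) estimate.
    smoothed = None
    def update(current_speed):
        nonlocal smoothed
        if smoothed is None:
            smoothed = current_speed
        else:
            smoothed = alpha * current_speed + (1 - alpha) * smoothed
        return smoothed
    return update

# once per second:
#   smoother = make_speed_smoother()
#   eta_seconds = (f - f_downloaded) / max(smoother(current_speed), 1e-9)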

I prefer using the average speed over the last 10 seconds and dividing the remaining part by that. Dividing by the current speed is way too unstable, while dividing by the average over the whole download cannot handle permanent speed changes (like another download starting).
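A rough sketch of that approach, assuming one byte count is pushed per second (the names here are illustrative):

from collections import deque

WINDOW = 10                       # seconds of history to average over
samples = deque(maxlen=WINDOW)    # per-second byte counts, newest last

def eta_seconds(bytes_this_second, total_size, downloaded):
    samples.append(bytes_this_second)
    avg_speed = sum(samples) / len(samples)      # bytes/s over the last <= 10 s
    if avg_speed == 0:
        return None                              # stalled, no sensible estimate
    return (total_size - downloaded) / avg_speed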

Why not compute the download speed as an average over the whole download, that is:
s = f' / elapsed time
That way it would smooth out over time.
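As a quick sketch, assuming elapsed time is measured from the start of the download:

import time

start = time.monotonic()

def eta_whole_average(total_size, downloaded):
    elapsed = time.monotonic() - start
    if downloaded == 0:
        return None                          # nothing transferred yet
    s = downloaded / elapsed                 # average speed over the whole download
    return (total_size - downloaded) / s     # seconds remaining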


How do programs estimate how long a process will take to complete?

In some loading bars there will be something like "2 minutes remaining". Does the programmer time how long the process takes on their computer and use that value somehow? Or does the program calculate it all itself? Or is there another method?
This calculation is done by the program itself, because the time to download or install something depends on internet speed, RAM, processor speed, and so on, so it would be hard to have one universally predicted time based on the programmer's computer. Typically it is calculated from how much of the program has already been downloaded compared to the size of the file, taking into account how long it took to download that much data. From there the program extrapolates how much longer it will take to finish the download based on how fast it has gone so far.
Those 'x minutes remaining' interface elements, which (ideally) indicate how much time it will take to complete a certain task, are simply forecasts based on how long that task has taken so far and how much work on that task has been accomplished.
For example, suppose I have an app that will upload a batch of images to a server (all of which are the same size, for simplicity). Here's a general idea of how the code that indicates time remaining will work:
Before we begin, we assign the current time to a variable. Also, at this point the time remaining indicator is not visible. Then, in a for... loop:
for (var i = 0; i < batchOfImages.length; i++)
{
    We upload an image.
    After the image is uploaded, we get the difference between the current time and the start time. This is the total time expended on this task so far.
    We then divide the total time expended by the total number of images uploaded so far, i + 1, to get the amount of time it has taken on average to upload each image so far.
    We then multiply the average upload time per image by the number of images remaining to get the likely amount of time it will take to upload the remaining images.
    We update the text on the indicator to show the amount of time remaining, and we make sure that the indicator is visible.
}
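Roughly the same bookkeeping in Python, with a hypothetical upload() placeholder standing in for the real call:

import time

def upload(image):
    # hypothetical placeholder for the real upload call
    time.sleep(0.1)

def upload_batch(batch_of_images):
    start = time.monotonic()
    for i, image in enumerate(batch_of_images):
        upload(image)
        elapsed = time.monotonic() - start                 # total time spent so far
        avg_per_image = elapsed / (i + 1)                  # average time per image so far
        remaining = avg_per_image * (len(batch_of_images) - (i + 1))
        print(f"about {remaining:.0f} seconds remaining")  # update the indicator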

Throughput calculation in JMeter

Attached is the Summary Report for my tests.
Please help me understand how the throughput value is calculated by JMeter.
For example, how was the throughput of the very first line, 53.1/min, calculated by JMeter, and with which formula?
Also, I wanted to know how the throughput values in the subsequent lines are given in minutes or seconds. For example, the 2nd line has a throughput of 1.6/sec, so how does JMeter decide which time unit to use for these throughput values?
I have tried many websites and got the common reply that throughput is the number of requests per unit of time (seconds, minutes, hours) sent to your server during the test. But that straightforward explanation didn't seem to match the results I see in my report.
Documentation defines Throughput as
requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
So in your case you had 1 request, which took 1129ms, so
Throughput = 1 / 1129ms = 0.00088573959/ms
= 0.00088573959 * 1000/sec = 0.88573959/sec
= 0.88573959 * 60/min = 53.1443754/min, rounded to 53.1/min
For 1 request total time (or elapsed time) is the same as the time of this single operation. For requests executed multiple times, it would be equal to
Throughput = (number of requests) / (average * number of requests) = 1 / average
For instance if you take the last line in your screenshot (with 21 requests), it has an average of 695, so throughput is:
Throughput = 1 / 695ms = 0.0014388489/ms = 1.4388489/sec, rounded to 1.4/sec
In terms of units (sec/min/hour), Summary report does this:
By default it displays throughput in seconds
But if throughput in seconds < 1.0, it will convert it to minutes
If it's still < 1.0, it will convert it to hours
It rounds the value to 1 decimal digit afterwards.
This is why some values are displayed in sec, some in min, and some could be in hours. Some may even have value 0.0, which basically means that throughput was < 0.04
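Sketched in Python, that display rule looks roughly like this (the thresholds follow the description above; the formatting itself is an assumption, not JMeter's actual code):

def format_throughput(requests, total_time_ms):
    per_sec = requests / (total_time_ms / 1000.0)
    if per_sec >= 1.0:
        return f"{per_sec:.1f}/sec"
    per_min = per_sec * 60
    if per_min >= 1.0:
        return f"{per_min:.1f}/min"
    return f"{per_sec * 3600:.1f}/hour"

# format_throughput(1, 1129)      -> '53.1/min'
# format_throughput(21, 21 * 695) -> '1.4/sec'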
I have been messing with this for a while and here is what I had to do in order for my numbers to match what JMeter says:
Loop through the lines in the CSV file, gathering the LOWEST start time for each of the labels you have, and also grab the HIGHEST (timestamp + elapsed time).
Calculate the difference between those in seconds.
Then do number of samples / that difference.
So in excel, the easiest way to do it is get the csv file and add a column for timestamp + elapsed
First sort the block by the timestamp, lowest to highest, then find the first instance of each label and grab that time
Then sort by your new column highest to lowest and grab the first time again for each label
For each label then gather both of these times in a new sheet
A would be the label
B would be the start time
C would be the endtime+elapsed time
D would then be (C-B)/1000 (diff in seconds)
E would then be the number of samples for each label
F would be E/D (samples per second)
G would be F*60 (samples per minute)
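The same procedure can be scripted instead of done in Excel; a rough Python sketch, assuming a standard JMeter CSV/JTL with timeStamp (epoch ms), elapsed (ms) and label columns (the file name is just a placeholder):

import csv
from collections import defaultdict

starts, ends, counts = {}, {}, defaultdict(int)

with open("results.csv", newline="") as fh:          # placeholder path
    for row in csv.DictReader(fh):
        label = row["label"]
        ts = int(row["timeStamp"])                   # request start, epoch ms
        end = ts + int(row["elapsed"])               # start + elapsed = end time
        starts[label] = min(starts.get(label, ts), ts)
        ends[label] = max(ends.get(label, end), end)
        counts[label] += 1

for label in counts:
    seconds = (ends[label] - starts[label]) / 1000.0
    print(label, counts[label] / seconds, "samples/sec")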

If I have a 15 min sliding window, how can I collect daily/weekly aggregates?

I have a 15 min sliding window, and can aggregate at any given time over the data within this window. Due to memory constraints, I can't increase the size of the window. I still think I should be able to get aggregates (like trending items, which is basically a frequency counter, etc.) over a day / a week.
It doesn't have to be a very accurate count; it just needs to surface the top 3-5.
Will running a cron job every 15 mins and putting it into 4 (15min) counters work?
Can I update some kind of rolling counter over the aggregate?
Is there any other method to do this?
My suggestion is an exponentially decaying moving average, as is done for the Unix load average. (See http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/ for an explanation.)
What you do is pick a constant 0 < k < 1 then update every 5 minutes as follows:
moving_average = k * average_over_last_5_min + (1-k) * moving_average
This will behave something like an average over the last 5/k minutes. So if you set k = 1/(24.0 * 60.0 / 5.0) = 0.00347222222222222 then you get roughly a daily moving average. Divide k by 7 and you get roughly a weekly moving average.
The averages won't be exact, but should work perfectly well to identify what things are trending recently.
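A minimal sketch of that update rule, assuming something upstream hands you a dict of item -> count every 5 minutes (producing those counts from the existing 15-minute window is left out here):

from collections import defaultdict

K = 1.0 / (24.0 * 60.0 / 5.0)     # roughly a daily average with 5-minute updates

moving_avg = defaultdict(float)   # item -> exponentially decayed count

def on_interval(counts_last_5_min):
    # decay every tracked item, mixing in the latest window's counts
    for item in set(moving_avg) | set(counts_last_5_min):
        moving_avg[item] = K * counts_last_5_min.get(item, 0) + (1 - K) * moving_avg[item]
    # current top 3 "trending" items
    return sorted(moving_avg, key=moving_avg.get, reverse=True)[:3]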

Calculating number of messages per second in a rolling window?

I have messages coming into my program with millisecond resolution (anywhere from zero to a couple hundred messages a millisecond).
I'd like to do some analysis. Specifically, I want to maintain multiple rolling windows of the message counts, updated as messages come in. For example,
# of messages in last second
# of messages in last minute
# of messages in last half-hour divided by # of messages in last hour
I can't just maintain a simple count like "1,017 messages in last second", since I won't know when a message is older than 1 second and therefore should no longer be in the count...
I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.
What can I do to keep track of these counts in my program so that I can efficiently get these values in real-time?
This is easiest handled by a cyclic buffer.
A cyclic buffer has a fixed number of elements, and a pointer to it. You can add an element to the buffer, and when you do, you increment the pointer to the next element. If you get past the fixed-length buffer you start from the beginning. It's a space and time efficient way to store "last N" items.
Now in your case you could have one cyclic buffer of 1,000 counters, each one counting the number of messages during one millisecond. Adding all the 1,000 counters gives you the total count during the last second. Of course you can optimize the reporting part by incrementally updating the count, i.e. deduct from the count the number you overwrite when you insert, then add the new number.
You can then have another cyclic buffer that has 60 slots and counts the aggregate number of messages in whole seconds; once a second, you take the total count of the millisecond buffer and write the count to the buffer having resolution of seconds, etc.
Here is C-like pseudocode:
int msecbuf[1000]; // per-millisecond message counts, initialized with zeroes
int secbuf[60]; // per-second totals, ditto
int msecptr = 0, secptr = 0;
int count = 0; // messages seen during the current millisecond
int msec_total_ctr = 0; // running sum of msecbuf, i.e. messages in the last second

void msg_received() { count++; }

void every_msec() {
    msec_total_ctr -= msecbuf[msecptr]; // drop the slot we are about to overwrite
    msecbuf[msecptr] = count;
    msec_total_ctr += msecbuf[msecptr];
    count = 0;
    msecptr = (msecptr + 1) % 1000;
}

void every_sec() {
    secbuf[secptr] = msec_total_ctr; // snapshot of the last second's total
    secptr = (secptr + 1) % 60;
}
You want exponential smoothing, otherwise known as an exponential weighted moving average. Take an EWMA of the time since the last message arrived, and then divide that time into a second. You can run several of these with different weights to cover effectively longer time intervals. Effectively, you're using an infinitely long window then, so you don't have to worry about expiring data; the reducing weights do it for you.
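A sketch of that suggestion; ALPHA is an illustrative weight, and you would run several of these with different weights for different effective horizons:

import time

ALPHA = 0.05            # illustrative weight; smaller = longer effective window
last_arrival = None
avg_gap = None          # smoothed seconds between messages

def on_message():
    global last_arrival, avg_gap
    now = time.monotonic()
    if last_arrival is not None:
        gap = now - last_arrival
        avg_gap = gap if avg_gap is None else ALPHA * gap + (1 - ALPHA) * avg_gap
    last_arrival = now

def rate_per_second():
    return 1.0 / avg_gap if avg_gap else 0.0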
For the last millisecond, keep the count. When the millisecond slice rolls over to the next one, reset the count and add it to a millisecond rolling buffer array. If you keep this cumulative, you can extract the number of messages per second with a fixed amount of memory.
When a 0.1-second slice (or some other small value suited to a 1-minute lookup) is done, sum up the last 0.1*1000 items from the rolling buffer array and place that in the next rolling buffer. This way you can keep the millisecond rolling buffer small (1000 items for a 1 s max lookup) and the buffer for looking up the minute small as well (600 items).
You can do the same trick for whole minutes of 0.1-minute intervals. All the questions asked can then be answered by summing (or, when using cumulative values, subtracting two values) a few integers.
The only disadvantage is that the last-second value will change every millisecond, the minute value only every 0.1 s, and the hour value (and derivatives such as the percentage in the last half hour) every 0.1 minute. But at least you keep your memory usage at bay.
Your rolling display window can only update so fast; let's say you want to update it 10 times a second, so for 1 second's worth of data you would need 10 values. Each value would contain the number of messages that showed up in that 1/10 of a second. Let's call these values bins; each bin holds 1/10 of a second's worth of data. Every 100 milliseconds, one of the bins gets discarded and a new bin is set to the number of messages that showed up in those 100 milliseconds.
You would need an array of 36K bins to hold an hour's worth of information about your message rate if you wanted to preserve a precision of 1/10 of a second for the whole hour. But that seems like overkill.
I think it would be more reasonable to have the precision drop off as the time interval gets larger.
Maybe you keep 1 second's worth of data accurate to 100 milliseconds, 1 minute's worth of data accurate to the second, 1 hour's worth of data accurate to the minute, and so on.
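A compact sketch of that cascading-precision idea, using the bin sizes suggested above (100 ms bins for the last second, 1 s bins for the last minute, 1 min bins for the last hour):

from collections import deque

tenths  = deque([0] * 10, maxlen=10)   # last second, 100 ms resolution
seconds = deque([0] * 60, maxlen=60)   # last minute, 1 s resolution
minutes = deque([0] * 60, maxlen=60)   # last hour, 1 min resolution

def every_100ms(msgs_in_last_100ms):
    tenths.append(msgs_in_last_100ms)

def every_second():
    seconds.append(sum(tenths))

def every_minute():
    minutes.append(sum(seconds))

# last second: sum(tenths); last minute: sum(seconds); last hour: sum(minutes)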
I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.
A better idea would be maintaining a linked list of the messages, adding new messages to the head (with a timestamp), and popping them from the tail as they expire. Or even not popping them: just keep a pointer to the oldest message that arrived within the desired timeframe, and advance it towards the head when that message expires (this allows you to keep track of multiple timeframes with one list).
You could compute the count when needed by walking from the tail to the head, or just store the count separately, incrementing it whenever you add a value to the head, and decrementing it whenever you advance the tail.
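A sketch along those lines, using Python's deque in place of a hand-rolled linked list; expired timestamps are dropped from the tail whenever the count is queried:

import time
from collections import deque

class RollingCounter:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.times = deque()                 # arrival timestamps, oldest on the left

    def add(self, now=None):
        self.times.append(now if now is not None else time.monotonic())

    def count(self, now=None):
        now = now if now is not None else time.monotonic()
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()             # expire old messages from the tail
        return len(self.times)

# last_second = RollingCounter(1); last_minute = RollingCounter(60)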

Creating a formula for calculating device "health" based on uptime/reboots

I have a few hundred network devices that check in to our server every 10 minutes. Each device has an embedded clock, counting the seconds and reporting elapsed seconds on every check in to the server.
So, sample data set looks like
CheckinTime Runtime
2010-01-01 02:15:00.000 101500
2010-01-01 02:25:00.000 102100
2010-01-01 02:35:00.000 102700
etc.
If the device reboots, when it checks back into the server, it reports a runtime of 0.
What I'm trying to determine is some sort of quantifiable metric for the device's "health".
If a device has rebooted a lot in the past but has not rebooted in the last xx days, then it is considered healthy, compared to a device that has a big uptime except for the last xx days where it has repeatedly rebooted.
Also, a device that has been up for 30 days and just rebooted, shouldn't be considered "distressed", compared to a device that has continually rebooted every 24 hrs or so for the last xx days.
I've tried multiple ways of calculating the health, using a variety of metrics:
1. average # of reboots
2. max(uptime)
3. avg(uptime)
4. # of reboots in last 24 hrs
5. # of reboots in last 3 days
6. # of reboots in last 7 days
7. # of reboots in last 30 days
Each individual metric only accounts for one aspect of the device health, but doesn't take into account the overall health compared to other devices or to its current state of health.
Any ideas would be GREATLY appreciated.
You could do something like Windows 7's reliability metric - start out at full health (say 10). Every hour / day / checkin cycle, increment the health by (10 - currenthealth) * incrementfactor. Every time the device goes down, subtract a certain percentage.
So, given a crashfactor of 20%/crash and an incrementfactor of 10%/day:
A device that has rebooted a lot in the past but has not rebooted in the last 20 days will have a health of 8.6.
A device with big uptime except for the last 2 days, where it has repeatedly rebooted (5 times), will have a health of 4.1.
A device that has been up for 30 days and just rebooted will have a health of 8.
A device that has continually rebooted every 24 hrs or so for the last 10 days will have a health of 3.9.
To run through an example:
Starting at 10
Day 1: no crash, new health = CurrentHealth + (10 - CurrentHealth)*.1 = 10
Day 2: One crash, new health = currenthealth - currentHealth*.2 = 8
But still increment every day so new health = 8 + (10 - 8)*.1 = 8.2
Day 3: No crash, new health = 8.4
Day 4: Two crashes, new health = 5.8
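The same model in a few lines of Python; the 20%-per-crash and 10%-per-day factors are just the example values used above:

CRASH_FACTOR = 0.2       # each crash removes 20% of current health
DAILY_RECOVERY = 0.1     # each day recovers 10% of the gap back to full health
FULL_HEALTH = 10.0

def next_day(health, crashes):
    for _ in range(crashes):
        health -= health * CRASH_FACTOR
    return health + (FULL_HEALTH - health) * DAILY_RECOVERY

# the worked example above:
h = 10.0
h = next_day(h, 0)   # day 1: 10.0
h = next_day(h, 1)   # day 2: 8.2
h = next_day(h, 0)   # day 3: ~8.4
h = next_day(h, 2)   # day 4: ~5.8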
You might take the reboot count / time for a particular machine and compare that to the mean and standard deviation of the entire population. Machines that fall, say, three standard deviations above the mean (rebooting more often than most) could be flagged.
You could use weighted average uptime and include the current uptime only when it would make the average higher.
The weight would be how recent the uptime is, so that most recent uptimes have the biggest weight.
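One possible reading of that suggestion, sketched below; the linear recency weights and the 'include the current uptime only if it helps' rule are my interpretation rather than a spec:

def weighted_uptime(past_uptimes, current_uptime):
    # past_uptimes ordered oldest -> newest; weight grows with recency
    if not past_uptimes:
        return current_uptime
    weights = list(range(1, len(past_uptimes) + 1))
    weighted_sum = sum(u * w for u, w in zip(past_uptimes, weights))
    avg = weighted_sum / sum(weights)
    # include the in-progress uptime (heaviest weight) only when it raises the average
    cur_w = len(past_uptimes) + 1
    with_current = (weighted_sum + current_uptime * cur_w) / (sum(weights) + cur_w)
    return max(avg, with_current)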
Are you able to break the devices out into groups of similar devices? Then you could compare an individual device to its peers.
Another suggestion is to look into various moving average algorithms. These are supposed to smooth out time-series data as well as highlight trends.
Does it always report a runtime of 0 on reboot? Or something close to zero (less than the former value, anyway)?
You could score this in two ways:
1. The lower the number, the fewer troubles it had.
2. The higher the number, the longer the periods it stayed up.
I guess you need to account for the fact that health can vary, so it can worsen over time. The latest values should therefore have a higher weight than the older ones; this suggests an exponentially growing weight.
The more reboots it had in the last period, the more broken the system could be. But also look at how close together the reboots are: 5 reboots in a day versus 10 reboots in 2 weeks means something very different. So time should be a metric in this formula, as well as the number of reboots.
I guess you need to calculate the density of reboots in the last period.
You can weight by this density simply by dividing: the larger the number you divide by, the lower the result, and so the lower the weight that number carries.
Pseudo code:
function calcHealth(machine) {
    float value = 0;
    float threshold = 800;
    for each (reboot in machine.reboots) {
        reboot.daysPast = time() - reboot.time;
        // the more days past, the lower the value, so the lower the weight
        value += (100 / reboot.daysPast);
    }
    return (value == 0) ? 0 : (threshold / value);
}
You could refine this function by, for example, filtering on a maxDaysPast and playing with the threshold and so on.
This formula is based on the plot f(x) = 100/x. As you can see, for low x values the result is higher than for large x values, and that is how the formula weights daysPast: lower daysPast == lower x == higher weight.
With the value += part the formula counts the reboots, and with the 100/x part it gives each reboot a weight, where the weight is based on time.
At the return, the threshold is divided by the value, because the higher the reboot score, the lower the result must be.
You can use a plotting program or calculator to see the curve of the plot, which is also the curve of the weight of daysPast.
