I'm trying to fit a PyMC3 model to some data regarding sales over time. Here's a brief description :
N salespeople each sell some number of widgets per week
We assume each salesperson sells widgets at a different mean rate per week, and call this beta_i for salesperson i
Our observed data is assumed to be ~Poisson(beta_i).
Weekly average sales data is plotted here in a histogram, with a log-normal fit on top, to give you an idea of the distribution of weekly widget sales by salesperson.
In this first scenario, I get what I think are a reasonable set of betas, although they don't look particularly log-normal :
Because we are hoping to infer something about an underlying trend shared by all salespeople (something analogous to "the economy"), we tried adding something. Our first attempt had "the economy" be just a linear function of time, starting at an intercept value of 1 and having derivative gamma > 0 ( gamma was half-normal with sd=0.5 ). We then had our data ~Poisson(beta_i * (1 + gamma)). In this scenario, betas didn't shift much, and we did infer something about "the economy", though it was a pretty weak effect.
I'm hoping to replace this with a random walk or a Gaussian process, to allow "the economy" to vary somewhat smoothly in time but to have an arbitrary shape. Ideally it would start at a value of 0, and then go wherever it needs to go to capture the underlying trend shared by all salespeople, with the data once again ~Poisson(beta_i * (1 + gamma)). Here's our model.
import pymc3 as pm

with pm.Model() as model:
    # Salesperson base rate of selling widgets
    beta_ = pm.Lognormal("beta", mu=mu_hat, sd=sd_hat, shape=(1, n_salespeople))
    # mu_hat and sd_hat were estimated by fitting a log-normal to weekly sales data
    # Economy
    gamma_ = pm.GaussianRandomWalk("gamma", mu=0, sd=1e-6, shape=(n_weeks, 1))
    # Effects
    base_rate = beta_
    economy = 1 + gamma_
    # Observed
    lambda_ = base_rate * economy
    y = pm.Poisson("y", mu=lambda_, observed=observed_sales + 1e-7)
where observed_sales is an integer array of the number of sales made, of shape (n_weeks, n_salespeople).
First, I'm not sure I'm specifying this model correctly. Without the "economy", I infer a reasonable set of betas ( although it doesn't look log-normal, as in the second screenshot ). Then, the random walk we get back is not at all smooth, no matter how small the sd gets; more often than not, for reasons I'm unsure about, I get a message about "Mass matrix contains zeros on the diagonal.". Finally, even at the beginning, I was getting infinite probabilities if I didn't add a small factor to the observed data... Why is that ?
So, TL; DR : I'm fairly new to probabilistic programming, and I'm fairly sure something is going wrong but I'm not sure what. Any input much, much appreciated !
I've collected some data from a potentiometer using an Arduino microcontroller. Here is the data which was sampled at 500 Hz (it's a lot of data):
http://cl.ly/3D3s1U3m1R1T?_ga=1.178935463.2093327149.1426657579
If you zoom in you can see that essentially I have a pot that just rotates back and forth, i.e., I should see a linear increase and then a linear decrease. While the general shape of the data affirms this, almost every single time there are some really freaking annoying (sometimes surprisingly wide) spikes that get in the way of a really nice shape. Is there any way I can make some type of algorithm or filter which fixes this? I tried a median filter, and using the percentiles, but neither worked. I mean, I feel like it shouldn't be the hardest thing, because I can clearly see what it should look like (basically the minimum of where the spikes occur), but for some reason everything I try fails miserably or at least loses the integrity of the original data.
I'd appreciate any help I can get with this.
There are many ways to tackle your problem. However none of them will ever be perfect. I'll give you 2 approaches here.
1) Moving average (low pass filter)
In Matlab, one easy way to "low pass" filter your data without having to explicitly use FFT is to use the filter function (available in the base package, you do not need any specific toolbox).
You create a kernel for the filter, and apply it twice (once in each direction), to cancel the phase shift introduced. This is in effect a "Moving Average" filter with zero phase shift.
The size (length) of the kernel will control how heavy the averaging process will be.
So for example, 2 different filter lengths:
n = 100 ; %// length of the filter
kernel = ones(1,n)./n ;
q1 = filter( kernel , 1 , fliplr(p) ) ; %// apply the filter in one direction
q1 = filter( kernel , 1 , fliplr(q1) ) ; %// re-apply in the opposite direction to cancel phase shift
n = 500 ; %// length of the filter
kernel = ones(1,n)./n ;
q2 = filter( kernel , 1 , fliplr(filter( kernel , 1 , fliplr(p) )) ) ; %// same as above, in one line
Will produce on your data:
As you can see, each filter size has its pros and cons. The more you filter, the more of your spikes you cancel, but the more you deform your original signal. It is up to you to find your optimum settings.
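For anyone following along in Python rather than Matlab, the forward-backward trick above can be sketched with NumPy. This is only a minimal sketch: the function names are made up for illustration, and the causal pass assumes a zero initial state, which is what filter does by default.

```python
import numpy as np

def causal_moving_average(x, n):
    """Length-n moving average that only looks at past samples (zero initial state)."""
    kernel = np.ones(n) / n
    # Full convolution truncated to the input length behaves like a causal FIR filter.
    return np.convolve(x, kernel)[: len(x)]

def zero_phase_moving_average(x, n):
    """Apply the causal filter backward, then forward, so the phase shifts cancel."""
    x = np.asarray(x, dtype=float)
    backward = causal_moving_average(x[::-1], n)   # filter the reversed signal
    return causal_moving_average(backward[::-1], n)  # flip back and filter again
```

As in the Matlab version, a larger n removes more of the spikes and also rounds off more of the underlying signal.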
2) Search derivative anomalies
This is a different approach. You can observe on your signal that the spikes are mostly sudden: the change of value of your signal is rapid, and luckily faster than the "normal" rate of change of your desired signal. This means that you can calculate the derivative of your signal and identify all the spikes (the derivative will be much higher than for the rest of the curve).
Since this only identifies the "beginning" and "end" of the spikes (not the occasional plateau in the middle), we will need to extend the zone identified as faulty by this method a little bit.
When the identification of faulty data is done, you just discard these data points and re-interpolate your curve over the original interval (taking support on the points you have left).
%% // Method 2 - Reinterpolation of cancelled data
%// OPTIONAL slightly smooth the initial data to get a cleaner derivative
n = 10 ; kernel = ones(1,n)./n ;
ps = filter( kernel , 1 , fliplr(filter( kernel , 1 , fliplr(p) )) ) ;
%// Identify the derivative anomalies (too high or too low)
dp = [0 diff(ps)] ; %// calculate the simplest form of derivative (just the difference between consecutive points)
dpos = dp >= (std(dp)/2) ; %// identify positive derivative above a certain threshold (I choose the STD but you could choose something else)
dneg = dp <= -(std(dp)/2) ; %// identify negative derivative above the threshold
ixbad = dpos | dneg ; %// prepare a global vector of indices to cancel
%// This will cancel "nPtsOut" on the RIGHT of each POSITIVE derivative
%// point identified, and "nPtsOut" on the LEFT of each NEGATIVE derivative
nPtsOut = 100 ; %// decide how many points after/before spikes we are going to cancel
for ii=1:nPtsOut
    ixbad = ixbad | circshift( dpos , [0 ii]) | circshift( dneg , [0 -ii]) ;
end
%// now we just reinterpolate the missing gaps
xp = 1:length(p) ; %// prepare a base for reinterpolation
pi = interp1( xp(~ixbad) , p(~ixbad) , xp ) ; %// do the reinterpolation
This will produce:
The red signal is the result of the above moving average, the green signal is the result of the derivative approach.
There are also settings you can change to adjust this result (the threshold for the derivative, 'nPtsOut' and even the initial smoothing of the data).
As you can see, for the same amount of spike cancellation as the moving average method, it respects the integrity of the initial data a bit more. However, it is not perfect either, and some intervals will still be deformed. But as I said at the beginning, no method is ever perfect.
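The same idea (flag samples with an abnormal derivative, widen the flagged zone, then re-interpolate over it) can be sketched in Python with NumPy. This is a simplified variant, not a line-by-line port of the Matlab above: it takes an explicit threshold, widens symmetrically around every flagged sample, and does not wrap around the ends. The function name and parameters are hypothetical.

```python
import numpy as np

def despike(p, threshold, n_pad):
    """Drop samples around large jumps and linearly re-interpolate across the gaps."""
    p = np.asarray(p, dtype=float)
    dp = np.diff(p, prepend=p[0])          # simplest derivative: consecutive difference
    jumps = np.abs(dp) >= threshold        # samples where the signal changes too fast

    bad = jumps.copy()
    for idx in np.flatnonzero(jumps):      # widen each flagged sample by n_pad on both sides
        bad[max(idx - n_pad, 0): idx + n_pad + 1] = True

    x = np.arange(len(p))
    good = ~bad
    return np.interp(x, x[good], p[good])  # rebuild the curve from the samples that are left
```

The threshold and n_pad play the same role as the derivative threshold and nPtsOut above, and need the same kind of tuning.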
It seems you have large spikes near the maximum and minimum points of your pot. You can limit the range of your valid data to between 200 and 300, for instance.
Another option is a 1st order low pass filter like this one
alpha = 0.01 %parameter to tune!
p_filtered(1) = p(1);
for i=2:length(p)
    p_filtered(i) = alpha*p(i) + (1-alpha)* p_filtered(i-1);
end
The noise spikes are being caused by the POT’s wiper bouncing along the resistive track as the knob is turned. This is a common problem with them. In future you should look at adding a 0.1uF capacitor to the POT output and this should fix the problem.
With your current data the simplest option is to just do a simple moving average and visually tune the number of samples averaged until the spikes are sufficiently suppressed while not affecting the underlying data. Note that a moving average is just a low pass filter with a sinc frequency response.
The normal way to post-process this sort of data is to do an FFT (using an appropriate windowing function), zero out the noise values above the signal of interest and then take an inverse FFT. This is also just lowpass filtering (with a sinc * windowing function weighted moving average), but you use the insight provided by the FFT to select your cutoff frequency. If you’re not comfortable with the maths involved in doing this then just go with the simple moving average filter. It should be fine for your needs.
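A bare-bones version of the FFT route might look like the following in Python. It is a sketch under assumptions: it skips the windowing step entirely and simply zeroes every bin above a cutoff you choose after looking at the spectrum. The function name and the cutoff_hz parameter are made up for illustration.

```python
import numpy as np

def fft_lowpass(signal, fs, cutoff_hz):
    """Zero all frequency bins above cutoff_hz and transform back (no windowing)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)                      # one-sided spectrum of a real signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)    # frequency of each bin in Hz
    spectrum[freqs > cutoff_hz] = 0.0                   # discard everything above the cutoff
    return np.fft.irfft(spectrum, n=len(signal))
```

With data sampled at 500 Hz you would pass fs=500 and pick cutoff_hz just above the frequency content of the pot's back-and-forth motion.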
My fan has 24 speed steps. Thermal shutdown is at 105°C, I think. Idle temperature is about 75°C. Is it a good algorithm to take a lower and an upper temperature bound and divide that range by the n speed steps?
EDIT: ATM I use 2 loops and an up_threshold of 85°C, but that was before I knew about the 24 speed steps:
error |= ec_read(EC_RTMP, &ec_rtmp);
if ( ( ec_rtmp < FAN_UPTHRESHOLD_TEMP && sloop < 0 ) ||
     ( ec_rtmp < FAN_UPTHRESHOLD_TEMP && sloop == FAN_LOOP ) ||
     ( ec_rtmp < FAN_UPTHRESHOLD_TEMP && speed_switch == 1 )
   )
{
    speed_switch = 1;
    sloop = FAN_LOOP; // 20 * 10 sec
    printk("Temp %dC: disabling fan\n", ec_rtmp);
    set_fan_disabled();
    queue_delayed_work( my_workqueue, &work_object, FAN_JIFFIES_MS*HZ );
} else
{
    speed_switch = 0;
    printk("Temp %dC: enable fan\n", ec_rtmp);
    set_fan_enable();
    queue_delayed_work( my_workqueue, &work_object, 2*FAN_JIFFIES_MS*HZ );
}
EDIT: I've found a good source code: http://code.google.com/p/eeepc-fancontrol/wiki/Formular
You will need to consider many factors. Firstly, you don't want the fan constantly bouncing between two different steps, so a common trick is to change the fan speed only on a time-based interval, or only if the temperature crosses two boundaries beyond where it was when the fan speed last changed.
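To make the "divide the range into steps, but don't bounce" idea concrete, here is a small Python sketch (not kernel code). The 75°C and 100°C bounds and the two-step hysteresis are assumptions picked from the numbers in the question, and both function names are hypothetical.

```python
def target_step(temp_c, low_c=75.0, high_c=100.0, n_steps=24):
    """Map a temperature linearly onto fan steps 0 .. n_steps-1."""
    if temp_c <= low_c:
        return 0
    if temp_c >= high_c:
        return n_steps - 1
    fraction = (temp_c - low_c) / (high_c - low_c)
    return int(fraction * (n_steps - 1))

def next_step(current_step, temp_c, hysteresis=2, **bounds):
    """Only move the fan when the wanted step is at least `hysteresis` steps away."""
    wanted = target_step(temp_c, **bounds)
    if abs(wanted - current_step) >= hysteresis:
        return wanted
    return current_step
```

With these defaults, a fan sitting at step 11 stays there at 89°C (the table asks for step 12, only one step away) and jumps to step 18 at 95°C.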
If your goal is just to stop the laptop from getting any hotter, then using a table of speed steps will be mostly suitable, but it won't be ideal, and might leave the laptop hotter than it otherwise needs to be. Imagine if your fan was always one or two settings slower than the current heat output required. What I'm getting at is that fan speed should be related to the change in temperature, NOT directly to temperature. At the same time, don't totally reject temperature: you also need a threshold table that says the fan must be at least speed X when temperature is over Y.
So design your system based on temperature gain / loss (delta) over a time interval rather than temperature at a given point in time.
Also, one other thing to consider is that fans generally don't increase linearly in cooling with RPM. They usually follow a bell curve for efficiency: they ramp up towards peak cooling efficiency (vs RPM), and then as you go to higher RPM they become less efficient at cooling. You might very well find that the last 10% of fan RPM adds several dB of noise but does not do much more in the way of removing heat.
I would suggest using a minimum fan speed that is just below where you would generally like the fan running at in normal quiet conditions. Rather than just going 10%, 20%, 30%.. 90%, 100% fan RPM, I would say start at 40% (or what feels best for you) and then at this speed see what your new idle temperature is, then use that as your base point for increasing the remainder of the fan speed.
There is no perfect generic solution for this problem; you will get something that could always be improved upon, based on the heat output compared to your current interpretation of what noise is costly. As such, you should look to implement different sets of settings for quiet, office or gaming profiles (based on roughly what your priorities and system load will be for a given situation), much like cars which have a sports mode or an off-road setting.
In many applications, we have some progress bar for a file download, for a compression task, for a search, etc. We all often use progress bars to let users know something is happening. And if we know some details like just how much work has been done and how much is left to do, we can even give a time estimate, often by extrapolating from how much time it's taken to get to the current progress level.
(source: jameslao.com)
But we've also seen programs in which this Time Left "ETA" display is just comically bad. It claims a file copy will be done in 20 seconds, then one second later it says it's going to take 4 days, then it flickers again to be 20 minutes. It's not only unhelpful, it's confusing!
The reason the ETA varies so much is that the progress rate itself can vary and the programmer's math can be overly sensitive.
Apple sidesteps this by just avoiding any accurate prediction and just giving vague estimates!
(source: autodesk.com)
That's annoying too, do I have time for a quick break, or is my task going to be done in 2 more seconds? If the prediction is too fuzzy, it's pointless to make any prediction at all.
Easy but wrong methods
As a first-pass ETA computation, probably we all just write a function like this: if p is the fraction that's done already, and t is the time it's taken so far, we output t*(1-p)/p as the estimate of how long it's going to take to finish. This simple ratio works "OK", but it's also terrible, especially at the end of the computation. If your slow download speed keeps a copy slowly advancing overnight, and finally in the morning something kicks in and the copy starts going at full speed, 100X faster, your ETA at 90% done may say "1 hour", and 10 seconds later you're at 95% and the ETA will say "30 minutes", which is clearly an embarrassingly poor guess. In this case "10 seconds" is a much, much, much better estimate.
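To put numbers on that overnight example, here is the ratio as a tiny Python function. The name naive_eta is just for illustration, and the 9 hours of elapsed time is an assumption chosen so that the 90% estimate comes out at one hour.

```python
def naive_eta(elapsed_s, fraction_done):
    """Remaining time assuming the average rate so far continues: t * (1 - p) / p."""
    return elapsed_s * (1.0 - fraction_done) / fraction_done

nine_hours = 9 * 3600
eta_at_90 = naive_eta(nine_hours, 0.90)        # about 3600 s, i.e. "1 hour"
eta_at_95 = naive_eta(nine_hours + 10, 0.95)   # about 1706 s, i.e. roughly "30 minutes"
```

Ten seconds of fast progress barely move the average rate, which is why the estimate stays so far off.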
When this happens you may think to change the computation to use recent speed, not average speed, to estimate ETA. You take the average download rate or completion rate over the last 10 seconds, and use that rate to project how long completion will be. That performs quite well in the previous overnight-download-which-sped-up-at-the-end example, since it will give very good final completion estimates at the end. But this still has big problems.. it causes your ETA to bounce wildly when your rate varies quickly over a short period of time, and you get the "done in 20 seconds, done in 2 hours, done in 2 seconds, done in 30 minutes" rapid display of programming shame.
The actual question:
What is the best way to compute an estimated time of completion of a task, given the time history of the computation? I am not looking for links to GUI toolkits or Qt libraries. I'm asking about the algorithm to generate the most sane and accurate completion time estimates.
Have you had success with math formulas? Some kind of averaging, maybe by using the mean of the rate over 10 seconds with the rate over 1 minute with the rate over 1 hour? Some kind of artificial filtering like "if my new estimate varies too much from the previous estimate, tone it down, don't let it bounce too much"? Some kind of fancy history analysis where you integrate progress versus time advancement to find standard deviation of rate to give statistical error metrics on completion?
What have you tried, and what works best?
Original Answer
The company that created this site apparently makes a scheduling system that answers this question in the context of employees writing code. The way it works is with Monte Carlo simulation of future based on the past.
Appendix: Explanation of Monte Carlo
This is how this algorithm would work in your situation:
You model your task as a sequence of microtasks, say 1000 of them. Suppose an hour later you completed 100 of them. Now you run the simulation for the remaining 900 steps by randomly selecting 90 completed microtasks, adding their times and multiplying by 10. Here you have an estimate; repeat N times and you have N estimates for the time remaining. Note the average between these estimates will be about 9 hours -- no surprises here. But by presenting the resulting distribution to the user you'll honestly communicate to him the odds, e.g. 'with the probability 90% this will take another 3-15 hours'
This algorithm, by definition, produces a complete result if the task in question can be modeled as a bunch of independent, random microtasks. You can gain a better answer only if you know how the task deviates from this model: for example, installers typically have a download/unpacking/installing tasklist, and the speed of one cannot predict the other.
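A minimal sketch of that simulation in Python, assuming you have kept the duration of every completed microtask in a list. The function names, the number of runs and the 5%/95% band are all arbitrary choices for illustration.

```python
import random

def simulate_remaining(completed_times, remaining_steps, sample_size, n_runs=1000, rng=None):
    """Resample completed microtask times to get n_runs estimates of the time remaining."""
    rng = rng or random.Random()
    scale = remaining_steps / sample_size            # e.g. 900 remaining / 90 sampled = 10
    estimates = []
    for _ in range(n_runs):
        draw = [rng.choice(completed_times) for _ in range(sample_size)]
        estimates.append(sum(draw) * scale)
    estimates.sort()
    return estimates

def percentile_band(sorted_estimates, lower=0.05, upper=0.95):
    """Pick the values at the given quantiles of an already sorted list."""
    last = len(sorted_estimates) - 1
    return sorted_estimates[int(lower * last)], sorted_estimates[int(upper * last)]
```

Calling percentile_band(simulate_remaining(times, 900, 90)) then gives a range you can show to the user instead of a single number.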
Appendix: Simplifying Monte Carlo
I'm not a statistics guru, but I think if you look closer into the simulation in this method, it will always return a normal distribution as a sum of large number of independent random variables. Therefore, you don't need to perform it at all. In fact, you don't even need to store all the completed times, since you'll only need their sum and sum of their squares.
In maybe not very standard notation,
sigma = sqrt ( sum_of_times_squared - sum_of_times^2 / elapsedSteps ) // standard deviation of the elapsed total
scaling = 900/100 // that is (totalSteps - elapsedSteps) / elapsedSteps
lowerBound = sum_of_times*scaling - 3*sigma*sqrt(scaling)
upperBound = sum_of_times*scaling + 3*sigma*sqrt(scaling)
With this, you can output the message saying that the thing will end between [lowerBound, upperBound] from now with some fixed probability (about 99.7% for a 3-sigma band under the normal approximation; use 2*sigma instead for roughly 95%).
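Here is the same shortcut as a small Python function, keeping only the running sum and sum of squares. One assumption to flag: this sketch takes sigma to be the standard deviation of the elapsed total, sqrt(sum_sq - sum^2 / n), so that sigma * sqrt(scaling) is the standard deviation of the remaining total. The function name and the z parameter are hypothetical.

```python
import math

def normal_eta_band(sum_t, sum_t2, elapsed_steps, total_steps, z=3.0):
    """Return (lower, upper) bounds on the remaining time using a normal approximation."""
    scaling = (total_steps - elapsed_steps) / elapsed_steps
    # Variance of the elapsed total; clamp at 0 to guard against rounding noise.
    variance_of_sum = max(sum_t2 - sum_t ** 2 / elapsed_steps, 0.0)
    sigma = math.sqrt(variance_of_sum)
    center = sum_t * scaling
    half_width = z * sigma * math.sqrt(scaling)
    return center - half_width, center + half_width
```

For 100 completed microtasks that alternate between 30 s and 42 s, with 900 still to go, this gives a band of 32400 s plus or minus 540 s.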
Here's what I've found works well! For the first 50% of the task, you assume the rate is constant and extrapolate. The time prediction is very stable and doesn't bounce much.
Once you pass 50%, you switch computation strategy. You take the fraction of the job left to do (1-p), then look back in time in a history of your own progress, and find (by binary search and linear interpolation) how long it's taken you to do the last (1-p) percentage and use that as your time estimate completion.
So if you're now 71% done, you have 29% remaining. You look back in your history and find how long ago you were at (71-29=42%) completion. Report that time as your ETA.
This is naturally adaptive. If you have X amount of work to do, it looks only at the time it took to do the X amount of work. At the end when you're at 99% done, it's using only very fresh, very recent data for the estimate.
It's not perfect of course but it smoothly changes and is especially accurate at the very end when it's most useful.
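In case it helps, here is roughly how that lookup can be written in Python with bisect. It is a sketch under assumptions: progress is recorded as two parallel lists (timestamps and fractions done, both increasing, with the first entry at the start of the task and the last entry being "now"), and the function name is made up.

```python
from bisect import bisect_left

def lookback_eta(times, fractions):
    """ETA from a progress history; the last entries are the current time and progress."""
    now, p = times[-1], fractions[-1]
    if p <= 0.0:
        return None                                   # nothing done yet, no estimate
    if p < 0.5:
        return (now - times[0]) * (1.0 - p) / p       # first half: constant-rate extrapolation
    target = p - (1.0 - p)                            # progress level "remaining work" ago
    i = bisect_left(fractions, target)                # binary search in the history
    if i == 0:
        return now - times[0]
    t0, f0 = times[i - 1], fractions[i - 1]
    t1, f1 = times[i], fractions[i]
    # Linear interpolation between the two samples that bracket the target.
    t_then = t1 if f1 == f0 else t0 + (t1 - t0) * (target - f0) / (f1 - f0)
    return now - t_then
```

With samples at 0%, 20%, 42%, 60% and 71% taken every 10 seconds, the estimate at 71% is about 20 seconds, the time it took to get from 42% to 71%.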
Whilst all the examples are valid, for the specific case of 'time left to download', I thought it would be a good idea to look at existing open source projects to see what they do.
From what I can see, Mozilla Firefox is the best at estimating the time remaining.
Mozilla Firefox
Firefox keeps a track of the last estimate for time remaining, and by using this and the current estimate for time remaining, it performs a smoothing function on the time.
See the ETA code here. This uses a 'speed' which is previously calculated here and is a smoothed average of the last 10 readings.
This is a little complex, so to paraphrase:
Take a smoothed average of the speed based 90% on the previous speed and 10% on the new speed.
With this smoothed average speed work out the estimated time remaining.
Use this estimated time remaining, and the previous estimated time remaining, to create a new estimated time remaining (in order to avoid jumping)
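To illustrate the three steps, here is a rough Python sketch. It is not Firefox's actual code: the 90/10 speed blend comes from the paraphrase above, while the 50/50 blend between the old and new ETA is purely an assumption for the example, as are the class and parameter names.

```python
class SmoothedEta:
    """Smooth the speed first, then smooth the resulting ETA to avoid jumps."""

    def __init__(self, speed_weight=0.1, eta_weight=0.5):
        self.speed_weight = speed_weight   # share given to the newest speed reading
        self.eta_weight = eta_weight       # share given to the newest ETA (assumed value)
        self.speed = None
        self.eta = None

    def update(self, new_speed, amount_left):
        if self.speed is None:
            self.speed = new_speed
        else:
            self.speed = (1 - self.speed_weight) * self.speed + self.speed_weight * new_speed
        if self.speed <= 0:
            return None                    # stalled: no sensible estimate
        raw_eta = amount_left / self.speed
        if self.eta is None:
            self.eta = raw_eta
        else:
            self.eta = (1 - self.eta_weight) * self.eta + self.eta_weight * raw_eta
        return self.eta
```

Feed it one speed reading and the amount left on each progress tick, and display the value it returns.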
Google Chrome
Chrome seems to jump about all over the place, and the code shows this.
One thing I do like with Chrome though is how they format time remaining.
For > 1 hour it says '1 hrs left'
For < 1 hour it says '59 mins left'
For < 1 minute it says '52 secs left'
You can see how it's formatted here
DownThemAll! Manager
It doesn't use anything clever, meaning the ETA jumps about all over the place.
See the code here
pySmartDL (a python downloader)
Takes the average ETA of the last 30 ETA calculations. Sounds like a reasonable way to do it.
See the code here (pySmartDL/pySmartDL.py, line 651, at commit 916f2592db326241a2bf4d8f2e0719c58b71e385)
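Averaging the last few ETA values is easy to sketch with a bounded deque. This is not pySmartDL's code, just an illustration of the idea; the class name is made up, and the window of 30 is taken from the description above.

```python
from collections import deque

class MovingEtaAverage:
    """Report the mean of the most recent `window` raw ETA values."""

    def __init__(self, window=30):
        self.recent = deque(maxlen=window)   # old values fall off automatically

    def update(self, raw_eta):
        self.recent.append(raw_eta)
        return sum(self.recent) / len(self.recent)
```

Each call to update takes the latest raw ETA and returns the smoothed one to display.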
Transmission
Gives a pretty good ETA in most cases (except when starting off, as might be expected).
Uses a smoothing factor over the past 5 readings, similar to Firefox but not quite as complex. Fundamentally similar to Gooli's answer.
See the code here
I usually use an Exponential Moving Average to compute the speed of an operation with a smoothing factor of say 0.1 and use that to compute the remaining time. This way all the measured speeds have influence on the current speed, but recent measurements have much more effect than those in the distant past.
In code it would look something like this:
alpha = 0.1 # smoothing factor
...
speed = (speed * (1 - alpha)) + (currentSpeed * alpha)
If your tasks are uniform in size, currentSpeed would simply be the time it took to execute the last task. If the tasks have different sizes and you know that one task is supposed to take, e.g., twice as long as another, you can divide the time it took to execute the task by its relative size to get the current speed. Using speed, you can compute the remaining time by multiplying it by the total size of the remaining tasks (or just by their number if the tasks are uniform). Note that speed here is really time per unit of work, which is why multiplying by the remaining work gives a time.
Hopefully my explanation is clear enough, it's a bit late in the day.
In certain instances, when you need to perform the same task on a regular basis, it might be a good idea to use past completion times to average against.
For example, I have an application that loads the iTunes library via its COM interface. The size of a given iTunes library generally does not increase dramatically from launch to launch in terms of the number of items, so in this example it might be possible to track the last three load times and load rates, average against them, and compute your current ETA.
This would be hugely more accurate than an instantaneous measurement and probably more consistent as well.
However, this method depends upon the size of the task being relatively similar to the previous ones, so this would not work for a decompressing method or something else where any given byte stream is the data to be crunched.
Just my $0.02
First off, it helps to generate a running moving average. This weights more recent events more heavily.
To do this, keep a bunch of samples around (circular buffer or list), each a pair of progress and time. Keep the most recent N seconds of samples. Then generate a weighted average of the samples:
totalProgress += (curSample.progress - prevSample.progress) * scaleFactor
totalTime += (curSample.time - prevSample.time) * scaleFactor
where scaleFactor goes linearly from 0...1 as an inverse function of time in the past (thus weighing more recent samples more heavily). You can play around with this weighting, of course.
At the end, you can get the average rate of change:
averageProgressRate = (totalProgress / totalTime);
You can use this to figure out the ETA by dividing the remaining progress by this number.
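Putting those pieces together, a Python sketch of the weighted average could look like this. It assumes samples is a list of (time, progress) pairs, oldest first, already trimmed to the last window_s seconds. The function name and the exact linear weighting are my own choices for illustration.

```python
def weighted_rate(samples, now, window_s):
    """Average progress rate where recent intervals count more than old ones."""
    total_progress = 0.0
    total_time = 0.0
    for prev, cur in zip(samples, samples[1:]):
        age = now - cur[0]
        scale = max(0.0, 1.0 - age / window_s)   # 1 for brand-new samples, 0 at the window edge
        total_progress += (cur[1] - prev[1]) * scale
        total_time += (cur[0] - prev[0]) * scale
    if total_time <= 0.0:
        return None                              # not enough data yet
    return total_progress / total_time
```

The ETA is then the remaining progress divided by the returned rate.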
However, while this gives you a good trending number, you have one other issue - jitter. If, due to natural variations, your rate of progress moves around a bit (it's noisy) - e.g. maybe you're using this to estimate file downloads - you'll notice that the noise can easily cause your ETA to jump around, especially if it's pretty far in the future (several minutes or more).
To avoid jitter from affecting your ETA too much, you want this average rate of change number to respond slowly to updates. One way to approach this is to keep around a cached value of averageProgressRate, and instead of instantly updating it to the trending number you've just calculated, you simulate it as a heavy physical object with mass, applying a simulated 'force' to slowly move it towards the trending number. With mass, it has a bit of inertia and is less likely to be affected by jitter.
Here's a rough sample:
// desiredAvgProgressRate is computed from the weighted average above
// m_averageProgressRate is a member variable also in progress units/sec
// lastTimeElapsed = the time delta in seconds (since last simulation)
// m_averageSeekSpeed is a member variable in units/sec, used to hold
// the velocity of m_averageProgressRate
const float frictionCoeff = 0.75f;
const float mass = 4.0f;
const float maxSpeedCoeff = 0.25f;
// lose 25% of our speed per sec, simulating friction
m_averageSeekSpeed *= pow(frictionCoeff, lastTimeElapsed);
float delta = desiredAvgProgressRate - m_averageProgressRate;
// update the velocity
float oldSpeed = m_averageSeekSpeed;
float accel = delta / mass;
m_averageSeekSpeed += accel * lastTimeElapsed; // v += at
// clamp the top speed to 25% of our current value
float sign = (m_averageSeekSpeed > 0.0f ? 1.0f : -1.0f);
float maxVal = m_averageProgressRate * maxSpeedCoeff;
if (fabs(m_averageSeekSpeed) > maxVal)
{
    m_averageSeekSpeed = sign * maxVal;
}
// make sure they have the same sign
if ((m_averageSeekSpeed > 0.0f) == (delta > 0.0f))
{
    float adjust = (oldSpeed + m_averageSeekSpeed) * 0.5f * lastTimeElapsed;
    // don't overshoot.
    if (fabs(adjust) > fabs(delta))
    {
        adjust = delta;
        // apply damping
        m_averageSeekSpeed *= 0.25f;
    }
    m_averageProgressRate += adjust;
}
Your question is a good one. If the problem can be broken up into discrete units, having an accurate calculation often works best. Unfortunately, this may not be the case: even if you are installing 50 components, each nominally 2% of the job, one of them can be massive. One thing that I have had moderate success with is to clock the CPU and disk and give a decent estimate based on observational data. Knowing that certain checkpoints are really point x allows you some opportunity to correct for environmental factors (network, disk activity, CPU load). However, this solution is not general in nature due to its reliance on observational data. Using ancillary data such as rpm file size helped me make my progress bars more accurate, but they are never bulletproof.
Uniform averaging
The simplest approach would be to predict the remaining time linearly:
t_rem := t_spent ( n - prog ) / prog
where t_rem is the predicted ETA, t_spent is the time elapsed since the commencement of the operation, prog the number of microtasks completed out of their full quantity n. To explain—n may be the number of rows in a table to process or the number of files to copy.
This method has no parameters, so one need not worry about fine-tuning the exponent of attenuation. The trade-off is poor adaptation to a changing progress rate, because all samples contribute equally to the estimate, whereas it is only meet that recent samples should have more weight than old ones, which leads us to
Exponential smoothing of rate
in which the standard technique is to estimate progress rate by averaging previous point measurements:
rate := 1 / (n * dt); { rate equals normalized progress per unit time }
if prog = 1 then { if first microtask just completed }
  rate_est := rate { initialize the estimate }
else
begin
  weight := Exp( - dt / DECAY_T );
  rate_est := rate_est * weight + rate * (1.0 - weight);
  t_rem := (1.0 - prog / n) / rate_est;
end;
where dt denotes the duration of the last completed microtask and is equal to the time passed since the previous progress update. Notice that weight is not a constant and must be adjusted according to the length of time during which a certain rate was observed, because the longer we observed a certain speed, the higher the exponential decay of the previous measurements. The constant DECAY_T denotes the length of time during which the weight of a sample decreases by a factor of e. SPWorley himself suggested a similar modification to gooli's proposal, although he applied it to the wrong term. An exponential average for equidistant measurements is:
Avg_e(n) = Avg_e(n-1) * alpha + m_n * (1 - alpha)
but what if the samples are not equidistant, as is the case with times in a typical progress bar? Take into account that alpha above is but an empirical quotient whose true value is:
alpha = Exp( - lambda * dt ),
where lambda is the parameter of the exponential window and dt the amount of change since the previous sample, which need not be time, but any linear and additive parameter. alpha is constant for equidistant measurements but varies with dt.
Mark that this method relies on a predefined time constant and is not scalable in time. In other words, if the exactly same process be uniformly slowed-down by a constant factor, this rate-based filter will become proportionally more sensitive to signal variations because at every step weight will be decreased. If we, however, desire a smoothing independent of the time scale, we should consider
Exponential smoothing of slowness
which is essentially the smoothing of rate turned upside down, with the added simplification of a constant weight, because prog grows by equidistant increments:
slowness := n * dt; { slowness is the amount of time per unity progress }
if prog = 1 then { if first microtask just completed }
  slowness_est := slowness { initialize the estimate }
else
begin
  weight := Exp( - 1 / (n * DECAY_P ) );
  slowness_est := slowness_est * weight + slowness * (1.0 - weight);
  t_rem := (1.0 - prog / n) * slowness_est;
end;
The dimensionless constant DECAY_P denotes the normalized progress difference between two samples of which the weights are in the ratio of one to e. In other words, this constant determines the width of the smoothing window in progress domain, rather than in time domain. This technique is therefore independent of the time scale and has a constant spatial resolution.
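For readers who do not speak Pascal, the slowness filter can be sketched in Python as a small class. The class and attribute names are hypothetical, and unlike the listing above this sketch also returns an estimate right after the first microtask.

```python
import math

class SlownessEta:
    """Exponentially smooth time-per-unit-progress with a constant weight."""

    def __init__(self, n, decay_p):
        self.n = n                                    # total number of microtasks
        self.weight = math.exp(-1.0 / (n * decay_p))  # constant: progress steps are equidistant
        self.slowness_est = None
        self.prog = 0

    def step(self, dt):
        """Register one finished microtask that took dt; return the remaining-time estimate."""
        self.prog += 1
        slowness = self.n * dt                        # time per unit of normalized progress
        if self.slowness_est is None:
            self.slowness_est = slowness              # initialize on the first microtask
        else:
            self.slowness_est = (self.slowness_est * self.weight
                                 + slowness * (1.0 - self.weight))
        return (1.0 - self.prog / self.n) * self.slowness_est
```

Because the weight does not depend on dt, slowing the whole process down by a constant factor scales every estimate by that factor and leaves the amount of smoothing unchanged.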
Further research: adaptive exponential smoothing
You are now equipped to try the various algorithms of adaptive exponential smoothing. Only remember to apply it to slowness rather than to rate.
I always wish these things would tell me a range. If it said, "This task will most likely be done in between 8 min and 30 minutes," then I have some idea of what kind of break to take. If it's bouncing all over the place, I'm tempted to watch it until it settles down, which is a big waste of time.
I have tried and simplified your "easy"/"wrong"/"OK" formula and it works best for me:
t / p - t
In Python:
>>> done=0.3; duration=10; "time left: %i" % (duration / done - duration)
'time left: 23'
That saves one op compared to (dur*(1-done)/done). And, in the edge case you describe, possibly ignoring the dialog for 30 minutes extra hardly matters after waiting all night.
Comparing this simple method to the one used by Transmission, I found it to be up to 72% more accurate.
I don't sweat it, it's a very small part of an application. I tell them what's going on, and let them go do something else.