How do programs estimate how long a process will take to complete?

How do programs estimate how long a process will take to complete? - time

In some loading bars there will be something like "2 minutes remaining". Does the programmer time how long the process takes on their computer and use that value some how? Or does the program calculate it all itself? Or another method?

This calculation is actually done internally because timing how long it would take to execute or download a certain program is based on internet speed, RAM, processor speed, etc. so it would be hard to have one universally predicted time based on the programmer's computer. Typically how this is calculated is based on how much of the program is already downloaded in comparison to the size of the file and takes into account how long it took to download that much data. From there the program extrapolates how much longer it will take to finish your download based on how fast it has operated until that point in time.

Those 'x minutes remaining' interface elements, which (ideally) indicate how much time it will take to complete a certain task are simply forecasts based on how long that task has taken so far, and how much work on that task has been accomplished.
For example, suppose I have a app that will upload a batch of images to a server (all of which are the same size, for simplicity). Here's a general idea of how the code that indicates time remaining will work:
Before we begin, we assign the current time to a variable. Also, at this point the time remaining indicator is not visible. Then, in a for... loop:
for (var i = 0; i < batchOfImages.length; i++)
{
We upload an image
After the image is uploaded, we get the difference between the current time and the start time. This is the total time expended on this task so far.
We then divide the total time expended by the total number of images uploaded so far, i + 1, to get the amount of time it has taken on average to upload each image so far
We then multiply the average upload time per image by the number of images remaining to upload to get the likely amount of time it will take to upload the remaining images
We update the text on the indicator to show amount of time remaining, and we make sure that the indicator is visible
}

Related

Algorithm / data structure for rate of change calculation with limited memory

Certain sensors are to trigger a signal based on the rate of change of the value rather than a threshold.
For instance, heat detectors in fire alarms are supposed to trigger an alarm quicker if the rate of temperature rise is higher: A temperature rise of 1K/min should trigger an alarm after 30 minutes, a rise of 5K/min after 5 minutes and a rise of 30K/min after 30 seconds.
 
I am wondering how this is implemented in embedded systems, where resources are scares. Is there a clever data structure to minimize the data stored?
 
The naive approach would be to measure the temperature every 5 seconds or so and keep the data for 30 minutes. On these data one can calculate change rates over arbitrary time windows. But this requires a lot of memory.
 
I thought about small windows (e.g. 10 seconds) for which min and max are stored, but this would not save much memory.

From a mathematical point of view, the examples you have described can be greatly simplified:
1K/min for 30 mins equals a total change of 30K
5K/min for 5 mins equals a total change of 25K
Obviously there is some adjustment to be made because you have picked round numbers for the example, but it sounds like what you care about is having a single threshold for the total change. This makes sense because taking the integral of a differential results in just a delta.
However, if we disregard the numeric example and just focus on your original question then here are some answers:
First, it has already been mentioned in the comments that one byte every five seconds for half an hour is really not very much memory at all for almost any modern microcontroller, as long as you are able to keep your main RAM turned on between samples, which you usually can.
If however you need to discard the contents of RAM between samples to preserve battery life, then a simpler method is just to calculate one differential at a time.
In your example you want to have a much higher sample rate (every 5 seconds) than the time you wish to calculate the delta over (eg: 30 mins). You can reduce your storage needs to a single data point if you make your sample rate equal to your delta period. The single previous value could be stored in a small battery retained memory (eg: backup registers on STM32).
Obviously if you choose this approach you will have to compromise between accuracy and latency, but maybe 30 seconds would be a suitable timebase for your temperature alarm example.
You can also set several thresholds of K/sec, and then allocate counters to count how many consecutive times the each threshold has been exceeded. This requires only one extra integer per threshold.

In signal processing terms, the procedure you want to perform is:
Apply a low-pass filter to smooth quick variations in the temperature
Take the derivative of its output
The cut-off frequency of the filter would be set according to the time frame. There are 2 ways to do this.
You could apply a FIR (finite impulse response) filter, which is a weighted moving average over the time frame of interest. Naively, this requires a lot of memory, but it's not bad if you do a multi-stage decimation first to reduce your sample rate. It ends up being a little complicated, but you have fine control over the response.
You could apply in IIR (Infinite impulse response) filter, which utilizes feedback of the output. The exponential moving average is the simplest example of this. These filters require far less memory -- only a few samples' worth, but your control over the precise shape of the response is limited. A classic example like the Butterworth filter would probably be great for your application, though.

Distribute user active time blocks subject to total constraint

I am building an agent-based model for product usage. I am trying to develop a function to decide whether the user is using the product at a given time, while incorporating randomness.
So, say we know the user spends a total of 1 hour per day using the product, and we know the average distribution of this time (e.g., most used at 6-8pm).
How can I generate a set of usage/non-usage times (i.e., during each 10 minute block is the user active or not) while ensuring that at the end of the day the total active time sums to one hour.
In most cases I would just run the distributor without concern for the total, and then at the end normalize by making it proportional to the total target time so the total was 1 hour. However, I can't do that because time blocks must be 10 minutes. I think this is a different question because I'm really not computing time ranges, I'm computing booleans to associate with different 10 minute time blocks (e.g., the user was/was not active during a given block).
Is there a standard way to do this?

I did some more thinking and figured it out, if anyone else is looking at this.
The approach to take is this: You know the allowed number n of 10-minute time blocks for a given agent.
Iterate n times, and on each iteration select a time block out of the day subject to your activity distribution function.
Main point is to iterate over the number of time blocks you want to place, not over the entire day.

Bin packing parts of a dynamic set, considering lastupdate

There's a large set of objects. Set is dynamic: objects can be added or deleted any time. Let's call the total number of objects N.
Each object has two properties: mass (M) and time (T) of last update.
Every X minutes a small batch of those should be selected for processing, which updates their T to current time. Total M of all objects in a batch is limited: not more than L.
I am looking to solve three tasks here:
find a next batch object picking algorithm;
introduce object classes: simple, priority (granted fit into at least each n-th batch) and frequent (fit into each batch);
forecast system capacity exhaust (time to add next server = increase L).
What kind of model best describes such a system?
The whole thing is about a service that processes the "objects" in time intervals. Each object should be "measured" each N hours. N can vary in a range. X is fixed.
Objects are added/deleted by humans. N grows exponentially, rather slow, with some spikes caused by publications. Of course forecast can't be precise, just some estimate. M varies from 0 to 1E7 with exponential distribution, most are closer to 0.
I see there can be several strategies here:
A. full throttle - pack each batch as much as close to 100%. As N grows, average interval a particular object gets a hit will grow.
B. equal temperament :) - try to keep an average interval around some value. A batch fill level will be growing from some low level. When it reaches closer to 100% – time to get more servers.
C. - ?

Here is a pretty complete design for your problem.
Your question does not optimally match your description of the system this is for. So I'll assume that the description is accurate.
When you schedule a measurement you should pass an object, a first time it can be measured, and when you want the measurement to happen by. The object should have a weight attribute and a measured method. When the measurement happens, the measured method will be called, and the difference between your classes is whether, and with what parameters, they will reschedule themselves.
Internally you will need a couple of priority queues. See http://en.wikipedia.org/wiki/Heap_(data_structure) for details on how to implement one.
The first queue is by time the measurement can happen, all of the objects that can't be measured yet. Every time you schedule a batch you will use that to find all of the new measurements that can happen.
The second queue is of measurements that are ready to go now, and is organized by which scheduling period they should happen by, and then weight. I would make them both ascending. You can schedule a batch by pulling items off of that queue until you've got enough to send off.
Now you need to know how much to put in each batch. Given the system that you have described, a spike of events can be put in manually, but over time you'd like those spikes to smooth out. Therefore I would recommend option B, equal temperament. So to do this, as you put each object into the "ready now" queue, you can calculate its "average work weight" as its weight divided by the number of periods until it is supposed to happen. Store that with the object, and keep a running total of what run rate you should be at. Every period I would suggest that you keep adding to the batch until one of three conditions has been met:
You run out of objects.
You hit your maximum batch capacity.
You exceed 1.1 times your running total of your average work weight. The extra 10% is because it is better to use a bit more capacity now than to run out of capacity later.
And finally, capacity planning.
For this you need to use some heuristic. Here is a reasonable one which may need some tweaking for your system. Maintain an array of your past 10 measurements of running total of average work weight. Maintain an "exponentially damped average of your high water mark." Do that by updating each time according to the formula:
average_high_water_mark
= 0.95 * average_high_water_mark
+ 0.5 * max(last 10 running work weight)
If average_high_water_mark ever gets within, say, 2 servers of your maximum capacity, then add more servers. (The idea is that a server should be able to die without leaving you hosed.)

I think answer A is good. Bin packing is to maximize or minimize and you have only one batch. Sort the objects by m and n.

Algorithm to distribute heartbeats?

I am building a sensor network where a large number of sensors report their status to a central hub. The sensors need to report status atleast once every 3 hours, but I want to make sure that the hub does not get innundated with too many reports at any given time. So to mitigate this, I let the hub tell the sensors the 'next report time'.
Now I am looking for any standard algorithms for doing some load balancing of these updates, such that the sensors dont exceed a set interval between reports and the hub can calculate the next report time such that its load (of receiving reports) is evenly divided over the day.
Any help will be appreciated.

If you know how many sensors there are, just divide up every three hour chunk into that many time slots and (either randomly or programmatically as you need), assign one to each sensor.
If you don't, you can still divide up every three hour chunk into some large number of time slots and assign them to sensors. In your assignment algorithm, you just have to make sure that all the slots have one assigned sensor before any of them have two, and all of them have two before any of them have three, etc.

Easiest solution: Is there any reason why the hub cannot poll the sensors according to its own schedule?
Otherwise you may want to devise a system where the hub can decide whether or not to accept a report based on its own load. If a sensor has its connection denied make it wait an random period of time and retry. Over time the sensors should space themselves out more or less optimally.
IIRC some facet of TCP/IP uses a similar method, but I'm drawing a blank as to which.

I would use a base of 90 minutes with a randomized variation over a 30-minute range, so that the intervals are randomly beteween 60 and 120 minutes. Adjust these numbers if you want to get closer to the 3-hour interval but I would personally stay well under it

Download time remaning predictor

Are there any widgets for predicting when a download (or any other process) will finish based on percent done history?
The trivial version would just do a 2 point fit based on the start time, current time and percent done but better option are possible.
A GUI widgest would be nice but a class that just returns the value would be just fine.

For the theoretical algorithm that I would attempt, if I would write such a widget, would be something like:
Record the amount of data transferred within a one second period (a literal KiB/s)
Remember the last 5 or 10 such periods (to get an an recent average KiB/s)
Subtract the total size from the transferred size (to get a "bytes remaining")
???
Widget!
That oughta do it...
(the missing step being: kibibytes remaining divided by average KiB/s)

Most progress bar widgets have an update() method which takes a percentage done.
They then calculate the time remainging based on the time since start and the latest percentage. If you have a process with a long setup time you might want to add more functionality so that it doesn't include this time, perhaps by having the prediction clock reset when sent a 0% update.

It would sort of have the same effect as some of the other proposals but by collecting percent done data as a function of time statistical methods could be used to generate a R^2 best fit line and project it to 100% done. To make it take into account the current operation a heavier weight could be placed on the newer data (or the older data could be thinned) and to make it ignore short term fluctuations, a heavy weighting can be put on the first data point.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio