Say I have a series of timestamps (from visitor impressions), and it looks like this (omitting dates):
01:02:13
01:03:29
01:04:34
02:19:29
09:45:10
09:46:20
.....
In the above case, I'd want to sum up the time passed from the first 4 timestamps (1 to 2AM), and separate them from the 9AM events.
Is there a clever way to estimate combined active time, other than using a timeout (say, a gap of 300+ seconds indicates a new session)? If not, what's a standard / good timeout to use?
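For reference, the timeout approach is simple to implement. Here is a minimal sketch in Python, assuming a 300-second gap and an arbitrary placeholder date:

from datetime import datetime, timedelta

SESSION_GAP = timedelta(seconds=300)   # assumed timeout; tune to your traffic

def total_active_time(timestamps, gap=SESSION_GAP):
    # Sum the time between consecutive impressions, but only when the gap is
    # small enough to count as the same session; timestamps must be sorted.
    total = timedelta()
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta <= gap:
            total += delta
    return total

# The example timestamps above, with a placeholder date:
times = [datetime(2024, 1, 1, 1, 2, 13), datetime(2024, 1, 1, 1, 3, 29),
         datetime(2024, 1, 1, 1, 4, 34), datetime(2024, 1, 1, 2, 19, 29),
         datetime(2024, 1, 1, 9, 45, 10), datetime(2024, 1, 1, 9, 46, 20)]
print(total_active_time(times))   # 0:03:31 with a 300-second gap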
Certain sensors are supposed to trigger a signal based on the rate of change of a value rather than on an absolute threshold.
For instance, heat detectors in fire alarms are supposed to trigger an alarm quicker if the rate of temperature rise is higher: A temperature rise of 1K/min should trigger an alarm after 30 minutes, a rise of 5K/min after 5 minutes and a rise of 30K/min after 30 seconds.
I am wondering how this is implemented in embedded systems, where resources are scarce. Is there a clever data structure to minimize the data stored?
The naive approach would be to measure the temperature every 5 seconds or so and keep the data for 30 minutes. On these data one can calculate change rates over arbitrary time windows. But this requires a lot of memory.
I thought about small windows (e.g. 10 seconds) for which min and max are stored, but this would not save much memory.
From a mathematical point of view, the examples you have described can be greatly simplified:
1K/min for 30 mins equals a total change of 30K
5K/min for 5 mins equals a total change of 25K
Obviously there is some adjustment to be made because you have picked round numbers for the example, but it sounds like what you care about is having a single threshold for the total change. This makes sense because taking the integral of a differential results in just a delta.
However, if we disregard the numeric example and just focus on your original question then here are some answers:
First, it has already been mentioned in the comments that one byte every five seconds for half an hour is really not very much memory at all for almost any modern microcontroller, as long as you are able to keep your main RAM turned on between samples, which you usually can.
If however you need to discard the contents of RAM between samples to preserve battery life, then a simpler method is just to calculate one differential at a time.
In your example you want to have a much higher sample rate (every 5 seconds) than the time you wish to calculate the delta over (eg: 30 mins). You can reduce your storage needs to a single data point if you make your sample rate equal to your delta period. The single previous value could be stored in a small battery retained memory (eg: backup registers on STM32).
Obviously if you choose this approach you will have to compromise between accuracy and latency, but maybe 30 seconds would be a suitable timebase for your temperature alarm example.
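A minimal sketch of that idea, assuming a 30-second period and a 15 K threshold (on a real MCU the previous value would live in battery-backed memory such as the backup registers mentioned above):

DELTA_PERIOD_S = 30        # sample period equals the delta period
ALARM_DELTA_K = 15.0       # assumed threshold: 30 K/min sustained for 30 s

previous_temperature = None   # the single value kept between wake-ups

def on_wakeup(current_temperature):
    # Called once per DELTA_PERIOD_S; returns True if the alarm should fire.
    global previous_temperature
    alarm = (previous_temperature is not None
             and current_temperature - previous_temperature >= ALARM_DELTA_K)
    previous_temperature = current_temperature   # keep only the latest sample
    return alarm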
You can also set several thresholds of K/sec, and then allocate counters to count how many consecutive times each threshold has been exceeded. This requires only one extra integer per threshold.
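For example, a sketch with one consecutive-exceedance counter per threshold, using the alarm conditions from the question and an assumed 30-second sample period:

SAMPLE_PERIOD_S = 30
THRESHOLDS = [          # (rate in K/s, consecutive samples required)
    (1.0 / 60, 60),     # 1 K/min sustained for 30 min -> 60 samples
    (5.0 / 60, 10),     # 5 K/min sustained for 5 min  -> 10 samples
    (30.0 / 60, 1),     # 30 K/min sustained for 30 s  -> 1 sample
]
counters = [0] * len(THRESHOLDS)
previous = None

def process_sample(temperature):
    # Returns True once any threshold has been exceeded for long enough.
    global previous
    if previous is not None:
        rate = (temperature - previous) / SAMPLE_PERIOD_S     # K per second
        for i, (min_rate, needed) in enumerate(THRESHOLDS):
            counters[i] = counters[i] + 1 if rate >= min_rate else 0
            if counters[i] >= needed:
                return True
    previous = temperature
    return False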
In signal processing terms, the procedure you want to perform is:
Apply a low-pass filter to smooth quick variations in the temperature
Take the derivative of its output
The cut-off frequency of the filter would be set according to the time frame. There are 2 ways to do this.
You could apply a FIR (finite impulse response) filter, which is a weighted moving average over the time frame of interest. Naively, this requires a lot of memory, but it's not bad if you do a multi-stage decimation first to reduce your sample rate. It ends up being a little complicated, but you have fine control over the response.
You could apply an IIR (infinite impulse response) filter, which utilizes feedback of the output. The exponential moving average is the simplest example of this. These filters require far less memory -- only a few samples' worth, but your control over the precise shape of the response is limited. A classic example like the Butterworth filter would probably be great for your application, though.
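As a rough sketch of the IIR route, an exponential moving average followed by a discrete derivative needs only two stored values; the smoothing factor below is an arbitrary starting point, not a tuned value:

SAMPLE_PERIOD_S = 5.0
ALPHA = 0.05          # assumed smoothing factor; smaller means a longer time constant

ema = None            # filtered temperature
prev_ema = None       # previous filtered value, for the derivative

def filtered_rate(temperature):
    # Returns the smoothed rate of change in K/s (None until two samples are in).
    global ema, prev_ema
    ema = temperature if ema is None else ALPHA * temperature + (1 - ALPHA) * ema
    rate = None if prev_ema is None else (ema - prev_ema) / SAMPLE_PERIOD_S
    prev_ema = ema
    return rate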
I think this is a difficult problem.
In general, I know I need a total count and a current count to compute a progress rate.
But in this case, I cannot get the total count.
For example, there are two jobs, A and B.
The total amount of work in each job is set randomly.
Also, I cannot know a job's total work count until the job has ended.
One method I have is to assign a fixed share to each job, e.g. when A is done, set the progress to 50%.
But if A's count is 10 and B's count is 1000, this gives a strange result:
although the total count is 1010, the progress already shows 50% when only 10 items are done.
So I want to show users a more natural progress rate, but I don't have the total count.
Is there a useful alternative to the generic percentage calculation?
If you want to know how much total progress you have without knowing how much total progress there could be, that is logically impossible.
However, you could
estimate it
keep historical data
assume the maximum and just surprise the user when it's faster
To instead show the rate of progress
record the time at the start of your process, and subtract it from the current time when you check again
divide the completed jobs by that amount to get the jobs/second
Roughly
rate = jobs_completed / (time_now - time_start)
You can also do this over some window, but then you need to record both the time and the number of jobs completed at the start of the window, and subtract both off so you count only the jobs inside your window
rate_windowed = (jobs_completed - jobs_previous) / (time_now - time_previous)
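Put together as a sketch in Python (time.monotonic and the variable names are my choices, nothing standard):

import time

start_time = time.monotonic()
window = (start_time, 0)            # (time_previous, jobs_previous)

def overall_rate(jobs_completed):
    # Jobs per second since the start of the process.
    return jobs_completed / (time.monotonic() - start_time)

def windowed_rate(jobs_completed):
    # Jobs per second since the previous call; updates the window each time.
    global window
    time_previous, jobs_previous = window
    now = time.monotonic()
    elapsed = now - time_previous
    rate = (jobs_completed - jobs_previous) / elapsed if elapsed > 0 else 0.0
    window = (now, jobs_completed)
    return rate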
I am building an agent-based model for product usage. I am trying to develop a function to decide whether the user is using the product at a given time, while incorporating randomness.
So, say we know the user spends a total of 1 hour per day using the product, and we know the average distribution of this time (e.g., most used at 6-8pm).
How can I generate a set of usage/non-usage times (i.e., during each 10-minute block, is the user active or not) while ensuring that at the end of the day the total active time sums to one hour?
In most cases I would just run the distributor without concern for the total, and then at the end normalize by making it proportional to the total target time so the total was 1 hour. However, I can't do that because time blocks must be 10 minutes. I think this is a different question because I'm really not computing time ranges, I'm computing booleans to associate with different 10 minute time blocks (e.g., the user was/was not active during a given block).
Is there a standard way to do this?
I did some more thinking and figured it out, if anyone else is looking at this.
The approach to take is this: You know the allowed number n of 10-minute time blocks for a given agent.
Iterate n times, and on each iteration select a time block out of the day subject to your activity distribution function.
The main point is to iterate over the number of time blocks you want to place, not over the entire day.
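A minimal sketch of that loop, assuming 10-minute blocks, six active blocks per day, and a made-up distribution that is heavier between 6 and 8 pm:

import random

BLOCKS_PER_DAY = 144        # 24 hours of 10-minute blocks
N_ACTIVE = 6                # 1 hour of usage = six 10-minute blocks

# Assumed activity distribution: one weight per block, heavier for 6-8 pm
# (blocks 108-119). Replace with your measured distribution.
weights = [1.0] * BLOCKS_PER_DAY
for b in range(108, 120):
    weights[b] = 5.0

active = set()
while len(active) < N_ACTIVE:
    # Draw a block according to the distribution; a repeated draw is simply
    # ignored, so exactly N_ACTIVE distinct blocks end up selected.
    block = random.choices(range(BLOCKS_PER_DAY), weights=weights, k=1)[0]
    active.add(block)

usage = [block in active for block in range(BLOCKS_PER_DAY)]   # one boolean per block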
Assume I have a huge set of data about a system's idle time.
Day 1 - 5 mins
Day 2 - 3 mins
Day 3 - 7 mins
...
Day 'n' - 'k' mins
We can assume that even though the idle time is random, the pattern repeats.
Using this as training data, is it possible for me to identify the idle time behavior of the system? And with that, can an abnormality be predicted?
Which algorithm would best suit this purpose?
I tried fitting a regression, but it can only answer "What is the expected idle time today?"
But what I want is this: when the idle time deviates from the pattern, it should be detected.
Edit:
Or does it make sense to predict for the current day only? I.e., today the expected idle time is 'x' mins; tomorrow it may differ.
I would try a Fourier transform and check whether your system behaves in a periodic way (this would mean there are some peaks in the frequency domain).
Then get rid of the frequencies with low magnitudes and use the rest to predict the system's behavior in the future.
If the real behavior differs a lot from the prediction, that is what you want to detect.
Wikipedia: Fast Fourier transform
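A rough sketch of that idea with NumPy; the number of retained frequencies and the deviation threshold are arbitrary and would need tuning on real data:

import numpy as np

def periodic_baseline(series, keep=5):
    # Keep only the `keep` strongest frequency components of the series.
    spectrum = np.fft.rfft(series)
    weak = np.argsort(np.abs(spectrum))[:-keep]   # everything but the top `keep`
    spectrum[weak] = 0
    return np.fft.irfft(spectrum, n=len(series))

def is_abnormal(idle_history, idle_today, keep=5, threshold=2.0):
    # Flags today's idle time if it sits far from the periodic baseline,
    # measured in residual standard deviations.
    series = np.asarray(list(idle_history) + [idle_today], dtype=float)
    baseline = periodic_baseline(series, keep)
    residuals = series - baseline
    return abs(residuals[-1]) > threshold * np.std(residuals[:-1])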
we have a system, such as a bank, where customers arrive and wait on a
line until one of k tellers is available. Customer arrival is governed
by a probability distribution function, as is the service time (the
amount of time to be served once a teller is available). We are
interested in statistics such as how long on average a customer has to
wait or how long the line might be.
We can use the probability functions to generate an input stream
consisting of ordered pairs of arrival time and service time for each
customer, sorted by arrival time. We do not need to use the exact time
of day. Rather, we can use a quantum unit, which we will refer to as
a tick.
One way to do this simulation is to start a simulation clock at zero
ticks. We then advance the clock one tick at a time, checking to see
if there is an event. If there is, then we process the event(s) and
compile statistics. When there are no customers left in the input
stream and all the tellers are free, then the simulation is over.
The problem with this simulation strategy is that its running time
does not depend on the number of customers or events (there are two
events per customer), but instead depends on the number of ticks,
which is not really part of the input. To see why this is important,
suppose we changed the clock units to milliticks and multiplied all
the times in the input by 1,000. The result would be that the
simulation would take 1,000 times longer!
My question about the above text concerns the last paragraph: what does the author mean by "suppose we changed the clock units to milliticks and multiplied all the times in the input by 1,000. The result would be that the simulation would take 1,000 times longer!"?
Thanks!
With this algorithm we have to check every tick, and the more ticks there are, the more checks we carry out. For example, if the first customer arrives at the 3rd tick, then we had to do 2 unnecessary checks. But if we checked every millitick, we would have to do 2,999 unnecessary checks.
Because the checking is being carried out on a per tick basis if the number of ticks is multiplied by 1000 then there will be 1000 times more checks.
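To make the scaling concrete, here is a toy single-teller version of the tick-driven loop in Python. The customer numbers are made up; the point is only that the number of checks tracks the clock range, not the number of customers:

def simulate_ticks(customers):
    # customers: list of (arrival_tick, service_ticks), sorted by arrival time.
    # Returns how many ticks the loop had to examine (one check per tick).
    queue = list(customers)
    busy_until = 0            # tick at which the single teller becomes free
    clock = checks = 0
    while queue or clock < busy_until:
        checks += 1
        if queue and queue[0][0] <= clock and clock >= busy_until:
            arrival, service = queue.pop(0)
            busy_until = clock + service
        clock += 1
    return checks

customers = [(3, 5), (4, 2), (10, 4)]
print(simulate_ticks(customers))                                      # 14 checks
print(simulate_ticks([(a * 1000, s * 1000) for a, s in customers]))   # 14,000 checks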
Imagine that you set an alarm so that you perform a task, like checking your email, every hour. This means you would check your email 24 times a day, assuming you didn't sleep. If you decided to change this alarm so that it goes off every minute, you would now be checking your email 24 * 60 = 1,440 times per day, where 24 is the number of times you were checking it before and 60 is the number of minutes in an hour.
This is exactly what happens in the simulation above, except that rather than waiting for an alarm before each check, you do all 1,440 checks as quickly as you can.