Count Cycles in a time series

Count Cycles in a time series - algorithm

I have a device that is sending continuously data. The data received changes the waveform in time. For example, for some hours I could receive data like this one:
https://www.dropbox.com/s/g6thhtat1zx9rxm/1.PNG?dl=0
and after some time to begin receiving data like this:
https://www.dropbox.com/s/u10vckcplev0qyh/2.JPG?dl=0
What do I need:
Count the number of cycles
If the waveform is changed, to detect and to count cycles based on the new pattern
In the first image the algorithm shall count: 4 cycles
In the second image the algorithm shall count: 3 cycles

Calculate auto-correlation for signal.
If period does exist, its value should correspond to the first non-zero peak in AC power spectrum. Divide full length by period value to get number of periods.
Don't forget to check whether determined period is real one (perhaps it is not so simple problem in signal processing)

Related

Algorithm / data structure for rate of change calculation with limited memory

Certain sensors are to trigger a signal based on the rate of change of the value rather than a threshold.
For instance, heat detectors in fire alarms are supposed to trigger an alarm quicker if the rate of temperature rise is higher: A temperature rise of 1K/min should trigger an alarm after 30 minutes, a rise of 5K/min after 5 minutes and a rise of 30K/min after 30 seconds.
 
I am wondering how this is implemented in embedded systems, where resources are scares. Is there a clever data structure to minimize the data stored?
 
The naive approach would be to measure the temperature every 5 seconds or so and keep the data for 30 minutes. On these data one can calculate change rates over arbitrary time windows. But this requires a lot of memory.
 
I thought about small windows (e.g. 10 seconds) for which min and max are stored, but this would not save much memory.

From a mathematical point of view, the examples you have described can be greatly simplified:
1K/min for 30 mins equals a total change of 30K
5K/min for 5 mins equals a total change of 25K
Obviously there is some adjustment to be made because you have picked round numbers for the example, but it sounds like what you care about is having a single threshold for the total change. This makes sense because taking the integral of a differential results in just a delta.
However, if we disregard the numeric example and just focus on your original question then here are some answers:
First, it has already been mentioned in the comments that one byte every five seconds for half an hour is really not very much memory at all for almost any modern microcontroller, as long as you are able to keep your main RAM turned on between samples, which you usually can.
If however you need to discard the contents of RAM between samples to preserve battery life, then a simpler method is just to calculate one differential at a time.
In your example you want to have a much higher sample rate (every 5 seconds) than the time you wish to calculate the delta over (eg: 30 mins). You can reduce your storage needs to a single data point if you make your sample rate equal to your delta period. The single previous value could be stored in a small battery retained memory (eg: backup registers on STM32).
Obviously if you choose this approach you will have to compromise between accuracy and latency, but maybe 30 seconds would be a suitable timebase for your temperature alarm example.
You can also set several thresholds of K/sec, and then allocate counters to count how many consecutive times the each threshold has been exceeded. This requires only one extra integer per threshold.

In signal processing terms, the procedure you want to perform is:
Apply a low-pass filter to smooth quick variations in the temperature
Take the derivative of its output
The cut-off frequency of the filter would be set according to the time frame. There are 2 ways to do this.
You could apply a FIR (finite impulse response) filter, which is a weighted moving average over the time frame of interest. Naively, this requires a lot of memory, but it's not bad if you do a multi-stage decimation first to reduce your sample rate. It ends up being a little complicated, but you have fine control over the response.
You could apply in IIR (Infinite impulse response) filter, which utilizes feedback of the output. The exponential moving average is the simplest example of this. These filters require far less memory -- only a few samples' worth, but your control over the precise shape of the response is limited. A classic example like the Butterworth filter would probably be great for your application, though.

Schedule sending messages to consumers at different rate

I'm looking for best algorithm for message schedule. What I mean with message schedule is a way to send a messages on the bus when we have many consumers at different rate.
Example :
Suppose that we have data D1 to Dn
. D1 to send to many consumer C1 every 5ms, C2 every 19ms, C3 every 30ms, Cn every Rn ms
. Dn to send to C1 every 10ms, C2 every 31ms , Cn every 50ms
What is best algorithm which schedule this actions with the best performance (CPU, Memory, IO)?
Regards

I can think of quite a few options, each with their own costs and benefits. It really comes down to exactly what your needs are -- what really defines "best" for you. I've pseudocoded a couple possibilities below to hopefully help you get started.
Option 1: Execute the following every time unit (in your example, millisecond)
func callEachMs
time = getCurrentTime()
for each datum
for each customer
if time % datum.customer.rate == 0
sendMsg()
This has the advantage of requiring no consistently stored memory -- you just check at each time unit whether your should be sending a message. This can also deal with messages that weren't sent at time == 0 -- just store the time the message was initially sent modulo the rate, and replace the conditional with if time % datum.customer.rate == data.customer.firstMsgTimeMod.
A downside to this method is it is completely reliant on always being called at a rate of 1 ms. If there's lag caused by another process on a CPU and it misses a cycle, you may miss sending a message altogether (as opposed to sending it a little late).
Option 2: Maintain a list of lists of tuples, where each entry represents the tasks that need to be done at that millisecond. Make your list at least as long as the longest rate divided by the time unit (if your longest rate is 50 ms and you're going by ms, your list must be at least 50 long). When you start your program, place the first time a message will be sent into the queue. And then each time you send a message, update the next time you'll send it in that list.
func buildList(&list)
for each datum
for each customer
if list.size < datum.customer.rate
list.resize(datum.customer.rate+1)
list[customer.rate].push_back(tuple(datum.name, customer.name))
func callEachMs(&list)
for each (datum.name, customer.name) in list[0]
sendMsg()
list[customer.rate].push_back((datum.name, customer.name))
list.pop_front()
list.push_back(empty list)
This has the advantage of avoiding the many unnecessary modulus calculations option 1 required. However, that comes with the cost of increased memory usage. This implementation would also not be efficient if there's a large disparity in the rate of your various messages (although you could modify this to deal with algorithms with longer rates more efficiently). And it still has to be called every millisecond.
Finally, you'll have to think very carefully about what data structure you use, as this will make a huge difference in its efficiency. Because you pop from the front and push from the back at every iteration, and the list is a fixed size, you may want to implement a circular buffer to avoid unneeded moving of values. For the lists of tuples, since they're only ever iterated over (random access isn't needed), and there are frequent additions, a singly-linked list may be your best solution.
.
Obviously, there are many more ways that you could do this, but hopefully, these ideas can get you started. Also, keep in mind that the nature of the system you're running this on could have a strong effect on which method works better, or whether you want to do something else entirely. For example, both methods require that they can be reliably called at a certain rate. I also haven't described parallellized implementations, which may be the best option if your application supports them.

Like Helium_1s2 described, there is a second way which based on what I called a schedule table and this is what I used now but this solution has its limits.
Suppose that we have one data to send and two consumer C1 and C2 :
Like you can see we must extract our schedule table and we must identify the repeating transmission cycle and the value of IDLE MINIMUM PERIOD. In fact, it is useless to loop on the smallest peace of time ex 1ms or 1ns or 1mn or 1h (depending on the case) BUT it is not always the best period and we can optimize this loop as follows.
for example one (C1 at 6 and C2 at 9), we remark that there is cycle which repeats from 0 to 18. with a minimal difference of two consecutive send event equal to 3.
so :
HCF(6,9) = 3 = IDLE MINIMUM PERIOD
LCM(6,9) = 18 = transmission cycle length
LCM/HCF = 6 = size of our schedule table
And the schedule table is :
and the sending loop looks like :
while(1) {
sleep(IDLE_MINIMUM_PERIOD); // free CPU for idle min period
i++; // initialized at 0
send(ScheduleTable[i]);
if (i == sizeof(ScheduleTable)) i=0;
}
The problem with this method is that this array will grows if LCM grows which is the case if we have bad combination like with rate = prime number, etc.

Sliding Window over Time - Data Structure and Garbage Collection

I am trying to implement something along the lines of a Moving Average.
In this system, there are no guarantees of a quantity of Integers per time period. I do need to calculate the Average for each period. Therefore, I cannot simply slide over the list of integers by quantity as this would not be relative to time.
I can keep a record of each value with its associated time. We will have a ton of data running through the system so it is important to 'garbage collect' the old data.
It may also be important to note that I need to save the average to disk after the end of each period. However, they may be some overlap between saving the data to disk and having data from a new period being introduced.
What are some efficient data structures I can use to store, slide, and garbage collect this type of data?

The description of the problem and the question conflict: what is described is not a moving average, since the average for each time period is distinct. ("I need to compute the average for each period.") So that admits a truly trivial solution:
For each period, maintain a count and a sum of observations.
At the end of the period, compute the average
I suspect that what is actually wanted is something like: Every second (computation period), I want to know the average observation over the past minute (aggregation period).
This can be solved simply with a circular buffer of buckets, each of which represents the value for one computation period. There will be aggregation period / computation period such buckets. Again, each bucket contains a count and a sum. Also, a current total/sum and a cumulative total sum/count are maintained. Each observation is added to the current total/sum.
At the end of a each computation period:
subtract the sum/count for the (circularly) first period from the cumulative sum/count
add the current sum/count to the cumulative sum/count
report the average based on the cumulative sum/count
replace the values of the first period with the current sum/count
clear the current sum/count
advance the origin of the circular buffer.
If you really need to be able to compute at any time at all the average of the previous observations over some given period, you'd need a more complicated data structure, basically an expandable circular buffer. However, such precise computations are rarely actually necessary, and a bucketed approximation, as per the above algorithm, is usually adequate for data purposes, and is much more sustainable over the long term for memory management, since its memory requirements are fixed from the start.

Regrading simulation of bank-teller

we have a system, such as a bank, where customers arrive and wait on a
line until one of k tellers is available.Customer arrival is governed
by a probability distribution function, as is the service time (the
amount of time to be served once a teller is available). We are
interested in statistics such as how long on average a customer has to
wait or how long the line might be.
We can use the probability functions to generate an input stream
consisting of ordered pairs of arrival time and service time for each
customer, sorted by arrival time. We do not need to use the exact time
of day. Rather, we can use a quantum unit, which we will refer to as
a tick.
One way to do this simulation is to start a simulation clock at zero
ticks. We then advance the clock one tick at a time, checking to see
if there is an event. If there is, then we process the event(s) and
compile statistics. When there are no customers left in the input
stream and all the tellers are free, then the simulation is over.
The problem with this simulation strategy is that its running time
does not depend on the number of customers or events (there are two
events per customer), but instead depends on the number of ticks,
which is not really part of the input. To see why this is important,
suppose we changed the clock units to milliticks and multiplied all
the times in the input by 1,000. The result would be that the
simulation would take 1,000 times longer!
My question on above text is how author came in last paragraph what does author mean by " suppose we changed the clock units to milliticks and multiplied all the times in the input by 1,000. The result would be that the simulation would take 1,000 times longer!" ?
Thanks!

With this algorithm we have to check every tick. More ticks there are the more checks we carry out. For example if first customers arrives at 3rd tick, then we had to do 2 unnecessary checks. But if we would check every millitick then we would have to do 2999 unnecessary checks.

Because the checking is being carried out on a per tick basis if the number of ticks is multiplied by 1000 then there will be 1000 times more checks.

Imagine that you set an alarm so that you perform a task, like checking your email, every hour. This means you would check your email 24 times in day, assuming you didn't sleep. If you decide to change this alarm so that it goes off every minute you would now be checking your email 24*60 = 1440 times per day, where 24 is the number of times you were checking it before and 60 is the number of minutes in an hour.
This is exactly what happens in the simulation above, except rather than perform some action every time an alarm goes off, you just do all 1440 email checks as quickly as you can.

Calculating number of messages per second in a rolling window?

I have messages coming into my program with millisecond resolution (anywhere from zero to a couple hundred messages a millisecond).
I'd like to do some analysis. Specifically, I want to maintain multiple rolling windows of the message counts, updated as messages come in. For example,
# of messages in last second
# of messages in last minute
# of messages in last half-hour divided by # of messages in last hour
I can't just maintain a simple count like "1,017 messages in last second", since I won't know when a message is older than 1 second and therefore should no longer be in the count...
I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.
What can I do to keep track of these counts in my program so that I can efficiently get these values in real-time?

This is easiest handled by a cyclic buffer.
A cyclic buffer has a fixed number of elements, and a pointer to it. You can add an element to the buffer, and when you do, you increment the pointer to the next element. If you get past the fixed-length buffer you start from the beginning. It's a space and time efficient way to store "last N" items.
Now in your case you could have one cyclic buffer of 1,000 counters, each one counting the number of messages during one millisecond. Adding all the 1,000 counters gives you the total count during last second. Of course you can optimize the reporting part by incrementally updating the count, i.e. deduct form the count the number you overwrite when you insert and then add the new number.
You can then have another cyclic buffer that has 60 slots and counts the aggregate number of messages in whole seconds; once a second, you take the total count of the millisecond buffer and write the count to the buffer having resolution of seconds, etc.
Here C-like pseudocode:
int msecbuf[1000]; // initialized with zeroes
int secbuf[60]; // ditto
int msecptr = 0, secptr = 0;
int count = 0;
int msec_total_ctr = 0;
void msg_received() { count++; }
void every_msec() {
msec_total_ctr -= msecbuf[msecptr];
msecbuf[msecptr] = count;
msec_total_ctr += msecbuf[msecptr];
count = 0;
msecptr = (msecptr + 1) % 1000;
}
void every_sec() {
secbuf[secptr] = msec_total_ctr;
secptr = (secptr + 1) % 60;
}

You want exponential smoothing, otherwise known as an exponential weighted moving average. Take an EWMA of the time since the last message arrived, and then divide that time into a second. You can run several of these with different weights to cover effectively longer time intervals. Effectively, you're using an infinitely long window then, so you don't have to worry about expiring data; the reducing weights do it for you.

For the last millisecord, keep the count. When the millisecord slice goes to the next one, reset count and add count to a millisecond rolling buffer array. If you keep this cummulative, you can extract the # of messages / second with a fixed amount of memory.
When a 0,1 second slice (or some other small value next to 1 minute) is done, sum up last 0,1*1000 items from the rolling buffer array and place that in the next rolling buffer. This way you kan keep the millisecord rolling buffer small (1000 items for 1s max lookup) and the buffer for lookup the minute also (600 items).
You can do the next trick for whole minutes of 0,1 minutes intervals. All questions asked can be queried by summing (or when using cummulative , substracting two values) a few integers.
The only disadvantage is that the last sec value wil change every ms and the minute value only every 0,1 secand the hour value (and derivatives with the % in last 1/2 hour) every 0,1 minute. But at least you keep your memory usage at bay.

Your rolling display window can only update so fast, lets say you want to update it 10 times a second, so for 1 second's worth of data, you would need 10 values. Each value would contain the number of messages that showed up in that 1/10 of a second. Lets call these values bins, each bin holds 1/10 of a second's worth of data. Every 100 milliseconds, one of the bins gets discarded and a new bin is set to the number of messages that have show up in that 100 milliseconds.
You would need an array of 36K bins to hold an hour's worth information about your message rate if you wanted to preserve a precision of 1/10 of a second for the whole hour. But that seems overkill.
But I think it would be more reasonable to have the precision drop off as the time inteval gets larger.
Maybe you keep 1 second's worth of data accurate to 100 milliseconds, 1 minutes worth of data accurate to the second, 1 hour's worth of data accurate to the minute, and so on.

I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.
A better idea would be maintaining a linked list of the messages, adding new messages to the head (with a timestamp), and popping them from the tail as they expire. Or even not pop them - just keep a pointer to the oldest message that came in in the desired timeframe, and advance it towards the head when that message expires (this allows you to keep track of multiply timeframes with one list).
You could compute the count when needed by walking from the tail to the head, or just store the count separately, incrementing it whenever you add a value to the head, and decrementing it whenever you advance the tail.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio