Algorithm to detect a sawtooth-like time series

What you see here is a graph of acceleration along the vertical axis (the head-to-toe axis) of a person walking.
I want to implement a reliable method to recognise this pattern of motion and count the number of steps.
As we can immediately notice, each step corresponds to a spike and a dip about the mean, which sits around the 10-10.5 m/s^2 line.
Earlier I planned on a threshold-detection mechanism, but that yielded very poor results because there are some variables:
If the person walks slower or faster, the graph stretches or compresses along the time axis.
If the person steps lighter or harder, the spikes and dips are smaller or larger respectively.
However, in all cases the pattern is the same: a spike and a dip at almost regular intervals.
What is a reasonable algorithm to detect this pattern with good accuracy and low computing time?

Never mind, I figured it out; it was rather simple. All I had to do was decide on a noise threshold and a base (zero) level, then run a peak detector on the signal.
The abstract procedure is as follows:
The base level is calculated in real time as the average of the last 30 samples.
Values above base level + noise threshold are considered positive spikes.
Values below base level - noise threshold are considered negative spikes.
A positive spike followed by a negative spike within a short interval of about 500 ms is counted as a step.
With proper tuning the accuracy is about 98%, and the number of steps taken can be counted very reliably.
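For anyone who wants to try this, here is a minimal Python sketch of the procedure above. The function name, the default noise threshold of 0.5 m/s^2 and the way a pending positive spike is remembered are my own assumptions, not something from the original implementation; tune them against real data.

```python
from collections import deque


def count_steps(samples, sample_rate_hz, noise_threshold=0.5,
                window=30, max_gap_s=0.5):
    """Count steps as a positive spike followed by a negative spike.

    All parameter names and defaults are illustrative assumptions.
    """
    recent = deque(maxlen=window)        # last `window` samples for the base level
    max_gap = int(max_gap_s * sample_rate_hz)
    last_peak_index = None               # most recent unmatched positive spike
    steps = 0
    for i, value in enumerate(samples):
        if recent:
            base = sum(recent) / len(recent)   # running base (zero) level
            if value > base + noise_threshold:
                last_peak_index = i            # positive spike seen
            elif value < base - noise_threshold and last_peak_index is not None:
                if i - last_peak_index <= max_gap:
                    steps += 1                 # spike and dip close together
                last_peak_index = None         # pair consumed either way
        recent.append(value)
    return steps
```

The base level is taken from the samples seen before the current one, so a spike does not immediately raise its own reference level.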


How to compute "take off" and "landing" time given a time series of altitudes?

Given a sequence of altitudes over time for an aircraft, how can I compute the specific time when the aircraft departs the ground (takes off) and returns to the ground for the final time (lands)?
Consider that an aircraft may depart from an airport at a different altitude than the airport it lands at.
Also consider that the altitude during taxi may change slightly but should not be considered a take off.
NOTE An aircraft is likely to have "departed" the ground when its altitude has changed more than 1000 feet from its starting altitude. The same would be true in reverse for a landing.
A simple, probably-good-enough idea: first, apply smoothing (e.g. Gaussian) to the data, then take the derivative. Call the first time point where the derivative differs significantly from zero the takeoff, and the last time point where the derivative differs significantly from zero the landing. Tweak the smoothing time constant and the threshold for "significantly different from zero" until you get good results with your data.
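A rough Python sketch of that idea, assuming altitudes in feet and timestamps in seconds. The function names, the Gaussian width `sigma` (in samples) and the 5 ft/s `rate_threshold` are placeholder assumptions that would need tuning on real flight data.

```python
import math


def gaussian_smooth(values, sigma):
    """Smooth with a truncated Gaussian kernel, renormalised at the edges."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * values[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def takeoff_and_landing(times, altitudes, sigma=2.0, rate_threshold=5.0):
    """Return (takeoff_time, landing_time), or None if the aircraft never moves.

    Takeoff is the start of the first interval whose smoothed climb rate
    exceeds the threshold; landing is the end of the last such interval.
    """
    smooth = gaussian_smooth(altitudes, sigma)
    active = []
    for i in range(1, len(smooth)):
        rate = (smooth[i] - smooth[i - 1]) / (times[i] - times[i - 1])
        if abs(rate) > rate_threshold:
            active.append(i)
    if not active:
        return None
    return times[active[0] - 1], times[active[-1]]
```

Because the smoothing spreads each change over neighbouring samples, the reported times can be off by a few samples in either direction; a smaller `sigma` tightens that at the cost of more sensitivity to taxi noise.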

Computational Complexity of Finding Area Under Discrete Curve

I apologize if my questions are extremely misguided or loosely scoped. Math is not my strongest subject. For context, I am trying to figure out the computational complexity of calculating the area under a discrete curve. In the particular use case that I am interested in, the y-axis is the length of a queue and the x-axis is time. The curve will always have the following bounds: it begins at zero, it is composed of multiple timestamped samples that are greater than zero, and it eventually shrinks to zero. My initial research has yielded two potential mathematical approaches to this problem. The first is a Riemann sum over domain [a, b] where a is initially zero and b eventually becomes zero (not sure if my understanding is completely correct there). I think the mathematical representation of this is the formula found here:
https://en.wikipedia.org/wiki/Riemann_sum#Connection_with_integration.
The second is a discrete convolution. However, I am unable to tell the difference between, and applicability of, a discrete convolution and a Riemann sum over domain [a, b] where a is initially zero and b eventually becomes zero.
My questions are:
Is there a difference between the two?
Which approach is most applicable/efficient for what I am trying to figure out?
Is it even appropriate to ask about the computational complexity of either mathematical approach? If so, what are the complexities of each in this particular application?
Edit:
For added context, there will be a function calculating average queue length by taking the sum of the area under two separate curves and dividing it by the total time interval spanning those two curves. The particular application can be seen on page 168 of this paper: https://www.cse.wustl.edu/~jain/cv/raj_jain_paper4_decbit.pdf
Is there a difference between the two?
A discrete convolution requires two functions. If the first one corresponds to the discrete curve, what is the second one?
Which approach is most applicable/efficient for what I am trying to figure out?
A Riemann sum is an approximation of an integral. It's typically used to approximate the area under a continuous curve. You can of course use it on a discrete curve, but it's not an approximation anymore, and I'm not sure you can call it a "Riemann" sum.
Is it even appropriate to ask about the computational complexity of either mathematical approach? If so, what are the complexities of each in this particular application?
In any case, the complexity of computing the area under a discrete curve is linear in the number of samples, and it's pretty straightforward to see why: you need to do something with each sample, once or twice.
What you probably want looks like a Riemann sum with the trapezoidal rule. Pick the first two samples, calculate their average, and multiply that by the distance between two samples. Repeat for every adjacent pair and sum it all.
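As a small illustration of the trapezoidal sum just described, here is a Python sketch; the function name and the parallel `times`/`values` lists are assumptions about how the samples are stored. It visits each adjacent pair once, which is where the linear complexity comes from.

```python
def area_under_samples(times, values):
    """Trapezoidal area under timestamped samples, O(n) in the sample count."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        mean_height = 0.5 * (values[i - 1] + values[i])   # average of the pair
        area += mean_height * (times[i] - times[i - 1])   # times the spacing
    return area


# Queue length 0 -> 2 -> 2 -> 0 at times 0, 1, 3, 4
print(area_under_samples([0, 1, 3, 4], [0, 2, 2, 0]))
```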
So, this is for the router feedback filter in the referenced paper...
That algorithm is specifically designed so that you can implement it without storing a lot of samples and timestamps.
It works by accumulating total queue_length * time during each cycle.
At the start of each "cycle", record the current queue length and current clock time and set the current cycle's total to 0. (The paper defines the cycle so that the queue length is 0 at the start, but that's not important here)
Every time the queue length changes, get the new current clock time and add (new_clock_time - previous_clock_time) * previous_queue_length to the total. Also do this at the end of the cycle. Then, record the new current queue length and current clock time.
When you need to calculate the current "average queue length", it's just (previous_cycle_total + current_cycle_total + (current_clock_time - previous_clock_time)*previous_queue_length) / total_time_since_previous_cycle_start
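Here is one way the bookkeeping above could look in Python. This is only a sketch of the accumulation described in this answer, not code from the paper; the class and method names are made up for illustration, and the caller is assumed to supply clock times.

```python
class QueueLengthAverager:
    """Accumulates queue_length * time over the previous and current cycle."""

    def __init__(self, now, queue_length=0):
        self.prev_cycle_total = 0.0
        self.cur_cycle_total = 0.0
        self.prev_cycle_start = now
        self.last_time = now            # time of the last recorded change
        self.last_length = queue_length

    def _accumulate(self, now):
        # Area of the rectangle since the last change.
        self.cur_cycle_total += (now - self.last_time) * self.last_length
        self.last_time = now

    def on_length_change(self, now, new_length):
        self._accumulate(now)
        self.last_length = new_length

    def start_new_cycle(self, now, cycle_start_of_old=None):
        # Close the current cycle and make it the "previous" one.
        self._accumulate(now)
        self.prev_cycle_total = self.cur_cycle_total
        self.prev_cycle_start = self._cur_start if hasattr(self, "_cur_start") else self.prev_cycle_start
        self._cur_start = now
        self.cur_cycle_total = 0.0

    def average(self, now):
        pending = (now - self.last_time) * self.last_length
        elapsed = now - self.prev_cycle_start
        if elapsed <= 0:
            return 0.0
        return (self.prev_cycle_total + self.cur_cycle_total + pending) / elapsed
```

Each event costs O(1) time and the state is a handful of numbers, which is the point of the design: no samples or timestamps need to be stored.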

Parameter Tuning for Perceptron Learning Algorithm

I'm having sort of an issue trying to figure out how to tune the parameters for my perceptron algorithm so that it performs relatively well on unseen data.
I've implemented a verified working perceptron algorithm and I'd like to figure out a method by which I can tune the numbers of iterations and the learning rate of the perceptron. These are the two parameters I'm interested in.
I know that the learning rate of the perceptron doesn't affect whether or not the algorithm converges and completes. I'm trying to grasp how to change the learning rate n. Too high and it'll swing around a lot, and too low and it'll take longer.
As for the number of iterations, I'm not entirely sure how to determine an ideal number.
In any case, any help would be appreciated. Thanks.
Start with a small number of iterations (it's actually more conventional to count 'epochs' rather than iterations--'epochs' refers to the number of iterations through the entire data set used to train the network). By 'small' let's say something like 50 epochs. The reason for this is that you want to see how the total error is changing with each additional training cycle (epoch)--hopefully it's going down (more on 'total error' below).
Obviously you are interested in the point (the number of epochs) where the next additional epoch does not cause a further decrease in total error. So begin with a small number of epochs so you can approach that point by increasing the epochs.
The learning rate you begin with should not be too fine or too coarse, (obviously subjective but hopefully you have a rough sense for what is a large versus small learning rate).
Next, insert a few lines of testing code in your perceptron--really just a few well-placed 'print' statements. For each iteration, calculate and show the delta (actual value for each data point in the training data minus predicted value), then sum the individual delta values over all points (data rows) in the training data (I usually take the absolute value of the delta, or you can take the square root of the sum of the squared differences--it doesn't matter too much). Call that summed value "total error"--just to be clear, this is total error (sum of the error across all nodes) per epoch.
Then, plot the total error as a function of epoch number (i.e., epoch number on the x axis, total error on the y axis). Initially, of course, you'll see the data points in the upper left-hand corner trending down and to the right with a decreasing slope.
Let the algorithm train the network against the training data. Increase the epochs (by e.g., 10 per run) until you see the curve (total error versus epoch number) flatten--i.e., additional iterations don't cause a decrease in total error.
So the slope of that curve is important and so is its vertical position--i.e., how much total error you have and whether it continues to trend downward with more training cycles (epochs). If, after increasing epochs, you eventually notice an increase in error, start again with a lower learning rate.
The learning rate (usually a fraction between about 0.01 and 0.2) will certainly affect how quickly the network is trained--i.e., it can move you to the local minimum more quickly. It can also cause you to jump over it. So code a loop that trains a network, let's say four separate times, using a fixed number of epochs (and the same starting point) each time but varying the learning rate from e.g., 0.05 to 0.2, each time increasing the learning rate by 0.05.
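To make the sweep concrete, here is a small self-contained Python sketch. It is not the asker's perceptron: the step-activation perceptron, the AND-gate training set and the names are all stand-ins I picked so the example runs on its own. It records the total error (sum of absolute deltas) per epoch for each learning rate, which is the curve you would plot.

```python
def train_perceptron(data, learning_rate, epochs):
    """Train a step-activation perceptron; return (weights, bias, errors).

    data is a list of (features, label) pairs with labels 0 or 1, and
    errors[k] is the total error (sum of |delta|) during epoch k.
    """
    weights = [0.0] * len(data[0][0])    # same starting point for every run
    bias = 0.0
    errors = []
    for _ in range(epochs):
        total_error = 0
        for x, target in data:
            activation = bias + sum(w * xi for w, xi in zip(weights, x))
            predicted = 1 if activation > 0 else 0
            delta = target - predicted   # actual minus predicted
            total_error += abs(delta)
            if delta:
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        errors.append(total_error)
    return weights, bias, errors


# Toy, linearly separable data (logical AND), used purely for illustration.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
for rate in (0.05, 0.10, 0.15, 0.20):
    _, _, errors = train_perceptron(and_gate, rate, epochs=20)
    print(rate, errors)
```

On real data you would replace the `print` with a plot of `errors` against epoch number and compare the curves across learning rates.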
One more parameter is important here (though not strictly necessary), 'momentum'. As the name suggests, using a momentum term will help you get an adequately trained network more quickly. In essence, momentum is a multiplier to the learning rate--as long as the error rate is decreasing, the momentum term accelerates the progress. The intuition behind the momentum term is 'as long as you are traveling toward the destination, increase your velocity'. Typical values for the momentum term are 0.1 or 0.2. In the training scheme above, you should probably hold momentum constant while varying the learning rate.
About the learning rate not affecting whether or not the perceptron converges: that's not true. If you choose a learning rate that is too high, you will probably get a divergent network. If you change the learning rate during learning, and it drops too fast (i.e. faster than 1/t) you can also get a network that never converges (that's because the sum of the learning rate N(t) over t from 1 to inf is then finite, which means the vector of weights can only change by a finite amount).
Theoretically it can be shown for simple cases that changing n (learning rate) according to 1/t (where t is the number of presented examples) should work well, but I actually found that in practice the best way to do this is to find a good high n value (the highest value that doesn't make your learning diverge) and a low n value (this one is trickier to figure out; it really depends on the data and problem), and then let n change linearly over time from high n to low n.
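The linear schedule is only a couple of lines. A sketch in Python, where the function name and the example `high`/`low` values are assumptions for illustration:

```python
def linear_learning_rate(step, total_steps, high, low):
    """Learning rate that moves linearly from `high` to `low` over training."""
    if total_steps <= 1:
        return high
    # Fraction of training completed, clamped to [0, 1].
    fraction = min(max(step / (total_steps - 1), 0.0), 1.0)
    return high + (low - high) * fraction


# Example: 11 steps going from 0.2 down to 0.01
schedule = [linear_learning_rate(t, 11, high=0.2, low=0.01) for t in range(11)]
```

Inside a training loop you would call it once per presented example (or once per epoch) and use the result as the current n.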
The learning rate depends on the typical values of data. There is no rule of thumb in general. Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.
Normalizing the data to a zero-mean, unit variance or between 0-1 or any other standard form can help in selecting a value of learning rate. As doug mentioned, learning rate between 0.05 and 0.2 generally works well.
Also this will help in making the algorithm converge faster.
Source: Juszczak, P.; D. M. J. Tax, and R. P. W. Dui (2002). "Feature scaling in support vector data descriptions". Proc. 8th Annu. Conf. Adv. School Comput. Imaging: 95–10.

Channel allocation algorithm

We have a set of radio nodes in close proximity to each other and would like to allocate the frequencies for them to minimize overlap. To get complete coverage of the area, radio channels need to be oversubscribed and so we will have nearby radios transmitting on the same frequency.
Sample data:
5 Frequencies
343 Radios
4158 Edges
My current best guess is to randomly generate a population of frequency allocations and to swap frequencies between radios until the best score does not improve for 10 generations. Score is the sum of 1/range^2 for radios on the same frequency.
Each edge is the distance between the radios, corrected for walls and floors. Edges above 2* the max radio range have been culled from the list.
Is there a better way?
This is basically a graph-coloring problem with a twist. Rather than all proper colorings being equally good, some proper colorings are better than others, as defined by your scoring algorithm.
I think your genetic approach is practical and will yield good (if not provably optimal) solutions, but I would definitely suggest looking at some graph-coloring papers and seeing how applicable they are. It is very likely that you will get some great ideas for deciding how your algorithm should consider the available choices.
I agree that a simulation based on random initial assignment followed by some optimization is a good approach, but you're describing an optimization procedure which does not seem optimal, if I understand correctly (you're planning to swap frequencies at random if I read you correctly). At each optimization step you could pick a "reasonable" improvement by taking one radio from each frequency group and considering the 5*4/2=10 possible swaps of frequencies between two of them, and either choose the best, or (say) one of those which has positive delta score, with probabilities proportional to the deltas in the scores.
In the spirit of "simulated annealing", once the overall score seems to have more or less stabilized, you may want to switch for a small number of steps to "high temperature" (high randomness) where you just pick the set of 5 radios and swap them all e.g. with a circular permutation of frequency assignments -- do that a few times then go to the "cooling down" part again with the procedure in the above paragraph (which tries to get a cheap simulation of a maximum-gradient descent;-).
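A rough Python sketch of the "cooling down" step described above (one radio per frequency group, try every pairwise swap, keep the best improving one). The names and the edge format `(radio_a, radio_b, distance)` are assumptions, the high-temperature phase is left out, and the score is recomputed from scratch on every trial swap for clarity rather than speed.

```python
import random


def interference(assignment, edges):
    """Sum of 1/range^2 over edges whose two radios share a frequency."""
    return sum(1.0 / (d * d) for a, b, d in edges
               if assignment[a] == assignment[b])


def improve_by_swaps(assignment, edges, rounds=1000, seed=0):
    """Greedy local search: apply the best improving swap in each round."""
    rng = random.Random(seed)
    assignment = list(assignment)
    best = interference(assignment, edges)
    for _ in range(rounds):
        # Pick one radio at random from each frequency group.
        groups = {}
        for radio, freq in enumerate(assignment):
            groups.setdefault(freq, []).append(radio)
        picks = [rng.choice(members) for members in groups.values()]
        best_swap, best_score = None, best
        # With 5 frequencies this tries 5*4/2 = 10 candidate swaps.
        for i in range(len(picks)):
            for j in range(i + 1, len(picks)):
                a, b = picks[i], picks[j]
                assignment[a], assignment[b] = assignment[b], assignment[a]
                score = interference(assignment, edges)
                assignment[a], assignment[b] = assignment[b], assignment[a]
                if score < best_score:
                    best_swap, best_score = (a, b), score
        if best_swap is not None:
            a, b = best_swap
            assignment[a], assignment[b] = assignment[b], assignment[a]
            best = best_score
    return assignment, best
```

Since swaps only exchange frequencies between two radios, the number of radios on each frequency stays what it was in the initial assignment. For the real data set, updating the score incrementally from the edges touching the two swapped radios would be much cheaper than a full recomputation.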
My quick stab at it would be to use a thin plate spline (or possibly a similar, cleverer linear algebra technique) to fit a plane to the function of frequency density. The average 'altitude' of each plane (per frequency) would then tell you whether a frequency is overused (i.e. when it's higher than the others); the slope would be an indication of the spatial distribution.

How to 'smooth' data and calculate line gradient?

I'm reading data from a device which measures distance. My sample rate is high so that I can measure large changes in distance (i.e. velocity) but this means that, when the velocity is low, the device delivers a number of measurements which are identical (due to the granularity of the device). This results in a 'stepped' curve.
What I need to do is to smooth the curve in order to calculate the velocity. Following that I then need to calculate the acceleration.
How to best go about this?
(Sample rate up to 1000Hz, calculation rate of 10Hz would be ok. Using C# in VS2005)
The wikipedia entry from moogs is a good starting point for smoothing the data. But it does not help you in making a decision.
It all depends on your data, and the needed processing speed.
Moving Average
Will flatten the top values. If you are interested in the minimum and maximum value, don't use this. Also I think using the moving average will influence your measurement of the acceleration, since it will flatten your data (a bit), thereby acceleration will appear to be smaller. It all comes down to the needed accuracy.
Savitzky–Golay
Fast algorithm. As fast as the moving average. That will preserve the heights of peaks. Somewhat harder to implement. And you need the correct coefficients. I would pick this one.
Kalman filters
If you know the distribution, this can give you good results (it is used in GPS navigation systems). Maybe somewhat harder to implement. I mention this because I have used them in the past. But they are probably not a good choice for a starter in this kind of stuff.
The above will reduce noise on your signal.
The next thing you have to do is detect the start and end point of the "acceleration". You could do this by taking the derivative of the original signal. The point(s) where the derivative crosses zero are probably the peaks in your signal, and might indicate the start and end of the acceleration.
You can then take the second derivative to get the minimum and maximum acceleration itself.
You need a smoothing filter, the simplest would be a "moving average": just calculate the average of the last n points.
The question here is, how to determine n, can you tell us more about your application?
(There are other, more complicated filters. They vary on how they preserve the input data. A good list is in Wikipedia)
Edit!: For 10Hz, average the last 100 values.
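As a sketch of that suggestion (in Python rather than C#, just to keep it short): average each block of 100 raw samples to get a 10 Hz series, then difference consecutive averages to get velocity. The function names and the fixed block size are assumptions for a 1000 Hz input.

```python
def block_average(samples, block_size=100):
    """Average consecutive, non-overlapping blocks (1000 Hz -> 10 Hz)."""
    last_start = len(samples) - block_size
    return [sum(samples[i:i + block_size]) / block_size
            for i in range(0, last_start + 1, block_size)]


def finite_difference(values, dt):
    """Rate of change between consecutive values spaced dt seconds apart."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Stepped distance readings: the device only reports multiples of 0.5
raw = [(i // 50) * 0.5 for i in range(1000)]
distance_10hz = block_average(raw)                   # one value per 0.1 s
velocity = finite_difference(distance_10hz, dt=0.1)
acceleration = finite_difference(velocity, dt=0.1)
```

Averaging over whole blocks hides the individual steps as long as several quantisation steps fall inside each block; at very low speeds the steps will still show through.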
Moving averages are generally terrible - but work well for white noise. Moving averages and Savitzky-Golay both boil down to a correlation - and therefore are very fast and could be implemented in real time. If you need higher order information like first and second derivatives - SG is a good choice. The magic of SG lies in the constant correlation coefficients needed for the filter - once you have decided the length and degree of polynomial to fit locally, the coefficients need only to be found once. You can compute them using R (sgolay) or Matlab.
You can also estimate a noisy signal's first derivative via the Savitzky-Golay best-fit polynomials - these are sometimes called Savitzky-Golay derivatives - and typically give a good estimate of the first derivative.
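To show what "constant correlation coefficients" means in practice, here is a Python sketch using the standard 5-point, quadratic-fit Savitzky-Golay coefficients: (-3, 12, 17, 12, -3)/35 for smoothing and (-2, -1, 0, 1, 2)/10 for the first derivative per sample interval. The function names are assumptions, and a real application would likely want a longer window chosen for its noise level.

```python
# 5-point Savitzky-Golay coefficients for a local quadratic fit.
SMOOTH_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)
DERIV_5 = (-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10)   # per sample interval


def correlate_5(values, coeffs, scale=1.0):
    """Correlate with 5 coefficients; output is 4 samples shorter than input."""
    out = []
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out.append(scale * sum(c * v for c, v in zip(coeffs, window)))
    return out


def smooth(values):
    return correlate_5(values, SMOOTH_5)


def first_derivative(values, sample_rate_hz):
    # Multiply by the sample rate to convert "per sample" to "per second".
    return correlate_5(values, DERIV_5, scale=sample_rate_hz)
```

Velocity would be `first_derivative(distance, rate)` and acceleration the same call applied to the velocity, with each pass trimming two samples from each end.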
Kalman filtering can be very effective, but it's heavier computationally - it's hard to beat a short convolution for speed!
Paul
CenterSpace Software
In addition to the above articles, have a look at Catmull-Rom Splines.
You could use a moving average to smooth out the data.
In addition to GvS's excellent answer above, you could also consider smoothing / reducing the stepping effect of your averaged results using some general curve fitting such as cubic or quadratic splines.
