Smoothing values with inconsistent time period - algorithm

The problem I'm trying to solve: calculate the current average velocity of some data series where the data points are unevenly spread. For example, calculating the current speed of an upload, where the 'amount uploaded' signals arrive unevenly:
t = 0, sent = 0
t = 5, sent = 10
t = 6, sent = 12
t = 9, sent = 20

(last - first) / (time delta between first and last)
That is exactly the average velocity over the whole series.
Unless you forgot to tell us some details, you do not need the data points in the middle.

You can calculate the average per time unit by dividing the delta between the new value and the previous value by the time elapsed between them.
And if you want the average over multiple points, you can calculate the averages between several points, and then take the average of those averages.
For example:
Current average:
t34 = 9 - 6 = 3
sent34 = 20 - 12 = 8
average34 = 8 / 3 = 2.67
Average of last two time slots:
t23 = 6 - 5 = 1
sent23 = 12 - 10 = 2
average23 = 2 / 1 = 2
average234 = (2 + 2.67) / 2 = 2.33
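The arithmetic above can be sketched in Python as follows. This is only an illustration: the function names segment_rates and mean_of_last_rates are made up here, and the samples are assumed to be (t, sent) pairs with strictly increasing t.

```python
def segment_rates(samples):
    """Rate between each pair of consecutive (t, sent) samples."""
    return [(s1 - s0) / (t1 - t0)
            for (t0, s0), (t1, s1) in zip(samples, samples[1:])]


def mean_of_last_rates(samples, k):
    """Plain (unweighted) mean of the last k segment rates."""
    rates = segment_rates(samples)[-k:]
    return sum(rates) / len(rates)


samples = [(0, 0), (5, 10), (6, 12), (9, 20)]
print(segment_rates(samples))           # one rate per time slot
print(mean_of_last_rates(samples, 2))   # average of the last two slots
```

Note that this mean treats a 1-unit slot and a 3-unit slot equally. If you want each slot weighted by its duration, you are back to (last - first) / (time delta) over the chosen points.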

Just rescale the latest results.
For your example:
t = 0, sent = 0
t = 5, sent = 10
t = 6, sent = 12
t = 9, sent = 20
CurrentSpeed = (20 - 12) / (9 - 6) = 8/3 = 2.666666
You may use a different (longer) rescale interval to make the reported velocity change more slowly (for example, when the connection is lost and then restored).
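A rough Python sketch of the "rescale interval" idea, under my own assumptions: the name windowed_rate and the window parameter are hypothetical, samples are (t, sent) pairs with strictly increasing t, and if the window holds only the latest sample the previous one is used anyway.

```python
def windowed_rate(samples, window):
    """Rate from the oldest sample inside `window` time units to the latest."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_last, s_last = samples[-1]
    # Start at the second-to-last sample and walk back while the
    # next older sample is still inside the window.
    start = len(samples) - 2
    while start > 0 and t_last - samples[start - 1][0] <= window:
        start -= 1
    t_old, s_old = samples[start]
    return (s_last - s_old) / (t_last - t_old)


samples = [(0, 0), (5, 10), (6, 12), (9, 20)]
print(windowed_rate(samples, 3))   # uses t = 6 and t = 9
print(windowed_rate(samples, 4))   # uses t = 5 and t = 9
```

A larger window gives a steadier number that reacts more slowly; a smaller one follows the latest samples more closely.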

The standard way of calculating a velocity from noisy data is to apply a Kalman filter.

What is the most efficient way to implement zig-zag ordering in MATLAB? [duplicate]

I have an NxM matrix in MATLAB that I would like to reorder in similar fashion to the way JPEG reorders its subblock pixels:
(image from Wikipedia)
I would like the algorithm to be generic such that I can pass in a 2D matrix with any dimensions. I am a C++ programmer by trade and am very tempted to write an old school loop to accomplish this, but I suspect there is a better way to do it in MATLAB.
I would be content with an algorithm that works on an NxN matrix, and I can go from there.
Example:
1 2 3
4 5 6 --> 1 2 4 7 5 3 6 8 9
7 8 9
Consider the code:
M = randi(100, [3 4]); %# input matrix
ind = reshape(1:numel(M), size(M)); %# indices of elements
ind = fliplr( spdiags( fliplr(ind) ) ); %# get the anti-diagonals
ind(:,1:2:end) = flipud( ind(:,1:2:end) ); %# reverse order of odd columns
ind(ind==0) = []; %# keep non-zero indices
M(ind) %# get elements in zigzag order
An example with a 4x4 matrix:
» M
M =
17 35 26 96
12 59 51 55
50 23 70 14
96 76 90 15
» M(ind)
ans =
17 35 12 50 59 26 96 51 23 96 76 70 55 14 90 15
and an example with a non-square matrix:
M =
69 9 16 100
75 23 83 8
46 92 54 45
ans =
69 9 75 46 23 16 100 83 92 54 8 45
This approach is pretty fast:
X = randn(500,2000); %// example input matrix
[r, c] = size(X);
M = bsxfun(@plus, (1:r).', 0:c-1);
M = M + bsxfun(@times, (1:r).'/(r+c), (-1).^M);
[~, ind] = sort(M(:));
y = X(ind).'; %'// output row vector
Benchmarking
The following code compares running time with that of Amro's excellent answer, using timeit. It tests different combinations of matrix size (number of entries) and matrix shape (number of rows to number of columns ratio).
%// Amro's approach
function y = zigzag_Amro(M)
ind = reshape(1:numel(M), size(M));
ind = fliplr( spdiags( fliplr(ind) ) );
ind(:,1:2:end) = flipud( ind(:,1:2:end) );
ind(ind==0) = [];
y = M(ind);
%// Luis' approach
function y = zigzag_Luis(X)
[r, c] = size(X);
M = bsxfun(@plus, (1:r).', 0:c-1);
M = M + bsxfun(@times, (1:r).'/(r+c), (-1).^M);
[~, ind] = sort(M(:));
y = X(ind).';
%// Benchmarking code:
S = [10 30 100 300 1000 3000]; %// reference to generate matrix size
f = [1 1]; %// number of cols is S*f(1); number of rows is S*f(2)
%// f = [0.5 2]; %// plotted with '--'
%// f = [2 0.5]; %// plotted with ':'
t_Amro = NaN(size(S));
t_Luis = NaN(size(S));
for n = 1:numel(S)
X = rand(f(1)*S(n), f(2)*S(n));
f_Amro = @() zigzag_Amro(X);
f_Luis = @() zigzag_Luis(X);
t_Amro(n) = timeit(f_Amro);
t_Luis(n) = timeit(f_Luis);
end
loglog(S.^2*prod(f), t_Amro, '.b-');
hold on
loglog(S.^2*prod(f), t_Luis, '.r-');
xlabel('number of matrix entries')
ylabel('time')
The figure below has been obtained with Matlab R2014b on Windows 7 64 bits. Results in R2010b are very similar. It is seen that the new approach reduces running time by a factor between 2.5 (for small matrices) and 1.4 (for large matrices). Results are seen to be almost insensitive to matrix shape, given a total number of entries.
Here's a non-loop solution zig_zag.m. It looks ugly but it works!:
function [M,index] = zig_zag(M)
[r,c] = size(M);
checker = rem(hankel(1:r,r-1+(1:c)),2);
[rEven,cEven] = find(checker);
[cOdd,rOdd] = find(~checker.'); %'#
rTotal = [rEven; rOdd];
cTotal = [cEven; cOdd];
[junk,sortIndex] = sort(rTotal+cTotal);
rSort = rTotal(sortIndex);
cSort = cTotal(sortIndex);
index = sub2ind([r c],rSort,cSort);
M = M(index);
end
And a test matrix:
>> M = [magic(4) zeros(4,1)];
M =
16 2 3 13 0
5 11 10 8 0
9 7 6 12 0
4 14 15 1 0
>> newM = zig_zag(M) %# Zig-zag sampled elements
newM =
16
2
5
9
11
3
13
10
7
4
14
6
8
0
0
12
15
1
0
0
Here's one way to do this. Basically, your array is a Hankel matrix plus vectors of 1:m, where m is the number of elements in each diagonal. Maybe someone else has a neat idea on how to create the diagonal arrays that have to be added to the flipped Hankel array without a loop.
I think this should be generalizable to a non-square array.
% for a 3x3 array
n=3;
numElementsPerDiagonal = [1:n,n-1:-1:1];
hadaRC = cumsum([0,numElementsPerDiagonal(1:end-1)]);
array2add = fliplr(hankel(hadaRC(1:n),hadaRC(end-n+1:n)));
% loop through the hankel array and add numbers counting either up or down
% if they are even or odd
for d = 1:(2*n-1)
if floor(d/2)==d/2
% even, count down
array2add = array2add + diag(1:numElementsPerDiagonal(d),d-n);
else
% odd, count up
array2add = array2add + diag(numElementsPerDiagonal(d):-1:1,d-n);
end
end
% now flip to get the result
indexMatrix = fliplr(array2add)
result =
1 2 6
3 5 7
4 8 9
Afterward, you just call reshape(image(indexMatrix),[],1) to get the vector of reordered elements.
EDIT
Ok, from your comment it looks like you need to use sort like Marc suggested.
indexMatrixT = indexMatrix'; % ' SO formatting
[dummy,sortedIdx] = sort(indexMatrixT(:));
sortedIdx =
1 2 4 7 5 3 6 8 9
Note that you'd need to transpose your input matrix first before you index, because Matlab counts first down, then right.
Assuming X is the input 2D matrix and that it is square or landscape-shaped, this seems to be pretty efficient -
[m,n] = size(X);
nlim = m*n;
n = n+mod(n-m,2);
mask = bsxfun(@le,[1:m]',[n:-1:1]);
start_vec = m:m-1:m*(m-1)+1;
a = bsxfun(@plus,start_vec',[0:n-1]*m);
offset_startcol = 2- mod(m+1,2);
[~,idx] = min(mask,[],1);
idx = idx - 1;
idx(idx==0) = m;
end_ind = a([0:n-1]*m + idx);
offsets = a(1,offset_startcol:2:end) + end_ind(offset_startcol:2:end);
a(:,offset_startcol:2:end) = bsxfun(@minus,offsets,a(:,offset_startcol:2:end));
out = a(mask);
out2 = m*n+1 - out(end:-1:1+m*(n-m+1));
result = X([out2 ; out(out<=nlim)]);
Quick runtime tests against Luis's approach -
Datasize: 500 x 2000
------------------------------------- With Proposed Approach
Elapsed time is 0.037145 seconds.
------------------------------------- With Luis Approach
Elapsed time is 0.045900 seconds.
Datasize: 5000 x 20000
------------------------------------- With Proposed Approach
Elapsed time is 3.947325 seconds.
------------------------------------- With Luis Approach
Elapsed time is 6.370463 seconds.
Let's assume for a moment that you have a 2-D matrix that's the same size as your image specifying the correct index. Call this array idx; then the matlab commands to reorder your image would be
[~,I] = sort (idx(:)); %sort the 1D indices of the image into ascending order according to idx
reorderedim = im(I);
I don't see an obvious solution to generate idx without using for loops or recursion, but I'll think some more.
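For what it's worth, one way to get the ordering without filling in idx element by element is to sort the (row, column) pairs by anti-diagonal, with the direction alternating from one diagonal to the next. Below is a small sketch of that idea in Python rather than MATLAB (0-based indices; the function name zigzag is just for illustration), checked against the 3x3 example from the question.

```python
def zigzag(matrix):
    """Return the elements of a 2D list in JPEG-style zig-zag order."""
    rows, cols = len(matrix), len(matrix[0])

    def key(rc):
        r, c = rc
        d = r + c  # index of the anti-diagonal this cell lies on
        # Odd diagonals are walked with the row increasing,
        # even diagonals with the row decreasing.
        return (d, r if d % 2 else -r)

    order = sorted(((r, c) for r in range(rows) for c in range(cols)), key=key)
    return [matrix[r][c] for r, c in order]


print(zigzag([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]))   # [1, 2, 4, 7, 5, 3, 6, 8, 9]
```

The same key works for non-square inputs, since it only depends on r + c and the parity of the diagonal.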

How to define matching axis notches from existing "step list"

I need a way to align tick marks on two separate axes, while being able to control the "step" value (the value between tick marks), where both axes start at 0 and end on a different maximum value.
Why this problem:
Flot, the JS charting package, has an option to align tick marks, but when I use it I cannot control the step value. I can control the step value directly, but then I lose the ability to align tick marks. I can, however, fall back to defining my own max and step values to get what I need (aligned tick marks while maintaining the desired step value), but I need some help with that, hence this question (read on for details).
Example
Let a be the maximum value on axis A and b be the maximum value on axis B.
In this example, let a = 30, and b = 82.
Let's say I want 6 tick marks (not counting the extra tick mark at end of axis). In reality I guessed at 6 after trying out a few.
Once I have a desired number of tick marks, I can do something like this:
30 / 6 = 5 (I just got the needed step value for axis A)
Now I need to figure out tick alignment for axis B
82 / 6 = 13.67 (not a good value, I prefer something more rounded)
move max value of B to 90, where 90 / 6 = 15 (good - I just got the needed step value for axis B)
End Result
Input:
a_max = 30, b_max = 82
(in reality a_max could be 28.5, 29.42, b_max could be 84, 85.345, etc)
Output:
a_adjusted_max = 30, b_adjusted_max = 90,
a_step = 5, b_step = 15
number of ticks = 6 (+1 if count the end)
Visual:
|---------|---------|---------|---------|---------|---------> A
0 5 10 15 20 25 30
|---------|---------|---------|---------|---------|---------> B
0 15 30 45 60 75 90
Summary of "Demands"
Need step value for each axis to be one of 1, 2, 5, 10, 15, 20, 25, 50, 100 (in the example it was 5 for A, 15 for B)
Need adjusted max value for each axis (in the example it was 30 for A, 90 for B)
Need number of ticks to match for both axes
(optional) Number of ticks is flexible but should be anywhere between 4 and 12 as a sweet spot
Adjusted max value is at or greater than the original max value, and is located at a "rounded number" (i.e. 90 is preferred over 82 as in my above example)
Problems (Question)
I need to remove most of the guessing and automate tick mark generation.
First of all, I need a better way to get the number of tick marks. Above I guessed at the number of tick marks because I wanted a good "step" value, which can be something like 1, 2, 5, 10, 15, 20, 25, 50, 100. Max values start at 4 and can go up to 100. In rarer cases they go up to 500. In most cases the max values stay between 30 and 90.
How can I do so?
Here's a procedure I came up with. I'm assuming you only want integers.
choose a number of ticks from 4 to 12
calculate the step size needed for the A and B axes using this number of ticks
find how much we would have to extend axis A and axis B using these step values; add these numbers together and remember the result as the score for this number of ticks
repeat from the start for the next tick value
we choose the number of ticks that gives the minimal score; if there is a tie we choose the smaller number of ticks
Here are some example results:
a=30, b=82 gives 4 ticks
0 10 20 30
0 28 56 84
a=8, b=5 gives 6 ticks
0 2 4 6 8 10
0 1 2 3 4 5
Here's the pseudocode:
a = range of A axis
b = range of B axis
tickList[] = {4,5,6,7,8,9,10,11,12}
// calculate the scores for each number of ticks
for i from 0 to length(tickList)-1
ticks = tickList[i]
// find the number of steps we would use for this number of ticks
Astep = ceiling(a/(ticks-1))
Bstep = ceiling(b/(ticks-1))
// how much we would need to extend the A axis
if (a%Astep != 0)
Aextend[i] = Astep - a%Astep
else
Aextend[i] = 0
end
// how much we would need to extend the B axis
if (b%Bstep != 0)
Bextend[i] = Bstep - b%Bstep
else
Bextend[i] = 0
end
// the score is the total extending we would need to do
score[i] = Aextend[i] + Bextend[i]
end
// find the number of ticks that minimizes the score
bestIdx = 0
bestScore = 1000;
for i from 0 to length(tickList)-1
if (score[i] < bestScore)
bestIdx = i
bestScore = score[i]
end
end
bestTick = tickList[bestIdx]
bestAstep = ceiling(a/(bestTick-1))
bestBstep = ceiling(b/(bestTick-1))
A axis goes from 0 by bestAstep to bestAstep*(bestTick-1)
B axis goes from 0 by bestBstep to bestBstep*(bestTick-1)
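A runnable Python sketch of the pseudocode above, assuming positive integer ranges. The names choose_ticks and axis_labels are hypothetical, and like the pseudocode it does not try to snap the steps to the 1, 2, 5, 10, ... list from the question.

```python
def choose_ticks(a, b, tick_counts=range(4, 13)):
    """Return (ticks, a_step, b_step) with the smallest extension score."""
    best = None
    for ticks in tick_counts:
        intervals = ticks - 1
        a_step = -(-a // intervals)   # integer ceiling of a / intervals
        b_step = -(-b // intervals)
        # (-x) % step is how far x is from the next multiple of step
        score = (-a) % a_step + (-b) % b_step
        # strict "<" keeps the smaller tick count when scores tie
        if best is None or score < best[0]:
            best = (score, ticks, a_step, b_step)
    _, ticks, a_step, b_step = best
    return ticks, a_step, b_step


def axis_labels(step, ticks):
    """Tick labels from 0 up to step * (ticks - 1)."""
    return [step * i for i in range(ticks)]


ticks, a_step, b_step = choose_ticks(30, 82)
print(ticks, axis_labels(a_step, ticks), axis_labels(b_step, ticks))
```

For a=30, b=82 this reproduces the 4-tick result listed earlier, and for a=8, b=5 the 6-tick one.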

How to extract a frequency from a signal

Is there a simple way to extract the main frequency/period from a signal (without resorting to the FFT)?
For my requirements, this can result in either a value for the main frequency (e.g. 3Hz) or a value representing the strength of a target frequency. For example, in the following 1-D signal the frequency is about 4Hz, assuming a sampling interval of 50ms (i.e. a sampling rate of 20Hz).
How can this be extracted from the data programmatically?
10
2
1
2
8
10
8
2
1
1
8
10
7
1
1
2
7
10
5
1
Use autocorrelation!
%using Matlab
%convert sample rate to hertz
fs = 1/(50/1000) % result = 20hz
vector = [10 2 1 2 8 10 8 2 1 1 8 10 7 1 1 2 7 10 5 1];
R = xcorr(vector);
[pks,locs]=findpeaks(R);
%result in hertz
fs./(diff(locs))
ans =
3.3333 4.0000 3.3333 3.3333 4.0000 3.3333
max(fs./diff(locs))
ans =
4
Apply autocorrelation to the signal. You can find a lot of source code on the web in different languages to do autocorrelation. Pseudocode:
TotalSamples = length(signal)
for z=0:TotalSamples-1
sum = 0;
% only pair up samples that are both inside the signal
for i=1:TotalSamples-z
sum = sum + (signal(i)*signal(i + z));
end
Xcorre(z+1) = sum;
end
Find all local peaks in the result of the autocorrelation
Compute the difference between consecutive local peaks, locs[k+1] - locs[k]
Divide your sampling rate by the difference between local peaks
The frequency is the maximum of these values
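The steps above can be sketched in plain Python like this. It is only a sketch: the function names are made up, it uses the one-sided autocorrelation (lags 0 and up) instead of the full xcorr output, and it counts lag 0 as the first peak.

```python
def autocorrelation(signal):
    """Unnormalized autocorrelation for lags 0 .. len(signal) - 1."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(n)]


def peak_lags(acf):
    """Lag 0 plus every strict interior local maximum."""
    lags = [0]  # the autocorrelation is always largest at lag 0
    for k in range(1, len(acf) - 1):
        if acf[k - 1] < acf[k] > acf[k + 1]:
            lags.append(k)
    return lags


def frequency_estimates(signal, fs):
    """One frequency estimate (in Hz) per gap between consecutive peaks."""
    lags = peak_lags(autocorrelation(signal))
    return [fs / (b - a) for a, b in zip(lags, lags[1:])]


signal = [10, 2, 1, 2, 8, 10, 8, 2, 1, 1, 8, 10, 7, 1, 1, 2, 7, 10, 5, 1]
estimates = frequency_estimates(signal, fs=20.0)  # 50 ms per sample
print(estimates)
print(max(estimates))   # 4.0
```

On the sample data the peaks fall at lags 0, 6, 11 and 17, which gives the same 3.33 / 4 / 3.33 Hz values as the MATLAB output.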

Select rolling rows without a loop

I have a question.
Suppose I have matrix
A =
1 2 3
4 5 6
7 8 9
10 11 12
I need to select each window of n consecutive (rolling) rows from A and lay out its elements as a single row of a new matrix C.
The loop that I use is:
n = 3; %for instance every 3 rows of A
B = [];
for i = 1:n
Btemp = transpose(A(i:i+size(A,1)-n,:));
B = [B;Btemp];
end
C=B';
and that produces matrix C which is:
C =
1 2 3 4 5 6 7 8 9
4 5 6 7 8 9 10 11 12
This is what I want to do, but can I do the same job without the loop?
It takes 4 minutes to calculate for an A matrix of 3280x35 size.
I think you can make it work very fast if you preallocate the output. One other trick is to take the transpose first, since MATLAB stores matrices column by column rather than row by row.
tic
A = reshape(1:3280*35,[3280 35])'; %# Generate an example A
[nRows, nCols] = size(A);
n = 3; %for instance every 3 rows of A
B = zeros(nRows-n+1,nCols*n);
At = A';
for i = 1:size(B,1)
B(i,:) = reshape(At(:,i:i+n-1), [1 nCols*n]);
end
toc
The elapsed time is
Elapsed time is 0.004059 seconds.
I would not use reshape in the loop, but transform A first to one single row (actually a column will also work, doesn't matter)
Ar = reshape(A',1,[]); % the ' is important here!
then the selecting of elements out of Ar is really simple:
[nrows, ncols] = size(A);
new_ncols = ncols*n;
B = zeros(nrows-(n-1),new_ncols);
for ii = 1:nrows-(n-1)
B(ii,:) = Ar(ncols*(ii-1)+(1:new_ncols)); % row ii of A starts at element ncols*(ii-1)+1 of Ar
end
Still, the preallocation of B gives you the largest improvement: more info at http://www.mathworks.nl/help/techdoc/matlab_prog/f8-784135.html
I don't have Matlab on me right now but I think you can do this without loops like this:
reshape(permute(cat(3,A(1:end-1,:),A(2:end,:)),[3,2,1]), [2, size(A,2)*(size(A,1) - 1)]);
and in fact won't this do what you want?:
A1 = A(1:end-1,:);
A2 = A(2:end,:);
answer = [A1(:) ; A2(:)]

Basic Velocity Algorithm?

Given the following dataset for a single article on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine if this article is trending upwards or downwards for the most recent 3 days?
Caveats, the articles will not know the total number of pageviews only their own totals. Ideally with a number between 0 and 1. Any pointers to what this class of algorithms is called?
thanks!
update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 80 / (22,666 + 2/3)
~ 0.0035294118
~ 0.35294118%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3)/(22,666 + 2/3)
= 1
= 100%
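Here is the same calculation as a short Python sketch. The dictionary layout and the name average_velocity are just for illustration; the numbers are the ones from the question.

```python
def average_velocity(daily_views):
    """Average pageviews per day over the given days."""
    return sum(daily_views) / len(daily_views)


articles = {
    "Article 1": [100, 80, 60],
    "Article 2": [20000, 25000, 23000],
}

velocities = {name: average_velocity(v) for name, v in articles.items()}
max_velocity = max(velocities.values())

# Scale every velocity by the largest one to land between 0 and 1.
relative = {name: v / max_velocity for name, v in velocities.items()}

for name in articles:
    print(name, velocities[name], relative[name])
```

Article 1 comes out at roughly 0.0035 and article 2 at exactly 1. Keep in mind this only ranks articles by size; it says nothing about whether an article is going up or down.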
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation:
Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note the data for "2/2/2010" was not used. An alternate method is to calculate three PV_accelerations (each using a date range that goes back only a single day) and average them. There is not enough data in your example to do this for three days, but here is how to do it for the last two days:
PV_acceleration("2/3/2010", "2/2/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/2/2010", "2/1/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/3/2010", "2/2/2010") = (-20 + -20) / 2
= -20 pageviews per day per day
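This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days. With evenly spaced days the average of the daily accelerations always equals the endpoint formula (for article 2 both give 1500 pageviews per day per day), so it is the individual daily values (+5000, then -2000 for article 2) that reveal a change in trend.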
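A small Python sketch of both acceleration calculations, assuming one reading per day with no gaps (the function names are hypothetical):

```python
def daily_accelerations(daily_views):
    """Change in pageviews/day from each day to the next."""
    return [b - a for a, b in zip(daily_views, daily_views[1:])]


def endpoint_acceleration(daily_views):
    """(last - first) / (number of days between them)."""
    return (daily_views[-1] - daily_views[0]) / (len(daily_views) - 1)


article_1 = [100, 80, 60]
article_2 = [20000, 25000, 23000]

print(daily_accelerations(article_1), endpoint_acceleration(article_1))
print(daily_accelerations(article_2), endpoint_acceleration(article_2))
```

Article 1 gives -20 each day. Article 2 gives +5000 followed by -2000, so looking at the per-day list (or only its most recent entry) tells you more about the current trend than a single averaged figure.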
Just a link to an article about the 'trending' algorithm reddit, SUs and HN use among others.
http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed
