Kolmogorov filter in MATLAB - performance

Hi,
I need to use this Kolmogorov filter in an application. You put in some measured data and the filter smooths it. I tried to do it with "nchoosek", but when I run it with an inter of 50 or more it takes far too long.
Does someone know how to do this in a faster way?
function [ filterd ] = kolmo(data, inter)
    temp  = 0;
    temp1 = 0;
    filterd(1:inter, 1) = NaN;               % leading samples cannot be filtered
    for t = inter+1:(length(data)-inter)
        for o = -inter:inter
            temp  = temp  + nchoosek(2*inter, inter+o)*data(t+o);   % weighted sum
            temp1 = temp1 + nchoosek(2*inter, inter+o);             % sum of weights
        end
        filterd(t, 1) = temp/temp1;
        temp  = 0;
        temp1 = 0;
    end
end
Thx
Andy

Here is a loop-less solution:
function y = MySoln(x, K)
%# Get the binomial coefficient terms
FacAll = factorial(0:1:2*K)';
BinCoefAll = FacAll(end) ./ (FacAll .* flipud(FacAll));
%# Get all numerator terms
NumerAll = conv(x, BinCoefAll, 'valid');
%# Rescale numerator terms into output
y = (1 / sum(BinCoefAll)) * NumerAll;
I've avoided using nchoosek and instead have calculated the binomial coefficients manually using the factorials. This ensures that each factorial calculation is only performed once. In contrast, the OP's solution potentially performs each factorial calculation hundreds of times.
Once the binomial coefficients are calculated, the solution from there is a straightforward application of conv, followed by scaling with the denominator term.
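As a quick sanity check (my own sketch, not part of the original answer), the conv-based output should match the loop version on the samples it actually filters:
data  = rand(50,1);
K     = 5;
yLoop = kolmo(data, K);               % loop-based filter from the question
yConv = MySoln(data, K);              % conv-based filter from above
max(abs(yLoop(K+1:end) - yConv))      % should be of the order of eps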
I did a quick speed test between the OP solution and my solution. The speed test uses a random vector x with 50 elements, and sets K to 5. Then I run 100 iterations over my solution versus the OP solution. Here are the results:
Elapsed time is 2.637597 seconds. %# OP Solution
Elapsed time is 0.010401 seconds. %# My Solution
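For reference, the timing harness looks roughly like this (a reconstruction; the exact script was not posted):
x = rand(50,1);  K = 5;
tic; for k = 1:100, y1 = kolmo(x, K);  end; toc   % OP's loop-based version
tic; for k = 1:100, y2 = MySoln(x, K); end; toc   % conv-based version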
I'm pretty happy with this. I doubt the method can be made much more efficient from this point (but would be happy to be proven wrong). :-)

Related

Is there a more efficient way to compute the average clustering coefficient?

Calculation of Average clustering coefficient of a graph
I am getting the correct result, but it takes a huge amount of time when the graph dimension increases. I need an alternative approach so that it executes faster. Is there any way to simplify the code?
%// A is adjacency matrix N X N,
%// d is degree,
N = 100;
d = 10;
rand('state',0)
A = zeros(N,N);
kv = d*(d-1)/2;            % number of possible edges among d neighbours
%% Creating A matrix %%%
for i = 1:(d*N/2)
    j = floor(N*rand)+1;
    k = floor(N*rand)+1;
    while (j==k) || (A(j,k)==1)
        j = floor(N*rand)+1;
        k = floor(N*rand)+1;
    end
    A(j,k) = 1;
    A(k,j) = 1;
end
%% Calculation of clustering Coeff %%
for i = 1:N
    J = find(A(i,:));              % neighbours of node i
    et = 0;
    for ii = 1:(size(J,2))-1
        for jj = ii+1:size(J,2)
            et = et + A(J(ii),J(jj));   % count edges between neighbours
        end
    end
    Cv(i) = et/kv;
end
Avg_clustering_coeff = sum(Cv)/N;
Output I got.
Avg_clustering_coeff = 0.1107
That Calculation of clustering Coeff part could be vectorized using nchoosek to remove the innermost two nested loops, like so -
CvOut = zeros(1,N);
for k = 1:N
    J = find(A(k,:));
    if numel(J) > 1
        idx = nchoosek(J,2);       % all neighbour pairs of node k
        CvOut(k) = sum(A(sub2ind([N N],idx(:,1),idx(:,2))));
    end
end
CvOut = CvOut/kv;
Hopefully, this would boost up the performance quite a bit!
To speed up your code you can read my comment, but you are not going to drastically reduce the computation time, because the time complexity doesn't change.
But if you don't need an exact result, you can use a probabilistic estimate:
probnum = cumsum(1:d);
probnum = mean(probnum(end-1:end)); % theoretical number of elements created by your second loop (for each row)
probfind = d*N/(N^2); % probability of finding a non-zero value
coeff = probnum*probfind/kv;
This probabilistic coefficient will approach Avg_clustering_coeff for large N.
So you can use the exact method for small N and this method for large N.
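A minimal sketch of that switch-over (my own illustration; the threshold of 1000 is an arbitrary choice, not part of the answer):
if N < 1000
    acc = sum(Cv)/N;                 % exact average from the loop above
else
    acc = probnum*probfind/kv;       % probabilistic estimate, valid for large N
end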

Finding the exp

Hi, I have a question on the result of the following function.
The input is a row vector x, and we output an approximation of exp(x) using the series sum_{n=0}^{50} x^n/n!.
The loop terminates either when n reaches 50 or when x^n/n! < 0.01.
function [summ] = ExpFunction(x)
% there is a loop to iterate
There are two versions:
1) Write an if to see if the value x^n/n! is >= 0.01; if it is, then add it to summ.
2) Add it to summ first, and then check whether x^n/n! is >= 0.01; if not, terminate the loop.
My question is: why do the two versions produce different results, and why does the second version appear to produce better results (i.e. closer to exp(x))?
Thank you
version 1:
function [result] = Exp(x)
result = 0;
n = 0;
while (n <= 50)
    a = (x.^n)/factorial(n);   % the factorial function is self-written and has been checked
    if (abs(a) >= 0.01)
        result = result + a;   % version 1: only add the term while it is still large enough
    else
        break;
    end
    n = n + 1;
end
end
The second version does result = result + a; before checking abs(a) >= 0.01.
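For reference, the second version looks roughly like this (a sketch of what I described, not the exact code I ran):
function [result] = Exp2(x)
result = 0;
n = 0;
while (n <= 50)
    a = (x.^n)/factorial(n);
    result = result + a;       % add the term first
    if (abs(a) < 0.01)
        break;                 % then decide whether to stop
    end
    n = n + 1;
end
end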
The question seems simple. For positive x the series is increasing (each addition results in a larger sum) and the limit value is approached from below. This means that every new term added to the sum gets you closer to the final value, i.e. the limit, so each addition gives a better approximation to the result.
It is also clear that the first method, not adding the last term, will give a slightly less accurate result than the second method, which does add it.
It is clear that the accuracy of the result is improved by adding more terms. The only cost is the extra computing time. Is your termination criterion (x^n/factorial(n) < 0.01) giving good enough values for all values of x? I would have expected you to use a formula more like (x^n/factorial(n) < g(x)) where g(x) is a formula involving x. I suggest that you go back to the text on series and determine whether a better g(x) is required for your accuracy requirements.

In MATLAB, in a dense matrix * sparse matrix product, how can I calculate only specific entries?

We have a matlab program in which we want to calculate the following expression:
sum( (M*x) .* x)
Here, M is a small dense matrix (say 100 by 100) and x is a sparse, fat matrix (say of size 100 by 1 000 000, with 5% non-zero entries). When I run the code, M*x is calculated first, which is a dense matrix; however, most of the computation that goes into it is a complete waste of time, as most of it will be zeroed out by the subsequent point-wise product with x.
In other words: what I want to do is calculate only those entries (i,j) of M*x for which x(i,j) is non-zero. In the end, I am then only interested in each column's sum.
It seems pretty simple to start with, but I could not figure out how to tell MATLAB to do it, or how to reshape the calculation so that MATLAB does it efficiently. I would really like to avoid having to code up a MEX file for this operation, and this operation is eating up most of the computation time.
Here is a code snippet for comparison:
m = 100;
n = 100000;
density = 0.05;
M = randn(m); M = M * M';
x = sprandn(m,n,density);
tic
for i = 1:100
xsi = sum((M * x).*x,1);
end
toc
Elapsed time is 13.570713 seconds.
To compute (M*x) .* x: find which entries of the final result can be nonzero (using find), compute manually only for those (sum(M(...).'.*x(...)) .* nonzeros(x).'), and from that build the final matrix (using sparse):
[ii jj] = find(x);
R = sparse(ii, jj, sum(M(ii,:).'.*x(:,jj)) .* nonzeros(x).');
Of course, to compute sum((M*x) .* x) you then simply use
full(sum(R))
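Since in the end only the column sums are needed, here is a small self-contained check (my own sketch; it uses accumarray to form the column sums directly rather than building R, and a smaller n just to keep it quick):
m = 100;  n = 1000;  density = 0.05;
M = randn(m);  M = M*M';
x = sprandn(m, n, density);
[ii, jj] = find(x);
vals = full(sum(M(ii,:).' .* x(:,jj))) .* nonzeros(x).';   % non-zero entries of (M*x).*x
xsi_sparse = accumarray(jj, vals.', [n 1]).';              % per-column sums
xsi_dense  = sum((M*x).*x, 1);                             % reference dense computation
max(abs(xsi_sparse - xsi_dense))                           % should be of the order of eps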

Efficient replacement for ppval

I have a loop in which I use ppval to evaluate a set of values from a piecewise polynomial spline. The interpolation is easily the most time-consuming part of the loop and I am looking for a way to improve the function's efficiency.
More specifically, I'm using a finite difference scheme to calculate transient temperature distributions in friction welds. To do this I need to recalculate the material properties (as a function of temperature and position) at each time step. The rate limiting factor is the interpolation of these values. I could use an alternate finite difference scheme (less restrictive in the time domain) but would rather stick with what I have if at all possible.
I've included a MWE below:
x=0:.1:10;
y=sin(x);
pp=spline(x,y);
tic
for n=1:10000
x_int=10*rand(1000,1);
y_int=ppval(pp,x_int);
end
toc
plot(x,y,x_int,y_int,'*') % plot for sanity of data
Elapsed time is 1.265442 seconds.
Edit - I should probably mention that I would be more than happy with a simple linear interpolation between values, but the interp1 function is even slower than ppval:
x=0:.1:10;
y=sin(x);
tic
for n=1:10000
x_int=10*rand(1000,1);
y_int=interp1(x,y,x_int,'linear');
end
toc
plot(x,y,x_int,y_int,'*') % plot for sanity of data
Elapsed time is 1.957256 seconds.
This is slow, because you're running into the single most annoying limitation of JIT. It's the cause of many many many oh so many questions in the MATLAB tag here on SO:
MATLAB's JIT accelerator cannot accelerate loops that call non-builtin functions.
Both ppval and interp1 are not built in (check with type ppval or edit interp1). Their implementation is not particularly slow, they just aren't fast when placed in a loop.
Now I have the impression it's getting better in more recent versions of MATLAB, but there are still quite massive differences between "inlined" and "non-inlined" loops. Why their JIT doesn't automate this task by simply recursing into non-builtins, I really have no idea.
Anyway, to fix this, you should copy-paste the essence of what happens in ppval into the loop body:
% Example data
x = 0:.1:10;
y = sin(x);
pp = spline(x,y);
% Your original version
tic
for n = 1:10000
x_int = 10*rand(1000,1);
y_int = ppval(pp, x_int);
end
toc
% "inlined" version
tic
br = pp.breaks.';
cf = pp.coefs;
for n = 1:10000
x_int = 10*rand(1000,1);
[~, inds] = histc(x_int, [-inf; br(2:end-1); +inf]);
x_shf = x_int - br(inds);
zero = ones(size(x_shf));
one = x_shf;
two = one .* x_shf;
three = two .* x_shf;
y_int = sum( [three two one zero] .* cf(inds,:), 2);
end
toc
Results on my crappy machine:
Elapsed time is 2.764317 seconds. % ppval
Elapsed time is 1.695324 seconds. % "inlined" version
The difference is actually smaller than I expected, but I think that's mostly due to the sum(). In my own use of ppval I usually only need to evaluate a single site per iteration, which you can do without histc (but with simple vectorized code) and with a matrix/vector multiplication x*y (BLAS) instead of sum(x.*y) (fast, but not BLAS-fast).
Oh well, running in roughly 60% of the original time is not bad :)
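For completeness, the single-site case I mean looks roughly like this (my own sketch, assuming a scalar query xq and the same pp structure):
xq  = 3.7;                                    % hypothetical single evaluation site
ind = find(pp.breaks <= xq, 1, 'last');       % locate the polynomial piece
ind = min(ind, numel(pp.breaks) - 1);         % clamp so xq == breaks(end) still works
dx  = xq - pp.breaks(ind);
yq  = pp.coefs(ind,:) * [dx^3; dx^2; dx; 1];  % matrix*vector instead of sum(x.*y)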
It is a bit surprising that interp1 is slower than ppval, but having a quick look at its source code, it seems that it has to check for many special cases and has to loop over all the points, since it cannot be sure that the step size is constant.
I didn't check the timing, but I guess you can speed up the linear interpolation by a lot if you can guarantee that the steps in x of your table are constant, and that the values to be interpolated are strictly within the given range, so that you do not have to do any checking. In that case, linear interpolation can be converted to a simple lookup problem like so:
%data to be interpolated, on grid with constant step
x = 0:0.5:10;
y = sin(x);
x_int = 0:0.1:9.9;
%make sure it is interpolation, not extrapolation
assert(all(x(1) <= x_int & x_int < x(end)));
% compute mapping, this can be precomputed for constant grid
slope = (length(x) - 1) / (x(end) - x(1));
offset = 1 - slope*x(1);
%map x_int to the interval 1..length(x)
xmapped = offset + slope * x_int;
ind = floor(xmapped);
frac = xmapped - ind;
%interpolate by taking weighted sum of neighbouring points
y_int = y(ind) .* (1 - frac) + y(ind+1) .* frac;
% make plot to check correctness
plot(x, y, 'o-', x_int, y_int, '.')

How to test if a function grows logarithmically?

I have a function in my model that computes a user's score:
def score
(MULTIPLER * Math::log10(bets.count * summary_value ** accuracy + 1)).floor
end
How can I test that it grows logarithmically?
The point of a test isn't to prove it always works (that is the domain of static typing/proofs), but to check that it is probably working. This is normally good enough. I'm guessing you are doing it for a game, and want to ensure the function doesn't "grow" too quickly.
A way we could do that is to try a number of values, and check whether they are following a general logarithmic pattern.
For example, consider a pure logarithmic function f(x) = log(x) (any base):
If f(x) = y, then f(x^n) = f(x) * n.
So, if f(x^n) == (f(x) * n), then the function is logarithmic.
Compare that to a linear function, e.g. f(x) = x * 2. Then f(x^n) = x^n * 2, which is x^(n - 1) times bigger than f(x) (a lot bigger).
You may have a more complex logarithmic function, e.g. f(x) = log(x + 7) + 3456. The pattern still holds, just less accurately. So what I did was:
1) Attempt to calculate the constant value, by using x = 1.
2) Find the difference f(x^n) - f(x) * n.
3) Find the absolute difference of f((x*100)^n) - f(100x) * n.
If (3)/(2) is less than 10, it is almost certainly not linear, and probably logarithmic. The 10 is just an arbitrary number. Most linear functions will be different by a factor of more than a billion. Even functions like sqrt(x) will have a bigger difference than 10.
My example code will just have the score method take a parameter, and test against that (to keep it simple + I don't have your supporting code).
require 'rspec'
require 'rspec/mocks/standalone'
def score(input)
Math.log2(input * 3 + 1000 * 3) * 3 + 100 + Math.sin(input)
end
describe "score" do
it "grows logarithmacally based on input" do
x = 50
n = 8
c = score(1)
result1 = (score(x ** n) - c) / ((score(x) -c) * n)
x *= 100
result2 = (score(x ** n) - c) / ((score(x) -c) * n)
(result2 / result1).abs.should be < 10
end
end
Though I have mostly forgotten my more advanced math, that shouldn't stop me from answering the question.
My suggestion is as follows:
describe "#score" do
it "grows logarithmically" do
round_1 = FactoryGirl.create_list(:bet, 10, value: 5).score
round_2 = FactoryGirl.create_list(:bet, 11, value: 5).score
# Then expect some math relation between round_1 and round_2,
# calculated by you manually.
end
end
If you need to treat this function like a black box, the only true solution is to get a bunch of values and see if their curve is well-approximated by a logarithmic curve, focusing on large n. If you could put reasonable bounds on it, like a log(n) < score(n) < b log(n) for some values a and b then you could just check that.
Generally speaking, the best way to see if a function grows is to plot some data on a graph. Just use some graph plotting gem and evaluate the result.
A logarithmic function will always look like this:
[plot of a typical logarithmic curve, rising steeply at first and then flattening out] (source: sosmath.com)
You can then adjust how fast it grows through your parameters, and replot the graph, until you found yourself happy with the result.
