Continuous Fourier transform on discrete data using Mathematica? - wolfram-mathematica

I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
but the result is useless and inaccurate:
Machine precision is insufficient to achieve the requested accuracy or
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).

Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
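The same two-stage idea (coarse peak from the integer bins, then a finer look around it) can be sketched outside Mathematica. The Python below is only an illustration under assumptions: `dtft_magnitude` and `estimate_cycles` are made-up names, the refinement uses a golden-section search rather than the zoomed Fourier call above, and it assumes the spectrum has a single dominant peak within the first `max_bin` bins.

```python
import cmath
import math


def dtft_magnitude(samples, cycles):
    """|sum_n x[n] * exp(-2*pi*i*cycles*n/N)| for a possibly fractional `cycles`."""
    n_samples = len(samples)
    step = -2j * math.pi * cycles / n_samples
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(samples)))


def estimate_cycles(data, max_bin=20, tol=1e-6):
    """Estimate the (fractional) number of periods contained in `data`."""
    mean = sum(data) / len(data)
    centred = [x - mean for x in data]  # drop the constant term

    # Coarse step: strongest integer bin, as Abs[Fourier[data]] would show.
    coarse = max(range(1, max_bin + 1), key=lambda k: dtft_magnitude(centred, k))

    # Fine step: golden-section search for the maximum in [coarse-1, coarse+1].
    ratio = (math.sqrt(5) - 1) / 2
    lo, hi = coarse - 1.0, coarse + 1.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = dtft_magnitude(centred, x1), dtft_magnitude(centred, x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = dtft_magnitude(centred, x2)
        else:  # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = dtft_magnitude(centred, x1)
    return (lo + hi) / 2
```

Each evaluation costs one pass over the data, so the search needs a few dozen passes instead of a dense plot. The estimate is still slightly biased for a real sinusoid with only a handful of periods, which is probably why the result above is 6.37 rather than 6.387.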

Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
Plus @@ (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d=0 may be the best estimate you can get from the available data.
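A minimal sketch of that search in Python, assuming noise-free data so that "first local maximum" is well defined; `first_peak_lag` is a hypothetical helper, and the 50-sample sine is just a test signal, not the data from the question.

```python
import math


def autocorrelate(data, lag):
    """Mean product of the series with itself shifted by `lag` samples."""
    n = len(data) - lag
    return sum(data[i] * data[i + lag] for i in range(n)) / n


def first_peak_lag(data, max_lag):
    """Return the first lag > 0 at which the autocorrelation has a local maximum."""
    mean = sum(data) / len(data)
    centred = [x - mean for x in data]
    ac = [autocorrelate(centred, d) for d in range(max_lag + 2)]
    for d in range(1, max_lag + 1):
        if ac[d] > ac[d - 1] and ac[d] >= ac[d + 1]:
            return d
    return None  # no peak found within max_lag


# Demo on a pure sine with a period of 50 samples (hypothetical test signal).
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
period = first_peak_lag(signal, 200)
```

With noisy data the plain local-maximum test would fire too early, so some smoothing or a minimum-height threshold would be needed; that part is left out here.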

(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
(* Result (using //InputForm):
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}} *)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.
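The starting values in the FindMinimum call are cut off above, so the following is only one plausible way to derive them from the data, not necessarily what was used: the mean for a, half the peak-to-peak range for b, and the Fourier peak position converted to radians per sample for c. The function name is made up for illustration.

```python
import math


def sine_fit_starting_values(data, peak_bin):
    """Rough starting guesses for a + b*sin(c*n - d); `peak_bin` is the number
    of whole periods suggested by the largest Fourier coefficient (6 above)."""
    a0 = sum(data) / len(data)             # offset: mean of the data
    b0 = (max(data) - min(data)) / 2       # amplitude: half the peak-to-peak range
    c0 = 2 * math.pi * peak_bin / len(data)  # angular frequency per sample
    return a0, b0, c0


# Same synthetic data as in the question.
data = [753 + 919 * math.sin(x / 623 - 125) for x in range(1, 25001)]
a0, b0, c0 = sine_fit_starting_values(data, 6)
```

The guess for c is off by roughly 6% here (6 periods instead of about 6.39), which was evidently still close enough for a local minimizer to converge.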


Analytical way of speeding up exp(A*x) in MATLAB

I need to calculate f(x)=exp(A*x) repeatedly for a tiny, variable column vector x and a huge, constant matrix A (many rows, few columns). In other words, the x are few, but the A*x are many. My problem dimensions are such that A*x takes about as much runtime as the exp() part.
Apart from Taylor expansion and pre-calculating a range of values exp(y) (assuming known the range y of values of A*x), which I haven't managed to speed up considerably (while maintaining accuracy) with respect to what MATLAB is doing on its own, I am thinking about analytically restating the problem in order to be able to precalculate some values.
For example, I find that exp(A*x)_i = exp(\sum_j A_ij x_j) = \prod_j exp(A_ij x_j) = \prod_j exp(A_ij)^x_j
This would allow me to precalculate exp(A) once, but the required exponentiation in the loop is as costly as the original exp() function call, and the multiplications (\prod) have to be carried out in addition.
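To make the identity concrete, here is a small numeric check in Python. The 3x2 matrix is a hypothetical stand-in for the real A, and the helper names are made up; the point is only that both forms agree while the factored form still needs one power per element.

```python
import math


def exp_of_product(row, x):
    """exp(sum_j A_ij * x_j), the direct element-wise form."""
    return math.exp(sum(a * xj for a, xj in zip(row, x)))


def product_of_powers(exp_row, x):
    """prod_j exp(A_ij)**x_j, using a row of the precomputed exp(A)."""
    result = 1.0
    for e, xj in zip(exp_row, x):
        result *= e ** xj  # one power per entry, as costly as exp itself
    return result


# Hypothetical 3x2 stand-in for the real 26873856x81 matrix.
A = [[0.5, -1.0], [0.0, 2.0], [1.5, 0.25]]
x = [0.3, -0.7]
exp_A = [[math.exp(a) for a in row] for row in A]  # precomputed once

direct = [exp_of_product(row, x) for row in A]
factored = [product_of_powers(row, x) for row in exp_A]
```

The factored form replaces one exp per row with one power per stored entry, so it only pays off if the powers can themselves be tabulated or avoided.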
Is there any other idea that I could follow, or solutions within MATLAB that I may have missed?
Edit: some more details
A is 26873856 by 81 in size (yes, it's that huge), so x is 81 by 1.
nnz(A) / numel(A) is 0.0012, nnz(A*x) / numel(A*x) is 0.0075. I already use a sparse matrix to represent A; however, exp() of a sparse matrix is no longer sparse. So in fact, I store x non-sparse and I calculate exp(full(A*x)), which turned out to be as fast/slow as full(exp(A*x)). (I think A*x is non-sparse anyway, since x is non-sparse.) exp(full(A*sparse(x))) is a way to have a sparse A*x, but is slower. Even slower variants are exp(A*sparse(x)) (with doubled memory impact for a non-sparse matrix of type sparse) and full(exp(A*sparse(x))) (which again yields a non-sparse result).
sx = sparse(x);
tic, for i = 1 : 10, exp(full(A*x)); end, toc
tic, for i = 1 : 10, full(exp(A*x)); end, toc
tic, for i = 1 : 10, exp(full(A*sx)); end, toc
tic, for i = 1 : 10, exp(A*sx); end, toc
tic, for i = 1 : 10, full(exp(A*sx)); end, toc
Elapsed time is 1.485935 seconds.
Elapsed time is 1.511304 seconds.
Elapsed time is 2.060104 seconds.
Elapsed time is 3.194711 seconds.
Elapsed time is 4.534749 seconds.
Yes, I do calculate the element-wise exp; I have updated the equation above to reflect that.
One more edit: I tried to be smart, with little success:
tic, for i = 1 : 10, B = exp(A*x); end, toc
tic, for i = 1 : 10, C = 1 + full(spfun(@(x) exp(x) - 1, A * sx)); end, toc
tic, for i = 1 : 10, D = 1 + full(spfun(@(x) exp(x) - 1, A * x)); end, toc
tic, for i = 1 : 10, E = 1 + full(spfun(@(x) exp(x) - 1, sparse(A * x))); end, toc
tic, for i = 1 : 10, F = 1 + spfun(@(x) exp(x) - 1, A * sx); end, toc
tic, for i = 1 : 10, G = 1 + spfun(@(x) exp(x) - 1, A * x); end, toc
tic, for i = 1 : 10, H = 1 + spfun(@(x) exp(x) - 1, sparse(A * x)); end, toc
Elapsed time is 1.490776 seconds.
Elapsed time is 2.031305 seconds.
Elapsed time is 2.743365 seconds.
Elapsed time is 2.818630 seconds.
Elapsed time is 2.176082 seconds.
Elapsed time is 2.779800 seconds.
Elapsed time is 2.900107 seconds.
Computers don't really do exponents. You would think they do, but what they do is high-accuracy polynomial approximations.
The last reference looked quite nice. Perhaps it should have been first.
Since you are working on images, you likely have a discrete number of intensity levels (typically 256 for 8-bit data). This can allow reduced sampling, or lookups, depending on the nature of "A". One way to check this is to do something like the following for a sufficiently representative group of values of "x":
If you were able to pre-segment your images into "more interesting" and "not as interesting" - like if you were looking at an x-ray being able to trim out all the "outside the human body" locations and clamp them to zero to pre-sparsify your data, that could reduce your number of unique values. You might consider the previous for each unique "mode" inside the data.
My approaches would include:
look at alternate formulations of exp(x) that are lower accuracy but higher speed
consider table lookups if you have few enough levels of "x"
consider a combination of interpolation and table lookups if you have "slightly too many" levels to do a table lookup
consider a single lookup (or alternate formulation) based on segmented mode. If you know it is a bone and are looking for a vein, then maybe it should get less high-cost data processing applied.
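As a rough illustration of the "interpolation plus table lookup" item, here is a Python sketch of a piecewise-linear exp table. The class name, the range [-2, 2] and the 1024 steps are all assumptions chosen for the demo; in pure Python this is slower than math.exp, so it shows the accuracy trade-off rather than a speedup.

```python
import math


class ExpTable:
    """Piecewise-linear approximation of exp on [lo, hi]."""

    def __init__(self, lo, hi, steps):
        self.lo = lo
        self.hi = hi
        self.inv_h = steps / (hi - lo)  # reciprocal of the grid spacing
        self.values = [math.exp(lo + i * (hi - lo) / steps) for i in range(steps + 1)]

    def __call__(self, y):
        if not self.lo <= y <= self.hi:
            return math.exp(y)  # fall back outside the tabulated range
        pos = (y - self.lo) * self.inv_h
        i = min(int(pos), len(self.values) - 2)  # left grid point
        frac = pos - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])


# Demo: worst relative error on a set of probe points inside the range.
table = ExpTable(-2.0, 2.0, 1024)
probes = [-2.0 + 0.0137 * k for k in range(292)]
worst = max(abs(table(y) - math.exp(y)) / math.exp(y) for y in probes)
```

For linear interpolation the relative error scales with the square of the grid spacing (about h^2/8), so halving the spacing buys roughly a factor of four in accuracy at the cost of a table twice as large.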
Now I have to ask myself why you would be living in so many iterations of exp(A*x)*x, and I think you might be switching back and forth between frequency/wavenumber domain and time/space domain. You also might be dealing with probabilities using exp(x) as a basis, and doing some Bayesian fun. I don't know that exp(x) is a good conjugate prior, so I'm going to go with the Fourier material.
Other options:
- consider use of fft, fft2, or fftn given your matrices - they are fast and might do part of what you are looking for.
I am sure there is a Fourier-domain variation on the following:
You might be able to mix the lookup with a compute using the woodbury matrix. I would have to think about that some to be sure though. (link) At one point I knew that everything that mattered (CFD, FEA, FFT) were all about the matrix inversion, but I have since forgotten the particular details.
Now, if you are living in MATLAB then you might consider using "coder", which converts MATLAB code to C code. No matter how much fun an interpreter may be, a good C compiler can be a lot faster. The mnemonic (hopefully not too ambitious) that I use is shown here: link starting around 13:49. It is really simple, but it shows the difference between a canonical interpreted language (Python) and a compiled version of the same (Cython/C).
I'm sure that if I had some more specifics, and was requested to, then I could engage more aggressively in a more specifically relevant answer.
You might not have a good way to do it on conventional hardware, but you might consider something like a GPGPU. CUDA and its peers have massively parallel operations that allow substantial speedup for the cost of a few video cards. You can have thousands of "cores" (overglorified pipelines) doing the work of a few ALUs, and if the job is properly parallelizable (as this looks like) then it can get done a LOT faster.
I was thinking about Eureqa. One option that I would consider if I had some "big iron" for development but not production would be to use their Eureqa product to come up with a fast enough, accurate enough approximation.
If you performed a 'quick' singular value decomposition of your "A" matrix, you would find that its behavior is governed by at most 81 singular vectors. I would look at the singular values and see whether only a few of those 81 vectors provide the majority of the information. If that is the case, then you can clamp the others to zero and construct a simple transformation.
Now, if it were me, I would want to get "A" out of the exponent. I'm wondering if you can look at the 81x81 matrix of right singular vectors and "x" and think a little about linear algebra, and what space you are projecting your vectors into. Is there any way that you can make a function that looks like the following:
f(x) = B2 * exp( B1 * x )
such that the
B1 * x
is much smaller rank than your current

matlab: optimum amount of points for linear fit

I want to make a linear fit to a few data points, as shown in the image. Since I know the intercept (in this case say 0.05), I want to fit only the points which are in the linear region with this particular intercept. In this case it will be, let's say, points 5:22 (but not 22:30).
I'm looking for a simple algorithm to determine this optimal number of points, based on... hmm, that's the question... R^2? Any ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to wrap that in a clear and simple function. For fits with a fixed intercept I'm using polyfit0. Thanks for any suggestions!
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem to find a general solution of.
One approach would be to compute all the slopes/intercepts between all consecutive pairs of points, and then do cluster analysis on the intercepts:
slopes = diff(y)./diff(x);
intersepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intersepts.', 3); % kmeans expects one observation per row
% Cluster labels are arbitrary; 2 stands for the cluster whose centroid is
% closest to the known intercept.
x([idx; 3] == 2) % the points with the intercepts closest to the linear one.
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust version of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, x,y,intercept)
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouldn't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relatively small number of overlapping segments, and for each segment calculate the linear fit, or rather the first-order coefficient (remember you know the intercept, which will be the same for all segments).
Then, for each coefficient calculate the MSE between this hypothetical line and entire dataset, choosing the coefficient which yields the smallest MSE.
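A minimal sketch of that segment idea in Python, under assumptions: the window `width` and `stride` are arbitrary choices, both function names are made up, and the fixed-intercept slope is the closed form sum(x*(y - b)) / sum(x^2) that a least-squares fit through a known intercept b reduces to.

```python
def fixed_intercept_slope(xs, ys, intercept):
    """Least-squares slope of y = intercept + slope*x with the intercept held fixed."""
    num = sum(x * (y - intercept) for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den


def best_segment_slope(xs, ys, intercept, width, stride):
    """Fit each overlapping window, then score every slope against the whole data set.

    Returns (mse, slope, start_index) for the window with the smallest MSE.
    """
    best = None
    for start in range(0, len(xs) - width + 1, stride):
        seg_x = xs[start:start + width]
        seg_y = ys[start:start + width]
        slope = fixed_intercept_slope(seg_x, seg_y, intercept)
        mse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if best is None or mse < best[0]:
            best = (mse, slope, start)
    return best
```

Usage would look like `best_segment_slope(x, y, 0.043, 10, 5)` on the sample data. Note that scoring against the entire data set favors the slope that compromises between the linear and the curved regions, so scoring against only the central points may suit the original question better.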

NMinimize with function containing random variables

I was wondering whether it is possible to use NMinimize from Mathematica with an objective function that contains random variables. E.g., I have a function with parameters which follow a distribution (normal and truncated normal). I want to fit its histogram to data that I have, and I constructed an objective function which I now need to minimize (so the objective function depends on the mus and sigmas of the parameters, which need to be determined). If I run my code, there is an error message: it claims the parameter for the NormalDistribution needs to be positive (if I plug in numbers for the mus and sigmas of my objective function by hand, I don't get an error message).
So, I am wondering if it is not possible for NMinimize to handle a non-analytic function.
Here, I give you an example code (please note that the original function is more complicated)
listS and listT are both lists of event times. I want to fit the curve of my statistical model for the times (here, a very simple one, it consists of a truncated normal distribution) to the data I have.
For this I compare the survival curves and need to minimize the sum of the least squares.
My problem is that the function NMinimize doesn't seem to work. (Please note, that the original objective function consists of a more complicated function with parameters that are random variables)
(* Both lists are supposed to be the list of times *)
SurvivalS[listeS_, x_] := Module[{res, survivald},
survivald = SurvivalDistribution[listeS];
res = SurvivalFunction[survivald, x];
res];
Residuum[listeT_, listeS_] :=
Table[(SurvivalS[listeT, listeT[[i]]] - SurvivalS[listeS, listeT[[i]]]), {i,
1, dataN}];
LeastSquare[listeT_, listeS_] :=
Total[Function[x, x^2] /@
Residuum[listeT, listeS]];(* objective function, here it is the sum of squared residuals *)
objectiveF[mu_, sigma_] :=
Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
20 (1 + (sigma + mu)^2)];
pool = 100; (* No. points from MonteCarlo *)
listeS[mu_, sigma_] := RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[mu, sigma]],pool];(* simulated data *)
listeT = Sort[RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[.5, .9]],60]]; (* list of "measured" data *)
dataN = Length[listeT];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
The error message is: "RandomVariate::realprm: Parameter mu at position 1 in NormalDistribution[mu,0.9] is expected to be real. >>"

Standard deviation of one element

When I try to execute
StandardDeviation[{1}]
I get an error
StandardDeviation::shlen: "The argument {1} should have at least two elements"
But std of one element is 0, isn't it?
The standard deviation is commonly defined as the square root of the unbiased estimator of the variance:
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}
You can easily see that for a single sample, N=1 and you get 0/0, which is undefined. Hence your standard deviation is undefined for a single sample in Mathematica.
Now depending on your conventions, you might want to define a standard deviation for a single sample (either return Null or some value or 0). Here's an example that shows you how to define it for a single sample.
std[x_List] := Which[(Length[x] == 1), 0, True, StandardDeviation[x]]
Out[1]= 0
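For comparison, Python's standard library draws the same line between the sample and the population formula, and the same kind of wrapper works there. `std_or_zero` is a made-up name; returning 0 for a single element is a convention you choose, not something the library does for you.

```python
import statistics


def std_or_zero(values):
    """Sample standard deviation, with the single-element case defined as 0."""
    if len(values) == 1:
        return 0.0
    return statistics.stdev(values)  # divides by N - 1


# statistics.stdev([1]) raises StatisticsError (sample formula, N - 1 = 0),
# while the population formula is well defined for one element:
population_sd = statistics.pstdev([1])  # divides by N
```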
The standard deviation of a constant is zero.
The estimated standard deviation of one sample is undefined.
If you want some formality:
p[x_] := DiracDelta[x - mu];
expValue = Integrate[x p[x] , {x, -Infinity, Infinity}]
stdDev = Sqrt[Integrate[(x - expValue)^2 p[x] , {x, -Infinity, Infinity}]]
-> ConditionalExpression[mu, mu \[Element] Reals]
-> ConditionalExpression[0, mu \[Element] Reals]
Or better, using Mathematica ProbabilityDistribution[]:
dist = ProbabilityDistribution[DiracDelta[x - mu], {x, -Infinity, Infinity}];
{Mean[dist], StandardDeviation[dist]}
-> { mu, ConditionalExpression[0, mu \[Element] Reals]}
If your population size is one element, then yes the standard deviation of your population will be 0. However typically standard deviations are used on samples, and not on the entire population, so instead of dividing by the number of elements in the sample, you divide by the number of elements minus one. This is due to the error inherent in performing calculations on a sample, rather than a population.
Performing a calculation of the standard deviation over a population of size 1 makes absolutely no sense, which I think is where the confusion is coming from. If you know that your population contains only one element then finding out the standard deviation of that element is pointless, so generally you will see the standard deviation of a single element written as undefined.
Standard deviation - which is a measure for the deviation of the actual value from the average of a given set - for a list of one element doesn't make any sense (you can set it to 0 if you want).

Using Fourier Analysis to fit function to data

I have 24 values for Y and the corresponding 24 values for t. The Y values are measured experimentally,
while t has values : t=[1,2,3........24]
I want to find the relationship between Y and t as an equation using Fourier analysis,
what I have tried and done is:
I wrote the following MATLAB code:
ts=1; % step
t=1:ts:24; % the period is 24
f=[-length(t)/2:length(t)/2-1]/(length(t)*ts); % computing frequency interval
figure;plot(f,M,'LineWidth',1.5);grid % plot of harmonic components
plot(t,Y,'LineWidth',1.5);grid % plot of original data Y
figure;bar(f,M);grid % plot of harmonic components as bar shape
the results of the bar figure is:
Now, I want to find the equation for these harmonic components which represent the data. After that I want to draw the original data Y with the data found from the fitting function and the two curves should be close to each other.
Should I use cos or sin or -sin or -cos?
In another way, what is the rule to represent these harmonics as a function: Y = f (t) ?
An example done with your data and Mathematica using Discrete sine transform. Hope you can extrapolate to Matlab:
n = 24;
xg = N[Range[n]]/n
fg = l (*your list *)
fp = ListPlot[Transpose[{xg, fg}], PlotRange -> All] (*points plot*)
coef = FourierDST[fg, 1]/Sqrt[n/2]; (*Fourier transform*)
Show[fp, Plot[Sum[coef[[r]]*Sin[Pi r x], {r, n - 1}], {x, -1, 1},
PlotRange -> All]]
The coefficients are:
{16.6411, -4.00062, 5.31557, -1.38863, 2.89762, 0.898562,
1.54402, -0.116046, 1.54847, 0.136079, 1.16729, 0.156489,
0.787476, -0.0879736, 0.747845, 0.00903859, 0.515012, 0.021791,
0.35001, 0.0159676, 0.215619, 0.0122281, 0.0943376, -0.00150218}
More detailed view:
However, as an even function seems to be better, I also made a discrete Fourier cosine transform of type 3, which works much better:
In this case the coefficients are:
{14.7384, -8.93197, 4.56404, -2.85262, 2.42847, -0.249488,
0.565181,-0.848594, 0.958699, -0.468337, 0.660136, -0.317903,
0.390689,-0.457621, 0.427875, -0.260669, 0.278931, -0.166846,
0.18547, -0.102438, 0.111731, -0.0425396, 0.0484102, -0.00559378}
And the plotting of coeffs and function are obtained by:
coef = FourierDCT[fg, 3]/Sqrt[n];(*Fourier transform*)
f[x_]:= Sum[coef[[r]]*Cos[Pi (r - 1/2) x], {r, n - 1}]
You'll have to experiment a little ...
Depends on what MATLAB gave you back. It's either sine and cosine or a complex exponential.
Most FFT algorithms that I know of usually demand that the number of data points be an integer power of two. The closest one for your data set is 32, so you should pad it out with zeros.
Thanks for your help.
I found the solution I was aiming to get but for some reason everything is shifted by 1
Here is the code:
ts = 1; % time step
t = [1:ts:24];
fs = 1/ts; % frequency step
f=[-length(t)/2:length(t)/2-1]/(length(t)*ts); % frequency formula
xlabel('time (hours)');ylabel('Power (MW)')
title('Power Profile for 2nd Feb, 1998')
% fourier transform analysis
P1 = fft(P)/length(t);
amp=abs(P2); % amplitude
phi = angle(P2); % phase angle
xlabel('frequency (Hz)');ylabel('amplitude (MW)')
xlabel('frequency (Hz)');ylabel('phase angle (rad)')
% Pmodel=Ai*COS(i*w*t+phii)
% where, w=2*pi/24 and i is the harmonic order
% Here, up to the third harmonic is enough
% and using Parseval's Theorem, the model is:
% PP=12.6635+2*(1.9806*cos(w*tt+1.807)+0.86388*cos(2*w*tt+2.0769)+0.39683*cos(3*w*tt- 1.8132));
plot(t,P,'LineWidth',1.5);grid on
hold on;
legend('original','model');xlabel('time (hours )');ylabel('Power (MW)')
% But here is a problem, the modeled signal is shifted
% by 1 comparing to the original one
% I redraw the two figures together by plotting Pmodeled vs t+1
% Actually, I don't know why it is shifted, but they are
% exactly identical with shifting by 1
plot(t,P,'LineWidth',1.5);grid on
hold on;
legend('original','model');xlabel('time (hours )');ylabel('Power (MW)')
Why has this shifting problem happened, and how can I solve it?
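The problem is with line 2,
"t = [1:ts:24];"
It should be "t = 0:ts:23;", because fft treats the first sample as time 0, so the phase angles it returns are relative to t = 0.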
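The effect is easy to reproduce outside MATLAB. The Python sketch below builds the cosine model Pmodel = c0 + sum of 2*|c_k|*cos(k*w*t + phi_k) from a plain DFT and evaluates it once with t starting at 0 and once with t starting at 1. The 24 power values are hypothetical (the original data is not shown), and the function names are made up.

```python
import cmath
import math


def dft_coefficients(samples):
    """Normalised DFT: c[k] = (1/N) * sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(samples)) / n_samples
        for k in range(n_samples)
    ]


def harmonic_model(coeffs, t, harmonics):
    """c0 + sum_k 2*|c_k|*cos(k*w*t + arg c_k), w = 2*pi/N, t measured from 0."""
    n_samples = len(coeffs)
    w = 2 * math.pi / n_samples
    value = coeffs[0].real
    for k in range(1, harmonics + 1):
        # The Nyquist term (k = N/2, even N) appears only once in the spectrum.
        weight = 1.0 if 2 * k == n_samples else 2.0
        value += weight * abs(coeffs[k]) * math.cos(k * w * t + cmath.phase(coeffs[k]))
    return value


# Hypothetical hourly power profile (24 values), not the data from the question.
power = [10.2, 9.8, 9.5, 9.4, 9.6, 10.5, 12.0, 13.5, 14.2, 14.6, 14.8, 14.9,
         14.7, 14.5, 14.4, 14.6, 15.2, 16.0, 15.8, 15.0, 14.0, 12.8, 11.6, 10.8]
coeffs = dft_coefficients(power)

model_from_zero = [harmonic_model(coeffs, t, 12) for t in range(24)]    # t = 0..23
model_from_one = [harmonic_model(coeffs, t, 12) for t in range(1, 25)]  # t = 1..24
```

With all 12 harmonics, `model_from_zero` reproduces the samples, while `model_from_one` is the same curve moved by one sample, which matches the shift described above. Using only the first three harmonics, as in the question, gives a smoothed version of the same curve.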
