Real-time peak detection in noisy sinusoidal time-series - algorithm

I have been attempting to detect peaks in sinusoidal time-series data in real time, but so far without success. I cannot find a real-time algorithm that detects peaks in sinusoidal signals with a reasonable level of accuracy: I either get no peaks detected, or a zillion points along the sine wave detected as peaks.
What is a good real-time algorithm for input signals that resemble a sine wave, and may contain some random noise?
As a simple test case, consider a stationary sine wave of constant frequency and amplitude. (The exact frequency and amplitude don't matter; I have arbitrarily chosen a frequency of 60 Hz and an amplitude of ±1 unit, at a sampling rate of 8 kS/s.) The following MATLAB code will generate such a sinusoidal signal:
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
x = sin(2*pi*60*t);
Using the algorithm developed and published by Jean-Paul, I either get no peaks detected (left) or a zillion "peaks" detected (right):
I've tried just about every combination of values for the algorithm's three parameters that I could think of, following the "rules of thumb" that Jean-Paul gives, but I have so far been unable to get my expected result.
I found an alternative algorithm, developed and published by Eli Billauer, that does give me the results that I want—e.g.:
Even though Eli Billauer's algorithm is much simpler and does tend to reliably produce the results that I want, it is not suitable for real-time applications.
As another example of a signal that I'd like to apply such an algorithm to, consider the test case given by Eli Billauer for his own algorithm:
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
This is a more unusual (less uniform/regular) signal, with a varying frequency and amplitude, but still generally sinusoidal. The peaks are plainly obvious to the eye when plotted, but hard to identify with an algorithm.
What is a good real-time algorithm to correctly identify the peaks in a sinusoidal input signal? I am not really an expert when it comes to signal processing, so it would be helpful to get some rules of thumb that consider sinusoidal inputs. Or perhaps I need to modify e.g. Jean-Paul's algorithm itself to work properly on sinusoidal signals. If that's the case, what modifications would be required, and how would I go about making them?

Case 1: sinusoid without noise
If your sinusoid does not contain any noise, you can use a very classic signal processing technique: taking the first derivative and detecting when it is equal to zero.
For example:
function signal = derivesignal( d )
% Identify signal
signal = zeros(size(d));
for i = 2:length(d)
    if d(i-1) > 0 && d(i) <= 0
        signal(i) = +1; % derivative crossed zero going down: peak
    elseif d(i-1) < 0 && d(i) >= 0
        signal(i) = -1; % derivative crossed zero going up: trough
    end
end
end
Using your example data:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Approximate first derivative (delta y / delta x)
d = [0; diff(y)];
% Identify signal
signal = derivesignal(d);
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y);
subplot(4,1,2); hold on;
title('First derivative');
area(d);
ylim([-0.05, 0.05]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y); scatter(t(markers),y(markers),30,'or','MarkerFaceColor','red');
This yields:
This method will work extremely well for any type of sinusoid, with the only requirement that the input signal contains no noise.
Case 2: sinusoid with noise
As soon as your input signal contains noise, the derivative method will fail. For example:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
This now generates the following result, because first differences amplify noise:
Now there are many ways to deal with noise, and the most standard way is to apply a moving average filter. One disadvantage of moving averages is that they are slow to adapt to new information, such that signals may be identified after they have occurred (moving averages have a lag).
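As a minimal sketch of that idea (reusing the derivesignal function from above; the 50-sample window is an arbitrary choice you would tune to your noise level):
window = 50;
ySmooth = filter(ones(window,1)/window, 1, y); % causal moving average (lags by ~window/2)
d = [0; diff(ySmooth)];            % first differences of the smoothed signal
signal = derivesignal(d);          % same zero-crossing detector as above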
Another very typical approach is to use Fourier analysis to identify all the frequencies in your input data, disregard all low-amplitude and high-frequency sinusoids, and use the remaining sinusoid as a filter. The remaining sinusoid will be (largely) cleansed of the noise, and you can then use first-differencing again to determine the peaks and troughs (or, for a single sine wave, you know the peaks and troughs occur at 1/4 and 3/4 of the period). I suggest you pick up any signal processing theory book to learn more about this technique. MATLAB also has some educational material about this.
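A rough sketch of that Fourier-filtering idea, assuming y from the example above (keeping only components above 10% of the dominant amplitude is an arbitrary threshold):
Y = fft(y);
mask = abs(Y) >= 0.1*max(abs(Y(2:end))); % keep strong components only (threshold is arbitrary)
mask(1) = true;                          % always keep the DC component
yClean = real(ifft(Y.*mask));            % (largely) noise-free reconstruction
signal = derivesignal([0; diff(yClean)]); % first-differencing works again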
If you want to use this algorithm in hardware, I would suggest you also take a look at a WFLC (Weighted Fourier Linear Combiner) with e.g. one oscillator, or a PLL (phase-locked loop), which can estimate the phase of a noisy wave without doing a full fast Fourier transform. You can find a MATLAB algorithm for a phase-locked loop on Wikipedia.
I will suggest a slightly more sophisticated approach here that will identify the peaks and troughs in real-time: fitting a sine wave function to your data using moving least squares minimization with initial estimates from Fourier analysis.
Here is my function to do that:
function [result, peaks, troughs] = fitsine(y, t, eps)

% Fast Fourier transform
f = fft(y);
l = length(y);
p2 = abs(f/l);
p1 = p2(1:ceil(l/2+1));
p1(2:end-1) = 2*p1(2:end-1);
freq = (1/mean(diff(t)))*(0:ceil(l/2))/l;

% Find maximum amplitude and frequency
maxPeak = p1 == max(p1(2:end)); % disregard 0 frequency!
maxAmplitude = p1(maxPeak);     % find maximum amplitude
maxFrequency = freq(maxPeak);   % find corresponding frequency

% Initialize guesses
p = [];
p(1) = mean(y);      % vertical shift
p(2) = maxAmplitude; % amplitude estimate
p(3) = maxFrequency; % frequency estimate
p(4) = 0;            % phase shift (no guess)
p(5) = 0;            % trend (no guess)

% Create model
f = @(p) p(1) + p(2)*sin( p(3)*2*pi*t + p(4) ) + p(5)*t;
ferror = @(p) sum((f(p) - y).^2);

% Nonlinear least squares
% If you have the Optimization Toolbox, use lsqcurvefit instead!
options = optimset('MaxFunEvals',50000,'MaxIter',50000,'TolFun',1e-25);
[param,fval,exitflag,output] = fminsearch(ferror,p,options);

% Calculate result
result = f(param);

% Find peaks
peaks = abs(sin(param(3)*2*pi*t + param(4)) - 1) < eps;

% Find troughs
troughs = abs(sin(param(3)*2*pi*t + param(4)) + 1) < eps;

end
As you can see, I first perform a Fourier transform to find initial estimates of the amplitude and frequency of the data. I then fit the model a + b sin(ct + d) + et to the data. The fitted values describe a sine wave of which I know that the sine term equals +1 at the peaks and -1 at the troughs, so I can identify those values as the signals.
This works very well for sinusoids with (slowly changing) trends and general (white) noise:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
% Loop through data (moving window) and fit sine wave
window = 250; % How many data points to consider
interval = 10; % How often to estimate
result = nan(size(y));
signal = zeros(size(y));
for i = window+1:interval:length(y)
    data = y(i-window:i);   % Get data window
    period = t(i-window:i); % Get time window
    [output, peaks, troughs] = fitsine(data,period,0.01);
    result(i-interval:i) = output(end-interval:end);
    signal(i-interval:i) = peaks(end-interval:end) - troughs(end-interval:end);
end
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,2); hold on;
title('Model fit');
plot(t,result,'-k'); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal,'r','LineWidth',2); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y,'-','Color',[0.1 0.1 0.1]);
scatter(t(markers),result(markers),30,'or','MarkerFaceColor','red');
xlim([0 max(t)]); ylim([-4 4]);
Main advantages of this approach are:
You have an actual model of your data, so you can predict signals in the future before they happen! (e.g. fix the model and calculate the result for future time periods; see the sketch below)
You don't need to estimate the model every period (see parameter interval in the code)
The disadvantage is that you need to select a lookback window, but you will have this problem with any method that you use for real-time detection.
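For instance, a sketch of the prediction idea, assuming fitsine is modified to also return the fitted parameter vector param:
horizon = 100;                      % number of samples to predict ahead
tFuture = t(end) + (1:horizon)'*dt; % future time stamps
yFuture = param(1) + param(2)*sin(param(3)*2*pi*tFuture + param(4)) + param(5)*tFuture; % model: a + b*sin(c*2*pi*t+d) + e*t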
Video demonstration
Data is the input data, Model fit is the fitted sine wave to the data (see code), Signal indicates the peaks and troughs and Signals marked on data gives an impression of how accurate the algorithm is. Note: watch the model fit adjust itself to the trend in the middle of the graph!
That should get you started. There are also a lot of excellent books on signal detection theory (just google that term), which will go much further into these types of techniques. Good luck!

Consider using findpeaks: it is fast, which may be important for real-time use. You should filter out high-frequency noise to improve accuracy; here I smooth the data with a moving window.
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
[~,iPeak0] = findpeaks(movmean(x,100),'MinPeakProminence',0.5);
You can time the process (0.0015 sec):
f0 = @() findpeaks(movmean(x,100),'MinPeakProminence',0.5);
disp(timeit(f0,2))
To compare, processing the slope is faster (0.00013 sec), but findpeaks has many useful options, such as a minimum interval between peaks.
iPeaks1 = derivePeaks(x);
f1 = @() derivePeaks(x);
disp(timeit(f1,1))
Where derivePeaks is:
function iPeak1 = derivePeaks(x)
xSmooth = movmean(x,100);
goingUp = find(diff(xSmooth) > 0); % indices where the smoothed signal is rising
% a peak is where a rising run ends (runs separated by more than 100 samples)
iPeak1 = unique(goingUp([1,find(diff(goingUp) > 100),end]));
iPeak1(iPeak1 == 1 | iPeak1 == length(x)) = []; % drop detections at the boundaries
end

Related

Multilateration implementation with inaccurate distance data

I am trying to create an Android smartphone application which uses Apple's iBeacon technology to determine its current indoor location. I already managed to get all available beacons and calculate the distance to them via the RSSI signal.
Currently I face the problem that I am not able to find any library or implementation of an algorithm which calculates the estimated location in 2D by using 3 (or more) distances to fixed points, with the condition that these distances are not accurate (which means that the three "trilateration circles" do not intersect in one point).
I would be deeply grateful if anybody could post me a link or an implementation of that in any common programming language (Java, C++, Python, PHP, Javascript or whatever). I have already read a lot on Stack Overflow about the topic, but could not find any answer I was able to convert into code (only some mathematical approaches with matrices, inverting them, calculating with vectors and the like).
EDIT
I thought about an approach of my own, which works quite well for me, but is not that efficient or scientific. I iterate over every meter (or, as in my example, every 0.1 meter) of the location grid and calculate the probability of that location being the actual position of the handset, by comparing the distance of that location to all beacons with the distances I calculate from the received RSSI signal.
Code example:
public Location trilaterate(ArrayList<Beacon> beacons, double maxX, double maxY)
{
    for (double x = 0; x <= maxX; x += .1)
    {
        for (double y = 0; y <= maxY; y += .1)
        {
            double currentLocationProbability = 0;
            for (Beacon beacon : beacons)
            {
                // distance difference between the rssi-calculated distance to the
                // beacon transmitter and the current location:
                // |sqrt(dX^2 + dY^2) - distanceToTransmitter|
                double distanceDifference = Math
                        .abs(Math.sqrt(Math.pow(beacon.getLocation().x - x, 2)
                                + Math.pow(beacon.getLocation().y - y, 2))
                                - beacon.getCurrentDistanceToTransmitter());
                // weight the distance difference with the beacon's rssi-calculated
                // distance: the smaller the rssi-calculated distance, the more the
                // difference is weighted (nearer beacons are assumed to measure
                // the distance more accurately)
                distanceDifference /= Math.pow(beacon.getCurrentDistanceToTransmitter(), 0.9);
                // sum up all weighted distance differences for every beacon
                currentLocationProbability += distanceDifference;
            }
            // I keep a set of the 5 most probable locations in order to estimate
            // the accuracy of the measurement afterwards. If that is not
            // necessary, a simple variable assignment for the most probable
            // location would do the job as well.
            addToLocationMap(currentLocationProbability, x, y);
        }
    }
    Location bestLocation = getLocationSet().first().location;
    bestLocation.accuracy = calculateLocationAccuracy();
    Log.w("TRILATERATION", "Location " + bestLocation + " best with accuracy "
            + bestLocation.accuracy);
    return bestLocation;
}
Of course, the downside is that on a 300 m² floor I have 30,000 locations to iterate over, measuring the distance to every single beacon I get a signal from at each one (with 5 beacons, that is 150,000 calculations just to determine a single location). That's a lot, so I will leave the question open and hope for further solutions, or a good improvement of this existing solution, to make it more efficient.
Of course it does not have to be a trilateration approach, as the original title of this question suggested; an algorithm which includes more than three beacons in the location determination (multilateration) would be equally good.
If the current approach is fine except for being too slow, then you could speed it up by recursively subdividing the plane. This works sort of like finding nearest neighbors in a kd-tree. Suppose that we are given an axis-aligned box and wish to find the approximate best solution in the box. If the box is small enough, then return the center.
Otherwise, divide the box in half, either by x or by y, depending on which side is longer. For both halves, compute a bound on the solution quality as follows. Since the objective function is additive, sum lower bounds for the beacons. The lower bound for a beacon is the distance from its circle to the box, times the scaling factor. Recursively find the best solution in the child with the smaller lower bound. Examine the other child only if the best solution in the first child is worse than the other child's lower bound.
Most of the implementation work here is the box-to-circle distance computation. Since the box is axis-aligned, we can use interval arithmetic to determine the precise range of distances from box points to the circle center.
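Here is a MATLAB-style sketch of that computation (the function name is mine; the box is [xlo,xhi] x [ylo,yhi], and the beacon circle has center (cx,cy) and radius r):
function d = boxCircleDist(xlo, xhi, ylo, yhi, cx, cy, r)
% Range [dmin, dmax] of distances from box points to the circle center
nx = min(max(cx, xlo), xhi); % nearest box point: clamp the center to the box
ny = min(max(cy, ylo), yhi);
dmin = hypot(nx - cx, ny - cy);
dmax = hypot(max(abs(cx-xlo), abs(cx-xhi)), max(abs(cy-ylo), abs(cy-yhi))); % farthest corner
if r < dmin
    d = dmin - r; % circle too small to reach the box
elseif r > dmax
    d = r - dmax; % circle passes entirely outside the box
else
    d = 0;        % circle intersects the box
end
end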
P.S.: Math.hypot is a nice function for computing 2D Euclidean distances.
Instead of taking confidence levels of individual beacons into account, I would try to assign an overall confidence level to your result after you make the best guess you can with the available data. I don't think the only available metric (perceived power) is a good indication of accuracy. With poor geometry or a misbehaving beacon, you could be trusting poor data highly. It might make better sense to come up with an overall confidence level based on how well the perceived distances to the beacons line up with the calculated point, assuming you trust all beacons equally.
I wrote some Python below that comes up with a best guess based on the provided data in the 3-beacon case, by calculating the two points of intersection of the circles for the first two beacons and then choosing the point that best matches the third. It's meant to get you started on the problem; it is not a final solution. If the beacons don't intersect, it slightly increases the radius of each until they do meet, or until a threshold is reached. Likewise, it makes sure the third beacon agrees within a settable threshold. For n beacons, I would pick the 3 or 4 strongest signals and use those. There are tons of optimizations that could be done, and I think this is a trial-by-fire problem due to the unwieldy nature of beaconing.
import math

beacons = [[0.0,0.0,7.0],[0.0,10.0,7.0],[10.0,5.0,16.0]] # x, y, radius

def point_dist(x1,y1,x2,y2):
    x = x2-x1
    y = y2-y1
    return math.sqrt((x*x)+(y*y))

# determines the two points of intersection for two circles [x,y,radius]
# returns None if the circles do not intersect
def circle_intersection(beacon1,beacon2):
    r1 = beacon1[2]
    r2 = beacon2[2]
    dist = point_dist(beacon1[0],beacon1[1],beacon2[0],beacon2[1])
    heron_root = (dist+r1+r2)*(-dist+r1+r2)*(dist-r1+r2)*(dist+r1-r2)
    if heron_root > 0:
        heron = 0.25*math.sqrt(heron_root)
        xbase = (0.5)*(beacon1[0]+beacon2[0]) + (0.5)*(beacon2[0]-beacon1[0])*(r1*r1-r2*r2)/(dist*dist)
        xdiff = 2*(beacon2[1]-beacon1[1])*heron/(dist*dist)
        ybase = (0.5)*(beacon1[1]+beacon2[1]) + (0.5)*(beacon2[1]-beacon1[1])*(r1*r1-r2*r2)/(dist*dist)
        ydiff = 2*(beacon2[0]-beacon1[0])*heron/(dist*dist)
        return (xbase+xdiff,ybase-ydiff),(xbase-xdiff,ybase+ydiff)
    else:
        # no intersection, need to pseudo-increase beacon power and try again
        return None
# find the two points of intersection between beacon0 and beacon1;
# beacon2 will be used to determine the better of the two points
failing = True
power_increases = 0
while failing and power_increases < 10:
    res = circle_intersection(beacons[0],beacons[1])
    if res:
        intersection = res
    else:
        beacons[0][2] *= 1.001
        beacons[1][2] *= 1.001
        power_increases += 1
        continue
    failing = False

# make sure the best fit is within x% (here 10% of the total distance
# from the 3rd beacon), otherwise the results are too far off
THRESHOLD = 0.1
if failing:
    print('Bad Beacon Data (Beacon0 & Beacon1 do not intersect after many "power increases")')
else:
    # find the best point using beacon2
    dist1 = point_dist(beacons[2][0],beacons[2][1],intersection[0][0],intersection[0][1])
    dist2 = point_dist(beacons[2][0],beacons[2][1],intersection[1][0],intersection[1][1])
    if math.fabs(dist1-beacons[2][2]) < math.fabs(dist2-beacons[2][2]):
        best_point = intersection[0]
        best_dist = dist1
    else:
        best_point = intersection[1]
        best_dist = dist2
    best_dist_diff = math.fabs(best_dist-beacons[2][2])
    if best_dist_diff < THRESHOLD*best_dist:
        print(best_point)
    else:
        print('Bad Beacon Data (Beacon2 distance to best point not within threshold)')
If you want to trust closer beacons more, you may want to calculate the intersection points between the two closest beacons and then use the farther beacon to tie-break. Keep in mind that almost anything you do with "confidence levels" for the individual measurements will be a hack at best. Since you will always be working with very bad data, you will definitely need to loosen up the power_increases limit and threshold percentage.
You have 3 points: A(xA,yA,zA), B(xB,yB,zB) and C(xC,yC,zC), which are respectively at approximately dA, dB and dC from your goal point G(xG,yG,zG).
Let's say cA, cB and cC are the confidence rates (0 < cX <= 1) of each point.
Basically, you might take something really close to 1, like {0.95, 0.97, 0.99}.
If you don't know, try different coefficients depending on the average distance. If the distance is really big, you're likely to be less confident about it.
Here is the way I'd do it:
var sum = (cA*dA) + (cB*dB) + (cC*dC);

dA = cA*dA/sum;
dB = cB*dB/sum;
dC = cC*dC/sum;

xG = (xA*dA) + (xB*dB) + (xC*dC);
yG = (yA*dA) + (yB*dB) + (yC*dC);
zG = (zA*dA) + (zB*dB) + (zC*dC);
Basic, and not really smart but will do the job for some simple tasks.
EDIT
You can take any confidence coefficient you want in [0, inf), but IMHO restricting it to [0,1] is a good idea to keep the result realistic.

Minimizing distance to a weighted grid

Let's suppose you have a 1000x1000 grid of positive integer weights W.
We want to find the cell that minimizes the average weighted distance to each cell.
The brute force way to do this would be to loop over each candidate cell and calculate the distance:
int best_x, best_y, best_dist = INT_MAX;
for x0 = 1:1000,
    for y0 = 1:1000,
        int total_dist = 0;
        for x1 = 1:1000,
            for y1 = 1:1000,
                total_dist += W[x1,y1] * sqrt((x0-x1)^2 + (y0-y1)^2);
        if (total_dist < best_dist)
            best_x = x0;
            best_y = y0;
            best_dist = total_dist;
This takes ~10^12 operations, which is too long.
Is there a way to do this in or near ~10^8 or so operations?
Theory
This is possible using filters in O(nm log(nm)) time, where n and m are the grid dimensions.
You need to define a filter of size (2n+1) x (2m+1), and you need to embed your original weight grid, centered, in a grid of zeros of size 3n x 3m. The filter needs to be the distance weighting from the origin at (n,m):
F(i,j) = sqrt((n-i)^2 + (m-j)^2)
Let W denote the original weight grid (centered) embedded in a grid of zeros of size 3n x 3m.
Then the filtered (cross-correlation) result
R = F o W
will give you the total_dist grid; simply take the min of R (ignoring the extra embedded zeros you put into W) to find your best x0, y0 position.
Image (i.e. grid) filtering is very standard, and can be done in all sorts of existing software, such as MATLAB with the imfilter command.
I should note that, though I explicitly made use of cross-correlation above, you would get the same result with convolution only because your filter F is symmetric. In general, image filtering is cross-correlation, not convolution, though the two operations are very analogous.
The reason for the O(nm log nm ) runtime is because image filtering can be done using 2D FFT's.
Implementation
Here are both implementations in MATLAB; the final result is the same for both methods, and they are benchmarked in a very simple way:
m=100;
n=100;
W0=abs(randn(m,n))+.001;
tic;
%The following padding is not necessary in the matlab code because
%matlab implements it in the imfilter function, from the imfilter
%documentation:
% - Boundary options
%
% X Input array values outside the bounds of the array
% are implicitly assumed to have the value X. When no
% boundary option is specified, imfilter uses X = 0.
%W=padarray(W0,[m n]);
W=W0;
F=zeros(2*m+1,2*n+1);
for i=1:size(F,1)
    for j=1:size(F,2)
        %This is MATLAB where indices start from 1, hence the need
        %for m-1 and n-1 in the equations
        F(i,j)=sqrt((i-m-1)^2 + (j-n-1)^2);
    end
end
R=imfilter(W,F);
[mr mc] = ind2sub(size(R),find(R == min(R(:))));
[mr, mc]
toc;
tic;
T=zeros([m n]);
best_x=-1;
best_y=-1;
best_val=inf;
for y0=1:m
    for x0=1:n
        total_dist = 0;
        for y1=1:m
            for x1=1:n
                total_dist = total_dist + W0(y1,x1) * sqrt((x0-x1)^2 + (y0-y1)^2);
            end
        end
        T(y0,x0) = total_dist;
        if ( total_dist < best_val )
            best_x = x0;
            best_y = y0;
            best_val = total_dist;
        end
    end
end
[best_y best_x]
toc;
diff=abs(T-R);
max_diff=max(diff(:));
fprintf('The max difference between the two computations: %g\n', max_diff);
Performance
For an 800x800 grid, on my PC which is certainly not the fastest, the FFT method evaluates in just over 700 seconds. The brute force method doesn't complete after several hours and I have to kill it.
In terms of further performance gains, you can attain them by moving to a hardware platform like GPUs. For example, using CUDA's FFT library you can compute 2D FFTs in a fraction of the time it takes on a CPU. The key point is that the FFT method will scale as you throw more hardware at the computation, while the brute-force method will scale much worse.
Observations
While implementing this, I have observed that almost every time, the best_x, best_y values are one of floor(n/2) ± 1. This means that most likely the distance term dominates the entire computation; therefore, you could get away with computing total_dist for only 4 values, making this algorithm trivial!

image processing algorithm in MATLAB

I am trying to implement an algorithm described in this paper:
Decomposition of biospeckle images in temporary spectral bands
Here is an explanation of the algorithm:
We recorded a sequence of N successive speckle images with a sampling frequency fs. In this way it was possible to observe how a pixel evolves through the N images. That evolution can be treated as a time series and can be processed in the following way: each signal corresponding to the evolution of every pixel was used as input to a bank of filters. The intensity values were previously divided by their temporal mean value to minimize local differences in reflectivity or illumination of the object. The maximum frequency that can be adequately analyzed is determined by the sampling theorem and is half of the sampling frequency fs. The latter is set by the CCD camera, the size of the image, and the frame grabber. The bank of filters is outlined in Fig. 1.
In our case, ten 5th-order Butterworth filters were used, but this number can be varied according to the required discrimination. The bank was implemented in a computer using MATLAB software. We chose the Butterworth filter because, in addition to its simplicity, it is maximally flat. Other filters, an infinite impulse response, or a finite impulse response could be used.
By means of this bank of filters, ten corresponding signals of each filter of each temporal pixel evolution were obtained as output. The average energy Eb in each signal was then calculated:
Eb = (1/N) * sum over n of pb(n)^2
where pb(n) is the intensity of the filtered pixel in the nth image for filter b, divided by its mean value, and N is the total number of images. In this way, values of energy Eb for each pixel were obtained, each of them belonging to one of the frequency bands in Fig. 1.
With these values it is possible to build ten images of the active object, each one of which shows how much energy of time-varying speckle there is in a certain frequency band. False color assignment to the gray levels in the results would help in discrimination.
and here is my MATLAB code based on that:
for i=1:520
    for j=1:368
        ts = [];
        for k=1:600
            ts = [ts D{k}(i,j)]; %%% kth image, pixel (i,j) --- ts is the time series
        end
        ts = double(ts);
        temp = mean(ts);
        if (temp==0)
            for l=1:10
                filtImag1{l}(i,j)=0;
            end
            continue;
        end
        ts = ts-temp;
        ts = ts/temp;
        N = 5; % filter order
        W = [0.0 0.10;0.10 0.20;0.20 0.30;0.30 0.40;0.40 0.50;0.50 0.60;0.60 0.70;0.70 0.80;0.80 0.90;0.90 1.0];
        [B,A]=butter(N,0.10,'low');
        ts_f(1,:) = filter(B,A,ts);
        N1 = 5;
        for ind = 2:9
            Wn = W(ind,:);
            [B,A] = butter(N1,Wn);
            ts_f(ind,:) = filter(B,A,ts);
        end
        [B,A]=butter(N,0.90,'high');
        ts_f(10,:) = filter(B,A,ts);
        for ind=1:10
            %Following Paper Suggestion
            filtImag1{ind}(i,j) = sum(ts_f(ind,:).^2);
        end
    end
end
for i=1:10
    figure,imshow(filtImag1{i});
    colorbar
end
pre_max = max(filtImag1{1}(:));
for i=1:10
    new_max = max(filtImag1{i}(:));
    if (pre_max<new_max)
        pre_max = max(filtImag1{i}(:));
    end
end
new_max = pre_max;
pre_min = min(filtImag1{1}(:));
for i=1:10
    new_min = min(filtImag1{i}(:));
    if (pre_min>new_min)
        pre_min = min(filtImag1{i}(:));
    end
end
new_min = pre_min;
%normalize
for i=1:10
    temp_imag = filtImag1{i}(:,:);
    x = isnan(temp_imag);
    temp_imag(x) = 0;
    t_max = max(max(temp_imag));
    t_min = min(min(temp_imag));
    temp_imag = (double(temp_imag-t_min)).*((double(new_max)-double(new_min))/double(t_max-t_min))+(double(new_min));
    %median filter
    %temp_imag = medfilt2(temp_imag);
    imag_test2{i}(:,:) = temp_imag;
end
for i=1:10
    figure,imshow(imag_test2{i});
    colorbar
end
for i=1:10
    A = imag_test2{i}(:,:);
    B = A/max(max(A));
    B = histeq(A);
    figure,imshow(B);
    colorbar
    imag_test2{i}(:,:) = B;
end
but I am not getting the same result as the paper. Does anybody have any idea why, or where I have gone wrong?
EDIT
With help from @Amro and using his code, I end up with the following images:
Here is my original image, from a 72-hour germinated lentil (400 images, at 5 frames per second):
and here are the result images for the 10 different bands:
A couple of issues I can spot:
When you divide the signal by its mean, you need to check that the mean is not zero. Otherwise the result will be NaN.
The authors (I am following this article) used a bank of filters with frequency bands covering the entire range up to the Nyquist frequency. You are doing half of that. The normalized frequencies you pass to butter should go all the way up to 1 (which corresponds to fs/2).
When computing the energy of each filtered signal, I think you should not divide by its mean (you have already accounted for that before). Instead simply do E = sum(sig.^2); for each of the filtered signals.
In the last post-processing step, you should normalize to the range [0,1], and then apply the median filtering algorithm medfilt2. The computation doesn't look right; it should be something like:
img = ( img - min(img(:)) ) ./ ( max(img(:)) - min(img(:)) );
EDIT:
With the above points in mind, I tried to rewrite the code in a vectorized way. Since you didn't post sample input images, I can't test if the result is as expected... Plus I am not sure how to interpret the final images anyway :)
%# read biospeckle images
fnames = dir( fullfile('folder','myimages*.jpg') );
fnames = {fnames.name};
N  = numel(fnames);         %# number of images
Fs = 1;                     %# sampling frequency in Hz
sz = [209 278];             %# image sizes
T  = zeros([sz N],'uint8'); %# store all images
for i=1:N
    T(:,:,i) = imread( fullfile('folder',fnames{i}) );
end

%# timeseries corresponding to every pixel
T = reshape(T, [prod(sz) N])'; %# columns are the signals
T = double(T);                 %# work with double class

%# normalize signals before filtering (avoid division by zero)
mn = mean(T,1);
T = bsxfun(@rdivide, T, mn+(mn==0)); %# divide by temporal mean

%# bank of filters
numBanks = 10;
order = 5;                                       % butterworth filter order
fCutoff = linspace(0, Fs/2, numBanks+1)';        % lower/upper cutoff freqs
W = [fCutoff(1:end-1) fCutoff(2:end)] ./ (Fs/2); % normalized frequency bands
W(1,1) = W(1,1) + 1e-5;                          % adjust first freq
W(end,end) = W(end,end) - 1e-5;                  % adjust last freq

%# filter signals using the bank of filters
Tf = cell(numBanks,1);             %# filtered signals using each filter
for i=1:numBanks
    [b,a] = butter(order, W(i,:)); %# bandpass filter
    Tf{i} = filter(b,a,T);         %# apply filter to all signals
end
clear T                            %# cleanup unnecessary stuff

%# compute average energy in each signal across frequency bands
Tf = cellfun(@(x)sum(x.^2,1), Tf, 'Uniform',false);

%# normalize each to [0,1], and build corresponding images
Tf = cellfun(@(x)reshape((x-min(x))./range(x),sz), Tf, 'Uniform',false);

%# show images
for i=1:numBanks
    subplot(4,3,i), imshow(Tf{i})
    title( sprintf('%g - %g Hz',W(i,:).*Fs/2) )
end
colormap(gray)
(I used the image from here for the above result)
EDIT#2
Made some changes and simplified the above code a bit. This should reduce the memory footprint. For example, I used a cell array instead of a single multidimensional matrix to store the result; that way we don't allocate one big block of contiguous memory. I also reused the same variables instead of introducing new ones at each intermediate step...
The paper doesn't mention subtracting the mean of the time series; are you sure that's necessary? Also, you only compute new_max and new_min once, from the last image.

Circular Hough Transform Improvements

I'm working on an iris recognition algorithm that processes these kinds of images into unique codes for identification and authentication purposes.
After filtering, intelligently thresholding, and then finding edges in the image, the next step is obviously to fit circles to the pupil and iris. I've looked around, and the technique to use is the circular Hough transform. Here is the code for my implementation. Sorry about the cryptic variable names.
print "Populating Accumulator..."
# Loop over image rows
for x in range(w):
# Loop over image columns
for y in range(h):
# Only process black pixels
if inp[x,y] == 0:
# px,py = 0 means pupil, otherwise pupil center
if px == 0:
ra = r_min
rb = r_max
else:
rr = sqrt((px-x)*(px-x)+(py-y)*(py-y))
ra = int(rr-3)
rb = int(rr+3)
# a is the width of the image, b is the height
for _a in range(a):
for _b in range(b):
for _r in range(rb-ra):
s1 = x - (_a + a_min)
s2 = y - (_b + b_min)
r1 = _r + ra
if (s1 * s1 + s2 * s2 == r1 * r1):
new = acc[_a][_b][_r]
if new >= maxVotes:
maxVotes = new
print "Done"
# Average all circles with the most votes
for _a in range(a):
for _b in range(b):
for _r in range(r):
if acc[_a][_b][_r] >= maxVotes-1:
total_a += _a + a_min
total_b += _b + b_min
total_r += _r + r_min
amount += 1
top_a = total_a / amount
top_b = total_b / amount
top_r = total_r / amount
print top_a,top_b,top_r
This is written in Python and uses the Python Imaging Library to do image processing. As you can see, this is a very naive brute-force method of finding circles. It works, but takes several minutes. The basic idea is to draw circles from rmin to rmax wherever there is a black pixel (from thresholding and edge detection), then build an accumulator array counting the number of times each location in the image is "voted" on. Whichever x, y, and r has the most votes is the circle of interest. I tried to use the fact that the iris and pupil have about the same center (variables ra and rb) to reduce some of the complexity of the r loop, but the pupil detection takes so long that it doesn't matter.
Now, obviously my implementation is very naive. It uses a three dimensional parameter space (x, y, and r), which unfortunately makes it run slower than is acceptable. What kind of improvements can I make? Is there any way to reduce this to a two-dimensional parameter space? Is there a more efficient way of accessing and setting pixels that I'm not aware of?
On a side note, are there any other techniques for improving the overall runtime of this algorithm that I'm not aware of? Such as methods to approximate the maximum radius of the pupil or iris?
Note: I've tried to use OpenCV for this as well, but I could not tune the parameters enough to be consistently accurate.
Let me know if there's any other information that you need.
NOTE: Once again I misinterpreted my own code. It is technically 5-dimensional, but the 3-dimensional x,y,r loop only operates on black pixels.
Assuming you want the position of the circle rather than an exact measure of R: if you have a decent estimate of the possible range of R, then a common technique is to run the algorithm for a first guess of fixed R, adjust it, and try again.
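A sketch of what that looks like in MATLAB for a single fixed R (the function name and the 1-degree angular step are arbitrary choices; inp is assumed to be a binary edge image with nonzero edge pixels):
function acc = houghFixedR(inp, R)
[h, w] = size(inp);
theta = linspace(0, 2*pi, 360); % angular resolution is arbitrary
[ys, xs] = find(inp);           % edge pixel coordinates
acc = zeros(h, w);
for k = 1:numel(xs)
    % each edge pixel votes for all centers at distance R from it
    a = round(xs(k) + R*cos(theta));
    b = round(ys(k) + R*sin(theta));
    ok = a >= 1 & a <= w & b >= 1 & b <= h; % keep votes inside the image
    acc = acc + accumarray([b(ok)' a(ok)'], 1, [h w]);
end
end
The best center for this R is the accumulator maximum; repeat for a few radii around the estimate and keep the (R, center) pair with the most votes.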

How to compute frequency of data using FFT?

I want to know the frequency of my data. I have a rough idea that it can be done using an FFT, but I am not sure how. Once I pass the entire data set to the FFT, it gives me 2 peaks, but how can I get the frequency from that?
Thanks a lot in advance.
Here's what you're probably looking for:
When you talk about computing the frequency of a signal, you probably aren't so interested in the component sine waves. This is what the FFT gives you. For example, if you sum sin(2*pi*10x)+sin(2*pi*15x)+sin(2*pi*20x)+sin(2*pi*25x), you probably want to detect the "frequency" as 5 (take a look at the graph of this function). However, the FFT of this signal will detect the magnitude of 0 for the frequency 5.
What you are probably more interested in is the periodicity of the signal. That is, the interval at which the signal becomes most like itself. So most likely what you want is the autocorrelation. Look it up. This will essentially give you a measure of how self-similar the signal is to itself after being shifted over by a certain amount. So if you find a peak in the autocorrelation, that would indicate that the signal matches up well with itself when shifted over that amount. There's a lot of cool math behind it, look it up if you are interested, but if you just want it to work, just do this:
Window the signal, using a smooth window (a cosine will do; the window should be at least twice as large as the largest period you want to detect, and 3 times as large will give better results). (See http://zone.ni.com/devzone/cda/tut/p/id/4844 if you are confused.)
Take the FFT (however, make sure the FFT size is twice as big as the window, with the second half padded with zeroes. If the FFT size is only the size of the window, you will effectively be taking the circular autocorrelation, which is not what you want. See https://en.wikipedia.org/wiki/Discrete_Fourier_transform#Circular_convolution_theorem_and_cross-correlation_theorem )
Replace all coefficients of the FFT with their squared magnitude (real^2 + imag^2). This is effectively taking the autocorrelation.
Take the iFFT.
Find the largest peak in the iFFT. This is the strongest periodicity of the waveform. You can actually be a little more clever about which peak you pick, but for most purposes this should be enough. To find the frequency, you just take f = 1/T. These steps are sketched below.
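A minimal sketch of these steps in MATLAB, assuming a signal x sampled at fs Hz (the Hann window and the negative-dip heuristic for skipping the lag-0 lobe are my own choices):
xw = x(:) .* hann(length(x)); % smooth (cosine) window
X = fft(xw, 2*length(xw));    % zero-padded FFT avoids circular wrap-around
ac = real(ifft(abs(X).^2));   % squared magnitudes -> autocorrelation
ac = ac(1:length(xw));        % keep non-negative lags only
firstDip = find(ac < 0, 1);   % step past the main lobe at lag 0 (assumes ac dips below zero)
[~, rel] = max(ac(firstDip:end)); % strongest periodicity
T = (firstDip + rel - 2) / fs;    % ac(1) corresponds to lag 0
f = 1 / T                         % frequency of the dominant period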
Suppose x[n] = cos(2*pi*f0*n/fs) where f0 is the frequency of your sinusoid in Hertz, n=0:N-1, and fs is the sampling rate of x in samples per second.
Let X = fft(x). Both x and X have length N. Suppose X has two peaks at n0 and N-n0.
Then the sinusoid frequency is f0 = fs*n0/N Hertz.
Example: fs = 8000 samples per second, N = 16000 samples. Therefore, x lasts two seconds.
Suppose X = fft(x) has peaks at 2000 and 14000 (=16000-2000). Therefore, f0 = 8000*2000/16000 = 1000 Hz.
If you have a signal with a single frequency, for instance:
y = sin(2 pi f t)
with y the time signal, f the central frequency, and t the time, then you'll get two peaks: one at a frequency corresponding to f, and one at a frequency corresponding to -f.
So, to get to the frequency, you can discard the negative-frequency part. It is located after the positive-frequency part. Furthermore, the first element in the array is the DC offset, at frequency 0. (Beware that this offset is usually much larger than 0, so the other frequency components might get dwarfed by it.)
In code (I've written it in Python, but it should be equally simple in C#):
import numpy as np
from pylab import *

x = np.random.rand(100) # create 100 random numbers of which we want the fourier transform
x = x - mean(x)         # make sure the average is zero, so we don't get a huge DC offset
dt = 0.1                # [s] 1/the sampling rate
fftx = np.fft.fft(x)    # the frequency-transformed part
# now discard anything that we do not need: the negative-frequency half
fftx = fftx[:len(fftx)//2]
# now create the frequency axis: it runs from 0 to the sampling rate / 2
freq_fftx = np.linspace(0, 1/(2*dt), len(fftx))
# and plot a power spectrum
plot(freq_fftx, abs(fftx)**2)
show()
Now the frequency is located at the largest peak.
If you are looking at the magnitude results from an FFT of the type most commonly used, then a strong sinusoidal frequency component of real data will show up in two places: once in the bottom half, plus its complex-conjugate mirror image in the top half. Those two peaks both represent the same spectral peak and the same frequency (for strictly real data). If the FFT result bin numbers start at 0 (zero), then the frequency of the sinusoidal component represented by the bin in the bottom half of the FFT result is most likely:
Frequency_of_Peak = Data_Sample_Rate * Bin_number_of_Peak / Length_of_FFT ;
Make sure to work out your proper units within the above equation (to get units of cycles per second, per fortnight, per kiloparsec, etc.)
Note that unless the wavelength of the data is an exact integer submultiple of the FFT length, the actual peak will be between bins, thus distributing energy among multiple nearby FFT result bins. So you may have to interpolate to better estimate the frequency peak. Common interpolation methods to find a more precise frequency estimate are 3-point parabolic and Sinc convolution (which is nearly the same as using a zero-padded longer FFT).
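A sketch of the 3-point parabolic version in MATLAB, assuming mag is the FFT magnitude array, k the (1-based) index of the peak bin, fs the sample rate, and N the FFT length:
a = mag(k-1); b = mag(k); c = mag(k+1); % magnitudes around the peak
delta = 0.5*(a - c) / (a - 2*b + c);    % fractional bin offset, in [-0.5, 0.5]
freqEstimate = (k - 1 + delta) * fs / N; % (k-1) because the formula above uses 0-based bins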
Assuming you use a discrete Fourier transform to look at frequencies, then you have to be careful about how to interpret the normalized frequencies back into physical ones (i.e. Hz).
According to the FFTW tutorial on how to calculate the power spectrum of a signal:
#include <rfftw.h>
...
{
    fftw_real in[N], out[N], power_spectrum[N/2+1];
    rfftw_plan p;
    int k;
    ...
    p = rfftw_create_plan(N, FFTW_REAL_TO_COMPLEX, FFTW_ESTIMATE);
    ...
    rfftw_one(p, in, out);
    power_spectrum[0] = out[0]*out[0];  /* DC component */
    for (k = 1; k < (N+1)/2; ++k)       /* (k < N/2 rounded up) */
        power_spectrum[k] = out[k]*out[k] + out[N-k]*out[N-k];
    if (N % 2 == 0)                     /* N is even */
        power_spectrum[N/2] = out[N/2]*out[N/2];  /* Nyquist freq. */
    ...
    rfftw_destroy_plan(p);
}
Note that it handles data lengths that are not even. In particular, if the data length is even, FFTW will give you a "bin" corresponding to the Nyquist frequency (sample rate divided by 2); otherwise, you don't get it (i.e. the last bin is just below Nyquist).
A MATLAB example is similar, but they choose a length of 1000 (an even number) for the example:
N = length(x);
xdft = fft(x);
xdft = xdft(1:N/2+1);
psdx = (1/(Fs*N)).*abs(xdft).^2;
psdx(2:end-1) = 2*psdx(2:end-1);
freq = 0:Fs/length(x):Fs/2;
In general, it can be implementation (of the DFT) dependent. You should create a test pure sine wave at a known frequency and then make sure the calculation gives the same number.
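For example, a quick sanity check along those lines in MATLAB (fs, N and f0 are arbitrary test values, chosen so the tone lands exactly on a bin):
fs = 1000; N = 2048; f0 = 125; % f0*N/fs = 256, an exact bin
x = cos(2*pi*f0*(0:N-1)/fs);
X = abs(fft(x));
[~, k] = max(X(1:N/2)); % peak in the positive-frequency half
fEstimate = (k-1)*fs/N  % should recover 125 Hz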
Frequency = speed/wavelength.
Wavelength is the distance between the two peaks.
