image processing algorithm in MATLAB - algorithm

I am trying to implement an algorithm described in this paper:
Decomposition of biospeckle images in temporary spectral bands
Here is an explanation of the algorithm:
We recorded a sequence of N successive speckle images with a sampling
frequency fs. In this way it was possible to observe how a pixel
evolves through the N images. That evolution can be treated as a time
series and can be processed in the following way: Each signal
corresponding to the evolution of every pixel was used as input to a
bank of filters. The intensity values were previously divided by their
temporal mean value to minimize local differences in reflectivity or
illumination of the object. The maximum frequency that can be
adequately analyzed is determined by the sampling theorem and s half
of sampling frequency fs. The latter is set by the CCD camera, the
size of the image, and the frame grabber. The bank of filters is
outlined in Fig. 1.
In our case, ten 5° order Butterworth filters
were used, but this number can be varied according to the required
discrimination. The bank was implemented in a computer using MATLAB
software. We chose the Butter-worth filter because, in addition to its
simplicity, it is maximally flat. Other filters, an infinite impulse
response, or a finite impulse response could be used.
By means of this
bank of filters, ten corresponding signals of each filter of each
temporary pixel evolution were obtained as output. Average energy Eb
in each signal was then calculated:
where pb(n) is the intensity of the filtered pixel in the nth image
for filter b divided by its mean value and N is the total number of
images. In this way, En values of energy for each pixel were obtained,
each of hem belonging to one of the frequency bands in Fig. 1.
With these values it is possible to build ten images of the active object,
each one of which shows how much energy of time-varying speckle there
is in a certain frequency band. False color assignment to the gray
levels in the results would help in discrimination.
and here is my MATLAB code base on that :
for i=1:520
for j=1:368
ts = [];
for k=1:600
ts = [ts D{k}(i,j)]; %%% kth image pixel i,j --- ts is time series
end
ts = double(ts);
temp = mean(ts);
if (temp==0)
for l=1:10
filtImag1{l}(i,j)=0;
end
continue;
end
ts = ts-temp;
ts = ts/temp;
N = 5; % filter order
W = [0.0 0.10;0.10 0.20;0.20 0.30;0.30 0.40;0.40 0.50;0.50 0.60 ;0.60 0.70;0.70 0.80 ;0.80 0.90;0.90 1.0];
[B,A]=butter(N,0.10,'low');
ts_f(1,:) = filter(B,A,ts);
N1 = 5;
for ind = 2:9
Wn = W(ind,:);
[B,A] = butter(N1,Wn);
ts_f(ind,:) = filter(B,A,ts);
end
[B,A]=butter(N,0.90,'high');
ts_f(10,:) = filter(B,A,ts);
for ind=1:10
%Following Paper Suggestion
filtImag1{ind}(i,j) =sum(ts_f(ind,:).^2);
end
end
end
for i=1:10
figure,imshow(filtImag1{i});
colorbar
end
pre_max = max(filtImag1{1}(:));
for i=1:10
new_max = max(filtImag1{i}(:));
if (pre_max<new_max)
pre_max=max(filtImag1{i}(:));
end
end
new_max = pre_max;
pre_min = min(filtImag1{1}(:));
for i=1:10
new_min = min(filtImag1{i}(:));
if (pre_min>new_min)
pre_min = min(filtImag1{i}(:));
end
end
new_min = pre_min;
%normalize
for i=1:10
temp_imag = filtImag1{i}(:,:);
x=isnan(temp_imag);
temp_imag(x)=0;
t_max = max(max(temp_imag));
t_min = min(min(temp_imag));
temp_imag = (double(temp_imag-t_min)).*((double(new_max)-double(new_min))/double(t_max-t_min))+(double(new_min));
%median filter
%temp_imag = medfilt2(temp_imag);
imag_test2{i}(:,:) = temp_imag;
end
for i=1:10
figure,imshow(imag_test2{i});
colorbar
end
for i=1:10
A=imag_test2{i}(:,:);
B=A/max(max(A));
B=histeq(A);
figure,imshow(B);
colorbar
imag_test2{i}(:,:)=B;
end
but I am not getting the same result as paper. has anybody has any idea why? or where I have gone wrong?
EDIT
by getting help from #Amro and using his code I endup with the following images:
here is my Original Image from 72hrs germinated Lentil (400 images, with 5 frame per second):
here is the results images for 10 different band :

A couple of issue I can spot:
when you divide the signal by its mean, you need to check that it was not zero. Otherwise the result will be NaN.
the authors (I am following this article) used a bank of filters with frequency bands covering the entire range up to the Nyquist frequency. You are doing half of that. The normalized frequencies you pass to butter should go all the way up to 1 (corresponds to fs/2)
When computing the energy of each filtered signal, I think you should not divide by its mean (you have already accounted for that before). Instead simply do: E = sum(sig.^2); for each of the filtered signals
In the last post-processing step, you should normalize to the range [0,1], and then apply the median filtering algorithm medfilt2. The computation doesn't look right, it should be something like:
img = ( img - min(img(:)) ) ./ ( max(img(:)) - min(img(:)) );
EDIT:
With the above points in mind, I tried to rewrite the code in a vectorized way. Since you didn't post sample input images, I can't test if the result is as expected... Plus I am not sure how to interpret the final images anyway :)
%# read biospeckle images
fnames = dir( fullfile('folder','myimages*.jpg') );
fnames = {fnames.name};
N = numel(fnames); %# number of images
Fs = 1; %# sampling frequency in Hz
sz = [209 278]; %# image sizes
T = zeros([sz N],'uint8'); %# store all images
for i=1:N
T(:,:,i) = imread( fullfile('folder',fnames{i}) );
end
%# timeseries corresponding to every pixel
T = reshape(T, [prod(sz) N])'; %# columns are the signals
T = double(T); %# work with double class
%# normalize signals before filtering (avoid division by zero)
mn = mean(T,1);
T = bsxfun(#rdivide, T, mn+(mn==0)); %# divide by temporal mean
%# bank of filters
numBanks = 10;
order = 5; % butterworth filter order
fCutoff = linspace(0, Fs/2, numBanks+1)'; % lower/upper cutoff freqs
W = [fCutoff(1:end-1) fCutoff(2:end)] ./ (Fs/2); % normalized frequency bands
W(1,1) = W(1,1) + 1e-5; % adjust first freq
W(end,end) = W(end,end) - 1e-5; % adjust last freq
%# filter signals using the bank of filters
Tf = cell(numBanks,1); %# filtered signals using each filter
for i=1:numBanks
[b,a] = butter(order, W(i,:)); %# bandpass filter
Tf{i} = filter(b,a,T); %# apply filter to all signals
end
clear T %# cleanup unnecessary stuff
%# compute average energy in each signal across frequency bands
Tf = cellfun(#(x)sum(x.^2,1), Tf, 'Uniform',false);
%# normalize each to [0,1], and build corresponding images
Tf = cellfun(#(x)reshape((x-min(x))./range(x),sz), Tf, 'Uniform',false);
%# show images
for i=1:numBanks
subplot(4,3,i), imshow(Tf{i})
title( sprintf('%g - %g Hz',W(i,:).*Fs/2) )
end
colormap(gray)
(I used the image from here for the above result)
EDIT#2
Made some changes and simplified the above code a bit. This shall reduce memory footprint. For example I used cell array instead of a single multidimensional matrix to store the result. That way we don't allocate one big block of contiguous memory. I also reused same variables instead of introducing new ones at each intermediate step...

The paper doesn't mention subtracting the mean of the time series, are you sure that's necessary? Also, you only compute the new_max and new_min once, from the last image.

Related

How can we detect these points in this case?

I have a data of pulse train samples as amplitude samples with equal intervals.
Let's call the sampled pulse amplitude array as A and time array as t.
So the plot is obtained by plot(t, A) in MATLAB.
Here below is plot of the pulse train:
And below is the zoomed version(green dots are samples, reds circles are max points):
What I need to do is, I need an algorithm which can detect and save the max point of each pulse(I circled them in red above) into an array.
So far I tried the following but didn't work:
kk = 0
for i=1:length(t)-2
if y(i)>0 & y(i+1)>y(i) & y(i+2)>y(i+1) & y(i+3)<y(i+2)
kk = kk+1;
maxPointTime(kk) = t(i+2);
maxPointVoltage(kk) = A(i+2);
end
end
So you want to find the local maxima, right? MATLAB has a build in function to do so, cf. doc.
x = 1:100;
A = (1-cos(2*pi*0.01*x)).*sin(2*pi*0.15*x);
TF = islocalmax(A);
plot(x,A,x(TF),A(TF),'r*')

Real-time peak detection in noisy sinusoidal time-series

I have been attempting to detect peaks in sinusoidal time-series data in real time, however I've had no success thus far. I cannot seem to find a real-time algorithm that works to detect peaks in sinusoidal signals with a reasonable level of accuracy. I either get no peaks detected, or I get a zillion points along the sine wave being detected as peaks.
What is a good real-time algorithm for input signals that resemble a sine wave, and may contain some random noise?
As a simple test case, consider a stationary, sine wave that is always the same frequency and amplitude. (The exact frequency and amplitude don't matter; I have arbitrarily chosen a frequency of 60 Hz, an amplitude of +/− 1 unit, at a sampling rate of 8 KS/s.) The following MATLAB code will generate such a sinusoidal signal:
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
x = sin(2*pi*60*t);
Using the algorithm developed and published by Jean-Paul, I either get no peaks detected (left) or a zillion "peaks" detected (right):
I've tried just about every combination of values for these 3 parameters that I could think of, following the "rules of thumb" that Jean-Paul gives, but I have so far been unable to get my expected result.
I found an alternative algorithm, developed and published by Eli Billauer, that does give me the results that I want—e.g.:
Even though Eli Billauer's algorithm is much simpler and does tend to reliably produce the results that I want, it is not suitable for real-time applications.
As another example of a signal that I'd like to apply such an algorithm to, consider the test case given by Eli Billauer for his own algorithm:
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
This is a more unusual (less uniform/regular) signal, with a varying frequency and amplitude, but still generally sinusoidal. The peaks are plainly obvious to the eye when plotted, but hard to identify with an algorithm.
What is a good real-time algorithm to correctly identify the peaks in a sinusoidal input signal? I am not really an expert when it comes to signal processing, so it would be helpful to get some rules of thumb that consider sinusoidal inputs. Or, perhaps I need to modify e.g. Jean-Paul's algorithm itself in order to work properly on sinusoidal signals. If that's the case, what modifications would be required, and how would I go about making these?
Case 1: sinusoid without noise
If your sinusoid does not contain any noise, you can use a very classic signal processing technique: taking the first derivative and detecting when it is equal to zero.
For example:
function signal = derivesignal( d )
% Identify signal
signal = zeros(size(d));
for i=2:length(d)
if d(i-1) > 0 && d(i) <= 0
signal(i) = +1; % peak detected
elseif d(i-1) < 0 && d(i) >= 0
signal(i) = -1; % trough detected
end
end
end
Using your example data:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Approximate first derivative (delta y / delta x)
d = [0; diff(y)];
% Identify signal
signal = derivesignal(d);
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y);
subplot(4,1,2); hold on;
title('First derivative');
area(d);
ylim([-0.05, 0.05]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y); scatter(t(markers),y(markers),30,'or','MarkerFaceColor','red');
This yields:
This method will work extremely well for any type of sinusoid, with the only requirement that the input signal contains no noise.
Case 2: sinusoid with noise
As soon as your input signal contains noise, the derivative method will fail. For example:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
Will now generate this result because first differences amplify noise:
Now there are many ways to deal with noise, and the most standard way is to apply a moving average filter. One disadvantage of moving averages is that they are slow to adapt to new information, such that signals may be identified after they have occurred (moving averages have a lag).
Another very typical approach is to use Fourier Analysis to identify all the frequencies in your input data, disregard all low-amplitude and high-frequency sinusoids, and use the remaining sinusoid as a filter. The remaining sinusoid will be (largely) cleansed from the noise and you can then use first-differencing again to determine the peaks and troughs (or for a single sine wave you know the peaks and troughs happen at 1/4 and 3/4 pi of the phase). I suggest you pick up any signal processing theory book to learn more about this technique. Matlab also has some educational material about this.
If you want to use this algorithm in hardware, I would suggest you also take a look at WFLC (Weighted Fourier Linear Combiner) with e.g. 1 oscillator or PLL (Phase-Locked Loop) that can estimate the phase of a noisy wave without doing a full Fast Fourier Transform. You can find a Matlab algorithm for a phase-locked loop on Wikipedia.
I will suggest a slightly more sophisticated approach here that will identify the peaks and troughs in real-time: fitting a sine wave function to your data using moving least squares minimization with initial estimates from Fourier analysis.
Here is my function to do that:
function [result, peaks, troughs] = fitsine(y, t, eps)
% Fast fourier-transform
f = fft(y);
l = length(y);
p2 = abs(f/l);
p1 = p2(1:ceil(l/2+1));
p1(2:end-1) = 2*p1(2:end-1);
freq = (1/mean(diff(t)))*(0:ceil(l/2))/l;
% Find maximum amplitude and frequency
maxPeak = p1 == max(p1(2:end)); % disregard 0 frequency!
maxAmplitude = p1(maxPeak); % find maximum amplitude
maxFrequency = freq(maxPeak); % find maximum frequency
% Initialize guesses
p = [];
p(1) = mean(y); % vertical shift
p(2) = maxAmplitude; % amplitude estimate
p(3) = maxFrequency; % phase estimate
p(4) = 0; % phase shift (no guess)
p(5) = 0; % trend (no guess)
% Create model
f = #(p) p(1) + p(2)*sin( p(3)*2*pi*t+p(4) ) + p(5)*t;
ferror = #(p) sum((f(p) - y).^2);
% Nonlinear least squares
% If you have the Optimization toolbox, use [lsqcurvefit] instead!
options = optimset('MaxFunEvals',50000,'MaxIter',50000,'TolFun',1e-25);
[param,fval,exitflag,output] = fminsearch(ferror,p,options);
% Calculate result
result = f(param);
% Find peaks
peaks = abs(sin(param(3)*2*pi*t+param(4)) - 1) < eps;
% Find troughs
troughs = abs(sin(param(3)*2*pi*t+param(4)) + 1) < eps;
end
As you can see, I first perform a Fourier transform to find initial estimates of the amplitude and frequency of the data. I then fit a sinusoid to the data using the model a + b sin(ct + d) + et. The fitted values represent a sine wave of which I know that +1 and -1 are the peaks and troughs, respectively. I can therefore identify these values as the signals.
This works very well for sinusoids with (slowly changing) trends and general (white) noise:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
% Loop through data (moving window) and fit sine wave
window = 250; % How many data points to consider
interval = 10; % How often to estimate
result = nan(size(y));
signal = zeros(size(y));
for i = window+1:interval:length(y)
data = y(i-window:i); % Get data window
period = t(i-window:i); % Get time window
[output, peaks, troughs] = fitsine(data,period,0.01);
result(i-interval:i) = output(end-interval:end);
signal(i-interval:i) = peaks(end-interval:end) - troughs(end-interval:end);
end
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,2); hold on;
title('Model fit');
plot(t,result,'-k'); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal,'r','LineWidth',2); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y,'-','Color',[0.1 0.1 0.1]);
scatter(t(markers),result(markers),30,'or','MarkerFaceColor','red');
xlim([0 max(t)]); ylim([-4 4]);
Main advantages of this approach are:
You have an actual model of your data, so you can predict signals in the future before they happen! (e.g. fix the model and calculate the result by inputting future time periods)
You don't need to estimate the model every period (see parameter interval in the code)
The disadvantage is that you need to select a lookback window, but you will have this problem with any method that you use for real-time detection.
Video demonstration
Data is the input data, Model fit is the fitted sine wave to the data (see code), Signal indicates the peaks and troughs and Signals marked on data gives an impression of how accurate the algorithm is. Note: watch the model fit adjust itself to the trend in the middle of the graph!
That should get you started. There are also a lot of excellent books on signal detection theory (just google that term), which will go much further into these types of techniques. Good luck!
Consider using findpeaks, it is fast, which may be important for realtime. You should filter high-frequency noise to improve accuracy. here I smooth the data with a moving window.
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
[~,iPeak0] = findpeaks(movmean(x,100),'MinPeakProminence',0.5);
You can time the process (0.0015sec)
f0 = #() findpeaks(movmean(x,100),'MinPeakProminence',0.5)
disp(timeit(f0,2))
To compare, processing the slope is only a bit faster (0.00013sec), but findpeaks have many useful options, such as minimum interval between peaks etc.
iPeaks1 = derivePeaks(x);
f1 = #() derivePeaks(x)
disp(timeit(f1,1))
Where derivePeaks is:
function iPeak1 = derivePeaks(x)
xSmooth = movmean(x,100);
goingUp = find(diff(movmean(xSmooth,100)) > 0);
iPeak1 = unique(goingUp([1,find(diff(goingUp) > 100),end]));
iPeak1(iPeak1 == 1 | iPeak1 == length(iPeak1)) = [];
end

Average set of color images and standard deviation

I am learning image analysis and trying to average set of color images and get standard deviation at each pixel
I have done this, but it is not by averaging RGB channels. (for ex rchannel = I(:,:,1))
filelist = dir('dir1/*.jpg');
ims = zeros(215, 300, 3);
for i=1:length(filelist)
imname = ['dir1/' filelist(i).name];
rgbim = im2double(imread(imname));
ims = ims + rgbim;
end
avgset1 = ims/length(filelist);
figure;
imshow(avgset1);
I am not sure if this is correct. I am confused as to how averaging images is useful.
Also, I couldn't get the matrix holding standard deviation.
Any help is appreciated.
If you are concerned about finding the mean RGB image, then your code is correct. What I like is that you converted the images using im2double before accumulating the mean and so you are making everything double precision. As what Parag said, finding the mean image is very useful especially in machine learning. It is common to find the mean image of a set of images before doing image classification as it allows the dynamic range of each pixel to be within a normalized range. This allows the training of the learning algorithm to converge quickly to the optimum solution and provide the best set of parameters to facilitate the best accuracy in classification.
If you want to find the mean RGB colour which is the average colour over all images, then no your code is not correct.
You have summed over all channels individually which is stored in sumrgbims, so the last step you need to do now take this image and sum over each channel individually. Two calls to sum in the first and second dimensions chained together will help. This will produce a 1 x 1 x 3 vector, so using squeeze after this to remove the singleton dimensions and get a 3 x 1 vector representing the mean RGB colour over all images is what you get.
Therefore:
mean_colour = squeeze(sum(sum(sumrgbims, 1), 2));
To address your second question, I'm assuming you want to find the standard deviation of each pixel value over all images. What you will have to do is accumulate the square of each image in addition to accumulating each image inside the loop. After that, you know that the standard deviation is the square root of the variance, and the variance is equal to the average sum of squares subtracted by the mean squared. We have the mean image, now you just have to square the mean image and subtract this with the average sum of squares. Just to be sure our math is right, supposing we have a signal X with a mean mu. Given that we have N values in our signal, the variance is thus equal to:
Source: Science Buddies
The standard deviation would simply be the square root of the above result. We would thus calculate this for each pixel independently. Therefore you can modify your loop to do that for you:
filelist = dir('set1/*.jpg');
sumrgbims = zeros(215, 300, 3);
sum2rgbims = sumrgbims; % New - for standard deviation
for i=1:length(filelist)
imname = ['set1/' filelist(i).name];
rgbim = im2double(imread(imname));
sumrgbims = sumrgbims + rgbim;
sum2rgbims = sum2rgbims + rgbim.^2; % New
end
rgbavgset1 = sumrgbims/length(filelist);
% New - find standard deviation
rgbstdset1 = ((sum2rgbims / length(filelist)) - rgbavgset.^2).^(0.5);
figure;
imshow(rgbavgset1, []);
% New - display standard deviation image
figure;
imshow(rgbstdset1, []);
Also to make sure, I've scaled the display of each imshow call so the smallest value gets mapped to 0 and the largest value gets mapped to 1. This does not change the actual contents of the images. This is just for display purposes.

How do I efficiently created a BW mask for this microscopic image?

So some background. I was tasked to write a matlab program to count the number yeast cells inside visible-light microscopic images. To do that I think the first step will be cell segmentation. Before I got the real experiment image set, I developed an algorithm use a test image set utilizing watershed. Which looks like this:
The first step of watershed is generating a BW mask for the cells. Then I would generate a bwdist image with imposed local minimums generated from the BW mask. With that I can generate the watershed easily.
As you can see my algorithm rely on the successful generation of BW mask. Because I need to generate the bwdist image and markers from it. Originally, I generate the BW mask following the following steps:
generate the Local standard deviation of image sdImage = stdfilt(grayImage, ones(9))
Use BW thresholding to generate the initial BW mask binaryImage = sdImage < 8;
use imclearborder to clear the background. Use some other code to add the cells on the border back.
Background finished. Here is my problem
But today I received the new real data sets. The image resolution is much smaller and the light condition is different from the test image set. The color depth is also much smaller. These make my algorithm useless. Here is it:
Using stdfilt failed to generate a good clean images. Instead it generate stuff like this (Note: I have adjusted parameters for the stdfilt function and the BW threshold value, following is the best result I can get) :
As you can see there are light pixels in the center of the cells that not necessary darker than the membrane. Which lead the bw thresholding generate stuff like this:
The new bw image after bw thresholding have either incomplete membrane or segmented cell centers and make them unsuitable to the other steps.
I only start image processing recently and have no idea how should I proceed. If you have an idea please help me! Thanks!
For your convience, I have attached a link from dropbox for a subset of the images
I think there's a fundamental problem in your approach. Your algorithm uses stdfilt in order to binarize the image. But what that essentially means is you're assuming there is there is low "texture" in the background and within the cell. This works for your first image. However, in your second image there is a "texture" within the cell, so this assumption is broken.
I think a stronger assumption is that there is a "ring" around each cell (valid for both images you posted). So I took the approach of detecting this ring instead.
So my approach is essentially:
Detect these rings (I use a 'log' filter and then binarize based on positive values. However, this results in a lot of "chatter"
Try to remove some of the "chatter" initially by filtering out very small and very large regions
Now, fill in these rings. However, there is still some "chatter" and filled regions between cells left
Again, remove small and large regions, but since the cells are filled, increase the bounds for what is acceptable.
There are still some bad regions, most of the bad areas are going to be regions between cells. Regions between cells are detectable by observing the curvature around the boundary of the region. They "bend inwards" a lot, which is expressed mathematically as a large portion of the boundary having a negative curvature. Also, to remove the rest of the "chatter", these regions will have a large standard deviation in the curvature of their boundary, so remove boundaries with a large standard deviation as well.
Overall, the most difficult part will be removing regions between cells and the "chatter" without removing the actual cells.
Anyway, here's the code (note there are a lot of heuristics and also it's very rough and based on code from older projects, homeworks, and stackoverflow answers so it's definitely far from finished):
cell = im2double(imread('cell1.png'));
if (size(cell,3) == 3)
cell = rgb2gray(cell);
end
figure(1), subplot(3,2,1)
imshow(cell,[]);
% Detect edges
hw = 5;
cell_filt = imfilter(cell, fspecial('log',2*hw+1,1));
subplot(3,2,2)
imshow(cell_filt,[]);
% First remove hw and filter out noncell hws
mask = cell_filt > 0;
hw = 5;
mask = mask(hw:end-hw-1,hw:end-hw-1);
subplot(3,2,3)
imshow(mask,[]);
rp = regionprops(mask, 'PixelIdxList', 'Area');
rp = rp(vertcat(rp.Area) > 50 & vertcat(rp.Area) < 2000);
mask(:) = false;
mask(vertcat(rp.PixelIdxList)) = true;
subplot(3,2,4)
imshow(mask,[]);
% Now fill objects
mask1 = true(size(mask) + hw);
mask1(hw+1:end, hw+1:end) = mask;
mask1 = imfill(mask1,'holes');
mask1 = mask1(hw+1:end, hw+1:end);
mask2 = true(size(mask) + hw);
mask2(hw+1:end, 1:end-hw) = mask;
mask2 = imfill(mask2,'holes');
mask2 = mask2(hw+1:end, 1:end-hw);
mask3 = true(size(mask) + hw);
mask3(1:end-hw, 1:end-hw) = mask;
mask3 = imfill(mask3,'holes');
mask3 = mask3(1:end-hw, 1:end-hw);
mask4 = true(size(mask) + hw);
mask4(1:end-hw, hw+1:end) = mask;
mask4 = imfill(mask4,'holes');
mask4 = mask4(1:end-hw, hw+1:end);
mask = mask1 | mask2 | mask3 | mask4;
% Filter out large and small regions again
rp = regionprops(mask, 'PixelIdxList', 'Area');
rp = rp(vertcat(rp.Area) > 100 & vertcat(rp.Area) < 5000);
mask(:) = false;
mask(vertcat(rp.PixelIdxList)) = true;
subplot(3,2,5)
imshow(mask);
% Filter out regions with lots of positive concavity
% Get boundaries
[B,L] = bwboundaries(mask);
% Cycle over boundarys
for i = 1:length(B)
b = B{i};
% Filter boundary - use circular convolution
b(:,1) = cconv(b(:,1),fspecial('gaussian',[1 7],1)',size(b,1));
b(:,2) = cconv(b(:,2),fspecial('gaussian',[1 7],1)',size(b,1));
% Find curvature
curv_vec = zeros(size(b,1),1);
for j = 1:size(b,1)
p_b = b(mod(j-2,size(b,1))+1,:); % p_b = point before
p_m = b(mod(j,size(b,1))+1,:); % p_m = point middle
p_a = b(mod(j+2,size(b,1))+1,:); % p_a = point after
dx_ds = p_a(1)-p_m(1); % First derivative
dy_ds = p_a(2)-p_m(2); % First derivative
ddx_ds = p_a(1)-2*p_m(1)+p_b(1); % Second derivative
ddy_ds = p_a(2)-2*p_m(2)+p_b(2); % Second derivative
curv_vec(j+1) = dx_ds*ddy_ds-dy_ds*ddx_ds;
end
if (sum(curv_vec > 0)/length(curv_vec) > 0.4 || std(curv_vec) > 2.0)
L(L == i) = 0;
end
end
mask = L ~= 0;
subplot(3,2,6)
imshow(mask,[])
Output1:
Output2:

Matching trajectories of whiskers

I am performing a whisker-tracking experiments. I have high-speed videos (500fps) of rats whisking against objects. In each such video I tracked the shape of the rat's snout and whiskers. Since tracking is noisy, the number of whiskers in each frame may be different (see 2 consecutive frames in attached image, notice the yellow false-positive whisker appearing in the left frame but not the right one).
See example 1:
As an end result of tracking, I get, for each frame, a varying number of variable-length vectors; each vector corresponding to a single whisker. At this point I would like to match the whiskers between frames. I have tried using Matlab's sample align to do this, but it works only somewhat properly. Its results are attached below (in attached image showing basepoint of all whiskers over 227 frames).
See example 2:
I would like to run some algorithm to cluster the whiskers correctly, such that each whisker is recognized as itself and separated from other over the course of many frames. In other words, I would like each slightly sinusoidal trajectory in the second image to be recognized as one trajectory. Whatever sorting algorithm I use should take into account that whiskers may disappear and reappear between consecutive frames. Unfortunately, I'm all out of ideas...
Any help?
Once again, keep in mind that for each point in attached image 2, I have many data points, since this is only a plot of whisker basepoint, while in actuality I have data for the entire whisker length.
This is how I would deal with the problem. Assuming that data vectors of different size are in a cell type called dataVectors, and knowing the number of whiskers (nSignals), I would try to extend the data to a second dimension derived from the original data and then perform k-means on two dimensions.
So, first I would get the maximum size of the vectors in order to convert the data to a matrix and do NaN-padding.
maxSize = -Inf;
for k = 1:nSignals
if length(dataVectors{k}.data) > maxSize
maxSize = length(dataVectors{k}.data);
end
end
Now, I would make the data 2D by elevating it to power of two (or three, your choice). This is just a very simple transformation. But you could alternatively use kernel methods here and project each vector against the rest; however, I don't think this is necessary, and if your data is really big, it could be inefficient. For now, raising the data to the power of two should do the trick. The result is stored in a second dimension.
projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
vecSize = length(dataVectors{k}.data);
projData(k, 1:vecSize, 1) = dataVectors{k}.data;
projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);
Here, projData will have in row 1 and column 1, the original data of the first whisker (or signal as I call it here), and column 2 will have the new dimension. Let's suppose that you have 8 whiskers in total, then, projData will have the data of the first whisker in row 1, 9, 17, and so on. The data of the second whisker in row 2, 10, 18, and so forth. That is important if you want to work your way back to the original data. Also, you can try with different projDegrees but I doubt it will make a lot of difference.
Now we perform k-means on the 2D data; however, we provide the initial points instead of letting it determine them with k-means++. The initial points, as I propose here, are the first data point of each vector for each whisker. In this manner, k-means will depart from there and will move to clusters means accordingly. We save the results in idxK.
idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));
And there you have it. The variable idxK will tell you which data point belongs to what cluster.
Below is a working example of my proposed solution. The first part is simply trying to produce data that looks like your data, you can skip it.
rng(9, 'twister')
nSignals = 8; % number of whiskers
n = 1000; % number of data points
allData = zeros(nSignals, n); % all the data will be stored here
% this loop will just generate some data that looks like yours
for k = 1:nSignals
x = sort(rand(1,n));
nPeriods = round(rand*9)+1; % the sin can have between 1-10 periods
nShiftAmount = round(randn*30); % shift between ~ -100 to +100
y = sin(x*2*pi*nPeriods) + (randn(1,n).*0.5);
y = y + nShiftAmount;
allData(k, :) = y;
end
nanIdx = round(rand(1, round(n*0.05)*nSignals).*((n*nSignals)-1))+1;
allData(nanIdx) = NaN; % about 5% of the data is now missing
figure(1);
for k = 1:nSignals
nanIdx = ~isnan(allData(k, :));
dataVectors{k}.data = allData(k, nanIdx);
plot(dataVectors{k}.data, 'kx'), hold on;
end
% determine the max size
maxSize = -Inf;
for k = 1:nSignals
if length(dataVectors{k}.data) > maxSize
maxSize = length(dataVectors{k}.data);
end
end
% making the data now into two dimensions and NaN pad
projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
vecSize = length(dataVectors{k}.data);
projData(k, 1:vecSize, 1) = dataVectors{k}.data;
projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);
figure(2); plot(projData(:,1), projData(:,2), 'kx');
% run k-means using the first points of all measure as the initial points
idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));
figure(3);
liColors = [{'yx'},{'mx'},{'cx'},{'bx'},{'kx'},{'gx'},{'rx'},{'gd'}];
for k = 1:nSignals
plot(projData(idxK==k,1), projData(idxK==k,2), liColors{k}), hold on;
end
% plot results on original data
figure(4);
for k = 1:nSignals
plot(projData(idxK==k,1), liColors{k}), hold on;
end
Let me know if this helps.

Resources