Grouping data using loops (signal processing in MATLAB) - algorithm

I am working in MATLAB with a signal data that consist of consecutive dips as shown below. I am trying to write a code which sorts the contents of each dip into a separate group. How should the general structure of such a code look like?
The following is my data. I am only interested in the portion of the signal that lies below a certain threshold d (the red line):
And here is the desired grouping:
Here is an unsuccessful attempt:
k=0; % Group number
for i = 1 : length(signal)
if signal(i) < d
k=k+1;
while signal(i) < d
NewSignal(i, k) = signal(i);
i = i + 1;
end
end
end
The code above generated 310 groups instead of the desired 12 groups.
Any explanation would be greatly appreciated.

Taking Benl generated data you can do the following:
%generate data
x=1:1000;
y=sin(x/20);
for ii=1:9
y=y+-10*exp(-(x-ii*100).^2./10);
end
y=awgn(y,4);
%set threshold
t=-4;
%threshold data
Y = char(double(y<t) + '0'); %// convert to string of zeros and ones
%search for start and ends
This idea is taken from here
[s, e] = regexp(Y, '1+', 'start', 'end');
%and now plot and see that each pair of starts and end
% represents a group
plot(x,y)
hold on
for k=1:numel(s)
line(s(k)*ones(2,1),ylim,'Color','k','LineStyle','--')
line(e(k)*ones(2,1),ylim,'Color','k','LineStyle','-')
end
hold off
legend('Data','Starts','Ends')
Comments: First of all I choose an arbitrary threshold, it is up to you to find the "best" one in your data. Additionally I didn't group the data explicitly but rather this approach gives you the start and end of each epoch with a dip (you might call it group). So you could say that each index is the grouping index. Finally I did not debug this approach for corner cases, when dips fall on starts and ends...

In MATLAB you cannot change the loop index of a for loop. A for loop:
for i = array
loops over each column of array in turn. In your code, 1 : length(signal) is an array, each of its elements is visited in turn. Inside this loop there is a while loop that increments i. However, when this while loop ends and the next iteration of the for loop runs, i is reset to the next item in the array.
This code therefore needs two while loops:
i = 1; % Index
k = 0; % Group number
while i <= numel(signal)
if signal(i) < d
k = k + 1;
while signal(i) < d
NewSignal(i,k) = signal(i);
i = i + 1;
end
end
i = i + 1;
end

Easy, the function you're looking for is bwlabel, which when combined with logical indexing makes this simple.
To start I made some fake data which resembled your data
x=1:1000;
y=sin(x/20);
for ii=1:9
y=y+-10*exp(-(x-ii*100).^2./10);
end
y=awgn(y,4);
plot(x,y)
Then set your threshold and use 'bwlabel'
d=-4;% set the threshold
groupid=bwlabel(y<d);
bwlabel labels connected groups in a black and white image, what we've effectively done here is make a black and white (logical 0 & 1) 1D image in the logical vector y<d. bwlabel returns the number of the region at the index of the region. We're not interested in the 0 region, so to get the x values or y values of the nth region, simply use x(groupid==n), for example with my test data
x_4=x(groupid==4)
y_4=y(groupid==4)
x_4 = 398 399 400 401 402
y_4 = -5.5601 -7.8280 -9.1965 -7.9083 -5.8751

Related

Parallel loops and image processing in matlab

I am going to implement a salient object detection method based on a simple linear feedback control system (LFCS). The control system model is represented as in the following equation:
I've come up with the following program codes but the result would not be what should be. Specifically, the output should be something like the following image:
But the code produces this output:
The codes are as follows.
%Calculation of euclidian distance between adjacent superpixels stores in variable of Euc
A = imread('aa.jpg');
[rows, columns, cnumberOfColorChannels] = size(A);
[L,N] = superpixels(A,400);
%% Determination of adjacent superpixels
glcms = graycomatrix(L,'NumLevels',N,'GrayLimits',[1,N],'Offset',[0,1;1,0]); %Create gray-level co-occurrence matrix from image
glcms = sum(glcms,3); % add together the two matrices
glcms = glcms + glcms.'; % add upper and lower triangles together, make it symmetric
glcms(1:N+1:end) = 0; % set the diagonal to zero, we don't want to see "1 is neighbor of 1"
idx = label2idx(L); % Convert label matrix to cell array of linear indices
numRows = size(A,1);
numCols = size(A,2);
%%Mean color in Lab color space for each channel
data = zeros(N,3);
for labelVal = 1:N
redIdx = idx{labelVal};
greenIdx = idx{labelVal}+numRows*numCols;
blueIdx = idx{labelVal}+2*numRows*numCols;
data(labelVal,1) = mean(A(redIdx));
data(labelVal,2) = mean(A(greenIdx));
data(labelVal,3) = mean(A(blueIdx));
end
Euc=zeros(N);
%%Calculation of euclidian distance between adjacent superpixels stores in Euc
for a=1:N
for b=1:N
if glcms(a,b)~=0
Euc(a,b)=sqrt(((data(a,1)-data(b,1))^2)+((data(a,2)-data(b,2))^2)+((data(a,3)-data(b,3))^2));
end
end
end
%%Creation of Connectivity matrix "W" between adjacent superpixels
W=zeros(N);
W_num=zeros(N);
W_den=zeros(N);
OMG1=0.1;
for c=1:N
for d=1:N
if(Euc(c,d)~=0)
W_num(c,d)=exp(-OMG1*(Euc(c,d)));
W_den(c,c)=W_num(c,d)+W_den(c,c); %
end
end
end
%Connectivity matrix W between adjacent superpixels
for e=1:N
for f=1:N
if(Euc(e,f)~=0)
W(e,f)=(W_num(e,f))/(W_den(e,e));
end
end
end
%%calculation of geodesic distance between nonadjacent superpixels stores in variable "s_star_temp"
s_star_temp=zeros(N); %temporary variable for geodesic distance measurement
W_sparse=zeros(N);
W_sparse=sparse(W);
for g=1:N
for h=1:N
if W(g,h)==0 & g~=h;
s_star_temp(g,h)=graphshortestpath(W_sparse,g,h,'directed',false);
end
end
end
%%Calculation of connectivity matrix for nonadjacent superpixels stores in "S_star" variable"
S_star=zeros(N);
OMG2=8;
for i=1:N
for j=1:N
if s_star_temp(i,j)~=0
S_star(i,j)=exp(-OMG2*s_star_temp(i,j));
end
end
end
%%Calculation of connectivity matrix "S" for measuring connectivity between all superpixels
S=zeros(N);
S=S_star+W;
%% Defining non-isolation level for connectivity matrix "W"
g_star=zeros(N);
for k=1:N
g_star(k,k)=max(W(k,:));
end
%%Limiting the range of g_star and calculation of isolation cue matrix "G"
alpha1=0.15;
alpha2=0.85;
G=zeros(N);
for l=1:N
G(l,l)=alpha1*(g_star(l,l)- min(g_star(:)))/(max(g_star(:))- min(g_star(:)))+(alpha2 - alpha1);
end
%%Determining the supperpixels that surrounding the image boundary
lr = L([1,end],:);
tb = L(:,[1,end]);
labels = unique([lr(:);tb(:)]);
%% Calculation of background likelihood for each superpixels stores in"BgLike"
sum_temp=0;
temp=zeros(1,N);
BgLike=zeros(N,1);
BgLike_num=zeros(N);
BgLike_den=zeros(N);
for m=1:N
for n=1:N
if ismember(n,labels)==1
BgLike_num(m,m)=S(m,n)+ BgLike_num(m,m);
end
end
end
for o=1:N
for p=1:N
for q=1:N
if W(p,q)~=0
temp(q)=S(o,p)-S(o,q);
end
end
sum_temp=max(temp)+sum_temp;
temp=0;
end
BgLike_den(o,o)=sum_temp;
sum_temp=0;
end
for r=1:N
BgLike(r,1)= BgLike_num(r,r)/BgLike_den(r,r);
end
%%%%Calculation of Foreground likelihood for each superpixels stores in "FgLike"
FgLike=zeros(N,1);
for s=1:N
for t=1:N
FgLike(s,1)=(exp(-BgLike(t,1))) * Euc(s,t)+ FgLike(s,1);
end
end
The above codes are prerequisite for the following sections (in fact, they produce necessary data and matrices for the next section. The aforementioned codes provided to make the whole process reproducible).
Specifically, I think that this section did not give the desired results. I'm afraid I did not properly simulate the parallelism using for loops. Moreover, the terminating conditions (employed with for and if statements to simulate do-while loop) are never satisfied and the loops continue until the last iteration (instead terminating when a specified condition occurs). A major concern here is that if the terminating conditions are properly implemented.
The pseudo algorithm for the following code is as the image below:
%%parallel operations for background and foreground implemented here
T0 = 0 ;
Tf = 20 ;
Ts = 0.1 ;
Ti = T0:Ts:Tf ;
Nt=numel(Ti);
Y_Bg=zeros(N,Nt);
Y_Fg=zeros(N,Nt);
P_Back_Bg=zeros(N,N);
P_Back_Fg=zeros(N,N);
u_Bg=zeros(N,Nt);
u_Fg=zeros(N,Nt);
u_Bg_Star=zeros(N,Nt);
u_Fg_Star=zeros(N,Nt);
u_Bg_Normalized=zeros(N,Nt);
u_Fg_Normalized=zeros(N,Nt);
tau=0.1;
sigma_Bg=zeros(Nt,N);
Temp_Bg=0;
Temp_Fg=0;
C_Bg=zeros(Nt,N);
C_Fg=zeros(Nt,N);
%%System Initialization
for u=1:N
u_Bg(u,1)=(BgLike(u,1)- min(BgLike(:)))/(max(BgLike(:))- min(BgLike(:)));
u_Fg(u,1)=(FgLike(u,1)- min(FgLike(:)))/(max(FgLike(:))- min(FgLike(:)));
end
%% P_state and P_input
P_state=G*W;
P_input=eye(N)-G;
% State Initialization
X_Bg=zeros(N,Nt);
X_Fg=zeros(N,Nt);
for v=1:20 % v starts from 1 because we have no matrices with 0th column number
%The first column of X_Bg and X_Fg is 0 for system initialization
X_Bg(:,v+1)=P_state*X_Bg(:,v) + P_input*u_Bg(:,v);
X_Fg(:,v+1)=P_state*X_Fg(:,v) + P_input*u_Fg(:,v);
v=v+1;
if v==2
C_Bg(1,:)=1;
C_Fg(1,:)=1;
else
for w=1:N
for x=1:N
Temp_Fg=S(w,x)*X_Fg(x,v-1)+Temp_Fg;
Temp_Bg=S(w,x)*X_Bg(x,v-1)+Temp_Bg;
end
C_Fg(v-1,w)=inv(X_Fg(w,v-1)+((Temp_Bg)/(Temp_Fg)*(1-X_Fg(w,v-1))));
C_Bg(v-1,w)=inv(X_Bg(w,v-1)+((Temp_Fg)/(Temp_Bg))*(1-X_Bg(w,v-1)));
Temp_Bg=0;
Temp_Fg=0;
end
end
P_Bg=diag(C_Bg(v-1,:));
P_Fg=diag(C_Fg(v-1,:));
Y_Bg(:,v)= P_Bg*X_Bg(:,v);
Y_Fg(:,v)= P_Fg*X_Fg(:,v);
for y=1:N
Temp_sig_Bg=0;
Temp_sig_Fg=0;
for z=1:N
Temp_sig_Bg = Temp_sig_Bg +S(y,z)*abs(Y_Bg(y,v)- Y_Bg(z,v));
Temp_sig_Fg = Temp_sig_Fg +S(y,z)*abs(Y_Fg(y,v)- Y_Fg(z,v));
end
if Y_Bg(y,v)>= Y_Bg(y,v-1)
sign_Bg=1;
else
sign_Bg=-1;
end
if Y_Fg(y,v)>= Y_Fg(y,v-1)
sign_Fg=1;
else
sign_Fg=-1;
end
sigma_Bg(v-1,y)=sign_Bg*Temp_sig_Bg;
sigma_Fg(v-1,y)=sign_Fg*Temp_sig_Fg;
end
%Calculation of P_Back for background and foreground
P_Back_Bg=tau*diag(sigma_Bg(v-1,:));
P_Back_Fg=tau*diag(sigma_Fg(v-1,:));
u_Bg_Star(:,v)=u_Bg(:,v-1)+P_Back_Bg*Y_Bg(:,v);
u_Fg_Star(:,v)=u_Fg(:,v-1)+P_Back_Fg*Y_Fg(:,v);
for aa=1:N %Normalization of u_Bg and u_Fg
u_Bg(aa,v)=(u_Bg_Star(aa,v)- min(u_Bg_Star(:,v)))/(max(u_Bg_Star(:,v))-min(u_Bg_Star(:,v)));
u_Fg(aa,v)=(u_Fg_Star(aa,v)- min(u_Fg_Star(:,v)))/(max(u_Fg_Star(:,v))-min(u_Fg_Star(:,v)));
end
if (max(abs(Y_Fg(:,v)-Y_Fg(:,v-1)))<=0.0118) &&(max(abs(Y_Bg(:,v)-Y_Bg(:,v-1)))<=0.0118) %% epsilon= 0.0118
break;
end
end
Finally, the saliency map will be generated by using the following codes.
K=4;
T=0.4;
phi_1=(2-(1-T)^(K-1))/((1-T)^(K-2));
phi_2=(1-T)^(K-1);
phi_3=1-phi_1;
for bb=1:N
Y_Output_Preliminary(bb,1)=Y_Fg(bb,v)/((Y_Fg(bb,v)+Y_Bg(bb,v)));
end
for hh=1:N
Y_Output(hh,1)=(phi_1*(T^K))/(phi_2*(1-Y_Output_Preliminary(hh,1))^K+(T^K))+phi_3;
end
V_rs=zeros(N);
V_Final=zeros(rows,columns);
for cc=1:rows
for dd=1:columns
V_rs(cc,dd)=Y_Output(L(cc,dd),1);
end
end
maxDist = 10; % Maximum chessboard distance from image
wSF=zeros(rows,columns);
wSB=zeros(rows,columns);
% Get the range of x and y indices who's chessboard distance from pixel (0,0) are less than 'maxDist'
xRange = (-(maxDist-1)):(maxDist-1);
yRange = (-(maxDist-1)):(maxDist-1);
% Create a mesgrid to get the pairs of (x,y) of the pixels
[pointsX, pointsY] = meshgrid(xRange, yRange);
pointsX = pointsX(:);
pointsY = pointsY(:);
% Remove pixel (0,0)
pixIndToRemove = (pointsX == 0 & pointsY == 0);
pointsX(pixIndToRemove) = [];
pointsY(pixIndToRemove) = [];
for ee=1:rows
for ff=1:columns
% Get a shifted copy of 'pointsX' and 'pointsY' that is centered
% around (x, y)
pointsX1 = pointsX + ee;
pointsY1 = pointsY + ff;
% Remove the the pixels that are out of the image bounds
inBounds =...
pointsX1 >= 1 & pointsX1 <= rows &...
pointsY1 >= 1 & pointsY1 <= columns;
pointsX1 = pointsX1(inBounds);
pointsY1 = pointsY1(inBounds);
% Do stuff with 'pointsX1' and 'pointsY1'
wSF_temp=0;
wSB_temp=0;
for gg=1:size(pointsX1)
Temp=exp(-OMG1*(sqrt(double(A(pointsX1(gg),pointsY1(gg),1))-double(A(ee,ff,1)))^2+(double(A(pointsX1(gg),pointsY1(gg),2))-double(A(ee,ff,2)))^2 + (double(A(pointsX1(gg),pointsY1(gg),3))-double(A(ee,ff,3)))^2));
wSF_temp=wSF_temp+(Temp*V_rs(pointsX1(gg),pointsY1(gg)));
wSB_temp=wSB_temp+(Temp*(1-V_rs(pointsX1(gg),pointsY1(gg))));
end
wSF(ee,ff)= wSF_temp;
wSB(ee,ff)= wSB_temp;
V_Final(ee,ff)=V_rs(ee,ff)/(V_rs(ee,ff)+(wSB(ee,ff)/wSF(ee,ff))*(1-V_rs(ee,ff)));
end
end
imshow(V_Final,[]); %%Saliency map of the image
Part of your terminating criterion is this:
max(abs(Y_a(:,t)-Y_a(:,t-1)))<=eps
Say Y_a tends to 2. You are really close... In fact, the closest you can get without subsequent values being identical is Y_a(t)-Y_a(t-1) == 4.4409e-16. If the two values were any closer, their difference would be 0, because this is the precision with which floating-point values can be represented. So you have reached this fantastic level of closeness to your goal. Subsequent iterations are changing the target value by the smallest possible amount, 4.4409e-16. But your test is returning false! Why? Because eps == 2.2204e-16!
eps is short-hand for eps(1), the difference between 1 and the next representable larger value. Because how floating-point values are represented, this difference is half the difference between 2 and the next representable larger value (which is given by eps(2).
However, if Y_a tends to 1e-16, subsequent iterations could double or halve the value of Y_a and you'd still meet the stopping criterion!
Thus, what you need is to come up with a reasonable stopping criterion that is a fraction of the target value, something like this:
max(abs(Y_a(:,t)-Y_a(:,t-1))) <= 1e6 * eps(max(abs(Y_a(:,t))))
Unsolicited advice
You should really look into vectorized operations in MATLAB. For example,
for y=1:N
Temp_sig_a=0;
for z=1:N
Temp_sig_a = Temp_sig_a + abs(Y_a(y,t)- Y_a(z,t));
end
sigma_a(t-1,y)= Temp_sig_a;
end
can be written as
for y=1:N
sigma_a(t-1,y) = sum(abs(Y_a(y,t) - Y_a(:,t)));
end
which in turn can be written as
sigma_a(t-1,:) = sum(abs(Y_a(:,t).' - Y_a(:,t)));
Avoiding loops is not only usually more efficient, but it also leads to shorter code that is easier to read.
Also, this:
P_FB_a = diag(sigma_a(t-1,:));
u_a(:,t) = u_a(:,t-1) + P_FB_a * Y_a(:,t);
is the same as
u_a(:,t) = u_a(:,t-1) + sigma_a(t-1,:).' .* Y_a(:,t);
but of course creating a diagonal matrix and doing a matrix multiplication with so many zeros is much more expensive than directly computing an element-wise multiplication.

Divide image blocks into two chunks

I'm working on an image and I divided into non overlapping blocks, what I want to do next is to apply some changes on every two adjacent chunks of the same blocks. For example, I have a block B1 and I divided it into B11 and B12.
The objective here is to apply SVD on B11 and B12 and compare their singular values: S11(2,2) and S12(2,2) and so on for every other blocks. But I don't know how to work on every two sub-blocks adjacent I only can apply SVD for blocks Bi.
It seems like I need a loop for to process every two sub-blocks right or can the function mat2tiles do this?
This is an example explaining what I'm saying.
I finally found the way how to get my objective, and I'm sharing in case somebody needed too:
%First this a function that help me to divide my image into non overlapping block
function B=Block(IM,p,q)
p=p %% number of rows
q=q %% number of cols
[m,n] = size(IM);
IJ = zeros(p,q);
z = 1;
for m1 = 1:m/p
for n1 = 1:n/q
if m1*p <= m;
if n1*q <= n;
for i = (m1-1)*p+1:m1*p
for j = (n1-1)*q+1:n1*q
IJ(i-(m1-1)*p,j-(n1-1)*q) = IM(i,j);
if (i-(m1-1)*p)==p&&(j-(n1-1)*q)==q;
OUT = IJ;
B{1,z} = OUT;
z = z+1;
end
end
end
end
end
end
end
Now I will divide each blocks into sub-blocks and apply SVD
Block_Num = (m*n)/(p*q) % To get how many blocks we have
fun= #svd % the function we use is SVD
for i=1:Block_Num
sv=blkproc(B{i},[4 4],fun) % in my case I wanted to apply SVD
of every sub-block of 4*4
end
And this get the job done. hope it will be useful for you too

Using non-continuous integers as identifiers in cells or structs in Matlab

I want to store some results in the following way:
Res.0 = magic(4); % or Res.baseCase = magic(4);
Res.2 = magic(5); % I would prefer to use integers on all other
Res.7 = magic(6); % elements than the first.
Res.2000 = 1:3;
I want to use numbers between 0 and 3000, but I will only use approx 100-300 of them. Is it possible to use 0 as an identifier, or will I have to use a minimum value of 1? (The numbers have meaning, so I would prefer if I don't need to change them). Can I use numbers as identifiers in structs?
I know I can do the following:
Res{(last number + 1)} = magic(4);
Res{2} = magic(5);
Res{7} = magic(6);
Res{2000} = 1:3;
And just remember that the last element is really the "number zero" element.
In this case I will create a bunch of empty cell elements [] in the non-populated positions. Does this cause a problem? I assume it will be best to assign the last element first, to avoid creating a growing cell, or does this not have an effect? Is this an efficient way of doing this?
Which will be most efficient, struct's or cell's? (If it's possible to use struct's, that is).
My main concern is computational efficiency.
Thanks!
Let's review your options:
Indexing into a cell arrays
MATLAB indices start from 1, not from 0. If you want to store your data in cell arrays, in the worst case, you could always use the subscript k + 1 to index into cell corresponding to the k-th identifier (k ≥ 0). In my opinion, using the last element as the "base case" is more confusing. So what you'll have is:
Res{1} = magic(4); %// Base case
Res{2} = magic(5); %// Corresponds to identifier 1
...
Res{k + 1} = ... %// Corresponds to indentifier k
Accessing fields in structures
Field names in structures are not allowed to begin with numbers, but they are allowed to contain them starting from the second character. Hence, you can build your structure like so:
Res.c0 = magic(4); %// Base case
Res.c1 = magic(5); %// Corresponds to identifier 1
Res.c2 = magic(6); %// Corresponds to identifier 2
%// And so on...
You can use dynamic field referencing to access any field, for instance:
k = 3;
kth_field = Res.(sprintf('c%d', k)); %// Access field k = 3 (i.e field 'c3')
I can't say which alternative seems more elegant, but I believe that indexing into a cell should be faster than dynamic field referencing (but you're welcome to check that out and prove me wrong).
As an alternative to EitanT's answer, it sounds like matlab's map containers are exactly what you need. They can deal with any type of key and the value may be a struct or cell.
EDIT:
In your case this will be:
k = {0,2,7,2000};
Res = {magic(4),magic(5),magic(6),1:3};
ResMap = containers.Map(k, Res)
ResMap(0)
ans =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
I agree with the idea in #wakjah 's comment. If you are concerned about the efficiency of your program it's better to change the interpretation of the problem. In my opinion there is definitely a way that you could priorotize your data. This prioritization could be according to the time you acquired them, or with respect to the inputs that they are calculated. If you set any kind of priority among them, you can sort them into an structure or cell (structure might be faster).
So
Priority (Your Current Index Meaning) Data
1 0 magic(4)
2 2 magic(5)
3 7 magic(6)
4 2000 1:3
Then:
% Initialize Result structure which is different than your Res.
Result(300).Data = 0; % 300 the maximum number of data
Result(300).idx = 0; % idx or anything that represent the meaning of your current index.
% Assigning
k = 1; % Priority index
Result(k).idx = 0; Result(k).Data = magic(4); k = k + 1;
Result(k).idx = 2; Result(k).Data = magic(5); k = k + 1;
Result(k).idx = 7; Result(k).Data = magic(6); k = k + 1;
...

MATLAB loop optimization

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both "true" as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
intersect(i,j) = numel(intersect(a,b));
end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not saved as sparse in MATLAB because otherwise the performance become the worst possible.
Since the [i,j] entry counts the number of non zero elements in the element-wise multiplication of rows i and j, you can do it by multiplying matrix_logical with its transpose (you should convert to numeric data type first, e.g matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
And it works both for sparse or full representation.
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000)
for ii = 1:49999
for jj = ii:50000
intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
intersct(jj, ii) = intersct(ii, jj);
end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(martrix_logical,#distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.

Matlab get rid of loops

I'm a newbie in Matlab and trying get rid of Java/C++ customs.
The question is "how I can get rid of these for loops."
I tried to use nchoosek(n0,2) to get rid of one of the loops but another problem arose.(performance of nchoosek)
<Matlab code>
for j=2:n0
for i=1:j-1
%wij is the number of rows of A that have 1 at both column i and column j
%summing col i and j to find #of common 1's
wij = length(find((A(:,i)+A(:,j))==2));
%store it
W(1,j)=wij;
%testing whether the intersection of any two columns is too large
if wij>= (1+epsilon)*u2;
%create and edge between col i j
end
end
end
</matlab Code>
I assume that A is an array with only 0 and 1.
Then you can create a nCol-by-nCol array B with the "distance" between colums by writing
B = A'*A; %'# B(i,j) = length(find((A(:,i)+A(:,j))==2))
%# threshold
largeIntersection = B >= u2;
%# find i,j of large intersections
[largeIJ(:,1),largeIJ(:,2)] = find(largeIntersection);
%# make sure we only get unique i,j pairs
largeIJ = unique(sort(largeIJ,2),'rows');

Resources