I'm attempting to implement Neville's algorithm in MATLAB for four given points, but I'm a little stuck at the moment.
This is my script so far:
% Neville's Method
% Function parameters
x = [7,14,21,28];
fx = [58,50,54,53];
t = 10;
n = length(x);
Q = zeros(n,n);
for i = 1:n
    Q(i,1) = fx(i);
end
for j = 2:n
    for i = j:n
        Q(i,j) = ((t-x(i-j)) * Q(i,j-1)/(x(i)-x(i-j))) + ((x(i)-t) * Q(i-1,j-1)/(x(i)-x(i-j)));
    end
end
disp(Q);
As for the problem I'm having, I'm consistently getting this error:
Subscript indices must either be real positive integers or logicals.
I've been trying to tweak the loop bounds, but to no avail. I know the problem is in the main assignment line of the inner loop: some of the index expressions evaluate to zero on the first iteration.
That's where I am, any help would be appreciated!
In your inner loop, i-j is 0 on the first iteration because i starts at j, and in MATLAB indices start at 1. A simple fix to get the code running is to change
for i = j:n
to
for i = j+1:n
This solves
Subscript indices must either be real positive integers or logicals.
However, this only makes the code run; it may not be what you want, and you may need to rethink your logic. The output I get is
>> neville
Q =
58.0000 0 0 0
50.0000 0 0 0
54.0000 50.8571 0 0
53.0000 54.2857 51.3469 0
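For reference, here is a minimal sketch (not your original code) of the standard Neville recurrence with the indices shifted by one so that every subscript stays at least 1; the interpolated value at t ends up in Q(n,n).
% Sketch of the standard Neville table with shifted indexing (i-j+1).
x  = [7, 14, 21, 28];
fx = [58, 50, 54, 53];
t  = 10;
n  = length(x);
Q  = zeros(n, n);
Q(:, 1) = fx(:);
for j = 2:n
    for i = j:n
        Q(i, j) = ((t - x(i-j+1)) * Q(i, j-1) + (x(i) - t) * Q(i-1, j-1)) ...
                  / (x(i) - x(i-j+1));
    end
end
disp(Q(n, n))   % interpolated value at t, using all four points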
I have a Vector of Vectors, W, whose inner vectors have different lengths. These inner vectors contain integers between 0 and 150,000 in steps of 5, but can also be empty. I am trying to compute the empirical CDF for each of these vectors. I can compute the CDFs by iterating over every vector and every step value, like this:
cdfdict = Dict{Tuple{Int,Int},Float64}()
for i in 1:length(W)
    v = W[i]
    len = length(v)
    if len == 0
        pcdf = 1.0
    else
        for j in 0:5:150_000
            pcdf = length(v[v .<= j])/len
            cdfdict[i, j] = pcdf
        end
    end
end
However, this approach is inefficient because the cdf will be equal to 1 for j >= maximum(v) and sometimes this maximum(v) will be much lower than 150,000.
My question is: how can I include a condition that breaks out of the j loop for j > maximum(v) but still assigns pcdf = 1.0 for the rest of js?
I tried including a break when j > maximum(v) but this, of course, stops the loop from continuing for the rest of js. Also, I can break the loop and then use get! to access/include 1.0 for keys not found in cdfdict later on, but that is not what I'm looking for.
To elaborate on my comment, this answer details an implementation which fills an Array instead of a Dict.
First, create a random test case:
W = [rand(0:mv,rand(0:10)) for mv in floor(Int,exp(log(150_000)*rand(10)))]
Next create an array of the right size filled with 1.0s:
cdfmat = ones(Float64,length(W),length(0:5:150_000));
Now to fill the beginning of the CDFs:
for i=1:length(W)
    v = sort(W[i])
    k = 1
    thresh = 0
    for j=1:length(v)
        if (j>1 && v[j]==v[j-1])
            continue
        end
        pcdf = (j-1)/length(v)
        while thresh<v[j]
            cdfmat[i,k]=pcdf
            k += 1
            thresh += 5
        end
    end
end
This implementation uses a sort, which can sometimes be slow, but the other implementations essentially compare the vector against every threshold value, which is even slower in most cases.
break only exits one level of loop. You can do what you want by wrapping the for loop in a function and using return (where you would have put break), or by using @goto.
Alternatively, where you would break, set a boolean flag breakd = true and then break, and at the bottom of the outer loop do if breakd break end.
You can use another for loop to set all remaining elements to 1.0. The inner loop becomes
m = maximum(v)
for j in 0:5:150_000
    if j > m
        for k in j:5:150_000
            cdfdict[i, k] = 1.0
        end
        break
    end
    pcdf = count(x -> x <= j, v)/len
    cdfdict[i, j] = pcdf
end
However, this is rather hard to understand. It would be easier to use a branch. In fact, this should be just as fast because the branch is very predictable.
m = maximum(v)
for j in 0:5:150_000
    if j > m
        cdfdict[i, j] = 1.0
    else
        pcdf = count(x -> x <= j, v)/len
        cdfdict[i, j] = pcdf
    end
end
Another answer gave an implementation using an Array which calculated the CDF by sorting the samples and filling up the CDF bins with quantile values. Since the whole Array is thus filled, doing another pass on the array should not be overly costly (we tolerate a single pass already). The sorting bit and the allocation accompanying it can be avoided by calculating a histogram in the array and using cumsum to produce a CDF. Perhaps the code will explain this better:
Initialize sizes, lengths and widths:
n = 10; w = 5; rmax = 150_000; hl = length(0:w:rmax)
Produce a sample example:
W = [rand(0:mv,rand(0:10)) for mv in floor(Int,exp(log(rmax)*rand(n)))];
Calculate the CDFs:
cdfmat = zeros(Float64,n,hl); # empty histograms
for i=1:n                     # drop samples into histogram bins
    for j=1:length(W[i])
        cdfmat[i,1+(W[i][j]+w-1)÷5] += one(Float64)
    end
end
cumsum!(cdfmat,cdfmat,2)           # calculate pre-CDF by cumsum
for i=1:n                          # normalize each CDF by total
    if cdfmat[i,hl]==zero(Float64) # check if histogram empty?
        for j=1:hl                 # CDF of 1.0 as default (might be changed)
            cdfmat[i,j] = one(Float64)
        end
    else                           # the normalization factor calc-ed once
        f = one(Float64)/cdfmat[i,hl]
        for j=1:hl
            cdfmat[i,j] *= f
        end
    end
end
(a) Note the use of one and zero to prepare for a change of the Real type - this is good practice. (b) Adding @inbounds and @simd annotations to the loops should optimize further. (c) Putting this code in a function is recommended (this is not done in this answer). (d) If having a zero CDF for empty samples is OK (semantically, no samples would mean huge samples), then the second for loop can be simplified.
See the other answers for more options, and a reminder: premature optimization is the root of all evil (Knuth).
I would be very interested in suggestions on how to improve the performance of the following nested for loop:
I = (U > q); % matrix of indicator variables, I(i,j) is 1 if U(i,j) > q
for i = 2:K
    for j = 1:(i-1)
        mTau(i,j) = sum(I(:,i) .* I(:,j));
        mTau(j,i) = mTau(i,j);
    end
end
For each pair of variables, the code counts the number of observations for which both variables exceed the threshold q, filling a symmetric matrix. I appreciate your help!
You can use matrix multiplication:
I = double(U>q);
mTau = I.'*I;
This will have non-zero values on the diagonal, which you can set to zero with
mTau = mTau - diag(diag(mTau));
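If you want to convince yourself that this matches the loop, a quick comparison on small random data (the sizes here are made up) could look like:
% Hypothetical small test comparing the double loop with the matrix product.
K = 6; nObs = 20; q = 0.5;
U = rand(nObs, K);
I = double(U > q);
mTauFast = I.'*I;
mTauFast = mTauFast - diag(diag(mTauFast));
mTauLoop = zeros(K, K);
for i = 2:K
    for j = 1:(i-1)
        mTauLoop(i,j) = sum(I(:,i) .* I(:,j));
        mTauLoop(j,i) = mTauLoop(i,j);
    end
end
max(abs(mTauFast(:) - mTauLoop(:)))   % should display 0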
One approach with bsxfun -
out = squeeze(sum(bsxfun(@and,I,permute(I,[1 3 2])),1));
out(1:size(out,1)+1:end)=0;
I am fairly new to the concept of vectorization in MATLAB so please excuse my naivety in this regard. I was trying to vectorize the following MATLAB code which includes if statements within nested for loops:
h = zeros(dimV);
for a = 1 : dimV
    for b = 1 : dimV
        if a ~= b && C(a,b) == 1
            h(a,b) = C(a,b)*exp(-1i*A(a,b)*L(a,b))*(sin(k*L(a,b)))^-1;
        else if a == b
                for m = 1 : dimV
                    if m ~= a && C(a,m) == 1
                        h(a,b) = h(a,b) - C(a,m)*cot(k*L(a,m));
                    end
                end
            end
        end
    end
end
Here the variable dimV specifies the size of the matrix h and is fairly large, of the order of 100, and C is a symmetric square matrix (previously defined) of size dimV, all of whose off-diagonal elements are either 0 or 1 and whose diagonal elements are necessarily 0. The elements of the matrix L are zero in the same positions as the zeros of C. Following the vectorization techniques that I found on this website and here, I was able to vectorize the code, albeit partially, and my MWE is as follows:
h = zeros(dimV);
idx = (C == 1);
h(idx) = C(idx).*exp(-1i*A(idx).*L(idx)).*(sin(k*L(idx))).^-1;
for a = 1:dimV;
    m = 1 : dimV;
    m = m(C(a,:) == 1);
    h(a,a) = - sum (C(a,m).*cot(k*L(a,m)));
end
My main problem is in converting the for loop over the variable a into vectorized form, as I need the individual values of a to address the diagonal elements of h. I compared the evaluation time of both code blocks using the MATLAB profiler, and the latter version is only marginally faster; the improvement in efficiency is really insignificant. In fact, the profiler showed that the line assigning the values to h(a,a) takes up nearly 50% of the execution time in the second case. So I was wondering if there is a more elegant way to rewrite the above code using appropriate vectorization schemes that would help improve its efficiency. I am really in a bit of bother about this and would greatly appreciate any help in this regard. Thank you so much.
Vectorized Code -
diag_ind = 1:dimV+1:numel(C);       % linear indices of the diagonal of h
C_neq1 = C~=1;                      % mask of entries where C is not 1
parte1_2 = (sin(k.*L)).^-1;         % 1/sin(k*L), elementwise
parte1_2(C_neq1 & (L==0))=0;        % remove the Inf values coming from sin(0)
parte1 = exp(-1i.*A.*L).*parte1_2;  % off-diagonal terms
parte1(C_neq1)=0;                   % keep them only where C == 1
h = parte1;
parte2 = cot(k*L);                  % contributions to the diagonal sums
parte2(C_neq1)=0;                   % keep only the C == 1 contributions
parte2(diag_ind)=0;                 % exclude m == a
h(diag_ind) = - sum(parte2,2);      % h(a,a) = -sum_m C(a,m)*cot(k*L(a,m))
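If you want to check the vectorized version against your original loop, one way to generate consistent random test inputs (all names and sizes here are hypothetical; C is symmetric 0/1 with a zero diagonal, and L is zero wherever C is) is:
dimV = 50;
k = 2.5;
C = triu(double(rand(dimV) > 0.5), 1);  C = C + C.';
L = triu(rand(dimV), 1);                L = (L + L.') .* C;
A = triu(rand(dimV), 1);                A = A + A.';
% Run the loop version and the vectorized version on these inputs, then
% compare the resulting matrices; the difference should be at round-off level.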
I have code that yields a solution similar to the desired output, but I don't know how to perfect it.
The code is as follows.
N = 4; % sampling period
for nB = -30:-1;
    if rem(nB,N)==0
        xnB(abs(nB)) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
    else
        xnB(abs(nB)) = 0;
    end
end
for nC = 1:30;
    if rem(nC,N)==0
        xnC(nC) = cos(.1*pi*nC)-(4*sin(.2*pi*nC));
    else
        xnC(nC) = 0;
    end
end
nB = -30:-1;
nC = 1:30;
nD = 0;
xnD = 0;
plot(nA,xnA,nB,xnB,'r--o',nC,xnC,'r--o',nD,xnD,'r--o')
This produces something that is close, but not close enough for proper data recovery.
I have tried using an index of the same length that simply starts at 1, but the output was even worse than this; if that is a viable option, please explain thoroughly how it should be done.
I have tried running this in a single for-loop with one if-statement, but there is a problem when the counter passes zero. What is a way around this that would allow me to avoid using two for-loops? (I'm fairly confident that solving this issue would increase the accuracy of my output enough to successfully recover the signal.)
EDIT/CLARIFICATION/ADD - 1
I do in fact want to evaluate the signal at the index of zero. The indexing cannot handle an index of zero, and that is an index I'd prefer not to skip.
The goal of this code is to sample a signal; I will then build code that puts it through a recovery filter.
EDIT/UPDATE - 2
nA = -30:.1:30; % n values for original function
xnA = cos(.1*pi*nA)-(4*sin(.2*pi*nA)); % original function
N = 4; % sampling period
n = -30:30;
xn = zeros(size(n));
xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n)));
plot(nA,xnA,n,xn,'r--o')
title('Original seq. x and Sampled seq. xp')
xlabel('n')
ylabel('x(n) and xp(n)')
legend('original','sampled');
This threw an error at the line xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n))); which read: In an assignment A(I) = B, the number of elements in B and I must be the same. I have run into this error before, but my previous encounters were usually the result of faulty looping. Could someone point out why it isn't working this time?
EDIT/Clarification - 3
N = 4; % sampling period
for nB = -30:30;
    if rem(nB,N)==0
        xnB(abs(nB)) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
    else
        xnB(abs(nB)) = 0;
    end
end
The error message resulting is as follows: Attempted to access xnB(0); index must be a positive integer or logical.
EDIT/SUCCESS - 4
After taking another look at the answers posted, I realized that the negative sign in front of the cos function wasn't supposed to be there in the original code.
You could do something like the following:
nB = -30:-1;
nC = 1:30;
xnB = zeros(size(nB));
remB = rem(nB,N)==0;
xnB(remB) = -(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
xnC = zeros(size(nC));
remC = rem(nC,N)==0;
xnC(remC) = cos(.1*pi*nC(remC))-(4*sin(.2*pi*nC(remC)));
This avoids for-loops entirely. However, it would produce exactly the same output as you had before, so I'm not sure that it would fix your initial problem...
EDIT for your most recent addition:
nB = -30:30;
xnB = zeros(size(nB));
remB = rem(nB,N)==0;
xnB(remB) = -(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
In your original post you had the sign dependent on the sign of nB - if you wanted to maintain this functionality, you would do the following:
xnB(remB) = sign(nB(remB)).*(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
From what I understand, you want to iterate over all integer values in [-30, 30] excluding 0 using a single for loop. This can easily be done as:
for ii = [-30:-1,1:30]
%Your code
end
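Filling your sampled sequence with that loop might look like the sketch below (a hypothetical rewrite of your code, not the original: the offset nB + 31 maps n = -30..30 onto indices 1..61; if you also need the sample at n == 0, loop over -30:30 instead and keep the same offset, as in the resolution further down).
N  = 4;                      % sampling period
xn = zeros(1, 61);           % index nB + 31 corresponds to n = nB, for n = -30..30
for nB = [-30:-1, 1:30]      % skips n = 0
    if rem(nB, N) == 0
        xn(nB + 31) = cos(.1*pi*nB) - 4*sin(.2*pi*nB);
    end
end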
Resolution for edit - 2
As per your updated code, try replacing
xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n)));
with
xn(rem(n,N)==0) = -(cos(.1*pi*n(rem(n,N)==0))-(4*sin(.2*pi*n(rem(n,N)==0))));
This should fix the dimension mismatch.
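A slightly cleaner variant of the same fix is to compute the logical mask once and reuse it:
idx = rem(n, N) == 0;                                   % sampling instants
xn(idx) = -(cos(.1*pi*n(idx)) - (4*sin(.2*pi*n(idx))));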
Resolution for edit - 3
Try:
N = 4; % sampling period
for nB = -30:30;
    if rem(nB,N)==0
        xnB(nB-(-30)+1) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
    else
        xnB(nB-(-30)+1) = 0;
    end
end
I have a matrix, matrix_logical(50000,100000), which is a sparse logical matrix (a lot of false entries, some true). I have to produce a matrix, intersect(50000,50000), that stores, for each pair of rows i and j of matrix_logical, the number of columns for which rows i and j are both true.
Here is the code I wrote:
% store the nonzero columns in advance
for i=1:50000
    nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
    a = cell2mat(nonzeros{i});
    for j=(i+1):50000
        b = cell2mat(nonzeros{j});
        intersect(i,j) = numel(intersect(a,b));
    end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse in content, but it is not stored as a MATLAB sparse matrix, because that made the performance far worse.
Since the [i,j] entry counts the number of nonzero elements in the element-wise multiplication of rows i and j, you can compute it by multiplying matrix_logical with its transpose (you should convert to a numeric data type first, e.g. matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
And it works both for sparse or full representation.
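As a tiny sanity check on made-up data (a 3-by-5 example), rows 1 and 2 share two true columns, so inter(1,2) should be 2:
matrix_logical = logical([1 1 0 1 0;
                          1 1 0 0 0;
                          0 0 1 0 1]);
inter = single(matrix_logical) * single(matrix_logical)';
% inter(1,2) is 2, inter(1,3) is 0, and the diagonal holds each row's own count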
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000);
for ii = 1:49999
    for jj = ii:50000
        intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
        intersct(jj, ii) = intersct(ii, jj);
    end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
    out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(matrix_logical,@distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.
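If you do want the built-in measure anyway, one option (a sketch assuming strictly binary data; rows that are entirely false produce NaN because the union is empty) is to convert the distance back into the index:
D = squareform(pdist(double(matrix_logical), 'jaccard'));  % Jaccard distance
J = 1 - D;                                                 % Jaccard index; diagonal becomes 1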