Replacing selective numbers with NaNs - performance

I have eight columns of data. Columns 1, 3, 5 and 7 contain 3-digit numbers. Columns 2, 4, 6 and 8 contain ones and zeros and correspond to columns 1, 3, 5 and 7 respectively. Where there is a zero in an even column I want to change the corresponding number to NaN. More simply, if it were
155 1 345 0
328 1 288 1
884 0 145 0
326 1 332 1
159 0 186 1
then 884 would be replaced with NaN, as would 159, 345 and 145 with the other numbers remaining the same. I need to use NaN to maintain the data in matrix form.
I know I could use
data(3,1)=NaN; data(5,1)=NaN
etc but this is very time consuming. Any suggestions would be very welcome.

Approach 1
a1 = [
155 1 345 0
328 1 288 1
884 0 145 0
326 1 332 1
159 0 186 1]
t1 = a1(:,[2:2:end])
data1 = a1(:,[1:2:end])
t1(t1==0)=NaN
t1(t1==1)=data1(t1==1)
a1(:,[1:2:end]) = t1
Output -
a1 =
155 1 NaN 0
328 1 288 1
NaN 0 NaN 0
326 1 332 1
NaN 0 186 1
Approach 2
[x1,y1] = find(~a1(:,[2:2:end]))
a1(sub2ind(size(a1),x1,2*y1-1)) = NaN

I would split the problem into two matrices, with one being a logical mask, the other holding your data.
data = your_mat(:,1:2:end);
valid = your_mat(:,2:2:end);
Then you can simply do:
data(~valid)=NaN;
You could then rebuild your data by doing:
your_mat(:,1:2:end) = data;
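For readers following along outside MATLAB, the same "split, mask, rebuild" idea can be sketched in plain Python. This is only an illustration under stated assumptions: the function name mask_invalid and the nested-list layout are made up here and are not part of the answer above.

```python
NAN = float("nan")

def mask_invalid(rows):
    """Return a copy of rows in which each data value (1st, 3rd, ... column)
    becomes NaN when the flag stored right after it is 0."""
    out = [list(r) for r in rows]           # copy, so the input stays untouched
    for r in out:
        for j in range(0, len(r) - 1, 2):   # 0-based index of each data column
            if r[j + 1] == 0:               # the flag column says "invalid"
                r[j] = NAN
    return out

a1 = [
    [155, 1, 345, 0],
    [328, 1, 288, 1],
    [884, 0, 145, 0],
    [326, 1, 332, 1],
    [159, 0, 186, 1],
]
result = mask_invalid(a1)
```

The flag columns are left as they are, so the matrix keeps its shape, which is the reason the question asks for NaN rather than deletion.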

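Here is an interesting solution. I would expect it to perform quite well, but be aware that it is a bit tricky: the mask has one column fewer than data, and a logical index is applied by linear position, so mask column k lines up with data column k. Each flag column therefore marks the data column just before it, and the data columns themselves never trigger a hit as long as they contain no zeros.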
data(~data(:,2:end))=NaN
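To see why the shifted mask lands on the right columns, here is a small Python emulation of column-major linear indexing. It is a sketch under assumptions: apply_linear_mask is a hypothetical helper written for this illustration, and it only mimics the part of MATLAB's logical indexing that the one-liner relies on.

```python
NAN = float("nan")

def apply_linear_mask(matrix, mask):
    """Emulate A(mask) = NaN when mask has the same rows but fewer columns.
    Both are walked in column-major order, so mask element k hits element k of A."""
    n_rows = len(matrix)
    out = [list(r) for r in matrix]
    k = 0                                   # linear (column-major) position
    for j in range(len(mask[0])):
        for i in range(n_rows):
            if mask[i][j]:
                out[k % n_rows][k // n_rows] = NAN
            k += 1
    return out

data = [
    [155, 1, 345, 0],
    [328, 1, 288, 1],
    [884, 0, 145, 0],
    [326, 1, 332, 1],
    [159, 0, 186, 1],
]
# Equivalent of ~data(:,2:end): drop the first column, then test for zero.
mask = [[v == 0 for v in row[1:]] for row in data]
result = apply_linear_mask(data, mask)
```

Mask column 1 comes from flag column 2 and hits data column 1; mask column 2 comes from data column 3, which holds no zeros, so nothing happens there. A genuine 0 in a data column would break the trick.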

Using logical indexing:
even = a1(:,2:2:end); % even columns
odd = a1(:,1:2:end); % odd columns
odd(even == 0) = NaN; % set odd columns to NaN if corresponding col is 0
a1(:,1:2:end) = odd; % assign back to a1
a1 =
155 1 NaN 0
328 1 288 1
NaN 0 NaN 0
326 1 332 1
NaN 0 186 1

Here is an alternative solution. You can use circshift, in the following manner.
First create a mask of the even columns with the same size as your input matrix A:
AM = false(size(A)); AM(:,2:2:end) = true;
Then circshift the mask (A==0)&AM one element to the left, to shift this mask on the odd columns.
A(circshift((A==0)&AM,[0 -1])) = nan;
NOTE: I've searched for a one-liner ... I don't think it's a good one, but here is one you can use, based on my solution:
A(circshift(bsxfun(@and, A==0, mod(0:size(A,2)-1,2)),[0 -1])) = nan;
The dirty part is that bsxfun builds the mask AM on the fly. For that I use a parity test on a vector of column indices, which bsxfun expands over the whole matrix A. You can create this mask any other way, of course.
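The mask-then-shift idea can also be written out step by step in Python. This is a rough sketch, not a translation of circshift itself: nan_by_shifted_mask is a name invented here, and the row rotation stands in for circshift(mask, [0 -1]).

```python
NAN = float("nan")

def nan_by_shifted_mask(A):
    """Mark zeros in the flag columns, rotate the mask one column to the left,
    and put NaN wherever the rotated mask is true."""
    # (A == 0) & AM: zeros located in the 2nd, 4th, ... column (odd 0-based index)
    mask = [[(v == 0) and (j % 2 == 1) for j, v in enumerate(row)] for row in A]
    # Rotate every row one position to the left, wrapping the first entry around
    shifted = [row[1:] + row[:1] for row in mask]
    return [[NAN if shifted[i][j] else v for j, v in enumerate(row)]
            for i, row in enumerate(A)]

A = [
    [155, 1, 345, 0],
    [328, 1, 288, 1],
    [884, 0, 145, 0],
    [326, 1, 332, 1],
    [159, 0, 186, 1],
]
result = nan_by_shifted_mask(A)
```

The entry that wraps around to the last column always comes from the first column, which the parity test excludes, so the wrap-around is harmless.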

Related

Quickly compute `dot(a(n:end), b(1:end-n))`

Suppose we have two one-dimensional arrays of values a and b, both of length N. I want to create a new array c such that c(n)=dot(a(n:N), b(1:N-n+1)). I can of course do this using a simple loop:
for n=1:N
c(n)=dot(a(n:N), b(1:N-n+1));
end
but given that this is such a simple operation which resembles a convolution I was wondering if there isn't a more efficient method to do this (using Matlab).
A solution using 1D convolution conv:
out = conv(a, flip(b));
c = out(ceil(numel(out)/2):end);
In conv the first vector is multiplied by the reversed version of the second vector so we need to compute the convolution of a and the flipped b and trim the unnecessary part.
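As a cross-check of the indexing, the loop and the convolution version can be compared in plain Python. The helper names below are assumptions made for this sketch, and the hand-written conv is quadratic, so it demonstrates correctness rather than speed.

```python
def conv(x, y):
    """Full 1-D convolution of two lists (length len(x) + len(y) - 1)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def lagged_dots_loop(a, b):
    """c[k] = dot(a[k:], b[:N-k]), the 0-based form of the loop in the question."""
    n = len(a)
    return [sum(a[k + i] * b[i] for i in range(n - k)) for k in range(n)]

def lagged_dots_conv(a, b):
    """Convolve a with the reversed b and keep the second half of the result."""
    full = conv(a, b[::-1])
    return full[len(a) - 1:]    # 0-based start of out(ceil(numel(out)/2):end)

a = [9, 10, 2, 10, 7]
b = [1, 3, 6, 10, 10]
c_loop = lagged_dots_loop(a, b)
c_conv = lagged_dots_conv(a, b)
```

Both functions return the same list for this input, which matches the example vector c used in the answer that follows.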
This is an interesting problem!
I am going to assume that a and b are column vectors of the same length. Let us consider a simple example:
a = [9;10;2;10;7];
b = [1;3;6;10;10];
% yields:
c = [221;146;74;31;7];
Now let's see what happens when we compute the convolution of these vectors:
>> conv(a,b)
ans =
9
37
86
166
239
201
162
170
70
>> conv2(a, b.')
ans =
9 27 54 90 90
10 30 60 100 100
2 6 12 20 20
10 30 60 100 100
7 21 42 70 70
We notice that c is the sum of the elements along the lower diagonals of the result of conv2. To show this more clearly, we'll transpose to get the diagonals in the same order as the values in c:
>> triu(conv2(a.', b))
ans =
9 10 2 10 7
0 30 6 30 21
0 0 12 60 42
0 0 0 100 70
0 0 0 0 70
So now it becomes a question of summing the diagonals of a matrix, which is a more common problem with existing solutions, for example this one by Andrei Bobrov:
C = conv2(a.', b);
p = sum( spdiags(C, 0:size(C,2)-1) ).'; % This gives the same result as the loop.
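The diagonal-sum view can be checked with a few lines of Python. This is a sketch only: diagonal_sums is a made-up name, and the nested list stands in for the matrix returned by conv2(a.', b).

```python
def diagonal_sums(a, b):
    """Build outer[i][j] = b[i] * a[j] and sum the main and upper diagonals."""
    n = len(a)
    outer = [[bi * aj for aj in a] for bi in b]
    # The k-th upper diagonal holds b[i] * a[i + k] for i = 0 .. n-k-1
    return [sum(outer[i][i + k] for i in range(n - k)) for k in range(n)]

a = [9, 10, 2, 10, 7]
b = [1, 3, 6, 10, 10]
c = diagonal_sums(a, b)
```

Note that this builds the full N-by-N product, so memory grows quadratically with N; for long vectors the one-dimensional convolution is the lighter option.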

R - how to pick a random sample with specific percentages

This is a snapshot of my dataset
A B
1 34
1 33
1 66
0 54
0 77
0 98
0 39
0 12
I am trying to create a random sample where there are 2 1s and 3 0s from column A in the sample along with their respective B values. Is there a way to do that? Basically trying to see how to get a sample with specific percentages of a particular column? Thanks.

How to substitute a for-loop with vectorization acting several thousand times per data.frame row?

Being still quite wet behind the ears concerning R and - more important - vectorization, I cannot get my head around how to speed up the code below.
The for-loop calculates the number of seeds falling onto a road for several road segments with different densities of seed-generating plants by applying a random probability to every seed.
As my real data frame has ~200k rows and seed numbers are up to 300k/segment, using the example below would take several hours on my current machine.
#Example data.frame
df <- data.frame(Density=c(0,0,0,3,0,120,300,120,0,0))
#Example SeedRain vector
SeedRainDists <- c(7.72,-43.11,16.80,-9.04,1.22,0.70,16.48,75.06,42.64,-5.50)
#Calculating the number of seeds from plant densities
df$Seeds <- df$Density * 500
#Applying a probability of reaching the road for every seed
df$SeedsOnRoad <- apply(as.matrix(df$Seeds),1,function(x){
SeedsOut <- 0
if(x>0){
#Summing up the number of seeds reaching a certain distance
for(i in 1:x){
SeedsOut <- SeedsOut +
ifelse(sample(SeedRainDists,1,replace=T)>40,1,0)
}
}
return(SeedsOut)
})
If someone might give me a hint as to how the loop could be substituted by vectorization - or maybe how the data could be organized better in the first place to improve performance - I would be very grateful!
Edit: Roland's answer showed that I may have oversimplified the question. In the for-loop I extract a random value from a distribution of distances recorded by another author (that's why I can't supply the data here). Added an exemplary vector with likely values for SeedRain distances.
This should do about the same simulation:
df$SeedsOnRoad2 <- sapply(df$Seeds,function(x){
rbinom(1,x,0.6)
})
# Density Seeds SeedsOnRoad SeedsOnRoad2
#1 0 0 0 0
#2 0 0 0 0
#3 0 0 0 0
#4 3 1500 892 877
#5 0 0 0 0
#6 120 60000 36048 36158
#7 300 150000 90031 89875
#8 120 60000 35985 35773
#9 0 0 0 0
#10 0 0 0 0
One option is to generate the sample() for all Seeds per row of df in a single go.
Using set.seed(1) before your loop-based code I get:
> df
Density Seeds SeedsOnRoad
1 0 0 0
2 0 0 0
3 0 0 0
4 3 1500 289
5 0 0 0
6 120 60000 12044
7 300 150000 29984
8 120 60000 12079
9 0 0 0
10 0 0 0
I get the same answer in a fraction of the time if I do:
set.seed(1)
tmp <- sapply(df$Seeds,
function(x) sum(sample(SeedRainDists, x, replace = TRUE) > 40))
> tmp
[1] 0 0 0 289 0 12044 29984 12079 0 0
For comparison:
df <- transform(df, GavSeedsOnRoad = tmp)
> df
Density Seeds SeedsOnRoad GavSeedsOnRoad
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 3 1500 289 289
5 0 0 0 0
6 120 60000 12044 12044
7 300 150000 29984 29984
8 120 60000 12079 12079
9 0 0 0 0
10 0 0 0 0
The points to note here are:
try to avoid calling a function repeatedly in a loop if the function is vectorised or can generate the entire end result with a single call. Here you were calling sample() Seeds times for each row of df, each call returning a single sample from SeedRainDists. Here I do a single sample() call asking for sample size Seeds, for each row of df; hence I call sample() 10 times, whereas your code called it 271500 times.
even if you have to repeatedly call a function in a loop, remove from the loop anything that is vectorised that could be done on the entire result after the loop is done. An example here is your accumulating of SeedsOut, which is calling +() a large number of times.
Better would have been to collect each SeedsOut in a vector, and then sum() that vector outside the loop. E.g.
SeedsOut <- numeric(length = x)
for(i in seq_len(x)) {
SeedsOut[i] <- ifelse(sample(SeedRainDists,1,replace=TRUE)>40,1,0)
}
sum(SeedsOut)
Note that R treats a logical as if it were numeric 0s or 1s where used in any mathematical function. Hence
sum(ifelse(sample(SeedRainDists, 100, replace=TRUE)>40,1,0))
and
sum(sample(SeedRainDists, 100, replace=TRUE)>40)
would give the same result if run with the same set.seed().
There may be a fancier way of doing the sampling that requires fewer calls to sample() (and there is: sample(SeedRainDists, sum(Seeds), replace = TRUE) > 40, but then you need to take care of selecting the right elements of that vector for each row of df; not hard, just a little cumbersome), but what I show may be efficient enough?
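The "one big sample, then split per row" variant mentioned in the last paragraph can be sketched in Python. Everything here is an assumption made for illustration (the function name seeds_on_road, its parameters, and the use of random.choices in place of R's sample()), so the numbers will not match the R output above.

```python
import random
from itertools import accumulate

def seeds_on_road(seeds_per_row, distances, threshold=40, rng=random):
    """Draw all seeds in one call, then count the hits belonging to each row."""
    total = sum(seeds_per_row)
    # One draw with replacement for every seed of every row
    hits = [d > threshold for d in rng.choices(distances, k=total)]
    # Cumulative offsets mark where each row's block of draws starts and ends
    bounds = [0] + list(accumulate(seeds_per_row))
    return [sum(hits[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

seeds = [0, 0, 0, 1500, 0, 60000, 150000, 60000, 0, 0]
dists = [7.72, -43.11, 16.80, -9.04, 1.22, 0.70, 16.48, 75.06, 42.64, -5.50]
on_road = seeds_on_road(seeds, dists, rng=random.Random(1))
```

Rows with zero seeds get an empty slice and therefore a count of 0, so no special case is needed for them.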

calculate exponential moving average in matrix with nan values

suppose I have the following matrix
a =
76 NaN 122 NaN
78 NaN 123 NaN
84 NaN 124 54
77 NaN 126 58
82 45 129 62
90 50 135 45
76 63 133 66
79 52 122 49
88 56 140 24
Is there any way to calculate an exponential moving average for each column, disregarding the leading NaN values? For instance, if I use a 3-day exponential factor, I would expect to get a matrix starting with 2 NaN values in the 1st column, 6 NaN values in the 2nd column, 2 NaN values in the 3rd column and 4 NaN values in the 4th column. Any suggestions? Thank you in advance.
Just use filter on the whole matrix, which will pass through the NaN's as appropriate. If you want to "infect" edge values with NaN as well, add some extras at the top edge, then trim the result:
kernel = [1 1 1].'; % Any 3-element kernel, as column vector
a2 = [repmat(NaN, 2, 4); a]; % Add extra NaN's at the start, to avoid partial answers
xtemp = filter(kernel, 1, a2);
x = xtemp(3:end, :);
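How the NaNs propagate through a 3-tap filter can be traced with a small Python sketch. The function fir_filter_column is hypothetical and works on a single column; treating samples before the start as NaN plays the role of the prepended NaN rows and the trimming in the snippet above. The kernel of ones is the same placeholder as above, not an actual set of exponential weights.

```python
import math

NAN = float("nan")

def fir_filter_column(kernel, column):
    """y[n] = sum over k of kernel[k] * x[n - k], with x taken as NaN before the start."""
    out = []
    for n in range(len(column)):
        acc = 0.0
        for k, w in enumerate(kernel):
            x = column[n - k] if n - k >= 0 else NAN
            acc += w * x                    # any NaN inside the window makes acc NaN
        out.append(acc)
    return out

kernel = [1, 1, 1]                          # placeholder 3-tap kernel
col1 = [76, 78, 84, 77, 82, 90, 76, 79, 88]
col2 = [NAN, NAN, NAN, NAN, 45, 50, 63, 52, 56]
col4 = [NAN, NAN, 54, 58, 62, 45, 66, 49, 24]
y1 = fir_filter_column(kernel, col1)
y2 = fir_filter_column(kernel, col2)
y4 = fir_filter_column(kernel, col4)
```

An output is finite only once all three samples in its window are finite, which gives the 2, 6 and 4 leading NaNs the question expects for these columns.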

How can I draw a triangle in an image in MATLAB?

I need to draw a triangle in an image I have loaded. The triangle should look like this:
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
But the main problem I have is that I do not know how I can create a matrix like that. I want to multiply this matrix with an image, and the image matrix consists of 3 parameters (W, H, RGB).
You can create a matrix like the one in your question by using the TRIL and ONES functions:
>> A = tril(ones(6))
A =
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
EDIT: Based on your comment below, it sounds like you have a 3-D RGB image matrix B and that you want to multiply each color plane of B by the matrix A. This will have the net result of setting the upper triangular part of the image (corresponding to all the zeroes in A) to black. Assuming B is a 6-by-6-by-3 matrix (i.e. the rows and columns of B match those of A), here is one solution that uses indexing (and the function REPMAT) instead of multiplication:
>> B = randi([0 255],[6 6 3],'uint8'); % A random uint8 matrix as an example
>> B(repmat(~A,[1 1 3])) = 0; % Set upper triangular part to 0
>> B(:,:,1) % Take a peek at the first plane
ans =
8 0 0 0 0 0
143 251 0 0 0 0
225 40 123 0 0 0
171 219 30 74 0 0
48 165 150 157 149 0
94 96 57 67 27 5
The call to REPMAT replicates a negated version of A 3 times along the third dimension so that it has the same dimensions as B. The result is used as a logical index into B, setting every element where the index is true to 0. By using indexing instead of multiplication, you can avoid having to worry about converting A and B to the same data type (which would be required to do the multiplication in this case since A is of type double and B is of type uint8).
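The same masking step can be sketched without MATLAB, using nested Python lists in place of the image array. The names tril_ones and black_out_upper and the constant test image are assumptions made for this example only.

```python
def tril_ones(n):
    """n-by-n lower-triangular matrix of ones, like tril(ones(n))."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def black_out_upper(image, mask):
    """Keep pixels where mask is 1 and set every channel to 0 where it is 0.
    image[i][j] is a list of channel values, for example [R, G, B]."""
    return [[list(px) if mask[i][j] else [0] * len(px)
             for j, px in enumerate(row)]
            for i, row in enumerate(image)]

A = tril_ones(6)
# A constant-colour 6x6 RGB image, used only as a stand-in for a loaded picture
B = [[[200, 100, 50] for _ in range(6)] for _ in range(6)]
masked = black_out_upper(B, A)
```

Applying one 2-D mask to every channel of a pixel is what replicating the mask along the third dimension achieves in the MATLAB version.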

Resources