Identify gaps in repeated sequences - algorithm

I have a vector that should contain n sequences from 00 to 11
A = [00;01;02;03;04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
and I would like to check that the sequence "00 - 11 " is always respected (no missing values).
for example if
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
(missing 03 in the 3rd position)
For each missing value I would like to have back this information in another vector
missing=
[value_1,position_1;
value_2, position_2;
etc, etc]
Can you help me?

For sure we know that the last element must be 11, so we can already check for this and make our life easier for testing all previous elements. We ensure that A is 11-terminated, so an "element-wise change" approach (below) will be valid. Note that the same is true for the beginning, but changing A there would mess with indices, so we better take care of that later.
missing = [];
if A(end) ~= 11
missing = [missing; 11, length(A) + 1];
A = [A, 11];
end
Then we can calculate the change dA = A(2:end) - A(1:end-1); from one element to another, and identify the gap positions idx_gap = find((dA~=1) & (dA~=-11));. Now we need to expand all missing indices and expected values, using ev for the expected value. ev can be obtained from the previous value, as in
for k = 1 : length(idx_gap)
ev = A(idx_gap(k));
Now, the number of elements to fill in is the change dA in that position minus one (because one means no gap). Note that this can wrap over if there is a gap at the boundary between segments, so we use the modulus.
for n = 1 : mod(dA(idx_gap(k)) - 1, 12)
ev = mod(ev + 1, 12);
missing = [missing; ev, idx_gap(k) + 1];
end
end
As a test, consider A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]. That's a case where the special initialization from the beginning will fire, memorizing the missing 11 already, and changing A to [5 6 ... 7 8 11]. missing then will yield
11 24 % recognizes improper termination of A.
11 7
0 7 % properly handles wrap-over here.
1 7
2 7
5 21 % recognizes single element as missing.
9 24
10 24
which should be what you are expecting. Now what's missing still is the beginning of A, so let's say missing = [0 : A(1) - 1, 1; missing]; to complete the list.

This will give you the missing values and their positions in the full sequence:
N = 11; % specify the repeating 0:N sub-sequence
n = 3; % reps of sub-sequence
A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]'; %' column from s.bandara
da = diff([A; N+1]); % EDITED to include missing end
skipLocs = find(~(da==1 | da==-N));
skipLength = da(skipLocs)-1;
skipLength(skipLength<0) = N + skipLength(skipLength<0) + 1;
firstSkipVal = A(skipLocs)+1;
patchFun = #(x,y)(0:y)'+x - (N+1)*(((0:y)'+x)>N);
patches = arrayfun(patchFun,firstSkipVal,skipLength-1,'uni',false);
locs = arrayfun(#(x,y)(x:x+y)',skipLocs+cumsum([A(1); skipLength(1:end-1)])+1,...
skipLength-1,'uni',false);
Then putting them together, including any missing values at the beginning:
>> gapMap = [vertcat(patches{:}) vertcat(locs{:})-1]; % not including lead
>> gapMap = [repmat((0 : A(1) - 1)',1,2); gapMap] %' including lead
gapMap =
0 0
1 1
2 2
3 3
4 4
11 11
0 12
1 13
2 14
5 29
9 33
10 34
11 35
The first column contains the missing values. The second column is the 0-based location in the hypothetical full sequence.
>> Afull = repmat(0:N,1,n)
>> isequal(gapMap(:,1), Afull(gapMap(:,2)+1)')
ans =
1

Although this doesn't solve your problem completely, you can identify the position of missing values, or of groups of contiguous missing values, like this:
ind = 1+find(~ismember(diff(A),[1 -11]));
ind gives the position with respect to the current sequence A, not to the completed sequence.
For example, with
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03; ;06;07;08;09;10;11];
this gives
>> ind = 1+find(~ismember(diff(A),[1 -11]))
ind =
4
16

Related

How to extract vectors from a given condition matrix in Octave

I'm trying to extract a matrix with two columns. The first column is the data that I want to group into a vector, while the second column is information about the group.
A =
1 1
2 1
7 2
9 2
7 3
10 3
13 3
1 4
5 4
17 4
1 5
6 5
the result that i seek are
A1 =
1
2
A2 =
7
9
A3 =
7
10
13
A4=
1
5
17
A5 =
1
6
as an illustration, I used the eval function but it didn't give the results I wanted
Assuming that you don't actually need individually named separated variables, the following will put the values into separate cells of a cell array, each of which can be an arbitrary size and which can be then retrieved using cell index syntax. It makes used of logical indexing so that each iteration of the for loop assigns to that cell in B just the values from the first column of A that have the correct number in the second column of A.
num_cells = max (A(:,2));
B = cell (num_cells,1);
for idx = 1:max(A(:,2))
B(idx) = A((A(:,2)==idx),1);
end
B =
{
[1,1] =
1
2
[2,1] =
7
9
[3,1] =
7
10
13
[4,1] =
1
5
17
[5,1] =
1
6
}
Cell arrays are accessed a bit differently than normal numeric arrays. Array indexing (with ()) will return another cell, e.g.:
>> B(1)
ans =
{
[1,1] =
1
2
}
To get the contents of the cell so that you can work with them like any other variable, index them using {}.
>> B{1}
ans =
1
2
How it works:
Use max(A(:,2)) to find out how many array elements are going to be needed. A(:,2) uses subscript notation to indicate every value of A in column 2.
Create an empty cell array B with the right number of cells to contain the separated parts of A. This isn't strictly necessary, but with large amounts of data, things can slow down a lot if you keep adding on to the end of an array. Pre-allocating is usually better.
For each iteration of the for loop, it determines which elements in the 2nd column of A have the value matching the value of idx. This returns a logical array. For example, for the third time through the for loop, idx = 3, and:
>> A_index3 = A(:,2)==3
A_index3 =
0
0
0
0
1
1
1
0
0
0
0
0
That is a logical array of trues/falses indicating which elements equal 3. You are allowed to mix both logical and subscripts when indexing. So using this we can retrieve just those values from the first column:
A(A_index3, 1)
ans =
7
10
13
we get the same result if we do it in a single line without the A_index3 intermediate placeholder:
>> A(A(:,2)==3, 1)
ans =
7
10
13
Putting it in a for loop where 3 is replaced by the loop variable idx, and we assign the answer to the idx location in B, we get all of the values separated into different cells.

Take out elements from a vector that meets certain condition

I have two vectors, A = [1,3,5] and B = [1,2,3,4,5,6,7,8,9,10]. I want to get C=[2,4,6,7,8,9,10] by extracting some elements from B that A doesn't have.
I don't want to use loops, because this is a simplified problem from a real data simulation. In the real case A and B are huge, but A is included in B.
Here are two methods,
C=setdiff(B,A)
but if values are repeated in B they will only come up once in C, or
C=B(~ismember(B,A))
which will preserve repeated values in B.
One approach with unique, sort and diff -
C = [A B];
[~,~,idC] = unique(C);
[sidC,id_idC] = sort(idC);
start_id = id_idC(diff([0 sidC])==1);
out = C(start_id(start_id>numel(A)))
Sample runs -
Case #1 (Sample from question):
A =
1 3 5
B =
1 2 3 4 5 6 7 8 9 10
out =
2 4 6 7 8 9 10
Case #2 (Bit more generic case):
A =
11 15 14
B =
19 14 6 8 9 11 15
out =
6 8 9 19

Transpose and reshape a 3d array in matlab

Suppose I have an array X of size n by p by q. I would like to reshape it as a matrix with p rows, and in each row put the concatenation of the n rows of size q, resulting in a matrix of size p by nq.
I managed to do it with a loop but it takes a while say if n=1000, p=300, q=300.
F0=[];
for k=1:size(F,1)
F0=[F0,squeeze(X(k,:,:))];
end
Is there a faster way?
I think this is what you want:
Y = reshape(permute(X, [2 1 3]), size(X,2), []);
Example with n=2, p=3, q=4:
>> X
X(:,:,1) =
0 6 9
8 3 0
X(:,:,2) =
4 7 1
3 7 4
X(:,:,3) =
4 7 2
6 7 6
X(:,:,4) =
6 1 9
1 4 3
>> Y = reshape(permute(X, [2 1 3]), size(X,2), [])
Y =
0 8 4 3 4 6 6 1
6 3 7 7 7 7 1 4
9 0 1 4 2 6 9 3
Try this -
reshape(permute(X,[2 3 1]),p,[])
Thus, for code verification, one can look into a sample case run -
n = 2;
p = 3;
q = 4;
X = rand(n,p,q)
F0=[];
for k=1:n
F0=[F0,squeeze(X(k,:,:))];
end
F0
F0_noloop = reshape(permute(X,[2 3 1]),p,[])
Output is -
F0 =
0.4134 0.6938 0.3782 0.4775 0.2177 0.0098 0.7043 0.6237
0.1257 0.8432 0.7295 0.2364 0.3089 0.9223 0.2243 0.1771
0.7261 0.7710 0.2691 0.8296 0.7829 0.0427 0.6730 0.7669
F0_noloop =
0.4134 0.6938 0.3782 0.4775 0.2177 0.0098 0.7043 0.6237
0.1257 0.8432 0.7295 0.2364 0.3089 0.9223 0.2243 0.1771
0.7261 0.7710 0.2691 0.8296 0.7829 0.0427 0.6730 0.7669
Rather than using vectorization to solve the problem, you could look at the code to try and figure out what may improve performance. In this case, since you know the size of your output matrix F0 should be px(n*q), you could pre-allocate memory to F0 and avoid the constant resizing of the matrix at each iteration of the for loop
n=1000;
p=300;
q=300;
F0=zeros(p,n*q);
for k=1:size(F,1)
F0(:,(k-1)*q+1:k*q) = squeeze(F(k,:,:));
end
While probably not as efficient as the other two solutions, it is an alternative. Try the above and see what happens!

Algorithm suggestion

I'm looking for the best way to accomplish the following tasks:
Given 4 non-repeatable numbers between 1 and 9.
Given 2 numbers between 1 and 6.
Adding up the two numbers (1 to 6), check to see if there is a way make that same number using the four non-repeatable numbers (1 to 9), plus you may not even have to use all four numbers.
Example:
Your four non-repeatable (1 to 9) numbers are: 2, 4, 6, and 7
Your two numbers between 1 and 6 are: 3 and 3
The total for the two numbers is 3 + 3 = 6.
Looking at the four non-repeatable (1 to 9) numbers, you can make a 6 in two different ways:
2 + 4 = 6
6 = 6
So, this example returns "yes, there is a possible solution".
How do I accomplish this task in the most efficient, cleanest way possible, algorithmic-ally.
enter code hereSince the number of elements here is 4 so we should not worry about efficiency.
Just loop over 0 to 15 and use it as a bit mask to check what are the valid results that can be generated.
Here is a code in python to give you idea.
a = [2,4,6,7]
for i in range(16):
x = i
ans = 0
for j in range(4):
if(x%2):
ans += a[j]
x /= 2
print ans,
0 2 4 6 6 8 10 12 7 9 11 13 13 15 17 19

Summation of difference between matrix elements

I am in the process of building a function in MATLAB. As a part of it I have to calculate differences between elements in two matrices and sum them up.
Let me explain considering two matrices,
1 2 3 4 5 6
13 14 15 16 17 18
and
7 8 9 10 11 12
19 20 21 22 23 24
The calculations in the first row - only four elements in both matrices are considered at once (zero indicates padding):
(1-8)+(2-9)+(3-10)+(4-11): This replaces 1 in initial matrix.
(2-9)+(3-10)+(4-11)+(5-12): This replaces 2 in initial matrix.
(3-10)+(4-11)+(5-12)+(6-0): This replaces 3 in initial matrix.
(4-11)+(5-12)+(6-0)+(0-0): This replaces 4 in initial matrix. And so on
I am unable to decide how to code this in MATLAB. How do I do it?
I use the following equation.
Here i ranges from 1 to n(h), n(h), the number of distant pairs. It depends on the lag distance chosen. So if I choose a lag distance of 1, n(h) will be the number of elements - 1.
When I use a 7 X 7 window, considering the central value, n(h) = 4 - 1 = 3 which is the case here.
You may want to look at the circshfit() function:
a = [1 2 3 4; 9 10 11 12];
b = [5 6 7 8; 12 14 15 16];
for k = 1:3
b = circshift(b, [0 -1]);
b(:, end) = 0;
diff = sum(a - b, 2)
end

Resources