I'm trying to extract a matrix with two columns. The first column is the data that I want to group into a vector, while the second column is information about the group.
A =
1 1
2 1
7 2
9 2
7 3
10 3
13 3
1 4
5 4
17 4
1 5
6 5
the result that i seek are
A1 =
1
2
A2 =
7
9
A3 =
7
10
13
A4=
1
5
17
A5 =
1
6
as an illustration, I used the eval function but it didn't give the results I wanted
Assuming that you don't actually need individually named separated variables, the following will put the values into separate cells of a cell array, each of which can be an arbitrary size and which can be then retrieved using cell index syntax. It makes used of logical indexing so that each iteration of the for loop assigns to that cell in B just the values from the first column of A that have the correct number in the second column of A.
num_cells = max (A(:,2));
B = cell (num_cells,1);
for idx = 1:max(A(:,2))
B(idx) = A((A(:,2)==idx),1);
end
B =
{
[1,1] =
1
2
[2,1] =
7
9
[3,1] =
7
10
13
[4,1] =
1
5
17
[5,1] =
1
6
}
Cell arrays are accessed a bit differently than normal numeric arrays. Array indexing (with ()) will return another cell, e.g.:
>> B(1)
ans =
{
[1,1] =
1
2
}
To get the contents of the cell so that you can work with them like any other variable, index them using {}.
>> B{1}
ans =
1
2
How it works:
Use max(A(:,2)) to find out how many array elements are going to be needed. A(:,2) uses subscript notation to indicate every value of A in column 2.
Create an empty cell array B with the right number of cells to contain the separated parts of A. This isn't strictly necessary, but with large amounts of data, things can slow down a lot if you keep adding on to the end of an array. Pre-allocating is usually better.
For each iteration of the for loop, it determines which elements in the 2nd column of A have the value matching the value of idx. This returns a logical array. For example, for the third time through the for loop, idx = 3, and:
>> A_index3 = A(:,2)==3
A_index3 =
0
0
0
0
1
1
1
0
0
0
0
0
That is a logical array of trues/falses indicating which elements equal 3. You are allowed to mix both logical and subscripts when indexing. So using this we can retrieve just those values from the first column:
A(A_index3, 1)
ans =
7
10
13
we get the same result if we do it in a single line without the A_index3 intermediate placeholder:
>> A(A(:,2)==3, 1)
ans =
7
10
13
Putting it in a for loop where 3 is replaced by the loop variable idx, and we assign the answer to the idx location in B, we get all of the values separated into different cells.
Related
I was looking at the code for Counting Sort on GeeksForGeeks and during the final stage of the algorithm where the elements from the original array are inserted into their final locations in the sorted array (the second-to-last for loop), the input array is traversed in reverse order.
I can't seem to understand why you can't just go from the beginning of the input array to the end, like so :
for i in range(len(arr)):
output_arr[count_arr[arr[i] - min_element] - 1] = arr[i]
count_arr[arr[i] - min_element] -= 1
Is there some subtle reason for going in reverse order that I'm missing? Apologies if this is a very obvious question. I saw Counting Sort implemented in the same style here as well.
Any comments would be helpful, thank you!
Stability. With your way, the order of equal-valued elements gets reversed instead of preserved. Going over the input backwards cancels out the backwards copying (that -= 1 thing).
To process an array in forward order, the count / index array either needs to be one element larger so that the starting index is 0 or two local variables can be used. Example for integer array:
def countSort(arr):
output = [0 for i in range(len(arr))]
count = [0 for i in range(257)] # change
for i in arr:
count[i+1] += 1 # change
for i in range(256):
count[i+1] += count[i] # change
for i in range(len(arr)):
output[count[arr[i]]] = arr[i] # change
count[arr[i]] += 1 # change
return output
arr = [4,3,0,1,3,7,0,2,6,3,5]
ans = countSort(arr)
print(ans)
or using two variables, s to hold the running sum, c to hold the current count:
def countSort(arr):
output = [0 for i in range(len(arr))]
count = [0 for i in range(256)]
for i in arr:
count[i] += 1
s = 0
for i in range(256):
c = count[i]
count[i] = s
s = s + c
for i in range(len(arr)):
output[count[arr[i]]] = arr[i]
count[arr[i]] += 1
return output
arr = [4,3,0,1,3,7,0,2,6,3,5]
ans = countSort(arr)
print(ans)
Here We are Considering Stable Sort --> which is actually considering the Elements position by position.
For eg if we have array like
arr--> 5 ,8 ,3, 1, 1, 2, 6
0 1 2 3 4 5 6 7 8
count-> 0 2 1 1 0 1 1 0 1
Now we take cummulative sum of all frequencies
0 1 2 3 4 5 6 7 8
count-> 0 2 3 4 4 5 6 6 7
After Traversing the Original array , we prefer from last Since
we want to add Elements on their proper position so when we subtract the index , the Element will be added to lateral position.
But if we start traversing from beginning , then there will be no meaning for taking the cummulative sum since we are not adding according to the Elements placed. We are adding hap -hazardly which can be done even if we not take their cummulative sum.
I just got a question about counting the split points in a integer array, to ensure there is at least one duplicated integer on the two sides.
ex:
1 1 4 2 4 2 4 1
we can either split it into:
1 1 4 2 | 4 2 4 1
or
1 1 4 2 4 | 2 4 1
so that there is at least one '1', '2' ,and '4' are in both sides.
The integer can range from 1 to 100,000
The complexity requires O(n). How to solve this question?
Make one pass over the array and build count[i] = how many times the value i appears in the array. The problem is only solvable if count[i] >= 2 for all non-zero values. You can use this array to tell how many distinct values you have in your array.
Next, make another pass and using another array count2[i] (or you can reuse the first one), keep track of when you have visited each value at least once. Then use that position as your split point.
Example:
1 1 4 2 4 2 4 1
count = [3, 2, 0, 4] => 3 distinct values
1 1 4 2 4 2 4 1
^ => 1 distinct value so far
^ => 1 distinct value so far
^ => 2 distinct values so far
^ => 3 distinct values so far => this is your split point
There might be cases for which there is no solution, for example if the last 1 was at the beginning as well. To detect this, you can just make another pass over the rest of the array after you have decided on the split point and see if you still have all the values on that side.
You can avoid this last pass by using the count and count2 arrays to detect when you can no longer have a split point. This is left as an exercise.
I have a problem with sorting some finance data based on firmnumbers. So given is a matrix that looks like:
[1 3 4 7;
1 2 7 8;
2 3 7 8;]
On Matlab i would like the matrix to be sorted as follows:
[1 0 3 4 7 0;
1 2 0 0 7 8;
0 2 3 0 7 8;]
So basically every column needs to consist of 1 type of number.
I have tried many things but i cant get the matrix sorted properly.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
%// Get a unique list of numbers in the order that you want them to appear as the new columns
U = unique(A(:))'
%'//For each column (of your output, same as columns of U), find which rows have that number. Do this by making A 3D so that bsxfun compares each element with each element
temp1 = bsxfun(#eq,permute(A,[1,3,2]),U)
%// Consolidate this into a boolean matrix with the right dimensions and 1 where you'll have a number in your final answer
temp2 = any(temp1,3)
%// Finally multiply each line with U
bsxfun(#times, temp2, U)
So you can do that all in one line but I broke it up to make it easier to understand. I suggest you run each line and look at the output to see how it works. It might seem complicated but it's worthwhile getting to understand bsxfun as it's a really useful function. The first use which also uses permute is a bit more tricky so I suggest you first make sure you understand that last line and then work backwards.
What you are asking can also be seen as an histogram
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
uniquevalues = unique(A(:))
N = histc(A,uniquevalues' ,2) %//'
B = bsxfun(#times,N,uniquevalues') %//'
%// bsxfun can replace the following instructions:
%//(the instructions are equivalent only when each value appears only once per row )
%// B = repmat(uniquevalues', size(A,1),1)
%// B(N==0) = 0
Answer without assumptions - Simplified
I did not feel comfortable with my old answer that makes the assumption of everything being an integer and removed the possibility of duplicates, so I came up with a different solution based on #lib's suggestion of using a histogram and counting method.
The only case I can see this not working for is if a 0 is entered. you will end up with a column of all zeros, which one might interpret as all rows initially containing a zero, but that would be incorrect. you could uses nan instead of zeros in that case, but not sure what this data is being put into, and if it that processing would freak out.
EDITED
Includes sorting of secondary matrix, B, along with A.
A = [-1 3 4 7 9; 0 2 2 7 8.2; 2 3 5 9 8];
B = [5 4 3 2 1; 1 2 3 4 5; 10 9 8 7 6];
keys = unique(A);
[counts,bin] = histc(A,transpose(unique(A)),2);
A_sorted = cell(size(A,1),1);
for ii = 1:size(A,1)
for jj = 1:numel(keys)
temp = zeros(1,max(counts(:,jj)));
temp(1:counts(ii,jj)) = keys(jj);
A_sorted{ii} = [A_sorted{ii},temp];
end
end
A_sorted = cell2mat(A_sorted);
B_sorted = nan(size(A_sorted));
for ii = 1:size(bin,1)
for jj = 1:size(bin,2)
idx = bin(ii,jj);
while ~isnan(B_sorted(ii,idx))
idx = idx+1;
end
B_sorted(ii,idx) = B(ii,jj);
end
end
B_sorted(isnan(B_sorted)) = 0
You can create at the beginning a matrix with 9 columns , and treat the values in your original matrix as column indexes.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
B = zeros(3,max(A(:)))
for i = 1:size(A,1)
B(i,A(i,:)) = A(i,:)
end
B(:,~any(B,1)) = []
I have a vector that should contain n sequences from 00 to 11
A = [00;01;02;03;04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
and I would like to check that the sequence "00 - 11 " is always respected (no missing values).
for example if
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
(missing 03 in the 3rd position)
For each missing value I would like to have back this information in another vector
missing=
[value_1,position_1;
value_2, position_2;
etc, etc]
Can you help me?
For sure we know that the last element must be 11, so we can already check for this and make our life easier for testing all previous elements. We ensure that A is 11-terminated, so an "element-wise change" approach (below) will be valid. Note that the same is true for the beginning, but changing A there would mess with indices, so we better take care of that later.
missing = [];
if A(end) ~= 11
missing = [missing; 11, length(A) + 1];
A = [A, 11];
end
Then we can calculate the change dA = A(2:end) - A(1:end-1); from one element to another, and identify the gap positions idx_gap = find((dA~=1) & (dA~=-11));. Now we need to expand all missing indices and expected values, using ev for the expected value. ev can be obtained from the previous value, as in
for k = 1 : length(idx_gap)
ev = A(idx_gap(k));
Now, the number of elements to fill in is the change dA in that position minus one (because one means no gap). Note that this can wrap over if there is a gap at the boundary between segments, so we use the modulus.
for n = 1 : mod(dA(idx_gap(k)) - 1, 12)
ev = mod(ev + 1, 12);
missing = [missing; ev, idx_gap(k) + 1];
end
end
As a test, consider A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]. That's a case where the special initialization from the beginning will fire, memorizing the missing 11 already, and changing A to [5 6 ... 7 8 11]. missing then will yield
11 24 % recognizes improper termination of A.
11 7
0 7 % properly handles wrap-over here.
1 7
2 7
5 21 % recognizes single element as missing.
9 24
10 24
which should be what you are expecting. Now what's missing still is the beginning of A, so let's say missing = [0 : A(1) - 1, 1; missing]; to complete the list.
This will give you the missing values and their positions in the full sequence:
N = 11; % specify the repeating 0:N sub-sequence
n = 3; % reps of sub-sequence
A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]'; %' column from s.bandara
da = diff([A; N+1]); % EDITED to include missing end
skipLocs = find(~(da==1 | da==-N));
skipLength = da(skipLocs)-1;
skipLength(skipLength<0) = N + skipLength(skipLength<0) + 1;
firstSkipVal = A(skipLocs)+1;
patchFun = #(x,y)(0:y)'+x - (N+1)*(((0:y)'+x)>N);
patches = arrayfun(patchFun,firstSkipVal,skipLength-1,'uni',false);
locs = arrayfun(#(x,y)(x:x+y)',skipLocs+cumsum([A(1); skipLength(1:end-1)])+1,...
skipLength-1,'uni',false);
Then putting them together, including any missing values at the beginning:
>> gapMap = [vertcat(patches{:}) vertcat(locs{:})-1]; % not including lead
>> gapMap = [repmat((0 : A(1) - 1)',1,2); gapMap] %' including lead
gapMap =
0 0
1 1
2 2
3 3
4 4
11 11
0 12
1 13
2 14
5 29
9 33
10 34
11 35
The first column contains the missing values. The second column is the 0-based location in the hypothetical full sequence.
>> Afull = repmat(0:N,1,n)
>> isequal(gapMap(:,1), Afull(gapMap(:,2)+1)')
ans =
1
Although this doesn't solve your problem completely, you can identify the position of missing values, or of groups of contiguous missing values, like this:
ind = 1+find(~ismember(diff(A),[1 -11]));
ind gives the position with respect to the current sequence A, not to the completed sequence.
For example, with
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03; ;06;07;08;09;10;11];
this gives
>> ind = 1+find(~ismember(diff(A),[1 -11]))
ind =
4
16
If I have a 2d array the way to represent each element in 1 dimension is to use row_num * row_width + column if i want the element at row_num, column. But what I'm struggling with is how big should the 1 dimensional array be if I have a 3x3 2d array (just as an example). Shouldn't 3^3 = 9 be enough for the 1d array? But then for element 3,2 the index would be 3 * 3 + 2 = 11. Or should the size be that of the biggest index I want to address - e.g. 3 * 3 + 3 = 12 if I want to address all elements from a 3x3 2d array?
You need to start counting from zero (zero-indexing), where the rows and columns are 0,1,2.
Then element "(3,2)" is really "(2, 1)", or 2*3+1=7, and the final element "(3,3)" is really "(2,2)", which is 2*3+2=8. This is the last element in the 1-D array, because they're counted from 0 too, so the 9 elements are 0,1,2,3,4,5,6,7,8.
For example:
>>> for r in 0,1,2:
... for c in 0,1,2:
... print r, c, r*3+c
...
0 0 0
0 1 1
0 2 2
1 0 3
1 1 4
1 2 5
2 0 6
2 1 7
2 2 8