Separate labeled data

I have a matrix of size n x p (n data points). Running MATLAB's k-means algorithm on it gives me an n x 1 array, where each element specifies the label of the data point in the same row of the data matrix.
What is the best way to separate the original data into multiple matrices according to the label specified in the other array?
Example
data_points =
-0.0168 0.0689
-0.0064 -0.0632
0.0527 0.0509
-0.0468 0.0152
labels =
1
2
2
1
Thus the first and last data points should go into one new array, and the second and third into another.
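To make the requested result concrete, here is a minimal sketch in Python (not MATLAB, and not an answer taken from this thread). The function name split_by_label is hypothetical; it groups the rows of the data matrix by their label using a dictionary.

```python
def split_by_label(data_points, labels):
    """Group rows of data_points by the label at the same position."""
    groups = {}
    for row, label in zip(data_points, labels):
        # Create the list for a label the first time it is seen.
        groups.setdefault(label, []).append(row)
    return groups


data_points = [
    [-0.0168, 0.0689],
    [-0.0064, -0.0632],
    [0.0527, 0.0509],
    [-0.0468, 0.0152],
]
labels = [1, 2, 2, 1]
groups = split_by_label(data_points, labels)
```

With the example data, groups[1] holds the first and last rows and groups[2] holds the second and third.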

Related

Is there a way to create a summed-area table of a matrix with just one iteration?

Constraints:
You can't iterate the matrix more than once.
If we name the matrix A then there are two of those matrices available, one is 'read-only' and the other is 'read/write'. We will use the 'read/write' matrix to construct the summed-area table.
For example this code here:
http://www.geeksforgeeks.org/submatrix-sum-queries/
Iterates 2 times: 1) summing all columns
2) summing all rows

Useful picture for summed area tables from Wikipedia (not reproduced here; it labels the corners of a rectangle as A top-left, B top-right, C bottom-left and D bottom-right):
During construction, we already have A, B and C (for cells on the top or left edge, they are zero) and want to compute D. The rectangle spans 1x1 in this case, so its sum is X, the number from the original matrix at the position of D. Hence D = C + B - A + X. A is subtracted because the areas summed in C and B both contain the area of A, which would otherwise be counted twice.
Simply iterating over the matrix and filling every cell using that formula iterates over the matrix only once, and could even be done in-place (replacing the original matrix by its SAT) if your matrix was not read-only.
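As a rough illustration of that single pass, here is a minimal Python sketch. The function name and the use of nested lists are assumptions made for the example; above, left and diag play the roles of B, C and A, and a[i][j] is X.

```python
def summed_area_table(a):
    """Build the summed-area table of a 2D list in a single pass."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    sat = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Entries outside the matrix count as zero.
            above = sat[i - 1][j] if i > 0 else 0
            left = sat[i][j - 1] if j > 0 else 0
            diag = sat[i - 1][j - 1] if i > 0 and j > 0 else 0
            # D = C + B - A + X
            sat[i][j] = a[i][j] + above + left - diag
    return sat
```

Each cell of the input is read exactly once, and each cell of the output is written exactly once, using only entries that were filled in earlier in the same pass.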

Error while converting a 3d matrix into an animated gif in Matlab

I am attempting to make a movie from a 3D matrix, which is made up of multiple 2D matrices, with time as the third dimension.
I have read the following question, which is pretty much the same, and I have attempted to do the same:
How to make a video from a 3d matrix in matlab
The 3D matrix I want to play is stored in an object instance a.
a.movie; % 3D matrix
X = permute(a.movie,[1 2 4 3]); % 4D matrix
movie = immovie(X,map); % map is the colormap you want to use
implay(movie);
I would like to know why a.movie should be permuted, and what map refers to.
How can I define 0 as blue and 100 as red?

The post you linked us to answers exactly that. immovie expects an m x n x 1 x k matrix, where m and n are the rows and columns of one slice of your 3D matrix and k is the number of slices. You currently have your 3D matrix set up as m x n x k. Therefore, by permuting, you are artificially creating a 4D matrix from your original 3D matrix. Simply put, you can think of your 3D matrix as having a singleton fourth dimension: m x n x k x 1. The job of permute here is to swap the 3rd and 4th dimensions, which is why you see the [1 2 4 3] vector in the permute call. The first and second dimensions represent the rows and columns, and you leave those untouched.
Now that answers the permute question. The map is a colour map. This maps each value in your 3D matrix to a unique colour. Basically, the colour map is an M x 3 matrix where each row corresponds to a unique colour. Each column represents a colour channel. Therefore, the first column represents the proportion of red you want, the second column is the proportion of green and the last is the proportion of blue. Keep in mind that these colours should be normalized to the range [0,1].
The goal of the colour map is to take each value in your 3D matrix, and figure out which colour that value maps to. The way to do this is to use each value in your 3D matrix exactly as it is and use this to access the row of the colour map. This row gives you the colours you want. Now, I'm assuming that your values in the 3D matrix span from 0 to 100.
Suppose you want the colours to span from blue to red. Blue has an exact colour of RGB = (0,0,1) in normalized coordinates and, similarly, red is exactly RGB = (1,0,0). Therefore, start off with RGB = (0,0,1), then linearly increase the red component while linearly decreasing the blue component until the red is 1 and the blue is 0.
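To see what that linear ramp looks like numerically, here is a small sketch in Python rather than MATLAB, purely for illustration. The function name blue_to_red is hypothetical; it returns m rows of (red, green, blue) values in [0,1].

```python
def blue_to_red(m):
    """Return m RGB triples going linearly from blue (0,0,1) to red (1,0,0)."""
    if m == 1:
        return [(0.0, 0.0, 1.0)]
    rows = []
    for k in range(m):
        t = k / (m - 1)  # 0 at the first row, 1 at the last
        rows.append((t, 0.0, 1 - t))
    return rows
```

For example, blue_to_red(3) gives blue, then an even mix of red and blue, then red.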
What we can do is figure out how many unique values there are in your matrix, then we can create our colour map that way so we can ensure that each value in your matrix gets assigned to one colour. However, this will require that a.movie be redefined to ensure that we can assign a value to a colour.
Therefore, I'd create your colour map like this:
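[unq,~,id] = unique(a.movie); % unq is sorted ascending; id gives each value's position in unq
movie_IDs = reshape(id, size(a.movie)); % same shape as a.movie, values from 1 to M
M = numel(unq);
% Red rises from 0 to 1 while blue falls from 1 to 0, so the smallest value is blue
map = [linspace(0,1,M).', zeros(M,1), linspace(1,0,M).'];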
Now, go ahead and use map with the above code to create your movie.
X = permute(movie_IDs,[1 2 4 3]); % 4D matrix
movie = immovie(X,map); % map is the colormap you want to use
implay(movie);
However, the colour map you're looking at is the jet colour map. Therefore, you can simply just create a jet colour map:
map = jet(M);
However, you must still run through each value in a.movie and replace it with an integer from 1 to M, as movie_IDs does. This ensures that there are no gaps in your data, so the movie accesses the right row of the colour map for every value.
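The relabelling step can be pictured with a short Python sketch (again an illustration, not MATLAB). The function name rank_ids is hypothetical; it imitates the first and third outputs of unique on a flat list of values.

```python
def rank_ids(values):
    """Map each value to its 1-based position among the sorted unique values."""
    unq = sorted(set(values))
    lookup = {v: i + 1 for i, v in enumerate(unq)}
    ids = [lookup[v] for v in values]
    return unq, ids
```

For instance, the values 0, 100, 50, 100 become the IDs 1, 3, 2, 3, so three colour map rows are enough and none is skipped.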
MATLAB has a bunch of built-in colour maps for you to use if you don't feel like designing your own colour map. http://www.mathworks.com/help/matlab/ref/colormap.html#inputarg_map - However, from what I see in your post, making the colour map is what you want to do.

looking for faster way to deal with cell and vector operations

I have a cell array in which each element contains a varying number of indices into a vector. For example,
C ={ [1 2 3] , [4 5], [6], [1 8 9 12 20]}
This is just an example; in the real case, C has 10^4 to 10^6 elements, and each element contains a vector of 1 to 1000 indices. I need to use each element as indices to access the corresponding elements of a vector. I am using a loop to find the mean value of the vector elements specified by each cell element:
for n=1:numel(C) % numel works whether C is a row or a column cell array
x = mean(X(C{n}));
% put x to somewhere
end
Here X is the big vector of 10000 elements. Using the loop is OK, but I am wondering whether there is a way to do the same thing without a loop. The reason I am asking is that the above code needs to be run many times, and the loop is quite slow.

Approach #1
C_num = char(C{:})-0; %// 2D numeric array from C with cells of lesser elements
%// being filled with 32, which is the ascii equivalent of space
mask = C_num==32; %// get mask for the spaces
C_num(mask)=1; %// replace the numbers in those spaces with ones, so that we
%// can index into X without throwing any out-of-range error
X_array = X(C_num); %// 2D array obtained after indexing into X with C_num
X_array(mask) = nan; %// set the earlier invalid space indices with nans
x = nanmean(X_array,2); %// final output of mean values neglecting the nans
Approach #2
lens = cellfun('length',C); %// Lengths of each cell in C
maxlens = max(lens); %// max of those lengths
%// Create a mask array with no. of rows as maxlens and columns as no. of cells.
%// In each column, we would put numbers from each cell starting from top until
%// the number of elements in that cell. The ones(true) in this mask would be the
%// ones where those numbers are to be put and zeros(false) otherwise.
mask = bsxfun(@le,[1:maxlens]',lens) ; %//'
C_num = ones(maxlens,numel(lens)); %// An array where the numbers from C are to be put
C_num(mask) = [C{:}]; %// Put those numbers from C in C_num.
%// NOTE: For performance you can also try out: double(sprintf('%s',C{:}))
X_array = X(C_num); %// Get the corresponding X elements
X_array(mask==0) = nan; %// Set the invalid locations to be NaNs
x = nanmean(X_array); %// Get the desired output of mean values for each cell
Approach #3
This would be almost same as approach #2, but with some changes at the end to avoid nanmean.
Thus, edit the last two lines from approach #2, to these -
X_array(mask==0) = 0;
x = sum(X_array)./lens;

RGB histogram using bitshift in matlab

I'm trying to create a mosaic image in MATLAB. The database consists mostly of RGB images but also some grayscale images.
I need to calculate the histograms - like in the example of the Wikipedia article about color histograms - for the RGB images and thought about using the bitshift operator in Matlab to combine the R,G and B channels.
nbins = 4;
nbits = 8;
index = bitshift(bitshift(image(:,:,1), log2(nbins)-nbits), 2*log2(nbins)) + ...
+ bitshift(bitshift(image(:,:,2), log2(nbins)-nbits), log2(nbins)) + ...
+ bitshift(image(:,:,3), log2(nbins)-nbits) + 1;
index is now a matrix of the same size as image with the index to the corresponding bin for the pixel value.
How can I sum the occurrences of all unique values in this matrix to get the histogram of the RGB image?
Is there a better approach than bitshift to calculate the histogram of an RGB image?

Calculating Indices
Using the bitshift operator seems OK. What I would personally do is create a lookup relationship that relates an RGB value to a bin value. You first have to figure out how many bins you want in each dimension. For example, let's say we wanted 8 bins in each channel. This means that we would have a total of 512 bins altogether. Assuming we have 8 bits per channel, you would produce a relationship that creates an index like so:
%// Figure out where to split our bins
accessRed = floor(256 / NUM_RED_BINS);
accessGreen = floor(256 / NUM_GREEN_BINS);
accessBlue = floor(256 / NUM_BLUE_BINS);
%// Figures out where to index the histogram
redChan = floor(red / accessRed);
greenChan = floor(green / accessGreen);
blueChan = floor(blue / accessBlue);
%// Find single index
out = 1 + redChan + (NUM_RED_BINS)*greenChan + (NUM_RED_BINS*NUM_GREEN_BINS)*blueChan;
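This assumes we have split our channels into red, green and blue and converted them to double first (dividing uint8 values in MATLAB rounds to the nearest integer, so floor would have no effect on them). We also offset our indices by 1 as MATLAB indexes arrays starting at 1. This makes more sense to me, but the bitshift operator looks more efficient.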
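The same index arithmetic for a single pixel can be sketched in Python, where // is integer floor division. The function name bin_index and its default of 8 bins per channel are assumptions for the example, and the sketch assumes each bin count divides 256 evenly.

```python
def bin_index(r, g, b, bins_r=8, bins_g=8, bins_b=8):
    """Return the 1-based histogram bin for an 8-bit RGB pixel."""
    # Width of one bin in each channel (assumes the bin count divides 256).
    width_r = 256 // bins_r
    width_g = 256 // bins_g
    width_b = 256 // bins_b
    red_chan = r // width_r
    green_chan = g // width_g
    blue_chan = b // width_b
    # Red varies fastest, then green, then blue; +1 for 1-based indexing.
    return 1 + red_chan + bins_r * green_chan + bins_r * bins_g * blue_chan
```

With 8 bins per channel, black (0,0,0) lands in bin 1 and white (255,255,255) in bin 512.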
Onto your histogram question
Now, supposing you have the indices stored in index, you can use the accumarray function to do that. accumarray takes in a set of locations in your output array, as well as a "weight" for each location. It groups together all the weights that share the same location and aggregates each group. In your case, the aggregation is sum, but accumarray isn't limited to sum: you can use any function that reduces a group of values to a single output. As an example, suppose we had the following variables:
index =
1
2
3
4
5
1
1
2
2
3
3
weights =
1
1
1
2
2
2
3
3
3
4
4
What accumarray will do is for each value of weights, take a look at the corresponding value in index, and accumulate this value into its corresponding slot.
As such, by doing this you would get (make sure that index and weights are column vectors):
out = accumarray(index, weights);
out =
6
7
9
2
2
If you take a look, any values in weights whose index is 1 get summed into the first slot of out. There are three such values: 1, 2 and 3, which sum to 6. Similarly, for the index 2 we have the values 1, 3 and 3, which give us 7.
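The accumulation can be imitated in a few lines of Python to check the numbers by hand. This is only a sketch of the default sum behaviour, not of accumarray's full interface, and the function name accumulate is hypothetical.

```python
def accumulate(index, weights):
    """Sum weights into 1-based slots given by index; unused slots stay 0."""
    out = [0] * max(index)
    for i, w in zip(index, weights):
        out[i - 1] += w  # shift from 1-based to 0-based
    return out


index = [1, 2, 3, 4, 5, 1, 1, 2, 2, 3, 3]
weights = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
out = accumulate(index, weights)
```

Running this on the example reproduces the five sums 6, 7, 9, 2 and 2. Passing a weight of 1 for every entry turns the same loop into a histogram count.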
Now, to apply this to your application, given your code, your indices look like they start at 1. To calculate the histogram of your image, all we have to do is set all of the weights to 1 and use accumarray to accumulate the entries. Therefore:
%// Make sure these are column vectors
index = index(:);
weights = ones(numel(index), 1);
%// Calculate histogram
h = accumarray(index, weights);
%// You can also do:
%// h = accumarray(index, 1); - This is a special case if every value
%// in weights is the same number
accumarray's behaviour by default invokes sum. This should hopefully give you what you need. Also, should there be any indices that are missing values, (for example, suppose the index of 2 is missing from your index matrix), accumarray will conveniently place a zero in this location when you aggregate. Makes sense right?
Good luck!

Positions on grid 'used'

I have a puzzle to solve which involves taking the size of a grid as input. The grid is always square. Then a number of points on the grid are provided, and a square on the grid is 'taken' if it lies anywhere to the left or right of, or above or below, one of those points.
E.g. imagine a 10 x 10 grid with (1,1) at the bottom left and (10,10) at the top right. If the point (2,1) is given, then the squares to its left and right (10 squares, including the point itself) and above and below it (another 9 squares) are taken. So, using simple arithmetic, if the grid is n x n then n + (n-1) squares will be taken by the first point provided.
But it gets complicated if other points are provided as input. E.g. if the next point is (5,5), then another 19 squares will be 'taken', minus those squares that overlap with the first point's squares. So it gets complex. And of course a point such as (3,1) could be provided, which overlaps even more.
Is there an algorithm for this type of problem?
Or is it simply a matter of holding a 2-dimensional array and placing an x in each taken square, then at the end just totting up the taken (or non-taken) squares? That would work, but I was wondering if there is an easier way.

Keep two sets: X (storing all x-coords) and Y (storing all y-coords). The number of squares taken will be n * (|X| + |Y|) - |X| * |Y|. This follows because each unique x-coord removes a column of n squares, and each unique y-coord removes a row of n squares. But this counts the intersections of the removed rows and columns twice, so we subtract |X| * |Y| to account for this.
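A minimal Python sketch of that formula follows. The function name count_taken and the representation of points as (x, y) tuples are assumptions made for the example.

```python
def count_taken(n, points):
    """Count taken squares on an n x n grid from the distinct x and y values."""
    xs = {x for x, _ in points}  # distinct columns hit
    ys = {y for _, y in points}  # distinct rows hit
    # Columns plus rows, minus the intersections counted twice.
    return n * (len(xs) + len(ys)) - len(xs) * len(ys)
```

For the 10 x 10 example, the single point (2,1) gives 19, and adding (5,5) gives 36, that is, 19 plus the 17 squares not already covered.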

One way to do it is to keep track of the positions that are taken in some data structure, for example a set.
At the first step this involves adding n + (n - 1) squares to that data structure.
At the second (third, fourth) step etc this involves checking for each square at the horizontal and vertical line for the given (x, y) whether it's already in the data structure. If not then you add it to the data structure. Otherwise, if the point is already in there, then it was taken in an earlier step.
We can actually see that the first step is just a special case of the other rounds because in the first round no points are taken yet. So in general the algorithm is to keep track of the taken points and to add any new ones to a data structure.
So in pseudocode:
taken_points = empty data structure (e.g., a set)
Whenever you're processing a point (x, y):
    set counter = 0
    for each point (px, py) on the horizontal and vertical lines that intersect at (x, y):
        if (px, py) is already in taken_points:
            do nothing
        otherwise:
            add (px, py) to taken_points and increment counter
You've now updated taken_points to contain all the points that are taken so far and counter is the number of points that were taken in the most recent round.
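The pseudocode translates fairly directly into Python. This is a sketch under the assumption that coordinates run from 1 to n; the function name take_point is hypothetical.

```python
def take_point(n, x, y, taken):
    """Add the row and column through (x, y) to taken; return the new count."""
    counter = 0
    # All squares on the horizontal and vertical lines through (x, y).
    line = [(px, y) for px in range(1, n + 1)]
    line += [(x, py) for py in range(1, n + 1)]
    for square in line:
        if square not in taken:
            taken.add(square)
            counter += 1
    return counter


taken = set()
first = take_point(10, 2, 1, taken)
second = take_point(10, 5, 5, taken)
```

The point (x, y) itself appears in both lines, but the set membership check means it is only counted once. Each call costs about 2n set operations, and the set grows with the number of taken squares.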

Here is a way to do it without using much space:
rowVisited[n] = {0}
colVisited[n] = {0}
totalrows = 0 and totalcol = 0 // total rows and columns visited so far
total = 0; // squares newly taken by the point (x,y)
given point (x,y)
if(!rowVisited[x]) {
    total = total + n - totalcol;
}
if(!colVisited[y]) {
    total = total + n-1 - totalrows + rowVisited[x];
}
if(!rowVisited[x]) {
    rowVisited[x] = 1;
    totalrows++;
}
if(!colVisited[y]) {
    colVisited[y] = 1;
    totalcol++;
}
print total
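A runnable Python sketch of the same bookkeeping is given below. The class name GridCounter is hypothetical, the column flags are indexed by y throughout, and the two updates are merged so that the column count uses the row total after the current row has been marked. That is an equivalent rearrangement of the arithmetic above, not the original wording.

```python
class GridCounter:
    """Tracks taken squares on an n x n grid using two flag arrays."""

    def __init__(self, n):
        self.n = n
        # Index 0 is unused so that 1-based coordinates index directly.
        self.row_visited = [False] * (n + 1)
        self.col_visited = [False] * (n + 1)
        self.total_rows = 0
        self.total_cols = 0

    def add(self, x, y):
        """Return how many squares the point (x, y) newly takes."""
        newly = 0
        if not self.row_visited[x]:
            # Line x loses the squares already covered by visited columns.
            newly += self.n - self.total_cols
            self.row_visited[x] = True
            self.total_rows += 1
        if not self.col_visited[y]:
            # total_rows now includes line x, so its square is not recounted.
            newly += self.n - self.total_rows
            self.col_visited[y] = True
            self.total_cols += 1
        return newly


grid = GridCounter(10)
counts = [grid.add(2, 1), grid.add(5, 5), grid.add(3, 1), grid.add(3, 1)]
```

Memory is two arrays of length n + 1 rather than a set of up to n * n squares, and each point is handled in constant time.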
