matlab for loop: fastest and most efficient method to reproduce large matrix - performance

My data is a 2096x252 matrix of double values. I need a for loop or an equivalent which performs the following:
Each time the matrix is reproduced, the first row is deleted and the second row becomes the first. When the loop runs again, the remaining matrix is reproduced, its first row is deleted, the next row becomes the first, and so on.
I've tried using repmat but it is too slow and tedious when dealing with large matrices (2096x252).
Example input:
1 2 3 4
3 4 5 6
3 5 7 5
9 6 3 2
Desired output:
1 2 3 4
3 4 5 6
3 5 7 5
9 6 3 2
3 4 5 6
3 5 7 5
9 6 3 2
3 5 7 5
9 6 3 2
9 6 3 2

Generally with Matlab it is much faster to pre-allocate a large array than to build it incrementally. When you know in advance the final size of the large array there's no reason not to follow this general advice.
Something like the following should do what you want. Suppose you have an array in(nrows, ncols); then
indices = [0 nrows:-1:1];
out = zeros(sum(indices),ncols);
for ix = 1:nrows
    out(1+sum(indices(1:ix)):sum(indices(1:ix+1)),:) = in(ix:end,:);
end
This worked on your small test input. I expect you can figure out what is going on.
Whether it is the fastest of all possible approaches I don't know, but I expect it to be much faster than building a large matrix incrementally.

Disclaimer:
You'll probably have memory issues with large matrices, but that is not the question.
Now, to the business:
For a given matrix A, the straightforward approach with the for loop would be:
[N, M] = size(A);
B = zeros(sum(1:N), M);
offset = 1;
for i = 1:N
    B(offset:offset + N - i, :) = A(i:end, :);
    offset = offset + N - i + 1; % advance past the N-i+1 rows just copied
end
B is the desired output matrix.
However, this solution is expected to be slow as well, because of the for loop.
Edit: preallocated B instead of dynamically changing size (this optimization should achieve a slight speedup).
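For readers who want to check the row bookkeeping outside MATLAB, here is a minimal Python sketch of the same preallocate-then-fill idea. The function name `stack_suffixes` and the list-of-lists representation are illustrative assumptions, not part of either answer.

```python
def stack_suffixes(rows):
    """Stack rows[0:], rows[1:], ..., rows[n-1:] vertically into one list."""
    n = len(rows)
    total = n * (n + 1) // 2          # n + (n-1) + ... + 1 output rows
    out = [None] * total              # preallocate the output once
    offset = 0
    for i in range(n):
        block = n - i                 # number of rows in the i-th suffix
        out[offset:offset + block] = rows[i:]
        offset += block
    return out


# The small example from the question
example = [[1, 2, 3, 4], [3, 4, 5, 6], [3, 5, 7, 5], [9, 6, 3, 2]]
stacked = stack_suffixes(example)
```

Note that this sketch stores references to the original row lists rather than copies, so it only illustrates the indexing, not MATLAB's memory behaviour.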

Related

Can you check for duplicates by taking the sum of the array and then the product of the array?

Let's say we have an array of size N with values from 1 to N inside it. We want to check if this array has any duplicates. My friend suggested two ways that I showed him were wrong:
Take the sum of the array and check it against the sum 1+2+3+...+N. I gave the example 1,1,4,4 which proves that this way is wrong since 1+1+4+4 = 1+2+3+4 despite there being duplicates in the array.
Next he suggested the same thing but with multiplication. i.e. check if the product of the elements in the array is equal to N!, but again this fails with an array like 2,2,3,2, where 2x2x3x2 = 1x2x3x4.
Finally, he suggested doing both checks, and if one of them fails, then there is a duplicate in the array. I can't help but feel that this is still incorrect, but I can't prove it to him by giving him an example of an array with duplicates that passes both checks. I understand that the burden of proof lies with him, not me, but I can't help but want to find an example where this doesn't work.
P.S. I understand there are many more efficient ways to solve such a problem, but we are trying to discuss this particular approach.
Is there a way to prove that doing both checks doesn't necessarily mean there are no duplicates?
Here's a counterexample: 1,3,3,3,4,6,7,8,10,10
Found by looking for a pair of composite numbers with factorizations that change the sum & count by the same amount.
I.e., 9 -> 3, 3 reduces the sum by 3 and increases the count by 1, and 10 -> 2, 5 does the same. So by converting 2,5 to 10 and 9 to 3,3, I leave both the sum and count unchanged. Also of course the product, since I'm replacing numbers with their factors & vice versa.
Here's a much longer one.
24 -> 2*3*4 increases the count by 2 and decreases the sum by 15
2*11 -> 22 decreases the count by 1 and increases the sum by 9
2*8 -> 16 decreases the count by 1 and increases the sum by 6.
We have a second 2 available because of the factorization of 24.
This gives us:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24
has the same sum, product, and count of elements as
1,3,3,4,4,5,6,7,9,10,12,13,14,15,16,16,17,18,19,20,21,22,22,23
In general you can find these by finding all factorizations of composite numbers, seeing how they change the sum & count (as above), and choosing changes in both directions (composite <-> factors) that cancel out.
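The two counterexamples above can be checked mechanically. The following is a small Python sketch; the helper name `same_sum_and_product` is an assumption made for illustration.

```python
from math import prod


def same_sum_and_product(candidate, n):
    """True if candidate has n elements whose sum and product match 1..n."""
    reference = range(1, n + 1)
    return (len(candidate) == n
            and sum(candidate) == sum(reference)
            and prod(candidate) == prod(reference))


def has_duplicates(candidate):
    return len(set(candidate)) < len(candidate)


short_example = [1, 3, 3, 3, 4, 6, 7, 8, 10, 10]
long_example = [1, 3, 3, 4, 4, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 16,
                17, 18, 19, 20, 21, 22, 22, 23]

# Both lists contain duplicates and still pass both checks.
print(has_duplicates(short_example), same_sum_and_product(short_example, 10))  # True True
print(has_duplicates(long_example), same_sum_and_product(long_example, 24))    # True True
```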
I just wrote a simple, not very efficient brute-force function. It shows that, for example, the sequence
1 2 4 4 4 5 7 9 9
has the same sum and product as
1 2 3 4 5 6 7 8 9
For n = 10 there are more such sequences:
1 2 3 4 6 6 6 7 10 10
1 2 4 4 4 5 7 9 9 10
1 3 3 3 4 6 7 8 10 10
1 3 3 4 4 4 7 9 10 10
2 2 2 3 4 6 7 9 10 10
My write-only c++ code is here: https://ideone.com/2oRCbh
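A brute-force search along these lines can be sketched in a few lines of Python. This is not the linked C++ code, only an assumed equivalent: it enumerates every sorted multiset of n values drawn from 1..n and keeps those that contain a duplicate yet match both the sum and the product of 1..n. The name `find_collisions` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import factorial, prod


def find_collisions(n):
    """Return sorted n-tuples over 1..n, with duplicates, whose sum and
    product both equal those of 1, 2, ..., n."""
    target_sum = n * (n + 1) // 2
    target_product = factorial(n)
    hits = []
    for candidate in combinations_with_replacement(range(1, n + 1), n):
        if len(set(candidate)) == n:
            continue  # n distinct values from 1..n is the sequence 1..n itself
        if sum(candidate) == target_sum and prod(candidate) == target_product:
            hits.append(candidate)
    return hits


for hit in find_collisions(9):
    print(*hit)
```

The search space grows quickly with n, so this sketch is only practical for small n such as 9 or 10.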

Append matrix to another matrix in Matlab

I have a matrix M=[4 3 2 1;1 2 3 4]. I want to append different size matrices at each iteration:
M=[4 3 2 1;1 2 3 4];
for i=1:t
newM=createNewMatrix;
M=[M;newM];
end
newM can be [] or a Nx4 matrix. This is very slow though. What is the fastest way to do this?
Update
Pre-allocating would look like this?
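M=zeros(200000,4);
M(1:2,:)=[4 3 2 1;1 2 3 4];
start=3; % next free row after the two initial rows
for i=1:t
    newM=createNewMatrix;
    size_of_newM=size(newM,1);
    finish=start+size_of_newM-1;
    M(start:finish,:)=newM;
    start=finish+1; % move past the rows just written
end
M=M(1:start-1,:); % trim the unused preallocated rows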
As suggested, preallocation gives the biggest boost.
Using cell arrays is another good approach and could be implemented like this:
M = cell(200000, 1);
M{1} = [4 3 2 1; 1 2 3 4];
for t=2:200000
i = randi(3)-1;
M{t}=rand(i,4);
end
MC = vertcat(M{:});
In principle, you generate a cell array with an arbitrarily long array in each cell and then concatenate them afterwards.
This worked for me nearly twice as fast as your preallocation update. On the other hand, this still was only around one second for the example with 200k iterations...
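The collect-then-concatenate pattern is not specific to MATLAB. As a rough illustration, here is the same idea in Python; `create_new_rows` is a hypothetical stand-in for `createNewMatrix`, and the timings quoted above do not carry over to this sketch.

```python
import random
from itertools import chain


def create_new_rows(rng):
    """Hypothetical stand-in for createNewMatrix: 0, 1 or 2 rows of 4 values."""
    return [[rng.random() for _ in range(4)] for _ in range(rng.randrange(3))]


def collect_rows(iterations, seed=0):
    rng = random.Random(seed)
    chunks = [[[4, 3, 2, 1], [1, 2, 3, 4]]]    # first chunk is the initial M
    for _ in range(iterations):
        chunks.append(create_new_rows(rng))    # cheap: no copying of old rows
    return list(chain.from_iterable(chunks))   # one concatenation at the end


M = collect_rows(1000)
```

The point of the design is that each iteration only appends a chunk to a container, and the rows are joined once at the end instead of re-copying the growing result on every pass.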

cut signal (interlinked) and compute these parts

I'm a French engineering student in the signal processing field, doing my internship in a neuroscience laboratory. I have to process a lot of brain-activity data with Matlab, so one of my main tasks is optimizing the code. I'm now stuck on a problem that I can't solve, and I can't find anything about it on the web. Here is my problem:
For example the matrix a :
a = [ 1 2 3 4 5; 6 7 8 9 10;11 12 13 14 15]
Each row holds the data of one signal (so here we have 3 signals). For each signal/row, I want to cut the vector into overlapping blocks of the same length.
For instance, for signal 1, S1 = [1 2 3 4 5], I want to extract the blocks S1_1 = [1 2 3], S1_2 = [2 3 4], and S1_3 = [3 4 5] and run a computation on every sub-block.
My first idea was to use a nested loop like this:
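[nrow, ncol] = size(a);
result = zeros(nrow, ncol-2); % one result per block of length 3
for i = 1 : nrow
    for j = 3 : ncol
        sub_block = a(i, (j-2):j); % block of 3 samples ending at column j
        result(i, j-2) = compute(sub_block);
    end
end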
BUT as I said, I have to process a lot of data, so I want to avoid for-loops. I'm looking for an algorithm that removes these for-loops, but I don't know how to do it...
I looked at the function 'reshape', but it gives me sub-blocks like S1_1 = [1 2 3], S1_2 = [4 5 6]. I can't use it because the sub-block S1_2 then mixes data from signal 1 and signal 2.
Then I looked at the function 'blockproc', but I didn't really understand how it works, and I'm not convinced it can help me...
So, I hope you understand my problem and can help me or point me toward a solution.
In addition to @Ziyao Wei's suggestion, you could alternatively use im2col:
>> S = im2col(a', [3 1])
S =
1 2 3 6 7 8 11 12 13
2 3 4 7 8 9 12 13 14
3 4 5 8 9 10 13 14 15
Where S(:, 3*k-2:3*k) for k = 1:data_rows are the desired sub-blocks of row k of your data (a).
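To make the sliding-block layout concrete without the Image Processing Toolbox, here is a minimal Python sketch of the same extraction. The names `sliding_blocks` and `apply_to_blocks` are assumptions for illustration, and `sum` merely stands in for the asker's unspecified `compute` function.

```python
def sliding_blocks(signal, width):
    """Return every contiguous block of `width` samples, stepping by one."""
    return [signal[start:start + width]
            for start in range(len(signal) - width + 1)]


def apply_to_blocks(signals, width, compute):
    """Apply `compute` to every sliding block of every signal (row)."""
    return [[compute(block) for block in sliding_blocks(row, width)]
            for row in signals]


a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
blocks = sliding_blocks(a[0], 3)      # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
result = apply_to_blocks(a, 3, sum)   # sum stands in for compute
```

Because each row is sliced on its own, a block never mixes samples from two different signals.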
Blockproc seems to be doing a block operation rather than a sliding operation. A bit of digging around gives this:
http://www.mathworks.com/help/images/ref/nlfilter.html
But it seems the image processing toolbox is needed.
Also this might help:
http://dovgalecs.com/blog/matlab-sliding-window-easy-and-painless/
In general, try to search for sliding-window or convolution, and try to see if something shows up.
You could probably find another way of doing your loop with the arrayfun function, but it will not necessarily be faster; see the question "arrayfun can be significantly slower than an explicit loop in matlab. Why?"
Thanks a lot for all your (quick) answers! I didn't expect to get an answer so quickly!
I have the Image Processing Toolbox, and your different methods are great! I'll use im2col because it is the "slower" solution for me and it lets me remove one for-loop.
Thank you for your help

effective way of transformation from 2D to 1D vector

I want to create a 1D vector in MATLAB from a given matrix. For this I have implemented the following algorithm, which does it the trivial way:
% create one dimensional vector from 2D matrix
function [x]=one_dimensional(b,m,n)
x=zeros(1,m*n); % preallocate the output row vector
k=1;
for i=1:m
    for t=1:n
        x(k)=b(i,t); % copy elements row by row
        k=k+1;
    end
end
end
When I run it on the following example, it seems to do its task fine:
b=[2 1 3;4 2 3;1 5 4]
b =
2 1 3
4 2 3
1 5 4
>> one_dimensional(b,3,3)
ans =
2 1 3 4 2 3 1 5 4
But I know that, in general, loops like this are not a good approach in MATLAB because of their performance. What would be an efficient way to transform a matrix into a row/column vector? I only care about performance. Thanks very much.
You can use the (:) operator. It works column by column, not row by row, so you need to transpose first using the .' operator, for example:
b=b.';
b(:)'
ans=
2 1 3 4 2 3 1 5 4
I transposed again to get a row output (otherwise it will be the same vector, only in column form).
Alternatively, this is an option (probably a slower one):
reshape(b.',1,[])
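The row-versus-column ordering is the only subtle part here. As an illustration outside MATLAB, the following Python sketch flattens the same example both ways; the function names are assumptions. The column-major version matches what `b(:)` would give without the transpose, and the row-major version matches the output requested in the question.

```python
from itertools import chain


def flatten_row_major(matrix):
    """Concatenate the rows one after another."""
    return list(chain.from_iterable(matrix))


def flatten_column_major(matrix):
    """Concatenate the columns one after another (MATLAB's native order)."""
    return list(chain.from_iterable(zip(*matrix)))


b = [[2, 1, 3], [4, 2, 3], [1, 5, 4]]
print(flatten_row_major(b))     # [2, 1, 3, 4, 2, 3, 1, 5, 4]
print(flatten_column_major(b))  # [2, 4, 1, 1, 2, 5, 3, 3, 4]
```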

trying to find the lowest average height in this .dat file of numbers

I'm trying to fit a swimming pool onto this piece of terrain. The first line of the file is the terrain size (10x10 in this case) and the last line is the size of the pool (2x2).
I've figured out how to read in the terrain and get its mean and standard deviation, but now I need to find the lowest average height. I know I need to use a while loop, but I don't know how to go about this. Can anyone help me?
10
1 1 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
21
Here are two answers showing different styles. The first is faster (only important for HUGE terrain sizes), but less "Ruby-esque"; the second is more functional, but creates extra intermediary data. For your own best education, I encourage you to ensure that you understand these thoroughly, and choose how to proceed in a way that is best for you.
Also, I've assumed that the 21 you have in your question is a mistake, and you meant to have a 2 there.
First, both solutions start with the same code that creates an array of arrays for the terrain:
# Load the text file as an array of strings
lines = IO.readlines('pool.txt')
# Turn it into an array of arrays of numbers
terrain = lines.map{ |s| s.scan(/\d+/).map(&:to_i) }
# Throw out the silly grid size; we'll infer it from real data instead!
terrain.shift
# Take the last line (pool size) out of the terrain
pool_size = terrain.pop.first
The first solution walks through the terrain and calculates the average for each sub-grid, keeping track of the lowest number:
# For fun, we'll allow terrain that doesn't have to be square
rows = terrain.length
cols = terrain.first.length
best_size = Float::INFINITY
# The last valid corner is at rows-pool_size (inclusive), likewise for cols
0.upto(rows-pool_size) do |y|
  0.upto(cols-pool_size) do |x|
    # x,y is the upper left corner of a valid pool_size × pool_size grid
    average = 0.0
    0.upto(pool_size-1) do |m|
      0.upto(pool_size-1) do |n|
        # Add up each point in the sub-grid
        average += terrain[y+n][x+m]
      end
    end
    # The number of points we added is the square of the size
    average /= (pool_size*pool_size)
    # Mark this as the best seen so far
    best_size = average if average < best_size
  end
end
p best_size
#=> 1.25
The second solution finds all the sub-grids, and then uses the Enumerable#min_by method to find the best. We also create a method for calculating the average on an array of numbers, just for fun and more self-describing code:
# See http://ruby-doc.org/stdlib-1.9.3/libdoc/matrix/rdoc/Matrix.html
require 'matrix'
class Matrix
# Average all values in the array (as a float)
def average
parts = to_a.flatten
parts.inject(:+) / parts.length.to_f
end
end
# Hey look, a nice 2D grid of elevations!
terrain = Matrix[ *terrain ]
# Create an array of matrices, each one representing a possible pool
rows = 0..(terrain.row_size - pool_size)
cols = 0..(terrain.column_size - pool_size)
pools = rows.flat_map{|x| cols.map{ |y| terrain.minor(x,pool_size,y,pool_size) } }
# Find the lowest pool by calling the above 'average' method on each
lowest = pools.min_by(&:average)
p lowest, lowest.average
#=> Matrix[[1, 1], [1, 2]]
#=> 1.25
On my computer the simple array-of-arrays method takes ~0.6s to find the lowest 3x3 pool in a random 400×400 terrain, while the matrix technique takes ~1.3s. So the matrix style is more than twice as slow, but still plenty fast for your assignment. :)
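For comparison, here is a language-neutral sketch of the first approach in Python, run on the terrain from the question. It assumes, as the answer does, that the pool size is 2; the function name `lowest_average` is hypothetical. The loops run up to and including `rows - pool_size` and `cols - pool_size`, so pools touching the bottom and right edges are considered.

```python
def lowest_average(terrain, pool_size):
    """Return (average, row, col) of the pool_size x pool_size sub-grid
    with the lowest mean height; row/col mark its upper-left corner."""
    rows, cols = len(terrain), len(terrain[0])
    best = None
    for y in range(rows - pool_size + 1):
        for x in range(cols - pool_size + 1):
            total = sum(terrain[y + dy][x + dx]
                        for dy in range(pool_size)
                        for dx in range(pool_size))
            average = total / (pool_size * pool_size)
            if best is None or average < best[0]:
                best = (average, y, x)
    return best


# Terrain from the question: one special first row, six regular rows,
# and three rows ending in 12 12 12.
terrain = ([[1, 1, 3, 4, 5, 6, 7, 8, 9, 10]]
           + [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] for _ in range(6)]
           + [[1, 2, 3, 4, 5, 6, 7, 12, 12, 12] for _ in range(3)])
print(lowest_average(terrain, 2))  # (1.25, 0, 0)
```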
It's Ruby. You probably want to use iterators, not while loops.
But do your own homework. You'll learn more.
