LU decomposition of a square matrix in MATLAB using Gaussian elimination

I'm trying to create a program that takes a square (n-by-n) matrix as input, and if it is invertible, will LU decompose the matrix using Gaussian Elimination.
Here is my problem: in class we learned that it is better to swap rows so that the pivot is always the largest number (in absolute value) in its column. For example, if the matrix is A = [1,2;3,4], then after switching rows it becomes [3,4;1,2], and we can proceed with the Gaussian elimination.
My code works properly for matrices that don't require row changes, but for ones that do, it does not. This is my code:
function newgauss(A)
P=eye(rows,columns); %P is permutation matrix
if(det(A)==0) %% determinant is 0 means no unique solution
disp('No solutions or infinite number of solutions')
maxi=0;%%find maximum abs value in column pivot
for i=pivot+1:rows
end %%if needed then switch
end %%Grade the column pivot using gauss elimination
for i=pivot+1:rows
disp('PA is:');
disp('LU is:');
Clarification: since we are switching rows, we are looking to decompose P (permutation matrix) times A, and not the original A that we had as input.
Explanation of the code:
First I check whether the matrix is invertible; if it isn't, stop. If it is, the pivot is (1,1)
I find the largest number in column 1, and switch rows
Grade column 1 using Gaussian elimination, turning all but the spot (1,1) to zero
Pivot is now (2,2), find largest number in column 2... Rinse, repeat

Your code seems to work fine from what I can tell, at least for the basic examples A=[1,2;3,4] or A=[3,4;1,2]. Change your function definition to:
function [L,U,P] = newgauss(A)
so you can output your calculated values (much better than using disp, but this shows the correct results too). Then you'll see that P*A = L*U. Maybe you were expecting L*U to equal A directly? You can also confirm that you are correct via Matlab's lu function:
[L,U,P] = lu(A);
Permutation matrices are orthogonal matrices, so P^-1 = P^T. If you want to get back A in your code, you can do:
A = P'*L*U;
Similarly, using Matlab's lu with the permutation matrix output, you can do:
[L,U,P] = lu(A);
A = P'*L*U;
(You should also use error or warning rather than how you're using disp in checking the determinant, but they probably don't teach that.)

Note that the det function is implemented using an LU decomposition itself to compute the determinant... recursive anyone :)
Aside from that, there is a reminder towards the end of the page which suggests using cond instead of det to test for matrix singularity:
Testing singularity using abs(det(X)) <= tolerance is not
recommended as it is difficult to choose the correct tolerance. The
function cond(X) can check for singular and nearly singular
matrices.
COND uses the singular value decomposition (see its implementation: edit cond.m)

For anyone finding this in the future and needing a working solution:
The OP's code doesn't contain the logic for switching elements in L when creating the permutation matrix P. The adjusted code that gives the same output as Matlab's lu(A) function is:
function [L,U,P] = newgauss(A)
P=eye(rows,columns); %P is permutation matrix
tol = 1E-16; % I believe this is what matlab uses as a warning level
if( rcond(A) <= tol) %% bad condition number
error('Matrix is nearly singular')
maxi=0;%%find maximum abs value in column pivot
for i=pivot+1:rows
end %%if needed then switch
% change elements in L-----
if pivot >= 2
end %%Grade the column pivot using gauss elimination
for i=pivot+1:rows
Hope this helps someone stumbling upon this in the future.
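The bookkeeping described in this answer (swap rows of U and P, and also swap the multipliers already stored in L) can be sketched as follows. This is a minimal pure-Python sketch rather than the MATLAB code from the thread; the names lu_partial_pivot and matmul are made up for the illustration, and matrices are assumed to be square lists of lists.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of lists (helper for checking)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def lu_partial_pivot(A):
    """Return (L, U, P) such that P*A = L*U, using partial (row) pivoting."""
    n = len(A)
    U = [[float(v) for v in row] for row in A]   # working copy, becomes U
    L = [[0.0] * n for _ in range(n)]            # multipliers go below the diagonal
    P = [[float(i == j) for j in range(n)] for i in range(n)]

    for pivot in range(n):
        # Row (at or below the pivot) with the largest |entry| in this column.
        best = max(range(pivot, n), key=lambda r: abs(U[r][pivot]))
        if U[best][pivot] == 0.0:
            raise ValueError("matrix is singular")
        if best != pivot:
            U[pivot], U[best] = U[best], U[pivot]
            P[pivot], P[best] = P[best], P[pivot]
            # L only holds multipliers in columns < pivot so far, so swapping
            # whole rows swaps exactly the already computed part of L.
            L[pivot], L[best] = L[best], L[pivot]
        # Zero out the entries below the pivot.
        for i in range(pivot + 1, n):
            m = U[i][pivot] / U[pivot][pivot]
            L[i][pivot] = m
            for j in range(pivot, n):
                U[i][j] -= m * U[pivot][j]

    for i in range(n):
        L[i][i] = 1.0                            # unit diagonal of L
    return L, U, P
```

For a matrix such as [[1,2,3],[4,5,6],[7,8,10]], which needs a swap in both the first and the second column, matmul(P, A) and matmul(L, U) should agree up to rounding, and multiplying by the transpose of P recovers A.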


I'm trying to simulate the heat distribution on an infinite plate over time. For this purpose, I've written a Scilab script. The crucial part of it is the calculation of the temperature at all plate points, which has to be done for every time instant I want to observe:
for j=2:S-1
for i=2:S-1
heat(i, j) = tcoeff*10000*(plate(i-1,j) + plate(i+1,j) - 4*plate(i,j) + plate(i, j-1) + plate(i, j+1)) + plate(i,j);
end
end
The problem is that for a 100x100-point plate (inner part only, without boundary conditions) I have to loop 98x98 = 9604 times per time step, calculating the heat at a given i,j point on each iteration. If I'd like to observe that for, say, 100 seconds with a 1 s step, I have to repeat it 100 times, giving 960,400 iterations in total. That takes quite a long time, and I'd like to avoid it. Up to a 50x50 plate, it all happens in a reasonable 4-5 seconds.
Now my question is - is it necessary to do all this using for loops? Is there any built-in aggregate function in Scilab, that will let me do this for all elements of a matrix? The reason I haven't found a way yet, is that the result for every point depends on the values of other matrix points, and that made me do it with nested loops. Any ideas on how to make it faster appreciated.
It seems to me that you want to compute a 2D cross-correlation ("intercorrelation") of your heat field with a fixed diffusion pattern. This pattern can be thought of as a "filter" kernel, which is a common way to modify images with a linear filter matrix. Your "filter" is:
F = [0,1,0; 1,-4,1; 0,1,0]
If you install the Image Processing Design Toolbox (IPD) you will have a MaskFilter function to do this 2D cross-correlation.
//your solution with nested for loops
for j=2:S-1
for i=2:S-1
heat(i, j) = tcoeff*10000*(plate(i-1,j)+plate(i+1,j)-..
4*plate(i,j)+plate(i,j-1)+plate(i, j+1))+plate(i,j);
mprintf("\nNested for loops: %f s (100 %%)",T0);
//optimised nested for loop
F=[0,1,0;1,-4,1;0,1,0]; //"filter" matrix
for j=2:S-1
for i=2:S-1
mprintf("\nNested for loops optimised: %f s (%.2f %%)",T2,T2/T0*100);
//MaskFilter from IPD toolbox
mprintf("\nWith MaskFilter: %f s (%.2f %%)",T3,T3/T0*100);
disp(heat3(1:10,1:10)-heat(1:10,1:10),"Difference of the results (heat3-heat):");
Please note that MaskFilter pads the image (the original matrix) before applying the filter, and as far as I know it uses a "mirror" array across the border. You should check whether this behaviour is appropriate for you or not.
The speedup is roughly 300-fold (the execution time is 0.32% of your original code). Is that fast enough?
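To make the "filter" view concrete, here is a small pure-Python sketch (written for clarity, not speed, and not Scilab) that checks that applying the 3x3 kernel F to each interior point gives the same update as the explicit five-point formula. The function names are made up for the illustration, c stands for the factor tcoeff*10000, and the boundary is simply left unchanged, which differs from the mirror padding mentioned above.

```python
# The 3x3 "filter" kernel of the five-point stencil.
F = [[0, 1, 0],
     [1, -4, 1],
     [0, 1, 0]]


def step_direct(plate, c):
    """One time step using the explicit five-point formula (interior only)."""
    S = len(plate)
    heat = [row[:] for row in plate]          # boundary values are kept as-is
    for i in range(1, S - 1):
        for j in range(1, S - 1):
            heat[i][j] = c * (plate[i - 1][j] + plate[i + 1][j]
                              - 4 * plate[i][j]
                              + plate[i][j - 1] + plate[i][j + 1]) + plate[i][j]
    return heat


def step_kernel(plate, c, kernel=F):
    """Same step written as a cross-correlation with a 3x3 kernel."""
    S = len(plate)
    heat = [row[:] for row in plate]
    for i in range(1, S - 1):
        for j in range(1, S - 1):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += kernel[di + 1][dj + 1] * plate[i + di][j + dj]
            heat[i][j] = plate[i][j] + c * acc
    return heat
```

A library routine such as MaskFilter performs the inner double sum in compiled code, which is where the speedup comes from.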
In theory it could be done with two 2D Fourier transforms (with Scilab's built-in mfft, maybe), but it might not be faster than this.
Please consider that there is a big difference between vectorizing an operation and parallel computation, as I have explained elsewhere. Although vectorizing might improve performance a little, that's not comparable to what you can achieve through GPU computing, for example (e.g. OpenCL). I will try to explain a vectorized form of your code without going too much into the details. Consider these as given:
S = ...;
tcoeff = ...;
function Plate = plate(i, j)
function Heat = heat(i, j)
Now you could define a meshgrid:
x = 2 : S - 1;
y = 2 : S - 1;
[M, N] = meshgrid(x,y);
Result = feval(M, N, heat);
feval is the key here: it evaluates the heat function over the M and N matrices.
Your scheme is a finite-difference scheme for the Laplacian operator applied to a rectangular grid. If you choose a row-wise or column-wise numbering of your degrees of freedom (here the plate(i,j)) in order to treat them as a vector, then applying your "discrete" Laplacian can be done by multiplying by a sparse matrix on the left (which is very fast). This is particularly well explained in the following document:
The implementation is described in Matlab but is easily translated into Scilab.
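As a rough illustration of the sparse-matrix idea, the sketch below numbers the grid row-wise (k = i*S + j), stores the discrete Laplacian as (row, column, value) triplets, and applies it to the flattened plate. It is plain Python with made-up function names, not the implementation from the linked document; in Scilab or MATLAB the triplets would be handed to sparse and the loop in apply_sparse would become a single matrix-vector product.

```python
def laplacian_triplets(S):
    """Triplets of the 5-point Laplacian on an S x S grid, interior rows only."""
    rows, cols, vals = [], [], []
    for i in range(1, S - 1):
        for j in range(1, S - 1):
            k = i * S + j                       # row-wise numbering of (i, j)
            # centre, up, down, left, right neighbours
            for dk, v in ((0, -4.0), (-S, 1.0), (S, 1.0), (-1, 1.0), (1, 1.0)):
                rows.append(k)
                cols.append(k + dk)
                vals.append(v)
    return rows, cols, vals


def apply_sparse(rows, cols, vals, x):
    """y = M*x for a matrix M given as triplets."""
    y = [0.0] * len(x)
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def heat_step(plate_flat, triplets, c):
    """One explicit time step: plate + c * Laplacian(plate)."""
    lap = apply_sparse(*triplets, plate_flat)
    return [p + c * l for p, l in zip(plate_flat, lap)]
```

Boundary points have no rows in the matrix, so they keep their values; the triplets only need to be built once and can be reused for every time step.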

I have a scalar function f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2) which receives two 2-dimensional vectors as input (norm here implements the Euclidean norm). The values of x,i range in 1:w and the values y,j range in 1:h. I want to create a cell array X such that X{x,y} will contain a w x h matrix such that X{x,y}(i,j) = f([x,y],[i,j]). This can obviously be done using 4 nested loops like so:
for x=1:w;
for y=1:h;
for i=1:w
for j=1:h
X{x,y}(i,j) = f([x,y],[i,j]);
end, end, end, end
This is however extremely inefficient. I would very much appreciate an efficient way to create X.
One way to do this is to remove the two innermost loops and replace them with a vectorised version. By the look of your f function this shouldn't be too bad.
First we need to construct two matrices containing 1 to w on every row and 1 to h on every column, like so
This is going to represent the inner two loops, and the transpose will allow us to get all combinations. Now we can vectorise the calculation (f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2)):
for x=1:w;
for y=1:h;
Where we have computed the Euclidean norm for all pairs of nodes in the inner loops at once.
Some discussion and code
The trick here is to perform the norm-calculations with numeric arrays and save the results into a cell array version as late as possible. For performing the norm-calculations you can take help of ndgrid, bsxfun and some permute + reshape to give it the "shape" as needed for the final cell array version. So, here's the vectorized approach to perform these tasks -
%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);
%// Get the norm values
normvals = sqrt(bsxfun(@minus,xi(:),xi(:).').^2 + ...
    bsxfun(@minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2);
%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);
%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));
Benchmarking and runtimes
After improving the original loopy code with pre-allocation for X and inlining the function f, runtime benchmarks were run against the proposed vectorized approach with data sizes w, h = 60, and the runtime results thus obtained were -
----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.
This suggests a whopping speedup of close to 20x with the proposed solution!
For extremely huge datasizes
If you are dealing with huge data sizes, bsxfun may not have enough memory to work with; it is known to trade a lot of memory for a fast vectorized solution. For such cases, you can use the following loopy approach to replace the normvals calculation listed in the earlier bsxfun-based solution -
%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end
It seems to me that when you run through the cycle for x=w, y=h, you are calculating all the values you need at once. So you don't need to recalculate them. Once you have this:
for i=1:w
for j=1:h
Then, e.g. X{1,1} is just temp(1,1), X{2,2} is just temp(1:2,1:2), and so on. If you can vectorise the calculation of f (norm here is just the Euclidean norm of that vector?) then it will get even simpler.

I have a nxn singular matrix. I want to add k rows (which must be from the standard basis e1, e2, ..., en) to this matrix such that the new (n+k)xn matrix is full column rank. The number of added rows k must be minimum and they can be added in any order (not just e1, e2 ,..., it can be e4, e10, e1, ...) as long as k is minimum.
Does anybody know a simple way to do this? Any help is appreciated.
You can achieve this by doing a QR decomposition with column pivoting, then taking the transpose of the last n-rank(A) columns of the permutation matrix.
In MATLAB, this is achieved with the qr function (see the MATLAB documentation):
Each row of transpose(E(:,end-r+1:end)) will be a member of the standard basis, the rank of newA will be n, and this is also the minimal number of standard basis vectors you will need to do so.
Here is how this works:
QR decomposition with column pivoting is a standard procedure to decompose a matrix A into products:
A*E = Q*R
where Q is an orthogonal matrix if A is real, or a unitary matrix if A is complex; R is an upper triangular matrix, and E is a permutation matrix.
In short, the permutations are chosen so that the diagonal elements are larger than the off-diagonals in the same row, and that size of the diagonal elements are non-increasing. More detailed description can be found on the netlib QR factorization page.
Since Q and E are both orthogonal (or unitary) matrices, the rank of R is the same as the rank of A. To bring up the rank of A, we just need to find ways to increase the rank of R; and this is much more straightforward thanks to the structure of R as the result of pivoting and the fact that it is upper triangular.
Now, with the requirement placed on the pivoting procedure, if any diagonal element of R is 0, the entire row has to be 0. The n-rank(A) rows of 0s at the bottom of R are responsible for the nullity. If we replaced the lower right corner with an identity matrix, the new matrix would be full rank. Well, we cannot really do the replacement, but we can append the rows to the bottom of R and form a new matrix that has the same (full) rank:
B = [ 0 I ] => newR = [ R ; B ]
Here the dimensionality of I is the nullity of A and that of R.
It is readily seen that rank(newR)=n. Then we can also define a new unitary Q matrix by expanding its dimensionality in a trivial manner:
newQ=[Q 0 ; 0 I]
With that, our new rank n matrix can be obtained as
newA=newQ*newR*transpose(E)=[Q*R ; B ]*transpose(E) =[A ; B*transpose(E)]
Note that B is [0 I] and E is a permutation matrix, so B*transpose(E) is simply the transpose of the last n-rank(A) columns of E, and thus a set of rows taken from the standard basis, and that's just what you wanted!
Is n very large? The simplest solution without using any math would be to try adding e_i and seeing if the rank increases. If it does, keep e_i. Proceed until finished.
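The try-each-e_i idea above can be sketched as follows. This is only an illustrative pure-Python version with made-up function names; it computes the rank by exact Gaussian elimination on fractions, so it is meant for small matrices with integer or rational entries. Every accepted row raises the rank by exactly one, so the number of added rows is n - rank(A), which is the minimum possible.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in r] for r in rows]
    ncols = len(M[0]) if M else 0
    rk = 0
    for col in range(ncols):
        # First row at or below rk with a nonzero entry in this column.
        piv = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for r in range(rk + 1, len(M)):
            f = M[r][col] / M[rk][col]
            if f != 0:
                M[r] = [a - f * b for a, b in zip(M[r], M[rk])]
        rk += 1
    return rk


def greedy_basis_rows(A):
    """Append standard basis rows e_i until the matrix has full column rank.

    Returns (added, new_matrix), where added holds the 0-based indices i
    of the rows e_i that were appended.
    """
    n = len(A[0])
    current = [list(r) for r in A]
    r = rank(current)
    added = []
    for i in range(n):
        if r == n:
            break
        e = [1 if j == i else 0 for j in range(n)]
        if rank(current + [e]) > r:      # keep e_i only if it raises the rank
            current.append(e)
            added.append(i)
            r += 1
    return added, current
```

Recomputing the rank from scratch for every candidate is wasteful for large n; the QR-based answer avoids that.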
I like @Xiaolei Zhu's solution because it's elegant, but another way to go (that's even more computationally efficient) is:
Determine if any rows, indexed by i, of your matrix A are all zero. If so, then the corresponding e_i must be concatenated.
After that process, you can simply concatenate any subset of the n - rank(A) columns of the identity matrix that you didn't add in step 1.
Rows/columns from the identity matrix can be added in any order. In general they do not need to be added in the usual order e1, e2, ... to make the matrix full rank.

I am a new student learning to use Matlab.
Could anyone please tell me whether there is a faster way, possibly without loops, to assign only two values (1 and -1) to different positions in each row of a big sparse matrix?
My code builds a matrix (bimatrix or bibimatrix below) for the following condition of a MILP problem:
f^k_{ij} <= y_{ij} for every arc (i,j) and all k ~=r; in a multi-commodity flow model.
Naive approach:
% create each row and then add to bimatrix
newrow4= zeros(1,n*(n+1)^2);
for k=1:n
for i=0:n
for j=1: n
if j~=i
%change value of some positions to -1 and 1
% add to bimatrix
bimatrix=[bimatrix; newrow4];
% change newrow4 back to zeros row.
% Generate the big sparse matrix first.
bibimatrix=zeros(n^3 ,n*(n+1)^2);
for k=1:n
for i=0:n
for j=1: n
if j~=i
%Change 2 positions in each row to -1 and 1 in each row.
With the code above in Matlab, generating this matrix with n~12 takes more than 3 s. I need to generate a larger matrix in less time.
Thank you.
Suggestion: Use sparse matrices.
You should be able to create two vectors containing the column number where you want your +1 and -1 in each row. Let's call these two vectors vec_1 and vec_2. You should be able to do this without loops (if not, I still think the procedure below will be faster).
Let the size of your matrix be (max_row X max_col). Then you can create your matrix like this:
bibimatrix = sparse(1:max_row,vec_1,1,max_row,max_col);
bibimatrix = bibimatrix + sparse(1:max_row, vec_2,-1,max_row,max_col)
If you want to see the entire matrix (which you don't, since it's huge) you can write: full(bibimatrix).
You may also do it this way:
col_vec = [vec_1, vec_2];
row_vec = [1:max_row, 1:max_row];
s = [ones(1,max_row), -1*ones(1,max_row)];
bibimatrix = sparse(row_vec, col_vec, s, max_row, max_col)
Disclaimer: I don't have MATLAB available, so it might not be error-free.
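For readers who want to see what the triplet form represents, here is a small Python sketch of the same idea. The vectors vec_1 and vec_2 are the hypothetical column-index vectors from above (0-based here, unlike MATLAB), and the dictionary keyed by (row, column) is only a stand-in for a real sparse matrix type.

```python
def build_triplets(vec_1, vec_2):
    """Row/column/value triplets with +1 at vec_1[r] and -1 at vec_2[r] in row r."""
    max_row = len(vec_1)
    rows = list(range(max_row)) * 2            # each row index appears twice
    cols = list(vec_1) + list(vec_2)
    vals = [1] * max_row + [-1] * max_row
    return rows, cols, vals


def triplets_to_dict(rows, cols, vals):
    """Assemble the triplets into a {(row, col): value} mapping.

    Entries with a repeated (row, col) are summed, which is also what
    MATLAB's sparse(i, j, v, m, n) does with duplicate indices.
    """
    M = {}
    for r, c, v in zip(rows, cols, vals):
        M[(r, c)] = M.get((r, c), 0) + v
    return M
```

The point of the triplet form is that all index arithmetic happens on plain vectors first, and the matrix is assembled once instead of being grown row by row.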

Wikipedia says we can approximate Bark scale with the equation:
b(f) = 13*atan(0.00076*f)+3.5*atan(power(f/7500,2))
How can I divide frequency spectrum into n intervals of the same length on Bark scale (interval division points will be equidistant on Bark scale)?
The best way would be to invert the function analytically (express x as a function of y). I tried doing it on paper but failed, and the WolframAlpha search bar couldn't do it either. I tried Octave's finverse function, but I got an error.
Octave says (for simpler example):
octave:2> x = sym('x');
octave:3> finverse(2*x)
error: `finverse' undefined near line 3 column 1
This is finverse description from Matlab:
There could also be a numerical way to do it. I can imagine that you just start by dividing the y axis equally and search for the ideal division points by binary search. But maybe there are some existing tools that do it?
You need to solve this equation numerically (there is no analytical inverse function). Choose equally spaced values for b and solve the equation to find the corresponding f. Bisection is somewhat slow, but a very good alternative is Brent's method.
This function can't be inverted analytically. You'll have to use some numerical procedure. Binary search would be fine, but there are more efficient ways to do these sorts of things: look into root-finding algorithms. You can apply your algorithm of choice to the equation b(f) = b_n for each of the interval endpoints b_n on the Bark scale.
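A minimal Python sketch of the bisection variant, using only the approximation formula quoted in the question. The function names and the default search bracket of 0 to 30000 Hz are assumptions made for the illustration; bisection works here because b(f) is monotonically increasing for f >= 0.

```python
import math


def bark(f):
    """Bark-scale approximation quoted in the question (f in Hz)."""
    return 13 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_inverse(b, lo=0.0, hi=30000.0, tol=1e-9):
    """Frequency f in [lo, hi] with bark(f) = b, found by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bark(mid) < b:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def bark_band_edges(f_min, f_max, n):
    """n + 1 frequencies splitting [f_min, f_max] into n equal Bark intervals."""
    b_lo, b_hi = bark(f_min), bark(f_max)
    targets = [b_lo + (b_hi - b_lo) * k / n for k in range(n + 1)]
    return [bark_inverse(b, f_min, f_max) for b in targets]
```

For example, bark_band_edges(0.0, 8000.0, 10) returns 11 edge frequencies whose Bark values are equally spaced. A faster root finder such as Brent's method could replace the loop in bark_inverse without changing the rest.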
Just so you know, in (say) octave to implement rpsmi's or David Zaslavsky's answer, you'd do something like this:
global x0 = 0.
function res = b(f)
global x0
res = 13*atan(0.00076*f)+3.5*atan(power(f/7500,2)) - x0
endfunction
function [intervals, barks] = barkintervals(left, right, n)
global x0
intervals = linspace(left, right, n);
barks = intervals;
for i = 1:n
x0 = intervals(i);
# 125*x0 is just a crude guess starting point given the values
[barks(i), fval, info] = fsolve('b', 125*x0);
endfor
endfunction
and run it like so:
octave:1> barks
octave:2> [i,bx] = barkintervals(0, 10, 10)
[... lots of output from fsolve deleted...]
i =
Columns 1 through 8:
0.00000 1.11111 2.22222 3.33333 4.44444 5.55556 6.66667 7.77778
Columns 9 and 10:
8.88889 10.00000
bx =
Columns 1 through 6:
0.0000e+00 1.1266e+02 2.2681e+02 3.4418e+02 4.6668e+02 5.9653e+02
Columns 7 through 10:
7.3639e+02 8.8960e+02 1.0605e+03 1.2549e+03
I finally decided not to use the Bark values approximation but the ideal values for the critical band centres (defined for n=1..24). I plotted them with gnuplot, and on the same graph I plotted arbitrarily chosen values for a denser set of points (for the required n>24). I adjusted the point values in Hz until both curves were approximately the same.
Of course, the answers by rpsmi and David Zaslavsky are more general and scalable.
