I am trying to write a function that produces a single solution to an underrepresented system of equations (e.g. the matrix that describes the system is wider than it is tall). In order to do this, I have been looking in the LAPACK documentation for a way to row-reduce a matrix to it's reduced-echelon form, similar to the function rref() in both Mathematica and TI calculators. The closest I came across was http://software.intel.com/en-us/forums/intel-math-kernel-library/topic/53107/ this tiny thread. This thread, however, seems to imply that simply taking the "U" upper triangular matrix (and dividing each row by the diagonal) is the same as the reduced echelon form of a matrix, which I do not believe to be the case. I could code up rref() myself, but I do not believe I could achieve the performance LAPACK is famous for.
1) Is there a better way to simply get any one specific solution to an underrepresented system?
2) If not, is there a way for LAPACK to row-reduce a matrix?
Thanks!
One often used method for this is the least square solution, see lapack's sgelsx
Related
A motivating issue, implemented in Matlab:
N = 1000;
R = zeros(2*N);
for i=0:N-1
R = R(2:end-1, 2:end-1);
end
For this code timeit() gives a time 2.9793 on my machine. It isn't really great.
By canonical way I mean a discussion that isn't just acceptable, but a performant implementation that respects very large matrices reduced. I would be very appreciative of any answer, referrals to other discussions or literature.
As for language, I am not really a programmer, this question is motivated by a mathematics inquiry and I have encountered performance issues implementing any such reduction process in Matlab. Is there a solution to this in Matlab, or must one delve into the scary depths of C/C++?
One note: One may ask, why not just keep the matrix as is and consider parts of it as needed? To clarify, the reduction process in practice of course depends on the actual (nonzero) values of the elements, e.g. by processing the matrix in 2x2 blocks, and the removal of edge-values is needed to prepare the matrix for then next reduction step.
R(2:end-1, 2:end-1) is the correct way of extracting the part of the array that is all values except the ones at the edges. This requires copying the data, so will take some time. There is no legal way around the copy, and no alternative for extracting a part of the array. (subsref might seems like an alternative, but is the function that is internally called for the given syntax.)
As for illegal ways, you could try James Tursa’s sharedchild from the MATLAB FileExchange. It allows to create an array that references subsets of the data of another array. James is well known in the MATLAB user community as one of the people reverse-engineering the system and bending it to his will. This is solid code. But every version of MATLAB introduces new changes to the infrastructure, so upgrading MATLAB might break your program if you use this code.
You don't need the for loop. If you want to remove L elements from the borders, simply do:
R=R(L+1:end-L, L+1:end-L)
I am surprised you didn't get an error with that code. I think you should end up with an empty matrix at the end of the loop.
Yesterday I came across this question and for the first time noticed that the weights of the linear layer nn.Linear need to be transposed before applying matmul.
Code for applying the weights:
output = input.matmul(weight.t())
What is the reason for this?
Why are the weights not in the transposed shape just from the beginning, so they don't need to be transposed every time before applying the layer?
I found an answer here:
Efficient forward pass in nn.Linear #2159
It seems like there is no real reasoning behind this. However the transpose operation doesn't seem to be slowing down the computation.
According to the issue mentioned above, during the forward pass the transpose operation is (almost) free in terms of computation. While during the backward pass leaving out the transpose operation would actually make computation less efficient with the current implementation.
The last post in that issue sums it up quite nicely:
It's historical weight layout, changing it is backward-incompatible.
Unless there is some BIG benefit in terms of speed or convenience, we
wont break userland.
https://github.com/pytorch/pytorch/issues/2159#issuecomment-390068272
I am trying to find the equation I would need to use in order to implement a Least Squares Kernel classifier for a dataset with N samples of feature length d. I have the kernel equation k(x_i, x_j) and I need the equation to pug it into to get the length-d vector used to classify future data. No matter where I look/google, Although there are dozens of powerpoints and pdfs that seem to give me almost what I'm looking for, I can't find a resource which can give me a straight answer.
note: I am not looking for the programming-language tool that computes this for me such as lsqlin, but the mathematical formula.
Least Squares Kernel SVM (what I assume your actually asking about) is equivalent to Kernelized Ridge Regression. This is the simplest what to implement it, and the solution can be found here, assume you have the appropriate background.
I am implementing a SPICE solver. I have the following problem: say I put two diodes and a current source in serial (standard diodes). I use MNA and Boost LU-decomposition. The problem is that the nodal matrix becomes very quickly near-singular. I think I have to scale the values but I don't know how and I couldn't find anything on the Internet. Any ideas how to do this scaling?
In the perspective of numerical, there is a scale technique for this kind of near-singular matrices. Basically, this technique is to divide each row of A by the sum (or maximum) of the absolute values in that row. You can find KLU which is a linear solver for circuit simulations for more details.
In perspective of SPICE simulation, it uses so-call Gmin stepping technique to iteratively compute and approach a real answer. You can find this in the documents of a SPICE project QUCS (Quite Universal Circuit Simulator).
Scaling does not help when the matrix has both very large and very small entries.
It is necessary to use some or all of the many tricks that were developed for circuit solver applications. A good start is clipping the range of the exponential and log function arguments to reasonable values -- in most circuits a diode forward voltage is never more than 1V and the diode reverse current not less than 1pA.
Actually, look at all library functions and wrap them in code that makes their arguments and results suitable for circuit-solving purposes. Simple clipping is sometimes good enough, but it is way better to make sure the functions stay (twice) differentiable and continuous.
Let's imagine I have an unknown function that I want to approximate via Genetic Algorithms. For this case, I'll assume it is y = 2x.
I'd have a DNA composed of 5 elements, one y for each x, from x = 0 to x = 4, in which, after a lot of trials and computation and I'd arrive near something of the form:
best_adn = [ 0, 2, 4, 6, 8 ]
Keep in mind I don't know beforehand if it is a linear function, a polynomial or something way more ugly, Also, my goal is not to infer from the best_adn what is the type of function, I just want those points, so I can use them later.
This was just an example problem. In my case, instead of having only 5 points in the DNA, I have something like 50 or 100. What is the best approach with GA to find the best set of points?
Generating a population of 100,
discard the worse 20%
Recombine the remaining 80%? How?
Cutting them at a random point and
then putting together the first
part of ADN of the father with the
second part of ADN of the mother?
Mutation, how should I define in
this kind of problem mutation?
Is it worth using Elitism?
Any other simple idea worth using
around?
Thanks
Usually you only find these out by experimentation... perhaps writing a GA to tune your GA.
But that aside, I don't understand what you're asking. If you don't know what the function is, and you also don't know the points to being with, how do you determine fitness?
From my current understanding of the problem, this is better fitted by a neural network.
edit:
2.Recombine the remaining 80%? How? Cutting them at a random point and then putting together the first part of ADN of the father with the second part of ADN of the mother?
This is called crossover. If you want to be saucey, do something like pick a random starting point and swapping a random length. For instance, you have 10 elements in an object. randomly choose a spot X between 1 and 10 and swap x..10-rand%10+1.. you get the picture... spice it up a little.
3.Mutation, how should I define in this kind of problem mutation?
usually that depends more on what is defined as a legal solution than anything else. you can do mutation the same way you do crossover, except you fill it with random data (that is legal) rather than swapping with another specimen... and you do it at a MUCH lower rate.
4.Is it worth using Elitism?
experiment and find out.
Gaussian adaptation usually outperforms standard genetic algorithms. If you don't want to write your own package from scratch, the Mathematica Global Optimization package is EXCELLENT -- I used it to fit a really nasty nonlinear function where standard fitters failed miserably.
Edit:
Wikipedia Article
If you hunt down prints of the listed papers on the article, you can find whitepapers and implementations. In general though, you should have some idea what the solution space for your maximizing the fitness function look like. If the number of variables is small, or the number of local maxima is small or they are connected/slope down to a global maxima, simple least squares works fine. If the area around each local maxima is small (IE you have to get a damned good solution to hit the best one, otherwise you hit a bad one), then fancier algorithms are needed.
Choosing variables for a genetic algorithm depends on what the solution space will look like.