Suppose I have a list of Matrices saved in the variable G and apply the following operations:
top[g_] = Minors[g]
Diagonal[top /# G]
Minorsreturns a matrix where each element is the determinant with the (i,j) row/col deleted, and Diagonal returns a list of the diagonal elements of a matrix.
My question is on the evaluation of these commands - clearly I do not want all entries evaluated. Is Mathematica lazy in the sense that Diagonal is parsed first which only extracts the elements needed from Minors or is the minor matrix constructed and then its diagonal elements are pulled out?
This is a general question for lazy evaluation, however being new to Mathematica I would appreciate any tips on how to improve the syntax for the specific problem.
It's late so only a short answer: investigate Hold[] and its relatives. With them you can implement lazy evaluating functions. Most intrinsic Mathematica functions are not lazy, a few are. In general, as a beginner, you should avoid modifying the behaviour of Mathematica's intrinsic functions, though it is very good fun to do so and can very easily make the entire system unusable.
You can solve this problem by building up the list of diagonal minors by yourself and then applying Det, for a matrix M:
Map[Det,Drop[Transpose[Drop[M,{#}]],{#}]& /# Range[1,Dimensions[M][[1]]]]
This is a bit of a cludge but it is about 50 times faster than using Mathematica's built in Minors and picking off just the diagonal elements (tested on 100x100 random matrices).
No mathematica is not lazy in general.
top/#G
Will produce a matrix that Diagonal will operate on.
Since Minors does not operate on individual elements of the matrix, what you are asking for is not, from my knowledge, just lazy evaluation either.
I think I have a solution for you though.
Clear[f];
Diagonal[Minors[G,Length[G],f]]/.f->Det
This solution will only produce the Minors of the Diagonal elements to be summed by Diagonal.
But I have only moved the excess computation to an excess memory usage problem. Since the submatrix of off diagonal elements is still produced only to be thrown away.
I will post again if I think of a way to prevent that as well.
Related
I'm trying to solve simultaneously the same ODE at different point (each point n is an independent vector of shape m) using the scipy BDF solver. In other world, i have a matrix n x m, and i want to solve n points (by solving, I mean make them advance in time with a while loop ), knowing that each point n are independant from each other.
Obviously you can loop on the different points, but this method takes too much time. Is there any way to make this faster and use it as a vectorized function?
I also tried to reshape my matrix to a 1D vector, but it looks like the solver compute the jacobian matrix of the complete vector, which takes too much time and is useless as the points along n are independent.
Maybe there is a way to specify that the derivatives of points n-m are zeros to speed up the jacobian computation ?
Thanks in advance for the answer
Edit:
Thanks for your answer #Lutz Lehmann. I was able to sped up the computation a little using jac_sparcity, that avoid computing a lot of unnecessary points.
The other improvement I can imagine is regarding the rate of progress h_abs : each independent ODE should have its own h_abs. Using the 1D vector method implies that all the ODE's are advancing at the same rate of progress h_abs i.e. the most restricting one. I don't know if there is anyway of doing this.
I am already using a vectorized atol built as an n x m matrix and reshaped, the same way as the complete set of ODE to make sure that the good tolerances are applied for each variable. I've never used jumba so far, but I will definitely have a look.
It might not be evident, but Prolog also offers arrays out of the box. A Prolog compound has a functor and a number of arguments. This means we could represent an array such as:
[[1,2],[3,4]]
Replacing the Prolog lists by the following Prolog compounds:
matrice(vector(1,2), vector(3,4))
The advantage would be faster element access from an integer index. Can this representation be used to realize a matrix multiplication?
There is yet another approach, as implemented in R (the statistical environment). The dimensions of the array and the values are kept separately. So your square could also be represented as:
array(dims(2, 2), v(1,2,3,4))
This approach has some (questionable) benefits and drawbacks. You can start reading here, if you are at all interested: https://stat.ethz.ch/R-manual/R-devel/library/base/html/dim.html
To your question, yes, you can implement matrix multiplication, regardless on how you decide to represent the matrix. It would be interesting to see how the two approaches (array of arrays vs. one array and calculating indexes from the dimensions) compare in terms of efficiency.
What algorithm do you want to use for the matrix multiplication? Is it any of the ones described here: https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm?
EDIT: do you want to allow the client code to be able to provide the product and sum operations? Do you want to allow specialization of the values? For example, if you want to use matrix multiplication for finding the transitive closure of a graph, you could represent the boolean square matrix as an unbounded integer. This will make the matrix itself at least quite small.
This is a follow-up to another stack overflow question, here:
3D Correspondences from fundamental matrix
Just like in that question, I am trying to get a camera matrix from a fundamental matrix, the ultimate goal being 3d reconstruction from 2d points. The answer given there is good, and correct. I just don't understand it. It says, quote, "If you have access to Hartley and Zisserman's textbook, you can check section 9.5.3 where you will find what you need." He also provides a link to source code.
Now, here's what section 9.5.3 of the book, among other things, says:
Result 9.12. A non-zero matrix F is the fundamental matrix
corresponding to a pair of camera matrices P and P if and only if PTFP
is skew-symmetric.
That, to me, is gibberish. (I looked up skew-symmetric - it means the inverse is its negative. I have no idea how that is relevant to anything.)
Now, here is the source code given (source):
[U,S,V] = svd(F);
e = U(:,3);
P = [-vgg_contreps(e)*F e];
This is also a mystery.
So what I want to know is, how does one explain the other? Getting that code from that statement seems like black magic. How would I, or anyone, figure out that "A non-zero matrix F is the fundamental matrix corresponding to a pair of camera matrices P and P if and only if PTFP is skew-symmetric." means what the code is telling you to do, which is basically
'Take the singular value decomposition. Take the first matrix. Take the third column of that. Perform some weird re-arrangment of its values. That's your answer.' How would I have come up with this code on my own?
Can someone explain to me the section 9.5.3 and this code in plain English?
Aha, that "PTFP" is actually something I have also wondered about and could not find the answer in literature. However, this is what I figured out:
The 4x4 skew-symmetric matrix you are mentioning is not just any matrix. It is actually the dual Plücker Matrix of the baseline (see also https://en.wikipedia.org/wiki/Pl%C3%BCcker_matrix). In other words, it only gives you the line on which the camera centers are located, which is not useful for reconstruction tasks as such.
The condition you mention is identical to the more popularized fact that the fundamental matrix for the view 1 & 0 is the negative transpose of the fundamental matrix for the views 0 & 1 (using MATLAB/Octave syntax here)
Consider first that the fundamental matrix maps a point x0 in one image to line l1 in the other
l1=F*x0
Next, consider that the transpose of the projection matrix back-projects a lines l1 in the image to a plane E in space
E=P1'*l1
(I find this beautifully simple and understated in most geometry / computer vision classes)
Now, I will use a geometric argument: Two lines are corresponding epipolar lines iff they correspond to the same epipolar plane i.e. the back-projection of either line gives the same epipolar plane. Algebraically:
E=P0'*l0
E=P1'*l1
thus (the important equation)
P0'*l0=P1'*l1
Now we are almost there. Let's assume we have a 3D point X and its two projections
x0=P0*X
x1=P1*X
and the epipolar lines
l1=F*x0
l0=-F'*x1
We can just put that into the important equation and we have for all X
P0'*-F'*P1*X=P1'*F*P0*X
and finally
P0'*-F'*P1=P1'*F*P0
As you can see, the left-hand-side is the negative transpose of the right-hand-side. So this matrix is a skew symmetric 4x4 matrix.
I also published these thoughts in Section II B (towards the end of the paragraph) in the following paper. It should also explain why this matrix is a representation of the baseline.
Aichert, André, et al. "Epipolar consistency in transmission imaging."
IEEE transactions on medical imaging 34.11 (2015): 2205-2219.
https://www.ncbi.nlm.nih.gov/pubmed/25915956
Final note to #john ktejik : skew-symmetry means that a matrix is identical to its negative transpose (NOT inverse transpose)
I have my beuatiful triangular n x n matrix, say L (for lower triangular), and I want to solve a system like
LX=B
Where B and X are n x k matrices (that is: I want to solve a triangular linear system with multiple right hand side). Additionally, I have my triangular matrix stored in PACKED FORMAT; i.e. I only store the lower triangular part. I am using BLAS and LAPACK, but I have realised that there is no specific way of solving my problem. Although there are many functions that solve similar problems:
stpsv(): Takes a triangular matrix in packed format and solves for a single right hand side.
strsm(): Takes a triangular matrix in dense format and solves for multiple right hand side.
What I really need is a combination of both. I would like to have a function accepting packed triangular format, as in stpsv(), and also accepting multiple right hand side, as in strsm(). But it seems as if there is not such a function readily available.
So my questions are:
Is there any function that can accept a packed triangular matrix and solve for multiple right hand side?
If the answer is NO, what would be more efficient? Either I call stpsv() in a for loop for every column in B, or I create a dense matrix from L, so that I have all those useless zeros in there and then I call strsm(). What would be better? Moreover, maybe I am missing a more clever way of doing all this.
Packed storage implies BLAS2 routines. Otherwise BLAS3 functions are more efficient in solving linear systems but they work in optimal blocked algorithms. If you call BLAS2 functions then you basically go back to vectored version hence it won't make too much sense.
Note that BLAS2 versions also do not perform conditioning checks. So they are directly optimized for BLAS2 performance since a triangular matrix with single RHS is a direct backward substitution.
For multiple RHS you can convert your matrix via, say stpttr and then use strtrs.
Yes, there is a function to solve Ax=B, for packed triangular matrix A and multiple right hand size B. It is stptrs() from LAPACK. In addition, there are other routines for triangular packed matrices, all featuring tp in their name according to the naming conventions of LAPACK.
However, looking at the source unveils that this function calls stpsv() from BLAS in a loop, once for each right hand side. It's exactly what you suggested!
The kernel trick maps a non-linear problem into a linear problem.
My questions are:
1. What is the main difference between a linear and a non-linear problem? What is the intuition behind the difference of these two classes of problem? And How does kernel trick helps use the linear classifiers on a non-linear problem?
2. Why is the dot product so important in the two cases?
Thanks.
When people say linear problem with respect to a classification problem, they usually mean linearly separable problem. Linearly separable means that there is some function that can separate the two classes that is a linear combination of the input variable. For example, if you have two input variables, x1 and x2, there are some numbers theta1 and theta2 such that the function theta1.x1 + theta2.x2 will be sufficient to predict the output. In two dimensions this corresponds to a straight line, in 3D it becomes a plane and in higher dimensional spaces it becomes a hyperplane.
You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...
This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.
However, if we give each point an extra coordinate (specifically 1 - sqrt(x*x + y*y)... I told you it was contrived), then the problem becomes linearly separable since the red and blue points can be separated by a 2-dimensional plane going through z=0.
Hopefully, these examples demonstrate part of the idea behind the kernel trick:
Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.
The second idea behind the kernel trick (and the reason why it is so tricky) is that it is usually very awkward and computationally expensive to work in a very high-dimensional space. However, if an algorithm only uses the dot products between points (which you can think of as distances), then you only have to work with a matrix of scalars. You can implicitly perform the calculations in the higher-dimensional space without ever actually having to do the mapping or handle the higher-dimensional data.
Many classifiers, among them the linear Support Vector Machine (SVM), can only solve problems that are linearly separable, i.e. where the points belonging to class 1 can be separated from the points belonging to class 2 by a hyperplane.
In many cases, a problem that is not linearly separable can be solved by applying a transform phi() to the data points; this transform is said to transform the points to feature space. The hope is that, in feature space, the points will be linearly separable. (Note: This is not the kernel trick yet... stay tuned.)
It can be shown that, the higher the dimension of the feature space, the greater the number of problems that are linearly separable in that space. Therefore, one would ideally want the feature space to be as high-dimensional as possible.
Unfortunately, as the dimension of feature space increases, so does the amount of computation required. This is where the kernel trick comes in. Many machine learning algorithms (among them the SVM) can be formulated in such a way that the only operation they perform on the data points is a scalar product between two data points. (I will denote a scalar product between x1 and x2 by <x1, x2>.)
If we transform our points to feature space, the scalar product now looks like this:
<phi(x1), phi(x2)>
The key insight is that there exists a class of functions called kernels that can be used to optimize the computation of this scalar product. A kernel is a function K(x1, x2) that has the property that
K(x1, x2) = <phi(x1), phi(x2)>
for some function phi(). In other words: We can evaluate the scalar product in the low-dimensional data space (where x1 and x2 "live") without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live") -- but we still get the benefits of transforming to the high-dimensional feature space. This is called the kernel trick.
Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinte-dimensional feature space. The kernel trick allows us to compute scalar products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
The main difference (for practical purposes) is: A linear problem either does have a solution (and then it's easily found), or you get a definite answer that there is no solution at all. You do know this much, before you even know the problem at all. As long as it's linear, you'll get an answer; quickly.
The intuition beheind this is the fact that if you have two straight lines in some space, it's pretty easy to see whether they intersect or not, and if they do, it's easy to know where.
If the problem is not linear -- well, it can be anything, and you know just about nothing.
The dot product of two vectors just means the following: The sum of the products of the corresponding elements. So if your problem is
c1 * x1 + c2 * x2 + c3 * x3 = 0
(where you usually know the coefficients c, and you're looking for the variables x), the left hand side is the dot product of the vectors (c1,c2,c3) and (x1,x2,x3).
The above equation is (pretty much) the very defintion of a linear problem, so there's your connection between the dot product and linear problems.
Linear equations are homogeneous, and superposition applies. You can create solutions using combinations of other known solutions; this is one reason why Fourier transforms work so well. Non-linear equations are not homogeneous, and superposition does not apply. Non-linear equations usually have to be solved numerically using iterative, incremental techniques.
I'm not sure how to express the importance of the dot product, but it does take two vectors and returns a scalar. Certainly a solution to a scalar equation is less work than solving a vector or higher-order tensor equation, simply because there are fewer components to deal with.
My intuition in this matter is based more on physics, so I'm having a hard time translating to AI.
I think following link also useful ...
http://www.simafore.com/blog/bid/113227/How-support-vector-machines-use-kernel-functions-to-classify-data