Fast way to invert or dot a kxnxn matrix

Is there a fast way to calculate the inverse of a kxnxn matrix using numpy (the inverse being calculated at each k-slice)? In other words, is there a way to vectorize the following code:
>>> from numpy.random import random
>>> from numpy.linalg import inv
>>> a = random(4*2*2).reshape(4,2,2)
>>> b = a.copy()
>>> for k in range(len(a)):
...     b[k,:,:] = inv(a[k,:,:])

First, about getting the inverse. I have looked into both np.linalg.tensorinv and np.linalg.tensorsolve.
I think unfortunately tensorinv will not give you what you want. It needs the array to be "square". This excludes what you want to do, because its definition of square is that np.prod(a.shape[:i]) == np.prod(a.shape[i:]), where i is one of the axes of the array (0, 1 or 2 here) and can be given as the third argument ind of tensorinv. This means that if you have a general array of M matrices of size NxN (shape (M,N,N)), you would need e.g. (for i = 1) M == N*N, which is not true in general (it actually happens to hold in your example, since 4 == 2*2, but it does not give the correct answer anyway).
Now, maybe something is possible with tensorsolve. This would however involve some heavy construction work on the a matrix-array before it is passed as the first argument to tensorsolve. Because we would want b to be the solution of the "matrix-array equation" a*b = 1 (where 1 is an array of identity matrices) and 1 would have the same shape as a and b, we cannot simply supply the a you defined above as the first argument to tensorsolve. Rather, it needs to be an array with shape (M,N,N,M,N,N) or (M,N,N,N,M,N) or (M,N,N,N,N,M). This is necessary, because tensorsolve would multiply with b over these last three axes and also sum over them so that the result (the second argument to the function) is again of shape (M,N,N).
Then secondly, about dot products (your title suggests that's also part of your question). This is very doable. Two options.
First: this blog post by James Hensman gives some good suggestions.
Second: I personally like using np.einsum better for clarity. E.g.:
import numpy as np
a = np.random.random((7,2,2))
b = np.random.random((7,2,2))
np.einsum('ijk,ikl->ijl', a, b)
This will matrix-multiply all 7 "matrices" in arrays a and b. It seems to be about 2 times slower than the array-method from the blog post above, but it's still about 70 times faster than using a for loop as in your example. In fact, with larger arrays (e.g. 10000 5x5 matrices) the einsum method seems to be slightly faster (not sure why).
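For reference, a rough timing sketch along those lines (a hedged example, not a rigorous benchmark; the shapes and repeat counts are arbitrary):
import numpy as np
from timeit import timeit

a = np.random.random((10000, 5, 5))
b = np.random.random((10000, 5, 5))

def loop_dot(a, b):
    # explicit Python loop over the first axis, as in the question
    out = np.empty_like(a)
    for k in range(len(a)):
        out[k] = np.dot(a[k], b[k])
    return out

def einsum_dot(a, b):
    # batched matrix product over the first axis
    return np.einsum('ijk,ikl->ijl', a, b)

print("loop:  ", timeit(lambda: loop_dot(a, b), number=10))
print("einsum:", timeit(lambda: einsum_dot(a, b), number=10))
The exact speed-ups will of course depend on the NumPy version and on the array shapes.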
Hope that helps.

Related

I have a function f(w,x,y,z) and a target value A, how can I discover values for w,x,y,z that produce A?

So I have a function that takes four numerical arguments and produces a numerical result.
f(w,x,y,z) --> A
If I have the function f and a target result A, is there an iterative method for discovering parameters w,x,y,z that produce a given number A?
If it helps, my function f is a quintic bezier where most of the parameters are determined. I have isolated just these four that are required to fit the value A.
Q(t)=R(1−t)^5+5S(1−t)^4*t+10T(1−t)^3*t^2+10U(1−t)^2*t^3+5V(1−t)t^4+Wt^5
R,S,T,U,V,W are vectors; R and W are known, and I have isolated only a single element in each of S,T,U,V that varies as a parameter.
The set of solutions of the equation f(w,x,y,z)=A (where all of w, x, y, z and A are scalars) is, in general, a 3 dimensional manifold (surface) in the 4-dimensional space R^4 of (w,x,y,z). I.e., the solution is massively non-unique.
Now, if f is simple enough for you to compute its derivative, you can use Newton's method to find a root: the gradient is the direction of the fastest change of the function, so you go there.
Specifically, let X_0=(w_0,x_0,y_0,z_0) be your initial approximation of a solution and let G=f'(X_0) be the gradient at X_0.
Then f(X_0+h)=f(X_0)+(G,h)+O(|h|^2) (where (a,b) is the dot product).
Let h=a*G, and solve A=f(X_0)+a*|G|^2 to get a=(A-f(X_0))/|G|^2 (if G=0, change X_0) and X_1=X_0+a*G. If f(X_1) is close enough to A, you are done, otherwise proceed to compute f'(X_1) &c.
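A minimal sketch of that iteration in Python/NumPy (assuming you can supply f and its gradient; the names solve_level_set, grad_f, the toy f and the tolerances are all mine, purely for illustration):
import numpy as np

def solve_level_set(f, grad_f, x0, A, tol=1e-8, max_iter=100):
    # find X with f(X) ~ A by repeatedly stepping along the gradient
    X = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = A - f(X)
        if abs(r) < tol:
            break
        G = np.asarray(grad_f(X), dtype=float)
        g2 = G.dot(G)
        if g2 == 0.0:              # gradient vanished: pick a different X_0
            raise ValueError("zero gradient; restart from another point")
        X = X + (r / g2) * G       # a = (A - f(X_0)) / |G|^2, X_1 = X_0 + a*G
    return X

# toy example: f(w,x,y,z) = w*x + y*z^2, target A = 3
f = lambda X: X[0]*X[1] + X[2]*X[3]**2
grad_f = lambda X: [X[1], X[0], X[3]**2, 2*X[2]*X[3]]
print(solve_level_set(f, grad_f, [1.0, 1.0, 1.0, 1.0], A=3.0))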
If you cannot compute f', you can play with many other methods.
If you can impose 3 (or more) additional equations that you know (or suspect) must be true for your 4-variable solution that gives target value A, then you can try applying Newton's method for solving a system of k equations with k unknowns. Otherwise, without a deeper understanding of the structure of the function you are trying to make equal to A, the only general technique I'm aware of that's easy to implement is to define the error function g(w,x,y,z) = |f(w,x,y,z) - A| and search for a minimum of g. Typically the "minimum" found will be a local minimum, so it may take many restarts of the minimization with different starting values for your parameters to actually find a local minimum with g = 0, which is what you want. This is very easy to implement and try in a few lines, e.g. in MATLAB using fminsearch.
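For illustration, the same error-minimization idea in Python, using scipy.optimize.minimize with Nelder-Mead as a rough stand-in for MATLAB's fminsearch (the helper name, toy f and starting point are placeholders of mine):
import numpy as np
from scipy.optimize import minimize

def find_parameters(f, A, x0):
    # minimize g(w,x,y,z) = |f(w,x,y,z) - A|; restart with other x0 if g does not reach ~0
    g = lambda p: abs(f(*p) - A)
    res = minimize(g, x0, method='Nelder-Mead')
    return res.x, res.fun

# toy example: f(w,x,y,z) = w*x + y*z^2, target A = 3
f = lambda w, x, y, z: w*x + y*z**2
params, err = find_parameters(f, A=3.0, x0=np.ones(4))
print(params, err)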

Is it beneficial to transpose an array in order to use column-wise operations?

Assume that we are working with a language which stores arrays in column-major order. Assume also that we have a function which takes a 2-D array as an argument and returns it.
I'm wondering whether one can claim that it is (or isn't) in general beneficial to transpose this array when calling the function, in order to work with column-wise operations instead of row-wise operations, or whether the transposing negates the benefits of column-wise operations.
As an example, in R I have an object of class ts named y which has dimension n x p, i.e. I have p time series of length n.
I need to make some computations with y in Fortran, where I have two loops with the following kind of structure:
do i = 1, n
  do j = 1, p
    ! just an example, some row-wise operations on `y`
    x(i,j) = a*y(i,j)
    D = ddot(m,y(i,1:p),1,b,1)
    ! ...
  end do
end do
As Fortran (as does R) uses column-wise storage, it would be better to make the computations with a p x n array instead. So instead of
out<-.Fortran("something",y=array(y,dim(y)),x=array(0,dim(y)))
ynew<-out$out$y
x<-out$out$x
I could use
out<-.Fortran("something2",y=t(array(y,dim(y))),x=array(0,dim(y)[2:1]))
ynew<-t(out$out$y)
x<-t(out$out$x)
where Fortran subroutine something2 would be something like
do i = 1, n
  do j = 1, p
    ! just an example, some column-wise operations on `y`
    x(j,i) = a*y(j,i)
    D = ddot(m,y(1:p,i),1,b,1)
    ! ...
  end do
end do
Does the choice of approach always depend on the dimensions n and p or is it possible to say one approach is better in terms of computation speed and/or memory requirements? In my application n is usually much larger than p, which is 1 to 10 in most cases.
More of a comment, but I wanted to include a bit of code: under old-school F77 you would essentially be forced to use the second approach, as
y(1:p,i)
is simply a pointer to y(1,i), with the following p values contiguous in memory.
The first construct,
y(i,1:p)
is a list of values interspaced in memory, so passing it to a subroutine seems to require making a copy of the data. I say "seems" because I haven't the foggiest idea how a modern optimizing compiler deals with these things. I tend to think that at best it's a wash, and at worst this could really hurt. Imagine an array so large that you need to page-swap to access the whole vector.
In the end, the only way to answer this is to test it yourself.
---------- Edit:
I did a little testing and it confirmed my hunch: passing rows y(i,1:p) does cost you vs. passing columns y(1:p,i). I used a subroutine that does practically nothing to see the difference. My guess is that with any real subroutine the hit is negligible.
Btw (and maybe this helps understand what goes on), passing every other value in a column, y(1:p:2,i), takes longer (orders of magnitude) than passing the whole column, while passing every other value in a row cuts the time in half vs. passing a whole row.
(using gfortran 12)
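If you want to get a feel for the effect without setting up a Fortran test, here is a rough analogue in Python/NumPy (a Fortran-ordered array so that columns are contiguous, as in Fortran/R; the size is arbitrary and this only illustrates the memory-layout issue, not Fortran argument passing):
import numpy as np
from timeit import timeit

n = 3000
y = np.asfortranarray(np.random.random((n, n)))   # column-major storage, as in Fortran/R

col_pass = lambda: sum(y[:, j].sum() for j in range(n))   # contiguous column slices
row_pass = lambda: sum(y[i, :].sum() for i in range(n))   # strided row slices

print("columns:", timeit(col_pass, number=3))
print("rows:   ", timeit(row_pass, number=3))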

Algorithm for finding basis of a set of bitstrings?

This is for a diff utility I'm writing in C++.
I have a list of n character-sets {"a", "abc", "abcde", "bcd", "de"} (taken from an alphabet of k=5 different letters). I need a way to observe that the entire list can be constructed by disjunctions of the character-sets {"a", "bc", "d", "e"}. That is, "b" and "c" are linearly dependent, and every other pair of letters is independent.
In the bit-twiddling version, the character-sets above are represented as {10000, 11100, 11111, 01110, 00011}, and I need a way to observe that they can all be constructed by ORing together bitstrings from the smaller set {10000, 01100, 00010, 00001}.
In other words, I believe I'm looking for a "discrete basis" of a set of n different bit-vectors in {0,1}^k. This paper claims the general problem is NP-complete... but luckily I'm only looking for a solution to small cases (k < 32).
I can think of really stupid algorithms for generating the basis. For example: for each of the k^2 pairs of letters, try to demonstrate (by an O(n) search) that they're dependent. But I really feel like there's an efficient bit-twiddling algorithm that I just haven't stumbled upon yet. Does anyone know it?
EDIT: I ended up not really needing a solution to this problem after all. But I'd still like to know if there is a simple bit-twiddling solution.
I'm thinking of a disjoint-set data structure, like union-find turned on its head (rather than combining nodes, we split them).
Algorithm:
Create an array main where you assign all the positions to the same group, then:
for each bitstring curr
    for each position i
        if (curr[i] == 1)
            // max of main can be stored for constant time access
            main[i] += max of main from previous iteration
Then all the distinct numbers in main are your different sets (possibly using the actual union-find algorithm).
Example:
So, main = 22222. (I use 2 rather than 1 as the initial group label, to reduce possible confusion with the 0/1 bitstrings in curr.)
curr = 10000
main = 42222 // first bit (=2) += max (=2)
curr = 11100
main = 86622 // first 3 bits (=422) += max (=4)
curr = 11111
main = 16-14-14-10-10
curr = 01110
main = 16-30-30-26-10
curr = 00011
main = 16-30-30-56-40
Then split by distinct numbers:
{10000, 01100, 00010, 00001}
Improvement:
To reduce the speed at which main increases, we can replace
main[i] += max of main from previous iteration
with
main[i] += 1 + (max - min) of main from previous iteration
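A small Python sketch of the basic (unimproved) version, following the worked example above (bitstrings as '0'/'1' strings; the function name is mine):
def split_groups(bitstrings, k):
    main = [2] * k                       # every position starts in the same group
    for curr in bitstrings:
        prev_max = max(main)             # max of main from the previous iteration
        for i, bit in enumerate(curr):
            if bit == '1':
                main[i] += prev_max
    # positions that ended up with the same label always occurred together
    groups = {}
    for i, label in enumerate(main):
        groups.setdefault(label, []).append(i)
    return [''.join('1' if i in g else '0' for i in range(k)) for g in groups.values()]

print(split_groups(['10000', '11100', '11111', '01110', '00011'], 5))
# -> ['10000', '01100', '00010', '00001']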
EDIT: based on j_random_hacker's comment
You could combine the passes of the stupid algorithm at the cost of space.
Make a bit vector called violations that is (k - 1) k / 2 bits long (so, 496 for k = 32.) Take a single pass over character sets. For each, and for each pair of letters, look for violations (i.e. XOR the bits for those letters, OR the result into the corresponding position in violations.) When you're done, negate and read off what's left.
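For illustration, a quick Python sketch of that single pass (using a set of index pairs instead of a packed (k-1)k/2-bit vector, purely for readability; the bitmask encoding of the example words is my own):
def dependent_pairs(words, k):
    # words are bitmask ints; returns letter pairs that always co-occur (never "violate")
    violations = set()
    for word in words:                   # a single pass over the character sets
        for i in range(k):
            for j in range(i + 1, k):
                # violation: exactly one of the two letters appears in this word
                if ((word >> i) & 1) != ((word >> j) & 1):
                    violations.add((i, j))
    all_pairs = {(i, j) for i in range(k) for j in range(i + 1, k)}
    return all_pairs - violations        # "negate and read off what's left"

# words from the question, encoded as a=bit 4, b=bit 3, c=bit 2, d=bit 1, e=bit 0
words = [0b10000, 0b11100, 0b11111, 0b01110, 0b00011]
print(dependent_pairs(words, 5))         # {(2, 3)}: bits 2 and 3, i.e. "c" and "b"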
You could give Principal Component Analysis a try. There are some flavors of PCA designed for binary or more generally for categorical data.
Since someone showed it is NP-complete, for large vocabularies I doubt you will do better than a brute-force search (with various pruning possible) over the entire set of possibilities, O((2^k - 1) * n). At least in the worst case; probably some heuristics will help in many cases, as outlined in the paper you linked. This is your "stupid" approach generalized to all possible basis strings instead of just pairs of letters.
However, for small vocabs, I think an approach like this would do a lot better:
1. Are your words disjoint? If so, you are done (simple case of independent words like "abc" and "def").
2. Perform a bitwise AND on each possible pair of words. This gives you an initial set of candidate basis strings.
3. Go to step 1, but instead of using the original words, use the current candidate basis strings.
Afterwards you also need to include any individual letter which is not a subset of one of the final accepted candidates, plus maybe some other minor bookkeeping for things like unused letters (using something like a bitwise OR over all the words).
Considering your simple example:
First pass gives you a, abc, bc, bcd, de, d
Second pass gives you a, bc, d
Bookkeeping gives you a, bc, d, e
I don't have a proof that this is right, but I think intuitively it is at least in the right direction. The advantage lies in using the words themselves instead of the brute-force approach of enumerating all possible candidates. With a large enough set of words this approach would become terrible, but for vocabularies up to, say, a few hundred or maybe even a few thousand I bet it would be pretty quick. The nice thing is that it will still work even for a huge value of k.
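A rough Python sketch of that idea over bitmask ints (as said, no proof of correctness; the function name and encoding are mine):
def pairwise_and_basis(words):
    cur = set(words)
    # steps 2-3: AND all pairs, repeat until the candidates are pairwise disjoint (step 1)
    while any(a & b for a in cur for b in cur if a != b):
        cur = {a & b for a in cur for b in cur if a != b and a & b}
    # bookkeeping: any letter not covered by an accepted candidate becomes a singleton
    alphabet = covered = 0
    for w in words:
        alphabet |= w
    for c in cur:
        covered |= c
    for i in range(alphabet.bit_length()):
        if (alphabet >> i) & 1 and not (covered >> i) & 1:
            cur.add(1 << i)
    return cur

words = [0b10000, 0b11100, 0b11111, 0b01110, 0b00011]   # a, abc, abcde, bcd, de
print(sorted(pairwise_and_basis(words)))                # [1, 2, 12, 16] = e, d, bc, a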
If you like the answer and bounty it I'd be happy to try to solve in 20 lines of code :) and come up with a more convincing proof. Seems very doable to me.

Fastest way to check if a vector increases matrix rank

Given an n-by-m matrix A, with it being guaranteed that n>m=rank(A), and given a n-by-1 column v, what is the fastest way to check if [A v] has rank strictly bigger than A?
For my application, A is sparse, n is about 2^12, and m is anywhere in 1:n-1.
Computing rank(full([A v])) and comparing it to m takes about a second on my machine, and I need to do it tens of thousands of times, so I would be very happy to discover a quicker way.
There is no need to do repeated solves IF you can afford to do ONE computation of the null space. Just one call to null will suffice. Given a new vector V, if the dot product of V with the null-space basis is non-zero, then V will increase the rank of the matrix. For example, suppose we have the matrix M, which of course has rank 2.
M = [1 1;2 2;3 1;4 2];
nullM = null(M')';
Will a new column vector [1;1;1;1] increase the rank if we appended it to M?
nullM*[1;1;1;1]
ans =
-0.0321573705742971
-0.602164651199413
Yes, since it has a non-zero projection on at least one of the basis vectors in nullM.
How about this vector:
nullM*[0;0;1;1]
ans =
1.11022302462516e-16
2.22044604925031e-16
In this case, both numbers are essentially zero, so the vector in question would not have increased the rank of M.
The point is, only a simple matrix-vector multiplication is necessary once the null-space basis has been generated. If your matrix is so large (and so nearly of full rank) that a call to null fails here, then you will need to do more work. However, n = 4096 is not excessively large, as long as the matrix does not have too many columns.
One alternative, if null is too much, is a call to svds to find the singular vectors corresponding to essentially-zero singular values. These will form the null-space basis that we need.
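For what it's worth, the same null-space check translated to Python/SciPy (scipy.linalg.null_space on the transpose gives a basis for the left null space; the helper name and the tolerance are placeholders of mine):
import numpy as np
from scipy.linalg import null_space

M = np.array([[1, 1], [2, 2], [3, 1], [4, 2]], dtype=float)
nullM = null_space(M.T).T            # rows span the left null space of M

def increases_rank(v, nullM, tol=1e-10):
    # non-zero projection onto the left null space => appending v raises the rank
    return np.linalg.norm(nullM @ v) > tol

print(increases_rank(np.array([1.0, 1.0, 1.0, 1.0]), nullM))   # True
print(increases_rank(np.array([0.0, 0.0, 1.0, 1.0]), nullM))   # False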
I would use sprank for sparse matrices. Check it out; it might be faster than any other method.
Edit: As pointed out correctly by @IanHincks, it is not the rank. I am leaving the answer here, just in case someone else needs it in the future.
Maybe you can try to solve the system A*x = v; if it has a solution, that means the rank does not increase.
x = A\v;
norm(A*x - v) %% if this is small then the rank does not increase

What is determining the items that make the difference between two arrays called?

I want to find which elements of two arrays make the two arrays different.
For example, if I start off with
known_unacceptable_array = [bad, bad, good, good, good, bad, good]
known_acceptable_array = []
and an array is only unacceptable if there are three bads (but I don't know that at the time); since I am able to evaluate whether an array is acceptable or unacceptable, I would like to find the smallest array that is unacceptable:
possibly_minimal_unacceptable = [bad, bad, bad]
maximal_acceptable = [bad, bad] # Third bad required to make the array unacceptable
What is this problem called, and what algorithms are there for this?
Edit: The elements can't be reordered, and adding an element can only either change the list from acceptable to unacceptable or have no effect - it can't change it from unacceptable to acceptable.
Background: I've randomly generated thousands of instructions that make a Ruby interpreter crash, and I want to isolate the specific instructions that cause it to crash; at the time I thought that multiple bad instructions were required to make it crash. A very naive attempt to determine what the bad instructions are is at this link.
"What is determining the elements that make the difference between two arrays called?"
Differencing is often called subtraction.
"I want to determine which elements of two arrays make the two arrays different."
Again, that's subtraction (at least some form of it):
Given A = { x , y , z } and B = { x , y , a },
A - B = { z , -a }
or "only A has z and only B has a", or "z and a" make them different.
"For example, if I start off with
known_bad = [bad, bad, good, good, good, bad, good]
known_good = []"
Why start with a full array and an empty one? Isn't this an extreme case, or are these "two arrays" not the two whose "difference" you are trying to determine?
"possibly_minimal_bad = [bad, bad, bad]
maximal_good = [bad, bad] # Third bad required to make the list bad"
Is this just a set of rules? Or is it the result of finding the difference between the two arrays of the previous (known_good, known_bad) set?
"What is this problem called, and what algorithms are there for this?"
If it isn't called "difference" or "subtraction", then why introduce it that way?
Is the problem: a. going from the first two arrays (known_xx) to the second two (min, max); or is it: b. classifying finite sequences of the words "good" and "bad"?
a) I can't see a relation between the first two arrays and the second two. How did you get from the first two to the second?
b) Classifying a sequence of words could be "parsing a language", decoding a message, recognizing a pattern, etc.
Is it "Pattern Recognition"?
It appears that you are looking for a pattern in test-input (or test-point) data and its relationship to product failure, and want to represent that relationship in some codified form for further analysis; or you are searching for a correlation between certain test points and product failure. That makes this question rather interesting. However, the presentation of the question is quite confusing. Maybe those groups of equations could be explained a little more, clarifying whether they are related, and if so, in what way.
I'm not entirely sure if I understand the question. If my answer is unsatisfactory, please rephrase your question to be more clear. I'll base my answer on this.
I want to determine which elements of two arrays make the two arrays different.
This is a combination of the three set operations union, intersection and difference. Different combinations can achieve the same result.
The complement is the subset of A which is not in B.
The intersection is the set of elements which are in both A and B.
The union is the set of elements which are in either A or B (no duplicates).
It sounds like you want the union of both complements, which is:
A\B ∪ B\A
Or the complement of the intersection within the union:
(A∪B) \ (A∩B)
See http://en.wikipedia.org/wiki/Set_operations_(Boolean) for more information.
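In Python, for example, that symmetric difference is a one-liner over sets (a sketch that treats the arrays as sets, so duplicates and ordering are ignored):
a = {"w", "x", "y", "z"}
b = {"x", "y", "a"}

diff = (a - b) | (b - a)          # union of both relative complements: (A\B) ∪ (B\A)
assert diff == (a | b) - (a & b)  # same as the union minus the intersection
print(diff)                       # {'w', 'z', 'a'}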
