Fast algorithm for boundary value problem - algorithm

I am looking for the fastest way to solve the following problem:
Given some volume of lattice points in a 3D grid, some points b_i (the boundary) satisfy f(b_i)=0, while another point a_0 satisfies f(a_0)= 1.
All other points (non-boundary) are some linear combination of the surrounding 26 points. For example, I could want
f(i, j, k)= .05*f(i+1, j, k)+.1*f(i, j+1, k)+.07*f(i, j, k+1)+...
The sum of the coefficients .05+.1+.07+... will add up to 1. My objective is to solve for f(x_i) for all x_i in the volume.
Currently, I am using the successive over-relaxation (SOR) method, which basically initializes the boundary of the volume, assigns to each point the weighted average of the 26 surrounding points, and repeats. The SOR method just takes a combination of f(x_i) after the most recent iteration and f(x_i) after an iteration before.
I was wondering if anyone knows of any faster ways to solve this problem for a 3D grid of around 102x102x48 points. SOR currently takes about 500-1000 iterations to converge to the level I want (depending on the coefficients used). I am most willing to use MATLAB, IDL, and C++. Does anyone have any idea of how fast SOR is compared to converting the problem into a linear system and using matrix methods (like BCGSTAB)? Also, which method would be most effectively (and easily) parallelized? I have access to a 250-core cluster, and have been trying to make the SOR method parallel using MPI and C++, but haven't seen as much speed increase as I would like (ideal would be on the order of 100-fold). I would greatly appreciate any ideas on speeding up the solution to this problem. Thanks.

If you're comfortable with multithreading, using a red-black scheme for SOR can give a decent speedup. For a 2D problem, imagine a checkerboard - the red squares only depend on the black squares (and possibly themselves), so you can update all the red squares in parallel, and then repeat for all the black ones. Note that this does converge more slowly than the simple ordering, but it lets you spread the work over multiple threads.
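As a rough illustration of the checkerboard idea, here is a minimal single-threaded sketch in Python. It assumes the simple 2D case with a 5-point average (not the 26-neighbour stencil from the question); the function name `red_black_sor` and its parameters are made up for this example. All points of one colour depend only on points of the other colour, so each inner colour pass is the part that could be split across threads. With a 26-neighbour stencil, diagonal neighbours share a colour, so two colours would not be enough; colouring by the parity of each of i, j, k (eight colours) is one way around that.

```python
def red_black_sor(f, fixed, omega=1.5, sweeps=200):
    """In-place red-black SOR for the 5-point average on a 2D grid.

    f     : list of lists of floats (grid values, updated in place)
    fixed : set of (i, j) points whose values must never change
    omega : relaxation factor (1 < omega < 2 means over-relaxation)

    The outermost rows and columns are treated as boundary and never updated.
    """
    rows, cols = len(f), len(f[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            # Every update in this pass reads only points of the other
            # colour, so the pass could be distributed over threads.
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour or (i, j) in fixed:
                        continue
                    avg = 0.25 * (f[i - 1][j] + f[i + 1][j]
                                  + f[i][j - 1] + f[i][j + 1])
                    f[i][j] += omega * (avg - f[i][j])
    return f


# Small usage example: zero boundary, one interior point pinned to 1.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
red_black_sor(grid, fixed={(2, 2)})
```

For this 5x5 example the exact answer is 1/3 at the four points next to the centre and 1/6 at the four diagonal ones, which gives a quick sanity check.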
Conjugate gradient methods will generally converge faster than SOR (if I remember correctly, by about an order of magnitude). I've never used BCGSTAB, but I remember GMRES working well on non-symmetric problems, and they can probably both benefit from preconditioning.
As for opportunities for parallelization, most CG-type methods only need you to compute the matrix-vector product A*x, so you never need to form the full matrix. That's probably the biggest cost of each iteration, and so it's where you should look at multithreading.
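To make the "never form the full matrix" point concrete, here is a small sketch of a matrix-free operator. Everything in it is an assumption for illustration: the name `make_operator`, the dictionary-based storage, and the convention that points outside the grid count as boundary with value 0. The unknowns are the non-fixed points, and the system is A x = b with (A x)[p] = x[p] - sum of coeff * x[neighbour]. A Krylov solver such as GMRES or BiCGSTAB would only ever call `apply_A`.

```python
from itertools import product


def make_operator(shape, stencil, fixed):
    """Build a matrix-free description of A x = b.

    shape   : (nx, ny, nz) grid size
    stencil : dict mapping an offset (di, dj, dk) to its coefficient
    fixed   : dict mapping a point (i, j, k) to its prescribed value
    Returns (apply_A, b, unknowns); the matrix A itself is never stored.
    """
    unknowns = [p for p in product(*(range(n) for n in shape))
                if p not in fixed]

    def neighbours(p):
        for offset, coeff in stencil.items():
            yield tuple(pi + di for pi, di in zip(p, offset)), coeff

    # Right-hand side: contributions of the prescribed points.
    # Unknown or out-of-grid neighbours contribute 0 here.
    b = {p: sum(c * fixed.get(q, 0.0) for q, c in neighbours(p))
         for p in unknowns}

    def apply_A(x):
        """Return A x for x given as a dict over the unknown points."""
        return {p: x[p] - sum(c * x[q] for q, c in neighbours(p) if q in x)
                for p in unknowns}

    return apply_A, b, unknowns


# Tiny 1D-like example: four points in a row, ends fixed to 1 and 0.
stencil = {(1, 0, 0): 0.5, (-1, 0, 0): 0.5}
fixed = {(0, 0, 0): 1.0, (3, 0, 0): 0.0}
apply_A, b, unknowns = make_operator((4, 1, 1), stencil, fixed)
```

Each entry of the result depends only on a point and its neighbours, so the loop inside `apply_A` is the natural place to split work between threads or processes.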
Two good references on this are Golub and Van Loan, and Trefethen and Bau. The latter is much more readable IMHO.
Hope that helps...

I've got (I think) an exact solution; however, it might take a long time to compute. The main problem would be computation on a very, very big (and hopefully sparse) matrix.
Let's say you have N points in your volume. Let's call p1...pN these points. What you are looking for is f(p1)...f(pN)
Let's take a point pi. It has 26 neighbours. Let's call them pi-1...pi-26. I suppose that for each point pi you have access to the coefficients of the linear combination. Let's call them ci-1...ci-j...ci-26 (for "coefficient of the linear combination for point pi to its j-th neighbour").
Let's do even better: you can consider that f(pi) is a linear combination of all the points in your space, except that most of the coefficients (all but 26) are equal to 0. You have your coefficients ci-1...ci-N.
You can now build a big N*N matrix of these ci-j coefficients:
| 0     c1-2   c1-3   ...   c1-N |   | f(p1) |   | f(p1) |
| c2-1  0      c2-3   ...   c2-N |   | f(p2) |   | f(p2) |
| .     .      .      .     .    | * |   .   | = |   .   |
| .     .      .      .     .    |   |   .   |   |   .   |
| cN-1  cN-2   cN-3   ...   0    |   | f(pN) |   | f(pN) |
Amazing! The solution you're looking for is one of the eigenvectors corresponding to the eigenvalue 1!
Use an optimised matrix library that computes eigenvectors efficiently (for sparse matrix) and hope it is already parallelised!
Edit : Amusing, I just reread your post, seems I just gave you your BCGSTAB solution... Sorry! ^^
Re-edit : In fact, I'm not sure, are you talking about "Biconjugate Gradient Stabilized Method"? Because I don't see the linear method you're talking about if you're doing gradient descent...
Re-re-edit : I love math =) I can even prove that given the condition sum of the ci = 1 that 1 is in fact an eigenvalue. I do not know the dimension of the corresponding space, but it is at least 1!
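One way to see the eigenvalue claim is the following short sketch. It assumes that every row of the matrix sums to 1, which the question states only for the non-boundary points; C denotes the coefficient matrix, c_{i,j} stands for ci-j, and \mathbf{1} is the vector of all ones (these symbols are introduced here for illustration).

```latex
(C\,\mathbf{1})_i \;=\; \sum_{j=1}^{N} c_{i,j}\cdot 1 \;=\; 1
\quad\text{for every } i
\qquad\Longrightarrow\qquad
C\,\mathbf{1} \;=\; 1\cdot\mathbf{1}
```

So under that assumption the constant vector is an eigenvector for the eigenvalue 1, which is why the corresponding eigenspace has dimension at least 1.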

Related

Matrix Chain Multiplication Dynamic Programming

Assume that multiplying a matrix G1 of dimension p×q with another matrix G2 of dimension q×r requires pqr scalar multiplications. Computing the product of n matrices G1G2G3...Gn can be done by parenthesizing in different ways. Define GiGi+1 as an explicitly computed pair for a given parenthesization if they are directly multiplied. For example, in the matrix multiplication chain G1G2G3G4G5G6 using parenthesization (G1(G2G3))(G4(G5G6)), G2G3 and G5G6 are the only explicitly computed pairs.
Consider a matrix multiplication chain F1F2F3F4F5, where matrices F1,F2,F3,F4 and F5 are of dimensions 2×25,25×3,3×16,16×1 and 1×1000, respectively. In the parenthesization of F1F2F3F4F5 that minimizes the total number of scalar multiplications, the explicitly computed pairs is/are
F1F2 and F3F4 only
F2F3 only
F3F4 only
F2F3 and F4F5 only
=======================================================================
My approach - I want to solve this in under one minute, but the only way I know is the bottom-up dynamic programming approach of filling in a table. The other thing I can conclude is that we should multiply with F5 last, because it has 1000 in its dimension. So, how can I develop fast intuition for this kind of question?
======================================================================
Correct answer is F3F4
The most important thing to note is the dimension 1×1000. Watch out for it if you want to minimize the multiplications: whatever gets multiplied with F5 is scaled by 1000, so we want the other dimensions in that product to be as small as possible.
Consider F4F5 first: it would cost 16×1×1000 multiplications. Computing F3F4 first instead gives a result of dimension 3×1, so we are left with small numbers like 3 and 1. So there is no way I'm going with F4F5.
By similar logic I would not go with F2F3 and lose the smaller 3, leaving the bigger 25 and 16 to be used later with 1000.
For F1F2, you can quickly check that (F1F2)(F3F4) is not better than (F1(F2(F3F4))): 150 + 48 + 6 = 204 multiplications versus 48 + 75 + 50 = 173, before the final product with F5. So the answer is F3F4.
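For checking this kind of intuition after the fact, the standard bottom-up table can be sketched in a few lines of Python. The function names `matrix_chain_order` and `explicit_pairs` are made up for this example; `dims` lists the dimensions so that matrix i (0-based) has size dims[i] × dims[i+1].

```python
def matrix_chain_order(dims):
    """Classic O(n^3) DP: cost[i][j] is the minimum number of scalar
    multiplications for the product of matrices i..j (0-based, inclusive),
    and split[i][j] is the position of the outermost split."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost, split


def explicit_pairs(split, i, j):
    """List the (1-based) adjacent pairs that are multiplied directly."""
    if i == j:
        return []
    if j == i + 1:
        return [(i + 1, j + 1)]
    k = split[i][j]
    return explicit_pairs(split, i, k) + explicit_pairs(split, k + 1, j)


dims = [2, 25, 3, 16, 1, 1000]            # F1..F5 from the question
cost, split = matrix_chain_order(dims)
print(cost[0][4])                          # 2173
print(explicit_pairs(split, 0, 4))         # [(3, 4)], i.e. F3F4 only
```

The total of 2173 is the 173 multiplications for (F1(F2(F3F4))) plus 2×1×1000 for the final product with F5.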

Algorithm - How to reduce the complexity?

I'm currently facing an algorithmic problem where I could use some hints from you.
Here are the problem's details:
Given are n FIFOs, each with a capacity of k. There are also sixteen different elements. Now we fill those FIFOs with a random amount of different random elements. The task is to remove a given random amount of random elements from the FIFOs. It is possible that there is no optimal solution because of other blocking elements (rotating the FIFO isn't allowed).
The best solution is the one which removes all or most elements (so there can be more than one possible solution).
Here is a visualized example of three FIFOs (separated by | ) and three elements (a, b and c):
1 | 2 | 3
---+---+---
. | b | c <- last elements
. | b | a
a | a | b <- first elements
I hope you now know what problem I'm facing. (please feel free to ask if there is anything unintelligible)
Here is my idea of an algorithm:
Treat each FIFO separately and check how many elements could possibly be removed. (This step is necessary to reduce the problem's size in the following step.)
Try each possible combination to remove the elements.
Stop if there is a perfect combination which removes every element; otherwise remember the best combination (the one that removes the most elements).
So the worst case has a complexity of O(n^k) which is obviously horrible! I'd really appreciate any idea to reduce the complexity.

Second best solution to an assignment problem using the Hungarian Algorithm

For finding the best solution in the assignment problem it's easy to use the Hungarian Algorithm.
For example:
A | 3 4 2
B | 8 9 1
C | 7 9 5
When using the Hungarian Algorithm on this you get:
A | 0 0 1
B | 5 5 0
C | 0 1 0
Which means A gets assigned to 'job' 2, B to job 3 and C to job 1.
However, I want to find the second best solution, meaning I want the best solution with a cost strictly greater than the cost of the optimal solution. As far as I can tell, I just need to find the assignment with the minimal sum in the last matrix that is not the same as the optimal one. I could do this by just searching in a tree (with pruning), but I'm worried about the complexity (being O(n!)). Is there any efficient method for this that I don't know about?
I was thinking about a search in which I sort the rows first and then greedily choose the lowest cost first, assuming most of the lowest costs will make up the minimal sum, plus pruning. But since the Hungarian Algorithm can produce a matrix with a lot of zeros, the complexity is terrible again...
What you describe is a special case of the K best assignments problem -- there was in fact a solution to this problem proposed by Katta G. Murty in the following 1968 paper "An Algorithm for Ranking all the Assignments in Order of Increasing Cost." Operations Research 16(3):682-687.
Looks like there are actually a reasonable number of implementations of this, at least in Java and Matlab, available on the web (see e.g. here.)
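For just the runner-up, a simple approach in the spirit of such ranking methods is sketched below; it is not claimed to be Murty's exact procedure. Any assignment other than the optimal one must leave out at least one of its pairs, so you can forbid each optimal pair in turn, re-solve, and keep the cheapest result. The names `solve_assignment` and `second_best` are made up, and the brute-force solver is only a stand-in so the example is self-contained; in practice a Hungarian solver would take its place, giving n extra solves instead of an O(n!) search.

```python
import math
from itertools import permutations


def solve_assignment(cost):
    """Stand-in exact solver (brute force, small n only).
    Returns (total_cost, perm) where row i is assigned column perm[i]."""
    n = len(cost)
    best = (math.inf, None)
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best[0]:
            best = (total, perm)
    return best


def second_best(cost):
    """Return (optimal, runner_up), each as (total_cost, perm)."""
    optimal = solve_assignment(cost)
    runner_up = (math.inf, None)
    for i, j in enumerate(optimal[1]):
        blocked = [row[:] for row in cost]
        blocked[i][j] = math.inf          # forbid one pair of the optimum
        candidate = solve_assignment(blocked)
        if candidate[0] < runner_up[0]:
            runner_up = candidate
    return optimal, runner_up


cost = [[3, 4, 2],
        [8, 9, 1],
        [7, 9, 5]]
print(second_best(cost))   # ((12, (1, 2, 0)), (13, (0, 2, 1)))
```

Note that the runner-up found this way can tie with the optimum when several assignments share the minimal cost; to get a strictly greater cost you would have to keep ranking until the cost increases, which is what a K-best method does.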
In R there is now an implementation of Murty's algorithm in the muRty package.
CRAN
GitHub
It covers:
Optimization in both minimum and maximum direction;
output by rank (similar to dense rank in SQL), and
the use of either Hungarian algorithm (as implemented in clue) or linear programming (as implemented in lpSolve) for solving the initial assignment(s).
Disclaimer: I'm the author of the package.

Determine Quaternion Which Sends One Vector into Another

First off, this is not a duplicate. All other seemingly related questions ask for the quaternion representing rotation between directions of 2 vectors, i.e. the solutions do not take into account norms of these 2 vectors.
Here is what I want. Imagine that I have non-unit vectors a = (0, 0, 2) and b = (3, 1, 2). I follow Hamilton's original definition of a quaternion, q = a / b (this definition is symbolic, since of course you cannot divide vectors). Refer to Wikipedia for this concept. From that I can infer (maybe it's naive) that somehow I can find such a q that q * b = a.
In other words, given a and b I want to find a quaternion q which when multiplied by b will give me a. Please, pay attention to the fact that I'm not interested in plain rotating (unitary) quaternion which would simply rotate b into direction of a. In fact, in addition to rotation, I want norm of b to be scaled to the norm of a as well.
Yes, I know that I could do it in two stages: rotating b with standard unitary quaternion approach and then manually scaling the rotated b to the norm of a which would of course involve additional square roots (which is what I'm trying to avoid here). In fact, I want a computationally efficient composition of these 2 operations, and I feel like it's achievable, but the information is not widespread since it does not seem to be conventional use case.
Maybe I'm wrong. Please, share your experiences. Thank you.
Why not math.stackexchange.com?
Because I'm not interested in thorough mathematical derivation or explanation. My concern is computationally efficient algorithm for construction of such quaternion. Nevertheless, if such details will be included in the answer, I'd really appreciate that and probably others who stumble across the same issue in future too.
For Close Voters:
Go ahead and close Finding quaternion representing the rotation from one vector to another as well.
Furthermore, I have tagged my question properly. My question belongs to these highly-populated tags which are part of StackOverflow. As a result, your reasons for close do not make any sense.
Daniel Fischer's comment-answer is correct. It turns out that there are infinitely many ways to construct such a quaternion. The problem boils down to a linear system with three equations and four variables. It's under-constrained (if we assume we'll discard the [w] part of the result).
Perhaps I can clarify Fischer's answer.
When you treat two vectors as quaternions and multiply them, you get their cross-product in the [x,y,z] part and you get their negated dot-product in the [w] part:
    | 0|   | 0|   |-ax*bx-ay*by-az*bz|
a*b=|ax| * |bx| = | ay*bz-az*by      |
    |ay|   |by|   | az*bx-ax*bz      |
    |az|   |bz|   | ax*by-ay*bx      |
When you left-multiply a full-quaternion with a vector, you get the same thing, but the [w] part scales the vector and adds it back to the cross-product:
    |qw|   | 0|   |-qx*bx-qy*by-qz*bz |
q*b=|qx| * |bx| = | qy*bz-qz*by+qw*bx |
    |qy|   |by|   | qz*bx-qx*bz+qw*by |
    |qz|   |bz|   | qx*by-qy*bx+qw*bz |
Recall that
a x b = |a||b|sin(Θ)n
where n is a unit vector that is orthogonal to a and b. And
a . b = |a||b|cos(Θ)
The quaternion conjugate of a vector is just its negation.
So if we look at Fischer's equation:
a = q*b = |b|^{-2} * a * b' * b
We can see that
a*b' = |  -dotP(a,-b) |
       | crossP(a,-b) |
And so
a*b'*b = | -dotP(crossP(a,-b),b)                |
         | crossP(crossP(a,-b),b) - dotP(a,-b)b |
The top ([w]) portion of this quaternion must be zero because it is the dot-product between two orthogonal vectors. The bottom portion is a scaled version of a: the nested cross-products produce a vector that is orthogonal to both b and n and has length |a|*|b|*|b|*sin(Θ). The dot-product portion adds in the projection of a onto b (scaled by the squared length of b), which points along b and has length |a|*|b|*|b|*cos(Θ). Together they give a vector parallel to a with length |a|*|b|*|b|. Once we divide out the squared length of b, all that's left is a.
Now, the question of whether or not this is actually useful is different. It's not very useful for finding a, since you need to have it to begin with. Furthermore, odds are good that q*c is not going to do what you're hoping, but you'd have to tell us what that is.
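The construction q = a * b' / |b|^2 can be checked numerically with a small Python sketch. The helper names `qmul` and `quaternion_sending` are made up for this example, quaternions are stored as (w, x, y, z) tuples, and `Fraction` is used only so that the check is exact. Note that no square root is needed anywhere.

```python
from fractions import Fraction


def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def quaternion_sending(b, a):
    """Return q = a * conj(b) / |b|^2, so that q * b == a when the
    3-vectors a and b are embedded as pure quaternions (w = 0)."""
    norm_sq = sum(c * c for c in b)            # |b|^2, no square root
    a_quat = (0, *a)
    b_conj = (0, *(-c for c in b))             # conjugate of a pure quaternion
    return tuple(Fraction(c) / Fraction(norm_sq)
                 for c in qmul(a_quat, b_conj))


a = (0, 0, 2)
b = (3, 1, 2)
q = quaternion_sending(b, a)                   # (2/7, 1/7, -3/7, 0)
assert qmul(q, (0, *b)) == (0, 0, 0, 2)        # q * b gives back a, with w = 0
```

This is only one of the infinitely many solutions mentioned above, and it depends on both a and b, so applying the same q to a different vector c will in general not produce a pure (w = 0) result.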

Minimizing a function of vectors

I need to minimize the following sum:
minimize the sum over i = 1 to n of fi(v(i), v(i - 1), tangent(i))
v and tangent are vectors.
fi takes the 3 vectors as arguments and returns a cost associated with these 3 vectors. For this function, v(i - 1) is the vector chosen in the previous iteration. tangent(i) is also known. fi calculates the cost of choosing a vector v(i), given the other two vectors v(i - 1) and tangent(i). The v(0) and v(n) vectors are known. The tangent(i) values are also known in advance for all i = 0 to n.
My task is to determine all such v(i)s such that the total cost of the function values for i = 1 to n is minimized.
Can you please give me any ideas to solve this?
So far I can think of Branch and Bound or dynamic programming methods.
Thanks!
I think this is a problem in mathematical optimisation, with an objective function built up of dot products and arcCosines, subject to the constraint that your vectors should be unit vectors. You could enforce this either with Lagrange multipliers, or by including a normalising step in the arc-Cosine. If Ti is a unit vector then for Vi calculate cos^-1(Ti.Vi/sqrt(Vi.Vi)). I would have a go at using a conjugate gradient optimiser for this, or perhaps even Newton's method, with my starting point Vi = Ti.
I would hope that this would be reasonably tractable, because the Vi are only related to neighbouring Vi. You might even get somewhere by repeatedly adjusting each Vi in isolation, one by one, to optimise the objective function. It might be worth just seeing what happens if you repeatedly set Vi to be the average of Ti, Vi+1, and Vi-1, and then scaled Vi to be a unit vector again.
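The last suggestion (average, then rescale) is easy to try out. Below is a minimal Python sketch of it; the names `normalise` and `relax_chain` are made up, vectors are plain lists, and the starting point Vi = Ti follows the suggestion above. It is a heuristic only: it does not evaluate the actual cost function fi, which is not given in the question, so whether it lowers that cost has to be checked separately.

```python
import math


def normalise(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def relax_chain(v, tangents, sweeps=50):
    """Repeatedly replace each interior v[i] by the normalised average of
    tangents[i], v[i-1] and v[i+1]. The end vectors v[0] and v[-1] are
    known in the problem and therefore left untouched."""
    for _ in range(sweeps):
        for i in range(1, len(v) - 1):
            avg = [(t + prev + nxt) / 3.0
                   for t, prev, nxt in zip(tangents[i], v[i - 1], v[i + 1])]
            v[i] = normalise(avg)
    return v


# Usage: start from Vi = Ti and relax the interior vectors.
tangents = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
v = [list(t) for t in tangents]
relax_chain(v, tangents)
```

Because each update only touches a vector and its two neighbours, one sweep costs O(n), which is why this kind of local adjustment is cheap to experiment with before reaching for a full optimiser.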
