How to efficiently construct a dependency graph from a transitive pairwise relation? - algorithm

The following algorithm is necessary in a code generation problem that I am tackling. My current algorithm is O(n^2) but I feel like there is a better way to do it.
Suppose I have a predicate function for computing whether x < y.
less?: (x:T, y:T) -> True|False
I know a priori that this relation is transitive, so that
less?(a, b) and less?(b, c)
implies
less?(a, c)
I would like to compute the dependency graph for a set of objects (x1, ..., xn). It should look like:
x1 => (x2, x4, x5)
x2 => (x3)
x5 => (x7)
x10 => ()
etc...
where each node, xi, is associated with a list of xj such that less?(xj, xi) is true. The easiest way to compute this graph is to call less? on all possible pairs of (xi, xj). But the less? relation is expensive and I would like to minimize the calls to less?
Thanks for your help.
-Patrick

If the < relation is sufficiently expensive you might gain by maintaining a matrix in which to store the current state of knowledge about a vs b. We can have a < b, !(a < b), or unknown. Then when you compute a comparison of a vs b, store that in the matrix and look for deductions about a vs c and b vs c for every possible c for which the result is as yet unknown. Do you also have a < b => !(b < a)?
With, e.g., a vs b and b vs c, there are only a finite number of possibilities to check for compatibility and incompatibility to see where deductions are possible. Clearly a < b and b < c => a < c, and because of this we also have a < b and !(a < c) => !(b < c). Perhaps if you write out all the possibilities you can find more.
I would be inclined to slowly grow a square of known values, adding new variables one by one in a random order, so that at stage i you know the entire contents of the matrix for the first i randomly chosen variables. As you add each new variable, compare it with the variables already worked on, again in a random order, making every deduction possible along the way. If there is a very clever comparison order, you might hope that a random order comes close enough to the optimum that you won't be much less efficient than it.
I have my doubts about this in the worst case. If you never find a < b for any a, b, I think you have to check every possibility.
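A minimal sketch of that bookkeeping in Python (the names build_graph, less and known are illustrative; only the positive deduction a < b and b < c => a < c is propagated, but the negative deductions discussed above could be added the same way):

from itertools import product

def build_graph(items, less):
    n = len(items)
    known = {}                                   # (i, j) -> True/False once decided

    def record(i, j, value):
        if (i, j) in known:
            return
        known[(i, j)] = value
        if value:                                # propagate items[i] < items[j]
            for k in range(n):
                if known.get((j, k)) is True:    # i < j and j < k  =>  i < k
                    record(i, k, True)
                if known.get((k, i)) is True:    # k < i and i < j  =>  k < j
                    record(k, j, True)

    for i, j in product(range(n), repeat=2):
        if i != j and (i, j) not in known:
            record(i, j, less(items[i], items[j]))

    # node i -> list of j such that less(items[j], items[i]) holds
    return {i: [j for j in range(n) if known.get((j, i)) is True] for i in range(n)}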

Related

An algorithm to find a transform matrix between two matrices

Assuming there are two matrices A and B that are both m * n, is there a method or algorithm to obtain an n * n matrix C which satisfies the equation A * C = B' (where B' can be obtained by performing several row swaps on B), such that C minimizes the sum of squared errors?
Or A * C = D * B, where D (m * m) is a row-swap transform matrix.
Thanks.
If I read your question correctly, you have two matrixes A and B, and you’re looking for C such that A * C = B + epsilon where you want to minimize epsilon’s sum of squares.
Your question seems to suggest you have some constraint on C but it's not obvious what that is. But as you indicate in your answer, a linear solver will find a C that minimizes epsilon's sum of squares. The solver doesn't care what the ordering of the rows of B is: it will combine row-swapping operators (like the D you mention) into the C that it finds.
There are many different linear solvers and a simple function like solve has to choose which to use—you can always explicitly choose a specific solver if you know you want it. An expensive but very useful solver is the Moore–Penrose pseudoinverse: with C = pinv(A) * B, C is guaranteed to minimize the sum of squares of epsilon but also minimize C’s sum of squares. Wikipedia explains when solve might return something different than this min-norm solution via pseudoinverse.
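For instance, a quick NumPy sketch of both routes (A and B here are random stand-ins for the real matrices):

import numpy as np

A = np.random.rand(5, 3)                         # m x n
B = np.random.rand(5, 3)                         # m x n

C_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)  # minimizes ||A @ C - B||_F
C_pinv = np.linalg.pinv(A) @ B                   # min-norm least-squares solution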

Sort Algorithm for a set of elements knowing pair order

Here is a problem that I'm seeking an algorithmic solution for. Suppose we have a set of n elements, A1, A2, ..., An
And we have a set of rules like A1 > A2, A1 < A3, etc. The rules are enough to write down the sorted list by hand. Is there a known method for doing the sort? I don't want to do a bubble-sort-like loop; I'm looking for a standard solution. Any ideas? A name would be enough for me!
Thanks in advance.
Comparison-based sort algorithms will only work if you have a total ordering, that is, if for every pair x, y with x != y, we know whether x < y or y < x. What you have is a partial ordering on your set of elements, and what you are looking for is a topological ordering of the elements according to that partial order.
To find it, interpret your input as a graph with edges (a, b) where a < b is an input pair. Then do a DFS on that graph:
dfs(x):
    if x is visited: return
    mark x as visited
    for every rule x < y (or y > x):
        dfs(y)
    add x to the front of output

output = []
for every element x:
    dfs(x)
The runtime is O(n + m) where n is the number of elements (nodes) and m is the number of rules (edges).
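A runnable version of the same idea in Python (the successors map and element list below are made up to mirror the example rules A1 > A2 and A1 < A3 from the question; successors[x] lists every y with a rule x < y):

successors = {"A2": ["A1"], "A1": ["A3"], "A3": []}
elements = list(successors)

visited, output = set(), []

def dfs(x):
    if x in visited:
        return
    visited.add(x)
    for y in successors[x]:
        dfs(y)
    output.insert(0, x)                    # x goes before everything larger than it

for x in elements:
    dfs(x)

print(output)                              # -> ['A2', 'A1', 'A3']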
Sure, take your pick!
Merge Sort
Quick Sort
Heap Sort
Any comparison sort will work; just put your rules into a big if/else statement in a single function, and the comparison sort will be more than happy to sort them as you like.
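In Python that suggestion could look like the following (functools.cmp_to_key is standard; the rule set is a made-up illustration, and this only works if the rules, directly or after taking their transitive closure, decide every pair the sort actually compares):

from functools import cmp_to_key

rules = {("A2", "A1"), ("A1", "A3")}       # (x, y) in rules means x < y

def compare(x, y):
    if (x, y) in rules:
        return -1
    if (y, x) in rules:
        return 1
    return 0                               # unknown pairs are treated as equal

print(sorted(["A1", "A2", "A3"], key=cmp_to_key(compare)))
# -> ['A2', 'A1', 'A3'] with this particular rule set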

How to linearize a minmax constraint

Currently I have this linear programming model:
Max X
such that:
Max_a(Min_b(F(a,b,X))) <= some constant
*Max_a means maximizing the expression over a alone; likewise, Min_b minimizes over b.
Now the problem becomes how to linearize the constraint. Most of the existing minmax linearization papers discuss minmax as an objective, but how do you linearize it when it appears as a constraint?
Thanks
Preliminary remark: the problem you describe is not a "linear programming model", and there is no way to transform it into a linear model directly (which doesn't mean it can't be solved).
First, note that the Max in the constraint is not necessary, i.e. your problem can be reformulated as:
Max X
subject to: Min_b F(a, b, X) <= K forall a
Now, since you are speaking of 'linear model', I assume that at least F is linear, i.e.:
F(a, b, X) = Fa.a + Fb.b + FX.X
And the constraint can obviously be written:
Fa.a + Min_b Fb.b + FX.X <= K forall a
The interesting point is that the minimum on b does not depend on the value of a and X. Hence, it can be solved beforehand: first find u = Min_b Fb.b, and then solve
Max X
subject to Fa.a + FX.X <= K - u forall a
This assumes, of course, that the domains of a and b are independent (of the form AxB): if there are other constraints coupling a and b, it is a different problem (in that case, please write out the complete problem in the question).
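A tiny numeric sketch of that reformulation (the numbers are made up, a, b and X are treated as scalars, and FX > 0 is assumed so that the binding constraint is the one with the largest Fa.a):

A = [0, 1, 2, 3]                           # finite domain of a
B = [-2, -1, 0, 1]                         # finite domain of b
Fa, Fb, FX, K = 2.0, 3.0, 1.0, 10.0

u = min(Fb * b for b in B)                 # Min_b Fb.b, independent of a and X
X = (K - u - max(Fa * a for a in A)) / FX  # largest X satisfying every constraint
print(u, X)                                # -> -6.0 10.0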

A problem about string algorithms, DP, graphs, or something else

The problem is the following.
about "nice"
1) "ab" is nice
2) A is nice => "a"+A+"b" is nice
3) A and B are nice => A+B is nice
about "~"
1) "ab"~"ab"
2) A~B => "a"+A+"b"~"a"+B+"b"
3) A~B and C~D => A+C~B+D and A+C~D+B
Now there are at most 1000 strings of 'a' and 'b' forming a set S. Find the biggest subset of S in which every element is nice and no pair (A, B) satisfies A~B. Output its cardinality.
There is something different here from the problems I have seen before:
A+B+C+D~A+C+B+D~B+D+A+C but A+B+C+D~B+D+A+C doesn't hold.
Two difficulties for me:
how to check whether S1~S2
if I know every pair's "~", how can I find the cardinality
More detail: https://www.spoj.pl/problems/ABWORDS/
The rules for constructing a nice word imply that every nice word starts with "a" and ends with "b". Hence, there is a unique (up to sequencing - rule 3) decomposition of a nice word into nice sub-words: find all "ab"s, and then try to expand them using rule 2, and sequence them using rule 3. We can express this decomposition via a tree (n branches for repeated application of rule 3).
In the tree context, the "~" relation is simply expressing isomorphic trees, I think (where branch order does not matter).
EDIT: as pointed out, branch order does matter.
I'll attempt to solve the problem as stated in the original link (the two definitions of "nice" don't coincide).
First, similarity between two words X and Y, using DP.
Define f(a, b, n) to be the function that indicates that X[a..a+2n-1] is similar to Y[b..b+2n-1] and that both subwords are nice.
f(a, b, 0) = 1.
for n > 0,
f(a, b, n) = f1(a, b, n) or f2(a, b, n) or f3(a, b, n)
f1(a, b, n) = x[a] == y[b] == 'a' and x[a+2n-1] == y[b+2n-1] == 'b' and f(a+1, b+1, n-1)
f2(a, b, n) = any(1 <= k < n) f(a, b, k) and f(a+2k, b+2k, n-k)
f3(a, b, n) = any(1 <= k < n) f(a, b+2(n-k), k) and f(a+2k, b, n-k)
I think this is O(n^4) (urgh).
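A direct memoized transcription of this recurrence in Python (a sketch; the names similar and f are illustrative, and 0-based indexing is assumed):

from functools import lru_cache

def similar(X, Y):
    # returns True iff X ~ Y and both are nice, following f(a, b, n) above
    if len(X) != len(Y) or len(X) == 0 or len(X) % 2 != 0:
        return False

    @lru_cache(maxsize=None)
    def f(a, b, n):
        # X[a..a+2n-1] and Y[b..b+2n-1] are nice and similar
        if n == 0:
            return True
        # f1: rule 2, "a" + A + "b" ~ "a" + B + "b"
        if (X[a] == Y[b] == 'a' and X[a + 2*n - 1] == Y[b + 2*n - 1] == 'b'
                and f(a + 1, b + 1, n - 1)):
            return True
        for k in range(1, n):
            # f2: rule 3, A + C ~ B + D
            if f(a, b, k) and f(a + 2*k, b + 2*k, n - k):
                return True
            # f3: rule 3, A + C ~ D + B
            if f(a, b + 2*(n - k), k) and f(a + 2*k, b, n - k):
                return True
        return False

    return f(0, 0, len(X) // 2)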
For the second part, if you represent the words as a graph with edges representing the similarity relation, you are essentially trying to find a maximum independent set, I think. If so, good luck! It is NP-hard (i.e. there is no known better solution than trying all combinations) in the general case, and I don't see any properties that make it easier in this case :(
EDITED to make the definition of similarity automatically check niceness. It is quite easy.
EDITED yet again because of my stupidity.

Distance measure between two sets of possibly different size

I have 2 sets of integers, A and B, not necessarily of the same size. For my needs, I take the distance between any two elements a and b (integers) to be just abs(a-b).
I am defining the distance between the two sets as follows:
If the sets are of the same size, minimize the sum of distances over all pairs [a, b] (a from A and b from B), where the minimization is over all possible pairings of A with B (there are n! of them).
If the sets are not of the same size, let's say A of size m and B of size n, with m < n, then minimize the distance from (1) over all subsets of B which are of size m.
My question is whether the following algorithm (just an intuitive guess) gives the right answer according to the definition written above.
Construct a matrix D of size m X n, with D(i,j) = abs(A(i)-B(j))
Find the smallest element of D, accumulate it, and delete the row and the column of that element. Accumulate the next smallest entry, and keep accumulating until all rows and columns are deleted.
For example, if A={0,1,4} and B={3,4}, then D is (with the elements of B above and the elements of A to the left):
      3   4
 0    3   4
 1    2   3
 4    1   0
And the distance is 0 + 2 = 2, coming from pairing 4 with 4 and 3 with 1.
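For reference, a straightforward sketch of that greedy procedure in Python (illustrative only; as the answers below point out, it is not guaranteed to be optimal):

def greedy_distance(A, B):
    # process entries of D from smallest to largest, skipping any entry whose
    # row or column has already been used (equivalent to repeatedly taking the
    # global minimum and deleting its row and column)
    entries = sorted((abs(a - b), i, j)
                     for i, a in enumerate(A) for j, b in enumerate(B))
    used_rows, used_cols, total = set(), set(), 0
    for d, i, j in entries:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += d
    return total

print(greedy_distance([0, 1, 4], [3, 4]))    # -> 2, as in the example above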
Note that this problem is referred to sometimes as the skis and skiers problem, where you have n skis and m skiers of varying lengths and heights. The goal is to match skis with skiers so that the sum of the differences between heights and ski lengths is minimized.
To solve the problem you could use minimum weight bipartite matching, which requires O(n^3) time.
Even better, you can achieve O(n^2) time with O(n) extra memory using the simple dynamic programming algorithm below.
Optimally, you can solve the problem in linear time if the points are already sorted using the algorithm described in this paper.
O(n^2) dynamic programming algorithm:
def match_distance(A, B):
    # match every element of the smaller set to a distinct element of the
    # larger one, minimizing the sum of pairwise distances
    if len(A) > len(B):
        A, B = B, A                      # make A the smaller set
    A, B = sorted(A), sorted(B)
    # opt[j]: optimal matching of {A[0]} using {B[0..j]}
    opt = [abs(A[0] - B[0])]
    for j in range(1, len(B)):
        opt.append(min(opt[j - 1], abs(A[0] - B[j])))
    for i in range(1, len(A)):
        nopt = [float("inf")] * len(B)
        for j in range(1, len(B)):
            nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]))
        opt = nopt
    return opt[-1]
After each iteration i of the outer for loop above, opt[j] contains the optimal solution matching {A[0],..., A[i]} using the elements {B[0],..., B[j]}.
The correctness of this algorithm relies on the fact that in any optimal matching if a1 is matched with b1, a2 is matched with b2, and a1 < a2, then b1 <= b2.
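On the example from the question, the sketch above gives:

print(match_distance([0, 1, 4], [3, 4]))   # -> 2 (pair 4 with 4 and 3 with 1)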
In order to get the optimum, solve the assignment problem on D.
The assignment problem finds a perfect matching in a bipartite graph such that the total edge weight is minimized, which maps perfectly to your problem. It is also in P.
EDIT to explain how OP's problem maps onto assignment.
For simplicity of explanation, extend the smaller set with special elements e_k.
Let A be the set of workers, and B be the set of tasks (the contents are just labels).
Let the cost be the distance between an element in A and B (i.e. an entry of D). The distance between e_k and anything is 0.
Then, we want to find a perfect matching of A and B (i.e. every worker is matched with a task), such that the cost is minimized. This is the assignment problem.
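For example, with SciPy (scipy.optimize.linear_sum_assignment handles rectangular cost matrices directly, matching min(m, n) pairs, so the padding elements e_k are only needed for the explanation; the arrays are the question's example):

import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([0, 1, 4])
B = np.array([3, 4])
D = np.abs(A[:, None] - B[None, :])        # the cost matrix from the question
rows, cols = linear_sum_assignment(D)      # optimal assignment
print(D[rows, cols].sum())                 # -> 2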
No, it's not optimal. For example:
With A = {3,7} and B = {0,4} the greedy procedure picks {(3,4),(7,0)}, giving distance 1 + 7 = 8, but you should choose {(3,0),(7,4)}, giving distance 3 + 3 = 6.
Your algorithm gives a good approximation to the minimum, but not necessarily the true minimum. You are following a "greedy" approach, which is generally much easier and gives good results, but cannot guarantee the best answer.
