In a reduction with additive but nonlinear blow-up, can I still get APX-hardness? - complexity-theory

I'm not entirely sure how to phrase this, so first I'll give an example and then, by analogy to the example, try to state my question.
A standard example of an L-reduction is the one from bounded-degree independent set (for concreteness, instantiated as $G = (V,E)$ with degree bound $B \ge 3$), which is APX-complete, to MAX 2-SAT. It works by creating a clause for each edge and a clause for each vertex; the idea is that in the resulting MAX 2-SAT instance every edge clause can be satisfied, so its optimum is
$$\mathrm{OPT} = |E| + |\text{maximum independent set}|.$$
Since the degree is bounded, $|\text{maximum independent set}| = \Theta(|E|)$, and we get an L-reduction.
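To spell out the linear relationship the L-reduction needs (my own back-of-the-envelope bound, with one safe choice of constants): with degree bound $B$ we have $|E| \le B|V|/2$, while greedily picking vertices gives an independent set of size at least $|V|/(B+1)$, so $\mathrm{OPT}_{IS} \ge |V|/(B+1)$ and hence
$$\mathrm{OPT} = |E| + \mathrm{OPT}_{IS} \le \left(\frac{B(B+1)}{2} + 1\right)\mathrm{OPT}_{IS},$$
which is exactly the multiplicative bound $\mathrm{OPT} \le \alpha \cdot \mathrm{OPT}_{IS}$ that the first L-reduction condition asks for.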
Now my question: suppose I have two problems, A, which is APX-complete, and B, which is my target problem. Let the optimum of A be $\Theta(m)$, and suppose my reduction to B gives
$$\mathrm{OPT}_B = p(m) + \mathrm{OPT}_A,$$
where $p$ is some polynomial with $\deg(p) > 1$. I no longer have an L-reduction; do I get anything? Can it be a PTAS reduction? I hope the question is clear. Thanks.

Related

Minimum weight vertex cover of a tree

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework, but it is one of the questions in The Algorithm Design Manual, which I'm currently reading; an answer set gives the solution as:
1. Perform a DFS; at each step update Score[v][include], where v is a vertex and include is either true or false.
2. If v is a leaf, set Score[v][false] = 0 and Score[v][true] = wv, where wv is the weight of vertex v.
3. During the DFS, when moving up from the last child of node v, update Score[v][include]:
   Score[v][false] = sum for c in children(v) of Score[c][true]
   Score[v][true] = wv + sum for c in children(v) of min(Score[c][true], Score[c][false])
4. Extract the actual cover by backtracking through Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response to Ali's answer: so suppose I have this graph, with the vertices labeled A etc. and the weights in parentheses after:
A(9)---B(3)---C(2)
 \      \
 E(1)   D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.
There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that'd take exponential time. My guess is "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
A vertex cover of a tree may include both alternating and adjacent vertices. What it cannot do is exclude two adjacent vertices, because it must cover all of the edges.
The answer is encoded in the way Score is recursively calculated. The cost of not including a vertex is the cost of including all of its children. The cost of including a vertex, however, is its weight plus, for each child, whichever is less costly: including it or not including it, because once the parent is in the cover both options are allowed.
As your solution suggests, it can be done with a DFS in post-order, in a single pass. The trick is to include a vertex when its Score says inclusion is cheaper, and to include its children when it is excluded; otherwise we'd be excluding two adjacent vertices.
Here's some pseudocode:
find_cover_vertex_of_minimum_weight(v)
    for c in children(v)
        find_cover_vertex_of_minimum_weight(c)
    Score[v][false] = sum for c in children(v) of Score[c][true]
    Score[v][true] = weight(v) + sum for c in children(v) of min(Score[c][true], Score[c][false])
    if Score[v][true] < Score[v][false] then
        add v to the cover
    else
        for c in children(v)
            add c to the cover
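For concreteness, here is a runnable Python sketch of the same DP (my own translation, not code from the thread). It fills Score bottom-up exactly as above; to extract the cover it walks back down from the root, which makes the decisions explicit: a child of an excluded vertex has no choice and must be included, while a child of an included vertex takes whichever option is cheaper.

# Minimum-weight vertex cover of a tree (a sketch, not the answerer's code).
# adj: adjacency lists of the tree; weight: vertex -> weight; root: any vertex.
def min_weight_vertex_cover(adj, weight, root):
    score = {}  # score[v] = (cost with v excluded, cost with v included)

    def dfs(v, parent):
        excluded, included = 0, weight[v]
        for c in adj[v]:
            if c == parent:
                continue
            dfs(c, v)
            c_excluded, c_included = score[c]
            excluded += c_included                   # excluding v forces every child in
            included += min(c_excluded, c_included)  # including v lets each child choose
        score[v] = (excluded, included)

    def extract(v, parent, must_include, cover):
        excluded, included = score[v]
        take = must_include or included < excluded
        if take:
            cover.add(v)
        for c in adj[v]:
            if c != parent:
                extract(c, v, not take, cover)  # a child of an excluded vertex must be in
    dfs(root, None)
    cover = set()
    extract(root, None, False, cover)
    return cover

# The example from the question: edges A-B, B-C, B-D, A-E.
adj = {'A': ['B', 'E'], 'B': ['A', 'C', 'D'], 'C': ['B'], 'D': ['B'], 'E': ['A']}
weight = {'A': 9, 'B': 3, 'C': 2, 'D': 4, 'E': 1}
print(min_weight_vertex_cover(adj, weight, 'A'))  # -> {'B', 'E'}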
It doesn't actually mean anything confusing; it is just dynamic programming, and you seem to understand almost all of the algorithm. To make it any clearer, I would say:
First, perform a DFS on your graph and find the leaves.
For every leaf, assign values as the algorithm says.
Now start from the leaves and assign values to each leaf's parent using that formula.
Keep assigning values to the parents of nodes that already have values, until you reach the root of the graph.
That is all there is to it. By "backtracking", your algorithm means that you assign a value to each node whose children already have values. As I said above, this style of solving problems is called dynamic programming.
Edit, just to explain your changes to the question. You have the following graph, and the answer is clearly {B, E}; you thought this algorithm gives you just B, but that is incorrect: it gives you B and E.
A(9)---B(3)---C(2)
 \      \
 E(1)   D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6 (this means we use C and D); score[B][true] = 3 (this means we use B)
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4 (this means we use B and E); score[A][true] = 12 (this means we use A and B).
You select 4, so you must use B and E. If it were just B, your answer would be 3; but, as you correctly found, your answer is 4 = 3 + 1 = B + E.
Also, when E = 1000:
A(9)---B(3)---C(2)
 \      \
 E(1000) D(4)
It is 100% correct that the answer is A and B, because it is wrong to use E just because you don't want to select adjacent nodes. With this algorithm you will find that the answer is A and B, and you can verify it just by checking. Consider these covers:
C D A = 15
B E = 1003
A B = 12
Although the first two covers have no adjacent nodes, they are bigger than the last one, which does have adjacent nodes. So it is best to use A and B for the cover.

Seating people in a movie theater

This is based on an article I read about puzzles and interview questions asked by large software companies, but it has a twist...
General question:
What is an algorithm to seat people in a movie theater so that they sit directly beside their friends but not beside their enemies?
Technical question:
Given an N by M grid, fill the grid with N * M - 1 items. Each item has an association Boolean value for each of the other N * M - 2 items. In each row, items directly beside other items should have a positive association value for each other. Columns, however, do not matter; i.e., an item can be "enemies" with the item in front of it. Note: if item A has a positive association value for B, then B also has a positive association value for A; it works the same for negative association values. An item is guaranteed to have a positive association with at least one other item. Also, you have access to all of the items and their association values before you start placing them in the grid.
Comments:
I have been researching this problem and thinking about it since yesterday; from what I have found, it reminds me of the bin packing problem with some added requirements. In some free time I attempted to implement it, but large groups of "enemies" were sitting next to each other. I am sure that most situations will have at least one pair of enemies sitting next to each other, but my solution was far from optimal. It actually looked as if I had just randomized it.
As far as my implementation went, I made N = 10, M = 10, and the number of items 99, and gave EACH item an array of size 99 with randomized Boolean values referring to the friendship with the corresponding item number. This means each item also had a friendship value corresponding to itself; I just ignored that value.
I plan on trying to reimplement this again later and I will post the code. Can anyone figure out a "good" way to do this to minimize seating clashes between enemies?
This problem is NP-Hard.
Define L = {(G, n, m) | there is a legal seating for G in an n × m matrix}, where (u, v) ∈ E if u is a friend of v. L is a formal definition of this problem as a language.
Proof:
We will show Hamiltonian-Path ≤p 2-path ≤p this-problem in 2 steps [Hamiltonian path and 2-path are defined below], and thus conclude that this problem is NP-Hard.
(1) We will show that finding two vertex-disjoint paths covering all vertices is NP-Hard [let's call such a pair of paths a 2-path, and this problem the 2-path problem].
A reduction from the Hamiltonian Path problem:
Input: a graph G = (V, E)
Output: a graph G' = (V', E) where V' = V ∪ {u₀}.
Correctness:
If G has a Hamiltonian path v₁→v₂→...→vₙ, then G' has a 2-path: v₁→v₂→...→vₙ together with the single vertex u₀.
If G' has a 2-path, then since u₀ is isolated from the rest of the vertices, there is a path v₁→...→vₙ covering V, which is Hamiltonian in G.
Thus G has a Hamiltonian path ⇔ G' has a 2-path, and therefore the 2-path problem is NP-Hard.
(2) We will now show that our problem [L] is also NP-Hard:
We will show a reduction from the 2-path problem defined above.
Input: a graph G = (V, E)
Output: (G, |V|+1, 1) [a long row with |V|+1 seats].
Correctness:
If G has a 2-path, then we can seat the people and use the one spare seat as a 'buffer' between the two paths. This is a legal, perfect seating, since if v₁ sits next to v₂, then v₁→v₂ is in the path, so (v₁, v₂) is in E and v₁, v₂ are friends.
If (G, |V|+1, 1) has a legal seating [v₁, ..., vₖ, buffer, vₖ₊₁, ..., vₙ], then there is a 2-path in G:
v₁→...→vₖ and vₖ₊₁→...→vₙ
Conclusion: this problem is NP-Hard, so no polynomial-time algorithm is known for it.
Exponential solution:
You might want to use a backtracking solution, which is basically: generate all subsets of E of size |V|-2 or less, and check which is best.
static best <- infinity
least_enemies(G, used):
    if |used| <= |V| - 2:
        val <- evaluate(used)
        best <- min(best, val)
    if |used| == |V| - 2:
        return
    for each edge e in E - used:   // E without used
        least_enemies(G, used + e)
Here we assume evaluate(used) gives the 'score' of a solution; if the solution is completely illegal [i.e., a vertex appears twice], then evaluate(used) = infinity. An optimization can of course be made by pruning these cases. To recover the actual seating, we can store the currently best solution.
(*) There are probably better solutions; this is just a simple possibility to start with. The main aim of this answer is proving that the problem is NP-Hard.
EDIT: simpler solution:
Create a graph G' = (V ∪ {u₀}, E ∪ {(u₀, v), (v, u₀) | for each v in V}) [u₀ is a junk vertex for the buffer] and a weight function for edges:
w((u, v)) = 1 if u is a friend of v
w((u, v)) = 2 if u is an enemy of v
w((u₀, v)) = w((v, u₀)) = 0
Now you've got yourself a classic TSP, which can be solved in O(|V|² · 2^|V|) time using dynamic programming.
Note that this solution [using TSP] is for a one-row theater, but it might be a good lead toward a solution for the general case.
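To make the TSP route concrete, here is a small Held–Karp sketch for this reduction (my own code, not part of the answer; the friend matrix is an assumed input, and this is only practical for small n). The buffer vertex u₀ is free to sit next to, each friendly adjacency costs 1 and each hostile one costs 2, so the cheapest tour corresponds to the row seating with the fewest enemy pairs:

from itertools import combinations

def best_row_seating_cost(friend):
    # friend[u][v] == True iff u and v are friends; people are 0..n-1,
    # vertex n plays the role of the buffer u0 (weight 0 to everyone).
    n = len(friend)
    def w(u, v):
        if u == n or v == n:
            return 0
        return 1 if friend[u][v] else 2

    # dp[(S, v)]: cheapest path that leaves the buffer, visits exactly the
    # people in S, and currently ends at v (Held-Karp over subsets).
    dp = {(frozenset([v]), v): w(n, v) for v in range(n)}
    for size in range(2, n + 1):
        for S in map(frozenset, combinations(range(n), size)):
            for v in S:
                dp[(S, v)] = min(dp[(S - {v}, u)] + w(u, v)
                                 for u in S if u != v)
    full = frozenset(range(n))
    # Returning to the buffer is free, so the tour cost equals the path cost;
    # cost = (n - 1) + number of enemy adjacencies in the best row.
    return min(dp[(full, v)] for v in range(n))

friend = [[False, True, True], [True, False, False], [True, False, False]]
print(best_row_seating_cost(friend))  # best row is 1-0-2: cost 2, no enemy pairs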
One algorithm used for large "search spaces" such as this is simulated annealing.

Dynamic programming question

I am stuck on one of my algorithms homework problems. Can anyone give me a hint to solve it? Here is the question:
Consider a chain-structured computation represented by a weighted graph G = (V, E) where V = {v1, v2, ..., vn} and E = {(vi, vi+1) : 1 <= i <= n-1}. We are also given a chain of m identical processors P = {P1, ..., Pm} (i.e., there exists a communication link between Pk and Pk+1 for 1 <= k <= m-1).
The set of vertices V represents computation modules, and the set of edges E represents communication between modules. Each node vi is assigned a weight wi denoting the execution time of the module on a single processor. Each edge (vi, vi+1) is assigned a weight ci denoting the communication time between the two modules if they are assigned to different processors. If multiple modules are assigned to the same processor, they must be consecutive. Suppose modules va, va+1, ..., vb are assigned to processor Pk. Then the time taken by Pk, denoted Tk, is the time to compute the assigned modules plus the time to communicate with the neighboring processors; hence Tk = wa + ... + wb + ca-1 + cb. Note here that ca-1 = 0 if a = 1 and cb = 0 if b = n.
The objective of the problem is to find an assignment of V to P such that max_{1<=k<=m} Tk is minimized, where we assume that each processor must take at least one module. (This assumption can be relaxed by adding m dummy modules with zero computation and communication time.)
Develop a dynamic programming algorithm to solve this problem in polynomial time (i.e., O(mn)).
I tried to find the minimum execution time for each Pk and then take the max, but I doubt my solution is dynamic programming, since there is no recurrence. Please give me some hints!
Thanks!
I think you might be able to modify the Viterbi algorithm to solve this problem.
OK, this is easy.
Decompose your problem into a function you need to minimize, say F(n, k), which gives the minimum makespan of assigning the first n nodes to the first k processors.
Then derive the formula by choosing how many nodes the k-th processor gets (say nodes i+1 through n):
F(n, k) = min[i = k-1 .. n-1]( max(F(i, k-1), w[i+1] + ... + w[n] + c[i] + c[n]) )
c[0] = 0, c[n] = 0
F(0, 0) = 0
F(i, 0) = infinity for i > 0
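Here is a runnable sketch of that recurrence (my own Python, with the indexing made consistent as above; prefix sums make each candidate cost O(1), for O(m·n²) overall — the problem asks for O(mn), and tightening the inner minimization is left out of this sketch):

def min_makespan(w, c, m):
    # Assign modules 1..n (0-indexed weights w[0..n-1]) to m processors in
    # consecutive blocks; c[i-1] is the cost of edge (v_i, v_{i+1}), so c has
    # n-1 entries. A sketch of the recurrence above.
    n = len(w)
    INF = float('inf')
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + w[i]

    def block(i, j):
        # time of one processor holding modules i+1 .. j
        left = c[i - 1] if i > 0 else 0    # c_{a-1} = 0 when a = 1
        right = c[j - 1] if j < n else 0   # c_b = 0 when b = n
        return prefix[j] - prefix[i] + left + right

    # F[k][j] = best makespan assigning modules 1..j to processors 1..k
    F = [[INF] * (n + 1) for _ in range(m + 1)]
    F[0][0] = 0
    for k in range(1, m + 1):
        for j in range(k, n + 1):          # each processor gets >= 1 module
            F[k][j] = min(max(F[k - 1][i], block(i, j))
                          for i in range(k - 1, j))
    return F[m][n]

print(min_makespan([2, 3, 4, 5], [1, 1, 1], 2))   # -> 10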

Paths in complete graph

I have a friend that needs to compute the following:
In the complete graph K_n (n <= 13), there are n*(n-1)/2 edges.
Each edge can be directed in 2 ways, hence 2^[n*(n-1)/2] different cases.
She needs to compute P[A !-> B && C !-> D] - P[A !-> B]*P[C !-> D]
X !-> Y means "there is no path from X to Y", and P[ ] is the probability.
So the brute-force algorithm is to examine every one of the 2^[n*(n-1)/2] different graphs, and since they are complete, in each graph one only needs to consider one choice of A, B, C, D, because of symmetry.
P[A !-> B] is then computed as the number of graphs with no path from node 1 to node 2, divided by the total number of graphs, i.e. 2^[n*(n-1)/2].
The brute-force method works in Mathematica up to K8, but she needs K9, K10, ..., up to K13.
We obviously don't need to find the shortest path in each case; we just want to know whether a path exists.
Anyone have optimization suggestions? (This sounds like a typical Project Euler problem.)
Example:
The minimal graph, K4, has 4 vertices, giving 6 edges. Hence there are 2^6 = 64 possible ways to assign directions to the edges, if we label the 4 vertices A, B, C and D.
In some graphs there is NO path from A to B (say X of them), and in some others there is no path from C to D (say Y of them). But in some graphs there is no path from A to B and, at the same time, no path from C to D. Call these W.
So P[A !-> B]=X/64, P[C !-> D]=Y/64 and P[A !-> B && C !-> D] = W/64.
Update:
A, B, C and D are 4 different vertices, hence we need at least K4.
Observe that we are dealing with DIRECTED graphs, so the usual representation with upper-triangular matrices won't suffice.
There is a function in Mathematica that finds the distance between nodes in a directed graph (if it returns infinity, there is no path), but this is a little overkill, since we don't need the distance, just whether a path exists.
I have a theory, but I don't have mathematica to test it with, so here goes. (And please excuse my mistakes in terminology, I'm not really familiar with graph theory.)
I agree that there are 2^(n*(n-1)/2) different directed Kn graphs. The question is how many of those contain a path A->B. Call that number S(n).
Suppose we know S(n) for some n, and we want to add another node, X, and calculate S(n+1). We will look for paths X->A.
There are 2^n ways to connect X to the preexisting graph.
The edge X-A might point in the "right" direction (X->A); there are 2^(n-1) ways to connect X this way, and it will lead to a path for any of the 2^(n*(n-1)/2) different Kn graphs.
If X-A points to X, try the edge X-B. If X-B points to B (and there are 2^(n-2) such ways to connect X), then some Kn graphs will give a path B→A; S(n) of them, in fact.
If X-B points to X, try X-C; there are 2^(n-3)·S(n) successful graphs there.
If my math is correct, S(n+1) = 2^((n+2)(n-1)/2) + (2^(n-1)-1)S(n)
So this gives the following:
S(2) = 1
S(3) = 5
S(4) = 47
S(5) = 841
S(6) = 28999
Can someone check this? Or give a closed form for S(n)?
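One way to check is brute force for small n; here is a sketch (my own code, not from the thread) that enumerates all 2^(n(n-1)/2) orientations of Kn and counts those containing a directed path from vertex 0 to vertex 1:

from itertools import product

def S(n):
    # Count orientations of K_n that contain a directed path 0 -> 1.
    # Brute force, only feasible for small n (a sketch for checking).
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    count = 0
    for bits in product([0, 1], repeat=len(pairs)):
        adj = [[] for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            if b:
                adj[i].append(j)
            else:
                adj[j].append(i)
        # DFS from 0, looking for vertex 1
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        count += 1 in seen
    return count

for n in range(2, 6):
    print(n, S(n))   # compare against the S(n) values claimed above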
EDIT:
I see now that the hard part is P[A !-> B && C !-> D]. But I think the recursion approach will still work: start with {A, B, C, D}, then keep adding points, keeping track of the number of graphs in which A → (a points), (b points) → B, C → (c points) and (d points) → D, maintaining the desired constraint. Ugly, but tractable.
The brute-force approach of considering all graphs will not get you much further; you'll have to consider more than one graph at a time.
For 8 you have 2^28 ~ 256 million graphs.
9: 2^36 ~ 64 billion
10: 2^45 ~ 32 trillion
11: 2^55 > 10^16
12: 2^66 > 10^19
13: 2^78 > 10^23
For the purpose of finding paths the interesting part is the partial ordering on the strongly connected components of the graph. Actually the ordering must be total, because there is an edge between any two nodes.
So you could try to consider total orderings, there are certainly a lot fewer than graphs.
I think that representing the graph using a matrix will be very helpful.
If A !-> B, put 0 in row A, column B.
Put 1 everywhere else.
Count the number of 0s; call it Z.
Then P[A !-> B] = 1 / 2^Z
=> P[A !-> B && C !-> D] - P[A !-> B]·P[C !-> D] = 1/2^2 - 1/2^(X-2) // something wrong here, I'm fixing it
where X = k(k-1)/2
  A B C D
A . 0 1 1
B . . 1 1
C . . . 1
D . . . .
NOTE: We can use the upper triangle without loss of generality.

Algorithm for multidimensional optimization / root-finding / something

I have five values, A, B, C, D and E.
Given the constraint A + B + C + D + E = 1, and five functions F(A), F(B), F(C), F(D), F(E), I need to solve for A through E such that F(A) = F(B) = F(C) = F(D) = F(E).
What's the best algorithm/approach to use for this? I don't care if I have to write it myself, I would just like to know where to look.
EDIT: These are nonlinear functions. Beyond that, they can't be characterized. Some of them may eventually be interpolated from a table of data.
There is no general answer to this question. A solver that finds the solution to an arbitrary equation does not exist. As Lance Roberts already says, you have to know more about the functions. Just a few examples:
If the functions are twice differentiable and you can compute the first derivative, you might try a variant of Newton–Raphson.
Have a look at the Lagrange multiplier method for implementing the constraint.
If the function F is continuous (which it probably is, if it is an interpolant), you could also try the bisection method, which is a lot like binary search.
Before you can solve the problem, you really need to know more about the function you're studying.
As others have already posted, we do need some more information on the functions. However, given that, we can still try to solve the following relaxation with a standard non-linear programming toolbox.
min k
s.t.
A + B + C + D + E = 1
F1(A) - k = 0
F2(B) - k = 0
F3(C) - k = 0
F4(D) - k = 0
F5(E) - k = 0
Now we can solve this in any manner we wish, such as a penalty method:
min k + mu * sum_i (Fi(x_i) - k)^2
s.t.
A + B + C + D + E = 1
or a straightforward SQP or interior-point method.
More details and I can help advise as to a good method.
m
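As one concrete illustration of the relaxation above (my own sketch, not part of the answer): feed it to SciPy's SLSQP with the equality constraint kept exact and the k term folded into a least-squares objective, which vanishes exactly when all the Fi agree. The Fi here are stand-in monotone functions:

import numpy as np
from scipy.optimize import minimize

# Stand-in monotone functions; replace with the real (possibly interpolated) Fi.
fs = [np.exp, np.sinh, lambda x: x**3 + x, np.arctan, lambda x: 2.0 * x]

def objective(z):
    x, k = z[:5], z[5]          # z = (A, B, C, D, E, k)
    return sum((f(xi) - k) ** 2 for f, xi in zip(fs, x))

constraints = ({'type': 'eq', 'fun': lambda z: z[:5].sum() - 1.0},)
z0 = np.array([0.2] * 5 + [0.5])    # start at A = ... = E = 1/5
res = minimize(objective, z0, method='SLSQP', constraints=constraints)
print(res.x[:5], res.x[5])          # the x_i and the common value k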
The functions are all monotonically increasing with their argument. Beyond that, they can't be characterized. The approach that worked turned out to be:
1) Start with A = B = C = D = E = 1/5
2) Compute F1(A) through F5(E); average the five values, and recalculate A through E so that each function equals that average.
3) Rescale the new A through E so that they all sum to 1, and recompute F1 through F5.
4) Repeat until satisfied.
It converges surprisingly fast: just a few iterations. Of course, each iteration requires 5 root-finds for step 2.
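In code, that iteration might look like this (my own reconstruction of the steps above; the bracketed brentq root-finding and the stand-in Fi are assumptions, relying on each Fi being monotone so the roots are unique, and convergence depends on the actual Fi):

import numpy as np
from scipy.optimize import brentq

fs = [np.exp, np.sinh, lambda x: x**3 + x, np.arctan, lambda x: 2.0 * x]

def equalize(fs, tol=1e-9, max_iter=100):
    x = np.full(5, 0.2)                                    # step 1: all 1/5
    for _ in range(max_iter):
        target = np.mean([f(xi) for f, xi in zip(fs, x)])  # step 2: the average
        # step 2, continued: one root-find per function, Fi(x_i) = target
        x = np.array([brentq(lambda t, f=f: f(t) - target, -10.0, 10.0)
                      for f in fs])
        x /= x.sum()                                       # step 3: rescale to sum 1
        vals = [f(xi) for f, xi in zip(fs, x)]             # step 3: recompute
        if max(vals) - min(vals) < tol:                    # step 4: until satisfied
            break
    return x

print(equalize(fs))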
One solution of the equations
A + B + C + D + E = 1
F(A) = F(B) = F(C) = F(D) = F(E)
is to take A, B, C, D and E all equal to 1/5. Not sure though whether that is what you want ...
Added after John's comment (thanks!)
Assuming the second equation should read F1(A) = F2(B) = F3(C) = F4(D) = F5(E), I'd use the Newton-Raphson method (see Martijn's answer). You can eliminate one variable by setting E = 1 - A - B - C - D. At every step of the iteration you need to solve a 4x4 system. The biggest problem is probably where to start the iteration. One possibility is to start at a random point, do some iterations, and if you're not getting anywhere, pick another random point and start again.
Keep in mind that if you really don't know anything about the function then there need not be a solution.
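And a sketch of that Newton variant (again my own reconstruction, under the same assumption about the Fi; derivatives are taken numerically, and the 4×4 system mentioned above shows up as the Jacobian solve after eliminating E = 1 - A - B - C - D). As the answer says, a random restart may be needed if it stalls:

import numpy as np

fs = [np.exp, np.sinh, lambda x: x**3 + x, np.arctan, lambda x: 2.0 * x]

def dnum(f, x, h=1e-6):
    # central-difference derivative; fine for smooth or interpolated Fi
    return (f(x + h) - f(x - h)) / (2.0 * h)

def newton(fs, x0=None, tol=1e-12, max_iter=50):
    x = np.full(4, 0.2) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        E = 1.0 - x.sum()
        g = np.array([fs[i](x[i]) - fs[4](E) for i in range(4)])  # residuals
        if np.abs(g).max() < tol:
            break
        # Jacobian: diag(Fi'(x_i)) plus F5'(E) in every entry, since dE/dx_j = -1
        J = np.diag([dnum(fs[i], x[i]) for i in range(4)]) + dnum(fs[4], E)
        x = x - np.linalg.solve(J, g)   # the 4x4 solve per iteration
    return np.append(x, 1.0 - x.sum())

print(newton(fs))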
ALGENCAN (part of TANGO) is really nice. There are Python bindings, too.
http://www.ime.usp.br/~egbirgin/tango/codes.php - " general nonlinear programming that does not use matrix manipulations at all and, so, is able to solve extremely large problems with moderate computer time. The general algorithm is of Augmented Lagrangian type ... "
http://pypi.python.org/pypi/TANGO%20Project%20-%20ALGENCAN/1.0
Google OPTIF9 or ALLUNC. We use these for general optimization.
You could use a standard search technique, as the others mentioned. There are a few optimizations you could make use of while doing the search.
First of all, you only need to solve for A, B, C and D, because E = 1 - A - B - C - D.
Second, if F(A) = F(B) = F(C) = F(D), then you can search over A alone. Once you have F(A), you can solve for B, C and D, if that is possible. If it is not possible to invert the functions, you will need to keep searching over each variable, but now you have a limited range to search, because A + B + C + D <= 1.
If your search is discrete and finite, the above optimizations should work reasonably well.
I would try Particle Swarm Optimization first. It is very easy to implement and tweak. See the Wikipedia page for it.
