Reducing Graph Reachability to SAT (CNF) - algorithm

So I came across this problem in my textbook. I was wondering how to develop a reduction from the Graph Reachability problem to SAT (CNF) problem. (i.e. formula is satisfiable iff there exists a path in graph G from start to end node)
1) I can't wrap my head around how to go from something that can be solved in polynomial time (Graph Reachability) to something that is NP (SAT).
2) I can't seem to find a way to formulate these nodes/edges of Graph into actual clauses in CNF that correspond to reachability.
I tried to think about algorithms like Floyd-Warshall that determine if a path exists from start to end node but I can't seem to formulate that idea into actual CNF clauses. Help would be much appreciated!

It probably wouldn't be too hard to come up with the kind of answer you're expecting, but here's the real answer instead:
"Reducing" a problem X to problem Y means transforming any instance of X to an instance of Y such that the answer to Y provides the answer to X. Usually, we require a P-time reduction, i.e., the transformation of the problem and the extraction of the answer must both happen in polynomial time.
Graph Reachability is easily solved in linear time, which is certainly polynomial time, so the reduction from Graph Reachability to SAT is very simple:
Given a graph reachability problem, solve it in linear time;
If the desired path exists, write out any satisfiable SAT instance, like (A). Otherwise, write out any unsatisfiable SAT instance like (A)&(~A)

We did something similar to your task a few years ago. Our approach was based exactly on Floyd-Warshall (F.-W.) algorithm.
Intuitively, you would like to something like this:
Generate all possible paths using F.-W. for each pair of nodes
Generate a clause representing each path. It could be described as "if a path is selected, then the following nodes must be selected"
Generate a clause that unites all paths into a single CNF. Most likely it would be "exactly_one" clause.
A bit more formally:
Assign a binary literal to each node in a graph. The literal has value True iff. it belongs to a path between two nodes.
Run F.-W. for a pair of nodes
Turn resulting path to a clause:
nodes <- get_nodes_from_path(path)
node_lits <- logical_and([n.literal for n in nodes])
Get new literal for a path path_lit <- get_new_literal()
Add it to path a path: path_clause <- if_then_else(node_lits, path_lit)
Go to 2, enumerate all pairs
Finally, you could the following:
all_paths <- exactly_one(all_path_clauses)
all_paths <- True
SAT solver would be forced to select one of paths and this would lead to selecting corresponding nodes.

With respect to your first question: Since you're only devising a way to reduce a problem in P into a problem in NP (and not the other way around), this isn't actually a problem. You can turn any Graph Reachability problem into a SAT problem, but that doesn't mean you can turn any SAT problem into a Graph Reachability problem.

Related

Algorithms for Deducing a Timeline / Chronology

I'm looking for leads on algorithms to deduce the timeline/chronology of a series of novels. I've split the texts into days and created a database of relationships between them, e.g.: X is a month before Y, Y and Z are consecutive, date of Z is known, X is on a Tuesday, etc. There is uncertainty ('month' really only means roughly 30 days) and also contradictions. I can mark some relationships as more reliable than others to help resolve ambiguity and contradictions.
What kind of algorithms exist to deduce a best-fit chronology from this kind of data, assigning a highest-probability date to each day? At least time is 1-dimensional but dealing with a complex relationship graph with inconsistencies seems non-trivial. I have a CS background so I can code something up but some idea about the names of applicable algorithms would be helpful. I guess what I have is a graph with days as nodes as relationships as edges.
A simple, crude first approximation to your problem would be to store information like "A happened before B" in a directed graph with edges like "A -> B". Test the graph to see whether it is a Directed Acyclic Graph (DAG). If it is, the information is consistent in the sense that there is a consistent chronology of what happened before what else. You can get a sample linear chronology by printing a "topological sort" (topsort) of the DAG. If events C and D happened simultaneously or there is no information to say which came before the other, they might appear in the topsort as ABCD or ABDC. You can even get the topsort algorithm to print all possibilities (so both ABCD and ABDC) for further analysis using more detailed information.
If the graph you obtain is not a DAG, you can use an algorithm like Tarjan's algorithm to quickly identify "strongly connected components", which are areas of the graph which contain chronological contradictions in the form of cycles. You could then analyze them more closely to determine which less reliable edges might be removed to resolve contradictions. Another way to identify edges to remove to eliminate cycles is to search for "minimum feedback arc sets". That's NP-hard in general but if your strongly connected components are small the search could be feasible.
Constraint programming is what you need. In propagation-based CP, you alternate between (a) making a decision at the current choice point in the search tree and (b) propagating the consequences of that decision as far as you can. Notionally you do this by maintaining a domain D of possible values for each problem variable x such that D(x) is the set of values for x which have not yet been ruled out along the current search path. In your problem, you might be able to reduce it to a large set of Boolean variables, x_ij, where x_ij is true iff event i precedes event j. Initially D(x) = {true, false} for all variables. A decision is simply reducing the domain of an undecided variable (for a Boolean variable this means reducing its domain to a single value, true or false, which is the same as an assignment). If at any point along a search path D(x) becomes empty for any x, you have reached a dead-end and have to backtrack.
If you're smart, you will try to learn from each failure and also retreat as far back up the search tree as required to avoid redundant search (this is called backjumping -- for example, if you identify that the dead-end you reached at level 7 was caused by the choice you made at level 3, there's no point in backtracking just to level 6 because no solution exists in this subtree given the choice you made at level 3!).
Now, given you have different degrees of confidence in your data, you actually have an optimisation problem. That is, you're not just looking for a solution that satisfies all the constraints that must be true, but one which also best satisfies the other "soft" constraints according to the degree of trust you have in them. What you need to do here is decide on an objective function assigning a score to a given set of satisfied/violated partial constraints. You then want to prune your search whenever you find the current search path cannot improve on the best previously found solution.
If you do decide to go for the Boolean approach, you could profitably look into SAT solvers, which tear through these kinds of problems. But the first place I'd look is at MiniZinc, a CP language which maps on to a whole variety of state of the art constraint solvers.
Best of luck!

Proving breadth-first traversal on graphs

I am trying to prove the following algorithm to see if a there exists a path from u to v in a graph G = (V,E).
I know that to finish up the proof, I need to prove termination, the invariants, and correctness but I have no idea how. I think I need to use induction on the while loop but I am not exactly sure how.
How do I prove those three characteristics about an algorithm?
Disclaimer: I don't know how much formal you want your proof to be and I'm not familiar with formal proofs.
induction on the while loop: Is it true at the beginning? Does it remain true after a step (quite simple path property)?
same idea, induction on k (why k+1???): Is it true at the beginning? Does it remain true after a step (quite simple path property)?
Think Reach as a strictly increasing set.
Termination: maybe you can use a quite simple property linked to the diameter of the graph?
(This question could probably be better answered elsewhere, on https://cstheory.stackexchange.com/ maybe?)
There is a lot of possibilities. For example, for a Breadth First Search, we note that:
(1) The algorithm never visits the same node twice.(as any path back must be >= the length that put it in the discovered pile already.
(2) At every step, it adds exactly one node.
Thus, it clearly must terminate on any finite graph, as the set of nodes which are discoverable cannot be larger than the set of nodes which are in the graph.
Finally, since, give a start node, it will only terminate when it has reached every node which is connected by any path to the start node, it will always find a path between the start and target if it exists.
You can rewrite these logical steps above in deeper rigour if you like, for example, by showing that the list of visited nodes is strictly increasing, and non convergent (i.e. adding one to something repeatedly tends to infinity) and the termination condition must be met at at most some finite value, and a non convergent increasing function always intersects a given bound exactly once.
BFS is an easy example because it has such simple logic, but proving these things for a given algorithm may be extremely tricky.

SAT/CNF optimization

Problem
I'm looking at a special subset of SAT optimization problem. For those not familiar with SAT and related topics, here's the related Wikipedia article.
TRUE=(a OR b OR c OR d) AND (a OR f) AND ...
There are no NOTs and it's in conjunctive normal form. This is easily solvable. However I'm trying to minimize the number of true assignments to make the whole statement true. I couldn't find a way to solve that problem.
Possible solutions
I came up with the following ways to solve it:
Convert to a directed graph and search the minimum spanning tree, spanning only a subset of vertices. There's Edmond's algorithm but that gives a MST for the complete graph instead of a subset of the vertices.
Maybe there's a version of Edmond's algorithm that solves the problem for a subset of the vertices?
Maybe there's a way to construct a graph out of the original problem that's solvable with other algorithms?
Use a SAT solver, a LIP solver or exhaustive search. I'm not interested in those solutions as I'm trying to use this problem as lecture material.
Question
Do you have any ideas/comments? Can you come up with other approaches that might work?
This problem is NP-Hard as well.
One can show an east reduction from Hitting Set:
Hitting Set problem: Given sets S1,S2,...,Sn and a number k: chose set S of size k, such that for every Si there is an element s in S such that s is in Si. [alternative definition: the intersection between each Si and S is not empty].
Reduction:
for an instance (S1,...,Sn,k) of hitting set, construct the instance of your problem: (S'1 AND S'2 And ... S'n,k) where S'i is all elements in Si, with OR. These elements in S'i are variables in the formula.
proof:
Hitting Set -> This problem: If there is an instance of hittins set, S then by assigning all of S's elements with true, the formula is satisfied with k elements, since for every S'i there is some variable v which is in S and Si and thus also in S'i.
This problem -> Hitting set: build S with all elements whom assigment is true [same idea as Hitting Set->This problem].
Since you are looking for the optimization problem for this, it is also NP-Hard, and if you are looking for an exact solution - you should try an exponential algorithm

solving the Longest-Path-Length. Is my solution correct?

This is the question [From CLRS]:
Define the optimization problem LONGEST-PATH-LENGTH as the relation that
associates each instance of an undirected graph and two vertices with the number
of edges in a longest simple path between the two vertices. Define the decision
problem LONGEST-PATH = {: G=(V,E) is an undirected
graph, u,v contained in V, k >= 0 is an integer, and there exists a simple path
from u to v in G consisting of at least k edges}. Show that the optimization problem
LONGEST-PATH-LENGTH can be solved in polynomial time if and only if
LONGEST-PATH is contained in P.
My solution:
Given an algorith A, that can solve G(u,v) in polytime, so we run the A on G(u,v) if it returns 'YES" and k' such that k' is the longest path in G(u,v), now all we have to do it compare if
k =< k'
if then the longest path length is solved. If we recieve "NO" or k>=k', then there exists no solution.
so polytime to run A + constant for comparsion, then to find the longest path length it takes poly time. Also this is only possible since G(u,v) runs in Polytime (in P), thus G(u,v,k) runs also in polytime (in P), therefore since longest path can be reduced to longest-path-length, then longest-path-length is in P.
we can solve it the oposite way, what we do is, run G(u,v,k') for k'=0 to n, every time check if the k==k', is so we solved it.
run time analysis for this:
n*polytime+ n*(constant comparsion)=polytime
Can someone tell me if my answer is reasonable? if not please tell me where i've gone wrong
Also can you give me some advice to how to study algorithms ,and what approch i should take to solve a algorith question (or a graph question)
please and thankyou
Your answer is reasonable but I would try to shore it up a little bit formally (format the cases separately in a clear manner, be more precise about what polynomial time means, that kind of stuff...)
The only thing that I would like to point out is that in your second reduction (showing the decision problem solves the optimization problem) the for k=0 to N solution is not general. Polynomial time is determined in relation to the length of input so in problems where N is a general number (such as weight or something) instead of a number of a count of items from the input (as in this case) you need to use a more advanced binary search to be sure.

Matching algorithm

Odd question here not really code but logic,hope its ok to post it here,here it is
I have a data structure that can be thought of as a graph.
Each node can support many links but is limited to a value for each node.
All links are bidirectional. and each link has a cost. the cost depends on euclidian difference between the nodes the minimum value of two parameters in each node. and a global modifier.
i wish to find the maximum cost for the graph.
wondering if there was a clever way to find such a matching, rather than going through in brute force ...which is ugly... and i'm not sure how i'd even do that without spending 7 million years running it.
To clarify:
Global variable = T
many nodes N each have E,X,Y,L
L is the max number of links each node can have.
cost of link A,B = Sqrt( min([a].e | [b].e) ) x
( 1 + Sqrt( sqrt(sqr([a].x-[b].x)+sqr([a].y-[b].y)))/75 + Sqrt(t)/10 )
total cost =sum all links.....and we wish to maximize this.
average values for nodes is 40-50 can range to (20..600)
average node linking factor is 3 range 0-10.
For the sake of completeness for anybody else that looks at this article, i would suggest revisiting your graph theory algorithms:
Dijkstra
Astar
Greedy
Depth / Breadth First
Even dynamic programming (in some situations)
ect. ect.
In there somewhere is the correct solution for your problem. I would suggest looking at Dijkstra first.
I hope this helps someone.
If I understand the problem correctly, there is likely no polynomial solution. Therefore I would implement the following algorithm:
Find some solution by beng greedy. To do that, you sort all edges by cost and then go through them starting with the highest, adding an edge to your graph while possible, and skipping when the node can't accept more edges.
Look at your edges and try to change them to archive higher cost by using a heuristics. The first that comes to my mind: you cycle through all 4-tuples of nodes (A,B,C,D) and if your current graph has edges AB, CD but AC, BD would be better, then you make the change.
Optionally the same thing with 6-tuples, or other genetic algorithms (they are called that way because they work by mutations).
This is equivalent to the traveling salesman problem (and is therefore NP-Complete) since if you could solve this problem efficiently, you could solve TSP simply by replacing each cost with its reciprocal.
This means you can't solve exactly. On the other hand, it means that you can do exactly as I said (replace each cost with its reciprocal) and then use any of the known TSP approximation methods on this problem.
Seems like a max flow problem to me.
Is it possible that by greedily selecting the next most expensive option from any given start point (omitting jumps to visited nodes) and stopping once all nodes are visited? If you get to a dead end backtrack to the previous spot where you are not at a dead end and greedily select. It would require some work and probably something like a stack to keep your paths in. I think this would work quite effectively provided the costs are well ordered and non negative.
Use Genetic Algorithms. They are designed to solve the problem you state rapidly reducing time complexity. Check for AI library in your language of choice.

Resources