Proving NP complexity - algorithm

I'm learning how to prove something is NP. In Thomas Cormen's intro to algorithm book, he states something is NP if given a solution to some problem, you can verify it is correct in polynomial time.
Say the problem is 2x + 9 = 55, and let's pretend I don't know how long it takes the find the correct x value, but an algorithm to solve the problem returned the solution 23. Then to show it is NP, do I only need to plug 23 in back in the equation, and see if that took a polynomial time and gave me 55? Thanks.

Yes; if you can check the correctness of every correct and every incorrect answer for every instance of this problem in polynomial time, then the problem is in NP.

Adding information to #Mehrdad answer:
Note that NP stands for Nondeterministic Polynomial time - it means that by definition - a problem is in NP if and only if it can be solved polynomially by a Non-deterministic Turing Machine.
It is equivalent to saying that in the standard computation model (RAM machine/ deterministic turing machine) - you can verify an answer in polynomial time (like #Mehrdad answered). The proof for the equivalence is described in the wikipedia page for equivalence of definitions
Bonus:
The question of "is everything that is verifiable (polynomially) on turing machine is also solveable polynomially" and the questions "is everything that is solveable on non-deterministic turing machine polynomially also solveable on deterministic turing machine polynomially" are also equivalent.
The answer is yet unknown and the problem is known as the P vs NP problem, which is the most important open question on computer science.

While the problems above cover the last, and perhaps, most identifiable step of NP-ness, there are 3 basic steps to showing that a problem is in NP.
Decision Problem: Can you turn your problem into a decision problem? In the case of the equation problem,the decision problem would be "is there an x such that 2x+9=55?" ?
Certificate: Can you identify an answer to your decision question? Again, in the case of the equation problem, an answer might be x = 23
Verification: Can you verify a certificate in polynomial time. By verifying in polynomial time, you know that the problem is not NP-Hard. Some verification steps might be: is x a number? is X equal to one half of 55-9?
That is how you know that your problem is completely in NP.

Related

Cook's theorem and NP complete reductions

Based on Cook's theorem,
Any NP problem can be converted to SAT in polynomial time
I know that SAT is a NP-complete problem. Therefore, is it accurate to say:
If we can reduce a search problem A (which is in NP) to problem B in a polynomial number of steps, then problem B must be NP-complete?
You are on the right track, but there is a little bit more that must be done to show that problem A is in fact NP hard. If you have already proven that problem A is in NP (reword as a decision problem, describe a yes certificate, and show that it can be verified in polynomial time), then what you must do is show that if you were to hypothetically find an algorithm that solves problem A in polynomial time, then that algorithm could be used to solve any SAT problem in polynomial time as well.
This shows that your problem requires you to solve SAT (as well as other possible inputs for problem A) in polynomial time, and since SAT has yet to be solved in polynomial time, you can explain to whoever is asking you to solve the problem that this is an unreasonable request. To show this, find a way to convert SAT into input for your problem A (think how to transform the edges and vertices into the input of problem A).
Now, show that this transformation from SAT to problem A is done in polynomial time, and then show that the answer from problem A can be transformed back into an answer for SAT (again, in polynomial time). Lastly, make sure to explain that an answer to problem A is equivalent to an answer to SAT (the answer to SAT is correct IFF the answer to problem A is correct).
For all of these steps, treat the hypothetical algorithm for problem A as a black box that magically solves the problem in polynomial time.

How do we know NP-complete problems are the hardest in NP?

I get that if you can do a polynomial time reduction from "every" problem then it proves that the problem is at least as hard as every problem in NP. Except, how do we know that we've discovered every problem in NP? Can't there exist problems that we may not have discovered or proven exist in NP but CANNOT be reduced to any np-complete problem? Or is this still an open question?
As others have correctly stated, the existence of the problem that is NP, but is not NP-complete would imply that P != NP, so finding one would bring you a million dollar and eternal glory. One famous problem that is believed to belong in this class is integer factorization. However, your original question was
Can't there exist problems that we may not have discovered or proven
exist in NP but CANNOT be reduced to any np-complete problem?
The answer is no. By definition of NP-completeness, one of two
necessary conditions for a problem A to be NP-complete is that every NP problem needs to be reducible in polynomial time to A. If you want to find out how to prove that every single NP problem can be reducible in polynomial time to some NP-complete problem, have a look at the proof of Cook-Levin theorem that states that 3-SAT problem is NP-complete. It was the first proven NP-complete problem and many other NP-complete problems are later proven to be NP-complete by finding the appropriate reduction from 3-SAT to these problems.
NP consists of all problems that could (theoretically) be solved by being able to make lucky guesses, guessing the solution and checking in polynomial time that the solution is correct. For example, the travelling salesman problem "can I visit the capitols of all 50 states of the USA with a trip of less than 9,825 miles" can be solved by guessing a trip and checking that it is not too long.
And one problem in NP is basically simulating a programmable computer circuit with various inputs and checking whether a certain output can be achieved. And that programmable computer circuit is powerful enough to solve all problems in NP.
So yes, we know all about all problems in NP.
(Then of course an NP complete problem can by definition be used to solve any problem in NP. If there is a problem that it cannot solve, that problem is not in NP).
Except, how do we know that we've discovered every problem in NP?
We don't. The set of all problems in the universe is not only infinite, but uncountable.
Can't there exist problems that we may not have discovered or proven
exist in NP but CANNOT be reduced to any np-complete problem?
We don't know that. We suspect that this is the case, but this hasn't been proven yet. If we were to find a NP problem that is not in NP-Complete, it would be proof that P =/= NP.
It is one of the great unsolved problems in CS. Many brilliant minds have been taking a go at it, but this nut has been one tough one to crack.

NP complete - solvable in non-deterministic polynomial time

It is written in a book that --"If a problem A is NP-Complete, there exists a non-deterministic polynomial time algorithm to solve A" . But as far I know 'yes' -answer for NP complete problems can be "verified" in polynomial time. I am really confused. Can a NP-complete problem be "solved" using non-deterministic polynomial time algorithm?
The two things are basically identical and are based on two different though equivalent definition of NP.
Every problem (language) in NP must be:
Verified in polynomial time by a deterministic turing machine. (given a problem and a 'verification', you can answer if the verification is correct for the problem in a polynomial time).
Example: Given a graph, and you want to check if there is a hamiltonian path in it - the verifier can be the path. You can easily check if the path is indeed hamiltonian once you have it.
Solved in polynomial time by a non deterministic turing machine. (there exists non deterministic Turing Machine M that can solve the problem polynomially)
Since by definition of NP-Complete - a problem is NP Complete if it is NP-Hard AND in NP, every NP-Complete problem is also NP - and both are correct.
Note that those two claims are basically based on the two equivalent definitions for NP:
A language L is in NP if for each x in L there is a word z such that |z| is polynomial in |x|, and there exists some deterministic turing machine that runs in polynomial time M - such that for each x and its matching z,: M(x,z) = true if and only if x is in L
A problem is in NP if there is a non deterministic turing machine that can solve the problem in polynomia time. Formally, a language L is in NP if there is a non deterministic turing machine such that M(x) = true if and only if x is in L
Finding the solution to an NP-complete problem can be done in polynomial time on a non-deterministic Turing machine. Given a candidate solution of an NP-complete problem, it can be verified whether it is indeed a solution or not, i.e. checked, in polynomial time on a deterministic Turing machine.
So the difference is in finding a solution and checking a solution. The former usually requires some kind of search for NP-complete problems while the latter is just verifying the assignments to your variables.

What would a P=NP proof be like, hypothetically?

Would it be an polynomial time algorithm to a specific NP-complete problem, or just abstract reasonings that demonstrate solutions to NP-complete problems exist?
It seems that the a specific algoithm is much more helpful. With it, all we'll have to do to polynomially solve an NP problem is to convert it into the specific NP-complete problem for which the proof has a solution, and we are done.
P = NP: "The 3SAT problem is a classic NP complete problem. In this proof, we demonstrate an algorithm to solve it that has an asymptotic bound of (n^99 log log n). First we ..."
P != NP: "Assume there was a polynomial algorithm for the 3SAT problem. This would imply that .... which by ..... implies we can do .... and then ... and then ... which is impossible. This was all predicated on a polynomial time algorithm for 3SAT. Thus P != NP."
UPDATE: Perhaps something like this paper (for P != NP).
UPDATE 2: Here's a video of Michael Sipser sketching out a proof for P != NP
Call me pessimistic, but it will be like this:
...
∴, P ≠ NP
QED
There are some meta-results about what a P=NP or P≠NP proof can not look like. The details are quite technical, but it is known that the proof cannot be
relativizing, which kind of means that the proof must make use of the exact definition of Turing machine used, because with some modifications ("oracles", like very powerful CISC instructions added to the instruction set) P=NP, and with some other modifications, P≠NP. See also this blog post for a nice explanation of relativization.
natural, a property of several classic circuit complexity proofs,
or algebrizing, a generalization of relativizing.
It could take the form of demonstrating that assuming P ≠ NP leads to a contradiction.
It might not be connected to P and NP in a straightforward way... Many theorems now are based on P!=NP, so proving one assumed fact to be untrue would make a big difference. Even proving something like constant ratio approximation for TS should be enough IIRC. I think, existence of NPI (GI) and other sets is also based on P!=NP, so making any of them equal to P or NP might change the situation completely.
IMHO everything happens now on a very abstract level. If someone proves anything about P=/!=NP, it doesn't have to mention any of those sets or even a specific problem.
Probably it would be in the form of a reduction from an NP problem to a P problem. See the Wikipedia page on reductions.
OR
Like this proof proposed by Vinay Deolalikar.
The most straightforward way is to prove that there is a polynomial time solution to the problems in the class NP-complete. These are problems that are in NP and are reducable to one of the known np problem. That means you could give a faster algorithm to prove the original problem posed by Stephen Cook or many others which have also been shown to be NP-Complete. See Richard Karp's seminal paper and this book for more interesting problems. It has been shown that if you solve one of these problems the entire complexity class collapses. edit: I have to add that i was talking to my friend who is studying quantum computation. Although I had no clue what it means, he said that a certain proof/experiment? in the quantum world could make the entire complexity class, i mean the whole thing, moot. If anyone here knows more about this, please reply.
There have also been numerous attempts to the problem without giving a formal algorithm. You could try to count the set. Theres the Robert/Seymore proof. People have also tried to solve it using the tried and tested diagonlization proof(also used to show that there are problems that you can never solve). Razborov also showed that if there are certain one-way functions then any proof cannot give a resolution. That means that new techniques will be required in order to solve this question.
Its been 38 years since the original paper has been published and there still is no sign of a proof. Not only that but lot of problems that mathematicians had been posing before the notion of complexity classes came in has been shown to be NP. Therefor many mathematicians and computer scientists believe that some of the problems are so fundamental that a new kind of maths may be needed to solve the problem. You have to keep in mind that the best minds human race has to offer have tackled this problem without any success. I think it should be at least decades before somebody cracks the puzzle. But even if there is a polynomial time solution the constants or the exponent could be so large that it would be useless in our problems.
There is an excellent survey available which should answer most of your questions: http://www.scottaaronson.com/papers/pnp.pdf.
Certainly a descriptive proof is the most useful, but there are other categories of proof: it is possible, for example, to provide 'existence proofs' that demonstrate that it is possible to find an answer without finding (or, sometimes, even suggesting how to find) that answer.
Set N equal to the multiplicative identity. Then NP = P. QED. ;-)
It would likely look almost precisely like one of these
Good question; it could take either form. Obviously, the specific algorithm would be more helpful, yes, but there's no determining that that would be the way that a theoretical P=NP proof would occur. Given that the nature of NP-complete problems and how common they are, it would seem that more effort has been put into solving those problems than has been put into solving the theoretical reasoning side of the equation, but that's just supposition.
Any nonconstructive proof that P=NP really is not. It would imply that the following explicit 3-SAT algorithm runs in polynomial time:
Enumerate all programs. On round i, run all programs numbered
less than i for one step. If
a program terminates with a
satisfying input to the formula, return true. If a program
terminates with a formal proof that
no such input exists, return
false.
If P=NP, then there exists a program which runs in O(poly(N)) and outputs a satisfying input to the formula, if such a formula exists.
If P=coNP, there exists a program which runs in O(poly(N)) and outputs a formal proof that no formula exists, if no formula exists.
If P=NP, then since P is closed under complement NP=coNP. So, there exists a program which runs in O(poly(N)) and does both. That program is the k'th program in the enumeration. k is O(1)! Since it runs in O(poly(N)) our brute force simulation only requires
k*O(poly(N))+O(poly(N))^2
rounds once it reaches the program in question. As such, the brute force simulation runs in polynomial time!
(Note that k is exponential in the size of the program; this approach is not really feasible, but it suggests that it would be hard to do a nonconstructive proof that P=NP, even if it were the case.)
An interesting read that is somewhat related to this
To some extent, the form such a proof needs to have depends on your philosophical point of view (= the axioms you deem to be true) - e.g., as a contructivist you would demand the construction of an actual algorithm that requires polynomial time to solve an NP-complete problem. This could be done by using reduction, but not with an indirect proof. Anyhow, it really seems to be very unlikely :)
The proof would deduce a contradiction from to the assumption that at least one element (problem) of NP isn't also an element of P.

Explaining computational complexity theory

Assuming some background in mathematics, how would you give a general overview of computational complexity theory to the naive?
I am looking for an explanation of the P = NP question. What is P? What is NP? What is a NP-Hard?
Sometimes Wikipedia is written as if the reader already understands all concepts involved.
Hoooo, doctoral comp flashback. Okay, here goes.
We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and non-deterministic. A deterministic computer is the regular computer we always thinking of; a non-deterministic computer is one that is just like we're used to except that is has unlimited parallelism, so that any time you come to a branch, you spawn a new "process" and examine both sides. Like Yogi Berra said, when you come to a fork in the road, you should take it.
A decision problem is in P if there is a known polynomial-time algorithm to get that answer. A decision problem is in NP if there is a known polynomial-time algorithm for a non-deterministic machine to get the answer.
Problems known to be in P are trivially in NP --- the nondeterministic machine just never troubles itself to fork another process, and acts just like a deterministic one. There are problems that are known to be neither in P nor NP; a simple example is to enumerate all the bit vectors of length n. No matter what, that takes 2n steps.
(Strictly, a decision problem is in NP if a nodeterministic machine can arrive at an answer in poly-time, and a deterministic machine can verify that the solution is correct in poly time.)
But there are some problems which are known to be in NP for which no poly-time deterministic algorithm is known; in other words, we know they're in NP, but don't know if they're in P. The traditional example is the decision-problem version of the Traveling Salesman Problem (decision-TSP): given the cities and distances, is there a route that covers all the cities, returning to the starting point, in less than x distance? It's easy in a nondeterministic machine, because every time the nondeterministic traveling salesman comes to a fork in the road, he takes it: his clones head on to the next city they haven't visited, and at the end they compare notes and see if any of the clones took less than x distance.
(Then, the exponentially many clones get to fight it out for which ones must be killed.)
It's not known whether decision-TSP is in P: there's no known poly-time solution, but there's no proof such a solution doesn't exist.
Now, one more concept: given decision problems P and Q, if an algorithm can transform a solution for P into a solution for Q in polynomial time, it's said that Q is poly-time reducible (or just reducible) to P.
A problem is NP-complete if you can prove that (1) it's in NP, and (2) show that it's poly-time reducible to a problem already known to be NP-complete. (The hard part of that was provie the first example of an NP-complete problem: that was done by Steve Cook in Cook's Theorem.)
So really, what it says is that if anyone ever finds a poly-time solution to one NP-complete problem, they've automatically got one for all the NP-complete problems; that will also mean that P=NP.
A problem is NP-hard if and only if it's "at least as" hard as an NP-complete problem. The more conventional Traveling Salesman Problem of finding the shortest route is NP-hard, not strictly NP-complete.
Michael Sipser's Introduction to the Theory of Computation is a great book, and is very readable. Another great resource is Scott Aaronson's Great Ideas in Theoretical Computer Science course.
The formalism that is used is to look at decision problems (problems with a Yes/No answer, e.g. "does this graph have a Hamiltonian cycle") as "languages" -- sets of strings -- inputs for which the answer is Yes. There is a formal notion of what a "computer" is (Turing machine), and a problem is in P if there is a polynomial time algorithm for deciding that problem (given an input string, say Yes or No) on a Turing machine.
A problem is in NP if it is checkable in polynomial time, i.e. if, for inputs where the answer is Yes, there is a (polynomial-size) certificate given which you can check that the answer is Yes in polynomial time. [E.g. given a Hamiltonian cycle as certificate, you can obviously check that it is one.]
It doesn't say anything about how to find that certificate. Obviously, you can try "all possible certificates" but that can take exponential time; it is not clear whether you will always have to take more than polynomial time to decide Yes or No; this is the P vs NP question.
A problem is NP-hard if being able to solve that problem means being able to solve all problems in NP.
Also see this question:
What is an NP-complete in computer science?
But really, all these are probably only vague to you; it is worth taking the time to read e.g. Sipser's book. It is a beautiful theory.
This is a comment on Charlie's post.
A problem is NP-complete if you can prove that (1) it's in NP, and
(2) show that it's poly-time reducible to a problem already known to
be NP-complete.
There is a subtle error with the second condition. Actually, what you need to prove is that a known NP-complete problem (say Y) is polynomial-time reducible to this problem (let's call it problem X).
The reasoning behind this manner of proof is that if you could reduce an NP-Complete problem to this problem and somehow manage to solve this problem in poly-time, then you've also succeeded in finding a poly-time solution to the NP-complete problem, which would be a remarkable (if not impossible) thing, since then you'll have succeeded to resolve the long-standing P = NP problem.
Another way to look at this proof is consider it as using the the contra-positive proof technique, which essentially states that if Y --> X, then ~X --> ~Y. In other words, not being able to solve Y in polynomial time isn't possible means not being to solve X in poly-time either. On the other hand, if you could solve X in poly-time, then you could solve Y in poly-time as well. Further, you could solve all problems that reduce to Y in poly-time as well by transitivity.
I hope my explanation above is clear enough. A good source is Chapter 8 of Algorithm Design by Kleinberg and Tardos or Chapter 34 of Cormen et al.
Unfortunately, the best two books I am aware of (Garey and Johnson and Hopcroft and Ullman) both start at the level of graduate proof-oriented mathematics. This is almost certainly necessary, as the whole issue is very easy to misunderstand or mischaracterize. Jeff nearly got his ears chewed off when he attempted to approach the matter in too folksy/jokey a tone.
Perhaps the best way is to simply do a lot of hands-on work with big-O notation using lots of examples and exercises. See also this answer. Note, however, that this is not quite the same thing: individual algorithms can be described by asymptotes, but saying that a problem is of a certain complexity is a statement about every possible algorithm for it. This is why the proofs are so complicated!
I remember "Computational Complexity" from Papadimitriou (I hope I spelled the name right) as a good book
very much simplified: A problem is NP-hard if the only way to solve it is by enumerating all possible answers and checking each one.
Here are a few links on the subject:
Clay Mathematics statement of P vp NP problem
P vs NP Page
P, NP, and Mathematics
In you are familiar with the idea of set cardinality, that is the number of elements in a set, then one could view the question like P representing the cardinality of Integer numbers while NP is a mystery: Is it the same or is it larger like the cardinality of all Real numbers?
My simplified answer would be: "Computational complexity is the analysis of how much harder a problem becomes when you add more elements."
In that sentence, the word "harder" is deliberately vague because it could refer either to processing time or to memory usage.
In computer science it is not enough to be able to solve a problem. It has to be solvable in a reasonable amount of time. So while in pure mathematics you come up with an equation, in CS you have to refine that equation so you can solve a problem in reasonable time.
That is the simplest way I can think to put it, that may be too simple for your purposes.
Depending on how long you have, maybe it would be best to start at DFA, NDFA, and then show that they are equivalent. Then they understand ND vs. D, and will understand regular expressions a lot better as a nice side effect.

Resources