How do we know NP-complete problems are the hardest in NP? - big-o

I get that if you can do a polynomial time reduction from "every" problem then it proves that the problem is at least as hard as every problem in NP. Except, how do we know that we've discovered every problem in NP? Can't there exist problems that we may not have discovered or proven exist in NP but CANNOT be reduced to any np-complete problem? Or is this still an open question?

As others have correctly stated, the existence of the problem that is NP, but is not NP-complete would imply that P != NP, so finding one would bring you a million dollar and eternal glory. One famous problem that is believed to belong in this class is integer factorization. However, your original question was
Can't there exist problems that we may not have discovered or proven
exist in NP but CANNOT be reduced to any np-complete problem?
The answer is no. By definition of NP-completeness, one of two
necessary conditions for a problem A to be NP-complete is that every NP problem needs to be reducible in polynomial time to A. If you want to find out how to prove that every single NP problem can be reducible in polynomial time to some NP-complete problem, have a look at the proof of Cook-Levin theorem that states that 3-SAT problem is NP-complete. It was the first proven NP-complete problem and many other NP-complete problems are later proven to be NP-complete by finding the appropriate reduction from 3-SAT to these problems.

NP consists of all problems that could (theoretically) be solved by being able to make lucky guesses, guessing the solution and checking in polynomial time that the solution is correct. For example, the travelling salesman problem "can I visit the capitols of all 50 states of the USA with a trip of less than 9,825 miles" can be solved by guessing a trip and checking that it is not too long.
And one problem in NP is basically simulating a programmable computer circuit with various inputs and checking whether a certain output can be achieved. And that programmable computer circuit is powerful enough to solve all problems in NP.
So yes, we know all about all problems in NP.
(Then of course an NP complete problem can by definition be used to solve any problem in NP. If there is a problem that it cannot solve, that problem is not in NP).

Except, how do we know that we've discovered every problem in NP?
We don't. The set of all problems in the universe is not only infinite, but uncountable.
Can't there exist problems that we may not have discovered or proven
exist in NP but CANNOT be reduced to any np-complete problem?
We don't know that. We suspect that this is the case, but this hasn't been proven yet. If we were to find a NP problem that is not in NP-Complete, it would be proof that P =/= NP.
It is one of the great unsolved problems in CS. Many brilliant minds have been taking a go at it, but this nut has been one tough one to crack.

Related

Is there an NP problem that is not NP-complete or P?

I am trying to understand the relationships between P, NP, NP-Complete and NP-Hard.
I believe I am starting to understand the general idea but, I am hung up on this question(see title).
What is an example of a problem that is not solvable in P time, is verifiable in P time but is not NP-Complete?
If there is some piece of understanding I am missing please fill me in.
Thanks in advance
As noted in the comments, this is the wrong site for this question. However, it can be answered briefly:
What is an example of a problem that is not solvable in P time, is verifiable in P time but is not NP-Complete?
If I understand you, what you want are problems that are (1) not in P, (2) in NP, and (3) not in NPC. Such problems are the NP-intermediate (NPI) problems.
It is not known if there is any such problem, because it is not known if P=NP.
If P=NP then clearly there are no such problems; if P=NP then also P=NPC, and therefore every problem which can be verified in P time is in all of P, NP and NPC because they are equal.
If P!=NP then it is known that there are such problems; at least one exists. Unfortunately we do not know if any real-world problems we face are in NPI provided that P!=NP. A list of likely candidates can be found here:
https://en.wikipedia.org/wiki/NP-intermediate
In short: knowing whether NPI is empty or not is equivalent to solving proving P!=NP, so get cracking! If you can find a problem that is definitely in NP but definitely not in P or NPC, then there's a big money prize awaiting you.

NP-complete vs NP-hard (why are they unequal?)

Why is NP-hard unequal to NP-complete?
My informal understanding of definitions being used:
NP - all problems that can be verified in polynomial time
NP-complete - all problems that are NP and NP-hard
NP-hard - at least as hard as the hardest problem in NP
Decision Problem - A problem that asks a question with regards to an input and outputs a bool value
Confusion:
The problem with unknown solution of P vs NP arises from the fact that we cannot prove or disprove all problems in NP can be solved in polynomial time. It feels like a similar question arises from NP-complete vs NP-hard. How do we know all problems in NP-hard cannot be verified in polynomial time and thus result in NP-hard=NP-complete?
Here is my line of reasoning
From online research the distinction seems that this has something to do with decision problems (a concept I'm entirely new to but seem simple enough). I think this means that problems in NP have complementary decision problems that ask if an input is the solution to the problem. Let's say the problem is to find an optimal solution. I believe the complementary decision problem to be "is the given input the optimal solution?"and I believe that if this decision problem is verifiable in polynomial time then the problem is NP-complete (or in NP). So this means that NP-hard problems that aren't NP-complete problems are those that either have no decision problem (which I believe is never true since any brute force solution can answer this) or a problem is NP-hard and not NP-complete if it has a decision problem that's not verifiable in polynomial time. If it is the latter then it feels like we have the same problem from P vs NP. That is, how do we confirm all decision problems in NP-hard do not have polynomial time solutions?
Sorry if the above phrasing is weird. I will try and clarify any confusion in my question.
notes
I am interested in both an intuitive explanation and a formal explanation (a proof if it's a complicated answer). The formal explanation can certainly be a link to an academic paper. I don't want anyone to invest a significant amount of time into an overly complicated proof that may be beyond the scope of my understanding (I've found complexity theory to become very quickly... complex).
If it helps for the sake of explanation I have done work on the traveling salesman problem and I am currently working on a paper for the nurse scheduling problem (I believe these are NP-hard problems).
NP-Hard includes all problems whose solutions can be used to derive solutions to problems in NP with polynomial overhead.
This includes lots of problems that aren't in NP. For instance, the halting problem - an undecidable problem - is NP-Hard, because any problem in NP can be reduced to it in polynomial time:
Reduce any problem in NP to an instance of the NP-Complete problem 3-SAT
Construct in polynomial time a TM which checks all assignments and halts iff a satisfying assignment is found.
Use a solution to the halting problem to tell whether the TM halts.
If it halts, accept; otherwise, reject.

Example of an undecidable that is not NP-hard?

Can someone give me an example of an undecidable problem that is not NP-hard?
I'm unable to understand the difference between the two.
Thanks very much!
An NP-hard problem is one such that every problem in NP can be reduced to it. In fact, it is "at least as hard as" the problems in NP class. For example, TSP (Traveling Sales Person) is NP-hard. However, undecidable is a problem for which there is no algorithm that always decide correctly. For example, the question of whether a program halts at some point or not is undecidable. In fact, you may not have an algorithm that can answer this question correctly for all programs in the world. (This can be proved)
So, in brief, an undecidable problem is logically hard; no matter how strong your computers or algorithms are, they cannot be solved. But, NP-hard problems have algorithms to be solved with but those algorithms are not polynomial in time.

Relationship between NP-hard and undecidable problems

Am a bit confused about the relationship between undecidable problems and NP hard problems. Whether NP hard problems are a subset of undecidable problems, or are they just the same and equal, or is it that they are not comparable?
For me, I have been arguing with my friends that undecidable problems are a superset to the NP hard problems. There would exist some problems that are not in NP hard but are undecidable. But i am finding this argument to be weak and am confused a bit. Are there NP-complete problems that are undecidable.? is there any problem in NP hard which is decidable.??
Some discussion would be of great help! Thanks!
Undecidable = unsolvable for some inputs. No matter how much (finite) time you give your algorithm, it will always be wrong on some input.
NP-hard ~= super-polynomial running time (assuming P != NP). That's hand-wavy, but basically NP-hard means it is at least as hard as the hardest problem in NP.
There are certainly problems that are NP-hard which are not undecidable (= are decidable). Any NP-complete problem would be one of them, say SAT.
Are there undecidable problems which are not NP-hard? I don't think so, but it isn't easy to rule it out - I don't see an obvious argument that there must be a reduction from SAT to all possible undecidable problems. There could be some weird undecidable problems which aren't very useful. But the standard undecidable problems (the halting problem, say) are NP-hard.
An NP-hard is a problem that is at least as hard as any NP-complete problem.
Therefore an undecidable problem can be NP-hard. A problem is NP-hard if an oracle for it would make solving NP-complete problems easy (i.e. solvable in polynomial time). We can imagine an undecidable problem such that, given an oracle for it, NP-complete problems would be easy to solve. For example, obviously every oracle that solves the halting problem can also solve an NP-complete problem, so every Turing-complete problem is also NP-hard in the sense that a (fast) oracle for it would make solving NP-complete problems a breeze.
Therefore Turing-complete undecidable problems are a subset of NP-hard problems.
Undecidable problem e.g. Turing Halting Problem is NP-Hard only.
<---------NP Hard------>
|------------|-------------||-------------|------------|--------> Computational Difficulty
|<----P--->|
|<----------NP---------->|
|<-----------Exponential----------->|
|<---------------R (Finite Time)---------------->|
In this diagram, that small pipe shows overlapping of NP and NP-Hard and which shows NP-Completeness, i.e. set of those problems which are NP as well as NP-Hard.
Undecidable problems are NP Hard problems which do not have solution and which are not in NP.

Explaining computational complexity theory

Assuming some background in mathematics, how would you give a general overview of computational complexity theory to the naive?
I am looking for an explanation of the P = NP question. What is P? What is NP? What is a NP-Hard?
Sometimes Wikipedia is written as if the reader already understands all concepts involved.
Hoooo, doctoral comp flashback. Okay, here goes.
We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and non-deterministic. A deterministic computer is the regular computer we always thinking of; a non-deterministic computer is one that is just like we're used to except that is has unlimited parallelism, so that any time you come to a branch, you spawn a new "process" and examine both sides. Like Yogi Berra said, when you come to a fork in the road, you should take it.
A decision problem is in P if there is a known polynomial-time algorithm to get that answer. A decision problem is in NP if there is a known polynomial-time algorithm for a non-deterministic machine to get the answer.
Problems known to be in P are trivially in NP --- the nondeterministic machine just never troubles itself to fork another process, and acts just like a deterministic one. There are problems that are known to be neither in P nor NP; a simple example is to enumerate all the bit vectors of length n. No matter what, that takes 2n steps.
(Strictly, a decision problem is in NP if a nodeterministic machine can arrive at an answer in poly-time, and a deterministic machine can verify that the solution is correct in poly time.)
But there are some problems which are known to be in NP for which no poly-time deterministic algorithm is known; in other words, we know they're in NP, but don't know if they're in P. The traditional example is the decision-problem version of the Traveling Salesman Problem (decision-TSP): given the cities and distances, is there a route that covers all the cities, returning to the starting point, in less than x distance? It's easy in a nondeterministic machine, because every time the nondeterministic traveling salesman comes to a fork in the road, he takes it: his clones head on to the next city they haven't visited, and at the end they compare notes and see if any of the clones took less than x distance.
(Then, the exponentially many clones get to fight it out for which ones must be killed.)
It's not known whether decision-TSP is in P: there's no known poly-time solution, but there's no proof such a solution doesn't exist.
Now, one more concept: given decision problems P and Q, if an algorithm can transform a solution for P into a solution for Q in polynomial time, it's said that Q is poly-time reducible (or just reducible) to P.
A problem is NP-complete if you can prove that (1) it's in NP, and (2) show that it's poly-time reducible to a problem already known to be NP-complete. (The hard part of that was provie the first example of an NP-complete problem: that was done by Steve Cook in Cook's Theorem.)
So really, what it says is that if anyone ever finds a poly-time solution to one NP-complete problem, they've automatically got one for all the NP-complete problems; that will also mean that P=NP.
A problem is NP-hard if and only if it's "at least as" hard as an NP-complete problem. The more conventional Traveling Salesman Problem of finding the shortest route is NP-hard, not strictly NP-complete.
Michael Sipser's Introduction to the Theory of Computation is a great book, and is very readable. Another great resource is Scott Aaronson's Great Ideas in Theoretical Computer Science course.
The formalism that is used is to look at decision problems (problems with a Yes/No answer, e.g. "does this graph have a Hamiltonian cycle") as "languages" -- sets of strings -- inputs for which the answer is Yes. There is a formal notion of what a "computer" is (Turing machine), and a problem is in P if there is a polynomial time algorithm for deciding that problem (given an input string, say Yes or No) on a Turing machine.
A problem is in NP if it is checkable in polynomial time, i.e. if, for inputs where the answer is Yes, there is a (polynomial-size) certificate given which you can check that the answer is Yes in polynomial time. [E.g. given a Hamiltonian cycle as certificate, you can obviously check that it is one.]
It doesn't say anything about how to find that certificate. Obviously, you can try "all possible certificates" but that can take exponential time; it is not clear whether you will always have to take more than polynomial time to decide Yes or No; this is the P vs NP question.
A problem is NP-hard if being able to solve that problem means being able to solve all problems in NP.
Also see this question:
What is an NP-complete in computer science?
But really, all these are probably only vague to you; it is worth taking the time to read e.g. Sipser's book. It is a beautiful theory.
This is a comment on Charlie's post.
A problem is NP-complete if you can prove that (1) it's in NP, and
(2) show that it's poly-time reducible to a problem already known to
be NP-complete.
There is a subtle error with the second condition. Actually, what you need to prove is that a known NP-complete problem (say Y) is polynomial-time reducible to this problem (let's call it problem X).
The reasoning behind this manner of proof is that if you could reduce an NP-Complete problem to this problem and somehow manage to solve this problem in poly-time, then you've also succeeded in finding a poly-time solution to the NP-complete problem, which would be a remarkable (if not impossible) thing, since then you'll have succeeded to resolve the long-standing P = NP problem.
Another way to look at this proof is consider it as using the the contra-positive proof technique, which essentially states that if Y --> X, then ~X --> ~Y. In other words, not being able to solve Y in polynomial time isn't possible means not being to solve X in poly-time either. On the other hand, if you could solve X in poly-time, then you could solve Y in poly-time as well. Further, you could solve all problems that reduce to Y in poly-time as well by transitivity.
I hope my explanation above is clear enough. A good source is Chapter 8 of Algorithm Design by Kleinberg and Tardos or Chapter 34 of Cormen et al.
Unfortunately, the best two books I am aware of (Garey and Johnson and Hopcroft and Ullman) both start at the level of graduate proof-oriented mathematics. This is almost certainly necessary, as the whole issue is very easy to misunderstand or mischaracterize. Jeff nearly got his ears chewed off when he attempted to approach the matter in too folksy/jokey a tone.
Perhaps the best way is to simply do a lot of hands-on work with big-O notation using lots of examples and exercises. See also this answer. Note, however, that this is not quite the same thing: individual algorithms can be described by asymptotes, but saying that a problem is of a certain complexity is a statement about every possible algorithm for it. This is why the proofs are so complicated!
I remember "Computational Complexity" from Papadimitriou (I hope I spelled the name right) as a good book
very much simplified: A problem is NP-hard if the only way to solve it is by enumerating all possible answers and checking each one.
Here are a few links on the subject:
Clay Mathematics statement of P vp NP problem
P vs NP Page
P, NP, and Mathematics
In you are familiar with the idea of set cardinality, that is the number of elements in a set, then one could view the question like P representing the cardinality of Integer numbers while NP is a mystery: Is it the same or is it larger like the cardinality of all Real numbers?
My simplified answer would be: "Computational complexity is the analysis of how much harder a problem becomes when you add more elements."
In that sentence, the word "harder" is deliberately vague because it could refer either to processing time or to memory usage.
In computer science it is not enough to be able to solve a problem. It has to be solvable in a reasonable amount of time. So while in pure mathematics you come up with an equation, in CS you have to refine that equation so you can solve a problem in reasonable time.
That is the simplest way I can think to put it, that may be too simple for your purposes.
Depending on how long you have, maybe it would be best to start at DFA, NDFA, and then show that they are equivalent. Then they understand ND vs. D, and will understand regular expressions a lot better as a nice side effect.

Resources