I'm having a bit of trouble understanding how one would verify if there is no solution to a given instance of the subset sum problem in polynomial time.
Of course you could easily verify the positive case: simply provide the list of integers which add up to the target sum and check they are all in the original set. (O(N))
How do you verify that the answer "false" is the correct one in polynomial time?
It’s actually not known how to do this - and indeed it’s conjectured that it’s not possible to do so!
The class NP consists of problems where “yes” instances can be verified in polynomial time. The subset sum problem is a canonical example of a problem in NP. Importantly, notice that the definition of NP says nothing about what happens if the answer is “no” - maybe it’ll be easy to show this, or maybe there isn’t an efficient algorithm for doing so.
A counterpart to NP is the class co-NP, which consists of problems where “no” instances can be verified in polynomial time. A canonical example of such a problem is the tautology problem - given a propositional logic formula, is it always true regardless of what values the variables are given? If the formula isn’t a tautology, it’s easy to verify this by having someone tell you how to assign the values to the variables such that the formula is false. But if the formula is always true, it’s unclear how you’d show this efficiently.
Just as the P = NP problem hasn’t been solved, the NP = co-NP problem is also open. We don’t know whether problems where “yes” answers have fast verification are the same as problems where “no” answers have fast verification.
What we do know is that if any NP-complete problem is in co-NP, then NP = co-NP. And since subset sum is NP-complete, there’s no known polynomial time algorithm to verify if the answer to a subset sum instance is “no.”
I have to answer this question as a homework assignment but I am finding very little material to work with. I understand what is a NP-complete problem and what is a restriction. In my opinion, this statement is true, because you can always restrict the problem in order to "make the problem easier". But I'm looking at it with a bird's eye view... Can anyone help me make some progress finding the answer to this question?
Any help will be much appreciated.
Converting my comment into an answer - consider the "empty problem," a problem whose instance set is empty. Since the empty set is a subset of every set, this problem technically counts as a restriction of any language (including languages not in NP). It's also a problem in P; you can build a polynomial-time TM that always rejects its input. Therefore, every problem in NP has a polynomial-time restriction.
What I'm still curious about, though, is whether every NP problem whose instance set is infinite has a polynomial-time restriction whose instance set is also infinite. That's a more interesting question, IMHO, and I don't currently have an answer.
Hope this helps!
I read the article on wikipedia but could not understand what exactly are NP problems. Can anyone tell me about them and also what is relation of them with P Problems?
NP problems are problems that given a proposed solution, you can verify the solution in a polynomial time. For example, if you have a list of University courses and need to create a schedule so that courses won't conflict, it would be a really difficult task (complexity-wise). However, given a proposed schedule, you can easily verify its correctness.
Another important example from the field of encryption: given a number which is the result of multiplying two very large prime numbers, it's very difficult to find those primes based only on the result. However, given two numbers, it's very easy to check the solution (multiply them, compare).
I have intentionally chose examples that are in NP and not in P (i.e. problem that are hard to find the solution for) so you can understand the difference. All problems that are easy to solve, are also easy to verify - just solve and compare. That is, P is a subset of NP.
Not really an answer, because Piccolo's link is more useful, but a HP researcher claims having proven P != NP, here is the paper.
www.hpl.hp.com/personal/Vinay_Deolalikar/Papers/pnp12pt.pdf
It was not accepted yet, but I wish him good luck for the 1M$.
This problem came up in the real world, but I've translated it into a more generic "textbook-like" formulation. I suspect it is NP, but I'm particularly interested in knowing if it has a name or is well known since I think I can't be the first one to encounter it. ;-)
Imagine there is a potluck party with N guests. Each guest may bring his/her "signature dish" to the party, or bring nothing. Each guest either likes or hates each of the dishes that the other guests may bring (and this is known in advance since they are all old friends!), but they all like their own dishes.
Is there a deterministic algorithm that does not take exponential time to find the smallest set of dishes that satisfies the constraint that all guests will find at least one dish to their liking? I say "the" smallest, but actually there may be multiple solutions, and I'd like to know them all if possible.
Or, in a more abstract way, imagine a square matrix where all elements are either 0 or 1, and all diagonal elements are 1. What are the smallest sets of rows such that the sum (or the binary OR) of the rows in each set have no zeroes? (The rows would be the dishes, the columns would be the guests, 1 would mean that a guest likes a dish, and diagonal elements are 1 since everyone likes their own dish.)
This could be generalized to non-square matrices, or by removing diagonal=1 rule (although the latter guarantees that there will always be at least one solution). But I don't care about those cases for now...
I already have a program that solves it through an exhaustive search and is fast enough for N around 20, but it takes exponential time. I'm thinking I may need to recur to stochastic algorithms to find good-enough solutions for larger values of N.
Added
Wow, thanks for the quick answer! "Set cover", that's the name I was looking for. :)
This is called the SET COVER problem and it is NP-complete.
The set cover problem, as described in the Wikipedia article which Antti Huima linked to, lacks the feature of each guest liking his own dish. Offhand, I don't know whether this makes any difference.
So I've had at least two professors mention that backtracking makes an algorithm non-deterministic without giving too much explanation into why that is. I think I understand how this happens, but I have trouble putting it into words. Could somebody give me a concise explanation of the reason for this?
It's not so much the case that backtracking makes an algorithm non-deterministic.
Rather, you usually need backtracking to process a non-deterministic algorithm, since (by the definition of non-deterministic) you don't know which path to take at a particular time in your processing, but instead you must try several.
I'll just quote wikipedia:
A nondeterministic programming language is a language which can specify, at certain points in the program (called "choice points"), various alternatives for program flow. Unlike an if-then statement, the method of choice between these alternatives is not directly specified by the programmer; the program must decide at runtime between the alternatives, via some general method applied to all choice points. A programmer specifies a limited number of alternatives, but the program must later choose between them. ("Choose" is, in fact, a typical name for the nondeterministic operator.) A hierarchy of choice points may be formed, with higher-level choices leading to branches that contain lower-level choices within them.
One method of choice is embodied in backtracking systems, in which some alternatives may "fail", causing the program to backtrack and try other alternatives. If all alternatives fail at a particular choice point, then an entire branch fails, and the program will backtrack further, to an older choice point. One complication is that, because any choice is tentative and may be remade, the system must be able to restore old program states by undoing side-effects caused by partially executing a branch that eventually failed.
Out of the Nondeterministic Programming article.
Consider an algorithm for coloring a map of the world. No color can be used on adjacent countries. The algorithm arbitrarily starts at a country and colors it an arbitrary color. So it moves along, coloring countries, changing the color on each step until, "uh oh", two adjacent countries have the same color. Well, now we have to backtrack, and make a new color choice. Now we aren't making a choice as a nondeterministic algorithm would, that's not possible for our deterministic computers. Instead, we are simulating the nondeterministic algorithm with backtracking. A nondeterministic algorithm would have made the right choice for every country.
The running time of backtracking on a deterministic computer is factorial, i.e. it is in O(n!).
Where a non-deterministic computer could instantly guess correctly in each step, a deterministic computer has to try all possible combinations of choices.
Since it is impossible to build a non-deterministic computer, what your professor probably meant is the following:
A provenly hard problem in the complexity class NP (all problems that a non-deterministic computer can solve efficiently by always guessing correctly) cannot be solved more efficiently on real computers than by backtracking.
The above statement is true, if the complexity classes P (all problems that a deterministic computer can solve efficiently) and NP are not the same. This is the famous P vs. NP problem. The Clay Mathematics Institute has offered a $1 Million prize for its solution, but the problem has resisted proof for many years. However, most researchers believe that P is not equal to NP.
A simple way to sum it up would be: Most interesting problems a non-deterministic computer could solve efficiently by always guessing correctly, are so hard that a deterministic computer would probably have to try all possible combinations of choices, i.e. use backtracking.
Thought experiment:
1) Hidden from view there is some distribution of electric charges which you feel a force from and you measure the potential field they create. Tell me exactly the positions of all the charges.
2) Take some charges and arrange them. Tell me exactly the potential field they create.
Only the second question has a unique answer. This is the non-uniqueness of vector fields. This situation may be in analogy with some non-deterministic algorithms you are considering. Further consider in math limits which do not exist because they have different answers depending on which direction you approach a discontinuity from.
I wrote a maze runner that uses backtracking (of course), which I'll use as an example.
You walk through the maze. When you reach a junction, you flip a coin to decide which route to follow. If you chose a dead end, trace back to the junction and take another route. If you tried them all, return to the previous junction.
This algorithm is non-deterministic, non because of the backtracking, but because of the coin flipping.
Now change the algorithm: when you reach a junction, always try the leftmost route you haven't tried yet first. If that leads to a dead end, return to the junction and again try the leftmost route you haven't tried yet.
This algorithm is deterministic. There's no chance involved, it's predictable: you'll always follow the same route in the same maze.
If you allow backtracking you allow infinite looping in your program which makes it non-deterministic since the actual path taken may always include one more loop.
Non-Deterministic Turing Machines (NDTMs) could take multiple branches in a single step. DTMs on the other hand follow a trial-and-error process.
You can think of DTMs as regular computers. In contrast, quantum computers are alike to NDTMs and can solve non-deterministic problems much easier (e.g. see their application in breaking cryptography). So backtracking would actually be a linear process for them.
I like the maze analogy. Lets think of the maze, for simplicity, as a binary tree, in which there is only one path out.
Now you want to try a depth first search to find the correct way out of the maze.
A non deterministic computer would, at every branching point, duplicate/clone itself and run each further calculations in parallel. It is like as if the person in the maze would duplicate/clone himself (like in the movie Prestige) at each branching point and send one copy of himself into the left subbranch of the tree and the other copy of himself into the right subbranch of the tree.
The computers/persons who end up at a dead end they die (terminate without answer).
Only one computer will survive (terminate with an answer), the one who gets out of the maze.
The difference between backtracking and non-determinism is the following.
In the case of backtracking there is only one computer alive at any given moment, he does the traditional maze solving trick, simply marking his path with a chalk and when he gets to a dead end he just simply backtracks to a branching point whose sub branches he did not yet explore completely, just like in a depth first search.
IN CONTRAST :
A non deteministic computer can clone himself at every branching point and check for the way out by running paralell searches in the sub branches.
So the backtracking algorithm simulates/emulates the cloning ability of the non-deterministic computer on a sequential/non-parallel/deterministic computer.