P versus NP Clarification - algorithm

Quoted from wikipedia, the P vs NP problem, regarding the time complexity of algorithms "... asks whether every problem whose solution can be quickly verified by a computer can also be quickly solved by a computer."
I am hoping that somebody can clarify what the difference between "verifying the problem" and "solving the problem" is here.

I am hoping that somebody can clarify what the difference between "verifying the problem" and "solving the problem" is here.
It's not "verifying the problem", but "verifying the solution". For example you can check in polynomial time whether a given set is valid for SAT. The actual generation of such set is NP hard. The section Verifier-based definition in the Wikipedia article NP (complexity) could help you a little bit:
Verifier-based definition
In order to explain the verifier-based definition of NP, let us consider the subset sum problem: Assume that we are given some integers, such as {−7, −3, −2, 5, 8}, and we wish to know whether some of these integers sum up to zero. In this example, the answer is "yes", since the subset of integers {−3, −2, 5} corresponds to the sum (−3) + (−2) + 5 = 0. The task of deciding whether such a subset with sum zero exists is called the subset sum problem.
As the number of integers that we feed into the algorithm becomes larger, the number of subsets grows exponentially, and in fact the subset sum problem is NP-complete. However, notice that, if we are given a particular subset (often called a certificate), we can easily check or verify whether the subset sum is zero, by just summing up the integers of the subset. So if the sum is indeed zero, that particular subset is the proof or witness for the fact that the answer is "yes". An algorithm that verifies whether a given subset has sum zero is called verifier. A problem is said to be in NP if and only if there exists a verifier for the problem that executes in polynomial time. In case of the subset sum problem, the verifier needs only polynomial time, for which reason the subset sum problem is in NP.
If you're more into graph theory the Hamilton cycle is a NP-complete problem. It's trivial to check whether a given solution is a Hamilton cycle (linear complexity, traverse the solution path), but if P != NP than there exists no algorithm with polynomial runtime which solves the problem.
Maybe the term "quick" is misleading in this context. An algorithm is quick in this regard if and only if it's worst-case runtime is bounded by a polynomial function, such as O(n) or O(n log n). The creation of all permutations for a given range with the length n is not bounded, as you have n! different permutations. This means that the problem could be solved in n 100 log n, which will take a very long time, but this is still considered fast. On the other hand, one of the first algorithms for TSP was O(n!) and another one was O(n2 2n). And compared to polynomial functions these things grow really, really fast.

The RSA encryption uses prime numbers like following: two big prime numbers P and Q (200-400 digits each) are multiplied to form the public key N. N=P*Q
To break the encryption one needs to figure out P and Q given the N.
While finding P and Q is very difficult and can take years, verifing the solution is just about multiplying P by Q and comparing with N.
So solving the problem is very hard while it takes nothing to verifying the solution.
P.S. The example is only part of the RSA simplified for this question. The real RSA is much more complex.

Related

Non deterministic Polynomial(NP) vs Polynomial(P)?

I am actually looking for description what NP alogrithm actually means and what kind of algo/problem can be classified as NP problem
I have read many resources on net . I liked
https://www.quora.com/What-are-P-NP-NP-complete-and-NP-hard
What are the differences between NP, NP-Complete and NP-Hard?
Non deterministic Turing machine
What are NP problems?
What are NP and NP-complete problems?
Polynomial problem :-
If the running time is some polynomial function of the size of the input**, for instance if the algorithm runs in linear time or quadratic time or cubic time, then we say the algorithm runs in polynomial time . Example can be binary search
Now I do understand Polynomial problem . But not able to contrast it with NP.
NP(nondeterministic polynomial Problem):-
Now there are a lot of programs that don't (necessarily) run in polynomial time on a regular computer, but do run in polynomial time on a nondeterministic Turing machine. These programs solve problems in NP, which stands for nondeterministic polynomial time.
I am not able to to understand/think of example that does not run in polynomial time on a regular computer. Per mine current understanding, Every problem/algo can be solved
in some polynomial function of time which can or can't be proportional to time. I know i am missing something here but really could not grasp this concept. Could someone
give example of problem which can not be solved in polynomial time on regular computer but can be verified in polynomial time ?
One of the example given at second link mentioned above is Integer factorization is in NP. This is the problem that given integers n and m, is there an integer f with 1 < f < m, such that f divides n (f is a small factor of n)? why this can't be solved in some polynomial time on regular computer ? we can check for all number from 1 to n if they divide n or not. Right ?
Also where verification part come here(i mean if it can be solved in polynomial time but then how the problem solution can be verified in polynomial time)?
Your question touches several points.
First, in the sense relevant to your question, the size of a problem is defined to be the size of the representation of the problem. So, for example, when you write about the problem of a divisor of n. What is the representation of n? It is a series of characters of length q (I don't want to be more specific than that). In general, n is exponential in q. So when you talk about a simple loop from 1 to n, you're talking about something that is exponential in the size of the input. For example, the number "999999999999999" represents the number 999999999999999. That is quite a large number, but it is represented by 12 characters here.
Second, while there is more than a single way to define the class NP, perhaps the simplest one for decision problems (which is the type you raise in your question, namely is something true or not) is that if the answer is true, then there is an "certificate" that can be verified in polynomial time. For example, consider the Hamilton Path Problem. This is (probably) a hard problem to solve, but, if you are given a hamilton path as an answer, it is very easy to verify that it is so; specifically, it can be done in polynomial time. For the Hamilton Path Problem, the path is a polynomial-time verifiable certificate, and therefore this problem is NP.
It's probably worth noting how the idea of "checking a solution in polynomial time" relates to a nondeterministic Turing Machine solving a problem: in a normal (deterministic) Turing Machine, there is a well-defined set of instructions telling the machine exactly what to do in any situation ("if you're in state 3 and see an 'a', move left, if you're in state 7 and see a 'c', overwrite it with a 'b', etc.") whereas in a nondeterministic Turing Machine there is more than one option for what to do in some situations ("if you're in state 3 and see an 'a', either move right or overwrite it with a 'b'"). In terms of algorithms, this lets us "guess" solutions in the sense that if we can encode a problem into a language on an alphabet* then we can use a nondeterministic Turing Machine to generate strings on this alphabet, and then use a standard (deterministic) Turing Machine to ensure that it is correct. If we assume that we always make the right guess, then the runtime of our algorithm is simply the runtime of the deterministic checking part, which for NP problems runs in polynomial time. This is what it means for a problem to be 'decidable in polynomial time on a nondeterministic Turing Machine', and why it is often simply phrased as 'checking a solution/ certificate in polynomial time'.
*
Example: The Hamiltonian Path problem could be encoded as follows:
Label the vertices of the graph 1 through n, where n is the number of vertices. Our alphabet is then the numbers 1 through n, and our language consists of all words such that
a) every integer from 1 to n appears exactly once
and
b) for every consecutive pair of integers in a word, the vertices with those labels are connected
Polynomial Time :- Problem which can be solved in polynomial time of input size is called polynomial problem. In plain simple words :- Here Solution to problem is fast. For
example sorting, binary search
Non deterministic polynomial :- Theoretically the problems which can be verified in polynomial time irrespective of actual solution time complexity (which can be polynomial or not polynomial). So some problem which are P can also be NP.
But Informally people while conversation/posts use the NP term in below sense
Problem which can not be solved in polynomial time of input size is called polynomial problem. In plain simple words :- Here Solution to problem is not fast. You may have to try different permutation/combination or guessing work. But Verification part is fast and can be done in polynomial time. Like
input some numbers X and divide the numbers into two groups that difference in their sum is minimum
I really liked the Alex Flint answer at https://www.quora.com/What-are-P-NP-NP-complete-and-NP-hard .Above is just gist of that.

Minimum Cut in undirected graphs

I would like to quote from Wikipedia
In mathematics, the minimum k-cut, is a combinatorial optimization
problem that requires finding a set of edges whose removal would
partition the graph to k connected components.
It is said to be the minimum cut if the set of edges is minimal.
For a k = 2, It would mean Finding the set of edges whose removal would Disconnect the graph into 2 connected components.
However, The same article of Wikipedia says that:
For a fixed k, the problem is polynomial time solvable in O(|V|^(k^2))
My question is Does this mean that minimum 2-cut is a problem that belongs to complexity class P?
The min-cut problem is solvable in polynomial time and thus yes it is true that it belongs to complexity class P. Another article related to this particular problem is the Max-flow min-cut theorem.
First of all, the time complexity an algorithm should be evaluated by expressing the number of steps the algorithm requires to finish as a function of the length of the input (see Time complexity). More or less formally, if you vary the length of the input, how would the number of steps required by the algorithm to finish vary?
Second of all, the time complexity of an algorithm is not exactly the same thing as to what complexity class does the problem the algorithm solves belong to. For one problem there can be multiple algorithms to solve it. The primality test problem (i.e. testing if a number is a prime or not) is in P, but some (most) of the algorithms used in practice are actually not polynomial.
Third of all, in the case of most algorithms you'll find on the Internet evaluating the time complexity is not done by definition (i.e. not as a function of the length of the input, at least not expressed directly as such). Lets take the good old naive primality test algorithm (the one in which you take n as input and you check for division by 2,3...n-1). How many steps does this algo take? One way to put it is O(n) steps. This is correct. So is this algorithm polynomial? Well, it is linear in n, so it is polynomial in n. But, if you take a look at what time complexity means, the algorithm is actually exponential. First, what is the length of the input to your problem? Well, if you provide the input n as an array of bits (the usual in practice) then the length of the input is, roughly said, L = log n. Your algorithm thus takes O(n)=O(2^log n)=O(2^L) steps, so exponential in L. So the naive primality test is in the same time linear in n, but exponential in the length of the input L. Both correct. Btw, the AKS primality test algorithm is polynomial in the size of input (thus, the primality test problem is in P).
Fourth of all, what is P in the first place? Well, it is a class of problems that contains all decision problems that can be solved in polynomial time. What is a decision problem? A problem that can be answered with yes or no. Check these two Wikipedia pages for more details: P (complexity) and decision problems.
Coming back to your question, the answer is no (but pretty close to yes :p). The minimum 2-cut problem is in P if formulated as a decision problem (your formulation requires an answer that is not just a yes-or-no). In the same time the algorithm that solves the problem in O(|V|^4) steps is a polynomial algorithm in the size of the input. Why? Well, the input to the problem is the graph (i.e. vertices, edges and weights), to keep it simple lets assume we use an adjacency/weights matrix (i.e. the length of the input is at least quadratic in |V|). So solving the problem in O(|V|^4) steps means polynomial in the size of the input. The algorithm that accomplishes this is a proof that the minimum 2-cut problem (if formulated as decision problem) is in P.
A class related to P is FP and your problem (as you formulated it) belongs to this class.

Decision problems that can't even be decided efficiently?

How does these problems fall into the tapestry of the P, NP, NP-Hard, etc... sets? I don't know if any such problems even exists, but what initiated my thought process was thinking of a decidable of the travelling salesman problem:
Given a list of cities and the distances between each pair of cities, and a
Hamiltonian path P, is P the shortest Hamiltonian path?
I suspect that we cannot verify the "shortestness" of P in polynomial time, in which this decision problem is not even in NP. So where does it fall in this case?
This problem is in co-NP. You can think of NP as the class of problems where if the answer is yes, there is a small amount of information I could give you that would convince you of this. For example, the problem
Is there a Hamiltonian cycle in G with cost at most k?
is in NP, because if the answer is yes, I could just give you the cycle and you could check it to see whether it's valid. Coming up with that cycle is hard, but once you have the Hamiltonian cycle it's really easy to use it to check the answer.
The class co-NP consists of problems where if the answer is no, there's a small amount of information I could give you that would convince you of this. In your case, suppose that no, P is not the shortest Hamiltonian path. That means that there's some shorter path P'. If I gave you P', you could easily check that P wasn't ideal. Coming up with P' might be really hard (in fact, it's co-NP-hard!), but once you have it it's pretty straightforward to use it to confirm the answer is no.
Hope this helps!
Given two integers n and m, are there exactly m prime numbers p <= n?
This can be solved in about O (n^(2/3)) and possibly slightly faster, but the problem size is of course not n but log (n), so it takes sub-linear time in n, but exponential time in the problem size. That's not worse than you'd expect from a problem in NP. However, I cannot see any possible information that would allow you to check this quicker.
(Actually, there is an algorithm which determines the number of primes <= n in about O (n^(2/3)) steps, but there is no known algorithm that can check an answer faster than finding the answer. )
Given integers n and k, is 2^n - 1 the k-th Mersenne prime?
It is possible to prove that p is prime in time polynomial in the size of p if a complete factorisation of p + 1 is known, and if p = 2^n - 1 then the complete factorisation of p + 1 is trivial.
However, that is polynomial in the size of p. 2^n - 1 can be checked for primality in time that is polynomial in n. However, that is not polynomial in the size of the problem, which would be roughly the number of digits in n and k. And it would just answer the question whether 2^n - 1 is a Mersenne prime. To prove that it is the k-th Mersenne prime, we would have to check 2^m - 1 for 1 <= m < n and prove that exactly k-1 of these are primes.
Currently the answer to the question is not known for k >= 44 and many 8-digit values n.

What is fixed-parameter tractability? Why is it useful?

Some problems that are NP-hard are also fixed-parameter tractable, or FPT. Wikipedia describes a problem as fixed-parameter tractable if there's an algorithm that solves it in time f(k) · |x|O(1).
What does this mean? Why is this concept useful?
To begin with, under the assumption that P ≠ NP, there are no polynomial-time, exact algorithms for any NP-hard problem. Although we don't know whether P = NP or P ≠ NP, we don't have any polynomial-time algorithms for any NP-hard problems.
The idea behind fixed-parameter tractability is to take an NP-hard problem, which we don't know any polynomial-time algorithms for, and to try to separate out the complexity into two pieces - some piece that depends purely on the size of the input, and some piece that depends on some "parameter" to the problem.
As an example, consider the 0/1 knapsack problem. In this problem, you're given a list of n objects that have associated weights and values, along with some maximum weight W that you're allowed to carry. The question is to determine the maximum amount of value that you can carry. This problem is NP-hard, meaning that there's no polynomial-time algorithm that solves it. A brute-force method will take time around O(2n) by considering all possible subsets of the items, which is extremely slow for large n. However, it is possible to solve this problem in time O(nW), where n is the number of elements and W is the amount of weight you can carry. If you look at the runtime O(nW), you'll notice that it's split into two parts: a component that's linear in the number of elements (the n part) and a component that's linear in the weight (the W part). If W is any fixed constant, then the runtime of this algorithm will be O(n), which is linear-time, even though the problem in general is NP-hard. This means that if we treat W as some tunable "parameter" of the problem, for any fixed value of this parameter, the problem ends up running in polynomial time (which is "tractable," in the complexity theory sense of the word.)
As another example, consider the problem of finding long, simple paths in a graph. This problem is also NP-hard, and the naive algorithm for finding simple paths of length k in a graph takes time O(n! / (n - k)!), which for large k ends up being superexponential. However, using the technique of color-coding, it's possible to solve this problem in time O((2e)kn3 log n), where k is the length of the path to find and n is the number of nodes in the input graph. Notice that this runtime also has two "components:" one component that's a polynomial in the number of nodes in the input graph (the n3 log n part) and one component that's exponential in k (the (2e)k part). This means that for any fixed value of k, there's a polynomial-time algorithm for finding length-k paths in the graph; the runtime will be O(n3 log n).
In both of these cases, we can take a problem for which we have an exponential-time solution (or worse) and find a new solution whose runtime is some polynomial in n times some crazy-looking function of some extra "parameter." In the case of the knapsack problem, that parameter is the maximum amount of weight we can carry; in the case of finding long paths, the parameter is the length of the path to find. Generally speaking, a problem is called fixed-parameter tractable if there is some algorithm for solving the problem defined in terms of two quantities: n, the size of the input, and k, some "parameter," where the runtime is
O(p(n) · f(k))
Where p(n) is some polynomial function and f(k) is an arbitrary function in k. Intuitively, this means that the complexity of the problem scales polynomially with n (meaning that as only the problem size increases, the runtime will scale nicely), but can scale arbitrarily badly with the parameter k. This separates out the "inherent hardness" of the problem such that the "hard part" of the problem is blamed on the parameter k, while the "easy part" of the problem is charged to the size of the input.
Once you have a runtime that looks like O(p(n) · f(k)), we immediately get polynomial-time algorithms for solving the problem for any fixed k. Specifically, if k is fixed, then f(k) is some constant, so O(p(n) · f(k)) is just O(p(n)). This is a polynomial-time algorithm. Therefore, if we "fix" the parameter, we get back some "tractable" algorithm for solving the problem. This is the origin of the term fixed-parameter tractable.
(A note: Wikipedia's definition of fixed-parameter tractability says that the algorithm should have runtime f(k) · |x|O(1). Here, |x| refers to the size of the input, which I've called n here. This means that Wikipedia's definition is the same as saying that the runtime is f(k) · nO(1). As mentioned in this earlier answer, nO(1) means "some polynomial in n," and so this definition ends up being equivalent to the one I've given here).
Fixed-parameter tractability has enormous practical implications for a problem. It's common to encounter problems that are NP-hard. If you find a problem that's fixed-parameter tractable and the parameter is low, it can be significantly more efficient to use the fixed-parameter tractable algorithm than to use the normal brute-force algorithm. The color-coding example above for finding long paths in a graph, for example, has been used to great success in computational biology to find sequencing pathways in yeast cells, and the 0/1 knapsack solution is used frequently because common values of W are low enough for it to be practical.
Hope this helps!
I believe that the explanation of #templatetypedef was already quite comprehensive of the generality of FPT.
I would like to add that in practice, it appears quite often that the class of problem one is trying to solve is FPT, such as above examples.
In the case of problems expressed as set of constraints (e.g. SAT, CSP, ILP, etc.) a very common parameter is treewidth, which basically explicits how much your problem is organized as a tree.
This allows to split ones problem into a tree of subproblems which can then be solved more individually using dynamic programming.
In such case, many problems are linear-time fixed-parameter tractable, that is the complexity grows linearly with the number of components (i.e. the size of the system) by exponentially in the size of its biggest component.
Although the use of explicit techniques is possible to solve sub-problems is possible, in order to scale-up to more reasonnable instances, using symbolic representations is recomended.

Finding maximum subsequence below or equal to a certain value

I'm learning dynamic programming and I've been having a great deal of trouble understanding more complex problems. When given a problem, I've been taught to find a recursive algorithm, memoize the recursive algorithm and then create an iterative, bottom-up version. At almost every step I have an issue. In terms of the recursive algorithm, I write different different ways to do recursive algorithms, but only one is often optimal for use in dynamic programming and I can't distinguish what aspects of a recursive algorithm make memoization easier. In terms of memoization, I don't understand which values to use for indices. For conversion to a bottom-up version, I can't figure out which order to fill the array/double array.
This is what I understand:
- it should be possible to split the main problem to subproblems
In terms of the problem mentioned, I've come up with a recursive algorithm that has these important lines of code:
int optionOne = values[i] + find(values, i+1, limit - values[i]);
int optionTwo = find(values, i+1, limit);
If I'm unclear or this is not the correct qa site, let me know.
Edit:
Example: Given array x: [4,5,6,9,11] and max value m: 20
Maximum subsequence in x under or equal to m would be [4,5,11] as 4+5+11 = 20
I think this problem is NP-hard, meaning that unless P = NP there isn't a polynomial-time algorithm for solving the problem.
There's a simple reduction from the subset-sum problem to this problem. In subset-sum, you're given a set of n numbers and a target number k and want to determine whether there's a subset of those numbers that adds up to exactly k. You can solve subset-sum with a solver for your problem as follows: create an array of the numbers in the set and find the largest subsequence whose sum is less than or equal to k. If that adds up to exactly k, the set has a subset that adds up to k. Otherwise, it does not.
This reduction takes polynomial time, so because subset-sum is NP-hard, your problem is NP-hard as well. Therefore, I doubt there's a polynomial-time algorithm.
That said - there is a pseudopolynomial-time algorithm for subset-sum, which is described on Wikipedia. This algorithm uses DP in two variables and isn't strictly polynomial time, but it will probably work in your case.
Hope this helps!

Resources