If Y is reducible to X in polynomial time, then how is it true that X is at least as hard as Y?

I am having difficulty understanding the relationship between the complexity of two classes of problems, say NP-hard and NP-complete problems.
The answer at https://stackoverflow.com/a/1857342/ states:
NP Hard
Intuitively, these are the problems that are at least as hard as the NP-complete problems. Note that NP-hard problems do not have to be in NP, and they do not have to be decision problems.
The precise definition here is that a problem X is NP-hard, if there is an NP-complete problem Y, such that Y is reducible to X in polynomial time.
If a problem Y can be reduced to X in polynomial time, should we not say that Y is at least as hard as X? If a problem Y is reducible to X in polynomial time, then the time required to solve Y is polynomial time + the time required to solve X. So it appears to me that problem Y is at least as hard as X.
But the quoted text above says just the opposite. It says, if an NP-complete problem Y is reducible to an NP-hard problem X, then the NP-hard problem is at least as hard as the NP-complete problem.
How does this make sense? Where am I making an error in thinking?

Your error is in supposing that you have to solve X in order to solve Y. Y might actually be much easier, but one way to solve it is to transform it into an instance of X. And since we are using big-O notation, and problems in NP are well past linear algorithms, you can always safely discard any linear parts of an algorithm; in fact, you can almost safely discard any polynomial parts until the P = NP question is settled. That means O(f(n) + n) = O(f(n)) whenever n = O(f(n)).
Example (which obviously involves neither NP-hard nor NP-complete problems, but is a mere illustration): you are to find the lowest number in an unsorted array of n numbers. There is an obvious solution that iterates over the whole list and remembers the lowest number found so far; pretty straightforward and a solid O(n).
Someone else comes along and says: OK, let's change it to sorting the array; then we can just take the first number and it will be the lowest. Note that this conversion of the problem was O(1), but we could, for example, pretend some preprocessing of the array was needed that would make it O(n). The overall solution is O(n + n*log(n)) = O(n*log(n)).
Here, too, you changed an easy problem into a hard one, thereby proving that the hard problem is at least as hard as the easy one.
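A minimal Python sketch of the example above (toy code; assumes a non-empty list):

    def find_min_directly(numbers):
        # The obvious O(n) solution: one pass, remember the smallest.
        lowest = numbers[0]
        for x in numbers[1:]:
            if x < lowest:
                lowest = x
        return lowest

    def find_min_by_reduction(numbers):
        # Solve the easy problem via the harder one: O(n log n) overall.
        # This shows sorting is at least as hard as min-finding,
        # not the other way around.
        return sorted(numbers)[0]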
Basically, what the definition of NP-hard means is that X is at least as hard as an NP-complete problem Y. If you find an NP-complete problem Y that you can solve by solving X, it means either that X is as hard as or harder than Y, and then X is indeed NP-hard; or, if solving X is simpler, it means you have found an algorithm that solves Y faster than any algorithm before, potentially even moving it out of the NP-complete class.
Another example: let's pretend convolution is in my set of "complete" problems and normally takes O(n²). Then you come up with the Fast Fourier Transform, which runs in O(n*log(n)), and you find you can solve convolution by transforming it into an FFT problem. Now you have a solution for convolution that is o(n²), more specifically O(n*log(n)).
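A rough Python sketch of this reduction (using NumPy's FFT; function names here are made up for illustration):

    import numpy as np

    def convolve_naive(a, b):
        # Direct convolution: O(n^2) multiplications.
        out = [0.0] * (len(a) + len(b) - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                out[i + j] += x * y
        return out

    def convolve_fft(a, b):
        # Reduce convolution to FFT: O(n log n) overall.
        size = len(a) + len(b) - 1
        return np.fft.irfft(np.fft.rfft(a, size) * np.fft.rfft(b, size), size)

    a, b = [1.0, 2.0, 3.0], [4.0, 5.0]
    print(convolve_naive(a, b))              # [4.0, 13.0, 22.0, 15.0]
    print(np.round(convolve_fft(a, b), 6))   # same values via the FFT route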

Let I_X be the indicator function of X (i.e., 1 if the input is in X and 0 otherwise) and I_Y the indicator function of Y. If Y reduces to X via a function f computable in polynomial time, then I_Y = I_X ∘ f, where ∘ denotes function composition. X is at least as hard as Y because, given an algorithm for I_X, the formula above gives an algorithm for I_Y; moreover, for any class of running times closed under polynomial substitution (e.g., polynomial, exponential, finite), if the algorithm for I_X belongs to the class, then so does the algorithm for I_Y. The contrapositive of this statement is: if Y has no fast decision procedure, then X has no fast decision procedure.
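In code, the composition argument is just one line (a sketch; f and solve_X stand in for any concrete polynomial-time reduction and any decider for X):

    def solve_Y(y_instance, f, solve_X):
        # I_Y = I_X ∘ f: a decider for X plus a polynomial-time
        # reduction f immediately yields a decider for Y.  The cost
        # is the cost of f (polynomial) plus one call to solve_X.
        return solve_X(f(y_instance))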

Related

If we can prove that knapsack problems with limited capacity can be solved in polynomial time, do all knapsack problems belong to P?

I found this question in my Optimization Algorithm course, the full question is this:
If we can prove all Knapsack problems with capacity limited to 100 can be solved in polynomial time, then all Knapsack problems belong to P. Is this sentence true or false? Justify.
With my book and some research I came up with something like this:
First of all, KP is an NP-complete problem. With dynamic programming it can be solved in pseudopolynomial time, but that's not enough.
If, for the sake of contradiction, we could prove that KP with capacity limited to 100 can be solved in polynomial time, then we could conclude that KP belongs to P.
What do you think about my answer? I think the proof by contradiction in the last sentence is not quite right.
Proving that all knapsack problems with a limited capacity can be solved in polynomial time does not prove that all knapsack problems are in P. If a problem is in P, that means it can be solved in O(n^k) time for some constant k. Big O is an upper bound, meaning that if an algorithm is O(n), the time it takes will never grow faster than proportionally to n as n approaches infinity. Proving that all instances with capacity at most 100 can be solved in polynomial time gives no guarantee for instances with much larger capacity. Therefore we cannot conclude that there is an algorithm that runs in O(n^k) on all instances, and hence we cannot conclude that the problem is in P.

Non-deterministic Polynomial (NP) vs Polynomial (P)?

I am actually looking for a description of what an NP algorithm actually means and what kind of algorithm/problem can be classified as an NP problem.
I have read many resources on the net. I liked:
https://www.quora.com/What-are-P-NP-NP-complete-and-NP-hard
What are the differences between NP, NP-Complete and NP-Hard?
Non deterministic Turing machine
What are NP problems?
What are NP and NP-complete problems?
Polynomial problem :-
If the running time is some polynomial function of the size of the input, for instance if the algorithm runs in linear time, quadratic time, or cubic time, then we say the algorithm runs in polynomial time. An example is binary search.
Now I do understand polynomial problems, but I am not able to contrast them with NP.
NP (nondeterministic polynomial problem) :-
Now there are a lot of programs that don't (necessarily) run in polynomial time on a regular computer, but do run in polynomial time on a nondeterministic Turing machine. These programs solve problems in NP, which stands for nondeterministic polynomial time.
I am not able to understand/think of an example that does not run in polynomial time on a regular computer. Per my current understanding, every problem/algorithm can be solved in time that is some polynomial function of the input size. I know I am missing something here but really could not grasp this concept. Could someone give an example of a problem which cannot be solved in polynomial time on a regular computer but can be verified in polynomial time?
One of the examples given at the second link mentioned above is that integer factorization is in NP. This is the problem: given integers n and m, is there an integer f with 1 < f < m, such that f divides n (f is a small factor of n)? Why can't this be solved in polynomial time on a regular computer? We can check, for every number from 1 to n, whether it divides n. Right?
Also, where does the verification part come in here (I mean, if it can be solved in polynomial time, then how is the problem's solution verified in polynomial time)?
Your question touches several points.
First, in the sense relevant to your question, the size of a problem is defined to be the size of the representation of the problem. So consider, for example, the problem of finding a divisor of n. What is the representation of n? It is a series of characters of length q (I don't want to be more specific than that). In general, n is exponential in q. So when you talk about a simple loop from 1 to n, you're talking about something that is exponential in the size of the input. For example, the string "999999999999999" represents the number 999999999999999. That is quite a large number, but it is represented by only 15 characters here.
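To make the first point concrete, here is a rough Python sketch of the factor search from the question (the function name is made up for illustration):

    def has_small_factor(n, m):
        # Is there an f with 1 < f < m dividing n?  The loop body runs
        # up to m - 2 times, but the *input size* is the number of
        # digits of n and m -- so this "simple loop" is exponential in
        # the size of the input, not polynomial.
        for f in range(2, m):
            if n % f == 0:
                return True
        return False

    print(len(str(999999999999999)))  # 15 characters encode a ~10^15 value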
Second, while there is more than one way to define the class NP, perhaps the simplest one for decision problems (which is the type you raise in your question, namely whether something is true or not) is this: if the answer is true, then there is a "certificate" that can be verified in polynomial time. For example, consider the Hamilton Path Problem. This is (probably) a hard problem to solve, but if you are given a Hamilton path as an answer, it is very easy to verify that it is one; specifically, this can be done in polynomial time. For the Hamilton Path Problem, the path is a polynomial-time verifiable certificate, and therefore this problem is in NP.
It's probably worth noting how the idea of "checking a solution in polynomial time" relates to a nondeterministic Turing Machine solving a problem. In a normal (deterministic) Turing Machine, there is a well-defined set of instructions telling the machine exactly what to do in any situation ("if you're in state 3 and see an 'a', move left; if you're in state 7 and see a 'c', overwrite it with a 'b'; etc."), whereas in a nondeterministic Turing Machine there is more than one option for what to do in some situations ("if you're in state 3 and see an 'a', either move right or overwrite it with a 'b'"). In terms of algorithms, this lets us "guess" solutions: if we can encode a problem into a language over an alphabet*, then we can use a nondeterministic Turing Machine to generate strings over this alphabet, and then use a standard (deterministic) Turing Machine to check that a generated string is correct. If we assume that we always make the right guess, then the runtime of our algorithm is simply the runtime of the deterministic checking part, which for NP problems runs in polynomial time. This is what it means for a problem to be 'decidable in polynomial time on a nondeterministic Turing Machine', and why it is often simply phrased as 'checking a solution/certificate in polynomial time'.
*
Example: The Hamiltonian Path problem could be encoded as follows:
Label the vertices of the graph 1 through n, where n is the number of vertices. Our alphabet is then the numbers 1 through n, and our language consists of all words such that
a) every integer from 1 to n appears exactly once
and
b) for every consecutive pair of integers in a word, the vertices with those labels are connected
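A rough Python sketch of the verification step for this encoding (names are made up for illustration); checking a candidate word against conditions a) and b) takes only polynomial time:

    def verify_hamiltonian_path(n, edges, word):
        # a) every integer from 1 to n appears exactly once
        if sorted(word) != list(range(1, n + 1)):
            return False
        # b) every consecutive pair of labels must be an edge
        edge_set = {frozenset(e) for e in edges}
        return all(frozenset((u, v)) in edge_set
                   for u, v in zip(word, word[1:]))

    edges = [(1, 2), (2, 3), (1, 3)]
    print(verify_hamiltonian_path(3, edges, [2, 1, 3]))  # True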
Polynomial time :- A problem which can be solved in time polynomial in the input size is called a polynomial problem. In plain simple words :- here the solution to the problem is fast. For example, sorting or binary search.
Non-deterministic polynomial :- Theoretically, the problems which can be verified in polynomial time, irrespective of the actual solving time complexity (which may or may not be polynomial). So problems which are in P are also in NP.
But informally, in conversations and posts, people use the term NP in the sense below:
A problem which cannot be solved in time polynomial in the input size. In plain simple words :- here the solution to the problem is not fast; you may have to try different permutations/combinations or guesswork. But the verification part is fast and can be done in polynomial time. For example:
given some numbers X, divide the numbers into two groups such that the difference of their sums is minimum
I really liked Alex Flint's answer at https://www.quora.com/What-are-P-NP-NP-complete-and-NP-hard. The above is just the gist of that.

Understanding Polynomial Time Approximation Scheme

Is an approximation algorithm the same as a Polynomial Time Approximation Scheme (PTAS)? E.g., it can be shown that A(I) <= 2 * OPT(I) for vertex cover. Does that mean that Vertex Cover has a polynomial-time 2-approximation algorithm, or a PTAS?
Thanks!
No, this isn't necessarily the case. A PTAS is an algorithm where given any ε > 0, you can approximate the answer to a factor of (1 + ε) in polynomial time. In other words, you can get arbitrarily good approximations.
Some problems are known (for example, MAX-3SAT) that have approximation algorithms for specific factors (for example, 5/8), but where it's known that unless P = NP there is a hard limit to how well the problem can be approximated in polynomial time. For example, a consequence of the PCP theorem is that MAX-3SAT doesn't have a polynomial-time (7/8 + ε)-approximation for any ε > 0 unless P = NP. It's therefore possible that MAX-3SAT has a PTAS, but only if P = NP.
Hope this helps!
Vertex cover having a 2-approximation algorithm is not the same as having a PTAS. Sometimes there are problems where a much better approximation is possible; such problems then admit a PTAS.
Such algorithms take an instance of the problem as input, along with another input parameter epsilon > 0, and give an output whose value is at most (1+epsilon)·OPT for a minimisation problem, or at least (1/(1+epsilon))·OPT for a maximisation problem.
The runtime of a PTAS algorithm is polynomial in n (the size of the problem instance). If the runtime is also polynomial in 1/epsilon, the problem is said to admit an FPTAS (fully polynomial-time approximation scheme).
Example:
The dynamic programming algorithm for KNAPSACK with integer profits gives an optimal solution.
The KNAPSACK problem with real-valued profits does not admit a polynomial-time algorithm. But it admits an FPTAS, where the real-valued profits are converted into integer profits and the DP algorithm is used to compute the solution with the "rounded" profits.
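A rough Python sketch of that rounding scheme (a standard profit-scaling FPTAS, not tied to any particular source; assumes positive profits and every weight at most the capacity, and is written for clarity, not speed):

    def knapsack_fptas(profits, weights, capacity, eps):
        n = len(profits)
        scale = eps * max(profits) / n
        scaled = [int(p / scale) for p in profits]  # rounded-down integer profits
        # dp[v] = minimum weight needed to reach scaled profit exactly v
        total = sum(scaled)
        INF = float("inf")
        dp = [0] + [INF] * total
        for p, w in zip(scaled, weights):
            for v in range(total, p - 1, -1):
                if dp[v - p] + w <= capacity and dp[v - p] + w < dp[v]:
                    dp[v] = dp[v - p] + w
        best = max(v for v, wt in enumerate(dp) if wt <= capacity)
        # The true profit of the chosen items is at least best * scale,
        # which the standard analysis shows is >= (1 - eps) * OPT,
        # in time polynomial in n and 1/eps.
        return best * scale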
Another example: Max Independent Set does not admit a PTAS or FPTAS. For the FPTAS case, the optimum is an integer of size at most n, so we could set epsilon small enough (say, epsilon < 1/n) that the algorithm would always return the exact optimum for any graph while still running in polynomial time; that is not possible unless P = NP. (Ruling out a PTAS requires stronger inapproximability results.)

Nondeterminism versus polynomial-time verifiability

I have read that an NP problem is verifiable in polynomial time or, equivalently, is solvable in polynomial time by a non-deterministic Turing machine. Why are these definitions equivalent?
First, let's show that anything that can be verified in polynomial time can be solved in nondeterministic polynomial time. Suppose you have some algorithm P(x, y) that decides whether y verifies x, and where P runs in time O(|x|^k) for some constant k. The runtime of this algorithm is polynomial in |x|, meaning that it can only look at polynomially many bits of y. Then you could build this nondeterministic polynomial-time algorithm that solves the problem:
Nondeterministically guess a string y of length O(|x|^k).
Deterministically run P(x, y) and output whatever it says.
This runs in nondeterministic polynomial time because y is constructed in nondeterministic polynomial time and P(x, y) then runs in polynomial time. If there is a y that verifies x, this machine can nondeterministically guess it. Otherwise, no guess works and the machine will output NO.
The other direction is trickier. Suppose there's a nondeterministic algorithm P(x) that runs in nondeterministic time O(|x|^k) for some constant k. This nondeterministic algorithm, at each step, chooses from one of c different options of what to do next. Therefore, you can make a verification program Q(x, y) where y encodes what choices to make at each step of the computation of P. It can then simulate P(x), looking at the choices encoded in y to determine which step to take next. This will run in deterministic time O(|x|^k) because it simply does all of the steps in the nondeterministic computation.
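A toy deterministic simulation of this correspondence (the signature is made up for illustration; check(x, y) plays the role of Q):

    from itertools import product

    def nondeterministic_decide(x, check, k, c):
        # Enumerate every choice string y of length |x|**k over c options
        # and accept iff some y makes the deterministic checker accept.
        # A nondeterministic machine "guesses" y for free; simulating it
        # deterministically pays exponential time for this enumeration.
        return any(check(x, y)
                   for y in product(range(c), repeat=len(x) ** k))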
Hope this helps!
Perhaps this example gives some hint:
Given L = {w : expression w is satisfiable}
and Time for n variables:
Guess an assignment of the variables O(n)
Check if this is a satisfying assignment O(n)
Total time: O(n)
The satisfiability problem is NP-complete and believed to be intractable, yet it is widely used in computing applications because each guess can be checked in linear time.
The class NP is intended to isolate the notion of polynomial time “verifiability”.
NP is the class of languages that have polynomial time verifiers.
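For example, here is a minimal sketch of such a verifier for SAT (assuming a DIMACS-style clause encoding: a positive integer is a variable, a negative integer its negation):

    def check_assignment(cnf, assignment):
        # Linear-time certificate check: every clause must contain at
        # least one literal made true by the assignment.
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in cnf)

    # (x1 or not x2) and (x2 or x3)
    print(check_assignment([[1, -2], [2, 3]], {1: True, 2: False, 3: True}))  # True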

What is fixed-parameter tractability? Why is it useful?

Some problems that are NP-hard are also fixed-parameter tractable, or FPT. Wikipedia describes a problem as fixed-parameter tractable if there's an algorithm that solves it in time f(k) · |x|^O(1).
What does this mean? Why is this concept useful?
To begin with, under the assumption that P ≠ NP, there are no polynomial-time, exact algorithms for any NP-hard problem. Although we don't know whether P = NP or P ≠ NP, we don't have any polynomial-time algorithms for any NP-hard problems.
The idea behind fixed-parameter tractability is to take an NP-hard problem, which we don't know any polynomial-time algorithms for, and to try to separate out the complexity into two pieces - some piece that depends purely on the size of the input, and some piece that depends on some "parameter" to the problem.
As an example, consider the 0/1 knapsack problem. In this problem, you're given a list of n objects that have associated weights and values, along with some maximum weight W that you're allowed to carry. The question is to determine the maximum amount of value that you can carry. This problem is NP-hard, meaning that (unless P = NP) there's no polynomial-time algorithm that solves it. A brute-force method will take time around O(2^n) by considering all possible subsets of the items, which is extremely slow for large n. However, it is possible to solve this problem in time O(nW), where n is the number of elements and W is the amount of weight you can carry. If you look at the runtime O(nW), you'll notice that it's split into two parts: a component that's linear in the number of elements (the n part) and a component that's linear in the weight (the W part). If W is any fixed constant, then the runtime of this algorithm will be O(n), which is linear-time, even though the problem in general is NP-hard. This means that if we treat W as some tunable "parameter" of the problem, for any fixed value of this parameter, the problem ends up running in polynomial time (which is "tractable," in the complexity theory sense of the word.)
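A minimal sketch of that O(nW) dynamic program (one standard way to write it):

    def knapsack_01(values, weights, W):
        # dp[w] = best value achievable with total weight at most w.
        dp = [0] * (W + 1)
        for v, wt in zip(values, weights):
            for w in range(W, wt - 1, -1):  # reverse: each item used at most once
                dp[w] = max(dp[w], dp[w - wt] + v)
        return dp[W]

    print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # 220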
As another example, consider the problem of finding long, simple paths in a graph. This problem is also NP-hard, and the naive algorithm for finding simple paths of length k in a graph takes time O(n! / (n - k)!), which for large k ends up being superexponential. However, using the technique of color-coding, it's possible to solve this problem in time O((2e)^k n^3 log n), where k is the length of the path to find and n is the number of nodes in the input graph. Notice that this runtime also has two "components:" one component that's a polynomial in the number of nodes in the input graph (the n^3 log n part) and one component that's exponential in k (the (2e)^k part). This means that for any fixed value of k, there's a polynomial-time algorithm for finding length-k paths in the graph; the runtime will be O(n^3 log n).
In both of these cases, we can take a problem for which we have an exponential-time solution (or worse) and find a new solution whose runtime is some polynomial in n times some crazy-looking function of some extra "parameter." In the case of the knapsack problem, that parameter is the maximum amount of weight we can carry; in the case of finding long paths, the parameter is the length of the path to find. Generally speaking, a problem is called fixed-parameter tractable if there is some algorithm for solving the problem defined in terms of two quantities: n, the size of the input, and k, some "parameter," where the runtime is
O(p(n) · f(k))
Where p(n) is some polynomial function and f(k) is an arbitrary function in k. Intuitively, this means that the complexity of the problem scales polynomially with n (meaning that if only the problem size increases, the runtime will scale nicely), but can scale arbitrarily badly with the parameter k. This separates out the "inherent hardness" of the problem such that the "hard part" of the problem is blamed on the parameter k, while the "easy part" of the problem is charged to the size of the input.
Once you have a runtime that looks like O(p(n) · f(k)), we immediately get polynomial-time algorithms for solving the problem for any fixed k. Specifically, if k is fixed, then f(k) is some constant, so O(p(n) · f(k)) is just O(p(n)). This is a polynomial-time algorithm. Therefore, if we "fix" the parameter, we get back some "tractable" algorithm for solving the problem. This is the origin of the term fixed-parameter tractable.
(A note: Wikipedia's definition of fixed-parameter tractability says that the algorithm should have runtime f(k) · |x|^O(1). Here, |x| refers to the size of the input, which I've called n here. This means that Wikipedia's definition is the same as saying that the runtime is f(k) · n^O(1). As mentioned in this earlier answer, n^O(1) means "some polynomial in n," and so this definition ends up being equivalent to the one I've given here.)
Fixed-parameter tractability has enormous practical implications for a problem. It's common to encounter problems that are NP-hard. If you find a problem that's fixed-parameter tractable and the parameter is low, it can be significantly more efficient to use the fixed-parameter tractable algorithm than to use the normal brute-force algorithm. The color-coding example above for finding long paths in a graph, for example, has been used to great success in computational biology to find signaling pathways in yeast cells, and the 0/1 knapsack solution is used frequently because common values of W are low enough for it to be practical.
Hope this helps!
I believe templatetypedef's explanation already covers the generality of FPT quite comprehensively.
I would like to add that in practice, it turns out quite often that the class of problem one is trying to solve is FPT, as in the examples above.
For problems expressed as sets of constraints (e.g. SAT, CSP, ILP, etc.), a very common parameter is treewidth, which basically measures how close your problem's structure is to a tree.
This allows one to split the problem into a tree of subproblems, which can then be solved more or less individually using dynamic programming.
In such cases, many problems are linear-time fixed-parameter tractable, that is, the complexity grows linearly with the number of components (i.e. the size of the system) but exponentially in the size of the biggest component.
Although explicit techniques can be used to solve the subproblems, using symbolic representations is recommended in order to scale up to more reasonable instances.
