What does this sentence mean "It is NP-hard to approximate the Max-3-DM with bound 2 "? - complexity-theory

It has been shown that it is NP-hard to approximate the maximum 3-dimensional matching problem (Max-3-DM) to within 95/94, and that this result applies to instances with exactly two occurrences of each element.
Does this mean that Max-3-DM, with the bound 2 on the number of occurrences of each element in triples, is NP-hard?
I have found a polynomial reduction from Max-3-DM with bound 2 to my problem. Can I say that my problem is NP-hard?

If it is indeed NP-hard to approximate MAX-3DM with exactly two occurrences of each element to within 95/94, then it's NP-hard to solve MAX-3DM with exactly two occurrences of each element. Specifically, if you could solve the problem exactly, then you would have an approximation better than 95/94 of the optimal solution (namely, you'd have something that was exactly accurate).
Generally speaking, if it's NP-hard to approximate a problem to within a factor of 1+ε, it's NP-hard to solve it exactly because an exact solution is essentially a 1-approximation of the true answer.

From what I understand, this sentence means that the approximation problem is NP-hard. It says nothing about Max-3-DM problem itself.
Regardless of that, in order to prove that your problem is NP-hard, you have to reduce some NP-complete problem to your problem. So even if Max-3-DM is NP-hard, a reduction to the Max-3-DM problem is not enough. You would have to reduce Max-3-DM to your problem (that is, in the opposite direction), and Max-3-DM would have to be NP-complete.

Related

NP-hardness. Is it average case or worst-case?

Do we measure the NP-hardness in terms of average-case hardness or worst-case hardness?
I've found this here:
"However, NP-completeness is defined in terms of worst-case complexity".
Does it remain true to NP-hardness?
I don't know what the term "worst-case complexity" means. What is the difference between worst-case complexity and worst-case problems?
An interesting nuance here is that NP-hardness, by itself, doesn’t speak about worst-case or average case complexity. Rather, the formal definition of NP-hardness purely says that there’s a polynomial-time reduction from every problem in NP to any NP-hard problem. That reduction means that any instance of any problem in NP could be solved by applying the reduction and then solving the NP-hard problem. But because this applies to “any instance” and the specific transform done by the reduction isn’t specified, that definition by itself doesn’t say anything about average-case complexity.
We can artificially construct NP-hard problems that are extremely easy to solve on average. Here’s an example. Take an NP-hard problem - say, the problem of checking whether a graph is 3-colorable. We can solve this in time (roughly) O(3^n) by simply trying all possible colorings. (The actual time complexity is a bit higher because we need to check edges in each step, but let’s ignore that for now). Now, we’ll invent a contrived problem of the following form:
Given a string of 0s, 1s, and 2s, determine whether
The first half of the string contains a 1 or a 2, or
Whether it doesn’t and the back half of the string is a base-3 encoding of a graph that’s 3-colorable.
This problem is NP-hard because we can reduce graph 3-colorability to it by just prepending a bunch of 0s to any input instance of 3-colorability. But on average it’s very easy to solve this problem. The probability that a string’s first half is all 0s is 1 / 3^(n/2), where n is the length of the string. This means that even if it takes O(3^(n/2)) time to check the coloring of the graph encoded in the back half of a suitable string, mathematically the average amount of work required to solve this problem is O(1). (I’m aware I’m conflating the meaning of n as “the number of nodes in a graph” and “how long the string is,” but the math still checks out here.)
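The averaging argument above can be written out as a short calculation. This is only a sketch under the answer's own simplifications: the input is a uniformly random string over {0, 1, 2}, c_1 is an assumed constant bounding the expected cost of scanning the first half until a nonzero symbol is found, and c_2 · 3^(n/2) is an assumed bound on the brute-force coloring check, with polynomial factors ignored as in the text.

```latex
\mathbb{E}[T]
  \;\le\; c_1 + \Pr[\text{first half is all 0s}] \cdot c_2 \, 3^{n/2}
  \;=\; c_1 + 3^{-n/2} \cdot c_2 \, 3^{n/2}
  \;=\; c_1 + c_2
  \;=\; O(1).
```

The exponentially expensive branch is taken with exponentially small probability, so the two factors cancel.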
What’s worrisome is that we still don’t have a very well-developed theory of average-case complexity for NP-hard problems. Some NP-hard problems, like the one above, are very easy on average. But others like SAT, graph coloring, etc. are mysteries to us, where we legitimately don’t know how hard they are for random instances. It’s entirely possible, for example, that P ≠ NP and yet the average-case hardness of individual NP-hard problems is not known.

Dynamic Programming : Why the need for optimal sub structure

I was revisiting my notes on Dynamic Programming. It's basically a memoized recursion technique, which stores away the solutions to smaller subproblems for later reuse in computing solutions to relatively larger subproblems.
The question I have is that in order to apply DP to a recursive problem, it must have an optimal substructure. This basically necessitates that an optimal solution to a problem contains optimal solution to subproblems.
Is it possible otherwise ? I mean have you ever seen a case where optimal solution to a problem does not contain an optimal solution to subproblems.
Please share some examples, if you know any, to deepen my understanding.
In dynamic programming, a given problem has the optimal substructure property if an optimal solution of the given problem can be obtained by using optimal solutions of its subproblems.
For example, the shortest path problem has the following optimal substructure property: if a node X lies on the shortest path from a source node U to a destination node V, then the shortest path from U to V is the combination of the shortest path from U to X and the shortest path from X to V.
But the longest path problem doesn’t have the optimal substructure property,
i.e. the longest simple path between two nodes doesn't have to be made up of the longest simple paths between the intermediate nodes.
For example, the longest path q->r->t is not a combination of the longest path from q to r and the longest path from r to t, because the longest path from q to r is q->s->t->r.
So here, an optimal solution to a problem does not contain optimal solutions to the subproblems.
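The q, r, s, t example can be checked by brute force. The answer does not show its graph, so the sketch below assumes one that matches the paths it mentions: an undirected 4-cycle q - r - t - s - q. The names `EDGES`, `longest_simple_path`, and `edges_in` are made up for this illustration.

```python
from itertools import permutations

# Assumed example graph (not given in the answer): the 4-cycle q - r - t - s - q.
EDGES = {("q", "r"), ("r", "t"), ("t", "s"), ("s", "q")}
NODES = "qrst"


def adjacent(u, v):
    """True if u and v are joined by an (undirected) edge."""
    return (u, v) in EDGES or (v, u) in EDGES


def longest_simple_path(src, dst):
    """Brute force: try every ordering of every subset of intermediate nodes."""
    best = None
    others = [x for x in NODES if x not in (src, dst)]
    for k in range(len(others) + 1):
        for middle in permutations(others, k):
            path = (src, *middle, dst)
            if all(adjacent(a, b) for a, b in zip(path, path[1:])):
                if best is None or len(path) > len(best):
                    best = path
    return best


def edges_in(path):
    """Length of a path measured in edges."""
    return len(path) - 1


q_to_t = longest_simple_path("q", "t")
q_to_r = longest_simple_path("q", "r")
r_to_t = longest_simple_path("r", "t")
print(q_to_t, q_to_r, r_to_t)
```

On this graph the longest simple path from q to t has 2 edges, while the longest simple paths from q to r and from r to t have 3 edges each. Gluing the two optimal sub-paths together would revisit nodes, which is why they cannot be combined into a valid answer.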
For more details you can read
Longest path problem from wikipedia
Optimal substructure from wikipedia
You're perfectly right that the definitions are imprecise. DP is a technique for getting algorithmic speedups rather than an algorithm in itself. The term "optimal substructure" is a vague concept. (You're right again here!) To wit, every loop can be expressed as a recursive function: each iteration solves a subproblem for the successive one. Is every algorithm with a loop a DP? Clearly not.
What people actually mean by "optimal substructure" and "overlapping subproblems" is that subproblem results are used often enough to decrease the asymptotic complexity of solutions. In other words, the memoization is useful! In most cases the subtle implication is a decrease from exponential to polynomial time, O(n^k) to O(n^p), p<k or similar.
Ex: There is an exponential number of paths between two nodes in a dense graph. DP finds the shortest path looking at only a polynomial number of them because the memos are extremely useful in this case.
On the other hand, Traveling salesman can be expressed as a memoized function (e.g. see this discussion), where the memos cause a factor of O( (1/2)^n ) time to be saved. But, the number of TS paths through n cities, is O(n!). This is so much bigger that the asymptotic run time is still super-exponential: O(n!)/O(2^n) = O(n!). Such an algorithm is generally not called a Dynamic Program even though it's following very much the same pattern as the DP for shortest paths. Apparently it's only a DP if it gives a nice result!
To my understanding, this 'optimal substructure' property is necessary not only for Dynamic Programming, but to obtain a recursive formulation of the solution in the first place. Note that in addition to the Wikipedia article on Dynamic Programming, there is a separate article on the optimal substructure property. To make things even more involved, there is also an article about the Bellman equation.
You could try to solve the Traveling Salesman Problem by choosing the nearest city at each step, but that method is wrong.
The whole idea is to narrow down the problem into the relatively small set of the candidates for optimal solution and use "brute force" to solve it.
So it better be that solutions of the smaller sub-problem should be sufficient to solve the bigger problem.
This is expressed via a recursion as function of the optimal solution of smaller sub-problems.
answering this question:
Is it possible otherwise ? I mean have you ever seen a case where
optimal solution to a problem does not contain an optimal solution to
subproblems.
No, it is not possible, and this can even be proven.
You can try to implement dynamic programming on any recursive problem but you will not get any better result if it doesn't have optimal substructure property. In other words dynamic programming methodology is not useful to implement on a problem which doesn't have optimal substructure property.

I need to solve an NP-hard problem. Is there hope?

There are a lot of real-world problems that turn out to be NP-hard. If we assume that P ≠ NP, there aren't any polynomial-time algorithms for these problems.
If you have to solve one of these problems, is there any hope that you'll be able to do so efficiently? Or are you just out of luck?
If a problem is NP-hard, under the assumption that P ≠ NP there is no algorithm that is
deterministic,
exactly correct on all inputs all the time, and
efficient on all possible inputs.
If you absolutely need all of the above guarantees, then you're pretty much out of luck. However, if you're willing to settle for a solution to the problem that relaxes some of these constraints, then there very well still might be hope! Here are a few options to consider.
Option One: Approximation Algorithms
If a problem is NP-hard and P ≠ NP, it means that there is no algorithm that will always efficiently produce the exactly correct answer on all inputs. But what if you don't need the exact answer? What if you just need answers that are close to correct? In some cases, you may be able to combat NP-hardness by using an approximation algorithm.
For example, a canonical example of an NP-hard problem is the traveling salesman problem. In this problem, you're given as input a complete graph representing a transportation network. Each edge in the graph has an associated weight. The goal is to find a cycle that goes through every node in the graph exactly once and which has minimum total weight. In the case where the edge weights satisfy the triangle inequality (that is, the best route from point A to point B is always to follow the direct link from A to B), you can get back a cycle whose cost is at most 3/2 times optimal by using the Christofides algorithm.
As another example, the 0/1 knapsack problem is known to be NP-hard. In this problem, you're given a bag and a collection of objects with different weights and values. The goal is to pack the maximum value of objects into the bag without exceeding the bag's weight limit. Even though computing an exact answer requires exponential time in the worst case, it's possible to approximate the correct answer to an arbitrary degree of precision in polynomial time. (The algorithm that does this is called a fully polynomial-time approximation scheme or FPTAS).
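To make the idea of trading precision for speed concrete, here is a minimal sketch of one standard way to build such a scheme: round the values down to a coarse grid, then run an exact dynamic program over the rounded values. The function name `knapsack_fptas` and its parameters are made up for this illustration, and the sketch assumes positive integer values and weights and that every item fits in the bag on its own.

```python
def knapsack_fptas(values, weights, capacity, eps):
    """Return the total value of a packing worth at least (1 - eps) times the optimum.

    Assumes positive integer values/weights and that each item fits on its own.
    """
    n = len(values)
    scale = eps * max(values) / n              # rounding granularity
    scaled = [int(v // scale) for v in values]

    # best[s] = (lightest weight, true value) among subsets whose scaled value is exactly s
    best = {0: (0, 0)}
    for v, sv, w in zip(values, scaled, weights):
        # Iterate over a snapshot so each item is used at most once.
        for s, (bw, bv) in list(best.items()):
            weight, value = bw + w, bv + v
            if weight <= capacity and (s + sv not in best or weight < best[s + sv][0]):
                best[s + sv] = (weight, value)
    return best[max(best)][1]


approx = knapsack_fptas([60, 100, 120], [10, 20, 30], capacity=50, eps=0.1)
print(approx)
```

There are at most about n^2 / eps distinct scaled totals, so the table stays polynomial in both n and 1/eps. A smaller eps gives a finer grid, a better answer, and a slower run.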
Unfortunately, we do have some theoretical limits on the approximability of certain NP-hard problems. The Christofides algorithm mentioned earlier gives a 3/2 approximation to TSP where the edges obey the triangle inequality, but interestingly enough it's possible to show that if P ≠ NP, there is no polynomial-time approximation algorithm for general TSP (where the edge weights need not obey the triangle inequality) that can get within any constant factor of optimal. Usually, you need to do some research to learn which problems can be well-approximated and which ones can't, since many NP-hard problems can be approximated well and many can't. There doesn't seem to be a unified theme.
Option Two: Heuristics
In many NP-hard problems, standard approaches like greedy algorithms won't always produce the right answer, but often do reasonably well on "reasonable" inputs. In many cases, it's reasonable to attack NP-hard problems with heuristics. The exact definition of a heuristic varies from context to context, but typically a heuristic is either an approach to a problem that "often" gives back good answers at the cost of sometimes giving back wrong answers, or is a useful rule of thumb that helps speed up searches even if it might not always guide the search the right way.
As an example of the first type of heuristic, let's look at the graph-coloring problem. This NP-hard problem asks, given a graph, to find the minimum number of colors necessary to paint the nodes in the graph such that no edge's endpoints are the same color. This turns out to be a particularly tough problem to solve with many other approaches (the best known approximation algorithms have terrible bounds, and it's not suspected to have a parameterized efficient algorithm). However, there are many heuristics for graph coloring that do quite well in practice. Many greedy coloring heuristics exist for assigning colors to nodes in a reasonable order, and these heuristics often do quite well in practice. Unfortunately, sometimes these heuristics give terrible answers back, but provided that the graph isn't pathologically constructed the heuristics often work just fine.
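A minimal sketch of one such greedy heuristic follows: visit the nodes in some order and give each the smallest color not already used by a colored neighbor. The function names and the example graph are made up for this illustration. The graph is bipartite, so two colors suffice, but an unlucky visiting order makes the heuristic use three.

```python
def greedy_coloring(adj, order):
    """Color nodes in the given order with the smallest color unused by colored neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def num_colors(coloring):
    return len(set(coloring.values()))


def is_proper(adj, coloring):
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])


# Illustrative bipartite graph: a_i is joined to b_j whenever i != j.
adj = {
    "a0": ["b1", "b2"], "a1": ["b0", "b2"], "a2": ["b0", "b1"],
    "b0": ["a1", "a2"], "b1": ["a0", "a2"], "b2": ["a0", "a1"],
}

good = greedy_coloring(adj, ["a0", "a1", "a2", "b0", "b1", "b2"])
bad = greedy_coloring(adj, ["a0", "b0", "a1", "b1", "a2", "b2"])
print(num_colors(good), num_colors(bad))
```

Both results are valid colorings. The difference is only in how many colors are spent, which is why the visiting order is where most of the heuristic design effort goes.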
As an example of the second type of heuristic, it's helpful to look at SAT solvers. SAT, the Boolean satisfiability problem, was the first problem proven to be NP-hard. The problem asks, given a propositional formula (often written in conjunctive normal form), to determine whether there is a way to assign values to the variables such that the overall formula evaluates to true. Modern SAT solvers are getting quite good at solving SAT in many cases by using heuristics to guide their search over possible variable assignments. One famous SAT-solving algorithm, DPLL, essentially tries all possible assignments to see if the formula is satisfiable, using heuristics to speed up the search. For example, if it finds that a variable appears only positively or only negatively in the remaining clauses, DPLL will assign that variable the value that satisfies those clauses before trying other variables. DPLL also finds unit clauses (clauses with just one literal) and sets those variables' values before trying other variables. The net effect of these heuristics is that DPLL ends up being very fast in practice, even though it's known to have exponential worst-case behavior.
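The two rules described above can be sketched in a few lines. This is a bare-bones illustration, not how production solvers are written. Clauses are lists of nonzero integers, where `3` means variable 3 and `-3` means its negation; that encoding and the names `simplify`, `dpll`, and `satisfies` are choices made for this sketch.

```python
def simplify(clauses, lit):
    """Assume `lit` is true: drop satisfied clauses, delete the falsified literal."""
    return [[x for x in c if x != -lit] for c in clauses if lit not in c]


def dpll(clauses, assignment=None):
    """Return a satisfying assignment {variable: bool}, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                      # an empty clause cannot be satisfied
        if not clauses:
            return assignment                # every clause is satisfied
        literals = {x for c in clauses for x in c}
        unit = next((c[0] for c in clauses if len(c) == 1), None)       # unit clause rule
        pure = next((x for x in literals if -x not in literals), None)  # pure literal rule
        forced = unit if unit is not None else pure
        if forced is None:
            break
        assignment[abs(forced)] = forced > 0
        clauses = simplify(clauses, forced)
    lit = clauses[0][0]                      # no forced move left: branch on a literal
    for choice in (lit, -lit):
        result = dpll(simplify(clauses, choice), {**assignment, abs(choice): choice > 0})
        if result is not None:
            return result
    return None


def satisfies(clauses, assignment):
    """Check an assignment; variables left unassigned default to False."""
    return all(any(assignment.get(abs(x), False) == (x > 0) for x in c) for c in clauses)


formula = [[1, 2], [-1, 3], [-2, -3], [1]]
model = dpll(formula)
print(model, satisfies(formula, model))
```

Variables missing from the returned assignment can take either value, because every clause was already satisfied by the time the search stopped.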
Option Three: Pseudopolynomial-Time Algorithms
If P ≠ NP, then no NP-hard problem can be solved in polynomial time. However, in some cases, the definition of "polynomial time" doesn't necessarily match the standard intuition of polynomial time. Formally speaking, polynomial time means polynomial in the number of bits necessary to specify the input, which doesn't always sync up with what we consider the input to be.
As an example, consider the set partition problem. In this problem, you're given a set of numbers and need to determine whether there's a way to split the set into two smaller sets, each of which has the same sum. The naive solution to this problem runs in time O(2^n) and works by just brute-force testing all subsets. With dynamic programming, though, it's possible to solve this problem in time O(nN), where n is the number of elements in the set and N is the maximum value in the set. Technically speaking, the runtime O(nN) is not polynomial time because the numeric value N is written out in only log_2 N bits, but assuming that the numeric value of N isn't too large, this is a perfectly reasonable runtime.
This algorithm is called a pseudopolynomial-time algorithm because the runtime O(nN) "looks" like a polynomial, but technically speaking is exponential in the size of the input. Many NP-hard problems, especially ones involving numeric values, admit pseudopolynomial-time algorithms and are therefore easy to solve assuming that the numeric values aren't too large.
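A minimal sketch of such a dynamic program for set partition is below, assuming the inputs are non-negative integers. The function name `can_partition` is made up for this illustration. The table has one entry per candidate sum up to half the total, so the work is governed by the magnitude of the numbers rather than by how many bits they take to write down.

```python
def can_partition(nums):
    """Decide whether nums (non-negative integers) splits into two parts with equal sums."""
    total = sum(nums)
    if total % 2:
        return False                     # an odd total can never be split evenly
    target = total // 2
    # reachable[s] is True if some subset of the numbers seen so far sums to s
    reachable = [True] + [False] * target
    for x in nums:
        # Go downwards so each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(can_partition([1, 5, 11, 5]), can_partition([2, 2, 3, 5]))
```

Doubling every number in the input doubles the table size even though the input only grows by one bit per number, which is the sense in which the runtime is exponential in the input length.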
For more information on pseudopolynomial time, check out this earlier Stack Overflow question about pseudopolynomial time.
Option Four: Randomized Algorithms
If a problem is NP-hard and P ≠ NP, then there is no deterministic algorithm that can solve that problem in worst-case polynomial time. But what happens if we allow for algorithms that introduce randomness? If we're willing to settle for an algorithm that gives a good answer on expectation, then we can often get relatively good answers to NP-hard problems in not much time.
As an example, consider the maximum cut problem. In this problem, you're given an undirected graph and want to find a way to split the nodes in the graph into two nonempty groups A and B with the maximum number of edges running between the groups. This has some interesting applications in computational physics (unfortunately, I don't understand them at all, but you can peruse this paper for some details about this). This problem is known to be NP-hard, but there's a simple randomized approximation algorithm for it. If you just toss each node into one of the two groups completely at random, you end up with a cut that, in expectation, contains at least half as many edges as the optimal cut.
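The coin-flipping algorithm is short enough to sketch directly. Each edge crosses the cut with probability 1/2, so the expected cut size is half the number of edges, and no cut can contain more than all of them. The function name `random_cut` and the example graph are made up for this illustration, and the sketch does not enforce that both groups are nonempty.

```python
import random


def random_cut(nodes, edges, rng):
    """Put each node on a random side; return the sides and the number of cut edges."""
    side = {v: rng.random() < 0.5 for v in nodes}
    cut = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut


# Illustrative graph: the complete graph on 4 nodes (6 edges).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

rng = random.Random(0)   # fixed seed so repeated runs behave the same way
trials = 4000
average = sum(random_cut(nodes, edges, rng)[1] for _ in range(trials)) / trials
print(average)           # should be close to len(edges) / 2
```

In practice you would run the random assignment several times and keep the best cut found, since any single run can fall below the average.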
Returning to SAT, many modern SAT solvers use some degree of randomness to guide the search for a satisfying assignment. The WalkSAT and GSAT algorithms, for example, work by picking a random clause that isn't currently satisfied and trying to satisfy it by flipping some variable's truth value. This often guides the search toward a satisfying assignment, causing these algorithms to work well in practice.
It turns out there's a lot of open theoretical problems about the ability to solve NP-hard problems using randomized algorithms. If you're curious, check out the complexity class BPP and the open problem of its relation to NP.
Option Five: Parameterized Algorithms
Some NP-hard problems take in multiple different inputs. For example, the long path problem takes as input a graph and a length k, then asks whether there's a simple path of length k in the graph. The subset sum problem takes in as input a set of numbers and a target number k, then asks whether there's a subset of the numbers that adds up to exactly k.
Interestingly, in the case of the long path problem, there's an algorithm (the color-coding algorithm) whose runtime is O((n^3 log n) · b^k), where n is the number of nodes, k is the length of the requested path, and b is some constant. This runtime is exponential in k, but is only polynomial in n, the number of nodes. This means that if k is fixed and known in advance, the runtime of the algorithm as a function of the number of nodes is only O(n^3 log n), which is quite a nice polynomial. Similarly, in the case of the subset sum problem, there's a dynamic programming algorithm whose runtime is O(nW), where n is the number of elements of the set and W is the maximum weight of those elements. If W is fixed in advance as some constant, then this algorithm will run in time O(n), meaning that it will be possible to exactly solve subset sum in linear time.
Both of these algorithms are examples of parameterized algorithms, algorithms for solving NP-hard problems that split the hardness of the problem into two pieces - a "hard" piece that depends on some input parameter to the problem, and an "easy" piece that scales gracefully with the size of the input. These algorithms can be useful for finding exact solutions to NP-hard problems when the parameter in question is small. The color-coding algorithm mentioned above, for example, has proven quite useful in practice in computational biology.
However, some problems are conjectured to not have any nice parameterized algorithms. Graph coloring, for example, is suspected to not have any efficient parameterized algorithms. In the cases where parameterized algorithms exist, they're often quite efficient, but you can't rely on them for all problems.
For more information on parameterized algorithms, check out this earlier Stack Overflow question.
Option Six: Fast Exponential-Time Algorithms
Exponential-time algorithms don't scale well - their runtimes approach the lifetime of the universe for inputs as small as 100 or 200 elements.
What if you need to solve an NP-hard problem, but you know the input is reasonably small - say, perhaps its size is somewhere between 50 and 70. Standard exponential-time algorithms are probably not going to be fast enough to solve these problems. What if you really do need an exact solution to the problem and the other approaches here won't cut it?
In some cases, there are "optimized" exponential-time algorithms for NP-hard problems. These are algorithms whose runtime is exponential, but not as bad an exponential as the naive solution. For example, a simple exponential-time algorithm for the 3-coloring problem (given a graph, determine if you can color the nodes one of three colors each so that no edge's endpoints are the same color) might work by checking each possible way of coloring the nodes in the graph, testing if any of them are 3-colorings. There are 3^n possible ways to do this, so in the worst case the runtime of this algorithm will be O(3^n · poly(n)) for some small polynomial poly(n). However, using more clever tricks and techniques, it's possible to develop an algorithm for 3-colorability that runs in time O(1.3289^n). This is still an exponential-time algorithm, but it's a much faster exponential-time algorithm. For example, 3^19 is about 10^9, so if a computer can do one billion operations per second, it can use our initial brute-force algorithm to (roughly speaking) solve 3-colorability in graphs with up to 19 nodes in one second. Using the O(1.3289^n)-time exponential algorithm, we could solve instances of up to about 73 nodes in about a second. That's a huge improvement - we've grown the size we can handle in one second by more than a factor of three!
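The back-of-the-envelope numbers in this comparison are easy to reproduce. The sketch below solves base^n = budget for n, ignoring polynomial factors and constants as the paragraph does. The one-billion-operations budget comes from the text, and the helper name `largest_n` is made up for this illustration.

```python
import math

OPS_PER_SECOND = 1e9   # budget assumed in the paragraph above


def largest_n(base, budget=OPS_PER_SECOND):
    """Real-valued n with base**n == budget (polynomial factors ignored)."""
    return math.log(budget) / math.log(base)


brute_force = largest_n(3)        # try all 3-colorings
clever = largest_n(1.3289)        # the faster exponential algorithm
print(round(brute_force), round(clever), clever / brute_force)
```

Shrinking the base of the exponential multiplies the reachable input size by a constant factor, whereas buying a faster machine only adds a constant to it.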
As another famous example, consider the traveling salesman problem. There's an obvious O(n! · poly(n))-time solution to TSP that works by enumerating all permutations of the nodes and testing the paths resulting from those permutations. However, by using a dynamic programming algorithm similar to that used by the color-coding algorithm, it's possible to improve the runtime to "only" O(n^2 2^n). Given that 12! is about half a billion, the naive solution would let you solve TSP for 12-node graphs in under a second (ignoring the polynomial factor). For comparison, the DP solution lets you solve TSP on graphs of about 21 nodes in roughly one second, since 21^2 · 2^21 is a little under one billion.
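One well-known dynamic program with this O(n^2 2^n) running time is the Held-Karp algorithm. The answer does not name the algorithm it has in mind, so treat that identification as an assumption. The sketch below stores, for every subset of cities and every end city in it, the cheapest path that starts at city 0 and visits exactly that subset. The function name `held_karp` and the distance matrix are made up for this illustration.

```python
from itertools import combinations


def held_karp(dist):
    """Cost of the cheapest tour visiting every city once, for a square distance matrix."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: cheapest path from city 0 through exactly the cities in `mask`
    # (a bitmask over cities 1..n-1), ending at city j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)          # same subset without the end city
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2                          # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(held_karp(dist))   # 80, via the tour 0 -> 1 -> 3 -> 2 -> 0
```

The table has about n · 2^n entries, so memory runs out well before time does on larger inputs. That is the usual practical limit of this approach.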
These fast exponential-time algorithms are often useful for boosting the size of the inputs that can be exactly solved in practice. Of course, they still run in exponential time, so these approaches are typically not useful for solving very large problem instances.
Option Seven: Solve an Easy Special Case
Many problems that are NP-hard in general have restricted special cases that are known to be solvable efficiently. For example, while in general it’s NP-hard to determine whether a graph has a k-coloring, in the specific case of k = 2 this is equivalent to checking whether a graph is bipartite, which can be checked in linear time using a modified depth-first search. Boolean satisfiability is, generally speaking, NP-hard, but it can be solved in polynomial time if you have an input formula with at most two literals per clause, or where the formula is formed from clauses using XOR rather than inclusive-OR, etc. Finding the largest independent set in a graph is generally speaking NP-hard, but if the graph is bipartite this can be done efficiently due to König’s theorem.
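The k = 2 case can be sketched as a depth-first traversal that alternates colors and stops as soon as an edge joins two nodes of the same color. The function name `two_color` and the example graphs are made up for this illustration, and the graph is assumed to be given as an adjacency dictionary listing each edge in both directions.

```python
def two_color(adj):
    """Return a 2-coloring {node: 0 or 1} if the graph is bipartite, else None."""
    color = {}
    for start in adj:                    # loop handles disconnected graphs
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return None          # odd cycle found: not 2-colorable
    return color


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(two_color(square), two_color(triangle))
```

Every node and edge is touched a constant number of times, which gives the linear running time mentioned above.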
As a result, if you find yourself needing to solve what might initially appear to be an NP-hard problem, first check whether the inputs you actually need to solve that problem on have some additional restricted structure. If so, you might be able to find an algorithm that applies to your special case and runs much faster than a solver for the problem in its full generality.
Conclusion
If you need to solve an NP-hard problem, don't despair! There are lots of great options available that might make your intractable problem a lot more approachable. No one of the above techniques works in all cases, but by using some combination of these approaches, it's usually possible to make progress even when confronted with NP-hardness.

Can 1 approximation algorithm be used for multiple NP-Hard problems?

Since any NP-hard problem can be reduced to any other NP-hard problem by a mapping, my question goes one step further:
for example, could every step of an approximation algorithm for one problem also be mapped to the other NP-hard problem?
Thanks in advance
From http://en.wikipedia.org/wiki/Approximation_algorithm we see that
NP-hard problems vary greatly in their approximability; some, such as the bin packing problem, can be approximated within any factor greater than 1 (such a family of approximation algorithms is often called a polynomial time approximation scheme or PTAS). Others are impossible to approximate within any constant, or even polynomial factor unless P = NP, such as the maximum clique problem.
(end quote)
It follows from this that a good approximation for one NP-complete problem is not necessarily a good approximation for another NP-complete problem. If it were, we could use easily approximated NP-complete problems to obtain good approximation algorithms for all other NP-complete problems. That is not the case, because there are hard-to-approximate NP-complete problems.
When proving a problem is NP-Hard, we usually consider the decision version of the problem, whose output is either yes or no. However, when considering approximation algorithms, we consider the optimization version of the problem.
If you use one problem's approximation algorithm to solve another problem by using the reduction from the NP-hardness proof, the approximation ratio may change. For example, if you have a 2-approximation algorithm for problem A and you use it to solve problem B, then you may get an O(n)-approximation algorithm for problem B, since the reduction does not preserve the approximation ratio. Hence, if you want to use an approximation algorithm for one problem to solve another problem, you need to ensure that the reduction will not change the approximation ratio too much in order to get a useful algorithm. For example, you can use an L-reduction or a PTAS reduction.

NP-Hard solution question

I have an NP-hard problem. Let's imagine I have found a polynomial algorithm that finds ONLY one of the many existing solutions of that problem, but always at least one solution (if one is present in the problem). Would that algorithm be considered a resolution of the P = NP question (if the algorithm were turned into a mathematical proof)?
Thanks for answers
NP is a class of decision problems. Your algorithm should answer "yes" or "no" correctly to all possible instances (questions).
For example, the problem: "given graph G and number k, does G contain a clique of size >= k" is NP-hard. If you have a polynomial time algorithm that answers "yes" or "no" correctly each time, then it is a valid proof of P=NP. The algorithm doesn't need to explicitly show the clique - only answer if it exists for all possible G and k.
If you find an NP-hard problem and you can detect some cases that you can solve in polynomial time (leaving others for exponential time), then only if the fraction of cases remaining is on the order of log(N)/N will you change the order of the entire problem, and even then only if you can restrict your exponential case to examining only log(N), not all N, possibilities.
Also, if you find an NP-hard problem where you think you can solve every case in polynomial time, you have probably made a mistake, either in posing the NP-hard problem correctly or in finding the more troublesome examples. Try a larger test set before believing yourself!
