I would like to quote from Wikipedia
In mathematics, the minimum k-cut is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph into k connected components.
It is said to be the minimum cut if the set of edges is minimal.
For k = 2, it would mean finding the set of edges whose removal would disconnect the graph into 2 connected components.
However, the same Wikipedia article says that:
For a fixed k, the problem is polynomial time solvable in O(|V|^(k^2))
My question is: does this mean that minimum 2-cut is a problem that belongs to complexity class P?
The min-cut problem is solvable in polynomial time, so yes, it belongs to complexity class P. Another article related to this particular problem is the Max-flow min-cut theorem.
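As a rough illustration (a minimal Python sketch, assuming the NetworkX library is installed and reasonably recent), the global minimum 2-cut of a small weighted undirected graph can be computed in polynomial time with the Stoer-Wagner algorithm; the s-t variant can likewise be obtained from a maximum-flow computation, by the max-flow min-cut theorem.

import networkx as nx

# Build a small weighted 4-cycle: a-b (3), b-c (1), c-d (4), d-a (2).
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 1),
                           ("c", "d", 4), ("d", "a", 2)])

# Global minimum cut (minimum 2-cut) of the weighted graph.
cut_value, (side_a, side_b) = nx.stoer_wagner(G)
print(cut_value, side_a, side_b)   # expected: 3, separating {a, b} from {c, d}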
First of all, the time complexity of an algorithm should be evaluated by expressing the number of steps the algorithm requires to finish as a function of the length of the input (see Time complexity). More or less formally: if you vary the length of the input, how does the number of steps required by the algorithm to finish vary?
Second of all, the time complexity of an algorithm is not exactly the same thing as the complexity class of the problem that algorithm solves. For one problem there can be multiple algorithms that solve it. The primality test problem (i.e. testing whether a number is prime or not) is in P, but some (most) of the algorithms used in practice are actually not polynomial.
Third of all, for most algorithms you'll find on the Internet, the time complexity is not evaluated by definition (i.e. not as a function of the length of the input, at least not expressed directly as such). Let's take the good old naive primality test algorithm (the one in which you take n as input and check for divisibility by 2, 3, ..., n-1). How many steps does this algorithm take? One way to put it is O(n) steps. This is correct. So is this algorithm polynomial? Well, it is linear in n, so it is polynomial in n. But if you take a look at what time complexity means, the algorithm is actually exponential. First, what is the length of the input to your problem? If you provide the input n as an array of bits (the usual in practice), then the length of the input is, roughly speaking, L = log n. Your algorithm thus takes O(n) = O(2^(log n)) = O(2^L) steps, so it is exponential in L. So the naive primality test is at the same time linear in n and exponential in the length of the input L. Both are correct. By the way, the AKS primality test algorithm is polynomial in the size of the input (thus, the primality test problem is in P).
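To make the distinction concrete, here is a small Python sketch of that naive test; the loop body runs O(n) times, which is O(2^L) times when measured against the bit length L of the input.

def is_prime_naive(n):
    # Try every candidate divisor from 2 up to n-1: about n iterations,
    # which is exponential in the bit length L = n.bit_length().
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

n = 104729   # a prime
print(is_prime_naive(n), "bit length of the input:", n.bit_length())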
Fourth of all, what is P in the first place? Well, it is a class of problems that contains all decision problems that can be solved in polynomial time. What is a decision problem? A problem that can be answered with yes or no. Check these two Wikipedia pages for more details: P (complexity) and decision problems.
Coming back to your question, the answer is no (but pretty close to yes :p). The minimum 2-cut problem is in P if formulated as a decision problem (your formulation requires an answer that is not just yes or no). At the same time, the algorithm that solves the problem in O(|V|^4) steps is a polynomial algorithm in the size of the input. Why? Well, the input to the problem is the graph (i.e. vertices, edges and weights); to keep it simple, let's assume we use an adjacency/weight matrix (i.e. the length of the input is at least quadratic in |V|). So solving the problem in O(|V|^4) steps is polynomial in the size of the input. The algorithm that accomplishes this is a proof that the minimum 2-cut problem (if formulated as a decision problem) is in P.
A class related to P is FP and your problem (as you formulated it) belongs to this class.
There are a lot of real-world problems that turn out to be NP-hard. If we assume that P ≠ NP, there aren't any polynomial-time algorithms for these problems.
If you have to solve one of these problems, is there any hope that you'll be able to do so efficiently? Or are you just out of luck?
If a problem is NP-hard, under the assumption that P ≠ NP there is no algorithm that is
deterministic,
exactly correct on all inputs all the time, and
efficient on all possible inputs.
If you absolutely need all of the above guarantees, then you're pretty much out of luck. However, if you're willing to settle for a solution to the problem that relaxes some of these constraints, then there very well still might be hope! Here are a few options to consider.
Option One: Approximation Algorithms
If a problem is NP-hard and P ≠ NP, it means that there is no algorithm that will always efficiently produce the exactly correct answer on all inputs. But what if you don't need the exact answer? What if you just need answers that are close to correct? In some cases, you may be able to combat NP-hardness by using an approximation algorithm.
For example, a canonical example of an NP-hard problem is the traveling salesman problem. In this problem, you're given as input a complete graph representing a transportation network. Each edge in the graph has an associated weight. The goal is to find a cycle that goes through every node in the graph exactly once and which has minimum total weight. In the case where the edge weights satisfy the triangle inequality (that is, the best route from point A to point B is always to follow the direct link from A to B), you can get back a cycle whose cost is at most 3/2 times the optimum by using the Christofides algorithm.
As another example, the 0/1 knapsack problem is known to be NP-hard. In this problem, you're given a bag and a collection of objects with different weights and values. The goal is to pack the maximum value of objects into the bag without exceeding the bag's weight limit. Even though computing an exact answer requires exponential time in the worst case, it's possible to approximate the correct answer to an arbitrary degree of precision in polynomial time. (The algorithm that does this is called a fully polynomial-time approximation scheme or FPTAS).
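As a rough illustration of what such a scheme looks like, here is a Python sketch of a value-scaling FPTAS for 0/1 knapsack (assuming positive values and weights, and that every single item fits in the bag on its own); shrinking eps gives better accuracy at the cost of a larger DP table.

def knapsack_fptas(values, weights, capacity, eps):
    # Round values down to a coarse grid, then run an exact DP over scaled value.
    # The returned subset fits in the bag and has value >= (1 - eps) * OPT.
    n = len(values)
    K = eps * max(values) / n                 # grid step: n items lose at most eps * OPT in total
    scaled = [int(v / K) for v in values]
    total = sum(scaled)
    INF = float("inf")

    # dp[i][v] = minimum weight of a subset of items 0..i-1 with scaled value exactly v
    dp = [[INF] * (total + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(1, n + 1):
        s, w = scaled[i - 1], weights[i - 1]
        for v in range(total + 1):
            dp[i][v] = dp[i - 1][v]                          # skip item i-1
            if v >= s and dp[i - 1][v - s] + w < dp[i][v]:
                dp[i][v] = dp[i - 1][v - s] + w              # take item i-1

    best_v = max(v for v in range(total + 1) if dp[n][v] <= capacity)

    chosen, v = [], best_v                    # walk the table backwards to recover the items
    for i in range(n, 0, -1):
        if dp[i][v] != dp[i - 1][v]:          # the optimum at this cell came from taking item i-1
            chosen.append(i - 1)
            v -= scaled[i - 1]
    return chosen

items = [(60, 10), (100, 20), (120, 30)]      # (value, weight) pairs
vals, wts = zip(*items)
print(knapsack_fptas(list(vals), list(wts), capacity=50, eps=0.1))   # [2, 1]: value 220, weight 50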
Unfortunately, we do have some theoretical limits on the approximability of certain NP-hard problems. The Christofides algorithm mentioned earlier gives a 3/2 approximation to TSP where the edges obey the triangle inequality, but interestingly enough it's possible to show that if P ≠ NP, there is no polynomial-time approximation algorithm for general TSP (without the triangle inequality) that can get within any constant factor of optimal. Usually, you need to do some research to learn more about which problems can be well-approximated and which ones can't, since many NP-hard problems can be approximated well and many can't. There doesn't seem to be a unified theme.
Option Two: Heuristics
For many NP-hard problems, standard approaches like greedy algorithms won't always produce the right answer, but often do reasonably well on "reasonable" inputs. In many cases, it's reasonable to attack NP-hard problems with heuristics. The exact definition of a heuristic varies from context to context, but typically a heuristic is either an approach to a problem that "often" gives back good answers at the cost of sometimes giving back wrong answers, or a useful rule of thumb that helps speed up searches even if it might not always guide the search the right way.
As an example of the first type of heuristic, let's look at the graph-coloring problem. This NP-hard problem asks, given a graph, to find the minimum number of colors necessary to paint the nodes in the graph such that no edge's endpoints are the same color. This turns out to be a particularly tough problem to solve with many other approaches (the best known approximation algorithms have terrible bounds, and it's not suspected to have an efficient parameterized algorithm). However, there are many heuristics for graph coloring that do quite well in practice. Many greedy coloring heuristics exist for assigning colors to nodes in a reasonable order, and these heuristics often do quite well in practice. Unfortunately, sometimes these heuristics give terrible answers back, but provided that the graph isn't pathologically constructed the heuristics often work just fine.
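As a sketch of what one such greedy heuristic can look like in Python (this is just one of many variants): process the vertices in order of decreasing degree and give each one the smallest color not already used by a neighbor. The coloring is always valid, but it isn't guaranteed to use the minimum number of colors.

def greedy_coloring(adj):
    # adj: dict mapping each vertex to the set of its neighbors.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)   # highest degree first
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:                      # smallest color not used by any neighbor
            c += 1
        color[v] = c
    return color

# A 5-cycle needs 3 colors (it's an odd cycle); here the heuristic happens to find that.
cycle5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
coloring = greedy_coloring(cycle5)
print(coloring, "colors used:", len(set(coloring.values())))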
As an example of the second type of heuristic, it's helpful to look at SAT solvers. SAT, the Boolean satisfiability problem, was the first problem proven to be NP-complete. The problem asks, given a propositional formula (often written in conjunctive normal form), to determine whether there is a way to assign values to the variables such that the overall formula evaluates to true. Modern SAT solvers are getting quite good at solving SAT in many cases by using heuristics to guide their search over possible variable assignments. One famous SAT-solving algorithm, DPLL, essentially tries all possible assignments to see if the formula is satisfiable, using heuristics to speed up the search. For example, if it finds that a variable is either always true or always false, DPLL will try assigning that variable its forced value before trying other variables. DPLL also finds unit clauses (clauses with just one literal) and sets those variables' values before trying other variables. The net effect of these heuristics is that DPLL ends up being very fast in practice, even though it's known to have exponential worst-case behavior.
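The following Python sketch shows the skeleton of that idea (unit propagation plus branching) on CNF formulas written DIMACS-style, with each clause a list of signed integers (3 means x3, -3 means NOT x3). Real solvers add far more machinery on top (clause learning, careful branching heuristics, restarts, and so on).

def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})

    # Unit propagation: simplify under the current assignment and assign forced literals.
    while True:
        simplified = []
        unit = None
        for clause in clauses:
            remaining = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    remaining.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not remaining:
                return None                    # an empty clause: conflict, backtrack
            if len(remaining) == 1 and unit is None:
                unit = remaining[0]
            simplified.append(remaining)
        clauses = simplified
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0       # forced value, no branching needed

    if not clauses:
        return assignment                      # all clauses satisfied

    # Branch on some unassigned variable, trying True first, then False.
    var = abs(clauses[0][0])
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))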
Option Three: Pseudopolynomial-Time Algorithms
If P ≠ NP, then no NP-hard problem can be solved in polynomial time. However, in some cases, the definition of "polynomial time" doesn't necessarily match the standard intuition of polynomial time. Formally speaking, polynomial time means polynomial in the number of bits necessary to specify the input, which doesn't always sync up with what we consider the input to be.
As an example, consider the set partition problem. In this problem, you're given a set of numbers and need to determine whether there's a way to split the set into two smaller sets, each of which has the same sum. The naive solution to this problem runs in time O(2^n) and works by just brute-force testing all subsets. With dynamic programming, though, it's possible to solve this problem in time O(n·S), where n is the number of elements in the set and S is the sum of the elements. Technically speaking, the runtime O(n·S) is not polynomial time, because the numeric value S is written out in only about log_2 S bits, but assuming that the numeric values aren't too large, this is a perfectly reasonable runtime.
This algorithm is called a pseudopolynomial-time algorithm because the runtime O(n·S) "looks" like a polynomial, but technically speaking is exponential in the size of the input. Many NP-hard problems, especially ones involving numeric values, admit pseudopolynomial-time algorithms and are therefore easy to solve assuming that the numeric values aren't too large.
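A minimal Python sketch of that dynamic program, assuming non-negative integers: it marks which subset sums are reachable and then checks whether half of the total is among them.

def can_partition(nums):
    total = sum(nums)
    if total % 2:
        return False                              # an odd total can never split evenly
    target = total // 2
    reachable = [False] * (target + 1)            # reachable[s]: some subset sums to s
    reachable[0] = True
    for x in nums:
        for s in range(target, x - 1, -1):        # go downwards so each number is used once
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]

print(can_partition([3, 1, 1, 2, 2, 1]))   # True: {3, 2} vs {1, 1, 2, 1}
print(can_partition([1, 2, 5]))            # False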
For more information on pseudopolynomial time, check out this earlier Stack Overflow question about pseudopolynomial time.
Option Four: Randomized Algorithms
If a problem is NP-hard and P ≠ NP, then there is no deterministic algorithm that can solve that problem in worst-case polynomial time. But what happens if we allow for algorithms that introduce randomness? If we're willing to settle for an algorithm that gives a good answer on expectation, then we can often get relatively good answers to NP-hard problems in not much time.
As an example, consider the maximum cut problem. In this problem, you're given an undirected graph and want to find a way to split the nodes in the graph into two nonempty groups A and B with the maximum number of edges running between the groups. This has some interesting applications in computational physics (unfortunately, I don't understand them at all, but you can peruse this paper for some details about this). This problem is known to be NP-hard, but there's a simple randomized approximation algorithm for it. If you just toss each node into one of the two groups completely at random, you end up with a cut that, on expectation, is within 50% of the optimal solution.
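Here's a tiny Python sketch of that random-assignment idea; since each edge crosses the cut with probability 1/2, the expected cut size is at least half of the optimum, and keeping the best of several random tries usually does noticeably better in practice.

import random

def random_cut(nodes, edges):
    # Toss each node into side A (True) or side B (False) uniformly at random.
    side = {v: random.random() < 0.5 for v in nodes}
    cut_size = sum(1 for u, v in edges if side[u] != side[v])
    return cut_size, side

nodes = range(5)
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
best_size, best_side = max((random_cut(nodes, edges) for _ in range(20)),
                           key=lambda result: result[0])
print(best_size, best_side)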
Returning to SAT, many modern SAT solvers use some degree of randomness to guide the search for a satisfying assignment. The WalkSAT and GSAT algorithms, for example, work by picking a random clause that isn't currently satisfied and trying to satisfy it by flipping some variable's truth value. This often guides the search toward a satisfying assignment, causing these algorithms to work well in practice.
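A bare-bones sketch of that random-walk style in Python (clauses again as lists of signed integers); real WalkSAT/GSAT implementations choose which variable to flip far more carefully and restart from fresh random assignments, but the skeleton is just this loop.

import random

def walksat(clauses, n_vars, max_flips=10000):
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not satisfied(c)]
        if not unsatisfied:
            return assign                      # found a satisfying assignment
        clause = random.choice(unsatisfied)    # pick a clause that isn't satisfied yet...
        var = abs(random.choice(clause))       # ...and flip one of its variables
        assign[var] = not assign[var]
    return None                                # gave up; this says nothing about unsatisfiability

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
print(walksat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))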
It turns out there are a lot of open theoretical problems about the ability to solve NP-hard problems using randomized algorithms. If you're curious, check out the complexity class BPP and the open problem of its relation to NP.
Option Five: Parameterized Algorithms
Some NP-hard problems take in multiple different inputs. For example, the long path problem takes as input a graph and a length k, then asks whether there's a simple path of length k in the graph. The subset sum problem takes as input a set of numbers and a target number k, then asks whether there's a subset of the numbers that adds up to exactly k.
Interestingly, in the case of the long path problem, there's an algorithm (the color-coding algorithm) whose runtime is O((n^3 log n) · b^k), where n is the number of nodes, k is the length of the requested path, and b is some constant. This runtime is exponential in k, but only polynomial in n, the number of nodes. This means that if k is fixed and known in advance, the runtime of the algorithm as a function of the number of nodes is only O(n^3 log n), which is quite a nice polynomial. Similarly, in the case of the subset sum problem, there's a dynamic programming algorithm whose runtime is O(n·k), where n is the number of elements of the set and k is the target value. If k is fixed in advance as some constant, then this algorithm runs in time O(n), meaning that it's possible to exactly solve subset sum in linear time.
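Here is a rough Python sketch of the color-coding idea for the k-path question (is there a simple path on k vertices?), assuming an adjacency-list graph. One trial colors the vertices randomly with k colors and runs a DP over sets of colors; a fixed simple k-path becomes "colorful" with probability k!/k^k per trial, so repeating trials makes the one-sided error small. The fixed trial count below is only for the toy example; in general it should grow roughly like e^k.

import random

def has_colorful_k_path(adj, k, colors):
    # paths[v] = set of color-bitmasks of colorful paths ending at vertex v
    paths = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):                       # grow the paths one vertex at a time
        new_paths = {v: set() for v in adj}
        for u in adj:
            for mask in paths[u]:
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not mask & bit:           # v's color is not yet used on this path
                        new_paths[v].add(mask | bit)
        paths = new_paths
    return any(paths[v] for v in adj)

def k_path_exists(adj, k, trials=300):
    for _ in range(trials):
        colors = {v: random.randrange(k) for v in adj}
        if has_colorful_k_path(adj, k, colors):
            return True                          # a colorful path is necessarily simple
    return False                                 # probably no k-path (one-sided error)

# A 6-cycle contains a simple path on 5 vertices but not on 7.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(k_path_exists(cycle6, 5), k_path_exists(cycle6, 7))   # True False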
Both of these algorithms are examples of parameterized algorithms, algorithms for solving NP-hard problems that split the hardness of the problem into two pieces - a "hard" piece that depends on some input parameter to the problem, and an "easy" piece that scales gracefully with the size of the input. These algorithms can be useful for finding exact solutions to NP-hard problems when the parameter in question is small. The color-coding algorithm mentioned above, for example, has proven quite useful in practice in computational biology.
However, some problems are conjectured to not have any nice parameterized algorithms. Graph coloring, for example, is suspected to not have any efficient parameterized algorithms. In the cases where parameterized algorithms exist, they're often quite efficient, but you can't rely on them for all problems.
For more information on parameterized algorithms, check out this earlier Stack Overflow question.
Option Six: Fast Exponential-Time Algorithms
Exponential-time algorithms don't scale well - their runtimes approach the lifetime of the universe for inputs as small as 100 or 200 elements.
What if you need to solve an NP-hard problem, but you know the input is reasonably small - say, perhaps its size is somewhere between 50 and 70? Standard exponential-time algorithms are probably not going to be fast enough to solve these problems. What if you really do need an exact solution to the problem and the other approaches here won't cut it?
In some cases, there are "optimized" exponential-time algorithms for NP-hard problems. These are algorithms whose runtime is exponential, but not as bad an exponential as the naive solution. For example, a simple exponential-time algorithm for the 3-coloring problem (given a graph, determine whether you can color each node one of three colors so that no edge's endpoints are the same color) might work by checking each possible way of coloring the nodes in the graph and testing whether any of them is a 3-coloring. There are 3^n possible ways to do this, so in the worst case the runtime of this algorithm will be O(3^n · poly(n)) for some small polynomial poly(n). However, using more clever tricks and techniques, it's possible to develop an algorithm for 3-colorability that runs in time O(1.3289^n). This is still an exponential-time algorithm, but it's a much faster exponential-time algorithm. For example, 3^19 is about 10^9, so if a computer can do one billion operations per second, it can use our initial brute-force algorithm to (roughly speaking) solve 3-colorability in graphs with up to 19 nodes in one second. Using the O(1.3289^n)-time exponential algorithm, we could solve instances of up to about 73 nodes in about a second. That's a huge improvement - we've grown the size we can handle in one second by more than a factor of three!
As another famous example, consider the traveling salesman problem. There's an obvious O(n! · poly(n))-time solution to TSP that works by enumerating all permutations of the nodes and testing the paths resulting from those permutations. However, by using a dynamic programming algorithm similar to that used by the color-coding algorithm, it's possible to improve the runtime to "only" O(n^2 · 2^n). Given that 13! is about six billion, the naive solution would let you solve TSP for 12- or 13-node graphs in a few seconds. For comparison, the DP solution lets you solve TSP on graphs with roughly 20 nodes in about a second.
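For concreteness, here is a compact Python sketch of that O(n^2 · 2^n) dynamic program (often called Held-Karp): dp[(S, j)] holds the cost of the cheapest path that starts at node 0, visits exactly the node set S, and ends at node j.

from itertools import combinations

def held_karp(dist):
    n = len(dist)
    dp = {(1, 0): 0}                              # bitmask 1 = {0}: the path of just the start node
    for size in range(2, n + 1):
        for subset in combinations(range(1, n), size - 1):
            S = 1 | sum(1 << j for j in subset)   # every subset we consider contains node 0
            for j in subset:
                prev = S ^ (1 << j)               # the same set without the new endpoint j
                dp[(S, j)] = min(dp[(prev, i)] + dist[i][j]
                                 for i in ([0] if prev == 1 else subset) if i != j)
    full = (1 << n) - 1
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

# Tiny 4-city example with symmetric distances.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(held_karp(dist))   # 23: the tour 0 -> 1 -> 3 -> 2 -> 0 costs 2 + 4 + 8 + 9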
These fast exponential-time algorithms are often useful for boosting the size of the inputs that can be exactly solved in practice. Of course, they still run in exponential time, so these approaches are typically not useful for solving very large problem instances.
Option Seven: Solve an Easy Special Case
Many problems that are NP-hard in general have restricted special cases that are known to be solvable efficiently. For example, while in general it’s NP-hard to determine whether a graph has a k-coloring, in the specific case of k = 2 this is equivalent to checking whether a graph is bipartite, which can be checked in linear time using a modified depth-first search. Boolean satisfiability is, generally speaking, NP-hard, but it can be solved in polynomial time if you have an input formula with at most two literals per clause, or where the formula is formed from clauses using XOR rather than inclusive-OR, etc. Finding the largest independent set in a graph is generally speaking NP-hard, but if the graph is bipartite this can be done efficiently due to König’s theorem.
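The k = 2 special case is simple enough to show in a few lines of Python: a breadth-first search that alternates colors level by level (a modified depth-first search works just as well), reporting failure exactly when it meets an edge whose endpoints were forced to the same color.

from collections import deque

def is_bipartite(adj):
    color = {}
    for start in adj:                        # loop over components of a disconnected graph
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # alternate colors along each edge
                    queue.append(v)
                elif color[v] == color[u]:   # an edge inside one color class: odd cycle
                    return False
    return True

square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # 4-cycle: bipartite
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}            # odd cycle: not bipartite
print(is_bipartite(square), is_bipartite(triangle))     # True False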
As a result, if you find yourself needing to solve what might initially appear to be an NP-hard problem, first check whether the inputs you actually need to solve that problem on have some additional restricted structure. If so, you might be able to find an algorithm that applies to your special case and runs much faster than a solver for the problem in its full generality.
Conclusion
If you need to solve an NP-hard problem, don't despair! There are lots of great options available that might make your intractable problem a lot more approachable. No one of the above techniques works in all cases, but by using some combination of these approaches, it's usually possible to make progress even when confronted with NP-hardness.
I'm trying to better understand reduction, and I'm currently looking at the two algorithms, "Median of medians" and Quicksort.
I understand that both algorithms use a similar (effectively identical) partition subroutine to help solve their problems, which ends up making them quite similar.
Select(A[1...n], k):   // Pseudocode for median of medians
    m = ceil(n/5)
    for i from 1 to m:
        B[i] = Select(A[5i-4..5i], 3)
    mom = Select(B[1..m], ceil(m/2))
    r = partition(A[1..n], mom)   // THIS IS THE SUBROUTINE
    if k < r:
        return Select(A[1..r-1], k)
    else if k > r:
        return Select(A[r+1..n], k - r)
    else:
        return mom
So does the term "reduction" make any sense in regards to these two algorithms? Do any of the following make sense?
Median of Medians/Quicksort can be reduced to a partition subroutine
Median of medians reduces to quicksort
Quicksort reduces to median of medians
This really depends on your definition of "reduction."
The standard type of reduction that's usually discussed is a mapping reduction (also called a many-one reduction). A mapping reduction from problem X to problem Y is the following:
Given an input I_X to problem X, transform it into an input I_Y to problem Y. Then, run a solver for problem Y on I_Y and output that answer.
In a mapping reduction, you get to make exactly one call to a subroutine that solves problem Y and you have to output whatever answer you get back from that subroutine. For example, you can reduce the problem of "is this number even?" to the problem of "is this number odd?" by adding one to the number and outputting whether the resulting number is odd.
As a non-example of a mapping reduction, consider these two problems: first, the problem "is every boolean in this list true?," and second, the problem "is some boolean in this list false?" If you have a solver for the second problem, you can use it to solve the first by running the solver for the second problem and outputting the opposite result: a list of booleans has some element that's false if and only if it's not the case that every element of the list is true. However, this reduction isn't a mapping reduction because we're flipping the result produced by the subroutine.
A different type of reduction that's often used is the Turing reduction. A Turing reduction from problem X to problem Y is the following:
Build an algorithm that solves problem X assuming that there is a magic black box that always solves problem Y.
All mapping reductions are Turing reductions, but not the other way around. The above reduction from "is everything true?" to "is something false?" is not a mapping reduction, but it is a Turing reduction, because you can use the subroutine for "is something false?" to learn whether or not the list contains any false values and then output the opposite.
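The two toy examples above fit in a few lines of Python (with is_odd and some_false standing in for solvers of the "target" problems), which makes the distinction visible: the mapping reduction hands back the subroutine's answer untouched, while the Turing reduction post-processes it.

def is_odd(n):
    return n % 2 == 1

def some_false(bools):
    return any(b is False for b in bools)

# Mapping reduction: transform the input, call the solver once, return its answer verbatim.
def is_even(n):
    return is_odd(n + 1)

# Turing reduction (but not a mapping reduction): the solver's answer gets negated afterwards.
def all_true(bools):
    return not some_false(bools)

print(is_even(10), all_true([True, True]), all_true([True, False]))   # True True False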
Another major difference between mapping reductions and Turing reductions is that in a Turing reduction, you can make multiple calls to the subroutine that solves problem Y, not just one.
You can think of both quicksort and median-of-medians as algorithms that use partitioning as a subroutine. In quicksort, that subroutine does all the heavy lifting required to sort everything, and in median-of-medians it does one of the essential steps to shrink down the input. Since both algorithms make multiple calls to the subroutine, you can think of them as Turing-style reductions. Quicksort is a reduction from sorting to partitioning, while median-of-medians is a reduction from selection to partitioning.
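A small Python sketch of that view, with both algorithms written as callers of one shared partition subroutine (the pivot is picked at random here; median-of-medians would instead compute the pivot deterministically before calling partition):

import random

def partition(a, pivot):
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return less, equal, greater

def quicksort(a):                      # sorting, reduced to partitioning
    if len(a) <= 1:
        return a
    less, equal, greater = partition(a, random.choice(a))
    return quicksort(less) + equal + quicksort(greater)

def select(a, k):                      # selection (k-th smallest, 1-based), reduced to partitioning
    less, equal, greater = partition(a, random.choice(a))
    if k <= len(less):
        return select(less, k)
    if k <= len(less) + len(equal):
        return equal[0]
    return select(greater, k - len(less) - len(equal))

data = [7, 2, 9, 4, 4, 1]
print(quicksort(data), select(data, 3))   # [1, 2, 4, 4, 7, 9] and 4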
Hope this helps!
I don't think either can be reduced to the other (in any meaningful way, anyway). You could use the median of medians to choose the pivot for a Quicksort (but nearly nobody actually does). A Quicksort still has to carry out some other steps based on the pivot element though (specifically, partitioning the data based on the pivot).
Likewise, median of medians can't be reduced to Quicksort because a Quicksort does extra work that (among other things) prevents it from meeting the complexity guarantee of the median of medians.
Algorithm requirements
Input is an arbitrary square matrix M of size N×N, which just fits in memory.
The algorithm's output must be true if M[i,j] = M[j,i] for all j≠i, false otherwise.
Obvious solutions
Check if the transpose equals the matrix itself (Mᵀ = M). Easiest to program in many environments, but (usually) consumes twice the memory and requires N² comparisons worst case. Therefore, this is O(N²) and has high peak memory.
Check if the lower triangular part equals the upper triangular part. Of course, the algorithm returns on the first inequality found. This would make the worst case (the worst case being that the matrix is indeed symmetric) require (N² - N)/2 comparisons, since the diagonal does not need to be checked. So although it is better than option 1, this is still O(N²).
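For reference, both obvious solutions are only a few lines each in Python (sketched here with NumPy, assuming M is a square NumPy array):

import numpy as np

def symmetric_via_transpose(M):
    return np.array_equal(M, M.T)          # option 1: compare against the transpose

def symmetric_via_triangle(M):
    n = M.shape[0]
    for i in range(1, n):                  # option 2: compare lower vs. upper triangle,
        for j in range(i):                 # returning at the first mismatch
            if M[i, j] != M[j, i]:
                return False
    return True

M = np.array([[1, 2], [2, 1]])
print(symmetric_via_transpose(M), symmetric_via_triangle(M))   # True True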
Question
Although it's hard to see how it would be possible (the N² elements will all have to be compared somehow), is there an algorithm doing this check that is better than O(N²)?
Or, provided there is a proof of non-existence of such an algorithm: how to implement this most efficiently for a multi-core CPU (Intel or AMD) taking into account things like cache-friendliness, optimal branch prediction, other compiler-specific specializations, etc.?
This question stems mostly from academic interest, although I imagine a practical use could be to determine what solver to use if the matrix describes a linear system AX=b...
Since you will have to examine all the elements except the diagonal, the complexity IMO can't be better than O(n^2).
For a dense matrix, the answer is a definite "no", because any uninspected (non-diagonal) elements could be different from their transposed counterparts.
For standard representations of a sparse matrix, the same reasoning indicates that you can't generally do better than the input size.
However, the same reasoning doesn't apply to arbitrary matrix representations. For example, you could store sparse representations of the symmetric and antisymmetric components of your matrix, which can then easily be checked for symmetry in O(1) time by checking whether the antisymmetric component has any entries at all...
I think you can take a probabilistic approach here.
If x randomly picked lower-triangular elements all match their upper-triangular counterparts, that is unlikely to be mere chance/coincidence: the probability is very high that the matrix is indeed symmetric.
So instead of going through all ½(n² - n) off-diagonal pairs, you can check p random coordinates and tell that the matrix is symmetric with confidence:
p / ½(n² - n)
You can then decide on a threshold above which you believe that the matrix must be a symmetric matrix.
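A small Python sketch of that sampling idea (with the matrix as plain nested lists): a single mismatch proves the matrix is not symmetric, while a clean run of samples only says "symmetric" with some confidence, never with certainty.

import random

def probably_symmetric(M, samples=1000):
    n = len(M)
    for _ in range(samples):
        i = random.randrange(n)
        j = random.randrange(n)
        if i != j and M[i][j] != M[j][i]:
            return False          # found a witness of asymmetry: definitely not symmetric
    return True                   # no mismatch among the samples: probably symmetric

M = [[1, 2, 3],
     [2, 5, 6],
     [3, 6, 9]]
print(probably_symmetric(M))      # True (and this matrix really is symmetric)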
I'm given a task to write an algorithm to compute the maximum two-dimensional subarray (the submatrix with the largest sum) of a matrix of integers. However, I'm not interested in help with such an algorithm; I'm more interested in knowing the best worst-case complexity with which this can possibly be solved.
Our current algorithm is roughly O(n^3).
I've been considering something like divide and conquer: splitting the matrix into a number of sub-matrices and simply adding up the elements within them, thereby limiting the number of matrices one has to consider in order to find an approximate solution.
The worst case is definitely no worse than O(n^3) - there are known algorithms achieving that bound, and several descriptions of them on the web.
Best case can be far better: O(1). If all of the elements are non-negative, then the answer is the matrix itself. If all of the elements are non-positive, the answer is the single element whose value is closest to zero.
Likewise, if there are entire rows/columns on the edges of your matrix that contain nothing but non-positive integers, you can chop them off in your search.
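For reference, the standard O(n^3) approach looks roughly like the Python sketch below: fix a top and a bottom row, collapse the rows in between into per-column sums, and run the 1D maximum-subarray (Kadane) scan over those sums.

def max_submatrix_sum(matrix):
    n = len(matrix)
    best = matrix[0][0]
    for top in range(n):
        col_sums = [0] * n
        for bottom in range(top, n):
            for c in range(n):                    # add this row into the column sums
                col_sums[c] += matrix[bottom][c]
            running = col_sums[0]                 # 1D Kadane scan over col_sums
            best = max(best, running)
            for c in range(1, n):
                running = max(col_sums[c], running + col_sums[c])
                best = max(best, running)
    return best

grid = [[ 2, -1,  3],
        [-4,  5,  6],
        [ 1, -2,  4]]
print(max_submatrix_sum(grid))   # 15: rows 0..2, columns 1..2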
I've figured that there isn't a better way to do it - at least none known yet.
And I'm going to stick with the solution I've got, mainly because it's simple.