For example, the set-cover decision problem is known to be an NP-complete problem. The input of this problem is a universe U, a family S of subsets of U, and an integer k.
One thing I'm confused about is that if we let k = 1, then obviously the problem can be solved in time O(|S|), by simply checking each set in S. More generally, when k is a constant, the problem can be solved in time polynomial in |S|. Seen this way, the time complexity only becomes exponentially high when k also increases with |S|, like |S|/2, |S|/3, ...
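To make the constant-k case concrete, here is a minimal brute-force sketch (the function name and the O(|S|^k · |U|) bound are my own framing, not from the question): it tries every combination of k subsets, which is polynomial in |S| only as long as k is a fixed constant.

```python
from itertools import combinations

def set_cover_decision(universe, subsets, k):
    """Decide whether some k subsets from `subsets` cover `universe`.

    Brute force: try every combination of k subsets, so the running time
    is O(|S|^k * |U|) -- polynomial in |S| only while k is a fixed constant.
    """
    universe = set(universe)
    for combo in combinations(subsets, k):
        if set().union(*combo) >= universe:
            return True
    return False

# U = {1..5}, S has four subsets, k = 2: {1,2,3} and {4,5} cover U.
print(set_cover_decision({1, 2, 3, 4, 5},
                         [{1, 2, 3}, {2, 4}, {4, 5}, {3, 5}], 2))  # True
```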
So here're my questions:
My current understanding is that the time complexity of an NP-complete problem is measured in terms of the WORST case. Can anybody please tell me if this understanding is right?
I saw somebody prove that another problem is NP-hard by showing that a set-cover decision problem with input <U,S,|U|/3> can be reduced to that problem. I'm just wondering why he only proved it for <U,S,|U|/3> instead of <U,S,ARBITRARY k>? Is such a proof reliable?
Many thanks!
Time complexity is measured as a function of the input instance size. The input instance size can be measured in bits. The input instance size increases as any of the inputs U, S, and k increases. So the question one is trying to answer is: how much more time does it take to solve an instance of size, say, 2n bits than an instance of size n bits?
So simply the size of the whole input instance has to increase and in this particular case it means increasing the size of U and/or S and/or k.
To answer your two questions:
Yes, worst-case time complexity is used: you are looking for the hardest instance of input size n, and you correctly noticed that the problem (of the same size) probably becomes harder when more than one parameter grows at once.
It would be better to see the proof you are referring to but the thinking probably goes like:
I give a polynomial reduction of the set-covering decision problem instance of size n to my problem's instance of size m. If the size of the set-covering decision problem's input instance increases to 2n then the result of the reduction will be my problem's instance of size 2m because there is a direct correspondence between the input size of U, S, and k and the input size of my problem.
So all set-covering decision problem instances of size n map to my problem instances of size m. Thus if I am looking for the hardest instance of the set-covering decision problem using this reduction I will find the hardest instance of my problem of size m.
EDIT
From the proof in your linked paper:
Proof. We reduce an arbitrary 3-cover problem instance—in which we are given a universe U, a family S of subsets of U, such that each subset contains 3 elements, and we are asked whether we can (exactly) cover all of U using |U|/3 elements of S—to a game with homogeneous resources and schedules of size 3.
As you correctly say, they need to convert all instances of the set-cover problem to their problem. But they are using a reduction from a different problem: the Exact 3-Cover problem, which is proven to be NP-complete in "Computers and Intractability" (M. R. Garey, D. S. Johnson, 1979).
The Exact 3-Cover problem is like the set-cover decision problem, but with |U| = 3t and S a family of 3-element subsets of U.
Related
Is there a decision problem with a time complexity of Θ(n²)?
In other words, I'm looking for a decision problem for which the best known solution has been proven to have a lower bound of N².
I thought about searching for the biggest number in a matrix, but the problem is that the matrix is an input of size O(n²), so that solution is linear in the input size.
It doesn't need to be a known problem; a hypothetical one would suffice as well.
Does a close pair exist?
In any "difficult" metric space, given n points, does a pair exist in distance less than r, where r is an input parameter?
Intuitively proof:
Given that r is an input parameter, you have to search every point.
For a point, you have compute the distance to every other point, that's Θ(n).
For n points, you have n*Θ(n) = Ө(n²).
Time complexity: Ө(n²)
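A minimal sketch of the brute-force check described above, assuming the metric is handed in as a black-box dist function (a placeholder of mine): it evaluates the distance for all n·(n-1)/2 pairs, i.e. Θ(n²) distance computations.

```python
from itertools import combinations

def close_pair_exists(points, r, dist):
    """Return True if some pair of points is at distance less than r.

    In a "difficult" metric space there is nothing better than checking
    all n*(n-1)/2 pairs, i.e. Theta(n^2) calls to the metric `dist`.
    """
    for p, q in combinations(points, 2):
        if dist(p, q) < r:
            return True
    return False

# Example with the absolute difference on numbers as the metric:
print(close_pair_exists([1.0, 4.0, 9.5, 9.8], 0.5, lambda a, b: abs(a - b)))  # True
```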
Once I read in a scientific paper:
The computational complexity of the algorithm is O(N^d), where N is the number of data, d is the dimension. Hence with fixed dimension, the algorithm complexity is polynomial.
Now, this made me think that (if I'm not mistaken) big-O notation is defined in terms of the number of binary inputs, i.e. the input length in bits. Thus if I fix the dimension of the data, it is natural to arrive at a polynomial solution. Moreover, if I also fixed N, the number of inputs, I would arrive at an O(1) solution; see the connected post:
Algorithm complexity with input is fix-sized
My question is whether you think this is a valid argument for polynomial complexity. Can one really fix one dimension of the input data and claim polynomial complexity?
Yes, that's a reasonable thing to do.
It really depends on the initial problem, but in most cases I would say fixing the number of dimensions is reasonable. I would expect the paper to claim something like "polynomial complexity for practical purposes", or to present some argument for why limiting d is reasonable.
You can compare with a solution of complexity O(d^N), where fixing the number of dimensions does not make the solution polynomial. So the one presented is clearly better when d is small.
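As a toy illustration of the O(N^d) vs O(d^N) point (the example and names are mine, purely illustrative): touching every cell of a d-dimensional grid of side N takes N^d steps, which is polynomial in N for any fixed d but explodes if d is allowed to grow with the input.

```python
from itertools import product

def count_grid_cells(N, d):
    """Visit every cell of a d-dimensional N x ... x N grid.

    The loop body runs N**d times: polynomial in N for any fixed
    dimension d, but exponential if d grows with the input.
    """
    count = 0
    for _ in product(range(N), repeat=d):
        count += 1
    return count

print(count_grid_cells(10, 3))  # 1000 == 10**3
```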
As a quick recall from university days:
Big-O notation is just an UPPER bound on how your algorithm performs.
Mathematically, f(x) is O(g(x)) means that there exist a constant k > 0 and an x0 such that
f(x) <= k·g(x) for all x > x0
To answer your question, you cannot fix the N, which is the independent variable.
If you fix N, say N < 100, we can surely arrive at O(1),
because, according to the definition, we can set a large enough k to ensure f(N) <= k·g(N) for all N < 100.
This only works for some algorithms. It is not clear to me what the "dimension" should be in some cases.
E.g. SubSetSum is NP-complete, therefore there is no known algorithm with complexity polynomial in the input size. But the input is just N numbers. You could also see it as N numbers of bit length d, and then, for fixed d, the well-known pseudo-polynomial algorithm does have polynomial complexity.
The same holds for the Shortest Vector Problem (SVP) for lattices. The input is an N x N basis (let's say with integer entries) and you look for the shortest non-zero vector. This is also a hard problem and no algorithm with polynomial complexity is known yet.
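As an aside, here is a sketch of the pseudo-polynomial subset-sum routine alluded to above (my own illustration, not code from the answer): it runs in O(N · target) time, so it is polynomial in N once the bit length d of the numbers is treated as fixed, even though SubSetSum is NP-complete when measured in the bit length of the input.

```python
def subset_sum(numbers, target):
    """Decide whether some subset of `numbers` sums to `target`.

    Classic dynamic program over reachable sums: O(N * target) time.
    If every number and the target fit in d bits, the set of reachable
    sums has at most 2^d entries, so for fixed d this is polynomial in N.
    """
    reachable = {0}
    for x in numbers:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```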
For many problems it's not just the size of the input data that makes the problem difficult, but certain properties or parameters of that data. E.g. many graph problems have their complexity given in terms of the number of nodes and the number of edges separately.
Sometimes the difference between these parameters can be dramatic: for example, with something like O(n^d) the complexity is polynomial when n grows, but exponential when d grows.
If you now happen to have an application where you know that the value of a parameter like the dimension is always the same, or that there is a (small) maximal value, then regarding this parameter as fixed can give you useful insight. Statements like these are therefore very common in scientific papers.
However, you cannot just fix any parameter. E.g. your memory is finite, so you could argue that sorting data takes constant time, but the bound on that parameter is so large that viewing it as fixed does not give you any useful insight.
So fixing all parameters is usually not an option, because there has to be at least one aspect in which the size of your data varies. Treating a term as effectively constant can be an option, though, if it grows very slowly.
E.g. data structures with O(log n) operations are sometimes considered to have effectively constant complexity if the constant factor is also quite small. Or data structures such as union-find structures, where the amortized complexity of the operations is O(α(n)), with α the inverse of the Ackermann function: a function growing so slowly that it cannot get above 10 or so for any input size n that any imaginable hardware could possibly ever handle.
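For concreteness, here is a compact union-find sketch with union by rank and path halving (a standard textbook construction, not anything specific to the answer above); both operations run in amortized O(α(n)) time.

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path halving.

    Together these give amortized O(alpha(n)) per operation, which is
    effectively constant for any input size real hardware can hold.
    """
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

uf = UnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(uf.find(1) == uf.find(0), uf.find(0) == uf.find(3))  # True False
```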
I would like to quote from Wikipedia
In mathematics, the minimum k-cut is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph into k connected components.
It is said to be the minimum cut if the set of edges is minimal.
For k = 2, it would mean finding the set of edges whose removal would disconnect the graph into 2 connected components.
However, the same Wikipedia article says that:
For a fixed k, the problem is polynomial time solvable in O(|V|^(k^2))
My question is: does this mean that minimum 2-cut is a problem that belongs to the complexity class P?
The min-cut problem is solvable in polynomial time and thus yes it is true that it belongs to complexity class P. Another article related to this particular problem is the Max-flow min-cut theorem.
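As a hedged sketch of why this is polynomial (my own illustration, not the O(|V|^(k^2)) algorithm from the article): for an undirected weighted graph, a minimum 2-cut can be found by fixing an arbitrary vertex s and taking the cheapest s-t max flow over all other vertices t, by the max-flow min-cut theorem. The max flow below is computed with a plain Edmonds-Karp routine, so the whole thing is |V| - 1 polynomial-time max-flow computations.

```python
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck on the path, then push that much flow along it.
        bottleneck = float('inf')
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def min_2cut(vertices, edges):
    """Weight of a minimum 2-cut of an undirected weighted graph.

    A minimum cut separates a fixed vertex s from some other vertex t,
    so the cheapest s-t max flow over all choices of t is the answer.
    """
    vertices = list(vertices)
    s, best = vertices[0], float('inf')
    for t in vertices[1:]:
        cap = defaultdict(lambda: defaultdict(int))
        for u, v, w in edges:          # model each undirected edge as two arcs
            cap[u][v] += w
            cap[v][u] += w
        best = min(best, max_flow(cap, s, t))
    return best

# Two triangles of weight-2 edges joined by a single weight-1 edge: min 2-cut = 1.
print(min_2cut("abcdef", [("a", "b", 2), ("b", "c", 2), ("a", "c", 2),
                          ("d", "e", 2), ("e", "f", 2), ("d", "f", 2),
                          ("c", "d", 1)]))  # 1
```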
First of all, the time complexity of an algorithm should be evaluated by expressing the number of steps the algorithm requires to finish as a function of the length of the input (see Time complexity). More or less formally: if you vary the length of the input, how does the number of steps required by the algorithm to finish vary?
Second of all, the time complexity of an algorithm is not exactly the same thing as to what complexity class does the problem the algorithm solves belong to. For one problem there can be multiple algorithms to solve it. The primality test problem (i.e. testing if a number is a prime or not) is in P, but some (most) of the algorithms used in practice are actually not polynomial.
Third of all, for most algorithms you'll find on the Internet, evaluating the time complexity is not done by definition (i.e. not as a function of the length of the input, at least not expressed directly as such). Let's take the good old naive primality test algorithm (the one in which you take n as input and check for division by 2, 3, ..., n-1). How many steps does this algorithm take? One way to put it is O(n) steps. This is correct. So is this algorithm polynomial? Well, it is linear in n, so it is polynomial in n. But, if you take a look at what time complexity means, the algorithm is actually exponential. First, what is the length of the input to your problem? If you provide the input n as an array of bits (the usual in practice) then the length of the input is, roughly speaking, L = log n. Your algorithm thus takes O(n) = O(2^log n) = O(2^L) steps, so it is exponential in L. So the naive primality test is at the same time linear in n and exponential in the length of the input L. Both are correct. Btw, the AKS primality test algorithm is polynomial in the size of the input (thus, the primality test problem is in P).
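To make that paragraph concrete, here is a sketch of the naive trial-division test it describes (illustrative code of mine): the loop runs O(n) times, which is O(2^L) in the bit length L of the input.

```python
def is_prime_naive(n):
    """Trial division by 2, 3, ..., n-1.

    The loop runs O(n) times -- linear in the *value* n, but the input
    is only L = log2(n) bits long, so this is O(2^L) in the length of
    the input, i.e. exponential by the usual definition.
    """
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

print(is_prime_naive(97))  # True
print(is_prime_naive(91))  # False (7 * 13)
```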
Fourth of all, what is P in the first place? Well, it is a class of problems that contains all decision problems that can be solved in polynomial time. What is a decision problem? A problem that can be answered with yes or no. Check these two Wikipedia pages for more details: P (complexity) and decision problems.
Coming back to your question, the answer is no (but pretty close to yes :p). The minimum 2-cut problem is in P if formulated as a decision problem (your formulation requires an answer that is not just yes or no). At the same time, the algorithm that solves the problem in O(|V|^4) steps is a polynomial algorithm in the size of the input. Why? Well, the input to the problem is the graph (i.e. vertices, edges and weights); to keep it simple, let's assume we use an adjacency/weights matrix (i.e. the length of the input is at least quadratic in |V|). So solving the problem in O(|V|^4) steps means polynomial in the size of the input. The algorithm that accomplishes this is a proof that the minimum 2-cut problem (if formulated as a decision problem) is in P.
A class related to P is FP and your problem (as you formulated it) belongs to this class.
Some problems that are NP-hard are also fixed-parameter tractable, or FPT. Wikipedia describes a problem as fixed-parameter tractable if there's an algorithm that solves it in time f(k) · |x|^O(1).
What does this mean? Why is this concept useful?
To begin with, under the assumption that P ≠ NP, there are no polynomial-time, exact algorithms for any NP-hard problem. Although we don't know whether P = NP or P ≠ NP, we don't have any polynomial-time algorithms for any NP-hard problems.
The idea behind fixed-parameter tractability is to take an NP-hard problem, which we don't know any polynomial-time algorithms for, and to try to separate out the complexity into two pieces - some piece that depends purely on the size of the input, and some piece that depends on some "parameter" to the problem.
As an example, consider the 0/1 knapsack problem. In this problem, you're given a list of n objects that have associated weights and values, along with some maximum weight W that you're allowed to carry. The question is to determine the maximum amount of value that you can carry. This problem is NP-hard, meaning that there's no polynomial-time algorithm that solves it. A brute-force method will take time around O(2^n) by considering all possible subsets of the items, which is extremely slow for large n. However, it is possible to solve this problem in time O(nW), where n is the number of elements and W is the amount of weight you can carry. If you look at the runtime O(nW), you'll notice that it's split into two parts: a component that's linear in the number of elements (the n part) and a component that's linear in the weight (the W part). If W is any fixed constant, then the runtime of this algorithm will be O(n), which is linear-time, even though the problem in general is NP-hard. This means that if we treat W as some tunable "parameter" of the problem, for any fixed value of this parameter, the problem ends up running in polynomial time (which is "tractable," in the complexity theory sense of the word.)
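Here is a minimal sketch of the O(nW) dynamic program mentioned above (variable names are mine): the table is indexed by weight capacities 0..W, so the running time is polynomial in n whenever the parameter W is treated as fixed.

```python
def knapsack(values, weights, W):
    """0/1 knapsack via dynamic programming over weight capacities.

    best[w] holds the best value achievable with total weight <= w.
    The two nested loops give O(n * W) time: polynomial in n whenever
    the parameter W is a fixed constant.
    """
    best = [0] * (W + 1)
    for value, weight in zip(values, weights):
        # Go through capacities downwards so each item is used at most once.
        for w in range(W, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[W]

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], W=50))  # 220
```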
As another example, consider the problem of finding long, simple paths in a graph. This problem is also NP-hard, and the naive algorithm for finding simple paths of length k in a graph takes time O(n! / (n - k)!), which for large k ends up being superexponential. However, using the technique of color-coding, it's possible to solve this problem in time O((2e)^k n^3 log n), where k is the length of the path to find and n is the number of nodes in the input graph. Notice that this runtime also has two "components:" one component that's a polynomial in the number of nodes in the input graph (the n^3 log n part) and one component that's exponential in k (the (2e)^k part). This means that for any fixed value of k, there's a polynomial-time algorithm for finding length-k paths in the graph; the runtime will be O(n^3 log n).
In both of these cases, we can take a problem for which we have an exponential-time solution (or worse) and find a new solution whose runtime is some polynomial in n times some crazy-looking function of some extra "parameter." In the case of the knapsack problem, that parameter is the maximum amount of weight we can carry; in the case of finding long paths, the parameter is the length of the path to find. Generally speaking, a problem is called fixed-parameter tractable if there is some algorithm for solving the problem defined in terms of two quantities: n, the size of the input, and k, some "parameter," where the runtime is
O(p(n) · f(k))
Where p(n) is some polynomial function and f(k) is an arbitrary function in k. Intuitively, this means that the complexity of the problem scales polynomially with n (meaning that as only the problem size increases, the runtime will scale nicely), but can scale arbitrarily badly with the parameter k. This separates out the "inherent hardness" of the problem such that the "hard part" of the problem is blamed on the parameter k, while the "easy part" of the problem is charged to the size of the input.
Once you have a runtime that looks like O(p(n) · f(k)), we immediately get polynomial-time algorithms for solving the problem for any fixed k. Specifically, if k is fixed, then f(k) is some constant, so O(p(n) · f(k)) is just O(p(n)). This is a polynomial-time algorithm. Therefore, if we "fix" the parameter, we get back some "tractable" algorithm for solving the problem. This is the origin of the term fixed-parameter tractable.
(A note: Wikipedia's definition of fixed-parameter tractability says that the algorithm should have runtime f(k) · |x|^O(1). Here, |x| refers to the size of the input, which I've called n here. This means that Wikipedia's definition is the same as saying that the runtime is f(k) · n^O(1). As mentioned in this earlier answer, n^O(1) means "some polynomial in n," and so this definition ends up being equivalent to the one I've given here).
Fixed-parameter tractability has enormous practical implications for a problem. It's common to encounter problems that are NP-hard. If you find a problem that's fixed-parameter tractable and the parameter is low, it can be significantly more efficient to use the fixed-parameter tractable algorithm than to use the normal brute-force algorithm. The color-coding example above for finding long paths in a graph, for example, has been used to great success in computational biology to find sequencing pathways in yeast cells, and the 0/1 knapsack solution is used frequently because common values of W are low enough for it to be practical.
Hope this helps!
I believe that the explanation by @templatetypedef already covers the generality of FPT quite comprehensively.
I would like to add that in practice, it quite often turns out that the class of problem one is trying to solve is FPT, as in the examples above.
In the case of problems expressed as a set of constraints (e.g. SAT, CSP, ILP, etc.), a very common parameter is treewidth, which basically captures how close your problem is to being organized as a tree.
This allows one to split the problem into a tree of subproblems, which can then be solved more or less individually using dynamic programming.
In such cases, many problems are linear-time fixed-parameter tractable, that is, the complexity grows linearly with the number of components (i.e. the size of the system) but exponentially with the size of the biggest component.
Although it is possible to use explicit techniques to solve the sub-problems, in order to scale up to more reasonable instances, using symbolic representations is recommended.
If a problem X reduces to a problem Y, is the opposite reduction also possible? Say:
X = Given an array tell if all elements are distinct
Y = Sort an array using comparison sort
Now, X reduces to Y in linear time i.e. if I can solve Y, I can solve X in linear time. Is the reverse always true? Can I solve Y, given I can solve X? If so, how?
By reduction I mean the following:
Problem X linear reduces to problem Y if X can be solved with:
a) Linear number of standard computational steps.
b) Constant calls to subroutine for Y.
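For concreteness, here is a sketch of the forward reduction described above (function names are illustrative): X is solved with a single call to the sorting subroutine Y plus a linear scan of adjacent elements.

```python
def all_distinct(arr, sort_subroutine=sorted):
    """Element distinctness (X) via one call to a sorting subroutine (Y).

    One call to Y plus a linear scan of adjacent elements: a linear
    reduction in the sense defined above. Nothing here implies that the
    reverse reduction (sorting via distinctness) must also exist.
    """
    s = sort_subroutine(arr)
    return all(s[i] != s[i + 1] for i in range(len(s) - 1))

print(all_distinct([3, 1, 4, 1, 5]))  # False
print(all_distinct([3, 1, 4, 2, 5]))  # True
```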
Given the example above:
You can determine whether all elements are distinct in O(N) if you back them with a hash table, which allows you to check existence in O(1) plus the overhead of the hash function (which generally doesn't matter); see the sketch at the end of this answer. If you are doing a non-comparison-based sort:
sorting algorithm list
Specialized sort that is linear:
For simplicity, assume you're sorting a list of natural numbers. The sorting method is illustrated using uncooked rods of spaghetti:
For each number x in the list, obtain a rod of length x. (One practical way of choosing the unit is to let the largest number m in your list correspond to one full rod of spaghetti. In this case, the full rod equals m spaghetti units. To get a rod of length x, simply break a rod in two so that one piece is of length x units; discard the other piece.)
Once you have all your spaghetti rods, take them loosely in your fist and lower them to the table, so that they all stand upright, resting on the table surface. Now, for each rod, lower your other hand from above until it meets with a rod--this one is clearly the longest! Remove this rod and insert it into the front of the (initially empty) output list (or equivalently, place it in the last unused slot of the output array). Repeat until all rods have been removed.
So given a very specialized case of your problem, your statement would hold. This will not hold in the general case though, which seems to be more what you are after. It is very similar to when people think they have solved TSP, but have instead created a constrained version of the general problem that is solvable using a special algorithm.
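Here, for completeness, is the kind of hash-table-backed distinctness check mentioned at the start of this answer (a sketch with illustrative names): expected O(N) time, with no comparison sort involved at all.

```python
def all_distinct_hashed(arr):
    """O(N) expected-time distinctness check backed by a hash set.

    Each membership test and insertion is O(1) on average (plus the
    cost of hashing), so no sorting subroutine is needed.
    """
    seen = set()
    for x in arr:
        if x in seen:
            return False
        seen.add(x)
    return True

print(all_distinct_hashed([3, 1, 4, 1, 5]))  # False
print(all_distinct_hashed([3, 1, 4, 2, 5]))  # True
```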
Suppose I can solve a problem A in constant time O(1) but problem B has a best case exponential time solution O(2^n). It is likely that I can come up with an insanely complex way of solving problem A in O(2^n) ("reducing" problem A to B) as well but if the answer to your question was "YES", I should then be able to make all exceedingly difficult problems solvable in O(1). Surely, that cannot be the case!
Assuming I understand what you mean by reduction, let's say that I have a problem that I can solve in O(N) using an array of key/value pairs, that being the problem of looking something up from a list. I can solve the same problem in O(1) by using a Dictionary.
Does that mean I can go back to my first technique, and use it to solve the same problem in O(1)?
I don't think so.