What exactly is Input size - algorithm

While studying algorithms, I got confused about what exactly counts as the input size.
For example, the dynamic programming solution to the travelling salesperson problem takes O(2^n × n^2) time,
and Kruskal's algorithm takes
O(E log V). Both are graph problems, so why is the input size for TSP the number of vertices n, while for Kruskal's algorithm it is expressed in edges and vertices?
How do I know exactly what can be taken as the input size?

You can state an algorithm's complexity in terms of whatever variables you like.
You should generally choose them to clearly communicate useful information about the running time of the algorithm.
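To make this concrete, here is a minimal Python sketch of the dynamic programming approach to TSP mentioned in the question (commonly called Held-Karp). The function name `held_karp` and the distance-matrix input are my own choices for illustration. The loops range over subsets of cities and over pairs of cities, which is why the bound is naturally stated in n alone: a distance matrix always has n^2 entries, so the number of edges is not an independent variable. Kruskal's algorithm, by contrast, works on an explicit edge list, so E shows up in its bound.

```python
from itertools import combinations


def held_karp(dist):
    """Length of the shortest tour that starts and ends at city 0.

    dist is assumed to be an n x n matrix of pairwise distances.
    """
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)] = cheapest path that starts at city 0, visits exactly
    # the cities whose bits are set in mask, and ends at city j.
    best = {}
    for j in range(1, n):
        best[(1 << j, j)] = dist[0][j]
    for size in range(2, n):                              # subset sizes
        for subset in combinations(range(1, n), size):    # O(2^n) subsets in total
            mask = sum(1 << j for j in subset)
            for j in subset:                              # O(n) end cities
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(                    # O(n) predecessors
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

For example, `held_karp([[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]])` returns 80.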

Related

Why naive implementation of Dijkstra's shortest path algorithm takes O(nm) time

I'm watching the Coursera lectures on algorithms, and the professor states that the naive implementation of Dijkstra's shortest path algorithm, without using heaps, takes O(nm) time (n is the number of vertices, and m is the number of edges).
The claim is that the main loop goes through the remaining vertices besides the source, which are n-1 vertices. This I can understand. But inside the loop, the algorithm goes through the edges with tail in the processed vertices and head in the unprocessed vertices, in order to pick the next shortest path. Why does that mean there are m edges to go through? Even in a naive implementation, can't we just go through the edges that satisfy the criterion (tail in the processed vertices and head in the unprocessed vertices)?
Could anyone please help me understand this? Thanks.
When you consider big-O time complexity, you should think of it as a form of upper-bound. Meaning, if some input can possibly make your program run in O(N) while some other input can possibly make your program run in O(NM), we generally say that this is an O(NM) program (especially when dealing with algorithms). This is called considering an algorithm in its worst-case.
There are rare cases where we don't consider the worst-case (amortized time complexity analysis, time complexity with random elements to it such as quicksort). In the case of quicksort, for example, the worst-case time complexity is O(N^2), but the chance of that happening is so, so small that in practice, if we select a random pivot, we can expect O(N log N) time complexity. That being said, unless you can guarantee (or have large confidence that) your program runs faster than O(NM), it is an O(NM) program.
Let's consider the case of your naive implementation of Dijkstra's algorithm. You didn't specify whether you were considering directed or undirected graphs, so I'll assume that the graph is undirected (the case for a directed graph is extremely similar).
We're going through all the nodes, besides the first node. This means we already have an O(N) loop.
In each layer of the loop, we're considering all the edges that stem from a processed node to an unprocessed node.
However, in the worst-case, there are close to O(M) of these in each layer of the loop; not O(1) or anything less than O(M).
There's an easy way to see this. If each edge were only visited a single time, then we could say that the algorithm runs in O(M). However, in your naive implementation of Dijkstra's algorithm, the same edge can be considered multiple times. In fact, asymptotically speaking, O(N) times.
You can try creating a graph yourself, then dry-running the process of Dijkstra on paper. You'll notice that each edge can be considered up to N times: once in each layer of the O(N) loop.
In other words, we can't say the program is faster than O(NM) because nothing guarantees that each edge is processed fewer than N times, for instance only O(log N) or O(sqrt N) times. Therefore, the best upper bound we can give on the number of times an edge is processed is N, although in practice it may be smaller than N by some constant factor. However, we do not consider constant factors in big-O time complexity.
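If you would rather count than dry-run on paper, here is a small Python sketch of the naive version. It is only an illustration under my own assumptions: the graph is a directed edge list of `(tail, head, weight)` tuples with non-negative weights, and the `edge_checks` counter is something I added to make the cost visible. Note that even "just going through the qualifying edges" requires looking at every edge to find out whether it qualifies.

```python
def naive_dijkstra(n, edges, source):
    """Naive Dijkstra on a directed edge list; also returns how many edges were inspected."""
    dist = {source: 0}  # processed vertices and their final distances
    edge_checks = 0
    for _ in range(n - 1):                 # O(N) iterations of the main loop
        best = None
        for tail, head, weight in edges:   # O(M) scan in every iteration
            edge_checks += 1
            # Only edges crossing from processed to unprocessed vertices qualify.
            if tail in dist and head not in dist:
                candidate = dist[tail] + weight
                if best is None or candidate < best[0]:
                    best = (candidate, head)
        if best is None:                   # remaining vertices are unreachable
            break
        dist[best[1]] = best[0]
    return dist, edge_checks
```

On a graph with 4 vertices and 5 edges where every vertex is reachable, this inspects (4 - 1) × 5 = 15 edges, which is the (N - 1) × M pattern behind the O(NM) bound.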
However, your thought process may lead to a better implementation of Dijkstra's algorithm. When considering the algorithm, you may realize that instead of considering every single edge that goes from processed to unprocessed vertices in every iteration of the main loop, we only have to consider the edges that are adjacent to the current node we are on (I'm talking about something like this or this).
By implementing it like so, you can achieve a complexity of O(N^2).
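A minimal sketch of that idea in Python, assuming an adjacency-list input where `adj[u]` is a list of `(v, weight)` pairs with non-negative weights (the function name and input format are my own, not from the lectures). Picking the closest unprocessed vertex costs O(N) per iteration, and each edge is relaxed only once, when its tail is processed, so the total is O(N^2 + M) = O(N^2).

```python
import math


def dijkstra_dense(adj, source):
    """Array-based Dijkstra: O(N^2) overall, no heap needed."""
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    done = [False] * n
    for _ in range(n):
        # Linear scan for the closest vertex that is not processed yet: O(N).
        u = -1
        for v in range(n):
            if not done[v] and (u == -1 or dist[v] < dist[u]):
                u = v
        if dist[u] == math.inf:   # everything left is unreachable
            break
        done[u] = True
        # Relax only the edges leaving u; each edge is touched once overall.
        for v, weight in adj[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
    return dist
```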

How does Kruskal and Prim change when edge weights are in the range of 1 to |V| or some constant W?

I'm reading CLRS Algorithms Edition 3 and I have two problems for my homework (I'm not asking for answers, I promise!). They are essentially the same question, just applied to Kruskal or to Prim. They are as follows:
Suppose that all edge weights in a graph are integers in the range from 1 to |V|. How fast can you make [Prim/Kruskal]'s algorithm run? What if the edge weights are integers in the range from 1 to W for some constant W?
I can see the logic behind the answers I'm thinking of and what I'm finding online (ie sort the edge weights using a linear sort, change the data structure being used, etc), so I don't need help answering it. But I'm wondering why there is a difference between the answer if the range is 1 to |V| and 1 to W. Why ask the same question twice? If it's some constant W, it could literally be anything. But honestly, so could |V| - we could have a crazy large graph, or a very small one. I'm not sure how the two questions posed in this problem are different, and why I need two separate approaches for both of them.
There's a difference in complexity between an algorithm that runs in O(V) time and one that runs in O(W) time for constant W. Sure, V could be anything, as could W, but that's not really the point: one is linear, the other is O(1). The question is then for which algorithms a restricted range of edge weights could impact complexity (based, as you suggest, on edge-weight sort time and choice of data structure), and what the new optimal complexity would be for linearly bounded edge weights vs. for edge weights bounded by a constant W.
Having bounded edge-weights could open up new possibilities for sorting algorithms for Kruskal's, and might change the data structure you'd want to use to implement the queue for Prim's along with the most optimal way you could implement extract-min and update-key operations for that queue. The extent to which edge-weights are bounded can impact whether a particular change in data structure or implementation is even beneficial to make in terms of final complexity.
For example, knowing that the n elements of a list are bounded in value by a constant W means that switching to radix sort would improve the asymptotic complexity of sorting them, but if I instead only knew that they were bounded in value by 2^n, there would be no advantage in switching to radix sort over the traditional methods and their O(n log n) sorting complexity.
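To illustrate the sorting side without giving away the homework analysis, here is a rough Python sketch of Kruskal's algorithm where the comparison sort is replaced by bucketing edges by weight. I'm using a plain bucket (counting) sort rather than radix sort to keep it short, and a simple union-find with path halving; the function name `kruskal_bounded` and the `max_w` parameter are hypothetical. Bucketing costs O(E + max_w), so how it compares to O(E log V) depends on whether `max_w` is a constant W or grows like |V|.

```python
def kruskal_bounded(n, edges, max_w):
    """Kruskal's MST for integer weights in 1..max_w, using buckets instead of a comparison sort.

    edges is assumed to be a list of (u, v, weight) tuples over vertices 0..n-1.
    """
    # Bucket the edges by weight: O(E + max_w) instead of O(E log E).
    buckets = [[] for _ in range(max_w + 1)]
    for u, v, weight in edges:
        buckets[weight].append((u, v, weight))

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    tree = []
    for bucket in buckets:            # buckets are visited in increasing weight order
        for u, v, weight in bucket:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:      # the edge joins two different components
                parent[root_u] = root_v
                total += weight
                tree.append((u, v, weight))
    return total, tree
```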

Minimum Cut in undirected graphs

I would like to quote from Wikipedia
In mathematics, the minimum k-cut, is a combinatorial optimization
problem that requires finding a set of edges whose removal would
partition the graph to k connected components.
It is said to be the minimum cut if the set of edges is minimal.
For k = 2, it would mean finding the set of edges whose removal would disconnect the graph into 2 connected components.
However, the same Wikipedia article says that:
For a fixed k, the problem is polynomial time solvable in O(|V|^(k^2))
My question is: does this mean that minimum 2-cut is a problem that belongs to complexity class P?
The min-cut problem is solvable in polynomial time and thus yes it is true that it belongs to complexity class P. Another article related to this particular problem is the Max-flow min-cut theorem.
First of all, the time complexity of an algorithm should be evaluated by expressing the number of steps the algorithm requires to finish as a function of the length of the input (see Time complexity). More or less formally: if you vary the length of the input, how does the number of steps required by the algorithm vary?
Second of all, the time complexity of an algorithm is not the same thing as the complexity class of the problem that the algorithm solves. For one problem there can be multiple algorithms to solve it. The primality test problem (i.e. testing whether a number is prime) is in P, but not every algorithm that solves it is polynomial (naive trial division, for example, is not).
Third of all, for most algorithms you'll find on the Internet, the time complexity is not evaluated by the definition (i.e. not as a function of the length of the input, at least not expressed directly as such). Let's take the good old naive primality test (the one in which you take n as input and check for divisibility by 2, 3, ..., n-1). How many steps does this algorithm take? One way to put it is O(n) steps. This is correct. So is this algorithm polynomial? Well, it is linear in n, so it is polynomial in n. But if you look at what time complexity means, the algorithm is actually exponential. First, what is the length of the input to your problem? Well, if you provide the input n as an array of bits (the usual in practice), then the length of the input is, roughly, L = log n. Your algorithm thus takes O(n) = O(2^(log n)) = O(2^L) steps, so it is exponential in L. So the naive primality test is at the same time linear in n and exponential in the length of the input L. Both are correct. By the way, the AKS primality test is polynomial in the size of the input (thus, the primality test problem is in P).
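Here is a small Python sketch of that naive test with a step counter added (the counter and the function name are mine, purely for illustration, and n is assumed to be at least 2). Doubling the bit length of a prime input from 8 to 16 bits roughly squares the number of steps, which is what "exponential in L" looks like in practice.

```python
def trial_division_steps(n):
    """Naive primality test for n >= 2; returns (is_prime, number_of_divisions_tried)."""
    steps = 0
    for d in range(2, n):      # candidates 2, 3, ..., n-1
        steps += 1
        if n % d == 0:
            return False, steps
    return True, steps


# 251 needs 8 bits and 65521 needs 16 bits; both are prime (worst case for this test).
for p in (251, 65521):
    is_prime, steps = trial_division_steps(p)
    print(p, p.bit_length(), is_prime, steps)
```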
Fourth of all, what is P in the first place? Well, it is a class of problems that contains all decision problems that can be solved in polynomial time. What is a decision problem? A problem that can be answered with yes or no. Check these two Wikipedia pages for more details: P (complexity) and decision problems.
Coming back to your question, the answer is no (but pretty close to yes :p). The minimum 2-cut problem is in P if formulated as a decision problem (your formulation requires an answer that is not just a yes or no). At the same time, the algorithm that solves the problem in O(|V|^4) steps is polynomial in the size of the input. Why? Well, the input to the problem is the graph (i.e. vertices, edges and weights). To keep it simple, let's assume we use an adjacency/weights matrix (i.e. the length of the input is at least quadratic in |V|). So solving the problem in O(|V|^4) steps means polynomial in the size of the input. The algorithm that accomplishes this is a proof that the minimum 2-cut problem (if formulated as a decision problem) is in P.
A class related to P is FP and your problem (as you formulated it) belongs to this class.

What is fixed-parameter tractability? Why is it useful?

Some problems that are NP-hard are also fixed-parameter tractable, or FPT. Wikipedia describes a problem as fixed-parameter tractable if there's an algorithm that solves it in time f(k) · |x|^O(1).
What does this mean? Why is this concept useful?
To begin with, under the assumption that P ≠ NP, there are no polynomial-time, exact algorithms for any NP-hard problem. Although we don't know whether P = NP or P ≠ NP, we don't have any polynomial-time algorithms for any NP-hard problems.
The idea behind fixed-parameter tractability is to take an NP-hard problem, which we don't know any polynomial-time algorithms for, and to try to separate out the complexity into two pieces - some piece that depends purely on the size of the input, and some piece that depends on some "parameter" to the problem.
As an example, consider the 0/1 knapsack problem. In this problem, you're given a list of n objects that have associated weights and values, along with some maximum weight W that you're allowed to carry. The question is to determine the maximum amount of value that you can carry. This problem is NP-hard, meaning that no polynomial-time algorithm for it is known (and none exists unless P = NP). A brute-force method will take time around O(2^n) by considering all possible subsets of the items, which is extremely slow for large n. However, it is possible to solve this problem in time O(nW), where n is the number of elements and W is the amount of weight you can carry. If you look at the runtime O(nW), you'll notice that it's split into two parts: a component that's linear in the number of elements (the n part) and a component that's linear in the weight (the W part). If W is any fixed constant, then the runtime of this algorithm will be O(n), which is linear time, even though the problem in general is NP-hard. This means that if we treat W as some tunable "parameter" of the problem, then for any fixed value of this parameter, the problem can be solved in polynomial time (which is "tractable," in the complexity theory sense of the word).
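For reference, the O(nW) algorithm is the standard dynamic program over capacities. A minimal Python sketch, assuming integer weights (the function and parameter names are my own):

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack by dynamic programming in O(n * capacity) time."""
    # best[c] = maximum value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):   # n items
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

For example, `knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7)` returns 9 (take the items of weight 3 and 4). The two nested loops are exactly the n part and the W part of the runtime.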
As another example, consider the problem of finding long, simple paths in a graph. This problem is also NP-hard, and the naive algorithm for finding simple paths of length k in a graph takes time O(n! / (n - k)!), which for large k ends up being superexponential. However, using the technique of color-coding, it's possible to solve this problem in time O((2e)^k n^3 log n), where k is the length of the path to find and n is the number of nodes in the input graph. Notice that this runtime also has two "components": one component that's a polynomial in the number of nodes in the input graph (the n^3 log n part) and one component that's exponential in k (the (2e)^k part). This means that for any fixed value of k, there's a polynomial-time algorithm for finding length-k paths in the graph; the runtime will be O(n^3 log n).
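To give a feel for how color-coding works, here is a simplified Python sketch. It is not tuned to match the exact bound quoted above, and several things are my own assumptions: the graph is an adjacency list, k counts the vertices on the path, and the number of random trials (`trials=200`) is an arbitrary default. Each trial colors the vertices randomly with k colors and then uses dynamic programming over sets of colors to look for a path whose vertices all have different colors. Such a path is automatically simple. The method can miss an existing path in an unlucky run, but it never reports a path that doesn't exist.

```python
import random


def has_colorful_path(adj, colors, k):
    """True if some path on k vertices uses k pairwise different colors."""
    n = len(adj)
    # reach[v] = set of color bitmasks of colorful paths that end at v.
    reach = [{1 << colors[v]} for v in range(n)]
    for _ in range(k - 1):                       # extend paths one vertex at a time
        extended = [set() for _ in range(n)]
        for u in range(n):
            for mask in reach[u]:                # at most 2^k masks per vertex
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not mask & bit:           # color of v is not used yet
                        extended[v].add(mask | bit)
        reach = extended
    return any(reach[v] for v in range(n))


def has_simple_path(adj, k, trials=200, seed=0):
    """Randomized test for a simple path on k vertices (one-sided error)."""
    rng = random.Random(seed)
    for _ in range(trials):
        colors = [rng.randrange(k) for _ in range(len(adj))]
        if has_colorful_path(adj, colors, k):
            return True
    return False
```

The exponential part of the cost comes only from the 2^k possible color sets and the number of trials, both of which depend on k but not on n.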
In both of these cases, we can take a problem for which we have an exponential-time solution (or worse) and find a new solution whose runtime is some polynomial in n times some crazy-looking function of some extra "parameter." In the case of the knapsack problem, that parameter is the maximum amount of weight we can carry; in the case of finding long paths, the parameter is the length of the path to find. Generally speaking, a problem is called fixed-parameter tractable if there is some algorithm for solving the problem defined in terms of two quantities: n, the size of the input, and k, some "parameter," where the runtime is
O(p(n) · f(k))
Where p(n) is some polynomial function and f(k) is an arbitrary function in k. Intuitively, this means that the complexity of the problem scales polynomially with n (meaning that as only the problem size increases, the runtime will scale nicely), but can scale arbitrarily badly with the parameter k. This separates out the "inherent hardness" of the problem such that the "hard part" of the problem is blamed on the parameter k, while the "easy part" of the problem is charged to the size of the input.
Once you have a runtime that looks like O(p(n) · f(k)), we immediately get polynomial-time algorithms for solving the problem for any fixed k. Specifically, if k is fixed, then f(k) is some constant, so O(p(n) · f(k)) is just O(p(n)). This is a polynomial-time algorithm. Therefore, if we "fix" the parameter, we get back some "tractable" algorithm for solving the problem. This is the origin of the term fixed-parameter tractable.
(A note: Wikipedia's definition of fixed-parameter tractability says that the algorithm should have runtime f(k) · |x|^O(1). Here, |x| refers to the size of the input, which I've called n here. This means that Wikipedia's definition is the same as saying that the runtime is f(k) · n^O(1). As mentioned in this earlier answer, n^O(1) means "some polynomial in n," and so this definition ends up being equivalent to the one I've given here.)
Fixed-parameter tractability has enormous practical implications for a problem. It's common to encounter problems that are NP-hard. If you find a problem that's fixed-parameter tractable and the parameter is low, it can be significantly more efficient to use the fixed-parameter tractable algorithm than to use the normal brute-force algorithm. The color-coding example above for finding long paths in a graph, for example, has been used to great success in computational biology to find sequencing pathways in yeast cells, and the 0/1 knapsack solution is used frequently because common values of W are low enough for it to be practical.
Hope this helps!
I believe that the explanation by @templatetypedef already covers FPT in general quite comprehensively.
I would like to add that, in practice, the class of problems one is trying to solve quite often turns out to be FPT, as in the examples above.
For problems expressed as a set of constraints (e.g. SAT, CSP, ILP, etc.), a very common parameter is treewidth, which basically measures how close the structure of your problem is to a tree.
This allows one to split the problem into a tree of subproblems, which can then be solved more independently using dynamic programming.
In such cases, many problems are linear-time fixed-parameter tractable, that is, the complexity grows linearly with the number of components (i.e. the size of the system) but exponentially in the size of its biggest component.
Although explicit techniques can be used to solve the subproblems, symbolic representations are recommended in order to scale up to instances of more reasonable size.

Time complexity of one algorithm cascaded into another?

I am working with a random forest for a supervised classification problem, and I am using the k-means clustering algorithm to split the data at each node. I am trying to calculate the time complexity of the algorithm. From what I understand, the time complexity of k-means is
O(n · K · I · d )
where
n is the number of points,
K is the number of clusters,
I is the number of iterations, and
d is the number of attributes.
K, I and d are constants or have an upper bound, and n is much larger than these three, so I suppose the complexity is just O(n).
The random forest, on the other hand, is a divide-and-conquer approach, so for n instances the complexity is O(n log n), though I am not sure about this; correct me if I am wrong.
To get the complexity of the whole algorithm, do I just add these two things?
In this case, you don't add the values together. If you have a divide-and-conquer algorithm, the runtime is determined by a combination of
The number of subproblems made per call,
The sizes of those subproblems, and
The amount of work done per problem.
Changing any one of these parameters can wildly impact the overall runtime of the function. If you increase the number of subproblems made per call by even a small amount, you increase exponentially the number of total subproblems, which can have a large impact overall. Similarly, if you increase the work done per level, since there are so many subproblems the runtime can swing wildly. Check out the Master Theorem as an example of how to determine the runtime based on these quantities.
In your case, you are beginning with a divide-and-conquer algorithm where all you know is that the runtime is O(n log n) and are adding in a step that does O(n) work per level. Just knowing this, I don't believe it's possible to determine what the runtime will be. If, on the other hand, you make the assumption that
The algorithm always splits the input into two smaller pieces of roughly equal size,
The algorithm recursively processes those two pieces independently, and
The algorithm uses your O(n) algorithm to determine which split to make
Then you can conclude that the runtime is O(n log n), since the Master Theorem gives this as the solution to the recurrence T(n) = 2T(n/2) + O(n).
Without more information about the internal workings of the algorithm, though, I can't say for certain.
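If you want to sanity-check this numerically, the following Python sketch adds up the work of such a recursion, assuming each call on n items does exactly n units of work to choose its split (a stand-in for your O(n) k-means step). The `split` argument is a hypothetical function returning the size of the left piece. It also shows why the sizes of the pieces matter and not just their number.

```python
def total_work(n, split):
    """Total work of a recursion doing `size` units of work per call on `size` items."""
    total = 0
    stack = [n]                    # explicit stack instead of recursion
    while stack:
        size = stack.pop()
        if size <= 1:              # a single item needs no further splitting
            continue
        total += size              # linear work to choose the split
        left = split(size)
        stack.extend((left, size - left))
    return total


balanced = total_work(1024, lambda size: size // 2)   # 1024 * log2(1024) = 10240
lopsided = total_work(1024, lambda size: 1)           # 2 + 3 + ... + 1024 = 524799
```

With balanced splits the total is n log n, while always splitting off a single item gives roughly n^2 / 2, even though both versions make two subproblems per call.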
Hope this helps!

Resources