Finding maximum subsequence below or equal to a certain value - algorithm

I'm learning dynamic programming and I've been having a great deal of trouble understanding more complex problems. When given a problem, I've been taught to find a recursive algorithm, memoize the recursive algorithm and then create an iterative, bottom-up version. At almost every step I have an issue. In terms of the recursive algorithm, I write different different ways to do recursive algorithms, but only one is often optimal for use in dynamic programming and I can't distinguish what aspects of a recursive algorithm make memoization easier. In terms of memoization, I don't understand which values to use for indices. For conversion to a bottom-up version, I can't figure out which order to fill the array/double array.
This is what I understand:
- it should be possible to split the main problem to subproblems
In terms of the problem mentioned, I've come up with a recursive algorithm that has these important lines of code:
int optionOne = values[i] + find(values, i+1, limit - values[i]);
int optionTwo = find(values, i+1, limit);
If I'm unclear or this is not the correct qa site, let me know.
Edit:
Example: Given array x: [4,5,6,9,11] and max value m: 20
Maximum subsequence in x under or equal to m would be [4,5,11] as 4+5+11 = 20

I think this problem is NP-hard, meaning that unless P = NP there isn't a polynomial-time algorithm for solving the problem.
There's a simple reduction from the subset-sum problem to this problem. In subset-sum, you're given a set of n numbers and a target number k and want to determine whether there's a subset of those numbers that adds up to exactly k. You can solve subset-sum with a solver for your problem as follows: create an array of the numbers in the set and find the largest subsequence whose sum is less than or equal to k. If that adds up to exactly k, the set has a subset that adds up to k. Otherwise, it does not.
This reduction takes polynomial time, so because subset-sum is NP-hard, your problem is NP-hard as well. Therefore, I doubt there's a polynomial-time algorithm.
That said - there is a pseudopolynomial-time algorithm for subset-sum, which is described on Wikipedia. This algorithm uses DP in two variables and isn't strictly polynomial time, but it will probably work in your case.
Hope this helps!

Related

Why based on these, the partition problem cannot be concluded that P = NP?

I have learned that the Partition problem is included in the NP-Hard problem. I have done some searching about this Problem and seem to not find the reason why is for a certain problem it cannot be concluded that P = NP for a certain algorithm.
If Partition problem can be solved in time 𝑂 (𝑛𝑀) where 𝑛 is the number of elements in the set, and 𝑀 is the sum of the absolute values of the elements in the set. Why based on these, it cannot be concluded that P = NP?
The definition of P and NP states that an algorithm (non-deterministic in case of NP) exists that runs in a polynomial time. The trick is that the argument of the polynomial is the size of the problem, which is defined as the number of bits that encode the problem instance.
The trick with the Partition problem is that numbers are exponential in terms of bits encoding them, so 𝑂(𝑛𝑀) is actually exponential in terms of the size of the encoding.
The class of NP-complete problems where a deterministic polynomial time algorithm exists where the argument of the polynomial is the sum of numbers that define the problem is called weakly NP-complete. Partition, backpack and similar are in this class.
Related Wikipedia entry: https://en.wikipedia.org/wiki/Weak_NP-completeness
Because M is the sum of the absolute values of the elements. It's not polynomial to n, and thus it is not a polynomial time algorithm. Same idea with the dynamic programming solution for the knapsack problem.

Minimum Cut in undirected graphs

I would like to quote from Wikipedia
In mathematics, the minimum k-cut, is a combinatorial optimization
problem that requires finding a set of edges whose removal would
partition the graph to k connected components.
It is said to be the minimum cut if the set of edges is minimal.
For a k = 2, It would mean Finding the set of edges whose removal would Disconnect the graph into 2 connected components.
However, The same article of Wikipedia says that:
For a fixed k, the problem is polynomial time solvable in O(|V|^(k^2))
My question is Does this mean that minimum 2-cut is a problem that belongs to complexity class P?
The min-cut problem is solvable in polynomial time and thus yes it is true that it belongs to complexity class P. Another article related to this particular problem is the Max-flow min-cut theorem.
First of all, the time complexity an algorithm should be evaluated by expressing the number of steps the algorithm requires to finish as a function of the length of the input (see Time complexity). More or less formally, if you vary the length of the input, how would the number of steps required by the algorithm to finish vary?
Second of all, the time complexity of an algorithm is not exactly the same thing as to what complexity class does the problem the algorithm solves belong to. For one problem there can be multiple algorithms to solve it. The primality test problem (i.e. testing if a number is a prime or not) is in P, but some (most) of the algorithms used in practice are actually not polynomial.
Third of all, in the case of most algorithms you'll find on the Internet evaluating the time complexity is not done by definition (i.e. not as a function of the length of the input, at least not expressed directly as such). Lets take the good old naive primality test algorithm (the one in which you take n as input and you check for division by 2,3...n-1). How many steps does this algo take? One way to put it is O(n) steps. This is correct. So is this algorithm polynomial? Well, it is linear in n, so it is polynomial in n. But, if you take a look at what time complexity means, the algorithm is actually exponential. First, what is the length of the input to your problem? Well, if you provide the input n as an array of bits (the usual in practice) then the length of the input is, roughly said, L = log n. Your algorithm thus takes O(n)=O(2^log n)=O(2^L) steps, so exponential in L. So the naive primality test is in the same time linear in n, but exponential in the length of the input L. Both correct. Btw, the AKS primality test algorithm is polynomial in the size of input (thus, the primality test problem is in P).
Fourth of all, what is P in the first place? Well, it is a class of problems that contains all decision problems that can be solved in polynomial time. What is a decision problem? A problem that can be answered with yes or no. Check these two Wikipedia pages for more details: P (complexity) and decision problems.
Coming back to your question, the answer is no (but pretty close to yes :p). The minimum 2-cut problem is in P if formulated as a decision problem (your formulation requires an answer that is not just a yes-or-no). In the same time the algorithm that solves the problem in O(|V|^4) steps is a polynomial algorithm in the size of the input. Why? Well, the input to the problem is the graph (i.e. vertices, edges and weights), to keep it simple lets assume we use an adjacency/weights matrix (i.e. the length of the input is at least quadratic in |V|). So solving the problem in O(|V|^4) steps means polynomial in the size of the input. The algorithm that accomplishes this is a proof that the minimum 2-cut problem (if formulated as decision problem) is in P.
A class related to P is FP and your problem (as you formulated it) belongs to this class.

What is fixed-parameter tractability? Why is it useful?

Some problems that are NP-hard are also fixed-parameter tractable, or FPT. Wikipedia describes a problem as fixed-parameter tractable if there's an algorithm that solves it in time f(k) Β· |x|O(1).
What does this mean? Why is this concept useful?
To begin with, under the assumption that P β‰  NP, there are no polynomial-time, exact algorithms for any NP-hard problem. Although we don't know whether P = NP or P β‰  NP, we don't have any polynomial-time algorithms for any NP-hard problems.
The idea behind fixed-parameter tractability is to take an NP-hard problem, which we don't know any polynomial-time algorithms for, and to try to separate out the complexity into two pieces - some piece that depends purely on the size of the input, and some piece that depends on some "parameter" to the problem.
As an example, consider the 0/1 knapsack problem. In this problem, you're given a list of n objects that have associated weights and values, along with some maximum weight W that you're allowed to carry. The question is to determine the maximum amount of value that you can carry. This problem is NP-hard, meaning that there's no polynomial-time algorithm that solves it. A brute-force method will take time around O(2n) by considering all possible subsets of the items, which is extremely slow for large n. However, it is possible to solve this problem in time O(nW), where n is the number of elements and W is the amount of weight you can carry. If you look at the runtime O(nW), you'll notice that it's split into two parts: a component that's linear in the number of elements (the n part) and a component that's linear in the weight (the W part). If W is any fixed constant, then the runtime of this algorithm will be O(n), which is linear-time, even though the problem in general is NP-hard. This means that if we treat W as some tunable "parameter" of the problem, for any fixed value of this parameter, the problem ends up running in polynomial time (which is "tractable," in the complexity theory sense of the word.)
As another example, consider the problem of finding long, simple paths in a graph. This problem is also NP-hard, and the naive algorithm for finding simple paths of length k in a graph takes time O(n! / (n - k)!), which for large k ends up being superexponential. However, using the technique of color-coding, it's possible to solve this problem in time O((2e)kn3 log n), where k is the length of the path to find and n is the number of nodes in the input graph. Notice that this runtime also has two "components:" one component that's a polynomial in the number of nodes in the input graph (the n3 log n part) and one component that's exponential in k (the (2e)k part). This means that for any fixed value of k, there's a polynomial-time algorithm for finding length-k paths in the graph; the runtime will be O(n3 log n).
In both of these cases, we can take a problem for which we have an exponential-time solution (or worse) and find a new solution whose runtime is some polynomial in n times some crazy-looking function of some extra "parameter." In the case of the knapsack problem, that parameter is the maximum amount of weight we can carry; in the case of finding long paths, the parameter is the length of the path to find. Generally speaking, a problem is called fixed-parameter tractable if there is some algorithm for solving the problem defined in terms of two quantities: n, the size of the input, and k, some "parameter," where the runtime is
O(p(n) Β· f(k))
Where p(n) is some polynomial function and f(k) is an arbitrary function in k. Intuitively, this means that the complexity of the problem scales polynomially with n (meaning that as only the problem size increases, the runtime will scale nicely), but can scale arbitrarily badly with the parameter k. This separates out the "inherent hardness" of the problem such that the "hard part" of the problem is blamed on the parameter k, while the "easy part" of the problem is charged to the size of the input.
Once you have a runtime that looks like O(p(n) Β· f(k)), we immediately get polynomial-time algorithms for solving the problem for any fixed k. Specifically, if k is fixed, then f(k) is some constant, so O(p(n) Β· f(k)) is just O(p(n)). This is a polynomial-time algorithm. Therefore, if we "fix" the parameter, we get back some "tractable" algorithm for solving the problem. This is the origin of the term fixed-parameter tractable.
(A note: Wikipedia's definition of fixed-parameter tractability says that the algorithm should have runtime f(k) Β· |x|O(1). Here, |x| refers to the size of the input, which I've called n here. This means that Wikipedia's definition is the same as saying that the runtime is f(k) Β· nO(1). As mentioned in this earlier answer, nO(1) means "some polynomial in n," and so this definition ends up being equivalent to the one I've given here).
Fixed-parameter tractability has enormous practical implications for a problem. It's common to encounter problems that are NP-hard. If you find a problem that's fixed-parameter tractable and the parameter is low, it can be significantly more efficient to use the fixed-parameter tractable algorithm than to use the normal brute-force algorithm. The color-coding example above for finding long paths in a graph, for example, has been used to great success in computational biology to find sequencing pathways in yeast cells, and the 0/1 knapsack solution is used frequently because common values of W are low enough for it to be practical.
Hope this helps!
I believe that the explanation of #templatetypedef was already quite comprehensive of the generality of FPT.
I would like to add that in practice, it appears quite often that the class of problem one is trying to solve is FPT, such as above examples.
In the case of problems expressed as set of constraints (e.g. SAT, CSP, ILP, etc.) a very common parameter is treewidth, which basically explicits how much your problem is organized as a tree.
This allows to split ones problem into a tree of subproblems which can then be solved more individually using dynamic programming.
In such case, many problems are linear-time fixed-parameter tractable, that is the complexity grows linearly with the number of components (i.e. the size of the system) by exponentially in the size of its biggest component.
Although the use of explicit techniques is possible to solve sub-problems is possible, in order to scale-up to more reasonnable instances, using symbolic representations is recomended.

Dynamic programming aspect in Kadane's algorithm

Initialize:
max_so_far = 0
max_ending_here = 0
Loop for each element of the array
(a) max_ending_here = max_ending_here + a[i]
(b) if(max_ending_here < 0)
max_ending_here = 0
(c) if(max_so_far < max_ending_here)
max_so_far = max_ending_here
return max_so_far
Can anyone help me in understanding the optimal substructure and overlapping problem(bread and butter of DP) i the above algo?
According to this definition of overlapping subproblems, the recursive formulation of Kadane's algorithm (f[i] = max(f[i - 1] + a[i], a[i])) does not exhibit this property. Each subproblem would only be computed once in a naive recursive implementation.
It does however exhibit optimal substructure according to its definition here: we use the solution to smaller subproblems in order to find the solution to our given problem (f[i] uses f[i - 1]).
Consider the dynamic programming definition here:
In mathematics, computer science, and economics, dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems1 and optimal substructure (described below). When applicable, the method takes far less time than naive methods that don't take advantage of the subproblem overlap (like depth-first search).
The idea behind dynamic programming is quite simple. In general, to solve a given problem, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often when using a more naive method, many of the subproblems are generated and solved many times. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations
This leaves room for interpretation as to whether or not Kadane's algorithm can be considered a DP algorithm: it does solve the problem by breaking it down into easier subproblems, but its core recursive approach does not generate overlapping subproblems, which is what DP is meant to handle efficiently - so this would put it outside DP's specialty.
On the other hand, you could say that it is not necessary for the basic recursive approach to lead to overlapping subproblems, but this would make any recursive algorithm a DP algorithm, which would give DP a much too broad scope in my opinion. I am not aware of anything in the literature that definitely settles this however, so I wouldn't mark down a student or disconsider a book or article either way they labeled it.
So I would say that it is not a DP algorithm, just a greedy and / or recursive one, depending on the implementation. I would label it as greedy from an algorithmic point of view for the reasons listed above, but objectively I would consider other interpretations just as valid.
Note that I derived my explanation from this answer. It demonstrates how Kadane’s algorithm can be seen as a DP algorithm which has overlapping subproblems.
Identifying subproblems and recurrence relations
Imagine we have an array a from which we want to get the maximum subarray. To determine the max subarray that ends at index i the following recursive relation holds:
max_subarray_to(i) = max(max_subarray_to(i - 1) + a[i], a[i])
In order to get the maximum subarray of a we need to compute max_subarray_to() for each index i in a and then take the max() from it:
max_subarray = max( for i=1 to n max_subarray_to(i) )
Example
Now, let's assume we have an array [10, -12, 11, 9] from which we want to get the maximum subarray. This would be the work required running Kadane's algorithm:
result = max(max_subarray_to(0), max_subarray_to(1), max_subarray_to(2), max_subarray_to(3))
max_subarray_to(0) = 10 # base case
max_subarray_to(1) = max(max_subarray_to(0) + (-12), -12)
max_subarray_to(2) = max(max_subarray_to(1) + 11, 11)
max_subarray_to(3) = max(max_subarray_to(2) + 9, 49)
As you can see, max_subarray_to() is evaluated twice for each i apart from the last index 3, thus showing that Kadane's algorithm does have overlapping subproblems.
Kadane's algorithm is usually implemented using a bottom up DP approach to take advantage of the overlapping subproblems and to only compute each subproblem once, hence turning it to O(n).

P versus NP Clarification

Quoted from wikipedia, the P vs NP problem, regarding the time complexity of algorithms "... asks whether every problem whose solution can be quickly verified by a computer can also be quickly solved by a computer."
I am hoping that somebody can clarify what the difference between "verifying the problem" and "solving the problem" is here.
I am hoping that somebody can clarify what the difference between "verifying the problem" and "solving the problem" is here.
It's not "verifying the problem", but "verifying the solution". For example you can check in polynomial time whether a given set is valid for SAT. The actual generation of such set is NP hard. The section Verifier-based definition in the Wikipedia article NP (complexity) could help you a little bit:
Verifier-based definition
In order to explain the verifier-based definition of NP, let us consider the subset sum problem: Assume that we are given some integers, such as {βˆ’7, βˆ’3, βˆ’2, 5, 8}, and we wish to know whether some of these integers sum up to zero. In this example, the answer is "yes", since the subset of integers {βˆ’3, βˆ’2, 5} corresponds to the sum (βˆ’3) + (βˆ’2) + 5 = 0. The task of deciding whether such a subset with sum zero exists is called the subset sum problem.
As the number of integers that we feed into the algorithm becomes larger, the number of subsets grows exponentially, and in fact the subset sum problem is NP-complete. However, notice that, if we are given a particular subset (often called a certificate), we can easily check or verify whether the subset sum is zero, by just summing up the integers of the subset. So if the sum is indeed zero, that particular subset is the proof or witness for the fact that the answer is "yes". An algorithm that verifies whether a given subset has sum zero is called verifier. A problem is said to be in NP if and only if there exists a verifier for the problem that executes in polynomial time. In case of the subset sum problem, the verifier needs only polynomial time, for which reason the subset sum problem is in NP.
If you're more into graph theory the Hamilton cycle is a NP-complete problem. It's trivial to check whether a given solution is a Hamilton cycle (linear complexity, traverse the solution path), but if P != NP than there exists no algorithm with polynomial runtime which solves the problem.
Maybe the term "quick" is misleading in this context. An algorithm is quick in this regard if and only if it's worst-case runtime is bounded by a polynomial function, such as O(n) or O(n log n). The creation of all permutations for a given range with the length n is not bounded, as you have n! different permutations. This means that the problem could be solved in n 100 log n, which will take a very long time, but this is still considered fast. On the other hand, one of the first algorithms for TSP was O(n!) and another one was O(n2 2n). And compared to polynomial functions these things grow really, really fast.
The RSA encryption uses prime numbers like following: two big prime numbers P and Q (200-400 digits each) are multiplied to form the public key N. N=P*Q
To break the encryption one needs to figure out P and Q given the N.
While finding P and Q is very difficult and can take years, verifing the solution is just about multiplying P by Q and comparing with N.
So solving the problem is very hard while it takes nothing to verifying the solution.
P.S. The example is only part of the RSA simplified for this question. The real RSA is much more complex.

Resources