Optimal substructure for least number of perfect squares - algorithm

Question: I know how the recursion works but I can't seem to understand the 'optimal substructure' for this problem which necessitates the use of dynamic programming.
Problem: Find the least number of perfect squares that sum up to a given number.
Let's say we want to find the shortest path from U to V. If we have a node X in between, then the shortest path from U to V will be the shortest path from U to X plus the shortest path from X to V.
I am having difficulty understanding how the least squares problem follows the optimal substructure property.

To my understanding, the recurrence relation for the sum of perfect squares behaves similarly to the recurrence relation for shortest paths in the following way. Let
f(n) := minimum number of perfect squares which sum up to exactly n
then a suitable recurrence can be stated as
f(n) = min{ f(n-i) + f(i) : 0 < i < n }
which means that all partitions of the original argument into two summands have to be taken into account. Intuitively, the 'split point' for the shortest path problem is a node, whereas in the perfect squares problem it is the decision how to partition the argument into summands (which are then examined further).
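To make this concrete, here is a minimal memoized sketch of the recurrence in Python (my own illustration, not from the question). Note that turning the recurrence into code requires one base case the text leaves implicit: f(n) = 1 whenever n is itself a perfect square, since otherwise the minimum over splits never bottoms out.

    from functools import lru_cache
    from math import isqrt

    @lru_cache(maxsize=None)
    def f(n):
        # Base case: a perfect square is a sum of exactly one square.
        if isqrt(n) ** 2 == n:
            return 1
        # Optimal substructure: an optimal answer for n combines optimal
        # answers for the two summands of some split n = i + (n - i).
        return min(f(i) + f(n - i) for i in range(1, n))

    print(f(12))  # 3, since 12 = 4 + 4 + 4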

You did not state the property correctly; the third paragraph should be rephrased this way:
Let's say we want to find the shortest path from U to V. First verify whether U and V are linked directly; if not, compute, for each node X between U and V, the shortest path from U to X plus the shortest path from X to V. The smallest of these sums gives the shortest paths from U to V (there may be several of equal length).
This is applicable to your problem because you can determine the set of nodes X that lie between U and V. For U=0 and V=n, this set is all the numbers from 1 to n-1, because you are adding positive numbers.
For the solution, use an array to cache the shortest path from 0 to i, for i going from 0 to n; for each new value, a linear search yields the best solution, for an overall time complexity of O(n^2).
You can optimize the linear search by enumerating only the perfect squares less than or equal to n. This list is much shorter than the whole list of numbers: its length is actually sqrt(n), so the complexity of the overall search drops to O(n^(3/2)).
Each cache entry can be just a pair of integers: the length of the path and the value of an intermediary X that lies on one of the shortest paths. This gives a space complexity of O(n).
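A sketch of that solution in Python (the function name min_squares is mine); best[i] holds the pair just described, and walking the stored X values recovers one optimal decomposition:

    from math import isqrt

    def min_squares(n):
        # best[i] = (length of the shortest path from 0 to i, value of an
        # intermediary X on one such path), as described above.
        best = [(0, 0)] * (n + 1)
        for i in range(1, n + 1):
            # Enumerate only perfect-square steps s*s <= i: about sqrt(i) of them.
            best[i] = min((best[i - s * s][0] + 1, i - s * s)
                          for s in range(1, isqrt(i) + 1))
        terms, i = [], n
        while i > 0:
            terms.append(i - best[i][1])  # the square used in the last step
            i = best[i][1]
        return best[n][0], terms

    print(min_squares(13))  # (2, [9, 4]): 13 = 9 + 4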
The problem in question (find the least number of perfect squares that sum up to a given number) has been studied extensively for more than 17 centuries: Lagrange's four-square theorem, also known as Bachet's conjecture, was already known to Diophantus in the third century. It states that every natural number can be represented as the sum of four integer squares. Analytical methods exist for determining whether a given integer is the sum of 1, 2, 3, or 4 perfect squares.

Related

All paths of length k from a given vertex

Let's say I have a directed graph G(V,E) with edge costs (real numbers) in (0,1). For a given i, I need to find all the couples of vertices (i,j), starting from i, that "match". Two vertices (i,j) match if there is a directed path from i to j of length exactly k (k is a given number that is relatively small and can be considered constant) with cost >= C (C is a given number). The cost of a path is calculated as the product of the costs of its edges. For example, if a path starting from i and ending at j of length 2 consists of edges e1 and e2, then CostOfPath = cost(e1)*cost(e2).
This has to be done in O(E+V*k). What I thought of is modifying the DFS algorithm, updating the distances from the given starting vertex i until they reach length k; if they don't, we can't have a match. However, I am having a hard time figuring out what exactly I can modify in the DFS. Any ideas?
When you need to consider paths with a fixed number of edges in it, dynamic programming often comes to help (while other approaches often fail).
Let's denote by dp[v][j] the maximal cost of a path from vertex i (fixed) to vertex v that has exactly j edges.
For the starting values, you can set j == 1: dp[v][1] is the cost of the edge from i to v (or 0 if no such edge exists). Or, if you think about it, it becomes obvious that you can instead set values for j == 0: dp[i][0] is 1, while dp[v][0] is zero for v != i.
Now, if you have values for some j, it is easy to calculate values for j+1:
dp[v][j+1] = max( dp[v'][j] * cost((v', v)) ), over all edges (v', v) entering v
This is very similar to the Bellman-Ford algorithm, except that the latter does not need to track the number of edges and thus can use a one-dimensional array.
This gives you the solution in O((E+V)*k). Not exactly what you requested, but I doubt that a solution in O(E+V*k) exists.
(In the solution above, I assume that the constant C is positive, so that a zero-cost path is equivalent to the path being absent altogether. If you need to, you can account for the C == 0 case specifically.)
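As an illustration, here is a minimal sketch of this DP in Python (names are mine). It keeps only the current layer dp[.][j], giving O(E) work per round and O((E+V)*k) in total; per the note above, cost 0.0 encodes "no path", which assumes C > 0:

    def matches(n, edges, i, k, C):
        # edges: list of (u, v, cost) triples with cost in (0, 1)
        dp = [0.0] * n
        dp[i] = 1.0                 # j == 0: dp[i][0] = 1, dp[v][0] = 0 otherwise
        for _ in range(k):          # k rounds of the Bellman-Ford-like update
            ndp = [0.0] * n
            for u, v, cost in edges:
                ndp[v] = max(ndp[v], dp[u] * cost)
            dp = ndp
        # dp[v] is now the maximal cost over paths of exactly k edges from i to v.
        return [v for v in range(n) if dp[v] >= C]

    print(matches(3, [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.5)], 0, 2, 0.7))  # [2]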

Single Source Shortest Path with constraint lower bound of minimum cost

Problem description:
Given a graph G in adjacencyMatrix and adjacencyList form, inside which there is a source vertex s and a destination vertex d. Find the shortest path from s to d, with a constraint: the path cost c has a lower bound, i.e., c must be greater than or equal to an assigned lower bound N, but the smallest among the costs of all possible paths that are greater than or equal to N.
I understand that with this constraint conventional SSSP algorithms like Bellman-Ford cannot work correctly. How can I find an efficient algorithm for this problem?
I suppose you meant a walk, since a path cannot have cycles.
Unfortunately, the change-making problem, which is NP-complete, can easily be modeled as an instance of your problem.
Change-making problem: Given N types of coins of value c_i each, is it possible to change the number X with those N coins?
Modelling: Assume all c_i are even (if not, double every c_i and also X). Create N + 2 vertices, where the i-th vertex represents the i-th coin for 1 <= i <= N, and the (N+1)-th and (N+2)-th vertices have edges to all coin vertices with cost c_i / 2. The problem is then equivalent to finding a shortest walk of cost at least X, which is therefore NP-complete. The reduction should be obvious, but if further explanation is needed I can edit my answer.
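The edges among the coin vertices are left implicit above; one plausible completion (my assumption, not stated in the answer) joins every two coin vertices i and j with cost (c_i + c_j)/2, so that a walk s -> i_1 -> ... -> i_m -> t costs exactly c_{i_1} + ... + c_{i_m}. A sketch of that construction, matching the unbounded version of change-making where each coin type may be reused:

    def build_reduction(coins, X):
        # coins: the (even) values c_1..c_N; vertices 0..N-1 are the coins,
        # s and t are the (N+1)-th and (N+2)-th vertices of the answer.
        n = len(coins)
        s, t = n, n + 1
        edges = []
        for i, ci in enumerate(coins):
            edges.append((s, i, ci / 2))          # enter a coin: half its value
            edges.append((i, t, ci / 2))          # leave to t: the other half
            for j, cj in enumerate(coins):
                if i != j:                        # assumed coin-to-coin edges
                    edges.append((i, j, (ci + cj) / 2))
        # X is changeable iff the shortest s-t walk of cost >= X costs exactly X.
        return edges, s, t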

Find Path of a Specific Weight in a Weighted DAG

Given a DAG where all edges have a positive weight, and given a value N.
Is there an algorithm to find a simple path (no cycles or node repetitions) with total weight exactly N?
I am aware of the algorithm for finding a path with a given number of edges, but I am somewhat confused about how to handle a given path weight.
Can Dijkstra's algorithm be modified for this case? Or anything else?
This is NP-complete, so don't expect any reasonably fast (polynomial-time) algorithm. Here's a reduction from the NP-complete Subset Sum problem, where we are given a multiset of n integers X = {x_1, x_2, ..., x_n} and a number k, and asked whether any submultiset of the n numbers sums to exactly k:
Create a graph G with n+1 vertices v_1, v_2, ..., v_{n+1}. For each vertex v_i, add edges to every higher-numbered vertex v_j, and give all these edges weight x_i. This graph has O(n^2) edges and can be constructed in O(n^2) time. Clearly it contains no cycles.
Suppose the answer to the Subset Sum problem is YES: that is, there exists a submultiset Y of X such that the numbers in Y total exactly k. Specifically, let Y = {y_1, y_2, ..., y_m} consist of the m <= n indices 1 <= i <= n of the selected elements of X. Then there is a corresponding path in the graph G with exactly the same weight -- namely the path that starts at v_{y_1}, takes the edge to v_{y_2} (which has weight x_{y_1}), then takes the edge to v_{y_3}, and so on, finally arriving at v_{y_m} and taking a final edge (which has weight x_{y_m}) to the terminal vertex v_{n+1}.
In the other direction, suppose that there is a simple path in G of total weight exactly k. Since the path is simple, each vertex appears at most once. Thus each edge in the path leaves a unique vertex. For each vertex v_i in the path except the last, add x_i to a set of chosen numbers: these numbers correspond to the edge weights in the path, so clearly they sum to exactly k, implying that the solution to the Subset Sum problem is also YES. (Notice that the position of the final vertex in the path doesn't matter, since we only care about the vertex that it leaves, and all edges leaving a vertex have the same weight.)
A YES answer to either problem implies a YES answer to the other problem, so a NO answer to either problem implies a NO answer to the other problem. Thus the answer to any Subset Sum problem instance can be found by first constructing the specified instance of your problem in polynomial time, and then using any algorithm for your problem to solve that instance -- so if an algorithm exists that can solve any instance of your problem in polynomial time, the NP-hard Subset Sum problem can also be solved in polynomial time.
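For concreteness, a sketch of the construction in Python (0-indexed, with vertex n playing the role of v_{n+1}):

    def subset_sum_graph(xs):
        # Vertices 0..n; for each i < j, an edge (i, j) of weight xs[i].
        n = len(xs)
        return [(i, j, xs[i]) for i in range(n) for j in range(i + 1, n + 1)]

    # Example: xs = [3, 1, 4], k = 5. The simple path 1 -> 2 -> 3 has weight
    # xs[1] + xs[2] = 1 + 4 = 5, mirroring the submultiset {1, 4} of sum k.
    print(subset_sum_graph([3, 1, 4]))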

Complete Weighted Graph and Hamiltonian Tour

I ran into a question on a midterm exam. Can anyone clarify the answer?
Problem A: Given a Complete Weighted Graph G, find a Hamiltonian Tour with minimum weight.
Problem B: Given a Complete Weighted Graph G and Real Number R, does G have a Hamiltonian Tour with weight at most R?
Suppose there is a machine that solves B. How many times must we call B (each time giving it G and a real number R) to solve problem A with that machine? Suppose the sum of the edge weights in G is at most M.
1) We cannot do this, because there are uncountably many states.
2) O(|E|) times
3) O(lg M) times
4) Because A is NP-hard, this cannot be done.
First algorithm
The answer is (3), O(lg M) times. You just have to perform a binary search for the minimum-weight Hamiltonian tour in the weighted graph. Notice that if there is a Hamiltonian tour of weight L in the graph, there is no point in checking whether a tour of weight L' > L exists, since you are interested in the minimum-weight Hamiltonian tour. So, in each step of your algorithm you can eliminate half of the remaining possible tour weights. Consequently, you will have to call B on your machine O(lg M) times, where M stands for the total weight of all edges in the complete graph.
Edit:
Second algorithm
I have a slight modification of the above algorithm, which uses the machine O(|E|) times, since some people said that we cannot apply binary search on an uncountable set of possible values (and they are possibly right): take every possible subset of edges of the graph, and for each subset store the sum of the weights of its edges. Let's store the values for all the subsets in an array called Val. The size of this array is 2^|E|. Sort Val in increasing order, and then binary search for the minimum-weight Hamiltonian tour, but this time call the machine that solves problem B only with values from the Val array. Since every subset of edges is represented in the sorted array, the solution is guaranteed to be found. The total number of calls to the machine is O(lg(2^|E|)), which is O(|E|). So, the correct choice is (2), O(|E|) times.
Note:
The first algorithm I proposed is probably incorrect, as some people noted that you cannot apply binary search on an uncountable set. Since we are talking about real numbers, we cannot apply binary search in the range [0, M].
I believe the choice that was meant to be the answer is (1): you can't do that.
The reason is that you can only do binary search on countable sets.
Note that the edges of the graph may even have negative weights; besides, they may have fractional, or even irrational, weights. In that case, the search space for the answer will be the set of all real values less than M.
However, you may get arbitrarily close to the answer of A in log(n) steps, although you cannot find the exact answer (n being the size of a countable search space).
Supposing that, in the encoding of graphs, the weights are encoded as binary strings representing nonnegative integers, and that Problem B can actually be solved algorithmically by entering a real number and performing calculations based on it, things are apparently as follows.
It is possible first to do a binary search over the integral interval {0,...,M} to obtain the minimum weight of a Hamiltonian tour in O(log M) calls to the algorithm for Problem B. Afterwards, as the optimum is known, we can eliminate single edges of G and use the resulting graph as an input to the algorithm for Problem B to test whether or not the optimum changes. This process uses O(|E|) calls to the algorithm for Problem B to identify edges which occur in an optimal Hamiltonian tour. The overall running time of this approach is O((|E| + log M) * r(G)), where r(G) denotes the running time of the algorithm for Problem B on input G. I suppose that r is a polynomial, although the question does not explicitly state this; in total, the overall running time would then be polynomially bounded in the encoding length of the input, as M can be computed in polynomial time (and hence is pseudopolynomially bounded in the encoding length of the input G).
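A sketch of this two-phase approach in Python, treating the machine for Problem B as an abstract oracle B(edges, r) that reports whether the graph given by edges has a Hamiltonian tour of weight at most r (the interface and names are hypothetical):

    def solve_a(edges, M, B):
        # Phase 1: binary search over {0, ..., M}; O(log M) calls to B
        # find the minimum weight of a Hamiltonian tour.
        lo, hi = 0, M
        while lo < hi:
            mid = (lo + hi) // 2
            if B(edges, mid):
                hi = mid
            else:
                lo = mid + 1
        opt = lo
        # Phase 2: O(|E|) more calls recover one optimal tour. An edge may be
        # dropped iff the remaining graph still contains a tour of weight opt.
        remaining, tour = list(edges), []
        for e in list(remaining):
            remaining.remove(e)
            if not B(remaining, opt):   # every remaining optimal tour needs e
                remaining.append(e)
                tour.append(e)
        return opt, tour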
That being said, the supposed answers can be commented as follows.
1) The answer is wrong, as the set of necessary states is finite.
2) Might be true, but does not follow from the algorithm discussed above.
3) Might be true, but does not follow from the algorithm discussed above.
4) The answer is wrong. Strictly speaking, the NP-hardness of Problem A does not rule out a polynomial-time algorithm; furthermore, the algorithm for Problem B is not stated to be polynomial, so even P=NP does not follow if Problem A can be solved by a polynomial number of calls to the algorithm for Problem B (which is the case by the algorithm sketched above).

Minimum number of days required to solve a list of questions

There are N problems numbered 1..N which you need to complete. You've arranged the problems in increasing difficulty order, and the ith problem has estimated difficulty level i. You have also assigned a rating vi to each problem. Problems with similar vi values are similar in nature. On each day, you will choose a subset of the problems and solve them. You've decided that each subsequent problem solved on the day should be tougher than the previous problem you solved on that day. Also, to not make it boring, consecutive problems you solve should differ in their vi rating by at least K. What is the least number of days in which you can solve all problems?
Input:
The first line contains the number of test cases T. T test cases follow. Each case contains integers N and K on the first line, followed by integers v1,...,vn on the second line.
Output:
Output T lines, one for each test case, containing the minimum number of days in which all problems can be solved.
Constraints:
1 <= T <= 100
1 <= N <= 300
1 <= vi <= 1000
1 <= K <= 1000
Sample Input:
2
3 2
5 4 7
5 1
5 3 4 5 6
Sample Output:
2
1
This is one of the challenges from InterviewStreet.
Below is my approach:
Start from the first question, find the maximum number of questions that can be solved on one day, and remove these questions from the list. Then start again from the first element of the remaining list, and repeat until the list is empty.
I am getting a wrong answer with this method, so I am looking for an algorithm to solve this challenge.
Construct a DAG of problems in the following way. Let pi and pj be two different problems. Then we will draw a directed edge from pi to pj if and only if pj can be solved directly after pi on the same day, consecutively. Namely, the following conditions have to be satisfied:
i < j, because you should solve the less difficult problem earlier.
|vi - vj| >= K (the rating requirement).
Now notice that each subset of problems chosen to be solved on some day corresponds to a directed path in that DAG: you choose your first problem and then follow the edges step by step, each edge in the path corresponding to a pair of problems solved consecutively on the same day. Also, each problem can be solved only once, so any node in our DAG may appear in exactly one path. And you have to solve all the problems, so these paths should cover the whole DAG.
Now we have the following problem: given a DAG of n nodes, find the minimal number of vertex-disjoint directed paths that cover this DAG completely. This is a well-known problem called Path cover. Generally speaking, it is NP-hard; however, our directed graph is acyclic, and for acyclic graphs it can be solved in polynomial time using a reduction to the matching problem. The maximum matching problem, in its turn, can be solved using the Hopcroft-Karp algorithm, for example. The exact reduction method is easy and can be read, say, on Wikipedia: for each directed edge (u, v) of the original DAG, one adds an undirected edge (a_u, b_v) to a bipartite graph whose two parts {a_i} and {b_i} each have size n.
The number of nodes in each part of the resulting bipartite graph equals the number of nodes in the original DAG, n. We know that the Hopcroft-Karp algorithm runs in O(n^2.5) in the worst case, and 300^2.5 ≈ 1,558,845. For 100 tests this algorithm should take under 1 second in total.
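A compact sketch of this solution in Python (names are mine). For brevity it uses Kuhn's augmenting-path matching instead of Hopcroft-Karp, which is O(V*E) rather than O(n^2.5) but still comfortable for N <= 300; the minimum path cover of a DAG equals n minus the maximum matching:

    def min_days_path_cover(v, K):
        n = len(v)
        # adj[i] = problems solvable directly after problem i on the same day:
        # higher difficulty (i < j) and rating difference at least K.
        adj = [[j for j in range(i + 1, n) if abs(v[i] - v[j]) >= K]
               for i in range(n)]
        match = [-1] * n                  # match[j] = left vertex matched to b_j

        def augment(i, seen):
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    if match[j] == -1 or augment(match[j], seen):
                        match[j] = i
                        return True
            return False

        matching = sum(augment(i, set()) for i in range(n))
        return n - matching               # minimum path cover = n - max matching

    print(min_days_path_cover([5, 4, 7], 2))        # 2, the first sample
    print(min_days_path_cover([5, 3, 4, 5, 6], 1))  # 1, the second sample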
The algorithm is simple. First, sort the problems by v_i; then, for each problem, count the number of problems whose rating lies in the interval (v_i - K, v_i]. The maximum of those counts is the result. The second phase can be done in O(n), so the most costly operation is sorting, making the whole algorithm O(n log n). Look here for a spreadsheet demonstration of the algorithm on your data with K=35.
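A sketch of this algorithm (the function name is mine); bisect_right(v, x) - bisect_right(v, x - K) counts the ratings lying in (x - K, x]:

    from bisect import bisect_right

    def min_days_by_counting(v, K):
        v = sorted(v)                     # O(n log n), the dominant cost
        # For each rating x, count the ratings in the interval (x - K, x].
        return max(bisect_right(v, x) - bisect_right(v, x - K) for x in v)

    print(min_days_by_counting([5, 4, 7], 2))  # 2, matching the first sample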
Why does this work
Let's reformulate the problem as a graph coloring problem. We create a graph G as follows: the vertices are the problems, and there is an edge between two problems iff |v_i - v_j| < K.
In such a graph, independent sets correspond exactly to sets of problems doable on the same day. (<=) If a set can be done on one day, it is surely an independent set. (=>) If a set does not contain two problems violating the K-difference criterion, you can just sort the problems by difficulty and solve them in that order; both conditions will be satisfied.
Therefore, it easily follows that colorings of the graph G correspond exactly to schedules of the problems over different days, with each color corresponding to one day.
So we want to find the chromatic number of G. This becomes easy once we recognize that G is an interval graph, and hence a perfect graph; for perfect graphs the chromatic number equals the clique number, and both can be found by a simple algorithm.
Interval graphs are the graphs of intervals on the real line, with edges between intervals that intersect. Our graph, as can easily be seen, is an interval graph: for each problem, assign the interval (v_i - K, v_i]; the edges of this interval graph are then exactly the edges of our graph.
Lemma 1: In an interval graph, there exists a vertex whose neighbors form a clique.
The proof is easy: just take the interval with the lowest upper bound (or the highest lower bound) of all. Any interval intersecting it has a higher upper bound, so the upper bound of the first interval is contained in the intersection of them all. Therefore, the neighbors intersect each other and form a clique. qed
Lemma 2: In a family of graphs closed under induced subgraphs and having the property from Lemma 1 (the existence of a vertex whose neighbors form a clique), the following algorithm produces a minimal coloring:
Find a vertex x whose neighbors form a clique.
Remove x from the graph, obtaining the subgraph G'.
Color G' recursively.
Color x with the least color not found among its neighbors.
Proof: In step (3), the algorithm produces an optimal coloring of the subgraph G' by the induction hypothesis, together with the closure of our family under induced subgraphs. In step (4), the algorithm only introduces a new color n if there is a clique of size n-1 among the neighbors of x; that means that, together with x, there is a clique of size n in G, so the chromatic number of G must be at least n. Therefore, the color given by the algorithm to any vertex is always <= the chromatic number of G, which means the coloring is optimal. (Obviously, the algorithm produces a valid coloring.) qed
Corollary: Interval graphs are perfect (perfect <=> chromatic number == clique number).
So we just have to find the clique number of G. That is easy for interval graphs: you process the segments of the real line containing no interval boundaries and count the number of intervals intersecting there, which is even easier in your case, where the intervals have uniform length. This leads to the algorithm outlined at the beginning of this post.
Do we really need to go to path cover? Can't we just follow a strategy similar to LIS?
The input is in increasing order of difficulty. We just have to maintain a bunch of queues of tasks, one per day. Every element of the input is assigned to a day by comparing it with the last elements of all the queues: wherever we find a difference of at least K, we append the task to that list.
For example: 5 3 4 5 6, with k = 2:
1) 5 (no lists yet, so start a new one)
5
2) 3 (the only list ends in 5, and abs(5-3) = 2 = k, so append 3)
5 --> 3
3) 4 (the only list ends in 3, and abs(3-4) < k, so start a new list)
5 --> 3
4
4) 5 again (abs(3-5) = k, so append)
5 --> 3 --> 5
4
5) 6 (abs(5-6) < k, but abs(4-6) = k, so append to the second list)
5 --> 3 --> 5
4 --> 6
We just need to maintain an array of the last elements of each list. Since the order of the days (when the tasks are to be done) is not important, we can maintain the last tasks in sorted order; finding a place for a new task vi is then just a search for a last value at distance at least k from vi, which can be done via binary search. (A code sketch follows the complexity analysis below.)
Complexity:
The loop runs over N elements. In the worst case, we may end up doing a binary search (log i) for each input[i].
Therefore, T(N) < O(log N!) = O(N log N). A closer analysis shows that the lower bound is N log N as well, so the complexity is Θ(N log N).
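Here is the promised sketch (names are mine). Which eligible list to extend when several qualify is a design choice the description leaves open; this version prefers the largest last value <= vi - k, then the smallest >= vi + k:

    from bisect import bisect_left, bisect_right, insort

    def min_days_greedy(v, k):
        lasts = []                             # sorted last elements, one per day
        for x in v:
            i = bisect_right(lasts, x - k) - 1 # largest last <= x - k
            if i < 0:
                i = bisect_left(lasts, x + k)  # else smallest last >= x + k
                if i == len(lasts):
                    insort(lasts, x)           # no eligible day: start a new one
                    continue
            lasts.pop(i)                       # extend that day's list with x
            insort(lasts, x)
        return len(lasts)

    print(min_days_greedy([5, 3, 4, 5, 6], 2))  # 2, reproducing the walkthrough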
The minimum number of days required will be the same as the length of the longest path in the complementary (undirected) graph of the DAG G. This can be solved using Dilworth's theorem.
