So I've come across an interesting problem I'd like to solve. I ran into it while trying to solve a game with nondeterministic transitions. If you've ever heard of this problem, or know whether it has a name or any papers written about it, let me know! Here it is.
Given n boxes and m elements, where box n1 has i1 elements, box n2 has i2 elements, etc. (i.e. i1 + i2 + ... + in = m). Each element has a weight w and a value v. Find a selection of exactly one element from each of the n boxes (so the solution has size n) such that the total value is maximized and the total weight is <= k (some input parameter).
The first thing I noticed is that there are i1*i2*...*in possible solutions. This is less than m choose n, which is less than 2^m, so does that mean the problem is in P (sorry, my math is a little fuzzy)? Does anyone have an idea for an algorithm that does not involve iterating over every solution? Approximations are fine!
Edit: Okay, so this problem is actually identical to the knapsack problem, so it's NP-hard. Let each box have two elements: one of zero size and zero value, and one of nonzero size and nonzero value. That is exactly 0/1 knapsack. Can anyone think of a clever pseudopolynomial-time algorithm, or a conversion to knapsack?
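To make the reduction in this edit concrete, here it is as a couple of lines of Python (the function name and the (weight, value) pair representation are mine):

```python
# Sketch of the reduction described above: every 0/1 knapsack item
# (weight, value) becomes a box holding a "skip" element (0, 0) and a
# "take" element (weight, value). Picking exactly one element per box
# is then exactly deciding take/skip for each item.
def knapsack_to_boxes(items):
    return [[(0, 0), (w, v)] for (w, v) in items]
```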
This looks close enough to http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem that almost the same definition of m[i, w] as given there will work: let m[i, w] be the maximum value that can be obtained with weight <= w using boxes up to i. The only difference is that at each stage, instead of considering whether or not to take an item, you consider which of the possible items in that box to take.
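For what it's worth, here is a minimal sketch of that DP in Python; it assumes `boxes` is a list of lists of (weight, value) pairs and `k` is the weight budget (the names and representation are my own). It is pseudopolynomial, roughly O(n * k * box size), which answers the edit above.

```python
def max_value(boxes, k):
    NEG = float("-inf")
    # best[w] = best total value picking one item from each box so far,
    # with total weight exactly w (NEG means "not achievable")
    best = [NEG] * (k + 1)
    best[0] = 0
    for box in boxes:
        new = [NEG] * (k + 1)
        for w in range(k + 1):
            if best[w] == NEG:
                continue
            for weight, value in box:
                if w + weight <= k and best[w] + value > new[w + weight]:
                    new[w + weight] = best[w] + value
        best = new
    return max(best)  # -inf if no one-per-box selection fits within k
```

For example, max_value([[(1, 3), (2, 5)], [(2, 4), (3, 7)]], 4) returns 10: take (1, 3) from the first box and (3, 7) from the second.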
Related
You are given an array of positive integers of size N. You can choose any positive number x such that x <= max(Array) and subtract it from all elements of the array greater than or equal to x.
This operation has a cost A[i] - x for each A[i] >= x, so the total cost of a step is sum(A[i] - x). A step is only valid if sum(A[i] - x) is less than or equal to a given number K.
Over all sequences of valid steps, find the minimum number of steps needed to make all elements of the array zero.
0 <= i < 10^5
0 <= x <= 10^5
0 < K < 10^5
Can anybody help me with an approach? DP will not work due to the high constraints.
Just some general exploratory thoughts.
First, there should be a constraint on N. If N is 3, this is much easier than if it is 100. The naive brute-force approach is going to be O(k^N).
Next, you are right that DP will not work with these constraints.
For a greedy approach, I would want to minimize the number of distinct non-zero values, not maximize how much I take out each step. Our worst-case approach is to take out the largest value each time, for N steps. If you can get two pairs of entries to match, that shortens the approach.
The obvious thing to try, if you can, is an A* search. However, that requires a LOWER bound (not an upper one). The best naive lower bound that I can see is ceil(log_2(count_distinct_values)). Unless you're incredibly lucky and the problem can be solved that quickly, this is unlikely to narrow your search enough to be helpful.
I'm curious what trick makes this problem actually doable.
I do have an idea, but it is going to take some thought to make it work. Naively, we want to try each choice for x and explore the paths from there. This is a problem because there are 10^5 choices for x: after 2 choices we have a problem, and after 3 we are definitely not going to be able to do it.
BUT instead, consider the possible orderings of the array elements (with ties both possible and encouraged) and the resulting inequalities on the range of choices that could have produced them. Now, instead of having to store 10^5 choices of x, we only need to store the distinct orderings we get, and the inequalities on the range of choices that get us there. As long as N < 10, the number of weak orderings is something we can deal with if we're clever.
It would take a bunch of work to flesh out this idea though.
I may be totally wrong, and if so, please tell me and I'll delete these thoughts: maybe there is an opportunity if we translate the problem into another form?
You are given an array A of positive integers of size N. Calculate the histogram H of this array. The highest populated slot of this histogram has index m (== max(A)). Find the shortest sequence of selections of x as follows:
1) Select an index x <= m which satisfies sum(H[i]*(i-x)) <= K for i = x+1 .. m (the search for a suitable x starts from m and goes down).
2) Add H[x .. m] onto H[0 .. m-x].
3) Set the new m to the highest populated index in H[0 .. x-1] (we ignore everything from H[x] up).
4) Repeat until m == 0.
If only a "good" but not necessarily optimal solution is sought, I could imagine that some kind of spectral analysis of H could hint toward favorable x selections, so that maxima in the histogram pile onto other maxima in the reduction step.
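Purely to illustrate the mechanics of the loop above (not an optimal strategy), here is one possible reading of it in Python. Since x = m always has cost 0, I interpret "search from m down" as taking the smallest still-valid x each step, and to be safe I rescan the whole histogram for the new maximum; all naming is mine.

```python
def reduce_steps(A, K):
    m = max(A)
    H = [0] * (m + 1)
    for a in A:
        H[a] += 1
    steps = 0
    while m > 0:
        # cost of choosing x is sum(H[i] * (i - x)) for i = x .. m, which
        # only grows as x decreases; scan down from m while it fits in K
        x = m
        while x > 1 and sum(H[i] * (i - (x - 1))
                            for i in range(x - 1, m + 1)) <= K:
            x -= 1
        # one step: every value >= x drops by x, i.e. fold H[x..m] down
        for i in range(x, m + 1):
            H[i - x] += H[i]
            H[i] = 0
        steps += 1
        # recompute the highest populated index (scanning the whole
        # histogram rather than only H[0..x-1], to be safe)
        m = max((i for i in range(1, m + 1) if H[i]), default=0)
    return steps
```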
I am solving the following problem from hackerrank
https://www.hackerrank.com/challenges/coin-change/problem
I'm unable to solve the problem, so I looked at the editorial, which mentions
T(i, m) = T(i, m - i) + T(i + 1, m)
I'm unable to get the big picture of why this solution works at a higher level (something like a proof in CLRS, or a simple, understandable example).
The solution I have written is as follows:
fun(m) {
    // base cases
    count = 0;
    for (i = 1..n) {
        count += fun(m - i);
    }
    return count;
}
My solution didn't work because it makes some duplicate calls. But how does the editorial's recurrence work, and what is the difference between my solution and the editorial's at a higher level?
I think in order for this to work you have to clearly define what T is. Namely, let's define T(i,m) to be the number of ways to make change for m units using only coins with index at least i (i.e. we only look at the ith coin, the (i+1)th coin, all the way to the nth coin while neglecting the first i-1 coins). Further, we define an array C such that C[i] is the value of the ith coin (note that in general C[i] is not the same as i). As a result, if there are n coins (i.e. length of C is n) and we want to make change for W units, we are looking for the value T(0, W) as our answer (make sure you can see why this is the case at this point!).
Now, we proceed by constructing a recursive definition of T(i,m). Note that our solution will either contain an additional ith coin or it won't. In the case that it does, our new target will simply be m - C[i], and the number of ways to make change for this is T(i, m - C[i]) (since our new target is now C[i] less than m; we keep the index at i because we may use the ith coin again). In the other case, our solution doesn't contain the ith coin, so we keep the target value the same but only consider coins with index greater than i; the number of ways to make change in this case is T(i+1, m). Since these cases are disjoint and exhaustive (either you put the ith coin in the solution or you don't!), we have that
T(i,m) = T(i, m-C[i]) + T(i+1,m)
which is very similar to what you had (the C[i] difference is important). Note the base cases: if m < 0, there are 0 ways to make change (since we assume coin values are positive); if m == 0, there is exactly 1 way (take no coins); and if i goes past the last coin while m > 0, there are 0 ways. You must keep these base cases in mind when computing T(i,m).
Now it remains to compute T(0, W), which you can easily do recursively. However, you likely noticed that a lot of the subproblems are repeated, making this a slow solution. The fix is to use something called dynamic programming or memoization. Namely, whenever a subproblem's value is computed, add it to a table (e.g. T[i, m], where T is an n x (W+1) 2D array). Then, whenever you recursively compute something, check the table first so you don't compute the same thing twice. This is called memoization. Dynamic programming is similar, except you use a little foresight to compute things in the order in which they will be needed. For example, I would compute the base cases first, i.e. the column T[., 0], and then fill in the rest of the table outward from there using the recursive definition.
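As a concrete illustration, here is a short memoized version of T in Python (C and W follow the answer's notation; the function names are mine):

```python
from functools import lru_cache

# Memoized version of the recurrence above:
# T(i, m) = T(i, m - C[i]) + T(i + 1, m),
# where T(i, m) counts the ways to make change for m using coins C[i:].
def count_ways(C, W):
    @lru_cache(maxsize=None)
    def T(i, m):
        if m == 0:
            return 1                 # one way: take no more coins
        if m < 0 or i == len(C):
            return 0                 # overshot, or ran out of coin types
        return T(i, m - C[i]) + T(i + 1, m)
    return T(0, W)

# e.g. count_ways([1, 2, 3], 4) == 4  (1+1+1+1, 1+1+2, 1+3, 2+2)
```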
I'm having a hard time figuring out formulas that are used to solve a given problem more efficiently.
For example, a problem I have encountered was the following:
n children are placed in a circle. Every kth child is given chocolate, until a child that has already been given chocolate is selected again. Determine the number nr of children that don't receive chocolate, given n and k.
Ex: n = 12, k = 9; nr will be 8.
This problem can be solved in 2 ways:
Creating a boolean array and simulating the process until a child that has already been given chocolate is selected again (not really efficient);
Using the formula: n - n / GCD(n, k);
How would I go about figuring out the 2nd way of solving it (the formula)?
Also, where can I practice this specific type of problem, where there is an obvious, slow way of solving it and an efficient one that requires you to figure out a formula?
Every problem is different; there is no general rule for finding a solution. You need to analyse the situation and reason about it. Mathematical training helps a lot.
For this concrete example you can proceed like this: number the children from 0 to n-1. If you start at 0, the children getting chocolate are exactly the ones whose number is divisible by GCD(n, k). How many are there? n / GCD(n, k). Therefore, the number that don't get chocolate is n - n / GCD(n, k).
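If it helps, here is a quick Python sanity check of the formula against a direct simulation of the process (the function names are mine):

```python
from math import gcd

def no_chocolate_formula(n, k):
    # children without chocolate: n - n / GCD(n, k)
    return n - n // gcd(n, k)

def no_chocolate_simulated(n, k):
    # walk the circle, handing out chocolate every kth child,
    # until we revisit a child who already got some
    given = [False] * n
    pos = 0
    while not given[pos]:
        given[pos] = True
        pos = (pos + k) % n
    return given.count(False)

# both return 8 for n = 12, k = 9, and agree for all small n, k
assert all(no_chocolate_formula(n, k) == no_chocolate_simulated(n, k)
           for n in range(1, 50) for k in range(1, 50))
```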
What I'm trying to achieve is to continuously add values to a set while keeping them as far apart from each other as possible. I'm sure there must be several algorithms out there for this problem, but I'm probably just not searching with the right terms. If someone could point me to a solution (it doesn't need to be a particularly efficient one), that would be great.
Effectively, given a set of values S within a range Min..Max, I need to calculate a new value V within the same range such that the sum of distances between V and all values in S is maximized.
It's easy to show that the only possible candidates for V are values already in S, plus the minimum and maximum of the range. Proof: let S_1, S_2, ..., S_n be the sorted sequence of S, including Min and Max. The sum of distances is linear in V on each interval between consecutive points, so if you choose S_i < V < S_{i+1}, then at least the same sum of distances can be achieved with either V = S_i or V = S_{i+1}, depending on how many points lie to the left and to the right.
This observation yields an O(n^2) algorithm that just checks every potential candidate in S. It can be improved to O(n) by computing prefix sums upfront, which lets you evaluate the sum of distances for each candidate in O(1).
In general, since each element contributes two linear cost functions over the domain of possible values, this problem can be solved in O(log n) per query. You just need a data structure that maintains a list of linear function segments and returns the point with the maximum sum. A balanced binary search tree with some clever augmentation and lazy updates can do this. Whether this is necessary of course depends on the number of elements and the number of queries you expect to perform.
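For reference, a sketch of the O(n) prefix-sum variant in Python (my naming; it assumes S is non-empty, represented as a list that may contain duplicates, and that lo <= min(S) <= max(S) <= hi):

```python
def best_new_value(S, lo, hi):
    a = sorted(S)
    n = len(a)
    prefix = [0]
    for v in a:
        prefix.append(prefix[-1] + v)    # prefix[j] = a[0] + ... + a[j-1]
    total = prefix[-1]

    def dist_sum(v, j):
        # sum of |v - s| over S, where the first j sorted elements are
        # <= v and the rest are >= v: the left side contributes
        # v*j - prefix[j], the right side (total - prefix[j]) - v*(n - j)
        return v * j - prefix[j] + (total - prefix[j]) - v * (n - j)

    # candidates: every existing value, plus the two range endpoints
    best = max((dist_sum(a[j - 1], j), a[j - 1]) for j in range(1, n + 1))
    best = max(best, (dist_sum(lo, 0), lo), (dist_sum(hi, n), hi))
    return best[1]
```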
I don't think there is a silver-bullet solution to your problem, but this is how I would go about solving it in general. First, define a function sumDistance() that takes a candidate value V along with all the values in the current set, and outputs the sum of the distances between V and each value in the set.
Next, iterate over the domain d of sumDistance(), where Min <= d <= Max, and keep track of the sum for each value V in the domain. Whenever you encounter a new largest sum, record it. The V that gave you the largest sum is the value you retain and add to your set.
This algorithm can be repeated for each new value you wish to add. Note that because this is essentially a one-dimensional optimization problem, the running time should not be too bad, so this first attempt might be good enough.
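A direct rendering of that brute force in Python, assuming an integer domain (sumDistance is the function described above; next_value is my name):

```python
def sumDistance(v, values):
    # sum of the distances from candidate v to every value in the set
    return sum(abs(v - s) for s in values)

def next_value(values, lo, hi):
    # scan the whole domain and keep the candidate with the largest sum;
    # O((Max - Min) * n) per added value
    return max(range(lo, hi + 1), key=lambda v: sumDistance(v, values))
```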
Assuming the distance is d(a, b) = |a - b|, one of min and max will always yield a maximum.
Proof:
Let's assume you have a V that is not an endpoint. Then there are n1 values lower than V and n2 values higher. The total distance at the maximum will be at least (n1 - n2) * (max - V) bigger, and the total distance at the minimum will be at least (n2 - n1) * (V - min) bigger.
Since at least one of n1 - n2 and n2 - n1 must be non-negative, a maximum can always be found at one of the end points.
I recently had this problem on a test: given a set m of points (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find a minimum-size subset of n such that every point is covered by some line. Prove that your solution always finds such a minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i = 1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i = 0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure whether this always finds a minimum solution. It's a simple greedy algorithm, so my gut tells me it won't, but one of my friends, who is much better at this than I am, says that for this problem a greedy algorithm like this one always finds a minimal solution. To prove that mine always finds the minimal solution, I did a very hand-wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering uses at least as many lines as the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk that are the min points you deal with at each iteration step. The covering C must cover all of them as well. Observe that no segment in C can cover two of these points: any segment covering p_j was a candidate at step j, and your algorithm chose the candidate reaching furthest right, so p_{j+1} lies strictly beyond anything that covers p_j. Therefore, |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: initializing bestLine with n[0] is incorrect, since the loop may be unable to improve on it, and n[0] does not necessarily cover minX.
2) Actually, this problem is a simplified version of the Set Cover problem. While the general problem is NP-complete, this variation turns out to be solvable in polynomial time.
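For completeness, here is a Python version of the greedy with note 1's fix applied, i.e. only lines that actually cover minX are considered as candidates (naming is mine; lines are (l, r) pairs):

```python
def cover_points(points, segments):
    remaining = sorted(points)
    chosen = []
    while remaining:
        min_x = remaining[0]
        # note 1's fix: only lines that actually cover minX are candidates
        candidates = [s for s in segments if s[0] <= min_x <= s[1]]
        if not candidates:
            raise ValueError(f"no line covers point {min_x}")
        best = max(candidates, key=lambda s: s[1])  # reaches furthest right
        chosen.append(best)
        remaining = [p for p in remaining if p > best[1]]
    return chosen

# e.g. cover_points([1, 4, 6], [(0, 2), (3, 7), (5, 9)])
# returns [(0, 2), (3, 7)], which is a minimum cover
```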
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.