The standard 0/1 knapsack requires that the weight of every item is independent of the others; DP is then an efficient algorithm for it. But now I have run into a similar but extended problem, where the weight of a new item depends on items already in the knapsack.
For example, we have 5 items a, b, c, d and e with weights w_a, ..., w_e. Items b and c have a weight dependency.
When b is already in the knapsack, the weight of item c becomes smaller than w_c because c can share some space with b, i.e. weight(b&c) < w_b + w_c. Symmetrically, when c is already in the knapsack, the weight of b becomes smaller than w_b.
This dependency breaks the original DP algorithm, since the DP relies on the results of previous iterations, which may no longer be correct here. I have read some papers about knapsack variants, but they either make the dependencies apply to the profit (the quadratic knapsack problem) or let the weights vary according to a random distribution (the stochastic knapsack problem). I am also aware of the previous question 1/0 Knapsack Variation with Weighted Edges, but there is only a very generic answer available, and no answer about what this knapsack variant is called.
One existing solution:
I have also read one approximate solution in a paper about DBMS optimizations, where the related items are grouped into one combined item for the knapsack. Applying this technique to our example, the items become a, bc, d and e, so there are no more dependencies between any two of these four items. However, it is easy to construct an example where this does not give the optimal result, e.g. when an item with small weight and benefit is grouped with another item with large weight and benefit: the small item should not be selected in the optimal solution, but it gets selected together with the large item.
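To see the failure concretely, here is a tiny sketch (all weights, values and the capacity below are invented for illustration, not taken from the question): grouping forces the small item in whenever the large one is picked, and the optimum that takes the large item alone plus a third item is missed.

    # Hypothetical numbers, only to show why grouping can be suboptimal.
    capacity = 10
    b = (8, 10)          # (weight, value): "large weight and benefit"
    c = (3, 1)           # "small weight and benefit", shares space with b
    d = (2, 5)
    bc = (9, 11)         # combined item: weight(b&c) = 9 < w_b + w_c = 11

    # Grouped model: we may only pick from {bc, d}.
    options = [(0, 0), bc, d, (bc[0] + d[0], bc[1] + d[1])]
    grouped_best = max(v for w, v in options if w <= capacity)

    # True optimum: take b alone plus d (weight 8 + 2 = 10, value 15).
    true_best = b[1] + d[1]
    print(grouped_best, true_best)   # 11 15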
Question:
Are there any efficient solution techniques that can get the optimal result, or at least come with some error guarantee? Or am I taking the wrong direction in modelling this problem?
Could you not have items a, b, c, bc, d and e, possibly with a constraint that b and bc cannot both be in the knapsack, and similarly for c and bc? My understanding is that this would be correct, since any solution that contains both b and c can be improved by substituting bc for the pair (by definition). The membership constraints should take care of any other cases.
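A brute-force sketch of this modelling (the item weights and values below are invented for illustration): bc is an ordinary item whose weight is weight(b&c), and a selection is rejected if it contains both b and bc, or both c and bc.

    from itertools import combinations

    # (name, weight, value); bc stands for "b and c packed together, sharing space".
    items = [("a", 4, 3), ("b", 8, 10), ("c", 3, 1), ("bc", 9, 11),
             ("d", 2, 5), ("e", 6, 4)]
    capacity = 10
    forbidden = [{"b", "bc"}, {"c", "bc"}]       # membership constraints

    def feasible(names):
        return all(not bad <= set(names) for bad in forbidden)

    best = (0, ())
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            names = [n for n, _, _ in combo]
            weight = sum(w for _, w, _ in combo)
            value = sum(v for _, _, v in combo)
            if weight <= capacity and feasible(names):
                best = max(best, (value, tuple(names)))
    print(best)     # exhaustive, but it shows the model is well defined

The same model can of course be handed to an ILP solver instead of being enumerated.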
This is a very interesting problem and I have been working on it for a while. The first thing to note is that the binary knapsack problem with dependent item weights/values is not trivial at all. You may consider using Bayesian networks, Markov models, or similar techniques for this problem. Nonetheless, any practical approach has to make some assumptions, either about the optimization model or about its input. Here is an example of formulating the binary knapsack problem with value-dependent items: https://arxiv.org/pdf/1702.06662.pdf
In this work, the authors propose modeling the input (value-related dependencies) with fuzzy graphs and then solving the optimization problem with the proposed integer linear programming model. An extended version of the work has been accepted for publication and will soon be available online.
Please do not hesitate to contact me if you need further information. I can also provide you with the source code of the model if needed.
In the end I managed to solve the problem with the B&B method proposed by @Holt. Here are the key settings:
(0) Before running the B&B algorithm, group all items according to their dependencies. All items in one group have a weight dependency with every other item in the same group, but not with items in other groups.
Settings for B&B:
(1) Upper bound: assume that the current item has its minimum weight, i.e. assume all of its dependencies are active.
(2) Lower bound: assume that the current item has its maximum weight, i.e. assume none of its dependencies are active.
(3) Current weight: calculate the real current weight.
All of the above calculations can be done in linear time using the groups from step (0). Specifically, when computing these weights it is enough to scan only the items in the current group (the group the current item belongs to): items in other groups have no dependencies with the current one, so they do not change its real weight.
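A minimal sketch of that bookkeeping (the data layout is an assumption: base_weight[i] is the stand-alone weight of item i, discount[i][j] is the weight saved on i when j is already packed, nonzero only when i and j share a group, and the savings are taken to be additive):

    def real_weight(i, packed, base_weight, discount, group_of):
        # (3) Actual weight of item i given the items already in the knapsack.
        w = base_weight[i]
        for j in packed:
            if group_of[j] == group_of[i]:       # only same-group items matter
                w -= discount[i].get(j, 0)
        return w

    def min_weight(i, base_weight, discount):
        # (1) Optimistic weight for the upper bound: all dependencies of i active.
        return base_weight[i] - sum(discount[i].values())

    def max_weight(i, base_weight):
        # (2) Pessimistic weight for the lower bound: no dependency of i active.
        return base_weight[i]

Each call touches only the current item's group, which is what keeps the bound computations linear.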
Related
I'm working on a problem similar to the box stacking problem, which can be solved with a dynamic programming algorithm. I have read posts here on SO about it, but I have a difficult time understanding the DP approach and would like some explanation of how it works. Here's the problem at hand:
Given X objects, each with its own weight 'w' and strength 's', how
many can you stack on top of each other? An object can carry its own
weight and the sum of all weights on top of it as long as it does not
exceed its strength.
I understand that it has optimal substructure, but it's the overlapping-subproblems part that confuses me. I'm trying to create a recursion tree to see where it would calculate the same thing several times, but I can't figure out whether the function would take one or two parameters, for example.
The first step to solving this problem is proving that you can find an optimal stack with boxes ordered from highest to lowest strength.
Then you just have to sort the boxes by strength and figure out which ones are included in the optimal stack.
The recursive subproblem has two parameters: find the best stack you can put on top of a stack with X remaining strength, using boxes at positions >= Y in the list.
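A memoized sketch of that recursion (assuming each box is a (weight, strength) pair with integer values so states can be cached, and reading the rule as: own weight plus everything above must not exceed the strength):

    from functools import lru_cache

    def max_stack(boxes):
        # Sort strongest first, then decide for each box whether to add it on top.
        boxes = sorted(boxes, key=lambda ws: ws[1], reverse=True)

        @lru_cache(maxsize=None)
        def best(i, remaining):
            # best(i, remaining): most boxes we can still stack using positions >= i,
            # when the boxes already placed below tolerate `remaining` more weight.
            if i == len(boxes):
                return 0
            w, s = boxes[i]
            skip = best(i + 1, remaining)
            take = 0
            if w <= remaining and w <= s:        # fits the budget below and holds itself
                take = 1 + best(i + 1, min(remaining - w, s - w))
            return max(skip, take)

        return best(0, float("inf"))             # nothing is below the bottom box

    print(max_stack([(3, 10), (5, 5), (2, 2)]))  # -> 2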
If a good DP solution exists, it takes two parameters:
the number of visited objects (equivalently, the number of unvisited objects)
the total weight of unvisited objects you can currently afford (the weight of visited objects does not matter)
To make it work you have to find an ordering in which placing an object on top of later objects is never useful. That is, for any solution that violates this ordering there is another solution that follows it and is at least as good.
You have to prove that such an ordering exists and define it clearly. I don't think simply sorting by strength, as Matt Timmermans suggests, is enough, since the weight also matters. But that's the proving part...
Please forgive me if I'm not using the correct terms or have overlooked an existing solution. I'm not experienced with search algorithms and the theory behind them. I just would like to solve a problem.
I've previously used what I was told was the A* algorithm to solve a different problem. But reading up on it, I've realized that what I learned is not quite what Wikipedia tells me.
What I learned was:
Start at your origin node
Open a new solution for each path you can take
Recursively create a new subsolution for each path you can take from there
When you arrive at the same place with multiple solutions, drop those that took longer than the fastest one
Now if I understand Wikipedia correctly, this is what I was supposed to do:
Start at your origin node
Open a new solution for each path you can take
Order the solutions by "cost of path taken" + "estimated cost to target"
Take cheapest solution and create subsolutions for each possible path
Order those solutions in among the others, then rinse and repeat
I can see how this would help with not calculating quite as many solutions, but my problem is that I see no possibility of creating an "optimistic" estimate.
I'm not searching for a path on a geographical map. I'm trying to find the best sequence of actions. There's a minimum sequence of, say, ABCDEFGH. You cannot do F before E, but repeating previous actions in a particular ordering might make later actions more efficient.
Do I need a different search algorithm? Do I do what I originally learned and just live with the fact that doing more work is the price for not having a good heuristic function?
I believe my teacher recognized this problem, and what I learned was simply A* with a heuristic function h(n) = 0 (i.e. plain uniform-cost search).
I'm not searching for a path on a geographical map. I'm trying to find
the best sequence of actions. There's a minimum sequence of - say -
ABCDEFGH. You cannot do F before E but repeating previous actions in
particular ordering might make later actions more efficient.
It is not clear to me whether you can repeat an action, i.e., one solution is ABCDEFGH, but would ABBBBCDEFGH also be possible?
If not, then you might be able to use the A* algorithm, implemented like this:
1. At some stage (say the first, "empty" one), you have one of several actions available.
2. The cost of going from Empty City to A City is the cost of action A.
3. The cost of going from Empty City to B City is the cost of action B.
When you've reached B, the cost of doing C is constant (if it is not, then you can't use A* as is) and you insert the cost of going from B City to C City as the cost of C.
So you can handle the case in which an action has different costs, provided that the difference is completely described by the previous state. For example, if you can only do C once you have done A or B, and the cost of C is 5 after A and 8 after B, you enter the "distance" between A and C as 5, and the distance between B and C as 8.
If the cost of, say, D depends on the two previous states, you can still use a more complicated A* implementation where you define the virtual "cities" BC, AB and AC, and the distance from BC to D is "the cost of D having done B and C", and so on. The cost of reaching BC from A is "the cost of B given A, and the cost of C given A and B". So if these costs depend on the previous states, things get even more complicated.
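A hedged sketch of that state-expansion idea, using Dijkstra (A* with a zero heuristic): a "virtual city" is simply the set of actions done so far, and cost(action, done) and goal(done) are placeholders you would supply. It assumes non-negative costs and that each action is performed at most once, which sidesteps the cycle problem mentioned below.

    import heapq

    def cheapest_plan(actions, cost, goal):
        # States ("virtual cities") are frozensets of completed actions.
        # cost(a, done) -> cost of doing a given the set `done`, or None if not
        #                  allowed (e.g. F before E); goal(done) -> True when done.
        start = frozenset()
        dist = {start: 0}
        heap = [(0, start)]
        while heap:
            d, done = heapq.heappop(heap)
            if d > dist.get(done, float("inf")):
                continue                          # stale queue entry
            if goal(done):
                return d, done
            for a in actions:
                if a in done:
                    continue
                c = cost(a, done)
                if c is None:
                    continue
                nxt = done | {a}
                if d + c < dist.get(nxt, float("inf")):
                    dist[nxt] = d + c
                    heapq.heappush(heap, (d + c, nxt))
        return None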
In the end, the complexity of this revised A* will grow until it becomes your algorithm, where every state depends potentially on the sequence of all preceding states. The more this is true, the more your algorithm is convenient; the more every state is a cost unto itself, the more A* is convenient.
And of course the possibility of closed loops (visiting the same state/action twice, making this a cyclic graph) blows A* straight out of the water.
I asked about minimum-cost maximum flow several weeks ago. Kraskevich's answer was brilliant and solved my problem. I've implemented it and it works fine (the write-up is available only in French, sorry). Additionally, the algorithm can handle the assignment of i (i > 1) projects to each student.
Now I'm trying something more difficult. I'd like to add constraints on the choices. In the case where one wants to assign i (i > 1) projects to each student, I'd like to be able to specify which projects are compatible with each other.
In the case where some projects are not compatible, I'd like the algorithm to return the global optimum, i.e. assign i projects to each student, maximizing global happiness and respecting the compatibility constraints.
Chaining the original method i times (and checking the constraints at each step) will not help, as it would only return a local optimum.
Any idea about the correct graph to work with?
Unfortunately, it is not solvable in polynomial time (unless P = NP or there are additional constraints).
Here is a polynomial-time reduction from the maximum independent set problem (which is known to be NP-complete) to this one:
Given a graph G and a number k, do the following:
Create a project for each vertex of the graph G, and say that two projects are incompatible iff there is an edge between the corresponding vertices in G.
Create one student who likes every project equally (we can assume that the happiness each project gives him is equal to 1).
Find the maximum happiness using an algorithm that solves the problem stated in your question. Let's call it h.
A set of projects can be picked iff they all are compatible, which means that the picked vertices of G form an independent set(due to the way we constructed the graph).
Thus, h is equal to the size of the maximum independent set.
Return h >= k.
What does this mean in practice? It means that it is not reasonable to look for a polynomial-time solution to this problem. There are several things that can be done:
If the input is small, you can use exhaustive search.
If it is not, you can use heuristics and/or approximations to find a relatively good solution (not necessarily the optimal one, though).
If you can stomach the library dependency, integer programming will be quicker and easier than anything you can implement yourself. All you have to do is formulate the original problem as an integer program and then add your ad hoc constraints at the end.
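For illustration, a sketch of such an integer program using PuLP (the student/project names, happiness values, quota and incompatible pairs below are all placeholders): x[s][p] = 1 iff student s gets project p, maximize total happiness, give each student exactly `quota` projects, use each project at most once, and forbid a student from receiving two incompatible projects.

    import pulp

    students = ["s1", "s2"]
    projects = ["p1", "p2", "p3", "p4"]
    happiness = {("s1", "p1"): 3, ("s1", "p2"): 1, ("s1", "p3"): 2, ("s1", "p4"): 1,
                 ("s2", "p1"): 1, ("s2", "p2"): 3, ("s2", "p3"): 1, ("s2", "p4"): 2}
    quota = 2                                   # i projects per student
    incompatible = [("p1", "p2")]               # pairs that cannot go to one student

    prob = pulp.LpProblem("assignment", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (students, projects), cat="Binary")

    prob += pulp.lpSum(happiness[s, p] * x[s][p] for s in students for p in projects)
    for p in projects:                          # each project assigned at most once
        prob += pulp.lpSum(x[s][p] for s in students) <= 1
    for s in students:                          # exactly `quota` projects per student
        prob += pulp.lpSum(x[s][p] for p in projects) == quota
    for s in students:                          # compatibility constraints
        for p, q in incompatible:
            prob += x[s][p] + x[s][q] <= 1

    prob.solve()
    print([(s, p) for s in students for p in projects if x[s][p].value() == 1])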
Can someone provide me with a backtracking algorithm to solve the "set cover" problem to find the minimum number of sets that cover all the elements in the universe?
The greedy approach almost always selects more sets than the optimal number of sets.
This paper uses Linear Programming Relaxation to solve covering problems.
Basically, the LP relaxation yields good bounds, and can be used to identify solutions that are optimum in many cases. Incidentally, when I last looked at open source LP solvers (~2003) I wasn't impressed (some gave incorrect results), but there seem to be some decent open source LP solvers now.
Your problem needs a little more clarification - it seems that you are given a family of subsets $$S_1,\ldots,S_n$$ of a set A, such that the union of the subsets equals A, and you want a minimum number of subsets whose union is still A.
The basic approach is branch and bound with some heuristics. E.g., if a particular element of A is in only one subset $$S_i$$, then you must select $$S_i$$. Similarly, if $$S_k$$ is a subset of $$S_j$$, then there is no reason to consider $$S_k$$; and if element $$a_i$$ is in every subset that $$a_j$$ is in, then you need not consider $$a_i$$ at all.
For branch and bound you need good bounding heuristics. Lower bounds can come from independent sets: if there are k elements $$i_1,\ldots,i_k$$ of A such that no subset contains more than one of them, then any cover needs at least k subsets. Better lower bounds come from the LP relaxation described above.
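A small backtracking / branch-and-bound sketch along those lines (input: a universe and a list of Python sets whose union covers it; the pruning bound is the simple "remaining elements divided by the largest remaining set" one, so this is a sketch rather than a tuned solver):

    def min_set_cover(universe, subsets):
        universe = set(universe)
        best = list(range(len(subsets)))            # trivial cover: use every set

        def backtrack(uncovered, chosen, candidates):
            nonlocal best
            if not uncovered:
                if len(chosen) < len(best):
                    best = chosen[:]
                return
            biggest = max((len(subsets[i] & uncovered) for i in candidates), default=0)
            if biggest == 0:
                return                              # the rest cannot be covered
            # Lower bound: every further pick covers at most `biggest` new elements.
            if len(chosen) + (len(uncovered) + biggest - 1) // biggest >= len(best):
                return                              # pruned
            # Branch on one uncovered element: some set containing it must be chosen.
            e = next(iter(uncovered))
            for i in candidates:
                if e in subsets[i]:
                    backtrack(uncovered - subsets[i], chosen + [i],
                              [j for j in candidates if j != i])

        backtrack(universe, [], list(range(len(subsets))))
        return best

    # Example: the optimum uses two sets (indices 0 and 3).
    print(min_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))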
The Espresso logic minimization system from Berkeley has a very high quality set covering engine.
My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the cost of the various permutations is not the same. It can be computed as the sum of the squares of the distances traveled between d_{i-1} and d_i, where d_0 is taken to be the base station, with the caveat that any segment that involves a change of direction costs 3 times as much (imagine this is happening on a railroad or in a pneumatic tube, and backing up disrupts other traffic).
Given the set of deliveries D, each represented by its distance from the base station (so abs(di - dj) is the distance between two deliveries), and an iterator permutations(D) which will produce each permutation in succession, find a permutation whose cost is less than or equal to that of any other permutation.
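For concreteness, here is one reading of that cost function as code (positions are signed distances from the base station at 0, and the factor of 3 is applied to the squared length of any segment on which the direction changes, which is how I read the description):

    def cost(route, base=0):
        # Sum of squared hop lengths, tripled whenever a hop reverses direction.
        total, prev_pos, prev_dir = 0, base, 0
        for pos in route:
            hop = pos - prev_pos
            direction = (hop > 0) - (hop < 0)        # -1, 0 or +1
            segment = hop * hop
            if direction and prev_dir and direction != prev_dir:
                segment *= 3                         # change of direction costs 3x
            total += segment
            prev_pos = pos
            if direction:
                prev_dir = direction
        return total

    print(cost([1, 3, 2]))    # 1 + 4 + 3*1 = 8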
Now, a direct implementation from this description might lead to code like this:
function Cost(D) ...

function Best_order(D)
    for D1 in permutations(D)
        Found = true
        for D2 in permutations(D)
            Found = false if Cost(D2) < Cost(D1)
        return D1 if Found
Which is O(n*n!^2), i.e. pretty awful, especially compared to the O(n log n) solution someone with insight would find by simply sorting D.
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?
I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult hide this fact as much as possible.
The biggest clue is the distance formula: it introduces a penalty for changing direction. The first thing that comes to my mind is minimizing this penalty, and to remove it I have to order the deliveries in a single direction; that ordering is the natural sort order.
I would remove the penalty for changing directions, it's too much of a give away.
Another major clue is the input to the algorithm: a list of integers. Give them a list of permutations, or even all the permutations; that sets them up to think that an O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by the formula above.
Return the permutation P such that the cost of P is less than or equal to that of any other permutation.
All that really needs to be done is read in the first permutation and sort it.
If they construct a single loop to compare the costs, ask them what the big-O runtime of their algorithm is, where n is the number of delivery locations (another trap).
This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 > 2 > 1 > -1.
Edit: This might not actually be the optimal path, but it illustrates the point.
You could rephrase it, having first found the optimal solution, as:
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means the smallest number resulting from the sum of all stage costs, taking into account that all stages (A..Z) need to be present once and only once.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.
This reminds me of the knapsack problem more than the travelling salesman problem. But the knapsack problem is also NP-hard, so you might be able to fool people into thinking up an overly complex dynamic-programming solution if they associate your problem with the knapsack. The basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, the problem is that a fairly good solution can be found when the values (your distances) are all distinct, as described here:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed, empirical complexity is of the order of O((log n)^2), and very large problems can be solved very quickly, e.g. in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers[1].
So you might want to state that several stops/packages can share the same vj, inviting people to think about the really hard case:
However, in the degenerate case of multiple items sharing the same value vj it becomes much more difficult, with the extreme case where vj = constant being the subset sum problem, with a complexity of O(2^(N/2) N).
So if you replace weight per value with distance per value, and state that several distances may actually share the same value (the degenerate case), some folks might fall into this trap.
Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrasing the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines so the person would have to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons, e.g. "can you speed up the algorithm by using the optimal max subset of your best (so far) results"?
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.
You need to be clearer on whether the delivery truck has to return to base (making it a round trip) or not. If the truck does return, then a simple sort does not produce the shortest route, because the square of the return leg from the furthest point back to base costs so much. Skipping some stops on the way out and serving them on the way back turns out to be cheaper.
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?