Utility Maximizing Assignment - algorithm

I posted this on the computer science section but no one replied :(. Any help would be greatly appreciated :).
There is a grid of size MxN, with M ~ 20000 and N ~ 10, so M is very large. One way to look at this is as N grid blocks of size M placed side by side. Next, assume there are K users, each with an MxN utility matrix whose elements give the utility that user obtains if assigned that grid element. The allocation must be done so that, for each assigned user, the total utility exceeds a certain threshold U in every grid block. Assume each grid element can be assigned to at most one user. What is the maximum number of users that can be assigned? (So it's okay if some users are not assigned.)
Level 2: Now assume that for each user, the utility threshold U must be exceeded in at least n out of the N blocks. For this variant, what is the maximum number of users that can be assigned?
Of course brute-force search is of no use here due to its K^(MN) complexity. I am guessing that some kind of dynamic programming approach may be possible.

To my understanding, the problem can be modelled as a Maximum Bipartite Matching problem, which can be solved efficiently with the Hungarian algorithm. In the left partition L, create K nodes, one for each user. In the right partition R, create M*N nodes, one for each cell in the grid. Then create an edge between every l in L and every r in R with weight equal to the utility of assigning user l to grid cell r.
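If one ignores the per-block threshold for a moment, that modelling is easy to try with an off-the-shelf assignment solver. Here is a minimal sketch, assuming the utilities are available as a K x (M*N) array (the toy sizes and random data below are my own illustration, and note that nothing here enforces the threshold U, which is exactly what the next answer is about):

import numpy as np
from scipy.optimize import linear_sum_assignment

K, M, N = 3, 4, 2                                  # toy sizes, not the real M ~ 20000
rng = np.random.default_rng(0)
utility = rng.integers(1, 100, size=(K, M * N))    # utility[user, cell]

# Assign each user to a distinct grid cell, maximising the total utility.
users, cells = linear_sum_assignment(utility, maximize=True)
for user, cell in zip(users, cells):
    print("user", user, "-> cell", cell, "utility", utility[user, cell])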

Using a different interpretation of your question than Codor, I am going to claim that (at least in theory, in the worst case) it is a hard problem.
Suppose that we can solve it in the special case when there is one block which must be shared between two users, who each have the same utility for each cell, and the threshold utility U is (half of the total utility for all the cells in the block), minus one.
This means that in order to solve the problem we must take a list of numbers and divide them up into two sets such that the sum of the numbers in each set is the same, and is exactly half of the total sum of the numbers available.
This is the partition problem (http://en.wikipedia.org/wiki/Partition_problem), which is NP-complete, so if you could solve your problem as I have described it, you could solve a problem known to be hard.
(However the Wikipedia entry does say that this is known as "the easiest hard problem" so even if I have described it correctly, there may be solutions that work well in practice).
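For a sense of why partition is called "the easiest hard problem": although it is NP-complete, it has a pseudo-polynomial subset-sum DP, so instances whose totals are not astronomically large are easy in practice. A minimal sketch of that DP (my own illustration, not part of the answer above):

def can_partition(nums):
    # Pseudo-polynomial DP for the partition problem: can `nums` be split
    # into two subsets with equal sums?
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}                              # subset sums reachable so far
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(can_partition([3, 1, 1, 2, 2, 1]))         # True: {3, 2} vs {1, 1, 2, 1}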

Related

Extended version of the set cover problem

I don't generally ask questions on SO, so if this question seems inappropriate for SO, just tell me (help would still be appreciated of course).
I'm still a student and I'm currently taking a class in Algorithms. We recently learned about the Branch-and-Bound paradigm and since I didn't fully understand it, I tried to do some exercises in our course book. I came across this particular instance of the Set Cover problem with a special twist:
Let U be a set of elements and S = {S1, S2, ..., Sn} a set of subsets of U, where the union of all sets Si equals U. Outline a Branch-and-Bound algorithm to find a minimal subset Q of S such that every element u in U is contained in at least two sets of Q. Specifically, explain how to split the problem into subproblems and how to compute upper and lower bounds.
My first thought was to sort all the sets Si in S in descending order, according to how many of their elements aren't yet covered at least twice by the currently chosen subsets of S (our current instance of Q). I was then thinking of solving this recursively: I pick the first set Si in the sorted order and make one recursive call where I take Si and one where I don't (meaning from those calls onwards that subset is no longer considered). If I take it, I go through each element of Si and increase a counter for it (before the recursive call), so that I eventually know when an element is already covered by two or more chosen subsets. Since I re-sort the not-yet-chosen sets Si for each recursive call, I would (in my mind at least) always be making the best possible choice at that moment. And since I essentially build a binary tree of recursive calls, one call with the current best subset chosen and one without it, I eventually cover all 2^n possibilities, so I will find the optimal solution.
My problem is that I don't understand how to implement heuristics for the upper and lower bounds, so that the algorithm can discard some of the paths in the binary tree which will never be better than the current best Q. I would appreciate any help I can get.
Here's a simple lower bound heuristic: find the set containing the largest number of not-yet-twice-covered elements. (It doesn't matter which set you pick if several sets tie for this largest number.) Suppose there are u such elements in total, and this set contains k <= u of them. Then you need to add at least u/k (rounded up) further sets before you have a solution. (Why? See if you can prove this.)
This lower bound works for the regular set cover problem too. As always with branch and bound, using it may or may not result in better overall performance on a given instance than simply using the "heuristic" that always returns 0.
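A sketch of that lower bound, assuming the still-available sets are plain Python sets and the coverage state is a count per element (the function and parameter names are mine):

import math

def lower_bound(available_sets, cover_count):
    # Elements covered fewer than two times so far.
    needed = {e for e, c in cover_count.items() if c < 2}
    if not needed:
        return 0
    # k = the most not-yet-twice-covered elements any single remaining set holds.
    k = max((len(s & needed) for s in available_sets), default=0)
    if k == 0:
        return math.inf            # no remaining set helps: this branch is dead
    return math.ceil(len(needed) / k)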
First, some advice: don't re-sort S every time you recurse/loop. Sorting is an expensive operation (O(N log N)) so putting it in a loop or a recursion usually costs more than you gain from it. Generally you want to sort once at the beginning, and then leverage that sort throughout your algorithm.
The sort you've chosen, descending by the length of the S subsets, is a good "greedy" ordering, so I'd say just do that upfront and don't re-sort after that. You don't get to skip over non-ideal subsets within your recursion, but checking a redundant/non-ideal subset is still faster than re-sorting every time.
Now what upper/lower bounds can you use? Well generally, you want your bounds and bounds-checking to be as simple and efficient as possible because you are going to be checking them a lot.
With this in mind, an upper bound is easy: use the shortest solution (fewest sets) that you've found so far. Initially set your upper bound as var bestQlength = int.MaxValue, some maximum value that is greater than n, the number of subsets in S. Then with every recursion you check if currentQ.length > bestQlength; if so, this branch is over the upper bound and you "prune" it. Obviously when you find a new solution, you also need to check if it is better (shorter) than your current bestQ, and if so, update both bestQ and bestQlength at the same time.
A good lower bound is a bit trickier. The simplest I can think of for this problem is: before you add a new subset Si to your currentQ, check whether Si contains any elements that are not already covered two or more times by currentQ. If it does not, then Si cannot contribute anything to the currentQ solution you are trying to build, so just skip it and move on to the next subset in S.
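Putting the pieces together, here is a rough skeleton of how the branch and bound might look (my own sketch of the ideas above, not a definitive implementation; sets are Python frozensets, the ceil(u/k) bound is the one from the first answer, and the "useless subset" skip is the check just described):

import math

def min_double_cover(universe, subsets):
    # Smallest collection of subsets covering every element of `universe`
    # at least twice, or None if no such collection exists.
    subsets = sorted(subsets, key=len, reverse=True)     # sort once, up front
    best = [None]                                        # best solution found so far

    def bound(needed, start):
        # ceil(u / k) lower bound on how many more sets are required.
        k = max((len(s & needed) for s in subsets[start:]), default=0)
        return math.inf if k == 0 else math.ceil(len(needed) / k)

    def walk(start, count, chosen, needed):
        if not needed:                                   # everything covered twice
            if best[0] is None or len(chosen) < len(best[0]):
                best[0] = list(chosen)
            return
        if best[0] is not None and len(chosen) + bound(needed, start) >= len(best[0]):
            return                                       # cannot beat the current best
        if start == len(subsets):
            return
        s = subsets[start]
        if s & needed:                                   # skip subsets that add nothing
            for e in s:
                count[e] += 1
            chosen.append(s)
            walk(start + 1, count, chosen, {e for e in needed if count[e] < 2})
            chosen.pop()
            for e in s:
                count[e] -= 1
        walk(start + 1, count, chosen, needed)           # branch without s

    walk(0, {e: 0 for e in universe}, [], set(universe))
    return best[0]

# Example: covering {1, 2, 3} twice needs three of these four sets.
print(min_double_cover({1, 2, 3},
                       [frozenset({1, 2, 3}), frozenset({1, 2}),
                        frozenset({2, 3}), frozenset({1, 3})]))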

Object stacking, dynamic programming

I'm working with a problem that is similar to the box stacking problem that can be solved with a dynamic programming algorithm. I read posts here on SO about it but I have a difficult time understanding the DP approach, and would like some explanation as to how it works. Here's the problem at hand:
Given X objects, each with its own weight 'w' and strength 's', how
many can you stack on top of each other? An object can carry its own
weight and the sum of all weights on top of it as long as it does not
exceed its strength.
I understand that it has optimal substructure, but it's the overlapping-subproblems part that confuses me. I'm trying to create a recursion tree to see where it would calculate the same thing several times, but I can't figure out whether the function would take one or two parameters, for example.
The first step to solving this problem is proving that you can find an optimal stack with boxes ordered from highest to lowest strength.
Then you just have to sort the boxes by strength and figure out which ones are included in the optimal stack.
The recursive subproblem has two parameters: find the best stack you can put on top of a stack with X remaining strength, using boxes at positions >= Y in the list.
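A memoised sketch of that recursion, assuming (as argued above) that some optimal stack is ordered from highest to lowest strength, and reading the problem statement as "a box's strength must cover its own weight plus everything stacked on top of it" (the names below are my own):

from functools import lru_cache

def max_stack(boxes):
    # boxes: list of (weight, strength) pairs. Returns the largest number of
    # boxes that can be stacked, trying boxes in order of decreasing strength.
    boxes = sorted(boxes, key=lambda b: b[1], reverse=True)

    @lru_cache(maxsize=None)
    def best(i, capacity):
        # Best stack using boxes[i:] on top of a partial stack that can still
        # carry `capacity` more weight.
        if i == len(boxes):
            return 0
        result = best(i + 1, capacity)                   # skip box i
        w, s = boxes[i]
        if w <= capacity:                                # place box i on top
            result = max(result, 1 + best(i + 1, min(capacity - w, s - w)))
        return result

    return best(0, float('inf'))

print(max_stack([(10, 100), (50, 60), (30, 30)]))        # 2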
If a good DP solution exists, it takes two parameters:
the number of visited objects (equivalently, the number of unvisited objects)
the total weight of unvisited objects you can currently afford (the weight of the visited objects does not matter)
To make this work you have to find an ordering in which placing an object on top of any later object is never useful; that is, for any solution that violates this ordering there is another solution that follows it and is at least as good.
You have to prove that such an ordering exists and define it clearly. I don't think simply sorting by strength, as Matt Timmermans suggests, is enough, since weight also matters. But that's the proof part...

Dual knapsack algorithm

Say you have a warehouse with fragile goods (e.g. vegetables or fruit), and you can only take a container of vegetables out once. If you move them twice, they'll rot too fast and can't be sold anymore.
So if you give a value to every container of vegetables (depending on how long they'll still be fresh), you want to sell the lowest-value ones first. And when a client asks for a certain weight, you want to deliver good service and give exactly that weight (so you may need to take a bit extra out of your warehouse and throw the excess away after selling).
I don't know if this problem has a name, but I would consider it the dual form of the knapsack problem. In the knapsack problem you want to maximise the value while limiting the weight to at most some maximum, while here you want to minimise the value while requiring the weight to be at least some minimum.
You can easily see this duality by treating the warehouse as the knapsack and optimising the warehouse contents for maximum value, with the weight limited to at most the current total weight minus what the client asks for.
However, many practical algorithms for the knapsack problem rely on the assumption that the weight you can carry is small compared to the total weight you can choose from. For example, the dynamic programming 0/1 solution relies on looping up to the maximum weight, and the FPTAS only guarantees a value within a factor of (1-e) of the optimum (but a small factor of a huge value can still make a pretty big difference).
So both have issues when the wanted weight is big.
As such, I wondered whether anyone has already studied this "dual knapsack problem" (whether some literature can be found on it), or whether there's some easy modification to the existing algorithms that I'm missing.
The usual pseudopolynomial DP algorithm for solving knapsack asks, for each i and w, "What is the largest total value I can get from the first i items if I use at most w capacity?"
You can instead ask, for each i and w, "What is the smallest total value I can get from the first i items if I use at least w capacity?" The logic is almost identical, except that the direction of the comparison is reversed, and you need a special value to record the possibility that even taking all i of the first i items cannot reach w capacity -- infinity works for this, since you want this value to lose against any finite value when they are compared with min().
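A sketch of that reversed DP (my own illustration of the idea; items are (weight, value) pairs and infinity marks "can't reach this much weight yet"):

def min_value_for_weight(items, w_min):
    # dp[w] = smallest total value of a chosen subset of the items seen so far
    # whose total weight is at least w (w is capped at w_min).
    INF = float('inf')
    dp = [0] + [INF] * w_min
    for weight, value in items:
        for w in range(w_min, 0, -1):        # descending: each item used at most once
            prev = dp[max(0, w - weight)]
            if prev + value < dp[w]:
                dp[w] = prev + value
    return dp[w_min] if dp[w_min] < INF else None

# Cheapest way to hand over at least 5 units of weight from these containers.
print(min_value_for_weight([(4, 10), (3, 2), (2, 3)], 5))    # 5, using the last two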

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to come up with an algorithm for this. I don't think it can be done with a greedy algorithm; I'm thinking more along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set si iff pi is an element of si. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source and add edges linking it to all people with capacity 1 (representing that each person may only be used once); then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once); finally, run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to indicate that each person may be used more than once in this case.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.
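A sketch of the whole construction (my own illustration; it leans on networkx for the max-flow step rather than augmenting residual networks by hand, and it assumes every set is non-empty):

import networkx as nx

def min_max_selections(groups):
    # Smallest k such that one person can be chosen from every group while
    # nobody is chosen more than k times.
    people = set().union(*groups)

    def feasible(k):
        G = nx.DiGraph()
        for p in people:
            G.add_edge('source', ('person', p), capacity=k)
        for i, group in enumerate(groups):
            G.add_edge(('set', i), 'sink', capacity=1)
            for p in group:
                G.add_edge(('person', p), ('set', i), capacity=1)
        flow_value, _ = nx.maximum_flow(G, 'source', 'sink')
        return flow_value == len(groups)

    lo, hi = 1, len(groups)                  # k = number of groups always suffices
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_max_selections([{'p1', 'p2', 'p3'}, {'p2', 'p3'}, {'p1'}, {'p1'}]))   # 2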

Looking for a multidimensional optimization algorithm

Problem description
There are different categories, each containing an arbitrary number of elements.
There are three attributes A, B and C. Each element has its own distribution of these attributes, expressed as positive integer values. For example, element 1 has the attributes A: 42, B: 1337, C: 18. The sum of these attributes is not the same across elements; some elements have more than others.
Now the problem:
We want to choose exactly one element from each category so that
We hit certain thresholds on attributes A and B (going over them is also possible, but not necessary)
while getting a maximum amount of C.
Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.
I've thought about this problem and cannot come up with an efficient solution. The sample sizes are about 15 categories, each containing up to ~30 elements, so brute-forcing doesn't seem very effective since there are potentially 30^15 possibilities.
My model is to think of it as a tree whose depth is the number of categories. Each level represents a category and gives us the choice of one element from that category. When passing through a node, we add the attributes of the chosen element to the running sums we want to optimise.
If we hit the same attribute combination multiple times on the same level, we merge those nodes so that we avoid recomputing values we have already computed. If we reach a level where one path has less value in all three attributes than another, we don't follow it any further.
However, in the worst case this tree still has ~30^15 nodes in it.
Can anybody think of an algorithm that might help me solve this problem? Or could you explain why you think no such algorithm exists?
This question is very similar to a variation of the knapsack problem. I would start by looking at solutions to that problem and see how well they can be applied to your stated problem.
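One concrete way to apply the knapsack idea here: only progress towards the A and B thresholds matters, so any excess can be capped, and a DP over (category index, A reached so far capped at its threshold, B reached so far capped at its threshold) has at most 15 * 81 * 151 states for the sample numbers rather than 30^15. A sketch under those assumptions (the names are my own, not from the answer):

def best_c(categories, need_a, need_b):
    # categories: list of lists of (a, b, c) triples; pick exactly one element
    # per category, reach at least need_a of A and need_b of B, maximise C.
    dp = {(0, 0): 0}                         # (capped A, capped B) -> best C so far
    for elements in categories:
        ndp = {}
        for (a, b), c_total in dp.items():
            for ea, eb, ec in elements:
                key = (min(need_a, a + ea), min(need_b, b + eb))
                if c_total + ec > ndp.get(key, float('-inf')):
                    ndp[key] = c_total + ec
        dp = ndp
    return dp.get((need_a, need_b))          # None if the thresholds are unreachable

# Two toy categories with two elements each; thresholds A >= 80 and B >= 150.
print(best_c([[(40, 50, 5), (10, 120, 9)],
              [(50, 40, 1), (45, 100, 2)]], 80, 150))        # 7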
My first inclination is to try branch-and-bound. You can do it breadth-first or depth-first; I prefer depth-first because I think it's cleaner.
To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it just has a 5-level nested loop). It is augmented with two things:
At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)
The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.
Then you have an outer loop:
for (budget = 0; budget < ... ; budget++) {
    walk(budget);
    // if walk finds a solution within the budget, halt
}
The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.
Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.
If that doesn't work, you can fall back on basic heuristic programming. That is, do some cases by hand and pay attention to how you did it. Then program it the same way.
I hope that helps.

Resources