Is this solvable in polynomial (or pseudo-polynomial) time? - algorithm

I'm trying to come up with a reasonable algorithm for this problem:
Let's say you have a bunch of balls. Each ball has at least one color, but can also be multicolored. Each ball has a weight and a value associated with it. There are also a bunch of boxes which are each only one color. Each box has a maximum number of balls it can hold. The goal is to maximize the sum of the value in the boxes while staying under some total weight, W, and the only rule is:
In order to place a ball in a box, it has to at least have the box's color on it
(For example, you can put a blue and green ball into a blue box or a green box, but not into a red box.)
I've done some research and this seems similar to the knapsack problem and also similar to problems solvable by the Hungarian algorithm, but I can't quite seem to reduce it to either problem.
I'm just curious if there's some kind of dynamic programming algorithm for this type of problem to make it solvable in polynomial time, or if it's just the traveling salesman problem in disguise. Would it help if I knew there were at most X colors? Any help is greatly appreciated. I could also formalize the problem a bit with variable names if it would help. Thanks!
Here's a simple example:
Maximum weight: 5
Balls:
1 red ball - (value = 5, weight = 1)
1 blue ball - (value = 3, weight = 1)
1 green/red/blue ball - (value = 2, weight = 4)
1 green/blue ball - (value = 4, weight = 1)
1 red/blue ball - (value = 1, weight = 1)
Boxes:
1 red (holds 1 ball)
1 blue (holds 2 balls)
1 green (holds 1 ball)
Optimal Solution:
red ball in red box
blue ball and red/blue ball in blue box
green/blue ball in green box
Total value: 13 (5 + 3 + 1 + 4)
Total weight: 4 (1 + 1 + 1 + 1)
Note: even though the green/red/blue ball was more valuable than the red/blue ball, its weight would have put us over the limit.
Edit:
One clarifying point: balls with the same color combination will not necessarily have the same weights and values. For example, you could have a red ball with value 3 and weight 1 and another red ball with value 2 and weight 5.
Edit 2:
We can assume integer weights and values if it helps us come up with a dynamic programming algorithm.

This is at least as complex as the Knapsack problem - consider a case where all balls are red and there is only one red box.
In the case when balls that have the same combination of colors must have the same weights and values consider a case when you have red/blue, red/green, etc. balls and only one red box.

If there is no bound on the number of boxes, then this problem is strongly NP-hard by reduction from 3-partition (set up n/3 boxes and make all the things rainbow-colored with value = weight).
If the number of boxes is constant, then there's a pseudo-polynomial time algorithm via dynamic programming, where each DP state consists of how full each box is.
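A minimal sketch of that dynamic program, assuming integer weights and a small, fixed number of boxes. The function name best_value and the tuple layout of balls and boxes are illustrative choices, not part of the question. Each state records how full each box is plus the weight used so far, and maps to the best value reachable in that state.

```python
def best_value(balls, boxes, max_weight):
    """balls: list of (colors, value, weight); boxes: list of (color, capacity)."""
    # State: (how many balls each box holds, total weight used) -> best value.
    states = {(tuple(0 for _ in boxes), 0): 0}
    for colors, value, weight in balls:
        next_states = dict(states)  # option: leave this ball out
        for (fills, used), total in states.items():
            if used + weight > max_weight:
                continue  # this ball would break the weight limit
            for b, (box_color, capacity) in enumerate(boxes):
                if box_color in colors and fills[b] < capacity:
                    new_fills = fills[:b] + (fills[b] + 1,) + fills[b + 1:]
                    key = (new_fills, used + weight)
                    if total + value > next_states.get(key, float("-inf")):
                        next_states[key] = total + value
        states = next_states
    return max(states.values())


# The example from the question (maximum weight 5).
balls = [
    ({"red"}, 5, 1),
    ({"blue"}, 3, 1),
    ({"green", "red", "blue"}, 2, 4),
    ({"green", "blue"}, 4, 1),
    ({"red", "blue"}, 1, 1),
]
boxes = [("red", 1), ("blue", 2), ("green", 1)]
print(best_value(balls, boxes, 5))
```

The number of states is bounded by W times the product of (capacity + 1) over the boxes, which is why this is only pseudo-polynomial and only practical when the number of boxes is a small constant.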

The reduction from knapsack is as follows. Given your knapsack instance, you create an instance of the balls and boxes problem: for each item of the knapsack instance you have a ball with the same weight and value as the item. Then you have a single box representing the knapsack. The balls and the box are all blue. The box can hold every ball, and the total weight limit W is the capacity given in the knapsack problem. A solution to your problem is then a set of balls in the box whose total weight is at most the knapsack capacity and whose total value is maximised, which is exactly a solution to the knapsack instance.

This problem is NP-hard (its decision version is NP-complete), because it subsumes the knapsack problem.
That is, it's not just similar to the knapsack problem: if there is one box, all the balls have that box's color, and the maximum number of balls in the box is the total number of balls, then the problem is exactly the knapsack problem.
If an algorithm could solve this problem in polynomial time, it could solve any knapsack problem in polynomial time. Since the knapsack problem is NP-hard, this problem is, too.

The best you can do in this situation is get an approximation of the optimal solution, since no polynomial-time algorithm is known for the knapsack problem itself. You may still be able to get good (although not guaranteed to be optimal) results in polynomial time with a good heuristic or approximation algorithm.

Related

algorithm to find closest object

I need to map blue objects to red ones by distance. The center of each object is known. The yellow and green objects, if they are shown, are hints. They help to decide which red object is more important.
For example, in the situation shown in image below:
The bottom blue object should be mapped to the bottom most right red object since both green and yellow objects are very close to that red object.
The right blue object should be mapped to top-right red object since it's closer to it.
I have a naive solution, but I'm not quite sure what to do instead of "????" below
Do you have any suggestions?
My naive solution in sort of pseudo-code:
for each BLUE:
    find group P=(YELLOW_BLUE, GREEN_BLUE and RED_BLUE) when each object in P is the closest to BLUE
    vector<RED> redCandidates
    for each O in P:
        if O is YELLOW OR O is GREEN
            find closest RED to O
            insert RED to redCandidates
    if size of redCandidates is 0 -> return RED_BLUE
    else if size of redCandidates is 1 -> return redCandidates[0] since hint has more weight to the decision
    else if size of redCandidates is > 1 -> ????
UPDATE1
After looking into the Minimum Cost Flow problem suggested by @ldog, I decided to use the Hungarian Algorithm. I created a bipartite graph where each blue node is connected to each red node and the weights on the edges are the distances between blue and red.
Now, before I solve the graph, I need to apply rewards on the edges where yellow/green are close to red. I don't quite understand how to do that.
Let's say distance between blue 1 and the red 4 is D_1_4 = 10 and the distance between the yellow hint 11 and the red 4 is D_4_11 = 3. So, because D_1_4 > D_4_11, should I just add reward to the edge 1_4? Or, should I add the reward to each edge that enters node 4, meaning edges 1_4, 2_4 and 3_4?
It seems your question is not fully formed and you are looking for a decent formulation of the things that you expressed in words.
Here are some suggestions:
Use intersection over union for dealing with how to assign similarity for overlapping areas.
There are many ways you could try to quantify what you expressed in words so that it can be optimized. One reasonable way is as an optimization problem, which I will discuss below.
The optimization should assign each blue box to exactly one red box (given what you told me). The green and yellow boxes are hints and are not included in this constraint; they are simply used to modify which red box is preferred over others. To formalize what you described in words, we have the set of red boxes R and the set of blue boxes B. There are m red boxes and n blue boxes with m >= n. Each pairing of blue box i with red box j has a preference w_{ij} (this preference is pre-calculated, accounting for the hint boxes as well as spatial proximity).
We wish to compute:
max \sum_{i \in B} \sum_{j \in R} w_{ij} x_{ij}
such that
\sum_{j \in R} x_{ij} = 1 for every blue box i, \sum_{i \in B} x_{ij} \le 1 for every red box j, x_{ij} \in \{0,1\}
The variable x_{ij} is 1 if and only if blue box i is assigned to red box j, it is 0 otherwise.
This problem's constraint matrix is totally unimodular, so the problem can be solved in polynomial time. In fact, it can be solved as an instance of the common Minimum Cost Flow problem. To do this, define a node in a graph for each blue box i, and a node for each red box j. Connect each blue node to each red node (directed blue->red) with an edge of weight -w_{ij} and capacity 1. Connect the source to each blue node (directed source->blue) with an edge of capacity 1 and weight 0. Connect each red node to the sink (directed red->sink) with an edge of capacity 1 and weight 0. Give the source a supply of n and the sink a demand of n. Compute the Minimum Cost Flow on this graph (see for example lemon); because the edge weights are negated, the minimum-cost flow yields the maximum-preference assignment.
Having described this in detail, I see this is already a common approach [1] to solve exactly problems like yours. Here is an implementation.
YMMV depending on how good you make your weights. You may want to try a machine learning approach to determine optimal weights using a ground truth dataset and an iterative refinement. The refinement can be computed for a fixed set of blue and red ground truth boxes by refining the weights w_{ij} until every possible assignment other than the ground truth scores lower in the optimization than the ground truth. This can be done iteratively using any max-margin learning framework or technique, combined with the method I described above (and apparently described in [1]).
[1] Zhang, Li, Nevatia: Global data association for multi-object tracking using network flows, CVPR (2008).
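As a rough illustration of the flow construction described in this answer, here is a small pure-Python sketch that builds the source -> blue -> red -> sink graph with costs -w_{ij} and pushes n units of flow along successive shortest paths. The function name max_weight_assignment and the matrix layout (rows are blue, columns are red, with n <= m) are assumptions made for the example; a real project would more likely call a library solver such as lemon.

```python
from collections import deque


def max_weight_assignment(w):
    """w[i][j] = preference of blue i for red j; assumes len(w) <= len(w[0])."""
    n, m = len(w), len(w[0])
    source, sink = n + m, n + m + 1
    # Each edge is [target, remaining capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        add_edge(source, i, 1, 0)
        for j in range(m):
            add_edge(i, n + j, 1, -w[i][j])  # negated: min cost = max preference
    for j in range(m):
        add_edge(n + j, sink, 1, 0)

    for _ in range(n):  # supply of n: one augmenting path per blue node
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        in_queue = [False] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:  # queue-based Bellman-Ford, tolerates negative costs
            u = queue.popleft()
            in_queue[u] = False
            for k, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, k)
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        v = sink
        while v != source:  # push one unit of flow back along the path
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u

    # A saturated blue->red edge means that pair was chosen.
    return {i: v - n for i in range(n)
            for v, cap, _, _ in graph[i] if n <= v < n + m and cap == 0}


print(max_weight_assignment([[10, 9], [10, 1]]))
```

In this example a greedy nearest-first choice would pair blue 0 with red 0 and leave blue 1 with its worst option; the flow solution instead swaps them for a higher total.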

Springs and Hooks

Okay, I am dealing with a problem that I can't seem to solve. A little help?
Problem: Given a wooden line of length M which has n hooks and the positions of these hooks are given as an array ( 0 < p < M ). There is a spring attached to each hook and each spring has a metal ball at the other end. All balls are of the same radius, R and the same mass m. The springs have the same stiffness coefficient.
How do we find the optimum positions of the balls such that the springs and the system are in equilibrium? The metal balls are not allowed to go before or after the line, i.e. the ends of the balls cannot be < 0 or > M. It is possible to have multiple hooks at the same position in the array.
Assumptions: The given array is always valid.
You can ignore the vertical stretch and only consider the stretch of the spring in the horizontal directions. The problem can be seen as 1D in nature then.
Limits: O(nlogn) solution or better is sought here.
Example: M = 10, array = [ 4, 4 ], R = 1 ( Diameter is 2 ), optimum ball position = [ 3, 5 ]
What I've tried so far:
Take one hook/ball at a time and create clusters if two balls hit each other. Place them symmetrically at the centroid of the hooks. Bottleneck: O(n^2), since balls keep hitting each other.
Put all balls at the overall centroid of the hooks, then return the max of 3 sub-problems recursively:
a) balls that are being stretched left, b) balls being stretched right, c) balls in the middle of these. Bottleneck: the 3 subproblems may overlap, and handling the overlaps correctly seems awkward.
Here is a sort of binary search to find the correct position of each of the balls.
1) Start with the balls next to each other, in the order of their hooks, as far left as they can go.
2) Calculate the amount of space you have for the balls to move (the distance from the right-most ball to the right edge), and use half of this as your starting increment.
3) Calculate the net force on each ball from the spring, its neighbors, and the edges.
4) Move each ball by the increment in the direction of the net force on it, or keep it where it is if there is no net force.
5) If the increment is below the precision you want or all balls had no net force on them, stop. Otherwise, divide the increment by 2 and go to step 3.

Maximization algorithm for ball preferences

I'm trying to devise the most efficient algorithm for a problem, but I'm having some difficulty. If someone could lend a hand, either by proposing an algorithm or classifying the problem so I can do some further research, I would be very appreciative.
The problem is as follows:
There are n (an integer) number of distinct red balls, each of which has its own number, and m number of distinct green balls, each of which has its own corresponding number as well. For example, if n = 3, then there are three red balls named Red Ball 1, Red Ball 2 and Red Ball 3.
There are also two boxes in which the balls can be placed.
Before the balls are placed in the boxes however, x number of people make predictions as to which balls will be placed in which box (either box 1 or box 2). Each person gets one prediction and for each prediction they can guess one ball to be in each box. The only condition is that the ball they guess in box 1 cannot be the same color as the ball they guess to be in box 2. An example prediction would be: "I think that Red Ball 2 will be in box 1 and Green Ball 3 will be in box 2"
After everyone has made their predictions, the balls will be placed in the boxes to maximize the number of predictions that are correct.
The code I must write will be prompted with n, m, and x as well as the predictions and then be asked to return the maximum number of predictions that are correct.
Once again, I am looking for either algorithmic help or help to identify the type of problem this is. I currently have a recursive algorithm running in O(n^2), but I need something a little more efficient.
Thanks for your help! Cheers, Mates!

Is this problem NP-hard?

I'm trying to come up with a reasonable algorithm for this problem:
Let's say you have a bunch of balls. Each ball has at least one color, but can also be multicolored. Each ball also has a number on it. There are also a bunch of boxes which are each only one color. The goal is to maximize the sum of the numbers on the balls in the boxes, and the only rules are:
in order to place a ball in a box, it has to at least have the box's color on it
you can only put one ball in each box.
For example, you can put a blue and green ball into a blue box or a green box, but not into a red box.
I've come up with a few optimizations that help a lot in terms of running time. For example, you can sort the balls in descending order of point value. Then as you go from highest number to lowest, if the ball only has one color, and there are no other higher-point balls that contain that color, you can put it in that box (and thus remove that box and that ball from the remaining combinations).
I'm just curious if there's some kind of dynamic algorithm for this type of problem, or if it's just the traveling salesman problem in disguise. Would it help if I knew there were at most X colors? Any help is greatly appreciated. Thanks!
Edit - here's a simple example:
Balls:
1 red ball - 5 points
1 blue ball - 3 points
1 green/red ball - 2 points
1 green/blue ball - 4 points
1 red/blue ball - 1 point
Boxes:
1 red
1 blue
1 green
Optimal Solution:
red ball in red box
blue ball in blue box
green/blue ball in green box
Total value: 12 points (5 + 3 + 4)
This is a special case of the maximum weight matching problem on a weighted bipartite graph. Construct a graph whose left vertices correspond to balls, whose right vertices correspond to boxes and with the edge joining a ball and a box having weight V where V is the number on the ball if the ball can be placed in the box, and 0 otherwise. Add extra boxes or balls joined to the other side with edges of weight zero until you have the same number of vertices on each side. The assignment you're looking for is determined by the set of edges of nonzero weight in the maximum (total) weight matching in the resulting graph.
The assignment problem can be solved in O(n^3) time, where n is here the maximum of the number of balls or boxes, using the Hungarian algorithm. (BTW, I should make the disclaimer that I only mention the Hungarian algorithm because it is the theoretical result I happen to be familiar with and it presumably answers the question in the title of whether the original problem is NP-hard. I have no idea whether it is the best algorithm to use in practice.)
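To make the construction above concrete, here is a sketch that pads the ball/box matrix to a square with zero-weight dummy entries and runs a compact O(n^3) Hungarian-style routine (the well-known formulation with row and column potentials). The function name place_balls and the input layout are assumptions for illustration only; costs are the negated ball numbers so that minimizing cost maximizes the total.

```python
def place_balls(balls, boxes):
    """balls: list of (colors, value); boxes: list of colors (one ball per box)."""
    size = max(len(balls), len(boxes))
    # Square cost matrix; missing rows/columns are zero-weight dummies.
    cost = [[0] * size for _ in range(size)]
    for i, (colors, value) in enumerate(balls):
        for j, box_color in enumerate(boxes):
            if box_color in colors:
                cost[i][j] = -value

    inf = float("inf")
    u = [0] * (size + 1)      # row potentials (1-based)
    v = [0] * (size + 1)      # column potentials (1-based)
    match = [0] * (size + 1)  # match[j] = row assigned to column j, 0 = none
    way = [0] * (size + 1)    # previous column on the augmenting path
    for i in range(1, size + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (size + 1)
        used = [False] * (size + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, size + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(size + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1

    # Only edges of nonzero weight are real placements.
    placements = [(match[j] - 1, j - 1) for j in range(1, size + 1)
                  if cost[match[j] - 1][j - 1] < 0]
    total = -sum(cost[i][j] for i, j in placements)
    return total, placements


# The example from the question.
balls = [({"red"}, 5), ({"blue"}, 3), ({"green", "red"}, 2),
         ({"green", "blue"}, 4), ({"red", "blue"}, 1)]
boxes = ["red", "blue", "green"]
print(place_balls(balls, boxes))
```

Each returned pair is (ball index, box index); dummy rows and columns never appear in the result because their edges have weight zero.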
Have you tried a greedy algorithm?
Sort by points/value and place in box if possible.
If there are any exceptions I'm missing, I'd like to see them.

How to optimally solve the flood fill puzzle?

I like playing the puzzle game Flood-It, which can be played online at:
https://www.lemoda.net/javascript/flood-it/game.html
It's also available as an iGoogle gadget. The aim is to fill the whole board with the least number of successive flood-fills.
I'm trying to write a program which can solve this puzzle optimally. What's the best way to approach this problem? Ideally I want to use the A* algorithm, but I have no idea what should be the function estimating the number of steps left. I did write a program which conducted a depth-4 brute force search to maximize the filled area. It worked reasonably well and beat me in solving the puzzle, but I'm not completely satisfied with that algorithm.
Any suggestions? Thanks in advance.
As a heuristic, you could construct a graph where each node represents a set of contiguous, same-colour squares, and each node is connected to those it touches. (Each edge weighted as 1). You could then use a path-finding algorithm to calculate the "distance" from the top left to all other nodes. Then, by looking at the results of flood-filling using each of the other 5 colours, determine which one minimizes the distance to the "furthest" node, since that will likely be your bottleneck.
Add the result of that calculation to the number of fills done so far, and use that as your A* heuristic.
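A small sketch of the distance computation behind this heuristic, assuming the board is a list of rows of colour numbers. Instead of building the region graph explicitly, it runs a 0-1 breadth-first search over the cells: stepping to a same-colour neighbour costs 0 and stepping to a different colour costs 1, which gives the same distances as the region graph with unit edge weights. The name farthest_region_distance is made up for the example.

```python
from collections import deque


def farthest_region_distance(grid):
    """Number of colour changes needed to reach the farthest region from the top left."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[0][0] = 0
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = 0 if grid[nr][nc] == grid[r][c] else 1
                new_dist = dist[r][c] + step
                if dist[nr][nc] is None or new_dist < dist[nr][nc]:
                    dist[nr][nc] = new_dist
                    # Zero-cost moves go to the front so cells come out in distance order.
                    if step == 0:
                        queue.appendleft((nr, nc))
                    else:
                        queue.append((nr, nc))
    return max(max(row) for row in dist)


board = [
    [3, 3, 3, 3, 3],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 2],
]
print(farthest_region_distance(board))
```

Each fill can bring the flooded region at most one step closer to the farthest region, so this distance never exceeds the true number of fills still required.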
A naive 'greedy' algorithm is to pick the next step that maximizes the overall perimeter of the main region.
(A couple of smart friends of mine were thinking about this the other day and decided the optimum may be NP-hard (i.e. you must brute force it) - I do not know if they're correct (wasn't around to hear the reasoning and haven't thought through it myself).)
Note that for computing steps, I presume the union-find algorithm is your friend, it makes computing 'one step' very fast (see e.g. this blog post).
After playing the game a few times, I noticed that a good strategy is to always go "deep", to go for the colour which goes farthest into the unflooded territory.
A* is just a prioritized graph search. Each node is a game state; you rank nodes based on some heuristic, and always expand the lowest-expected-final-cost node. As long as your heuristic doesn't overestimate costs, the first solution you find is guaranteed to be optimal.
After playing the games a few times, I found that trying to drill to the opposite corner then all corners tended to result in a win. So a good starting cost estimate would be (cost so far) + a sufficient number of fills to reach the opposite corner [note: not minimum, just sufficient. Just greedily fill towards the corner to compute the heuristic].
I have been working on this, and after I got my solver working I took a look at the approaches others had taken.
Most of the solvers out there are heuristic and do not guarantee optimality. Heuristics look at the number of squares and distribution of colors left unchosen, or the distance to the "farthest away" square. Combining a good heuristic with bounded DFS (or BFS with lookahead) results in solutions that are quite fast for the standard 14x14 grid.
I took a slightly different approach because I was interested in finding the provably optimal path, not just a 'good' one. I observed that the search space actually grows much slower than the branching factor of the search tree, because there are quite a lot of duplicate positions. (With a depth-first strategy it is therefore important to maintain a history to avoid redundant work.) The effective branching factor seems closer to 3 than to 5.
The search strategy I took is to perform BFS up to a "midpoint" depth where the number of states would become infeasible, somewhere between 11 and 13 moves works best. Then, I examine each state at the midpoint depth and perform a new BFS starting with that as the root. Both of these BFS searches can be pruned by eliminating states found in previous depths, and the latter search can be bounded by the depth of the best-known solution. (A heuristic applied to the order of the subtrees examined in the second step would probably help some, as well.)
The other pruning technique which proved to be key to a fast solver is simply checking whether there are more than N colors left, if you are N or fewer steps away from the current best solution.
Once we know which midpoint state is on the path to an optimal solution, the program can perform DFS using that midpoint state as a goal (and pruning any path that selects a square not in the midpoint.) Or, it might be feasible to just build up the paths in the BFS steps, at the cost of some additional memory.
My solver is not super-fast but it can find a guaranteed optimal solution in no more than a couple minutes. (See http://markgritter.livejournal.com/673948.html, or the code at http://pastebin.com/ZcrS286b.)
Smashery's answer can be slightly tweaked. For the total number of moves estimate, if there are 'k' colors at maximum distance, add 'k-1' to the number of moves estimate.
More generally, for each color, consider the maximum distance at which the color can be cleared. This gives us a dictionary mapping some maximum distances to a non-zero number of colors that can be cleared at that distance. Sum value-1 across the keys and add this to the maximum distance to get a number of moves estimate.
Also, there are certain free cases. If at any point we can clear a color in one move, we can take that move without considering the other moves.
Here's an idea for implementing the graph to support Smashery's heuristic.
Represent each group of contiguous, same-colour squares in a disjoint set, and a list of adjacent groups of squares. A flood fill merges a set to all its adjacent sets, and merges the adjacency lists. This implicit graph structure will let you find the distance from the upper left corner to the farthest node.
I think you could consider the number of squares that match or don't match the current color. So, your heuristic measure of "distance" would be the number of squares on the board that are -not- the same color as your chosen color, rather than the number of steps.
A naive heuristic could be to use the number of colours left (minus 1) - this is admissible because it will take at least that many clicks to clear off the board.
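To show how such an admissible estimate plugs into A*, here is a minimal sketch for small boards, assuming the board is a list of rows of colour numbers and the flood always starts at the top-left square. The helper names flood, colours_left and solve are invented for this example, and the search is not tuned for the standard 14x14 grid.

```python
import heapq


def flood(grid, colour):
    """Return the board after flood-filling the top-left region with `colour`."""
    rows, cols = len(grid), len(grid[0])
    old = grid[0][0]
    new = [list(row) for row in grid]
    stack, seen = [(0, 0)], {(0, 0)}
    while stack:
        r, c = stack.pop()
        new[r][c] = colour
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and grid[nr][nc] == old):
                seen.add((nr, nc))
                stack.append((nr, nc))
    return tuple(tuple(row) for row in new)


def colours_left(grid):
    return len({c for row in grid for c in row})


def solve(grid):
    """A* search; returns a shortest list of colours to fill with."""
    start = tuple(tuple(row) for row in grid)
    # Heap entries: (moves so far + heuristic, moves so far, board, path).
    heap = [(colours_left(start) - 1, 0, start, [])]
    best = {start: 0}
    while heap:
        _, moves, state, path = heapq.heappop(heap)
        if colours_left(state) == 1:
            return path
        if moves > best[state]:
            continue  # stale entry, a shorter route to this board exists
        for colour in {c for row in state for c in row} - {state[0][0]}:
            nxt = flood(state, colour)
            if moves + 1 < best.get(nxt, float("inf")):
                best[nxt] = moves + 1
                estimate = moves + 1 + colours_left(nxt) - 1
                heapq.heappush(heap, (estimate, moves + 1, nxt, path + [colour]))
    return None


board = [
    [3, 3, 3, 3, 3],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 2],
]
print(solve(board))
```

Because one fill can remove at most one colour from the board, the estimate never overestimates, so the first finished board popped from the heap corresponds to a shortest sequence of fills.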
I'm not certain, but I'm fairly sure that this could be solved greedily. You're trying to reduce the number of color fields to 1, so reducing more color fields earlier shouldn't be any less efficient than reducing fewer earlier.
1) Define a collection of existing like-colored groups.
2) For each collection, count the number of neighboring collections by color. The largest count of neighboring collections with a single color is the weight of this collection.
3) Take the collection with the highest count of neighbors with a single color, and fill it to that color. Merge the collections, and update the sort for all the collections affected by the merge (all the new neighbors of the merged collection).
Overall, I think this should actually compute in O(n log n) time, where n is the number of pixels and the log(n) only comes from maintaining the sorted list of weights.
I'm not sure if there needs to be a tie-breaker for when multiple fields have the same weight though. Maybe the tie-breaker goes to the color that's common to the most groups on the map.
Anyway, note that the goal of the game is to reduce the number of distinct color fields and not to maximize the perimeter, as different color schemes can occasionally make a larger field a sub-optimal choice. Consider the field:
3 3 3 3 3
1 1 1 1 1
1 1 1 1 1
2 2 2 2 2
1 2 2 2 2
The color 1 has the largest perimeter by any measure, but the color 2 is the optimal choice.
EDIT:
Scratch that. The example:
3 1 3 1 3
1 1 1 1 1
1 1 1 1 1
2 2 2 2 2
1 2 2 2 2
Invalidates my own greedy algorithm. But I'm not convinced that this is a simple graph traversal, since changing to a color shared by 2 neighbors visits 2 nodes, and not 1.
Color elimination should probably play some role in the heuristic.
1) It is never correct to fill with a color that is not already on the graph.
2) If there is one color field with a unique color, at least one fill will be required for it. It cannot be bundled with any other fills. I think this means that it's safe to fill it sooner rather than later.
3) The greedy algorithm for neighbor field count makes sense for a 2 color map.
