Defining a special case of a subset-sum with complications

Defining a special case of a subset-sum with complications - algorithm

I have a problem that I have a number of questions about. First, I'm mostly looking for help describing and understanding the problem at hand. Solutions are always welcome, but most importantly I could use some advice from someone more experienced than I. Now, to the problem at hand:
I have a set of orders that each require some number of items. I also have several groupings of items that each contain some number of some items (call them groups). The goal is to find a subset of the orders that can be fulfilled using as few groups as possible and where the total number of items contained within the orders is between n and N.
Edit: The constraints on the number of items contained in the orders (n and N) are chosen independently.
To me at least, that's a really complicated way of saying the problem so I've been trying to re-phrase it as a knapsack problem (I suspect this might reduce to a subset-sum). To help my conceptual understanding of this I've started using the following definitions:
First, lets say that a dimension exists for each possible item, and somethings 'length' in that dimension is the number of that particular type of item it either has or requires.
From this, an order becomes an 'n-dimensional object' where its value in each dimension corresponds to the number of that item that it requires.
In addition, a group can be seen as an 'n-dimensional box' that has space in each dimension corresponding to the number of items it provides.
An objects value is equal to the sum of its length in all dimensions.
Boxes can be combined.
Given the above I've rephrased the problem to this:
What is the smallest combination of boxes that can hold a combination of items with value between n and N.
Question #1: Is this a correct/useful way to express the problem? Does it seem like I've missed anything obvious?
As I see it, since there are two combinations that I'm looking for I need to break the problem into two parts. So far I think breaking the problem up like this is a good step:
How many objects can box (or combination of boxes) X hold?
Check all (or preferably some small subset of) the possible combinations of boxes and pick the 'best'.
That makes it a little more manageable, but I'm still struggling with the details.
Question #2: Solved To solve the first part I think it's appropriate to say that the cost of an object is equal to the sum of its length in all dimensions, so is it's value. That places me into a subset-sum problem, right? Obviously it's a special case, but does this problem have a name?
Question #3: Solved I've been looking into subset-sum solutions a lot, but I don't understand how to apply them to something like this in multiple dimensions. I assume it's been done before, but I'm unsure where to start my research. Could someone either describe the principles at work or point me in a research direction?
Edit: After looking at everyone's feedback and digging into the terms I think I've found a good algorithm I can implement to solve part 1. Since I will have a very large number of dimensions compared to the number of items it looks like using a 'primal effective capacity heuristic (PECH)' will be a good fit. I'd be interested in hearing someones thoughts about it if they have experience with such an algorithm.
Question #4: For the second part, performance is a concern and I doubt it will be realistic to brute force it. So I intend to treat all combinations of boxes as a really big tree of solutions. The idea is to compute part 1 for all combinations of M-1 boxes where M is the total number of boxes. Somehow determine the 'best' couple box combinations from that set and do the same to their child nodes on the tree. Does this sound like it would help me arrive at something close to optimal? How would I choose the 'best' box combinations?
Thanks for reading! Suggestions for edits and clarifications are welcome.

Related

Number of restricted arrangements

I am looking for a faster way to solve this problem:
Let's suppose we have n boxes and n marbles (each of them has a different kind). Every box can contain only some kinds of marbles (It is shown it the example below), and only one marble fits inside one box. Please read the edits. The whole algorithm has been described in the post linked below but it was not precisely described, so I am asking for a reexplenation.
The question is: In how many ways can I put marbles inside the boxes in polynomial time?
Example:
n=3
Marbles: 2,5,3
Restrictions of the i-th box (i-th box can only contain those marbles): {5,2},{3,5,2},{3,2}
The answer: 3, because the possible positions of the marbles are: {5,2,3},{5,3,2},{2,5,3}
I have a solution which works in O(2^n), but it is too slow. There are also one limitation about the boxes tolerance, which I don't find very important, but I will write them also. Each box has it's own kind-restrictions, but there is one list of kinds which is accepted by all of them (in the example above this widely accepted kind is 2).
Edit: I have just found this question but I am not sure if it works in my case, and the dynamic solution is not well described. Could somebody clarify this? This question was answered 4 years ago, so I won't ask it there. https://math.stackexchange.com/questions/2145985/how-to-compute-number-of-combinations-with-placement-restrictions?rq=1
Edit#2: I also have to mention that excluding widely-accepted list the maximum size of the acceptance list of a box has 0 1 or 2 elements.
Edit#3: This question refers to my previos question(Allowed permutations of numbers 1 to N), which I found too general. I am attaching this link because there is also one more important information - the distance between boxes in which a marble can be put isn't higher than 2.

As noted in the comments, https://cs.stackexchange.com/questions/19924/counting-and-finding-all-perfect-maximum-matchings-in-general-graphs is this problem, with links to papers on how to tackle it, and counting the number of matchings is #P-complete. I would recommend finding those papers.
As for dynamic programming, simply write a recursive solution and then memoize it. (That's top down, and is almost always the easier approach.) For the stack exchange problem with a fixed (and fairly small) number of boxes, that approach is manageable. Unfortunately in your variation with a large number of boxes, the naive recursive version looks something like this (untested, probably buggy):
def solve (balls, box_rules):
ball_can_go_in = {}
for ball in balls:
ball_can_go_in[ball] = set()
for i in range(len(box_rules)):
for ball in box_rules[i]:
ball_can_go_in[ball].add(i)
def recursive_attempt (n, used_boxes):
if n = len(balls):
return 1
else:
answer = 0
for box in ball_can_go_in[balls[n]]:
if box not in used_boxes:
used_boxes.add(box)
answer += recursive_attempt(n+1, used_boxes)
used_boxes.remove(box)
return answer
return recursive_attempt(0, set())
In order to memoize it you have to construct new sets, maybe use bit strings, BUT you're going to find that you're calling it with subsets of n things. There are an exponential number of them. Unfortunately this will take exponential time AND use exponential memory.
If you replace the memoizing layer with an LRU cache, you can control how much memory it uses and probably still get some win from the memoizing. But ultimately you will still use exponential or worse time.
If you go that route, one practical tip is sort the balls by how many boxes they go in. You want to start with the fewest possible choices. Since this is trying to reduce exponential complexity, it is worth quite a bit of work on this sorting step. So I'd first pick the ball that goes in the fewest boxes. Then I'd next pick the ball that goes in the fewest new boxes, and break ties by fewest overall. The third ball will be fewest new boxes, break ties by fewest boxes not used by the first, break ties by fewest boxes. And so on.
The idea is to generate and discover forced choices and conflicts as early as possible. In fact this is so important that it is worth a search at every step to try to discover and record forced choices and conflicts that are already visible. It feels counterintuitive, but it really does make a difference.
But if you do all of this, the dynamic programming approach that was just fine for 5 boxes will become faster, but you'll still only be able to handle slightly larger problems than a naive solution. So go look at the research for better ideas than this dynamic programming approach.
(Incidentally the inclusion-exclusion approach has a term for every subset, so it also will blow up exponentially.)

How to find neighboring solutions in simulated annealing?

I'm working on an optimization problem and attempting to use simulated annealing as a heuristic. My goal is to optimize placement of k objects given some cost function. Solutions take the form of a set of k ordered pairs representing points in an M*N grid. I'm not sure how to best find a neighboring solution given a current solution. I've considered shifting each point by 1 or 0 units in a random direction. What might be a good approach to finding a neighboring solution given a current set of points?
Since I'm also trying to learn more about SA, what makes a good neighbor-finding algorithm and how close to the current solution should the neighbor be? Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?

I would split your question into several smaller:
Also, if randomness is involved, why is choosing a "neighbor" better than generating a random solution?
Usually, you pick multiple points from a neighborhood, and you can explore all of them. For example, you generate 10 points randomly and choose the best one. By doing so you can efficiently explore more possible solutions.
Why is it better than a random guess? Good solutions tend to have a lot in common (e.g. they are close to each other in a search space). So by introducing small incremental changes, you would be able to find a good solution, while random guess could send you to completely different part of a search space and you'll never find an appropriate solution. And because of the curse of dimensionality random jumps are not better than brute force - there will be too many places to jump.
What might be a good approach to finding a neighboring solution given a current set of points?
I regret to tell you, that this question seems to be unsolvable in general. :( It's a mix between art and science. Choosing a right way to explore a search space is too problem specific. Even for solving a placement problem under varying constraints different heuristics may lead to completely different results.
You can try following:
Random shifts by fixed amount of steps (1,2...). That's your approach
Swapping two points
You can memorize bad moves for some time (something similar to tabu search), so you will use only 'good' ones next 100 steps
Use a greedy approach to generate a suboptimal placement, then improve it with methods above.
Try random restarts. At some stage, drop all of your progress so far (except for the best solution so far), raise a temperature and start again from a random initial point. You can do this each 10000 steps or something similar
Fix some points. Put an object at point (x,y) and do not move it at all, try searching for the best possible solution under this constraint.
Prohibit some combinations of objects, e.g. "distance between p1 and p2 must be larger than D".
Mix all steps above in different ways
Try to understand your problem in all tiniest details. You can derive some useful information/constraints/insights from your problem description. Assume that you can't solve placement problem in general, so try to reduce it to a more specific (== simpler, == with smaller search space) problem.
I would say that the last bullet is the most important. Look closely to your problem, consider its practical aspects only. For example, a size of your problems might allow you to enumerate something, or, maybe, some placements are not possible for you and so on and so forth. THere is no way for SA to derive such domain-specific knowledge by itself, so help it!
How to understand that your heuristic is a good one? Only by practical evaluation. Prepare a decent set of tests with obvious/well-known answers and try different approaches. Use well-known benchmarks if there are any of them.
I hope that this is helpful. :)

variant of knapsack problem

I have 'n' number of amounts (non-negative integers). My requirement is to determine an optimal set of amounts so that the sum of the combination is less than or equal to a given fixed limit and the total is as large as possible. There is no limit to the number of amounts that can be included in the optimal set.
for sake of example: amounts are 143,2054,546,3564,1402 and the given limit is 5000.
As per my understanding the knapsack problem has 2 attributes for each item (weight and value). But the problem stated above has only one attribute (amount). I hope that would make things simpler? :)
Can someone please help me with the algorithm or source code for solving this?

this is still an NP-hard problem, but if you want to (or have to) to do something like that, maybe this topic helps you out a bit:
find two or more numbers from a list of numbers that add up towards a given amount
where i solved it like this and NikiC modified it to be faster. only difference: that one was about getting the exact amount, not "as close as possible", but that would be only some small changes in code (and you'll have to translate it into the language you're using).
take a look at the comments in my code to understand what i'm trying to do, wich is, in short form:
calculating all possible combinations of the given parts and sum them up
if the result is the amount i'm looking for, save the solution to an array
at least, sort all possible solutions to get the one using the least parts
so you'll have to change:
save a solution if it's lower than the amount you're looking for
sort solutions by total amount instead of number of used parts

The book "Knapsack Problems" By Hans Kellerer, Ulrich Pferschy and David Pisinger calls this The Subset Sum Problem and dedicates an entire chapter (Ch 4) to it. The chapter is very comprehensive and covers algorithms as well as computational results.
Even though this problem is a special case of the knapsack problem, it is still NP-hard.

Writing Simulated Annealing algorithm for 0-1 knapsack in C#

I'm in the process of learning about simulated annealing algorithms and have a few questions on how I would modify an example algorithm to solve a 0-1 knapsack problem.
I found this great code on CP:
http://www.codeproject.com/KB/recipes/simulatedAnnealingTSP.aspx
I'm pretty sure I understand how it all works now (except the whole Bolzman condition, as far as I'm concerned is black magic, though I understand about escaping local optimums and apparently this does exactly that). I'd like to re-design this to solve a 0-1 knapsack-"ish" problem. Basically I'm putting one of 5,000 objects in 10 sacks and need to optimize for the least unused space. The actual "score" I assign to a solution is a bit more complex, but not related to the algorithm.
This seems easy enough. This means the Anneal() function would be basically the same. I'd have to implement the GetNextArrangement() function to fit my needs. In the TSM problem, he just swaps two random nodes along the path (ie, he makes a very small change each iteration).
For my problem, on the first iteration, I'd pick 10 random objects and look at the leftover space. For the next iteration, would I just pick 10 new random objects? Or am I best only swapping out a few of the objects, like half of them or only even one of them? Or maybe the number of objects I swap out should be relative to the temperature? Any of these seem doable to me, I'm just wondering if someone has some advice on the best approach (though I can mess around with improvements once I have the code working).
Thanks!
Mike

With simulated annealing, you want to make neighbour states as close in energy as possible. If the neighbours have significantly greater energy, then it will just never jump to them without a very high temperature -- high enough that it will never make progress. On the other hand, if you can come up with heuristics that exploit lower-energy states, then exploit them.
For the TSP, this means swapping adjacent cities. For your problem, I'd suggest a conditional neighbour selection algorithm as follows:
If there are objects that fit in the empty space, then it always puts the biggest one in.
If no objects fit in the empty space, then pick an object to swap out -- but prefer to swap objects of similar sizes.
That is, objects have a probability inverse to the difference in their sizes. You might want to use something like roulette selection here, with the slice size being something like (1 / (size1 - size2)^2).

Ah, I think I found my answer on Wikipedia.. It suggests moving to a "neighbor" state, which usually implies changing as little as possible (like swapping two cities in a TSM problem)..
From: http://en.wikipedia.org/wiki/Simulated_annealing
"The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour."
So I believe my GetNextArrangement function would want to swap out a random item with an item unused in the set..

Divide people into teams for most satisfaction

Just a curiosity question. Remember when in class groupwork the professor would divide people up into groups of a certain number (n)?
Some of my professors would take a list of n people one wants to work with and n people one doesn't want to work with from each student, and then magically turn out groups of n where students would be matched up with people they prefer and avoid working with people they don't prefer.
To me this algorithm sounds a lot like a Knapsack problem, but I thought I would ask around about what your approach to this sort of problem would be.
EDIT: Found an ACM article describing something exactly like my question. Read the second paragraph for deja vu.

To me it sounds more like some sort of clique problem.
The way I see the problem, I'd set up the following graph:
Vertices would be the students
Two students would be connected by an edge if both of these following things hold:
At least one of the two students wants to work with the other one.
None of the two students doesn't want to work with the other one.
It is then a matter of partitioning the graph into cliques of size n. (Assuming the number of students is divisible by n)
If this was not possible, I'd probably let the first constraint on the edges slip, and have edges between two people as long as neither of them explicitly says that they don't want to work with the other one.
As for an approach to solving this efficiently, I have no idea, but this should hopefully get you closer to some insight into the problem.

You could model this pretty easily as a clustering problem and you wouldn't even really need to define a space, you could actually just define the distances:
Make two people very close if they both want to work together.
Close if one of them wants to work with the other.
Medium distance if there's just apathy.
Far away if either one doesn't want to work with the other.
Then you could just find clusters, yay. Then split up any clusters of overly large size, with confidence that the people in the clusters would all be fine working together.

This problem can be brute-forced, hence my approach would be first to brute force it, then fix it when I get a better idea.

There are a couple of algorithms you could use. A great example is the so called "stable marriage problem", which has a perfect solution. You can read more about it here:
http://en.wikipedia.org/wiki/Stable_marriage_problem
The stable marriage problem only works with two groups of people (men/women in the marriage case). If you want to form pair you can use a variation, the stable roommate problem. In this case you create pairs but everybody comes from a single pool.
But you asked for a team (which I translate into >2 people per team). In this case you could let everybody fill in their best to worst match and then run the

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio