Complex conveyor capacity calculation (graph) - fun game algorithm

I am working on a game in JavaScript where players can create conveyor lines. Imagine a postal service: packages come in, travel on conveyors, get processed, and end up in trucks. These conveyors can be of any complexity and can contain loops, etc.
[Image: example of a rather complex (and inefficient) conveyor]
[Image: example graph of a conveyor]
The number in each circle shows the current situation: the number of packages on that conveyor "part".
Rules:
- IN components generate postal packages that are put onto a belt component.
- Belt components push postal packages towards OUT components. OUT components remove packages from the graph.
- Each component has a MAX capacity it can hold at any given moment.
- There can be 0..N IN components. (If 0, there can still be packages already on the conveyor.)
- There can be 0..N OUT components. (If 0, the whole conveyor will eventually fill up.)
- Calculations are tick based, meaning a "group of packages" can travel only one step per tick.
- Each merge distributes packages evenly. (So with two outgoing lines, each gets packages/2.)
- Each join accepts packages evenly. (So with two incoming lines, each can deliver at most max/2 packages.)
- Packages DO NOT have identity; they are just total numbers.
- Packages can be split, so 1 package can become 0.5 and 0.5 packages. (So use float as the number type.)
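For concreteness, one way a component might be represented (a sketch; the field names are my own illustration, not from the game):

const component = {
  id: "c3",
  type: "BELT",      // "IN", "BELT" or "OUT"
  count: 2.5,        // packages currently on this part (floats are allowed)
  max: 10,           // MAX capacity at any given moment
  targets: ["c7"],   // outgoing connections; several entries mean a split/merge point
};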
Problem:
How to solve such a graph fast, in two steps:
1. Generate the necessary data structure/graph (cache). (Performance: <200 ms for 1000 elements.) In reality this only needs to be recalculated if the setup changes before the next package-travel calculation.
2. Calculate the package travel. (Performance: <20 ms for 1000 elements.)
Some problematic solutions
OPTION A: Solve per component (for-loop over every component and solve based on the current situation). Doesn't work: loops, joins and merges cause situations where packages are not moved evenly as required. It also doesn't honour the rule that a "package" can only travel from one component to the next per tick - depending on the loop order, a package may reach the end immediately. Additionally, it may always prefer one input while the other is permanently blocked.
OPTION B: Calculate what every component wants to do and find the conflicts. (For example, A and B each want to push 10 packages to C, but C can accept only 12 in total. Fix the conflict so A and B move only 6 each, meaning each will still hold at least 4 for the next tick. This can cause further conflicts, since A and B did not clear their contents and can now accept less.) Then find and fix more conflicts, and repeat until there are no conflicts or "max repeats" is reached, then just quick-fix whatever remains - which means it doesn't actually behave as the rules require. The problem is that I think this will get stuck for certain setups and may break.
OPTION C: Start from the OUT elements and try to move packages out. Problems arise when loops are introduced, or in areas without OUT elements.
Verdict
I am stuck and don't even know what to google for, so all ideas are welcome :)
I figured it is a graph problem, but maybe there are other approaches.

As long as you don't need the optimal solution to the problem, I think you could:
1. At the beginning, compute a topological sort order of the graph and "tag" each node with its position in that order, so you have a criterion for which node comes before which. The final node has the maximum tag. (Check the link below to learn more about topological sort; it's not hard.)
2. Mark all nodes as not visited.
3. Get the *node A* with max(tag) that is still not visited:
3.a. Take each *node B* that is a target of *node A* (i.e. at the end of the arrow), starting from the one with the maximum tag and finishing with the one with the minimum tag:
3.a.a. Push packages from *node A* to *node B* until *node B* is filled or *node A* is empty.
You can find the definition of topological sort and a bunch of algorithms at https://en.wikipedia.org/wiki/Topological_sorting
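A minimal JavaScript sketch of steps 1-3, assuming an acyclic graph (the loops the question allows are not handled here; you would need to collapse cycles first) and nodes shaped like { id, type, count, max, targets }:

function topoTags(nodes) {
  // Kahn's algorithm: repeatedly emit nodes whose remaining in-degree is zero.
  const byId = new Map(nodes.map(n => [n.id, n]));
  const indeg = new Map(nodes.map(n => [n.id, 0]));
  for (const n of nodes) for (const t of n.targets) indeg.set(t, indeg.get(t) + 1);
  const queue = nodes.filter(n => indeg.get(n.id) === 0).map(n => n.id);
  const tag = new Map();
  let next = 0;
  while (queue.length > 0) {
    const id = queue.shift();
    tag.set(id, next++);
    for (const t of byId.get(id).targets) {
      indeg.set(t, indeg.get(t) - 1);
      if (indeg.get(t) === 0) queue.push(t);
    }
  }
  return tag; // node id -> position in topological order
}

function tick(nodes, tag) {
  const byId = new Map(nodes.map(n => [n.id, n]));
  // Visit nodes from max tag to min tag, so space frees up ahead of the flow.
  const order = [...nodes].sort((a, b) => tag.get(b.id) - tag.get(a.id));
  for (const a of order) {
    if (a.type === "OUT") { a.count = 0; continue; } // OUT removes packages
    // Push to targets in descending tag order. Note: this fills the
    // highest-tag target first; it does not split evenly between targets.
    const targets = [...a.targets].sort((x, y) => tag.get(y) - tag.get(x));
    for (const tid of targets) {
      const b = byId.get(tid);
      const moved = Math.min(a.count, b.max - b.count);
      a.count -= moved;
      b.count += moved;
      if (a.count === 0) break;
    }
  }
}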
Best of luck :)

Related

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to perform:
- For each delivery, the pickup needs to come before the dropoff.
- The vehicle is quite small and the packages vary in size. The total carriage cannot exceed some upper bound (e.g. 1 cubic metre).
- Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution: [images omitted]
I am currently using a greedy best-first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many solutions as I can in one second (the most computational time I can spare) and pick the one with the fewest late deliveries.
I have looked into linear programming but can't get my head around it - plus I would think it is inappropriate given that the planner needs to run very frequently. I've also tried algorithms that mutate the tour, but the issue is that mutating a tour nearly always makes it invalid due to the capacity and precedence constraints. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
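In code, that greedy improvement loop might look like this (a sketch; allMoves and score are assumed problem-specific helpers that enumerate the safe moves above and rate a solution, lower being better):

function hillClimb(solution) {
  for (;;) {
    const better = allMoves(solution).filter(c => score(c) < score(solution));
    if (better.length === 0) return solution; // local optimum: no improving move left
    // Greedily apply the best improving move and repeat.
    solution = better.reduce((a, b) => (score(a) <= score(b) ? a : b));
  }
}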
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
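As a one-line sketch (the field names are assumptions):

// Deadline violations dominate as long as total travel time stays below 1000.
function score(solution) {
  return 1000 * solution.numDeadlineViolations + solution.totalTravelTime;
}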
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
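A compact sketch of the whole scheme (for brevity the pool is a sorted array rather than a real min-heap, and duplicate handling is omitted; randomFeasible, allMoves and score are assumed helpers):

function search(k, iterations) {
  let pool = Array.from({ length: k }, randomFeasible); // initial random solutions
  const byScore = (a, b) => score(a) - score(b);
  pool.sort(byScore);
  let globalBest = pool[0];
  for (let i = 0; i < iterations && pool.length > 0; i++) {
    const best = pool.shift(); // extract the best solution in the pool
    if (score(best) < score(globalBest)) globalBest = best;
    // Keep only children that improve on their parent (see the cycling caveat above).
    const children = allMoves(best).filter(c => score(c) < score(best));
    pool.push(...children);
    pool.sort(byScore);
    if (pool.length > 2 * k) pool = pool.slice(0, k); // trim when the pool doubles
  }
  return globalBest;
}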
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here actually seems to be the Knapsack Problem rather than the Travelling Salesman Problem, since it carries the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman Problem.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.

Algorithm for container planning

OK guys I have a real world problem, and I need some algorithm to figure it out.
We have a bunch of orders waiting to be shipped, each order will have a volume (in cubic feet), let's say, V1, V2, V3, ..., Vn
The shipping carrier can provide us four types of containers, and the volume/price of the containers are listed below:
Container Type 1: 2700 CuFt / $2500;
Container Type 2: 2350 CuFt / $2200;
Container Type 3: 2050 CuFt / $2170;
Container Type 4: 1000 CuFt / $1700;
No single order will exceed 2700 CuFt, but orders are likely to exceed 1000 CuFt.
Now we need a program to find an optimized solution for the freight charges, i.e. the minimum total price.
I appreciate any suggestions/ideas.
EDIT:
My current implementation uses the biggest container first, applies the first-fit decreasing algorithm for bin packing to get a result, then parses through all containers and adjusts each container's size according to its content volume...
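A sketch of that first step - first-fit decreasing over volumes, using only the biggest container size (a simplification; names are illustrative):

function firstFitDecreasing(volumes, capacity = 2700) {
  const bins = []; // each bin is one container: { used } CuFt
  for (const v of [...volumes].sort((a, b) => b - a)) { // largest orders first
    const bin = bins.find(b => b.used + v <= capacity); // first container with room
    if (bin) bin.used += v;
    else bins.push({ used: v }); // no room anywhere: open a new container
  }
  return bins; // afterwards, downsize each container to the cheapest type that fits
}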
I wrote a similar program when I was working for a logistics company. This is a 3-dimensional bin-packing problem, which is a bit trickier than a classic 1-dimensional bin-packing problem - the person at my job who wrote the old box-packing program that I was replacing made the mistake of reducing everything to a 1-dimensional bin-packing problem (volumes of boxes and volumes of packages), but this doesn't work: this problem formulation states that three 8x8x8 packages would fit into a 12x12x12 box, but this would leave you with overlapping packages.
My solution was to use what's called a guillotine cut heuristic: when you put a package into the shipping container, this produces three new empty sub-containers. Assuming that you placed the package in the back bottom left of the container, you would have a new empty sub-container in the space in front of the package, a new empty sub-container in the space to the right of the package, and a new empty sub-container on top of the package. Be certain not to assign the same empty space to multiple sub-containers, e.g. if you're not careful you'll assign the section in the front-right of the container to both the front sub-container and the right sub-container; you need to pick just one to assign it to. This heuristic will rule out some optimal solutions, but it's fast.
As a concrete example, say you have a 12x12x12 box and you put an 8x8x8 package into it - this leaves you with a 4x12x12 empty sub-container, a 4x8x12 empty sub-container, and a 4x8x8 empty sub-container. Note that the wrong way to divide up the free space is to have three 4x12x12 empty sub-containers - that results in overlapping packages. If the box or package weren't cubes, you'd have more than one way to divide up the free space, and you'd need to decide whether to maximize the size of one or two sub-containers or to instead try to create three more or less equal sub-containers.
You need to use a reasonable criterion for ordering/selecting the sub-containers, or else the number of sub-containers will grow exponentially; I solved this by filling the smallest sub-containers first and removing any sub-container that was too small to contain a package, which kept the quantity of sub-containers to a reasonable number.
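A sketch of the split itself, matching the example above (dimensions as { w, h, d }, package placed at the back bottom left, favouring one large sub-container):

// Placing package p into container c yields three disjoint empty sub-containers.
function guillotineSplit(c, p) {
  return [
    { w: c.w - p.w, h: c.h, d: c.d }, // right of the package: full height and depth
    { w: p.w, h: c.h - p.h, d: c.d }, // above the package: full depth
    { w: p.w, h: p.h, d: c.d - p.d }, // in front of the package
  ].filter(s => s.w > 0 && s.h > 0 && s.d > 0); // drop degenerate spaces
}
// e.g. a 12x12x12 container and an 8x8x8 package yield sub-containers of
// 4x12x12, 8x4x12 and 8x8x4 - the same spaces as in the example, with no overlap.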
There are several choices to make: which containers to use, how to rotate the packages going into the container (there are usually six ways to rotate a package, but not all rotations are legal for some packages, e.g. a "this end up" package will only have two legal rotations), how to partition the sub-containers (e.g. do you assign the overlapping space to the right sub-container or to the front sub-container), and in what order you pack the container.
I used a randomized algorithm that approximated a best-fit decreasing heuristic (using volume for the heuristic) and that favored creating one large sub-container and two small sub-containers rather than three medium-sized ones, but I used a random number generator to mix things up: the greatest probability was that I'd select the largest package first, with a lesser probability of selecting the second-largest first, and so on, down to the lowest probability of selecting the smallest package first; likewise, there was a chance that I'd favor creating three medium-sized sub-containers instead of one large and two small, a chance that I'd use three medium-sized boxes instead of two large boxes, etc. I then ran this in parallel several dozen times and selected the result that cost the least.
There are other heuristics I considered, for example the extreme point heuristic is slower (while still running in polynomial time - IIRC it's a cubic time solution, whereas the guillotine cut heuristic is linear time, and at the other extreme the branch and bound algorithm finds the optimal solution and runs in exponential time) but more accurate (specifically, it finds some optimal solutions that are ruled out by the guillotine cut heuristic); however, my use case was that I was supposed to produce a fast shipping estimate, and so the extreme point heuristic wasn't appropriate (it was too slow and it was "too accurate" - I would have needed to add 10% or 20% to its results to account for the fact that the people actually packing the boxes would inevitably make sub-optimal choices).
I don't know the name of a program offhand, but there's probably some commercial software that would solve this for you, depending on how much a good solution is worth to you.
Zim Zam's answer is good for big boxes, but assuming relatively small boxes you can use a much simpler algorithm that amounts to solving an integer linear program with a constraint:
Where a, b, c and d are integers giving the number of containers of each type used:
Given
2700a + 2350b + 2050c + 1000d >= V (where V is the total volume of the orders; the containers must have room for everything)
you want to find a, b, c and d such that the following function is minimized:
Total Cost C = 2500a + 2200b + 2170c + 1700d
It would seem you can brute-force this problem (bin packing in general is NP-hard, but this instance is small). Calculate every viable combination of a, b, c and d, and calculate the total cost for each combination. Note that no solution will ever use more than one container of type 4: two type-4 containers cost $3400 but hold less than a single $2500 type-1 container, so that cuts down the number of possible combinations.
I am assuming orders can be split between containers.
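A brute-force sketch under that assumption (V is the total order volume in CuFt):

function cheapestContainers(V) {
  let best = { cost: Infinity, counts: null };
  for (let a = 0; a <= Math.ceil(V / 2700); a++)
    for (let b = 0; b <= Math.ceil(V / 2350); b++)
      for (let c = 0; c <= Math.ceil(V / 2050); c++)
        for (let d = 0; d <= 1; d++) { // never more than one type-4 container
          if (2700 * a + 2350 * b + 2050 * c + 1000 * d < V) continue; // must cover V
          const cost = 2500 * a + 2200 * b + 2170 * c + 1700 * d;
          if (cost < best.cost) best = { cost, counts: { a, b, c, d } };
        }
  return best;
}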

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to get an algorithm for this. I don't think it can be done with a greedy algorithm, more thinking along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set si iff pi is an element of si. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source, add an edge linking it to all people with capacity 1 (representing that each person may only be used once), then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once) and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to indicate that each person may be used more than once in this case.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
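A sketch of both pieces (maxFlow is an assumed black-box max-flow routine, e.g. Edmonds-Karp, taking a list of [from, to, capacity] edges; people are numbered 1..P):

// Decision version: can each set pick one member with every person used at most k times?
function feasible(sets, P, k, maxFlow) {
  const S = sets.length, source = 0, sink = P + S + 1;
  const edges = [];
  for (let p = 1; p <= P; p++) edges.push([source, p, k]); // person capacity k
  sets.forEach((set, i) => {
    const setNode = P + 1 + i;
    for (const p of set) edges.push([p, setNode, 1]); // membership edges
    edges.push([setNode, sink, 1]); // each set selects exactly one member
  });
  return maxFlow(edges, source, sink) === S;
}

// Binary search for the smallest feasible k (valid because feasibility is monotonic in k).
function smallestK(sets, P, maxFlow) {
  let lo = 1, hi = sets.length; // k = S always works if every set is non-empty
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (feasible(sets, P, mid, maxFlow)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}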
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.

Solution to 2010 ACM Problem: Castles

Post your best solutions! You can find the full problem description and examples here: ACM 2010 problems (pdf)
You have a set of castles connected by roads, and you want to conquer all the castles with the minimum number of soldiers. Each castle has three properties: the minimum number of soldiers required to take it, the number of soldiers that will die taking it, and the number of soldiers that must be left behind to hold it.
There is exactly one path between any two castles (the roads form a tree). You can pick any castle as the first target, but you must follow the roads afterward. You can only travel a road twice. Your mobile army must stay in one group.
I would solve it this way:
Brute-force all starting castles (100 max).
For each starting castle:
Fill up two arrays, need[i] and cost[i], meaning that when you go from the chosen starting point to i and try to conquer the subtree rooted at i, you need at least need[i] soldiers and cost[i] soldiers will die.
min_soldiers_to_attack_castle[i] comes from the input file.
The need[] and cost[] values are immediate for "terminal" (leaf) castles.
Then, for each castle whose children all have known need[] and cost[] values, you calculate need and cost for that castle this way:
cost[i] = sum(cost[children])
Getting need[i] is the tricky part: we know it's somewhere between max(min_soldiers_to_attack_castle[all children]) and max(min_soldiers_to_attack_castle[all children]) + max(cost[all children]). Trying all orderings of the children would cost (number_of_children)! and potentially be n!, so optimizations would probably help here. This is where I stopped for now.
I would solve this in reverse - you want to have as few men "wasted" after taking the last castle as possible. Since we can't pass through a castle without taking it, we will obviously end at a "leaf" castle.
It is straightforward to walk backwards from all leaf castles to determine the total number of men "wasted" on each subtree - then it's simply a matter of walking the subtrees in the right order.
Elementary, my dear Watson.
The first thing to realize is that, as far as the numbers go, there is no difference between soldiers lost and soldiers left behind. So we can reduce the castle properties to soldiers lost and required.
The second thing to realize is that if you go down a branch of the tree, you must complete the whole branch for returning. This allows us to reduce the entire branch to a single "mega castle" with aggregate soldiers required and lost.
So, assuming we can compute the costs of branches, we're left with two problems: where to start, and how to choose which branch to descend first. I'm just going to brute-force the start position, but it might be possible to do better. Choosing which branch to descend is a bit harder. The number of soldiers lost is trivial, but the number required is not. There are n! possibilities, so we can't just try them all.
Instead of thinking about how many soldiers are lost/required at each castle, I'm going to go backwards. Start with 0 soldiers, and add them when you attack a castle, ensuring we end up with at least the required amount. There are two cases: either there is a castle which we meet the requirement for, or there is not. If there is, (un)do that castle (this is optimal, because we used the minimum number of soldiers). If there isn't, add an additional soldier and try again (this is optimal, because we must add a soldier to continue). Now it should become obvious: we want to (un)do castle with requirements closest to the number lost first. Just sort by (required minus lost) and that's your order.
So the final algorithm looks like this:
Brute force the starting point
Recursively reduce branches into aggregate castles (memoize this result, for the other starting points)
Visit branches in descending (required minus lost) order.
The running time is O(n * c^2 * lg(c)), where n is the number of castles and c is the maximum connectivity of any single castle. This works out because there are at most n*c "branches", and a node takes at most c*lg(c) time to evaluate after its branches have been evaluated. (Branches and nodes are computed at most once thanks to memoization.)
I think it's possible to do better, but I'm not sure how.
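A sketch of the aggregation step (castles as { required, lost, children }, where lost counts both soldiers who die and soldiers left behind, per the first realization above):

// Reduce a subtree to a single aggregate castle, visiting branches in
// descending (required - lost) order as argued above.
function aggregate(castle) {
  const branches = castle.children.map(aggregate);
  branches.sort((a, b) => (b.required - b.lost) - (a.required - a.lost));
  let required = castle.required; // soldiers needed on arrival at this castle
  let lost = castle.lost;         // soldiers gone after finishing the subtree
  for (const br of branches) {
    required = Math.max(required, lost + br.required); // enough must remain for this branch
    lost += br.lost;
  }
  return { required, lost, children: [] };
}
// Brute-forcing the starting point means re-rooting the tree at each castle
// and taking the minimum aggregate required (memoization omitted here).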

Looking for a multidimensional optimization algorithm

Problem description
There are different categories, each containing an arbitrary number of elements.
There are three attributes A, B and C. Each element has a different distribution of these attributes, expressed as positive integer values. For example, element 1 has the attributes A: 42, B: 1337, C: 18. The sum of these attributes is not consistent across elements; some elements have more than others.
Now the problem:
We want to choose exactly one element from each category so that
We hit a certain threshold on attributes A and B (going over it is also possible, but not necessary)
while getting a maximum amount of C.
Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.
I've thought about this problem and cannot imagine an efficient solution. The sample sizes are about 15 categories from which each contains up to ~30 elements, so bruteforcing doesn't seem to be very effective since there are potentially 30^15 possibilities.
My model is to think of it as a tree whose depth is the number of categories. Each depth level represents a category and offers the choice of one element from that category. When passing over a node, we add the attributes of the represented element to the running sum we want to optimize.
If we hit the same attribute combination multiple times at the same level, we merge the paths, so we can strip away repeated computation of already-computed values. If we reach a level where one path is worse in all three attributes than another, we stop following it.
However, in the worst case this tree still has ~30^15 nodes in it.
Can anybody think of an algorithm that might help me solve this problem? Or could you explain why you think no such algorithm exists?
This question is very similar to a variation of the knapsack problem. I would start by looking at solutions for this problem and see how well you can apply it to your stated problem.
My first inclination is to try branch and bound. You can do it breadth-first or depth-first; I prefer depth-first because I think it's cleaner.
To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it just has a 5-level nested loop). It is augmented with two things:
At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)
The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.
Then you have an outer loop:
for (budget = 0; budget < ... ; budget++) {
    walk(budget);
    // if walk finds a solution within the budget, halt
}
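A sketch of what walk might look like (categories is an array of arrays of choices; costOf is the assumed quantity being minimized, which only grows along a path):

// Depth-first branch and bound: abandon any branch whose cost already exceeds the budget.
function walk(categories, budget, level = 0, cost = 0, chosen = []) {
  if (cost > budget) return null;                 // prune: cost can only increase
  if (level === categories.length) return chosen; // complete assignment within budget
  for (const el of categories[level]) {
    const found = walk(categories, budget, level + 1, cost + costOf(el), [...chosen, el]);
    if (found) return found;
  }
  return null;
}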
The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.
Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.
IF that doesn't work, you can fall back on basic heuristic programming. That is, do some cases by hand, and pay attention to how you did it. Then program it the same way.
I hope that helps.
