Algorithm for container planning

OK guys, I have a real-world problem, and I need some algorithm to figure it out.
We have a bunch of orders waiting to be shipped, each order will have a volume (in cubic feet), let's say, V1, V2, V3, ..., Vn
The shipping carrier can provide us four types of containers, and the volume/price of the containers are listed below:
Container Type 1: 2700 CuFt / $2500;
Container Type 2: 2350 CuFt / $2200;
Container Type 3: 2050 CuFt / $2170;
Container Type 4: 1000 CuFt / $1700;
No single order will exceed 2700 CuFt, but many will exceed 1000 CuFt.
Now we need a program that finds a cost-optimized solution on freight charges, that is, the minimum total price.
I appreciate any suggestions/ideas.
EDIT:
My current implementation uses the biggest container type first, applies the first-fit decreasing bin-packing algorithm to get a result, then goes through all the containers and downsizes each one according to its content volume...

I wrote a similar program when I was working for a logistics company. This is a 3-dimensional bin-packing problem, which is a bit trickier than the classic 1-dimensional bin-packing problem. The person at my job who wrote the old box-packing program that I was replacing made the mistake of reducing everything to a 1-dimensional bin-packing problem (volumes of boxes and volumes of packages), but this doesn't work: that formulation says that three 8x8x8 packages would fit into a 12x12x12 box, when in fact packing them would leave you with overlapping packages.
My solution was to use what's called a guillotine cut heuristic: when you put a package into the shipping container, this produces three new empty sub-containers. Assuming you placed the package in the back bottom left of the container, you'd have a new empty sub-container in the space in front of the package, one in the space to the right of the package, and one on top of the package. Be certain not to assign the same empty space to multiple sub-containers, e.g. if you're not careful you'll assign the section in the front-right of the container to both the front sub-container and the right sub-container; you need to pick just one to assign it to. This heuristic will rule out some optimal solutions, but it's fast.
As a concrete example, say you have a 12x12x12 box and you put an 8x8x8 package into it - this leaves you with a 4x12x12 empty sub-container, a 4x8x12 empty sub-container, and a 4x8x8 empty sub-container. Note that the wrong way to divide up the free space is to create three 4x12x12 empty sub-containers - that results in overlapping packages. If the box or package weren't cubes, then you'd have more than one way to divide up the free space - you'd need to decide whether to maximize the size of one or two sub-containers or to instead try to create three more or less equal sub-containers.
You need to use a reasonable criterion for ordering/selecting the sub-containers, or else their number will grow exponentially; I solved this by filling the smallest sub-containers first and removing any sub-container that was too small to contain a package, which kept the quantity of sub-containers to a reasonable number.
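As a concrete sketch of that split (Python, with my own conventions: dimensions are (width, height, depth) tuples and the package sits in the back-bottom-left corner; this variant keeps the right-hand slab at full cross-section, i.e. one large sub-container and two smaller ones):

```python
def guillotine_split(container, package):
    """Split the free space left after placing `package` into the
    back-bottom-left corner of `container`.  Both arguments are
    (width, height, depth) tuples."""
    cw, ch, cd = container
    pw, ph, pd = package
    assert pw <= cw and ph <= ch and pd <= cd, "package must fit"
    subs = [
        (cw - pw, ch, cd),  # right of the package: full height and depth
        (pw, ch - ph, cd),  # above the package: package width, full depth
        (pw, ph, cd - pd),  # in front of the package
    ]
    # The three slabs are disjoint and, together with the package,
    # exactly tile the container.
    return [s for s in subs if all(side > 0 for side in s)]
```

For a 12x12x12 box and an 8x8x8 package this yields the 4x12x12, 4x8x12 and 4x8x8 sub-containers from the example above.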
You have several choices: which containers to use, how to rotate the packages going into the container (there are usually six ways to rotate a package, but not all rotations are legal for some packages, e.g. a "this end up" package will only have two legal rotations), how to partition the sub-containers (e.g. do you assign the overlapping space to the right sub-container or to the front sub-container), and the order in which you pack the container.
I used a randomized algorithm that approximated a best-fit decreasing heuristic (using volume for the heuristic) and that favored creating one large sub-container and two small sub-containers rather than three medium-sized ones, but with a random number generator to mix things up: the greatest probability was that I'd select the largest package first, a lesser probability that I'd select the second-largest package first, and so on, with the lowest probability being that I'd select the smallest package first. Likewise, there was a chance that I'd favor creating three medium-sized sub-containers instead of one large and two small, a chance that I'd use three medium-sized boxes instead of two large boxes, etc. I then ran this in parallel several dozen times and selected the result that cost the least.
There are other heuristics I considered. For example, the extreme point heuristic is slower but more accurate - specifically, it finds some optimal solutions that are ruled out by the guillotine cut heuristic. It still runs in polynomial time (IIRC it's cubic time, whereas the guillotine cut heuristic is linear time; at the other extreme, the branch and bound algorithm finds the optimal solution but runs in exponential time). However, my use case was producing a fast shipping estimate, so the extreme point heuristic wasn't appropriate: it was too slow, and it was "too accurate" - I would have needed to add 10% or 20% to its results to account for the fact that the people actually packing the boxes would inevitably make sub-optimal choices.
I don't know the name of a program offhand, but there's probably some commercial software that would solve this for you, depending on how much a good solution is worth to you.

Zim Zam's answer is good for big boxes, but assuming relatively small boxes you can use a much simpler algorithm that amounts to solving an integer linear program with a constraint:
Where a, b, c and d are integers being the number of each type of container used:
Given,
2700a + 2350b + 2050c + 1000d >= V (where V is the total volume of the orders)
You want to find a, b, c, and d such that the following function is minimized:
Total Cost C = 2500a + 2200b + 2170c + 1700d
You can brute force this problem (integer programming is NP-hard in general, but with only four variables the search space here is tiny): calculate every viable combination of a, b, c and d, and the total cost for each combination. Note that no solution will ever use more than one container of type 4, since two type-4 containers ($3400 for 2000 CuFt) are dominated by a single type-3 container ($2170 for 2050 CuFt); that cuts down the number of possible combinations.
I am assuming orders can be split between containers.
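A sketch of that brute force (Python; capacities and prices taken from the question, and orders assumed splittable so only total capacity matters):

```python
from math import ceil

def cheapest_containers(total_volume):
    """Enumerate every combination (a, b, c, d) of the four container types
    whose combined capacity covers `total_volume`, and return the cheapest
    as (cost, (a, b, c, d))."""
    types = [(2700, 2500), (2350, 2200), (2050, 2170), (1000, 1700)]
    # Never need more of one type than it takes to hold everything alone;
    # type 4 is capped at one, since two type-4s ($3400 / 2000 CuFt) are
    # dominated by a single type-3 ($2170 / 2050 CuFt).
    limits = [ceil(total_volume / cap) for cap, _ in types[:3]] + [1]
    best = (float("inf"), None)
    for a in range(limits[0] + 1):
        for b in range(limits[1] + 1):
            for c in range(limits[2] + 1):
                for d in range(limits[3] + 1):
                    combo = (a, b, c, d)
                    capacity = sum(k * t[0] for k, t in zip(combo, types))
                    if capacity >= total_volume:
                        cost = sum(k * t[1] for k, t in zip(combo, types))
                        best = min(best, (cost, combo))
    return best
```

If orders cannot be split, the combination found here is only a lower bound and you still need a bin-packing step to check feasibility.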

Related

Is there an established algorithm for reaching a goal through the sequential application of known cause and effect tuples?

Let's say that I want to create a fictional product called g.
I know that:
a+b=c
x+y=z
and finally that
c+z=g
So clearly if I start off with products
a,b,x,y
I can create g in three steps:
a+b=c
x+y=z
c+z=g
So a naive algorithm for reaching a goal could be:
For each component required to make the goal (here c and z), recursively find a cause and effect tuple that can create that component.
But there are snags with that algorithm.
For example, let's say that my cause and effect tuples are:
a+b=c
x+y+c=z (NOTE THE EXTRA 'c' REQUIRED!!)
c+z=g
Now when I run my naive algorithm I will do
a+b=c
x+y+c=z (Using up the 'c' I created in the previous step)
c+z=g (Uh oh! I can't do this because I don't have the 'c' any more)
It seems like quite a basic area of research - how we can combine known causes and effects to reach a goal - so I suspect that work must have been done on it, but I've looked around, couldn't find anything, and don't really know where to look now.
Many thanks for any assistance!
Assuming that using a product consumes one item of it, which can then be replaced by producing a second item of that product, I would model this by giving each product a cost and working out how to minimize the cost of the final product. In this case I think this is the same as minimizing the costs of every product, because minimizing the cost of an input never increases the cost of any output. You end up with loads of equations like
a=min(b+c, d+e, f+g)
where a is the cost of a product that can be produced in alternative ways, one way consuming units with cost of b and c, another way consuming units with costs of d and e, another way consuming units with costs of f and g, and so on. There may be cycles in the associated graph.
One way to solve such a problem would be to start by assigning the cost infinity to all products not originally provided as inputs (with costs) and then repeatedly reducing costs where equations show a way of calculating a cost less than the current cost, keeping track of re-calculations caused by inputs not yet considered or reductions in costs. At each stage I would consider the consequences of the smallest input or recalculated value available, with ties broken by a second component which amounts to a tax on production. The outputs produced from a calculation are always at least as large as any input, so newly produced values are always larger than the recalculated value considered, and the recalculated value considered at each stage never decreases, which should reduce repeated recalculation.
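A sketch of this first approach in Python (the function and recipe representation are my own; it's essentially Dijkstra's algorithm generalized so that producing an output costs the sum of its ingredients' costs - and since that sum is at least as large as any single ingredient's cost, a product's cost is final when it is popped from the heap, even if the recipe graph has cycles):

```python
import heapq

def product_costs(inputs, recipes):
    """Minimum cost of every producible product.  `inputs` maps a product
    to its given cost; `recipes` is a list of (ingredients, output) pairs.
    Each use of an ingredient adds its full cost, which models consuming
    one item and producing a replacement, as described above."""
    cost = dict(inputs)
    heap = [(c, p) for p, c in inputs.items()]
    heapq.heapify(heap)
    done = set()
    while heap:
        c, p = heapq.heappop(heap)
        if p in done:
            continue
        done.add(p)
        for ingredients, out in recipes:
            # Re-examine recipes that consume p once all ingredients are final
            if p in ingredients and all(i in done for i in ingredients):
                new_cost = sum(cost[i] for i in ingredients)
                if new_cost < cost.get(out, float("inf")):
                    cost[out] = new_cost
                    heapq.heappush(heap, (new_cost, out))
    return cost
```

Note that this handles the `x+y+c=z` snag from the question: the `c` consumed by that recipe is simply paid for again, i.e. a second `c` is produced.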
Another way would be to turn this into a linear program and throw it at a highly optimized guaranteed polynomial time (at least in practice) linear programming solver.
a = min(b+c, d+e, f+g)
becomes
a = b+c-x
a = d+e-y
a = f+g-z
x >= 0
y >= 0
z >= 0
minimize sum(x+y+z+....)

Dealing with huge amounts of data in continuum percolation

For a group project I'm currently part of, we have to simulate the following:
Take a square with side length n. Distribute some amount of unit disks uniformly over the square. Find the number of disks required until there is a connected component of disks stretching from the left side to the right side of the square. Then find the number of disks required until the square is entirely filled with disks.
It's not explicitly stated, but we assume this is to be done in Matlab, as we use it in other parts of this course. For the first problem, finding a path from the left to the right, I've written two methods that work. One uses an adjacency list and the graph tools in Matlab for finding connected nodes. This method is fast enough but takes up far too much memory for what we need to do. The other method uses a recursive search algorithm without storing adjacency information, but is far too slow.
The problem arises when we need the square to be of sizes n=1000 and n=10 000. We predict this will require some tens of millions of circles, or more, and we simply do not see how we're supposed to handle this as any adjacency list or matrix would be ridiculously large, and not using one seems to require a huge amount of time. Any thoughts and ideas are appreciated, thanks
Let's reformulate your problems as decision problems like this:
Starting with seed s for the PRNG, randomly distribute m unit disks in an n x n square, and determine whether or not they make a connected path from the left side to the right.
and:
Starting with seed s for the PRNG, randomly distribute m unit disks in an n x n square, and determine whether or not they cover the entire square.
If you can solve these decision problems, then you can find the minimum value for m by doing a binary search over the possible values.
You can solve these decision problems without using a whole lot of RAM by generating your disks like so:
Initialize your PRNG with the given seed s
For each disk 1 ... m, randomly select the column it will go in (that's floor(center.x)). Accumulate counts for all the columns to determine how many disks will go in each one. Let COUNT(col) be the number of disks to generate in column col
For each col in 0 ... n-1, generate the COUNT(col) disks uniformly distributed within that column.
Now you are generating your disks predominantly from left to right. Both of the above decision problems have solutions that work from left to right, and only need to see the disks in one or two columns at a time, so you only need to remember one or two columns worth of disks.
For the full coverage problem, work from left to right and determine whether each column in the square is completely covered, using disks from that column and the adjacent columns. No other disks could possibly reach it.
For the left-right path problem, work from left to right using union-find to keep track of which disks in the current column are connected to the left side and each other. When you get to the last column you can check to see if any disks in the last column are connected to the left side and extend past the right.
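A minimal sketch of the left-right decision problem (my own code, assuming "unit disk" means diameter 1, so two disks overlap iff their centers are within distance 1 and only same-column or adjacent-column disks can touch; for radius-1 disks you would keep the two previous columns instead of one):

```python
import random
from math import hypot

class DSU:
    """Union-find with path halving."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        self.p[self.find(a)] = self.find(b)

def crosses(n, m, seed):
    """With PRNG seed `seed`, do m disks of diameter 1, centers uniform in
    an n x n square, contain a component touching both the left and right
    sides?  Disks are drawn column by column, so only the current and
    previous columns of centers are held in memory (plus O(m) DSU ints)."""
    rng = random.Random(seed)
    counts = [0] * n                  # first pass: column of each disk
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    LEFT, RIGHT = m, m + 1            # two virtual nodes for the sides
    dsu = DSU(m + 2)
    next_id = 0
    prev = []                         # (x, y, id) of previous column's disks
    for col in range(n):
        cur = []
        for _ in range(counts[col]):
            x = col + rng.random()    # uniform within this column
            y = rng.random() * n
            i, next_id = next_id, next_id + 1
            cur.append((x, y, i))
            if x <= 0.5:              # disk reaches the left edge
                dsu.union(i, LEFT)
            if x >= n - 0.5:          # disk reaches the right edge
                dsu.union(i, RIGHT)
        for x1, y1, i1 in cur:
            for x2, y2, i2 in cur:
                if i2 < i1 and hypot(x1 - x2, y1 - y2) <= 1.0:
                    dsu.union(i1, i2)
            for x2, y2, i2 in prev:
                if hypot(x1 - x2, y1 - y2) <= 1.0:
                    dsu.union(i1, i2)
        prev = cur
    return dsu.find(LEFT) == dsu.find(RIGHT)
```

The minimum m is then found by binary search over m, re-running with the same seed each time; the same column-streaming layout works for the coverage decision problem.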

Complex conveyor capacity calculation (graph) - fun game algorithm

I am working on a game in Javascript where people can create conveyor lines. Imagine postal service, packages come in, travel on conveyors, get processed and end in trucks. These conveyors can be of any complexity, can contain loops etc.
[Image: example of a rather complex (and inefficient) conveyor]
[Image: example graph of a conveyor]
The number in each circle shows the current situation - the number of packages on that conveyor "part".
Rules:
IN components generate postal packages that are put to belt component.
Belt component pushes postal packages to OUT components. OUT removes packages from the graph.
Each component has MAX capacity it can hold at any given moment.
There can be 0..N IN components. (if 0, there can be packages already on the conveyor)
There can be 0..N OUT components. (if 0, meaning whole conveyor will get full)
Calculations are tick based, meaning that "group of packages" can travel only one step at a time.
Each merge distributes packages evenly. (So if there are two outgoing lines, each gets packages/2.)
Each join accepts packages evenly. (So if there are two input lines, each can contribute at most max/2 packages.)
Packages DO NOT have identity, they are just total numbers.
Packages can be split, so 1 package can become 0.5 and 0.5 packages. (so float as number type)
Problem:
How to solve such a graph fast in two steps:
Generate necessary data structure/graph (cache) (Performance: <200ms for 1000 elements). In reality only needs to be recalculated if setup changes before next package travel calculation.
Calculate package travel. (Performance: <20ms for 1000 elements)
Some problematic solutions
OPTION A: Solve per component (loop over each component and solve based on the current situation). Doesn't work: loops, joins and merges cause situations where packages aren't moved evenly as required. It also doesn't honour the rule that a "package" can travel only from one component to another (based on loop order, a package may get to the end immediately). Additionally, it may always prefer one input while the other is permanently blocked.
OPTION B: Calculate what every component wants to do and find conflicts. (For example, A and B each want to push 10 packages to C, but C can accept only 12 in total. Fix the error so A and B can each move only 6, meaning each will hold at least 4 for the next tick. This may cause more errors, since A and B did not clear their contents and can now accept less.) Then find more errors and fix them; repeat until there are no errors or "max repeats reached", and then just quick-fix the remaining errors - at which point it no longer behaves as the rules require.
The problem with this is that I think it will get stuck for certain setups and may break.
OPTION C: Start from OUT elements and try to move packages out. Problems arise if loops are introduced or areas without OUT elements.
Verdict
I am stuck and don't even know what to google for, so all ideas are welcome :)
I figured it is a graph problem, but maybe there are other approaches.
As long as you don't need the optimal solution to the problem, I think you could:
1. At the beginning, compute a topological sort order of the graph, and "tag" each node with its position in that order, so you have a criterion as to which node is better than the others. The final node has the maximum "tag" (check the link to learn more about topological sort, it's not hard).
2. Mark all nodes as not visited.
3. Get the *node A* with max(tag) that is still not visited:
3.a. Get each *node B* that is a target of *node A* (i.e. at the end of the arrow), starting from the one with maximum tag and finishing with the one with minimum tag:
3.a.a. Push all packages from *node A* to *node B* until *node B* is filled or *node A* is empty.
You can find the topological sort definition and a bunch of algorithms in https://en.wikipedia.org/wiki/Topological_sorting
Best of luck :)
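The steps above can be sketched as follows (my own code; it assumes the conveyor graph is acyclic, since a topological order only exists for DAGs, and it follows this answer's greedy push order rather than the even-split rules in the question):

```python
def tick(order, edges, load, capacity):
    """One tick of the scheme above.  `order` is a topological order of the
    nodes, `edges` maps node -> list of target nodes, and `load`/`capacity`
    map node -> package counts.  Nodes are visited from the largest
    topological tag down, and each pushes to its higher-tag targets first,
    so a package moves at most one step per tick."""
    tag = {n: i for i, n in enumerate(order)}
    for a in sorted(order, key=tag.get, reverse=True):
        for b in sorted(edges.get(a, []), key=tag.get, reverse=True):
            moved = min(load[a], capacity[b] - load[b])
            load[a] -= moved
            load[b] += moved
    return load
```

OUT components can be modeled as nodes with very large capacity that are emptied after each tick; IN components inject their packages before the tick.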

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to perform:
For each delivery, the pickup needs to come before the dropoff.
The vehicle is quite small and the packages vary in size.
The total carriage cannot exceed some upper bound (e.g. 1 cubic metre).
Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. [Illustrations of an example problem and a non-optimal, but valid solution]
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
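A sketch of the pool bookkeeping described above (my own naming; the `score` and `moves` callables stand in for the scoring function and safe moves discussed earlier, and solutions must be comparable so tied scores can be ordered in the heap):

```python
import heapq

def pool_search(initial, score, moves, steps=10000, k=10000):
    """Keep a pool of candidate solutions in a min-heap keyed by score
    (lower is better).  Repeatedly expand the best solution with all safe
    moves, keep only children that improve on their parent, and prune the
    pool back to the best k entries whenever it doubles in size."""
    pool = [(score(s), s) for s in initial]
    heapq.heapify(pool)
    best = min(pool) if pool else None
    for _ in range(steps):
        if not pool:
            break
        sc, sol = heapq.heappop(pool)
        # Duplicates have identical scores and sort together at the top,
        # so they can be discarded cheaply right after extraction.
        while pool and pool[0] == (sc, sol):
            heapq.heappop(pool)
        if (sc, sol) < best:
            best = (sc, sol)
        for child in moves(sol):
            child_score = score(child)
            if child_score < sc:          # strictly improving children only
                heapq.heappush(pool, (child_score, child))
        if len(pool) > 2 * k:             # prune on doubling, as described
            pool = heapq.nsmallest(k, pool)
            heapq.heapify(pool)
    return best
```

The strict `child_score < sc` test is what rules out the parent/child cycles mentioned in the note above.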
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to actually be the Knapsack Problem, rather than the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.

Optimal placement of objects wrt pairwise similarity weights

OK, this is an abstract algorithmic challenge, and it will remain abstract since where I am going to use it is top secret.
Suppose we have a set of objects O = {o_1, ..., o_N} and a symmetric similarity matrix S where s_ij is the pairwise correlation of objects o_i and o_j.
Assume also that we have a one-dimensional space with discrete positions where objects may be put (like having N boxes in a row, or chairs for people).
Having a certain placement, we may measure the cost of moving from the position of one object to that of another object as the number of boxes we need to pass by until we reach our target multiplied with their pairwise object similarity. Moving from a position to the box right after or before that position has zero cost.
Imagine an example where for three objects we have the following similarity matrix:
    [1.0 0.5 0.8]
S = [0.5 1.0 0.1]
    [0.8 0.1 1.0]
Then, the best ordering of objects in the three boxes is obviously:
[o_3] [o_1] [o_2]
The cost of this ordering is the sum of costs (counting boxes) for moving from one object to all the others. So here the only cost is for the distance between o_2 and o_3, equal to 1 box * 0.1 sim = 0.1; the reverse ordering has the same cost:
[o_2] [o_1] [o_3]
On the other hand:
[o_1] [o_2] [o_3]
would have cost = cost(o_1-->o_3) = 1box * 0.8sim = 0.8.
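For reference, the cost function described above can be written directly (my own helper; `order` lists object indices from left to right):

```python
def placement_cost(order, S):
    """Total cost of a placement: for every pair of objects, the number of
    boxes passed between them (one less than the distance in positions)
    times their pairwise similarity.  `S` is the symmetric similarity
    matrix; adjacent objects contribute zero."""
    pos = {obj: p for p, obj in enumerate(order)}
    n = len(order)
    return sum((abs(pos[i] - pos[j]) - 1) * S[i][j]
               for i in range(n) for j in range(i + 1, n))
```

With the matrix above (0-based indices), `placement_cost([2, 0, 1], S)` gives 0.1 and `placement_cost([0, 1, 2], S)` gives 0.8, matching the examples.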
The target is to determine a placement of the N objects in the available positions in a way that we minimize the above mentioned overall cost for all possible pairs of objects!
An analogue is to imagine that we have a table with chairs side by side in one row only (like the boxes), and you need to seat N people on the chairs. Now those people have some relations, say how probable it is that one of them wants to speak to another. To do so, one has to stand up, pass by a number of chairs, and speak to the person there. When two people sit on successive chairs, they don't need to move in order to talk to each other.
So how can we seat those people so that the distance-cost between every two of them is minimized? This means that during the night the overall distance walked by the guests is close to minimum.
Greedy search is... ok forget it!
I am interested in hearing if there is a standard formulation of such problem for which I could find some literature, and also different searching approaches (e.g. dynamic programming, tabu search, simulated annealing etc from combinatorial optimization field).
Looking forward to hearing your ideas.
PS. My question has something in common with this thread Algorithm for ordering a list of Objects, but I think here it is better posed as problem and probably slightly different.
That sounds like an instance of the Quadratic Assignment Problem. The special feature is that the locations are placed on one line only, but I don't think this makes it easier to solve. The QAP in general is NP-hard, so unless I misinterpreted your problem, you can't find an algorithm that solves it optimally in polynomial time without proving P=NP at the same time.
If the instances are small you can use exact methods such as branch and bound. You can also use tabu search or other metaheuristics if the problem is more difficult. We have an implementation of the QAP and some metaheuristics in HeuristicLab. You can configure the problem in the GUI, just paste the similarity and the distance matrix into the appropriate parameters. Try starting with the robust Taboo Search. It's an older, but still quite well working algorithm. Taillard also has the C code for it on his website if you want to implement it for yourself. Our implementation is based on that code.
There have been a lot of publications on the QAP. More modern algorithms combine genetic search abilities with local search heuristics (e.g. Genetic Local Search from Stützle, IIRC).
Here's a variation of the already posted method. I don't think this one is optimal, but it may be a start.
Create a list of all the pairs in descending cost order.
While the list is not empty:
    Pop the head item from the list.
    If neither element is in an existing group, create a new group containing the pair.
    If one element is in an existing group, add the other element to whichever end puts it closer to the group member.
    If both elements are in existing groups, combine the groups so as to minimize the distance between the pair.
Combining groups may require reversing the order within a group, and the data structure should be designed to support that.
Let me help the thread (my own) with a simplistic ordering approach.
1. Order the upper half of the similarity matrix.
2. Start with the pair of objects having the highest similarity weight and place them in the two center positions.
3. The next object may be put on the left or the right side of these. Each time, select the object that, when put to the left or right, has the highest cost to the pre-placed objects. Repeat Step 3 until all objects are placed.
The selection in Step 3 is made because if you skipped this object and placed it later, its cost would again be the greatest among the remaining objects, and even larger (it would be farther from the pre-placed objects). So the costly placements should be made as early as possible.
This is too simple and of course does not discover a good solution.
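For what it's worth, the simplistic approach above can be sketched like this (my own code; it re-evaluates the full cost when choosing an end, which is wasteful but keeps the sketch short):

```python
def center_out_placement(S):
    """Seed the row with the most similar pair, then repeatedly take the
    unplaced object with the highest total similarity to the placed ones
    and put it on whichever end yields the lower total cost."""
    n = len(S)

    def cost(order):
        # boxes passed between each pair, weighted by their similarity
        pos = {obj: p for p, obj in enumerate(order)}
        return sum((abs(pos[a] - pos[b]) - 1) * S[a][b]
                   for a in pos for b in pos if a < b)

    first = max(((a, b) for a in range(n) for b in range(a + 1, n)),
                key=lambda p: S[p[0]][p[1]])
    order = list(first)
    while len(order) < n:
        o = max((o for o in range(n) if o not in order),
                key=lambda o: sum(S[o][p] for p in order))
        order = min([o] + order, order + [o], key=cost)
    return order
```

On the 3-object example matrix this recovers one of the two cost-0.1 orderings.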
Another approach is to
1. start with a complete ordering generated somehow (random or from another algorithm)
2. try to improve it using "swaps" of object pairs.
I believe local minima would be a huge deterrent.
