Petri net boundedness - petri-net

I want to ask about Petri net (PN) boundedness. When I have a state s1 = (2 0 0) and then find a state s2 = (2 0 1), since s1 < s2, can I declare the PN as NOT bounded? Because when I have this PN:
the PN is bounded, but you can find (2 0 0) < (2 0 1) there.
So my question is: am I wrong about the boundedness of Petri nets, or is something wrong with the PN in the picture?

This net is bounded. You can check this by building its reachability graph.
Initially only one transition, t1, is enabled. Therefore s1 = (2 0 0) has only one successor state, s2 = (1 1 0). Here is the full reachability graph.
Also observe that no transition has more outgoing arcs than incoming arcs, so the total number of tokens in the net cannot increase. This is called an invariant. From this observation you can derive that the state (2 0 1) can never be reached from (2 0 0). That is the key point: finding two markings with s1 < s2 only proves unboundedness when s2 is reachable from s1; the mere existence of two comparable reachable markings is not enough.
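A minimal sketch of the reachability-graph construction (my own code, not from the answer; the transition encoding and the 3-place example net are assumptions). The net is bounded iff this breadth-first exploration terminates; the size limit is a crude cutoff, not a proper Karp-Miller coverability test:

from collections import deque

def fire(marking, consume, produce):
    # return the successor marking, or None if the transition is not enabled
    if all(m >= c for m, c in zip(marking, consume)):
        return tuple(m - c + p for m, c, p in zip(marking, consume, produce))
    return None

def reachability_graph(m0, transitions, limit=10_000):
    # explore all reachable markings; a bounded net has finitely many
    seen, queue, edges = {m0}, deque([m0]), []
    while queue:
        m = queue.popleft()
        for name, (consume, produce) in transitions.items():
            m2 = fire(m, consume, produce)
            if m2 is not None:
                edges.append((m, name, m2))
                if m2 not in seen:
                    if len(seen) >= limit:   # crude cutoff instead of a coverability check
                        raise RuntimeError("state space too large -- possibly unbounded")
                    seen.add(m2)
                    queue.append(m2)
    return seen, edges

# hypothetical 3-place net where t1 moves a token from place 0 to place 1
transitions = {"t1": ((1, 0, 0), (0, 1, 0))}
print(reachability_graph((2, 0, 0), transitions)[0])
# {(2, 0, 0), (1, 1, 0), (0, 2, 0)} -- finite, so this net is bounded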

Related

Maximize the result from multiple functions which share an input amount

I have multiple functions as shown in the image. For a fixed x value, I need to distribute it into f, g, and h functions for getting the maximum output (y). In other words, having a fixed x value, find a, b, and c in which these conditions are satisfied:
a + b + c = x
a >= 0 and b >= 0 and c >= 0
f(a) + g(b) + h(c) has max value.
It is given that the functions are continuous and monotonic. How should I write code to find a, b, and c? Thanks in advance!
Under appropriate assumptions, if the maximum has a > 0 and b > 0 and c > 0, then a necessary condition is f'(a) = g'(b) = h'(c). Intuitively, if one of these derivatives was greater than the others, then we could effect an improvement by increasing the corresponding variable a little bit and decreasing another variable by the same amount. Otherwise, the maximum has a = 0 or b = 0 or c = 0, and we have a lower dimensional problem of the same type.
The algorithm is to loop over all seven possibilities for whether a, b, c are zero (assuming x > 0 to avoid the trivial case), then solve the equations a + b + c = x and f'(a) = g'(b) = h'(c) (omitting the variables that are zero) to find the candidate solutions, then return the maximum.
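As a sketch of that recipe in code: the concrete f, g, h below are made-up concave examples, and scipy is an assumption. SLSQP handles the equality constraint and the nonnegativity bounds directly, so for well-behaved (e.g. concave) functions the zero/nonzero case split is resolved internally:

import numpy as np
from scipy.optimize import minimize

f = lambda a: np.sqrt(a)       # hypothetical example functions
g = lambda b: np.log1p(b)
h = lambda c: 2 * np.sqrt(c)

def maximize_split(x):
    objective = lambda v: -(f(v[0]) + g(v[1]) + h(v[2]))          # minimize the negative
    constraints = [{"type": "eq", "fun": lambda v: v.sum() - x}]  # a + b + c = x
    bounds = [(0, x)] * 3                                         # a, b, c >= 0
    res = minimize(objective, x0=np.full(3, x / 3), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x, -res.fun

print(maximize_split(10.0))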
Even if you only had 2 functions f and g, you would be looking for the a that maximises a -> f(a) + g(x-a) on [0, x], which is a sum of an increasing and a decreasing function, so you can't have any general guarantee about it.
Still, if these functions are given to you as closed-form expressions, you can compute u(a) = f(a) + g(x-a) and try to find its maximum (under sufficient assumptions, you will have u'(a) = 0 and u''(a) <= 0, for instance).
Going back to the 3-function case: if it's possible, you can compute, for every a, v(a) = max_{b in [0, x-a]} ( g(b) + h(x-a-b) ), and then compute the max of (f+v)(a), or start with b or c first if that works better; but in the general case there is no efficient algorithm. A rough grid-search version of this nested reduction is sketched below.
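A crude grid-search sketch of that nested reduction (the step count is a made-up parameter; this is only sensible when no closed forms or derivatives are available):

def nested_max(f, g, h, x, steps=200):
    # brute-force the outer variable a, and for each a brute-force
    # v(a) = max over b in [0, x-a] of g(b) + h(x-a-b)
    best = float("-inf")
    for i in range(steps + 1):
        a = x * i / steps
        rest = x - a
        v = max(g(rest * j / steps) + h(rest - rest * j / steps)
                for j in range(steps + 1))
        best = max(best, f(a) + v)
    return best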

Find largest ones after setting a coordinate to one

Interview Question:
You are given a grid of ones and zeros. You can arbitrarily select any point in that grid. You have to write a function which does two things:
If you choose e.g. coordinate (3,4) and it is zero you need to flip
that to a one. If it is a one you need to flip that to a zero.
You need to return the largest contiguous region
with the most ones i.e. ones have to be at least connected to
another one.
E.g.
[0,0,0,0]
[0,1,1,0]
[1,0,1,0]
We have the largest region being the 3 ones. We have another region which has only a single one (found at coordinate (2,0)).
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
My solution, which has time complexity O(num_row*num_col) each time this function is called:
import collections

def get_all_coordinates_of_ones(grid):
    set_ones = set()
    for i in range(len(grid)):          # rows
        for j in range(len(grid[0])):   # columns
            if grid[i][j]:
                set_ones.add((i, j))
    return set_ones

def get_largest_region(x, y, grid):
    num_row = len(grid)
    num_col = len(grid[0])
    grid[x][y] = 1 - grid[x][y]   # flip the chosen cell (0 -> 1, 1 -> 0)
    # get the coordinates of ones in the grid
    # Worst case O(num_row * num_col)
    coordinates_ones = get_all_coordinates_of_ones(grid)
    largest_one = 0
    while coordinates_ones:
        # BFS over one connected component of ones at a time
        start = coordinates_ones.pop()
        queue = collections.deque([start])
        visited = {start}
        count_one = 0
        while queue:
            x, y = queue.popleft()
            count_one += 1
            for new_x, new_y in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
                if 0 <= new_x < num_row and 0 <= new_y < num_col:
                    if grid[new_x][new_y] == 1 and (new_x, new_y) not in visited:
                        visited.add((new_x, new_y))
                        coordinates_ones.discard((new_x, new_y))
                        queue.append((new_x, new_y))
        largest_one = max(largest_one, count_one)
    return largest_one
My Proposed modifications:
Use union find by rank. I encountered a problem: I union all the ones that are adjacent to each other, but when one of the
coordinates is flipped, e.g. from one to zero, I need to remove that coordinate from the region that it is connected to.
Questions are:
What is the fastest algorithm in terms of time complexity?
Using union find with rank entails removing a node. Is this the way to improve the time complexity? If so, is there an implementation of removing a node from union find available online?
------------------------ EDIT ---------------------------------
Should we always subtract sum(degree - 1 of each 'cut' vertex)? Here are two examples: the first one where we need to subtract one more, and the second one where we do not:
Block Cut Tree example 1
Cut vertex is vertex B. Degree of vertex B in the block cut tree is 2.
Sum(cardinality of each 'block' vertex) : 2(A,B) + 1(B) + 3 (B,C,D) = 6
Sum(degree - 1 of each 'cut' vertex): 1 (B)
Block cut size: 6 - 1 = 5, but it should be 4 (A, B, C, D). Here we need to subtract one more.
Block Cut Tree Example 2
Sum(cardinality of each 'block' vertex) : 3 (A,B,C) + 1(C) + 1(D) + 3 (D, E, F) = 8
Sum(degree - 1 of each 'cut' vertex): 2 (C and D)
Block cut size: 8 - 2 = 6, which is (A, B, C, D, E, F). Here there is no need to subtract one more.
Without preprocessing:
Flip the cell in the matrix.
Consider the matrix as a graph where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components. For each connected component - store its cardinality.
Return the highest cardinality.
Note that O(V) = O(E) = O(num_row*num_col).
Step 3 takes O(V+E)=O(num_row*num_col), which is similar to your solution.
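For illustration, a minimal sketch of this no-preprocessing variant (my own code): flip the cell, flood-fill every component, and return the largest size, at O(num_row*num_col) per query:

from collections import deque

def largest_after_flip(grid, x, y):
    grid[x][y] ^= 1                       # step 1: flip the chosen cell
    rows, cols = len(grid), len(grid[0])
    seen, best = set(), 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 1 and (i, j) not in seen:
                # steps 2-3: BFS one connected component and record its cardinality
                size, queue = 0, deque([(i, j)])
                seen.add((i, j))
                while queue:
                    a, b = queue.popleft()
                    size += 1
                    for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if (0 <= na < rows and 0 <= nb < cols
                                and grid[na][nb] == 1 and (na, nb) not in seen):
                            seen.add((na, nb))
                            queue.append((na, nb))
                best = max(best, size)
    return best                           # step 4: highest cardinality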
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
That hints that you can benefit from preprocessing:
Preprocessing:
Consider the original matrix as a graph G where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components
Construct the set of block-cut trees (section 5.2) of G (also here, here and here) (one block-cut tree for each connected component of G). Construction: see here.
Processing:
If you flip a '0' cell to '1':
Find neighbor connected components (0 to 4)
Remove old block-cut trees, construct a new block-cut tree for the merged component (Optimizations are possible: in some cases, previous tree(s) may be updated instead of reconstructed).
If you flip a '1' cell to '0':
If this cell is a 'cut' in a block-cut tree:
remove it from the block-cut-tree
remove it from each neighbor 'block' vertex
split the block-cut-tree into several block-cut trees
Otherwise (this cell is part of only one 'block vertex')
remove it from the 'block' vertex; if empty - remove vertex. If block-cut-tree empty - remove it from the set of trees.
The size of a block-cut tree = sum(cardinality of each 'block' vertex) - sum(neighbor_blocks-1 of each 'cut' vertex).
Block-cut trees are not as 'well known' as other data structures, so I'm not sure if this is what the interviewer had in mind. If it is, they're really looking for someone well experienced with graph algorithms.

LP Feasible Region

Hello guys, I have a question about linear programming.
Draw the feasible region for the following linear program:
min
sx + ty
s.t.
2x + y <= 7
-6x + 5y >= -5
-x + 4y <= 18
y <= 4
(The problem should not be changed to a feasibilty problem, i.e., s=t=0 is not allowed.)
So, what I did so far: I calculated the extreme points. They are:
(0,4)
(1.5, 4)
(2.5, 2)
(0.83, 0)
(0, 0)
Give appropriate values for s and t such that the linear program has exactly one solution.
I understand that I have one solution when I choose s = t = 1.
multiple optimal solutions, where each one is bounded (i.e. none of its components has arbitrarily large magnitude).
?
multiple optimal solutions, unbounded
my guess was s = 1 and t = 0; these are the points (0, 4) and (0, 0)
and the whole line between them, and there are infinitely many points
on that line
no optimal solution
?
I think the feasible region should extend further to the bottom left beyond the x and y axes, since you do not have a constraint in the form x>0 or y>0.
1) See 4); probably better is s = t = -1.
2) E.g. s = -2, t = -1: then every point between extreme points 2 and 3 attains the same minimum value, so the solution set is bounded by points 2 and 3. Also s = 1 and t = 0, as mentioned by you, is a bounded solution.
3) E.g. s = 1, t = -4: then every point on the line -x + 4y = 18 (for y <= 4) is a minimum.
4) I am not sure about this one, but probably s = t = 1: then the minimum is approached as x = y -> -infinity, thus there is no minimum.
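A quick numeric sanity check of these four cases is possible with scipy (an assumption; the exercise itself is meant to be done by hand). Note that linprog reports only a single optimum, so cases 1-3 all come back with status 0 (success), while case 4 returns status 3 (unbounded objective). The free-variable bounds are essential, since linprog defaults to x, y >= 0:

from scipy.optimize import linprog

A_ub = [[ 2,  1],   #  2x +  y <=  7
        [ 6, -5],   # -6x + 5y >= -5, rewritten as 6x - 5y <= 5
        [-1,  4],   #  -x + 4y <= 18
        [ 0,  1]]   #        y <=  4
b_ub = [7, 5, 18, 4]

cases = [(-1, -1, "1) unique optimum"),
         (-2, -1, "2) multiple optima, bounded"),
         ( 1, -4, "3) multiple optima, unbounded set"),
         ( 1,  1, "4) no optimum")]
for s, t, label in cases:
    res = linprog([s, t], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
    print(label, "-> status", res.status, res.x)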

Can an ANN of 2 neurons solve XOR?

I know that an artificial neural network (ANN) of 3 neurons in 2 layers can solve XOR
Input1----Neuron1\
\ / \
/ \ +------->Neuron3
/ \ /
Input2----Neuron2/
But to minify this ANN, can just 2 neurons (Neuron1 takes 2 inputs, Neuron2 takes only 1 input) solve XOR?
Input1
\
\ Neuron1------->Neuron2
/
Input2/
The artificial neuron receives one or more inputs...
https://en.wikipedia.org/wiki/Artificial_neuron
A bias input of '1' is assumed to always be present in both diagrams.
Side notes:
A single neuron can solve XOR, but with an additional input x1*x2 or x1+x2
https://www.quora.com/Why-cant-the-XOR-problem-be-solved-by-a-one-layer-perceptron/answer/Razvan-Popovici/log
Could the ANN in the second diagram solve XOR with an additional input like the above to Neuron1 or Neuron2?
No that's not possible, unless (maybe) you start using some rather strange, unusual activation functions.
Let's first ignore neuron 2, and pretend that neuron 1 is the output node. Let x0 denote the bias value (always x0 = 1), and x1 and x2 denote the input values of an example, let y denote the desired output, and let w1, w2, w3 denote the weights from the x's to neuron 1. With the XOR problem, we have the following four examples:
x0 = 1, x1 = 0, x2 = 0, y = 0
x0 = 1, x1 = 1, x2 = 0, y = 1
x0 = 1, x1 = 0, x2 = 1, y = 1
x0 = 1, x1 = 1, x2 = 1, y = 0
Let f(.) denote the activation function of neuron 1. Then, assuming we can somehow train our weights to solve the XOR problem, we have the following four equations:
f(w0 + x1*w1 + x2*w2) = f(w0) = 0
f(w0 + x1*w1 + x2*w2) = f(w0 + w1) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w2) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w1 + w2) = 0
Now, the main problem is that the activation functions that are typically used (ReLUs, sigmoid, tanh, the identity function... maybe others) are nondecreasing. That means that if you give one a larger input, you also get a larger output: f(a + b) >= f(a) if b >= 0. If you look at the above four equations, you'll see this is a problem. Comparing the second and third equations to the first tells us that w1 and w2 need to be positive, because they need to increase the output in comparison to f(w0). But then the fourth equation won't work out, because it will give an even greater output instead of 0.
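Spelled out as a chain of implications (a sketch under the same assumption that f is nondecreasing):

\begin{align*}
f(w_0 + w_1) = 1 > 0 = f(w_0) &\;\Rightarrow\; w_1 > 0,\\
f(w_0 + w_2) = 1 > 0 = f(w_0) &\;\Rightarrow\; w_2 > 0,\\
w_0 + w_1 + w_2 \ge w_0 + w_1 &\;\Rightarrow\; f(w_0 + w_1 + w_2) \ge f(w_0 + w_1) = 1 \ne 0.
\end{align*}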
I think (but didn't actually try to verify, maybe I'm missing something) that it would be possible if you use an activation function that goes up first and then down again. Think of something like f(x) = -(x^2), with some extra term to shift it away from the origin. I don't think such activation functions are commonly used in neural networks. I suspect they'll behave less nicely during training, and they are not plausible from a biological point of view (remember that neural networks are at least inspired by biology).
Now, in your question you also added an extra link from neuron 1 to neuron 2, which I ignored in the discussion above. The problem here is still the same, though. The activation level in neuron 1 in the fourth case is always going to be higher than (or at least as high as in) the second and third cases. Neuron 2 would typically again have a nondecreasing activation function, so it would not be able to change this (unless you put a negative weight between the hidden neuron 1 and output neuron 2, in which case you flip the problem around and will predict too high a value for the first case).
EDIT: Note that this is related to Aaron's answer, which is essentially also about the problem of nondecreasing activation functions, just using more formal language. Give him an upvote too!
It's not possible.
Firstly, you need a number of inputs equal to the number of inputs of XOR. The smallest ANN capable of modelling any binary operation will contain two inputs. The second diagram only shows one input and one output.
Secondly, and this is probably the most direct refutation, the XOR function's output is not an additive or multiplicative relationship, but can be modelled using a combination of them. A neuron is generally modelled using functions like sigmoids or lines which have no stationary points, so one layer of neurons can roughly approximate an additive or multiplicative relationship.
What this means is that a minimum of two layers of processing are required to produce a XOR operation.
This question brings up an interesting topic of ANNs. They are well-suited to identifying fuzzy relationships, but tend to require at least as much network complexity as any mathematical process which would solve the problem with no fuzzy margin for error. Use ANNs where you need to identify something which looks mostly like what you are identifying, and use math where you need to know precisely whether something matches a set of concrete traits.
Understanding the distinction between ANN and mathematics opens up the possibility of combining the two in more powerful calculation pipelines, such as identifying possible circles in an image using ANN, using mathematics to pin down their precise origins, and using a second ANN to compare those origins to the configurations on known objects.
It is absolutely possible to solve the XOR problem with only two neurons.
Take a look at the model below.
This model solves the problem easily.
The first neuron represents logical AND and the other logical OR. The value of +1.5 for the threshold of the hidden neuron ensures that it will be turned on only when both input units are on. The value of +0.5 for the output neuron ensures that it will turn on only when it receives a net positive input greater than +0.5. The weight of -2 from the hidden neuron to the output one ensures that the output neuron will not come on when both input neurons are on (ref. 2).
ref. 1: Hazem M El-Bakry, Modular neural networks for solving high complexity problems (link)
ref. 2: D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representation by error backpropagation, Parallel distributed processing: Explorations in the Microstructures of Cognition, Vol. 1, Cambridge, MA: MIT Press, pp. 318-362, 1986.
Of course it is possible. But before solving the XOR problem with two neurons, I want to discuss linear separability. A problem is linearly separable if a single hyperplane can form the decision boundary. (A hyperplane is just a plane drawn to differentiate the classes. For an N-dimensional problem, i.e., a problem having N features as inputs, the hyperplane will be an (N-1)-dimensional plane.) So for a 2-input XOR problem the hyperplane will be a one-dimensional plane, that is, a "line".
Now coming to the question: XOR is not linearly separable, hence we cannot directly solve the XOR problem with two neurons. The following images show that no matter how we draw a line in 2D space, we cannot separate one side's output from the other. For example, for the inputs (0,1) and (1,0), XOR gives 1, but for the input (1,1) the output is 0, and unfortunately we cannot separate it: it falls on the same side.
So here we have two options to solve it:
Using hidden layer. But it will increase the number of neurons more than two.
Another option is to increase the dimensions.
Let's look at an illustration of how increasing the dimensions can solve this problem while keeping the number of neurons at 2.
For an analogy, we can think of XOR as a subtraction of AND from OR, like below:
If you look at the upper figure, the first neuron mimics logical AND after passing v = (-1.5) + (x1*1) + (x2*1) to some activation function, and the output is considered 0 or 1 depending on whether v is negative or positive respectively (I am not getting into the details... hope you got the point). In the same way, the next neuron mimics logical OR.
So for the first three cases of the truth table the AND neuron remains turned off. But for the last one (actually the only case where OR differs from XOR) the AND neuron turns on, providing a big negative value to the OR neuron, which overwhelms the total summation, as it is big enough to make the summation negative. So finally the activation function of the second neuron interprets it as 0.
In this way we can make XOR with 2 neurons.
The following two figures, which I have collected, are also solutions to your question:
The problem can be split in two parts.
Part one
a b c
-------
0 0 0
0 1 1
1 0 0
1 1 0
Part two
a b d
-------
0 0 0
0 1 0
1 0 1
1 1 0
Part one can be solved with one neuron.
Part two can also be solved with one neuron.
Part one and part two added together make XOR.
c = sigmoid(a * 6.0178 + b * -6.6000 + -2.9996)
d = sigmoid(a * -6.5906 + b * 5.9016 + -3.1123)
----------------------------------------------------------
sigmoid(0.0 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(0.0 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.0900
sigmoid(1.0 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(1.0 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.9534
sigmoid(0.0 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(0.0 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.9422
sigmoid(1.0 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(1.0 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.0489
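A quick script to reproduce those sums (the weights are copied verbatim from the answer above):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_approx(a, b):
    c = sigmoid(a * 6.0178 + b * -6.6000 - 2.9996)   # part one
    d = sigmoid(a * -6.5906 + b * 5.9016 - 3.1123)   # part two
    return c + d

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, round(xor_approx(a, b), 4))   # ~0.0900, 0.9534, 0.9422, 0.0489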

Partition an array in order

This is an interview question that a friend of mine got, and I'm unable to come up with a way to solve it.
Question:
You are given an array of n buttons that are either red or blue. There are k containers present. The value of a container is given by the product of the number of red buttons and the number of blue buttons present in it. The problem is to put the buttons into the containers such that the sum of all the containers' values is minimal. Additionally, every container must contain at least one button, and the buttons must be placed in the order they are given.
For example, the very first button can only go to the first container, the second one can go to either the first or the second but not the third (otherwise the second container won't have any buttons).
k will be less than or equal to n.
I think there must be a dynamic programming solution for this.
How do you solve this ?
So far, I've only got the trivial cases where
if (n==k), the answer would be zero because you could just put one in each container making the value of each container zero, therefore the sum would be zero.
if (k==1), you just dump all of them and calculate the product.
if only one color is present, the answer would be zero.
Edit:
I'll give an example.
n = 4 and k = 2
Input: R B R R
The first container gets the first two (R and B) making its value 1 (1R X 1B)
The second container gets the remaining (R and R) making its value 0 (2R x 0B)
The answer is 1 + 0 = 1
if k=3,
the first container would have only the first button (R)
the second container would have only the second one (B)
the third one would have the last two buttons (R and R)
Each of the containers would have value 0 and hence sum and answer would be 0.
Hope this clears up the doubts.
Possible DP solution:
Let dp[i, j] = minimum number possible if we put the first i numbers into j containers.
dp[i, j] = min{dp[p, j - 1] + numRed[p+1, i]*numBlues[p+1, i]}, p = 1 to i - 1
Answer will be in dp[n, k].
// dp[i][j] = minimum total value when the first i buttons fill exactly j containers
int blue = 0, red = 0;
for (int i = 1; i <= n; ++i)
{
    if (buttons[i] == 1)
        ++red;
    else
        ++blue;
    dp[i][1] = red * blue;   // a single container holding the first i buttons
}
for (int i = 2; i <= n; ++i)
    for (int j = 2; j <= k; ++j)
    {
        dp[i][j] = inf;
        // the last container takes buttons p+1..i; p < i keeps it non-empty,
        // and p >= j-1 leaves enough buttons for the first j-1 containers
        for (int p = j - 1; p < i; ++p)
            dp[i][j] = min(dp[p][j - 1] + getProd(p + 1, i), dp[i][j]);
    }
return dp[n][k];
Complexity will be O(n^3*k) (the nested loops are O(n^2*k) iterations and getProd is O(n)), but it's possible to reduce this to O(n^2*k) by making getProd run in O(1) with the help of certain precomputations (hint: use dp[i][1]). I'll post it tomorrow, if no one figures out by then that this is actually wrong.
It might also be possible to reduce to O(n*k), but that will probably require a different approach...
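For illustration, here is a sketch of the O(n^2*k) variant in Python rather than the answer's C (names are mine, not from the answer): prefix counts of red buttons make any segment's red*blue product O(1).

def min_total_value(buttons, k):
    n = len(buttons)
    red = [0] * (n + 1)                      # red[i] = reds among the first i buttons
    for i, btn in enumerate(buttons, 1):
        red[i] = red[i - 1] + (btn == 'r')

    def seg(p, i):                           # value of the segment p+1..i (1-based)
        r = red[i] - red[p]
        return r * ((i - p) - r)

    INF = float('inf')
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][1] = seg(0, i)
    for j in range(2, k + 1):
        for i in range(j, n + 1):            # need at least j buttons for j containers
            dp[i][j] = min(dp[p][j - 1] + seg(p, i) for p in range(j - 1, i))
    return dp[n][k]

print(min_total_value(['r', 'b', 'r', 'r'], 2))   # 1, matching the example above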
If I understand the question correctly, as long as every container has at least one button in it, you can choose any container to put the remaining buttons in. Given that, put one button in every container, making sure that there is at least one container with a red button and at least one with a blue button. Then with the remaining buttons, put all the red buttons in a container with a red button and put all the blue buttons in a container with blue buttons in it. This will make it so every container has at least one button and every container has only one color of buttons. Then every container's score is 0. Thus the sum is 0 and you have minimized the combined score.
Warning: Proven to be non-optimal
How about a greedy algorithm to get people talking? I'm not going to try to prove it's optimal at this point, but it's a way of approaching the problem.
In this solution, we use G to denote the number of contiguous regions of one colour in the sequence of buttons. Say we had (I'm using x for red and o for blue, since R and B look too similar):
x x x o x o o o x x o
This would give G = 6. Let's split this into groups (red/blue) where, to start with, each group gets an entire region of a consistent colour:
3/0 0/1 1/0 0/3 2/0 0/1 //total value: 0
When G <= k, you have a minimum of zero, since each group can go into its own container. Now assume G > k. Our greedy algorithm will be: while there are more groups than containers, collapse the two adjacent groups whose merge results in the least container-value delta (valueOf(merged(a, b)) - valueOf(a) - valueOf(b)). Say k = 5 with our example above. Our choices are:
Collapse 1,2: delta = (3 - 0 - 0) = 3
2,3: delta = 1
3,4: delta = 3
4,5: delta = 6
5,6: delta = 2
So we collapse 2 and 3:
3/0 1/1 0/3 2/0 0/1 //total value: 1
And k = 4:
Collapse 1,2: delta = (4 - 0 - 1) = 3
2,3: delta = (4 - 1 - 0) = 3
3,4: delta = (6 - 0 - 0) = 6
4,5: delta = 2
3/0 1/1 0/3 2/1 //total value: 3
k = 3
4/1 0/3 2/1 //total value: 6
k = 2
4/1 2/4 //total value: 12
k = 1
6/5 //total value: 30
It seems optimal for this case, but I was just intending to get people talking about a solution. Note that the starting assignment of buttons to containers was a shortcut: you could instead start with each button in the sequence in its own bucket and then reduce, but you would always arrive at the point where each container has the maximum number of buttons of one colour.
Counterexample: Thanks to Jules Olléon for providing a counter-example that I was too lazy to think of:
o o o x x o x o o x x x
If k = 2, the optimal mapping is
2/4 4/2 //total value: 16
Let's see how the greedy algorithm approaches it:
0/3 2/0 0/1 1/0 0/2 3/0 //total value: 0
0/3 2/0 1/1 0/2 3/0 //total value: 1
0/3 3/1 0/2 3/0 //total value: 3
0/3 3/1 3/2 //total value: 9
3/4 3/2 //total value: 18
I'll leave this answer up since it's accomplished its only purpose of getting people talking about a solution. I wonder if the greedy heuristic could be used in an informed search algorithm such as A* to improve the runtime of an exhaustive search, but that would not achieve polynomial runtime.
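For concreteness, here is a rough sketch of that greedy collapse (my own code, and remember it is proven non-optimal by the counterexample above):

from itertools import groupby

def value(g):
    return g[0] * g[1]   # reds * blues

def greedy_partition(buttons, k):
    # maximal single-colour runs as (reds, blues) groups; 'x' = red, 'o' = blue
    groups = []
    for c, run in groupby(buttons):
        n = len(list(run))
        groups.append((n, 0) if c == 'x' else (0, n))
    while len(groups) > k:
        # merge the adjacent pair with the smallest increase in total value
        def delta(i):
            merged = (groups[i][0] + groups[i + 1][0], groups[i][1] + groups[i + 1][1])
            return value(merged) - value(groups[i]) - value(groups[i + 1])
        i = min(range(len(groups) - 1), key=delta)
        groups[i:i + 2] = [(groups[i][0] + groups[i + 1][0],
                            groups[i][1] + groups[i + 1][1])]
    return groups, sum(value(g) for g in groups)

print(greedy_partition(list("xxxoxoooxxo"), 5))
# ([(3, 0), (1, 1), (0, 3), (2, 0), (0, 1)], 1)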
I always ask for clarification of the problem statement in an interview. Imagine that you never put blue and red buttons together. Then the sum is 0, just like when n == k. So, for all cases where k > 1, the minimum is 0.
Here is what I understand so far: The algorithm is to process a sequence of values {R,B}.
It may choose to put the value in the current container or the next, if there is a next.
I first would ask a couple of questions to clarify the things I don't know yet:
Are k and n known to the algorithm in advance? I assume so.
Do we know the full sequence of buttons in advance?
If we don't know the sequence in advance, should the average value be minimized? Or the maximum (the worst case)?
Idea for a proof for the algorithm by Mark Peters
Edit: Idea for a proof (sorry, couldn't fit it in a comment)
Let L(i) be the length of the ith group. Let d(i) be the diff you get by collapsing containers i and i+1, so d(i) = L(i)*L(i+1).
We can define a distribution by the sequence of containers collapsed. As an index we use the maximum index of the original containers contained in the collapsed container that holds the containers with the smaller indexes.
A given sequence of collapses I = [i(1), ..., i(m)] results in a value with a lower bound equal to the sum of d(i(q)) for q from 1 to m (where m = n - k).
We need to prove that there can't be a sequence other than the one created by the algorithm with a smaller diff. So let the sequence above be the one resulting from the algorithm, and let J = [j(1), ..., j(m)] be any other sequence.
Here it gets skimpy:
I think it should be possible to prove that the lower bound of J is larger than the actual value of I, because at each step we choose, by construction, the collapse operation of I, so it must be smaller than the matching collapse from the alternate sequence.
I think we may assume that the sequences are disjoint, but I'm not completely sure about it.
Here is a brute force algorithm written in Python which seems to work.
from itertools import combinations

def get_best_order(n, k):
    slices = combinations(range(1, len(n)), k - 1)
    container_slices = ([0] + list(s) + [len(n)] for s in slices)
    min_value = -1
    best = None

    def get_value(slices, n):
        value = 0
        for i in range(1, len(slices)):
            start, end = slices[i - 1], slices[i]
            num_red = len([b for b in n[start:end] if b == 'r'])
            value += num_red * (end - start - num_red)
        return value

    for slices in container_slices:
        value = get_value(slices, n)
        if value < min_value or min_value == -1:
            min_value = value
            best = slices
    return [n[best[i - 1]:best[i]] for i in range(1, len(best))]

n = ['b', 'r', 'b', 'r', 'r', 'r', 'b', 'b', 'r']
k = 4
print(get_best_order(n, k))
# [['b', 'r', 'b'], ['r', 'r', 'r'], ['b', 'b'], ['r']]
Basically the algorithm works like this:
Generate a list of every possible arrangement (items stay in order, so this is just a number of items per container)
Calculate the value for that arrangement as described by the OP
If that value is less than the current best value, save it
Return the arrangement that has the lowest value
