What is the point of the one-comparison-per-iteration binary search? And can you explain how it works?
There are two reasons to binary search with one comparison per iteration. The
less important is performance. Detecting an exact match early using two
comparisons per iteration saves on average one iteration of the loop, whereas
(assuming comparisons involve significant work) binary searching with one
comparison per iteration almost halves the work done per iteration.
Binary searching an array of integers, it probably makes little difference
either way. Even with a fairly expensive comparison, asymptotically the
performance is the same, and the half-rather-than-minus-one probably isn't worth
pursuing in most cases. Besides, expensive comparisons are often coded as functions that return negative, zero or positive for <, == or >, so you can get both comparisons for pretty much the price of one anyway.
The important reason to do binary searches with one comparison per iteration is
because you can get more useful results than just some-equal-match. The main
searches you can do are...
First key > goal
First key >= goal
First key == goal
Last key < goal
Last key <= goal
Last key == goal
These all reduce to the same basic algorithm. Understanding this well enough
that you can code all the variants easily isn't that difficult, but I've not
really seen a good explanation - only pseudocode and mathematical proofs. This
is my attempt at an explanation.
There are games where the idea is to get as close as possible to a target
without overshooting. Change that to "undershooting", and that's what "Find
First >" does. Consider the ranges at some stage during the search...
| lower bound     | goal                    | upper bound
+-----------------+-------------------------+--------------
| Illegal         | better            worse |
+-----------------+-------------------------+--------------
The range between the current upper and lower bounds still needs to be searched.
Our goal is (normally) in there somewhere, but we don't yet know where. The
interesting point about items above the upper bound is that they are legal in
the sense that they are greater than the goal. We can say that the item just
above the current upper bound is our best-so-far solution. We can even say this
at the very start, even though there is probably no item at that position - in a
sense, if there is no valid in-range solution, the best solution that hasn't
been disproved is just past the upper bound.
At each iteration, we pick an item to compare between the upper and lower bound.
For binary search, that's a rounded half-way item. For binary tree search, it's
dictated by the structure of the tree. The principle is the same either way.
As we are searching for an item greater-than our goal, we compare the test item
using Item [testpos] > goal. If the result is false, the test item is on the
not-greater side of our goal, so we keep our existing best-so-far solution and
adjust our lower bound upwards. If the result is true, we have found a new
best-so-far solution, so we adjust the upper bound down to reflect that.
Either way, we never want to compare that test item again, so we adjust our
bound to eliminate (only just) the test item from the range to search. Being
careless with this usually results in infinite loops.
Normally, half-open ranges are used - an inclusive lower bound and an exclusive
upper bound. Using this system, the item at the upper bound index is not in the
search range (at least not now), but it is the best-so-far solution. When you
move the lower bound up, you move it to testpos+1 (to exclude the item you just
tested from the range). When you move the upper bound down, you move it to
testpos (the upper bound is exclusive anyway).
if (item[testpos] > goal)
{
    // new best-so-far
    upperbound = testpos;
}
else
{
    lowerbound = testpos + 1;
}
When the range between the lower and upper bounds is empty (using half-open,
when both have the same index), your result is your most recent best-so-far
solution, just above your upper bound (ie at the upper bound index for
half-open).
So the full algorithm is...
while (upperbound > lowerbound)
{
    testpos = lowerbound + ((upperbound-lowerbound) / 2);
    if (item[testpos] > goal)
    {
        // new best-so-far
        upperbound = testpos;
    }
    else
    {
        lowerbound = testpos + 1;
    }
}
To change from first key > goal to first key >= goal, you literally switch
the comparison operator in the if line. The relational operator and goal could be replaced by a single parameter - a predicate function that returns true if (and only if) its parameter is on the greater-than side of the goal.
That gives you "first >" and "first >=". To get "first ==", use "first >=" and
add an equality check after the loop exits.
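For concreteness, here is a minimal C++ sketch of the "first >=" variant together with that equality check (the function names and the use of std::vector<int> are just for illustration):

#include <cstddef>
#include <vector>

// First index whose item is >= goal. Returns items.size() if every item is
// less than the goal - the "imaginary" best-so-far just past the upper bound.
std::size_t first_geq(const std::vector<int>& items, int goal)
{
    std::size_t lowerbound = 0;
    std::size_t upperbound = items.size();   // exclusive (half-open range)

    while (upperbound > lowerbound)
    {
        std::size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (items[testpos] >= goal)
            upperbound = testpos;        // new best-so-far
        else
            lowerbound = testpos + 1;    // exclude the tested item
    }
    return upperbound;
}

// "First ==" is just "first >=" plus a range check and an equality check.
bool contains(const std::vector<int>& items, int goal)
{
    std::size_t pos = first_geq(items, goal);
    return pos < items.size() && items[pos] == goal;
}

Note that the return value of first_geq, whether in range or one past the end, is also the position at which the goal could be inserted while keeping the array sorted.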
For "last <" etc, the principle is the same as above, but the range is
reflected. This just means you swap over the bound-adjustments (but not the
comment) as well as changing the operator. But before doing that, consider the following...
a > b == !(a <= b)
a >= b == !(a < b)
Also...
position (last key < goal) = position (first key >= goal) - 1
position (last key <= goal) = position (first key > goal ) - 1
When we move our bounds during the search, both sides are being moved towards the goal until they meet at the goal. And there is a special item just below the lower bound, just as there is just above the upper bound...
while (upperbound > lowerbound)
{
    testpos = lowerbound + ((upperbound-lowerbound) / 2);
    if (item[testpos] > goal)
    {
        // new best-so-far for first key > goal at [upperbound]
        upperbound = testpos;
    }
    else
    {
        // new best-so-far for last key <= goal at [lowerbound - 1]
        lowerbound = testpos + 1;
    }
}
So in a way, we have two complementary searches running at once. When the upperbound and lowerbound meet, we have a useful search result on each side of that single boundary.
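In code terms, continuing the loop above (just a sketch of reading off the two results):

// After the loop, lowerbound == upperbound.
firstgreater    = upperbound;      // first key >  goal (may be one past the last item)
lastlessorequal = upperbound - 1;  // last key  <= goal (only meaningful if upperbound > 0)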
For all cases, there's the chance that the original "imaginary" out-of-bounds
best-so-far position was your final result (there was no match in the
search range). This needs to be checked before doing a final == check for the
first == and last == cases. It might be useful behaviour, as well - e.g. if
you're searching for the position to insert your goal item, adding it after the
end of your existing items is the right thing to do if all the existing items
are smaller than your goal item.
A couple of notes on the selection of the testpos...
testpos = lowerbound + ((upperbound-lowerbound) / 2);
First off, this will never overflow, unlike the more obvious ((lowerbound +
upperbound)/2). It also works with pointers as well as integer
indexes.
Second, the division is assumed to round down. Rounding down for non-negatives
is OK (all you can be sure of in C) as the difference is always non-negative
anyway.
This is one aspect that may need care if you use non-half-open
ranges, though - make sure the test position is inside the search range, and not just outside (on one of the already-found best-so-far positions).
Finally, in a binary tree search, the moving of bounds is implicit and the
choice of testpos is built into the structure of the tree (which may be
unbalanced), yet the same principles apply for what the search is doing. In this
case, we choose our child node to shrink the implicit ranges. For first match
cases, either we've found a new smaller best match (go to the lower child in hopes of finding an even smaller and better one) or we've overshot (go to the higher child in hopes of recovering). Again, the four main cases can be handled by switching the comparison operator.
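As a sketch of the "first key >= goal" case on an explicit tree (the Node struct here is made up for illustration):

struct Node { int key; Node* left; Node* right; };

// First key >= goal in a binary search tree. The best-so-far is carried in a
// pointer rather than implied by an array bound; nullptr plays the role of
// the "imaginary" just-past-the-end solution.
const Node* first_geq_tree(const Node* root, int goal)
{
    const Node* best = nullptr;
    for (const Node* n = root; n != nullptr; )
    {
        if (n->key >= goal)
        {
            best = n;          // new best-so-far; try to do better on the left
            n = n->left;
        }
        else
        {
            n = n->right;      // too small; recover on the right
        }
    }
    return best;
}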
BTW - there are more possible operators to use for that template parameter. Consider an array sorted by year then month. Maybe you want to find the first item for a particular year. To do this, write a comparison function that compares the year and ignores the month - the goal compares as equal if the year is equal, but the goal value may be a different type to the key that doesn't even have a month value to compare. I think of this as a "partial key comparison", and plug that into your binary search template and you get what I think of as a "partial key search".
EDIT The paragraph below used to say "31 Dec 1999 to be equal to 1 Feb 2000". That wouldn't work unless the whole range in-between was also considered equal. The point is that all three parts of the begin- and end-of-range dates differ, so you're not dealing with a "partial" key, but the keys considered equivalent for the search must form a contiguous block in the container, which will normally imply a contiguous block in the ordered set of possible keys.
It's not strictly just "partial" keys, either. Your custom comparison might consider 31 Dec 1999 to be equal to 1 Jan 2000, yet all other dates different. The point is the custom comparison must agree with the original key about the ordering, but it might not be so picky about considering all different values different - it can treat a range of keys as an "equivalence class".
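As a small sketch of the year-only case (the Date struct and field names are made up for illustration), the predicate for a "first key >= goal" search might be:

struct Date { int year; int month; };   // keys are sorted by (year, month)

// Partial-key predicate: only the year is compared, so every month within
// the goal year falls into one equivalence class.
bool year_on_greater_side(const Date& key, int goalyear)
{
    return key.year >= goalyear;
}

Plugging that into the "first >=" loop finds the first entry of the goal year, or the position where that year would be inserted if it's absent.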
An extra note about bounds that I really should have included before, but I may not have thought about it this way at the time.
One way of thinking about bounds is that they aren't item indexes at all. A bound is the boundary line between two items, so you can number the boundary lines as easily as you can number the items...
|     |     |     |     |     |     |     |     |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
| |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
|     |     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7     8
Obviously the numbering of bounds is related to the numbering of the items. As long as you number your bounds left-to-right and the same way you number your items (in this case starting from zero) the result is effectively the same as the common half-open convention.
It would be possible to select a middle bound to bisect the range precisely into two, but that's not what a binary search does. For binary search, you select an item to test - not a bound. That item will be tested in this iteration and must never be tested again, so it's excluded from both subranges.
|     |     |     |     |     |     |     |     |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
| |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
|     |     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7     8
                           ^
      |<-------------------|------------->|
                           |
      |<---------------->| | |<---------->|
           low range        i    hi range
So the testpos and testpos+1 in the algorithm are the two cases of translating the item index into the bound index. Of course if the two bounds are equal, there are no items in that range to choose, so the loop cannot continue, and the only possible result is that one bound value.
The ranges shown above are just the ranges still to be searched - the gap we intend to close between the proven-lower and proven-higher ranges.
In this model, the binary search is searching for the boundary between two ordered kinds of values - those classed as "lower" and those classed as "higher". The predicate test classifies one item. There is no "equal" class - equal-to-key values are part of the higher class (for x[i] >= key) or the lower class (for x[i] > key).
Background
In an algorithm called the finite element method, a continuous region is discretized into repeated sections with consistent geometry, over which linked equations are formed based on the assumption of continuity between them.
In this case, I have chosen to divide a shape up into an arbitrary grid, and am now trying to connect the elements' values together as I iterate through the elements. Here is an example of the kind of grid I am talking about:
Indices
There are a bunch of related indices:
The element index (teal numbers), linear, row-major, range 0..ROWS*2.
The node index (brown numbers), linear, row-major, range 0..ROWS*COLS.
The local element vertex index (lavender numbers), counter-clockwise by element, range 0..2.
The coordinates of the actual point in space (stored with the element's struct, as well as the grid's struct)
Problem
In order to get to the next step in the algorithm, I need to iterate over each node and compute some sums of values indexed by local element indices and store them in another matrix. If, for example, I'm on node 8, my element lookup/access function is generically V(el_index, start_vertex, end_vertex), and my matrix of outputs is S(start_node, end_node):
S(8,8) = V(el_14, vert_1, vert_1) + V(el_15, vert_1, vert_1) + V(el_04, vert_0, vert_0) + V(el_03, vert_0, vert_0) + V(el_2, vert_2, vert_2) + V(el_13, vert_2, vert_2)
S(8,9) = V(el_15, vert_1, vert_2) + V(el_4, vert_0, vert_2)
and so on, for all of the connections (teal lines) from node 8. (The connections are symmetric, so once I compute S(7,8), I don't need to compute S(8,7).)
The problem is, the grid (and therefore everything else) is parameterized at runtime, so which node index + adjacency direction corresponds to which element index is dynamically determined. I need to tell the program, "Get me the element indices where the node index of vert_x is my current node index." That's the instruction that tells the program which element to access in V().
Is there a way I can relate these indices in a simple and transparent manner in Rust?
Attempts
I tried computing some simple arithmetic functions modulo the row stride of the node matrix, but the result is messy and hard to debug, as well as requiring verbose bounds checking.
I tried creating three HashMaps keyed by the different vertices of each triangular element, holding the values at each vertex, but the problem is that adjacent triangles share vertex numbers as well as spatial coordinates.
I considered keying a HashMap with multiple keys, but the Rust docs didn't say anything about a HashMap with multiple keys.
I think this is a possible workaround, but I hope there's something better:
Nest one HashMap in another as per this question, with the node index as the first key, the element index as the second key/first value, and the element itself as the second value.
So, the iteration can look like this:
Iterate through grid node index N
Get all elements with N as the first key
Note that the vertex indices and (relative) element indices have the following patterns, depending on where you number from:
Node index numbering beginning from matrix top-left (usual in this programming domain):
| Element (index ordinal) | Vertex |
|-------------------------|--------|
| 0 (min) | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |
| 4 | 0 |
| 5 | 0 |
Node index numbering beginning as picture, from lower-left:
| Element (index ordinal) | Vertex |
|-------------------------|--------|
| 0 (min) | 2 |
| 1 | 0 |
| 2 | 0 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
Since you can hardcode the relative order of elements like this, it might be better to use a HashMap<usize, Vec<Element>>, or even HashMap<usize, [Element; 6]>.
This method brings up the question of how to relate node index to element indices dynamically. How do you know which elements to insert into that HashMap? One way to accomplish this is to record the nodes the element vertices correspond to in the element struct as well, in the same order.
At that point, you can compute a list like adjacent_elements as you iterate through the matrix, and use the above patterns to figure out what to access (sadly, with bounds checking).
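For illustration, here is that bookkeeping in a C++-style sketch (a Rust HashMap<usize, Vec<...>> plays the same role; the Element layout is a simplified stand-in): record, for each element, which global node each local vertex sits on, then invert that into a node-to-elements map.

#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for the element type in the question: each triangular
// element records which global node each of its three local vertices sits on.
struct Element {
    std::size_t node[3];
};

// node index -> list of (element index, local vertex index at that node)
using Adjacency =
    std::unordered_map<std::size_t,
                       std::vector<std::pair<std::size_t, std::size_t>>>;

Adjacency build_adjacency(const std::vector<Element>& elements)
{
    Adjacency adjacency;
    for (std::size_t e = 0; e < elements.size(); ++e)
        for (std::size_t v = 0; v < 3; ++v)
            adjacency[elements[e].node[v]].push_back({e, v});
    return adjacency;
}

Iterating over adjacency[n] then yields exactly the (element index, local vertex index) pairs needed for the sums at node n, however the grid was parameterized at runtime.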
In looking through the dynamic programming algorithm for computing the minimum edit distance between two strings I am having a hard time grasping one thing. To me it seems like given the two strings s and t inserting a character into s would be the same as deleting a character from t. Why then do we need to consider these operations separately when computing the edit distance? I always have a hard time computing the indices in the recurrence relation because I can't intuitively understand this part.
I've read through Skiena and some other sources but they all don't explain this part well. This SO link explains the insert and delete operations better than elsewhere in terms of understanding what string is being inserted into or deleted from but I still can't figure out why they aren't one and the same.
Edit: Ok, I didn't do a very good job of detailing the source of my confusion.
The way Skiena explains computing the minimum edit distance m(i,j) of the first i characters of a string s and the first j characters of a string t based on already having computed solutions to the subproblems is as follows. m(i,j) will be the minimum of the following 3 possibilities:
opt[MATCH] = m[i-1][j-1].cost + match(s[i],t[j]);
opt[INSERT] = m[i][j-1].cost + indel(t[j]);
opt[DELETE] = m[i-1][j].cost + indel(s[i]);
The way I understand it the 3 operations are all operations on the string s. An INSERT means you have to insert a character at the end of string s to get the minimum edit distance. A DELETE means you have to delete the character at the end of string s to get the minimum edit distance.
Given s = "SU" and t = "SATU" INSERT and DELETE would be as follows:
Insert:
SU_
SATU
Delete:
SU
SATU_
My confusion was that an INSERT into s is the same as a DELETION from t. I'm probably confused on something basic but it's not intuitive to me yet.
Edit 2: I think this link kind of clarifies my confusion but I'd love an explanation given my specific questions above.
They aren't the same thing any more than < and > are the same thing. There is of course a sort of duality and you are correct to point it out. a < b if and only if b > a so if you have a good algorithm to test for b > a then it makes sense to use it when you need to test if a < b.
It is much easier to directly test if s can be obtained from t by deletion rather than to directly test if t can be obtained from s by insertion. It would be silly to randomly insert letters into s and see if you get t. I can't imagine that any implementation of edit-distance actually does that. Still, it doesn't mean that you can't distinguish between insertion and deletion.
More abstractly. There is a relation, R on any set of strings defined by
s R t <=> t can be obtained from s by insertion
deletion is the inverse relation. Closely related, but not the same.
The problem of edit distance can be restated as a problem of converting the source string into target string with minimum number of operations (including insertion, deletion and replacement of a single character).
Thus, in the process of converting a source string into a target string, if inserting a character from target string or deleting a character from the source string or replacing a character in the source string with a character from the target string yields the same (minimum) edit distance, then, well, all the operations can be said to be equivalent. In other words, it does not matter how you arrive at the target string as long as you have done minimum number of edits.
This is realized by looking at how the cost matrix is calculated. Consider a simpler problem where source = AT (represented vertically) and target = TA (represented horizontally). The matrix is then constructed as (coming from west, northwest, north in that order):
    |  ε  |        T         |        A
  ε |  0  |        1         |        2
  A |  1  | min(2, 1, 2) = 1 | min(2, 1, 3) = 1
  T |  2  | min(3, 1, 2) = 1 | min(2, 2, 2) = 2
The idea of filling this matrix is:
If we moved east, we insert the current target string character.
If we moved south, we delete the current source string character.
If we moved southeast, we replace the current source character with current target character.
If all or any two of these impart the same cost in terms of editing, then they can be said to be equivalent and you can break the ties arbitrarily.
One of the first experiences with this comes when we find c(2, 2) in the cost matrix (the first row, c(0, 0) through c(0, 2) -- the minimum costs of converting the empty string to "", "T" and "TA" -- and the first column, c(0, 0) through c(2, 0) -- the costs of converting "", "A" and "AT" to the empty string -- are clear).
The value of c(2, 2) can be realized either by:
inserting the current character in target, 'A' (we move east from c(2,1)) -- cost is 1 + 1 = 2, or
replacing the current character 'T' in source by the current character in target 'A' (we move southeast from c(1, 1)) -- cost is 1 + 1 = 2, or
deleting the current character in source, 'T' (we move south from c(1, 2)) -- cost is 1 + 1 = 2
Since all values are the same, which one are you going to choose?
If you choose to move from west, your alignment could be:
A T -
- T A
(one deletion, one 0-cost replacement, one insertion)
If you choose to move from north, your alignment could be:
- A T
T A -
(one insertion, one 0-cost replacement, one deletion)
If you choose to move from northwest, your alignment could be:
A T
T A
(Two 1-cost replacements).
All these edit graphs are equivalent in terms of given edit distance (under given cost function).
Edit distance is only interested in the minimum number of operations required to transform one sequence into another; it is not interested in the uniqueness of the transformation. In practice, there are often multiple ways to transform one string into another, that all have the minimum number of operations.
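To make the two operations concrete, here is a minimal C++ sketch of the standard DP with the insert and delete cases written out separately (unit costs assumed):

#include <algorithm>
#include <string>
#include <vector>

// d[i][j] = minimum cost of turning the first i characters of s into the
// first j characters of t, with unit costs for insert, delete and substitute.
int edit_distance(const std::string& s, const std::string& t)
{
    const std::size_t n = s.size(), m = t.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));

    for (std::size_t i = 1; i <= n; ++i) d[i][0] = (int)i;  // delete every character of s
    for (std::size_t j = 1; j <= m; ++j) d[0][j] = (int)j;  // insert every character of t

    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
        {
            int sub = d[i-1][j-1] + (s[i-1] == t[j-1] ? 0 : 1); // match or substitute
            int ins = d[i][j-1] + 1;                            // insert t[j-1] into s
            int del = d[i-1][j] + 1;                            // delete s[i-1] from s
            d[i][j] = std::min({sub, ins, del});
        }
    return d[n][m];
}

The insert case consumes a character of t (it moves j), while the delete case consumes a character of s (it moves i) - that asymmetry is why the recurrence needs both terms, even though, viewed from the other string's perspective, each operation is the other's mirror image.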
I'm trying to implement a feature with the following requirement, but I'm having a hard time coming up with an algorithm.
My data contains a positive integer value and a date. The number of data points varies from 100 to 10,000.
-------------------------
id | value | date
-------------------------
1 | 10 | 2015-01-01
2 | 10 | 2015-01-02
3 | 20 | 2015-01-02
....................
960 | 30 | 2015-09-10
961 | 15 | 2015-09-10
And a specified target value, say 5,000.
I would like to find a combination of the data such that the sum of their values equals the target, and the combination contains as much of the older data as possible. (The target must be matched exactly; it is okay for a combination not to use the oldest data first.)
Can anyone give me a direction on how I can implement this?
One approach, based on the pseudo-polynomial solution to Subset-Sum, could be:
First, sort the entries such that the oldest one is last, and the newest one is first. Then, generate the DP matrix based on the formulas:
D(i,0) = true
D(0,x) = false x!= 0
D(i,x) = D(i-1,x) OR D(i-1, x-value[i])
This matrix is of size (n+1) * (target+1).
Next, generate a solution by greedily choosing (from last to first) to take the element if it's possible:
t = target
i = n
sol = []                      // empty list
while (t != 0):
    if D(i-1, t - value[i]) == true:
        sol.append(i)         // item i is in the solution
        t = t - value[i]
    i = i - 1                 // either case
This guarantees:
Values of sol sums to the target
The oldest value which is in any feasible solution will be in sol.
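A compact C++ sketch of this approach (it assumes the values have already been sorted so that the oldest entry is last, as described above; names are illustrative):

#include <cstddef>
#include <vector>

// Returns 0-based indices into `value` of a subset summing exactly to `target`,
// preferring older entries; returns an empty vector if no exact combination exists.
// Assumes `value` is sorted newest-first, so the oldest entry is last.
std::vector<std::size_t> pick_oldest_subset(const std::vector<int>& value, int target)
{
    const std::size_t n = value.size();

    // reach[i][x] == true  <=>  some subset of the first i values sums to x
    std::vector<std::vector<char>> reach(n + 1, std::vector<char>(target + 1, 0));
    for (std::size_t i = 0; i <= n; ++i)
        reach[i][0] = 1;
    for (std::size_t i = 1; i <= n; ++i)
        for (int x = 1; x <= target; ++x)
            reach[i][x] = reach[i-1][x] ||
                          (value[i-1] <= x && reach[i-1][x - value[i-1]]);

    if (!reach[n][target])
        return {};                        // no exact combination exists

    // Walk back from the last (oldest) entry, taking each one whenever the
    // remainder is still reachable from the entries before it.
    std::vector<std::size_t> sol;
    int t = target;
    for (std::size_t i = n; i > 0 && t != 0; --i)
        if (value[i-1] <= t && reach[i-1][t - value[i-1]])
        {
            sol.push_back(i - 1);
            t -= value[i-1];
        }
    return sol;
}

The table is O(n * target) in time and space, which is manageable for the input sizes described in the question.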
Apparently the problem can be interpreted as the knapsack problem. Note that this problem is NP-hard. It can be solved to optimality by dynamic programming and admits an FPTAS.
The problem can be modelled in the following way. The item profits and weights are both the item's value (which means the problem simplifies to the subset sum problem, https://en.wikipedia.org/wiki/Subset_sum_problem). Let C denote the specified target value. Group the items by value; for each value v, keep only the floor(C/v) oldest items, where floor denotes rounding down. After the knapsack solver has generated a solution, replace the at most floor(C/v) items of value v in the solution (for each value v) with the oldest ones selected before.
Consider the following data.
Groundtruth     | Dataset1        | Dataset2        | Dataset3
Datapoints|Time | Datapoints|Time | Datapoints|Time | Datapoints|Time
A         |0    | a         |0    | a         |0    | a         |0
B         |10   | b         |5    | b         |5    | b         |13
C         |15   | c         |12   | c         |12   | c         |21
D         |25   | d         |22   | d         |14   | d         |30
E         |30   | e         |30   | e         |17   |
                |                 | f         |27   |
                |                 | g         |30   |
Visualized like this (as in number of - between each identifier):
Time ->
Groundtruth: A|----------|B|-----|C|----------|D|-----|E
Dataset1: a|-----|b|-------|c|----------|d|--------|e
Dataset2: a|-----|b|-------|c|--|d|---|e|----------|f|---|g
Dataset3: a|-------------|b|--------|c|---------|d
My goal is to compare the datasets with the groundtruth. I want to create a function that generates a similarity measurement between one of the datasets and the groundtruth in order to evaluate how good my segmentation algorithm is. Obviously I would like the segmentation algorithm to produce the same number of datapoints (segments) as the groundtruth, but as illustrated by the datasets this is not guaranteed, nor is the number of datapoints known ahead of time.
I've already created a Jaccard index to generate a basic evaluation score. But I am now looking into an evaluation method that penalizes extra or missing datapoints as well as limiting the distance to a correct datapoint. That is, b doesn't have to match B, it just has to be close to a correct datapoint.
I've tried to look into a dynamic programming method where I introduced a penalty for removing or adding a datapoint as well as a distance penalty to move to the closest datapoint. I'm struggling though, due to:
1. I need to limit each datapoint to one correct datapoint
2. Figure out which datapoint to delete if needed
3. General lack of understanding in how to implement DP algorithms
Anyone have ideas how to do this? If dynamic programming is the way to go, I'd love some link recommendation as well as some pointers in how to go about it.
Basically, you can modify the DP for Levenshtein edit distance to compute distances for your problem. The Levenshtein DP amounts to finding shortest paths in an acyclic directed graph that looks like this
*-*-*-*-*
|\|\|\|\|
*-*-*-*-*
|\|\|\|\|
*-*-*-*-*
where the arcs are oriented left-to-right and top-to-bottom. The DAG has rows numbered 0 to m and columns numbered 0 to n, where m is the length of the first sequence, and n is the length of the second. Lists of instructions for changing the first sequence into the second correspond one-to-one (cost and all) to paths from the upper left to the lower right. The arc from (i, j) to (i + 1, j) corresponds to the instruction of deleting the ith element from the first sequence. The arc from (i, j) to (i, j + 1) corresponds to the instruction of adding the jth element from the second sequence. The arc from (i, j) to (i + 1, j + 1) corresponds to modifying the ith element of the first sequence to become the jth element of the second sequence.
All you have to do to get a quadratic-time algorithm for your problem is to define the cost of (i) adding a datapoint (ii) deleting a datapoint (iii) modifying a datapoint to become another datapoint and then compute shortest paths on the DAG in one of the ways described by Wikipedia.
(As an aside, this algorithm assumes that it is never profitable to make modifications that "cross over" one another. Under a fairly mild assumption about the modification costs, this assumption is superfluous. If you're interested in more details, see this answer of mine: Approximate matching of two lists of events (with duration) .)
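As a sketch of how those pieces fit together in C++ (del_penalty, add_penalty and the absolute time difference used as the "modify" cost are placeholder choices, not recommendations):

#include <algorithm>
#include <cmath>
#include <vector>

// dist[i][j] = cheapest way to turn the first i groundtruth times into the
// first j dataset times, where matching a pair costs their time difference
// and unmatched points on either side pay a fixed penalty.
double segmentation_distance(const std::vector<double>& truth,
                             const std::vector<double>& data,
                             double del_penalty, double add_penalty)
{
    const std::size_t m = truth.size(), n = data.size();
    std::vector<std::vector<double>> dist(m + 1, std::vector<double>(n + 1, 0.0));

    for (std::size_t i = 1; i <= m; ++i) dist[i][0] = i * del_penalty;  // all truth points unmatched
    for (std::size_t j = 1; j <= n; ++j) dist[0][j] = j * add_penalty;  // all dataset points spurious

    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
        {
            double move = dist[i-1][j-1] + std::abs(truth[i-1] - data[j-1]); // match point j to point i
            double del  = dist[i-1][j] + del_penalty;   // truth point i has no counterpart
            double add  = dist[i][j-1] + add_penalty;   // dataset point j is extra
            dist[i][j]  = std::min({move, del, add});
        }
    return dist[m][n];
}

Because each DP step consumes at most one point from each sequence, every dataset point is matched to at most one groundtruth point, which addresses requirement 1 in the question; a traceback through the table (not shown) tells you which points were treated as added or deleted.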
Given a set of items (sized anywhere from 1 to 100) and a number of bins (1 to 15). Each item has a subset of bins it can be assigned to, and a preference ordering of which bin is best, second best, etc., just for that item. Items also have a natural order, represented below by naming, e.g., item1 before item2. Each bin has a capacity between 1 and 5 (every item has identical weight, i.e., 1).
An example input could be three bins and six items (- indicates a bin is not in the item's usable set, i.e., can't be packed with it):
        | bin1  bin2  bin3             | bin1  bin2  bin3
------------------------      ----------------------------
item1   |   1     2     -     capacity |   4     4     5
item2   |   -     1     2
item3   |   2     1     3
item4   |   1     2     3
item5   |   1     -     2
item6   |   1     2     3
The goals are (in order with each goal completely overriding any lower goal when there's a conflict, e.g., packing five items is always better than four no matter what number of bins are used or preferences ignored):
maximize number of items packed
pack items in their natural order, e.g., if total bin capacity is one and there are two items, item1 will be packed and item2 not
minimize number of bins used
pack each item according to its bin preferences and natural order, i.e, item1 in its first preference and item2 in its second is better than item1 in its second and item2 in its first
in cases where two solutions are indistinguishable by these goals, either solution is acceptable to rank higher, e.g., as a side-effect of implementation or just arbitrary tie-breaking.
So the input above would be packed as:
        | bin1  bin2  bin3
------------------------
item1   |   x
item2   |         x
item3   |         x
item4   |   x
item5   |   x
item6   |   x
The question, then, is what to read/review to help me come up with algorithm ideas for solving this problem with the input sizes from the first paragraph and a time constraint of a few seconds, i.e., not brute force (or at least any brute force I've conceived of so far). I'm using Ruby and C but language isn't overly relevant at this stage of woods stumbling.
I'll be grateful of any reading suggestions, ideas on combinations of algorithms, or just thoughts on clarifying the problem statement...
Update 1
To be less unclear: while there are many algorithms that cover various parts of this, my difficulty is in finding (or perhaps recognizing) information on handling all the criteria together, especially minimizing the number of bins used when there is excess capacity along with conflicting item-to-bin sets and item preferences, which is hopefully shown more clearly in the following example:
        | bin1  bin2  bin3             | bin1  bin2  bin3
------------------------      ----------------------------
item1   |   1     2     3     capacity |   3     2     3
item2   |   1     2     3
item3   |   -     1     2
While bin1 is the most preferred, item3 can't be placed in it at all, and while bin2 is the next most preferred for all items, it can hold only two of the three items. So the correct set of assignments (x) is actually the least preferred bin:
        | bin1  bin2  bin3
------------------------
item1   |               x
item2   |               x
item3   |               x
Update 2
I reworked the description with information on how the goals relate and removed the variable of bin priority as it only makes finding an answer less likely and can be worked around elsewhere in the system I'm working on.
Suppose there are n items and b bins, and each bin has size s. The ordering of constraints you have added actually simplifies the problem a great deal.
They mean specifically that we should always pick items 1, 2, ..., m for the largest m <= n that will fit in the allotted number of bins (since picking a smaller number would necessarily produce a worse solution by rule 1). Items will be packed in bins in this order, possibly with some bins left incompletely filled (since rearranging items within a bin or across bins would produce a worse solution by rule 2). There are 2 cases:
m < n, meaning that we can't fit all the items. In that case, all b bins will be tightly packed with the 1st m items in that order, and we are done.
m = n, in which case we can fit all the items. We now consider subcases of this case.
In this case, it may be possible that packing bins tightly will leave a final block of 0 < e <= b of the bins completely empty. In that case, discard those final e empty bins and proceed (since using more bins would produce a worse solution by rule 3). In any case, call the final number of bins remaining r. (r = b - e.)
We now know exactly which items and which bins we will be using. We also know the order in which the items must be packed. Because of the ordering constraint, we can regard the decisions about which bins are to be left incompletely filled as the problem of how to inject "start-next-bin" instructions into the ordered list 1, 2, ... n of items. We can inject up to r-1 of these instructions.
This problem can be solved in O(nrs) time using dynamic programming. Essentially we compute the function:
f(i, j, k) = the score of the best solution in which the first i items occupy the first j boxes, with exactly k items in the jth box.
The recurrence is:
f(i, j, 0) = max(f(i, j-1, k)) over all 0 <= k <= s
f(i, j, k > 0) = f(i-1, j, k-1) + q(i, j)
Where q(i, j) is the quality score of assigning item i to box j. (As I mentioned in the comments on your post, you need to decide on some way to assign scores for a placement of any item i into any box j, presumably based on how well i's preferences are met. If it's easier to work with "badness" values than quality values, just change the max() to a min() and the -infinity boundary values below to infinity.)
The first equation says that the best score of a solution for the first i items whose rightmost bin is empty is equal to the best score that can be found by considering every solution for the first i items without that bin. These candidate solutions consist of all the ways that the previous bin can be packed, including leaving it empty too.
The second equation says that the best score for the first i items whose rightmost bin is not empty is found simply by adding the quality score for placing the last item to the best score for placing the first i-1 items in the same number of bins.
The boundary conditions are:
f(0, 0, 0) = 0
f(i, 0, k) = -infinity for all other i and k
After calculating the values of f(i, j, k) for each 0 <= i <= n, 0 <= j <= r and 0 <= k <= s and storing them in a table, the largest of f(n, r, k) over 0 <= k <= s will give the optimal score of the final solution. Although this only gives the score of the maximum, the actual optimal solution(s) can be found by tracing back through the f(i, j, k) matrix from the end, at each k = 0 step looking for the predecessor state (i.e. the alternative under the max()) that must have led to the current state. (It may happen that several alternatives under the max() give equal scores, in which case multiple optimal solutions exist, and any of these paths can be followed to find just one of them.)
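Here is a sketch of that recurrence in C++ (q is a placeholder quality function, and n, r, s are assumed to have been fixed as described above; only the score is computed, not the traceback):

#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

// f[i][j][k] = best score placing the first i items into the first j bins,
// with exactly k items in bin j. n = items, r = bins used, s = bin capacity,
// q(i, j) = quality of placing item i into bin j (both 1-based).
double best_packing_score(int n, int r, int s,
                          const std::function<double(int, int)>& q)
{
    const double NEG_INF = -std::numeric_limits<double>::infinity();
    std::vector<std::vector<std::vector<double>>> f(
        n + 1, std::vector<std::vector<double>>(
                   r + 1, std::vector<double>(s + 1, NEG_INF)));
    f[0][0][0] = 0.0;

    for (int i = 0; i <= n; ++i)
        for (int j = 1; j <= r; ++j)
        {
            // Bin j still empty: best over every way of finishing bin j-1.
            double best = NEG_INF;
            for (int k = 0; k <= s; ++k)
                best = std::max(best, f[i][j-1][k]);
            f[i][j][0] = best;

            // Bin j holds k > 0 items, the last of which is item i.
            if (i >= 1)
                for (int k = 1; k <= s; ++k)
                    if (f[i-1][j][k-1] > NEG_INF)
                        f[i][j][k] = f[i-1][j][k-1] + q(i, j);
        }

    // Best score over all possible fill levels of the final bin.
    double answer = NEG_INF;
    for (int k = 0; k <= s; ++k)
        answer = std::max(answer, f[n][r][k]);
    return answer;
}

One way to use it: let q(i, j) return, say, minus the preference rank of bin j for item i, and a very large negative value when item i cannot be assigned to bin j at all; the traceback described above then recovers an actual assignment.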
This reminds me of the "Match" algorithm used to place medical school graduates in residency programs. What if you treat the items like students, their bin preferences like the rank lists, and the bins like hospitals?
Basically, you go through the list of items, and for each item, find the bin it prefers most. Check with the bin: do you have room for this item, and if not, do you prefer it more than any items you currently have?
If no, cross this bin off the item's list, and move to the item's next choice.
If yes, place this item in the bin, and put the displaced item (if any) back in the unmatched pool.
The difference between your problem and the residency match is that you wouldn't fix the bin's preferences up front. Instead you would use a rule that prefers items that bring the bin closest to 100% full.
My only concern is that this modification might make the algorithm unstable. But it's such a simple algorithm, it's probably worth trying.
This is a bipartite matching problem and can be solved in polynomial time.
http://en.wikipedia.org/wiki/Matching_(graph_theory)#Maximum_matchings_in_bipartite_graphs