Algorithm to keep collection sorted while inserting in the middle - algorithm

Let's say I have a large collection of elements.
Each element has a "position" field, which is a positive integer.
No two elements have the same value for the field "position".
The only supported operation of the collection is: addElement(newElement, positionAfterElement), where:
- newElement is the new element to be added (its position is unknown for now)
- positionAfterElement is an existing element of the collection.
The function will guarantee that:
- position(positionAfterElement) < position(newElement)
- no other element in the collection has a position between position(positionAfterElement) and position(newElement)
I can change the value of all the element positions as I wish but I want to minimize the number of changes (on average).
How should I implement the addElement function?
I could just push all the elements with higher positions by 1 but I am pretty sure there must be a better way to do this.
Thanks all for your help.

Use a balanced tree. At every node of the tree, keep a count of the number of items below it (left.count + right.count + 1). Then, you can compute the index of an item easily while traversing to it. This is O(n log n) time in the number of operations.

Here's a basic idea:
expected_number_of_elements = 10^6
spread_factor = 100
first element gets position = spread_factor * expected_number_of_element
each following element inserted:
if its inserted in last position, give it the last element's position + spread_factor
if its inserted in the first position, give it the first element's position - spread_factor
otherwise, put it in the middle between its 2 closest neighbors
if you don't have any space left: expand_the_array
expand_the_array:
spread_factor = spread_factor * 10
iterate over all the elements, and multiply position by 10.
expanding the array is an expensive operation, but since it multiplies the size of the array, on average (assuming your input is random, and not crafted by an adversary) you'll have to do this operation very rarely.
the major drawback of this solution, is that you'll have to watch out for int overflow....

OK, so here is what I implemented, in pseudo-code:
addElement(newElement, positionAfterElement):
p0 <- positionOf(positionAfterElement)
c <- 5
finished <- false
while (not finished):
search for all elements which positions are in the range [p0 + 1, p0 + c]
if there are less than c elements in that range: // the range is not full
adjust position of elements in the range, and position of newElement,
so that
- elements are correctly ordered
- element positions are "evenly spread out" within the range
finished <- true
else: // the range is full
c <- c * 2
end while
end addElement

Related

why Find-Minimum operation in priority queue implemented in unsorted array take only complexity = O(1) ? <steven skiena's the algorithm design manual>

In steven skiena's the algorithm design manual (page 85),
The author show in a table that priority queue implemented in unsorted array only take O(1) for both insertion and find minimum operation.
For my understanding unsorted array wasn't able get the minimum item in O(1) , because it has to search through the whole array to get the minimum.
is there any details i missed out in priority queue ?
It's (mostly) written there under the table:
The trick is using an extra variable to store a pointer/index to the minimum ...
Presumably, the next word is "value", meaning it's a simple O(1) dereference to get the minimum.
When inserting an item, you just append it to the end and, if it's less than the current minimum, update that pointer/index. That means O(1) for the insert.
The only "expensive" operation is then delete-minimum. You know where it is due to the pointer/index but it will take O(n) operations to shuffle the array elements beyond it down one.
And, since the cost is already O(n), you may as well take the opportunity to search the array for the new minimum and store its position in the pointer/index.
The pseudo-code for those operations be something along the lines of (first up, initialisation and insertion, and assuming zero-based indexes):
class prioQ:
array = [] # Empty queue.
lowIndex = 0 # Index of lowest value (for non-empty queue).
def insert(item):
# Add to end, quick calc if array empty beforehand.
array.append(item)
if len(array) == 1:
lowIndex = 0
return
# Adjust low-index only if inserted value smaller than current.
if array[lowIndex] > item:
lowIndex = len(array) - 1
Then a function to find the actual minimum value:
def findMin():
# Empty array means no minimum. Otherwise, return minimum.
if len(array) == 0: return None
return array[lowIndex]
And, finally, to extract the minimum value (remove it from the queue and return it):
def extractMin():
# Empty array means no minimum. Otherwise save lowest value.
if len(array) == 0: return None
retVal = array[lowIndex]
# Shuffle down all following elements to delete lowest one
for index = lowIndex to len(array) - 2 inclusive:
array[index] = array[index + 1]
# Remove final element (it's already been shuffled).
delete array[len(array) - 1]
# Find lowest element and store.
if len(array) > 0:
lowIndex = len(array) - 1
for index = len(array) - 2 to 0 inclusive:
if array[index] <= array[lowIndex]:
lowIndex = index
# Return saved value.
return retVal
As an aside, the two loops in the extractMin function could be combined in to one for efficiency. I've left it as two separate loops for readability.
One thing you should keep in mind, there are actually variations of the priority queue that preserve insertion order (within a priority level) and variations that do not care about that order.
For the latter case, you don't have to shuffle all the elements to remove an extracted one, you can simply move the last one in the array over the extracted one. This may result in some time savings if you don't actually need to preserve insertion order - you still have to scan the entire array looking for the new highest-priority item but at least the number of shuffle assignments will be reduced.
#paxdiablo's answer gives the scheme referred to in the book. Another way to achieve the same complexity is to always store the minimum at the first index in the array:
To insert x in O(1) time, either insert it at the end (if it is bigger than the current minimum), or copy the current minimum to the end and then store x at index 0.
To query the minimum in O(1) time, return the value at index 0.
To delete the minimum in O(n) time, search for the new minimum from index 1 onwards, write it at index 0, then "fill in the gap" by swapping the element at the last index to where the new minimum used to be.

Optimized solution for checking whether two-time interval arrays are overlapping or not?

Can you find whether two-time interval arrays are overlapping or not, in an optimized way?
Suppose input array A contains 10 elements, and each and every element have a start date and end date, And similarly, input array B contains 4 elements, and each and every element have a start data and end data. Now find whether A and B are overlapping or not?
Example 1:
Input:
A={[1,5],[7,10],[11,15]}; //Array A contains 3elements, and each element have start and end time.
B={[6,10],[1,5]};//Array B contains 2elements, and each element have start and end time.
Output: Yes // why because A and B are overlapping at [6,10] || [1,5]
Example 2:
Input:
A={[1,5],[8,10],[11,15]}; //Array A contains 3elements, and each element have start and end time.
B={[5,8],[15,16]};//Array B contains 2elements, and each element have start and end time.
Output: No // why because A and B are not-overlapping at [5,8] || [15,16]
I know we can solve this problem by using brute force, by iterating each element
in B and comparing with each element of A to check whether overlapping or not(A[i].start<=B[j].start and A[i].end>B[j].start), it'll take O(N*M) where N is length of array A and M is length of B.
Can you please optimize the solution.
You can sort the array according to the start times. You can then check if the end time of an element is greater than the start time of next element by iterating through both the arrays simultaneously(use two pointers). If it is the case, then you have found an overlap.
Here is what you can do
Build a segment tree using the values of array A. if the first interval id (l1, r1), 2nd is (l2, r2) and so on.
for every interval in A(li, ri) update the segment tree such that we update each element in the interval (li, ri) to 1. this can be done in O(logn) using lazy propagation
now for each interval in B(lj, rj) try to query the segment tree for this range. a query would return the sum of the range (lj, rj)
if sum is 0, then that range is non overlapping. else it is overlapping
overall complexity O(nlogn)

Judgecode -- Sort with swap (2)

The problem I've seen is as bellow, anyone has some idea on it?
http://judgecode.com/problems/1011
Given a permutation of integers from 0 to n - 1, sorting them is easy. But what if you can only swap a pair of integers every time?
Please calculate the minimal number of swaps
One classic algorithm seems to be permutation cycles (https://en.wikipedia.org/wiki/Cycle_notation#Cycle_notation). The number of swaps needed equals the total number of elements subtracted by the number of cycles.
For example:
1 2 3 4 5
2 5 4 3 1
Start with 1 and follow the cycle:
1 down to 2, 2 down to 5, 5 down to 1.
1 -> 2 -> 5 -> 1
3 -> 4 -> 3
We would need to swap index 1 with 5, then index 5 with 2; as well as index 3 with index 4. Altogether 3 swaps or n - 2. We subtract n by the number of cycles since cycle elements together total n and each cycle represents a swap less than the number of elements in it.
Here is a simple implementation in C for the above problem. The algorithm is similar to User גלעד ברקן:
Store the position of every element of a[] in b[]. So, b[a[i]] = i
Iterate over the initial array a[] from left to right.
At position i, check if a[i] is equal to i. If yes, then keep iterating.
If no, then it's time to swap. Look at the logic in the code minutely to see how the swapping takes place. This is the most important step as both array a[] and b[] needs to be modified. Increase the count of swaps.
Here is the implementation:
long long sortWithSwap(int n, int *a) {
int *b = (int*)malloc(sizeof(int)*n); //create a temporary array keeping track of the position of every element
int i,tmp,t,valai,posi;
for(i=0;i<n;i++){
b[a[i]] = i;
}
long long ans = 0;
for(i=0;i<n;i++){
if(a[i]!=i){
valai = a[i];
posi = b[i];
a[b[i]] = a[i];
a[i] = i;
b[i] = i;
b[valai] = posi;
ans++;
}
}
return ans;
}
The essence of solving this problem lies in the following observation
1. The elements in the array do not repeat
2. The range of elements is from 0 to n-1, where n is the size of the array.
The way to approach
After you have understood the way to approach the problem ou can solve it in linear time.
Imagine How would the array look like after sorting all the entries ?
It will look like arr[i] == i, for all entries . Is that convincing ?
First create a bool array named FIX, where FIX[i] == true if ith location is fixed, initialize this array with false initially
Start checking the original array for the match arr[i] == i, till the time this condition holds true, eveything is okay. While going ahead with traversal of array also update the FIX[i] = true. The moment you find that arr[i] != i you need to do something, arr[i] must have some value x such that x > i, how do we guarantee that ? The guarantee comes from the fact that the elements in the array do not repeat, therefore if the array is sorted till index i then it means that the element at position i in the array cannot come from left but from right.
Now the value x is essentially saying about some index , why so because the array only has elements till n-1 starting from 0, and in the sorted arry every element i of the array must be at location i.
what does arr[i] == x means is that , not only element i is not at it's correct position but also the element x is missing from it's place.
Now to fix ith location you need to look at xth location, because maybe xth location holds i and then you will swap the elements at indices i and x, and get the job done. But wait, it's not necessary that the index x will hold i (and you finish fixing these locations in just 1 swap). Rather it may be possible that index x holds value y, which again will be greater than i, because array is only sorted till location i.
Now before you can fix position i , you need to fix x, why ? we will see later.
So now again you try to fix position x, and then similarly you will try fixing till the time you don't see element i at some location in the fashion told .
The fashion is to follow the link from arr[i], untill you hit element i at some index.
It is guaranteed that you will definitely hit i at some location while following in this way . Why ? try proving it, make some examples, and you will feel it
Now you will start fixing all the index you saw in the path following from index i till this index (say it j). Now what you see is that the path which you have followed is a circular one and for every index i, the arr[i] is tored at it's previous index (index from where you reached here), and Once you see that you can fix the indices, and mark all of them in FIX array to be true. Now go ahead with next index of array and do the same thing untill whole array is fixed..
This was the complete idea, but to only conunt no. of swaps, you se that once you have found a cycle of n elements you need n swaps, and after doing that you fix the array , and again continue. So that's how you will count the no. of swaps.
Please let me know if you have some doubts in the approach .
You may also ask for C/C++ code help. Happy to help :-)

In random draw: how to insure that a value is not re-drawn too soon

When drawing in random from a set of values in succession, where a drawn value is allowed to
be drawn again, a given value has (of course) a small chance of being drawn twice (or more) in immediate succession, but that causes an issue (for the purposes of a given application) and we would like to eliminate this chance. Any algorithmic ideas on how to do so (simple/efficient)?
Ideally we would like to set a threshold say as a percentage of the size of the data set:
Say the size of the set of values N=100, and the threshold T=10%, then if a given value is drawn in the current draw, it is guaranteed not to show up again in the next N*T=10 draws.
Obviously this restriction introduces bias in the random selection. We don't mind that a
proposed algorithm introduces further bias into the randomness of selection, what really
matters for this application is that the selection is just random enough to appear so
for a human observer.
As an implementation detail, the values are stored as database records, so database table flags/values can be used, or maybe external memory structures. Answers about the abstract case are welcome too.
Edit:
I just hit this other SO question here, which has good overlap with my own. Going through the good points there.
Here's an implementation that does the whole process in O(1) (for a single element) without any bias:
The idea is to treat the last K elements in the array A (which contains all the values) like a queue, we draw a value from the first N-k values in A, which is the random value, and swap it with an element in position N-Pointer, when Pointer represents the head of the queue, and it resets to 1 when it crosses K elements.
To eliminate any bias in the first K draws, the random value will be drawn between 1 and N-Pointer instead of N-k, so this virtual queue is growing in size at each draw until reaching the size of K (e.g. after 3 draws the number of possible values appear in A between indexes 1 and N-3, and the suspended values appear in indexes N-2 to N.
All operations are O(1) for drawing a single elemnent and there's no bias throughout the entire process.
void DrawNumbers(val[] A, int K)
{
N = A.size;
random Rnd = new random;
int Drawn_Index;
int Count_To_K = 1;
int Pointer = K;
while (stop_drawing_condition)
{
if (Count_To_K <= K)
{
Drawn_Index = Rnd.NextInteger(1, N-Pointer);
Count_To_K++;
}
else
{
Drawn_Index = Rnd.NextInteger(1, N-K)
}
Print("drawn value is: " + A[Drawn_Index])
Swap(A[Drawn_Index], A[N-Pointer])
Pointer--;
if (Pointer < 1) Pointer = K;
}
}
My previous suggestion, by using a list and an actual queue, is dependent on the remove method of the list, which I believe can be at best O(logN) by using an array to implement a self balancing binary tree, as the list has to have direct access to indexes.
void DrawNumbers(list N, int K)
{
queue Suspended_Values = new queue;
random Rnd = new random;
int Drawn_Index;
while (stop_drawing_condition)
{
if (Suspended_Values.count == K)
N.add(Suspended_Value.Dequeue());
Drawn_Index = Rnd.NextInteger(1, N.size) // random integer between 1 and the number of values in N
Print("drawn value is: " + N[Drawn_Index]);
Suspended_Values.Enqueue(N[Drawn_Index]);
N.Remove(Drawn_Index);
}
}
I assume you have an array, A, that contains the items you want to draw. At each time period you randomly select an item from A.
You want to prevent any given item, i, from being drawn again within some k iterations.
Let's say that your threshold is 10% of A.
So create a queue, call it drawn, that can hold threshold items. Also create a hash table that contains the drawn items. Call the hash table hash.
Then:
do
{
i = Get random item from A
if (i in hash)
{
// we have drawn this item recently. Don't draw it.
continue;
}
draw(i);
if (drawn.count == k)
{
// remove oldest item from queue
temp = drawn.dequeue();
// and from the hash table
hash.remove(temp);
}
// add new item to queue and hash table
drawn.enqueue(i);
hash.add(i);
} while (forever);
The hash table exists solely to increase lookup speed. You could do without the hash table if you're willing to do a sequential search of the queue to determine if an item has been drawn recently.
Say you have n items in your list, and you don't want any of the k last items to be selected.
Select at random from an array of size n-k, and use a queue of size k to stick the items you don't want to draw (adding to the front and removing from the back).
All operations are O(1).
---- clarification ----
Give n items, and a goal of not redrawing any of the last k draws, create an array and queue as follows.
Create an array A of size n-k, and put n-k of your items in the list (chosen at random, or seeded however you like).
Create a queue (linked list) Q and populate it with the remaining k items, again in random order or whatever order you like.
Now, each time you want to select an item at random:
Choose a random index from your array, call this i.
Give A[i] to whomever is asking for it, and add it to the front of Q.
Remove the element from the back of Q, and store it in A[i].
Everything is O(1) after the array and linked list are created, which is a one-time O(n) operation.
Now, you might wonder, what do we do if we want to change n (i.e. add or remove an element).
Each time we add an element, we either want to grow the size of A or of Q, depending on our logic for deciding what k is (i.e. fixed value, fixed fraction of n, whatever...).
If Q increases then the result is trivial, we just append the new element to Q. In this case I'd probably append it to the end of Q so that it gets in play ASAP. You could also put it in A, kicking some element out of A and appending it to the end of Q.
If A increases, you can use a standard technique for increasing arrays in amortized constant time. E.g., each time A fills up, we double it in size, and keep track of the number of cells of A that are live. (look up 'Dynamic Arrays' in Wikipedia if this is unfamiliar).
Set-based approach:
If the threshold is low (say below 40%), the suggested approach is:
Have a set and a queue of the last N*T generated values.
When generating a value, keep regenerating it until it's not contained in the set.
When pushing to the queue, pop the oldest value and remove it from the set.
Pseudo-code:
generateNextValue:
// once we're generated more than N*T elements,
// we need to start removing old elements
if queue.size >= N*T
element = queue.pop
set.remove(element)
// keep trying to generate random values until it's not contained in the set
do
value = getRandomValue()
while set.contains(value)
set.add(value)
queue.push(value)
return value
If the threshold is high, you can just turn the above on its head:
Have the set represent all values not in the last N*T generated values.
Invert all set operations (replace all set adds with removes and vice versa and replace the contains with !contains).
Pseudo-code:
generateNextValue:
if queue.size >= N*T
element = queue.pop
set.add(element)
// we can now just get a random value from the set, as it contains all candidates,
// rather than generating random values until we find one that works
value = getRandomValueFromSet()
//do
// value = getRandomValue()
//while !set.contains(value)
set.remove(value)
queue.push(value)
return value
Shuffled-based approach: (somewhat more complicated that the above)
If the threshold is a high, the above may take long, as it could keep generating values that already exists.
In this case, some shuffle-based approach may be a better idea.
Shuffle the data.
Repeatedly process the first element.
When doing so, remove it and insert it back at a random position in the range [N*T, N].
Example:
Let's say N*T = 5 and all possible values are [1,2,3,4,5,6,7,8,9,10].
Then we first shuffle, giving us, let's say, [4,3,8,9,2,6,7,1,10,5].
Then we remove 4 and insert it back in some index in the range [5,10] (say at index 5).
Then we have [3,8,9,2,4,6,7,1,10,5].
And continue removing the next element and insert it back, as required.
Implementation:
An array is fine if we don't care about efficient a whole lot - to get one element will cost O(n) time.
To make this efficient we need to use an ordered data structure that supports efficient random position inserts and first position removals. The first thing that comes to mind is a (self-balancing) binary search tree, ordered by index.
We won't be storing the actual index, the index will be implicitly defined by the structure of the tree.
At each node we will have a count of children (+ 1 for itself) (which needs to be updated on insert / remove).
An insert can be done as follows: (ignoring the self-balancing part for the moment)
// calling function
insert(node, value)
insert(node, N*T, value)
insert(node, offset, value)
// node.left / node.right can be defined as 0 if the child doesn't exist
leftCount = node.left.count - offset
rightCount = node.right.count
// Since we're here, it means we're inserting in this subtree,
// thus update the count
node.count++
// Nodes to the left are within N*T, so simply go right
// leftCount is the difference between N*T and the number of nodes on the left,
// so this needs to be the new offset (and +1 for the current node)
if leftCount < 0
insert(node.right, -leftCount+1, value)
else
// generate a random number,
// on [0, leftCount), insert to the left
// on [leftCount, leftCount], insert at the current node
// on (leftCount, leftCount + rightCount], insert to the right
sum = leftCount + rightCount + 1
random = getRandomNumberInRange(0, sum)
if random < leftCount
insert(node.left, offset, value)
else if random == leftCount
// we don't actually want to update the count here
node.count--
newNode = new Node(value)
newNode.count = node.count + 1
// TODO: swap node and newNode's data so that node's parent will now point to newNode
newNode.right = node
newNode.left = null
else
insert(node.right, -leftCount+1, value)
To visualize inserting at the current node:
If we have something like:
4
/
1
/ \
2 3
And we want to insert 5 where 1 is now, it will do this:
4
/
5
\
1
/ \
2 3
Note that when a red-black tree, for example, performs operations to keep itself balanced, none of these involve comparisons, so it doesn't need to know the order (i.e. index) of any already-inserted elements. But it will have to update the counts appropriately.
The overall efficiency will be O(log n) to get one element.
I'd put all "values" into a "list" of size N, then shuffle the list and retrieve values from the top of the list. Then you "insert" the retrieved value at a random position with any index >= N*T.
Unfortunately I'm not truly a math-guy :( So I simply tried it (in VB, so please take it as pseudocode ;) )
Public Class BiasedRandom
Private prng As New Random
Private offset As Integer
Private l As New List(Of Integer)
Public Sub New(ByVal size As Integer, ByVal threshold As Double)
If threshold <= 0 OrElse threshold >= 1 OrElse size < 1 Then Throw New System.ArgumentException("Check your params!")
offset = size * threshold
' initial fill
For i = 0 To size - 1
l.Add(i)
Next
' shuffle "Algorithm p"
For i = size - 1 To 1 Step -1
Dim j = prng.Next(0, i + 1)
Dim tmp = l(i)
l(i) = l(j)
l(j) = tmp
Next
End Sub
Public Function NextValue() As Integer
Dim tmp = l(0)
l.RemoveAt(0)
l.Insert(prng.Next(offset, l.Count + 1), tmp)
Return tmp
End Function
End Class
Then a simple check:
Public Class Form1
Dim z As Integer = 10
Dim k As BiasedRandom
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
k = New BiasedRandom(z, 0.5)
End Sub
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim j(z - 1)
For i = 1 To 10 * 1000 * 1000
j(k.NextValue) += 1
Next
Stop
End Sub
End Class
And when I check out the distribution it looks okay enough for an unarmed eye ;)
EDIT:
After thinking about RonTeller's argumentation, I have to admit that he is right. I don't think that there is a performance friendly way to achieve the wanted and to pertain a good (not more biased than required) random order.
I came to the follwoing idea:
Given a list (array whatever) like this:
0123456789 ' not shuffled to make clear what I mean
We return the first element which is 0. This one must not come up again for 4 (as an example) more draws but we also want to avoid a strong bias. Why not simply put it to the end of the list and then shuffle the "tail" of the list, i.e. the last 6 elements?
1234695807
We now return the 1 and repeat the above steps.
2340519786
And so on and so on. Since removing and inserting is kind of unnecessary work, one could use a simple array and a "pointer" to the actual element. I have changed the code from above to give a sample. It's slower than the first one, but should avoid the mentioned bias.
Public Function NextValue() As Integer
Static current As Integer = 0
' only shuffling a part of the list
For i = current + l.Count - 1 To current + 1 + offset Step -1
Dim j = prng.Next(current + offset, i + 1)
Dim tmp = l(i Mod l.Count)
l(i Mod l.Count) = l(j Mod l.Count)
l(j Mod l.Count) = tmp
Next
current += 1
Return l((current - 1) Mod l.Count)
End Function
EDIT 2:
Finally (hopefully), I think the solution is quite simple. The below code assumes that there is an array of N elements called TheArray which contains the elements in random order (could be rewritten to work with sorted array). The value DelaySize determines how long a value should be suspended after it has been drawn.
Public Function NextValue() As Integer
Static current As Integer = 0
Dim SelectIndex As Integer = prng.Next(0, TheArray.Count - DelaySize)
Dim ReturnValue = TheArray(SelectIndex)
TheArray(SelectIndex) = TheArray(TheArray.Count - 1 - current Mod DelaySize)
TheArray(TheArray.Count - 1 - current Mod DelaySize) = ReturnValue
current += 1
Return ReturnValue
End Function

Finding the best pair of elements that don't exceed a certain weight?

I have a collection of objects, each of which has a weight and a value. I want to pick the pair of objects with the highest total value subject to the restriction that their combined weight does not exceed some threshold. Additionally, I am given two arrays, one containing the objects sorted by weight and one containing the objects sorted by value.
I know how to do it in O(n2) but how can I do it in O(n)?
This is a combinatorial optimization problem, and the fact the values are sorted means you can easily try a branch and bound approach.
I think that I have a solution that works in O(n log n) time and O(n) extra space. This isn't quite the O(n) solution you wanted, but it's still better than the naive quadratic solution.
The intuition behind the algorithm is that we want to be able to efficiently determine, for any amount of weight, the maximum value we can get with a single item that uses at most that much weight. If we can do this, we have a simple algorithm for solving the problem: iterate across the array of elements sorted by value. For each element, see how much additional value we could get by pairing a single element with it (using the values we precomputed), then find which of these pairs is maximum. If we can do the preprocessing in O(n log n) time and can answer each of the above queries in O(log n) time, then the total time for the second step will be O(n log n) and we have our answer.
An important observation we need to do the preprocessing step is as follows. Our goal is to build up a structure that can answer the question "which element with weight less than x has maximum value?" Let's think about how we might do this by adding one element at a time. If we have an element (value, weight) and the structure is empty, then we want to say that the maximum value we can get using weight at most "weight" is "value". This means that everything in the range [0, max_weight - weight) should be set to value. Otherwise, suppose that the structure isn't empty when we try adding in (value, weight). In that case, we want to say that any portion of the range [0, weight) whose value is less than value should be replaced by value.
The problem here is that when we do these insertions, there might be, on iteration k, O(k) different subranges that need to be updated, leading to an O(n2) algorithm. However, we can use a very clever trick to avoid this. Suppose that we insert all of the elements into this data structure in descending order of value. In that case, when we add in (value, weight), because we add the elements in descending order of value, each existing value in the data structure must be higher than our value. This means that if the range [0, weight) intersects any range at all, those ranges will automatically be higher than value and so we don't need to update them. If we combine this with the fact that each range we add always spans from zero to some value, the only portion of the new range that could ever be added to the data structure is the range [weight, x), where x is the highest weight stored in the data structure so far.
To summarize, assuming that we visit the (value, weight) pairs in descending order of value, we can update our data structure as follows:
If the structure is empty, record that the range [0, value) has value "value."
Otherwise, if the highest weight recorded in the structure is greater than weight, skip this element.
Otherwise, if the highest weight recorded so far is x, record that the range [weight, x) has value "value."
Notice that this means that we are always splitting ranges at the front of the list of ranges we have encountered so far. Because of this, we can think about storing the list of ranges as a simple array, where each array element tracks the upper endpoint of some range and the value assigned to that range. For example, we might track the ranges [0, 3), [3, 9), and [9, 12) as the array
3, 9, 12
If we then needed to split the range [0, 3) into [0, 1) and [1, 3), we could do so by prepending 1 to he list:
1, 3, 9, 12
If we represent this array in reverse (actually storing the ranges from high to low instead of low to high), this step of creating the array runs in O(n) time because at each point we just do O(1) work to decide whether or not to add another element onto the end of the array.
Once we have the ranges stored like this, to determine which of the ranges a particular weight falls into, we can just use a binary search to find the largest element smaller than that weight. For example, to look up 6 in the above array we'd do a binary search to find 3.
Finally, once we have this data structure built up, we can just look at each of the objects one at a time. For each element, we see how much weight is left, use a binary search in the other structure to see what element it should be paired with to maximize the total value, and then find the maximum attainable value.
Let's trace through an example. Given maximum allowable weight 10 and the objects
Weight | Value
------+------
2 | 3
6 | 5
4 | 7
7 | 8
Let's see what the algorithm does. First, we need to build up our auxiliary structure for the ranges. We look at the objects in descending order of value, starting with the object of weight 7 and value 8. This means that if we ever have at least seven units of weight left, we can get 8 value. Our array now looks like this:
Weight: 7
Value: 8
Next, we look at the object of weight 4 and value 7. This means that with four or more units of weight left, we can get value 7:
Weight: 7 4
Value: 8 7
Repeating this for the next item (weight six, value five) does not change the array, since if the object has weight six, if we ever had six or more units of free space left, we would never choose this; we'd always take the seven-value item of weight four. We can tell this since there is already an object in the table whose range includes remaining weight four.
Finally, we look at the last item (value 3, weight 2). This means that if we ever have weight two or more free, we could get 3 units of value. The final array now looks like this:
Weight: 7 4 2
Value: 8 7 3
Finally, we just look at the objects in any order to see what the best option is. When looking at the object of weight 2 and value 3, since the maximum allowed weight is 10, we need tom see how much value we can get with at most 10 - 2 = 8 weight. A binary search over the array tells us that this value is 8, so one option would give us 11 weight. If we look at the object of weight 6 and value 5, a binary search tells us that with five remaining weight the best we can do would be to get 7 units of value, for a total of 12 value. Repeating this on the next two entries doesn't turn up anything new, so the optimum value found has value 12, which is indeed the correct answer.
Hope this helps!
Here is an O(n) time, O(1) space solution.
Let's call an object x better than an object y if and only if (x is no heavier than y) and (x is no less valuable) and (x is lighter or more valuable). Call an object x first-choice if no object is better than x. There exists an optimal solution consisting either of two first-choice objects, or a first-choice object x and an object y such that only x is better than y.
The main tool is to be able to iterate the first-choice objects from lightest to heaviest (= least valuable to most valuable) and from most valuable to least valuable (= heaviest to lightest). The iterator state is an index into the objects by weight (resp. value) and a max value (resp. min weight) so far.
Each of the following steps is O(n).
During a scan, whenever we encounter an object that is not first-choice, we know an object that's better than it. Scan once and consider these pairs of objects.
For each first-choice object from lightest to heaviest, determine the heaviest first-choice object that it can be paired with, and consider the pair. (All lighter objects are less valuable.) Since the latter object becomes lighter over time, each iteration of the loop is amortized O(1). (See also searching in a matrix whose rows and columns are sorted.)
Code for the unbelievers. Not heavily tested.
from collections import namedtuple
from operator import attrgetter
Item = namedtuple('Item', ('weight', 'value'))
sentinel = Item(float('inf'), float('-inf'))
def firstchoicefrombyweight(byweight):
bestsofar = sentinel
for x in byweight:
if x.value > bestsofar.value:
bestsofar = x
yield (x, bestsofar)
def firstchoicefrombyvalue(byvalue):
bestsofar = sentinel
for x in byvalue:
if x.weight < bestsofar.weight:
bestsofar = x
yield x
def optimize(items, maxweight):
byweight = sorted(items, key=attrgetter('weight'))
byvalue = sorted(items, key=attrgetter('value'), reverse=True)
maxvalue = float('-inf')
try:
i = firstchoicefrombyvalue(byvalue)
y = i.next()
for x, z in firstchoicefrombyweight(byweight):
if z is not x and x.weight + z.weight <= maxweight:
maxvalue = max(maxvalue, x.value + z.value)
while x.weight + y.weight > maxweight:
y = i.next()
if y is x:
break
maxvalue = max(maxvalue, x.value + y.value)
except StopIteration:
pass
return maxvalue
items = [Item(1, 1), Item(2, 2), Item(3, 5), Item(3, 7), Item(5, 8)]
for maxweight in xrange(3, 10):
print maxweight, optimize(items, maxweight)
This is similar to Knapsack problem. I will use naming from it (num - weight, val - value).
The essential part:
Start with a = 0 and b = n-1. Assuming 0 is the index of heaviest object and n-1 is the index of lightest object.
Increase a til objects a and b satisfy the limit.
Compare current solution with best solution.
Decrease b by one.
Go to 2.
Update:
It's the knapsack problem, except there is a limit of 2 items. You basically need to decide how much space you want for the first object and how much for the other. There is n significant ways to split available space, so the complexity is O(n). Picking the most valuable objects to fit in those spaces can be done without additional cost.

Resources