Data structure for dynamically changing n-length sequence with longest subsequence length query - algorithm

I need to design a data structure for holding n-length sequences, with the following methods:
increasing() - returns length of the longest increasing sub-sequence
change(i, x) - adds x to i-th element of the sequence
Intuitively, this sounds like something solvable with some kind of interval tree. But I have no idea how to think of that.
I'm wondering how to use the fact that we don't need to know what this sub-sequence looks like at all; we only need its length...
Maybe this is something that can be used, but I'm pretty much stuck at this point.

This solves the problem only for contiguous intervals. It doesn't solve arbitrary subsequences. :-(
It is possible to implement this with time O(1) for interval and O(log(n)) for change.
First of all we'll need a heap for all of the current intervals, with the largest on top. Finding the longest interval is just a question of looking on the top of the heap.
Next we need a bunch of information for each of our n slots.
value: Current value in this slot
interval_start: Where the interval containing this point starts
interval_end: Where the interval containing this point ends
heap_index: Where to find this interval in the heap NOTE: Heap operations MUST maintain this!
And now the clever trick! We always store the value for each slot. But we only store the interval information for an interval at the point in the interval whose index is divisible by the highest power of 2. There is always only one such point for any interval, so storing/modifying this is very little work.
Then to figure out what interval a given position in the array currently falls in, we have to look at all of the neighbors that are increasing powers of 2 until we find the last one with our value. So, for instance, position 13's information might be found in any of the positions 0, 8, 12, 13, 14, 16, 32, 64, .... (And we'll take the first interval we find it in in the list 0, ..., 64, 32, 16, 8, 12, 14, 13.) This is a search of a O(log(n)) list so is O(log(n)) work.
Now how do we implement change?
Update value.
Figure out what interval we were in, and whether we were at an interval boundary.
If intervals got changed, remove the old ones from the heap. (We may remove 0, 1 or 2)
If intervals got changed, insert the new ones into the heap. (We may insert 0, 1, or 2)
That update is very complex, but it is a fixed number of O(log(n)) operations and so should be O(log(n)).

I'll try to explain my idea. It can be a bit simpler than implementing an interval tree, and should give the desired complexity: O(1) for increasing() and O(log S) for change(), where S is the number of sequences (which is N in the worst case, of course).
First, you need the original array. It is needed to check the borders of intervals (I will use the word interval as a synonym for sequence) after change(). Let it be A.
Second, you need a doubly linked list of intervals. Each element of this list should store the left and right borders. Every increasing sequence should be a separate element of this list, and the intervals should follow one another in the order they appear in A. Let this list be L. We need to hold pointers to its elements, and I don't know whether that is possible with iterators of a standard container.
Third, you need a priority queue that stores the lengths of all intervals in your array, so that increasing() can be done in O(1) time. It also needs to store a pointer to the node in L, to look up intervals. Let this priority queue be PQ. More formally, the priority queue contains pairs (length of interval, pointer to list node), compared only by length.
Fourth, you need a tree that can retrieve the interval borders (or range) for a particular element. It can be implemented simply with std::map, where the key is the left border of the interval, so with the help of map::upper_bound (stepping back one element) you can find the interval containing an index. The value should store a pointer to the interval in L. Let this map be MP.
The next important thing: list nodes should store the indices of the corresponding elements in the priority queue. And you shouldn't modify the priority queue without keeping the links to the nodes of L up to date (on every swap operation in PQ you should update the corresponding indices in L).
The change(i, x) operation can look like this:
Find the interval containing i using the map. This gives you a pointer to the corresponding node in L, so you know the borders and length of the interval.
Work out which action is needed: nothing, split an interval, or glue intervals together.
Perform this action on the list and the map, keeping PQ in sync. If you need to split an interval, remove it from PQ (this is not a remove-max operation) and then add 2 new elements to PQ. Similarly, if you need to glue intervals, you can remove one from PQ and do increase-key on the second.
One difficulty is that PQ must support removing an arbitrary element (by index), so you can't use std::priority_queue, but I think it is not difficult to implement.
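For comparison, here is a rough sketch of a different technique than the heap/list bookkeeping described above: a segment tree in which every node remembers the best run, the run touching its left edge, and the run touching its right edge. Like the two answers above it only handles contiguous increasing runs, not arbitrary subsequences. The class name RunTree and its internals are my own, purely for illustration.

```python
class RunTree:
    """Longest strictly increasing contiguous run under point updates."""

    def __init__(self, values):
        self.a = list(values)
        self.n = len(self.a)
        # t[node] = (best, prefix, suffix) run lengths for the node's range
        self.t = [None] * (4 * self.n)
        if self.n:
            self._build(1, 0, self.n - 1)

    def _merge(self, left, right, lo, mid, hi):
        lbest, lpre, lsuf = left
        rbest, rpre, rsuf = right
        # Do the two halves join into one run across the boundary?
        joined = self.a[mid] < self.a[mid + 1]
        best = max(lbest, rbest, lsuf + rpre if joined else 0)
        pre = lpre + rpre if joined and lpre == mid - lo + 1 else lpre
        suf = rsuf + lsuf if joined and rsuf == hi - mid else rsuf
        return (best, pre, suf)

    def _build(self, node, lo, hi):
        if lo == hi:
            self.t[node] = (1, 1, 1)
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.t[node] = self._merge(self.t[2 * node], self.t[2 * node + 1], lo, mid, hi)

    def _update(self, node, lo, hi, i):
        if lo == hi:
            return
        mid = (lo + hi) // 2
        if i <= mid:
            self._update(2 * node, lo, mid, i)
        else:
            self._update(2 * node + 1, mid + 1, hi, i)
        self.t[node] = self._merge(self.t[2 * node], self.t[2 * node + 1], lo, mid, hi)

    def change(self, i, x):
        """Add x to the i-th element, O(log n)."""
        self.a[i] += x
        self._update(1, 0, self.n - 1, i)

    def increasing(self):
        """Length of the longest increasing contiguous run, O(1)."""
        return self.t[1][0] if self.n else 0
```

Every boundary that a change at index i can affect belongs to a node on the path from the root to leaf i, which is why re-merging along that single path should be enough.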

LIS can be solved with a tree, but there is another implementation using dynamic programming, which is faster than the recursive tree approach.
This is a simple implementation in C++.
#include <vector>
using std::vector;

class LIS {
    vector<int> seq;
public:
    LIS(const vector<int>& _seq) : seq(_seq) {}
    // O(n^2) dynamic programming: lengths[i] is the length of the
    // longest increasing subsequence that ends at index i.
    int increasing() const {
        vector<int> lengths(seq.size(), 1);
        for (size_t i = 1; i < seq.size(); i++) {
            for (size_t j = 0; j < i; j++) {
                if (seq[i] > seq[j] && lengths[i] < lengths[j] + 1) {
                    lengths[i] = lengths[j] + 1;
                }
            }
        }
        int mxx = 0;
        for (size_t i = 0; i < seq.size(); i++)
            mxx = mxx < lengths[i] ? lengths[i] : mxx;
        return mxx;
    }
    // O(1): adds x to the i-th element.
    void change(int i, int x) {
        seq[i] += x;
    }
};
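If the quadratic increasing() above turns out to be too slow, the usual "tails" variant with binary search brings a single query down to O(n log n). This is only a sketch in Python rather than the C++ of the answer, and the function name lis_length is my own.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for v in seq:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)   # v extends the longest subsequence found so far
        else:
            tails[k] = v      # v is a smaller tail for length k + 1
    return len(tails)
```

It still recomputes everything on each query, so change() stays O(1) and increasing() pays the full O(n log n).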

Related

To get the total length of overlapped intervals using Segment tree with lazy propagation

So here's the problem: given a set of intervals [a,b) where a and b are integers with 0 <= a < b <= 10^5, what is the total length covered by the intervals (overlapping parts are only counted once)? We want to support two operations, add(a,b) and remove(a,b) (for remove, [a,b) is guaranteed to exist in the set), which add interval [a,b) to or remove it from the set and then return the new total length. Can both be done in O(log n), where n is 10^5? E.g., if we have the intervals [1,3) and [2,4), then the total length is 3.
My approach is to use a segment tree whose nodes are
class Segnode:
    def __init__(self,start,end):
        self.start,self.end = start,end
        self.left,self.right = None,None
        self.layer,self.cover = 0,0
        self.lazy = 0
where layer records how many intervals cover this node (e.g., if we have the set [0,3), [1,3), [1,2), then Segnode(1,3) will have layer 2, because only [0,3) and [1,3) cover it completely), and cover records the length of the part of this node that is covered (e.g., Segnode(1,3) in the last example will have cover 2).
But the thing is, I couldn't come up with a correct, LAZY way to update the tree (if we don't use lazy propagation then the problem is trivial, but the time complexity could reach O(n) per operation, where n could be 10^5). Can someone help me with this part? Is there a correct, lazy approach to this?
Thank you very much.
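One approach that I believe works here, sketched below, needs no push-down at all: each node keeps a count of intervals that cover it completely, plus its covered length, and the covered length is recomputed from the children on the way back up. It relies on the stated guarantee that remove(a,b) is only called for an interval that was added before, so a removal touches exactly the nodes the matching add touched. The class name CoverTree and the array layout are my own choices, not taken from the question.

```python
class CoverTree:
    """Total covered length of a multiset of intervals [a, b) within [0, size)."""

    def __init__(self, size):
        self.size = size
        self.count = [0] * (4 * size)     # intervals fully covering the node
        self.covered = [0] * (4 * size)   # covered length inside the node

    def _update(self, node, lo, hi, a, b, delta):
        # The node represents [lo, hi) and is known to intersect [a, b).
        if a <= lo and hi <= b:
            self.count[node] += delta
        else:
            mid = (lo + hi) // 2
            if a < mid:
                self._update(2 * node, lo, mid, a, b, delta)
            if b > mid:
                self._update(2 * node + 1, mid, hi, a, b, delta)
        # Recompute the covered length from local information only.
        if self.count[node] > 0:
            self.covered[node] = hi - lo
        elif hi - lo == 1:
            self.covered[node] = 0
        else:
            self.covered[node] = self.covered[2 * node] + self.covered[2 * node + 1]

    def add(self, a, b):
        self._update(1, 0, self.size, a, b, 1)
        return self.covered[1]

    def remove(self, a, b):
        # Assumes [a, b) was added earlier and not yet removed.
        self._update(1, 0, self.size, a, b, -1)
        return self.covered[1]
```

With the example from the question, add(1, 3) followed by add(2, 4) reports a total length of 3.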

What happens if we run build-max-heap in a top-down manner?

What are the disadvantages of building the heap in a top-down manner? Please include a brief time complexity calculation. In short: what if we use the first build-max-heap algorithm below rather than the commonly used second one?
Build-max-heap(A)
{
    A.heap-size = A.length
    for (i = 1 to [A.length]/2)
        max-heapify(A, i)
}
Build-max-heap(A)
{
    A.heap-size = A.length
    for (i = [A.length]/2 downto 1)
        max-heapify(A, i)
}
As written, your first example won't do anything because i is less than [A.length/2]. I suspect you meant your first example to be:
for (i=1 to [A.length]/2)
Assuming that's what you meant, doing the min-heapify from the top, down will not result in a valid heap. Consider the original array [4,3,2,1], which represents this tree:
    4
  3   2
1
On the first iteration, you want to move 4, down. So you swap it with the smallest child and get the array [2,3,4,1].
Next, you want to filter 3. So you swap it with its smallest child and get [2,1,4,3]. You're done now, and your "heap" looks like this:
    2
  1   4
3
Which is not a valid heap.
When you go from the middle, up, then the smallest item can filter its way to the top. But when you go from the top down, it's possible for the smallest item never to reach the top.
A max or min heap is an implementation of a nested max or min function, e.g. max(max(max(a, b), max(c, d)), ...). It is a kind of expression tree for the min() or max() of all array elements; that is, you are implementing max(a, b, c, ...) or min(a, b, c, ...). To yield the correct result you need to gather the min or max elements to compare. To do that you need to do a broad comparison of the bottom elements, and then, going up, the number of elements you need to compare is divided by 2 per level (one half is eliminated per level). Going from top to bottom will not yield the correct result; you are implementing the wrong expression.
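The [4,3,2,1] counterexample above is easy to reproduce. The sketch below uses a min-heap with 0-based indices, like the first answer; the helper names are my own.

```python
def sift_down(a, i):
    """Move a[i] down until neither child is smaller (min-heap, 0-based)."""
    n = len(a)
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] < a[smallest]:
                smallest = child
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def build_top_down(values):
    a = list(values)
    for i in range(len(a) // 2):             # root first
        sift_down(a, i)
    return a


def build_bottom_up(values):
    a = list(values)
    for i in reversed(range(len(a) // 2)):   # last internal node first
        sift_down(a, i)
    return a


def is_min_heap(a):
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


print(build_top_down([4, 3, 2, 1]), is_min_heap(build_top_down([4, 3, 2, 1])))
print(build_bottom_up([4, 3, 2, 1]), is_min_heap(build_bottom_up([4, 3, 2, 1])))
```

The top-down pass leaves 1 stuck below 2, while the bottom-up pass lets it reach the root.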

How to "sort" elements of 2 possible values in place in linear time? [duplicate]

This question already has answers here:
Stable separation for two classes of elements in an array
(3 answers)
Closed 9 years ago.
Suppose I have a function f and array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
Few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
On each turn, if you make a swap you move both; if not, you just move a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
It may be possible to do it with full stability, but I don't see how.
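A possible Python rendering of the semi-stable idea, as a sketch only. The predicate is_a stands in for the function f from the question, and the function name is my own. It starts a just past the first B, which skips the iterations where no swap would happen.

```python
def partition_keep_a_stable(items, is_a):
    """In-place, O(n): all A elements first, in their original relative order.

    The B elements may end up reordered.
    """
    n = len(items)
    b = 0
    while b < n and is_a(items[b]):       # b = first B
        b += 1
    a = b + 1
    while True:
        while a < n and not is_a(items[a]):   # a = next A to the right of b
            a += 1
        if a >= n:
            break
        items[a], items[b] = items[b], items[a]
        # Everything between b and a was B, so the next B is right after b.
        b += 1
        a += 1
    return items
```

For example, partition_keep_a_stable(list("ABBAABABAA"), lambda c: c == "A") rearranges the sample string in place.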
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort.
1. Have 2 iterators, one starting on the left, one starting on the right.
2. While there's a B at the right iterator, decrement the iterator.
3. While there's an A at the left iterator, increment the iterator.
4. If the iterators haven't crossed each other, swap their elements and repeat from 2.
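These steps could look roughly like this in Python. It is a sketch; is_a again stands in for f, and the function name is made up.

```python
def partition_unstable(items, is_a):
    """In-place, O(n), unstable: moves all A elements before all B elements."""
    left, right = 0, len(items) - 1
    while True:
        while right >= 0 and not is_a(items[right]):   # skip Bs on the right
            right -= 1
        while left < len(items) and is_a(items[left]):  # skip As on the left
            left += 1
        if left >= right:                               # iterators crossed
            break
        items[left], items[right] = items[right], items[left]
    return items
```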
Let's say,
Object_Array[1...N]
Type_A objs are A1,A2,...Ai
Type_B objs are B1,B2,...Bj
i+j = N
obj_A_count = 0
obj_B_count = 0
FOR k = 1:N
    if Object_Array[k] is of Type_A
        obj_A_count = obj_A_count + 1
    else
        obj_B_count = obj_B_count + 1
END FOR
Fill the resultant array with the Type_A and Type_B objects according to their respective counts, in the order given by the comparison of obj_A and obj_B.
The following should work in linear time for a doubly-linked list. Because up to N insertions/deletions are involved, it may take quadratic time for arrays, though.
1. Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
2. Start with 3 iterators: iterA starts from the beginning of the container, iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
3. With iterA skip over As, find the 1st B, and move the object from iterA to the iterB->previous position. Now iterA points to the next element after where the moved element used to be, and the moved element is now just before iterB.
4. Continue with step 3 until you reach iterMiddle. After that, all elements between first() and iterB-1 are As.
5. Now set iterA to iterB-1.
6. Skip over Bs with iterB. When an A is found, move it to just after iterA and increment iterA.
7. Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertions/deletions, which is linear time for containers with O(1) insertions/deletions, but, alas, O(N^2) for arrays. Applicability in your case depends on whether the container is an array or a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list, and accumulating and moving the "B"s takes only trivial pointer changes. Pseudo code below:
sort(list) {
    node = list.head, blast = null, bhead = null
    while (node != null) {
        nextnode = node.next
        if (node.val == "a") {
            if (blast != null) {
                // unlink the 'a' (it sits right after blast)...
                blast.next = node.next
                if (node.next != null) node.next.prev = blast
                // ...and move it to the front of the 'b' list
                node.prev = bhead.prev
                if (bhead.prev != null) bhead.prev.next = node
                else list.head = node
                node.next = bhead, bhead.prev = node
            }
        }
        else if (node.val == "b") {
            if (blast == null)
                bhead = blast = node
            else // accumulate the "b"s..
                blast = node
        }
        node = nextnode
    }
}
So, you can do this in an array, but the memcopies that emulate the list swap will make it quite slow for large arrays.
Firstly, assuming the array of A's and B's is either generated or read-in, I wonder why not avoid this question entirely by simply applying f as the list is being accumulated into memory into two lists that would subsequently be merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be - one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(nlogn) but n here is only 4000 (0.00005 of the one-billion list count). Total time O(2n) = O(n)
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make two arrays of size n, count_a[n] and count_b[n], one for A and one for B. Fill these arrays with how many As or Bs there have been before index i.
After just one loop, we can use these arrays to look up the correct index for any element in O(1). Like this:
int final_index(char id, int pos){
    if(id == 'A')
        return count_a[pos];
    else
        return total_a + count_b[pos]; // total_a: total number of As in the array
}
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is to have recursive swapping procedure that doesn't actually perform any swapping until both elements would be placed in correct final positions. EDIT: This is actually not true. Even naive swapping will have O(n) swaps. But doing this recursive strategy will give you absolute minimum required swaps.
Note that in general case this would be very bad sorting algorithm since it has memory requirement of O(n * element value range).
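To make the index computation concrete, here is a sketch that computes the same final positions on the fly and writes into a second array. It is stable and O(n), but it deliberately gives up the in-place requirement (it uses O(n) extra memory, as the count arrays do). The names are my own, and is_a stands in for f.

```python
def stable_two_way(items, is_a):
    """Stable O(n) separation into As then Bs, using O(n) extra space."""
    total_a = sum(1 for x in items if is_a(x))
    out = [None] * len(items)
    seen_a = seen_b = 0   # how many As / Bs appeared before the current index
    for x in items:
        if is_a(x):
            out[seen_a] = x               # final index = number of As before it
            seen_a += 1
        else:
            out[total_a + seen_b] = x     # all As first, then Bs in order
            seen_b += 1
    return out
```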

Store the largest 5000 numbers from a stream of numbers

Given the following problem:
"Store the largest 5000 numbers from a stream of numbers"
The solution which springs to mind is a binary search tree maintaining a count of the number of nodes in the tree and a reference to the smallest node once the count reaches 5000. When the count reaches 5000, each new number to add can be compared to the smallest item in the tree. If greater, the new number can be added then the smallest removed and the new smallest calculated (which should be very simple already having the previous smallest).
My concern with this solution is that the binary tree is naturally going to get skewed (as I'm only deleting on one side).
Is there a way to solve this problem which won't create a terribly skewed tree?
In case anyone wants it, I've included pseudo-code for my solution so far below:
process(number)
{
    if (count == 5000 && number > smallest.Value)
    {
        addNode( root, number)
        smallest = deleteNodeAndGetNewSmallest ( root, smallest)
    }
}
deleteNodeAndGetNewSmallest( lastSmallest)
{
    if ( lastSmallest has parent)
    {
        if ( lastSmallest has right child)
        {
            smallest = getMin(lastSmallest.right)
            lastSmallest.parent.right = lastSmallest.right
        }
        else
        {
            smallest = lastSmallest.parent
        }
    }
    else
    {
        smallest = getMin(lastSmallest.right)
        root = lastSmallest.right
    }
    count--
    return smallest
}
getMin( node)
{
    if (node has left)
        return getMin(node.left)
    else
        return node
}
add(number)
{
    //standard implementation of add for BST
    count++
}
The simplest solution for this is maintaining a min heap of max size 5000.
Every time a new number arrives, check if the heap holds fewer than 5000 elements; if it does, add the number.
If it does not, check if the minimum is smaller than the new element, and if it is, pop it out and insert the new element instead.
When you are done, you have a heap containing the 5000 largest elements.
This solution is O(n log k) complexity, where n is the number of elements and k is the number of elements you need (5000 in your case).
It can also be done in O(n) using a selection algorithm: store all the elements, then find the 5001st largest element and return everything bigger than it. But it is harder to implement and, for reasonably sized input, might not be better. Also, if the stream contains duplicates, more processing is needed.
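In Python the min-heap version is only a few lines with the standard heapq module. This is a sketch; the function name largest_k and the parameter k are my own.

```python
import heapq


def largest_k(stream, k):
    """Return the k largest numbers seen in stream (in heap order, not sorted)."""
    if k <= 0:
        return []
    heap = []                        # min-heap: heap[0] is the smallest kept number
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the minimum and push x in one step
    return heap


print(sorted(largest_k([5, 1, 9, 3, 7, 8, 2], 3)))
```

Only k numbers are ever held in memory, which is the point when the input is a stream.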
Use a (minimum) priority queue. Add each incoming item to the queue and when the size reaches 5,000 remove the minimum (top) element every time you add an incoming element. The queue will contain the 5,000 largest elements and when the input stops, just remove the contents. This MinPQ is also called a heap but that is an overloaded term. Insertions and deletions take about log2(N). Where N maxes out at 5,000 this would be just over 12 [log2(4096) = 12] times the number of items you are processing.
An excellent source of info is Algorithms, (4th Edition) by Robert Sedgewick and Kevin Wayne. There is an excellent MOOC on coursera.org that is based on this text.

Finding the best pair of elements that don't exceed a certain weight?

I have a collection of objects, each of which has a weight and a value. I want to pick the pair of objects with the highest total value subject to the restriction that their combined weight does not exceed some threshold. Additionally, I am given two arrays, one containing the objects sorted by weight and one containing the objects sorted by value.
I know how to do it in O(n^2) but how can I do it in O(n)?
This is a combinatorial optimization problem, and the fact the values are sorted means you can easily try a branch and bound approach.
I think that I have a solution that works in O(n log n) time and O(n) extra space. This isn't quite the O(n) solution you wanted, but it's still better than the naive quadratic solution.
The intuition behind the algorithm is that we want to be able to efficiently determine, for any amount of weight, the maximum value we can get with a single item that uses at most that much weight. If we can do this, we have a simple algorithm for solving the problem: iterate across the array of elements sorted by value. For each element, see how much additional value we could get by pairing a single element with it (using the values we precomputed), then find which of these pairs is maximum. If we can do the preprocessing in O(n log n) time and can answer each of the above queries in O(log n) time, then the total time for the second step will be O(n log n) and we have our answer.
An important observation we need to do the preprocessing step is as follows. Our goal is to build up a structure that can answer the question "which element with weight less than x has maximum value?" Let's think about how we might do this by adding one element at a time. If we have an element (value, weight) and the structure is empty, then we want to say that the maximum value we can get using weight at most "weight" is "value". This means that everything in the range [0, max_weight - weight) should be set to value. Otherwise, suppose that the structure isn't empty when we try adding in (value, weight). In that case, we want to say that any portion of the range [0, weight) whose value is less than value should be replaced by value.
The problem here is that when we do these insertions, there might be, on iteration k, O(k) different subranges that need to be updated, leading to an O(n^2) algorithm. However, we can use a very clever trick to avoid this. Suppose that we insert all of the elements into this data structure in descending order of value. In that case, when we add in (value, weight), because we add the elements in descending order of value, each existing value in the data structure must be higher than our value. This means that if the range [0, weight) intersects any range at all, those ranges will automatically be higher than value and so we don't need to update them. If we combine this with the fact that each range we add always spans from zero to some value, the only portion of the new range that could ever be added to the data structure is the range [weight, x), where x is the highest weight stored in the data structure so far.
To summarize, assuming that we visit the (value, weight) pairs in descending order of value, we can update our data structure as follows:
If the structure is empty, record that the range [0, value) has value "value."
Otherwise, if the highest weight recorded in the structure is greater than weight, skip this element.
Otherwise, if the highest weight recorded so far is x, record that the range [weight, x) has value "value."
Notice that this means that we are always splitting ranges at the front of the list of ranges we have encountered so far. Because of this, we can think about storing the list of ranges as a simple array, where each array element tracks the upper endpoint of some range and the value assigned to that range. For example, we might track the ranges [0, 3), [3, 9), and [9, 12) as the array
3, 9, 12
If we then needed to split the range [0, 3) into [0, 1) and [1, 3), we could do so by prepending 1 to the list:
1, 3, 9, 12
If we represent this array in reverse (actually storing the ranges from high to low instead of low to high), this step of creating the array runs in O(n) time because at each point we just do O(1) work to decide whether or not to add another element onto the end of the array.
Once we have the ranges stored like this, to determine which of the ranges a particular weight falls into, we can just use a binary search to find the largest element smaller than that weight. For example, to look up 6 in the above array we'd do a binary search to find 3.
Finally, once we have this data structure built up, we can just look at each of the objects one at a time. For each element, we see how much weight is left, use a binary search in the other structure to see what element it should be paired with to maximize the total value, and then find the maximum attainable value.
Let's trace through an example. Given maximum allowable weight 10 and the objects
Weight | Value
-------+------
   2   |   3
   6   |   5
   4   |   7
   7   |   8
Let's see what the algorithm does. First, we need to build up our auxiliary structure for the ranges. We look at the objects in descending order of value, starting with the object of weight 7 and value 8. This means that if we ever have at least seven units of weight left, we can get 8 value. Our array now looks like this:
Weight: 7
Value: 8
Next, we look at the object of weight 4 and value 7. This means that with four or more units of weight left, we can get value 7:
Weight: 7 4
Value: 8 7
Repeating this for the next item (weight six, value five) does not change the array, since if the object has weight six, if we ever had six or more units of free space left, we would never choose this; we'd always take the seven-value item of weight four. We can tell this since there is already an object in the table whose range includes remaining weight four.
Finally, we look at the last item (value 3, weight 2). This means that if we ever have weight two or more free, we could get 3 units of value. The final array now looks like this:
Weight: 7 4 2
Value: 8 7 3
Finally, we just look at the objects in any order to see what the best option is. When looking at the object of weight 2 and value 3, since the maximum allowed weight is 10, we need to see how much value we can get with at most 10 - 2 = 8 weight. A binary search over the array tells us that this value is 8, so one option would give us 11 value. If we look at the object of weight 6 and value 5, a binary search tells us that with four remaining weight the best we can do would be to get 7 units of value, for a total of 12 value. Repeating this on the other two entries doesn't turn up anything new, so the optimum found has value 12, which is indeed the correct answer.
Hope this helps!
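A compact way to try out the "best single item within a weight budget, found by binary search" idea is sketched below. Note that it is a variation, not the exact array construction described above: it sorts by weight and keeps a prefix maximum of values, and it only lets an item pair with items earlier in that order so that an object is never paired with itself. The function name and the (weight, value) tuple format are assumptions of mine.

```python
from bisect import bisect_right


def best_pair_value(items, max_weight):
    """items: list of (weight, value). Best total value of two distinct items
    whose combined weight is at most max_weight, or None if no pair fits."""
    items = sorted(items)                      # ascending by weight
    weights = [w for w, _ in items]
    # prefix_best[k] = highest value among the k lightest items
    prefix_best = [float("-inf")]
    for _, v in items:
        prefix_best.append(max(prefix_best[-1], v))
    best = None
    for i, (w, v) in enumerate(items):
        # How many items are light enough to go with item i...
        k = bisect_right(weights, max_weight - w)
        # ...restricted to earlier items, so each pair is tried exactly once.
        k = min(k, i)
        if k > 0:
            total = v + prefix_best[k]
            if best is None or total > best:
                best = total
    return best


print(best_pair_value([(2, 3), (6, 5), (4, 7), (7, 8)], 10))
```

On the worked example this picks the objects of weight 6 and weight 4, in line with the trace above. The running time is O(n log n) because of the sort and the binary searches.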
Here is an O(n) time, O(1) space solution.
Let's call an object x better than an object y if and only if (x is no heavier than y) and (x is no less valuable) and (x is lighter or more valuable). Call an object x first-choice if no object is better than x. There exists an optimal solution consisting either of two first-choice objects, or a first-choice object x and an object y such that only x is better than y.
The main tool is to be able to iterate the first-choice objects from lightest to heaviest (= least valuable to most valuable) and from most valuable to least valuable (= heaviest to lightest). The iterator state is an index into the objects by weight (resp. value) and a max value (resp. min weight) so far.
Each of the following steps is O(n).
During a scan, whenever we encounter an object that is not first-choice, we know an object that's better than it. Scan once and consider these pairs of objects.
For each first-choice object from lightest to heaviest, determine the heaviest first-choice object that it can be paired with, and consider the pair. (All lighter objects are less valuable.) Since the latter object becomes lighter over time, each iteration of the loop is amortized O(1). (See also searching in a matrix whose rows and columns are sorted.)
Code for the unbelievers. Not heavily tested.
from collections import namedtuple
from operator import attrgetter

Item = namedtuple('Item', ('weight', 'value'))
sentinel = Item(float('inf'), float('-inf'))

def firstchoicefrombyweight(byweight):
    # Yields every object together with the most valuable object seen so far.
    bestsofar = sentinel
    for x in byweight:
        if x.value > bestsofar.value:
            bestsofar = x
        yield (x, bestsofar)

def firstchoicefrombyvalue(byvalue):
    # Yields only the first-choice objects, from most to least valuable.
    bestsofar = sentinel
    for x in byvalue:
        if x.weight < bestsofar.weight:
            bestsofar = x
            yield x

def optimize(items, maxweight):
    byweight = sorted(items, key=attrgetter('weight'))
    byvalue = sorted(items, key=attrgetter('value'), reverse=True)
    maxvalue = float('-inf')
    try:
        i = firstchoicefrombyvalue(byvalue)
        y = next(i)
        for x, z in firstchoicefrombyweight(byweight):
            if z is not x and x.weight + z.weight <= maxweight:
                maxvalue = max(maxvalue, x.value + z.value)
            while x.weight + y.weight > maxweight:
                y = next(i)
            if y is x:
                break
            maxvalue = max(maxvalue, x.value + y.value)
    except StopIteration:
        pass
    return maxvalue

items = [Item(1, 1), Item(2, 2), Item(3, 5), Item(3, 7), Item(5, 8)]
for maxweight in range(3, 10):
    print(maxweight, optimize(items, maxweight))
This is similar to the knapsack problem. I will use naming from it (num - weight, val - value).
The essential part:
1. Start with a = 0 and b = n-1, assuming 0 is the index of the heaviest object and n-1 is the index of the lightest object.
2. Increase a until objects a and b satisfy the limit.
3. Compare the current solution with the best solution.
4. Decrease b by one.
5. Go to 2.
Update:
It's the knapsack problem, except there is a limit of 2 items. You basically need to decide how much space you want for the first object and how much for the other. There are n significant ways to split the available space, so the complexity is O(n). Picking the most valuable objects to fit in those spaces can be done without additional cost.
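One way to make the two-pointer idea concrete is sketched below, under my own assumptions: the input is already sorted by ascending weight (the question says such an array is given), and a prefix maximum of values answers "the most valuable object that fits in the remaining space" in O(1). The indices run in the opposite direction to the steps above, and the function name is made up.

```python
def best_pair_two_pointers(by_weight, max_weight):
    """by_weight: list of (weight, value) sorted by ascending weight.
    Returns the best total value of two distinct items within max_weight,
    or None if no pair fits. Runs in O(n) on the pre-sorted input."""
    n = len(by_weight)
    # prefix_best[k] = highest value among the k lightest items
    prefix_best = [float("-inf")] * (n + 1)
    for idx, (_, v) in enumerate(by_weight):
        prefix_best[idx + 1] = max(prefix_best[idx], v)
    best = None
    k = 0  # number of lightest items that fit next to the current heavy item
    for j in range(n - 1, 0, -1):          # heavy item, from heaviest down
        wj, vj = by_weight[j]
        # As the heavy item gets lighter, k can only grow: amortized O(1).
        while k < n and by_weight[k][0] + wj <= max_weight:
            k += 1
        m = min(k, j)                      # partner must be a different, earlier item
        if m > 0:
            cand = vj + prefix_best[m]
            if best is None or cand > best:
                best = cand
    return best
```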
