Increment Range using Fenwick Tree

Increment Range using Fenwick Tree - data-structures

I was wondering if a Fenwick Tree (or Binary Indexed Tree) can be modified to:
1) Increment the frequency all elements in a range by a certain amount
2) Query the frequency of a single element.
This is as opposed to the traditional Fenwick Tree where updates are done on a single element and queries are done over a range (kind of like an inverse Fenwick Tree).

Sure!
Fenwick tree allows you to do these operations in O(log n):
update(x, delta) => increases value at index x by delta
query(x) => returns sum of values at indices 0,1,2,...,x
Here's a simple implementation of a Fenwick tree in C++:
int F[MAX];
void update( int x, int delta ) {
for( ++x; x < MAX; x += x&-x ) F[x] += delta;
}
int query( int x ) {
int sum = 0;
for( ++x; x > 0; x -= x&-x ) sum += F[x];
return sum;
}
Now forget about the internals of the Fenwick tree and focus on the problem. When using Fenwick tree, just imagine it literally stores an array of frequencies and somehow magically does both operations in O(log n). Function update modifies the frequency of a single element and query returns sum of frequencies of first x elements.
So in the 'traditional' problem you had these operations:
void incFreqAt( int index ) {
update( index, 1 );
}
int getFreqAt( int index ) {
return query( index ) - query( index-1 );
}
Now, instead of storing frequency of each single element, let's store differences between frequencies of adjacent elements.
These are the new operations:
void incFreqFromTo( int a, int b, int delta ) {
update( a, delta );
update( b+1, -delta );
}
int getFreqAt( int index ) {
return query( index );
}
When incrementing frequencies in range [a..b], just increment difference at index a and decrement difference at index b+1. This is also like saying: increment all frequencies in range [a..infinity] and decrement all frequencies in range [b+1..infinity].
To get the frequency of the element at index x, just sum all differences of frequencies in range [0..x].

Related

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding maximum positive sub sequence in given sequence in O(n) time. Can anybody suggest similar algorithm for finding minimum positive contiguous sub sequence.
For example
If given sequence is 1,2,3,4,5 answer should be 1.
[5,-4,3,5,4] ->1 is the minimum positive sum of elements [5,-4].

There cannot be such algorithm. The lower bound for this problem is O(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the difference between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. The original array only has distinct elements if and only if the minimum non-negative subarray is greater than 0.
If we had an O(n) algorithm to your problem, we would have an O(n) algorithm to element distinctness problem, which we know is not possible on a Turing machine.

We can have a O(n log n) algorithm as follow:
Assuming that we have an array prefix, which index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1].
Thus, in order to find the minimum positive sub-array ending at index j, so, we need to find the maximum element prefix[x], which less than prefix[j] and x < j. We can find that element in O(log n) time if we use a binary search tree.
Pseudo code:
int[]prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
for(int i = 0; i < A.length; i++){
if(A[i] > 0)
result = min(result, A[i];
int v = tree.getMaximumElementLessThan(prefix[i]);
result = min(result, prefix[i] - v);
tree.add(prefix[i]);
}

I believe there's a O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the (input) values to be processed, see remarks in the code.
private int GetMinimumPositiveContiguousSubsequenc(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the mimumum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
if (minPositive == 0 || value < minPositive)
minPositive = value;
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSeqence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
bool[] initialSequence = new bool[maxSequence + offset + 1];
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

Adding sum of frequencies whille solving Optimal Binary search tree

I am referring to THIS problem and solution.
Firstly, I did not get why sum of frequencies is added in the recursive equation.
Can someone please help understand that with an example may be.
In Author's word.
We add sum of frequencies from i to j (see first term in the above
formula), this is added because every search will go through root and
one comparison will be done for every search.
In code, sum of frequencies (purpose of which I do not understand) ... corresponds to fsum.
int optCost(int freq[], int i, int j)
{
// Base cases
if (j < i) // If there are no elements in this subarray
return 0;
if (j == i) // If there is one element in this subarray
return freq[i];
// Get sum of freq[i], freq[i+1], ... freq[j]
int fsum = sum(freq, i, j);
// Initialize minimum value
int min = INT_MAX;
// One by one consider all elements as root and recursively find cost
// of the BST, compare the cost with min and update min if needed
for (int r = i; r <= j; ++r)
{
int cost = optCost(freq, i, r-1) + optCost(freq, r+1, j);
if (cost < min)
min = cost;
}
// Return minimum value
return min + fsum;
}
Secondly, this solution will just return the optimal cost. Any suggestions regarding how to get the actual bst ?

Why we need sum of frequencies
The idea behind sum of frequencies is to correctly calculate cost of particular tree. It behaves like accumulator value to store tree weight.
Imagine that on first level of recursion we start with all keys located on first level of the tree (we haven't picked any root element yet). Remember the weight function - it sums over all node weights multiplied by node level. For now weight of our tree equals to sum of weights of all keys because any of our keys can be located on any level (starting from first) and anyway we will have at least one weight for each key in our result.
1) Suppose that we found optimal root key, say key r. Next we move all our keys except r one level down because each of the elements left can be located at most on second level (first level is already occupied). Because of that we add weight of each key left to our sum because anyway for all of them we will have at least double weight. Keys left we split in two sub arrays according to r element(to the left from r and to the right) which we selected before.
2) Next step is to select optimal keys for second level, one from each of two sub arrays left from first step. After doing that we again move all keys left one level down and add their weights to the sum because they will be located at least on third level so we will have at least triple weight for each of them.
3) And so on.
I hope this explanation will give you some understanding of why we need this sum of frequencies.
Finding optimal bst
As author mentioned at the end of the article
2) In the above solutions, we have computed optimal cost only. The
solutions can be easily modified to store the structure of BSTs also.
We can create another auxiliary array of size n to store the structure
of tree. All we need to do is, store the chosen ‘r’ in the innermost
loop.
We can do just that. Below you will find my implementation.
Some notes about it:
1) I was forced to replace int[n][n] with utility class Matrix because I used Visual C++ and it does not support non-compile time constant expression as array size.
2) I used second implementation of the algorithm from article which you provided (with memorization) because it is much easier to add functionality to store optimal bst to it.
3) Author has mistake in his code:
Second loop for (int i=0; i<=n-L+1; i++) should have n-L as upper bound not n-L+1.
4) The way we store optimal bst is as follows:
For each pair i, j we store optimal key index. This is the same as for optimal cost but instead of storing optimal cost we store optimal key index. For example for 0, n-1 we will have index of the root key r of our result tree. Next we split our array in two according to root element index r and get their optimal key indexes. We can dot that by accessing matrix elements 0, r-1 and r+1, n-1. And so forth. Utility function 'PrintResultTree' uses this approach and prints result tree in in-order (left subtree, node, right subtree). So you basically get ordered list because it is binary search tree.
5) Please don't flame me for my code - I'm not really a c++ programmer. :)
int optimalSearchTree(int keys[], int freq[], int n, Matrix& optimalKeyIndexes)
{
/* Create an auxiliary 2D matrix to store results of subproblems */
Matrix cost(n,n);
optimalKeyIndexes = Matrix(n, n);
/* cost[i][j] = Optimal cost of binary search tree that can be
formed from keys[i] to keys[j].
cost[0][n-1] will store the resultant cost */
// For a single key, cost is equal to frequency of the key
for (int i = 0; i < n; i++)
cost.SetCell(i, i, freq[i]);
// Now we need to consider chains of length 2, 3, ... .
// L is chain length.
for (int L = 2; L <= n; L++)
{
// i is row number in cost[][]
for (int i = 0; i <= n - L; i++)
{
// Get column number j from row number i and chain length L
int j = i + L - 1;
cost.SetCell(i, j, INT_MAX);
// Try making all keys in interval keys[i..j] as root
for (int r = i; r <= j; r++)
{
// c = cost when keys[r] becomes root of this subtree
int c = ((r > i) ? cost.GetCell(i, r - 1) : 0) +
((r < j) ? cost.GetCell(r + 1, j) : 0) +
sum(freq, i, j);
if (c < cost.GetCell(i, j))
{
cost.SetCell(i, j, c);
optimalKeyIndexes.SetCell(i, j, r);
}
}
}
}
return cost.GetCell(0, n - 1);
}
Below is utility class Matrix:
class Matrix
{
private:
int rowCount;
int columnCount;
std::vector<int> cells;
public:
Matrix()
{
}
Matrix(int rows, int columns)
{
rowCount = rows;
columnCount = columns;
cells = std::vector<int>(rows * columns);
}
int GetCell(int rowNum, int columnNum)
{
return cells[columnNum + rowNum * columnCount];
}
void SetCell(int rowNum, int columnNum, int value)
{
cells[columnNum + rowNum * columnCount] = value;
}
};
And main method with utility function to print result tree in in-order:
//Print result tree in in-order
void PrintResultTree(
Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys)
{
if (startIndex == endIndex)
{
printf("%d\n", keys[startIndex]);
return;
}
else if (startIndex > endIndex)
{
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
PrintResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys);
printf("%d\n", keys[currentOptimalKeyIndex]);
PrintResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
return 0;
}
EDIT:
Below you can find code to create simple tree like structure.
Here is utility TreeNode class
struct TreeNode
{
public:
int Key;
TreeNode* Left;
TreeNode* Right;
};
Updated main function with BuildResultTree function
void BuildResultTree(Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys,
TreeNode*& tree)
{
if (startIndex > endIndex)
{
return;
}
tree = new TreeNode();
tree->Left = NULL;
tree->Right = NULL;
if (startIndex == endIndex)
{
tree->Key = keys[startIndex];
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
tree->Key = keys[currentOptimalKeyIndex];
BuildResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys, tree->Left);
BuildResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys, tree->Right);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
TreeNode* tree = new TreeNode();
BuildResultTree(optimalKeyIndexes, 0, n - 1, keys, tree);
return 0;
}

Algorithm for counting n-element subsets of m-element set divisible by k

Suppose {1, 2, 3, ..., m} is a set. I choose n distinct elements from this set. Can I write an algorithm which counts the number of such subsets whose sum is divisible by k (ordering not mattering)?
This problem would have been much easier if ordering mattered, but it doesn't and I don't have a clue how to approach. Can anyone please help?

This can be done in time O(n·k·m) and space O(n·k) by a method similar to that outlined below. Let S be your set with m elements. By definition of set and subset, all the elements of S are distinct, as are all the elements of any S-subset.
First, consider the simpler problem where we count S-subsets with any number of elements instead of exactly n elements. Let N(W,r) be the number of W-subsets U such that ΣU (the sum of elements of U) is equal to r mod k. If W is a subset of S, let W' be W + z, where z ∈ S\W; that is, z is an element of S not already in W. Now N(W', (r+z)%k) = N(W, (r+z)%k) + N(W, r) because N(W, (r+z)%k) is the number of W'-subsets U with ΣU≡(r+z)%k) that don't contain z and N(W, r) is the number of W'-subsets U with ΣU≡(r+z)%k) that do contain z. Repeat this construction, treating each element of S in turn until W' = S, at which point the desired answer is N(S,0). Time is O(k·m), space is O(k).
To adapt the above process for exact subset sizes, change N(W,r) to N(W,h,r), where h is a subset size, and adapt the equations for N(W',r) to N(W',h,r) in the obvious way. Time is O(k·n·m), space is O(k·n).

set -> all elements are different.
create an array to describe how many representatives each numberclass has:
ncnt=new int[k]
for x in elements{
ncnt[x%k]++;
}
dynamic programming:
int waysToCreate(int input_class,int class_idx, int n){
int ways=0
// not using this class:
if(class_idx+1 < k )
ways+=waysToCreate(input_class,class_idx+1,n);
for( int i=1;i < ncnt[class_idx] && i<=n ){
int new_input_class=(input_class+i*class_idx)%k;
if(i == n && new_input_class != 0){
break; // all elements are used, but doesn't congrunent with 0 (mod k)
}
int subways=1;
if(class_idx+1 < k )
subways=waysToCreate(new_input_class,class_idx+1,n-i)
ways+=nchoosek(ncnt[class_idx],i) * subways;
}
return ways;
}
enable memoize on waysToCreate, nchoosek

it can work but slow
/**
* List all k size subset of a given list with n unique elements.
* n can be bigger than 64. this function will take O(K^N) time, Bad.
*
* #param list
* #param subSetSize
* #param subSet
* #param indexFrom
* #param indexEnd
*/
private static void subSetOf(List<Integer> list, int subSetSize, Set<Integer> subSet, int indexFrom, int indexEnd) {
if (subSet == null) {
assert 0 < subSetSize && subSetSize <= list.size();
subSet = new HashSet(subSetSize);
}
if (subSetSize <= 64) {
// Todo using bitwise trick
}
for (int index = indexFrom; index <= indexEnd; index++) {
subSet.add(list.get(index));
if (subSet.size() == subSetSize) {
System.out.println(Arrays.toString(subSet.toArray()));
// check the sum of this subset is satisfied or not by sum/k
}
if (subSet.size() < subSetSize) {
subSetOf(list, subSetSize, subSet,
index + 1,
list.size() - (subSetSize - subSet.size()));
}
subSet.remove(list.get(index));
}
}
public static void subSetOf(List<Integer> list,
int subSetSize,
Set<Integer> subSet) {
subSetOf(list, subSetSize, subSet, 0, list.size() - subSetSize);
}

Find the top k sums of two sorted arrays

You are given two sorted arrays, of sizes n and m respectively. Your task (should you choose to accept it), is to output the largest k sums of the form a[i]+b[j].
A O(k log k) solution can be found here. There are rumors of a O(k) or O(n) solution. Does one exist?

I found the responses at your link mostly vague and poorly structured. Here's a start with a O(k * log(min(m, n))) O(k * log(m + n)) O(k * log(k)) algorithm.
Suppose they are sorted decreasing. Imagine you computed the m*n matrix of the sums as follows:
for i from 0 to m
for j from 0 to n
sums[i][j] = a[i] + b[j]
In this matrix, values monotonically decrease down and to the right. With that in mind, here is an algorithm which performs a graph search through this matrix in order of decreasing sums.
q : priority queue (decreasing) := empty priority queue
add (0, 0) to q with priority a[0] + b[0]
while k > 0:
k--
x := pop q
output x
(i, j) : tuple of int,int := position of x
if i < m:
add (i + 1, j) to q with priority a[i + 1] + b[j]
if j < n:
add (i, j + 1) to q with priority a[i] + b[j + 1]
Analysis:
The loop is executed k times.
There is one pop operation per iteration.
There are up to two insert operations per iteration.
The maximum size of the priority queue is O(min(m, n)) O(m + n) O(k).
The priority queue can be implemented with a binary heap giving log(size) pop and insert.
Therefore this algorithm is O(k * log(min(m, n))) O(k * log(m + n)) O(k * log(k)).
Note that the general priority queue abstract data type needs to be modified to ignore duplicate entries. Alternately, you could maintain a separate set structure that first checks for membership in the set before adding to the queue, and removes from the set after popping from the queue. Neither of these ideas would worsen the time or space complexity.
I could write this up in Java if there's any interest.
Edit: fixed complexity. There is an algorithm which has the complexity I described, but it is slightly different from this one. You would have to take care to avoid adding certain nodes. My simple solution adds many nodes to the queue prematurely.

private static class FrontierElem implements Comparable<FrontierElem> {
int value;
int aIdx;
int bIdx;
public FrontierElem(int value, int aIdx, int bIdx) {
this.value = value;
this.aIdx = aIdx;
this.bIdx = bIdx;
}
#Override
public int compareTo(FrontierElem o) {
return o.value - value;
}
}
public static void findMaxSum( int [] a, int [] b, int k ) {
Integer [] frontierA = new Integer[ a.length ];
Integer [] frontierB = new Integer[ b.length ];
PriorityQueue<FrontierElem> q = new PriorityQueue<MaxSum.FrontierElem>();
frontierA[0] = frontierB[0]=0;
q.add( new FrontierElem( a[0]+b[0], 0, 0));
while( k > 0 ) {
FrontierElem f = q.poll();
System.out.println( f.value+" "+q.size() );
k--;
frontierA[ f.aIdx ] = frontierB[ f.bIdx ] = null;
int fRight = f.aIdx+1;
int fDown = f.bIdx+1;
if( fRight < a.length && frontierA[ fRight ] == null ) {
q.add( new FrontierElem( a[fRight]+b[f.bIdx], fRight, f.bIdx));
frontierA[ fRight ] = f.bIdx;
frontierB[ f.bIdx ] = fRight;
}
if( fDown < b.length && frontierB[ fDown ] == null ) {
q.add( new FrontierElem( a[f.aIdx]+b[fDown], f.aIdx, fDown));
frontierA[ f.aIdx ] = fDown;
frontierB[ fDown ] = f.aIdx;
}
}
}
The idea is similar to the other solution, but with the observation that as you add to your result set from the matrix, at every step the next element in our set can only come from where the current set is concave. I called these elements frontier elements and I keep track of their position in two arrays and their values in a priority queue. This helps keep the queue size down, but by how much I've yet to figure out. It seems to be about sqrt( k ) but I'm not entirely sure about that.
(Of course the frontierA/B arrays could be simple boolean arrays, but this way they fully define my result set, This isn't used anywhere in this example but might be useful otherwise.)

As the pre-condition is the Array are sorted hence lets consider the following
for N= 5;
A[]={ 1,2,3,4,5}
B[]={ 496,497,498,499,500}
Now since we know Summation of N-1 of A&B would be highest hence just insert this in to heap along with the indexes of A & B element ( why, indexes? we'll come to know in a short while )
H.insert(A[N-1]+B[N-1],N-1,N-1);
now
while(!H.empty()) { // the time heap is not empty
H.pop(); // this will give you the sum you are looking for
The indexes which we got at the time of pop, we shall use them for selecting the next sum element.
Consider the following :
if we have i & j as the indexes in A & B , then the next element would be max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ,
So, insert the same if that has not been inserted in the heap
hence
(i,j)= max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ;
if(Hash[i,j]){ // not inserted
H.insert (i,j);
}else{
get the next max from max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ; and insert.
}
K pop-ing them will give you max elements required.
Hope this helps

Many thanks to #rlibby and #xuhdev with such an original idea to solve this kind of problem. I had a similar coding exercise interview require to find N largest sums formed by K elements in K descending sorted arrays - means we must pick 1 element from each sorted arrays to build the largest sum.
Example: List findHighestSums(int[][] lists, int n) {}
[5,4,3,2,1]
[4,1]
[5,0,0]
[6,4,2]
[1]
and a value of 5 for n, your procedure should return a List of size 5:
[21,20,19,19,18]
Below is my code, please take a look carefully for those block comments :D
private class Pair implements Comparable<Pair>{
String state;
int sum;
public Pair(String state, int sum) {
this.state = state;
this.sum = sum;
}
#Override
public int compareTo(Pair o) {
// Max heap
return o.sum - this.sum;
}
}
List<Integer> findHighestSums(int[][] lists, int n) {
int numOfLists = lists.length;
int totalCharacterInState = 0;
/*
* To represent State of combination of largest sum as String
* The number of characters for each list should be Math.ceil(log(list[i].length))
* For example:
* If list1 length contains from 11 to 100 elements
* Then the State represents for list1 will require 2 characters
*/
int[] positionStartingCharacterOfListState = new int[numOfLists + 1];
positionStartingCharacterOfListState[0] = 0;
// the reason to set less or equal here is to get the position starting character of the last list
for(int i = 1; i <= numOfLists; i++) {
int previousListNumOfCharacters = 1;
if(lists[i-1].length > 10) {
previousListNumOfCharacters = (int)Math.ceil(Math.log10(lists[i-1].length));
}
positionStartingCharacterOfListState[i] = positionStartingCharacterOfListState[i-1] + previousListNumOfCharacters;
totalCharacterInState += previousListNumOfCharacters;
}
// Check the state <---> make sure that combination of a sum is new
Set<String> states = new HashSet<>();
List<Integer> result = new ArrayList<>();
StringBuilder sb = new StringBuilder();
// This is a max heap contain <State, largestSum>
PriorityQueue<Pair> pq = new PriorityQueue<>();
char[] stateChars = new char[totalCharacterInState];
Arrays.fill(stateChars, '0');
sb.append(stateChars);
String firstState = sb.toString();
states.add(firstState);
int firstLargestSum = 0;
for(int i = 0; i < numOfLists; i++) firstLargestSum += lists[i][0];
// Imagine this is the initial state in a graph
pq.add(new Pair(firstState, firstLargestSum));
while(n > 0) {
// In case n is larger than the number of combinations of all list entries
if(pq.isEmpty()) break;
Pair top = pq.poll();
String currentState = top.state;
int currentSum = top.sum;
/*
* Loop for all lists and generate new states of which only 1 character is different from the former state
* For example: the initial state (Stage 0) 0 0 0 0 0
* So the next states (Stage 1) should be:
* 1 0 0 0 0
* 0 1 0 0 0 (choose element at index 2 from 2nd array)
* 0 0 1 0 0 (choose element at index 2 from 3rd array)
* 0 0 0 0 1
* But don't forget to check whether index in any lists have exceeded list's length
*/
for(int i = 0; i < numOfLists; i++) {
int indexInList = Integer.parseInt(
currentState.substring(positionStartingCharacterOfListState[i], positionStartingCharacterOfListState[i+1]));
if( indexInList < lists[i].length - 1) {
int numberOfCharacters = positionStartingCharacterOfListState[i+1] - positionStartingCharacterOfListState[i];
sb = new StringBuilder(currentState.substring(0, positionStartingCharacterOfListState[i]));
sb.append(String.format("%0" + numberOfCharacters + "d", indexInList + 1));
sb.append(currentState.substring(positionStartingCharacterOfListState[i+1]));
String newState = sb.toString();
if(!states.contains(newState)) {
// The newSum is always <= currentSum
int newSum = currentSum - lists[i][indexInList] + lists[i][indexInList+1];
states.add(newState);
// Using priority queue, we can immediately retrieve the largest Sum at Stage k and track all other unused states.
// From that Stage k largest Sum's state, then we can generate new states
// Those sums composed by recently generated states don't guarantee to be larger than those sums composed by old unused states.
pq.add(new Pair(newState, newSum));
}
}
}
result.add(currentSum);
n--;
}
return result;
}
Let me explain how I come up with the solution:
The while loop in my answer executes N times, consider the max heap
( priority queue).
Poll operation 1 time with complexity O(log(
sumOfListLength )) because the maximum element Pair in
heap is sumOfListLength.
Insertion operations might up to K times,
the complexity for each insertion is log(sumOfListLength).
Therefore, the complexity is O(N * log(sumOfListLength) ),

A data structure problem

Given a sequence of integers, there are a number of queries.
Each query has a range [l, r], and you are to find the median of the given range [l, r]
The number of queries can be as large as 100,000
The length of the sequence can be as large as 100,000
I wonder if there is any data structure can support such query
My solution:
I consult my partner today and he tells to use partition tree.
We can build a partition tree in nlog(n) time and answer each query in log(n) time
The partition tree actually is the process of merge sort, but for each node in the tree, it saves the number of integers that go to the left subtree. Thus, we can use this information to deal with the query.
here is my code:
This program is to find the x in a given interval [l, r], that minimize the following equation.
alt text http://acm.tju.edu.cn/toj/3556_01.jpg
Explanation:
seq saves the sequence
pos saves the position after sort
ind saves the index
cntL saves the number of integers that go to the left tree in a given range
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
#define N 100008
typedef long long LL;
int n, m, seq[N], ind[N], pos[N], next[N];
int cntL[20][N];
LL sum[20][N], sumL, subSum[N];
void build(int l, int r, int head, int dep)
{
if (l == r)
{
cntL[dep][l] = cntL[dep][l-1];
sum[dep][l] = sum[dep][l-1];
return ;
}
int mid = (l+r)>>1;
int hl = 0, hr = 0, tl = 0, tr = 0;
for (int i = head, j = l; i != -1; i = next[i], j++)
{
cntL[dep][j] = cntL[dep][j-1];
sum[dep][j] = sum[dep][j-1];
if (pos[i] <= mid)
{
next[tl] = i;
tl = i;
if (hl == 0) hl = i;
cntL[dep][j]++;
sum[dep][j] += seq[i];
}
else
{
next[tr] = i;
tr = i;
if (hr == 0) hr = i;
}
}
next[tl] = -1;
next[tr] = -1;
build(l, mid, hl, dep+1);
build(mid+1, r, hr, dep+1);
}
int query(int left, int right, int ql, int qr, int kth, int dep)
{
if (left == right)
{
return ind[left];
}
int mid = (left+right)>>1;
if (cntL[dep][qr] - cntL[dep][ql-1] >= kth)
{
return query(left, mid, left+cntL[dep][ql-1]-cntL[dep][left-1], left+cntL[dep][qr]-cntL[dep][left-1]-1, kth, dep+1);
}
else
{
sumL += sum[dep][qr]-sum[dep][ql-1];
return query(mid+1, right, mid+1+ql-left-(cntL[dep][ql-1]-cntL[dep][left-1]), mid+qr+1-left-(cntL[dep][qr]-cntL[dep][left-1]), \
kth-(cntL[dep][qr]-cntL[dep][ql-1]), dep+1);
}
}
inline int cmp(int x, int y)
{
return seq[x] < seq[y];
}
int main()
{
int ca, t, i, j, middle, ql, qr, id, tot;
LL ans;
scanf("%d", &ca);
for (t = 1; t <= ca; t++)
{
scanf("%d", &n);
subSum[0] = 0;
for (i = 1; i <= n; i++)
{
scanf("%d", seq+i);
ind[i] = i;
subSum[i] = subSum[i-1]+seq[i];
}
sort(ind+1, ind+1+n, cmp);
for (i = 1; i <= n; i++)
{
pos[ind[i]] = i;
next[i] = i+1;
}
next[n] = -1;
build(1, n, 1, 0);
printf("Case #%d:\n", t);
scanf("%d", &m);
while (m--)
{
scanf("%d%d", &ql, &qr);
ql++, qr++;
middle = (qr-ql+2)/2;
sumL= 0;
id = query(1, n, ql, qr, middle, 0);
ans = subSum[qr]-subSum[ql-1]-sumL;
tot = qr-ql+1;
ans = ans-(tot-middle+1)*1ll*seq[id]+(middle-1)*1ll*seq[id]-sumL;
printf("%lld\n", ans);
}
puts("");
}
}

This is called the Range Median Query problem. The following paper might be relevant: Towards Optimal Range Medians. (Free link, thanks to belisarius).
From the abstract of the paper:
We consider the following problem:
Given an unsorted array of n elements,
and a sequence of intervals in the
array, compute the median in each of
the subarrays defined by the
intervals. We describe a simple
algorithm which needs O(nlogk+klogn)
time to answer k such median queries.
This improves previous algorithms by a
logarithmic factor and matches a
comparison lower bound for k=O(n). The
space complexity of our simple
algorithm is O(nlogn) in the pointer
machine model, and O(n) in the RAM
model. In the latter model, a more
involved O(n) space data structure can
be constructed in O(nlogn) time where
the time per query is reduced to
O(logn/loglogn). We also give
efficient dynamic variants of both
data structures, achieving O(log^2n)
query time using O(nlogn) space in the
comparison model and
O((logn/loglogn)^2) query time using
O(nlogn/loglogn) space in the RAM
model, and show that in the cell-probe
model, any data structure which
supports updates in O(log^O(1)n) time
must have Ω(logn/loglogn) query time.
Our approach naturally generalizes to
higher-dimensional range median
problems, where element positions and
query ranges are multidimensional—it
reduces a range median query to a
logarithmic number of range counting
queries.
Of course, you could preprocess the whole array in O(n^3) time (or perhaps even O(n^2logn) time) and O(n^2) space to be able to return the median in O(1) time.
Additional constraints might help simplify the solution. For instance, do we know that r-l will lesser than a known constant? etc...

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Increment Range using Fenwick Tree - data-structures

Related

How to find minimum positive contiguous sub sequence in O(n) time?

Adding sum of frequencies whille solving Optimal Binary search tree

Algorithm for counting n-element subsets of m-element set divisible by k

Find the top k sums of two sorted arrays

A data structure problem

Categories

Resources