Finding number of zeros in a changing array - algorithm

The problem is pretty much what the title says. There is an n-element(n<10^5) array, which consists of n zeros. There are q operations (q<2*10^5): Each operation can be one of two below:
1. Add x to all elements on a [l,r] range(x can be also negative)
2. Ask for the number of zeros on a [l,r] range
Note that it is guaranteed that absolute values in the array will never get greater than 10^5
I am asking this question because I was reading a solution to another problem where my question was its subproblem. The author said that it can be solved using segment tree with lazy propagation. I can not figure out how to do that. The brute force solution(O(q*n)) is too slow...
What is the most efficient way to answer the query, given the first operation? O(q*log(n)) is what I would guess.
Example:
The array is: 0 0 0 0
-> Add 3 from index 2 to index 3:
The array is now: 0 0 3 3
-> Ask about number of zeros on [1,2]
The answer is 1
-> Add -3 from index 3 to index 3
The array is now: 0 0 3 0
-> Ask about number of zeros on [0,3]
The answer is 3

Ok, I have solved this task. All we have to do is create a segment tree of minimums with lazy propagation, which also counts number of those minimums.
In each node of our segment tree we will store 3 values:
1. Minimum from the segment operated by our node.
2. Number of those minimums on a segment operated by our node.
3. Lazy propagation values(values which tell us what should we pass to our sons when visiting this node next time).
When reading from a segment we will get:
1.Minimum on this segment
2.How many numbers are equal to the minimum on this segment.
If the segment's minimum is 0, we simply return the second value. If the minimum is higher than 0, the answer is 0 (no zeros on this segment, because the lowest number is higher than 0). Note that this relies on the elements never becoming negative: if the minimum could drop below 0, the count of minimums would no longer tell us the number of zeros. Since both the read and the update operations are O(log(n)), and we have q operations, the complexity of this algorithm is O(q*log(n)), which is sufficient.
Pseudocode:
min_count[2*MAX_N]
val[2*MAX_N]
lazy[2*MAX_N]
values_from_sons(node)
{
if(node has no children) stop the function
val[node]=min(val[2*node],val[2*node+1]) //it is a segment tree of minimums
if(val[2*node]<val[2*node+1]) //minimum from the left son < minimum from the right son
{
min_count[node]=min_count[2*node]
stop the function
}
if(val[2*node]>val[2*node+1]) //minimum from the left son > minimum from the right son
{
min_count[node]=min_count[2*node+1]
stop the function
}
if(val[2*node]==val[2*node+1])
{
min_count[node]=min_count[2*node]+min_count[2*node+1];
//we have x minimums in the left son, and y non-intersecting with x minimums on the right, so we can sum them up
}
}
pass(node)
{
if(node has no children) stop the function
//we are passing values to our children when visiting node,
// remember that array "lazy" stores values which belong to node's sons
val[2*node]+=lazy[node];
lazy[2*node]+=lazy[node];
val[2*node+1]+=lazy[node];
lazy[2*node+1]+=lazy[node];
lazy[node]=0;
}
update(node,left,right,s1,s2,add)
//node-number of a node, [left,right]-segment operated by this node, [s1,s2]-segment on which we want to add "add" value
{
pass(node)
if([left,right] and [s1,s2] have no intersections) stop the function
if([left,right] is fully contained in [s1,s2]) /// add "add" value to this node's lazy and val
{
val[node]+=add
lazy[node]+=add
stop the function
}
update(values of the left son)
update(values of the right son)
values_from_sons(node)
//placing this function here updates this node's values when some of its descendants were changed
}
read(node,left,right,s1,s2)
//node-number of a node, [left,right]-segment operated by this node, [s1,s2]-segment for which we want an answer
// this function returns 2 values - minimum from a [s1,s2] segment, and number of values equal to this minimum
{
pass(node)
if([left,right] and [s1,s2] have no intersections) return {INF,0}; //return neutral value of min operation
if([left,right] is fully contained in [s1,s2]) return {val[node],min_count[node]}
vl=read(values of the left son)
vr=read(values of the right son)
if(vl's minimum < vr's minimum)
{
//vl has lower minimums, so the answer for this node will be vl
return vl
}
else if(vl's minimum > vr's minimum)
{
//vr has lower minimums, so the answer for this node will be vr
return vr
}
else
{
//left and right son have the same minimum, and non intersecting values. Hence we can add them
return {vl's minimum, vl's count of minimums + vr's count of minimums};
}
}
ini()
//builds tree. remember that you have to use it before using any of the functions above
{
//Since all values are set to 0 at the beginning, we don't have to worry about beginning values,
// we just have to set min_count table properly
for(each leaf[node that has no sons])
{
min_count[leaf]=1;
}
for(x=MAX_N-1; x>0; x--)
{
min_count[x]=min_count[2*x]+min_count[2*x+1]
}
}
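For readers who want something runnable, here is a minimal Python sketch of the same idea: a segment tree of (minimum, count of minimums) with lazy range addition. The class name ZeroCounter and its method names are my own choices, indices are 0-based as in the example from the question, and the sketch assumes that no element ever drops below zero.

```python
class ZeroCounter:
    """Segment tree of (minimum, count of minimum) with lazy range add.

    Assumes every element stays >= 0, so zeros are exactly the minimums
    whenever the minimum of a range is 0.
    """

    def __init__(self, n):
        self.n = n
        self.mn = [0] * (4 * n)    # minimum on the node's segment
        self.cnt = [0] * (4 * n)   # how many elements equal that minimum
        self.lazy = [0] * (4 * n)  # pending addition for the node's children
        self._build(1, 0, n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            self.cnt[node] = 1     # a single zero
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.cnt[node] = self.cnt[2 * node] + self.cnt[2 * node + 1]

    def _push(self, node):
        # hand the pending addition down to both children
        if self.lazy[node]:
            for child in (2 * node, 2 * node + 1):
                self.mn[child] += self.lazy[node]
                self.lazy[child] += self.lazy[node]
            self.lazy[node] = 0

    def _pull(self, node):
        # recompute minimum and its count from the two children
        left, right = 2 * node, 2 * node + 1
        m = min(self.mn[left], self.mn[right])
        c = 0
        if self.mn[left] == m:
            c += self.cnt[left]
        if self.mn[right] == m:
            c += self.cnt[right]
        self.mn[node], self.cnt[node] = m, c

    def add(self, l, r, x, node=1, lo=0, hi=None):
        """Add x to every element with index in [l, r]."""
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:           # no intersection
            return
        if l <= lo and hi <= r:        # node segment fully inside [l, r]
            self.mn[node] += x
            self.lazy[node] += x
            return
        self._push(node)
        mid = (lo + hi) // 2
        self.add(l, r, x, 2 * node, lo, mid)
        self.add(l, r, x, 2 * node + 1, mid + 1, hi)
        self._pull(node)

    def _query(self, l, r, node, lo, hi):
        if r < lo or hi < l:
            return (float("inf"), 0)   # neutral element of min
        if l <= lo and hi <= r:
            return (self.mn[node], self.cnt[node])
        self._push(node)
        mid = (lo + hi) // 2
        a = self._query(l, r, 2 * node, lo, mid)
        b = self._query(l, r, 2 * node + 1, mid + 1, hi)
        if a[0] < b[0]:
            return a
        if b[0] < a[0]:
            return b
        return (a[0], a[1] + b[1])

    def count_zeros(self, l, r):
        """Number of zeros with index in [l, r]."""
        m, c = self._query(l, r, 1, 0, self.n - 1)
        return c if m == 0 else 0


tree = ZeroCounter(4)
tree.add(2, 3, 3)
print(tree.count_zeros(1, 2))  # 1
tree.add(3, 3, -3)
print(tree.count_zeros(0, 3))  # 3
```

The recursive layout uses 4*n slots instead of the 2*MAX_N arrays of the pseudocode, which avoids having to round n up to a power of two.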

Related

Given a circular linked list, find a suitable head such that the running sum of the values is never negative

I have a linked list with integers. For example:
-2 → 2
↑ ↓
1 ← 5
How do I find a suitable starting point such that the running sum always stays non-negative?
For example:
If I pick -2 as starting point, my sum at the first node will be -2. So that is an invalid selection.
If I pick 2 as the starting point, the running sums are 2, 7, 8, 6 which are all positive numbers. This is a valid selection.
The brute force algorithm is to pick every node as head and do the calculation and return the node which satisfies the condition, but that is O(𝑛²).
Can this be done with O(𝑛) time complexity or better?
Let's say you start doing a running sum at some head node, and you eventually reach a point where the sum goes negative.
Obviously, you know that the head node you started at is invalid as an answer to the question.
But you also know that all of the nodes contributing to that sum are invalid. You've already checked all the prefixes of that sublist, and you know that all the prefixes have nonnegative sums, so removing any prefix from the total sum can only make it smaller. Also, of course, the last node you added must be negative, so you can't start there either.
This leads to a simple algorithm:
Start a cumulative sum at the head node.
If it becomes negative, discard all the nodes you've looked at and start at the next one
Stop when the sum includes the whole list (success), or when you've discarded all the nodes in the list (no answer exists).
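The three steps above can be sketched in Python as follows. To keep it short, the circular list is represented as a plain Python list that wraps around, and the function name find_start is my own; it returns an index rather than a node.

```python
def find_start(values):
    """Return an index from which every running sum around the circle is
    non-negative, or None if no such index exists."""
    n = len(values)
    if n == 0:
        return None
    start = 0    # current candidate head
    total = 0    # running sum since the candidate
    length = 0   # number of nodes included in the running sum
    while start < n and length < n:
        total += values[(start + length) % n]
        length += 1
        if total < 0:
            # every node looked at so far is ruled out: restart after them
            start += length
            total = 0
            length = 0
    return start if length == n else None


print(find_start([-2, 2, 5, 1]))  # 1, i.e. the node with value 2
```

Each iteration either extends the running sum by one node or discards nodes for good, so the loop runs at most about 2n times.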
The idea is to use a window, i.e. two node references, where one runs ahead of the other, and the sum of the nodes within that window is kept in sync with any (forward) movement of either references. As long as the sum is non-negative, enlarge the window by adding the front node's value and moving the front reference ahead. When the sum turns negative, collapse the window, as all suffix sums in that window will now be negative. The window becomes empty, with back and front referencing the same node, and the running sum (necessarily) becomes zero, but then the forward reference will move ahead again, widening the window.
The algorithm ends when all nodes are in the window, i.e. when the front node reference meets the back node reference. We should also end the algorithm when the back reference hits or overtakes the list's head node, since that would mean we looked at all possibilities, but found no solution.
Here is an implementation of that algorithm in JavaScript. It first defines a class for Node and one for CircularList. The latter has a method getStartingNode which returns the node from where the sum can start and can accumulate without getting negative:
class Node {
    constructor(value, next=null) {
        this.value = value;
        this.next = next;
    }
}
class CircularList {
    constructor(values) {
        // Build a circular list for the given values
        let node = new Node(values[0]);
        this.head = node;
        for (let i = values.length - 1; i > 0; i--) {
            node = new Node(values[i], node);
        }
        this.head.next = node; // close the cycle
    }
    getStartingNode() {
        let looped = false;
        let back = this.head;
        let front = this.head;
        let sum = 0;
        while (true) {
            // As long as the sum is not negative (or window is empty),
            // ...widen the window
            if (front === back || sum >= 0) {
                sum += front.value;
                front = front.next;
                if (front === back) break; // we added all values!
                if (front === this.head) looped = true;
            } else if (looped) {
                // avoid endless looping when there is no solution
                return null;
            } else { // reset window
                sum = 0;
                back = front;
            }
        }
        if (sum < 0) return null; // no solution
        return back;
    }
}
// Run the algorithm for the example given in question
let list = new CircularList([-2, 2, 5, 1]);
console.log("start at", list.getStartingNode()?.value);
As the algorithm is guaranteed to end when the back reference has visited all nodes, and the front reference will never overtake the back reference, this has linear time complexity. It cannot be less, as all node values need to be read to know their sum.
I have assumed that the value 0 is allowed as a running sum, since the title says it should never be negative. If zero is not allowed, then just change the comparison operators used to compare the sum with 0. In that case the comparison back === front is explicitly needed in the first if statement, otherwise you may actually drop it, since that implies the sum is 0, and the second test in that if condition does the job.

How to adapt Fenwick tree to answer range minimum queries

A Fenwick tree is a data structure that gives an efficient way to answer two main queries:
add a value to the element at a particular index of an array: update(index, value)
find the sum of elements from 1 to n: find(n)
Both operations are done in O(log(n)) time and I understand the logic and implementation. It is not hard to implement a bunch of other operations, like finding the sum from N to M.
I wanted to understand how to adapt Fenwick tree for RMQ. It is obvious to change Fenwick tree for first two operations. But I am failing to figure out how to find minimum on the range from N to M.
After searching for solutions majority of people think that this is not possible and a small minority claims that it actually can be done (approach1, approach2).
The first approach (written in Russian; judging by Google Translate it has no explanation and only two functions) relies on three arrays (initial, left and right). In my testing it did not work correctly for all possible test cases.
The second approach requires only one array and based on the claims runs in O(log^2(n)) and also has close to no explanation of why and how should it work. I have not tried to test it.
In light of controversial claims, I wanted to find out whether it is possible to augment Fenwick tree to answer update(index, value) and findMin(from, to).
If it is possible, I would be happy to hear how it works.
Yes, you can adapt Fenwick Trees (Binary Indexed Trees) to
Update value at a given index in O(log n)
Query minimum value for a range in O(log n) (amortized)
We need 2 Fenwick trees and an additional array holding the real values for nodes.
Suppose we have the following array:
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0
We wave a magic wand and the following trees appear:
Note that in both trees each node represents the minimum value for all nodes within that subtree. For example, in BIT2 node 12 has value 0, which is the minimum value for nodes 12,13,14,15.
Queries
We can efficiently query the minimum value for any range by calculating the minimum of several subtree values and one additional real node value. For example, the minimum value for range [2,7] can be determined by taking the minimum value of BIT2_Node2 (representing nodes 2,3) and BIT1_Node7 (representing node 7), BIT1_Node6 (representing nodes 5,6) and REAL_4 - therefore covering all nodes in [2,7]. But how do we know which sub trees we want to look at?
Query(int a, int b) {
int val = infinity // always holds the known min value for our range
// Start traversing the first tree, BIT1, from the beginning of range, a
int i = a
while (parentOf(i, BIT1) <= b) {
val = min(val, BIT2[i]) // Note: traversing BIT1, yet looking up values in BIT2
i = parentOf(i, BIT1)
}
// Start traversing the second tree, BIT2, from the end of range, b
i = b
while (parentOf(i, BIT2) >= a) {
val = min(val, BIT1[i]) // Note: traversing BIT2, yet looking up values in BIT1
i = parentOf(i, BIT2)
}
val = min(val, REAL[i]) // Explained below
return val
}
It can be mathematically proven that both traversals will end in the same node. That node is a part of our range, yet it is not a part of any subtrees we have looked at. Imagine a case where the (unique) smallest value of our range is in that special node. If we didn't look it up our algorithm would give incorrect results. This is why we have to do that one lookup into the real values array.
To help understand the algorithm I suggest you simulate it with pen & paper, looking up data in the example trees above. For example, a query for range [4,14] would return the minimum of values BIT2_4 (rep. 4,5,6,7), BIT1_14 (rep. 13,14), BIT1_12 (rep. 9,10,11,12) and REAL_8, therefore covering all possible values [4,14].
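To make the query concrete, here is a small Python sketch that builds both trees and runs the Query pseudocode. It assumes the usual binary-indexed parent rules, i + (i & -i) for BIT1 and i - (i & -i) for BIT2, which match the node ranges quoted in the examples; the function names build and query are my own, and updates are left out.

```python
def build(real):
    """real is 1-indexed (real[0] is an unused placeholder).

    Returns (bit1, bit2) where
      bit1[i] = min(real[i - lowbit(i) + 1 .. i])
      bit2[i] = min(real[i .. i + lowbit(i) - 1]), clipped to n
    """
    n = len(real) - 1
    bit1 = real[:]
    bit2 = real[:]
    for i in range(1, n + 1):
        p = i + (i & -i)           # parent of i in BIT1
        if p <= n:
            bit1[p] = min(bit1[p], bit1[i])
    for i in range(n, 0, -1):
        p = i - (i & -i)           # parent of i in BIT2
        if p >= 1:
            bit2[p] = min(bit2[p], bit2[i])
    return bit1, bit2


def query(real, bit1, bit2, a, b):
    """Minimum of real[a..b] for 1 <= a <= b <= n."""
    val = float("inf")
    i = a
    while i + (i & -i) <= b:       # climb BIT1, read BIT2
        val = min(val, bit2[i])
        i += i & -i
    i = b
    while i - (i & -i) >= a:       # climb BIT2, read BIT1
        val = min(val, bit1[i])
        i -= i & -i
    return min(val, real[i])       # the one node no subtree covered


real = [0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0]
bit1, bit2 = build(real)
print(query(real, bit1, bit2, 2, 7))   # 0
print(query(real, bit1, bit2, 8, 11))  # 2
```

The array is the 15-element example from above, so the printed values can be checked by hand against the pen-and-paper simulation.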
Updates
Since a node represents the minimum value of itself and its children, changing a node will affect its parents, but not its children. Therefore, to update a tree we start from the node we are modifying and move up all the way to the fictional root node (0 or N+1 depending on which tree).
Suppose we are updating some node in some tree:
If new value < old value, we will always overwrite the value and move up
If new value == old value, we can stop since there will be no more changes cascading upwards
If new value > old value, things get interesting.
If the old value still exists somewhere within that subtree, we are done
If not, we have to find the new minimum value between real[node] and each tree[child_of_node], change tree[node] and move up
Pseudocode for updating node with value v in a tree:
while (node <= n+1) {
if (v > tree[node]) {
if (oldValue == tree[node]) {
v = min(v, real[node])
for-each child {
v = min(v, tree[child])
}
} else break
}
if (v == tree[node]) break
tree[node] = v
node = parentOf(node, tree)
}
Note that oldValue is the original value we replaced, whereas v may be reassigned multiple times as we move up the tree.
Binary Indexing
In my experiments Range Minimum Queries were about twice as fast as a Segment Tree implementation and updates were marginally faster. The main reason for this is using super efficient bitwise operations for moving between nodes. They are very well explained here. Segment Trees are really simple to code, so consider whether the performance advantage is really worth it. The update method of my Fenwick RMQ is 40 lines and took a while to debug. If anyone wants my code I can put it on github. I also produced a brute-force solution and test generators to make sure everything works.
I had help understanding this subject & implementing it from the Finnish algorithm community. Source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf, but they credit Fenwick's 1994 paper for it.
The Fenwick tree structure works for addition because addition is invertible. It doesn't work for minimum, because as soon as you have a cell that's supposed to be the minimum of two or more inputs, you've lost information potentially.
If you're willing to double your storage requirements, you can support RMQ with a segment tree that is constructed implicitly, like a binary heap. For an RMQ with n values, store the n values at locations [n, 2n) of an array. Locations [1, n) are aggregates, with the formula A(k) = min(A(2k), A(2k+1)). Location 2n is an infinite sentinel. The update routine should look something like this.
def update(n, a, i, x):  # value[i] = x
    i += n
    a[i] = x
    # update the aggregates
    while i > 1:
        i //= 2
        a[i] = min(a[2*i], a[2*i+1])
The multiplies and divides here can be replaced by shifts for efficiency.
The RMQ pseudocode is more delicate. Here's another untested and unoptimized routine.
def rmq(n, a, i, j):  # min(value[i:j])
    i += n
    j += n
    x = inf
    while i < j:
        if i%2 == 0:
            i //= 2
        else:
            x = min(x, a[i])
            i = i//2 + 1
        if j%2 == 0:
            j //= 2
        else:
            x = min(x, a[j-1])
            j //= 2
    return x

Hard Coding Depth First Search Results (or Optimizing?)

I need to get all possible paths of a tree so I implemented a DFS like this:
void bisearch(std::vector<int> path, int steps,
              int node, std::vector<std::vector<int>> *paths) {
    int sum = 0;
    if (path.size() == steps) {
        for(std::vector<int>::iterator it=path.begin(); it != path.end(); ++it) {
            sum += (*it);
        }
        if (sum == node)
            paths->push_back(path);
    }
    else {
        std::vector<int> uPath(path);
        uPath.push_back(1);
        bisearch(uPath, steps, node, paths);
        std::vector<int> dPath(path);
        dPath.push_back(0);
        bisearch(dPath, steps, node, paths);
    }
}
The above code gives me all paths to some ending node for a tree of length "steps". I then loop through all ending nodes and run this to get every path. The issue is that it takes forever! I was thinking of maybe hardcoding all the possible combinations to speed it up. Of course I couldn't do this by hand, since for instance a tree with 25 steps would have 2^25 ~= 33.5 million possible combinations, but maybe I could print the output from the search and use that to hardcode? Or does anyone see any easy optimizations I could make that would make a big difference to the performance? Thanks.
EDIT: Let me clarify a little. I need the path, that is the sequence of movements along the tree where 1 represents a right hand move and 0 a left (or up/down whichever you prefer). So for instance a 2 step tree I need the four ordered pairs (1,0) (0,1) (1,1) (0,0).
Since "all the combinations" should mean just "the combinations of turning right / left at a certain level", you could just loop through 0 to 2 ^ n - 1, and the binary representation padded with 0 in the front might be just what you want.
If what you want is the count of paths whose number of left turns equals a certain number k, then this is just the count of numbers from 0 to 2 ^ n - 1 that have k bits equal to 1, and you could possibly use this to compute the result you want.
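A minimal Python sketch of this bitmask idea, assuming the encoding from the question (1 is a right-hand move, 0 a left-hand move) and that the ending node is identified by the number of 1-moves, as in the sum test of bisearch. The names all_paths and paths_ending_at are mine.

```python
def all_paths(steps):
    """Yield every path of the given length as a tuple of 0/1 moves."""
    for mask in range(1 << steps):
        # the most significant bit is the first move, padded with zeros
        yield tuple((mask >> (steps - 1 - i)) & 1 for i in range(steps))


def paths_ending_at(steps, node):
    """Paths whose number of 1-moves equals node."""
    return [p for p in all_paths(steps) if sum(p) == node]


print(list(all_paths(2)))     # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(paths_ending_at(3, 1))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

Because all_paths is a generator, the 2^n paths never have to be held in memory at once, and a single pass replaces the separate search per ending node.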

Trouble with a stack based algorithm

I'm working on this programming assignment. It tests our understanding of stacks and their applications. I find it extremely difficult to come up with an algorithm that can work efficiently and accurately. Some of their test cases have 200,000+ "trees"! While my algorithm can work for simpler test cases with less than 10 trees, it failed in the accuracy and efficiency departments when the number of "trees" is exceedingly large (from 100+ onwards).
I would appreciate it very much, if you guys can kindly give me a hint or point me to the right direction. Thank you.
Task Statement
Monkeys like to swing from tree to tree. They can swing from one tree
to another directly as long as there is no tree in between that is
taller than or have the same height as either one of the two trees.
For example, if there are 5 trees with heights 19m, 17m, 20m, 20m and
20m lining up in that order, then the monkey will be able to swing
from one tree to the other as shown below:
1. from first tree to second tree
2. from first tree to third tree
3. from second tree to third tree
4. from third tree to fourth tree
5. from fourth tree to fifth tree
Tarzan, the king of jungle who is able to communicate with the
monkeys, wants to test the monkeys to see if they know how to count
the total number of pairs of trees that they can swing directly from
one to the other. But he himself is not very good in counting. So he
turns to you, the best Java programmer in the country, to write a
program for getting the correct count for the trees in different parts
of the jungle.
Input
The first line contains N, the number of trees in the path. The next
line contains N integers a1 a2 a3 ... aN, where ai represents the
height of the i-th tree in the path, 0 < ai ≤ 2^31 and 2 ≤ N ≤ 500,000.
Note that short symbol N is used above for convenience. In your
program, you are expected to give it a descriptive name.
Output
The total number of pairs of trees which the monkeys can swing
directly from one to the other with the given list of tree heights.
Sample Input 1
4
3 4 1 2
Sample Output 1
4
Sample Input 2
5
19 17 20 20 20
Sample Output 2
5
Sample Input 3
4
12 21 21 12
Sample Output 3
3
Here's my code. So this is a method that returns the number of pairs of trees a monkey can swing with. The parameter is an array of inputs.
My algorithm goes as follows:
we set numPairs to be (array length - 1), since a monkey can always swing between two adjacent trees.
now we find the extra numPairs (extra trees to swing with).
push the first input into the empty stack
we enter a for loop:
for the next input until the end of array:
case1:
if the top of the stack is smaller than the current input and the size of the stack is equal to 1, then we replace the top with the input.
case2:
if the top of the stack smaller than the current input and the size of the stack is bigger than 1, we pop the top, and enter a while loop to pop the previous elements which is smaller than the current top of the stack.
we then push the current input after we exit the while loop.
case3:
otherwise, if the above conditions are not satisfied, we simply push the current input into the stack.
we exit the for loop
return the numPairs
public int solve(int[] arr) {
int input, temp;
numPairs = arr.length-1;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty())
stack.push(input);
else if(!stack.isEmpty())
{
if(input>stack.peek() && stack.size() == 1)
{
stack.pop();
stack.push(input);
}
else if(input>stack.peek() && stack.size() > 1)
{
temp = stack.pop();
while(!stack.isEmpty() && temp < stack.peek())
{
numPairs++;
temp = stack.pop();
}
stack.push(input);
//numPairs++;
}
else
stack.push(input);
}
}
return numPairs;
}
Here's my solution, it's an iterative one.
class Result {
// declare the member field
Stack<Integer> stack;
int numPairs = 0;
// declare the constructor
public Result()
{
stack = new Stack<Integer>();
}
/*
* solve : to compute the result, return the result
* Pre-condition : parameter must be of array of integer type
* Post-condition : return the number of tree pairs that can be swung with
*/
public int solve(int[] arr) {
// implementation
int input;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty()) //if stack is empty, just push the input
stack.push(input);
else if(!stack.isEmpty())
{
//do a while loop to pop all possible top stack element until
//the top element is bigger than the input
//or the stack is empty
while(!stack.isEmpty() && input > stack.peek())
{
stack.pop();
numPairs++;
}
//if the stack is empty after exiting the while loop
//push the current element onto the stack
if(stack.isEmpty())
stack.push(input);
//this condition applies for two cases:
//1. the while loop is never entered because the input is smaller than the top element by default
//2. the while loop is exited and the input is pushed onto the non-empty stack with numPairs being incremented
else if(!stack.isEmpty() && input < stack.peek())
{
stack.push(input);
numPairs++;
}
//this is the last condition:
//the input is never pushed if the input is identical to the top element
//instead we increment the numPairs
else if(input == stack.peek())
numPairs++;
}
}
return numPairs;
}
}
If I understand the problem correctly, there are two kinds of trees accessible to each other:
Trees that are next to each other (adjacent) are always accessible to each other
Trees that are not adjacent are only accessible if all the trees in between are shorter than both of the trees.
One might come up with several types of solutions for this:
The brute force solution: compare every tree to every other tree checking the conditions above. Running time: O(n^2)
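Find near accessible neighbors solution: look for near neighbors that are accessible. Running time: close to O(n) on typical inputs, but still O(n^2) in the worst case (for example when the heights are strictly decreasing). Here's how this would work: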
Build an array of tree sizes in order that they are given. Then walk this array in order and for every tree at index i:
Going to the right from i
If tree at i+1 is taller than tree at i, break out (no more accessible neighbors can be found)
Add 1 to the count of accessible trees if tree at i+1 is shorter than tree at i+2
Do the same for trees i+2, i+3.. etc. until you find a tree that is taller than tree at i.
This will get a count of non-adjacent accessible trees for every tree. Then just add N-1 to the count to account for all the adjacent pairs (each pair is counted once, since we only look to the right), and you are done.
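For checking any of these approaches against each other, here is a short Python sketch. count_pairs follows the stack-based counting of the iterative Java solution above, and count_pairs_brute is the O(n^2) brute force applied literally to the rule from the task statement. Both function names are my own.

```python
def count_pairs(heights):
    """Count pairs of trees where every tree in between is strictly
    shorter than both ends, using a strictly decreasing stack."""
    stack = []
    pairs = 0
    for h in heights:
        while stack and stack[-1] < h:
            stack.pop()        # this shorter tree pairs with h, then is hidden
            pairs += 1
        if stack:
            pairs += 1         # the nearest tree that is >= h also pairs with h
        if not stack or stack[-1] > h:
            stack.append(h)    # an equal-height tree is not pushed again
    return pairs


def count_pairs_brute(heights):
    """Check every pair directly against the swing rule."""
    n = len(heights)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if all(heights[k] < min(heights[i], heights[j])
               for k in range(i + 1, j))
    )


print(count_pairs([3, 4, 1, 2]))           # 4
print(count_pairs([19, 17, 20, 20, 20]))   # 5
```

Every height is pushed and popped at most once, so the stack version is O(n) even for the 500,000-tree inputs.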

Store the largest 5000 numbers from a stream of numbers

Given the following problem:
"Store the largest 5000 numbers from a stream of numbers"
The solution which springs to mind is a binary search tree maintaining a count of the number of nodes in the tree and a reference to the smallest node once the count reaches 5000. When the count reaches 5000, each new number to add can be compared to the smallest item in the tree. If greater, the new number can be added then the smallest removed and the new smallest calculated (which should be very simple already having the previous smallest).
My concern with this solution is that the binary tree is naturally going to get skewed (as I'm only deleting on one side).
Is there a way to solve this problem which won't create a terribly skewed tree?
In case anyone wants it, I've included pseudo-code for my solution so far below:
process(number)
{
if (count == 5000 && number > smallest.Value)
{
addNode( root, number)
smallest = deleteNodeAndGetNewSmallest ( root, smallest)
}
}
deleteNodeAndGetNewSmallest( lastSmallest)
{
if ( lastSmallest has parent)
{
if ( lastSmallest has right child)
{
smallest = getMin(lastSmallest.right)
lastSmallest.parent.right = lastSmallest.right
}
else
{
smallest = lastSmallest.parent
}
}
else
{
smallest = getMin(lastSmallest.right)
root = lastSmallest.right
}
count--
return smallest
}
getMin( node)
{
if (node has left)
return getMin(node.left)
else
return node
}
add(number)
{
//standard implementation of add for BST
count++
}
The simplest solution for this is maintaining a min heap of max size 5000.
Every time a new number arrives - check if the heap is smaller than
5000, if it is - add it.
If it is not - check if the minimum is smaller than the new
element, and if it is, pop it out and insert the new element instead.
When you are done - you have a heap containing 5000 largest elements.
This solution is O(nlogk) complexity, where n is the number of elements and k is the number of elements you need (5000 in your case).
It can also be done in O(n) using a selection algorithm - store all the elements, then find the 5001st largest element, and return everything bigger than it. But it is harder to implement, and for reasonably sized input it might not be better. Also, if the stream contains duplicates, more processing is needed.
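The min-heap approach described above can be sketched with Python's standard heapq module. The function name largest_k and the parameter k are my own; 5000 is just the default taken from the question.

```python
import heapq


def largest_k(stream, k=5000):
    """Return the k largest items seen in stream (in heap order)."""
    if k <= 0:
        return []
    heap = []                           # min-heap: heap[0] is the smallest kept item
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the minimum and push x in one step
    return heap


print(sorted(largest_k([5, 1, 9, 3, 7, 9], k=3)))  # [7, 9, 9]
```

The heap never holds more than k items, so memory stays bounded no matter how long the stream is, and there is no tree to become skewed.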
Use a (minimum) priority queue. Add each incoming item to the queue and when the size reaches 5,000 remove the minimum (top) element every time you add an incoming element. The queue will contain the 5,000 largest elements and when the input stops, just remove the contents. This MinPQ is also called a heap but that is an overloaded term. Insertions and deletions take about log2(N). Where N maxes out at 5,000 this would be just over 12 [log2(4096) = 12] times the number of items you are processing.
An excellent source of info is Algorithms, (4th Edition) by Robert Sedgewick and Kevin Wayne. There is an excellent MOOC on coursera.org that is based on this text.
