Find the largest subarray with all ones - algorithm

Given a binary array (each element is 0 or 1), I need to find the maximum length of a subarray consisting entirely of ones within a given range [l, r] of the array.
I know the O(n) approach for a single query, but with O(n) queries the overall complexity becomes O(n^2).
I know that a segment tree is used for this type of problem, but I can't figure out how to build the tree for this one.
Can I build a segment tree that answers each query in O(log n) time, so that O(n) queries take O(n log n) overall?

Let A be your binary array.
Build two arrays IL and IR:
- IL contains, in order, every i such that A[i] = 1 and (i = 0 or A[i-1] = 0);
- IR contains, in order, every i such that A[i-1] = 1 and (i = N or A[i] = 0).
In other words, for any i, the range defined by IL[i] inclusive and IR[i] non-inclusive corresponds to a sequence of 1s in A.
Now, for any query {L, R} (for the range [L; R] inclusive), let S = 0. Traverse both IL and IR with i, until IL[i] >= L. At this point, if i > 0 and IR[i-1] > L, the previous run covers L, so set S = min(IR[i-1], R+1) - L. Continue traversing IL and IR, setting S = max(S, IR[i]-IL[i]), while IR[i] <= R+1. Finally, if a run remains with IL[i] <= R, it is cut off by R, so set S = max(S, R-IL[i]+1).
S is now the size of the greatest sequence of 1s in A between L and R.
The complexity of building IL and IR is O(N), and the complexity of answering a query is O(M), with M the length of IL or IR.
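As a rough illustration of the IL/IR idea, here is a minimal Python sketch. The function names `build_runs` and `longest_ones` are made up for this example, and it deviates slightly from the description above: it uses `bisect` to jump to the first relevant run instead of traversing from the start, and it clips every run to [L, R] rather than treating the first and last run separately.

```python
from bisect import bisect_right

def build_runs(a):
    """Return (il, ir): il[i] is the inclusive start and ir[i] the exclusive end of the i-th run of 1s."""
    il, ir = [], []
    n = len(a)
    for i in range(n):
        if a[i] == 1 and (i == 0 or a[i - 1] == 0):
            il.append(i)
        if a[i] == 1 and (i == n - 1 or a[i + 1] == 0):
            ir.append(i + 1)
    return il, ir

def longest_ones(il, ir, left, right):
    """Longest run of 1s inside the inclusive range [left, right]."""
    best = 0
    i = bisect_right(ir, left)  # first run that ends after `left`
    while i < len(il) and il[i] <= right:
        # Clip the run to the query range before measuring it.
        best = max(best, min(ir[i], right + 1) - max(il[i], left))
        i += 1
    return best

il, ir = build_runs([0, 1, 1, 0, 1, 1, 1, 0])
print(longest_ones(il, ir, 2, 5))
```

A query still costs time proportional to the number of runs it touches, so this does not reach the O(log n) bound the question asks for.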

Yes, you can use a segment tree to solve this problem.
Let's think about what that tree must look like. Obviously, every node must contain the length of the longest subarray of 1s in its range.
Now, how do we join two nodes into a bigger one? In other words, you have a node representing [low, mid) and a node representing [mid, high), and you have to obtain the max subarray for [low, high).
First things first, max for whole will at least be max for parts. So we have to take the maximum among the left and right values.
But what if the real max subarray overlaps both nodes? Well, then it must be the rightmost part of left node and leftmost part of right node. So, we need to keep track of longest subarray at start and end as well.
Now, how to update these left and rightmost subarray lengths? Well, leftmost of parent node must be leftmost of left child, unless leftmost of left child spans the entire left node. In that case, leftmost of parent node will be leftmost of left + leftmost of right node.
A similar rule applies to tracking the rightmost subarray of 1s.
And we're finished. Here are the final rules in pseudo code (length[x] is the number of elements covered by node x).
max_sub[parent] = max(max_sub[left], max_sub[right], right_sub[left] + left_sub[right])
left_sub[parent] = left_sub[left] if left_sub[left] < length[left] else left_sub[left] + left_sub[right]
right_sub[parent] = right_sub[right] if right_sub[right] < length[right] else right_sub[right] + right_sub[left]
Note that you will need to take similar steps when finding the result for a range.
[Figure: example tree for the array [0, 1, 1, 0, 1, 1, 1, 0]]
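The merge rules above can be sketched in Python as follows. This is only one possible layout, not a definitive implementation: the class name `OnesTree`, the tuple order `(length, max_sub, left_sub, right_sub)` and the half-open query interface are assumptions made for this example.

```python
EMPTY = (0, 0, 0, 0)  # (length, max_sub, left_sub, right_sub) of an empty range

def merge(a, b):
    """Combine the summaries of two adjacent ranges, a on the left and b on the right."""
    len_a, max_a, left_a, right_a = a
    len_b, max_b, left_b, right_b = b
    left = left_a if left_a < len_a else len_a + left_b
    right = right_b if right_b < len_b else len_b + right_a
    return (len_a + len_b, max(max_a, max_b, right_a + left_b), left, right)

class OnesTree:
    def __init__(self, bits):
        self.n = len(bits)
        self.tree = [EMPTY] * (4 * max(1, self.n))
        if self.n:
            self._build(bits, 1, 0, self.n)

    def _build(self, bits, node, lo, hi):
        # The node covers bits[lo:hi].
        if hi - lo == 1:
            self.tree[node] = (1, bits[lo], bits[lo], bits[lo])
            return
        mid = (lo + hi) // 2
        self._build(bits, 2 * node, lo, mid)
        self._build(bits, 2 * node + 1, mid, hi)
        self.tree[node] = merge(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, l, r):
        """Length of the longest run of 1s in bits[l:r] (half-open range)."""
        return self._query(1, 0, self.n, l, r)[1]

    def _query(self, node, lo, hi, l, r):
        if r <= lo or hi <= l:
            return EMPTY
        if l <= lo and hi <= r:
            return self.tree[node]
        mid = (lo + hi) // 2
        # Partial results are merged with the same rules as the nodes.
        return merge(self._query(2 * node, lo, mid, l, r),
                     self._query(2 * node + 1, mid, hi, l, r))

tree = OnesTree([0, 1, 1, 0, 1, 1, 1, 0])
print(tree.query(0, 8), tree.query(0, 4))
```

Each query visits O(log n) nodes, and the partial summaries are combined with the same `merge` as during construction, which is the "similar steps" the note above refers to.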

Related

Find minimum number of steps to collect target number of coins

We are given a list of n houses, each with a certain number of coins in it, and a target value t. We have to find the minimum number of steps required to reach the target.
The person can choose to start at any house and then go right or left, collecting coins in that direction until reaching the target value. But the person cannot change direction.
Example: 5 1 2 3 4. Suppose these are the coin values in 5 houses and the target is 13; then the minimum number of steps required is 5, because we have to collect all the coins.
My Thoughts:
One way will be for each index i calculate the steps required in left or right direction to reach the target and then take the minimum of all these 2*n values.
Could there be a better way ?
First, let's simplify and canonize the problem.
Observation 1: The "choose direction" capability is redundant, if you choose to go from house j to house i, you can also go from i to j to have the same value, so it is sufficient to look at one direction only.
Observation 2: Now that we can look at the problem as going from left to right (observation 1), it is clear that we are looking for a subarray whose sum is at least k (k being the target t from the question).
This means that we can canonize the problem:
Given an array with non negative values a, find minimal subarray
with values summing k or more.
There are various ways to solve this. One simple solution using a sorted map (a balanced tree, for example) is to go from left to right, summing values, and looking up the most recently seen prefix sum that is at most sum - k.
Pseudo code:
solve(array, k):
    min_houses = inf
    sum = 0
    map = new TreeMap()   // prefix sum -> index of the last element included in it
    map.insert(0, -1)     // this solves the issue where the first element is sufficient on its own
    for i from 0 to array.len() - 1:
        sum = sum + array[i]
        candidate = map.FindClosestLowerOrEqual(sum - k)   // index stored for that key
        if candidate != null:   // otherwise no matching sum, yet
            min_houses = min(min_houses, i - candidate)
        map.insert(sum, i)   // always record the current prefix sum
    return min_houses
This solution runs in O(nlogn), as each map insertion takes O(logn), and there are n+1 of those.
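Python has no built-in TreeMap, so the sketch below swaps in a different container: because the values are non-negative, the prefix sums are already nondecreasing, and a plain list searched with `bisect` can play the role of the sorted map. The function name `min_houses_sorted` is made up for this example, and it assumes k >= 1.

```python
from bisect import bisect_right

def min_houses_sorted(coins, k):
    """Length of the shortest contiguous run of houses holding at least k coins, or None."""
    prefix = [0]  # prefix[j] = sum(coins[:j]); nondecreasing because coins are non-negative
    best = None
    for i, c in enumerate(coins):
        total = prefix[-1] + c
        # Rightmost j with prefix[j] <= total - k, so that coins[j..i] sums to at least k.
        j = bisect_right(prefix, total - k) - 1
        if j >= 0:
            length = i + 1 - j
            best = length if best is None else min(best, length)
        prefix.append(total)
    return best

print(min_houses_sorted([5, 1, 2, 3, 4], 13))
```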
An optimization, running in O(n), can be done if we take advantage of "non negative" trait of the array. This means, as we go on in the array - the candidate chosen (in the map seek) is always increasing.
We can utilize it to have two pointers running concurrently, and finding best matches, instead of searching from scratch in the index as we did before.
solve(array, k):
    left = 0
    sum = 0
    min_houses = infinity
    for right from 0 to len(array) - 1:
        sum = sum + array[right]
        while (left <= right && sum >= k):
            // the window [left, right] holds right - left + 1 houses
            min_houses = min(min_houses, right - left + 1)
            sum = sum - array[left]
            left = left + 1
    return min_houses
This runs in O(n), as each index is increased at most n times, and every operation is O(1).
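For reference, a minimal Python sketch of the two-pointer version. The name `min_houses` is made up for this example; it assumes non-negative coin values and k >= 1, and returns None when the target cannot be reached.

```python
def min_houses(coins, k):
    """Length of the shortest contiguous run of houses holding at least k coins, or None."""
    best = None
    left = 0
    window = 0  # sum of coins[left..right]
    for right, c in enumerate(coins):
        window += c
        # Shrink from the left while the window still reaches the target.
        while left <= right and window >= k:
            length = right - left + 1
            best = length if best is None else min(best, length)
            window -= coins[left]
            left += 1
    return best

print(min_houses([5, 1, 2, 3, 4], 13))
```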

Sum of all numbers less than k in a range using segment tree

I have an array A with elements size <= 10^6.
I want to implement a data structure which gives me the sum of all elements less than k in a particular range, say l to r.
I know it can be solved using a segment tree, but I don't know how to maintain the segment tree when k varies between queries.
Please help me with pseudo code.
As there are no updates I think Mo's Algorithms could also be used.
The following assumes the elements in your array are all positive.
How about not maintaining a segment tree for a specific k, but resolving k at query time instead?
Just consider your segment tree.
At each node Node_i, you know:
its covering sum: s_i
the number of elements it covers: n_i
the maximum of the elements it covers: m_i
So two steps:
For a given range query, get down to the corresponding node Node_i.
For that Node_i, s_i is the sum of its two children's sums. For each child Node_j, there are two possibilities:
m_j < k: all elements are less than k
m_j >= k: at least one element is greater than or equal to k
(Comparing n_j*k with s_j is not enough for this test: an average below k does not rule out a single large element.)
In the first case, the child's sum is already valid, nothing more to do.
In the second case, you have to explore the child, and so forth until there is nothing more to do.
At some point (if you have an invalid element) you will reach the bottom of the tree: that very node (also an element) is bad, and you backtrack that fact.
When you get back to your node Node_i, you subtract from s_i the values of all the bad leaf nodes you found.
pseudo code is:
# node is like:
#   children: [c1, c2]
#   n: number of elements covered
#   sum: sum of all elements it covers
#   max: largest element it covers

# returns the sum of the covered elements whose value is greater than or equal to k
def explore(node, k):
    # terminal case
    if node.n == 1:
        if node.sum >= k:
            return node.sum
        # when the range query is of size 1,
        # you may want to handle that elsewhere (e.g. before calling explore)
        return 0

    # standard case
    totalsum = 0
    for c in node.children:
        if c.max < k:
            # all its elements are less than k, subtract nothing
            totalsum += 0
        else:
            totalsum += explore(c, k)
    return totalsum
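Here is a small Python sketch of the same query-time idea. It rests on assumptions of this example rather than anything stated in the thread: each node stores the maximum of its range next to the sum, since a whole subtree can safely be counted only when its maximum is below k, and the sketch returns the sum of the elements below k directly instead of subtracting the bad ones. The names `Node` and `sum_below` are made up.

```python
class Node:
    """Segment tree node covering values[lo:hi], storing the sum and maximum of that range."""

    def __init__(self, values, lo, hi):
        self.lo, self.hi = lo, hi
        if hi - lo == 1:
            self.left = self.right = None
            self.sum = self.max = values[lo]
        else:
            mid = (lo + hi) // 2
            self.left = Node(values, lo, mid)
            self.right = Node(values, mid, hi)
            self.sum = self.left.sum + self.right.sum
            self.max = max(self.left.max, self.right.max)

def sum_below(node, l, r, k):
    """Sum of values[i] for l <= i < r with values[i] < k."""
    if r <= node.lo or node.hi <= l:
        return 0  # no overlap with the query range
    if l <= node.lo and node.hi <= r and node.max < k:
        return node.sum  # every element of this node qualifies
    if node.left is None:
        return 0  # a single element that is >= k
    return sum_below(node.left, l, r, k) + sum_below(node.right, l, r, k)

root = Node([3, 1, 4, 1, 5, 9, 2, 6], 0, 8)
print(sum_below(root, 0, 8, 4))
```

In the worst case (large and small values alternating) a query can still visit most of the tree, so this is a pruning heuristic rather than a guaranteed O(log n) query.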
If your k is fixed you can map array values as follows:
If an element is less than k, put its value in the leaf; otherwise put 0. Then you can use the standard sum function, because all elements greater than or equal to k will be 0 in the leaves and won't affect the sum.

Find two elements with smallest absolute difference in an interval

I'm given an array and a list of queries of type L R which mean find the smallest absolute difference between any two array elements such that their indices are between L and R inclusive (Here the starting index of array is at 1 instead of at 0)
For example take the array a with elements 2 1 8 5 11 then the query 1-3 which would be (2 1 8) the answer would be 1=2-1, or the query 2-4 (1 8 5) where the answer would be 3=8-5
Now this is easy if you only have to look at one interval: you sort the interval, compare the i-th element with the (i+1)-th, and keep the minimum difference over all i.
The problem is that I'll have a lot of intervals to check, and I have to keep the original array intact.
What I've done is I constructed a new array b with indices from the first one such that a[b[i]] <= a[b[j]] for i <= j. Now for each query I loop through the whole array and look if b[j] is between L and R if it is compare its absolute difference to the first next element that is also between L and R keep the minimum and then do the same for that element until you get to the end.
This is inefficient because for each query I have to check all elements of the array especially if the query is small compared to the size of array. I'm looking for a time efficient approach.
EDIT: The numbers don't have to be consecutive, perhaps I gave a bad array as an example, What I've meant for example if it's 1 5 2 then the smallest difference is 1=2-1. In a sorted array the smallest difference is guaranteed to be between two consecutive elements, that's why I've thought of sorting
I'll sketch an O(n (√n) log n)-time solution, which might be fast enough? When I gave up sport programming, computers were a lot slower.
The high-level idea is to apply Mo's trick to a data structure with the following operations.
insert(x) - inserts x into the underlying multiset
delete(x) - deletes one copy of x from the underlying multiset
min-abs-diff() - returns the minimum absolute difference between two elements of the multiset (0 if some element has multiplicity >1)
Read in all of the query intervals [l, r], sort them in order of lexicographically nondecreasing (floor(l / sqrt(n)), r) where n is the length of the input, and then to process an interval I, insert the elements in I - I' where I' was the previous interval, delete the elements in I' - I, and report the minimum absolute difference. (The point of the funny sort order is to reduce the number of operations from O(n^2) to O(n √n) assuming n queries.)
There are a couple ways to implement the data structure to have O(log n)-time operations. I'm going to use a binary search tree for clarity of exposition, but you could also sort the array and use a segment tree (less work if you don't have a BST implementation that lets you specify decorations).
Add three fields to each BST node: min (minimum value in the subtree rooted at this node), max (maximum value in the subtree rooted at this node), min-abs-diff (minimum absolute difference between values in the subtree rooted at this node). These fields can be computed bottom-up like so.
if node v has left child u and right child w:
    v.min = u.min
    v.max = w.max
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max,
                         w.min - v.value, w.min-abs-diff)
if node v has left child u and no right child:
    v.min = u.min
    v.max = v.value
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
if node v has no left child and right child w:
    v.min = v.value
    v.max = w.max
    v.min-abs-diff = min(w.min - v.value, w.min-abs-diff)
if node v has no left child and no right child:
    v.min = v.value
    v.max = v.value
    v.min-abs-diff = ∞
This logic can be implemented pretty compactly.
if v has a left child u:
    v.min = u.min
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
else:
    v.min = v.value
    v.min-abs-diff = ∞
if v has a right child w:
    v.max = w.max
    v.min-abs-diff = min(v.min-abs-diff, w.min - v.value, w.min-abs-diff)
else:
    v.max = v.value
insert and delete work as usual, except that the decorations need to be updated along the traversal path. The total time is still O(log n) for reasonable container choices.
min-abs-diff is implemented by returning root.min-abs-diff where root is the root of the tree.
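The following Python sketch puts the pieces together using the segment-tree variant mentioned above instead of a decorated BST: the array is sorted once, every element gets its own leaf at its rank, and insert/delete switch that leaf on or off. All names (`MinDiffMultiset`, `set_active`, `min_abs_diff_queries`) are made up for this example, queries are taken as 0-based inclusive pairs, and a query covering a single element yields infinity.

```python
import math

INF = float("inf")

class MinDiffMultiset:
    """Segment tree over ranks in sorted order; each leaf is one array element, active or not."""

    def __init__(self, sorted_values):
        self.vals = sorted_values
        self.size = 1
        while self.size < len(sorted_values):
            self.size *= 2
        self.lo = [INF] * (2 * self.size)    # smallest active value in the subtree
        self.hi = [-INF] * (2 * self.size)   # largest active value in the subtree
        self.diff = [INF] * (2 * self.size)  # smallest gap between active values

    def set_active(self, rank, active):
        i = rank + self.size
        self.lo[i] = self.vals[rank] if active else INF
        self.hi[i] = self.vals[rank] if active else -INF
        i //= 2
        while i >= 1:
            l, r = 2 * i, 2 * i + 1
            self.lo[i] = min(self.lo[l], self.lo[r])
            self.hi[i] = max(self.hi[l], self.hi[r])
            # Gap across the middle; evaluates to INF when either side is empty.
            self.diff[i] = min(self.diff[l], self.diff[r], self.lo[r] - self.hi[l])
            i //= 2

def min_abs_diff_queries(a, queries):
    n = len(a)
    order = sorted(range(n), key=lambda i: a[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    tree = MinDiffMultiset([a[i] for i in order])
    block = max(1, int(math.sqrt(n)))
    # Mo's order: by block of the left endpoint, then by right endpoint.
    by_block = sorted(range(len(queries)),
                      key=lambda q: (queries[q][0] // block, queries[q][1]))
    answers = [None] * len(queries)
    cur_l, cur_r = 0, -1  # current window a[cur_l..cur_r], initially empty
    for q in by_block:
        l, r = queries[q]
        while cur_r < r:
            cur_r += 1
            tree.set_active(rank[cur_r], True)
        while cur_l > l:
            cur_l -= 1
            tree.set_active(rank[cur_l], True)
        while cur_r > r:
            tree.set_active(rank[cur_r], False)
            cur_r -= 1
        while cur_l < l:
            tree.set_active(rank[cur_l], False)
            cur_l += 1
        answers[q] = tree.diff[1]
    return answers

print(min_abs_diff_queries([2, 1, 8, 5, 11], [(0, 2), (1, 3)]))
```

Giving equal values distinct ranks means duplicates occupy separate leaves, so a repeated value shows up as a gap of 0 without any multiplicity bookkeeping.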
EDIT #2: My answer determines the smallest difference between any two adjacent values in a sequence, not the smallest difference between any two values in the sequence.
When you say that you have a lot of intervals to check, do you happen to mean that you have to perform checks of many intervals over the same sequence of numbers? If so, what if you just pre-computed the differences from one number to the next? E.g., in Python:
elements = [2, 1, 8, 5, 11]

def get_differences(sequence):
    """Yield absolute differences between each pair of adjacent items in the sequence"""
    it = iter(sequence)
    sentinel = object()
    previous = next(it, sentinel)
    if previous is sentinel:
        return  # empty input: yield nothing
    for current in it:
        yield abs(previous - current)
        previous = current

differences = list(get_differences(elements)) # differences = [1, 7, 3, 6]
Then when you have to find the minimum difference, just return min(differences[start_index:stop_index-1]).
EDIT: I missed your paragraph:
Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.
But I still think what I'm saying makes sense; you don't have to sort the entire collection but you still need to do an O(n) operation. If you're dealing with numeric values on a platform where the numbers can be represented as machine integers or floats, then as long as you use an array-like container, this should be cache friendly and relatively efficient. If you happen to have repeated queries, you might be able to do some memoization to cache pre-computed results.

Finding longest overlapping interval pair

Say I have a list of n integral intervals [a,b], each representing the set S = {a, a+1, ..., b}. The overlap of two intervals is defined as |S_1 \cap S_2|. Example: [3,6] and [5,9] overlap on [5,6], so the length of that overlap is 2. The task is to find the two intervals with the longest overlap in o(n^2) time (little-o), using just recursion and not dynamic programming.
Naive approach is obviously brute force, which does not hold with time complexity condition. I was also unsuccessful trying sweep line algo and/or Longest common subsequence algorithm.
I just cannot find a way of dividing it into subproblems. Any ideas would be appreciated.
Also found this, which in my opinion does not work at all:
Finding “maximum” overlapping interval pair in O(nlog(n))
Here is an approach that takes N log(N) time.
Break down every interval, e.g. [a,b] and [c,d], into an array of pairs like this:
pair<a,-1>
pair<b,a>
pair<c,-1>
pair<d,c>
Sort these pairs in increasing order. Since interval starts are marked with -1, in case of ties they come ahead of interval ends (this assumes no interval starts at a negative value).
for i = 0 to end of the pair array
    if current pair represents interval start
        put it in a multiset
    else
        remove the interval start corresponding to this interval end from the multiset.
        if the multiset is not empty
            update the maxOverlap with (current_interval_end - max(minimum_value_in_multiset,start_value_of_current_interval)+1)
This approach should update the maxOverlap to the highest possible value.
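A minimal Python sketch of this sweep, under a few assumptions of its own: the multiset is replaced by a min-heap with lazy deletion (Python has no built-in multiset), an explicit 0/1 flag orders starts before ends on ties instead of the -1 marker, and only the overlap length is returned rather than the pair of intervals. The function name `max_overlap` is made up.

```python
import heapq
from collections import Counter

def max_overlap(intervals):
    """Largest |S_1 ∩ S_2| over all pairs of closed integer intervals; 0 if none overlap."""
    events = []
    for a, b in intervals:
        events.append((a, 0, a))  # start event; the 0 sorts it before end events on ties
        events.append((b, 1, a))  # end event, carrying the start of its own interval
    events.sort()

    active = []          # min-heap holding the starts of currently open intervals
    removed = Counter()  # starts that are closed but not yet popped from the heap
    best = 0
    for x, kind, start in events:
        if kind == 0:
            heapq.heappush(active, start)
            continue
        removed[start] += 1
        # Drop stale entries so the heap top is the smallest start still open.
        while active and removed[active[0]] > 0:
            removed[active[0]] -= 1
            heapq.heappop(active)
        if active:
            best = max(best, x - max(active[0], start) + 1)
    return best

print(max_overlap([(3, 6), (5, 9)]))
```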
Keep info about the two largest overlapping intervals max1 and max2 (empty in the beginning).
Sort the input list [x1, y1] .. [xn, yn] = I1..In by the value x, discarding the shorter of two intervals if equality is encountered. While throwing intervals out, keep max1 and max2 updated.
For each interval, add an attribute max in linear time, holding the largest y value of all preceding intervals (in the sorted list):
rollmax = −∞
for j = 1..n do
    Ij.max = rollmax
    rollmax = max(rollmax, Ij.y)
On the sorted, filtered, and expanded input list, perform the following query. It uses an ever-expanding sublist of the intervals that come before the currently searched-for interval Ii as input to the recursive function SearchOverlap.
for i = 2..n do
    SearchOverlap(Ii, 1, i − 1)
return {max1, max2}
Function SearchOverlap uses divide and conquer approach to traverse the sorted list Il, .. Ir. It imagines such list as a complete binary tree, with interval Ic as its local root. The test Ic.max < I.max is used to always decide to traverse the binary tree (go left/right) in direction of interval with largest overlap with I. Note, that I is the queried for interval, which is compared to log(n) other intervals. Also note, that the largest possible overlapping interval might be passed in such traversal, hence the check for largest overlap in the beginning of function SearchOverlap.
SearchOverlap(I, l, r)
    c = ceil(Avg(l, r)) // Central element of queried list
    if Overlap(max1, max2) < Overlap(I, Ic) then
        max1 = I
        max2 = Ic
    if l ≥ r then
        return
    if Ic.max < I.max then
        SearchOverlap(I, c + 1, r)
    else
        SearchOverlap(I, l, c − 1)
    return
Largest overlapping intervals (if not empty) are returned at the end. Total complexity is O(n log(n)).

Partition a binary tree into k parts with similar sizes

I was trying to split a binary-tree into k similar-sized parts (by removing k-1 edges). Is there any efficient algorithm for this problem? Or is it NP-hard? Any pointers to papers, problem definitions, etc?
-- One reasonable metric for evaluating the quality of partitioning could be the size gap between the largest and smallest partition; another metric could be making the smallest partition having as many vertices as possible.
I can suggest a pretty fast solution for the metric of making the smallest part have as many vertices as possible.
Suppose we guess the size S of the smallest part and want to check whether it is achievable.
First I want to make a few statements:
If the total size of the tree is bigger than S, there is at least one subtree which is bigger than S while all subtrees of that subtree are smaller. (It's enough to check the subtrees of both children.)
If there is some way to split the tree so that the smallest part has size >= S, and we have a subtree T all of whose subtrees are smaller than S, then we can guarantee that no edges inside T are deleted. (Any such deletion would create a part smaller than S.)
If there is some way to split the tree so that the smallest part has size >= S, and we have some subtree T of size >= S which has no deleted edges inside but is not one of the parts, we can split the tree in another way where T is a part by itself and all parts are still no smaller than S. (Just move the extra vertices from T's original part to any other part; that other part will not become smaller.)
So here is an algorithm to check whether we can split the tree into k parts no smaller than S.
Find all suitable vertices (roots of subtrees of size >= S whose child subtrees both have size < S) and add them to a list. You can start from the root and move through all vertices while the subtrees are bigger than S.
While the list is not empty and the number of parts is less than k, take a vertex from the list and cut it off the tree. Then update the subtree sizes of its ancestors, and add an ancestor to the list if it becomes suitable.
You don't even need to update all the ancestors, only up to the first one whose new subtree size is still bigger than S; the ancestors above it can't be suitable for the list yet and can be updated later.
You may need to reassemble the tree afterwards to restore the original subtree sizes assigned to the vertices.
Now we can use the bisection method. The upper bound is Smax = n/k, and the lower bound can be obtained from the equation (2*Smin - 1)*(k - 1) + Smin = n; it guarantees that if we cut off k-1 subtrees, each with two child subtrees of size Smin - 1, we still have a part of size Smin left. This gives Smin = (n + k - 1)/(2*k - 1).
And now we can check S = (Smax + Smin)/2.
If we manage to construct a partition using the method above, then S is smaller than or equal to its largest possible value; also, the smallest part in the constructed partition may be bigger than S, and we can set the new lower bound to its size instead of S. If we fail, S is bigger than possible.
The time complexity of one check is k multiplied by the number of ancestor nodes updated each time. For a well-balanced tree the number of updated nodes is constant (we use the trick explained earlier and do not update all ancestors); still, it is no bigger than n/k in the worst case of an extremely unbalanced tree. Searching for suitable vertices behaves very similarly (all vertices passed while searching will be updated later).
The difference between n/k and (n + k - 1)/(2*k - 1) is proportional to n/k.
So the time complexity is O(k * log(n/k)) in the best case if subtree sizes are precalculated, O(n) if they are not, and O(n * log(n/k)) in the worst case.
This method may lead to a situation where the last part is comparatively big, but I suppose once you have the suggested method you can figure out some improvements to minimize that.
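To make the check-and-bisect idea concrete, here is a simplified Python sketch. It is not the list-based procedure described above: the check is done as a single bottom-up pass that cuts off a subtree as soon as its remaining size reaches S, each check recomputes sizes in O(n), and the bisection starts from a lower bound of 1 rather than Smin. The name `max_min_part` and the `children` list-of-lists representation are assumptions of this example, as is 1 <= k <= n.

```python
def max_min_part(children, root, k):
    """Largest S such that the tree can be split into k connected parts of size >= S.

    children[v] lists the child vertices of v; vertices are numbered 0..n-1.
    """
    n = len(children)
    # Order the vertices so that every child appears before its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    order.reverse()

    def parts_at_least(s):
        """Greedy check: how many parts of size >= s can be cut out."""
        size = [0] * n
        parts = 0
        for v in order:
            size[v] = 1 + sum(size[c] for c in children[v])
            if size[v] >= s:  # smallest subtree reaching s: make it a part
                parts += 1
                size[v] = 0
        return parts

    lo, hi = 1, n // k
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if parts_at_least(mid) >= k:
            lo = mid
        else:
            hi = mid - 1
    return lo

# A path of six vertices split into three parts: the best smallest part has size 2.
print(max_min_part([[1], [2], [3], [4], [5], []], 0, 3))
```

If the greedy pass produces more than k parts, neighbouring parts can be merged without shrinking any of them, and vertices left over near the root join an adjacent part, so comparing the count with k is enough for the check.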
Here is a polynomial deterministic solution:
Let's assume that the tree is rooted and there are two fixed values: MIN and MAX - minimum and maximum allowed size of one component.
Then one can use dynamic programming to check if there is a partition such that each component size is between MIN and MAX:
Let's assume f(node, cuts_count, current_count) is true if and only if there is a way to make exactly cuts_count cuts in node's subtree so that current_count vertices are connected to the node so that condition 2) holds true.
The base case for the leaves is: f(leaf, 1, 0) (cut the edge from the parent to the leaf) is true if and only if MIN <= 1 and MAX >= 1; f(leaf, 0, 1) (do not cut it) is always true. It is false for all other values of cuts_count and current_count.
To compute f for a node(not a leaf), one can use the following algorithm:
//Combine all possible children states.
for cuts_left in 0..k
    for cuts_right in 0..k
        for cnt_left in 0..left_subtree_size
            for cnt_right in 0..right_subtree_size
                if f(left_child, cuts_left, cnt_left) is true and
                   f(right_child, cuts_right, cnt_right) is true then
                    f(node, cuts_left + cuts_right, cnt_left + cnt_right + 1) = true
//Cut an edge from this node to its parent.
for cuts in 0..k-1
    for cnt in 0..node's_subtree_size
        if f(node, cuts, cnt) is true and MIN <= cnt <= MAX:
            f(node, cuts + 1, 0) = true
What this pseudo code does is combine all possible states of the node's children to compute all reachable states for this node (the first group of for loops), and then produce the rest of the reachable states by cutting the edge between this node and its parent (the second group of for loops). Here a state means a (node, cuts_count, current_count) tuple, and I call it reachable if f(state) is true.
That is the case for a node with two children; the case with one child can be processed in a similar manner.
Finally, if f(root, k, 0) is true, then it is possible to find a partition which satisfies condition 2), and it is not possible otherwise. We need to "pretend" that we made k cuts here because we also cut an imaginary edge from the root to its parent (this edge and this parent don't actually exist) when we computed f for the root (to avoid a corner case).
The space complexity of this algorithm(for fixed MIN and MAX) is O(n^2 * k)(n is the number of nodes), time complexity is O(k^2 * n^2). It might seem that the complexity is actually O(k^2 * n^3), but is not so because the product of number of vertices in left and right subtree of a node is exactly the number of pairs of node's such that their least common ancestor is this node. But the total number of pairs of nodes is O(n^2)(and each pair has only one least common ancestor). Thus, the sum of products of left and right subtree sizes over all nodes is O(n^2).
One can simply try all possible MIN and MAX values and choose the best, but it can be done faster. The key observation is that if there is a solution for MIN and MAX, there is always a solution for MIN and MAX + 1. Thus, one can iterate over all possible values of MIN(n / k different values) and apply binary search to find the smallest MAX which gives a valid solution(log n iterations). So the overall time complexity is O(n^2 * k^2 * n / k * log n) = O(n^3 * k * log n). However, if you want to maximize MIN(not to minimize the difference between MAX and MIN), you can simply use this algorithm and ignore MAX value everywhere(by setting its value to n). Then no binary search over MAX would be required, but one would be able to binary search over MIN instead and obtain an O(n^2 * k^2 * log n) solution.
To reconstruct the partition itself, one can start from f(root, k, 0) and apply the steps we used to compute f, but this time in opposite direction(from root to leaves). It is also possible to save the information about how to get the value of each state(what children's states were combined or what was the state before the edge was cut)(and update it appropriately during the initial computation of f) and then reconstruct the partition using this data(if my explanation of this step seems not very clear, reading an article on dynamic programming and reconstructing the answer might help).
So, there is a polynomial solution for this problem on a binary tree(even though it is NP-hard for an arbitrary graph).
