I have an exercise that needs to be solved in O(n) time, but I can only come up with an O(n^2) solution.
You are given an array and need to find the length of the longest contiguous sequence whose sum is divisible by 3. For example, for the array {1, 2, 3, -4, -1}, the function returns 4, because the longest sequence whose sum (0) is divisible by 3 is {2, 3, -4, -1}.
My O(n^2) solution is based on arithmetic progressions. Is there any way to do it with O(n) complexity?
Please, I only want a clue or a theoretical explanation. Please don't write the full solution :)
Let's take a look at prefix sums. A [L, R] subarray is divisible by 3 if and only if prefixSum[L - 1] mod 3 = prefixSum[R] mod 3. This observation gives a very simple linear solution (because there are only 3 possible values of a prefix sum mod 3, we can simply find the first and the last occurrence of each value).
For example, if the input array is {1, 2, 3, -4, -1}, the prefix sums are {0, 1, 0, 0, 2, 1}. (there are n + 1 prefix sums because of an empty prefix). Now you can just take a look at the first and last occurrence of 0, 1 and 2.
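Since you only asked for a clue, look away now; for completeness, here is a minimal Python sketch of the observation above (function and variable names are mine, not from the question):

```python
def longest_div3_subarray(arr):
    # first[r], last[r]: smallest/largest prefix index whose prefix sum
    # is congruent to r mod 3 (prefix index 0 is the empty prefix)
    first = [None] * 3
    last = [None] * 3
    s = 0
    for i in range(len(arr) + 1):
        if i > 0:
            s += arr[i - 1]
        r = s % 3  # Python's % always yields 0, 1 or 2, even for negative s
        if first[r] is None:
            first[r] = i
        last[r] = i
    return max(last[r] - first[r] for r in range(3) if first[r] is not None)
```

For {1, 2, 3, -4, -1} the prefix sums mod 3 are {0, 1, 0, 0, 2, 1}, and the widest gap between a first and last occurrence of the same remainder is 4.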
As a non-CS person, this is interesting. My first approach was simply to compute the running sum mod 3, which gives a sequence of values in {0, 1, 2}. Now look for the first and the last 0, the first and the last 1, and the first and the last 2, and compare their respective distances...
Iterate through the array, summing the total as you go. Record the first position where the running sum mod 3 is 0, the first position where it is 1, and the first position where it is 2.
Do the same thing backwards, recording the last position where the running sum mod 3 is 0, 1, and 2. That gives three candidates for the longest sequence - you just check which pair is farthest apart.
You apply dynamic programming.
For every position you compute 3 values:
The longest sequence ending in that position which has sum s = 0 mod 3
The longest sequence ending in that position which has sum s = 1 mod 3
The longest sequence ending in that position which has sum s = 2 mod 3
Given these values for position i, you can easily compute the new ones for position i + 1.
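A rough Python sketch of this dynamic programming idea (my own code, with None marking that no sequence with the given remainder ends at the current position):

```python
def longest_div3_dp(arr):
    # prev[r] = length of the longest subarray ending at the previous
    # position whose sum is congruent to r (mod 3), or None if none exists
    prev = [None, None, None]
    best = 0
    for a in arr:
        m = a % 3  # Python's % always yields 0, 1 or 2, even for negatives
        cur = [None, None, None]
        for r in range(3):
            p = prev[(r - m) % 3]
            if p is not None:
                cur[r] = p + 1
        # the single-element subarray [a] also ends here, with remainder m
        if cur[m] is None:
            cur[m] = 1
        prev = cur
        if prev[0] is not None:
            best = max(best, prev[0])
    return best
```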
This question already has answers here:
Finding the largest subarray with equal number of 0's and 1's
(11 answers)
Closed 7 years ago.
I'm trying to solve the following problem:
Given a binary array containing only 0s and 1s, find the largest subarray which contains an equal number of 0s and 1s.
Examples:
Input: arr[] = {1, 0, 1, 1, 1, 0, 0,0,1}
Output: 1 to 8 (Starting and Ending indexes of output sub array)
I could only think of an O(n^2) solution (i.e. the obvious way of starting a subarray at each position and then checking all remaining elements for having the same number of 0s and 1s).
Can somebody figure out a better solution for this problem?
One teensy tiny note on wording: you say find the largest subarray, which implies uniqueness, but even in your example there is more than one (0 to 7 or 1 to 8). It would be worded better as "find a subarray of maximal length" or similar. But that's a non-issue.
As for a faster algorithm, first define a new array swap by replacing each instance of a 0 with a -1. This can be done in O(n) time. For your example, we would have
1, -1, 1, 1, 1, -1, -1, -1, 1
Now define another array sum such that sum[i] is the sum of all values swap[0], swap[1], ..., swap[i]. Equivalently,
sum[0] = swap[0];
for i >= 1, sum[i] = sum[i-1] + swap[i]
Which again is in O(n) time. So your example becomes
1, 0, 1, 2, 3, 2, 1, 0, 1
Now for an observation. If the number of 1s is equal to the number of 0s in the subarray (arr[i], ..., arr[j]) then in the first new array the 1s will cancel with the corresponding -1, so the sum of all values (swap[i], ..., swap[j]) will be equal to 0. But this is then equal to
swap[0] + swap[1] + ... + swap[j] - (swap[0] + swap[1] + ... + swap[i-1]),
which in turn is equal to
sum[j] - sum[i-1].
Although note we have to be careful if i is equal to 0, since then i-1 falls outside the array's bounds. This is an easy check to implement.
Now we have reduced the problem to finding when sum[j] - sum[i-1] is equal to 0. But this is equivalent to finding values j and i such that sum[j] = sum[i-1].
Since we know that all values in sum lie between -n and n (where n is the size of the initial array), you can now create a pair of arrays min and max of size 2n+1. Here, the indices of min and max correspond to potential values of sum, where min[0] will hold the smallest index i for which sum[i] = -n, min[1] will hold the smallest index i for which sum[i] = -n+1, and so on. Similarly, max will hold the largest index. This can also be achieved in linear time. After this step, max[i] and min[i] will correspond to values for which sum[min[i]] = i = sum[max[i]].
Now all you have to do is find the largest value of max[k] - min[k], and this will give you from above i = min[k] + 1 and j = max[k], the indices of a maximal subarray containing an equal number of 0s and 1s. This is also O(n).
I sorta sketched this out a bit roughly, so you have to be careful when i = 0, but that's easily accounted for. Each step is however O(n), so there's your more efficient algorithm.
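For concreteness, here is a rough Python sketch of the steps above (the swap and sum arrays are folded into one pass; the min/max arrays are indexed with an offset of n so indices stay non-negative, and the virtual prefix index -1 handles the i = 0 edge case):

```python
def largest_equal_01_subarray(arr):
    n = len(arr)
    # min_idx[v]/max_idx[v]: smallest/largest prefix index with running
    # sum v, after mapping 0 -> -1 and offsetting sums by n
    min_idx = [None] * (2 * n + 1)
    max_idx = [None] * (2 * n + 1)
    min_idx[n] = -1  # the empty prefix (before index 0) has sum 0
    max_idx[n] = -1
    s = 0
    for i, x in enumerate(arr):
        s += 1 if x == 1 else -1
        v = s + n
        if min_idx[v] is None:
            min_idx[v] = i
        max_idx[v] = i
    best = None
    for v in range(2 * n + 1):
        if min_idx[v] is None:
            continue
        length = max_idx[v] - min_idx[v]
        if best is None or length > best[0]:
            best = (length, min_idx[v] + 1, max_idx[v])
    return best  # (length, start, end); length 0 means no balanced subarray
```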
I believe this can be solved in O(n) using a weight-balanced binary tree structure.
This question already has answers here:
Ranking and unranking of permutations with duplicates
(4 answers)
Closed 7 years ago.
This question has two parts, though since I'm trying to come up with a Prolog implementation, solving one will probably immediately lead to a solution of the other.
Given a permutation of a list of integers {1,2,...,N}, how can I tell what is the index of that permutation in lexicographic ordering?
Given a number k, how can I calculate k-th permutation of numbers {1,2...,N}?
I'm looking for an algorithm that can do this reasonably better than just iterating a next-permutation function k times. As far as I know, it should be possible to compute both of these directly.
What I came up with so far is that by looking at numbers from the left, I can tell how many permutations were before each number at a particular index, and then somehow combine those, but I'm not really sure if this leads to a correct solution.
Think about how many permutations start with the number 1, how many start with the number 2, and so on. Let's say n = 5; then 24 permutations start with 1, 24 start with 2, and so on. If you are looking for, say, permutation k = 53, there are 48 permutations starting with 1 or 2, so #53 is the fifth of the permutations starting with 3.
Of the permutations starting with 3, 6 each start with 31, 32, 34 or 35. So you are looking for the fifth permutation starting with (3, 1). There are two permutations each starting with 312, 314 and 315. So you are looking for the first of the two permutations starting with 315. Which is 31524.
Should be easy enough to turn this into code.
You can also have a look at the factorial number system, especially the part regarding permutations. For a given number k, you are first supposed to find its factorial representation, which then easily gives the required permutation (actually, (k+1)-st permutation).
An example for k=5 and numbers {1,2,3}:
5 = 2*2! + 1*1! + 0*0! = (210)_!
so the factorial representation of 5 is 210. Let's now map that representation into the permutation. We start with the ordered list (1,2,3). The leftmost digit in our factorial representation is 2, so we are looking for the element in the list at the index 2, which is 3 (list is zero-indexed). Now we are left with the list (1,2) and continue the procedure. The leftmost digit in our factorial representation, after removing 2, is 1, so we get the element at the index 1, which is 2. Finally, we are left with 1, so the (k+1)-st (6th) permutation of {1,2,3} is {3,2,1}.
Even though it takes some time to understand, it is quite an efficient algorithm and simple to program. The reverse mapping is similar.
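A short Python sketch of this unranking procedure (my own code; k is zero-based, so it returns the (k+1)-st permutation, as described above):

```python
def kth_permutation(k, items):
    # returns the (k+1)-st permutation of items in lexicographic order,
    # using the factorial number system: each "digit" of k selects an
    # element from the shrinking list of remaining items
    items = sorted(items)
    n = len(items)
    fact = [1] * n  # fact[i] = i!
    for i in range(1, n):
        fact[i] = fact[i - 1] * i
    result = []
    for i in range(n - 1, -1, -1):
        digit, k = divmod(k, fact[i])  # next factorial digit of k
        result.append(items.pop(digit))
    return result
```

For k = 5 and {1, 2, 3} this reproduces the worked example: 5 = (210) in the factorial system, yielding [3, 2, 1].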
I'll just give the outline of a solution for each:
Given a permutation of a list of integers {1,2,...,N}, how can I tell what is the index of that permutation in lexicographic ordering?
To do this, ask yourself how many permutations start with 1? There are (N - 1)!. Now, let's do an example:
3 1 2
How many permutations of 1 2 3 start with 1 or 2? 2*2!. This one has to come after those, so its index is at least 2*2! = 4. Now check the next element: of the remaining elements 1 2, how many start with something smaller than 1? None. You're done; the index is 4. You can add 1 if you want to use 1-based indexing.
Given a number k, how can I calculate k-th permutation of numbers {1,2...,N}?
Given 4, how can we get 3 1 2? We have to find each element.
What can we have on the first position? If we have 1, the maximum index can be 2! - 1 = 1 (I'm using zero-based indexing). If we have 2, the maximum can be 2*2! - 1 = 3. If we have 3, the maximum can be 5. So we must have 3:
3
Now, we have reduced the problem to finding the 4 - 2*2! = 0-th permutation of 1 2, which is 1 2 (you can reason about it recursively as above).
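The ranking direction of this outline can be sketched in Python like this (my own code; the O(n^2) counting of smaller elements could be reduced to O(n log n) with a Fenwick tree):

```python
def permutation_index(perm):
    # zero-based lexicographic index of a permutation of distinct values
    n = len(perm)
    fact = [1] * n  # fact[i] = i!
    for i in range(1, n):
        fact[i] = fact[i - 1] * i
    index = 0
    for i, x in enumerate(perm):
        # each remaining element smaller than x accounts for a full
        # block of (n - 1 - i)! permutations that come before this one
        smaller = sum(1 for y in perm[i + 1:] if y < x)
        index += smaller * fact[n - 1 - i]
    return index
```

For [3, 1, 2] this gives 2*2! + 0*1! + 0*0! = 4, matching the example.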
I'm looking for an efficient algorithm (not necessarily code) for solving the following question:
Given n positive and negative numbers that sum up to zero, we would like to find a starting index that causes the cumulative sum to hit zero as many times as possible.
It doesn't have to be done in a specific manner, but the important thing here is efficiency: we want the algorithm/idea to be able to do this in less than quadratic time complexity.
An example:
Given the numbers: 2, -1, 3, 1, -3, -2:
If we start summing up with 2 (the first index), the sum will be zero only once (at the end of the summation), but starting with -1 will yield zero twice during the summation.
The given numbers may have more than one "best index", but we would like to find at least one of these indexes.
I've tried doing it with binary search, but didn't make much progress- so any hints/help will be appreciated.
You can compute prefix sums. In terms of prefix sums, zeros are positions that have the same value of a prefix sum as the start position. So the problem is reduced to finding the most frequent element in the array of prefix sums. It can be solved efficiently using sorting or hash tables.
Here is an example:
Input: {2, -1, 3, 1, -3, -2}
Prefix sums: {0, 2, 1, 4, 5, 2, 0}
The most frequent element is 2, whose first occurrence is at position 1 of the prefix-sum array. Thus, starting from the second element of the input yields the optimal answer.
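A small Python sketch of this reduction, using a hash table (Counter) for the frequency count (my own code):

```python
from collections import Counter

def best_start(arr):
    # prefix[i] = sum of arr[0..i-1], i.e. the prefix sum just before
    # start position i. Since the total is zero, starting at i makes the
    # running sum hit zero once for each position with the same prefix
    # value, so the best start has the most frequent prefix value.
    n = len(arr)
    prefix = [0] * n
    s = 0
    for i in range(n):
        prefix[i] = s
        s += arr[i]
    counts = Counter(prefix)
    target = max(counts, key=counts.get)  # most frequent prefix value
    return prefix.index(target)           # first start with that value
```

For [2, -1, 3, 1, -3, -2] the prefix values are [0, 2, 1, 4, 5, 2]; 2 is the most frequent, so the best start is index 1 (the -1), matching the question's example.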
How to find, in a binary string, the longest substring where the balance, i.e. the difference between the number of ones and zeros, is >= 0?
Example:
01110000010 -> 6: 011100
1110000011110000111 -> 19: entire string
While this problem looks very similar to the Maximum Value Contiguous Subsequence (Maximum Contiguous Sum) problem, a dynamic programming solution doesn't seem to be obvious. In a divide-and-conquer approach, how to do the merging? Is an "efficient" algorithm possible after all? (A trivial O(n^2) algorithm will just iterate over all substrings for all possible starting points.)
This is a modified variant of Finding a substring, with some additional conditions. The difference is that in the linked question, only such substrings are allowed where balance never falls below zero (looking at the string in either forward or backward direction). In the given problem, balance is allowed to fall below zero, provided it recovers at some later stage.
I have a solution that requires O(n) additional memory and O(n) time.
Let's denote the 'height' of an index h(i) as
h(i) = <number of 1s in the substring 1..i> - <number of 0s in the same substring>
The problem can now be reformulated as: find i and j such that h(i) <= h(j) and j - i is maximal.
Obviously, h(0) = 0, and if h(n) = 0, then the solution is the entire string.
Now let's compute the array B so that B[x] = min{i: h(i) = -x}. In other words, let B[x] be the leftmost index i at which h(i)= -x.
The array B[x] has a length of at most n, and is computed in one linear pass.
Now we can iterate over the original string and for each index i compute the length of the longest sequence with non-negative balance that ends on i as follows:
Lmax(i) = i - B[-min(0, h(i))]
(Note that B[0] = 0, since h(0) = 0; so whenever h(i) >= 0 the sequence can start at the very beginning.)
The largest Lmax(i) across all i will give you the desired length.
I leave the proof as an exercise :) Contact me if you can't figure it out.
Also, my algorithm needs 2 passes of the original string, but you can collapse them into one.
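Here's a rough Python sketch of this algorithm, collapsed into a single pass, with a dictionary of first occurrences playing the role of the B array (my own code):

```python
def longest_nonneg_balance(s):
    # h is the ones-minus-zeros height; first_at[v] is the leftmost
    # prefix index at which the height equals v (the B array)
    first_at = {0: 0}
    h = 0
    best = 0
    for i, ch in enumerate(s, start=1):
        h += 1 if ch == '1' else -1
        if h not in first_at:
            first_at[h] = i
        # leftmost j with h(j) <= h(i): 0 if h >= 0, else the first
        # occurrence of h itself (heights move in steps of +-1)
        start = 0 if h >= 0 else first_at[h]
        best = max(best, i - start)
    return best
```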
This can be answered quite easily in O(n) using "height array", representing the number of 1's relative to the number of 0's. Like my answer in the linked question.
Now, instead of focusing on the original array, we now focus on two arrays indexed by the heights, and one will contain the smallest index such height is found, and the other will contain the largest index such height is found. Since we don't want a negative index, we can shift everything up, such that the minimum height is 0.
So for the sample cases (I added two more 1's at the end to show my point):
1110000011010000011111
Array height visualization
/\
/ \
/ \
\ /\/\ /
\/ \ /
\ /
\ /
\/
(lowest height = -5)
Shifted height array:
[5, 6, 7, 8, 7, 6, 5, 4, 3, 4, 5, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
Height: 0 1 2 3 4 5 6 7 8
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view = [17,18,19,20,21,22, 5, 4, 3]
note that we have 22 numbers and 23 distinct indices, 0-22, representing the 23 gaps between and around the numbers
We can build the first_view and last_view array in O(n).
Now, for each height in first_view, we only need to check every height in last_view that is greater than or equal to it, and take the index with maximum difference from the first_view index. For example, from height 0, the maximum index among heights >= 0 is 22. So the longest substring starting at index 17+1 will end at index 22.
To find the maximum index on the last_view array, you can convert it to a maximum to the right in O(n):
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
And so finding the answer is simply a matter of subtracting first_view from last_view_max,
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
result = [ 5, 6, 7,14,15,22, 4, 2, 0]
and taking the maximum (again in O(n)), which is 22, achieved from starting index 0 to ending index 22, i.e., the whole string. =D
Proof of correctness:
Suppose that the maximum substring starts at index i, ends at index j.
If the height at index i is the same as the height at index k<i, then k..j would be a longer substring still satisfying the requirement. Therefore it suffices to consider the first index of each height. Analogously for the last index.
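A Python sketch of the first_view/last_view approach described above (function and variable names are mine; the "maximum to the right" is folded into the final loop):

```python
def longest_nonneg_balance_views(s):
    n = len(s)
    heights = [0] * (n + 1)  # height after each prefix of the string
    for i, ch in enumerate(s):
        heights[i + 1] = heights[i] + (1 if ch == '1' else -1)
    shift = -min(heights)    # shift so the minimum height is 0
    heights = [h + shift for h in heights]
    top = max(heights)
    first_view = [None] * (top + 1)
    last_view = [None] * (top + 1)
    for i, h in enumerate(heights):
        if first_view[h] is None:
            first_view[h] = i
        last_view[h] = i
    # scan heights from top down, keeping the running "maximum to the
    # right" of last_view, and compare against first_view at each height
    best = 0
    right_max = -1
    for v in range(top, -1, -1):
        if last_view[v] is not None:
            right_max = max(right_max, last_view[v])
        if first_view[v] is not None and right_max > first_view[v]:
            best = max(best, right_max - first_view[v])
    return best
```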
Compressed quadratic runtime
We will be looking for (locally) longest substrings with balance zero, starting at the beginning. We will ignore strings of zeros. (Corner cases: All zeros -> empty string, balance never reaches zero again -> entire string.) Of these substrings with balance zero, all trailing zeros will be removed.
Denote by B a substring with balance > 0 and by Z a substring with only zeros. Each input string can be decomposed as follows (pseudo-regex notation):
B? (Z B)* Z?
Each of the Bs is a maximum feasible solution, meaning that it cannot be extended in either direction without reducing balance. However, it might be possible to collapse sequences of BZB or ZBZ if the balance is still larger than zero after collapsing.
Note that it is always possible to collapse sequences of BZBZB to a single B if the ZBZ part has balance >= 0. (Can be done in one pass in linear time.) Once all such sequences have been collapsed, the balance of each ZBZ part is below zero. Still, it is possible that there exist BZB parts with balance above zero -- even that in a BZBZB sequence with balance below zero both the leading and trailing BZB parts have balance over zero. At this point, it seems to be difficult to decide which BZB to collapse.
Still quadratic...
Anyway, with this simplified data structure one can try all Bs as starting points (possibly extending to the left if there's still balance left). Run time is still quadratic, but (in practice) with a much smaller n.
Divide and conquer
Another classic. Should run in O(n log n), but rather difficult to implement.
Idea
The longest feasible substring is either in the left half, in the right half, or it passes over the boundary. Call the algorithm for both halves. For the boundary:
Assume problem size n. For the longest feasible substring that crosses the boundary, we are going to compute the balance of the left-half part of the substring.
Determine, for each possible balance between -n/2 and n/2, in the left half, the length of the longest string that ends at the boundary and has this (or a larger) balance. (Linear time!) Do the same for the right half and the longest string that starts at the boundary. The result is two arrays of size n + 1; we reverse one of them, add them element-wise and find the maximum. (Again, linear.)
Why does it work?
A substring with balance >= 0 that crosses the boundary can have balance < 0 in either the left or the right part, if the other part compensates this. ("Borrowing" balance.) The crucial question is how much to borrow; we iterate over all potential "balance credits" and find the best trade-off.
Why is this O(n log n)?
Because merging (looking at boundary-crossing string) takes only linear time.
Why is merging O(n)?
Exercise left to the reader.
Dynamic programming -- linear run time (finally!)
Inspired by this blog post. Simple and efficient, a one-pass online algorithm, but it takes some time to explain.
Idea
The link above shows a different problem: Maximum subsequence sum. It cannot be mapped 1:1 to the given problem, here a "state" of O(n) is needed, in contrast to O(1) for the original problem. Still, the state can be updated in O(1).
Let's rephrase the problem. We are looking for the longest substring in the input where the balance, i.e. the difference between the number of 1's and 0's, is at least zero.
The state is similar to my other divide-and-conquer solution: We compute, for each position i and for each possible balance b the starting position s(i, b) of the longest string with balance b or greater that ends at position i. That is, the string that starts at index s(i, b) + 1 and ends at i has balance b or greater, and there is no longer such string that ends at i.
We find the result by maximizing i - s(i, 0).
Algorithm
Of course, we do not keep all s(i, b) in memory, just those for the current i (which we iterate over the input). We start with s(0, b) := 0 for b <= 0 and := undefined for b > 0. For each i, we update with the following rule:
If 1 is read: s(i, b) := s(i - 1, b - 1).
If 0 is read: s(i, b) := s(i - 1, b + 1) if defined, s(i, 0) := i if s(i - 1, 1) undefined.
The function s (for current i) can be implemented as a pointer into an array of length 2n + 1; this pointer is moved forward or backward depending on the input. At each iteration, we note the value of s(i, 0).
How does it work?
The state function s becomes effective especially if the balance from the start to i is negative. It records the earliest start point where zero balance is reached, for all possible numbers of 1s that have not been read yet.
Why does it work?
Because the recursive definition of the state function is equivalent to its direct definition -- the starting position of the longest string with balance b or greater that ends at position i.
Why is the recursive definition correct?
Proof by induction.
E.g.: Array: 4,3,0,1,5 {Assume all digits are >= 0. Also, each element in the array corresponds to a digit, i.e. each element is between 0 and 9.}
In the above array, the largest number is: 5430 {using digits 5, 4, 3 and 0 from the array}
My Approach:
For divisibility by 3, we need the sum of digits to be divisible by 3.
So,
Step-1: Remove all the zeroes from the array.
Step-2: These zeroes will come at the end. {Since they don't affect the sum and we have to find the largest number}
Step-3: Find the subset of the elements of array (excluding zeroes) such that the number of digits is MAXIMUM and also that the sum of digits is MAXIMUM and the sum is divisible by 3.
STEP-4: The required digit consists of the digits in the above found set in decreasing order.
So, the main step is STEP-3, i.e. how to find the subset containing the MAXIMUM possible number of elements such that their sum is MAX and divisible by 3.
I was thinking that maybe Step-3 could be done with a GREEDY CHOICE: take all the elements and keep removing the smallest element in the set until the sum is divisible by 3.
But I am not convinced that this GREEDY choice will work.
Please tell if my approach is correct.
If it is, then please suggest as to how to do Step-3 ?
Also, please suggest any other possible/efficient algorithm.
Observation: if you can form a number divisible by 3 at all, you need to remove at most 2 numbers to maintain an optimal solution.
A simple O(n^2) solution will be to check all possibilities to remove 1 number, and if none is valid, check all pairs (there are O(n^2) of those).
EDIT:
O(n) solution: Create 3 buckets - bucket0, bucket1, bucket2 - where bucketk holds the numbers whose value mod 3 is k. Ignore bucket0 in the next algorithm.
Let the sum of the array be sum.
If sum % 3 == 0: we are done.
else if sum % 3 == 1:
if there is a number in bucket1 - choose the minimal
else: take the 2 minimals from bucket2
else if sum % 3 == 2:
if there is a number in bucket2 - choose the minimal
else: take the 2 minimals from bucket1
Note: You don't actually need the buckets; to achieve O(1) extra space you only need the 2 minimal values from bucket1 and bucket2, since those are the only numbers we actually use from these buckets.
Example:
arr = { 3, 4, 0, 1, 5 }
bucket0 = {3, 0}; bucket1 = {4, 1}; bucket2 = {5}
sum = 13; sum % 3 = 1
bucket1 is not empty - choose the minimal from it (1), and remove it from the array.
result array = { 3, 4, 0, 5 }
proceed to STEP 4 "as planned"
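A rough Python sketch of this bucket idea for single digits (my own code; for single digits, removing one minimal digit from the matching bucket, when possible, keeps more digits and hence yields a larger number):

```python
def largest_multiple_of_3(digits):
    digits = sorted(digits)               # ascending, so minima come first
    rem = sum(digits) % 3
    if rem != 0:
        same = [d for d in digits if d % 3 == rem]       # "bucket rem"
        other = [d for d in digits if d % 3 == 3 - rem]  # the other bucket
        if same:                          # drop one minimal matching digit
            digits.remove(same[0])
        elif len(other) >= 2:             # or two minimals from the other bucket
            digits.remove(other[0])
            digits.remove(other[1])
        else:
            return None                   # no selection is divisible by 3
    if not digits:
        return None
    # STEP 4: concatenate the surviving digits in decreasing order
    return ''.join(str(d) for d in sorted(digits, reverse=True))
```

Note that this also handles the greedy counterexample {5, 2, 1} from below correctly: it removes the 2, not the 1.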
Greedy choice definitely doesn't work: consider the set {5, 2, 1}. You'd remove the 1 first, but you should remove the 2.
I think you should work out the sum of the array modulo 3, which is either 0 (you're finished), or 1, or 2. Then you're looking to remove the minimal subset whose sum modulo 3 is 1 or 2.
I think that's fairly straightforward, so no real need for dynamic programming. Do it by removing one number with that modulus if possible, otherwise do it by removing two numbers with the other modulus. Once you know how many to remove, choose the smallest possible. You'll never need to remove three numbers.
You don't need to treat 0 specially, although if you're going to do that then you can further reduce the set under consideration in step 3 if you temporarily remove all 0, 3, 6, 9 from it.
Putting it all together, I would probably:
Sort the digits, descending.
Calculate the modulus. If 0, we're finished.
Try to remove a digit with that modulus, starting from the end. If successful, we're finished.
Remove two digits with negative-that-modulus, starting from the end. This always succeeds, so we're finished.
We might be left with an empty array (e.g. if the input is 1, 1), in which case the problem was impossible. Otherwise, the array contains the digits of our result.
Time complexity is O(n) provided that you do a counting sort in step 1. Which you certainly can since the values are digits.
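Putting those steps together in Python, with a counting sort over the ten digit values (my own sketch, names are assumptions):

```python
def largest_multiple_of_3_counting(digits):
    count = [0] * 10                      # counting sort: O(n) for digits 0-9
    for d in digits:
        count[d] += 1
    rem = sum(d * count[d] for d in range(10)) % 3

    def drop_smallest(modulus, how_many):
        # remove up to how_many digits congruent to modulus, smallest first
        for d in range(10):
            if d % 3 != modulus:
                continue
            while count[d] > 0 and how_many > 0:
                count[d] -= 1
                how_many -= 1
        return how_many == 0

    # remove one digit with the sum's modulus, else two with the other one
    if rem != 0 and not drop_smallest(rem, 1) and not drop_smallest(3 - rem, 2):
        return None
    result = ''.join(str(d) * count[d] for d in range(9, -1, -1))
    return result if result else None     # empty result: problem impossible
```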
What do you think about this:
first sort the array elements by value
sum up all the numbers
- if the sum's remainder after division by 3 is equal to 0, just return the sorted array
- otherwise
  - if the sum of the remainders after division by 3 of all the numbers is smaller than the remainder of their sum, there is no solution
  - otherwise
    - if it's equal to 1, try to remove the smallest number with remainder equal to 1, or if there is none, the two smallest with remainder equal to 2; if no such two exist (I suppose it can happen), there's no solution
    - if it's equal to 2, try to remove the smallest number with remainder equal to 2, or if there is none, the two smallest with remainder equal to 1; if no such two exist, there's no solution

Then sort the array elements by remainder of division by 3, ascending, and sort each subset of equal remainder by value, descending.
First, this problem reduces to maximizing the number of elements selected such that their sum is divisible by 3.
Trivial: Select all numbers divisible by 3 (0,3,6,9).
Let a be the elements that leave 1 as remainder and b be the elements that leave 2 as remainder. If (|a| - |b|) % 3 is 0, then select all elements from both a and b. If (|a| - |b|) % 3 is 1, select all elements from b and the |a| - 1 highest numbers from a. If the remainder is 2, then select all numbers from a and the |b| - 1 highest numbers from b.
Once you have all the numbers, sort them in reverse order and concatenate. that is your answer.
Ultimately, if n is the number of elements, this algorithm returns a number that is at least n-1 digits long (except in corner cases; see below).
NOTE: Take care of corner cases (i.e. when |a| = 0 or |b| = 0, etc.). Note that (-1) % 3 = 2 and (-2) % 3 = 1.
If m is the size of the alphabet and n is the number of elements, this algorithm is O(m + n).
Sorting the data is unnecessary, since there are only ten different values.
Just count the number of zeroes, ones, twos, etc., in O(n) if n digits are given.
Calculate the sum of all digits, check whether the remainder modulo 3 is 0, 1 or 2.
If the remainder is 1: Remove the first of the following which is possible (one of these is guaranteed to be possible): 1, 4, 7, 2+2, 2+5, 5+5, 2+8, 5+8, 8+8.
If the remainder is 2: Remove the first of the following which is possible (one of these is guaranteed to be possible): 2, 5, 8, 1+1, 1+4, 4+4, 1+7, 4+7, 7+7.
If there are no digits left then the problem cannot be solved. Otherwise, the solution is created by concatenating 9's, 8's, 7's, and so on, as many of each as remain.
(Sorting n digits would take O(n log n) - unless of course you sort by counting how often each digit occurs and generate the sorted result according to these counts.)
Amit's answer has a tiny thing missing.
If bucket1 is not empty but holds only huge values, let's say 79 and 97, while bucket2 is not empty as well and its 2 minimals are, say, 2 and 5, then when the modulus of the sum of all digits is 1, we should remove 2 and 5 from bucket2 instead of the minimal in bucket1 to get the largest concatenated number.
Test case: 8 2 3 5 78 79
If we follow Amit's and Steve's suggested method, the largest number would be 878532, whereas the largest number divisible by 3 that can be formed from this array is 879783.
The solution is to compare the appropriate bucket's smallest minimal with the concatenation of both minimals of the other bucket, and eliminate the option yielding the smaller number.