Efficient approach to find co-prime subarrays - algorithm

Given an array, is it possible to find the number of co-prime subarrays of the array in better than O(N²) time? A co-prime subarray is defined as a contiguous subset of the array such that the GCD of all its elements is 1.

Consider adding one element to the end of the array. Now find the rightmost position, if any, such that the sub-array from that position to the element you have just added is co-prime. Since it is rightmost, no shorter array ending with the element added is co-prime. Since it is co-prime, every array that starts to its left and ends with the new element is co-prime. So you have worked out the number of co-prime sub-arrays that end with the new element. If you can find the rightmost position efficiently - say in O(log n) instead of O(n) - then you can count the number of co-prime sub-arrays in O(n log n) by extending the array one element at a time.
To make it possible to find rightmost positions, think of the full array as the leaves of a complete binary tree, padded out to make its length a power of two. At each node put the GCD of all of the elements below that node - you can do this from the bottom up in time O(n). Every contiguous interval within the array can be covered by a collection of O(log n) nodes such that the interval consists of the leaves underneath those nodes, so you can compute the GCD of the interval in time O(log n).
To find the rightmost position forming a co-prime subarray with your current element, start with the current element and check whether it is 1. If it is, you are finished. If not, look at the element to its left, take a GCD with that, and push the result on a stack. If the result is 1, you are finished; if not, do the same, but look to see if there is a sub-tree of 2 elements you can use to add 2 elements at once. At each of the succeeding steps you double the size of the sub-tree you are trying to find. You won't always find a convenient sub-tree of the size you want, but because every interval can be covered by O(log n) subtrees you should get lucky often enough to go through this step in time O(log n).
Now you have either found that the whole array up to the current element is not co-prime, or you have found a section that is co-prime but may go further to the left than it needs to. The value at the top of the stack was computed by taking the GCD of the value just below it on the stack and the GCD at the top of a sub-tree. Pop it off the stack and take the GCD of the value just below it and the right half of the sub-tree. If you are still co-prime then you didn't need the left half of the sub-tree. If not, then you needed it, but perhaps not all of it. In either case you can continue down to find the rightmost match in time O(log n).
So I think you can find the rightmost position forming a co-prime subarray with the current element in time O(log n) (admittedly with some very fiddly programming), so you can count the number of co-prime subarrays in time O(n log n).
Two examples:
List 1, 3, 5, 7. The next level is 1, 1 and the root is 1. If the current element is 13 then I check against 7 and find that gcd(7, 13) = 1. Therefore I immediately know that GCD(5, 7, 13) = GCD(3, 5, 7, 13) = GCD(1, 3, 5, 7, 13) = 1.
List 2, 4, 8, 16. The next level is 2, 8 and the root is 2. If the current number is 32 then I check against 16 and find that gcd(16, 32) = 16 != 1, so then I check against 8 and find that GCD(8, 32) = 8, and then I check against 2 and find that GCD(2, 32) = 2, so there is no interval in the extended array which has GCD = 1.
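For concreteness, here is a minimal Python sketch of the counting idea (the function name is mine). For simplicity it finds the rightmost co-prime start by binary searching with range-GCD queries on the segment tree, which costs O(log² n) per element instead of the O(log n) descent sketched above; the counting argument is identical.

from math import gcd

def count_coprime_subarrays(a):
    n = len(a)
    size = 1
    while size < n:
        size *= 2
    tree = [0] * (2 * size)  # segment tree of GCDs; note gcd(0, x) == x

    def update(pos, val):
        pos += size
        tree[pos] = val
        pos //= 2
        while pos:
            tree[pos] = gcd(tree[2 * pos], tree[2 * pos + 1])
            pos //= 2

    def range_gcd(lo, hi):  # GCD of a[lo..hi], inclusive
        g = 0
        lo += size
        hi += size + 1
        while lo < hi:
            if lo & 1:
                g = gcd(g, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                g = gcd(g, tree[hi])
            lo //= 2
            hi //= 2
        return g

    count = 0
    for i, x in enumerate(a):
        update(i, x)
        if range_gcd(0, i) != 1:
            continue  # no co-prime subarray ends at index i
        # binary search for the rightmost start j with GCD(a[j..i]) == 1;
        # the GCD can only shrink as the interval grows, so this is monotone
        lo, hi = 0, i
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if range_gcd(mid, i) == 1:
                lo = mid
            else:
                hi = mid - 1
        count += lo + 1  # every start position 0..lo gives a co-prime subarray
    return count

print(count_coprime_subarrays([1, 3, 5, 7, 13]))   # -> 11
print(count_coprime_subarrays([2, 4, 8, 16, 32]))  # -> 0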

Related

Find 4th smallest element in linear time

So I had an exercise given to me about 2 months ago, that says the following:
Given n (n>=4) distinct elements, design a divide & conquer algorithm to compute the 4th smallest element. Your algorithm should run in linear time in the worst case.
I had an extremely hard time with this problem, and could only find relevant algorithms that run in the worst case O(n*k). After several weeks of trying, we managed, with the help of our teacher, to "solve" this problem. The final algorithm is as follows:
Rules: The input size must be a power of two, i.e. n = 2^k.
(1): Divide the input into two halves of size n/2: one left array, one right array.
(2): If input size == 4, sort the arrays using merge sort.
(2.1) Merge left array with right array into a new result array with length 4.
(2.2) Return element at index [4-1]
(3): Repeat step 1
This is solved recursively and our base case is at step 2. Step 2.2 means that for all
of the recursive calls we did, we will get a final result array of length 4, and at that
point we can just return the element at index [4-1].
With this algorithm, my teacher claims that this runs in linear time. My problem with that statement is that we are dividing the input until we reach sub-arrays with an input size of 4, and then those are sorted. So for an input size of 8, we would sort 2 sub-arrays of length 4, since 8/4 = 2. How is this in any case linear time? We are still sorting the whole input, just in blocks, aren't we? This really does not make sense to me. Does it matter whether we sort the whole input as it is, or divide it into sub-arrays of size 4 and sort those? Won't it still be a worst-case time of O(n*log(n))?
Would appreciate some explanation of this!
To make it easier to prove that the algorithm runs in linear time, let's modify it a bit (we will only change the order of dividing and merging blocks, nothing more):
(1): Divide input into n/4 blocks, each has size 4.
(2): Until there is more than one block, repeat:
Merge each pair of adjacent blocks into one block of size 4, keeping only the 4 smallest of the 8 elements.
(For example, if we have 4 blocks, we will split them in 2 pairs -
first pair contains first and second blocks,
second pair contains third and fourth blocks.
After merging we will have 2 blocks -
the first one contains 4 least elements from blocks 1 and 2,
the second one contains 4 least elements from blocks 3 and 4).
(3): The answer is the last element of that one block left.
Proof: It's a fact that an array of constant length (in your case, 4) can be sorted in constant time. Let k = log(n), so n = 2^k. Loop (2) runs k-2 iterations (on each iteration the count of elements left is halved, until 4 elements are left).
Before the i-th iteration (0 <= i < k-2) there are 2^(k-i) elements left, so there are 2^(k-i-2) blocks and we will merge 2^(k-i-3) pairs of blocks. Let's find how many pairs we will merge over all iterations. The count of merges equals
mergeOperationsCount = 2^(k-3) + 2^(k-4) + ... + 2^0 =
= 2^(k-3) * (1 + 1/2 + 1/4 + 1/8 + ...) < 2^(k-2) = O(2^k) = O(n)
Since we can merge each pair in constant time (because they have constant size), and the only operation we make is merging pairs, the algorithm runs in O(n).
And after this proof, I want to notice that there is another linear algorithm which is trivial, but it is not divide-and-conquer.
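For concreteness, here is a minimal Python sketch of the modified algorithm above, assuming the input length is a power of two and at least 4. It merges two 4-blocks with sorted(), which is still constant time for constant-size blocks:

def fourth_smallest(a):
    # (1) split into blocks of 4 and sort each block (constant time per block)
    blocks = [sorted(a[i:i + 4]) for i in range(0, len(a), 4)]
    # (2) repeatedly merge adjacent pairs, keeping only the 4 smallest elements
    while len(blocks) > 1:
        blocks = [sorted(left + right)[:4]
                  for left, right in zip(blocks[::2], blocks[1::2])]
    # (3) the 4th smallest is the last element of the one block left
    return blocks[0][3]

print(fourth_smallest([9, 2, 14, 7, 1, 11, 4, 8]))  # -> 7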

How to calculate the maximum median in an array

This is an algorithm question:
Input is an array of non-duplicate positive integers. Find a contiguous subarray (size > 1) which has the maximum median value.
Example: input: [100, 1, 99, 2, 1000], output should be the result of (1000 + 2) / 2 = 501
I can come up with the brute force solution: try all lengths from 2 -> array size to find the maximum median. But it seems too slow. I also tried to use two pointers on this question but am not sure when to move the left and right pointers.
Does anyone have a better idea for solving this question?
tl;dr - We can show that the answer must be of length 2 or 3, after which it's linear time to check all the possibilities.
Let's say the input is A and the smallest subarray with the biggest median is a. The biggest median is either a single element or the average of a pair of elements from a. Notice that every element in a that is bigger than the largest element of the median can only be next to elements less than the smallest element of the median (otherwise such a pair could be chosen as a subarray to form a bigger median).
If either end of a had a pair of elements that didn't include an element of the median, it could be eliminated from a without affecting the median, a contradiction.
If either end of a was smaller than the smallest element of the median, eliminating it would increase the median, a contradiction.
Thus each end of a is either an element of the median or larger than the largest element of the median (because it's larger than the smallest element of the median and not equal to the largest element of the median).
Thus each end of a is an element of the median, because otherwise we'd have an element larger than an element of the median adjacent to an element of the median, forming a larger median.
If a has odd length then it must be of length three, since for any larger odd length, two elements could be removed from the end of a farthest from the median without changing the median.
If a has even length then it must be of length 2, because for any larger even length bookended by the elements of the median, with interior elements alternating between smaller and larger than the median, one of the median elements must be adjacent to an element larger than the other median element, forming a larger median.
This proof outline could use some editing, but regardless, the conclusion is that the smallest array containing the largest median must be of length 2 or 3.
Given that, check every such subarray in linear time. O(n).
This is a Python implementation of an algorithm that solves the problem in O(n):
import random
import statistics

n = 50
numbers = random.sample(range(n), n)

max_m = 0
max_a = []
for i in range(2, 4):  # only subarray lengths 2 and 3 need to be checked
    for j in range(0, n - i + 1):
        a = numbers[j:j + i]
        m = statistics.median(a)
        if m > max_m:
            max_m = m
            max_a = a

print(numbers)
print(max_m)
print(max_a)
This is a variation of the brute force algorithm (O(n^3)) that only searches over sub-arrays of length 2 or 3. The reason is that for every array of size n > 3, there exists a shorter sub-array with the same or a better median. Applying this reasoning recursively, we can reduce the size of the sub-array to 2 or 3. Thus, by looking only at sub-arrays of size 2 or 3, we are guaranteed to find the sub-array with the maximum median.
The operation is the following: if, in a contiguous run at the beginning or at the end of the sub-array, at least half of the elements are lower than the median (or lower than both values forming the median, where applicable), remove that run to improve or at least preserve the median.
If in all such sub-arrays there is always at least one more element above or equal to the median(s) than below, there will come a point where the size of the sub-array is that of the median itself. In that case the complement must have more elements below the median, and thus we can simply remove the complement and improve (or preserve) the median. Thus, we can always perform the operation. For n = 3, it can happen that you would need to remove 2 or 3 elements to perform the operation, which is not allowed; in this case, the result is the list itself.

Sequence increasing and decreasing by turns

Let's assume we've got a sequence of integers of given length n. We want to delete some elements (maybe none), so that the resulting sequence is increasing and decreasing by turns. That means every element should have neighbouring elements either both bigger or both smaller than itself.
For example 1 3 2 7 6 and 5 1 4 2 10 are both sequences increasing and decreasing by turns.
We want to delete some elements to transform our sequence that way, but we also want to maximize the sum of elements left. So, for example, from sequence 2 18 6 7 8 2 10 we want to delete 6 and make it 2 18 7 8 2 10.
I am looking for an efficient solution to this problem. The example above shows that the most naive greedy algorithm (delete the first element that breaks the sequence) won't work - it would delete 7 instead of 6, which would not maximize the sum of the elements left.
Any ideas how to solve this efficiently (O(n) or O(n log n) probably) and correctly?
For every element of the sequence with index i we will calculate F(i, high) and F(i, low), where F(i, high) equals the biggest sum of a subsequence with the wanted characteristics that ends with the i-th element, where this element is a "high peak". (I'll explain mainly the "high" part; the "low" part can be done similarly.) We can calculate these functions using the following relations:
F(i, high) = a[i] + max(0, max{ F(j, low) : j < i and a[j] < a[i] })
F(i, low) = a[i] + max(0, max{ F(j, high) : j < i and a[j] > a[i] })
The answer is the maximum among all F(i, high) and F(i, low) values.
That gives us a rather simple dynamic programming solution with O(n^2) time complexity. But we can go further.
We can optimize the calculation of the max(F(j, low)) part. What we need is to find the biggest value among the previously calculated F(j, low) under the condition that a[j] < a[i]. This can be done with segment trees.
First of all, we'll "squeeze" our initial sequence. We need the real value of the element a[i] only when calculating the sum. But we need only the relative order of the elements when checking that a[j] is less than a[i]. So we'll map every element to its index in the sorted elements array without duplicates. For example, sequence a = 2 18 6 7 8 2 10 will be translated to b = 0 5 1 2 3 0 4. This can be done in O(n*log(n)).
The biggest element of b will be less than n; as a result, we can build a segment tree on the segment [0, n] with every node containing the biggest sum within its segment (we need two segment trees, for the "high" and "low" parts accordingly). Now let's describe step i of the algorithm:
Find the biggest sum max_low on the segment [0, b[i]-1] using the "low" segment tree (initially all nodes of the tree contain zero).
F(i, high) is equal to max_low + a[i].
Find the biggest sum max_high on the segment [b[i]+1, n] using the "high" segment tree.
F(i, low) is equal to max_high + a[i].
Update the [b[i], b[i]] segment of the "high" segment tree with the F(i, high) value, recalculating the maximums of the parent nodes (and of the [b[i], b[i]] node itself).
Do the same for "low" segment tree and F(i, low).
Complexity analysis: b sequence calculation is O(n*log(n)). Segment tree max/update operations have O(log(n)) complexity and there are O(n) of them. The overall complexity of this algorithm is O(n*log(n)).
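Here is a minimal Python sketch of the whole O(n*log(n)) solution described above, assuming all elements are positive so that 0 can serve as the identity in the max segment trees (function and variable names are mine):

from bisect import bisect_left

def max_alternating_sum(a):
    n = len(a)
    ranks = sorted(set(a))  # "squeeze": map every value to its rank
    b = [bisect_left(ranks, x) for x in a]
    m = len(ranks)

    size = 1
    while size < m:
        size *= 2
    high = [0] * (2 * size)  # high tree: best F(j, high) indexed by rank b[j]
    low = [0] * (2 * size)   # low tree:  best F(j, low)

    def query(tree, lo, hi):  # max over ranks [lo, hi]; an empty range gives 0
        best = 0
        lo += size
        hi += size + 1
        while lo < hi:
            if lo & 1:
                best = max(best, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, tree[hi])
            lo //= 2
            hi //= 2
        return best

    def update(tree, pos, val):
        pos += size
        tree[pos] = max(tree[pos], val)
        pos //= 2
        while pos:
            tree[pos] = max(tree[2 * pos], tree[2 * pos + 1])
            pos //= 2

    answer = 0
    for i in range(n):
        f_high = a[i] + query(low, 0, b[i] - 1)      # best "low" with a smaller value
        f_low = a[i] + query(high, b[i] + 1, m - 1)  # best "high" with a bigger value
        update(high, b[i], f_high)
        update(low, b[i], f_low)
        answer = max(answer, f_high, f_low)
    return answer

print(max_alternating_sum([2, 18, 6, 7, 8, 2, 10]))  # -> 47 (2+18+7+8+2+10)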

Time complexity to find 7th smallest element in a min heap?

I am interested in finding the 7th smallest element in a min-heap, if we assume that the min-heap can contain duplicates.
I don't know how to approach this. Can anyone provide an idea?
As the seventh smallest element lies within the top 7 levels of the min-heap, it is the 7th smallest of the at most 127 elements in those levels. Since this number is fixed (independent of the size of the original heap), the complexity is O(1).
There's a simple O(k*log k) algorithm to select the k-th smallest element from a heap:
# h = input heap
q = new min-heap()   # ordered by node value
q.insert(h.root)
for i := 1 to k - 1:
    top = q.delete-min()
    q.insert(top.left)    # skip if the child doesn't exist
    q.insert(top.right)   # skip if the child doesn't exist
report q.top
Of course this is constant time for the case k = 7. If you want the k-th smallest distinct element, rather than the k-th smallest overall, you will need linear time, because all elements in the heap could be equal except for the leaves, and then you need to find the (k-1)st smallest leaf, which is not possible in o(n) if all inner nodes have the same value.
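For illustration, a small Python version of the pseudocode above, assuming the min-heap is given in the usual array layout (children of index i at 2i+1 and 2i+2) and contains at least k elements:

import heapq

def kth_smallest(heap, k):
    q = [(heap[0], 0)]  # frontier of candidates: (value, index) pairs
    for _ in range(k - 1):
        _, i = heapq.heappop(q)
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):  # skip children that don't exist
                heapq.heappush(q, (heap[child], child))
    return q[0][0]

h = [1, 3, 2, 7, 4, 5, 9, 8]  # a valid min-heap in array form
print(kth_smallest(h, 7))     # -> 8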

Interview Question: Reverse pairs

I got this for my interview:
Numbers are said to be "reverse ordered" if N[i] > N[j] for i < j.
For example, in a list: 3 4 1 6 7 3, the reverse ordered items are (3,1) (4,1) (4,3) (6,3) (7,3).
How can I get the number of pairs of reverse ordered items in O(n log n) time?
It is possible to do this in O(n log n) time using a modified version of merge sort. Do the division as normal, but count inversions as you merge: each time you select an item from the right list over an item from the left list, increment the count of inversions by the number of items remaining in the left list. At each level, the number of inversions is then the number of inversions found during the merge plus the inversions found by each recursive call.
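A minimal Python sketch of that counting merge sort; it uses <= in the merge so that equal items (like the two 3s in the example) are not counted as inversions:

def count_inversions(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inversions += len(left) - i  # right item jumps everything left in `left`
    merged += left[i:] + right[j:]
    return merged, inversions

print(count_inversions([3, 4, 1, 6, 7, 3])[1])  # -> 5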
Note, please read the bottom of this answer to see why it actually is possible to do the problem. I read the question wrong.
It is not possible in the general case. Consider the list:
n, n-1, n-2 ... 4, 3, 2, 1
The pairs will be:
(n, n-1), (n, n-2) ... (n, 2), (n, 1), (n-1, n-2), (n-1, n-3) ... ... (3, 2), (3, 1), (2, 1)
Hence there are Θ(n^2) pairs, and so the list of pairs cannot be built in O(n log n).
However, you can do this with one pass of the list:
start at the end of the list and work backwards.
while moving through the list maintain a heap of the numbers you have seen (this will cause the loop to be O(n log n))
for every number you encounter, do a search in the heap to find all numbers which are less than the current number. Output the current number and each such heap number as a pair. (finding the first match in the heap is cheap, but it can take O(n) to find all smaller numbers)
For your example:
The list: 3 4 1 6 7 3
starting at the second item from the end
heap (3)
item (7)
Output (7, 3)
heap (3, 7)
item (6)
search and find 3, output (6, 3)
heap (3, 6, 7)
item (1)
search and find nothing
heap (1, 3, 6, 7)
item (4)
search and find 3 and 1. output (4, 3) (4, 1)
etc....
Edit: it actually is possible
Since JoshD correctly noted that we are looking for the number of elements, you can use a B-Tree instead of a heap and then you can get just the count of elements less than the current item and add it to a counter.
This can be solved by creating a binary search tree such that each node contains the size of its left subtree.
Values are added to the BST in reverse order of the original array. A running sum is kept, and each time we go right while adding a node, the size of the left subtree of the node being compared, plus 1, is added to the final sum (since the value being added is greater than that node and every value in its left subtree).
Building the tree is O(n log n) (if the tree is kept balanced), and once the tree is built, the sum is the number of pairs.
Special handling needs to be added for duplicate numbers, depending on the requirements (i.e. if (4,3) shows up twice, should it be counted twice?).
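A minimal Python sketch of that size-augmented BST (unbalanced here, so the worst case degrades to O(n^2); a self-balancing tree keeps it O(n log n)). Duplicates are sent left, so equal pairs are not counted:

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.left_size = 0  # number of nodes in the left subtree

def count_reverse_pairs(a):
    root, total = None, 0
    for x in reversed(a):  # insert in reverse order of the original array
        if root is None:
            root = Node(x)
            continue
        node = root
        while True:
            if x > node.val:  # x beats this node and its whole left subtree
                total += node.left_size + 1
                if node.right is None:
                    node.right = Node(x)
                    break
                node = node.right
            else:             # x <= node.val: go left and grow the left count
                node.left_size += 1
                if node.left is None:
                    node.left = Node(x)
                    break
                node = node.left
    return total

print(count_reverse_pairs([3, 4, 1, 6, 7, 3]))  # -> 5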
Right-to-left traversal with a Red-Black tree where each node is augmented by the size of its subtree: it then takes O(log n) to find the number of elements below a given one.
As #jlewis42 points out, you can use a modified version of merge sort. I just wanted to add, you could use any of the standard comparison sort algorithms, as long as the worst-case running time is n log n, by "instrumenting" it to count inversions as it works. See also this near dupe.
