What is a better method to solve this problem? - algorithm

Here is the problem i am trying to solve,
You are given a table with 2 rows and N columns. Each cell has an integer in it. The score
of such a table is defined as follows: for each column, consider the sum of the two numbers
in the column; the maximum of the N numbers so obtained is the score. For example, for
the table
7 1 6 2
1 2 3 4
the score is max(7 + 1, 1 + 2, 6 + 3, 2 + 4) = 9.
The first row of the table is fixed, and given as input. N possible ways to fill the second
row are considered:
1, 2, ..., N
2, 3, ..., N, 1
3, 4, ..., N, 1, 2
...
N, 1, ..., N - 1
For instance, for the example above, we would consider each of the following as possibilities for the second row.
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
Your task is to find the score for each of the above choices of the second row. In the
example above, you would evaluate the following four tables,
7 1 6 2
1 2 3 4
7 1 6 2
2 3 4 1
7 1 6 2
3 4 1 2
7 1 6 2
4 1 2 3
and compute scores 9, 10, 10 and 11, respectively
Test data: N <= 200000
Time Limit: 2 seconds
Here is the obvious method:
Maintain two arrays A and B. Do the following N times:
add every element A[i] to B[i], keeping a variable max which stores the largest sum seen so far;
print max;
loop through array B and increment every element by 1, wrapping any element that goes past N back to 1.
This method takes O(n^2) time: the outer loop runs N times, and it contains two inner loops which each run N times.
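As a sketch, here is this brute force in Python (the function name is my own); it rotates an explicit second row and recomputes the maximum each time:

```python
def brute_force_scores(first_row):
    """O(N^2) brute force: try every left-rotation of the second row 1..N."""
    n = len(first_row)
    second = list(range(1, n + 1))
    scores = []
    for _ in range(n):
        # Score of the current table: maximum column sum
        scores.append(max(a + b for a, b in zip(first_row, second)))
        second = second[1:] + second[:1]  # rotate the second row left
    return scores

print(brute_force_scores([7, 1, 6, 2]))  # [9, 10, 10, 11]
```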
To reduce the time taken, we can find the maximum element M in the first row (in a linear scan), and then remove A[i] and B[i] whenever A[i] + N <= M + 1, because such columns can never hold the maximum. While this might perform better in the average case, the worst-case time is still O(N^2).
To find the max in constant time I also considered using a heap, where each element of the heap has two attributes: its original value and the value to be added. But it still requires linear time to increment the value-to-be-added for every element of the heap in each of the N cases, so the time remains O(N^2).
I am not able to find a method that solves this problem faster than O(N^2), which will be too slow as the value of N can be very large.
Any help would be greatly appreciated.

There is also an O(n) algorithm. Using the same observations as in a previous answer:
Now consider what happens to the column sums when you rotate the second row to the left (e.g. change it from 1,2,...,N to 2,3,...,N,1): Each column sum will increase by 1, except for one column sum which is decreased by N-1.
Instead of modifying all the column sums, we can decrease the one column sum by N, and then take the maximum column sum plus 1 to find the new maximum of the column sums. So we only need to update one column instead of all of them.
The column where the maximum appears can only move to the left or jump back to the column with the overall maximum as we iterate over the possibilities of the second row. Candidate columns are the ones that were the temporary maximum in a left to right maximum scan.
Step 1: calculate all the sums for the first choice of the second row (1, 2, ..., N) and store them in an array.
Step 2: find the maximum in this array in a left-to-right scan and remember the positions of the temporary maxima.
Step 3: in a right-to-left pass, the sums are decreased by N one at a time. When this decrease hits the current max column, check: if its new value is smaller than the overall maximum minus N, the new max column is the overall maximum column, and it stays there for the rest of the loop; if the value is still larger than the previous temporary maximum found in step 2, the max column stays the same for the rest of the loop; otherwise, the previous temporary maximum becomes the new max column.
Taking the example input 7,1,6,2 the algorithm runs as follows:
Step 1 calculates the sums 8, 3, 9, 6
Step 2 finds the temporary maxima from left to right: 8 in col 1 and then 9 in col 3
Step 3 generates the results passing over the array from right to left
8  3 9 6 -> output 9 + 0 = 9
8  3 9 2 -> output 9 + 1 = 10
8  3 5 2 -> the current max column was decreased; the previous max 8 is larger
            and becomes current -> output 8 + 2 = 10
8 -1 5 2 -> output 8 + 3 = 11
Here is the algorithm in C:
#include <stdio.h>

int N;
int A[200000];  /* column sums */
int M[200000];  /* positions of the left-to-right temporary maxima */

int main() {
    int i, m, max, j, mval, mmax;
    scanf("%d", &N);
    for (i = 0; i < N; i++) {
        scanf("%d", &A[i]);
        A[i] = A[i] + i + 1;  /* add the initial second row 1, 2, ..., N */
    }
    /* left-to-right scan recording the temporary maxima */
    m = 0;
    max = 0;
    M[0] = 0;
    for (i = 1; i < N; i++) {
        if (A[i] > A[max]) {
            m++;
            M[m] = i;
            max = i;
        }
    }
    mval = A[max] - N;  /* overall maximum after its own decrease */
    mmax = max;
    /* right-to-left pass generating the N results */
    for (i = N - 1, j = 0; i >= 0; i--, j++) {
        printf("%d ", A[max] + j);
        A[i] = A[i] - N;
        if (i == max) {
            if (A[i] < mval) {
                max = mmax;       /* jump back to the overall maximum column */
            } else if (m > 0 && A[i] < A[M[m-1]]) {
                max = M[m-1];     /* fall back to the previous temporary maximum */
                m--;
            }
        }
    }
    printf("\n");
    return 0;
}

Here is an O(n*logn) solution:
Suppose you calculate all the column sums for a particular arrangement of the second row.
Now consider what happens to the column sums when you rotate the second row to the left (e.g. change it from 1,2,...,N to 2,3,...,N,1): Each column sum will increase by 1, except for one column sum which is decreased by N-1.
Instead of modifying all the column sums, we can decrease the one column sum by N, and then take the maximum column sum plus 1 to find the new maximum of the column sums. So we only need to update one column instead of all of them.
So our algorithm would be:
Set the second row to be 1,2,...,N and calculate the sum of each column. Put all these sums in a max-heap. The root of the heap will be the largest sum.
For i in 1 to N:
Decrease the value of the heap node corresponding to the (N-i)-th column by N.
The new maximum column sum is the root of the heap plus i.
Each heap update takes O(logn) which leads to an O(n*logn) overall time.
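Since Python's heapq has no decrease-key operation, a sketch of this approach (names are mine) can substitute lazy deletion: push the updated sum and discard stale entries when they surface at the root. Each column is decreased exactly once, so the total work stays O(N log N):

```python
import heapq

def rotation_scores(first_row):
    """Score of the table for each left-rotation of the second row 1..N."""
    n = len(first_row)
    # Column sums for the initial second row 1, 2, ..., N
    sums = [first_row[i] + i + 1 for i in range(n)]
    heap = [(-s, i) for i, s in enumerate(sums)]  # max-heap via negation
    heapq.heapify(heap)
    scores = []
    for i in range(n):
        # Drop stale entries that no longer match the current column sum
        while -heap[0][0] != sums[heap[0][1]]:
            heapq.heappop(heap)
        scores.append(-heap[0][0] + i)  # root plus the global offset i
        # Rotating left: column n-1-i drops by n relative to the others
        col = n - 1 - i
        sums[col] -= n
        heapq.heappush(heap, (-sums[col], col))
    return scores

print(rotation_scores([7, 1, 6, 2]))  # [9, 10, 10, 11]
```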

Related

Find the kth largest element in an array after inserting the absolute difference back in the array

I recently found this question in a contest, though I can't remember which one. The problem statement goes like this:
Given an unsorted positive integer array such as [2,4,9], you can apply an operation to the array to give it a new form. Find the kth largest element once you can no longer apply the operation.
The operation is defined as follows: the absolute difference of any two elements may be inserted back into the array. For the array above, it could become [2,4,9,5,7]. Duplicates can't be inserted: for example, absolute diff(2,4) is 2, but 2 is already part of the array.
Can anybody figure out the approach?
The answer is equal to (m - k + 1) multiplied by the greatest common divisor (GCD) of the elements of the input array, where m is the number of elements in the final array, as long as that number is at least k.
To show this, we need to show that the array after applying the operation as many times as possible will always result in an array of the form [d, 2*d, 3*d, ..., m*d] in some order, where d is the GCD and m is some positive integer. There are three parts to the proof:
We need to show that d is constructible by some sequence of applying the operation. This is true because the operation allows us to do any subtractions we like where the smaller number is the one subtracted, and this is sufficient to perform Euclid's algorithm.
We need to show that all of the numbers in the claimed result are constructible. This is true because the largest number in the input array has d as a divisor by definition, so it must be m*d for some m; the smaller multiples can be constructed by repeatedly subtracting d.
We need to show that no other numbers are constructible. This is true because the result of a subtraction always shares common divisors with the two operands, and because larger numbers cannot be constructed by subtraction.
So the algorithm works as follows:
Find the GCD of the input array (e.g. by repeatedly applying Euclid's algorithm). Call the result d.
Find the maximum element of the input array, and divide it by d. Call the result m.
If m >= k, then return (m - k + 1)*d, otherwise raise an error.
The m - k + 1 term is to get the kth largest element in the result; if the kth smallest element is required, this will be k*d.
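The three steps above can be sketched directly in Python (the function name is my own):

```python
from math import gcd
from functools import reduce

def kth_largest_after_ops(arr, k):
    """After exhausting the operation, the array is [d, 2d, ..., m*d]
    in some order, so the kth largest is (m - k + 1) * d."""
    d = reduce(gcd, arr)   # step 1: GCD of the input array
    m = max(arr) // d      # step 2: largest element divided by d
    if m < k:              # step 3: not enough elements in the final array
        raise ValueError("final array has fewer than k elements")
    return (m - k + 1) * d

print(kth_largest_after_ops([2, 4, 9], 2))  # 8: final array is 1..9
```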
Not a complete answer, just some thoughts that might help.
We have streams of numbers. For example, given [2, 4, 9], we know all numbers with the difference of 2 will be generated down from each number, m, higher than 2, as well as m mod 2, which starts another cycle.
9-2=7
7-2=5
5-2=3
etc.
We get [2, 3, 4, 5, 7, 9] and the remainder 1. But 1, using the same procedure, will generate all numbers in the range (1, max).
I would start with considering how to obtain the smallest such remainder (greater than zero) we can have. But we may also need to consider the full range of differences that each generate such a "stream."
import java.util.*;

class TestClass {
    public static void main(String args[]) throws Exception {
        Scanner s = new Scanner(System.in);
        int t = s.nextInt();
        while (t-- != 0) {
            int n = s.nextInt();
            int arr[] = new int[n];
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            int k;
            for (int i = 0; i < n; i++) {
                arr[i] = s.nextInt();
                if (arr[i] > max)
                    max = arr[i];
                if (arr[i] < min)
                    min = arr[i];
            }
            k = s.nextInt();
            int is_div = 1;
            for (int i = 0; i < n; i++) {
                if (arr[i] % min != 0) {
                    // some element is not divisible by min,
                    // so the array fills out to [1, 2, ..., max]
                    is_div = 0;
                    break;
                }
            }
            if (is_div == 0) {
                n = max - (k - 1);
            } else {
                n = max - min * (k - 1);
            }
            if (n <= 0)
                System.out.println("-1");
            else
                System.out.println(n);
        }
    }
}
eg1: 4 7 9
Here min = 4, max = 9.
If some number is not divisible by the min, we can always take the smallest difference first:
iteration 1: 4 5 7 9  (from the pair (4,7))
iteration 2: 1 4 5 7 9  (from the pair (4,5))
At this point, having 1, we can easily fill out
1 2 3 4 5 6 7 8 9
and find the kth largest element.
So the point is: if any number is not divisible by the min, then we can easily reach 1.
eg2: 3 9
min = 3, max = 9.
Here every number is divisible by 3, so the final array looks like
3 6 9
and the kth largest element is max - 3*(k-1), e.g. 9 - 3*(2-1) = 6.

Find the k-th sum from sums of every pair

I'm given an array with n elements, and from the n^2 sums of every pair (taken in ascending order) I need to find the k-th sum, in time complexity O(n*logn).
Example input
In the first line are given number of elements and number of sum to find. In the second line list of number which sums of pair we need to generate.
3 6
1 4 6
The answer for the given list is 8: below is the array of all pair sums, where 8, the sum 4+4, is in the 6-th position.
2 5 5 7 7 8 10 10 12
where the first three elements are generated as follows
1+1 = 2
1+4 = 5
4+1 = 5
Edit:
I came to the conclusion that the main problem is to find the place for the sum of an element with itself. I will give an example to make it clearer.
For the sequence [1, 4, 10], we have
2 5 5 8 11 11 14 14 20
The problem is where to place the sum 4+4: that depends on whether 1+10 > 4+4. The other sums have fixed places, because the second element plus the last will always be bigger than the last element plus the first (if the elements are in ascending order).
This can be solved in O(n log maxSum).
Pseudocode:
sort(array)
low = array[1] * 2
high = array[n] * 2
while (low <= high):   # binary search between low and high
    mid = floor((high + low) / 2)
    qty = checkHowManySumsAreEqualOrLessThan(mid)
    if qty >= k:
        high = mid - 1
    else:
        low = mid + 1
answer = low   # low is the first value of mid where qty was >= k;
               # at low - 1, qty was < k, so the solution must be low
Sorting is O(n log n). The binary search costs log(array[n] * 2 - array[1] * 2) iterations.
checkHowManySumsAreEqualOrLessThan(mid) can be done in O(n) using 2 pointers, let me know if you can't figure out how.
This works because even though we are not doing the binary search over k, it is true that if there were x sums <= mid, if k < x then the kth sum would be lower than mid. Same for when k > x.
Thanks to juvian, who solved the hard parts, I was able to write this solution; the comments should explain it:
def count_sums_of_at_most(amount, nums1, nums2):
    p1 = 0  # Pointer into the first array, start at the beginning
    p2 = len(nums2) - 1  # Pointer into the second array, start at the end
    # Move p1 up and p2 down, walking through the "diagonal" in O(n)
    sum_count = 0
    while p1 < len(nums1):
        while amount < nums1[p1] + nums2[p2]:
            p2 -= 1
            if p2 < 0:
                # p1 became too large, we are done
                break
        else:
            # Found a valid p2 for the given p1
            sum_count += p2 + 1
            p1 += 1
            continue
        break
    return sum_count

def find_sum(k, nums1, nums2):
    # Sort both arrays, this runs in O(n * log(n))
    nums1.sort()
    nums2.sort()
    # Binary search through all sums, runs in O(n * log(max_sum))
    low = nums1[0] + nums2[0]
    high = nums1[-1] + nums2[-1]
    while low <= high:
        mid = (high + low) // 2
        sum_count = count_sums_of_at_most(mid, nums1, nums2)
        if sum_count >= k:
            high = mid - 1
        else:
            low = mid + 1
    return low

arr = [1, 4, 5, 6]
for k in range(1, 1 + len(arr) ** 2):
    print('sum', k, 'is', find_sum(k, arr, arr))
This prints:
sum 1 is 2
sum 2 is 5
sum 3 is 5
sum 4 is 6
sum 5 is 6
sum 6 is 7
sum 7 is 7
sum 8 is 8
sum 9 is 9
sum 10 is 9
sum 11 is 10
sum 12 is 10
sum 13 is 10
sum 14 is 11
sum 15 is 11
sum 16 is 12
Edit: this is O(n^2)
The way I understood the problem, the first number on the first row is the number of numbers, and the second number is k.
You can do this problem by using a PriorityQueue, which orders everything for you as you insert numbers. Use two nested for loops so that they visit each pair once:
for (int k = 0; k < n; k++) {
    for (int j = 0; j <= k; j++) {
        ...
    }
}
If j == k, enter the sum into the PriorityQueue once; if not, enter it twice. Then step through the PriorityQueue to get the 6th value.
Will edit with full code if you'd like.

Number of different marks

I came across an interesting problem and I can't solve it in a good complexity (better than O(qn)):
There are n persons in a row. Initially every person in this row has some value - let's say the i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j, then the j-th person can't get a worse mark than the i-th person.
If i < j, then the j-th person can't get a worse mark than the i-th person (this condition tells us that the sequence of marks is non-decreasing).
There are q operations. In every operation two persons are swapped (they swap their values).
After each operation you have to tell the maximal number of different marks that these n persons can get.
Do you have any idea?
Consider any two groups, J and I (with j < i and a_j < a_i for all j in J and i in I). In any swap between them, a_i becomes the new max of J and a_j the new min of I, and J gets extended to the right at least up to and including i.
Now if there were any group to the right of i whose values were all greater than the values in the left segment of I up to i, that group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap reduces the mark count by the number of groups between J and I, and merges the groups from J up to I.
Now consider an in-group swap. The only time a mark is added is if a_i and a_j (j < i) are the minimum and maximum, respectively, of two adjacent segments, so that the group splits into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking, before the switch, that the second smallest element on the i side is greater than the second largest on the j side.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of (min, max, number of groups), where the group count corresponds to a count of sequential maxima.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in log(n) along the intervals containing 2 and 5.
To add group counts in a larger interval the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6
Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the position of the greatest value in LEFT (call that value aLEFT_MAX) and the position of the least value in RIGHT (call that value aRIGHT_MIN).
If aLEFT_MAX < aRIGHT_MIN, then there's no need for any marks to overlap between the two, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from LEFT_MAX to RIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some further-rightward value. (This can actually be optimized a bit more, but I don't think we can improve the algorithmic complexity by doing so, so I won't worry about that.) And, conversely to find the rightmost extent of the segment.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
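For the static case, the segment boundaries this recursion discovers can also be counted directly: a new mark starts after every position i where max(a[0..i]) < min(a[i+1..]). The prefix-max / suffix-min sketch below is a simplification of mine, not the recursion itself, but it counts the same segments:

```python
def count_marks(a):
    """Maximal number of distinct marks for a static array: one mark per
    maximal segment, i.e. one per cut point plus the final segment."""
    n = len(a)
    # suffix_min[i] = min of a[i:], with the empty suffix treated as +inf
    suffix_min = [float('inf')] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = min(a[i], suffix_min[i + 1])
    marks = 0
    prefix_max = float('-inf')
    for i in range(n):
        prefix_max = max(prefix_max, a[i])
        if prefix_max < suffix_min[i + 1]:
            marks += 1  # everything after i can get a strictly better mark
    return marks

print(count_marks([4, 1, 3, 2, 5, 7, 6, 8]))  # 4 segments -> 4 marks
```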
I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone - otherwise you could use your algorithm to check if the array is sorted faster than O(n), and that's clearly impossible.
General idea: spend O(n log n) to create additional data which lets you compute the number of marks in O(log n) time, and which can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e. finding how the number of marks changed may be easier than computing what it is).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person - otherwise taking an array 1 2 3 ... n and repeatedly swapping first and last element would require you to update this additional data for every element in the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
|8
+---+-
|7 |
| 6|
+-+---+
|5|
-------+-+
4 |
3 |
2|
1 |
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add less than n fictional persons who will never participate in exchanges), let 0 <= a_i < n. We can create O(n log n) objects - "index squares" - which are "responsible" for points (i, a_i) : a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with center on the diagonal line a_i=i). Every swap affects only O(log n) index squares.
The problem is, I can't find what information to store for each index square so that it would allow finding the number of marks fast enough; all I have is a feeling that such an approach may be effective.
Hope this helps.
Let's normalize the problem first, so that a_i is in the range 0 to n-1 (this can be achieved in O(n*logn) by sorting a, but it just has to be done once, so we are fine).
function normalize(a) {
    let b = [];
    for (let i = 0; i < a.length; i++)
        b[i] = [i, a[i]];
    b.sort(function(x, y) {
        return x[1] < y[1] ? -1 : 1;
    });
    for (let i = 0; i < a.length; i++)
        a[b[i][0]] = i;
    return a;
}
To get the maximal number of marks we can count how many times
i + 1 == mex(a[0..i]), for integer i in [0, n-1].
a[0..i] denotes the sub-array of all the values from index 0 to i.
mex() is the minimum excludant: the smallest value missing from the sequence 0, 1, 2, 3, ...
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assuming values are normalized to be element [0,n-1]
function maxMarks(a) {
    let visited = new Array(a.length + 1);
    let smallestMissing = 0, marks = 0;
    for (let i = 0; i < a.length; i++) {
        visited[a[i]] = true;
        if (a[i] == smallestMissing) {
            smallestMissing++;
            while (visited[smallestMissing])
                smallestMissing++;
            if (i + 1 == smallestMissing)
                marks++;
        }
    }
    return marks;
}
If we swap the values at indices x and y (x < y), then the mex is unchanged for all prefixes ending at i < x or i >= y. Although this is an optimization, it unfortunately doesn't improve the complexity, which is still O(qn).
We can observe that the hits (where the mark count increases) are always at the beginning of an increasing sequence, and all matches within the same sequence have to satisfy a[i] == i, except for the first one; but I couldn't derive an algorithm from this yet:
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*

nth smallest element in a union of an array of intervals with repetition

I want to know if there is a more efficient solution than what I came up with (I haven't coded it yet, but the gist of it is described at the bottom).
Write a function calcNthSmallest(n, intervals) which takes as input a non-negative int n, and a list of intervals [[a_1, b_1], ..., [a_m, b_m]], and calculates the nth smallest number (0-indexed) when taking the union of all the intervals with repetition. For example, if the intervals were [1,5], [2,4], [7,9], their union with repetition would be [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9] (note 2, 3, 4 each appear twice since they're in both the intervals [1,5] and [2,4]). For this list of intervals, the 0th smallest number would be 1, and the 3rd and 4th smallest would both be 3. Your implementation should run quickly even when the a_i, b_i can be very large (like, one trillion), and there are several intervals.
The way I thought to go about it is the straightforward solution: build the union array and traverse it.
This problem can be solved in O(N log N) where N is the number of intervals in the list, regardless of the actual values of the interval endpoints.
The key to solving this problem efficiently is to transform the list of possibly-overlapping intervals into a list of intervals which are either disjoint or identical. In the given example, only the first interval needs to be split:
{[1,5], [2,4], [7,9]}  =>  {[1,1], [2,4], [5,5], [2,4], [7,9]}
(This doesn't have to be done explicitly, though: see below.) Now, we can sort the new intervals, replacing duplicates with a count. From that, we can compute the number of values each (possibly-duplicated) interval represents. Now, we simply need to accumulate the values to figure out which interval the solution lies in:
interval  count  size  values in interval  cumulative values
[1,1]     1      1     1                   [0, 1)
[2,4]     2      3     6                   [1, 7)   (eg. from n=1 to n=6 will be here)
[5,5]     1      1     1                   [7, 8)
[7,9]     1      3     3                   [8, 11)
I wrote the cumulative values as a list of half-open intervals, but obviously we only need the end-points. We can then find which interval holds value n by, for example, binary-searching the cumulative values list, and we can figure out which value in the interval we want by subtracting the start of the interval from n and then integer-dividing by the count.
It should be clear that the maximum size of the above table is twice the number of original intervals, because every row must start and end at either the start or end of some interval in the original list. If we'd written the intervals as half-open instead of closed, this would be even clearer; in that case, we can assert that the precise size of the table will be the number of unique values in the collection of end-points. And from that insight, we can see that we don't really need the table at all; we just need the sorted list of end-points (although we need to know which endpoint each value represents). We can simply iterate through that list, maintaining the count of the number of active intervals, until we reach the value we're looking for.
Here's a quick python implementation. It could be improved.
def combineIntervals(intervals):
    # endpoints will map each endpoint to a count
    endpoints = {}
    # Each interval contributes its start and its limit (1 + end)
    # Each start adds 1 to the count, and each limit subtracts 1
    for start in (i[0] for i in intervals):
        endpoints[start] = endpoints.setdefault(start, 0) + 1
    for limit in (i[1] + 1 for i in intervals):
        endpoints[limit] = endpoints.setdefault(limit, 0) - 1
    # Filtering is a possibly premature optimization but it was easy
    return sorted(kv for kv in endpoints.items() if kv[1] != 0)

def nthSmallestInIntervalList(n, intervals):
    limits = combineIntervals(intervals)
    cumulative = 0
    count = 0
    index = 0
    here = limits[0][0]
    while index < len(limits):
        size = limits[index][0] - here
        if n < cumulative + count * size:
            # [here, next) contains the value we're searching for
            return here + (n - cumulative) // count
        # advance
        cumulative += count * size
        count += limits[index][1]
        here += size
        index += 1
    # We didn't find it. We could raise an error here.
So, as I said, the running time of this algorithm is independent of the actual values of the intervals; it only depends in the length of the interval list. This particular solution is O(N log N) because of the cost of the sort (in combineIntervals); if we used a priority queue instead of a full sort, we could construct the heap in O(N) but making the scan O(log N) for each scanned endpoint. Unless N is really big and the expected value of the argument n is relatively small, this would be counter-productive. There might be other ways to reduce complexity, though.
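To illustrate the priority-queue variant mentioned above (a sketch under my own naming, not the answer's code): heapify the endpoint events in O(N), then pop them in order until the target interval is reached, so small n exits early.

```python
import heapq

def nth_smallest_heap(n, intervals):
    """Endpoint sweep over a heap built in O(N) instead of a full sort.
    Each popped event costs O(log N)."""
    # Each start raises the active-interval count; each (end + 1) lowers it
    events = [(a, 1) for a, b in intervals] + [(b + 1, -1) for a, b in intervals]
    heapq.heapify(events)  # O(N)
    cumulative = 0         # union elements strictly before `here`
    count = 0              # currently active (overlapping) intervals
    here = events[0][0]
    while events:
        point, delta = heapq.heappop(events)
        size = point - here
        if count and n < cumulative + count * size:
            return here + (n - cumulative) // count
        cumulative += count * size
        count += delta
        here = point
    return None  # n is past the end of the union

print(nth_smallest_heap(3, [(1, 5), (2, 4), (7, 9)]))  # 3
```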
Edit2:
Here's yet another take on your question.
Let's consider the intervals graphically:
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
[-------]
[-----------------]
[---------]
[--------------]
[-----]
When sorted in increasing order on the lower bound, we could get something that looks like the above for the interval list ([2;10];[4;24];[7;17];[13;30];[20;27]). Each lower bound indicates the start of a new interval, and also marks the beginning of one more "level" of duplication of the numbers. Conversely, each upper bound marks the end of that level, and decreases the duplication level by one.
We could therefore convert the above into the following list:
[2;+];[4;+];[7;+][10;-];[13;+];[17;-][20;+];[24;-];[27;-];[30;-]
where the first value indicates the position of the bound, and the second value indicates whether the bound is a lower (+) or upper (-) one. The computation of the nth element is done by following the list, raising or lowering the duplication level when encountering a lower or upper bound, and using the duplication level as a counting factor.
Let's consider again the list graphically, but as an histogram:
3333 44444 5555
2222222333333344444555
111111111222222222222444444
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
The view above is the same as the first one, with all the intervals packed vertically: 1 marks the elements of the 1st interval, 2 the second one, etc. In fact, what matters here is the height at each index, corresponding to the number of times each index is duplicated in the union of all intervals.
3333 55555 7777
2223333445555567777888
112223333445555567777888999
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
| | | | | | || | |
We can see that histogram blocks start at lower bounds of intervals, and end either on upper bounds, or one unit before lower bounds, so the new notation must be modified accordingly.
With a list containing n intervals, as a first step, we convert the list into the notation above (O(n)), and sort it in increasing bound order (O(nlog(n))). The second step of computing the number is then in O(n), for a total average time in O(nlog(n)).
Here's a simple implementation in OCaml, using 1 and -1 instead of '+' and '-'.
(* transform the list into the correct notation *)
let rec convert = function
    [] -> []
  | (l,u)::xs -> (l,1) :: (u+1,-1) :: convert xs;;

(* the counting function *)
let rec count r f = function
    [] -> raise Not_found
  | [a,x] -> (match f + x with
        0 -> if r = 0 then a else raise Not_found
      | _ -> a + (r / f))
  | (a,x)::(b,y)::l ->
      if a = b
      then count r f ((b,x+y)::l)
      else
        let f = f + x in
        if f > 0 then
          let range = (b - a) * f in
          if range > r
          then a + (r / f)
          else count (r - range) f ((b,y)::l)
        else count r f ((b,y)::l);;

(* the compute function *)
let compute l =
  let compare (x,_) (y,_) = compare x y in
  let l = List.sort compare (convert l) in
  fun m -> count m 0 l;;
Notes:
- the function above will raise an exception if the sought number is beyond the last interval. This corner case isn't taken into account by the other methods below.
- the list-sorting function used in OCaml is merge sort, which effectively performs in O(nlog(n)).
Edit:
Seeing that you might have very large intervals, the solution I gave initially (see down below) is far from optimal.
Instead, we can make things much faster by transforming the list: we compress the interval list by searching for overlapping intervals and replacing them with a prefix interval, several repetitions of the overlapping part, and a suffix interval. We can then directly compute the number of entries covered by each element of the list.
Looking at the splitting above (prefix, infix, suffix), we see that the optimal structure for the processing is a binary tree. A node of that tree may optionally have a prefix and a suffix. So the node must contain:
an interval i in the node,
an integer giving the number of repetitions of i in the list,
a left subtree of all the intervals below i,
a right subtree of all the intervals above i.
With this structure in place, the tree is automatically sorted.
Here's an example of an ocaml type embodying that tree.
type tree = Empty | Node of int * interval * tree * tree
Now the transformation algorithm boils down to building the tree.
This function creates a tree out of its components:
let cons k r lt rt =
the tree made of count k, interval r, left tree lt and right tree rt
This function recursively inserts an interval into a tree:
let rec insert i it =
let r = root of it
let lt = the left subtree of it
let rt = the right subtree of it
let k = the count of r
let prf, inf, suf = the prefix, infix and suffix of i according to r
return cons (k+1) inf (insert prf lt) (insert suf rt)
Once the tree is built, we do a pre-order traversal of the tree, using the count of the node to accelerate the computation of the nth element.
Below is my previous answer.
Here are the steps of my solution:
you need to sort the interval list in increasing order on the lower bound of each interval
you need a deque dq (or a list which will be reversed at some point) to store the intervals
here's the code:
let lower i = lower bound of interval i
let upper i = upper bound of i
let il = sorted interval list
i <- 0
j <- lower (head of il)
loop on il:
    i <- i + 1
    let h = the head of il
    let il = the tail of il
    if upper h > j then push h to dq
    if lower h > j then
        il <- concat dq and il
        j <- j + 1
        dq <- empty
        loop
    if i = k then return j
    loop
This algorithm works by simply iterating through the intervals, taking only the relevant ones into account, and tracking both the rank i of the current element in the union and the value j of that element. When the target rank k is reached, the value is returned.
The complexity is roughly O(k) + O(sort(l)).
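A Java sketch of the same counting idea, using a boundary sweep instead of a deque (hypothetical `kth`; it treats the union as a multiset, i.e. duplicates from overlapping intervals are counted, as in the worked example further down): between two consecutive interval boundaries the multiplicity is constant, so each segment contributes multiplicity * length elements.

```java
import java.util.Map;
import java.util.TreeMap;

class KthSweep {
    // k-th smallest (1-based) in the multiset union of inclusive intervals.
    // Between consecutive boundary values the number of covering intervals
    // ("active") is constant, so a segment of length len holds active*len elements.
    static long kth(long[][] intervals, long k) {
        TreeMap<Long, Long> delta = new TreeMap<>();
        for (long[] iv : intervals) {
            delta.merge(iv[0], 1L, Long::sum);      // interval opens at lo
            delta.merge(iv[1] + 1, -1L, Long::sum); // and closes after hi
        }
        long active = 0, seen = 0, prev = 0;
        for (Map.Entry<Long, Long> e : delta.entrySet()) {
            if (active > 0) {
                long len = e.getKey() - prev;       // segment [prev, key)
                long here = active * len;
                if (seen + here >= k)               // rank k falls in this segment
                    return prev + (k - seen - 1) / active;
                seen += here;
            }
            active += e.getValue();
            prev = e.getKey();
        }
        throw new IllegalArgumentException("k exceeds the union size");
    }
}
```

For the intervals [5,12], [3,9], [8,13] this gives kth(..., 9) = 8 and kth(..., 13) = 9. Sorting the boundaries dominates, so the cost is O(n log n) in the number of intervals, independent of k.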
If I have understood your question correctly, you want to find the kth smallest element in the union of a list of intervals.
If we assume that the number of lists is 2, the question becomes:
Find the kth smallest element in the union of two sorted arrays (an interval [2,5] is nothing but the elements from 2 to 5, i.e. {2,3,4,5}). This can be solved in (n+m)log(n+m) time, where n and m are the sizes of the lists and i and j are iterators into them, maintaining the invariant
i + j = k - 1:
If B[j-1] < A[i] < B[j], then A[i] must be the k-th smallest,
or else if A[i-1] < B[j] < A[i], then B[j] must be the k-th smallest.
Now the problem is: if you have 3 lists, then maintaining the invariant
i + j + x = k - 1, i.e. i + j = k - x - 1,
the value x can take y values (y being the size of the third list, because x iterates from the start of that list to its end).
So the problem on 3 lists reduces to y instances of the problem on 2 lists, and the complexity is y*((n+m)log(n+m)):
If B[j-1] < A[i] < B[j], then A[i] must be the k-th smallest,
or else if A[i-1] < B[j] < A[i], then B[j] must be the k-th smallest.
So for a problem on n lists the complexity grows exponentially with the number of lists.
But yes, we can make a minor improvement: if we know that k < the size of some lists, we can chop off everything from the (k+1)th element onwards in those lists whose size is bigger than k (I think this doesn't help for large k). If there is any mistake please let me know.
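For the two-list base case above, here is a hedged Java sketch (hypothetical `kthOfTwo`): instead of stepping i and j one at a time under the invariant i + j = k - 1, it discards about k/2 candidates from one array per round, so each query is O(log k).

```java
class TwoSorted {
    // k-th smallest (1-based) element of the union of two sorted arrays.
    // Each round compares the k/2-th remaining candidates and discards
    // the smaller prefix, shrinking k accordingly.
    static int kthOfTwo(int[] a, int[] b, int k) {
        int i = 0, j = 0; // number of elements already discarded from a and b
        while (true) {
            if (i == a.length) return b[j + k - 1]; // a exhausted
            if (j == b.length) return a[i + k - 1]; // b exhausted
            if (k == 1) return Math.min(a[i], b[j]);
            int step = k / 2;
            int ai = Math.min(i + step, a.length) - 1; // candidate in a
            int bj = Math.min(j + step, b.length) - 1; // candidate in b
            if (a[ai] <= b[bj]) { k -= ai - i + 1; i = ai + 1; }
            else                { k -= bj - j + 1; j = bj + 1; }
        }
    }
}
```

With the intervals [2,5] and [3,7] expanded to the arrays {2,3,4,5} and {3,4,5,6,7}, the 4th smallest element of the union is 4.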
Let me explain with an example:
Assume we are given these intervals [5,12],[3,9],[8,13].
The union of these intervals is:
number : 3 4 5 5 6 6 7 7 8 8 8 9 9 9 10 10 11 11 12 12 13.
indices: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
The lowest function will return 11 when 9 is passed as input.
The highest function will return 14 when 9 is passed as input.
The lowest and highest functions simply check whether x is present in each interval; if it is, they add x - a (where a is the lower bound of the interval) to the return value for that interval. If an interval lies entirely below x, they add the total number of elements in that interval to the return value.
The find function will return 9 when 13 is passed.
The find function uses binary search to find the kth smallest element. In the given range [0,N] (if the range is not given, we can find the upper bound in O(n)), find the mid and compute lowest and highest for mid. If k falls between lowest and highest, return mid; else if k is less than or equal to lowest, search the lower half [0,mid-1]; otherwise search the upper half [mid+1,high].
If the number of intervals is n and the range is N, the running time of this algorithm is n*log(N): we compute lowest and highest (each O(n)) log(N) times.
//Function call will be `find(0,N,k,in)`
//Retrieves the no. of elements in the union smaller than x (excluding x)
public static int lowest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst : in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0) + 1;
        else if(x >= lst.get(0) && x <= lst.get(1))
            sum += x - lst.get(0);
    }
    return sum;
}

//Retrieves the no. of elements in the union smaller than or equal to x
public static int highest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst : in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0) + 1;
        else if(x >= lst.get(0) && x <= lst.get(1))
            sum += x - lst.get(0) + 1;
    }
    return sum;
}
//Do binary search on the range.
public static int find(int low, int high, int k, List<List<Integer>> in){
    if(low > high)
        return -1;
    int mid = low + (high - low) / 2;
    int lowIdx = lowest(in, mid);
    int highIdx = highest(in, mid);
    //k lies between the current number's low and high indices
    if(k > lowIdx && k <= highIdx) return mid;
    //k at most the lower index: go to the left side
    if(k <= lowIdx) return find(low, mid - 1, k, in);
    //k greater than the higher index: go to the right
    return find(mid + 1, high, k, in);
}
It's possible to count how many numbers in the list are less than some chosen number X (by iterating through all of the intervals). Now, if this count is greater than n, the solution is certainly smaller than X; similarly, if it is less than or equal to n, the solution is greater than or equal to X. Based on these observations we can use binary search.
Below is a Java implementation :
public int nthElement( int[] lowerBound, int[] upperBound, int n )
{
    int lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
    while ( lo < hi ) {
        int X = (int)( ((long)lo + hi + 1) / 2 );
        long count = 0;
        for ( int i = 0; i < lowerBound.length; ++i ) {
            if ( X >= lowerBound[i] && X <= upperBound[i] ) {
                // part of interval i is less than X
                count += (long)X - lowerBound[i];
            }
            if ( X >= lowerBound[i] && X > upperBound[i] ) {
                // all numbers in interval i are less than X
                count += (long)upperBound[i] - lowerBound[i] + 1;
            }
        }
        if ( count <= n ) lo = X;
        else hi = X - 1;
    }
    return lo;
}

array median transformation minimum steps

Given an array A with n integers. In one turn one can apply the following operation to any consecutive subarray A[l..r]: assign to all A[i] (l <= i <= r) the median of the subarray A[l..r].
Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A into an array of n integers each with value max.
For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first on subarray A[2..3] (after that A equals [1, 3, 3]), then on A[1..3].
Also, the median is defined for an array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B[m] (1-based indexing), where m equals (n div 2) + 1. Here 'div' is integer division. So, for a sorted array with 5 elements, the median is the 3rd element, and for a sorted array with 6 elements, it is the 4th element.
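As a quick sanity check of that definition, a minimal Java sketch (hypothetical helper `median`): the 1-based index (n div 2) + 1 is simply the 0-based index n / 2 of a sorted copy.

```java
import java.util.Arrays;

class MedianDef {
    // Median per the definition above: sort a copy B of A in
    // non-decreasing order and take B[(n div 2) + 1] (1-based),
    // i.e. the 0-based index n / 2.
    static int median(int[] a) {
        int[] b = a.clone();
        Arrays.sort(b);
        return b[b.length / 2];
    }
}
```

For [1, 2, 3] this returns 2, which is why the example needs a first operation on A[2..3] (making [1, 3, 3], whose median is 3) before the whole array can be set in one more step.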
Since the maximum value of N is 30, I thought of brute forcing the result. Could there be a better solution?
You can double the size of the subarray containing the maximum element in each iteration. After the first iteration, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4 containing those 2 elements, giving you a subarray of size 4 containing the maximum. Then apply it to a size 8 subarray, and so on. You fill the array in ceil(log2(N)) operations, which is optimal. If N is 30, five operations are enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing as being about brute forcing with the median operations, rather than about finding the minimum sequence of operations.
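The worst-case count from the doubling argument can be sketched as a tiny Java helper (hypothetical `worstCaseOps`; it mirrors the `while (p < n)` initialization in the setter's code quoted below):

```java
class Doubling {
    // Worst case (a single maximal element): each operation can at most
    // double the block of maximal values, so ceil(log2(n)) operations
    // are needed to cover all n positions.
    static int worstCaseOps(int n) {
        int ops = 0, size = 1; // current size of the maximal block
        while (size < n) {
            size *= 2;         // one operation doubles the block
            ops++;
        }
        return ops;
    }
}
```

worstCaseOps(10) is 4, matching the four operations in the size-10 example above, and worstCaseOps(30) is 5.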
This is a problem from a CodeChef long contest. Since the contest is already over, I am pasting the problem setter's approach (source: CC contest editorial page).
"Any state of the array can be represented as a binary mask where each 1-bit means that the corresponding number is equal to the max, and 0 otherwise. You can run a DP with state R[mask] and O(n) transitions. You can prove (or just believe) that the number of states will not be big, of course if you run a good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it only makes sense to use an operation on a subarray [l; r] whose number of 1-bits is at least as large as its number of 0-bits in the submask [l; r], because otherwise nothing will change. Also you should notice that if the left bound of your operation is l, it is good to make the operation only with the maximal possible r (this gives a number of transitions equal to O(n)). It was also useful for C++ coders to use a map structure to represent all states."
The C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;

int bc[1<<15];
const int M = (1<<15) - 1;

void setMin(int& ret, int c)
{
    if(c < ret) ret = c;
}

void doit(int n, int mask, int currentSteps, int& currentBest)
{
    int numMax = bc[mask>>15] + bc[mask&M];
    if(numMax == n) {
        setMin(currentBest, currentSteps);
        return;
    }
    if(currentSteps + 1 >= currentBest)
        return;
    if(currentSteps + 2 >= currentBest)
    {
        if(numMax * 2 >= n) {
            setMin(currentBest, 1 + currentSteps);
        }
        return;
    }
    if(numMax < (1<<currentSteps)) return;
    for(int i=0;i<n;i++)
    {
        int a = 0, b = 0;
        int c = mask;
        for(int j=i;j<n;j++)
        {
            c |= (1<<j);
            if(mask&(1<<j)) b++;
            else a++;
            if(b >= a) {
                doit(n, c, currentSteps + 1, currentBest);
            }
        }
    }
}

int v[32];

void solveCase() {
    int n;
    scanf(" %d", &n);
    int maxElement = 0;
    for(int i=0;i<n;i++) {
        scanf(" %d", v+i);
        if(v[i] > maxElement) maxElement = v[i];
    }
    int mask = 0;
    for(int i=0;i<n;i++) if(v[i] == maxElement) mask |= (1<<i);
    int ret = 0, p = 1;
    while(p < n) {
        ret++;
        p *= 2;
    }
    doit(n, mask, 0, ret);
    printf("%d\n", ret);
}

int main() {
    for(int i=0;i<(1<<15);i++) {
        bc[i] = bc[i>>1] + (i&1);
    }
    int cases;
    scanf(" %d", &cases);
    while(cases--) solveCase();
}
The problem setter's approach has exponential complexity. It is pretty good for N=30, but not so for larger sizes. I think it's more interesting to find a polynomial-time solution, and I found one, with O(N^4) complexity.
This approach uses the fact that an optimal solution starts with some group of consecutive maximal elements and extends only this single group until the whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in an optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge. The result is a group of size at least M * 4. Now instead of turn Y for group 2, make an additional turn X + 1 for group 1. In this case the group sizes are at least M * 2 and at most M / 2 (even if we count initially maximal elements that might be included in step Y). After this change, on turn X + Y + 1 the merged group size is at least M * 4 as a result of the first group's extension alone; add to this at least one element from the second group. So extending a single group here produces a larger group in the same number of steps (and if Y > 1, it even requires fewer steps). Since this works for equal group sizes (M), it works even better for non-equal groups. This proof may be extended to the case of several groups (more than two).
To work with a single group of consecutive maximal elements, we need to keep track of only two values: the starting and ending positions of the group. This means it is possible to use a triangular matrix to store all possible groups, allowing the use of a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in original array:
Mark corresponding element in the matrix and clear other elements
For each matrix diagonal, starting with one, containing this element:
For each marked element in this diagonal:
Retrieve current number of turns from this matrix element
(use indexes of this matrix element to initialize p1 and p2)
p2 = end of the group
p1 = start of the group
Decrease p1 while it is possible to keep median at maximum value
(now all values between p1 and p2 are assumed as maximal)
While p2 < N:
Check if number of maximal elements in the array is >= N/2
If this is true, compare current number of turns with the best result \
and update it if necessary
(additional matrix with number of maximal values between each pair of
points may be used to count elements to the left of p1 and to the
right of p2)
Look at position [p1, p2] in the matrix. Mark it and if it contains \
larger number of turns, update it
Repeat:
Increase p1 while it points to maximal value
Increment p1 (to skip one non-maximum value)
Increase p2 while it is possible to keep median at maximum value
while median is not at maximum value
To keep the algorithm simple, I didn't mention the special cases when the group starts at position 0 or ends at position N, skipped initialization, and didn't make any optimizations.
