M Blossoming Groups of length at least K - algorithm

I was asked this question in an interview recently, and I still cannot come up with a solution.
There is a garden with N slots, and in each slot there is a flower. The N flowers bloom one by one over N days: on each day exactly one flower blooms, and it stays in bloom from then on.
You are given an array flowers consisting of the numbers 1 to N. Each number in the array is the position of the flower that opens on that day.
For example, flowers[i] = x means that the unique flower that blooms on day i is at position x, where i and x are both in the range 1 to N.
Given two integers K and M, output the latest day on which there exist M blossoming groups (maximal runs of consecutive blooming flowers) of length at least K.

One possible approach
Transform the array: turn
flowers[i] = x into flowers2[x] = i
so flowers2[x] is the day on which position x blooms.
Now iterate day = 1..N. The flower at position i is blooming iff flowers2[i] <= day, so all you have to do is count how many maximal consecutive groups have size >= K. That gives an O(n^2) solution.
This can be optimized to O(n log n) by noticing that, as day grows from 1 to N, groups only grow and eventually merge, so you can do something like binary search. Start with day = N/2 and count the number m of groups of size at least K. If m >= M, recurse on the mid of the upper range (N/2, N); otherwise recurse on the mid of the lower range (0, N/2). Repeat until you find the answer. This only works if the predicate is monotone in day and an answer always exists.
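A minimal sketch of the approach above in Python: the per-day check plus a simple linear scan over days (the O(n^2) variant, since the binary-search step relies on a monotonicity assumption). It assumes "exactly M groups" is wanted; change `==` to `>=` if "at least M" is intended.

```python
def groups_on_day(flowers, day, k):
    # flowers[i] = position (1-based) that blooms on day i+1.
    # Returns the number of maximal runs of blooming flowers of length >= k.
    n = len(flowers)
    bloom_day = [0] * (n + 1)            # bloom_day[x] = day position x blooms
    for d, x in enumerate(flowers, start=1):
        bloom_day[x] = d
    groups, run = 0, 0
    for pos in range(1, n + 1):
        if bloom_day[pos] <= day:        # position is blooming on this day
            run += 1
        else:
            if run >= k:
                groups += 1
            run = 0
    if run >= k:
        groups += 1
    return groups

def latest_day(flowers, m, k):
    # O(n^2) scan over all days; returns -1 if no day qualifies.
    best = -1
    for day in range(1, len(flowers) + 1):
        if groups_on_day(flowers, day, k) == m:   # use >= for "at least M"
            best = day
    return best
```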

Related

Maximum Score a Mathematician can score

For an array A of n integers, a mathematician can perform the following move on the array:
1. Choose an index i (0 <= i < length(A)) and add A[i] to the score.
2. Discard either the left partition (i.e. A[0...i-1]) or the right
partition (i.e. A[i+1...length(A)-1]). The discarded partition may
be empty. The kept partition becomes the new value of A and
is used for subsequent moves.
Starting from an initial score of 0, the mathematician wishes to find the maximum score achievable after K moves.
Example:
A = [4,6,-10,-1,10,-20], K = 4
Maximum Score is 19
Explanation:
- Select A[4](0-based indexing) and keep the left subarray. Now the
score is 10 and A = [4,6,-10,-1].
- Select A[0] and keep the right subarray. Now Score is 10+4=14 and A =
[6,-10,-1].
- Select A[0] and keep the right subarray. Now the score is 14+6=20,
and A = [-10,-1].
- Select A[1] and then right subarray. Now score is 20-1=19 and A = []
So, after K=4 moves, the maximum score is 19
I tried a dynamic programming solution with the following subproblem and recurrence relation:
- opt(i,j,k) = maximum score possible using element from index i to j
in k moves
- opt(i,j,k) = max over l in [i, j] of ( A[l] + max(opt(i,l-1,k-1),
opt(l+1,j,k-1)) )
The complexity of the above DP solution is O(n^3 * k).
Can you help me with a better solution?
Let M be the set of the K largest values in A. The sum of the elements of M is clearly an upper bound on the achievable score, and it is always attainable: the mathematician can first find M, then repeatedly select the leftmost value in A that belongs to M and discard the part of the array to its left. This proves that the sum of M is the answer.
You can use Quickselect to achieve O(n) performance on average. If you want to avoid the worst-case performance O(n^2) you can find M using a min heap of size K storing the K largest numbers as you iterate over A. This would lead to O(n * log(K)) time complexity.
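The min-heap variant of the answer can be sketched as follows (a size-K heap of the largest values seen so far, O(n log K) time):

```python
import heapq

def max_score(a, k):
    # Sum of the k largest elements of a, kept in a size-k min-heap.
    heap = []
    for x in a:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the smallest of the k kept
    return sum(heap)
```

On the question's example, `max_score([4, 6, -10, -1, 10, -20], 4)` gives 19, matching the worked explanation.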

Given numbers a1,a2,...,an whose sum is positive. Find the minimal number s.t. the sum of numbers less than or equal to it is positive, in linear time

Problem: Given n distinct numbers a1, a2, ..., an whose sum is positive, show how to find the minimal number such that the sum of all numbers less than or equal to it is positive, in time complexity O(n).
Note: the numbers are not necessarily integers, and they are not given in sorted order.
Some explanation of the problem: if the array were sorted, [x,x,x,y,x,...,x,x,x], and y is the first number such that summing all the numbers up to it gives a positive sum (while summing up to any earlier number gives a non-positive sum), then y is returned. (The x here is just a placeholder for a number; all numbers in the array are distinct.)
Attempt:
Define the parameters low, high = 0, n, which serve both as boundaries for summing the elements between them and as boundaries for choosing the pivot.
Choose a pivot at random and partition the array (for example, with Lomuto's partition scheme); denote the pivot's final index by p'. The partitioning costs O(n). Sum the numbers from low to p' and call this sum s.
If s < 0, set low = p' and repeat the process: choose a random pivot (whose index is again denoted p'), partition between low and high, and add the sum of the numbers between these two boundaries to s.
Otherwise, set high = p' and repeat the process described in the 'If' condition above.
The process ends when low = high.
Besides a few logical gaps in my attempt, its overall complexity is O(n) only on average, not in the worst case.
Do you have any ideas on how to solve the problem in worst-case O(n) time? I thought maybe some adaptation of the 'Median of Medians' algorithm, but I have no idea how.
Thanks in advance for any help!
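The attempt above can be sketched concretely. This is a minimal version of the random-pivot approach (average O(n); a worst-case O(n) variant would replace the random pivot with median-of-medians). It rebuilds the window with list comprehensions for clarity rather than partitioning in place:

```python
import random

def min_positive_prefix(a):
    # Invariant: the answer lies in the window a[lo:hi]; acc is the sum of
    # all elements to the left of the window (each smaller than the window).
    a = list(a)                 # assumes distinct numbers with positive sum
    lo, hi = 0, len(a)
    acc = 0
    while hi - lo > 1:
        pivot = a[random.randrange(lo, hi)]
        left = [x for x in a[lo:hi] if x < pivot]
        right = [x for x in a[lo:hi] if x > pivot]
        a[lo:hi] = left + [pivot] + right
        p = lo + len(left)
        s = acc + sum(a[lo:p + 1])          # sum of everything <= pivot
        if s <= 0:
            acc, lo = s, p + 1              # answer is larger than the pivot
        elif s - pivot <= 0:
            return pivot                    # pivot itself is the minimum
        else:
            hi = p                          # answer is smaller than the pivot
    return a[lo]
```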

How many times variable m is updated

Given the following pseudo-code, the question is how many times on average is the variable m being updated.
A[1...n]: array with n random elements
m = A[1]
for i = 2 to n do
    if A[i] < m then m = A[i]
end for
One might answer that, since all elements are random, the variable will be updated on average in half of the iterations of the for loop, plus once for the initialization.
However, I suspect that there must be a better (and possibly the only correct) way to prove it using binomial distribution with p = 1/2. This way, the average number of updates on m would be
M = 1 + Σk=1..n−1 [ k · C(n,k) · p^k · (1−p)^(n−k) ]
where C(n,k) is the binomial coefficient. I have tried to solve this but got stuck a few steps in, since I do not know how to continue.
Could someone explain which of the two answers is correct and, if it is the second one, show me how to calculate M?
Thank you for your time
Assuming the elements of the array are distinct, the expected number of updates of m is the nth harmonic number, Hn, which is the sum of 1/k for k ranging from 1 to n.
The summation formula can also be represented by the recursion:
H1 = 1
Hn = Hn−1 + 1/n (n > 1)
It's easy to see that the recursion corresponds to the problem.
Consider all permutations of n−1 numbers, and assume that the expected number of assignments is Hn−1. Now, every permutation of n numbers consists of a permutation of n−1 numbers, with a new smallest number inserted in one of n possible insertion points: either at the beginning, or after one of the n−1 existing values. Since it is smaller than every number in the existing series, it will only be assigned to m in the case that it was inserted at the beginning. That has a probability of 1/n, and so the expected number of assignments of a permutation of n numbers is Hn−1 + 1/n.
Since the expected number of assignments for a vector of length one is obviously 1, which is H1, we have an inductive proof of the recursion.
Hn is asymptotically equal to ln n + γ, where γ is the Euler-Mascheroni constant, approximately 0.577. So it increases without limit, but quite slowly.
The values for which m is updated are called left-to-right maxima, and you'll probably find more information about them by searching for that term.
I liked @rici's answer, so I decided to elaborate its central argument a little more to make it clearer to myself.
Let H[k] be the expected number of assignments needed to compute the min m of an array of length k, as indicated in the algorithm under consideration. We know that
H[1] = 1.
Now assume we have an array of length n > 1. The min can be in the last position of the array or not. It is in the last position with probability 1/n. It is not with probability 1 - 1/n. In the first case the expected number of assignments is H[n-1] + 1. In the second, H[n-1].
If we multiply the expected number of assignments of each case by their probabilities and sum, we get
H[n] = (H[n-1] + 1)*1/n + H[n-1]*(1 - 1/n)
= H[n-1]*1/n + 1/n + H[n-1] - H[n-1]*1/n
= 1/n + H[n-1]
which shows the recursion.
Note that the argument is valid only if the min is in the last position or in one of the first n−1 positions, but not both. This is where we use the fact that all the elements of the array are distinct.
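The recursion can be sanity-checked by brute force: average the number of assignments over all orderings of n distinct values and compare it with the harmonic number, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def updates(perm):
    # Number of assignments to m, counting the initialization m = A[1].
    count, m = 1, perm[0]
    for x in perm[1:]:
        if x < m:
            m, count = x, count + 1
    return count

def expected_updates(n):
    # Exact average of updates() over all n! orderings of n distinct values.
    total, num = 0, 0
    for p in permutations(range(n)):
        total += updates(p)
        num += 1
    return Fraction(total, num)

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

For every small n, `expected_updates(n)` equals `harmonic(n)` exactly.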

Choosing M of N packets so sum is minimal multiple of K

I saw this program on Codechef.
There are N packets each containing some candies. (Eg: 1st contains 10, 2nd contains 4 and so on)
We have to select exactly M packets from it ( M<=N) such that total candies are divisible by K.
If there are more than one solution then output the one having lowest number of candies.
I thought it was similar to the Subset Sum problem, but that is NP-hard, so it would take exponential time.
I don't want the complete solution to this problem; an algorithm would be appreciated. I have been thinking about it for two days but cannot get the logic right.
1 ≤ M ≤ N ≤ 50000, 1 ≤ K ≤ 20
Number of Candies in each packet [1,10^9]
Let packets contain the original packets.
Partition k into sums of p = 1, 2, ..., m numbers >= 1 and < k (there are O(2^k) such partitions). For each partition, iterate over packets and add those numbers whose remainder modulo k is one of the partition's elements, then remove that element from the partition. Keep the minimum sum as well, and update a global minimum. Note that if m > p, you must also have m - p zeroes.
You might be thinking this is O(2^k * n) and too slow, but you don't actually have to iterate over the packets array for each partition if you keep num[i] = how many packets have packets[j] % k == i, in which case it becomes O(2^k + n). To handle the minimum-sum requirement too, you can instead keep num[i] = the sorted list of packets with packets[j] % k == i, which lets you always pick the smallest numbers for a valid partition.
Have a look again at http://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution and note that K is relatively small. Furthermore, although N can be large, all you care about in the sums that involve N is the answer mod K. So there is a dynamic programming solution lurking around here, where at each step you have K possible values mod K, and you keep track of which of these values are currently attainable.
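The dynamic programming idea hinted at above can be sketched directly: a 0/1-knapsack-style table over (packets used, sum mod K), tracking the minimal total. This O(n·M·K) version is only practical for small inputs; the full constraints need the remainder-class bookkeeping described in the previous answer.

```python
def min_candies(packets, m, k):
    # dp[j][r] = minimal total candies using exactly j packets
    # whose sum is congruent to r modulo k; -1 if impossible.
    INF = float('inf')
    dp = [[INF] * k for _ in range(m + 1)]
    dp[0][0] = 0
    for c in packets:
        for j in range(m, 0, -1):        # descending j: use each packet once
            for r in range(k):
                if dp[j - 1][r] < INF:
                    nr = (r + c) % k
                    dp[j][nr] = min(dp[j][nr], dp[j - 1][r] + c)
    return dp[m][0] if dp[m][0] < INF else -1
```

For example, with packets [10, 4, 3, 2], M = 2 and K = 7, both 10+4 = 14 and 4+3 = 7 are divisible by 7, and the minimum is 7.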

Programming Pearls - Random Select algorithm

Page 120 of Programming Pearls, 1st edition, presents this algorithm for selecting M equally probable random elements out of a population of N integers.
InitToEmpty
Size := 0
while Size < M do
    T := RandInt(1, N)
    if not Member(T) then
        Insert(T)
        Size := Size + 1
It is stated that the expected number of Member tests is less than 2M, as long as M < N/2.
I'd like to know how to prove it, but my algorithm analysis background is failing me.
I understand that the closer M is to N, the longer the program will take, because the result set will have more elements and the likelihood of RandInt selecting an existing one will increase proportionally.
Can you help me figuring out this proof?
I am not a math wizard, but I will give it a rough shot. This is NOT guaranteed to be right though.
For each additional member of M, you pick a number, check whether it is already there, and if not, add it. Otherwise, you try again. Trying something until you're successful follows a geometric probability distribution.
http://en.wikipedia.org/wiki/Geometric_distribution
So you are running M geometric trials. Each trial has expected value 1/p, so it will take an expected 1/p tries to get a number not already picked. Here p is (N minus the number of numbers we've already added) divided by N, i.e. how many unpicked items / total items. So for the fourth number, p = (N−3)/N, which is the probability of picking an unused number, and the expected number of picks for the fourth number is N/(N−3).
The expected value of the run time is all of these added together. So something like
E(run time) = N/N + N/(N−1) + N/(N−2) + ... + N/(N−M+1)
Now if M < N/2, every denominator in that summation is greater than N/2, so every term, including the largest last one, is bounded above by 2 (since N / (N/2) = 2). So if the biggest term costs at most two picks, and there are M terms being summed, the expected run time is bounded above by 2M.
Ask me if any of this is unclear. Correct me if any of this is wrong :)
Say we have chosen K elements out of N. Then our next try succeeds with probability (N−K)/N, so the number of tries it takes to find the (K+1)st element is geometrically distributed with mean N/(N−K).
So if 2M < N, we expect it to take fewer than two tries to get each element.
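The expected count from the two answers above can be computed directly, summing the geometric means, and checked against the 2M bound:

```python
def expected_member_tests(n, m):
    # The i-th new element (i = 0..m-1) is found after an expected
    # n / (n - i) random draws, so the total is the sum of these means.
    return sum(n / (n - i) for i in range(m))
```

For n = 100 and m = 50, the expected total is about 100 · ln 2 ≈ 69.3, comfortably below the 2M = 100 bound.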
