How to solve this problem without dynamic programming - data-structures

The traders of Hackerland have a stock prediction model that they want to backtest. The predicted profit earned every month by a certain portfolio is given by an array pnl of n integers, where a negative integer denotes a loss.
The traders decide to iterate through the months from 1 to n, and if the pnl is positive, they add it to their profit and if the pnl is negative, they can either subtract it or skip it. The net profit must always be greater than or equal to 0.
Given the array pnl and an integer k, find the maximum total profit that can be earned by the portfolio after subtracting at least k losses. If it is not possible to take at least k losses, report -1 as the answer.
I thought of a greedy approach: first add all positive elements, then take the k largest negative elements (those closest to zero); if the total becomes negative, return -1, otherwise return the value.
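The greedy above ignores the rule that the net profit must stay non-negative after every month, so the order of the months matters. With pnl = [-1, 5] and k = 1 it returns 4, but taking the month-1 loss drops the net profit to -1, so the answer is -1.

One DP-free piece that does hold is the feasibility test behind the -1 case. The technique is not from the question. It is a standard exchange-argument greedy: take every loss, and whenever the running total drops below 0, undo the most expensive loss taken so far. This gives the largest number of losses that can ever be taken, so the answer is -1 exactly when that number is below k. It does not by itself compute the maximum profit. `max_losses_taken` is an illustrative name.

```python
import heapq


def max_losses_taken(pnl):
    """Largest number of losses that can be taken while the running net
    profit never drops below 0. Non-negative months are always added."""
    total = 0
    taken = []  # min-heap of the losses currently taken (most negative on top)
    for x in pnl:
        total += x
        if x < 0:
            heapq.heappush(taken, x)
            if total < 0:
                # Undo the most expensive loss taken so far; it is at least
                # as large as x, so total is back to >= 0.
                total -= heapq.heappop(taken)
    return len(taken)


# k losses are feasible only if max_losses_taken(pnl) >= k; otherwise report -1.
print(max_losses_taken([-1, 5]))  # 0, so k = 1 gives -1
```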

Related

Given numbers a1,a2,...,an whose sum is positive. Find the minimal number s.t. the sum of numbers less than or equal to it is positive, in linear time

Problem: Given n distinct numbers a1,a2,...,an whose sum is positive, show how to find the minimal number such that the sum of all numbers less than or equal to it is positive, in O(n) time.
Note: the numbers aren't necessarily integers, and they aren't necessarily given in sorted order.
Some explanation of the problem: suppose the array were sorted, [x,x,x,y,x,...,x,x,x], and y is the first number such that summing all the numbers up to and including it gives a positive (or zero) sum, while summing fewer numbers gives a negative sum. Then y is returned. (The x here is just a placeholder; all numbers in the array are distinct.)
Attempt:
Define low, high = 0, n. These serve both as the boundaries for summing the elements between them and as the boundaries for choosing the pivot.
Choose a pivot at random and partition the array (for example, with Lomuto's partition scheme), and denote the pivot's index by p'. The partitioning costs O(n). Sum the numbers from low to p' and call this sum s.
If s<0, set low=p' and repeat: choose a random pivot (whose index is again denoted p'), partition between low and high, sum the numbers between these two boundaries, and set s := s + the new sum.
Else, define high=p' and repeat the process described in the 'If' condition above.
The process will end when low = high.
Besides a few logical gaps in my attempt, its overall complexity is O(n) on average, not in the worst case.
Do you have any ideas on how to solve the problem in O(n) worst-case time? I thought of adapting the 'Median of Medians' algorithm, but I don't know how.
Thanks in advance for any help!
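One way to get a worst-case O(n) bound combines the attempt above with median-of-medians selection. This is a sketch, not taken from the thread.

In sorted order, the prefix sums first fall (through the negative numbers) and then rise. So the predicate "the sum of everything <= y is positive" is false up to some point and true after it. That makes it binary-searchable:
- select the median of the remaining candidates in worst-case linear time;
- test it using the running sum of everything known to lie below the candidates;
- keep only the half that can still contain the answer.

The candidate set halves each round, so the total work is O(n + n/2 + ...) = O(n). The names `select` and `min_positive_prefix` are illustrative. The sketch uses a strict `> 0` as in the title; use `>= 0` for the zero-inclusive reading.

```python
def select(xs, k):
    """k-th smallest element (0-based) of xs, worst-case O(len(xs))
    via median of medians."""
    if len(xs) <= 5:
        return sorted(xs)[k]
    groups = [sorted(xs[i:i + 5]) for i in range(0, len(xs), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in xs if x < pivot]
    highs = [x for x in xs if x > pivot]
    equal = len(xs) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + equal:
        return pivot
    return select(highs, k - len(lows) - equal)


def min_positive_prefix(a):
    """Smallest y in a with sum(x for x in a if x <= y) > 0.
    Assumes a is non-empty, its elements are distinct, and sum(a) > 0."""
    below = 0     # sum of every element known to be smaller than all candidates
    best = None   # smallest element confirmed to satisfy the predicate
    cand = list(a)
    while cand:
        m = select(cand, (len(cand) - 1) // 2)  # lower median of the candidates
        lows = [x for x in cand if x < m]
        highs = [x for x in cand if x > m]
        s = below + sum(lows) + m  # sum of everything <= m in the whole array
        if s > 0:
            best = m      # m works; a smaller answer can only be in lows
            cand = lows
        else:
            below = s     # nothing <= m works; continue above m
            cand = highs
    return best


print(min_positive_prefix([-3, 1, 5, -1, 2]))  # 5 (sorted prefix sums: -3, -4, -3, -1, 4)
```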

M Blossoming Groups of length at least K

I was asked this question in an interview recently, I still cannot come up with a solution.
There is a garden with N slots, each holding one flower. The N flowers bloom one by one over N days: on each day exactly one flower blooms, and it stays in bloom from then on.
You are given an array flowers consisting of the numbers 1 to N. Each number in the array is the position of the flower that opens on that day.
For example, flowers[i] = x means that the unique flower that blooms at day i will be at position x, where i and x will be in the range from 1 to N.
You are also given integers K and M. Output the latest day on which there exist M blooming groups of length at least K.
One possible approach
Transform the array:
flowers[i] = x becomes flowers2[x] = i.
Now iterate over day = 1..N.
The flower at position i is blooming if flowers2[i] <= day. All you have to do is count how many consecutive groups of size >= K you have, which gives an O(n^2) solution.
This can be optimized to O(n log n) by realizing that as day grows from 1 to N, groups become bigger in size and fewer in number. So do something like binary search. Start with day = N/2. Let's say the minimum group size for this day is k and the number of groups is m. Now if k<K and m>M, choose day as the mid of the upper range (N/2, N); otherwise choose the mid of the lower range (0, N/2). Repeat until you find the answer. This only works if an answer always exists.
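A direct sketch of the O(n^2) approach above, built on the flowers2 inverse array. It assumes 1-based positions and days as in the problem. It reads "there exist M groups" as "at least M groups"; if exactly M is intended, change the final comparison to ==. Because it checks every day, it does not depend on the binary-search assumption. `latest_day` is an illustrative name.

```python
def latest_day(flowers, K, M):
    """flowers[i] = x means position x (1-based) blooms on day i + 1.
    Returns the latest day with at least M blooming groups of length >= K,
    or -1 if there is no such day. O(N^2) overall."""
    n = len(flowers)
    flowers2 = [0] * (n + 1)  # flowers2[x] = day on which position x blooms
    for day, x in enumerate(flowers, start=1):
        flowers2[x] = day
    answer = -1
    for day in range(1, n + 1):
        groups = run = 0
        for pos in range(1, n + 1):
            if flowers2[pos] <= day:
                run += 1          # extend the current blooming group
            else:
                if run >= K:
                    groups += 1   # a group of length >= K just ended
                run = 0
        if run >= K:
            groups += 1           # group touching the right end
        if groups >= M:
            answer = day
    return answer
```

Consider flowers = [3, 1, 5, 4, 2] with K = 1 and M = 2. Day 4 has the groups {1} and {3, 4, 5}, while day 5 has one group, so `latest_day` returns 4.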

length arrangement with probability and cost

Consider a set of lengths, each associated with a probability, e.g.
X={a1=(100,1/4),a2=(500,1/4),a3=(200,1/2)}
Obviously, the sum of all the probabilities = 1.
Arrange the lengths together on a line one after the other from a starting point.
For example: {a2,a1,a3} in that order from start to finish.
Define the cost of an element a_i as the total length from the starting point to the end of this element, multiplied by its probability.
So from the previous arrangement:
cost(a2) = (500)*(1/4)
cost(a1) = (500+100)*(1/4)
cost(a3) = (500+100+200)*(1/2)
Define the total cost as the sum of all costs. e.g. cost(X) = cost(a2) + cost(a1) + cost(a3). Give an algorithm that finds an arrangement that minimizes cost(X)
My thoughts:
This looks like a greedy algorithm, since the last element in the arrangement always has the same sum multiplied by its probability, but I can't think of a heuristic that accomplishes this. It goes without saying that sorting by probability or length alone will not work.
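This is not from the original thread. It is the classic single-machine problem of minimizing total weighted completion time, with lengths as processing times and probabilities as weights.

An exchange argument settles it. Suppose a and b are adjacent and everything before them has total length T:
- putting a first costs (T + l_a)p_a + (T + l_a + l_b)p_b;
- putting b first costs (T + l_b)p_b + (T + l_a + l_b)p_a;
- the difference is l_a p_b - l_b p_a.

So a should come first whenever l_a / p_a <= l_b / p_b, and sorting by the ratio length/probability (Smith's rule) gives an optimal arrangement. A minimal sketch follows, assuming items are (length, probability) pairs with positive probabilities; `total_cost` and `best_order` are illustrative names.

```python
from fractions import Fraction
from itertools import accumulate


def total_cost(order):
    """Sum of (distance from the start to the element's end) * probability."""
    ends = accumulate(length for length, _ in order)
    return sum(end * p for end, (_, p) in zip(ends, order))


def best_order(items):
    """Smith's rule: ascending length / probability (probabilities > 0)."""
    return sorted(items, key=lambda item: Fraction(item[0]) / item[1])


X = [(100, Fraction(1, 4)), (500, Fraction(1, 4)), (200, Fraction(1, 2))]
print(total_cost(best_order(X)))  # 375
```

For X, a1 and a3 tie at a ratio of 400, and a2 comes last with a ratio of 2000. That gives a total cost of 375, compared with 675 for the order {a2,a1,a3}.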

Dynamic programming to find minimum number of coins

I'm trying to understand part of a question I have as my HW but it really looks like Chinese...
Let's say we have coins x_1, x_2, x_3, ... x_n. x_1 = 1 always.
We want to give a certain amount of money in a minimum number of coins.
Then we use dynamic programming.
And now I don't understand this - c(i,j) = min { c(i-1,j), 1+c(i,j-x_i) }
where c(i,j) is the minimal number of coins needed to return amount j using only the coins x_1,...,x_i.
c(i,j-x_i) is the minimal number of coins needed to make the value j-x_i using only coins 1,...,i (this is the induction hypothesis; that's what the recursive formula guarantees).
Thus, 1+c(i,j-x_i) is the minimal number of coins needed to make j when we decide to use one coin of value x_i and make the remaining j-x_i optimally with coins 1,...,i.
From this, c(i,j) = min { c(i-1,j), 1+c(i,j-x_i) } is actually choosing "what is best" exhaustively:
Taking the current coin, and checking recursively the rest of the smaller problem
Deciding not to take it - and again, checking the smaller problem recursively.
Taking the minimum of the two guarantees that c(i,j) is minimal, because the choice is made exhaustively over all possibilities.
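A bottom-up version of the recurrence, as a minimal sketch. Row i of the table uses only the first i denominations, with c(i, 0) = 0 and c(0, j) infinite for j > 0 (no coins). Since x_1 = 1, every amount is reachable once row 1 is filled. Filling each row left to right lets c(i, j - x_i) reuse coin i, so a denomination can be used more than once. `min_coins` is an illustrative name.

```python
import math


def min_coins(coins, amount):
    """c[i][j] = fewest coins making j from the first i denominations,
    filled with c(i, j) = min(c(i-1, j), 1 + c(i, j - x_i))."""
    n = len(coins)
    c = [[math.inf] * (amount + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        c[i][0] = 0                      # amount 0 needs no coins
    for i in range(1, n + 1):
        x = coins[i - 1]                 # x_i in the text (list is 0-based)
        for j in range(1, amount + 1):
            c[i][j] = c[i - 1][j]        # don't use coin x_i at all
            if j >= x:
                # use one more x_i; row i is reused, so x_i may repeat
                c[i][j] = min(c[i][j], 1 + c[i][j - x])
    return c[n][amount]


print(min_coins([1, 3, 4], 6))  # 2 (3 + 3)
```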

Greatest GCD between some numbers

We've got some nonnegative numbers, and we want to find the pair with the maximum gcd. Actually, this maximum is more important than the pair itself!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean Algorithm to find the GCD of two numbers.
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;  // remainder of a divided by b
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, with indices 1 to the max input. O(1)
For each value, increment the count at every index that is a factor of the number (make sure the counts don't wrap around). O(N).
Starting at the end of the array, scan back until you find a value >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
count: 4 2 1 1 2 0 0 0 0  0  0  0  0  0  1
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
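A minimal sketch of the divisor-count array, assuming positive integers. The post doesn't say how to factorise, so this enumerates divisors by trial division up to the square root, O(sqrt(v)) per value. A value that appears twice is counted twice, so a duplicate correctly yields that value as the gcd. `max_pair_gcd_by_divisor_counts` is an illustrative name.

```python
def max_pair_gcd_by_divisor_counts(values):
    """Largest d dividing at least two of the values, i.e. the max pairwise gcd.
    Assumes at least two positive integers."""
    top = max(values)
    count = [0] * (top + 1)  # count[d] = how many values d divides
    for v in values:
        d = 1
        while d * d <= v:            # trial division up to sqrt(v)
            if v % d == 0:
                count[d] += 1
                if d != v // d:
                    count[v // d] += 1
            d += 1
    for d in range(top, 0, -1):      # scan back from the end
        if count[d] >= 2:
            return d
    return None                      # unreachable with two or more values


print(max_pair_gcd_by_divisor_counts([2, 4, 5, 15]))  # 5
```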
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends on the data structure you use. Each value has at most d(k) factors, where k is the max value and d is the divisor-counting function, which grows more slowly than any fixed power of k.
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
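A sketch of the set-based idea, assuming positive integers. It enumerates each value's divisors by trial division and leaves out the memoisation mentioned above.

Processing the divisors from largest to smallest is what makes the early exit safe. The first divisor found in the set is the largest shared one for this value, and every remaining divisor is smaller than the new best-so-far. Collecting all divisors and returning them largest first sidesteps the ordering problem, at the cost of finding all the factors. `divisors_desc` and `max_pair_gcd_with_set` are illustrative names.

```python
def divisors_desc(v):
    """All divisors of v, largest first (trial division up to sqrt(v))."""
    small, large = [], []
    d = 1
    while d * d <= v:
        if v % d == 0:
            small.append(d)
            if d != v // d:
                large.append(v // d)
        d += 1
    return large + small[::-1]


def max_pair_gcd_with_set(values):
    """Best-so-far search over a set of already-seen factors.
    Assumes at least two positive integers."""
    seen = set()  # factors (above best-so-far at the time) of earlier values
    best = 1      # best-so-far
    for v in values:
        if v <= best:
            continue              # none of v's factors can beat best
        for d in divisors_desc(v):
            if d <= best:
                break             # remaining factors are even smaller
            if d in seen:
                best = d          # d divides v and some earlier value
                break             # v's remaining factors are all < d
            seen.add(d)
    return best


print(max_pair_gcd_with_set([2, 4, 5, 15]))  # 5
```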
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
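A small sketch of the exponent-vector gcd. It assumes every input factors completely over a fixed, hypothetical prime list `PRIMES`; real code would need a prime list covering the whole input range. The vectors are padded with zeros to the full list length, so 15 becomes [0, 1, 1, 0, 0, 0] rather than {0, 1, 1}. `exponents` and `gcd_from_exponents` are illustrative names.

```python
import math

PRIMES = [2, 3, 5, 7, 11, 13]  # hypothetical fixed prime list for the example


def exponents(v, primes):
    """Exponent of each prime in v; v must be a positive integer whose
    prime factors all appear in primes."""
    if v < 1:
        raise ValueError("expected a positive integer")
    exps = []
    for p in primes:
        e = 0
        while v % p == 0:
            v //= p
            e += 1
        exps.append(e)
    if v != 1:
        raise ValueError("prime factor outside the list")
    return exps


def gcd_from_exponents(ea, eb, primes):
    """Multiply each prime back in with the smaller of its two exponents."""
    return math.prod(p ** min(x, y) for p, x, y in zip(primes, ea, eb))


print(gcd_from_exponents(exponents(5, PRIMES), exponents(15, PRIMES), PRIMES))  # 5
```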
The optimisations I can think of are:
1) Start with the two biggest numbers, since they are likely to have the most prime factors, and thus likely to have the most shared prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20, 3^13, 5^9, 7^2*11^4, 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g. the numbers in the array are within a given range, say 1-1e7, it is doable in O(N log N) / O(MAX * log MAX), where MAX is the maximum possible value in A.
This is inspired by the sieve algorithm; I came across it in a HackerRank challenge (there it is done for two arrays). Check their editorial.
find min(A) and max(A) - O(N)
create a binary mask, to mark which elements of A appear in the given range, for O(1) lookup; O(N) to build; O(MAX_RANGE) storage.
for every number a from 1 to max(A) (the answer can be smaller than min(A), e.g. gcd(4, 6) = 2):
for aa = a; aa <= max(A); aa += a:
if aa is in A, increment a counter for a; if the counter is >= 2 (i.e., you have two numbers divisible by a), update max_gcd = max(max_gcd, a);
store top two candidates for each GCD candidate.
could also ignore elements which are less than current max_gcd;
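A minimal sketch of the sieve approach, assuming positive integers. It stores a count per value instead of a binary mask, so a repeated value (whose gcd with itself is the value) is handled. It scans candidates from the largest down, so the first candidate with two multiples in the input is the answer. The inner loop touches about MAX/d entries for each d, and summing that over d is where the O(MAX log MAX) bound comes from. `max_pair_gcd_sieve` is an illustrative name.

```python
def max_pair_gcd_sieve(values):
    """Max pairwise gcd by counting, for each candidate d, the inputs that
    are multiples of d. Assumes at least two positive integers."""
    top = max(values)
    present = [0] * (top + 1)   # present[v] = how many times v occurs
    for v in values:
        present[v] += 1
    for d in range(top, 0, -1):  # largest candidate first
        multiples = 0
        for aa in range(d, top + 1, d):
            multiples += present[aa]
            if multiples >= 2:   # two inputs are divisible by d
                return d
    return None                  # unreachable with two or more values


print(max_pair_gcd_sieve([2, 4, 5, 15]))  # 5
```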
Previous answer:
Still O(N^2) -- sort the array; should eliminate some of the unnecessary comparisons;
import math

max_gcd = 1
# assuming you want pairs of distinct elements; a is a list of positive ints
a.sort()  # in place, ascending
n = len(a)
for ii in range(n - 1, -1, -1):
    if a[ii] <= max_gcd:
        break
    for jj in range(ii - 1, -1, -1):
        if a[jj] <= max_gcd:
            break
        current_gcd = math.gcd(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.
There is a solution that would take O(n):
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. In the above case, once you find the 25, first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5. Then filter out every gcd(m/a_i, a_i), i!=3, that is less than or equal to 5. In this example that removes 3 and 5. The value 15 has gcd(m/15, 15) = 15, so it needs its own check, which also gives 5.
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) g_i, and max(gcd(a_i, a_j), j=1..n, j!=i) r_i. What I claim above is that g_i = x_i*r_i, where x_i is an integer. It follows that r_i <= g_i, so with n gcd operations we get an upper bound for r_i for every i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true. The gcd of a_i and a_j is the product of the prime powers they share (each common prime raised to the smaller of its two exponents). Now, multiply a_j by another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j) or a multiple of it. This is because b*a_j contains all prime factors of a_j, plus some more prime factors contributed by b, which may also appear in the factorization of a_i. In fact, gcd(a_i, b*a_j) = gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), which holds because a_i/gcd(a_i, a_j) and a_j/gcd(a_i, a_j) are coprime. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains every gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results divides g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
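A sketch of the filtering procedure described above, assuming at least two positive integers. It visits indices by decreasing g_i and stops as soon as g_i is no larger than the best r found so far. Since r_j <= g_j for every j, nothing after that point can beat it.

Note that m is the product of all inputs, so it has roughly as many digits as all the inputs combined. The n "gcd operations" are therefore big-integer operations, and Python's arbitrary-precision ints keep this sketch exact. `max_pair_gcd_product_filter` is an illustrative name.

```python
from math import gcd, prod


def max_pair_gcd_product_filter(a):
    """Max pairwise gcd using g_i = gcd(m / a_i, a_i) as an upper bound on r_i.
    Assumes at least two positive integers."""
    m = prod(a)
    g = [gcd(m // x, x) for x in a]           # g_i, a multiple of r_i
    best = 0
    for i in sorted(range(len(a)), key=lambda idx: g[idx], reverse=True):
        if g[i] <= best:
            break                              # every remaining r_j <= g_j <= best
        r_i = max(gcd(a[i], a[j]) for j in range(len(a)) if j != i)
        best = max(best, r_i)
    return best


print(max_pair_gcd_product_filter([3, 5, 15, 25]))  # 5
```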
pseudocode
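import math

def getGcdMax(array):
    arrayUB = len(array) - 1  # upper bound: index of the last element
    if arrayUB < 1:
        raise ValueError("need at least two numbers")
    pointerA = 0
    pointerB = 1
    gcdMax = 0
    while True:  # do ... until (pointerB > arrayUB)
        gcdMax = max(gcdMax, math.gcd(array[pointerA], array[pointerB]))
        pointerB += 1
        if pointerB > arrayUB:
            pointerA += 1
            pointerB = pointerA + 1
        if pointerB > arrayUB:
            break
    return gcdMax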
