Bonetrousle | Find B distinct positive integers below K such that their sum is N or say that it is not possible. | Timeout error - algorithm

I am attempting the Bonetrousle HackerRank challenge.
The problem is the following:
Find B distinct positive integers, each at most K, whose sum is N, or say that it is not possible.
Constraints:
n, k <= 10^18
b <= 10^5
You can check whether a solution exists: one does if and only if N lies between the minimum possible sum (take the first B integers, 1..B) and the maximum possible sum (take the last B integers, K-B+1..K).
From there, I start with the minimum sum and try to reach N by assigning each element the largest possible value that does not break the constraints (no duplicates, sum == N).
Below is the code I wrote.
```python
def foo1(n,k,b):
    minSum = (b*(b+1))//2
    maxSum = (b)*(k-b+1+k)//2
    #maxSum = (k*(k+1))//2 - minSum
    #print(minSum, maxSum)
    if n>=minSum and n<=maxSum:
        minArr = [i for i in range(1,b+1)]
        minArr.reverse()
        sumA = sum(minArr)
        maxA = k
        for i in range(len(minArr)):
            tmp = minArr[i]
            minArr[i] = maxA
            sumA = sumA-tmp+minArr[i]
            while sumA > n:
                sumA -=1
                minArr[i] -= 1
            maxA = minArr[i]-1
            """
            while sumA+1 <= n and minArr[i]+1 <= k and minArr[i]+1 != maxA:
                #print(minArr, maxA)
                minArr[i]+=1
                sumA +=1
                maxA = minArr[i]
            if sumA == n:
                break
            """
    else:
        return [-1]
    return minArr
```
The code outputs correct solutions, but it times out on HackerRank for 4 test cases (sample n, k, b: 19999651, 20000000, 6324).
It gives an answer within 3 seconds on my machine for the same test case.
Initially I thought the issue was with the commented-out code, since I was trying to increment each array element one by one until the sum was reached. I modified the code to assign each element the maximum possible value and then decrement it if it breaks the constraints, but apparently that did not help much.
Any suggestion on modifying the code to get it to pass the timing constraint or a much faster algorithm?

First, find the B largest consecutive integers whose sum is <= N. The problem is impossible if this sequence starts at an integer < 1, or if its largest integer after the adjustment below is > K.
The sum of B integers starting at x is B*(2x+B-1)/2, so just solve for x directly: x = floor((N - B*(B-1)/2) / B).
Obviously, if you were to add one to each of the integers in the sequence starting at x, you'd get the next B consecutive integers, and their sum is > N, so you never need to increment all of them. Just add 1 to each of the highest N - sum integers in the sequence (fewer than B of them) to make the sum come out right.
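A minimal Python sketch of this closed-form approach, assuming b >= 1. The name `bonetrousle` and the `[-1]` failure value mirror the question's `foo1` and are not a confirmed HackerRank signature. It does O(B) work no matter how large N and K are, so there is no per-unit loop like the one that times out:

```python
def bonetrousle(n, k, b):
    """Return b distinct integers in 1..k that sum to n, or [-1] if impossible.

    Assumes b >= 1. Takes the b largest consecutive integers x..x+b-1 whose
    sum is <= n, then adds 1 to the top (n - sum) of them.
    """
    base = b * (b - 1) // 2            # sum of x..x+b-1 is b*x + base
    if n < b + base:                   # n is below 1 + 2 + ... + b, so x would be < 1
        return [-1]
    x = (n - base) // b                # largest x with b*x + base <= n
    r = n - (b * x + base)             # 0 <= r < b units still to hand out
    # Keep x..x+b-r-1 unchanged; the top r values become x+b-r+1..x+b.
    result = list(range(x, x + b - r)) + list(range(x + b - r + 1, x + b + 1))
    if result[-1] > k:                 # no valid set can have a smaller maximum
        return [-1]
    return result

print(bonetrousle(12, 8, 3))  # [3, 4, 5]
```

For the question's slow case (n = 19999651, k = 20000000, b = 6324) this returns 1..6323 followed by 6325 directly.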

Related

Find scalar interval containing maximum elements from population A and zero elements from population B

Given two large sets A and B of scalar (floating point) values, what algorithm would you use to find the (scalar) range [x0,x1] containing zero elements from B and the maximum number of elements from A?
Is sorting complexity (O(n log n)) unavoidable?
Create a single list with all values, where each value is marked with two counts: one count that relates to set A, and another that relates to set B. Initially these counts are 1 and 0, when the value comes from set A, and 0 and 1 when the value comes from set B. So entries in this list could be tuples (value, countA, countB). This operation is O(n).
Sort these tuples. O(n log n)
Merge tuples with duplicate values into one tuple, and accumulate the values in the corresponding counters, so that the tuple tells us how many times the value occurs in set A and how many times in set B. O(n)
Traverse this list in sorted order and maintain the largest sum of counts for countA of a series of adjacent tuples where countB is always 0, and the minimum and maximum value of that range. O(n)
The sorting is the determining factor of the time complexity: O(n log n).
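A Python sketch of the four steps above, assuming A and B are plain lists of numbers; `widest_a_run` is a hypothetical name. It returns the smallest and largest A value of the best run plus how many A elements (duplicates included) the run covers:

```python
def widest_a_run(A, B):
    """Return (x0, x1, count): [x0, x1] holds no value of B and `count` values
    of A (duplicates counted), with count as large as possible.
    Returns (None, None, 0) if no A value avoids B."""
    # Steps 1-2: tag each value with (countA, countB) and sort: O(n log n).
    tagged = sorted([(v, 1, 0) for v in A] + [(v, 0, 1) for v in B])
    # Step 3: merge equal values, accumulating both counters: O(n).
    merged = []
    for v, ca, cb in tagged:
        if merged and merged[-1][0] == v:
            _, pa, pb = merged[-1]
            merged[-1] = (v, pa + ca, pb + cb)
        else:
            merged.append((v, ca, cb))
    # Step 4: scan runs of adjacent tuples whose countB is 0: O(n).
    best = (None, None, 0)
    run_start, run_count = None, 0
    for v, ca, cb in merged:
        if cb > 0:                     # a value that occurs in B ends the current run
            run_start, run_count = None, 0
            continue
        if run_start is None:
            run_start = v
        run_count += ca
        if run_count > best[2]:
            best = (run_start, v, run_count)
    return best
```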
Sort both A and B in O(|A| log |A| + |B| log |B|). Then apply the following algorithm, which has complexity O(|A| + |B|):
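```python
import math

def find_interval(A, B):
    """A and B must be sorted. Returns (x0, x1), or None if no A value avoids B."""
    # Sentinels make the gaps before B[0] and after B[-1] count as well.
    bounds = [-math.inf] + list(B) + [math.inf]
    i = j = k = 0
    best_interval = (0, 0)  # half-open index range [start, end) into A
    while i < len(bounds) - 1:
        lo = bounds[i]
        hi = bounds[i+1]
        j = k  # We can skip ahead from last iteration.
        while j < len(A) and A[j] <= lo:
            j += 1
        k = j  # We can skip ahead from the above loop.
        while k < len(A) and A[k] < hi:
            k += 1
        if k - j > best_interval[1] - best_interval[0]:
            best_interval = (j, k)
        i += 1
    if best_interval[0] == best_interval[1]:
        return None
    x0 = A[best_interval[0]]
    x1 = A[best_interval[1]-1]
    return x0, x1
```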
It may look quadratic at first glance, but note that we never decrease j or k: it really is just a linear scan with three pointers.

How to make the run time of the program Θ(n)

The requirement is that the input is a set of integers ranging from -5 to 5, and the result should be the longest contiguous run of those integers whose total is greater than or equal to zero.
I can only come up with the following:
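The input is an array values[0 .. n-1]:
```python
def longest_nonnegative_run_quadratic(values):
    # Theta(n^2): try every start index i and extend the run to the right.
    n = len(values)
    longest_start, longest_end = 0, -1   # empty until a run is found
    for i in range(n):
        start = i
        total = 0
        for j in range(i, n):
            total += values[j]           # running sum of values[i..j]
            if total >= 0:
                end = j
                if end - start > longest_end - longest_start:
                    longest_start = start
                    longest_end = end
    return longest_start, longest_end
```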
However, this is Θ(n^2). I would like to know what ways there are to make this loop Θ(n).
Thank you
Since
a - b == (a + n) - (b + n)
for any a, b or n, we can apply this to the array of numbers, keeping a running total of all elements from 0 to the current one. From the above equation, the sum of any subarray from index a+1 to b is sum(elements 0..b) - sum(elements 0..a).
By keeping track of the minima and maxima of these running totals, and where they occur, you can find the longest subarray whose sum is >= 0 in O(n) time.
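A Python sketch of the linear-time idea, assuming "longest subset" means the longest contiguous subarray. It uses the running totals (prefix sums) together with the standard prefix-minimum / suffix-maximum two-pointer scan for "widest i < j with P[i] <= P[j]"; this is one well-known way to realise the answer's hint, not necessarily exactly what the answerer had in mind, and the function name is hypothetical:

```python
def longest_nonnegative_subarray(values):
    """Return (s, e) such that values[s:e] is a longest contiguous run with sum >= 0.
    (0, 0), an empty run, means no non-empty run qualifies."""
    n = len(values)
    # prefix[i] = sum(values[:i]), so sum(values[i:j]) == prefix[j] - prefix[i].
    prefix = [0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
    # left_min[i] = min(prefix[0..i]); right_max[j] = max(prefix[j..n]).
    left_min = prefix[:]
    for i in range(1, n + 1):
        left_min[i] = min(left_min[i - 1], prefix[i])
    right_max = prefix[:]
    for j in range(n - 1, -1, -1):
        right_max[j] = max(right_max[j + 1], prefix[j])
    # Two pointers: widest i < j with prefix[i] <= prefix[j]; both only move right.
    best_s, best_e = 0, 0
    i = j = 0
    while i <= n and j <= n:
        if left_min[i] <= right_max[j]:
            if j - i > best_e - best_s:
                best_s, best_e = i, j
            j += 1
        else:
            i += 1
    return best_s, best_e
```

Every loop is a single left-to-right or right-to-left sweep, so the total work is O(n).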

Count number of ways to divide a number in 4 parts

Given a positive integer n, find the number of ways to divide n into four parts, i.e. to represent n as a sum of four positive integers. Here n varies from 0 to 5000.
```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoise on the full state (target, k, j)
def foo(target, k, j):
    """Count ways to write target as k parts, each >= j, in non-decreasing order."""
    if target == 0 and k == 0:
        return 1
    if target <= 0 or k <= 0:
        return 0
    count = 0
    for i in range(j, target + 1):
        count += foo(target - i, k - 1, i)
    return count

print(foo(10, 4, 1))
```
I solved this problem with the recursive solution above, but I just saw someone use the dynamic programming solution below.
f(0,0) = 1
f(target, k) = 0 if k > target or (target > 0 and k = 0)
f(target, k) = f(target-k, k) + f(target-1, k-1)
Can someone enlighten me on this solution?
That solution is correct but a little bit tricky, and I will try my best to illustrate it clearly to you.
Suppose target = 25 and we split it as 25 = 9 + 7 + 5 + 4. Draw this as a diagram of tiles with 4 columns of heights 9, 7, 5 and 4 (a Ferrers diagram).
From another perspective, you can view the same diagram as 9 rows of lengths 4, 4, 4, 4, 3, 2, 2, 1, 1.
So your solution constructs the diagram column by column, and that solution constructs it row by row.
In detail:
f(target, k) = f(target-k, k) + f(target-1, k-1)
f(target, k): target tiles remain and the current row length is k
f(target - k, k): place a full row of length k
f(target - 1, k - 1): make the rightmost column a single tile (a part equal to 1, which keeps every part a positive integer), and decrease the row length by 1.
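A bottom-up Python sketch of that recurrence, assuming the goal is the number of ways to write n as a sum of exactly `parts` positive integers with order ignored (the function name `count_partitions` is hypothetical). It fills an (n+1) x (parts+1) table, so n = 5000 is cheap:

```python
def count_partitions(n, parts=4):
    """Ways to write n as a sum of exactly `parts` positive integers, order ignored.
    Uses f(t, k) = f(t - k, k) + f(t - 1, k - 1) with f(0, 0) = 1."""
    # f[t][k]; entries with k > t, or with t > 0 and k == 0, stay 0.
    f = [[0] * (parts + 1) for _ in range(n + 1)]
    f[0][0] = 1
    for t in range(1, n + 1):
        for k in range(1, min(t, parts) + 1):
            # full row of length k, or a rightmost column that is a single tile
            f[t][k] = f[t - k][k] + f[t - 1][k - 1]
    return f[n][parts]

print(count_partitions(10))  # 9
```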
That's all.
If you still have any questions, you may leave a comment here.
Given a number n, find the number of ways to represent n as x1 + x2 + x3 + x4 with xi >= 0, where different orders count as different ways.
The answer is C(n+3, 3).
In general, x1 + x2 + ... + xn = k with xi >= 0 has C(k+n-1, n-1) solutions.
Here, k = n and n = 4.
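A quick Python check of this stars-and-bars count against brute-force enumeration, using `math.comb` (Python 3.8+); the function names are illustrative. It counts ordered tuples with zeros allowed, so it answers a different question from the four positive, unordered parts above:

```python
from math import comb

def count_nonnegative(n, parts=4):
    """Ordered solutions of x1 + ... + x_parts = n with every xi >= 0."""
    return comb(n + parts - 1, parts - 1)

def brute_four(n):
    # Choose x1, x2, x3 freely; x4 = n - x1 - x2 - x3 is then forced and >= 0.
    return sum(1 for x1 in range(n + 1)
                 for x2 in range(n + 1 - x1)
                 for x3 in range(n + 1 - x1 - x2))

print(count_nonnegative(10), brute_four(10))  # 286 286
```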

Creating a random number generator from a coin toss

Yesterday I had this interview question, which I couldn't fully answer:
Given a function f() = 0 or 1 with a perfect 1:1 distribution, create a function f(n) = 0, 1, 2, ..., n-1 each with probability 1/n
I could come up with a solution when n is a power of 2: use f() to generate the k = log_2 n bits of a binary number. But this obviously wouldn't work for, say, n = 5, since 3 bits would also generate 5, 6 and 7, which we do not want.
Does anyone know a solution?
You can build an RNG for the smallest power of two greater than or equal to n, as you described. Then whenever this algorithm generates a number larger than n-1, throw that number away and try again. This is called the method of rejection (rejection sampling).
Addition
The algorithm is:
```
Let m = 2^k >= n, where k is as small as possible.
do
    Let r = random number in 0 .. m-1 generated by k coin flips
while r >= n
return r
```
The probability that this loop stops within at most i iterations is at least 1 - (1/2)^i, because m < 2n means each attempt is rejected with probability less than 1/2. This goes to 1 very rapidly: the loop is still running after 30 iterations with probability less than one in a billion.
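A minimal Python sketch of the basic algorithm. The fair coin f() is modelled by a hypothetical `coin_flip` helper built on `random.getrandbits(1)`; everything else uses only that helper:

```python
import random

def coin_flip():
    """Hypothetical stand-in for the fair f() from the question."""
    return random.getrandbits(1)

def uniform(n):
    """Return 0..n-1, each with probability 1/n, using only coin_flip()."""
    k = (n - 1).bit_length()            # smallest k with 2**k >= n (k = 0 when n = 1)
    while True:
        r = 0
        for _ in range(k):
            r = (r << 1) | coin_flip()  # append one random bit: r is uniform on 0..2**k - 1
        if r < n:
            return r                    # accept
        # otherwise reject and flip k fresh coins
```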
You can decrease the expected number of iterations with a slightly modified algorithm:
```
Choose p >= 1
Let m = 2^k >= p n, where k is as small as possible.
do
    Let r = random number in 0 .. m-1 generated by k coin flips
while r >= p n
return floor(r / p)
```
For example if we are trying to generate 0 .. 4 (n = 5) with the simpler algorithm, we would reject 5, 6 and 7, which is 3/8 of the results. With p = 3 (for example), pn = 15, we'd have m = 16 and would reject only 15, or 1/16 of the results. The price is needing four coin flips rather than 3 and a division op. You can continue to increase p and add coin flips to decrease rejections as far as you wish.
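The same idea with the p multiplier, again sketched in Python with a hypothetical `coin_flip` stand-in. `rejection_rate` is an illustrative helper that reproduces the 3/8 versus 1/16 comparison above:

```python
import random

def coin_flip():
    """Hypothetical stand-in for the fair f() from the question."""
    return random.getrandbits(1)

def uniform_scaled(n, p=1):
    """Return 0..n-1 uniformly; accepting r < p*n lowers the rejection rate."""
    target = p * n
    k = (target - 1).bit_length()       # smallest k with 2**k >= p*n
    while True:
        r = 0
        for _ in range(k):
            r = (r << 1) | coin_flip()
        if r < target:
            return r // p               # each output value owns exactly p accepted r's

def rejection_rate(n, p=1):
    """Fraction of k-flip draws that get thrown away."""
    target = p * n
    m = 1 << (target - 1).bit_length()
    return (m - target) / m

print(rejection_rate(5), rejection_rate(5, 3))  # 0.375 0.0625
```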
Another interesting solution can be derived through a Markov Chain Monte Carlo technique, the Metropolis-Hastings algorithm. This would be significantly more efficient if a large number of samples were required but it would only approach the uniform distribution in the limit.
```
initialize: x[0] arbitrarily in 0 .. n-1
for i = 1, 2, ..., N
    if (f() == 1) x[i] = (x[i-1] + 1) % n
    else          x[i] = (x[i-1] - 1 + n) % n
```
For large N, the frequencies of the values in x approach the uniform distribution on 0 .. n-1, although consecutive samples are strongly correlated. Additionally, by adding an accept/reject step we can simulate from an arbitrary distribution, but you would need to simulate uniform random numbers on [0,1] as a sub-procedure.
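A Python sketch of this random walk, assuming a hypothetical `coin_flip` helper (backed by `random.getrandbits`) plays the role of f(). It illustrates that the long-run frequencies approach 1/n, while consecutive samples always differ by exactly 1 modulo n, so they are far from independent:

```python
import random

def coin_flip():
    """Hypothetical stand-in for the fair f() from the question."""
    return random.getrandbits(1)

def random_walk_samples(n, count, start=0):
    """Symmetric random walk on the cycle 0..n-1, one coin flip per step."""
    x = start
    samples = []
    for _ in range(count):
        # Python's % is non-negative for positive n, so (x - 1) % n wraps 0 to n-1.
        x = (x + 1) % n if coin_flip() else (x - 1) % n
        samples.append(x)
    return samples
```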
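```python
import random
from math import floor

def coin_flip():
    # Hypothetical stand-in for the fair coin f() from the question.
    return random.getrandbits(1)

def gen(a, b):
    """Return an integer in [a, b) (integers a < b), each with probability 1/(b - a).

    Implicitly draws a uniform real in [a, b) one bit at a time by halving the
    interval, and stops once every point left in it has the same floor.
    """
    min_possible = float(a)   # floats so that .is_integer() is available
    max_possible = float(b)
    while True:
        floor_min_possible = floor(min_possible)
        floor_max_possible = floor(max_possible)
        if max_possible.is_integer():
            floor_max_possible -= 1   # the interval is half-open, so its upper end is excluded
        if floor_max_possible == floor_min_possible:
            return floor_max_possible
        mid = (min_possible + max_possible)/2
        if coin_flip():
            min_possible = mid
        else:
            max_possible = mid
```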
My #RandomNumberGenerator #RNG
With any f(x) that gives rand ints from 1 to x, we can get rand ints from 1 to k, for any k:
get ints p & q, so p^q is as small as possible, while p > 1 is a factor of x, & p^q >= k;
```
Lbl A
i = 0 & s = 1; while i < q {
    s += ((f(x) - 1) mod p) * p^i;
    i++;
}
if s > k, goto A, else return s
```
//** about notation/terms:
rand = random
int = integer
mod is (from) modulo arithmetic
Lbl is a “Label”, from the BASIC language, & marks a position in the code that execution can jump to. After the while loop, if s > k, then “goto A” means return to the point in the code where it says “Lbl A”, & resume. If you return to Lbl A & process the code again, it resets i to 0 & s to 1.
i is an iterator for powers of p, & s is a sum.
"s+= foo" means "let s now equal what it used to be + foo".
"i++" means "let i now equal what it used to be + 1".
f(x) returns random integers from 1 to x. **//
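A Python translation of the pseudocode above, for illustration only: `f` is a hypothetical stand-in built on `random.randint`, `choose_base` is a hypothetical helper that brute-forces p and q, and each digit is taken as (f(x) - 1) mod p so it is uniform on 0..p-1. It assumes x >= 2 and k >= 1:

```python
import random

def f(x):
    """Hypothetical stand-in for the given generator: uniform integer in 1..x."""
    return random.randint(1, x)

def choose_base(x, k):
    """Hypothetical helper: factor p > 1 of x and q >= 0 with p**q >= k minimal."""
    best = None
    for p in range(2, x + 1):
        if x % p:
            continue                      # p must divide x so (f(x) - 1) % p is uniform
        q, power = 0, 1
        while power < k:
            power *= p
            q += 1
        if best is None or power < best[0]:
            best = (power, p, q)
    return best[1], best[2]

def rand_1_to_k(x, k):
    """Uniform integer in 1..k built only from calls to f(x)."""
    p, q = choose_base(x, k)
    while True:                           # "Lbl A"
        s = 1
        for i in range(q):
            s += ((f(x) - 1) % p) * p ** i    # one uniform base-p digit per call
        if s <= k:                        # s is uniform on 1..p**q
            return s                      # otherwise "goto A" and start over
```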
I figured out/invented/solved it on my own, around 2008. The method is discussed as common knowledge here. Does anyone know since when the random number generator rejection method has been common knowledge? RSVP.

Find pairs in an array such that a%b = k , where k is a given integer

Here is an interesting programming puzzle I came across. Given an array of positive integers and a number K, we need to find pairs (a, b) from the array such that a % b = K.
I have a naive O(n^2) solution that checks all pairs for a % b = K. It works but is inefficient. We can certainly do better than this, can't we? Are there any efficient algorithms for this? Oh, and it's NOT homework.
Sort your array and binary search or keep a hash table with the count of each value in your array.
For a number x, we can find the largest y such that x mod y = K as y = x - K. Binary search for this y or look it up in your hash and increment your count accordingly.
Now, this isn't necessarily the only value that will work. For example, 8 mod 6 = 8 mod 3 = 2. We have:
$$x \bmod y = K \implies x = qy + K \implies y = \frac{x - K}{q}, \qquad q = 1, 2, 3, \ldots$$
so q = 1 gives y = x - K, q = 2 gives y = (x - K)/2, and so on.
This means you also have to test every divisor of x - K, keeping only those greater than K (a remainder of K requires y > K). You can find the divisors in O(sqrt(x)), giving a total complexity of O(n log n sqrt(max_value)) with binary search and O(n sqrt(max_value)) with a hash table (recommended, especially if your numbers aren't very large).
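A Python sketch of the hash-table variant, assuming the goal is to count ordered pairs of distinct positions (i, j) with nums[i] % nums[j] == K; `count_pairs_mod` is a hypothetical name. Two edge cases are handled explicitly: a value equal to K pairs with every b > K, and when K == 0 a value must not be paired with itself:

```python
from collections import Counter
from math import isqrt

def count_pairs_mod(nums, K):
    """Count ordered index pairs (i, j), i != j, with nums[i] % nums[j] == K.
    Uses: a % b == K  iff  b > K and b divides a - K (for a > K)."""
    count = Counter(nums)                 # value -> multiplicity (the hash table)
    greater_than_k = sum(c for v, c in count.items() if v > K)
    total = 0
    for a in nums:
        if a < K:
            continue                      # a % b <= a < K for every b
        if a == K:
            total += greater_than_k       # a % b == a == K for every b > K
            continue
        m, found = a - K, 0
        for d in range(1, isqrt(m) + 1):  # enumerate divisors in O(sqrt(m))
            if m % d == 0:
                for b in {d, m // d}:     # set avoids counting a square root twice
                    if b > K:
                        found += count[b]
        if K == 0:
            found -= 1                    # b = a divides a; do not pair a with itself
        total += found
    return total
```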
Treat the problem as having two separate arrays as input: one for the a numbers in a % b = K and one for the b numbers. I am going to assume that everything is >= 0.
First of all, you can discard any b <= K.
Now think of every number b as generating a sequence K, K + b, K + 2b, K + 3b, ... You can record this using a pair of numbers (pos, b), where pos is incremented by b at each stage. Start with pos = K.
Hold these sequences in a priority queue, so you can find the smallest pos value at any given time. Sort the array of a numbers - in fact you could do this ahead of time and discard any duplicates.
```
For each a number (in increasing order)
    While the smallest pos in the priority queue is <= a
        Add the smallest multiple of b to it that makes it >= a
        If it is == a, you have a match; add b once more so that pos > a
        Update the stored value of pos for that sequence, re-ordering the priority queue
```
At worst, you end up comparing every number with every other number, which is the same as the simple solution, but with priority queue and sorting overhead. However, large values of b may remain unexamined in the priority queue while several a numbers pass through, in which case this does better - and if there are a lot of numbers to process and they are all different, some of them must be large.
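A Python sketch of this priority-queue scan using `heapq`, under the assumptions that a and b come from two separate lists, duplicate values are dropped, and the result is the list of matching (a, b) value pairs; `pq_matches` is a hypothetical name. The jump to the next term of each sequence uses a ceiling division rather than repeated addition:

```python
import heapq

def pq_matches(a_values, b_values, K):
    """Return the (a, b) value pairs with a % b == K."""
    # Each b > K generates K, K + b, K + 2b, ...; heap entries are (pos, b).
    heap = [(K, b) for b in set(b_values) if b > K]
    heapq.heapify(heap)
    matches = []
    for a in sorted(set(a_values)):
        while heap and heap[0][0] <= a:
            pos, b = heap[0]
            pos += -(-(a - pos) // b) * b     # smallest K + m*b that is >= a
            if pos == a:
                matches.append((a, b))
                pos += b                      # move past a so this loop terminates
            heapq.heapreplace(heap, (pos, b)) # pop the old entry, push the updated one
    return matches
```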
This answer mentions the main points of an algorithm (called DL because it uses “divisor lists”) and gives details via a program, called amodb.py.
Let B be the input array, containing N positive integers. Without much loss of generality, suppose B[i] > K for all i and that B is in ascending order. (Note that x%B[i] < K if B[i] < K; and where B[i] = K, one can report pairs (B[i], B[j]) for all j>i. If B is not sorted initially, charge a cost of O(N log N) to sort it.)
In algorithm DL and program amodb.py, A is an array with K pre-subtracted from the input array elements. Ie, A[i] = B[i] - K. Note that if a%b == K, then for some j we have a = b*j + K or a-K = b*j. That is, a%b == K iff a-K is a multiple of b. Moreover, if a-K = b*j and p is any factor of b, then p is a factor of a-K.
Let the prime numbers from 2 to 97 be called “small factors”. When N numbers are uniformly randomly selected from some interval [X,Y], on the order of N/ln(Y) of the numbers will have no small factors; a similar number will have a greatest small factor of 2; and declining proportions will have successively larger greatest small factors. For example, on the average about N/97 will be divisible by 97, about N/89-N/(89*97) by 89 but not 97, etc. Generally, when members of B are random, lists of members with certain greatest small factors or with no small factors are sub-O(N/ln(Y)) in length.
Given a list Bd containing members of B divisible by largest small factor p, DL tests each element of Bd against elements of list Ad, those elements of A divisible by p. But given a list Bp for elements of B without small factors, DL tests each of Bp's elements against all elements of A. Example: If N=25, p=13, Bd=[18967, 23231], and Ad=[12779, 162383], then DL tests if any of 12779%18967, 162383%18967, 12779%23231, 162383%23231 are zero. Note that it is possible to cut the number of tests in half in this example (and many others) by noticing 12779<18967, but amodb.py does not include that optimization.
DL makes J different lists for J different factors; in one version of amodb.py, J=25 and the factor set is primes less than 100. A larger value of J would increase the O(N*J) time to initialize divisor lists, but would slightly decrease the O(N*len(Bp)) time to process list Bp against elements of A. See results below. Time to process other lists is O((N/logY)*(N/logY)*J), which is in sharp contrast to the O(n*sqrt(Y)) complexity for a previous answer's method.
Shown next is output from two program runs. In each set, the first Found line is from a naïve O(N*N) test, and the second is from DL. (Note, both DL and the naïve method would run faster if too-small A values were progressively removed.) The time ratio in the last line of the first test shows a disappointingly low speedup ratio of 3.9 for DL vs naïve method. For that run, factors included only the 25 primes less than 100. For the second run, with better speedup of ~ 4.4, factors included numbers 2 through 13 and primes up to 100.
```
$ python amodb.py
N: 10000 K: 59685 X: 100000 Y: 1000000
Found 208 matches in 21.854 seconds
Found 208 matches in 5.598 seconds
21.854 / 5.598 = 3.904
$ python amodb.py
N: 10000 K: 97881 X: 100000 Y: 1000000
Found 207 matches in 21.234 seconds
Found 207 matches in 4.851 seconds
21.234 / 4.851 = 4.377
```
Program amodb.py:
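```python
import random, time

# Trial divisors ("small factors"): every integer 2..13, then the primes up to 97.
factors = [2,3,4,5,6,7,8,9,10,11,12,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
X, N = 100000, 10000
Y, K = 10*X, random.randint(X//2, X)
print("N: ", N, " K: ", K, "X: ", X, " Y: ", Y)
B = sorted([random.randint(X, Y) for i in range(N)])
NP = len(factors); NP1 = NP+1
# Az[j]: a-values divisible by factors[j].
# Bz[j]: b-values whose largest divisor among factors is factors[j];
# the extra list Bz[NP] holds b-values divisible by none of them.
A, Az, Bz = [], [[] for i in range(NP1)], [[] for i in range(NP1)]
t0 = time.time()
for b in B:
    a, bj = b-K, -1
    A.append(a)  # Add a to A
    for j, p in enumerate(factors):
        if a % p == 0:
            Az[j].append(a)
        if b % p == 0:
            bj = j
    Bz[bj].append(b)  # bj == -1 selects the extra list Bz[NP]
Bp = Bz.pop()  # Get not-factored B-values list into Bp
di = time.time() - t0; t0 = time.time()

# Naive O(N*N) check of every (a, b) pair.
cq = 0
for a in A:
    for b in B:
        if a % b == 0:
            cq += 1
dq = round(time.time() - t0, 3); t0 = time.time()

# DL: if b divides a, then b's largest small factor also divides a,
# so b only needs testing against the a-values in that factor's list.
c = 0
for i, Bd in enumerate(Bz):
    Ad = Az[i]
    for b in Bd:
        for ak in Ad:
            if ak % b == 0:
                c += 1
# b-values with no small factor are tested against all of A.
for b in Bp:
    for ak in A:
        if ak % b == 0:
            c += 1
dr = round(di + time.time() - t0, 3)  # include the list-building time
print("Found", cq, " matches in", dq, "seconds")
print("Found", c, " matches in", dr, "seconds")
print(dq, "/", dr, "=", round(dq/dr, 3))
```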
