So I've got a bit of a stumper for you. I need to write a function that takes a number (call this K) and outputs every group of n numbers whose sum equals K.
For instance, if I give this function (100,3) it will output [1,2,97], [1,3,96], [1,4,95]... [97,1,2]
I have the function worked out for three numbers:

k = 100
r = []
0.upto(k/2) do |a|
  (a+1).upto(k/2) do |b|
    c = k - (a + b)
    r << [a, b, c]
  end
end
How would I write this function so that it takes any number n of parts?
This probably isn't the best solution in the world (the memory required grows as O(K^3)), but it's a solution. I welcome suggestions for improvement.
You might be interested in reading about integer partitions, which is what we're counting here.
You're looking for a function f(k,n) that counts the number of ways to partition a number k into exactly n parts. Part of the difficulty is that it's hard to tell when you've counted a partition twice.
I'll solve this problem by using another function g(k,n,s) that counts the number of ways to partition a number k into n parts where the maximum value allowed is s. So for example, we don't count the partitions (90,8,2) or (64,20,16) in g(100,3,60) since they use values greater than s=60.
g(k,n,s) = f(k,n) when s>=k (i.e. we don't place a maximum on the values allowed in the partition).
A few facts about g(k,n,s):
k==n implies g=1 since the only way to partition k into k parts is by using all 1s (hence s is irrelevant since we use the lowest number possible)
n>k implies g=0 since we can't partition k into more than k parts
s==1 implies g=1 if k==n and g=0 otherwise, since placing a maximum value of 1 only allows a partition of k into k parts (all ones)
n==1 implies g=1 if s>=k and g=0 otherwise, since the only partition of k into n=1 parts requires that we use k itself in the partition
s < RoundUp(k/n) implies g = 0, since we can't partition k into n parts using only values less than k/n; for example, we can't partition 100 into 4 pieces using only values less than 25.
s>k-n+1 implies g(k,n,s) = g(k,n,s-1) since increasing the max value s after k-n+1 doesn't add any new partitions; for example, any partition of 100 into 3 parts will never include a number greater than 100-3+1 = 98
g(k,n,s) = g(k,n,s-1) + g(k-s,n-1,s) in every other case. This just adds any partitions using a value of s to all the previous partitions we've counted using a maximum value of s-1
Now I just choose a maximum number K, throw all these facts into a nested for loop, and deduce every value of g(k,n,s). To get f(k,n), I just look up g(k,n,k).
Here's the algorithm, not suitable for large K.
g = (K by K by K) array of all zeros
for k = 1:K
    for n = 1:K
        for s = 1:K
            if (k == n)
                val = 1;
            else if (n > k)
                val = 0;
            else if (s == 1)
                val = int(k == n);
            else if (n == 1)
                val = int(s >= k);
            else if (s < RoundUp(k/n))
                val = 0;
            else if (s > (k-n+1))
                val = g(k,n,s-1);
            else
                val = g(k,n,s-1) + g(k-s,n-1,s);
            end
            g(k,n,s) = val;
        end
    end
end
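If it helps, here is the same recurrence as a short memoized Python sketch (the names g and f follow the text above; the ordering of the early-outs is my own):

from functools import lru_cache

@lru_cache(maxsize=None)
def g(k, n, s):
    # partitions of k into exactly n parts, each part at most s
    if n > k:
        return 0
    if k == n:
        return 1
    if n == 1:
        return 1 if s >= k else 0
    if s == 1:
        return 0                      # all parts would be 1, but k != n here
    if s * n < k:
        return 0                      # equivalent to s < RoundUp(k/n)
    if s > k - n + 1:
        return g(k, n, k - n + 1)     # larger s adds no new partitions
    return g(k, n, s - 1) + g(k - s, n - 1, s)

def f(k, n):
    return g(k, n, k)

print(f(100, 3))  # 833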
For f(100,3) = g(100,3,100) I get 833 unique partitions, which you can see is correct if you use a brute force method. Please point out mistakes if you see any.
I've been stuck for hours on the following homework question for my data structures class:
You are given a static set S (i.e., S never changes) of n integers from {1, . . . , u}.
Describe a data structure of size O(n log u) that can answer the following queries in O(1) time:
Empty(i, j) - returns TRUE if and only if there is no element in S that is between i and j (where i and j are integers in {1, . . . , u}).
At first I thought of using a y-fast trie.
With a y-fast trie we can achieve O(n) space and O(log log u) query time (by finding the successor of i and checking whether it is bigger than j).
But O(log log u) is not O(1)...
Then I thought maybe we could sort the array and create a second array of size n+1 holding the ranges that are not in the array; the query would then check whether [i, j] is a sub-range of one of those ranges. But I couldn't think of any way to do that which uses O(n log u) space and answers the query in O(1).
I have no idea how to solve this, and I feel like I'm not even close to a solution; any help would be appreciated.
We can create an x-fast trie of S (which takes O(n log u) space) and store in each node the maximum and minimum value of any leaf in its subtree. Now we can use that to answer the Empty query in O(1), like this:
Empty(i, j)
We first calculate xor(i, j). The number of leading zeros in that number is the number of leading bits that i and j share; call this number k. Now take the first k bits of i (or of j, since they're equal) and check in the x-fast trie's hash table whether there is a node equal to those bits. If there isn't, we return TRUE, because any number between i and j would also have those same k leading bits, and since no number has those leading bits, there is no number between i and j. If there is such a node, call it X.
If X->right->minimum > j and X->left->maximum < i, we return TRUE; otherwise we return FALSE. If the condition is false then there is a number between i and j, and if it's true then all the numbers smaller than j are also smaller than i and all the numbers bigger than i are also bigger than j.
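A minimal sketch of the shared-prefix step, assuming w-bit keys (the function name is mine):

def common_prefix_length(i, j, w):
    # number of leading bits (out of w) that i and j share
    return w - (i ^ j).bit_length()

# e.g. with w = 8: common_prefix_length(0b10110010, 0b10111100, 8) == 4,
# so the first 4 bits of i identify the deepest trie node covering both i and j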
You haven't clarified whether the numbers given are sorted or not. If not, sort them, which will take O(n log n).
Find the upper bound of i in the sorted array, say at index x, and the lower bound of j, say at index y.
Now just check four numbers: the ones at indices x, x+1, y-1, and y. If any of these is between i and j, an element of S lies in the range, so return FALSE; otherwise return TRUE.
If the given set/array is not sorted, this approach needs an additional O(n log n) to sort it. Memory required is O(n). Each query then costs the two binary searches, O(log n).
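Here is a rough Python sketch of that check using bisect; a single lower-bound search actually suffices (the function name is my own):

import bisect

def empty(arr, i, j):
    # arr: sorted list of the elements of S
    idx = bisect.bisect_left(arr, i)   # first index with arr[idx] >= i
    return idx == len(arr) or arr[idx] > j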
Consider a data structure consisting of
an array A[1,...,u] of size u such that A[i]=1 if i is present in S, and A[i]=0 otherwise. This array can be constructed from set S in O(n).
an array B[1,...,u] of size u which stores cumulative sum of A i.e. B[i] = A[1]+...+A[i]. This array can be constructed in O(u) from A using the relation B[i] = B[i-1] + A[i] for all i>1.
a function empty(i,j) which returns the desired Boolean query. If i==1, then define count = B[j], otherwise take count = B[j]-B[i-1]. Note that count gives the number of distinct elements in S lying in range [i,j]. Once we have count, simply return count==0. Clearly, each query takes O(1).
Edit: As pointed out in the comments, the size of this data structure is O(u), which doesn't match the constraints. But I hope it gives others an approximate target to shoot at.
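For illustration, a minimal Python sketch of the structure described above (function names are mine; defining B[0] = 0 makes the i == 1 case fall out without a special case):

def build(S, u):
    A = [0] * (u + 1)              # A[i] = 1 iff i is in S (index 0 unused)
    for x in S:
        A[x] = 1
    B = [0] * (u + 1)              # B[i] = A[1] + ... + A[i]
    for i in range(1, u + 1):
        B[i] = B[i - 1] + A[i]
    return B

def empty(B, i, j):
    count = B[j] - B[i - 1]        # number of elements of S in [i, j]
    return count == 0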
This isn't a solution, but it's impossible to write it in a comment. It's an idea for solving a more specific task that may help with the generic task from the question.
The specific task is the same except for the following point: u = 1024. Also, this isn't a final solution; it is a rough sketch (for the specific task).
Data structure creation:
Create a bitmask for U = { 1, ..., u }: M = 0000.....100001, where Mᵥ = 1 when Uᵥ ∊ S, otherwise 0.
Save bitmask M as an array G of 32 unsigned 32-bit integers; each item of G holds 32 bits of M.
Build an integer bitmask H where Hᵣ = 0 when Gᵣ = 0, otherwise 1.
Convert G into a HashMap from r to Gᵣ, containing entries only for Gᵣ != 0.
The examples in the following pseudocode use 8 bits instead of 32, just for simplicity.
Empty(i, j) {
    I = i / 32
    J = j / 32
    if I != J {
        if P == 0: return true
        if P(I) == 0: return true
        if P(J) == 0: return true
    } else {
        if P(J=I) == 0: return true
    }
    return false
}
I am attempting the Bonetrousle HackerRank challenge.
The problem is the following:
Find B distinct positive integers, each at most K, such that their sum is N, or say that it is not possible.
Constraints:
n, k <= 10^18
b <= 10^5
You can check that a solution exists by testing whether the given N lies between the minimum possible sum (take the first B integers, 1..B) and the maximum possible sum (take the last B integers, K-B+1..K).
From there, I start with the minimum sum and try to reach N by assigning each element the maximum possible value that doesn't break the constraints (no duplicates, sum == N).
Below is the code I wrote.
def foo1(n, k, b):
    minSum = (b*(b+1))//2
    maxSum = b*(k - b + 1 + k)//2
    #maxSum = (k*(k+1))//2 - minSum
    #print(minSum, maxSum)
    if n >= minSum and n <= maxSum:
        minArr = [i for i in range(1, b+1)]
        minArr.reverse()
        sumA = sum(minArr)
        maxA = k
        for i in range(len(minArr)):
            tmp = minArr[i]
            minArr[i] = maxA
            sumA = sumA - tmp + minArr[i]
            while sumA > n:
                sumA -= 1
                minArr[i] -= 1
            maxA = minArr[i] - 1
            """
            while sumA+1 <= n and minArr[i]+1 <= k and minArr[i]+1 != maxA:
                #print(minArr, maxA)
                minArr[i] += 1
                sumA += 1
            maxA = minArr[i]
            if sumA == n:
                break
            """
    else:
        return [-1]
    return minArr
The code outputs correct solutions; however, it times out on HackerRank for 4 test cases (sample n, k, b: 19999651, 20000000, 6324).
It gives the answer within 3 seconds on my machine for the same test case.
Initially I thought the issue was the commented-out code, since it incremented each array element one by one until the sum was reached. I modified the code to assign each element the maximum possible value and then decrement it while it breaks the constraints; however, apparently that did not help much.
Any suggestions on modifying the code to meet the timing constraint, or on a much faster algorithm?
First, find the B largest consecutive integers with sum <= N. The problem is impossible if this sequence starts at an integer < 1 or ends at an integer > K.
The sum of B consecutive integers starting at x is B*(2x+B-1)/2, so just solve for x directly.
Obviously, if you were to add 1 to each of the integers in the sequence starting at x, you'd get the next B consecutive integers, and their sum is > N, so you don't need that many increments. Just add 1 to each of the (N - sum) largest integers in the sequence to make the sum come out right.
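For what it's worth, here is a minimal Python sketch of this approach (the function name, the closed form for x, and the feasibility checks are my own):

def bonetrousle(n, k, b):
    # largest start x with x + (x+1) + ... + (x+b-1) <= n,
    # i.e. b*(2x + b - 1)/2 <= n  =>  x = floor((2n - b*(b-1)) / (2b))
    x = (2 * n - b * (b - 1)) // (2 * b)
    if x < 1 or x + b - 1 > k:
        return [-1]
    seq = list(range(x, x + b))
    excess = n - b * (2 * x + b - 1) // 2   # 0 <= excess < b
    for i in range(b - excess, b):          # add 1 to the `excess` largest values
        seq[i] += 1
    if excess > 0 and seq[-1] > k:          # only possible when n > maximum sum
        return [-1]
    return seq

print(bonetrousle(12, 5, 3))  # [3, 4, 5]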
I have a number n and a set of numbers S ⊆ {1, ..., n} of size s (where s is substantially smaller than n). I want to sample a number k ∈ {1, ..., n} with equal probability, but k must not be in the set S.
I am trying to solve the problem in at worst O(log n + s). I am not sure whether it's possible.
A naive approach is creating an array of numbers from 1 to n excluding all numbers in S and then pick one array element. This will run in O(n) and is not an option.
Another approach is to generate random numbers in {1, ..., n} and reject any that are contained in S. This has no theoretical bound, as a number in the set could be drawn over and over. But on average it might be a practical solution if s is substantially smaller than n.
Say S is sorted. Generate a random number between 1 and n-s; call it k. We've chosen the k'th element of {1,...,n} - S. Now we need to find it.
Use binary search on S to count the elements of S that are <= k. This takes O(log s). Add this count to k. In doing so, we may have passed or arrived at additional elements of S. We adjust for this by incrementing our answer for each such element that we pass, found by checking the next larger element of S from the point the binary search returned.
E.g., n = 100, S = {1,4,5,22}, and our random number is 3. Our approach should return the third element of [2,3,6,7,...,21,23,24,...,100], which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of S, which is 4, so we increment to 5. Repeating this finds 5 in S, so we increment to 6. We check S once more, see that 6 isn't in it, and stop.
E.g., n = 100, S = {1,4,5,22}, and our random number is 4. Our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of S, which is 5, so we increment to 7. We check S once more, see that the next element is > 7, and stop.
If we assume that "s is substantially smaller than n" means s <= log(n), then we will increment at most log(n) times, and in any case at most s times.
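A minimal Python sketch of the sorted case (names are mine):

import bisect, random

def sample_excluding(n, S_sorted):
    # S_sorted: sorted list of the forbidden values from {1..n}
    k = random.randint(1, n - len(S_sorted))
    idx = bisect.bisect_right(S_sorted, k)   # number of elements of S <= k
    k += idx
    while idx < len(S_sorted) and S_sorted[idx] <= k:
        k += 1                               # we passed another element of S
        idx += 1
    return k

# e.g. with n = 100 and S_sorted = [1, 4, 5, 22], a draw of 3 yields 6 and a draw of 4 yields 7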
If S is not sorted, we can do the following. Create a bit array of size s and generate k. Then scan S and do two things: 1) count the elements < k, calling this r, and 2) at the same time set the i'th bit whenever k+i is in S (0-indexed, so if k itself is in S then the first bit is set).
Now increment k a number of times equal to r, plus one more for each set bit whose index is <= the number of increments made so far.
E.g., n = 100, S = {1,4,5,22}, and our random number is 4. Our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. We scan S and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.
This is O(s) time, O(s) space.
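And a sketch of the unsorted variant (names are mine; assumes S contains no duplicates):

import random

def sample_excluding_unsorted(n, S):
    s = len(S)
    k = random.randint(1, n - s)
    bits = [False] * s              # bits[i] is True iff k + i is in S
    r = 0                           # number of elements of S below k
    for x in S:
        if x < k:
            r += 1
        elif x - k < s:
            bits[x - k] = True
    t, i = r, 0
    while i < t:                    # one extra increment for each set bit we pass
        if bits[i]:
            t += 1
        i += 1
    return k + t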
This is an O(1) solution with O(s) initial setup that works by mapping each non-allowed number > s to an allowed number <= s.
Let S be the set of non-allowed values, S(i), where i = [1 .. s] and s = |S|.
Here's a two part algorithm. The first part constructs a hash table based only on S in O(s) time, the second part finds the random value k ∈ {1..n}, k ∉ S in O(1) time, assuming we can generate a uniform random number in a contiguous range in constant time. The hash table can be reused for new random values and also for new n (assuming S ⊂ { 1 .. n } still holds of course).
To construct the hash H: first set j = 1, then iterate over S(i), the elements of S (they do not need to be sorted). Whenever S(i) > s, first increment j until j ∉ S, then add the key-value pair (S(i), j) to the hash table, and finally increment j once more.
To find a random value k, first generate a uniform random value in the range s + 1 to n, inclusive. If k is a key in H, replace k with H(k). I.e., we do at most one hash lookup to ensure k is not in S.
Python code to generate the hash:
def substitute(S):
    H = dict()
    j = 1
    for s in S:
        if s > len(S):
            while j in S: j += 1
            H[s] = j
            j += 1
    return H
For the actual implementation to be O(s), one might need to convert S into something like a frozenset to ensure the membership test is O(1), and also hoist the loop-invariant len(S) out of the loop. Assuming the j in S test and the insertion into the hash (H[s] = j) are constant time, this has complexity O(s).
The generation of a random value is simply:
import random

def myrand(n, s, H):
    k = random.randint(s + 1, n)
    return H[k] if k in H else k
If one is only interested in a single random value per S, then the algorithm can be optimized to improve the common case, while the worst case remains the same. This still requires S be in a hash table that allows for a constant time "element of" test.
def rand_not_in(n, S):
    k = random.randint(len(S) + 1, n)
    if k not in S: return k
    j = 1
    for s in S:
        if s > len(S):
            while j in S: j += 1
            if s == k: return j
            j += 1
Optimizations are: Only generate the mapping if the random value is in S. Don't save the mapping to a hash table. Short-circuit the mapping generation when the random value is found.
Actually, the rejection method seems like the practical approach.
Generate a number in 1...n and check whether it is forbidden; regenerate until the generated number is not forbidden.
The probability of a single rejection is p = s/n.
Thus the expected number of random number generations is 1 + p + p^2 + p^3 + ... which is 1/(1-p), which in turn is equal to n/(n-s).
Now, if s is much less than n, or even as large as n/2, this expected number is at most 2.
It would take s almost equal to n to make the approach infeasible in practice.
Multiply the expected time by log s if you use a tree set to test membership, or by just 1 (expected, again) if it is a hash set. So the average time is O(1) or O(log s), depending on the set implementation. There is also O(s) memory for storing the set, but unless the set is given in some special, implicit and concise way, I don't see how that can be avoided.
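A minimal sketch of the rejection loop, assuming the forbidden values are already in a hash set (the function name is mine):

import random

def sample_rejection(n, forbidden):
    # forbidden: a set of values from {1..n} with len(forbidden) < n
    while True:
        k = random.randint(1, n)
        if k not in forbidden:
            return k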
(Edit: as per the comments, you do this only once for a given set.
If, additionally, we are out of luck and the set is given as a plain array or list rather than some fancier data structure, we get O(s) expected time with this approach, which still fits into the O(log n + s) requirement.)
If attacks against the unbounded algorithm are a concern (and only if they truly are), the method can include a fall-back algorithm for the case where a fixed number of iterations hasn't produced an answer.
This is similar to how IntroSort is QuickSort but falls back to HeapSort if the recursion depth gets too high (which is almost certainly the result of an attack triggering quadratic QuickSort behavior).
Find all numbers that are in the forbidden set and less than or equal to n-s. Call this array A.
Find all numbers that are not in the forbidden set and greater than n-s. Call this array B. This can be done in O(s) if the set is sorted.
Note that the lengths of A and B are equal, and create the mapping map[A[i]] = B[i].
Generate a number t up to n-s. If map[t] exists, return it; otherwise return t.
This takes O(s) insertions into the map plus one lookup, which is O(s) on average with a hash map or O(s log s) with a tree map.
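A rough Python sketch of this mapping idea (names are mine; assumes the forbidden values fit in a hash set):

import random

def build_remap(n, forbidden):
    s = len(forbidden)
    fset = set(forbidden)
    A = [x for x in fset if x <= n - s]                          # forbidden values inside the sampling range
    B = [x for x in range(n - s + 1, n + 1) if x not in fset]    # allowed values above it; len(A) == len(B)
    return dict(zip(A, B))

def sample(n, s, remap):
    t = random.randint(1, n - s)
    return remap.get(t, t)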
A frequent task when parallelizing N embarrassingly parallel work chunks contiguously among K workers is to partition with the following algorithm, in pseudocode:
acc = 0
for _ in range(K):
    end = acc + ceil(N/K)
    emit acc:end
    acc = end
This emits K contiguous partitions, generally of size N/K, and works fine for large N. However, if K is close to N, it can cause imbalance, because the last worker may get very few items. If we define imbalance as the maximum absolute difference between partition sizes, then an iterative algorithm that starts from any partition and keeps reducing this imbalance until the maximum difference is 1 (or 0 if K divides N) ends at an optimal partition.
It seems to me that the following may be a more efficient way of getting at the same answer, by performing online "re-planning". Does this algorithm have a name and optimality proof?
acc = 0
workers = K
while workers > 0:
    rem = N - acc
    end = acc + ceil(rem/workers)
    emit acc:end
    acc = end
    workers -= 1
Edit. Given that we can define the loop above recursively, I can see that an inductive optimality proof might work. In any case, the name and confirmation of its optimality would be appreciated :)
A simple way of dividing the range is:
for i in range(K):
    emit (i*N // K):((i+1)*N // K)
This has the advantage of being itself parallelizable since the iterations do not need to be performed in order.
It is easy to prove that every partition has either floor(N/K) or ceil(N/K) elements, and it is evident that every element will be in exactly one partition. Since floor and ceiling differ by at most 1, the algorithm must be optimal.
The algorithm you suggest is also optimal (and the results are similar). I don't know its name, though.
Another way of dividing the ranges, which can also be done in parallel, is to use the range start(N, K, i):start(N, K, i+1), where start(N, K, i) is (N//K)*i + min(i, N%K). (Note that N//K and N%K only need to be computed once.) This algorithm is also optimal, but it distributes the imbalance so that the first partitions are the larger ones. That may or may not be useful.
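A quick runnable sketch of the start-based variant (the helper name follows the text; the rest is mine):

def partitions(N, K):
    q, r = divmod(N, K)            # N // K and N % K, computed once
    def start(i):
        return q * i + min(i, r)
    return [(start(i), start(i + 1)) for i in range(K)]

print(partitions(10, 3))  # [(0, 4), (4, 7), (7, 10)] -- the larger chunks come first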
Here's a simpler approach. Each worker can take floor(N/K) tasks, which partitions K*floor(N/K) of them perfectly, leaving N mod K remaining tasks. To keep the regions contiguous, you can put the remaining tasks on the first N mod K workers.
Here it is in imperative style. Just to be clear, I'm numbering the tasks {0..(N-1)}, and emitting sets of contiguous task numbers.
offset = 0
for 0 <= i < K:
    end = offset + floor(N/K)
    if i < N mod K:
        end = end + 1
    emit {c | offset <= c < end}
    offset = end
And in a more declarative style:
chunk = floor(N/K)
rem = N mod K

// i == worker number
function offset(i) =
    i * chunk + (i if i < rem else rem)

for 0 <= i < K:
    emit {c | offset(i) <= c < offset(i+1)}
The proof of optimality is pretty trivial at this point. Worker i has offset(i+1) - offset(i) tasks assigned to it. Depending on i, this is either floor(N/K) or floor(N/K) + 1 tasks.
Given a set S of positive integers, whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example : if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given an array A of N positive integers, and M queries. Each query consists of two integers Li and Ri describing the i'th query: we need to find this minimal unobtainable sum for the elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it with a brute-force approach in O(2^n), but given 1 ≤ N, M ≤ 100,000 that can't be done.
So, is there any efficient approach?
Concept
Suppose we had an array of bool, numbers, representing which numbers have been found so far (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution exactly like this, because the array would have to be infinite in order to work for all sets of numbers.
The concept can be improved by making a few observations. With an input of 1, 1, 3, the array becomes, in sequence (listing the positions holding True values): {1}, then {1, 2}, then {1, 2, 3, 4, 5}.
An important observation can be made:
(3) When the next number n is processed, it is added to every number already found. This implies that if there were no gaps before n was processed, there will be no gaps after it has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, there will be no number less than 7
(5) If there is no number less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array; we only need a single value, max, representing the maximum number found so far (with no gaps below it).
This way, if the next number n is greater than max + 1, a gap has appeared, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
static int Calculate(int[] S)
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;
    }
    return max + 1;
}
It should run pretty fast, since it's obviously linear time (O(n)). Since the input to the function has to be sorted, with quicksort the whole thing becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With the numbers' upper limit of 10^9, a radix sort could be used to approach O(n) time for the sorting, but this would still be way over 2 seconds because of the sheer number of sorts required.
But we can use the statistical improbability of a 1 being drawn to eliminate subsets before sorting. At the start, check whether 1 exists in S at all; if not, every query's result is 1, because 1 cannot be obtained.
Statistically, if we draw from 10^9 numbers 10^5 times, we have better than a 99.9% chance of not getting a single 1.
Before each sort, check whether that subset contains a 1; if not, its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(S, i) = f(S - arr[i], i-1) OR f(S, i-1)
f(S, i) = false   if S < 0
f(S, i) = false   if i < 0 and S > 0
f(0, i) = true
The recursive formula is basically an exhaustive search: each sum can be achieved either with element i or without it.
The dynamic programming is achieved by building a (SUM+1) x (n+1) table (where SUM is the sum of all elements and n is the number of elements) and filling it bottom-up.
Something like:
table <- (SUM+1) x (n+1) table
// init:
for each j from 0 to n:
    table[0][j] = true       // sum 0 is achieved by the empty subset
for each i from 1 to SUM:
    table[i][0] = false      // a positive sum needs at least one element
// fill the table:
for each i from 1 to SUM:
    for each j from 1 to n:
        if i < arr[j]:
            table[i][j] = table[i][j-1]
        else:
            table[i][j] = table[i-arr[j]][j-1] OR table[i][j-1]
Once you have the table, you need the smallest i such that table[i][n] = false.
The complexity of the solution is O(n*SUM), where SUM is the sum of all elements. Note that the algorithm can be stopped as soon as the required number is found, without filling the remaining rows, which are not needed for the solution.
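For reference, a direct (unoptimized) Python sketch of this table-filling, assuming arr is 0-indexed (the function name is mine):

def min_unobtainable(arr):
    total = sum(arr)
    n = len(arr)
    table = [[False] * (n + 1) for _ in range(total + 1)]
    for j in range(n + 1):
        table[0][j] = True                  # sum 0 via the empty subset
    for i in range(1, total + 1):
        for j in range(1, n + 1):
            if i < arr[j - 1]:
                table[i][j] = table[i][j - 1]
            else:
                table[i][j] = table[i - arr[j - 1]][j - 1] or table[i][j - 1]
    for i in range(1, total + 1):
        if not table[i][n]:                 # smallest unachievable sum
            return i
    return total + 1

print(min_unobtainable([1, 1, 3, 7]))  # 6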