Quad combination time complexity - algorithm

I came across this problem: Given an array of numbers arr and a number S, find 4 different numbers in arr that sum up to S.
The given solution is:
function findArrayQuadCombination(arr, S):
    if (arr == null OR S == null):
        return null
    n = length(arr)
    if (n < 4):
        return null
    # hashing implementation language dependent:
    pairHash = new HashTable()
    for i from 0 to n-1
        for j from i+1 to n-1
            if !pairHash.isMapped(arr[i]+arr[j]):
                pairHash.map(arr[i]+arr[j], [])
            pairHash.get(arr[i]+arr[j]).push([i, j])
    for pairSum in pairHash.getKeys()
        if pairHash.isMapped(S - pairSum):
            pairsA = pairHash.get(pairSum)
            pairsB = pairHash.get(S - pairSum)
            combination = find4Uniques(pairsA, pairsB)
            if (combination != null):
                return combination
    return null

# Helper function.
# Takes 2 arrays of 2-element sub-arrays (index pairs)
# Returns 4 unique entries, taking one sub-array from each array
function find4Uniques(A, B):
    lenA = length(A)
    lenB = length(B)
    for i from 0 to lenA-1:
        for j from 0 to lenB-1:
            if ( A[i][0] == B[j][0] OR A[i][1] == B[j][1] OR
                 A[i][0] == B[j][1] OR A[i][1] == B[j][0] ):
                continue
            else:
                return [A[i][0], A[i][1], B[j][0], B[j][1]]
    return null
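For experimenting with the complexity argument, a direct Python translation of the pseudocode might look like this. It is a sketch: the snake_case names are renamings of the pseudocode's functions, a `defaultdict(list)` stands in for the language-dependent hash table, and, like the pseudocode, it returns four indices into arr (map them through arr to get the numbers).

```python
from collections import defaultdict


def find_array_quad_combination(arr, S):
    """Return four distinct indices whose values sum to S, or None."""
    if arr is None or S is None or len(arr) < 4:
        return None
    n = len(arr)
    # map every pair sum to the list of index pairs (i, j) that produce it
    pair_hash = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            pair_hash[arr[i] + arr[j]].append((i, j))
    for pair_sum, pairs_a in pair_hash.items():
        # .get() does not insert missing keys, so the dict is not mutated mid-loop
        pairs_b = pair_hash.get(S - pair_sum)
        if pairs_b:
            combination = find_4_uniques(pairs_a, pairs_b)
            if combination is not None:
                return combination
    return None


def find_4_uniques(A, B):
    """Return the first pair from A and pair from B sharing no index, flattened."""
    for a in A:
        for b in B:
            if a[0] not in b and a[1] not in b:
                return [a[0], a[1], b[0], b[1]]
    return None
```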
The solution says that it is O(n^2), but I disagree.
lenA and lenB in find4Uniques can be at most n^2 in length so find4Uniques is O(n^4)
The "for pairSum in pairHash.getKeys()" line is O(n^2) because there can be n^2 different keys. So shouldn't the whole thing be O(n^6)?

For the complexity to be O(n^6), the bounds you've given would all need to hold at the same time. Also, the early return in the final loop could never be triggered, and the lengths would need to be achievable given the mathematical constraints on the pair sums.
The problem is that the lengths depend on each other.
If you had n^2 keys, each of their values could only have a length of 1, because every pair would need to sum to a different value.
If one list had a length of n^2, then all the pairs would sum to a single value, so there would be only 1 key.
If both lenA and lenB were n^2, then you'd get a non-null return from find4Uniques for at least one of the combinations, exiting the whole algorithm, so it can't be run n^2 times as well.
To show the time-complexity, you'd need to give an actual value of arr and S that give that complexity.

If all values in arr are distinct, then find4Uniques will return a value within 3 iterations of the inner loop whenever B has 3 or more entries, because at most two pairs in B can share a value with a given pair from A. That bounds the total work of all calls to find4Uniques by the sum of the sizes of the arrays that can be passed in as A, which is the number of pairs of elements in arr, i.e. O(n^2).
However, if the values in arr are NOT distinct, it need not perform well. Reading "4 different numbers" as four distinct values (so the uniqueness check compares values), take S = 6 and arr = [0, 0, ..., 0, 1, 1, ..., 1, 4, 4, ..., 4]: the answer is null, but for pairSum == 1 we'll have O(n^2) entries in A that all look like [0, 1] meeting O(n^2) entries in B that all look like [1, 4], and find4Uniques will do O(n^4) work.
However, this performance bug is easily fixed by deduping arr first.

Related

MinAbsSum task from codility

There already is a topic about this task, but I'd like to ask about my specific approach. The task is:
For a given array A of N integers and a sequence S of N integers from
the set {−1, 1}, we define val(A, S) as follows:
val(A, S) = |sum{ A[i]*S[i] for i = 0..N−1 }|
(Assume that the sum of zero elements equals zero.)
For a given array A, we are looking for such a sequence S that
minimizes val(A,S).
Write a function:
def solution(A)
that, given an array A of N integers, computes the minimum value of
val(A,S) from all possible values of val(A,S) for all possible
sequences S of N integers from the set {−1, 1}.
For example, given array:
A[0] = 1, A[1] = 5, A[2] = 2, A[3] = -2, your function should
return 0, since for S = [−1, 1, −1, 1], val(A, S) = 0, which is the
minimum possible value.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [0..20,000]; each element of array A
is an integer within the range [−100..100].
My approach is to iterate through the array, track all possible results in a set, and choose the smallest one. To limit the time complexity, I keep only the results that are less than or equal to sum(abs(A)). My code is:
def solution(A):
    if not len(A):
        return 0
    A = [abs(a) for a in A]
    possible_results = set([A[0]])
    limit = sum(A)
    for a in A[1:]:
        possible_so_far = set()
        for val in possible_results:
            if abs(val + a) <= limit:
                possible_so_far.add(abs(val + a))
            if abs(val - a) <= limit:
                possible_so_far.add(abs(val - a))
        possible_results = possible_so_far
    return min(possible_results)
It passes all the correctness tests, but fails some performance tests due to timeouts. The detected time complexity is O(N**2 * max(abs(A))), but I don't understand where the square comes from. The main loop is O(N) and the size of the set is at most sum(A), so the final complexity should be O(N * sum(A)).
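On the question itself: every |A[i]| is at most 100, so sum(A) ≤ 100·N and O(N · sum(A)) is already O(N² · max(abs(A))); that is where the square comes from. A commonly used way around it, swapped in here and not the asker's method, is to group equal absolute values: there are at most 101 distinct ones, so a bounded-knapsack reachability pass over the counts costs O(max(abs(A)) · sum(A)). A minimal Python sketch (the function name min_abs_sum is hypothetical):

```python
def min_abs_sum(A):
    """Minimum |sum(A[i] * S[i])| over all sign choices S (counting-based DP)."""
    values = [abs(a) for a in A]
    if not values:
        return 0
    max_v = max(values)
    total = sum(values)
    count = [0] * (max_v + 1)
    for v in values:
        count[v] += 1
    # dp[s] == -1: sum s not reachable yet;
    # dp[s] >= 0: s is reachable, with dp[s] copies of the current value still unused
    dp = [-1] * (total + 1)
    dp[0] = 0
    for v in range(1, max_v + 1):
        if count[v] == 0:
            continue
        for s in range(total + 1):
            if dp[s] >= 0:
                dp[s] = count[v]  # reachable without using v at all
            elif s >= v and dp[s - v] > 0:
                dp[s] = dp[s - v] - 1  # reachable by spending one more copy of v
    # one side of the split sums to s, the other to total - s
    return min(total - 2 * s for s in range(total // 2 + 1) if dp[s] >= 0)
```

Storing "copies still unused" in dp[s] is what lets each distinct value be handled in a single pass instead of once per copy.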

Generate one permutation from an index

Is there an efficient algorithm to generate a permutation from one index provided? The permutations do not need to have any specific ordering and it just needs to return every permutation once per every possible index. The set I wish to permute is all integers from 0~255.
If I understand the question correctly, the problem is as follows: You are given two integers n and k, and you want to find the kth permutation of n integers. You don't care about it being the kth lexicographical permutation, but it's just easier to be lexicographical so let's stick with that.
This is not too bad to compute. The base permutation is 1,2,3,4...n. This is the k=0 case. Consider what happens if you were to swap the 1 and 2: by moving the 1, you are passing up every single permutation where 1 goes first, and there are (n-1)! of those (since you could have permuted 2,3,4..n if you fixed the 1 in place). Thus, the algorithm is as follows:
for i from 1 to n:
    j = k / (n-i)!    // integer division, so rounded down
    k -= j * (n-i)!
    place down the jth unplaced number (counting from 0)
This will iteratively produce the kth lexicographical permutation, since it repeatedly solves a sub-problem with a smaller set of numbers to place, decrementing k along the way.
There is an implementation in Python in the more-itertools package: nth_permutation.
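The loop above translates almost line for line into Python. This sketch assumes 0-based numbering (the question permutes 0 to 255), uses math.factorial, and wraps k modulo n!; kth_permutation is a hypothetical name. Python's unbounded integers handle the very large indices that 256 elements require.

```python
from math import factorial


def kth_permutation(n, k):
    """k-th (0-based) lexicographic permutation of 0..n-1; k is taken modulo n!."""
    unplaced = list(range(n))
    k %= factorial(n)
    result = []
    for i in range(1, n + 1):
        block = factorial(n - i)  # permutations sharing the same choice at this position
        j, k = divmod(k, block)   # j = k // (n-i)!, then k -= j * (n-i)!
        result.append(unplaced.pop(j))  # place down the j-th unplaced number
    return result
```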
Here is an implementation, adapted from the code of more_itertools.nth_permutation:
from math import factorial

def nth_permutation(iterable, index):
    pool = list(iterable)
    n = len(pool)
    c = factorial(n)
    index = index % c          # wrap out-of-range indices
    result = [0] * n
    q = index
    # write index in the factorial number system, least significant digit first
    for d in range(1, n + 1):
        q, i = divmod(q, d)
        if 0 <= n - d < n:
            result[n - d] = i
        if q == 0:
            break
    # each digit says which of the remaining elements to pop next
    return tuple(map(pool.pop, result))

print( nth_permutation(range(6), 360) )
# (3, 0, 1, 2, 4, 5)

Counting valid sequences with dynamic programming

I am pretty new to Dynamic Programming, but I am trying to get better. I have an exercise from a book, which asks me the following question (slightly abridged):
You want to construct a sequence of length N using numbers from the set {1, 2, 3, 4, 5, 6}. However, you cannot place the number i (i = 1, 2, 3, 4, 5, 6) more than A[i] times consecutively, where A is a given array. Given the sequence length N (1 <= N <= 10^5) and the constraint array A (1 <= A[i] <= 50), how many sequences are possible?
For instance if A = {1, 2, 1, 2, 1, 2} and N = 2, this would mean you can only have one consecutive 1, two consecutive 2's, one consecutive 3, etc. Here, something like "11" is invalid since it has two consecutive 1's, whereas something like "12" or "22" are both valid. It turns out that the actual answer for this case is 33 (there are 36 total two-digit sequences, but "11", "33", and "55" are all invalid, which gives 33).
Somebody told me that one way to solve this problem is to use dynamic programming with three states. More specifically, they say to keep a 3d array dp(i, j, k) with i representing the current position we are at in the sequence, j representing the element put in position i - 1, and k representing the number of times that this element has been repeated in the block. They also told me that for the transitions, we can put in position i every element different from j, and we can only put j in if A[j] > k.
It all makes sense to me in theory, but I've been struggling with implementing this. I have no clue how to begin with the actual implementation other than initializing the matrix dp. Typically, most of the other exercises had some sort of "base case" that was manually set in the matrix, and then a loop was used to fill in the other entries.
I guess I am particularly confused because this is a 3D array.
For a moment let's just not care about the array. Let's implement this recursively. Let dp(i, j, k) be the number of sequences with length i, last element j, and k consecutive occurrences of j at the end of the array.
The question now becomes how do we write the solution of dp(i, j, k) recursively.
Well, we know that we are adding a j for the kth time, so we have to take each sequence of length i - 1 that ends with j repeated k - 1 times, and add another j to it. Notice that this is simply dp(i - 1, j, k - 1).
But what if k == 1? In that case we can add one occurrence of j to every sequence of length i - 1 that doesn't end with j. Essentially, we need the sum of all dp(i - 1, x, c) such that x != j and 1 <= c <= A[x].
This gives our recurrence relation:
def dp(i, j, k):
    # this is the base case, the number of sequences of length 1:
    # one if k is valid, otherwise zero
    if i == 1: return int(k == 1)
    if k > 1:
        # get all the valid sequences [0...i-1] and add j to them
        return dp(i - 1, j, k - 1)
    if k == 1:
        # get all valid sequences that don't end with j
        res = 0
        for last in range(len(A)):
            if last == j: continue
            for n_consec in range(1, A[last] + 1):
                res += dp(i - 1, last, n_consec)
        return res
We know that our answer will be all valid subsequences of length N, so our final answer is sum(dp(N, j, k) for j in range(len(A)) for k in range(1, A[j] + 1))
Believe it or not this is the basis of dynamic programming. We just broke our main problem down into a set of subproblems. Of course, right now our time is exponential because of the recursion. We have two ways to lower this:
Caching: we can simply keep track of the result of each (i, j, k) and return the stored value when it's called again.
Use an array: we can reimplement this idea bottom-up, with an array dp[i][j][k]. All of our function calls just become array accesses in a for loop. Note that this method forces us to iterate over the states in topological order, which may be tricky.
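As a sketch of the array option (assuming 0-based values, A given as a Python list of length m, and no modulus, since the excerpt doesn't mention one), the recurrence can be filled in increasing i. Because dp(i, ·, ·) depends only on dp(i - 1, ·, ·), only the previous layer is kept. count_sequences is a hypothetical name.

```python
def count_sequences(N, A):
    """Number of length-N sequences over values 0..m-1 where value j never
    appears more than A[j] times in a row (exact integers, no modulus)."""
    m = len(A)
    # dp[j][k]: sequences of the current length ending in a run of exactly k copies of j
    dp = [[0] * (A[j] + 1) for j in range(m)]
    for j in range(m):
        dp[j][1] = 1  # length 1: a single j, run length 1
    for _ in range(2, N + 1):
        ends_in = [sum(row) for row in dp]  # sequences ending in j, any run length
        everything = sum(ends_in)
        new = [[0] * (A[j] + 1) for j in range(m)]
        for j in range(m):
            new[j][1] = everything - ends_in[j]  # k == 1: append j to a sequence not ending in j
            for k in range(2, A[j] + 1):
                new[j][k] = dp[j][k - 1]  # k > 1: extend the run of j by one
        dp = new
    return sum(sum(row) for row in dp)
```

For the example in the question, count_sequences(2, [1, 2, 1, 2, 1, 2]) gives 33. Each step costs O(sum(A)), so the whole fill is O(N · sum(A)).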
There are 2 kinds of dp approaches: top-down and bottom-up
In bottom-up, you fill in the terminal cases of the dp table and then use for loops to build up from them. Let's consider a bottom-up algorithm for the Fibonacci sequence: we set dp[0] = 1 and dp[1] = 1 and run a for loop from i = 2 to n.
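A minimal Python version of that bottom-up loop, using the same dp[0] = dp[1] = 1 convention (fib_bottom_up is a hypothetical name):

```python
def fib_bottom_up(n):
    """n-th Fibonacci number with dp[0] = dp[1] = 1, filled from the bottom up."""
    dp = [1] * (n + 1)  # covers the terminal cases dp[0] and dp[1]
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```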
In the top-down approach, we start from the "top" view of the problem and work down from there. Consider the recursive function to get the n-th Fibonacci number:
dp = [-1] * 100  # memo table for n < 100; -1 means "not computed yet"

def fib(n):
    if n <= 1:
        return 1
    if dp[n] != -1:
        return dp[n]
    dp[n] = fib(n - 1) + fib(n - 2)
    return dp[n]
Here we don't fill the complete table, but only the cases we encounter.
I mention these two types because, when you start learning dp, it is often difficult to come up with bottom-up approaches directly (as you are trying to do). When this happens, first come up with a top-down approach, and then try to derive a bottom-up solution from it.
So let's create a recursive dp function first:
# let m be the size of A
# initialize the dp table with all values -1
# the answer is the sum of solve(1, num, 1, n, m) for num = 1 to m
def solve(i, j, k, n, m):
    # first write the terminal cases
    if k > A[j]:
        # this means the sequence is invalid, so return 0
        return 0
    if i >= n:
        # this means a valid sequence
        return 1
    if dp[i][j][k] != -1:
        return dp[i][j][k]
    result = 0
    for num = 1 to m:
        if num == j:
            result += solve(i + 1, num, k + 1, n, m)
        else:
            result += solve(i + 1, num, 1, n, m)
    dp[i][j][k] = result
    return dp[i][j][k]
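So we know what the terminal cases are. For the bottom-up version, we create a dp table of size dp[n + 1][m + 1][max(A) + 2] (num and occur are 1-based, and the slot just past A[num] stays 0 to mark a run that is too long). Initialize it with all values 0, not -1.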
So we can do bottom-up as:
# initially all values in the table are zero. With the loop below, we set the valid endings to 1.
# So any state trying to reach a valid terminal state will get 1, but invalid states will
# get the value 0
for num = 1 to m:
    for occur = 1 to A[num]:
        dp[n][num][occur] = 1
# now, to build up from the bottom, we start by filling the (n-1)th position
for i = n-1 down to 1:
    for num = 1 to m:
        for occur = 1 to A[num]:
            for next_num = 1 to m:
                if next_num != num:
                    dp[i][num][occur] += dp[i + 1][next_num][1]
                else:
                    dp[i][num][occur] += dp[i + 1][num][occur + 1]
The answer will be:
sum = 0
for num = 1 to m:
    sum += dp[1][num][1]
I am sure there must be some more elegant dp solution, but I believe this answers your question. Note that I considered that k is the number of times j-th number has been repeated consecutively, correct me if I am wrong with this.
Edit:
With the given constraints, the table will have, in the worst case, 10^5 * 6 * 50 = 3e7 entries. This would be > 100MB. It is workable, but can be considered too much space (I think some kernels don't allow that much stack space for a process). One way to reduce it would be to use a hash map instead of an array with the top-down approach, since top-down doesn't visit all the states. That would be mostly true in this case: for example, if A[1] is 2, then the states where 1 has occurred more than twice need not be stored. Of course, this would not save much space if the A[i] have large values, say [50, 50, 50, 50, 50, 50]. Another approach would be to modify our approach a bit. We don't actually need to store the dimension k, i.e. the number of times j has appeared consecutively:
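dp[i][j] = number of ways from the i-th position if the (i - 1)th position didn't have j and the i-th position is j.
Then we would need to modify our algorithm like this:
def solve(i, j):
    # a run of j starts at position i (positions are 1..n);
    # a run of length count covers positions i .. i + count - 1
    if dp[i][j] != -1:
        return dp[i][j]
    result = 0
    # we will first try 1 consecutive j, then 2 consecutive j's, then 3 and so on
    for count = 1 to A[j]:
        if i + count - 1 == n:
            # the run reaches the last position: one complete sequence
            result += 1
        elif i + count - 1 < n:
            # the next run starts at i + count with a different number
            for num = 1 to m:
                if num != j:
                    result += solve(i + count, num)
    dp[i][j] = result
    return dp[i][j]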
The answer is then the sum of solve(1, j) for j = 1 to m. This approach reduces the space complexity to O(10^6) ~= 2 MB, while the time complexity stays the same: O(N * 6 * 50).
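A bottom-up sketch of the same reduced state, assuming 0-based positions and values and N >= 1 (count_sequences_runs and the helper array total are hypothetical additions, not from the answer): ways[i][j] counts the ways to fill positions i..N-1 when a run of j starts at i, and total[i] - ways[i][j] sums the next run over every value other than j.

```python
def count_sequences_runs(N, A):
    """Same count via the reduced state: ways[i][j] is the number of ways to fill
    positions i..N-1 (0-based) when a run of value j starts at position i."""
    m = len(A)
    ways = [[0] * m for _ in range(N + 1)]
    total = [0] * (N + 1)  # total[i] = sum(ways[i])
    for i in range(N - 1, -1, -1):
        for j in range(m):
            count = 0
            for run in range(1, A[j] + 1):
                end = i + run  # first position after the run of j
                if end == N:
                    count += 1  # the run fills the sequence to the end
                elif end < N:
                    # the next run starts at `end` with any value other than j
                    count += total[end] - ways[end][j]
            ways[i][j] = count
        total[i] = sum(ways[i])
    return total[0]
```

Using total[] replaces the inner loop over num, so the fill costs O(N · sum(A)) time and O(N · m) space.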

How can I find the total number of distinct arrays that can be obtained after applying a given operation exactly k times?

We are given an array whose elements are in the range [-10^6, 10^6].
We also have an integer k, and we need to find how many different arrays can be obtained by applying an operation exactly k times. The only operation is to pick any element of the array and multiply it by -1.
For example, for the array A = {1, 2, 1} and k = 2, the number of different arrays obtainable after k operations is 4 ({1, 2, 1}, {-1, -2, 1}, {-1, 2, -1}, {1, -2, -1}).
Code and an explanation are provided here, but they are hard to understand. Could someone simplify that explanation or suggest another approach to the problem? Thanks.
Let the size of the array be n. First, note that the answer doesn't depend on the order in which the operations are done.
Consider the two cases :
Case 1 : There are no zeros in the array and
Case 2 : There are non-zero number of zeros in the array.
Considering Case 1 :
Sub-case 1: the number of elements >= the number of operations, i.e. n >= k
Suppose we allow at most 1 operation on every element; then we can get C(n, k) different arrays with k changed elements from the original array.
But what happens when we do 2 operations on a single element? The element basically doesn't change. Keeping in mind that the order of operations doesn't matter, you can put it this way: you took the initial array, selected an element, and multiplied it by -1 twice, so you are back at the exact original array but with only k-2 operations in hand, which means we threw away 2 of our k chances at the start. Now we can carefully perform the k-2 operations, one on each of k-2 elements, and get C(n, k-2) different arrays. Similarly, you can throw away 4, 6, 8, ... chances and get C(n, k-4), C(n, k-6), C(n, k-8), ... arrays respectively.
This leads to C(n, k) + C(n, k-2) + C(n, k-4) + C(n, k-6) + C(n, k-8) + ... possible arrays if no element of the array is zero.
Sub-case 2: n < k
Since the number of operations is greater than the number of elements, you have to throw away some of your operations, because you have to apply more than 1 operation to at least one element. So, if n and k are both even or both odd, you should throw away k-n of your operations and have n operations left, and from here it is just sub-case 1. If one is odd and the other even, you have to throw away k-n+1 of your operations and have n-1 operations left, and again it is just sub-case 1 from this point. You can try to derive the expression for this case.
Considering case 2 :
Notice that in the earlier case you were only able to throw away an even number of operations.
Even here, there arise the cases n >= k and n < k, where n now counts only the non-zero elements (flipping a zero never produces a new array).
For the n >= k case:
Since there is at least one zero, you will now be able to throw away any number of operations, just by applying that number of operations to any of the zeros, since multiplying a zero by -1 doesn't affect it.
So the answer here will simply be C(n, k) + C(n, k-1) + C(n, k-2) + C(n, k-3) + C(n, k-4) + ... + C(n, 0).
And for the n < k case:
The answer would be C(n, n) + C(n, n-1) + C(n, n-2) + C(n, n-3) + C(n, n-4) + ... + C(n, 0) = 2^n
You could see it as a dynamic programming problem because you have to calculate a sum of binomial coefficients, but logic-wise it is a combinatorics problem.
OK, let's go through the code.
First, there is the function nChoosek, which computes the number of combinations (the binomial coefficient); this is what will be used to solve the problem.
A combination is a way of selecting part of a collection (https://en.wikipedia.org/wiki/Combination). For example, for the array {1, 2, 3}, choosing two items out of the three is a combination of two from three; in the code it is nChoosek(3, 2) = card{(1,2), (2,3), (1,3)} = 3.
If we consider the problem with these three additional conditions:
1- you can't multiply the same item twice
2- k <= n
3- there is no zero in the array
then the solution would be nChoosek(n, k). But since those conditions don't actually hold, we have to deal with each of them.
For the first one: we can multiply the same item twice, so to nChoosek(n, k) we should add the number of arrays we can get if we multiply an item (or several) twice by -1.
Consider multiplying a single item twice: here we lose two multiplications without changing the array, so the number of combinations we get is nChoosek(n, k - 2).
In the same way, if we multiply two items twice each, the result is nChoosek(n, k - 4).
From that comes
for(; i >= 0; i -= 2){
    ans += nChoosek(n, i);
    ans = ans % (1000000007l);
}
For the case where k > n, applying the algorithm implies that we will multiply at least one element twice, so it is equivalent to applying the algorithm with k - 2 and n.
If k - 2 is still bigger than n, we can by the same logic transform it into its equivalent with k - 4 and n, and so on, until k - 2*i <= n. This final k - 2*i will be n if k and n have the same parity and n - 1 otherwise, so the new k is n or n - 1, which justifies this code:
if(k <= n){
    i = k;
}else if((k % 2 == 0 && n % 2 == 0) || (k % 2 != 0 && n % 2 != 0)){
    i = n;
}else if((k % 2 == 0 && n % 2 != 0) || (k % 2 != 0 && n % 2 == 0)){
    i = n - 1;
}
Now for the zeros: if we consider T1 = {1,2,3} and T2 = {0,1,0,0,2,3,0,0,0} with k = 2, you can notice that dealing with an array of length n that has m zeros is similar to dealing with an array of length n - m with no zeros, except that any leftover operations can now be spent on a zero one at a time.
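Putting both answers together, a Python sketch might look like this. Assumptions: math.comb (Python 3.8+) replaces the linked nChoosek, the result is reduced modulo 1000000007 as in the quoted loop, and, following the zero observation, only the non-zero elements are counted as n. count_distinct_arrays is a hypothetical name.

```python
from math import comb

MOD = 1_000_000_007


def count_distinct_arrays(arr, k):
    """Distinct arrays reachable with exactly k sign flips, modulo MOD (sketch)."""
    zeros = sum(1 for x in arr if x == 0)
    n = len(arr) - zeros  # only non-zero elements can change
    if zeros > 0:
        # any surplus flip can be spent on a zero, so every i <= min(k, n) works
        top, step = min(k, n), 1
    else:
        # surplus flips must cancel in pairs, so i must have the same parity as k
        top = k if k <= n else (n if (k - n) % 2 == 0 else n - 1)
        step = 2
    ans = 0
    for i in range(top, -1, -step):  # i = number of elements that end up flipped
        ans = (ans + comb(n, i)) % MOD
    return ans
```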

Minimum sum that cant be obtained from a set

Given a set S of positive integers whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example: if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given an array A consisting of N positive integers. There are M queries, each consisting of two integers Li and Ri describing the i-th query: we need to find this sum that can't be obtained from the array elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it with a brute-force approach in O(2^n), but given 1 ≤ N, M ≤ 100,000, this can't be done.
So is there any efficient approach to do it?
Concept
Suppose we had an array of bool, numbers, recording which sums have been found so far.
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept could be improved by making a few observations. With an input of 1, 1, 3, the array becomes (in sequence):
(numbers represent true values)
after 1: 1
after 1: 1 2
after 3: 1 2 3 4 5
An important observation can be made:
(3) For each next number, if the previous numbers had already been found it will be added to all those numbers. This implies that if there were no gaps before a number, there will be no gaps after that number has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, every remaining number is at least 7
(5) If no remaining number is less than 7, then 6 cannot be obtained (the sums found so far only reach 5)
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array, we only need a single value, max to represent the maximum number found so far.
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
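// S must be sorted in increasing order
static int Calculate(int[] S)
{
    int max = 0; // every sum in 0..max is obtainable so far
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1; // gap: max + 1 can never be formed
    }
    return max + 1;
}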
It should run pretty fast, since it's clearly linear time (O(n)). Since the input to the function must be sorted, with quicksort this becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
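A Python sketch of this approach for the query version (a greedy scan over a sorted copy, one sort per query), assuming 0-based inclusive (L, R) query pairs; min_unobtainable and answer_queries are hypothetical names. It is O(M · N log N) in the worst case, so it illustrates the method rather than meeting the limits.

```python
def min_unobtainable(values):
    """Smallest non-negative sum not obtainable from any sub-multiset (greedy scan)."""
    reach = 0  # every sum in 0..reach is obtainable so far
    for x in sorted(values):
        if x > reach + 1:
            break  # gap: reach + 1 can never be formed
        reach += x
    return reach + 1


def answer_queries(A, queries):
    # each query is an (L, R) pair, 0-based and inclusive
    return [min_unobtainable(A[L:R + 1]) for L, R in queries]
```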
With numbers upper limit of 10^9, a radix sort could be used to approximate O(n) time for the sorting, however this would still be way over 2 seconds because of the sheer amount of sorts required.
But, assuming the test data is uniformly random, we can use the low probability of a 1 appearing to eliminate subsets before sorting. At the start, check whether 1 exists in A at all; if not, every query's result is 1, because 1 cannot be obtained.
Statistically, if we draw 10^5 numbers at random from 10^9 values, the chance of not getting a single 1 is (1 - 10^-9)^(10^5) ≈ e^(-10^-4), i.e. about 99.99%.
Before each sort, check whether that subrange contains 1; if not, its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(s, i) = f(s - arr[i], i - 1) OR f(s, i - 1)
f(s, i) = false for s < 0
f(0, i) = true for every i (including i = -1: the empty subset)
f(s, -1) = false for s > 0
The recursive formula is basically an exhaustive search, each sum can be achieved if you can get it with element i OR without element i.
The dynamic programming is achieved by building a SUM+1 x n+1 table (where SUM is the sum of all elements, and n is the number of elements), and building it bottom-up.
Something like:
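table <- (SUM+1) x (n+1) table   // table[s][j]: can sum s be formed from the first j elements?
//init:
for each j from 0 to n:
    table[0][j] = true        // the empty subset gives sum 0
for each s from 1 to SUM:
    table[s][0] = false       // no elements: no positive sum
//fill the table (arr is 1-indexed here):
for each s from 1 to SUM:
    for each j from 1 to n:
        if s < arr[j]:
            table[s][j] = table[s][j-1]
        else:
            table[s][j] = table[s-arr[j]][j-1] OR table[s][j-1]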
Once you have the table, you need the smallest i such that table[i][n] = false (equivalently, table[i][j] = false for all j); if every sum up to SUM is reachable, the answer is SUM + 1.
The complexity of this solution is O(n*SUM), where SUM is the sum of all elements, but note that the algorithm can be cut short once the required number is found, without going on to the next rows, which are unneeded for the solution.

Resources