Need an algorithm for this problem - algorithm

There are two integer sequences A[] and B[] of length N,both unsorted.
Requirement: through the swapping of elements between A[] and B[]( can randomly exchange, not with same index), make the difference between {the sum of all elements in A[]} and {the sum of all elements in B[]} to be minimum.
PS: actually,it is an interview question I encountered.
Many thanks

This is going to be NP-hard! I believe you can do a reduction from Subset Sum to this.
As per BlueRaja/polygene's comments, I will try to provide a full reduction from Subset Sum.
Here is a reduction:
Subset Sum problem: Given integers x1, x2, ..., xn, is there some non-empty subset which sums to zero?
Our problem: Given two integer arrays of size k, find the minimum possible difference of the sum of the two arrays, assuming we can shuffle around the integers in the arrays, treating both arrays as one array.
Say we had a polynomial time algo for our problem.
Say now you are given integers T = {x1,x2, ...,xn} (multiset)
Let Si = x1 + x2 + ...+ xn + xi.
Let Ti = {x1, x2, ..., xi-1, xi+1, ..., xn } ( = T - xi)
Define
Ai = Array formed using Ti
Bi = [Si, 0, ..., 0] (i.e one element is Si and rest are zeroes).
Let mi = the min difference found by our problem for arrays Ai and Bi
(we run our problem n times).
Claim: Some non-empty subset of T sums to zero if and only if, there is some i, for which mi = 0.
Proof: (wlog) say x1 + x2 + .. + xk = 0
Then
A = [xk+1, ..., xn, 0, ...0]
B = [x2, x3, ..., xk, S1, 0, ..0]
gives the minimum difference m1 to be |x2 + .. + xk + (x1 + ... + xn) + x1 - (xk+1 + .. + xn)| = |2(x1+ x2 + .. xk)| = 0.
Similarly the if part can be proved.
In fact, this actually also follows (more easily) from Partition too: just create new array with all zeroes.
Hoepfully I haven't made any mistakes.

Take any instance of the NP-complete partition problem:
Partition a multiset A of positive integers into two multisets B and C with the same sum
like {a1,a2,...,an}. Add n zeroes {0,0,0...,0,a1,...,an} and ask if the set can be partitioned into two multisets A and B with the same sum and same number of elements. I claim these two conditions are equivalent:
If A and B are a solution to the problem, then you can strike out the zeroes and get a solution of partiton problem.
If there is a solution to the partition problem, for example ai1 + ai2 + ... aik = aj1 + ... +ajl where {ai1, ai2, aik, aj1, ..., ajl} = {a1, ... , an} then obviously k+l = n. Add l zeroes to the left side and k zeroes to the right side and you'll get 0 + ... + 0 + ai1 + ai2 + ... aik = 0 + ... + 0 + aj1 + ... +ajl, whichi is a solution of your problem.
So, this is a reduction (so the problem is NP-hard) and the problem is NP, so it is NP-complete.

"sequences A[] and B[] of length N" -> does this mean both A and B are each of length N?
(For the purpose of clarity I am using 1-based arrays below).
If so, how about this:
Assume A[1..N] and B[1..N]
Concatenate A and B into a new array C of length 2N: C[1..N] <- A[1..N]; C[N+1 .. 2N] <- B[1..N]
Sort C in ascending order.
Take the first pair of numbers from C; send the first element (C[1]) to A[1] and second element (C[2]) to B[1]
Take the second pair of numbers from C; this time send the second element (C[4]) to A[2] and the first element (C[3]) to B[2] (the order of elements in the pair sent to A and B is the opposite of 3)
... repeat 3 and 4 until C is exhausted
The observation here is that, in a sorted array, an adjacent pair of numbers will have the smallest difference (compared to a pair of numbers from non-adjacent positions). Step 3 ensures that A[1] and B[1] consists of a pair of numbers with the least possible difference. Step 4 ensures that (a) A[2] and B[2] consist of a pair of numbers with the least possible difference (from the available numbers) and also (b) that the difference is opposite in sign from step 3. By continuing like this, we are ensuring that A[i] and B[i] contain numbers with the least possible difference. Also, by flipping the order in which we send elements to A and B, we are ensuring that the difference changes sign for each successive i.

Try being greedy about it. Given such limited information, I'm not sure what else one could put out there.

I'm not sure that this will ensure the minimum possible distance, but the first thing that comes to mi mind is something like this:
int diff=0;
for (int i = 0; i<len; i++){
int x = a[i] - b[i];
if (abs(diff - x) > abs(diff + x)){
swap(a,b,i);
diff-=x;
}else{
diff+=x;
}
}
assuming that you have a swap function which takes the two arrays and exchanges the items at position i :)
computing and adding the difference between the two values at position i you get the incremental difference between the sums of the elements of the two arrays.
at each step you check if it's better to add (a[i]-b[i]) or (b[i]-a[i]). if the b[i]-a[i] it's the case, you swap the elements at position i in the arrays.
Maybe this will not be the best way, but it should be a start :)

The problem is NP-Complete.
We can reduce the partition problem to the decision version of this problem, i.e. given two arrays of ints of the same size, determine whether items can be swapped so that the sums are equal.
The input to the partition problem: a set S of integers, of size N
In order to transform this input into an input to our problem, we define A to be an array of all items in S, and B an array of the same size, with B[i]=0 for all i. This transformation is linear in the input size.
It is clear that our algorithm applied on A and B returns true if and only if there is a partition of S into 2 subsets such that the sums are equal.

Related

Minimize Sum of Absolute Difference of Two Arrays

I have two arrays of integers A and B both of size n. The cost of a pair is |A(i) - B(i)|.
I want to pair the n elements of A and B such that the sum of all costs across all A(i)s and B(i)s are minimized.
I understand that I can get O(n log n) by sorting A, then sorting B, and then pairing them together from 1...n respectively, but after attempting for hours and hours, I can't figure out how to prove it. Can somebody help me out?
I've seen how to implement it, I just don't get how to prove it
I am following a slightly different approach here to prove this fact by making use of squares rather than absolute.
Consider 2 arrays, A = [a1, a2, ..., an] and B = [b1, b2, ..., bn].
Now, even if I use random pairing (form a pair using any index from A and B ),
Let's say, the sum of squares of difference (S) = a1^2 + b1^2 + a2^2 + b2^2 + ... + an^2 + bn^2 - 2 * (a1 * b3 + a2 * b4 + .... + an * b56 + bn * a34).
The above sum can be represented as S = sum(ai^2) + sum(bi^2) - 2 * sum(ai*bi), for i goes from 1 to n.
To minimise this sum, we need to maximise the part sum(ai*bi), for i goes from 1 to n.
The term sum(ai*bi) will be maximum when the 2 arrays will be sorted.
Thanks for pointing out #Abhinav Mathur: The statement The term sum(ai*bi) will be maximum when the 2 arrays will be sorted can be proved using rearrangement inequality.
Assume that according to the current sorted arrays, there is a pair |x-a|, and another pair |y-b|. Let's say that switching the elements would give a lesser sum i.e. a more optimal solution.
(Note: while switching around two pairs, the rest of array remains unaffected).
Current total sum of pairs = |x-a| + |y-b|
Modified sum after switching pairs = |x-b| + |y-a|
Difference in sums = diff = |x-b| - |x-a| + |y-a| + |y-b|
If diff is negative, it means we have found a better ordering. If not, it means our original solution was better.
Now, you can take cases and analyse this. (Since the arrays are sorted, let x<y (they're from the first array) and a<b (they're from second array).
Case 1: x>b or y<a:
In this case, both sums will be equal, which can be easily seen by expanding the modulus
Case 2: a<x<b:
If y>b, diff = 2*(b-x). Since we assumed b>x, diff is positive.
If y<b, diff = 2*(y-x). Since y>x as stated earlier, diff is again positive.
You can continue taking similar cases and prove that diff will always be positive, meaning that our original ordering will be the most efficient one.
Sorting and pairing creates a matching that we might call "monotonic", which ensures that if A[i] matches B[x] and A[j] matches B[y], then:
If A[i] < A[j] then B[x] <= B[y]; and
If B[x] < B[y] then A[i] <= A[j]
If you choose a matching that is not monotonic, then one of these rules will be violated for some pair of matchings.
If we pick any two elements from both arrays such that A[i] <= A[j] and B[x] <= B[y], then we can evaluate the cost of the monotonic pairing and the other pairing. Note that if A[j] = A[j] or B[i] = B[j] then both pairings have the same cost so it doesn't matter which one we call monotonic.
In order to compare the costs, we need to get rid of the absolute value operations. We can do that by separately considering all the possible orderings between the 4 values:
Case: A[i] <= A[j] <= B[x] <= B[y]:
Monotonic cost: B[x]-A[i] + B[y]-A[j]
Swapped cost: B[y]-A[i] + B[x]-A[j]
Difference: 0
cost is the same - doesn't matter which we choose
Case: A[i] <= B[x] <= A[j] <= B[y]
Monotonic cost: B[x]-A[i] + B[i]-A[j]
other cost: B[y]-A[i] + A[j]-B[x]
Difference: 2A[j] - 2B[x]
since A[j] >= B[x], monotonic is as good or better
... etc
If you go through all 6 possible orderings, in every case you find that the monotonic matching is as good or better. Given any matching, you can make every pair of element matchings monotonic, and the cost can only go down.
If you start with an optimal matching and make every pair of matchings monotonic then you end up with an optimal monotonic matching. (In fact the one you start with has to be monotonic if it's optimal, but we don't have to prove that) Since every monotonic matching has the same cost, and at least one of them is optimal, they must all be optimal.

Find The number of sides of triangle

I am giving a Array A consists of integer, I have to choose any two integer from array A and any third integer from range [L,R] , such that all three integer forms a valid triangle.
I have to find the number of integer in range [L,R] which can be used to form a valid triangle, by choosing any two values from array A.
I know if i know two sides then third side must be range a-b<x<a+b
where a and b are any two integer from A.
How to find the number of valid integer in [L,R] in O(N) N= Size of A time.
L and R and be very large upto 10^20
To my understanding, complete enumeration will yield an efficient solution by the following argument. The maximum number of possible solutions occurs in the case where any selection of elements of A yields a consistent choice of side lengths for a triangle. Such an input, for instance, would be an input of N times the value 1. However, the number of triples chosen from A can be bounded by
(n choose 3) = n!/(3!(n-3))
= n!/(6(n-3)!)
= (1/6)*(n-2)*(n-1)*n
<= n^3
(where choose is meant to denote the binomial coefficient) which is a polynomial number of choices. Any choice can be checked for validity in constant time, as only 3 values are involved.
Now the contest has ended, so here is my way to solve it in the contest.
The problem is asking how many number x in [L,R]can form triangle with any pair (a_i, a_j) in A.
Naive method is to brute all pairs which is O(N^2 * (R-L+1)) = O(N^2 * (R-L))
But indeed, we do not need to test all O(N^2) pairs, we only need to test O(N) pairs IF A is sorted, namely, all adjacent pair (a_i-1, a_i) for i > 0
Why? Because in sorted A:
If there is some pair (a_j, a_i) where a_j < a_i and j != i-1, which can form triangle with x
Then (a_i-1, a_i) must form a triangle with x too:
a_i-1 + a_i > a_j + a_i > x
x + a_i-1 > x + a_j > a_i
x + a_i > x + a_i-1 > a_j
Therefore checking all (a_i-1, a_i) is sufficient, this is the first core idea to solve this problem.
So now we have a O(NlgN + N*(R-L+1)) = O(N*(R-L)) algorithm.
For the original contest problem, it is still too slow as R-L+1 is as large as 10^18, so we need another tricks.
Note that actually by the triangle inequality, for a pair (a_i-1, a_i), indeed we can find a range of x which can form triangle with this pair:
a_i-1 + a_i > x > a_i - a_i-1 (By above 1 & 2)
For example, (4,5) can form triangle with all 1 < x < 9
So instead of iterating all x in [L,R], we try to find range of x for each pair, which can be done in O(N) as we know the range for x of a pair in O(1). Beware that x must fall in the range of [L,R].
After that we have O(N) ranges / segments of x which then we can union them and the size of the result set is the desired answer.
Union O(N) segments can be done easily in O(N) as well, I leave you as a homework :)
Combine both tricks, the algorithm is O(N lg N + N + N) = O(N lg N)

Analysis of sorting Algorithm with probably wrong comparator?

It is an interesting question from an Interview, I failed it.
An array has n different elements [A1 .. A2 .... An](random order).
We have a comparator C, but it has a probability p to return correct results.
Now we use C to implement sorting algorithm (any kind, bubble, quick etc..)
After sorting we have [Ai1, Ai2, ..., Ain] (It could be wrong)。
Now given a number m (m < n), the question is as follows:
What is Expectation of size S of Intersection between {A1, A2, ..., Am} and {Ai1, Ai2, ..., Aim}, in other words, what is E[S]?
Any relationship among m, n and p ?
If we use different sorting algorithm, how will E[S] change ?
My idea is as follows:
When m=n, E[S] = n, surely
When m=n-1, E[S] = n-1+P(An in Ain)
I dont know how to complete the answer but I thought it could be solved through induction.. Any simulation methods would also be fine I think.
Hm, if A1,A2,...,An are in random order (as stated in the question), then all that sorting and probability of correctness of Comparator C does not really matter. The question is then reduced to the expectation of the length of intersection of two random subsets, each of size m, of {A1,...,An}.
The probability that S is k is then (m k)*((n-m) (n-k))/(n m), where (a b) shall denote "a over b", the number of possibilities chosing b elements from a elements. (Because for the second subset we have to choose k elements out of the first subset and m-k elements from the rest.)
E[S] is then the sum(0 <= k <= m) k*(m k)*((n-m) (n-k))/(n m), which reduces to m/(n m) * sum(0 <= k <= m) ((m-1) (k-1))*((n-m) (n-k)). This sum is a basic (well-known) binomial identity giving ((n-1) (m-1)), so finally we get m/(n m) * ((n-1) (m-1)) = m^2/n.
Partial answer for modified question:
Let's assume that Ai1,...,Aim is not intersected with the initial (random) array start, but with the first m values of the correctly sorted array Aj1,...,Ajn. (which seems to be more interesting)
Let us further assume that the comparator C is non-deterministic.
And for simplicity we assume all array elements are different and n=2^N.
Now the partial answer here is at first restricted to
m=1
sorting-algorith=merge sort
Aj1 is the smallest element. In merge sort each element is compared ld(n)=N times. The smallest element is sorted to the first position if and only if it turns out smaller in each of its ld(n)=N comparisons. So the probability P(Ai1=Aj1) = p^N, which equals for m=1 the requested E[S]. So we get
E[S] = p^ld(n)
And here is a partial answer for
m=1
sorting-algorith=bubble sort with getting the smallest elements in place first
If the smallest element is on position k at the beginning (Ak=Aj1), then it takes max(k-1,1) correct comparisons to bring Ak to the front (Ak=Ai1). Since all n start positions are equally probable we get
E[S] = P(Ai1=Aj1) =
= P(Ai1=Aj1|Aj1=A1)*P(Aj1=A1) + ... + P(Ai1=Aj1|Aj1=An)*P(Aj1=An) =
= 1/n (p + p + p^2 + ... + p^(n-1)) = 1/n ((1-p^(n-1))/(1-p)+p-1) =
= (2p - p^2 - p^(n-1)) / (n(1-p))
Good luck for the general case!

Maximum of sums of unsorted array and each of a number of sorted arrays

Given an unsorted array
A = a_1 ... a_n
And a set of sorted Arrays
B_i = b_i_1 ... b_i_n # for i from 1 to $large_number
I would like to find the maximums from the (not yet calculated) sum arrays
C_i = (a_1 + b_i_1) ... (a_n + b_i_n)
for each i.
Is there a trick to do better than just calculating all the C_i and finding their maximums in O($large_number * n)?
Can we do better when we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
(The above sequence has the maybe advantageous property that (S_i - S_i-1 > S_i-1 - S_i-2))
There are $large_number * n data in your first problem, so there can't be any such trick.
You can prove this with an adversary argument. Suppose you have an algorithm that solves your problem without looking at all n * $large_number entries of b. I'm going to pick a fixed a, namely (-10, -20, -30, ..., -10n). The first $large_number * n - 1 the algorithm looks at an entry b_(i,j), I'll answer that it's 10j, for a sum of zero. The last time it looks at an entry, I'll answer that it's 10j+1, for a sum of 1.
If $large_number is Omega(n), your second problem requires you to look at n * $large_number entries of S, so it also can't have any such trick.
However, if you specify S, there may be something. And if $large_number <= n/2 (or whatever it is), then, all of the entries of S must be sorted, so you only have to look at the last B.
If we don't know anything I don't it's possible to do better than O($large_number * n)
However - If it's just shifts of an endless sequence we can do it in O($large_number + n):
We calculate B_0 ןמ O($large_number).
Than B_1 = (B_0 - S[0]) + S[n+1]
And in general: B_i = (B_i-1 - S[i-1]) + S[i-1+n].
So we can calculate all the other entries and the max in O(n).
This is for a general sequence - if we have some info about it, it might be possible to do better.
we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
You can easily calculate S[i:i+n] as (sum of squares from 1 to i+n) - (sum of squares from 1 to i-1)
See https://math.stackexchange.com/questions/183316/how-to-get-to-the-formula-for-the-sum-of-squares-of-first-n-numbers
With the provided example, S1 = 0, S2 = 1, S3 = 4...
Let f(n) = SUM of Si for i=1 to n = (n-1)(n)(2n-1)/6
B_i = f(i+n) - f(i-1)
You then add SUM(A) to each sum.
Another approach is to calculate the difference between B_i and B_(i-1):
That would be: S[i:i+n] - S[i-1:i+n-1] = S(i+n) - S(i-1)
That way, you can just calculate the difference of the sums of each array with the previous one. In my understanding, since Ci = SUM(Bi)+SUM(A), SUM(A) becomes a constant that is irrelevant in finding the maximum.

Minimal number of swaps?

There are N characters in a string of types A and B in the array (same amount of each type). What is the minimal number of swaps to make sure that no two adjacent chars are same if we can only swap two adjacent characters ?
For example, input is:
AAAABBBB
The minimal number of swaps is 6 to make the array ABABABAB. But how would you solve it for any kind of input ? I can only think of O(N^2) solution. Maybe some kind of sort ?
If we need just to count swaps, then we can do it with O(N).
Let's assume for simplicity that array X of N elements should become ABAB... .
GetCount()
swaps = 0, i = -1, j = -1
for(k = 0; k < N; k++)
if(k % 2 == 0)
i = FindIndexOf(A, max(k, i))
X[k] <-> X[i]
swaps += i - k
else
j = FindIndexOf(B, max(k, j))
X[k] <-> X[j]
swaps += j - k
return swaps
FindIndexOf(element, index)
while(index < N)
if(X[index] == element) return index
index++
return -1; // should never happen if count of As == count of Bs
Basically, we run from left to right, and if a misplaced element is found, it gets exchanged with the correct element (e.g. abBbbbA** --> abAbbbB**) in O(1). At the same time swaps are counted as if the sequence of adjacent elements would be swapped instead. Variables i and j are used to cache indices of next A and B respectively, to make sure that all calls together of FindIndexOf are done in O(N).
If we need to sort by swaps then we cannot do better than O(N^2).
The rough idea is the following. Let's consider your sample: AAAABBBB. One of Bs needs O(N) swaps to get to the A B ... position, another B needs O(N) to get to A B A B ... position, etc. So we get O(N^2) at the end.
Observe that if any solution would swap two instances of the same letter, then we can find a better solution by dropping that swap, which necessarily has no effect. An optimal solution therefore only swaps differing letters.
Let's view the string of letters as an array of indices of one kind of letter (arbitrarily chosen, say A) into the string. So AAAABBBB would be represented as [0, 1, 2, 3] while ABABABAB would be [0, 2, 4, 6].
We know two instances of the same letter will never swap in an optimal solution. This lets us always safely identify the first (left-most) instance of A with the first element of our index array, the second instance with the second element, etc. It also tells us our array is always in sorted order at each step of an optimal solution.
Since each step of an optimal solution swaps differing letters, we know our index array evolves at each step only by incrementing or decrementing a single element at a time.
An initial string of length n = 2k will have an array representation A of length k. An optimal solution will transform this array to either
ODDS = [1, 3, 5, ... 2k]
or
EVENS = [0, 2, 4, ... 2k - 1]
Since we know in an optimal solution instances of a letter do not pass each other, we can conclude an optimal solution must spend min(abs(ODDS[0] - A[0]), abs(EVENS[0] - A[0])) swaps to put the first instance in correct position.
By realizing the EVENS or ODDS choice is made only once (not once per letter instance), and summing across the array, we can count the minimum number of needed swaps as
define count_swaps(length, initial, goal)
total = 0
for i from 0 to length - 1
total += abs(goal[i] - initial[i])
end
return total
end
define count_minimum_needed_swaps(k, A)
return min(count_swaps(k, A, EVENS), count_swaps(k, A, ODDS))
end
Notice the number of loop iterations implied by count_minimum_needed_swaps is 2 * k = n; it runs in O(n) time.
By noting which term is smaller in count_minimum_needed_swaps, we can also tell which of the two goal states is optimal.
Since you know N, you can simply write a loop that generates the values with no swaps needed.
#define N 4
char array[N + N];
for (size_t z = 0; z < N + N; z++)
{
array[z] = 'B' - ((z & 1) == 0);
}
return 0; // The number of swaps
#Nemo and #AlexD are right. The algorithm is order n^2. #Nemo misunderstood that we are looking for a reordering where two adjacent characters are not the same, so we can not use that if A is after B they are out of order.
Lets see the minimum number of swaps.
We dont care if our first character is A or B, because we can apply the same algorithm but using A instead of B and viceversa everywhere. So lets assume that the length of the word WORD_N is 2N, with N As and N Bs, starting with an A. (I am using length 2N to simplify the calculations).
What we will do is try to move the next B right to this A, without taking care of the positions of the other characters, because then we will have reduce the problem to reorder a new word WORD_{N-1}. Lets also assume that the next B is not just after A if the word has more that 2 characters, because then the first step is done and we reduce the problem to the next set of characters, WORD_{N-1}.
The next B should be as far as possible to be in the worst case, so it is after half of the word, so we need $N-1$ swaps to put this B after the A (maybe less than that). Then our word can be reduced to WORD_N = [A B WORD_{N-1}].
We se that we have to perform this algorithm as most N-1 times, because the last word (WORD_1) will be already ordered. Performing the algorithm N-1 times we have to make
N_swaps = (N-1)*N/2.
where N is half of the lenght of the initial word.
Lets see why we can apply the same algorithm for WORD_{N-1} also assuming that the first word is A. In this case it matters than the first word should be the same as in the already ordered pair. We can be sure that the first character in WORD_{N-1} is A because it was the character just next to the first character in our initial word, ant if it was B the first work can perform only a swap between these two words and or none and we will already have WORD_{N-1} starting with the same character than WORD_{N}, while the first two characters of WORD_{N} are different at the cost of almost 1 swap.
I think this answer is similar to the answer by phs, just in Haskell. The idea is that the resultant-indices for A's (or B's) are known so all we need to do is calculate how far each starting index has to move and sum the total.
Haskell code:
Prelude Data.List> let is = elemIndices 'B' "AAAABBBB"
in minimum
$ map (sum . zipWith ((abs .) . (-)) is) [[1,3..],[0,2..]]
6 --output

Resources