Minimum of sum of absolute values - algorithm

Problem statement:
There are 3 arrays A, B, C, all filled with positive integers, and all three arrays are the same size.
Find min(|a-b|+|b-c|+|c-a|) where a is in A, b is in B, c is in C.
I worked on the problem the whole weekend. A friend told me that it can be done in linear time. I don't see how that could be possible.
How would you do it?

Well, I think I can do it in O(n log n). I can only do O(n) if the arrays are initially sorted.
First, observe that you can permute a,b,c however you like without changing the value of the expression. So let x be the smallest of a,b,c; let y be the middle of the three; and let z be the maximum. Then note that the expression just equals 2*(z-x). (Edit: This is easy to see... Once you have the three numbers in order, x < y < z, the sum is just (y-x) + (z-y) + (z-x) which equals 2*(z-x))
Thus, all we are really trying to do is find three numbers such that the outer two are as close together as possible, with the other number "sandwiched" between them.
So start by sorting all three arrays in O(n log n). Maintain an index into each array; call these i, j, and k. Initialize all three to zero. Whichever index points to the smallest value, increment that index. That is, if A[i] is smaller than B[j] and C[k], increment i; if B[j] is smallest, increment j; if C[k] is smallest, increment k. Repeat, keeping track of |A[i]-B[j]| + |B[j]-C[k]| + |C[k]-A[i]| the whole time. The smallest value you observe during this march is your answer. (When the smallest of the three is at the end of its array, stop because you are done.)
At each step, you add one to exactly one index; but you can only do this n times for each array before hitting the end. So this is at most 3*n steps, which is O(n), which is less than O(n log n), meaning the total time is O(n log n). (Or just O(n) if you can assume the arrays are sorted.)
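Here is a minimal Python sketch of this march. It assumes all three arrays are non-empty. The name min_abs_sum is hypothetical, and ties for the smallest value are broken in favour of A, then B, then C, which is an arbitrary choice.

```python
def min_abs_sum(A, B, C):
    """Return min |a-b| + |b-c| + |c-a| over a in A, b in B, c in C.

    Sort each array, then repeatedly advance the index that points at
    the smallest of the three current values.
    """
    A, B, C = sorted(A), sorted(B), sorted(C)
    i = j = k = 0
    best = None
    while True:
        a, b, c = A[i], B[j], C[k]
        # For any three numbers this equals 2 * (max - min).
        total = abs(a - b) + abs(b - c) + abs(c - a)
        if best is None or total < best:
            best = total
        smallest = min(a, b, c)
        # Advance the index holding the smallest value; stop when that
        # index is already at the end of its array.
        if smallest == a:
            if i + 1 == len(A):
                break
            i += 1
        elif smallest == b:
            if j + 1 == len(B):
                break
            j += 1
        else:
            if k + 1 == len(C):
                break
            k += 1
    return best
```

For example, min_abs_sum([1, 10, 20], [5, 14, 30], [7, 13, 40]) returns 8, from a=10, b=14, c=13. The sorting dominates the cost. The loop itself runs at most 3*n times.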
Sketch of a proof that this works: Suppose A[I], B[J], C[K] are the a, b, c that form the actual answer; i.e., they have the minimum |a-b|+|b-c|+|c-a|. Suppose further that a > b > c; the proof for the other cases is symmetric.
Lemma: During our march, we do not increment j past J until after we increment k past K. Proof: We always increment the index of the smallest element, and when k <= K, B[J] > C[k]. So when j=J and k <= K, B[j] is not the smallest element, so we do not increment j.
Now suppose we increment k past K before i reaches I. What do things look like just before we perform that increment? Well, C[k] = C[K] = c is the smallest of the three at that moment, because we are about to increment k. A[i] is less than or equal to A[I], because i < I and A is sorted. Finally, j <= J because k <= K (by our Lemma), so B[j] <= B[J] < A[I]. Taken together, this means our sum-of-abs-diff at this moment is less than 2*(a-c), which is a contradiction.
Thus, we do not increment k past K until i reaches I. Therefore, at some point during our march i=I and k=K. By our Lemma, at this point j is less than or equal to J. So at this point, either B[j] is less than the other two and j will get incremented; or B[j] is between the other two and our sum is just 2*(A[i]-C[k]), which is the right answer.
This proof is sloppy; in particular, it fails to explicitly account for the case where one or more of a,b,c are equal. But I think that detail can be worked out pretty easily.

I would write a really simple program like this:
#!/usr/bin/env python3
import random
import sys
A = random.sample(range(100), 10)
B = random.sample(range(100), 10)
C = random.sample(range(100), 10)
minsum = sys.maxsize  # Python 3 has no sys.maxint
for a in A:
    for b in B:
        for c in C:
            print('checking with a=%d b=%d c=%d' % (a, b, c))
            abcsum = abs(a - b) + abs(b - c) + abs(c - a)
            if abcsum < minsum:
                print('found new low sum %d with a=%d b=%d c=%d' % (abcsum, a, b, c))
                minsum = abcsum
And test it over and over until I saw some pattern emerge. The pattern I found here is what would be expected: the numbers that are closest together in each set, regardless of whether the numbers are "high" or "low", are those that produce the smallest minimum sum. So it becomes a nearest-number problem. For whatever that's worth, probably not much.

Related

Adding two arrays into one

I have two arrays, a and b, each of size n and containing positive whole numbers:
a = [a1, ..., an], b = [b1, ..., bn]
I want to store them in an array c, also of size n:
c = [c1, ..., cn]
where each element of c is one element from a plus one element from b (each element used exactly once). Let's say the first element in c is a1+b3.
Quick example:
n=4 a=[a1,a2,a3,a4] b=[b1,b2,b3,b4]
one way could be:
c=[a1+b2,b3+a4,a2+b1,a3+b4]
The problem is that I want to add them in a way that makes the elements in c as evenly distributed as possible.
One ideal case would be that c came out as:
c=[5,5,5,5]
but the numbers in a and b might not match up so they become even, so I want it to come as close to even as possible.
I am trying to make the difference between the biggest number in c and the smallest number in c (after combining them as evenly as I can) as small as possible. In my optimal example above, that would be 5-5=0, which is optimal since 0 is the smallest possible difference. Some other case with other numbers might come out as 6-5=1, which might be the smallest I could get in that situation.
My approach would be to sort array a in ascending order and array b in descending order, and then combine the elements at the same index. I'm not sure if this is the best or the fastest way to do this, and I want my code (in Python) to be fast. I can't come up with a better way to distribute them more evenly. Any clue if there are better ways to solve this problem? I really appreciate any advice I can get! Thank you.
Besides sorting one array ascending and the other descending, there might already exist a better algorithm that I have not thought of. Thank you for reading!
Your algorithm is both correct and fast. The tricky part is proving that it is optimal.
We can do this by proving the following two results:
1. Any other matching of a and b will lead to a maximum at least as big as yours.
2. Any other matching of a and b will lead to a minimum at least as small as yours.
Together, these mean that any other matching has a maximum minus minimum at least as big as yours, so yours must be optimal.
Now let's look at part 1. Sort a ascending, and b descending. Find the i such that c[i] = a[i] + b[i] is a maximum. Suppose that m is any other matching where we're matching up a[j] + b[m[j]]. Note that m[1], ..., m[n] is a permutation of 1, ..., n.
If a[i] + b[m[i]] >= a[i] + b[i] then part 1 is true.
If a[i] + b[m[i]] < a[i] + b[i] then b[m[i]] < b[i] and so we must have i < m[i]. Now there are n-i numbers in the range i+1, ..., n. m maps something out of that range into that range. Because m is a permutation, by the pigeonhole principle, m must map something in that range, out of that range.
In other words there must be a j > i such that m[j] <= i. But now a[i] <= a[j] and b[i] <= b[m[j]] and therefore a[i] + b[i] <= a[j] + b[m[j]]. And so part 1 is true again.
That concludes the proof of part 1.
The proof of part 2 is similar. Except now a[i] + b[i] is at a minimum, m[i] < i, there is a j < i with i <= m[j], a[j] <= a[i], b[m[j]] <= b[i], and a[j] + b[m[j]] <= a[i] + b[i].
And as noted, part 1 and part 2 together implies that you've minimized the difference between the minimum and maximum.
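Below is a small Python sketch of the ascending/descending pairing, together with a brute-force check over all n! matchings for tiny inputs. The function names are hypothetical. The brute force only illustrates the claim above and is not meant for real use.

```python
from itertools import permutations

def pair_opposite_orders(a, b):
    """Pair a sorted ascending with b sorted descending; return the sums c."""
    return [x + y for x, y in zip(sorted(a), sorted(b, reverse=True))]

def spread(c):
    """Difference between the largest and smallest element of c."""
    return max(c) - min(c)

def best_spread_brute_force(a, b):
    """Smallest spread over every possible matching (n! of them)."""
    return min(spread([x + y for x, y in zip(a, p)]) for p in permutations(b))

c = pair_opposite_orders([3, 1, 4, 2], [2, 4, 1, 3])
print(c, spread(c))  # [5, 5, 5, 5] 0
```

The actual algorithm is just two sorts and one pass, so it runs in O(n log n).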

Minimize Sum of Absolute Difference of Two Arrays

I have two arrays of integers A and B both of size n. The cost of a pair is |A(i) - B(i)|.
I want to pair the n elements of A and B so that the sum of all costs across all A(i)s and B(i)s is minimized.
I understand that I can get O(n log n) by sorting A, then sorting B, and then pairing them together from 1...n respectively, but after attempting for hours and hours, I can't figure out how to prove it. Can somebody help me out?
I've seen how to implement it; I just don't get how to prove it.
I am following a slightly different approach here to prove this fact, by using squares rather than absolute values.
Consider 2 arrays, A = [a1, a2, ..., an] and B = [b1, b2, ..., bn].
Now, even if I use an arbitrary pairing (each ai is paired with b_p(i) for some permutation p of 1..n),
the sum of squares of differences is S = (a1 - b_p(1))^2 + ... + (an - b_p(n))^2 = a1^2 + b1^2 + a2^2 + b2^2 + ... + an^2 + bn^2 - 2 * (a1 * b_p(1) + a2 * b_p(2) + ... + an * b_p(n)). For example, one pairing might start with a1 * b3 + a2 * b4 + ...
The above sum can be written as S = sum(ai^2) + sum(bi^2) - 2 * sum(ai * b_p(i)), for i from 1 to n.
The first two sums do not depend on the pairing, so to minimise S we need to maximise sum(ai * b_p(i)).
The term sum(ai * b_p(i)) is maximized when the two arrays are paired in sorted order.
Thanks to @Abhinav Mathur for pointing this out: the statement "the term sum(ai * b_p(i)) is maximized when the two arrays are sorted" can be proved using the rearrangement inequality.
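The identity and the rearrangement claim can be checked numerically with a short Python sketch. The function name check_squares_argument is hypothetical. It enumerates every permutation p, so it only illustrates the argument on tiny inputs and does not prove it.

```python
from itertools import permutations

def check_squares_argument(a, b):
    """For every pairing a[i] <-> p[i], verify S = sum(a^2) + sum(b^2) - 2*cross,
    and that the sorted pairing has the largest cross term (hence smallest S)."""
    constant = sum(x * x for x in a) + sum(y * y for y in b)
    sorted_cross = sum(x * y for x, y in zip(sorted(a), sorted(b)))
    for p in permutations(b):
        s = sum((x - y) ** 2 for x, y in zip(a, p))
        cross = sum(x * y for x, y in zip(a, p))
        if s != constant - 2 * cross or cross > sorted_cross:
            return False
    return True

print(check_squares_argument([4, 1, 3], [2, 9, 5]))  # True
```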
Assume that in the current sorted pairing there is a pair with cost |x-a| and another pair with cost |y-b|. Suppose that swapping the partners would give a smaller sum, i.e. a better solution.
(Note: when swapping two pairs, the rest of the array remains unaffected.)
Current total sum of pairs = |x-a| + |y-b|
Modified sum after switching pairs = |x-b| + |y-a|
Difference in sums = diff = |x-b| + |y-a| - |x-a| - |y-b|
If diff is negative, we have found a better ordering. If not, our original solution is at least as good.
Now you can split this into cases and analyse each one. Since the arrays are sorted, let x < y (both from the first array) and a < b (both from the second array).
Case 1: x>b or y<a:
In this case, both sums will be equal, which can be easily seen by expanding the modulus
Case 2: a<x<b:
If y>b, diff = 2*(b-x). Since we assumed b>x, diff is positive.
If y<b, diff = 2*(y-x). Since y>x as stated earlier, diff is again positive.
You can continue taking similar cases and prove that diff will always be positive, meaning that our original ordering will be the most efficient one.
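Instead of finishing the case analysis by hand, a short Python loop can check every arrangement of small values. The helper name swap_diff is hypothetical. Trying all values 0..5 exercises every ordering of x, y, a, b, including ties. This is a sanity check, not a proof.

```python
def swap_diff(x, y, a, b):
    """Cost after swapping partners minus cost before: (x,a),(y,b) -> (x,b),(y,a)."""
    return (abs(x - b) + abs(y - a)) - (abs(x - a) + abs(y - b))

values = range(6)
all_non_negative = all(
    swap_diff(x, y, a, b) >= 0
    for x in values for y in values if x <= y
    for a in values for b in values if a <= b
)
print(all_non_negative)  # True
```

For instance, swap_diff(1, 5, 0, 3) is 4 = 2*(b-x) (Case 2 with y > b), and swap_diff(2, 3, 1, 5) is 2 = 2*(y-x) (Case 2 with y < b).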
Sorting and pairing creates a matching that we might call "monotonic", which ensures that if A[i] matches B[x] and A[j] matches B[y], then:
If A[i] < A[j] then B[x] <= B[y]; and
If B[x] < B[y] then A[i] <= A[j]
If you choose a matching that is not monotonic, then one of these rules will be violated for some pair of matchings.
If we pick any two elements from both arrays such that A[i] <= A[j] and B[x] <= B[y], then we can evaluate the cost of the monotonic pairing and the other pairing. Note that if A[i] = A[j] or B[x] = B[y], then both pairings have the same cost, so it doesn't matter which one we call monotonic.
In order to compare the costs, we need to get rid of the absolute value operations. We can do that by separately considering all the possible orderings between the 4 values:
Case: A[i] <= A[j] <= B[x] <= B[y]:
Monotonic cost: B[x]-A[i] + B[y]-A[j]
Swapped cost: B[y]-A[i] + B[x]-A[j]
Difference: 0
cost is the same - doesn't matter which we choose
Case: A[i] <= B[x] <= A[j] <= B[y]
Monotonic cost: B[x]-A[i] + B[y]-A[j]
other cost: B[y]-A[i] + A[j]-B[x]
Difference: 2A[j] - 2B[x]
since A[j] >= B[x], monotonic is as good or better
... etc
If you go through all 6 possible orderings, in every case you find that the monotonic matching is as good or better. Given any matching, you can make every pair of element matchings monotonic, and the cost can never go up.
If you start with an optimal matching and make every pair of matchings monotonic then you end up with an optimal monotonic matching. (In fact the one you start with has to be monotonic if it's optimal, but we don't have to prove that) Since every monotonic matching has the same cost, and at least one of them is optimal, they must all be optimal.
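The repair step described above can be sketched in Python: swap the partners of any two elements whose matching is not monotonic, and repeat until no violation remains. The helper names are hypothetical. The inner assert is only there to check the claim that a swap never increases the cost.

```python
import random

def matching_cost(A, B, match):
    """match[i] is the index of the B element paired with A[i]."""
    return sum(abs(A[i] - B[match[i]]) for i in range(len(A)))

def make_monotonic(A, B, match):
    """Swap partners of violating pairs until the matching is monotonic."""
    match = list(match)
    changed = True
    while changed:
        changed = False
        for i in range(len(A)):
            for j in range(len(A)):
                # Violation: A[i] < A[j] but A[i] has the larger partner.
                if A[i] < A[j] and B[match[i]] > B[match[j]]:
                    before = matching_cost(A, B, match)
                    match[i], match[j] = match[j], match[i]
                    assert matching_cost(A, B, match) <= before
                    changed = True
    return match

A = [4, 1, 7, 3]
B = [5, 2, 8, 3]
start = random.sample(range(len(B)), len(B))  # an arbitrary matching
final = make_monotonic(A, B, start)
print(matching_cost(A, B, start), ">=", matching_cost(A, B, final))
```

Each swap removes at least one inversion, so the loop terminates. Whatever the starting matching, the final cost here is 3, which is the cost of simply sorting both arrays and pairing them by position.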

How was the assumption made on half will have i < j?

I have been reading "Cracking the Coding Interview, 6th Edition". In Chapter 0 - Big O, I have trouble understanding an assumption made in Example 3.
void printUnorderedPairs(int[] array) {
    for (int i = 0; i < array.length; i++) {
        for (int j = i + 1; j < array.length; j++) {
            ...
        }
    }
}
In the "What It Means" section, it states that:
There are N^2 total pairs. Roughly half of those will have i < j and the remaining half will have i > j. This code goes through roughly n^2/2 pairs so it does O(N^2) work.
My question is: how was the assumption that "roughly half of those will have i < j and the remaining half will have i > j" made? Can someone explain it to me please?
Thanks!
There are several ways to think about this assumption; I quite like the "geometric" suggestion from @IanMercer in the comments. Here is another:
What is an unordered pair
An unordered pair is a pair of integers (i, j) where i and j are in the range 1 to N (they can take any value from 1 to N).
How many pairs are there?
i can be any value from 1 to N, and j can be any value from 1 to N. Any combination of i and j forms a valid pair, so there are N*N pairs.
Among all the pairs, how many have i < j?
Note that for any pair (a, b) where a is smaller than b, there exists a counterpart (b, a) (same values, flipped). So there are exactly as many pairs with i < j as pairs with i > j.
So where does the "roughly" come from? Among all those N*N pairs, there are some where neither i < j nor i > j, and those are precisely the N pairs where i == j.
The N*N pairs are thus divided into three groups: (those where i < j), (those where i > j) and (those where i == j). The first two groups each contain (N*N - N)/2 = N(N-1)/2 pairs, which is O(N^2), while the last group has only N pairs, so we can say that roughly half have the property i < j.
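A brute-force count in Python makes the three groups concrete (classify_pairs is a hypothetical name). For N = 10 it finds 45 pairs with i < j, 45 with i > j and 10 with i == j, i.e. N(N-1)/2, N(N-1)/2 and N. The inner loop of printUnorderedPairs visits exactly the first group.

```python
def classify_pairs(N):
    """Count ordered pairs (i, j), 1 <= i, j <= N, by how i compares to j."""
    less = greater = equal = 0
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            if i < j:
                less += 1
            elif i > j:
                greater += 1
            else:
                equal += 1
    return less, greater, equal

print(classify_pairs(10))  # (45, 45, 10)
```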

Finding best algorithm for sum of a section of an array's values

Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem, but none of them run in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
Good preparation is everything.
You could create an array of integrals (prefix sums):
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so each sum costs O(1).
Looping over all combinations of i and j with 1 <= i < j <= n takes O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2) .
Edit:
Your array A starts at index 1, so I adapted the indices to that; this also solves the little quirk with i-1.
So the integral array I starts at index 0 and is one element larger than A.
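A Python sketch of the prefix-sum idea follows. The name range_sums_prefix is hypothetical. Because Python lists are 0-based, A[k - 1] stands for A[k] in the 1-based notation above.

```python
def range_sums_prefix(A):
    """Return ((i, j), A[i] + ... + A[j]) for all 1 <= i < j <= n.

    A is a 0-based Python list, so A[k - 1] stands for A[k] above.
    """
    n = len(A)
    I = [0] * (n + 1)              # I[k] = A[1] + ... + A[k], with I[0] = 0
    for k in range(1, n + 1):      # O(n) preparation
        I[k] = I[k - 1] + A[k - 1]
    result = []
    for i in range(1, n + 1):      # O(n^2) pairs, O(1) work each
        for j in range(i + 1, n + 1):
            result.append(((i, j), I[j] - I[i - 1]))
    return result

for (i, j), s in range_sums_prefix([1, 2, 3, 4]):
    print(i, j, s)
```

For [1, 2, 3, 4] this prints the same six sums as the example in the question: 3, 6, 10, 5, 9, 7.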
You may first have thought of the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
    sum = 0
    for k = i to j
        sum = sum + A[k]
    return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n
    for j = i+1 to n
        output sumRange(A, i, j)
This is not O(n²): the two loops on i and j already represent O(n²) iterations, and the function performs yet another loop inside them, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum calculated for given values of i and j could be reused to calculate the sum when j has increased by 1, instead of starting from scratch, summing the values between i and (now) j-1 again, and only then adding one more value.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n
    sum = A[i]
    for j = i+1 to n
        sum = sum + A[j]
        output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and in your example you only showed sums of at least two array values. The above can easily be adapted to also include single-value "sums" if that were ever needed.
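The running-sum pseudocode translates directly to a Python generator. The name range_sums_running is hypothetical, and the 0-based list A again stands in for the 1-based array. Only i, j and the running total are kept, so the extra space is O(1).

```python
def range_sums_running(A):
    """Yield A[i] + ... + A[j] for all 1 <= i < j <= n with O(1) extra space.

    A is a 0-based Python list, so A[k - 1] stands for A[k] above.
    """
    n = len(A)
    for i in range(1, n + 1):
        total = A[i - 1]                  # the range starts as just A[i]
        for j in range(i + 1, n + 1):
            total += A[j - 1]             # extend the range to end at j
            yield total

print(list(range_sums_running([1, 2, 3, 4])))  # [3, 6, 10, 5, 9, 7]
```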

Efficient Way to Find Pair Orderings?

Let's say I have three arrays a, b, and c of equal length N. The elements of each of these arrays come from a totally ordered set, but are not sorted. I also have two index variables, i and j. For all i != j, I want to count the number of index pairs such that a[i] < a[j], b[i] > b[j] and c[i] < c[j]. Is there any way this can be done in less than O(N ^ 2) time complexity, for example by creative use of sorting algorithms?
Notes: The inspiration for this question is that, if you only have two arrays, a and b, you can find the number of index pairs such that a[i] < a[j] and b[i] > b[j] in O(N log N) with a merge sort. I'm basically looking for a generalization to three arrays.
For simplicity, you may assume that no two elements of any array are equal (no ties).
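The two-array case mentioned in the notes can be sketched in Python as follows. Function names are hypothetical, and no ties are assumed, as in the question. The idea is to order the b values by a, then count inversions with a merge sort in O(N log N).

```python
def sort_and_count(seq):
    """Return (sorted copy of seq, number of pairs p < q with seq[p] > seq[q])."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = sort_and_count(seq[:mid])
    right, inv_right = sort_and_count(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

def count_two_array_pairs(a, b):
    """Count (i, j) with a[i] < a[j] and b[i] > b[j], assuming no ties."""
    b_in_a_order = [y for _, y in sorted(zip(a, b))]
    return sort_and_count(b_in_a_order)[1]

print(count_two_array_pairs([1, 2, 3], [3, 1, 2]))  # 2
```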
By sorting the array a and rearranging the arrays b and c at the same time, we can suppose that a[i] < a[j] <=> i < j. So we need to find the number of pairs (i,j) such that i < j, b[i] > b[j] and c[i] < c[j]. Let's view (b[i], c[i]) as a point on a plane. We add the points one by one. Each time we add a point (b[j], c[j]), first we count the number of already added points (i < j) such that b[i] > b[j] and c[i] < c[j]. Then we add the point j and proceed to the next one. The sum of the numbers obtained at each step is our result.
Now it seems that these queries can be answered with a two-dimensional segment tree (http://en.wikipedia.org/wiki/Segment_tree). The cost of one iteration will be O(log^2 n), and the total complexity is O(n log^2 n).
(Note that I assume here that the elements of the arrays are numbers. That's OK, because by sorting we can always replace the elements of an array with numbers from 1 to n so that the order is preserved.)
Edit: In fact, a simpler structure called Fenwick tree or binary indexed tree is sufficient. See this link: http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees#2d
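Here is a Python sketch of this approach, under the question's assumption that there are no ties. It uses a 2D Fenwick tree stored in a dict. This sparse variant keeps memory around O(n log^2 n) instead of an n-by-n table. The class and function names are hypothetical.

```python
from collections import defaultdict

class Fenwick2D:
    """Sparse 2D Fenwick tree over coordinates 1..n x 1..n."""

    def __init__(self, n):
        self.n = n
        self.tree = defaultdict(int)

    def add(self, x, y, delta=1):
        i = x
        while i <= self.n:
            j = y
            while j <= self.n:
                self.tree[(i, j)] += delta
                j += j & -j
            i += i & -i

    def prefix(self, x, y):
        """Number of added points with X <= x and Y <= y."""
        total = 0
        i = x
        while i > 0:
            j = y
            while j > 0:
                total += self.tree.get((i, j), 0)
                j -= j & -j
            i -= i & -i
        return total

def ranks(values):
    """Replace each value by its rank 1..n (assumes no ties)."""
    order = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [order[v] for v in values]

def count_triple_pairs(a, b, c):
    """Count (i, j) with a[i] < a[j], b[i] > b[j] and c[i] < c[j]."""
    n = len(a)
    rb, rc = ranks(b), ranks(c)
    tree = Fenwick2D(n)
    result = 0
    # Add points in increasing order of a, so every earlier point has a smaller a.
    for j in sorted(range(n), key=lambda k: a[k]):
        below_c = rc[j] - 1
        # Earlier points with c < c[j], minus those that also have b <= b[j].
        result += tree.prefix(n, below_c) - tree.prefix(rb[j], below_c)
        tree.add(rb[j], rc[j])
    return result

print(count_triple_pairs([1, 2, 3], [3, 2, 1], [1, 2, 3]))  # 3
```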
