How was the assumption made on half will have i < j? - algorithm

I have been reading "Cracking the Coding Interview 6th Edition".. On Chapter 0 - Big O, I have problem understanding an assumption made to a problem on Example 3.
void printUnorderedPairs(int[] array){
for(int i = 0; i < array.length; i++){
for(int j = i + 1; j < array.length; j++){
...
}
}
}
Under What It Means section, it assumed that:
There are N^2 total pairs. Roughly half of those will have i < j and the remaining half will have i > j. This code goes through roughly n^2/2 pairs so it does O(N^2) work.
My question is, how was the assumption made on Roughly half of those will have i < j and the remaining half will have i > j done? Can someone explain it to me please?
Thanks!

There are several ways you can try to think about this assumption, I quite like the "geometric" suggestion from #IanMercer in the comments. Here is another:
What is an unordered pair
An unordered pair is a pair of integers (i,j) where i and j is in the domain (1, N). (They can take any value from 1 to N).
How many pairs are there?
i can be of any value from 1 to N, and j can be of any value from 1 to N. Any combination of i forms a valid pair. So there are are N*N pairs.
Among all the pairs, how many pairs are there that i < j
Note that for any pair (a,b) where a is smaller than b, there exists a counterpart (b,a) (same values but flipped). So there is an equal amount of pairs where i<j as there are pairs 'i>j'.
So what is this confusing roughly part? It is because of all those N*N pairs there are some where neither i<j nor j>i, and those are precisely the N pairs where i==j.
The N*N pairs are thus divided into three parts (those where i < j), (those where j> i) and (those where i==j). Since first two are much larger O(N**2)/2 vs. the last group which has only N elements, we can state that roughly half have the property that i<j.

Related

Finding best algorithm for sum of a section of an array's values

Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem but none have in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
A good preperation is everything.
You could create an array of integrals:
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so this has O(1)
Looping over all combinations of i and j with 1 <= (i,j) <= n has O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2) .
Edit:
Your array A starts at 1 - adapted to this - this also solves the little quirk with i-1
So the integral array I starts with index 0 and is 1 element larger than A
Edit:
First you'll maybe have thought about the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
sum = 0
for k = i to j
sum = sum + A[k]
return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n
for j = i+1 to n
output sumRange(A, i, j)
This is not O(n²), because already the two loops on i and j represent O(n²) iterations, and then the function will perform yet another loop, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum that was calculated for given values of i and j could be reused to calculate the sum for when j has increased with 1, without starting from scratch and summing the values between i and (now) j-1 again, only to add that one more value to it.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n
sum = A[i]
for j = i+1 to n
sum = sum + A[j]
output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and also in your example you only showed sums of at least two array values. The above can be easily adapted to also include single value "sums" if ever that were needed.

write a number as sum of a consecutive primes

How to check if n can be partitioned to sum of a sequence of consecutive prime numbers.
For example, 12 is equal to 5+7 which 5 and 7 are consecutive primes, but 20 is equal to 3+17 which 3 and 17 are not consecutive.
Note that, repetition is not allowed.
My idea is to find and list all primes below n, then use 2 loops to sum all primes. The first 2 numbers, second 2 numbers, third 2 numbers etc. and then first 3 numbers, second 3 numbers and so far. But it takes lot of time and memory.
Realize that a consecutive list of primes is defined only by two pieces of information, the starting and the ending prime number. You just have to find these two numbers.
I assume that you have all the primes at your disposal, sorted in the array called primes. Keep three variables in memory: sum which initially is 2 (the smallest prime), first_index and last_index which are initially 0 (index of the smallest prime in array primes).
Now you have to "tweak" these two indices, and "travel" the array along the way in the loop:
If sum == n then finish. You have found your sequence of primes.
If sum < n then enlarge the list by adding next available prime. Increment last_index by one, and then increment sum by the value of new prime, which is primes[last_index]. Repeat the loop. But if primes[last_index] is larger than n then there is no solution, and you must finish.
If sum > n then reduce the list by removing the smallest prime from the list. Decrement sum by that value, which is primes[first_index], and then increment first_index by one. Repeat the loop.
Dialecticus's algorithm is the classic O(m)-time, O(1)-space way to solve this type of problem (here I'll use m to represent the number of prime numbers less than n). It doesn't depend on any mysterious properties of prime numbers. (Interestingly, for the particular case of prime numbers, AlexAlvarez's algorithm is also linear time!) Dialecticus gives a clear and correct description, but seems at a loss to explain why it is correct, so I'll try to do this here. I really think it's valuable to take the time to understand this particular algorithm's proof of correctness: although I had to read a number of explanations before it finally "sank in", it was a real "Aha!" moment when it did! :) (Also, problems that can be efficiently solved in the same manner crop up quite a lot.)
The candidate solutions this algorithm tries can be represented as number ranges (i, j), where i and j are just the indexes of the first and last prime number in a list of prime numbers. The algorithm gets its efficiency by ruling out (that is, not considering) sets of number ranges in two different ways. To prove that it always gives the right answer, we need to show that it never rules out the only range with the right sum. To that end, it suffices to prove that it never rules out the first (leftmost) range with the right sum, which is what we'll do here.
The first rule it applies is that whenever we find a range (i, j) with sum(i, j) > n, we rule out all ranges (i, k) having k > j. It's easy to see why this is justified: the sum can only get bigger as we add more terms, and we have determined that it's already too big.
The second, trickier rule, crucial to the linear time complexity, is that whenever we advance the starting point of a range (i, j) from i to i+1, instead of "starting again" from (i+1, i+1), we start from (i+1, j) -- that is, we avoid considering (i+1, k) for all i+1 <= k < j. Why is it OK to do this? (To put the question the other way: Couldn't it be that doing this causes us to skip over some range with the right sum?)
[EDIT: The original version of the next paragraph glossed over a subtlety: we might have advanced the range end point to j on any previous step.]
To see that it never skips a valid range, we need to think about the range (i, j-1). For the algorithm to advance the starting point of the current range, so that it changes from (i, j) to (i+1, j), it must have been that sum(i, j) > n; and as we will see, to get to a program state in which the range (i, j) is being considered in the first place, it must have been that sum(i, j-1) < n. That second claim is subtle, because there are two different ways to arrive in such a program state: either we just incremented the end point, meaning that the previous range was (i, j-1) and this range was found to be too small (in which case our desired property sum(i, j-1) < n obviously holds); or we just incremented the start point after considering (i-1, j) and finding it to be too large (in which case it's not obvious that the property still holds).
What we do know, however, is that regardless of whether the end point was increased from j-1 to j on the previous step, it was definitely increased at some time before the current step -- so let's call the range that triggered this end point increase (k, j-1). Clearly sum(k, j-1) < n, since this was (by definition) the range that caused us to increase the end point from j-1 to j; and just as clearly k <= i, since we only process ranges in increasing order of their start points. Since i >= k, sum(i, j-1) is just the same as sum(k, j-1) but with zero or more terms removed from the left end, and all of these terms are positive, so it must be that sum(i, j-1) <= sum(k, j-1) < n.
So we have established that whenever we increase i to i+1, we know that sum(i, j-1) < n. To finish the analysis of this rule, what we (again) need to make use of is that dropping terms from either end of this sum can't make it any bigger. Removing the first term leaves us with sum(i+1, j-1) <= sum(i, j-1) < n. Starting from that sum and successively removing terms from the other end leaves us with sum(i+1, j-2), sum(i+1, j-3), ..., sum(i+1, i+1), all of which we know must be less than n -- that is, none of the ranges corresponding to these sums can be valid solutions. Therefore we can safely avoid considering them in the first place, and that's exactly what the algorithm does.
One final potential stumbling block is that it might seem that, since we are advancing two loop indexes, the time complexity should be O(m^2). But notice that every time through the loop body, we advance one of the indexes (i or j) by one, and we never move either of them backwards, so if we are still running after 2m loop iterations we must have i + j = 2m. Since neither index can ever exceed m, the only way for this to hold is if i = j = m, which means that we have reached the end: i.e. we are guaranteed to terminate after at most 2m iterations.
The fact that primes have to be consecutive allows to solve quite efficiently this problem in terms of n. Let me suppose that we have previously computed all the primes less or equal than n. Therefore, we can easily compute sum(i) as the sum of the first i primes.
Having this function precomputed, we can loop over the primes less or equal than n and see whether there exists a length such that starting with that prime we can sum up to n. But notice that for a fixed starting prime, the sequence of sums is monotone, so we can binary search over the length.
Thus, let k be the number of primes less or equal than n. Precomputing the sums has cost O(k) and the loop has cost O(klogk), dominating the cost. Using the Prime number theorem, we know that k = O(n/logn), and then the whole algorithm has cost O(n/logn log(n/logn)) = O(n).
Let me put a code in C++ to make it clearer, hope there are not bugs:
#include <iostream>
#include <vector>
using namespace std;
typedef long long ll;
int main() {
//Get the limit for the numbers
int MAX_N;
cin >> MAX_N;
//Compute the primes less or equal than MAX_N
vector<bool> is_prime(MAX_N + 1, true);
for (int i = 2; i*i <= MAX_N; ++i) {
if (is_prime[i]) {
for (int j = i*i; j <= MAX_N; j += i) is_prime[j] = false;
}
}
vector<int> prime;
for (int i = 2; i <= MAX_N; ++i) if (is_prime[i]) prime.push_back(i);
//Compute the prefixed sums
vector<ll> sum(prime.size() + 1, 0);
for (int i = 0; i < prime.size(); ++i) sum[i + 1] = sum[i] + prime[i];
//Get the number of queries
int n_queries;
cin >> n_queries;
for (int z = 1; z <= n_queries; ++z) {
int n;
cin >> n;
//Solve the query
bool found = false;
for (int i = 0; i < prime.size() and prime[i] <= n and not found; ++i) {
//Do binary search over the lenght of the sum:
//For all x < ini, [i, x] sums <= n
int ini = i, fin = int(prime.size()) - 1;
while (ini <= fin) {
int mid = (ini + fin)/2;
int value = sum[mid + 1] - sum[i];
if (value <= n) ini = mid + 1;
else fin = mid - 1;
}
//Check the candidate of the binary search
int candidate = ini - 1;
if (candidate >= i and sum[candidate + 1] - sum[i] == n) {
found = true;
cout << n << " =";
for (int j = i; j <= candidate; ++j) {
cout << " ";
if (j > i) cout << "+ ";
cout << prime[j];
}
cout << endl;
}
}
if (not found) cout << "No solution" << endl;
}
}
Sample input:
1000
5
12
20
28
17
29
Sample output:
12 = 5 + 7
No solution
28 = 2 + 3 + 5 + 7 + 11
17 = 2 + 3 + 5 + 7
29 = 29
I'd start by noting that for a pair of consecutive primes to sum to the number, one of the primes must be less than N/2, and the other prime must be greater than N/2. For them to be consecutive primes, they must be the primes closest to N/2, one smaller and the other larger.
If you're starting with a table of prime numbers, you basically do a binary search for N/2. Look at the primes immediately larger and smaller than that. Add those numbers together and see if they sum to your target number. If they don't, then it can't be the sum of two consecutive primes.
If you don't start with a table of primes, it works out pretty much the same way--you still start from N/2 and find the next larger prime (we'll call that prime1). Then you subtract N-prime1 to get a candidate for prime2. Check if that's prime, and if it is, search the range prime2...N/2 for other primes to see if there was a prime in between. If there's a prime in between your number is a sum of non-consecutive primes. If there's no other prime in that range, then it is a sum of consecutive primes.
The same basic idea applies for sequences of 3 or more primes, except that (of course) your search starts from N/3 (or whatever number of primes you want to sum to get to the number).
So, for three consecutive primes to sum to N, 2 of the three must be the first prime smaller than N/3 and the first prime larger than N/3. So, we start by finding those, then compute N-(prime1+prime2). That gives use our third candidate. We know these three numbers sum to N. We still need to prove that this third number is a prime. If it is prime, we need to verify that it's consecutive to the other two.
To give a concrete example, for 10 we'd start from 3.333. The next smaller prime is 3 and the next larger is 5. Those add to 8. 10-8 = 2. 2 is prime and consecutive to 3, so we've found the three consecutive primes that add to 10.
There are some other refinements you can make as well. The most obvious would be based on the fact that all primes (other than 2) are odd numbers. Therefore (assuming we can ignore 2), an even number can only be the sum of an even number of primes, and an odd number can only be a sum of an odd number of primes. So, given 123456789, we know immediately that it can't possibly be the sum of 2 (or 4, 6, 8, 10, ...) consecutive primes, so the only candidates to consider are 3, 5, 7, 9, ... primes. Of course, the opposite works as well: given, say, 12345678, the simple fact that it's even lets us immediately rule out the possibility that it could be the sum of 3, 5, 7 or 9 consecutive primes; we only need to consider sequences of 2, 4, 6, 8, ... primes. We violate this basic rule only when we get to a large enough number of primes that we could include 2 as part of the sequence.
I haven't worked through the math to figure out exactly how many that would be be for a given number, but I'm pretty sure it should be fairly easy and it's something we want to know anyway (because it's the upper limit on the number of consecutive primes to look for for a given number). If we use M for the number of primes, the limit should be approximately M <= sqrt(N), but that's definitely only an approximation.
I know that this question is a little old, but I cannot refrain from replying to the analysis made in the previous answers. Indeed, it has been emphasized that all the three proposed algorithms have a run-time that is essentially linear in n. But in fact, it is not difficult to produce an algorithm that runs at a strictly smaller power of n.
To see how, let us choose a parameter K between 1 and n and suppose that the primes we need are already tabulated (if they must be computed from scratch, see below). Then, here is what we are going to do, to search a representation of n as a sum of k consecutive primes:
First we search for k<K using the idea present in the answer of Jerry Coffin; that is, we search k primes located around n/k.
Then to explore the sums of k>=K primes we use the algorithm explained in the answer of Dialecticus; that is, we begin with a sum whose first element is 2, then we advance the first element one step at a time.
The first part, that concerns short sums of big primes, requires O(log n) operations to binary search one prime close to n/k and then O(k) operations to search for the other k primes (there are a few simple possible implementations). In total this makes a running time
R_1=O(K^2)+O(Klog n).
The second part, that is about long sums of small primes, requires us to consider sums of consecutive primes p_1<\dots<p_k where the first element is at most n/K.
Thus, it requires to visit at most n/K+K primes (one can actually save a log factor by a weak version of the prime number theorem). Since in the algorithm every prime is visited at most O(1) times, the running time is
R_2=O(n/K) + O(K).
Now, if log n < K < \sqrt n we have that the first part runs with O(K^2) operations and the second part runs in O(n/K). We optimize with the choice K=n^{1/3}, so that the overall running time is
R_1+R_2=O(n^{2/3}).
If the primes are not tabulated
If we also have to find the primes, here is how we do it.
First we use Erathostenes, that in C_2=O(T log log T) operations finds all the primes up to T, where T=O(n/K) is the upper bound for the small primes visited in the second part of the algorithm.
In order to perform the first part of the algorithm we need, for every k<K, to find O(k) primes located around n/k. The Riemann hypothesis implies that there are at least k primes in the interval [x,x+y] if y>c log x (k+\sqrt x) for some constant c>0. Therefore a priori we need to find the primes contained in an interval I_k centered at n/k with width |I_k|= O(k log n)+O(\sqrt {n/k} log n).
Using the sieve Eratosthenes to sieve the interval I_k requires O(|I_k|log log n) + O(\sqrt n) operations. If k<K<\sqrt n we get a time complexity C_1=O(\sqrt n log n log log n) for every k<K.
Summing up, the time complexity C_1+C_2+R_1+R_2 is maximized when
K = n^{1/4} / (log n \sqrt{log log n}).
With this choice have the sublinear time complexity
R_1+R_2+C_1+C_2 = O(n^{3/4}\sqrt{log log n}.
If we do not assume the Riemann Hypothesis we will have to search on larger intervals, but we still get at the end a sublinear time complexity. If instead we assume stronger conjectures on prime gaps, we may only need to search on intervals I_k with width |I_k|=k (log n)^A for some A>0. Then, instead of Erathostenes, we can use other deterministic primality tests. For example, suppose that you can test a single number for primality in O((log n)^B) operations, for some B>0.
Then you can search the interval I_k in O(k(log n)^{A+B}) operations. In this case the optimal K is still K\approx n^{1/3}, up to logarithmic factors, and so the total complexity is O(n^{2/3}(log n)^D for some D>0.

Find ascending triples in a list

I encountered the problem in a programming interview and have no idea about it by now.
A list whose length is n, the elements in it are all positive integers without order.
To find out all possible triples (a, b, c), that a < b < c, and a appears before b and b before c in the list.
And analyse the time complexity of your algorithm.
No general algorithm can be faster than O(n^3), as given a sorted input of distinct elements then the output will have size O(n^3), so just to produce the output will take time proportional. Indeed even a randomly generated list of integers will already have n^3 triples up until constant factors.
That given you could simply iterate over all possible triples in list order, and compare them for sorted order. This naive solution is already the best it can be asymptotically (that is O(n^3))
for (int i = 0; i < n; i++)
for (int j = i+1; j < n; j++)
for (int k = j+1; k < n; k++)
if (X[i] < X[j] && X[j] < X[k)
output(X[i],X[j],X[k])
I suspect you may have a transcription error in your problem statement - or else the question is supposed to be a very easy short coding exercise.
If it is known that there are only a small set of triples (say k), then you may prefer to find all the triples by storing pointers to the previous smallest element.
ALGORITHM
Prepare an empty data structure (possible choices described later).
Prepare an empty array B of length n.
Then for each element c in the list:
Store the index (in the array B) of the most recent element in the list that is smaller than c (if it exists) using the data structure.
Store c (and its index in the original list) in the data structure
Then use array B to find all elements b smaller than c, and then again to find all elements a smaller than b, and emit all these combinations as output triples.
DATA STRUCTURE
The data structure needs to be able to store value,position pairs to make it easy to find the largest position (i.e. most recent) over all elements with value less than c.
One easy way to do this if the range of allowed values is fairly small is to use a series of arrays where A[k][x] stores the maximum position for all elements in the range [x*2^k,(x+1)*2^k).
If the values have up to M bits (i.e. the values are in the range 0 to 2^M-1) then updating or accessing this data structure are both O(M) operations.
COMPLEXITY
The given method is O(nM+k).
If the values have a larger range, then you could use a form of binary search tree instead of the series of arrays, or instead sort the values and replace the values with their ordinal value. This would then have complexity O(nlogn+k).
COUNTING TRIPLES
If you just wish to know the total number of triples of this form then you can do this in O(n).
The idea is similar to before:
Find the most recent smaller element for each index, and the count of smaller elements for each index
Find the next greater element for each index, and the count of greater elements
Compute the sum of the product of the count of smaller elements and the count of larger elements for each index.
To make this O(n) we need to be able to find the next greater element in O(n). This can be done by:
Push the current index i to the stack
while A[top(stack)] < A[i+1], pop an index x off the stack and store NGE[x]=i+1
increment i and return to step 1
We also need to be able to find the count of greater elements in O(n). Once the NGE array has been prepared, we can find the counts by iterating backwards over the array and computing
count_greater_elements[i] = count_greater_elements[ NGE[i] ] + 1 if NGE[i] is defined
= 0 otherwise
The most recent smaller elements and counts can be computed in an analogous way.
N^2 solution for general case (to count all such triples, not output them all; output will take n^3 just because of its size):
for each number X in array lets count amount of numbers less than X with indexes less than x and amount of numbers greater than X with indexes greater than X. Than for each X we can get number of triples in which X is the middle element just as less[X] * greater[X]. The answer is sum of such products.
int calc(vector<int> numbers) {
int n = numbers.size();
vector<int> less(n), more(n);
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
if (numbers[i] < numbers[j])
less[j]++, more[i]++;
int res = 0;
for (int i = 0; i < n; i++)
res += less[i] * more[i];
return res;
}

Minimum of sum of absolute values

Problem statement:
There are 3 arrays A,B,C all filled with positive integers, and all the three arrays are of the same size.
Find min(|a-b|+|b-c|+|c-a|) where a is in A, b is in B, c is in C.
I worked on the problem the whole weekend. A friend told me that it can be done in linear time. I don't see how that could be possible.
How would you do it ?
Well, I think I can do it in O(n log n). I can only do O(n) if the arrays are initially sorted.
First, observe that you can permute a,b,c however you like without changing the value of the expression. So let x be the smallest of a,b,c; let y be the middle of the three; and let z be the maximum. Then note that the expression just equals 2*(z-x). (Edit: This is easy to see... Once you have the three numbers in order, x < y < z, the sum is just (y-x) + (z-y) + (z-x) which equals 2*(z-x))
Thus, all we are really trying to do is find three numbers such that the outer two are as close together as possible, with the other number "sandwiched" between them.
So start by sorting all three arrays in O(n log n). Maintain an index into each array; call these i, j, and k. Initialize all three to zero. Whichever index points to the smallest value, increment that index. That is, if A[i] is smaller than B[j] and C[k], increment i; if B[j] is smallest, increment j; if C[k] is smallest, increment k. Repeat, keeping track of |A[i]-B[j]| + |B[j]-C[k]| + |C[k]-A[i]| the whole time. The smallest value you observe during this march is your answer. (When the smallest of the three is at the end of its array, stop because you are done.)
At each step, you add one to exactly one index; but you can only do this n times for each array before hitting the end. So this is at most 3*n steps, which is O(n), which is less than O(n log n), meaning the total time is O(n log n). (Or just O(n) if you can assume the arrays are sorted.)
Sketch of a proof that this works: Suppose A[I], B[J], C[K] are the a, b, c that form the actual answer; i.e., they have the minimum |a-b|+|b-c|+|c-a|. Suppose further that a > b > c; the proof for the other cases is symmetric.
Lemma: During our march, we do not increment j past J until after we increment k past K. Proof: We always increment the index of the smallest element, and when k <= K, B[J] > C[k]. So when j=J and k <= K, B[j] is not the smallest element, so we do not increment j.
Now suppose we increment k past K before i reaches I. What do things look like just before we perform that increment? Well, C[k] is the smallest of the three at that moment, because we are about to increment k. A[i] is less than or equal to A[I], because i < I and A is sorted. Finally, j <= J because k <= K (by our Lemma), so B[j] is also less than A[I]. Taken together, this means our sum-of-abs-diff at this moment is less than 2*(c-a), which is a contradiction.
Thus, we do not increment k past K until i reaches I. Therefore, at some point during our march i=I and k=K. By our Lemma, at this point j is less than or equal to J. So at this point, either B[j] is less than the other two and j will get incremented; or B[j] is between the other two and our sum is just 2*(A[i]-C[k]), which is the right answer.
This proof is sloppy; in particular, it fails to explicitly account for the case where one or more of a,b,c are equal. But I think that detail can be worked out pretty easily.
I would write a really simple program like this:
#!/usr/bin/python
import sys, os, random
A = random.sample(range(100), 10)
B = random.sample(range(100), 10)
C = random.sample(range(100), 10)
minsum = sys.maxint
for a in A:
for b in B:
for c in C:
print 'checking with a=%d b=%d c=%d' % (a, b, c)
abcsum = abs(a - b) + abs(b - c) + abs(c - a)
if abcsum < minsum:
print 'found new low sum %d with a=%d b=%d c=%d' % (abcsum, a, b, c)
minsum = abcsum
And test it over and over until I saw some pattern emerge. The pattern I found here is what would be expected: the numbers that are closest together in each set, regardless of whether the numbers are "high" or "low", are those that produce the smallest minimum sum. So it becomes a nearest-number problem. For whatever that's worth, probably not much.

Efficient Way to Find Pair Orderings?

Let's say I have three arrays a, b, and c of equal length N. The elements of each of these arrays come from a totally ordered set, but are not sorted. I also have two index variables, i and j. For all i != j, I want to count the number of index pairs such that a[i] < a[j], b[i] > b[j] and c[i] < c[j]. Is there any way this can be done in less than O(N ^ 2) time complexity, for example by creative use of sorting algorithms?
Notes: The inspiration for this question is that, if you only have two arrays, a and b, you can find the number of index pairs such that a[i] < a[j] and b[i] > b[j] in O(N log N) with a merge sort. I'm basically looking for a generalization to three arrays.
For simplicity, you may assume that no two elements of any array are equal (no ties).
By sorting the array a and rearranging the arrays b and c at the same time, we can suppose that a[i] < a[j] <=> i < j. So we need to find the number of pairs (i,j) such that i < j, b[i] > b[j] and c[i] < c[j]. Let's view (b[i], c[i]) as a point on a plane. We add the points one by one. Each time we add a point (b[j], c[j]), first we count the number of already added points (i < j) such that b[i] > b[j] and c[i] < c[j]. Then we add the point j and proceed to the next one. The sum of the numbers obtained at each step is our result.
Now it seems that this kind of queries can be fulfilled by two-dimensional segment tree: http://en.wikipedia.org/wiki/Segment_tree The cost of one iteration will be O(log^2 n), and the total complexity is O(n log^2 n).
(Note that I assume here that the elements of arrays are numbers. It's OK, because using a sorting we can always replace the elements of an array with numbers from 1 to n so that the order was preserved.)
Edit: In fact, a simpler structure called Fenwick tree or binary indexed tree is sufficient. See this link: http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees#2d

Resources