Maximizing number of factors contributing in the sum of sorted array bounded by a value - algorithm

I have a sorted array of integers of size n. These values are not unique. What I need to do is:
given B, find an i < A[n] such that the sum of |A[j] - i| over the contributing A[j]s is less than B, and the largest possible number of A[j]s contribute to that sum. I have some ideas, but I can't find anything better than the naive O(n*B) and O(n^2) algorithms. Any ideas for O(n log n) or O(n)?
For example: Imagine
A[n] = 1 2 10 10 12 14 and B = 7 (the sum must stay below 7). Then the best i is 12, because it lets 4 A[j]s contribute to my sum. 10 and 11 are equally good choices of i: for i=10 I get (10-10) + (10-10) + (12-10) + (14-10) = 6 < 7.

A solution in O(n): start from the end and compute a[n]-a[n-1]:
let d=14-12 => d=2 and r=B-d => r=5,
then repeat the operation, but multiply the new difference d by the number of elements already lowered (2 at this step):
d=12-10 => d=2 and r=r-2*d => r=1,
r=1 is where the algorithm ends, because the sum must stay less than B.
With the array a indexed 0..n-1:
i=1
r=B
while(r>0 && n-i>1) {
d=a[n-i]-a[n-i-1];
r-=i*d;
i++;
}
return a[n-i+1];
Maybe a drawing explains it better:
14 x
13 x -> 2
12 xx
11 xx -> 2*2
10 xxxx -> 3*0
9 xxxx
8 xxxx
7 xxxx
6 xxxx
5 xxxx
4 xxxxx
3 xxxxx
2 xxxxxx
1 xxxxxxx
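A minimal Python translation of the loop above, as a sketch under the same assumption (i only lowers the largest elements to a common level; elements below i are ignored). The name `lowest_level` is hypothetical and it assumes `b > 0`; unlike the C-style snippet, it returns both the chosen level and how many elements contribute:

```python
def lowest_level(a, b):
    """a is sorted ascending and b > 0 (assumption). Walk down from the largest
    element and return (level, count): the lowest level a[k] such that lowering
    every element above it to a[k] costs less than b, and the number of
    elements a[k..n-1] that contribute."""
    n = len(a)
    k = n - 1
    cost = 0
    while k > 0:
        # Lowering the n - k top elements from a[k] down to a[k - 1] costs this much.
        step = (n - k) * (a[k] - a[k - 1])
        if cost + step >= b:
            break
        cost += step
        k -= 1
    return a[k], n - k


print(lowest_level([1, 2, 10, 10, 12, 14], 7))  # (10, 4)
```

On the example, the accumulated cost goes 2, 6, 6 while the level moves 14, 12, 10, 10; the next step down to 2 would add 4 * 8 = 32, so the walk stops there.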

I think you can do it in O(n) using these three tricks:
CUMULATIVE SUM
Precompute an array C[k] that stores sum(A[0:k]), where A[a:b] here includes both endpoints.
This can be done with the recurrence C[k]=C[k-1]+A[k] in time O(n).
The benefit of this array is that you can then compute sum(A[a:b]) via C[b]-C[a-1] (taking C[-1] = 0).
BEST MIDPOINT
Because your elements are sorted, it is easy to compute the best i to minimise the sum of absolute values. In fact, the best i will always be given by the middle entry (the median).
If the length of the list is even, then all values of i between the two central elements will always give the minimum absolute value.
e.g. for your list 10,10,12,14 the central elements are 10 and 12, so any value for i between 10 and 12 will minimise the sum.
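A quick brute-force check of the midpoint claim on that example window (the helper name `abs_deviation_sum` is only for illustration):

```python
def abs_deviation_sum(values, i):
    """Sum of |v - i| over all values."""
    return sum(abs(v - i) for v in values)


window = [10, 10, 12, 14]
sums = {i: abs_deviation_sum(window, i) for i in range(9, 14)}
print(sums)  # {9: 10, 10: 6, 11: 6, 12: 6, 13: 8}
```

Every i from 10 to 12 gives the minimum of 6, and moving outside that range increases the sum.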
ITERATIVE SEARCH
You can now scan over the elements a single time to find the best value.
1. Init s=0,e=0
2. if the score for A[s:e] is less than B increase e by 1
3. else increase s by 1
4. if e<n return to step 2
Keep track of the largest value for e-s seen which has a score < B and this is your answer.
This loop can go around at most 2n times so it is O(n).
The score for A[s:e] is given by sum |A[s:e]-A[(s+e)/2]|.
Let m=(s+e)/2.
score = sum |A[s:e]-A[(s+e)/2]|
= sum |A[s:e]-A[m]|
= sum (A[m]-A[s:m]) + sum (A[m+1:e]-A[m])
= (m-s+1)*A[m]-sum(A[s:m]) + sum(A[m+1:e])-(e-m)*A[m]
and we can compute the sums in this expression using the precomputed array C[k].
EDIT
If the endpoint must always be n, then you can use this alternative algorithm:
1. Init s=0, e=n-1 (the last index)
2. while the score for A[s:e] is not less than B, increase s by 1
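A minimal Python sketch of this fixed-endpoint variant, reusing the prefix-sum score from above. The name `longest_suffix_under` is hypothetical; it assumes `a` is sorted ascending and returns the length of the longest suffix whose score is below `b` (0 if there is none). A single forward scan of s is enough because dropping the smallest element of a window never increases its minimum sum of absolute deviations:

```python
from itertools import accumulate


def longest_suffix_under(a, b):
    """Length of the longest window a[s..n-1] whose median score is < b."""
    prefix = [0] + list(accumulate(a))  # prefix[k] = a[0] + ... + a[k-1]

    def window_sum(s, e):  # sum of a[s..e], both ends inclusive
        return prefix[e + 1] - prefix[s]

    def score(s, e):  # sum of |a[j] - a[m]| for s <= j <= e, m = middle index
        m = (s + e) // 2
        return ((m - s + 1) * a[m] - window_sum(s, m)
                + window_sum(m + 1, e) - (e - m) * a[m])

    e = len(a) - 1
    s = 0
    while s <= e and score(s, e) >= b:
        s += 1
    return e - s + 1


print(longest_suffix_under([1, 2, 10, 10, 12, 14], 7))  # 4
```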
PYTHON CODE
Here is a python implementation of the algorithm:
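def fast(A, B):
    # C[k] = A[0] + ... + A[k] (inclusive prefix sums)
    C = []
    t = 0
    for a in A:
        t += a
        C.append(t)

    def fastsum(s, e):
        # sum of A[s..e], both ends inclusive
        if s == 0:
            return C[e]
        else:
            return C[e] - C[s - 1]

    def fastscore(s, e):
        # sum of |A[j] - A[m]| for s <= j <= e, with m the middle index
        m = (s + e) // 2
        return (m - s + 1) * A[m] - fastsum(s, m) + fastsum(m + 1, e) - (e - m) * A[m]

    s = 0
    e = 0
    best = -1
    while e < len(A):
        if fastscore(s, e) < B:
            best = max(best, e - s + 1)
            e += 1
        elif s == e:
            e += 1
        else:
            s += 1
    return best

print(fast([1, 2, 10, 10, 12, 14], 7))
# this returns 4, as the 4 elements 10,10,12,14 can be chosen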

Try it this way, for an O(N) approach with N the size of the array:
minpos = position of closest value to B in array (binary search, O(log(N)))
min = array[minpos]
if (min >= B) EXIT, no solution
// now, we just add the smallest elements from the left or the right
// until we are greater than B
leftindex = minpos - 1
rightindex = minpos + 1
while we have a valid leftindex or valid rightindex:
add = min(abs(array[leftindex (if valid)]-B), abs(array[rightindex (if valid)]-B))
if (min + add >= B)
break
min += add
decrease leftindex or increase rightindex according to the usage
min is now our sum, rightindex the requested i (leftindex the start)
(Some indices may be off; this is just the idea, not the implementation.)
I would guess the average case for small B is O(log(N)). The linear case only happens if we can use the whole array.
I'm not sure, but perhaps this can also be done in O(log(N)*k), with N the size of the array and k < N. We would have to use the binary search cleverly to find leftindex and rightindex in every iteration, so that the range of possible results shrinks in every iteration. This could be easily done, but we have to take care of duplicates, because they could destroy our binary search reductions.

Related

Maximum Value taken by thief

Suppose we have N sacks of gold and a thief wants to steal as much gold as possible. The thief can take the gold under these rules:
1) Taking the Gold from contiguous sacks.
2) Thief should take the same amount of gold from all sacks.
N Sacks 1 <= N <= 1000
M quantity of Gold 0 <= M <= 100
Sample Input1:
3 0 5 4 4 4
Output:
16
Explanation:
4 is the minimum amount he can take from the sacks 3 to 6 to get the maximum value of 16.
Sample Input2:
2 4 3 2 1
Output:
8
Explanation:
2 is the minimum amount he can take from the sacks 1 to 4 to get the maximum value of 8.
I approached the problem by subtracting the values in the array and taking the transition point from negative to positive, but this doesn't solve the problem.
EDIT: code provided by OP to find the index:
int temp[6];
for(i=1;i<6;i++){
for(j=i-1; j>=0;j--) {
temp[j] = a[j] - a[i];
}
}
for(i=0;i<6;i++){
if(temp[i]>=0) {
index =i;
break;
}
}
The best amount of gold (TBAG) taken from every sack is equal to weight of some sack. Let's put indexes of candidates in a stack in order.
When we meet heavier weight (than stack contains), it definitely continues "good sequence", so we just add its index to the stack.
When we meet lighter weight (than stack top), it breaks some "good sequences" and we can remove heavier candidates from the stack - they will not have chance to be TBAG later. Remove stack top until lighter weight is met, calculate potentially stolen sum during this process.
Note that stack always contains indexes of strictly increasing sequence of weights, so we don't need to consider items before index at the stack top (intermediate AG) in calculation of stolen sum (they will be considered later with another AG value).
for idx in Range(Sacks):
while (not Stack.Empty) and (Sacks[Stack.Peek] >= Sacks[idx]): //smaller sack is met
AG = Sacks[Stack.Pop]
if Stack.Empty then
firstidx = 0
else
firstidx = Stack.Peek + 1
//range_length * smallest_weight_in_range
BestSUM = MaxValue(BestSUM, AG * (idx - firstidx))
Stack.Push(idx)
now check the rest:
repeat the while loop body for the entries still on the stack, with idx = len(Sacks) and without the >= condition
Every item is pushed and popped once, so linear time and space complexity.
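A minimal Python sketch of the stack idea, assuming non-negative weights (0 <= M, as in the constraints). Instead of repeating the while loop after the main loop, it uses a sentinel weight of -1 at position len(sacks), which pops every remaining index. The function name `max_stolen_gold` is hypothetical:

```python
def max_stolen_gold(sacks):
    """Max of (range length * minimum weight) over contiguous ranges of sacks."""
    n = len(sacks)
    best = 0
    stack = []  # indices whose weights strictly increase from bottom to top
    for idx in range(n + 1):
        # Sentinel: at idx == n use -1, which is lighter than every real sack.
        weight = sacks[idx] if idx < n else -1
        while stack and sacks[stack[-1]] >= weight:
            ag = sacks[stack.pop()]  # candidate amount taken from every sack
            first = stack[-1] + 1 if stack else 0
            # Every sack in first..idx-1 holds at least ag.
            best = max(best, ag * (idx - first))
        stack.append(idx)
    return best


print(max_stolen_gold([3, 0, 5, 4, 4, 4]))  # 16
print(max_stolen_gold([2, 4, 3, 2, 1]))     # 8
```

Each index is pushed and popped once, so this keeps the linear time and space bound stated above.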
P.S. I feel that I've seen this problem before in another formulation...
I see two different approaches for the moment:
Naive approach: For each pair of indices (i,j) in the array, compute the minimum value m(i,j) of the array in the interval (i,j) and then compute score(i,j) = |j-i+1|*m(i,j). Take then the maximum score over all the pairs (i,j).
-> Complexity of O(n^3).
Less naive approach:
Compute the set of values of the array
For each value, compute the maximum score it can get. For that, you just have to iterate once over all the values of the array. For example, when your sample input is [3 0 5 4 4 4] and the current value you are looking is 3, then it will give you a score of 12. (You'll first find a value of 3 thanks to the first index, and then a score of 12 due to indices from 2 to 5).
Take the maximum over all values found at step 2.
-> Complexity is here O(n*m), since you have to do at most m times the step 2, and the step 2 can be done in O(n).
Maybe there is a better complexity, but I don't have a clue yet.
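A minimal Python sketch of the "less naive" O(n*m) approach, where m is the number of distinct weights (at most 101 here, since 0 <= M <= 100). For each candidate value it tracks the current run of sacks holding at least that value. The function name is hypothetical:

```python
def max_stolen_gold_by_value(sacks):
    """For each distinct value v: best score is v * (longest run of sacks >= v)."""
    best = 0
    for value in set(sacks):
        run = 0  # length of the current run of sacks holding at least `value`
        for weight in sacks:
            run = run + 1 if weight >= value else 0
            best = max(best, value * run)
    return best


print(max_stolen_gold_by_value([3, 0, 5, 4, 4, 4]))  # 16
```

For value 3 on the first sample, the longest run is the last four sacks, giving the score of 12 mentioned above; value 4 gives the maximum of 16.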

Given an integer array (of size N) and a number M, find product of N-1 elements of the array modulo M

Let's say you are given an array A of N integers and another integer M. For any given index i where 0 <= i < N, hide the ith index of A and return the product of all other elements of A modulo M.
For example, say A = {1, 2, 3, 4, 5} and M=100 then for i=1, the result would be (1x3x4x5) mod 100. Hence the result is 60.
Assume that all integers are 32 bit unsigned integers.
Now an obvious approach to do this is to calculate the result for any given value of i. That would mean N-1 multiplications for every given value of i. Is there a more optimal way to do this?
P.S.
First idea would be to store the product of all numbers in A (let's call this total). Now for every given value of i, we can just divide total by A[i] and return the result after taking the modulo. However, the total would cause an overflow so this cannot be done.
Easy...:)
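// left[i] = (a[0] * ... * a[i]) % M, right[i] = (a[i] * ... * a[n-1]) % M
// left/right should be 64-bit unsigned (e.g. uint64_t) so a product of two values below 2^32 cannot overflow
left[0]=a[0]%M;
for(int i=1;i<=n-1;i++)
left[i]=(left[i-1]*a[i])%M;
right[n-1]=a[n-1]%M;
for(int i=n-2;i>=0;i--)
right[i]=(right[i+1]*a[i])%M;   // extend the suffix product that starts at i+1
for query q
if(q==0)
return right[1]%M;
if(q==n-1)
return left[n-2]%M;
return (left[q-1]*right[q+1])%M;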
Suppose there is an array of 5 elements (indexed from 1 in this example):
index:   1    2    3    4    5
value:   1    5    2   10    4
Now for query q=3
answer is = ((1*5) * (10*4))%M
for query q=4
answer is = ((1*5*2)*(4))%M
We are basically pre computing all the left and right multiplication
index:   1    2    3    4    5
value:   1    5    2   10    4
left:    1    5   10  100  400
right: 400  400   80   40    4
For q=3 answer is left[2]*right[4]= (5*40)%M= 200%M
For q=4 answer is left[3]*right[5]= (10*4)%M= 40%M
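A minimal Python sketch of the same prefix/suffix idea, 0-indexed and computing the answer for every i at once. The name `products_except_each` is hypothetical. Python integers do not overflow, so the 64-bit concern from the C version does not arise here, but every stored value is still reduced modulo m:

```python
def products_except_each(a, m):
    """Return a list whose i-th entry is the product of all a[j], j != i, mod m."""
    n = len(a)
    prefix = [1 % m] * (n + 1)  # prefix[i] = (a[0] * ... * a[i-1]) % m
    for i in range(n):
        prefix[i + 1] = prefix[i] * a[i] % m
    suffix = [1 % m] * (n + 1)  # suffix[i] = (a[i] * ... * a[n-1]) % m
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] * a[i] % m
    return [prefix[i] * suffix[i + 1] % m for i in range(n)]


print(products_except_each([1, 2, 3, 4, 5], 100))  # [20, 60, 40, 30, 24]
```

Index 1 gives 60, matching the (1x3x4x5) mod 100 example from the question; after the O(N) precomputation each query is O(1).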
For this answer, I'm assuming that this is not a ONE-TIME calculation, but it is something that can take place many times with different values of i.
First, define a non-volatile array to hold calculated products.
Then, whenever the function is invoked with a given pair of parameters (M and i):
Check in the array (of above) if the product was calculated,
If yes, simply use the stored value, calculate the MOD and return the result,
If not, calculate the product, store it, calculate the MOD and return the value.
This method spares you from having a (potentially long) initialization which might calculate products that would not be needed.

Fast algorithm to optimize a sequence of arithmetic expression

EDIT: clarified description of problem
Is there a fast algorithm for the following problem?
And is there one for the extended version of this problem,
in which the natural numbers are replaced by Z/(2^n Z)? (In my opinion, that version is too complex to add as another question in the same post.)
Problem:
For a given set of natural numbers like {7, 20, 17, 100}, the required algorithm
returns the shortest sequence of additions, multiplications and powers that computes
all of the given numbers.
Each item of the sequence is a (correct) equation that matches the following pattern:
<number> = <number> <op> <number>
where <number> is a natural number and <op> is one of {+, *, ^}.
In the sequence, each operand of <op> must be either
1, or
a number that has already appeared on the left-hand side of an earlier equation.
Example:
Input: {7, 20, 17, 100}
Output:
2 = 1 + 1
3 = 1 + 2
6 = 2 * 3
7 = 1 + 6
10 = 3 + 7
17 = 7 + 10
20 = 2 * 10
100 = 10 ^ 2
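To make the rules concrete, here is a small Python validator for a candidate answer. It checks that every line has the form `<number> = <number> <op> <number>`, that both operands are 1 or were already produced, and that the arithmetic holds, then returns the sequence length. `check_sequence` is a hypothetical helper for testing solvers, not a solver itself, and it reads `^` as exponentiation:

```python
import re

# "<number> = <number> <op> <number>" with op in {+, *, ^}
EQUATION = re.compile(r"\s*(\d+)\s*=\s*(\d+)\s*([+*^])\s*(\d+)\s*")


def check_sequence(targets, equations):
    """Validate a candidate answer and return its length (number of equations)."""
    known = {1}  # 1 is always available as an operand
    for line in equations:
        match = EQUATION.fullmatch(line)
        if match is None:
            raise ValueError(f"malformed equation: {line!r}")
        lhs, x, op, y = int(match[1]), int(match[2]), match[3], int(match[4])
        if x not in known or y not in known:
            raise ValueError(f"operand not computed yet: {line!r}")
        if op == "+":
            value = x + y
        elif op == "*":
            value = x * y
        else:
            value = x ** y
        if value != lhs:
            raise ValueError(f"equation is false: {line!r}")
        known.add(lhs)
    missing = set(targets) - known
    if missing:
        raise ValueError(f"targets never computed: {sorted(missing)}")
    return len(equations)


EXAMPLE = ["2 = 1 + 1", "3 = 1 + 2", "6 = 2 * 3", "7 = 1 + 6",
           "10 = 3 + 7", "17 = 7 + 10", "20 = 2 * 10", "100 = 10 ^ 2"]
print(check_sequence({7, 20, 17, 100}, EXAMPLE))  # 8
```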
I wrote a backtracking algorithm in Haskell.
It works for small inputs like the one above, but my real queries are
~30 randomly distributed numbers in [0,255].
For a real query, the following code takes 2 to 10 minutes on my PC.
My current (Pseudo)code:
-- generate set of sets required to compute n.
-- operator (+) on sets is set union.
requiredNumbers 0 = { {} }
requiredNumbers 1 = { {} }
requiredNumbers n =
{ {j, k} | j^k == n, j >= 2, k >= 2 }
+ { {j, k} | j*k == n, j >= 2, k >= 2 }
+ { {j, k} | j+k == n, j >= 1, k >= 1 }
-- remember the smallest set of "computed" number
bestSet := {i | 1 <= i <= largeNumber}
-- backtracking algorithm
-- from: input
-- to: accumulator of "already computed" number
closure from to =
if (from is empty)
if (|bestSet| > |to|)
bestSet := to
return
else if (|from| + |to| >= |bestSet|)
-- cut branch
return
else
m := min(from)
from' := deleteMin(from)
foreach (req in (requiredNumbers m))
closure (from' + (req - to)) (to + {m})
-- recoverEquation is a function converts set of number to set of equation.
-- it can be done easily.
output = recoverEquation (closure input {})
Additional Note:
Answers like
There isn't a fast algorithm, because...
There is a heuristic algorithm, it is...
are also welcome. I'm starting to feel that there is no fast and exact algorithm...
Answer #1 can be used as a heuristic, I think.
What if you worked backwards from the highest number in a sorted input, checking if/how to utilize the smaller numbers (and numbers that are being introduced) in its construction?
For example, although this may not guarantee the shortest sequence...
input: {7, 20, 17, 100}
(100) = (20) * 5 =>
(7) = 5 + 2 =>
(17) = 10 + (7) =>
(20) = 10 * 2 =>
10 = 5 * 2 =>
5 = 3 + 2 =>
3 = 2 + 1 =>
2 = 1 + 1
What I recommend is to transform it into some kind of graph shortest path algorithm.
For each number, you compute (and store) the shortest path of operations. Technically one step is enough: For each number you can store the operation and the two operands (left and right, because power operation is not commutative), and also the weight ("nodes")
Initially you register 1 with the weight of zero
Every time you register a new number, you have to generate all calculations with that number (all additions, multiplications, powers) with all already-registered numbers. ("edges")
Filter for the calculations: if the result of the calculation is already registered, you shouldn't store it, because there is an easier way to get to that number
Store only 1 operation for the commutative ones (1+2=2+1)
Prefilter the power operation because it may even cause overflow
You have to order this list by the shortest sum path (weight of the edge). Weight = (weight of operand1) + (weight of operand2) + (1, which is the weight of the operation)
You can exclude all resulting numbers which are greater than the maximum number that we still have to find (e.g. if we found 100 already, anything greater than 20 can be excluded) - this can be refined so that you can check the members of the operations also.
If you hit one of your target numbers, you have found the shortest way of calculating it, and you have to restart the generation:
Recalculate the maximum of the target numbers
Go back on the paths of the currently found number, set their weight to 0 (they will be given from now on, because their cost is already paid)
Recalculate the weight for the operations in the generation list, because the source operand weight may have been changed (this results reordering at the end) - here you can exclude those where either operand is greater than the new maximum
If all the numbers are hit, then the search is over
You can build your expression using the "backlinks" (operation, left and right operands) for each of your target numbers.
The main point is that we always keep our eye on the target function, which is that the total number of operation must be the minimum possible. In order to get this, we always calculate the shortest path to a certain number, then considering that number (and all the other numbers on the way) as given numbers, then extending our search to the remaining targets.
Theoretically, this algorithm processes (registers) each numbers only once. Applying the proper filters cuts the unnecessary branches, so nothing is calculated twice (except the weights of the in-queue elements)

Interview puzzle: Jump Game

Jump Game:
Given an array, start from the first element and reach the last by jumping. The jump length can be at most the value at the current position in the array. The optimum result is when you reach the goal in minimum number of jumps.
What is an algorithm for finding the optimum result?
An example: given array A = {2,3,1,1,4} the possible ways to reach the end (index list) are
0,2,3,4 (jump 2 to index 2, then jump 1 to index 3 then 1 to index 4)
0,1,4 (jump 1 to index 1, then jump 3 to index 4)
Since second solution has only 2 jumps it is the optimum result.
Overview
Given your array a and the index of your current position i, repeat the following until you reach the last element.
Consider all candidate "jump-to elements" in a[i+1] to a[a[i] + i]. For each such element at index e, calculate v = a[e] + e. If one of the elements is the last element, jump to the last element. Otherwise, jump to the element with the maximal v.
More simply put, of the elements within reach, look for the one that will get you furthest on the next jump. We know this selection, x, is the right one because compared to every other element y you can jump to, the elements reachable from y are a subset of the elements reachable from x (except for elements from a backward jump, which are obviously bad choices).
This algorithm runs in O(n) because each element need be considered only once (elements that would be considered a second time can be skipped).
Example
Consider the array of values a, the indices i, and the sums v of index and value.
i -> 0 1 2 3 4 5 6 7 8 9 10 11 12
a -> [4, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
v -> 4 12 3 4 5 6 7 8 9 10 11 12 13
Start at index 0 and consider the next 4 elements. Find the one with maximal v. That element is at index 1, so jump to 1. Now consider the next 11 elements. The goal is within reach, so jump to the goal.
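A minimal Python sketch of this greedy rule, assuming a non-empty array and forward jumps only. To keep it O(n), it only examines elements it has not examined before, since an already-examined element cannot beat the v of the element we just jumped to. The function name `min_jumps_greedy` is hypothetical:

```python
def min_jumps_greedy(a):
    """Indices visited on a shortest jump path from 0 to len(a)-1, or None if unreachable."""
    n = len(a)
    path = [0]
    i = 0        # current position
    scanned = 0  # highest index whose v = e + a[e] has been examined
    best = 0     # examined index with the largest v
    while i + a[i] < n - 1:  # the last element is not yet within reach
        reach = i + a[i]
        for e in range(scanned + 1, reach + 1):
            if e + a[e] > best + a[best]:
                best = e
        scanned = max(scanned, reach)
        if best + a[best] <= reach:
            return None  # nothing within reach gets any further: stuck
        i = best
        path.append(i)
    if n > 1:
        path.append(n - 1)
    return path


print(min_jumps_greedy([2, 3, 1, 1, 4]))         # [0, 1, 4]
print(min_jumps_greedy([4, 11] + [1] * 11))      # [0, 1, 12]
```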
Dynamic programming.
Imagine you have an array B where B[i] shows the minimum number of steps needed to reach index i in your array A. Your answer of course is in B[n], given A has n elements and indices start from 1. Assume C[i]=j means that you jumped from index j to index i (this is to recover the path taken later)
So, the algorithm is the following:
set B[i] to infinity for all i
B[1] = 0; <-- zero steps to reach B[1]
for i = 1 to n-1 <-- Each step updates possible jumps from A[i]
for j = 1 to A[i] <-- Possible jump sizes are 1, 2, ..., A[i]
if i+j > n <-- Array boundary check
break
if B[i+j] > B[i]+1 <-- If this path to B[i+j] was shorter than previous
B[i+j] = B[i]+1 <-- Keep the shortest path value
C[i+j] = i <-- Keep the path itself
The number of jumps needed is B[n]. The path that needs to be taken, read backwards from the end, is:
n <- C[n] <- C[C[n]] <- C[C[C[n]]] <- ... <- 1
which can be restored by a simple loop.
The algorithm is of O(min(k,n)*n) time complexity and O(n) space complexity. n is the number of elements in A and k is the maximum value inside the array.
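A runnable Python version of this DP, as a sketch: it uses 0-based indices instead of the 1-based ones above, and the names `steps` and `came_from` stand in for B and C. The function name is hypothetical:

```python
import math


def min_jumps_dp(a):
    """Return (number of jumps, path) to reach the last index, or None if unreachable."""
    n = len(a)
    steps = [math.inf] * n  # steps[i]: fewest jumps to reach index i (B above)
    came_from = [None] * n  # came_from[i]: index we jumped from to reach i (C above)
    steps[0] = 0
    for i in range(n - 1):
        if steps[i] == math.inf:
            continue  # index i itself is unreachable
        for j in range(1, a[i] + 1):  # possible jump sizes 1..a[i]
            if i + j >= n:
                break  # array boundary
            if steps[i + j] > steps[i] + 1:
                steps[i + j] = steps[i] + 1
                came_from[i + j] = i
    if steps[-1] == math.inf:
        return None
    # Follow the back-links from the last index to index 0, then reverse.
    path = [n - 1]
    while path[-1] != 0:
        path.append(came_from[path[-1]])
    return steps[-1], path[::-1]


print(min_jumps_dp([2, 3, 1, 1, 4]))  # (2, [0, 1, 4])
```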
Note
I am keeping this answer, but cheeken's greedy algorithm is correct and more efficient.
Construct a directed graph from the array. eg: i->j if |i-j|<=x[i] (Basically, if you can move from i to j in one hop have i->j as an edge in the graph). Now, find the shortest path from first node to last.
FWIW, you can use Dijkstra's algorithm to find the shortest route. Complexity is O( | E | + | V | log | V | ). Since | E | < n^2, this becomes O(n^2).
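We can track the farthest index reachable so far: while scanning, whenever an index i within reach gives a larger i + nums[i], we update the farthest index. Note that this only decides whether the last index can be reached at all, not the minimum number of jumps.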
Simple O(n) time complexity solution
public boolean canJump(int[] nums) {
int far = 0;
for(int i = 0; i<nums.length; i++){
if(i <= far){
far = Math.max(far, i+nums[i]);
}
else{
return false;
}
}
return true;
}
Start from the left (end) and traverse until a number is the same as its index; use the maximum of such numbers. For example, if the list is
list: 2738|4|6927
index: 0123|4|5678
Once you've got this, repeat the above step from this number until you reach the extreme right.
273846927
000001234
In case you don't find anything matching the index, use the digit with the farthest index and a value greater than its index, in this case 7 (because pretty soon the index will be greater than the number, you can probably just count for 9 indices).
basic idea:
start building the path from the end to the start by finding all array elements from which it is possible to make the last jump to the target element (all i such that A[i] >= target - i).
treat each such i as the new target and find a path to it (recursively).
choose the minimal length path found, append the target, return.
simple example in python:
ls1 = [2, 3, 1, 1, 4]
ls2 = [4, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# finds the shortest path in ls to the target index tgti
def find_path(ls, tgti):
    # if the target is the first element in the array, return its index.
    if tgti <= 0:
        return [0]
    # for each 0 <= i < tgti, if it is possible to reach
    # tgti from i (ls[i] >= tgti-i) then find the path to i
    sub_paths = [find_path(ls, i) for i in range(tgti - 1, -1, -1) if ls[i] >= tgti - i]
    # find the minimum length path in sub_paths
    min_res = sub_paths[0]
    for p in sub_paths:
        if len(p) < len(min_res):
            min_res = p
    # add current target to the chosen path
    min_res.append(tgti)
    return min_res

print(find_path(ls1, len(ls1) - 1))
print(find_path(ls2, len(ls2) - 1))
>>>[0, 1, 4]
>>>[0, 1, 12]

Getting the lowest possible sum from numbers' difference

I have to find the lowest possible sum from numbers' difference.
Let's say I have 4 numbers. 1515, 1520, 1500 and 1535. The lowest sum of difference is 30, because 1535 - 1520 = 15 && 1515 - 1500 = 15 and 15 + 15 = 30. If I would do like this: 1520 - 1515 = 5 && 1535 - 1500 = 35 it would be 40 in sum.
Hope you got it, if not, ask me.
Any ideas how to program this? I just found this online and tried to translate it from my language to English. It sounds interesting. I can't do brute force, because it would take ages to run. I don't need code, just ideas on how to program it or a little fragment of code.
Thanks.
Edit:
I didn't post everything... One more addition:
I have let's say 8 possible numbers. But I have to take only 6 of them to make the smallest sum. For instance, numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers from 8.
Taking the edit into account:
Start by sorting the list. Then use a dynamic programming solution, with state i, n representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: outer loop looping through i from 1 to N, inner loop looping through n from 0 to R (3 in your example case in your edit - this uses 3 pairs of numbers which means 6 individual numbers). Your recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).
You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.
The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem (it's a pretty standard DP problem), but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]
Proof
Let the list be x_1, …, x_N. Assume without loss of generality that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.
Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference x_j - x_i where j > i+1. Then we can replace x_j with x_{i+1} to get a difference that is no larger, since
x_i ≤ x_{i+1} ≤ x_j ⇒ x_{i+1} - x_i ≤ x_j - x_i.
(Needless to say, if x_{i+1} = x_j, then taking x_{i+1} is indistinguishable from taking x_j.) This proves the claim.
The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element, in which case it's the difference x_n - x_{n-1} plus the optimal solution using k-1 pairs from the first n-2.
The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)
Code
Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.
'''
The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys

def ints(): return [int(s) for s in sys.stdin.readline().split()]

N, K = ints()
num = sorted(ints())
best = {}  # best[(k,n)] = minimum sum using k pairs out of 0 to n

def b(k, n):
    if (k, n) in best: return best[(k, n)]
    if k == 0: return 0
    return float('inf')

for n in range(1, N):
    for k in range(1, K + 1):
        best[(k, n)] = min([b(k, n - 1),                            # Not using num[n]
                            b(k - 1, n - 2) + num[n] - num[n - 1]])  # Using num[n]
print(best[(K, N - 1)])
Test it:
Input
4 2
1515 1520 1500 1535
Output
30
Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48
I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.
In that case, the idea would be:
sort the numbers
emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
This works. Proof:
Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then
|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.
In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
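A minimal Python sketch of this "sort, then pair neighbours" rule for the case where all 2n numbers must be used. The function name `pair_all` is hypothetical; it raises an error for an odd count because the argument assumes 2n values:

```python
def pair_all(numbers):
    """Pair all 2n numbers: sort, pair (s[0], s[1]), (s[2], s[3]), ..."""
    s = sorted(numbers)
    if len(s) % 2:
        raise ValueError("need an even number of values")
    pairs = [(s[k], s[k + 1]) for k in range(0, len(s), 2)]
    return pairs, sum(y - x for x, y in pairs)


print(pair_all([1515, 1520, 1500, 1535]))  # ([(1500, 1515), (1520, 1535)], 30)
```

On the question's first example this gives the expected sum of 30.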
Order the list, then do the difference calculation.
You can solve the problem using dynamic programming.
Say you have a list L of N integers and you must form k pairs (with 2*k <= N).
Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) and call it smallest(list l).
Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l).
Let best(int i, list l) be the function that gives you the best result for i pairs within the list l.
The algorithm goes as follows:
best(1, L) = smallest(L)
best(2, L) = smallest2(L)
for i from 3 to k:
compute min(
stored_best(i-2) + smallest2(stored_remainder(i-2)),
stored_best(i-1) + smallest(stored_remainder(i-1))
) and store it as best(i)
store the remainder as well for the chosen solution
Now, the problem is that once you have chosen a pair, the two ints that form its boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guarantee that switching candidates is allowed.
(The switching work is done by smallest2.)
Step 1: Calculate pair differences
I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each
adjacent pair of numbers. These differences are the "candidate" differences contributing to the
minimal difference sum. Using the numbers from your example would lead to:
Number   Diff
======   ====
1561
           11
1572
            0
1572
           37
1609
           73
1682
           49
1731
            0
1731
          310
2041
Save the differences into an array or table or some other data structure where you can maintain the
differences and the two numbers that contributed to each difference. Call this the DiffTable. It
should look something like:
Index Diff Number1 Number2
===== ==== ======= =======
1 11 1561 1572
2 0 1572 1572
3 37 1572 1609
4 73 1609 1682
5 49 1682 1731
6 0 1731 1731
7 310 1731 2041
Step 2: Choose minimal Differences
If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered
indices: 1, 3, 5, 7. This is the correct answer. However,
the problem states that a subset of pairs are chosen and this complicates the problem quite a bit.
In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:
The sum of the differences is minimal
The numbers participating in any chosen difference are removed from the list.
The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are
removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance
of 1572 is left. Whenever a
Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of
numbers from a list containing eight numbers.
About the only method I can think of to minimize the sum of the Diff above is to generate and test.
The following pseudo code outlines a process to generate
all 'legal' sets of index values for a DiffTable of arbitrary size
where an arbitrary number of number pairs are chosen. One (or more) of the
generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.
/* Global Variables */
M = 7 /* Number of candidate pair differences in DiffTable */
N = 3 /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = [] /* Set of candidate index sets (set of sets) */
call GenIdxSet(1, []) /* Call generator with seed values */
/* AllSets now contains candidate index sets to perform min sum tests on */
end
procedure: GenIdxSet(i, IdxSet)
/* Generate all the valid index values for current level */
/* and subsequent levels until a complete index set is generated */
do while i <= M
if CountMembers(IdxSet) = N - 1 then /* Set is complete */
AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
else /* Add another index */
call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
i = i + 1
end
return
Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set
where the arguments are appended into a single ordered set. For example
AppendToSet([a, b, c], d) returns the set: [a, b, c, d].
For the given parameters, M = 7 and N = 3, AllSets becomes:
[[1 3 5]
[1 3 6] <= Diffs = (11 + 37 + 0) = 48
[1 3 7]
[1 4 6]
[1 4 7]
[1 5 7]
[2 4 6]
[2 4 7]
[2 5 7]
[3 5 7]]
Calculate the sums using each set of indices, the one that is minimum identifies the
required number pairs in DiffTable. Above I show that the second set of indices gives
the minimum you are looking for.
This is a simple brute force technique and it does not scale very well. If you had a list of
50 numbers (49 candidate differences) and wanted to choose 5 pairs, AllSets would contain 1,221,759 sets of
number pairs to test.
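The same generate-and-test idea can be sketched in Python with `itertools.combinations`, keeping only index sets whose members are at least 2 apart (so no number is used twice). Indices here are 0-based, so the answer's set [1 3 6] appears as (0, 2, 5). The function name is hypothetical, and like the original approach it is exponential, so it is only suitable for small inputs or for checking a faster solution:

```python
from itertools import combinations


def best_pairs_brute_force(numbers, pairs):
    """Return (minimum sum, chosen 0-based DiffTable indices) by testing every legal set."""
    s = sorted(numbers)
    diffs = [s[k + 1] - s[k] for k in range(len(s) - 1)]  # the DiffTable
    best_sum, best_set = None, None
    for idx in combinations(range(len(diffs)), pairs):
        # Adjacent differences share a number, so chosen indices must differ by >= 2.
        if any(b - a < 2 for a, b in zip(idx, idx[1:])):
            continue
        total = sum(diffs[k] for k in idx)
        if best_sum is None or total < best_sum:
            best_sum, best_set = total, idx
    return best_sum, best_set


data = [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
print(best_pairs_brute_force(data, 3))  # (48, (0, 2, 5))
```

The number of legal index sets for M differences and N pairs is C(M - N + 1, N): 10 for M = 7, N = 3, and C(45, 5) = 1,221,759 for 50 numbers and 5 pairs.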
I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.
--table 1 holds the values
declare @Table1 table (T1_Val int)
Insert @Table1
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535)
--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731)
--Select * from @Table1
--table 2 holds the sorted, numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val
--table 3 will hold the sorted pairs (each value joined to the one before it)
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val, T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id + 1
order by T2_1.T2_id --makes the identity values in @Table3 follow the sorted order
--select * from @Table3
--remove odd-numbered rows so that each value is used in only one pair
delete from @Table3 where T3_id % 2 > 0
--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val)
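For readers without SQL Server, here is a rough Python equivalent of the same set logic (a sketch, not a step-by-step translation): sort, pair the 1st value with the 2nd, the 3rd with the 4th and so on, then keep the k pairs with the smallest difference. The function name and the `k` parameter (standing in for `TOP 2`) are assumptions for illustration.

```python
def lowest_sorted_pairs(values, k):
    """Pair sorted values 1-2, 3-4, ... and keep the k pairs with the smallest gap."""
    a = sorted(values)
    # a[0::2] and a[1::2] line up as (a[0], a[1]), (a[2], a[3]), ...;
    # zip drops a leftover value when the count is odd, as the delete in the SQL does
    pairs = list(zip(a[0::2], a[1::2]))
    pairs.sort(key=lambda p: p[1] - p[0])  # like ORDER BY ABS(T21_Val - T22_val)
    return [(low, high, high - low) for low, high in pairs[:k]]  # like TOP k


print(lowest_sorted_pairs([1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731], 2))
# [(1561, 1572, 11), (1572, 1609, 37)]
```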
I think @marcog's approach can be simplified further.
Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.
@marcog's approach is effectively O(N^2) because R == N is a legitimate option. This approach should be (2*(N log N)) + N, i.e. O(N log N).
This requires a small data structure to hold each difference and the values it was derived from, but that is a constant amount per entry, so space is O(N).
I would go with marcog's answer; you can sort using any sorting algorithm. But there is one small thing to analyze now.
If you have to choose R numbers out of N numbers so that the sum of their differences is minimal, then the numbers must be chosen in sequence, without skipping any numbers in between.
Hence, after sorting the array, you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 to calculate the sum of differences.
If needed, try it out with some examples.
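A minimal sketch of that window scan, assuming "sum of differences" means the sum of the differences between neighbouring numbers inside each window of R sorted values; the function name is hypothetical.

```python
def best_window(values, r):
    """Slide a window of r consecutive sorted values; return (sum, window) with the smallest sum."""
    a = sorted(values)
    best = None
    for i in range(len(a) - r + 1):        # outer loop: i = 0 .. N-R
        total = 0
        for j in range(i, i + r - 1):      # inner loop: the R-1 neighbouring differences
            total += a[j + 1] - a[j]
        if best is None or total < best[0]:
            best = (total, a[i:i + r])
    return best


print(best_window([1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731], 6))
# (159, [1572, 1572, 1609, 1682, 1731, 1731])
```

Because the neighbouring differences telescope, each inner sum equals a[i + r - 1] - a[i], so the inner loop could be replaced by that single subtraction, making the scan O(N) after sorting.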
I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 from the difference between the numbers at indices 0 and 1, and 37 from the numbers at indices 2 and 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
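The sorted difference list above can be produced with a few lines of Python; this is a sketch, and `sorted_differences` is a hypothetical name. Each entry is a (difference, (i, j)) tuple, so Python's tuple ordering sorts by the difference first and breaks ties by the index pair.

```python
def sorted_differences(values):
    """Sort the values, then list (difference, (i, j)) for neighbours, smallest first."""
    a = sorted(values)
    diffs = [(a[i + 1] - a[i], (i, i + 1)) for i in range(len(a) - 1)]
    diffs.sort()  # tuples compare by difference first, then by index pair
    return diffs


print(sorted_differences([1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]))
```

For this input it returns exactly the list shown above. Taking its first three entries gives (0,(1,2)), (0,(5,6)) and (11,(0,1)); the first and third share index 1, which is the problem discussed next.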
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices, which starts empty, and a count n of numbers still left to select:
1. If n is 0, we're done.
2. If n is 1, and the first item will provide just 1 index which isn't in our set, we take the first item, and we're done.
3. If n is 2 or more, and the first item will provide 2 indices which aren't in our set, we take the first item and recurse (i.e. go to step 1), this time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
1. (as above)
2. (as above)
3. If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
4. If n is 2 or more, we have a choice. Either we can a) choose this item and recurse looking for n - (1 or 2) indices, or b) skip this item and recurse looking for n indices.
Step 4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both and see which one is better. How will we judge them?
We'll want to take whichever branch produces the lowest sum.
...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
findSmallestDifference = findSmallestDifference' Set.empty
findSmallestDifference' _ _ [] = []
findSmallestDifference' taken n (d:ds)
    | n == 0                          = []                                 -- Case 1
    | n == 1 && provides1 d           = [d]                                -- Case 2
    | n == 1 && provides2 d           = findSmallestDifference' taken n ds -- Case 3
    | provides0 d                     = findSmallestDifference' taken n ds -- Case 3a (see Edit)
    | validA && not validB            = branchA                            -- Case 4
    | validB && not validA            = branchB                            -- Case 4
    | validA && validB && sumA <= sumB = branchA                           -- Case 4
    | validA && validB && sumB <= sumA = branchB                           -- Case 4
    | otherwise                       = []                                 -- Case 4
    where branchA    = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
          branchB    = findSmallestDifference' taken n ds
          sumA       = sumDifferences branchA
          sumB       = sumDifferences branchB
          validA     = n == (indicesTaken branchA)
          validB     = n == (indicesTaken branchB)
          newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code (more or less), plus a small wrapper, produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worthwhile.
Edit: There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
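A Python sketch of the same search, for readers who don't read Haskell. It is a loose translation under stated assumptions, not the author's code: the Haskell helpers (`provides`, `indicesTaken`, `insertIndices`) are replaced by plain set arithmetic, `None` marks an invalid branch instead of an empty list, case 3 is generalised to skip any item that would add more indices than are still needed, and `least_diff` is a hypothetical wrapper that builds the sorted difference list.

```python
def find_smallest_difference(n, diffs, start=0, taken=frozenset()):
    """Pick entries from diffs[start:] (sorted (difference, (i, j)) tuples) that add
    exactly n new indices with the smallest total; return None if impossible."""
    if n == 0:                                # case 1: nothing left to select
        return []
    if start == len(diffs):                   # ran out of differences
        return None
    d = diffs[start]
    new = set(d[1]) - taken                   # indices this difference would add
    if not new or len(new) > n:               # case 3a / case 3: skip this item
        return find_smallest_difference(n, diffs, start + 1, taken)
    # case 4 (also covers case 2): a) take d, b) skip it, keep the cheaper valid branch
    tail = find_smallest_difference(n - len(new), diffs, start + 1, taken | new)
    branch_a = None if tail is None else [d] + tail
    branch_b = find_smallest_difference(n, diffs, start + 1, taken)
    if branch_a is None or branch_b is None:
        return branch_a if branch_b is None else branch_b
    sum_a = sum(x for x, _ in branch_a)
    sum_b = sum(x for x, _ in branch_b)
    return branch_a if sum_a <= sum_b else branch_b


def least_diff(n, values):
    """Hypothetical wrapper: sort, build the (difference, (i, j)) list, then search."""
    a = sorted(values)
    diffs = sorted((a[i + 1] - a[i], (i, i + 1)) for i in range(len(a) - 1))
    chosen = find_smallest_difference(n, diffs)
    return None if chosen is None else (sum(x for x, _ in chosen), chosen)


print(least_diff(4, [1515, 1520, 1500, 1535]))
# (30, [(15, (0, 1)), (15, (2, 3))])
```

`least_diff(6, [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731])` gives a total of 48, matching the output above. Like the Haskell, it explores both branches at case 4, so the running time can grow exponentially with the length of the difference list.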
Something like:
Sort the list
Find duplicates
Make the duplicates a pair
Remove the duplicates from the list
Break the rest of the list into pairs
Calculate the difference of each pair
Take the lowest amounts
In your example you have 8 numbers and need the best 3 pairs. First sort the list, which gives you:
1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041
If you have duplicates, make them a pair and remove them from the list, so you have:
[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }
Break the remaining list into pairs, giving you the following 4 pairs:
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359
Then drop the pairs you don't need.
This leaves the following 3 pairs with the lowest differences:
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
So the total is 0 + 0 + 48 = 48.
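A minimal Python sketch of these steps, using `collections.Counter` to find the duplicates. How to treat a value that appears three or more times isn't stated, so this sketch assumes copies are paired off two at a time and an odd leftover copy stays in the list; the function name is hypothetical.

```python
from collections import Counter


def duplicate_first_pairs(values, k):
    """Pair equal values first, pair the leftovers in sorted order, keep the k lowest."""
    counts = Counter(values)
    pairs, rest = [], []
    for v in sorted(counts):
        c = counts[v]
        pairs.extend([(v, v)] * (c // 2))  # each duplicate pair has difference 0
        if c % 2:
            rest.append(v)                 # an unpaired copy stays in the list
    # break the remaining (already sorted) list into consecutive pairs
    pairs.extend(zip(rest[0::2], rest[1::2]))
    pairs.sort(key=lambda p: p[1] - p[0])
    chosen = pairs[:k]
    return chosen, sum(b - a for a, b in chosen)


print(duplicate_first_pairs([1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731], 3))
# ([(1572, 1572), (1731, 1731), (1561, 1609)], 48)
```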
