Partitioning Sets in Groups - algorithm

I bumped into an algorithmic problem at work. I am a little rusty on partitioning problems, so I decided to ask here. Here is the problem:
Consider a finite set C of disjoint finite sets {s1, s2, ..., sN}. Each set sJ contains some proto-elements. The number of proto-elements in each set sJ is known (or can be found) in advance. Group the sets in C into disjoint groups {g1, g2, ..., gL} so that the number of proto-elements in each group is less than or equal to a given K and the number of groups is minimal.
Explanations: The members of a group gI are some sets sJ. The number of proto-elements in gI is simply the sum of the numbers of proto-elements in each sJ belonging to gI.
Example: {s1, s2, s3, s4}, s1 has 1 proto-element, s2 has 1 proto-element, s3 has 4 proto-elements, s4 has 2 proto-elements; K = 4
Partition 1: {g1, g2, g3}, where g1={s1, s2}, g2={s3}, g3={s4}. The total number of proto-elements in each group is less than or equal to 4.
Partition 2: {h1, h2}, where h1={s1, s2, s4}, h2={s3}. It also complies with the condition of a minimal number of groups.
For my practical purposes I could live with any grouping as long as each group has fewer than K proto-elements. I implemented a naïve algorithm:
1) Initialize the current group to an empty collection
   current_group={}
2) Initialize the running count of proto-elements in current_group to 0
   running_count=0
3) For each set S in C
   3.1) Let count be the number of proto-elements in S
   3.2) If count is 0 (S is empty), continue with the next iteration
   3.3) new_running_count = running_count + count
   3.4) if new_running_count < K {
            add the current set S to current_group
            running_count = new_running_count
        }
   3.5) else { /* new_running_count is greater than or equal to K */
            print current_group
            running_count = count
            current_group={}
            add the current set S to current_group
        }
4) After the loop ends print current_group if it is not empty
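The steps above can be sketched in Python roughly as follows. This is only an illustration of the naïve algorithm as described: the function name `naive_groups` and the representation of C as a list of proto-element counts are assumptions, and groups are returned as lists of indices instead of being printed. It keeps the strict `< K` test from step 3.4.

```python
def naive_groups(counts, k):
    """Group sets greedily in input order (hypothetical helper).

    counts[i] is the number of proto-elements in set i.
    Returns a list of groups, each a list of set indices.
    """
    groups = []
    current_group, running_count = [], 0
    for idx, count in enumerate(counts):
        if count == 0:          # step 3.2: skip empty sets
            continue
        new_running_count = running_count + count
        if new_running_count < k:   # step 3.4 (strict, as in the post)
            current_group.append(idx)
            running_count = new_running_count
        else:                        # step 3.5: close the group, start a new one
            if current_group:        # assumption: never emit an empty group
                groups.append(current_group)
            current_group, running_count = [idx], count
    if current_group:                # step 4
        groups.append(current_group)
    return groups


# The example from the question: s1..s4 with 1, 1, 4, 2 proto-elements, K = 4
print(naive_groups([1, 1, 4, 2], 4))
```

On the example this yields Partition 1 (three groups), not the minimal Partition 2.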
Obviously this algorithm does not generate the minimal number of groups. I would be very grateful for any hints on how to solve this. Pointers to books, articles, etc. are great; there is no need to write the code for me. I would search for the solution myself if the problem has a well-known name.
Regards
rambius

Related

Number of partitions with a given constraint

Consider a set of 13 Danish, 11 Japanese and 8 Polish people. It is well known that the number of different ways of dividing this set of people into groups is the 32nd Bell number (13+11+8=32), i.e. the number of set partitions. However, we are asked to find the number of possible set partitions under a given constraint. The question is as follows:
A set partition is said to be good if it has no group consisting of at least two people that only includes a single nationality. How many good partitions are there for this set? (A group may include only one person.)
The brute force approach requires going through about 10^26 partitions and checking which ones are good. This seems pretty infeasible, especially if the groups are larger or one introduces other nationalities. Is there a smart way instead?
EDIT: As a side note. There probably is no hope for a really nice solution. A highly esteemed expert in combinatorics answered a related question, which, I think, basically says that the related problem, and thus this problem also, is very difficult to solve exactly.
Here's a solution using dynamic programming.
It starts from an empty set, then adds one element at a time and calculates all the valid partitions.
The state space is huge, but to compute the next step we only need to know the following about a partition:
For each nationality, how many sets it contains that consist of only a single member of that nationality. (e.g.: {a})
How many sets it contains with mixed elements. (e.g.: {a, b, c})
For each of these configurations I only store the total count. Example:
[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed}
e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}
Here's the code in python:
    import collections
    from functools import reduce  # reduce is no longer a builtin in Python 3
    from operator import mul
    from fractions import Fraction

    def nCk(n,k):
        return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )

    def good_partitions(l):
        n = len(l)
        i = 0
        prev = collections.defaultdict(int)
        while l:
            #any more from this kind?
            if l[0] == 0:
                l.pop(0)
                i += 1
                continue
            l[0] -= 1
            curr = collections.defaultdict(int)
            for solution,total in prev.items():
                for idx,item in enumerate(solution):
                    my_solution = list(solution)
                    if idx == i:
                        # add element as a new set
                        my_solution[i] += 1
                        curr[tuple(my_solution)] += total
                    elif my_solution[idx]:
                        if idx != n:
                            # add to a set consisting of one element
                            # or merge into multiple sets that consist of one element
                            cnt = my_solution[idx]
                            c = cnt
                            while c > 0:
                                my_solution = list(solution)
                                my_solution[n] += 1
                                my_solution[idx] -= c
                                curr[tuple(my_solution)] += total * nCk(cnt, c)
                                c -= 1
                        else:
                            # add to a mixed set
                            cnt = my_solution[idx]
                            curr[tuple(my_solution)] += total * cnt
            if not prev:
                # one set with one element
                lone = [0] * (n+1)
                lone[i] = 1
                curr[tuple(lone)] = 1
            prev = curr
        return sum(prev.values())

    print(good_partitions([1, 1, 1, 1])) # 15
    print(good_partitions([1, 1, 1, 1, 1])) # 52
    print(good_partitions([2, 1])) # 4
    print(good_partitions([13, 11, 8])) # 29811734589499214658370837
It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
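A brute-force check of the kind mentioned here could look like the sketch below. It is not the author's test code; the names `set_partitions`, `is_good` and `count_good_bruteforce` are made up for illustration, and people are modelled as (nationality, index) tuples. It enumerates every set partition, so it is only usable for a handful of people.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # `first` forms a block of its own ...
        yield [[first]] + part
        # ... or joins one of the existing blocks
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def is_good(partition):
    """A block is fine if it is a singleton or mixes at least two nationalities."""
    return all(
        len(block) == 1 or len({nat for nat, _ in block}) > 1
        for block in partition
    )


def count_good_bruteforce(counts):
    """counts[k] = number of people of nationality k (hypothetical interface)."""
    people = [(nat, j) for nat, c in enumerate(counts) for j in range(c)]
    return sum(1 for p in set_partitions(people) if is_good(p))


print(count_good_bruteforce([1, 1, 1, 1]))  # 15
print(count_good_bruteforce([2, 1]))        # 4
```

Comparing these counts with the dynamic programming results on small inputs is a cheap way to gain confidence before trusting the large answer.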
An exact analytic solution is hard, but a polynomial time+space dynamic programming solution is straightforward.
First of all, we need a total order on the sizes of groups. We get one by comparing how many Danes, Japanese, and Poles each group contains.
Next, the function to write is this one.
m is the maximum group size we can emit
p is the number of people of each nationality that we have left to split
max_good_partitions_of_maximum_size(m, p) is the number of "good partitions"
we can form from p people, with no group being larger than m
Clearly you can write this as a somewhat complicated recursive function that always selects the next partition to use, then calls itself with that as the new maximum size and subtracts the partition from p. If you had this function, then your answer is simply max_good_partitions_of_maximum_size(p, p) with p = [13, 11, 8]. But that is going to be a brute force search that won't run in reasonable time.
Finally, apply memoization (https://en.wikipedia.org/wiki/Memoization) by caching every call to this function, and it will run in polynomial time. However, you will also have to cache a polynomial number of calls to it.

Dynamic programming algorithm for facility locations

There are n houses at locations a_1, a_2,..., a_n along a line. We want to set up porta potties along that same line so that every house is within distance R of at least one porta potty. These porta potties are restricted to the specified locations b_1, b_2,..., b_m. Let c_i be the cost of setting up a porta potty at location b_i.
Find a dynamic programming algorithm that minimizes the total cost of setting up the porta potties. The algorithm should be able to detect if a solution does not exist. Assume that all a and b values are distinct.
Inputs:
A[1, 2,...n] holds the house locations
B[1, 2,...m] holds the potential porta potty locations
C[1, 2,...m] holds the cost of setting up a porta potty at each
location
Output: the minimum cost of placing the porta potties under the constraint that every house must be within distance R of some porta potty
I'm having trouble figuring out a recursive expression to work off of. Any help would be appreciated!
Your question gave me the chance to write some code for a similar problem which often appears as cellphone tower placing problem or cellphone base coverage problem.
Pseudocode follows:
1) Sort houses in ascending order
2) Sort facilities positions and their costs in ascending order by facilities positions
3) Let dp(i) be the minimum cost to cover i houses and lastBase(j) the last base used to cover j houses
4) Set the base case dp(0) = 0 and lastBase(0) = -1
5) For each house i:
6)     Check if previous solution is either valid or in range of this new house
7)         if it is -> grab it as solution
8)         else
9)             find a new base starting from lastBase(i) + 1 which can cover this house
10)            let it be the minimum-cost one
11)            if a base could be found -> add it to the previous solution
12)            else -> Problem cannot be solved
I recommend trying it out yourself first.
For completeness' sake: explanation, images and C++ code are available here.
Feedback or errors are welcome.
I am going to give you an idea of how to proceed, how you code it is up to you.
Given A,B,C (also assumption is that all elements in A and B are on the number line) -
-> Sort A in ascending order.
-> Sort B and C together(as they are dependent) based on B's values in ascending order.
-> Maintain a temp array(size n) which keeps track of which "porta potty"
an A element belongs to,by mapping to the "porta potty" index.
-> Now take each element from B and move both forward and backward R steps from that
point on the number line.
-> If any A element is found in those R steps(on the number line)
AND if(and only if) it does not presently belong to any "porta potty" OR
the cost of setting up the current "porta potty" element is more than the "porta potty"
it(A element) already belongs to, then only shall you set the value in temp array
for that A element to the current "porta potty" index number.
-> Now once we are done with all B points, do the following
-> Traverse the temp array and push the "porta potty" index numbers we have into a set
-> You now have a list of all the "porta potty" indices which are the cheapest
yet crucial to be placed.
Think this out and let me know if something is unclear to you. Also the sorting part is only to improve performance.
This one is for the cellphone tower placing problem. Yours should be similar, I guess.
There ought to be a recursion for this. Here is a commented example in Python, which assumes sorted input:
    a = [1, 7,11,13,15]
    b = [1,8,9,12,13]
    c = [1,3,2, 2, 5]
    r = 3
    na = len(a)
    nb = len(b)

    def f (ia,ib,prev_ib,s):
        # base case no suitable potties
        if ib == nb:
            return 1000 # a number larger than sum of all costs
        # base case end of row of houses
        if ia == na:
            return s
        # house is in range of last potty
        if prev_ib >= 0 and abs(a[ia] - b[prev_ib]) < r:
            return f(ia + 1,ib,prev_ib,s)
        # house is too far
        if abs(a[ia] - b[ib]) >= r:
            # house is west of potty
            if a[ia] < b[ib]:
                return 1000
            # house is east of potty
            else:
                return f(ia,ib + 1,prev_ib,s)
        # house is in range of current potty
        else:
            # choose or skip
            return min(f(ia + 1,ib + 1 if ib < nb - 1 else ib,ib,s + c[ib]),f(ia,ib + 1,prev_ib,s))
Output:
    print(f(0,0,-1,0)) # 8
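For comparison, here is a bottom-up sketch of one possible recurrence. It is a different formulation from the recursion above, not a cleaned-up version of it, and the function name `min_cost` is made up. With the houses sorted, let dp[i] be the cheapest way to cover the first i houses. The last of those houses must be covered by some location j within distance R, and that location also covers every house back to the first one at position >= b_j - R, so dp[i] = min over such j of dp[lo_j] + c_j. Note that it treats "within distance R" as inclusive (<=), whereas the recursion above uses a strict `< r`; with integer coordinates, r = 2 here corresponds to r = 3 there.

```python
import bisect


def min_cost(houses, spots, costs, r):
    """Minimum total cost so that every house is within distance r
    (inclusive) of a chosen spot; returns None if that is impossible."""
    a = sorted(houses)
    n = len(a)
    inf = float("inf")
    dp = [0] + [inf] * n            # dp[i]: cheapest cover of the first i houses
    for i in range(1, n + 1):
        house = a[i - 1]
        for pos, cost in zip(spots, costs):
            if abs(house - pos) <= r:
                # index of the first house this spot can reach
                lo = bisect.bisect_left(a, pos - r)
                dp[i] = min(dp[i], dp[lo] + cost)
    return None if dp[n] == inf else dp[n]


a = [1, 7, 11, 13, 15]
b = [1, 8, 9, 12, 13]
c = [1, 3, 2, 2, 5]
print(min_cost(a, b, c, 2))           # 8, same instance as above
print(min_cost([1, 100], [1], [1], 2))  # None: house 100 cannot be covered
```

This runs in O(n * m * log n) time as written; the inner scan over all spots could be narrowed with a sliding window, which is left out to keep the sketch short.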

Discrete optimization algorithm

I'm trying to decide on the best approach for my problem, which is as follows:
I have a set of objects (about 3k-5k) which I want to uniquely assign to about 10 groups (1 group per object).
Each object has a set of grades corresponding with how well it fits within each group.
Each group has a capacity of objects it can manage (the constraints).
My goal is to maximize the sum of grades my assignments receive.
For example, let's say I have 3 objects (o1, o2, o3) and 2 groups (g1,g2) with a cap. of 1 object each.
Now assume the grades are:
o1: g1=11, g2=8
o2: g1=10, g2=5
o3: g1=5, g2=6
In that case, for the optimal result g1 should receive o2, and g2 should receive o1, yielding a total of 10+8=18 points.
Note that the number of objects can either exceed the sum of quotas (e.g. leaving o3 as a "leftover") or fall short from filling the quotas.
How should I address this problem (Traveling Salesman, sort of a weighted Knap-Sack etc.)? How long should brute-forcing it take on a regular computer? Are there any standard tools such as the linprog function in Matlab that support this sort of problem?
It can be solved with a min-cost flow algorithm.
The graph can look the following way:
It should be bipartite. The left part represents objects (one vertex for each object). The right part represents groups (one vertex for each group). There is an edge from each vertex of the left part to each vertex of the right part with capacity = 1 and cost = -grade for this pair. There is also an edge from the source vertex to each vertex of the left part with capacity = 1 and cost = 0, and there is an edge from each vertex of the right part to the sink vertex (sink and source are two additional vertices) with capacity = the constraint for this group and cost = 0.
The answer is the negative of the cheapest flow cost from the source to the sink.
It is possible to implement it with O(N^2 * M * log(N + M)) time complexity (using Dijkstra's algorithm with potentials), where N is the number of objects and M is the number of groups.
This can be solved with an integer program. Binary variables x_{ij} state whether object i is assigned to group j. The objective maximizes \sum_{i,j} s_{ij}x_{ij}, where s_{ij} is the score associated with assigning i to j. You have two types of constraints:
\sum_i x_{ij} <= c_j for all j, the capacity constraints for groups
\sum_j x_{ij} <= 1 for all i, limiting objects to be assigned to at most one group
Here's how you would implement it in R; the lp function from the lpSolve package is quite similar to the linprog function in MATLAB.
    library(lpSolve)

    # Score matrix
    S <- matrix(c(11, 10, 5, 8, 5, 6), nrow=3)
    # Capacity vector
    cvec <- c(1, 1)
    # Helper function to construct constraint matrices
    unit.vec <- function(pos, n) {
      ret <- rep(0, n)
      ret[pos] <- 1
      ret
    }
    # Capacity constraints
    cap <- t(sapply(1:ncol(S), function(j) rep(unit.vec(j, ncol(S)), nrow(S))))
    # Object assignment constraints
    obj <- t(sapply(1:nrow(S), function(i) rep(unit.vec(i, nrow(S)), each=ncol(S))))
    # Solve the LP
    res <- lp(direction="max",
              objective.in=as.vector(t(S)),
              const.mat=rbind(cap, obj),
              const.dir="<=",
              const.rhs=c(cvec, rep(1, nrow(S))),
              all.bin=TRUE)
    # Grab assignments and objective
    sln <- t(matrix(res$solution, nrow=ncol(S)))
    apply(sln, 1, function(x) ifelse(sum(x) > 0.999, which(x == 1), NA))
    # [1] 2 1 NA
    res$objval
    # [1] 18
Although this is modeled with binary variables, it will solve quite efficiently assuming integral capacities.
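As a sanity check on the formulation (not as a practical method), the two constraint families can be enforced by exhaustive search on a tiny instance. The sketch below is illustrative only: `best_assignment` is a made-up name, and the search is exponential in the number of objects, so it is hopeless for 3k-5k objects and useful only for validating a real solver on toy inputs.

```python
from itertools import product


def best_assignment(scores, capacity):
    """scores[i][j]: grade of object i in group j; capacity[j]: group limit.

    Each object is assigned to at most one group (None = left over).
    Returns (best total, tuple of chosen group per object).
    """
    n_groups = len(capacity)
    best_total, best_choice = None, None
    for choice in product([None, *range(n_groups)], repeat=len(scores)):
        load = [0] * n_groups
        for g in choice:
            if g is not None:
                load[g] += 1
        # capacity constraints: sum_i x_ij <= c_j
        if any(load[j] > capacity[j] for j in range(n_groups)):
            continue
        total = sum(scores[i][g] for i, g in enumerate(choice) if g is not None)
        if best_total is None or total > best_total:
            best_total, best_choice = total, choice
    return best_total, best_choice


scores = [[11, 8], [10, 5], [5, 6]]   # o1, o2, o3 against g1, g2
print(best_assignment(scores, [1, 1]))  # (18, (1, 0, None))
```

The result matches the example in the question: o1 goes to g2, o2 goes to g1, and o3 is left over.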

Getting the lowest possible sum from numbers' difference

I have to find the lowest possible sum from numbers' difference.
Let's say I have 4 numbers. 1515, 1520, 1500 and 1535. The lowest sum of difference is 30, because 1535 - 1520 = 15 && 1515 - 1500 = 15 and 15 + 15 = 30. If I would do like this: 1520 - 1515 = 5 && 1535 - 1500 = 35 it would be 40 in sum.
Hope you got it, if not, ask me.
Any ideas how to program this? I just found this online, tried to translate from my language to English. It sounds interesting. I can't do bruteforce, because it would take ages to compile. I don't need code, just ideas how to program or little fragment of code.
Thanks.
Edit:
I didn't post everything... One more edition:
I have let's say 8 possible numbers. But I have to take only 6 of them to make the smallest sum. For instance, numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers from 8.
Taking the edit into account:
Start by sorting the list. Then use a dynamic programming solution, with state i, n representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: outer loop looping through i from 1 to N, inner loop looping through n from 0 to R (3 in your example case in your edit - this uses 3 pairs of numbers which means 6 individual numbers). Your recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).
You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.
The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem (it's a pretty standard DP problem), but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]
Proof
Let the list be x_1, …, x_N. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.
Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference x_j - x_i where j > i+1. Then, we can replace x_j with x_{i+1} to get a smaller difference, since
x_i ≤ x_{i+1} ≤ x_j ⇒ x_{i+1} - x_i ≤ x_j - x_i.
(Needless to say, if x_{i+1} = x_j, then taking x_{i+1} is indistinguishable from taking x_j.) This proves the claim.
The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element, in which case it's the difference x_n - x_{n-1} plus the optimal solution using k-1 pairs from the first n-2.
The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)
Code
Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.
    '''
    The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
    '''
    import sys

    def ints(): return [int(s) for s in sys.stdin.readline().split()]

    N, K = ints()
    num = sorted(ints())

    best = {} #best[(k,n)] = minimum sum using k pairs out of 0 to n

    def b(k,n):
        # look up a stored value, falling back to the boundary cases
        if (k,n) in best: return best[(k,n)]
        if k==0: return 0
        return float('inf')

    for n in range(1,N):
        for k in range(1,K+1):
            best[(k,n)] = min([b(k,n-1),                       #Not using num[n]
                               b(k-1,n-2) + num[n]-num[n-1]])  #Using num[n]

    print(best[(K,N-1)])
Test it:
Input
4 2
1515 1520 1500 1535
Output
30
Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48
I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.
In that case, the idea would be:
sort the numbers
emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
This works. Proof:
Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then
|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.
In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.
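A minimal sketch of this sort-and-pair-neighbours idea, for the case where all 2n numbers must be used (the function name `pair_adjacent` is just for illustration):

```python
def pair_adjacent(numbers):
    """Pair up sorted neighbours; assumes an even count of numbers."""
    s = sorted(numbers)
    pairs = [(s[i], s[i + 1]) for i in range(0, len(s) - 1, 2)]
    total = sum(hi - lo for lo, hi in pairs)
    return pairs, total


print(pair_adjacent([1515, 1520, 1500, 1535]))
# ([(1500, 1515), (1520, 1535)], 30)
```

This does not cover the edited version of the question, where only some of the numbers are used.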
Order the list, then do the difference calculation.
EDIT: hi @hey
You can solve the problem using dynamic programming.
Say you have a list L of N integers, you must form k pairs (with 2*k <= N)
Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) call it smallest(list l)
Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l)
Let's define best(int i, list l) the function that gives you the best result for i pairs within the list l
The algorithm goes as follows:
best(1, L) = smallest(L)
best(2, L) = smallest2(L)
for i from 1 to k:
    loop
        compute min (
            stored_best(i-2) + smallest2( stored_remainder(i-2) ),
            stored_best(i-1) + smallest( stored_remainder(i-1) )
        ) and store as best(i)
        store the remainder as well for the chosen solution
Now, the problem is that once you have chosen a pair, the two ints that form the boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guarantee you have allowed switching candidates.
(The switching work is done by smallest2)
Step 1: Calculate pair differences
I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each
adjacent pair of numbers. These differences are the "candidate" differences contributing to the
minimal difference sum. Using the numbers from your example would lead to:
Number  Diff
======  ====
1561
          11
1572
           0
1572
          37
1609
          73
1682
          49
1731
           0
1731
         310
2041
Save the differences into an array or table or some other data structure where you can maintain the
differences and the two numbers that contributed to each difference. Call this the DiffTable. It
should look something like:
Index  Diff  Number1  Number2
=====  ====  =======  =======
1        11     1561     1572
2         0     1572     1572
3        37     1572     1609
4        73     1609     1682
5        49     1682     1731
6         0     1731     1731
7       310     1731     2041
Step 2: Choose minimal Differences
If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered
indices: 1, 3, 5, 7. This is the correct answer. However,
the problem states that a subset of pairs are chosen and this complicates the problem quite a bit.
In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:
The sum of the differences is minimal
The numbers participating in any chosen difference are removed from the list.
The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are
removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance
of 1572 is left. Whenever a
Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of
numbers from a list containing eight numbers.
About the only method I can think of to minimize the sum of the Diff above is to generate and test.
The following pseudo code outlines a process to generate
all 'legal' sets of index values for a DiffTable of arbitrary size
where an arbitrary number of number pairs are chosen. One (or more) of the
generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.
/* Global Variables */
M = 7         /* Number of candidate pair differences in DiffTable */
N = 3         /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = []  /* Set of candidate index sets (set of sets) */

call GenIdxSet(1, [])  /* Call generator with seed values */
/* AllSets now contains candidate index sets to perform min sum tests on */
end

procedure: GenIdxSet(i, IdxSet)
    /* Generate all the valid index values for current level */
    /* and subsequent levels until a complete index set is generated */
    do while i <= M
        if CountMembers(IdxSet) = N - 1 then  /* Set is complete */
            AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
        else                                  /* Add another index */
            call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
        i = i + 1
    end
    return
Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set
where the arguments are appended into a single ordered set. For example
AppendToSet([a, b, c], d) returns the set: [a, b, c, d].
For the given parameters, M = 7 and N = 3, AllSets becomes:
[[1 3 5]
[1 3 6] <= Diffs = (11 + 37 + 0) = 48
[1 3 7]
[1 4 6]
[1 4 7]
[1 5 7]
[2 4 6]
[2 4 7]
[2 5 7]
[3 5 7]]
Calculate the sums using each set of indices, the one that is minimum identifies the
required number pairs in DiffTable. Above I show that the second set of indices gives
the minimum you are looking for.
This is a simple brute force technique and it does not scale very well. If you had a list of
50 numbers (49 candidate differences) and wanted to choose 5 pairs, AllSets would contain
1,221,759 sets of number pairs to test.
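The generate-and-test idea can be sketched in Python as below. This is an illustration, not a translation of the pseudocode: it filters `itertools.combinations` instead of recursing, and the name `best_index_set` is made up. Indices are 1-based to match the DiffTable, and a set is legal when no two chosen indices are adjacent.

```python
from itertools import combinations


def best_index_set(diffs, n_pairs):
    """Return (all legal index sets, the one with the smallest Diff sum)."""
    m = len(diffs)
    legal = [
        idx
        for idx in combinations(range(1, m + 1), n_pairs)
        # adjacent DiffTable rows share a number, so keep a gap of at least 2
        if all(b - a >= 2 for a, b in zip(idx, idx[1:]))
    ]
    best = min(legal, key=lambda idx: sum(diffs[i - 1] for i in idx))
    return legal, best


diffs = [11, 0, 37, 73, 49, 0, 310]   # Diff column of the DiffTable
legal, best = best_index_set(diffs, 3)
print(len(legal))                            # 10
print(best, sum(diffs[i - 1] for i in best))  # (1, 3, 6) 48
```

Like the pseudocode, this enumerates every candidate set, so it shares the same scaling problem.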
I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.
--table 1 holds the values
declare @Table1 table (T1_Val int)
Insert @Table1
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535)
--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731)
--Select * from @Table1
--table 2 holds the sorted numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val
--table 3 will hold the sorted pairs
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val,T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id +1
--select * from @Table3
--remove odd numbered rows
delete from @Table3 where T3_id % 2 > 0
--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val)
I think @marcog's approach can be simplified further.
Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.
@marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N))+N aka O(N log N).
This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).
I would go with marcog's answer; you can sort using any of the sorting algorithms. But there is a little thing to analyze now.
If you have to choose R numbers out of N numbers so that the sum of their differences is minimum, then the numbers must be chosen in a sequence without missing any numbers in between.
Hence, after sorting the array, you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 to calculate the sum of differences.
If needed, you should try with some examples.
I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 by getting the difference between number at index 0 and number at index 1, 37 from the numbers at indices 2 & 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:
1. If n is 0, we're done.
2. If n is 1, and the first item will provide just 1 index which isn't in our set, we take the first item, and we're done.
3. If n is 2 or more, and the first item will provide 2 indices which aren't in our set, we take the first item, and we recurse (e.g. goto 1). This time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
1. ** as above **
2. ** as above **
3. If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
4. If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.
4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?
We'll want to take whichever branch produces the lowest sum.
...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
    findSmallestDifference = findSmallestDifference' Set.empty

    findSmallestDifference' _ _ [] = []
    findSmallestDifference' taken n (d:ds)
        | n == 0 = [] -- Case 1
        | n == 1 && provides1 d = [d] -- Case 2
        | n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
        | provides0 d = findSmallestDifference' taken n ds -- Case 3a (See Edit)
        | validA && not validB = branchA -- Case 4
        | validB && not validA = branchB -- Case 4
        | validA && validB && sumA <= sumB = branchA -- Case 4
        | validA && validB && sumB <= sumA = branchB -- Case 4
        | otherwise = [] -- Case 4
        where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
              branchB = findSmallestDifference' taken n ds
              sumA = sumDifferences branchA
              sumB = sumDifferences branchB
              validA = n == (indicesTaken branchA)
              validB = n == (indicesTaken branchB) -- was branchA, which made validB mirror validA
              newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worth while.
Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
Something like
Sort List
Find Duplicates
Make the duplicates a pair
remove duplicates from list
break rest of list into pairs
calculate differences of each pair
take lowest amounts
In your example you have 8 numbers and need the best 3 pairs. First sort the list, which gives you
1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041
If you have duplicates make them a pair and remove them from the list so you have
[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }
Break the remaining list into pairs, giving you the 4 following pairs
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359
Then drop as many pairs as you need to.
This gives you the following 3 pairs with the lowest differences
[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
So
0 + 0 + 48 = 48
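The steps above can be sketched in Python roughly as follows. This is only my reading of the recipe; the function name greedy_pair_differences and its arguments are made up for illustration. Note that it pairs the leftover numbers at fixed positions after sorting, so it is a greedy heuristic and is not guaranteed to find the true minimum for every input (for leftovers 1, 10, 11, 20 and one pair needed, it picks a difference of 9 although 10 and 11 differ by 1).

```python
from collections import Counter

def greedy_pair_differences(values, pairs_needed):
    """Follow the recipe: pair duplicates, pair sorted leftovers, keep the smallest."""
    counts = Counter(values)
    pairs = []
    leftovers = []
    for value in sorted(counts):
        dup_pairs, remainder = divmod(counts[value], 2)
        # Each two equal numbers form a pair with difference 0.
        pairs.extend([(value, value)] * dup_pairs)
        if remainder:
            leftovers.append(value)
    # Break the remaining sorted numbers into consecutive pairs.
    for i in range(0, len(leftovers) - 1, 2):
        pairs.append((leftovers[i], leftovers[i + 1]))
    # Keep only the pairs with the lowest differences.
    pairs.sort(key=lambda p: p[1] - p[0])
    chosen = pairs[:pairs_needed]
    return chosen, sum(b - a for a, b in chosen)

chosen, total = greedy_pair_differences(
    [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731], 3)
print(chosen, total)
```

For the example list this reproduces the three pairs listed above and the total of 48.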

Number of arrangements

Suppose we have n elements, a1, a2, ..., an, arranged in a circle. That is, a2 is between a1 and a3, a3 is between a2 and a4, an is between a(n-1) and a1, and so forth.
Each element can take the value of either 1 or 0. Two arrangements are different if there are corresponding ai's whose values differ. For instance, when n=3, (1, 0, 0) and (0, 1, 0) are different arrangements, even though they may be isomorphic under rotation or reflection.
Because there are n elements, each of which can take two values, the total number of arrangements is 2^n.
Here is the question:
How many arrangements are possible, such that no two adjacent elements both have the value 1? If it helps, only consider cases where n>3.
I ask here for several reasons:
This arose while I was solving a programming problem
It sounds like the problem may benefit from Boolean logic/bit arithmetic
Maybe there is no closed solution.
Let's first ask the question "how many 0-1 sequences of length n are there with no two consecutive 1s?" Let the answer be A(n). We have A(0)=1 (the empty sequence), A(1) = 2 ("0" and "1"), and A(2)=3 ("00", "01" and "10" but not "11").
To make it easier to write a recurrence, we'll compute A(n) as the sum of two numbers:
B(n), the number of such sequences that end with a 0, and
C(n), the number of such sequences that end with a 1.
Then B(n) = A(n-1) (take any such sequence of length n-1, and append a 0)
and C(n) = B(n-1) (because if you have a 1 at position n, you must have a 0 at n-1.)
This gives A(n) = B(n) + C(n) = A(n-1) + B(n-1) = A(n-1) + A(n-2).
By now it should be familiar :-)
A(n) is simply the Fibonacci number F_{n+2}, where the Fibonacci sequence is defined by F_0 = 0, F_1 = 1, and F_{n+2} = F_{n+1} + F_n for n ≥ 0.
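If you want to convince yourself of the Fibonacci claim for small n, a brute-force count is easy to write. A minimal Python sketch, where count_linear and fib are names I made up for this check:

```python
from itertools import product

def count_linear(n):
    """Count 0-1 sequences of length n with no two consecutive 1s (brute force)."""
    return sum(
        1
        for bits in product((0, 1), repeat=n)
        if all(not (a and b) for a, b in zip(bits, bits[1:]))
    )

def fib(k):
    """Fibonacci number with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

for n in range(0, 11):
    # A(n) should equal the Fibonacci number with index n + 2.
    print(n, count_linear(n), fib(n + 2))
```

The two columns printed for each n should agree.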
Now for your question. We'll count the number of arrangements with a1=0 and a1=1 separately. For the former, a2 … an can be any sequence at all (with no consecutive 1s), so the number is A(n-1) = F_{n+1}. For the latter, we must have a2=0, and then a3…an is any sequence with no consecutive 1s that ends with a 0, i.e. B(n-2) = A(n-3) = F_{n-1}.
So the answer is F_{n+1} + F_{n-1}.
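The circular count can be checked the same way. Here is a small Python sketch that compares a brute-force count against the sum of the two Fibonacci numbers with indices n+1 and n-1; count_circular and fib are hypothetical helper names, not something from the question.

```python
from itertools import product

def count_circular(n):
    """Count 0-1 arrangements on a circle of n elements with no two adjacent 1s."""
    total = 0
    for bits in product((0, 1), repeat=n):
        # Index (i + 1) % n wraps around, so the last element is checked against the first.
        if all(not (bits[i] and bits[(i + 1) % n]) for i in range(n)):
            total += 1
    return total

def fib(k):
    """Fibonacci number with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

for n in range(3, 13):
    print(n, count_circular(n), fib(n + 1) + fib(n - 1))
```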
Actually, we can go even further than that answer. Note that if you call the answer G(n) = F_{n+1} + F_{n-1}, then
G(n+1) = F_{n+2} + F_n, and
G(n+2) = F_{n+3} + F_{n+1}, so even G(n) satisfies the same recurrence as the Fibonacci sequence! [Actually, any linear combination of Fibonacci-like sequences will satisfy the same recurrence, so it's not all that surprising.] So another way to compute the answers would be using:
G(2)=3
G(3)=4
G(n)=G(n-1)+G(n-2) for n≥4.
And now you can also use the closed form F_n = (α^n - β^n)/(α - β) (where α and β are (1±√5)/2, the roots of x^2 - x - 1 = 0), to get
G(n) = ((1+√5)/2)^n + ((1-√5)/2)^n.
[You can ignore the second term because it's very close to 0 for large n; in fact G(n) is the closest integer to ((1+√5)/2)^n for all n ≥ 2.]
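Both ways of computing G(n) are a few lines of Python. The sketch below uses names of my own choosing (g_recurrence, g_rounded) and relies on floating point for the rounding version, so I would only trust that one for moderate n. The numbers G(n) produced here are better known as the Lucas numbers.

```python
import math

def g_recurrence(n):
    """G(n) from G(2) = 3, G(3) = 4 and G(n) = G(n-1) + G(n-2), for n >= 2."""
    if n < 2:
        raise ValueError("this sketch only handles n >= 2")
    a, b = 3, 4  # G(2), G(3)
    for _ in range(n - 2):
        a, b = b, a + b
    return a

def g_rounded(n):
    """G(n) as the integer closest to ((1 + sqrt(5)) / 2) ** n, for n >= 2."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n)

for n in range(2, 16):
    print(n, g_recurrence(n), g_rounded(n))
```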
I decided to hack up a small script to try it out:
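#!/usr/bin/python
import sys

# thx google: binary string of a positive integer ("" for 0)
bstr_pos = lambda n: n>0 and bstr_pos(n>>1)+str(n&1) or ""

def arrangements(n):
    count = 0
    # Enumerate all 2^n bit patterns (the upper bound of range is exclusive).
    for v in range(0, pow(2, n)):
        bits = bstr_pos(v).rjust(n, '0')
        # Reject adjacent 1s, including the wrap-around pair (first and last).
        if not ( bits.find("11")!=-1 or ( bits[0]=='1' and bits[-1]=='1' ) ):
            count += 1
            print(bits)
    print("Total = " + str(count))

arrangements(int(sys.argv[1]))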
Running this for 5, gave me a total of 11 possibilities with 00000,
00001,
00010,
00100,
00101,
01000,
01001,
01010,
10000,
10010,
10100.
P.S. - Excuse the not() in the above code.
Throwing my naive script into the mix. Plenty of opportunity for caching partial results, but it ran fast enough for small n that I didn't bother.
def arcCombinations(n, lastDigitMustBeZero):
    """Takes the length of the remaining arc of the circle, and computes
    the number of legal combinations.
    The last digit may be restricted to 0 (because the first digit is a 1)"""
    if n == 1:
        if lastDigitMustBeZero:
            return 1 # only legal answer is 0
        else:
            return 2 # could be 1 or 0.
    elif n == 2:
        if lastDigitMustBeZero:
            return 2 # could be 00 or 10
        else:
            return 3 # could be 10, 01 or 00
    else:
        # Could be a 1, in which case next item is a zero.
        return (
            arcCombinations(n-2, lastDigitMustBeZero) # If it starts 10
            + arcCombinations(n-1, lastDigitMustBeZero) # If it starts 0
        )

def circleCombinations(n):
    """Computes the number of legal combinations for a given circle size (n >= 3)."""
    # Handle case where it starts with 0 or with 1.
    total = (
        # First digit is a 1: the second digit is forced to 0, and the
        # remaining n-2 digits must end in 0.
        arcCombinations(n-2,True)
        +
        arcCombinations(n-1,False) # Number of combinations where first digit is a 0.
    )
    return total

print(circleCombinations(13))
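For the caching of partial results mentioned above, functools.lru_cache from the standard library is probably the least effort. A rough sketch under my own naming (arc_count, circle_count), assuming that a leading 1 forces both its neighbours on the circle to be 0:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def arc_count(n, last_must_be_zero):
    """Count 0-1 strings of length n with no adjacent 1s, optionally ending in 0."""
    if n == 0:
        return 1  # the empty string
    if n == 1:
        return 1 if last_must_be_zero else 2
    # Either the string starts with 0, or it starts with 10.
    return arc_count(n - 1, last_must_be_zero) + arc_count(n - 2, last_must_be_zero)

def circle_count(n):
    """Count circular arrangements of n elements with no two adjacent 1s (n >= 2)."""
    # First digit 0: the other n-1 digits form an unconstrained arc.
    # First digit 1: the second digit is 0 and the remaining n-2 digits end in 0.
    return arc_count(n - 1, False) + arc_count(n - 2, True)

print(circle_count(5))
print(circle_count(13))
```

With the cache each (n, flag) pair is computed once, so the running time grows linearly in n instead of exponentially.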
This problem is very similar to Zeckendorf representations. I can't find an obvious way to apply Zeckendorf's Theorem, due to the circularity constraint, but the Fibonacci numbers are obviously very prevalent in this problem.

Resources