Imagine you have a full calendar year in front of you. On some days you take the train, potentially even a few times in a single day, and each trip could be to a different location (i.e. the amount you pay for the ticket can be different for each trip).
So you would have data that looked like this:
Date: 2018-01-01, Amount: $5
Date: 2018-01-01, Amount: $6
Date: 2018-01-04, Amount: $2
Date: 2018-01-06, Amount: $4
...
Now you have to group this data into buckets. A bucket can span up to 31 consecutive days (no gaps) and cannot overlap another bucket.
If a bucket has fewer than 32 train trips in it, it will be blue. If it has 32 or more train trips in it, it will be red. Each bucket also gets a value: the sum of the ticket costs inside it.
After you group all the trips, the blue buckets get thrown out, and the values of all the red buckets get summed up; we will call this the prize.
The goal is to get the highest possible value for the prize.
This is the problem I have. I can't think of a good algorithm to do this. If anyone knows a good way to approach this, I would like to hear it, or if you know of anywhere else that can help with designing algorithms like this.
This can be solved by dynamic programming.
First, sort the records by date, and consider them in that order.
Let day (1), day (2), ..., day (n) be the days where the tickets were bought.
Let cost (1), cost (2), ..., cost (n) be the respective ticket costs.
Let fun (k) be the best prize if we consider only the first k records.
Our dynamic programming solution will calculate fun (0), fun (1), fun (2), ..., fun (n-1), fun (n), using the previous values to calculate the next one.
Base:
fun (0) = 0.
Transition:
What is the optimal solution, fun (k), if we consider only the first k records?
There are two possibilities: either the k-th record is dropped, in which case the solution is the same as fun (k-1), or the k-th record is the last record of some bucket.
Let us then consider all possible buckets ending with the k-th record in a loop, as explained below.
Look at records k, k-1, k-2, ..., down to the very first record.
Let the current index be i.
If the records from i to k span more than 31 consecutive days, break from the loop.
Otherwise, if the number of records, k-i+1, is at least 32, we can solve the subproblem fun (i-1) and then add the records from i to k, getting a prize of cost (i) + cost (i+1) + ... + cost (k).
The value fun (k) is the maximum of these possibilities, along with the possibility to drop the k-th record.
Answer: it is just fun (n), the case where we considered all the records.
In pseudocode:
fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    for i = k, k-1, ..., 1:
        if day[k] - day[i] + 1 > 31:
            break
        cost_i_to_k += cost[i]
        if k-i+1 >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]
It is not clear whether we are allowed to split records on a single day into different buckets.
If the answer is no, we will have to enforce it by not considering buckets starting or ending between records in a single day.
Technically, it can be done by a couple of if statements.
Another way is to consider days instead of records: instead of tickets which have day and cost, we will work with days.
Each day will have cost, the total cost of tickets on that day, and quantity, the number of tickets.
Edit: as per the comment, we indeed cannot split any single day.
Then, after some preprocessing to get day records instead of ticket records, we can go as follows, in pseudocode:
fun[0] = 0
for k = 1, 2, ..., n:
    fun[k] = fun[k-1]
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]
Here, i and k are numbers of days.
Note that we consider all possible days in the range: if there are no tickets for a particular day, we just use zeroes as its cost and quantity values.
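To make this concrete, here is a minimal runnable sketch of the day-based DP in Python. The function and parameter names are mine, not from any library; cost[d] and quantity[d] are the per-day totals described above, with zeroes for days without tickets.

def max_prize(cost, quantity, max_span=31, min_trips=32):
    # cost[d], quantity[d]: total ticket cost and ticket count on day d (0-indexed)
    n = len(cost)
    fun = [0] * (n + 1)                   # fun[k]: best prize over the first k days
    for k in range(1, n + 1):
        fun[k] = fun[k - 1]               # option 1: day k ends no bucket
        cost_i_to_k = 0
        quantity_i_to_k = 0
        for i in range(k, 0, -1):         # option 2: a bucket covers days i..k
            if k - i + 1 > max_span:
                break
            cost_i_to_k += cost[i - 1]
            quantity_i_to_k += quantity[i - 1]
            if quantity_i_to_k >= min_trips:
                fun[k] = max(fun[k], fun[i - 1] + cost_i_to_k)
    return fun                            # whole table, to allow reconstruction later

# Scaled-down check: 3-day buckets, 2-trip threshold.
# Best is the bucket {day 1} plus the bucket {days 3-4}: 5 + 6 = 11.
print(max_prize([5, 0, 2, 4], [2, 0, 1, 1], max_span=3, min_trips=2)[-1])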
Edit2:
The above allows us to calculate the maximum total prize, but what about the actual configuration of buckets which gets us there?
The general method will be backtracking: at position k, we want to know how we got fun (k), and transition either to k-1, if the optimal way was to skip the k-th record, or from k to i-1 for an i such that fun[k] = fun[i-1] + cost_i_to_k holds.
We proceed until the index goes down to zero.
One of the two usual implementation approaches is to store par (k), a "parent", along with fun (k), which encodes how exactly we got the maximum.
Say, if par (k) = -1, the optimal solution skips the k-th record.
Otherwise, we store the optimal index i in par (k), so that the optimal solution takes a bucket of records i to k inclusive.
The other approach is to store nothing extra.
Rather, we re-run a slightly modified version of the code which calculates fun (k).
But instead of assigning things to fun (k), we compare the right-hand side of each assignment to the final value fun (k) we already have.
As soon as they are equal, we found the right transition.
In pseudocode, using the second approach, and days instead of individual records:
k = n
while k > 0:
    k = prev (k)

function prev (k):
    if fun[k] == fun[k-1]:
        return k-1
    cost_i_to_k = 0
    quantity_i_to_k = 0
    for i = k, k-1, ..., 1:
        if k-i+1 > 31:
            break
        cost_i_to_k += cost[i]
        quantity_i_to_k += quantity[i]
        if quantity_i_to_k >= 32:
            if fun[k] == fun[i-1] + cost_i_to_k:
                writeln ("bucket from $ to $: cost $, quantity $",
                         i, k, cost_i_to_k, quantity_i_to_k)
                return i-1
    assert (false, "can't happen")
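The same backtracking translates to Python directly. Here is a sketch matching prev() above; it assumes the fun table from the earlier max_prize sketch (which returns the whole table, not just fun[n]):

def reconstruct(fun, cost, quantity, max_span=31, min_trips=32):
    buckets = []
    k = len(cost)
    while k > 0:
        if fun[k] == fun[k - 1]:          # day k was skipped
            k -= 1
            continue
        cost_i_to_k = 0
        quantity_i_to_k = 0
        found = False
        for i in range(k, 0, -1):
            if k - i + 1 > max_span:
                break
            cost_i_to_k += cost[i - 1]
            quantity_i_to_k += quantity[i - 1]
            if (quantity_i_to_k >= min_trips
                    and fun[k] == fun[i - 1] + cost_i_to_k):
                buckets.append((i, k, cost_i_to_k, quantity_i_to_k))
                k = i - 1
                found = True
                break
        assert found, "can't happen"
    return buckets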
Simplify the challenge, but not too much, to get a small example that you can solve by hand.
That helps a lot in finding the right questions.
For example, take only 10 days, and buckets with a maximum length of 3 days.
For building buckets and coloring them, we only need the daily ticket counts, here values like 0, 1, 2, 3.
For a red bucket we need more than one ticket per day on average; for example 2-0-2 is 4 tickets in 3 days. Other candidate runs: 1-1-3, 1-3, 1-3-1, 3-1-2, 1-2.
But we can only choose 2 red buckets: 2-0-2 and one of (1-1-3, 1-3-1, 3-1-2), since 1-2 at the end is only 3 tickets, and we need at least 4 (one more ticket than the maximum day span per bucket).
And while 3-1-2 obviously has more tickets than 1-1-3, the bucket with fewer tickets might still carry the higher value.
The blue-colored areas are the less interesting ones, because their ticket counts never reach the threshold.
Assume we have a set of n jobs to execute, each of which takes unit time. At any time we can serve exactly one job. Job i, 1 <= i <= n, earns us a profit if and only if it is executed no later than its deadline.
We call a set of jobs feasible if there exists at least one sequence that allows each job in the set to be performed no later than its deadline; in particular, scheduling a feasible set in "earliest deadline first" order is feasible.
Show that the greedy algorithm is optimal: at every step, add the job with the highest value of profit among those not yet considered, provided that the chosen set of jobs remains feasible.
MUST DO THIS FIRST: show first that it is always possible to re-schedule two feasible sequences (one computed by the greedy algorithm) in such a way that every job common to both sequences is scheduled at the same time. This new sequence might contain gaps.
UPDATE
I created an example that seems to disprove the algorithm:
Assume 4 jobs:
Job A has profit 1, time duration 2, deadline before day 3;
Job B has profit 4, time duration 1, deadline before day 4;
Job C has profit 3, time duration 1, deadline before day 3;
Job D has profit 2, time duration 1, deadline before day 2.
If we use the greedy algorithm with the highest profit first, then we only get jobs B and C. However, if we go deadline first, then we can fit jobs C, D and B, in that order.
I am not sure if I am approaching this question in the right way, since I created an example that seems to disprove what the question asks us to show.
This problem looks like Job Shop Scheduling, which is NP-complete (which suggests there is no optimal greedy algorithm, despite experts having tried to find one since the 70s). Here's a video on a more advanced form of that use case being solved with a greedy algorithm followed by Local Search.
If we presume your use case can indeed be relaxed to Job Shop Scheduling, then there are many optimization algorithms that can help, such as Metaheuristics (including Local Search, such as Tabu Search and Simulated Annealing), (M)IP, Dynamic Programming, Constraint Programming, ... The reason there are so many choices is that none are perfect. I prefer Metaheuristics, as they out-scale the others in all the research challenges I've seen.
In fact, neither "earliest deadline first", "highest profit first" nor "highest profit/duration first" is a correct algorithm...
Assume 2 jobs:
Job A has profit 1, time duration 1, deadline before day 1;
Job B has profit 2, time duration 2, deadline before day 2;
Then "earliest deadline first" fails to get correct answer. Correct answer is B.
Assume another 5 jobs:
Job A has profit 2, time duration 3, deadline before day 3;
Job B has profit 1, time duration 1, deadline before day 1;
Job C has profit 1, time duration 1, deadline before day 2;
Job D has profit 1, time duration 1, deadline before day 3;
Job E has profit 1, time duration 1, deadline before day 4;
Then "highest profit first" fails to get correct answer. Correct answer is BCDE.
Assume another 4 jobs:
Job A has profit 6, time duration 4, deadline before day 6;
Job B has profit 4, time duration 3, deadline before day 6;
Job C has profit 4, time duration 3, deadline before day 6;
Job D has profit 0.0001, time duration 2, deadline before day 6;
Then "highest profit/duration first" fails to get correct answer. Correct answer is BC (Thanks for #dognose's counter-example, see comment).
One correct algorithm is Dynamic Programming:
First, order the jobs by deadline, ascending. dp[i][j] = k means that, using only the first i jobs and exactly j time units, the highest profit we can get is k. Initially dp[0][0] = 0.
Job info is stored in 3 arrays: profits in profit[i], 1<=i<=n, durations in time[i], 1<=i<=n, and deadlines in deadline[i], 1<=i<=n.
// sort by deadline in ascending order
...
// initially the 2-dimensional dp array is all -1; -1 means the state is unreachable
...
dp[0][0] = 0;
int maxDeadline = max(deadline); // max value of deadline
for (int i = 0; i < n; i++) {
    for (int j = 0; j <= maxDeadline; j++) {
        if (dp[i][j] == -1) continue;
        // option 1: skip job i+1, carrying the state over so later jobs can still use it
        dp[i+1][j] = max(dp[i+1][j], dp[i][j]);
        // option 2: if doing job i+1 satisfies its deadline, update the state
        // "within the first i+1 jobs, costing j+time[i+1] time units"
        if (j + time[i+1] <= deadline[i+1]) {
            dp[i+1][j+time[i+1]] = max(dp[i+1][j+time[i+1]], dp[i][j] + profit[i+1]);
        }
    }
}
// the max total profit is the max value in the 2-dimensional dp array
The time/space complexity of the DP algorithm is O(n*m), where n is the job count and m is the maximum deadline, so it depends heavily on the number of jobs and on how large the deadlines are. If n and/or m is rather large it may be difficult to get the answer, but for common use it will work well.
The problem is called job sequencing with deadlines, and can be solved by two algorithms based on a greedy strategy:
Sort the input jobs by decreasing profit. Insert every job into the solution list, which is kept sorted by increasing deadline. If, after including a job, some job in the solution has an index greater than its deadline, do not include this job.
Sort the input jobs by decreasing profit. Place every job into the solution list at the last possible free index at or before its deadline. If there is no free index less than or equal to the job's deadline, do not include the job. (A sketch of this variant follows below.)
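A minimal Python sketch of the second variant (illustrative names, unit-time jobs assumed): scan jobs in decreasing profit order and drop each into the latest free slot at or before its deadline.

def job_sequencing(jobs):
    # jobs: list of (profit, deadline) pairs; deadlines are 1-based slot indices
    jobs = sorted(jobs, reverse=True)           # highest profit first
    max_d = max(d for _, d in jobs)
    slot = [None] * (max_d + 1)                 # slot[t]: job scheduled at time t
    for profit, deadline in jobs:
        t = deadline
        while t >= 1 and slot[t] is not None:   # latest free slot <= deadline
            t -= 1
        if t >= 1:
            slot[t] = (profit, deadline)
    return [job for job in slot[1:] if job is not None]

# Example: profits (40, 30, 20, 10) with deadlines (1, 1, 4, 1)
# keeps the 40- and 20-profit jobs, for a total profit of 60.
print(job_sequencing([(40, 1), (30, 1), (20, 4), (10, 1)]))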
public class JOB {
    public static void main(String[] args) {
        // jobs already sorted by decreasing profit
        char name[] = {'1', '2', '3', '4'};
        int dl[] = {1, 1, 4, 1};            // deadlines
        int profit[] = {40, 30, 20, 10};
        char cap[] = new char[2];           // schedule slots (capacity hardcoded to 2)
        for (int i = 0; i < 2; i++) {
            cap[i] = '\0';
        }
        int i = 0;
        int j = dl[i] - 1;                  // try the latest slot before job i's deadline
        while (i < 4) {
            if (j < 0) {                    // no free slot for job i: skip it
                i++;
                if (i < 4)
                    j = dl[i] - 1;
            } else if (j < 2 && cap[j] == '\0') {   // free slot found: place job i
                cap[j] = name[i];
                i++;
                if (i < 4)
                    j = dl[i] - 1;
            } else {                        // slot taken: try an earlier one
                j = j - 1;
            }
        }
        for (int i1 = 0; i1 < 2; i1++)
            System.out.println(cap[i1]);
    }
}
I came across this question
ADZEN is a very popular advertising firm in your city. On every road you can see their advertising billboards. Recently they are facing a serious challenge: MG Road, the most used and beautiful road in your city, has been almost filled with billboards, and this is having a negative effect on the natural view.
On people's demand, ADZEN has decided to remove some of the billboards in such a way that there are no more than K billboards standing together in any part of the road.
You may assume MG Road to be a straight line with N billboards. Initially there is no gap between any two adjacent billboards.
ADZEN's primary income comes from these billboards, so the billboard-removing process has to be done in such a way that the billboards remaining at the end give the maximum possible profit among all possible final configurations. The total profit of a configuration is the sum of the profit values of all billboards present in that configuration.
Given N, K and the profit value of each of the N billboards, output the maximum profit that can be obtained from the remaining billboards under the conditions given.
Input description
The 1st line contains two space-separated integers, N and K. Then follow N lines describing the profit value of each billboard, i.e. the ith line contains the profit value of the ith billboard.
Sample Input
6 2
1
2
3
1
6
10
Sample Output
21
Explanation
In the given input there are 6 billboards, and after the process no more than 2 should stand together. So remove the 1st and 4th billboards, giving the configuration _ 2 3 _ 6 10 with a profit of 21.
No other configuration has a profit higher than 21, so the answer is 21.
Constraints
1 <= N <= 100,000 (10^5)
1 <= K <= N
0 <= profit value of any billboard <= 2,000,000,000 (2*10^9)
I think that we have to select the minimum-cost board among the first k+1 boards and then repeat the same until the last one, but this was not giving the correct answer for all cases.
I tried to the best of my knowledge but was unable to find a solution.
If anyone has an idea, please kindly share your thoughts.
It's a typical DP problem. Let's say that P(n,k) is the maximum profit of having k billboards up to position n on the road. Then you have the following formula:
P(n,k) = max(P(n-1,k), P(n-1,k-1) + C(n))
P(i,0) = 0 for i = 0..n
where C(n) is the profit from putting the nth billboard on the road. Using that formula to calculate P(n,k) bottom up, you'll get the solution in O(nk) time.
I'll leave it up to you to figure out why that formula holds.
edit
Dang, I misread the question.
It still is a DP problem, just with a different formula. Let's say that P(v,i) means the maximum profit at point v where the last cluster of billboards has size i.
Then P(v,i) can be described using the following formulas:
P(v,i) = P(v-1,i-1) + C(v)   if i > 0
P(v,0) = max(P(v-1,i) for i = 0..min(k, v))
P(0,0) = 0
You need to find max(P(n,i) for i = 0..k).
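A hedged, runnable Python sketch of this recurrence (0-indexed profit list; the names are mine); on the sample input from the question it returns 21:

def max_profit(c, k):
    n = len(c)
    NEG = float("-inf")
    # P[v][i]: best profit for the first v billboards when the last
    # kept cluster has size exactly i (i = 0 means billboard v removed)
    P = [[NEG] * (k + 1) for _ in range(n + 1)]
    P[0][0] = 0
    for v in range(1, n + 1):
        P[v][0] = max(P[v - 1][i] for i in range(0, min(k, v - 1) + 1))
        for i in range(1, min(k, v) + 1):
            if P[v - 1][i - 1] > NEG:
                P[v][i] = P[v - 1][i - 1] + c[v - 1]
    return max(P[n])

print(max_profit([1, 2, 3, 1, 6, 10], 2))   # prints 21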
This problem is one of the challenges posted in www.interviewstreet.com ...
I'm happy to say I got this down recently, but I'm not quite satisfied and wanted to see if there's a better method out there.
soulcheck's DP solution above is straightforward, but won't be able to solve this completely due to the fact that K can be as big as N, meaning the DP complexity will be O(NK) for both runtime and space.
Another solution is branch-and-bound: keep track of the best sum so far, and prune the recursion when the remaining upper bound cannot beat it. That is, if currSumSoFar + SUM(a[currIndex..n)) <= bestSumSoFar, exit the function immediately; there is no point processing further when the upper bound won't beat the best sum so far.
The branch-and-bound above got accepted by the tester for all but 2 test-cases.
Fortunately, I noticed that the 2 test-cases are using small K (in my case, K < 300), so the DP technique of O(NK) suffices.
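For illustration, a small Python sketch of that pruned recursion (recursive for clarity; profits assumed non-negative, and a real submission for N up to 10^5 would need an iterative version):

def branch_and_bound(a, k):
    n = len(a)
    suffix = [0] * (n + 1)                  # suffix[i] = sum(a[i:]), the upper bound
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + a[i]
    best = 0

    def go(i, run, cur):
        # i: next position, run: current streak of kept billboards
        nonlocal best
        if cur + suffix[i] <= best:         # bound: cannot beat best any more
            return
        if i == n:
            best = cur
            return
        if run < k:
            go(i + 1, run + 1, cur + a[i])  # keep billboard i
        go(i + 1, 0, cur)                   # remove billboard i

    go(0, 0, 0)
    return best

print(branch_and_bound([1, 2, 3, 1, 6, 10], 2))   # prints 21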
soulcheck's (second) DP solution is correct in principle. There are two improvements you can make using these observations:
1) It is unnecessary to allocate the entire DP table. You only ever look at two rows at a time.
2) For each row (the v in P(v,i)), you are only interested in the i's which can increase the max value: one more than each i that held the max value in the previous row, plus i = 1, otherwise you never consider blanks.
I coded it in C++ using DP in O(n log k).
The idea is to maintain a multiset holding the next k values for a given position; mid-processing it will typically contain k values. Each step you remove one element and push a new one. The art is in maintaining this list so that it holds profit[i] + answer[i+2]. More details in the comment below:
/*
 * Observation 1: the ith state depends on the next k states i+2 ... i+2+k.
 * We maximize across these states with an "accumulative" sum added to them.
 *
 * Say we have the list of numbers for state i+1, that is, the list of
 * {profit + state solution}. How do we get the list for state i?
 *
 * Say we have the following data, with k = 3:
 *
 * Indices:  0 1 2 3 4
 * Profits:  1 3 2 4 2
 * Solution: ? ? 5 3 1
 *
 * Answer for [1] = max(3+3, 5+1, 9+0) = 9
 *
 * Indices:  0 1 2 3 4
 * Profits:  1 3 2 4 2
 * Solution: ? 9 5 3 1
 *
 * Let's find the answer for [0], using the set of [1].
 *
 * First, the last entry should be removed; then we have (3+3, 5+1).
 *
 * Now we should add 1+5, but the old entries should be incremented by 1:
 * (1+5, 4+3, 6+1) -> then find the max.
 *
 * Could we do it without reprocessing the whole list? Yes: we simply add 1
 * to all elements lazily, so the answer is the same as:
 * 1 + max(1-1+5, 3+3, 5+1)
 */
// assumed helpers (not shown in the original):
// typedef long long ll;
// #define sz(x) ((int)(x).size())
// #define lpd(i, a, b) for (int i = (a); i >= (b); --i)

ll dp()
{
    multiset<ll, greater<ll> > set;
    mem[n-1] = profit[n-1];
    ll sumSoFar = 0;
    lpd(i, n-2, 0)
    {
        if (sz(set) == k)
            set.erase(set.find(added[i+k]));
        if (i+2 < n)
        {
            added[i] = mem[i+2] - sumSoFar;
            set.insert(added[i]);
            sumSoFar += profit[i];
        }
        if (n-i <= k)
            mem[i] = profit[i] + mem[i+1];
        else
            mem[i] = max(mem[i+1], *set.begin() + sumSoFar);
    }
    return mem[0];
}
This looks like a linear programming problem. This problem would be linear, but for the requirement that no more than K adjacent billboards may remain.
See wikipedia for a general treatment: http://en.wikipedia.org/wiki/Linear_programming
Visit your university library to find a good textbook on the subject.
There are many, many libraries to assist with linear programming, so I suggest you do not attempt to code an algorithm from scratch. Here is a list relevant to Python: http://wiki.python.org/moin/NumericAndScientific/Libraries
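If you do use a library, the model itself is short: one binary variable per billboard and one linear constraint per window of K+1 consecutive positions (integrality is what keeps this from being a pure LP). Here is a hedged sketch with PuLP; double-check the details against the current PuLP docs:

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

def solve_ilp(profits, k):
    n = len(profits)
    prob = LpProblem("billboards", LpMaximize)
    x = [LpVariable("x%d" % i, cat="Binary") for i in range(n)]  # keep billboard i?
    prob += lpSum(profits[i] * x[i] for i in range(n))           # maximize total profit
    for i in range(n - k):
        prob += lpSum(x[i:i + k + 1]) <= k                       # window constraint
    prob.solve()
    return value(prob.objective)

print(solve_ilp([1, 2, 3, 1, 6, 10], 2))   # prints 21.0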
Let P[i] (where i=1..n) be the maximum profit for billboards 1..i IF WE REMOVE billboard i. It is trivial to calculate the answer knowing all P[i]. The baseline algorithm for calculating P[i] is as follows:
for i = 1..N
{
    P[i] = -infinity;
    for j = max(1, i-k-1) .. i-1
    {
        P[i] = max( P[i], P[j] + C[j+1] + ... + C[i-1] );
    }
}
Now the idea that allows us to speed things up. Let's say we have two different valid configurations of billboards 1 through i only, let's call these configurations X1 and X2. If billboard i is removed in configuration X1 and profit(X1) >= profit(X2) then we should always prefer configuration X1 for billboards 1..i (by profit() I meant the profit from billboards 1..i only, regardless of configuration for i+1..n). This is as important as it is obvious.
We introduce a doubly-linked list of tuples {idx,d}: {{idx1,d1}, {idx2,d2}, ..., {idxN,dN}}.
p->idx is index of the last billboard removed. p->idx is increasing as we go through the list: p->idx < p->next->idx
p->d is the sum of elements (C[p->idx]+C[p->idx+1]+..+C[p->next->idx-1]) if p is not the last element in the list. Otherwise it is the sum of elements up to the current position minus one: (C[p->idx]+C[p->idx+1]+..+C[i-1]).
Here is the algorithm:
P[1] = 0;
list.AddToEnd( {idx=0, d=C[0]} );
// sum of elements starting from the index at the top of the list
sum = C[0]; // C[list->begin()->idx] + C[list->begin()->idx+1] + ... + C[i-1]
for i = 2..N
{
    if( i - list->begin()->idx > k + 1 ) // the head of the list is "too far"
    {
        sum = sum - list->begin()->d
        list.RemoveNodeFromBeginning()
    }
    // At this point the list should contain at least the element
    // added on the previous iteration. Calculating P[i].
    P[i] = P[list.begin()->idx] + sum
    // Updating list.end()->d and removing "unnecessary nodes"
    // based on the criterion described above
    list.end()->d = list.end()->d + C[i]
    while(
        (list is not empty) AND
        (P[i] >= P[list.end()->idx] + list.end()->d - C[list.end()->idx]) )
    {
        if( list.size() > 1 )
        {
            list.end()->prev->d += list.end()->d
        }
        list.RemoveNodeFromEnd();
    }
    list.AddToEnd( {idx=i, d=C[i]} );
    sum = sum + C[i]
}
// shivi.. coding is addictive!!
#include <stdio.h>

long long int arr[100001];
long long int sum[100001];                       // prefix sums
long long int including[100001], excluding[100001];

long long int maxim(long long int a, long long int b)
{ return a > b ? a : b; }

int main()
{
    int N, K;
    scanf("%d%d", &N, &K);
    for (int i = 0; i < N; ++i) scanf("%lld", &arr[i]);

    sum[0] = arr[0];
    including[0] = sum[0];
    excluding[0] = sum[0];
    for (int i = 1; i < K; ++i)                  // the first K boards can all be kept
    {
        sum[i] = sum[i-1] + arr[i];
        including[i] = sum[i];
        excluding[i] = sum[i];
    }

    long long int maxi = 0, temp = 0;
    for (int i = K; i < N; ++i)
    {
        sum[i] = sum[i-1] + arr[i];
        for (int j = 1; j <= K; ++j)             // j: how far back the removed board sits
        {
            temp = sum[i] - sum[i-j];
            if (i-j-1 >= 0)
                temp += including[i-j-1];
            if (temp > maxi) maxi = temp;
        }
        including[i] = maxi;
        excluding[i] = including[i-1];
    }
    printf("%lld", maxim(including[N-1], excluding[N-1]));
}
// here is the code... passing all but 1 test case :) comment improvements... simple DP
In a current project, people can order goods delivered to their door and choose 'pay on delivery' as a payment option. To make sure the delivery guy has enough change, customers are asked to input the amount they will pay (e.g. delivery is 48.13, they will pay with 60.00, i.e. 3 * 20). Now, if it were up to me I'd make it a free field, but apparently the higher-ups have decided it should be a selection based on available denominations, without offering amounts that would result in a set of denominations which could be smaller.
Example:
denominations = [1,2,5,10,20,50]
price = 78.12
possibilities:
79 (multitude of options),
80 (e.g. 4*20)
90 (e.g. 50+2*20)
100 (2*50)
It's international, so the denominations could change, and the algorithm should be based on that list.
The closest I have come which seems to work is this:
for all denominations in reversed order (large => small)
    add ceil(price/denomination) * denomination to possibles
    baseprice = floor(price/denomination) * denomination
    for all smaller denominations as subdenomination in reversed order
        add baseprice + ceil((price - baseprice) / subdenomination) * subdenomination to possibles
    end for
end for
remove doubles
sort
It seems to work, but it emerged after wildly trying all kinds of compact algorithms, and I cannot defend why it works. That could mean edge cases or new countries get wrong options, and it does generate serious amounts of duplicates.
As this is probably not a new problem, and Google et al. could not provide me with an answer save for loads of pages calculating how to make exact change, I thought I'd ask SO: have you solved this problem before? Which algorithm? Any proof it will always work?
It's an application of the greedy algorithm http://mathworld.wolfram.com/GreedyAlgorithm.html (an algorithm used to recursively construct a set of objects from the smallest possible constituent parts).
Pseudocode
list = {1,2,5,10,20,50,100} (* ordered *)
while list not null
    found_answer = false
    p = ceil(price) (* assume integer denominations *)
    while not found_answer
        find_greedy (p, list) (* algorithm in the reference above *)
        p++
    remove(first(list))
EDIT: some iterations in the above are nonsense; corrected version:
list = {1,2,5,10,20,50,100} (* ordered *)
p = ceil(price) (* assume integer denominations *)
while list not null
    found_answer = false
    while not found_answer
        find_greedy (p, list) (* algorithm in the reference above *)
        p++
    remove(first(list))
EDIT:
I found an improvement on the greedy algorithm due to Pearson. It's O(N^3 log Z), where N is the number of denominations and Z is the greatest bill of the set.
You can find it in http://library.wolfram.com/infocenter/MathSource/5187/
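find_greedy itself is defined in the references above; a common reading is the textbook largest-denomination-first decomposition, sketched here in Python (the helper name and behaviour are my assumption, not taken from the linked paper):

def find_greedy(amount, denominations):
    # pay `amount` using the given denominations, largest first;
    # returns the denominations used, or None if greedy gets stuck
    used = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            amount -= d
            used.append(d)
    return used if amount == 0 else None

# e.g. the candidate amounts the asker lists for price 78.12:
for p in (79, 80, 90, 100):
    print(p, find_greedy(p, [1, 2, 5, 10, 20, 50]))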
You can generate in a database all possible combination sets of paid coins and notes (I'm not good at English), where each row contains the sum of its combination.
Having this database, you can simply get all possible overpayments with one query:
WHERE sum >= cost AND sum <= cost + epsilon
A word about epsilon: you could derive it from the cost value. Maybe 10% of cost plus 10 bucks:
WHERE sum >= cost AND sum <= cost * 1.10 + 10
The table structure must have one column per coin and note type.
The value of each column is the number of occurrences of that type of paid item.
This is not the optimal or fastest solution to this problem, but it is easy and simple to implement.
I will think about a better solution.
Another way: you can loop from cost to cost + epsilon and, for each value, calculate the smallest possible number of paid items. I have an algorithm for it; you can do this with the algorithm below, but it is in C++:
// Assumed context (not shown in the original): C[j] holds one coin type,
// C[j].weight is its denomination and C[j].value its "cost" (1 if we are
// just counting coins); coins is the number of types; cmp sorts them;
// coins_weight is the target amount. R[i] ends up holding the smallest
// total "value" needed to pay exactly i.
int R[10000];
sort(C, C + coins, cmp);
R[0] = 0;
for (int i = 1; i <= coins_weight; i++)
{
    R[i] = 1000000;   // "infinity": amount i not payable yet
    for (int j = 0; j < coins; j++)
    {
        if ((C[j].weight <= i) && ((C[j].value + R[i - C[j].weight]) < R[i]))
        {
            R[i] = C[j].value + R[i - C[j].weight];
        }
    }
}
return R[coins_weight];
I have the following problem:
Given N objects (N < 30) of different values, each a multiple of a constant k (i.e. k, 2k, 3k, 4k, 6k, 8k, 12k, 16k, 24k and 32k), I need an algorithm that will distribute all items to M players (M <= 6) in such a way that the total value of the objects each player gets is as even as possible (in other words, I want to distribute all objects to all players in the fairest way possible).
EDIT: By fairest distribution I mean that the difference between the values of the objects any two players get is minimal.
Another similar case would be: I have N coins of different values and I need to divide them equally among M players; sometimes they don't divide exactly and I need to find the next best distribution (where no player is angry because another one got too much money).
I don't need (pseudo)code to solve this (also, this is not homework :) ), but I'll appreciate any ideas or links to algorithms that could solve this.
Thanks!
The problem is strongly NP-complete. This means there is no known way to guarantee an optimal solution in reasonable time. (See the 3-partition problem; thanks Paul.)
Instead you'll want to go for a good approximate-solution generator. These can often get very close to the optimal answer in very short time. I can recommend the simulated annealing technique, which you will also be able to use for a ton of other NP-complete problems.
The idea is this:
Distribute the items randomly.
Continually make random swaps between two random players, as long as it makes the system more fair, or only a little less fair (see the wiki for details).
Stop when you have something fair enough, or you have run out of time.
This solution is much stronger than the 'greedy' algorithms many suggest. The greedy algorithm is the one where you continuously add the largest item to the 'poorest' player. An example of a testcase where greedy fails is [10,9,8,7,7,5,5].
I did an implementation of SA for you. It follows the wiki article strictly, for educational purposes. If you optimize it, I would say a 100x improvement wouldn't be unrealistic.
import random, math

values = [10, 9, 8, 7, 7, 5, 5]
M = 3
kmax = 1000
emax = 0

def s0():
    # random initial distribution
    s = [[] for i in range(M)]
    for v in values:
        random.choice(s).append(v)
    return s

def E(s):
    # energy: squared deviation of each player's total from the average
    avg = sum(values) / M
    return sum(abs(avg - sum(p)) ** 2 for p in s)

def neighbour(s):
    # move one random item from one random player to another
    snew = [p[:] for p in s]
    while True:
        p1, p2 = random.sample(range(M), 2)
        if s[p1]: break
    item = random.randrange(len(s[p1]))
    snew[p2].append(snew[p1].pop(item))
    return snew

def P(e, enew, T):
    # acceptance probability
    if enew < e: return 1
    return math.exp((e - enew) / T)

def temp(r):
    # cooling schedule
    return (1 - r) * 100

s = s0()
e = E(s)
sbest = s
ebest = e
k = 0
while k < kmax and e > emax:
    snew = neighbour(s)
    enew = E(snew)
    if enew < ebest:
        sbest = snew; ebest = enew
    if P(e, enew, temp(k / kmax)) > random.random():
        s = snew; e = enew
    k += 1
print(sbest)
Update: After playing around with Branch'n'Bound, I now believe this method to be superior, as it gives perfect results for the N=30, M=6 case within a second. However I guess you could play around with the simulated annealing approach just as much.
The greedy solution suggested by a few people seems like the best option; I ran it a bunch of times with some random values, and it seems to get it right every time.
If it's not optimal, it's at the very least very close, and it runs in O(nm) or so (I can't be bothered to do the math right now).
C# Implementation:
static List<List<int>> Dist(int n, IList<int> values)
{
    var result = new List<List<int>>();
    for (int i = 1; i <= n; i++)
        result.Add(new List<int>());
    var sortedValues = values.OrderByDescending(val => val);
    foreach (int val in sortedValues)
    {
        var lowest = result.OrderBy(a => a.Sum()).First();
        lowest.Add(val);
    }
    return result;
}
How about this:
Order the k values.
Order the players.
Loop over the k values, giving the next one to the next player.
When you get to the end of the players, turn around and continue giving the k values to the players in the opposite direction (sketched below).
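A quick Python sketch of this "snake draft" idea (descending order assumed; it is a heuristic, so no optimality guarantee):

def snake_draft(values, m):
    players = [[] for _ in range(m)]
    order = list(range(m)) + list(range(m - 1, -1, -1))   # 0..m-1, then back down
    for i, v in enumerate(sorted(values, reverse=True)):
        players[order[i % len(order)]].append(v)
    return players

# On the testcase from the simulated annealing answer:
print([sum(p) for p in snake_draft([10, 9, 8, 7, 7, 5, 5], 3)])
# prints [20, 16, 15]; the optimum here is [17, 17, 17]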
Repeatedly give the available object with the largest value to the player who has the least total value of objects assigned to him.
This is a straightforward implementation of Justin Peel's answer:
M = 3
players = [[] for i in range(M)]
values = [10, 4, 3, 1, 1, 1]
values.sort(reverse=True)   # largest value first
for v in values:
    # hand the next value to the player with the lowest total
    lowest = sorted(players, key=lambda x: sum(x))[0]
    lowest.append(v)
print(players)
print([sum(p) for p in players])
I am a beginner with Python, but it seems to work okay. This example will print
[[10], [4, 1], [3, 1, 1]]
[10, 5, 5]
30 ^ 6 isn't that large (it's less than 1 billion). Go through every possible allocation, and pick the one that's the fairest by whatever measure you define.
EDIT:
The purpose was to use the greedy solution, with a small improvement in the implementation that is perhaps most transparent in C#:
static List<List<int>> Dist(int n, IList<int> values)
{
    var result = new List<List<int>>();
    for (int i = 1; i <= n; i++)
        result.Add(new List<int>());
    var sortedValues = values.OrderByDescending(val => val); // assume an efficient sort - O(N log(N))
    foreach (int val in sortedValues)
    {
        var lowest = result.OrderBy(a => a.Sum()).First(); // this can be done in O(log(n)) [n - size of result]
        lowest.Add(val);
    }
    return result;
}
Regarding this stage:
var lowest = result.OrderBy(a => a.Sum()).First(); // this can be done in O(log(n)) [n - size of result]
The idea is that the list is always sorted (in this code it is done by OrderBy). Eventually, this step won't take more than O(log(n)), because we just need to insert at most one item into a sorted list, which should cost the same as a binary search.
Because we need to repeat this phase sortedValues.Length times, the whole algorithm runs in O(M * log(n)).
So, in words, it can be rephrased as:
Repeat the steps below until you run out of values:
1. Give the biggest remaining value to the poorest player.
2. Check whether this player still has the smallest sum.
3. If yes, go to step 1.
4. Otherwise, re-insert the player that just received a value into the sorted players list.
Step 4 is the O(log(n)) step, as the list is always sorted.
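The same bookkeeping comes for free with a binary heap, which may be simpler than maintaining a sorted list by hand. A Python sketch keyed on each player's running sum (O(N log M) overall); it reproduces the distribution from the earlier Python answer:

import heapq

def distribute(values, m):
    heap = [(0, i) for i in range(m)]       # (current sum, player index)
    players = [[] for _ in range(m)]
    for v in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)      # poorest player so far
        players[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return players

print(distribute([10, 4, 3, 1, 1, 1], 3))   # [[10], [4, 1], [3, 1, 1]]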