Removal of billboards from given ones - algorithm

I came across this question
ADZEN is a very popular advertising firm in your city. In every road
you can see their advertising billboards. Recently they are facing a
serious challenge , MG Road the most used and beautiful road in your
city has been almost filled by the billboards and this is having a
negative effect on
the natural view.
On people's demand ADZEN has decided to remove some of the billboards
in such a way that there are no more than K billboards standing together
in any part of the road.
You may assume the MG Road to be a straight line with N billboards.Initially there is no gap between any two adjecent
billboards.
ADZEN's primary income comes from these billboards so the billboard removing process has to be done in such a way that the
billboards
remaining at end should give maximum possible profit among all possible final configurations.Total profit of a configuration is the
sum of the profit values of all billboards present in that
configuration.
Given N,K and the profit value of each of the N billboards, output the maximum profit that can be obtained from the remaining
billboards under the conditions given.
Input description
1st line contain two space seperated integers N and K. Then follow N lines describing the profit value of each billboard i.e ith
line contains the profit value of ith billboard.
Sample Input
6 2
1
2
3
1
6
10
Sample Output
21
Explanation
In given input there are 6 billboards and after the process no more than 2 should be together. So remove 1st and 4th
billboards giving a configuration _ 2 3 _ 6 10 having a profit of 21.
No other configuration has a profit more than 21.So the answer is 21.
Constraints
1 <= N <= 1,00,000(10^5)
1 <= K <= N
0 <= profit value of any billboard <= 2,000,000,000(2*10^9)
I think that we have to select minimum cost board in first k+1 boards and then repeat the same untill last,but this was not giving correct answer
for all cases.
i tried upto my knowledge,but unable to find solution.
if any one got idea please kindly share your thougths.

It's a typical DP problem. Lets say that P(n,k) is the maximum profit of having k billboards up to the position n on the road. Then you have following formula:
P(n,k) = max(P(n-1,k), P(n-1,k-1) + C(n))
P(i,0) = 0 for i = 0..n
Where c(n) is the profit from putting the nth billboard on the road. Using that formula to calculate P(n, k) bottom up you'll get the solution in O(nk) time.
I'll leave up to you to figure out why that formula holds.
edit
Dang, I misread the question.
It still is a DP problem, just the formula is different. Let's say that P(v,i) means the maximum profit at point v where last cluster of billboards has size i.
Then P(v,i) can be described using following formulas:
P(v,i) = P(v-1,i-1) + C(v) if i > 0
P(v,0) = max(P(v-1,i) for i = 0..min(k, v))
P(0,0) = 0
You need to find max(P(n,i) for i = 0..k)).

This problem is one of the challenges posted in www.interviewstreet.com ...
I'm happy to say I got this down recently, but not quite satisfied and wanted to see if there's a better method out there.
soulcheck's DP solution above is straightforward, but won't be able to solve this completely due to the fact that K can be as big as N, meaning the DP complexity will be O(NK) for both runtime and space.
Another solution is to do branch-and-bound, keeping track the best sum so far, and prune the recursion if at some level, that is, if currSumSoFar + SUM(a[currIndex..n)) <= bestSumSoFar ... then exit the function immediately, no point of processing further when the upper-bound won't beat best sum so far.
The branch-and-bound above got accepted by the tester for all but 2 test-cases.
Fortunately, I noticed that the 2 test-cases are using small K (in my case, K < 300), so the DP technique of O(NK) suffices.

soulcheck's (second) DP solution is correct in principle. There are two improvements you can make using these observations:
1) It is unnecessary to allocate the entire DP table. You only ever look at two rows at a time.
2) For each row (the v in P(v, i)), you are only interested in the i's which most increase the max value, which is one more than each i that held the max value in the previous row. Also, i = 1, otherwise you never consider blanks.

I coded it in c++ using DP in O(nlogk).
Idea is to maintain a multiset with next k values for a given position. This multiset will typically have k values in mid processing. Each time you move an element and push new one. Art is how to maintain this list to have the profit[i] + answer[i+2]. More details on set:
/*
* Observation 1: ith state depends on next k states i+2....i+2+k
* We maximize across this states added on them "accumulative" sum
*
* Let Say we have list of numbers of state i+1, that is list of {profit + state solution}, How to get states if ith solution
*
* Say we have following data k = 3
*
* Indices: 0 1 2 3 4
* Profits: 1 3 2 4 2
* Solution: ? ? 5 3 1
*
* Answer for [1] = max(3+3, 5+1, 9+0) = 9
*
* Indices: 0 1 2 3 4
* Profits: 1 3 2 4 2
* Solution: ? 9 5 3 1
*
* Let's find answer for [0], using set of [1].
*
* First, last entry should be removed. then we have (3+3, 5+1)
*
* Now we should add 1+5, but entries should be incremented with 1
* (1+5, 4+3, 6+1) -> then find max.
*
* Could we do it in other way but instead of processing list. Yes, we simply add 1 to all elements
*
* answer is same as: 1 + max(1-1+5, 3+3, 5+1)
*
*/
ll dp()
{
multiset<ll, greater<ll> > set;
mem[n-1] = profit[n-1];
ll sumSoFar = 0;
lpd(i, n-2, 0)
{
if(sz(set) == k)
set.erase(set.find(added[i+k]));
if(i+2 < n)
{
added[i] = mem[i+2] - sumSoFar;
set.insert(added[i]);
sumSoFar += profit[i];
}
if(n-i <= k)
mem[i] = profit[i] + mem[i+1];
else
mem[i] = max(mem[i+1], *set.begin()+sumSoFar);
}
return mem[0];
}

This looks like a linear programming problem. This problem would be linear, but for the requirement that no more than K adjacent billboards may remain.
See wikipedia for a general treatment: http://en.wikipedia.org/wiki/Linear_programming
Visit your university library to find a good textbook on the subject.
There are many, many libraries to assist with linear programming, so I suggest you do not attempt to code an algorithm from scratch. Here is a list relevant to Python: http://wiki.python.org/moin/NumericAndScientific/Libraries

Let P[i] (where i=1..n) be the maximum profit for billboards 1..i IF WE REMOVE billboard i. It is trivial to calculate the answer knowing all P[i]. The baseline algorithm for calculating P[i] is as follows:
for i=1,N
{
P[i]=-infinity;
for j = max(1,i-k-1)..i-1
{
P[i] = max( P[i], P[j] + C[j+1]+..+C[i-1] );
}
}
Now the idea that allows us to speed things up. Let's say we have two different valid configurations of billboards 1 through i only, let's call these configurations X1 and X2. If billboard i is removed in configuration X1 and profit(X1) >= profit(X2) then we should always prefer configuration X1 for billboards 1..i (by profit() I meant the profit from billboards 1..i only, regardless of configuration for i+1..n). This is as important as it is obvious.
We introduce a doubly-linked list of tuples {idx,d}: {{idx1,d1}, {idx2,d2}, ..., {idxN,dN}}.
p->idx is index of the last billboard removed. p->idx is increasing as we go through the list: p->idx < p->next->idx
p->d is the sum of elements (C[p->idx]+C[p->idx+1]+..+C[p->next->idx-1]) if p is not the last element in the list. Otherwise it is the sum of elements up to the current position minus one: (C[p->idx]+C[p->idx+1]+..+C[i-1]).
Here is the algorithm:
P[1] = 0;
list.AddToEnd( {idx=0, d=C[0]} );
// sum of elements starting from the index at top of the list
sum = C[0]; // C[list->begin()->idx]+C[list->begin()->idx+1]+...+C[i-1]
for i=2..N
{
if( i - list->begin()->idx > k + 1 ) // the head of the list is "too far"
{
sum = sum - list->begin()->d
list.RemoveNodeFromBeginning()
}
// At this point the list should containt at least the element
// added on the previous iteration. Calculating P[i].
P[i] = P[list.begin()->idx] + sum
// Updating list.end()->d and removing "unnecessary nodes"
// based on the criterion described above
list.end()->d = list.end()->d + C[i]
while(
(list is not empty) AND
(P[i] >= P[list.end()->idx] + list.end()->d - C[list.end()->idx]) )
{
if( list.size() > 1 )
{
list.end()->prev->d += list.end()->d
}
list.RemoveNodeFromEnd();
}
list.AddToEnd( {idx=i, d=C[i]} );
sum = sum + C[i]
}

//shivi..coding is adictive!!
#include<stdio.h>
long long int arr[100001];
long long int sum[100001];
long long int including[100001],excluding[100001];
long long int maxim(long long int a,long long int b)
{if(a>b) return a;return b;}
int main()
{
int N,K;
scanf("%d%d",&N,&K);
for(int i=0;i<N;++i)scanf("%lld",&arr[i]);
sum[0]=arr[0];
including[0]=sum[0];
excluding[0]=sum[0];
for(int i=1;i<K;++i)
{
sum[i]+=sum[i-1]+arr[i];
including[i]=sum[i];
excluding[i]=sum[i];
}
long long int maxi=0,temp=0;
for(int i=K;i<N;++i)
{
sum[i]+=sum[i-1]+arr[i];
for(int j=1;j<=K;++j)
{
temp=sum[i]-sum[i-j];
if(i-j-1>=0)
temp+=including[i-j-1];
if(temp>maxi)maxi=temp;
}
including[i]=maxi;
excluding[i]=including[i-1];
}
printf("%lld",maxim(including[N-1],excluding[N-1]));
}
//here is the code...passing all but 1 test case :) comment improvements...simple DP

Related

Optimal choice algorithm

i have an appointment for university which is due today and i start getting nervous. We recently discussed dynamic programming for algorithm optimization and now we shall implement an algorithm ourself which uses dynamic programming.
Task
So we have a simple game for which we shall write an algorithm to find the best possible strategy to get the best possible score (assuming both players play optimized).
We have a row of numbers like 4 7 2 3 (note that according to the task description it is not asured that it always is an equal count of numbers). Now each player turnwise takes a number from the back or the front. When the last number is picked the numbers are summed up for each player and the resulting scores for each player are substracted from each other. The result is then the score for player 1. So an optimal order for the above numbers would be
P1: 3 -> p2: 4 -> p1: 7 -> p2: 2
So p1 would have 3, 7 and p2 would have 4, 2 which results in a final score of (3 + 7) - (4 + 2) = 4 for player 1.
In the first task we should simply implement "an easy recursive way of solving this" where i just used a minimax algorithm which seemed to be fine for the automated test. In the second task however i am stuck since we shall now work with dynamic programming techniques. The only hint i found was that in the task itself a matrix is mentioned.
What i know so far
We had an example of a word converting problem where such a matrix was used it was called Edit distance of two words which means how many changes (Insertions, Deletions, Substitutions) of letters does it take to change one word into another. There the two words where ordered as a table or matrix and for each combination of the word the distance would be calculated.
Example:
W H A T
| D | I
v v
W A N T
editing distance would be 2. And you had a table where each editing distance for each substring was displayed like this:
"" W H A T
1 2 3 4
W 1 0 1 2 3
A 2 1 1 2 3
N 3 2 2 2 3
T 4 3 3 3 2
So for example from WHA to WAN would take 2 edits: insert N and delete H, from WH to WAN would also take 2 edits: substitude H->A and insert N and so on. These values where calculated with an "OPT" function which i think stands for optimization.
I also leanred about bottom-up and top-down recursive schemes but im not quite sure how to attach that to my problem.
What i thought about
As a reminder i use the numbers 4 7 2 3.
i learned from the above that i should try to create a table where each possible result is displayed (like minimax just that it will be saved before). I then created a simple table where i tried to include the possible draws which can be made like this (which i think is my OPT function):
4 7 2 3
------------------
a. 4 | 0 -3 2 1
|
b. 7 | 3 0 5 4
|
c. 2 | -2 -5 0 -1
|
d. 3 | -1 -4 1 0
the left column marks player 1 draws, the upper row marks player 2 draws and each number then stands for numberP1 - numberP2. From this table i can at least read the above mentioned optimal strategy of 3 -> 4 -> 7 -> 2 (-1 + 5) so im sure that the table should contain all possible results, but im not quite sure now how to draw the results from it. I had the idea to start iterating over the rows and pick the one with the highest number in it and mark that as the pick from p1 (but that would be greedy anyways). p2 would then search this row for the lowest number and pick that specific entry which would then be the turn.
Example:
p1 picks row a. 7 | 3 0 5 4 since 5 is the highest value in the table. P2 now picks the 3 from that row because it is the lowest (the 0 is an invalid draw since it is the same number and you cant pick that twice) so the first turn would be 7 -> 4 but then i noticed that this draw is not possible since the 7 is not accessible from the start. So for each turn you have only 4 possibilities: the outer numbers of the table and the ones which are directly after/before them since these would be accessable after drawing. So for the first turn i only have rows a. or d. and from that p1 could pick:
4 which leaves p2 with 7 or 3. Or p1 takes 3 which leaves p2 with 4 or 2
But i dont really know how to draw a conclusion out of that and im really stuck.
So i would really like to know if im on the right way with that or if im overthinking this pretty much. Is this the right way to solve this?
The first thing you should try to write down, when starting a dynamic programming algorithm, is a recurrence relation.
Let's first simplify a very little the problem. We will consider that the number of cards is even, and that we want to design an optimal strategy for the first player to play. Once we have managed to solve this version of the problem, the others (odd number of cards, optimize strategy for second player) follows trivially.
So, first, a recurrence relation. Let X(i, j) be the best possible score that player 1 can expect (when player 2 plays optimally as well), when the cards remaining are from the i^th to the j^th ones. Then, the best score that player 1 can expect when playing the game will be represented by X(1, n).
We have:
X(i, j) = max(Arr[i] + X(i+1, j), X(i, j-1) + Arr[j]) if j-i % 2 == 1, meaning that the best score that player one can expect is the best between taking the card on the left, and taking the card on the right.
In the other case, the other player is playing, so he'll try to minimize:
X(i, j) = min(Arr[i] + X(i+1, j), X(i, j-1) + Arr[j]) if j-i % 2 == 0.
The terminal case is trivial: X(i, i) = Arr[i], meaning that when there is only one card, we just pick it, and that's all.
Now the algorithm without dynamic programming, here we only write the recurrence relation as a recursive algorithm:
function get_value(Arr, i, j) {
if i == j {
return Arr[i]
} else if j - i % 2 == 0 {
return max(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
} else {
return min(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
}
}
The problem with this function is that for some given i, j, there will be many redundant calculations of X(i, j). The essence of dynamic programming is to store intermediate results in order to prevent redundant calculations.
Algo with dynamic programming (X is initialized with + inf everywhere.
function get_value(Arr, X, i, j) {
if X[i][j] != +inf {
return X[i][j]
} else if i == j {
result = Arr[i]
} else if j - i % 2 == 0 {
result = max(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
} else {
result = min(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
}
X[i][j] = result
return result
}
As you can see the only difference with the algorithm above is that we now use a 2D array X to store intermediate results. The consequence on time complexity is huge, since the first algorithm runs in O(2^n), while the second runs in O(n²).
Dynamic programming problems can generally be solved in 2 ways, top down and bottom up.
Bottom up requires building a data structure from the simplest to the most complex case. This is harder to write, but offers the option of throwing away parts of the data that you know you won't need again. Top down requires writing a recursive function, and then memoizing. So bottom up can be more efficient, top down is usually easier to write.
I will show both. The naive approach can be:
def best_game(numbers):
if 0 == len(numbers):
return 0
else:
score_l = numbers[0] - best_game(numbers[1:])
score_r = numbers[-1] - best_game(numbers[0:-1])
return max(score_l, score_r)
But we're passing a lot of redundant data. So let's reorganize it slightly.
def best_game(numbers):
def _best_game(i, j):
if j <= i:
return 0
else:
score_l = numbers[i] - _best_game(i+1, j)
score_r = numbers[j-1] - _best_game(i, j-1)
return max(score_l, score_r)
return _best_game(0, len(numbers))
And now we can add a caching layer to memoize it:
def best_game(numbers):
seen = {}
def _best_game(i, j):
if j <= i:
return 0
elif (i, j) not in seen:
score_l = numbers[i] - _best_game(i+1, j)
score_r = numbers[j-1] - _best_game(i, j-1)
seen[(i, j)] = max(score_l, score_r)
return seen[(i, j)]
return _best_game(0, len(numbers))
This approach will be memory and time O(n^2).
Now bottom up.
def best_game(numbers):
# We start with scores for each 0 length game
# before, after, and between every pair of numbers.
# There are len(numbers)+1 of these, and all scores
# are 0.
scores = [0] * (len(numbers) + 1)
for i in range(len(numbers)):
# We will compute scores for all games of length i+1.
new_scores = []
for j in range(len(numbers) - i):
score_l = numbers[j] - scores[j+1]
score_r = numbers[j+i] - scores[j]
new_scores.append(max(score_l, score_r))
# And now we replace scores by new_scores.
scores = new_scores
return scores[0]
This is again O(n^2) time but only O(n) space. Because after I compute the games of length 1 I can throw away the games of length 0. Of length 2, I can throw away the games of length 1. And so on.

2048 game: how many moves did I do?

2048 used to be quite popular just a little while ago. Everybody played it and a lot of people posted nice screenshots with their accomplishments(myself among them). Then at some point I began to wonder if it possible to tell how long did someone play to get to that score. I benchmarked and it turns out that(at least on the android application I have) no more than one move can be made in one second. Thus if you play long enough(and fast enough) the number of moves you've made is quite good approximation to the number of seconds you've played. Now the question is: is it possible having a screenshot of 2048 game to compute how many moves were made.
Here is an example screenshot(actually my best effort on the game so far):
From the screenshot you can see the field layout at the current moment and the number of points that the player has earned. So: is this information enough to compute how many moves I've made and if so, what is the algorithm to do that?
NOTE: I would like to remind you that points are only scored when two tiles "combine" and the number of points scored is the value of the new tile(i.e. the sum of the values of the tiles being combined).
The short answer is it is possible to compute the number of moves using only this information. I will explain the algorithm to do that and I will try to post my answer in steps. Each step will be an observation targeted at helping you solve the problem. I encourage the reader to try and solve the problem alone after each tip.
Observation number one: after each move exactly one tile appears. This tile is either 4 or 2. Thus what we need to do is to count the number of tiles that appeared. At least on the version of the game I played the game always started with 2 tiles with 2 on them placed at random.
We don't care about the actual layout of the field. We only care about the numbers that are present on it. This will become more obvious when I explain my algorithm.
Seeing the values in the cells on the field we can compute what the score would be if 2 had appeared after each move. Call that value twos_score.
The number of fours that have appeared is equal to the difference of twos_score and actual_score divided by 4. This is true because for forming a 4 from two 2-s we would have scored 4 points, while if the 4 appears straight away we score 0. Call the number of fours fours.
We can compute the number of twos we needed to form all the numbers on the field. After that we need to subtract 2 * fours from this value as a single 4 replaces the need of two 2s. Call this twos.
Using this observations we are able to solve the problem. Now I will explain in more details how to perform the separate steps.
How to compute the score if only two appeared?
I will prove that to form the number 2n, the player would score 2n*(n - 1) points(using induction).
The statements is obvious for 2 as it directly appears and therefor no points are scored for it.
Let's assume that for a fixed k for the number 2k the user will score 2k*(k - 1)
For k + 1: 2k + 1 can only be formed by combining two numbers of value 2k. Thus the user will score 2k*(k - 1) + 2k*(k - 1) + 2k+1(the score for the two numbers being combined plus the score for the new number).
This equals: 2k + 1*(k - 1) + 2k+1= 2k+1 * (k - 1 + 1) = 2k+1 * k. This completes the induction.
Therefor to compute the score if only twos appeared we need to iterate over all numbers on the board and accumulate the score we get for them using the formula above.
How to compute the number of twos needed to form the numbers on the field?
It is much easier to notice that the number of twos needed to form 2n is 2n - 1. A strict proof can again be done using induction, but I will leave this to the reader.
The code
I will provide code for solving the problem in c++. However I do not use anything too language specific(appart from vector which is simply a dynamically expanding array) so it should be very easy to port to many other languages.
/**
* #param a - a vector containing the values currently in the field.
* A value of zero means "empty cell".
* #param score - the score the player currently has.
* #return a pair where the first number is the number of twos that appeared
* and the second number is the number of fours that appeared.
*/
pair<int,int> solve(const vector<vector<int> >& a, int score) {
vector<int> counts(20, 0);
for (int i = 0; i < (int)a.size(); ++i) {
for (int j = 0; j < (int)a[0].size(); ++j) {
if (a[i][j] == 0) {
continue;
}
int num;
for (int l = 1; l < 20; ++l) {
if (a[i][j] == 1 << l) {
num = l;
break;
}
}
counts[num]++;
}
}
// What the score would be if only twos appeared every time
int twos_score = 0;
for (int i = 1; i < 20; ++i) {
twos_score += counts[i] * (1 << i) * (i - 1);
}
// For each 4 that appears instead of a two the overall score decreases by 4
int fours = (twos_score - score) / 4;
// How many twos are needed for all the numbers on the field(ignoring score)
int twos = 0;
for (int i = 1; i < 20; ++i) {
twos += counts[i] * (1 << (i - 1));
}
// Each four replaces two 2-s
twos -= fours * 2;
return make_pair(twos, fours);
}
Now to answer how many moves we've made we should add the two values of the pair returned by this function and subtract two because two tiles with 2 appear straight away.

Find best adjacent pair such that to maximize the sum of the first element

I was asked this question in an interview, but couldn't figure it out and would like to know the answer.
Suppose we have a list like this:
1 7 8 6 1 1 5 0
I need to find an algorithm such that it pairs adjacent numbers. The goal is to maximize the benefit but such that only the first number in the pair is counted.
e.g in the above, the optimal solution is:
{7,8} {6,1} {5,0}
so when taking only the first one:
7 + 6 + 5 = 18.
I tried various greedy solutions, but they often pick on {8,6} which leads to a non-optimal solution.
Thoughts?
First, observe that it never makes sense to skip more than one number *. Then, observe that the answer to this problem can be constructed by comparing two numbers:
The answer to the subproblem where you skip the first number, and
The answer to the subproblem where you keep the first number
Finally, observe that the answer to a problem with the sequence of only one number is zero, and the solution to the problem with only two numbers is the first number of the two.
With this information in hand, you can construct a recursive memoized solution to the problem, or a dynamic programming solution that starts at the back and goes back deciding on whether to include the previous number or not.
* Proof: assume that you have a sequence that produces the max sum, and that it skip two numbers in the original sequence. Then you can add the pair that you skipped, and improve on the answer.
A simple dynamic programming problem. Starting from one specific index, we can either make a pair at current index, or skip to the next index:
int []dp;//Array to store result of sub-problem
boolean[]check;//Check for already solve sub-problem
public int solve(int index, int []data){
if(index + 1 >= data.length){//Special case,which cannot create any pair
return 0;
}
if(check[index]){//If this sub-problem is solved before, return the value
return dp[index];
}
check[index] = true;
//Either make a pair at this index, or skip to next index
int result = max(data[index] + solve(index + 2, data) , solve(index + 1,data));
return dp[index] = result;
}
It's a dynamic programming problem, and the table can be optimised away.
def best_pairs(xs):
b0, b1 = 0, max(0, xs[0])
for i in xrange(2, len(xs)):
b0, b1 = b1, max(b1, xs[i-1]+b0)
return b1
print best_pairs(map(int, '1 7 8 6 1 1 5 0'.split()))
After each iteration, b1 is the best solution using elements up to and including i, and b0 is the best solution using elements up to and including i-1.
This is my solution in Java, hope it helps.
public static int getBestSolution(int[] a, int offset) {
if (a.length-offset <= 1)
return 0;
if (a.length-offset == 2)
return a[offset];
return Math.max(a[offset] + getBestSolution(a,offset+2),
getBestSolution(a,offset+1));
}
Here is a DP formulation for O(N) solution : -
MaxPairSum(i) = Max(arr[i]+MaxPairSum(i+2),MaxPairSum(i+1))
MaxPairSum(i) is max sum for subarray (i,N)

Divvying people into rooms by last name?

I often teach large introductory programming classes (400 - 600 students) and when exam time comes around, we often have to split the class up into different rooms in order to make sure everyone has a seat for the exam.
To keep things logistically simple, I usually break the class apart by last name. For example, I might send students with last names A - H to one room, last name I - L to a second room, M - S to a third room, and T - Z to a fourth room.
The challenge in doing this is that the rooms often have wildly different capacities and it can be hard to find a way to segment the class in a way that causes everyone to fit. For example, suppose that the distribution of last names is (for simplicity) the following:
Last name starts with A: 25
Last name starts with B: 150
Last name starts with C: 200
Last name starts with D: 50
Suppose that I have rooms with capacities 350, 50, and 50. A greedy algorithm for finding a room assignment might be to sort the rooms into descending order of capacity, then try to fill in the rooms in that order. This, unfortunately, doesn't always work. For example, in this case, the right option is to put last name A in one room of size 50, last names B - C into the room of size 350, and last name D into another room of size 50. The greedy algorithm would put last names A and B into the 350-person room, then fail to find seats for everyone else.
It's easy to solve this problem by just trying all possible permutations of the room orderings and then running the greedy algorithm on each ordering. This will either find an assignment that works or report that none exists. However, I'm wondering if there is a more efficient way to do this, given that the number of rooms might be between 10 and 20 and checking all permutations might not be feasible.
To summarize, the formal problem statement is the following:
You are given a frequency histogram of the last names of the students in a class, along with a list of rooms and their capacities. Your goal is to divvy up the students by the first letter of their last name so that each room is assigned a contiguous block of letters and does not exceed its capacity.
Is there an efficient algorithm for this, or at least one that is efficient for reasonable room sizes?
EDIT: Many people have asked about the contiguous condition. The rules are
Each room should be assigned at most a block of contiguous letters, and
No letter should be assigned to two or more rooms.
For example, you could not put A - E, H - N, and P - Z into the same room. You could also not put A - C in one room and B - D in another.
Thanks!
It can be solved using some sort of DP solution on [m, 2^n] space, where m is number of letters (26 for english) and n is number of rooms. With m == 26 and n == 20 it will take about 100 MB of space and ~1 sec of time.
Below is solution I have just implemented in C# (it will successfully compile on C++ and Java too, just several minor changes will be needed):
int[] GetAssignments(int[] studentsPerLetter, int[] rooms)
{
int numberOfRooms = rooms.Length;
int numberOfLetters = studentsPerLetter.Length;
int roomSets = 1 << numberOfRooms; // 2 ^ (number of rooms)
int[,] map = new int[numberOfLetters + 1, roomSets];
for (int i = 0; i <= numberOfLetters; i++)
for (int j = 0; j < roomSets; j++)
map[i, j] = -2;
map[0, 0] = -1; // starting condition
for (int i = 0; i < numberOfLetters; i++)
for (int j = 0; j < roomSets; j++)
if (map[i, j] > -2)
{
for (int k = 0; k < numberOfRooms; k++)
if ((j & (1 << k)) == 0)
{
// this room is empty yet.
int roomCapacity = rooms[k];
int t = i;
for (; t < numberOfLetters && roomCapacity >= studentsPerLetter[t]; t++)
roomCapacity -= studentsPerLetter[t];
// marking next state as good, also specifying index of just occupied room
// - it will help to construct solution backwards.
map[t, j | (1 << k)] = k;
}
}
// Constructing solution.
int[] res = new int[numberOfLetters];
int lastIndex = numberOfLetters - 1;
for (int j = 0; j < roomSets; j++)
{
int roomMask = j;
while (map[lastIndex + 1, roomMask] > -1)
{
int lastRoom = map[lastIndex + 1, roomMask];
int roomCapacity = rooms[lastRoom];
for (; lastIndex >= 0 && roomCapacity >= studentsPerLetter[lastIndex]; lastIndex--)
{
res[lastIndex] = lastRoom;
roomCapacity -= studentsPerLetter[lastIndex];
}
roomMask ^= 1 << lastRoom; // Remove last room from set.
j = roomSets; // Over outer loop.
}
}
return lastIndex > -1 ? null : res;
}
Example from OP question:
int[] studentsPerLetter = { 25, 150, 200, 50 };
int[] rooms = { 350, 50, 50 };
int[] ans = GetAssignments(studentsPerLetter, rooms);
Answer will be:
2
0
0
1
Which indicates index of room for each of the student's last name letter. If assignment is not possible my solution will return null.
[Edit]
After thousands of auto generated tests my friend has found a bug in code which constructs solution backwards. It does not influence main algo, so fixing this bug will be an exercise to the reader.
The test case that reveals the bug is students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56]. My solution return no answers, but actually there is an answer. Please note that first part of solution works properly, but answer construction part fails.
This problem is NP-Complete and thus there is no known polynomial time (aka efficient) solution for this (as long as people cannot prove P = NP). You can reduce an instance of knapsack or bin-packing problem to your problem to prove it is NP-complete.
To solve this you can use 0-1 knapsack problem. Here is how:
First pick the biggest classroom size and try to allocate as many group of students you can (using 0-1 knapsack), i.e equal to the size of the room. You are guaranteed not to split a group of student, as this is 0-1 knapsack. Once done, take the next biggest classroom and continue.
(You use any known heuristic to solve 0-1 knapsack problem.)
Here is the reduction --
You need to reduce a general instance of 0-1 knapsack to a specific instance of your problem.
So lets take a general instance of 0-1 knapsack. Lets take a sack whose weight is W and you have x_1, x_2, ... x_n groups and their corresponding weights are w_1, w_2, ... w_n.
Now the reduction --- this general instance is reduced to your problem as follows:
you have one classroom with seating capacity W. Each x_i (i \in (1,n)) is a group of students whose last alphabet begins with i and their number (aka size of group) is w_i.
Now you can prove if there is a solution of 0-1 knapsack problem, your problem has a solution...and the converse....also if there is no solution for 0-1 knapsack, then your problem have no solution, and vice versa.
Please remember the important thing of reduction -- general instance of a known NP-C problem to a specific instance of your problem.
Hope this helps :)
Here is an approach that should work reasonably well, given common assumptions about the distribution of last names by initial. Fill the rooms from smallest capacity to largest as compactly as possible within the constraints, with no backtracking.
It seems reasonable (to me at least) for the largest room to be listed last, as being for "everyone else" not already listed.
Is there any reason to make life so complicated? Why cann't you assign registration numbers to each student and then use the number to allocate them whatever the way you want :) You do not need to write a code, students are happy, everyone is happy.

finding the position of a fraction in farey sequence

For finding the position of a fraction in farey sequence, i tried to implement the algorithm given here http://www.math.harvard.edu/~corina/publications/farey.pdf under "initial algorithm" but i can't understand where i'm going wrong, i am not getting the correct answers . Could someone please point out my mistake.
eg. for order n = 7 and fractions 1/7 ,1/6 i get same answers.
Here's what i've tried for given degree(n), and a fraction a/b:
sum=0;
int A[100000];
A[1]=a;
for(i=2;i<=n;i++)
A[i]=i*a-a;
for(i=2;i<=n;i++)
{
for(j=i+i;j<=n;j+=i)
A[j]-=A[i];
}
for(i=1;i<=n;i++)
sum+=A[i];
ans = sum/b;
Thanks.
Your algorithm doesn't use any particular properties of a and b. In the first part, every relevant entry of the array A is a multiple of a, but the factor is independent of a, b and n. Setting up the array ignoring the factor a, i.e. starting with A[1] = 1, A[i] = i-1 for 2 <= i <= n, after the nested loops, the array contains the totients, i.e. A[i] = phi(i), no matter what a, b, n are. The sum of the totients from 1 to n is the number of elements of the Farey sequence of order n (plus or minus 1, depending on which of 0/1 and 1/1 are included in the definition you use). So your answer is always the approximation (a*number of terms)/b, which is close but not exact.
I've not yet looked at how yours relates to the algorithm in the paper, check back for updates later.
Addendum: Finally had time to look at the paper. Your initialisation is not what they give. In their algorithm, A[q] is initialised to floor(x*q), for a rational x = a/b, the correct initialisation is
for(i = 1; i <= n; ++i){
A[i] = (a*i)/b;
}
in the remainder of your code, only ans = sum/b; has to be changed to ans = sum;.
A non-algorithmic way of finding the position t of a fraction in the Farey sequence of order n>1 is shown in Remark 7.10(ii)(a) of the paper, under m:=n-1, where mu-bar stands for the number-theoretic Mobius function on positive integers taking values from the set {-1,0,1}.
Here's my Java solution that works. Add head(0/1), tail(1/1) nodes to a SLL.
Then start by passing headNode,tailNode and setting required orderLevel.
public void generateSequence(Node leftNode, Node rightNode){
Fraction left = (Fraction) leftNode.getData();
Fraction right= (Fraction) rightNode.getData();
FractionNode midNode = null;
int midNum = left.getNum()+ right.getNum();
int midDenom = left.getDenom()+ right.getDenom();
if((midDenom <=getMaxLevel())){
Fraction middle = new Fraction(midNum,midDenom);
midNode = new FractionNode(middle);
}
if(midNode!= null){
leftNode.setNext(midNode);
midNode.setNext(rightNode);
generateSequence(leftNode, midNode);
count++;
}else if(rightNode.next()!=null){
generateSequence(rightNode, rightNode.next());
}
}

Resources