Does Peterson's algorithm satisfy starvation freedom?

I've been searching for information on Peterson's algorithm and have come across references stating that it satisfies deadlock freedom but not starvation freedom. Is this true? If so, can someone explain why?
Peterson's algorithm:
// Shared variables. The algorithm assumes sequentially consistent memory,
// so on modern hardware they need atomic accesses or memory fences.
int flag[2] = {0, 0};
int turn;
P0:
    flag[0] = 1;
    turn = 1;
    while (flag[1] == 1 && turn == 1)
    {
        // busy wait
    }
    // critical section
    ...
    // end of critical section
    flag[0] = 0;
P1:
    flag[1] = 1;
    turn = 0;
    while (flag[0] == 1 && turn == 0)
    {
        // busy wait
    }
    // critical section
    ...
    // end of critical section
    flag[1] = 0;
The algorithm uses two variables, flag and turn. A flag value of 1 indicates that the process wants to enter the critical section. The variable turn holds the ID of the process whose turn it is. Entrance to the critical section is granted to process P0 if P1 does not want to enter its critical section or if P1 has given priority to P0 by setting turn to 0.
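To make the "no starvation" claim for the two-process version concrete, here is a small Python model checker. It is our own sketch, not taken from any of the answers. It assumes every line of the pseudocode above is one atomic, sequentially consistent step, explores every interleaving of P0 and P1 from both possible initial values of turn, and checks two things: the processes are never in the critical section together, and once a process has set its flag and turn, the other process can enter at most once before it does. The program-counter names (IDLE, SET_TURN, ...) are hypothetical.

```python
from collections import deque

# Hypothetical program-counter encoding of the pseudocode lines for one process
IDLE, SET_TURN, WAIT, CRIT, EXIT = range(5)
CAP = 3  # bypass counts are capped so the state space stays finite

def step(state, i):
    """Return the state after process i executes one atomic step."""
    pcs, flags, turn, bypass = (list(state[0]), list(state[1]),
                                state[2], list(state[3]))
    j = 1 - i
    if pcs[i] == IDLE:                    # flag[i] = 1
        flags[i] = 1
        pcs[i] = SET_TURN
    elif pcs[i] == SET_TURN:              # turn = j (give priority to the other process)
        turn = j
        pcs[i] = WAIT
    elif pcs[i] == WAIT:                  # while (flag[j] == 1 && turn == j) busy wait
        if not (flags[j] == 1 and turn == j):
            pcs[i] = CRIT
            bypass[i] = 0                 # i got in, so its bypass count resets
            if pcs[j] == WAIT:            # i overtook a process that was already waiting
                bypass[j] = min(bypass[j] + 1, CAP)
    elif pcs[i] == CRIT:                  # leave the critical section
        pcs[i] = EXIT
    else:                                 # EXIT: flag[i] = 0
        flags[i] = 0
        pcs[i] = IDLE
    return (tuple(pcs), tuple(flags), turn, tuple(bypass))

def explore():
    """Breadth-first search over every interleaving of P0 and P1."""
    start = [((IDLE, IDLE), (0, 0), t, (0, 0)) for t in (0, 1)]  # turn starts at 0 or 1
    seen, queue = set(start), deque(start)
    mutual_exclusion, max_bypass = True, 0
    while queue:
        state = queue.popleft()
        if state[0] == (CRIT, CRIT):
            mutual_exclusion = False
        max_bypass = max(max_bypass, *state[3])
        for i in (0, 1):
            nxt = step(state, i)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return mutual_exclusion, max_bypass

print(explore())  # (True, 1)
```

A bypass bound of 1 means a waiting P0 can be overtaken by P1 at most once. This is the property the "no starvation" claim for the two-process algorithm rests on.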

As Ben Jackson suspects, the problem is with a generalized algorithm. The standard 2-process Peterson's algorithm satisfies the no-starvation property.
Apparently, Peterson's original paper actually had an algorithm for N processes. Here is a sketch that I just wrote up, in a C++-like language, that is supposedly this algorithm:
// Shared resources
// pos[k]: level reached by process k (-1 = not competing; initialise all to -1)
// step[j]: the last process to arrive at level j
int pos[N], step[N];
bool some_pos_is_big(int i, int j) {
    // is any other process at level j or higher?
    for( int k = 0; k < N; k++ ) {
        if( k != i and pos[k] >= j )
            return true;
    }
    return false;
}
// Individual process code
void process(int i) {
    for( int j = 0; j < N-1; j++ ) {
        pos[i] = j;
        step[j] = i;
        while( step[j] == i and some_pos_is_big(i, j) )
            ; // busy wait
    }
    // insert critical section here!
    pos[i] = -1;
}
Here's a deadlock scenario with N = 3:
Process 0 starts first, sets pos[0] = 0 and step[0] = 0 and then waits.
Process 2 starts next, sets pos[2] = 0 and step[0] = 2 and then waits.
Process 1 starts last, sets pos[1] = 0 and step[0] = 1 and then waits.
Process 2 is the first to notice the change in step[0] and so sets j = 1, pos[2] = 1, and step[1] = 2.
Processes 0 and 1 are blocked because pos[2] is big.
Process 2 is not blocked, so it sets j = 2. It thus escapes the for loop and enters the critical section. After completion, it resets pos[2] but immediately starts competing for the critical section again, thus setting step[0] = 2 and waiting.
Process 1 is the first to notice the change in step[0] and proceeds as process 2 did before.
...
Processes 1 and 2 take turns out-competing process 0.
References: all details are from the paper "Some myths about famous mutual exclusion algorithms" by Alagarsamy. Apparently, Block and Woo proposed a modified algorithm in "A more efficient generalization of Peterson's mutual exclusion algorithm" that does satisfy no-starvation, which Alagarsamy later improved in "A mutual exclusion algorithm with optimally bounded bypasses" (obtaining the optimal bypass bound of N-1).
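The N-process sketch in this answer can be checked mechanically for small N. The following Python breadth-first search is our own model, not code from Alagarsamy's paper, and it rests on a few assumptions. pos[k] == -1 marks a process that is not competing; with 0 as the idle value, a lone process would spin forever at level 0, because pos[k] >= 0 holds for every other k. Each assignment, and the whole some_pos_is_big scan, is treated as a single atomic step. The search explores every interleaving for N = 3 and confirms that at most one process is ever in the critical section. It says nothing about starvation, which is a liveness property and needs a fairness assumption to check.

```python
from collections import deque

N = 3  # number of processes; kept small so the exhaustive search is quick
SET_POS, SET_STEP, WAIT, CRIT = range(4)  # hypothetical program-counter encoding

def successors(state):
    """Yield every state reachable by letting one process take one atomic step."""
    pcs, levels, pos, step = state
    for i in range(N):
        pc2, lv2, pos2, step2 = list(pcs), list(levels), list(pos), list(step)
        j = levels[i]
        if pcs[i] == SET_POS:                 # pos[i] = j
            pos2[i] = j
            pc2[i] = SET_STEP
        elif pcs[i] == SET_STEP:              # step[j] = i
            step2[j] = i
            pc2[i] = WAIT
        elif pcs[i] == WAIT:                  # whole wait condition treated as atomic
            some_pos_is_big = any(k != i and pos[k] >= j for k in range(N))
            if step[j] == i and some_pos_is_big:
                continue                      # still spinning: no new state
            if j + 1 < N - 1:
                lv2[i] = j + 1                # next iteration of the for loop
                pc2[i] = SET_POS
            else:
                pc2[i] = CRIT                 # passed the last level
        else:                                 # CRIT: leave and reset pos[i] = -1
            pos2[i] = -1
            lv2[i] = 0
            pc2[i] = SET_POS
        yield (tuple(pc2), tuple(lv2), tuple(pos2), tuple(step2))

def mutual_exclusion_holds():
    """Explore all reachable states; fail if two processes are ever in CRIT."""
    start = ((SET_POS,) * N, (0,) * N, (-1,) * N, (0,) * (N - 1))
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state[0].count(CRIT) > 1:
            return False
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

print(mutual_exclusion_holds())  # True
```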

A Rex is wrong about the deadlock scenario.
(As a side note: the correct term would be starvation scenario, since a deadlock requires at least two threads to be 'stuck'; see Wikipedia: deadlock and starvation.)
As processes 2 and 1 enter level 0, step[0] is set to either 1 or 2, which makes the waiting condition of process 0 false, since step[0] == 0 no longer holds. Process 0 is therefore free to advance.
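A small Python replay of the disputed steps shows the point. This is our own sketch, and the convention that pos starts at -1 for processes that have not entered any level is an assumption. After processes 0, 2 and 1 arrive at level 0 in that order, only the last arrival, process 1, satisfies the waiting condition. Process 0 stays free even after process 2 moves up to level 1.

```python
N = 3
pos = [-1] * N            # -1: process has not entered any level yet (assumed convention)
step = [None] * (N - 1)   # last process to arrive at each level

def must_wait(i, j):
    """The wait condition from the sketch: step[j] == i and some_pos_is_big(i, j)."""
    some_pos_is_big = any(k != i and pos[k] >= j for k in range(N))
    return step[j] == i and some_pos_is_big

# Processes 0, 2 and 1 arrive at level 0 in that order, as in the scenario.
for p in (0, 2, 1):
    pos[p] = 0
    step[0] = p

print([must_wait(p, 0) for p in range(N)])  # [False, True, False]

# Process 2 moves up to level 1; process 0 is still not blocked at level 0.
pos[2] = 1
step[1] = 2
print(must_wait(0, 0))  # False
```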
The Peterson algorithm for 2 processes is a little simpler and does protect against starvation.
The Peterson algorithm for n processes is much more complicated.
To have a situation where a process starves, the condition step[j] == i and some_pos_is_big(i, j) must be true forever. This implies that no other process enters the same level (which would make step[j] == i false) and that at least one process is always on the same level as i or on a higher level (to keep some_pos_is_big(i, j) true).
Moreover, only one process can be deadlocked at level j. If two were deadlocked, then step[j] == i would be false for one of them, and therefore it wouldn't be deadlocked.
So no other process can enter the same level, and there must always be a process at a level above.
As no other process can join the processes above (since they can't get into level j and therefore not above level j), either at least one process above must be deadlocked too, or the process in the critical section never releases it.
If we assume that the process in the critical section releases it after a finite time, then one of the processes above must be deadlocked.
But for that one to be deadlocked, another one above it must be deadlocked, and so on.
However, there are only finitely many processes above, so eventually the top process can't be deadlocked, as it will advance once the critical section is released.
Therefore, the Peterson algorithm for n processes protects against starvation!

I suspect the comment about starvation is about some generalized, N-process Peterson's Algorithm. It is possible to construct an N-process version with bounded waiting, but without having one in particular to discuss we can't say why that particular generalization might be subject to starvation.
A quick Google search turned up this paper, which includes pseudocode. As you can see, the generalized version is much more complex (and expensive).

Related

Why does n-of show a discontinuity when the size of the reported agentset goes from 2 to 3?

The n-of reporter is one of those reporters making random choices, so we know that if we use the same random-seed we will always get the same agentset out of n-of.
n-of takes two arguments: size and agentset (it can also take lists, but a note on this later). I would expect that it works by drawing a pseudo-random number, using this number to choose an agent from agentset, and repeating this process size times.
If this is true we would expect that, if we test n-of on the same agentset and using the same random-seed, but each time increasing size by 1, every resulting agentset will be the same as in the previous extraction plus a further agent. After all, the sequence of pseudo-random numbers used to pick the first (size - 1) agents was the same as before.
This seems to be confirmed generally. The code below highlights the same patches plus a further one every time size is increased, as shown by the pictures:
to highlight-patches [n]
clear-all
random-seed 123
resize-world -6 6 -6 6
ask n-of n patches [
set pcolor yellow
]
ask patch 0 0 [
set plabel word "n = " n
]
end
But there is an exception: the same does not happen when size goes from 2 to 3. As shown by the pictures below, n-of seems to follow the usual behaviour when starting from a size of 1, but the agentset suddenly changes when size reaches 3 (becoming the agentset of the figures above - which, as far as I can tell, does not change anymore):
What is going on there behind the scenes of n-of that causes this change at this seemingly inexplicable threshold?
In particular, this seems to be the case only for n-of. In fact, using a combination of repeat and one-of doesn't show this discontinuity (or at least as far as I've seen):
to highlight-patches-with-repeat [n]
clear-all
random-seed 123
resize-world -6 6 -6 6
repeat n [
ask one-of patches [
set pcolor yellow
]
]
ask patch 0 0 [
set plabel word "n = " n
]
end
Note that this comparison is not influenced by the fact that n-of guarantees the absence of repetitions while repeat + one-of may have repetitions (in my example above the first repetition happens when size reaches 13). The relevant aspect simply is that the reported agentset of size x is consistent with the reported agentset of size x + 1.
On using n-of on lists instead of agentsets
Doing the same on a list results in different numbers being extracted each time, i.e. the result for one size is not the previous result plus one further number. This looks counter-intuitive to me, since I would expect the same items to be extracted from a list when the extraction is based on the same sequence of pseudo-random numbers. At least it happens consistently, though, so it does not look as ambiguous as the agentset case.
So let's find out how this works together. Let's start by checking the primitive implementation itself. It lives here. Here is the relevant bit with error handling and comments chopped out for brevity:
if (obj instanceof LogoList) {
LogoList list = (LogoList) obj;
if (n == list.size()) {
return list;
}
return list.randomSubset(n, context.job.random);
} else if (obj instanceof AgentSet) {
AgentSet agents = (AgentSet) obj;
int count = agents.count();
return agents.randomSubset(n, count, context.job.random);
}
So we need to investigate the implementations of randomSubset() for lists and agentsets. I'll start with agentsets.
The implementation lives here. And the relevant bits:
val array: Array[Agent] =
resultSize match {
case 0 =>
Array()
case 1 =>
Array(randomOne(precomputedCount, rng.nextInt(precomputedCount)))
case 2 =>
val (smallRan, bigRan) = {
val r1 = rng.nextInt(precomputedCount)
val r2 = rng.nextInt(precomputedCount - 1)
if (r2 >= r1) (r1, r2 + 1) else (r2, r1)
}
randomTwo(precomputedCount, smallRan, bigRan)
case _ =>
randomSubsetGeneral(resultSize, precomputedCount, rng)
}
So there you go. We can see that there is a special case when the resultSize is 2. It generates two random numbers, the second from a range one smaller than the first. It then shifts or swaps them so that the two indices are distinct and ordered (smallRan < bigRan). The comment on the randomTwo() implementation clarifies that this is done as an optimization. There is similarly a special case for 1, but that's just one-of.
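Putting the three branches side by side explains the pattern in the question. The following is a Python model of AgentSet.randomSubset written for illustration, with a few assumptions. random.Random.randrange stands in for NetLogo's rng.nextInt, so the concrete indices will not match NetLogo's. The count of 169 assumes the 13-by-13 world from resize-world -6 6 -6 6. Because the generator is re-seeded before every call (as random-seed 123 does), the size-2 branch reuses the size-1 draw. The general branch makes the same draws against the same bounds for every size from 3 up, so in this model each larger result contains the previous one. Only the step from 2 to 3 switches to a different way of turning the same numbers into agents.

```python
import random

def random_subset_indices(n, count, rng):
    """Python model of AgentSet.randomSubset; returns indices into the agentset (assumes n <= count)."""
    if n == 0:
        return []
    if n == 1:                                   # case 1: a single draw
        return [rng.randrange(count)]
    if n == 2:                                   # case 2: second draw shifted past the first
        r1 = rng.randrange(count)
        r2 = rng.randrange(count - 1)
        return [r1, r2 + 1] if r2 >= r1 else [r2, r1]
    result, i, j = [], 0, 0                      # general case: walk the agents in order
    while j < n:
        if rng.randrange(count - i) < n - j:     # keep agent i with probability (n - j) / (count - i)
            result.append(i)
            j += 1
        i += 1
    return result

def results_for_sizes(seed, count, max_n):
    """Re-seed before every call, like `random-seed 123` in highlight-patches."""
    return {n: random_subset_indices(n, count, random.Random(seed))
            for n in range(1, max_n + 1)}

res = results_for_sizes(seed=123, count=13 * 13, max_n=6)
for n, picked in res.items():
    print(n, picked)
```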
Okay, so now let's check lists. Its implementation of randomSubset() lives over here. Here is the snippet:
def randomSubset(n: Int, rng: Random): LogoList = {
val builder = new VectorBuilder[AnyRef]
var i = 0
var j = 0
while (j < n && i < size) {
if (rng.nextInt(size - i) < n - j) {
builder += this(i)
j += 1
}
i += 1
}
LogoList.fromVector(builder.result)
}
The code is a little obtuse, but for each element in the list it randomly decides whether to add it to the resulting subset. If early items aren't added, the odds for later items go up (to 100% if need be). So changing the overall size of the list changes the numbers that will be generated in the sequence: rng.nextInt(size - i). That would explain why you don't see the same items selected in order when using the same seed but a larger list.
Elaboration
Okay, so let's elaborate on the n = 2 optimization for agentsets. There are a few things we have to know to explain this:
What does the non-optimized code do?
The non-optimized agentset code looks a lot like the list code I already discussed: it iterates over each item in the agentset and randomly decides whether to add it to the result:
val iter = iterator
var i, j = 0
while (j < resultSize) {
val next = iter.next()
if (random.nextInt(precomputedCount - i) < resultSize - j) {
result(j) = next
j += 1
}
i += 1
}
Note that for each item in the agentset, this code performs a couple of arithmetic operations (precomputedCount - i and resultSize - j), the final < comparison, the increments of j and i, and the j < resultSize check for the while loop. It also generates a random number for each checked element (an expensive operation) and calls next() to move the agent iterator forward. If it fills the result set before processing all elements of the agentset, it terminates "early" and saves some of the work. In the worst case, though, it performs all those operations for every element in the agentset, when it winds up needing the last agent to completely "fill" the results.
What does the optimized code do and why is it better?
So now let's check the optimized n = 2 code:
if (!kind.mortal)
Array(
array(smallRandom),
array(bigRandom))
else {
val it = iterator
var i = 0
// skip to the first random place
while(i < smallRandom) {
it.next()
i += 1
}
val first = it.next()
i += 1
while (i < bigRandom) {
it.next()
i += 1
}
val second = it.next()
Array(first, second)
}
First, the check for kind.mortal at the start is basically checking if this is a patch agentset or not. Patches never die, so it's safe to assume all agents in the agentset are alive and you can just return the agents found in the backing array at the two provided random numbers as the result.
So on to the second bit. Here we have to use the iterator to get the agents from the set, because some of them might be dead (turtles or links). The iterator will skip over those for us as we call next() to get the next agent. The only work here is incrementing the index i, checking the while() conditions, and calling next() to move the iterator forward. This works because we know smallRandom is smaller than bigRandom - we're just skipping through the agents and plucking out the ones we want.
Compared to the non-optimized version, we've avoided generating many of the random numbers, we avoid having an extra variable to track the result set count, and we avoid the math and less-than check to determine membership in the result set. That's not bad (especially the savings on RNG operations).
What would the impact be? Well, if you have a large agentset, say 1000 agents, and you are picking 2 of them, the odds of picking any one agent are small (2/1000 for the first agent, in fact). That means the non-optimized code may run for a long time before getting your 2 resulting agents.
So why not optimize for n-of 3, or 4, or 5, etc? Well, let's look back at the code to run the optimized version:
case 2 =>
val (smallRan, bigRan) = {
val r1 = rng.nextInt(precomputedCount)
val r2 = rng.nextInt(precomputedCount - 1)
if (r2 >= r1) (r1, r2 + 1) else (r2, r1)
}
randomTwo(precomputedCount, smallRan, bigRan)
That little logic at the end if (r2 >= r1) (r1, r2 + 1) else (r2, r1) makes sure that smallRan < bigRan; that is strictly less than, not equal. That logic gets much more complex when you need to generate 3, 4, or 5+ random numbers. None of them can be the same, and they all have to be in order. There are ways to quickly sort lists of numbers which might work, but generating random numbers without repetition is much harder.
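To see what "much more complex" means, here is one way the shift trick can be extended to k indices. It is our own Python sketch, not NetLogo code. Draw the m-th number from a range m smaller, then walk the already-chosen indices in ascending order and bump the draw past each one it reaches. For k = 2 this reduces to the if (r2 >= r1) (r1, r2 + 1) else (r2, r1) line. The cost is an inner loop over the chosen indices plus keeping them sorted. The exhaustive check at the end confirms that every k-subset comes out equally often.

```python
from collections import Counter
from itertools import combinations, product

def k_distinct_sorted(draws, count):
    """Turn draws (draws[m] uniform in [0, count - m)) into sorted distinct indices."""
    chosen = []                           # kept sorted ascending
    for m, r in enumerate(draws):
        assert 0 <= r < count - m
        for c in chosen:
            if r >= c:
                r += 1                    # skip over an index that is already taken
        chosen.append(r)
        chosen.sort()
    return tuple(chosen)

def frequencies(count, k):
    """Feed every possible draw sequence through k_distinct_sorted and tally the results."""
    ranges = [range(count - m) for m in range(k)]
    return Counter(k_distinct_sorted(d, count) for d in product(*ranges))

freq = frequencies(6, 3)
assert set(freq) == set(combinations(range(6), 3))
print(set(freq.values()))  # {6}
```

Each of the 6 * 5 * 4 = 120 draw sequences yields a different ordered triple, so each of the 20 unordered triples appears exactly 3! = 6 times.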

fork in nested loops

Can someone explain how many child processes this program creates?
The answer is 127, but I can't understand how it was derived.
#include <unistd.h>
void create(int num) {
    int i;
    for (i = 0; i < num; i++)
        fork();
}
int main(void) {
    int i;
    fork();
    for (i = 0; i < 4; i++)
        create(i);
    return 0;
}
This really sounds like a homework problem for a class on operating systems, but it's an interesting problem, so I'll answer it for you. First off, let's rewrite the code as follows. Functionally, it's the same thing, but it'll make things a little easier to digest. Also, to start, let's ignore that initial fork() call. We'll count how many processes there would be without it, and then when we add it back in, we'll have the same number of processes, times two.
#include <unistd.h>
int main() {
    int i, j;
    // fork();
    for (i = 0; i < 4; i++) {
        for (j = 0; j < i; j++) {
            fork();
        }
    }
}
Now this is partly a math problem, and partly a programming problem. First, we need to understand what happens when we call fork(). When you create a child process, the child inherits its own copy of all of the parent's variables, with the values they had at the time the fork() call was made. So the parent and the child have copies of the exact same variables with the exact same values, but they can modify those variables independently, and they won't affect each other. So in the following simple example,
#include <unistd.h>
int main() {
    int i = 0;
    pid_t pid = fork();
    if (pid == 0) {   // child
        i = 1;
    }
    if (pid > 0) {    // parent
        i = 2;
    }
}
In the parent's world, i gets the value 2, and in the child's world i gets the value 1. These are now separate variables, so the parent can have what it wants, the child can have what it wants, they don't conflict, and everybody's happy.
Now, to answer your problem, we have to keep this in mind. So let's see how many processes we have first without the initial fork() call. Well now, the parent itself will spawn 6 child processes. For each of those processes, the variables (i,j) will have values (1,0), (2,0), (2,1), (3,0), (3,1), and (3,2), respectively.
So the last child spawned at (3,2) will exit the loop, and won't spawn any more children. The child spawned at (3,1) will then continue the for loops, increment j, spawn another process, and then both children will see (i,j) at (3,2), exit the for loops, and then die. Then we had another child spawned by the parent at (3,0). Well now this child will continue through the for loops, spawn a child at (3,1) and (3,2), and then die, and then this new child spawned at (3,1) will spawn another child and then they'll die. I think we can see this is starting to get pretty complex, so we can represent this situation with the following graph.
Each vertex of the graph represents a process, and the vertex labeled p is the parent process. The ordered pair on each of the edges represents the values of (i,j) at the time the child process was spawned. Notice how we can group the processes. In the first group we have 1 process, in the next 2, then 4, then 8, and we should see now how things are going. The next group will have 16 processes, and the next group will have 32. Therefore, if we count all the processes we have, including the parent, we have 64 processes. Make sense so far?
Well now let's put that initial fork() call back in. That will cause the exact same situation that we just described to happen twice, which gives us 128 processes in total, including the parent, which means that we have spawned 127 children.
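The count can also be checked by brute force. This Python sketch is our own, not part of the answer. It flattens the program into the sequence of fork() calls one process walks through: 1 in main plus 0 + 1 + 2 + 3 from the four create(i) calls. It then simulates every process as the index of the next fork() it will execute. Each fork() doubles the processes that reach it, so 7 calls give 2^7 = 128 processes, i.e. 127 children.

```python
def fork_calls():
    """Flatten the program into the ordered fork() calls a single process walks through."""
    calls = ["main: fork()"]                 # the fork() before the loop in main
    for i in range(4):                       # for (i = 0; i < 4; i++) create(i);
        for k in range(i):                   # create(i) calls fork() i times
            calls.append(f"create({i}): fork() #{k + 1}")
    return calls

def count_processes():
    """Simulate every process; return the total number, including the original parent."""
    calls = fork_calls()
    pending = [0]      # each entry: index of the next fork() call that process will run
    total = 0
    while pending:
        pc = pending.pop()
        total += 1
        # this process runs to the end; every fork() it executes creates a child
        # that resumes right after that same call
        for n in range(pc, len(calls)):
            pending.append(n + 1)
    return total

processes = count_processes()
print(len(fork_calls()), processes, processes - 1)  # 7 128 127
```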
So yeah, half math problem, half programming problem. Let me know if you have any questions.
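You could rewrite the first loop to be for (i = 1; i <= n; i++), with n = 3 for this program. Then I'm pretty sure we could say that, in general, your parent process will spawn 2^(n(n+1)/2 + 1) - 1 children, where the exponent counts the n(n+1)/2 fork() calls in the loops plus the initial fork() (n = 3 gives 2^7 - 1 = 127).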

Can shared memory be inconsistent between OpenMP parallel regions?

I'm writing a tool to test some graph algorithms. The tool has to go through all the edges in the graph and mark nodes at either end under certain conditions. It then has to go through all the nodes, ensuring they were all marked. The code should be pretty straight-forward, but I'm having some concurrency issues.
Here is my code:
#pragma omp parallel for reduction(+:seen_edges) default(shared)
for (size_t i = 0; i < n_edges; i++)
{
int64_t v0 = node_left(edges[i]), v1 = node_right(edges[i]);
// Do some work...
// This is where I mark nodes if the other end of the edge corresponds to the parent array
if (v0 != v1)
{
if(parents[v0] == v1)
reached[v0] = true;
if(parents[v1] == v0)
reached[v1] = true;
}
// Do more work...
}
#pragma omp parallel for default(shared)
for (size_t i = 0; i < n_nodes; i++)
{
if (i != source && !reached[i])
error("No traversed edge leading to node", i, &n_errors);
}
The reached array is initialised to false everywhere.
I can guarantee that on the input I'm using all the nodes should be marked, and thus no error should be printed. However, sometimes, some nodes remain unmarked.
I think shared memory should be consistent between OpenMP parallel regions, and I never set any element in reached to false except at initialisation. The implicit barrier at the end of the first region should prevent any thread from going into the second one until all edges have been checked (and all nodes marked on this test input).
I see two possible options, but have no further explanation:
Some kind of data race is going on. But because I never set elements back to false, even if multiple threads try to write to a location at the same time, shouldn't that element eventually become true?
The elements are set to true, but memory is not consistent between threads in the second parallel region. Can this even happen in OpenMP?
If someone has any insight, I'd be grateful. Cheers.
Edit: The // Do work parts don't use the reached array, and parents is never modified in the program.

Is there a way to control partitioning of OpenMP parallel_for construct?

I use OpenMP (OMP) for parallelizing a for loop. However, it seems that OMP will partition my for loop in equal interval sizes, e.g.
for( int i = 0; i < n; ++i ) {
...
}
there are NUM_THREADS blocks, each of size n/NUM_THREADS. Unfortunately, I use this to parallelize a sweep over a triangular matrix, so the last blocks have much more work to do than the first blocks. So what I really want to ask is how to perform load balancing in such a scenario. I could imagine that something like if ( i % THREAD_NUMBER == 0) .. would be fine (in other words, consecutive iterations of the loop are assigned to different threads in turn). I know this is not optimal, as cache behaviour would then suffer; however, is there a way to control the loop partitioning with OMP?
There is a clause called schedule that can be added to your #pragma omp for construct.
With it you can specify how the chunks (what you would call one partition) are distributed over the threads.
A description of the scheduling variants can be found here. For your purpose, dynamic or guided would fit best.
With dynamic, each thread gets a chunk with the same number of iterations (much smaller than the total number of iterations) and requests another chunk when it has finished.
With guided, nearly the same is done, but the chunk size decreases during execution, so a thread first gets a large number of iterations per request and later a smaller number.
I think schedule(guided) is the right choice here. Your statement that the last blocks have more work to do is the opposite of what I would expect, but it depends on how you're doing the loop. Normally I would run over a triangular matrix with something like this:
#pragma omp parallel for schedule(guided)
for(int i=0; i<n-1; i++) {
for(int j=i+1; j<n; j++) {
//M[i][j]
}
}
Let's choose n=101 and look at some schedulers. Assume there are four threads.
If you use schedule(static), which is normally the default (but does not have to be), you get:
Thread one i = 0-24, j = 1-100, j range = 100
Thread two i = 25-49, j = 26-100, j range = 75
Thread three i = 50-74, j = 51-100, j range = 50
Thread four i = 75-99, j = 76-100, j range = 25
So the fourth thread only goes over j 25 times compared to 100 times for the first thread. The load is not balanced. If we switch to schedule(guided) we get:
Thread one i = 0-24, j = 1-100, j range = 100
Thread two i = 25-44, j = 26-100, j range = 75
Thread three i = 45-59, j = 46-100, j range = 55
Thread four i = 60-69, j = 61-100, j range = 40
Thread one i = 70-76, j = 71-100
...
Now the fourth thread runs over j 40 times compared to 100 for thread 1. That's still not evenly balanced, but it's a lot better. And the balance keeps improving as the scheduler hands out smaller chunks for the later iterations.
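Comparing the j range of each thread's first row only hints at the imbalance. This Python sketch is our own arithmetic, not output from an OpenMP runtime. It totals the inner-loop iterations each thread performs for n = 101 and four threads under two static assignments. The first is contiguous blocks of 25 rows, the usual shape of schedule(static) without a chunk size. The second is round-robin single rows, which is what schedule(static, 1) does and which matches the i % THREAD_NUMBER idea from the question. It assumes every inner iteration costs the same.

```python
def row_work(i, n):
    """Inner-loop trip count for row i: j runs from i + 1 to n - 1."""
    return n - 1 - i

def per_thread_work(n, threads, owner):
    """Sum the inner-loop work of every row i, charged to thread owner(i)."""
    work = [0] * threads
    for i in range(n - 1):                   # outer loop: i = 0 .. n - 2
        work[owner(i)] += row_work(i, n)
    return work

n, threads = 101, 4
rows = n - 1
block = -(-rows // threads)                  # ceil(rows / threads) = 25

blocked = per_thread_work(n, threads, lambda i: i // block)      # contiguous blocks
round_robin = per_thread_work(n, threads, lambda i: i % threads)  # chunk size 1

print(blocked)      # [2200, 1575, 950, 325]
print(round_robin)  # [1300, 1275, 1250, 1225]
```

Both assignments do the same 5050 inner iterations in total; only how they are spread over the threads differs.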

Divvying people into rooms by last name?

I often teach large introductory programming classes (400 - 600 students) and when exam time comes around, we often have to split the class up into different rooms in order to make sure everyone has a seat for the exam.
To keep things logistically simple, I usually break the class apart by last name. For example, I might send students with last names A - H to one room, last name I - L to a second room, M - S to a third room, and T - Z to a fourth room.
The challenge in doing this is that the rooms often have wildly different capacities and it can be hard to find a way to segment the class in a way that causes everyone to fit. For example, suppose that the distribution of last names is (for simplicity) the following:
Last name starts with A: 25
Last name starts with B: 150
Last name starts with C: 200
Last name starts with D: 50
Suppose that I have rooms with capacities 350, 50, and 50. A greedy algorithm for finding a room assignment might be to sort the rooms into descending order of capacity, then try to fill in the rooms in that order. This, unfortunately, doesn't always work. For example, in this case, the right option is to put last name A in one room of size 50, last names B - C into the room of size 350, and last name D into another room of size 50. The greedy algorithm would put last names A and B into the 350-person room, then fail to find seats for everyone else.
It's easy to solve this problem by just trying all possible permutations of the room orderings and then running the greedy algorithm on each ordering. This will either find an assignment that works or report that none exists. However, I'm wondering if there is a more efficient way to do this, given that the number of rooms might be between 10 and 20 and checking all permutations might not be feasible.
To summarize, the formal problem statement is the following:
You are given a frequency histogram of the last names of the students in a class, along with a list of rooms and their capacities. Your goal is to divvy up the students by the first letter of their last name so that each room is assigned a contiguous block of letters and does not exceed its capacity.
Is there an efficient algorithm for this, or at least one that is efficient for reasonable room sizes?
EDIT: Many people have asked about the contiguous condition. The rules are
Each room should be assigned at most a block of contiguous letters, and
No letter should be assigned to two or more rooms.
For example, you could not put A - E, H - N, and P - Z into the same room. You could also not put A - C in one room and B - D in another.
Thanks!
It can be solved using a DP over an [m, 2^n] state space, where m is the number of letters (26 for English) and n is the number of rooms. With m == 26 and n == 20 it will take about 100 MB of space and ~1 sec of time.
Below is a solution I just implemented in C# (with a few minor changes it will also compile as C++ or Java):
int[] GetAssignments(int[] studentsPerLetter, int[] rooms)
{
int numberOfRooms = rooms.Length;
int numberOfLetters = studentsPerLetter.Length;
int roomSets = 1 << numberOfRooms; // 2 ^ (number of rooms)
int[,] map = new int[numberOfLetters + 1, roomSets];
for (int i = 0; i <= numberOfLetters; i++)
for (int j = 0; j < roomSets; j++)
map[i, j] = -2;
map[0, 0] = -1; // starting condition
for (int i = 0; i < numberOfLetters; i++)
for (int j = 0; j < roomSets; j++)
if (map[i, j] > -2)
{
for (int k = 0; k < numberOfRooms; k++)
if ((j & (1 << k)) == 0)
{
// this room is empty yet.
int roomCapacity = rooms[k];
int t = i;
for (; t < numberOfLetters && roomCapacity >= studentsPerLetter[t]; t++)
roomCapacity -= studentsPerLetter[t];
// marking next state as good, also specifying index of just occupied room
// - it will help to construct solution backwards.
map[t, j | (1 << k)] = k;
}
}
// Constructing solution.
int[] res = new int[numberOfLetters];
int lastIndex = numberOfLetters - 1;
for (int j = 0; j < roomSets; j++)
{
int roomMask = j;
while (map[lastIndex + 1, roomMask] > -1)
{
int lastRoom = map[lastIndex + 1, roomMask];
int roomCapacity = rooms[lastRoom];
for (; lastIndex >= 0 && roomCapacity >= studentsPerLetter[lastIndex]; lastIndex--)
{
res[lastIndex] = lastRoom;
roomCapacity -= studentsPerLetter[lastIndex];
}
roomMask ^= 1 << lastRoom; // Remove last room from set.
j = roomSets; // Over outer loop.
}
}
return lastIndex > -1 ? null : res;
}
Example from OP question:
int[] studentsPerLetter = { 25, 150, 200, 50 };
int[] rooms = { 350, 50, 50 };
int[] ans = GetAssignments(studentsPerLetter, rooms);
Answer will be:
2
0
0
1
This gives the index of the room for each last-name letter. If no assignment is possible, my solution returns null.
[Edit]
After thousands of auto-generated tests, a friend of mine found a bug in the code that constructs the solution backwards. It does not affect the main algorithm, so fixing it is left as an exercise for the reader.
The test case that reveals the bug is students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56]. My solution returns no answer, but there actually is one. Note that the first part of the solution works properly; only the answer-construction part fails.
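One way to do the exercise is to stop re-deriving the blocks backwards and instead store, for each DP state, which room was added and at which letter its block started. The following Python sketch follows the same [letters, room set] DP and the same greedy "fill the room as far as it goes" step as the C# code, but keeps that back-pointer. It is our own translation, not the answerer's fix. Python lists of lists are far heavier than the C# int array, so it is only meant for modest room counts. It returns a room index per letter, or None.

```python
def get_assignments(students_per_letter, rooms):
    """Return a room index per letter, or None if no contiguous assignment fits."""
    n_letters, n_rooms = len(students_per_letter), len(rooms)
    # back[t][mask] = (k, start): state "letters 0..t-1 placed using room set mask",
    # reached by giving room k the block start..t-1; None means unreachable.
    back = [[None] * (1 << n_rooms) for _ in range(n_letters + 1)]
    back[0][0] = (-1, 0)                       # nothing placed, no rooms used
    for i in range(n_letters):
        for mask in range(1 << n_rooms):
            if back[i][mask] is None:
                continue
            for k in range(n_rooms):
                if mask & (1 << k):
                    continue                   # room k is already used
                # give room k as many consecutive letters as fit (possibly none)
                capacity, t = rooms[k], i
                while t < n_letters and capacity >= students_per_letter[t]:
                    capacity -= students_per_letter[t]
                    t += 1
                if back[t][mask | (1 << k)] is None:
                    back[t][mask | (1 << k)] = (k, i)
    for mask in range(1 << n_rooms):           # any room set that places every letter
        if back[n_letters][mask] is not None:
            break
    else:
        return None
    result, t = [None] * n_letters, n_letters
    while mask:                                # follow the stored back-pointers
        k, start = back[t][mask]
        for letter in range(start, t):
            result[letter] = k
        t, mask = start, mask ^ (1 << k)
    return result

print(get_assignments([25, 150, 200, 50], [350, 50, 50]))  # [1, 0, 0, 2]
print(get_assignments([13, 75, 21, 49, 3, 12, 27, 7], [6, 82, 89, 6, 56]))
```

The second call, the bug-revealing case from the edit, now yields a valid assignment.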
This problem is NP-complete, and thus there is no known polynomial-time (i.e. efficient) solution for it (unless someone proves P = NP). You can reduce an instance of the knapsack or bin-packing problem to your problem to prove that it is NP-complete.
To solve this you can use 0-1 knapsack problem. Here is how:
First pick the biggest classroom and try to allocate as many groups of students as you can (using 0-1 knapsack), i.e. up to the size of the room. You are guaranteed not to split a group of students, as this is 0-1 knapsack. Once done, take the next biggest classroom and continue.
(You can use any known heuristic to solve the 0-1 knapsack problem.)
Here is the reduction --
You need to reduce a general instance of 0-1 knapsack to a specific instance of your problem.
So let's take a general instance of 0-1 knapsack: a sack with capacity W, and groups x_1, x_2, ... x_n with corresponding weights w_1, w_2, ... w_n.
Now the reduction --- this general instance is reduced to your problem as follows:
You have one classroom with seating capacity W. Each x_i (1 ≤ i ≤ n) is a group of students whose last name begins with the i-th letter, and their number (i.e. the size of the group) is w_i.
Now you can prove that if the 0-1 knapsack instance has a solution, your problem has a solution, and conversely, if the 0-1 knapsack instance has no solution, your problem has none either.
Remember the key point of a reduction: a general instance of a known NP-complete problem is mapped to a specific instance of your problem.
Hope this helps :)
Here is an approach that should work reasonably well, given common assumptions about the distribution of last names by initial. Fill the rooms from smallest capacity to largest as compactly as possible within the constraints, with no backtracking.
It seems reasonable (to me at least) for the largest room to be listed last, as being for "everyone else" not already listed.
Is there any reason to make life so complicated? Why can't you assign registration numbers to each student and then use the numbers to allocate them however you want? :) You do not need to write code, students are happy, everyone is happy.
