Known algorithm for efficiently distributing items and satisfying minima?

For the following problem I'm wondering if there is a known algorithm already as I don't want to reinvent the wheel.
In this case it's about hotel rooms, but I think that is rather irrelevant:
name | max guests | min guests
1p   | 1          | 1
2p   | 2          | 2
3p   | 3          | 2
4p   | 4          | 3
I'm trying to distribute a certain number of guests over the available rooms, but the distribution has to satisfy the 'min guests' criterion of each room. Also, the rooms need to be used as efficiently as possible.
Let's take 7 guests for example. I wouldn't want this combination:
3 x 3p ( 1 x 3 guests, 2 x 2 guests )
.. this would satisfy the minimum criteria, but would be inefficient. Rather I'm looking for combinations such as:
1 x 3p and 1 x 4p
3 x 2p and 1 x 1p
etc...
I would think this is a familiar problem. Is there any known algorithm already to solve this problem?
To clarify:
By efficient I mean, distribute guests in such a way that rooms are filled up as much as possible (guests preferences are of secondary concern here, and are not important for the algorithm I'm looking for).
I do want all permutations that satisfy this efficiency criterion, though. So in the above example 7 x 1p would be fine as well.
So in summary:
Is there a known algorithm that is able to distribute items as efficiently as possible over slots with a min and max capacity, always satisfying the min criterion and trying to satisfy the max criterion as much as possible?

You need to use dynamic programming: define a cost function, and try to fit people into the possible rooms while keeping the cost function as small as possible.
Your cost function can be something like:
sum of vacancies in rooms + number of rooms
It is quite similar to the least raggedness problem: Word wrap to X lines instead of maximum width (least raggedness).
You fit people into rooms as you fit words into lines.
The constraints are the vacancies in the rooms instead of the line lengths (infinite cost if you don't fulfil the constraints), and the recursion relation is pretty much the same.
Hope it helps.
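A minimal sketch of this idea in Python (my own illustration, not part of the original answer): room types are assumed to be reusable, and the cost is total vacancy plus the number of rooms used, as suggested above.

```python
# Sketch: memoised recursion over the remaining guest count.
# For each count, try every room type and every allowed occupancy,
# and keep the choice minimising vacancy + number of rooms.
from functools import lru_cache

ROOMS = [("1p", 1, 1), ("2p", 2, 2), ("3p", 2, 3), ("4p", 3, 4)]  # (name, min, max)

@lru_cache(maxsize=None)
def best_cost(guests):
    """Return (cost, rooms) where cost = total vacancy + rooms used."""
    if guests == 0:
        return (0, ())
    best = (float("inf"), ())
    for name, lo, hi in ROOMS:
        for occupy in range(lo, min(hi, guests) + 1):
            sub_cost, sub_rooms = best_cost(guests - occupy)
            cost = sub_cost + (hi - occupy) + 1  # this room's vacancy + one room
            if cost < best[0]:
                best = (cost, sub_rooms + ((name, occupy),))
    return best

print(best_cost(7))
```

With the example room types, 7 guests end up in two fully-booked rooms plus one room count each, i.e. a cost of 2.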

$sql = "SELECT *
        FROM rooms
        WHERE min_guests <= [$num_of_guests]
        ORDER BY max_guests DESC
        LIMIT [$num_of_guests]";
$query = $this->db->query($sql);
$remaining_guests = $num_of_guests;
$rooms = array();
$filled = false;
foreach ($query->result() as $row)
{
    if (!$filled)
    {
        $rooms[] = $row;
        $remaining_guests -= $row->max_guests;
        if ($remaining_guests <= 0)
        {
            $filled = true;
            break;
        }
    }
}
Recursive function:
public function getRoomsForNumberOfGuests($number)
{
    $sql = "SELECT *
            FROM rooms
            WHERE min_guests <= $number
            ORDER BY max_guests DESC
            LIMIT 1";
    $query = $this->db->query($sql);
    $remaining_guests = $number;
    $rooms = array();
    foreach ($query->result() as $row)
    {
        $rooms[] = $row;
        $remaining_guests -= $row->max_guests;
        if ($remaining_guests > 0)
        {
            $rooms = array_merge($this->getRoomsForNumberOfGuests($remaining_guests), $rooms);
        }
    }
    return $rooms;
}
Would something like this work for you? Not sure what language you're in.

For efficient = minimum rooms used, perhaps this would work. To minimise the number of rooms used, you want to put max guests into the large rooms.
So sort the rooms in descending order of max guests, then allocate guests to them in that order, placing max guests in each room in turn. Try to place all remaining guests in any remaining room whose min guests criterion accepts that many; if that is impossible, back-track and try again. When back-tracking, hold back the room with the smallest min guests. Held-back rooms are not allocated guests in the max-guests phase.
EDIT
As Ricky Bobby pointed out, this does not work as such, because of the difficulty of the back-tracking. I'm keeping this answer for now, more as a warning than as a suggestion :-)

This can be done as a fairly straightforward modification of the recursive algorithm to enumerate integer partitions. In Python:
import collections

RoomType = collections.namedtuple("RoomType", ("name", "max_guests", "min_guests"))

def room_combinations(count, room_types, partial_combo=()):
    if count == 0:
        yield partial_combo
    elif room_types and count > 0:
        room = room_types[0]
        for guests in range(room.min_guests, room.max_guests + 1):
            yield from room_combinations(
                count - guests, room_types, partial_combo + ((room.name, guests),)
            )
        yield from room_combinations(count, room_types[1:], partial_combo)

for combo in room_combinations(
    7,
    [
        RoomType("1p", 1, 1),
        RoomType("2p", 2, 2),
        RoomType("3p", 3, 2),
        RoomType("4p", 4, 3),
    ],
):
    print(combo)
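If only the most efficient combinations are wanted, the generator's output can be post-filtered by total vacancy. A self-contained sketch (the generator is repeated here, storing the room rather than its name so vacancy can be computed; this filtering step is my addition, not part of the original answer):

```python
# Enumerate all valid combinations, then keep only those with minimal
# total vacancy (unused capacity across the selected rooms).
import collections

RoomType = collections.namedtuple("RoomType", ("name", "max_guests", "min_guests"))

def room_combinations(count, room_types, partial_combo=()):
    if count == 0:
        yield partial_combo
    elif room_types and count > 0:
        room = room_types[0]
        for guests in range(room.min_guests, room.max_guests + 1):
            yield from room_combinations(
                count - guests, room_types, partial_combo + ((room, guests),)
            )
        yield from room_combinations(count, room_types[1:], partial_combo)

def most_efficient(count, room_types):
    combos = list(room_combinations(count, room_types))
    def vacancy(combo):
        return sum(room.max_guests - guests for room, guests in combo)
    best = min(map(vacancy, combos))
    return [c for c in combos if vacancy(c) == best]

types = [RoomType("1p", 1, 1), RoomType("2p", 2, 2),
         RoomType("3p", 3, 2), RoomType("4p", 4, 3)]
for combo in most_efficient(7, types):
    print([(room.name, guests) for room, guests in combo])
```

For 7 guests the minimum vacancy is 0, so this prints exactly the combinations that fill every selected room, including 7 x 1p.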

We should select a list of rooms (possibly using each room more than once) such that the sum of the min values of the selected rooms is equal to or smaller than the number of guests, and the sum of the max values is equal to or bigger than it.
I defined cost as the total free space in the selected rooms (cost = max - min).
We can check for the answer with this code and find all possible combinations with the minimum cost (C# code):
class FindRooms
{
    // input
    int numberOfGuest = 7; // your goal
    List<Room> rooms;

    // output
    int cost = int.MaxValue;                   // total free space in rooms
    List<String> options = new List<String>(); // possible combinations for the best cost

    private void solve()
    {
        // fill rooms data
        // fill numberOfGuest
        // run this function to find the answer
        addMoreRoom("", 0, 0);
        // cost and options are ready to use
    }

    // this function adds rooms to the list recursively
    private void addMoreRoom(String selectedRooms, int minPerson, int maxPerson)
    {
        // check whether the current selection is acceptable
        if (minPerson <= numberOfGuest && maxPerson >= numberOfGuest)
        {
            // check whether it is better than or equal to the previous result
            if (maxPerson - minPerson == cost)
            {
                options.Add(selectedRooms);
            }
            else if (maxPerson - minPerson < cost)
            {
                cost = maxPerson - minPerson;
                options.Clear();
                options.Add(selectedRooms);
            }
        }
        // check if too many rooms are selected
        if (minPerson > numberOfGuest)
            return;
        // add more rooms recursively
        foreach (Room room in rooms)
        {
            // add the room's min and max space to the current state and recurse
            addMoreRoom(selectedRooms + "," + room.name, minPerson + room.min, maxPerson + room.max);
        }
    }

    public class Room
    {
        public String name;
        public int min;
        public int max;
    }
}


Checking the validity of a pyramid of dominoes

I came across this question in a coding interview and couldn't figure out a good solution.
You are given 6 dominoes. A domino has 2 halves each with a number of spots. You are building a 3-level pyramid of dominoes. The bottom level has 3 dominoes, the middle level has 2, and the top has 1.
The arrangement is such that each level is positioned over the center of the level below it. Here is a visual:
[ 3 | 4 ]
[ 2 | 3 ] [ 4 | 5 ]
[ 1 | 2 ][ 3 | 4 ][ 5 | 6 ]
The pyramid must be set up such that the number of spots on each domino half should be the same as the number on the half beneath it. This doesn't apply to neighboring dominoes on the same level.
Is it possible to build a pyramid from 6 dominoes in the arrangement described above? Dominoes can be freely arranged and rotated.
Write a function that takes an array of 12 ints (such that arr[0], arr[1] are the first domino, arr[2], arr[3] are the second domino, etc.) and returns "YES" or "NO" depending on whether it is possible to create a pyramid with the given 6 dominoes.
Thank you.
You can do better than brute-forcing. I don't have the time for a complete answer, so this is more of a hint.
Count the number of occurrences of each number. It should be at least 3 for at least two numbers and so on. If these conditions are not met, there is no solution. In the next steps, you need to consider the positioning of numbers on the tiles.
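A sketch of what that counting check might look like in Python (the exact threshold logic here is my reading of the hint, not part of the original answer; it is only a necessary condition, not a sufficient one):

```python
# Quick necessary-condition check before any search: the two interior
# bottom-row halves each reappear on the middle and top rows, so a valid
# pyramid needs two values occurring 3+ times, or one value occurring
# 6+ times (when both interior columns share the same number).
from collections import Counter

def may_form_pyramid(halves):  # 12 ints, two per domino
    counts = Counter(halves)
    frequent = [v for v, c in counts.items() if c >= 3]
    return len(frequent) >= 2 or any(c >= 6 for c in counts.values())

print(may_form_pyramid([3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6]))  # the worked example
print(may_form_pyramid([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]))  # ruled out immediately
```

Inputs that fail this check can be rejected without enumerating any arrangements; inputs that pass still need the positional search.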
Just iterate every permutation and check each one. If you find a solution, then you can stop and return "YES". If you get through all permutations then return "NO". There are 6 positions and each domino has 2 rotations, so a total of 12*10*8*6*4*2 = 46080 permutations. Half of these are mirrors of each other so we're doubling our necessary workload, but I don't think that's going to trouble the user. I'd fix the domino orientations, then iterate through all the position permutations, then iterate the orientation permutations and repeat.
So I'd present the algorithm as:
For each permutation of domino orientations
For each permutation of domino positions
if arr[0] == arr[3] && arr[1] == arr[4] && arr[2] == arr[7] && arr[3] == arr[8] && arr[4] == arr[9] && arr[5] == arr[10] then return "YES"
return "NO"
At that point I'd ask the interviewer where they wanted to go from there. We could look at optimisations, equivalences, implementations or move on to something else.
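A Python sketch of this brute force (the slot layout, halves 0-1 on top, 2-5 in the middle and 6-11 on the bottom, matches the index condition in the pseudocode above):

```python
# Try every assignment of the six dominoes to the six slots, in both
# orientations, and test the row-matching condition. 6! * 2^6 = 46080
# combinations, which is trivial to enumerate.
from itertools import permutations, product

def can_build_pyramid(arr):
    dominoes = [(arr[i], arr[i + 1]) for i in range(0, 12, 2)]
    for order in permutations(dominoes):
        for flips in product((False, True), repeat=6):
            a = []
            for (x, y), flip in zip(order, flips):
                a += [y, x] if flip else [x, y]
            # top over middle centre, middle over bottom centre
            if (a[0] == a[3] and a[1] == a[4] and
                    a[2] == a[7] and a[3] == a[8] and
                    a[4] == a[9] and a[5] == a[10]):
                return "YES"
    return "NO"

print(can_build_pyramid([3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6]))
```

The example input from the question is already a valid arrangement, so the identity permutation succeeds immediately.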
We can formulate a recursive solution:
valid_row:
    if row_index < N - 1:
        copy of row must exist two rows below
    if row_index > 2:
        matching left and right must exist
        on the row above, around a center
        of size N - 3, together forming
        a valid_row
    if row_index == N - 1:
        additional matching below must
        exist for the last number on each side
One way to solve it could be backtracking while tracking chosen dominoes along the path. Given the constraints on matching, a six domino pyramid ought to go pretty quick.
Before I start... There is an ambiguity in the question, which may be what the interviewer was more interested in than the answer itself. This would appear to be a question asking for a method to validate one particular arrangement of the values, except for the bit which says "Is it possible to build a pyramid from 6 dominoes in the arrangement described above? Dominoes can be freely arranged and rotated.", which implies that they might want you to also move the dominoes around to find a solution. I'm going to ignore that, and stick with the simple validation of whether it is a valid arrangement. (If rearrangement is required, I'd split the array into pairs, and then brute-force the permutations of the possible arrangements against this code to find the first one that is valid.)
I've selected C# as a language for my solution, but I have intentionally avoided any language features which might make this more readable to a C# person, or perform faster, since the question is not language-specific, so I wanted this to be readable/convertible for people who prefer other languages. That's also the reason why I've used lots of named variables.
Basically check that each row is duplicated in the row below (offset by one), and stop when you reach the last row.
The algorithm drops out as soon as it finds a failure. This algorithm is extensible to larger pyramids; but does no validation of the size of the input array: it will work if the array is sensible.
using System;

public static class DominoPyramid
{
    private const int DominoLength = 2;

    public static void Main()
    {
        int[] values = new int[] { 3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6 };
        bool result = IsDominoPyramidValid(values);
        Console.WriteLine(result ? "YES" : "NO");
    }

    public static bool IsDominoPyramidValid(int[] values)
    {
        int arrayLength = values.Length;
        int offset = 0;
        int currentRow = 1; // Note: I'm using a 1-based value here as it helps the maths
        bool result = true;
        while (result)
        {
            int currentRowLength = currentRow * DominoLength;
            // Avoid checking the final row: there is no row below it
            if (offset + currentRowLength >= arrayLength)
            {
                break;
            }
            result = CheckValuesOnRowAgainstRowBelow(values, offset, currentRowLength);
            offset += currentRowLength;
            currentRow++;
        }
        return result;
    }

    private static bool CheckValuesOnRowAgainstRowBelow(int[] values, int startOfCurrentRow, int currentRowLength)
    {
        int startOfNextRow = startOfCurrentRow + currentRowLength;
        int comparablePointOnNextRow = startOfNextRow + 1;
        for (int i = 0; i < currentRowLength; i++)
        {
            if (values[startOfCurrentRow + i] != values[comparablePointOnNextRow + i])
            {
                return false;
            }
        }
        return true;
    }
}

Find the optimal sum of elements in a list of numbers that is > a given number

I have a list of numbers, sorted descending; the size of the array is variable and any number can appear (typically fewer than 1000 elements).
Given an input number (x), I need to find the optimal combination of values in the list whose sum is larger than x by the smallest amount possible. I've been reading about NP-optimization and subset-sum type problems, but haven't found a solution yet. I've found some pseudocode for an approximate algorithm, but I would like to find the exact optimal solution from the given list of numbers.
Thanks
From your comments I understood that the input array contains unsigned integers.
This problem is then not much different from the subset sum problem with non-negative integers, which can be solved in pseudo-polynomial time with dynamic programming.
I found this to be a reasonably well-performing algorithm that finds the optimal solution:
The algorithm in pseudocode:
For each element of the array:
    select this element
    if sum of selected <= x:
        # Sum is too small, so add more (smaller) term(s)
        execute algorithm recursively for the remaining part of the array
    else if sum of selected < sum in best solution so far:
        # Sum is closer to target, so this is currently the best
        best solution = current selected terms
    # Back-track: remove this term from the sum
    unselect this element
return best solution
When the algorithm is called recursively the previous selected terms remain selected, and in the loop one more term is selected. It can recurse again, etc. The total number of selected terms corresponds to the depth of the recursion.
There are two ways in which the recursion cuts many combinations:
when a high value term is selected, the remaining value for crossing the target value becomes relatively small, reducing the possibilities of which other terms can contribute to a (better) solution;
when a lower value term is selected, the remaining number of terms is relatively small (because of the order), and so also the possibilities are reduced.
This suggests that the time complexity is less than O(2^n), which would be the complexity if all possible combinations (or a constant fraction of them) had to be investigated.
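The pseudocode can be transcribed quite directly into Python (a sketch, assuming a descending-sorted list of non-negative integers, without the early-exit optimisations of the full implementation):

```python
# Back-tracking search: extend the current selection while its sum is
# still <= x; once the sum exceeds x, record it if it beats the best
# sum found so far, then back-track.
def min_sum_over(a, x):
    best = {"sum": None, "terms": []}
    chosen = []

    def recurse(start, total):
        for i in range(start, len(a)):
            chosen.append(a[i])
            s = total + a[i]
            if s <= x:
                # Sum is too small, so add more (smaller) term(s)
                recurse(i + 1, s)
            elif best["sum"] is None or s < best["sum"]:
                # Sum exceeds x and is the closest found so far
                best["sum"] = s
                best["terms"] = chosen.copy()
            chosen.pop()  # back-track: remove this term

    recurse(0, 0)
    return best

print(min_sum_over([18, 13, 12, 10, 9, 8, 6, 6, 1, 0], 16))
```

With the sample list and x = 16 (the same defaults as the interactive demo), the best achievable sum is 17, e.g. 10 + 6 + 1.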
Implementation
Here is an implementation in JavaScript, so you can run it. It offers a randomize button so you can generate an array of any given length with random numbers, and a random target value.
The algorithm seems to run in the order of O(n·log n) time, judging by the number of combinations it checks on average. That is of course not a proof.
The comments in the code should give clarification.
// Main algorithm
function solve(a /* array of int */, x /* int */) {
    // Initialise
    var best = {sum: a[0] * a.length, numSumsVerified: 0, numTerms: 0, terms: []};
    var current = {sum: 0, terms: []};

    function recurse(start) {
        var ok = start < a.length;
        for (var i = start; i < a.length && best.sum > x + 1 && ok; i++) {
            // Use this term for the sum
            current.sum += current.terms[current.terms.length] = a[i];
            // Keep statistics of how many combinations we check
            best.numSumsVerified++;
            if (current.sum <= x) {
                // Sum is too small, so add more (smaller) term(s)
                ok = recurse(i + 1);
            } else if (current.sum < best.sum) {
                // Sum is closer to target, so this is currently the best
                best.sum = current.sum;
                best.terms = current.terms.slice(0);
            }
            // Back-track: remove this term from the sum
            current.sum -= current.terms.pop();
        }
        return ok || i > start + 1;
    }

    // Start the search, and capture errors
    try {
        recurse(0);
    } catch (ex) {
        best.error = 'Too much recursion!';
        best.sum = null;
        return best;
    }
    best.numTerms = best.terms.length;
    // If no solution, set error message
    if (!best.terms.length) {
        best.error = 'no solution';
        best.sum = null;
    }
    return best;
}

// Utility for randomizing
function createRandomNumbers(limit, count) {
    var res = [];
    while (count--) res.push(Math.floor(Math.random() * limit));
    return res;
}

// I/O
var inputA = document.querySelector('#a');
var inputX = document.querySelector('#x');
var buttonSolve = document.querySelector('#solve');
var inputSize = document.querySelector('#size');
var buttonRandom = document.querySelector('#randomize');
var output = document.querySelector('pre');

buttonSolve.onclick = function() {
    // Get input
    var a = inputA.value.match(/\d+/g).map(Number);
    var x = Number(inputX.value);
    // Sort descending
    a.sort(function(a, b) { return b - a; });
    // Solve
    var result = solve(a, x);
    // Output
    inputA.value = a.join(' '); // just for reformatting
    // Reduce detail when many items
    if (result.terms.length > 100) result.terms = '(too many to display)';
    output.textContent = JSON.stringify(result, null, 4);
};

buttonRandom.onclick = function() {
    // Generate random input
    var size = Number(inputSize.value);
    var limit = size * 20;
    var a = createRandomNumbers(limit, size).sort((a, b) => b - a);
    var sum = a.reduce((a, b) => a + b);
    var x = createRandomNumbers(sum, 1).pop();
    // Populate the input boxes
    inputA.value = a.join(' ');
    inputX.value = x;
    // Trigger click on "solve" button
    setTimeout(buttonSolve.click.bind(buttonSolve), 0);
};
Enter list of integers: <input id="a" size="50" value="18 13 12 10 9 8 6 6 1 0"><br>
Sum must be larger than: <input id="x" size="10" value="16"><br>
<button id="solve">Solve</button><br>
Desired array size: <input id="size" size="6" value="50">
<button id="randomize">Random Input</button>
<pre></pre>
As this algorithm performs a recursive call for every term that is added to the combination, the recursion might go deep for large input arrays. At a certain point it may hit stack limits. In my browser that limit gets hit often near array sizes of 10,000 elements. Probably this algorithm can be rewritten without use of recursion if it needs to be used for such large arrays.
This sounds very much like a knapsack problem. If it is, then finding an exact solution is not going to be easy.
Why is that? It is because the best solution may be comprised of potentially any subset for your numbers. There are 2^N subsets, so that is an extremely large space to search for solutions.
How large is it?
Items | Search Space          | How this feels
------+-----------------------+-----------------------------------
   16 | 65,536                | I can do it!
   32 | 4,294,967,296         | Better go get lunch
   40 | 1,099,511,627,776     | If I only had a supercomputer
   50 | 1,125,899,906,842,624 | I will watch as the last stars die
There are tricks and techniques which can reduce the search space, but you'll still hit epically large problems for low item counts.
Given this, you should:
Choose only small problem sizes and use exhaustive search because it is easy to implement.
Learn about branch-and-bound and other search space reducing techniques
Approximate. Everyone does it.

Finding a majority of unorderable items

I have this one problem with finding solution to this task.
You have N students and N courses. A student can attend only one course, and one course can be attended by many students. Two students are classmates if they are attending the same course. How can you find out whether there are N/2 classmates among the N students?
Conditions: you can take two students and ask if they are classmates, and the only answer you can get is "yes" or "no". And you need to do this in O(N*log(N)).
I just need some idea of how to do it; pseudocode will be fine. I guess it will divide the list of students like merge sort does, which gives me the logarithmic part of the complexity. Any ideas would be great.
First, pair off the students (1&2, 3&4, 5&6... etc.), and check which pairs are in the same class. The first student of each matching pair gets "promoted". If there is an "oddball" student, they are in their own class, so they get promoted as well. If a single class contains >=50% of the students, then >=50% of the promoted students are also in this class. If no students are promoted, then if a single class contains >=50% of the students, either the first or the second student must be in that class, so simply promote both of them. This leads to the case where >=50% of the promotions are in the large class. This always takes ⌊N/2⌋ comparisons.
Now when we examine the promoted students, then if a class contains >=50% of the students, then >=50% of the promoted students are in this class. Therefore, we can simply recurse, until we reach a stop condition: there are less than three promoted students. At each step we promote <=50% of the students (plus one sometimes), so this step occurs at most ⌈log(N,2)⌉ times.
If there are less than three promoted students, then we know that if >=50% of the original students are in the class, then at least one of these remaining students is in that class. Therefore, we can simply compare each and every original student against these promoted students, which will reveal either (A) the class with >=50% of the students, or (B) that no class has >=50% of the students. This takes at most (N-1) comparisons, but only occurs once. Note that there is the possibility where all the original students match with one of the two remaining students evenly, and this detects that both classes have =50% of the students.
So the complexity is N/2 *~ log(N,2) + N-1. However, the *~ signifies that we don't iterate over all N/2 students at each of the log(N,2) iterations, only decreasing fractions N/2, N/4, N/8..., which sum to N. So the total complexity is N/2 + N/2 + N-1 = 2N-1, and when we remove the constants we get O(N). (I feel like I may have made a math error in this paragraph. If someone spots it, let me know)
See it in action here: http://coliru.stacked-crooked.com/a/144075406b7566c2 (The comparison counts may be slightly over the estimate due to simplifications I made in the implementation)
Key here is that if >50% of the students are in a class, then >=50% of the arbitrary pairs are in that class, assuming the oddball student matches himself. One trick is that if exactly 50% match, it's possible that they alternate in the original order perfectly and thus nobody gets promoted. Luckily, the only cases is the alternating, so by promoting the first and second students, then even in that edge case, >=50% of the promotions are in the large class.
It's complicated to prove that >=50% of the promotions are in the large class, and I'm not even certain I can articulate why this is. Confusingly, it also doesn't hold prettily for any other fractions. If the target is >=30% of the comparisons, it's entirely possible that none of the promoted students are in the target class(s). So >=50% is the magic number, it isn't arbitrary at all.
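A Python sketch of the pairing-and-promotion scheme described above (my own illustration: a student is represented simply by their course label, and the classmate test is plain equality):

```python
# Repeatedly pair off the current group and promote one student from each
# matching pair (plus the oddball, or the first two students if nobody
# matched), until fewer than three candidates remain. Then verify each
# surviving candidate against all original students.
def has_half_classmates(students):
    group = list(students)
    while len(group) >= 3:
        promoted = []
        for i in range(0, len(group) - 1, 2):
            if group[i] == group[i + 1]:       # "are they classmates?"
                promoted.append(group[i])
        if len(group) % 2 == 1:
            promoted.append(group[-1])         # oddball promotes itself
        if not promoted:
            promoted = group[:2]               # alternating edge case
        group = promoted
    # Compare every original student against the surviving candidates.
    for candidate in group:
        count = sum(1 for s in students if s == candidate)
        if 2 * count >= len(students):
            return True
    return False
```

For example, `has_half_classmates(["a"] * 5 + ["b", "c", "d", "e", "f"])` finds the majority class, while six students in six distinct courses yield a negative answer.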
If one can know the number of students for each course then it should suffice to know if there is a course with a number of students >= N/2. In this case you have a complexity of O(N) in the worst case.
If it is not possible to know the number of students for each course then you could use an altered quicksort. In each cycle you pick a random student and split the other students in classmates and non-classmates. If the number of classmates is >= N/2 you stop because you have the answer, else you analyze the non-classmates partition. If the number of students in that partition is < N/2 you stop because it is not possible to have a quantity of classmates >= N/2, else you pick another student from the non-classmates partition and repeat everything using only the non-classmates elements.
What we take from the quicksort algorithm is just the way we partition the students. The above algorithm has nothing to do with sorting. In pseudo-code it would look something like this (array indexing starts from 1 for the sake of clarity):
Student[] students = all_students;
int startIndex = 1;
int endIndex = N; // number of students
int i;
while (startIndex <= N/2) {
    endIndex = N; // this index must point to the last position in each cycle
    students.swap(startIndex, startIndex + random_int % (endIndex - startIndex));
    for (i = startIndex + 1; i < endIndex;) {
        if (students[startIndex].isClassmatesWith(students[i])) {
            i++;
        } else {
            students.swap(i, endIndex);
            endIndex--;
        }
    }
    if (i - startIndex >= N/2) {
        return true;
    }
    startIndex = i;
}
return false;
The situation of the partition before the algorithm starts would be as simple as:
| all_students_that_must_be_analyzed |
during the first run the set of students would be partitioned this way:
| classmates | to_be_analyzed | not_classmates |
and during each run after it, the set of students would be partitioned as follows:
| to_ignore | classmates | to_be_analyzed | not_classmates |
At the end of each run the set of students would be partitioned in the following way:
| to_ignore | classmates | not_classmates |
At this moment we need to check whether the classmates partition has more than N/2 elements. If it does, we have a positive result; if not, we need to check whether the not_classmates partition has >= N/2 elements. If it does, we proceed with another run; otherwise we have a negative result.
Regarding complexity
Thinking more in depth on the complexity of the above algorithm, there are two main factors that affect it, which are:
The number of students in each course (it's not necessary to know this number for the algorithm to work).
The average number of classmates found in each iteration of the algorithm.
An important part of the algorithm is the random choice of the student to be analyzed.
The worst case scenario would be when each course has 1 student. In this case (for obvious reasons I would say) the complexity would be O(N^2). If the number of students for the courses varies then this case won't happen.
An example of the worst case scenario would be when we have, let's say, 10 students, 10 courses, and 1 student for each course. We would check 10 students the first time, 9 students the second time, 8 students the third time, and so on. This brings a O(N^2) complexity.
The best case scenario would be when the first student you choose is in a course with a number of students >= N/2. In this case the complexity would be O(N) because it would stop in the first run.
An example of the best case scenario would be when we have 10 students, 5 (or more) of which are classmates, and in the first run we pick one of these 5 students. In this case we would check only 1 time for the classmates, find 5 classmates, and return true.
The average case scenario is the most interesting part (and more close to a real-world scenario). In this case there are some probabilistic calculations to make.
First of all, the chances of a student from a particular course to be picked are [number_of_students_in_the_course] / N. This means that, in the first runs it's more probable to pick a student with many classmates.
That being said, let's consider the case where the average number of classmates found in each iteration is a number smaller than N/2 (as is the length of each partition in the average case for quicksort). Let's say that the average amount of classmates found in each iteration is 10% (a number taken for ease of calculations) of the remaining M students (that are not classmates of the previously picked students). In this case we would have these values of M for each iteration:
M1 = N - 0.1*N = 0.9*N
M2 = M1 - 0.1*M1 = 0.9*M1 = 0.9*0.9*N = 0.81*N
M3 = M2 - 0.1*M2 = 0.9*M2 = 0.9*0.81*N = 0.729*N and I would round it to 0.73*N for ease of calculations
M4 = 0.9*M3 = 0.9*0.73*N = 0.657*N ~= 0.66*N
M5 = 0.9*M4 = 0.9*0.66*N = 0.594*N ~= 0.6*N
M6 = 0.9*M5 = 0.9*0.6*N = 0.54*N
M7 = 0.9*M6 = 0.9*0.54*N = 0.486*N ~= 0.49*N
The algorithm stops because we have 49% of remaining students and we can't have more than N/2 classmates among them.
Obviously, in the case of a smaller percentage of average classmates the number of iterations will be greater but, combined with the first fact (the students in courses with many students have a higher probability to get picked in the early iterations), the complexity will tend towards O(N), the number of iterations (in the outer cycle in the pseudo-code) will be (more or less) constant and will not depend on N.
To better explain this scenario let's work with greater (but more realistic) numbers and more than 1 distribution. Let's say that we have 100 students (number taken for the sake of simplicity in calculations) and that these students are distributed among the courses in one of the following (hypothetical) ways (the numbers are sorted just for explanation purposes, they are not necessary for the algorithm to work):
50, 30, 10, 5, 1, 1, 1, 1, 1
35, 27, 25, 10, 5, 1, 1, 1
11, 9, 9, 8, 7, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 1
The numbers given are also (in this particular case) the probabilities that a student in a course (not a particular student, just a student of that course) is picked in the first run. The 1st case is when we have a course with half the students. The 2nd case is when we don't have a course with half the students but more than 1 course with many students. The 3rd case is when we have a similar distribution among the courses.
In the 1st case we would have a 50% probability that a student of the 1st course gets picked in the first run, 30% probability that a student of the 2nd course gets picked, 10% probability that a student of the 3rd course gets picked, 5% probability that a student of the 4th course gets picked, 1% that a student from the 5th course gets picked, and so on for the 6th, 7th, 8th, and 9th course. The probabilities are higher for a student of the 1st course to get picked early, and if a student from this course does not get picked in the first run, the probability that one gets picked in the second run only increases. For example, let's suppose that in the 1st run a student from the second course is picked. 30% of the students would be "removed" (as in "not considered anymore") and not be analyzed in the 2nd run. In the 2nd run we would have 70 students remaining. The probability to pick a student from the 1st course in the second run would be 5/7, more than 70%. Let's suppose that - out of bad luck - in the 2nd run a student from the 3rd course gets picked. In the 3rd run we would have 60 students left and the probability that a student from the first course gets picked in the 3rd run would be 5/6 (more than 80%). I would say that we can consider our bad luck to be over in the 3rd run, a student from the 1st course gets picked, and the method returns true :)
For the 2nd and 3rd case I would follow the probabilities for each run, just for the sake of simplicity of calculations.
In the 2nd case we would have a student from the 1st course picked in the 1st run. Since the number of classmates is less than N/2, the algorithm would go on with the 2nd run. At the end of the 2nd run we would have "removed" 35+27=62 students from the student set. In the 3rd run we would have 38 students left, and since 38 < (N/2) = 50 the computation stops and returns false.
The same happens in the 3rd case (in which we "remove" an average of 10% of the remaining students in each run), but with more steps.
Final considerations
In any case, the complexity of the algorithm in the worst case scenario is O(N^2). The average case scenario is heavily based on probabilities and tends to pick early the students from courses with many attendees. This behaviour tends to bring the complexity down to O(N), complexity that we also have in the best case scenario.
Test of the algorithm
In order to test the theoretical complexity of the algorithm I wrote the following code in C#:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Course
{
    public int ID { get; set; }

    public Course() : this(0) { }

    public Course(int id)
    {
        ID = id;
    }

    public override bool Equals(object obj)
    {
        return (obj is Course) && this.Equals((Course)obj);
    }

    public bool Equals(Course other)
    {
        return ID == other.ID;
    }
}

public class Student
{
    public int ID { get; set; }
    public Course Class { get; set; }

    public Student(int id, Course course)
    {
        ID = id;
        Class = course;
    }

    public Student(int id) : this(id, null) { }

    public Student() : this(0) { }

    public bool IsClassmatesWith(Student other)
    {
        return Class == other.Class;
    }

    public override bool Equals(object obj)
    {
        return (obj is Student) && this.Equals((Student)obj);
    }

    public bool Equals(Student other)
    {
        return ID == other.ID && Class == other.Class;
    }
}

class Program
{
    static int[] Sizes { get; set; }
    static List<Student> Students { get; set; }
    static List<Course> Courses { get; set; }

    static void Initialize()
    {
        Sizes = new int[] { 2, 10, 100, 1000, 10000, 100000, 1000000 };
        Students = new List<Student>();
        Courses = new List<Course>();
    }

    static void PopulateCoursesList(int size)
    {
        for (int i = 1; i <= size; i++)
        {
            Courses.Add(new Course(i));
        }
    }

    static void PopulateStudentsList(int size)
    {
        Random ran = new Random();
        for (int i = 1; i <= size; i++)
        {
            Students.Add(new Student(i, Courses[ran.Next(Courses.Count)]));
        }
    }

    static void Swap<T>(List<T> list, int i, int j)
    {
        if (i < list.Count && j < list.Count)
        {
            T temp = list[i];
            list[i] = list[j];
            list[j] = temp;
        }
    }

    static bool AreHalfOfStudentsClassmates()
    {
        int startIndex = 0;
        int endIndex;
        int i;
        int numberOfStudentsToConsider = (Students.Count + 1) / 2;
        Random ran = new Random();
        while (startIndex <= numberOfStudentsToConsider)
        {
            endIndex = Students.Count - 1;
            Swap(Students, startIndex, startIndex + ran.Next(endIndex + 1 - startIndex));
            for (i = startIndex + 1; i <= endIndex; )
            {
                if (Students[startIndex].IsClassmatesWith(Students[i]))
                {
                    i++;
                }
                else
                {
                    Swap(Students, i, endIndex);
                    endIndex--;
                }
            }
            if (i - startIndex + 1 >= numberOfStudentsToConsider)
            {
                return true;
            }
            startIndex = i;
        }
        return false;
    }

    static void Main(string[] args)
    {
        Initialize();
        int studentsSize, coursesSize;
        Stopwatch stopwatch = new Stopwatch();
        TimeSpan duration;
        bool result;
        for (int i = 0; i < Sizes.Length; i++)
        {
            for (int j = 0; j < Sizes.Length; j++)
            {
                Courses.Clear();
                Students.Clear();
                studentsSize = Sizes[j];
                coursesSize = Sizes[i];
                PopulateCoursesList(coursesSize);
                PopulateStudentsList(studentsSize);
                Console.WriteLine("Test for {0} students and {1} courses.", studentsSize, coursesSize);
                stopwatch.Restart(); // Restart, not Start: otherwise Elapsed accumulates across tests
                result = AreHalfOfStudentsClassmates();
                stopwatch.Stop();
                duration = stopwatch.Elapsed;
                var studentsGrouping = Students.GroupBy(s => s.Class);
                var classWithMoreThanHalfOfTheStudents = studentsGrouping.FirstOrDefault(g => g.Count() >= (studentsSize + 1) / 2);
                Console.WriteLine(result ? "At least half of the students are classmates." : "Less than half of the students are classmates.");
                if ((result && classWithMoreThanHalfOfTheStudents == null)
                    || (!result && classWithMoreThanHalfOfTheStudents != null))
                {
                    Console.WriteLine("There is something wrong with the result");
                }
                Console.WriteLine("Test duration: {0}", duration);
                Console.WriteLine();
            }
        }
        Console.ReadKey();
    }
}
The execution time matched the expectations of the average case scenario. Feel free to play with the code; you just need to copy and paste it and it should work.
I will post some of my ideas.
First of all, I think we need to do something like mergesort to get that logarithmic part. My first thought was that at the lowest level, where we have just 2 students to compare, we simply ask and get an answer. But that doesn't solve anything: we would just have N/2 pairs of students and know, for each pair, whether they are classmates or not. That doesn't help.
My next idea was a little better. I didn't divide the set down to the minimum level; I stopped when I had sets of 4 students, i.e. N/4 small sets in which I compared everyone to each other. If I found that at least two of them were classmates, good. If all of them were from different classes, I discarded that group of 4 entirely. After applying this to each group, I started joining them into groups of 8 by comparing only the students already flagged as classmates (thanks to transitivity). Again, if there were at least 4 classmates in a group of 8, I was happy; if not, I discarded that group. This would be repeated until I had two sets of students, with one final comparison between them to get the answer. BUT the problem is that there can be N/2 - 1 classmates in one half and just one student in the other half matching them, and this algorithm doesn't handle that case.

Divvying people into rooms by last name?

I often teach large introductory programming classes (400 - 600 students) and when exam time comes around, we often have to split the class up into different rooms in order to make sure everyone has a seat for the exam.
To keep things logistically simple, I usually break the class apart by last name. For example, I might send students with last names A - H to one room, last name I - L to a second room, M - S to a third room, and T - Z to a fourth room.
The challenge in doing this is that the rooms often have wildly different capacities and it can be hard to find a way to segment the class in a way that causes everyone to fit. For example, suppose that the distribution of last names is (for simplicity) the following:
Last name starts with A: 25
Last name starts with B: 150
Last name starts with C: 200
Last name starts with D: 50
Suppose that I have rooms with capacities 350, 50, and 50. A greedy algorithm for finding a room assignment might be to sort the rooms into descending order of capacity, then try to fill in the rooms in that order. This, unfortunately, doesn't always work. For example, in this case, the right option is to put last name A in one room of size 50, last names B - C into the room of size 350, and last name D into another room of size 50. The greedy algorithm would put last names A and B into the 350-person room, then fail to find seats for everyone else.
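The failure mode above is easy to reproduce. Here is a minimal sketch in Python (the helper name `greedy_assign` is mine, not from any library) of the greedy strategy described:

```python
def greedy_assign(groups, rooms):
    """Greedy strategy: sort rooms by descending capacity, then pack the
    letter groups, in alphabetical order, into each room in turn.
    Returns one block of group indices per room, or None on failure."""
    assignment, g = [], 0
    for cap in sorted(rooms, reverse=True):
        block = []
        while g < len(groups) and groups[g] <= cap:
            cap -= groups[g]
            block.append(g)
            g += 1
        assignment.append(block)
    return assignment if g == len(groups) else None

# The instance from above: greedy packs A and B into the 350-seat room
# and then cannot seat the C group (200 students) anywhere.
print(greedy_assign([25, 150, 200, 50], [350, 50, 50]))  # -> None
```

A valid assignment does exist (A in one 50-seat room, B and C in the 350-seat room, D in the other 50-seat room), which is exactly why something smarter than greedy is needed.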
It's easy to solve this problem by just trying all possible permutations of the room orderings and then running the greedy algorithm on each ordering. This will either find an assignment that works or report that none exists. However, I'm wondering if there is a more efficient way to do this, given that the number of rooms might be between 10 and 20 and checking all permutations might not be feasible.
To summarize, the formal problem statement is the following:
You are given a frequency histogram of the last names of the students in a class, along with a list of rooms and their capacities. Your goal is to divvy up the students by the first letter of their last name so that each room is assigned a contiguous block of letters and does not exceed its capacity.
Is there an efficient algorithm for this, or at least one that is efficient for reasonable room sizes?
EDIT: Many people have asked about the contiguous condition. The rules are
Each room should be assigned at most a block of contiguous letters, and
No letter should be assigned to two or more rooms.
For example, you could not put A - E, H - N, and P - Z into the same room. You could also not put A - C in one room and B - D in another.
Thanks!
It can be solved with a DP over an [m, 2^n] state space, where m is the number of letters (26 for English) and n is the number of rooms. With m == 26 and n == 20 it will take about 100 MB of space and ~1 sec of time.
Below is a solution I have just implemented in C# (it will compile in C++ or Java too; just a few minor changes will be needed):
int[] GetAssignments(int[] studentsPerLetter, int[] rooms)
{
    int numberOfRooms = rooms.Length;
    int numberOfLetters = studentsPerLetter.Length;
    int roomSets = 1 << numberOfRooms; // 2 ^ (number of rooms)
    int[,] map = new int[numberOfLetters + 1, roomSets];
    for (int i = 0; i <= numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            map[i, j] = -2;
    map[0, 0] = -1; // starting condition

    for (int i = 0; i < numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            if (map[i, j] > -2)
            {
                for (int k = 0; k < numberOfRooms; k++)
                    if ((j & (1 << k)) == 0)
                    {
                        // this room is empty yet.
                        int roomCapacity = rooms[k];
                        int t = i;
                        for (; t < numberOfLetters && roomCapacity >= studentsPerLetter[t]; t++)
                            roomCapacity -= studentsPerLetter[t];
                        // marking next state as good, also specifying index of just occupied room
                        // - it will help to construct solution backwards.
                        map[t, j | (1 << k)] = k;
                    }
            }

    // Constructing solution.
    int[] res = new int[numberOfLetters];
    int lastIndex = numberOfLetters - 1;
    for (int j = 0; j < roomSets; j++)
    {
        int roomMask = j;
        while (map[lastIndex + 1, roomMask] > -1)
        {
            int lastRoom = map[lastIndex + 1, roomMask];
            int roomCapacity = rooms[lastRoom];
            for (; lastIndex >= 0 && roomCapacity >= studentsPerLetter[lastIndex]; lastIndex--)
            {
                res[lastIndex] = lastRoom;
                roomCapacity -= studentsPerLetter[lastIndex];
            }
            roomMask ^= 1 << lastRoom; // Remove last room from set.
            j = roomSets; // Over outer loop.
        }
    }
    return lastIndex > -1 ? null : res;
}
Example from OP question:
int[] studentsPerLetter = { 25, 150, 200, 50 };
int[] rooms = { 350, 50, 50 };
int[] ans = GetAssignments(studentsPerLetter, rooms);
Answer will be:
2
0
0
1
This indicates the room index for each first letter of the students' last names. If no assignment is possible, my solution returns null.
[Edit]
After thousands of auto-generated tests a friend of mine found a bug in the code that constructs the solution backwards. It does not affect the main algorithm, so fixing this bug is left as an exercise to the reader.
The test case that reveals the bug is students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56]. My solution returns no answer, but an answer actually exists. Note that the first part of the solution works properly; only the answer-construction part fails.
This problem is NP-complete and thus there is no known polynomial-time (aka efficient) solution for it (unless P = NP). You can reduce an instance of the knapsack or bin-packing problem to your problem to prove it is NP-complete.
To solve this you can use 0-1 knapsack problem. Here is how:
First pick the biggest classroom and try to allocate as many groups of students as you can (using 0-1 knapsack), i.e. filling it up to its capacity. You are guaranteed not to split a group of students, as this is 0-1 knapsack. Once done, take the next biggest classroom and continue.
(You can use any known heuristic to solve the 0-1 knapsack problem.)
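A minimal sketch of that per-room knapsack step in Python (the helper names are mine; note that this, like the approach itself, does not enforce the contiguous-letters constraint from the question):

```python
def best_fill(groups, capacity):
    """0-1 knapsack: pick the subset of (index, size) groups whose total
    size is as large as possible without exceeding capacity."""
    best = {0: []}  # reachable total size -> indices of chosen groups
    for i, size in groups:
        for total, chosen in list(best.items()):
            t = total + size
            if t <= capacity and t not in best:
                best[t] = chosen + [i]
    return best[max(best)]

def assign_rooms(sizes, rooms):
    """Fill rooms from biggest to smallest, each time solving a knapsack
    over the remaining groups. Returns {group index: room index} or None."""
    remaining = list(enumerate(sizes))
    result = {}
    for room, cap in sorted(enumerate(rooms), key=lambda rc: -rc[1]):
        for i in best_fill(remaining, cap):
            result[i] = room
        remaining = [(i, s) for i, s in remaining if i not in result]
    return result if not remaining else None

# The instance from the question: B and C fill the 350-seat room exactly,
# then D and A each take a 50-seat room.
print(assign_rooms([25, 150, 200, 50], [350, 50, 50]))
```

This per-room greedy over knapsack solutions is a heuristic, not guaranteed optimal in general, but it matches the procedure described above.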
Here is the reduction --
You need to reduce a general instance of 0-1 knapsack to a specific instance of your problem.
So let's take a general instance of 0-1 knapsack: a sack whose capacity is W, and groups x_1, x_2, ..., x_n with corresponding weights w_1, w_2, ..., w_n.
Now the reduction --- this general instance is reduced to your problem as follows:
You have one classroom with seating capacity W. Each x_i (i \in (1,n)) is a group of students whose last names begin with the i-th letter, and their number (aka the size of the group) is w_i.
Now you can prove that if the 0-1 knapsack instance has a solution, your problem has a solution, and conversely: if the 0-1 knapsack instance has no solution, then your problem has no solution either.
Please remember the important thing of reduction -- general instance of a known NP-C problem to a specific instance of your problem.
Hope this helps :)
Here is an approach that should work reasonably well, given common assumptions about the distribution of last names by initial. Fill the rooms from smallest capacity to largest as compactly as possible within the constraints, with no backtracking.
It seems reasonable (to me at least) for the largest room to be listed last, as being for "everyone else" not already listed.
Is there any reason to make life so complicated? Why can't you assign registration numbers to each student and then use the numbers to allocate them whatever way you want :) You do not need to write any code, students are happy, everyone is happy.

VBScript Poker Game -- What hand do I have?

Odd little project I am working on. Before you answer, yes, I know that vbscript is probably the worst language to use for this.
I need help determining what each player has. Each card has a unique number (which I 'translate' into its poker value with a ♥♦♣♠ next to it). For example:
A♥ = 0
2♥ = 1
3♥ = 2
...
and so on. I need help determining what hand I have. I have thought of a few ways. The first is using the delta between each card value. For example, a straight would be:
n
n +/- (1+ (13 * (0 or 1 or 2 or 3)))
n +/- (2 + (13 * (0 or 1 or 2 or 3 )))
...
and so on. For example cards 3, 3+1+0, 3+2+13, 3+3+(13*3), 3+4+(13*2)
would give me:
4♥ 5♥ 6♦ 7♠ 8♣
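That arithmetic can be checked with a tiny sketch in Python (the `T` symbol for ten and the helper name `card_name` are my own choices):

```python
RANKS = "A23456789TJQK"   # rank 0 = Ace ... 12 = King, T = ten
SUITS = "♥♦♣♠"            # suit 0 = Heart, 1 = Diamond, 2 = Club, 3 = Spade

def card_name(n):
    # card number 0..51 -> rank symbol + suit symbol
    return RANKS[n % 13] + SUITS[n // 13]

print(" ".join(card_name(n) for n in [3, 4, 18, 45, 33]))  # -> 4♥ 5♥ 6♦ 7♠ 8♣
```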
My question is, should I attempt to use regex for this? What is the best way to tell the computer what hand it has without hardcoding every hand?
EDIT: FULL CODE HERE: https://codereview.stackexchange.com/questions/21338/how-to-tell-the-npc-what-hand-it-has
Poker hands all depend on the relative ranks and/or suits of cards.
I suggest writing some utility functions, starting with determining a rank and suit.
So a card in your representation is an int from 0..51. Here are some useful functions (pseudo-code):
// returns rank 0..12, where 0 = Ace, 12 = King
getRank(card) {
    return card % 13;
}

// returns suit 0..3, where 0 = Heart, 1 = Diamond, 2 = Club, 3 = Spade
getSuit(card) {
    return card / 13; // or floor(card / 13) if lang not using floored division
}
Now that you can obtain the rank and suit of a set of hands you can write some utilities to work with those.
// sort and return the list of cards ordered by rank
orderByRank(cards) {
    // ranked = []
    // for each card in cards:
    //     get the rank
    //     insert into ranked list in correct place
}

// given a ranked set of cards return highest number of identical ranks
getMaxSameRank(ranked) {
    duplicates = {} // map / hashtable
    for each card in ranked {
        duplicates[getRank(card)] += 1
    }
    return max(duplicates.vals())
}

// count the number of cards of same suit
getSameSuitCount(cards) {
    suitCounts = {} // a map or hashtable if possible
    // for each card in cards:
    //     suitCounts{getSuit(card)} += 1
    // return max suit count (highest value of suitCounts)
}
You will need some more utility functions, but with these you can now look for a flush or straight:
isFlush(cards) {
    if (getSameSuitCount(cards) == 5) {
        return true
    }
    return false
}
isStraight(cards) {
    ranked = orderByRank(cards)
    // five distinct consecutive ranks span exactly 4 (e.g. 4,5,6,7,8)
    return getRank(ranked[4]) - getRank(ranked[0]) == 4 && getMaxSameRank(ranked) == 1
}
isStraightFlush(cards) {
    return isFlush(cards) && isStraight(cards)
}
And so on.
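The pseudo-code above translates almost line for line. Here is a runnable sketch in Python (function names mirror the pseudo-code; note that with Ace fixed at rank 0, an ace-high straight like T-J-Q-K-A is not detected by this simple span check):

```python
from collections import Counter

def get_rank(card):   # 0 = Ace, ..., 12 = King
    return card % 13

def get_suit(card):   # 0 = Heart, 1 = Diamond, 2 = Club, 3 = Spade
    return card // 13

def get_max_same_rank(cards):
    # highest number of cards sharing one rank
    return max(Counter(get_rank(c) for c in cards).values())

def is_flush(cards):
    return len({get_suit(c) for c in cards}) == 1

def is_straight(cards):
    ranks = sorted(get_rank(c) for c in cards)
    # five distinct ranks spanning exactly 4 steps
    return ranks[4] - ranks[0] == 4 and get_max_same_rank(cards) == 1

def is_straight_flush(cards):
    return is_flush(cards) and is_straight(cards)

# The 4♥ 5♥ 6♦ 7♠ 8♣ example from the question:
hand = [3, 4, 18, 45, 33]
print(is_straight(hand), is_flush(hand))  # -> True False
```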
In general, you will need to check each hand against the possible poker hands, starting with the best and working down to high card. In practice you will need more than that to differentiate ties (if two players have a full house, the winner is the player whose full house is made with the higher-ranked three of a kind). So you need to store a bit more information for ranking two hands against one another, such as kickers.
// simplistic version
getHandRanking(cards) {
    if (isStraightFlush()) return STRAIGHT_FLUSH
    if (isQuads()) return QUADS
    ...
    if (isHighCard()) return HIGH_CARD
}

getWinner(handA, handB) {
    return max(getHandRanking(handA), getHandRanking(handB))
}
That would be my general approach. There is a wealth of information on poker hand ranking algorithms out there. You might enjoy Unit 1: Winning Poker Hands from Peter Norvig's Udacity course, Design of Computer Programs.
