Algorithm to find all combinations from an array of objects

Recently a friend of mine proposed a simple problem, and I'm struggling to find the best algorithm to solve it.
There's a list of athletes; every athlete has a name, a weight, and between 1 and 3 roles (Internal, Median, and External).
A team is composed of 6 athletes (2 internals, 2 medians, and 2 externals).
There is also a team weight limit of 540 kg.
What is the best algorithm to find all possible combinations for this problem?
Right now I find all combinations by iterating through the list of athletes and all of their roles, discarding every combination that exceeds the weight limit.
Do you have any better algorithm to solve this problem?
What is the best method to approach a problem like this in order to solve it?
Thank you

What is the best algorithm to find all possible combinations for this problem?
In the worst case, where you have n >= 6 athletes, each weighing so little that the limit doesn't matter, and each able to play all roles, the number of teams grows very quickly, even if you don't want to count teams with the same set of players assigned to different roles more than once.
The exact number in this case is "n choose 6" or:
n * (n - 1) * (n - 2) * (n - 3) * (n - 4) * (n - 5) / 720
This is an O(n^6) problem. This is going to be slow no matter what if n is larger than, like, 30. Once n > 123, the quantity won't fit in an unsigned 32-bit integer anymore.
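To put numbers on that claim, here is a quick sanity check (mine, not part of the original answer) using Python's math.comb:

import math

print(math.comb(30, 6))    # 593775 candidate teams already at n = 30
print(math.comb(123, 6))   # 4249404082, still below 2**32 - 1 = 4294967295
print(math.comb(124, 6))   # the first value that no longer fits in an unsigned 32-bit integer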
If the team size can vary, then the problem is O(n^k), where k is the team size. This is no longer polynomial; in fact, it's harder than NP-complete. It would be able to enumerate solutions to the Knapsack Problem, which is #P-complete.[1]
Thus, this is an engineering problem more than it is an algorithms problem.
Right now I find all combinations by iterating through the list of athletes and all of their roles, discarding every combination that exceeds the weight limit.
This is pretty much what I would do, only you can make it somewhat more efficient by pruning partial teams as early as possible. Here are some ideas I had (a pruned-search sketch follows the list):
Sort the list of athletes by weight so that when you go over the weight limit, you can stop looking at the later athletes in the list.
Keep track of which roles still need athletes, given the partial team. By the time you pick the fifth member, you know it must fill one of the roles that is still short-handed.
For example, if you have a team of four athletes so far with no median players, you must only consider median players for the next two.
To make this easier, create (sorted) auxiliary lists for each role that contain pointers to the players that can have that role.
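Here is a minimal backtracking sketch of that pruning idea in Python. It is not the asker's code; it assumes athletes come as (name, weight, roles) tuples with roles named "internal", "median", and "external", and it still enumerates role assignments, so the same six players can appear more than once with different roles.

LIMIT = 540
NEED = {"internal": 2, "median": 2, "external": 2}

def teams(athletes):
    # athletes: list of (name, weight, roles); sort by weight so we can stop early
    athletes = sorted(athletes, key=lambda a: a[1])
    results = []

    def extend(start, team, weight, need):
        if all(v == 0 for v in need.values()):
            results.append(list(team))
            return
        for i in range(start, len(athletes)):
            name, w, roles = athletes[i]
            if weight + w > LIMIT:
                break                      # everyone after i is at least as heavy
            for role in roles:
                if need[role] > 0:
                    need[role] -= 1
                    team.append((name, role))
                    extend(i + 1, team, weight + w, need)
                    team.pop()
                    need[role] += 1

    extend(0, [], 0, dict(NEED))
    return results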

Related

Optimize event seat assignments with Corona restrictions

Problem:
Given a set of group registrations, each for a varying number of people (1-7),
and a set of seating groups (immutable, at least 2m apart) varying from 1-4 seats,
I'd like to find the optimal assignment of people groups to seating groups:
People groups may be split among several seating groups (though preferably not)
Seating groups may not be shared by different people groups
(optional) the assignment should minimize the number of 'wasted' seats, i.e. maximize the number of seats in empty seating groups
(ideally it should run from within a Google Apps Script, so memory and computational complexity should be as small as possible)
First attempt:
I'm interested in the decision problem (is it feasible?) as well as the optimization problem (see optional optimization function). I've modeled it as a SAT problem, but this does not find an optimal solution.
For this reason, I've tried to model it as an optimization problem. I'm thinking along the lines of a (remote) variation of multiple-knapsack, but I haven't been able to name it yet:
items: seating groups (size -> weight)
knapsacks: people groups (size -> container size)
constraint: combined item weight >= container size
optimization: minimize the number of items
As you can see, the constraint and optimization are inverted compared to the standard problem. So my question is: Am I on the right track here or would you go about it another way? If it's correct, does this optimization problem have a name?
You could approach this as an Integer Linear Programming Problem, defined as follows:
let P = the set of people groups, people group i consists of p_i people;
let T = the set of tables, table j has t_j places;
let x_ij be 1 if people from people group i are placed at table j, 0 otherwise
let M be a large penalty factor for empty seats
let N be a large penalty factor for splitting groups
// # of free spaces = # unavailable - # occupied
// every time a group uses more than one table,
// a penalty of N * (#tables - 1) is incurred
min M * [SUM_j(SUM_i[x_ij] * t_j) - SUM_i(p_i)] + N * SUM_i[(SUM_j(x_ij) - 1)]
// at most one group per table
s.t. SUM_i(x_ij) <= 1 for all j
// every group has enough seats
SUM_j(x_ij * t_j) >= p_i for all i
0 <= x_ij <= 1
Although this minimises the number of empty seats, it does not minimise the number of tables used or maximise the number of groups admitted. If you'd like to do that, you could expand the objective function by adding a penalty for every group turned away.
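For concreteness, here is a small sketch of the basic model above in Python with PuLP (my choice of solver library, not something the answer prescribes); the group sizes, table sizes, and penalty weights are made up:

import pulp

p = [4, 7, 2]             # people per group (hypothetical)
t = [4, 4, 3, 2, 2, 1]    # seats per seating group (hypothetical)
M, N = 10, 3              # penalties for empty seats and for splitting groups (hypothetical)

prob = pulp.LpProblem("seating", pulp.LpMinimize)
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i in range(len(p)) for j in range(len(t))}

# objective: penalize empty seats at used tables and extra tables per group
empty_seats = pulp.lpSum(x[i, j] * t[j] for i in range(len(p)) for j in range(len(t))) - sum(p)
splits = pulp.lpSum(pulp.lpSum(x[i, j] for j in range(len(t))) - 1 for i in range(len(p)))
prob += M * empty_seats + N * splits

# at most one group per table
for j in range(len(t)):
    prob += pulp.lpSum(x[i, j] for i in range(len(p))) <= 1
# every group gets enough seats
for i in range(len(p)):
    prob += pulp.lpSum(x[i, j] * t[j] for j in range(len(t))) >= p[i]

prob.solve()
for (i, j), var in x.items():
    if var.value() > 0.5:
        print(f"group {i} ({p[i]} people) -> table {j} ({t[j]} seats)")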
ILPs are NP-hard, so without the right solvers, it might not be possible to make this run within Google Apps Script. I have no experience with that, so I'm afraid I can't help you there. But there are some methods to reduce your search space.
One would be through something called column generation. Here, the problem is split into two parts. The complex master problem is your main research question, but instead of the entire solution space, it tries to find the optimum from different candidate assignments (or columns).
The goal is then to define a subproblem that recommends these new potential solutions, which are then incorporated into the master problem. The power of a good subproblem is that it should be reducible to a simpler model, like Knapsack or Dijkstra.

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for the days on which a match is possible. There is a set of n participants and k days, and a team has m slots. Each participant specifies how many days they want to take part in and on which days they are available.
Result constraints:
Participants must not participate on more days than they want.
Participants must not be scheduled on days they are not available.
The algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if fewer than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling, and I'm sure there is a smart programmatic approach to it. Currently we consider only 2 days per week; colleagues write down their name under the day they want to participate, and we end up with big lists for each day and no way to please everyone.
I considered a new approach in which each colleague writes down their name, the desired number of games per week, and the days they are available; an example is below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and is available Monday through Friday. The first number is the desired number of days to play; the following numbers are the available days (1 to 7, Monday to Sunday).
Days with fewer than m (in my case, m = 12) participants are not going to be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their wishes (when to play, how much to play)?
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want.
Participants must not be scheduled on days they don't want to play.
The algorithm should do its best to include as many participants as possible.
A day will not be scheduled if fewer than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute-force through all 2^k possibilities for which days have a match. For the rest of this answer, assume we know which days have a match.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP); a sketch is below. With a MIP formulation, the sky's the limit for side constraints, e.g., avoiding playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc.
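A minimal sketch of that formulation, again using PuLP (my choice; any MIP solver would do) and made-up availability data:

import pulp

people = {"Kane": 3, "Son": 2, "Dier": 1}   # p -> U(p), max matches wanted (hypothetical)
matches = ["Mon", "Wed"]                     # days we already decided to hold a match
available = {"Kane": {"Mon", "Wed"}, "Son": {"Mon"}, "Dier": {"Wed"}}  # p ~ m
D = 2                                        # players demanded per match (tiny, for the example)

prob = pulp.LpProblem("schedule", pulp.LpMaximize)
x = {(p, m): pulp.LpVariable(f"x_{p}_{m}", cat="Binary")
     for p in people for m in matches if m in available[p]}
y = {p: pulp.LpVariable(f"y_{p}", cat="Binary") for p in people}

# maximize the number of distinct people who play at least once
prob += pulp.lpSum(y.values())

for p, u in people.items():
    # player doesn't play in more matches than they want
    prob += pulp.lpSum(x[p, m] for m in matches if (p, m) in x) <= u
    # y(p) can be 1 only if p plays at least once
    prob += y[p] <= pulp.lpSum(x[p, m] for m in matches if (p, m) in x)
for m in matches:
    # each match gets exactly D players
    prob += pulp.lpSum(x[p, m] for p in people if (p, m) in x) == D

prob.solve()
for (p, m), var in x.items():
    if var.value() > 0.5:
        print(p, "plays on", m)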
I have an idea if you need a basic solution that you can optimize and refine in small steps. I am talking about flow networks. Most readers who already know what they are will probably turn up their noses, because flow networks are usually used to solve maximization problems, not this kind of optimization problem. They are right in a sense, but the problem can initially be seen as maximizing the number of players who get to play each day. Needless to say, it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days on which they want to play, represented as the capacity of the edge from the source to the node for player x. Each player node then has an edge of capacity 1 to each day_of_week node on which that player is available. The third level consists of the edges that link each day_of_week to the sink node. Quick example: player 2 is available on 2 days, Monday and Tuesday, and both days have a player limit of 12 (the capacity of their edges to the sink).
So far constraints 1, 2 and 4 are satisfied (that was the easy part): after you have found the maximum flow of the entire graph, you only keep those paths that have no residual capacity on both the 2nd level (players to days) and the 3rd level (days to the sink). It is easy to show that, with this level of "optimization" and under certain conditions, the algorithm may not find any acceptable assignment even though one would have existed had it made different choices while visiting the graph.
This part is the optimization problem that I meant before. I came up with at least two heuristic improvements:
While you visit the graph, store the day_of_week nodes in a priority queue where days with more players already assigned have higher priority. This way the residual capacity of the graph ends up less evenly distributed.
Randomness is your friend. You are not obliged to run this algorithm only once; on every run you should pick a random edge out of a node at the players' level. At the end you aggregate the results and choose the most common outcome. This is a situation where majority rule applies nicely.
To be clear, everything above is just a starting point: the purpose of a heuristic is to find the best approximate solution possible. With this type of problem, and given your probably small input, this is not the optimal route, but it is the easiest one when you do not know where to start.
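As a concrete starting point, here is a small sketch of the graph construction and max-flow step, assuming the networkx library is available; the player data and the per-day limit are made up:

import networkx as nx

wants = {"Kane": 3, "Son": 2}                  # desired games per player (hypothetical)
avail = {"Kane": [1, 2, 3], "Son": [2, 4]}     # available weekdays (hypothetical)
day_limit = 12                                  # players allowed per day

G = nx.DiGraph()
for player, w in wants.items():
    G.add_edge("source", player, capacity=w)            # source -> player: desired games
    for d in avail[player]:
        G.add_edge(player, f"day{d}", capacity=1)        # player -> day: at most one slot
for d in {d for days in avail.values() for d in days}:
    G.add_edge(f"day{d}", "sink", capacity=day_limit)    # day -> sink: day capacity

flow_value, flow = nx.maximum_flow(G, "source", "sink")
for player in wants:
    assigned = [d for d, f in flow[player].items() if f > 0]
    print(player, "->", assigned)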

Most efficient seating arrangement

There are n (n < 1000) groups of friends, with group sizes given by an array A[] (2 <= A[i] < 1000). Tables are available, each accommodating r (r > 2) people at a time. What is the minimum number of tables needed to seat everyone, subject to the constraint that every person has at least one other person from their group sitting at their table?
The approach I was thinking of was to break every group into pieces of size two and three and solve from there, but there are many ways of dividing a number n into twos and threes, and not all of them may be optimal.
Does a Mixed Integer Programming model count?
Some notes on this formulation (a rough sketch in code follows the list):
I used random data to form the groups.
x(i,j) is the number of people of group i sitting at table j.
x(i,j) is a semi-integer variable, that is, an integer variable whose value is either zero or between LO and UP. Not all MIP solvers offer semi-continuous and semi-integer variables, but they can come in handy. Here I use it to enforce that at least 2 people from the same group must sit at a table. If a solver does not offer this type of variable, we can formulate the same construct using additional binary variables.
y(j) is a binary variable (0 or 1) indicating if a table is used.
the capacity equation is somewhat smart: if a table is not used (y(j)=0) its capacity is reduced to zero.
the option optcr=0 indicates we want to solve to optimality. For large, difficult problems we may want to stop say at 5%.
the order equation makes sure we start filling tables from table 1. This also reduces the symmetry of the problem and may speed up solution times.
the above model (with 200 groups and 200 potentially used tables) generates a MIP problem with 600 equations (rows) and 40k variables (columns). There are 37k integer variables. With a good MIP solver we find the proven optimal solution (with 150 tables used) in less than a minute.
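The original model isn't reproduced here, but the notes above pin down its structure well enough for a rough reconstruction. The sketch below uses PuLP (my choice, not the original GAMS model) and replaces the semi-integer variables with auxiliary binaries z(i,j), as the notes suggest; the group sizes and the table bound are made up:

import pulp

A = [5, 3, 7, 2]         # group sizes (hypothetical)
r = 4                    # seats per table
T = range(sum(A) // 2)   # generous upper bound on the number of tables

prob = pulp.LpProblem("tables", pulp.LpMinimize)
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", lowBound=0, upBound=r, cat="Integer")
     for i in range(len(A)) for j in T}
z = {(i, j): pulp.LpVariable(f"z_{i}_{j}", cat="Binary") for i in range(len(A)) for j in T}
y = {j: pulp.LpVariable(f"y_{j}", cat="Binary") for j in T}

prob += pulp.lpSum(y.values())                        # minimize the number of tables used

for i in range(len(A)):
    prob += pulp.lpSum(x[i, j] for j in T) == A[i]    # everyone in group i is seated
for j in T:
    prob += pulp.lpSum(x[i, j] for i in range(len(A))) <= r * y[j]  # capacity, zero if unused
    if j + 1 in T:
        prob += y[j] >= y[j + 1]                      # order: fill tables from table 1
for (i, j) in x:
    prob += x[i, j] >= 2 * z[i, j]                    # a group at a table brings at least 2 people
    prob += x[i, j] <= r * z[i, j]

prob.solve()
print("tables used:", round(sum(v.value() for v in y.values())))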
Notice this is certainly not a knapsack problem (as suggested in another answer -- a knapsack problem has just one constraint) but it resembles a bin-packing problem.
It is the same problem as the knapsack problem, which is NP-complete (see https://en.wikipedia.org/wiki/Bin_packing_problem). So finding the optimal solution is pretty hard.
A heuristic that works most of the time (sketched below):
Sort the groups in decreasing order of size.
For each group, put it at the table that has the least remaining space but can still accommodate the group, opening a new table if none fits.
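A minimal sketch of that best-fit-decreasing idea, under the simplifying assumption that every group fits at a single table (the heuristic as stated doesn't split groups):

def seat_groups(group_sizes, r):
    tables = []                                     # remaining free seats per open table
    for size in sorted(group_sizes, reverse=True):
        fits = [j for j, free in enumerate(tables) if free >= size]
        if fits:
            j = min(fits, key=lambda j: tables[j])  # best fit: least remaining space
            tables[j] -= size
        else:
            tables.append(r - size)                 # open a new table
    return len(tables)

print(seat_groups([5, 3, 7, 2], 8))                 # hypothetical sizes, capacity 8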
Your approach is workable. If a solution exists for a given number of tables, then a solution exists where you've split every group into some number of twos and some number of threes. First, split a three off of every group of odd size; you're left with groups of even size. Next, split twos off of every group whose size isn't divisible by six, until it is. For what remains of each group, forget that it's one bigger chunk and split it into a bunch of groups of six.
At this point, you have split all of your groups into some number of twos, some number of threes, and some number of sixes. Give each table of odd size one three, splitting sixes as necessary; now all tables have even size. All remaining sixes can now be split into twos and seated arbitrarily.
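A tiny sketch of the splitting step described above (the two/three/six decomposition of a single group size):

def split_group(n):
    # n >= 2: returns pieces of size 2, 3 and 6 that sum to n
    parts = []
    if n % 2:                # odd: split off one three
        parts.append(3)
        n -= 3
    while n % 6:             # even but not divisible by six: split off twos
        parts.append(2)
        n -= 2
    parts += [6] * (n // 6)  # the rest becomes groups of six
    return parts

print(split_group(7))        # e.g. [3, 2, 2]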

What's the grouping plan so that every two people are grouped together just once?

So, we have N people.
Every day, we make N/2 groups out of them, i.e., 2 people per group.
We keep grouping every day until every two people have been paired exactly once, no more, no less.
Please give the grouping plan for every day.
Here are my thoughts:
Out of N people, there are N * (N-1) / 2 possible pairs. Since every day we will have N/2 pairs, in total we will need N-1 days.
So basically, if our algorithm takes a list of N people as input, we will output N-1 lists, each list will contain the pairs for a day.
But how to organise these N * (N-1) / 2 pairs into N-1 days?
I know how to do it in a brute-force way: in the worst case, we try every combination of pairs for every day, or better, use a hash set per day to check whether a combination for that day is still possible.
But I think there must be a more elegant and efficient way to solve the problem. Graph?
Have a look at http://en.wikipedia.org/wiki/Round-robin_tournament#Scheduling_algorithm - This seems to answer your question. I have also seen this discussed in the context of chess matches and bond settlements.
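For reference, a minimal sketch of the circle method from that article (fix one person, rotate the others; N is assumed even, otherwise add a dummy 'bye' person):

def round_robin(people):
    n = len(people)
    assert n % 2 == 0, "add a dummy 'bye' person if N is odd"
    fixed, rest = people[0], list(people[1:])
    rounds = []
    for _ in range(n - 1):
        lineup = [fixed] + rest
        # pair the i-th from the front with the i-th from the back
        rounds.append([(lineup[i], lineup[n - 1 - i]) for i in range(n // 2)])
        rest = rest[-1:] + rest[:-1]        # rotate the non-fixed people by one
    return rounds

for day, pairs in enumerate(round_robin(list("ABCDEF")), 1):
    print(f"day {day}: {pairs}")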

Finding subsets being used at most k times

Every now and then I read conspiracy theories about lotto-based games being controlled: a computer browses through the combinations chosen by the players and determines the unused subsets. It got me thinking: how would such an algorithm have to work in order to determine such subsets really efficiently? Finding unused numbers is definitely ruled out, as is finding the least-used numbers, because neither necessarily provides a solution. Also, going deeper, how could an algorithm efficiently choose a subset that was used by the players at most some k times? Saying it more formally:
We are given a set of 50 numbers 1 to 50. In the draw 6 numbers are picked.
INPUT: m subsets, each consisting of 6 distinct numbers from 1 to 50,
integer k (k >= 0), the maximum number of players allowed to have all of their 6 numbers correct.
OUTPUT: Subsets which make not more than k players win the jackpot ('winning' means all the numbers they chose were picked in the draw).
Is there any efficient algorithm which could calculate this without using a terabyte HDD to store all 50!/(44!*6!) possible draws in the pessimistic case? Honestly, I can't think of any.
If I wanted to run such a conspiracy, I would first of all acquire the list of submissions by players. Then I would generate random lottery selections and see how many winners each such selection would produce. Then I'd just choose the random lottery selection most attractive to me. There is little point doing anything more sophisticated, because that is probably already powerful enough to be noticed by statisticians.
If you want to corrupt the lottery, it would probably be easier and safer to select a few competitors you favour and have them win. In the book "1984", I think the state simply announced imaginary lottery winners, with each area's announcement naming somebody outside that area. One of the ideas in "The Beckoning Lady" by Margery Allingham is a gang that attempts to set up a racecourse so they can rig races and disguise bribes as winnings.
First of all, the total number of combinations (choosing 6 from 50) is not very large; it is about 16 million, which can be handled easily.
For each combination, keep a count of the number of people who played it. When declaring a winner, choose a combination that has no more than k plays.
If the numbers within each subset are sorted, you can treat your subsets as strings: sort them in lexicographical order, and then it is easy to count how many players selected each subset (and which subsets were not selected at all). So the time is proportional to the number of players, not to the number of possible draws in the lottery.
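A minimal sketch of that counting idea in Python, with made-up tickets: sorted tickets become hashable keys, so checking a candidate draw is a single dictionary lookup.

from collections import Counter
from itertools import combinations

tickets = [(3, 7, 12, 25, 30, 41), (1, 2, 3, 4, 5, 6), (3, 7, 12, 25, 30, 41)]
counts = Counter(tuple(sorted(t)) for t in tickets)   # plays per distinct ticket

k = 0
# scan candidate draws and keep those with at most k jackpot winners;
# here we stop at the first acceptable one rather than enumerating all ~16 million
for draw in combinations(range(1, 51), 6):
    if counts.get(draw, 0) <= k:
        print("acceptable draw:", draw)
        break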
