Divide and conquer algorithm to solve a real-life situation - sorting

I am trying to figure out an approach to this problem I am trying to solve. The problem is: say we are having a vote to build a new facility in some company, and the company policy states that the winning facility must obtain strictly more than half of the votes to be built. That is, if n is the total number of votes and n = 20, we must obtain at least 11 votes to win; if n = 21, we also need 11 votes to win.
My job is to figure out what facility is the winner given all the votes, or decide that no new facility is being built.
The input of this algorithm is a list “votes” of all facilities, where each element is a facility object. The list may look like the following:
votes = [
Bathroom, gym, food court, Bathroom,
spa, gym, Bathroom, Bathroom, Bathroom, game room, Bathroom
]
Our algorithm should return a facility object who is the winner of the vote or return None if nothing has a majority vote.
Note that we CANNOT sort the list of votes because the output of the comparison operation between facility
objects, like “Bathroom < gym”, is NOT defined. However, we CAN test equality between facility objects, because
the output of the equality test “Bathroom == Bathroom” is well defined.
I am just trying to figure out the pseudocode for this algorithm and how it can be represented recursively using the divide and conquer technique. Any help will be appreciated, thanks.

Solve in Python; build a Counter dictionary from votes. Extract the most popular choice and its vote count with max on the counts. If the vote_count > len(votes) / 2.0, return the choice -- else return None.
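A minimal sketch of that Counter approach, assuming the facility objects are hashable as well as comparable for equality:

    from collections import Counter

    def find_winner(votes):
        # assumes facility objects are hashable (Counter needs that) and support ==
        if not votes:
            return None
        counts = Counter(votes)
        choice, vote_count = counts.most_common(1)[0]
        return choice if vote_count > len(votes) / 2.0 else None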
I don't see where this is a divide-and-conquer algorithm. Even if you have to construct the counter table yourself, it's a straight iterative problem.

Related

Algorithm to allocate resources by priority

My problem is the following:
My team and I are moving to another part of the office and we have to decide where everybody sits. However, everybody has priorities. I would like to find an algorithm which helps us distribute the seats in a way that everybody is satisfied (or at least most of them).
I've started to implement my own algorithm where I ask everybody for 3 preferred options (the team consists of 10 people and there are 10 places) and consider their "seniority" (the length of time they have spent on the team) as a rank between them.
However, I'm stuck; I tried to browse the internet for an algorithm which solves a similar problem but didn't find any.
What would be the best way to solve this? Is there any
generally known algorithm which solves this or a similar problem?
Thank you!
What first comes to mind for me is the stable marriage problem. Here's the problem statement for the original algorithm:
Given n men and n women, where each person has ranked all members of the opposite sex in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. When there are no such pairs of people, the set of marriages is deemed stable.
Please read up on the Gale–Shapley algorithm, which is what I'll adapt for this problem.
Have each worker make a list of their rankings for all the spots. These will be the "men". Then, each spot will use the seniority ranking as their rankings for the "men". The spots will be the "women" in the Gale-Shapley algorithm.
You will get a seat assignment that has no "unstable marriage". Here's what an unstable marriage is:
There is an element A of the first matched set which prefers some given element B of the second matched set over the element to which A is already matched, and
B also prefers A over the element to which B is already matched.
In this context, an unstable marriage means that there is a worker-seat assignment between W1 and S1 such that another worker, W2, has ranked S1 higher. Not only that, S1 has also ranked W2 higher. Since the seats made their lists based on the seniority list, it means that W2 has higher seniority.
In effect, this means that you'll get a seating assignment such that no worker has a seat that someone else with higher seniority wants "more".
The bottom of that Wiki article mentions packages in R and Python that have already implemented the algorithm, so it's just up to you to input the preference lists.
Disclaimer: This is probably not the most efficient algorithm. All the seats have the same ranking list, so there's probably a shortcut somewhere. However, it's easier to use a cannon to kill a fly, if the cannon is already written in R/Python for you. Also, this is the only algorithm I remember from uni, so this is the only hammer I have for any nail.
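A rough sketch of that adaptation in Python, assuming worker_prefs maps each worker to their seat ranking (most preferred first) and seniority lists workers from most to least senior (both names are mine, not from the original post):

    def assign_seats(worker_prefs, seniority):
        # assumes every worker ranks every seat
        rank = {w: i for i, w in enumerate(seniority)}   # lower index = more senior
        free = list(worker_prefs)                        # workers without a seat yet
        next_proposal = {w: 0 for w in worker_prefs}     # position in each preference list
        holder = {}                                      # seat -> worker currently holding it
        seat_of = {}                                     # worker -> seat
        while free:
            w = free.pop()
            s = worker_prefs[w][next_proposal[w]]
            next_proposal[w] += 1
            if s not in holder:
                holder[s], seat_of[w] = w, s
            elif rank[w] < rank[holder[s]]:              # the seat "prefers" the more senior worker
                bumped = holder[s]
                del seat_of[bumped]
                free.append(bumped)
                holder[s], seat_of[w] = w, s
            else:
                free.append(w)                           # rejected; will propose to the next seat
        return seat_of

Because every seat uses the same seniority ranking, this should come out equivalent to simply letting workers pick their favourite remaining seat in seniority order, which is the shortcut hinted at above.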
I decided to implement a brute force solution as lots of the comments suggested.
So:
I asked everybody on the team to give a preference order over the seats (10 down to 1, which I use as the score for the "teamMember-seat" pairings; 10 is the highest score)
collected all of the "teamMember-seat" pairings with scores e.g. name:Steve, seat:seat1, score:5 (the score is from the given order from the previous step)
generated all the possible seating combinations from these
e.g.
List1: [name:Steve seat:seat1 score:5], [name:John seat:seat2 score:3] ... [name:X seat:seatY score:X]
List2: [name:Steve seat:seat2 score:4], [name:John seat:seat1 score:4] ... [name:X seat:seatY score:X]
...
ListX: [],[]...
chose the "teamMember-seat" list(s) with the highest score (score of the list is calculated by summing the scores of the "teamMember-seat" pairings)
if there are 2 lists with equal scores, the algorithm chooses the one where the most senior team members get their most preferred seats
if there is still more than one list (combination), the algorithm chooses one randomly
I'm sure there are some better algorithms to do this as some of you suggested but I've run out of time.
I didn't post the code since it is really long and not too complicated to implement. However, if you need it, don't hesitate to drop a private message.
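For reference, a compact sketch of that brute-force search (the data shapes prefs, members, seats and seniority are my own names, not the original code):

    from itertools import permutations

    def best_assignment(members, seats, prefs, seniority):
        # prefs[m][s] = score member m gave seat s (10 = favourite, 1 = least liked)
        # seniority   = members ordered from most to least senior
        best, best_score = [], -1
        for perm in permutations(seats):                       # 10! ~ 3.6 million orderings
            assignment = list(zip(members, perm))
            score = sum(prefs[m][s] for m, s in assignment)
            if score > best_score:
                best, best_score = [assignment], score
            elif score == best_score:
                best.append(assignment)
        # tie-break: the most senior members should get their most preferred seats
        by_seniority = lambda a: [prefs[m][s] for m, s in
                                  sorted(a, key=lambda pair: seniority.index(pair[0]))]
        best.sort(key=by_seniority, reverse=True)
        return best[0]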

Need an algorithm approach to calculate meal plan

I’m having trouble solving a deceptively simple problem. My girlfriend and I are trying to formulate weekly meal plans and I had this brilliant idea that I could optimize what we buy in order to maximize the things that we could make from it. The trouble is, the problem is not as easy as it appears. Here’s the problem statement in a nutshell:
The problem:
Given a list of 100 ingredients and a list of 50 dishes that are composed of one or more of the 100 ingredients, find a list of 32 ingredients that can produce the maximum number of dishes.
This problem seems simple, but I’m finding that computing the answer is not trivial. The approach that I’ve taken is that I’ve encoded a combination of 32 ingredients as a 100-bit string with 32 of the bits set. Then I check which dishes can be made with that combination of ingredients. If the number of dishes is greater than the current maximum, I save off the list. Then I compute the next valid ingredient combination and repeat, repeat, and repeat.
The number of combinations of the 32 ingredients is staggering! The way that I see it, it would take about 300 trillion years to calculate using my method. I’ve optimized the code so that each combination takes a mere 75 microseconds to figure out. Assuming that I can optimize the code, I might be able to reduce the run time to a mere trillion years.
I’m thinking that a completely new approach is in order. I'm currently coding this in XOJO (REALbasic), but I think the real problem is with approach rather than specific implementation. Anybody have an idea for an approach that has a chance of completion during this century?
Thanks,
Ron
mcdowella's branch and bound solution will be a big improvement over exhaustive enumeration, but it might still take a few thousand years. This is the kind of problem that is really best solved by an ILP solver.
Assuming that the set of ingredients for meal i is given by R[i] = { R[i][1], R[i][2], ..., R[i][|R[i]|] }, you can encode the problem as follows:
Create an integer variable x[i] for each ingredient 1 <= i <= 100. Each of these variables should be constrained to the range [0, 1].
Create an integer variable y[i] for each meal 1 <= i <= 50. Each of these variables should be constrained to the range [0, 1].
For each meal i, create |R[i]| additional constraints of the form y[i] <= x[R[i][j]] for 1 <= j <= |R[i]|. These will guarantee that we can only set y[i] to 1 if all of meal i's ingredients have been included.
Add a constraint that the sum of all x[i] must be <= 32.
Finally, the objective function should be the sum of all y[i], and we should be trying to maximise this.
Solving this will produce assignments for all the variables x[i]: 1 means the ingredient should be included, 0 means it should not.
My feeling is that a commercial ILP solver like CPLEX or Gurobi will probably solve a 150-variable ILP problem like this in milliseconds; even freely available solvers like lp_solve, which as a rule are much slower, should have no problems. In the unlikely case that it seems to be taking forever, you can still solve the LP relaxation, which will be very fast (milliseconds) and will give you (a) an upper bound on the maximum number of meals that can be prepared and (b) "hints" in the variable values: although the x[i] will in general not be exactly 0 or 1, values close to 1 are suggestive of ingredients that should be included, while values close to 0 suggest unhelpful ingredients.
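A minimal sketch of this encoding using the open-source PuLP modelling library (the data shapes dishes and all_ingredients are assumptions of mine):

    from pulp import LpProblem, LpMaximize, LpVariable, lpSum

    def choose_ingredients(dishes, all_ingredients, budget=32):
        # dishes: dict mapping dish name -> set of required ingredient names
        prob = LpProblem("meal_plan", LpMaximize)
        x = {i: LpVariable("x_%d" % k, cat="Binary") for k, i in enumerate(all_ingredients)}
        y = {d: LpVariable("y_%d" % k, cat="Binary") for k, d in enumerate(dishes)}
        prob += lpSum(y.values())                  # objective: number of cookable dishes
        for d, required in dishes.items():
            for r in required:
                prob += y[d] <= x[r]               # dish d only counts if ingredient r is bought
        prob += lpSum(x.values()) <= budget        # at most 32 ingredients
        prob.solve()
        return [i for i in all_ingredients if x[i].value() == 1]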
There will be a branch and bound (http://en.wikipedia.org/wiki/Branch_and_bound) solution to this, but it may be too expensive to get the exact answer. ILP as suggested by j_random_hacker is probably better: the LP relaxation of that is probably a better heuristic than the relaxation proposed here, and the ILP solver will be heavily optimized.
The basic idea is that you do a recursive depth first search of a tree of partial solutions, extending them one at a time. Once you recurse far enough down to reach a fully populated solution you can start keeping track of the best solution found so far. If I label your ingredients A, B, C, D... a partial solution is a list of ingredients of length <= 32. You start with the zero-length solution, then when you visit a partial solution e.g. ABC you consider ABCD, ABCE, ... and so on, and may visit some of these.
For each partial solution you work out the maximum score that any descendant of that solution could achieve. Getting an accurate idea of this is important. Here is one suggestion: suppose you have a partial solution of length 20. This leaves 12 ingredients to be chosen, so the best you could possibly do is to make all dishes which require no more than 12 ingredients beyond the 20 you have chosen so far. Work out how many of those there are, and this is one example of an upper bound on the score of any descendant of the partial solution.
Now when you consider extending the partial solution ABC to ABCD or ABCE or ABCF... if you have a best solution found so far you can ignore all extensions that cannot possibly score more than the best solution so far - this means that you don't need to consider all possible combinations of your 32 ingredients.
Once you have worked out which of the possible extensions might contain a new best answer, your recursive search should continue with the most promising of these possible extensions, because this is the one most likely to survive finding a better best solution so far.
One way to make this fast is to code it cleverly so that recursing up and down means only small changes to the existing data structure which you typically make on the way down and reverse on the way up.
Another way is to cut corners. One obvious way is to stop when you run out of time and go for the best solution found so far at that stage. Another way is to discard partial solutions more aggressively. If you have a score so far of e.g. 100 you could discard partial solutions that couldn't score any better than 110. This speeds up the search, and you know that although you might have missed a better answer than 100, whatever you missed could not have been better than 110.
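A hedged sketch of this branch and bound in Python, using an include/skip branching scheme and the optimistic bound described above (representing dishes as frozensets of ingredient indices is my assumption):

    def branch_and_bound(dishes, n_ingredients=100, k=32):
        best_score, best_set = 0, frozenset()

        def bound(chosen, remaining):
            # optimistic: count every dish missing at most `remaining` ingredients
            return sum(1 for d in dishes if len(d - chosen) <= remaining)

        def recurse(chosen, next_ing):
            nonlocal best_score, best_set
            score = sum(1 for d in dishes if d <= chosen)   # dishes fully covered
            if score > best_score:
                best_score, best_set = score, chosen
            remaining = k - len(chosen)
            if remaining == 0 or next_ing == n_ingredients:
                return
            if bound(chosen, remaining) <= best_score:
                return                                      # prune: cannot beat the best so far
            recurse(chosen | {next_ing}, next_ing + 1)      # include this ingredient
            recurse(chosen, next_ing + 1)                   # skip it

        recurse(frozenset(), 0)
        return best_score, best_set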
Solving some discrete mathematics huh? Well here is the wiki.
You also have not factored in anything about quantity. For example, flour would be used in a lot of fried recipes but buying 10 pounds of flour might not be great. And cost might be prohibitive for some ingredients that your solution wants. Not to mention a lot of ingredients are in everything. (milk, water, salt, pepper, sugar things like that)
In reality, optimization to this degree is probably not necessary. But I will not provide relationship advice on SO.
As for a new solution:
I would suggest identifying a lot of what you want to make and with what, and then writing a program to suggest things to make with the rest.
Why not just order the list of ingredients by the number of dishes they are used in?
This would be more like a greedy solution, of course, but it should give you some clues about what ingredients are most often used. From that you can compile a list of dishes that can be cooked already with the top 30 (or whatever) ingredients.
Also you could order the list of remaining (non-cookable) dishes by number of missing ingredients and maybe try to optimize on that to maximize the number of cookable dishes.
To be more "algorithmic", I think a local search is most promising here. Start with a candidate solution (random assignments to the 32 ingredients) and calculate as a fitness function the number of cookable dishes. Then check the neighboring states (switching one ingredient) and move to the state with the highest value. Repeat until a maximum is reached. Do this veeeery often and you should find a good solution. (This would be a simple greedy hill-climbing algorithm)
There are a lot of local search algorithms, you should be able to find more than enough information on the net. Most often you won't find the optimal solution (of course that depends on the problem), but a very good one nonetheless.
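A minimal hill-climbing sketch of that local search (again assuming dishes are given as frozensets of ingredient indices):

    import random

    def cookable(chosen, dishes):
        return sum(1 for d in dishes if d <= chosen)

    def hill_climb(dishes, n_ingredients=100, k=32, restarts=50):
        best_score, best_set = -1, None
        all_ings = range(n_ingredients)
        for _ in range(restarts):                          # random restarts
            chosen = set(random.sample(all_ings, k))
            improved = True
            while improved:
                improved = False
                score = cookable(chosen, dishes)
                for out in list(chosen):                   # try swapping one ingredient
                    for into in all_ings:
                        if into in chosen:
                            continue
                        candidate = (chosen - {out}) | {into}
                        if cookable(candidate, dishes) > score:
                            chosen, improved = candidate, True
                            break
                    if improved:
                        break
            if score > best_score:
                best_score, best_set = score, chosen
        return best_score, best_set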

Quick relative ranking algorithm

Let's say I own 100 video games, and I want to order them from most liked to least liked. It's very hard to give each video game a numeric value that represents how much I like it, so I thought of comparing them to each other.
One solution I came up with is picking 2 random video games, and selecting which one I liked more, and discarding the other one. Unfortunately this solution only lets me know the #1 video game since that would be the last one remaining, and provides little information about the others. I could then repeat the process for the other 99 video games, and so on but that is very impractical: O(n^2).
Are there any O(n) (or just reasonable) algorithms that can be used to sort data based on relative criteria?
If you want to present the games in a sequential order, you need to decide upon it.
It is possible to derive a sequential order from a set of pairwise comparisons.
Here is an example. You have 100 video games. We assume that every video game is associated with a parameter a_i (where i ranges from 1 to 100). It is a real number that describes how "much" you like the game. We don't know the values of those parameters yet. We then choose a function that describes how likely it is that you prefer video game i over video game j in terms of the parameters. We choose the logistic curve and define
P[i preferred over j] = 1 / (1 + e^(a_j - a_i))
Now when a_i = a_j you have P = 0.5, and when, say, a_i = 1 and a_j = 0 you have P = 1/(1 + e^(-1)) ≈ 0.73, showing that a relatively higher parameter value increases the probability that the corresponding video game is preferred.
Now then, when you have your actual comparison results in a table, you use the method of maximum likelihood to calculate the actual values for the parameters a_i. Then you sort your video games in descending order of the calculated parameters.
What happens is that the maximum likelihood method calculates those values for the parameters a_i that make the actual observed preferences as likely as possible, so the calculated parameters represent the best guess about a total ordering between the video games. Note that for this to work, you need to compare video games to other video games often enough: every game needs at least one comparison, and the comparisons cannot form disjoint subsets (e.g. you compare A to B to C to A, and D to E to F to D, but there is no comparison between a game from {A,B,C} and a game from {D,E,F}).
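A small sketch of that maximum-likelihood fit using scipy (the comparisons format, a list of (winner, loser) index pairs, is my assumption):

    import numpy as np
    from scipy.optimize import minimize

    def fit_strengths(comparisons, n_games):
        def neg_log_likelihood(a):
            # -log P[winner over loser] = log(1 + e^(a_loser - a_winner))
            return sum(np.logaddexp(0.0, a[l] - a[w]) for w, l in comparisons)
        result = minimize(neg_log_likelihood, np.zeros(n_games), method="BFGS")
        return result.x        # strengths are only defined up to an additive constant

    # rank games from most to least liked:
    # order = np.argsort(-fit_strengths(comparisons, 100))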
You could use quicksort, aka pivot sort. Pick a game and compare every other game to it, so you have a group of worse games and a group of better games. Repeat for each group recursively. Average-case performance is O(n log n).
http://en.wikipedia.org/wiki/Quicksort
Another way would be to extend your idea. Display more than 2 games and sort them by your rating. The idea is similar to using a merge sort to rate your games. If you pick the games for rating carefully you won't need to do a lot of iterations, just a few. IMO O(n) will be quite hard, because your observation (as a human) is limited.
As a start, you could keep a list and insert each element one-by-one using binary search, giving you an O(n log n) approach.
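A small sketch of that insertion approach, where prefer(a, b) stands for one human comparison (the name is mine):

    def rank_by_insertion(games, prefer):
        # prefer(a, b) -> True if you like a more than b
        ranked = []                      # best first
        for game in games:
            lo, hi = 0, len(ranked)
            while lo < hi:               # binary search for the insertion point
                mid = (lo + hi) // 2
                if prefer(game, ranked[mid]):
                    hi = mid
                else:
                    lo = mid + 1
            ranked.insert(lo, game)
        return ranked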
I'm also certain that you can't beat O(n log n), unless I misunderstood what you want. Basically, what you're telling me is that you want to be able to sort some elements (in your example, video games) using only comparisons.
Think of your algorithm as this: you start with the n! possible ways to arrange your games, and every time you make a comparison, you split the arrangements into POSSIBLE and IMPOSSIBLE, discarding the latter group. (POSSIBLE here meaning that the arrangement is consistent with the comparisons you have made)
In the worst case, the POSSIBLE group can always be at least as big as the IMPOSSIBLE group, so no comparison is guaranteed to shrink the search space by more than a factor of 2. That means you need at least log_2(n!) = Θ(n log n) comparisons to reduce the space to 1, giving you your ordering of games.
As to whether there is an O(n) way to sort n objects by comparisons, there isn't. The lower bound on such a sort is Ω(n log n).
There is a special case, however. If you have a unique and bounded preference, then you can do what's called a bucket sort.
A preference is unique if no two games tie.
A preference is bounded if there is a minimum and maximum value for your preference.
Let 1 .. m be the bound for your set of games.
Just create an array with m elements, and place each game in the index according to your preference.
Now you can just do a linear scan over the array for your sorted order.
But of course, that's not comparison-based.
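A tiny sketch of that bucket sort, assuming preference(game) returns a unique integer rating in 1..m:

    def bucket_sort_games(games, preference, m):
        buckets = [None] * (m + 1)
        for g in games:
            buckets[preference(g)] = g          # unique preference => no collisions
        # linear scan, highest preference first
        return [g for g in reversed(buckets) if g is not None]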
One possibility would be to create several criteria C1, C2, ..., Cn like:
video quality
difficulty
interest of scenario
...
You pass each game through this sieve.
Then you compare a subset of game pairs (2-rank choices) and tell which one you prefer. There exist Multi-Criteria Decision-Making/Analysis (MCDM or MCDA) algorithms that will transform your 2-rank choices into a multi-criteria ranking function, for example by calculating coefficients a1, ..., an to build a linear ranking function a1*C1 + a2*C2 + ... + an*Cn.
Good algorithms won't let you choose pairs at random but will propose the pairs to compare based on a non-dominated subset.
See Wikipedia http://en.wikipedia.org/wiki/Multi-criteria_decision_analysis which gives some useful links, and be prepared to do/read some math.
Or buy a software like ModeFrontier which has some of these algorithms embedded (a bit expensive if just for ranking a library).
I don't think it can be done in O(n) time. The best we can get is O(n log n) using merge or quick sort.
The way I would approach this is to have an array with the game title and a counting slot.
Object[][] Games = new Object[100][2];
Games[0][0] = "Game Title1";
Games[0][1] = 2;
Games[1][0] = "Game Title2";
Games[1][1] = 1;
For every vote, add one to the corresponding Games[*][1] slot, and from there you can sort based on that.
While not O(n), a pairwise comparison is one way to rank the elements of a set relative to one another.
To implement the algorithm:
Create a 100x100 matrix
Each row represents a game, each column represents a game. The game at r1 is the same as the game at c1, r2 = c2 ... r100 = c100.
Here is some quick pseudo-code to describe the algorithm:
for each row r
    for each column c > r            // compare each pair once
        if the game in row r is better than the game in column c
            score[r]++
        else
            score[c]++
    end
end

sort games by score, descending
I understand it's hard to quantify how much you like something, but what if you created several "fields" that you would judge each game on:
graphics
story
multiplayer
etc...
Give each game 1-5 out of 5 for each category (weighting categories you deem more important). Try to create an objective scale for judging (possibly using external sources, e.g. Metacritic).
Then you add them all up, which gives an overall rating of how much you like each game. Then use a sorting algorithm (MergeSort? InsertionSort?) to place them in order. That would be O(n*m + n log n) [n = games, m = categories], which is pretty good considering m is likely to be very small.
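A small sketch of that weighted scoring; the category names and weights below are made-up examples:

    WEIGHTS = {"graphics": 1.0, "story": 2.0, "multiplayer": 0.5}

    def overall(category_scores):
        # category_scores: dict mapping category -> 1..5 rating for one game
        return sum(WEIGHTS[c] * s for c, s in category_scores.items())

    def rank(ratings):
        # ratings: dict mapping game title -> {category: score}
        return sorted(ratings, key=lambda g: overall(ratings[g]), reverse=True)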
If you were really determined, you could use machine learning to approximate future games based on your past selection.

algorithm for solving resource allocation problems

Hi, I am building a program wherein students sign up for an exam which is conducted in several cities throughout the country. While signing up, students provide a list of three cities where they would like to take the exam, in order of preference. So a student may say his first preference for an exam centre is New York, followed by Chicago, followed by Boston.
Now, keeping in mind that the exam centres have limited capacity, they cannot accommodate every student's first choice. We would, however, try to give as many students as possible either their first or second choice of centre, and as far as possible avoid having to assign a student their third choice.
Now, any ideas for an algorithm that would make this process more efficient? The simple way to do this would be to first go through the list of first choices and allot as many as possible, then go through the list of second choices and allot those. However, this may lead to the students who are first in the list getting their first centre and the last students getting their third choice, or worse, none of their choices. Anything that could make this more efficient?
Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
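A compact sketch of this min-cost flow encoding using networkx (the input shapes choices and capacity are my own names; networkx will raise an error if no feasible assignment exists):

    import networkx as nx

    def assign(students, choices, capacity):
        # choices[s]  = [first, second, third] exam centres for student s
        # capacity[c] = number of seats at centre c
        n = len(students)
        G = nx.DiGraph()
        G.add_node("source", demand=-n)
        G.add_node("sink", demand=n)
        for s in students:
            G.add_edge("source", ("student", s), capacity=1, weight=0)
            for rank, c in enumerate(choices[s]):
                cost = [0, 1, n + 1][rank]          # 1st, 2nd, 3rd choice costs
                G.add_edge(("student", s), ("centre", c), capacity=1, weight=cost)
        for c, cap in capacity.items():
            G.add_edge(("centre", c), "sink", capacity=cap, weight=0)
        flow = nx.min_cost_flow(G)
        return {s: c for s in students
                for (_, c), f in flow[("student", s)].items() if f}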
There are a class of algorithms that address this allocating of limited resources called auctions. Basically in this case each student would get a certain amount of money (a number they can spend), then your software would make bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Algorithm for sorting people into rooms based on age and nationality

I’m working on a program for the English language school I work for. I’m not being paid; it’s just a kind of hobby to improve/automate my workflow.
It’s a residential school and one aspect I’m looking at automating is the way we allocate rooms to students, and although I don’t want a full-blown solution I was hoping someone could point me in the right direction, e.g. by suggesting how you might approach this or which algorithms to look at.
Basically at the school we have a whole bunch of different rooms ranging from singles to dormitories for 8 people. We get lots of different nationalities from all over the world, and we always try to make sure each room has a mix of nationalities. Where there is more than one nationality we try to balance them. Age is also important: we always put students of a similar age together, while still trying to mix nationalities, and it’s unusual for us to have students sharing a room with more than two years between them.
I suppose, more generically speaking, I am interested in how to sort a given set of students based on two parameters to an optimal result with a few rules attached.
I hope I’ve explained clearly what I am trying to achieve… in a way it sounds really simple, but I’ve been trying to think how to do it in a simple way, i.e. by sorting by nationality and then by age, but it just doesn’t cut it, and I know there must be a better way of approaching this. When I do it “by hand” on an Excel sheet it does feel quite intuitive.
Thank you to anyone who offers help / advice.
This is an interesting question but it's not easy to answer. Somehow it's connected with subdivision and bin packing or the cutting-stock problem. You may want to look at topological sorting too. You can look at Drools, a business logic platform that lets you define such rules.
First of all you might find this interesting: Stable Room-mates Problem (wikipedia). Unfortunately it does not answer your question.
Try a genetic algorithm.
There are three main criteria for using a genetic algorithm:
ability to represent a solution as a mutable array. We can have an array of integers such that a[i] is the room for the ith student.
mutation of the state should produce predictable results. In our case this is true. Mutating the array will predictably shuffle students between the rooms.
easy to write a fast fitness function. Shouldn't be too hard to write an O(n) fitness function.
This is an interesting problem. I'll try writing some code with this approach and we'll see what happens.
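To make the criteria concrete, here is a possible fitness function and mutation operator; the encoding (assignment[i] = room index of student i, students[i] = (age, nationality), rooms[r] = capacity) and the penalty weights are assumptions of mine, not part of the answer:

    import random

    def fitness(assignment, students, rooms):
        score = 0
        for r, cap in enumerate(rooms):
            members = [students[i] for i, a in enumerate(assignment) if a == r]
            if len(members) > cap:
                score -= 100 * (len(members) - cap)   # hard penalty: room over capacity
            if not members:
                continue
            ages = [age for age, _ in members]
            nationalities = {nat for _, nat in members}
            score += len(nationalities)               # reward a nationality mix
            if max(ages) - min(ages) > 2:
                score -= 10                           # penalise a large age spread
        return score

    def mutate(assignment, n_rooms, rate=0.1):
        # randomly move a few students to a different room
        return [random.randrange(n_rooms) if random.random() < rate else a
                for a in assignment]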
How about you think of a room as something that repels students of a nationality it already has, and attracts students of an age close to what it already has. The closer the age to the room's average age, the more it attracts them, and the more people of nationality X are in the room, the more it repels people of nationality X.
Then, for every new student to be added, you would iterate through each room and see which one attracts them most. I guess if the room is empty you can set all forces to 0. Also, you would have a couple of constants that multiply each of the two "forces" so you can calibrate how important it is to have the same age against how important it is to have different nationalities.
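A speculative sketch of that idea; AGE_WEIGHT and NATIONALITY_WEIGHT are the calibration constants mentioned above, and the data shapes are my own:

    AGE_WEIGHT = 1.0
    NATIONALITY_WEIGHT = 2.0

    def attraction(room, student):
        # room: list of (age, nationality); student: (age, nationality)
        if not room:
            return 0.0
        age, nationality = student
        avg_age = sum(a for a, _ in room) / len(room)
        same_nationality = sum(1 for _, n in room if n == nationality)
        return -AGE_WEIGHT * abs(age - avg_age) - NATIONALITY_WEIGHT * same_nationality

    def place(student, rooms, capacities):
        # put the student in the non-full room that attracts them most
        open_rooms = [r for r in range(len(rooms)) if len(rooms[r]) < capacities[r]]
        best = max(open_rooms, key=lambda r: attraction(rooms[r], student))
        rooms[best].append(student)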
I'd analyze each student and create a 'personality' vector based on his/her age & nationality. Then I'd sort the vectors, and maybe scramble the results a bit after sorting to encourage diversity.
The general theme of "assign x to y with respect to constraints while optimizing some quantity" falls within operations research or more specifically http://en.wikipedia.org/wiki/Mathematical_optimization. The usual approach is to formally specify the problem and use a generic optimization solver such as one of those listed in http://en.wikipedia.org/wiki/List_of_optimization_software.
Give it a try, the formal specification languages for using the existing solvers are rather easy to learn and you might get an optimal solution without having to debug a complicated algorithm.
Formulation as a General Optimization Problem
It will be useful to formalize constraints and parameters. Let us assume that for 1 <= i <= 8, we have n_i rooms available of size i. Now let us impose the hard constraint that for any particular room S and every two students a, b \in S, we have:
|Age(a) - Age(b)| <= 2    (1)
Now we are interested in optimizing the "diversity" function which intuitively represents the idea that we want rooms to be as mixed as possible. So we can represent this goal as:
max over all arrangements {{ Sum over all rooms S of DiversityScore(S) }}
where we have DiversityScore(S) = # of Different Nationalities in the Room
Formulation as a Graph Problem
This is the most general setting, but clearly maximizing over all arrangements is not computationally feasible. Now let us pose this as a sort of graph problem with the hard age constraints. Denote each student as a vertex in a graph G. Connect two vertices if the corresponding students satisfy constraint (1). Now a clique in this graph represents a group of students that can all be placed in the same room. Now proceed in a greedy manner: choose a clique of size 4 (or whatever the room size is) which has the largest Diversity Score, then place those students in a room and continue until all rooms are filled. This clique search method can also incorporate gender constraints, which is useful; however, note that clique finding is an NP-hard problem.
Now before trying to come up with something that may be faster, let us think about how to weaken the hard constraint (1). We can massage our graph formulation by including edge weights. So if the hard constraint is satisfied, denote the edge weight from i to j as 1. If two students i and j deviate in age by more than 2, denote the edge weight as 1 / (Age Difference)^2 or something similar. Then the score of a clique should be a product of the clique's edge weights with some diversity score. However, it becomes clear that now the problem is on a complete graph, which is just the general optimization we hoped to avoid, so we need to impose some hard restrictions to reduce the connectivity of our graph.
A Basic Sorting Approximation Algorithm
Sort all students by their age, so we have a sorted array where all students in a[i] have the same age, and all students in a[i] are older than all students in a[j] for all j < i.
Now consider each pair i, j, of which there are O(n^2), where we also have that |Age[i] - Age[j]| <= 2. Find the largest group of students with different nationalities and place them in a room together. We successively iterate over the O(n^2) index pairs which satisfy the hard constraint and take any students with different nationalities (which we can find by preprocessing and hashing on the index pairs). Doing this carefully (like looking at index pairs that are spread apart before ones that are close together) improves the running time further. It feels like it should be polytime, but I think there are certain subtleties to address before saying so.
