Algorithm to minimizing switching between paths - algorithm

I have a list of lists in the form l = [[1,2,3],[3,4,6],...]. There are m sublists each representing a player. Each player can perform a number of tasks (there are n tasks). I would like to find the shortest path through all the steps by minimizing the number of switches between players. So basically have the same player perform the tasks consecutively as often as possible. I'm trying to write an algorithm to optimize this that runs in polynomial time but I'm having a bit of trouble coming up with a good scheme. I was thinking it could be like Dijkstra's algorithm, but I'm not exactly sure how to adapt it to fit my case. Below a concrete example of what I want.
Example
n = 5 and m = 3 such that we have a list of lists l = [[1,2,5],[1,3,5],[2,3,4]]
The algorithm would return [0,2,2,2,0]
i.e. player 0 would be chosen first then swap to player 2 for 3 tasks than back to player 0 for the last task.
I'm just looking for pseudo code or a push in the right direction. Really struggling and brute force won't work for large numbers!

Since it is never beneficial to have a player perform fewer consecutive tasks than he can, a simple greedy algorithm suffices to find the optimal solution:
Starting with task 1, find the player that can execute the largest number of consecutive tasks starting with that first task.
Starting with the first task that the previously found player can't do, find the player that can execute the largest number of consecutive tasks starting with that task.
Repeat until all the tasks are done.
Here's a proof that this algorithm is optimal:
Let say there's an optimal solution that has player A performing tasks i through j and then player B performing tasks j+1 through k.
If there is any player (including A) that can perform tasks i through j+1, then we can use that player to do those tasks instead, and the solution will be as good or better. Either B will perform tasks j+2 through k, and the number of player switches will be the same, or j+1 = k and we won't need player B at all.
Therefore there is an optimal solution in which every chosen player maximizes the number of consecutive moves that can be performed by that player. In fact, since every such solution is equivalent, they are all optimal.
EDIT: As I was writing this, Pham suggests to use a segment tree, but no such complex data structure is necessary. If the sublists are sorted and you make an index from each task number to the sublist positions at which it can be found, then you can do this in O(N) time.

Related

Algorithm to find the most efficient way to distribute non-identical work tasks between workers

Let's say I have a set of tasks that need to be done. I have two identical workers that will process them, and (for simplicity's sake) let's assume that I have perfect information on the complexity of the tasks: there are not tasks I don't know about, and I know exactly how long each one will take to complete. (But different tasks will take different amounts of time.) Each worker can only work on one task at a time, and once begun, must continue to work on it until the task is complete. There are no dependencies between tasks, such that one must be finished before another can be worked on.
Given these constraints, is there any known-best algorithm to divide the work between the two workers so that the total time to complete all tasks is minimized? The obvious, naive solution is that each time a worker is free, always assign it the longest (or shortest) remaining task, but is there any method that is more efficient?
This is the partition problem, which is NP-Complete, but if the tasks times are given in relatively low integers - there is a pseudo-polynomial time Dynamic programming solution to solve it.
In your case, you are basically given a set of numbers - and you want to assign them to two subsets, such that the sum of subsets is equal (or closest as possible to being equal, which is the optimization variant of partition problem).
The recursive formula for the DP solution should be something similar to this:
DP[0, 0] = true
DP[0, x] = false | x != 0
DP[i, x] = DP[i-1, x-value[i]] OR DP[i-1, x]
^ ^
assigned i to S1 Assigned i to S2
Calculate all values needed for DP[n+1, SUM] (where SUM is the total sum of tasks and n is the number of tasks), and you are looking for a value DP[n+1, SUM/2] to see if that can be done perfectly.
Getting the actual tasks for each subset is done by retracing your steps, similar to explained here.

probability of winning a special sing-elimination tournament using dynamic programming and bitmask

Let's say we have N teams in a tournament and based on historical data we know what is the probability of each team winning any other team .Lets put all the probabilities in a matrix called P . P[a][b] is the probability team a winning team b. It is obvious that P[a][a] = 0 and P[a][b] = 1-P[b][a].
In this tournament at every round, two of teams compete against each other and the loser is eliminated. This two team are chosen randomly (with equal possibility of each team being picked). So at the first round we have n teams, next n-1 teams and so on until only one team remains and becomes the champion. What is the probability of each team becoming the champion? ( 1 <= N <= 18).
At first when I didn't know how to approach the problem but after some reading and search and keeping in mind that max n is 18 I figured at that using Dynamic programming and Bitmask is the way to go. How ever I couldn't figure at a solution. Here are my problems:
I have really hard time to figure at what are the sub problems and what sub problems should not be recomputed, basically I can't find a well defined recursive ( or not recursive) equation for the problem
In bitmask+dp problems we usually define something like dp[mask][n] or dp[n][mask]. I tried different approaches to define the mask but since the general solution is not clear to me there was no success
Some guidance on this two problems would be very helpful.
This is not really a dynamic programming problem.
If you have a vector V that gives the probability of each player being in the game after n rounds, then you can calculate the player probabilities for n+1 rounds by:
V'i = 2/((18-n)(17-n)) * sum over all j!=i of [ViVjPi,j]
That first factor is the probability that any given available match will be chosen, which depends on the number of previous rounds, because each successive round has fewer players to match up.
The second part is the probability of the players being available for each match, times the probability that the current player will win.
Just do this calculation 17 times to get the player probabilities after 17 rounds, which is the answer you're looking for. You can even drop that first factor, and fix it at the end by normalizing the vector so that the probabilities sum to 1.

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. First number represents days to play, next numbers represent available days(1 to 7, MOnday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2k possibilities for which days have a match. For the rest of this answer, assume we know.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
I have an idea if you need a basic solution that you can optimize and refine by small steps. I am talking about Flow Networks. Most of those that already know what they are are probably turning their nose because flow network are usually used to solve maximization problem, not optimization problem. And they are right in a sense, but I think it can be initially seen as maximizing the amount of player for each day that play. No need to say it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days in which he wants to play, represented as the capacity of each edge from the Source to node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of this 2nd level edges has a capacity of 1. The third level is filled by the edges that link day_of_week to the sink node. Quick example: player 2 is available 2 days: monday and tuesday, both have a limit of player, which is 12.
Until now 1st, 2nd and 4th constraints are satisfied (well, it was the easy part too): after you found the maximum flow of the entire graph you only select those path that does not have any residual capacity both on 2nd level (from players to day_of_weeks) and 3rd level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization" and under certain conditions, it is possible that it will not find any acceptable path even though it would have found one if it had made different choices while visiting the graph.
This part is the optimization problem that i meant before. I came up with at least two heuristic improvements:
While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority too. In this way the amount of residual capacity of the entire graph is certainly less evenly distributed.
randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is an situation where the majority rule perfectly applies.
Better to specify that everything above is just a starting point: the purpose of heuristic is to find the best approximated solution possible. With this type of problem and given your probably small input, this is not the right way but it is the easiest one when you do not know where to start.

Branching Factor and Depth

Question
A simple two-player game involves a pile of N matchsticks and two
players who have alternating turns. In each turn, a player removes 1,
2 or 3 matchsticks from the pile. The player who removes the last
matchstick loses the game.
A) What are the branching factor and depth of the game tree (give a general solution expressed in terms of N)? How large is the search
space?
B) How many unique states are there in the game? For large N what could be done to make the search more efficient?
Answer
A) I said the branching factor would be 3 but I justified this because the player could only ever remove up to 3 matches, meaning our tree would usually have three children. The second part with regards to the depth, I'm not sure.
B) N x 2 where N is the number of matches remaining. I am not sure how we could make the search more efficient though? Maybe introducing Alpha-beta pruning?
A :
For the depth, just imagine what the longest possible game would look like. It is the game that consists of both players only removing 1 match in each turn. Since there are n matches, such a game would take n turns : the tree has depth n.
B :
There are only 2*N states, each of them accessible from 3 states of higher matchstick count. Since the number of matches necessarily goes down as the game goes on, the graph of possible states is a DAG (Directed Acyclic Graph). A dynamic programming method is therefore possible to analyze this game. In the end, you will see that the optimal move only depends on N mod 4, with N the number of remaining matches.
EDIT : Proof idea for the N mod 4 :
Every position is either a losing or a winning position. A losing position is a situation where no matter what you play, if your adversary plays optimally, you will lose. Similarly, a winning position is a situation where if you play the right moves, the adversary cannot win. N=1 is a losing position (by definition of the game). Therefore, N=2,3,4 are winning positions because by removing the right amount of matches you put the adversary in a losing position. N=5 is a losing position because no matter what admissible number of matches you remove, you put the adversary in a winning position. N=6,7,8 are winning positions ... you get the idea.
Now it is just about making this proof formal : take as hypothesis that a position N is a losing position if and only if N mod 4 = 1. If that is true up to some integer k, you can prove that it is true for k+1. It's true up to k = 4 as we showed earlier. By recurrence, it is true for any N.
The state of the game at any time can be described by whose turn it is and the number of matches held by each player. After n moves there are 3^n possible histories, but for large n, many fewer than 3^n possible states, so you can save time by, for example, recognising that you are about to encounter a state that you have already encountered and worked out a value for before.
See also https://en.wikipedia.org/wiki/Nim - if this is Nim, or a variety of Nim, there are efficient strategies already worked out for it.

Ordering a dictionary to maximize common letters between adjacent words

This is intended to be a more concrete, easily expressable form of my earlier question.
Take a list of words from a dictionary with common letter length.
How to reorder this list tto keep as many letters as possible common between adjacent words?
Example 1:
AGNI, CIVA, DEVA, DEWA, KAMA, RAMA, SIVA, VAYU
reorders to:
AGNI, CIVA, SIVA, DEVA, DEWA, KAMA, RAMA, VAYU
Example 2:
DEVI, KALI, SHRI, VACH
reorders to:
DEVI, SHRI, KALI, VACH
The simplest algorithm seems to be: Pick anything, then search for the shortest distance?
However, DEVI->KALI (1 common) is equivalent to DEVI->SHRI (1 common)
Choosing the first match would result in fewer common pairs in the entire list (4 versus 5).
This seems that it should be simpler than full TSP?
What you're trying to do, is calculate the shortest hamiltonian path in a complete weighted graph, where each word is a vertex, and the weight of each edge is the number of letters that are differenct between those two words.
For your example, the graph would have edges weighted as so:
DEVI KALI SHRI VACH
DEVI X 3 3 4
KALI 3 X 3 3
SHRI 3 3 X 4
VACH 4 3 4 X
Then it's just a simple matter of picking your favorite TSP solving algorithm, and you're good to go.
My pseudo code:
Create a graph of nodes where each node represents a word
Create connections between all the nodes (every node connects to every other node). Each connection has a "value" which is the number of common characters.
Drop connections where the "value" is 0.
Walk the graph by preferring connections with the highest values. If you have two connections with the same value, try both recursively.
Store the output of a walk in a list along with the sum of the distance between the words in this particular result. I'm not 100% sure ATM if you can simply sum the connections you used. See for yourself.
From all outputs, chose the one with the highest value.
This problem is probably NP complete which means that the runtime of the algorithm will become unbearable as the dictionaries grow. Right now, I see only one way to optimize it: Cut the graph into several smaller graphs, run the code on each and then join the lists. The result won't be as perfect as when you try every permutation but the runtime will be much better and the final result might be "good enough".
[EDIT] Since this algorithm doesn't try every possible combination, it's quite possible to miss the perfect result. It's even possible to get caught in a local maximum. Say, you have a pair with a value of 7 but if you chose this pair, all other values drop to 1; if you didn't take this pair, most other values would be 2, giving a much better overall final result.
This algorithm trades perfection for speed. When trying every possible combination would take years, even with the fastest computer in the world, you must find some way to bound the runtime.
If the dictionaries are small, you can simply create every permutation and then select the best result. If they grow beyond a certain bound, you're doomed.
Another solution is to mix the two. Use the greedy algorithm to find "islands" which are probably pretty good and then use the "complete search" to sort the small islands.
This can be done with a recursive approach. Pseudo-code:
Start with one of the words, call it w
FindNext(w, l) // l = list of words without w
Get a list l of the words near to w
If only one word in list
Return that word
Else
For every word w' in l do FindNext(w', l') //l' = l without w'
You can add some score to count common pairs and to prefer "better" lists.
You may want to take a look at BK-Trees, which make finding words with a given distance to each other efficient. Not a total solution, but possibly a component of one.
This problem has a name: n-ary Gray code. Since you're using English letters, n = 26. The Wikipedia article on Gray code describes the problem and includes some sample code.

Resources