Hungarian method variant - algorithm

I have some workers and tasks, and I want to assign the best worker to each task; that's a typical use for the Hungarian method.
But let's add that tasks happen at a certain day/time, and I want to take into consideration that, for a given worker, I'd like his tasks to be as close together in time as possible.
Is there an algorithm where I could set a priority between the two different goals?
edit: let's try a simple example. I have 3 tasks happening at 1:00, 2:00 and 3:00 respectively, and 2 workers for my tasks. If I consider only the quality of the work, I assign worker 1 to slots 1 and 3, and worker 2 to slot 2. But then the 1st worker has a hole in his schedule, so I'd like to put in a "weight" value so that I get an equilibrium between the total quality and the desire to have compact work hours.
Thanks,
Guillaume
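As a rough illustration of the weighted objective described in the edit above: score every possible assignment by its total quality cost plus a weight times each worker's idle time between tasks, and pick the cheapest. The quality costs below are invented, and the brute-force search over all assignments is only meant to show how the weight shifts the trade-off; it is not an established algorithm or an efficient one.

from itertools import product

# Hypothetical quality costs: quality_cost[w][t] = cost of worker w doing task t
# (lower is better). These numbers are invented for the example.
quality_cost = [
    [1, 5, 1],   # worker 0 is good at the 1:00 and 3:00 tasks
    [4, 1, 4],   # worker 1 is good at the 2:00 task
]
task_times = [1, 2, 3]   # the tasks happen at 1:00, 2:00 and 3:00

def gap_penalty(assignment, worker):
    """Total idle hours between consecutive tasks of one worker."""
    times = sorted(task_times[t] for t, w in enumerate(assignment) if w == worker)
    return sum(b - a - 1 for a, b in zip(times, times[1:]))

def combined_cost(assignment, lam):
    quality = sum(quality_cost[w][t] for t, w in enumerate(assignment))
    gaps = sum(gap_penalty(assignment, w) for w in range(len(quality_cost)))
    return quality + lam * gaps

# Brute force over all 2^3 assignments of the 3 tasks to the 2 workers.
for lam in (0.0, 10.0):
    best = min(product(range(2), repeat=3), key=lambda a: combined_cost(a, lam))
    print(lam, best, combined_cost(best, lam))

With a weight of 0 this picks the quality-optimal assignment (worker 0 on the 1:00 and 3:00 slots); with a large weight, a gap-free assignment wins even though the total quality is worse.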

Related

A job allocation problem

I have n people who need to do N tasks; each task takes a certain number of man-days, and tasks have dependencies, such as task A must start after task B has finished. How should I arrange this? Thank you so much!
I set up 4 rules for this question:
each task has a maximum number of people that can work on it (such as 5)
there are logical dependencies between tasks
each task takes a predetermined number of man-days
on any day, the number of people working concurrently cannot exceed the number of available people (n)
Those are the rules, but I don't know how to calculate the minimum total time. Please tell me a method to solve it, or give me some inspiration. Thank you so much!
Make a topological sort of the jobs.
Now you have a sequence of events (time; job start/job end).
At the start of a job, assign a worker if one is free; otherwise the job waits.
At the end of a job, free the worker and assign him to a waiting job if one exists.
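A minimal sketch of that event-driven simulation, under two simplifying assumptions that are not in the question: each job needs exactly one worker, and the dependencies are given as a map from each job to its prerequisites.

import heapq

def makespan(durations, deps, num_workers):
    """Greedy list scheduling: topological readiness + assign any free worker."""
    unmet = {j: set(deps.get(j, ())) for j in durations}
    ready = [j for j, d in unmet.items() if not d]
    started, running = set(), []          # running: heap of (finish_time, job)
    free, now = num_workers, 0
    while ready or running:
        while ready and free > 0:         # start jobs while workers are free
            job = ready.pop()
            started.add(job)
            heapq.heappush(running, (now + durations[job], job))
            free -= 1
        now, finished = heapq.heappop(running)   # jump to the next completion
        free += 1
        for j, d in unmet.items():               # unlock dependent jobs
            d.discard(finished)
            if not d and j not in started and j not in ready:
                ready.append(j)
    return now

# Example: B depends on A, one worker per job, two workers available.
print(makespan({"A": 3, "B": 2, "C": 4}, {"B": {"A"}}, num_workers=2))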
This problem is well-known to be NP-complete. Even the simple version with no dependencies, and only one worker per job, is NP-complete.
So you likely need to accept a reasonable heuristic. Here is one.
With a breadth-first search starting from the jobs which nothing else depends on, figure out for each job the minimum wall-clock time (max out workers on the job, and everything that depends on it, even if that is not actually possible) from starting that job to finishing.
And now you start at the top. Always assign workers to jobs by the following rules:
Longest wallclock time to finish first.
Break ties in favor of longest job.
Break ties in favor of job with fewest max workers.
In other words you're prioritizing putting people to work on the critical path and any slow jobs.
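A sketch of computing that priority key. As a simplification, the "wall-clock time to finish" is taken here as the longest chain of durations from the job down through its dependents, where each job's duration is its wall-clock time with its maximum crew (e.g. man-days divided by max workers); the job data is invented.

from graphlib import TopologicalSorter   # Python 3.9+

# Invented job data. `dependents` lists the jobs that may only start once
# this job has finished.
duration   = {"A": 5, "B": 3, "C": 4, "D": 2}
max_people = {"A": 2, "B": 1, "C": 3, "D": 1}
dependents = {"A": ["C", "D"], "B": ["D"], "C": [], "D": []}

# graphlib expects predecessors, so invert the dependents map.
predecessors = {j: [] for j in duration}
for j, succs in dependents.items():
    for s in succs:
        predecessors[s].append(j)
order = list(TopologicalSorter(predecessors).static_order())

# time_to_finish[j]: longest duration chain from j to the end (critical path).
time_to_finish = {}
for j in reversed(order):
    time_to_finish[j] = duration[j] + max(
        (time_to_finish[s] for s in dependents[j]), default=0)

def priority(job):
    # Longest time-to-finish first, then longest job, then fewest max workers.
    return (-time_to_finish[job], -duration[job], max_people[job])

print(sorted(duration, key=priority))   # order in which to offer jobs to workers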

Assign same queue item to multiple workers

We have a logical queue of tasks, where each task has to be assigned to multiple workers.
The number of workers to be assigned is based on a configuration of Minimum and Maximum workers.
A worker should not see the same task that they have already completed. It is not necessary that all workers will see all the tasks.
Total number of workers can change dynamically. Each worker can become online or offline anytime.
Each worker can choose to either complete the task or let it expire.
On expiry the task should be assigned to any worker who has not already completed the task.
Is there a good algorithm to solve this scenario?
The easy solution: greedily assign tasks. For every task that is ready, find Minimum <= N <= Maximum workers that haven't seen the task yet and assign them. Repeat until you either run out of workers or finish all the tasks.
If a worker comes online or finishes a task, re-check all the tasks.
If a new task comes in, re-check the available workers.
This solution might be enough if there are not that many tasks, because it is computation-heavy and recomputes everything.
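A deliberately naive version of that loop might look like the following; the task representation (a minimum, a maximum, and the set of workers who have already completed it) is just an assumption made for the example.

def assign_greedily(tasks, free_workers):
    """tasks: list of dicts with 'min', 'max' and 'done_by' (workers who already
    completed the task). free_workers: set of online, idle worker ids.
    Returns {task_index: assigned_workers}; assigned workers are removed from
    free_workers."""
    assignments = {}
    for i, task in enumerate(tasks):
        eligible = [w for w in free_workers if w not in task["done_by"]]
        if len(eligible) >= task["min"]:
            chosen = set(eligible[:task["max"]])
            free_workers -= chosen
            assignments[i] = chosen
        if not free_workers:
            break
    return assignments

# Re-run this whenever a worker comes online, a task expires, or a new task arrives.
tasks = [
    {"min": 2, "max": 3, "done_by": {"w1"}},
    {"min": 1, "max": 1, "done_by": set()},
]
print(assign_greedily(tasks, {"w1", "w2", "w3", "w4"}))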
The possible optimizations:
If the greedy solution fails (and it probably will), there are ways to improve on it. I will try to list those that come to mind, but it won't be an exhaustive list.
First, my personal favorite: network flows. Unfortunately I don't see an easy way to enforce the minimum-number-of-workers requirement, but it would be fast and it would result in the maximum possible number of workers being assigned at any given moment.
Create the network Source - Workers - Tasks - Sink. Edges from workers to tasks are linked and unlinked as needed:
when a worker is available for a task, create the edge with capacity 1; otherwise don't create the edge.
From the source, link an edge with capacity one to each online worker.
From every task, link an edge to the sink with capacity equal to its maximum number of workers.
You could even differentiate between different kinds of workers; network flows are awesome. The algorithms are fast, which makes them suitable even for large graphs. Also, they are available in many libraries, so you won't have to implement them yourself. Unfortunately, there is no easy way to enforce the minimum-workers rule. At least I don't see one right now; there might be some way, or maybe at least a heuristic.
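A sketch of that construction using networkx as the max-flow library (that choice, and the data layout, are assumptions on my part; the minimum-workers rule is left unenforced, as discussed).

import networkx as nx

def flow_assignment(workers, tasks, can_do):
    """workers: iterable of worker ids; tasks: {task_id: max_workers};
    can_do(worker, task) -> True if the worker may be assigned the task."""
    g = nx.DiGraph()
    for w in workers:
        g.add_edge("source", ("worker", w), capacity=1)
        for t in tasks:
            if can_do(w, t):
                g.add_edge(("worker", w), ("task", t), capacity=1)
    for t, cap in tasks.items():
        g.add_edge(("task", t), "sink", capacity=cap)

    _, flow = nx.maximum_flow(g, "source", "sink")
    # Read the chosen worker-task pairs back out of the flow values.
    assignment = []
    for w in workers:
        for (_, t), units in flow[("worker", w)].items():
            if units > 0:
                assignment.append((w, t))
    return assignment

workers = ["w1", "w2", "w3"]
tasks = {"t1": 2, "t2": 1}           # task -> maximum number of workers
already_done = {("w1", "t1")}        # w1 has completed t1 already
print(flow_assignment(workers, tasks,
                      lambda w, t: (w, t) not in already_done))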
Second, be smart while being greedy:
Create a queue for every task.
When a worker is available, register him for every task he can do into its queue.
When a worker is not available, remove him from the queues.
When a task has enough workers, start progress and disable these workers.
This is still the brute-force approach, but since you keep the queues, you limit the amount of necessary computation to a reasonable level. The potential downside is that large tasks (with a large minimum number of workers) might be stalled by the small tasks, which are easier to start and will eat up the workers. So there will probably be some further checking/balancing and prioritizing needed.
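A rough sketch of those per-task queues (just the event handlers; persistence, expiry, and the balancing mentioned above are left out, and all names are invented):

from collections import defaultdict

task_queues = defaultdict(list)    # task_id -> workers registered for it
busy_workers = set()

def on_worker_available(worker, doable_tasks, min_workers):
    """Register the worker for every task he can do, then try to start them."""
    for task in doable_tasks:
        if worker not in task_queues[task]:
            task_queues[task].append(worker)
        maybe_start(task, min_workers[task])

def on_worker_unavailable(worker):
    for queue in task_queues.values():
        if worker in queue:
            queue.remove(worker)

def maybe_start(task, min_workers):
    free = [w for w in task_queues[task] if w not in busy_workers]
    if len(free) >= min_workers:
        chosen = free[:min_workers]
        busy_workers.update(chosen)      # "disable" these workers
        print(f"starting {task} with {chosen}")

# Example: task t1 needs 2 workers, t2 needs 1.
mins = {"t1": 2, "t2": 1}
on_worker_available("w1", ["t1", "t2"], mins)   # starts t2 with w1
on_worker_available("w2", ["t1"], mins)         # t1 still waiting (w1 is busy)
on_worker_available("w3", ["t1"], mins)         # starts t1 with w2 and w3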
There is definitely more to be tried & done for your task at hand, however the information you provided is rather limited so this advice is not that specific.

Recommend an algorithm for fair distributed resource-allocation consensus

There are distributed computation nodes, and there is a set of computation tasks represented by rows in a database table (a row per task):
A node has no information about other nodes: it can't talk to other nodes and doesn't even know how many other nodes there are
Nodes can be added and removed; nodes may die and be restarted
A node is connected only to the database
There is no limit on tasks per node
The task pool is not finite; new tasks always arrive
A node takes a task by marking that row with a timestamp, so that other nodes don't consider it until some timeout has passed after that timestamp (in case the node dies and the task is not done)
The goal is to evenly distribute tasks among nodes. To achieve that I need to define some common algorithm of tasks acquisition: when a node starts, how many tasks to take?
If a node takes all available tasks, then one node is always busy while the others are idle, so that's not an option.
A good approach would be for each node to take tasks one by one with some delay: each node periodically (once every so often) checks whether there are free tasks and takes only 1 task. This way, shortly after start, all the tasks are acquired by the nodes and are more or less equally distributed. However, the drawback is that because of the delay, it takes some time for the last task to be taken into processing (say there are 10000 tasks, 10 nodes, and the delay is 1 second: it would take 10000 tasks * 1 second / 10 nodes = 1000 seconds from start until all tasks are taken). Also, the distribution is non-deterministic and skew is possible.
Question: what kind/class of algorithm solves such a problem, allowing tasks to be distributed quickly and evenly using some sync point (the database in this case), without electing a leader?
For example: nodes use some table to announce what tasks they want to take, then after some coordination steps they achieve consensus and start processing, etc.
So this comes down to a few factors to consider.
How many tasks are currently available overall?
How many tasks are currently accepted overall?
How many tasks has the node accepted in the last X minutes.
How many tasks has the node completed in the last X minutes.
Can the row fields be modified? (A field added).
Can a node request more tasks after it has finished its current tasks, or must all tasks be immediately distributed?
My inclination is do the following:
If practical, add a "node identifier" field (UUID) to the table with the rows. A node, when run, generates a UUID node identifier. When it accepts a task it adds a timestamp and its UUID. This easily lets other nodes determine how many "active" nodes there are.
To determine allocation, the node determines how many tasks are available/accepted. It then notes how many unique node identifiers (including itself) have accepted tasks. It then uses this formula to accept more tasks (ideally at random to minimize the chance of competition with other nodes): 2 * available_tasks / active_nodes - nodes_accepted_tasks. So if there are 100 available tasks, 10 active nodes, and this node has accepted 5 tasks already, it would accept: 100 / 10 - 5 = 5 tasks. If nodes only look for more tasks when they no longer have any tasks, then you can just use available_tasks / active_nodes.
To avoid issues, there should be a max number of tasks that a node will accept at once.
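A small sketch of that accept-count calculation; the multiplier in front of the fair share is left as a parameter (the formula above uses 2, while the worked example effectively uses 1), and the cap from the previous point is included. All names are made up.

import math

def tasks_to_accept(available_tasks, active_nodes, already_accepted,
                    multiplier=1, max_batch=20):
    """How many new tasks this node should claim right now."""
    fair_share = multiplier * available_tasks / max(active_nodes, 1)
    want = math.floor(fair_share) - already_accepted
    return max(0, min(want, max_batch))

# The worked example from above: 100 available, 10 active nodes, 5 already held.
print(tasks_to_accept(100, 10, 5))   # -> 5
print(tasks_to_accept(100, 10, 0))   # a fresh node's fair share is 10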
If a node identifier is impractical, I would just say that each node should aim to take ceil(sqrt(N)) random tasks, where N is the number of available tasks. If there are 100 tasks, the first node will take 10, the second will take 10, the 3rd will take 9, the 4th will take 9, the 5th will take 8, and so on. This won't evenly distribute all the tasks at once, but it will ensure the nodes get a roughly even number of tasks. The slight staggering of the number of tasks means that the nodes will not all finish their tasks at the same time (which admittedly may or may not be desirable). By not fully distributing them (unless there are sqrt(N) nodes), it also reduces the likelihood of conflicts (especially if tasks are randomly selected). It also reduces the number of "failed" tasks if a node goes down.
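A quick simulation of that ceil(sqrt(N)) rule, reproducing the 10, 10, 9, 9, 8, ... sequence from the example:

import math

def sqrt_allocation(total_tasks, nodes):
    """How many tasks each of `nodes` nodes takes, in arrival order."""
    remaining = total_tasks
    taken = []
    for _ in range(nodes):
        batch = min(remaining, math.ceil(math.sqrt(remaining)))
        taken.append(batch)
        remaining -= batch
    return taken, remaining

print(sqrt_allocation(100, 10))
# -> ([10, 10, 9, 9, 8, 8, 7, 7, 6, 6], 20): 20 tasks are left for later rounds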
This of course assumes that a node can request more tasks after it has started; if not, it becomes much more tricky.
As for an additional table, you could actually use that to keep track of the current status of the nodes. Each node records how many tasks it has, its unique UUID, and when it last completed a task. That may have potential issues with database churn, though. I think it's probably good enough to just record which node has accepted the task along with when it accepted it. This is again more useful if nodes can request tasks in the future.

What's the best Task scheduling algorithm for some given tasks?

We have a list of tasks with different lengths, a number of CPU cores, and a context switch time.
We want to find the best scheduling of tasks among the cores to maximize processor utilization.
How could we find this?
Isn't it the case that if we choose the biggest available tasks from the list and give them one by one to the currently ready cores, it's going to be the best? Or do you think we must try all orders to find out which is the best?
I must add that all cores are ready at time unit 0 and the tasks are supposed to run concurrently.
The idea here is that there's no silver bullet; you must consider what types of tasks are being executed, and try to schedule them as nicely as possible.
CPU-bound tasks don't use much communication (I/O), and thus, need to be continuously executed, and interrupted only when necessary -- according to the policy being used;
I/O-bound tasks may be continuously put aside during execution, allowing other processes to work, since they will be sleeping for many periods, waiting for data to be retrieved into primary memory;
interactive tasks must be executed regularly, but need not run without interruption, since they generate interruptions themselves while waiting for user input; however, they need a high priority so that the user does not notice delays in execution.
Considering this, and the context switch costs, you must evaluate what types of tasks you have, choosing, thus, one or more policies for your scheduler.
Edit:
I thought this was simply a conceptual question. Since you have to implement a solution, you must analyze the requirements.
Since you have the lengths of the tasks and the context switch times, and you have to keep the cores busy, this becomes an optimization problem: you want to keep the number of idle cores minimal as the processes finish, while also keeping the number of context switches to a minimum, so that your overall execution time does not grow too much.
As pointed out by svick, this sounds like a partition problem, which is NP-complete, and in which you need to divide a sequence of numbers into a given number of lists so that the sums of the lists are equal to each other.
In your problem you'd have a relaxation of the objective: you no longer need all the cores to execute for the same amount of time, but you want the difference between any two cores' execution times to be as small as possible.
In the reference given by svick, you can see a dynamic programming approach that you may be able to map onto your problem.
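As a concrete (heuristic, not optimal) starting point, and essentially what the question itself proposes: longest-processing-time-first, giving each task to the currently least-loaded core and charging a context switch for every task after a core's first. This is only a sketch under those assumptions, not the dynamic programming approach from the reference.

import heapq

def lpt_schedule(task_lengths, num_cores, context_switch):
    """Longest-processing-time-first onto the least loaded core."""
    cores = [(0, c) for c in range(num_cores)]   # heap of (current load, core id)
    heapq.heapify(cores)
    assignment = {c: [] for c in range(num_cores)}
    for length in sorted(task_lengths, reverse=True):
        load, core = heapq.heappop(cores)
        switch = context_switch if assignment[core] else 0  # no switch before a core's first task
        heapq.heappush(cores, (load + switch + length, core))
        assignment[core].append(length)
    makespan = max(load for load, _ in cores)
    return assignment, makespan

print(lpt_schedule([8, 5, 4, 3, 3, 1], num_cores=2, context_switch=1))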

Optimum algorithm

Two workers have several tasks. Assume the tasks have durations of 14, 7, 2, and 4. The next task goes to the first worker that is free. The two workers have to finish the tasks within one day. The same task takes the same time on either worker. Our goal is to finish the tasks as soon as possible.
Two questions:
1. Show that the algorithm always completes the tasks prior to time 2*T, where T is the optimum completion time.
2. Express the optimum scheduling with a (multi-dimensional) recursion.
Not a HW problem.
Please give me some suggestions.
What is a multi-dimensional recursion?
Since you ask for suggestions...
Try drawing the problem out. Have a timeline for worker #1 and worker #2 and specify what tasks they are working on for what stretches of time. Once you understand why this algorithm completes in less than 2*T time, you then can start figuring out how to formally prove it.
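For example, simulating the "next task goes to the first free worker" rule on the durations from the question gives a concrete timeline to draw (this is just an illustration of that drawing exercise):

import heapq

def first_free_worker(durations):
    """Give each task, in order, to whichever worker is free first."""
    workers = [(0, 1), (0, 2)]               # (time the worker becomes free, id)
    heapq.heapify(workers)
    timeline = []
    for d in durations:
        free_at, w = heapq.heappop(workers)
        timeline.append((w, free_at, free_at + d))   # worker, start, end
        heapq.heappush(workers, (free_at + d, w))
    finish = max(t for t, _ in workers)
    return timeline, finish

timeline, finish = first_free_worker([14, 7, 2, 4])
print(timeline)   # [(1, 0, 14), (2, 0, 7), (2, 7, 9), (2, 9, 13)]
print(finish)     # 14

In this instance the longest task alone already forces T >= 14, so the greedy result happens to be optimal; the interesting part of the exercise is bounding the general case by 2*T.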

Resources