I have the following optimization problem setup:
Given:
about 100 Mechanics, each with
Work time per day [e.g. 8 hours]
Break time per day [e.g. min 1 hour]
Maximum overtime per day [e.g. 1 hour]
Location [e.g. Detroit]
about 1000 Tasks, each with
Location [e.g. Chicago]
Duration [e.g. 1 hour]
Fixed time slot [e.g. 1pm] [optional]
The goal is to assign all tasks to mechanics while keeping travel paths short. One constraint is that every mechanic starts and ends each day at their home location.
Is there any way to solve this problem in an easy and understandable way? Are there any similar examples online, e.g. in Python?
Not all workers will be available for a given task because of location. If the locations don't overlap, you could at least segment the problem into location-specific subproblems to avoid dealing with distance globally. Then you could assign the fixed time slots first, always picking the worker with the fewest hours on the schedule. Since hours are a discrete value, you could break ties between workers with equal scheduled hours by picking the nearest one by distance.
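To make that concrete (my sketch, not from the original post), here is a minimal Python version of that greedy rule, assuming hypothetical Mechanic records, task dicts with "location", "duration" and "slot" keys, and a placeholder distance function:

from dataclasses import dataclass, field

@dataclass
class Mechanic:
    name: str
    location: tuple              # (x, y) or (lat, lon)
    scheduled_hours: float = 0.0
    tasks: list = field(default_factory=list)

def distance(a, b):
    # Placeholder: Euclidean distance; swap in real travel times.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def assign(tasks, mechanics):
    # Fixed-slot tasks first (slot is not None sorts before None).
    for task in sorted(tasks, key=lambda t: t["slot"] is None):
        # Fewest scheduled hours; ties broken by distance to the task.
        best = min(mechanics,
                   key=lambda m: (m.scheduled_hours,
                                  distance(m.location, task["location"])))
        best.tasks.append(task)
        best.scheduled_hours += task["duration"]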
This is a very basic approach that would produce a schedule, but perhaps not a practical one - for example, two close-by jobs may be assigned to different workers, and efficiency may be poor once you consider travel time between jobs. You would have to iterate with the business and apply some heuristics to get to a usable solution.
I'd advise you to get a real-world sample of the input data - availability, locations, jobs etc. - as large as possible, and create an evaluation function first: overtime, travel time and utilization of the workforce should all factor in. Then you can see which heuristics need to be applied to the basic algorithm.
Another approach would be to cluster the jobs by location into one-worker-per-day clusters, assigning close-by jobs to the same worker. Look into graph clustering algorithms for that. Within a cluster you could assign the fixed-time jobs first, then the rest in random order. You could also constrain the clusters so they contain no overlapping fixed-time jobs.
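A rough sketch of the clustering variant in Python (k-means here as a simple stand-in for a proper graph clustering algorithm; it does not balance workloads between clusters, which you would still have to handle):

import numpy as np
from sklearn.cluster import KMeans

def cluster_jobs(jobs, n_workers):
    # jobs: list of dicts with "location" (x, y) and optional "slot".
    coords = np.array([job["location"] for job in jobs])
    labels = KMeans(n_clusters=n_workers, random_state=0).fit_predict(coords)
    clusters = [[] for _ in range(n_workers)]
    for job, label in zip(jobs, labels):
        clusters[label].append(job)
    # Within each cluster: fixed-time jobs first, the rest afterwards.
    for cluster in clusters:
        cluster.sort(key=lambda j: (j["slot"] is None, j["slot"] or 0))
    return clusters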
Either way, you'll have to come up with heuristics, whichever approach you take.
Finding the optimal solution may be an NP-hard problem; see http://www.cs.mun.ca/~kol/courses/6901-f14/lec3.pdf
I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have a PDPTW (pickup and delivery problem with time windows):
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them which time slots in a day are available (8am, 9am, 10am, etc.). Each time slot has a different score outcome: some are very efficient, and some aren't bookable, depending on the time/situation and the increased drive times.
For performance, I don't want to try each of the hour times in sequence as it's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first, before adding the customer's potential delivery, and then share that solved original state across all the candidate time slots, each solved independently.
Is there an intuitive way to do this? E.g. (a rough sketch follows the list below):
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
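To make the idea concrete, a rough sketch in Python-flavored pseudocode (all names are hypothetical; the real implementation would go through OptaPlanner's Java API, with solve_with_pinned standing in for a solve that pins every existing delivery):

from concurrent.futures import ThreadPoolExecutor

def probe_slots(base_solution, delivery, slots):
    def try_slot(slot):
        candidate = base_solution.copy()           # reuse the solved state
        candidate.add(delivery, time_window=slot)  # only the new delivery varies
        return slot, solve_with_pinned(candidate).score
    with ThreadPoolExecutor(max_workers=len(slots)) as pool:
        return dict(pool.map(try_slot, slots))

# e.g. probe_slots(solved, new_delivery, ["8am", "9am", "10am"])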
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!
A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already-assigned customers, then you forgo all optimization benefits. It's not NP-hard.
You can still use OptaPlanner's <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just be the incremental score calculation, not the AI algorithms.
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO₂ emissions).
What about the response within one minute?
At that point, you don't need to publish (= share with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There are two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if a vehicle gets 5 customers in the 5 corners of the country/state, the schedule might still be infeasible. Factor in the average Euclidean distance between any 2 customer locations to influence the decision.
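A hedged sketch of such a decision-table check in Python (the MAX_AVG_PAIR_DISTANCE threshold and the .location attribute are made up; calibrate against real data):

from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

MAX_CUSTOMERS_PER_VEHICLE = 5
MAX_AVG_PAIR_DISTANCE = 40.0  # hypothetical threshold, in km

def quick_accept(day_customers, n_vehicles):
    if len(day_customers) >= MAX_CUSTOMERS_PER_VEHICLE * n_vehicles:
        return False
    pairs = list(combinations([c.location for c in day_customers], 2))
    if pairs:
        avg = sum(dist(a, b) for a, b in pairs) / len(pairs)
        if avg > MAX_AVG_PAIR_DISTANCE:
            return False
    return True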
D) Run OptaPlanner with termination on feasible=true, starting from a warm start of the previous schedule. If no feasible solution is found within 1000 ms, reject the time window proposal.
Pitfall with D): if 2 requests come in at the same time and you run them in parallel, neither takes the other into account, so they could be feasible individually but infeasible together.
I want to implement football match processing, and I have spent a lot of time looking for good algorithms to do it. As input I have players with certain parameters. Some parameters are static while the match is processed (skills), and some are dynamic (physical state, psychological state) and change during the match. I also have external parameters that I can change manually. The simulation doesn't need to be very close to real football (except for the result: 20:0 would be awful anyway). The last main requirement is that the same input should not always lead to the same output; some intermediate calculations should return random values.
The algorithm should not be very slow, because in the near future it will need to process about 1000 matches at the same time, step by step. Each step will be calculated once every 3 seconds. The steps should also be logically linked, because I will render the match graphically, with all ball and player movements.
What algorithms can you recommend? I thought about a neural network, but I'm not sure that's a good solution.
You would really help me, because I have spent about half a year searching for this. Thank you very much!
Let's say you have an "action" every 5 minutes of the game, so 90/5 = 18 actions. To make it more realistic you can draw a random number instead:
numberOfActions = randomInt(10, 20); // random integer between 10 and 20
This number can serve as the length of your for() loop.
Then you have interactions between the defence and offence parameters of your two sets of players. Let's say each point of the difference TeamA.Offence - TeamB.Defence creates a ten percent chance to succeed:
if ((TeamA.Offence - TeamB.Defence) * 10 > randomInt(0, 100))
{
    TeamA.points++; // attack succeeds
}
Of course the goalkeeper can decrease this probability, maybe even significantly.
And so on. Of course you can make it more complicated. For example, you can compare stats only for certain players, depending on who has the ball. Both offence and defence parameters can be decreased over time and raised by condition:
TeamA._realOffenceValue =
    TeamA.Offence *
    (1 - i / numberOfActions) *        // fatigue as the match progresses
    TeamA.leftOffencePlayer.Condition;
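Putting the pieces together, a self-contained Python sketch of one match (the stats and constants are invented; balance them against whatever data you have):

import random

def simulate_match(team_a, team_b):
    # team_x: dict with "offence", "defence" (points) and "condition" (0..1).
    score = {"A": 0, "B": 0}
    number_of_actions = random.randint(10, 20)
    for i in range(number_of_actions):
        fatigue = 1 - i / number_of_actions  # effectiveness decays over time
        attacker, defender, side = ((team_a, team_b, "A") if i % 2 == 0
                                    else (team_b, team_a, "B"))
        real_offence = attacker["offence"] * fatigue * attacker["condition"]
        # Each point of offence-minus-defence ~ 10% chance to score.
        if (real_offence - defender["defence"]) * 10 > random.randint(0, 100):
            score[side] += 1
    return score

# e.g. simulate_match({"offence": 7, "defence": 5, "condition": 0.9},
#                     {"offence": 6, "defence": 6, "condition": 1.0})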
Remember that in games like Football Manager or Europa Universalis it's all about cheating the player. Balancing the game is a job for many hours, and nobody will do it for you on a forum :)
I have been trying for some time to solve a scheduling problem for a pool that I used to work at. The problem is as follows...
There are X lifeguards that work at the pool, and each has a specific number of hours they would like to work. We hope to keep the average deviation from each lifeguard's desired number of hours as low as possible, and as fair as possible for all. Each lifeguard is also a college student, and thus has a different schedule of availability.
Each week the pool's schedule of events is different than the last, thus a new schedule must be created each week.
Within each day, a certain number of lifeguards is required for certain time intervals (e.g. 3 guards from 8am-10am, 4 guards from 10am-3pm, and 2 guards from 3pm-10pm). This is where the hard part comes in: there are no clearly defined shifts (slots) to place the lifeguards into, because given the lifeguards' availability plus the changing weekly pool schedule of events, a schedule built from fixed shifts may not be possible.
Therefore a schedule must be created from a blank slate provided only with...
The Lifeguards and their information (# of desired hours, availability)
The pool's schedule of events, plus number of guards required to be on duty at any moment
The problem can now be clearly defined as "Create a possible schedule that covers the required number of guards at all times each day of the week AND be as fair as possible to all lifeguards in scheduling."
Creating a possible schedule that covers the required number of guards at all times each day of the week is the hard requirement and must be satisfied completely. The second half, being as fair as possible to all lifeguards, significantly complicates the problem, leading me to believe I will need an approximation approach, since the number of possible ways to divide up a work day can be ridiculous - and sometimes the only possible schedule may itself be ridiculous for fairness.
Edit: One of the most commonly suggested algorithms I find is the Hospitals/Residents problem; however, I don't believe it applies here, since there are no clearly defined slots to place workers into.
One way to solve this is with constraint programming - the Wikipedia article provides links for several constraint programming languages and libraries. Here is a paper describing how to use constraint programming to solve scheduling problems.
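For example, a minimal sketch with Google OR-Tools CP-SAT (the data shapes are assumptions: one boolean per guard per one-hour slot, guards as dicts with an "available" set of slots and a "desired" hour count):

from ortools.sat.python import cp_model

def build_schedule(guards, required, horizon):
    model = cp_model.CpModel()
    work = {(g, t): model.NewBoolVar(f"work_{g}_{t}")
            for g in range(len(guards)) for t in range(horizon)}
    for t in range(horizon):
        # Hard constraint: enough guards on duty in every slot.
        model.Add(sum(work[g, t] for g in range(len(guards))) >= required[t])
    devs = []
    for g, guard in enumerate(guards):
        for t in range(horizon):
            if t not in guard["available"]:
                model.Add(work[g, t] == 0)  # honor availability
        hours = sum(work[g, t] for t in range(horizon))
        dev = model.NewIntVar(0, horizon, f"dev_{g}")
        model.Add(dev >= guard["desired"] - hours)  # dev = |hours - desired|
        model.Add(dev >= hours - guard["desired"])
        devs.append(dev)
    model.Minimize(sum(devs))  # total deviation from desired hours
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return {(g, t) for (g, t) in work if solver.Value(work[g, t])}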
Another option is to use a greedy algorithm to find a (possibly invalid) solution, and to then use local search to make the invalid solution valid, or else to improve the sub-optimal greedy solution. For example, start by assigning each lifeguard their preferred hours, which will result in too many guards being scheduled for some slots and will also result in some guards being assigned a ridiculous number of hours; then use local search to un-assign the guards with the most hours from the slots that have too many guards assigned.
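A hedged sketch of that repair step in Python (hypothetical data shapes: assignment maps each slot to the set of guards in it, guard_hours tracks each guard's total):

def repair(assignment, required, guard_hours):
    for t, guards_in_slot in assignment.items():
        while len(guards_in_slot) > required[t]:
            # Un-assign the guard with the most hours from this overfull slot.
            heaviest = max(guards_in_slot, key=lambda g: guard_hours[g])
            guards_in_slot.remove(heaviest)
            guard_hours[heaviest] -= 1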
You need to turn your fairness criterion into an objective function; then you can pick from any number of workplace scheduling tools. For instance, you describe wanting to minimize the average difference between desired and assigned hours. However, I'd suggest that you consider minimizing the maximum difference instead. This seems fairer (to me), and it will generally result in a different schedule.
The problem, however, is a bit more complex. For instance, if one guard is always getting shorted while the others all get their desired hours, that's unfair as well. So you might want to introduce variables into your fairness model that represent the cumulative discrepancy for each guard from previous weeks. Also, a one-hour discrepancy for a guard who wants to work four hours a week may be more unfair than for a guard who wants to work twenty. To handle things like that, you might want to weight the discrepancies.
You might have to introduce constraints, such as that no guard is assigned more than a certain number of hours, or that every guard has a certain amount of time between shifts, or that the number of slots assigned to any one guard in a week should not exceed some threshold. Many scheduling tools have capabilities to handle these kinds of constraints, but you have to add them to the model.
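To make the min-max idea concrete, a fragment extending the CP-SAT sketch from the previous answer (it reuses model, devs and horizon from there; the weights list is hypothetical, and CP-SAT needs integer weights):

max_dev = model.NewIntVar(0, horizon * 100, "max_dev")
for g, dev in enumerate(devs):
    # weights[g]: e.g. larger for guards with few desired hours.
    model.Add(max_dev >= weights[g] * dev)
model.Minimize(max_dev)  # minimize the worst weighted deviation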
During our iteration planning, we frequently find ourselves in the same position as this guy - How to estimate a programming task if you have no experience in it
I definitely agree with prototyping before you can give a reasonable estimate. The same applies to anything that needs a bit of architecture and design, but I'm not that comfortable doing all this outside the scope of a sprint.
The basic idea is that you identify as many tasks as you can that you're confident of, and estimate these as normal. For those areas that you're unsure of then there should be two 'types' of task identified: Investigation & Implementation.
Investigation tasks are brief descriptions of work that you're just unsure of, for example "Investigate how to bind Control X to data". An estimate is provided for these.
The Implementation task is a traditional rough guess, probably based on the story points assigned, of how long you think it would take to implement the feature.
During the sprint, when the investigation tasks have been completed, the developer should then be at a stage where they have a much better idea what is going on. 'Proper' Tasks can then be identified, which take the place of the Implementation placeholder. In addition, further Investigation tasks may be identified at this stage, and the cycle continues.
In the example in the diagram below, we start with an Investigation task estimated at 7 hours and an Implementation task estimated at 14. Once the first Investigation has been completed, Tasks 1, 2 and 3 are identified and estimated with some degree of certainty, where Task 3 is another Investigation task from which Tasks 4 and 5 will be identified at a later stage. As you can see, the first Implementation estimate had the feature delivered within 14 hours - but in reality it took at least 4 + 7 + 3 + 4 + 2 = 20, over a third more than the initial estimate.
[Diagram: Investigation/Implementation task breakdown - http://www.duncangunn.me.uk/myweb/images/estimate.png]
All thoughts are welcome - my gut instinct is this will fly - am I right or am I the Wrong Brothers?
Cheers!
What we do.
Some features involve new technology. We can't accurately estimate them. Period.
We make up a number. Based on a couple of things. How hard does it "feel"? Can we get by with some kind of "partial" or "just-enough" implementation?
If it's hard, then it's hard. It will be expensive.
If there's a lot of parts, with a kernel of goodness and some bonus stuff layered on, we have a possibility of putting just the kernel into a release, and setting other stuff aside for later. A very few things are "all or nothing" where a partial release isn't possible. In that case, we have to provide enough time for "all", and that gets expensive.
Our standard approach is to get stuff that works, and possibly defer things to a later sprint if we run out of time because of unexpected complexities.
What you're calling "investigation", we call technical spike sprints. For stuff that's new, we make up an estimate number to placate managers who feel it necessary to overplan things. Then we spike the technology. Once it's spiked, we can revise the estimates based on what we now know.
Actually, the implementation of the feature took 27 hours - you forgot the first investigation of 7 hours, so in reality the actual implementation took almost twice as long as the estimate.
There are two ways you can go on this:
Just make the estimate as best you can and potentially experience a blowout in your sprint and a decline in project velocity (you should only do this if the feature is both urgent and critical); or
Schedule the investigation for this sprint and leave the implementation for another sprint - without an idea of how long the task will take, the Product Owner does not have enough information to make a decision about in which sprint to schedule it or even whether to do it at all. Only tasks that have been estimated should be included in your sprint.
The first choice means your sprint and project estimates are somewhat arbitrary. The second choice gives much more predictability to your sprints.
In your example, the initial investigation may be scheduled for Sprint 1 but without knowledge of how long the task will take the Product Owner can't decide how to schedule it. If you came back with an estimate of 200 hours the Product Owner may decide not to do that feature at all, or to delay it until Release 2 of the product. The estimate comes in and the Product Owner schedules Task 1, Task 2 and the investigation of Task 3 for Sprint 2. After estimating Task 3, Tasks 4 and 5 can be scheduled in Sprint 3 or later.
Estimating features is usually a complex task, and your estimates will become better over time. A good approach can be to estimate features in story points. A story point is an abstract value (with a meaning agreed upon by the team) that expresses the complexity of the problem.
You should assign the same number of story points to features of similar complexity. Later on, it is enough to time-estimate only a smaller set of features (or look at historical data) and you should be able to work out how much time you need.
Features of similar complexity need a similar time effort to implement.
Observing one year of estimations during a project, I noticed some strange things that make me wonder whether evidence-based scheduling would work here:
individual programmers seem to have favorite numbers (e.g. 2, 4, 8, 16, 30 hours)
the big tasks seem to be underestimated by a fixed factor (about 2), but the standard deviation is low here
the small tasks (1 or 2 hours) are very widely distributed. On average they have the same underestimation factor of 2, but the standard deviation is high:
some 5-minute spelling issues are estimated at 1 hour
other bugfixes are also estimated at 1 hour, but take a day
So, is it really a good idea to let the programmers break a 30-hour task down into 4- or 2-hour steps during estimation? Won't this raise the standard deviation? (OK, let them break it down - but perhaps after the estimation?!)
Yes, your observations are exactly the sort of problems EBS is designed to solve.
Yes, it's important to break bigger tasks down. Shoot for 1-2 day tasks, more or less.
If you have things estimated at under 2 hrs, see if it makes sense to group them. (It might not -- that's ok!)
If you have tasks that are estimated at 3+ days, see if there might be a way to break them up into pieces. There should be. If the estimator says there is not, make them defend that assertion. If it turns out that the task really just takes 3 days, fine, but the more of these you have, the more you should be looking hard in the mirror and seeing if folks aren't gaming the system.
Count 4 & 5 day estimates as 2x and 4x as bad as 3 day ones. Anyone who says something is going to take longer than 5 days and it can't be broken down, tell them you want them to spend 4 hrs thinking about the problem, and how it can be broken down. Remember, that's a task, btw.
As you and your team practice this, you will get better at estimating.
...You will also start to recognize patterns of failure, and solutions will present themselves.
The point of Evidence based scheduling is to use Evidence as the basis for your schedule, not a collection of wild-assed guesses. It's A Good Thing...!
I think it is a good idea. When people break tasks down, they figure out the specifics of the task. You may get small deviations here and there, one way or the other, and they may compensate or not... but you get a feeling of what is happening.
If you have a huge task estimated at 30 hours, it can take all of 100. That is the worst that could happen.
Manage the risk - split it down. You have already figured out these small deviations, so you know what to do with them.
So make sure developers also know what they do and say :)
"So, is it really a good idea to let the programmers break down the 30 hours task down to 4 or 2 hours steps during estimations? Won't this raise the standard deviation? (Ok, let them break it down - but perhaps after the estimations?!)"
I certainly don't get this question at all.
What it sounds like you're saying (you may not be saying this, but it sure sounds like it)
The programmers can't estimate at all -- the numbers are always rounded to "magic" values and off by 2x.
I can't trust them to both define the work and estimate the time it takes to do the work.
Only I know the correct estimate for the time required to do the task. It's not a round 1/2 day multiple. It's an exact number of minutes.
Here's my follow-up questions:
What are you saying? What can't you do? What problem are you having? Why do you think the programmers estimate badly? Why can't they be trusted to estimate?
From your statements, nothing is broken. You're able to plan and execute to that plan. I'd say you were totally successful and doing a great job at it.
OK, I have the answer: yes, it is right, AND the observations I made (see question) are completely understandable. To be sure, I made a small Excel simulation to confirm what I was guessing.
If you add up multiple small tasks that each have a high standard deviation, the total has a lower relative deviation, because the small tasks' errors partially compensate for one another.
So the answer is: yes, it will work if you break your tasks down so that they are about the same length, because the summation automatically does the compensation for the bigger tasks. I do not need to worry about a higher standard deviation in the smaller tasks.
But I am sure you must not mix low-estimated tasks with high-estimated tasks, because they simply do not have the same variance.
Hence, it's always better to break them down. :)
The Excel simulation I made:
create 50 rows with these columns:
first, a fixed value of 2 (the very homogeneous estimate)
20 columns with some random function (e.g. "=RAND()*RAND()*20")
make sums for each column
add "=VARIANCE(...)" for each random column
and add a variance calculation for the sums
The variance for each column in my simulation was about 2-3, and the variance of the sums was below 1.
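For anyone who prefers code to Excel, the same experiment in Python/numpy (assuming the rand()*rand()*20 task model from above):

import numpy as np

rng = np.random.default_rng(0)
tasks = rng.random((50, 20)) * rng.random((50, 20)) * 20  # 50 tasks x 20 runs

sums = tasks.sum(axis=0)  # 20 simulated project totals
print("relative std of a single task:", tasks.std() / tasks.mean())
print("relative std of the project total:", sums.std() / sums.mean())
# The absolute variance of the sum grows, but its relative standard
# deviation is roughly 1/sqrt(50) of a single task's: the errors cancel.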