Parallel Solving with PDPTW in OptaPlanner

I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have PDPTW:
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them which time slots in a day are available (8am, 9am, 10am, etc). Each time slot has a different score outcome: some are very efficient, while others aren't bookable at all because of the increased drive times they would cause.
For performance, I don't want to try each of the hourly slots in sequence, as that's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first, before adding the customer's potential delivery, and then share that solved original state across all the candidate time slots, each solved independently.
Is there an intuitive way to do this? E.g.:
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!

A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already assigned customers, then you're forgoing all optimization benefits. It's not NP-hard.
You can still use OptaPlanner <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just be the incremental score calculation, not the AI algorithms.
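As a rough illustration of that setup - a sketch only, where VehicleRoutingSolution, Delivery and copyWithNewDelivery(...) are hypothetical domain classes/helpers, not OptaPlanner API, and existing deliveries are assumed pinned (e.g. via @PlanningPin) - you could solve the base schedule once, then fork a copy per candidate hour and run a construction-heuristic-only solve for each in parallel:

```java
// Sketch only: probe every candidate hour in parallel on copies of the solved base.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.optaplanner.core.api.solver.SolverFactory;
import org.optaplanner.core.config.constructionheuristic.ConstructionHeuristicPhaseConfig;
import org.optaplanner.core.config.solver.SolverConfig;

public class TimeSlotProbe {

    public Map<Integer, VehicleRoutingSolution> probe(VehicleRoutingSolution solvedBase,
            List<Integer> candidateHours) throws InterruptedException, ExecutionException {
        SolverConfig config = new SolverConfig()
                .withSolutionClass(VehicleRoutingSolution.class)
                .withEntityClasses(Delivery.class)
                // Construction heuristic only: with all existing deliveries pinned,
                // local search has nothing left to improve. (Score calculation
                // config is omitted for brevity.)
                .withPhases(new ConstructionHeuristicPhaseConfig());
        config.setMoveThreadCount("AUTO"); // optionally spread moves across cores

        SolverFactory<VehicleRoutingSolution> factory = SolverFactory.create(config);
        ExecutorService pool = Executors.newFixedThreadPool(candidateHours.size());
        Map<Integer, Future<VehicleRoutingSolution>> futures = new LinkedHashMap<>();
        for (int hour : candidateHours) {
            // Each task deep-copies the solved base state, adds the tentative
            // delivery with this hour's time window, and inserts it with a
            // fresh Solver instance (a Solver itself is not thread-safe).
            futures.put(hour, pool.submit(() ->
                    factory.buildSolver().solve(solvedBase.copyWithNewDelivery(hour))));
        }
        Map<Integer, VehicleRoutingSolution> results = new LinkedHashMap<>();
        for (Map.Entry<Integer, Future<VehicleRoutingSolution>> e : futures.entrySet()) {
            results.put(e.getKey(), e.getValue().get()); // one scored schedule per slot
        }
        pool.shutdown();
        return results;
    }
}
```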
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO₂ emissions).
What about the response within one second?
At that point, you don't need to publish (= share info with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There are two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if it gets 5 customers in the 5 corners of the country/state, then it might still be infeasible. Factor in the average Euclidean distance between any 2 customer location pairs to influence the decision.
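A toy version of such a relaxed check might look like this; the 5-customer cap, the 50 km threshold and the planar coordinates are all made-up illustration values:

```java
// Toy decision-table check: can one more customer fit under a per-vehicle cap?
import java.util.List;

public class QuickFeasibilityCheck {

    record Location(double x, double y) {}

    static boolean quickAccept(List<Location> customersThatDay, int vehicleCount) {
        // Tighten the cap when customers are, on average, spread far apart.
        int capPerVehicle = averagePairwiseDistance(customersThatDay) > 50.0 ? 4 : 5;
        return customersThatDay.size() + 1 <= vehicleCount * capPerVehicle;
    }

    static double averagePairwiseDistance(List<Location> pts) {
        double sum = 0;
        int pairs = 0;
        for (int i = 0; i < pts.size(); i++) {
            for (int j = i + 1; j < pts.size(); j++) {
                sum += Math.hypot(pts.get(i).x() - pts.get(j).x(),
                        pts.get(i).y() - pts.get(j).y());
                pairs++;
            }
        }
        return pairs == 0 ? 0 : sum / pairs;
    }
}
```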
D) By running OptaPlanner until the termination feasible=true triggers, starting from a warm start of the previous schedule. If no such feasible solution is found within 1000ms, reject the time window proposal.
Pitfall with D): if 2 requests come in at the same time and you run them in parallel, neither takes the other into account, so they could be feasible individually but infeasible together.
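In config terms, option D could look roughly like this - a sketch assuming a recent OptaPlanner fluent API, where VehicleRoutingSolution and withNewDelivery(...) are placeholders for your own domain code:

```java
// Sketch of option D: warm-start from the previous schedule, stop at the first
// feasible solution, hard cap at 1000 ms.
import org.optaplanner.core.api.solver.Solver;
import org.optaplanner.core.api.solver.SolverFactory;
import org.optaplanner.core.config.solver.SolverConfig;
import org.optaplanner.core.config.solver.termination.TerminationConfig;

public class TimeWindowGate {

    public boolean acceptProposal(VehicleRoutingSolution lastBestSolution, int requestedHour) {
        // "solverConfig.xml" stands in for your existing solver configuration.
        SolverConfig config = SolverConfig.createFromXmlResource("solverConfig.xml")
                .withTerminationConfig(new TerminationConfig()
                        .withBestScoreFeasible(true)         // stop once feasible...
                        .withMillisecondsSpentLimit(1000L)); // ...or give up at 1s
        Solver<VehicleRoutingSolution> solver =
                SolverFactory.<VehicleRoutingSolution>create(config).buildSolver();
        // Warm start: the previous schedule plus the tentatively added delivery.
        VehicleRoutingSolution result =
                solver.solve(lastBestSolution.withNewDelivery(requestedHour));
        return result.getScore().isFeasible(); // otherwise reject the time window
    }
}
```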

Related

Safari lodge vehicle allocations - resource constraint / allocation algorithm?

I work in a safari lodge where we need to continuously allocate guests to vehicles for the duration of their stay, with the following constraints:
no more than 6 guests allocated to a vehicle at any time
guests must be allocated to the same vehicle for the duration of their stay
where a guest (or group of guests) pays for a private vehicle, no other guests may be allocated to that vehicle
private vehicle bookings impact the remaining number of vehicles available for allocation
guests travelling together must be allocated to the same vehicle(s)
the maximum number of available vehicles is fixed (=5)
the maximum number of potential guests to be allocated is fixed (=24)
at times, certain vehicles may be unavailable for a period
allocations are updated daily for latest + last minute bookings and must take into account the existing allocation configuration for the current day when updating forward into the future (i.e. tomorrow onwards)
the goal is to minimize the number of vehicles in use at any point, without having to break up groups or move guests from one vehicle to another during their stay.
I've come across a range of algorithmic approaches, from resource scheduling to greedy algorithms - but realistically I'm not technical enough to evaluate what I need and how it all needs to hang together. I have some basic coding experience in JavaScript and VBA (shudder), but I'm not especially mathematical. I'm viewing this as an interesting learning exercise that could make life a lot easier if I can find a way to automate much of the allocation logic. I can't quite formulate a 'picture' of the problem/approach in my mind - whether it's a large matrix or some kind of tree?
Ideally, I'd be able to generate an algorithmic solution that would give me 90% of what I need, which I could then tweak manually (and minimally) for a final solution, taking into account the particular idiosyncrasies of certain guests or groups. This final solution would then need to be the input (starting point) to the next allocation update when the most recent list of (updated) bookings is received.
It's very possible that this problem is too complex for my technical abilities - but I won't know until I ask! Many thanks.

What are best known algorithms/techniques for updating edges on huge graph structures like social networks?

On social networks like Twitter, where millions follow a single account, it must be very challenging to update all followers instantly when a new tweet is posted. Similarly, on Facebook there are fan pages with millions of followers, and we see updates from them instantly when something is posted on the page. I am wondering what the best known techniques and algorithms to achieve this are. I understand that with a billion accounts they have huge data centers across the globe, but even if we reduce this problem to just one computer in the following manner - 100,000 nodes with an average of 200 edges per node - then every single node update will require 200 edge updates. So what are the best techniques/algorithms to optimize such large updates? Thanks!
The best way is usually just to do all the updates. You say they can be seen "instantly", but actually the updates probably propagate through the network and can take up to a few seconds to show up in followers' feeds.
Having to do all those updates may seem like a lot, but on average a follower will check for updates much more often than the person being followed will produce them, and checking for updates has to be much faster.
The choices are:
Update 1 million followers, a couple times a day, within a few seconds; or
Respond to checks from 1 million followers, a couple hundred times a day, within 1/10 second or so.
There are in-between strategies involving clustering users and stuff, but usage patterns like you see on Facebook and Twitter are probably so heavily biased toward option (1) that such strategies don't pay off.
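For illustration, option (1) - often called fan-out-on-write - can be sketched in a few lines. This toy version keeps everything in in-memory maps, whereas a real system would shard the feeds across machines:

```java
// Toy fan-out-on-write: pushing each post to every follower's feed makes
// writes expensive but keeps reads O(1).
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FanOutOnWrite {

    private final Map<String, List<String>> followers = new HashMap<>(); // author -> followers
    private final Map<String, Deque<String>> feeds = new HashMap<>();    // user -> feed

    public void follow(String follower, String author) {
        followers.computeIfAbsent(author, k -> new ArrayList<>()).add(follower);
    }

    public void post(String author, String tweet) {
        // One write per follower: costly per post, but posts are far rarer
        // than feed reads.
        for (String f : followers.getOrDefault(author, List.of())) {
            feeds.computeIfAbsent(f, k -> new ArrayDeque<>()).addFirst(tweet);
        }
    }

    public List<String> readFeed(String user) {
        // Reading is a single lookup; no per-followee query at read time.
        return new ArrayList<>(feeds.getOrDefault(user, new ArrayDeque<>()));
    }
}
```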

Micro-job platform - matching orders with nearest worker

I am currently planning to develop a micro-job platform to match maintenance workers (plumbers, electricians, carpenters, etc) with users in need, through geolocation. Bookings can be made well in advance or last minute (minimum 1h before) at the same fare.
Now, I am struggling to find a suited matching algorithm to efficiently assign bookings to available workers. The matcher is basically a listener catching "NewBooking" events, which are fired on a regular basis until a worker is assigned to the specific booking. Workers can accept or decline the job and can choose working hours with a simple toggle button (when it's off they will not receive any request).
Overall, the order is assigned within a certain range in km.
The first system I thought of is based on concentric zones, whose radius is incremented every time the event is fired (not indefinitely). All workers online within the area will be notified and the first to accept gets the job.
Pros:
more opportunities to match last-minute bookings;
Cons:
workers may get a lot of notifications;
the backend has to process several push & mail messages;
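A rough sketch of this first approach, with all types, radius values and the haversine distance helper purely illustrative:

```java
// Expanding-radius matcher: each time the "NewBooking" event re-fires for an
// unassigned booking, widen the radius and notify everyone inside it; the
// first worker to accept gets the job.
import java.util.List;

public class RadiusMatcher {

    record Location(double lat, double lon) {}
    record Worker(String id, Location location, boolean online) {}
    record Booking(String id, Location location) {}

    private static final double INITIAL_RADIUS_KM = 2.0;
    private static final double RADIUS_STEP_KM = 2.0;
    private static final double MAX_RADIUS_KM = 10.0; // don't grow indefinitely

    public void onNewBooking(Booking booking, int attempt, List<Worker> workers) {
        double radius = Math.min(INITIAL_RADIUS_KM + attempt * RADIUS_STEP_KM, MAX_RADIUS_KM);
        workers.stream()
                .filter(Worker::online) // toggled off = receives no requests
                .filter(w -> distanceKm(w.location(), booking.location()) <= radius)
                .forEach(w -> notifyWorker(w, booking));
    }

    // Haversine great-circle distance in km.
    private double distanceKm(Location a, Location b) {
        double dLat = Math.toRadians(b.lat() - a.lat());
        double dLon = Math.toRadians(b.lon() - a.lon());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.lat())) * Math.cos(Math.toRadians(b.lat()))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * 6371 * Math.asin(Math.sqrt(h));
    }

    private void notifyWorker(Worker w, Booking b) {
        // Placeholder for the push/mail delivery machinery.
        System.out.printf("notify %s about booking %s%n", w.id(), b.id());
    }
}
```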
A second solution is based on linear distance: assign the work to the nearest available worker and, if (s)he does not accept it within a certain timeframe (like 30'), the algorithm goes on to the next available person, and so on.
Pros:
less processing power;
scalability with lots of workers and requests;
Cons:
fewer chances to match last-minute orders;
A third alternative is to use the first approach but send orders in multiple batches according to feedback ratings: the first group to receive the notification is made up of workers with 4+ stars, then those with a 3+ average rating, and so on.
I was wondering if there is a best practice when it comes to this kind of matching algorithm, since even taxi apps face these issues.
Anyway, which approach would you suggest (if any), and do you have any proposals for possible improvements?
Thank you

Optimal shift scheduling algorithm

I have been trying for some time solve a scheduling problem for a pool that I used to work at. This problem is as follows...
There are X lifeguards that work at the pool, and each has a specific number of hours they would like to work. We hope to keep each lifeguard's deviation from their desired number of hours as low as possible, and to be as fair as possible to all. Each lifeguard is also a college student, and thus has a different schedule of availability.
Each week the pool's schedule of events is different than the last, thus a new schedule must be created each week.
Within each day there will be a certain number of lifeguards required for certain time intervals (e.g. 3 guards from 8am-10am, 4 guards from 10am-3pm, and 2 guards from 3pm-10pm). This is where the hard part comes in: there are no clearly defined shifts (slots) to place each of the lifeguards into, because creating a schedule may not even be possible given the lifeguards' availability plus the changing weekly pool schedule of events.
Therefore a schedule must be created from a blank slate provided only with...
The Lifeguards and their information (# of desired hours, availability)
The pool's schedule of events, plus number of guards required to be on duty at any moment
The problem can now be clearly defined as "Create a possible schedule that covers the required number of guards at all times each day of the week AND be as fair as possible to all lifeguards in scheduling."
Creating a possible schedule that covers the required number of guards at all times each day of the week is the part of the problem that is a necessity and must be completely solved. The second half - being as fair as possible to all lifeguards - significantly complicates the problem, leading me to believe I will need an approximation approach, since the number of possible ways to divide up a work day could be ridiculous, and yet the only feasible schedule may sometimes be a ridiculously unfair one.
Edit: One of the most commonly suggested algorithms I find is the "Hospitals/Residents problem", however I don't believe this will be applicable since there are no clearly defined slots to place workers.
One way to solve this is with constraint programming - the Wikipedia article provides links to several constraint programming languages and libraries. Here is a paper describing how to use constraint programming to solve scheduling problems.
Another option is to use a greedy algorithm to find a (possibly invalid) solution, and to then use local search to make the invalid solution valid, or else to improve the sub-optimal greedy solution. For example, start by assigning each lifeguard their preferred hours, which will result in too many guards being scheduled for some slots and will also result in some guards being assigned a ridiculous number of hours; then use local search to un-assign the guards with the most hours from the slots that have too many guards assigned.
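A bare-bones sketch of that greedy-then-repair idea, with types simplified to maps and sets; a real version would also need a pass to fill under-staffed slots:

```java
// Greedy phase: everyone gets their preferred slots (over-staffing some).
// Repair phase: trim over-staffed slots, un-assigning the busiest guard first.
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GreedyRepairScheduler {

    // slot index -> guards assigned; requiredPerSlot[i] = guards needed in slot i
    public Map<Integer, Set<String>> schedule(Map<String, Set<Integer>> preferredSlots,
                                              int[] requiredPerSlot) {
        Map<Integer, Set<String>> assigned = new HashMap<>();
        // Greedy: assign each lifeguard all of their preferred slots.
        preferredSlots.forEach((guard, slots) ->
                slots.forEach(s -> assigned.computeIfAbsent(s, k -> new HashSet<>()).add(guard)));
        // Repair: while a slot is over-staffed, remove the guard who currently
        // has the most total hours (a simple local-search move).
        for (int slot = 0; slot < requiredPerSlot.length; slot++) {
            Set<String> guards = assigned.getOrDefault(slot, new HashSet<>());
            while (guards.size() > requiredPerSlot[slot]) {
                String busiest = guards.stream()
                        .max(Comparator.comparingInt(g -> totalHours(assigned, g)))
                        .orElseThrow();
                guards.remove(busiest);
            }
        }
        return assigned;
    }

    private int totalHours(Map<Integer, Set<String>> assigned, String guard) {
        return (int) assigned.values().stream().filter(s -> s.contains(guard)).count();
    }
}
```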
You need to turn your fairness criterion into an objective function. Then you can pick from any number of workplace scheduling tools. For instance, you describe wanting to minimize the average difference between desired and assigned hours. However, I'd suggest that you consider minimizing the maximum difference instead. This seems fairer (to me), and it will generally result in a different schedule.
The problem, however, is a bit more complex. For instance, if one guard is always getting shorted while the others all get their desired hours, that's unfair as well. So you might want to introduce variables into your fairness model that represent the cumulative discrepancy for each guard from previous weeks. Also, a one-hour discrepancy for a guard who wants to work four hours a week may be more unfair than for a guard who wants to work twenty. To handle things like that, you might want to weight the discrepancies.
You might have to introduce constraints, such as that no guard is assigned more than a certain number of hours, or that every guard has a certain amount of time between shifts, or that the number of slots assigned to any one guard in a week should not exceed some threshold. Many scheduling tools have capabilities to handle these kinds of constraints, but you have to add them to the model.
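As one hedged example of what such a weighted, cumulative fairness objective could look like (the field names are illustrative, not tied to any particular tool):

```java
// Smaller is fairer: the worst weighted discrepancy across all guards,
// carrying last weeks' shortfalls forward and weighting by desired hours.
import java.util.List;

public class FairnessObjective {

    record Guard(int desiredHours, int assignedHours, int carriedDiscrepancy) {}

    static double maxWeightedDiscrepancy(List<Guard> guards) {
        return guards.stream()
                .mapToDouble(g -> {
                    int gap = Math.abs(g.desiredHours() - g.assignedHours())
                            + g.carriedDiscrepancy(); // unfairness from previous weeks
                    // A 1-hour gap hurts a 4-hour guard more than a 20-hour guard.
                    return gap / (double) Math.max(1, g.desiredHours());
                })
                .max()
                .orElse(0.0);
    }
}
```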

How to manage transactions, debt, interest and penalty?

I am making a BI system for a bank-like institution. This system should manage credit contracts, invoices, payments, penalties and interest.
Now, I need to write a method that builds an invoice. I have to calculate how much the customer has to pay right now. He has a debt, which he has to pay off, and he also has to pay interest. If he was ever late with a due payment, penalties are applied for each day he's late.
I thought there were 2 ways of doing this:
By having only 1 original state - the contract's original state - and, each time the monthly payment the customer has to make is computed, recomputing from that original state while taking into account the payments actually made.
By constantly creating intermediary states, going from the last intermediary state and considering only the events that took place between these 2 intermediary states. This means having a job that runs periodically (daily, monthly), takes the last saved state, applies the changes (due payments, actual payments, changes in global constants like the penalty rate, which is controlled by the Central Bank), and saves the resulting state.
The benefits of the first variant:
Always up to date. If changes were made with a date in the past (a guy came with a paid invoice 5 days after he made the payment to the bank), they will be correctly reflected in the results.
The flaws of the first variant:
Takes a long time to compute.
Documents printed with the current results may no longer match if the underlying data later changes due to operations entered with a back date.
The benefits of the second variant:
Works fast, and aggregated data is always available for search and reports.
Simpler to compute
The flaws of the second variant:
Vulnerable to failed jobs.
Errors in the past propagate all the way through to the final results.
An intermediary result cannot be changed if new data about past transactions arrives (it can, but it's hard and has many implications, so I'd rather mark it as taboo).
Jobs cannot be performed successfully and without problems if an unfinished transaction exists (an issued invoice that wasn't yet paid)
Is there any other way? Can I combine the benefits from these two? Which one is used in other similar systems you've encountered? Please share any experience.
Problems of this nature are always more complicated than they first appear. This is a consequence of what I like to call the Rumsfeldian problem of the unknown unknown. Basically, whatever you do now, be prepared to make adjustments for arbitrary future rules. This is a tough proposition. Some future possibilities that may have a significant impact on your calculation model are back-dated payments, adjustments and charges. Forgiven interest periods may also become an issue (particularly if back dated), as may requirements to provide various point-in-time (PIT) calculations, based either on what was "known" at that PIT (the past view of the past) or taking into account transactions occurring after the reference PIT that were back dated to a PIT before the reference (the current view of the past). Calculations of this nature can be a real headache.
My advice would be to calculate from "scratch" (i.e. the first variant). Implement optimizations (e.g. the second variant) only when necessary to meet performance constraints. Doing calculations from the beginning is a compute-intensive model but is generally more flexible with respect to accommodating unexpected left turns.
If performance is a problem but the frequency of complicating factors (e.g. back-dated transactions) is relatively low, you could explore a hybrid model employing the best of both variants. Here you store the current state and calculate forward, using only those transactions that posted since the last stored state, to create a new current state. If you hit a "complication", re-do the entire account from the beginning to reestablish the current state.
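A minimal sketch of that hybrid, assuming simplified transaction and state types (interest and penalty accrual would slot into the replay step):

```java
// Hybrid model: roll forward from the last checkpoint in the common case,
// but replay the whole account when a back-dated "complication" shows up.
import java.time.LocalDate;
import java.util.List;

public class HybridBalanceEngine {

    record Txn(LocalDate postedOn, LocalDate effectiveOn, double amount) {}
    record State(LocalDate asOf, double balance) {}

    public State currentState(State checkpoint, List<Txn> sinceCheckpoint,
                              List<Txn> fullHistory) {
        boolean complication = sinceCheckpoint.stream()
                .anyMatch(t -> t.effectiveOn().isBefore(checkpoint.asOf()));
        if (complication) {
            // Back-dated transaction detected: re-do the entire account from
            // the beginning (variant 1) to reestablish the current state.
            return replay(new State(LocalDate.MIN, 0.0), fullHistory);
        }
        // Normal path (variant 2): apply only what posted since the checkpoint.
        return replay(checkpoint, sinceCheckpoint);
    }

    private State replay(State start, List<Txn> txns) {
        double balance = start.balance();
        LocalDate asOf = start.asOf();
        for (Txn t : txns) {
            balance += t.amount(); // interest/penalty accrual would go here
            if (t.postedOn().isAfter(asOf)) {
                asOf = t.postedOn();
            }
        }
        return new State(asOf, balance);
    }
}
```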
Being able to accommodate the unexpected without triggering a rewrite is probably more important in the long run than shaving calculation time right now. Do not place restrictions on your computation model until you have to. Saving current state often brings with it a number of built-in assumptions and restrictions that reduce the wiggle room for accommodating future requirements.
