Micro-job platform - matching orders with nearest worker - algorithm

I am currently planning to develop a micro-job platform to match maintenance workers (plumbers, electricians, carpenters, etc.) with users in need through geolocation. Bookings can be made well in advance or at the last minute (minimum 1h before) at the same fare.
Now, I am struggling to find a suitable matching algorithm to efficiently assign bookings to available workers. The matcher is basically a listener catching "NewBooking" events, which are fired at regular intervals until a worker is assigned to the specific booking. Workers can accept or decline the job and can choose their working hours with a simple toggle button (when it's off they will not receive any requests).
Overall, an order is only assigned within a certain km range.
The first system I thought of is based on concentric zones, whose radius is incremented every time the event is fired (not indefinitely). All workers online within the area will be notified and the first to accept gets the job.
Pros:
more opportunities to match last minute bookings;
Cons:
workers may get a lot of notifications;
the backend has to process many push and mail messages;
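A minimal sketch of the concentric-zone idea, assuming workers are plain dictionaries with hypothetical `id`, `online`, `lat` and `lon` fields, and that the radius values (`base_km`, `step_km`, `max_km`) are placeholders to be tuned. Passing the set of already notified workers is one possible way to limit repeated notifications:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def workers_to_notify(booking, workers, attempt,
                      base_km=2.0, step_km=2.0, max_km=10.0,
                      already_notified=frozenset()):
    """Return ids of online workers inside the zone for this attempt.

    The radius grows with every fired event (attempt 0, 1, 2, ...)
    but is capped at max_km so it does not grow indefinitely.
    """
    radius = min(base_km + attempt * step_km, max_km)
    return [
        w["id"] for w in workers
        if w["online"]
        and w["id"] not in already_notified
        and haversine_km(booking["lat"], booking["lon"], w["lat"], w["lon"]) <= radius
    ]

booking = {"lat": 0.0, "lon": 0.0}
workers = [
    {"id": "a", "online": True, "lat": 0.0, "lon": 0.01},   # about 1.1 km away
    {"id": "b", "online": True, "lat": 0.0, "lon": 0.03},   # about 3.3 km away
    {"id": "c", "online": True, "lat": 0.0, "lon": 0.2},    # about 22 km away
    {"id": "d", "online": False, "lat": 0.0, "lon": 0.0},   # toggle switched off
]
first_wave = workers_to_notify(booking, workers, attempt=0)
second_wave = workers_to_notify(booking, workers, attempt=1, already_notified={"a"})
```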
A second solution is based on linear distance, assigning the work to the nearest available worker and, if (s)he does not accept it within a certain timeframe (like 30'), the algorithm goes to the next available person and so on.
Pros:
less processing power;
scalability with lots of workers and requests;
Cons:
fewer chances to match last-minute orders;
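The second approach could be sketched roughly as follows. The `distance_km` field and the `accepts` callback are hypothetical; in a real system the callback would stand for sending the offer and waiting until the worker answers or the timeframe expires:

```python
def assign_nearest_first(candidates, accepts):
    """Offer the job to one worker at a time, nearest first.

    `accepts(worker)` returns True if the worker takes the job and
    False if they decline or let the offer time out.
    Returns the id of the assigned worker, or None if nobody accepted.
    """
    for worker in sorted(candidates, key=lambda w: w["distance_km"]):
        if accepts(worker):
            return worker["id"]
    return None

candidates = [
    {"id": "far", "distance_km": 5.0},
    {"id": "near", "distance_km": 1.0},
    {"id": "mid", "distance_km": 3.0},
]
offered = []

def fake_accepts(worker):
    # Stand-in for the real offer/timeout logic: the nearest worker declines.
    offered.append(worker["id"])
    return worker["id"] != "near"

assigned = assign_nearest_first(candidates, fake_accepts)
```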
A third alternative is to use the first approach but send the notifications in multiple batches ordered by feedback rating: the first group to be notified consists of workers averaging 4+ stars, then those averaging 3+, and so on.
I was wondering if there is a best practice for this kind of matching algorithm, since even taxi apps face these issues.
Anyway, which approach would you suggest (if any), or do you have any proposals for possible improvements?
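As a rough illustration of the rating-based batches, assuming each worker carries a hypothetical `rating` field with the average of their votes, and that the thresholds are only examples:

```python
def rating_batches(workers, thresholds=(4.0, 3.0, 0.0)):
    """Group workers into notification batches by average rating.

    The first batch holds workers at or above the first threshold,
    the next batch those at or above the second one, and so on.
    Empty batches are skipped.
    """
    batches = []
    remaining = list(workers)
    for threshold in thresholds:
        batch = [w for w in remaining if w["rating"] >= threshold]
        remaining = [w for w in remaining if w["rating"] < threshold]
        if batch:
            batches.append(batch)
    return batches

workers = [
    {"id": "a", "rating": 4.5},
    {"id": "b", "rating": 3.2},
    {"id": "c", "rating": 4.0},
    {"id": "d", "rating": 2.1},
]
batches = rating_batches(workers)
batch_ids = [[w["id"] for w in batch] for batch in batches]
```

Each batch would then go through the zone-based notification of the first approach before the next batch is tried.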
Thank you

Related

Parallel Solving with PDPTW in OptaPlanner

I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have PDPTW:
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them what time slots are available in a day (8am, 9am, 10am, etc). Each time slot has different score outcomes. Some are very efficient and some aren't bookable depending on the time/situation with increased drive times.
For performance, I don't want to try each of the hour times in sequence as it's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first before adding the customer's potential delivery window and then share that solved original state with all the different added delivery's time slots being solved independently.
Is there an intuitive way to do this? Eg:
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!
A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already assigned customers, then you are forgoing all optimization benefits. It's not NP-hard.
You can still use OptaPlanner <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just be the incremental score calculation, not the AI algorithms.
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO₂ emissions).
What about the response within one second?
At that point, you don't need to publish (= share info with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There are two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if it gets 5 customers in the 5 corners of the country/state, then it might still be infeasible. Factor in the average Euclidean distance between any 2 customer location pairs to influence the decision.
D) By running OptaPlanner until termination feasible=true, starting from a warm start of the previous schedule. If no such feasible solution is found within 1000ms, reject the time window proposal.
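A possible sketch of such a decision table, outside of OptaPlanner. It assumes customer locations are given as planar `(x, y)` coordinates in km, and the `max_avg_km` cutoff is a made-up placeholder that would need tuning:

```python
import itertools
import math

def accept_request(existing_stops, new_stop, max_stops=5, max_avg_km=30.0):
    """Relaxed feasibility check for one vehicle on one day.

    Rejects if the vehicle would exceed max_stops customers, or if the
    average Euclidean distance over all customer pairs is too large.
    """
    stops = list(existing_stops) + [new_stop]
    if len(stops) > max_stops:
        return False
    if len(stops) < 2:
        return True
    pairs = list(itertools.combinations(stops, 2))
    average = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return average <= max_avg_km

# Pair distances are 5, 10 and 5 km, so the average is about 6.7 km.
nearby = accept_request([(0.0, 0.0), (3.0, 4.0)], (6.0, 8.0))
too_spread = accept_request([(0.0, 0.0), (3.0, 4.0)], (6.0, 8.0), max_avg_km=5.0)
```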
Pitfall with D): if 2 requests come in at the same time, and you run them in parallel, so neither takes into account the other one, they could be feasible individually but infeasible together.

Algorithm for Scheduling One Appointment in Already Full Schedule

I'm building a calendar scheduling application for, let's say a plumbing company. The company has one or more plumbers, who each have a schedule of appointments at different times throughout the day. So Josh's schedule on May 30th might include a 30-minute appointment at 10 AM, a 45-minute appointment at 1 PM, and an hour-long appointment at 3 PM, while Maria has a completely different schedule that day. Now say a customer wants to book an appointment with this company, and my program has already calculated the time this new appointment will take. I want my program to return a list of possible appointment times for any plumber(s). Is there a standard algorithm for this type of problem?
I'd prefer language-agnostic, general steps just to be more helpful to anyone who might be in a similar situation with a different language, though I'm using PHP and PostgreSQL if there's a specific language feature suited to this.
Here's what I've tried so far:
Get all available shifts for every plumber on the requested day
Get all appointments already made on that day
Do a sort of boolean subtraction to cut the appointments out of the shifts, leaving gaps in each plumber's schedule
Get rid of all schedule gaps that are smaller than the requested appointment length (I also calculate drive times here so I know how far appointments need to be from one another)
Return those gaps to the customer, trimmed to the appointment length, as appointment possibilities
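The steps above could be sketched like this for a single plumber. Times are minutes since midnight, appointments are assumed to lie inside the shift, and the 8 AM to 6 PM shift is an invented example (drive times are left out):

```python
def free_gaps(shift, appointments, min_length):
    """Cut appointments out of a shift and keep gaps of at least min_length.

    `shift` and every appointment are (start, end) tuples in minutes
    since midnight.
    """
    gaps = []
    cursor, shift_end = shift
    for start, end in sorted(appointments):
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if shift_end - cursor >= min_length:
        gaps.append((cursor, shift_end))
    return gaps

# Josh: 10:00-10:30, 13:00-13:45 and 15:00-16:00 in an assumed 08:00-18:00 shift.
josh = [(600, 630), (780, 825), (900, 960)]
gaps_60 = free_gaps((480, 1080), josh, 60)
gaps_90 = free_gaps((480, 1080), josh, 90)
```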
I've learned that the problem with this approach is that it doesn't understand what to do with a gap much larger than the requested appointment. If you have a gap from 10 AM to 6 PM but you want an hour-long appointment, it will only suggest 10 AM to 11 AM. This approach doesn't allow for time-of-day choices, either. In that same scenario, what if the customer wants a morning appointment? Then it should only suggest 10-11 and 11-12. Or if they want an evening appointment, it should only suggest 5-6 PM. This approach also doesn't consider two plumbers working together. If we assume that two workers = half the time, then maybe the algorithm should look for the same 30 minutes available in both Josh and Maria's schedules along with 60-minute gaps in either plumber's schedule. Lastly, this algorithm feels very inefficient.
By the way, I've looked at several other questions here and around the Internet about how to solve similar situations, but I'm finding that most (if not all) of those questions involve optimizing a schedule. That might be valuable for other parts of this program, but for now, let's assume that the existing appointments are fixed and unchangeable. We're just looking to fit a new appointment into an existing schedule. I know this is possible because applications like Calendly have similar inputs and outputs.
In short, is there a better way of meeting these goals:
Suggest available gaps in one plumber's schedule given a time interval
If possible, only return appointment possibilities in the given time of day (morning = 4-12, afternoon = 12-5, evening = 5-10, night = 10-4, or any), and if not possible, continue with the algorithm as if no time of day had been specified
Suggest smaller gaps where n plumbers might do the job in 1/n time (there aren't that many plumbers, so setting a limit on this isn't necessary). This isn't as important as the other criteria, so if this isn't possible or would make the algorithm far more complex, then don't worry about it.
Split big appointment gaps into smaller gaps so we can suggest 4 hour-long gaps in between 10 AM and 2 PM. Obviously we can't suggest all possible hour-long segments of that gap because they'd be infinite
Thank you in advance.
There is no need for any sophisticated algorithm. There is only a small number of possible appointment times throughout a day, let's say every 30 minutes or so. Iterate over all possible times: 06:00, 06:30, 07:00, ... 20:00. For each time, check whether it matches the requirements; that check can return either a yes/no result or a number that says how good a match that time is. You end up with a list of possible appointment times; pick the best one or return all of them.
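A minimal sketch of that iteration with a yes/no check, for one plumber. Times are minutes since midnight; the optional `window` parameter (for a time-of-day preference) and the rule that an appointment must end by `day_end` are assumptions added for illustration:

```python
def candidate_slots(appointments, duration, day_start=6 * 60, day_end=20 * 60,
                    step=30, window=None):
    """Return every (start, end) slot that does not clash with an appointment.

    Start times are tried every `step` minutes. If `window` is given as
    (start, end), only slots lying fully inside it are kept.
    """
    slots = []
    for start in range(day_start, day_end - duration + 1, step):
        end = start + duration
        if window is not None and not (window[0] <= start and end <= window[1]):
            continue
        # The slot is free if it ends before or starts after every appointment.
        if all(end <= a_start or start >= a_end for a_start, a_end in appointments):
            slots.append((start, end))
    return slots

# Existing appointments: 10:00-10:30, 13:00-13:45, 15:00-16:00.
booked = [(600, 630), (780, 825), (900, 960)]
late_morning = candidate_slots(booked, duration=60, window=(600, 720))
```

Replacing the boolean check with a score per slot would allow ranking the results instead of only filtering them.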

Location and timeframe matching algorithm

I am developing a B2C service on demand and I need to create the matching algorithm, simple yet efficient enough to run on a Laravel backend.
The service matches house cleaners with customers, where the customer, when booking, defines a certain date and timeframe when (s)he will be available at home.
On the other hand (#1), cleaners don't have a working schedule and should be picked according to geographical proximity and, of course, availability (not to have two services starting at the same time or too close time-wise one from the other).
Obviously, there should be a geographical limit so that after a certain ε km the algorithm stops searching and, according to the type of service chosen, there is a specific duration.
Another solution (#2) could be to have a predefined weekly schedule and match schedules in order to define a "working calendar" as new requests come.
In any case the cleaner can opt not to accept a booking, in that case the algorithm will continue searching for another possible candidate.
Which algorithm would you suggest for solution #1 and #2?
Thank you

How to make a GPS transportation app?

I live in Split, Croatia, and a city bus company recently acquired a new piece of software, and what it does is this: if I am a passenger waiting for a bus at a bus station, there is a huge monitor on which I can see the bus code and the time it will take for it to get to my station. The problem is that in two years of having the software, not once have I seen that the time of arrival is even remotely accurate. I am aware that GPS data can be inaccurate but this.... And that makes me so frustrated that I decided that I will try to write a similar application for my final exam in CS in college. The problem is that I searched the web extensively in the last few days, and I cannot find good starting points. So my question is: have you ever been involved in such a project, and if so could you give me some pointers, be it tutorials or books regarding the subject? I appreciate any kind of input.
If I made a mistake regarding the question itself feel free to close it.
Thanks!
You will probably have:
Vehicle object containing each vehicle's position, assigned route, direction of travel on route, next scheduled stop, previous scheduled stop, and arrival time at previous scheduled stop
Array of routes which comprises a list of stops and a data structure holding historic transit times between stops on each route
Now, updates to a vehicle's location push to the vehicle's object.
When you want to update a display at a station, find all routes passing through that station and for each route display the estimated arrival time of the next vehicle on that route.
The estimated arrival time structure is at the heart of this. Seed it by assuming some distance between stops and an average travel speed.
Now, every time a vehicle arrives at a stop, calculate the real time it took to get there from its last stop and use this to update an average transit time binned by half-hour increments (or what have you), you could also bin by season and/or day of week. The purpose of the binning is to implicitly account for varying traffic congestion by time of day, day of week, and/or season. Assuming otherwise homogeneous conditions, you'll eventually converge on a decent estimate of the transit time between each station.
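One way this binned estimate might look, as a sketch only. The class name, the seed speed of 20 km/h and the half-hour bins are assumptions; binning by day of week or season would just extend the key:

```python
from collections import defaultdict

class TransitTimes:
    """Running average of stop-to-stop travel times, binned by time of day."""

    def __init__(self, seed_speed_kmh=20.0, bin_minutes=30):
        self.seed_speed_kmh = seed_speed_kmh
        self.bin_minutes = bin_minutes
        # (from_stop, to_stop, bin index) -> [observation count, mean minutes]
        self._stats = defaultdict(lambda: [0, 0.0])

    def _key(self, from_stop, to_stop, minute_of_day):
        return (from_stop, to_stop, minute_of_day // self.bin_minutes)

    def record(self, from_stop, to_stop, minute_of_day, observed_minutes):
        """Fold one observed travel time into the running mean of its bin."""
        stats = self._stats[self._key(from_stop, to_stop, minute_of_day)]
        stats[0] += 1
        stats[1] += (observed_minutes - stats[1]) / stats[0]

    def estimate(self, from_stop, to_stop, minute_of_day, distance_km):
        """Use the bin's mean if it has data, else the distance/speed seed."""
        key = self._key(from_stop, to_stop, minute_of_day)
        if key in self._stats:
            return self._stats[key][1]
        return distance_km / self.seed_speed_kmh * 60.0

times = TransitTimes()
seeded = times.estimate("A", "B", 8 * 60 + 10, distance_km=2.0)
times.record("A", "B", 8 * 60 + 10, 8.0)
times.record("A", "B", 8 * 60 + 20, 10.0)
learned = times.estimate("A", "B", 8 * 60 + 15, distance_km=2.0)
other_bin = times.estimate("A", "B", 9 * 60, distance_km=2.0)
```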
You may find it useful to employ a Kalman filter.
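For reference, a scalar Kalman filter for a single transit time could be sketched as below. It assumes the true travel time is roughly constant between observations, and the variance values are illustrative rather than tuned for real bus data:

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly changing value."""

    def __init__(self, initial_estimate, initial_variance,
                 process_variance, measurement_variance):
        self.x = initial_estimate        # current estimate (minutes)
        self.p = initial_variance        # uncertainty of the estimate
        self.q = process_variance        # how much the true value may drift
        self.r = measurement_variance    # noise of a single observation

    def update(self, measurement):
        # Predict: the value is assumed unchanged, but uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= (1.0 - gain)
        return self.x

kf = ScalarKalman(initial_estimate=6.0, initial_variance=4.0,
                  process_variance=0.0, measurement_variance=4.0)
after_first = kf.update(10.0)   # gain is 0.5, so the estimate moves halfway
after_second = kf.update(8.0)
```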
Estimates of travel times between more distant stations may be more accurate than travel times between adjacent stations, if you feel like looking into that. Higher-order Markov chains may also help describe the underlying statistics of transit times.
Just ideas.

How to manage transactions, debt, interest and penalty?

I am making a BI system for a bank-like institution. This system should manage credit contracts, invoices, payments, penalties and interest.
Now, I need to make a method that builds an invoice. I have to calculate how much the customer has to pay right now. He has a debt, which he has to pay for. He also has to pay for the interest. If he was ever late with due payment, penalties are applied for each day he's late.
I thought there were 2 ways of doing this:
By having only 1 original state - the contract's original state. And each time to compute the monthly payment which the customer has to make, consider the actual, made payments.
By constantly making intermediary states, going from the last intermediary state, and considering only the events that took place between the time of these 2 intermediary states. This means having a job that runs periodically (daily, monthly), takes the last saved state, applies the changes (due payments, actual payments, changes in global constants like the penalty rate, which is controlled by the Central Bank), and saves the resulting state.
The benefits of the first variant:
Always actual. If changes were made with a date from the past (a guy came with a paid invoice 5 days after he made the payment to the bank), they will be correctly reflected in the results.
The flaws of the first variant:
Takes long to compute
Documents printed with the current results may differ if the correct data changes due to operations entered with a back date.
The benefits of the second variant:
Works fast, and aggregated data is always available for search and reports.
Simpler to compute
The flaws of the second variant:
Vulnerable to failed jobs.
Errors in the past propagate until the end, to the final results.
An intermediary result cannot be changed if new data from past transactions arrives (it can, but it's hard, and with many implications, so I'd rather mark it as taboo)
Jobs cannot be performed successfully and without problems if an unfinished transaction exists (an issued invoice that wasn't yet paid)
Is there any other way? Can I combine the benefits from these two? Which one is used in other similar systems you've encountered? Please share any experience.
Problems of this nature are always more complicated than they first appear. This
is a consequence of what I like to call the Rumsfeldian problem of the unknown unknown.
Basically, whatever you do now, be prepared to make adjustments for arbitrary future rules.
This is a tough proposition. Some future possibilities that may have a significant impact on
your calculation model are back dated payments, adjustments and charges.
Forgiven interest periods may also become an issue (particularly if back dated). So may requirements
to provide various point-in-time (PIT) calculations based on either what was "known" at
that PIT (past view of the past) or taking into account transactions occurring after the reference PIT that
were back dated to a PIT before the reference (current view of the past). Calculations of this nature can be
a real pain in the head.
My advice would be to calculate from "scratch" (i.e. first variant). Implement optimizations (e.g. second variant) only
when necessary to meet performance constraints. Doing calculations from the beginning is a compute intensive
model but is generally more flexible with respect to accommodating unexpected left turns.
If performance is a problem but the frequency of complicating factors (e.g. back dated transactions)
is relatively low, you could explore a hybrid model employing the best of both variants. Here you store the
current state and calculate forward using only those transactions that posted since the last stored state
to create a new current state. If you hit a "complication", re-do the entire account from the beginning
to reestablish the current state.
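A toy sketch of that hybrid model. The `Txn` fields, the daily compounding interest and the day numbering (contract start is day 0, transactions take effect from day 1) are all simplifying assumptions, not how any particular banking system works:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    effective_day: int   # day the transaction counts for
    posted_day: int      # day it was entered into the system
    amount: float        # positive = charge, negative = payment

def replay(balance, txns, from_day, to_day, daily_rate):
    """Apply transactions and accrue interest for days from_day+1 .. to_day."""
    by_day = {}
    for t in txns:
        by_day.setdefault(t.effective_day, []).append(t)
    for day in range(from_day + 1, to_day + 1):
        balance += sum(t.amount for t in by_day.get(day, []))
        balance += balance * daily_rate
    return balance

def current_balance(opening, txns, today, daily_rate, snapshot=None):
    """Roll forward from a snapshot when safe, else recompute from scratch.

    `snapshot` is (day, balance). A transaction posted after the snapshot
    but effective on or before it is a "complication" forcing a full redo.
    """
    if snapshot is not None:
        snap_day, snap_balance = snapshot
        new = [t for t in txns if t.posted_day > snap_day]
        if all(t.effective_day > snap_day for t in new):
            return replay(snap_balance, new, snap_day, today, daily_rate)
    return replay(opening, txns, 0, today, daily_rate)

rate = 0.01
paid_on_time = Txn(effective_day=2, posted_day=2, amount=-100.0)
later_payment = Txn(effective_day=5, posted_day=5, amount=-50.0)
back_dated = Txn(effective_day=1, posted_day=5, amount=-200.0)

snapshot = (3, replay(1000.0, [paid_on_time], 0, 3, rate))
normal = current_balance(1000.0, [paid_on_time, later_payment], 6, rate, snapshot)
redone = current_balance(1000.0, [paid_on_time, back_dated], 6, rate, snapshot)
```

In the normal case the snapshot path gives the same result as a full recomputation; with the back dated payment the function falls back to replaying the whole account.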
Being able to accommodate the unexpected without triggering a re-write is probably more important in the long run
than shaving calculation time right now. Do not place restrictions on your computation model until you have to. Saving
current state often brings with it a number of built in assumptions and restrictions that reduce wiggle room for
accommodating future requirements.

Resources