Moving Probability of Occurrences based on Success Rate - probability

I want to implement a microservice that calculates the Success Rate of each Participant. This success rate is then used for evaluating their Probability of Occurrence, for instance, a Participant with higher success rate is more likely to get a task, and vice versa for the one with lowest success rate.
After a certain amount of time, the Probability of Occurrence will be re-evaluated based on Success Rates, and the process continues recursively.
What I have found so far:
For calculating Success Rates, I am thinking of using Exponentially Weighted Moving Average
In my scenario, the smoothing factor should be high (around 0.80-0.93).
What I am looking for:
A reliable approach for the calculation of Success Rates with slower decay
A reliable approach for evaluating the Probability of Occurrence and how tasks should be routed to each Participant.
For more clarity, I'll try to explain this with an example:
Suppose there are 5 agents working in a call center. Now a call is received. The success rate of each agent for handling the calls and solving the query successfully is calculated. And then the call is routed to the best available choice in the pool. But, I don't want a single agent to handle all the calls. The calls should be routed to every agent, but the probability of routing the call to a certain agent should be based on his success rate calculated from the history. All agents will also be given a chance to improve their ranking iteratively after a certain amount of time.
Please suggest me some good approaches that I can look into.

Related

Parallel Solving with PDPTW in OptaPlanner

I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have PDPTW:
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them what time slots are available in a day (8am, 9am, 10am, etc). Each time slot has different score outcomes. Some are very efficient and some aren't bookable depending on the time/situation with increased drive times.
For performance, I don't want to try each of the hour times in sequence as it's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first before adding the customer's potential delivery window and then share that solved original state with all the different added delivery's time slots being solved independently.
Is there an intuitive way to do this? Eg:
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!
A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already assigned customers, then you forgoing all optimization benefits. It's not NP-hard.
You can still use OptaPlanner <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just the the incremental score calculation, not the AI algoritms.
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO² emissions).
What about the response within one minute?
At that point, you don't need to publish (= share info with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There's two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if it gets 5 customers in the 5 corners of the country/state, then it might still be infeasible. Factor in the average eucledean distance between any 2 customer location pairs to influence the decision.
D) By running optaplanner until termination feasible=true, starting from a warm start of the previous schedule. If no such feasible solution is found within 1000ms, reject the time window proposal.
Pitfall with D): if 2 requests come in at the same time, and you run them in parallel, so neither takes into account the other one, they could be feasible individually but infeasible together.

Micro-job platform - matching orders with nearest worker

I am currently planning to develop a micro-job platform to match maintenance workers (plumbers, electricians, carpenter, etc) with users in need through geolocation. Bookings can be made well in advance or last minute (minimum 1h before) at the same fare.
Now, I am struggling to find a suited matching algorithm to efficiently assign bookings to available workers. The matcher is basically a listener catching "NewBooking" events, which are fired on a regular basis until a worker is assigned to the specific booking. Workers can accept or decline the job and can choose working hours with a simple toggle button (when it's off they will not receive any request).
On overall the order is assigned within a certain km range.
The first system I thought of is based on concentric zones, whose radius is incremented every time the event is fired (not indefinitely). All workers online within the area will be notified and the first to accept gets the job.
Pros:
more opportunities to match last minute bookings;
Cons:
workers may get a lot of notifications;
the backend processing several push & mail messages;
A second solution is based on linear distance, assigning the work to the nearest available worker and, if (s)he does not accept it within a certain timeframe (like 30'), the algorithm goes to the next available person and so on.
Pros:
less processing power;
scalability with lots of workers and requests;
Cons:
less chances to match last minute orders;
Third alternative is to use the first approach sending orders in multiple batches according to feedback ratings; the first group to receive the notification is made out of those with 4+ stars, then 3+ avg. of votes and so on.
I was wondering if there is a best practice when it comes to this kind of matching algorithms since even taxi apps face these issues.
Anyway, which approach would you suggest (If any), or do you have any proposal on possible improvements?
Thank you

How could time series prediction be applied in a real-time setting where it is infeasible to track every single item?

The problem: Find the estimated lifetime of an object (e.g. the time it is written next) or the corresponding PDF. This is known as a renewal process.
Constraint: it is infeasible to track metadata for every single object
Assumptions: Prediction inaccuracies for low-volume objects are permissible but inaccuracy should decrease with increasing popularity of an object
Do you have any ideas how these predictions could be achieved, possibly through the use of sketch data structures (Bloom filters, Count-min sketches, etc.) or forms of sampling (e.g. exponentially-biased reservoir sampling)? Would assuming a specific stochastic process (e.g. a Poisson process) make the problem easier to solve?
A colorful example of this problem would be: estimate when a user will next visit your site / click something, without being able to track the history of every user.

How to make a GPS transportation app?

I live in Split,Croatia, and a city bus company recently accuired a new piece of software, and what it does is this: if I am a passenger and am awaiting a bus on a bus station, there is a huge monitor on which I can see the bus code and the time it will take for him to get to my station. The problem is, that in two years of having the software, not once have I seen that the time of the arrival is even remotley accurate. I am aware that GPS data can be inaccurate but this.... And that makes me so frustrated, that I decided that I will try to write a similar application for my final exam in CS in college. The problem is that I searched the web extensively in the last few days, and I cannot find good starting points. So my question is: have you ever been involved in such a project, and if so could you give me some pointers be it tutorials, or books regarding the subject? I appreciate any kind of input.
If I made a mistake regarding the question itself feel free to close it.
Thanks!
You will probably have:
Vehicle object containing each vehicles position, assigned route, direction of travel on route, and next scheduled stop, previous scheduled stop, arrival time at previous scheduled stop
Array of routes which comprises a list of stops and a data structure holding historic transit times between stops on each route
Now, updates to a vehicle's location push to the vehicle's object.
When you want to update a display at a station, find all routes passing through that station and for each route display the estimated arrival time of the next vehicle on that route.
The estimated arrival time structure is at the heart of this. Seed it by assuming some distance between stops and an average travel speed.
Now, every time a vehicle arrives at a stop, calculate the real time it took to get there from its last stop and use this to update an average transit time binned by half-hour increments (or what have you), you could also bin by season and/or day of week. The purpose of the binning is to implicitly account for varying traffic congestion by time of day, day of week, and/or season. Assuming otherwise homogeneous conditions, you'll eventually converge on a decent estimate of the transit time between each station.
You may find it useful to employ a Kalman filter.
Estimates of travel times between more distant stations may be more accurate than travel times between adjacent stations, if you feel like looking into that. Higher-order Markov chains may also help describe the underlying statistics of transit times.
Just ideas.

Strategy to find your best route via Public Transportation only?

Finding routes for a car is pretty easy: you store a weighted graph of all the roads and you could use Djikstra's algorithm [1]. A bus route is less obvious. With a bus you have to represent things like "wait 10 minutes for the next bus" or "walk one block to another bus stop" and feed those into your pathfinding algorithm.
It's not even always simple for cars. In some cities, some roads are one-way-only into the city in the morning, and one-way-only out of the city in the evening. Some advanced GPSs know how to avoid busy routes during rush hour.
How would you efficiently represent this kind of time-dependent graph and find a route? There is no need for a provably optimal solution; if the traveler wanted to be on time, they would buy a car. ;-)
[1] A wonderful algorithm to mention in an example because everyone's heard of it, though A* is a likelier choice for this application.
I have been involved in development of one journy planner system for Stockholm Public Transportation in Sweden. It was based on Djikstra's algorithm but with termination before every node was visited in the system. Today when there are reliable coordinates available for each stop, I guess the A* algorithm would be the choise.
Data about upcoming trafic was extracted from several databases regularly and compiled into large tables loaded into memory of our search server cluster.
One key to a sucessfull algorith was using a path cost function based on travel and waiting time multiplied by diffrent weightes. Known in Swedish as “kresu"-time these weighted times reflect the fact that, for example, one minute’s waiting time is typically equivalent in “inconvenience” to two minutes of travelling time.
KRESU Weight table
x1 - Travel time
x2 - Walking between stops
x2 - Waiting at a stop
during the journey. Stops under roof,
with shops, etc can get a slightly
lower weight and crowded stations a
higher to tune the algorithm.
The weight for the waiting time at the first stop is a function of trafic intensity and can be between 0.5 to 3.
Data structure
Area
A named area where you journey can start or end. A Bus Stop could be an area with two Stops. A larger Station with several platforms could be one area with one stop for each platform.
Data: Name, Stops in area
Stops
An array with all bus stops, train and underground stations. Note that you usually need two stops, one for each direction, because it takes some time to cross the street or walk to the other platform.
Data: Name, Links, Nodes
Links
A list with other stops you can reach by walking from this stop.
Data: Other Stop, Time to walk to other Stop
Lines/Tours
You have a number on the bus and a destination. The bus starts at one stop and passes several stops on its way to the destination.
Data: Number, Name, Destination
Nodes
Usually you have a timetable with the least the time for when it should be at the first and last stop in a Tour. Each time a bus/train passes a stop you add a new node to the array. This table can have millions of values per day.
Data: Line/Tour, Stop, Arrival Time, Departure Time, Error margin, Next Node in Tour
Search
Array with the same size as the Nodes array used to store how you got there and the path cost.
Data: Back-link with Previous Node (not set if Node is unvisited), Path Cost (infinit for unvisited)
What you're talking about is more complicated than something like the mathematical models that can be described with simple data structures like graphs and with "simple" algorithms like Djikstra's. What you are asking for is a more complex problem like those encountered in the world of automated logistics management.
One way to think about it is that you are asking a multi-dimensional problem, you need to be able to calculate:
Distance optimization
Time optimization
Route optimization
"Time horizon" optimization (if it's 5:25 and the bus only shows up at 7:00, pick another route.)
Given all of these circumstances you can attempt to do deterministic modeling using complex multi-layered data structures. For example, you could still use a weighted di-graph to represent the existing potential routes, wherein each node also contained a finite state automata which added a weight bias to a route depending on time values (so by crossing a node at 5:25 you get a different value than if your simulation crossed it at 7:00.)
However, I think that at this point you are going to find yourself with a simulation that is more and more complex, which most likely does not provide "great" approximation of optimal routes when the advice is transfered into the real world. It turns out that software and mathematical modeling and simulation is at best a weak tool when encountering real world chaotic behaviors and dynamism.
My suggestion would go to use an alternate strategy. I would attempt to use a genetic algorithm in which the DNA for an individual calculated a potential route, I would then create a fitness function which encoded costs and weights in a more "easy to maintain" lookup table fashion. Then I would let the Genetic Algorithm attempt to converge on a near optimal solution for a public transport route finder. On modern computers a GA such as this is probably going to perform reasonably well, and it should be at least relatively robust to real world dynamism.
I think that most systems that do this sort of thing take the "easy way out" and simply do something like an A* search algorithm, or something similar to a greedy costed weighted digraph walk. The thing to remember is that the users of the public transport don't themselves know what the optimal route would be, so a 90% optimal solution is still going to be a great solution for the average case.
Some data points to be aware of from the public transportation arena:
Each transfer incurs a 10 minute penalty (unless it is a timed transfer) in the riders mind. That is to say mentally a trip involving a single bus that takes 40 minutes is roughly equivalent to a 30minute trip that requires a transfer.
Maximum distance that most people are willing to walk to a bus stop is 1/4 mile. Train station / Light rail about 1/2 mile.
Distance is irrelevant to the public transportation rider. (Only time is important)
Frequency matters (if a connection is missed how long until the next bus). Riders will prefer more frequent service options if the alternative is being stranded for an hour for the next express.
Rail has a higher preference than bus ( more confidence that the train will come and be going in the right direction)
Having to pay a new fare is a big hit. (add about a 15-20min penalty)
Total trip time matters as well (with above penalties)
How seamless is the connect? Does the rider have to exist a train station cross a busy street? Or is it just step off a train and walk 4 steps to a bus?
Crossing busy streets -- another big penalty on transfers -- may miss connection because can't get across street fast enough.
if the cost of each leg of the trip is measured in time, then the only complication is factoring in the schedule - which just changes the cost at each node to a function of the current time t, where t is just the total trip time so far (assuming schedules are normalized to start at t=0).
so instead of Node A having a cost of 10 minutes, it has a cost of f(t) defined as:
t1 = nextScheduledStop(t); //to get the next stop time at or after time t
baseTime for leg = 10 //for example, a 10-minute trip
return (t1-t)+baseTime
wait-time is thus included dynamically in the cost of each leg, and walks between bus stops are just arcs with a constant time cost
with this representation you should be able to apply A* or Dijkstra's algorithm directly
Finding routes for a car is pretty
easy: you store a weighted graph of
all the roads and you could use
Djikstra's algorithm. A bus route
is less obvious.
It may be less obvious, but the reality is that it's merely another dimension to the car problem, with the addition of infinite cost calculation.
For instance, you mark the buses whose time is past as having infinite cost - they then aren't included in the calculation.
You then get to decide how to weight each aspect.
Transit Time might get weighted by 1
Waiting time might get weighted by 1
Transfers might get weighted by 0.5 (since I'd rather get there sooner and have an extra transfer)
Then you calculate all the routes in the graph using any usual cost algorithm with the addition of infinite cost:
Each time you move along an edge you have to keep track of 'current' time (add up the transit time) and if you arrive at a vector you have to assign infinite cost to any buses that are prior to your current time. The current time is incremented by the waiting time at that vector until the next bus leaves, then you're free to move along another edge and find the new cost.
In other words, there's a new constraint, "current time" which is the time of the first bus starting, summed with all the transit and waiting times of buses and stops traveled.
It complicates the algorithm only a little bit, but the algorithm is still the same. You can see that most algorithms can be applied to this, some might require multiple passes, and a few won't work because you can't add the time-->infinite cost calculation inline. But most should work just fine.
You can simplify it further by simply assuming that the buses are on a schedule, and there's ALWAYS another bus, but it increases the waiting time. Do the algorithm only adding up the transit costs, then go through the tree again and add waiting costs depending on when the next bus is coming. It will sometimes result in less efficient versions, but the total graph of even a large city is actually pretty small, so it's not really an issue. In most cases one or two routes will be the obvious winners.
Google has this, but also includes additional edges for walking from one bus stop to another so you might find a slightly more optimal route if you're willing to walk in cities with large bus systems.
-Adam
The way I think of this problem is that ultimately you are trying to optimize your average speed from your starting point to your ending point. In particular, you don't care at all about total distance traveled if going [well] out of your way saves time. So, a basic part of the solution space is going to need to be identifying efficient routes available that cover non-trivial parts of the total distance at relatively high speeds between start and finish.
To your original point, the typical automotive route algorithms used by GPS navigation units to make the trip by car is a good bound for a target optimal total time and optimal route evaluations. In other words, your bus based trip would be doing really good to approach a car based solution. Clearly, the bus route based system is going to have many more constraints than the car based solutions, but having the car solution as a reference (time and distance) gives the bus algorithm a framework to optimize against*. So, put loosely, you want to morph the car solution towards the set of possible bus solutions in an iterative fashion or perhaps more likely take possible bus solutions and score them against your car based solution to know if you are doing "good" or not.
Making this somewhat more concrete, for a specific departure time there are only going to be a limited number of buses available within any reasonable period of time that can cover a significant percentage of your total distance. Based on the straight automotive analysis reasonable period of time and significant percentage of distance become quantifiable using some mildly subjective metrics. Certainly, it becomes easier to score each possibility relative to the other in a more absolute sense.
Once you have a set of possible major segment(s) available as possible answers within the solution, you then need to hook them together with other possible walking and waiting paths....or if sufficiently far apart recursive selection of additional short bus runs. Intuitively, it doesn't seem that there is really going to be a prohibitive set of choices here because of the Constraints Paradox (see footnote below). Even if you can't brute force all possible combinations from there, what remains should be able to be optimized using a simulated annealing (SA) type algorithm. A Monte Carlo method would be another option.
The way we've broken the problem down to this point leaves us something that is quite analogous to how SA algorithms are applied to the automated layout and routing of ASIC chips, FPGA's and also the placement and routing of printed circuit boards of which there is quite a bit of published work on optimizing that type of problem form.
* Note: I usually refer to this as "The Constraints Paradox" - my term. While people can naturally think of more constrained problems as harder to solve, the constraints reduce choices and less choices means easier to brute force. When you can brute force, then even the optimal solution is available.
Basically, a node in your graph should not only represent a location, but also the earliest time you can get there. You can think of it as graph exploration in the (place,time) space. Additionally, if you have (place, t1) and (place,t2) where t1<t2, discard (place,t2).
Theoretically, this will get the earliest arrival time for all possible destinations from your starting node. In practice, you need some heuristic to prune roads that take you too far away from your destination.
You also need some heuristic to consider the promising routes before the less promising ones - if a route leads away from your destination, it is less likely (but not totally unlikely) to be good.
I think Your problem is more complicated than You expect. Recent COST action is focused on solving this problem: http://www.cost.esf.org/domains_actions/tud/Actions/TU1004 : "Modelling Public Transport Passenger Flows in the Era of Intelligent Transport Systems".
From my point of view regular SPS algorithms are not suitable for this. You have dynamic network state, where certain options to travel forward are incotinuous (route is always "opened" for car, bike, pedestrain, while transit connection is available only at certain dwell time).
I think new polycriterial (time, reliability, cost, comfort, and more criteria) approach is desired here. It needs to be computed real-time to 1) publish information to end user within short time 2) be able to adjust path in real-time (based on real-time traffic conditions - from ITS).
I'm about to think about this problem for the next several months (maybe even throughout a PhD thesis).
Regards
Rafal
I dont think there is any other special data structure that would cater for these specific needs but you can still use the normal data structures like a linked list and then make route calculations per given factor-you are going to need some kind of input into your app of the variables that affect the result and then make calculations accordingly i.e depending on the input.
As for the waiting and stuff, these are factors that are associated with a particular node right? You can translate this factor into a route node for each of the branches attached to the node. For example you can say for every branch from Node X, if there is a wait for say m minutes on Node X, then scale up the weight of the branch by
[m/Some base value*100]% (just an example). In this way, you have factored in the other factors uniformly but at the same time maintaining a simple representation of the problem you want to solve.
If I was tackling this problem, I'd probably start with an annotated graph. Each node on the graph would represent every intersection in the city, whether or not the public transit system stops there - this helps account for the need to walk, etc. On intersections with transit service, you annotate these with stop labels - the labels allowing you to lookup the service schedule for the stop.
Then you have a choice to make. Do you need the best possible route, or merely a route? Are you displaying the routes in real time, or can solutions be calculated and cached?
If you need "real time" calculation, you'll probably want to go with a greedy algorithm of sorts, I think an A* algorithm would probably fit this problem domain fairly nicely.
If you need optimal solutions, you should look at dynamic programming solutions to the graph... optimal solutions will likely take much longer to calculate, but you only need to find them once, then they can be cached. Perhaps your A* algorithm could use pre-calculated optimal paths to inform its decisions about "similar" routes.
A horribly inefficient way that might work would be to store a copy of each intersection in the city for each minute of the day. A bus route from Elm St. and 2nd to Main St. and 25th would be represented as, say,
elm_st_and_2nd[12][30].edges :
elm_st_and_1st[12][35] # 5 minute walk to the next intersection
time = 5 minutes
transport = foot
main_st_and_25th[1][15] # 40 minute bus ride
time = 40 minutes
transport = bus
elm_st_and_1st[12][36] # stay in one place for one minute
time = 1 minute
transport = stand still
Run your favorite pathfinding algorithm on this graph and pray for a good virtual memory implementation.
You're answering the question yourself. Using A* or Dijkstra's algorithm, all you need to do is decide on a good cost per part of each route.
For the bus route, you're implying that you don't want the shortest, but the fastest route. The cost of each part of the route must therefore include the average travel speed of a bus in that part, and any waits at bus stops.
The algorithm for finding the most suitable route is then still the same as before. With A*, all the magic happens in the cost function...
You need to weight the legs differently. For example - on a rainy day I think someone might prefer to travel longer in a vehicle than walk in the rain. Additionally, someone who detests walking or is unable to walk might make a different/longer trip than someone who would not mind walking.
These edges are costs, but I think you can expand the notion/concept of costs and they can have different relative values.
The algorithm remains the same, you just increase the weight of each graph edge according to different scenarios (Bus schedules etc).
I put together a subway route finder as an exercise in graph path finding some time ago:
http://gfilter.net/code/pathfinderDemo.aspx

Resources