Minimize transaction costs for settling debts in a pool - algorithm

Suppose a group of friends shared some expenses across a month, and need to settle debts at the end. Each friend has a certain amount of money they should give/receive (the sum of debts and receivable amounts is zero), but everything must be settled with direct transfers (no central pool, only money from one person to another), and each transfer has a cost.
For example, suppose 3 people, A, B and C.
Person A must pay $100
Person B must receive $50
Person C must receive $50
The cost of each transaction can be described by the following matrix (the person in the row paying to the person in the column).
Given these costs, the optimal solution would be
Person A transfers $100 to Person B
Person B transfers $50 to Person C
This settles all debts with a transaction cost of 2. How can I generalize this?
All I could find when searching was about simplifying chains of debts (person A owes person B, who owes person C, so person A owes person C).
The closest I found is this, but it doesn't account for transaction costs.
Backstory (if anyone is interested):
We live in a house with 8 people, and each month we pay bills from our own money and record them in a spreadsheet so that at the end of the month we can share the expenses fairly. However, we have accounts at different banks, and some of them charge fees for transfers to other banks, so we prefer to keep transactions within the same bank.

I found another, simpler solution. We're still talking about transfer costs which are proportional to the transferred amount. You can build a simple graph with just as many nodes as people and run the network simplex algorithm. Python example:
import networkx as nx
G = nx.DiGraph()
G.add_node('A', demand=-100) # A has to send 100
G.add_node('B', demand=50) # B will receive 50
G.add_node('C', demand=50) # C will receive 50
G.add_edge('A', 'B', weight=1)
G.add_edge('A', 'C', weight=5)
G.add_edge('B', 'A', weight=1)
G.add_edge('B', 'C', weight=1)
G.add_edge('C', 'A', weight=2)
G.add_edge('C', 'B', weight=2)
print(nx.network_simplex(G))
outputs (150, {'A': {'C': 0, 'B': 100}, 'C': {'A': 0, 'B': 0}, 'B': {'A': 0, 'C': 50}})

In case the bank charges a percentage of the transferred amount, your task is to find a min-cost max flow.
Your graph should have 3 layers.
(direction)
  |   Source layer:          S
  |                        / | \
  |   Exchange:        A --- B --- C    (complete graph)
  |                        \ | /
  V   Sink:                  T
The source is connected to the nodes A, B, C, ... The capacity of S -> A is how much A has to pay, and 0 if A does not owe money. The cost of the edge is 0.
In the exchange layer A, B, C... are all connected to each other (complete graph).
The capacity of A -> B is infinite, and the cost is how much you have to pay for transferring $1 from A to B (and likewise for every other pair).
The nodes are connected to the sink. The capacity of A -> Sink is how much A will receive, and 0 if A does not receive money. The cost of the edge is 0.
Run a min-cost max-flow algorithm (for example, Edmonds-Karp plus cycle canceling) on the above graph from the source to the sink. You are probably better off finding a library (such as the Boost Graph Library for C++) instead of implementing the algorithms yourself.
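If you happen to be in Python, here is a minimal sketch of this construction with networkx; the amounts and the per-dollar costs simply reuse the numbers from the network-simplex example above, so treat them as illustrative data:

import networkx as nx

pays     = {'A': 100}             # S -> payer capacities: how much each payer owes
receives = {'B': 50, 'C': 50}     # receiver -> T capacities: how much each is owed
cost = {('A', 'B'): 1, ('A', 'C'): 5, ('B', 'A'): 1,
        ('B', 'C'): 1, ('C', 'A'): 2, ('C', 'B'): 2}

G = nx.DiGraph()
for person, amount in pays.items():
    G.add_edge('S', person, capacity=amount, weight=0)
for person, amount in receives.items():
    G.add_edge(person, 'T', capacity=amount, weight=0)
for (u, v), w in cost.items():
    G.add_edge(u, v, weight=w)    # no capacity attribute means unbounded capacity

flow = nx.max_flow_min_cost(G, 'S', 'T')   # maximum flow of minimum cost
print(flow)
print(nx.cost_of_flow(G, flow))            # 150 with these numbers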

After @j_random_hacker explained in the comments that this problem is NP-hard, I lost all hope of making a pretty solution and set out to make one that works.
Following @user3386109's suggestion, also from the comments, I based my solution on minimum-cost paths. I began by using the Floyd-Warshall algorithm to find the minimum cost of getting money from one person to another, for every pair. This yields a matrix of the minimum cost of transferring money from the person in the row to the person in the column. I also applied the modification described in the Path Reconstruction section of the Wikipedia article to obtain a next-hop matrix, which gives the next node (the person you must actually send your money to if you want to reach the person in the column at minimal cost) for the person in the row.
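A minimal sketch of that step (illustrative, not the code from the repository), assuming cost is an n x n matrix of per-transfer costs:

INF = float('inf')

def floyd_warshall_with_next_hop(cost):
    # cost[i][j]: cost of a direct transfer from i to j (INF if impossible)
    n = len(cost)
    dist = [row[:] for row in cost]
    nxt = [[j if cost[i][j] < INF else None for j in range(n)] for i in range(n)]
    for i in range(n):
        dist[i][i] = 0
        nxt[i][i] = i
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]      # first hop of the cheaper route
    return dist, nxt

def path(nxt, i, j):
    # Reconstruct the chain of people money passes through from i to j.
    if nxt[i][j] is None:
        return []
    p = [i]
    while i != j:
        i = nxt[i][j]
        p.append(i)
    return p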
Example of the initialized matrices and the result after running the algorithm (shown as images in the original post, with the changed elements marked with a red dot).
I then decided to make a simple branch-and-bound recursion. Every recursion call receives a list of debts, a matrix of all transactions made so far and the cost to reach this state, and returns the additional cost needed to settle the remaining debts from that state. We keep a global for the best solution found so far, and at the beginning of each call we check whether the cost to reach this state is already worse than the global best; if it is, we return an "infinite" cost to signal that this branch does not need to be considered.
Then we select someone who owes money in the debt list, and for each person who must receive money we create copies of the debt list and the transaction matrix and simulate transferring the maximum possible amount between these two people (if A must pay 100 but C only needs to receive 50, the maximum is 50). The transaction matrix is modified by increasing every transaction along the minimum-cost path between these two people by the amount transferred, and the overall cost is incremented (by that transfer's cost) whenever we turn a previously-zero element into a non-zero one. We then recurse. If all debts reach 0, a solution was found, the global minimum cost is updated, and the cost returned is 0.
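A compact sketch of that recursion (illustrative, not the code from the repository); cost is the original per-transfer cost matrix and nxt the next-hop matrix from the earlier Floyd-Warshall sketch:

import copy

INF = float('inf')
best_cost = INF

def settle(debts, transactions, cost_so_far, cost, nxt):
    # debts[i] > 0: i still has to pay; debts[i] < 0: i is still owed money
    global best_cost
    if cost_so_far >= best_cost:
        return INF                          # bound: this branch cannot improve the best solution
    if all(d == 0 for d in debts):
        best_cost = cost_so_far             # everything settled: record the solution
        return 0
    payer = next(i for i, d in enumerate(debts) if d > 0)
    best = INF
    for receiver, d in enumerate(debts):
        if d >= 0:
            continue
        amount = min(debts[payer], -d)
        new_debts = debts[:]
        new_debts[payer] -= amount
        new_debts[receiver] += amount
        new_tx = copy.deepcopy(transactions)
        extra = 0
        i = payer
        while i != receiver:                # push the money along the min-cost path
            j = nxt[i][receiver]
            if new_tx[i][j] == 0:
                extra += cost[i][j]         # a brand-new transfer adds its cost
            new_tx[i][j] += amount
            i = j
        best = min(best, extra + settle(new_debts, new_tx, cost_so_far + extra, cost, nxt))
    return best

# initial call, e.g.:
# n = len(debts); settle(debts, [[0] * n for _ in range(n)], 0, cost, nxt)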
In a previous version, I spawned a recursion for each pair of owing/receiving people. This yielded horrible performance and proved unnecessary, since the order of the transactions doesn't matter and any debts that haven't been settled yet will be treated at lower levels of the recursion.
This algorithm seemed correct, but there was still a problem!
In the following example:
Person A must pay $40
Person B must pay $40
Person C must pay $20
Person D must receive $100
The algorithm as it is now makes A, B and C each make a transfer to D. The actual best way would be to gather all the money at one of A, B or C and have that person transfer it to D in a single payment.
In this example, persons A, B and C all have the same extremely high cost to transfer to D, and the next hop for each of them to get money to D is D itself. However, the optimal solution is to have everyone transfer their money to a single person, who then transfers it all to D in one go. The algorithm fails to recognize that, once someone has already made a transfer to person D, sending more money through that same transfer costs nothing extra, so we should prefer this new path.
To address this issue, I included a cost matrix and a path matrix in the recursion: at the beginning of each call, we set the cost of every transfer that has already been made in this recursion branch to 0 and run the Floyd-Warshall algorithm again. The recursion uses these matrices instead of the global ones and passes them on. Sure, it multiplies the complexity by V^3, but it's the only way I found to solve this issue.
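That fix could be sketched like this, reusing floyd_warshall_with_next_hop from the earlier sketch (again illustrative, not the repository code):

def recompute_paths(cost, transactions):
    # Zero out the cost of edges that already carry a transfer in this branch,
    # then recompute the shortest paths and next hops.
    n = len(cost)
    adjusted = [row[:] for row in cost]
    for i in range(n):
        for j in range(n):
            if transactions[i][j] > 0:
                adjusted[i][j] = 0     # reusing an existing transfer is free
    return floyd_warshall_with_next_hop(adjusted)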
The algorithm seems to be working now, but I will probably keep trying to improve it, especially in code readability. The complete code is available at:
My gitlab project, inside the calculations folder
Sorry for the long answer and the delay in posting it, but I found it important to document my work thoroughly.

Related

Football Guaranteed Relegation/Promotion Algorithm

I'm wondering if there is a way to speed up the calculation of guaranteed promotion in football (soccer) for a given football league table. It seems like there is a lot of structure to the problem so the exhaustive solution is perhaps slower than necessary.
In the problem there is a schedule (a list of pairs of teams that will play each other in the future) and a table (a map) of the points each team has earned in past games. Each future game can be won, lost or tied; teams earn 3 points for a win and 1 for a tie. Points (Pts) are what ultimately matters for promotion, and num_of_promoted_teams teams (a positive integer, usually around 1-3) are promoted at the end of each season.
The problem is to determine which (if any) teams are currently guaranteed promotion, where guaranteed promotion means that no matter the outcome of the remaining games, the team must end up promoted.
def promoted(num_of_promoted_teams, table, schedule):
    return guaranteed_promotions
I've been thinking about using depth-first search (over the future game results) to eliminate teams, which would lower the average-case but not the worst-case running time. This certainly helps early in the season, but the problem could become large in mid-season before shrinking again near the end. It seems like there might be a better way.
A constraint solver should be fast enough in practice thanks to clever pruning algorithms, and hard to screw up. Here’s some sample code with the OR-Tools CP-SAT solver.
from ortools.sat.python import cp_model

def promoted(num_promoted_teams, table, schedule):
    for candidate in table.keys():
        model = cp_model.CpModel()
        final_table = table.copy()
        for home, away in schedule:
            home_win = model.NewBoolVar("")
            draw = model.NewBoolVar("")
            away_win = model.NewBoolVar("")
            model.AddBoolOr([home_win, draw, away_win])
            model.AddBoolOr([home_win.Not(), draw.Not()])
            model.AddBoolOr([home_win.Not(), away_win.Not()])
            model.AddBoolOr([draw.Not(), away_win.Not()])
            final_table[home] += 3 * home_win + draw
            final_table[away] += draw + 3 * away_win
        candidate_points = final_table[candidate]
        num_not_behind = 0
        for team, team_points in final_table.items():
            if team == candidate:
                continue
            is_behind = model.NewBoolVar("")
            model.Add(team_points < candidate_points).OnlyEnforceIf(is_behind)
            model.Add(candidate_points <= team_points).OnlyEnforceIf(is_behind.Not())
            num_not_behind += is_behind.Not()
        model.Add(num_promoted_teams <= num_not_behind)
        solver = cp_model.CpSolver()
        status = solver.Solve(model)
        if status == cp_model.INFEASIBLE:
            yield candidate

print(*promoted(2, {"A": 10, "B": 8, "C": 8}, [("B", "C")]))
Here’s an alternative solution that is less extensible and probably slower in exchange for being self-contained and predictable.
This solution consists of an algorithm to test whether a particular team can finish behind a particular set of other teams (assuming unfavorable tie-breaking), wrapped in a loop over pairs consisting of a top-k team ℓ and a set of k teams W that might or might not finish ahead of ℓ (where k is the number of promoted teams).
If there were no draws, then we could use bipartite matching. Mark ℓ as having lost its remaining matches and mark W as having won their matches against teams not in W. On one side of the bipartite graph, there are nodes corresponding to matches between members of W. On the other side, there are zero or more nodes for each team in W, corresponding to the number of matches that that team must win to pull ahead of ℓ. If there is a matching that completely matches the latter side, then W can finish collectively in front of ℓ, and ℓ is not guaranteed promotion.
This could be extended easily if wins were 2 points instead of 3, but alas, 3 points causes the problem not to be convex, and we’re going to need some branching. The simplest branching method depends on the observation that it’s better for two teams to each win once and lose once against each other than draw twice. Hence, loop over all subsets of at most k choose 2 pairs of teams and run the algorithm above after marking each pair in the subset as having drawn once.
(I could propose improvements, but k is small, computers are cheap, programmers are expensive, and sports fans are relentless.)
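Here is a rough sketch of the matching test with networkx's Hopcroft-Karp implementation, ignoring the draw branching and assuming need[w] (the number of wins w still has to take from matches inside W to pull ahead of ℓ) has already been computed from the points table; matches_in_W is the list of remaining matches between members of W, and all names are illustrative:

import networkx as nx
from networkx.algorithms import bipartite

def W_can_finish_ahead(matches_in_W, need):
    # matches_in_W: remaining matches between members of W, as (team1, team2) pairs
    # need[w]: extra wins w still needs, from matches inside W, to pull ahead of ℓ
    G = nx.Graph()
    match_nodes = [('match', idx) for idx in range(len(matches_in_W))]
    G.add_nodes_from(match_nodes)
    for team, k in need.items():
        for slot in range(k):
            G.add_node(('win', team, slot))            # one node per needed win
    for idx, (t1, t2) in enumerate(matches_in_W):
        for team in (t1, t2):
            for slot in range(need.get(team, 0)):
                G.add_edge(('match', idx), ('win', team, slot))
    win_nodes = [n for n in G.nodes if n[0] == 'win']
    if not win_nodes:
        return True                                     # nobody needs extra wins
    matching = bipartite.hopcroft_karp_matching(G, top_nodes=set(match_nodes))
    return all(w in matching for w in win_nodes)        # every needed win can be supplied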

Cannot find a solution for a budget expenditure maximisation problem

I am trying to solve a DP problem which consists of the following:
Let's say that we are a town hall with a budget B and have a set of projects P, each one with a cost p_i to build. Since the town hall's budget leftovers at the end of the year will not carry over to the next year, they want to maximize the expenditure as much as possible.
Due to this, they want to find the maximum possible cost they can have by approving projects from P, without surpassing the maximum budget B.
So far, I have established the following recurrence relation/equation:
maxCost(B, i) =
    if (B >= p_i)
        // Since the budget allows it, choose the maximum between approving this
        // project and evaluating the previous one with a smaller budget, or not
        // approving it and evaluating the previous one with the same budget
        max{ maxCost(B - p_i, i - 1) + p_i, maxCost(B, i - 1) }
    else
        // Since the budget does not allow approving the current project, check the
        // previous one with the same budget
        maxCost(B, i - 1)
Additionally, maxCost(B, i) = 0 if i > length(P) || i < 1 (or i < 0, if 0-indexed).
The memoization structure can be a cost array C of size B (ignoring memory issues if B is enormous or non-integer), which we always fill from the right (we always consult C for the evaluation of the previous project, at values smaller than the current budget, and write into C[B]). The initial values of C are all 0.
The problem appears when writing the algorithm itself, which translates to replacing maxCost(B, i) with C[B], keeping and updating B as a variable depending on which choice is made (approve the project or pass on to the next one), and iterating over the projects from the first one to the last one.
If the algorithm is done like that, any mock example will show that the solution is not optimal (it behaves more like a greedy algorithm), since the budget is subtracted unconditionally from the start and later projects will not be able to get approved, even if they were better choices.
For example, for:
B = 50
P = [10, 20, 40]
C = [0, 0, ... , 0] (size 50)
The algorithm will choose to approve the first and second projects, and the third project will not get approved since B (now 30) < 40, but the optimal solution would be to approve the first and third projects.
Is there something missing from the equations, or is the implementation for this case special? I have been unable to find a similar base problem for this one, so I couldn't find similar solutions.
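For reference, here is a minimal bottom-up sketch of the recurrence above: the usual 0/1-knapsack-style loop, iterating the budget downwards so each project is considered at most once, which avoids the greedy behaviour described. The data is the example from the question:

def max_cost(budget, projects):
    # C[b] = maximum expenditure achievable with budget b using the projects seen so far
    C = [0] * (budget + 1)
    for p in projects:
        for b in range(budget, p - 1, -1):   # iterate downwards: each project used at most once
            C[b] = max(C[b], C[b - p] + p)
    return C[budget]

print(max_cost(50, [10, 20, 40]))            # 50 (projects 10 and 40)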

Solving a travelling salesman problem to maximize gain in minimum time

Team
I need suggestions on how to solve the below problem.
There are n places (for example, say 10 places). The time taken to travel from any one place to another is known. On reaching a particular place, a known reward is given in rupees (e.g. if I travel from place 1 to place 2, I get 100 rupees; travelling from place 2 to place 3 will fetch me 50 rupees, etc.). Also, sometimes a particular place is unavailable to travel to, and this changes with time. At all time instances, which places can be travelled to, the reward fetched at each place and the time taken to travel from one place to another are known. This is an ongoing process: after you reach place A and earn 100 rupees, you can travel to place B and fetch another 100 rupees, and then it is possible that place A can fetch you, say, 50 rupees again if you travel from B back to A.
Problem statement is:
A path should be followed over time (A to B, B to C, C to B, B to A, etc.) so that I always have the maximum rupees in a given time. Thus, at the end of 1 month, I should have followed the path that fetches me the maximum amount among all available possibilities.
We already know that in the travelling salesman problem it takes O(N!) to calculate the best route for the month if there are no changes. Because of the unknown changes that can happen, the best way is to use a greedy algorithm: every time you come to a new place, you calculate where you can get the most rupees in the least amount of time. It takes O(N*k), where k is the number of moves you make between places in a month.
I'm not sure how this problem is related to the travelling salesman problem -- I understood the latter as having the restriction of visiting all the places at least once.
Assuming we have all of the time instances and their related information ahead of the calculation, we can work backwards from each place we imagine ending at: the choices we have for the previously visited location dictate the possible times it took to reach the last place and the possible earnings we could have accumulated. From those choices we would clearly pick the best reward, because it's our last choice. Apply this idea recursively from there until we reach the start of the month. If we run this recursion from each possible ending place, we can reuse states we've seen before; for example, if we reached place A at time T as one of the options while calculating backwards from B, and we later reach A again at time T while calculating from another ending place C, we can reuse the record for the first state. The search space would be O(N*T), but in practice it would vary with the input.
Something like this? (Assumes we cannot wait in any one place. Otherwise, the solution could be better coded bottom-up where we can try all place + time states.) Return the best of running f with the same memo map on all possible ending states.
def get_travel_time(place_a, place_b):
    # returns travel time from place a to place b
    ...

def get_neighbours(place):
    # returns places from which we can travel to place
    ...

def get_reward(place, time):
    # returns the reward awarded at place place at time time
    ...

def f(place, time, memo={}):
    # memo is deliberately shared across calls (see the note above)
    if time == 0:
        return 0
    key = (place, time)
    if key in memo:
        return memo[key]
    current_reward = get_reward(place, time)
    best = float('-inf')
    for neighbour in get_neighbours(place):
        previous_time = time - get_travel_time(neighbour, place)
        if previous_time >= 0:
            best = max(best, current_reward + f(neighbour, previous_time, memo))
    memo[key] = best
    return memo[key]
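As a small usage sketch of that last point (all_places and T, the final time of the month in the same units, are assumed to be defined elsewhere):

memo = {}
best = max(f(place, T, memo) for place in all_places)   # best total reward over all ending places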

Finding out if it is possible to balance out debts between people using a graph

Say a bunch of friends went on a trip. Something went wrong, and when they got back, some were not friends anymore. Now we need to find a way to split the cost of the trip. Some people are owed money (because they paid too much) and some are owing (paid too little).
Say we have two lists, debts and relationship. The debts list is a list of integers, one per person. A positive value means that person owes that much money; a negative value means they are owed that much. The relationship list is a list of tuples showing which two people are friends. For instance, (0, 3) would mean that person 0 (debts[0]) and person 3 (debts[3]) are friends. I'm trying to find a way to see if everyone can get even by transferring money between people.
An example:
debts = [5, -5, 10]
relationship = [(0, 1), (1, 2)]
The way I was thinking of doing it was to build a graph of the relationships between the individuals. Using DFS, I would start with the first person, clear their debt (i.e. make it 0) and pass that money on to the next person. Eventually I would have gone through everyone, and if at the end the last person does not have a value of 0, it is not possible to get even.
Am I on the correct path? I believe this would take O(|V|+|E|)
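For what it's worth, here is a minimal sketch of the check this approach effectively boils down to: within each connected group of friends, the debts can only cancel out if they sum to zero (a plain DFS over the components; the data at the bottom is the example from the question):

from collections import defaultdict

def can_settle(debts, relationship):
    graph = defaultdict(list)
    for a, b in relationship:
        graph[a].append(b)
        graph[b].append(a)
    seen = set()
    for start in range(len(debts)):
        if start in seen:
            continue
        stack, component_sum = [start], 0
        seen.add(start)
        while stack:                         # DFS over one connected component
            person = stack.pop()
            component_sum += debts[person]
            for friend in graph[person]:
                if friend not in seen:
                    seen.add(friend)
                    stack.append(friend)
        if component_sum != 0:
            return False                     # this group cannot get even on its own
    return True

print(can_settle([5, -5, 10], [(0, 1), (1, 2)]))   # False: the debts sum to 10, not 0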

I'm trying to find a "bartender algorithm"

I was solving some example questions from an old programming contest. In this question we get an input of how many bartenders we have and which recipes they know. Each cocktail takes 1 minute to make, and we need to calculate whether the order can be finished within 5 minutes, using all bartenders.
The key to solving this problem is assigning cocktails as efficiently as possible, and that's where I'm stuck: my current algorithm gives the order to the bartender who knows the fewest other recipes. But of course this isn't 100% correct yet. Could anyone point me in the right direction (or give me an algorithm name to google) for solving this "bartender problem"?
This could be solved with a flow network.
The source has edges to each bartender, with capacity 5.
Each bartender has edges to each drink he/she can make, with capacity 5.
Each drink have edges to the sink, with a capacity corresponding to the number that is ordered.
Compute the maximum flow from the source to the sink. If any order remains unfulfilled, there is no solution.
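A minimal sketch of that construction with networkx (the bartenders and the order are made-up example data):

import networkx as nx

bartenders = {'T1': {'mojito', 'negroni'}, 'T2': {'mojito'}}
order = {'mojito': 4, 'negroni': 3}

G = nx.DiGraph()
for tender, known in bartenders.items():
    G.add_edge('source', tender, capacity=5)           # at most 5 drinks in 5 minutes
    for drink in known:
        G.add_edge(tender, drink, capacity=5)
for drink, count in order.items():
    G.add_edge(drink, 'sink', capacity=count)

flow_value, flow = nx.maximum_flow(G, 'source', 'sink')
print(flow_value == sum(order.values()))                # True iff the order can be made in time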
Create a list of the cocktails in the order, sequenced by how many tenders know how to make each cocktail.
i.e. the order is for
(2*CocktailA, 1*CocktailB, 2*CocktailC, 1*CocktailD)
CocktailA can be made by 4 tenders (Tenders A, B, C, D)
CocktailA can be made by 4 tenders (Tenders A, B, C, D)
CocktailB can be made by 3 tenders (Tenders A, B, C)
CocktailC can be made by 1 tender (Tender A)
CocktailC can be made by 1 tender (Tender A)
CocktailD can be made by 1 tender (Tender B)
Work backwards through that list, assigning jobs to tenders. If multiple tenders can make the cocktail, then pick the one with the fewest jobs already assigned.
CocktailD = Tender B
CocktailC = Tender A
CocktailC = Tender A (again)
CocktailB = Tender C
CocktailA = Tender D
CocktailA = Tender B (again)
Tenders A and B both have 2 jobs, so the order will take 2 mins.
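A small sketch of this greedy assignment (the data mirrors the example above, and the return value is the number of minutes needed, since each drink takes 1 minute):

def assign(order, tenders):
    # order: {cocktail: count}; tenders: {tender: set of known cocktails}
    jobs = {t: 0 for t in tenders}
    items = [c for c, n in order.items() for _ in range(n)]
    # handle the cocktails that the fewest tenders know first
    items.sort(key=lambda c: sum(c in known for known in tenders.values()))
    for cocktail in items:
        capable = [t for t, known in tenders.items() if cocktail in known]
        if not capable:
            return None                             # nobody can make this drink
        best = min(capable, key=lambda t: jobs[t])  # least-loaded capable tender
        jobs[best] += 1
    return max(jobs.values())

tenders = {'A': {'CocktailA', 'CocktailB', 'CocktailC'},
           'B': {'CocktailA', 'CocktailB', 'CocktailD'},
           'C': {'CocktailA', 'CocktailB'},
           'D': {'CocktailA'}}
order = {'CocktailA': 2, 'CocktailB': 1, 'CocktailC': 2, 'CocktailD': 1}
print(assign(order, tenders))                       # 2, as in the example above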
This is a vertex coloring problem. It is exactly analogous to the register allocation problem which is very well studied. See http://en.wikipedia.org/wiki/Register_allocation. It can also be thought of as a set cover problem which is analogous to vertex coloring.
Of course, here we need not find the actual coloring; we just need to determine whether its cardinality is 5 or less. If the bartender graph can be colored with 5 or fewer colors, then the answer is Yes, otherwise No. Here is another nice paper describing the problem in terms of "tasks", "days" and "machines": http://www.polymtl.ca/pub/sites/lagrapheur/docs/en/documents/NotesChap7.pdf.
Now, figuring this out, which is called the "chromatic number" or "chromatic index" of the graph, is NP-hard. In fact, someone has already asked on SO for an algorithm to find the chromatic number of a graph, but unfortunately did not get much of a response; see Algorithm for Chromatic Number of a Graph?
Just looking around the web, I did find some code resources for doing colorings. One that can handle this problem is called SMALLK. SMALLK can find colorings with up to 8 colors. Since we only need 5 for this problem, this package can do it.
This is a variant of the college matching problem, where drinks are students and bartenders are colleges. In turn, it is a generalization of the stable marriage problem, which might be of more use to you.
