Algorithm for maximizing happiness - algorithm

Imagine you have:
100 people
100 projects
Each person ranks all 100 projects in the order in which they would like to work on them. What kind of algorithm can be used to maximize the happiness of the people (i.e. being assigned to a project they ranked higher translates to greater happiness).
Assume one project per person.

This kind of problem is well known as the assignment problem, and the standard way to solve it is the Hungarian algorithm. A similar problem solved with it:
We consider an example where four jobs (J1, J2, J3, and J4) need to be
executed by four workers (W1, W2, W3, and W4), one job per worker. The
matrix below shows the cost of assigning a certain worker to a certain
job. The objective is to minimize the total cost of the assignment.
Source: http://www.hungarianalgorithm.com/examplehungarianalgorithm.php
Please note that the standard Hungarian algorithm finds the minimum-cost assignment, but you can adapt it to maximize instead:
If the goal is to find the assignment that yields the maximum cost,
the problem can be altered to fit the setting by replacing each cost
with the maximum cost subtracted by the cost.
Source: http://en.wikipedia.org/wiki/Hungarian_algorithm
I've already implemented the Hungarian algorithm on my Github,
so feel free to use it and modify it to make it work as maximizing the cost.
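As a minimal sketch of the cost transformation quoted above (not the linked implementation), the snippet below turns a small happiness matrix into a cost matrix by subtracting every entry from the maximum, then picks the cheapest assignment. The brute-force search over permutations is only a stand-in for the Hungarian algorithm so that the example stays self-contained; it is not practical for 100 people. The names best_assignment, ranks and happiness are made up for illustration.

```python
from itertools import permutations

def best_assignment(happiness):
    """Return p where p[i] is the project given to person i (maximizes total happiness)."""
    n = len(happiness)
    top = max(max(row) for row in happiness)
    # Replace each value with (maximum - value): maximizing happiness
    # becomes minimizing cost, which is what the Hungarian algorithm expects.
    cost = [[top - h for h in row] for row in happiness]
    # Stand-in for the Hungarian algorithm: try every permutation (tiny inputs only).
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# ranks[i][j] is the rank person i gives project j (1 = most preferred).
ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
# Assumed happiness measure: a higher-ranked project scores more.
happiness = [[len(row) - rank for rank in row] for row in ranks]
assignment = best_assignment(happiness)
print(assignment)
```

Since a rank is already a cost (lower is better), minimizing the sum of ranks directly would give the same assignment; the transformation matters when the input is a score to be maximized.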

Related

Dual knapsack algorithm

Say you have a warehouse with fragile goods (e.g. vegetables or fruits), and you can only take out a container with vegetables once. If you move them twice, they'll rot too fast and can't be sold anymore.
So if you give a value to every container of vegetables (depending on how long they'll still be fresh), you want to sell the lowest value first. And when a client asks a certain weight, you want to deliver a good service, and give the exact weight (so you need to take some extra out of your warehouse, and throw the extra bit away after selling).
I don't know if this problem has a name, but I would consider this the dual form of the knapsack problem. In the knapsack problem, you want to maximise the value and limit the weight to a maximum. While here you want to minimise the value and limit the weight to a minimum.
You can easily see this duality by treating the warehouse as the knapsack, and optimising the warehouse for the maximum value and limited weight to a maximum of the current weight minus what the client asks.
However, many practical algorithms for solving the knapsack problem rely on the assumption that the weight you can carry is small compared to the total weight you can choose from. For example, the dynamic programming 0/1 solution loops up to the maximum weight, and the FPTAS solution is only guaranteed to be within a factor of (1-ε) of the optimum (but a small fraction of a huge value can still make a pretty big difference).
So both have issues when the wanted weight is big.
As such, I wondered if anyone studied the "dual knapsack problem" already (if some literature can be found around it), or if there's some easy modification to the existing algorithms that I'm missing.
The usual pseudopolynomial DP algorithm for solving knapsack asks, for each i and w, "What is the largest total value I can get from the first i items if I use at most w capacity?"
You can instead ask, for each i and w, "What is the smallest total value I can get from the first i items if I use at least w capacity?" The logic is almost identical, except that the direction of the comparison is reversed, and you need a special value to record the possibility that even taking all i of the first i items cannot reach w capacity -- infinity works for this, since you want this value to lose against any finite value when they are compared with min().
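The reversed DP described above can be sketched as follows, assuming positive integer weights. The function name min_value_at_least and the sample containers are invented for the example; math.inf plays the role of the "cannot reach w" marker.

```python
import math

def min_value_at_least(items, required_weight):
    """items: list of (weight, value) pairs with positive integer weights.

    Returns the smallest total value of a subset whose weight is at least
    required_weight, or math.inf if even all items together are too light.
    """
    # best[w] = smallest value of a subset (of the items seen so far)
    # weighing at least w; nothing is needed to reach weight 0.
    best = [0] + [math.inf] * required_weight
    for weight, value in items:
        # Go downwards so each item is used at most once (0/1 knapsack).
        for w in range(required_weight, 0, -1):
            # Taking this item leaves max(0, w - weight) still to be covered.
            candidate = best[max(0, w - weight)] + value
            if candidate < best[w]:
                best[w] = candidate
    return best[required_weight]

containers = [(3, 4), (4, 5), (2, 3)]  # (weight, freshness value)
print(min_value_at_least(containers, 5))
print(min_value_at_least(containers, 10))
```

The running time is still proportional to the number of items times the requested weight, so this does not remove the concern about large weights raised in the question.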

Travelling Salesman with multiple salesmen with a limit on number of cities per salesman?

Problem: I need to drop (n) employees from office to their homes(co-ordinates available). I have (x) 7-seater & (y) 4-seater cabs available.
I have to design an algorithm to drop all the employees to their homes while travelling minimum distance.
Also, the algorithm must tell me how many 7-seater or/and 4-seater vehicles I must choose so as to travel minimum distance.
E.g. if I have 15 employees, then the algorithm may tell me to use 1 (7-seater) cab & 2 (4-seater) cabs and place the employees in each cab as follows:
[(E2, E4, E6, E8), (E1, E3, E5, E7, E9, E10, E12), (E11, E13, E14, E15)]
Approach: I'm thinking of this as a Travelling Salesman Problem with multiple salesmen and an upper limit on the number of cities each can visit. Also, the salesmen do not need to come back to the origin. Ant colony optimization came to my mind, but I can't decide which algorithm to choose.
Requirement: I really need the ALGORITHM. Either TSP or ant colony, doesn't matter. I'll welcome opinions, but I really need the ALGORITHM.
This is a cost minimization problem, not a travelling salesman problem. It is related to TSP in the sense that TSP is a very specific cost minimization problem.
The solution consists of three steps:
1. Generate a list of employee drop-off points (nodes).
2. Create distinct paths that neither intersect nor branch. These will be your routes and help prevent wasteful route overlaps. Use cost(path) = distance(furthest node and origin) + taxi_cost(nodes) + sum(distance between nodes) to compare paths and/or brute-force all potential networks. Networks are layouts of paths. DO NOT BRANCH THE PATHS!!
   - Total distance is a line of defense against waste, ensuring that routes are not too long.
   - Sum of distances helps the algorithm converge on neighbourhoods where many employees live (when possible).
   Because this variation of the coin problem allows imperfect solutions, it reduces to a variant of the Knapsack Problem. The utility of each taxi is capacity. If you also wish to choose the cheapest way to transport your employees, utility(taxi) = capacity/cost. From this our simplest solution is to be greedy; who cares about empty space? If you really care about filling up taxis perfectly (as opposed to cost efficiently), you'll need a much more complex solution. You only specify the least distance as your metric (with each additional taxi multiplying cost). I assume this is a proxy to say 'I don't want to pay too much'.
   Therefore: taxi_cost(nodes) = math.floor(amount(nodes) / max(utility(taxis))) + 1. This equation selects the cheapest, roomiest taxi, and figures out how many of them are required to fully service the route.
   Be sure to calculate the cost of each network you examine as sum(cost(path)).
3. Once you've found the cheapest network to service, for each path in the chosen network:
   - make a list of employees travelling to the furthest node
   - fill the preferred taxi with those employees
   - repeat with the next furthest node until you have a full taxi, then add the filled taxi to the list. If you run out of employees, you've finished assigning taxis to the route. (The benefit of furthest-first selection is that you can ask employees in unfilled taxis to walk if that part of the route is within blocks of the office.)
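The furthest-first filling of taxis along a single route can be sketched as below. This is only an illustration under simplifying assumptions: every taxi on the route has the same capacity, each employee is described by a hypothetical (name, distance from office) pair, and the number of taxis is computed with math.ceil rather than the floor-plus-one expression in the answer, because the ceiling does not overcount when the route fills its taxis exactly.

```python
import math

def taxis_needed(num_passengers, capacity):
    """Number of taxis of the given capacity required for a route."""
    return math.ceil(num_passengers / capacity)

def fill_taxis(employees, capacity):
    """employees: list of (name, distance_from_office) on one route.

    Returns a list of taxis (lists of names), filled furthest-first so
    that only the taxi serving the nearest drop-offs can be partly empty.
    """
    ordered = sorted(employees, key=lambda e: e[1], reverse=True)
    return [[name for name, _ in ordered[k:k + capacity]]
            for k in range(0, len(ordered), capacity)]

route = [("E1", 2.0), ("E2", 9.5), ("E3", 5.0), ("E4", 7.0), ("E5", 1.0)]
print(taxis_needed(len(route), 4))
print(fill_taxis(route, 4))
```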
The algorithm above is not perfect, but it will have many desirable tendencies:
- routes will be as short as possible and cover the greatest possible area (by not looping or branching)
- routes will tend to service neighbourhoods, rather than trying to overlap responsibilities. This part of the algorithm isn't optimal, but is effective. This makes it really easy to remove service routes without needing to recalculate the transportation network.
- the taxis chosen will be cost-efficient, helping to avoid paying more than necessary.
- routes will use as few taxis as possible, taking into account the relative cost of upgrading to roomier ones with higher capacity
- because the taxis travelling furthest will be full, it has less of an impact on your employees' ability to get to work if you decide to cancel service to emptier taxis.
Every step closer to perfection costs you many times more than the previous step, so diminished returns are acceptable if the solution provides desirable features. Although the algorithm makes some potentially sub-optimal tradeoffs, they come with huge value; your network of taxi routes becomes much easier to modify.
If you'd like to make an optimal solution, the Knapsack Problem, Coin Problem, and Change-making Problem help determine the cost of taxis and routes.
Spanning Trees are the most effective way to determine routes. Center the spanning tree at the office and calculate the cost of each branch as the maximum distance from the office. Try to keep each branch servicing areas with high density to make it easier to add and remove taxi routes.
Studying pathfinding can help you learn how to determine good cost functions so that you can numerically compare different potential paths. Remember that your network consists of a set of paths, but will require its own cost function so that you can compare different layouts.
I've written an in-depth guide to pathfinding for this answer. Pathfinding articles are few and just don't go into enough depth for a lot of problem spaces. A good cost function can get you a nearly perfect solution if you have multiple priorities. Unfortunately, good cost functions are domain specific so you will need to identify them yourself. Feel free to message me if you aren't sure how to make a path with certain traits and I'll help you figure out a good cost function.
It's a constraint satisfaction problem, not really a TSP. If you're looking for a project that may be able to help you, you could look into cspdb, which is what I wrote some time ago:
https://github.com/gtoonstra/cspdb
You'd be using a database in the backend that maintains the state, and you'd write a couple of scripts in its own grammar that manipulate that state. A couple of examples are included to solve nqueens and classroom scheduling with multiple constraints.
From a list d of destinations you can build the array c of pairwise travel costs, where c[a,b] is the travel cost from a to b.
Now you have a start point p. Add to an array c2 the travel cost from p to each point in d.
So you now have the concept of groups.
You can look at this as a greedy algorithm. Given the list c2, you can take the cheapest option given your state.
Your state is the vector of all your cab vectors (the costs of getting from wherever they are to wherever they could go next) .* the assignment vector where k == 0 for k in the. You find the minimum option given your state (considering that adding another person to a cab of 4 costs the difference between the 4-person cab and the 7-person cab, and that adding a person to a zero-person cab or adding a new cab also has a cost). Once all your people have been assigned to their cabs, you have an answer.
The idea of a greedy algorithm is most often characterized by the backpack problem, but it can also be implemented for statistical methods such as feature selection.
Like @Aaron3468's answer, this approach is not perfect and does not guarantee the best solution.
If you want the best possible solutions you can iterate through all the combinations but this becomes impractical quickly.
From my point of view, your algorithm should solve 2 problems: the number of cars of each type and the shortest distance (how you number your employees depends on you, or you should give more details). Sorry, I'm using a phone and I don't have all the features of the site.
For the number of cars you can use the algorithm below. To solve issues related to distances, you should give more info about the paths and their lengths. A graph algorithm may then be combined with this to do the trick. Here 11=7+4, x is the number of 7-seaters and y the number of 4-seaters.
Begin
    Integer temp := n / 11
    Integer rem := n mod 11
    If rem = 0
        x := temp
        y := temp
    Else if rem <= 4
        x := temp
        y := temp + 1
    Else if rem <= 7
        x := temp + 1
        y := temp
    Else
        x := temp + 1
        y := temp + 1
    Endif
End
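For reference, the pseudocode above translates to Python roughly as follows. The function name cabs_needed is made up; the sketch assumes, like the pseudocode, that enough cabs of both kinds are available, and it only guarantees enough seats, not minimum distance or a minimum number of vehicles.

```python
def cabs_needed(n):
    """Return (x, y): the number of 7-seaters and 4-seaters for n employees."""
    temp, rem = divmod(n, 11)  # one 7-seater plus one 4-seater carry 11 people
    if rem == 0:
        return temp, temp
    if rem <= 4:
        return temp, temp + 1      # the leftover fits in one extra 4-seater
    if rem <= 7:
        return temp + 1, temp      # the leftover fits in one extra 7-seater
    return temp + 1, temp + 1      # 8 to 10 left over: one more of each

print(cabs_needed(15))
```

With 15 employees this yields one 7-seater and two 4-seaters, the same split as in the question's example.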

Multiple Knapsack Variations

I'm an undergrad doing some evolutionary algorithm work on the multiple knapsack problem. I've completed my code, but I'm struggling to understand an aspect of the test cases. I've noticed that they have a constraint (weights or costs) matrix, as opposed to a list. Why? Why should the cost of an object depend on which knapsack it's in? I can certainly extend the algorithm to make this happen, but I don't understand its applications. Each test case I've found is in this format. Any help with the matrix, or data with 1-dimensional constraints, would be appreciated.
As a popular paper on the travelling salesman says:
The popularity of the travelling salesman problem does not originate from millions of salesmen that want to calculate the optimal route.
Based on your specification of the problem, I think you are talking about the Multiple-Choice Knapsack Problem (p. 12), with P the price matrix and W the weight matrix.
The same of course holds for the Knapsack problem. Although the story is about a knapsack, that knapsack can be anything.
Take for instance a scheduling problem. Say you have a (fictitious) hospital with three employees: two doctors and one nurse. Now each day, one makes a list of tasks these employees have to carry out (for instance examining patients, filling in forms,...). Now we can represent each of the employees as a knapsack since they have a limited number of hours they work that day.
The weight of a task describes the amount of time employee i needs to handle task j. Furthermore, it can be used to specify that a certain task is forbidden. For instance, in Belgium a nurse with an A-certificate is not allowed to give a patient an injection. You can enforce this by specifying that the nurse would take years to handle such a task; the "weight" of that task is thus too large with respect to the capacity of that bag.
So: w_ij describes the time employee i spends to carry out task j and is set above the capacity if that task cannot be carried out by that employee.
Furthermore the value is for instance the quality of carrying out the task. If one of the doctors is specialized in heart diseases, evidently his diagnosis for patients with heart problems will be better. Or you can for instance use the inverse of the amount you need to pay the employee to handle that task (if the employees are paid on a per-task basis), to minimize the cost.
So p_ij specifies the quality with which employee i will carry out task j, or for instance the inverse of the cost of employee i carrying out task j.
The optimal configuration of the knapsack will thus specify which tasks the employees will handle that day such that no employee works more than is allowed (or performs tasks he/she is not licensed for), and furthermore optimizes the quality of service or minimizes the operational costs.
So if x_ij=1, that means in the optimal scenario, employee i will carry out task j that day.
A typical application is thus one where multiple employees/machines/servers handle tasks/problems/requests with different costs and bounds.
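To make the role of the two matrices concrete, here is a small sketch that scores a candidate schedule. All numbers are invented: w[i][j] is the time employee i needs for task j, p[i][j] the quality, capacity[i] the hours available, and x[i][j] = 1 means employee i does task j. A forbidden task is modelled, as described above, by a weight far above the capacity. The check that each task is handled at most once is left out for brevity.

```python
def evaluate_schedule(x, w, p, capacity):
    """Return the total value of schedule x, or None if someone is overloaded."""
    for i, row in enumerate(x):
        hours = sum(w[i][j] for j, chosen in enumerate(row) if chosen)
        if hours > capacity[i]:
            return None  # employee i exceeds their capacity (or does a forbidden task)
    return sum(p[i][j]
               for i, row in enumerate(x)
               for j, chosen in enumerate(row) if chosen)

# Row 0: a doctor, row 1: a nurse who may not perform task 1 (weight 99 > capacity).
w = [[2, 3, 1],
     [2, 99, 1]]
p = [[5, 9, 2],
     [4, 0, 3]]
capacity = [8, 8]

allowed = [[0, 1, 0],
           [1, 0, 1]]
forbidden = [[0, 0, 0],
             [0, 1, 0]]
print(evaluate_schedule(allowed, w, p, capacity))
print(evaluate_schedule(forbidden, w, p, capacity))
```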
@IVlad made some constructive comments pointing to related problems:
In the Assignment problem, one aims to construct a set of edges in a bipartite graph such that no two edges share a node and the total weight is maximized. One cannot (evidently) map the MCKP to the AP, since the weights will discard situations where an employee would perform too many tasks simply because that is optimal.
The multi-objective variant transforms the price matrix P into a tensor (or beyond) such that you have different evaluation criteria you can take into account (for instance both the quality and the price), and you search for an optimal solution for both objectives.
The test data I'm referring to is people.brunel.ac.uk/~mastjjb/jeb/orlib/files/mknap2.txt. It seems that this data is for the multi-dimensional knapsack problem.
This looks like data for the multiconstraint 0/1 knapsack problem: as you can see in that paper, there are m constraints:
Maximize
z = sum{j = 1 to n: c[j]*x[j]}
Subject to:
sum{j = 1 to n: a[i,j]*x[j]} <= b[i] i = 1, ..., m
x[j] in {0,1}
This is suggested by the following reference at the start of the data document you linked to:
Simulated Annealing: A. Drexl (1988) "A Simulated Annealing
Approach to the Multiconstraint Zero-One Knapsack Problem."
Computing, 40:1-8.
And it seems to be the only thing that fits the data format.
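The multiconstraint formulation above can be illustrated with a tiny brute-force solver. The instance below (c, a, b) is invented and is not taken from the linked data file; enumerating all 0/1 vectors is only feasible for a handful of items and merely shows how one item consumes capacity in every constraint at once.

```python
from itertools import product

def solve_mknap(c, a, b):
    """Maximize sum(c[j]*x[j]) subject to sum(a[i][j]*x[j]) <= b[i] for every i."""
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(c)):
        feasible = all(
            sum(aij * xj for aij, xj in zip(row, x)) <= limit
            for row, limit in zip(a, b)
        )
        if feasible:
            value = sum(cj * xj for cj, xj in zip(c, x))
            if best_value is None or value > best_value:
                best_value, best_x = value, x
    return best_value, best_x

c = [10, 6, 4]          # profit of each item
a = [[5, 4, 3],         # usage of resource 0 by each item
     [2, 3, 4]]         # usage of resource 1 by each item
b = [8, 6]              # capacity of each resource
print(solve_mknap(c, a, b))
```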

Group incoming and outgoing invoices to make their sum 0

I've faced an interesting problem today, and decided to write an algorithm in C# to solve it.
There are incoming invoices with negative totals and outgoing invoices with positive totals. The task is to make groups out of these invoices, where the total of the invoices adds up to exactly 0. Each group can contain any number of members, so if there are two positive members and one negative member but their total value is 0, it's okay.
We try to minimize the sum of the remaining invoices' totals, and there are no other constraints at all.
I'm wondering if this problem could be traced back to a known problem, and if not, which would be the most effective way to do this. The naive approach would be to separate incoming and outgoing invoices into two different groups, sort by total, then try to add invoices one by one until zero is reached or the sign has changed. However, this presumes that the invoices in a group are of approximately the same magnitude, which is not true (one huge incoming invoice could be put against 10 smaller outgoing ones).
Any ideas?
The problem you are facing is a well-known and well-studied one, called the Subset Sum Problem.
Unfortunately, the problem is NP-Complete, so there is no known polynomial solution for it (1).
In fact, there is no known polynomial solution to even determine if such a subset (even a single one) exists, let alone find it.
However, if your input consists of relatively small (absolute value) integers, there is a pretty efficient (pseudo polynomial) dynamic programming solution that can be utilized to solve the problem.
If this is not the case some other alternatives are:
Using an exponential solution like brute force (you might be able to optimize it using the branch and bound technique)
Heuristic solutions, such as Steepest Ascent Hill Climbing or Genetic Algorithms
Approximation algorithms
(1) And most computer science researchers believe one does not exist, this is basically the P VS NP Problem.
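A minimal sketch of the pseudo-polynomial dynamic programming idea for one zero-sum group is shown below. It assumes invoice totals are integers (for example amounts in cents) and only looks for a single non-empty group; repeatedly removing the found group and searching again is a heuristic, not a guarantee of the minimum leftover. The function name zero_sum_group is made up.

```python
def zero_sum_group(amounts):
    """Return indices of a non-empty subset of amounts summing to 0, or None.

    amounts: integer totals, negative for incoming and positive for outgoing.
    The number of distinct reachable sums bounds the work, which is why
    small absolute values keep this efficient.
    """
    reachable = {}  # reachable sum -> one list of indices producing it
    for i, amount in enumerate(amounts):
        new_sums = {amount: [i]}
        for total, indices in reachable.items():
            new_sums.setdefault(total + amount, indices + [i])
        for total, indices in new_sums.items():
            reachable.setdefault(total, indices)
        if 0 in reachable:
            return reachable[0]
    return None

invoices = [500, -200, -300, 700]
group = zero_sum_group(invoices)
print(group)
```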

Calculating taxi movements

Let's say I have N taxis, and N customers waiting to be picked up by the taxis. The initial positions of both customers and taxis are random/arbitrary.
Now I want to assign each taxi to exactly one customer.
The customers are all stationary, and the taxis all move at identical speed. For simplicity, let's assume there are no obstacles, and the taxis can move in straight lines to assigned customers.
I now want to minimize the time until the last customer enters his/her taxi.
Is there a standard algorithm to solve this? I have tens of thousands of taxis/customers. Solution doesn't have to be optimal, just ‘good’.
The problem can almost be modelled as the standard “Assignment Problem”, solvable using the Hungarian algorithm (the Kuhn–Munkres algorithm or Munkres assignment algorithm). However, I want to minimize the cost of the costliest assignment, not minimize the sum of costs of the assignments.
Since you mentioned the Hungarian Algorithm, I guess one thing you could do is use a different measure of distance rather than the Euclidean distance and then run the Hungarian Algorithm on it. For example, instead of using
d = sqrt((x0 - x1) ^ 2 + (y1 - y0) ^ 2)
use
d = ((x0 - x1) ^ 2 + (y1 - y0) ^ 2) ^ 10
that could cause the algorithm to penalize big numbers heavily, which could constrain the length of the max distance.
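A small numeric illustration of why the large exponent helps, using two hypothetical assignments described only by their trip lengths: with plain distances, the sum prefers the assignment containing one very long trip, while with distances raised to the 20th power (the squared distance to the 10th power, as in the formula above) the sum is dominated by the longest trip. Python integers are unbounded; with floating-point costs such powers may overflow or lose precision.

```python
def total_cost(trip_lengths, power=1):
    """Sum of trip lengths, each raised to the given power."""
    return sum(d ** power for d in trip_lengths)

balanced = [5, 5, 5]   # longest trip 5, plain sum 15
lopsided = [1, 1, 9]   # longest trip 9, plain sum 11

# Plain sum prefers the lopsided assignment...
print(total_cost(lopsided) < total_cost(balanced))
# ...but after the transformation the balanced one is cheaper.
print(total_cost(balanced, 20) < total_cost(lopsided, 20))
```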
EDIT: The paper "Geometry Helps in Bottleneck Matching and Related Problems" may contain a better algorithm. However, I am still in the process of reading it.
I'm not sure that the Hungarian algorithm will work for your problem here. According to the link, it runs in n ^ 3 time. Plugging in 25,000 as n would yield 25,000 ^ 3 = 15,625,000,000,000. That could take quite a while to run.
Since the solution does not need to be optimal, you might consider using simulated annealing or possibly a genetic algorithm instead. Either of these should be much faster and still produce close to optimal solutions.
If using a genetic algorithm, the fitness function can be designed to minimize the longest period of time that an individual would need to wait. But you would have to be careful, because if that is the sole criterion, then the solution won't work too well for cases when there is just one cab that is closest to the passenger that is furthest away. So, the fitness function would need to take into account the other waiting times as well. One idea to solve this would be to run the model iteratively and remove the longest cab trip (both cab & person) after each iteration. But doing that for all 10,000+ cabs/people could be expensive time-wise.
I don't think any cab owner or manager would even consider minimizing the waiting time for the last customer entering his cab over minimizing the sum of the waiting time for all cabs - simply because they make more money overall when minimizing the sum of the waiting times. At least Louie DePalma would never do that... So, I suspect that the real problem you have has little or nothing to do with cabs...
A "good" algorithm that would solve your problem is a Greedy Algorithm. Since taxis and people have a position, these positions can be related to a "central" spot. Sort the taxis and people needing to get picked up in order (in relation to the "centre"). Then start assigning taxis, in order, to pick up people in order. This greedy rule will ensure taxis closest to the centre will pick up people closest to the centre and taxis farthest away pick up people farthest away.
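The greedy rule above could look like the sketch below, assuming the "central" spot is taken to be the centroid of all taxi and customer positions (the answer does not say how to pick it). Note that sorting by distance from the centre ignores direction, so a taxi and a customer at similar distances but on opposite sides may be paired; this is a rough heuristic only.

```python
import math

def greedy_by_centre(taxis, customers):
    """Pair the k-th closest taxi to the centre with the k-th closest customer.

    taxis, customers: equally long lists of (x, y) positions.
    Returns a list of (taxi_index, customer_index) pairs.
    """
    points = taxis + customers
    centre = (sum(x for x, _ in points) / len(points),
              sum(y for _, y in points) / len(points))
    taxi_order = sorted(range(len(taxis)),
                        key=lambda i: math.dist(taxis[i], centre))
    customer_order = sorted(range(len(customers)),
                            key=lambda j: math.dist(customers[j], centre))
    return list(zip(taxi_order, customer_order))

taxis = [(1, 0), (6, 0)]
customers = [(0, 1), (0, 5)]
pairs = greedy_by_centre(taxis, customers)
print(pairs)
```

Sorting dominates the cost, so this runs in O(n log n), which is comfortable for tens of thousands of taxis.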
A better way might be to use Dynamic Programming; however, I am not sure it applies, nor do I have the time to investigate. A good tutorial for Dynamic Programming can be found here
For an optimal solution: construct a weighted bipartite graph with a vertex for each taxi and customer and an edge from each taxi to each customer whose weight is the travel time. Scan the edges in order of nondecreasing weight, maintaining a maximum matching of the subgraph containing the edges scanned so far. Stop when the matching is perfect.
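The edge-scanning procedure above can be sketched as follows, assuming straight-line travel so that the travel time is proportional to the Euclidean distance. Edges are added in nondecreasing order and the matching is extended with a simple augmenting-path search after each new edge. This straightforward version is far too slow for tens of thousands of taxis (it builds all n*n edges and uses recursion); it is meant to show the idea, not to be an efficient implementation.

```python
import math

def bottleneck_assignment(taxis, customers):
    """Return (longest_trip, customer_of_taxi) minimizing the longest trip."""
    n = len(taxis)
    edges = sorted((math.dist(t, c), i, j)
                   for i, t in enumerate(taxis)
                   for j, c in enumerate(customers))
    allowed = [[] for _ in range(n)]   # customers reachable via scanned edges
    taxi_of_customer = [None] * n
    customer_of_taxi = [None] * n

    def augment(i, seen):
        # Standard augmenting-path search starting from taxi i.
        for j in allowed[i]:
            if j in seen:
                continue
            seen.add(j)
            if taxi_of_customer[j] is None or augment(taxi_of_customer[j], seen):
                taxi_of_customer[j] = i
                customer_of_taxi[i] = j
                return True
        return False

    matched = 0
    for distance, i, j in edges:
        allowed[i].append(j)
        # One new edge can enlarge a maximum matching by at most one.
        for free in range(n):
            if customer_of_taxi[free] is None and augment(free, set()):
                matched += 1
                break
        if matched == n:
            return distance, customer_of_taxi
    return None

taxis = [(0, 0), (10, 0)]
customers = [(9, 0), (1, 0)]
print(bottleneck_assignment(taxis, customers))
```

A common variation is to binary-search the distance threshold and run a faster matching algorithm such as Hopcroft-Karp on the edges below it.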

Resources