Multiple Knapsack Variations - algorithm

I'm an undergrad doing some evolutionary algorithm work on the multiple knapsack problem. I've completed my code, but I'm struggling to understand one aspect of the test cases. I've noticed that they have a constraint (weights or costs) matrix, as opposed to a list. Why? Why should the cost of an object depend on which knapsack it's in? I can certainly extend the algorithm to handle this, but I don't understand its applications. Each test case I've found is in this format. Any help with the matrix, or data with 1-dimensional constraints, would be appreciated.

As a popular paper on the travelling salesman says:
The popularity of the travelling salesman problem does not originate from millions of salesmen that want to calculate the optimal route.
Based on your specification of the problem, I think you are talking about the Multiple-Choice Knapsack Problem (p. 12), with P the price matrix and W the weight matrix.
The same of course holds for the knapsack problem. Although the story is about a knapsack, that knapsack can be anything.
Take for instance a scheduling problem. Say you have a (fictitious) hospital with three employees: two doctors and one nurse. Now each day, one makes a list of tasks these employees have to carry out (for instance examining patients, filling in forms,...). Now we can represent each of the employees as a knapsack since they have a limited number of hours they work that day.
The weight of a task describes the amount of time employee i needs to handle task j. Furthermore, it can be used to specify that a certain task is forbidden. For instance, in Belgium a nurse with an A-certificate is not allowed to give a patient an injection. You can enforce this by specifying that the nurse would take years to handle such a task; the "weight" of that task is thus too large with respect to the capacity of that bag.
So: wij describes the time employee i spends to carry out task j and is set above the capacity if that task cannot be carried out by that employee.
Furthermore the value is for instance the quality of carrying out the task. If one of the doctors is specialized in heart diseases, evidently his diagnosis for patients with heart problems will be better. Or you can for instance use the inverse of the amount you need to pay the employee to handle that task (if the employees are paid on a per-task basis), to minimize the cost.
So pij specifies the quality with which employee i will carry out task j, or for instance the inverse of the cost of employee i carrying out task j.
The optimal configuration of the knapsack will thus specify which tasks the employees will handle that day such that no employee works more than is allowed (or performs tasks he/she is not licensed for), and furthermore optimizes the quality of service or minimizes the operational costs.
So if xij=1, that means in the optimal scenario, employee i will carry out task j that day.
A typical application is thus one where multiple employees/machines/servers handle tasks/problems/requests with different costs and bounds.
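To make the roles of wij, pij and xij concrete, here is a minimal sketch that checks an assignment and sums its value. The function name, the toy numbers, and the rule that each task goes to at most one employee are my own assumptions for illustration, not part of any standard library or of the test data.

```python
def evaluate_assignment(x, w, p, capacity):
    """Return (feasible, total_value) for a 0/1 assignment matrix x.

    x[i][j] == 1 means employee (knapsack) i carries out task (item) j.
    w[i][j] is the time employee i needs for task j, p[i][j] its value.
    """
    total_value = 0
    for i, row in enumerate(x):
        load = sum(w[i][j] for j, taken in enumerate(row) if taken)
        if load > capacity[i]:
            return False, 0  # employee i would work longer than allowed
        total_value += sum(p[i][j] for j, taken in enumerate(row) if taken)
    # Assumption: each task is handled by at most one employee.
    for j in range(len(x[0])):
        if sum(row[j] for row in x) > 1:
            return False, 0
    return True, total_value


# Toy data: row 0 is a doctor, row 1 a nurse, both working 8 hours.
# The nurse may not do task 1, so its weight (100) exceeds her capacity.
capacity = [8, 8]
w = [[3, 4, 2], [2, 100, 3]]
p = [[5, 9, 4], [4, 0, 6]]
x_ok = [[0, 1, 0], [1, 0, 1]]
x_forbidden = [[0, 0, 0], [0, 1, 0]]
print(evaluate_assignment(x_ok, w, p, capacity))
print(evaluate_assignment(x_forbidden, w, p, capacity))
```

The forbidden task needs no special case: the oversized weight alone makes the assignment infeasible.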
@IVlad made some constructive comments pointing to related problems:
In the Assignment problem, one aims to construct a set of edges in a bipartite graph such that no two edges share a node and the total weight is maximized. One cannot (evidently) map the MCKP to the AP, since the weights rule out situations where an employee would perform too many tasks simply because that is optimal.
The multi-objective variant transforms the price matrix P into a tensor (or beyond) such that you have different evaluation criteria you can take into account (for instance both the quality and the price), and you search for an optimal solution for both objectives.

The test data I'm referring to is people.brunel.ac.uk/~mastjjb/jeb/orlib/files/mknap2.txt. It seems that this data is for the multi-dimensional knapsack problem.
This looks like data for the multiconstraint 0/1 knapsack problem: as you can see in that paper, there are m constraints:
Maximize
z = sum{j = 1 to n: c[j]*x[j]}
Subject to:
sum{j = 1 to n: a[i,j]*x[j]} <= b[i] i = 1, ..., m
x[j] in {0,1}
This is suggested by the following reference at the start of the data document you linked to:
Simulated Annealing: A. Drexel (1988) "A Simulated Annealing
Approach to the Multiconstraint Zero-One Knapsack Problem."
Computing, 40:1-8.
And it seems to be the only thing that fits the data format.
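A minimal sketch of how a candidate solution would be checked against that formulation, which also shows why the data holds a matrix: each row of a is one constraint over all items, not one knapsack. The function name and the numbers are made up for illustration and are not taken from mknap2.txt.

```python
def evaluate(x, c, a, b):
    """Return the objective z if x satisfies all m constraints, else None.

    c[j] is the profit of item j, a[i][j] the amount of resource i that
    item j consumes, and b[i] the available amount of resource i.
    """
    for row, limit in zip(a, b):
        if sum(a_ij * x_j for a_ij, x_j in zip(row, x)) > limit:
            return None
    return sum(c_j * x_j for c_j, x_j in zip(c, x))


# Three items, two constraints (for example weight and volume).
c = [10, 7, 4]
a = [[3, 2, 1],
     [4, 1, 3]]
b = [5, 6]
print(evaluate([1, 1, 0], c, a, b))  # feasible
print(evaluate([1, 1, 1], c, a, b))  # violates the first constraint
```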

Related

Complexity of a non-integer K-Commodity Flow without conservation property

I'm working on a problem that can be seen as a version of the Santa Claus problem (defined here, for example: https://dl.acm.org/citation.cfm?id=1132522) where the goods are divisible instead of indivisible.
For the indivisible problem, a reduction from the Partitioning problem classifies it as NP-hard (see Golovin 2005, page 3). However, with divisible goods, I couldn't find much literature unless I changed the problem to another form.
The problem can be reduced to the K-commodity flow problem (an extension of ND38 from Garey and Johnson: Directed Two-Commodity Integral Flow), which is NP-complete with integral flows; with non-integral flows it is polynomially equivalent to Linear Programming (for two or more commodities).
However, the edges in my model wouldn't be conservative, as the utility of each resource is not the same for each commodity; thus, a total input flow of 1 unit of commodity i into v doesn't mean that the output flow is also 1. Following Wikipedia, it would be defined as a preflow because it lacks the "flow conservation" property, which is essential in the problem as it is also defined on Wikipedia.
Is there a way to prove/explain the complexity class of K-commodity non-integral flows without the flow conservation property (which my problem can be reduced to)?
To explain a bit more about the problem, I have N employees and M tasks. Each employee i has an efficiency in each task j defined as e_(i,j). The efficiency can be 0 if the employee doesn't know how to do the task. Each employee can work up to H_i hours and can divide their time between the different tasks that they can do.
The objective here is to maximize the production function of the firm, which is a Leontief production function: output is determined by the least-completed task (the production is a max-min across the different tasks). There is no collaboration, so the amount produced for a task is the sum of the contributions of each employee (efficiency multiplied by the number of hours spent on this task).
If we think of the tasks as the agents, this problem can be seen as a max-min utility across tasks (agents) of the allocation of divisible goods (worker hours) with differentiated utilities (efficiencies).
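To pin down the objective, here is a minimal sketch that evaluates the max-min production of a given fractional allocation. The function name and the toy numbers are assumptions for illustration; it only scores an allocation and does not search for the best one.

```python
def production(hours, efficiency, max_hours):
    """Leontief (max-min) output of a fractional allocation of hours.

    hours[i][j] is the time employee i spends on task j, efficiency[i][j]
    is e_(i,j), and max_hours[i] is H_i. Returns None if some employee
    exceeds their limit.
    """
    for i, row in enumerate(hours):
        if sum(row) > max_hours[i]:
            return None
    n_tasks = len(efficiency[0])
    per_task = [
        sum(efficiency[i][j] * hours[i][j] for i in range(len(hours)))
        for j in range(n_tasks)
    ]
    return min(per_task)  # the firm produces as much as its weakest task


efficiency = [[2, 0],   # employee 0 cannot do task 1
              [1, 3]]
max_hours = [8, 8]
print(production([[8, 0], [0, 8]], efficiency, max_hours))
print(production([[8, 0], [2, 6]], efficiency, max_hours))
```

Moving two of the second employee's hours to the weaker task raises the minimum, which is the kind of move a greedy rebalancing heuristic would look for.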
As I can't use a linear solver inside my program, I am limited to finding a good greedy or FPTAS algorithm to solve this within an acceptable margin of error.
Thank you for reading. I would be grateful if you have any idea or general direction/keywords to guide me in my research.

Travelling Salesman with multiple salesmen with a limit on number of cities per salesman?

Problem: I need to drop (n) employees from office to their homes(co-ordinates available). I have (x) 7-seater & (y) 4-seater cabs available.
I have to design an algorithm to drop all the employees to their homes while travelling minimum distance.
Also, the algorithm must tell me how many 7-seater or/and 4-seater vehicles I must choose so as to travel minimum distance.
E.g., if I have 15 employees, then the algorithm may tell me to use 1 (7-seater) cab & 2 (4-seater) cabs & have the employees in each cab as follows:
[(E2, E4, E6, E8), (E1, E3, E5, E7, E9, E10, E12), (E11, E13, E14, E15)]
Approach: I'm thinking of this as a Travelling Salesman Problem with multiple salesmen and an upper limit on the number of cities each can visit. Also, the salesmen do not need to come back to the origin. Ant colony optimization came to mind, but I can't really decide which algorithm to choose.
Requirement: I really need the ALGORITHM. Either TSP or ant colony, doesn't matter. I'll welcome opinions, but I really need the ALGORITHM.
This is a cost minimization problem, not a travelling salesman problem. It is related to TSP in the sense that TSP is a very specific cost minimization problem.
The solution consists of three steps:
Generate a list of employee drop-off points (nodes)
Create distinct paths that do not intersect, nor branch. These will be your routes and help prevent wasteful route overlaps. Use cost(path) = distance(furthest node and origin) + taxi_cost(nodes) + sum(distance between nodes) to compare paths and/or brute-force all potential networks. Networks are layouts of paths. DO NOT BRANCH THE PATHS!!
Total distance is a line of defense against waste ensuring that routes are not too long.
Sum of distances helps the algorithm converge on neighbourhoods where many employees live (when possible).
Because this variation of the coin problem allows imperfect solutions, it reduces to a variant of the Knapsack Problem. The utility of each taxi is capacity. If you also wish to choose the cheapest way to transport your employees, utility(taxi) = capacity/cost. From this our simplest solution is to be greedy; who cares about empty space? If you really care about filling up taxis perfectly (as opposed to cost efficiently), you'll need a much more complex solution. You only specify the least distance as your metric (with each additional taxi multiplying cost). I assume this is a proxy to say 'I don't want to pay too much'.
Therefore: taxi_cost(nodes) = math.floor(amount(nodes)/max(utility(taxis))) + 1. This equation selects the cheapest, roomiest taxi, and figures out how many of them are required to fully service the route.
Be sure to calculate the cost of each network you examine as sum(cost(path))
Once you've found the cheapest network to service, for each path in the chosen network:
make a list of employees travelling to the furthest node
fill the preferred taxi with those employees
repeat with the next furthest node until you have a full taxi, then add the filled taxi to the list. If you run out of employees, you've finished assigning taxis to the route. (The benefit of furthest-first selection is that you can ask employees in unfilled taxis to walk if that part of the route is within blocks of the office).
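The furthest-first filling steps could be sketched as follows. This is only an illustration under my own assumptions: a single preferred taxi capacity per route, and hypothetical input layouts (`employees_by_node`, `distance_from_office`) that are not part of the original description.

```python
def fill_taxis(employees_by_node, distance_from_office, capacity):
    """Fill taxis of one fixed capacity, taking the furthest nodes first.

    employees_by_node maps a drop-off node to the employees living there;
    distance_from_office maps a node to its distance from the office.
    """
    taxis, current = [], []
    furthest_first = sorted(
        employees_by_node, key=distance_from_office.get, reverse=True
    )
    for node in furthest_first:
        for employee in employees_by_node[node]:
            current.append(employee)
            if len(current) == capacity:
                taxis.append(current)  # taxi is full, start a new one
                current = []
    if current:
        taxis.append(current)  # the last taxi may be partly empty
    return taxis


employees_by_node = {"A": ["E1", "E2"], "B": ["E3"], "C": ["E4", "E5", "E6"]}
distance_from_office = {"A": 2.0, "B": 9.0, "C": 5.0}
print(fill_taxis(employees_by_node, distance_from_office, capacity=4))
```

In this toy run the only partly empty taxi is the one serving node A, the stop closest to the office.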
The algorithm above is not perfect, but it will have many desirable tendencies.
routes will be as short as possible and cover the greatest possible area (by not looping or branching)
routes will tend to service neighbourhoods, rather than trying to overlap responsibilities. This part of the algorithm isn't optimal, but is effective. This makes it really easy to remove service routes without needing to recalculate the transportation network.
the taxis chosen will be cost-efficient, helping to avoid paying more than necessary.
routes will use as few taxis as possible, taking into account the relative cost of upgrading to roomier ones with higher capacity
because the taxis travelling furthest will be full, it has less of an impact on your employee's ability to get to work if you decide to cancel service to emptier taxis.
Every step closer to perfection costs you many times more than the previous step, so diminishing returns are acceptable if the solution provides desirable features. Although the algorithm makes some potentially sub-optimal tradeoffs, they come with huge value; your network of taxi routes becomes much easier to modify.
If you'd like to make an optimal solution, the Knapsack Problem, Coin Problem, and Change-making Problem help determine the cost of taxis and routes.
Spanning Trees are the most effective way to determine routes. Center the spanning tree at the office and calculate the cost of each branch as the maximum distance from the office. Try to keep each branch servicing areas with high density to make it easier to add and remove taxi routes.
Studying pathfinding can help you learn how to determine good cost functions so that you can numerically compare different potential paths. Remember that your network consists of a set of paths, but will require its own cost function so that you can compare different layouts.
I've written an in-depth guide to pathfinding for this answer. Pathfinding articles are few and just don't go into enough depth for a lot of problem spaces. A good cost function can get you a nearly perfect solution if you have multiple priorities. Unfortunately, good cost functions are domain specific so you will need to identify them yourself. Feel free to message me if you aren't sure how to make a path with certain traits and I'll help you figure out a good cost function.
It's a constraint satisfaction problem, not really a TSP. If you're looking for a project that may be able to help you, you could look into cspdb, which is what I wrote some time ago:
https://github.com/gtoonstra/cspdb
You'd be using a database in the backend that maintains the state, and you'd write a couple of scripts in its own grammar that manipulate that state. A couple of examples are included to solve nqueens and classroom scheduling with multiple constraints.
From a list d of destinations you can make the array of pairwise travel costs c, where c[a,b] is the travel cost from a to b...
Now you have a start point p. Add to the array c2 the values for p to each point in d.
So you now have the concept of groups.
You can look at this as a greedy algorithm. Given the list c2 you can take the cheapest option given your state.
Your state is the vector of all your cab vectors (the costs of getting from wherever they are to wherever they could go next) .* the assignment vector where k == 0 for k in the. You find the minimum option given your state (considering that adding another person to a cab of 4 costs the difference between the 4-person cab and the 7-person cab, and that adding a person to a zero-person cab, or adding a new cab, also has a cost). Once all your people have been assigned to their cabs, you have an answer.
The idea of a greedy algorithm is most often characterized by the backpack problem, but it can also be implemented for statistical methods such as feature selection.
Like @Aaron3468's approach, this one is not perfect and does not guarantee the best solution.
If you want the best possible solutions you can iterate through all the combinations but this becomes impractical quickly.
From my point of view your algorithm should solve 2 problems: the number of cars of each type and the shortest distance (how you number your employees depends on you, or you should give more details). Sorry, I'm using a phone and I don't have all the features of the site.
For the number of cars you can use the algorithm below. To solve issues related to distances, you should give more info about the paths and their lengths. A graph algorithm may then be combined with this to do the trick. Here 11=7+4.
Begin
  Integer temp := n/11
  Integer rem := n mod 11
  If rem = 0
    x := temp
    y := temp
  Else if rem <= 4
    x := temp
    y := temp+1
  Else if rem <= 7
    x := temp+1
    y := temp
  Else
    x := temp+1
    y := temp+1
  Endif
End
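For reference, a direct Python transcription of the pseudocode above (the function name is mine). Like the pseudocode, it only guarantees enough seats; it ignores how many cabs of each type are actually available and says nothing about distance.

```python
def cab_counts(n):
    """Return (x, y): the number of 7-seater and 4-seater cabs for n employees."""
    temp, rem = divmod(n, 11)  # 11 = 7 + 4, one cab of each type
    if rem == 0:
        return temp, temp
    if rem <= 4:
        return temp, temp + 1  # one extra 4-seater covers the remainder
    if rem <= 7:
        return temp + 1, temp  # one extra 7-seater covers the remainder
    return temp + 1, temp + 1


print(cab_counts(15))
```

For 15 employees this gives one 7-seater and two 4-seaters, the same split as in the question's example.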

Algorithm for maximizing happiness

Imagine you have:
100 people
100 projects
Each person ranks all 100 projects in the order in which they would like to work on them. What kind of algorithm can be used to maximize the happiness of the people (i.e. being assigned to a project they ranked higher translates to greater happiness)?
Assume one project per person.
The algorithm for this kind of problem is very popular and is known as the Hungarian algorithm. A similar problem solved with this algorithm:
We consider an example where four jobs (J1, J2, J3, and J4) need to be
executed by four workers (W1, W2, W3, and W4), one job per worker. The
matrix below shows the cost of assigning a certain worker to a certain
job. The objective is to minimize the total cost of the assignment.
Source: http://www.hungarianalgorithm.com/examplehungarianalgorithm.php
Please note that the default Hungarian algorithm finds the minimum cost, but you can alter the program to make it maximize the cost instead.
If the goal is to find the assignment that yields the maximum cost,
the problem can be altered to fit the setting by replacing each cost
with the maximum cost subtracted by the cost.
Source: http://en.wikipedia.org/wiki/Hungarian_algorithm
I've already implemented the Hungarian algorithm on my Github,
so feel free to use it and modify it to make it work as maximizing the cost.
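A small sketch of the max-to-min conversion described in the quote. Note that this is not the Hungarian algorithm: to keep the example self-contained it finds the best assignment by brute force over all permutations, which only works for tiny inputs. The happiness scores (higher means the person ranked the project higher) and the function names are made up for illustration.

```python
from itertools import permutations


def to_cost(happiness):
    """Replace each score by (max score - score), turning max into min."""
    top = max(max(row) for row in happiness)
    return [[top - h for h in row] for row in happiness]


def brute_force_min(cost):
    """Cheapest one-to-one assignment; perm[i] is the project of person i."""
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


# happiness[i][j]: how much person i would like project j.
happiness = [[3, 2, 1],
             [3, 1, 2],
             [2, 3, 1]]
best = brute_force_min(to_cost(happiness))
total = sum(happiness[i][best[i]] for i in range(len(best)))
print(best, total)
```

With 100 people the brute force is hopeless, but the same `to_cost` matrix can be handed to any minimum-cost Hungarian implementation.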

Algorithm for sorting people into rooms based on age and nationality

I’m working on a program for the English language school I work for. I’m not being paid; it’s just a kind of hobby to improve / automate my workflow.
It’s a residential school, and one aspect I’m looking at automating is the way we allocate rooms to students. Although I don’t want a full-blown solution, I was hoping someone could point me in the right direction… suggestions of the way you might approach this, algorithms to look at, etc.
Basically at the school we have a whole bunch of different rooms ranging from singles to dormitories for 8 people. We get lots of different nationalities from all over the world, and we always try to make sure each room has a mix of nationalities. Where there is more than one nationality we try to balance them. Age is also important: we always put students of a similar age together, while still trying to mix nationalities, and it’s unusual for us to have students sharing with more than two years between them.
I suppose, more generically speaking, I am interested in how to sort a given set of students based on two parameters to an optimal result with a few rules attached.
I hope I’ve explained clearly what I am trying to achieve… in a way it sounds really simple, but I’ve been trying to think how to do it in a simple way, i.e. by sorting by nationality and then by age, but it just doesn’t cut it and I know there must be a better way of approaching this. When I do it “by hand” on an Excel sheet it does feel quite intuitive.
Thank you to anyone who offers help / advice.
This is an interesting question but it's not easy to answer. Somehow it's connected with subdivision and bin packing or the cutting-stock problem. You may want to look at topological sort too. You can also look at Drools, a business logic platform that lets you define such rules.
First of all you might find this interesting: Stable Room-mates Problem (wikipedia). Unfortunately it does not answer your question.
Try a genetic algorithm.
There are three main criteria for using a genetic algorithm:
ability to represent a solution as a mutable array. We can have an array of integers such that a[i] is the room for the ith student.
mutation of the state should produce predictable results. In our case this is true. Mutating the array will predictably shuffle students between the rooms.
easy to write a fast fitness function. Shouldn't be too hard to write an O(n) fitness function.
This is an interesting problem. I'll try writing some code with this approach and we'll see what happens.
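As a starting point, here is one possible O(n) fitness function for that array representation. The scoring rule (one point per distinct nationality in a room, minus the room's head count if its age spread exceeds two years) is purely my assumption, and room capacities are not checked here.

```python
from collections import defaultdict


def fitness(rooms, ages, nationalities, max_age_gap=2):
    """Score an assignment where rooms[i] is the room of student i.

    Higher is better. Runs in one pass over the students plus one pass
    over the rooms.
    """
    youngest, oldest = {}, {}
    seen = defaultdict(set)
    size = defaultdict(int)
    for i, room in enumerate(rooms):
        youngest[room] = min(youngest.get(room, ages[i]), ages[i])
        oldest[room] = max(oldest.get(room, ages[i]), ages[i])
        seen[room].add(nationalities[i])
        size[room] += 1
    score = 0
    for room in size:
        score += len(seen[room])  # reward distinct nationalities
        if oldest[room] - youngest[room] > max_age_gap:
            score -= size[room]  # penalise rooms with too wide an age spread
    return score


ages = [12, 13, 16, 17]
nationalities = ["FR", "IT", "FR", "IT"]
print(fitness([0, 0, 1, 1], ages, nationalities))  # mixed, similar ages
print(fitness([0, 1, 0, 1], ages, nationalities))  # same nationality, wide gap
```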
How about you think of a room as something that repels students of a nationality it already has, and attracts students of a close age to what it already has? The closer the age to the average age, the more it attracts the student, and the more guys of X nationality are in the room, the more it repels guys of X nationality.
Then you would, for every new student to be added, iterate through each room and see which is the one that attracts it more. I guess if the room is empty you can set all forces to 0. Also, you would have a couple of constants that multiply each of both "forces" so you can calibrate it depending on how important it is to have the same age against how important it is to have different nationalities.
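A rough sketch of that idea. The exact force formulas (attraction as 1 / (1 + age distance to the room's mean), repulsion as the count of same-nationality occupants) and the two weights are my own guesses at a reasonable calibration, and the code assumes there are enough beds for everyone.

```python
def room_pull(room, student, age_weight=1.0, nationality_weight=1.0):
    """Net force a room exerts on a student; empty rooms exert none."""
    if not room:
        return 0.0
    age, nationality = student
    mean_age = sum(a for a, _ in room) / len(room)
    same = sum(1 for _, nat in room if nat == nationality)
    attraction = age_weight / (1.0 + abs(age - mean_age))
    repulsion = nationality_weight * same
    return attraction - repulsion


def assign(students, capacities, **weights):
    """Place each (age, nationality) student in the room that pulls hardest."""
    rooms = [[] for _ in capacities]
    for student in students:
        open_rooms = [r for r in range(len(rooms)) if len(rooms[r]) < capacities[r]]
        best = max(open_rooms, key=lambda r: room_pull(rooms[r], student, **weights))
        rooms[best].append(student)
    return rooms


students = [(12, "FR"), (16, "FR"), (12, "IT"), (16, "IT")]
print(assign(students, capacities=[2, 2]))
```

Because students are placed one at a time, the result depends on the input order; shuffling and keeping the best of several runs is a cheap improvement.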
I'd analyze each student and create a 'personality' vector based on his/her age & nationality. Then I'd sort the vectors, and maybe scramble the results a bit after sorting to encourage diversity.
The general theme of "assign x to y with respect to constraints while optimizing some quantity" falls within operations research or more specifically http://en.wikipedia.org/wiki/Mathematical_optimization. The usual approach is to formally specify the problem and use a generic optimization solver such as one of those listed in http://en.wikipedia.org/wiki/List_of_optimization_software.
Give it a try; the formal specification languages for using the existing solvers are rather easy to learn and you might get an optimal solution without having to debug a complicated algorithm.
Formulation as a General Optimization Problem
It will be useful to formalize constraints and parameters. Let us assume that for 1 <= i <= 8, we have n_i rooms available of size i. Now let us impose the hard constraint that in a particular room S, for every two students a, b \in S, we have that:
|Grade(a) - Grade(b)| <= 2 (1)
Now we are interested in optimizing the "diversity" function which intuitively represents the idea that we want rooms to be as mixed as possible. So we can represent this goal as:
max over all arrangements {{ Sum over all rooms S of DiversityScore(S) }}
where we have DiversityScore(S) = # of Different Nationalities in the Room
Formulation as a Graph Problem
This is the most general setting, but clearly max over all arrangements is not computationally feasible. Now let us pose this as a sort of graph problem with the hard grade constraints. Represent each student as a vertex in a graph G. Connect two vertices if the students satisfy constraint (1). Now a clique in this graph represents a group of students that can all be placed in the same room. Now proceed in a greedy manner. Choose the largest clique of size 4 which has the largest Diversity Score. Then place them in a room and continue until all rooms are filled. This clique search method can also incorporate gender constraints, which is useful; however, note that clique finding is an NP-hard problem.
Now before trying to come up with something that may be faster, let us think about how to weaken the hard constraint (1). We can massage our graph formulation by including edge weights in the picture. So if the hard constraint is satisfied, denote the edge weight from i to j as 1. If two students i and j deviate by age more than 2, denote the edge weight as 1 / (Age Difference)^2 or something. Then the score of a clique should be a product of the clique's edge weights with some diversity score. However, it becomes clear that now the problem is on a complete graph, which is just the general optimization we hoped to avoid, so we need to impose some hard restrictions to reduce the connectivity of our graph.
A Basic Sorting Approximation Algorithm
Sort all students by their age, so we have a sorted array where all students in a[i] have the same age, and all students in a[i] are older than all students in a[j] for all j < i.
Now consider each pair i, j, of which there are O(n^2), where we also have that |Age[i] - Age[j]| <= 2. Find the largest group of students with different nationalities and place them in a room together. We successively iterate over O(n^2) index pairs which satisfy the hard constraint and take any students with nationality difference (which we can find by preprocessing and hashing on the index pairs). Doing this carefully (like looking at indices i j which are spread apart before close together) improves running time further. It feels like it should be polytime, but I think there are certain subtleties to address first before saying so.

Algorithm to transform a workflow DAG into parallel resource allocation?

Say I have a graph where nodes are workloads of various kinds and edges are dependencies between the workloads. (This is a DAG since cyclical dependencies must not exist.)
I also have a set of multiple agents who can perform the work.
Some workload varieties may be given to any agent, others must be given to a specific agent, and others must be given to one agent among a particular group of agents.
How do I assign workloads such that:
No workload is given to an agent until all its blocking workloads are completed
The shortest possible time is required to complete the total workload graph. (Note that minimizing agent idle time is generally good, but not a fundamental requirement - there may be scenarios under which one particular agent idles for longer but the total time to complete all jobs across all agents is at a minimum.)
Workloads have duration estimates, but assume for simplicity's sake that every workload takes equal time to compute. (Just break each workload down into multiple, serially-dependent workloads until every workload is effectively a constant-time operation.)
I'm aware of topological DAG sorting, but that produces a single, serial ordering of nodes. I have multiple agents operating in parallel, and the relationships are such that potentially large timing optimizations can be made by non-obvious reordering of the tasks.
The result of this would be rendered best as a Gantt chart of minimum overall duration. In fact, if you think of the problem as the allocation of bug tickets in a milestone to engineers in a team, with the goal of getting the milestone done ASAP, then you get the idea. (No... please don't tell me to import my graph into MS Project and then export it :) - I'm interested in the algorithm behind it!)
Pointers to well known algorithms, software libraries, or general issues and principles are much appreciated!
Unless you have an infinite number of agents, so that a compatible agent is available as soon as all the predecessors of a task are done, this is an NP-hard problem.
< shameless plug >
A very similar problem is there in my book "Algorithms For Interviews"
< /shameless plug >
Here is the problem and the solution from the book:
We need to schedule N lectures in M classrooms. Some of those lectures are prerequisites for others. How would you choose when and where to hold the lectures in order to finish all the lectures as soon as possible?
Solution:
We are given a set of N unit duration lectures and M classrooms. The lectures can be held simultaneously as long as no two lectures need to happen in the same classroom at the same time and all the precedence constraints are met.
The problem of scheduling these lectures so as to minimize the time taken to completion is known to be NP-complete.
This problem is naturally modeled using graphs. We model lectures as vertices, with an edge from vertex u to vertex v if u is a prerequisite for v. Clearly, the graph must be acyclic for the precedence constraints to be satisfied.
If there is just one lecture room, we can simply hold the lectures in topological order and complete the N lectures in N time (assuming each lecture is of unit duration).
We can develop heuristics by observing the following: at any time, there is a set of lectures whose precedence constraints have been satisfied. If this set is smaller than M, we can schedule all of them; otherwise, we need to select a subset to schedule.
The subset selection can be based on several metrics:
Rank order lectures based on the length of the longest dependency chain that they are at the start of.
Rank order lectures based on the number of lectures that they are immediate prerequisites for.
Rank order lectures based on the total number of lectures that they are direct or indirect prerequisites for.
We can also use combinations of these criteria to order the lectures that are currently schedulable.
For example, for each vertex, we define its criticality to be the length of a longest path from it to a sink. We schedule lectures by processing vertices in topological order. At any point in our algorithm, we have a set of candidate lectures: these are the lectures whose prerequisites have already been scheduled.
If the candidate set is less than size M, we schedule all the lectures; otherwise, we choose the M most critical lectures and schedule those. The idea is that they should be scheduled sooner since they are at the start of longer dependency chains.
The criterion is heuristic and may not lead to optimum schedules; this is to be expected since the problem is NP-complete. Other heuristics may be employed, e.g., we may use the number of lectures that depend on lecture L as the criticality of lecture L, or some combination of the criteria.
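The criticality heuristic described above could be sketched like this. It is an illustration, not the book's code: the function name and input layout are assumptions, every lecture must appear as a key of `prereqs`, the graph must be acyclic, and any lecture may use any classroom (so the agent restrictions from the question are not modelled).

```python
def schedule(prereqs, rooms):
    """Greedy list scheduling of unit-length lectures into `rooms` classrooms.

    prereqs maps each lecture to the lectures that must be held before it.
    Returns a list of time slots, each a list of lectures held in parallel.
    """
    dependents = {v: [] for v in prereqs}
    for v, before in prereqs.items():
        for u in before:
            dependents[u].append(v)

    # Criticality: length (in edges) of a longest path from v to a sink.
    criticality = {}

    def longest_path(v):
        if v not in criticality:
            criticality[v] = max(
                (1 + longest_path(d) for d in dependents[v]), default=0
            )
        return criticality[v]

    for v in prereqs:
        longest_path(v)

    remaining = {v: len(before) for v, before in prereqs.items()}
    candidates = [v for v, count in remaining.items() if count == 0]
    slots = []
    while candidates:
        # Most critical first; ties keep their current order.
        candidates.sort(key=lambda v: criticality[v], reverse=True)
        slot, candidates = candidates[:rooms], candidates[rooms:]
        slots.append(slot)
        for u in slot:
            for v in dependents[u]:
                remaining[v] -= 1
                if remaining[v] == 0:
                    candidates.append(v)  # all prerequisites are scheduled
    return slots


prereqs = {"A": [], "B": [], "C": ["A"], "D": ["C"], "E": ["B"], "F": []}
print(schedule(prereqs, rooms=2))
```

Each row of the result is one column of the Gantt chart: lectures that run in parallel during the same time slot.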
The Wikipedia article on PERT might be a useful place to start.
