Knapsack problem with dependent selection

As in the classic knapsack problem, we want to maximize the total value without letting the total weight exceed the capacity, and the items' values and weights are independent of each other. However, some items can only be selected if certain other items are selected as well.
For example: there are item_1, item_2, ..., item_n. If you want to select item_1, you have to select item_3 and item_5, and if you want to select item_3, you have to select item_2, item_7, item_9, and so on.
The dependencies are arbitrary; that is, if we draw the dependency graph, it is just a general directed graph.
At first I looked at the "precedence constrained knapsack problem" and the "partially ordered knapsack problem", but in my problem the dependency relation is not antisymmetric (that is, the dependency graph may contain cycles).
The closest problem I found is the "set-union knapsack problem":
Given a set of items, select the subset with largest total value, subject to the constraint that the total weight of the items selected does not exceed a fixed capacity. The total value of a set of items is the sum of the individual values and the total weight is the sum of the individual weights. In the Set-Union Knapsack Problem, the items each have a value, but instead of a weight, each corresponds to a set of elements. Each element has a weight. The total value of a set of items is sum of the individual values, but the total weight is the sum of the weights of the elements in the union of the corresponding sets.
But it only takes the union of the "weights"; the value of some items may still be counted several times.
Is there any way to efficiently solve this problem?
EDITED:
I found an approach that lets me leverage existing approximation algorithms:
step 1. Build the directed dependency graph.
step 2. Transform this graph into its component graph (use DFS to find the strongly connected components) to remove the cycles.
step 3. The result is a "precedence constrained knapsack problem" or "partially ordered knapsack problem". These are strongly NP-complete, but many papers discuss them, so an approximation algorithm can be taken from the literature.
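Steps 1 and 2 above can be sketched in Python as follows. This is only an illustration under my own assumptions: items are numbered 0..n-1, `requires[i]` lists the items that must accompany item i, and the function name `condense` is hypothetical. It uses Kosaraju's two-pass DFS to find the strongly connected components and merges each one into a single super-item whose weight and value are the sums over its members, since items on a common cycle can only be selected together.

```python
from collections import defaultdict

def condense(n, requires, weight, value):
    """Collapse each strongly connected component into one super-item.

    requires[i] lists the items that must be selected whenever item i is.
    Returns (comp_of, comp_weight, comp_value, comp_requires).
    """
    # Kosaraju, pass 1: record DFS finishing order (iterative DFS).
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(requires[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(requires[nxt])))
                    break
            else:
                # All successors explored: node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finishing order.
    reverse = defaultdict(list)
    for u in range(n):
        for v in requires[u]:
            reverse[v].append(u)
    comp_of, count = [-1] * n, 0
    for start in reversed(order):
        if comp_of[start] != -1:
            continue
        comp_of[start] = count
        stack = [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if comp_of[prev] == -1:
                    comp_of[prev] = count
                    stack.append(prev)
        count += 1

    # Merge weights and values; keep only edges between different components.
    comp_weight, comp_value = [0] * count, [0] * count
    comp_requires = [set() for _ in range(count)]
    for u in range(n):
        comp_weight[comp_of[u]] += weight[u]
        comp_value[comp_of[u]] += value[u]
        for v in requires[u]:
            if comp_of[u] != comp_of[v]:
                comp_requires[comp_of[u]].add(comp_of[v])
    return comp_of, comp_weight, comp_value, comp_requires
```

The returned `comp_requires` is acyclic, so the super-items form the partial order that a precedence constrained knapsack algorithm expects as input.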

Before selecting an item, you have to check whether the item will create a cycle; if it does, discard it and move on to the next item. For that you can use Kruskal's algorithm.

Related

Optimization problem in connected graphs with profits

I am trying to develop an algorithm for a problem that I am not able to classify. Here is the setup:
You have a map divided into sections that have a certain area and where a certain number of people live.
The problem consists of finding a set of connected sections whose total area does not exceed a certain value while maximizing the number of selected inhabitants.
For now I can think of two approaches:
1. Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive integer weights, discarding the solutions that violate the maximum-area constraint. For this you could use the Floyd-Warshall algorithm, Dijkstra from every vertex, or Thorup's algorithm (which can be done in time V * E, where V and E are the numbers of vertices and edges of the graph).
2. Treat it as an open vehicle routing problem with profits (OVRPP), where each vehicle can start and end wherever it wants.
Another approach:
Also, depending on the combinatorics of the particular instance, it is possible in certain cases to use genetic algorithms together with tabu search, but this only makes sense when finding an optimal solution is infeasible.
To be clearer: the goal is to obtain a selection of connected sections whose total area does not exceed a given maximum. The quantity to maximize is the sum of the populations of the selected sections. The objective is to find an optimal solution.
For example, this is the optimal selection with max area of 6 (red color area)
Thank you all in advance!
One pragmatic approach would be to formulate this as an instance of integer linear programming and use an off-the-shelf ILP solver. One way to formulate this as an ILP instance is to build a graph with one vertex per section and an edge between each pair of adjacent sections; then, selecting a connected component in that graph is equivalent to selecting a spanning tree for that component.
So, let x_v be a set of zero-or-one variables, one for each vertex v, and let y_{u,v} be another set of zero-or-one variables, one per edge (u,v). The intended meaning is that x_v=1 means that v is one of the selected sections; and that y_{u,v}=1 if and only if x_u=x_v=1, which can be enforced by y_{u,v} >= x_u + x_v - 1, y_{u,v} <= x_u, y_{u,v} <= x_v. Also add a constraint that the number of y's that are 1 is one less than the number of x's that are 1 (so that the y's form a tree): sum_v x_v = 1 + sum_{u,v} y_{u,v}. Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_v <= maxarea, where A_v is the area of section v.
Then your goal is to maximize sum_v P_v x_v, where P_v is the population of section v. Then the solution to this integer linear programming problem will give the optimal solution to your problem.
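For tiny maps it can help to have an exact baseline against which the output of an ILP model can be checked. The sketch below is not the ILP itself; it is a plain brute-force enumeration in Python, and the names `area`, `population`, `adjacent` and `best_connected_selection` are my own assumptions. It tries every subset of sections, keeps those that are connected and within the area limit, and returns the one with the largest population. It is exponential in the number of sections, so it is only useful for validation.

```python
from itertools import combinations

def is_connected(sections, adjacent):
    """Return True if `sections` induces a connected subgraph."""
    sections = set(sections)
    start = next(iter(sections))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v in sections and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == sections

def best_connected_selection(area, population, adjacent, max_area):
    """Exhaustively search for the best connected set within max_area."""
    best, best_pop = (), 0
    nodes = list(area)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if sum(area[v] for v in subset) > max_area:
                continue
            if not is_connected(subset, adjacent):
                continue
            pop = sum(population[v] for v in subset)
            if pop > best_pop:
                best, best_pop = subset, pop
    return set(best), best_pop

# Hypothetical example: four sections in a row, a - b - c - d.
area = {"a": 2, "b": 3, "c": 1, "d": 4}
population = {"a": 5, "b": 1, "c": 7, "d": 8}
adjacent = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(best_connected_selection(area, population, adjacent, max_area=6))
```

Comparing the solver's answer with this baseline on a handful of small maps, including maps where the best region contains a cycle of adjacent sections, is a cheap way to confirm that the connectivity constraints behave as intended.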

Maximum Diversity: translate a heuristic algorithm into C (or pseudocode)

I have a set of N items and I know their pairwise distances. Every item has a cost, and I have a budget. I need to accomplish the following task: suppose I put an item in the basket. The next item added is the one at maximum distance from the first (subject to the budget constraint); the third item is the one whose sum of distances from item 1 and item 2 is maximum (again subject to the budget); the fourth item is the one whose sum of distances from items 1, 2 and 3 is maximum (always within the budget), and so on. How do I find the subset whose total distance (computed as above) is maximum? Do you know any algorithm to solve this problem? Thanks in advance.
UPDATE: I've done some research, and this problem is called the Maximum Diversity Problem. I can't translate the heuristic algorithm stated above (which would solve the problem) into C or pseudocode!
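A minimal sketch of the greedy heuristic described above, written in Python rather than C because it reads close to pseudocode. The names `dist`, `cost`, `budget`, `greedy_diverse_subset` and `best_greedy` are assumptions of mine. Note that this is a heuristic: it builds one basket per starting item and keeps the best, which does not guarantee the optimal subset.

```python
def greedy_diverse_subset(dist, cost, budget, first):
    """Grow a basket from `first`, always adding the affordable item whose
    summed distance to the items already chosen is largest.
    Assumes cost[first] <= budget."""
    chosen = [first]
    spent = cost[first]
    total = 0
    remaining = set(range(len(cost))) - {first}
    while True:
        affordable = [j for j in remaining if spent + cost[j] <= budget]
        if not affordable:
            break
        # Gain of candidate j = sum of its distances to everything chosen so far.
        best = max(affordable, key=lambda j: sum(dist[i][j] for i in chosen))
        total += sum(dist[i][best] for i in chosen)
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)
    return chosen, total

def best_greedy(dist, cost, budget):
    """Run the greedy construction from every affordable start item."""
    starts = [i for i in range(len(cost)) if cost[i] <= budget]
    return max((greedy_diverse_subset(dist, cost, budget, s) for s in starts),
               key=lambda result: result[1], default=([], 0))

# Hypothetical data: symmetric distance matrix, unit costs, budget for 3 items.
dist = [[0, 1, 5, 2],
        [1, 0, 3, 4],
        [5, 3, 0, 6],
        [2, 4, 6, 0]]
cost = [1, 1, 1, 1]
print(greedy_diverse_subset(dist, cost, budget=3, first=0))
```

The same loop structure carries over to C directly: one array for the chosen flags, one running sum of distances per candidate, and a scan for the maximum on each iteration.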
This is an interesting question. If I understand correctly, you are trying to find a path of maximum distance subject to a budget.
Let us model the items as a connected graph so that we can use tools from graph theory. The edges are the costs and the vertices (nodes) are the items themselves. Essentially, it seems you want to find a maximum path under a budget constraint, so something like a reverse Dijkstra algorithm.
Steps:
1. Select a starting vertex.
2. Evaluate the distances from the starting point.
3. Select the vertex with maximum distance; if it is above your budget, go to the next one, burning the edge that was above your budget.
4. Calculate the distance between the newly added item and the others as the sum of the path to get to the item plus the cost of choosing the other item (i.e. in the first iteration, say we got item 1 and then went to item 2; then the distance between item 2 and item x would be item 1 + item 2 + item x).
5. Select the maximum again; if it is above budget, go to the next maximum, burning the edge to the maximum that would be above your budget.
6. Repeat until the budget is exhausted.
Hope this helps; feel free to ask for clarification if anything is unclear. I suggest some background reading on graph theory and the associated algorithms.

Maximum weight matching (MWM) for a predefined number of nodes

I am given a weighted graph and want to find a set of edges such that every node is only incident to one edge and such that the sum of the selected edge weights is maximized. As far as I know this problem is generally referred to as maximum weight matching and there exist fast approximations for it: https://web.eecs.umich.edu/~pettie/papers/ApproxMWM-JACM.pdf
However, for my application it would be better if only a certain ratio of nodes is paired. It's more important for my application that the nodes that get paired have a high weight between them. Leaving some nodes unpaired is no big problem.
Currently, I sort the weights between nodes in descending order and always select the edge with the highest weight until I have paired a certain number of nodes. Of course I ensure that pairs of nodes are mutually exclusive. This is only a 1/2 approximation to the original problem and it's probably even worse for the modified problem.
Could you please suggest an algorithm for this issue or tell me what this problem is called?
Some thoughts.
The greedy algorithm for this problem is a 2-approximation. Imagine that, during the execution of the greedy algorithm, we keep score versus an optimal solution in the following manner. Every time we add an edge to the greedy solution, we delete the incident edges in the optimal solution, which I claim must have weight no greater than the greedy edge. If no edge would be deleted from the optimal solution, we delete its two heaviest remaining edges instead, which I also claim must have weight no greater than the greedy edge. Since the greedy solution must have at least half as many edges as the optimal solution, I claim that there are no optimal edges left at the end, and hence greedy is a 2-approximation because we never deleted more than 2x the weight of each greedy edge.
A complete 20,000-vertex graph is right on that line where I don't know whether integer programming would be a good idea. I think it's still worth a try because it's easy enough.
There are polynomial-time algorithms for computing the maximum-weight independent set in the intersection of two matroids (for this problem, the matching matroid and the uniform matroid whose bases have size equal to the size of the desired matching). I don't know if they would be practical.
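For reference, the greedy rule discussed above (take the heaviest remaining edge whose endpoints are both still free, and stop after a fixed number of pairs) fits in a few lines of Python. The function name `greedy_k_matching` and the `(weight, u, v)` edge format are assumptions made for this sketch.

```python
def greedy_k_matching(edges, k):
    """Pick up to k vertex-disjoint edges, heaviest first.

    edges: iterable of (weight, u, v) tuples.
    Returns a list of (u, v, weight) in the order they were chosen.
    """
    matched, pairs = set(), []
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if len(pairs) == k:
            break
        # Skip self-loops and edges touching an already paired node.
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            pairs.append((u, v, w))
    return pairs

edges = [(10, "a", "b"), (9, "b", "c"), (8, "c", "d"), (1, "a", "d")]
print(greedy_k_matching(edges, 2))
```

Sorting dominates the running time, so on a complete graph with n vertices this costs O(n^2 log n), which makes it a convenient baseline to compare against an integer programming or matroid intersection solution.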

How to select k nodes in a fully connected graph with max separation between any pair of nodes?

Suppose I have a fully connected graph of N nodes, and I know the weight of the edge between any pair of nodes. How do I select k nodes such that I maximize the minimum distance between any pair of selected nodes?
I mapped this problem as a more general case of the one I actually want to solve, which I've dubbed the cheating students problem (I don't know if it has an actual name).
Cheating Students problem:
Given an N x M matrix, how do I select k cells with maximum distance between any pair of cells? You can think of the matrix as a classroom where k cheating students are taking a test. No pair of students should be close to each other, and thus we want to maximize the minimum distance between any pair.
Your generalized graph problem appears to be very closely related to the maximum independent set problem described in https://en.wikipedia.org/wiki/Independent_set_%28graph_theory%29, which is NP-complete. I can find a maximum independent set by running a binary search to find the largest k for which an algorithm solving your graph problem returns a minimum distance greater than 1. Since finding a maximum independent set is hard, I think your generalized problem is hard.
I don't see an easy way to solve the matrix problem, either, but the related problem of packing circles as efficiently as possible on a 2-d surface of infinite size has been solved, and the answer is what is called a hexagonal packing (https://en.wikipedia.org/wiki/Circle_packing) which confusingly is based on a triangular tiling (https://en.wikipedia.org/wiki/Triangular_tiling - "The vertices of the triangular tiling are the centers of the densest possible circle packing").
So for finite matrices and numbers of students it is possible that arranging the students in widely separated rows, with the rows staggered so that each student is centered between the pair of students nearest them in the row in front of them and behind them, is not too far from optimal - or at least a good place from which to start some sort of hill-climbing attempt.
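For very small classrooms the exact answer can be found by brute force, which gives a reference point for judging a staggered-row layout or a hill-climbing result. The sketch below is an assumption-laden illustration: it uses Euclidean distance between cell coordinates, requires k >= 2, and the name `best_seating` is hypothetical. It enumerates every choice of k cells, so it is only practical for tiny grids.

```python
from itertools import combinations
from math import dist

def best_seating(rows, cols, k):
    """Return (cells, gap): k cells maximizing the minimum pairwise
    Euclidean distance, found by exhaustive search. Requires k >= 2."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best_cells, best_gap = None, -1.0
    for seats in combinations(cells, k):
        gap = min(dist(a, b) for a, b in combinations(seats, 2))
        if gap > best_gap:
            best_cells, best_gap = seats, gap
    return best_cells, best_gap

print(best_seating(3, 3, 4))
```

In a 3 x 3 room with four students this places one student in each corner, with a minimum distance of 2.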

How to find subpackages that don't cause cyclic dependencies

We have a rule that disallows cyclic dependencies between packages.
We also have a rather huge package that needs some splitting.
The question is: how can I identify a subset (or all subsets, or a maximum subset) of classes that can be extracted from the package into a new package without creating a cyclic dependency?
Is there a well-known algorithm for that?
A variation in which one can define a maximum number of dependencies that the algorithm may ignore would be awesome.
Obviously, the subset(s) should be neither identical to the package nor empty.
In the case of a maximum subset, it should be smaller than one half of the original package.
Basically, your classes, objects, or what have you, are stored in a matrix (called adjacency matrix) that represents a directed graph (with or without cycles). See the graph below and the corresponding adjacency matrix.
From this, we can calculate the reachability matrix, which describes which nodes can be reached from each node. For this graph, the reachability matrix is
You need an algorithm that rearranges the rows and the columns of the matrix, so that all non-zero elements are below the main diagonal. A sequence of object indexes for which this is true can be executed in the order in which they appear in the matrix, and all necessary dependencies for each object would be satisfied. If the graph is known to be acyclic, this can be achieved by topological sorting.
When cycles appear in the directed graph, you won't be able to find an ordering for which this is true.
Enter the Design/Dependency Structure Matrix (DSM). A so-called partitioning algorithm can be implemented to divide the objects into levels. Within each of those levels, the objects can be executed in arbitrary order and do not depend on one another. For the graph above, nodes 3, 4 and 5 are not dependent on each other and can be executed in any order.
A partitioning algorithm has been developed in (Warfield 1973), which is able to detect and isolate cycles in the DSM. This is similar to the topological sorting algorithm, but with usage of the reachability matrix to detect and isolate cycles.
The algorithm, briefly:
1. Create a new partition level.
2. Calculate the reachability and antecedent sets R(s) and A(s).
3. For each element s in the DSM, calculate the set intersection R(s) ∩ A(s).
4. If R(s) ∩ A(s) = R(s), then add the element s to the current level.
5. Remove element s from the list, and remove all references to it from the reachability and antecedent sets of all other elements.
6. Repeat from step 1 if the item list is not empty.
The antecedent set A(s) is the set of row indices of non-zero elements in column s, while the reachability set R(s) is the set of column indices of the non-zero elements of s.
Finally, some pseudocode (in VB.NET, no less):
CalculateInitialAntecedentSets()
CalculateInitialReachabilitySets()
While UnlabelledItems > 0
    Sequence.AddNewPartitionLevel()
    For Each s In ReachabilityMatrix
        ' s joins the current level if it has no open dependencies and has not been placed yet
        If NoDependencies(s) And Not AlreadyConsidered(s) Then
            AddToLevel(CurrentLevel, s)
        End If
    Next
    ' Drop the newly placed items from the sets of all remaining items
    RemoveDependencies(ReachabilitySets, Sequence.Level(CurrentLevel))
    RemoveDependencies(AntecedentSets, Sequence.Level(CurrentLevel))
    UpdateConsideredList(Sequence.Level(CurrentLevel))
    UnlabelledItems = UnlabelledItems - Sequence.Level(CurrentLevel).Count
    CurrentLevel = CurrentLevel + 1
End While
(This was the topic of my Master thesis some years ago)
Warfield, John N. (1973), "Binary matrices in system modelling", IEEE Transactions on Systems, Man, and Cybernetics SMC-3(5), 441-449.
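A runnable sketch of the level partitioning outlined above, in Python. It is my own reading of the steps, not code from the cited paper: the function name `partition_levels` is hypothetical, `adjacency[i][j] != 0` is assumed to mean an edge from node i to node j, and R(s) is taken to include s itself. Nodes that lie on a common cycle satisfy the level test together, so they end up in the same level.

```python
def partition_levels(adjacency):
    """Split the nodes of a directed graph into levels.

    A node s joins the current level when R(s) & A(s) == R(s), i.e.
    everything still reachable from s can also reach s.
    """
    n = len(adjacency)
    # Reachability sets R(s), including s itself, via Warshall's closure.
    reach = [{j for j in range(n) if adjacency[i][j] or i == j}
             for i in range(n)]
    for k in range(n):
        for i in range(n):
            if k in reach[i]:
                reach[i] |= reach[k]
    # Antecedent sets A(s): every node that can reach s.
    ante = [{i for i in range(n) if s in reach[i]} for s in range(n)]

    levels, remaining = [], set(range(n))
    while remaining:
        level = {s for s in remaining if (reach[s] & ante[s]) == reach[s]}
        levels.append(sorted(level))
        remaining -= level
        # Remove the placed nodes from the sets of all remaining nodes.
        for s in remaining:
            reach[s] -= level
            ante[s] -= level
    return levels

# Hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 1 (a cycle), 3 -> 0.
adjacency = [[0, 1, 0, 0],
             [0, 0, 1, 0],
             [0, 1, 0, 0],
             [1, 0, 0, 0]]
print(partition_levels(adjacency))
```

Any level containing more than one mutually reachable node marks a cycle, which is exactly the group of classes that cannot be split across packages without creating a cyclic dependency.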
Just an idea:
Build a directed graph where your classes are nodes and dependencies are edges. Detect all strongly connected components. Calculate their weights (= number of nodes/classes). Now you have a balanced partition problem: partition the set of component weights into two subsets with minimal difference between their sums.
The algorithm you're looking for is topological sorting. Simply extract items until you encounter a cycle.
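A minimal sketch of that idea using Kahn's algorithm in Python. The input format (a dict mapping each class to the classes it uses) and the name `extract_until_cycle` are assumptions for illustration. Classes are extracted as long as all of their dependencies have already been extracted; whatever is left over sits on a cycle or depends on one.

```python
from collections import deque

def extract_until_cycle(depends_on):
    """Return (order, stuck).

    order: classes listed so that every class comes after its dependencies.
    stuck: classes that cannot be extracted because they are on a cycle
           or depend, directly or indirectly, on a cycle.
    """
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    pending = {n: set(depends_on.get(n, ())) for n in nodes}
    users = {n: [] for n in nodes}
    for n, deps in pending.items():
        for d in deps:
            users[d].append(n)
    # Start with the classes that depend on nothing.
    ready = deque(sorted(n for n in nodes if not pending[n]))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for u in users[n]:
            pending[u].discard(n)
            if not pending[u]:
                ready.append(u)
    return order, nodes - set(order)

deps = {"a": ["b"], "b": [], "c": ["d", "b"], "d": ["c"], "e": ["c"]}
print(extract_until_cycle(deps))
```

Any prefix of `order` is a set of classes that can move to a new package without making it depend back on the original package, because everything those classes use moves with them.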

Resources