Independence property of subproblems required for dynamic programming to apply

Two criteria for a problem to be solvable with the dynamic programming technique are:
1. Subproblems should be independent.
2. Subproblems should overlap.
I think I understand what overlapping means. It basically means that the subproblems have subsubproblems that may be the same. So instead of solving the same subsubproblem over and over again, we solve it once, store the result in a hash table or array, and look it up the next time it is required. But what does point 1, i.e. independence of subproblems, mean here? If they have some common subsubproblems, how can we call them independent? It sounds very counterintuitive to me at this stage.
Edit: This criterion is actually given in the famous book Introduction to Algorithms by CLRS, in the Dynamic Programming chapter.

Please tell us where you are reading that DP applies to problems with overlapping and independent sub-problems. I don't think that's correct, for the same intuitive reason you give-- if the problems overlap, they aren't independent.
I usually see independent sub-problems given as a criterion for Divide-and-Conquer style algorithms, while I see overlapping sub-problems and optimal substructure given as criteria for the Dynamic Programming family. (Intuitively, optimal substructure means that the best solution of a larger problem is composed of the best solutions of sub-problems. The classic example is the shortest path problem in a graph: if you know that the shortest path from A to B goes through C, then you also know that the part of that path running from A to C is itself a shortest path from A to C.)
UPDATE: Oh, I see-- yes, I guess they do mention independence. But I don't read that with the same emphasis that you are. Meaning, they mention independence in the context of, or as a way of understanding, the larger and more important concept of optimal substructure.
What they mean specifically by independence is that even if two problems overlap, they are "independent" in the sense that they don't interact-- the solution to one does not really depend on the solution to the other. They actually use the same example I did, the shortest path. Sub-problems of the shortest path problem are smaller shortest path problems that are independent: If the shortest path from A to B goes through C, then the shortest path from A to C doesn't use any edges in the shortest path from C to B. The longest path problem, by contrast, does not share that independence of sub-problems.
I don't think CLRS are wrong to bring up independence, but I do think the language they're using is a little ambiguous.

In CLRS, the authors address the distinction between the independent and overlapping properties of subproblems. They write:
"It may seem strange that dynamic programming relies on subproblems being both independent and overlapping. Although these requirements may sound contradictory, they describe two different notions, rather than two points on the same axis. Two subproblems of the same problem are independent if they do not share resources. Two subproblems are overlapping if they are really the same subproblem that occurs as a subproblem of different problems" (CLRS 3rd edition, 386).

I think these criteria have been worded badly, because "overlapping" and "independent" have somewhat clashing meanings.
Anyway, to be able to use a DP approach effectively, you need to have:
a problem that can be defined recursively in terms of simpler problems
a concept of partial solution in which the solution to the remaining part doesn't depend on how you got to the current point
Example: if you want to compute the maximum-sum path when moving through a matrix, starting from the first row and with each step going to the next row, in the same or an adjacent column, you can use the current sum, current row and current column as the "state", because for the solution it doesn't matter which path was used to get to the current position.
1 4 [3] 2 1 4 9
2 1 [3] 1 2 3 1
9 [8] 3 0 1 2 9
0 [0] 2 4 1 6 3
1 2 [6] 3 0 4 1
In the schema above, this path has a sum of 3+3+8+0+6 = 20. To maximize the sum, observe that the maximum over paths passing through a certain point is the maximum for getting there plus the maximum for going from there to the end of the matrix. The solution can therefore be split into independent subproblems, and you can cache the maximum sum from a given point of the matrix to the end (regardless of how you got to that point).
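cache = {}  # (x, y) -> maximum sum from (x, y) down to the last row
# matrix, width and height are assumed to be defined globally.

def maxsum(x, y):
    if (x, y) in cache:
        return cache[(x, y)]
    if y == height - 1:              # last row: nothing left below
        return matrix[y][x]
    if x == 0:
        left = float('-inf')         # no column to the left
    else:
        left = matrix[y][x] + maxsum(x - 1, y + 1)
    center = matrix[y][x] + maxsum(x, y + 1)
    if x == width - 1:
        right = float('-inf')        # no column to the right
    else:
        right = matrix[y][x] + maxsum(x + 1, y + 1)
    result = cache[(x, y)] = max(left, center, right)
    return result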
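For comparison, here is a sketch that evaluates the same recurrence bottom-up, from the last row upward, on the example matrix from the schema (copied without the brackets). It is an illustrative rewrite, not the answerer's code, and the name max_path_sum_bottom_up is made up here. On this matrix the best path is 9+3+9+6+4 = 31, better than the highlighted path's 20.

```python
# The example matrix from the schema above, brackets removed.
MATRIX = [
    [1, 4, 3, 2, 1, 4, 9],
    [2, 1, 3, 1, 2, 3, 1],
    [9, 8, 3, 0, 1, 2, 9],
    [0, 0, 2, 4, 1, 6, 3],
    [1, 2, 6, 3, 0, 4, 1],
]


def max_path_sum_bottom_up(matrix):
    """Maximum sum of a top-to-bottom path that moves to the same or an adjacent column."""
    width = len(matrix[0])
    # best[x] = maximum sum from column x of the current row down to the last row
    best = list(matrix[-1])
    for row in reversed(matrix[:-1]):
        best = [
            row[x] + max(best[max(x - 1, 0):x + 2])  # columns x-1, x, x+1, clipped at the edges
            for x in range(width)
        ]
    return max(best)


print(max_path_sum_bottom_up(MATRIX))  # 31
```

Each row only needs the row below it, so this version keeps a single list instead of a cache keyed by (x, y).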
If I add a rule that no more than three "9"s are allowed, however, you cannot use just the coordinates as the state, because the following subproblem (going to the end) will be influenced by the previous one (i.e. by how many "9"s you already collected while getting to the intermediate position).
You can still use a dynamic programming approach, but with a larger state space, for example by adding the number of collected "9"s to the state representation.
def maxsum(x, y, number_of_nines):
    if (x, y, number_of_nines) in cache:
        return cache[(x, y, number_of_nines)]
    ...
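One way the elided part might be filled in, as a hedged sketch: the state is (x, y, number_of_nines), where number_of_nines is assumed to count the 9s collected before reaching (x, y), and functools.lru_cache stands in for the hand-written cache. The wrapper max_path_sum_limited_nines is a hypothetical name. Note that in the example matrix only rows 0 and 2 contain 9s, so no path can collect more than two and the limit of three never binds there; the tests use a smaller limit and a column of 9s to exercise it.

```python
from functools import lru_cache

MATRIX = [
    [1, 4, 3, 2, 1, 4, 9],
    [2, 1, 3, 1, 2, 3, 1],
    [9, 8, 3, 0, 1, 2, 9],
    [0, 0, 2, 4, 1, 6, 3],
    [1, 2, 6, 3, 0, 4, 1],
]


def max_path_sum_limited_nines(matrix, max_nines=3):
    """Best top-to-bottom path sum collecting at most max_nines cells equal to 9.

    Returns float('-inf') if no path satisfies the limit."""
    height, width = len(matrix), len(matrix[0])

    @lru_cache(maxsize=None)  # memoizes on (x, y, number_of_nines)
    def maxsum(x, y, number_of_nines):
        # number_of_nines = how many 9s were collected before reaching (x, y)
        value = matrix[y][x]
        collected = number_of_nines + (1 if value == 9 else 0)
        if collected > max_nines:
            return float('-inf')  # this partial path already breaks the rule
        if y == height - 1:
            return value
        best_below = max(
            maxsum(nx, y + 1, collected)
            for nx in (x - 1, x, x + 1)
            if 0 <= nx < width
        )
        return value + best_below

    return max(maxsum(x, 0, 0) for x in range(width))


print(max_path_sum_limited_nines(MATRIX, 3))  # 31
print(max_path_sum_limited_nines(MATRIX, 0))  # 23
```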

My understanding is that a subproblem should be solvable independently of the larger parent problem. In backtracking, by contrast, a subproblem does depend on the choices you made in the larger problem.

The subproblems are independent.
Independence is not there in divide and conquer.
For example, in merge sort:
the subproblems are merged after dividing, which means that the solution is built from common subproblems. Everything needs to be merged; no single path gives the answer.
Every subproblem shares sub-subproblems that need to be solved in order to get the final answer.
            (1, 4)
           /      \
      (1, 2)      (3, 4)
      /    \      /    \
  (1,1)  (2,2) (3,3)  (4,4)
      \    /      \    /
      (1, 2)      (3, 4)
           \      /
            (1, 4)
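For reference, a plain merge sort sketch in Python that follows the split-then-merge shape of the diagram: the range is halved down to single elements, and the sorted halves are merged back pairwise. The function name is illustrative.

```python
def merge_sort(items):
    """Sort by splitting into halves, sorting each recursively, then merging."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # e.g. (1, 2) in the diagram
    right = merge_sort(items[mid:])  # e.g. (3, 4) in the diagram
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


print(merge_sort([4, 3, 2, 1]))  # [1, 2, 3, 4]
```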

I don't think subproblems need to be independent. In fact, it would be great if they were, but it's not necessary.
A good example for a dp problem with dependent subproblems is here:
Paint Houses - Algorithmic problems (paint house problem)
Here, the solution to subproblems depends on the color of the previous house. That dependency can be solved by adding a dimension to the dp array and building the solution based on the color of the previous house.
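A minimal sketch of that extra dimension, assuming the usual formulation of the paint house problem (not spelled out above): houses in a row, costs[i][c] is the cost of painting house i with color c, and neighbouring houses must get different colors. min_paint_cost is a hypothetical name, and at least two colors are assumed whenever there are at least two houses.

```python
def min_paint_cost(costs):
    """Minimum total cost to paint all houses with no two neighbours sharing a color."""
    if not costs:
        return 0
    # best[c] = cheapest way to paint houses 0..i with house i painted color c;
    # the color index c is the extra dimension that removes the dependency.
    best = list(costs[0])
    for row in costs[1:]:
        best = [
            cost + min(best[p] for p in range(len(row)) if p != c)
            for c, cost in enumerate(row)
        ]
    return min(best)


print(min_paint_cost([[17, 2, 17], [16, 16, 5], [14, 3, 19]]))  # 10
```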


Is dynamic programming backtracking with cache?

I've always wondered about this, and no book I know of states it explicitly.
Backtracking is exploring all possibilities until we figure out that one possibility cannot lead to a valid solution, in which case we drop it.
Dynamic programming, as I understand it, is characterized by overlapping sub-problems. So, can dynamic programming be described as backtracking with a cache (for previously explored paths)?
Thanks
This is one face of dynamic programming, but there's more to it.
For a trivial example, take Fibonacci numbers:
F(n) =
    n = 0: 0
    n = 1: 1
    else: F(n - 2) + F(n - 1)
We can call the above code "backtracking" or "recursion".
Let us transform it into "backtracking with cache" or "recursion with memoization":
F(n) =
    n in Fcache: Fcache[n]
    n = 0: 0, and cache it as Fcache[0]
    n = 1: 1, and cache it as Fcache[1]
    else: F(n - 2) + F(n - 1), and cache it as Fcache[n]
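The memoized pseudocode translates almost line for line into Python; fib and fcache below are illustrative stand-ins for F and Fcache, with the base states stored up front. Very large n would hit Python's recursion limit, which is one practical reason to prefer the bottom-up forms.

```python
fcache = {0: 0, 1: 1}  # base states are cached up front


def fib(n):
    """Recursion with memoization ("backtracking with cache")."""
    if n not in fcache:
        fcache[n] = fib(n - 2) + fib(n - 1)
    return fcache[n]


print(fib(50))  # 12586269025
```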
Still, there is more to it.
If a problem can be solved by dynamic programming, there is a directed acyclic graph of states and dependencies between them.
There is a state that interests us.
There are also base states for which we know the answer right away.
We can traverse that graph from the vertex that interests us to all its dependencies, from them to all their dependencies in turn, and so on, stopping at the base states.
This can be done via recursion.
A directed acyclic graph can be viewed as a partial order on vertices. We can topologically sort that graph and visit the vertices in sorted order.
Additionally, you can find some simple total order which is consistent with your partial order.
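A minimal sketch of visiting states in topological order, assuming Python 3.9+ for the standard-library graphlib module. evaluate_in_topological_order and its arguments are names made up for this illustration; the example uses the Fibonacci dependencies (state n depends on n - 1 and n - 2).

```python
from graphlib import TopologicalSorter  # Python 3.9+


def evaluate_in_topological_order(deps, base, combine):
    """deps[s]: states that s depends on; base: known answers; combine(s, dep_values)."""
    values = dict(base)
    # static_order() yields every state after all of the states it depends on
    for state in TopologicalSorter(deps).static_order():
        if state not in values:
            values[state] = combine(state, [values[d] for d in deps.get(state, ())])
    return values


n = 10
fib_deps = {k: (k - 1, k - 2) for k in range(2, n + 1)}
fib_values = evaluate_in_topological_order(fib_deps, {0: 0, 1: 1}, lambda s, vals: sum(vals))
print(fib_values[n])  # 55
```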
Also note that we can often observe some structure on states.
For example, the states can be often expressed as integers or tuples of integers.
So, instead of using generic caching techniques (e.g., associative arrays to store state->value pairs), we may be able to preallocate a regular array which is easier and faster to use.
Back to our Fibonacci example, the partial order relation is just that state n >= 2 depends on states n - 1 and n - 2.
The base states are n = 0 and n = 1.
A simple total order consistent with this order relation is the natural order: 0, 1, 2, ....
Here is what we start with:
Preallocate array F with indices 0 to n, inclusive, all set to 0
F[0] = 0
F[1] = 1
Fine, we have the order in which to visit the states.
Now, what's a "visit"?
There are again two possibilities:
(1) "Backward DP": When we visit a state u, we look at all its dependencies v and calculate the answer for that state u:
    for u = 2, 3, ..., n:
        F[u] = F[u - 1] + F[u - 2]
(2) "Forward DP": When we visit a state u, we look at all states v that depend on it and account for u in each of these states v (this relies on F[2], ..., F[n] starting at 0):
    for u = 1, 2, 3, ..., n - 1:
        add F[u] to F[u + 1]
        if u + 2 <= n: add F[u] to F[u + 2]
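Both visiting strategies can be sketched in Python; the function names are illustrative. The forward version starts at u = 1 because F[0] = 0 contributes nothing, and it guards the u + 2 write so it never goes past F[n].

```python
def fib_backward(n):
    """Backward (pull) DP: each state reads the states it depends on."""
    F = [0] * (n + 1)
    if n >= 1:
        F[1] = 1
    for u in range(2, n + 1):
        F[u] = F[u - 1] + F[u - 2]
    return F[n]


def fib_forward(n):
    """Forward (push) DP: each state adds itself into the states that depend on it."""
    F = [0] * (n + 1)  # F[2..n] must start at 0 because they are accumulated
    if n >= 1:
        F[1] = 1
    for u in range(1, n):
        F[u + 1] += F[u]
        if u + 2 <= n:  # avoid writing past F[n]
            F[u + 2] += F[u]
    return F[n]


print([fib_forward(k) for k in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```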
Note that in the former case, we still use the formula for Fibonacci numbers directly.
However, in the latter case, the imperative code cannot be readily expressed by a mathematical formula.
Still, in some problems, the "forward DP" approach is more intuitive (no good example for now; anyone willing to contribute it?).
One more use of dynamic programming which is hard to express as backtracking is the following: Dijkstra's algorithm can be considered DP, too.
In the algorithm, we construct the optimal paths tree by adding vertices to it.
When we add a vertex, we use the fact that the whole path to it - except the very last edge in the path - is already known to be optimal.
So, we actually use an optimal solution to a subproblem - which is exactly the thing we do in DP.
Still, the order in which we add vertices to the tree is not known in advance.
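A standard Dijkstra sketch using Python's heapq (lazy deletion of stale heap entries), to show where the "optimal sub-path plus one last edge" step happens. The adjacency-dict input format and the function name are choices made for this illustration, and edge weights are assumed non-negative.

```python
import heapq


def dijkstra(adj, source):
    """adj: {vertex: [(neighbour, weight), ...]} with non-negative weights."""
    dist = {source: 0}
    finished = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in finished:
            continue  # stale heap entry
        # u's distance is now final: its optimal path is an optimal path to an
        # earlier vertex plus one last edge (the DP-style step).
        finished.add(u)
        for v, w in adj.get(u, ()):
            candidate = d + w
            if candidate < dist.get(v, float('inf')):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)], 'C': [('D', 1)], 'D': []}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```

The order in which vertices become finished depends on the distances found so far, which is exactly why it cannot be fixed in advance.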
No. Or rather sort of.
In backtracking, you go down and then back up each path. However, dynamic programming works bottom-up, so you only get the going-back-up part not the original going-down part. Furthermore, the order in dynamic programming is more breadth first, whereas backtracking is usually depth first.
On the other hand, memoization (dynamic programming's very close cousin) very often does work as backtracking with a cache, as you described.
Yes and no.
Dynamic Programming is basically an efficient way to implement a recursive formula, and top-down DP is often actually done with recursion + cache:
def f(x):
    if x is in cache:
        return cache[x]
    else:
        res <- .. do something with f(x-k)
        cache[x] <- res
        return res
Note that bottom-up DP is implemented completely differently, however, but it still pretty much follows the basic principles of the recursive approach, and at each step 'calculates' the recursive formula on the smaller (already known) sub-problems.
However, in order to be able to use DP - you need to have some characteristics for the problem, mainly - an optimal solution to the problem consists of optimal solutions to its sub-problems. An example where it holds is shortest-path problem (An optimal path from s to t that goes through u must consist of an optimal path from s to u).
This property does not hold for some other problems, such as Vertex Cover or the Boolean satisfiability problem, and thus you cannot replace their backtracking solutions with DP.
No. What you call backtracking with cache is basically memoization.
In dynamic programming, you go bottom-up. That is, you start from a place where you don't need any subproblems. In particular, when you need to calculate the nth step, all the previous n-1 steps have already been calculated.
This is not the case for memoization. Here, you start from the kth step (the step you want) and solve the previous steps as they are required, keeping these values stored somewhere so that you can access them later.
All that being said, there is generally no difference in asymptotic running time between memoization and dynamic programming.

Efficient algorithm for eliminating nodes in "graph"?

Suppose I have a graph with 2^N - 1 nodes, numbered 1 to 2^N - 1. Node i "depends on" node j if all the bits in the binary representation of j that are 1 are also 1 in the binary representation of i. So, for instance, if N=3, then node 7 depends on all other nodes. Node 6 depends on nodes 4 and 2.
The problem is eliminating nodes. I can eliminate a node if no other nodes depend on it. No nodes depend on 7; so I can eliminate 7. After eliminating 7, I can eliminate 6, 5, and 3, etc. What I'd like is to find an efficient algorithm for listing all the possible unique elimination paths. (that is, 7-6-5 is the same as 7-5-6, so we only need to list one of the two). I have a dumb algorithm already, but I think there must be a better way.
I have three related questions:
Does this problem have a general name?
What's the best way to solve it?
Is there a general formula for the number of unique elimination paths?
Edit: I should note that a node cannot depend on itself, by definition.
Edit2: Let S = {s_1, s_2, s_3,...,s_m} be the set of all m valid elimination paths. s_i and s_j are "equivalent" (for my purposes) iff the two eliminations s_i and s_j would lead to the same graph after elimination. I suppose to be clearer I could say that what I want is the set of all unique graphs resulting from valid elimination steps.
Edit3: Note that elimination paths may be different lengths. For N=2, the 5 valid elimination paths are (),(3),(3,2),(3,1),(3,2,1). For N=3, there are 19 unique paths.
Edit4: Re: my application - the application is in statistics. Given N factors, there are 2^N - 1 possible terms in a statistical model (see http://en.wikipedia.org/wiki/Analysis_of_variance#ANOVA_for_multiple_factors) that can contain the main effects (the factors alone) and various (2-, 3-, ...-way) interactions between the factors. But an interaction can only be present in a model if all sub-interactions (or main effects) are present. For three factors a, b, and c, for example, the 3-way interaction a:b:c can only be present if all the constituent two-way interactions (a:b, a:c, b:c) are present (and likewise for the two-ways). Thus, the model a + b + c + a:b + a:b:c would not be allowed. I'm looking for a quick way to generate all valid models.
It seems easier to think about this in terms of sets: you are looking for families of subsets of {1, ..., N} such that for each set in the family, all its subsets are also present. Each such family is determined by its inclusion-wise maximal sets, which must be pairwise incomparable (none contains another). Families of pairwise incomparable sets are called Sperner families. So you are looking for Sperner families, plus the union of all the subsets in the family. Known algorithms for enumerating Sperner families, or antichains in general, may be useful; without knowing what you actually want to do with them, it's hard to tell.
Thanks to @FalkHüffner's answer, I saw that what I wanted to do was equivalent to finding monotonic Boolean functions of N arguments. If you look at the figure on the Wikipedia page for Dedekind numbers (http://en.wikipedia.org/wiki/Dedekind_number), it expresses the problem graphically. There is an algorithm for generating monotonic Boolean functions (http://www.mathpages.com/home/kmath094.htm), and it is quite simple to construct.
For my purposes, I use the algorithm, then eliminate the first column and last row of the resulting binary arrays. Starting from the top row down, each row has a 1 in the ith column if one can eliminate the ith node.
Thanks!
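A brute-force sketch of the enumeration itself, handy for checking counts on small N: it explores every valid elimination from the full graph and collects the distinct sets of remaining nodes (the notion of equivalence from Edit2). elimination_results is a hypothetical name, and the search is exponential, so it is only practical for small N. The counts it produces for N = 2 and N = 3 are 5 and 19, matching Edit3; in general they are one less than the Dedekind number for N (168 - 1 = 167 for N = 4).

```python
def elimination_results(n):
    """All distinct sets of remaining nodes reachable by valid eliminations (brute force)."""
    full = frozenset(range(1, 2 ** n))
    seen = {full}
    stack = [full]
    while stack:
        remaining = stack.pop()
        for node in remaining:
            # A remaining node `other` depends on `node` if `other` has all of
            # node's 1-bits set; `node` can go only if no other remaining node does.
            if all(other == node or other & node != node for other in remaining):
                after = remaining - {node}
                if after not in seen:
                    seen.add(after)
                    stack.append(after)
    return seen


for n in (2, 3):
    print(n, len(elimination_results(n)))  # prints "2 5", then "3 19"
```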
You can build a "heap", in which at depth X are all the nodes with X zeros in their binary representation.
Then, starting from the bottom layer, connect each item to a random parent at the layer above, until you get a single-component graph.
Note that this graph is a tree, i.e., each node except for the root has exactly one parent.
Then, traverse the tree (starting from the root) and count the total number of paths in it.
UPDATE:
The method above is bad, because you cannot just pick a random parent for a given item - you have a limited number of items from which you can pick a "legal" parent... But I'm leaving this method here for other people to give their opinion (perhaps it is not "that bad").
In any case, why don't you take your graph, extract a spanning tree (you can use Prim's or Kruskal's algorithm to find a minimum spanning tree), and then count the number of paths in it?

A greedy or dynamic algorithm to subset selection

I have a simple algorithmic question. I would be grateful if you could help me.
We have some 2-dimensional points, each with an associated positive weight (a sample problem is attached). We want to select a subset of them that maximizes the total weight such that no two selected points overlap (for example, in the attached sample we cannot select both A and C because they are in the same row, and in the same way we cannot select both A and B because they are in the same column). Is there any greedy (or dynamic programming) approach I can use? I'm aware of the non-overlapping interval selection algorithm, but I cannot use it here because my problem is 2-dimensional.
Any reference or note is appreciated.
Regards
Attachment:
A simple sample of the problem:
A (30$) -------- B (10$)
|
|
|
|
C (8$)
If you are OK with a good solution and do not demand the best one, you can use heuristic algorithms to solve this.
Let S be the set of points, and w(s) the weight function.
Create a weight function W: 2^S -> R (from the subsets of S to the real numbers):
W(U) = -INFINITY                   if the solution U is not feasible
W(U) = Sigma(w(u)) for u in U      otherwise
Also create a function next: 2^S -> 2^2^S (a function that takes a subset of S and returns a set of subsets of S):
next(U) = { V | V can be obtained from U by adding one element to U or removing one element from U }
Now, given that data, you can invoke any optimization algorithm from an artificial intelligence textbook, such as a genetic algorithm or hill climbing.
For example, hill climbing with random restarts would look something like this:
1. best <- -INFINITY
2. while there is more time:
3.     choose a random subset s
4.     NEXT <- next(s)
5.     if max{ W(v) | v in NEXT } <= W(s):   // s is a local maximum
5.1.       if W(s) > best: best <- W(s)     // if s is better than the previous result, store it
5.2.       go to 2                          // restart hill climbing from a different random point
6.     else:
6.1.       s <- argmax{ W(v) | v in NEXT }
6.2.       go to 4
7. return best   // when out of time, return the best solution found so far
The above algorithm is anytime - meaning it will produce better results if given more time.
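A minimal Python rendering of the hill-climbing pseudocode, assuming each point is given as a (row, col, weight) tuple and that a fixed number of restarts stands in for "while there is more time". weight, neighbours and hill_climb are hypothetical names mirroring W, next and the loop above; the point list is assumed non-empty.

```python
import random


def weight(subset, points):
    """W(U): -inf if two chosen points share a row or column, else the total weight."""
    rows = [points[i][0] for i in subset]
    cols = [points[i][1] for i in subset]
    if len(set(rows)) < len(rows) or len(set(cols)) < len(cols):
        return float('-inf')
    return sum(points[i][2] for i in subset)


def neighbours(subset, n):
    """next(U): every subset obtained by adding or removing exactly one point."""
    return [subset ^ {i} for i in range(n)]


def hill_climb(points, restarts=100, seed=0):
    """Hill climbing with random restarts; returns the best weight found."""
    rng = random.Random(seed)
    n = len(points)
    best = float('-inf')
    for _ in range(restarts):
        s = frozenset(i for i in range(n) if rng.random() < 0.5)  # random subset
        while True:
            candidate = max(neighbours(s, n), key=lambda v: weight(v, points))
            if weight(candidate, points) <= weight(s, points):
                break  # s is a local maximum
            s = candidate
        best = max(best, weight(s, points))
    return best


# Sample from the question: A and C share a row, A and B share a column.
POINTS = [(0, 0, 30), (1, 0, 10), (0, 1, 8)]  # (row, col, weight) for A, B, C
print(hill_climb(POINTS))  # 30
```

Using `<=` rather than `<` in the local-maximum test keeps the climb from wandering forever across neighbours of equal weight.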
This can be treated as a linear assignment problem, which can be solved using an algorithm like the Hungarian algorithm. The algorithm tries to minimize the sum of costs, so just negate your weights, and use them as the costs. The assignment of rows to columns will give you the subset of points that you need. There are sparse variants for cases where not every (row,column) pair has an associated point, but you can also just use a large positive cost for these.
Well, you can think of this as a binary constraint optimization problem, and there are various algorithms for it. The easiest algorithm for this problem is backtracking and arc propagation. However, it takes exponential time in the worst case. I am not sure if there are any specific algorithms that take advantage of the geometric nature of the problem.
This can be solved by a fairly straightforward dynamic programming approach with exponential time complexity:
s = {A, B, C, ...}
getMaxSum(s) = max( A.value + getMaxSum(compatibleSubSet(s, A)),
                    B.value + getMaxSum(compatibleSubSet(s, B)),
                    ...)
where compatibleSubSet(s, A) returns the subset of s that does not overlap with A (and getMaxSum of the empty set is 0).
To optimize it, you can memoize the result for each subset.
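A hedged Python sketch of the memoized getMaxSum recursion, assuming points are (row, col, value) tuples and using frozensets of point indices as the memo keys via functools.lru_cache. max_weight_selection and compatible are hypothetical names; the time is still exponential in the worst case.

```python
from functools import lru_cache


def max_weight_selection(points):
    """points: list of (row, col, value); no two chosen points may share a row or column."""

    def compatible(i, j):
        return points[i][0] != points[j][0] and points[i][1] != points[j][1]

    @lru_cache(maxsize=None)  # memoize the result for each subset
    def get_max_sum(s):
        best = 0  # choosing nothing more is always allowed (covers the empty set)
        for i in s:
            # compatibleSubSet(s, i): the points still selectable after taking i
            rest = frozenset(j for j in s if j != i and compatible(i, j))
            best = max(best, points[i][2] + get_max_sum(rest))
        return best

    return get_max_sum(frozenset(range(len(points))))


print(max_weight_selection([(0, 0, 30), (1, 0, 10), (0, 1, 8)]))  # 30
```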
One way to do it:
Write a function that generates subsets ordered from the subset of maximum weight to the subset of minimum weight, ignoring the constraints.
Then call this function repeatedly until a subset that honors the constraints pops up.
To improve performance, you can write a less naive generator that, for instance, honors the not-on-the-same-row constraint but ignores the not-on-the-same-column one.
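One way to build that generator, sketched under assumptions: points are (row, col, weight) tuples with non-negative weights, and subsets are produced lazily with a heap that enumerates the *removed* points in order of increasing removed weight (each removed set is reached exactly once, by "extend" or "shift" steps from a parent). best_feasible_subset is a hypothetical name, and this is the basic generator, not the improved row-aware one.

```python
import heapq


def best_feasible_subset(points):
    """Weight of the heaviest subset with no shared row or column.

    Subsets are generated in order of decreasing total weight (ignoring the
    constraints), and the first one that honors the constraints is returned."""
    n = len(points)
    order = sorted(range(n), key=lambda i: points[i][2])  # ascending by weight
    w = [points[i][2] for i in order]
    total = sum(w)

    def feasible(removed):
        kept = [points[order[k]] for k in range(n) if k not in removed]
        rows = [p[0] for p in kept]
        cols = [p[1] for p in kept]
        return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)

    if feasible(()):
        return total  # keeping everything already satisfies the constraints
    # Heap entries: (removed weight, last removed position, removed positions).
    heap = [(w[0], 0, (0,))]
    while heap:
        removed_weight, last, removed = heapq.heappop(heap)
        if feasible(removed):
            return total - removed_weight
        if last + 1 < n:
            nxt = last + 1
            # "extend": also remove the next item
            heapq.heappush(heap, (removed_weight + w[nxt], nxt, removed + (nxt,)))
            # "shift": replace the last removed item by the next one
            heapq.heappush(heap, (removed_weight - w[last] + w[nxt], nxt, removed[:-1] + (nxt,)))
    return 0  # not reached: removing every point is always feasible


print(best_feasible_subset([(0, 0, 30), (1, 0, 10), (0, 1, 8)]))  # 30
```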

Algorithm design to assign nodes to graphs

I have a graph-theoretic (which is also related to combinatorics) problem that is illustrated below, and wonder what is the best approach to design an algorithm to solve it.
Given 4 different graphs of 6 nodes (by different, I mean different structures, e.g. STAR, LINE, COMPLETE, etc), and 24 unique objects, design an algorithm to assign these objects to these 4 graphs 4 times, so that the number of repeating neighbors on the graphs over the 4 assignments is minimized. For example, if object A and B are neighbors on 1 of the 4 graphs in one assignment, then in the best case, A and B will not be neighbors again in the other 3 assignments.
Obviously, the degree to which such minimization can go is dependent on the specific graph structures given. But I am more interested in a general solution here so that given any 4 graph structures, such minimization is guaranteed as the result of the algorithm.
Any suggestion/idea of solving this problem is welcome, and some pseudo-code may well be sufficient to illustrate the design. Thank you.
Representation:
You have 24 elements; I will name them A to X (the first 24 letters).
Each of these elements will have a place in one of the 4 graphs. I will number the 24 nodes of the 4 graphs from 1 to 24.
I will identify the position of A by a 24-tuple (xA1, xA2, ..., xA24); if I want to assign A to node number 8, for example, I will write (xA1, xA2, ..., xA24) = (0,0,0,0,0,0,0,1,0,0,...,0), where the 1 is in position 8.
We can say that A = (xA1, ..., xA24).
e1, ..., e24 are the unit vectors (1,0,...,0) to (0,0,...,1).
Note about the operator '.':
A.e1 = xA1
...
X.e24 = xX24
There are some constraints on A, ..., X with these notations:
each xZi is in {0,1} (for every element Z and node i)
and
Sum_i(xAi) = 1 ... Sum_i(xXi) = 1
Sum(xA1, xB1, ..., xX1) = 1 ... Sum(xA24, xB24, ..., xX24) = 1
since each element is assigned to exactly one node, and each node holds exactly one element.
I will define a graph by giving the neighbor relation of each node; let's say node 8 has neighbors node 7 and node 10.
To check that A and B are neighbors with A on node 8, for example, I need:
A.e8 = 1 and (B.e7 = 1 or B.e10 = 1), so I just need A.e8*(B.e7 + B.e10) == 1.
In the function IsNeighborInGraphs(A,B) I test that for every node, and I get one or zero depending on whether A and B are neighbors.
Notations:
4 graphs of 6 nodes; the position of each element is defined by an integer from 1 to 24
(1 to 6 for the first graph, etc.).
e1, ..., e24 are the unit vectors (1,0,0,...,0) to (0,0,...,1).
Let A, B, ..., X be the 24 elements.
A = (0,0,...,1,...,0) = (xA1, xA2, ..., xA24)
B = ...
...
X = (0,0,...,1,...,0)
Graph descriptions:
IsNeighborInGraphs(A,B) = A.e1*B.e2 + ...
// for example, if nodes 1 and 2 are neighbors in one graph
State of the system:
L(A) = [B, B, C, E, G, ...]   // list of neighbors of A (may repeat)
actualise(L(A)):
    for Element in B..X:
        if IsNeighborInGraphs(A, Element):
            L(A).append(Element)
        endif
    endfor
Objective functions:
N(A) = len(L(A)) + Sum(IsNeighborInGraphs(A, i), i in L(A))
...
N(X) = ...
Description of the algorithm:
1. Start with an initial position: A = e1, ..., X = e24.
2. Actualise L(A), L(B), ..., L(X).
3. Solve the following with a solver (AMPL, for example, should work, I guess, since it is a nonlinear optimization problem):
   Objective function:
   min Sum(N(Z), Z = A to X)
   Constraints:
   Sum_i(xAi) = 1 ... Sum_i(xXi) = 1
   Sum(xA1, xB1, ..., xX1) = 1 ... Sum(xA24, xB24, ..., xX24) = 1
   You get the best solution.
4. Repeat steps 2 and 3, 3 more times.
If all four graphs are K_6, then the best you can do is choose 4 set partitions of your 24 objects into 4 sets each of cardinality 6 so that the pairwise intersection of any two sets has cardinality at most 2. You can do this by choosing set partitions that are maximally far apart in the Hasse diagram of set partitions with partial order given by refinement. The general case is much harder, but perhaps you can still begin with this crude approximation of a solution and then be clever with which vertex is assigned which object in the four assignments.
Assuming you don't want to cycle through all combinations, calculating the sum every time and choosing the lowest, you can formulate a minimization problem with constraints on your 24 variables that depend on the shape of your graphs, and solve it, depending on your constraints, with either a linear programming solver (e.g. simplex algorithm engines) or a nonlinear solver (much harder in terms of running time). You can also use software like LINGO/LINDO to quickly create a decision theory model and test its correctness (you need some decision theory notions, though).
If this has anything to do with the real world, then it's unlikely that you absolutely must have a solution that is the true minimum. Close to the minimum should be good enough, right? If so, you could repeatedly randomly make the 4 assignments and check the results until you either run out of time or have a good-enough solution or appear to have stopped improving your best solution.
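A minimal sketch of that random-restart idea. Assumptions not in the question: each graph is an edge list over local nodes 0..5, in each round object assignment[6*g + u] sits on node u of graph g, and the cost counts every extra time a pair of objects are neighbours again. The graph shapes and all function names are illustrative.

```python
import itertools
import random
from collections import Counter

NODES_PER_GRAPH = 6

# Example graph shapes (assumed): edge lists over local node ids 0..5.
STAR = [(0, i) for i in range(1, 6)]
LINE = [(i, i + 1) for i in range(5)]
CYCLE = LINE + [(5, 0)]
COMPLETE = list(itertools.combinations(range(6), 2))
GRAPHS = [STAR, LINE, CYCLE, COMPLETE]


def repeated_neighbours(graphs, rounds):
    """Extra occurrences of neighbour pairs over all rounds (0 = no pair repeats)."""
    counts = Counter()
    for assignment in rounds:
        for g, edges in enumerate(graphs):
            base = g * NODES_PER_GRAPH
            for u, v in edges:
                counts[frozenset((assignment[base + u], assignment[base + v]))] += 1
    return sum(c - 1 for c in counts.values())


def random_search(graphs, objects, n_rounds=4, trials=2000, seed=0):
    """Try random assignments and keep the best one found (no optimality guarantee)."""
    rng = random.Random(seed)
    best_cost, best_rounds = float('inf'), None
    for _ in range(trials):
        rounds = [rng.sample(objects, len(objects)) for _ in range(n_rounds)]
        cost = repeated_neighbours(graphs, rounds)
        if cost < best_cost:
            best_cost, best_rounds = cost, rounds
            if cost == 0:
                break  # cannot do better than zero repeats
    return best_cost, best_rounds


cost, rounds = random_search(GRAPHS, list(range(24)))
print(cost)
```

A natural refinement is to keep the best assignment and try small swaps of objects instead of drawing fresh random rounds every time.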

Multiple Constraint Knapsack Problem

If there is more than one constraint (for example, both a volume limit and a weight limit, where the volume and weight of each item are not related), we get the multiply-constrained knapsack problem, multi-dimensional knapsack problem, or m-dimensional knapsack problem.
How do I code this in the most optimized fashion? One could develop a brute-force recursive solution, or maybe branch and bound, but essentially it's exponential most of the time unless you do some sort of memoization or use dynamic programming, which again takes a huge amount of memory if not done well.
The problem I am facing is this
I have my knapsack function
KnapSack(Capacity, Value, i) instead of the common
KnapSack(Capacity, i), since I have upper limits on both of those. Can anyone guide me with this, or provide suitable resources for solving these problems for reasonably large n?
Or is this NP-complete?
Thanks
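For reference, the textbook dynamic program for a 0/1 knapsack with two limits simply adds one capacity dimension to the table. The multidimensional knapsack problem is NP-hard in general, but this DP is pseudo-polynomial: it needs integer weights and volumes and runs in O(n * W * V). A minimal sketch; the function name and the (value, weight, volume) item format are assumptions for illustration.

```python
def two_constraint_knapsack(items, max_weight, max_volume):
    """0/1 knapsack with a weight limit and a volume limit.

    items: list of (value, weight, volume) with non-negative integer weight and volume."""
    # best[w][v] = highest value achievable with total weight <= w and volume <= v
    best = [[0] * (max_volume + 1) for _ in range(max_weight + 1)]
    for value, weight, volume in items:
        # Iterate capacities downwards so that each item is used at most once.
        for w in range(max_weight, weight - 1, -1):
            for v in range(max_volume, volume - 1, -1):
                candidate = best[w - weight][v - volume] + value
                if candidate > best[w][v]:
                    best[w][v] = candidate
    return best[max_weight][max_volume]


ITEMS = [(10, 5, 2), (40, 4, 3), (30, 6, 4), (50, 3, 5)]  # (value, weight, volume)
print(two_constraint_knapsack(ITEMS, max_weight=10, max_volume=8))  # 90
```

The memory cost the question worries about is the (W + 1) x (V + 1) table, which is why this only suits moderate capacities.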
Merge the constraints. Look at http://www.diku.dk/~pisinger/95-1.pdf
chapter 1.3.1 called Merging the Constraints.
For example, say you have:
variable , constraint1 , constraint2
1 , 43 , 66
2 , 65 , 54
3 , 34 , 49
4 , 99 , 32
5 , 2 , 88
Multiply the first constraint by some big number (10,000 here), then add it to the second constraint.
So you have
variable , merged constraint
1 , 430066
2 , 650054
3 , 340049
4 , 990032
5 , 20088
From there, apply whatever algorithm you wanted to use for a single constraint. The main limitation that comes to mind with this is how many digits your variable type can hold.
The following problem serves as a good example:
Given an undirected graph G having positive weights and N vertices.
You start with having a sum of M money. For passing through a vertex i, you must pay S[i] money. If you don't have enough money - you can't pass through that vertex. Find the shortest path from vertex 1 to vertex N, respecting the above conditions; or state that such path doesn't exist. If there exist more than one path having the same length, then output the cheapest one. Restrictions: 1
Pseudocode:
Set states(i,j) as unvisited for all (i,j)
Set Min[i][j] to Infinity for all (i,j)
Min[0][M] = 0
While (TRUE)
    Among all unvisited states (i,j), find the one for which Min[i][j]
    is the smallest. Let this state be (k,l).
    If no state (k,l) was found for which Min[k][l] is
    less than Infinity, exit the While loop.
    Mark state (k,l) as visited
    For all neighbors p of vertex k:
        If (l - S[p] >= 0 AND
            Min[p][l-S[p]] > Min[k][l] + Dist[k][p])
        Then Min[p][l-S[p]] = Min[k][l] + Dist[k][p]
        i.e.
        If for state (k,l) there is enough money left to
        go to vertex p (l - S[p] is the money that
        will remain after passing to vertex p), and the
        shortest path found for state (p, l-S[p]) is longer
        than [the shortest path found for
        state (k,l)] + [distance from vertex k to vertex p],
        then set the shortest path for state (p, l-S[p])
        equal to this sum.
    End For
End While
Find the smallest number among Min[N-1][j] (for all j, 0 <= j <= M);
if there is more than one such state, take the one with the greater
j. If there are no states (N-1,j) with a value less than Infinity, then
such a path doesn't exist.
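A Python sketch of the pseudocode, with one swap made plainly: the linear scan for the smallest unvisited state is replaced by a heapq priority queue with lazy deletion, the usual way to implement Dijkstra. Vertices are numbered 0..n-1 as in the pseudocode (start 0, target n-1), fee plays the role of S, and, like the pseudocode, the start vertex's own fee is not charged. The function name and input format are hypothetical.

```python
import heapq


def cheapest_shortest_path(n, edges, fee, money):
    """Dijkstra over states (vertex, money_left).

    edges: (u, v, length) undirected; fee[p]: money paid to pass through vertex p;
    money: the starting budget M. Returns (length, money_left) or None."""
    adj = [[] for _ in range(n)]
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    INF = float('inf')
    best = [[INF] * (money + 1) for _ in range(n)]  # best[k][l] plays the role of Min[k][l]
    best[0][money] = 0
    heap = [(0, 0, money)]
    while heap:
        d, k, l = heapq.heappop(heap)
        if d > best[k][l]:
            continue  # stale entry; state (k, l) was already settled
        for p, length in adj[k]:
            left = l - fee[p]
            if left >= 0 and d + length < best[p][left]:
                best[p][left] = d + length
                heapq.heappush(heap, (d + length, p, left))
    shortest = min(best[n - 1])
    if shortest == INF:
        return None
    # Among the shortest paths, prefer the one leaving the most money.
    most_left = max(j for j in range(money + 1) if best[n - 1][j] == shortest)
    return shortest, most_left


print(cheapest_shortest_path(3, [(0, 1, 1), (1, 2, 1), (0, 2, 5)], [0, 10, 0], 5))  # (5, 5)
```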
Knapsack with multiple constraints is a packing problem. Read up. http://en.wikipedia.org/wiki/Packing_problem
There are greedy-like heuristics that calculate an "efficiency" for each item; they run quickly and yield approximate solutions.
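A minimal sketch of such a greedy heuristic. The efficiency formula used here (value divided by the sum of the item's resource use relative to each capacity) is one common choice assumed for illustration, not one the answer specifies, and greedy_multi_knapsack is a hypothetical name. Capacities are assumed positive. It returns a feasible but not necessarily optimal selection.

```python
def greedy_multi_knapsack(items, capacities):
    """items: list of (value, [usage per constraint]); capacities: list of limits.

    Returns (total value, indices of chosen items)."""

    def efficiency(item):
        value, usage = item
        load = sum(u / c for u, c in zip(usage, capacities))  # capacity-normalised use
        return value / load if load > 0 else float('inf')

    remaining = list(capacities)
    total, chosen = 0, []
    # Consider items from most to least efficient; take each one that still fits.
    for index, item in sorted(enumerate(items), key=lambda p: efficiency(p[1]), reverse=True):
        value, usage = item
        if all(u <= r for u, r in zip(usage, remaining)):
            remaining = [r - u for r, u in zip(remaining, usage)]
            total += value
            chosen.append(index)
    return total, chosen


ITEMS = [(10, [5, 2]), (40, [4, 3]), (30, [6, 4]), (50, [3, 5])]
print(greedy_multi_knapsack(ITEMS, [10, 8]))  # (90, [3, 1])
```

Its result can serve as the initial incumbent (lower bound) for the branch and bound described next.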
You can use a branch and bound algorithm. You can get an initial lower bound using a greedy-like heuristic, which can be used to initialize the incumbent solution. You can calculate upper bounds for various sub-problems by considering each of the m constraints one at a time (relaxing the other constraints in the problem), then use the lowest of these bounds as an upper bound for the original problem. This technique is due to Shih. However, this technique probably won't work well if no particular constraint tends to dominate the solution, or if the initial solution from the greedy-like heuristic is not close to the optimum.
There are better, more modern algorithms, which are harder to implement; see the "multidimensional knapsack problem" papers by J. Puchinger!
As you said, volume and weight are both positive quantities, so try to use the fact that the weight always decreases:
knap[position][vol][t]
Now t=0 when wt is positive, t=1 when wt is negative.
