Knapsack problem with values per item and limited items - algorithm

I'm trying to solve a variant of the knapsack problem that I haven't seen before.
In this variant we have a vector v containing the value per gram of each item, and each item is available only up to a limited weight. The goal is to find the maximum value that can be gained with a pack of capacity M.
I tried greedy approaches but haven't found a solution. I think the hardest part is doing it in O(n), because we shouldn't sort anything.
Does anyone have an idea?

If the value per gram has reasonably narrow bounds, you can counting-sort, radix-sort or bucket-sort the items by value per gram in linear time, and then fill up the knapsack with the most valuable substances first. What do I mean by reasonable bounds? Specifically, I mean that there are asymptotically fewer meaningful "values per gram" than there are kinds of substances.
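A minimal sketch of the bucket idea, assuming (hypothetically) that every value per gram is an integer in 0..max_value, where max_value is a bound known in advance. Because the amounts are fractional (grams), substances with the same value per gram can be merged into a single bucket, so this runs in O(n + max_value) without any comparison sort. The function name and parameters are illustrative only.

```python
def max_value_fractional(values_per_gram, limits, capacity, max_value):
    """Fractional knapsack via counting sort on integer values per gram.

    Assumes each value per gram is an int in [0, max_value] and that
    limits[i] is the number of grams of item i that is available.
    """
    # Bucket: total grams available at each value-per-gram level.
    grams_at = [0] * (max_value + 1)
    for v, w in zip(values_per_gram, limits):
        grams_at[v] += w

    remaining = capacity
    total = 0
    # Fill the pack starting from the most valuable bucket.
    for v in range(max_value, -1, -1):
        if remaining <= 0:
            break
        take = min(remaining, grams_at[v])
        total += take * v
        remaining -= take
    return total
```

For example, with values per gram [3, 5, 1], limits [4, 2, 10] and M = 5, the sketch takes 2 g at 5 and 3 g at 3, for a total of 19.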

Related

Ordered Knapsack Problem Correctness/Proof

A thief is given the choice of n objects to steal, but only has one knapsack with a capacity of
taking M weight. Each object i has weight w_i, and profit p_i. Suppose he also knows the following:
the order of these items when sorted by increasing weight is the same as their order when sorted
by decreasing value. Give a greedy algorithm to find an optimal solution to this variant of the
knapsack problem. Prove the correctness and running time.
So the greedy algorithm I came up with was to sort the items by increasing weight, which is also decreasing value. This means that the profit per weight is in decreasing order. So the thief takes items in this order, starting with the most valuable one, until the next item would push the total weight over M. The running time would be O(n log n), since sorting takes O(n log n) and iterating through the list takes O(n). The part I am stuck on is the proof of correctness. Here is my proof so far:
Suppose there is an instance such that the solution stated above (referred to as GA) is not optimal. Let the optimal solution be referred to as OS, and let the items taken by OS be sorted in increasing value. Since OS is better than GA, the profit earned from GA is strictly less than the profit earned from OS. Since GA takes the item with the highest profit/weight ratio, its first element i must be greater than or equal to the first element of OS. Because OS is better, there must exist an item i in OS that is greater than or equal to an item j in the set of GA. But because GA and OS are drawn from the same set, and GA always takes the item with the highest profit/weight, there cannot be an i in OS that is greater than a j in GA.
Can anyone help with the proof? Thanks
Your approach to the solution is valid and the reasoning on the running time is correct. In what follows, suppose that the input is "de-trivialized" in the sense that every object actually fits into the knapsack on its own and that it is impossible to select the entire input.
The order generated by the sorting is both
decreasing in value
increasing in weight
which makes this a special case of the general knapsack problem. The argument for correctness is as follows. Let i' denote the breaking index, i.e. the index of the first item in the sorted sequence that is rejected by the greedy algorithm. For clarity, call the corresponding object the breaking object. Note that
w_j >= w_i' for each j > i'
holds, which means that the greedy algorithm also rejects every object after the breaking object (none of them fits into the remaining capacity, just like the breaking object).
In total, the greedy algorithm selects a prefix of the sorted sequence; we aim at showing that any optimal solution (which we consider fixed in the sequel) is the same prefix.
Note that the optimal solution, being optimal, does not leave space for an additional object.
Aiming at a contradiction, let k be the minimal index which occurs in the greedy solution but not in the optimal solution. As it is impossible to select object k additionally into the optimal solution, there must (via minimality of k) be some item in the optimal solution with an index
k' > k
which permits an exchange of items in the optimal solution. As
w_k < w_k' and p_k > p_k'
hold, object k' can be replaced by object k in the optimal solution, which yields a solution with profit larger than the one of the optimal solution, which is a contradiction to its optimality.
Hence, there is no item in the greedy solution which is missing in the optimal solution, which means that the greedy solution is a subset of the optimal solution. On the other hand, the greedy solution is maximal with respect to inclusion, which means that the optimal solution cannot contain an item which is missing in the greedy solution.
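A minimal sketch of the greedy algorithm discussed in this proof, assuming the input already has the stated ordering property (increasing weight corresponds to decreasing profit). The function name is hypothetical. It stops at the breaking object, since by the argument above no later object can fit either.

```python
def ordered_knapsack_greedy(weights, profits, capacity):
    """Greedy for the ordered knapsack: take the lightest (most profitable) items first.

    Assumes sorting by increasing weight also sorts by decreasing profit.
    Returns (indices of chosen items, total profit).
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    chosen = []
    load = 0
    for i in order:
        if load + weights[i] > capacity:
            # Breaking object: every later item is at least as heavy,
            # so none of them fits into the remaining capacity either.
            break
        chosen.append(i)
        load += weights[i]
    return chosen, sum(profits[i] for i in chosen)
```

With weights [1, 2, 4, 7], profits [10, 8, 5, 3] and M = 8, the greedy takes the first three items for a profit of 23, which is also what exhaustive search finds for this instance.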
Note that the greedy algorithm is also useful for the general knapsack problem: taking the better of the greedy solution and a single item with maximum profit yields an approximation algorithm with ratio 2.
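A sketch of that ratio-2 approximation for the general 0-1 knapsack, assuming positive weights. Following the de-trivialization above, items that do not fit on their own are discarded first. The greedy prefix stops at the breaking item; the optimum is at most the prefix profit plus the breaking item's profit, and each of those is at most the returned value, which gives the factor 2. The function name is hypothetical.

```python
def greedy_two_approx(weights, profits, capacity):
    """Best of (greedy prefix by profit/weight ratio, best single item).

    Assumes all weights are positive. The returned profit is always
    achievable and is at least half of the optimum.
    """
    # Drop items that cannot fit even on their own.
    items = [i for i in range(len(weights)) if weights[i] <= capacity]
    if not items:
        return 0
    items.sort(key=lambda i: profits[i] / weights[i], reverse=True)

    prefix_profit = 0
    load = 0
    for i in items:
        if load + weights[i] > capacity:
            break  # breaking item: stop the greedy prefix here
        load += weights[i]
        prefix_profit += profits[i]

    best_single = max(profits[i] for i in items)
    return max(prefix_profit, best_single)
```

With weights [1, 10], profits [2, 10] and M = 10, the greedy prefix alone only earns 2, but the single best item earns 10, which is optimal here.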

Isn't this a correct but very efficient and simple way of solving the 0-1 knapsack?

As I understand it, in a 0-1 knapsack problem, each object can be taken at most once (0 or 1 times). Wouldn't it be better to just divide every value by its weight to get the respective ratios, and then take items in order of decreasing ratio and put them in the knapsack until the maximum allowed weight is reached? Wouldn't its time complexity be better than the dynamic programming solution, and obviously better than brute force?
The point of the 0-1 knapsack problem is to decide, for each item, whether the maximum value is reached with that item in the knapsack or left out. This avoids the problem where including an item leaves space in the knapsack that can no longer be filled. A greedy approach that always includes the next object that fits can leave such unfillable space and miss the optimum.
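A small counterexample makes this concrete. The sketch below compares greedy-by-ratio with the standard dynamic programming solution over capacities; it assumes integer weights, and the function names are illustrative. The greedy even skips items that do not fit and keeps going, yet it still loses.

```python
def greedy_by_ratio(weights, values, capacity):
    """Take items in decreasing value/weight order whenever they still fit."""
    order = sorted(range(len(weights)), key=lambda i: values[i] / weights[i], reverse=True)
    load = total = 0
    for i in order:
        if load + weights[i] <= capacity:
            load += weights[i]
            total += values[i]
    return total


def knapsack_dp(weights, values, capacity):
    """Exact 0-1 knapsack in O(n * capacity), assuming integer weights."""
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


weights, values = [10, 20, 30], [60, 100, 120]
print(greedy_by_ratio(weights, values, 50))  # 160
print(knapsack_dp(weights, values, 50))      # 220
```

The greedy fills 30 of the 50 units with the two best-ratio items and then cannot fit the third, leaving 20 units unused, while the optimum takes the second and third items.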

Algorithms for bucketizing integers into buckets with zero sums

Suppose we have an array of integers (both negative and positive) A[1 ... n] such that all the elements sum to zero. Now, whenever I have a bunch of integers that sum to zero, I will call them a group and I want to split A in as many disjoint groups as possible. Can you suggest any paper discussing this very same problem?
It sounds like your problem consists of two NP-Complete problems.
The first would be finding all subsets that solve the Subset Sum problem. This problem does have an exponential time complexity (as implied by amit in the comments), but it is a very reasonable extension of the Subset Sum problem from a theoretical standpoint. For example, if you can solve the Subset Sum problem by dynamic programming and generate the canonical 2D array as a result, this array will contain enough information to generate all possible solutions using a traceback.
The second NP-Complete problem embedded within your problem is the Integer Linear Programming problem. Given all possible subsets solving the Subset Sum problem, N in total, we want to select 0<=n<=N of them such that n is maximized and no element of A is repeated.
I doubt there is a publication devoted to describing this problem because it seems to involve a straightforward application of known theory.
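For small n, an exact answer can be computed without enumerating subset-sum solutions explicitly. The sketch below is a swapped-in technique (an exponential bitmask DP, not something from the answer above), assuming the elements of A sum to zero as stated in the question. best[mask] is the maximum number of zero-sum prefixes over all orderings of the elements in mask; an ordering with g zero-sum prefixes of a zero-sum set splits into g zero-sum groups, and vice versa. It runs in O(2^n * n), consistent with the problem being hard in general.

```python
def max_zero_sum_groups(a):
    """Max number of disjoint zero-sum groups partitioning a (assumes sum(a) == 0)."""
    n = len(a)
    size = 1 << n
    # total[mask] = sum of the elements selected by mask.
    total = [0] * size
    for mask in range(1, size):
        low = mask & -mask
        total[mask] = total[mask ^ low] + a[low.bit_length() - 1]

    # best[mask] = max count of non-empty zero-sum prefixes over orderings of mask.
    best = [0] * size
    for mask in range(1, size):
        m = 0
        sub = mask
        while sub:
            low = sub & -sub          # try placing this element last
            m = max(m, best[mask ^ low])
            sub ^= low
        best[mask] = m + (1 if total[mask] == 0 else 0)
    return best[size - 1]
```

For instance, [3, -1, -2, 1, -1] splits into {1, -1} and {3, -1, -2}, so the sketch returns 2.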

Finding a single cluster of points with low variance

Given a collection of points in the complex plane, I want to find a "typical value", something like mean or mode. However, I expect that there will be a lot of outliers, and that only a minority of the points will be close to the typical value. Here is the exact measure that I would like to use:
Find the mean of the largest set of points with variance less than some programmer-defined constant C
The closest thing I have found is the article Finding k points with minimum diameter and related problems, which gives an efficient algorithm for finding a set of k points with minimum variance, for some programmer-defined constant k. This is not useful to me because the number of points close to the typical value could vary a lot and there may be other small clusters. However, incorporating the article's result into a binary search algorithm shows that my problem can be solved in polynomial time. I'm asking here in the hope of finding a more efficient solution.
Here is a way to do it (based on how I understood the problem):
Select a point k from the dataset and sort the points in ascending order of their distance from k, in O(N log N).
Treating k as the mean, add points from the sorted list to the set while the variance stays below C, then stop.
Do this for every point.
Keep track of the largest set.
Time complexity: O(N^2 log N), where N is the size of the dataset.
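A sketch of these steps, assuming the points are Python complex numbers and reading "variance with k as mean" as the mean squared distance to k. Because points are added in order of increasing distance, that quantity never decreases, so stopping at the first violation is safe. The function name is hypothetical.

```python
def largest_low_variance_set(points, c):
    """For each candidate center k, grow a set of nearest points while the
    mean squared distance to k stays below c; keep the largest set found.

    Returns (largest set, its mean) or ([], None) if no set qualifies.
    """
    best = []
    for center in points:
        ordered = sorted(points, key=lambda p: abs(p - center))  # O(N log N)
        chosen = []
        sq_sum = 0.0
        for p in ordered:
            d2 = abs(p - center) ** 2
            # Variance computed with the center k as the mean.
            if (sq_sum + d2) / (len(chosen) + 1) >= c:
                break
            chosen.append(p)
            sq_sum += d2
        if len(chosen) > len(best):
            best = chosen
    mean = sum(best) / len(best) if best else None
    return best, mean
```

On [0, 0.1, 0.1j, -0.1, 10, 20+5j] with C = 0.5, the sketch keeps the four points near the origin and reports their mean, 0.025j.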
Mode-seeking algorithms such as Mean-Shift clustering may still be a good choice.
You could then just keep the mode with the largest set of points that has variance below the threshold C.
Another approach would be to run k-means with a fairly large k. Then remove all points that contribute too much to variance, decrease k and repeat. Even though k-means does not handle noise very well, it can be used (in particular with a large k) to identify such objects.
Or you might first run some simple outlier detection methods to remove these outliers, then identify the mode within the reduced set only. A good candidate method is 1NN outlier detection, which should run in O(n log n) if you have an R-tree for acceleration.
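A brute-force sketch of 1NN outlier detection, assuming complex-valued points and a hypothetical distance threshold chosen by the programmer. This version is O(n^2); the O(n log n) bound mentioned above relies on a spatial index such as an R-tree, which this sketch does not implement.

```python
def remove_1nn_outliers(points, threshold):
    """Keep only points whose nearest neighbour lies within threshold.

    Brute force O(n^2); a spatial index would speed up the
    nearest-neighbour queries.
    """
    kept = []
    for i, p in enumerate(points):
        nn = min((abs(p - q) for j, q in enumerate(points) if j != i),
                 default=float("inf"))
        if nn <= threshold:
            kept.append(p)
    return kept
```

The mode or mean can then be computed on the reduced list only.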

Maximum two-dimensional subset-sum

I'm given a task to write an algorithm to compute the maximum two-dimensional subset sum of a matrix of integers. However, I'm not interested in help with such an algorithm; I'm more interested in knowing the best worst-case complexity with which this can possibly be solved.
Our current algorithm is roughly O(n^3).
I've been considering something like divide and conquer: splitting the matrix into a number of sub-matrices, summing the elements within each one, and thereby limiting the number of matrices one has to consider in order to find an approximate solution.
The worst case is definitely no worse than O(n^3): fix a pair of top and bottom rows and run Kadane's algorithm on the column sums. There are several descriptions of this on the web.
The best case can be far better. If all of the elements are non-negative, the answer is the matrix itself. If all of the elements are non-positive, the answer is the element whose value is closest to zero. Both cases can be detected with a single O(n^2) pass over the matrix.
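One common way to reach O(n^3) for an n x n matrix is the row-pair plus Kadane approach; this is a sketch of that standard technique, not necessarily the asker's own algorithm. It assumes a non-empty rectangular list of lists, and the function name is illustrative.

```python
def max_submatrix_sum(m):
    """Maximum sum of a contiguous sub-matrix in O(rows^2 * cols)."""
    rows, cols = len(m), len(m[0])
    best = m[0][0]
    for top in range(rows):
        # col_sums[c] = sum of m[top..bottom][c] for the current bottom row.
        col_sums = [0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += m[bottom][c]
            # Kadane's algorithm on the 1-D array of column sums.
            cur = col_sums[0]
            best = max(best, cur)
            for c in range(1, cols):
                cur = max(col_sums[c], cur + col_sums[c])
                best = max(best, cur)
    return best
```

Initializing with the first element rather than 0 keeps the all-negative case correct: the result is then the largest single element.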
Likewise if there are entire rows/columns on the edges of your matrix that are nothing but non-positive integers, you can chop these off in your search.
I've figured that there isn't a better way to do it, at least not one known to man yet.
And I'm going to stick with the solution I have, mainly because it's simple.
