Knapsack variation / combinatorial optimization - algorithm

I have a problem I need to solve that seems to be close to a Knapsack problem, but taken the other way around.
I was wondering if anyone knew the name of the exact problem - if it exists in combinatorics or combinatorial optimization - or if not, if you would have any leads or advice.
Basically, my problem is:
I have a finite set of products. Each product has an associated value.
I also have a total for those products, made up of a total value and a total quantity.
I want to find which products make up that total (matching both quantity and value).
More information :
1) My values are floats and are not unique.
2) The quantities of products are in [0, +inf)
3) I actually want to get the list of possible subsets
And basically, every total corresponds to a daily total.
The next day's products that compose the next day's total may come from both its daily products and the leftovers from the previous days.
Thank you for any indication or input on how to solve this problem.

Related

calculating the Optimal list of things

problem statement
There are 5 buckets, each containing many different products. Each product can repeat multiple times within the same bucket and across buckets. We want to select products based on two conditions:
The product's count should be considerably high across the buckets
The same product should have a considerably high count in each segment
I want the list of products that satisfy these conditions.
I tried knapsack algorithms, providing random weights and profits. It seems like the wrong approach.
In linear time you can find:
The total count of each product across all buckets
The count of each product in the bucket which has the least of it
If linear time is too slow, in sublinear time you could estimate the same by sampling the buckets.
In either case, you have the information (or an estimate thereof) that you need to make a decision.
After doing the above, you need to decide how you want to pick products -- essentially how you want to trade off the min vs total per product.
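A minimal sketch of the two linear-time statistics described above, assuming each bucket is simply a list of product names. The function names and the two thresholds (min_total, min_per_bucket) are illustrative assumptions; how to trade off the minimum against the total is still a decision you have to make yourself.

```python
from collections import Counter


def product_stats(buckets):
    """Return {product: (total_count, lowest_count_in_any_bucket)}."""
    per_bucket = [Counter(bucket) for bucket in buckets]
    totals = Counter()
    for counts in per_bucket:
        totals.update(counts)
    # A Counter returns 0 for a product missing from a bucket.
    return {
        product: (total, min(counts[product] for counts in per_bucket))
        for product, total in totals.items()
    }


def pick_products(buckets, min_total, min_per_bucket):
    """Hypothetical selection rule: keep products clearing both thresholds."""
    stats = product_stats(buckets)
    return sorted(
        product
        for product, (total, lowest) in stats.items()
        if total >= min_total and lowest >= min_per_bucket
    )
```

For example, pick_products(buckets, min_total=3, min_per_bucket=1) keeps only products that appear at least 3 times overall and at least once in every bucket.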

Algorithmic Identification / Engineering

Suppose you have a warehouse that receives orders during the day. These orders can be either withdrawals or additions of products to the warehouse. At the end of the day you get an inventory list of the items still contained in the warehouse. Because the workforce is stretched quite thin, it can happen that an order is not taken care of on the same day it is received in the warehouse's ticket system. Therefore, at the end of the day you have to match the issued orders against the warehouse's inventory list to find out which ones have actually been executed and which orders are still open.
Code-wise, I've been solving this with several nested loops that aggregate and compare the inventory positions while trying to match the orders. Unfortunately this is not really efficient, and with a large number of orders and positions the resulting problem takes quite some time to complete.
To improve on that, I want to identify the underlying problem. That is, is it Set Cover, Knapsack or something else, and, based on the problem and whether it is in P or NP, is there an efficient algorithm or at least an efficient heuristic to solve it?
As stated, we know you have the following sources of information:
an end-of day inventory list of items still contained in the warehouse
"issued orders" which are to be matched against the inventory list above
a "ticket system" about which nothing is known
"orders" coming in during the day, but we've no idea if or how they're stored
One solution is to create hash sets from the current and previous days' inventory lists, then as you iterate "issued orders", compare the order quantity with the difference between the inventory sets.
The time for this is:
the time to create two sets from unsorted (as far as we know) lists (if there's reason to care, the set for today can be kept for re-use tomorrow, halving this cost) - this is O(n) in the hash set size, and
the time to iterate over the "issued orders" and do two O(1) look-ups in the inventory hash sets: that's O(n) where n is the number of orders
Sounds pretty fast to me.
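One way this could look in code, as a rough sketch under assumptions the question does not confirm: inventories are dictionaries mapping item to quantity, each order is a hypothetical (order_id, item, signed_quantity) tuple with withdrawals negative, and at most one order touches a given item per day. If several orders can touch the same item, deciding which of them explains the change becomes a subset-sum question per item, and this simple comparison is no longer enough.

```python
def match_orders(previous_inventory, current_inventory, orders):
    """Split orders into (executed, still_open) by comparing inventory changes.

    Assumes one order per item; orders are (order_id, item, signed_quantity).
    """
    items = set(previous_inventory) | set(current_inventory)
    # Net change per item between the two end-of-day inventory lists.
    delta = {
        item: current_inventory.get(item, 0) - previous_inventory.get(item, 0)
        for item in items
    }
    executed, still_open = [], []
    for order_id, item, quantity in orders:
        # O(1) look-up per order: did the stock move by exactly this amount?
        if delta.get(item, 0) == quantity:
            executed.append(order_id)
        else:
            still_open.append(order_id)
    return executed, still_open
```

Building delta is linear in the number of inventory positions and the loop is linear in the number of orders, which matches the cost estimate above.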

maximize profit with n products satisfying certain constraints

I am given a list of n products with associated profits and costs per unit. The aim is to maximize the profits while keeping the total cost below some threshold. For each product either one or zero are produced.
Now suppose we have three products, labelled 1, 2 and 3. Then all possible production combinations can be written as the binary numbers 111, 110, 101, 011, 100, 010, 001 and 000, where a 1 in the i-th position means one unit of product i is produced and a 0 means none is. We could then easily check which of these combinations has a production cost under the threshold and the maximum profit. This algorithm is O(2^n), because for n products we have to check 2^n binary numbers. We can probably make it a little faster by recognizing that if 100 is already above the threshold we need not check 110 and 111, and similar tricks, but the order will not change because of this. How can I design a smarter algorithm with a better time complexity? n can be as large as 100, in which case checking 2^100 numbers is not feasible. Thanks in advance.
If your costs are integers that are not too big, you can use the dynamic programming solution for the knapsack problem, which is listed in the link mentioned in David Eisenstat's comment. If your costs are either big integers or fractional, then your best bet is using one of the existing knapsack solvers that e.g. reduce to an integer linear programming problem and then do something like branch and bound in order to solve. At any rate, your problem IS the knapsack problem, with the only slight modification that you don't have to fill the knapsack completely, you can fill it partially as long as you don't overfill it. However this variant is also studied along with the original formulation, and there are solvers for it. Also it is easy to modify the dynamic programming solution to handle this, let me know if it's unclear how and I'll update my answer with an explanation.
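For the integer-cost case, a minimal sketch of the standard 0/1 knapsack dynamic program might look like the following. It assumes non-negative integer costs and an integer budget (the threshold); the function name and return shape are illustrative. Because the table stores the best profit for total cost at most c, the "fill partially, never overfill" variant is handled directly.

```python
def knapsack(profits, costs, budget):
    """Return (max_profit, chosen_indices) for the 0/1 knapsack problem.

    Assumes costs are non-negative integers and budget is an integer.
    """
    n = len(profits)
    # best[i][c] = max profit using the first i products with total cost <= c
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        profit, cost = profits[i - 1], costs[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # skip product i-1
            if cost <= c and best[i - 1][c - cost] + profit > best[i][c]:
                best[i][c] = best[i - 1][c - cost] + profit  # produce it
    # Walk the table backwards to recover which products were chosen.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= costs[i - 1]
    chosen.reverse()
    return best[n][budget], chosen
```

The running time is O(n * budget), so for n = 100 it stays practical as long as the threshold, measured in cost units, is not huge.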

Group incoming and outgoing invoices to make their sum 0

I've faced an interesting problem today, and decided to write an algorithm in C# to solve it.
There are incoming invoices with negative totals and outgoing invoices with positive totals. The task is to form groups of these invoices in which the invoice totals add up to exactly 0. Each group can contain an unlimited number of members, so if there are two positive members and one negative member but their total value is 0, that's okay.
We try to minimize the sum of the remaining invoices' totals, and there are no other constraints at all.
I'm wondering if this problem could be traced back to a known problem, and if not, which would be the most effective way to do this. The naive approach would be to separate incoming and outgoing invoices into two different groups, sort by total, then to try add invoices one by one until zero is reached or the sign has changed. However, this presumes that the invoices in a group should be approximately of the same magnitude, which is not true (one huge incoming invoice could be put against 10 smaller outgoing ones)
Any ideas?
The problem you are facing is a well known and studied one, and is called The Subset Sum Problem.
Unfortunately, the problem is NP-Complete, so there is no known polynomial solution for it (1).
In fact, there is no known polynomial solution to even determine if such a subset (even a single one) exists, let alone find it.
However, if your input consists of relatively small (absolute value) integers, there is a pretty efficient (pseudo polynomial) dynamic programming solution that can be utilized to solve the problem.
If this is not the case some other alternatives are:
Using an exponential solution such as brute force (you might be able to optimize it with a branch and bound technique)
Heuristic solutions, such as Steepest Ascent Hill Climbing or Genetic Algorithms.
Approximation algorithms
(1) And most computer science researchers believe one does not exist, this is basically the P VS NP Problem.
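A rough sketch of the pseudo-polynomial dynamic programming idea, assuming the invoice totals are integers (for example, amounts in cents). The function name is illustrative, and it only finds one zero-sum group; it does not try to minimize the total of the remaining invoices, which would need repeated application or a more elaborate search.

```python
def zero_sum_group(amounts):
    """Return indices of one non-empty subset of amounts summing to 0, or None.

    Assumes integer amounts; work is bounded by the number of distinct
    reachable sums, hence pseudo-polynomial.
    """
    reachable = {}  # sum -> indices of one non-empty subset with that sum
    for i, value in enumerate(amounts):
        additions = {}
        if value not in reachable:
            additions[value] = [i]
        for total, subset in reachable.items():
            new_total = total + value
            if new_total not in reachable and new_total not in additions:
                additions[new_total] = subset + [i]
        reachable.update(additions)
        if 0 in reachable:
            return reachable[0]
    return None
```

For instance, with amounts [500, -200, -300, 150] the group found consists of the first three invoices, leaving 150 unmatched.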

how to represent values of stock in a polynom?

I'm doing a project on genetic algorithms, and we need to build software that chooses a set of stocks based on their history.
We need to do it with genetic programming, which means we need a fitness function and a chromosome.
Right now I'm thinking of defining the fitness function as the positive difference between the average historical price of a stock and its real value (so if they match it will be 0).
Does anyone have any idea how to express the chromosome?
The problem doesn't seem to be well-defined. The fitness function you mentioned would give you a selection of stocks whose prices hover around their actual values, provided you know the actual value of the stocks.
Other possibilities:
First scenario: You are trying to select a set of the most promising stocks based on their historical performance, i.e. maximize expected return and/or minimize variance/risk. If the number of possible stocks to choose from is not large, the simplest option is a binary string: 0 representing no selection and 1 representing selection. The position corresponds to the index of the stock. If you have a very large number of possible stocks to choose from, you can encode the labels/indices of the stocks as your chromosome. This might mean a variable-length chromosome if you do not have a maximum cap on the number of stocks to be selected, and it would be harder to code.
The fitness function (to be maximized) would be the sum of (expected return - standard deviation) over the selected stocks. The expected return could be formulated in two ways: expected future price - current price, or current price - underlying value (if you know the underlying value, that is). The expected future price can be estimated from historical data (e.g. fit a simple curve of your choice, or apply ARIMA and extend to the next time points). The standard deviation can be estimated directly from historical data.
If your chromosome is binary (values are 0/1), once you have the expected return and standard deviation, a simple dot product would do the computation needed. I suppose there may be a cap also on the number of stocks selected, in which case you have a constrained optimization problem. You can represent constraints as penalties in the fitness.
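As an illustration of the dot-product fitness with a penalty term, here is a minimal sketch. It assumes the per-stock expected returns and standard deviations have already been estimated; the parameter names max_selected and penalty, and the linear form of the penalty, are assumptions made for the example rather than anything prescribed above.

```python
def fitness(chromosome, expected_returns, std_devs,
            max_selected=None, penalty=1000.0):
    """Fitness of a binary chromosome (1 = stock selected), to be maximized.

    Sums (expected return - standard deviation) over selected stocks and
    subtracts a penalty for each stock beyond an optional cap.
    """
    # Dot product of the 0/1 chromosome with the per-stock score.
    score = sum(
        bit * (ret - dev)
        for bit, ret, dev in zip(chromosome, expected_returns, std_devs)
    )
    if max_selected is not None:
        excess = sum(chromosome) - max_selected
        if excess > 0:
            score -= penalty * excess  # constraint handled as a penalty
    return score
```

A chromosome that violates the cap is still evaluated, just scored poorly, which lets the GA move through infeasible candidates instead of discarding them outright.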
The problem is essentially a binary integer linear program (BILP) and you can benchmark the GA against other bilp solvers. With a decent mixed integer linear programming solver (e.g. symphony, gurobi, ibm cplex,etc), you can usually solve large problems faster than with a GA.
Second scenario: You are trying to find how many of which stocks to buy at the current price to maximise expected return. Your chromosome here would consist of non-negative integers, unless you want to represent shorting. The fitness would still be the same as in the first scenario, i.e. sum of prices of selected stocks, averaged over time, minus standard deviation of historical prices of selected stocks over time. The problem becomes an integer linear programming problem. Everything else is the same as in the first scenario. Again, if the number of stocks from which you can choose is large, you will find that a MILP solver would serve you much, much better than a GA.
Further, GP (genetic programming) is sufficiently different from GA.
If you are trying to evolve a stock selection strategy, or an expression that predicts future stock prices, you actually need GP. For the stock selection problem, a GA would be sufficient.
