Problem statement
There are 5 buckets, each containing many different products. A product can appear multiple times within the same bucket and across buckets. We want to select products based on two conditions:
A product's total count across all buckets should be considerably high
The same product should also have a considerably high count in each individual bucket
I want the list of products that satisfies both conditions.
I tried knapsack algorithms, assigning random weights and profits, but that seems like the wrong approach.
In linear time you can find:
The total count of each product across all buckets
The count of each product in the bucket that has the fewest of it
If linear time is too slow, in sublinear time you could estimate the same by sampling the buckets.
In either case, you have the information (or an estimate thereof) that you need to make a decision.
After doing the above, you need to decide how you want to pick products -- essentially how you want to trade off each product's minimum per-bucket count against its total count.
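To make that concrete, here is a minimal Python sketch of the linear-time pass, assuming each bucket is just a list of product identifiers; the two threshold parameters are placeholders for whatever trade-off you settle on:

```python
from collections import Counter

def pick_products(buckets, min_total, min_per_bucket):
    """Return products whose total count across all buckets is at least
    min_total and whose count in every single bucket is at least
    min_per_bucket.  buckets: list of lists of product identifiers."""
    counts = [Counter(bucket) for bucket in buckets]   # one pass per bucket
    total = Counter()
    for c in counts:
        total.update(c)                                # total count per product

    selected = []
    for product, t in total.items():
        worst = min(c[product] for c in counts)        # count in the poorest bucket
        if t >= min_total and worst >= min_per_bucket:
            selected.append(product)
    return selected

buckets = [["a", "a", "b"], ["a", "b", "b"], ["a", "b", "c"]]
print(pick_products(buckets, min_total=3, min_per_bucket=1))   # ['a', 'b']
```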
I have a problem I need to solve that seems to be close to a Knapsack problem, but taken the other way around.
I was wondering if anyone knew the name of the exact problem - if it exists in combinatorics or combinatorial optimization - or, if not, whether you would have any leads or advice.
Basically, my problem is:
I have a finite set of products. All those products have a value associated.
I have a total associated with those products: a total value and a total quantity.
I want to find which products compose my total (both quantity and value).
More information :
1) My values are floats and are not unique.
2) The quantities of products are in [0, +∞).
3) I actually want to get the list of possible subsets
And basically, every total corresponds to a daily total.
The next day's products that compose the next day's total may come from both its daily products and the leftovers from the previous days.
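For concreteness, here is a minimal brute-force sketch of the kind of enumeration I have in mind (the function and its signature are purely illustrative; it is exponential in the worst case and only viable for small instances):

```python
def find_compositions(values, total_qty, total_value, eps=1e-6):
    """Enumerate multisets of products whose combined quantity equals
    total_qty and whose combined value is within eps of total_value.
    values: per-unit value of each distinct product (duplicates allowed,
    assumed non-negative).  Float totals are compared with a tolerance."""
    n = len(values)
    solutions = []

    def rec(i, qty_left, val_left, chosen):
        if qty_left == 0:                      # all units accounted for
            if abs(val_left) <= eps:           # value matches too: record it
                solutions.append(chosen)
            return
        if i == n:
            return
        for q in range(qty_left, -1, -1):      # take q units of product i
            cost = q * values[i]
            if cost > val_left + eps:          # too expensive, skip
                continue
            rec(i + 1, qty_left - q, val_left - cost,
                chosen + [(i, q)] if q else chosen)

    rec(0, total_qty, total_value, [])
    return solutions                           # lists of (product index, quantity)

# Example: product values 2.5 and 1.0; a day's total of 4 units worth 7.0
print(find_compositions([2.5, 1.0], total_qty=4, total_value=7.0))
# -> [[(0, 2), (1, 2)]]
```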
Thank you for any indication or input on how to solve this problem.
I have two questions about the differences between pointwise and pairwise learning-to-rank algorithms on DATA WITH BINARY RELEVANCE VALUES (0s and 1s). Suppose the loss function for a pairwise algorithm counts the number of times an entry with label 0 gets ranked before an entry with label 1, and that for a pointwise algorithm measures the overall difference between the estimated relevance values and the actual relevance values.
So my questions are: 1) theoretically, will the two groups of algorithms perform significantly differently? 2) will a pairwise algorithm degrade to a pointwise algorithm in such settings?
thanks!
In pointwise estimation, the errors across rows in your data (rows pairing items with users, where you want to rank items within each user/query) are assumed to be independent, rather like normally distributed errors. In pairwise evaluation, the loss function often used is cross-entropy over pairs: a relative measure of how accurately the informative pairs (those where one item is labelled better than the other) are ordered.
So chances are that the pairwise approach is likely to learn better than the pointwise one.
The only exception I can see is a business scenario in which users click items without evaluating/comparing them against one another, per se. This is highly unlikely, though.
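To see that the two losses can genuinely disagree, here is a small sketch using the losses described in the question (squared error pointwise, misranked-pair count pairwise) on made-up scores; model b gets the lower pointwise loss even though model a ranks perfectly:

```python
import itertools

def pointwise_loss(scores, labels):
    """Sum of squared differences between scores and the 0/1 labels."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))

def pairwise_loss(scores, labels):
    """Count informative pairs (one label 1, one label 0) where the
    0-labelled item is scored at or above the 1-labelled item."""
    bad = 0
    for (s_i, y_i), (s_j, y_j) in itertools.combinations(zip(scores, labels), 2):
        if y_i == y_j:
            continue                           # uninformative pair, skipped
        pos, neg = (s_i, s_j) if y_i == 1 else (s_j, s_i)
        if neg >= pos:
            bad += 1
    return bad

labels = [1, 1, 0, 0]
a = [0.60, 0.55, 0.45, 0.40]   # perfect ranking, mediocre calibration
b = [0.99, 0.55, 0.60, 0.01]   # better calibrated overall, one misranked pair
print(pointwise_loss(a, labels), pairwise_loss(a, labels))   # ~0.725 0
print(pointwise_loss(b, labels), pairwise_loss(b, labels))   # ~0.563 1
```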
Suppose you have a warehouse that receives orders during the day. These orders can be either withdrawals from or additions of products to the warehouse. At the end of the day you get an inventory list of the items still contained in the warehouse. Because the workforce is stretched quite thin, it can happen that an order is not taken care of on the same day it is received in the warehouse's ticket system. Therefore, at the end of the day you have to match the issued orders against the warehouse's inventory list to find out which ones have actually been executed and which orders are still open.
Code-wise, I've been solving this with several nested loops that aggregate and compare the inventory positions to match the orders. Unfortunately this is not very efficient, and with a large number of orders and positions it takes quite some time to complete.
To improve that, I want to identify the underlying problem: is it Set Cover, Knapsack, or something else? And based on the problem and whether it is in P or NP, is there an efficient algorithm, or at least an efficient heuristic, to solve it?
As stated, we know you have the following sources of information:
an end-of-day inventory list of items still contained in the warehouse
"issued orders" which are to be matched against the inventory list above
a "ticket system" about which nothing is known
"orders" coming in during the day, but we've no idea if or how they're stored
One solution is to create hash tables (mapping item to quantity) from the current and previous days' inventory lists, then, as you iterate over the "issued orders", compare each order's quantity with the difference between the two inventory tables.
The time for this is:
the time to create two hash tables from unsorted (as far as we know) lists (if there's reason to care, today's table can be kept for re-use tomorrow, halving this cost) - this is O(n) in the inventory size, and
the time to iterate over the "issued orders" and do two O(1) look-ups in the inventory tables: that's O(n) where n is the number of orders
Sounds pretty fast to me.
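A minimal sketch of that approach, assuming each order carries a signed quantity (positive for additions, negative for withdrawals); note that the greedy assignment below is a simplification, since in general several different subsets of orders could explain the same inventory change, which is where the subset-sum flavour of your problem hides:

```python
def match_orders(inventory_yesterday, inventory_today, issued_orders):
    """Split issued orders into executed vs. still open by checking each
    order's quantity against the unexplained inventory change per item.
    Inventories are dicts mapping item -> quantity on hand."""
    change = {
        item: inventory_today.get(item, 0) - inventory_yesterday.get(item, 0)
        for item in set(inventory_today) | set(inventory_yesterday)
    }
    executed, still_open = [], []
    for item, delta in issued_orders:              # O(1) look-ups per order
        rem = change.get(item, 0)
        if rem >= delta > 0 or rem <= delta < 0:   # order fits the remaining change
            change[item] = rem - delta
            executed.append((item, delta))
        else:
            still_open.append((item, delta))
    return executed, still_open

yesterday = {"widget": 10, "gadget": 5}
today     = {"widget": 7,  "gadget": 6}
orders = [("widget", -3), ("widget", -2), ("gadget", 1)]
print(match_orders(yesterday, today, orders))
# ([('widget', -3), ('gadget', 1)], [('widget', -2)])
```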
I am trying to come up with an algorithm for the following problem.
There is a set of N objects with M different variations of each object. The goal is to find which variation is the best for each object based on feedback from different users.
At the end, the users will be placed in a category to determine which category prefers which variation.
It is required that at most two variations of an object are placed side by side.
The problem with this is that if M is large, the number of possible combinations becomes too large, and the user may lose interest, potentially skewing the results.
The Elo algorithm/score can be used once I know the user's order of selection, as discussed in this post:
Comparison-based ranking algorithm
Question:
Is there an algorithm that can reduce the number of possible combinations presented to a user and still recover the correct order?
Example: 7 different types of fruit, each available in 5 different sizes. The users give a ranking of 1-5 for each fruit based on the size they prefer. This means that for each fruit there are at most 10 pairings the user has to choose from (since the two sizes in a pair must differ, there is no point presenting {1,1}). How would I reduce those "10 combinations"?
If the user's preferences are always consistent with a total order, and you can adapt each comparison to the results of the comparisons made so far, you just need an efficient sorting algorithm. For 5 items you need a minimum of 7 comparisons - see Sorting 5 elements with minimum element comparison. You could also look at http://en.wikipedia.org/wiki/Sorting_network.
In general, when you are trying to produce some sort of experimental design, you will often find that making random comparisons, although not optimum, isn't too far away from the best possible answer.
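To see the shape of such a procedure, here is a minimal sketch: a binary-insertion sort in which each comparison is a question put to the user (the prefer callback and the hidden scores are purely illustrative). It asks at most 8 questions for 5 items; the minimum-comparison merge-insertion scheme from the linked question gets that down to the optimal 7, at the cost of more bookkeeping:

```python
def sort_by_queries(items, prefer):
    """Rank items most-preferred first, asking the user one comparison
    at a time.  prefer(a, b) returns True if the user prefers a over b."""
    ranked = []
    questions = 0
    for item in items:
        lo, hi = 0, len(ranked)
        while lo < hi:                  # binary search for the insertion point
            mid = (lo + hi) // 2
            questions += 1
            if prefer(item, ranked[mid]):
                hi = mid
            else:
                lo = mid + 1
        ranked.insert(lo, item)
    return ranked, questions

# Simulate a user whose true preferences follow a hidden score
hidden = {"s1": 3, "s2": 5, "s3": 1, "s4": 4, "s5": 2}
order, asked = sort_by_queries(list(hidden), lambda a, b: hidden[a] > hidden[b])
print(order, asked)                     # ['s2', 's4', 's1', 's5', 's3'], <= 8 questions
```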
I'm doing a project on genetic algorithms, and we need to build software that chooses a set of stocks based on their history.
We need to do it with genetic programming, which means we need a fitness function and a chromosome.
Right now I'm thinking of a fitness function based on the positive difference between a stock's average historical price and its real value (so if they match, the fitness is 0).
Does anyone have any idea how to express the chromosome?
The problem doesn't seem to be well-defined. The fitness function you mention would give you a selection of stocks whose prices hover around their actual values, provided you know the stocks' actual values.
Other possibilities:
First scenario: you are trying to select a set of the most promising stocks based on their historical performance, i.e. maximize expected return and/or minimize variance/risk. If the number of possible stocks to choose from is not large, the simplest option is a binary string: 0 representing no selection and 1 representing selection, with each position corresponding to the index of a stock. If you have a very large number of possible stocks, you can instead encode the labels/indices of the selected stocks as your chromosome. This may mean a variable-length chromosome if you do not cap the number of stocks to be selected, and it would be harder to code.
The fitness function (to be maximized) would be the sum of (expected return - standard deviation) over the selected stocks. The expected return could be formulated in two ways: expected future price - current price, or current price - underlying value (if you know the underlying value, that is). The expected future price can be estimated from historical data (e.g. fit a simple curve of your choice, or apply ARIMA and extrapolate to the next time points). The standard deviation can be estimated directly from historical data.
If your chromosome is binary (values are 0/1), then once you have the expected returns and standard deviations, a simple dot product does the computation needed. There may also be a cap on the number of stocks selected, in which case you have a constrained optimization problem; you can represent constraints as penalties in the fitness.
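As a minimal sketch of the binary encoding and penalty fitness (all numbers are made up, and the cap and penalty weight are illustrative assumptions):

```python
import random

# Illustrative per-stock estimates derived from history (made-up numbers)
exp_return = [0.12, 0.07, 0.15, 0.03, 0.09]
std_dev    = [0.08, 0.02, 0.20, 0.01, 0.05]
MAX_STOCKS = 3                          # hypothetical cap on portfolio size
PENALTY    = 10.0                       # penalty weight for violating the cap

def fitness(chromosome):
    """Binary chromosome: chromosome[i] == 1 selects stock i.
    Fitness = dot product of (expected return - std dev) with the
    selection, minus a penalty when the cap is exceeded."""
    score = sum(c * (r - s) for c, r, s in zip(chromosome, exp_return, std_dev))
    excess = max(0, sum(chromosome) - MAX_STOCKS)
    return score - PENALTY * excess

chromo = [random.randint(0, 1) for _ in exp_return]
print(chromo, fitness(chromo))
```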
The problem is essentially a binary integer linear program (BILP), and you can benchmark the GA against other BILP solvers. With a decent mixed-integer linear programming solver (e.g. SYMPHONY, Gurobi, IBM CPLEX, etc.), you can usually solve large problems faster than with a GA.
Second scenario: you are trying to find how many of which stocks to buy at the current price to maximize expected return. Your chromosome here would be non-negative integers, unless you want to represent shorting. The fitness would still be the same as in the first scenario, i.e. the sum of the selected stocks' prices, averaged over time, minus the standard deviation of the selected stocks' historical prices over time. The problem becomes an integer linear programming problem. Everything else is the same as in the first scenario. Again, if the number of stocks from which you can choose is large, you will find that a MILP solver serves you much, much better than a GA.
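For comparison, here is what the benchmark formulation could look like with PuLP, an open-source MILP modeller (the prices, per-share scores, and budget are illustrative assumptions; in particular, the integer variant needs some budget-style constraint to be bounded):

```python
import pulp

price  = [10.0, 25.0, 5.0]     # current price per share (made up)
score  = [0.12, 0.30, 0.04]    # expected return minus risk, per share (made up)
BUDGET = 100.0                 # hypothetical spending cap

prob = pulp.LpProblem("portfolio", pulp.LpMaximize)
qty = [pulp.LpVariable(f"q{i}", lowBound=0, cat="Integer")
       for i in range(len(price))]
prob += pulp.lpSum(q * s for q, s in zip(qty, score))            # objective
prob += pulp.lpSum(q * p for q, p in zip(qty, price)) <= BUDGET  # budget
prob.solve()
print([int(q.value()) for q in qty])
```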
Further, GP (genetic programming) is sufficiently different from GA that the distinction matters here. If you are trying to evolve a stock selection strategy, or an expression that predicts future stock prices, you actually need GP. For the plain stock selection problem, a GA would be sufficient.