Limit stores - most efficient or easy approach? - algorithm

Say you have multiple vendors (lets say 5) all selling the same items for different prices. You must buy item A,B,C, D, E. Each vendor has item A,B C,D,E so you can easily go through each vendor and find the cheapest version of each item. However, say you are limited to only shopping from X vendors. So you must now find the cheapest combinations that make sure you do not use more vendors than allowed. How does one solve this problem without trying every combination of vendors?
Another way of phrasing an example.
for 5 shops, we are only allowed to use 2 shops to buy 5 items (with each item at a different price in different shops). How do we save the most money? Is there a method of solving without trying every combination

your search is about linear programming ... realy long to explain ^^ : https://en.wikipedia.org/wiki/Linear_programming

Related

Using scoring to find customers

I have a site where customers purchase items that are tagged with a variety of taxonomy terms. I want to create a group of customers who might be interested in the same items by considering the tags associated with purchases they've made. Rather than comparing a list of tags for each customer each time I want to build the group, I'm wondering if I can use some type of scoring to solve the problem.
The way I'm thinking about it, each tag would have some unique number assigned to it. When I perform a scoring operation it would render a number that could only be achieved by combining a specific set of tags.
I could update a customer's "score" periodically so that it remains relevant.
Am I on the right track? Any ideas?
Your description of the problem looks much more like a clustering or recommendation problem. I am not sure if those tags are enough of an information to use clustering or recommendation tough.
Your idea of the score doesn't look promising to me, because the same sum could be achieved in several ways, if those numbers aren't carefully enough chosen.
What I would suggest you:
You can store tags for each user. When some user purchases a new item, you will add the tags of the item to the user's tags. On periodical time you will update the users profiles. Let's say we have users A and B. If at the time of the update the similarity between A and B is greater than some threshold, you will add a relation between the users which will indicate that the two users are similar. If it's lower you will remove the relation (if previously they were related). The similarity could be either a number of common tags or num_common_tags / num_of_tags_assigned_either_in_A_or_B.
Later on, when you will want to get users with particular set of tags, you will just do a query which checks which users have that set of tags. Also you can check for similar users to given user, just by looking up which users are linked with the user in question.
If you assign a unique power of two to each tag, then you can sum the values corresponding to the tags, and users with the exact same sets of tags will get identical values.
red = 1
green = 2
blue = 4
yellow = 8
For example, only customers who have the set of { red, blue } will have a value of 5.
This is essentially using a bitmap to represent a set. The drawback is that if you have many tags, you'll quickly run out of integers. For example, if your (unsigned) integer type is four bytes, you'd be limited to 32 tags. There are libraries and classes that let you represent much larger bitsets, but, at that point, it's probably worth considering other approaches.
Another problem with this approach is that it doesn't help you cluster members that are similar but not identical.

Fulfilling maximum customer orders

There is an inventory of products like eg. A- 10Units, B- 15units, C- 20Units and so on. We have some customer orders of some products like customer1{A- 10Units, B- 15Units}, customer2{A- 5Units, B- 10Units}, customer3{A- 5Units, B- 5Units}. The task is fulfill maximum customer orders with the limited inventory we have. The result in this case should be filling customer2 and customer3 orders instead of just customer1.[The background for this problem is a realtime online retail scenario, where we have millions of customers and millions of products and we are trying to fulfill the orders as efficiently as possible]
How do I solve this?Is there an algorithm for this kind of problem, something like optimisation?
Edit: The requirement here is fixed. The only aim here is maximizing the number of fulfilled orders regardless of value. But we have millions of users and millions of products.
This problem includes as a special case a knapsack problem. To see why consider only one product A: the storage amount of the product is your bag capacity, the order quantities are the weights and each rock value is 1. Your problem is to maximize the total value you can fit in the bag.
Don't expect an exact solution for your problem in polynomial time...
An approach I'd go for is a random search: make a list of the orders and compute a solution (i.e. complete orders in sequence, skipping the orders you cannot fulfill). Then change the solution by applying a permutation on the orders and see if it's better.
Keep going with search until time runs out or you're happy with the solution.
It can be solved by DP.
Firstly sort all your orders with respect to A in increasing order.
Use this DP :
DP[n][m][o] = DP[n-a][m-b][o-c] + 1 where n-a>=0 and m-b >=0 o-c>=0
DP[0][0][0] = 1;
Do bottom up computation :
Set DP[i][j][k] = 0 , for all i =0 to Amax; j= 0 to Bmax; k = 0 to Cmax
For Each n : 0 to Amax
For Each m : 0 to Bmax
For Each o : 0 to Cmax
if(n>=a && m>=b && o>= c)
DP[n][m][o] = DP[n-a][m-b][o-c] + 1;
You will then have to find the max value of DP[i][j][k] for all values of i,j,k possible. This is your answer. - O(n^3)
Reams have been written about order fulfillment and yet no one has come up with a standard answer. The reason being that companies have different approaches and different requirements.
There are so many variables that a one size solution that fits all is not possible.
You would have to sit down and ask hundreds of questions before you could even start to come up with an approach tailored to your customers needs.
Indeed those needs might also vary, based on the time of year, the day of the week, what promotions are currently being run, whether customers are ranked, numbers of picking and packing staff/machinery currently employed, nature, size, weight of products, where products are in the warehouse, whether certain products are in fast/automated picking lines, standard picking faces or in bulk. The list can appear endless.
Then consider whether all orders are to be filled or are you allowed to partially fill an order and back-order out of stock products.
Does the entire order have to fit in a single box or are multiple box orders permitted.
Are you dealing with multiple warehouses and if so can partial orders be sent from each or do they have to be transferred for consolidation.
Should precedence be given to local or overseas orders.
The amount of information that you need at your finger tips before you can even start to plan a methodology to fit your customers specific requirements can be enormous and sadly, you are not going to get a definitive answer. It does not exist.
Whilst I realise that this is not a) an answer or b) necessarily a welcome post, the hard truth is that you will require your customer to provide you with immense detail as to what it is that they wish to achieve, how and when.
You job, initially, is the play devils advocate, in attempting to nail them down.
P.S. Welcome to S.O.

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on an web or phone app that is responsible for scheduling. I want students to input courses they took, and I give them possible combinations of courses they should take that fits their requirements.
However, let's say there's 150 courses that fits their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?.
Would it be feasible to run something like this in browser or a mobile device?
First of all you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and perhaps precomputing a feasible data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
Yay, mathematics I think I can handle!
If there are 150 courses, and you have to choose 3, then the amount of possibilities are (150*149*148)/(3*2) (correction per jerry), which is certainly better than 150 factorial which is a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have the idea of randomly choosing an element in an array, so you get an element in an array and request 3 random unique entries from it.
While the potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. This task of random selection of k items from n-sized list is delightfully trivial even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
Option 1 (Time\Space costly): let the user on mobile phone browse the list of (150*149*148) possible choices, page by page, the processing is done at the server-side.
Option 2 (Simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag, if he choose one item from the bag, remove it from the bag.
Option 3 (Complex): expand your decision tree (possible choices) using a dependency tree (parent course requires child courses) and the list of course already taken by the student, and his track\level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

Have/Want List Matching Algorithm

Have/Want List Matching Algorithm
I am implementing an item trading system on a high-traffic site. I have a large number of users that each maintain a HAVE list and a WANT list for a number of specific items. I am looking for an algorithm that will allow me to efficiently suggest trading partners based on your HAVEs and WANTs matched with theirs. Ideally I want to find partners with the highest mutual trading potential (i.e. I have a ton of things you want, you have a ton of things I want). I don't need to find the global highest-potential pair (which sounds hard), just find the highest-potential pairs for a given user (or even just some high-potential pairs, not the global max).
Example:
User 1 HAS A,C WANTS B,D
User 2 HAS D WANTS A
User 3 HAS A,B,D WANTS C
User 1 goes to the site and clicks a button that says
"Find Trading Partners" and the top-ranked result is
User 3, followed by User 2.
An additional source of complexity is that the items have different values, and I want to match on the highest valued trade possible, rather than on the most number of matches between two traders. So in the example above, if all items are worth 1, but A and D are both worth 10, User 1 now gets matched with User 2 above User 3.
A naive way to do this would to compute the max trade value between the user looking for partners vs. all other users in the database. I'm thinking with some lookup tables on the right things I might be able to do better. I've tried googling around, since this seems like a classical problem, but I don't know the name for it.
Can anyone recommend a good approach to solving this problem? I've seen sites like the Magic Online Trading League that seem to solve it in realtime.
You could do this in O(n*k^2) (n is the number of people, k is the average number of items they have/want) by keeping hash tables (or, in a database, indexes) of all the people who have and want given items, then giving scores for all the people who have items the current user wants, and want items the current user has. Display the top 10 or 20 scores.
[Edit] Example of how this would be implemented in SQL:
-- Get score for #userid wants
SELECT UserHas.UserID, SUM(Items.Weight) AS Score
FROM UserWants
INNER JOIN UserHas ON UserWants.ItemID = UserHas.ItemID
INNER JOIN Items ON Items.ItemID = UserWants.ItemID
WHERE UserWants.UserID = #userid
GROUP BY UserWants.UserID, UserHas.UserID
This gives you a list of other users and their score, based on what items they have that the current user wants. Do the same for items the current user has the others want, then combine them somehow (add the scores or whatever you want) and grab the top 10.
This problem looks pretty similar to stable roomamates problem. I don't see any thing wrong with the SQL implementation that got highest votes but as some else suggested this is like a dating/match making problem similar to the lines of stable marriage problem but here all the participants are in one pool.
The second wikipedia entry also has a link to a practical solution in javascript which could be useful
You could maintain a per-item list (as a complement to per-user list). Item search is then spot on. Now you can allow your self brute force search for most valuable pair by checking most valuable items first. If you want more complex (arguably faster) search you could introduce set of items that often come together as meta-items, and look for them first.
Okay, what about this:
There are basically giant "Pools"
Each "pool" contains "sections." Each "Pool" is dedicated to people who own a specific item. Each section is for people who own that item, and want another.
What I mean:
Pool A (For those requesting A)
--Section B (For those requesting A that have B)
--Section C (For those requesting A that have C, even if they also have B)
Pool B
--Section A
--Section B
Pool C
--Section A
--Section C
Each section is filled with people.
"Deals" would consist of one "Requested" item, and a "Pack," you're willing to give any or all of the items up to get the item you requested.
Every "Deal" is calculated per-pool.... if you want a given item, you go to the pools of the items you'd be willing to give, and it find the Section which belongs to the item you are requesting.
Likewise, your deal is placed in the pools. So you can immediately find all of the applicable people, because you know EXACTLY which pools, and EXACTLY which sections to search in, no sorting necessary once they've entered the system.
And, then, age would have priority, older deals would be picked, rather than new ones.
Let's assume you can hash your items, or at least sort them. Assume your goal is to find the best result for a given user, on request, as in your original example. (Optimizing trading partners to maximize overall trade value is a different question.)
This would be fast. O(log n) for each insertion operation. Worst case O(n) for suggesting trading partners, but you bound this by processing time.
You're already maintaining a list of items per user.
Give each user a score equal to the sum of the values of the items they have.
Maintain a list of user-HAVES and user-WANTS per item (#Dialecticus), sorted by user score. (You can sort on demand, or keep the lists sorted dynamically every time a user changes their HAVE list.)
When a user user1 requests suggested trade partners
Iterate over their items item in order by value.
Iterate over the user-HAVES user2 for each item, in order by user score.
Compute trade value for user1 trades-with user2.
Remember best trade so far.
Keep hash of users processed so far to avoid recomputing value for a user multiple times.
Terminate when you run out of processing time (your real-time guarantee).
Sorting by item value and user score is the approximation that makes this fast. I'm not sure how sub-optimal it would be, though. There are certainly easy examples where this would fail to find the best trade if you don't run it to completion. In practice, it seems like it might be good enough. In the limit, you can make it optimal by letting it run until it exhausts the lists in step 4.1 and 4.2. There's extra memory cost associated with the inverted lists, but you didn't say you were memory constrained. And generally, if you want speed, it's not uncommon to trade-off space to get it.
I mark item by letter and user by number.
m - number of items in all have/want lists (have or want, not have and want)
x - number of users.
For each user you have list of his wants and haves. Left line is want list, right is have list (both will be sorted so we can use binary search).
1 - ABBCDE FFFGH
2 - CFGGH BE
3 - AEEGH BBDF
For each pair of users you generate two values and store them somewhere, you'd only generate it once and than actualize. Sorting first table and generating second, is O(m*x*log(m/x)) + O(log(m)) and will require O(x^2) extra memory. These values are: how many would first user get and how many another (if you want you can modify these values by multiplying them by value of particular item).
1-2 : 1 - 3 (user 1 gets 1) - (user 2 gets 3)
1-3 : 3 - 2
2-3 : 1 - 1
You also compute and store best trader for each user. After you've generated this helpful data you can quickly query.
Adding/Removing item - O(m*log(m/x)) (You loop through user's have/want list and do binary search on have/want list of every other user and actualize data)
Finding best connection - O(1) or O(x) (Depends on whether result stored in cache is correct or needs to be updated. You loop through user's pairs and do whatever you want with data to return to user the best connection)
By m/x I estimate number of items in single user's want/have list.
In this algorithm I'm assuming that all data isn't stored in Database (I don't know if binary search is possible with Databases) and that inserting/removing item into list is O(1).
PS. Sorry for bad english and I hope I've computed it all correctly and that it is working because I also need it.
Of course you could always seperate the system into three categories; "Wants," "Haves," and "Open Offers." So lets say User1 has Item A, User2 has Item B & C and is trading those for item A, but User1 still wants Item D, and User2 wants Item E. So User1 (assuming he's the trade "owner") puts a request, or want for Item D and Item E, thus the offer stands, and goes on the "Open Offers" list. If it isn't accepted or edited within two or so days, it's automatically cancelled. So User3 is looking for Item F and Item G, and searches on the "Have list" for Items F & G, which are split between User1 & User2. He realizes that User1 and User2's open offer includes requests for Items D & E, which he has. So he chooses to "join" the operation, and it's accepted on their terms, trading and swaping they items among them.
Lets say User1 now wants Item H. He simply searches on the "Have" list for the item, and among the results, he finds that User4 will trade Item H for Item I, which User1 happens to have. They trade, all is well.
Just make it BC only. That solves all problems.

Finding a subset of numbers that equals a single number

The reason I place this post is that I am looking to reconcile customer accounts receivable accounts where "payments" are posted to accounts instead of matched with the open invoices and cleared. So here is my issue:
Have a single number (payment) that should equal a subset of a given set of numbers (invoice amounts). Simple example:
Payment $10,002
Invoices values:
5001
2932
876
98
21
9923
2069
123
432
765
I would want a way to pull out 5001, 2932 and 2069 from this set.
Being a non-programmer, an Excel spreadsheet application is easiest for me to create. Ideas?
You're talking about an NP-Complete problem called Subset-sum.
Basically, this means that in general it is very computationally hard to compute the subset of prices that sums to your grand total. It is, however, very easy to check your answer since you merely sum your answers together.
My guess is, that if you want to examine N prices, you're going to have to use about 2^N cells in Excel to calculate this. The wikiepdia article linked above give some heuristics for approximating this.
Bottom line is, if you need to do this on a large scale (N is, say, in the thousands hundreds) you should rethink why you need to do this.
If you can find out a way to do it very efficiently, there may be a prize involved.
I worked on a very similar Java application that mapped receipts to accounts receivable transactions. We did not try to progammatically link summed receipts to a single transactions or vice-versa for a number of reasons. However, we did allow users to manually do that mapping. We just mapped receipt figures to transactions figures that matched, if there were multiple reciepts and transactions with the same amount, we only matched when there were the same number of duplicate amounts.

Resources