Optimizing Algorithm for Price of Delivery

I have a customer who ordered 3 items online.
Then I have a list of stores for each item, sorted by cheapest delivery rate.
For example, each item has 5 candidate stores, so there are 5^3 = 125 combinations of stores.
So
Item 1 - store 1, store 9, store 4, store 3, store 2
Item 2 - store 9, store 10, store 1, store 2, store 5
Item 3 - store 5, store 1, store 4, store 8, store 7
So store 1,9,5 have the lowest delivery rate respectively for Items 1, 2, and 3.
But you can see that I can send both Items 1 and 2 from store 9 or from store 2, and I can send all three items from store 1.
When sending a package, we use a box with certain dimensions, and maybe sending Items 1 and 2 together from store 9 will be cheaper than sending Item 1 from store 1 and Item 2 from store 9.
The same applies to store 1. Maybe sending all 3 items from store 1 in a box will be cheaper than sending them separately from stores 1, 9, and 5.
Right now I am thinking of checking the box delivery rate of every store that contains 2 or more items and trying to determine the lowest price.
Note that sometimes the customer can order more than 10 items, and the number of combinations then grows to 5^10 or more, which is huge.
I am wondering if there is a quicker way to find the best price.

The way I would approach this is using integer programming, defining two sets of variables.
The first set of variables would be binary variables that indicate whether an item is sent from a particular store. In your example, there would be 15 such binary variables, one for each item-store pairing. We can call the binary variable for item i and store s x_is.
The other set of variables would be a binary indicator for whether we ship any items from store s. In your example, there would be 9 such binary variables (you have stores 1, 2, 3, 4, 5, 7, 8, 9, and 10). We can call the binary variable for store s y_s.
Then you would need to add constraints that make sure that an item i is sent from a store s (aka x_is = 1) only when we ship from that store (aka y_s = 1). You can do this by adding a constraint x_is <= y_s for all items i and stores s.
Now you can build an objective that separately accounts for the cost of providing item i from store s (these are the coefficients on the x_is variables) and the cost of shipping a package from store s (these are the coefficients on the y_s variables). Your goal is to minimize this objective.
You can solve these models using any number of different tools. One of the simplest might be the Excel Solver add-in, though there are integer programming solvers for all major programming languages.
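As a sanity check on small instances, the objective above (per-item cost plus a fixed cost per store actually used) can also be evaluated by brute force. A minimal sketch in Python; the delivery rates and the flat box cost below are made-up numbers for illustration, not data from the question:

```python
from itertools import product

# Hypothetical per-item delivery rates: item_cost[item][store] (made-up numbers).
item_cost = [
    {1: 3.0, 9: 4.0, 4: 5.0, 3: 6.0, 2: 7.0},   # Item 1's candidate stores
    {9: 2.0, 10: 3.0, 1: 4.0, 2: 5.0, 5: 6.0},  # Item 2's candidate stores
    {5: 3.0, 1: 4.0, 4: 5.0, 8: 6.0, 7: 7.0},   # Item 3's candidate stores
]
# Hypothetical fixed cost of shipping one box from a store (the y_s coefficient).
box_cost = {s: 5.0 for s in range(1, 11)}

best = None
# Enumerate every assignment of items to their candidate stores (5^3 = 125 here).
for assignment in product(*(c.keys() for c in item_cost)):
    per_item = sum(item_cost[i][s] for i, s in enumerate(assignment))
    per_box = sum(box_cost[s] for s in set(assignment))  # one box per store used
    total = per_item + per_box
    if best is None or total < best[0]:
        best = (total, assignment)

print(best)  # with these rates, shipping everything from store 1 wins
```

The integer program finds the same minimum without enumerating all 5^n assignments, which is what makes it viable for 10+ items.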

Related

Calculating total cost per customer from a large data file

I have a task where I have to read a big file and process the data within. Every row in the file looks like this:
CustomerId ItemId Amount Price
I then need to calculate the total cost for a customer, but first I need to find the most expensive item purchased. I then have to subtract the most expensive item's cost from the total cost.
My idea is first I can make this table:
CustomerId ItemId Total_Cost
Then I sort the table, find the highest cost, and store it in a variable.
Then I can make this table:
CustomerId Total_Cost
Then I'll subtract the highest cost from each row.
I feel that this is a brute force approach, and I was wondering if there is a more clever and efficient way to do this. Also, I need advice on which library to use. I am confused as to which is best for this problem: Spark, Storm, Flume, or Akka-Stream.
You can do this faster by keeping track of the most expensive item purchased by each customer.
Let's assume your data is:
4, 34, 2, 500
4, 21, 1, 700
4, 63, 5, 300
On the first line, customer 4 purchases 2 items at 500. You do not add this to the total cost yet, because at this point this purchase is the most expensive.
When line 2 comes, compare it against the current most expensive purchase. If it is more expensive, replace the most expensive and add the previous most expensive to the total cost; if it is less, add it to the total cost directly.
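A minimal single-pass sketch of that idea in Python, using the three sample rows from above. One assumption to flag: the question never says whether "most expensive" means the unit price or the line total, so this sketch treats a line's cost as Amount * Price:

```python
import csv
import io

# Sample rows in the question's format: CustomerId, ItemId, Amount, Price.
data = """4,34,2,500
4,21,1,700
4,63,5,300
"""

total = {}      # customer -> running total cost (excluding the current max)
most_exp = {}   # customer -> cost of the most expensive line seen so far

for cust, item, amount, price in csv.reader(io.StringIO(data)):
    # Assumption: a line's cost is Amount * Price.
    cost = int(amount) * int(price)
    if cost > most_exp.get(cust, -1):
        # New most expensive line: bank the previous one into the total.
        total[cust] = total.get(cust, 0) + most_exp.get(cust, 0)
        most_exp[cust] = cost
    else:
        total[cust] = total.get(cust, 0) + cost

# total[c] now holds total cost minus the most expensive line, per customer.
print(total)
```

This is O(1) extra state per customer and needs only one pass over the file, so it maps directly onto any of the streaming frameworks mentioned (Spark, Akka-Streams, etc.).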

Apriori Algorithm- frequent item set generation

I am using the Apriori algorithm to identify a customer's frequent item sets. Based on the identified frequent item sets, I want to suggest items to the customer when they add a new item to their shopping list. As the frequent item sets, I got the following result:
[1],[3],[2],[5]
[2,3],[3,5],[1,3],[2,5]
[2,3,5]
My problem is: if I consider only the [2,3,5] set to make suggestions to the customer, am I wrong? I.e., if the customer adds item 3 to their shopping list, I would recommend items 2 and 5. If the customer adds item 1, no suggestions would be made, since I am considering only the set [2,3,5] and item 1 is not in that set. I want to know whether my logic (considering only the set [2,3,5]) is enough to make suggestions for the user.
You should look at how the frequency of an item set compares to the frequencies of its subsets to figure out the rules. For example:
if frequency of (2,3,5) is close to the frequency of (3,5), the rule will be (3,5) -> 2
If frequency of (2,3,5) is close to the frequency of (3), the rule will be 3 -> (2,5)
If frequency of (2,3) is close to the frequency of (2), the rule will be 2 -> 3
That means not only the largest frequent item set can be used to make rules, but its frequent subsets as well. And the rules will be more precise if you consider how close the frequency of each item set is to that of the others.
No. Deriving recommendation rules requires more effort.
Just because [2,3,5] is frequent does not mean 2 -> 3,5 is a good rule.
Consider the case that 2 is a very popular product, but 3,5 are just barely frequent. Consider a gas station. [gas, coffee, bagel] is probably a frequent itemset, but rather few customers who buy gas will also buy coffee and a bagel (low confidence).
You do want to consider rules such as 2,3 -> 5 because they may have higher confidence. I.e. if the customer buys gas and coffee, suggest a bagel.
Frequency is not sufficient for recommendations! Consider 2 and 3 are bought in 80% of cases. 2, 3, 5 is bought in 60% of cases. Naively, in 6 out of 8 times, the customer will also buy 5, that's 75% correct! But this does not mean 5 is a good recommendation! Because 5 could be in 80% total, so if he bought 2 and 3, he is actually 5% less likely to buy 5, and we have a negative correlation here. That's why you need to look at lift, too. Or other measures like it, there are many.
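The arithmetic behind that example can be sketched in a few lines. The supports below are the illustrative numbers from the answer, not real data:

```python
# Illustrative supports (fractions of all baskets) from the example above.
support_23 = 0.80   # baskets containing {2, 3}
support_235 = 0.60  # baskets containing {2, 3, 5}
support_5 = 0.80    # baskets containing item 5 at all

# Confidence of the rule {2, 3} -> {5}: of the baskets with 2 and 3,
# how many also contain 5?
confidence = support_235 / support_23  # 0.75

# Lift compares that confidence with how often 5 is bought anyway.
# lift < 1 means buying {2, 3} makes 5 *less* likely: negative correlation.
lift = confidence / support_5

print(confidence, lift)
```

Despite the seemingly high 75% confidence, the lift here is 0.9375 < 1, so recommending item 5 after {2, 3} would be counterproductive, exactly as the answer argues.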

Combination optimization

I am creating an app that generates combinations of items which collectively satisfy a criterion supplied by the user. The following are some details of the system:
There is a collection of gear which are categorized into 3 types.
Each gear has two abilities which are never the same.
A kit is a grouping of 3 different types of gear.
A kit may or may not have repeated abilities.
The app allows the user to supply k abilities (0 < k < 7) and then generates all possible kits that have the specified abilities. (If k<6 then the kit may have repeated abilities or unspecified abilities to make up 6).
Currently I generate all possible kits, and while grouping the gear I check whether the kit has all the specified abilities before putting it in a collection. This method is rather slow, as you can imagine, and I've been at a loss as to how to optimize it via a data structure or otherwise, apart from maybe using a database.
Invert your problem: start from the result and look at how to construct it.
The algorithm that follows has a fixed pre-computation time, and then a minimal time to compute every kit, depending on how many items are in each group.
In each type, say A, B, C, group together items with the same 2 defined abilities; then you have 15 distinct groups (6×5/2) in A, 15 in B, 15 in C.
That gives 15^3 = 3375 possible group triples, but only about 80 of them are valid.
Algorithm:
1 group together elements of A, of B, of C with the same abilities and put them in a map: abilities => GA1, ... (see above)
2 generate and separate the 15 groups of A, the 15 of B, the 15 of C:
GA1, GA2, ..., GA15, GB1, GB2, ..., GB15, GC1, GC2, ..., GC15
3 generate the compatible GA / GB / GC triples as lists; you will have 80 lists
4 then take each of the 80 lists and substitute back into it the elements of A, of B and of C
Then you have every possible kit.
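A sketch of those steps in Python. The gear items and ability names below are invented for illustration; the point is the structure of grouping by ability pair, filtering group triples, then expanding:

```python
from collections import defaultdict
from itertools import product

# Hypothetical gear: (name, type, frozenset of its 2 abilities).
gear = [
    ("helm1", "A", frozenset({"speed", "armor"})),
    ("helm2", "A", frozenset({"speed", "armor"})),   # same group as helm1
    ("vest1", "B", frozenset({"speed", "stealth"})),
    ("boot1", "C", frozenset({"armor", "stealth"})),
]

# Steps 1-2: bucket the gear of each type by its ability pair.
groups = {"A": defaultdict(list), "B": defaultdict(list), "C": defaultdict(list)}
for name, gtype, abilities in gear:
    groups[gtype][abilities].append(name)

wanted = {"speed", "armor", "stealth"}  # user-supplied abilities (k = 3 here)

# Step 3: keep only group triples whose combined abilities cover the request.
valid_triples = [
    (ga, gb, gc)
    for ga, gb, gc in product(groups["A"], groups["B"], groups["C"])
    if wanted <= (ga | gb | gc)
]

# Step 4: expand each valid triple back into concrete kits.
kits = [
    kit
    for ga, gb, gc in valid_triples
    for kit in product(groups["A"][ga], groups["B"][gb], groups["C"][gc])
]
print(kits)
```

The filtering in step 3 runs over at most 15^3 group triples regardless of how much gear exists, which is where the speedup over enumerating all kits comes from.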

Simple storage allocation algorithm

We have a whole bunch of machines which use a whole bunch of data stores. We want to transfer all the machines' data to new data stores. These new stores vary in the amount of storage space available to the machines. Furthermore each machine varies in the amount of data it needs stored. All the data of a single machine must be stored on a single data store; it cannot be split. Other than that, it doesn't matter how the data is apportioned.
We currently have more data than we have space, so it is inevitable that some machines will need to have their data left where it is, until we find some more. In the meantime, does anyone know an algorithm (relatively simple: I'm not that smart) that will provide optimal or near-optimal allocation for the storage we have (i.e. the least amount of space left over on the new stores, after allocation)?
I realise this sounds like a homework problem, but I assure you it's real!
At first glance this may appear to be the multiple knapsack problem (http://www.or.deis.unibo.it/knapsack.html, chapter 6.6 "Multiple knapsack problem - Approximate algorithms"), but actually it is a scheduling problem because it involves a time element. Needless to say it is complicated to solve these types of problems. One way is to model them as network flow and use a network flow library like GOBLIN.
In your case, note that you actually do not want to fill the stores optimally, because if you do that, smaller data packages will be more likely to be stored because it will lead to tighter packings. This is bad because if large packages get left on the machines then your future packings will get worse and worse. What you want to do is prioritize storing larger packages, even if that means leaving more extra space on the stores, because then you will gain more flexibility in the future.
Here is how to solve this problem with a simple algorithm:
(1) Determine the bin sizes and sort them. For example, if you have 3 stores with space 20 GB, 45 GB and 70 GB, then your targets are { 20, 45, 70 }.
(2) Sort all the data packages by size. For example, you might have data packages: { 2, 2, 4, 6, 7, 7, 8, 11, 13, 14, 17, 23, 29, 37 }.
(3) If any single package fills more than 95% of a store, put it in that store and go back to step (1). That is not the case here.
(4) Generate all the combinations of two packages.
(5) If any of the combinations sums to more than 95% of a store, put it in that store. If there is a tie, prefer the combination containing the bigger package. In my example, there are two such pairs: { 37, 8 } = 45 and { 17, 2 } = 19. (Notice that using { 17, 2 } trumps using { 13, 7 }.) If you find one or more matches, go back to step (1).
Okay, now we just have one store left: 70 and the following packages: { 2, 4, 6, 7, 7, 11, 13, 14, 23, 29 }.
(6) Increase the combination size by 1 and go back to step (5). In our case we find that no 3-combination adds up to over 95% of 70, but the 4-combination { 29, 23, 14, 4 } = 70 does. At the end we are left with packages { 2, 6, 7, 7, 11, 13 } on the machines. Notice these are mostly the smaller packages, as intended.
Notice that combinations are tested biggest-packages-first. For example, if you have "abcde" where e is the biggest, the order for 3-element combinations is:
cde
bde
ade
bce
ace
etc.
This algorithm is very simple and will yield a good result for your situation.
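The procedure above can be sketched in Python using the numbers from the example. Two simplifications to flag: the 95% threshold is treated as inclusive (the example's { 17, 2 } = 19 is exactly 95% of 20), and each store is filled once in increasing size order rather than looping back to step (1); with these data that reproduces the walkthrough, but this is a heuristic, not an optimal packer:

```python
from itertools import combinations

stores = [20, 45, 70]
packages = [2, 2, 4, 6, 7, 7, 8, 11, 13, 14, 17, 23, 29, 37]

placement = {}  # store size -> list of packages placed there

for store in sorted(stores):
    remaining = sorted(packages, reverse=True)  # biggest packages first
    placed = None
    # Try subsets of growing size: singles, then pairs, then triples, ...
    for size in range(1, len(remaining) + 1):
        candidates = [
            c for c in combinations(remaining, size)
            if store >= sum(c) >= 0.95 * store   # fits and fills >= 95%
        ]
        if candidates:
            # Tie-break: prefer the candidate containing the bigger packages.
            placed = max(candidates)
            break
    if placed:
        placement[store] = list(placed)
        for p in placed:
            packages.remove(p)

print(placement)   # what went onto each store
print(packages)    # what stays on the machines for now
```

Running this places { 17, 2 } on the 20 GB store, { 37, 8 } on the 45 GB store, and { 29, 23, 14, 4 } on the 70 GB store, leaving { 2, 6, 7, 7, 11, 13 } behind, matching the walkthrough above.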

How many times do I need to add 2 to get the value equivalent to 2 raised to 1000

I am trying to solve Project Euler Problem 16. I am planning to use a two-dimensional array to calculate and store the results.
To calculate 2 raised to 3, I need to add 2 four times.
To calculate 2 raised to 4, I need to add 2 eight times.
Similarly, how many times will I need to add 2 to the result to get 2 raised to 1000?
Which data structure will be best suited for this (preferably in C++)?
And what would be a good algorithm to solve this in C++?
If I understand you correctly, you're asking what data structure you should use to hold the digits of the integer value 2^1000. Instead of reinventing the wheel, I'd recommend using something like Java's BigInteger class, which holds arbitrarily large numbers.
A solution to exactly the same kind of problem has been shown here previously. The number of decimal digits in 2^1000 is floor(1000 * log10(2)) + 1 = 302. One more character is needed for the string terminator, '\0'. Multiplying by log10(2) ≈ 0.301 can be approximated by dividing by 3 (i.e., multiplying by 0.333...), which slightly overestimates and is therefore safe for sizing the buffer.
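Since the end goal is Problem 16 (the digit sum of 2^1000), a language with built-in arbitrary-precision integers sidesteps the data-structure question entirely. In Python, for instance:

```python
n = 2 ** 1000          # Python ints are arbitrary precision, like BigInteger

digits = str(n)
print(len(digits))                    # 302 decimal digits, matching the formula
print(sum(int(d) for d in digits))   # the digit sum Problem 16 asks for
```

If you do want to roll your own in C++, the usual approach is a vector of digits (or base-10^9 limbs) with a doubling loop and manual carries, which the BigInteger suggestion above is essentially packaging for you.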