Algorithm for heavily restricted Knapsack problem

I have the following problem to solve:
We have 53 weeks in a year, for each week we need to choose one model from the list: [A1, A2,....,F149, F150]. In total around 750 models in 6 classes: A,B,C,D,E,F.
Models can repeat, and each has a specific value from around 3 to 10 and a weight. The goal is to achieve a target total value of 280 ± 5% with minimal weight by the end of the year.
However there are a ton of restrictions. For example:
Models must be held for at least 4 weeks in a row. If we have chosen A1 for week 1, then we need to choose A1 for weeks 2,3,4;
If we have chosen model classes E,F then, after they end, we cannot choose E, F for another 4 weeks.
Throughout the year we can only choose 23 models of class D.
and so on
What I've tried so far:
Based on a target value create a corridor of allowed values throughout the year:
Corridor looks like this
Starting at week 1, choose a random model for the week from the list of allowed models -> Based on the choice modify the list of allowed models for next weeks
If our choice satisfies the criterion (also lies within a corridor), then week+=1. If not, delete this possibility.
If there are no more models for this week, go back one week, delete the possibility we have chosen before and choose randomly from what's left.
Pictorially the algorithm is like following the branches of a tree. If the branch is bad, return back to the fork and cut off the bad branch.
This algorithm can generate a random valid solution (in about 5 to 80 minutes with a mean time of 25 minutes). Then I need to generate more of those and choose one that has the least weight. Which is not a very good approach, I presume.
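A minimal Python sketch of the randomized backtracking described above, on a deliberately tiny, hypothetical instance. It models only two of the rules (the minimum hold of 4 weeks and a value corridor), assumes that the last run of the year must also be at least min_hold weeks long, and ignores weights and class restrictions. The corridor here is simply "the target range is still reachable", which may differ from the corridor in the post. The names random_schedule, values, lo and hi are illustrative and not from the original.

```python
import random


def random_schedule(values, weeks, min_hold, lo, hi, seed=0):
    """Randomized depth-first search with backtracking.

    values maps a model name to its value per week (hypothetical data).
    Returns a list with one model per week, or None if nothing fits.
    """
    rng = random.Random(seed)
    v_min, v_max = min(values.values()), max(values.values())

    def in_corridor(total, weeks_left):
        # The final total can still land in [lo, hi] from here.
        return (total + weeks_left * v_min <= hi
                and total + weeks_left * v_max >= lo)

    def search(schedule, total, run):
        week = len(schedule)
        if week == weeks:
            # Assumption: the last run must also respect the minimum hold.
            return list(schedule) if run >= min_hold else None
        if schedule and run < min_hold:
            options = [schedule[-1]]      # forced to keep the current model
        else:
            options = list(values)
            rng.shuffle(options)          # random choice among allowed models
        for model in options:
            new_total = total + values[model]
            if not in_corridor(new_total, weeks - week - 1):
                continue                  # cut off the bad branch
            same = bool(schedule) and model == schedule[-1]
            schedule.append(model)
            found = search(schedule, new_total, run + 1 if same else 1)
            if found is not None:
                return found
            schedule.pop()                # go back to the fork
        return None

    return search([], 0, 0)


values = {"A1": 3, "A2": 5, "B1": 8}
plan = random_schedule(values, weeks=12, min_hold=4, lo=57, hi=63, seed=1)
print(plan, sum(values[m] for m in plan))
```

This only finds one valid schedule; it does not address the minimal-weight objective.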
Question
The question is: what is the optimal way to solve the problem? The priority is to find the solution with minimal weight and the target value, not the fastest algorithm. But it should at least finish in a finite amount of time =)
The problem statement above is a bit oversimplified and due to the complexity of calculations and the amount of combinations, there is no way to consider and compare all combinations.

Related

Solving a travelling salesman problem to maximize gain in minimum time

Team
I need suggestions on how to solve the below problem.
There are n places (for example say 10 places). Time taken from any one place to the other is known. On reaching a particular place a known reward is given in the form of rupees (ex. if I travel from place 1 to place 2, I get 100 rupees. Travelling from place 2 to place 3 will fetch me 50 rupees etc...). Also, sometimes a particular place is unavailable to travel to which changes with time. At all time instances, whatever places can be traveled to is known, reward fetched from each place is known and the time taken to travel from one place to other is known. This is an ongoing process, meaning after you reach place A and earn 100 rupees, you travelled to place B and fetch 100 Rs. Then it is possible that place A can again fetch you rupees say 50 if you travel from B to A again.
Problem statement is:
A path should be followed with time ( A to B, B to C, C to B, B to A etc...) so that I always have maximum rupees in a given time. Thus at the end of 1 month, I should have followed a path that fetches me the maximum amount among all possibilities available.
We already know that in the traveling salesman problem it takes O(N!) to calculate the best way for the month if there are no changes. Because of the unknown changes that can happen, the best way is to use a greedy algorithm: every time you arrive at a new place, you calculate where you can earn the most rupees in the least amount of time. It will take O(N*k), where k is the number of moves you make between places in a month.
I'm not sure how this problem is related to travelling salesman -- I understood the latter as having the restriction of at least visiting all the places once.
Assuming we have all of the time instances and their related information ahead of our calculation, if we work backwards from each place we imagine ending at, the choices we have for the previous location visited dictate the possible times it took to get to the last place and the possible earning we could have gotten. Clearly from those choices we would choose the best reward among them because it's our last choice. Apply this idea recursively from there, until we reach the start of the month. If we run this recursion from each possible ending place, we can reuse states we've seen before; for example if we reached place A at time T as one of the options when calculating backwards from B, and then we reach A again at time T when calculating a path that started at C, we can reuse the record for the first state. The search space would be O(N*T) but practically would vary with the input.
Something like this? (Assumes we cannot wait in any one place. Otherwise, the solution could be better coded bottom-up where we can try all place + time states.) Return the best of running f with the same memo map on all possible ending states.
def get_travel_time(place_a, place_b):
    # returns the travel time from place_a to place_b
    raise NotImplementedError

def get_neighbours(place):
    # returns the places from which we can travel to place
    raise NotImplementedError

def get_reward(place, time):
    # returns the reward awarded at place at the given time
    raise NotImplementedError

def f(place, time, memo):
    # best total reward of any path that arrives at place at exactly this time
    if time == 0:
        return 0
    key = (place, time)
    if key in memo:
        return memo[key]
    current_reward = get_reward(place, time)
    best = float("-inf")  # stays -inf if place cannot be reached at this time
    for neighbour in get_neighbours(place):
        previous_time = time - get_travel_time(neighbour, place)
        if previous_time >= 0:
            best = max(best, current_reward + f(neighbour, previous_time, memo))
    memo[key] = best
    return best
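To make the recursion concrete, here is a small runnable Python sketch on made-up data. The three places, their travel times and their (time-independent) rewards are hypothetical, as is the helper best_total; a real instance would make get_reward and get_neighbours depend on the time to model places that become unavailable.

```python
# Hypothetical instance: symmetric travel times and constant rewards.
TRAVEL_TIME = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 3}
REWARD = {"A": 100, "B": 50, "C": 10}
PLACES = sorted(REWARD)


def get_travel_time(place_a, place_b):
    if (place_a, place_b) in TRAVEL_TIME:
        return TRAVEL_TIME[(place_a, place_b)]
    return TRAVEL_TIME[(place_b, place_a)]


def get_neighbours(place):
    # Every other place is reachable in this toy instance.
    return [p for p in PLACES if p != place]


def get_reward(place, time):
    # Constant here; a real version would look at the time as well.
    return REWARD[place]


def f(place, time, memo):
    """Best total reward of a path that arrives at place at exactly time."""
    if time == 0:
        return 0
    key = (place, time)
    if key in memo:
        return memo[key]
    best = float("-inf")  # stays -inf when no path arrives at this time
    for neighbour in get_neighbours(place):
        previous_time = time - get_travel_time(neighbour, place)
        if previous_time >= 0:
            best = max(best,
                       get_reward(place, time) + f(neighbour, previous_time, memo))
    memo[key] = best
    return best


def best_total(horizon):
    # Share one memo map across all possible ending states.
    memo = {}
    return max(f(p, t, memo) for p in PLACES for t in range(horizon + 1))


print(best_total(4))  # alternating between A and B for four moves gives 300
```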

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. The first number represents the number of days to play; the following numbers represent the available days (1 to 7, Monday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2^k possibilities for which days have a match. For the rest of this answer, assume we know.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
I have an idea if you need a basic solution that you can optimize and refine in small steps. I am talking about flow networks. Most people who already know what they are will probably turn up their noses, because flow networks are usually used to solve pure maximization problems rather than general optimization problems. They are right in a sense, but I think the problem can initially be seen as maximizing the number of players who play on each day. Needless to say, it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days on which he wants to play, represented as the capacity of the edge from the source to the node player x. Each player node has one edge from player x to each day_of_week on which he is available, and each of these 2nd-level edges has a capacity of 1. The third level consists of the edges that link each day_of_week to the sink node. Quick example: player 2 is available on 2 days, Monday and Tuesday, and each day has a player limit of 12.
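A rough Python sketch of that graph, assuming one unit-capacity edge from a player to each day he is available on and a day-to-sink capacity equal to the team size. The max-flow routine is a plain Edmonds-Karp written for this example; the names max_flow and schedule_days and the sample players are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp; capacity is a dict of dicts and is updated in place."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        flow += bottleneck


def schedule_days(players, days, slots):
    """players maps name -> (days wanted, list of available days)."""
    capacity = defaultdict(lambda: defaultdict(int))
    for name, (wanted, available) in players.items():
        capacity["source"][("player", name)] = wanted
        for day in available:
            if day in days:
                capacity[("player", name)][("day", day)] = 1
    for day in days:
        capacity[("day", day)]["sink"] = slots
    flow = max_flow(capacity, "source", "sink")
    roster = {day: [] for day in days}
    for name, (_, available) in players.items():
        for day in available:
            # A saturated player -> day edge means the player is assigned.
            if day in roster and capacity[("player", name)][("day", day)] == 0:
                roster[day].append(name)
    # Keep only days whose team is complete.
    return flow, {d: names for d, names in roster.items() if len(names) == slots}


players = {"Kane": (2, [1, 2]), "Ali": (1, [1]), "Bea": (1, [2]), "Cy": (1, [1, 2])}
print(schedule_days(players, days=[1, 2], slots=2))
```

Which of the tied assignments comes out depends on the order in which augmenting paths are found.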
So far the 1st, 2nd and 4th constraints are satisfied (well, that was the easy part): after you have found the maximum flow of the entire graph, you select only those paths that have no residual capacity on both the 2nd level (from players to day_of_weeks) and the 3rd level (from day_of_weeks to the sink). It is easy to show that with this level of "optimization", and under certain conditions, the algorithm may fail to find an acceptable path even though it would have found one had it made different choices while visiting the graph.
This part is the optimization problem that I meant before. I came up with at least two heuristic improvements:
While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority. This way the residual capacity of the entire graph is certainly less evenly distributed.
Randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is a situation where the majority rule applies perfectly.
I should stress that everything above is just a starting point: the purpose of a heuristic is to find the best approximate solution possible. With this type of problem and given your probably small input, this is not the right way, but it is the easiest one when you do not know where to start.

Formal name for this optimization algorithm?

I have the following problem in one of my coding projects, which I will simplify here:
I am ordering groceries online and want very specific things in very specific quantities. I would like to order the following:
8 Apples
1 Yam
2 Soups
3 Steaks
20 Orange Juices
There are many stores equidistant from me from which I will have food delivered. Not all stores have what I need. I want to obtain what I need with the fewest orders. For example, ordering from Store #2 below is a wasted order, since I can complete my list in fewer orders by ordering from different stores. What is the name of the optimization algorithm that solves this?
Store #1 Supply
50 Apples
Store #2 Supply
1 Orange Juice
2 Steaks
1 Soup
Store #3 Supply
25 Soup
50 Orange Juices
Store #4 Supply
25 Steaks
10 Yams
The lowest possible number of orders is 3 in this case: 8 Apples from Store #1, 2 Soups and 20 Orange Juices from Store #3, and 1 Yam and 3 Steaks from Store #4.
To me, this most likely sounds like a restricted case of the Integer Linear programming problem (ILP), namely, its 0-or-1 variant, where the integer variables are restricted to the set {0, 1}. This is known to be NP-hard (and the corresponding decision problem is NP-complete).
The problem is formulated as follows (following the conventions in the op. cit.):
Given the matrix A, the constraint vector b, and the weight vector c, find the vector x ∈ {0, 1}^N such that all the constraints A⋅x ≥ b are satisfied, and the cost c⋅x is minimal.
I flipped the constraint inequality, but this is equivalent to changing the sign of both A and b.
The inequalities indicate satisfaction of your order: that you can buy at least the required amount of every item in the visited stores. Note that b has as many entries as A has rows, while c and x have as many entries as A has columns. The dot product c⋅x is, naturally, a scalar.
Since you are minimizing the number of trips, each trip costs the same, so that c = 1, and c⋅x is the total number of trips. The store inventory matrix A has a row per item, and a column per store, and the b is your shopping list.
Naturally, the exact best solution is found by trying all possible 2^N values for the x.
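For an instance as small as the one in the question, the exhaustive search is easy to sketch in Python. The version below tries store subsets in order of increasing size, so the first feasible subset found uses the fewest orders. It follows the A⋅x ≥ b formulation, meaning quantities from several chosen stores may be added together; the function name fewest_orders and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations


def fewest_orders(wanted, stores):
    """Return a smallest list of stores whose combined stock covers wanted.

    Exhaustive search: up to 2^N subsets, so only for small N.
    """
    names = sorted(stores)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            covered = all(
                sum(stores[s].get(item, 0) for s in chosen) >= qty
                for item, qty in wanted.items()
            )
            if covered:
                return list(chosen)
    return None  # the order cannot be satisfied at all


wanted = {"Apple": 8, "Yam": 1, "Soup": 2, "Steak": 3, "Orange Juice": 20}
stores = {
    "Store #1": {"Apple": 50},
    "Store #2": {"Orange Juice": 1, "Steak": 2, "Soup": 1},
    "Store #3": {"Soup": 25, "Orange Juice": 50},
    "Store #4": {"Steak": 25, "Yam": 10},
}
print(fewest_orders(wanted, stores))  # ['Store #1', 'Store #3', 'Store #4']
```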
Since there is no single approach to NP-hard problems, consider the problem size, and how close to the optimum you want to arrive. A greedy approach would work well (when your next store to visit has the most total number of items not yet satisfied) when the "inventories" are large. If you have the idea in advance about the expected minimum number of trips, you can trim the search beam at some value, exceeding the number of trips by some multiplication coefficient. This is the best approach when your search is time constrained (I routinely do beam searches, closely related to the branch-and-cut approach mentioned in the article, in graphs that take a few GB of memory slightly faster than the limit of 30ms per exploration step with a beam as wide as 10,000). Simulated annealing also works, if the search landscape is not excessively rough.
Also search on cs.SE; it may be even a better place for questions of this type.

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Every day from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and making sure that nothing goes wrong.
There are currently n applicants for the job, and each of them can work from time s_i to time c_i, i = 1, 2, ..., n.
My goal is to minimize the time that more than two people are keeping watch of the workers at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job and keeping them is my first step, but finding the next step is what troubles me... .
The algorithm must run in polynomial-time.
Any hints(a certain type of data structure maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from the start of the day up to c_i?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If s_i is equal to the start of day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j, i.e. those who start before s_i and are still working at time s_i, so that coverage is not interrupted. Set cost(i) to the minimum of cost(j)+overlap between i and j. Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) for all values of k where c_k is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
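A sketch of this dynamic program in Python, under a few assumptions that the answer leaves open: applicants are sorted by end time, a "previous" applicant j for i must start strictly before s_i and still be on duty at s_i, and every interval lies within the working day. The name min_overlap_cover and the sample intervals are made up for the example.

```python
def min_overlap_cover(applicants, day_start, day_end):
    """applicants is a list of (s, c) pairs; returns (overlap, chain) or None."""
    order = sorted(applicants, key=lambda a: (a[1], a[0]))
    n = len(order)
    inf = float("inf")
    cost = [inf] * n     # cost[i]: least overlap with i as the last worker
    prev = [None] * n    # prev[i]: predecessor that attains cost[i]
    for i, (s_i, c_i) in enumerate(order):
        if s_i == day_start:
            cost[i] = 0  # i alone covers the day from its start
            continue
        for j in range(i):
            s_j, c_j = order[j]
            # j must start earlier and still be on duty when i starts.
            if cost[j] < inf and s_j < s_i <= c_j:
                candidate = cost[j] + (c_j - s_i)  # overlap of j and i
                if candidate < cost[i]:
                    cost[i], prev[i] = candidate, j
    finals = [i for i in range(n) if order[i][1] == day_end and cost[i] < inf]
    if not finals:
        return None
    i = min(finals, key=lambda k: cost[k])
    best = cost[i]
    chain = []
    while i is not None:  # backtrack through prev
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return best, chain


applicants = [(9, 13), (12, 17), (11, 15), (15, 17), (9, 11)]
print(min_overlap_cover(applicants, 9, 17))
# (0, [(9, 11), (11, 15), (15, 17)])
```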

If you know the future prices of a stock, what's the best time to buy and sell?

Interview Question by a financial software company for a Programmer position
Q1) Say you have an array for which the ith element is the price of a given stock on
day i.
If you were only permitted to buy one share of the stock and sell one share
of the stock, design an algorithm to find the best times to buy and sell.
My Solution :
My solution was to make an array of the differences in stock prices between day i and day i+1 for arraysize-1 days and then use Kadane's algorithm to return the sum of the largest continuous subarray. I would then buy at the start of the largest continuous subarray and sell at the end of it.
I am wondering if my solution is correct and are there any better solutions out there???
Upon answering I was asked a follow-up question, which I answered in exactly the same way.
Q2) Given that you know the future closing price of Company x for the next 10 days,
design an algorithm to determine if you should BUY, SELL or HOLD for every
single day (you are allowed to make only 1 decision every day) with the aim
of maximizing profit.
Eg: Day 1 closing price :2.24
Day 2 closing price :2.11
...
Day 10 closing price : 3.00
My Solution: Same as above
I would like to know whether there's any better algorithm out there to maximise profit, given
that I can make a decision every single day.
Q1 If you were only permitted to buy one share of the stock and sell one share of the stock, design an algorithm to find the best times to buy and sell.
In a single pass through the array, determine the index i with the lowest price and the index j with the highest price. You buy at i and sell at j (selling before you buy, by borrowing stock, is in general allowed in finance, so it is okay if j < i). If all prices are the same you don't do anything.
Q2 Given that you know the future closing price of Company x for the next 10 days, design an algorithm to determine if you should BUY, SELL or HOLD for every single day (you are allowed to make only 1 decision every day) with the aim of maximizing profit.
There are only 10 days, and hence there are only 3^10 = 59049 different possibilities. Hence it is perfectly possible to use brute force. I.e., try every possibility and simply select the one which gives the greatest profit. (Even if a more efficient algorithm were found, this would remain a useful way to test the more efficient algorithm.)
Some of the solutions produced by the brute force approach may be invalid, e.g. it might not be possible to own (or owe) more than one share at once. Moreover, do you need to end up owning 0 stocks at the end of the 10 days, or are any positions automatically liquidated at the end of the 10 days? Also, I would want to clarify the assumption that I made in Q1, namely that it is possible to sell before buying to take advantage of falls in stock prices. Finally, there may be trading fees to be taken into consideration, including payments to be made if you borrow a stock in order to sell it before you buy it.
Once these assumptions are clarified it could well be possible to design a more efficient algorithm. E.g., in the simplest case if you can only own one share and you have to buy before you sell, then you would have a "buy" at the first minimum in the series, a "sell" at the last maximum, and buys and sells at any minima and maxima in between.
The more I think about it, the more I think these interview questions are as much about seeing how and whether a candidate clarifies a problem as they are about the solution to the problem.
Here are some alternative answers:
Q1) Work from left to right in the array provided. Keep track of the lowest price seen so far. When you see an element of the array, note down the difference between the price there and the lowest price so far, update the lowest price so far, and keep track of the highest difference seen. The answer is the profit given by the highest difference: sell at that point, having bought at the lowest price seen up to that time.
Q2) Treat this as a dynamic programming problem, where the state at any point in time is whether you own a share or not. Work from left to right again. At each point find the highest possible profit, given that you own a share at the end of that point in time, and given that you do not own a share at the end of that point in time. You can work this out from the result of the calculations of the previous time step: In one case compare the options of buying a share and subtracting this from the profit given that you did not own at the end of the previous point or holding a share that you did own at the previous point. In the other case compare the options of selling a share to add to the profit given that you owned at the previous time, or staying pat with the profit given that you did not own at the previous time. As is standard with dynamic programming you keep the decisions made at each point in time and recover the correct list of decisions at the end by working backwards.
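Both alternative answers can be sketched in a few lines of Python. The sketch assumes you may hold at most one share, must buy before you sell, pay no fees and end without a share; the function names best_single_trade and best_daily_actions are illustrative.

```python
def best_single_trade(prices):
    """Q1: one pass, tracking the lowest price so far.

    Returns (profit, buy_day, sell_day), or (0, None, None) if no trade pays.
    """
    best = (0, None, None)
    low_day = 0
    for day in range(1, len(prices)):
        gain = prices[day] - prices[low_day]
        if gain > best[0]:
            best = (gain, low_day, day)
        if prices[day] < prices[low_day]:
            low_day = day
    return best


def best_daily_actions(prices):
    """Q2: dynamic programming over two states (share owned or not)."""
    if not prices:
        return 0, []
    cash, hold = 0, -prices[0]  # best profit without / with a share
    # choice[day][state] = (action taken, state on the previous day)
    choice = [{0: ("HOLD", 0), 1: ("BUY", 0)}]
    for price in prices[1:]:
        step = {}
        if hold + price > cash:
            new_cash, step[0] = hold + price, ("SELL", 1)
        else:
            new_cash, step[0] = cash, ("HOLD", 0)
        if cash - price > hold:
            new_hold, step[1] = cash - price, ("BUY", 0)
        else:
            new_hold, step[1] = hold, ("HOLD", 1)
        cash, hold = new_cash, new_hold
        choice.append(step)
    # Recover the decisions by working backwards from "no share at the end".
    actions, state = [], 0
    for step in reversed(choice):
        action, state = step[state]
        actions.append(action)
    actions.reverse()
    return cash, actions


print(best_single_trade([1, 3, 5, 4, 6]))   # (5, 0, 4)
print(best_daily_actions([1, 3, 5, 4, 6]))  # (6, ['BUY', 'HOLD', 'SELL', 'BUY', 'SELL'])
```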
Your answer to question 1 was correct.
Your answer to question 2 was not correct. To solve this problem you work backwards from the end, choosing the best option at each step. For example, given the sequence { 1, 3, 5, 4, 6 }, since 4 < 6 your last move is to sell. Since 5 > 4, the previous move to that is buy. Since 3 < 5, the move on 5 is sell. Continuing in the same way, the move on 3 is to hold and the move on 1 is to buy.
Your solution for the first problem is correct. Kadane's algorithm runs in O(n), which is optimal for the maximum subarray problem, and it has the benefit of being easy to implement.
Your solution for the second problem is wrong, in my opinion. What you can do is store the left and right indices of the maximum-sum subarray you find. Once you have the maximum-sum subarray and its left and right indices, you can call this function again on the left part, i.e. 0 to left - 1, and on the right part, i.e. right + 1 to Array.size - 1. So this is basically a recursive process, and you can further design the structure of this recursion, with a base case, to solve this problem. By following this process you can maximize profit.
Suppose the prices are the array P = [p_1, p_2, ..., p_n]
Construct a new array A = [p_2 - p_1, p_3 - p_2, ..., p_n - p_{n-1}]
i.e. A[i] = p_{i+1} - p_i for i = 1, ..., n-1.
Now go find the maximum sum sub-array in this.
OR
Find a different algorithm, and solve the maximum sub-array problem!
The problems are equivalent.
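A short Python sketch of this reduction, assuming the difference array contains only day-to-day changes (no leading p_1 term) and that you must buy before you sell. The names max_subarray and best_trade_via_kadane are illustrative.

```python
def max_subarray(values):
    """Kadane's algorithm: (best sum, start index, end index), non-empty run."""
    best_sum, best_start, best_end = values[0], 0, 0
    run_sum, run_start = values[0], 0
    for i in range(1, len(values)):
        if run_sum < 0:
            run_sum, run_start = values[i], i  # start a new run here
        else:
            run_sum += values[i]               # extend the current run
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i
    return best_sum, best_start, best_end


def best_trade_via_kadane(prices):
    """Return (profit, buy_day, sell_day), or (0, None, None) if no trade pays."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    if not diffs:
        return 0, None, None
    total, start, end = max_subarray(diffs)
    if total <= 0:
        return 0, None, None
    # diffs[start..end] sums to prices[end + 1] - prices[start].
    return total, start, end + 1


print(best_trade_via_kadane([1, 3, 5, 4, 6]))  # (5, 0, 4)
```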

Resources