Interview Question by a financial software company for a Programmer position
Q1) Say you have an array for which the ith element is the price of a given stock on
day i.
If you were only permitted to buy one share of the stock and sell one share
of the stock, design an algorithm to find the best times to buy and sell.
My Solution :
My solution was to make an array of the differences in stock prices between day i and day i+1 for arraysize-1 days and then use Kadane Algorithm to return the sum of the largest continuous sub array.I would then buy at the start of the largest continuous array and sell at the end of the largest
continous array.
I am wondering if my solution is correct and are there any better solutions out there???
Upon answering i was asked a follow up question, which i answered exactly the same
Q2) Given that you know the future closing price of Company x for the next 10 days ,
Design a algorithm to to determine if you should BUY,SELL or HOLD for every
single day ( You are allowed to only make 1 decision every day ) with the aim of
of maximizing profit
Eg: Day 1 closing price :2.24
Day 2 closing price :2.11
...
Day 10 closing price : 3.00
My Solution: Same as above
I would like to know what if theres any better algorithm out there to maximise profit given
that i can make a decision every single day
Q1 If you were only permitted to buy one share of the stock and sell one share of the stock, design an algorithm to find the best times to buy and sell.
In a single pass through the array, determine the index i with the lowest price and the index j with the highest price. You buy at i and sell at j (selling before you buy, by borrowing stock, is in general allowed in finance, so it is okay if j < i). If all prices are the same you don't do anything.
Q2 Given that you know the future closing price of Company x for the next 10 days , Design a algorithm to to determine if you should BUY,SELL or HOLD for every single day ( You are allowed to only make 1 decision every day ) with the aim of of maximizing profit
There are only 10 days, and hence there are only 3^10 = 59049 different possibilities. Hence it is perfectly possible to use brute force. I.e., try every possibility and simply select the one which gives the greatest profit. (Even if a more efficient algorithm were found, this would remain a useful way to test the more efficient algorithm.)
Some of the solutions produced by the brute force approach may be invalid, e.g. it might not be possible to own (or owe) more than one share at once. Moreover, do you need to end up owning 0 stocks at the end of the 10 days, or are any positions automatically liquidated at the end of the 10 days? Also, I would want to clarify the assumption that I made in Q1, namely that it is possible to sell before buying to take advantage of falls in stock prices. Finally, there may be trading fees to be taken into consideration, including payments to be made if you borrow a stock in order to sell it before you buy it.
Once these assumptions are clarified it could well be possible to take design a more efficient algorithm. E.g., in the simplest case if you can only own one share and you have to buy before you sell, then you would have a "buy" at the first minimum in the series, a "sell" at the last maximum, and buys and sells at any minima and maxima inbetween.
The more I think about it, the more I think these interview questions are as much about seeing how and whether a candidate clarifies a problem as they are about the solution to the problem.
Here are some alternative answers:
Q1) Work from left to right in the array provided. Keep track of the lowest price seen so far. When you see an element of the array note down the difference between the price there and the lowest price so far, update the lowest price so far, and keep track of the highest difference seen. My answer is to make the amount of profit given at the highest difference by selling then, after having bought at the price of the lowest price seen at that time.
Q2) Treat this as a dynamic programming problem, where the state at any point in time is whether you own a share or not. Work from left to right again. At each point find the highest possible profit, given that own a share at the end of that point in time, and given that you do not own a share at the end of that point in time. You can work this out from the result of the calculations of the previous time step: In one case compare the options of buying a share and subtracting this from the profit given that you did not own at the end of the previous point or holding a share that you did own at the previous point. In the other case compare the options of selling a share to add to the profit given that you owned at the previous time, or staying pat with the profit given that you did not own at the previous time. As is standard with dynamic programming you keep the decisions made at each point in time and recover the correct list of decisions at the end by working backwards.
Your answer to question 1 was correct.
Your answer to question 2 was not correct. To solve this problem you work backwards from the end, choosing the best option at each step. For example, given the sequence { 1, 3, 5, 4, 6 }, since 4 < 6 your last move is to sell. Since 5 > 4, the previous move to that is buy. Since 3 < 5, the move on 5 is sell. Continuing in the same way, the move on 3 is to hold and the move on 1 is to buy.
Your solution for first problem is Correct. Kadane's Algorithm runtime complexity is O(n) is a optimal solution for maximum subarray problem. And benefit of using this algorithm is that it is easy to implement.
Your solution for second problem is wrong according to me. What you can do is to store the left and right index of maximum sum subarray you find. Once you find have maximum sum subarray and its left and right index. You can call this function again on the left part i.e 0 to left -1 and on right part i.e. right + 1 to Array.size - 1. So, this is a recursion process basically and you can further design the structure of this recursion with base case to solve this problem. And by following this process you can maximize profit.
Suppose the prices are the array P = [p_1, p_2, ..., p_n]
Construct a new array A = [p_1, p_2 - p_1, p_3 - p_2, ..., p_n - p_{n-1}]
i.e A[i] = p_{i+1} - p_i, taking p_0 = 0.
Now go find the maximum sum sub-array in this.
OR
Find a different algorithm, and solve the maximum sub-array problem!
The problems are equivalent.
I think this is a scheduling problem, but I'm not even sure on that much! What I want is to find the optimal sequence of non-overlapping purchase decisions, when I have full knowledge of their value and what opportunities are coming up in the future.
Imagine a wholesaler who sells various goods that I want to buy for my own shop. At any time they may have multiple special offers running; I will sell at full price, so their discount is my profit.
I want to maximize profit, but the catch is that I can only buy one thing at a time, and no such thing as credit, and worse, there is a delivery delay. The good news is I will sell the items as soon as they are delivered, and I can then go spend my money again. So, one path through all the choices might be: I buy 100kg apples on Monday, they are delivered on Tuesday. I then buy 20 nun costumes delivered, appropriately enough, on Sunday. I skip a couple of days, as I know on Wednesday they'll have a Ferrari at a heavy discount. So I buy one of those, it is delivered the following Tuesday. And so on.
You can consider compounding profits or not. The algorithm comes down to a decision at each stage between choosing one of today's special offers, or waiting a day because something better is coming tomorrow.
Let's abstract that a bit. Buy and delivery become days-since-epoch. Profit is written as sell-price divided by buy-price. I.e. 1.00 means break-even, 1.10 means a 10% profit, 2.0 means I doubled my money.
buy delivery profit
1 2 1.10 Apples
1 3 1.15 Viagra
2 3 1.15 Notebooks
3 7 1.30 Nun costumes
4 7 1.28 Priest costumes
6 7 1.09 Oranges
6 8 1.11 Pears
7 9 1.16 Yellow shoes
8 10 1.15 Red shoes
10 15 1.50 Red Ferrari
11 15 1.40 Yellow Ferrari
13 16 1.25 Organic grapes
14 19 1.30 Organic wine
NOTES: opportunities exist only on the buy day (e.g. the organic grapes get made into wine if no-one buys them!), and I get to sell on the same day as delivery, but cannot buy my next item until the following day. So I cannot sell my nun costumes at t=7 and immediately buy yellow shoes at t=7.
I was hoping there exists a known best algorithm, and that there is already an R module for it, but algorithms or academic literature would also be good, as would anything in any other language. Speed matters, but mainly when the data gets big, so I'd like to know if it is O(n2), or whatever.
By the way, does the best algorithm change if there is a maximum possible delivery delay? E.g. if delivery - buy <= 7
Here is the above data as CSV:
buy,delivery,profit,item
1,2,1.10,Apples
1,3,1.15,Viagra
2,3,1.15,Notebooks
3,7,1.30,Nun costumes
4,7,1.28,Priest costumes
6,7,1.09,Oranges
6,8,1.11,Pears
7,9,1.16,Yellow shoes
8,10,1.15,Red shoes
10,15,1.50,Red Ferrari
11,15,1.40,Yellow Ferrari
13,16,1.25,Organic grapes
14,19,1.30,Organic wine
Or as JSON:
{"headers":["buy","delivery","profit","item"],"data":[[1,2,1.1,"Apples"],[1,3,1.15,"Viagra"],[2,3,1.15,"Notebooks"],[3,7,1.3,"Nun costumes"],[4,7,1.28,"Priest costumes"],[6,7,1.09,"Oranges"],[6,8,1.11,"Pears"],[7,9,1.16,"Yellow shoes"],[8,10,1.15,"Red shoes"],[10,15,1.5,"Red Ferrari"],[11,15,1.4,"Yellow Ferrari"],[13,16,1.25,"Organic grapes"],[14,19,1.3,"Organic wine"]]}
Or as an R data frame:
structure(list(buy = c(1L, 1L, 2L, 3L, 4L, 6L, 6L, 7L, 8L, 10L,
11L, 13L, 14L), delivery = c(2L, 3L, 3L, 7L, 7L, 7L, 8L, 9L,
10L, 15L, 15L, 16L, 19L), profit = c(1.1, 1.15, 1.15, 1.3, 1.28,
1.09, 1.11, 1.16, 1.15, 1.5, 1.4, 1.25, 1.3), item = c("Apples",
"Viagra", "Notebooks", "Nun costumes", "Priest costumes", "Oranges",
"Pears", "Yellow shoes", "Red shoes", "Red Ferrari", "Yellow Ferrari",
"Organic grapes", "Organic wine")), .Names = c("buy", "delivery",
"profit", "item"), row.names = c(NA, -13L), class = "data.frame")
LINKS
Are there any R Packages for Graphs (shortest path, etc.)? (igraph offers a shortest.paths function and in addition to the C library, has an R package and a python interface)
The simplest way to think of this problem is as analogous to a shortest-path problem (although treating it as a maximum flow problem probably is technically better). The day numbers, 1 ... 19, can be used as node names; each node j has a link to node j+1 with weight 1, and each product (b,d,g,p) in the list adds a link from day b to day d+1 with weight g. As we progress through the nodes when path-finding, we keep track of the best multiplied values seen so far at each node.
The Python code shown below runs in time O(V+E) where V is the number of vertices (or days), and E is the number of edges. In this implementation, E = V + number of products being sold. Added note: The loop for i, t in enumerate(graf): treats each vertex once. In that loop, for e in edges: treats edges from the current vertex once each. Thus, no edge is treated more than once, so performance is O(V+E).
Edited note 2: krjampani claimed that O(V+E) is slower than O(n log n), where n is the number of products. However, the two orders are not comparable unless we make assumptions about the number of days considered. Note that if delivery delays are bounded and product dates overlap, then number of days is O(n) whence O(V+E) = O(n), which is faster than O(n log n).
However, under a given set of assumptions the run time orders of my method and krjampani's can be the same: For large numbers of days, change my method to create graph nodes only for days in the sorted union of x[0] and x[1] values, and using links to day[i-1] and day[i+1] instead of to i-1 and i+1. For small numbers of days, change krjampani's method to use an O(n) counting sort.
The program's output looks like the following:
16 : 2.36992 [11, 15, 1.4, 'Yellow Ferrari']
11 : 1.6928 [8, 10, 1.15, 'Red shoes']
8 : 1.472 [4, 7, 1.28, 'Priest costumes']
4 : 1.15 [1, 3, 1.15, 'Viagra']
which indicates that we arrived at day 16 with compounded profit of 2.36992, after selling Yellow Ferrari's on day 15; arrived at day 11 with profit 1.6928, after selling Red shoes; and so forth. Note, the dummy entry at the beginning of the products list, and removal of quotes around the numbers, are the main differences vs the JSON data. The entry in list element graf[j] starts out as [1, j-1, 0, [[j+1,1,0]]], that is, is of form [best-value-so-far, best-from-node#, best-from-product-key, edge-list]. Each edge-list is a list of lists which have form [next-node#, edge-weight, product-key]. Having product 0 be a dummy product simplifies initialization.
products = [[0,0,0,""],[1,2,1.10,"Apples"],[1,3,1.15,"Viagra"],[2,3,1.15,"Notebooks"],[3,7,1.30,"Nun costumes"],[4,7,1.28,"Priest costumes"],[6,7,1.09,"Oranges"],[6,8,1.11,"Pears"],[7,9,1.16,"Yellow shoes"],[8,10,1.15,"Red shoes"],[10,15,1.50,"Red Ferrari"],[11,15,1.40,"Yellow Ferrari"],[13,16,1.25,"Organic grapes"],[14,19,1.30,"Organic wine"]]
hiDay = max([x[1] for x in products])
graf = [[1, i-1, 0, [[i+1,1,0]]] for i in range(2+hiDay)]
for i, x in enumerate(products):
b, d, g, p = x[:]
graf[b][3] += [[d+1, g, i]] # Add an edge for each product
for i, t in enumerate(graf):
if i > hiDay: break
valu = t[0] # Best value of path to current node
edges = t[3] # List of edges out of current node
for e in edges:
link, gain, prod = e[:]
v = valu * gain;
if v > graf[link][0]:
graf[link][0:3] = [v, i, prod]
day = hiDay
while day > 0:
if graf[day][2] > 0:
print day, ":\t", graf[day][0], products[graf[day][2]]
day = graf[day][1]
This problem maps naturally to the problem of finding the maximum weight independent intervals among a set of weighted intervals. Each item in your input set corresponds to an interval whose start and end points are the buy and delivery dates and the item's profit represents the weight of the interval. The maximum weight independent intervals problem is to find a set of disjoint intervals whose total weight is the maximum.
The problem can be solved in O(n log n) as follows. Sort the intervals by their end points (see the figure). We then travel through each interval i in the sorted list and compute the optimal solution for the subproblem that consists of intervals from 1...i in the sorted list. The optimal solution of the problem for intervals 1...i is the maximum of:
1. The optimal solution of the problem for intervals `1...(i-1)` in the
sorted list or
2. Weight of interval `i` + the optimal solution of the problem for intervals
`1...j`, where j is the last interval in the sorted list whose end-point
is less than the start-point of `i`.
Note that this algorithm runs in O(n log n) and computes the value of the optimal solution for every prefix of the sorted list.
After we run this algorithm, we can travel through the sorted-list in reverse order and find the intervals present in the optimal solution based on the values computed for each prefix.
EDIT:
For this to work correctly the weights of the intervals should be the actual profits of the corresponding items (i.e. they should be sell_price - buy_price).
Update 2: Running time
Let V be the number of days (following jwpat7's notation).
If V is much smaller than O(n log n), we can use the counting sort to sort the intervals in O(n + V) time and use an array of size V to record the solutions to the subproblems. This approach results in a time complexity of O(V + n).
So the running time of the algorithm is min(O(V+n), O(n log n)).
This is a dynamic programming problem. Making an overall optimal choice only requires making optimal choices at each step. You can make a table that describes the optimal choice at each step based on the previous state and the profit of taking various steps from that state. You can collapse a large set of possibilities into a smaller set by eliminating the possibilities that are clearly non-optimal as you go.
In your problem, the only state that affects choices is the delivery date. For example, on day one, you have three choices: You can buy apples, set your profit to 1.10, and set your delivery date to 2; buy viagra, set your profit to 1.15, and set your delivery date to 3; or buy nothing, set your profit to zero, and set your delivery date to 2. We can represent these alternatives like this:
(choices=[apples], delivery=2, profit=1.10) or
(choices=[viagra], delivery=3, profit=1.15) or
(choices=[wait], delivery=2, profit=0.00)
It isn't going to make any difference whether you buy viagra or buy nothing on the first day as far as making future decisions. Either way, the next day you can make a purchase is day two, so you can eliminate waiting as an alternative since the profit is lower. However, if you buy apples, that will affect future decisions differently than if you buy viagra or wait, so it is a different alternative you have to consider. That just leaves you with these alternatives at the end of day one.
(choices=[apples], delivery=2, profit=1.10) or
(choices=[viagra], delivery=3, profit=1.15)
For day two, you need to consider your alternatives based on what the alternatives were on day one. This produces three possibilties:
(choices=[apples,notebooks], delivery=3, profit=2.25) or
(choices=[apples,wait], delivery=3, profit=1.10) or
(choices=[viagra,wait], delivery=3, profit=1.15)
All three of these choices put you in the same state as far as future decisions are considered, since they all put the delivery date at 3, so you simply choose the one with maximum profit:
(choices=[apples,notebooks], delivery=3, profit=2.25)
Going on to day three, you have two alternatives
(choices=[apples,notebooks,wait], delivery=4, profit=2.25)
(choices=[apples,notebooks,nun costumes], delivery=7, profit=3.55)
both of these alternatives have to be kept, since they will affect future decisions in different ways.
Note that we're just making future decisions based on the delivery date and the profit. We keep track of the choices just so that we can report the best set of choices at the end.
Now maybe you can see the pattern. You have a set of alternatives, and whenever you have multiple alternatives that have the same delivery date, you just choose the one with the maximum profit and eliminate the others. This process of collapsing your alternatives keeps the problem from growing exponentially, allowing it to be solved efficiently.
You can solve this as a linear programming problem. This is the standard approach to solving logistics problems, such as those faced by airlines and corporations, with much larger problem spaces than yours. I won't formally define your problem here, but in broad terms: Your objective function is the maximisation of profit. You can represent the buy days, and the "only one purchase per day" as linear constraints.
The standard linear programming algorithm is the simplex method. Although it has exponential worst case behaviour, in practice, it tends to be very efficient on real problems. There are lots of good freely available implementations. My favourite is the GNU Linear Programming Kit. R has an interface to GLPK. Lp_solve is another well-known project, which also has an R interface. The basic approach in each case is to formally define your problem, then hand that off to the third party solver to do its thing.
To learn more, I recommend you take a look at the Algorithm Design Manual, which should give you enough background to do further research online. p.412 onwards is a thorough summary of linear programming and its variations (e.g. integer programming if you have integrality constraints).
If you've never heard of linear programming before, you might like to take a look at some examples of how it can be used. I really like this simple set of tutorial problems in Python. They include maximising profit on tins of cat food, and solving a Sudoku problem.