Maximum profit earned in Interval - algorithm

We have to rent our car to customers. We have a list whose each element represent the time at which car will be given, second -> the time at which car will be returned and third -> the profit earned at that lending. So i need to find out the maximum profit that can be earned.
Eg:
( [1,2,20], [3,6,15], [2,8,25], [7,12,18], [13,31,22] )
The maximum profit earned is 75. [1,2] + [3,6] + [7,12] + [13,31].
We can have overlapping intervals. We need to select such case that maximizes our profit.

Assuming you have only one car, then the problem we are solving in Weighted Interval Scheduling
Let us assume we have intervals I0 , I1, I2, ....In-1 and Interval Ii is (si,ti,pi)
Algorithm :
First sort the Intervals on the basis of starting points si.
Create a array for Dynamic Programming, MaxProfit[i] represent the maximum profit you can make from intervals Ii,Ii+1,In-1.Initialise the last value
MaxProfit[n-1] = profit_of_(n-1)
Then using DP we can find the maximum profit as :
a. Either we can ignore the given interval, In this case our maximum profit will be the maximum profit we can gain from the remaining intervals
MaxProfit[i+1]
b. Or we can include this interval, In this case the maximum profit can be written as
profit_of_i + MaxProfit[r]
where r is the next Interval such that sr > ti
So our overall DP becomes
MaxProfit[i] = max{MaxProfit[i+1], profit_of_i + MaxProfit[r] }
Return the value of MaxProfit[0]

use something like dynamic programming.
at first sort with first element.
you have 2 rows, first show most earned if this time is used and another is for most earned if not used.
then you will put each task in the related period of time and see in each time part it is a good choice to have it or not.
take care that if intervals are legal we choose all of them.

Related

Finding algorithm that minimizes a cost function

I have a restroom that I need to place at some point. I want the restroom's placement to minimize the total distance people have to travel to get there.
So I have x apartments, and each house has n people living in each apartment, so the apartments would be like a_1, a_2, a_3, ... a_x and the number of people in a_1 would be n_1, a_2 would be n_2, etc. No two apartments can be in the same space and each apartment has a positive number of people.
So I know the distance between an apartment a_1 and the proposed bathroom, placed at a, would be |a_1 - a|.
MY WORKING:
I defined a cost function, C(a) = SUM[from i = 1 to x] (n_i)|a_i - a|. I want to find the location a that minimizes this cost function, given two arrays - one for the location of the apartments and one for the number of people in each apartment. I want my algorithm to be in O(n) time.
I was thinking of representing this as a graph and using MSTs or Djikstra's but that would not meet the O(n) runtime. Clearly, there must be something I can do without graphs, but I am unsure.
My understanding of your problem:
You have a once dimensional line with points a1,...,an. Each point has a value n1,....n, and you need to pick a point a that minimizes the cost function
SUM[from i = 1 to x] (n_i)|a_i - a|.
Lets assume our input a1...an is sorted.
Our strategy will be a sweep from left to right, calculating possible a on the way.
Things we will keep track of:
total_n : the total number of people
left_n : the number of people living to the left or at our current position
right_n : the number of people living to the right of our current position
a: our current postition
Calculate:
C(a)
left_n = n1
right_n = total_n - left_n
Now we consider what happens to the sum if we move our restroom to the right 1 step. The people on the left get 1 step further away, but the people on the right get 1 step closer.
We can say that C(a+1) = C(a) +left_n -right_n
If the range an-a1 is fairly small, we can use this and just step through the range using this formula to update the sum. Note that when this sum starts increasing we have gone too far and can safely stop.
However, if the apartments are very far apart we cannot step 1 by 1 unit. We need instead to step apartment by apartment. Note
C(a[i]) = C(a[i-1]) + left_n*(a[i]-a[a-1]) - right_n*(a[i]-a[i-1])
If at any point C(a[i]) > C(a[i-1]) we know that the correct position of the restroom is somewhere between i and i-1.
We can calculate that position, lets call it x.
The sum at x is C(a[i-1]) + left_n*(x-a[i-1]) - right_n*(x-a[i-1]) and we want to minimize this. Note that everything but x is known values.
We can simplify to
f(x) = C(a[i-1]) + left_n*x-left_n*a[i-1]) - right_n*x-left_n*a[i-1])
Constant terms cannot affect our decision so we are actually looking to minize
f(x) = x*(left_n-right_n)
We see that if left_n < right_n we want the restroom to be at i+1, but if left_n > right_n we want the restroom to be at i.
We need to at most do this calculation at each apartment, so the running time is O(n).

Algorithm for grouping train trips

Imagine you have a full calendar year in front of you. On some days you take the train, potentially even a few times in a single day and each trip could be to a different location (I.E. The amount you pay for the ticket can be different for each trip).
So you would have data that looked like this:
Date: 2018-01-01, Amount: $5
Date: 2018-01-01, Amount: $6
Date: 2018-01-04, Amount: $2
Date: 2018-01-06, Amount: $4
...
Now you have to group this data into buckets. A bucket can span up to 31 consecutive days (no gaps) and cannot overlap another bucket.
If a bucket has less than 32 train trips it will be blue. If it has 32 or more train trips in it, it will be red. The buckets will also get a value based on the sum of the ticket cost.
After you group all the trips the blue buckets get thrown out. And the value of all the red buckets gets summed up, we will call this the prize.
The goal, is to get the highest value for the prize.
This is the problem I have. I cant think of a good algorithm to do this. If anyone knows a good way to approach this I would like to hear it. Or if you know of anywhere else that can help with designing algorithms like this.
This can be solved by dynamic programming.
First, sort the records by date, and consider them in that order.
Let day (1), day (2), ..., day (n) be the days where the tickets were bought.
Let cost (1), cost (2), ..., cost (n) be the respective ticket costs.
Let fun (k) be the best prize if we consider only the first k records.
Our dynamic programming solution will calculate fun (0), fun (1), fun (2), ..., fun (n-1), fun (n), using the previous values to calculate the next one.
Base:
fun (0) = 0.
Transition:
What is the optimal solution, fun (k), if we consider only the first k records?
There are two possibilities: either the k-th record is dropped, then the solution is the same as fun (k-1), or the k-th record is the last record of a bucket.
Let us then consider all possible buckets ending with the k-th record in a loop, as explained below.
Look at records k, k-1, k-2, ..., down to the very first record.
Let the current index be i.
If the records from i to k span more than 31 consecutive days, break from the loop.
Otherwise, if the number of records, k-i+1, is at least 32, we can solve the subproblem fun (i-1) and then add the records from i to k, getting a prize of cost (i) + cost (i+1) + ... + cost (k).
The value fun (k) is the maximum of these possibilities, along with the possibility to drop the k-th record.
Answer: it is just fun (n), the case where we considered all the records.
In pseudocode:
fun[0] = 0
for k = 1, 2, ..., n:
fun[k] = fun[k-1]
cost_i_to_k = 0
for i = k, k-1, ..., 1:
if day[k] - day[i] > 31:
break
cost_i_to_k += cost[i]
if k-i+1 >= 32:
fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]
It is not clear whether we are allowed to split records on a single day into different buckets.
If the answer is no, we will have to enforce it by not considering buckets starting or ending between records in a single day.
Technically, it can be done by a couple of if statements.
Another way is to consider days instead of records: instead of tickets which have day and cost, we will work with days.
Each day will have cost, the total cost of tickets on that day, and quantity, the number of tickets.
Edit: as per comment, we indeed can not split any single day.
Then, after some preprocessing to get days records instead of tickets records, we can go as follows, in pseudocode:
fun[0] = 0
for k = 1, 2, ..., n:
fun[k] = fun[k-1]
cost_i_to_k = 0
quantity_i_to_k = 0
for i = k, k-1, ..., 1:
if k-i+1 > 31:
break
cost_i_to_k += cost[i]
quantity_i_to_k += quantity[i]
if quantity_i_to_k >= 32:
fun[k] = max (fun[k], fun[i-1] + cost_i_to_k)
return fun[n]
Here, i and k are numbers of days.
Note that we consider all possible days in the range: if there are no tickets for a particular day, we just use zeroes as its cost and quantity values.
Edit2:
The above allows us to calculate the maximum total prize, but what about the actual configuration of buckets which gets us there?
The general method will be backtracking: at position k, we will want to know how we got fun (k), and transition to either k-1 if the optimal way was to skip k-th record, or from k to i-1 for such i that the equation fun[k] = fun[i-1] + cost_i_to_k holds.
We proceed until i goes down to zero.
One of the two usual implementation approaches is to store par (k), a "parent", along with fun (k), which encodes how exactly we got the maximum.
Say, if par (k) = -1, the optimal solution skips k-th record.
Otherwise, we store the optimal index i in par (k), so that the optimal solution takes a bucket of records i to k inclusive.
The other approach is to store nothing extra.
Rather, we run a slight modification code which calculates fun (k).
But instead of assigning things to fun (k), we compare the right part of the assignment to the final value fun (k) we already got.
As soon as they are equal, we found the right transition.
In pseudocode, using the second approach, and days instead of individual records:
k = n
while k > 0:
k = prev (k)
function prev (k):
if fun[k] == fun[k-1]:
return k-1
cost_i_to_k = 0
quantity_i_to_k = 0
for i = k, k-1, ..., 1:
if k-i+1 > 31:
break
cost_i_to_k += cost[i]
quantity_i_to_k += quantity[i]
if quantity_i_to_k >= 32:
if fun[k] == fun[i-1] + cost_i_to_k:
writeln ("bucket from $ to $: cost $, quantity $",
i, k, cost_i_to_k, quantity_i_to_k)
return i-1
assert (false, "can't happen")
Simplify the challenge, but not too much, to make an overlookable example, which can be solved by hand.
That helps a lot in finding the right questions.
For example take only 10 days, and buckets of maximum length of 3:
For building buckets and colorizing them, we need only the ticket count, here 0, 1, 2, 3.
On Average, we need more than one bucket per day, for example 2-0-2 is 4 tickets in 3 days. Or 1-1-3, 1-3, 1-3-1, 3-1-2, 1-2.
But We can only choose 2 red buckets: 2-0-2 and (1-1-3 or 1-3-3 or 3-1-2) since 1-2 in the end is only 3 tickets, but we need at least 4 (one more ticket than max day span per bucket).
But while 3-1-2 is obviously more tickets than 1-1-3 tickets, the value of less tickets might be higher.
The blue colored area is the less interesting one, because it doesn't feed itself, by ticket count.

rudimentary flight trip planner

![question based on travel planner][1]
what approach will be best for solving this problem?,any kind of help will be appreciated
The input is the set of flights between various cities. It is given as a file. Each line of the file contains "city1 city2 departure-time arrival-time flight-no. price" This means that there is a flight called "flight-no" (which is a string of the form XY012) from city1 to city2 which leaves city1 at time "departure-time" and arrives city2 at time "arrival-time". Further the price of this flight is "price" which is a poitive integer. All times are given as a string of 4 digits in the 24hr format e.g. 1135, 0245, 2210. Assume that all city names are integers between 1 and a number N (where N is the total number of cities).
Note that there could be multiple flights between two cities (at different times).
The query that you have to answer is: given two cities "A" and "B", times "t1", "t2", where t1 < t2, find the cheapest trip which leaves city "A" after time "t1" and arrives at city "B" before time "t2". A trip is a sequence of flights which starts at A after time t1 and ends at B before time t2. Further, the departure time from any transit (intermediate) city C is at least 30 mins after the arrival at C
You can solve this problem with a graph search algorithm, such as Dijkstra's Algorithm.
The vertices of the graph are tuples of locations and (arrival) times. The edges are a combination of a layover (of at least 30 minutes) and an outgoing flight. The only difficulty is marking the vertices we've visited already (the "closed" list), since arriving in an airport at a given time shouldn't prevent consideration of flights into that airport that arrive earlier. My suggestion is to mark the departing flights we've already considered, rather than marking the airports.
Here's a quick implementation in Python. I assume that you've already parsed the flight data into a dictionary that maps from the departing airport name to a list of 5-tuples containing flight info ((flight_number, cost, destination_airport, departure_time, arrival_time)):
from heapq import heappush, heappop
from datetime import timedelta
def find_cheapest_route(flight_dict, start, start_time, target, target_time):
queue = [] # a min-heap based priority queue
taken_flights = set() # flights that have already been considered
heappush(queue, (0, start, start_time - timedelta(minutes=30), [])) # start state
while queue: # search loop
cost_so_far, location, time, route = heappop(queue) # pop the cheapest route
if location == target and time <= target_time: # see if we've found a solution
return route, cost
earliest_departure = time + timedelta(minutes=30) # minimum layover
for (flight_number, flight_cost, flight_dest, # loop on outgoing flights
flight_departure_time, flight_arrival_time) in flight_dict[location]:
if (flight_departure_time >= earliest_departure and # check flight
flight_arrival_time <= target_time and
flight_number not in taken_flights):
queue.heappush(queue, (cost_so_far + flight_cost, # add it to heap
flight_dest, flight_arrival_time,
route + [flight_number]))
taken_flights.add(flight_number) # and to the set of seen flights
# if we got here, there's no timely route to the destination
return None, None # or raise an exception
If you don't care about efficiency, you can solve the problem like this:
For each "final leg" flight arriving at the destination before t2, determine the departure city of the flight (cityX) and the departure time of the flight (tX). Subtract 30 minutes from the departure time (tX-30). Then recursively find the cheapest trip from start, departing after t1, arriving in cityX before tX-30. Add the cost of that trip to the cost of the final leg to determine the total cost of the trip. The minimum over all those trips is the flight you want.
There is perhaps a more efficient dynamic programming approach, but I might start with the above (which is very easy to code recursively).

Algorithm to select a best combination from two list

I have a search result from two way flight. So, there are two lists that contain the departure flights and arrival flights such as:
The departure flights list has 20 flights.
The arrival flights list has 30 flights
So, I will have 600 (20*30) combination between departure flight and arrival flight. I will call the combination list is the result list
However, I just want to select a limitation from 600 combination. For instance, I will select the best of 100 flight combination. The criteria to combine the flights is the cheap price for departure and arrival flight.
To do that, I will sort the result list by the total price of departure and arrival flight. And I then pick up the first 100 elements from result list to get what I want.
But, if the departure flights list has 200 flights and arrival flights list has 300 flights, I will have the result list with 60.000 elements. For that reason, I will sort a list with 60.000 elements to find the best 100 elements.
So, there is any an algorithm to select the best combinations as my case.
Thank you so much.
Not 100% clear from your question, but I understand that you are looking for a faster algorithm to find a certain number of best / cheapest combinations of departure and arrival flights.
You can do this much faster by sorting the lists of departure and arrival flights individually by cost and then using a heap for expanding the next-best combinations one-by-one until you have enough.
Here's the full algorithm -- in Python, but without using any special libraries, just standard data structures, so this should be easily transferable to any other language:
NUM_FLIGHTS, NUM_BEST = 1000, 100
# create test data: each entry corresponds to just the cost of one flight
from random import randint
dep = sorted([randint(1, 100) for i in range(NUM_FLIGHTS)])
arr = sorted([randint(1, 100) for i in range(NUM_FLIGHTS)])
def is_compatible(i, j): # for checking constraints, e.g. timing of flights
return True # but for now, assume no constraints
# get best combination using sorted lists and heap
from heapq import heappush, heappop
heap = [(dep[0] + arr[0], 0, 0)] # initial: best combination from dep and arr
result = [] # the result list
visited = set() # make sure not to add combinations twice
while heap and len(result) < NUM_BEST:
cost, i, j = heappop(heap) # get next-best combination
if (i, j) in visited: continue # did we see those before? skip
visited.add((i, j))
if is_compatible(i, j): # if 'compatible', add to results
result.append((cost, dep[i], arr[j]))
# add 'adjacent' combinations to the heap
if i < len(dep) - 1: # next-best departure + same arrival
heappush(heap, (dep[i+1] + arr[j], i+1, j))
if j < len(arr) - 1: # same departure + next-best arrival
heappush(heap, (dep[i] + arr[j+1], i, j+1))
print result
# just for testing: compare to brute-force (get best from all combinations)
comb = [(d, a) for d in dep for a in arr]
best = sorted((d+a, d, a) for (d, a) in comb)[:NUM_BEST]
print best
print result == best # True -> same results as brute force (just faster)
This works roughly like this:
sort both the departure flights dep and the arrival flights arr by their cost
create a heap and put the best combination (best departure and best arrival) as well as the corresponding indices in their lists into the heap: (dep[0] + arr[0], 0, 0)
repeat until you have enough combinations or there are no more elements in the heap:
pop the best element from the heap (sorted by total cost)
if it satisfies the contraints, add it to the result set
make sure you do not add flights twice to the result set, using visited set
add the two 'adjacent' combinations to the heap, i.e. taking the same flight from dep and the next from arr, and the next from dep and the same from arr, i.e. (dep[i+1] + arr[j], i+1, j) and (dep[i] + arr[j+1], i, j+1)
Here's a very small worked example. The axes are (the costs of) the dep and arr flights, and the entries in the table are in the form n(c)m, where n is the iteration that entry was added to the heap (if it is at all), c is the cost, and m is the iteration it was added to the 'top 10' result list (if any).
dep\arr 1 3 4 6 7
2 0(3)1 1(5)4 4(6)8 8(8)- -
2 1(3)2 2(5)6 6(6)9 9(8)- -
3 2(4)3 3(6)7 7(7)- - -
4 3(5)5 5(7)- - - -
6 5(7)10 - - - -
Result: (1,2), (1,2), (1,3), (3,2), (1,4), (3,2), (3,3), (2,4), (2,4), (1,6)
Note how the sums in both the columns and the rows of the matrix are always increasing, so the best results can always be found in a somewhat triangular area in the top-left. Now the idea is that if your currently best combination (the one that's first in the heap) is dep[i], arr[i], then there's no use in checking, e.g., combination dep[i+2], arr[i] before checking dep[i+1], arr[i], which must have a lower total cost, so you add dep[i+1], arr[i] (and likewise dep[i], arr[i+1]) to the heap, and repeat with popping the next element from the heap.
I compared the results of this algorithm to the results of your brute-force approach, and the resulting flights are the same, i.e. the algorithm works, and always yields the optimal result. Complexity should be O(n log(n)) for sorting the departure and arrival lists (n being the number of flights in those original lists), plus O(m log(m)) for the heap-loop (m iterations with log(m) work per iteration, m being the number of elements in the result list).
This finds the best 1,000 combinations of 100,000 departure and 100,000 arrival flights (for a total of 1,000,000,000,000 possible combinations) in less than one second.
Note that those numbers are for the case that you have no additional constraints, i.e. each departure flight can be combined with each arrival flight. If there are constraints, you can use the is_compatible function sketched in the above code to check those and to skip that pairing. This means, that for each incompatible pair with low total cost, the loop needs one additional iteration. This means that in the worst case, for example if there are no compatible pairs at all, or when the only compatible pairs are those with the highest total cost, the algorithm could in fact expand all the combination.
On average, though, this should not be the case, and the algorithm should perform rather quickly.
I think the best solution would be using some SQL statements to do the Cartesian product. You can apply any kind of filters, based on the data itself, ordering, range selection, etc. Something like this:
SELECT d.time as dep_time, a.time as arr_time, d.price+a.price as total_price
FROM departures d, arrivals a
WHERE a.time > d.time + X
ORDER BY d.price+a.price
LIMIT 0,100
Actually X can be even 0, but arrival should happen AFTER the departure anyways.
Why I would choose SQL:
It's closest to the data itself, you don't have to query them
It's highly optimized, if you use indexes, I'm sure you can't beat its performance with your own code
It's simple and declarative :)

Algorithm For Ranking Items

I have a list of 6500 items that I would like to trade or invest in. (Not for real money, but for a certain game.) Each item has 5 numbers that will be used to rank it among the others.
Total quantity of item traded per day: The higher this number, the better.
The Donchian Channel of the item over the last 5 days: The higher this number, the better.
The median spread of the price: The lower this number, the better.
The spread of the 20 day moving average for the item: The lower this number, the better.
The spread of the 5 day moving average for the item: The higher this number, the better.
All 5 numbers have the same 'weight', or in other words, they should all affect the final number in the with the same worth or value.
At the moment, I just multiply all 5 numbers for each item, but it doesn't rank the items the way I would them to be ranked. I just want to combine all 5 numbers into a weighted number that I can use to rank all 6500 items, but I'm unsure of how to do this correctly or mathematically.
Note: The total quantity of the item traded per day and the donchian channel are numbers that are much higher then the spreads, which are more of percentage type numbers. This is probably the reason why multiplying them all together didn't work for me; the quantity traded per day and the donchian channel had a much bigger role in the final number.
The reason people are having trouble answering this question is we have no way of comparing two different "attributes". If there were just two attributes, say quantity traded and median price spread, would (20million,50%) be worse or better than (100,1%)? Only you can decide this.
Converting everything into the same size numbers could help, this is what is known as "normalisation". A good way of doing this is the z-score which Prasad mentions. This is a statistical concept, looking at how the quantity varies. You need to make some assumptions about the statistical distributions of your numbers to use this.
Things like spreads are probably normally distributed - shaped like a normal distribution. For these, as Prasad says, take z(spread) = (spread-mean(spreads))/standardDeviation(spreads).
Things like the quantity traded might be a Power law distribution. For these you might want to take the log() before calculating the mean and sd. That is the z score is z(qty) = (log(qty)-mean(log(quantities)))/sd(log(quantities)).
Then just add up the z-score for each attribute.
To do this for each attribute you will need to have an idea of its distribution. You could guess but the best way is plot a graph and have a look. You might also want to plot graphs on log scales. See wikipedia for a long list.
You can replace each attribute-vector x (of length N = 6500) by the z-score of the vector Z(x), where
Z(x) = (x - mean(x))/sd(x).
This would transform them into the same "scale", and then you can add up the Z-scores (with equal weights) to get a final score, and rank the N=6500 items by this total score. If you can find in your problem some other attribute-vector that would be an indicator of "goodness" (say the 10-day return of the security?), then you could fit a regression model of this predicted attribute against these z-scored variables, to figure out the best non-uniform weights.
Start each item with a score of 0. For each of the 5 numbers, sort the list by that number and add each item's ranking in that sorting to its score. Then, just sort the items by the combined score.
You would usually normalize your data entries to their respective range. Since there is no fixed range for them, you'll have to use a sliding range - or, to keep it simpler, normalize them to the daily ranges.
For each day, get all entries for a given type, get the highest and the lowest of them, determine the difference between them. Let Bottom=value of the lowest, Range=difference between highest and lowest. Then you calculate for each entry (value - Bottom)/Range, which will result in something between 0.0 and 1.0. These are the numbers you can continue to work with, then.
Pseudocode (brackets replaced by indentation to make easier to read):
double maxvalues[5];
double minvalues[5];
// init arrays with any item
for(i=0; i<5; i++)
maxvalues[i] = items[0][i];
minvalues[i] = items[0][i];
// find minimum and maximum values
foreach (items as item)
for(i=0; i<5; i++)
if (minvalues[i] > item[i])
minvalues[i] = item[i];
if (maxvalues[i] < item[i])
maxvalues[i] = item[i];
// now scale them - in this case, to the range of 0 to 1.
double scaledItems[sizeof(items)][5];
double t;
foreach(i=0; i<5; i++)
double delta = maxvalues[i] - minvalues[i];
foreach(j=sizeof(items)-1; j>=0; --j)
scaledItems[j][i] = (items[j][i] - minvalues[i]) / delta;
// linear normalization
something like that. I'll be more elegant with a good library (STL, boost, whatever you have on the implementation platform), and the normalization should be in a separate function, so you can replace it with other variations like log() as the need arises.
Total quantity of item traded per day: The higher this number, the better. (a)
The Donchian Channel of the item over the last 5 days: The higher this number, the better. (b)
The median spread of the price: The lower this number, the better. (c)
The spread of the 20 day moving average for the item: The lower this number, the better. (d)
The spread of the 5 day moving average for the item: The higher this number, the better. (e)
a + b -c -d + e = "score" (higher score = better score)

Resources