Algorithm to find the most valuable combination?

I'm working on a small project where I need help finding the best and cheapest tickets based on some input from the user:
Between which dates (start and end date)?
Within that period, are you skipping one or several dates?
How many times do you need to use the ticket each day?
There are x types of ticket. A ticket can be:
Single ticket, to be used only once, price $5.
Period ticket (unlimited rides each day), to be used as much as you want: 1 day/$10, 3 days/$30, 7 days/$45, ...
I guess I'm looking for some kind of algorithm to determine the best combination of tickets based on the period (including or excluding skipped dates) and on price.
I also think I need to consider the case where it is better and cheaper to buy a period ticket that covers more days than I actually need, depending on how many rides I take each day...
UPDATE (based on Petr's suggestion):
<?php
$tickets = array(
    array("price" => 5,  "label" => "single", "period" => null),
    array("price" => 10, "label" => "1 day",  "period" => 1),
    array("price" => 30, "label" => "3 days", "period" => 3),
    array("price" => 45, "label" => "7 days", "period" => 7)
);
$trips = 2; // rides per day
$startDate = new DateTime("2015-06-23");
$endDate = new DateTime("2015-06-30");
$endDate->modify("+1 day"); // DatePeriod excludes the end date by default, so extend by one day
$interval = DateInterval::createFromDateString('1 day');
$period = new DatePeriod($startDate, $interval, $endDate);
$cost = array();
$day = 1;
foreach ($period as $date) {
    $span = $startDate->diff($date);
    $days = ($span->format('%a') + 1); // number of days covered so far
    $ticket = getCheapestTicket($days);
    $cost[$day] = $ticket;
    $day++;
}

function getCheapestTicket($days) {
    global $tickets, $trips;
    $lowestSum = null;
    $cheapestTicket = null;
    echo "-- getCheapestTicket --" . PHP_EOL;
    echo "DAYS TO COVER: " . $days . " / TRIPS: " . $trips . PHP_EOL;
    foreach ($tickets as $ticket) {
        $price = $ticket['price'];
        $period = $ticket['period'] ? $ticket['period'] : -1;
        if ($ticket['period']) {
            // Period ticket: enough consecutive tickets to cover all days
            $units = ceil($days / $period);
            $sum = round($units * $price);
        } else {
            // Single tickets: one per trip per day
            $units = ceil($days * $trips);
            $sum = round(($days * $price) * $trips);
        }
        // Keep the cheapest ticket; on a tie, prefer the one with the longer period.
        // Checking for null first avoids reading $cheapestTicket before it is set.
        if ($lowestSum === null || $sum < $lowestSum
            || ($sum == $lowestSum && $ticket['period'] > $cheapestTicket['period'])) {
            $cheapestTicket = $ticket;
            $lowestSum = $sum;
        }
        echo "TICKET: " . $ticket['label'] . " / Units to cover days: " . $units . " / Sum: " . $sum . " / Period: " . $period . PHP_EOL;
    }
    echo "CHEAPEST TICKET: " . $cheapestTicket['label'] .
        " / PRICE PER UNIT: " . $cheapestTicket['price'] . " / SUM: " . $lowestSum . PHP_EOL . PHP_EOL;
    return $cheapestTicket;
}
I'm not sure if this is on the right track yet :)

Let's assume you have all the data stored in an array of days, where each day records the number of rides for that day.
Side note: I am going to relax the condition that a ticket lasts 24 hours and just assume each period ticket is valid for whole calendar dates (i.e., not starting at 15:00 and lasting until 14:59 the next day). This could be fixed by working in hourly time units instead of days.
Suboptimal solution:
Assign to every day the cost of buying single-ride tickets for that day. Then iterate over the array and check whether you could substitute some of them with a cheaper ticket. You finish when no more changes are made. The problem is that you might miss the optimal solution: you might assign two 3-day tickets (days 1-3 and 7-9) where a 7-day ticket (days 2-8) plus two 1-day tickets would be better.
Tree solution (data in leaves):
Arrange the days in a tree where each subtree holds the best solution for that subtree. Each subtree root can then check whether a ticket covering only part of the subtree could be useful, taking into account the values of the roots of the parts left out.
Maybe a rank tree would come in handy here.

You can solve this problem using a dynamic programming approach.
Firstly, for simplicity of the algorithm, let's for each l calculate the cheapest single ticket that can be used to cover l consecutive days. For your example this will be: 1 day $10, 2 days $30 (buy a 3-day ticket and use it only for 2 days), 3 days $30, 4-7 days $45, etc. (There will obviously be some maximal value of l beyond which there will be no such ticket.) Denote these results as cost[l].
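Here is a minimal Python sketch of the cost[l] table. It assumes the period tickets are given as (days covered, price) pairs, and the function name cheapest_cover_costs is made up for illustration:

```python
def cheapest_cover_costs(period_tickets):
    """cost[l] = price of the cheapest single period ticket covering at least l consecutive days.

    period_tickets: list of (days_covered, price) pairs, e.g. [(1, 10), (3, 30), (7, 45)].
    Only l = 1 .. longest ticket is returned; longer spans have no single ticket.
    """
    longest = max(days for days, _ in period_tickets)
    cost = {}
    for l in range(1, longest + 1):
        # Any ticket at least l days long can be used (possibly not to its full length).
        cost[l] = min(price for days, price in period_tickets if days >= l)
    return cost


print(cheapest_cover_costs([(1, 10), (3, 30), (7, 45)]))
# -> {1: 10, 2: 30, 3: 30, 4: 45, 5: 45, 6: 45, 7: 45}
```

For the prices in the question this reproduces the table described above. For example, 2 days cost $30 (a 3-day ticket used for only 2 days), and 4 to 7 days cost $45.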
Now the main dynamic programming solution. For each date i in your [begin, end] range, calculate ans[i] = the minimal cost to buy tickets to cover at least interval from begin to i.
Assuming that you have already calculated all the values before date i, the calculation for date i is simple. You will need some ticket that ends on day i. Say it covers l days. Then the price of this last ticket is cost[l], and you also have to cover the days from begin to i-l, which costs ans[i-l] (or zero if i-l is before begin). So for a given i, iterate over all possible values of l and pick the one that minimizes the total.
This gives you the O(NL) solution, where N is the number of days and L is the maximal span of a single ticket.
This all assumes that each ticket covers several full consecutive days. If it covers, say, 24 full hours (from the hour of purchase to the same hour the next day), then just calculate answers for each hour.
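Here is a sketch of the ans[i] recurrence in Python, running in O(N*L). cheapest_plan and its parameters are hypothetical names. One assumption goes beyond the answer: any day may instead be paid with single-ride tickets (rides times the single price), and a skipped date is modelled as a day with 0 rides. This is how the question's per-day ride counts enter the DP. The cost table is written out inline so the snippet runs on its own:

```python
def cheapest_plan(rides, cost, single_price):
    """DP over days: ans[i] = cheapest way to cover days 1..i.

    rides[i - 1]  -- rides needed on day i (0 for a skipped date)
    cost          -- {l: price} of the cheapest single ticket covering l consecutive days
    single_price  -- price of one single-ride ticket
    Returns (total_price, plan), plan being a list of (first_day, last_day, price).
    """
    n = len(rides)
    ans = [0] * (n + 1)        # ans[0] = 0: nothing to cover yet
    last = [None] * (n + 1)    # last[i] = (first_day, price) of the piece ending on day i
    for i in range(1, n + 1):
        # Assumption (not in the answer): day i can be paid with single tickets.
        ans[i] = ans[i - 1] + rides[i - 1] * single_price
        last[i] = (i, rides[i - 1] * single_price)
        # A ticket covering l days that ends on day i; it may start before day 1.
        for l, price in cost.items():
            start = max(i - l, 0)
            if price + ans[start] < ans[i]:
                ans[i] = price + ans[start]
                last[i] = (start + 1, price)
    # Walk back from the last day to list the pieces that were chosen.
    plan, i = [], n
    while i > 0:
        first, price = last[i]
        plan.append((first, i, price))
        i = first - 1
    return ans[n], plan[::-1]


cost = {1: 10, 2: 30, 3: 30, 4: 45, 5: 45, 6: 45, 7: 45}
total, plan = cheapest_plan([2] * 8, cost, single_price=5)
print(total, plan)  # -> 55 [(1, 7, 45), (8, 8, 10)]
```

For the 8-day, 2-rides-per-day case in the follow-up, this picks the $45 ticket for days 1-7 plus $10 for day 8. The $10 can be two singles or a 1-day ticket, since both cost the same. Spans longer than the longest ticket are handled by combining pieces, not by a single cost[l] entry.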

As for my example, based on what @Petr said, I don't really see how it solves the situation where, for example, the period covers 8 days (2 trips each day) and you end up with a result like this:
-- getCheapestTicket --
DAYS TO COVER: 8 / TRIPS: 2
TICKET: single / Units to cover days: 16 / Sum: 80 / Period: -1
TICKET: 1 day / Units to cover days: 8 / Sum: 80 / Period: 1
TICKET: 3 days / Units to cover days: 3 / Sum: 90 / Period: 3
TICKET: 7 days / Units to cover days: 2 / Sum: 90 / Period: 7
CHEAPEST TICKET: 1 day / PRICE PER UNIT: 10 / SUM: 80
whereas it should give me this combination:
7 days, $45
1 day, $10
Or is this what you mean when you said "(There will obviously be some maximal value of l beyond which there will be no such ticket.)"?
It would be really sweet to get another round of explanation of your thoughts!

Related

How to define this time complexity?

Given the problem of finding the next year that has no repeated digits, I wrote this code:
# Time complexity:
# Space complexity:
def solve(year):
    """Return the closest year >= year with no repeated digits (assumes a 4-digit year)."""
    year_str = str(year)
    # Last digit repeats an earlier digit: try the next year.
    if year_str[3] == year_str[0] or year_str[3] == year_str[1] or year_str[3] == year_str[2]:
        year = year + 1
        return solve(year)
    # Third digit repeats: no year in this decade can work, so jump to the next decade.
    if year_str[2] == year_str[0] or year_str[2] == year_str[1]:
        year = (year // 10 + 1) * 10
        return solve(year)
    # Second digit repeats the first: jump to the next century.
    if year_str[1] == year_str[0]:
        year = (year // 100 + 1) * 100
        return solve(year)
    return year


y = int(input())
print(solve(y + 1))
I'm not sure if this is the most optimal solution, but I would like to determine its time complexity.
However, I am not sure how to do it in this case.
According to my tests, when it processes the year 2222, for example, it calls the function "solve" 4 times.
But for years near a change of decade or century, e.g. 1987 (with result 2013), it calls the function "solve" 8 times.
Does anybody have any idea?
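One way to look at it: for 4-digit inputs, every call either returns or moves to a strictly later year that is still no later than the answer. The number of calls is therefore bounded by a constant, i.e. O(1). Asymptotic growth only becomes meaningful if you generalise to d-digit numbers. The harness below is an empirical sketch, not a proof. It re-implements the question's logic with a call counter (solve_counted and the counter list are hypothetical additions) and checks the answers against a brute-force search. It then reports the input with the most calls among 4-digit years whose answer is still 4 digits. Note that it counts the initial solve(y + 1) call as well:

```python
def solve_counted(year, counter):
    """Same logic as the question's solve(), plus counter[0] counting the calls."""
    counter[0] += 1
    s = str(year)
    if s[3] in (s[0], s[1], s[2]):
        return solve_counted(year + 1, counter)
    if s[2] in (s[0], s[1]):
        return solve_counted((year // 10 + 1) * 10, counter)
    if s[1] == s[0]:
        return solve_counted((year // 100 + 1) * 100, counter)
    return year


def next_distinct_year(y):
    """Return (answer, number_of_calls) for the question's print(solve(y + 1))."""
    counter = [0]
    return solve_counted(y + 1, counter), counter[0]


def brute_force(y):
    """Reference answer: step one year at a time until all digits differ."""
    year = y + 1
    while len(set(str(year))) < len(str(year)):
        year += 1
    return year


# 4-digit inputs whose answer is still 4 digits (9876 is the last such year).
inputs = range(1000, 9876)
assert all(next_distinct_year(y)[0] == brute_force(y) for y in inputs)
worst = max(inputs, key=lambda y: next_distinct_year(y)[1])
print("input with the most calls:", worst, next_distinct_year(worst))
```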

Sort Thousands of Chuck E. Cheese Tickets

I need to sort an array of n thousand random, unique positive integers into groups of consecutive integers, each of size k or larger, and then further group them into intervals of some arbitrary positive integer size j.
In other words, let's say I work at Chuck E. Cheese and we sometimes give away free tickets. I have a couple hundred thousand tickets on the floor and want to find out which employee handed out which tickets, but only for groups of consecutive ticket numbers larger than 500. Each employee has a random number from 0 to 100 assigned to them. That number corresponds to the "batch" of tickets they handed out, i.e. tickets #000000 to #001499 were handed out by employee 1, tickets #001500 to #002999 were handed out by employee 2, and so on. A large number of tickets are lost or missing. I only care about groups of consecutive ticket numbers larger than 500.
What is the fastest way for me to sort through this pile?
Edit:
As requested by @trincot, here is a worked-out example:
I have 150,000 unique tickets on the floor ranging from ticket #000000 to #200000 (i.e. missing 50,001 random tickets from the pile)
Step 1: sort each ticket from smallest to largest using an introsort algorithm.
Step 2: go through the list of tickets one by one and keep only runs of consecutive tickets longer than 500, i.e. I keep a tally of how many consecutive values I have found and only keep runs with tallies of 500 or higher. If I have tickets #409 through #909 but not #408 or #1000, I would keep that group; but if that group were missing a ticket anywhere from #409 to #909, I would throw out the group and move on.
Step 3: combine all my newly sorted groups, each of which has size 500 or larger.
Step 4: figure out what tickets belong to who by going through the final numbers one by one again, dividing each by 1500, rounding down to nearest whole number, and putting them in their respective pile where each pile represents an employee.
The end result is a set of piles telling me which employees gave out more than 500 tickets at a time, how many times they did so, and what tickets they did so with.
Sample with numbers:
where k = 3 and j = 1500. k is the minimum size of a group of consecutive integers, and j is the size of the final ticket intervals. For example, 5, 6 and 7 fall into the 0th interval of size 1500, and 5996, 5997, 5998, 5999 fall into the 3rd.
Input: [5 , 5996 , 8111 , 1000 , 1001, 5999 , 8110 , 7 , 5998 , 2500 , 1250 , 6 , 8109 , 5997]
Output:[ 0:[5, 6, 7] , 3:[5996, 5997, 5998, 5999] , 5:[8109, 8110, 8111] ]
Here is how you could do it in Python:
from collections import defaultdict


def partition(data, k, j):
    data = sorted(data)
    start = data[0]  # assuming data is not an empty list
    count = 0
    output = defaultdict(list)  # to automatically create a partition when referenced
    for value in data:
        bucket = value // j  # integer division
        # Consecutive and in the same partition? (Comparing remainders alone would
        # wrongly accept a gap that is an exact multiple of j.)
        if value == start + count and bucket == start // j:
            count += 1
            if count == k:
                # Add the k entries that we skipped so far:
                output[bucket].extend(range(start, start + count))
            elif count > k:
                output[bucket].append(value)
        else:
            start = value
            count = 1
    return dict(output)


# The example given in the question:
data = [5, 5996, 8111, 1000, 1001, 5999, 8110, 7, 5998, 2500, 1250, 6, 8109, 5997]
print(partition(data, k=3, j=1500))
# outputs {0: [5, 6, 7], 3: [5996, 5997, 5998, 5999], 5: [8109, 8110, 8111]}
Here is untested Python for the fastest approach I can think of. It returns just the first/last ticket pair for each range of interest found.
def grouped_tickets(tickets, min_group_size, partition_size):
    tickets = sorted(tickets)
    answer = {}
    min_ticket = -1  # -1 means "no group started yet"
    max_ticket = -1
    next_partition = 0
    for ticket in tickets:
        if next_partition <= ticket or max_ticket + 1 < ticket:
            # The current group ends here; record it if it is big enough.
            # (min_ticket >= 0 skips the placeholder group before the first ticket.)
            if min_ticket >= 0 and min_group_size <= max_ticket - min_ticket + 1:
                partition = min_ticket // partition_size
                answer.setdefault(partition, []).append((min_ticket, max_ticket))
            # Find where the next partition is.
            next_partition = (ticket // partition_size) * partition_size + partition_size
            min_ticket = ticket
            max_ticket = ticket
        else:
            max_ticket = ticket
    # And don't lose the last group!
    if min_ticket >= 0 and min_group_size <= max_ticket - min_ticket + 1:
        partition = min_ticket // partition_size
        answer.setdefault(partition, []).append((min_ticket, max_ticket))
    return answer
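For comparison, here is a sketch that follows the question's four steps literally. It uses the itertools.groupby idiom that, in a sorted list of unique integers, value minus position stays constant within a run of consecutive numbers. Unlike partition() above, it finds the runs first and only then buckets each ticket by ticket // j. A run that crosses a multiple of j is therefore kept whole for the size check and then split between two piles. Which behaviour you want is an assumption to check. piles_by_employee is a made-up name:

```python
from collections import defaultdict
from itertools import groupby


def piles_by_employee(tickets, k, j):
    """Sort, keep runs of >= k consecutive numbers, then put each kept ticket in pile ticket // j."""
    ordered = sorted(tickets)                                   # Step 1
    piles = defaultdict(list)
    # Step 2: within a run of consecutive integers, value - position is constant.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        if len(values) >= k:
            for value in values:                                # Steps 3 and 4
                piles[value // j].append(value)
    return dict(piles)


data = [5, 5996, 8111, 1000, 1001, 5999, 8110, 7, 5998, 2500, 1250, 6, 8109, 5997]
print(piles_by_employee(data, k=3, j=1500))
# -> {0: [5, 6, 7], 3: [5996, 5997, 5998, 5999], 5: [8109, 8110, 8111]}
```

Sorting dominates at O(n log n). The grouping pass is linear.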

Dynamic Programming and Probability

I've been staring at this problem for hours and I'm still as lost as I was at the beginning. It's been a while since I took discrete math or statistics, so I tried watching some videos on YouTube, but I couldn't find anything that would help me solve the problem in less than what seems to be exponential time. Any tips on how to approach the problem below would be very much appreciated!
A certain species of fern thrives in lush rainy regions, where it typically rains almost every day.
However, a drought is expected over the next n days, and a team of botanists is concerned about
the survival of the species through the drought. Specifically, the team is convinced of the following
hypothesis: the fern population will survive if and only if it rains on at least n/2 days during the
n-day drought. In other words, for the species to survive there must be at least as many rainy days
as non-rainy days.
Local weather experts predict that the probability that it rains on a day i ∈ {1, . . . , n} is
p_i ∈ [0, 1], and that these n random events are independent. Assuming both the botanists and
weather experts are correct, show how to compute the probability that the ferns survive the drought.
Your algorithm should run in time O(n^2).
Use an n×(n + 1) matrix C such that C[i][j] is the probability that after the ith day there have been exactly j rainy days (i runs from 1 to n, j runs from 0 to n). Initialize:
C[1][0] = 1 - p[1]
C[1][1] = p[1]
C[1][j] = 0 for j > 1
Now loop over the days and set the values of the matrix like this:
C[i][0] = (1 - p[i]) * C[i-1][0]
C[i][j] = (1 - p[i]) * C[i-1][j] + p[i] * C[i - 1][j - 1] for j > 0
Finally, sum the values from C[n][ceil(n/2)] to C[n][n] to get the probability that the ferns survive.
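Here is a Python sketch of this recurrence that keeps only one row of C at a time, giving O(n^2) time and O(n) space. survival_probability is an illustrative name, and the threshold ceil(n/2) is a reading of "at least n/2 days". A brute-force enumeration is included only to cross-check small inputs:

```python
import math
from itertools import product


def survival_probability(p):
    """row[j] = probability of exactly j rainy days among the days processed so far (C[i][j])."""
    n = len(p)
    row = [1.0] + [0.0] * n            # before day 1: zero rainy days with certainty
    for i, p_i in enumerate(p, start=1):
        # Go downwards so row[j - 1] still holds the previous day's value.
        for j in range(i, 0, -1):
            row[j] = (1 - p_i) * row[j] + p_i * row[j - 1]
        row[0] *= 1 - p_i
    need = math.ceil(n / 2)            # at least n/2 rainy days
    return sum(row[need:])


def brute_force_probability(p):
    """Enumerate all 2**n rain/no-rain outcomes (only for small n)."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(p)):
        prob = 1.0
        for rained, p_i in zip(outcome, p):
            prob *= p_i if rained else 1 - p_i
        if sum(outcome) >= len(p) / 2:
            total += prob
    return total


print(survival_probability([0.2, 0.4, 0.6, 0.8]), brute_force_probability([0.2, 0.4, 0.6, 0.8]))
```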
Dynamic programming problems can be solved in a top-down or bottom-up fashion.
The bottom-up version has already been described. To do the top-down version, write a recursive function, then add a caching layer so you don't recompute any results you have already computed. In pseudo-code:
cache = {}
function whatever(args)
    if args not in cache
        compute result
        cache[args] = result
    return cache[args]
This process is called "memoization" and many languages have ways of automatically memoizing things.
Here is a Python implementation of this specific example:
def prob_survival(daily_probabilities):
    days = len(daily_probabilities)
    days_needed = days / 2  # survive with at least n/2 rainy days
    # An inner function does the calculation; results are cached by (day, rained).
    cached_odds = {}

    def helper(day, rained):
        if days_needed <= rained:
            return 1.0
        elif days <= day:
            return 0.0
        elif (day, rained) not in cached_odds:
            p = daily_probabilities[day]
            p_a = p * helper(day + 1, rained + 1)    # it rains on this day
            p_b = (1 - p) * helper(day + 1, rained)  # it stays dry
            cached_odds[(day, rained)] = p_a + p_b
        return cached_odds[(day, rained)]

    return helper(0, 0)
And then you would call it as follows:
print(prob_survival([0.2, 0.4, 0.6, 0.8]))
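In Python the caching layer can also be added automatically with functools.lru_cache (or functools.cache on Python 3.9+). Here is a sketch of the same top-down recursion written that way, with prob_survival_memo as an illustrative name:

```python
import math
from functools import lru_cache


def prob_survival_memo(p):
    n = len(p)
    need = math.ceil(n / 2)

    @lru_cache(maxsize=None)  # results are cached per (day, rained) argument pair
    def f(day, rained):
        if rained >= need:
            return 1.0
        if day == n:
            return 0.0
        return p[day] * f(day + 1, rained + 1) + (1 - p[day]) * f(day + 1, rained)

    return f(0, 0)


print(prob_survival_memo([0.2, 0.4, 0.6, 0.8]))
```

There are O(n^2) distinct (day, rained) pairs and each does constant work, so this matches the O(n^2) bound. Very large n may hit Python's default recursion limit, which the bottom-up version avoids.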

What time system is this?

I'm trying to understand a program that handles time in a system I don't know yet. Hopefully you can tell me what system it is, if any.
One of the values is 170000000 and it represents the 26th of April 2037. Another example is 164632577, which represents the 20th of December 2022.
I tested both with an epoch (Unix timestamp) converter but I get completely different dates, so it's definitely not epoch time. Any clue?
Thank you.
We assume that the formula for converting from a date to those strange time units has the following form:
f(x) = m*x + b
where x is in days and f is in strange time units:
f(2037*365.2425 + 31 + 28 + 31 + 26) = 170000000
f(2022*365.2425 + 365.2425 - (31 + 1 - 20)) = 164632577
Because we have two data points, we can write two equations:
I : f1 = m * x1 + b
II: f2 = m * x2 + b
Now we're looking for m and b.
We solve as follows:
I => III: b = f1 - m*x1
III into II: f2 = m*x2 + f1 - m*x1 => f2 - f1 = m(x2 - x1) => m = (f2 - f1) / (x2 - x1)
which gives:
m = 1024.04 units/day (most likely exactly 1024, because that's 2^10)
with b = f1 - m*x1
b = -591973731.84 (??)
so you get,
for converting from days since year 0 to those strange time units:
f(x) = 1024 * x - 591973731.84
where x is in days, i.e. approximately year * 365.2425 + (month - 1) * 30 + day
Testing it reveals that
f(Jan 1st 2038 = 2038*365.2425) = 1024*2038*365.2425 - 591973731.84 = 170255224.3, which is about 249 days' worth of units (255224.3 / 1024) after 170000000 (April 26th 2037), i.e. right around January 1st 2038, so it works.
Strangely, the zero point of those strange time units is around the year 1582 (the solution x of f(x) = 0).
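The same two-point fit can be done with exact calendar dates instead of year * 365.2425, using Python's date.toordinal() (proleptic Gregorian day numbers). This sketch assumes both values refer to midnight on the given dates, and the helper names are illustrative:

```python
from datetime import date, datetime, timedelta

# The two samples from the question: (calendar date, raw value).
SAMPLES = [(date(2022, 12, 20), 164_632_577), (date(2037, 4, 26), 170_000_000)]


def fit_slope(samples):
    """Units per day: m = (f2 - f1) / (x2 - x1), with x = date.toordinal()."""
    (d1, v1), (d2, v2) = samples
    return (v2 - v1) / (d2.toordinal() - d1.toordinal())


def zero_point(units_per_day, ref_date, ref_value):
    """Moment where the raw value would be 0, assuming ref_value is at midnight of ref_date."""
    ref_time = datetime(ref_date.year, ref_date.month, ref_date.day)
    return ref_time - timedelta(days=ref_value / units_per_day)


m = fit_slope(SAMPLES)
print(f"fitted units per day: {m:.4f}")
print("zero point with fitted slope:", zero_point(m, *SAMPLES[0]))
print("zero point with exactly 1024:", zero_point(1024, *SAMPLES[0]))
```

With exact day counts, the fitted slope comes out slightly above 1024. The two choices of slope put the zero point roughly three weeks apart in late 1582, which is why the unknown time of day behind each value matters.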
170000000 seems very round. Do you know the exact seconds within the game that your numbers represent?
It should be a linear system, and you know that the difference (170000000 - 164632577) = 5367423 corresponds to the difference between your dates (2037-04-26 - 2022-12-20) = 5241 days.
This means that one unit is (5241 / 5367423) = 0.00097644623873... days.
Counting back from 164632577 to zero takes us back (164632577 * 0.00097644623873...) = 160754.86 days, from 2022-12-20 to 1582-11-02.
The same calculation for 170000000 takes us back (170000000 * 0.00097644623873...) = 165995.86 days, from 2037-04-26 to 1582-11-02. Eureka!
So, you have a system where timeFor($value) = [1582-11-02] + [0.00097644623873... * $value days].
Issues:
There are several rounding issues with these numbers. Your dates most probably include seconds, but we've been calculating in whole days.
Going this far back in time, calendar issues appear. These include, but are not limited to, dates that don't exist in your calendar.
Something to consider: October 15th, 1582 (1582-10-15) is the start of the Gregorian calendar. This is probably the real start date for your data.
Edit: I previously wrote that the multiplier should probably be 0.001, but as Daniel noted in another answer, it's actually 1/1024 = 0.0009765625.
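Here is the edit expressed as code: a converter that assumes exactly 1024 units per day and anchors at one known data point rather than at a derived zero point. Treating the anchor as midnight is an assumption, since the question gives only dates. The names below are illustrative:

```python
from datetime import datetime, timedelta

UNITS_PER_DAY = 1024  # inferred from two samples, not from any documented spec
# Assumption: 164632577 corresponds to 2022-12-20 00:00 (time of day unknown).
ANCHOR_VALUE = 164_632_577
ANCHOR_TIME = datetime(2022, 12, 20)


def value_to_datetime(value):
    """Raw value -> datetime, at a constant 1024 units per day."""
    return ANCHOR_TIME + timedelta(days=(value - ANCHOR_VALUE) / UNITS_PER_DAY)


def datetime_to_value(moment):
    """datetime -> raw value, rounded to the nearest unit."""
    days = (moment - ANCHOR_TIME) / timedelta(days=1)
    return ANCHOR_VALUE + round(days * UNITS_PER_DAY)


print(value_to_datetime(170_000_000))  # lands on 2037-04-26, the date from the question
```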

how to get a value when looping in FORTRAN

Hello guys! There is a population of, say, 120 million, which increases by 8% every year. I want a DO loop running from 1990 to 2020 that reports the year in which the population exceeds 125 million. Pseudocode or Fortran code would be appreciated.
This is not a problem for which loops are either necessary or appropriate. The simple equation
num_years = log(125.0/120.0)/log(1.08)
(which evaluates to approximately 0.53) is all that is necessary. This is a straightforward rewriting of the formula for compound interest calculations, that is
compound_amt = initial_amt * (1+interest_rate)**num_years
with, in this case, initial_amt = 120*10**6, compound_amt = 125*10**6 and interest_rate = 8%.
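The closed form can be checked numerically with a short Python sketch. The variable names follow the compound-interest formula above:

```python
import math

initial_amt = 120e6
compound_amt = 125e6
interest_rate = 0.08

# Solve compound_amt = initial_amt * (1 + interest_rate) ** num_years for num_years.
num_years = math.log(compound_amt / initial_amt) / math.log(1 + interest_rate)
print(f"{num_years:.2f}")  # -> 0.53

# With growth applied in whole yearly steps, the number of steps needed:
print(math.ceil(num_years))  # -> 1
```

Whether that single step is labelled 1990 or 1991 depends on the start-of-year versus end-of-year convention.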
It's easy to find tutorials on loops in Fortran that solve your problem. See here for example. But generally, you want something like this:
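program population_growth
  implicit none
  real :: pop                          ! "sum" is an intrinsic name, so use pop instead
  integer :: i
  integer, parameter :: startyear = 1990
  pop = 120e6
  do i = 1, 30
    pop = pop + pop*8./100.
    if (pop > 125e6) then
      write(*,*) "Year ", i+startyear, " population exceeded ", pop
      exit                             ! stop at the first year the threshold is crossed
    end if
  end do
end program population_growth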
population = 120000.0
year = 1990
loop:
    population = population + (population * 0.08)
    year = year + 1
    if (population > 125000.0) go to print_done
    if (year > 2020) go to print_not_found
    go to loop
print_done:
    print "The population was " population " in the year " year
    stop
print_not_found:
    print "Searched to year " year " and the population only reached " population
    stop
Note that there's an issue as to whether the year should be incremented before or after the population is checked. This depends on whether you want the population at the beginning of the year or the end of the year (assuming the initial value was at the beginning of the year).
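The note above can be made concrete with a small Python sketch of the same loop, where a flag selects which convention labels the result. first_year_exceeding and its parameters are hypothetical names:

```python
def first_year_exceeding(population, rate, threshold, start_year, end_year,
                         label_by_end_of_year=True):
    """Grow the population once per year until it exceeds threshold.

    The initial population is taken as the value at the start of start_year.
    label_by_end_of_year=True  -> report the year during which the threshold is crossed
    label_by_end_of_year=False -> report the following year (population at its start),
                                  which is what incrementing `year` before the check does
    Returns (year, population), or None if the threshold isn't reached by end_year.
    """
    year = start_year
    while year <= end_year:
        population *= 1 + rate
        if population > threshold:
            return (year if label_by_end_of_year else year + 1), population
        year += 1
    return None


print(first_year_exceeding(120e6, 0.08, 125e6, 1990, 2020))
print(first_year_exceeding(120e6, 0.08, 125e6, 1990, 2020, label_by_end_of_year=False))
```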
