Evenly schedule timed items into fixed size containers - algorithm

I got a task at work to evenly schedule commercial timed items into pre-defined commercial breaks (containers).
Each campaign has a set of commercials with or without spreading order. I need to allow users to chose multiple campaigns and distribute all the commercials to best fit the breaks within a time window.
Example of Campaign A:
Item | Duration | # of times to schedule | order
A1 15 sec 2 1
A2 25 sec 2 2
A3 30 sec 2 3
Required outcome:
each item should appear only once in a break, no repeating.
if there is specific order try to best fit by keeping the order. If
no order shuffle it.
At the end of the process the breaks should contain evenly amount of
commercial time.
Ideal spread would fully fill all desired campaigns into the breaks.
For example: Campaign {Item,Duration,#ofTimes,Order}
Campaign A which has set {A1,15,2,1},{A2,25,2,2},{A3,10,1,3}
Campaign B which has set {B1,20,2,2},{B2,35,3,1},
Campaign C which has set {C1,10,1,1},{C2,15,2,3 sec},{C3,15,1,2 sec}
,{C4,40,1,4}
A client will choose to schedule those campaigns in a specific date that hold 5 breaks of 60 second each.
A good outcome would result in:
Container 1: {A1,15}{B2,35}{C1,10} total of 60 sec
Container 2: {C3,15}{A2,25}{B1,20} total of 60 sec
Container 3: {A3,10}{C2,15}{B2,35} total of 60 sec
Container 4: {C4,40}{B1,20} total of 60 sec
Container 5: {C2,15}{A3,10}{B3,35} total of 60 sec
Of course it's rarely that all will fit so perfectly in real-life examples.
There are so many combinations with large amount of items and I'm not sure how to go about it. The order of items inside a break needs to be dynamically calculated so that the end result would best fit all the items into the breaks.
If the architecture is poor and someone has a better idea (like giving priority to items over order and schedule based on priority or such I'll be glad to hear).

It seems like simulated annealing might be a good way to approach this problem. Just incorporate your constraints of: keeping order, even spreading and fitting into 60sec frame into the scoring function. Your random neighbor function might just swap 2 items with each other | move an item to a different frame.

Related

Best data structure to find price difference between two time frames

I am working on a project where my task is to find the % price change given 2 time frames for 100+ stocks in an efficient way.
The time frames are pre defined and can only be 5 mins, 10 mins, 30 mins, 1 hour, 4 hour, 12 hour, 24 hours.
Given a time frame, I need a way to efficiently figure out the % price change of all the stocks that I am tracking.
As per the current implementation, I am getting price data for those stocks every second and dumping the data to a price table.
I have another cron job which updates the % change of the stock based on the values in the price table every few seconds.
The solution is kind of in working state but its not efficient. Is there any data structure/ algorithm that I can use to find the % change efficiently?

Distribute user active time blocks subject to total constraint

I am building an agent-based model for product usage. I am trying to develop a function to decide whether the user is using the product at a given time, while incorporating randomness.
So, say we know the user spends a total of 1 hour per day using the product, and we know the average distribution of this time (e.g., most used at 6-8pm).
How can I generate a set of usage/non-usage times (i.e., during each 10 minute block is the user active or not) while ensuring that at the end of the day the total active time sums to one hour.
In most cases I would just run the distributor without concern for the total, and then at the end normalize by making it proportional to the total target time so the total was 1 hour. However, I can't do that because time blocks must be 10 minutes. I think this is a different question because I'm really not computing time ranges, I'm computing booleans to associate with different 10 minute time blocks (e.g., the user was/was not active during a given block).
Is there a standard way to do this?
I did some more thinking and figured it out, if anyone else is looking at this.
The approach to take is this: You know the allowed number n of 10-minute time blocks for a given agent.
Iterate n times, and on each iteration select a time block out of the day subject to your activity distribution function.
Main point is to iterate over the number of time blocks you want to place, not over the entire day.

Algorithm for aggregating stock chart datapoints out of many DB entries

I have a database with stock values in a table, for example:
id - unique id for this entry
stockId - ticker symbol of the stock
value - price of the stock
timestamp - timestamp of that price
I would like to create separate arrays for a timeframe of 24 hour, 7 days and 1 month from my database entries, each array containing datapoints for a stock chart.
For some stockIds, I have just a few data points per hour, for others it could be hundreds or thousands.
My question:
What is a good algorithm to "aggregate" the possibly many datapoints into a few - for example, for the 24 hours chart I would like to have at a maximum 10 datapoints per hour. How do I handle exceptionally high / low values?
What is the common approach in regards to stock charts?
Thank you for reading!
Some options: (assuming 10 points per hour, i.e. one roughly every 6 minutes)
For every 6 minute period, pick the data point closest to the centre of the period
For every 6 minute period, take the average of the points over that period
For an hour period, find the maximum and minimum for each 4 minutes period and pick the 5 maximum and 5 minimum in these respective sets (4 minutes is somewhat randomly chosen).
I originally thought to pick the 5 minimum points and the 5 maximum points such that each maximum point is at least 8 minutes apart, and similarly for minimum points.
The 8 minutes here is so we don't have all the points stacked up on each other. Why 8 minutes? At an even distribution, 60/5 = 12 minutes, so just moving a bit away from that gives us 8 minutes.
But, in terms of the implementation, the 4 minutes approach will be much simpler and should give similar results.
You'll have to see which one gives you the most desirable results. The last one is likely to give a decent indication of variation across the period, whereas the second one is likely to have a more stable graph. The first one can be a bit more erratic.

Algorithm to decompose set of timestamps into subsets with even temporal spacing

I have a dataset containing > 100,000 records where each record has a timestamp.
This dataset has been aggregated from several "controller" nodes which each collect their data from a set of children nodes. Each controller collects these records periodically, (e.g. once every 5 minutes or once every 10 minutes), and it is the controller that applies the timestamp to the records.
E.g:
Controller One might have 20 records timestamped at time t, 23 records timestamped at time t + 5 minutes, 33 records at time t + 10 minutes.
Controller Two might have 30 records timestamped at time (t + 2 minutes) + 10 minutes, 32 records timestamped at time (t + 2 minutes) + 20 minutes, 41 records timestamped at time (t + 2 minutes) + 30 minutes etcetera.
Assume now that the only information you have is the set of all timestamps and a count of how many records appeared at each timestamp. That is to say, you don't know i) which sets of records were produced by which controller, ii) the collection interval of each controller or ii) the total number of controllers. Is there an algorithm which can decompose the set of all timestamps into individual subsets such that the variance in difference between consecutive (ordered) elements of each given subset is very close to 0, while adding any element from one subset i to another subset j would increase this variance? Keep in mind, for this dataset, a single controller's "periodicity" could fluctuate by +/- a few seconds because of CPU timing/network latency etc.
My ultimate objective here is to establish a) how many controllers there are and b) the sampling interval of each controller. So far I've been thinking about the problem in terms of periodic functions, so perhaps there are some decomposition methods from that area that could be useful.
The other point to make is that I don't need to know which controller each record came from, I just need to know the sampling interval of each controller. So e.g. if there were two controllers that both started sampling at time u, and one sampled at 5-minute intervals and the other at 50-minute intervals, it would be hard to separate the two at the 50-minute mark because 5 is a factor of 50. This doesn't matter, so long as I can garner enough information to work out the intervals of each controller despite these occasional overlaps.
One basic approach would be to perform an FFT decomposition (or, if you're feeling fancy, a periodogram) of the dataset and look for peaks in the resulting spectrum. This will give you a crude approximation of the periods of the controllers, and may even give you an estimate of their number (and by looking at the height of the peaks, it can tell you how many records were logged).

Google transit is too idealistic. How would you change that?

Suppose you want to get from point A to point B. You use Google Transit directions, and it tells you:
Route 1:
1. Wait 5 minutes
2. Walk from point A to Bus stop 1 for 8 minutes
3. Take bus 69 till stop 2 (15 minues)
4. Wait 2 minutes
5. Take bus 6969 till stop 3(12 minutes)
6. Walk 7 minutes from stop 3 till point B for 3 minutes.
Total time = 5 wait + 40 minutes.
Route 2:
1. Wait 10 minutes
2. Walk from point A to Bus stop I for 13 minutes
3. Take bus 96 till stop II (10 minues)
4. Wait 17 minutes
5. Take bus 9696 till stop 3(12 minutes)
6. Walk 7 minutes from stop 3 till point B for 8 minutes.
Total time = 10 wait + 50 minutes.
All in all Route 1 looks way better. However, what really happens in practice is that bus 69 is 3 minutes behind due to traffic, and I end up missing bus 6969. The next bus 6969 comes at least 30 minutes later, which amounts to 5 wait + 70 minutes (including 30 m wait in the cold or heat). Would not it be nice if Google actually advertised this possibility? My question now is: what is the better algorithm for displaying the top 3 routes, given uncertainty in the schedule?
Thanks!
How about adding weightings that express a level of uncertainty for different types of journey elements.
Bus services in Dublin City are notoriously untimely, you could add a 40% margin of error to anything to do with Dublin Bus schedule, giving a best & worst case scenario. you could also factor in the chronic traffic delays at rush hours. Then a user could see that they may have a 20% or 80% chance of actually making a connection.
You could sort "best" journeys by the "most probably correct" factor, and include this data in the results shown to the user.
My two cents :)
For the UK rail system, each interchange node has an associated 'minimum transfer time to allow'. The interface to the route planner here then has an Advanced option allowing the user to either accept the default, or add half hour increments.
In your example, setting a' minimum transfer time to allow' of say 10 minutes at step 2 would prevent Route 1 as shown being suggested. Of course, this means that the minimum possible journey time is increased, but that's the trade off.
If you take uncertainty into account then there is no longer a "best route", but instead there can be a "best strategy" that minimizes the total time in transit; however, it can't be represented as a linear sequence of instructions but is more of the form of a general plan, i.e. "go to bus station X, wait until 10:00 for bus Y, if it does not arrive walk to station Z..." This would be notoriously difficult to present to the user (in addition of being computationally expensive to produce).
For a fixed sequence of instructions it is possible to calculate the probability that it actually works out; but what would be the level of certainty users want to accept? Would you be content with, say, 80% success rate? When you then miss one of your connections the house of cards falls down in the worst case, e.g. if you miss a train that leaves every second hour.
I wrote many years a go a similar program to calculate long-distance bus journeys in Finland, and I just reported the transfer times assuming every bus was on schedule. Then basically every plan with less than 15 minutes transfer time or so was disregarded because they were too risky (there were sometimes only one or two long-distance buses per day at a given route).
Empirically. Record the actual arrival times vs scheduled arrival times, and compute the mean and standard deviation for each. When considering possible routes, calculate the probability that a given leg will arrive late enough to make you miss the next leg, and make the average wait time P(on time)*T(first bus) + (1-P(on time))*T(second bus). This gets more complicated if you have to consider multiple legs, each of which could be late independently, and multiple possible next legs you could miss, but the general principle holds.
Catastrophic failure should be the first check.
This is especially important when you are trying to connect to that last bus of the day which is a critical part of the route. The rider needs to know that is what is happening so he doesn't get too distracted and knows the risk.
After that it could evaluate worst-case single misses.
And then, if you really wanna get fancy, take a look at the crime stats for the neighborhood or transit station where the waiting point is.

Resources