Best approach: transfer daily values from one year to another - algorithm

I will try to explain what I want to accomplish. I am looking for an algorithm or approach, not the actual implementation in my specific system.
I have a table with actuals (incoming customer requests) on a daily basis. These actuals need to be "copied" into the next year, where they will be used as a basis for planning the amount of requests in the future.
The smallest timespan for planning, on a technical basis, is a "period", which consists of at least one day. A new period always starts when a new week or a new month begins. This means that if a week spans both May and June, it is split into two periods.
Here's an example:
2010-05-24 - 2010-05-30 Week 21 | Period_Id 123
2010-05-31 - 2010-05-31 Week 22 | Period_Id 124
2010-06-01 - 2010-06-06 Week 22 | Period_Id 125
We did this to reduce the amount of data, because we have a few thousand items that each have 365 daily values. For planning, this is reduced to "a few thousand x 65" (or whatever the number of periods per year is). I can aggregate a month or a week by combining all periods that belong to it. The important thing is that I could still use daily values, find the corresponding period and add them there if necessary.
What I need is an approach for aggregating the actuals for every (working) day, week or month into next year's equivalent period. My requirements are not fixed here. The actuals have a certain distribution, because there are certain deadlines and habits that are reflected in the data. I would like to preserve this as far as possible, but planning is never completely accurate, so I can compromise here.

Don't know if this is what you're looking for, but this is a strategy for calculating the forecasts using flexible periods:
First define a mapping for each day in next year to the corresponding day in this year. Then when you need a forecast for period x you take all days in that period and sum the actuals for the matching days.
With this you can precalculate every week/month, and simply create new forecasts if the contents of the periods change.
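A minimal sketch of this mapping strategy, assuming each day of next year is mapped to the day exactly 364 days (52 weeks) earlier so the weekday is preserved, and that the actuals live in a dict keyed by date. The names `map_to_last_year` and `forecast_for_period` are hypothetical.

```python
from __future__ import annotations
from datetime import date, timedelta

def map_to_last_year(day: date) -> date:
    """Assumption: map a day next year to the same weekday 52 weeks (364 days) earlier."""
    return day - timedelta(days=364)

def forecast_for_period(start: date, end: date, actuals: dict[date, float]) -> float:
    """Sum the mapped actuals for every day in [start, end], inclusive.

    Days with no recorded actual count as 0.
    """
    total = 0.0
    day = start
    while day <= end:
        total += actuals.get(map_to_last_year(day), 0.0)
        day += timedelta(days=1)
    return total

# A two-day period in 2011, filled from the matching 2010 days
actuals_2010 = {date(2010, 5, 31): 10.0, date(2010, 6, 1): 12.0}
print(forecast_for_period(date(2011, 5, 30), date(2011, 5, 31), actuals_2010))  # 22.0
```

Because any period is just a set of days, the same function works whether the period is a week, a month, or one of the split periods from the question.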

Map weeks to weeks. The first full week of this year to the first full week of the next. Don't worry about "periods" and aggregation; they are irrelevant.
Where a missing holiday leaves a hole in the data, just take the values for the same day of the previous week or the next week, and do the same at the beginning/end of the year.
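A sketch of the week-to-week mapping with hole filling. It assumes the "first full week" is the first Monday-to-Sunday week lying entirely in the year (so it starts on the year's first Monday) and that holes are dates missing from the `actuals` dict; all function names are illustrative.

```python
from __future__ import annotations
from datetime import date, timedelta

def first_monday(year: int) -> date:
    """Assumption: the first full week starts on the year's first Monday."""
    jan1 = date(year, 1, 1)
    return jan1 + timedelta(days=(7 - jan1.weekday()) % 7)

def source_day(target: date) -> date:
    """Map a day of next year to the same position in the aligned weeks of the previous year."""
    offset = target - first_monday(target.year)
    return first_monday(target.year - 1) + offset

def value_for(target: date, actuals: dict[date, float]) -> float | None:
    """Actual for the mapped day; if it is a hole in the data, fall back to the
    same weekday one week earlier, then one week later (this also covers days
    that map outside the previous year at the beginning/end of the year)."""
    src = source_day(target)
    for candidate in (src, src - timedelta(weeks=1), src + timedelta(weeks=1)):
        if candidate in actuals:
            return actuals[candidate]
    return None
```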
Now for each day of the week, combine the results for the year and look for events more than, say, two standard deviations from the mean (if you don't know what that means then skip this step), and look for correlations with known events like holidays. If a holiday doesn't show an effect in this test then ignore it. If you find an effect, shift it to compensate for the different date next year. Don't worry about higher-order effects, you don't have enough data to pin them down.
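A sketch of the outlier step, assuming the daily actuals are in a dict keyed by date: values are grouped by weekday, and a day is flagged when it lies more than two sample standard deviations from that weekday's mean. It uses only `statistics.mean` and `statistics.stdev`; correlating the flagged days with holidays is left to inspection.

```python
from __future__ import annotations
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

def weekday_outliers(actuals: dict[date, float], k: float = 2.0) -> list[date]:
    """Return days whose value is more than k standard deviations away
    from the mean of all days sharing the same weekday."""
    by_weekday: dict[int, list[date]] = defaultdict(list)
    for day in actuals:
        by_weekday[day.weekday()].append(day)

    flagged = []
    for days in by_weekday.values():
        if len(days) < 2:  # stdev needs at least two values
            continue
        values = [actuals[d] for d in days]
        mu, sigma = mean(values), stdev(values)
        flagged.extend(d for d in days if sigma > 0 and abs(actuals[d] - mu) > k * sigma)
    return sorted(flagged)
```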
Now draw in periods wherever you like and aggregate all you want.
Don't make any promises about the accuracy of these predictions; there's no way to know it. Don't worry about whether this is the best possible way; it isn't, but it's as good as any you're likely to find. You can spend as much more time and effort fine-tuning this as you wish; it might raise expectations, but it's not likely to make the results much more accurate, and it's about as likely to make them worse.

There is no a priori way to answer that question. You have to look at your data and use the results to decide what the important parameters are (day of week, week number, month, season, temperature outside?).
For example, if many of your customers are Jewish or Muslim, then the Gregorian calendar, ISO week numbers and all that won't help you much, because Jewish and Muslim holidays (and therefore user behaviour) are determined by other calendars.
Another example: trying to predict iPhone search volume from last year's searches doesn't sound like a good idea. The important timescales seem to be much longer than a year (the technology becoming mainstream over the years) and much shorter than a year (specific events that affect us for days or weeks).

Related

Simple algorithm to alternate days

I need to alternate between 2 tasks every day, and I need a simple algorithm to know which task I need to do.
I need to be able to run this algorithm in my head, using simple general knowledge (like day of week, day of month, etc.), and it must not rely on which task was done the previous day (because I have a crappy memory).
I have tried checking the parity of combinations of day of week / day of month / month number, etc., but couldn't find a suitable system: day-of-week numbering has two consecutive odd numbers, and so does day of month every so often.
I am afraid that this is impossible: if you can't remember what you did the day before, any other procedure will require more mnemonic effort.
remember what you did on January first (or another date),
remember the parities of the cumulated months: oeoeoeooeoe or ooeoeoeeoeo for a leap year,
add the cumulated parity of the month before* to the parity of the day,
add that to the parity of the first task.
E.g. if A on January 1st 2022, then on March 17, 2022: e + o = o gives B.
*In January, use even.
You can also state the month parity rule as: until August inclusive, use the co-parity of the month number; then use the parity. But for a leap year, change that parity after February (excluded).
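To double-check any such rule, the task for a date can be computed directly from the parity of the number of days elapsed since January 1. A minimal sketch, assuming task A was done on January 1 of the same year; `task_for` is a hypothetical name.

```python
from datetime import date

def task_for(day: date, first_task: str = "A", other_task: str = "B") -> str:
    """Assumption: first_task was done on January 1 of day.year.

    Even number of days elapsed since January 1 -> first_task, odd -> other_task.
    """
    elapsed = (day - date(day.year, 1, 1)).days
    return first_task if elapsed % 2 == 0 else other_task

print(task_for(date(2022, 3, 17)))  # B, as in the example above
```

Because a common year has an odd number of days, the January 1 task flips from one common year to the next (and stays the same after a leap year), which is why the scheme asks you to remember what you did on January 1.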
I need to be able to run this algorithm in my head
So you don't need help from computer science. You can use the human cognitive ability to map one thing to another.
Note: This need not make sense to everybody though, if you are thinking out of the box.
Map task 1 as God's day.
Map task 2 as Devil's day in your brain.
This should be simple just like day and night.
Now, remember that devil's evil karma is always burnt by God the next day and that devil never learns his lesson. So this way, alternating would be easy.
Friends Episode snippet on Youtube
Just count the number of days between your date and a given "zero" date, then use the parity.
Take the number of seconds (or milliseconds, or whatever) since the epoch (the common zero for date and time) and divide it (integer division) by 60x60x24 (or 1000x60x60x24, or whatever is appropriate); you then get the number of days since the epoch.
----EDIT----
Example: I got 1653910695 seconds since the epoch (at the time of my experiment). Dividing it by 60x60x24 gives 19142 days. Tomorrow it will give 19143, etc.
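A sketch of the epoch-based approach. It uses `time.time()` (seconds since the epoch, UTC) and integer division, so the day boundary is midnight UTC rather than local midnight; mapping even day numbers to Task 1 is an arbitrary choice.

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24

def days_since_epoch(seconds: float) -> int:
    """Whole days elapsed since 1970-01-01 00:00 UTC."""
    return int(seconds // SECONDS_PER_DAY)

def task_for_day(day_number: int) -> str:
    """Even day numbers -> Task 1, odd -> Task 2, so consecutive days always alternate."""
    return "Task 1" if day_number % 2 == 0 else "Task 2"

print(days_since_epoch(1653910695))  # 19142, as in the example
print(task_for_day(days_since_epoch(time.time())))
```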
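<?php
// Day of the month without leading zeros (1-31), today and yesterday
$day = date('j');
$previous_day = date('j', strtotime("-1 day"));
// Even days get Task 1; an odd day that follows another odd day
// (the 1st after a 31st) also gets Task 1.
// Caveat: after a 31-day month this gives Task 1 on both the 1st and the 2nd.
if ($day % 2 == 0 || $previous_day % 2 != 0) {
    echo "Task 1";
} else {
    echo "Task 2";
}
?>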

What's the language-agnostic algorithm for finding the "Same day of week last year"

I want to find the "same day of the week last year". I'm sure this question is going to have a litany of subtleties that I've not yet begun to think about, but I believe it is likely a common one.
Here are a few use cases where someone might want to use this algorithm:
Example 1
I'm a manager at a Walmart. I want to find out how many kitten mittens I sold on the same day last year. I know that kitten mitten purchases are closely related to day of week and week of year. Thus I want to know how many kitten mittens I need to stock for tomorrow, "the first Tuesday in January".
Example 2
I'm a nurse at a hospital. I want to determine how many patients are coming in each day next week so I can better align staff with bed demand. I know that there are strong trends with how many patients arrive at the hospital by day of week, and I want to see how many patients we had "the third Friday of November" last year.
I feel like this is a standard problem people must have come across. Is there a best approach to this challenge? I can imagine issues where the current year has five Fridays in, say, November and last year had only four, so you would not be able to report in that manner.
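For the "third Friday of November" style of lookup, one possible sketch computes the n-th occurrence of a weekday in a month and returns `None` when it does not exist, which covers the five-Fridays case. The function name is hypothetical; weekdays follow Python's `date.weekday()` numbering (Monday = 0).

```python
from __future__ import annotations
import calendar
from datetime import date

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date | None:
    """Return the n-th (1-based) given weekday in the month, or None if absent."""
    first = date(year, month, 1)
    day = 1 + (weekday - first.weekday()) % 7 + 7 * (n - 1)
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, day) if day <= days_in_month else None

print(nth_weekday(2021, 11, calendar.FRIDAY, 3))  # 2021-11-19
print(nth_weekday(2021, 11, calendar.FRIDAY, 5))  # None
```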
What is a language-independent (although if you're curious, I would be implementing this in M) approach to this algorithm?
There is a function that returns the weekday number. In Cache it's $zd(date,10); GT.M should have a similar one. All you need to do is correct your date using this function:
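set currentDate=+$h ; $h is "days,seconds", unary + keeps only the day count
set currentWeekDay=$zd(currentDate,10) ; weekday number, assumed 0 = Sunday ... 6 = Saturday
set dateAboutYearAgo=currentDate-365
set weekDayAboutYearAgo=$zd(dateAboutYearAgo,10)
set sameWeekDayAboutYearAgo=dateAboutYearAgo-weekDayAboutYearAgo+currentWeekDay ; same weekday within that week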
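For readers not using M, the same logic can be sketched in Python. Here `(weekday() + 1) % 7` gives a Sunday = 0 numbering, assumed to match `$zd(date,10)`; both function names are illustrative.

```python
from datetime import date, timedelta

def same_weekday_aligned(current: date) -> date:
    """Port of the M snippet: go back 365 days, then move to the current
    weekday within that Sunday-to-Saturday week (assumption: Sunday = 0)."""
    about_year_ago = current - timedelta(days=365)
    current_wd = (current.weekday() + 1) % 7  # Sunday = 0 ... Saturday = 6
    ago_wd = (about_year_ago.weekday() + 1) % 7
    return about_year_ago - timedelta(days=ago_wd) + timedelta(days=current_wd)

def same_weekday_364(current: date) -> date:
    """Alternative: exactly 52 weeks back, which always keeps the weekday."""
    return current - timedelta(days=364)

print(same_weekday_aligned(date(2021, 11, 19)))  # 2020-11-20
print(same_weekday_aligned(date(2021, 11, 21)))  # 2020-11-15
```

Because 365 ≡ 1 (mod 7), the ported snippet lands exactly 364 days back on every weekday except Sunday, where it goes back 371 days; subtracting a fixed 364 days avoids that special case.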

How to handle recurring times?

First off, I marked this question as language agnostic, but I'm using PHP and MySQL. It shouldn't affect the question itself very much, though.
I'm creating an application which shows the times of certain shows throughout the week. Every show recurs weekly, and some shows air across two days, e.g. starting on Sunday at 23:30 and ending on Monday at 00:30. I'm storing the start of the show (a day of the week such as Monday or Tuesday, never an exact date, plus a time) and its duration. No show lasts more than 24 hours.
My problem is validating that newly added shows don't overlap existing ones, especially when it comes to Sunday-Monday shows.
How are such recurring events usually handled on both DB side and server side?
tl;dr version with stuff I considered
My first idea was to create some custom validation algorithm, but it seemed too cumbersome and complicated. Not that I'd whine about complicated hand-made solutions, but I'm interested if there isn't something more basic that I'm missing.
Another alternative that came to mind was to change the table structure to use datetime (instead of "day of week" and "time") and to use a fake, fixed date range to store the data. For example, all Mondays would be set to 5th Jan 1970 and Sundays to 11th Jan 1970. There would be one exception to this rule: if a show starts on Sunday and ends on Monday, its end would be stored as 12th Jan 1970. This solution would allow more flexible querying of the DB than the original one, and it would also simplify queries for shows which overlap between individual weeks (since we can do the comparison directly in the query). There are some disadvantages to this solution as well (for one, using fake dates might be confusing).
Both solutions smell of wrong algorithms to me, and I would love to hear some opinions from more experienced fellow developers.
Sounds like you could just store the starting minute of each show as an integer number of minutes since the start of the week (10,080 possible values).
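Then show $b starts while show $a (starting at minute $a with duration $dur_a) is airing if and only if
(10080 + $b - $a) % 10080 < $dur_a
Two shows overlap exactly when one of them starts while the other is airing, so apply the check in both directions (swapping $a and $b and using $b's duration).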
For example, consider a show starting at 11pm Sunday and another starting at 12:30am Monday, counting minutes from Monday 00:00. Here $a == 10020, $dur_a == 120 and $b == 30, so (10080 + $b - $a) % 10080 == 90. This is less than $dur_a, hence the shows overlap.
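A sketch of the modular check applied in both directions, assuming start times are minutes from Monday 00:00 (as in the example above) and every duration is positive and shorter than a week. Checking only one direction misses the case where the other show starts first and runs into this one.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

def starts_during(a_start: int, a_dur: int, b_start: int) -> bool:
    """True if weekly show b starts while weekly show a is airing."""
    return (MINUTES_PER_WEEK + b_start - a_start) % MINUTES_PER_WEEK < a_dur

def overlaps(a_start: int, a_dur: int, b_start: int, b_dur: int) -> bool:
    """Two weekly shows overlap iff one of them starts while the other is airing."""
    return starts_during(a_start, a_dur, b_start) or starts_during(b_start, b_dur, a_start)

# Sunday 23:00 for 120 minutes vs. Monday 00:30 for 30 minutes
print(overlaps(10020, 120, 30, 30))  # True
```

Back-to-back shows (one ending exactly when the next starts) are not reported as overlapping, because the comparison is a strict `<`.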
This problem could be simplified by converting the data into a format that is amenable to the calculations that are required. I recommend creating a type that represents the start times as the number of minutes from Sunday at midnight. Then simple integer range comparisons could be used to find overlapping shows.
The internal representation must, of course, be hidden and abstracted. You may, at some point, want to change the representation from minutes to seconds, for example.
I would opt for a custom validation algorithm:
For each show, compute all showing intervals [start1, end1], [start2, end2], ... [startN, endN], where N is the number of recurrences of the show.
For a new show, also compute these intervals.
Now check whether any of these new intervals intersects any old interval. Two intervals intersect if the start or the end of either one is contained in the other, i.e. if start1 <= end2 and start2 <= end1.
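A sketch of this interval approach, assuming times are minutes from the start of the week and that unrolling every weekly show over two consecutive weeks is enough (durations are under a week, so a wrapping show only meets the next week's early shows). It uses half-open intervals [start, end) so back-to-back shows don't count as overlapping; all names are illustrative.

```python
from __future__ import annotations

MINUTES_PER_WEEK = 7 * 24 * 60

def occurrences(start: int, duration: int, weeks: int = 2) -> list[tuple[int, int]]:
    """Half-open intervals [start, end) for the show in `weeks` consecutive weeks."""
    return [(start + w * MINUTES_PER_WEEK, start + w * MINUTES_PER_WEEK + duration)
            for w in range(weeks)]

def intervals_intersect(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Half-open intervals intersect iff each one starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def conflicts(new_show: tuple[int, int], existing: list[tuple[int, int]]) -> bool:
    """True if the new (start, duration) show intersects any existing show."""
    new_ivs = occurrences(*new_show)
    return any(intervals_intersect(n, o)
               for show in existing
               for o in occurrences(*show)
               for n in new_ivs)
```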

Slight problem with day of the week calculation (base doomsday for a century)

Using the data from this online calculator, http://homer.freeshell.org/dd.cgi, I've successfully written a working version; however, its data is limited to the years 1500 to 2600. I want to modify it (and make a better one) so that I can calculate for any year > 2600.
Referring to Table X, is there actually a formula to calculate the base doomsday for all base centuries (above 2600)?
I've tried working it out myself by putting in centuries higher than this, e.g. 2700 gave me a base doomsday of '00', 2800 gave '02', 2900 back to '00' again...
Help appreciated.
As I understand it, that page's “Base Doomsday” is just an offset to allow for the four-hundred-year cycle of leap day calculations. So, you can extend it indefinitely into the future simply by adding blocks of four centuries.
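A sketch of extending the century data by the four-hundred-year cycle. It uses Conway's standard century anchors with Sunday = 0 (1900 → 3, 2000 → 2, 2100 → 0, 2200 → 5, repeating every 400 years), which may be encoded differently from the page's "Base Doomsday" column, and then adds the usual year term to get the doomsday for any Gregorian year.

```python
from datetime import date

def century_anchor(year: int) -> int:
    """Doomsday (Sunday = 0) of the century's '00 year; repeats every 400 years."""
    return (5 * ((year // 100) % 4) + 2) % 7

def doomsday(year: int) -> int:
    """Weekday (Sunday = 0) shared by 4/4, 6/6, 8/8, 10/10 and 12/12 of the year."""
    y = year % 100
    return (century_anchor(year) + y + y // 4) % 7

print([century_anchor(c * 100) for c in range(19, 31)])  # [3, 2, 0, 5, 3, 2, 0, 5, 3, 2, 0, 5]
print(doomsday(2022) == date(2022, 4, 4).isoweekday() % 7)  # True
```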
Are there any other calculators out there that do this?
Two common methods for calculating the day of the week given a date are Doomsday, which you are using, and Zeller's Congruence.
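For comparison, a sketch of Zeller's congruence for the Gregorian calendar. In this form the result 0 means Saturday, and January and February are treated as months 13 and 14 of the previous year.

```python
from datetime import date

def zeller(year: int, month: int, day: int) -> int:
    """Zeller's congruence (Gregorian): 0 = Saturday, 1 = Sunday, ..., 6 = Friday."""
    if month < 3:     # January and February count as months 13 and 14
        month += 12   # of the previous year
        year -= 1
    k, j = year % 100, year // 100  # year of the century, zero-based century
    return (day + (13 * (month + 1)) // 5 + k + k // 4 + j // 4 + 5 * j) % 7

print(zeller(2000, 1, 1))  # 0 (Saturday)

# Cross-check against the standard library (isoweekday: Monday = 1 ... Sunday = 7)
d = date(2024, 2, 29)
print(zeller(d.year, d.month, d.day) == (d.isoweekday() + 1) % 7)  # True
```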
www.merlyn.demon.co.uk provides some really interesting information on date/time calculations, various calendar systems, and significant dates as they relate to calendar/date calculations.
The calculator at http://homer.freeshell.org/dd.cgi is the best in terms of explaining the doomsday algorithm cleanly and clearly for humans, with one little caveat.
If you input 2/29/1900, it would say it's a Thursday. Well, there is no 2/29/1900, because it's not a leap year.
Of course, if you input 1/35/2016, it will be "garbage in, garbage out" for you as well.
Imagine there are only 364 days in a year, then the day of week for each date will never change year after year, because mod(364,7)==0.
But we have 365 days a year, so the day steps forward 1 each year, that's where the second term mod(year, 7) comes from.
In addition, every 4 years there is a leap year, which contributes to the last term mod(year, 4).
But every 100 years, you subtract a leap year, and every 400 years, you add one leap year. That's where the first term "3,2,0,5" comes in.
You see, it's all because of this leap year, and mod(365,7)==1 business.
The mnemonic "7/11, 5to9" helps greatly to remember Table Z.

Efficient algorithm for determining if a date is in DST

I'm looking for a better than O(n) algorithm to determine whether a date in the future will have daylight saving time applied (and how much). Given a year, month, day, hour, minute and time zone (and a copy of the Olson time zone database), how does one efficiently determine whether that date will be in DST? I'm looking for the algorithm, not a library function to call.
Thank you.
FURTHER EXPLANATION: The date library I'm using is very slow when you create an object with a date in the future and a time zone. It turns out it's doing a linear calculation to determine whether the date is in daylight saving time. Not only that, it's doing this at object creation time. Obviously it could wait until asked, but it should also be more efficient.
Sure, DST rules change and a date library can't predict the future, but the alternative is to put an arbitrary upper limit on localized dates.
Everybody's already commented on the problems with always-changing DSTs. But I can accept the premise that we just pretend the currently known rules will apply forever.
To get your DST information, the first thing to do is to calculate the year/month/day for your future date (if it isn't in that form already). Then you look up your time zone and pull out the variation against UTC, the DST on/off rule and the offset. There could be several different rules depending on the year, so be sure to grab the right one for your "target" year. For reasons explained below, it may be handy to also be aware of the rules for the preceding year.
The on/off rules will have a funny spec like "Oct lastSun": That means the switch occurs in the night of the last Sunday in October.
What you need to do is gather up all of these tersely formatted "rules" and develop a bit of code for each to determine the last date indicated by that rule. It's currently December, so given a couple of rules like "Mar lastSun" and "Oct lastSun" for my time zone, those dates would be March 29, 2009 and October 25, 2009. Which of these dates is more recent? October. October is associated with an "off", so we must currently have NO DST.
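A sketch of evaluating a rule like "Oct lastSun" for a given year, using only the standard library's `calendar.monthrange`. Only the `lastSun` form is handled here; the tz database also has forms such as `Sun>=8` that would need their own small functions.

```python
import calendar
from datetime import date

def last_weekday(year: int, month: int, weekday: int) -> date:
    """Last given weekday (Monday = 0 ... Sunday = 6) of the month, e.g. 'lastSun'."""
    days_in_month = calendar.monthrange(year, month)[1]
    last = date(year, month, days_in_month)
    return date(year, month, days_in_month - (last.weekday() - weekday) % 7)

print(last_weekday(2009, 3, calendar.SUNDAY))   # 2009-03-29
print(last_weekday(2009, 10, calendar.SUNDAY))  # 2009-10-25
```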
You can calculate the DST on/off dates for the current (i.e. target) year regardless of whether the target date is before or after those dates; if the on/off date is in the future of your target date, then simply do the rule calculation again for the previous year. Note that the rules may have changed during the interval, so be sure to apply the correct one for the year you're looking at.
Worst case for this calculation is, you have to repeat your two rule calculations for the previous year. But there's no searching going on otherwise, so it's strictly O(1).
I found a Local/DST/Tz calculator here: http://home-4.tiscali.nl/~t876506/WhatDay.html and as it's a JavaScript applet you should be able to simply crib the code. It doesn't handle all rules, though, so you will need to add some code for the remaining rules.
Update: I just noticed you have an hour and minute in your time as well. That complicates matters just a little. If your date is not on a "switch" date then the instructions I gave above will do you fine. Otherwise, you need to consider the time. I guess the cleanest thing to do would be to include the time in your determination of "most recent". I.e. if your target time is 00:30 UTC and switch time for the given zone is 01:00, then the target year's switch time is still in the future and you have to use the previous year's switch time instead. For practical purposes, this will mean that the "other" switch time was the most recent, and its on/off status applies.
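A sketch of the "most recent switch wins" idea, including the time of day as in the update above. It assumes one hypothetical rule pair that applies in every year, modeled on the EU rule (DST on at 01:00 UTC on the last Sunday of March, off at 01:00 UTC on the last Sunday of October), and compares everything in UTC; real tz rules vary by year and by time basis (wall, standard or UTC).

```python
from __future__ import annotations
from datetime import datetime, timedelta

def last_sunday(year: int, month: int) -> datetime:
    """Midnight at the start of the last Sunday of the month."""
    nxt = datetime(year + month // 12, month % 12 + 1, 1)  # first day of next month
    last_day = nxt - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)

def switches(year: int) -> list[tuple[datetime, bool]]:
    """Assumed rule pair: (switch instant in UTC, DST on after the switch)."""
    return [(last_sunday(year, 3) + timedelta(hours=1), True),
            (last_sunday(year, 10) + timedelta(hours=1), False)]

def in_dst(target_utc: datetime) -> bool:
    """DST state set by the most recent switch at or before target_utc.

    Only this year's and last year's switches are examined, so this is O(1).
    """
    candidates = switches(target_utc.year) + switches(target_utc.year - 1)
    past = [s for s in candidates if s[0] <= target_utc]
    return max(past)[1]
```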
Your number one problem is daylight savings rules that are set by the local authorities. The latter can pass almost any law at any time and therefore change the rules in a way you can't possibly predict.
As far as I know, the known DST changes start and end on a fixed day each year (first weekend in April, last weekend in October, stuff like that). So you could use the Doomsday algorithm to find the days of the week for the given year and calculate the conversion dates from that. Then you can determine whether DST is in effect in the source and/or destination locale. The conversion itself is simply a matter of adding and/or subtracting an hour to compensate for DST and then factoring in the time zone difference.
Hmm, as I see it, the problematic point is determining the weekday of a given day far in the future.
For that, I suggest something like this:
After every 400 years, the complete system repeats, so first divide the number of years by 400 and keep the remainder. In 400 years there are 97 leap years and 303 common ones. If an arbitrary day is a Monday, then the same day 400 years later is shifted by 303 + 2x97 = 497 ≡ 0 (mod 7) days of the week, so it is a Monday again. So you can say something like this:
wday(target_year) = wday(ref_year + ((target_year - ref_year) mod 400))
Then you can do further optimizations, but I guess going year by year will do it: at most you make 399 simple operations (if leap_year then +=2 else ++, mod 7).
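A sketch of the year-by-year approach with the 400-year reduction. A 400-year Gregorian cycle is exactly 146097 days (20871 weeks), so whole blocks can be skipped; after that, January 1 advances by one weekday after a common year and by two after a leap year. The reference point (January 1, 2000, a Saturday) is an arbitrary choice.

```python
from datetime import date

def is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

REF_YEAR, REF_WDAY = 2000, 5  # Jan 1, 2000 was a Saturday (Monday = 0 ... Sunday = 6)

def jan1_weekday(target_year: int) -> int:
    """Weekday of Jan 1 of target_year (Monday = 0), for any Gregorian year."""
    # A 400-year block shifts the weekday by 0, so only the remainder matters.
    year = REF_YEAR + (target_year - REF_YEAR) % 400
    wday = REF_WDAY
    for y in range(REF_YEAR, year):  # at most 399 steps
        wday = (wday + (2 if is_leap(y) else 1)) % 7
    return wday

print(jan1_weekday(2022), date(2022, 1, 1).weekday())  # 5 5
```

Python's `%` returns a non-negative remainder here, so years before the reference year are folded into the same 400-year window as well.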
After you have the weekday for Jan 1 that year, you can calculate DST switching dates, as Carl Smotricz has written.

Resources