Date histogram every half a month - elasticsearch

How to write an interval that groups by every half a month? Rather than 1M I want something like 1/2M to group by from the first to the 16th and from the 16th to the end of the month, every month. Is there a way to do so?
I don't want to end up doing an interval on each day and then calculate manually my results as it's not clean and it would be resource hungry, is there a simple way to do so using setInterval? (in Elasticsearch or Elastica I don't care, I just want the algorithm behind it, thanks!)
$date_grp_agg = new \Elastica\Aggregation\DateHistogram('date');
$date_grp_agg->setField('date')->setFormat("MM-yy")->setInterval('1M'); // This one

Unfortunately, neither 0.5M (half month) nor 2w (2 weeks) are supported, but you could try to use a number of days, i.e. 15d.
$date_grp_agg->setField('date')->setFormat("MM-yy")->setInterval('15d');
Granted, it will not fit months perfectly, i.e. it won't start on the 1st and end on the last day of the month, but it can get you close to the kind of interval you're looking for.

Related

Simple algorithm to alternate days

I need to alternate between 2 tasks every day, and I need a simple algorithm to know which task I need to do.
I need to be able to run this algorithm by head, using simple general knowledge (like day of week, day of month, etc), and it must not rely of which task has been done the previous day (because I have a crappy memory).
I have tried checking for parity in a combination of day of week / day of month / # of month, etc, but couldn't find a suitable system: day of week have 2 consecutive odd numbers, same goes for day of month every so often.
I am afraid that this is impossible: if you can't remember what you did the day before, any other procedure will require more mnemonic effort.
remember what you did on January first (or another date),
remember the parities of the cumulated months: oeoeoeooeoe or ooeoeoeeoeo for a leap year,
add the cumulated parity of the month before* to the parity of the day,
add that to the parity of the first task.
E.g. if A on January 1st 2022, then on March 17, 2022: e + o = o gives B.
*In January, use even.
You can also state the month parity rule as: until August inclusive, use the co-parity of the month number; then use the parity. But for a leap year, change that parity after February (excluded).
I need to be able to run this algorithm by head
So, you don't need to take help of Computer science. You can use cognitive human ability to map a thing to another thing.
Note: This need not make sense to everybody though, if you are thinking out of the box.
Map task 1 as God's day.
Map task 2 as Devil's day in your brain.
This should be simple just like day and night.
Now, remember that devil's evil karma is always burnt by God the next day and that devil never learns his lesson. So this way, alternating would be easy.
Friends Episode snippet on Youtube
Just count the number of days in between your date and a given "zero" one...then use parity.
Take number of seconds (or milli, or whatever) since EPOCH (common zero for date and time), divide (integer division) by 60x60x24 (or 1000x60x60x24, or what is appropriate), you then get the number of days since EPOCH.
----EDIT----
Example: Got 1653910695 seconds since EPOCH (at the time of my experience). Dividing it by 60x60x24 give 19142 days. To morrow it will give 19143, etc.
<?php
$day = Date('j');
$previous_day = date('j', strtotime("-1 days"));
if($day%2==0 OR $previous_day%2!=0)
echo "Task 1";
}else{
echo "Task 2";
}
?>

Summing times in Google sheets

I have a sheet where I record my working hours (this is more for me to remind me to stop working than anything else). For every day, I have three possible shifts - early, normal & late, and I have a formula which will sum up any times put into these columns and give me the daily total hours.
To summarise the duration of time spent working in a day, I use the following formula: =(C41-B41)+(E41-D41)+12+(G41-F41) which is:
early end time minus early start time
normal end time minus normal start time PLUS 12 hours
late end time minus late start time
Which gives me output like this:
What I cannot seem to achieve is, the ability to sum the daily totals into something which shows me the total hours worked over 1-week. If I attempt to sum the daily totals together for the example image shown, I get some wild figure such as 1487:25:00 when formatting as 'Duration' or 23:25:00 when formatted as 'Time'!
All my cells where I record the hours worked are formatted as 'Time'
When using arithmetic operations on date values in Google Sheets, it's important to remember that the internal representation of a date is numeric, and understood as the number of days since January 1, 1970.
What follows from that, is that if you want to add 12 hours to a time duration, you should not write "+12" because that will in fact add 12 days. Instead add "+12/24". In other words, try the following formula instead of the one you are using now:
=(C41-B41)+(E41-D41)+(12/24+G41-F41)

Scheduling software data structures?

Just a question for the experienced programmers.
I'm creating a scheduling software.
I have a start date and a number of days it takes to complete the task and I want to calculate the end date.
Easy enough.
However, I want to subtract weekends and holidays.
My instinct is to have two lists, one of days that have passed and one for the weekends/holidays to skip, and for each day that passes check to see if it's a holiday or a weekend and skip it.
But that seems like it would scale exponentially.
Wondering how an experienced programmer would implement this.
Okay,
We have list of Holiday in a table.
We have start date.
we have number of days to complete task.
Initialize counter to 0
Loop counter till no of days to complete is reached
1. Increment start date with one.
2. Skip 4 if it belongs to Holiday.
3. Skip 4 if it is weekend.
4. increment the counter
Print start date as End date

How to handle recurring times?

First off, I marked this question as language agnostic, but I'm using PHP and MySQL. It shouldn't affect the question itself very much tho.
I'm creating an application which shows times of certain shows throughout the week. Every single show is recurring (on weekly basis) and there might be shows which are airing through 2 days - eg. starting on Sunday at 23:30, ending on Monday at 00:30. I'm storing start of the show (day of the week - Monday, Tuesday... - it's never exact date; time) and duration. There are never shows that would take more than 24 hours.
My problem is with validation if newly added shows aren't overlapping some old ones. Especially if it comes to Sunday-Monday shows.
How are such recurring events usually handled on both DB side and server side?
tl;dr version with stuff I considered
My first idea was to create some custom validation algorithm, but it seemed too cumbersome and complicated. Not that I'd whine about complicated hand-made solutions, but I'm interested if there isn't something more basic that I'm missing.
Other alternative that came to mind was to change table structure to use datetime (instead of "day of week" and "time"), and use a fake fixed date range to store the data. For example all Mondays would be set to 5th Jan 1970, Sundays would use 11th Jan 1970. There would be one exception to this rule - if there would be some show which starts on Sunday and ends on Monday, it would be stored as 12th Jan 1970. This solution would allow more flexible quering of the DB than the original one, and it would also simplify queries for shows which overlap between individual weeks (since we can do the comparison directly in the query). There are some disadvantages to this solution as well (for one, using fake dates might make it confusing).
Both solutions smell of wrong algorithms to me and would love to hear some opinions from more experienced fellow developers.
Sounds like you could just store the starting minute of each show as an integer number of minutes since the start of the week (10,080 possible values).
Then a show starting at minute $a with duration $dur_a will overlap $b if and only if
(10080 + $b - $a) % 10080 < $dur_a
For example consider a show starting at 11pm Sunday and another starting at 12.30am Monday. Here $a == 10020 and $dur_a == 120 and $b == 30. (10080 + $b - $a) % 10080 == 90. This is less than $dur_a and hence the shows overlap.
This problem could be simplified by converting the data into a format that is amenable to the calculations that are required. I recommend creating a type that represents the start times as the number of minutes from Sunday at midnight. Then simple integer range comparisons could be used to find overlapping shows.
The internal representation must, of course, be hidden and abstracted. You may, at some point, want to change the representation from minutes to seconds, for example.
I would opt for a custom validation algorithm:
For each show, compute all showing intervals [start1, end1], [start2, end2], ... [startN, endN], where N is the number of recurrence of the show.
For a new show, also compute these intervals.
Now check if any of these new intervals intersect any old intervals. This is the case if the start or the end of one interval is contained in the other.

Best approach: transfer daily values from one year to another

I will try to explain what I want to accomplish. I am looking for an algorithm or approach, not the actual implementation in my specific system.
I have a table with actuals (incoming customer requests) on a daily basis. These actuals need to be "copied" into the next year, where they will be used as a basis for planning the amount of requests in the future.
The smallest timespan for planning, on a technical basis, is a "period", which consists of at least one day. A period always changes after a week or after a month. This means, that if a week is both in May and June, it will be split in two periods.
Here's an example:
2010-05-24 - 2010-05-30 Week 21 | Period_Id 123
2010-05-31 - 2010-05-31 Week 22 | Period_Id 124
2010-06-01 - 2010-06-06 Week 22 | Period_Id 125
We did this to reduce the amount of data, because we have a few thousand items that have 356 daily values. For planning, this is reduced to "a few thousand x 65" (or whatever the period count is per year). I can aggregate a month, or a week, by combining all periods that belong to one month. The important thing about this is, I could still use daily values, then find the corresponding period and add it there if necessary.
What I need, is an approach on aggregating the actuals for every (working)day, week or month in next years equivalent period. My requirements are not fixed here. The actuals have a certain distribution, because there are certain deadlines and habits that are reflected in the data. I would like to be able to preserve this as far as possible, but planning is never completely accurate, so I can make a compromise here.
Don't know if this is what you're looking for, but this is a strategy for calculating the forecasts using flexible periods:
First define a mapping for each day in next year to the corresponding day in this year. Then when you need a forecast for period x you take all days in that period and sum the actuals for the matching days.
With this you can precalculate every week/month but create new forecasts if the contents of periods change.
Map weeks to weeks. The first full week of this year to the first full week of the next. Don't worry about "periods" and aggregation; they are irrelevant.
Where a missing holiday leaves a hole in the data, just take the values for the same day of the previous week or the next week, and do the same at the beginning/end of the year.
Now for each day of the week, combine the results for the year and look for events more than, say, two standard deviations from the mean (if you don't know what that means then skip this step), and look for correlations with known events like holidays. If a holiday doesn't show an effect in this test then ignore it. If you find an effect, shift it to compensate for the different date next year. Don't worry about higher-order effects, you don't have enough data to pin them down.
Now draw in periods wherever you like and aggregate all you want.
Don't make any promises about the accuracy of these predictions, there's no way to know it. Don't worry about whether this is the best possible way; it isn't, but it's as good as any you're likely to find. You can spend as much more time and effort fine-tuning this as you wish; it might raise expectations but it's not likely to make the results much more accurate-- it's about as likely to make them worse.
There is no A-priori way to answer that question. You have to look at your data, and decide what the important parameters (day of week, week number, month, season, temperature outside?) using the results.
For example, if many of your customers are jewish/muslim, then the gregorian calendar, and ISO-week numbers and all that won't help you much, because jewish/muslim holidays (and so users behaviour) are determined using other calendars.
Another example - Trying to predict iPhone search volume according to last year's search doesn't sound like a good idea. It seems that the important timescales are much longer than a year (the technology becoming mainstream over the years) and much shorter than a year (Specific events that affect us for days-weeks).

Resources