How to get the Airflow schedule right - etl

I find the way Airflow scheduling works very confusing.
I would like to schedule a DAG that runs on Friday so that I can use its result on Saturday. I wrote the crontab expression like this: 00 16 * * 5. However, as of today, 2020-03-10, the last execution date I get from Airflow is 2020-02-28. This is not what I want, since the most recent Friday is actually 2020-03-06. I couldn't get a 2020-03-06 run unless I scheduled the DAG every day and skipped it when it is not Friday. Is there a way to get this schedule right?

A lot of people get confused by how Airflow's execution_date and schedule_interval values work, namely that it waits for a period of time to "close" before it'll execute for that period; here's a portion from a previous answer I gave:
Think of it like this: If you ran a process quarterly and generated a report from data for that quarter, would you name the report for the quarter you were in when you created the file, or for the quarter the data in the report is from? That's what the execution_date is.
Try moving your start_date back by one whole schedule interval. The DAG should then run on 03/06, but its execution_date will say 02/28, because that run covers the interval that started on 02/28.
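To make the interval arithmetic concrete, here is a minimal sketch in plain Python (no Airflow imports). It assumes the weekly `00 16 * * 5` schedule from the question and the behaviour described above, where a run only fires once its interval has closed. The helper names `last_friday_1600` and `latest_run` are made up for illustration and are not part of Airflow.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def last_friday_1600(now):
    """Most recent Friday 16:00 at or before `now` (cron: 00 16 * * 5)."""
    # datetime.weekday(): Monday=0 ... Friday=4
    days_back = (now.weekday() - 4) % 7
    candidate = (now - timedelta(days=days_back)).replace(
        hour=16, minute=0, second=0, microsecond=0
    )
    if candidate > now:
        # It is Friday, but 16:00 has not been reached yet.
        candidate -= WEEK
    return candidate


def latest_run(now):
    """Return (execution_date, triggered_at) for the latest run as of `now`."""
    triggered_at = last_friday_1600(now)   # the moment the interval closed
    execution_date = triggered_at - WEEK   # the start of that interval
    return execution_date, triggered_at


execution_date, triggered_at = latest_run(datetime(2020, 3, 10))
print(execution_date, triggered_at)
```

With `now` set to 2020-03-10, the run was triggered on Friday 2020-03-06 but is labelled with 2020-02-28, which matches what the question observed.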

Related

How can I prevent a CA7 job with two schedule IDs from conflicting when they run on the same day?

I have a single job with two schedule IDs. One runs the job on the last day of the month and the other schid runs it on the last Sunday of the month. The problem I’m trying to correct arises when the last day of the month falls on a Sunday, which causes a conflict between the two schedules.
My first thought was to split the two schids into two separate job names, while making one job a requirement for the other so they can’t run at the same time. The problem is that this scenario would only apply on the rare days when both run; on all other days the requirement wouldn’t be met.
I’m not very familiar with CA7, so I thought I’d look for some input. Can a schid be created with a conditional statement?
I think two schids will be needed. The options assume a calendar that specifies all days as processing days.
Option 1
Schid=1 schedules the job on the last Sunday of the month UNLESS that Sunday falls on the last day of the month.
Schid=2 schedules the job on the last day of the month.
ID=001 ROLL=N INDEX=+000
SCAL= DOTM=2000 LEADTM=0010 STARTM=1950
MONTHLY DAY=SUN WEEK=-00 MONTH=ALL
MONTHLY NRDAY=-00 MONTH=ALL
ID=002 ROLL=N INDEX=+000
SCAL= DOTM=2000 LEADTM=0010 STARTM=1950
MONTHLY RDAY=-00 MONTH=ALL
Option 2
Schid=1 runs the last day of the month unless that day falls on a Sunday.
Schid=2 runs on the last Sunday of the month.
ID=001 ROLL=D INDEX=+000
SCAL= DOTM=1300 LEADTM=0030 STARTM=1230
MONTHLY NDAY=SUN WEEK=-00 MONTH=ALL
MONTHLY RDAY=-00 MONTH=ALL
ID=002 ROLL=D INDEX=+000
SCAL= DOTM=1300 LEADTM=0030 STARTM=1230
MONTHLY DAY=SUN WEEK=-00 MONTH=ALL
Another approach is to define a separate calendar with precisely the days I want the job to run so the schedule definition is pretty simple.
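One way to sanity-check either option, or to generate the days for such a dedicated calendar, is to compute the dates outside CA7. The following is a minimal Python sketch of the Option 1 rule only (schid 1 is dropped when the last Sunday is also the last day of the month). The function names are hypothetical and nothing here talks to CA7.

```python
import calendar
from datetime import date, timedelta


def last_day_of_month(year, month):
    # monthrange() returns (weekday of the 1st, number of days in the month)
    return date(year, month, calendar.monthrange(year, month)[1])


def last_sunday_of_month(year, month):
    d = last_day_of_month(year, month)
    # date.weekday(): Monday=0 ... Sunday=6; step back to the nearest Sunday
    return d - timedelta(days=(d.weekday() - 6) % 7)


def option1_runs(year, month):
    """Return {schid: run date} under Option 1 for the given month."""
    last_day = last_day_of_month(year, month)
    last_sun = last_sunday_of_month(year, month)
    runs = {2: last_day}          # schid 2: always the last day of the month
    if last_sun != last_day:      # schid 1: last Sunday, unless it clashes
        runs[1] = last_sun
    return runs


for month in (3, 5):
    print(month, option1_runs(2020, month))
```

In May 2020 the last day of the month is a Sunday, so only schid 2 is scheduled; in March 2020 both schids get their own date.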

Storing recurring time periods in Oracle database

I'm writing monitoring software in which most of the logic will live in the Oracle database and PL/SQL.
When my monitoring is called, it should alert about problems. For example, it should raise an alert if:
1. There are fewer than 2 operations per minute on Friday from 22:00 till 23:00
2. There are fewer than 5 operations per minute on 31 January from 22:00 till 23:00
3. There are fewer than 3 operations per minute every day from 10:00 till 12:00
If my monitoring is called at 22:30 on 31 January, I should compare my operation count to 5.
4. There are fewer than 5 operations per minute from Friday 22:00 till Monday 15:00
I was thinking about saving the time periods in the database in cron expression format. In that case I have to compare SYSDATE (the current call date of the monitoring function) to the cron expression saved in the database.
My questions:
1. How can I find out whether SYSDATE falls within a cron expression?
2. Is it correct to use cron expressions in this case at all? Can you suggest any other way of storing periods of time?
Don't do it
I am completely with SpaceTrucker: don't do it in SQL or PL/SQL; do it in Java with either the Java 8 date API or Joda-Time.
How to do it nevertheless
But even if you shouldn't do it, there might still be some good reason to do it. So here is how:
Table for each instant you want to check
First, let's create a table with one row for each second or minute in the interval you want to check. The granularity and the length of the interval depend on the cron expressions you want to allow. Usually one-second granularity for a whole week should be sufficient (about 600,000 rows). If you want to check a whole year, use minutes as the granularity (about 500,000 rows). Both row counts are nothing for a modern database. On my notebook, the corresponding queries return instantly.
CREATE TABLE week AS
SELECT
    running_second,
    ts,
    EXTRACT(SECOND FROM ts) AS sec,
    EXTRACT(MINUTE FROM ts) AS min,
    EXTRACT(HOUR FROM ts) AS h,
    to_char(ts, 'Day') AS dow
FROM (
    SELECT
        level AS running_second,
        TO_TIMESTAMP_TZ('2015-09-05 00:00:00 0:00',
                        'YYYY-MM-DD HH24:MI:SS TZH:TZM') +
            NUMTODSINTERVAL(level-1, 'SECOND') AS ts
    FROM dual CONNECT BY level <= 60*60*24*7
)
;
Query for each filter expression
Next, you convert each cron expression to a query. You can either use PL/SQL to transform each cron expression to a where clause, or you can use a generic where clause.
You should get something like this:
SELECT
    *
FROM
    week
WHERE
    h = 5
    AND min = 0
    AND sec = 0;
or in a generic version:
SELECT
    filter_expressions.name, week.ts
FROM
    week, filter_expressions
WHERE
    (filter_hour IS NULL OR h = filter_hour)
    AND (filter_min IS NULL OR min = filter_min)
    AND (filter_sec IS NULL OR sec = filter_sec);
(given your filters are stored in a table filter_expressions, that has a column for each constraint type, and each row has either a parameter for the constraint or NULL if the constraint is not applicable).
Store the result in a global temporary table cron_startpoints.
Check for violations
Group the table cron_startpoints to check for constraint violations. You can count how many matches there are for Friday, for midnight, or for whatever else you need, and check whether that number is acceptable.
It depends on how much flexibility you want. For the examples you provided such structure would be enough:
CREATE TABLE monitoring_periods (
    id INTEGER NOT NULL PRIMARY KEY,
    monit_month VARCHAR2(2),
    monit_day VARCHAR2(2),
    monit_day_of_week VARCHAR2(3),
    monit_time_from INTERVAL DAY TO SECOND,
    monit_time_to INTERVAL DAY TO SECOND,
    required_ops INTEGER
);
Here are some examples of storing the periods and checking them against SYSDATE. I would avoid storing the cron expression literally as a string, as it would require parsing at query time. However, the more complex your expressions are (something like '5 4,15,22 */2 * 1-5'), the more complicated the structure needed to store them, so you need to think carefully about your requirements.
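As a rough illustration of how rows in such a table could be matched against the current timestamp, here is a small Python sketch. It mirrors the columns of monitoring_periods, treats a NULL column as "any", and assumes each period starts and ends on the same day, so the Friday-to-Monday rule from the question is not covered. The class, field, and function names are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional

DAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


@dataclass
class MonitoringPeriod:
    month: Optional[int]         # None means "any month"
    day: Optional[int]           # None means "any day of the month"
    day_of_week: Optional[str]   # e.g. "FRI"; None means "any weekday"
    time_from: time
    time_to: time
    required_ops: int


def matches(period, now):
    """True if `now` falls inside the period (end time exclusive)."""
    return (
        (period.month is None or period.month == now.month)
        and (period.day is None or period.day == now.day)
        and (period.day_of_week is None
             or period.day_of_week == DAYS[now.weekday()])
        and period.time_from <= now.time() < period.time_to
    )


# The first three rules from the question.
PERIODS = [
    MonitoringPeriod(None, None, "FRI", time(22), time(23), 2),
    MonitoringPeriod(1, 31, None, time(22), time(23), 5),
    MonitoringPeriod(None, None, None, time(10), time(12), 3),
]


def required_ops_at(now):
    """Thresholds of every period that applies at `now`."""
    return [p.required_ops for p in PERIODS if matches(p, now)]


print(required_ops_at(datetime(2019, 1, 31, 22, 30)))
```

When several periods overlap (for example, a 31 January that falls on a Friday), the list contains more than one threshold, and you would still need a rule for which one wins.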
I once had the task of writing difficult date calculations with recurring periods and time windows for 10g. Among those were things like "Tuesday of the second week of the month every 2 months between 8 AM and 2 PM". We decided to use Java stored procedures for this (also because they were already in use for other purposes).
Depending on your Oracle version, you can choose a Joda-Time version that can run within the Oracle database JVM. Also note that Joda-Time 1.6 can be compiled with Java 1.3 (which we had to use).
If you are looking for cron expressions explicitly, then you might also do well with another Java library running within the Oracle database JVM. For example, here is one:
CronExpression expression = CronExpression.parser()
        .withSecondsField(true)
        .withOneBasedDayOfWeek(true)
        .allowBothDayFields(false)
        .parse("0 15 10 L * ?");
assert expression.matches(dateTime);
However, I think cron is not suited to the task at hand. Cron is a way to specify when to run jobs, whereas you need to observe what happened. For your requirement "There are fewer than 2 operations per minute", you could have operations at the 1st and 2nd second or at the 1st and 31st second; both are valid, but their cron expressions are very different.
When it comes to storing the time periods, you could also look at ISO 8601 recurring intervals stored as VARCHARs. The duration part of such an interval looks like this:
P1Y2M10DT2H30M
In any case, you will need to apply calculations to every row you would like to match. Depending on how many rows there are, you might need some heuristics to discard results that are far from meeting your criteria.
Thinking a bit more outside the box:
you should question your architecture. The requirements you listed can be represented by state machines. You can feed them with the events that occurred, in chronological order. If a state machine reaches an unwanted state, you can simply report that. However, I doubt that this can be done easily in pure PL/SQL.

How to automatically update dates in past tasks based on the start date on a task in MS Project

I have a Project file with 30 tasks registered and they are all set to Automatically Schedule.
I have completed the tasks up to number 10, so I want to set the start date of task number 11 to today, and I want the past tasks, 1 to 10, to have their dates updated automatically.
When I just click on the start date of task number 11 and change it to today, Project creates a "No earlier than" constraint, but that is not what I want.
Could someone help me?
A task's scheduled start date is calculated using the following:
relationships (e.g. predecessors)
constraints (e.g. "Start No Earlier Than")
calendars (e.g. which days are working days)
If Task 11 has no predecessors and no constraints, it will default to start on the first working day after the project start date. If you want it to start today, use a constraint or drive the date by setting a Finish-to-Start relationship to a task that completed yesterday.

How can I dynamically set the time period on a Workflow Time Trigger?

I have a custom object with Start and End date fields, and also a Status field.
When a new record is created, if the Start date specified is in the future, I would like to be able to have a Workflow update the Status field to 'Active' when the Start date is today. However, I only appear to be able to select a fixed number of days after which my Field Update task can be triggered. What I want is to set that number of days equal to Start date - Today's date.
Can this be done?
OK, I realised I can set the time delay to zero days after the Start date. Not intuitive (for me), but it makes sense.

Every 4th Monday?

I need a reminder every 4th Monday (= every 28 days), counting from the Monday I start. So I thought I could have a Ruby script
to send me an email (I know how to do this part)
every 4th Monday, not every 4th Monday of a month (<--- how do I code this bit in Ruby?)
I will run the Ruby script via cron every Monday.
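Assuming you've already determined that it's a Monday, count the days since the beginning of the year (yday), take that modulo 28, and check whether the result is between 22 and 27 (inclusive) or is 0. Note that yday restarts on 1 January, so the 28-day rhythm can shift at the year boundary.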
Given a starting (initial run) date, you can just use step(28) to get the next date and if it equals the current date, run and then store that for your next run.
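The fixed-start-date idea can be sketched as follows. This is written in Python purely for illustration; the same arithmetic (difference in days modulo 28) should carry over to Ruby's Date class. The function name `is_reminder_day` and the example start date are assumptions, not something from the question.

```python
from datetime import date


def is_reminder_day(start, today, every_days=28):
    """True when `today` is a whole number of 28-day periods after `start`."""
    elapsed = (today - start).days
    return elapsed >= 0 and elapsed % every_days == 0


# Hypothetical first reminder: Monday 2020-03-02.
start = date(2020, 3, 2)
for candidate in (date(2020, 3, 9), date(2020, 3, 30), date(2021, 1, 4)):
    print(candidate, is_reminder_day(start, candidate))
```

Because the check is anchored to a start date rather than to the start of the year, it keeps the same 28-day rhythm across year boundaries, and nothing needs to be stored between runs.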
As an aside, if you had meant the 4th Monday in the month, check the current day of the month: if it's between the 22nd and the 28th, you're in the fourth occurrence of that day of the week that month.
Give this a try. Whenever is cron for Ruby.
This gentleman calculated the first week of the month; if you're running your script every Monday via cron, then you only need to find out in Ruby whether it is the 4th week of the month.
http://www.ruby-forum.com/topic/95015
I think you should use these two ruby gems:
To set a cron job -> WHENEVER - https://github.com/javan/whenever
gem install whenever
To send an email (obviously) -> GMAIL - https://github.com/nu7hatch/gmail
gem install gmail
Good Luck :D
