For some reason, my cron job, which is scheduled for 8:00 am, has been running late. Yesterday it ran at 8:01:14 am and today at 8:02:29 am. The job runs inside a container and is the only entry in the crontab. I read somewhere that if you have more than 100 jobs, cron will reschedule your job to run 60 seconds later. But since I am in a container on a server that has 2 containers with 1 job per container, I can't see how I would have over 100 jobs in the first place.
The job doesn't have any dependencies. Is there a way to see why this is happening? Ever since I set this up 2 months ago it has fired at 8:00 am sharp; it has only acted up in the last 2 days.
The task is just a basic shell (.sh) script.
The time on the server and in the container are in sync (tested using the date command). No changes have been made this week at all.
What is interesting is that the process in the other container, which is basically a duplicate of the first one, ran on time.
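One way to narrow this down is to record how long after the scheduled time the job actually starts, so you can tell whether cron launched it late or the script itself was slow to reach the point where it writes its timestamp. The following is only a rough sketch in Python (the job in the question is a shell script); the function name start_delay_seconds and the 08:00 default are assumptions made for illustration.

```python
from datetime import datetime


def start_delay_seconds(now: datetime, hour: int = 8, minute: int = 0) -> float:
    """Return the seconds elapsed between today's scheduled time and `now`."""
    # Build the scheduled timestamp on the same calendar day as `now`.
    scheduled = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return (now - scheduled).total_seconds()


if __name__ == "__main__":
    # Call this as the very first step of the job and append the line to a log.
    started = datetime.now()
    delay = start_delay_seconds(started)
    print(f"{started.isoformat()} started {delay:.0f}s after the scheduled time")
```

With the times from the question, a start at 8:01:14 gives a delay of 74 seconds and a start at 8:02:29 gives 149 seconds. If the logged delay is near zero while the job's own output timestamp is late, the delay is inside the script rather than in cron.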
Related
I have a job that should run every minute, but one of my hosting provider's rules is "Cron job should run at least once every 15 minutes". My question: will my task scheduler still run within the next 15 minutes? If not, any ideas for making it run every minute? Thanks.
We are going to deploy the same code on two different servers, which will be started and running at the same time. The problem is that both servers will then run the cron job at the same time, so the same work will be processed twice. I want the cron job on each server to start at a different initial time, so that one server can read some rows from the DB, finish its task, and change the status of the rows it processed, and the other server then reads only new entries.
I need to schedule the JDBC consumer job to run every morning at 5 am. As far as I know, I can make the job run at 5 am by starting it at 5 am and setting the query interval to 24 hours.
But I need the first instance to start at 5 am without starting it manually (I'm too lazy to wake up at 5 am :P). Is there a way to achieve this?
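Staggering the start times only reduces the chance of overlap. A different technique, shown here as a hedged sketch, is to have each server claim rows inside a transaction before processing them, so that two runs never pick up the same row. Everything named here is hypothetical: the jobs table, its status and owner columns, and the worker names. SQLite is used only so the example is self-contained; a production database would more likely rely on its own row-locking features.

```python
import sqlite3


def claim_rows(conn: sqlite3.Connection, worker: str, limit: int = 10) -> list:
    """Mark up to `limit` unprocessed rows as owned by `worker`; return their ids."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE jobs SET status = 'processing', owner = ? "
            "WHERE id IN (SELECT id FROM jobs WHERE status = 'new' "
            "ORDER BY id LIMIT ?)",
            (worker, limit),
        )
        rows = conn.execute(
            "SELECT id FROM jobs WHERE status = 'processing' AND owner = ? "
            "ORDER BY id",
            (worker,),
        ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Two workers claim from the same table without overlapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)"
    )
    conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("new",)] * 5)
    conn.commit()
    first = claim_rows(conn, "server-a", limit=3)
    second = claim_rows(conn, "server-b", limit=3)
    return first, second


if __name__ == "__main__":
    print(demo())
```

Because a claimed row no longer has status 'new', the second worker only sees the rows the first one left behind, whichever of the two happens to start first.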
(Copying my answer from Ask StreamSets)
There is no built-in scheduler in SDC, but you could use cron and the StreamSets CLI to start the pipeline.
I have the following properties set in my oozie-site.xml (using the safety valve in Cloudera Manager):
oozie.services.ext - org.apache.oozie.service.PurgeService
oozie.service.PurgeService.older.than - 15
oozie.service.PurgeService.coord.older.than - 7
oozie.service.PurgeService.bundle.older.than - 7
oozie.service.PurgeService.purge.interval - 60
However, I still see some old jobs, either KILLED or completed, dating back as far as September 2014.
To give an example,
I have a Coordinator which is currently in the RUNNING state. When I use the Oozie Web Console to list the instances of that Coordinator (click the Coordinator tab, then click my coordinator), the pop-up shows that the oldest of the materialised workflow jobs (coordinator actions) is from September 2014.
I assume the property responsible for cleaning this up is oozie.service.PurgeService.older.than, which I have set to 15 days.
So what am I missing here?
The problem occurs with long-running, high-frequency coordinator jobs: the child workflows are never purged as long as the coordinator job is still running.
The solution is (quoting from the external link):
What you can do as a workaround, is split up your long-running
Coordinators. For example, instead of making your Coordinator run for
years? forever?, make it run for, say, 6 months. And have an
identical Coordinator scheduled to start exactly when that one ends.
This will allow Oozie to cleanup the old child Workflows from that
Coordinator every 6 months. Otherwise, you can schedule a cron job
to manually delete old jobs from the Database. However, please be
careful about this. When deleting a workflow job from the WF_JOBS
table, you'll also need to delete the workflow actions from the
WF_ACTIONS table that belong to it, as well as the coordinator action
from the COORD_ACTIONS table that it belongs to. If you miss something,
it will likely cause problems.
References:
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-not-cleaning-up-old-jobs-from-Oozie-database/m-p/30692#U30692
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/zkWa2kDMyyo
http://qnalist.com/questions/5404909/oozie-purging
JIRA Link:
https://issues.apache.org/jira/browse/OOZIE-1532
I have a cron job that is set to run at midnight. I would like to know whether this means midnight in the server's timezone, or whether I have to wait 24 hours from the moment I create the cron job until it is executed.
You don't have to wait 24 hours. The cron job is launched at the scheduled hour, based on the server's time.
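To make that concrete, here is a small sketch of how the next run time of a midnight job follows from the server clock rather than from when the entry was created. It assumes a daily entry such as `0 0 * * *` and uses naive datetimes in server-local time, ignoring daylight-saving changes; the function name next_midnight is made up for this example.

```python
from datetime import datetime, timedelta


def next_midnight(now: datetime) -> datetime:
    """Return the next 00:00 strictly after `now`, in the same (server) clock."""
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


if __name__ == "__main__":
    created = datetime(2024, 3, 10, 21, 30)  # entry added at 21:30 server time
    first_run = next_midnight(created)
    print(f"first run at {first_run}, i.e. {first_run - created} after creation")
```

An entry created at 21:30 server time therefore first runs two and a half hours later, at the next midnight, not 24 hours after it was added.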