I have installed multiple instances of OpenNMS.
It seems to automatically delete cleared alarms after 5 minutes. Is there a configuration setting where I can change this time to 15 minutes?
I am running Luigi, a pipeline manager, which is processing 1000 tasks. Currently I poll for the AWS termination notice; if it is present, I requeue the job, wait 30 minutes, and then launch a new server that starts all the tasks from scratch. However, sometimes it restarts the same job multiple times, which is inefficient.
Instead, I am considering using create_fleet with InstanceInterruptionBehavior=Stop. If I do this, will the instance still be running the Luigi daemon when it restarts, and will it retain the state of all the tasks?
All InstanceInterruptionBehavior=Stop does is effectively shut down your EC2 instance rather than terminate it. Since a "persistent" spot request is required, in addition to EBS storage, you will keep all the data on the attached EBS volumes at the time of the instance stop.
It is entirely up to the application itself (Luigi in this case) to store the state of its execution and pick back up from where it left off. For one, you'll want to ensure the service daemon is enabled to start automatically on system start (example):
sudo systemctl enable yourservice
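For example, here is a minimal sketch of running the Luigi central scheduler as a systemd service, written from the shell. The unit name, user, install path, and the --state-path flag are assumptions about a typical Luigi setup rather than something from your environment, so double-check them against luigid --help:
sudo tee /etc/systemd/system/luigid.service > /dev/null <<'EOF'
[Unit]
Description=Luigi central scheduler (hypothetical unit)
After=network.target

[Service]
User=luigi
# --state-path tells luigid where to persist scheduler state across restarts (verify the flag for your Luigi version)
ExecStart=/usr/local/bin/luigid --state-path /var/lib/luigi/state.pickle
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now luigid.service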
Does anyone know how to retrieve a cluster that was automatically removed from Databricks after not being used for some time? I added a bunch of libraries and global init scripts to it, and it got deleted automatically after a month of inactivity. I want to see what I did last time so I can either retrieve it or replicate it.
Yes. 30 days after a cluster is terminated, it is permanently deleted. To keep an all-purpose cluster configuration even after a cluster has been terminated for more than 30 days, an administrator can pin the cluster. Up to 70 clusters can be pinned. You can refer to: link
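If it helps, pinning can also be done through the Clusters REST API rather than the UI; a rough sketch, where the workspace URL, token, and cluster ID are placeholders:
# Placeholders only -- substitute your own workspace URL, access token, and cluster ID.
curl -X POST https://<your-workspace>.cloud.databricks.com/api/2.0/clusters/pin \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{ "cluster_id": "1234-567890-abcde123" }'
Note that pinning only helps while a cluster still exists or was terminated less than 30 days ago; it cannot bring back a cluster that has already been permanently deleted.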
So I have a job that should run every minute, but one of my hosting provider's rules is that cron jobs can run at most once every 15 minutes. My question is: will my task scheduler still run within the next 15 minutes? If not, any ideas to make it run every minute? Thanks.
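One common workaround, sketched here on the assumption that the provider allows a single cron entry to run a shell script for up to 15 minutes (all file names are placeholders):
# crontab entry: runs once every 15 minutes, which satisfies the provider's rule
*/15 * * * * /home/user/run-every-minute.sh

#!/bin/bash
# Contents of the hypothetical /home/user/run-every-minute.sh: cron calls this every
# 15 minutes, and it launches the real task once a minute for the next 15 minutes.
for i in $(seq 0 14); do
    /home/user/my-task.sh &   # the actual job; path is a placeholder
    sleep 60
done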
For some reason my cron job, which is scheduled for 8 am, has been running late. Yesterday it ran at 8:01:14 am and today at 8:02:29 am. The job runs inside a container and is the only cron entry in the crontab. I read somewhere that if you have more than 100 jobs, cron will reschedule your job to run 60 seconds later. But since I am in a container on a server that has 2 containers with 1 job per container, I can't see how I would have over 100 jobs in the first place.
The job doesn't have any dependencies. Is there a way to see why this is happening? Ever since I set this up 2 months ago it has been firing at 8:00 am sharp; it has only acted up in the last 2 days.
The task is just a basic shell (.sh) script.
The time on the server and in the container are in sync (tested using the date command). No changes have been made this week at all.
What is interesting is that the other process in the other container, which is basically a duplicate of the first one, ran on time.
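One way to see exactly when cron fired the job versus when the script itself started, sketched for a Debian/Ubuntu-style image (log locations and the cron unit name vary by distribution):
# When did the cron daemon actually launch the job?
grep CRON /var/log/syslog | tail -n 20
journalctl -u cron --since today    # only if the container runs systemd; the unit may be crond on RHEL-based images

# Record the script's own start time by adding a line like this at the top of the .sh file:
echo "started at $(date '+%F %T')" >> /var/log/myjob-timing.log
Comparing the two timestamps shows whether cron itself is firing late or whether the script is slow to get going.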
I have the following properties set in my oozie-site.xml (using the safety valve in Cloudera Manager):
oozie.services.ext - org.apache.oozie.service.PurgeService
oozie.service.PurgeService.older.than - 15
oozie.service.PurgeService.coord.older.than - 7
oozie.service.PurgeService.bundle.older.than - 7
oozie.service.PurgeService.purge.interval - 60
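(For reference, in the Cloudera Manager safety valve these end up as standard oozie-site.xml property elements, along the lines of the two shown below; note that the older.than settings are in days while purge.interval is in seconds.)
<property>
  <name>oozie.service.PurgeService.older.than</name>
  <value>15</value>
</property>
<property>
  <name>oozie.service.PurgeService.purge.interval</name>
  <value>60</value>
</property>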
However, I still see some old jobs that are KILLED or completed, some as old as September 2014.
To give an example,
I have a coordinator that is currently in the RUNNING state. When I use the Oozie Web Console to list the instances of that coordinator (i.e., click on the Coordinators tab and then on my coordinator), the pop-up shows materialized workflow jobs (coordinator actions) going back to September 2014.
I assume the property responsible for cleaning this up is oozie.service.PurgeService.older.than, which I have set to 15 days.
So what am I missing here?
The problem occurs with long-running, high-frequency coordinator jobs: the child workflows are never purged because the coordinator job is still running.
The suggested workaround (quoting from the external link below) is:
What you can do as a workaround is split up your long-running Coordinators. For example, instead of making your Coordinator run for years (forever?), make it run for, say, 6 months, and have an identical Coordinator scheduled to start exactly when that one ends. This will allow Oozie to clean up the old child Workflows from that Coordinator every 6 months. Otherwise, you can schedule a cron job to manually delete old jobs from the database. However, please be careful about this: when deleting a workflow job from the WF_JOBS table, you'll also need to delete the workflow actions from the WF_ACTIONS table that belong to it, as well as the coordinator action from the COORD_ACTIONS table that it belongs to. If you miss something, it will likely cause problems.
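If you do go the manual-deletion route, here is a very rough sketch of what that cleanup could look like. The table and column names (WF_JOBS.status/end_time, WF_ACTIONS.wf_id, COORD_ACTIONS.external_id) are assumptions based on a typical Oozie 4.x MySQL schema, so verify them against your own database and test on a backup copy first:
# Rough sketch only -- schema names are assumptions; back up the Oozie database before trying this.
mysql oozie <<'SQL'
-- workflow actions belonging to old, finished workflow jobs
DELETE FROM WF_ACTIONS
 WHERE wf_id IN (SELECT id FROM WF_JOBS
                  WHERE status IN ('SUCCEEDED','KILLED','FAILED')
                    AND end_time < '2015-01-01');
-- coordinator actions that materialized those workflow jobs
DELETE FROM COORD_ACTIONS
 WHERE external_id IN (SELECT id FROM WF_JOBS
                        WHERE status IN ('SUCCEEDED','KILLED','FAILED')
                          AND end_time < '2015-01-01');
-- and finally the workflow jobs themselves
DELETE FROM WF_JOBS
 WHERE status IN ('SUCCEEDED','KILLED','FAILED')
   AND end_time < '2015-01-01';
SQL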
References:
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-not-cleaning-up-old-jobs-from-Oozie-database/m-p/30692#U30692
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/zkWa2kDMyyo
http://qnalist.com/questions/5404909/oozie-purging
JIRA Link:
https://issues.apache.org/jira/browse/OOZIE-1532