How to make whenever skip files in progress - ruby

We're using the Ruby gem whenever to manage large batches of import jobs. But what if a file is still being imported when the next cron job occurs?
For example:
12am: whenever starts an import cron job for import.csv
2am: import.csv is still being imported, but the next cron job scheduled by whenever fires.
Would whenever skip that file or try to run it again? Any suggestions to make sure it doesn't try to process the same file twice?

Whenever is merely a frontend for the crontab. It doesn't actually launch any of the processes; it writes a crontab that handles the actual scheduling and launching. Whenever cannot do what you're asking.
The crontab cannot do what you want either. It launches the process and that's it.
You need to implement the check yourself in the process launched by cron. A common way of doing this is a lockfile, and I'm sure there are libraries for it (e.g. http://rubygems.org/gems/lockfile).
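A minimal sketch of the lockfile idea, using Ruby's built-in File#flock rather than the lockfile gem (so nothing here reflects that gem's API). The lock path and the method name `with_exclusive_lock` are hypothetical. The non-blocking flag makes a second run give up immediately instead of queueing behind the first one.

```ruby
# Hypothetical lock location; pick a path the cron user can write to.
LOCK_PATH = "/tmp/import.lock"

# Runs the block only if no other process holds an exclusive lock on lock_path.
# Returns the block's value, or nil without running the block if the lock is busy.
def with_exclusive_lock(lock_path = LOCK_PATH)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false immediately instead of waiting.
    return nil unless f.flock(File::LOCK_EX | File::LOCK_NB)

    yield
  end
  # The lock is released when the file is closed, which also happens
  # automatically if the process exits or crashes.
end
```

In the rake task that cron runs, you would wrap the import in `with_exclusive_lock { ... }` and log a note when it returns nil. A side benefit of flock over a plain marker file is that the kernel drops the lock when the process exits, so a crashed import does not leave a stale lock behind.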
Depending on your situation you might be able to create other checks before launching the import.

Well, this isn't really an issue with whenever.
However, you could rename the file you want to import as soon as you start processing it (the two hours between the 12am and 2am runs are more than enough time to do that) and move it to an archive directory once you are done, so there is no confusion.
The next time the task runs, it should only look at files that do not match the in-progress naming pattern (as already suggested in one of the comments).
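A rough Ruby sketch of the rename-and-archive approach, assuming a hypothetical convention in which a claimed file gets a `.processing-<unix time>` suffix and finished files move to a separate archive directory. The helper names are illustrative only. Because File.rename is atomic within a single filesystem, two overlapping runs cannot both claim the same file: the loser's rename fails with ENOENT.

```ruby
require "fileutils"

# Hypothetical convention: a claimed file carries a ".processing-<unix time>" suffix.
PROCESSING_PATTERN = /\.processing-(\d+)\z/

# Regular files in the inbox that no run has claimed yet.
def pending_files(inbox)
  Dir.children(inbox)
     .reject { |name| name.match?(PROCESSING_PATTERN) }
     .map { |name| File.join(inbox, name) }
     .select { |path| File.file?(path) }
     .sort
end

# Claims a file by renaming it and returns the new path,
# or nil if another run renamed it first.
def claim(path, now = Time.now)
  claimed = "#{path}.processing-#{now.to_i}"
  File.rename(path, claimed)
  claimed
rescue Errno::ENOENT
  nil
end

# Moves a fully processed file into the archive directory.
def archive(claimed_path, archive_dir)
  FileUtils.mkdir_p(archive_dir)
  FileUtils.mv(claimed_path, File.join(archive_dir, File.basename(claimed_path)))
end
```

Keeping the archive directory outside the inbox means archived files never show up in `pending_files` at all.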
You might also want to add a separate task that checks for imports that may have failed (e.g. a file whose name includes the exact claim time but which is still not archived after a whole day) and either sends some kind of notification or triggers the task again / renames the file so it is picked up again (depending on how well your rollback works).
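A possible shape for that failure-checking task, assuming claimed files carry a hypothetical `.processing-<unix time>` suffix and using an assumed one-day threshold. Requeueing by stripping the suffix is only safe if a partially imported file can be imported again, which is the rollback caveat in the answer.

```ruby
# Hypothetical convention: claimed files carry a ".processing-<unix time>" suffix.
PROCESSING_PATTERN = /\.processing-(\d+)\z/
STALE_AFTER = 24 * 60 * 60 # one day in seconds (an assumed threshold)

# Claimed files whose claim timestamp is older than max_age seconds.
def stale_claims(inbox, now: Time.now, max_age: STALE_AFTER)
  stale = Dir.children(inbox).select do |name|
    m = name.match(PROCESSING_PATTERN)
    m && now.to_i - m[1].to_i > max_age
  end
  stale.sort.map { |name| File.join(inbox, name) }
end

# Puts a stale file back in the queue by stripping the suffix,
# so the next regular run picks it up again.
def requeue(stale_path)
  original = stale_path.sub(PROCESSING_PATTERN, "")
  File.rename(stale_path, original)
  original
end
```

If retrying is not safe, the same `stale_claims` list could feed a notification (an email, a log line) instead of `requeue`.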

Related

What's the difference between Laravel's Queue\ShouldBeUnique and Queue\Middleware\WithoutOverlapping?

I have a job that is somehow getting kicked off multiple times. I want the job to kick off once and only once. If any other attempts to run the job while it's already on the queue, I want those runs to ABORT.
I've read the Laravel 8 documentation and can't figure out if I should use:
Queue\ShouldBeUnique (documented here: https://laravel.com/docs/8.x/queues#unique-jobs)
OR
Queue\Middleware\WithoutOverlapping (documented here: https://laravel.com/docs/8.x/queues#preventing-job-overlaps)
I believe the first one aborts subsequent attempts to run the job whereas the second keeps it queued, just makes sure it doesn't run until the first job is finished. Can anyone confirm?
I confirmed this locally by running multiple instances of the same job in a console window.
Implementing the Queue\ShouldBeUnique interface in my job class means that subsequent attempts are ABORTED.
Whereas adding ->withoutOverlapping() to the end of my job's entry in the app/Console/Kernel.php file simply prevents it from running simultaneously. It does NOT abort the job if one is already running.
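For readers coming from the Ruby side of this page, the two behaviours described in this answer map roughly onto a file lock taken in non-blocking versus blocking mode. This is a Ruby analogy, not Laravel code, and `run_exclusively` is a hypothetical helper: `:skip` drops the duplicate run (the ShouldBeUnique behaviour reported above), while `:wait` lets the duplicate run once the first one has finished (preventing simultaneous runs without aborting).

```ruby
# Ruby analogy only: serialize runs of a job through a file lock.
#   mode: :skip -> a duplicate gives up immediately and returns :skipped
#   mode: :wait -> a duplicate blocks until the current holder finishes, then runs
def run_exclusively(lock_path, mode: :skip)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    flags = File::LOCK_EX
    flags |= File::LOCK_NB if mode == :skip
    # flock only returns false when LOCK_NB is set and the lock is held elsewhere.
    return :skipped unless f.flock(flags)

    yield
  end
end
```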

Oozie make-like behavior

I'm currently developing a set of map reduce tasks that have to be run in a particular order. I'm looking to use Oozie to manage the dependencies and running of this workflow. There's one key feature that I need, though, and I can't find any documentation that suggests that it is possible.
Basically, I am looking for a way to set up an action that checks whether its output file is newer than the input file (and the associated map-reduce code) before executing. If so, it would skip executing the action. This way, I could make a change to a script and have only that stage of the workflow (and any stages that depend on its output) run.
Does anyone know how I'd go about doing this?
How about using a shell action in Oozie, in which you run a shell script that checks whether the content of the defined file has changed? On success of this action, go to the map-reduce action and continue your job; otherwise, go to the fail case and kill your job.
Hope this idea helps, if this is what you are looking for.
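The freshness check itself is the classic make rule: rebuild when the output is missing or older than any input. Here is a small Ruby sketch of what the shell action's script could compute, assuming Ruby is available on the node and the files sit on a local filesystem (for data on HDFS you would have to obtain modification times from HDFS instead). `needs_rebuild?` is a hypothetical name; the exit code follows the answer's convention that success means "run the map-reduce step".

```ruby
# Make-style rule: the output must be rebuilt if it does not exist
# or if any input (data files or the job's code) was modified after it.
def needs_rebuild?(output, inputs)
  return true unless File.exist?(output)

  out_time = File.mtime(output)
  inputs.any? { |input| File.mtime(input) > out_time }
end

# Command-line use: OUTPUT INPUT [INPUT...]
# Exits 0 when the stage should run, 1 when the output is up to date.
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 2
  output, *inputs = ARGV
  exit(needs_rebuild?(output, inputs) ? 0 : 1)
end
```

Listing the job's jar or script among the inputs is what gives the "re-run only the stage whose code changed" behaviour the question asks for.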

Running a custom Node script on DocPad server

Say I want to run a custom Node script on my DocPad server once a day (like a cron job); where would I put it? I can build a Node script that does something after an interval; I'm more curious about where to reference or run the script in the DocPad server.
A plugin is possible, though I've seen that you can require Node libraries within the DocPad configuration file so it could go in there.
Is there a suggested way to approach this?
If you want something purely cron-like, using the docpadReady event would probably be the way to go, doing something like:

    docpadReady: ->
        require('schedule').every('2 minutes').do ->
            require('safeps').spawn('your cron job')

Alternatively, DocPad's regenerateEvery configuration option may be suitable. It tells DocPad to regenerate every X milliseconds, which naturally triggers the generate events that you could hook into.
Alternatively, do these crons need to run on the same server as DocPad? If not, you could run them completely separately.
A final option is to check whether the server you are deploying to supports spawning multiple processes. Then DocPad's server is spawned, and so is cron, with DocPad not knowing about the cron task at all.

Crontab job as a service

I have a script that pulls some data from a web service and populates a MySQL database. The idea is that it runs every minute, so I added a cron job to execute the script.
However, I would like the ability to occasionally suspend and re-start the job without modifying my crontab.
What is the best practice for achieving this? Or should I not really be using crontab to schedule something that I want to occasionally suspend?
I am considering an implementation where a global variable is set and checked inside the script, but I thought I would canvass for more apt solutions first. The simpler the better: I am new to both scripting and Ruby.
If I were you, I would have the script check a static switch, as you suggested with your global variable, but test for the existence of a file instead of a global variable. This seems clean to me.
Another solution is to have a service, not crontab, call your script every minute. This service would live alongside the other services in /etc/init.d (or /etc/rc.d, depending on your distribution) and have start, stop and restart commands like them.
These two solutions can be mixed:
1. the service only creates or deletes the switch file, and the crontab line is always active; or
2. your service edits the crontab directly, like this, but I prefer not to edit the crontab from a script, and the technique described in the article is not atomic (if you change your crontab between the script's read and its write, your change is lost).
So in your place I would go for option 1.
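A minimal Ruby sketch of option 1. The path and method names are hypothetical, and the choice that the file's presence means "paused" (rather than "enabled") is arbitrary. The crontab entry never changes; the script checks the switch on every run, and a service's stop and start commands would simply call `pause!` and `resume!` (or create and delete the file from the shell).

```ruby
require "fileutils"

# Hypothetical switch file: while it exists, the every-minute job does nothing.
PAUSE_FILE = "/tmp/fetch_data.paused"

def paused?(pause_file = PAUSE_FILE)
  File.exist?(pause_file)
end

def pause!(pause_file = PAUSE_FILE)
  FileUtils.touch(pause_file)
end

def resume!(pause_file = PAUSE_FILE)
  FileUtils.rm_f(pause_file)
end

# What the cron-launched script would do on each run:
# skip quietly when paused, otherwise run the real work in the block.
def run_once(pause_file = PAUSE_FILE)
  return :paused if paused?(pause_file)

  yield
  :ran
end
```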

whenever gem minutely job: what happens if the job takes more than one minute?

I am planning to use the whenever gem, which among other things will run a minutely rake task. If my rake task takes more than a minute, then based on the output from the whenever gem it seems that a second instance of the rake task will kick in even though the first one has not quite finished.
Will whenever wait for the minutely task to finish before starting the second one?
If not, what are the workarounds? I believe this question might be better suited to Server Fault, but I am asking it here anyway.
whenever just writes cronjobs, and makes no effort to stop them overrunning themselves. This is the job of the task that is being run.
Use PID files, or file system locks to prevent the task running over the top of itself.
In my scheduled application, I scan the process list looking for other instances of my application running with the same configuration file on the command line - then exit with a logged note if a process is already running with the same configuration file.
That keeps the program from stepping on itself ...
PID files or some type of "locking" file are prone to problems when the process exits but the lock file still exists.
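To soften the stale-lock problem mentioned in the last answer, a PID file can record which process owns it, so a later run can tell a live owner from a leftover file. This is a sketch with a hypothetical path and helper names. It still has a small race between deleting a stale file and recreating it, and PIDs can be reused, which is why a kernel-managed lock such as File#flock (released automatically when the process exits) is often the more robust choice.

```ruby
require "fileutils"

# Hypothetical PID file location.
PID_PATH = "/tmp/minutely_task.pid"

# True if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks existence and permission
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Returns true if this process now owns the PID file, false if a live process does.
def acquire_pid_file(path = PID_PATH)
  if File.exist?(path)
    old_pid = File.read(path).to_i
    return false if old_pid.positive? && process_alive?(old_pid)

    FileUtils.rm_f(path) # stale: the previous run died without cleaning up
  end
  # CREAT|EXCL fails if another run created the file in the meantime.
  File.open(path, File::WRONLY | File::CREAT | File::EXCL) { |f| f.write(Process.pid.to_s) }
  true
rescue Errno::EEXIST
  false
end

# Removes the PID file, but only if this process owns it.
def release_pid_file(path = PID_PATH)
  File.delete(path) if File.exist?(path) && File.read(path).to_i == Process.pid
end
```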
