Is it possible to list running jobs with DRMAA? - cluster-computing

I was wondering if it is possible to list all running jobs in the resource manager, using the DRMAA library, not just the ones started via DRMAA itself?
That is, getting data similar to what is output by the squeue command for the SLURM resource manager.

As far as I know, yes, it is, but only for DRMAAv2, which implements listing and job persistence:
https://github.com/troeger/drmaav2-mock/blob/master/drmaa2-list.c
The python-drmaa module does not implement DRMAAv2 yet, but we might start working on it soon:
https://github.com/drmaa-python
If you want to jump in, you're very welcome! ;)
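Until a DRMAA v2 binding is available in Python, one stopgap on SLURM is to bypass DRMAA entirely and read squeue output directly. A minimal sketch (not DRMAA-based; it assumes squeue is on the PATH):

import subprocess

def list_running_jobs():
    # -h: no header, -t RUNNING: only running jobs, -o: job id|name|state|user
    out = subprocess.check_output(
        ["squeue", "-h", "-t", "RUNNING", "-o", "%i|%j|%T|%u"], text=True)
    return [line.split("|") for line in out.splitlines() if line]

for job_id, name, state, user in list_running_jobs():
    print(job_id, name, state, user)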

Related

Can I specify a timeout for a GCP ai-platform training job?

I recently submitted a training job with a command that looked like:
gcloud ai-platform jobs submit training foo --region us-west2 --master-image-uri us.gcr.io/bar:latest -- baz qux
(more on how this command works here: https://cloud.google.com/ml-engine/docs/training-jobs)
There was a bug in my code which caused the job to keep running rather than terminate. Two weeks and $61 later, I discovered my error and cancelled the job. I want to make sure I don't make that kind of mistake again.
I'm considering using the timeout command within the training container to kill the process if it takes too long (typical runtime is about 2 or 3 hours), but rather than trust the container to kill itself, I would prefer to configure GCP to kill it externally.
Is there a way to achieve this?
As a workaround, you could write a small script that runs your command, sleeps for as long as you want to allow, and then runs a cancel-job command.
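A minimal sketch of such a wrapper in Python, assuming the gcloud CLI is installed and authenticated; the job name and time limit are placeholders for your own values:

import subprocess
import time

JOB_NAME = "foo"             # the training job submitted above
MAX_RUNTIME_S = 4 * 3600     # allow 4 hours; typical runs take 2-3 hours

time.sleep(MAX_RUNTIME_S)

# Only cancel if the job is still active, so already-finished jobs don't raise an error.
state = subprocess.check_output(
    ["gcloud", "ai-platform", "jobs", "describe", JOB_NAME,
     "--format=value(state)"], text=True).strip()
if state in ("QUEUED", "PREPARING", "RUNNING"):
    subprocess.check_call(["gcloud", "ai-platform", "jobs", "cancel", JOB_NAME])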
As a timeout definition is not available in the AI Platform training service, I took the liberty of opening a Public Issue with a Feature Request to record the lack of this option. You can track the PI progress here.
Besides the script mentioned above, you can also try:
TimeOut Keras callback, or timeout= Optuna param (depending on which library you actually use)
Cron-triggered Lambda (Cloud Function)

How to run spark-jobs outside the bin folder of spark-2.1.1-bin-hadoop2.7

I have an existing spark-job. The functionality of this spark-job is to connect to a Kafka server, get the data, and then store it in Cassandra tables. Right now this spark-job runs on the server from inside spark-2.1.1-bin-hadoop2.7/bin, but whenever I try to run this spark-job from another location, it does not run. This spark-job contains some JavaRDD-related code.
Is there any chance I can run this spark-job from outside that folder as well, by adding some dependency in the pom or something else?
whenever I try to run this spark-job from another location, it does not run
spark-job is a custom launcher script for a Spark application, perhaps with some additional command-line options and packages. Open it, review the content and fix the issue.
If it's too hard to figure out what spark-job does and there's no one nearby to help you out, it's likely time to throw it away and replace it with the good ol' spark-submit.
Why don't you use it in the first place?!
Read up on spark-submit in Submitting Applications.
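For reference, a typical spark-submit invocation for a Kafka-to-Cassandra job might look like the following; the class name, connector versions, and jar path are placeholders to replace with your own:

spark-submit \
  --class com.example.KafkaToCassandraJob \
  --master local[*] \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1,com.datastax.spark:spark-cassandra-connector_2.11:2.0.5 \
  /path/to/your-spark-job.jar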

Laravel Scheduler (withoutOverlapping)

I have two apps running on the same server.
Now it seems that, when adding withoutOverlapping() to the scheduled job and managing the base cronjob via cron itself, these two apps are blocking each other's execution.
Could that be?
Yes, withoutOverlapping only works per application.
Laravel creates a file in the storage folder with a hash of the job. This way, if the file exists, Laravel knows the job is still running. One application cannot possibly know whether the other one is currently running a job, because it does not have access to the storage folder of the other application.
If your code looks like the following
$schedule->command('process:queue 0')->everyMinute()->withoutOverlapping();
$schedule->command('process:queue 1')->everyMinute()->withoutOverlapping();
it is because the same command with different parameters might be considered overlapping.
That is, the hash of the job considers only the command signature, not its parameters.

Running a custom Node script on DocPad server

Say I want to run a custom Node script on my DocPad server once a day (like a cron job), where would I put it? I can build a Node script that does stuff after an interval, I'm more curious about where to reference / run the script in the DocPad server.
A plugin is possible, though I've seen that you can require Node libraries within the DocPad configuration file so it could go in there.
Is there a suggested way to approach this?
If you want something purely cron-like, using the docpadReady event would probably be the way to go, doing something like:
docpadReady: ->
    # once DocPad is ready, schedule the task to run on an interval
    require('schedule').every('2 minutes').do ->
        # spawn your cron task as a child process
        require('safeps').spawn('your cron job')
Alternatively, maybe DocPad's regenerateEvery configuration option is suitable. This tells DocPad to regenerate every X milliseconds, which will naturally call the generate events that you could hook into.
Alternatively, is there a need for these crons to run on the same server as DocPad? If not, you could do them completely separately.
A final option is to check whether the server you are deploying to supports spawning multiple processes. That way DocPad's server is spawned, and so is the cron task, with DocPad not knowing about the cron task at all.

Working with Flask-Script and cron jobs

So I've been meaning to create a cron job for my prototype Flask app running on Heroku. Searching the web, I found that the best way is to use Flask-Script, but I fail to see the point of using it. Do I get easier access to my app logic and storage info? And if I do use Flask-Script, how do I organize it around my app? I'm using it right now to start my server without really knowing the benefits. My folder structure is like this:
/app
  /manage.py
  /flask_prototype
    all my Flask code
Should I put the 'script.py' to be run by the Heroku Scheduler in the app folder, at the same level as manage.py? If so, do I get access to the models defined within flask_prototype?
Thank you for any info
Flask-Script just provides a framework under which you can create your script(s). It does not give you any better access to the application than what you can obtain when you write a standalone script. But it handles a few mundane tasks for you, like command line arguments and help output. It also folds all of your scripts into a single, consistent command line master script (this is manage.py, in case it isn't clear).
As far as where to put the script, it does not really matter. As long as manage.py can import it and register it with Flask-Script, and your script can import what it needs from the application, you should be fine.
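For illustration, a minimal manage.py along those lines; flask_prototype and the imported names are placeholders for whatever your package actually exposes:

from flask_script import Manager

from flask_prototype import app   # the Flask application object from your package

manager = Manager(app)

@manager.command
def cleanup():
    # Example task for the Heroku Scheduler: run it with `python manage.py cleanup`.
    # Anything importable from flask_prototype (models, db, ...) can be used here.
    from flask_prototype import models   # placeholder import
    print("cleanup ran, models module:", models)

if __name__ == "__main__":
    manager.run()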
