Snakemake on cluster, avoid submitting one rule as a job

I'm executing a snakemake pipeline using
snakemake --cluster ...
It's working nicely right now. However, for reasons I won't describe here, I want one of my rules not to be submitted as a job but to be executed on the machine/node where Snakemake itself is running. Is it possible to make such an exception?

Yes, you can mark rules as local with the localrules directive; see the docs: http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=Localrules#local-rules
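For example, a minimal Snakefile sketch (rule and file names here are made up for illustration):

    # Rules named under localrules are executed by the main Snakemake
    # process on the node where it runs, not submitted via --cluster.
    localrules: summarize

    rule all:
        input: "summary.txt"

    rule summarize:
        input: "results.txt"
        output: "summary.txt"
        shell: "sort {input} > {output}"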

Related

gitlab-runner - accept jobs only at certain times?

Is there a way to configure a gitlab-runner (13.6.0) to accept jobs only between certain times of the day?
For example - I would like to kick off a 'deployment' pipeline with test, build and deploy stages at any time of the day, and the test and build stages can start immediately, but I would like the final deploy stage to happen only between, say, midnight and 2am.
Thanks
The GitLab documentation describes how to use cron-style pipeline schedules to trigger nightly pipelines. Additionally, there is a $CI_PIPELINE_SOURCE predefined variable that can be used to limit which jobs run in a pipeline.
Using these two features, it should be possible to run the same pipeline in two different ways: the "normal" runs execute only the test/build jobs, while the "nightly" runs triggered by the schedule execute only the deploy job, with each job checking the $CI_PIPELINE_SOURCE value.
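A minimal .gitlab-ci.yml sketch of that split, assuming a scheduled pipeline has been set up under CI/CD > Schedules for the midnight-2am window (job names and scripts are placeholders):

    test:
      stage: test
      script: ./run-tests.sh          # placeholder
      rules:
        - if: '$CI_PIPELINE_SOURCE != "schedule"'

    build:
      stage: build
      script: ./build.sh              # placeholder
      rules:
        - if: '$CI_PIPELINE_SOURCE != "schedule"'

    deploy:
      stage: deploy
      script: ./deploy.sh             # placeholder
      rules:
        # Runs only when the pipeline was started by the schedule.
        - if: '$CI_PIPELINE_SOURCE == "schedule"'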
Let me know if this option fits your environment.

How can I start a server in one YAML job and run tests in another while the server job is still running?

So I have two YAML pipelines currently... one starts the server, and after the server is up and running I start the other pipeline, which runs the tests in one job and, once that's completed, starts a job that shuts down the server from the first pipeline.
I'm kinda new to YAML and wondering if there is a way to run all this in a single pipeline...
The problem I came across is that if I put the server in the first job, I do not know how to condition the second job to kick off while the server is running. The first job never reaches a succeeded or failed state because it's still in progress; the server has to keep running for the tests to run.
I tried adding a variable that I set to true after the server is running, but it still never jumps to the next job.
I looked into templates too, but those are not very clear to me, so any suggestion, documentation, or tutorial on how to achieve putting this in one pipeline would be very helpful...
I already googled a bunch and will keep googling, but figured someone here might have an answer already.
Each agent can run only one job at a time. To run multiple jobs in parallel you must configure multiple agents; you also need a sufficient number of parallel jobs in your organization.
You can specify the conditions under which each job runs. By default, a job runs if it does not depend on any other job, or if all of the jobs that it depends on have completed and succeeded. You can customize this behavior by forcing a job to run even if a previous job fails or by specifying a custom condition.
Since you have already added a variable that you set to true once the server is running, try enabling a custom condition that makes the second job run only when that variable has the expected value, as in the sketch below.
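A sketch of that idea using an output variable and a custom condition (job, step, and script names here are hypothetical); note the variable must be set with isOutput=true for the second job to read it:

    jobs:
    - job: StartServer
      steps:
      - bash: |
          ./start-server.sh &   # hypothetical server start script
          echo "##vso[task.setvariable variable=serverReady;isOutput=true]true"
        name: startStep

    - job: RunTests
      dependsOn: StartServer
      # Custom condition: run only if StartServer exported serverReady=true.
      condition: eq(dependencies.StartServer.outputs['startStep.serverReady'], 'true')
      steps:
      - bash: ./run-tests.sh    # hypothetical test script

Keep in mind that with dependsOn the second job only starts after the first job finishes, so the server process has to survive the end of that job (or both jobs have to target the same self-hosted agent) for the tests to reach it.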
For more details, please check the official docs:
Specify jobs in your pipeline
Specify conditions

Amazon Elastic Map Reduce: Job flow fails because output file is not yet generated

I have an Amazon EMR job flow that performs three tasks, the output of the first being the input to the subsequent two. The second task's output is used by the third task via the DistributedCache.
I've created the job flow entirely on the EMR web site (console), but the cluster fails immediately because it cannot find the distributed cache file, since it has not yet been created by step #1.
Is my only option to create these steps from the CLI via a bootstrap action and specify the --wait-for-steps option? It seems strange that I cannot execute a multi-step job flow where the input of one task relies on the output of another.
In the end I got around this by creating an Amazon EMR cluster that bootstrapped but had no steps. Then I SSH'd into the head node and ran the Hadoop jobs from the console.
I now have the flexibility to add them to a script with individual configuration options per job.
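As a sketch, a script run on the head node could look something like this (jar names, classes, and S3 paths are made up):

    #!/bin/bash
    set -e  # abort if any step fails

    # Step 1: its output feeds steps 2 and 3.
    hadoop jar step1.jar com.example.Step1 s3://mybucket/input s3://mybucket/step1-out

    # Step 2: produces the file step 3 needs in its distributed cache.
    hadoop jar step2.jar com.example.Step2 s3://mybucket/step1-out s3://mybucket/step2-out

    # Step 3: ships step 2's output via the distributed cache (-files is the
    # generic option, available if the job uses Tool/GenericOptionsParser).
    hadoop jar step3.jar com.example.Step3 \
        -files s3://mybucket/step2-out/part-00000 \
        s3://mybucket/step1-out s3://mybucket/final-out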

Build schedule in Jenkins

I am working on a POC, currently using Jenkins as the CI server. I have set up jobs based on certain lifecycle stages such as test suite and QA, and I have configured these jobs to run as scheduled builds based on a cron expression.
I'd like to know how to find out what the next scheduled build in Jenkins will be, based on the jobs I have created. I know the last successful build and the last failed one, but I don't know the next proposed build. Any clues? Or is there a view plugin for this? Sorry if this is a strange request, but I need to find out.
I also need to discover whether there is an issue when more than one job is running concurrently, and what will happen. My understanding is that this is not an issue. I do not have any slaves set up; I only have the master.
Jenkins version: 1.441
I found the answer to the first question:
https://wiki.jenkins-ci.org/display/JENKINS/Next+Executions
So can you help me on the second question please? Is there any issue with more than one job building concurrently?
Thanks,
Shane.
For the next execution date, take a look at the Next Executions plugin here.
For your second question:
The number of builds you can run concurrently is configurable in the Jenkins server settings (http://<your-jenkins>/configure, the executors parameter).
If that number of executors is reached, each newly triggered job will be added to Jenkins's build queue and will run when one of the running jobs ends.

Hadoop Job Scheduling query

I am a beginner to Hadoop.
As per my understanding, the Hadoop framework runs jobs in FIFO order (the default scheduling).
Is there any way to tell the framework to run the job at a particular time?
i.e., is there any way to configure it to run the job daily at 3 PM, for example?
Any inputs on this greatly appreciated.
Thanks, R
What about calling the job from an external Java scheduling framework, like Quartz? Then you can run the job whenever you want.
You might consider using Oozie (http://yahoo.github.com/oozie/). Among other things, it allows:
Frequency execution: the Oozie workflow specification supports both data and time triggers. Users can specify execution frequency and can wait for data arrival to trigger an action in the workflow.
It is independent of any other Hadoop schedulers and should work with any of them, so probably nothing in your Hadoop configuration will change.
How about having a script that executes your Hadoop job and then using the at command to run it at a specified time? If you want the job to run regularly, you could set up a cron job to execute your script, as sketched below.
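For example, a crontab entry along these lines would run a (hypothetical) wrapper script daily at 3 PM:

    # crontab entry: minute hour day-of-month month day-of-week
    0 15 * * * /home/hadoop/run-job.sh >> /var/log/hadoop-job.log 2>&1

    # run-job.sh (sketch; jar, class, and paths are made up):
    #!/bin/bash
    hadoop jar /path/to/my-job.jar com.example.MyJob /input /output-$(date +%F)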
If cron does not cut it, I'd use a commercial scheduling app and/or a custom workflow solution. We use a solution called JAMS, but keep in mind it's .NET-oriented.
